
Outliers in
Statistical Data

VIC BARNETT
University of Sheffield

and

TOBY LEWIS
University of Hull

John Wiley & Sons


Chichester · New York · Brisbane · Toronto
Copyright © 1978 by John Wiley & Sons Ltd.
Reprinted February 1979
Reprinted June 1980
All rights reserved.

No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

Library of Congress Cataloging in Publication Data:

Barnett, Vic.
Outliers in statistical data. Barnett/Lewis
(Wiley series in probability and mathematical statistics)
Bibliography: p.
Includes index.
1. Outliers (Statistics) I. Lewis, Tobias, joint author. II. Title.
QA276.B2849 519.5 77-21024

ISBN 0 471 99599 1

Typeset by The Universities Press, Belfast, Northern Ireland
Printed and bound in Great Britain at The Pitman Press, Bath

Preface

The concept of an outlier has fascinated experimentalists since the earliest attempts to interpret data. Even before the formal development of statistical method, argument raged over whether, and on what basis, we should discard observations from a set of data on the grounds that they are 'unrepresentative', 'spurious', or 'mavericks' or 'rogues'. The early emphasis stressed the contamination of the data by unanticipated and unwelcome errors or mistakes affecting some of the observations. Attitudes varied from one extreme to another: from the view that we should never sully the sanctity of the data by daring to adjudge its propriety, to an ultimate pragmatism expressing 'if in doubt, throw it out'.

The present views are more sophisticated. A wider variety of aims are recognized in the handling of outliers, outlier-generating models have been proposed, and there is now available a vast array of specific statistical techniques for processing outliers. The work is scattered throughout the literature of the present century, shows no sign of any abatement, but has not previously been drawn together in a comprehensive review. Our purpose in writing this book is to attempt to provide such a review, at two levels. On the one hand we seek to survey the existing state of knowledge in the outlier field and to present the details of selected procedures for different situations. On the other hand we attempt to categorize differences in attitude, aim, and model in the study of outliers, and to follow the implications of such distinctions for the development of new research approaches. In offering such a comprehensive overview of the principles and methods associated with outliers we hope that we may help the practitioner in the analysis of data and the researcher in opening up possible new avenues of enquiry.

Early work on outliers was (inevitably) characterized by lack of attention to the modelling of the outlier-generating mechanism, by informality of technique with no backing in terms of a study of the statistical properties of proposed procedures, and by a leaning towards the hardline view that outliers should be either rejected or retained with full import. Even today
v
vi Preface

sufficient attention is not always paid to the form of the outlier model, or to the practical purpose of investigating outliers, in the presentation of methods for processing outliers. Many procedures have an ad hoc, intuitively justified, basis with little external reference in the sense of the relative statistical merits of different possibilities. In reviewing such techniques we will attempt to set them, as far as possible, within a wider framework of model, statistical principle, and practical aim, and we shall also consider the extent to which such basic considerations have begun to formally permeate outlier study over recent years.

Such an emphasis is reflected in the structure of the book. The opening two chapters are designed respectively to motivate examination of outliers and to pose basic questions about the nature of an outlier. Chapter 1 gives a general survey of the field. In Chapter 2 we consider the various ways in which we can model the presence of outliers in a set of data. We examine the different interests (from rejection of unacceptable contamination, through the accommodation of outliers with reduced influence in robust procedures applied to the whole set of data, to specific identification of outliers as the facets of principal interest in the data). We discuss the statistical respectability of distinct methods of study, and the special problems that arise from the dimensionality of the data set or from the purpose of its analysis (single-sample estimation or testing, regression, analysis of data from designed experiments, examination of slippage in multisample data, and so on).

Chapter 3 examines at length the assessment of discordancy of outliers in single univariate samples. It discusses basic considerations and also presents a battery of techniques for practical use with comment on the circumstances supporting one method rather than another.

Chapter 4, on the accommodation of outliers in single univariate samples, deals with inference procedures which are robust in the sense of providing protection against the effect of outliers. Chapter 5 is concerned with processing several univariate samples both with regard to the relative slippage of the distributions from which they arise and (to a lesser extent) in relation to the accommodation of outliers in robust analysis of the whole set of data.

Chapters 6 and 7 extend the ideas and methods (in relation to the three interests: rejection, accommodation, identification) to single multivariate samples and to the analysis of data in regression, designed experiments, or time-series situations. Chapter 8 gives fuller and more specific attention to the implications of adopting a Bayesian, or a non-parametric, approach to the study of outliers. The concluding Chapter 9 poses a few issues for further consideration or investigation.

The book aims to bring together in a logical framework the vast amount of work on outliers which has been scattered over the years in the various professional journals and texts, and which appears to have acquired a new

Preface vii

lease of life over the last decade or so. It is directed to more than one kind of reader: to the student (to inform him of the range of ideas and techniques), to the experimentalist (to assist him in the judicious choice of methods for handling outliers), and to the professional statistician (as a guide to the present state of knowledge and a springboard for further research).

The level of treatment assumes a knowledge of elementary probability theory and statistical method such as would be acquired in an introductory university-level course. The methodological exposition leans on an understanding of the principles and practical implications of testing and estimation. Where basic modelling and demonstration of statistical propriety are discussed, a more mathematical appreciation of basic principles is assumed, including some familiarity with optimality properties of methods of constructing tests and estimators and some knowledge of the properties of order statistics. Proofs of results are formally presented where appropriate, but at a heuristic rather than highly mathematical level.

Extensive tables of appropriate statistical functions are presented in an Appendix, to aid the practical worker in the use of the different procedures. Many of these tables are extracted from existing published tables; we are grateful to all the authors and publishers concerned, and have made individual acknowledgement at the appropriate places in our text. Other tables have been specially produced by us. The whole set of tables has been presented in as compact and consistent a style as possible. This has involved a good deal of selection and re-ordering of the previously published material; we have aimed as far as possible to standardize the ranges of tabulated values of sample size, percentage point, etc.

Copious references are given throughout the text to source material and to further work on the various topics. They are gathered together in the section entitled 'References and Bibliography' with appropriate page references to places of principal relevance in the text. Additional references augment those which have been discussed in the text. These will of course appear without any page reference, but will carry an indication of the main area to which they are relevant.

It is, of course, a privilege and pleasure to acknowledge the help of others. We thank Dave Collett, Nick Fieller, Agnes Herzberg, and David Kendall for helpful comments on early drafts of some of the material. We are particularly grateful to Kim Malafant who carried out the extensive calculations of the new statistical tables in Chapter 3. Our grateful thanks go also to Hazel Howard who coped nobly with the typing of a difficult manuscript.

We are solely responsible for any imperfections in the book and should be glad to be informed of them.

July, 1977
VIC BARNETT
TOBY LEWIS
Contents

CHAPTER 1 INTRODUCTION 1
1.1 Human error and ignorance 6
1.2 Outliers in relation to probability models 7
1.3 Outliers in more structured situations 10
1.4 Bayesian and non-parametric methods 15
1.5 Survey of outlier problems 16

CHAPTER 2 WHAT SHOULD ONE DO ABOUT OUTLYING OBSERVATIONS? 18
2.1 Early informal approaches 18
2.2 Various aims 22
2.3 Models for discordancy 28
2.4 Test statistics 38
2.5 Statistical principles underlying tests of discordancy 41
2.6 Accommodation of outliers: robust estimation and testing 46

CHAPTER 3 DISCORDANCY TESTS FOR OUTLIERS IN UNIVARIATE SAMPLES 52
3.1 Statistical bases for construction of tests 56
3.1.1 Inclusive and exclusive measures, and a recursive algorithm for the null distribution of a test statistic 61
3.2 Performance criteria of tests 64
3.3 The multiple outlier problem 68
3.3.1 Block procedures for multiple outliers in univariate samples 71
3.3.2 Consecutive procedures for multiple outliers in univariate samples 73
3.4 Discordancy tests for practical use 75
3.4.1 Guide to use of the tests 75
ix
x Contents

3.4.2 Discordancy tests for gamma (including exponential) samples 76
3.4.3 Discordancy tests for normal samples 89
3.4.4 Discordancy tests for samples from other distributions 115

CHAPTER 4 ACCOMMODATION OF OUTLIERS IN UNIVARIATE SAMPLES: ROBUST ESTIMATION AND TESTING 126
4.1 Performance criteria 130
4.1.1 Efficiency measures for estimators 130
4.1.2 The qualitative approach: influence curves 136
4.1.3 Robustness of confidence intervals 141
4.1.4 Robustness of significance tests 142
4.2 General methods of accommodation 144
4.2.1 Estimation of location 144
4.2.2 Performance characteristics of location estimators 155
4.2.3 Estimation of scale or dispersion 158
4.2.4 Studentized location estimates, tests, and confidence intervals 160
4.3 Accommodation of outliers in univariate normal samples 163
4.4 Accommodation of outliers in exponential samples 171

CHAPTER 5 OUTLYING SUB-SAMPLES: SLIPPAGE TESTS 174
5.1 Non-parametric slippage tests 176
5.1.1 Non-parametric tests for slippage of a single population 176
5.1.2 Non-parametric tests for slippage of several populations: multiple comparisons 183
5.2 The slippage model 186
5.3 Parametric slippage tests 187
5.3.1 Normal samples 188
5.3.2 General slippage tests 197
5.3.3 Non-normal samples 201
5.3.4 Group parametric slippage tests 204
5.4 Other slippage work 205

CHAPTER 6 OUTLIERS IN MULTIVARIATE DATA 208
6.1 Outliers in multivariate normal samples 209
6.2 Informal detection of multivariate outliers 219
6.2.1 Marginal outliers 220
6.2.2 Linear constraints 221
6.2.3 Graphical methods 221
6.2.4 Principal component analysis method 223
6.2.5 Use of generalized distances 224
6.2.6 Fourier-type representation 227

Contents xi

6.2.7 Correlation methods 227
6.2.8 A 'gap test' for multivariate outliers 229
6.3 Accommodation of multivariate outliers 231

CHAPTER 7 OUTLIERS IN DESIGNED EXPERIMENTS, REGRESSION AND IN TIME-SERIES 234
7.1 Outliers in designed experiments 238
7.1.1 Discordancy tests based on residuals 238
7.1.2 Residual-based accommodation procedures 246
7.1.3 Graphical methods 247
7.1.4 Non-residual-based methods 249
7.1.5 Non-parametric, and Bayesian, methods 251
7.2 Outliers in regression 252
7.2.1 Outliers in linear regression 252
7.2.2 Multiple regression 256
7.3 Outliers with general linear models 257
7.3.1 Residual-based methods 257
7.3.2 Non-residual-based methods 264
7.4 Outliers in time-series 266

CHAPTER 8 BAYESIAN AND NON-PARAMETRIC APPROACHES 269
8.1 Bayesian methods 269
8.1.1 Bayesian 'tests of discordancy' 269
8.1.2 Bayesian accommodation of outliers 277
8.2 Non-parametric methods 282

CHAPTER 9 PERSPECTIVE 286

APPENDIX: STATISTICAL TABLES 289

REFERENCES AND BIBLIOGRAPHY 337

INDEX 357
CHAPTER 1

Introduction

From the earliest gropings of man to harness and employ the information implicit in collected data as an aid to understanding the world he lives in, there has been a concern for 'unrepresentative', 'rogue', or 'outlying' observations in sets of data. These are often seen as contaminating the data: reducing and distorting the information it provides about its source or generating mechanism. It is natural to seek means of interpreting or categorizing outliers, of sometimes rejecting them to restore the propriety of the data, or at least of taking their presence properly into account in any statistical analysis.

What are outliers and what is the outlier problem? To quote from Ferguson (1961a),

the general problem ... is a very old and common one. In its simplest form it may be stated as follows. In a sample of moderate size taken from a certain population it appears that one or two values are surprisingly far away from the main group. The experimenter is tempted to throw away the apparently erroneous values, and not because he is certain that the values are spurious. On the contrary, he will undoubtedly admit that even if the population has a normal distribution there is a positive although extremely small probability that such values will occur in an experiment. It is rather because he feels that other explanations are more plausible, and that the loss in the accuracy of the experiment caused by throwing away a couple of good values is small compared to the loss caused by keeping even one bad value. The problem, then, is to introduce some degree of objectivity into the rejection of the outlying observations. (Copyright © 1961 by Regents of the University of California; reprinted by permission of the University of California Press)

In the light of developments in outlier methodology over the last 15 years, Ferguson's formulation is unduly restrictive in various ways, as we shall see; for example, outlying values are not necessarily 'bad' or 'erroneous', and the experimenter may be tempted in some situations not to 'throw away' the discordant value but to welcome it as an indication of, say, some unexpectedly useful industrial treatment or agricultural variety. However, the passage brings out most of the essentials of the outlier situation, and will serve as a basis for discussion.
2 Outliers in statistica[ data

The first point to make is that the problem, as well as being a 'very old and common one', is an unavoidable one. It is all very well to say, as some statisticians do, that one should not consider dealing with outlying observations unless furnished with information on their prior probabilities. Some will not even admit the concept of an outlier unless there is some obvious physical explanation of its presence! But the fact is that experimental scientists and other people who have to deal with data and take decisions are forced to make judgments about outliers: whether or not to include them, whether to make allowances for them on some compromise basis, and so on. Sometimes this is done in what appears, by modern standards, to be an unnecessarily naive or inefficient way. For example, a chemistry textbook in current use (Calvin et al., 1949, reprinted in 1960) advises its readers to use Chauvenet's method: 'Any result of a series containing n observations shall be rejected when the magnitude of its deviation from the mean of all measurements is such that the probability of occurrence of all deviations as large or larger is less than 1/2n'. This rather strange method is one of the earliest extant for dealing with outliers, and dates from the middle of the nineteenth century (Chauvenet, 1863). We return to it in Section 2.1.
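Chauvenet's rule can be turned into a critical value for the standardized deviation |x − m|/s. The Python sketch below makes two assumptions. First, it reads 'probability ... less than 1/2n' as the two-sided normal tail probability, which is the reading that reproduces the 2.33 used later in this section for n = 25. Second, like Chauvenet, it treats the sample standard deviation as if it were σ. The function names are illustrative only.

```python
from statistics import NormalDist


def chauvenet_critical_value(n: int) -> float:
    """Critical value of |x - m|/s under Chauvenet's criterion for n observations.

    Assumes rejection when the two-sided normal tail probability
    P(|Z| >= z) = 2 * (1 - Phi(z)) falls below 1/(2n); the boundary
    is therefore z = Phi^{-1}(1 - 1/(4n)).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return NormalDist().inv_cdf(1 - 1 / (4 * n))


def chauvenet_rejects(x: float, m: float, s: float, n: int) -> bool:
    """True if x is rejected, treating the estimate s as if it were sigma."""
    return abs(x - m) / s > chauvenet_critical_value(n)


for n in (10, 25, 100):
    print(n, round(chauvenet_critical_value(n), 2))
```

The critical value is not fixed but grows with the sample size. This is one reason the rule looks 'rather strange' by modern standards.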

Some data which have attracted much interest among statisticians over the years are the results given by Mercer and Hall (1912) on yields of wheat grain and straw for 500 similar-sized plots of soil planted with wheat over a rectangular field. Both grain yields and straw yields seem to provide a good fit to normal distributions. If one is concerned with edge effects and looks at the 25 grain yields along the southern boundary of the field, results (in lb) are found as shown in the second row from the bottom in Figure 1.1, which is part of the original table given by Mercer and Hall. (Upright numerals are grain yields; italic, straw yields.)

[Figure 1.1: part of Mercer and Hall's (1912) table of wheat grain and straw yields (lb) for individual plots]

On these figures alone one might be rather worried about the value 5.09 for grain yield in the fourth plot from the western edge. Mercer and Hall were not concerned only with the edge yields, nor were they on the look-out for 'outliers'. But even at the time of their work (1912) there were available better methods than Chauvenet's for detecting outliers. Wright's rule (1884), for example, rejected any observation distant more than ±3.37 estimated standard deviations from the sample mean.
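Wright's rule, as described here, uses a fixed multiple of the estimated standard deviation. The minimal Python sketch below assumes that the 'estimated standard deviation' is the usual sample standard deviation with divisor n − 1; the text does not say which divisor was used. The data in the usage lines are hypothetical.

```python
from statistics import mean, stdev


def wright_outliers(data, k=3.37):
    """Return observations more than k estimated SDs from the sample mean.

    Assumption: the 'estimated standard deviation' is the n - 1 sample SD;
    the text does not state which divisor Wright used.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    m = mean(data)
    s = stdev(data)
    if s == 0:
        return []  # all values identical: nothing can be discordant
    return [x for x in data if abs(x - m) / s > k]


# Hypothetical data: nineteen identical readings and one gross error.
sample = [0.0] * 19 + [10.0]
print(wright_outliers(sample))
```

The n − 1 divisor has one consequence. No observation can lie more than (n − 1)/√n sample standard deviations from the mean, so with 13 or fewer observations the multiple 3.37 can never be exceeded. The hypothetical sample of twenty values sits exactly at that bound, 19/√20 ≈ 4.25.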

What happens if we apply Chauvenet's, or Wright's, method to the 25 southern edge grain yields? The sample mean is $m = 3.95$ and the estimated standard deviation is $s = 0.463$. Neither Chauvenet nor Wright distinguished between $s$ and the population measure $\sigma$. For rejecting the observation 5.09 on the Chauvenet principle we need $|5.09 - m|/s$ to exceed 2.33. Since $(5.09 - 3.95)/0.463 = 2.46$, on this basis there would be some cause for concern about the value 5.09. The opposite conclusion is reached on Wright's principle, because 2.46 falls well short of 3.37!
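The arithmetic behind the two conflicting verdicts is short enough to check directly. The sketch below uses only the summary figures quoted above (m = 3.95, s = 0.463, the suspect value 5.09) and the two critical values 2.33 and 3.37. The individual yields of Figure 1.1 are not reproduced.

```python
# Summary figures quoted in the text for the 25 southern-edge grain yields (lb).
m = 3.95    # sample mean
s = 0.463   # estimated standard deviation, treated as if it were sigma
x = 5.09    # the suspicious grain yield

CHAUVENET_25 = 2.33  # Chauvenet critical value for n = 25, as quoted in the text
WRIGHT = 3.37        # Wright's multiple of the standard deviation

z = abs(x - m) / s
print(f"|x - m|/s = {z:.2f}")
print("Chauvenet rejects 5.09:", z > CHAUVENET_25)
print("Wright rejects 5.09:", z > WRIGHT)
```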


With the development during this century of more formal approaches to the statistical analysis of data, objectives have become clearer, principles
more rigorously defined, and a vast array of sophisticated methodology has been constructed. Practical situations are commonly represented in terms of different possible families of probability models, often characterized by some small number of parameters. General considerations of situation structure, past experience of similar circumstances, and mathematical tractability all combine to suggest one particular family of probability distributions which might reasonably be expected to represent the prevailing situation. Sample data may be analysed to assess the validity of the prescribed model, and to estimate or test hypotheses concerning relevant parameters. This greater sophistication in the design and use of statistical methods makes it no less important to be able to assess the integrity of a set of data. However, there is some tendency to give greater regard to the processing of data for parameter estimation or testing on the assumption that such and such a model applies than to investigating whether the data give added support to the general considerations which have promoted the model.

This is a somewhat dangerous principle. What is known to be a good statistical procedure for estimating the mean of a normal distribution may be most inefficient if the distribution is not normal. The actual data being analysed can sound a warning for us! Perhaps one or more observations look suspicious when the data are considered as a sample from a normal distribution. They may have been incorrectly recorded (or measured), of course, or they may be a genuine reflection of the basic impropriety of assuming an underlying normal distribution.

Clearly such considerations are vitally important for proper statistical practice. We need a battery of techniques for assessing the integrity of a set of data with respect to an assumed model. As a particular aspect of this we need methods for assessing, rejecting, or making allowances for, outlying observations. Such methods do exist, but they tend to appear in a scattered form throughout the statistical literature. The aim of this book is to bring them together and to present a unified discussion of ways of handling outliers in statistical data, in relation to the nature of the outliers and to the aims of the investigation.

At this stage we must make clear what we mean by an outlier. We shall define an outlier in a set of data to be an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data. The phrase 'appears to be inconsistent' is crucial. It is a matter of subjective judgement on the part of the observer whether or not he picks out some observation (or set of observations) for scrutiny. What really worries him is whether or not some observations are genuine members of the main population. If they are not, they may frustrate his attempts to draw inferences about that population. Any small number of spurious observations in the midst of the data may not be conspicuous in any case: they are perhaps unlikely to seriously distort the inference process. But what characterizes the 'outlier' is its impact on the observer (it appears extreme in some way). Should such observations be foreign to the main population they may, by their very nature, cause difficulties in the attempt to represent the population: they can grossly contaminate estimates (or tests) of parameters in some model for the population. Accordingly the outlier problem takes the following form. We examine the data set. We decide that outliers exist (in the sense described above). How should we react to them? What methods can be used to support rejecting the outlying observations, or adjusting their values, prior to processing the principal mass of data? Clearly, the answer depends on the form of the population; techniques will be conditioned by, and specific to, any postulated model for that population. Thus, methods for the processing of outliers take on an entirely relative form. It may be, of course, that we do not go beyond the rejection stage in some cases. Our interest rests in identifying foreign observations as matters of major concern: they indicate particular matters of practical interest.

One conceptual difficulty needs to be recognized at the outset. Opinion is divided on precisely when it is justifiable to scrutinize outliers. There is little dispute that it is reasonable when outliers exist in the form of errors of observation or mis-recording, that is, when they can be substantiated by practical considerations such as the sheer impossibility of a recorded value, or an obvious human error. It is sometimes claimed (as remarked above) that these are the only genuine 'outliers'. On this view, if no such tangible explanation can be found for apparently unreasonable observations, then their rejection, or accommodation by special treatment, is invalid. However, two factors lead us to reject this nihilistic attitude. In the first place, a variety of methods have been proposed for dealing with 'non-tangible' outliers. These are used by statisticians, and it seems desirable to present them in a classified manner for their better understanding and application. Secondly, and more fundamentally, the examination of an outlier must have propriety if viewed in relative terms. Suppose we think that a sample arises from a normal distribution but one observation seems intuitively unreasonable (it is an outlier), and an appropriate statistical test confirms its unacceptability. It seems to beg the question to say that the unreasonable observation should not have been regarded as an outlier, on the grounds that it would not have appeared unreasonable if we had had in mind, say, a log-normal distribution as a model for explaining the data. Be this as it may, the rogue observation did appear as an outlier relative to our original model, which presumably had some basis as an initial specification. Examination of the outlier allows a more appropriate model to be formulated, or enables us to assess any dangers that may arise from basing inferences on the normality assumption. This is very much the way in which outliers have been discussed in the statistical literature, and it seems a fruitful avenue of enquiry.

We shall consider in subsequent chapters the various methods available for dealing with outliers in different situations, including some of the difficulties that inevitably arise.
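A small numerical illustration shows that an outlier is judged relative to a model. The Python sketch below builds an idealized, hypothetical log-normal sample of 20 values: the exponentials of evenly spaced standard normal quantiles, with no real data involved. It then compares how extreme the largest value looks on the raw scale, as a normal model would view it, and on the log scale, as a log-normal model would. Both measures use the n − 1 sample standard deviation; this is an assumption of the sketch, not a procedure taken from the text.

```python
import math
from statistics import NormalDist, mean, stdev


def max_standardized_deviation(values):
    """Largest |x - mean| / s in the sample, using the n - 1 sample SD."""
    m = mean(values)
    s = stdev(values)
    return max(abs(v - m) / s for v in values)


# Hypothetical 'perfectly log-normal' sample: exp of normal quantiles at (i - 0.5)/n.
n = 20
log_values = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
sample = [math.exp(q) for q in log_values]

raw_scale = max_standardized_deviation(sample)      # as seen under a normal model
log_scale = max_standardized_deviation(log_values)  # as seen under a log-normal model

print(f"raw scale: {raw_scale:.2f} SDs from the mean")
print(f"log scale: {log_scale:.2f} SDs from the mean")
```

On the raw scale the largest value lies roughly 3.3 standard deviations from the mean; on the log scale it lies about 2.0. The same observation can look discordant under one model and unremarkable under another.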

In the following sections of this chapter some practical examples are discussed briefly, to illustrate the ways in which the outlier problem may present itself. These also serve to motivate the different forms of statistical analysis considered in detail throughout the book.

1.1 HUMAN ERROR AND IGNORANCE

There is a class of situations where outliers are readily handled, where the manner of dealing with them is obvious and non-controversial. Such is the situation when human errors lead to blatantly incorrect recording of data, or where lack of regard to practical factors results in serious misinterpretation.

In a study of low-temperature probabilities throughout the winter months in the British Isles, Barnett and Lewis (1967) analysed extensive data on hourly temperatures over several years and various geographical sites. In the main, temperatures were recorded in degrees Fahrenheit. Among extensive data for Wick in northern Scotland, the following hourly temperatures were found for the late evening of 31 December 1960 and early morning of 1 January 1961:

43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41
                            ↑ midnight

The values 58, 58 for midnight and 1.00 a.m. stand out in severe contrast to the others in this time-series section. They initially give one grounds for concern as to whether or not they are genuine: it seemed very warm at midnight for New Year in northern Scotland. On further enquiry, however, they were found to be perfectly reasonable! At midnight the Meteorological Office changed its recording unit from degrees Fahrenheit to tenths of a degree Celsius (1/10 °C), so that in degrees Fahrenheit the values appear as follows (to the nearest degree):

43, 43, 41, 41, 41, 42, 43, 42, 42, 39, 39
                            ↑ midnight

These are much more satisfactory; so much for the 'outliers' 58 and 58!

In his Presidential address to the Royal Statistical Society, Finney (1974) gives an interesting example of the way in which recording errors may appear as outliers in a set of data. He reports on measurements taken on the growth of poultry. For one bird, the weights (in kg) for successive weighings at regular intervals were shown as:

1.20, 1.60, 1.90, 1.55, 2.20, 2.25

From the manner in which the weights were determined and recorded, it was clearly possible to commit recording errors of 0.50 kg or 1.00 kg. It seems highly likely that the fourth reading is a mis-recording of what should have been, perhaps, 2.05 kg. This conclusion is supported by biological considerations, and by the overall pattern of results for a large sample of birds.

Whilst one cannot be one hundred per cent certain of the interpretation of 1.55 in this last example, some instances arise where recorded values are absolutely impossible. In a recent student exercise, records were kept of the numbers of times a six occurred in ten throws of ten dice. One student returned the results:

2, 0, 3, 12, 2, 0, 1, 1, 3

Clearly the value 12 cannot be genuine, since a throw of ten dice cannot show more than ten sixes. Furthermore, since ten observations were asked for, it would seem merely that a comma has been omitted in a sequence of numbers which should have read:

2, 0, 3, 1, 2, 2, 0, 1, 1, 3

These examples all illustrate the effect of non-statistical factors attributable, to a greater or lesser extent, to lack of care in the recording or presentation of data. Processing of outliers (or spurious values of any sort) in such cases is not a matter of statistical analysis, but of native wit! Correct action also raises no difficulties in most cases. In the examples which follow, however, we will be very much concerned with statistical factors which affect the occurrence and treatment of outliers.

1.2 OUTLIERS IN RELATION TO PROBABILITY MODELS

An interesting area of statistical enquiry is the manner in which foodstuffs and general household products are purchased. Table 1.1 shows a frequency distribution for the number of packets of a particular brand and packet size of breakfast cereal (Chatfield, 1974) bought over a given period of time. One aspect which stands out in these data is the presence of two single instances of purchases of 39 and 52 packets, which seem somewhat out of line with the other observations. How should we react to these outliers? There are various possibilities.

Table 1.1 Frequency distribution for numbers of packets of cereal purchased over 13 weeks by 2000 customers

No. of packets  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Frequency       1149 199 129 87 71 43 49 46 44 24 45 22 23 33 8 2 7 2 3

No. of packets  19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Frequency       2 0 0 1 0 1 3 2 0 1 1 0 0 0 0 0 0

No. of packets  37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 ≥53
Frequency       0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
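Table 1.1 covers a 13-week period, so a total that is an exact multiple of 13 corresponds to a whole number of packets every week. The short Python check below applies this to the isolated totals 39 and 52. It also applies it to 13 and 26, which show small 'blips' in the frequency distribution. The function name is illustrative only.

```python
WEEKS = 13  # length of the purchase period covered by Table 1.1


def regular_weekly_rate(total_packets: int, weeks: int = WEEKS):
    """Packets per week if the total is an exact positive multiple of the period, else None."""
    if total_packets > 0 and total_packets % weeks == 0:
        return total_packets // weeks
    return None


for total in (13, 26, 39, 52, 20):
    print(total, regular_weekly_rate(total))
```

These weekly rates of 1 to 4 packets would fit a pattern of regular, automatic purchasing. A multiple of 13 can of course also arise by chance, so the check only suggests where to look.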

(1) We might be interested in identifying out-of-the-ordinary patterns of purchasing behaviour, perhaps due to institutional rather than personal shopping, or indicating hoarding of products in times of potential shortage or expected price rises. The observations 39 and 52 (possibly others) then become of prime interest. We might try to fit some probability distribution to represent the majority of the data, relating to 'reasonable private purchase', and then attempt to consider the outliers relative to that distribution. An appropriate method might lead us to conclude that the observations 39 and 52 are anomalous, and might cast doubt, in terms of poorness of fit, on some of the observations in the range 20-30. If our interest is in studying atypical purchasing behaviour, the identified individuals constitute a set of special importance in their own right. Follow-up enquiries for the 'outlying' individuals may reveal special attitudes or patterns of behaviour with a strong influence in sociological terms, or in terms of the choice of reasonable policies for holding stocks of the product. The outliers may be accumulated with others in alternative sets of data, to build up more comprehensive

compound Poisson distribution where λ varies from individual to individual is often warranted. Current work indeed favours the negative binomial model this policy promotes (Chatfield, Ehrenberg, and Goodhardt, 1966). Whatever our conclusion, its basis rests on assessing values which are unrepresentative of the original model.

(3) Again we must be on the look-out for non-statistical factors influencing the occurrence of outliers. In Table 1.1 we note that the data refer to numbers of purchases over a thirteen-week period. Might it be more than coincidence that the two outliers, 52 and 39, are both multiples of 13? There seem to be further 'blips' in the frequency distribution at 26 (3 customers) and 13 (33 customers). The outliers call our attention to another possible ingredient in the purchasing model, corresponding to automatic regular purchases of 4, 3, 2, or 1 packets per week. A better model might be one with three components reflecting

(i) lack of interest in the product,
(ii) a distribution of regular purchases of l, 2, ... packets per week, and
information on this special group. (iii) a Poisson (or mixed Poisson) process pattern of casual purchases.
(2) If, in contrast with (1), prime interest rests on the overall pattern of
In (2) above we remarked on the way in which outliers may inftuence the
purchasing behaviour, any outliers play a subsidiary role. It may be that
propriety of different methods of estimating parameters in the basic model.
generai considerations suggest a possible model for the distribution of
Let us consider a more specific example. Suppose the following random
purchases, and that certain observations which appear as outliers merely
observations were obtained for some variable of interest:
cloud the issue; they arise for purely technical reasons unimportant in the
nature ~f our enquiry. Their detection and rejection then aid the study of 1.74, 1.46, -1.28, -0.02, -0.40, 0.02, 3.89, 1.35, -0.10, 1.71
the basic model. We can better assess its fit; we may be better able to
We wish to estimate the 'centre' of the parent population. Initial considera-
estimate relevant parameters. One or two extreme, unrepresentative, values
tions suggest that the population may be normal, N( 8, l), so the sample
can ser~ousl.y distort the fitting or estimation process. For example, as a first mean would clearly be a sensible form of estimator. But the value 3.89
approx1matton we might set up a model in the form of a modified Poisson
makes us suspicious of the N( 8, l) assumption! In fact, these data were
distribution, in which the population is di'vided into two groups (non-
generated as a random sample from a Cauchy distribution, with probability
purchasers and potential purchasers) where potential purchasers buy packets
of cereal according to a Poisson process of rate A. If 8 is the proportion of density function
non-purchasers, we would have a probability distribution for X, the number
?f packe~s purchased over the observation period of length T, with probabil-
Ity funct1on
The sample mean here is not even consistent, let alone of reasonable
p(O) = 8 + (1- 8)e-A.T efficiency, and we should have made very poor use of our data in the
p( x)= (l- 8)(AT)xe-A.T/x! x= l, 2, ... estimation procedure had we used it as an estimator of location.
Observations far removed from the main body of the sample arise
If this model is reasonable, estimation of 8 and A provides useful informa- naturally in sampling from a Cauchy distribution, and this contrasts with the
tion about purchas~ng behaviour. Outlying observations can seriously affect common situation where the presence of an outlier suggests the possible
the fit, and the esumates, and need to be carefully examined. If the model inappropriateness of a model. A similar phenomenon occurs not infre-
fits well to ~ll the data, its adoption is reinforced. If it fits well apart from quently in biologica! contexts. For example, the distribution of the number
the observat10n 52 (and perhaps 39), we must decide whether the outlier 52 of cones on a fir tree for trees in a given area of forest, or the distribution of
(and 39) is for som_e reason spurious, and should be omitted in further study, the number of lepidoptera of the same species present and observed in a
or whether we mtght need a more sophisticated model. For example, a particular location, are both characterized by high skewness. A typical
sample from this latter type of distribution is given below; it refers to the number of individuals of a given species in a random sample of nocturnal Macrolepidoptera caught in a light-trap at Rothamsted (Fisher, Corbet and Williams, 1943):

11, 54, 5, 7, 4, 15, 560, 18, 120, 24, 3, 51, 3, 12, 84

Here we have a situation in which an outlying value (the value 560) is an inherent feature of the natural data pattern, and in no way anomalous.

1.3 OUTLIERS IN MORE STRUCTURED SITUATIONS

The examination of univariate samples for fitting models and estimating parameters, whilst an important part of statistical practice, has somewhat limited aims and utility. More often, and more usefully, we need to consider more structured situations. For example, an interest in the way in which observations of a variable of principal interest vary with values of other variables, or vary with time, leads to the study of regression models and time-series models, respectively. Or again, concern for the influence of different qualitative factors on the principal variable leads to additive models analysed by analysis of variance techniques. These various models and techniques have, of course, their counterparts in the study of multivariate data.

In all these more structured cases we must also expect to encounter, from time to time, unrepresentative data in the form of outliers. Here it is just as important as in the simple univariate sample to be able to interpret and accommodate outliers by using appropriately designed statistical techniques. Outliers may, as before, be of intrinsic interest in their own right, or may be indicative of inappropriate specifications of the error structure, or of the basic model, with consequential implications for the use of appropriate inference procedures. With more structured data two complications arise: suspicious observations tend to be less intuitively apparent, more hidden, in the data mass, and formal methods for their rejection or their accommodation are less highly developed.

The outlier problem is a complex one. As soon as one starts thinking of an apparently simple formulation such as the above-quoted one by Ferguson, the ramifications begin to appear. Ferguson speaks of identifying outliers by perceiving them to be 'surprisingly far away from the main group'. But what is surprising? From the examples we have already considered it is clear that treatment of outliers depends in an essential way on the assumed underlying distribution. Again, in what way does the person making the judgement assess the relationship of the outlying values to the main group? Does he do it by simple inspection? This is the situation that comes first to mind and has been illustrated. But the outlier may be a value in a designed experiment: one of the observed responses, say, in a Latin-square experiment. A simple visual inspection of the table of responses will not reveal that it is an outlier. The evidence for this only comes to light when the parameters of the model are fitted and the deviations of the observed responses from the fitted values are tabulated. Another situation of this kind arises in the context of outlying values in a regression analysis. There is a strong body of opinion nowadays which advises that examination of residuals should be carried out as an essential part of any regression analysis. In this case also we may be concerned with outliers: outlying residuals. Why is this different from Ferguson's outlying values in univariate distributions? For one thing, the residuals are not independent; this may make outliers difficult to judge and also complicates the methodology.

At this stage, it is useful to consider one or two practical examples of the existence of outliers in regression, time-series, and other problems. No attempt will be made here to discuss implications in any detail (but see Chapter 7).

(a) Regression models

The linear regression of one random variable, Y, on a controlled variable, X (or conditional on the observed values of a random variable, X), is a model widely used for an initial study of the way in which Y varies with X. Some data by Cruikshank, reported by Quenouille (1953), show measurements (y_i) of the coronary flow in a cat at twelve different times, with the associated auricular pressure values (x_i). The scatter diagram of the results is given in Figure 1.2.

Rahman (1972, p. 174) suggests fitting a linear regression of coronary flow on auricular pressure. Viewing such a proposal uncritically we are nonetheless immediately struck by the apparent linearity of the relationship revealed by the data in the absence of the observations A, B, C, and D. In their presence the linear model has less obvious support, but the outliers have a strange consistency. They all relate to the same auricular pressure reading of 6.8, and were apparently the last four observations taken (in order: D last). Are they for some reason of quite a different basic nature to the others; was the cat dead? There may well be strong grounds here for omitting the outliers A, B, C, and D from any analysis of the relationship between coronary flow and auricular pressure.

(b) Time-series

Time-series data are widely studied in commercial, industrial, meteorological, and sociological processes. Outliers can again arise and cause difficulties. Consider the following examples.

Chatfield (1975) presents some data on numbers of a particular product which are sold each quarter over a six-year period. These data are shown in Figure 1.3. Discussing the idea of outliers in the time-series data, Chatfield
suggests that there is reason to doubt the observation A. His reasoning differs from our earlier examples, where some sort of 'extremeness' was the key to examining an outlying observation. Here it is the break in the 'pattern' of results which makes him suspect A. In previous years a relatively low sales figure in the first quarter was followed by two intermediate values and a relatively high value. In the last year, the second quarter figure (A) breaks this pattern. Perhaps economic or accounting factors have produced a spurious result, possibly compensated for by the opposingly atypical value, B. Alternatively, the results for the final year may indicate a radical change in the cyclic pattern of sales over the year.

[Figure 1.2: Auricular pressure (x) and coronary flow (y) for a cat; scatter diagram with the points A, B, C and D labelled]

[Figure 1.4: Moisture content of Malaysian tobacco continuously monitored over eight hours; trace against time (hours) with the features A, B and C labelled]
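Chatfield's reasoning above turns on pattern rather than extremeness: in earlier years the first quarter was lowest, the fourth highest, and the middle two intermediate. A minimal Python sketch of that check follows. The function names and, importantly, the quarterly figures are hypothetical, invented purely for illustration; Chatfield's actual sales are shown only graphically in Figure 1.3.

```python
# Sketch of a "break in pattern" check for quarterly data.
# The sales figures below are INVENTED for illustration; they are not
# Chatfield's data, which appear only graphically in Figure 1.3.

def follows_pattern(year):
    """True if Q1 is lowest, Q4 highest, and Q2, Q3 lie strictly between."""
    q1, q2, q3, q4 = year
    return q1 < min(q2, q3) and max(q2, q3) < q4


def pattern_breaks(quarterly):
    """Return the 0-based indices of years that break the usual pattern.

    `quarterly` must hold complete years, i.e. its length is a multiple of 4.
    """
    if len(quarterly) % 4 != 0:
        raise ValueError("expected whole years of quarterly data")
    years = [quarterly[i:i + 4] for i in range(0, len(quarterly), 4)]
    return [k for k, year in enumerate(years) if not follows_pattern(year)]


hypothetical_sales = [
    1900, 2300, 2400, 2800,
    2000, 2350, 2450, 2900,
    2050, 2400, 2500, 2950,
    2100, 2450, 2550, 3000,
    2150, 2500, 2600, 3050,
    2200, 3150, 2150, 3100,  # final year: Q2 atypically high, Q3 atypically low
]

print(pattern_breaks(hypothetical_sales))
```

With these made-up figures only the final year (index 5) is flagged, although none of its values is far from the overall range of the series; it is the internal ordering of the year, not extremeness, that triggers the check.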
In Figure 1.4 we see (Fenton, 1975) a continuous trace of measured moisture content of Malaysian tobacco, being automatically monitored as it flows past the recording equipment on a conveyor belt at a particular stage of the curing process. The equipment is known to suffer from occasional electronic 'hiccoughs' which appear as sharp spikes in the trace. An experienced observer comments that 'A and B are clearly outliers' in this sense, whilst 'C is unlikely to be an outlier'.
A final example on possible outliers in time-series data is seen in the data of Table 1.2 on the percentages of road accidents each month which result in death, for the 10 years 1960-1970 in the British Isles (Chedzoy, 1973).
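The summary rows of Table 1.2 can be checked against the table itself. Chedzoy's screening, described below, omits in each month the single circled observation furthest from the monthly average; for January and February the omitted value is the 1963 one. A minimal Python sketch, assuming the sample standard deviation with divisor n - 1 (the choice that reproduces the printed figures), recomputes the 'Mean of 9' and 'St. dev. of 9' entries for those two months from the nine remaining values. The dictionary name `nine_values` is ours, for illustration only.

```python
from statistics import mean, stdev

# Non-circled monthly percentages from Table 1.2, years 1961-1970 with the
# circled 1963 value omitted, i.e. the nine values behind the "of 9" rows.
nine_values = {
    "Jan": [2.31, 2.30, 2.22, 2.16, 2.14, 2.18, 2.08, 2.22, 2.29],
    "Feb": [1.94, 2.10, 2.25, 2.24, 2.27, 2.30, 2.05, 1.89, 2.10],
}

for month, values in nine_values.items():
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divisor n - 1)
    print(f"{month}: mean of 9 = {m:.2f}, st. dev. of 9 = {s:.3f}")
```

This gives 2.21 and 0.079 for January and 2.13 and 0.149 for February, matching the table. Using the divisor n instead would give about 0.075 for January, so the printed rows appear to use n - 1.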
[Figure 1.3: Sales figures for a product over consecutive quarters for a period of six years (reproduced by permission of C. Chatfield and Chapman & Hall); horizontal axis: quarter]

As a simple screening procedure for the data, which takes some account of the inevitable seasonal nature of variability in such data, Chedzoy has indicated (○) those observations, month by month, which are furthest from the monthly average. The standard deviations of the monthly percentages seem to be substantially reduced by omitting these extremes (see the last two rows of Table 1.2). They must be reduced, of course; what is relevant is whether or not they are reduced to a significantly greater extent than would be expected by chance! It is interesting to note the particular observations picked out by this process. January and February 1963 were months of particularly severe weather. We might expect an excess of minor accidents: a reduction in the death rate. Perhaps November and December 1966 ('Black
Christmas') and September and October 1969 also had tangible explanations for their unreasonably high death rates: clement weather, with unexpected fogs on motorways!

(c) Designed experiments, multivariate analyses

Here, also, we may expect to encounter outliers, with interpretations both deterministic (arising from tangible, non-statistical, sources) and probabilistic (causing us to question distributional or structural assumptions). A taxonomist might well be confronted with a problem of classification of an individual on which he has taken a vector of measurements, no single one of which is 'surprising' in relation to its own marginal distribution, yet the assemblage of which as a multivariate observation is in some sense 'surprisingly far away from the main group'. Limited study has been made of ways of handling outliers in these situations, but some methods have been proposed and will be discussed later. See Chapters 6 and 7 on multivariate samples and designed experiments, respectively.

Table 1.2 Proportions of road accidents, month by month, resulting in death for the 10 years 1960-1970 in the British Isles

                  Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.  Year
1961              2.31  1.94  ○     1.85  ○     ○     1.82  ○     1.91  2.09  2.13  2.25  1.975
1962              2.30  2.10  2.09  1.77  1.68  1.79  1.78  1.83  1.88  2.12  2.13  2.22  1.963
1963              ○     ○     1.99  1.86  1.77  1.79  1.78  1.90  1.89  2.11  2.24  2.38  1.943
1964              2.22  2.25  2.09  1.75  1.78  1.87  1.68  1.92  1.92  2.15  2.27  2.58  2.029
1965              2.16  2.24  1.91  ○     1.81  1.89  1.70  1.86  1.97  2.06  2.10  2.31  1.998
1966              2.14  2.27  1.87  1.80  1.76  1.77  1.93  1.90  2.03  1.97  ○     ○     2.035
1967              2.18  2.30  2.00  1.80  1.64  1.79  1.91  1.81  2.10  1.92  2.21  2.20  1.978
1968              2.08  2.05  2.00  1.68  1.79  1.77  1.82  1.93  1.99  2.00  2.14  2.17  1.950
1969              2.22  1.89  2.02  1.98  1.80  1.84  ○     1.95  ○     ○     2.22  2.47  2.090
1970              2.29  2.10  1.86  1.93  1.84  1.81  1.97  2.10  1.93  2.24  2.30  2.32  2.064
Mean of 10 values 2.16  2.09  2.01  1.85  1.78  1.80  1.85  1.89  1.99  2.09  2.21  2.35  2.003
Mean of 9 values  2.21  2.13  1.98  1.82  1.76  1.81  1.82  1.91  1.96  2.07  2.19  2.32  1.993
St. dev. of 10    0.175 0.188 0.115 0.119 0.087 0.053 0.131 0.115 0.110 0.116 0.089 0.156 0.050
St. dev. of 9     0.079 0.149 0.085 0.092 0.064 0.044 0.100 0.085 0.072 0.098 0.071 0.135 0.042

○ circled observation (in each month, the value furthest from the monthly average), omitted from the 'of 9' rows

1.4 BAYESIAN AND NON-PARAMETRIC METHODS

During recent years there has been much interest in, and application of, Bayesian and decision-theoretic methods of statistical analysis. These differ from the more traditional methods both in terms of their interpretation of basic concepts and aims, and in their use of extra forms of relevant information (prior probabilities and consequential costs, additional to sample data). The sample data components of the information used in such approaches may well contain outlying observations. Again we must know how to deal with them. The more modern methods of statistical analysis affect the outlier problem in two ways. On the one hand we might ask how a Bayesian, or decision-theoretic, analysis should try to cope with outliers. On the other hand we might ask if such forms of analysis can be used for the detection and processing of outliers. There are some conceptual difficulties here. A basic tenet of the Bayesian approach is that inferences, or decisions, are strictly conditional on the actual sample data that have been obtained. In the main, it is contrary to the Bayesian tradition to view the data within a framework of alternative sets of data which might have arisen. Thus the probability mechanism for generating the data is seen to be irrelevant, and it might appear to be similarly irrelevant to question the integrity of the realized data: in particular, to examine the implications of outlying observations. However, Bayesian methods revolve around the likelihood function, which needs to be specified in any particular case. The likelihood does depend on an assumed probability model, and in turn it would seem that, relative to that model, certain observations might be outliers. Can the Bayesian approach afford, on principle, to ignore what these outliers might imply about incorrect specification of the likelihood?

One aspect of the Bayesian approach is its formalization of subjective impressions as an ingredient of statistical analysis. We have remarked above how subjective factors arise in judging whether or not outliers exist in a set of data. In the example on weights of birds in Section 1.1, for instance, the reading 1.55 looked suspicious and could well represent a recording error for 2.05. In the time-series example on quarterly sales in Section 1.3, the pattern of sales over the last year appeared inconsistent with earlier experience and made us question the values A and B. In both these cases subjective judgement was involved. Might it not be better to use Bayesian methods directly in trying to reach a conclusion about the outliers, or in taking them into account in processing the data? For example, we might attempt to assign prior probabilities to different possible explanations of the
of handling outliers in these situations, but some methods have been proposed and will be discussed later. See Chapters 6 and 7 on multivariate samples and designed experiments, respectively.

1.4 BAYESIAN AND NON-PARAMETRIC METHODS

During recent years there has been much interest in, and application of, Bayesian and decision-theoretic methods of statistical analysis. These differ from the more traditional methods both in terms of their interpretation of basic concepts and aims, and in their use of extra forms of relevant information (prior probabilities and consequential costs, additional to sample data). The sample data components of the information used in such approaches may well contain outlying observations. Again we must know how to deal with them. The more modern methods of statistical analysis affect the outlier problem in two ways. On the one hand we might ask how a Bayesian, or decision-theoretic, analysis should try to cope with outliers. On the other hand we might ask if such forms of analysis can be used for the detection and processing of outliers. There are some conceptual difficulties here. A basic tenet of the Bayesian approach is that inferences, or decisions, are strictly conditional on the actual sample data that have been obtained. In the main, it is contrary to the Bayesian tradition to view the data within a framework of alternative sets of data which might have arisen. Thus the probability mechanism for generating the data is seen to be irrelevant, and it might appear to be similarly irrelevant to question the integrity of the realized data, in particular to examine the implications of outlying observations. However, Bayesian methods revolve around the likelihood function, which needs to be specified in any particular case. The likelihood does depend on an assumed probability model, and in turn it would seem that, relative to that model, certain observations might be outliers. Can the Bayesian approach afford, on principle, to ignore what these outliers might imply about incorrect specification of the likelihood?
One aspect of the Bayesian approach is its formalization of subjective impressions as an ingredient of statistical analysis. We have remarked above how subjective factors arise in judging whether or not outliers exist in a set of data. In the example on weights of birds in Section 1.1, for instance, the reading 1.55 looked suspicious and could well represent a recording error for 2.05. In the time-series example on quarterly sales in Section 1.3, the pattern of sales over the last year appeared inconsistent with earlier experience and made us question the values A and B. In both these cases subjective judgement was involved. Might it not be better to use Bayesian methods directly in trying to reach a conclusion about the outliers, or in taking them into account in processing the data? For example, we might attempt to assign prior probabilities to different possible explanations of the
outliers. Some efforts in this direction have been made and will be considered in Chapter 8.
Another avenue for statistical analysis is in the use of non-parametric methods. We shall need to consider whether the notion of an outlier makes any sense in the context of standard non-parametric inference procedures. At first sight the relative nature of outliers (on page 5 we defined outliers in relation to a provisional probability model) seems to deny such a prospect. However, in non-parametric tests of location or dispersion for two samples we are essentially asking if one sample is an outlier relative to the other. Here, of course, observations are 'labelled' as belonging to one or the other sample. Such labelling is crucial: it enables us still to operate within a 'relative' framework and to extend the outlier concept to the non-parametric area. The discussion of slippage tests in Chapter 5 is part of such an extended view. It also seems sensible to enquire whether we could use a non-parametric approach for the analysis of outliers where a probability model has been prescribed.

1.5 SURVEY OF OUTLIER PROBLEMS

The informal discussion of ideas and examples in the earlier parts of this chapter enables us to draw up a broad classification of the types of enquiry we need to make in the study of outliers. No single-factor classification will do, since we must consider the distinctions
(i) between deterministic and statistical causes of outliers,
(ii) between univariate and multivariate data sets,
(iii) between different specific probability models,
(iv) between different forms of statistical analysis in which the outliers are encountered,
(v) between single or multiple outliers (including outlying samples and sets of samples), and
(vi) most fundamentally, between the different aims and purposes we may have in studying outliers.
Such a complex array of considerations does not lead to a particularly tidy subdivision of topics, but the arrangement of the succeeding chapters and their subsections has been chosen in recognition of such distinctions, in what seems to be a fairly natural progression from the simpler to the more complex considerations. Our main object has been to present a fairly full review of existing methodology in the treatment of outliers. We have aimed throughout both to provide sufficient theoretical detail to meet the interests of the mathematical statistician and to give a full enough description of practical methods to meet the needs of the data analyst. Application of the methods is of paramount importance, and we have tried to present relevant illustration (within the limitations of the scale of the book).
We shall start with a general discussion of the different aims and purposes in studying outliers and proceed to what is perhaps the most elementary aspect of this, the examination of outliers in single univariate samples from a given probability distribution. Special attention is given to the effect of outliers in estimation procedures. Outlying sub-samples are then considered; also multivariate data. We progress to more structured models such as regression, designed experiments, and time-series. After some discussion of Bayesian and non-parametric procedures, we conclude with comment on the future state of the art.
CHAPTER 2

What should one do about outlying observations?

2.1 EARLY INFORMAL APPROACHES

The existence of the problem of doubtful or anomalous values has been recognized for a very long time, certainly since the middle of the eighteenth century. Daniel Bernoulli, writing in 1777 about the combination of astronomical observations, said:
is it right to hold that the several observations are of the same weight or moment, or equally prone to any and every error? ... Is there everywhere the same probability? Such an assertion would be quite absurd, which is undoubtedly the reason why astronomers prefer to reject completely observations which they judge to be too wide of the truth, while retaining the rest and, indeed, assigning to them the same reliability. ... I see no way of drawing a dividing line between those that are to be utterly rejected and those that are to be wholly retained; it may even happen that the rejected observation is the one that would have supplied the best correction to the others. Nevertheless, I do not condemn in every case the principle of rejecting one or other of the observations, indeed I approve it, whenever in the course of observation an accident occurs which in itself raises an immediate scruple in the mind of the observer, before he has considered the event and compared it with the other observations. If there is no such reason for dissatisfaction I think each and every observation should be admitted whatever its quality, as long as the observer is conscious that he has taken every care. (Allen, 1961)
To take an even earlier example, Boscovich, attempting in 1755 to determine the ellipticity of the earth by averaging ten measurements of excess of the polar degree over the equatorial, decided to discard the two extreme values of excess as outliers and recomputed the mean from the reduced sample of eight, see Maire and Boscovich (1755). From this period until the middle of the nineteenth century, the main point of discussion in the literature with regard to outlying values is whether rejection is justified. Some writers took the same view as Daniel Bernoulli, that observations should not be rejected purely on grounds of appearing inconsistent with the remaining data; Bessel and Baeyer, for example, wrote in 1838 that
they had never rejected an observation merely because of its large residual, and that all completed observations, with equal weight, ought to be allowed to contribute to the result. Others, such as Boscovich, practised rejection. However, rejection was never envisaged in these early days as being carried out according to any formal procedure, but was purely a matter of the observer's judgment. Legendre, for example, in 1805 was recommending the rejection of deviations 'adjudged too large to be admissible'. Indeed a century later Saunder could write (1903):
I believe that the practice amongst computers of experience is to rely almost entirely on their individual judgment, taking into account the conditions of the observations, and drawing the line somewhere about those observations which give residuals of five times the probable error.
The first published objective test for anomalous observations was due to the American astronomer Peirce (1852). In Peirce's procedure, k doubtful observations in a sample of n should be rejected if
the probability of the system of errors obtained by retaining them is less than that of the system of errors obtained by their rejection multiplied by the probability of making so many, and no more, abnormal observations.
This last probability Peirce took to be $p^k(1-p)^{n-k}$, where he began by defining p as 'the probability, supposed to be unknown, of such an abnormal observation that it is rejected upon account of its magnitude', and then assigned it the value $k/n$.
This bizarre test was followed in 1863 by the publication of the test for a single doubtful observation (already referred to in Chapter 1) by the American astronomer Chauvenet. Chauvenet's test has an attractive simplicity lacking in Peirce's; despite its now evident shortcomings, it persists in print to the present day, at any rate in a number of textbooks for students in engineering and the experimental sciences. The reasoning was as follows (see Stone, 1868). If $\theta(x)$ denotes the probability that an error is equal to or greater than x,
then the number of errors equal to or greater than x which may fairly be expected in n observations is $n\theta(x)$. If therefore we find x such that $n\theta(x) = \tfrac{1}{2}$, any error greater than x will have a greater probability against it than for it, and may, therefore, be rejected.
In effect, an observation is to be rejected if it lies outside the lower and upper $1/(4n)$ points of the null distribution. Evidently with this procedure the chance of wrongly rejecting a non-discordant value is in a large sample approximately $1 - e^{-1/2}$, i.e. about 40 per cent!
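The arithmetic behind the 40 per cent figure can be sketched as follows. We assume normal errors with known mean and standard deviation, so that $\theta(x)$ is a fixed two-sided tail probability. Each of n null observations falls outside the lower and upper $1/(4n)$ points with probability $1/(2n)$. The chance that at least one is wrongly rejected is therefore $1 - (1 - 1/(2n))^n$, which tends to $1 - e^{-1/2} \approx 0.393$. The function names are ours, chosen for illustration.

```python
import math
from statistics import NormalDist

def chauvenet_cutoff(n: int) -> float:
    """Upper 1/(4n) point of the standard normal, so P(|Z| > z) = 1/(2n)."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (4 * n))

def chauvenet_reject(xs, mu: float, sigma: float):
    """Flag values outside the lower and upper 1/(4n) points.

    Assumes mu and sigma are the KNOWN null mean and standard deviation.
    """
    z = chauvenet_cutoff(len(xs))
    return [abs(x - mu) / sigma > z for x in xs]

def false_rejection_chance(n: int) -> float:
    """P(at least one of n independent null values is rejected)."""
    return 1.0 - (1.0 - 1.0 / (2 * n)) ** n

for n in (10, 100, 10_000):
    print(n, round(chauvenet_cutoff(n), 3), round(false_rejection_chance(n), 4))
print("large-sample limit:", round(1.0 - math.exp(-0.5), 4))
```

For n = 10, for example, the cutoff is about 1.96 standard deviations. The chance of at least one wrongful rejection is then $1 - 0.95^{10} \approx 0.40$.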
Soon after Chauvenet, Stone (1868) introduced a rejection test based on the concept of a modulus of carelessness, m. This concept can be expressed in the following way: a given observer in a given sampling situation makes on average one mistake in every m observations he takes. An observation is to be discarded if its deviation can be attributed with more probability to the
observer's carelessness than to random variation. This means, in effect, that an observation is rejected if it lies outside the lower and upper $1/(2m)$ points of the null distribution, so the test is essentially similar to Chauvenet's, becoming identical with it if $m = 2n$.
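Stone's criterion can be put in the same terms. Reject when the two-sided tail probability of the standardized deviation is smaller than $1/m$, the chance of a careless mistake. The sketch below again assumes normal errors with known scale, and its helper names are hypothetical. It checks that with $m = 2n$ the cutoff coincides with Chauvenet's upper $1/(4n)$ point.

```python
from statistics import NormalDist

def two_sided_tail(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal error Z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def stone_reject(z: float, m: float) -> bool:
    """Reject a standardized deviation z when random variation is less
    probable than a careless mistake (one in m), i.e. |z| lies beyond the
    lower and upper 1/(2m) points."""
    return two_sided_tail(z) < 1.0 / m

def stone_cutoff(m: float) -> float:
    """Upper 1/(2m) point of the standard normal."""
    return NormalDist().inv_cdf(1.0 - 1.0 / (2 * m))

n = 10
# With m = 2n, Stone's cutoff equals Chauvenet's upper 1/(4n) point.
print(stone_cutoff(2 * n), NormalDist().inv_cdf(1.0 - 1.0 / (4 * n)))
```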
Alternatives to outright rejection of extreme values were also being considered by a number of writers. Within a few years of Stone's rejection test, several methods were published (one by Stone himself) for the weighting of observations in calculating a sample mean. This can be regarded as a robust procedure for estimating a location parameter which secures the accommodation of outliers. In Rider's words (1933),
Since the object of combining observations is to obtain the best possible estimate of the true value of a magnitude, the principle underlying ... [weighting] methods is that an observation which differs widely from the rest should be retained but assigned a smaller weight than the others in computing a weighted average. Of course retention with an exceedingly small weight amounts to virtual rejection.
Glaisher (1872-73) was perhaps the first to publish a weighting procedure, remarking, 'It will be seen that it supersedes the necessity for the rejection of anomalous observations'.
Glaisher's method was to assume that the n observations $x_i$ ($i = 1, \ldots, n$) were normally distributed, with a common mean $\mu$ required to be estimated, and with unknown and unequal variances $\sigma_i^2$. For a provisional estimate of the mean he allotted what he regarded as plausible values for the $\sigma_i^2$. These in turn led to a modified estimate of the mean, new estimates of the $\sigma_i^2$, and so on. Specifically, successive weighted means $m^{(1)}, m^{(2)}, \ldots$ were calculated, $m^{(1)}$ being the ordinary sample mean $\bar{x}$, and the weights $w_{ri}$ for the $r$th weighted mean $m^{(r)} = \sum_i w_{ri} x_i$ being defined recursively by
$$w_{ri} \propto \exp\{-2(w_{r-1,1} + \cdots + w_{r-1,n})(x_i - m^{(r-1)})^2\}. \qquad (2.1.1)$$
Stone (1873) followed a few months later with a criticism of Glaisher's method and a proposal for an alternative weighting procedure, based in effect on maximizing the likelihood, which is proportional to
$$\prod_i \left[\frac{1}{\sigma_i}\exp\left\{-\frac{(x_i - \mu)^2}{2\sigma_i^2}\right\}\right], \qquad (2.1.2)$$
with respect to $\mu$ and all the $\sigma_i$. This leads to a weighted mean $\hat{\mu}$ given by the $(n-1)$th degree equation
$$\sum_{i=1}^{n} (x_i - \hat{\mu})^{-1} = 0. \qquad (2.1.3)$$
The same method was published independently by Edgeworth (1883) ten years later, though Edgeworth subsequently (1887) acknowledged Stone's priority.
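Equation (2.1.3) is straightforward to solve numerically. Between any two adjacent distinct observations the left-hand side increases steadily from $-\infty$ to $+\infty$. With distinct values there is therefore exactly one root in each of the $n - 1$ gaps, matching the degree of the equation. Each root is a weighted mean in the sense of the text. Writing (2.1.3) as $\sum_i w_i (x_i - \hat{\mu}) = 0$ with $w_i = (x_i - \hat{\mu})^{-2}$ gives $\hat{\mu} = \sum_i w_i x_i / \sum_i w_i$. The sketch below finds all the roots by bisection. The passage does not say which root is taken as $\hat{\mu}$, so the code leaves that choice open. The function names are hypothetical.

```python
def stone_edgeworth_roots(xs, max_iter=200):
    """All roots of sum_i 1/(x_i - mu) = 0, equation (2.1.3), by bisection."""
    pts = sorted(set(xs))

    def f(mu):
        return sum(1.0 / (x - mu) for x in xs)

    roots = []
    for a, b in zip(pts, pts[1:]):
        # f rises from -inf just right of a to +inf just left of b,
        # so the gap (a, b) holds exactly one root.
        lo, hi = a, b
        for _ in range(max_iter):
            mid = 0.5 * (lo + hi)
            if mid <= lo or mid >= hi:
                break  # the interval cannot be halved any further
            if f(mid) < 0.0:
                lo = mid  # the root lies to the right of mid
            else:
                hi = mid
        roots.append(0.5 * (lo + hi))
    return roots

def weights_at(xs, mu):
    """Weights w_i = (x_i - mu)**-2 that make a root mu a weighted mean."""
    return [1.0 / (x - mu) ** 2 for x in xs]

xs = [0.0, 1.0, 3.0]
for mu in stone_edgeworth_roots(xs):
    w = weights_at(xs, mu)
    print(mu, sum(wi * x for wi, x in zip(w, xs)) / sum(w))
```

For x = (0, 1, 3), (2.1.3) reduces to $3\hat{\mu}^2 - 8\hat{\mu} + 3 = 0$. Its roots, $(8 \pm \sqrt{28})/6 \approx 0.451$ and $2.215$, are the values the bisection returns.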
Another weighting method proposed at this time was by Newcomb (1886); see Stigler (1973b). His procedure assumes the n observations to have come from a mixture of r normal distributions, and evolves a final estimate of $\mu$ which is constructed as a weighted mean of rn different weighted means of the $x_i$. Rather interestingly, Newcomb refers in his paper to the 'evil' of a value; this turns out to be the mean squared error of an estimate, and is an interesting early use of the concept of a loss function.
Apart from these and other similar references of the period to the accommodation of extreme values by weighting, it is interesting to find what we would now call trimming discussed in 1895 by Mendeleev, the discoverer of the periodic table of elements. Referring to the evaluation of the length of the Russian standard platinum-iridium metre from a set of eleven determinations, Mendeleev wrote:
I use ... [the following] method to evaluate the harmony of a series of observations that must give identical numbers, namely I divide all the numbers into three, if possible equal, groups (if the number of observations is not divisible by three, the greatest number is left in the middle group): those of greatest magnitude, those of medium magnitude, and those of smallest magnitude: the mean of the middle group is considered the most probable ... and if the mean of the remaining groups is close to it ... the observations are considered harmonious. (See Harter, 1974-1976, Part I)
These Returning to the history of ad hoc rejection tests such as Chauvenet's and
in turn led to a modified estimate of the me an, ne w estimates of the u;,
an d Stone's, the literature of the following fifty years until the period of the First
so on. Specifically, successive weighted means m( 1), m(z), ... were calculated, World War affords a number of further examples of such tests. In particular,
m(l) being the ordinary sample mean i, and the weights Wri for the rth
we may note Wright's procedure (1884), which rejects any observation
weighted mean m(r) =Li Wrixi being defined recursively by
deviating from the mean by more than three times the standard deviation, or
equivalently five times the probable error; the modified version by Wright
W,i ex: exp{-2(Wr-1,1 + ·. · + Wr-1,n)(Xi- m(r-1)) 2}. (2.1.1) and Hayford (1906), which adds to Wright's rule the further instruction:
Stone (1873) followed a few months later with a criticism of Glaisher's
method and a proposal for an alternative weighting procedure, based in Examine carefully each observation for which the residua! exceeds 3.5 times the
probable error, and reject it if any of the accompanying conditions are such as to
effect on maximizing the likelihood, which is proportional to produce lack of confidence

n[(1/uJexp -{(xi -1Lfl2uf}],


i
(2.1.2) and Goodwin's procedure (1913), which rejects an outlying observation in a
sample of n if its deviation from the mean of the remaining n- l exceeds
four times the average deviation of the n - l.
with respect to IL and all the ui. This leads to a weighted mean fi given by
the (n -l)th degree equation One notes that Wright and Goodwin chose, for the criticai ratios in their
tests, values 5, 3.5, 4, and so on, which were independent of the sample size
I
i=1
(xi- fi)-1 =O. (2.1.3)
n. This relates to one generai defect of ali the test procedures proposed up
to this time-they failed to distinguish between population variance and
sample variance.
The same method was published independently by Edgeworth (1883) ten Perhaps the first writer to make this point explicitly with regard to outlier
-years later, though Edgeworth subsequently (1887) acknowledged Stone's procedures was Irwin (1925), who pointed out the implications for outlier
priority. rejection of the unreliability of the sample standard deviation, s, as an
22 Outliers in statistical data    What should one do about outlying observations? 23

estimate of its population analogue, $\sigma$. For the case where $\sigma$ is known, he proposed the test statistics

$$(x_{(n)} - x_{(n-1)})/\sigma \quad \text{and} \quad (x_{(n-1)} - x_{(n-2)})/\sigma$$

where $x_{(i)}$ denotes the $i$th ordered sample value. Ten years later an exact test based on a studentized criterion, $(x - \bar{x})/s$, was published by Thompson (1935). This was shortly followed by the classic paper by Pearson and Chandra Sekar (1936) entitled 'The efficiency of statistical tools and a criterion for the rejection of outlying observations'. A rationale for the treatment of outliers was beginning to take shape.

An encyclopaedic survey of outlier methods from the earliest times is included in Harter (1974-1976).

2.2 VARIOUS AIMS

In the previous section we reviewed some of the early informal approaches which had been used for the study of outliers. In such work there was seldom any overt consideration of the purpose of examining the outlying observations, of the manner in which outliers may reflect contamination of a basic probability model, or of any optimality properties possessed by the prevailing statistical methods. Any detailed examination of the current state of theory and practice in relation to the study of outliers must consider such matters of aim, model, and principle. The remaining sections of this chapter present a general review of

(i) the different aims in examining outliers,
(ii) probabilistic models explaining outliers, and
(iii) the philosophy and form of relevant statistical procedures.

Such matters are reconsidered in appropriate detail in later chapters where fuller treatments of specific topics are given.

We commence with what must be the most fundamental question: Why are we concerned about outliers? The practical examples of Chapter 1 have illustrated some grounds for concern. Let us examine these more systematically.

Let us recall what we mean by an 'outlier'. Note that, as explained below, the term 'outlier' is used by different authors in two different senses. As employed in this text, it is a subjective post-data manifestation. In observing a set of observations in some practical situation one (or more) of the observations 'jars', stands out in contrast to other observations, usually as an extreme value. As Grubbs (1969) remarks (italics inserted):

An outlying observation, or 'outlier', is one that appears to deviate markedly from other members of the sample in which it occurs.

Such outliers do not fit with the tidy pattern present in our mind, at the outset of our enquiry, of what constitutes a reasonable set of data. We have subjective doubts about the propriety of the outlying values both in relation to the specific data set we have obtained and in relation to our initial views of an appropriate probability model to describe the generation of our data.

Note how our feelings about the data may, in this respect, differ quite widely with different possible basic probability models. If we anticipate a normal distribution we may react quite strongly to certain observations which would arouse no specific concern if the expected model is longer-tailed, say log-normal or Cauchy. The purpose of a body of statistical method for examining outliers is, in broad terms, to provide a means of assessing whether our subjective declaration of the presence of outliers in a particular set of data has important objective implications for the further analysis of the data.

Outliers may have arisen for purely deterministic reasons: a reading, recording, or calculating error in the data. When it is obvious that this is so the remedy is clear: the offending sample values should be removed from the sample or replaced by corrected values when the method of 'correction' is unambiguously understood. See the examples in Section 1.1. In less clear-cut circumstances where we suspect, but cannot guarantee, such a tangible explanation for outliers, appropriate statistical procedures may be used to assess discordancy. In this text, an observation will be termed discordant if it is statistically unreasonable on the basis of some prescribed probability model. We shall later extend this definition to include an observation known to have been generated by a different probability model; such a discordant observation need not necessarily show up as an outlier.

Some writers use the word 'outlier' for an observation which is both surprising and discordant; a term such as 'suspect value' is then used by them to describe a surprising value (an outlier in our sense). To quote a typical example we read (Grubbs, 1950):

Then again, both the largest and the smallest observations may appear to be 'different' from the remaining items in the sample. Here we are interested in testing the hypothesis that both the largest and the smallest observations are truly 'outliers'.

Of course extreme values must always occur in a set of data. What is important is whether or not they are so extreme that they could not reasonably have arisen by chance from the adopted model. If so, we may feel that we now have substantiating evidence for some earlier conjecture of a 'mistake' in the data, and again would wish to reject (or correct) that mistake. Alternatively, discordancy of unreasonably extreme outliers may promote the adoption of a new probability model, with important implications for further analysis of the data.

What form does the alternative model take? There are various possibilities. It may be merely a differently shaped distribution in relation to which the complete set of data (including the outliers) appears as a homogeneous random sample, or it may need to be more structured. For example, a random mixture of two distributions may reasonably account for
the data form and, indeed, there may seem to be no alternative but to assume that the outliers reflect 'foreign' random (rather than deterministic) influences in an otherwise homogeneous set of data. Such foreign influences may, at one extreme, be matters of great interest in their own right. At the opposite extreme they may act only as obstructions in our efforts to assess the properties of the main mass of data. Some examples of these distinctions are given in Section 1.2. Statistical methods aimed at assessing discordancy have played a large part in the literature on outliers. We shall be considering a variety of tests for discordancy for different situations in Chapter 3, and later in this chapter we shall examine general statistical principles on which they are based.

Inevitably, a test for discordancy for outliers plays only an initial role in the analysis of the data. Leaving aside the wider statistical analysis we intend to apply to the data (and for which purpose they were presumably assembled), an assessment of discordancy of some outliers must be viewed only as a first stage of study of the outliers themselves. What action are we to take if we adjudge one or more outliers to be discordant? This will depend on a variety of factors relating to our interest in the practical situation. Obvious possibilities arise: we may decide to reject (or correct) the discordant outliers and proceed to analyse the residual (modified) data on the original model, we may choose to modify the model to incorporate the outliers in a non-discordant fashion, we may refine the way in which we analyse the whole data set to accommodate the outliers (render the analysis relatively impervious to their presence), or we may concentrate attention on the discordant outliers as a welcome identification of unsuspected factors of practical importance.

Let us consider some numerical examples, where, at first sight, these separate possibilities seem reasonable. Suppose that in each case any declared outliers prove to be discordant on the basis of an appropriate statistical test employing an 'initially reasonable' model.

Chauvenet (1863) declared the observations 1.01 and -1.40 as, respectively, upper and lower outliers in the following set of 15 residuals (about a simple model) of observations of the vertical semi-diameter of Venus, in seconds, made by Lt. Herndon in 1846.

-0.30  -0.24  -1.40  +0.18
-0.44  +0.06  -0.22  +0.39
+1.01  +0.63  -0.05  +0.10
+0.48  -0.13  +0.20

If the outliers prove to be discordant on an assumed normal distribution it is quite likely (bearing in mind the possibilities of inexplicable 'gross errors') that we may choose to reject them before proceeding to further study of the data. We cannot, of course, be sure that this action is entirely proper. Perhaps an appropriately more sophisticated non-normal model would incorporate them in a non-discordant fashion, or the purpose of the further analysis may support some form of partial accommodation short of complete incorporation (see comments below). Should we decide to reject the outliers then the stricture of Kruskal (1960b) makes sound sense.

As to practice, I suggest that it is of great importance to preach the doctrine that apparent outliers should always be reported, even when one feels that their causes are known or when one rejects them for whatever good rule or reason. The immediate pressures of practical statistical analysis are almost uniformly in the direction of suppressing announcements of observations that do not fit the pattern; we must maintain a strong sea-wall against these pressures.

The outright rejection of outliers has statistical consequences for the further analysis of the reduced sample. We may no longer have a random sample, but a censored one. The practice of replacing rejected (non-deterministically inexplicable) outliers by statistical equivalents (further simulated random observations from the assumed underlying distribution) involves similar consequences. The practices of 'Winsorization' and trimming (see below) will also have distributional implications which must be allowed for.

The following data described by Karl Pearson (1931) present the capacities (in cc) of a sample of seventeen male Moriori skulls.

1230  1318  1380  1420  1630  1378
1348  1380  1470  1445  1360  1410
1540  1260  1364  1410  1545

The observation 1630 was suspected as being 'too large'; suppose it proves to be discordant. It may here be more appropriate to seek an alternative model which incorporates the value 1630 in a non-discordant way. Biological data often require skew distributions as models: see the example on macrolepidoptera in Section 1.2. There is of course the possibility that identification of the outlier reflects the presence of a small number of another species in the population being studied, or (possibly less realistically) that it has arisen from a once-and-for-all error of measurement, or of recording.

Daniel (1959) reports the results of a $2^5$ factorial experiment where the 31 contrasts arranged in order of increasing absolute value are:

 0.0000   0.0281  -0.0561  -0.0842  -0.0982   0.1263   0.1684
 0.1964   0.2245  -0.2526   0.2947  -0.3087   0.3929   0.4069
 0.4209   0.4350   0.4630  -0.4771   0.5472   0.6595   0.7437
-0.7437  -0.7577  -0.8138  -0.8138  -0.8980   1.080   -1.305
 2.147   -2.666   -3.143

The last three observations are discordant outliers on the normal model. But this is precisely what we are seeking: important effects of the experimental factors. Thus we identify the outliers as indications of features of
practical importance rather than as tedious reflections of possible inadequacies in the model or measurement technique.

There is one area of enquiry where our study of outliers may not necessarily commence with a test of discordancy. Consider again the Herndon data on residuals of observations of the vertical semi-diameter of the planet Venus. Suppose that we wish to estimate some summary measure of the distribution of residuals, perhaps the mean or variance. The properties of different estimators will vary with the form of the distribution. Not knowing its form we want to use an estimator which is reasonably robust against different possible distributional forms. This concern for robustness in estimation (or testing) includes an interest in procedures which protect against the possibility (or presence) of outliers. The extreme prospect is that we decide to reject the outliers prior to estimation or testing, either because of clear tangible explanations of their presence or following a test of discordancy based on a confidently assumed model. But interest in robustness against outliers denies any great confidence in the appropriate model, and preliminary tests of discordancy may not be feasible. Furthermore the severe act of rejecting outliers may be over-extravagant and over-specific. There are possibilities of partial rejection, or indeed of impartial (sic) rejection. Although we may be suspicious of the actual values -1.40 and 1.01, we may nonetheless feel that the direction of the residuals carries information and wish to retain this information in some form. One possibility is to employ Winsorization where, for example, we replace the lower and upper extremes by their nearest neighbours. For the Herndon data, -1.40 and 1.01 are replaced by -0.44 and 0.63, respectively, thus making each of these latter values appear twice in the data. Alternatively, as an aid to robustness of estimation or testing, we may choose to use an $\alpha$-trimmed sample, in which a fixed fraction $\alpha$ of lower, and upper, extreme sample values are totally discarded before processing the sample. This 'old French custom' (Huber, 1972) is not specifically concerned with protecting against outliers, though these will clearly be candidates for trimming.

From the robustness standpoint, we are thus aiming to devise statistical procedures which do not directly examine the outliers, but seek to accommodate them and render them less serious in their influence on estimation or tests of summary measures of the underlying distribution.

Several authors of review papers on the topic of outliers (for example, Anscombe, 1960a; Grubbs, 1969) have attempted to categorize the different ways in which outliers may arise. Such ideas have been touched on in Chapter 1. It is relevant to consider them in rather more detail. In taking observations, different sources of variability can be encountered. We can distinguish three of these.

Inherent variability. This is the expression of the way in which observations intrinsically vary over the population; such variation is a natural feature of the population and uncontrollable. Thus, for example, measurements of heights of men will reflect the amount of variability indigenous to that population.

Measurement error. Often we must take measurements on members of a population under study. Inadequacies in the measuring instrument superimpose a further degree of variability on the inherent factor. The rounding of obtained values, or mistakes in recording, compound the measurement error: they are part of it. Some control of this type of variability is possible.

Execution error. A further source of variability arises in the imperfect collection of our data. We may inadvertently choose a biased sample or include individuals who are not truly representative of the population we aimed to sample. Again, sensible precautions may reduce such variability.

We can usefully attempt to classify outliers in relation to these three types of variability. An outlier in a set of data may in fact be a perfectly reasonable reflection of the natural inherent variation. If shown statistically to be discordant this reflects an inadequate basic model (unless of course it is merely a manifestation of Type I error). We would hope to learn from the experience of the discordant outlier and adopt a more appropriate model. But as Anscombe (1960a) points out:

In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. ... there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. Several possible reasons ... can usually be thought of without difficulty. In such cases, the reading will be checked or repeated if that is possible. If not, it may be rejected as spurious because of its big residual, even though there is no known reason for suspecting it. In sufficiently extreme cases, no one hesitates about such rejections. ...
If we could be sure that an outlier was caused by a large measurement or execution error which could not be rectified (and if we had no interest in studying such errors for their own sake), we should be justified in entirely discarding the observation and all memory of it. The act of observation would have failed; there would be nothing to report.

Certain points need to be underlined here. An obvious unrepresentative measurement error supports rejection of the offending observation (or occasionally we may be able to correct it, or repeat it). An outlier in the form of an excessive execution error may sometimes also lead to rejection, but it could on occasions warrant a modified model (perhaps of a mixture type to cope with infrequent foreign sample members, the discordant values reflected by the outlier), or it could serve to identify some factor of importance in its own right (as, for example, in an analysis of variance situation). Anscombe (1960a) distinguishes in terminology between outliers arising from large variation of the inherent type, and those from large measurement or execution error. He calls the former 'outliers', the latter 'spurious observations'. We shall make no such distinction. The full study of statistical methods for outliers needs to encompass all derivative sources of variation; the only exceptions are outliers arising from clearly discernible deterministic mistakes of calculation, recording, etc. In this case rejection
26 Outliers in statistical data What should one do about outlying observations? 27

practical importance rather than as tedious reflections of possible inadequacies in the model or measurement technique.

There is one area of enquiry where our study of outliers may not necessarily commence with a test of discordancy. Consider again the Herndon data on residuals of observations of the vertical semi-diameter of the planet Venus. Suppose that we wish to estimate some summary measure of the distribution of residuals, perhaps the mean or variance. The properties of different estimators will vary with the form of the distribution. Not knowing its form, we want to use an estimator which is reasonably robust against different possible distributional forms. This concern for robustness in estimation (or testing) includes an interest in procedures which protect against the possibility (or presence) of outliers. The extreme prospect is that we decide to reject the outliers prior to estimation or testing, either because of clear tangible explanations of their presence or following a test of discordancy based on a confidently assumed model. But interest in robustness against outliers denies any great confidence in the appropriate model, and preliminary tests of discordancy may not be feasible. Furthermore the severe act of rejecting outliers may be over-extravagant and over-specific. There are possibilities of partial rejection, or indeed of impartial (sic) rejection. Although we may be suspicious of the actual values -1.40 and 1.01, we may nonetheless feel that the direction of the residuals carries information and wish to retain this information in some form. One possibility is to employ Winsorization where, for example, we replace the lower and upper extremes by their nearest neighbours. For the Herndon data, -1.40 and 1.01 are replaced by -0.44 and 0.63, respectively, thus making each of these latter values appear twice in the data. Alternatively, as an aid to robustness of estimation or testing, we may choose to use an α-trimmed sample, in which a fixed fraction α of lower, and upper, extreme sample values are totally discarded before processing the sample. This 'old French custom' (Huber, 1972) is not specifically concerned with protecting against outliers, though these will clearly be candidates for trimming.

From the robustness standpoint, we are thus aiming to devise statistical procedures which do not directly examine the outliers, but seek to accommodate them and render them less serious in their influence on estimation or tests of summary measures of the underlying distribution.

Several authors of review papers on the topic of outliers (for example, Anscombe, 1960a; Grubbs, 1969) have attempted to categorize the different ways in which outliers may arise. Such ideas have been touched on in Chapter 1. It is relevant to consider them in rather more detail. In taking observations, different sources of variability can be encountered. We can distinguish three of these.

Inherent variability. This is the expression of the way in which observations intrinsically vary over the population; such variation is a natural feature of the population and uncontrollable. Thus, for example, measurements of heights of men will reflect the amount of variability indigenous to that population.

Measurement error. Often we must take measurements on members of a population under study. Inadequacies in the measuring instrument superimpose a further degree of variability on the inherent factor. The rounding of obtained values, or mistakes in recording, compound the measurement error: they are part of it. Some control of this type of variability is possible.

Execution error. A further source of variability arises in the imperfect collection of our data. We may inadvertently choose a biased sample or include individuals who are not truly representative of the population we aimed to sample. Again, sensible precautions may reduce such variability.

We can usefully attempt to classify outliers in relation to these three types of variability. An outlier in a set of data may in fact be a perfectly reasonable reflection of the natural inherent variation. If shown statistically to be discordant, this reflects an inadequate basic model (unless of course it is merely a manifestation of Type I error). We would hope to learn from the experience of the discordant outlier and adopt a more appropriate model. But as Anscombe (1960a) points out:

"In no field of observation can we entirely rule out the possibility that an observation is vitiated by a large measurement or execution error. ... there must be a suspicion that the deviation is caused by a blunder or gross error of some kind. Several possible reasons ... can usually be thought of without difficulty. In such cases, the reading will be checked or repeated if that is possible. If not, it may be rejected as spurious because of its big residual, even though there is no known reason for suspecting it. In sufficiently extreme cases, no one hesitates about such rejections. ... If we could be sure that an outlier was caused by a large measurement or execution error which could not be rectified (and if we had no interest in studying such errors for their own sake), we should be justified in entirely discarding the observation and all memory of it. The act of observation would have failed; there would be nothing to report."

Certain points need to be underlined here. An obvious unrepresentative measurement error supports rejection of the offending observation (or occasionally we may be able to correct it, or repeat it). An outlier in the form of an excessive execution error may sometimes also lead to rejection. On occasion, however, it could warrant a modified model (perhaps of a mixture type to cope with infrequent foreign sample members, or discordant values, reflected by the outlier). It could also serve to identify some factor of importance in its own right (as, for example, in an analysis of variance situation). Anscombe (1960a) distinguishes in terminology between outliers arising from large variation of the inherent type, and those arising from large measurement or execution error. He calls the former 'outliers', the latter 'spurious observations'. We shall make no such distinction. The full study of statistical methods for outliers needs to encompass all derivative sources of variation. The only exceptions are outliers arising from clearly discernible deterministic mistakes of calculation, recording, etc. In this case rejection
(or correction) is the only remedy; otherwise we need to clearly recognize the many possibilities, other than outright rejection, for coping with outliers.

Reviewing this section we can present schematically the different interests and aims in the handling of outliers. Figure 2.1 presents a simplified diagrammatic summary.

[Figure 2.1 Treatment of outliers. Column headings: Source of variation | Nature of outlier (seldom known, of course) | Manner of treatment and action (ditto)]

2.3 MODELS FOR DISCORDANCY

In the spirit of the previous section we may regard the test of discordancy as central to much of our interest in outliers. Any statistical test must inevitably examine two hypotheses: a null hypothesis, or working hypothesis, which will be conserved unless significant evidence is found to support its rejection, and an alternative hypothesis in favour of which the working hypothesis may need to be rejected. For a test of discordancy of outliers, the working hypothesis will express some basic probability model for the generation of all the data with no contemplation of outliers; the alternative hypothesis expresses a way in which the model may be modified to incorporate or explain the outliers. We shall consider in some detail different forms of outlier-generating model which have been proposed and studied. In the following section we extend the discussion of tests of discordancy to consider the statistical nature of the tests themselves and any optimality properties they may possess.

Much of the early, or existing, work on tests of discordancy is highly intuitive in form and has little regard for the nature of the working and alternative hypotheses. Thus, for example, when concerned with a single upper outlier in a set of independent observations $x_1, x_2, \ldots, x_n$ it is natural, and appealing, to contemplate the statistics

$$[x_{(n)} - x_{(n-1)}]/D$$

or

$$[x_{(n)} - \bar{x}']/D$$

where $x_{(i)}$ is the $i$th ordered value in the sample (when all observations are placed in ascending order of magnitude), $\bar{x}'$ is the sample mean excluding the outlier $x_{(n)}$ and $D$ is some measure of the spread of the sample (again excluding consideration of $x_{(n)}$). See Chapter 3, page 53. Such statistics seem intuitively reasonable for assessing the discordancy of the outlier $x_{(n)}$, and were originally proposed purely on such an informal basis. Only at the more formal stage of determining the level or size of the test, or of constructing tables for assessing the statistical significance of the test statistic value, does it become unavoidable that the working hypothesis is specified. To go further and examine power characteristics of the test, or to construct tests with certain desirable statistical properties, requires the additional specification of the alternative hypothesis.

The historical development of tests of outliers mirrors such a progression in statistical sophistication. Whilst the declaration of the alternative hypothesis is the crux of the problem of defining just what we mean by outliers, it still has not been very widely discussed. Perhaps this is inevitable: outliers are not easily defined, or incorporated in a generally acceptable form of model, and a degree of controversy still surrounds their study (see the opening pages of Chapter 1).

The working hypothesis

No fundamental conceptual difficulty exists in setting up the working hypothesis. We have repeatedly stressed the conditional nature of the outlier concept. An outlier is an observation which appears suspicious in the light of some provisional initial assignment of a probability model to explain the data generating process. Thus the working hypothesis is merely a statement of the initial (basic) probability model. In some discussions of the nature of tests of discordancy, or in attempts to accommodate outliers through robust statistical procedures, we may not need to be too specific about the probability model. It may suffice to declare, as a working hypothesis, H, that the data arise as independent observations from some common, but unspecified, distribution F. We can denote this as:

$$H: F.$$

But in the detailed analysis of real-world data we will often have a much more specific model in mind. For example, the data might constitute a random sample from an exponential distribution with scale parameter $\theta$ (which may, or may not, be specified in value), or from a homoscedastic linear regression
situation with normal error structure. Inevitably, much of the discussion of statistical tests of discordancy for outliers (as in so many areas of statistical enquiry) assumes a normal working hypothesis.

Chapter 3 will be concerned with the detailed forms of discordancy tests for individual outliers, or small groups of lower and upper outliers. Test statistics, relevant tables of percentage points and special advantages or disadvantages are described for a variety of tests. Most results are available for the normal and gamma (including exponential) distributions, and these are presented along with what relatively little information exists for other distributions such as the Pareto, Poisson, log-normal and uniform. More effort could usefully go into devising and examining the detailed properties of tests of discordancy for non-normal distributions.

If, on the basis of a test of discordancy, we adjudge an outlier (or small group of outliers) to be discordant, we implicitly reject the working hypothesis in favour of some alternative hypothesis. Clearly, we must know what alternative hypothesis is being adopted! Any assessment of the power of the prevailing test of discordancy, indeed its very justification in statistical terms, depends on specifying the alternative hypothesis. All too frequently this aspect is ignored in the discussion of tests of outliers. Let us consider some of the possible forms of alternative hypothesis for tests of discordancy and examine (briefly at this stage) the extent to which they figure in the construction or application of tests prescribed in the literature.

Forms of alternative hypothesis

We can readily contemplate a variety of different forms of alternative hypothesis for outlier tests of discordancy. Some of these have been discussed in the literature; others have obvious appeal but do not appear to have been considered in any detail. The formulation of the alternative hypothesis, in relation to the subjective manner in which outliers are declared in a set of data, is no easy matter, and much remains to be done in seeking an entirely satisfactory form for the alternative hypothesis.

(i) Deterministic alternative

The first type of alternative hypothesis which comes to mind covers the cases of outliers caused by obvious identifiable gross errors of measurement, recording, and so on. Such an alternative hypothesis, which we term deterministic, is entirely specific to the actual data set, and the observed offending observations. Thus if our data $x_1, x_2, \ldots, x_n$ contain one observation, say $x_i$, which has clearly arisen from a mistaken reading or recording, we immediately reject any basic model, F, for the whole data set in favour of an alternative model which says that all $x_j$ ($j \neq i$) arise at random from F whilst $x_i$ is quite different and requires rejection (or correction, or repeat reading). No test of discordancy is needed. Rejection of the initial model in favour of the alternative is deterministically correct (even if difficult to confirm from the practical point of view).

(ii) Inherent alternative

In the terms of the discussion of sources of variability in the previous section, we must entertain the possibility that outliers have appeared in the data as a result of a greater degree of inherent variability than we initially anticipated. Perhaps what we thought was a sample from a normal distribution was really from a 'fatter-tailed' distribution. Upper outliers may reflect, say, that an initial assumption of a gamma distribution is best replaced with a log-normal distribution. Thus where outliers reflect a larger measure (or different form) of inherent variability than is encompassed in some basic model, F, we may choose to express this by opposing the working hypothesis,

$$H: F,$$

that all observations arise from the distribution, F, with a suitably chosen alternative hypothesis,

$$\bar{H}: G,$$

that all observations arise from a distribution, G, under which the outliers no longer occasion the earlier degree of 'surprise'. F and G may be different fully specified distributions, or may be distinct general parametric families of distributions.

Of course, we would hope to be able to distinguish between F and G by appropriate statistical procedures (based on the complete sample) more powerful than the outlier-specific test of discordancy. On the other hand, in small sets of data our very motivation for considering a radically different form of model may stem entirely from the presence of outliers. Thus the inherent alternative hypothesis is relevant to tests of discordancy. Shapiro and Wilk have given such a test directed particularly to inherent alternatives, both for normal samples (1965) and for exponential samples (1972). Detailed results on the power of their normal test against 45 different inherent alternatives are given by Shapiro, Wilk, and Chen (1968). Results on the power of their exponential test against 15 different inherent alternatives are given by Shapiro and Wilk (1972).

(iii) Mixture alternative

Rather than assume that outliers may reflect an unexpected degree or form of inherent variability, we might admit the possibility of 'errors of execution' allowing 'contamination' of the sample by a few members of a population other than that represented by the basic model. We assume that such 'foreign' sample members, or discordant values, show themselves as outliers. For example, in examining a sample of fossils from what is
supposed to be a homogeneous population of the same species, we might inadvertently collect one or two fossils of a different species with different size characteristics. Whilst the presence of the different species might reasonably be ascribed to execution error when we are only interested in the predominant species, the term 'error' is not in general a good label for this type of manifestation. A particle physicist measuring characteristics of paths of radioactive particles is more likely to term it 'good fortune' than 'error' if he discovers in the form of outlying observations a basically new type of particle. (This illustrates again the distinction between rejection and identification in our response to a test of discordancy.)

In these terms a sensible alternative to the working hypothesis $H: F$ is an hypothesis of the form

$$\bar{H}: (1-\lambda)F + \lambda G$$

which declares that outliers reflect the (small) chance $\lambda$ that observations arise from a distribution G, quite different from the initial model F. Such an alternative hypothesis will be called a mixture alternative, and it figures in some published work on outliers (see below).

As with the inherent type of alternative, we should again hope that the dichotomy

$$H: F \quad \text{versus} \quad \bar{H}: (1-\lambda)F + \lambda G$$

is promoted, supported, and best analysed on a broader basis than the degree of 'surprise' engendered by the presence of one or two outliers in the data. But if the sample size, and the mixing parameter $\lambda$, are small, it may be that the outliers alone focus our attention on the possibility of a mixture model. Box and Tiao (1968) and Guttman (1973b) discuss the implications of a mixture alternative for the study of outliers. Box and Tiao remark that if the mixture prospect is revealed through outliers alone, it will be necessary to assume that in a sample of at most 20 observations $\lambda$ is small: possibly less than 0.05, or at very most 0.10. Otherwise, occasions will arise where we encounter more observations from the distribution G than we should be content to designate 'outliers'. (For larger samples, $\lambda$ will need to be even smaller on this argument.) Guttman inverts this argument in support of the commonly made assumption that a set of data contains at most one discordant outlier! However, there seems to be a certain circularity in such arguments. If $\lambda$ is very small and $\bar{H}$ is a reasonable model, we are unlikely to encounter more than one or two members of G which (for appropriate G) may show up as outliers. Conversely, if we encounter just one (or two) outliers, it may be that $\bar{H}$ is an appropriate model to explain them, but $\lambda$ must perforce be small. But how are we to adjudge the propriety of the mixture alternative $(1-\lambda)F + \lambda G$ with no evidence other than the one (or two) outliers whose discordancy we are to assess using this alternative model $(1-\lambda)F + \lambda G$? There is no easy answer to this, but it seems equally difficult to justify any other of the proposed forms of alternative hypothesis, although to judge from the literature there seems to be greater intuitive appeal in the slippage type of alternative hypothesis we shall discuss next.

At this point (though not specifically concerned with solely the mixture alternative) it is relevant to point out an apparent philosophical inconsistency in the formulation of models to explain the presence of outliers. The model $(1-\lambda)F + \lambda G$ merely declares that with a certain (small) probability observations might be generated by G. Yet the data have directed our attention to specific observations: the outliers, which appear typically as extreme values in the sample. Suppose we have just one (upper) outlier, $x_{(n)}$. The alternative hypothesis merely contemplates the possibility of some observations arising from G, not necessarily, specifically or solely, $x_{(n)}$. Statistical principles may lead to the conclusion that if there is only one observation from G it is best (in some sense) adjudged to be $x_{(n)}$. But this seems to differ from our original subjective interest in the outlier $x_{(n)}$ per se, which would favour an alternative hypothesis specifically related to $x_{(n)}$. There appears to be no discussion of this matter in the literature. In any case, there is no obvious way of expressing such an interest through a mixture type of alternative hypothesis. But even for the slippage type of alternative discussed below, where such a facility can in fact be found, it does not seem to have been contemplated. We discuss this in some detail in Section 3.1.

As an example of the mixture model, Box and Tiao (1968) consider a Bayesian approach to outliers in which, under the working hypothesis, we have a random sample from a normal distribution $N(\mu, \sigma^2)$. The alternative hypothesis declares that, independently and with probabilities $1-\lambda$, $\lambda$, respectively ($\lambda < 0.1$), observations arise either from $N(\mu, \sigma^2)$ or from $N(\mu, b\sigma^2)$, with $b > 1$. Since $b > 1$ we might expect observations from the latter distribution (G) to appear as extreme values, declared to be outliers. (We consider the results of Box and Tiao in more detail in Section 8.1.2. Their interest is in estimating $\mu$ rather than testing discordancy of outliers, so it comes under the heading of accommodation.)

The mixing distribution G need not, of course, be restricted to a scale-shifted version of F; it could express a change of location or even a radically different form of distribution. Clearly, only certain forms of mixture will be relevant to outliers. Shifts of scale with $b > 1$ may give rise to upper and lower outliers in combination. Shifts of location may give rise to upper or lower outliers separately, depending on the direction of shift. Various mixtures will not be expected to be reflected in the occurrence of outliers.

An early use of a mixture model for outliers is made by Dixon (1953). Tukey (1960) is interested in a mixture model with two normal distributions differing in variance.
supposed to be a homogeneous population of the same species, we might inadvertently collect one or two fossils of a different species with different size characteristics. Whilst the presence of the different species might reasonably be ascribed to execution error when we are only interested in the predominant species, the term 'error' is not in general a good label for this type of manifestation. A particle physicist measuring characteristics of paths of radioactive particles is more likely to term it 'good fortune' than 'error' if he discovers, in the form of outlying observations, a basically new type of particle. (This illustrates again the distinction between rejection and identification in our response to a test of discordancy.)

In these terms a sensible alternative to the working hypothesis H: F is an hypothesis of the form

$$\bar{H}:\ (1-\lambda)F + \lambda G$$

which declares that outliers reflect the (small) chance λ that observations arise from a distribution G, quite different from the initial model F. Such an alternative hypothesis will be called a mixture alternative, and it figures in some published work on outliers (see below).

As with the inherent type of alternative, we should again hope that the dichotomy

$$H: F \quad \text{versus} \quad \bar{H}: (1-\lambda)F + \lambda G$$

is promoted, supported, and best analysed on a broader basis than the degree of 'surprise' engendered by the presence of one or two outliers in the data. But if the sample size and the mixing parameter λ are small, it may be that the outliers alone focus our attention on the possibility of a mixture model. Box and Tiao (1968) and Guttman (1973b) discuss the implications of a mixture alternative for the study of outliers. Box and Tiao remark that if the mixture prospect is revealed through outliers alone, it will be necessary to assume that λ is small in a sample of at most 20 observations: possibly less than 0.05, or at very most 0.10. Otherwise, occasions will arise where we encounter more observations from the distribution G than we should be content to designate 'outliers'. (For larger samples, λ will need to be even smaller on this argument.) Guttman inverts this argument in support of the commonly made assumption that a set of data contains at most one discordant outlier! However, there seems to be a certain circularity in such arguments. If λ is very small and H̄ is a reasonable model, we are unlikely to encounter more than one or two members of G which (for appropriate G) may show up as outliers. If we encounter just one (or two) outliers, it may be that H̄ is an appropriate model to explain the outliers, but λ must perforce be small. But how are we to adjudge the propriety of the mixture alternative (1 − λ)F + λG with no evidence other than the one (or two) outliers whose discordancy we are to assess using this alternative model (1 − λ)F + λG? There is no easy answer to this. But it seems equally difficult to justify any other of the proposed forms of alternative hypothesis, although to judge from the literature there seems to be greater intuitive appeal in the slippage type of alternative hypothesis we shall discuss next.

At this point (though not specifically concerned with solely the mixture alternative) it is relevant to point out an apparent philosophical inconsistency in the formulation of models to explain the presence of outliers. The model (1 − λ)F + λG merely declares that with a certain (small) probability observations might be generated by G. Yet the data have directed our attention to specific observations: the outliers, which appear typically as extreme values in the sample. Suppose we have just one (upper) outlier, x_(n). The alternative hypothesis merely contemplates the possibility of some observations arising from G, not necessarily, specifically or solely, x_(n). Statistical principles may lead to the conclusion that if there is only one observation from G it is best (in some sense) adjudged to be x_(n). But this seems to differ from our original subjective interest in the outlier x_(n) per se, which would favour an alternative hypothesis specifically related to x_(n). There appears to be no discussion of this matter in the literature. In any case, there is no obvious way of expressing such an interest through a mixture type of alternative hypothesis. But even for the slippage type of alternative discussed below, where such a facility can in fact be found, it does not seem to have been contemplated. We discuss this in some detail in Section 3.1.

As an example of the mixture model, Box and Tiao (1968) consider a Bayesian approach to outliers in which, under the working hypothesis, we have a random sample from a normal distribution N(μ, σ²). The alternative hypothesis declares that, independently and with probabilities 1 − λ and λ respectively (λ < 0.1), observations arise either from N(μ, σ²) or from N(μ, bσ²), with b > 1. Since b > 1, we might expect observations from the latter distribution (G) to appear as extreme values, declared to be outliers. (We consider the results of Box and Tiao in more detail in Section 8.1.2. Their interest is in estimating μ rather than testing discordancy of outliers, so it comes under the heading of accommodation.)

The mixing distribution G need not, of course, be restricted to a scale-shifted version of F; it could express a change of location or even a radically different form of distribution. Clearly, only certain forms of mixture will be relevant to outliers. Shifts of scale with b > 1 may give rise to upper and lower outliers in combination. Shifts of location may give rise to upper or lower outliers separately, depending on the direction of shift. Various mixtures will not be expected to be reflected in the occurrence of outliers.

An early use of a mixture model for outliers is made by Dixon (1953). Tukey (1960) is interested in a mixture model with two normal distributions differing in variance.
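A small numerical sketch may make Box and Tiao's remark about the size of λ concrete. It assumes the scale-mixture form described above, (1 − λ)N(μ, σ²) + λN(μ, bσ²) with b > 1, and treats λ, μ, σ² and b as known. The function names, the seed and the value b = 9 in the usage lines are illustrative choices, not taken from any of the cited papers. `prob_at_least` gives the binomial chance that a sample of n contains at least m observations from G. `posterior_from_g` gives the probability, under the mixture, that a particular value came from G.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, lam, mu, sigma2, b):
    """Density of the mixture (1 - lam) N(mu, sigma2) + lam N(mu, b * sigma2)."""
    return (1.0 - lam) * normal_pdf(x, mu, sigma2) + lam * normal_pdf(x, mu, b * sigma2)


def posterior_from_g(x, lam, mu, sigma2, b):
    """P(x came from G | x) under the mixture, with all parameters assumed known."""
    from_g = lam * normal_pdf(x, mu, b * sigma2)
    return from_g / mixture_pdf(x, lam, mu, sigma2, b)


def prob_at_least(m, n, lam):
    """Binomial P(at least m of n independent draws come from G)."""
    return sum(math.comb(n, k) * lam**k * (1.0 - lam) ** (n - k) for k in range(m, n + 1))


def simulate(n, lam, mu, sigma2, b, rng):
    """Draw n values from the mixture; each item is (value, came_from_g)."""
    sample = []
    for _ in range(n):
        from_g = rng.random() < lam
        sd = math.sqrt(b * sigma2) if from_g else math.sqrt(sigma2)
        sample.append((rng.gauss(mu, sd), from_g))
    return sample


if __name__ == "__main__":
    for lam in (0.05, 0.10):
        print(f"lambda={lam}: P(>=2 from G in n=20) = {prob_at_least(2, 20, lam):.4f}")
    rng = random.Random(2024)
    data = simulate(20, 0.05, 0.0, 1.0, 9.0, rng)
    # Rank observations by their posterior probability of having come from G.
    ranked = sorted(data, key=lambda t: posterior_from_g(t[0], 0.05, 0.0, 1.0, 9.0), reverse=True)
    for value, from_g in ranked[:3]:
        print(f"x={value:+.3f}  from G: {from_g}")
```

With n = 20, the chance of two or more observations from G is roughly 0.26 for λ = 0.05 and roughly 0.61 for λ = 0.10. This gives some numerical feel for why λ must be small if G is to reveal itself only through one or two outliers. The posterior ranking also illustrates the point about x_(n). For b > 1 the ratio of the G and F densities grows with |x − μ|, so the extremes are the most likely members of G, yet the mixture hypothesis itself never singles them out.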
(iv) Slippage alternative

By far the most common type of alternative hypothesis as a model for the generation of outliers is what we shall refer to as the slippage alternative. It has been widely discussed and used. It figures (sometimes only implicitly) in work by Grubbs (1950), Dixon (1950), Anscombe (1960a), Ferguson (1961a), McMillan and David (1971), McMillan (1971), Guttman (1973b), and many others. In its most usual form the slippage alternative states that all observations apart from some small number k (1 or 2, say) arise independently from the initial model F, indexed by location and scale parameters μ and σ². The remaining k are independent observations from a modified version of F in which μ or σ² have been shifted in value (μ in either direction, σ² typically increased). In most published work F is a normal distribution. The models A and B of Ferguson (1961a) are perhaps the most general expression of the normal slippage alternative, reflecting shifts of location and dispersion, respectively.

Model A  x_1, x_2, ..., x_n arise independently from normal distributions with common variance, σ². (Under H they have common mean μ.) There are known constants a_1, a_2, ..., a_n (most of which will be zero), an unknown parameter λ and an unknown permutation (ν_1, ν_2, ..., ν_n) of (1, 2, ..., n) such that the normal distributions from which the x_i arise have means

    (i = 1, 2, ..., n)    (2.3.1)

H̄: λ ≠ 0 (or one-sided analogues, e.g. λ > 0 when the a_i have the same sign).

Model B  x_1, x_2, ..., x_n arise independently from normal distributions with common mean, μ. (Under H they have common variance σ².) There are known positive constants a_1, a_2, ..., a_n (most of which will be zero), an unknown parameter λ and an unknown permutation (ν_1, ν_2, ..., ν_n) of (1, 2, ..., n) such that the normal distributions from which the x_i arise have variances

    (i = 1, 2, ..., n)    (2.3.2)

H̄: λ > 0 (λ < 0 is irrelevant to the outlier problem).

These models are quite general with regard to the number of outliers in the data. Some particularizations, or modifications, are worth examining. For illustration we retain the normality assumption for F. Anscombe (1960a) considers the case of an outlier arising from a shift in the mean. He assumes that σ² is known and that x_1, x_2, ..., x_{n−1} is a random sample from N(μ, σ²). Under the alternative hypothesis, x_n arises independently from N(μ + aσ, σ²), with the value of a unknown. Guttman (1973b) declares that under H̄ one of the x_i comes from N(μ + a, σ²). Which one is unknown; it may be any of the x_i with equal probabilities, 1/n. This extension of the slippage alternative hypothesis is best described as an exchangeable alternative hypothesis (see below). In a non-Bayesian framework it can be expressed

$$\bar{H}:\ \text{one of } \bar{H}_i \ (i = 1, 2, \ldots, n) \text{ holds,}$$

where

$$\bar{H}_i:\ x_j \sim N(\mu, \sigma^2)\ (j \neq i), \qquad x_i \sim N(\mu + a, \sigma^2). \qquad (2.3.3)$$

If we are concerned with an outlier arising from a possible shift in scale (or dispersion) rather than location, we would take as the corresponding slippage-type alternative hypothesis

$$\bar{H}':\ \text{one of } \bar{H}'_i \ (i = 1, 2, \ldots, n) \text{ holds,}$$

where

$$\bar{H}'_i:\ x_j \sim N(\mu, \sigma^2)\ (j \neq i), \qquad x_i \sim N(\mu, b\sigma^2) \quad (b > 1). \qquad (2.3.4)$$

Both H̄ and H̄' are immediate analogues of the alternative hypothesis in multisample slippage tests, in the special case of samples each of just one observation (hence our terminology). We shall be considering slippage tests in more detail in Chapter 5. If we wish to handle more than one outlier, H̄ and H̄' need to be appropriately extended, of course, in the spirit of Ferguson's models A and B.

Again we encounter the anomaly that this type of alternative hypothesis is non-specific with regard to which observation corresponds with the location-shifted (or scale-shifted) distribution. Suppose we encounter a single upper outlier, x_(n), in the case of a location-shifted alternative hypothesis. As in the case of the mixture alternative, we might hope to test the working hypothesis against the specific analogous alternative that x_(n) (specifically, rather than any single x_j) has arisen from N(μ + a, σ²), with a > 0. As we have said, this prospect is not considered in the literature. It is also not immediately obvious how such an alternative hypothesis should be expressed, even in the slippage context; see Section 3.1 for further discussion.

(v) Exchangeable alternative

A different approach to the form of the alternative hypothesis, extending the slippage formulation, is to be found in the work of Kale, Sinha, Veale, and others. Kale and Sinha (1971) and Veale and Kale (1972) were concerned respectively with estimating, and testing, the value of the mean, θ, in an exponential distribution in a manner which is robust against the possibility of outliers. The model they employ to reflect the presence of, for example, a single outlier assumes in its general form that x_1, x_2, ..., x_{i−1},
x_{i+1}, ..., x_n arise as independent observations from the distribution F of the initial model, whereas x_i is a random observation from a distribution G. It is further assumed that the index i of the aberrant (discordant) observation is equally likely to be any of 1, 2, ..., n. On this model, the random variables X_1, X_2, ..., X_n are not independent, but they are exchangeable. We shall call such an alternative hypothesis an exchangeable alternative.

The likelihood of the sample under the alternative hypothesis of a single discordant observation is

$$L(x \mid F, G) = \frac{1}{n} \sum_{i=1}^{n} g(x_i) \prod_{j \neq i} f(x_j) \qquad (2.3.5)$$

where f(x) and g(x) are the probability (density) functions of the distributions F and G. F and G might be taken from distinct families of distributions. Alternatively, they may correspond merely with different parameter values within a single-parameter family. The latter is the case in the papers referred to above, where

$$f(x) = \frac{1}{\theta} e^{-x/\theta} \qquad (2.3.6)$$

and

$$g(x) = \frac{b}{\theta} e^{-bx/\theta} \quad (0 < b < 1). \qquad (2.3.7)$$

The general form of the exchangeable alternative is clearly an extension of the slippage alternative. Even so, it comes very close to the latter in particular cases such as the exponential example above. Here it expresses a scale shift of the slippage alternative type. The corresponding likelihood is identical in form (if different in motivation) to that which would be employed in a Bayesian analysis of the slippage alternative with equal prior probabilities for the index of the observation which arises from the anomalous family, G, as described by Guttman (1973b).

A further extension of the exchangeable alternative is given by Joshi (1972b).

In applications of such a model for the study of outliers, the interest is in robust estimation or testing. It is therefore concerned with accommodation of outliers rather than tests of discordancy. Accordingly we shall return to such work for more detailed study in Section 2.6 and in Chapter 4.

When more than one outlier is observed the model is generalized as follows. For k discordant observations, we assume that $x_{i_1}, x_{i_2}, \ldots, x_{i_{n-k}}$ come from F, whilst $x_{i_{n-k+1}}, \ldots, x_{i_n}$ come from G_1, G_2, ..., G_k. We might have some or all of the G_i identical. The association of the different observations with the different distributions is again assumed to occur at random.

In the cases of either one or several discordant observations, certain relationships must exist between F and G (or G_1, ..., G_k) for it to be reasonable that the propriety of the model will be reflected in outliers. Considering the case of just one discordant value, we need the observation from G to show up at one of the extremes of the sample. Specifically, if

    (2.3.8)

denotes the probability that the rth ordered value arises from G, and dG/dF = ψ(x) is monotone (increasing or decreasing) in x, then these probabilities can be shown to be monotone (increasing or decreasing) in r. Thus the discordant observation from G is most likely to be either the smallest, or the largest, of x_1, x_2, ..., x_n. This is surely what we would require of a model to explain a lower, or an upper, outlier. There are analogous results for several discordant observations. See Mount and Kale (1973).

(vi) Other alternative hypotheses

There seem to be no other formal proposals for modelling outlier occurrence, either as an alternative hypothesis in a test of discordancy or as a model for robust accommodation of outliers. Some ideas relating to the investigation of multivariate outliers implicitly employ other modelling concepts. See Chapter 6.

Remarks by Kruskal (1960b) are relevant to the general question of modelling. In particular, he points to the need for models which allow the occurrence of (measurement error) outliers to depend on the value that would have occurred had an error not taken place. He makes no specific proposals, however.

(vii) Outlier proneness

The final topic in this section cannot really be claimed to constitute a model to describe the occurrence of outliers, although it is germane to model development.

Recalling the subjective manner in which sample observations may be declared outliers, we stress again the element of 'surprise' they engender. Such surprise should ideally be conditional on the initial model we have in mind for the data. For example, extreme values in a Cauchy sample would need to be even more extreme than those in a normal sample if we were to declare them 'outliers'. A somewhat different attitude seems to be implied in recent work by Neyman and Scott (1971) and by Green (1974). Both works consider a method of distinguishing between families of distributions with regard to the differing extents to which they are likely to exhibit outliers. They define a concept of outlier proneness, and conclude, inter alia, that the families of log-normal and gamma distributions are outlier prone, whereas the family of Cauchy distributions is not.

The concept hinges on the probability P(K, n | F) that a random sample of size n from a distribution F (in a family ℱ) contains an extreme member x_(n) which exceeds x_(n−1) by more than an amount K(x_(n−1) − x_(1)). If the
supremum of P(K, n | F) over ℱ is strictly less than unity, then they call ℱ (K, n)-outlier-resistant; otherwise it is (K, n)-outlier-prone. If ℱ is (K, n)-outlier-prone for all K > 0 and all n > 2, it is outlier-prone completely. Green (1974) shows that this is so provided only that ℱ is (K, n)-outlier-prone for some K > 0, n > 2. See also Kale (1975a, 1975b).

2.4 TEST STATISTICS

Whatever the form of the working and alternative hypotheses, we can distinguish a small number of different forms of test statistic for tests of discordancy of outliers. Indeed, this is sometimes done in total disregard of those hypotheses. These statistics have obvious intuitive appeal. They have frequently been demonstrated (often subsequent to their original introduction and use) to be supported by statistical test principles applied to appropriate models. For the moment we consider the qualitative nature of some test statistics. In Section 2.5 we review any wider statistical support that they enjoy, and in Chapter 3 we consider in detail their precise form and application.

Augmenting the classification of Tietjen and Moore (1972), we can distinguish six basic types of test statistic. We shall consider these in turn. Some are more appropriate than others in different types of situation. Examples include examining a single upper outlier, or two lower outliers, or perhaps an upper and a lower outlier whilst safeguarding against the possibility of additional upper or lower outliers. We shall not consider such fine detail here, but will examine it more fully in Chapter 3. The only associated concept we will discuss at this stage is that of masking (see below). The basic types of test statistics are as follows.

(a) Excess/spread statistics

These are ratios of the difference between an outlier and its nearest or next-nearest neighbour to the range, or to some other measure of spread of the sample (possibly omitting the outlier and other extreme observations). Examples are

$$\frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(2)}}$$

(Dixon, 1951, for examining an upper outlier x_(n), avoiding x_(1)) or (Irwin, 1925)

$$\frac{x_{(n)} - x_{(n-1)}}{\sigma}$$

where σ is the standard deviation in the basic model. Irwin's statistic assumes σ is known, and is particularly relevant for a normal distribution. Clearly we could replace σ with an estimate. This might (perhaps usefully) be based on a restricted sample which excludes observations we wish to protect against, such as the outlier x_(n) or other extremes. If an independent estimate of σ is available, this could also be used.

(b) Range/spread statistics

Here we replace the numerator with the sample range; for example (David, Hartley, and Pearson, 1954; Pearson and Stephens, 1964)

$$\frac{x_{(n)} - x_{(1)}}{s}$$

Again, s might be replaced by a restricted sample analogue, an independent estimate, or a known value of a measure of spread of the population. Using the range has one disadvantage: without further investigation, it is not clear whether significant results represent discordancy of an upper outlier, a lower outlier, or both.

(c) Deviation/spread statistics

This latter difficulty is partly offset by using in the numerator a measure of the distance of an outlier from some measure of central tendency in the data. An example (Grubbs, 1950) for a lower outlier is

$$\frac{\bar{x} - x_{(1)}}{s}$$

As for s, x̄ might be based on a restricted sample. Alternatively, it might be replaced with an independent estimate, or population value, of some convenient measure of location. A modification uses maximized deviation in the numerator; for example, max |x_i − x̄|/s (Halperin et al., 1955).

(d) Sums of squares statistics

Somewhat different in form are test statistics expressed as ratios of sums of squares for the restricted and total samples. An example is the statistic

$$\frac{\sum_{i=1}^{n-2} \left(x_{(i)} - \bar{x}_{n,n-1}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$

where $\bar{x}_{n,n-1} = \sum_{i=1}^{n-2} x_{(i)}/(n-2)$. It was proposed by Grubbs (1950) for testing two upper outliers x_(n−1), x_(n).

(e) High-order moment statistics

Statistics such as measures of skewness and kurtosis are not specifically designed for assessing outliers. They can nonetheless be useful in this context; for example (Ferguson, 1961a)

$$\frac{\sqrt{n}\,\sum (x_i - \bar{x})^3}{\left[\sum (x_i - \bar{x})^2\right]^{3/2}} \quad \text{and} \quad \frac{n \sum (x_i - \bar{x})^4}{\left[\sum (x_i - \bar{x})^2\right]^2}.$$
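The statistics in (a) to (e) are simple enough to compute directly. The following is a minimal Python sketch under stated assumptions. The text does not define s here, so s is taken to be the usual sample standard deviation with divisor n − 1 (`statistics.stdev`). σ for Irwin's statistic must be supplied as a known value. The function names are hypothetical. The functions return only the values of the statistics; critical values, and the choice of which statistic to use, are matters for Chapter 3.

```python
import math
from statistics import mean, stdev


def _ordered(xs, min_n):
    """Return the sample sorted ascending, so x[0] = x(1) and x[-1] = x(n)."""
    if len(xs) < min_n:
        raise ValueError(f"need at least {min_n} observations")
    return sorted(xs)


def dixon_upper(xs):
    """(a) Excess/spread: (x(n) - x(n-1)) / (x(n) - x(2)), avoiding x(1)."""
    x = _ordered(xs, 3)
    return (x[-1] - x[-2]) / (x[-1] - x[1])


def irwin_upper(xs, sigma):
    """(a) Excess/spread with known sigma: (x(n) - x(n-1)) / sigma."""
    x = _ordered(xs, 2)
    return (x[-1] - x[-2]) / sigma


def range_spread(xs):
    """(b) Range/spread: (x(n) - x(1)) / s."""
    x = _ordered(xs, 2)
    return (x[-1] - x[0]) / stdev(x)


def grubbs_lower(xs):
    """(c) Deviation/spread for a lower outlier: (xbar - x(1)) / s."""
    x = _ordered(xs, 2)
    return (mean(x) - x[0]) / stdev(x)


def max_deviation(xs):
    """(c) Maximized deviation: max |x_i - xbar| / s."""
    x = _ordered(xs, 2)
    xbar = mean(x)
    return max(abs(v - xbar) for v in x) / stdev(x)


def grubbs_two_upper(xs):
    """(d) Restricted-to-total sum of squares ratio for the upper pair x(n-1), x(n)."""
    x = _ordered(xs, 4)
    restricted = x[:-2]          # x(1), ..., x(n-2)
    xbar_r = mean(restricted)    # the restricted mean xbar_{n,n-1}
    xbar = mean(x)
    num = sum((v - xbar_r) ** 2 for v in restricted)
    den = sum((v - xbar) ** 2 for v in x)
    return num / den


def moment_statistics(xs):
    """(e) sqrt(n) S3 / S2^(3/2) (skewness) and n S4 / S2^2 (kurtosis)."""
    x = _ordered(xs, 2)
    n, xbar = len(x), mean(x)
    s2 = sum((v - xbar) ** 2 for v in x)
    s3 = sum((v - xbar) ** 3 for v in x)
    s4 = sum((v - xbar) ** 4 for v in x)
    return math.sqrt(n) * s3 / s2**1.5, n * s4 / s2**2


if __name__ == "__main__":
    data = [1, 2, 3, 4, 10]  # illustrative sample with one high value
    print("Dixon (a):", dixon_upper(data))
    print("Irwin (a), sigma = 2:", irwin_upper(data, 2.0))
    print("Range/s (b):", round(range_spread(data), 4))
    print("Grubbs lower (c):", round(grubbs_lower(data), 4))
    print("Max deviation (c):", round(max_deviation(data), 4))
    print("Grubbs two upper (d):", grubbs_two_upper(data))
    print("Skewness, kurtosis (e):", moment_statistics(data))
```

Sorting once and indexing from both ends keeps the order-statistic notation of the text: x[0] is x_(1), x[1] is x_(2), x[-2] is x_(n−1) and x[-1] is x_(n). Replacing `stdev` with a restricted-sample or independent estimate, as the text suggests, would only change the denominator.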
38 Outliers in statistica[ data What should one do about outlying observations? 39

supremum of $P(K, n \mid F)$ over $\mathcal{F}$ is strictly less than unity, then they call $\mathcal{F}$ $(K, n)$-outlier-resistant; otherwise it is $(K, n)$-outlier-prone. If $\mathcal{F}$ is $(K, n)$-outlier-prone for all $K > 0$ and all $n > 2$ it is outlier-prone completely. Green (1974) shows that this is so provided only that $\mathcal{F}$ is $(K, n)$-outlier-prone for some $K > 0$, $n > 2$. See also Kale (1975a, 1975b).

2.4 TEST STATISTICS

Whatever the form of the working and alternative hypotheses, indeed sometimes in total disregard of these, we can distinguish a small number of different forms of test statistic for tests of discordancy of outliers. These have obvious intuitive appeal and frequently have been demonstrated (often subsequent to their original introduction and use) to be supported by statistical test principles applied to appropriate models. For the moment we consider the qualitative nature of some test statistics; in Section 2.5 we review any wider statistical support that they enjoy and in Chapter 3 we consider in detail their precise form and application.

Augmenting the classification of Tietjen and Moore (1972) we can distinguish six basic types of test statistic. We shall consider these in turn. Some are more appropriate than others in different types of situation, for example, in examining a single upper outlier, or two lower outliers, or, perhaps, an upper and a lower outlier whilst safeguarding against the possibility of additional upper or lower outliers, and so on. We shall not consider such fine detail here, but will examine it more fully in Chapter 3. The only associated concept we will discuss at this stage is that of masking (see below). The basic types of test statistics are as follows.

(a) Excess/spread statistics

These are ratios of differences between an outlier and its nearest or next-nearest neighbour to the range, or some other measure of spread of the sample (possibly omitting the outlier and other extreme observations). Examples are

$$\frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(2)}}$$

(Dixon, 1951, for examining an upper outlier $x_{(n)}$, avoiding $x_{(1)}$) or (Irwin, 1925)

$$\frac{x_{(n)} - x_{(n-1)}}{\sigma}$$

where $\sigma$ is the standard deviation in the basic model. Irwin's statistic assumes $\sigma$ is known, and is particularly relevant for a normal distribution. Clearly we could replace $\sigma$ with an estimate which might (perhaps usefully) be based on a restricted sample which excludes observations we wish to protect against: such as the outlier $x_{(n)}$ or other extremes. If an independent estimate of $\sigma$ is available, this could also be used.

(b) Range/spread statistics

Here we replace the numerator with the sample range; for example (David, Hartley, and Pearson, 1954; Pearson and Stephens, 1964)

$$\frac{x_{(n)} - x_{(1)}}{s}.$$

Again $s$ might be replaced by a restricted sample analogue, independent estimate or known value of a measure of spread of the population. Using the range has the disadvantage that it is not clear without further investigation whether significant results represent discordancy of an upper outlier, a lower outlier, or both.

(c) Deviation/spread statistics

This latter difficulty is partly offset by using in the numerator a measure of the distance of an outlier from some measure of central tendency in the data. An example (Grubbs, 1950) for a lower outlier is

$$\frac{\bar{x} - x_{(1)}}{s}.$$

As for $s$, $\bar{x}$ might be based on a restricted sample, or replaced with an independent estimate, or population value, of some convenient measure of location. A modification uses maximized deviation in the numerator; for example, $\max_i |x_i - \bar{x}|/s$ (Halperin et al., 1955).

(d) Sums of squares statistics

Somewhat different in form are test statistics expressed as ratios of sums of squares for the restricted and total samples: for example, the statistic

$$\frac{\sum_{i=1}^{n-2} \left(x_{(i)} - \bar{x}_{n,n-1}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2},$$

where $\bar{x}_{n,n-1} = \sum_{i=1}^{n-2} x_{(i)}/(n-2)$, proposed by Grubbs (1950) for testing two upper outliers $x_{(n-1)}$, $x_{(n)}$.

(e) High-order moment statistics

Statistics such as measures of skewness and kurtosis, not specifically designed for assessing outliers, can nonetheless be useful in this context; for example (Ferguson, 1961a)

$$\frac{\sqrt{n}\,\sum (x_i - \bar{x})^3}{\left[\sum (x_i - \bar{x})^2\right]^{3/2}} \quad \text{and} \quad \frac{n \sum (x_i - \bar{x})^4}{\left[\sum (x_i - \bar{x})^2\right]^{2}}.$$
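The statistics listed under (a) to (e) are simple to compute once the sample is ordered. The following is a minimal Python sketch for a single sample. It assumes that $s$ is the ordinary sample standard deviation with divisor $n - 1$; the text deliberately leaves the choice of spread measure open. The function names are our own, hypothetical choices. Only the statistics are computed here. No critical values are attached.

```python
import math
import statistics

# Assumption: s is the sample standard deviation with divisor n - 1
# (statistics.stdev); the text leaves the measure of spread open.


def dixon_excess_spread(x):
    """(a) Dixon (1951): (x_(n) - x_(n-1)) / (x_(n) - x_(2)), avoiding x_(1)."""
    xs = sorted(x)
    if len(xs) < 3:
        raise ValueError("need at least three observations")
    return (xs[-1] - xs[-2]) / (xs[-1] - xs[1])


def irwin_excess(x, sigma):
    """(a) Irwin (1925): (x_(n) - x_(n-1)) / sigma, with sigma assumed known."""
    xs = sorted(x)
    return (xs[-1] - xs[-2]) / sigma


def range_spread(x):
    """(b) Sample range over s: (x_(n) - x_(1)) / s."""
    return (max(x) - min(x)) / statistics.stdev(x)


def grubbs_lower(x):
    """(c) Grubbs (1950) for a lower outlier: (xbar - x_(1)) / s."""
    return (statistics.fmean(x) - min(x)) / statistics.stdev(x)


def max_deviation(x):
    """(c) Maximized-deviation modification: max_i |x_i - xbar| / s."""
    m = statistics.fmean(x)
    return max(abs(v - m) for v in x) / statistics.stdev(x)


def _central_sum(x, power):
    # Sum of (x_i - mean)**power, using the mean of the values passed in.
    m = statistics.fmean(x)
    return sum((v - m) ** power for v in x)


def grubbs_two_upper(x):
    """(d) Grubbs (1950): SS of sample without x_(n-1), x_(n) over total SS."""
    xs = sorted(x)
    if len(xs) < 4:
        raise ValueError("need at least four observations")
    # The restricted sum of squares is taken about xbar_{n,n-1}, the mean of
    # x_(1), ..., x_(n-2), as in the definition in (d).
    return _central_sum(xs[:-2], 2) / _central_sum(xs, 2)


def sqrt_b1(x):
    """(e) Skewness: sqrt(n) * sum (x_i - xbar)^3 / [sum (x_i - xbar)^2]^(3/2)."""
    return math.sqrt(len(x)) * _central_sum(x, 3) / _central_sum(x, 2) ** 1.5


def b2(x):
    """(e) Kurtosis: n * sum (x_i - xbar)^4 / [sum (x_i - xbar)^2]^2."""
    return len(x) * _central_sum(x, 4) / _central_sum(x, 2) ** 2


if __name__ == "__main__":
    sample = [2, 4, 4, 4, 5, 5, 7, 9]
    print(dixon_excess_spread(sample))  # 0.4
    print(grubbs_two_upper(sample))     # 0.1875
    print(sqrt_b1(sample))              # approximately 0.65625
    print(b2(sample))                   # 2.78125
```

Apart from Irwin's statistic, whose $\sigma$ is held fixed, each of these ratios is unchanged when the data are multiplied by a positive constant and shifted. This gives a quick check on any implementation.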
Another omnibus statistic of relevance to the testing of outliers is the W-statistic of Shapiro and Wilk (1965, 1972) and Shapiro, Wilk, and Chen (1968) already referred to. This consists for normal data of the ratio of the square of a particular type of linear combination of all the ordered sample values to the sum of squares of the individual deviations about the sample mean.

(f) Extreme/location statistics

Another class of test statistics takes the form of ratios of extreme values to measures of location. These are particularly relevant to examining outliers where the initial model is from the gamma family of distributions. As an example, a test of discordancy for an upper outlier may use

$$x_{(n)}/\bar{x}.$$

Such statistics have been examined by Epstein (1960a, b) and Likes (1966). Some new results on the null distributions are given by Lewis and Fieller (1978).

In a review paper, Grubbs (1969) gives illustrative examples of the use of several different types of discordancy statistic, applying them to several sets of actual data.

One effect that many of these statistics must be prone to, to differing extents, is that of masking, that is to say, the tendency for the presence of extreme observations not declared as outliers to mask the discordancy of more extreme observations under investigation as outliers. The term masking is due to Murphy (1951); the phenomenon seems to have been first discussed by Pearson and Chandra Sekar (1936); some recent comments are made by McMillan (1971) and by Tietjen and Moore (1972) who give an empirical example of the masking effect. See also Section 3.2.

Some test statistics have been proposed to provide tests for several outliers simultaneously. One example is the statistic in (d) above due to Grubbs (1950) which is relevant to the case of two upper outliers. A different approach is to examine the outliers in sequence, using a hierarchical form of test. McMillan and David (1971) and McMillan (1971) describe what they call a 'sequential' test for several upper outliers (the terminology is confusing, since the test is not sequential in the usual sense of taking observations one at a time in order to draw conclusions as quickly as possible: it is in fact a fixed-sample-size test). They first examine the principal upper outlier $x_{(n)}$ by means of a deviation/spread statistic of form $[x_{(n)} - \bar{x}]/s$, where $s^2$ may be based on the sample data alone or combined with an independent estimate of variance. If $x_{(n)}$ proves to be discordant on this basis, they proceed to apply a similar test to $x_{(n-1)}$ in the reduced sample excluding $x_{(n)}$ using the statistic

$$\frac{x_{(n-1)} - \bar{x}_n}{s'}$$

where $\bar{x}_n$ and $s'$ are the sample mean, and an estimate of spread, obtained on omission of $x_{(n)}$. If the result is significant, $x_{(n-1)}$ is judged discordant and the procedure repeated for the next outlier, and so on until a non-discordant value is reached. The statistical properties of such a repeated procedure are investigated for the case where there are at most two outliers (McMillan and David, 1971; McMillan, 1971; Moran and McMillan, 1973). See Section 3.3 for fuller discussion of such 'sequential' (or as we shall call them consecutive) tests of outliers, also Tietjen and Moore (1972) and the earlier proposals by Dixon (1953) and Ferguson (1961a).

2.5 STATISTICAL PRINCIPLES UNDERLYING TESTS OF DISCORDANCY

In any context, a statistical test might be able to be constructed merely by setting up an intuitively appealing test statistic and rejecting or accepting some working hypothesis on the basis of the value of the test statistic. This is true of tests of discordancy for outliers. Indeed, the subjective basis of the outlier concept and the long history of its study have tended to encourage an informal attitude to proposals for 'outlier rejection'. At the very least, however, we need to be able to determine rejection criteria which relate to known significance levels: the distribution of the test statistic under the working hypothesis of no discordant outliers needs to be known. An initial filter thus operates: we can consider only those test statistics for which we know such null distributions.

But we must hope for more than this. To choose between rival tests of discordancy for a particular type of outlier manifestation we need to know something about the power of the rival tests. This requires both the specification of an alternative hypothesis to explain the outliers (a topic discussed at some length above) and also the ability to handle the often complicated distributional forms of the test statistic under such an alternative hypothesis. In this respect the field of viable tests of discordancy becomes even more limited.

Ideally we should wish to go even further and construct tests which at their conception seek to express optimality properties or at least to satisfy certain useful practical constraints. Thus if we cannot (as is inevitably the case for most outlier tests) obtain tests which are globally uniformly most powerful we can at least strive for local optimality or unbiasedness or the satisfaction of certain invariance properties. Alternatively we might choose to construct tests by some accredited practical method, such as the maximum likelihood ratio principle, in the hope that the frequently encountered useful characteristics of such a method transfer to the outlier problem.

We shall now consider various tests of discordancy which have some sounder basis than mere intuitive appeal. This matter will be considered more fully in the detailed discussion of tests of discordancy in Chapter 3.
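The consecutive ('sequential') procedure of McMillan and David described in Section 2.4 can be sketched as follows. The sketch also includes the extreme/location statistic $x_{(n)}/\bar{x}$ of (f). It is a minimal illustration built on several assumptions. The spread $s$ is taken from the current (reduced) sample alone, with divisor $n - 1$; it is not combined with an independent variance estimate. The critical values are a hypothetical, caller-supplied list, one per stage. The function names are our own.

```python
import statistics


def deviation_spread(sample):
    """(x_max - mean) / s for the largest value; s uses divisor n - 1 (assumption)."""
    return (max(sample) - statistics.fmean(sample)) / statistics.stdev(sample)


def extreme_location(sample):
    """(f) Extreme/location statistic x_(n) / xbar, e.g. for gamma-type data."""
    return max(sample) / statistics.fmean(sample)


def consecutive_upper_test(sample, critical_values):
    """Consecutive test of the upper outliers x_(n), x_(n-1), ...

    critical_values[j] is a hypothetical, caller-supplied critical value for
    stage j: stage 0 tests x_(n) in the full sample, stage 1 tests x_(n-1)
    in the sample with x_(n) omitted (so the mean and s' come from the
    reduced sample), and so on.  Testing stops at the first non-significant
    stage.  Returns the values judged discordant, largest first.
    """
    remaining = sorted(sample)
    discordant = []
    for critical in critical_values:
        if len(remaining) < 3:
            break  # too few observations left to estimate spread sensibly
        if deviation_spread(remaining) <= critical:
            break  # a non-discordant value has been reached
        # Significant: declare the current largest value discordant and omit it.
        discordant.append(remaining.pop())
    return discordant


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    # Hypothetical critical values, for illustration only.
    print(consecutive_upper_test(data, [1.5, 10.0]))  # [100]
```

In practice each entry of `critical_values` would come from appropriate published tables. The sketch makes no attempt to control the overall significance level of the sequence of tests. As noted above, the properties of such repeated procedures have been studied only for at most two outliers.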
We must stress that the attribution of any optimality properties is crucially dependent on the adopted form for the alternative hypothesis. The choice of an alternative hypothesis is problematical; this uncertainty in turn may reduce the utility of any apparent 'optimum' properties of a test of discordancy. We must acknowledge this dilemma in reacting to the following results.

Most study of outliers assumes an initial normal distribution, and the most common alternative hypothesis is of the slippage type. Ferguson (1961a) demonstrates that certain quasi-optimum tests of discordancy can be constructed in such situations. He considers, in the context of his model A (slippage of the mean), tests which are invariant with respect to the labelling of the observations and to changes of scale and location. (The change of scale must be effected by multiplication by a common positive quantity whilst the distribution for each individual observation may suffer a specific change of location.) We recall that model A declares (see 2.3.1)

$$\mu_i = \mu + \sigma \Delta a_{\nu_i}.$$

Ferguson shows that a locally best invariant test of size $\alpha$ exists for testing $H\colon \Delta = 0$ against the one-sided alternative $\bar{H}\colon \Delta > 0$. It takes the following form: if $\mu_3(a) > 0$ reject $H$ whenever $\sqrt{b_1} > K_1$, where $K_1$ is chosen to yield a test of size $\alpha$, and

$$\mu_3(a) = \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar{a})^3, \tag{2.5.1}$$

$$\sqrt{b_1} = \frac{\sqrt{n}\,\sum (x_i - \bar{x})^3}{\left[\sum (x_i - \bar{x})^2\right]^{3/2}} \tag{2.5.2}$$

(this is just the coefficient of skewness statistic described above).

For a two-sided test, where the alternative hypothesis is $H'\colon \Delta \neq 0$, there is a locally best unbiased invariant test of size $\alpha$ which takes the form: if $k_4(a) > 0$ reject $H$ whenever $b_2 > K_2$, where $K_2$ is chosen to yield a test of size $\alpha$, $k_4(a)$ is the fourth k-statistic of $a_1, a_2, \ldots, a_n$ and

$$b_2 = \frac{n \sum (x_i - \bar{x})^4}{\left[\sum (x_i - \bar{x})^2\right]^2} \tag{2.5.3}$$

(the coefficient of kurtosis).

With model B, the alternative hypothesis is, with

$$\sigma_i^2 = \sigma^2 \exp(\Delta a_{\nu_i}),$$

$H''\colon \Delta > 0$. Under the same invariance requirements a locally best invariant test of size $\alpha$ exists and leads to rejection if $b_2 > K$, where $K$ is chosen to yield a test of size $\alpha$.

Still restricting attention to single outliers and an initial normal distribution, there are results for slippage tests (where we have several samples from normal distributions, one or more of which may have a mean, or variance, which differs from the others; see Chapter 5) which particularize, when samples are all of size 1, to tests of discordancy on a slippage type alternative hypothesis. Paulson (1952b) considers a multi-decision formulation where under $D_0$ we decide that the observations all come from $N(\mu, \sigma^2)$ whereas under $D_i$ $(i = 1, 2, \ldots, n)$ we decide that $\mu_i$ has slipped to $\mu + a$ $(a > 0)$. Under the restrictions that if all means are $\mu$ we accept $D_0$ with probability $1 - \alpha$ and that the decision procedure is invariant with respect to the index of the slipped distribution and to positive change of scale and arbitrary change of origin, he shows there to be an optimum procedure in the sense of maximizing the probability of making the correct decision when slippage to the right has occurred. With the modification of proof described by Kudo (1956a) to cope with samples of size 1, this leads to the optimum decision rule: if

$$t = \frac{x_{(n)} - \bar{x}}{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{1/2}},$$

then when $t \le h_\alpha$ conclude no discordant outlier, whilst if $t > h_\alpha$ conclude that $x_{(n)}$ is discordant, where $h_\alpha$ is chosen to ensure that $D_0$ is adopted with probability $1 - \alpha$ in the null situation.

In Paulson (1952b) the procedure is shown to be the Bayes solution when equal prior probabilities are assigned to $D_1, D_2, \ldots, D_n$. David (1970, p. 180) shows that it has corresponding support in non-Bayesian terms using the Neyman-Pearson lemma. Kudo (1956b) demonstrates that the optimality property remains when slippage in the mean is accompanied by decrease in variance for the slipped distribution, whilst Kapur (1957) adduces an unbiasedness property for the Paulson procedure in the sense that the probability of incorrectly taking any of the decisions $D_0, D_1, \ldots, D_n$ never exceeds the probability of correctly taking any one of these decisions. David (1970, p. 182) also remarks that an obvious modification to allow for an independent estimate of $\sigma^2$ remains optimum in the Paulson sense. Kudo (1956a) extends Paulson's results to the case of slippage either to the right or to the left. Truax (1953) presents the immediately parallel results for unidirectional slippage of the variance of one of the normal distributions. A further extension, relating to simultaneous slippage of the means of two distributions by equal amounts, but in opposite directions, is given by Ramachandran and Khatri (1957).

(In all work relating to investigation of single outliers, we must remain ever vigilant to the prospect of masking if more than one outlier is present.)

In the work of Ferguson and of Paulson we note a basic distinction between the chosen measures of performance of the tests: power and maximum probability of correct action. This focuses attention on the thorny issue (largely unresolved to date) of what constitutes an appropriate performance criterion for a test for discordancy. Some comments on
the relative merits of five possible performance characteristics are made by David (1970, pp. 184-190). For a detailed discussion of these and related issues see Section 3.2.

A development of Paulson's ideas for coping with several outliers is given by Murphy (1951) and further studied by McMillan (1971). The development is limited: the alternative model is that $k$ outliers all arise from a common shift in the mean (by the same amount and in the same direction). For slippage to the right, the optimum test statistic is

$$\left(x_{(n)} + x_{(n-1)} + \cdots + x_{(n-k+1)} - k\bar{x}\right)/S.$$

There seems to be a dearth of results on optimal tests for discordancy when the initial model is non-normal. For the gamma family, statistics based on $x_{(n)}/\bar{x}$ do at least have the advantage of arising from a maximum likelihood ratio criterion using a slippage type alternative model for a single upper outlier (this is true also of $(x_{(n)} - \bar{x})/s$ when the initial distribution is normal). But beyond this we appear to be able to make few claims of statistical respectability in the sense of optimum (or even practically desirable) performance characteristics for tests of discordancy with non-normal initial distributions or multiple outliers (although Ferguson's statistics $\sqrt{b_1}$ and $b_2$ remain optimum in normal samples for multiple outliers whose number, $k$, merely satisfies the reasonable constraints $k < 0.5n$ and $k < 0.31n$, respectively). The performance of some specific (intuitively based) tests has been examined empirically, by simulation, or theoretically, and the results will be discussed where appropriate in Chapter 3. But inter-comparisons revealing uniform superiority of one test over another are conspicuously lacking.

Discussion of the behaviour of non-parametric tests for outliers (such as those proposed by Walsh, 1950, 1959, 1965) will be deferred to Chapter 8.

In concluding this section brief comment is necessary on two related general matters. Some proposed test statistics are clearly suited to one-sided tests, e.g.

$$\frac{x_{(n)} - \bar{x}}{s},$$

whilst others relate to two-sided tests, e.g.

$$\frac{\max_i |x_i - \bar{x}|}{s}.$$

David (1970, p. 175) discusses a possible conflict of choice in relation to which of these types of test should be used in practice. He remarks as follows:

Indeed, the question may be raised whether we should not always use a two-sided test, since applying a one-sided test in the direction indicated as most promising by the sample at hand is clearly not playing fair. This criticism of what is often done in practice is valid enough, except that sticking to a precise level of significance may not be crucial in exploratory work, frequently the purpose of tests for outliers. Strictly speaking, one-sided tests should be confined to the detection of outliers in cases where only those in a specified direction are of interest, or to situations such as the repeated determination of the melting point of a substance, where outliers due to impurities must be on the low side since impurities depress the melting point. Similar arguments show that it is equally incorrect to pick one's outlier test after inspection of the data.

This is a commonly expressed viewpoint but it is in opposition to the attitude we have adopted in this book. We define outliers in subjective terms relative to a particular set of data: the data themselves initiate our interest in outliers. If we declare there to be an upper outlier, $x_{(n)}$, it seems natural, therefore, to use the appropriate one-sided criterion, and to ascribe discordancy if, say

$$(x_{(n)} - \bar{x})/s > C$$

for some suitable value of $C$. Of course, if we wish to protect ourselves against other outliers we should reflect this in the choice of test statistic. Also, if our suspicions about $x_{(n)}$ are well founded then use of $\max |x_i - \bar{x}|/s$ rather than $(x_{(n)} - \bar{x})/s$ is not going to be materially important. Surely what matters is

(i) our declaration of the outliers;
(ii) the alternative model we employ.

This relates to the earlier remarks about the lack of regard for the specific outlier in the formulation of the alternative hypothesis and the test of discordancy. A few results on performance characteristics of tests do relate to this matter. David and Paulson (1965) consider the behaviour of some tests of discordancy in terms of outlier-specific criteria (included in the five discussed by David, 1970, pp. 185-186) such as the probability that a particular sample member is significantly large and is the largest one, or the probability that a particular sample member is significantly large given that it is the largest one. See again Section 3.3 for more details.

Some Bayesian methods for examining outliers have been published. Those whose prime function is to provide a means of examining outliers per se, rather than to promote general inference procedures which are robust against the presence of possible outliers, are relevant to the current discussion. Bayesian methods for accommodation of outliers are briefly discussed in Section 2.6; the general Bayesian scene in relation to outliers is described in detail in Chapter 8.

As we remarked in Section 2.3 Guttman (1973b) presents a method of 'detection of spuriosity' in a highly specific situation. The data are assumed, ab initio, to arise as independent observations from $N(\mu, \sigma^2)$. To allow for the possibility of a single observation having been 'generated by a spurious source, where the spuriosity is of the mean shift type' an alternative model is
X(n)-i it is tbe largest one. See again Section 3.3 for more details.
s Some Bayesian metbods for examining outliers bave been published. Tbose
wbose prime function is to provide a means of examining outliers per se,
whilst otbers relate to two-sided tests, e.g.
rather tban to promote generai inference procedures wbich are robust
maxlxi-il against tbe presence of possible outliers, are relevant to tbe current discus-
s sion. Bayesian methods for accommodation of outliers are briefly discussed
in Section 2.6; tbe generai Bayesian scene in relation to outliers is described
David (1970, p. 175) discusses a possible conflict of choice in relation to in detail in Cbapter 8.
wbicb of these types of test should be used in practice. He remarks as As we remarked in Section 2.3 Guttman (1973b) presents a method of
follows: 'detection of spuriosity' in a bigbly specific situation. Tbe data are assumed,
Indeed, the question may be raised whether we should not always use a two-sided ab initio, to arise as independent observations from N(~J-, u 2 ). To allow for
test, since applying a one-sided test in the direction indicated as most promising ~y the possibility of a single observation baving been 'generated by a spurious
the sample at hand is clearly not playing fair. This criticism of what is often done m source, wbere tbe spuriosity is of tbe mean sbift type' an alternative mode! is
adopted in which one observation (arising at random from the $n$ in the sample) comes from $N(\mu + a, \sigma^2)$. Starting with a non-informative prior distribution for $\mu$, $\sigma$, and $a$, the marginal posterior distribution of $a$ is determined and its form is exploited to construct a principle for determining whether 'spuriosity has or has not occurred'. Since our potential application of this approach is likely to be triggered by the occurrence of an outlier, the principle can be regarded as a means of assessing discordancy of a single outlier.

Other work in the Bayesian idiom, by Box and Tiao (1968), De Finetti (1961), Dempster and Rosner (1971), Guttman, and others will be discussed in Chapter 8.

2.6 ACCOMMODATION OF OUTLIERS: ROBUST ESTIMATION AND TESTING

There has been an increasing interest over recent years in statistical procedures which provide a measure of protection against uncertainties of knowledge of the data generating mechanism. These include robust methods for estimating or testing summary measures of the underlying distribution, where the estimators or tests retain desirable statistical properties over a range of different possible distributional forms. Alternatively, procedures which have been derived to suit the specific properties of a particular distribution become even more appealing if they can be shown to be robust (i.e. to retain worthwhile operating characteristics) when the distribution proves to be different from that which promoted the procedures. An informative review of 'robust statistics' is given by Huber (1972); other important references are Tukey (1960), Huber (1964), Bickel (1965), Jaeckel (1971a), Hampel (1974), and Hogg (1974).

An obvious area in which we may wish to seek the protection of robust statistical methods is where we encounter, or anticipate, outliers in a set of data. As one example, extreme observations clearly have an extreme effect on the value of a sample variance! If we are interested in estimating a parameter in an initial model, but are concerned about the prospects of outliers, whether arising from random execution errors of no specific relevance to our studies, or from random measurement error, we would want to use an estimator which is not likely to be highly sensitive to such outliers. A simple (if somewhat paranoid) example is to be found in the use of the sample median as an estimator of location.

We must, of course, be ever conscious of the overriding importance of the alternative hypothesis. If outliers arise because our initial model does not reflect the appropriate degree of inherent variation (we really need, say, a fatter-tailed distribution rather than the ubiquitous normal distribution initially adopted) then omission of extreme values to 'protect against outliers' is hardly a robust policy for estimating some measure of dispersion, say the variance. Rather than appropriately reducing the effect of extreme values it encourages underestimation!

If on the other hand a reasonable alternative hypothesis is of one of the types which expresses contamination of the initial model (perhaps expressing low-probability mixing, or slippage of one or two discordant values) the estimation or testing of parameters in the initial model may well be the matter of principal interest and it is sensible to employ robust procedures to protect against the occasional low-probability component or slipped value. The idea that we may wish, in this spirit, to do more than reject outliers, that is to devise statistically respectable means of accommodating them in a wider inferential scheme addressed to the initial model, takes our interest away from tests of discordancy. The outliers themselves are no longer of prime concern. We wish to proceed safely in spite of them! This is of the essence of the robustness concept.

Some interest in this alternative view of the outlier problem begins to show itself in quite early work. The ideas of Glaisher (1872), Newcomb (1886), Mendeleev (1895), Student (1927), and Jeffreys (1932) amount to reducing the weight attached to extreme values in estimation, the latter paper paying specific regard to outliers as the extreme members of the sample.

A review of later work which implicitly or explicitly attempts to accommodate outliers in the inference process conveniently divides itself into two parts. The first contains those methods of estimation which implicitly protect against outliers in placing less importance on extreme values than on other sample members. A variety of general methods of this type exist and we shall briefly examine some of these robust blanket procedures. The second part of the study of accommodation of outliers is specifically concerned with the nature of the initial model, and of the explanatory model for outliers, and derives methods of estimation or testing designed specifically to suit those models. Some examples of such specific accommodation techniques are also given below.

Blanket procedures

To illustrate robust statistical methods which provide en passant some protection against outliers, whilst not specifically concerned with outliers, we consider a variety of methods of robust estimation of a location parameter, $\mu$. We single out four for discussion.

The aim throughout is to reduce the influence of extreme observations in the sample on the value of the estimate of $\mu$. Since outliers manifest themselves as extreme observations this has the effect of protecting against their presence as well as meeting the prime object of rendering less dramatic the effect of the tail behaviour of the generating distribution on the
estimation of $\mu$. Cox and Hinkley (1974, Section 9.4) review robust estimation of location, and Andrews et al. (1972) present a major sampling study of different estimators.

One obvious way of achieving the objective is to use estimators having the form of linear combinations of ordered sample values

$$\hat{\mu} = \sum_{i=1}^{n} c_i x_{(i)}, \tag{2.6.1}$$

where the weights $c_i$ are lower in the extremes than in the body of the data set. The 'ultimate' example of such linear order statistics estimators (called 'L-estimators' by Huber, 1972) is the sample median, where $c_i = 0$ for all but the middle, or two middle, ordered observations. In contrast with the sample mean ($c_i = 1/n$; $i = 1, 2, \ldots, n$) this can on occasions effect considerable improvement: for example, for the Cauchy distribution, where an even more drastic policy of assigning negative weight to extreme observations can yield further improvement still (Barnett, 1966). Other examples are found in the use of $\alpha$-trimmed means, where a prescribed proportion, $\alpha$, of the lower and upper ordered sample values are omitted and $\mu$ estimated by the (unweighted) average of the retained values, or in the use of specific combinations of a few sample quantiles (for example see Gastwirth, 1966).

Among other proposed robust estimators of $\mu$ which, en passant, help to eliminate the effect of outliers we have the Winsorized mean, and estimators using the principle of 'jackknifing'. An example of a Winsorized mean is obtained by 'collapsing' the most extreme (upper and lower) sample values to their nearest neighbours in the ordered sample and taking an unweighted average from the modified sample. This is just another L-estimator, with $c_1 = c_n = 0$, $c_2 = c_{n-1} = 2/n$, $c_i = 1/n$ ($i = 3, 4, \ldots, n-2$). (See Tukey and McLaughlin, 1963, for a discussion of Winsorized, and trimmed, means.)

The principle of jackknifing was originally proposed by Quenouille (1956) as a method of reducing bias in estimators; the term was introduced by Tukey. Suppose $\hat{\mu}_n$ is an estimator of $\mu$ to be evaluated from a sample of size $n$, and $\hat{\mu}_{n-1}$ the corresponding estimator evaluated from a sample of size $n - 1$. Considering the random sample $x_1, x_2, \ldots, x_n$ we have a single value for $\hat{\mu}_n$ from the whole sample but $n$ possible values $\hat{\mu}_{n-1,i}$ for $\hat{\mu}_{n-1}$, each obtained on omission of a single observation $x_i$ ($i = 1, 2, \ldots, n$). We define jackknifed pseudo-values

$$Z_{ni} = n\hat{\mu}_n - (n-1)\hat{\mu}_{n-1,i} \qquad (i = 1, 2, \ldots, n) \tag{2.6.2}$$

and construct an estimator in terms of these values. The principal advantages of this approach are that it may reduce bias in estimators and can yield useful estimates of their variances. This latter facility is the major advantage of jackknifing.

Two other general principles of robust estimation yield what Huber (1972) calls maximum likelihood type estimators (M-estimators) and rank test estimators (R-estimators).

M-estimators are obtained by solving an equation of the form

$$\sum_{i=1}^{n} \psi(x_i - \hat{\mu}) = 0 \tag{2.6.3}$$

to obtain an estimator $\hat{\mu}$, where $x_1, x_2, \ldots, x_n$ is a random sample and $\psi(u)$ is some weight function with desirable features. For example, if $|\psi(u)|$ is small for large $|u|$, it will discount extreme sample values and protect against outliers. The particular choice $\psi(u) = f'(u)/f(u)$, where the distribution has probability density function of the form $f(x - \mu)$, will of course yield $\hat{\mu}$ as a solution of the likelihood equation.

Hodges and Lehmann (1963) were the first to remark that estimators of $\mu$ could be obtained from certain rank test procedures, such as the Wilcoxon test. Such R-estimators often prove to be robust. An example is given by Huber (1972), who considers the two-sample rank test for location shift. If the samples are $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$, the test statistic is

$$W = \sum_{i=1}^{2n} J\!\left(\frac{i}{2n+1}\right) V_i, \tag{2.6.4}$$

where $J[i/(2n+1)]$ is some suitably chosen function of the empirical distribution function for the combined sample, and $V_i = 1$ if the $i$th ordered value in the combined sample is one of the $x$-values (otherwise $V_i = 0$). We can derive an estimator $\hat{\mu}$ as the solution of

$$W(x_1 - \hat{\mu}, x_2 - \hat{\mu}, \ldots, x_n - \hat{\mu};\; -x_1 + \hat{\mu}, -x_2 + \hat{\mu}, \ldots, -x_n + \hat{\mu}) = 0 \tag{2.6.5}$$

and the asymptotic behaviour of $\hat{\mu}$ is obtained from the power function of the test. For symmetric distributions R-estimators can, for an appropriate choice of $J(\cdot)$, be asymptotically efficient and asymptotically normally distributed.

… peculiarities of a particular set of data. Adaptive estimators for outlier protection have not been widely studied, but some possibilities have been considered. For example, the trimming factor $\alpha$ in the $\alpha$-trimmed mean might be chosen to exclude outliers in the data (Jaeckel, 1971a). Takeuchi (1971) and Johns (1974) consider estimators based on weighted combinations of the sums of subgroups of ordered sample values, with the weights determined empirically. Somewhat similar is the simplified version of the Hodges-Lehmann estimator due to Bickel and Hodges (1967): the median of the quasi-mid-ranges

$$\tfrac{1}{2}\left(x_{(i)} + x_{(n+1-i)}\right) \qquad (i = 1, 2, \ldots, [n/2]).$$

Specific accommodation techniques

A number of proposals have been made for taking specific account of outliers in the estimation or testing of parameters in the initial probability model. We shall consider two of these in detail.
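
The L-estimators of (2.6.1) differ only in their weights $c_i$. The following Python sketch (an illustration, not a procedure taken from the sources cited) builds the weight vectors for the sample mean, the sample median, the $\alpha$-trimmed mean and the once-Winsorized mean described above, and applies them to the ordered sample. The function names and data are hypothetical, and trimming $\lfloor \alpha n \rfloor$ observations from each end is an assumed convention for non-integer $\alpha n$.

```python
from typing import List, Sequence


def l_estimate(sample: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum_i c_i * x_(i), the L-estimator (2.6.1), for the ordered sample."""
    ordered = sorted(sample)
    if len(weights) != len(ordered):
        raise ValueError("need exactly one weight per observation")
    return sum(c * x for c, x in zip(weights, ordered))


def mean_weights(n: int) -> List[float]:
    # Sample mean: c_i = 1/n for every ordered observation.
    return [1.0 / n] * n


def median_weights(n: int) -> List[float]:
    # Sample median: all weight on the middle (or two middle) order statistics.
    w = [0.0] * n
    if n % 2 == 1:
        w[n // 2] = 1.0
    else:
        w[n // 2 - 1] = w[n // 2] = 0.5
    return w


def trimmed_mean_weights(n: int, alpha: float) -> List[float]:
    # alpha-trimmed mean: drop floor(alpha * n) values at each end (assumed
    # convention) and give the retained values equal weight.
    g = int(alpha * n)
    kept = n - 2 * g
    if kept < 1:
        raise ValueError("alpha too large for this sample size")
    return [0.0] * g + [1.0 / kept] * kept + [0.0] * g


def winsorized_mean_weights(n: int) -> List[float]:
    # Once-Winsorized mean: c_1 = c_n = 0, c_2 = c_{n-1} = 2/n, other c_i = 1/n.
    if n < 4:
        raise ValueError("need at least four observations")
    w = [1.0 / n] * n
    w[0] = w[-1] = 0.0
    w[1] = w[-2] = 2.0 / n
    return w


x = [2.1, 1.9, 2.0, 2.2, 1.8, 9.5]  # hypothetical data with one upper outlier
n = len(x)
for name, w in [("mean", mean_weights(n)), ("median", median_weights(n)),
                ("20% trimmed", trimmed_mean_weights(n, 0.2)),
                ("Winsorized", winsorized_mean_weights(n))]:
    print(f"{name:>12}: {l_estimate(x, w):.4f}")
```

On these made-up data the mean is pulled up to 3.25 by the single value 9.5, whereas the median, the 20% trimmed mean and the Winsorized mean all give 2.05.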
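
The pseudo-values (2.6.2) can be computed for any estimator supplied as a function. In this minimal Python sketch the jackknife estimate is taken to be the mean of the pseudo-values and its variance estimate to be their sample variance divided by $n$; these are the usual choices, but the text above does not fix them, so treat them as assumptions. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def jackknife(sample: Sequence[float],
              estimator: Callable[[List[float]], float]) -> Tuple[float, float, List[float]]:
    """Return (jackknife estimate, variance estimate, pseudo-values Z_ni)."""
    xs = list(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    full = estimator(xs)  # mu_hat_n from the whole sample
    # Z_ni = n * mu_hat_n - (n - 1) * mu_hat_{n-1,i}, omitting x_i each time.
    pseudo = [n * full - (n - 1) * estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    estimate = sum(pseudo) / n
    # Assumed variance estimate: sample variance of the pseudo-values over n.
    variance = sum((z - estimate) ** 2 for z in pseudo) / (n * (n - 1))
    return estimate, variance, pseudo


def mean(v: List[float]) -> float:
    return sum(v) / len(v)


def plug_in_variance(v: List[float]) -> float:
    # Biased variance estimator with divisor n.
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)


est, var, _ = jackknife([1.0, 2.0, 3.0, 4.0], plug_in_variance)
print(est, var)
```

For the sample mean the pseudo-values reduce to the observations themselves, so the jackknife reproduces $\bar{x}$ with variance estimate $s^2/n$. Applied to the divisor-$n$ variance it removes the bias, returning the usual divisor-$(n-1)$ variance (5/3 for the data 1, 2, 3, 4), which illustrates the bias-reduction role mentioned above.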
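
Equation (2.6.3) can be solved numerically once $\psi$ is chosen. The text leaves $\psi$ open; this Python sketch assumes Huber's $\psi(u) = \max(-k, \min(k, u))$ with tuning constant $k = 1.5$ in the units of the data, and, as in (2.6.3), uses no scale estimate (in practice residuals are often divided by one first). Because this $\psi$ is non-decreasing, the left side of (2.6.3) is non-increasing in $\hat{\mu}$ and changes sign between the smallest and largest observations, so bisection finds a root; a redescending $\psi$ would need a different solver. Names and data are hypothetical.

```python
from typing import Callable, Sequence


def huber_psi(u: float, k: float = 1.5) -> float:
    """Huber's psi: linear near zero, clipped at +/-k (assumed choice of psi)."""
    return max(-k, min(k, u))


def m_estimate(sample: Sequence[float],
               psi: Callable[[float], float] = huber_psi,
               tol: float = 1e-10,
               max_iter: int = 200) -> float:
    """Solve sum_i psi(x_i - mu) = 0 for mu by bisection, assuming psi is
    non-decreasing with psi(0) = 0."""
    xs = list(sample)
    lo, hi = min(xs), max(xs)  # score(lo) >= 0 >= score(hi)

    def score(mu: float) -> float:
        return sum(psi(x - mu) for x in xs)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


x = [2.1, 1.9, 2.0, 2.2, 1.8, 9.5]  # hypothetical data
print(m_estimate(x))                   # Huber psi, k = 1.5
print(m_estimate(x, psi=lambda u: u))  # psi(u) = u: the sample mean
```

Here the Huber estimate is 2.3: only the value 9.5 lies more than $k = 1.5$ from the solution, so its contribution is clipped to 1.5 and the other five residuals must sum to $-1.5$, giving $(10.0 + 1.5)/5$. With $\psi(u) = u$ the equation is the normal likelihood equation and the root is the sample mean, 3.25.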
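
The Hodges-Lehmann estimator associated with the one-sample Wilcoxon test is usually computed as the median of the Walsh averages $(x_i + x_j)/2$, $i \le j$; the Bickel and Hodges simplification described above takes the median of only the $[n/2]$ quasi-mid-ranges. This Python sketch computes both directly rather than by solving (2.6.5); the function names and data are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import median
from typing import Sequence


def hodges_lehmann(sample: Sequence[float]) -> float:
    """Median of all Walsh averages (x_i + x_j)/2 with i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]
    return median(walsh)


def bickel_hodges(sample: Sequence[float]) -> float:
    """Median of the quasi-mid-ranges (x_(i) + x_(n+1-i))/2, i = 1, ..., [n/2]."""
    xs = sorted(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # 0-based indexing: pair xs[i] with xs[n - 1 - i] for i = 0, ..., n//2 - 1.
    quasi_mid_ranges = [(xs[i] + xs[n - 1 - i]) / 2 for i in range(n // 2)]
    return median(quasi_mid_ranges)


x = [2.1, 1.9, 2.0, 2.2, 1.8, 9.5]  # hypothetical data
print(hodges_lehmann(x), bickel_hodges(x))
```

Both give 2.05 (up to rounding) for these data. The quasi-mid-ranges are 5.65, 2.05 and 2.05, so only the first, which pairs the outlier with the smallest value, is affected, and the median discards it.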

The paper by Anscombe (1960a) expounds a basic philosophy in the matter of accommodating outliers and applies this to a range of different situations. Although he talks of 'rejection rules' for outliers these are really methods of estimating parameters in the presence of outliers, rather than tests of discordancy.

Illustrating his ideas for a random sample $x_1, x_2, \ldots, x_n$ from $N(\mu, \sigma^2)$, where $\sigma^2$ is known, where $\mu$ is to be estimated, and where there may be a single outlier, he proposes a rule: if $x_M$ maximizes $|x_i - \bar{x}|$ ($i = 1, 2, \ldots, n$), reject $x_M$ if $\max_i |x_i - \bar{x}| > C\sigma$ for some suitable choice of $C$; otherwise reject no observations. Estimate $\mu$ by the mean of the retained observations:

$$\hat{\mu} = \begin{cases} \bar{x} & \text{if } \max_i |x_i - \bar{x}| < C\sigma, \\[4pt] \bar{x} - \dfrac{x_M - \bar{x}}{n-1} & \text{if } \max_i |x_i - \bar{x}| > C\sigma. \end{cases}$$

For more than one outlier he suggests repeated use of this rule. That is, apply the rejection criterion until no further sample values are rejected and estimate $\mu$ by the mean of the remaining observations.

Anscombe goes on to consider the properties of such a rule with respect to the 'premium' payable and the 'protection' afforded: essentially the loss of efficiency under the basic model and the gain in efficiency under the alternative model (see Section 4.1). The rule is extended to deal with more complex sets of data, including results from a factorial design experiment (see Section 7.1.2).

Others have adopted a similar premium-protection approach to the handling of outliers. In particular Guttman and Smith (1969, 1971) examine further the problems of determining the premium and the protection levels for the Anscombe rejection rule and extend the investigation to situations where the outlier is not rejected but instead the sample is Winsorized (or modified Winsorization takes place, with the observation yielding the maximum value of $|x_i - \bar{x}|$ being replaced by the closer of the two values $\bar{x} \pm C\sigma$).

A different approach is adopted by Kale and Sinha (1971), Veale and Kale (1972), Sinha (1973a, 1973b), and Kale (1975c). Having an interest in estimating or testing the value of the scale parameter in an exponential distribution, Kale and Sinha (1971) postulate an exchangeable alternative hypothesis to account for a possible outlier. The working hypothesis declares that $x_1, x_2, \ldots, x_n$ arise at random from a distribution with probability density function

$$f(x, \theta) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right), \tag{2.6.6}$$

whilst under the alternative hypothesis one of the observations (which is equally likely to be any particular one) arises from the distribution $f(x, \theta/b)$, where $0 < b < 1$. The estimators considered are L-estimators (linear combinations of ordered sample values). It is shown that the observation most likely to be the aberrant one is $x_{(n)}$ and that an optimum estimator of $\theta$ (minimizing the mean square error) based on the first $m < n$ ordered sample values has the form

$$\hat{\theta}_m = \frac{1}{m+1}\left[\sum_{i=1}^{m-1} x_{(i)} + (n-m+1)\,x_{(m)}\right]. \tag{2.6.7}$$

No firm prescription is given about choice of $m$.

Veale and Kale (1972) consider tests of $H: \theta = 1$ versus $\bar{H}: \theta > 1$ based on $\hat{\theta}_m$ as test statistic. The case of $\hat{\theta}_{n-1}$ is considered in detail with regard to test power and extensions of the Anscombe concepts of premium and protection. Sinha (1973a) extends the study of the efficiency of the estimator $\hat{\theta}_m$ and elsewhere (1973c) he considers the implications of the exchangeable type model for estimation of scale and location parameters simultaneously in a two-parameter exponential distribution.

The estimator $\hat{\theta}_m$ has the form of a Winsorized mean, and was advanced on a premium-protection basis. In Kale (1975c) the maximum likelihood method is applied to the same type of situation but where the initial, and aberrant, distributions ($F$ and $G$ in the notation of Section 2.3) can have a more general form: they may be any members of the single-parameter exponential family. The possibility of more than one outlier (observation from $G$) is also entertained. Maximum likelihood estimators are shown to have the form of trimmed means, rather than Winsorized means.

The first attempt to specifically accommodate the prospect of outliers in estimation or testing situations seems to be that of Dixon (1953). Other aspects of the problem include the use of Bayesian methods (Gebhardt, 1964, 1966; Sinha, 1972, 1973b) and explicit study of the estimation of $\mu$ from samples of size 3, where usually $\mu$ is estimated from the two closest observations (Seth, 1950; Lieblein, 1952, 1962; Willke, 1966; Anscombe and Barron, 1966; Veale and Huntsberger, 1969; Guttman and Smith, 1969; Desu, Gehan, and Severo, 1974).

Fuller details and illustrations of methods for accommodating outliers in statistical analyses are given in Chapter 4.
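
Anscombe's rule, applied repeatedly as suggested for several outliers, and the modified Winsorization studied by Guttman and Smith can be sketched in Python as follows for the case of known $\sigma$. The choice of $C$ is left to the user, as in the text. Treating the boundary case $\max_i |x_i - \bar{x}| = C\sigma$ as 'no rejection', and estimating $\mu$ after modified Winsorization by the mean of the modified sample, are assumptions of this sketch; the function names and data are hypothetical.

```python
from typing import List, Sequence, Tuple


def anscombe_rule(sample: Sequence[float], sigma: float,
                  c: float) -> Tuple[float, List[float]]:
    """Repeatedly reject the observation farthest from the current mean while
    its distance exceeds c * sigma; return (estimate of mu, rejected values)."""
    retained = list(sample)
    rejected: List[float] = []
    while len(retained) > 1:
        xbar = sum(retained) / len(retained)
        x_m = max(retained, key=lambda v: abs(v - xbar))  # x_M
        if abs(x_m - xbar) <= c * sigma:
            break  # no (further) rejection
        retained.remove(x_m)
        rejected.append(x_m)
    return sum(retained) / len(retained), rejected


def modified_winsorized_mean(sample: Sequence[float], sigma: float,
                             c: float) -> float:
    """Replace x_M by the closer of xbar +/- c*sigma if it lies beyond that
    band, then average the modified sample (single application)."""
    xs = list(sample)
    xbar = sum(xs) / len(xs)
    i_m = max(range(len(xs)), key=lambda i: abs(xs[i] - xbar))
    if abs(xs[i_m] - xbar) > c * sigma:
        xs[i_m] = xbar + c * sigma if xs[i_m] > xbar else xbar - c * sigma
    return sum(xs) / len(xs)


x = [2.1, 1.9, 2.0, 2.2, 1.8, 9.5]  # hypothetical data, sigma assumed known
print(anscombe_rule(x, sigma=0.2, c=2.5))
print(modified_winsorized_mean(x, sigma=0.2, c=2.5))
```

With $C\sigma = 0.5$ the first pass rejects 9.5, in agreement with the single-step formula $3.25 - 6.25/5 = 2.0$; the remaining values all lie within 0.2 of their mean 2.0, so the rule stops there. Modified Winsorization instead replaces 9.5 by $\bar{x} + C\sigma = 3.75$, giving $13.75/6 \approx 2.29$.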
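
Estimator (2.6.7) needs only the ordered sample and $m$. A direct Python transcription follows; the function name and data are hypothetical. For comparison it also allows $m = n$, where (2.6.7) reduces to $\sum x_i/(n+1)$, although the text considers $m < n$. Written this way the Winsorized-mean form is visible: the $n - m + 1$ largest observations are all replaced by $x_{(m)}$ before summing.

```python
from typing import Sequence


def theta_hat(sample: Sequence[float], m: int) -> float:
    """(2.6.7): [sum_{i<m} x_(i) + (n - m + 1) x_(m)] / (m + 1)."""
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    total = sum(xs[: m - 1]) + (n - m + 1) * xs[m - 1]  # x_(m) is xs[m - 1]
    return total / (m + 1)


x = [0.5, 1.2, 0.3, 2.0, 0.8, 12.0]  # hypothetical lifetimes, one suspiciously large
for m in range(3, len(x) + 1):
    print(m, round(theta_hat(x, m), 4))
```

For these data $\hat{\theta}_m$ stays between 1.0 and about 1.13 for $m = 3, 4, 5$ but jumps to 2.4 when $m = n$ lets the value 12.0 enter in full.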
The paper by Anscombe (1960a) expounds a basic philosophy in the matter of accommodating outliers and applies this to a range of different situations. Although he talks of 'rejection rules' for outliers, these are really methods of estimating parameters in the presence of outliers rather than tests of discordancy.

Illustrating his ideas for a random sample $x_1, x_2, \ldots, x_n$ from $N(\mu, \sigma^2)$, where $\sigma^2$ is known, where $\mu$ is to be estimated, and where there may be a single outlier, he proposes a rule: if $x_M$ maximizes $|x_i - \bar{x}|$ $(i = 1, 2, \ldots, n)$, reject $x_M$ if $\max_i |x_i - \bar{x}| > C\sigma$ for some suitable choice of $C$; otherwise reject no observations. Estimate $\mu$ by the mean of the retained observations:

$$\hat{\mu} = \begin{cases} \bar{x} & \text{if } \max_i |x_i - \bar{x}| \le C\sigma, \\ \bar{x} - (x_M - \bar{x})/(n-1) & \text{if } \max_i |x_i - \bar{x}| > C\sigma. \end{cases}$$

For more than one outlier he suggests repeated use of this rule. That is, apply the rejection criterion until no further sample values are rejected and estimate $\mu$ by the mean of the remaining observations.

Anscombe goes on to consider the properties of such a rule with respect to the 'premium' payable and the 'protection' afforded: essentially the loss of efficiency under the basic model and the gain in efficiency under the alternative model (see Section 4.1). The rule is extended to deal with more complex sets of data, including results from a factorial design experiment (see Section 7.1.2).

Others have adopted a similar premium-protection approach to the handling of outliers. In particular Guttman and Smith (1969, 1971) examine further the problems of determining the premium and the protection levels for the Anscombe rejection rule. They extend the investigation to situations where the outlier is not rejected but instead the sample is Winsorized (or modified Winsorization takes place, with the observation yielding the maximum value of $|x_i - \bar{x}|$ being replaced by the closer of the two values $\bar{x} \pm C\sigma$).

A different approach is adopted by Kale and Sinha (1971), Veale and Kale (1972), Sinha (1973a, 1973b), and Kale (1975c). Having an interest in estimating or testing the value of the scale parameter in an exponential distribution, Kale and Sinha (1971) postulate an exchangeable alternative hypothesis to account for a possible outlier. The working hypothesis declares that $x_1, x_2, \ldots, x_n$ arise at random from a distribution with probability density function

$$f(x, \theta) = \frac{1}{\theta}\exp\left(-\frac{x}{\theta}\right), \tag{2.6.6}$$

whilst under the alternative hypothesis one of the observations (which is equally likely to be any particular one) arises from the distribution $f(x, \theta/b)$, where $0 < b < 1$. The estimators considered are L-estimators (linear combinations of ordered sample values). It is shown that the observation most likely to be the aberrant one is $x_{(n)}$. An optimum estimator of $\theta$ (minimizing the mean square error) based on the first $m < n$ ordered sample values has the form

$$\hat{\theta}_m = \frac{1}{m+1}\left[\sum_{i=1}^{m-1} x_{(i)} + (n-m+1)\,x_{(m)}\right]. \tag{2.6.7}$$

No firm prescription is given about choice of $m$.

Veale and Kale (1972) consider tests of $H: \theta = 1$ versus $\bar{H}: \theta > 1$ based on $\hat{\theta}_m$ as test statistic. The case of $\hat{\theta}_{n-1}$ is considered in detail with regard to test power and extensions of the Anscombe concepts of premium and protection. Sinha (1973a) extends the study of the efficiency of the estimator $\hat{\theta}_m$. Elsewhere (1973c) he considers the implications of the exchangeable type model for estimation of scale and location parameters simultaneously in a two-parameter exponential distribution.

The estimator $\hat{\theta}_m$ has the form of a Winsorized mean, and was advanced on a premium-protection basis. In Kale (1975c) the maximum likelihood method is applied to the same type of situation, but where the initial and aberrant distributions ($F$ and $G$ in the notation of Section 2.3) can have a more general form, namely any members of the single-parameter exponential family. The possibility of more than one outlier (observation from $G$) is also entertained. Maximum likelihood estimators are shown to have the form of trimmed means, rather than Winsorized means.

The first attempt to specifically accommodate the prospect of outliers in estimation or testing situations seems to be that of Dixon (1953). Other aspects of the problem include the use of Bayesian methods (Gebhardt, 1964, 1966; Sinha, 1972, 1973b). They also include explicit study of the estimation of $\mu$ from samples of size 3, where usually $\mu$ is estimated from the two closest observations (Seth, 1950; Lieblein, 1952, 1962; Willke, 1966; Anscombe and Barron, 1966; Veale and Huntsberger, 1969; Guttman and Smith, 1969; Desu, Gehan, and Severo, 1974).

Fuller details and illustrations of methods for accommodating outliers in statistical analyses are given in Chapter 4.
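The two estimation procedures described above are simple enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not the authors' implementation.

- `anscombe_estimate` applies Anscombe's rule repeatedly to a normal sample with known $\sigma$ and a user-chosen $C$. The text leaves the choice of $C$ open.
- `theta_hat_m` evaluates (2.6.7) for a user-chosen $m$ with $1 \le m < n$.

Both function names, and the sample values in the usage lines, are hypothetical.

```python
import math


def anscombe_estimate(xs, sigma, C):
    """Anscombe-style rule for estimating mu from a N(mu, sigma^2) sample
    with sigma known, applied repeatedly until nothing more is rejected.
    Returns (estimate of mu, list of retained observations)."""
    retained = list(xs)
    while len(retained) > 1:
        n = len(retained)
        xbar = math.fsum(retained) / n
        # x_M is the observation that maximizes |x_i - xbar|
        m = max(range(n), key=lambda i: abs(retained[i] - xbar))
        if abs(retained[m] - xbar) <= C * sigma:
            break  # max |x_i - xbar| <= C*sigma: reject no (further) observations
        # Dropping x_M leaves mean xbar - (x_M - xbar)/(n - 1)
        retained.pop(m)
    return math.fsum(retained) / len(retained), retained


def theta_hat_m(xs, m):
    """Estimator (2.6.7) of the exponential scale parameter theta, built
    from the m smallest order statistics (requires 1 <= m < n)."""
    x = sorted(xs)
    n = len(x)
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    # x[:m-1] holds x_(1), ..., x_(m-1); x[m-1] is x_(m)
    return (math.fsum(x[:m - 1]) + (n - m + 1) * x[m - 1]) / (m + 1)


# Illustrative, made-up inputs; sigma, C and m are arbitrary choices here.
mu_hat, kept = anscombe_estimate([9.8, 10.1, 10.0, 9.9, 10.2, 13.0], sigma=0.2, C=2.5)
print(mu_hat, kept)
print(theta_hat_m([1.0, 2.0, 3.0, 4.0, 20.0], m=4))
```

In (2.6.7) the $n - m + 1$ largest values are all replaced by $x_{(m)}$, which is why $\hat{\theta}_m$ has the form of a Winsorized mean (up to the divisor $m + 1$). Setting `m = n - 1` gives the $\hat{\theta}_{n-1}$ case studied by Veale and Kale (1972).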
CHAPTER 3

Discordancy Tests for Outliers in Univariate Samples

We have now discussed in some detail the meaning of the term 'outlier', the nature of the outlier problem, and the variety of contexts in which outliers can arise. We have also drawn the distinction between different types of action which may be called for in response to an outlier:

- rejection, and omission from the subsequent analysis;
- adjustment of its value for purposes of estimation from the whole sample;
- using it as a clue to the existence of some previously unsuspected and possibly interesting factor;
- interpreting it as a signal to find a more appropriate model for the data.

A detection procedure is a prerequisite to all of these. It is indispensable to all except the value adjustment procedure, which can, if so desired, be carried through automatically whatever the values of the observations. The detection procedure is a statistical test, termed here a test of discordancy, to decide whether or not the outlier is to be regarded as a member of the main population. Such tests are very often referred to in the literature as tests for the rejection of outliers. But, as we have stressed, rejection is not the only course open when an observation is detected as foreign to the main data set. In this and later chapters we deal with tests of discordancy in different situations. We start in this chapter with the simplest situation: when the data, with the possible exception of any outliers, form a sample from a univariate distribution from a prescribed family (for example, gamma of unknown parameter; exponential; normal; normal with known variance).

It is not difficult in any particular situation to propose reasonable-looking test statistics. (It may be quite another matter, of course, to do the following: ascertain the critical values or percentage points against which the value of any such statistic should be judged; determine the distribution of the statistic on the assumption that the outlier is consistent with the rest of the data; and assess the advantages and disadvantages of the test procedure.) Suppose, for example, that a civil engineer wishes to find the mean crushing strength of cement made from a particular mix. He makes up a set of ten test cubes from the mix, allows a suitable hardening period, and then determines their strengths in p.s.i. to be as follows:

790, 750, 910, 650, 990, 630, 1290, 820, 860, 710.

The value 1290 seems to him to be out of line with the other nine values and he wishes to test it as an outlier. What test criteria might he use? The choice depends in the first place on the form of the distribution of crushing strengths of similar cubes from the same mix. Experience suggests that crushing strengths are normally distributed. However, the mean and variance will not be known.

As regards the test criterion, we must surely expect that units of measurement are irrelevant: that any test is invariant with respect to changes of scale and origin in the data. For example, the ten values in the sample

4.79, 4.75, 4.91, 4.65, 4.99, 4.63, 5.29, 4.82, 4.86, 4.71

are linear transforms of the ten values in the first sample: each equals the corresponding first-sample value divided by 1000, plus 4. Any test for 1290 as an outlier in the first sample must give the same results when applied to 5.29 viewed as an outlier in the second sample. Had, say, an exponential distribution been assumed instead of a normal distribution, a test procedure would only need to remain unaltered under changes in scale, not shifts in origin. This is because practical support for a single (scale) parameter exponential model rests on a natural origin of measurement (see below).

Using the notation introduced in Section 2.3, let us arrange the ten values in ascending order and name them $x_{(1)}, x_{(2)}, \ldots, x_{(10)}$ respectively:

$$\begin{array}{cccccccccc} x_{(1)} & x_{(2)} & x_{(3)} & x_{(4)} & x_{(5)} & x_{(6)} & x_{(7)} & x_{(8)} & x_{(9)} & x_{(10)} \\ 630 & 650 & 710 & 750 & 790 & 820 & 860 & 910 & 990 & 1290 \end{array}$$

The figure shows these ten values as points on a line:

[Figure: the ten crushing strengths plotted as points on a line; axis marked 600, 800, 1000, 1200.]

The outlier $x_{(10)}$ appears aberrant because it is 'widely separated' from the remainder of the sample in relation to the spread of the sample. This leads one to think of test statistics of the form $N/D$. Here the numerator $N$ is a measure of the separation of $x_{(10)}$ from the remainder of the sample, and the denominator $D$ is a measure of the spread of the sample. For the reason given above, $D$ must be of the same dimensions as $N$, i.e. in this example $D$ and $N$ would both be in p.s.i.

For $N$ one might consider using the separation of $x_{(10)}$ from its nearest neighbour $x_{(9)}$, i.e. $x_{(10)} - x_{(9)} = 300$. Or again one might use the separation of $x_{(10)}$ from the other nine values considered as a group, say specifically from their mean $\bar{x}' = 790$. For $D$ one might use the range of this group, $x_{(9)} - x_{(1)} = 360$, or the spacing $x_{(9)} - x_{(8)} = 80$, which is markedly less than $x_{(10)} - x_{(9)}$. Or perhaps one might use the standard deviation $s' = 119$ of the nine values. These considerations suggest as possible test statistics such
quantities as

$$\gamma(9, 10; 1, 9) = \frac{x_{(10)} - x_{(9)}}{x_{(9)} - x_{(1)}} \quad \left(\text{value here} = \frac{300}{360} = 0.83\right),$$

$$\gamma(9, 10; 8, 9) = \frac{x_{(10)} - x_{(9)}}{x_{(9)} - x_{(8)}} \quad \left(\text{value here} = \frac{300}{80} = 3.75\right),$$

$$T' = \frac{x_{(10)} - \bar{x}'}{s'} \quad \left(\text{value here} = \frac{500}{119} = 4.20\right).$$

Statistics of the form

$$\gamma(r, s; p, q) = \frac{x_{(s)} - x_{(r)}}{x_{(q)} - x_{(p)}}, \tag{3.0.1}$$

which we shall term Dixon statistics, have been investigated by Dixon (1950, 1951), Likeš (1966) and others. Some percentage points have been tabulated by Dixon; the $\gamma$-notation is due to Likeš. An attractive alternative is to judge the outlier by the ratio of the spacing $x_{(10)} - x_{(9)}$ to the range of all ten values including the outlier, giving

$$\gamma(9, 10; 1, 10) = \frac{x_{(10)} - x_{(9)}}{x_{(10)} - x_{(1)}} \quad \left(\text{value here} = \frac{300}{660} = 0.45\right).$$

But this is effectively the same statistic as $\gamma(9, 10; 1, 9)$, since clearly

$$\frac{1}{\gamma(9, 10; 1, 10)} - \frac{1}{\gamma(9, 10; 1, 9)} = 1.$$

In a similar way, consider the statistic

$$T = \frac{x_{(10)} - \bar{x}}{s} \quad \left(\text{value here} = \frac{1290 - 840}{194} = 2.32\right),$$

where $\bar{x}$, $s$ are the mean and standard deviation of all ten values including the outlier. It is equivalent to $T'$ since the two quantities are functionally related; in fact (see Section 3.1)

$$\frac{(n-1)^2}{nT^2} - \frac{n(n-2)}{(n-1)T'^2} = 1. \tag{3.0.2}$$

Properties of the test based on $T'$ (or $T$) have been discussed by Pearson and Chandra Sekar (1936), Grubbs (1950), and others. Tables of percentage points are given by Grubbs in the same reference.

As remarked earlier, it is easy to propose other test statistics for the above outlier example, for instance

$$\frac{x_{(10)} - \bar{x}}{x_{(10)} - x_{(1)}}.$$

To the best of our knowledge the properties of this statistic have not been studied and no percentage points are available, so no practical use can be made of it. This is probably no great loss, as there are reasons to believe that it has no particular advantages.

Consider now a different example. The table shows the lengths of stay (in days) of 92 patients in a hospital observation ward before they were transferred to a main ward (data by kind permission of J. Hoenig: Hoenig and Crotty, 1958, refers).

$$\begin{array}{l|cccccccccccc|c} \text{Length of stay in days} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 21 & \text{Total} \\ \hline \text{Number of patients} & 11 & 18 & 28 & 8 & 12 & 5 & 5 & 1 & 1 & 1 & 1 & 1 & 92 \end{array}$$

Regarding $x_{(92)} = 21$ as an outlier, what criterion might be used for identifying it? The assumption of a normal distribution for the lengths of stay is not plausible, but there is some evidence to support the use of a gamma distribution with origin at zero. With such a distribution, a test criterion is required to be invariant under changes of scale. As before, we look for test statistics of the form $N/D$. Here $N$ measures the separation of the outlier from the rest of the sample, and $D$, in the same units as $N$, measures the spread of the sample.

Suppose the underlying gamma distribution (denoted $\Gamma(r, \lambda)$) has parameters $r$ and $\lambda$, i.e. it has probability density function

$$f(x) = \frac{\lambda(\lambda x)^{r-1}e^{-\lambda x}}{\Gamma(r)}. \tag{3.0.3}$$

Then its mean is $r/\lambda$ and its variance is $r/\lambda^2$. If $r$ is known but $\lambda$ is unknown, then the spread of the distribution can be measured not only by the sample standard deviation $s$. It can also be measured by the sample mean $\bar{x}$, or equivalently by the sample sum $\sum x_i$. This suggests that a useful statistic for identifying the upper outlier $x_{(92)}$ would be $x_{(92)}/\bar{x}$, or equivalently

$$\frac{x_{(92)}}{\sum x_i},$$

where $\sum x_i$ is the sum of all 92 observations. As with $T$ and $T'$ in the normal case discussed above, the statistic is functionally related to, and hence equivalent to,

$$\frac{x_{(92)}}{\sum' x_i},$$

where $\sum' x_i$ is the sum of the 91 observations omitting the outlier. This statistic would of course have been inappropriate for judging an upper outlier in a normal sample. On the other hand, statistics of Dixon's
type such as

$$\gamma(91, 92; 1, 91) = \frac{x_{(92)} - x_{(91)}}{x_{(91)} - x_{(1)}},$$

discussed above in the normal sample situation, are clearly applicable to gamma samples.

General considerations of this kind produce a wide choice of possible test statistics. We must now ask which test is 'best' for any particular situation, how it can be constructed, and how its performance should be assessed.

3.1 STATISTICAL BASES FOR CONSTRUCTION OF TESTS

Apart from intuitively based procedures, two widely applicable methods exist for setting up discordancy tests, as has been said in Section 2.5. These are the maximum likelihood ratio principle and the principle of local optimality, perhaps restricted to the classes of unbiased, or invariant, tests. Naturally the construction of the tests depends in the first instance on the alternative hypothesis employed to account for the outliers (Section 2.3).

Consider for example the testing of a single upper outlier $x_{(n)}$ in an exponential sample. Our working hypothesis is

$$H: F,$$

declaring that all the observations $x_1, \ldots, x_n$ belong to the distribution $F$ with density $\theta e^{-\theta x}$ $(x > 0)$, $\theta$ being unknown. Suppose we have a slippage alternative $\bar{H}$ stating that $n - 1$ of the observations belong to $F$. The remaining one, $x_n$ say, belongs to the exponential distribution $G$ with density $\lambda\theta e^{-\lambda\theta x}$ $(x > 0;\ \lambda < 1)$. We may write

$$H: \lambda = 1, \qquad \bar{H}: \lambda < 1.$$

The log likelihood of the observations on hypothesis $H$ is

$$L_H(\theta) = n\ln\theta - n\theta\bar{x}, \tag{3.1.1}$$

where $\bar{x}$ is the mean of $x_1, \ldots, x_n$. $L_H(\theta)$ is maximized by $\theta = 1/\bar{x}$, and its maximized value is

$$\hat{L}_H = -n\ln\bar{x} - n. \tag{3.1.2}$$

On hypothesis $\bar{H}$, the log likelihood of the observations is

$$L_{\bar{H}}(\theta, \lambda) = n\ln\theta + \ln\lambda - (n-1)\theta\bar{x}' - \lambda\theta x_n, \tag{3.1.3}$$

where $\bar{x}'$ is the mean of $x_1, \ldots, x_{n-1}$. $L_{\bar{H}}(\theta, \lambda)$ is maximized when

$$n/\theta - (n-1)\bar{x}' - \lambda x_n = 0$$

and

$$1/\lambda - \theta x_n = 0,$$

provided $\lambda \le 1$, i.e. when $\theta = 1/\bar{x}'$ and $\lambda = \bar{x}'/x_n$, provided $x_n \ge \bar{x}'$. Its maximized value is accordingly (3.1.2) if $x_n < \bar{x}'$; otherwise it is

$$\hat{L}_{\bar{H}} = -(n-1)\ln\bar{x}' - \ln x_n - n. \tag{3.1.4}$$

The test statistic based on the maximum likelihood ratio is $\hat{L}_{\bar{H}} - \hat{L}_H$. This is equal to zero if $x_n < \bar{x}'$, while if $x_n \ge \bar{x}'$ it is

$$-\{(n-1)\ln\bar{x}' + \ln x_n - n\ln\bar{x}\} = -(n-1)\ln\frac{n-T}{n-1} - \ln T, \quad \text{where } T = x_n/\bar{x}. \tag{3.1.5}$$

(Here $\bar{x}' = \bar{x}(n-T)/(n-1)$, so $x_n \ge \bar{x}'$ exactly when $T \ge 1$. For $T \ge 1$ the right-hand side of (3.1.5) increases with $T$, its derivative being $n(T-1)/\{T(n-T)\} \ge 0$.) It follows that the maximum likelihood ratio test is equivalent to rejecting $H$ when $T$ is large.

Strictly speaking, we are not in a position to use this test, because we do not know which of the observations is the discordant one belonging to $G$ if $\bar{H}$ is true. In practice this observation is assumed to be $x_{(n)}$, the outlier, and $T_{(n)} = x_{(n)}/\bar{x}$ is used as test statistic. If it were not for the presence of the outlier we would not be moved to query $H$ or to test the strength of the evidence for $H$. This is an intuitive justification for the use of $T_{(n)}$, but clearly $T_{(n)}$ is not the maximum likelihood ratio test statistic for the $\bar{H}$ we have specified.

There are two ways in which we could legitimately establish $T_{(n)}$ as the appropriate test statistic. The first faces up squarely to the fact that our desire for a test of discordancy stems from our reaction to one specific observation, namely the greatest observation, $x_{(n)}$. An alternative hypothesis can be set up which reflects this, in the form

$$\bar{H}': \ x_{(1)}, x_{(2)}, \ldots, x_{(n-1)} \text{ belong to } F; \quad x_{(n)} \text{ belongs to } G.$$

This hypothesis, which we may call the hypothesis of labelled slippage, identifies the extreme observation as the only possible discordant value. Let $y_1, \ldots, y_{n-1}$ be a random sample from $F$ and $y_n$ a random observation from $G$. We can then think of our ordered sample $x_{(1)}, \ldots, x_{(n)}$ as a particular realization $y_1, y_2, \ldots, y_n$ in which the observation $y_n$ turns out to be the largest. Thus the likelihood under $\bar{H}'$ is

$$P(x_{(1)}, \ldots, x_{(n)}),$$

where $P(y_1, \ldots, y_n)$ is the likelihood of $y_1, \ldots, y_n$ conditional on $y_1 < \cdots < y_{n-1} < y_n$. Now each $y_j$ $(j = 1, \ldots, n-1)$ may be regarded as the time to the first event in a Poisson process of rate $\theta$, and $y_n$ as the time to the first event in a Poisson process of rate $\lambda\theta$. Superpose these $n$ processes, assuming them independent, and consider which event occurs first. The probability that $y_1$ is the smallest of the $y$'s is then seen to be $\theta/[(n-1)\theta + \lambda\theta] = 1/(n-1+\lambda)$.
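To make (3.1.5) concrete, here is a small Python sketch. It evaluates the likelihood ratio statistic in two ways: from the closed form in $T = x_n/\bar{x}$, and directly from the maximized log likelihoods (3.1.2) and (3.1.4). It assumes the suspect observation $x_n$ is passed as the last list element. The function names and the sample are hypothetical, and no percentage points are computed.

```python
import math


def mlr_statistic_from_T(T, n):
    """Right-hand side of (3.1.5); equal to 0 when T <= 1 (x_n <= xbar')."""
    if T <= 1.0:
        return 0.0  # constrained maximum under H-bar has lambda = 1
    return -(n - 1) * math.log((n - T) / (n - 1)) - math.log(T)


def mlr_statistic_direct(xs):
    """L-hat(H-bar) minus L-hat(H), using (3.1.2) and (3.1.4); the suspect
    observation x_n is taken to be the last element of xs."""
    n = len(xs)
    xbar = math.fsum(xs) / n
    xbar_prime = math.fsum(xs[:-1]) / (n - 1)   # mean of x_1, ..., x_{n-1}
    x_n = xs[-1]
    L_H = -n * math.log(xbar) - n               # (3.1.2)
    if x_n < xbar_prime:
        return 0.0                              # max under H-bar is again (3.1.2)
    L_Hbar = -(n - 1) * math.log(xbar_prime) - math.log(x_n) - n   # (3.1.4)
    return L_Hbar - L_H


sample = [1.0, 2.0, 3.0, 10.0]                  # hypothetical data, x_n = 10.0
T = sample[-1] / (math.fsum(sample) / len(sample))
print(T, mlr_statistic_from_T(T, len(sample)), mlr_statistic_direct(sample))
```

The two functions agree because the statistic depends on the data only through $T$. It is also non-decreasing in $T$, so ordering samples by it is the same as ordering them by $T$.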
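The first-step probability $1/(n-1+\lambda)$ from the superposition argument can be checked by simulation. The minimal Python sketch below uses only the standard library's `random.Random.expovariate`, which takes a rate parameter. The helper name `prob_y1_smallest`, the choices of $n$ and $\lambda$, and the number of trials are all illustrative assumptions. As the argument predicts, the result should not depend on $\theta$.

```python
import random


def prob_y1_smallest(n, lam, theta=1.0, trials=100_000, seed=12345):
    """Monte Carlo estimate of P(y_1 is the smallest of y_1, ..., y_n), where
    y_1, ..., y_{n-1} are exponential with rate theta (distribution F) and
    y_n is exponential with rate lam * theta (distribution G)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ys = [rng.expovariate(theta) for _ in range(n - 1)]  # draws from F
        ys.append(rng.expovariate(lam * theta))              # one draw from G
        if ys[0] == min(ys):
            hits += 1
    return hits / trials


n, lam = 5, 0.5
print(prob_y1_smallest(n, lam), 1 / (n - 1 + lam))  # simulation vs. 1/(n-1+lambda)
```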
Continuing stepwise, we get

$$P(y_1 < \cdots < y_n) = 1/[(n-1+\lambda)(n-2+\lambda)\cdots(1+\lambda)]. \tag{3.1.6}$$

Hence the log likelihood of the observations on $\bar{H}'$ is

$$L_{\bar{H}'}(\theta, \lambda) = n\ln\theta + \ln\lambda - \theta(n\bar{x} - x_{(n)}) - \lambda\theta x_{(n)} + \sum_{j=1}^{n-1}\ln(j+\lambda). \tag{3.1.7}$$

This is maximized when

$$n/\theta - n\bar{x} + x_{(n)} - \lambda x_{(n)} = 0$$

and

$$1/\lambda - \theta x_{(n)} + \sum_{j=1}^{n-1}(j+\lambda)^{-1} = 0.$$

The maximizing value of $\lambda$, $\hat{\lambda}$ say, must therefore satisfy

$$\sum_{j=0}^{n-1}(j+\hat{\lambda})^{-1} = nT_{(n)}/[n - (1-\hat{\lambda})T_{(n)}]. \tag{3.1.8}$$

The maximized value of $L_{\bar{H}'}(\theta, \lambda)$ comes out to be

$$\hat{L}_{\bar{H}'} = n\ln n - n\ln\bar{x} - n - n\ln[n - (1-\hat{\lambda})T_{(n)}] + \sum_{j=0}^{n-1}\ln(j+\hat{\lambda}). \tag{3.1.9}$$

Under $H$ (that is, (3.1.7) with $\lambda = 1$), the log likelihood is

$$n\ln\theta - n\theta\bar{x} + \ln n!$$

with maximized value

$$\hat{L}_H = -n\ln\bar{x} - n + \ln n!.$$

Hence $\hat{L}_H - \hat{L}_{\bar{H}'}$ depends on the observations in terms of $T_{(n)}$ and $\hat{\lambda}$ only. It is therefore a function of $T_{(n)}$ alone, since in view of (3.1.8) $\hat{\lambda}$ is a function of $T_{(n)}$ only. The discordancy test statistic $T_{(n)}$ is thus equivalent to the maximum likelihood ratio test statistic in the labelled slippage formulation.

The second way in which $T_{(n)}$ is established directly as the appropriate test statistic is by a multiple decision procedure applied to a set of $n$ alternative hypotheses

$$\bar{H}_i: \ x_i \text{ comes from } G \ (\text{some } i), \quad x_j \text{ comes from } F \ (j \ne i), \tag{3.1.10}$$

for $i = 1, 2, \ldots, n$. This formulation is similar to the model B type of slippage alternative hypothesis considered by Ferguson (1961a) (see Section 2.3 iv), specialized to the case of a single outlier and an exponential distribution. We noted in Section 2.5 Paulson's (1952b) use of a multiple decision approach to the corresponding set of alternative hypotheses for a location-shifted outlier in a normal sample.

The decision criterion is to maximize the probability of adopting the correct $\bar{H}_i$ when slippage has occurred. This is subject to a prescribed probability of correct adoption of the basic hypothesis $H$, and to certain invariance conditions (index permutation, positive changes of scale, arbitrary changes of origin). In the present situation of an exponential basic model, changes in location are inappropriate. The procedure then leads to adopting $\bar{H}_i$ if $x_j$ is maximized when $j = i$ and is sufficiently large. Thus the appropriate test statistic is precisely $T_{(n)}$. Ferguson (1961a) applies the same type of multiple decision argument to the case of testing for a model B (variance-covariance slippage) type outlier in a multivariate normal sample.

Another way of handling (3.1.10) would be by means of a two-stage maximum likelihood ratio test. This declares as a discordant outlier the observation whose omission effects the greatest increase in maximized likelihood, provided that increase is significantly large. Thus we consider (see (3.1.5))

$$\hat{L}_{\bar{H}_i} - \hat{L}_H = \begin{cases} -(n-1)\ln\dfrac{n-T_i}{n-1} - \ln T_i & (T_i \ge 1), \\ 0 & (T_i \le 1), \end{cases} \tag{3.1.11}$$

where $T_i = x_i/\bar{x}$. Choosing $i$ to maximize (3.1.11) implies identifying the hypothesis $\bar{H}_i$ for which $x_i = x_{(n)}$. This $\bar{H}_i$ is adopted, and $x_{(n)}$ declared a discordant outlier, if $T_{(n)}$ is sufficiently large.

Still considering the case of a single upper outlier in an exponential sample, let us now construct a discordancy test on the basis of one-sided local (invariant) optimality. Assume first the slippage alternative $\bar{H}$, with log likelihood

$$L_{\bar{H}}(\theta, \lambda) = n\ln\theta + \ln\lambda - (n-1)\theta\bar{x}' - \lambda\theta x_n.$$

Then $\partial L_{\bar{H}}(\theta, \lambda)/\partial\lambda = (1/\lambda) - \theta x_n$. Under the working hypothesis $H$, $\lambda = 1$, and $\partial L_{\bar{H}}/\partial\lambda$ is then equal to $1 - \theta x_n$. Replacing $\theta$ by its maximizing value under $H$, $\hat{\theta} = 1/\bar{x}$, we obtain $1 - (x_n/\bar{x})$ as the locally optimal test statistic, which is equivalent to $T = x_n/\bar{x}$. If instead we take the labelled slippage alternative $\bar{H}'$, we get

$$\partial L_{\bar{H}'}(\theta, \lambda)/\partial\lambda = (1/\lambda) - \theta x_{(n)} + \sum_{j=1}^{n-1}(j+\lambda)^{-1}.$$

When $\lambda = 1$ this is equal to $\sum_{j=1}^{n} j^{-1} - \theta x_{(n)}$. Substituting $\theta = \hat{\theta} = 1/\bar{x}$ we get

$$\sum_{j=1}^{n} j^{-1} - \frac{x_{(n)}}{\bar{x}}$$

as the locally optimal test statistic, or in effect $T_{(n)}$.

Consider now the testing of a single upper outlier in a normal sample, for which the statistic $T_{(n)} = \{x_{(n)} - \bar{x}\}/s$ is commonly used. As in the exponential case discussed above, it can be shown that $T_{(n)}$ is effectively the maximum likelihood ratio test statistic for a labelled slippage alternative, and also for the corresponding multiple decision formulation. The unidentifiable equivalent $T = \{x_n - \bar{x}\}/s$ is the corresponding statistic for the ordinary slippage
58 Outliers in statistical data    Discordancy tests for outliers in univariate samples 59

Continuing stepwise, we get

$$P(Y_1 < \cdots < Y_n) = 1/[(n-1+\lambda)(n-2+\lambda)\cdots(1+\lambda)]. \tag{3.1.6}$$

Hence the log likelihood of the observations on $H'$ is

$$L_{H'}(\theta, \lambda) = n\ln\theta + \ln\lambda - \theta(n\bar{x} - x_{(n)}) - \lambda\theta x_{(n)} + \sum_{j=1}^{n-1}\ln(j+\lambda). \tag{3.1.7}$$

This is maximized when

$$n/\theta - n\bar{x} + x_{(n)} - \lambda x_{(n)} = 0$$

and

$$1/\lambda - \theta x_{(n)} + \sum_{j=1}^{n-1}(j+\lambda)^{-1} = 0.$$

The maximizing value of $\lambda$, $\hat{\lambda}$ say, must therefore satisfy

$$\sum_{j=0}^{n-1}(j+\hat{\lambda})^{-1} = nT_{(n)}/[n - (1-\hat{\lambda})T_{(n)}]. \tag{3.1.8}$$

The maximized value of $L_{H'}(\theta, \lambda)$ comes out to be

$$\hat{L}_{H'} = -n\ln\bar{x} - n - n\ln[n - (1-\hat{\lambda})T_{(n)}] + \sum_{j=0}^{n-1}\ln(j+\hat{\lambda}). \tag{3.1.9}$$

Under $H$, the log likelihood is

$$n\ln\theta - n\theta\bar{x} - \ln n!$$

with maximized value

$$\hat{L}_H = -n\ln\bar{x} - n - \ln n!.$$

Hence $\hat{L}_H - \hat{L}_{H'}$ depends on the observations in terms of $T_{(n)}$ and $\hat{\lambda}$ only, and is therefore a function of $T_{(n)}$, since in view of (3.1.8) $\hat{\lambda}$ is a function of $T_{(n)}$ only. The discordancy test statistic $T_{(n)}$ is thus equivalent to the maximum likelihood ratio test statistic in the labelled slippage formulation.

The second way in which $T_{(n)}$ is established directly as the appropriate test statistic is by a multiple decision procedure applied to a set of $n$ alternative hypotheses

$$H_i:\ \begin{cases} x_i \text{ comes from } G & (\text{some } i),\\ x_j \text{ comes from } F & (j \neq i), \end{cases} \tag{3.1.10}$$

for $i = 1, 2, \ldots, n$. This formulation is similar to the model B type of slippage alternative hypothesis considered by Ferguson (1961a) (see Section 2.3(iv)), specialized to the case of a single outlier and an exponential distribution. We noted in Section 2.5 Paulson's (1952b) use of a multiple decision approach to the corresponding set of alternative hypotheses for a location-shifted outlier in a normal sample. The decision criterion is that of maximizing the probability of adopting the correct $H_i$ when slippage has occurred, subject to a prescribed probability of correct adoption of the basic hypothesis $H$ and to certain invariance conditions (index permutation, positive changes of scale, arbitrary changes of origin). In the present situation of an exponential basic model, changes in location are inappropriate and the procedure leads to adopting $H_i$ if $x_j$ is maximized when $j = i$ and is sufficiently large. Thus the appropriate test statistic is precisely $T_{(n)}$. Ferguson (1961a) applies the same type of multiple decision argument to the case of testing for a model B (variance-covariance slippage) type outlier in a multivariate normal sample.

Another way of handling (3.1.10) would be by means of a two-stage maximum likelihood ratio test: declaring as a discordant outlier that observation whose omission effects the greatest increase in maximized likelihood, provided that increase is significantly large. Thus we consider (see (3.1.5))

$$\hat{L}_{H_i} - \hat{L}_H = \begin{cases} -(n-1)\ln\left(\dfrac{n - T_i}{n-1}\right) - \ln T_i & (T_i \geq 1),\\ 0 & (T_i \leq 1). \end{cases} \tag{3.1.11}$$

Choosing $i$ to maximize (3.1.11) implies identifying the hypothesis $H_i$ for which $x_i = x_{(n)}$; this $H_i$ is adopted and $x_{(n)}$ declared a discordant outlier if $T_{(n)}$ is sufficiently large.

Still considering the case of a single upper outlier in an exponential sample, let us now construct a discordancy test on the basis of one-sided local (invariant) optimality. Assuming first the slippage alternative $\bar{H}$, with log likelihood

$$L_{\bar{H}}(\theta, \lambda) = n\ln\theta + \ln\lambda - (n-1)\theta\bar{x}' - \lambda\theta x_n,$$

we have $\partial L_{\bar{H}}(\theta, \lambda)/\partial\lambda = (1/\lambda) - \theta x_n$. Under the working hypothesis $H$, $\lambda = 1$, and $\partial L_{\bar{H}}/\partial\lambda$ is then equal to $1 - \theta x_n$. Replacing $\theta$ by its maximizing value $\hat{\theta} = 1/\bar{x}$, we obtain $1 - (x_n/\bar{x})$ as the locally optimal test statistic, which is equivalent to $T = x_n/\bar{x}$. If instead we take the labelled slippage alternative $H'$, we get

$$\partial L_{H'}(\theta, \lambda)/\partial\lambda = (1/\lambda) - \theta x_{(n)} + \sum_{j=1}^{n-1}(j+\lambda)^{-1}.$$

When $\lambda = 1$ this is equal to $\sum_{j=1}^{n} j^{-1} - \theta x_{(n)}$; substituting $\theta = \hat{\theta} = 1/\bar{x}$ we get $\sum_{j=1}^{n} j^{-1} - x_{(n)}/\bar{x}$ as the locally optimal test statistic, or in effect $T_{(n)}$.

Consider now the testing of a single upper outlier in a normal sample, for which the statistic $T_{(n)} = \{x_{(n)} - \bar{x}\}/s$ is commonly used. As in the exponential case discussed above, it can be shown that $T_{(n)}$ is effectively the maximum likelihood ratio test statistic for a labelled slippage alternative, also for the corresponding multiple decision formulation, and the unidentifiable equivalent $T = \{x_n - \bar{x}\}/s$ is the corresponding statistic for the ordinary slippage
alternative. The alternative hypotheses must be appropriately chosen: that $x_1, \ldots, x_{n-1}$ (in the case of $\bar{H}$) or $x_{(1)}, \ldots, x_{(n-1)}$ (in the cases of $H'$ or $H_i$) belong to $F: N(\mu, \sigma^2)$ and that $x_n$, or $x_{(n)}$, belongs to a normal distribution $G: N(\mu + a, \sigma^2)$ with a different mean $\mu + a$ $(a > 0)$ but the same variance. If instead we make the alternative a normal distribution $N(\mu, b\sigma^2)$ $(b > 1)$, i.e. with the same mean as $F$ but a larger variance, a different and less tractable criterion emerges. For an account of the construction of such tests of discordancy for normal samples based on the local optimality principle, see Ferguson (1961a). His tests, based variously on sample skewness and kurtosis, have been referred to in Section 2.5 and are described in detail below (Section 3.4.3).

Test statistics obtained by either of the methods discussed above have the required invariance properties. They are, in the first place, invariant under permutation of the subscripts of $x_1, \ldots, x_n$ and hence can be expressed as functions of the ordered values $x_{(1)}, \ldots, x_{(n)}$. (Any symmetric functions of the $x_{(i)}$ which figure here can of course be written as the corresponding functions of the unordered $x_i$.) Ferguson sets out to ensure scale and location invariance in his test criteria and asserts (for the normal case with unknown mean and variance which he is examining) that this implies that the observations will appear in the expressions for the criteria purely in terms of ratios of intervals between ordered values, of the type $\{x_{(a)} - x_{(b)}\}/\{x_{(c)} - x_{(d)}\}$. For example, the familiar $\{x_{(n)} - \bar{x}\}/s$ is a function of these ratios. (Here we have a justification for the use of Dixon statistics, which are, of course, the ratios themselves.) Scale and location invariance will also hold in more specific circumstances, where certain parameter values are assumed known, although the condition that the test statistics are functions merely of ratios of differences between ordered values will not necessarily apply. For example, consider the testing of normal samples when the variance $\sigma^2$ is known. In this situation $\{x_{(n)} - \bar{x}\}/\sigma$, for instance, is scale- and location-invariant, and is a valid discordancy statistic. Similarly if $\mu$ but not $\sigma^2$ is known, $\{x_{(n)} - \mu\}/s$, again not expressible in terms of Dixon statistics, is a valid statistic. In some contexts dual invariance will not be appropriate, as we have remarked above in relation to exponential samples.

Reverting to the maximum likelihood ratio principle, this has a further application of great importance in the detection of outliers, quite apart from the assessment of their discordancy. In data situations such as regression or the results of designed experiments, outliers may be present, essentially as outlying residuals, but such values may not in general be immediately recognized as outliers in the same way that extreme values are in univariate samples. One way of defining the 'most outlying' observation or point in a sample is as the one whose omission produces the greatest increase in the maximized likelihood; this amounts, of course, to identifying it as the one out of the $n$ whose maximum likelihood ratio test statistic has the greatest value. The same principle can be used to detect the most outlying point in a multivariate sample. Once detected, it can of course be tested for discordancy (see Chapters 6 and 7). This principle is embodied in the use of the multiple decision formulation of the alternative hypothesis.

3.1.1 Inclusive and exclusive measures, and a recursive algorithm for the null distribution of a test statistic

In Section 2.4 we have listed six basic types of discordancy test statistic. Most of these are ratios of the form $N/D$, as discussed in the introduction to this chapter: here $N$ is a measure of the separation of the outlying value or values from the main mass of the sample and $D$ is a measure of the spread of the sample. Measures used for $N$ include the excess, the deviation and, in the gamma case, the extreme (as deviation from zero), all as defined in Section 2.4; measures used for $D$ include the standard deviation, the range, the sum of squares of observations corrected to the sample mean, and, again in the gamma case, the sample mean or sum. One ambiguity immediately presents itself: should the means and measures of spread which enter into these statistics be calculated from the complete data set including the outlier (or outliers), or from the reduced data set excluding the outlier? We would appear to have a double set of statistics, based respectively on inclusive measures and exclusive measures, and to be faced with a decision as to which is preferable. It turns out, however, that a statistic $N/D$ based on inclusive measures and its analogue based on exclusive measures are in many cases equivalent. The following examples will make this clear.

(i) Excess/Range statistic for testing single upper outlier

Inclusive:

$$T = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}} \tag{3.1.12}$$

Exclusive:

$$T' = \frac{x_{(n)} - x_{(n-1)}}{x_{(n-1)} - x_{(1)}} \tag{3.1.13}$$

Clearly

$$\frac{1}{T} - \frac{1}{T'} = 1.$$

(ii) Deviation/Spread statistic for testing single lower outlier

Inclusive:

$$T = \frac{\bar{x} - x_{(1)}}{s}$$

Exclusive:

$$T' = \frac{\bar{x}' - x_{(1)}}{s'}$$
where

$$n\bar{x} = \sum_{i=1}^{n} x_{(i)}, \qquad (n-1)\bar{x}' = \sum_{i=2}^{n} x_{(i)},$$

$$(n-1)s^2 = \sum_{i=1}^{n}(x_{(i)} - \bar{x})^2, \qquad (n-2)s'^2 = \sum_{i=2}^{n}(x_{(i)} - \bar{x}')^2.$$

We have $nsT = (n-1)s'T'$ and

$$(n-1)s^2 - (n-2)s'^2 = (\bar{x} - x_{(1)})^2 + (n-1)(\bar{x}' - \bar{x})^2 = s^2T^2 + \frac{1}{n-1}s^2T^2,$$

whence

$$\frac{(n-1)^2}{nT^2} - \frac{n(n-2)}{(n-1)T'^2} = 1. \tag{3.1.14}$$

When the inclusive and exclusive forms of a discordancy statistic $N/D$ are functionally related, as above, the null distribution can be obtained by means of a very useful recursive argument. While it is not our intention in this book to give detailed proofs of all results used, we feel it is worthwhile setting out this recursive argument in some detail for a simple particular case, namely that of an upper outlier in an exponential sample. It will then be sufficient to state other results obtainable by the same type of argument as they arise, without giving the argument in detail.

The distribution of the ratio of the greatest observation to the sum of the observations for an exponential sample

In the introduction to this chapter we remarked on the usefulness of $x_{(n)}/\sum x_i$ (or some simple function of it) for judging a single upper outlier in a gamma sample. For its detailed application we need to know its distribution in the null case, i.e. when there is no discordant value present. Let us consider this specifically for the exponential distribution, $\Gamma(1, \lambda)$.

Suppose $X_1, \ldots, X_n$ are $n$ independent identically distributed (i.i.d.) random variables, each exponentially distributed with probability density function

$$f(x) = \lambda e^{-\lambda x} \quad (x > 0). \tag{3.1.15}$$

Write

$$A_X = \sum_{j=1}^{n} X_j, \qquad T_j = X_j/A_X \quad (j = 1, \ldots, n).$$

Then $\mathbf{T} = (T_1, \ldots, T_n)'$ is a vector random variable of dimension $n-1$, since $\sum T_j = 1$, and $A_X$, $\mathbf{T}$ are statistically independent.

When $X_n$ is omitted, $X_1, \ldots, X_{n-1}$ are $n-1$ i.i.d. random variables with the distribution (3.1.15), and we may rename these as $Y_j$ $(j = 1, \ldots, n-1)$, with

$$A_Y = \sum_{j=1}^{n-1} Y_j.$$

Also write $T_j^* = Y_j/A_Y$ $(j = 1, \ldots, n-1)$, and $\mathbf{T}^* = (T_1^*, \ldots, T_{n-1}^*)'$, of dimension $n-2$. Clearly $X_n$, $A_Y$, $\mathbf{T}^*$ are independent, hence $X_n/A_Y$ and $\mathbf{T}^*$ are independent. Corresponding with $X_n/A_X = T_n$, write $X_n/A_Y = T_n'$. Clearly

$$\frac{1}{T_n} - \frac{1}{T_n'} = 1. \tag{3.1.16}$$

Hence when $X_n/A_X = t$, it follows that $X_n/A_Y = t/(1-t)$.

Suppose now that $g_n(t)$ is the probability density function, and $G_n(t)$ the distribution function, of the random variable $X_{(n)}/A_X$, where $X_{(n)}$ is the greatest of the $X_i$. Then

$$\begin{aligned}
g_n(t)\,\delta t &= P[X_{(n)}/A_X \in (t, t+\delta t)]\\
&= nP[X_n/A_X \in (t, t+\delta t),\ X_n = X_{(n)}]\\
&= nP[X_n/A_X \in (t, t+\delta t),\ X_1 < X_n, \ldots, X_{n-1} < X_n]\\
&= nP[X_n/A_X \in (t, t+\delta t),\ X_j/A_Y < t/(1-t) \text{ for } j = 1, \ldots, n-1]\\
&= nP[T_n \in (t, t+\delta t),\ T_j^* < t/(1-t) \text{ for } j = 1, \ldots, n-1].
\end{aligned}$$

From (3.1.16), $T_n$ and $\mathbf{T}^*$ are independent. Hence

$$g_n(t)\,\delta t = nP[T_n \in (t, t+\delta t)]\,P[T_j^* < t/(1-t) \text{ for } j = 1, 2, \ldots, n-1].$$

The second of these probabilities is

$$P\left(\max_j T_j^* < \frac{t}{1-t}\right) = G_{n-1}\left(\frac{t}{1-t}\right).$$

The first probability is equal to

$$P\left[T_n' \in \left(t', t' + \frac{dt'}{dt}\,\delta t\right)\right],$$

where $t' = t/(1-t)$, so that we conclude

$$g_n(t)\,\delta t = nP\left[T_n' \in \left(t', t' + \frac{\delta t}{(1-t)^2}\right)\right] G_{n-1}\left(\frac{t}{1-t}\right). \tag{3.1.17}$$
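The inclusive/exclusive relations used in this section, $1/T - 1/T' = 1$ for the Excess/Range pair (3.1.12)-(3.1.13), $nsT = (n-1)s'T'$ and (3.1.14) for the Deviation/Spread pair of example (ii), and (3.1.16) for the exponential ratio, are purely algebraic, so they can be confirmed on any data set. The Python sketch below is only an illustration: the function name `inclusive_exclusive_check` and the simulated test data are our own choices, not part of the text. It assumes at least three observations with $x_{(n-1)} > x_{(1)}$, so that no denominator vanishes, and returns the left-hand side of each identity (each should equal 1 up to rounding).

```python
import random
import statistics


def inclusive_exclusive_check(x):
    """Return the left-hand sides of four inclusive/exclusive identities.

    Assumes len(x) >= 3, x_(n-1) > x_(1) and a nonzero last observation.
    """
    xs = sorted(x)
    n = len(xs)

    # (i) Excess/Range for a single upper outlier, (3.1.12) and (3.1.13)
    T_inc = (xs[-1] - xs[-2]) / (xs[-1] - xs[0])
    T_exc = (xs[-1] - xs[-2]) / (xs[-2] - xs[0])
    excess_range = 1 / T_inc - 1 / T_exc

    # (ii) Deviation/Spread for a single lower outlier.
    # statistics.stdev divides by (len - 1): n - 1 for s, n - 2 for s'.
    xbar, s = statistics.mean(xs), statistics.stdev(xs)
    xbar_r, s_r = statistics.mean(xs[1:]), statistics.stdev(xs[1:])
    D_inc = (xbar - xs[0]) / s
    D_exc = (xbar_r - xs[0]) / s_r
    # n s T / ((n - 1) s' T')
    product_ratio = n * s * D_inc / ((n - 1) * s_r * D_exc)
    # left-hand side of (3.1.14)
    dev_spread = (n - 1) ** 2 / (n * D_inc ** 2) - n * (n - 2) / ((n - 1) * D_exc ** 2)

    # (3.1.16): T_n = X_n / A_X and T'_n = X_n / A_Y, X_n being the last value
    last = x[-1]
    ratio_pair = 1 / (last / sum(x)) - 1 / (last / sum(x[:-1]))

    return excess_range, product_ratio, dev_spread, ratio_pair


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(10)]
print(inclusive_exclusive_check(sample))
```

Because each identity holds exactly, any departure from 1 in the printed values is floating-point rounding only.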
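The recursion (3.1.17) can be run numerically. Using the fact that $(n-1)T_n'$ has the $F$-distribution on 2 and $2(n-1)$ degrees of freedom, the density of $T_n'$ is $(n-1)(1+u)^{-n}$ for $u > 0$; substituting this into (3.1.17) and using $1 + t' = 1/(1-t)$ gives $g_n(t) = n(n-1)(1-t)^{n-2}G_{n-1}\{t/(1-t)\}$. The Python sketch below starts from $G_1(t) = 0$ for $t < 1$, tabulates each $G_k$ on a grid by integrating $g_k$ downwards from $t = 1$ with the trapezoidal rule, and interpolates linearly between grid points. As cross-checks it evaluates the alternating-sum form $\sum_k (-1)^k\binom{n}{k}(1-kt)_+^{n-1}$ (the form usually associated with Fisher (1929); we assume it as the general pattern behind the first few terms of the recursive calculation) and a Monte Carlo estimate. The function names, grid size and seed are illustrative assumptions only.

```python
import math
import random


def tabulate_G(n, m=2000):
    """Tabulate G_n(t) = P(X_(n)/A_X <= t) at t = i/m, i = 0..m, for n >= 2."""
    grid = [i / m for i in range(m + 1)]
    prev = None  # tabulated G_{k-1}; None stands for G_1 (0 for t < 1)

    def G_prev_at(tp):
        if tp >= 1.0:
            return 1.0  # every G_k equals 1 for t >= 1
        if prev is None:
            return 0.0
        x = tp * m
        i = int(x)
        frac = x - i
        return prev[i] * (1.0 - frac) + prev[i + 1] * frac  # linear interpolation

    for k in range(2, n + 1):
        g = []
        for t in grid:
            # t/(1-t) >= 1 exactly when t >= 1/2; avoids dividing by zero at t = 1
            tp = 1.0 if t >= 0.5 else t / (1.0 - t)
            g.append(k * (k - 1) * (1.0 - t) ** (k - 2) * G_prev_at(tp))
        G = [0.0] * (m + 1)
        G[m] = 1.0
        for i in range(m - 1, -1, -1):  # G_k(t) = 1 - integral from t to 1 of g_k
            G[i] = G[i + 1] - 0.5 * (g[i] + g[i + 1]) / m
        prev = [max(0.0, v) for v in G]  # clip tiny negative rounding errors
    return grid, prev


def G_closed(n, t):
    """Alternating sum over k of (-1)^k C(n, k) max(0, 1 - k t)^(n - 1)."""
    return sum((-1) ** k * math.comb(n, k) * max(0.0, 1.0 - k * t) ** (n - 1)
               for k in range(n + 1))


def G_monte_carlo(n, t, reps=20000, seed=1):
    """Estimate P(max/sum <= t) from simulated unit-rate exponential samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        hits += max(xs) / sum(xs) <= t
    return hits / reps


n = 5
grid, G = tabulate_G(n)
m = len(G) - 1
for t in (0.3, 0.4, 0.6):
    print(t, round(G[round(t * m)], 4), round(G_closed(n, t), 4),
          round(G_monte_carlo(n, t), 4))
```

Tabulating on a grid keeps the cost proportional to $n$ times the grid size; evaluating the recursion by direct nested integration would instead grow geometrically with $n$.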
Since, from (3.1.15), $(n-1)T_n'$ has the $F$-distribution on 2 and $2(n-1)$ degrees of freedom, we have in the exponential case the recurrence relationship

$$g_n(t) = n(n-1)(1-t)^{n-2}\,G_{n-1}\left(\frac{t}{1-t}\right), \tag{3.1.18}$$

which gives the null distribution of the outlier statistic $X_{(n)}/A_X$ for a sample of size $n$ in terms of the corresponding distribution for a sample of size $n-1$.

The range of possible values of $X_{(n)}/A_X$ is from $1/n$ to 1, hence $G_n(t) = 1$ for $t \geq 1$. The following recursive calculation arises from (3.1.18).

$$
\begin{array}{ccccc}
\text{Range for } t & \text{Range for } t/(1-t) & G_{n-1}\{t/(1-t)\} & g_n(t) & G_n(t)\\
\hline
[\tfrac12, 1] & [1, \infty] & 1 & n(n-1)(1-t)^{n-2} & 1 - n(1-t)^{n-1}\\
[\tfrac13, \tfrac12] & [\tfrac12, 1] & 1 - (n-1)\left(\dfrac{1-2t}{1-t}\right)^{n-2} & n(n-1)(1-t)^{n-2} - n(n-1)^2(1-2t)^{n-2} & 1 - n(1-t)^{n-1} + \dfrac{n(n-1)}{2!}(1-2t)^{n-1}\\
[\tfrac14, \tfrac13] & [\tfrac13, \tfrac12] & 1 - (n-1)\left(\dfrac{1-2t}{1-t}\right)^{n-2} + \dfrac{(n-1)(n-2)}{2!}\left(\dfrac{1-3t}{1-t}\right)^{n-2} & n(n-1)(1-t)^{n-2} - n(n-1)^2(1-2t)^{n-2} + \dfrac{n(n-1)^2(n-2)}{2}(1-3t)^{n-2} & 1 - n(1-t)^{n-1} + \dfrac{n(n-1)}{2!}(1-2t)^{n-1} - \dfrac{n(n-1)(n-2)}{3!}(1-3t)^{n-1}
\end{array}
$$

and so on. The density function consists of a succession of smoothly connected arcs in the intervals $[\frac12, 1]$, $[\frac13, \frac12]$, $[\frac14, \frac13]$, $\ldots$, $[\frac1n, \frac1{n-1}]$. This well-known result was first given by Fisher (1929).

There is, of course, nothing in the method of derivation of (3.1.18) which is specific to the exponential distribution, and it can be applied in other circumstances. Additionally, it is easily modified for handling, say, $X_{(1)}/A_X$, which will also be of interest.

3.2 PERFORMANCE CRITERIA OF TESTS

We raised in Section 2.5 the question of what constitutes an appropriate performance criterion for a test of discordancy. A key measure is the significance level, although its interpretation is to some extent problematical (see Collett and Lewis, 1976). Comparison of tests of the same significance level must of course depend on the alternative hypothesis we have in mind for explaining the outliers (Section 2.3).

Consider first the slippage alternative. To fix ideas, suppose we are testing an upper outlier $x_{(n)}$ in a univariate sample $x_1, \ldots, x_n$. The null hypothesis is

$$H: F,$$

i.e. all the observations arise from a distribution $F$ which is, say, $N(\mu, \sigma^2)$ with $\mu$, $\sigma^2$ unknown. We envisage a slippage alternative $\bar{H}$ which states that $n-1$ of the observations belong to $F$ and the $n$th observation $x_n$, which we will now rename $x_c$ and call the contaminant, belongs to a different distribution $G$. If $F$ is $N(\mu, \sigma^2)$, $G$ may be $N(\mu + \sigma\Delta, \sigma^2)$ (for slippage in location) or $N(\mu, \sigma^2\exp\Delta)$ (for slippage in dispersion); the hypotheses can then be written

$$H: \Delta = 0, \qquad \bar{H}: \Delta > 0.$$

We wish to test $x_{(n)}$ for discordancy. For the moment, let us distinguish between two kinds of test statistic, 'general' and 'specific' say. (We will not need to maintain this distinction for long.) To construct a general statistic $Z_{(n)}$, we start with a measure $Z_i$ of the positioning of any observation $x_i$ in relation to the rest of the sample; for example, $Z_i$ could be

$$(x_i - \bar{x})/s \quad \text{or} \quad (x_i - \bar{x})/(x_{(n)} - x_{(1)}).$$

By particularizing to $x_{(n)}$ we get the corresponding discordancy statistic $Z_{(n)}$, e.g.

$$Z_{(n)} = (x_{(n)} - \bar{x})/s \quad \text{or} \quad (x_{(n)} - \bar{x})/(x_{(n)} - x_{(1)}).$$

A specific statistic, on the other hand, is sensibly defined only in relation to the outlier $x_{(n)}$, and cannot be meaningfully embedded in some set of statistics of like form ranging over the $n$ sample members; e.g.

$$Z = (x_{(n)} - x_{(n-1)})/s.$$

Suppose our discordancy statistic is of the general type. The test takes the form: 'Adjudge $x_{(n)}$ discordant if $Z_{(n)} > z_\alpha$', where $z_\alpha$ is the critical value for preassigned significance level $\alpha$ defined by

$$P(Z_{(n)} > z_\alpha \mid H) = \alpha. \tag{3.2.1}$$

Since we assume a slippage alternative, one of the sample observations under $\bar{H}$ will be the contaminant $x_c$; and since $Z_{(n)}$ is a general statistic, a corresponding measure $Z_c$ exists for the contaminant. In the context of this particular set of assumptions, David (1970) suggested the following five probabilities as 'reasonable measures' of the performance of $Z_{(n)}$.

(i) $P_1 = P(Z_{(n)} > z_\alpha \mid \bar{H})$.   (3.2.2)
This is the probability under $\bar{H}$ that the outlier is identified as discordant, in other words the power function.

(ii) $P_2 = P(Z_c > z_\alpha \mid \bar{H})$.   (3.2.3)

(iii) $P_3 = P(Z_c = Z_{(n)},\ Z_{(n)} > z_\alpha \mid \bar{H})$.   (3.2.4)

This is the probability that the contaminant is the outlier and is identified as discordant.

(iv) $P_4 = P(Z_c = Z_{(n)} > z_\alpha,\ Z_{(n-1)} < z_\alpha \mid \bar{H})$.   (3.2.5)

(v) $P_5 = P(Z_c > z_\alpha \mid Z_c = Z_{(n)};\ \bar{H})$.   (3.2.6)

This is the probability that, when the contaminant is the outlier, it is identified as discordant.

David observes that:

"$P_1$ measures the probability of significance [i.e. adjudged discordancy] for any reason whatever and is thus especially suitable for sounding a general alarm .... $P_2$, $P_3$ and $P_4$ focus with increasing severity on the correct detection of the outlier ...; only $P_4$ specifically excludes the possibility that good observations might be significant [adjudged discordant] in addition to ... [the contaminant]."

In point of fact $P_1$, $P_3$, and $P_5$ are useful measures, but the information conveyed by $P_2$ and $P_4$ would seem to be rather limited. Suppose for example that the contaminant $x_c$ is the second greatest observation, $x_{(n-1)}$. For a 'good' test we want a high probability of identifying $x_c$ as discordant, but this should appropriately be done by reference to the null distribution of $Z_{(n-1)}$ and not of $Z_{(n)}$, so that $z_\alpha$ is not the appropriate critical value and $P_2$ not the appropriate measure. Similar considerations apply to $P_4$, based as it is on the inequality $Z_{(n-1)} < z_\alpha$. In fact, David (1970) labels $P_2$, $P_4$ respectively as the 'probability that $X_1$ [$x_c$ in our notation] is significantly large' and the 'probability that only $X_1$ [i.e. $x_c$] is significant'. While these definitions appear attractive, the concept of 'significance' on which they rest is ill-defined.

We can therefore discard $P_2$ and $P_4$ as performance criteria. Once we do this, the need to distinguish between general and specific tests for discordancy disappears, since $P_1$, $P_3$, and $P_5$ do not depend for their definition on the test being of one or the other type. (For this generalization the event

in discordancy criteria for outliers in normal samples, says:

"The performance of the ... criteria is measured by computing the proportion of the time the contaminating distribution provides an extreme value and the test discovers the value [i.e. $P_3$]. Of course, performance could be measured by the proportion of the time the test gives a significant value when a member of the contaminating population is present in the sample, even though not at an extreme [i.e. $P_1$]. However, since it is assumed that discovery of an outlier will frequently be followed by the rejection of an extreme we shall consider discovery a success only when the extreme value is from the contaminating distribution."

For a 'good' test, we require $P_3$ and $P_5$ to be high. We also want $P_1 - P_3$ to be low; this is the probability that the test wrongly identifies a good observation as discordant. $P_3/P_5$ is the probability that the contaminant shows up as the outlier, and it might appear that one would like this ratio to be as large as possible, in conflict with the requirement for a high value of $P_5$. Consider, however, two hypothetical tests with performance measures as follows:

test A: $P_3 = \frac12$, $P_5 = 1$, $P_3/P_5 = \frac12$.
test B: $P_3 = \frac12$, $P_5 = \frac12$, $P_3/P_5 = 1$.

We see that in test A, the contaminant has only a 50 per cent chance of showing up as the extreme value, indicating that the degree of contamination is not severe on average (the value of $\Delta$ in $\bar{H}$ is not large); however, when the contaminant does appear as the outlier, it is certain to be detected. In test B, on the other hand, contamination is more severe and the contaminant is always the extreme value, but it is only detected as discordant 50 per cent of the time. What we require of a test is that it should identify contamination when this is sufficiently manifest, so test A is preferable to test B, and a high value of $P_5$ is desirable rather than a high value of $P_3/P_5$. The same line of argument indicates that $P_5$ takes precedence over $P_3$ as a measure of test performance. To sum up, a good test may be characterized by high $P_5$, high $P_1$, and low $P_1 - P_3$.

So far we have been talking in terms of a slippage alternative. Consider now the assessment of performance of a discordancy test against a mixture alternative. As before, take the particular situation of an upper outlier $x_{(n)}$ under test in a univariate sample. Under $\bar{H}$ the number of contaminants in the sample is no longer fixed as in the slippage case, but is a binomially-distributed random number which may be 0, 1 or more. The following events are clearly relevant to the assessment of performance:
'Zc = Z(n)' in the definitions of P3 and P5 needs to be rewritten as 'xc= x(n() D: that the test identifies x(n) as discordant.
P 1 , P 3 , and P5 may be considered to contain between them ali the relevant E: that H holds an d the sample contains o ne or more contaminants.
information about the performance of a discordancy test against a slippage
F: that H holds and x(n) is a contaminant.
alternative. P 1 is a convenient generai measure, for the reason indicated by
David (1970) (see above). There are strong arguments, however, for prefer- The direct analogue o! the performance measure P 1 defined in (3.2.2), i.e.
ring P3 as a measure. Dixon (1950), discussing the assessment of a number of the power, is P(D l H). However P(D l E), which is a power function
66 Outliers in statistica/ data Discordancy tests foroutliers in univariate samples 67

This is the probability under $\bar{H}$ that the outlier is identified as discordant, in other words the power function.

(ii) $$P_2 = P(Z_c > z_\alpha \mid \bar{H}). \tag{3.2.3}$$

(iii) $$P_3 = P(Z_c = Z_{(n)},\ Z_{(n)} > z_\alpha \mid \bar{H}). \tag{3.2.4}$$

This is the probability that the contaminant is the outlier and is identified as discordant.

(iv) $$P_4 = P(Z_c = Z_{(n)} > z_\alpha,\ Z_{(n-1)} < z_\alpha \mid \bar{H}). \tag{3.2.5}$$

(v) $$P_5 = P(Z_c > z_\alpha \mid Z_c = Z_{(n)};\ \bar{H}). \tag{3.2.6}$$

This is the probability that, when the contaminant is the outlier, it is identified as discordant.

David observes that:

$P_1$ measures the probability of significance [i.e. adjudged discordancy] for any reason whatever and is thus especially suitable for sounding a general alarm .... $P_2$, $P_3$ and $P_4$ focus with increasing severity on the correct detection of the outlier ...; only $P_4$ specifically excludes the possibility that good observations might be significant [adjudged discordant] in addition to ... [the contaminant].

In point of fact $P_1$, $P_3$, and $P_5$ are useful measures, but the information conveyed by $P_2$ and $P_4$ would seem to be rather limited. Suppose for example that the contaminant $x_c$ is the second greatest observation, $x_{(n-1)}$. For a 'good' test we want a high probability of identifying $x_c$ as discordant, but this should appropriately be done by reference to the null distribution of $Z_{(n-1)}$ and not of $Z_{(n)}$, so that $z_\alpha$ is not the appropriate critical value and $P_2$ not the appropriate measure. Similar considerations apply to $P_4$, based as it is on the inequality $Z_{(n-1)} < z_\alpha$. In fact, David (1970) labels $P_2$, $P_4$ respectively as the 'probability that $X_1$ [$x_c$ in our notation] is significantly large' and the 'probability that only $X_1$ [i.e. $x_c$] is significant'. While these definitions appear attractive, the concept of 'significance' on which they rest is ill-defined.

We can therefore discard $P_2$ and $P_4$ as performance criteria. Once we do this, the need to distinguish between general and specific tests for discordancy disappears, since $P_1$, $P_3$, and $P_5$ do not depend for their definition on the test being of one or the other type. (For this generalization the event '$Z_c = Z_{(n)}$' in the definitions of $P_3$ and $P_5$ needs to be rewritten as '$x_c = x_{(n)}$'.)

$P_1$, $P_3$, and $P_5$ may be considered to contain between them all the relevant information about the performance of a discordancy test against a slippage alternative. $P_1$ is a convenient general measure, for the reason indicated by David (1970) (see above). There are strong arguments, however, for preferring $P_3$ as a measure. Dixon (1950), discussing the assessment of a number of discordancy criteria for outliers in normal samples, says:

The performance of the ... criteria is measured by computing the proportion of the time the contaminating distribution provides an extreme value and the test discovers the value [i.e. $P_3$]. Of course, performance could be measured by the proportion of the time the test gives a significant value when a member of the contaminating population is present in the sample, even though not at an extreme [i.e. $P_1$]. However, since it is assumed that discovery of an outlier will frequently be followed by the rejection of an extreme we shall consider discovery a success only when the extreme value is from the contaminating distribution.

For a 'good' test, we require $P_3$ and $P_5$ to be high. We also want $P_1 - P_3$ to be low; this is the probability that the test wrongly identifies a good observation as discordant. $P_3/P_5$ is the probability that the contaminant shows up as the outlier, and it might appear that one would like this ratio to be as large as possible, in conflict with the requirement for a high value of $P_5$. Consider, however, two hypothetical tests with performance measures as follows:

test A: $P_3 = \tfrac{1}{2}$, $P_5 = 1$, $P_3/P_5 = \tfrac{1}{2}$.
test B: $P_3 = \tfrac{1}{2}$, $P_5 = \tfrac{1}{2}$, $P_3/P_5 = 1$.

We see that in test A, the contaminant has only a 50 per cent chance of showing up as the extreme value, indicating that the degree of contamination is not severe on average (the slippage parameter in $\bar{H}$ is not large); however, when the contaminant does appear as the outlier, it is certain to be detected. In test B, on the other hand, contamination is more severe and the contaminant is always the extreme value, but it is detected as discordant only 50 per cent of the time. What we require of a test is that it should identify contamination when this is sufficiently manifest, so test A is preferable to test B, and a high value of $P_5$ is desirable rather than a high value of $P_3/P_5$. The same line of argument indicates that $P_5$ takes precedence over $P_3$ as a measure of test performance. To sum up, a good test may be characterized by high $P_5$, high $P_1$, and low $P_1 - P_3$.

So far we have been talking in terms of a slippage alternative. Consider now the assessment of performance of a discordancy test against a mixture alternative. As before, take the particular situation of an upper outlier $x_{(n)}$ under test in a univariate sample. Under $\bar{H}$ the number of contaminants in the sample is no longer fixed as in the slippage case, but is a binomially-distributed random number which may be 0, 1 or more. The following events are clearly relevant to the assessment of performance:

D: that the test identifies $x_{(n)}$ as discordant.
E: that $\bar{H}$ holds and the sample contains one or more contaminants.
F: that $\bar{H}$ holds and $x_{(n)}$ is a contaminant.

The direct analogue of the performance measure $P_1$ defined in (3.2.2), i.e. the power, is $P(D \mid \bar{H})$. However $P(D \mid E)$, which is a power function
conditional on the actual presence of contamination, is a more useful measure than $P(D \mid \bar{H})$.

Analogously to the measures $P_3$, $P_5$ of (3.2.4) and (3.2.6) we can define measures $P(F \cap D \mid E)$ and $P(D \mid F)$ (which is of course the same as $P(D \mid F \cap E)$).

Characteristics of a good test are, by the previous line of argument, a high value of $P(D \mid F)$, a high value of $P(D \mid E)$, and a low value of $P(D \mid \bar{H}) - P(D \cap F \mid \bar{H})$, the probability that the test identifies as discordant an observation actually generated by the basic model.

In the case of a discordancy test against an inherent alternative, the situation simplifies. There is now no specifiable contaminant observation, and the probabilities $P_3$ and $P_5$ (and the events E and F) are undefined. The appropriate measure of performance of the test is the power $P(D \mid \bar{H})$.

3.3 THE MULTIPLE OUTLIER PROBLEM

We have discussed the testing of a single outlier for discordancy. New problems of procedure arise when the number of observations which appear aberrant in relation to the main data mass is more than one. We may have, for instance, a normal sample of size n, with two upper outliers $x_{(n-1)}$ and $x_{(n)}$ both of which are unusually far to the right of the other n - 2, or a normal sample with two lower outliers $x_{(2)}$ and $x_{(1)}$ unusually far to the left, or again a sample with a lower and an upper outlier-pair, $x_{(1)}$ and $x_{(n)}$, widely bracketing the main data mass. Again, a normal sample may contain three extreme values which appear to be outlying in relation to the main n - 3, perhaps all three upper, perhaps two upper and one lower, and so on. Similar situations may arise with gamma samples, though in the particular case of the exponential distribution its J-shape makes it likely that all outliers presenting themselves will be upper ones, not lower. Two or more outliers may in fact be encountered in samples from most univariate distributions or from distributions of higher dimensionality; and it is possible to find two or more outlying points in a regression, two or more outlying residuals underlying the observations from a designed experiment, or two or more outlying values in a time-series. In all these multiple outlier situations there are k (> 1) outliers or outlying points in a data set of size n, and the analyst envisages the possibility of up to k discordant values. Appropriate tests of discordancy will therefore be required.

For given sample size n there is an effective upper limit to the number of outliers k. To take an extreme case, one cannot consider $k = n - 1$ observations as outliers in relation to the remaining 1! Indeed the concept of the 'main data mass' is hardly meaningful if $k \geq n/2$, say. Intuitively, an upper limit of the form $k_{\max} = cn^a$ suggests itself for k, where c is a positive constant (perhaps 1) and a a constant between 0 and 1. Taking $a = \tfrac{1}{2}$, $c = 1$, for example, we would get $k_{\max} = \sqrt{n}$, so that one would not deal with more than 10 outliers in a sample of 100. Any attempt at quantification of k would depend on a sensible choice of the working and alternative hypotheses in relation to each other, and information regarding the best achievable performance (Section 3.2) by discordancy tests (Section 3.1). Such studies have not so far been undertaken. In published work to date, discordancy tests and other procedures for multiple outliers have been mainly confined to k = 2 and 3 irrespective of n.

Faced with a multiple outlier situation, there is a basic choice between two types of procedure, which we may call block procedures and consecutive procedures (sometimes referred to as 'sequential' procedures; see e.g. Dixon, 1950, and Section 3.3.2).

Suppose, to fix ideas, that we wish to test for discordancy two upper outliers $x_{(n-1)}$, $x_{(n)}$ in an exponential sample of size n. Take $x_{(1)}, \ldots, x_{(n-2)}$ as belonging to the distribution F with density $\theta e^{-\theta x}$ $(x > 0)$, and $x_{(n-1)}$ and $x_{(n)}$ as belonging to exponential distributions $G_1$, $G_2$ with respective densities $\lambda\theta e^{-\lambda\theta x}$, $\mu\theta e^{-\mu\theta x}$ $(x > 0)$. The working hypothesis is

$$H: \lambda = \mu = 1.$$

If we take as the single alternative hypothesis

$$\bar{H}: \lambda = \mu < 1,$$

we are led to a single discordancy test, as the result of which we either accept both outliers as consistent with the rest of the sample, or adjudge them both discordant. A possible test criterion in this context would be

This exemplifies a block procedure; we would be testing the multiple outliers en bloc.

On the other hand, we could go for a pair of consecutive alternatives to H, which might typically be as follows:

$$\bar{H}': \lambda = 1,\ \mu < 1;$$
$$\bar{H}'': \lambda < 1.$$

The procedure would be first to test H against $\bar{H}'$ using a test for a single upper outlier. If H is accepted, both outliers are declared consistent with the remainder of the sample, and the discordancy test terminates. If H is rejected, a revised working hypothesis confined to $x_{(1)}, \ldots, x_{(n-1)}$,

$$H'': \lambda = 1,$$

is tested against $\bar{H}''$.
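The following Python sketch is a rough illustration of the block and consecutive procedures just set out for the upper pair $x_{(n-1)}$, $x_{(n)}$ in an exponential sample. The statistics are assumptions chosen for illustration and are not necessarily the criteria intended in the text: $x_{(n)}/\sum x_j$ serves as the single upper outlier test at each consecutive stage, and $(x_{(n-1)} + x_{(n)})/\sum x_j$ serves as the block test. Both statistics are scale-free, so critical values can be simulated under H with θ = 1. The function names and sample values are hypothetical.

```python
import random


def critical_value(stat, n, alpha=0.05, reps=20000, seed=1):
    """Upper-alpha point of stat under H, estimated by simulation.

    Both statistics below are scale-free, so exponential samples with
    theta = 1 are used without loss of generality.
    """
    rng = random.Random(seed)
    null = sorted(stat([rng.expovariate(1.0) for _ in range(n)])
                  for _ in range(reps))
    return null[min(reps - 1, round((1 - alpha) * reps))]


def single_upper(xs):
    """Assumed single upper outlier statistic: x_(n) / sum of all x."""
    return max(xs) / sum(xs)


def block_pair(xs):
    """Assumed block statistic: (x_(n-1) + x_(n)) / sum of all x."""
    s = sorted(xs)
    return (s[-2] + s[-1]) / sum(xs)


def block(xs, alpha=0.05):
    """Block procedure: adjudge x_(n-1), x_(n) discordant together, or neither."""
    return block_pair(xs) > critical_value(block_pair, len(xs), alpha)


def consecutive(xs, alpha=0.05):
    """Consecutive procedure: number of upper values adjudged discordant (0, 1 or 2)."""
    s = sorted(xs)
    n = len(s)
    # Stage 1: test H against the alternative that only x_(n) is slipped.
    if single_upper(s) <= critical_value(single_upper, n, alpha):
        return 0  # accept H: neither x_(n) nor x_(n-1) discordant
    # Stage 2: revised working hypothesis confined to x_(1), ..., x_(n-1).
    # Simplification: reuses the unconditional null for a sample of size n - 1.
    if single_upper(s[:-1]) <= critical_value(single_upper, n - 1, alpha):
        return 1  # x_(n) discordant, x_(n-1) not
    return 2  # both discordant


if __name__ == "__main__":
    sample = [0.4, 1.1, 0.7, 2.0, 1.5, 30.0, 31.0]  # hypothetical data
    print(consecutive(sample), block(sample))  # prints "0 True"
```

For this hypothetical sample, the consecutive procedure stops at its first stage, because the value 30.0 inflates the denominator of $x_{(n)}/\sum x_j$. The block test, by contrast, adjudges the pair discordant. This foreshadows the masking effect taken up later in this section. The second stage here reuses the unconditional null distribution for a sample of size n - 1, which is a simplification.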
Again we would use a test for a single upper outlier. We thus have a consecutive procedure, with three possible paths:

Accept H → adjudge neither $x_{(n)}$ nor $x_{(n-1)}$ discordant.
Reject H, accept $H''$ → adjudge $x_{(n)}$ discordant, but not $x_{(n-1)}$.
Reject H, reject $H''$ → adjudge both $x_{(n)}$ and $x_{(n-1)}$ discordant.

The above discussion illustrates the choice between block and consecutive procedures. An apparently similar kind of choice is, of course, familiar in the testing of the relevance of a subset of the regressor variables $x_1, x_2, \ldots$ in a regression analysis. In the regression situation the data context sometimes gives a guide as to when a block procedure is appropriate, in preference to the consecutive testing of variables one by one which is the norm. For example, we might be studying the effect of nine factors $x_1, \ldots, x_9$ on the efficiency, y, of a domestic heating device tested in situ. Suppose $x_1$, $x_2$ are properties of the fuel used, $x_3, \ldots, x_7$ are different features of the internal construction of the house, and $x_8$, $x_9$ are measurements of ambient temperature at ground and roof level outside the house. Then it is reasonable to test the significance of $x_8$ and $x_9$ jointly (a block procedure) on the basis that if y is affected at all by the outside temperature we should clearly take both of the outside temperature measurements into account.

In the usual multiple outlier situation, all the observations have the same status on the working hypothesis, and discordancy tests are only invoked when particular values show up as outliers; thus no guidance is available from the data context as to whether a block procedure should be used. The exception is when we have prior information which leads us to focus on some particular subset of the data: for example, if $x_{(1)}$, $x_{(n)}$ were observations made by experimenter A and the other n - 2 by experimenter B.

In principle, the choice between a block procedure and a consecutive procedure in a multiple outlier situation depends on the relative performances of the test procedures in relation to an alternative hypothesis $\bar{H}$. We have to say 'in principle' rather than 'in practice', since in most cases no performance criteria have so far been evaluated. Dixon (1950) gives values of performance measure $P_3$ (Section 3.2) for block tests of two upper (or lower) outliers in a normal sample using the test statistics

(i) $$\frac{\text{sum of squares about mean for } n-2 \text{ observations omitting outliers}}{\text{sum of squares about mean for } n \text{ observations including outliers}}$$

and

(ii)

Ferguson (1961a) gives values of the power function $P_1$ for general multiple outlier discordancy tests based on the sample skewness and the sample kurtosis for an unspecified number of outliers; see Worksheets below.

The first direct quantitative comparisons of block and consecutive procedures appear to be those of McMillan and David (1971) and McMillan (1971). McMillan and David consider a normal sample with known variance, unity say, containing two contaminants from a normal distribution also having unit variance but with mean slipped to the right. They evaluate $P_3$ (Section 3.2) for a block discordancy test based on the sum of the two largest deviations from the sample mean; and they also consider a consecutive procedure based at each stage on the largest deviation from the mean, evaluating the probabilities that at least one contaminant is identified as discordant and that both contaminants are so identified; see Section 3.3.2 below. McMillan (1971) gives corresponding results in terms of studentized deviations for the case when the underlying variance is unknown. Hawkins (1973) extends McMillan's results and gives values of the power function for the consecutive test in various cases. As Hawkins says in his conclusion:

we [have discussed] the problem of repeated use of a single outlier statistic. The null hypothesis distributions are solved, but much research remains to be done on the alternative hypothesis distributions.

Consecutive procedures have an obvious appeal, but, as has long been recognized, they suffer in the form described above from one inherent limitation. This is the possible effect of masking. Suppose, to fix ideas, that two upper outliers $x_{(n-1)}$, $x_{(n)}$ are to be tested for discordancy by up to two consecutive applications of a test using a statistic of the form N/D, where N is a measure of the separation of the greatest value from the rest of the sample and D is a measure of the spread of the sample. At the first stage of the test, where we consider $x_{(n)}$ alone, the D-value will be large since it involves the outlier $x_{(n-1)}$, so that $x_{(n)}$ may not be adjudged discordant; the procedure terminates without a second test and both $x_{(n)}$ and $x_{(n-1)}$ are declared consistent with the remainder of the sample. On the other hand a block test of $x_{(n)}$ and $x_{(n-1)}$ as a pair might identify them as highly discordant. This is what would happen, for example, with a sample such as 3, 4, 7, 8, 10, 949, 951. In a phrase due to Murphy (1951), $x_{(n-1)}$ has had a masking effect on the identification of $x_{(n)}$; the masking phenomenon was discussed as early as 1936 by Pearson and Chandra Sekar. An interesting alternative danger has recently been described by Fieller (1976), that false conclusions may be drawn owing to an effect which he terms swamping. For example, consider the sample 3, 4, 7, 8, 10, 13, 951. A block procedure applied to the upper two values 13 and 951 may well declare them discordant as a pair; the extreme outlier 951 has 'carried' the otherwise unexceptionable value 13.

3.3.1 Block procedures for multiple outliers in univariate samples

The considerations governing discordancy tests for single outliers, discussed earlier in this chapter, extend to block-type discordancy tests for multiple
outliers, in the construction of tests, the existence in some cases of inclusive and exclusive measures and of recursive relations for null distributions, and in the setting up of performance criteria.

As in the single outlier case (Section 3.1), appealing test statistics can be set up on an intuitive basis, and 'best' tests can be constructed on the maximum likelihood ratio principle or, again, on the principle of local optimality. Consider again the testing of a pair of upper outliers $x_{(n-1)}, x_{(n)}$ in an exponential sample. On intuitive grounds (perhaps by generalization from the case of a single upper outlier) one could propose

$$T_{(n-1,n)} = \frac{x_{(n-1)} + x_{(n)}}{\sum_{j=1}^{n} x_j} \qquad (3.3.1)$$

as a sensible statistic. Again, Dixon-type statistics such as $\{x_{(n)} - x_{(n-2)}\}/\{x_{(n-2)} - x_{(1)}\}$ have a natural appeal. On the other hand, let us see where the maximum likelihood ratio principle leads. Our working hypothesis is

H: F,

declaring that all the observations $x_1, \ldots, x_n$ belong to the distribution F with density $\theta e^{-\theta x}$ $(x > 0)$, $\theta$ unknown.

Suppose first that we have a slippage alternative $\bar{H}$ stating that $n - 2$ of the observations belong to F and the remaining two, $x_{n-1}$ and $x_n$ say, come from the exponential distribution $G_\lambda$ with density $\lambda\theta e^{-\lambda\theta x}$ $(x > 0;\ \lambda < 1)$. Calculations similar to those of equations (3.1.1) to (3.1.5) lead to

$$T = \frac{x_{n-1} + x_n}{\sum_{j=1}^{n} x_j} \qquad (3.3.2)$$

as test statistic, provided that $(x_{n-1} + x_n)/2 \ge \bar{x}'$, the mean of $x_1, \ldots, x_{n-2}$. If instead of $\bar{H}$ we adopt the corresponding labelled slippage alternative

$\bar{H}'$: $x_{(1)}, \ldots, x_{(n-2)}$ belong to F; $x_{(n-1)}, x_{(n)}$ belong to $G_\lambda$,

calculations similar to those of equations (3.1.6) to (3.1.9) lead to $T_{(n-1,n)}$ as test statistic. Alternatively, $T_{(n-1,n)}$ can be set up on the basis of a multiple decision argument as in (3.1.10).

But $\bar{H}$, $\bar{H}'$ are not the only pair of slippage alternatives. Consider instead the slippage alternative $\bar{H}''$ stating that $x_1, \ldots, x_{n-2}$ belong to F, $x_{n-1}$ belongs to $G_\lambda$, and $x_n$ belongs to the exponential distribution $G_\mu$ with density $\mu\theta e^{-\mu\theta x}$ $(x > 0;\ \mu < \lambda)$. The maximum likelihood ratio statistic is now not T but

$$T'' = \left(\frac{x_{n-1}x_n}{\bar{x}^2}\right)^{1/n}\left\{\frac{n}{n-2}\left(1 - \frac{x_{n-1} + x_n}{n\bar{x}}\right)\right\}^{(n-2)/n}, \qquad (3.3.3)$$

where $\bar{x}$ is the mean of all n observations. The multiple decision argument leads to the statistic

$$T''_{(n-1,n)} = \left(\frac{x_{(n-1)}x_{(n)}}{\bar{x}^2}\right)^{1/n}\left\{\frac{n}{n-2}\left(1 - \frac{x_{(n-1)} + x_{(n)}}{n\bar{x}}\right)\right\}^{(n-2)/n}. \qquad (3.3.4)$$

On the other hand, the labelled slippage alternative corresponding to $\bar{H}''$, viz.

$\bar{H}'''$: $x_{(1)}, \ldots, x_{(n-2)}$ belong to F; $x_{(n-1)}$ belongs to $G_\lambda$; $x_{(n)}$ belongs to $G_\mu$,

does not now lead to $T''_{(n-1,n)}$, but gives a maximum likelihood ratio test statistic which cannot be expressed in closed form.

As regards performance criteria for tests of multiple outliers, the discussion of Section 3.2 carries over to the block test situation, the 'contaminant' now being a contaminant subset of two or more observations.

Published work on block procedures includes Grubbs (1950) and Tietjen and Moore (1972, but see the cautionary remark on Worksheet N4) for normal samples, and Likes (1966) and Lewis and Fieller (1978) for gamma samples.

3.3.2 Consecutive procedures for multiple outliers in univariate samples

The possibility of testing multiple outliers consecutively for discordancy has been mentioned by a number of authors, mainly with reference to the masking effect; see Pearson and Chandra Sekar (1936), Dixon (1953), Ferguson (1961b), David (1970, p. 185), and Tietjen and Moore (1972). However, as regards quantitative discussion of actual tests and their properties, nearly all the references in the literature relate to block procedures. Notable exceptions are the papers by McMillan and David (1971), McMillan (1971), and Hawkins (1973) mentioned above.

Up to now authors have mostly used the word 'sequential' when referring to the successive testing of multiple outliers one at a time. We prefer the word 'consecutive'. Sequential testing, in common statistical parlance, implies that the sample size is not fixed but is determined in each realization in relation to the values of the earlier observations. In successive testing of multiple outliers this sequential property applies to the number of times the test is used, not to the sample size, which is fixed.

Consecutive procedures present no separate problem of test construction, since they merely involve repeated use of single outlier tests from the available repertoire. As regards performance criteria, however, we are in a fresh situation. The measures described in Section 3.2 will need generalizing since there is, by definition, not one contaminant but several. Suppose for example that the alternative hypothesis $\bar{H}$ envisages two discordant values, liable to appear as upper outliers, in a sample of n. The following events defined under $\bar{H}$ would seem to be relevant:

$E_1$: that $x_{(n)}$ is one of the two contaminants
$E_2$: that $x_{(n-1)}$ is one of the two contaminants
$E = E_1 \cap E_2$: that the two contaminants are the two outliers
$D_0$: that $x_{(n)}$ is not adjudged discordant (on the first test)
$D_1$: that $x_{(n)}$ is adjudged discordant (on the first test) but $x_{(n-1)}$ is not adjudged discordant (on the second test)
$D_2$: that $x_{(n)}, x_{(n-1)}$ are both adjudged discordant (requiring two tests).

'Total' measures corresponding to $P_1$, $P_3$, $P_5$ in Section 3.2 will be, respectively, $P(D_2)$, $P(D_2 \cap E)$, $P(D_2 \mid E)$. But 'partial' measures are also of interest, such as $P(D_1 \mid E)$ and $P(D_0 \mid E)$ $(= 1 - P(D_1 \mid E) - P(D_2 \mid E))$.

The measures of performance used by McMillan and David (1971) and McMillan (1971) in their pioneering papers on consecutive testing are not in fact any of those we have listed above, but are

$P(C_1)$, $P(C_2)$, $P(C_3)$

defined as follows. Denoting by R the discordancy region for the first-stage test based on all n observations, $C_1$ is the event that at least one of the two contaminants is in R (and so adjudged discordant), and $C_2$ the event that both contaminants are in R. $C_3$ is the event that at least one of the two contaminants is in R and the other is in the discordancy region for a second test based on the reduced sample omitting the first contaminant. The values of $P(C_1)$ and $P(C_3)$ should correspond reasonably closely with $P(D_1 \mid E) + P(D_2 \mid E)$ and $P(D_2 \mid E)$ respectively, at any rate when the degree of contamination is marked enough to make it virtually certain that the contaminants appear as the outliers. However, in general the use of $P(C_1)$, $P(C_2)$, and $P(C_3)$ seems rather arbitrary. $P(C_2)$ in particular seems difficult to interpret.

An event which can usefully be defined is

$D_3$: that $x_{(n)}$ is not adjudged discordant (on the first test), but $x_{(n-1)}$ would be adjudged discordant on a hypothetical second test omitting $x_{(n)}$ from the sample.

$P(D_3)$ is clearly a measure of the masking effect. It is relatively easy to calculate; we give a numerical illustration.

Example. Suppose we have a sample of size 10 from a normal distribution with unknown mean and variance, with eight of the values, $x_{(1)}, \ldots, x_{(8)}$, grouped together near 0 and $x_{(9)}$ far to the right at $a$, say. Using consecutive testing, with a discordancy statistic of the form $(x_{(n)} - \bar{x})/s$ at the 5 per cent level, it will be impossible to identify $x_{(10)}$ as discordant, owing to the masking effect of $x_{(9)}$, unless it exceeds $1.47a$. For otherwise $1 - P(D_3)$ will exceed 0.05.

[Figure: dot diagram of the sample, with eight values clustered near 0, $x_{(9)}$ at $a$, and $x_{(10)}$ to its right.]

3.4 DISCORDANCY TESTS FOR PRACTICAL USE

We now present detailed information on a wide range of useful tests. 'Useful' means two things here: first, that the test performs reasonably well, even if not optimally, in relation to some meaningful alternative hypothesis; secondly, that some information on percentage points is available, at least an inequality if not an extensive tabulation. The main types of distribution for which useful tests are available are gamma and normal, and these are dealt with in Sections 3.4.2 and 3.4.3 respectively. Some tests for samples from other distributions, including uniform, log-normal, and Poisson, are described in Section 3.4.4.

3.4.1 Guide to use of the tests

In each case (gamma, normal, uniform, etc.) we commence with some general discussion of the types of outlier situation where the distribution might be appropriate. We then present for each distribution a contents list of tests (pp. 77-79, 90-93, 116); we have labelled the tests G1, G2, ... for gamma samples, P1, P2, ... for Poisson samples, and so on, as is explained in detail later. For each individual test we then give a worksheet, which presents systematically the following information where available:

Label and purpose of test.
Test statistic, denoted by T with the test label as subscript. For example, we denote the statistic for test N1 by $T_{N1}$.
Test distribution, i.e. the distribution of the statistic on the working hypothesis that the outlying value or set of values is consistent with the rest of the sample. The probability density function and the distribution function for this distribution are denoted respectively by $f_n(t)$, $F_n(t)$, where n is the sample size.
Recurrence relationship for the test distribution (where appropriate).
Simple inequality for the significance probability (where appropriate). The significance probability attaching to an observed value t of a discordancy statistic T is denoted here by SP(t). That is to say, SP(t) is the probability that, on the working hypothesis, T takes values more discordant than t (for most tests this means $T > t$).
Tabulated significance levels in the form of references to where these will be found in the set of tables in the Appendix at the back of the book, together with source attribution.
Further tables: reference to books and journals extending our tabulated significance levels, or tabulating other quantities of interest, for example, power.
References to other published material on the test, such as derivation of the test distribution, optimality properties, power considerations, etc.
Properties of test: advantages and disadvantages, including statement of any secondary features of the data against which the test provides a particular
safeguard, such as a suspicious least value when testing for an upper outlier; whether the test has a theoretical validation, such as being a maximum likelihood ratio test for some alternative; information on power or other performance measures, if available.

Illustrative examples are also given for some of the tests.

The following notation is used for standard distributions, random variables and functions:

Notation for distributions or random variables
$N(\mu, \sigma^2)$: normal with mean $\mu$ and variance $\sigma^2$
$t_\nu$: Student's t with $\nu$ degrees of freedom
$F_{\nu_1,\nu_2}$: variance-ratio (or F) with $\nu_1$ and $\nu_2$ degrees of freedom
$\Gamma(r, \lambda)$: gamma with scale parameter $\lambda$ and shape parameter $r$, i.e. with density $f(x) = [\lambda^r\Gamma(r)]^{-1}x^{r-1}\exp(-x/\lambda)$ $(x > 0)$
$E(\lambda)$: exponential with mean $\lambda$, i.e. with density $f(x) = \lambda^{-1}\exp(-x/\lambda)$ $(x > 0)$, 0 $(x < 0)$; same as $\Gamma(1, \lambda)$
$E(\lambda; a)$: exponential with scale parameter $\lambda$ and origin at $a$, i.e. with density $f(x) = \lambda^{-1}\exp[-(x - a)/\lambda]$ $(x > a)$, 0 $(x < a)$
$P(\mu)$: Poisson with mean $\mu$
$B(n, p)$: binomial with parameters $n, p$
$H(N; n, r)$: hypergeometric with parameters $N, n, r$

Notation for functions
$\phi(t)$: probability density function of $N(0, 1)$, i.e. $(2\pi)^{-1/2}\exp(-\tfrac{1}{2}t^2)$
$\Phi(t)$: distribution function of $N(0, 1)$, i.e. $\int_{-\infty}^{t}\phi(u)\,du$
$B(r, s)$: beta function with parameters $r$ and $s$, i.e. $\Gamma(r)\Gamma(s)/\Gamma(r + s)$
$b_{r,s}(t)$: beta density with parameters $r$ and $s$, i.e. $[B(r, s)]^{-1}t^{r-1}(1 - t)^{s-1}$ $(0 \le t \le 1)$

3.4.2 Discordancy tests for gamma (including exponential) samples

Until fairly recently, most of the published work on outliers in univariate samples has been in the context of normal distributions. However, problems of outliers in samples from gamma distributions, and in particular from exponential distributions, are of considerable practical importance. Outlier situations in exponential samples arise naturally in such contexts as life testing; outliers in $\chi^2$ samples arise in analysis of variance; outliers in gamma samples of arbitrary shape parameter arise with skew-distributed data, for which a gamma distribution is often a useful pragmatic model; and outliers in both gamma and specifically exponential samples arise in any contexts where Poisson processes are appropriate basic models, e.g. in studying traffic flow, failures of electronic equipment, biological aggregation, or even deaths from horse kicks. Attention to such problems in the literature has developed in recent years (Epstein, 1960a, 1960b; Laurent, 1963; Basu, 1965; Likes, 1966; Kabe, 1970; Kale and Sinha, 1971; Joshi, 1972b; Sinha, 1972, 1973a, 1973b, 1973c; Veale and Kale, 1972; Mount and Kale, 1973; Kale, 1974a, 1975c; Lewis and Fieller, 1978).

Further applications arise through transformation. Procedures for outliers in exponential samples can sometimes be applied to outliers in samples from other distributions, such as the extreme-value distribution and the Weibull, by transforming the observations. For example, if the n values $x_1, \ldots, x_n$ are (on the working hypothesis) a sample from the extreme-value distribution with distribution function $P(X \le x) = \exp\{-\exp[-(x - a)/b]\}$, then the n transformed values $\exp(-x_1/b), \ldots, \exp(-x_n/b)$ are a sample from the exponential distribution with mean $\exp(-a/b)$. Thus if b is known but a unknown, an outlier in the extreme-value sample can be tested by applying to the transformed values a suitable discordancy test for an exponential sample. See Section 3.4.4.

Outlier situations can also arise in the context of shifted exponential or gamma distributions. If the origin of the exponential distribution $E(\lambda)$ with density $\lambda^{-1}\exp(-x/\lambda)$ $(x > 0)$, 0 $(x < 0)$ is shifted to $x = a$, say, we get the distribution $E(\lambda; a)$ with density $\lambda^{-1}\exp[-(x - a)/\lambda]$ $(x > a)$, 0 $(x < a)$. (Similar remarks apply of course to the gamma distribution $\Gamma(r, \lambda)$.) Now some discordancy tests for exponential or gamma samples do require the assumption that the origin of the distribution is at zero, or at any rate is known; for example, the test based on the statistic $x_{(n)}/\sum x_j$ assumes that the origin is zero, and it can obviously be adapted to a known non-zero origin $a$ by using $(x_{(n)} - a)/(\sum x_j - na)$ as statistic. In contrast, there are other tests which do not depend on knowledge of the origin, for example the Dixon-type test based on the statistic $(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(1)})$; such tests are useful for two reasons. First, they are needed in the data contexts, sometimes encountered, where a shifted gamma or exponential distribution is the appropriate model. For instance, the development times of diapausing pupae of the cotton bollworm under conditions of constant temperature may be regarded as exponentially distributed with non-zero origin $t_{\min}$, this parameter being a minimum development time. Secondly, they are useful for testing outliers in samples from Pareto distributions, since the above-described transformation technique can be applied. Specifically, if the n values $x_1, \ldots, x_n$ are a sample from a Pareto distribution with origin $a$ $(> 0)$ and shape parameter $r$, then the n transformed values $\ln x_1, \ldots, \ln x_n$ are a sample from an exponential distribution with origin $\ln a$ and scale parameter $1/r$, i.e. from $E(1/r; \ln a)$ in our notation. See Section 3.4.4.

Contents List: Gamma Samples

The gamma distribution with scale parameter $\lambda$ and shape parameter $r$, i.e. the distribution with density $f(x) = [\lambda^r\Gamma(r)]^{-1}x^{r-1}\exp(-x/\lambda)$ $(x > 0)$, is
denoted by r(r, A). If tbe origin is sbifted to a, tbe density is [A 'f(r)r 1 (x- ay-t Worksheet
exp[ -(x- a)/A] (x> a). r(v/2, 2) is tbe x2 -distribution witb v degrees La bel page no. Description of test Statistic
of freedom, x;.
r(l, A) is tbe exponential distribution witb mean A, denoted
El O 86 Test for a lower outlier-pair x(l), x<2> in x(3)- x(1l
bere by E(A). Tbe corresponding distribution witb origin sbifted to x= a is an exponential sample with origin un- X(n)- X(l)
denoted by E(A; a). known
In ali tbe tests given bere, A is assumed unknown. Except in test Ga13, Eall 86 Generai Dixon-type test for an exponen- X(s)-X(r)
tbe sbape parameter r is assumed known. Tbe tests are classified as follows: tial sample, using knowledge of origin x<q>-a
a
Code | Distribution under the working hypothesis
G | gamma with unknown origin
E | exponential with unknown origin
Ga | gamma with known origin 0 (or more generally $a$)
Ea | exponential with known origin 0 (or more generally $a$)

Needless to say, any G- or Ga-test can be applied to an exponential sample as a special case ($r = 1$); E- and Ea-tests are specific to the exponential case (generally because tables are only available for this case).

Worksheet label | Page no. | Description of test | Statistic
Ga1(Ea1) | 79 | Test for a single upper outlier $x_{(n)}$ in a gamma sample | $x_{(n)}/\sum x_j$
Ea2 | 80 | Test for a single upper outlier $x_{(n)}$ in an exponential sample | $(x_{(n)}-x_{(n-1)})/x_{(n)}$
E2 | 81 | Test for a single upper outlier $x_{(n)}$ in an exponential sample irrespective of origin | $(x_{(n)}-x_{(n-1)})/(x_{(n)}-x_{(1)})$
Ga3(Ea3) | 81 | Test for a single lower outlier $x_{(1)}$ in a gamma sample | $x_{(1)}/\sum x_j$
E4 | 82 | Test for a single lower outlier $x_{(1)}$ in an exponential sample with origin unknown | $(x_{(2)}-x_{(1)})/(x_{(n)}-x_{(1)})$
Ga5(Ea5) | 83 | Test for $k$ ($\ge 2$) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$ in a gamma sample | $(x_{(n)}+\cdots+x_{(n-k+1)})/\sum x_j$
Ea6 | 83 | Test for an upper outlier-pair $x_{(n-1)}, x_{(n)}$ in an exponential sample | $(x_{(n)}-x_{(n-2)})/x_{(n)}$
E6 | 83 | Test for an upper outlier-pair $x_{(n-1)}, x_{(n)}$ in an exponential sample irrespective of origin | $(x_{(n)}-x_{(n-2)})/(x_{(n)}-x_{(1)})$
Ga7(Ea7) | 84 | Test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a gamma sample | $x_{(n)}/x_{(1)}$
E8 | 85 | Test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in an exponential sample with origin unknown | $(x_{(n-1)}-x_{(2)})/(x_{(n)}-x_{(1)})$
Ga9(Ea9) | 85 | Test for $k$ ($\ge 2$) lower outliers $x_{(1)}, \ldots, x_{(k)}$ in a gamma sample | $(x_{(1)}+\cdots+x_{(k)})/\sum x_j$
E11 | 87 | General Dixon-type test for an exponential sample, irrespective of origin | $(x_{(s)}-x_{(r)})/(x_{(q)}-x_{(p)})$
E12 | 88 | Test for presence of an undefined number of discordant values in an exponential sample | Shapiro and Wilk's W-statistic. See worksheet
Ga13 | 88 | Testing for discordancy in a gamma sample of unknown shape parameter $r$ by transformation of the variables | See worksheet

Ga1(Ea1) Discordancy test for a single upper outlier $x_{(n)}$ in a gamma (or exponential) sample

Test statistic:
$$T_{Ga1} = \frac{\text{outlier}}{\text{sum of observations}} = \frac{x_{(n)}}{\sum x_j}.$$

Test distribution:
For $\Gamma(r, \lambda)$: …
For $E(\lambda)$, $0 \le t \le 1$:
$$1 - F_n(t) = \sum_{j=1}^{[1/t]} (-1)^{j-1}\binom{n}{j}(1-jt)^{n-1},$$
where $[1/t]$ denotes the integer part of $1/t$.

Recurrence relationship:
$$f_n(t) = n\,b_{r,(n-1)r}(t)\,F_{n-1}\{t/(1-t)\}.$$

Inequality:
$$SP(t) \le nP\left[F_{2r,2(n-1)r} > \frac{(n-1)t}{1-t}\right],$$
with equality for $t \ge \tfrac12$.

Tabulated significance levels: Table I, pp. 290-291; reproduced (with appropriate change of notation) from Eisenhart, Hastay, and Wallis (1947), Tables 15.1 and 15.2, pages 390-391.

References: Fisher (1929), Cochran (1941).

Properties of test: No special features. All-purpose, maximum likelihood ratio test for labelled slippage alternative.

Example: Table 3.1 shows a sample of 131 excess cycle times in steel manufacture.
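As a rough illustration of how the statistics in the worksheet index are formed from the ordered sample, the following Python sketch computes several of them at once. It assumes the known origin for the Ga/Ea statistics is 0, that the sample is a plain list of numbers, and that `order_stats` and `index_statistics` are illustrative names rather than anything defined in the text.

```python
def order_stats(xs):
    """Return an accessor x(i) giving the i-th smallest value (1-based, as in the text)."""
    s = sorted(xs)
    return lambda i: s[i - 1]


def index_statistics(xs, k=2):
    """Return the worksheet statistics for the sample xs.

    k is the number of outliers in the block statistics Ga5 and Ga9.
    The Ga/Ea statistics assume the known origin is 0.
    """
    n = len(xs)
    x = order_stats(xs)
    total = sum(xs)        # sum of observations, measured from the origin 0
    rng = x(n) - x(1)      # sample range, used when the origin is unknown
    return {
        "Ga1": x(n) / total,
        "Ea2": (x(n) - x(n - 1)) / x(n),
        "E2": (x(n) - x(n - 1)) / rng,
        "Ga3": x(1) / total,
        "E4": (x(2) - x(1)) / rng,
        "Ga5": sum(x(i) for i in range(n - k + 1, n + 1)) / total,
        "Ea6": (x(n) - x(n - 2)) / x(n),
        "E6": (x(n) - x(n - 2)) / rng,
        "Ga7": x(n) / x(1),
        "E8": (x(n - 1) - x(2)) / rng,
        "Ga9": sum(x(i) for i in range(1, k + 1)) / total,
        "E10": (x(3) - x(1)) / rng,
    }


# A made-up sample of five values, used only to exercise the formulas.
stats = index_statistics([4.0, 1.0, 10.0, 3.0, 2.0])
for label, value in stats.items():
    print(f"{label:>4}: {value:.4f}")
```

Only the statistics are computed here; the significance probabilities come from the individual worksheets.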
Table 3.1
Excess cycle time x | Frequency | Excess cycle time x | Frequency
1 | 18 | 11 | 6
2 | 12 | 12 | 7
3 | 18 | 13 | 2
4 | 16 | 14 | 1
5 | 10 | 15 | 3
6 | 4 | 21 | 3
7 | 9 | 32 | 2
8 | 9 | 35 | 1
9 | 2 | 92 | 1
10 | 7 | Total | 131

The sample of size 130 obtained by omitting the outlier $x_{(131)} = 92$ has mean $\bar{x} = 6.44$, variance $s^2 = 38.14$, standard deviation $s = 6.18$, and third and fourth moments about the mean $m_3 = 493.4$, $m_4 = 13444$. Hence $\bar{x}/s = 1.04$, $m_3/s^3 = 2.09$, $m_4/s^4 = 9.24$, suggesting that the distribution may reasonably be assumed exponential ($\mu/\sigma = 1$, $\mu_3/\sigma^3 = 2$, $\mu_4/\sigma^4 = 9$ for an exponential distribution). On this assumption we can test the outlier 92 for consistency with the other 130 values using test Ea1. The value of $T_{Ea1}$ is $t = 92/929 = 0.0990$, so
$$SP(t) \le 131P\left(F_{2,260} > \frac{130 \times 0.0990}{0.9010}\right) = 131P(F_{2,260} > 14.28) \approx 131\left(1 + \frac{14.28}{130}\right)^{-130} = 0.00017,$$
i.e. the evidence for regarding the value 92 as being too large to have arisen from the same distribution as the other 130 values is very strong.

Ea2 Discordancy test for a single upper outlier $x_{(n)}$ in an exponential sample

Test statistic:
$$T_{Ea2} = \frac{\text{excess}}{\text{outlier}} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)}}.$$

Test distribution:
$$F_n(t) = 1 - n(n-1)B\left(\frac{2-t}{1-t}, n-1\right) \quad (0 \le t \le 1).$$

Tabulated significance levels: Table III, page 293; abridged from Likes (1966), Table 1, page 49, where 10 per cent, 5 per cent, and 1 per cent points are given for $n = 2(1)20$.

Reference: Likes (1966).

Properties of test: Vulnerable to masking effect from $x_{(n-1)}$.

Example: Applying test Ea2 to the example discussed in Worksheet Ga1(Ea1), the value of $T_{Ea2}$ with $n = 131$ is $t = (92 - 35)/92 = 0.6196$. Hence
$$SP(t) = 1 - F_n(t) = 131 \times 130\,B(3.629, 130) = 131 \times 130\,\Gamma(3.629)\Gamma(130)/\Gamma(133.629)$$
$$= \frac{131!\,(2.629)(1.629)(0.897)}{132.63^{133.13}\,e^{-132.63}\sqrt{2\pi}} = 0.0013.$$
Compare $SP(t) \approx 0.00017$ for test Ea1.

E2 Discordancy test for a single upper outlier $x_{(n)}$ in an exponential sample with unknown origin

Test statistic:
$$T_{E2} = \frac{\text{excess}}{\text{range}} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}.$$

Test distribution:
$$F_n(t) = 1 - (n-1)(n-2)B\left(\frac{2-t}{1-t}, n-2\right) \quad (0 \le t \le 1).$$
$T_{E2}$ for a sample of size $n$ has the same test distribution as $T_{Ea2}$ for a sample of size $n-1$.

Tabulated significance levels: Table III, page 293; see Worksheet Ea2.

References: Likes (1966), Kabe (1970).

Properties of test: Dixon-type test. Vulnerable to masking effect from $x_{(n-1)}$.

Ga3(Ea3) Discordancy test for a single lower outlier $x_{(1)}$ in a gamma (or exponential) sample

Test statistic:
$$T_{Ga3} = \frac{\text{outlier}}{\text{sum of observations}} = \frac{x_{(1)}}{\sum x_j}.$$

Test distribution:
For $E(\lambda)$, $f_n(t) = n(n-1)(1-nt)^{n-2}$ $(0 \le t \le 1/n)$.
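The Ea1 calculation in the excess cycle time example can be checked numerically. This sketch uses only the Python standard library and takes the summary figures quoted in the text ($n = 131$, $\sum x_j = 929$, $x_{(n)} = 92$) rather than recomputing them from the frequency table. For $r = 1$ the $F_{2,\nu}$ tail has the closed form $(1 + 2f/\nu)^{-\nu/2}$, and the exact value is taken from Fisher's (1929) alternating sum for the exponential case; the function names are hypothetical.

```python
import math

# Summary values quoted in the Ga1(Ea1) example (not recomputed from Table 3.1).
n, total, largest = 131, 929.0, 92.0
t = largest / total  # T_Ea1 = x(n) / sum of observations


def ea1_bound(t, n):
    """Inequality SP(t) <= n P[F_{2,2(n-1)} > (n-1)t/(1-t)] for r = 1.

    For an F variable with 2 numerator degrees of freedom,
    P(F_{2,v} > f) = (1 + 2f/v)^(-v/2) exactly.
    Returns the F value and the bound.
    """
    f = (n - 1) * t / (1 - t)
    v = 2 * (n - 1)
    return f, n * (1 + 2 * f / v) ** (-v / 2)


def ea1_exact(t, n):
    """Exponential case: 1 - F_n(t) = sum_{j=1}^{[1/t]} (-1)^(j-1) C(n, j) (1 - jt)^(n-1)."""
    upper = min(n, math.floor(1 / t))  # C(n, j) = 0 for j > n
    return sum((-1) ** (j - 1) * math.comb(n, j) * (1 - j * t) ** (n - 1)
               for j in range(1, upper + 1))


f_value, bound = ea1_bound(t, n)
print(f"t = {t:.4f}, F = {f_value:.2f}, bound = {bound:.2e}, exact = {ea1_exact(t, n):.2e}")
```

For this $t$ only the first term of the alternating sum is appreciable, so the bound and the exact value agree to within about $10^{-8}$; both come out at roughly $1.7 \times 10^{-4}$.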
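The Ea2 significance probability is a single beta function, so it can be evaluated directly instead of through the Stirling approximation used in the worked example. A minimal sketch, assuming Python's `math.lgamma` for the log-beta function (which avoids overflow in $\Gamma(130)$); `ea2_sp` and `e2_sp` are hypothetical names.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma, stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def ea2_sp(t, n):
    """SP(t) = 1 - F_n(t) = n(n-1) B((2-t)/(1-t), n-1) for test Ea2."""
    a = (2 - t) / (1 - t)
    return n * (n - 1) * math.exp(log_beta(a, n - 1))


def e2_sp(t, n):
    """Test E2: T_E2 for size n is distributed as T_Ea2 for size n - 1."""
    return ea2_sp(t, n - 1)


# Excess cycle time example: x(n) = 92, x(n-1) = 35, n = 131.
t = (92 - 35) / 92
print(f"t = {t:.4f}, SP = {ea2_sp(t, 131):.4f}")
```

Working on the log scale keeps $\Gamma(130)/\Gamma(133.629)$ representable; for the cycle-time data the result is close to the 0.0013 of the worked example.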
For $\Gamma(2, \lambda)$ and $\Gamma(3, \lambda)$, the following expressions are available for small $n$:

$\Gamma(2, \lambda)$:
$f_2(t) = 12t(1-t)$ $(0 \le t \le \tfrac12)$
$f_3(t) = 60t(1-3t)(1-3t^2)$ $(0 \le t \le \tfrac13)$
$f_4(t) = 168t(1-4t)^2(1+3t-12t^2-4t^3)$ $(0 \le t \le \tfrac14)$
$f_5(t) = 360t(1-5t)^3(1+8t-18t^2-80t^3+65t^4)$ $(0 \le t \le \tfrac15)$
$f_6(t) = 660t(1-6t)^4(1+15t-360t^3+864t^5)$ $(0 \le t \le \tfrac16)$

$\Gamma(3, \lambda)$:
$f_2(t) = 60t^2(1-t)^2$ $(0 \le t \le \tfrac12)$
$f_3(t) = 504t^2(1-3t)(1-2t+4t^2-18t^3+21t^4)$ $(0 \le t \le \tfrac13)$
$f_4(t) = 1980t^2(1-8t+28t^2-224t^3+1540t^4-5264t^5+11032t^6-16832t^7+13696t^8) = 1980t^2(1-4t)^2(1+12t^2-128t^3+324t^4-624t^5+856t^6)$ $(0 \le t \le \tfrac14)$

Recurrence relationship: …

Inequality:
$$SP(t) < nP\left[F_{2r,2(n-1)r} < \frac{(n-1)t}{1-t}\right].$$

Tabulated significance levels: Table II, page 292; freshly compiled.

Reference: Lewis and Fieller (1978).

Properties of test: All-purpose, maximum likelihood ratio test.

E4 Discordancy test for a single lower outlier $x_{(1)}$ in an exponential sample with unknown origin

Test statistic:
$$T_{E4} = \frac{\text{excess}}{\text{range}} = \frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}}.$$

Test distribution:
$$F_n(t) = 1 - (n-2)B\left(\frac{1+(n-2)t}{1-t}, n-2\right) \quad (0 \le t \le 1).$$

Tabulated significance levels: Table V, page 296; abridged from Likes (1966), Table 2, page 51, where 10 per cent, 5 per cent, and 1 per cent points are given for $n = 3(1)20$.

References: Likes (1966), Kabe (1970).

Properties of test: Dixon-type test. Note a practical difficulty in applying it: the smallest values $x_{(1)}, x_{(2)}$ need to be given to a sufficient degree of accuracy, which frequently will not be the case in practice (e.g. excess cycle times data, Table 3.1).

Ga5(Ea5) Discordancy test for $k$ ($\ge 2$) upper outliers in a gamma (or exponential) sample

Test statistic:
$$T_{Ga5} = \frac{\text{sum of outliers}}{\text{sum of observations}} = \frac{x_{(n-k+1)} + \cdots + x_{(n)}}{\sum x_j}.$$

Inequality:
$$SP(t) \le \binom{n}{k}P\left[F_{2kr,2(n-k)r} > \frac{(n-k)t}{k(1-t)}\right].$$

Reference: Fieller (1976).

Properties of test: Maximum likelihood ratio test. Inequality unlikely to be useful unless $k$ small.

Ea6 Discordancy test for an upper outlier-pair $x_{(n-1)}, x_{(n)}$ in an exponential sample

Test statistic:
$$T_{Ea6} = \frac{\text{excess}}{\text{outlier}} = \frac{x_{(n)} - x_{(n-2)}}{x_{(n)}}.$$

Test distribution:
$$F_n(t) = 1 - n(n-1)(n-2)\left[B\left(\frac{3-2t}{1-t}, n-2\right) - \tfrac12 B\left(\frac{3-t}{1-t}, n-2\right)\right].$$

Reference: Likes (1966).

Properties of test: Can be used as a discordancy test for $x_{(n)}$ if it is desired to insure against masking by $x_{(n-1)}$.

E6 Discordancy test for an upper outlier-pair $x_{(n-1)}, x_{(n)}$ in an exponential sample with unknown origin

Test statistic:
$$T_{E6} = \frac{\text{excess}}{\text{range}} = \frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(1)}}.$$

Test distribution:
$$F_n(t) = 1 - (n-1)(n-2)(n-3)\left[B\left(\frac{3-2t}{1-t}, n-3\right) - \tfrac12 B\left(\frac{3-t}{1-t}, n-3\right)\right].$$
$T_{E6}$ for a sample of size $n$ has the same distribution as $T_{Ea6}$ for a sample of size $n-1$.

Reference: Likes (1966).
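The small-$n$ densities listed for test Ga3 can be sanity-checked by integrating each over its support $0 \le t \le 1/n$. This Python sketch uses composite Simpson's rule (an illustrative choice, not a method from the text). The coefficients are those of the Ga3 expressions, with the $t^4$ coefficient of the $\Gamma(2,\lambda)$ density $f_5$ taken as 65 and the $\Gamma(3,\lambda)$ density $f_4$ written in factored form; with these values every density integrates to 1. The exponential-case lower-tail probability $1 - (1-nt)^{n-1}$, obtained by integrating $f_n(t) = n(n-1)(1-nt)^{n-2}$, is included too. All names are hypothetical.

```python
def simpson(f, a, b, m=2000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


# Densities f_n(t) of T_Ga3 = x(1)/sum x_j, each supported on 0 <= t <= 1/n.
GAMMA2 = {
    2: lambda t: 12 * t * (1 - t),
    3: lambda t: 60 * t * (1 - 3 * t) * (1 - 3 * t**2),
    4: lambda t: 168 * t * (1 - 4 * t)**2 * (1 + 3 * t - 12 * t**2 - 4 * t**3),
    5: lambda t: 360 * t * (1 - 5 * t)**3
                 * (1 + 8 * t - 18 * t**2 - 80 * t**3 + 65 * t**4),
    6: lambda t: 660 * t * (1 - 6 * t)**4 * (1 + 15 * t - 360 * t**3 + 864 * t**5),
}
GAMMA3 = {
    2: lambda t: 60 * t**2 * (1 - t)**2,
    3: lambda t: 504 * t**2 * (1 - 3 * t)
                 * (1 - 2 * t + 4 * t**2 - 18 * t**3 + 21 * t**4),
    4: lambda t: 1980 * t**2 * (1 - 4 * t)**2
                 * (1 + 12 * t**2 - 128 * t**3 + 324 * t**4 - 624 * t**5 + 856 * t**6),
}


def ea3_sp(t, n):
    """Lower-tail SP(t) = P(T_Ea3 <= t) = 1 - (1 - nt)^(n-1), for 0 <= t <= 1/n."""
    return 1 - (1 - n * t) ** (n - 1)


for label, table in (("Gamma(2)", GAMMA2), ("Gamma(3)", GAMMA3)):
    for n, f in table.items():
        print(label, n, round(simpson(f, 0.0, 1.0 / n), 8))
```

Simpson's rule is essentially exact here because every density is a polynomial of modest degree on a short interval.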
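The remaining closed forms for the exponential tests E4, Ea6 and E6 are sums of complete beta functions, and the Ga5(Ea5) inequality reduces to a beta tail, because $F_{2kr,2(n-k)r} > (n-k)t/\{k(1-t)\}$ holds exactly when a $\mathrm{Beta}(kr,(n-k)r)$ variable exceeds $t$. The sketch below treats E4 as the upper-tail probability $P(T_{E4} > t) = (n-2)B\{(1+(n-2)t)/(1-t), n-2\}$ (which equals 1 at $t = 0$) and assumes integer $r$, so that the beta tail can be written as a finite binomial sum. The function names are illustrative.

```python
import math


def beta_fn(a, b):
    """Complete beta function B(a, b) computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def e4_sp(t, n):
    """P(T_E4 > t) = (n-2) B((1+(n-2)t)/(1-t), n-2)."""
    return (n - 2) * beta_fn((1 + (n - 2) * t) / (1 - t), n - 2)


def ea6_sp(t, n):
    """1 - F_n(t) = n(n-1)(n-2)[B((3-2t)/(1-t), n-2) - B((3-t)/(1-t), n-2)/2]."""
    b1 = beta_fn((3 - 2 * t) / (1 - t), n - 2)
    b2 = beta_fn((3 - t) / (1 - t), n - 2)
    return n * (n - 1) * (n - 2) * (b1 - 0.5 * b2)


def e6_sp(t, n):
    """T_E6 for size n is distributed as T_Ea6 for size n - 1."""
    return ea6_sp(t, n - 1)


def beta_sf_int(t, a, b):
    """P(U > t) for U ~ Beta(a, b) with integer a, b >= 1.

    Uses P(U <= t) = P(Binomial(a+b-1, t) >= a).
    """
    m = a + b - 1
    return sum(math.comb(m, j) * t**j * (1 - t)**(m - j) for j in range(a))


def ga5_bound(t, n, k, r):
    """Ga5(Ea5) inequality: SP(t) <= C(n, k) P[Beta(kr, (n-k)r) > t], integer r."""
    return math.comb(n, k) * beta_sf_int(t, k * r, (n - k) * r)


print(round(e4_sp(0.3, 10), 4), round(ea6_sp(0.5, 10), 4), round(ga5_bound(0.3, 10, 2, 1), 4))
```

With $k = 1$ and $r = 1$ the Ga5 bound reduces to $n(1-t)^{n-1}$, the Ea1 bound used in the excess cycle time example.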
Ga7(Ea7) Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a gamma (or exponential) sample

Test statistic:
$$T_{Ga7} = \frac{\text{upper outlier}}{\text{lower outlier}} = \frac{x_{(n)}}{x_{(1)}}.$$

Test distribution: …

Recurrence relationship: …

Inequality:
$$SP(t) \le n(n-1)P(F_{2r,2r} > t).$$

Tabulated significance levels: Table IV, pp. 294-295; reproduced (with appropriate change of notation) from Pearson and Hartley (1966), Table 31, page 202.

References: Hartley (1950), David (1952).

Properties of test: As with test E4, not suitable where rounding makes the value of $x_{(1)}$ imprecise.

Hartley (1950) gives some values for the power of test Ga7(Ea7) in comparison with Bartlett's global test for heterogeneity of variances, the alternative hypothesis being that the $n$ population variances are a random sample from a log-normal distribution. The relative power of test Ga7 is 100 per cent when $n = 2$, and takes values in the range 90-100 per cent for larger sample sizes (up to twelve). Hartley's figures must be treated with caution, in view of inaccuracies in his tables of percentage points of $T_{Ga7}$ (later corrected by David, 1952).

Example: The times at which every fourth vehicle travelling westward along a main road in Hull passed an observer were recorded as follows (min:sec):

19:57, 20:14, 20:20, 20:38, 20:50,
21:30, 21:38, 21:46, 22:07.

There are eight time intervals, viz. 17, 6, 18, 12, 40, 8, 8, and 21 seconds; if the traffic flow is assumed to be random (i.e. in accord with a Poisson process), these will be independent values from a gamma distribution with shape parameter $r = 4$. Taking the values 40 and 6 as upper and lower outliers, their ratio is $40/6 = 6.7$. The 5 per cent significance point for $T_{Ga7}$ with $n = 8$, $r = 4$ is 10.5, so on the basis of this test there is no reason to believe that the traffic flow was not random. (This conclusion is unaffected if we make maximal allowance for rounding error and use values 40.5, 5.5 instead of 40, 6, giving a ratio 7.4.)

E8 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in an exponential sample with unknown origin

Test statistic:
$$T_{E8} = \frac{\text{reduced range}}{\text{range}} = \frac{x_{(n-1)} - x_{(2)}}{x_{(n)} - x_{(1)}}.$$

Test distribution:
$$F_n(t) = 1 - (n-1)!\,(1-t)^2\sum_{j=1}^{n-3}\frac{(-1)^{j+1}\,j}{(j+1)!\,(n-3-j)!\,(1+jt)\{n-1-(n-j-2)t\}}.$$

Reference: Kabe (1970).

Ga9(Ea9) Discordancy test for $k$ ($\ge 2$) lower outliers in a gamma (or exponential) sample

Test statistic:
$$T_{Ga9} = \frac{\text{sum of outliers}}{\text{sum of observations}} = \frac{x_{(1)} + \cdots + x_{(k)}}{\sum x_j}.$$

Test distribution: For $E(\lambda)$ and $k = 2$,
$$f_n(t) = \begin{cases} \dfrac{n(n-1)^2}{n-2}\left[\left(1 - \tfrac12 nt\right)^{n-2} - \{1-(n-1)t\}^{n-2}\right] & \text{for } 0 < t < \dfrac{1}{n-1},\\[2ex] \dfrac{n(n-1)^2}{n-2}\left(1 - \tfrac12 nt\right)^{n-2} & \text{for } \dfrac{1}{n-1} < t < \dfrac{2}{n},\\[2ex] 0 & \text{otherwise.} \end{cases}$$

Inequality:
$$SP(t) < \binom{n}{k}P\left[F_{2kr,2(n-k)r} < \frac{(n-k)t}{k(1-t)}\right].$$

References: Fieller (1976), Lewis and Fieller (1978).

Properties of test: Maximum likelihood ratio test.

Example: Epstein (1960b, p. 171) considers a life test in which the failure times of ten items are observed, totalling $\sum x_i = 600$ units. The failure times of the first two items to fail are the shortest of the ten, and total 24 units, so we can write
$x_{(1)} + x_{(2)} = 24$. It is assumed that failure times under given conditions are exponentially distributed. Epstein tests whether the first two items to fail can be regarded as having failed abnormally early; making use of a straightforward F-test, he concludes in favour of this hypothesis. Suppose however that the ten items were placed on test at different starting times and that the two shortest failure times occurred, not necessarily first, but randomly in chronological sequence, so that there is no a priori reason to consider these two items as different from the rest. How strong is the evidence for regarding them as inconsistent with the rest in view of their failure times? The value of $T_{Ea9}$ is $t = 24/600 = 0.04$, which is in the range $0 < t < \tfrac19$. Hence
$$SP(t) = \int_0^{0.04} \frac{810}{8}\left\{(1-5u)^8 - (1-9u)^8\right\}du = 1 - \tfrac94(0.8)^9 + \tfrac54(0.64)^9 = 0.721.$$
This is the significance probability attaching to the observed ratio $t = 0.04$, i.e. there is no real evidence for regarding it as abnormally low, a contrary conclusion to Epstein's on our modified premise.

E10 Discordancy test for a lower outlier-pair $x_{(1)}, x_{(2)}$ in an exponential sample with unknown origin

Test statistic:
$$T_{E10} = \frac{\text{excess}}{\text{range}} = \frac{x_{(3)} - x_{(1)}}{x_{(n)} - x_{(1)}}.$$

Test distribution: …

References: Likes (1966), Kabe (1970).

Properties of test: As for test E4, page 82.

Ea11 General Dixon-type discordancy test for an exponential sample, using knowledge of the origin $a$

Test statistic:
$$T_{Ea11} = \frac{x_{(s)} - x_{(r)}}{x_{(q)} - a}, \quad 1 \le r < s \le q \le n.$$

Test distribution:
$$1 - F_n(t) = \frac{n!}{(n-q)!}(1-t)\left\{\sum_{i=1}^{q-s}\sum_{k=1}^{s-r}\frac{(-1)^{i+k}(q-r-i)!\,[(n-s+k)t + (n-q+i)(1-t)]^{-1}}{(i-1)!\,(k-1)!\,(q-s-i)!\,(s-r-k)!\,(q-i)!\,(n-s+k)} + \sum_{j=1}^{r}\sum_{k=1}^{s-r}\frac{(-1)^{q-s+j+k}(s-r+j-1)!\,[(n-s+k)t + (n-r+j)(1-t)]^{-1}}{(j-1)!\,(k-1)!\,(r-j)!\,(s-r-k)!\,(q-r+j-1)!\,(n-s+k)}\right\},$$
where the first of the double sums is omitted if $q = s$. $T_{Ea11}$ for sample size $n$ and observations $x_{(q)}, x_{(r)}, x_{(s)}$ has the same distribution as $T_{E11}$ for sample size $n+p$ and observations $x_{(p)}, x_{(q+p)}, x_{(r+p)}, x_{(s+p)}$.

References: Likes (1966), Kabe (1970).

Properties of test: Applicable to any combination of lower and/or upper outliers. For example, $(x_{(n-2)} - x_{(4)})/(x_{(n)} - a)$ would be a suitable statistic for a block test of discordancy of three lower and two upper outliers.

E11 General Dixon-type discordancy test for an exponential sample irrespective of origin

Test statistic:
$$T_{E11} = \frac{x_{(s)} - x_{(r)}}{x_{(q)} - x_{(p)}}.$$

Test distribution:
$$1 - F_n(t) = \frac{(n-p)!}{(n-q)!}(1-t)\left\{\sum_{i=1}^{q-s}\sum_{k=1}^{s-r}\frac{(-1)^{i+k}(q-r-i)!\,[(n-s+k)t + (n-q+i)(1-t)]^{-1}}{(i-1)!\,(k-1)!\,(q-s-i)!\,(s-r-k)!\,(q-p-i)!\,(n-s+k)} + \sum_{j=1}^{r-p}\sum_{k=1}^{s-r}\frac{(-1)^{q-s+j+k}(s-r+j-1)!\,[(n-s+k)t + (n-r+j)(1-t)]^{-1}}{(j-1)!\,(k-1)!\,(r-p-j)!\,(s-r-k)!\,(q-r+j-1)!\,(n-s+k)}\right\},$$
where the first of the double sums is omitted if $q = s$ and the second if $p = r$. $T_{E11}$ for sample size $n$ and observations $x_{(p)}, x_{(q)}, x_{(r)}, x_{(s)}$ has the same distribution as $T_{Ea11}$ for sample size $n-p$ and observations $x_{(q-p)}, x_{(r-p)}, x_{(s-p)}$.

References: Dixon (1950, 1951), Likes (1966), Kabe (1970). Note that a factor $(q-p-i)!$ needs inserting in the denominator of the first double sum in Kabe's equation (13), page 17.
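The two worked examples for tests Ga7(Ea7) and Ga9(Ea9) can be reproduced in a few lines. The sketch below (Python standard library only, hypothetical helper names) recomputes the traffic intervals and the Ga7 ratio, evaluates the Bonferroni-type bound $n(n-1)P(F_{2r,2r} > t)$ through the identity $P(F_{2r,2r} > t) = P\{\mathrm{Beta}(r,r) > t/(1+t)\}$ with integer $r$, and integrates the $k = 2$ exponential density of test Ga9 in closed form for Epstein's data.

```python
import math


def to_seconds(stamp):
    """Convert a 'min:sec' string to seconds."""
    minutes, seconds = stamp.split(":")
    return 60 * int(minutes) + int(seconds)


def beta_sf_int(u, a, b):
    """P(U > u) for U ~ Beta(a, b), a and b positive integers (binomial-sum identity)."""
    m = a + b - 1
    return sum(math.comb(m, j) * u**j * (1 - u)**(m - j) for j in range(a))


# Test Ga7(Ea7): every fourth vehicle, so intervals are gamma with r = 4.
stamps = ["19:57", "20:14", "20:20", "20:38", "20:50",
          "21:30", "21:38", "21:46", "22:07"]
secs = [to_seconds(s) for s in stamps]
gaps = [b - a for a, b in zip(secs, secs[1:])]
n, r = len(gaps), 4
t_ga7 = max(gaps) / min(gaps)                               # x(n) / x(1)
t_ga7_rounding = (max(gaps) + 0.5) / (min(gaps) - 0.5)      # maximal rounding allowance
ga7_bound = n * (n - 1) * beta_sf_int(t_ga7 / (1 + t_ga7), r, r)


# Test Ga9(Ea9) with k = 2: lower-tail SP for the exponential case.
def ga9_sp_k2(t, n):
    """Integral of f_n from 0 to t, valid for 0 <= t <= 1/(n-1)."""
    c = n * (n - 1) ** 2 / (n - 2)
    first = 2 / (n * (n - 1)) * (1 - (1 - n * t / 2) ** (n - 1))
    second = 1 / (n - 1) ** 2 * (1 - (1 - (n - 1) * t) ** (n - 1))
    return c * (first - second)


t_ga9 = 24 / 600
print(gaps, round(t_ga7, 2), round(t_ga7_rounding, 2), round(ga7_bound, 3))
print(round(t_ga9, 2), round(ga9_sp_k2(t_ga9, 10), 3))
```

The Ga7 bound comes out near 0.4, well away from significance and consistent with the comparison against the tabulated 5 per cent point 10.5; the Ga9 value agrees with the 0.721 of the worked example.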
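The Dixon-type formulas for E8 and for Ea11/E11 are easy to mistype, so a numerical cross-check is useful. In this sketch `dixon_sf` evaluates the $1 - F_n(t)$ expression given for Ea11 ($p = 0$) and E11 ($p \ge 1$), with the factor $(q-p-i)!$ included in the first double sum as the text notes. E8 is the case $p = 1$, $r = 2$, $s = n-1$, $q = n$ of E11, and E2 and E4 are further special cases. The E8 function assumes the factor $(1-t)^2$, which is what makes $F_n(0) = 0$. Both function names are hypothetical.

```python
import math

f = math.factorial


def e8_cdf(t, n):
    """Test E8: F_n(t) for T = (x(n-1) - x(2)) / (x(n) - x(1))."""
    s = sum((-1) ** (j + 1) * j
            / (f(j + 1) * f(n - 3 - j) * (1 + j * t) * (n - 1 - (n - j - 2) * t))
            for j in range(1, n - 2))
    return 1 - f(n - 1) * (1 - t) ** 2 * s


def dixon_sf(t, n, p, q, r, s):
    """1 - F_n(t) for T = (x(s) - x(r)) / (x(q) - x(p)) in an exponential sample.

    p = 0 gives test Ea11 (x(0) is the known origin a); p >= 1 gives test E11.
    Requires p <= r < s <= q <= n.
    """
    total = 0.0
    for i in range(1, q - s + 1):            # empty when q == s
        for k in range(1, s - r + 1):
            total += ((-1) ** (i + k) * f(q - r - i)
                      / (f(i - 1) * f(k - 1) * f(q - s - i) * f(s - r - k)
                         * f(q - p - i) * (n - s + k)
                         * ((n - s + k) * t + (n - q + i) * (1 - t))))
    for j in range(1, r - p + 1):            # empty when p == r
        for k in range(1, s - r + 1):
            total += ((-1) ** (q - s + j + k) * f(s - r + j - 1)
                      / (f(j - 1) * f(k - 1) * f(r - p - j) * f(s - r - k)
                         * f(q - r + j - 1) * (n - s + k)
                         * ((n - s + k) * t + (n - r + j) * (1 - t))))
    return f(n - p) / f(n - q) * (1 - t) * total


# E8 as a special case of E11, for n = 6 and t = 0.5.
print(round(1 - e8_cdf(0.5, 6), 6), round(dixon_sf(0.5, 6, 1, 6, 2, 5), 6))
```

An empty `range` in the loops contributes nothing, which handles the rule that the first double sum is omitted when $q = s$ and the second when $p = r$.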
Properties of test: Applicable to any combination of lower and/or upper outliers. For example, $(x_{(n-2)} - x_{(4)})/(x_{(n)} - x_{(1)})$ would be a suitable statistic for a block test of discordancy for three lower and two upper outliers. E2, E4, E6 are important particular cases.

E12 Two-sided test for the presence of an undefined number of discordant values in an exponential sample irrespective of origin

Test statistic:
$$T_{E12} = \text{Shapiro and Wilk's 'W-Exponential' statistic} = \frac{n(\bar{x} - x_{(1)})^2}{(n-1)\sum_{j=1}^{n}(x_j - \bar{x})^2}.$$

Tabulated significance levels: Table VI, page 297; abridged from Shapiro and Wilk (1972), Table 1, pages 361-362, where lower and upper 0.5, 1, 2.5, 5, and 10 per cent points and the 50 per cent point are given for $n = 3(1)100$.

Further tables: Shapiro and Wilk (1972) give values for the power of the test against 15 different inherent alternatives (see Chapter 2, page 31).

Reference: Shapiro and Wilk (1972).

Properties of test: A useful omnibus test against inherent alternatives. In the outlier context, significantly high values of $T_{E12}$ indicate the presence of one or more high discordant values $x_{(n)}, x_{(n-1)}, \ldots$ and/or one low discordant value $x_{(1)}$; significantly low values of $T_{E12}$ indicate the presence of a number of low discordant values $x_{(1)}, x_{(2)}, \ldots$.

Ga13 Procedure for testing one or more outliers for discordancy in a gamma sample of unknown shape parameter $r$

Transform the values $x_1, \ldots, x_n$ in the gamma sample to $y_1 = \sqrt{x_1}, \ldots, y_n = \sqrt{x_n}$ and apply to the values $y_1, \ldots, y_n$ (all taken positively) a discordancy test for a sample from a normal distribution with unknown mean and variance (tests N1-N17, Section 3.4.3). (For if $X$ is distributed as $\Gamma(r, \lambda)$, then $\sqrt{X}$ is distributed approximately as $N(\sqrt{\lambda(r - \frac14)}, \frac14\lambda)$; and when $r$ and $\lambda$ are both unknown, the mean and variance of this approximating normal distribution are both unknown.)

For example, if three upper outliers $x_{(n)}, x_{(n-1)}, x_{(n-2)}$ in the gamma sample are to be tested for discordancy, $\sqrt{x_{(n)}}, \sqrt{x_{(n-1)}}$, and $\sqrt{x_{(n-2)}}$ will be the three greatest values in the $y$-sample, and test N3 could appropriately be used, with $[\sqrt{x_{(n)}} + \sqrt{x_{(n-1)}} + \sqrt{x_{(n-2)}} - 3\bar{y}]/s_y$ as test statistic, where $\bar{y}$ and $s_y$ are the mean and standard deviation of the $y$-values.

3.4.3 Discordancy tests for normal samples

Historically, the motivation for a statistical treatment of outliers came first from the problems of combining astronomical observations, and repeated measurements or determinations must always be one of the main contexts in which discordancy problems arise. In very many cases errors of measurement may plausibly be assumed to follow a normal distribution, whether through the operation of the central limit theorem on contributory error components, or purely as an empirical fact. It is not surprising, therefore, that the vast body of published methodology on outliers from the eighteenth century to the present day rests on the working hypothesis of a normal distribution. Indeed, it is only in the last fifteen years or so that outliers in exponential and other non-normal models have been specifically considered.

When the normal distribution is being used in this way as a kind of all-purpose probability model, the mean $\mu$ and variance $\sigma^2$ will both in general be unknown, and any discordancy test for outliers will reflect this. However, discordancy tests also arise in situations when information is available concerning $\mu$ or $\sigma^2$ or both. The value of $\mu$ may be known. The variance $\sigma^2$ may be known exactly, or again some information on its value may be available in the form of an estimate independent of the particular sample of observations under study for discordancy. This estimate may perhaps be an item of background information 'from the files'. A quite different context giving rise to such an estimate is in analysis of variance, when we may find a surprising value among a set of treatment means, and have available the residual mean square to assist in judging its discordancy.

Outlier situations with $\mu$ unknown but $\sigma^2$ known may arise in quality control, where past experience provides reasonably accurate knowledge of the process variance. Outlier situations with $\sigma^2$ unknown but $\mu$ known may arise, for example, in paired comparison situations where the sample values we are considering for discordancy are differences between corresponding responses and so have mean $\mu = 0$ on the working hypothesis. The case of $\mu$, $\sigma^2$ both known is of limited methodological interest, any discordancy test being based simply on the appropriate extreme-value distribution. However, we have included this case as it has practical interest. It could arise, for example, in the validation of tables of random normal deviates; or, less esoterically, in reaching decisions on classification for taxonomic, anthropological, or even legal purposes. For example, in a well known British legal case dating from shortly after the First World War, the husband's basis for bringing divorce proceedings was a 331-day period between his departure for military service abroad and the birth of his wife's child. Extensive data on the duration of pregnancies (whilst not strictly normal) allow one to assume reasonably precise values for mean and variance. A 331-day gestation period is surprising; is it credible or must it be assumed discordant?

Finally, the technique of transformation of the observations leads to a further range of applications of normal-sample discordancy tests, as in the
88 Outliers in statistical data

Properties of test: Applicable to any combination of lower and/or upper outliers. For example, $(x_{(n-2)} - x_{(4)})/(x_{(n)} - x_{(1)})$ would be a suitable statistic for a block test of discordancy for three lower and two upper outliers. E2, E4, E6 are important particular cases.

E12 Two-sided test for the presence of an undefined number of discordant values in an exponential sample irrespective of origin

Test statistic:
$$T_{E12} = \text{Shapiro and Wilk's `W-Exponential' statistic} = \frac{n(\bar{x} - x_{(1)})^2}{(n-1)\sum_{j=1}^{n}(x_j - \bar{x})^2}.$$

Tabulated significance levels: Table VI, page 297; abridged from Shapiro and Wilk (1972), Table 1, pages 361-362, where lower and upper 0.5, 1, 2.5, 5, and 10 per cent points and the 50 per cent point are given for n = 3(1)100.

Further tables: Shapiro and Wilk (1972) give values for the power of the test against 15 different inherent alternatives (see Chapter 2, page 31).

Reference: Shapiro and Wilk (1972).

Properties of test: A useful omnibus test against inherent alternatives. In the outlier context, significantly high values of $T_{E12}$ indicate the presence of one or more high discordant values $x_{(n)}, x_{(n-1)}, \ldots$ and/or one low discordant value $x_{(1)}$; significantly low values of $T_{E12}$ indicate the presence of a number of low discordant values $x_{(1)}, x_{(2)}, \ldots$.

Ga13 Procedure for testing one or more outliers for discordancy in a gamma sample of unknown shape parameter r

Transform the values $x_1, \ldots, x_n$ in the gamma sample to $y_1 = \sqrt{x_1}, \ldots, y_n = \sqrt{x_n}$ and apply to the values $y_1, \ldots, y_n$ (all taken positively) a discordancy test for a sample from a normal distribution with unknown mean and variance (tests N1-N17, Section 3.4.3). (For if $X$ is distributed as $\Gamma(r, \lambda)$, then $\sqrt{X}$ is distributed approximately as $N(\sqrt{\lambda(r - \frac{1}{4})}, \frac{1}{4}\lambda)$; and when $r$ and $\lambda$ are both unknown, the mean and variance of this approximating normal distribution are both unknown.)

For example, if three upper outliers $x_{(n)}, x_{(n-1)}, x_{(n-2)}$ in the gamma sample are to be tested for discordancy, $\sqrt{x_{(n)}}$, $\sqrt{x_{(n-1)}}$, and $\sqrt{x_{(n-2)}}$ will be the three greatest values in the y-sample, and test N3 could appropriately be used, with
$$\frac{\sqrt{x_{(n)}} + \sqrt{x_{(n-1)}} + \sqrt{x_{(n-2)}} - 3\bar{y}}{s_y}$$
as test statistic, where $\bar{y}$ and $s_y$ are the mean and standard deviation of the y-values.

Discordancy tests for outliers in univariate samples 89

3.4.3 Discordancy tests for normal samples

Historically, the motivation for a statistical treatment of outliers came first from the problems of combining astronomical observations, and repeated measurements or determinations must always be one of the main contexts in which discordancy problems arise. In very many cases errors of measurement may plausibly be assumed to follow a normal distribution, whether through the operation of the central limit theorem on contributory error components, or purely as an empirical fact. It is not surprising, therefore, that the vast body of published methodology on outliers from the eighteenth century to the present day rests on the working hypothesis of a normal distribution. Indeed, it is only in the last fifteen years or so that outliers in exponential and other non-normal models have been specifically considered.

When the normal distribution is being used in this way as a kind of all-purpose probability model, the mean $\mu$ and variance $\sigma^2$ will both in general be unknown, and any discordancy test for outliers will reflect this. However, discordancy tests also arise in situations when information is available concerning $\mu$ or $\sigma^2$ or both. The value of $\mu$ may be known. The variance $\sigma^2$ may be known exactly, or again some information on its value may be available in the form of an estimate independent of the particular sample of observations under study for discordancy. This estimate may perhaps be an item of background information 'from the files'. A quite different context giving rise to such an estimate is in analysis of variance, when we may find a surprising value among a set of treatment means, and have available the residual mean square to assist in judging its discordancy. Outlier situations with $\mu$ unknown but $\sigma^2$ known may arise in quality control, where past experience provides reasonably accurate knowledge of the process variance. Outlier situations with $\sigma^2$ unknown but $\mu$ known may arise, for example, in paired comparison situations where the sample values we are considering for discordancy are differences between corresponding responses and so have mean $\mu = 0$ on the working hypothesis. The case $\mu$, $\sigma^2$ both known is of limited methodological interest, any discordancy test being based simply on the appropriate extreme-value distribution. However, we have included this case as it has practical interest. It could arise, for example, in the validation of tables of random normal deviates; or, less esoterically, in reaching decisions on classification for taxonomic, anthropological, or even legal purposes. For example, in a well known British legal case dating from shortly after the First World War, the husband's basis for bringing divorce proceedings was a 331-day period between his departure for military service abroad and the birth of his wife's child. Extensive data on the duration of pregnancies (whilst not strictly normal) allow one to assume reasonably precise values for mean and variance. A 331-day gestation period is surprising; is it credible or must it be assumed discordant?

Finally, the technique of transformation of the observations leads to a further range of applications of normal-sample discordancy tests, as in the case of exponential samples (page 77). Tests designed for normal samples with $\mu$, $\sigma^2$ both unknown can be applied to outliers in samples from gamma distributions with unknown shape parameter. Tests designed for normal samples with known variance $\sigma^2$ are particularly useful, since they can be applied to outliers in Poisson samples and binomial samples. For example, if the n values $x_1, \ldots, x_n$ are on the working hypothesis a sample from a Poisson distribution $P(\mu)$, then the n transformed values $\sqrt{x_1 + \frac{1}{4}}, \ldots, \sqrt{x_n + \frac{1}{4}}$ are (provided the mean $\mu$ is not too small) a sample from a distribution approximately $N(\sqrt{\mu}, \frac{1}{4})$. For details of discordancy testing in the Poisson and binomial cases, see Section 3.4.4; for the gamma case $\Gamma(r, \lambda)$ with unknown $r$, see Section 3.4.2 (test Ga13).

Contents List: Normal Samples

The tests are classified as follows, according to the information available regarding the mean and variance of the normal distribution $N(\mu, \sigma^2)$ assumed in the working hypothesis.

Code: Information
N: $\mu$ and $\sigma^2$ both unknown
Nv: $\mu$ unknown; information available on $\sigma^2$ independent of the sample, in the form of an estimate $s_\nu^2$ such that $\nu s_\nu^2/\sigma^2$ is distributed as $\chi^2_\nu$
Nμ: $\mu$ known, $\sigma^2$ unknown
Nσ: $\sigma^2$ known, $\mu$ unknown
Nμσ: $\mu$ and $\sigma^2$ both known

In the case N, $\mu$ is estimated on the working hypothesis by $\bar{x} = \sum x_j/n$, and $\sigma^2$ by $s^2 = \sum (x_j - \bar{x})^2/(n-1)$. In the case Nv, $\sigma^2$ can be estimated by the independent estimators $s^2$, $s_\nu^2$, or by the pooled estimator
$$\tilde{s}^2 = \left[\sum (x_j - \bar{x})^2 + \nu s_\nu^2\right]/(n - 1 + \nu).$$
In the case Nμ, $\sigma^2$ is estimated by $s^2(\mu) = \sum (x_j - \mu)^2/n$.

The sum of squares of the deviations from $\bar{x}$ of the n observations $x_1, x_2, \ldots, x_n$, $\sum_{j=1}^{n}(x_j - \bar{x})^2$, is denoted by $S^2$. Thus $s^2 = S^2/(n-1)$. If $x_{(n)}$ is omitted, the sum of squares of the deviations of the remaining $n - 1$ observations from their own mean is denoted by $S_n^2$. $S_{n-1,n}^2$ is the corresponding sum of squares when $x_{(n-1)}, x_{(n)}$ are both omitted, and so on. The quantity $\sum_{j=1}^{n}(x_j - \mu)^2$, which is of relevance to case Nμ, is denoted by $S^2(\mu)$, with $s^2(\mu) = S^2(\mu)/n$, and with similar definitions for $S_n^2(\mu)$, $S_{n-1,n}^2(\mu)$, etc. In the case Nv, the sum of squares $\sum (x_j - \bar{x})^2 + \nu s_\nu^2$ is denoted by $\tilde{S}^2$.

Tests not involving the sample mean or population mean are coded both as N and Nμ (or, where $\sigma^2$ is known, both as Nσ and Nμσ). Examples: N7(Nμ7), test statistic $(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(1)})$; Nσ6(Nμσ6), test statistic $(x_{(n)} - x_{(1)})/\sigma$.

In a situation where $\mu$ is known but no appropriate test is listed under code Nμ because no significance levels are available, or similarly where $\sigma^2$ is known but no appropriate test is listed under code Nσ, there may be an appropriate N-test which can be used, though with some loss of efficiency. For example, to test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ with $\mu$ known, $\sigma^2$ unknown, N6 could be used, since significance levels for a test based on $(x_{(n)} - x_{(1)})/s(\mu)$ are not available. Where both $\mu$ and $\sigma^2$ are known, an Nσ-test (or possibly an Nμ-test) can be used if necessary.

In view of the symmetry of the normal distribution, any test for an upper outlier, upper outlier-pair, etc., can be used for a lower outlier, lower outlier-pair, etc., with the obvious modifications. For example, two lower outliers $x_{(1)}, x_{(2)}$ can be tested for discordancy by test N3 using the statistic $(2\bar{x} - x_{(2)} - x_{(1)})/s$. To save space, such tests are only given here in terms of the upper outlier situation.

Worksheet label (page no.), description of test: statistic

N1 (p. 93) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \bar{x})/s$, or equivalently $S_n^2/S^2$
N2 (p. 94) Test for extreme outlier (two-sided form of N1): $\max[(x_{(n)} - \bar{x})/s,\ (\bar{x} - x_{(1)})/s]$, or equivalently $\min(S_n^2/S^2,\ S_1^2/S^2)$
N3 (p. 95) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $(x_{(n-k+1)} + \cdots + x_{(n)} - k\bar{x})/s$
N4 (p. 96) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $S_{n-k+1,\ldots,n-1,n}^2/S^2$
N5 (p. 96) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $S_{1,n}^2/S^2$
N6 (p. 97) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $(x_{(n)} - x_{(1)})/s$
N7(Nμ7) (p. 97) Dixon-type test for upper outlier $x_{(n)}$: $(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(1)})$
N8(Nμ8) (p. 98) Dixon-type test for extreme outlier (two-sided form of N7(Nμ7)): $\max[(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(1)}),\ (x_{(2)} - x_{(1)})/(x_{(n)} - x_{(1)})]$
N9(Nμ9) (p. 98) Dixon-type test for upper outlier $x_{(n)}$: $(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(2)})$
N10(Nμ10) (p. 99) Dixon-type test for upper outlier $x_{(n)}$: $(x_{(n)} - x_{(n-1)})/(x_{(n)} - x_{(3)})$
N11(Nμ11) (p. 99) Dixon-type test for two upper outliers $x_{(n-1)}, x_{(n)}$: $(x_{(n)} - x_{(n-2)})/(x_{(n)} - x_{(1)})$
N12(Nμ12) (p. 100) Dixon-type test for two upper outliers $x_{(n-1)}, x_{(n)}$: $(x_{(n)} - x_{(n-2)})/(x_{(n)} - x_{(2)})$
N13(Nμ13) (p. 100) Dixon-type test for two upper outliers $x_{(n-1)}, x_{(n)}$: $(x_{(n)} - x_{(n-2)})/(x_{(n)} - x_{(3)})$
N14 Test for one or more upper outliers: sample skewness $\sqrt{b_1}$
N15 (p. 101) Two-sided test for one or more outliers, irrespective of their directions: sample kurtosis $b_2$
N16 (p. 102) Block test for k outliers irrespective of directions (i.e. of how many upper and how many lower): Tietjen and Moore's $E_k$-statistic; see worksheet
N17 (p. 102) Test for presence of an undefined number of discordant values: Shapiro and Wilk's W-statistic; see worksheet
Nv1 (p. 103) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \bar{x})/s_\nu$
Nv2 (p. 104) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \bar{x})/\tilde{s}$
Nv3 Test for extreme outlier (two-sided form of Nv1): $\max[(x_{(n)} - \bar{x})/s_\nu,\ (\bar{x} - x_{(1)})/s_\nu]$
Nv4 (p. 105) Test for extreme outlier (two-sided form of Nv2): $\max[(x_{(n)} - \bar{x})/\tilde{s},\ (\bar{x} - x_{(1)})/\tilde{s}]$
Nv5 (p. 106) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $(x_{(n-k+1)} + \cdots + x_{(n)} - k\bar{x})/\tilde{s}$
Nv6 (p. 106) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $(x_{(n)} - x_{(1)})/s_\nu$
Nv7 (p. 107) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $(x_{(n)} - x_{(1)})/\tilde{s}$
Nμ1 (p. 107) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \mu)/s(\mu)$
Nμ2 (p. 108) Test for extreme outlier (two-sided form of Nμ1): $\max[(x_{(n)} - \mu)/s(\mu),\ (\mu - x_{(1)})/s(\mu)]$, or equivalently $\min[S_n^2(\mu)/S^2(\mu),\ S_1^2(\mu)/S^2(\mu)]$
Nμ3 (p. 108) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $(x_{(n-k+1)} + \cdots + x_{(n)} - k\mu)/s(\mu)$
Nμ4 (p. 108) Test for two upper outliers $x_{(n-1)}, x_{(n)}$: $S_{n-1,n}^2(\mu)/S^2(\mu)$
Nμ5 (p. 109) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $S_{1,n}^2(\mu)/S^2(\mu)$
Nμ6 (p. 109) Two-sided test for one or more extreme outliers: $\sum_{j=1}^{n}(x_j - \mu)^4/[n s^4(\mu)]$
Nμ14 (p. 109) Block test for k outliers irrespective of directions: see worksheet
Nσ1 (p. 110) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \bar{x})/\sigma$
Nμσ1 (p. 111) Test for upper outlier $x_{(n)}$: $(x_{(n)} - \mu)/\sigma$
Nσ2 (p. 112) Test for extreme outlier (two-sided form of Nσ1): $\max[(x_{(n)} - \bar{x})/\sigma,\ (\bar{x} - x_{(1)})/\sigma]$
Nσ3 (p. 112) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $(x_{(n-k+1)} + \cdots + x_{(n)} - k\bar{x})/\sigma$
Nμσ3 (p. 112) Test for k (≥ 2) upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$: $(x_{(n-k+1)} + \cdots + x_{(n)} - k\mu)/\sigma$
Nσ4 (p. 113) Test for two upper outliers $x_{(n-1)}, x_{(n)}$: $S_{n-1,n}^2/\sigma^2$
Nμσ4 (p. 113) Test for two upper outliers $x_{(n-1)}, x_{(n)}$: $S_{n-1,n}^2(\mu)/\sigma^2$
Nσ5 (p. 113) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $S_{1,n}^2/\sigma^2$
Nμσ5 (p. 113) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $S_{1,n}^2(\mu)/\sigma^2$
Nσ6(Nμσ6) (p. 113) Test for lower and upper outlier-pair $x_{(1)}, x_{(n)}$: $(x_{(n)} - x_{(1)})/\sigma$
Nσ7(Nμσ7) (p. 114) Test for upper outlier $x_{(n)}$: $(x_{(n)} - x_{(n-1)})/\sigma$
Nσ8(Nμσ8) (p. 114) Test for two upper outliers $x_{(n-1)}, x_{(n)}$: $(x_{(n-1)} - x_{(n-2)})/\sigma$
Nσ9(Nμσ9) (p. 115) Test for k lower and k upper outliers: $(x_{(n-k+1)} - x_{(k)})/\sigma$

N1 Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with $\mu$ and $\sigma^2$ unknown

Test statistic:
$$T_{N1} = \text{internally studentized extreme deviation from mean} = \frac{x_{(n)} - \bar{x}}{s}.$$

An equivalent statistic is:
$$\frac{\text{reduced sum of squares}}{\text{total sum of squares}} = \frac{S_n^2}{S^2} = 1 - \frac{n}{(n-1)^2}T_{N1}^2.$$

Recurrence relationship: [recurrence formula giving the distribution function $F_n(t)$ of $T_{N1}$ in terms of $F_{n-1}$ illegible in source], with $F_2(t) = 0$ $(t < 1/\sqrt{2})$, $F_2(t) = 1$ $(t > 1/\sqrt{2})$.
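As an illustration only (not part of the original worksheet), the following Python sketch computes $T_{N1}$ and the equivalent ratio $S_n^2/S^2$, using the notation $S^2$, $s^2 = S^2/(n-1)$ and $S_n^2$ defined in the contents list, so that the identity $S_n^2/S^2 = 1 - nT_{N1}^2/(n-1)^2$ can be checked numerically. The function names `sum_sq` and `n1_statistics` and the sample values are hypothetical. The sketch does not compute significance levels; these must still be read from the tables cited for this test.

```python
import math
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations of `values` about their own mean (S^2 in the text)."""
    m = mean(values)
    return math.fsum((v - m) ** 2 for v in values)


def n1_statistics(sample):
    """Return (T_N1, S_n^2 / S^2) for the largest observation x_(n).

    T_N1 = (x_(n) - xbar) / s with s^2 = S^2 / (n - 1); S_n^2 is the sum of
    squares of the remaining n - 1 observations about their own mean.
    """
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    total_ss = sum_sq(x)                   # S^2
    if total_ss == 0:
        raise ValueError("all observations equal; T_N1 is undefined")
    s = math.sqrt(total_ss / (n - 1))      # internal estimate s
    t_n1 = (x[-1] - mean(x)) / s           # internally studentized deviation
    reduced_ss = sum_sq(x[:-1])            # S_n^2: x_(n) omitted
    return t_n1, reduced_ss / total_ss


if __name__ == "__main__":
    data = [4.2, 4.5, 4.1, 4.4, 4.3, 6.0]  # hypothetical sample
    t, ratio = n1_statistics(data)
    n = len(data)
    # The two forms should agree: S_n^2/S^2 = 1 - n T^2 / (n - 1)^2
    print(f"T_N1 = {t:.4f}, S_n^2/S^2 = {ratio:.4f}, "
          f"1 - n T^2/(n-1)^2 = {1 - n * t ** 2 / (n - 1) ** 2:.4f}")
```

Because the two statistics are monotonically related, a large value of $T_{N1}$ corresponds to a small value of $S_n^2/S^2$. Either form may therefore be referred to the same tables.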
Inequality:
$$SP(t) \le nP\left(t_{n-2} > \left[\frac{n(n-2)t^2}{(n-1)^2 - nt^2}\right]^{1/2}\right).$$
This is an equality when $t \ge [(n-1)(n-2)/2n]^{1/2}$.

Tabulated significance levels: Table VIIa, page 298; abridged from Grubbs and Beck (1972), Table I, pages 848-850, where 0.1, 0.5, 1, 2.5, 5, and 10 per cent points are given for n = 3(1)147.

Further tables: Dixon (1950) gives graphs of the performance measure P3 (see page 66) for n = 5, 15 and for alternatives of slippage in location and slippage in dispersion by one and two observations; the figures are derived from sampling experiments of size 200 at most. Ferguson (1961a) gives tables of power P1 (pages 65-66) for the alternative of slippage in location by a single observation. David and Paulson (1965) give graphs of performance measure P2 (page 66) in relation to the same alternative, for n = 4(2)10. McMillan (1971) gives graphs of the performance measures P(C1), P(C2), P(C3) (page 74) when N1 is used consecutively for the testing of two upper outliers; some corrections to these results are given by Moran and McMillan (1973).

References: Pearson and Chandra Sekar (1936), Dixon (1950, 1962), Grubbs (1950, 1969), Kudo (1956a), Ferguson (1961a, 1961b), Quesenberry and David (1961), David and Paulson (1965), Stefansky (1971), McMillan (1971), Moran and McMillan (1973).

Properties of test: N1 is the maximum likelihood ratio test for a location-slippage alternative in which one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a > 0$. For this alternative, it has the optimal property of being the scale- and location-invariant test of given size which maximizes the probability P3 of identifying the contaminant as discordant. Vulnerable to masking effect when there is more than one contaminant, but less so than N7(Nμ7). Not very suitable for consecutive use when testing several outliers; preferable in this case to use a block procedure or to use N15 consecutively.

N2 Two-sided discordancy test for an extreme outlier in a normal sample with $\mu$ and $\sigma^2$ unknown

Test statistic:
$$T_{N2} = \max\left(\frac{x_{(n)} - \bar{x}}{s},\ \frac{\bar{x} - x_{(1)}}{s}\right).$$

An equivalent statistic is:
$$\min\left(\frac{S_n^2}{S^2},\ \frac{S_1^2}{S^2}\right) = 1 - \frac{n}{(n-1)^2}T_{N2}^2.$$

Inequality:
$$SP(t) \le 2P(T_{N1} > t).$$
Equality holds for $t > [(n-1)(n-2)/2n]^{1/2}$.

Tabulated significance levels: Table VIIb, page 298; derived from Pearson and Hartley (1966), Table 26b, page 188.

Further tables: Ferguson (1961a) gives tables of power P1 for alternatives of slippage in location by a single observation and by two observations.

References: Kudo (1956), Ferguson (1961a, 1961b), Quesenberry and David (1961), Tietjen and Moore (1972). Tietjen and Moore, working in terms of the equivalent statistic $\min(S_n^2/S^2, S_1^2/S^2)$, present the inequality for $SP(t)$ as an equality.

Properties of test: Maximum likelihood ratio test for a location-slippage alternative in which one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a \ne 0$. For this alternative, it has the optimal property of being the scale- and location-invariant test of given size which maximizes the probability (P3, page 66) of identifying the contaminant as discordant. Vulnerable to masking effect in small samples when there are two outliers in the same direction.

N3 Discordancy test for k upper outliers $x_{(n-k+1)}, \ldots, x_{(n-1)}, x_{(n)}$ in a normal sample with $\mu$ and $\sigma^2$ unknown

Test statistic:
$$T_{N3} = \text{sum of internally studentized deviations from the mean} = \frac{x_{(n-k+1)} + \cdots + x_{(n-1)} + x_{(n)} - k\bar{x}}{s}.$$

Inequality:
$$SP(t) \le \binom{n}{k}P\left(t_{n-2} > \left[\frac{n(n-2)t^2}{k(n-k)(n-1) - nt^2}\right]^{1/2}\right).$$
This is an equality when $t \ge [k^2(n-1)(n-k-1)/(nk+n)]^{1/2}$.

Tabulated significance levels: Table IXa, page 304; freshly compiled on the basis of simulations of sizes 10 000.

References: Murphy (1951), Kudo (1956a), Ferguson (1961b), McMillan (1971), Fieller (1976). McMillan gives results for the comparative performance of tests N3 and N4 as applied to two upper outliers (k = 2); see N4 Worksheet.

Properties of test: N3 is the maximum likelihood ratio test for a location-slippage alternative in which k observations arise from a common normal distribution $N(\mu + a, \sigma^2)$, $a > 0$. For this alternative, it has the optimal
property of being the scale- and location-invariant test of given size which maximizes the probability of identifying the k contaminants as discordant.

N4 Discordancy test for k (≥ 2) upper outliers x(n−k+1), ..., x(n−1), x(n) in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N4} = \frac{\text{reduced sum of squares}}{\text{total sum of squares}} = \frac{S^2_{n-k+1,\ldots,n-1,n}}{S^2}.$$
Tabulated significance levels: Table IXb, page 304; values for k = 2 abridged from Grubbs and Beck (1972), Table II, pages 851–853, where 0.1, 0.5, 1, 2.5, 5, and 10 per cent points are given for n = 4(1)149; values for k = 3 and k = 4 abridged from Tietjen and Moore (1972), Table I, pages 587–590, where 1, 2.5, 5, and 10 per cent points are given for k = 1(1)10 and n = max[3, 2k](1)20(5)50. Note that values of T_N4 smaller than the tabulated level are significant.
References: Grubbs (1950, 1969), Dixon (1950), McMillan (1971), Tietjen and Moore (1972), Fieller (1976).
Properties of test: N4 is the maximum likelihood ratio test for a location-slippage alternative in which k observations arise from separate normal distributions, each with variance σ² but with distinct means all exceeding μ. A study by McMillan (1971) of the performance measure P2 (see page 66) for tests N3 and N4 in the case k = 2 indicates that N4 is more robust than N3 against departures from the relevant alternative.
Commonly the number, k, of outliers to be tested will have been chosen, either as the number of manifest outliers or as a parameter in a data-processing procedure. As an alternative, Tietjen and Moore (1972) suggest proceeding as follows: find the 'largest gap', i.e. the largest of the intervals x(n) − x(n−1), x(n−1) − x(n−2), ..., to the right of the mean x̄, and fix upon the observations to the right of this gap (k in number, say) for testing as upper outliers. Tietjen and Moore show that N4 applied as a block test to these k outliers has rather better performance, in terms of the proportion of contaminants correctly identified as discordant, than consecutive test procedures using either N14 or N17, and much better performance than consecutive procedures using either N1 or (an unspecified) one of the Dixon-type tests.

N5 Discordancy test for a lower and upper outlier-pair x(1), x(n) in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N5} = \frac{\text{reduced sum of squares}}{\text{total sum of squares}} = \frac{S^2_{1,n}}{S^2}.$$
Tabulated significance levels: Table Xa, page 306; freshly compiled on the basis of simulations of sizes 10 000. Values of T_N5 smaller than the tabulated level are significant.
References: Grubbs (1950), Ferguson (1961b), Fieller (1976).
Properties of test: Maximum likelihood ratio test for a location-slippage alternative in which two observations arise from separate normal distributions N(μ + a1, σ²), N(μ + a2, σ²), a1 < 0 < a2.

N6 Discordancy test for a lower and upper outlier-pair x(1), x(n) in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N6} = \text{internally studentized range} = \frac{x_{(n)} - x_{(1)}}{s}.$$
Inequality:
$$SP(t) \le n(n-1)\,P\left(t_{n-2} > \left[\frac{(n-2)t^2}{2(n-1) - t^2}\right]^{1/2}\right).$$
This is an equality when t ≥ [3(n − 1)/2]^{1/2}.
Tabulated significance levels: Table XIa, page 307; abridged from Pearson and Hartley (1966), Table 29c, page 200, where lower and upper limits and lower and upper 0.5, 1, 2.5, 5, and 10 per cent points are given for n = 3(1)20(5)100, 150, 200, 500, 1000.
Further tables: Shapiro, Wilk, and Chen (1968) give values for the power of the test against 45 different inherent alternatives (see Chapter 2, page 31).
References: David, Hartley, and Pearson (1954), Pearson and Stephens (1964), Shapiro, Wilk, and Chen (1968).
Properties: As a test against an inherent alternative, N6 has good power against a variety of symmetric distributions alternative to the normal, but performs poorly against asymmetric alternatives.

N7(Nμ7) Discordancy test for a single upper outlier x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N7} = \frac{\text{excess}}{\text{range}} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}} \quad \text{(Dixon's } r_{10} \text{ statistic)}.$$
Test distribution: For small n, we have:
$$f_3(t) = \frac{3\sqrt{3}}{2\pi}\,(t^2 - t + 1)^{-1},$$
$$f_4(t) = \frac{2}{\sqrt{3}}\,f_3(t)\left[(1 - 2t)(4t^2 - 4t + 3)^{-1/2} - (t - 2)(3t^2 - 4t + 4)^{-1/2}\right].$$
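The statistics in the N2 and N4 to N7 worksheets are simple functions of the ordered sample. The Python sketch below (standard library only) computes them on an illustrative sample. It checks the N2 identity between T_N2 and the reduced-sum-of-squares form, and applies one reading of Tietjen and Moore's 'largest gap' rule. The assumption is that a gap counts as lying to the right of the mean while its upper end exceeds x̄. Finally, it uses Simpson's rule to confirm that the densities f_3 and f_4 given for T_N7 each integrate to 1 over 0 ≤ t ≤ 1. The function names and the data are hypothetical, chosen only for illustration.

```python
import math

def ss(values):
    """Sum of squares about the mean (S^2 when applied to the whole sample)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def t_n2(x):
    """T_N2 = max((x(n) - xbar)/s, (xbar - x(1))/s) with s^2 = S^2/(n - 1)."""
    xbar = sum(x) / len(x)
    s = math.sqrt(ss(x) / (len(x) - 1))
    return max((max(x) - xbar) / s, (xbar - min(x)) / s)

def t_n4(x, k):
    """T_N4: sum of squares without the k largest values, over S^2."""
    xs = sorted(x)
    return ss(xs[:-k]) / ss(xs)

def t_n5(x):
    """T_N5: sum of squares without x(1) and x(n), over S^2."""
    xs = sorted(x)
    return ss(xs[1:-1]) / ss(xs)

def t_n6(x):
    """T_N6: range over the sample standard deviation s."""
    return (max(x) - min(x)) / math.sqrt(ss(x) / (len(x) - 1))

def t_n7(x):
    """T_N7: Dixon's r10 = (x(n) - x(n-1)) / (x(n) - x(1))."""
    xs = sorted(x)
    return (xs[-1] - xs[-2]) / (xs[-1] - xs[0])

def largest_gap_k(x):
    """Number of upper values beyond the largest gap (hypothetical helper).

    Assumption: gaps x(j) - x(j-1) are scanned downwards from the top
    while x(j) still exceeds the sample mean.
    """
    xs = sorted(x)
    xbar = sum(xs) / len(xs)
    best_gap, k, j = -1.0, 0, len(xs) - 1
    while j >= 1 and xs[j] > xbar:
        if xs[j] - xs[j - 1] > best_gap:
            best_gap, k = xs[j] - xs[j - 1], len(xs) - j
        j -= 1
    return k

def f3(t):
    """Null density of T_N7 for n = 3, 0 <= t <= 1."""
    return 3 * math.sqrt(3) / (2 * math.pi) / (t * t - t + 1)

def f4(t):
    """Null density of T_N7 for n = 4, 0 <= t <= 1."""
    return (2 / math.sqrt(3)) * f3(t) * (
        (1 - 2 * t) / math.sqrt(4 * t * t - 4 * t + 3)
        - (t - 2) / math.sqrt(3 * t * t - 4 * t + 4))

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

x = [4.0, 4.1, 4.2, 4.3, 4.4, 6.9, 7.2]  # illustrative sample
n, t = len(x), t_n2(x)
xs = sorted(x)
# N2: min(S_n^2, S_1^2)/S^2 should equal 1 - n T_N2^2/(n - 1)^2.
print(min(ss(xs[:-1]), ss(xs[1:])) / ss(xs), 1 - n * t ** 2 / (n - 1) ** 2)
k = largest_gap_k(x)
print(k, t_n4(x, k), t_n5(x), t_n6(x), t_n7(x))
print(simpson(f3, 0.0, 1.0), simpson(f4, 0.0, 1.0))  # both close to 1
```

For this sample the largest gap (6.9 − 4.4) lies below the two highest values, so k = 2. N4 would then be applied as a block test to 6.9 and 7.2.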
Tabulated significance levels: Table XIIIa, page 311; abridged from Dixon (1951), Table I, page 73, where upper 0.5, 1, 2 per cent points, upper and lower 5, 10, 20, 30, 40 per cent points, and the 50 per cent point are given for n = 3(1)30.
Further tables: Dixon (1950) gives graphs of the performance measure P3, based on sampling experiments of comparatively small size (66–200 replications). Ferguson (1961a) gives tables of power P1 for the alternative of slippage in location by one observation.
References: Dixon (1950, 1951), Ferguson (1961a, 1961b).
Properties of test: Mainly effective when there is at most one discordant value; otherwise vulnerable to a possible masking effect of x(n−1) and/or x(1); see properties of test N1. The performances of tests N7(Nμ7) as measured both by P3 and by P1, against the alternative of slippage in location by a single observation, are effectively the same for sample sizes up to 15.

N8(Nμ8) Two-sided discordancy test for an extreme outlier in a normal sample with σ² unknown

Test statistic:
$$T_{N8} = \max\left(\frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}},\ \frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}}\right).$$
Inequality:
$$SP(t) \le 2P(T_{N7} > t).$$
This is an equality when t ≥ ½.
Tabulated significance levels: Table XIIIb, page 311; freshly compiled on the basis of simulations of sizes 10 000.
Reference: King (1953).
Properties of test: Two-sided form of N7(Nμ7).

N9(Nμ9) Discordancy test for a single upper outlier x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N9} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(2)}} \quad \text{(Dixon's } r_{11} \text{ statistic)}.$$
Test distribution: For n = 4 we have:
$$f_4(t) = \frac{3\sqrt{3}}{\pi}\,(t^2 - t + 1)^{-1}\left\{1 + (t - 2)\left[3(3t^2 - 4t + 4)\right]^{-1/2}\right\}.$$
Tabulated significance levels: Table XIIIc, page 311; abridged from Dixon (1951), Table II, page 74, where more extensive values are given, as detailed in Worksheet N7(Nμ7).
Further tables: Dixon (1950) gives graphs of P3; see Worksheet N7(Nμ7).
References: Dixon (1950, 1951).
Properties of test: Advantage: avoids any possible masking effect of the lowest sample value x(1) (by inflation of the denominator). Disadvantage: vulnerable to the masking effect of x(n−1). See also Worksheet N12(Nμ12) in relation to performance.

N10(Nμ10) Discordancy test for a single upper outlier x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N10} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(3)}} \quad \text{(Dixon's } r_{12} \text{ statistic)}.$$
Tabulated significance levels: Table XIIId, page 311; abridged from Dixon (1951), Table III, page 75, where more extensive values are given, as detailed in Worksheet N7(Nμ7).
Further tables and references: As for N9(Nμ9).
Properties of test: Avoids any possible masking effect of the two lowest observations x(1), x(2) on the testing of x(n), but is vulnerable to any masking effect of x(n−1). See also Worksheet N13(Nμ13) in relation to performance.

N11(Nμ11) Discordancy test for an upper outlier-pair x(n−1), x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N11} = \frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(1)}} \quad \text{(Dixon's } r_{20} \text{ statistic)}.$$
Tabulated significance levels: Table XIIIe, page 311; abridged from Dixon (1951), Table IV, page 76, where more extensive values are given, as detailed in Worksheet N7(Nμ7).
Further tables and references: As for N9(Nμ9).
Properties of test: N11(Nμ11) can also be used as a discordancy test for a single upper outlier x(n) which avoids the risk of masking by x(n−1).
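Dixon's statistics in these worksheets share one pattern, r_ij = (x(n) − x(n−i))/(x(n) − x(j+1)). The pairs are (i, j) = (1, 0) for N7, (1, 1) for N9, (1, 2) for N10 and (2, 0) for N11; the r21 and r22 statistics of N12 and N13 follow the same pattern. The Python sketch below is a minimal illustration with a hypothetical helper name and an illustrative sample. It computes these ratios, the two-sided N8 statistic, and a numerical check that the n = 4 density given for T_N9 integrates to 1 over 0 ≤ t ≤ 1.

```python
import math

def dixon_r(x, i, j):
    """Dixon's r_ij for an upper outlier: (x(n) - x(n-i)) / (x(n) - x(j+1)).

    Order statistics are 1-based in the worksheets; the sorted list is
    0-based, so x(m) is xs[m - 1].
    """
    xs = sorted(x)
    n = len(xs)
    return (xs[n - 1] - xs[n - 1 - i]) / (xs[n - 1] - xs[j])

def t_n8(x):
    """Two-sided r10 (N8): the larger of the upper and lower excess/range ratios."""
    xs = sorted(x)
    rng = xs[-1] - xs[0]
    return max((xs[-1] - xs[-2]) / rng, (xs[1] - xs[0]) / rng)

def f4_r11(t):
    """Null density of T_N9 (r11) for n = 4, as given in the N9 worksheet."""
    return (3 * math.sqrt(3) / math.pi) / (t * t - t + 1) * (
        1 + (t - 2) / math.sqrt(3 * (3 * t * t - 4 * t + 4)))

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative sample
# (i, j) = (1, 0), (1, 1), (1, 2), (2, 0) correspond to N7, N9, N10, N11.
for i, j in [(1, 0), (1, 1), (1, 2), (2, 0)]:
    print(f"r{i}{j} = {dixon_r(x, i, j):.4f}")
print(t_n8(x), simpson(f4_r11, 0.0, 1.0))  # the integral should be close to 1
```

The denominators x(n) − x(2) and x(n) − x(3) simply drop the one or two lowest values. This is the mechanism by which N9 and N10 avoid masking from the bottom of the sample.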
N12(Nμ12) Discordancy test for an upper outlier-pair x(n−1), x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N12} = \frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(2)}} \quad \text{(Dixon's } r_{21} \text{ statistic)}.$$
Tabulated significance levels: Table XIIIf, page 311; abridged from Dixon (1951), Table V, page 77, where more extensive values are given, as detailed in Worksheet N7(Nμ7).
Further tables and references: As for N9(Nμ9).
Properties of test: Avoids any possible masking effect from x(1). Can be used as a discordancy test for a single upper outlier x(n), and for this purpose is to be preferred to test N9(Nμ9), since its performance is similar against a single contaminant and it avoids the risk of masking from x(n−1) if there is more than one contaminant.

N13(Nμ13) Discordancy test for an upper outlier-pair x(n−1), x(n) in a normal sample with σ² unknown

Test statistic:
$$T_{N13} = \frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(3)}} \quad \text{(Dixon's } r_{22} \text{ statistic)}.$$
Tabulated significance levels: Table XIIIg, page 311; abridged from Dixon (1951), Table VI, page 78, where more extensive values are given, as detailed in Worksheet N7(Nμ7).
Further tables: As for N9(Nμ9).
References: Dixon (1950, 1951), Ferguson (1961b).
Properties of test: Avoids any possible masking effect from the two lowest observations x(1), x(2). Can be used as a discordancy test for a single upper outlier x(n), and for this purpose is superior to test N10(Nμ10), having a similar performance against a single contaminant but being more robust against the presence of a second upper outlier at x(n−1).

N14 Discordancy test for one or more upper (or lower) outliers in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N14} = \text{sample skewness} = \sqrt{b_1} = \frac{\sqrt{n}\,\sum_{j=1}^{n}(x_j - \bar{x})^3}{S^3}.$$
The value tested for discordancy is x(n) or x(1) according as the sign of T_N14 is + or −. For more than one outlier, apply the test consecutively.
Tabulated significance levels: Table XIVa, page 312; abridged from Pearson and Hartley (1966), Table 34B, page 207, where 5 per cent and 1 per cent points are given for n = 25(5)50(10)100(25)200(50)1000(200)2000(500)5000, and from Ferguson (1961a), Table I, page 281, where estimated 10, 5, and 1 per cent points are given for n = 5(5)25.
Further tables: Ferguson (1961a) gives tables of power P1 against the alternative of slippage in location by a single observation. Shapiro, Wilk, and Chen (1968) give values for power against 45 different inherent alternatives (see Chapter 2, page 31).
References: Ferguson (1961a, 1961b), Shapiro, Wilk, and Chen (1968).
Properties of test: N14 is the locally best invariant test of given size against a location-slippage alternative in which k of the n observations arise from separate normal distributions N(μ + a1, σ²), ..., N(μ + ak, σ²), a1 > 0, a2 > 0, ..., ak > 0, whatever the values of the a's, and whatever the value of k, provided only that the contamination proportion k/n under the alternative hypothesis is less than ½.
Its power is nearly as good as that of N1 against slippage in location for a single observation by medium or large amounts. It also has good power against inherent Cauchy and log-normal alternatives.

N15 Discordancy test for one or more outliers (irrespective of their directions) in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N15} = \text{sample kurtosis} = b_2 = \frac{n\sum_{j=1}^{n}(x_j - \bar{x})^4}{S^4}.$$
The value tested for discordancy is whichever of x(n) or x(1) is further from x̄. Discordancy is indicated by high values of the statistic. For more than one outlier, apply the test consecutively.
Tabulated significance levels: Table XIVb, page 312; abridged from Pearson and Hartley (1966), Table 34C, page 208, where lower and upper 1 per cent and 5 per cent points are given for n = 50(25)150(50)700(100)1000(200)2000(500)5000, and from Ferguson (1961a), Table II, page 282, where estimated 1, 5, and 10 per cent points are given for n = 5(5)25.
Further tables: Ferguson (1961a) gives tables of power against alternatives of slippage in location by a single observation and by two observations. Shapiro, Wilk, and Chen (1968) give values for power against 45 different inherent alternatives (see Chapter 2, page 31).
References: Ferguson (1961a, 1961b), Shapiro, Wilk, and Chen (1968).
Properties of test: N15 is the locally best unbiased invariant test of given size against a location-slippage alternative in which k of the n observations arise
from separate normal distributions N(μ + a1, σ²), ..., N(μ + ak, σ²), where a1, ..., ak differ from zero but are otherwise arbitrary, provided that the contamination proportion k/n under the alternative hypothesis is less than 0.21. N15 is also the locally best invariant test of given size against a dispersion-slippage alternative in which k of the observations arise from separate normal distributions N(μ, b1σ²), ..., N(μ, bkσ²), b1 > 1, ..., bk > 1, irrespective of the proportion k/n.
Its power is nearly as good as that of N2 against slippage in location for a single observation by medium or large amounts. Against slippage in location by two observations it is superior to N2 in power, greatly so when the sample size is less than, say, 20.
N15 has the advantage of being robust against possible masking effects. It is suitable for consecutive use in the possible presence of more than one outlier.

N16 Two-sided discordancy test for k outliers (irrespective of directions) in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N16} = \text{Tietjen and Moore's } E_k\text{-statistic} = \frac{\sum_{j=1}^{n-k}\left(r_{(j)} - \bar{r}_{n-k}\right)^2}{\sum_{j=1}^{n}\left(r_{(j)} - \bar{r}\right)^2},$$
where r_i = |x_i − x̄|, the absolute deviation of x_i from the sample mean; {r_(i)} are the values of the r_i in ascending order, r_(1) < r_(2) < ... < r_(n); r̄ is the mean of all the r's; and r̄_{n−k} is the mean of the (n − k) lowest r's, i.e.
$$\bar{r}_{n-k} = \frac{r_{(1)} + \cdots + r_{(n-k)}}{n - k}.$$
Tabulated significance levels: Table XV, page 313, extracted from Tietjen and Moore (1972), Table II, pages 591–593, where 1, 5, and 10 per cent points are given for k = 1(1)10 and n = [max(3, 2k)](1)20(5)50. Two erroneous entries in Tietjen and Moore's Table IIa have been amended (n = 10, k = 3; n = 10, k = 4).
Reference: Tietjen and Moore (1972).
Properties of test: A pragmatic test procedure.

N17 Two-sided test for the presence of an undefined number of discordant values in a normal sample with μ and σ² unknown

Test statistic:
$$T_{N17} = \text{Shapiro and Wilk's } W\text{-statistic} = \frac{\left(\sum_{i=1}^{[n/2]} a_{n,n-i+1}\left[x_{(n-i+1)} - x_{(i)}\right]\right)^2}{S^2},$$
where [n/2] denotes the integer part of n/2, and the a_{n,i} are tabulated constants (Table XVIb, page 315; extracted from Shapiro and Wilk (1965), Table 5, pages 603–604, where values of these constants are given for n = 2(1)50).
Test distribution: For n = 3, we have:
$$f_3(t) = \frac{3}{\pi}\left[t(1 - t)\right]^{-1/2} \qquad \left(\tfrac{3}{4} \le t \le 1\right).$$
Tabulated significance levels: Table XVIa, page 314; abridged from Shapiro and Wilk (1965), Table 6, page 605, where lower and upper 1, 2, 5, and 10 per cent points, and the 50 per cent point, are given for n = 3(1)50. Values of T_N17 smaller than the tabulated level are significant.
Further tables: Shapiro, Wilk, and Chen (1968) give values for the power of the test against 45 different inherent alternatives (see Chapter 2, page 31). Chen (1971) gives values for the power against contamination by a given number of observations (either 1 or 2) in small samples (n ≤ 10), or with a given contamination probability per observation (0.05, 0.10 or 0.20) in larger samples (up to n = 50); he deals with shifts both in location and in dispersion.
References: Shapiro and Wilk (1965), Shapiro, Wilk, and Chen (1968), Chen (1971).
Properties: A useful omnibus test, both against inherent alternatives and against slippage alternatives.

Nv1 Discordancy test for a single upper outlier x(n) in a normal sample with μ unknown and an independent estimate of σ² known

Test statistic:
$$T_{Nv1} = \text{externally studentized extreme deviation from the mean} = \frac{x_{(n)} - \bar{x}}{s_\nu}.$$
Recurrence relationship:
$$f_n(t) = \left(\frac{n^3}{(n-1)\pi\nu}\right)^{1/2}\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{nt^2}{(n-1)\nu}\right)^{-(\nu+1)/2} F_{n-1}\left(\left[\frac{n^2(\nu+1)t^2}{(n-1)\{(n-1)\nu + nt^2\}}\right]^{1/2};\ \nu+1\right),$$
where F_{n−1}(·; ν + 1) is the distribution function of the statistic for a sample of size n − 1 whose external variance estimate has ν + 1 degrees of freedom, with F_1(t) = 0 (t < 0), 1 (t > 0).
Inequality:
$$SP(t) \le n\,P\left(t_\nu > \left[\frac{n}{n-1}\right]^{1/2} t\right).$$
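The statistics of worksheets N14 to N17 and the bound for Nv1 can be evaluated directly. The Python sketch below (standard library only; names and sample values are hypothetical) computes sample skewness and kurtosis with S² the total sum of squares, and identifies the value N14 would test. It evaluates E_k exactly as written in the N16 worksheet, in terms of the absolute deviations r_i. It computes W for n = 3, where the single tabulated coefficient a_{3,3} is 1/√2 ≈ 0.7071; for other n the coefficients must be taken from Table XVIb. Finally, it evaluates the Nv1 upper bound on SP(t). The Student-t tail probability is obtained here by Simpson's-rule integration of the t density, an implementation choice made to avoid external libraries; a library routine such as SciPy's `scipy.stats.t.sf` would normally be used instead.

```python
import math

def central_sums(x):
    """Return S^2 and the sums of cubed and fourth-power deviations from the mean."""
    xbar = sum(x) / len(x)
    d = [v - xbar for v in x]
    return sum(e ** 2 for e in d), sum(e ** 3 for e in d), sum(e ** 4 for e in d)

def t_n14(x):
    """Sample skewness sqrt(b1) = sqrt(n) * sum (x_j - xbar)^3 / S^3."""
    s2, m3, _ = central_sums(x)
    return math.sqrt(len(x)) * m3 / s2 ** 1.5

def t_n15(x):
    """Sample kurtosis b2 = n * sum (x_j - xbar)^4 / S^4."""
    s2, _, m4 = central_sums(x)
    return len(x) * m4 / s2 ** 2

def n14_tested_value(x):
    """x(n) if the skewness is positive, x(1) if it is negative."""
    return max(x) if t_n14(x) > 0 else min(x)

def t_n16(x, k):
    """E_k as written in the N16 worksheet, using r_i = |x_i - xbar|."""
    n = len(x)
    xbar = sum(x) / n
    r = sorted(abs(v - xbar) for v in x)
    kept = r[: n - k]  # the n - k smallest absolute deviations
    rbar_k, rbar = sum(kept) / (n - k), sum(r) / n
    return sum((v - rbar_k) ** 2 for v in kept) / sum((v - rbar) ** 2 for v in r)

def shapiro_wilk_w(x, a):
    """W with tabulated coefficients a = [a_{n,n}, a_{n,n-1}, ...] supplied by the caller."""
    xs = sorted(x)
    n = len(xs)
    b = sum(a[i] * (xs[n - 1 - i] - xs[i]) for i in range(n // 2))
    s2, _, _ = central_sums(xs)
    return b * b / s2

def t_sf(x, nu, m=4000):
    """P(t_nu > x) for x >= 0: 0.5 minus a Simpson's-rule integral of the density on [0, x]."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / nu) ** (-(nu + 1) / 2)

    h = x / m
    total = pdf(0.0) + pdf(x) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, m))
    return 0.5 - total * h / 3

def nv1_bound(t, n, nu):
    """Upper bound on SP(t) for T_Nv1: n P(t_nu > [n/(n-1)]^(1/2) t)."""
    return n * t_sf(math.sqrt(n / (n - 1)) * t, nu)

x = [1.0, 2.0, 3.0, 4.0, 10.0]  # illustrative sample
print(t_n14(x), n14_tested_value(x), t_n15(x), t_n16(x, 1))
print(shapiro_wilk_w([0.0, 0.0, 1.0], [1 / math.sqrt(2)]))  # lower limit 3/4 for n = 3
print(nv1_bound(2.5, n=5, nu=10))
```

Because the Nv1 bound only overstates SP(t), a bound below the chosen significance level already establishes discordancy. A bound above it is inconclusive, and the tabulated levels should be consulted instead.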
104 Outliers in statistical data
Discordancy tests for outliers in univariate samples 105

Tabulated significance levels: Table VIIIa, page 300; abridged from Pearson and Hartley (1966), Table 26, pages 185-186, where 10, 5, 2.5, 1, 0.5, and 0.1 per cent points are given for $n = 3(1)10, 12$ and $\nu = 10(1)20, 24, 30, 40, 60, 120, \infty$, and further 5 and 1 per cent points for the additional $\nu$-values $5(1)9$.
Further tables: David and Paulson (1965) give graphs of performance measure $P_2$ (see page 66) for an alternative model of slippage in location by one observation. McMillan (1971), subsequently amended by Moran and McMillan (1973), gives graphs of performance measures $P(C_1)$, $P(C_2)$, $P(C_3)$ (see page 74) for the consecutive testing of two upper outliers using Nν1.
References: Nair (1948, 1952), David (1956a, 1956b), David and Paulson (1965), McMillan (1971), Moran and McMillan (1973).
Properties of test: Makes no use of the internal estimate of variance. If there is at most one contaminant this wastes information, and test Nν2 is preferable; on the other hand, it offers a safeguard against the risk of masking if there is more than one contaminant.

Nν2 Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu2} = \text{externally and internally studentized extreme deviation from the mean} = \frac{x_{(n)} - \bar{x}}{s}.$$
Recurrence relationship:
$$f_n(t) = \left(\frac{n^3}{\pi(n-1)(n-1+\nu)}\right)^{1/2} \frac{\Gamma\left(\frac{n-1+\nu}{2}\right)}{\Gamma\left(\frac{n-2+\nu}{2}\right)} \left(1 - \frac{nt^2}{(n-1)(n-1+\nu)}\right)^{(n-4+\nu)/2} F_{n-1}\left(\left[\frac{n^2(n-2+\nu)t^2}{(n-1)^2(n-1+\nu) - n(n-1)t^2}\right]^{1/2}\right),$$
for $0 \le t \le [(n-1)(n-1+\nu)/n]^{1/2}$, with $F_1(t) = 0$ $(t < 0)$, $1$ $(t > 0)$.
Inequality:
$$\mathrm{SP}(t) < nP\left(t_{n-2+\nu} > \left[\frac{n(n-2+\nu)t^2}{(n-1)(n-1+\nu) - nt^2}\right]^{1/2}\right).$$
Tabulated significance levels: Table VIIIc, page 302; derived from Quesenberry and David (1961), Tables 1 and 2, page 388, where 5 and 1 per cent points of $(n-1+\nu)^{-1/2}T_{N\nu2}$ are given for $n = 3(1)10, 12, 15, 20$ and $\nu = 0(1)10, 12, 15, 20, 24, 30, 40, 50$.
Further tables: David and Paulson (1965) give graphs of performance measure $P_2$ (see page 66) for an alternative model of slippage in location by one observation. McMillan (1971), with subsequent amendments by Moran and McMillan (1973), gives graphs of performance measures $P(C_1)$, $P(C_2)$, $P(C_3)$ (see page 74) when Nν2 is used consecutively for testing two upper outliers.
References: Kudo (1956a), Quesenberry and David (1961), David and Paulson (1965), McMillan (1971), Moran and McMillan (1973).
Properties of test: For a location-slippage alternative in which one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a > 0$, Nν2 has the optimal property of being the scale- and location-invariant test of given size which maximizes the probability of identifying the contaminant as discordant.

Nν3 Two-sided discordancy test for an extreme outlier in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu3} = \text{externally studentized extreme absolute deviation from the mean} = \max\left(\frac{x_{(n)} - \bar{x}}{s_\nu}, \frac{\bar{x} - x_{(1)}}{s_\nu}\right).$$
Inequality:
$$\mathrm{SP}(t) < 2P(T_{N\nu1} > t) < 2nP\left(t_\nu > [n/(n-1)]^{1/2}\, t\right).$$
Tabulated significance levels: Table VIIIb, page 301; derived from Halperin, Greenhouse, Cornfield, and Zalokar (1955), Tables 1 and 2, pages 187-188, where bounds on the 5 and 1 per cent points are given for $n = 3(1)10, 15, 20, 30, 40, 60$ and $\nu = 3(1)10, 15, 20, 30, 40, 60, 120, \infty$ subject to $\nu \ge n$.
Reference: Halperin et al. (1955).
Properties of test: An appropriate test for comparing treatment means in analysis of variance.

Nν4 Two-sided discordancy test for an extreme outlier in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu4} = \text{externally and internally studentized extreme absolute deviation from the mean} = \max\left(\frac{x_{(n)} - \bar{x}}{s}, \frac{\bar{x} - x_{(1)}}{s}\right).$$
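The next sketch computes $T_{N\nu2}$ and $T_{N\nu4}$. This section does not restate how $s$ is formed. The degrees of freedom $n-1+\nu$ in the recurrence and inequality suggest the pooled estimate $s^2 = (S^2 + \nu s_\nu^2)/(n-1+\nu)$, and the code assumes that definition; the function names are our own. The sketch also evaluates the upper end $[(n-1)(n-1+\nu)/n]^{1/2}$ of the range of $t$ in the recurrence. The statistic reaches this value only in the limit where one observation stands apart from $n-1$ identical ones and $s_\nu$ is negligible.

```python
import math


def pooled_s(sample, s_nu, nu):
    """Assumed pooled estimate: s^2 = (S^2 + nu * s_nu^2) / (n - 1 + nu)."""
    n = len(sample)
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)  # internal sum of squares S^2
    return math.sqrt((ss + nu * s_nu ** 2) / (n - 1 + nu))


def t_nv2(sample, s_nu, nu):
    """Single upper outlier: (x_(n) - xbar) / s."""
    xbar = sum(sample) / len(sample)
    return (max(sample) - xbar) / pooled_s(sample, s_nu, nu)


def t_nv4(sample, s_nu, nu):
    """Two-sided: the larger of (x_(n) - xbar) / s and (xbar - x_(1)) / s."""
    xbar = sum(sample) / len(sample)
    return max(max(sample) - xbar, xbar - min(sample)) / pooled_s(sample, s_nu, nu)


def t_upper_limit(n, nu):
    """Upper end of the range of t in the recurrence: [(n-1)(n-1+nu)/n]^(1/2)."""
    return math.sqrt((n - 1) * (n - 1 + nu) / n)


data = [4.1, 3.9, 4.0, 4.3, 5.2]
print(t_nv2(data, s_nu=0.2, nu=10), t_nv4(data, s_nu=0.2, nu=10), t_upper_limit(5, 10))
```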
Inequality:
$$\mathrm{SP}(t) < 2P(T_{N\nu2} > t) < 2nP\left(t_{n-2+\nu} > \left[\frac{n(n-2+\nu)t^2}{(n-1)(n-1+\nu) - nt^2}\right]^{1/2}\right).$$
Tabulated significance levels: Table VIIId, page 303; derived from Quesenberry and David (1961), Tables 3 and 4, pages 389-390, where bounds on the 5 and 1 per cent points of $(n-1+\nu)^{-1/2}T_{N\nu4}$ are given for $n = 3(1)10, 12, 15, 20$ and $\nu = 0(1)10, 12, 15, 20, 24, 30, 40, 50$. An error in Quesenberry and David's Table 3 (the entry for $n = 7$, $\nu = 4$) has been corrected.
References: Kudo (1956a), Quesenberry and David (1961).
Properties of test: For a location-slippage alternative in which one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a \neq 0$, Nν4 has the optimal property of being the scale- and location-invariant test of given size which maximizes the probability of identifying the contaminant as discordant.

Nν5 Discordancy test for $k\ (\ge 2)$ upper outliers $x_{(n-k+1)}, \ldots, x_{(n-1)}, x_{(n)}$ in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu5} = \text{sum of jointly (externally and internally) studentized deviations from the mean} = \frac{x_{(n-k+1)} + \cdots + x_{(n-1)} + x_{(n)} - k\bar{x}}{s}.$$
Inequality:
$$\mathrm{SP}(t) \le \binom{n}{k} P\left(t_{n-2+\nu} > \left[\frac{n(n-2+\nu)t^2}{k(n-k)(n-1+\nu) - nt^2}\right]^{1/2}\right).$$
This is an equality when $t \ge [k(n-1+\nu)(n-k-1)/(nk+n)]^{1/2}$.
References: McMillan (1971), Fieller (1976).
Properties of test: As for N3.

Nν6 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu6} = \text{externally studentized range} = \frac{x_{(n)} - x_{(1)}}{s_\nu}.$$
Inequality:
$$\mathrm{SP}(t) < n(n-1)P\left(t_\nu > t/\sqrt{2}\right).$$
Tabulated significance levels: Table XIc, pages 308-309; abridged from Pearson and Hartley (1966), Table 29, pages 191-193, where 10, 5, and 1 per cent points are given for $n = 2(1)20$ and $\nu = 1(1)20, 24, 30, 40, 60, 120, \infty$.
Further tables: Harter (1969a) gives upper and lower 0.1, 0.5, 1, 2.5, 5, 10(10)40 per cent points and the median, and extends the sample sizes to $n = 22(2)40(10)100$.
References: Dixon (1950), Thompson (1955), Moore (1957), David (1962), Harter (1969a), Fieller (1976).

Nν7 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with $\mu$ unknown and an independent estimate of $\sigma^2$ known
Test statistic:
$$T_{N\nu7} = \text{externally and internally studentized range} = \frac{x_{(n)} - x_{(1)}}{s}.$$
Inequality:
$$\mathrm{SP}(t) \le n(n-1)P\left(t_{n-2+\nu} > \left[\frac{(n-2+\nu)t^2}{2n-2+2\nu-t^2}\right]^{1/2}\right).$$
This is an equality when $t \ge [\tfrac{3}{2}(n-1+\nu)]^{1/2}$.
Reference: Fieller (1976).

Nμ1 Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
Recurrence relationship:
$$f_n(t) = \left(\frac{n}{\pi}\right)^{1/2} \frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \left(1 - \frac{t^2}{n}\right)^{(n-3)/2} F_{n-1}\left(\left[\frac{(n-1)t^2}{n-t^2}\right]^{1/2}\right),$$
with $F_1(t) = 0$ $(t < -1)$, $\tfrac{1}{2}$ $(-1 < t < 1)$, $1$ $(t > 1)$.
Tabulated significance levels: Table VIIc, page 298; freshly compiled on the basis of simulations of sizes 10 000.
Properties of test: Note that $T_{N\mu1}$ can take negative values. The statistic $S_n^2(\mu)/S^2(\mu) = 1 - (1/n)T_{N\mu1}^2$ is therefore not equivalent to $T_{N\mu1}$, in contradistinction to the one-one relationship between $S_n^2/S^2$ and $T_{N1}$. The occurrence of a negative value for $T_{N\mu1}$ has probability $1/2^n$ on the working hypothesis and is therefore rare except for very small samples.
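Assume, as the text implies, that $T_{N\mu1}$ has the sign of $x_{(n)} - \mu$. Then a negative value occurs exactly when all $n$ independent observations fall below $\mu$, the median of the normal distribution, which gives the probability $1/2^n$. A short Monte Carlo check in Python follows; the function name, replicate count and seed are our own choices.

```python
import random


def prob_max_below_mu(n, reps=100_000, mu=0.0, sigma=1.0, seed=1):
    """Monte Carlo estimate of P(x_(n) < mu) for a N(mu, sigma^2) sample of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        if max(rng.gauss(mu, sigma) for _ in range(n)) < mu:
            hits += 1
    return hits / reps


for n in (2, 3, 5):
    print(n, round(prob_max_below_mu(n), 4), 0.5 ** n)
```

Already at $n = 5$ the estimate is close to $1/32 \approx 0.03$, which illustrates why a negative $T_{N\mu1}$ is rare except for very small samples.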
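The two range inequalities (Nν6 and Nν7) can be evaluated with the same stand-alone Student $t$ tail as for Nν1. It is repeated here so that the block runs on its own. `student_t_sf` is our own Simpson-rule approximation, not a library call.

For Nν7 the code returns 0 once $t^2 \ge 2(n-1+\nu)$. $T_{N\nu7}^2$ can only approach that value, in the limit where two observations lie apart, the rest sit midway between them, and $s_\nu$ is negligible. This cut-off is our own derivation, not a statement from the worksheet.

```python
import math


def student_t_sf(x: float, df: float, steps: int = 2000) -> float:
    """Approximate P(t_df > x) for df >= 1 via t = sqrt(df) tan(theta) and Simpson's rule."""
    if x < 0:
        return 1.0 - student_t_sf(-x, df, steps)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    upper = math.atan(x / math.sqrt(df))
    h = upper / steps
    total = 1.0 + math.cos(upper) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    return 0.5 - c * total * h / 3


def nv6_bound(t, n, nu):
    """n(n-1) P(t_nu > t / sqrt(2)) for the externally studentized range."""
    return n * (n - 1) * student_t_sf(t / math.sqrt(2), nu)


def nv7_bound(t, n, nu):
    """n(n-1) P(t_{n-2+nu} > [(n-2+nu) t^2 / (2n-2+2nu-t^2)]^(1/2))."""
    m = n - 2 + nu
    denom = 2 * (n - 1 + nu) - t * t
    if denom <= 0:
        return 0.0  # t lies beyond every attainable value of T_Nv7
    return n * (n - 1) * student_t_sf(math.sqrt(m * t * t / denom), m)


print(nv6_bound(5.0, 5, 10), nv7_bound(5.0, 5, 10))
```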
Nμ2 Two-sided discordancy test for an extreme outlier in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
Recurrence relationship:
$$f_n(t) = n\, b_{\frac{1}{2},\frac{n-1}{2}}(t)\, F_{n-1}\left[\frac{t}{1-t}\right] \quad (0 \le t \le 1).$$
Inequality:
$$\mathrm{SP}(t) \le 2nP\left(t_{n-1} > \left[\frac{(n-1)t}{1-t}\right]^{1/2}\right).$$
This is an equality when $t \ge \tfrac{1}{2}$.
Tabulated significance levels: Table VIId, page 298; derived from Eisenhart, Hastay, and Wallis (1947), Tables 15.1 and 15.2, pages 390-391, where 5 and 1 per cent points of $(T_{N\mu2})^2/n$ are given (as part of a larger table) for $n = 2(1)10, 12, 15, 20, 24, 30, 40, 60, 120$.
References: Cochran (1941), Fieller (1976), Lewis and Fieller (1978).
Properties of test: Maximum likelihood ratio test when the alternative is that one observation arises from a normal distribution $N(\mu, b\sigma^2)$, $b > 1$. Note that $(T_{N\mu2})^2/n$ has the same distribution as $T_{Ga1}$ with $r = \tfrac{1}{2}$.

Nμ3 Discordancy test for $k\ (\ge 2)$ upper outliers $x_{(n-k+1)}, \ldots, x_{(n)}$ in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
Inequality:
$$\mathrm{SP}(t) \le \binom{n}{k} P\left(t_{n-1} > \left[\frac{(n-1)t^2}{nk - t^2}\right]^{1/2}\right).$$
Tabulated significance levels: Table IXc, page 305; freshly compiled on the basis of simulations of sizes 10 000.

Nμ4 Discordancy test for two upper outliers $x_{(n-1)}, x_{(n)}$ in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
$$T_{N\mu4} = S_{n-1,n}^2(\mu)/S^2(\mu).$$
Tabulated significance levels: Table IXd, page 305; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\mu4}$ smaller than the tabulated level are significant.

Nμ5 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
$$T_{N\mu5} = S_{1,n}^2(\mu)/S^2(\mu).$$
Tabulated significance levels: Table Xb, page 306; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\mu5}$ smaller than the tabulated level are significant.

Nμ6 Discordancy test for one or more outliers (irrespective of their directions) in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
$$T_{N\mu6} = \text{sample kurtosis based on deviations from } \mu = \frac{\sum_{j=1}^{n}(x_j - \mu)^4}{n s^4(\mu)}.$$
The value tested for discordancy is $x_{(n)}$ or $x_{(1)}$, according as $|x_{(n)} - \mu|$ is greater or less than $|\mu - x_{(1)}|$. The presence of a discordant value is indicated by a high value of the test statistic. For more than one outlier, apply the test consecutively.
Tabulated significance levels: Table XIVc, page 313; freshly compiled on the basis of simulations of sizes 10 000.
Reference: Ferguson (1961a).
Properties of test: Nμ6 is the locally best invariant test of given size against a location-slippage alternative in which $k$ of the $n$ observations arise from separate normal distributions $N(\mu + a_1, \sigma^2), \ldots, N(\mu + a_k, \sigma^2)$, $a_1 \neq 0, \ldots, a_k \neq 0$, provided that the contamination proportion $\lambda\ (= k/n)$ under the alternative hypothesis is less than ~ (strictly, provided that k < 3 n+ 2). The test is suitable for consecutive use in the possible presence of more than one outlier.

Nμ14 Discordancy test for $k$ outliers (irrespective of directions) in a normal sample with $\mu$ known and $\sigma^2$ unknown
Test statistic:
$$T_{N\mu14} = \frac{\sum_{j=n-k+1}^{n} d_{(j)}^2}{S^2(\mu)},$$
where $d_j = |x_j - \mu|$, the absolute deviation of $x_j$ from the population mean, and $\{d_{(j)}\}$ are the values of the $d_j$ in ascending order, $d_{(1)} < d_{(2)} < \cdots < d_{(n)}$.
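Nμ6 and Nμ14 are simple to compute directly. The Python sketch below assumes $S^2(\mu) = \sum_j (x_j - \mu)^2$ and $s^2(\mu) = S^2(\mu)/n$, which turns the Nμ6 formula into the familiar kurtosis about $\mu$. Neither definition is restated in this section, and the function names are our own.

```python
def t_nmu6(sample, mu):
    """Kurtosis about mu: sum (x_j - mu)^4 / (n s^4(mu)), with s^2(mu) = S^2(mu)/n."""
    n = len(sample)
    dev = [x - mu for x in sample]
    s2 = sum(d * d for d in dev) / n
    return sum(d ** 4 for d in dev) / (n * s2 * s2)


def nmu6_candidate(sample, mu):
    """Value examined by Nmu6: x_(n) or x_(1), whichever is farther from mu (ties go to x_(1))."""
    hi, lo = max(sample), min(sample)
    return hi if abs(hi - mu) > abs(mu - lo) else lo


def t_nmu14(sample, mu, k):
    """Sum of the k largest d_(j)^2 = (x - mu)^2, divided by S^2(mu)."""
    d2 = sorted((x - mu) ** 2 for x in sample)  # ascending: d_(1)^2 <= ... <= d_(n)^2
    return sum(d2[-k:]) / sum(d2)


data = [0.5, -0.2, 3.0, -2.5, 0.1]
print(t_nmu6(data, 0.0), nmu6_candidate(data, 0.0), t_nmu14(data, 0.0, 2))
```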
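Nμ4 and Nμ5 compare the sum of squares about $\mu$ of a reduced sample with that of the full sample. The sketch reads $S_{n-1,n}^2(\mu)$ and $S_{1,n}^2(\mu)$ as $\sum (x_j - \mu)^2$ taken over the sample with the indicated order statistics removed. That reading of the notation is our assumption, as is the helper name `reduced_ss_about_mu`. Small values of either ratio point towards discordance.

```python
def reduced_ss_about_mu(sample, mu, drop=()):
    """Sum of (x - mu)^2 after removing the order statistics at 0-based positions `drop`."""
    ordered = sorted(sample)
    return sum((x - mu) ** 2 for i, x in enumerate(ordered) if i not in drop)


def t_nmu4(sample, mu):
    """S^2_{n-1,n}(mu) / S^2(mu): the two largest observations removed."""
    n = len(sample)
    return reduced_ss_about_mu(sample, mu, {n - 2, n - 1}) / reduced_ss_about_mu(sample, mu)


def t_nmu5(sample, mu):
    """S^2_{1,n}(mu) / S^2(mu): the smallest and largest observations removed."""
    n = len(sample)
    return reduced_ss_about_mu(sample, mu, {0, n - 1}) / reduced_ss_about_mu(sample, mu)


data = [0.5, -0.2, 3.0, -2.5, 0.1]
print(t_nmu4(data, 0.0), t_nmu5(data, 0.0))
```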
Inequality:
$$\mathrm{SP}(t) < \binom{n}{k} P\left(F_{k,n-k} > \frac{(n-k)t}{k(1-t)}\right).$$
References: Fieller (1976), Lewis and Fieller (1978).
Properties of test: Nμ14 is the maximum likelihood ratio test for an alternative in which $k$ of the $n$ observations arise from a common normal distribution $N(\mu, b\sigma^2)$, $b > 1$. The test statistic $T_{N\mu14}$ has the same distribution as $T_{Ga5}$ with $r = \tfrac{1}{2}$.

Nσ1 Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with $\sigma^2$ known and $\mu$ unknown
Test statistic:
$$T_{N\sigma1} = \text{standardized extreme deviation from the mean} = \frac{x_{(n)} - \bar{x}}{\sigma}.$$
Test distribution:
$$F_n(t) = \exp\left\{-\frac{1}{2n}\frac{d^2}{dt^2}\right\}[\Phi(t)]^n.$$
Recurrence relationship:
$$f_n(t) = \left(\frac{n^3}{2\pi(n-1)}\right)^{1/2} \exp\left(-\frac{nt^2}{2(n-1)}\right) F_{n-1}\left(\frac{nt}{n-1}\right),$$
with $f_2(t) = \dfrac{2}{\sqrt{\pi}}\exp(-t^2)$.
Inequality:
$$\mathrm{SP}(t) < n\,\Phi\left[-n^{1/2}t/(n-1)^{1/2}\right].$$
Tabulated significance levels: Table VIIe, page 299; abridged from Grubbs (1950), Table III, page 45, where 10, 5, 1, and 0.5 per cent points are given for $n = 2(1)25$.
Further tables: A table in Nair (1948), also in Pearson and Hartley (1966), gives lower and upper 10, 5, 2.5, 1, 0.5, and 0.1 per cent points for $n = 3(1)9$. Dixon (1950) gives graphs of performance measure $P_3$. David (1956) gives a table of values of power $P_1$ against the alternative that one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a > 0$, for $a/\sigma = 1(1)4$ and $n = 3(1)10, 12, 15, 20, 25$. David and Paulson (1965) correct some errors in this table. McMillan and David (1971) give graphs of performance measures $P(C_1)$, $P(C_2)$, $P(C_3)$ (page 74) for the consecutive use of Nσ1 in testing two upper outliers.
References: McKay (1935), Nair (1948), Grubbs (1950, 1969), Dixon (1950, 1962), David (1956), Kudo (1956a), Ferguson (1961b), McMillan and David (1971), Fieller (1976).
Properties of test: Nσ1 is the maximum likelihood ratio test for the above-stated alternative of slippage in location by $a > 0$ for one observation. For this alternative, it has the optimal property of being the scale- and location-invariant test of given size which maximizes the probability $P_3$ of identifying the contaminant as discordant. For the same alternative, some typical values of power $P_1$ are as shown in Table 3.2 (David and Paulson, 1965). Unlike test N1, Nσ1 can be used effectively when there is more than one contaminant, being relatively unaffected by the risk of masking.

Table 3.2

                                          a/σ
                             n      2      3      4      5
Test at 5%                   3    0.31   0.63   0.87   0.98
significance level          10    0.27   0.62   0.89   0.99
                            25    0.21   0.54   0.86   0.98
Test at 1%                   3    0.14   0.40   0.71   0.92
significance level          10    0.12   0.41   0.76   0.95
                            25    0.09   0.35   0.72   0.94

Nμσ1 Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with both $\mu$ and $\sigma^2$ known
Test statistic:
$$T_{N\mu\sigma1} = \text{standardized extreme deviation from the population mean} = \frac{x_{(n)} - \mu}{\sigma}.$$
Test distribution:
Tabulated significance levels: Table VIIg, page 299; the entries for sample sizes up to $n = 30$ are extracted from Pearson and Hartley (1966), Table 24, page 184, where lower and upper 10, 5, 2.5, 1, 0.5, and 0.1 per cent points are given for $n = 1(1)30$; the entries for $n > 30$ have been freshly compiled.
Reference: Dixon (1962).
Properties of test: Maximum likelihood ratio test when the alternative is that one observation arises from a normal distribution $N(\mu + a, \sigma^2)$, $a > 0$.
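Because $\sigma$ is known, the Nσ1 bound needs only the normal distribution function, which Python's standard library provides as `statistics.NormalDist`. The sketch evaluates $T_{N\sigma1}$ and the bound. It then inverts the bound to give a conservative critical value, $t_\alpha = [(n-1)/n]^{1/2}\,\Phi^{-1}(1-\alpha/n)$, the value of $t$ at which $n\Phi[-n^{1/2}t/(n-1)^{1/2}] = \alpha$. Using $t_\alpha$ as a stand-in for the tabulated percentage points is our suggestion, not the worksheet's, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mu = 0, sigma = 1


def t_nsigma1(sample, sigma):
    """Standardized extreme deviation from the mean: (x_(n) - xbar) / sigma."""
    xbar = sum(sample) / len(sample)
    return (max(sample) - xbar) / sigma


def nsigma1_bound(t, n):
    """Upper bound n * Phi[-n^(1/2) t / (n-1)^(1/2)] on SP(t)."""
    return n * STD_NORMAL.cdf(-math.sqrt(n / (n - 1)) * t)


def nsigma1_conservative_critical(n, alpha):
    """Value of t at which the bound equals alpha: [(n-1)/n]^(1/2) * Phi^{-1}(1 - alpha/n)."""
    return math.sqrt((n - 1) / n) * STD_NORMAL.inv_cdf(1 - alpha / n)


crit = nsigma1_conservative_critical(10, 0.05)
print(round(crit, 4), nsigma1_bound(crit, 10))
```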
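A quick simulation shows how close the Nσ1 inequality comes to an equality at conventional significance levels. The sketch draws standard normal samples. Taking $\mu = 0$, $\sigma = 1$ loses no generality, since $T_{N\sigma1}$ is unchanged by a shift in $\mu$. The sample size, cut-off, number of replicates and seed are our own illustrative choices.

```python
import math
import random
from statistics import NormalDist


def simulate_nsigma1_tail(n, t, reps=20_000, seed=2024):
    """Monte Carlo estimate of P(T_Nsigma1 > t) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if max(xs) - sum(xs) / n > t:  # sigma = 1
            exceed += 1
    return exceed / reps


n, t = 10, 2.44
bound = n * NormalDist().cdf(-math.sqrt(n / (n - 1)) * t)
estimate = simulate_nsigma1_tail(n, t)
print(f"Bonferroni bound {bound:.4f}, simulated {estimate:.4f}")
```

The two numbers should agree to within simulation error. The deviations $x_j - \bar{x}$ are negatively correlated, so two of them rarely exceed a high cut-off together.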
Nσ2 Two-sided discordancy test for an extreme outlier in a normal sample with $\sigma^2$ known and $\mu$ unknown
Test statistic:
$$T_{N\sigma2} = \max\left(\frac{x_{(n)} - \bar{x}}{\sigma}, \frac{\bar{x} - x_{(1)}}{\sigma}\right).$$
Inequality:
$$\mathrm{SP}(t) < 2P(T_{N\sigma1} > t).$$
Tabulated significance levels: Table VIIf, page 299; extracted from Halperin, Greenhouse, Cornfield and Zalokar (1955), Tables 1 and 2, pages 187-188 ($\nu = \infty$; see Worksheet Nν3).
Reference: Kudo (1956a).
Properties of test: As for N2, except that Nσ2 (unlike N2) is relatively unaffected by masking from other outliers.

Nσ3 Discordancy test for $k\ (\ge 2)$ upper outliers $x_{(n-k+1)}, \ldots, x_{(n-1)}, x_{(n)}$ in a normal sample with $\sigma^2$ known and $\mu$ unknown
Test statistic:
$$T_{N\sigma3} = \text{sum of standardized deviations from the mean} = \frac{x_{(n-k+1)} + \cdots + x_{(n-1)} + x_{(n)} - k\bar{x}}{\sigma}.$$
Inequality:
$$\mathrm{SP}(t) < \binom{n}{k}\Phi\left[-n^{1/2}t/(kn - k^2)^{1/2}\right].$$
Tabulated significance levels: Table IXe, page 305; values for $k = 2$, $n \le 20$ extracted from McMillan and David (1971), Table 1, page 82, where 5 and 1 per cent points are given for $n = 4(1)27$; values for $k = 2$, $n \ge 30$ and for $k = 3$ and $k = 4$ freshly compiled on the basis of simulations of sizes 10 000.
Further tables: McMillan and David (1971) give graphs of the performance measure $P_2$ for the case $k = 2$.
References: Kudo (1956a), McMillan and David (1971), Fieller (1976).
Properties of test: As for N3.

Nμσ3 Discordancy test for $k\ (\ge 2)$ upper outliers $x_{(n-k+1)}, \ldots, x_{(n-1)}, x_{(n)}$ in a normal sample with both $\mu$ and $\sigma^2$ known
Test statistic:
$$T_{N\mu\sigma3} = \text{sum of standardized deviations from the population mean} = \frac{x_{(n-k+1)} + \cdots + x_{(n-1)} + x_{(n)} - k\mu}{\sigma}.$$
Tabulated significance levels: Table IXg, page 306; freshly compiled on the basis of simulations of sizes 10 000.

Nσ4 Discordancy test for two upper outliers $x_{(n-1)}, x_{(n)}$ in a normal sample with $\sigma^2$ known and $\mu$ unknown
Test statistic:
$$T_{N\sigma4} = S_{n-1,n}^2/\sigma^2.$$
Tabulated significance levels: Table IXf, page 305; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\sigma4}$ smaller than the tabulated level are significant.

Nμσ4 Discordancy test for two upper outliers $x_{(n-1)}, x_{(n)}$ in a normal sample with both $\mu$ and $\sigma^2$ known
Test statistic:
$$T_{N\mu\sigma4} = S_{n-1,n}^2(\mu)/\sigma^2.$$
Tabulated significance levels: Table IXh, page 306; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\mu\sigma4}$ smaller than the tabulated level are significant.

Nσ5 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with $\sigma^2$ known and $\mu$ unknown
Test statistic:
$$T_{N\sigma5} = S_{1,n}^2/\sigma^2.$$
Tabulated significance levels: Table Xc, page 306; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\sigma5}$ smaller than the tabulated level are significant.

Nμσ5 Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with both $\mu$ and $\sigma^2$ known
Test statistic:
$$T_{N\mu\sigma5} = S_{1,n}^2(\mu)/\sigma^2.$$
Tabulated significance levels: Table Xd, page 306; freshly compiled on the basis of simulations of sizes 10 000. Values of $T_{N\mu\sigma5}$ smaller than the tabulated level are significant.

Nσ6 (Nμσ6) Discordancy test for a lower and upper outlier-pair $x_{(1)}, x_{(n)}$ in a normal sample with $\sigma^2$ known
Test statistic:
$$T_{N\sigma6} = \text{standardized range}$$
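For $\sigma$ known, the $k$-outlier statistic $T_{N\sigma3}$ and its bound again need only `statistics.NormalDist`, and `math.comb` supplies $\binom{n}{k}$. The sketch reads the reduced sums of squares $S_{n-1,n}^2$ and $S_{1,n}^2$ as sums of squares about the mean of the sample that remains after removing the indicated order statistics. That interpretation, and all the function names, are our assumptions.

```python
import math
from statistics import NormalDist


def t_nsigma3(sample, sigma, k):
    """(x_(n-k+1) + ... + x_(n) - k * xbar) / sigma."""
    ordered = sorted(sample)
    xbar = sum(ordered) / len(ordered)
    return (sum(ordered[-k:]) - k * xbar) / sigma


def nsigma3_bound(t, n, k):
    """C(n, k) * Phi[-n^(1/2) t / (kn - k^2)^(1/2)]."""
    return math.comb(n, k) * NormalDist().cdf(-math.sqrt(n) * t / math.sqrt(k * n - k * k))


def reduced_ss(sample, drop):
    """Sum of squares about the mean of what remains after removing the
    order statistics at the 0-based positions in `drop`."""
    kept = [x for i, x in enumerate(sorted(sample)) if i not in drop]
    m = sum(kept) / len(kept)
    return sum((x - m) ** 2 for x in kept)


def t_nsigma4(sample, sigma):
    n = len(sample)
    return reduced_ss(sample, {n - 2, n - 1}) / sigma ** 2  # S^2_{n-1,n} / sigma^2


def t_nsigma5(sample, sigma):
    n = len(sample)
    return reduced_ss(sample, {0, n - 1}) / sigma ** 2  # S^2_{1,n} / sigma^2


data = [0.5, -0.2, 3.0, -2.5, 0.1]
t3 = t_nsigma3(data, 1.0, 2)
print(t3, nsigma3_bound(t3, 5, 2), t_nsigma4(data, 1.0), t_nsigma5(data, 1.0))
```

For Nσ4 and Nσ5 it is small values of the statistic that indicate discordance, as the worksheets state.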
114 Outliers in statistical data    Discordancy tests for outliers in univariate samples 115

Test distribution:
$$F(t) = n\int_{-\infty}^{\infty} \phi(x)\,[\Phi(x) - \Phi(x - t)]^{n-1}\,dx.$$
Tabulated significance levels: Table XIb, page 307; abridged from Harter (1969a), Table A7, pages 372-374, where lower and upper 0.01, 0.05, 0.1, 0.5, 1, 2.5, 5, 10(10)40 per cent points and the 50 per cent point are given for n = 2(1)20(2)40(10)100.
Further tables: Dixon (1950) gives graphs of performance measure P3.
References: Tippett (1925), Pearson (1926, 1932), Pearson and Hartley (1942), Dixon (1950, 1962), Harter (1969a).

Nσ7 (Nμσ7) Discordancy test for a single upper outlier $x_{(n)}$ in a normal sample with $\sigma^2$ known
Test statistic:
$$T_{N\sigma7} = \frac{x_{(n)} - x_{(n-1)}}{\sigma}.$$
Test distribution:
$$F(t) = 1 - n\int_{-\infty}^{\infty} \phi(x + t)\,[\Phi(x)]^{n-1}\,dx.$$
Tabulated significance levels: Table XIIIb, page 312; derived from Irwin (1925), Table II, page 239, where values of $F_n(t)$ are given for t = 0.1(0.1)5.0 and n = 2, 3, 10(10)100(100)1000.
Further tables: Dixon (1950) gives graphs of performance measure P3.
References: Irwin (1925), Dixon (1950).
Properties of test: Analogous in concept to a Dixon-type test. Performance P3 is comparable to that of test Nσ1 if there is just one contaminant, but compares unfavourably with Nσ1 if there is more than one contaminant, owing to incidence of masking by $x_{(n-1)}$. However, Nσ7 could be a useful test for a lower outlier (with test statistic $[x_{(2)} - x_{(1)}]/\sigma$) in a life-test data situation in which for practical reasons only the shortest lifetimes were actually observed.
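The test distributions for Nσ6 and Nσ7 are single integrals over the standard normal density φ and distribution function Φ, so significance probabilities can also be evaluated numerically rather than read from tables. The Python sketch below is our own illustration, not the method used to compile the cited tables: it applies composite Simpson's rule on the truncated range [-10, 10], where the truncation limits and the panel count are assumptions chosen to give roughly double-precision accuracy.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def simpson(f, lo=-10.0, hi=10.0, panels=2000):
    """Composite Simpson's rule for the integral of f over [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = f(lo) + f(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def range_cdf(t, n):
    """N-sigma-6: F(t) = n * integral of phi(x) [Phi(x) - Phi(x - t)]^(n-1) dx."""
    return n * simpson(lambda x: norm_pdf(x) * (norm_cdf(x) - norm_cdf(x - t)) ** (n - 1))


def gap_sp(t, n):
    """N-sigma-7: SP(t) = 1 - F(t) = n * integral of phi(x + t) Phi(x)^(n-1) dx."""
    return n * simpson(lambda x: norm_pdf(x + t) * norm_cdf(x) ** (n - 1))


if __name__ == "__main__":
    # For n = 2 both statistics reduce to |X1 - X2| / sigma, whose upper tail is erfc(t / 2).
    print(1.0 - range_cdf(1.0, 2), gap_sp(1.0, 2), math.erfc(0.5))
    # SP of an observed standardized upper gap of 2.5 in a sample of 10.
    print(gap_sp(2.5, 10))
```

The n = 2 case is a convenient exactness check, since then the range and the upper gap are both $|X_1 - X_2|/\sigma$.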

Nσ8 (Nμσ8) Discordancy test for an upper outlier-pair $x_{(n-1)}, x_{(n)}$ in a normal sample with $\sigma^2$ known
Test statistic:
$$T_{N\sigma8} = \frac{x_{(n-1)} - x_{(n-2)}}{\sigma}.$$
Test distribution:
$$F(t) = 1 - n(n-1)\int_{-\infty}^{\infty} \phi(x + t)\,[1 - \Phi(x + t)]\,[\Phi(x)]^{n-2}\,dx.$$
Tabulated significance levels: Table XIIIc, page 312; derived from Irwin (1925), Table III, page 242, where values of $F_n(t)$ are given for t = 0.1(0.1)2.0 and n = 3, 10(10)100(100)1000, and also for t = 2.1(0.1)4.0 in the case n = 3.
Reference: Irwin (1925).
Properties of test: Analogous to a Dixon-type test. As with Nσ7, could be an appropriate test in a life-testing context. Nσ7 and Nσ8 have a historical standing as being two of the earliest published tests in modern outlier methodology.

Nσ9 (Nμσ9) Discordancy test for k lower and k upper outliers (k ≥ 2) in a normal sample with $\sigma^2$ known
Test statistic:
$$T_{N\sigma9} = (k-1)\text{th standardized quasi-range} = \frac{x_{(n-k+1)} - x_{(k)}}{\sigma}.$$
Tabulated significance levels: Table XII, page 310; abridged from Harter (1969b), Table A7, pages 295-319, where lower and upper 0.01, 0.05, 0.1, 0.5, 1, 2.5, 5, 10(10)40 per cent points and the 50 per cent point are given for k = 1(1)9 and n = 2k(1)20(2)40(10)100.
Reference: Harter (1969b).
Properties of test: Advantage is taken here of Harter's very extensive tables of quasi-ranges to provide a test for discordancy in the tails of a large sample whose main central mass can be assumed normal.
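The known-σ statistics in this group are simple functions of the ordered sample. The Python sketch below (the function names and dictionary keys are our own, hypothetical labels) computes Nσ3 (or Nμσ3 when μ is supplied), Nσ6, Nσ7, Nσ8 and Nσ9, together with the Bonferroni-type upper bound quoted above for Nσ3. Critical values still have to come from the cited tables.

```python
import math


def n_sigma_statistics(xs, sigma, k=2, mu=None):
    """Known-sigma statistics computed from the ordered sample.

    k is the number of outliers for N-sigma-3 and N-sigma-9. If mu is given,
    "N3" is the N-mu-sigma-3 version (deviations from mu instead of the sample mean).
    """
    x = sorted(xs)
    n = len(x)
    centre = sum(x) / n if mu is None else mu
    return {
        "N3": (sum(x[n - k:]) - k * centre) / sigma,  # sum of k upper deviations
        "N6": (x[-1] - x[0]) / sigma,                 # standardized range
        "N7": (x[-1] - x[-2]) / sigma,                # x(n) - x(n-1)
        "N8": (x[-2] - x[-3]) / sigma,                # x(n-1) - x(n-2)
        "N9": (x[n - k] - x[k - 1]) / sigma,          # x(n-k+1) - x(k), the (k-1)th quasi-range
    }


def n_sigma3_bound(t, n, k):
    """N-sigma-3 bound: SP(t) < C(n, k) * Phi(-sqrt(n) t / sqrt(k n - k^2)), capped at 1."""
    z = -math.sqrt(n) * t / math.sqrt(k * n - k * k)
    phi_z = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return min(1.0, math.comb(n, k) * phi_z)


if __name__ == "__main__":
    sample = [9.8, 10.1, 10.0, 9.9, 10.2, 11.9, 12.3]  # hypothetical readings, sigma = 0.2
    stats = n_sigma_statistics(sample, sigma=0.2, k=2)
    print(stats)
    print(n_sigma3_bound(stats["N3"], len(sample), 2))
```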

3.4.4 Discordancy tests for samples from other distributions
Many of the discordancy tests given in Section 3.4.2 for exponential samples can be used for samples from Pareto distributions and from distributions of asymptotic extreme-value type (i.e. Gumbel, Fréchet, and Weibull distributions), by simple transformation of the data. Likewise, various discordancy tests given in Section 3.4.3 for normal samples can be used for samples from log-normal distributions. Since a square root transformation converts a Poisson random variable into a variable distributed approximately normally with variance 1/4, whatever the value of the Poisson mean provided it is not too small, the Nσ tests in Section 3.4.3 can be used for samples from Poisson distributions. Similarly, the Nσ tests can be used for samples from binomial distributions.
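The variance-stabilizing property of the square root can be checked directly. This Python sketch is our own check rather than anything in the source; the offset c added inside the root is treated as a free parameter (c = 0 is the plain square root). It sums the Poisson probability function to obtain the exact mean and variance of $\sqrt{X + c}$, which settle near $\sqrt{\mu}$ and 1/4 once μ is moderately large.

```python
import math


def sqrt_poisson_moments(mu, c=0.5):
    """Mean and variance of sqrt(X + c) for X ~ Poisson(mu), mu > 0,
    by direct summation of the Poisson probability function."""
    upper = int(mu + 20 * math.sqrt(mu) + 50)  # probability beyond this point is negligible
    mean = 0.0
    second = 0.0
    for k in range(upper + 1):
        p = math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))  # P(X = k), computed in log space
        mean += p * math.sqrt(k + c)
        second += p * (k + c)
    return mean, second - mean * mean


if __name__ == "__main__":
    for mu in (1, 2, 5, 20, 100):
        m, v = sqrt_poisson_moments(mu)
        print(f"mu={mu:>3}  mean={m:.4f}  sqrt(mu)={math.sqrt(mu):.4f}  variance={v:.4f}")
```

Running it for small μ shows why the text insists that the Poisson mean should not be too small: the variance drifts away from 1/4 at μ = 1 or 2.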

Apart from the use of transformations, a few specific discordancy tests are available for Poisson and binomial samples and for samples from other distributions.
Details for the various distributions are given below as follows:
                                                                  page
Discordancy tests for Pareto samples                               116
Discordancy tests for Gumbel, Fréchet, and Weibull samples         117
Discordancy tests for log-normal samples                           118
Discordancy tests for uniform samples                              118
Discordancy tests for Poisson samples                              120
Discordancy tests for binomial samples                             122
Discordancy tests for truncated exponential samples                124

Discordancy tests for Pareto samples
In addition to its well known role in economics as a model for the distribution of incomes, the Pareto distribution can be used as a pragmatic model for other skew-distributed data characterized by a main mass of low values at one end and a gradation to a long tail of infrequently occurring high values at the other. In data of this nature, high-valued outliers requiring test may well arise.
A Pareto random variable X is characterized by two parameters, the minimum value a (a > 0), and the shape parameter r (r > 0). Its distribution function can be written
$$P(X < x) = 0 \quad (x \le a), \qquad 1 - (a/x)^r \quad (x \ge a).$$
If $Y = \ln X$, the distribution function of Y is
$$P(Y < y) = 0 \quad (y \le \ln a), \qquad 1 - \exp[-r(y - \ln a)] \quad (y \ge \ln a),$$
i.e. Y has an exponential distribution with origin $\ln a$ and scale parameter 1/r. Suppose then that we have, on the working hypothesis, a Pareto sample $x_{(1)}, \dots, x_{(n)}$ containing one or more outliers. The transformed quantities $\ln x_{(1)}, \dots, \ln x_{(n)}$ will also be in ascending order, and so can be written $y_{(1)}, \dots, y_{(n)}$, an ordered sample from the exponential Y-distribution. If a and r are both unknown, the outlying value or values in the x-sample can be tested for discordancy by applying to the corresponding y-values the appropriate test for an exponential sample with unknown origin. The available tests given in Section 3.4.2 are the Dixon-type tests E2, E4, E6, E8, E10, E11.
If a is known, the transformation $Z = \ln(X/a)$ should be used. On the working hypothesis the transformed quantities $z_{(1)} = \ln[x_{(1)}/a], \dots, z_{(n)} = \ln[x_{(n)}/a]$ belong to an exponential distribution with origin at 0 (and density $r e^{-rz}$, z > 0). All the G and E tests listed in Section 3.4.2 are applicable to the transformed sample.
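In code the Pareto reduction is a single log transformation. The Python sketch below (helper names are hypothetical) maps a Pareto sample to $y = \ln x$ or, when a is known, to $z = \ln(x/a)$, and evaluates the Dixon-type ratio $(y_{(n)} - y_{(n-1)})/(y_{(n)} - y_{(1)})$ as one example of an origin- and scale-invariant statistic usable when a and r are unknown. We do not claim it is any particular numbered E test, and its critical values would have to come from Section 3.4.2.

```python
import math


def pareto_to_exponential(xs, a=None):
    """Log-transform a Pareto sample and return it in ascending order.

    a unknown: y = ln x, exponential with unknown origin ln a and scale 1/r.
    a known:   z = ln(x / a), exponential with origin 0 and scale 1/r.
    ln is increasing, so outliers keep their position in the ordering.
    """
    if a is None:
        return sorted(math.log(x) for x in xs)
    if min(xs) < a:
        raise ValueError("every Pareto observation must be at least a")
    return sorted(math.log(x / a) for x in xs)


def dixon_upper_ratio(values):
    """(v(n) - v(n-1)) / (v(n) - v(1)); unchanged by a shift of origin or a change of scale."""
    v = sorted(values)
    return (v[-1] - v[-2]) / (v[-1] - v[0])


if __name__ == "__main__":
    incomes = [1.2, 1.5, 1.1, 2.3, 1.8, 1.3, 9.7]  # hypothetical data
    print(dixon_upper_ratio(pareto_to_exponential(incomes)))
```

Because z differs from y only by the constant $\ln a$, an origin-invariant ratio gives the same value whether or not a is used, which is why such tests remain available when a is unknown.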

Discordancy tests for Gumbel, Fréchet, and Weibull samples
The asymptotic extreme-value distributions of the first, second, and third types, in other words the Gumbel, Fréchet, and Weibull distributions, are well known as models for extreme observations such as annual maximum wind speeds, floods (as greatest-value phenomena), endurance limits in fatigue testing (as smallest-value phenomena), annual minimum temperatures, oldest ages of individuals in a population, and shortest lives of manufactured items. In analysing extreme-value data it is obviously important to remove where possible the biasing effect of any contaminant values which may be present, and the testing of outliers for discordancy is of particular relevance.
The Gumbel distribution depends on a location parameter a and a positive scale parameter b; the Fréchet and Weibull distributions depend on these two parameters and also on a positive shape parameter r. In terms of these parameters their distribution functions P(X < x) are given in Table 3.3. Note that each distribution has two forms, according as it relates to greatest-value or smallest-value extremes. It can be shown that if the shape parameter is reparametrized as λ = 1/r (Weibull), λ = -1/r (Fréchet), the Gumbel distribution corresponds to the limiting case λ → 0.
If X has a Gumbel greatest-value distribution, the transformed random variable $Y = \exp(-X/b)$ has an exponential distribution with origin 0 and scale parameter $\exp(-a/b)$. If we know the value of b we can test the discordancy of an outlier or a set of outliers in a sample from the X-distribution by transforming each observed value $x_i$ to $y_i = \exp(-x_i/b)$ and using on the y's a discordancy test for an exponential sample with origin 0. Note that with this particular transformation an upper outlier $x_{(n)}$ in the x-sample converts to a lower outlier $y_{(1)}$ in the y-sample, so the test on the y-values must be chosen accordingly.
Corresponding transformations to the one just described are available for the Gumbel smallest-value distribution and the various Fréchet and Weibull distributions. In each case, the Y-distribution will have a density of the form $(1/\lambda)\exp(-y/\lambda)$ (y > 0) on the working hypothesis. All the tests listed in Section 3.4.2 are applicable to a sample from such a distribution.

Table 3.3
| Distribution | Distribution of greatest values | Distribution of smallest values |
| Gumbel | $\exp[-e^{-(x-a)/b}]$, $-\infty < x < \infty$ | $1 - \exp[-e^{-(a-x)/b}]$, $-\infty < x < \infty$ |
| Fréchet | $\exp[-((x-a)/b)^{-r}]$, $a < x$ | $1 - \exp[-((a-x)/b)^{-r}]$, $x < a$ |
| Weibull | $\exp[-((a-x)/b)^{r}]$, $x < a$ | $1 - \exp[-((x-a)/b)^{r}]$, $a < x$ |

Table 3.4
| X-distribution | Required knowledge of parameters | Transformation to be applied | Scale parameter λ of Y-distribution | $x_{(n)}$ transforms to | $x_{(1)}$ transforms to |
| Gumbel greatest-value | b known | $Y = \exp(-X/b)$ | $\exp(-a/b)$ | $y_{(1)}$ | $y_{(n)}$ |
| Gumbel smallest-value | b known | $Y = \exp(X/b)$ | $\exp(a/b)$ | $y_{(n)}$ | $y_{(1)}$ |
| Fréchet greatest-value | a, r known | $Y = (X - a)^{-r}$ | $b^{-r}$ | $y_{(1)}$ | $y_{(n)}$ |
| Fréchet smallest-value | a, r known | $Y = (a - X)^{-r}$ | $b^{-r}$ | $y_{(n)}$ | $y_{(1)}$ |
| Weibull greatest-value | a, r known | $Y = (a - X)^{r}$ | $b^{r}$ | $y_{(1)}$ | $y_{(n)}$ |
| Weibull smallest-value | a, r known | $Y = (X - a)^{r}$ | $b^{r}$ | $y_{(n)}$ | $y_{(1)}$ |

Table 3.4 gives the various transformations in a form for working use. Admittedly their utility is limited by the knowledge of parameter values required before they can be applied, unlike the Pareto case (page 116).
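Table 3.4 can be encoded directly as data. In the Python sketch below (dictionary keys and function names are hypothetical), each row holds the transformation, the scale parameter λ of the resulting exponential Y-distribution, and whether the map reverses the ordering, so that $x_{(n)}$ becomes $y_{(1)}$. As a consistency check, `survival_check` compares P(Y > y) derived from the Table 3.3 distribution function with $\exp(-y/\lambda)$.

```python
import math

# Rows of Table 3.4: (transformation y = g(x), scale lambda of the exponential Y, order reversed?).
# "Order reversed" means the upper x-outlier x(n) becomes the lower y-outlier y(1).
TABLE_3_4 = {
    "gumbel_greatest":  (lambda x, a, b, r: math.exp(-x / b), lambda a, b, r: math.exp(-a / b), True),
    "gumbel_smallest":  (lambda x, a, b, r: math.exp(x / b),  lambda a, b, r: math.exp(a / b),  False),
    "frechet_greatest": (lambda x, a, b, r: (x - a) ** -r,    lambda a, b, r: b ** -r,          True),
    "frechet_smallest": (lambda x, a, b, r: (a - x) ** -r,    lambda a, b, r: b ** -r,          False),
    "weibull_greatest": (lambda x, a, b, r: (a - x) ** r,     lambda a, b, r: b ** r,           True),
    "weibull_smallest": (lambda x, a, b, r: (x - a) ** r,     lambda a, b, r: b ** r,           False),
}

# Distribution functions P(X < x) from Table 3.3, used only to check the table above.
TABLE_3_3 = {
    "gumbel_greatest":  lambda x, a, b, r: math.exp(-math.exp(-(x - a) / b)),
    "gumbel_smallest":  lambda x, a, b, r: 1 - math.exp(-math.exp(-(a - x) / b)),
    "frechet_greatest": lambda x, a, b, r: math.exp(-((x - a) / b) ** -r),
    "frechet_smallest": lambda x, a, b, r: 1 - math.exp(-((a - x) / b) ** -r),
    "weibull_greatest": lambda x, a, b, r: math.exp(-((a - x) / b) ** r),
    "weibull_smallest": lambda x, a, b, r: 1 - math.exp(-((x - a) / b) ** r),
}


def to_exponential(dist, xs, a=None, b=None, r=None):
    """Apply the Table 3.4 transformation; return the ordered y's and the
    y order statistic into which the upper x-outlier x(n) has moved."""
    g, _, reversed_order = TABLE_3_4[dist]
    ys = sorted(g(x, a, b, r) for x in xs)
    return ys, ("y(1)" if reversed_order else "y(n)")


def survival_check(dist, x, a, b, r):
    """Return P(Y > g(x)) derived from Table 3.3 and exp(-g(x) / lambda); they should agree."""
    g, lam, reversed_order = TABLE_3_4[dist]
    F = TABLE_3_3[dist](x, a, b, r)
    survival = F if reversed_order else 1 - F
    return survival, math.exp(-g(x, a, b, r) / lam(a, b, r))
```

Only the parameters listed as required in Table 3.4 are used by each transformation; λ itself also involves a or b and is needed here only for the check, not for applying a scale-free exponential test.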

Discordancy tests for log-normal samples
If X is a log-normal random variable with parameters μ and σ, $\ln X$ is $N(\mu, \sigma^2)$ or equivalently $\log_{10} X$ is $N(0.434\mu, (0.434)^2\sigma^2)$. Thus to test for discordancy any outlier or outliers in a log-normal sample, we need only take the logarithms of the observations (to base e or 10 as convenient) and apply an appropriate normal sample test from those listed in Section 3.4.3 to the transformed sample.
No such facility is available for the generalized three-parameter log-normal distribution, in which $\ln(X - \xi)$ is $N(\mu, \sigma^2)$, ξ being an unknown location parameter.
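The log-normal reduction is equally direct. The Python sketch below (hypothetical helper name) log-transforms the data and forms the studentized deviation $(y_{(n)} - \bar{y})/s$, used here only as one example of a normal-sample statistic of the kind listed in Section 3.4.3; its critical values are not reproduced. Since $\log_{10} x = \ln x/\ln 10$ is a pure rescaling, any location- and scale-invariant statistic takes the same value whichever base is used.

```python
import math
import statistics


def studentized_upper_deviation_of_logs(xs, base=math.e):
    """Take logarithms (base e or 10), then return (y(n) - mean(y)) / s(y)."""
    ys = [math.log(x, base) for x in xs]
    return (max(ys) - statistics.mean(ys)) / statistics.stdev(ys)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.8, 2.9, 2.5, 3.1, 27.0]  # hypothetical log-normal data
    print(studentized_upper_deviation_of_logs(sample))
    print(studentized_upper_deviation_of_logs(sample, base=10))
```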

Discordancy tests for uniform samples
Denote the lower and upper bounds of the uniform distribution by a, b respectively, so that its density is 1/(b - a) for a < x < b, 0 otherwise. Given the ordered sample $x_{(1)} < x_{(2)} < \dots < x_{(n)}$, it is well known that the n + 1 intervals
$$x_{(1)} - a,\; x_{(2)} - x_{(1)},\; \dots,\; x_{(n)} - x_{(n-1)},\; b - x_{(n)}$$
are distributed as n + 1 independent exponential random variables with a common (arbitrary) scale parameter, conditional upon their sum having the constant value b - a. For testing outliers, therefore, we can use the fact that the ratio of any two non-intersecting combinations of these intervals will have an F-distribution on the working hypothesis. This leads at once to a general Dixon-type discordancy test applicable to any combination of lower and upper outliers, the statistic being
$$T_U = \frac{x_{(s)} - x_{(r)}}{x_{(q)} - x_{(p)}} \qquad (1 \le p \le r < s \le q \le n)$$
if a and b are unknown. For example, an appropriate statistic to test two upper outliers for discordancy would be
$$\frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - x_{(1)}}.$$
If a is known, it is preferable to use the statistic
$$\frac{x_{(n)} - x_{(n-2)}}{x_{(n)} - a};$$
this can be included as a particular case of $T_U$ by defining $x_{(0)} = a$ and extending the range of possible values of p down to zero. Likewise the value of b, if known, can be used in these tests by defining $x_{(n+1)} = b$, $q \le n + 1$.
The test distribution for $T_U$ is
$$f(t) = b_{s-r,\,q-p-s+r}(t) \qquad (0 \le t \le 1),$$
the beta density with parameters s - r and q - p - s + r. No tabulated significance levels are required, other than those provided in standard F-tables, since
$$SP(t) = P\left(F_{2(s-r),\,2(q-p-s+r)} > \frac{q-p-s+r}{s-r}\cdot\frac{t}{1-t}\right).$$
For the important particular case of a single outlier, say an upper one, $x_{(n)}$, the test statistic
$$\frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}$$
has distribution
$$f(t) = (n-2)(1-t)^{n-3} \qquad (0 \le t \le 1),$$
and
$$SP(t) = (1-t)^{n-2}.$$
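For integer beta parameters the significance probability can be evaluated exactly without F-tables, using the identity $P(B > t) = P(K \le \alpha - 1)$ for $B \sim \text{Beta}(\alpha, \beta)$ and $K \sim \text{Binomial}(\alpha + \beta - 1, t)$. The Python sketch below (function names are hypothetical) computes $T_U$ from 1-based order-statistic indices, allowing $x_{(0)} = a$ and $x_{(n+1)} = b$ when the bounds are known, and returns SP(t) together with the equivalent F-ratio.

```python
from math import comb


def uniform_t_u(xs, p, r, s, q, a=None, b=None):
    """T_U = (x(s) - x(r)) / (x(q) - x(p)), 1-based indices with p <= r < s <= q.
    Index 0 stands for the known lower bound a, index n + 1 for the known upper bound b."""
    ordered = sorted(xs)
    n = len(ordered)
    if not (0 <= p <= r < s <= q <= n + 1):
        raise ValueError("need 0 <= p <= r < s <= q <= n + 1")
    if (p == 0 and a is None) or (q == n + 1 and b is None):
        raise ValueError("index 0 needs a known, index n + 1 needs b known")
    ext = [a] + ordered + [b]  # ext[i] is x(i)
    return (ext[s] - ext[r]) / (ext[q] - ext[p])


def uniform_sp(t, p, r, s, q):
    """SP(t) = P(T_U > t) with T_U ~ Beta(s - r, q - p - s + r) on the working hypothesis,
    evaluated as the binomial sum P(Binomial(alpha + beta - 1, t) <= alpha - 1)."""
    alpha, beta = s - r, q - p - s + r
    m = alpha + beta - 1
    return sum(comb(m, j) * t**j * (1 - t) ** (m - j) for j in range(alpha))


def equivalent_f(t, p, r, s, q):
    """The value of F on 2(s-r), 2(q-p-s+r) degrees of freedom whose upper tail equals SP(t)."""
    alpha, beta = s - r, q - p - s + r
    return (beta / alpha) * t / (1 - t)


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.35, 0.4, 0.9]  # hypothetical sample; test x(5) as a single upper outlier
    t = uniform_t_u(xs, p=1, r=4, s=5, q=5)
    print(t, uniform_sp(t, 1, 4, 5, 5), (1 - t) ** 3)  # the last two agree: SP = (1 - t)^(n-2)
```

The single-outlier case p = 1, r = n - 1, s = q = n gives Beta(1, n - 2), and the binomial sum then collapses to $(1-t)^{n-2}$, matching the formula above.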

Example: Maguire, Pearson, and Wynn (1952) give the following table (reproduced by permission of the Biometrika Trustees) of time intervals in days between successive compensable accidents for one shift in a section of a mine. The early interval 23 days is an outlier.

 3   1   0   2   4   5   1   2   2   1   3   0   4
23   0   1   3   2   0   1   0   0   8   2   3   8
 0   1   3   0   3   4   8   2   1   3   0   2
 0   0   0   0   0   0   1   0   2   5   4   2
 2   0   0   0   0   0   0   8   2   1   3   0
 3   0   0   0   2   2   2   0   0   2  12   0
 0   0   0   0   2   1   0   8   0   1  10  14
 2   1   0   3   4   1   2   8   0   1   4   1
                                        Total 222

There were 98 intervals (the table is read down the columns) and thus 99 accidents, occurring at times 0, 3, 26, 26, 26, 28, ..., 209, 210, 214, and 222 days. Assuming that the accidents occurred randomly at a constant average rate, these 99 times can be regarded as points in a Poisson process (or rather an equivalent lattice process, since their values are
rounded to the nearest integer, but this can be ignored in the analysis). On the working hypothesis, therefore, they constitute an ordered sample
$$x_{(1)} = 0,\; x_{(2)} = 3,\; x_{(3)} = 26,\; \dots,\; x_{(98)} = 214,\; x_{(99)} = 222,$$
from a uniform distribution with unknown bounds a < 0, b > 222. Let us test $x_{(1)}$ and $x_{(2)}$ as a lower outlier-pair, using the test statistic $T_U = [x_{(3)} - x_{(1)}]/[x_{(99)} - x_{(1)}]$. The observed value of $T_U$ is t = 26/222 = 0.1171, and the significance probability of this value is
$$SP(t) = P\left(F_{4,192} > \frac{96}{2}\cdot\frac{0.1171}{0.8829}\right) = P(F_{4,192} > 6.37),$$
which is much smaller than 0.001. We must therefore regard as discordant the outlier-pair $x_{(1)}, x_{(2)}$, i.e. the combined interval 3 + 23 days from the first to the third accident. This is a stronger result than the discordancy of the single interval 23 days which it en passant implies.
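The worked example can be reproduced from the table. In the Python sketch below, the intervals are transcribed from the table as printed here and read down the columns; the accompanying checks confirm that the cumulative times agree with those quoted in the text before t, the equivalent F-ratio and the exact significance probability from the Beta(2, 96) tail (here p = r = 1, s = 3, q = 99) are computed. The variable names are our own.

```python
from math import comb

# Intervals (days) laid out as in the table; the time series runs down each column.
ROWS = [
    [3, 1, 0, 2, 4, 5, 1, 2, 2, 1, 3, 0, 4],
    [23, 0, 1, 3, 2, 0, 1, 0, 0, 8, 2, 3, 8],
    [0, 1, 3, 0, 3, 4, 8, 2, 1, 3, 0, 2],
    [0, 0, 0, 0, 0, 0, 1, 0, 2, 5, 4, 2],
    [2, 0, 0, 0, 0, 0, 0, 8, 2, 1, 3, 0],
    [3, 0, 0, 0, 2, 2, 2, 0, 0, 2, 12, 0],
    [0, 0, 0, 0, 2, 1, 0, 8, 0, 1, 10, 14],
    [2, 1, 0, 3, 4, 1, 2, 8, 0, 1, 4, 1],
]
intervals = [row[c] for c in range(13) for row in ROWS if c < len(row)]

# Accident times, taking the first accident as time 0.
times = [0]
for d in intervals:
    times.append(times[-1] + d)


def beta_tail(t, alpha, beta):
    """P(Beta(alpha, beta) > t) for integer parameters, via a binomial sum."""
    m = alpha + beta - 1
    return sum(comb(m, j) * t**j * (1 - t) ** (m - j) for j in range(alpha))


# Lower outlier-pair x(1), x(2): T_U = (x(3) - x(1)) / (x(99) - x(1)),
# so alpha = s - r = 2 and beta = q - p - s + r = 96.
x = sorted(times)
t = (x[2] - x[0]) / (x[98] - x[0])
f_value = (96 / 2) * t / (1 - t)
sp = beta_tail(t, 2, 96)

if __name__ == "__main__":
    print(len(intervals), sum(intervals), round(t, 4), round(f_value, 2), sp)
```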

Discordancy tests for Poisson samples
Outlier situations in Poisson samples may arise in any of the numerous practical contexts giving rise to Poisson-distributed data, in particular where the data are counts of events occurring randomly in a given time, or counts of individuals scattered randomly over a given length, area, or volume.
Suppose, to fix ideas, that we wish to test for discordancy an upper outlier $x_{(n)}$ in a sample from a Poisson distribution whose mean μ is unknown. The argument used earlier (page 55) in the case of a gamma sample might suggest $x_{(n)}/\sum x_i$ as a possible statistic, being of the form N/D and invariant under change of scale. This statistic cannot be used as it stands, since its null distribution depends on μ. However, the null distribution of $x_{(n)}$ conditional on the observed value of $\sum x_i$ does not depend on μ, since the distribution of $x_1, \dots, x_n$ conditional on the observed value of $\sum x_i$ is multinomial with parameters $(\sum x_i, 1/n, \dots, 1/n)$. Using this fact, Doornbos (1966) has shown how to set up a discordancy test for (say) $x_{(n)}$, based on its null distribution conditional on $\sum x_i$. The table of significance levels, say at 5 per cent, will thus show a critical value of $x_{(n)}$ corresponding to each pair of entry values n, $\sum x_i$. (As always with discrete distributions, the significance probability attaching to each critical integer value will not be exactly 5 per cent.)
Discordancy tests for four outlier situations are given below, namely a single upper outlier, a single lower outlier, an upper outlier-pair, and a lower outlier-pair.
As an alternative to these specific tests we may, as already mentioned, use the fact that the transform $\sqrt{X + 1/2}$ of a Poisson random variable X with mean μ is approximately $N(\sqrt{\mu}, 1/4)$, providing that μ is not too small, say at least 4 or 5. Thus to test for discordancy any outlier or outliers in a Poisson sample of unknown but not too small mean, we can take the transformed
values $\sqrt{x_j + 1/2}$ (j = 1, ..., n) and apply an appropriate Nσ test with σ = 1/2 to the transformed sample.
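A sketch of this route for a single upper outlier follows, under stated assumptions of our own. The offset 1/2 inside the square root follows the transform as given in the text. As the "appropriate Nσ test" it uses the known-σ deviation $(y_{(n)} - \bar{y})/\sigma$ with σ = 1/2, whose significance probability is bounded above by the Bonferroni inequality $n\,\Phi(-t\sqrt{n/(n-1)})$, the k = 1 form of the Nσ3 inequality, valid by the same argument. The bound is conservative; exact critical values would come from the tables for the chosen test. The function name is hypothetical.

```python
import math
import statistics


def poisson_upper_outlier_sqrt(counts, offset=0.5):
    """Transform counts to sqrt(x + offset), approximately N(sqrt(mu), 1/4),
    then return T = (y(n) - ybar) / sigma with sigma = 1/2 and a Bonferroni
    upper bound n * Phi(-T * sqrt(n / (n - 1))) on its significance probability."""
    sigma = 0.5
    ys = [math.sqrt(x + offset) for x in counts]
    n = len(ys)
    t = (max(ys) - statistics.mean(ys)) / sigma
    z = -t * math.sqrt(n / (n - 1))
    bound = n * 0.5 * math.erfc(-z / math.sqrt(2.0))  # n * Phi(z)
    return t, min(1.0, bound)


if __name__ == "__main__":
    counts = [4, 6, 5, 7, 5, 30]  # hypothetical counts with one suspiciously large value
    print(poisson_upper_outlier_sqrt(counts))
```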

P1 Discordancy test for a single upper outlier $x_{(n)}$ in a Poisson sample of unknown mean
Test statistic: $T_{P1}$ = outlier $x_{(n)}$, tested conditionally on the observed sum of observations $\sum x_i$.
Inequality:
$$np_1 - \frac{n(n-1)}{2}\,p_1^2 < SP(t) < np_1.$$
Tabulated significance levels: Table XVIIa, page 316; freshly compiled.
Further tables: Doornbos (1966), Table I, gives nominal 5 per cent and 1 per cent critical values of $x_{(n)}$ for n = 2(1)10 and $\sum x_i$ = 2(1)25, together with the actual levels of significance attaching to each (necessarily discrete) entry. See also Section 5.3.3 and Table XXV.
Reference: Doornbos (1966).

P2 Discordancy test for a single lower outlier $x_{(1)}$ in a Poisson sample of unknown mean
Test statistic: $T_{P2}$ = outlier $x_{(1)}$, tested conditionally on the observed sum of observations $\sum x_i$.
Inequality:
$$np_2 - \frac{n(n-1)}{2}\,p_2^2 < SP(t) < np_2.$$
Tabulated significance levels: Table XVIIb, page 317; freshly compiled.
Reference: Doornbos (1966).
lower outlier-pair.
As an alternative to these specific tests we may, as already mentioned, use
P3 Discordancy test for an upper outlier-pair x<n-t)' x(n) in a Poisson sample
the fact that the transform .J (X+!) of a P oisso n random variable X with
of unknown mean
mean IL is approximately N(.J ILJ), providing that IL is not too small, say at
least 4 or 5. Thus to test for discordancy any outlier or outliers in a Poisson Test statistic: Tp3 = sum of outliers x<n- 1 ) + x(n)' tested conditionally on the
sample of unknown but not too small mean, we can take the transformed observed sum of observations L xi.
122 Outliers in statistical data

Inequality:
Tabulated significance levels: Table XVIIIa, page 318; freshly compiled.
Further tables: Doornbos (1966), Table III, gives nominal 5 per cent and 1 per cent critical values of $x_{(n-1)} + x_{(n)}$ for $n = 4(1)10$ and $\sum x_j = 5(1)25$.
Reference: Doornbos (1966).

P4 Discordancy test for a lower outlier-pair $x_{(1)}, x_{(2)}$ in a Poisson sample of unknown mean
Test statistic: $T_{P4}$ = sum of outliers $x_{(1)} + x_{(2)}$, tested conditionally on the observed sum of observations $\sum x_j$.
Inequality:
Tabulated significance levels: Table XVIIIb, page 319; freshly compiled.
Further tables: Doornbos (1966), Table IV, gives nominal 5 per cent and 1 per cent critical values of $x_{(1)} + x_{(2)}$ for the following cases: $n = 4$ and $n = 5$, $\sum x_j = 4(1)25$; $n = 6$, $\sum x_j = 14(1)25$; $n = 7$ and $n = 8$, $\sum x_j = 17(1)25$.
Reference: Doornbos (1966).

Discordancy tests for binomial samples
Suppose we have a sample $x_1, \ldots, x_n$ in which each observation $x_j$ is a value from a binomial distribution $B(m, p_j)$ with $m$ known, $p_j$ unknown. On the working hypothesis the $p_j$ are all equal, say to $p$ (unknown). Data of this kind could arise, for instance, in sampling inspection of mass-produced items, where successive samples each of $m$ items are inspected and $x_j$ is the number of defectives found on the $j$th occasion; or again in experiments to compare, say, the germination rates of seeds under different conditions, $m$ seeds being tested under each condition, with replication of the control.
The null distribution of a statistic such as $x_{(n)}/\sum x_j$ depends on the unknown $p$, but the null distribution of $x_1, \ldots, x_n$ conditional on the observed value of $\sum x_j$ does not depend on $p$, being multihypergeometric with parameters $(nm; \sum x_j; m, \ldots, m)$. Therefore, as with the Poisson samples discussed above (see page 120), discordancy tests for outliers are carried out conditionally on the observed value of $\sum x_j$. In principle, the three quantities $n$, $m$, and $\sum x_j$ are needed for entering any table of significance levels of (say) $x_{(n)}$.

Discordancy tests for outliers in univariate samples 123

Discordancy tests are given below for an upper outlier and a lower outlier; these are in effect the same test. For convenience, the table of significance levels that we give for this test is entered with $n$, $m$, and $m - x_{(n)}$, the tabulated quantity being $\sum x_j$.
Once again, transformation is available as an alternative to these specific discordancy tests. For a binomial random variable $X$ with parameters $m$, $p$ the transform $\sin^{-1}[(X/m)^{1/2}]$ is distributed approximately $N(\sin^{-1}(p^{1/2}), 1/(4m))$. Thus to test for discordancy any outlier or outliers in a binomial sample with known $m$ but unknown $p$, apply an appropriate N test with $\sigma = 1/(2m^{1/2})$ to the sample of transformed values $\sin^{-1}[(x_j/m)^{1/2}]$.

B1 Discordancy test for a single upper outlier $x_{(n)}$ in a binomial sample of unknown probability parameter
Test statistic: $T_{B1}$ = outlier $x_{(n)}$, tested conditionally on the observed sum of observations $\sum x_j$.
Inequality: $n p_1 - \frac{n(n-1)}{2} p_1^2 < SP(t) < n p_1$, where $p_1 = P(H(nm; \sum x_j, m) \ge t)$.
Tabulated significance levels: Table XIX, pp. 320-322; freshly compiled.
Reference: Doornbos (1966).
Example: See the example in Worksheet B2.

B2 Discordancy test for a single lower outlier $x_{(1)}$ in a binomial sample of unknown probability parameter
Test statistic: $T_{B2}$ = outlier $x_{(1)}$, tested conditionally on the observed sum of observations $\sum x_j$.
Inequality: $n p_2 - \frac{n(n-1)}{2} p_2^2 < SP(t) < n p_2$, where $p_2 = P(H(nm; \sum x_j, m) \le t)$.
Tabulated significance levels: Table XIX, pp. 320-322. This is the table for test B1 (upper outlier); to use it for test B2, replace the given binomial sample $x_1, \ldots, x_n$ with lower outlier $x_{(1)}$ by the complementary binomial sample $y_1 = m - x_1, \ldots, y_n = m - x_n$, and apply test B1 to the upper outlier $y_{(n)} = m - x_{(1)}$.
Reference: Doornbos (1966).
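The transformation route described above for Poisson and binomial samples can be sketched in a few lines. This is a minimal Python illustration under stated assumptions. The function names are hypothetical. Taking the largest standardized deviation from the mean with known $\sigma$ as the "appropriate N test" statistic is our own choice. Critical values would still have to come from the tables for normal samples with known variance. The last function checks numerically that $\sqrt{X + \tfrac{1}{2}}$ has variance close to $\tfrac{1}{4}$ for a moderate Poisson mean.

```python
import math


def poisson_transform(xs):
    """sqrt(x + 1/2): approximately N(sqrt(mu), 1/4) when mu is at least 4 or 5."""
    return [math.sqrt(x + 0.5) for x in xs]


def binomial_transform(xs, m):
    """arcsin(sqrt(x/m)): approximately N(arcsin(sqrt(p)), 1/(4m))."""
    return [math.asin(math.sqrt(x / m)) for x in xs]


def standardized_extremes(zs, sigma):
    """Return ((mean - min)/sigma, (max - mean)/sigma) for a known sigma.

    Candidate statistics for a lower and an upper outlier in the transformed
    sample; the choice of statistic is an assumption, not taken from the text.
    """
    zbar = sum(zs) / len(zs)
    return (zbar - min(zs)) / sigma, (max(zs) - zbar) / sigma


def poisson_transform_variance(mu, kmax=200):
    """Variance of sqrt(X + 1/2) for X ~ Poisson(mu), by summing the pmf.

    Intended for moderate mu, where exp(-mu) does not underflow.
    """
    pmf = math.exp(-mu)                  # P(X = 0)
    mean_y = mean_y2 = 0.0
    for k in range(kmax + 1):
        mean_y += pmf * math.sqrt(k + 0.5)
        mean_y2 += pmf * (k + 0.5)       # E[Y^2] = E[X + 1/2]
        pmf *= mu / (k + 1)              # P(X = k+1) = P(X = k) * mu / (k+1)
    return mean_y2 - mean_y ** 2


# Use sigma = 1/2 for Poisson data and sigma = 1/(2*sqrt(m)) for binomial data.
print(round(poisson_transform_variance(10.0), 3))
```

For $\mu = 10$ the printed variance should be close to the nominal $\tfrac{1}{4}$. The approximation deteriorates for small $\mu$ or small $m$, which is why the specific conditional tests are given as well.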
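The inequalities in worksheets P1 and B1 can be evaluated directly, which gives a rough check on tabulated levels. The minimal Python sketch below is not Doornbos's tabulation procedure, and its function names are hypothetical. For P1 it assumes that $p_1$ is the binomial tail $P(\mathrm{Bin}(\sum x_j, 1/n) \ge t)$. This follows from the multinomial conditional distribution quoted earlier, since conditional on the sum each Poisson observation is binomial with index $\sum x_j$ and probability $1/n$. For B1 it uses the hypergeometric tail $P(H(nm; \sum x_j, m) \ge t)$ as stated in the worksheet. The usage lines apply it to the numbers of failures in the batch example that follows.

```python
from math import comb


def bonferroni_bounds(p1: float, n: int) -> tuple[float, float]:
    """Bounds n*p1 - n(n-1)/2 * p1**2 < SP(t) < n*p1 from the worksheets."""
    upper = n * p1
    lower = upper - n * (n - 1) / 2 * p1 ** 2
    return lower, upper


def poisson_tail(t: int, n: int, total: int) -> float:
    """P(X >= t) for X ~ Binomial(total, 1/n): one Poisson observation given the sum."""
    q = 1.0 / n
    return sum(comb(total, k) * q ** k * (1 - q) ** (total - k)
               for k in range(t, total + 1))


def hypergeometric_tail(t: int, n: int, m: int, total: int) -> float:
    """P(H(nm; total, m) >= t): m draws without replacement from nm items, `total` marked.

    math.comb returns 0 when k > total, so impossible terms vanish automatically.
    """
    pop = n * m
    favourable = sum(comb(total, k) * comb(pop - total, m - k)
                     for k in range(t, m + 1))
    return favourable / comb(pop, m)


# Batch example: numbers of failures y_j out of m = 5 specimens in n = 10 batches.
y = [0, 1, 1, 0, 1, 4, 1, 0, 2, 1]
n, m = len(y), 5
p1 = hypergeometric_tail(max(y), n, m, sum(y))
lower, upper = bonferroni_bounds(p1, n)
print(f"p1 = {p1:.5f}; {lower:.4f} < SP(t) < {upper:.4f}")
# prints: p1 = 0.00629; 0.0611 < SP(t) < 0.0629
```

With $\sum y_j = 11$ the significance probability lies between about 0.061 and 0.063, just above 5 per cent. Repeating the calculation with $\sum y_j = 10$ gives an upper bound below 0.05, and with $\sum y_j = 7$ an upper bound below 0.01, in line with the Table XIX entries quoted in the example.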
Example: Suppose that, in the inspection of quality of a manufactured item, five specimens are selected randomly from each of ten batches and tested to destruction, with the following results:

Batch                               A  B  C  D  E  F  G  H  I  J
Number of good items out of five    5  4  4  5  4  1  4  5  3  4

Can one take it as obvious that batch F is out of line with the others?
On the basic model, we have a sample of ten from a distribution $B(m, p)$ where $m = 5$ and $p$ is unknown. We wish to test the lower outlier $x_6 = x_{(1)} = 1$ for discordancy. To convert into an upper outlier test, we consider instead the numbers of failures $y_1, \ldots, y_{10} = 0, 1, 1, 0, 1, 4, 1, 0, 2, 1$. The upper outlier is $y_{(10)} = 4 = m - 1$. Entering Table XIX with $n = 10$ and $m = 5$, we see that this outlier would be significant at 1 per cent if $\sum y_j$ were 7 or less, and would be significant at 5 per cent if $\sum y_j$ were 10 or less. In fact $\sum y_j = 11$, so the evidence for regarding batch F as discordant is weak.

Discordancy tests for truncated exponential samples
If the exponential distribution with density $(1/\lambda)\exp(-x/\lambda)$ $(x > 0)$ is truncated at $x = a$ we get the distribution with density $(1/\lambda)[1 - \exp(-a/\lambda)]^{-1}\exp(-x/\lambda)$ $(0 < x < a)$, $0$ $(x > a)$. Such a distribution might arise, for example, in life testing data where for practical reasons the test is not allowed to continue for longer than some preassigned duration $a$. Upper outlier problems are perhaps not particularly likely to be found with such data, in view of the truncation. Two Dixon-type discordancy tests are available in the literature (Wani and Kabe, 1971), and we give details of these below.

TE1 Discordancy test for a single upper outlier $x_{(n)}$ in a truncated exponential sample
Test statistic:
$$T_{TE1} = \frac{\text{excess}}{\text{range}} = \frac{x_{(n)} - x_{(n-1)}}{x_{(n)} - x_{(1)}}.$$
Test distribution:
$$f_n(t) = \frac{(n-1)!}{(1 - e^{-a})^{n}} \sum_{j=2}^{n-1} \frac{\cdots}{(j-2)!\,(n-j)!}\left\{\frac{1}{u_j^{2}} \cdots\right\},$$
where $u_j = n - j + 1 - (n-j)t$, $(0 \le t \le 1)$.
Reference: Wani and Kabe (1971).

TE2 Discordancy test for a single lower outlier $x_{(1)}$ in a truncated exponential sample
Test statistic:
$$T_{TE2} = \frac{\text{excess}}{\text{range}} = \frac{x_{(2)} - x_{(1)}}{x_{(n)} - x_{(1)}}.$$
Test distribution:
$$f_n(t) = \frac{(n-1)!}{(1 - e^{-a})^{n}} \sum_{j=3}^{n} \frac{\cdots}{(j-3)!\,(n-j)!}\left\{\frac{1}{u_j^{2}} \cdots\right\},$$
where $u_j = n - j + 1 + (j-2)t$, $(0 \le t \le 1)$.
Reference: Wani and Kabe (1971).
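Because the test distributions above are cumbersome to evaluate by hand, a significance probability can also be approximated by simulation. This is a swapped-in Monte Carlo technique, not the exact method of Wani and Kabe (1971). The minimal Python sketch below assumes $\lambda = 1$, so that $a$ is measured in units of $\lambda$. Both statistics are ratios of differences of order statistics and hence scale-invariant, so their null distributions depend on $a$ and $\lambda$ only through $a/\lambda$. The function names are hypothetical.

```python
import math
import random


def dixon_statistics(xs):
    """Return (T_TE1, T_TE2): excess/range at the top and at the bottom of the sample."""
    s = sorted(xs)
    spread = s[-1] - s[0]
    if spread <= 0:
        raise ValueError("sample range must be positive")
    t_te1 = (s[-1] - s[-2]) / spread   # (x(n) - x(n-1)) / (x(n) - x(1))
    t_te2 = (s[1] - s[0]) / spread     # (x(2) - x(1)) / (x(n) - x(1))
    return t_te1, t_te2


def truncated_exponential_sample(n, a, rnd):
    """Draw n values from the unit-mean exponential truncated at a (inverse-CDF method)."""
    c = 1.0 - math.exp(-a)             # untruncated probability of falling below a
    # Inverting F(x) = (1 - exp(-x)) / c on (0, a) gives x = -log(1 - c*u).
    return [-math.log(1.0 - c * rnd.random()) for _ in range(n)]


def monte_carlo_sp(t_obs, n, a, which=0, reps=20000, seed=1):
    """Estimate SP(t) = P(T >= t_obs) under the basic model.

    which=0 selects T_TE1 (upper outlier); which=1 selects T_TE2 (lower outlier).
    """
    rnd = random.Random(seed)
    hits = 0
    for _ in range(reps):
        stat = dixon_statistics(truncated_exponential_sample(n, a, rnd))[which]
        if stat >= t_obs:
            hits += 1
    return hits / reps


print(monte_carlo_sp(0.5, 3, 50.0))
```

As a rough check: when $a$ is very large the truncation has almost no effect. For $n = 3$, the normalized spacings of an ordinary exponential sample then give $P(T_{TE1} \ge \tfrac{1}{2}) = 2/3$, so the call above should return a value close to 0.667. The simulation error shrinks like $1/\sqrt{\text{reps}}$.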



CHAPTER 4

Accommodation of Outliers in Univariate Samples: Robust Estimation and Testing

We have outlined in Chapter 2 the need for accommodation of outliers by means of estimation or testing procedures which are robust against the presence of outliers or, in the phrase of Anscombe and Barron (1966), 'desensitized to outliers'. 'What is a robust procedure?' Huber asks, in his 1972 Wald Lecture Robust Statistics: A Review, and goes on to say:

one never has a very accurate knowledge of the true underlying distribution; ... the performance of some of the classical tests or estimates is very unstable under small changes of the underlying distribution; ... some alternative tests or estimates ... lose very little efficiency for an exactly normal law, but show a much better and more stable performance under deviations from it.
While for years one had been concerned mostly with what was later called 'robustness of validity' (that the actual confidence levels should be close to, or at least on the safe side of the nominal levels), one realized now that 'robustness of performance' (stability of power, or of the length of confidence intervals) was at least as important ....
From the beginning, 'robustness' has been a rather vague concept; ... if one wants to choose in a rational fashion between different robust competitors to a classical procedure, one has to make precise the goals one wants to achieve. (Huber, 1972)

From our point of view, this implies that the procedures need to perform satisfactorily under alternative models of the kinds which generate outliers. Our attention is accordingly focused on alternative models of the categories discussed in Chapter 2, and specifically the mixture, slippage, and exchangeable models rather than the inherent type of alternative. Inherent alternatives, such as a Cauchy distribution for data normally distributed on the basic model, are covered by more general (non-outlier-specific) robustness procedures. (What is meant by 'satisfactory' performance is discussed in some detail in Section 4.1.)
In this chapter, we restrict attention to a univariate sample of $n$ observations $x_1, \ldots, x_n$, all of which (on the basic model) belong to a distribution $F$.

126

Accommodation of outliers in univariate samples 127

Many of the detailed results published so far relate to normal or exponential $F$. In our general discussion (Section 4.1) of how to construct robust procedures and evaluate their performance, we shall frequently find it convenient to illustrate ideas on the assumption that $F$ is normal; our basic results will apply, mutatis mutandis, to other distributions. In Section 4.2 we review robust procedures where there is no specific declaration of the distributional form of $F$. In Section 4.3 we will consider in some detail the particular case where $F$ is normal.
Most of the existing work based on the exchangeable type of alternative model relates to exponential samples. This work is often further particularized by the assumption that there is just one discordant observation. Available results for exponential samples will be discussed in detail in Section 4.4.
In the context of slippage alternatives, robust procedures have been discussed for normal samples by a number of writers, including Anscombe (1960a), Tiao and Guttman (1967), Veale and Huntsberger (1969), Guttman and Smith (1969, 1971), Guttman (1973a), and also, in the particular case of samples of size 3, Anscombe and Barron (1966) and Willke (1966). In all these papers (not only the ones dealing with samples of size 3!) the number, $k$, of discordant observations in a sample is assumed to be either one or two.
Undoubtedly the type of alternative model most commonly envisaged, at any rate for normal samples, is a mixture model. Tukey (1960), in a seminal paper 'A survey of sampling from contaminated distributions', discusses robust estimation for samples where the basic model is normal,

$$H: F,$$

and the alternative is a mixture of two normals,

$$\bar{H}: (1 - \lambda)F + \lambda G \quad (0 < \lambda < 1),$$

the normal distribution $G$ having either the same mean as $F$ but a larger variance, or the same variance but a shifted mean. Tukey calls the mixture $(1 - \lambda)F + \lambda G$ a contaminated distribution, the basic distribution $F$ being contaminated by the distribution $G$; the parameter $\lambda$, commonly a quite small fraction, is the amount of contamination, or contamination fraction, or just the contamination. Sample observations which come from $G$ are contaminants; under $\bar{H}$, the number of contaminants in a sample of $n$ observations will be a binomial random variable with parameters $n$, $\lambda$. We have already used the term 'contaminant' in Sections 3.2 and 3.3 in the context of a slippage alternative. Slippage models can of course be regarded as contamination models in which the number of contaminants in a sample is fixed. Accordingly, some writers refer to the mixture model as one of 'random contamination' and the slippage model as one with a fixed number of contaminants. However, we will use Tukey's terminology, which has become
fairly widely adopted; 'contamination' will be understood to be in the mixture-model sense, unless a slippage model is explicitly specified.
In a contamination model, the contaminating distribution $G$ need not necessarily be normal even if $F$ is normal. It may be symmetric about the mean of the basic normal distribution $F$, or it may not. We distinguish the cases of symmetric contamination and asymmetric contamination in the following manner for the case of a symmetric basic distribution $F$. Contamination is symmetric if the contaminating distribution is symmetric about the centre of the distribution $F$. Thus if $F$ is normal, $G$ could be normal with the same mean as $F$ but with a greater variance; or, more generally, it could be of arbitrary symmetric form centred at the mean of $F$. With asymmetric contamination, $G$ may be symmetric about some value different from the centre of $F$, or it may be asymmetric. For example, again when $F$ is normal, $G$ may be normal (or non-normal symmetric) with a different mean from that of $F$, or it may be of arbitrary asymmetric form. We shall be mainly concerned with symmetric contamination of a basic symmetric distribution $F$ mixed with only one contaminating distribution $G$. (One could envisage $F$ being mixed with a set of contaminating distributions, e.g.

$$\bar{H}: (1 - \lambda_1 - \lambda_2)F + \lambda_1 G_1 + \lambda_2 G_2$$

or, still more generally,

$$\bar{H}: \int G(\lambda, x)\, dK(\lambda)$$

where $F$ is a particular distribution, say $F(x) = G(\lambda_0, x)$, from a one-parameter family $G(\lambda, x)$, and $K$ is a mixing measure. For example, $F$ might be $N(\mu, \sigma^2)$ and $G(\lambda, x)$ might be $N(\mu, \lambda\sigma^2)$ with $\lambda$ exponentially distributed and $\lambda_0 = 1$; the mixture specified by $\bar{H}$ is in this case a double-exponential distribution, illustrating that with such a mixture situation we have really moved over to an inherent alternative.)
If the basic distribution $F$ is $N(\mu, \sigma^2)$, inferences about $\mu$ may assume $\sigma^2$ known or unknown; likewise, inferences about $\sigma^2$ may assume $\mu$ known or unknown. It might be thought that the cases $\sigma^2$ known or $\mu$ known would be only of academic interest, but this is not so; examples of practical situations involving knowledge of $\sigma^2$ or of $\mu$ do arise (cf. Sections 3.4.3 and 3.4.4).
What is the effect of contamination? It may be considerable, even for very small values of $\lambda$. Suppose, for example, that $F: N(\mu, \sigma^2)$ is contaminated in the ratio $1 - \lambda : \lambda$ by another normal distribution $G: N(\mu, b\sigma^2)$ $(b > 1)$ having $b$ times its variance. In a sample of size $n$ from the mixture, there will be $R = 0, 1, \ldots$ contaminants, the random variable $R$ having a binomial distribution with parameters $n$, $\lambda$. The sample mean $\bar{x}$ will be an unbiased estimator of $\mu$, with sampling variance

$$\operatorname{var}(\bar{x}) = E\left\{\frac{\sigma^2}{n^2}[(n - R) + Rb]\right\} = \frac{\sigma^2}{n}[1 + (b - 1)\lambda]. \qquad (4.0.1)$$

The contamination has caused the sampling variance to increase, relative to $\sigma^2/n$, by a factor $1 + (b - 1)\lambda$. For $\lambda = 0.05$ and $b = 9$, i.e. for 5 per cent contamination by a distribution $G$ with three times the standard deviation of $F$ (a not untoward situation), this factor is 1.4; for 10 per cent contamination by the same $G$ it is 1.8, a loss of efficiency of 44 per cent.
Now consider the effect of the contamination on the performance of the sample variance $s^2$ as an estimator of $\sigma^2$. A straightforward calculation gives, conditional on $R$ contaminants,

$$E(s^2 \mid R) = \frac{\sigma^2}{n}[(n - R) + Rb] \quad \text{as in (4.0.1)},$$

$$E(s^4 \mid R) = \frac{3\sigma^4}{n^2}(n - R + Rb^2) + \frac{(n^2 - 2n + 3)\sigma^4}{n^2(n-1)^2}[(n - R)(n - R - 1) + 2(n - R)Rb + R(R - 1)b^2].$$

Averaging over the binomial distribution of $R$ then gives

$$\operatorname{var}(s^2) = \frac{2\sigma^4}{n-1}\left[\frac{3d - c^2}{2} - \frac{3(d - c^2)}{2n}\right], \qquad c = 1 + (b - 1)\lambda, \quad d = 1 + (b^2 - 1)\lambda. \qquad (4.0.2)$$

The contamination has caused the sampling variance of $s^2$ to increase, relative to the sampling variance $2\sigma^4/(n - 1)$ under the basic model, by the factor in square brackets in (4.0.2). For $\lambda = 0.05$ and $b = 9$, the value of this factor is $6.52 - (4.56/n)$. Even with $\lambda$ as small as 0.01 and $b = 9$, the factor is $2.12 - (0.95/n)$; with a sample size as small as 7, a mere 1 per cent contamination by $N(\mu, 9\sigma^2)$ causes a loss of efficiency of 50 per cent in using the sample variance to estimate $\sigma^2$!
This striking effect must be due to the incidence, even if infrequent, of extreme values from the contaminating distribution. One might think that contaminants which have so pronounced an effect on the efficiency of estimation would show up unmistakably as outliers and could be rejected on the basis of some discordancy test. This is not so. To entertain such a hope with the sample of size 7 discussed above would be vain; that it would be equally so with large samples has been demonstrated cogently by Tukey (1960). He considers by way of example a sample of 1000 observations from a contaminated distribution $(1 - \lambda)F + \lambda G$, where $F$ is $N(\mu, \sigma^2)$, $G$ is $N(\mu, 9\sigma^2)$, and $\lambda = 0.01$ (the 1 per cent contamination of our earlier example). Some typical percentiles of the two distributions are as shown in Table 4.1. Thus the two cumulative distributions are indistinguishable for practical purposes for values of the variable between $\mu - 2\tfrac{1}{2}\sigma$ and $\mu + 2\tfrac{1}{2}\sigma$. Of the ten expected observations from $G$, only about 40 per cent (corresponding to
purposes for values of tbe variable between IL - 2!u an d IL + 2!u. Of e te n t?
(4.0.1) expected observations from G, only about 40 per ce n t (correspondmg to
130 Outliers in statistical data Accommodation of outliers in univariate samples 131

standardized deviates ±2.5/3 = ±0.833) will fall outside these bounds, and we may thus expect only two observations from the upper tail of the broader, rarer constituent [i.e. G], and another two from the lower tail. Beyond the same limits we will expect about six (in each tail) from the narrower constituent [i.e. F]. Unless one or both of the two are very extreme, the indication of non-normality will be very slight .... A sample of one thousand is likely to be of little help. (Tukey, 1960)

Table 4.1

Cumulative     Percentile     Percentile of
probability    of F           (1 − λ)F + λG
0.25           μ − 0.67σ      μ − 0.68σ
0.05           μ − 1.64σ      μ − 1.67σ
0.01           μ − 2.33σ      μ − 2.41σ
0.006          μ − 2.51σ      μ − 2.64σ

4.1 PERFORMANCE CRITERIA

4.1.1 Efficiency measures for estimators

Suppose we are estimating the location parameter μ of a symmetric population F from a sample of size n. Our choice of estimator might be unrestricted. Alternatively, we might confine ourselves to some restricted class of estimators, such as linear combinations a₁x₍₁₎ + ... + aₙx₍ₙ₎ of the sample order statistics. If the basic model holds good,

$$H: F,$$

an 'optimal' estimator of μ can be defined among the estimators of the class we are considering; we will denote this by μ̃. For example, μ̃ might be the maximum likelihood estimator μ̂. It might instead be the linear unbiased estimator of form Σ aᵢx₍ᵢ₎ with minimum variance (the 'best linear unbiased estimator', or BLUE). These estimators coincide for a normal distribution, but they will be quite different for, say, a Cauchy distribution. However we define it, μ̃ serves as a yardstick for what an estimator can achieve in relation to the basic model. To fix ideas, suppose F is N(μ, σ²); μ̃ then would typically be the sample mean x̄. We now propose some rival estimator, T, which shall be robust in relation to some outlier-generating model H̄.

Starting with the simplest situation, we take H̄ to be a simple and symmetric alternative. It provides merely for the observations to belong to a specified mixture distribution with known symmetric contamination. That is,

$$\bar{H}: (1 - \lambda)F + \lambda G \tag{4.1.1}$$

where typically G could be N(μ, bσ²), b > 1, and λ, b are known. In view of the symmetry of the model we can reasonably assume E(T | H̄) = μ, and the question of bias in T does not arise.

If var(μ̃ | H̄) did not appreciably exceed var(μ̃ | H), μ̃ would itself be robust, and there would be no need to seek a rival estimator T. The ratio

$$\frac{\operatorname{var}(\tilde{\mu} \mid \bar{H})}{\operatorname{var}(\tilde{\mu} \mid H)}$$

provides a quantitative indication of the need for a robust estimator alternative to μ̃. We can assume that this ratio exceeds unity substantially, as in the example discussed under equation (4.0.2).

For robustness of T we require that var(T | H̄) shall be substantially less than var(μ̃ | H̄). Also, of course, T must be a reasonable alternative to the optimal μ̃ when the data obey the basic model F. That is, we require that var(T | H) shall not be 'unduly' greater than var(μ̃ | H). So var(T | H̄)/var(μ̃ | H̄) is to be 'small', and var(T | H)/var(μ̃ | H) 'not much greater than unity'. Each of these ratios is, of course, a relative efficiency measure, and discussions of performance are often phrased in terms of efficiency. An alternative terminology is that of protection and premium. These concepts, equivalent to the above relative efficiency measures, were introduced by Anscombe (1960a) in a classical paper. We quote the passage in which he introduces these terms as part of his exposition of a basic philosophy in the matter of accommodating outliers:

Rejection rules are not significance tests .... when a chemist doing routine analyses, or a surveyor making a triangulation, makes routine use of a rejection rule, he is not studying whether spurious readings occur (he may already be convinced they do sometimes), but guarding himself from their adverse effect ....

A rejection rule is like a householder's fire insurance policy. Three questions to be considered in choosing a policy are
(1) What is the premium?
(2) How much protection does the policy give in the event of fire?
(3) How much danger really is there of a fire?
Item (3) corresponds to the study of whether spurious readings occur in fact ... The householder, satisfied that fires do occur, does not bother much about (3), provided the premium seems moderate and the protection good. (Anscombe, 1960a)

The 'fire' here is the occurrence of outliers. The discussion is couched in terms of rejection rules, but applies to accommodation procedures in general. Anscombe goes on to ask:

In what currency can we express the premium charged and the protection afforded by a rejection rule? ... variance will be considered here, although in principle any other measure of expected loss could be used. The premium payable may then be taken to be the percentage increase in the variance of estimation errors due to using the rejection rule, when in fact all the observations come from a homogeneous normal source; the protection given is the reduction in variance (or mean squared error) when spurious readings are present. (Anscombe, 1960a)
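
The variance inflation of s² under the mixture of N(μ, σ²) and N(μ, bσ²) discussed above can be checked numerically. The following is a minimal Python sketch under those same assumptions, taking σ² = 1 because the ratio does not depend on σ. It averages the conditional moments E(s² | R) and E(s⁴ | R) over R ~ Bin(n, λ). It then compares the resulting ratio var(s² | H̄)/var(s² | H) with the closed form 1 + ½λ(b − 1)(3b + 1) − ½λ²(b − 1)² − 3λ(1 − λ)(b − 1)²/(2n), the M(λ, b) of (4.1.8). The function names are our own and purely illustrative.

```python
from math import comb


def cond_moments(n, r, b):
    """E(s^2 | R = r) and E(s^4 | R = r), with sigma^2 = 1, for n normal
    observations sharing one mean: n - r of variance 1 and r of variance b."""
    e_s2 = ((n - r) + r * b) / n
    cross = (n - r) * (n - r - 1) + 2 * (n - r) * r * b + r * (r - 1) * b ** 2
    e_s4 = (3 * (n - r + r * b ** 2) / n ** 2
            + (n ** 2 - 2 * n + 3) * cross / (n ** 2 * (n - 1) ** 2))
    return e_s2, e_s4


def ratio_by_averaging(n, lam, b):
    """var(s^2) under (1 - lam)F + lam G divided by its basic-model value
    2/(n - 1), obtained by averaging the conditional moments over R ~ Bin(n, lam)."""
    mean_s2 = mean_s4 = 0.0
    for r in range(n + 1):
        p = comb(n, r) * lam ** r * (1 - lam) ** (n - r)
        e_s2, e_s4 = cond_moments(n, r, b)
        mean_s2 += p * e_s2
        mean_s4 += p * e_s4
    return (mean_s4 - mean_s2 ** 2) * (n - 1) / 2


def ratio_closed_form(n, lam, b):
    """Closed-form factor M(lam, b) of (4.1.8); pass n = float('inf') for the limit."""
    return (1 + 0.5 * lam * (b - 1) * (3 * b + 1)
            - 0.5 * lam ** 2 * (b - 1) ** 2
            - 3 * lam * (1 - lam) * (b - 1) ** 2 / (2 * n))


for lam in (0.05, 0.01):
    limit = ratio_closed_form(float("inf"), lam, 9)
    coef = limit - ratio_closed_form(1, lam, 9)  # coefficient of 1/n
    print(f"lambda = {lam}: M = {limit:.2f} - {coef:.2f}/n")

m7 = ratio_by_averaging(7, 0.01, 9)
print(f"n = 7, lambda = 0.01, b = 9: M = {m7:.3f}, efficiency of s^2 = {1 / m7:.2f}")
```

Run as written, it should print the factors 6.52 − 4.56/n and 2.12 − 0.95/n. It should also give an efficiency of about 0.50 for s² at n = 7, which is the 50 per cent loss described in the text. The averaging route and the closed form agree for every n, which is a useful check on the conditional moment formulas.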
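
Table 4.1 and Tukey's tail counts can likewise be recomputed. This sketch again uses our own, hypothetical function names. It works in standardized units z = (x − μ)/σ with λ = 0.01 and b = 9, and uses only Python's standard-library statistics.NormalDist. The mixture percentiles are found by bisection on the mixture distribution function; bisection is simply our choice of root-finder, not a method taken from the text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the basic distribution F in standardized units, N(0, 1)


def mixture_cdf(z, lam=0.01, b=9.0):
    """CDF at z of the standardized mixture (1 - lam) N(0, 1) + lam N(0, b)."""
    return (1 - lam) * STD_NORMAL.cdf(z) + lam * STD_NORMAL.cdf(z / b ** 0.5)


def mixture_quantile(p, lam=0.01, b=9.0, lo=-20.0, hi=20.0):
    """Invert the continuous, increasing mixture CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if mixture_cdf(mid, lam, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("prob    F        mixture")
for p in (0.25, 0.05, 0.01, 0.006):
    print(f"{p:<7} {STD_NORMAL.inv_cdf(p):+.2f}    {mixture_quantile(p):+.2f}")

# Tukey's example: n = 1000, lam = 0.01; expected counts beyond mu +/- 2.5 sigma, per tail
n, lam, b = 1000, 0.01, 9.0
from_g = n * lam * STD_NORMAL.cdf(-2.5 / b ** 0.5)
from_f = n * (1 - lam) * STD_NORMAL.cdf(-2.5)
print(f"per tail: about {from_g:.1f} from G and {from_f:.1f} from F")
```

The printed percentiles should agree with Table 4.1 to two decimals. The tail counts should come out at about 2 from G and 6 from F in each tail, as in Tukey's description.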

Clearly, in the case we have considered so far,

$$\text{premium} = \frac{\operatorname{var}(T \mid H) - \operatorname{var}(\tilde{\mu} \mid H)}{\operatorname{var}(\tilde{\mu} \mid H)} \tag{4.1.2}$$

and

$$\text{protection} = \frac{\operatorname{var}(\tilde{\mu} \mid \bar{H}) - \operatorname{var}(T \mid \bar{H})}{\operatorname{var}(\tilde{\mu} \mid \bar{H})} \tag{4.1.3}$$

Robustness may be measured in terms of variance ratios or relative efficiencies on the one hand, or premium and protection on the other; the choice is a matter of taste. Papers which make explicit use of premium and protection include Anscombe and Barron (1966), Tiao and Guttman (1967), Guttman and Smith (1969, 1971), Guttman (1973a), and Desu, Gehan, and Severo (1974).

From the simple situation just discussed, we may generalize in several directions.

(i) Asymmetric contamination

If F is N(μ, σ²) and G is, say, N(μ + a, σ²), E(T | H̄) will not in general be equal to μ, and we have to take account of bias in our estimators. In this situation,

a natural criterion for judging them is their mean squared error, a measure which takes into account both their inherent variability and their distance from the estimand. (Jaeckel, 1971a)

This leads us to replace var(μ̃ | H̄) and var(T | H̄) in our discussion by the mean squared error values

$$\operatorname{MSE}(\tilde{\mu} \mid \bar{H}) = E[(\tilde{\mu} - \mu)^2 \mid \bar{H}]; \qquad \operatorname{MSE}(T \mid \bar{H}) = E[(T - \mu)^2 \mid \bar{H}].$$

There is also the question of how the bias affects performance. We have

$$E[(T - \mu)^2 \mid \bar{H}] = \operatorname{var}(T \mid \bar{H})\,(1 + c^2)$$

where c is the ratio of the bias to the standard deviation of the estimator.

Consider, for example, the performance of the sample mean x̄ in estimating the mean μ of F: N(μ, σ²) when there is contamination of amount λ by G: N(μ + a, σ²). With R contaminants among the n observations, we have

$$E(\bar{x} \mid R) = \mu + (a/n)R,$$

so that

$$E(\bar{x}) = \mu + \lambda a \tag{4.1.4}$$

and

$$\operatorname{var}(\bar{x}) = \frac{\sigma^2}{n} + \frac{\lambda(1 - \lambda)a^2}{n} \tag{4.1.5}$$

Thus the estimator has bias λa, and mean squared error

$$\operatorname{MSE}(\bar{x}) = \frac{\sigma^2}{n} + a^2\left[\lambda^2 + \frac{\lambda(1 - \lambda)}{n}\right] \tag{4.1.6}$$

It follows from (4.1.4), (4.1.5) that if we want the bias of x̄ to be less than, say, one-half the standard deviation, the sample size must satisfy

$$n < \frac{\sigma^2}{4\lambda^2 a^2} + \frac{1 - \lambda}{4\lambda} \tag{4.1.7}$$

For example, with 1 per cent contamination and a equal to 5σ, the sample size must be less than 125 (the bound in (4.1.7) is 100 + 24.75 = 124.75). For further discussion see Huber (1964, p. 83) and Jaeckel (1971a).

(ii) Compound alternative H̄

Suppose that in (4.1.1) we regard λ as a parameter with a range of possible values (say 0 < λ ≤ λ₁) rather than a single known quantity. Our alternative model is then a family of mixture distributions, indexed by λ. Consider any performance measure M, for example the protection measure defined in (4.1.3)

$$M = [\operatorname{var}(\tilde{\mu} \mid \bar{H}) - \operatorname{var}(T \mid \bar{H})]/\operatorname{var}(\tilde{\mu} \mid \bar{H}).$$

M will now be a function of λ, and we can write it as M(λ). (Similarly, M could be a function of b, or of the two parameters λ and b.) If our estimator T is to possess robustness in relation to H̄, it must perform 'satisfactorily' for every distribution which can arise under H̄. The value of M(λ) must then be 'satisfactory' for all values of λ between zero and λ₁. For a single measure of performance this naturally suggests the worst possible performance under H̄. In other words, we take the extreme value of M(λ) (minimum or maximum as appropriate) over the range of possible values of λ. This measure could, for instance, take the form of the maximum variance of the estimator under H̄, or on the other hand the minimum protection value, as λ varies between zero and λ₁. Correspondingly, in the two-parameter case we would use the extreme value of M(λ, b).

(iii) Estimation of scale and other parameters

Robust estimation may of course be required for parameters other than location parameters. These include scale parameters (as with exponential distributions), dispersion or scale parameters (as with normal distributions), and shape parameters (as with Pareto distributions). Here we use the term 'scale parameter' for a dispersion parameter expressed in the same dimensions as the random variable, e.g. standard deviation (as opposed to variance). Our discussion of performance criteria has focused on the robust estimation of a location parameter, but obviously applies in essentials to any other kind of parameter. A scale parameter for a random variable X can in any case be regarded
as a location parameter for an appropriately transformed random variable Y: for example Y = |X − θ|, where θ is some location parameter for X.

(iv) Asymptotic measures

The variances, relative efficiencies and mean squared errors we have used so far in our discussion have been actual (i.e. finite-sample) values. For example, consider the variance ratio var(s² | H̄)/var(s² | H), which indicates the non-robustness of the normal sample variance s² as an estimator of σ². It was shown in (4.0.2) to have the value

$$M(\lambda, b) = 1 + \tfrac{1}{2}\lambda(b - 1)(3b + 1) - \tfrac{1}{2}\lambda^2(b - 1)^2 - \frac{3\lambda(1 - \lambda)(b - 1)^2}{2n} \tag{4.1.8}$$

for the contamination model (1 − λ)N(μ, σ²) + λN(μ, bσ²). For the typical numerical values λ = 0.05, b = 9, this gave

$$M(0.05, 9) = 6.52 - \frac{4.56}{n} \tag{4.1.9}$$

Unless n is small, this differs little in value from its limit as n → ∞, i.e. 6.52. We can, if we wish, use this limiting value (the asymptotic variance ratio) as our measure, rather than the finite-sample variance ratio (4.1.9). This choice between finite-sample and asymptotic values is obviously available for any measure involving variances, or efficiencies based on variance, or their premium and protection equivalents. With certain adjustments (see below), it is also available for measures based on mean squared errors.

As regards nomenclature, var T will in general be of the form (A/n) × (1 + O(n⁻¹)), so that n var T tends to a finite limit A as n → ∞. It is this limit which is conventionally called the asymptotic variance. The asymptotic mean squared error and asymptotic bias are defined similarly (see below).

Huber (1964) argues in favour of using asymptotic measures:

Since ill effects from contamination are mainly felt for large sample sizes, it seems that one should primarily optimize large sample robustness properties. . . . the asymptotic variance is not only easier to handle, but ... even for moderate values of n it is a better measure of performance than the actual variance, because (i) the actual variance of an estimator depends very much on the behaviour of the tails of H [G in our notation] . . . . (ii) If an estimator is asymptotically normal, then the important central part of its distribution and confidence intervals for moderate confidence levels can better be approximated in terms of the asymptotic variance than in terms of the actual variance. (Huber, 1964)

On these grounds he adopts the maximum asymptotic variance (over the family of alternative distributions in a compound H̄) as a measure of performance. This measure has been widely used in the construction of robust estimators with optimal properties. The minimax robust estimator (Huber, 1964) is that estimator (perhaps restricted to be of a particular type) whose maximum variance over the family of distributions in H̄ is as small as possible.

A related approach (e.g. Gastwirth, 1966) is to seek the maximin robust estimator. Here we consider the efficiencies of an estimator relative to other estimators, each of which is known to perform well for individual distributions in the compound H̄. The maximin estimator is one whose minimum efficiency relative to the individually satisfactory estimators is as large as possible.

On the use of the minimax and maximin criteria, see, e.g., Huber (1964), Bickel (1965), Gastwirth (1966), Hogg (1967), Siddiqui and Raghunandanan (1967), and Jaeckel (1971a, 1971b). We shall review some of the relevant results in Section 4.2.

At the same time, finite-sample variance and efficiency measures are clearly the appropriate ones to use in some situations. Examples are Monte Carlo studies of robustness (cf. Andrews et al., 1972) and the assessment of accommodation procedures for small samples, per se. (Extreme instances of the latter are studies relating to samples of size 3, such as those by Anscombe and Barron, 1966, and Willke, 1966.) The choice resembles that between the finite-sample variance of an estimator θ̂ and the information-function reciprocal in the straightforward estimation of a parameter θ. For robustness studies using finite-sample measures with sample sizes greater than 3 see, e.g., Dixon (1960), Birnbaum and Laska (1967), Birnbaum, Laska, and Meisner (1971), and Guttman (1973a). The finite-sample properties of various measures, in relation to their asymptotic values, have been investigated in detail by Gastwirth and Cohen (1970); see also Crow and Siddiqui (1967).

In the case of asymmetric contamination, variances under H̄ are, as we have seen, replaced by mean squared errors. Now, as our example (4.1.6) illustrates, the variance of an estimator tends to zero as n⁻¹ when n → ∞. Its bias, however, may be independent of n, or at any rate may tend to a non-zero limit. On this basis, comparisons between asymptotic mean squared errors of estimators would be meaningless. To meet this situation we might modify the contamination model (4.1.1) as follows. Instead of taking the amount of contamination to be a basic parameter, λ, we assume it to depend on sample size according to the relation

$$\lambda = \lambda_1 n^{-1/2} \tag{4.1.10}$$

The alternative model is now

$$\bar{H}: (1 - \lambda_1 n^{-1/2})F + \lambda_1 n^{-1/2} G \tag{4.1.11}$$

This model describes a situation in which 'the amount of asymmetric contamination is large enough to affect the performance of the estimator, but is too small to be measured accurately at the given sample size' (Jaeckel, 1971a). In (4.1.4),
(4.1.5), (4.1.6), the estimator $\bar{x}$ would now have bias $\lambda_1 a n^{-1/2}$, variance

$$\frac{\sigma^2}{n}\left[1 + O(n^{-1/2})\right],$$

and mean squared error

$$\frac{\sigma^2 + \lambda_1^2 a^2}{n}\left[1 + O(n^{-1/2})\right],$$

leading naturally to the definition of an asymptotic bias $\lambda_1 a$ and an asymptotic mean squared error $\sigma^2 + \lambda_1^2 a^2$.

4.1.2 The qualitative approach: influence curves

So far in our discussion of performance criteria, we have confined ourselves to the question 'How does contamination affect the precision (and maybe the bias) of an estimator?' This prompted the various dispersion-based and efficiency-based criteria. But precision is only one aspect. In what way, one would like to know, does contamination influence a given estimator? For example, is the effect on the estimator proportional to the number of contaminants present? Supposing there is just one contaminant, how is its effect related to its magnitude? What is the worst possible effect that a single contaminant can have? In particular, is this effect bounded or not? As we show below, a contaminant in a sample of $n$ will, if large enough, shift the sample mean $\bar{x}$ beyond any bound; but two contaminants in a sample of odd size $n = 2m - 1$ can at most shift the sample median from $x_{(m)}$ to $x_{(m-1)}$ or $x_{(m+1)}$, however far out these two contaminants may be. Aspects such as these underlie a powerful array of tools, which we will now describe, based on the influence function or influence curve. The approach is due to Hampel (1968, 1971); for a stimulating and highly readable exposition, see Hampel (1974).

As usual, suppose we have a basic model $F$ and a contamination model $(1 - \lambda)F + \lambda G$. If the contamination fraction $\lambda$ is small enough, the number of contaminants $R$ in a sample will effectively be either 0 or 1, so that for marginal comparisons of the performance of estimators in the neighbourhood of $\lambda = 0$ we need only consider the case of a single contaminant. Given $n$ 'good' (basic-model) observations $x_1, \ldots, x_n$ and an estimator $T(x_1, \ldots, x_n)$, we wish in principle to examine the effect on $T$ of substituting a contaminant for one of the $n$ observations. Denote the contaminant, as in Section 3.2, by $x_c$. The effect as defined would require averaging with respect to two random elements, first the random variation in $x_c$ as sampled from $G$, and second the random variation in the good value, $x_j$ say, which has been replaced by $x_c$. To sidestep these sources of variation we formulate the problem, equivalently and more conveniently, as follows. The contaminating distribution $G$ we will take to be atomic at $\xi$; that is to say, the

contaminant $x_c$ has a fixed value $\xi$. And then we will work in terms of the effect on $T$ of adding the contaminant $\xi$ to the $n$ good observations, so that on the alternative model $T$ is based on an enlarged sample of size $n + 1$.

Suppose, for example, that $T(x_1, \ldots, x_n) = T$ is the sample mean $\bar{x}$. Write $T(x_1, \ldots, x_n, \xi) = T_c$ for the mean $\bar{x}_c$ based on the enlarged contaminated sample. Then the effect of adjoining $\xi$ is to change the value of the estimator by an amount

$$\bar{x}_c - \bar{x} = \frac{n\bar{x} + \xi}{n+1} - \bar{x} = \frac{\xi - \bar{x}}{n+1}. \qquad (4.1.12)$$

Naturally enough this is proportional to $1/(n+1)$, that is to the amount of contamination in the sample; the effect standardized for the amount of contamination is

$$(n+1)(\bar{x}_c - \bar{x}) = \xi - \bar{x}. \qquad (4.1.13)$$

This will, as we remarked above, exceed any bound for $\xi$ large enough. It is a linear function of the value of the contaminant.

Again, if $T$ is the sample variance $s^2$ for a distribution with unknown mean and variance, we have for the enlarged sample

$$n s_c^2 = (n-1)s^2 + n\bar{x}^2 + \xi^2 - \frac{(n\bar{x} + \xi)^2}{n+1},$$

giving

$$s_c^2 - s^2 = \frac{(\xi - \bar{x})^2}{n+1} - \frac{s^2}{n}.$$

The standardized effect is therefore

$$(n+1)(s_c^2 - s^2) = (\xi - \bar{x})^2 - \frac{n+1}{n}\, s^2. \qquad (4.1.14)$$

The effect again exceeds any bound for $\xi$ large enough, but this time is a quadratic function of $\xi$.

Effects per unit of contamination, such as (4.1.13) and (4.1.14), are called finite-sample influence functions or, following Hampel, finite-sample influence curves. A finite-sample influence curve depends on the argument $\xi$, on the estimator $T$, and in general (see, for example, the case of the sample median discussed below) on the basic distribution $F$; it may also depend explicitly, as (4.1.14) illustrates, on the sample size $n$. Accordingly we write it $\mathrm{IC}_{T,F;n}(\xi)$.

Equation (4.1.14) also suggests that, as with the variance and efficiency measures discussed earlier, we may wish to use the asymptotic equivalent

$$\mathrm{IC}_{T,F}(\xi) = \lim_{n \to \infty} \mathrm{IC}_{T,F;n}(\xi),$$

say. In fact, it is this asymptotic influence curve, or simply the influence curve, $\mathrm{IC}_{T,F}(\xi)$, which is the really useful tool.
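The orders of magnitude in Jaeckel's model, where the contamination fraction shrinks like $n^{-1/2}$, can be checked exactly in a simple case. The sketch below assumes, purely for illustration, that $F$ is $N(\mu, \sigma^2)$ and $G$ is the shifted normal $N(\mu + a, \sigma^2)$; the precise forms (4.1.4) to (4.1.6) are not reproduced here, and the function name `jaeckel_bias_mse` is hypothetical. Under this mixture a single observation has mean $\mu + \lambda a$ and variance $\sigma^2 + \lambda(1-\lambda)a^2$, so $\sqrt{n}$ times the bias stays at $\lambda_1 a$ while $n$ times the mean squared error of $\bar{x}$ tends to $\sigma^2 + \lambda_1^2 a^2$.

```python
import math

def jaeckel_bias_mse(n, lam1, a, sigma=1.0):
    """Exact bias and mean squared error of the sample mean when each observation
    comes from (1 - lam) N(mu, sigma^2) + lam N(mu + a, sigma^2), lam = lam1 / sqrt(n).
    (Hypothetical helper; the shifted-normal G is an illustrative assumption.)"""
    lam = lam1 / math.sqrt(n)                      # contamination fraction as in (4.1.10)
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam1 / sqrt(n) must lie in [0, 1]")
    bias = lam * a                                 # E(xbar) - mu
    var_one = sigma**2 + lam * (1 - lam) * a**2    # variance of one mixture observation
    mse = var_one / n + bias**2                    # variance of xbar plus squared bias
    return bias, mse

lam1, a = 2.0, 1.5
for n in (10, 100, 10_000, 1_000_000):
    bias, mse = jaeckel_bias_mse(n, lam1, a)
    # sqrt(n) * bias stays at lam1 * a; n * mse approaches sigma^2 + lam1^2 a^2 = 10
    print(f"n={n:>9}  sqrt(n)*bias={math.sqrt(n) * bias:.4f}  n*mse={n * mse:.4f}")
```

The extra term $\lambda(1-\lambda)a^2$ in the variance is what produces the $[1 + O(n^{-1/2})]$ factor: it is of order $\lambda_1 a^2 n^{-1/2}$ and disappears in the limit.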
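The finite-sample influence curves (4.1.13) and (4.1.14) can be checked numerically by adjoining a value $\xi$ to a fixed sample and comparing the standardized change $(n+1)(T_c - T)$ with the closed forms. The sketch below uses Python's standard `statistics` module (`variance` uses the divisor $n - 1$, matching $s^2$); the sample values and the helper name `finite_sample_ic` are illustrative assumptions. It also shows the contrasting behaviour of the median: two arbitrarily large contaminants added to a sample of odd size $n = 2m - 1$ move the median only from $x_{(m)}$ to $x_{(m+1)}$.

```python
import statistics

def finite_sample_ic(estimator, sample, xi):
    """(n+1) * [T(x_1, ..., x_n, xi) - T(x_1, ..., x_n)]: the effect of adjoining
    one contaminant xi, standardized for the amount of contamination."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [xi]) - estimator(sample))

sample = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 2.2]   # illustrative 'good' observations, n = 7
n = len(sample)
xbar = statistics.fmean(sample)
s2 = statistics.variance(sample)              # divisor n - 1

for xi in (0.0, 5.0, 50.0, 500.0):
    ic_mean = finite_sample_ic(statistics.fmean, sample, xi)
    ic_var = finite_sample_ic(statistics.variance, sample, xi)
    # (4.1.13): linear in xi;  (4.1.14): quadratic in xi
    assert abs(ic_mean - (xi - xbar)) < 1e-6 * max(1.0, abs(xi))
    assert abs(ic_var - ((xi - xbar) ** 2 - (n + 1) / n * s2)) < 1e-6 * max(1.0, xi ** 2)
    print(f"xi={xi:7.1f}  IC(mean)={ic_mean:10.3f}  IC(var)={ic_var:12.3f}")

# The median of n = 2m - 1 = 7 values is x_(4); two huge contaminants move it only to x_(5).
ordered = sorted(sample)
for big in (1e3, 1e6, 1e9):
    assert statistics.median(sample + [big, big]) == ordered[4]   # x_(m+1) with m = 4
```

However large `big` is made, the median stays at the fifth ordered value, whereas both finite-sample influence curves grow without limit as $\xi$ increases.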
What is the influence curve for $s^2$, $\mathrm{IC}_{s^2,F}(\xi)$? If we let $n \to \infty$ in (4.1.14), we must not only replace $(n+1)/n$ by 1, but also $\bar{x}$ and $s^2$ by $\mu$ and $\sigma^2$ respectively:

$$\mathrm{IC}_{s^2,F}(\xi) = (\xi - \mu)^2 - \sigma^2. \qquad (4.1.15)$$

This equation conveys the same information as the finite-sample version (4.1.14) regarding the unbounded quadratic influence of a contaminant on $s^2$. In deriving it, each estimator $T(x_1, \ldots, x_n)$ (e.g. $\bar{x}$ or $s^2$) on the right-hand side of (4.1.14) has been replaced by $\lim_{n\to\infty} T(x_1, \ldots, x_n)$ (e.g. $\mu$ or $\sigma^2$); this limiting form depends only on $F$ and we will write it $T(F)$. For example, if $T$ is the sample mean $\bar{x}$, $T(F) = \int x\,dF$; if $T$ is the sample variance $s^2$, $T(F) = \int (x - \mu)^2\,dF$ where $\mu = \int x\,dF$.

Encompassing our procedure in a formal definition, we say that the influence curve of an estimator $T(x_1, \ldots, x_n)$ at the basic distribution $F$ is

$$\mathrm{IC}_{T,F}(\xi) = \lim_{\lambda \to 0} \left\{ \frac{T((1-\lambda)F + \lambda G) - T(F)}{\lambda} \right\} \qquad (4.1.16)$$

where $G$ is the atomic distribution

$$P(X = \xi) = 1. \qquad (4.1.17)$$

We may also write (4.1.16) as

$$\mathrm{IC}_{T,F}(\xi) = \frac{\partial}{\partial \lambda} \left\{ T[(1-\lambda)F + \lambda G] \right\} \Big|_{\lambda = 0}. \qquad (4.1.18)$$

Example 4.1 Sample mean. We have

$$\mathrm{IC}_{\bar{x},F}(\xi) = \lim_{\lambda \to 0} \left\{ \frac{(1-\lambda)\mu + \lambda\xi - \mu}{\lambda} \right\} = \xi - \mu, \qquad (4.1.19)$$

which could also have been obtained by letting $n \to \infty$ in (4.1.13).

Example 4.2 Sample variance.

$$\mathrm{IC}_{s^2,F}(\xi) = \lim_{\lambda \to 0} \left\{ \frac{(1-\lambda)(\mu^2 + \sigma^2) + \lambda\xi^2 - ((1-\lambda)\mu + \lambda\xi)^2 - \sigma^2}{\lambda} \right\} = (\xi - \mu)^2 - \sigma^2,$$

as in (4.1.15).

Example 4.3 Sample median. It is not practicable to calculate the influence curve of the sample median $\tilde{x}$ on a finite-sample basis, since the shift in sample median on moving from an odd to an even number of observations, or vice versa, is not defined. On an asymptotic basis, however, the calculation is straightforward. $T(F)$ is now the population median $m$, defined by

$$\int_{-\infty}^{m} dF = \tfrac{1}{2},$$

$T((1-\lambda)F + \lambda G)$, the median of the mixture distribution, is equal to $m + a$, say, where $a$ is positive or negative according as $\xi$ is greater or less than $m$. We assume $F$ to be continuous, with density $f$. To the first order of small quantities we have, for $a < 0$,

$$\tfrac{1}{2} = (1-\lambda)F(m + a) + \lambda = (1-\lambda)\left[\tfrac{1}{2} + a f(m)\right] + \lambda = \tfrac{1}{2} + \tfrac{1}{2}\lambda + a f(m),$$

giving $a = -\lambda/[2f(m)]$; similarly, for $\xi > m$, $a = +\lambda/[2f(m)]$. Hence the influence curve, $\lim_{\lambda \to 0}(a/\lambda)$, is

$$\mathrm{IC}_{\tilde{x},F}(\xi) = \frac{\operatorname{sgn}(\xi - m)}{2f(m)}. \qquad (4.1.20)$$

The influence of a contaminant on the sample median is thus seen to be bounded, an essential qualitative difference from, say, the sample mean!

A readily calculated finite-sample representation of the influence curve is the sensitivity curve introduced by Tukey (1970). For this, the contaminant $\xi$ is added, not to a random sample $x_1, \ldots, x_n$ from $F$ as for the finite-sample influence curve, but to a pseudo-sample $c_1, \ldots, c_n$ consisting of the order scores $c_i = E(X_{(i)})$ for a sample of size $n$ from $F$. This constructed sample may be thought of as smoothly representing the basic distribution. The sensitivity curve is given (as with the finite-sample IC) by $(n+1)[T(c_1, \ldots, c_n, \xi) - T(c_1, \ldots, c_n)]$ regarded as a function of $\xi$. Other convenient order statistics could of course be used in place of order scores; for example, conditional centroids or medians. All of these have been extensively tabulated in the normal case (David, Barton, Ganeshalingam, Harter, Kim, Merrington, and Walley, 1968).

Reverting to the influence curve, we note an important property. If we regard the argument $\xi$ as a random quantity distributed according to the basic model $F$, it can be shown (see, for example, Huber 1972, pp. 1051-1052) that the expectation of the influence curve with respect to this variation in $\xi$ is zero:

$$\int \mathrm{IC}_{T,F}(\xi)\, dF(\xi) = 0, \qquad (4.1.21)$$

and that the mean squared value of the influence curve,

$$\int \left(\mathrm{IC}_{T,F}(\xi)\right)^2 dF(\xi),$$

is equal to the asymptotic variance of $T$. Thus we have a direct connection between the influence curve and our earlier dispersion-based performance criteria.

We now describe some further parameters of the influence curve which throw light on the robustness of an estimator.
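Before turning to these parameters, the contrast between the unbounded curve (4.1.19) and the bounded curve (4.1.20) can be seen in Tukey's sensitivity curve. The sketch below is an illustration under stated assumptions: instead of the exact normal order scores $c_i = E(X_{(i)})$ taken from tables, it uses Blom's approximation $c_i \approx \Phi^{-1}\{(i - 0.375)/(n + 0.25)\}$, computed with Python's `statistics.NormalDist`; the function names `blom_scores` and `sensitivity_curve` are hypothetical. For the mean the curve is the straight line $\xi - \bar{c}$; for the median it levels off close to $\pm 1/[2\phi(0)] = \sqrt{2\pi}/2 \approx 1.25$, the value of (4.1.20) at $N(0, 1)$.

```python
import math
from statistics import NormalDist, fmean, median

def blom_scores(n):
    """Approximate expected normal order statistics E(X_(i)), i = 1..n
    (Blom's formula, used here as a stand-in for exact tabulated order scores)."""
    phi_inv = NormalDist().inv_cdf
    return [phi_inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def sensitivity_curve(estimator, pseudo_sample, xi):
    """Tukey's sensitivity curve: (n+1) [T(c_1, ..., c_n, xi) - T(c_1, ..., c_n)]."""
    n = len(pseudo_sample)
    return (n + 1) * (estimator(pseudo_sample + [xi]) - estimator(pseudo_sample))

n = 201                                               # odd, n = 2m - 1 with m = 101
c = blom_scores(n)
ic_median_bound = 1 / (2 * NormalDist().pdf(0.0))    # 1/[2 f(m)] at N(0, 1)
closed_form = math.sqrt(2 * math.pi) / 2             # sigma * sqrt(2 pi) / 2 with sigma = 1

for xi in (-100.0, -3.0, -0.5, 0.5, 3.0, 100.0):
    sc_mean = sensitivity_curve(fmean, c, xi)
    sc_median = sensitivity_curve(median, c, xi)
    print(f"xi={xi:7.1f}  SC(mean)={sc_mean:9.3f}  SC(median)={sc_median:7.3f}")

print("IC bound for the median:", round(ic_median_bound, 4), "=", round(closed_form, 4))
```

With $n = 201$ the median's sensitivity curve is a step of height about $1.26$ on either side of zero, already near the asymptotic value; the mean's curve keeps growing linearly with $\xi$.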
(i) The gross-error sensitivity

This is the supremum of the absolute value of the influence curve,

$$\gamma_{T,F} = \sup_{\xi} \left| \mathrm{IC}_{T,F}(\xi) \right|. \qquad (4.1.22)$$

The gross-error sensitivity 'measures the worst approximate influence which a fixed amount of contamination can have on the value of the estimator (hence it may be regarded as an approximate bound for the bias of the estimator)' (Hampel, 1974).

Example 4.4 Suppose $F$ is $N(\mu, \sigma^2)$; for $\bar{x}$ and for $s^2$, $\gamma_{T,F} = \infty$ (so that the effect that a contaminant can have on these estimators is unbounded), while for $\tilde{x}$ (the median) $\gamma_{T,F} = 1/[2f(\mu)]$, from (4.1.20); that is, $\gamma_{T,F} = \sigma\sqrt{2\pi}/2 \approx 1.25\sigma$.

(ii) The local-shift sensitivity

This is defined by

$$\lambda^*_{T,F} = \sup_{\xi \neq \eta} \frac{\left| \mathrm{IC}_{T,F}(\eta) - \mathrm{IC}_{T,F}(\xi) \right|}{\left| \eta - \xi \right|}. \qquad (4.1.23)$$

It measures the worst possible effect of 'adjusting' a contaminant by modifying its value, for example by Winsorizing.

(iii) The rejection point

Suppose that the influence curve vanishes for all points $\xi$ outside some finite interval

$$[\mu - \rho,\ \mu + \rho],$$

say, centred on $\mu$, the mean (or other appropriate location point) of $F$. This implies that observations outside $[\mu - \rho, \mu + \rho]$ have no influence on the estimator $T$, i.e. that the estimation procedure rejects such observations. $\rho = \rho_{T,F}$ is called the rejection point of the estimator. Examples of estimators with finite $\rho_{T,F}$ (and which therefore reject outliers beyond some particular distance) will be encountered below in the context of M-estimation.

In some cases $\mathrm{IC}_{T,F}$ may be very small, though not zero, for $|\xi - \mu|$ sufficiently large. The rejection point is then infinite, but outliers, though not explicitly rejected, have very little effect on the estimator.

(iv) The breakdown aspect of performance

We noted above that, in contrast to the sample mean, the influence of a contaminant on the sample median $\tilde{x}$ is bounded, and indeed that two contaminants, whatever their magnitudes, added to a sample of size $n = 2m - 1$ can at most shift the sample median from $x_{(m)}$ to $x_{(m-1)}$ or $x_{(m+1)}$.

Why stop at two? Clearly the data can absorb a greater amount of contamination than this without the sample median becoming totally unreliable. With four contaminants added, the shift from $x_{(m)}$ is bounded (at most to $x_{(m-2)}$ or $x_{(m+2)}$); with $2m - 2$ contaminants added, it is still bounded (at most to $x_{(1)}$ or $x_{(2m-1)}$). But as soon as the number of added contaminants exceeds $2m - 2$, it is possible for $\tilde{x}$ to take any value whatsoever. From this point of view, the contamination becomes intolerable when its proportionate amount reaches $(2m - 1)/(4m - 2)$, i.e. one-half. We say that the sample median has breakdown point $\tfrac{1}{2}$. The breakdown point, $\pi_{T,F}$, for any estimator $T$ is the smallest proportion of contamination which can carry the value of the estimator over all bounds. It is an important measure of robustness; the idea is due to Hodges (1967) and Hampel (1971).

We have now discussed in some detail what Huber calls the stability aspect of robustness,

in close analogy to the stability of a mechanical structure (say of a bridge): (i) the qualitative aspect: a small perturbation should have small effects; (ii) the breakdown aspect: how big can the perturbation be before everything breaks down; (iii) the infinitesimal aspect: the effects of infinitesimal perturbations. (Huber, 1972)

4.1.3 Robustness of confidence intervals

Suppose we have a sample $x_1, \ldots, x_n$ which comes, on some basic model $H$, from a distribution involving a location parameter $\mu$ and a scale parameter $\sigma$, both unknown; typically this distribution might be $N(\mu, \sigma^2)$. Consider the problem of constructing a confidence interval for $\mu$ at level $1 - \alpha$. Essentially this is built up from the following elements:

(i) an estimator $T$ of $\mu$;
(ii) an estimator $S_T$ of the standard deviation of $T$;
(iii) the distribution $D$ of $(T - \mu)/S_T$, assuming the basic model.

If $u_1, u_2$ are lower and upper $\tfrac{1}{2}\alpha$-points of $D$, the confidence interval is then

$$(T - u_2 S_T,\ T - u_1 S_T). \qquad (4.1.24)$$

If the parent distribution is $N(\mu, \sigma^2)$, $T$, $S_T$ and $D$ particularize to $\bar{x}$, $s/\sqrt{n}$ and the $t_{n-1}$ distribution in the classical procedure, and we get the familiar confidence interval

$$\left(\bar{x} - t_{n-1}(\alpha/2)\, s/\sqrt{n},\ \bar{x} + t_{n-1}(\alpha/2)\, s/\sqrt{n}\right) \qquad (4.1.25)$$

where $t_{n-1}(\alpha/2)$ is the upper $(\alpha/2)$-point of $t_{n-1}$.

If in fact the observations $x_1, \ldots, x_n$ come, not from $N(\mu, \sigma^2)$ as assumed, but from a contaminated distribution, the confidence interval (4.1.25) may be defective for two reasons. First, the distribution $D$ of $(T - \mu)/S_T = (\bar{x} - \mu)/(s/\sqrt{n})$ may differ substantially from that of $t_{n-1}$, and the probability
that the interval (4.1.25) covers the true value $\mu$ may thus differ substantially from $1 - \alpha$ (and may, a particularly undesirable occurrence, be substantially less than $1 - \alpha$). That is, the procedure may lack robustness of validity. Secondly, since $\bar{x}$ and $s$ are sensitive to the presence of extreme values in the sample, the confidence interval may be unnecessarily wide. It could well be preferable to use more robust estimators $T$ and $S_T$ in the construction (4.1.24), aiming at achieving satisfactory validity, and at the same time obtaining confidence intervals which tend to be shorter in the presence of extreme values. From this point of view the procedure leading to (4.1.25) may lack robustness of efficiency, or 'robustness of performance' in Huber's words quoted at the beginning of this chapter. See Tukey and McLaughlin (1963), Dixon and Tukey (1968), Huber (1968, 1970).

The choice of $T$ and $S_T$ will be discussed in Section 4.2.4.

The following measures of performance of a confidence interval such as (4.1.24) have been proposed:

(i) The probability that the confidence interval fails to cover the true parameter value $\mu$ under a specified alternative model $\bar{H}$. This reflects the robustness of validity of the procedure. Takeuchi (1971) estimates it by the relative frequency of non-coverage of the parameter by the interval in a large number of simulations; he calls this the error frequency of the interval. Analogously, the measure itself may be called the error probability.

(ii) Suppose we have a compound alternative $\bar{H}$; a confidence interval $(T - a, T + a)$ of given length $2a$ will operate at different confidence levels for the different distributions that can arise under $\bar{H}$. For specified $a$, the minimum of these possible confidence levels gives a measure of 'guaranteed' performance. The idea is due to Huber (1968).

(iii) If the interval (4.1.24) has robustness of validity, a natural measure of its efficiency (for specified $\alpha$) is the ratio of the lengths of the intervals (4.1.25), (4.1.24); or, from another point of view, the ratio measures the relative efficiency of two procedures. Dixon and Tukey (1968, p. 86) call the square of this ratio the relative efficiency.

4.1.4 Robustness of significance tests

This obviously bears a relation to the robustness of confidence intervals discussed above. There is an important difference, however, inasmuch as we now have a double family of alternative hypotheses. Suppose, to fix ideas, that a two-sided test of the hypothesis $\mu = \mu_0$ on the basis of an assumed normal sample is required. We can still think in terms of a basic model

$$H: F \text{ where } F \text{ is } N(\mu, \sigma^2), \qquad (4.1.26)$$

and a contamination alternative $\bar{H}$ which might typically be

$$\bar{H}: (1 - \lambda)F + \lambda G \text{ where } G \text{ is } N(\mu, b\sigma^2). \qquad (4.1.27)$$

$\bar{H}$ depends on the parameters $\lambda$, $b$ etc., which for present purposes we will denote simply by $\lambda$.

The null hypothesis for the significance test is

$$H_0: F_0 \text{ where } F_0 \text{ is } N(\mu_0, \sigma^2). \qquad (4.1.28)$$

We are concerned with the behaviour of the test, both in relation to the usual 'Type II error' family of alternatives (4.1.26) and in relation to the family of alternatives

$$\bar{H}_0: (1 - \lambda)F_0 + \lambda G_0 \text{ where } G_0 \text{ is } N(\mu_0, b\sigma^2), \qquad (4.1.29)$$

expressing contamination under the true value of $\mu$; more generally, we are concerned with the family (4.1.27), encompassing both types of departure from $H_0$.

For any choice of $T$ and $S_T$ we have, corresponding to the $(1 - \alpha)$-level confidence interval (4.1.24), a significance test at level $\alpha$ of $H_0$ against $\bar{H}$ with critical region

$$(T - \mu_0)/S_T \notin (u_1, u_2). \qquad (4.1.30)$$

We can now formulate relevant measures of performance, as follows:

(i) The conventional power of the test, as a function of $\mu$; this is

$$\Pi_1(\mu) = P\left[(T - \mu_0)/S_T \notin (u_1, u_2) \mid H\right]. \qquad (4.1.31)$$

(ii) The stability of the significance level under contamination, as a function of $\lambda$; this is given by

$$\Pi_2(\lambda) = \Pi_2(\lambda, \alpha) = P\left[(T - \mu_0)/S_T \notin (u_1, u_2) \mid \bar{H}_0\right]. \qquad (4.1.32)$$

It corresponds to the error probability of the equivalent confidence interval, as defined above.

(iii) For a compound alternative $\bar{H}$, the guaranteed significance level

$$\Pi_3 = \Pi_3(\alpha) = \min_{\lambda} \Pi_2(\lambda, \alpha). \qquad (4.1.33)$$

This again corresponds to the guaranteed performance measure under contamination defined above for a confidence interval.

(iv) The stability of the power under contamination, as a function of both $\mu$, the argument of the power function, and $\lambda$, the measure of contamination:

$$\Pi_4(\mu, \lambda) = P\left[(T - \mu_0)/S_T \notin (u_1, u_2) \mid \bar{H}\right]. \qquad (4.1.34)$$

(v) For a compound alternative $\bar{H}$, the guaranteed power at $\mu$:

$$\Pi_5(\mu) = \min_{\lambda} \Pi_4(\mu, \lambda). \qquad (4.1.35)$$
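Takeuchi's error frequency, measure (i) for confidence intervals, is straightforward to estimate by simulation, and the same proportion estimates $\Pi_2(\lambda)$ for the equivalent test. The sketch below is only an illustration: it draws samples of size $n = 10$ from a scale-contaminated normal model of the form (4.1.27) with $\mu = 0$, $\sigma = 1$, and records how often the classical interval (4.1.25) misses $\mu$. The $t$ quantile is hard-coded ($t_9(0.025) \approx 2.262157$) to avoid a dependency; the settings $\lambda = 0.1, 0.2$, $b = 9$ and the function names are illustrative assumptions, not values from the text.

```python
import math
import random
import statistics

N = 10                    # sample size
T_CRIT = 2.262157         # upper 2.5% point of t with N - 1 = 9 d.f. (alpha = 0.05)

def contaminated_sample(rng, lam, b, mu=0.0, sigma=1.0):
    """N observations from (1 - lam) N(mu, sigma^2) + lam N(mu, b sigma^2), cf. (4.1.27)."""
    return [rng.gauss(mu, sigma * math.sqrt(b) if rng.random() < lam else sigma)
            for _ in range(N)]

def t_interval(x):
    """Classical interval (4.1.25): xbar -/+ t_{n-1}(alpha/2) s / sqrt(n)."""
    xbar = statistics.fmean(x)
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1))
    half_width = T_CRIT * s / math.sqrt(len(x))
    return xbar - half_width, xbar + half_width

def error_frequency(lam, b, reps=10_000, mu=0.0, seed=1):
    """Relative frequency of non-coverage of mu over `reps` simulated samples."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        lo, hi = t_interval(contaminated_sample(rng, lam, b, mu))
        if not lo < mu < hi:
            misses += 1
    return misses / reps

for lam in (0.0, 0.1, 0.2):
    print(f"lambda={lam:.1f}  b=9  error frequency={error_frequency(lam, 9.0):.4f}")
```

At $\lambda = 0$ the estimate should be close to the nominal $\alpha = 0.05$; comparing it with the contaminated cases shows how far the actual error probability departs from $\alpha$, which is exactly what robustness of validity is concerned with. Replacing `t_interval` by an interval built from more robust $T$ and $S_T$ would give the comparison suggested in the text.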

These concepts apply to significance tests generally, though our discussion has been in the context of tests for the mean of a normal distribution. Veale and Kale (1972), for example, consider the testing of hypotheses for the parameter $\sigma$ of an exponential distribution with density $\sigma^{-1}\exp(-\sigma^{-1}x)$. They develop a test (described in Section 4.4) robust against a contaminant arising from an exchangeable alternative model with contamination parameter $b$. Three measures of performance are tabulated, $p_m$, $p_t$, and $p_d$. Each involves a comparison of the test with the test of the same size based on the sample sum, which is optimal under the basic model. In the notation of (4.1.31), (4.1.32), and (4.1.34), these measures can be written as

$$p_m = \Pi_1(\sigma) - \tilde{\Pi}_1(\sigma), \tag{4.1.36}$$

$$p_t = \Pi_2(b) - \tilde{\Pi}_2(b), \tag{4.1.37}$$

$$p_d = \Pi_4(\sigma, b) - \tilde{\Pi}_4(\sigma, b), \tag{4.1.38}$$

where $\Pi_1, \Pi_2, \Pi_4$ relate to the robust test and $\tilde{\Pi}_1, \tilde{\Pi}_2, \tilde{\Pi}_4$ to the optimal test. Interestingly, Veale and Kale call $p_m$ the premium and $p_t$ the protection involved in using the robust test. This provides a natural extension of Anscombe's concepts of premium and protection in estimation discussed earlier; see (4.1.2) and (4.1.3).

4.2 GENERAL METHODS OF ACCOMMODATION

4.2.1 Estimation of location

We now consider some of the general methods that exist for constructing robust estimators, tests, or confidence intervals, and give a brief review of the performance characteristics of selected procedures. In the main, the techniques and results do not specifically relate to particular assumed forms for the basic model. Special cases of normal and exponential basic models are given separate attention in Sections 4.3 and 4.4, respectively.

We start with two familiar, simple, and intuitively appealing procedures for inducing robustness, namely trimming and Winsorizing. These have already been mentioned in Section 2.6. The object is to control the variability due to the $r$ lowest sample values $x_{(1)}, \ldots, x_{(r)}$ and the $s$ highest ones $x_{(n-s+1)}, \ldots, x_{(n)}$. The choice of $r$ and $s$ is discussed later; for the moment we suppose they are pre-chosen parameters. If these $r + s$ observations are omitted, so that we confine ourselves to a censored sample of size $n - r - s$, we get the $(r, s)$-fold trimmed mean

$$\bar{x}^{T}_{r,s} = \frac{x_{(r+1)} + \cdots + x_{(n-s)}}{n - r - s}. \tag{4.2.1}$$

Suppose instead that the $r$ lowest sample values are each replaced by the value of the nearest observation to be retained unchanged, viz. $x_{(r+1)}$, and likewise the $s$ highest by $x_{(n-s)}$. We then work with a transformed sample of size $n$, and we get the $(r, s)$-fold Winsorized mean

$$\bar{x}^{W}_{r,s} = \frac{r x_{(r+1)} + x_{(r+1)} + \cdots + x_{(n-s)} + s x_{(n-s)}}{n}. \tag{4.2.2}$$

Often the amounts of lower-tail and upper-tail trimming or Winsorizing are the same, i.e. $r = s$. We then have the $r$-fold symmetrically trimmed and Winsorized means

$$\bar{x}^{T}_{r,r} = \frac{x_{(r+1)} + \cdots + x_{(n-r)}}{n - 2r}, \tag{4.2.3}$$

$$\bar{x}^{W}_{r,r} = \frac{r x_{(r+1)} + x_{(r+1)} + \cdots + x_{(n-r)} + r x_{(n-r)}}{n}. \tag{4.2.4}$$

The $\alpha$-trimmed means $\bar{x}^{T}(\alpha, \alpha)$ referred to in Section 2.6 are $r$-fold symmetrically trimmed means. For convenience, the amount of trimming is specified by the proportion $2\alpha$ of the sample omitted rather than by the number $2r$ of observations. With an $\alpha$-trimming procedure in which $\alpha$ has been specified beforehand, the number $\alpha n$ of observations to be trimmed at each end may not be an integer. Suppose its integer part is $r$, so that $\alpha n = r + f$ $(0 < f < 1)$. We then omit $r$ observations at each end, and include the nearest retained observations, $x_{(r+1)}$ and $x_{(n-r)}$, each with reduced weight $1 - f$:

$$\bar{x}^{T}(\alpha, \alpha) = \frac{(1-f)x_{(r+1)} + x_{(r+2)} + \cdots + x_{(n-r-1)} + (1-f)x_{(n-r)}}{n(1-2\alpha)}. \tag{4.2.5}$$

Similarly we can define $\alpha$-Winsorized means $\bar{x}^{W}(\alpha, \alpha)$. Here no fractional weighting is needed, since the number of lower-tail observations Winsorized into $x_{(r+1)}$ is $(r + f) + (1 - f) - 1 = r$. Thus

$$\bar{x}^{W}(\alpha, \alpha) = \frac{(r+1)x_{(r+1)} + x_{(r+2)} + \cdots + x_{(n-r-1)} + (r+1)x_{(n-r)}}{n} = \bar{x}^{W}_{r,r}. \tag{4.2.6}$$

Clearly the 0-trimmed and 0-Winsorized means are both the same as the sample mean $\bar{x}$. The $\tfrac12$-Winsorized mean is the same as the sample median $\tilde{x}$, and by a suitable limiting argument the $\tfrac12$-trimmed mean can also be taken to be $\tilde{x}$. The $\tfrac14$-trimmed mean, $\bar{x}^{T}(\tfrac14, \tfrac14)$, is called the mid-mean.

What is the influence curve, $IC_{T,F}(\xi)$, for the $\alpha$-trimmed mean? If $F$ is continuous with density $f$ and mean $\mu$, and $0 \le \alpha < \tfrac12$, we have, in the notation of Section 4.1,

$$T(F) = \frac{1}{1-2\alpha}\int_{x_\alpha}^{x_{1-\alpha}} x\,dF \tag{4.2.7}$$
where $x_\alpha$ denotes the $\alpha$-quantile of $F$: $F(x_\alpha) = \alpha$. Hence

$$T[(1-\lambda)F + \lambda G] = \frac{1-\lambda}{1-2\alpha}\int_{y_\alpha}^{y_{1-\alpha}} x\,dF + \frac{\lambda}{1-2\alpha}\int_{y_\alpha}^{y_{1-\alpha}} x\,dG \tag{4.2.8}$$

where $y_\alpha$ is determined from

$$(1-\lambda)F(y_\alpha) + \lambda G(y_\alpha) = \alpha,$$

with a similar definition for $y_{1-\alpha}$. This gives

$$\left(\frac{\partial y_\alpha}{\partial \lambda}\right)_{\lambda=0} = \begin{cases} -(1-\alpha)/f(x_\alpha) & (\xi < x_\alpha), \\ \alpha/f(x_\alpha) & (\xi > x_\alpha), \end{cases}$$

$$\left(\frac{\partial y_{1-\alpha}}{\partial \lambda}\right)_{\lambda=0} = \begin{cases} -\alpha/f(x_{1-\alpha}) & (\xi < x_{1-\alpha}), \\ (1-\alpha)/f(x_{1-\alpha}) & (\xi > x_{1-\alpha}). \end{cases}$$

For symmetric $F$, and in particular for normal $F$, $x_\alpha + x_{1-\alpha} = 2\mu$ and $f(x_\alpha) = f(x_{1-\alpha})$. It then readily follows from (4.2.8), (4.1.16), and the relation $\int_{x_\alpha}^{x_{1-\alpha}} x\,dF = (1-2\alpha)\mu$ that the influence curve for the $\alpha$-trimmed mean is

$$IC_{T,F}(\xi) = \begin{cases} -(x_{1-\alpha} - \mu)/(1-2\alpha) & \text{for } \xi < x_\alpha, \\ (\xi - \mu)/(1-2\alpha) & \text{for } x_\alpha \le \xi \le x_{1-\alpha}, \\ (x_{1-\alpha} - \mu)/(1-2\alpha) & \text{for } \xi > x_{1-\alpha}. \end{cases} \tag{4.2.9}$$

This is illustrated in Figure 4.1. It shows that

the 'influence' of an extreme outlier on the value of a trimmed mean is not zero as one would naively expect (arguing that the outlier will be 'thrown out'); rather is it equal to the influence of an additional x at [$x_\alpha$ resp. $x_{1-\alpha}$] ... the $\alpha$-trimmed mean does not really 'throw out' outliers, in the sense of ignoring them completely, but in effect 'brings them in' towards the bulk of the sample. But what about the $\alpha$-Winsorized mean which had been designed specifically to 'bring in' outliers? (Hampel, 1974)

To answer this we have, for the $\alpha$-Winsorized mean $(0 \le \alpha < \tfrac12)$,

$$T(F) = \alpha x_\alpha + \int_{x_\alpha}^{x_{1-\alpha}} x\,dF + \alpha x_{1-\alpha}. \tag{4.2.10}$$

Again assuming symmetric $F$, the influence curve now comes out to have the following form, illustrated in Figure 4.1:

$$IC_{T,F}(\xi) = \begin{cases} -[(x_{1-\alpha} - \mu) + \alpha/f(x_\alpha)] & \text{for } \xi < x_\alpha, \\ \xi - \mu & \text{for } x_\alpha \le \xi \le x_{1-\alpha}, \\ +[(x_{1-\alpha} - \mu) + \alpha/f(x_\alpha)] & \text{for } \xi > x_{1-\alpha}. \end{cases} \tag{4.2.11}$$

The IC is indeed bounded, the outliers 'brought in', but there is a jump at [$x_\alpha$ and $x_{1-\alpha}$] .... Furthermore both slope in the center and supremum differ from that of the $\alpha$-trimmed mean .... the ... point is that the mass of the tails is put on single order statistics resp. single points in the limit, and shifting them ... causes appreciable fluctuations of the Winsorized mean which are determined solely by the density in (and near) these points. A contamination in the central part, on the other hand, has the same influence as on the arithmetic mean, while the trimmed mean spreads the influence of outliers evenly over the central part, thus giving it a higher weight. Thus ... both the trimmed mean and the Winsorized mean restrict the influence of outliers, but in different ways. While the IC of the former is always continuous, the IC of the latter is discontinuous and very sensitive to the local behaviour of the true underlying distribution at two of its quantiles. (Hampel, 1974)

[Figure 4.1 Influence curves for α-trimmed and α-Winsorized means. Upper panel: $IC_{T,F}(\xi)$ for $\bar{x}^T(\alpha, \alpha)$, with levels $\pm(x_{1-\alpha} - \mu)/(1-2\alpha)$ beyond $x_\alpha$ and $x_{1-\alpha}$. Lower panel: $IC_{T,F}(\xi)$ for $\bar{x}^W(\alpha, \alpha)$.]

A basic problem in the use of trimmed or Winsorized means is choosing the extent of trimming or Winsorization. Should we employ an asymmetric ($r \ne s$) or a symmetric ($r = s$) scheme? How should $(r, s)$ be chosen, or $\alpha$ in the symmetric proportionate schemes? No simple answers are feasible. The choice is affected by the range (and degree of specification) of possible basic and alternative models, the variety of performance criteria which may be adopted, the dependence on sample size, and so on. Some recommendations will be discussed later when we consider performance characteristics (Sections 4.2.2, 4.3, 4.4).

Some modifications of trimmed or Winsorized means change the nature of the problem of choosing the degree of trimming or Winsorization. We might, for example, contemplate eliminating (or transferring) sample values in terms of some quantitative measure of their extremeness, rather than merely on the basis of their rank order. Suppose we consider sample residuals $z_j$ $(j = 1, 2, \ldots, n)$, defined on some appropriate basis. For example, if the basic model is $N(\mu, \sigma^2)$ we might use $z_j = x_j - \bar{x}$. Modified trimming occurs with the 'rejection rule' of Anscombe (1960a), where $\mu$ is
estimated by

$$\begin{cases} \bar{x} & \text{if } |z_j| < c\sigma \text{ (all } j\text{)}, \\ \bar{x}^{T}_{1,0} & \text{if } |z_{(1)}| \ge c\sigma \text{ and } |z_{(1)}| > |z_{(n)}|, \\ \bar{x}^{T}_{0,1} & \text{if } |z_{(n)}| \ge c\sigma \text{ and } |z_{(n)}| > |z_{(1)}|, \end{cases}$$

for a suitable choice of $c$. (If $\sigma$ is unknown it is replaced by $s$.) For an alternative model of the slippage type, incorporating precisely one discordant value, the observation with maximum absolute residual is trimmed if it is sufficiently extreme.

Corresponding modified Winsorization is also contemplated. Instead of trimming (rejecting) the observation with sufficiently extreme maximum absolute residual, we might replace it by its nearest neighbour in the ordered sample. Thus Guttman and Smith (1969) suggest estimating $\mu$ in $N(\mu, \sigma^2)$ by

$$\begin{cases} \bar{x} & \text{if } |z_j| < c\sigma \text{ (all } j\text{)}, \\ \bar{x}^{W}_{1,0} & \text{if } |z_{(1)}| \ge c\sigma \text{ and } |z_{(1)}| > |z_{(n)}|, \\ \bar{x}^{W}_{0,1} & \text{if } |z_{(n)}| \ge c\sigma \text{ and } |z_{(n)}| > |z_{(1)}|, \end{cases} \tag{4.2.12}$$

for a suitable choice of $c$ (again $s$ replaces $\sigma$ if $\sigma$ is unknown).

Another possibility, termed semi-Winsorization (Guttman and Smith, 1969), treats the sufficiently extreme observation differently. This is the observation with the largest absolute residual, provided that residual exceeds $c\sigma$. Rather than being replaced by its nearest neighbour, it is replaced by the appropriate cut-off point, $\bar{x} - c\sigma$ or $\bar{x} + c\sigma$. Again $\mu$ is estimated by the mean of the treated sample, or by $\bar{x}$ if $|z_j| < c\sigma$ for all $j$, and $s$ is used in place of unknown $\sigma$. (See Section 4.3 for further details.)

There is a growing interest in so-called adaptive methods of statistical inference, in which the choice of inference procedures is allowed to depend in part on the actual sample to hand. Some such proposals have been made in the context of robust methods for estimation and hypothesis testing: see Hogg (1974) for a recent review of such work (together with a contributed discussion). One example, specifically concerned with trimmed means, is described by Jaeckel (1971b). He considers the optimal choice of $\alpha$ in the $\alpha$-trimmed mean for estimating the location parameter of a symmetric distribution. He proposes that we choose $\alpha$ in some permissible range $(\alpha_0, \alpha_1)$ to minimize the sample variance $s^2(\alpha)$ of $\bar{x}^{T}(\alpha, \alpha)$. The resulting optimal-trimmed mean $\bar{x}^{T}(\hat{\alpha}, \hat{\alpha})$ is shown to be asymptotically equivalent, in terms of variance, to the best estimator $\bar{x}^{T}(\alpha, \alpha)$, i.e. that with minimum variance $\sigma^2(\alpha, \alpha)$. This holds provided the truly best $\alpha$ happens to lie in the range $(\alpha_0, \alpha_1)$. A modification due to Bickel is described in Andrews et al. (1972).

Hogg (1974) stresses the difficulty in deciding what is an appropriate measure of location for an asymmetric distribution. He follows up a suggestion by Huber (1972) that the measure might be defined in terms of the limiting form of some appealing estimator. For this purpose he commends the trimmed mean $\bar{x}^{T}(\alpha_1, \alpha_2)$, with $\alpha_1$ and $\alpha_2$ (which may well differ in value) chosen adaptively to minimize an estimate $s^2(\alpha_1, \alpha_2)$ of the variance of $\bar{x}^{T}(\alpha_1, \alpha_2)$. Other adaptive estimators and tests will be described where appropriate in the following discussion.

Other types of robust estimator are conveniently described in terms of the three-part classification of methods for constructing estimators outlined by Huber (1972); see Chapter 2.

Maximum likelihood type estimators (M-estimators)

Huber (1964) proposes a generalization of the least squares principle for constructing estimators of (principally) location parameters. Suppose, on the basic model, that the sample comes from a distribution with distribution function $F(x - \theta)$. It is the location parameter, $\theta$, which we wish to estimate. We might estimate $\theta$ by $T_n = T_n(x_1, x_2, \ldots, x_n)$ chosen to minimize

$$\sum_{j=1}^{n} \rho(x_j - T_n),$$

where $\rho(\,)$ is some real-valued non-constant function. As special cases, $\rho(t) = t^2$ yields the sample mean and $\rho(t) = |t|$ yields the sample median. Likewise $\rho(t) = -\log f(t)$ yields the maximum likelihood estimator, where $f(x)$ is the density function under the basic model when $\theta = 0$. If $\rho(\,)$ is continuous with derivative $\psi(\,)$, we can equivalently estimate $\theta$ by $T_n$ satisfying

$$\sum_{j=1}^{n} \psi(x_j - T_n) = 0. \tag{4.2.13}$$

Such an estimator is called a maximum likelihood type estimator, or M-estimator. Usually we restrict attention to convex $\rho(\,)$, so that $\psi(\,)$ is monotone and $T_n$ unique. Under quite general conditions $T_n$ can be shown to have desirable properties as an estimator. If $\rho(\,)$ is convex, $T_n$ is unique, translation invariant, consistent, and asymptotically normal (Huber 1964, 1967). The question of the choice of $\rho$ to achieve an 'optimal' robust estimator of $\theta$ will be taken up at a later stage. One particular estimator with desirable properties of robustness arises from putting

$$\rho(t) = \begin{cases} \tfrac12 t^2 & |t| \le K, \\ K|t| - \tfrac12 K^2 & |t| > K, \end{cases} \tag{4.2.14}$$
for a suitable choice of $K$. It is interesting to note that this estimator is related to the Winsorized mean. It turns out that $T_n$ is equivalent to the mean of a modified sample. In that sample, each observation $x_j$ with $|x_j - T_n| > K$ is replaced by $T_n - K$ or $T_n + K$, whichever is the closer. (We have multiple semi-Winsorization operating at both ends of the ordered sample.)

Another M-estimator, with

$$\rho(t) = \begin{cases} \tfrac12 t^2 & |t| \le \eta, \\ \tfrac12 \eta^2 & |t| > \eta, \end{cases} \tag{4.2.15}$$

can be similarly interpreted as a trimmed mean. $T_n$ is now the sample mean of those observations $x_j$ satisfying $|x_j - T_n| < \eta$. This extends the modified trimming above from rejection of a single extreme value to rejection of all sample values whose residuals about $T_n$ are sufficiently large in absolute value. See Huber (1964) for details.

When the basic model involves a scale parameter, so that the distribution function is of the form $F[(x - \theta)/\sigma]$, modified forms of M-estimator have been proposed. The estimator of $\theta$ is a solution $T_n$ of an equation of the type

$$\sum_{j=1}^{n} \psi[(x_j - T_n)/S] = 0, \tag{4.2.16}$$

where $S$ is a robust estimator of the scale parameter $\sigma$. $S$ is obtained either independently by some suitable scheme, or simultaneously with $\theta$ by joint solution of (4.2.16) and an equation of the form

$$\sum_{j=1}^{n} \chi[(x_j - T_n)/S] = 0. \tag{4.2.17}$$

Different choices for $\psi(\,)$ [and for $\chi(\,)$] yield a large assortment of M-estimators which have been discussed in the literature. One example due to Hampel (see Andrews et al., 1972, or Hogg, 1974) employs Huber's $\rho(t)$ as given by (4.2.14), i.e.

$$\psi(t) = \begin{cases} t & |t| \le K, \\ K \operatorname{sgn} t & |t| > K, \end{cases} \tag{4.2.18}$$

with $S$ taken as

$$\operatorname{median}\{|x_j - \tilde{x}|\}/0.6745, \tag{4.2.19}$$

where $\tilde{x}$ is the sample median. Hampel (1974) terms

$$s_m = \operatorname{median}\{|x_j - \tilde{x}|\}$$

the median deviation (by analogy with the mean deviation). He outlines its earlier but limited usage, going back as far as Gauss. He recommends it as a quick robust scale parameter estimator and 'as a basis for the rejection of outliers'. In the present context he also advocates its use in (4.2.16) for developing three-part descending M-estimators, where

$$\psi(t) = \begin{cases} t & |t| \le a, \\ a \operatorname{sgn} t & a < |t| \le b, \\ a(c \operatorname{sgn} t - t)/(c - b) & b < |t| \le c, \\ 0 & |t| > c. \end{cases} \tag{4.2.20}$$

[Figure: graph of $\psi(t)$ in (4.2.20), with breakpoints at $\pm a$, $\pm b$, $\pm c$.]

A somewhat similar proposal in Andrews et al. (1972) employs

$$\psi(t) = \begin{cases} \sin(t/d) & |t| < d\pi, \\ 0 & |t| > d\pi \end{cases} \tag{4.2.21}$$

(this is investigated for the specific choice $d = 2.1$).

Related estimators based on preliminary modified Winsorization have also been proposed and examined. Here the observations Winsorized are those whose residuals exceed $cs$ in absolute value, for some choice of $c$. See, for example, Andrews et al. (1972), where they are referred to as one-step Huber estimators.

There is a vast range of possible M-estimators. Even in the cases described above, a good deal of choice remains in how to estimate $\sigma$. Choice also remains in what values to take for cut-off points such as $K$, $\eta$, $a$, $b$, $c$, and $d$. Many theoretical, numerical, and simulation studies have been made, and we shall review some of the results later (Sections 4.2.3, 4.3). Some key references are the large-scale empirical study by Andrews et al. (1972), also Hampel (1974), Hogg (1974), Huber (1964), Jaeckel (1971a), and Leone, Jayachandran, and Eisenstat (1967).

Before moving on to other types of estimator, however, we must note the scope here for an adaptive approach. It is reasonable to contemplate choosing, for example, relevant cut-off points in the light of the sample data. However, little work of this type seems to have been carried out to date for M-estimators.

Rather than employing separate estimates of $\sigma$ when $\theta$ and $\sigma$ are unknown, we can pursue a joint estimation process. This consists of simultaneous solution of (4.2.16) and (4.2.17). The example which has received most attention is known as Huber's proposal 2. Huber (1964) suggested (primarily for a normal basic model) that we employ $\psi(\,)$ as
expressed in (4.2.18) with some preliminary choice of value for $K$ and take

$$\chi(t) = \psi^2(t) - \beta(K) \qquad (4.2.22)$$

where

$$\beta(K) = \int_{-\infty}^{\infty} \psi^2(t)\, d\Phi(t), \qquad (4.2.23)$$

$\Phi$ being the standard normal distribution function. This form of $\chi(t)$ was motivated by consideration of reasonable M-estimators of $\sigma$ (see Section 4.3). The corresponding (4.2.16) and (4.2.17) need to be solved iteratively for suitably chosen starting values. The resulting estimators of $\theta$ (and of $\sigma$) have received much attention (see, for example, Andrews et al., 1972; Bickel, 1965; Huber, 1964) and we shall consider them further in Sections 4.2.4 and 4.3.

Linear order statistics estimators (L-estimators)

Suppose $x_{(1)} < x_{(2)} < \dots < x_{(n)}$ denotes the ordered sample. We might estimate $\theta$ by a linear form

$$T_n = \sum_{j=1}^{n} a_j x_{(j)} \qquad (4.2.24)$$

of the $x_{(j)}$ ($j = 1, 2, \dots, n$). Such linear order statistics estimators have been widely studied for specific uncontaminated samples (see, for example, the lengthy review in David, 1970). Much of this work is directed to censored samples. Whilst not specifically concerned with the problem of outliers, in that the reason for censoring is seldom considered and no outlier-specific alternative model employed, some linear order statistics estimators for censored samples will possess general robustness properties which carry over to the outlier problem. There is good reason, however, to consider estimators of the form (4.2.24) specifically in the context of robust estimation from possibly contaminated samples. Indeed we have already considered examples of such estimators, including the sample median and trimmed and Winsorized means, all yielded by a particular choice of the $a_j$ in (4.2.24). The modified trimmed and Winsorized means (and indeed certain M-estimators) have (or can be interpreted to have) an analogous quasi-adaptive form in the respect that the choice of the $a_j$ depends on the observations in the sample.

If we represent the weights $a_j$ as

$$a_j = \int_{(j-1)/n}^{j/n} J(t)\, dt \qquad (4.2.25)$$

for some function $J(t)$ satisfying $\int_0^1 J(t)\, dt = 1$, we have what Huber (1972) calls L-estimators. Since most studies concern estimating the centre of a symmetric distribution, we frequently encounter the further (natural) assumption that the weights are symmetrically valued; that is, $a_j = a_{n+1-j}$.

Under appropriate conditions the corresponding $T_n$ is consistent and asymptotically normal (Chernoff, Gastwirth, and Johns, 1967; Bickel, 1967; Jaeckel, 1971a).

Let us consider some further examples of L-estimators which have been proposed and investigated.

Gastwirth and Cohen (1970) consider, primarily for a normal basic model with symmetric contamination, the estimator

$$T_n = \gamma\left(x_{([pn]+1)} + x_{(n-[pn])}\right) + (1 - 2\gamma)\,\tilde{x} \qquad (4.2.26)$$

($0 < p < \tfrac{1}{2}$, $0 < \gamma < \tfrac{1}{2}$). This is a weighted combination of the lower and upper $p$th sample fractiles, each with weight $\gamma$, and the sample median, with weight $1 - 2\gamma$. It is compared with many other estimators, primarily of L-type. A special case of (4.2.26) of the form

$$T_n = 0.3\,x_{([n/3]+1)} + 0.4\,\tilde{x} + 0.3\,x_{(n-[n/3])} \qquad (4.2.27)$$

is proposed and investigated by Gastwirth (1966). Another class of estimators based on a small number of selected ordered sample values includes the trimean

$$T_n = \tfrac{1}{4}\left(h_1 + 2\tilde{x} + h_2\right) \qquad (4.2.28)$$

where the hinges, $h_1$ and $h_2$, are approximate sample quartiles. An adaptive form employs the notion of skipping (Tukey, 1977) and examples are investigated by Andrews et al. (1972). The hinges are taken as the lower and upper sample quartiles. Derived quantities of the form

$$c_1 = h_1 - \eta\,(h_2 - h_1), \qquad c_2 = h_2 + \eta\,(h_2 - h_1) \qquad (4.2.29)$$

are defined for prescribed $\eta$ (typically 1, 1.5, or 2), and the skipping process involves deleting observations in the tails of the sample (outside the interval $(c_1, c_2)$) preliminary to calculation of the trimean of the retained observations. In iterative skipping the process is repeated with recalculated hinges at each stage until the retained data set remains constant; multiple skipping repeats this process by skipping applied to the retained data set with different values of $\eta$ at each stage.

Other adaptive L-estimators have been considered by Birnbaum and Miké (1970), Takeuchi (1971), and by Jaeckel (1971a). We should also include at this stage the 'shorth', which is the sample mean of the shortest half of the sample (chosen as $x_{(l)}, \dots, x_{(l+[n/2])}$ where $l$ minimizes $x_{(l+[n/2])} - x_{(l)}$), and associated estimators (see Andrews et al., 1972).

Rank test estimators (R-estimators)

Within the wide range of non-parametric procedures we have methods of testing hypotheses about location parameters in (primarily symmetric) unspecified distributions, and associated estimates, which are often distribution-free and may be expected to possess various robustness properties. Specific

concern for robustness is reflected in a class of estimators (R-estimators) based on two-sample linear rank tests. Consider a function $J(t)$ which is antisymmetric about $\tfrac{1}{2}$, that is,

$$J(t) = -J(1 - t).$$

For any value of $\Delta$ we form $x_1 - \Delta, \dots, x_n - \Delta, -x_1 + \Delta, \dots, -x_n + \Delta$ and order the $2n$ numbers so obtained; an indicator function $V_i$ is formed where $V_i = 1$ if the $i$th smallest is of type $x_j - \Delta$, and $V_i = 0$ otherwise. The R-estimator $T_n$ is a solution of the equation

$$W(t) = 0 \qquad (4.2.30)$$

where

$$W(t) = \sum_{i=1}^{2n} J\!\left(\frac{i}{2n+1}\right) V_i. \qquad (4.2.31)$$

If $J$ is monotone, the solution of (4.2.30) is unique, consistent, has known variance and is asymptotically normally distributed (see Hodges and Lehmann, 1963; Hodges, 1967; Huber, 1972; Jaeckel, 1971a).

An asymptotically equivalent special case, based on the one-sample Wilcoxon test, which has received much attention, is the Hodges-Lehmann (Hodges and Lehmann, 1963) estimator. This is the median of the set of $n(n+1)/2$ pairwise means $(x_i + x_l)/2$ ($i \le l$; $i = 1, 2, \dots, n$; $l = 1, 2, \dots, n$). Whilst simple in form, its calculation can be tedious if $n$ is at all large. More easily calculable versions have been proposed, based on means of symmetrically placed ordered sample values; there are only $[(n+1)/2]$ such means. For example, we have the folded-median type estimators. The sample is folded by replacing $x_1, \dots, x_n$ with $[x_{(1)} + x_{(n)}]/2$, $[x_{(2)} + x_{(n-1)}]/2$, ..., and the median of the folded sample is chosen as the estimator (the Bickel-Hodges estimator). Reordering and further folding (with or without trimming) is also contemplated. See Andrews et al. (1972).

Other Estimators

The large number of location estimators described above represents only a selection of those which have been proposed. An indication of the wider range of prospects may be found in Andrews et al. (1972).

Hogg (1967) comments on the use of the mean of the 'trimmings' in a trimmed sample (rather than the mean of the retained observations) as an estimator when the basic and alternative models all have short tails, but this is contrary to the spirit of our interest in outliers. However, his proposal for an adaptive estimator based on the form of possible models is relevant to current interests. Suppose that the data came from one of a set of possible symmetric distributions $\{D_l\}$ ($l = 1, 2, \dots, m$), each centred on $\theta$, and that $T_l$ is a good estimate of $\theta$ under $D_l$. Hogg proposes that we adopt

$$T = \sum_{l} W_l T_l \qquad (4.2.32)$$

for a suitable choice of weights $W_l$ ($\sum W_l = 1$) which are allowed to depend in value on the sample data. A special case which has received attention is the piecewise estimator

$$T = \begin{cases} \bar{x}_{tc}(\tfrac14, \tfrac14), & t(x) < a_1, \\ \bar{x}, & a_1 \le t(x) \le b_1, \\ \bar{x}_{t}(\tfrac14, \tfrac14), & b_1 < t(x) \le c_1, \\ \tilde{x}, & t(x) > c_1, \end{cases} \qquad (4.2.33)$$

where $\bar{x}_{tc}(\tfrac14, \tfrac14)$ is the mean of the set of symmetrically trimmed observations with trimming parameter $\alpha = \tfrac14$, $\bar{x}_{t}(\tfrac14, \tfrac14)$ is the corresponding trimmed mean, and $\bar{x}$ and $\tilde{x}$ are the mean and median, respectively, of the whole sample. $t(x)$ is some sample statistic and $a_1, b_1, c_1, d_1$ are a selected set of cut-off points where we switch from one form to another. Hogg (1967) and Andrews et al. (1972) consider the case where $t(x)$ is the sample coefficient of kurtosis and

Hogg (1974) proposes a modified form replacing the sample coefficient of kurtosis with a 'better indicator of the length of the tails'. The revised estimator is felt to be better able to incorporate shorter-tailed symmetric distributions appropriately.

We shall consider in Chapter 8 some Bayesian approaches to the accommodation of outliers due to Box and Tiao (1968) for normal distributions and to Sinha (1972, 1973b) and Kale and Sinha (1971) for exponential distributions.

4.2.2 Performance characteristics of location estimators

In Section 4.2.1 we reviewed the range of general robust procedures which have been proposed for estimating a location parameter, and presented a variety of special estimators. We made no attempt to examine in detail
(i) what basic and alternative models (if any) were contemplated for any estimator;
(ii) whether, or not, distributions and contamination were assumed to be symmetric;
(iii) what could be claimed about the performance of estimators in respect of the variety of different performance criteria.

Space does not permit a full description of the range of published results on these matters, particularly the extensive simulation studies which have been made of the relative performances of different estimators against

different possible models and using different performance criteria. Indeed, it is doubtful whether much of the work is truly germane to a study of outliers, in that the implicit notions of robustness relate to much wider prospects (in terms of models) than we would expect to be manifest through outlying observations in a sample. We shall summarize, and give references for, those parts of the published work which have relevant general interest, or which come closest in spirit to the outlier problem.

In Section 4.1.1 we defined the notions of minimax and maximin robust estimators. The former minimizes the maximum variance over the range of possible distributions encompassed in a composite alternative model; the latter maximizes the minimum efficiency relative to corresponding 'optimal' estimators. Various types of estimators have been investigated in terms of such criteria.

Jaeckel (1971a) demonstrates an asymptotic equivalence between M-, L- and R-estimators for symmetric contamination of a symmetric basic model. For a given M-estimator, asymptotically equivalent L- and R-estimators exist. All are asymptotically normal with equal variance and share jointly in any asymptotic optimality. Huber (1964) shows that there is an M-estimator which minimizes the supremum, over the class $\mathcal{C}$ of distributions of the form $(1-\lambda)F + \lambda G$, of the asymptotic variances. This turns out to be the optimal M-estimator for a particular (exhibited) member $F_0$ of $\mathcal{C}$. Jaeckel (1971a) shows that this optimality extends to asymptotically equivalent L- and R-estimators. With asymmetric contamination of a basic symmetric distribution, $F$, the estimators are typically biased and do not converge to the centre of symmetry of $F$ (Huber, 1964). Jaeckel (1971a) attempts to overcome this difficulty by using a mixture model in which $\lambda$ is a decreasing function of $n$ in the sense of (4.1.10). Under rather specific conditions he exhibits a minimax optimality result (in terms of asymptotic mean square error) analogous to that of Huber for the symmetric contamination case. See also Bickel (1965), Gastwirth (1966) and Huber (1972) on associated asymptotic behavioural and optimality considerations for M-, L-, and R-estimators.

Gastwirth (1966) and Crow and Siddiqui (1967) consider different classes of estimator from the maximin (rather than minimax) standpoint. For location estimators in the case of symmetric contamination it proves useful to identify the two 'extreme' distributions in $\mathcal{C}$, and the maximin estimator typically takes the form of a weighted average of the respective 'reasonable estimators' for these two extreme distributions, provided each estimator is reasonably efficient for the alternative extreme distribution.

Gastwirth and Rubin (1969) consider maximin robust estimators in the specific class of linear order statistics estimators. They show, under quite generous conditions, that over a large class $\mathcal{C}$ of distributions (not restricted to mixture type alternatives), for each of which an asymptotically efficient linear estimator exists, we can find a linear estimator which maximizes the minimum (asymptotic) efficiency. They adopt a Bayesian decision-theoretic approach. Usually the maximin estimator is difficult to determine explicitly. One case, where $\mathcal{C}$ has just two members (double-exponential and logistic distributions), is examined in detail and an explicit maximin estimator exhibited. This is of interest to the outlier problem only in as far as it provides an example of using an inherent type of alternative model. Gastwirth and Rubin further show that maximin linear estimators do not possess conspicuously higher asymptotic efficiency than simpler linear estimators, and suggest further limiting the field of estimators in the cause of simplicity and at little cost. They determine the maximin estimators (for a location parameter in the case of symmetric contamination under prescribed conditions) within the classes of trimmed means and of linear combinations of sample percentiles. Special cases considered yield high trimming factors: for example, for Cauchy versus normal distributions we need $\alpha = 0.275$, so that only the middle 45 per cent of the sample is retained! Crow and Siddiqui (1967) present other numerical comparisons.

The minimax (or maximin) approach is not ideal in two respects. It is bound to be a highly pessimistic policy: we pay a lot to protect against the most extreme prospects. Also, its asymptotic nature pays little regard to the finite-sample behaviour of estimators.

In a different vein Hampel (1974) discusses in some detail the various measures of robustness based on the influence curve, illustrating how different types of estimator stand up on such criteria. See also Hampel (1971).

Hogg (1974) reviews and considers some performance characteristics of various adaptive robust estimators. See also Jaeckel (1971b) on adaptive L-estimators.

Comparisons of the finite-sample, and asymptotic, behaviour of a large variety of specific location estimators have been widely undertaken. Various performance characteristics are employed, often examined by simulation methods. Some results centred on the normal distribution as basic model will be summarized later (Section 4.3). References to work not specific to the normal distribution include Bickel (1965), Crow and Siddiqui (1967), Gastwirth and Cohen (1970), Hogg (1967), and Siddiqui and Raghunandanan (1967). But undoubtedly the tour de force is the study by Andrews et al. (1972) of 68 different location estimators in terms of various asymptotic characteristics as well as a variety of finite-sample characteristics for samples of sizes 5, 10, 20, and 40 for a large range (c. 20) of different possible data-generating models. They also consider such matters as the relative ease of computation of the estimators.

We make no attempt to present a short recommended list of robust location estimators for general use. Some sets of 'best buys' are offered: see for example Andrews et al. (1972; in Chapter 7 each contributor tackles the unenviable task of summarizing the vast amount of information, and some are even brave enough to specify tentative choices of 'best estimator') or Hogg

(1974). But wealth of detail and inconsistencies of relative performance are not the major reasons for hesitating to recommend particular robust estimators at this stage. The problem is that general studies of robustness are seldom specific to our theme: outliers. They reflect a more general concept of robustness where alternative models encompass widely differing distributions not promoted solely or even primarily by a desire to reflect outliers. Undoubtedly the procedures that have been advanced will include many which will prove valuable on more specific investigation (which we hope will materialize) from the outlier standpoint. This belief explains the lengthy review we present in this chapter. In Section 4.3 some more detailed prescriptions will be offered for a normal basic model with specific mixture or slippage types of alternative model. These are intrinsically closer in relevance to the problem of accommodating outliers.

4.2.3 Estimation of scale or dispersion

Far less attention has been given to robust estimation of a scale or dispersion parameter, $\sigma$, than to robust estimation of location parameters. What few contributions exist again are not specific to the outlier accommodation issue.

We have remarked in Section 4.2.1 on one interesting estimator based on the median deviation,

$$ s_m = \mathrm{median}\{|x_j - \tilde{x}|\}, $$

where $\tilde{x}$ is the sample median. By its nature we might expect it to provide reasonable protection against the influence of discordant values in the sample. Hampel (1974) touches on this and discusses the pedigree and performance of the estimator. It arises 'as the M-estimate of scale with the smallest possible gross-error-sensitivity at the normal (and many other) models'. Hampel regards it as the counterpart of the median as location estimator, and he discusses the form of its influence function: in particular the gross-error-sensitivity and breakdown point. In these respects the median deviation is seen to be more desirable than the semi-interquartile-range, $Q$ (although an equivalence exists for symmetric samples and, asymptotically, for a symmetric basic model). $Q$ might also be expected to protect against outliers, but it may well be over-protective.

In spite of the relatively low efficiency of $s_m$ for uncontaminated data (c. 40 per cent in the normal case) it appears to possess a robustness absent from estimators which are more efficient for homogeneous data. See Tukey (1960) and Stigler (1973b).

Andrews et al. (1972) studied the characteristics of some rather attractive robust location estimators. Examples are the Huber 'proposal 2' estimate with $K = 1.5$, the Bickel one-step modification with Winsorization of residuals at $\pm Ks$ (where $s$ is a robust scale estimate), and Hampel's three-part descending M-estimator with one of the cut-off points determined from a robust scale estimator. Their results suggest that a scale estimate based on $s_m$ (their 'robust scale estimate') is preferable to one based on $Q$.

A variety of scale estimators based on a Winsorized sample have been proposed, both for robust estimation of a scale parameter, $\sigma$, per se, and, more often, to obtain a robust studentized test statistic (see Section 4.2.4). Thus if we effect $(r, r)$ Winsorization of the ordered sample $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ we obtain

$$ \underbrace{x_{(r+1)}, \ldots, x_{(r+1)}}_{r+1 \text{ times}},\; x_{(r+2)}, \ldots, x_{(n-r-1)},\; \underbrace{x_{(n-r)}, \ldots, x_{(n-r)}}_{r+1 \text{ times}}. $$

Rewriting these as $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$, we might consider $s^{W2}_{r,r} = S^{W2}_{r,r}/(n-1)$ with

$$ S^{W2}_{r,r} = \sum_{j=1}^{n} \left(y_{(j)} - \bar{y}\right)^2 \qquad (4.2.34) $$

as an estimator of $\sigma^2$, where $\bar{y} = \bar{x}^{W}_{r,r}$ is the $(r, r)$ Winsorized mean. See for example Dixon and Tukey (1968).

Clearly the propriety of such a procedure will depend strongly on how many discordant outliers there are at each end of the sample relative to the prescribed $r$. Choice of $r$ (as for location estimation) is crucial. It is likely to be even more so if we contemplate using a trimmed rather than a Winsorized sample, since there we have the added risk of under-estimation due to deletion of 'respectable' extreme values in the sample. We might also consider asymmetric Winsorization in the outlier context, especially for a scale parameter in an asymmetric basic model.

General linear forms

$$ \sum_{j=1}^{n} c_j x_{(j)} \qquad (4.2.35) $$

have also been studied, including censored (trimmed) or Winsorized equivalents (see, for example, Sarhan and Greenberg, 1962, pp. 218-251; David, 1970, pp. 109-124). The interest in this work lies predominantly in what loss occurs relative to the full sample equivalent for a prescribed distribution. What little reference there is to robustness is not specific, nor particularly relevant, to accommodating outliers. It is a little surprising that no study of (4.2.35) seems to have been made in relation to accommodation of outliers, when the approach proves so fruitful for censored data from a homogeneous source.

Tukey (1960) considers a basic distribution $N(\mu, \sigma^2)$ contaminated by $N(\mu, b\sigma^2)$, where $b > 1$. In the uncontaminated case the sample standard deviation has an advantage over the mean deviation. Tukey points out that even small contaminations ($\lambda \approx 0.01$) dramatically reverse this advantage. See also Section 4.3.
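The three scale measures discussed above can be computed as follows. This is a minimal sketch in Python using only the standard library, and the function names and sample values are hypothetical, chosen for illustration. It computes the median deviation $s_m$ (taken about the sample median), the semi-interquartile range $Q$, and the $(r, r)$ Winsorized variance $s^{W2}_{r,r}$ of (4.2.34). Quartiles use the default 'exclusive' rule of `statistics.quantiles`. That is an assumption: other quartile conventions give slightly different values of $Q$.

```python
import statistics

# Hypothetical sample: nine values near 10 and one discordant value.
SAMPLE = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 14.5]


def median_deviation(xs):
    """s_m = median{|x_j - x~|}, with x~ the sample median."""
    centre = statistics.median(xs)
    return statistics.median([abs(x - centre) for x in xs])


def semi_interquartile_range(xs):
    """Q = (upper quartile - lower quartile) / 2, 'exclusive' quartile rule."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return (q3 - q1) / 2


def winsorize(xs, r):
    """(r, r) Winsorization: the r smallest values become x_(r+1) and the
    r largest become x_(n-r); returns the ordered values y_(1), ..., y_(n)."""
    ys = sorted(xs)
    n = len(ys)
    if r < 0 or 2 * r >= n:
        raise ValueError("need r >= 0 and 2r < n")
    low, high = ys[r], ys[n - r - 1]  # x_(r+1) and x_(n-r) in 1-based notation
    return [min(max(y, low), high) for y in ys]


def winsorized_variance(xs, r):
    """s^{W2}_{r,r} = S^{W2}_{r,r} / (n - 1): squared deviations of the
    Winsorized values about the Winsorized mean, as in (4.2.34)."""
    ys = winsorize(xs, r)
    ybar = statistics.fmean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1)


if __name__ == "__main__":
    print("s_m           =", median_deviation(SAMPLE))
    print("Q             =", semi_interquartile_range(SAMPLE))
    print("s^2 (usual)   =", statistics.variance(SAMPLE))
    print("s^{W2}_{1,1}  =", winsorized_variance(SAMPLE, 1))
```

For this sample, $s_m$ (0.15) and $Q$ (0.175) reflect only the spread of the central values, and $s^{W2}_{1,1}$ is about 0.034. The ordinary sample variance, by contrast, is about 2.05 because of the single value 14.5. To estimate $\sigma$ itself in the normal case, $s_m$ and $Q$ would each need rescaling by a constant of about 1.48.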

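The reversal noted by Tukey (1960) can be reproduced from first-order (delta-method) asymptotics. The sketch below is illustrative only and rests on three assumptions:

- It takes $b = 9$, the choice used in Tukey's well-known illustration (a contaminating standard deviation three times the basic one).
- It measures efficiency by the ratio of the asymptotic variances of $\log s$ and $\log d$, where $d$ is the mean deviation. This keeps the comparison independent of the differing expectations of the two estimators.
- It treats the centre $\mu$ as known. For a symmetric model this leaves the first-order results unchanged.

The function name is hypothetical.

```python
import math


def are_mean_dev_vs_sd(lam, b=9.0):
    """Asymptotic efficiency of the mean deviation d relative to the standard
    deviation s, for the mixture (1 - lam) N(0, 1) + lam N(0, b).

    Efficiency is taken (by assumption) as n var(log s) / n var(log d);
    values above 1 favour the mean deviation.
    """
    if not 0.0 <= lam <= 1.0 or b <= 0:
        raise ValueError("need 0 <= lam <= 1 and b > 0")
    m2 = (1 - lam) + lam * b                 # E[X^2]
    m4 = 3 * ((1 - lam) + lam * b * b)       # E[X^4]: each normal component has E[X^4] = 3 var^2
    e_abs = math.sqrt(2 / math.pi) * ((1 - lam) + lam * math.sqrt(b))  # E|X|
    # Delta method: log s = 0.5 log(s^2), so n var(log s) = var(X^2) / (4 m2^2).
    n_var_log_s = (m4 - m2 ** 2) / (4 * m2 ** 2)
    # Delta method for d = mean of |X|: n var(log d) = var|X| / (E|X|)^2.
    n_var_log_d = (m2 - e_abs ** 2) / e_abs ** 2
    return n_var_log_s / n_var_log_d


if __name__ == "__main__":
    for lam in (0.0, 0.001, 0.002, 0.005, 0.01, 0.05):
        print(f"lambda = {lam:<6} efficiency of d relative to s = "
              f"{are_mean_dev_vs_sd(lam):.3f}")
```

Under these assumptions the mean deviation has efficiency of about 0.88 relative to $s$ for uncontaminated normal data. It overtakes $s$ once $\lambda$ reaches about 0.002, and it is roughly 1.44 times as efficient at $\lambda = 0.01$.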
Dixon (1960) describes estimation of $\sigma$ based on the range of trimmed samples and shows (in the normal case) that such estimators can have relative efficiency in excess of 96 per cent. But there is again no consideration of their robustness (no alternative model is contemplated).

Huber (1970) considers estimators of dispersion based on rank tests, on sums of squares of order statistics, and on his earlier 'proposal 2'. In each case the aim is to determine an appropriate studentized form for a robust location estimator (see Section 4.2.4). The last of these approaches merits further comment here. In Huber's proposal 2 we have a method of simultaneously estimating $\theta$ and $\sigma$ via (4.2.16)-(4.2.18) and (4.2.22). The proposal was prompted by the desire to obtain a robust estimator $S$ of $\sigma$, as well as a robust estimator $T$ of $\theta$, in the context of a mixture-type alternative model $(1-\lambda)F + \lambda G$ with symmetric $F$. Huber (1964) remarks of the procedure:

It corresponds to Winsorizing a variable number of observations: slightly more if ... [$(1-\lambda)F + \lambda G$] has heavier tails, slightly less if ... [it] has lighter tails, and if ... [$G$] is asymmetric, more on the side with heavier contamination.

$S$ (and $T$) are readily determined iteratively but typically possess no simple explicit form.

Huber (1964) also considers robust estimation of $\sigma$ alone for the mixture-type model. He restricts his detailed proposals to the case where $F$ is normal, and the details are best deferred until Section 4.3. The same applies to methods due to Guttman and Smith (1971) based on the Anscombe (1960a) premium-protection approach.

4.2.4 Studentized location estimates, tests, and confidence intervals

If $x_1, x_2, \ldots, x_n$ is a random sample from $N(\mu, \sigma^2)$ where $\mu$ and $\sigma^2$ are unknown we have the familiar test statistic

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \qquad (4.2.36) $$

for testing the hypothesis $H: \mu = \mu_0$ against $\bar{H}: \mu \neq \mu_0$. The null distribution of $t$ is of course Student's $t$ with $n - 1$ degrees of freedom. It is natural to ask how reasonable the $t$-test would be if we were wrong in attributing the sample to the normal distribution. This is an example of the robustness of significance tests. The $t$-test is not the only test we might wish to examine from this standpoint, but it is illustrative and has been given much attention in the literature. Various matters need to be more fully specified. In particular, we need to declare the types of departure from normality to be entertained and what criteria of robustness might be appropriately applied. As in all aspects of the study of robustness only certain types of alternative model are relevant to the outlier problem.

One particular approach (among many possible ones) has been widely studied. Suppose the data may have arisen from any of a variety of possible distributional sources. Can we replace $\bar{x}$ and $s$ in (4.2.36) by (robust) linear estimators $T$ and $S$ in such a way that the corresponding statistic is still distributed essentially as Student's $t$ over the range of contemplated distributions? This requires a decision on what is to be our criterion of the 'Student's $t$-ness' of the statistic. (The broader issue of ab initio generation of a robust test of location does not seem to have received much attention.)

For this particular approach it seems almost inevitable that the basic distribution $F$ should be normal, or at the very least symmetric. Neither is it surprising that the range of distributions contemplated often consists of a set of symmetrically contaminated versions of $F$ of a mixture type with normal (or symmetric) contaminating distributions. This, of course, bears on our interest in outliers, although occasionally wider families of distributions have been entertained with little outlier relevance.

Huber (1970) summarizes the basic aims in examining a studentized version of a robust estimator $T$ of a location parameter $\theta$, and examines various possibilities. We are interested in

$$ (T - \theta)/S(T) \qquad (4.2.37) $$

where, in the main, we would hope to achieve high robustness of performance for $T$. We then 'match' $T$ with an estimated standard error $S(T)$ that gives high robustness of validity over a range of possible distributions for the sample. Jackknifing may assist in obtaining an estimated variance for $T$ (see Section 2.6). But we also want the approximate distributional form of quantities (4.2.37) over a range of distributions, and this supports the use of a simpler, tractable estimator $S(T)$. We can often expect asymptotic normality for (4.2.37), but the questions are when this occurs and how fast the approach to normality is. These are complicated matters from the technical standpoint and highly dependent on the contemplated range of distributions. Asymptotic normality of the numerator $(T - \theta)$ is commonly encountered, as we have already noted. Finite sample approximate normality has been examined by Hodges (1967) and by Leone, Jayachandran, and Eisenstat (1967). For the studentized form it is also important to investigate the consistency of $S^2(T)$ as an estimator of the variance of $T$, and whether $S^2(T)$ and $T$ become essentially independent. See Huber (1970, 1972). In the former work Huber considers such problems in relation to L-, M-, and R-estimators of $\theta$, providing some prescriptions and conjectures. See also Hodges and Lehmann (1963) (rank test procedures).

By analogy with the normal theory case it is also relevant to ask if $(T - \theta)/S(T)$ can have its distribution approximated in finite samples by the $t$-distribution. If so, we must also ask what form $S(T)$ should take and what number of degrees of freedom is appropriate. Tukey and McLaughlin (1963) suggest a symmetrically trimmed mean $\bar{x}^{T}_{r,r}$ for $T$ and tentatively conclude that this is best matched with $S^2(T)$ based on the related Winsorized sample, in an attempt to give greater (but not excessive) importance to the more
extreme observations. Specifically they suggest using

$$ \frac{\bar{x}^{T}_{r,r} - \theta}{S^{W}_{r,r}/\sqrt{h(h-1)}} \qquad (4.2.38) $$

as Student's $t$ on $(h - 1)$ degrees of freedom, where $h = n - 2r$ and $S^{W}_{r,r}$ is the square root of the Winsorized sum of squares $S^{W2}_{r,r}$ of (4.2.34). This proposal is briefly examined in relation to (mainly) normal and uniform samples. A similar proposal, with support in terms of the influence function, is advanced by Huber (1972). Huber (1970) augments Tukey and McLaughlin's small-sample investigations of (4.2.38) by demonstrating useful asymptotic properties. He reinforces the relative disadvantages of common matching, that is, of trimmed means with trimmed scale estimates, or Winsorized means with Winsorized scale estimates (but see Section 4.3 on the latter combination). Huber also shows that fewer degrees of freedom than $(h - 1)$ would be appropriate for (4.2.38) with long-tailed distributions.

As ever, choice of the trimming factor $r$ is a problem. Tukey and McLaughlin (1963) suggest an adaptive approach, choosing $r$ to minimize $S^{W2}_{r,r}/[h(h-1)]$ or some allied quantity. However, they warn that the denominator term in (4.2.38) loses stability with an extensive amount of trimming.

We shall consider in more detail (Section 4.3) the use of a studentized form of Huber's 'proposal 2' and of the Hodges-Lehmann estimator for a normal model symmetrically contaminated by mixing with a more disperse normal distribution.

Huber (1968) considers some fundamental matters to do with robust testing of monotone likelihood ratio alternative hypotheses, and develops minimax test procedures for various classes of problem. He extends his tests in a natural way to the construction of confidence limits.

The determination of robust confidence intervals for location or scale parameters can be approached in terms of almost all the robust estimation methods that we have considered, provided we know the sampling distribution of the estimator. But here lies the major obstacle: seldom do we know enough about the sampling distribution, nor is it sufficiently constant in form over the range of contemplated data-generating models. Non-parametric procedures are not specifically directed to outlier-type contamination. Even so, they yield direct confidence intervals whose general robustness may provide some basis for accommodating outliers. For a general discussion see Noether (1974). Further details are included in Lehmann (1975) and Noether (1967, 1973).

The asymptotic normality properties of L-, M-, and R-estimators all provide means for determining approximate confidence intervals, but their finite sample properties are little understood. The more accessible forms for location parameters are typified by the interval derived from studentized location estimates. From (4.2.38), for example, we obtain a central confidence interval symmetric about $\bar{x}^{T}_{r,r}$ with width an appropriate multiple (in terms of the approximating $t$-distribution) of $S^{W}_{r,r}/\sqrt{h(h-1)}$.

4.3 ACCOMMODATION OF OUTLIERS IN UNIVARIATE NORMAL SAMPLES

From the vast amount of material on general robustness reviewed above we now select for more detailed comment some results which relate most closely to our theme: the accommodation of outliers. We shall be concerned with robustness in relation to families of distributions which correspond with the alternative models for outliers previously discussed. The information below sometimes represents specific study of outlier-type families of distributions. More frequently it is obtained by judicious selection from broader robustness studies. It will sometimes prove convenient to use abbreviated notation to describe certain estimators and in such cases we adopt that used by Andrews et al. (1972).

In the present section we concentrate on the case of a basic normal model $F$. The alternative is either of mixture type,

$$ (1-\lambda)F + \lambda G $$

($G$ is also usually normal, with the same mean as $F$ but larger variance), or of slippage type (where the slippage occurs either in the mean or in the variance). In the concluding section we review results for the exponential distribution with an exchangeable type of alternative model.

Mixture model; estimation of the mean

Suppose our sample arises from $(1-\lambda)F + \lambda G$ where $F$ is $N(\mu, \sigma^2)$, and we wish to estimate $\mu$. Huber (1964) shows that the M-estimator with

$$ \psi(t) = \begin{cases} t, & |t| \le K, \\ K \operatorname{sgn} t, & |t| > K, \end{cases} \qquad (4.3.1) $$

is minimax among translation invariant estimators for symmetric $G$. Tabulated values suggest that choice of $K$ is not highly critical: performance is fairly insensitive and reasonable over the range $1 \le K \le 2$ for $\lambda < 0.2$. For asymmetric $G$ the above estimator proves to be biased. Huber discusses the extent of the bias and concludes that attempts at substantially reducing the bias may be quite costly in terms of asymptotic variance. The straightforward M-estimators implicitly depend on knowing the values of $\sigma^2$ and of $\lambda$. Scale invariant versions (allowing more realistically for unknown $\sigma^2$) have been discussed in Section 4.2.1. These include estimators where $\sigma^2$ is estimated robustly, perhaps in terms of median deviation or interquartile range. They may also involve associated Winsorization or have multi-part form. Alternatively, $\mu$ may be estimated simultaneously with $\sigma^2$, as for example in Huber's 'proposal 2'. That is, we solve (4.2.16) and (4.2.17) with $\chi(t)$ given by (4.2.22) and (4.2.23) and with $\psi(t)$ as in (4.3.1). Some quantitative features of this approach, for symmetric $G$, are given by Huber (1964) for a
164 Outliers in statistical data    Accommodation of outliers in univariate samples 165

range of values of $K$ and $\lambda$. Minimax optimality and asymptotic variances are highly limited criteria. Still, it is of some relevance (if not specific to outliers) that various M-type estimators find broad support on various bases in the study by Andrews et al. (1972). No simple prescription of 'best estimator' is feasible. The different contributors nevertheless tend to include the following among their recommendations:

- M-estimators and one-step Huber estimators (both using (4.3.1), with the median deviation to estimate $\sigma$ and with $K$ in the vicinity of 1.5);
- Huber's proposal 2 (with somewhat smaller $K$);
- the three-part descending estimators (with $a$, $b$, $c$ in the regions of 2, 4, and 8, or with $a$ chosen adaptively and $b$, $c$ in the regions of 4 and 8).

From Siddiqui and Raghunandanan (1967) we can make a limited asymptotic comparison of the Hodges-Lehmann estimator, the trimmed and Winsorized means, and the estimator (4.2.26), that is,

$$T_n = \gamma\left(x_{([pn]+1)} + x_{(n-[pn])}\right) + (1 - 2\gamma)\,m,$$

where $m$ denotes the sample median. Consider the mixture model where $F$ and $G$ are $N(\mu, \sigma^2)$ and $N(\mu, 9\sigma^2)$, respectively, with the mixing parameter $\lambda$ restricted to at most 0.05. Here there is little to choose between the first three estimators (the best trimming factor has a value in the region of $\alpha = 0.20$). Their minimum efficiency is about 95 per cent, almost 10 per cent higher than that of the best form of (4.2.26). (It is worth noting that Gastwirth's version (4.2.27) achieves an efficiency of about 80 per cent or more for the set of inherent alternatives: normal, Cauchy, double-exponential, logistic. See Gastwirth, 1966.)

Asymptotic properties need, however, to be augmented with information on finite-sample behaviour.

In Gastwirth and Cohen (1970) we find tables of means, variances, and covariances of order statistics for samples of sizes up to 20 from contaminated normal distributions

$$(1-\lambda)F + \lambda G,$$

where $F$ is $N(0, 1)$, $G$ is $N(0, 9)$ and $\lambda = 0.01, 0.05, 0.10$. These are useful for comparing the performance of order-statistics-based robust linear estimators of the mean in the corresponding range of contaminated normal distributions. The authors consider the mean, median, trimmed means, Winsorized means, combinations of the median with equally weighted fractiles of the form (4.2.26), and the Hodges-Lehmann estimator. Their tabulated results show that it is again the trimmed means which perform well in terms of minimax variance, both asymptotically and at the different finite sample sizes. Typically (for $n = 20$) the trimming factor $\alpha$ needed is in the regions of

    0.15-0.20  ($0 \le \lambda \le 0.1$),
    0.10-0.15  ($0 \le \lambda \le 0.05$).

(Note that published support for $\alpha$-values as high as 0.25-0.30 is based on minimax performance over wide-ranging families of distinct distributions, from normal to Cauchy. Such a catholic situation does not accord with the outlier models we have been considering.)

Some Monte Carlo results reported by Huber (1972), from the Andrews et al. (1972) work, are illuminating. For samples of size 20 he compares 20 estimators in terms of their estimated variance when $(20 - k)$ observations come from $N(0, 1)$ and $k$ come from $N(0, 9)$. This amounts to a scale-slippage type model. When $k = 1$ some estimators show up better than others:

- the $\alpha$-trimmed mean with $\alpha = 0.05$ or 0.10;
- H20 and H15 (Huber's 'proposal 2' with $K = 2.0$ or 1.5);
- A15 and P15 (Huber M-estimates with $K = 1.5$ and a median-deviation scale estimate, in direct form and in one-step form starting with the median);
- 25A (Hampel's three-part descending estimator with $a = 2.5$, $b = 4.5$, $c = 9.5$).

For $k = 2$ and $k = 3$ the $\alpha$-trimmed means remain impressive, with $\alpha$ advancing to 0.1 or 0.15 for $k = 2$, and to 0.15 for $k = 3$. H15 (for $k = 2$) and H10 (for $k = 3$) also remain impressive, as do A15, P15, and 25A for $k = 2$ alone. With 18 observations from $N(0, 1)$ and two from $N(0, 100)$, 25A stands out as better than most other estimators. The estimated variances often show only small differences. To put the above recommendations in perspective it is useful to examine Table 4.2, extracted from the tabulated results in Huber (1972).

We should also bear in mind the computational effort involved in constructing the estimators. Trimmed means are fairly easily determined. The various Huber-type estimators (such as H15, A15, P15) may need some iteration but are not unreasonable. In comparison, the Hodges-Lehmann estimator can be most time-consuming.

Hodges (1967) uses Monte Carlo methods to examine two properties of some simple location estimators. The first is how efficient they are for estimating the mean of a normal distribution. The second is how well they 'tolerate extreme values', in the sense of not being influenced by the $r$ lowest and $r$ highest extremes. Thus $\bar{x}_{r,r}$ and $\bar{x}^{(w)}_{r,r}$ have 'tolerance' $r$, the median $m$ has tolerance $[(n-1)/2]$, and $\bar{x}$ has tolerance 0. A modified, more easily calculated type of Hodges-Lehmann estimator, BH, is the median of the means of symmetrically chosen pairs of ordered observations (the Bickel-Hodges folded median; see also Bickel and Hodges, 1967). BH has tolerance $[(n-1)/4]$. Sampling experiments with $n = 18$ show it to have efficiency about 95 per cent relative to $\bar{x}$. But this needs careful interpretation. No alternative outlier-generating model is contemplated: we are merely estimating $\mu$ with reduced consideration of extreme values, whether or not they are discordant. We do not learn how BH compares with other estimators for a prescribed mixture- or slippage-type model.

Other performance characteristics are also important, for example those based on the influence curve. These include such features as gross-error-sensitivity, local-shift-sensitivity, and rejection point. Hampel (1974) tabulates such quantities for many of the estimators we have discussed, assuming
an uncontaminated normal distribution. We would be better armed for current purposes if the distribution were of a mixture or slippage type, but the results are still interesting. One comment, in particular, adds to the summary above:

    three-part descending M-estimators pay a small premium in asymptotic variance or gross-error-sensitivity, as compared with Huber estimators, in order to be able to reject outliers completely. (Hampel, 1974)

Table 4.2 Monte Carlo variances of $\sqrt{n}\,T_n$ for selected estimators and distributions; sample size $n = 20$

Columns: (1) N(0, 1), n = ∞; (2) N(0, 1), n = 20; (3)-(5) (n - k)N(0, 1) plus kN(0, 9), n = 20, with k = 1, 2, 3 respectively; (6) 18N(0, 1) plus 2N(0, 100).

    Estimator                        (1)     (2)     (3)     (4)     (5)     (6)
    Mean                             1.00    1.00    1.40    1.80    2.20   10.90
    Trimmed mean, α = 0.05           1.026   1.02    1.16    1.39    1.64    2.90
    Trimmed mean, α = 0.10           1.060   1.06    1.17    1.31    1.47    1.46
    Trimmed mean, α = 0.15           1.100   1.10    1.19    1.32    1.44    1.43
    Trimmed mean, α = 0.25           1.195   1.20    1.27    1.41    1.50    1.47
    Median                           1.571   1.50    1.52    1.70    1.75    1.80
    Huber (1964) prop. 2, K = 2.0    1.010   1.01    1.17    1.41    1.66    1.78
    Huber (1964) prop. 2, K = 1.5    1.037   1.04    1.16    1.32    1.49    1.50
    Huber (1964) prop. 2, K = 1.0    1.107   1.11    1.21    1.34    1.44    1.43
    Huber (1964) prop. 2, K = 0.7    1.187   1.20    1.27    1.42    1.49    1.47
    Hodges-Lehmann                   1.047   1.06    1.18    1.35    1.50    1.52
    Gastwirth (1966)                 1.28    1.23    1.30    1.45    1.52    1.50
    Jaeckel (1969)                   1.000   1.10    1.21    1.37    1.47    1.45
    Hogg (1967)                      1.000   1.06    1.28    1.56    1.79    1.79
    Takeuchi (1971)                  1.000   1.05    1.19    1.38    1.53    1.32
    A15                              1.037   1.05    1.17    1.33    1.47    1.49
    P15                              1.037   1.05    1.17    1.33    1.47    1.49
    Hampel 25A                       1.025   1.05    1.16    1.32    1.49    1.26
    Hampel 12A                       1.166   1.20    1.26    1.40    1.47    1.32

A detailed study of the accommodation of outliers in slippage models is presented by Guttman and Smith (1969, 1971) and Guttman (1973a). They consider three specific methods:

- modified trimming (the 'A-rule', after Anscombe, 1960a);
- modified Winsorization (the 'W-rule');
- semi-Winsorization (the 'S-rule').

These are applied to estimating the mean (Guttman and Smith, 1969; Guttman, 1973a) and the variance (Guttman and Smith, 1971). The setting is a normal basic model $N(\mu, \sigma^2)$ with a location, or scale, slippage alternative model to explain the behaviour of one or two observations. Performance characteristics are restricted to the premium-protection ideas of Anscombe (1960a), that is, variance ratios or relative efficiencies. (See Sections 2.6 and 4.1.1.)

Consider first the case where we wish to estimate $\mu$ robustly under Ferguson's model A for slippage of the mean or model B for slippage of scale. Here we assume that $x_1, x_2, \ldots, x_n$ arise from $N(\mu, \sigma^2)$. We entertain the prospect that at most one observation may have arisen instead from $N(\mu + a\sigma, \sigma^2)$ (model A) or from $N(\mu, b\sigma^2)$ with $b > 1$ (model B). In either case the three robust estimators considered are those described in Section 4.2.1. These are the modified trimmed mean, the modified Winsorized mean, and the semi-Winsorized mean, which we denote $T_A$, $T_W$, and $T_S$. Guttman and Smith (1969) determine and compare the finite-sample premium and protection measures for these estimators. Detailed results are presented for the case where $\sigma^2$ is known. When $\sigma^2$ is unknown (and replaced by the full-sample unbiased variance estimate $s^2$), computational difficulties restrict the amount of information readily obtainable.

Under the basic model $\bar{x}$ is optimal for $\mu$. Put $\tilde{\mu} = \bar{x}$ in (4.1.2) and (4.1.3), with the obvious modification of the latter to allow for any bias in the typical candidate estimator $T$. We can then determine the premium and protection measures for $T_A$, $T_W$, and $T_S$. The premiums have the general form

$$\text{Premium} = nE(U^2)/\sigma^2 \qquad (4.3.2)$$

when $T$ is re-expressed as $\bar{x} + U$. The protection is

$$\{E[(\bar{x}-\mu)^2] - E[(T-\mu)^2]\}/E[(\bar{x}-\mu)^2]$$

evaluated under the alternative hypothesis. For model A and model B, respectively, we have

$$\text{Protection} = -n^2 E[U(U + 2a\sigma/n)]/[\sigma^2(n + a^2)] \qquad \text{(model A)} \quad (4.3.3)$$

$$\text{Protection} = -n^2 E[U(U + 2\bar{x} - 2\mu)]/[\sigma^2(n + b - 1)] \qquad \text{(model B)} \quad (4.3.4)$$

To determine (4.3.2) and (4.3.3) or (4.3.4) we have to investigate the first two moments of the incremental estimators $U_A$, $U_W$, and $U_S$ under the basic and alternative models. Simple closed-form expressions are not available. Guttman and Smith (1969) instead develop an appropriate computational (Monte Carlo type) procedure. They provide graphs and tables comparing $T_A$, $T_S$, and $T_W$ for sample sizes up to 10, at premium levels of 5 per cent and 1 per cent, and for different values of the slippage parameters $a$ and $b$. The general conclusions are as follows. Under model A, $T_S$ is best for small $a$, $T_W$ for intermediate $a$, and $T_A$ for large $a$. Under model B, $T_A$ is not a contender; $T_S$ is best for small $b$ and $T_W$ for large $b$.

We need to recognize the limitations of these results. Comparisons are purely relative within the set $\{T_A, T_W, T_S\}$; we do not know how these estimators compare with others. The values of the slippage parameters $a$ or $b$ will be unknown, so choice within the set is problematical. Only sample sizes up to $n = 10$ are considered in detail. The assumption that $\sigma^2$ is known is unrealistic; only for the
case $n = 3$ are any results available when $\sigma^2$ is unknown. Restriction to 5 per cent and 1 per cent premium levels is arbitrary. We need to learn from experience, and from intercomparison of different measures, whether these levels are reasonable in practical terms. The estimators are defined in terms of cut-off values $c$: a residual is modified if it exceeds $c\sigma$ in absolute value. The value of $c$ needs to be determined in any situation. It depends on the chosen premium, the sample size and the type of estimator. Guttman and Smith (1969) tabulate approximate values of $c$ for premiums of 5 per cent and 1 per cent and $n = 3, 4(2)10$.

Extensions to larger sample sizes (encompassing the prospect of one or two discordant values) are considered by Guttman (1973a), again principally for the case of known $\sigma^2$. An interesting feature of this work is the replacement of the residuals by adjusted (independent) residuals. This facilitates the calculation of premium and protection in larger sample sizes. See also Tiao and Guttman (1967). (The multivariate case is considered in Section 7.3.1.)

Under the basic model the residuals $x_j - \bar{x}$ ($j = 1, 2, \ldots, n$) have common variance $(n-1)\sigma^2/n$ and covariance $-\sigma^2/n$. Suppose $u$ is an observation from $N(0, 1)$, independent of the $x_j$. Adding the common term $\sigma u/\sqrt{n}$, of variance $\sigma^2/n$, raises each variance to $\sigma^2$ and cancels the covariance. Thus the adjusted residuals

$$z_j = x_j - \bar{x} + \sigma u/\sqrt{n} \qquad (j = 1, 2, \ldots, n) \qquad (4.3.5)$$

are independent observations from $N(0, \sigma^2)$. For reasonable sample sizes the $z_j$ will differ little from the true residuals $x_j - \bar{x}$. The induced independence, however, renders the performance measures of the corresponding $T_A$, $T_S$, $T_W$ more tractable to determine, and enables some quantitative comparisons to be made. See Guttman (1973a) for details.

Some detailed numerical studies have also been made, specifically for a normal basic model, of studentized location estimators of the form $\sqrt{n}(T - \mu)/S$. Dixon and Tukey (1968) study, by qualitative arguments and Monte Carlo methods, the sampling behaviour of

$$t_w = (\bar{x}^{(w)}_{r,r} - \mu)\Big/\sqrt{S^{(w)2}_{r,r}/[n(n-1)]}. \qquad (4.3.6)$$

They conclude that $(h-1)t_w/(n-1)$ has a distribution which is well approximated by Student's t distribution with $h - 1$ degrees of freedom ($h = n - 2r$). They give no consideration to how the distribution of $t_w$ changes over, say, a mixture model $(1-\lambda)F + \lambda G$ where $F$ and $G$ are similarly centred but differently scaled normal distributions. Such a model would be germane to the outlier problem.

Just such a mixture model is examined, however, by Leone, Jayachandran, and Eisenstat (1967). Again by Monte Carlo methods, they examine the sampling behaviour of studentized forms, $\sqrt{n}(T - \mu)/S$, of robust location estimators. They use the mixture model $(1-\lambda)F + \lambda G$ with $\lambda = 0.05$ and 0.10, with $F$ and $G$ both normal. Symmetric and asymmetric contamination are considered: specifically $F$ is $N(\mu, \sigma^2)$ and $G$ is $N(\mu + a, b\sigma^2)$ with $a = 0, \tfrac{1}{2}, 1$ and $b = 1, 9, 25$. The estimators considered are various joint estimators $(T, S)$ obtained under Huber's proposal 2. The goodness-of-fit of $\sqrt{n}(T - \mu)/S$ to Student's t distribution was examined for samples of size $n = 20$. They also study the extent to which their T-estimators, and the Hodges-Lehmann estimator, have approximately normal distributions. Broad conclusions from a mass of empirical results include the following.

(i) For proximity of $\sqrt{n}(T - \mu)/S$ to Student's t distribution over the contemplated range of models, a reasonable choice of $K$ in Huber's proposal 2 is in the region of 1.8 or 1.9. The fit is reasonable except for extreme cases such as $\lambda = 0.1$, $b = 25$.

(ii) The Huber ($K = 1, 1.5, 2$) and Hodges-Lehmann location estimators are reasonably normal, except again in extreme cases, e.g. $\lambda = 0.1$, $a = 1$, $b = 25$.

Dispersion estimators

There has been little detailed study of robust dispersion estimators per se. This holds both for a general family of distinct distributional models and for specific cases such as families of mixed, or slipped, normal distributions. We have referred en passant to particular estimators based on:

- the interquartile range, $Q$;
- the median deviation (e.g. Hampel, 1974)

$$s_m = \operatorname{median}\{|x_{(j)} - m|\};$$

- quasi-ranges (but specifically for the uncontaminated normal distribution: Dixon, 1960);
- quadratic measures using trimmed or Winsorized samples, such as $S^2_{r,r}$ (e.g. Dixon and Tukey, 1968) or, with an analogous notational interpretation, $S^{(w)2}_{r,r}$.

These have been used (with especial support for $s_m$) in an auxiliary role in constructing robust location estimators such as Huber-type estimators. They have also been used in considering robust studentized location estimators.

Specific proposals for mixture and slippage models with normal distributions are made by Huber (1964) and Guttman and Smith (1971).

Huber (1964) re-expresses the estimation of a scale parameter $\sigma$ for a random variable $X$ in terms of the estimation of a location parameter for $Y = \log(X^2)$. Thus we are estimating $\tau = \log(\sigma^2)$.

He shows that for the contaminated normal case (the mixture model) there is a minimax M-estimator $\hat{\tau}$ of $\tau$ satisfying

$$\sum_{j=1}^{n} \chi(y_j - \hat{\tau}) = 0,$$

where

$$\chi(t) = \begin{cases} \tfrac{1}{2}(e^t - 1), & \tfrac{1}{2}|e^t - 1| < c \\ c \operatorname{sgn}(e^t - 1), & \tfrac{1}{2}|e^t - 1| \ge c \end{cases} \qquad (4.3.7)$$
for an appropriate choice of $c$. The estimator minimizes the maximal asymptotic variance within the class of all estimators of $\tau = \log(\sigma^2)$ which are invariant under changes of scale of the $x_i$. Unfortunately, the maximization process takes place only over a set of contaminating distributions concentrated on $\{|X| > q\}$ where (if $c \ge \tfrac{1}{2}$) $q^2 = 2c + 1$. This restricts the range of prospects in the mixture model.

In terms of the $x_i$ the robust estimator of $\sigma$ is $\tilde{\sigma} = e^{\tilde{\tau}/2}$ and arises from solving

(4.3.8)

where

$$\psi(q, t) = \begin{cases} t, & |t| < q,\\ q\,\operatorname{sgn} t, & |t| \ge q. \end{cases} \qquad (4.3.9)$$

See Huber (1964) for more details, including discussion of some limitations of this approach.

It is also reasonable to contemplate dispersion estimators based on samples subjected to modified trimming, modified Winsorization or semi-Winsorization. Guttman and Smith (1971) define robust dispersion estimators $S_A^2$, $S_W^2$, and $S_S^2$ of $\sigma^2$, analogous to their location estimators $T_A$, $T_W$, and $T_S$, for a normal slippage model where all but possibly one observation arise from $N(\mu, \sigma^2)$ and at most one discordant value arises either from $N(\mu + a, \sigma^2)$ or from $N(\mu, b\sigma^2)$ with $b > 1$. The same principle applies of rejecting or modifying the observation with largest absolute residual, should this be sufficiently large. The proposed estimators take the forms shown in Table 4.3.

In each estimator $K$ must be prescribed, and $d$ is then chosen to ensure unbiasedness in the null case (no discordant value). The forms given in Table 4.3 apply to the more usual case where $\mu$ is unknown. If $\mu$ were known we would merely replace $\bar{x}$, $\bar{x}_{(1)}$, etc. in $s^2$, $s^2_{(1)}$, etc. by the true mean $\mu$. (Subscript indices in brackets refer to omitted ordered observations.)

Table 4.3 Forms of $S_A^2$, $S_W^2$, $S_S^2$

$$\begin{array}{llll}
S_A^2 & S_W^2 & S_S^2 & \text{Condition}\\ \hline
d\,s^2 & d\,s^2 & d\,s^2 & z_{(1)}^2 < Ks^2 \text{ and } z_{(n)}^2 < Ks^2\\
d\,s^2_{(1)} & d\max[s^2_{(2,1)}, s^2_{(n,1)}] & \dfrac{d}{n-1}\left[(n-2)s^2_{(1)} + Ks^2\right] & z_{(1)}^2 \ge Ks^2 \text{ and } z_{(1)}^2 > z_{(n)}^2\\
d\,s^2_{(n)} & d\max[s^2_{(1,n)}, s^2_{(n-1,n)}] & \dfrac{d}{n-1}\left[(n-2)s^2_{(n)} + Ks^2\right] & z_{(n)}^2 \ge Ks^2 \text{ and } z_{(n)}^2 > z_{(1)}^2
\end{array}$$

The general forms of the premium and protection measures under the two types of alternative model (mean-slippage and variance-slippage) are exhibited by Guttman and Smith, who also consider the problem of their numerical determination. Results are given only for the primitive case $n = 3$, in which one of the estimators is not worth considering when $\mu$ is known and two of them (including $S_A^2$) are likewise not worth considering when $\mu$ is unknown. These results thus provide highly limited practical guidance on the robustness of the estimators $S_A^2$, $S_W^2$, and $S_S^2$ for reasonable sample sizes.

4.4 ACCOMMODATION OF OUTLIERS IN EXPONENTIAL SAMPLES

Suppose our sample $x_1, x_2, \ldots, x_n$ comes from an exponential distribution with density

$$f(x, \sigma) = \frac{1}{\sigma}\exp(-x/\sigma) \qquad (4.4.1)$$

but for the prospect that one observation may be discordant, arising from a distribution with density $f(x, b\sigma)$ for some $b > 1$. We have a basic model $H$: $f(x, \sigma)$, and apparently a scale-slippage alternative model $\bar{H}$ where the slippage relates to just one observation. Since any discordant value under $\bar{H}$ is likely to be one of the upper extremes, it seems sensible to consider estimators of $\sigma$ which minimize the influence of the higher-ordered sample values. Among restricted L-type estimators

$$S_{(l)} = \sum_{j=1}^{m} l_j x_{(j)} \qquad (4.4.2)$$

(which ignore the $n - m$ largest ordered observations) Kale and Sinha (1971) show that the one-sided Winsorized mean

$$S_{m,n} = \frac{1}{m+1}\left[\sum_{j=1}^{m-1} x_{(j)} + (n - m + 1)\,x_{(m)}\right] \qquad (4.4.3)$$

is 'optimal' in the sense of minimizing $\mathrm{MSE}(S_{(l)} \mid b = 1)$. This is demonstrated under the exchangeable version of the slippage model (see Section 2.3). Note that there is no suggestion of optimality under the alternative model where $b > 1$. In the null case (4.4.3) has efficiency $(m+1)/(n+1)$ relative to the optimal full-sample version $S_{n,n}$. It is argued, however, that this loss of efficiency under the basic model (where we do not need to protect against a discordant value) may be offset by a corresponding gain under $\bar{H}$ (where we do need to do so). Accordingly $\mathrm{MSE}(S_{m,n} \mid b > 1)$ is investigated. The cases $n = 3, 4$ are studied in detail, and this gain (relative to $S_{n,n}$) is confirmed for sufficiently large $b$. Typically, if $n = 4$ and $m = 3$, the relative efficiency rises from 0.8 at $b = 1.1$ to 1.0 at $b = 2$, 4 at $b = 5$, 15 at $b = 10$, and ultimately becomes infinite.

The questions of choice of $m$, and performance for larger $n$, are taken up by Joshi (1972b). When $b < 2$ it turns out that no $m \ne n$ improves on $S_{n,n}$.
But for more extreme discordancy (larger $b$) substantial gains in relative efficiency are available. Table XX on page 323 (extracted from Joshi, 1972b) presents the optimal choice $m^*$ for $m$ and the associated relative efficiency $e_{m^*}$ for values of $b$ in the range 2-20. Specifically, if $b = 1/h$ the table presents results for $h = 0.05(0.05)0.50$.

Of course, $b$ will not be known, and Joshi suggests an ad hoc procedure which consists of first calculating $S_{n-1,n}$ as a provisional estimate $\tilde{\sigma}$, then estimating $b$ from

$$nS_{n,n} = (n + b - 1)\,\tilde{\sigma} \qquad (4.4.4)$$

for the purpose of determining $m^*$ from Table XX. The corresponding $S_{m^*,n}$ is used for $\tilde{\sigma}$ in (4.4.4), and a new $m^*$ is determined from the table. The process is repeated until $m^*$ becomes stable, at which stage $\sigma$ is estimated by the corresponding $S_{m^*,n}$.

Example 4.5 Failures of a critical electronic component occur from time to time in a navigational aid. On failure the component is replaced by a new one. Records show the ordered values of lifetime for 9 components to be

1.6  2.8  2.9  4.1  9.8  14.1  16.7  22.1  54.3

Here $n = 9$ and we have $S_{9,9} = 12.84$ and $S_{8,9} = 10.689$. From (4.4.4) we get $b = 2.811$. Thus from Table XX, $m^* = 8$. We do not need to proceed further. We estimate $\sigma$ by 10.689 for an efficiency gain (if $b$ is truly 2.811) of about 28 per cent.

Other aspects of this approach are discussed by Sinha (1973a; moment properties and limiting form of the MSE), Sinha (1973c; refinements for the two-parameter, location-shifted, case), and Sinha (1973d; some exact distributional results, including lengths of confidence intervals for $n = 4$, $m = 3$).

Veale and Kale (1972) consider a corresponding hypothesis test. Under $H$ the UMP size-$\alpha$ test of $H_0$: $\sigma = 1$ versus $H_1$: $\sigma > 1$ has critical region of the form:

The robustness of this test is examined by considering its performance under the contaminated model $\bar{H}$. An expression for the power function $\beta(b, \sigma)$ is obtained. Sinha proceeds to examine tests based on $S_{m,n}$ ($m < n$). For any $m$, a UMP size-$\alpha$ test again exists, with rejection for sufficiently large $S_{m,n}$. However, consideration of power shows, not surprisingly, that for the basic (uncontaminated) model we are best to take $m = n - 1$ if we cannot take $m = n$. Robustness properties of this test are discussed in terms of 'premium' and 'protection' measures (but see Section 4.1.4), and some tabulated values are presented.

Kale (1975c) presents a wider study of robust estimation of scale parameters under an exchangeable model consisting of two components in the exponential family, with $(n - k)$ observations from one and $k$ (not necessarily just one) from the other. Employing a maximum likelihood approach he obtains, in the case of an exponential distribution, a trimmed mean estimator

$$S^*_{k,n} = \sum_{j=1}^{n-k} x_{(j)}\Big/(n - k) \qquad (4.4.5)$$

rather than the Winsorized mean of the Kale and Sinha (1971) approach. (Trimmed means also arise for location estimates in the normal case with known, common, $\sigma^2$.)

The trimmed and Winsorized means are compared for $k = 1$ using premium-protection measures. $S^*_{1,n}$ provides greater protection (lower MSE as $b \to \infty$) than $S_{n-1,n}$, but at a higher premium (higher MSE as $b \to 1$). We should recall, however, that $S_{n-1,n}$ is not necessarily the optimal form of $S_{m,n}$.
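The exponential-sample estimators of Section 4.4 are simple to compute directly. The following Python sketch implements the one-sided Winsorized mean (4.4.3), the trimmed mean (4.4.5), the estimate of $b$ obtained by solving (4.4.4), and Joshi's iteration. All function names are illustrative assumptions, not taken from the source. Table XX is not reproduced here, so the choice of $m^*$ is left to a caller-supplied function; the stub used in the usage lines is a hypothetical placeholder, not Joshi's table.

```python
from typing import Callable, Sequence, Tuple


def winsorized_scale(x: Sequence[float], m: int) -> float:
    """One-sided Winsorized mean S_{m,n} of (4.4.3).

    The n - m largest ordered values are replaced by x_(m) and the total is
    divided by m + 1, so that S_{n,n} = sum(x) / (n + 1).
    """
    xs = sorted(x)
    n = len(xs)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    return (sum(xs[: m - 1]) + (n - m + 1) * xs[m - 1]) / (m + 1)


def trimmed_scale(x: Sequence[float], k: int) -> float:
    """Trimmed mean (4.4.5): average of the n - k smallest observations."""
    xs = sorted(x)
    n = len(xs)
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < n")
    return sum(xs[: n - k]) / (n - k)


def estimate_b(x: Sequence[float], sigma_tilde: float) -> float:
    """Solve n * S_{n,n} = (n + b - 1) * sigma_tilde, i.e. (4.4.4), for b."""
    n = len(x)
    return n * winsorized_scale(x, n) / sigma_tilde - n + 1


def joshi_iteration(
    x: Sequence[float],
    lookup_m_star: Callable[[int, float], int],  # hypothetical stand-in for Table XX
    max_iter: int = 20,
) -> Tuple[float, int]:
    """Repeat sigma_tilde -> b -> m* until m* stops changing."""
    n = len(x)
    m_star = n - 1
    sigma_tilde = winsorized_scale(x, m_star)  # provisional estimate S_{n-1,n}
    for _ in range(max_iter):
        b = estimate_b(x, sigma_tilde)
        new_m = lookup_m_star(n, b)
        sigma_tilde = winsorized_scale(x, new_m)
        if new_m == m_star:  # m* is stable: stop
            break
        m_star = new_m
    return sigma_tilde, m_star


if __name__ == "__main__":
    lifetimes = [1.6, 2.8, 2.9, 4.1, 9.8, 14.1, 16.7, 22.1, 54.3]  # Example 4.5
    print(round(winsorized_scale(lifetimes, 9), 3))  # 12.84
    print(round(winsorized_scale(lifetimes, 8), 3))  # 10.689
    print(round(estimate_b(lifetimes, winsorized_scale(lifetimes, 8)), 3))  # 2.811
    # Hypothetical stub: pretend the table returns m* = 8 for n = 9 at this b.
    print(joshi_iteration(lifetimes, lambda n, b: 8))
```

Passing the table lookup as a function keeps the iteration separate from the tabulated values, which would have to be transcribed from Joshi (1972b) before this could be used in practice.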
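The premium-protection comparison of the trimmed mean $S^*_{1,n}$ of (4.4.5) with the Winsorized mean $S_{n-1,n}$ of (4.4.3) can be explored by simulation. This Monte Carlo sketch assumes $n - 1$ observations with scale $\sigma$ and exactly one with scale $b\sigma$, as in the slippage model of Section 4.4. The function names, replication count and seed are illustrative choices, not from the source.

```python
import random
from typing import List, Tuple


def winsorized_scale(xs_sorted: List[float], m: int) -> float:
    """S_{m,n} of (4.4.3), for data already sorted in ascending order."""
    n = len(xs_sorted)
    return (sum(xs_sorted[: m - 1]) + (n - m + 1) * xs_sorted[m - 1]) / (m + 1)


def trimmed_scale(xs_sorted: List[float], k: int) -> float:
    """Trimmed mean (4.4.5), for data already sorted in ascending order."""
    n = len(xs_sorted)
    return sum(xs_sorted[: n - k]) / (n - k)


def simulated_mse(
    n: int, b: float, reps: int = 40_000, sigma: float = 1.0, seed: int = 1
) -> Tuple[float, float]:
    """Monte Carlo MSEs of (S_{n-1,n}, S*_{1,n}) when one of n values has scale b*sigma."""
    rng = random.Random(seed)
    sq_err_w = 0.0
    sq_err_t = 0.0
    for _ in range(reps):
        # expovariate takes the rate, i.e. 1 / scale.
        x = [rng.expovariate(1.0 / sigma) for _ in range(n - 1)]
        x.append(rng.expovariate(1.0 / (b * sigma)))  # the single slipped value
        x.sort()
        sq_err_w += (winsorized_scale(x, n - 1) - sigma) ** 2
        sq_err_t += (trimmed_scale(x, 1) - sigma) ** 2
    return sq_err_w / reps, sq_err_t / reps
```

For example, `simulated_mse(10, 1.0)` estimates the premium side (under the basic model the Winsorized MSE should be close to $\sigma^2/n$, since $\mathrm{MSE}(S_{m,n}) = \sigma^2/(m+1)$ when $b = 1$), while `simulated_mse(10, 100.0)` illustrates the protection side, where the trimmed mean should do better.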
Outlying sub-samples: slippage tests 175
CHAPTER 5

Outlying Sub-Samples: Slippage Tests

A different type of outlier problem may arise in situations where a set of data can be divided into distinct sub-samples. The sub-samples may correspond with different levels (qualitative or quantitative) of some factor of classification, or with different combinations of some set of factors. The subdivision of the sample into the sub-samples may take place after we have collected a random sample from some overall population, in which case the sub-sample sizes, $m_i$, are random quantities. Alternatively, and more likely, we may choose random samples of prescribed sizes, $m_i$, at different factor levels or under different circumstances: these samples in combination serve as sub-samples in the overall data set. This is the case in many designed experiments, and one interest (examinable by analysis of variance techniques on the customary assumptions of normality, additivity, and homoscedasticity) is in the comparison of the means of the populations from which the 'sub-samples' arise. The presence and effect of individual outlying observations in such sub-samples, in the context of an analysis of variance, will be discussed in Chapter 7.

Analysis of variance techniques serve to test the homogeneity of the means of the different factor-level populations against the alternative that not all means are equal: they accord to some pattern expressed by a linear model reflecting factor effects. A more general consideration can, however, be contemplated. Examination of the sub-samples (or some summary sub-sample measures, such as their means or variances) may manifest individual outlying sub-samples, and appropriate tests (analogous to tests of discordancy for individual outlying observations in a single sample) are of interest.

A special case might be where we believe that the sub-samples have arisen from populations with identical means, with the alternative prospect that in just one (or a few) populations the mean has 'slipped' up or down from the predominant level. Or perhaps the variance of one (or a few) populations is larger than the common variance of the majority of the sub-populations. Tests exist for such problems. Termed slippage tests, they are germane to a general study of outliers.

Consider the data shown diagrammatically in Figure 5.1. This shows measurements by Claridge and Potter (1974 and personal correspondence) of the heart ratios of river lampreys (Lampetra fluviatilis) for random samples at different stages of development: ammocoete (larva), during metamorphosis, and during the spawning run (two dates). We have a sample of 101 observations made up of four independent samples of predetermined sizes 19, 20, 27, and 35.

[Figure 5.1 Heart ratios of river lampreys at different stages of development. Rows: Ammocoete (m = 19), Metamorphosis (m = 20), Spawning (November) (m = 27), Spawning (January) (m = 35); horizontal axis marked 0.10, 0.15, 0.20, 0.25.]

Whilst the results may not be surprising from the biological standpoint, they illustrate the statistical matters discussed above. The ammocoete sub-sample might well be regarded as an outlier, and a test for slippage of its mean below the means for the more adult populations could be informative. Then again, some might adjudge the metamorphosis sample an outlier in terms of its larger dispersion.

In discussing below the basis and nature of slippage tests in general we shall, inter alia, refer to specific methods for testing for slippage of the mean or of the variance, and can later examine the lamprey data in more detail.

The slippage problem is clearly closely related to the earlier study of tests of discordancy for individual outlying observations in a single sample. We shall see that in some circumstances the statistical methods developed for individual outliers may be immediately carried over. This is trivially so if all sub-sample sizes are unity. But it can arise in other cases. For example, in testing for upward slippage of the mean in one of a set of normal distributions with common known variance, it is plausible that with equal-sized samples the sufficiency of the sub-sample means will effectively reduce the problem to a test of an upper outlier in a single sample where the basic observations are the sample means. But in other situations this direct reinterpretation will not hold and new methods will result.
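The reduction described above, for equal-sized normal samples with common known variance $\sigma^2$, can be sketched as follows: each sub-sample is replaced by its mean, and the means (each with standard deviation $\sigma/\sqrt{m}$) are treated as a single sample. As the statistic this sketch assumes the familiar known-variance upper-outlier form (largest value minus overall mean, divided by its standard deviation) applied to the means; that choice, and the function name, are illustrative assumptions. No critical values are computed here.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple


def max_mean_deviation(
    samples: Sequence[Sequence[float]], sigma: float
) -> Tuple[int, float]:
    """Return (index of largest sub-sample mean, (xbar_max - grand mean) / (sigma / sqrt(m)))."""
    m = len(samples[0])
    if any(len(s) != m for s in samples):
        raise ValueError("this sketch assumes equal sub-sample sizes")
    means = [fmean(s) for s in samples]
    grand = fmean(means)  # equals the overall mean when all sizes are equal
    i_max = max(range(len(means)), key=lambda i: means[i])
    # Under the working hypothesis each mean has standard deviation sigma / sqrt(m).
    stat = (means[i_max] - grand) / (sigma / math.sqrt(m))
    return i_max, stat
```

Large values of the statistic would point to upward slippage of the sub-sample at the returned index; for downward slippage the smallest mean would be used instead.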
Any relationship with earlier single-sample results will of course only arise in the context of the slippage-type alternative hypothesis for outlier generation. Indeed, this is why the slippage-type model was so termed. Again, we must expect to encounter the usual spectrum of distributional, inferential, methodological, and dimensional distinctions. For example, slippage tests have been developed for normal, gamma (and other) populations; identification of a slipped population is of prime interest, but robust estimation in the presence of slippage is also contemplated; non-parametric, Bayesian, and multiple-decision approaches have been used; slippage of multivariate populations has been considered. Paralleling the study of outliers in single samples, we shall again need to distinguish techniques designed for slippage of a single population from those which contemplate (not necessarily similar) slippage of several populations; we also find once more some informality in the expression of the alternative hypothesis which makes precise classification of aim or principle rather difficult on occasions. This leads to some confusion in the literature on what is meant by 'the slippage problem' and on how it differs from analysis of variance (for designed experiments), from problems with an alternative hypothesis expressing an order relationship among the population parameters, or from identification and ranking procedures for means, variances, or distributions. We shall comment further on this matter in Section 5.2.

Since the slippage problem is relatively self-contained, we shall break with the principle we have adopted in other parts of the book and draw together all the various distinctions within the present chapter rather than, for example, deferring until Chapter 8 discussion of the relevant non-parametric or Bayesian methods.

5.1 NON-PARAMETRIC SLIPPAGE TESTS

The earliest work on slippage, and apparently the first use of the term, is by Mosteller (1948). He considers a situation in which $n$ equal-sized random samples, each of size $m$, arise from continuous distributions which are unspecified but identical, except for the possibility that one of them may have slipped in location to the right. Mosteller's non-parametric test has stimulated others. In reviewing these it is convenient to distinguish the situation where at most one population has slipped from that where several populations may have slipped.

5.1.1 Non-parametric tests for slippage of a single population

Mosteller (1948) approaches the problem of slippage of a single population in the following way.

On the working hypothesis, $H$, each of the $n$ samples of $m$ observations arises independently from a distribution with density function $f(x)$; on the alternative hypothesis the $i$th sample (with $i$ unspecified) comes from a distribution with density function $f(x - a)$ ($a > 0$, unknown). A non-parametric test of $H$ is proposed based on rank orders of the observations in the combined sample of $nm$ observations.

Suppose that the samples are ordered in terms of their maximum observations and denoted $\mathcal{S}_{(1)}, \mathcal{S}_{(2)}, \ldots, \mathcal{S}_{(n)}$, where $\mathcal{S}_{(1)}$ contains the overall maximum observation, $\mathcal{S}_{(2)}$ the second largest maximum, and so on. We shall refer to $\mathcal{S}_{(i)}$ as the sample of rank $i$. Let $M(i, j)$ represent the number of observations in $\mathcal{S}_{(i)}$ which exceed all those in $\mathcal{S}_{(j)}$ (and, of course, in $\mathcal{S}_{(j+1)}, \ldots, \mathcal{S}_{(n)}$).

Mosteller's test uses as test statistic $M(1, 2)$: the number of observations in $\mathcal{S}_{(1)}$ which exceed all observations in the other samples. If $M(1, 2)$ is sufficiently large we reject $H$ and conclude that $\mathcal{S}_{(1)}$ comes from a population which has slipped in location to the right. (A corresponding test for slippage to the left has the obvious form based on ranking the samples in terms of their minimum observations.)

The null distribution of $M(1, 2)$ is easily determined. Let

$$F(r; s, t) = \begin{cases} \dfrac{s!\,(t-r)!}{(s-r)!\,t!} = \dbinom{s}{r}\Big/\dbinom{t}{r}, & r \le s \le t,\\[1ex] 0, & r > s,\ s \le t. \end{cases} \qquad (5.1.1)$$

Then, when $H$ is true,

$$P\{M(1,2) \ge h\} = F(h-1;\, m-1,\, mn-1) \qquad (5.1.2)$$

$$= n\binom{m}{h}\Big/\binom{mn}{h} = n\,m^{(h)}\big/(mn)^{(h)} \qquad (5.1.3)$$

where $s^{(r)} = s!/(s-r)!$. Mosteller (1948) gives a brief table of these tail probabilities for $n = 2(1)6$, $h = 2(1)6$, $m = 3(2)7, 10(5)25, \infty$. He also presents an asymptotic form for $P\{M(1,2) \ge h\}$ as $n^{-h+1}$, but this requires $m$ to be quite large (in excess of about 25) for reasonable accuracy.

With unequal sample sizes $m_1, m_2, \ldots, m_n$ ($M = \sum_i m_i$) no change of principle arises. We now have

$$P\{M(1,2) \ge h\} = \sum_{i=1}^{n} m_i^{(h)}\big/M^{(h)}. \qquad (5.1.4)$$

In discussing this case, Mosteller and Tukey (1950) give an improved asymptotic form in the equal-sized sample case as

$$P\{M(1,2) \ge h\} = n^{-h+1}\exp\left[\left\{-\frac{h(h-1)(n-1)}{2mn}\right\}\left\{1 + \frac{2h-1}{6m}\right\}\right] \qquad (5.1.5)$$

and suggest that we allow for different-sized samples by assuming that we have $n^*$ effective equal-sized samples of size $M/n^*$ where

(5.1.6)
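Mosteller's statistic and its null tail probabilities (5.1.1)-(5.1.5) are easy to compute exactly. The Python sketch below (function names are illustrative, not from the source) computes $M(1,2)$ for a list of samples, $F(r; s, t)$ of (5.1.1), the exact tail probability (5.1.4), which reduces to (5.1.3) for equal sizes, and the Mosteller-Tukey approximation (5.1.5). It assumes continuous data with no ties, and it does not attempt the effective-sample-size adjustment (5.1.6), whose formula is not given here.

```python
from fractions import Fraction
from math import comb, exp, perm
from typing import Sequence


def mosteller_m12(samples: Sequence[Sequence[float]]) -> int:
    """M(1,2): values in the sample holding the overall maximum that exceed all other samples."""
    ranked = sorted(samples, key=max, reverse=True)  # rank samples by their maxima
    top, others = ranked[0], ranked[1:]
    threshold = max(max(s) for s in others)  # largest value outside the top sample
    return sum(1 for v in top if v > threshold)


def F(r: int, s: int, t: int) -> Fraction:
    """F(r; s, t) of (5.1.1), assuming s <= t and r <= t."""
    if r > s:
        return Fraction(0)
    return Fraction(comb(s, r), comb(t, r))


def tail_prob_exact(h: int, sizes: Sequence[int]) -> Fraction:
    """P{M(1,2) >= h} under H from (5.1.4); perm(a, h) is the falling factorial a^(h)."""
    total = sum(sizes)
    return Fraction(sum(perm(mi, h) for mi in sizes), perm(total, h))


def tail_prob_asymptotic(h: int, n: int, m: int) -> float:
    """Mosteller-Tukey approximation (5.1.5) for n equal samples of size m."""
    expo = -h * (h - 1) * (n - 1) / (2 * m * n) * (1 + (2 * h - 1) / (6 * m))
    return n ** (-h + 1) * exp(expo)
```

For instance, with $n = 3$ samples of size $m = 5$ and $h = 3$, `F(2, 4, 14)` and `tail_prob_exact(3, [5, 5, 5])` both give the exact value $6/91$, which illustrates that (5.1.2) and (5.1.3) agree. Exact `Fraction` arithmetic avoids rounding in the small tail probabilities.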
On the working hypothesis, H, each of the n samples of m observations have n* effective equal-sized samples of size M/n* where
arises independently from a distribution with density function f(x); on the
alternative hypothesis the ith sample (with i unspecified) comes from a {5.1.6)
distribution with density function f(x- a) (a> O, unknown).
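The exact null probabilities (5.1.1) to (5.1.4) and the Mosteller-Tukey approximation (5.1.5) are easy to evaluate directly. The Python sketch below is an illustration only and is not Mosteller's own procedure. The function names are hypothetical. It assumes continuous data with no tied values, and it identifies $\mathcal{S}_{(1)}$ simply as the sample containing the overall maximum.

```python
from math import exp


def falling(s, r):
    """Falling factorial s^(r) = s!/(s - r)!; it is 0 when r > s >= 0."""
    out = 1
    for i in range(r):
        out *= s - i
    return out


def F(r, s, t):
    """F(r; s, t) of (5.1.1): s!(t - r)!/{(s - r)! t!} for r <= s <= t, else 0."""
    if r > s:
        return 0.0
    return falling(s, r) / falling(t, r)


def mosteller_tail(h, m, n):
    """Exact null P{M(1,2) >= h}, n samples of equal size m, from (5.1.2)."""
    return F(h - 1, m - 1, m * n - 1)


def mosteller_tail_unequal(h, sizes):
    """Exact null P{M(1,2) >= h} for sample sizes m_1, ..., m_n, from (5.1.4)."""
    return sum(falling(mj, h) for mj in sizes) / falling(sum(sizes), h)


def mosteller_tukey_approx(h, m, n):
    """Mosteller-Tukey asymptotic approximation (5.1.5), equal sample sizes."""
    expo = -h * (h - 1) * (n - 1) / (2 * m * n) * (1 + (2 * h - 1) / (6 * m))
    return n ** (1 - h) * exp(expo)


def mosteller_M12(samples):
    """Return (M(1,2), index of the rank-1 sample); assumes no tied values."""
    # The rank-1 sample is the one holding the overall maximum.
    top = max(range(len(samples)), key=lambda j: max(samples[j]))
    # Exceeding the largest value elsewhere means exceeding every other sample.
    rest_max = max(x for j, s in enumerate(samples) if j != top for x in s)
    return sum(x > rest_max for x in samples[top]), top


if __name__ == "__main__":
    print(f"{mosteller_tail(4, 5, 3):.4f}")  # 0.0110
    print(f"{mosteller_tail(5, 5, 3):.4f}")  # 0.0010
    print(mosteller_M12([[1, 2, 3], [4, 10, 11], [5, 6, 7]]))  # (2, 1)
```

Falling factorials keep the exact calculation in integer arithmetic until the final division. For $m = 25$, $n = 3$, $h = 4$, the approximation (5.1.5) agrees with the exact value 0.0312 to within about half a per cent (relative).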
178 Outliers in statistica/ data Outlying sub-samples: slippage tests 179

The only difficulties that arise with unequal sample sizes are the extra calculation effort and the unmanageable scale of any useful tabulation of tail probabilities or critical values. Accordingly this case has received little detailed consideration and we shall henceforth assume that the sample sizes are equal unless specifically stated otherwise.

Table XXI on page 324 presents critical values for 5 per cent and 1 per cent Mosteller tests (that is, the smallest values of $h$ for which (5.1.2) is less than 0.05 and 0.01) for $n = 2(1)6$ and $m = 3(1)10(5)25, 100, \infty$, compiled from published tables of tail probabilities (Mosteller, 1948; Bofinger, 1965) and additional calculations.

Since $M(1, 2)$ is discrete we inevitably find here, and in other rank tests below, that significance levels cannot be attained exactly. Thus the size of the tests may be rather smaller than the stated significance level. For example, using the Mosteller test with $m = 5$, $n = 3$, the critical value of $h$ for a 5 per cent test is given in Table XXI as 4. But in this case

$$P\{M(1, 2) \ge 4\} = 0.011$$

so that the size of the test is only 1.1 per cent. For $m = 5$, $n = 3$

$$P\{M(1, 2) \ge 5\} = 0.001$$

so that $h = 5$ is the critical value for a test with significance level 1 per cent; the test size is only 0.1 per cent, however. As $m$ increases the sizes of the 5 per cent and 1 per cent tests rise only to 3.7 per cent and 0.9 per cent, respectively. Thus in operating tests of any particular level the safeguard (in terms of the probability of incorrect rejection) may be much greater than is superficially implied by the significance level. This effect is common, of course, to any non-randomized test with a discrete test statistic. In the context of non-parametric slippage tests, Neave (1972) proposes a resolution of this problem by considering two types of critical value in any situation: $h_\alpha$ is the best conservative critical value at level $\alpha$ based on a test statistic $T$ if

$$P(T \ge h_\alpha) \le \alpha, \qquad P(T \ge h_\alpha - 1) > \alpha,$$

whereas $h'_\alpha$ is the nearest critical value if

$$|P(T \ge h) - \alpha|$$

is minimized for $h = h'_\alpha$. Common statistical practice supports the use of the best conservative critical value, and this is employed throughout this text. However, care is needed in interpreting other tabulated values (particularly in Neave, 1972). Neave gives an example which highlights the problem. In the example, $P(T \ge 2) = 0.0543$ and $P(T \ge 3) = 0.0099$, so that for a 5 per cent test $h_{0.05} = 3$, $h'_{0.05} = 2$, and the adoption of the (best conservative) critical value 3 appears to be rather wasteful. Surely the most reasonable policy is to merely take action in relation to the observed critical level of the test: if we observe a test statistic value of $t$ then we would consider the upper-tail probability $P(T \ge t)$ evaluated under the null hypothesis, rather than be constrained by particular (arbitrary) significance levels. Thus, in practice, tables of critical values provide only a rough-and-ready guide and it is sensible to augment their information with appropriate upper-tail probabilities, which are tabulated by the respective authors for most of the non-parametric tests described in this chapter and in Chapter 8.

Example 5.1. For the Lamprey data $n^* = 3.76$, so that the critical level for slippage to the left of the Ammocoete sample is approximately $(3.76)^{-18}e^{-5.66}$: overwhelming evidence of such slippage, as might be anticipated.

Various modifications and extensions of the Mosteller-Tukey test have been published. Bofinger (1965) considers the unequal sample size case where the sample sizes $m_1, m_2, \ldots, m_n$ arise as observations from a multinomial distribution with parameters $M = \sum_1^n m_j$ and $p_j$ ($j = 1, 2, \ldots, n$) and we wish to test for slippage of a single population. He also presents some results on the power, against particular types of alternative hypothesis, of tests for both equal and random sample sizes, and extends the tabulation of upper-tail probabilities for tests with equal-sized samples beyond that given by Mosteller (1948).

Neave (1972) shows how the test might be modified to take account of different alternative hypothesis interests. These are expressed by the double dichotomy: ($A_1$) a specified population or ($A_2$) any single population has slipped ($B_1$) in a specified direction (say, upwards) or ($B_2$) in either direction. Mosteller's (1948) test corresponds with case $A_2B_1$. Denote by $T_{A_iB_j}$ ($i = 1, 2$; $j = 1, 2$) the appropriate test statistics. Their respective forms and their explicit and asymptotic null distributions (given in terms of the function $F(r; s, t)$) are as follows.

$A_1B_1$: $T_{A_1B_1}$ = number of observations in the specified sample which exceed all observations in the other samples.

$$P(T_{A_1B_1} \ge h) = F(h;\, m,\, mn) \tag{5.1.7}$$

$A_1B_2$: $T_{A_1B_2}$ = larger of the number of observations in the specified sample which exceed, or are less than, all observations in the other samples.

$$P(T_{A_1B_2} \ge h) = 2F(h;\, m,\, mn) - F(2h;\, m,\, mn) \tag{5.1.8}$$

$A_2B_1$: $T_{A_2B_1} = M(1, 2)$.

$$P(T_{A_2B_1} \ge h) = F(h-1;\, m-1,\, mn-1) \to n^{-h+1} \quad \text{as } m \to \infty \tag{5.1.9}$$

$A_2B_2$: $T_{A_2B_2}$ = larger of $M(1, 2)$ or the complementary quantity: number of observations in the sample of rank $n$ which are less than all observations in the other samples.

$$\begin{aligned} P(T_{A_2B_2} \ge h) &= F(h-1;\, m-1,\, mn-1)\,\{2 - F(h;\, m-h,\, mn-h) \\ &\qquad - F(1;\, (n-1)m,\, mn-h)\,F(h-1;\, m-1,\, mn-h-1)\} \\ &\to n^{-h+1}\{2 - n^{-h+1}\} \quad \text{as } m \to \infty. \end{aligned} \tag{5.1.10}$$

Neave (1972) presents tables of $P(T \ge h)$ based on the asymptotic forms for all four tests with $h = 1(1)8$, $n = 3(1)10(5)20$, which he states are 'quite close approximations' to the true values provided $m \ge 20$. He also gives the corresponding four tables of best conservative, and nearest, critical values for 5 per cent, 1 per cent, and 0.1 per cent significance levels (including any necessary corrections for finite $m$).

In a later paper, Neave (1973) reports a Monte Carlo investigation of the power of his four tests. He compares them with an analysis of variance procedure, its Kruskal-Wallis type non-parametric counterpart, and a quick test proposed by Granger and Neave (1968), which is described below. The results are based on normally distributed data with slippage in the mean of one population of 0.5, 1, 2, and 3 standard deviations, and cover the cases $n = 3, 5, 8$; $m = 10, 20, 50$. The power of the four Neave (1972) tests is not impressive in comparison with the analysis of variance or Kruskal-Wallis tests. (Normality, of course, favours the former, and in both cases the alternative hypothesis is not specific to slippage of a single population.) Even the Granger and Neave (1968) test is usually much better than the four Neave (1972) tests. The Granger and Neave (1968) test was offered as a quick test avoiding laborious sorting of data or special tables. It amounts to considering (for testing upward slippage) the $k$ largest observations in the combined ordered sample of $mn$ observations. If

$x_j$ = number of observations in the selected $k$ which come from the $j$th sample,

the test statistic is

$$S = \sum_{j=1}^{n} x_j^2 \tag{5.1.11}$$

with critical values of the form (asymptotically)

$$\frac{k}{n}\left(\frac{mn-k}{mn}\right)c_\alpha(n) + \frac{k^2}{n}$$

for a level-$\alpha$ test, where $c_\alpha(n)$ is the upper $\alpha$-point of $\chi^2_{n-1}$. Granger and Neave use heuristic arguments in support of this proposal. We need to choose a value of $k$, and they recommend $\max(12, 2n)$ (or, as a 'safer' prescription, $\max(20, 3n)$)!

The implicit alternative hypothesis is non-specific with respect to slippage: it merely denies homogeneity of distribution for all $n$ populations. Speculative proposals are made to take account of the direction of slippage of the means, slippage of variances, and unequal sample sizes. This test needs to be viewed as a 'quick and simple' member of the vast range of parametric and non-parametric tests of homogeneity of distribution which have non-specific alternative hypotheses. Thus, in spite of the confusion in the literature on this matter, it is not really a slippage test in the sense of our discussion in this chapter.

A more sophisticated non-parametric slippage test based on ranks is described by Doornbos and Prins (1958) and Karlin and Truax (1960) and is investigated by Odeh (1967). It takes the form of an $n$-sample version of the two-sample Wilcoxon test for equality of distributions, and has been shown to possess certain statistical optimality properties.

Given $n$ random samples of size $m$, the overall sample of $mn$ observations is ranked. Suppose $r_{jl}$ is the (overall) rank of the $l$th observation from the $j$th sample, and define

$$T_j = \sum_{l=1}^{m} r_{jl} \quad (j = 1, 2, \ldots, n) \tag{5.1.12}$$

as the rank sum for the $j$th sample. To test for slippage to the right for a single population we consider

$$T_{\max} = \max_{j = 1, 2, \ldots, n} T_j \tag{5.1.13}$$

and conclude that the population yielding $T_{\max}$ has slipped if $T_{\max}$ is sufficiently large.

Specifically, we reject the basic hypothesis of homogeneity of distribution at level $\alpha$ if

$$T_{\max} > \lambda_\alpha$$

where, under the basic hypothesis, $\lambda_\alpha$ is as small as possible subject to $P(T_{\max} > \lambda_\alpha) \le \alpha$.

(A corresponding test for slippage to the left is based appropriately on $T_{\min} = \min_{j=1,2,\ldots,n} T_j$, with rejection if $T_{\min}$ is sufficiently small.)
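The rank sums (5.1.12) and the extreme statistics $T_{\max}$ and $T_{\min}$ of (5.1.13) need only a single pooled ordering of the $mn$ observations. The same ordering also gives the Granger and Neave statistic $S$ of (5.1.11). The Python sketch below computes all three. The helper names are hypothetical. It assumes continuous data with no tied values, with rank 1 for the smallest observation. It computes the statistics only; critical values must still come from published tables or the asymptotic forms given above.

```python
def pooled_order(samples):
    """(value, sample index) pairs sorted ascending; assumes no tied values."""
    return sorted((x, j) for j, s in enumerate(samples) for x in s)


def rank_sums(samples):
    """T_j of (5.1.12): sum of overall ranks (1 = smallest) in sample j."""
    T = [0] * len(samples)
    for rank, (_, j) in enumerate(pooled_order(samples), start=1):
        T[j] += rank
    return T


def t_max_min(samples):
    """((T_max, j), (T_min, j')) for the right- and left-slippage tests (5.1.13)."""
    T = rank_sums(samples)
    j_max = max(range(len(T)), key=T.__getitem__)
    j_min = min(range(len(T)), key=T.__getitem__)
    return (T[j_max], j_max), (T[j_min], j_min)


def granger_neave_S(samples, k):
    """S of (5.1.11): sum of x_j^2, x_j = count of sample j among the k largest."""
    order = pooled_order(samples)
    counts = [0] * len(samples)
    # Slice off the k largest pooled values (guarding k = 0 and k > mn).
    for _, j in order[max(0, len(order) - k):]:
        counts[j] += 1
    return sum(c * c for c in counts)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 10, 11], [5, 6, 7]]
    print(rank_sums(data))           # [6, 21, 18]
    print(t_max_min(data))           # ((21, 1), (6, 0))
    print(granger_neave_S(data, 4))  # 8
```

As a quick check, the rank sums always total $mn(mn+1)/2$, which is 45 here.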
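Returning briefly to Neave's (1972) four statistics, their exact tail probabilities (5.1.7) to (5.1.10) can be explored numerically. So can the distinction between the best conservative value $h_\alpha$ and the nearest value $h'_\alpha$. The Python sketch below is a hypothetical illustration, not Neave's own code. It implements the formulas as given earlier in this section, for equal sample sizes $m$. It also assumes the tail probability does not increase with $h$, so that the first $h$ with $P(T \ge h) \le \alpha$ is $h_\alpha$.

```python
from functools import partial


def falling(s, r):
    """Falling factorial s!/(s - r)!; 0 when r > s >= 0."""
    out = 1
    for i in range(r):
        out *= s - i
    return out


def F(r, s, t):
    """F(r; s, t) of (5.1.1); zero when r > s."""
    return 0.0 if r > s else falling(s, r) / falling(t, r)


def tail_A1B1(h, m, n):
    return F(h, m, m * n)                                # (5.1.7)


def tail_A1B2(h, m, n):
    return 2 * F(h, m, m * n) - F(2 * h, m, m * n)       # (5.1.8)


def tail_A2B1(h, m, n):
    return F(h - 1, m - 1, m * n - 1)                    # (5.1.9), Mosteller


def tail_A2B2(h, m, n):
    mn = m * n                                           # (5.1.10)
    inner = (2 - F(h, m - h, mn - h)
             - F(1, (n - 1) * m, mn - h) * F(h - 1, m - 1, mn - h - 1))
    return F(h - 1, m - 1, mn - 1) * inner


def best_conservative(tail, alpha, h_max):
    """h_alpha: smallest h with tail(h) <= alpha, or None if none up to h_max."""
    return next((h for h in range(1, h_max + 1) if tail(h) <= alpha), None)


def nearest(tail, alpha, h_max):
    """h'_alpha: the h in 1..h_max minimising |tail(h) - alpha|."""
    return min(range(1, h_max + 1), key=lambda h: abs(tail(h) - alpha))


if __name__ == "__main__":
    mosteller = partial(tail_A2B1, m=5, n=3)  # case A2B1 with m = 5, n = 3
    print(best_conservative(mosteller, 0.05, 5), nearest(mosteller, 0.05, 5))  # 4 3
```

For Mosteller's case with $m = 5$, $n = 3$, the nearest value 3 has an actual size of about 6.6 per cent, above the nominal 5 per cent. A test based on $h'_\alpha$ therefore does not guarantee level $\alpha$. This is why the best conservative value is adopted in the text.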

Doornbos and Prins (1958) present the $n$ alternative hypotheses in the non-parametric forms

$$H_i: P(X_i > X_j) \gtrless \tfrac{1}{2}$$

with $X_j$ ($j \ne i$) identically distributed. The $>$ sign applies to slippage to the right and the $<$ sign to slippage to the left.

Karlin and Truax (1960) adopt a different form for the alternative hypotheses expressing slippage. The $i$th population has slipped to the right if the distribution functions $F_j$ satisfy $F_j = F$ ($j \ne i$) and

$$F_i = (1-\gamma)F + \gamma F^2 \quad (0 < \gamma < 1).$$

Applying results of Lehmann (1953) on rank tests, they demonstrate that the above test is locally most powerful invariant (for small $\gamma$, and with respect to monotone transformations). They also present a different slippage test for the case where the slipped population has the different type of aberrant distribution function:

$$F_i = F^{1+\xi} \quad (\xi > 0).$$

Odeh (1967) has calculated the null distribution of $T_{\max}$ and has presented a table of critical values $\lambda_\alpha$ for $m = 2(1)8$, $n = 2(1)6$ and $\alpha = 0.20, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001$. The 5 per cent and 1 per cent critical values are reproduced as Table XXII on page 325. The italicized figures are obtained from an asymptotic form and should be accurate for $\alpha \ge 0.01$ (that is, for the values quoted in Table XXII). For $\alpha = 0.005, 0.001$ the error is at most one unit. (It is interesting to note that asymptotically the quantities

$$V_i = \frac{T_i - m(mn+1)/2}{[nm^2(mn+1)/12]^{1/2}} \quad (i = 1, 2, \ldots, n)$$

are jointly distributed as the set $W_i = U_i - \bar{U}$ ($i = 1, 2, \ldots, n$), where the $U_i$ are independent $N(0, 1)$. See Odeh, 1967.)

Note. Using Odeh's tables, a result is significant if it is strictly greater than the relevant entry in the table. Using the earlier Table XXI, a result was significant if it was greater than or equal to the appropriate tabulated value. Table XXII re-expresses Odeh's results in this latter form.

There is a simple relationship between the critical level $\lambda_\alpha$ and the corresponding value $\lambda'_\alpha$ for a test for slippage to the left. We have

$$\lambda'_\alpha + \lambda_\alpha = m(mn+1) \tag{5.1.14}$$

so that Table XXII also serves for testing slippage to the left for a single population.

The principle of ranking samples in terms of the maximum observations, which was described above, appears to have been proposed independently at about the same time by Bofinger (1965) and Conover (1965). Conover (1968) uses it in proposing a 'slippage test' for equal-sized samples based on the test statistic $M(1, n)$. This is the number of observations in the rank 1 sample which exceed the maximum observation in the rank $n$ sample. The null hypothesis of homogeneity of location is rejected for sufficiently large $M(1, n)$. Conover gives a short table of upper-tail probabilities and a comprehensive table of critical values for 5 per cent, 1 per cent, and 0.1 per cent tests over the ranges $m = 4(1)10(2)20(5)40, \infty$; $n = 2(1)20$. Table XXIII on page 326 reproduces the critical values for 5 per cent and 1 per cent tests for the slightly restricted range of values $n = 2(1)6(2)20$. The tabulated values here are the smallest integral $h$ (if such an $h \le m$ exists) for which $P\{M(1, n) \ge h\} \le \alpha$ for $\alpha = 0.05$ and 0.01. (There will of course be an obvious dual version of this test based on ranking samples in terms of their minimum observations.)

As with the test of Granger and Neave (1968), Conover's test does not employ an alternative hypothesis which is specific with regard to slippage (in spite of the author's description of the test as a 'slippage test'). The alternative hypothesis is expressed in the form: 'the distribution functions differ, at least with respect to their location parameters'. Conover reports on an empirical power comparison of the $M(1, n)$ test with the traditional F-test for an analysis of variance, in cases where the underlying distributions have different forms. He concludes that the $M(1, n)$ test is more powerful for uniform distributions or normal distributions with unequal variances. But it would seem inappropriate to consider the F-test in such cases!

Slippage tests based on rank ordering of the samples, and the statistics $M(i, j)$, clearly place great emphasis on the extent to which extreme values in a sample reflect its location (or dispersion). Conover (1968), for example, warns against the use of his test when 'a shift in the population means does not affect the upper tail of the distribution in the same way'.

Indeed, we encounter another prospect here which does not seem to have been investigated. It could be that the sample extremes are themselves individual outliers within the particular samples, perhaps indicating contamination of a form different from any slippage of one population relative to another. The resolution of intra-sample outliers and slipped samples seems fraught with difficulty, and many non-parametric extreme rank slippage tests will be far from robust against individual outliers. Neave (1975) suggests that intra-sample outliers will tend to reduce the power of slippage tests rather than to induce wrong decisions about slippage, but this is by no means obvious and merits much wider study. With parametric models we can of course attempt to test individual outliers using the results of Chapters 3 or 7 on outliers in single samples or in designed experiments.

5.1.2 Non-parametric tests for slippage of several populations: multiple comparisons

We have remarked on the non-specific form of the alternative hypothesis in many slippage tests. It is natural that such tests should be accompanied by proposals for multiple comparisons between the populations, and this is
182 Outliers in statistica[ data Outlying sub-samples: slippage tests 183

Doornbos and Prins (1958) present the n alternative hypotheses in the sample which exceed the maximum observation in the rank n sample. The
non-parametric forms null hypothesis of homogeneity of location is rejected for sufficiently large
Hi :P(Xi >Xj)~! M(l, n) and Conover gives a short table of upper-tail probabilities and a
with Xj (j :1: i) identically distributed (with the > sign applying to slippage to comprehensive t~J>le of criticai values for 5 per cent, l per cent, and 0.1 per
the right; the < sign applying to slippage to the left). cent tests over the ranges m= 4(1)10(2)20(5)40, oo; n= 2{1)20. Table XXIII
Karlin and Truax (1960) adopt a different form for the alternative on page 326 reproduces the criticai values for 5 per cent and l per cent tests
hypotheses expressing slippage. The ith population has slipped to the right if for the slightly restricted range of values of n= 2(1)6(2)20. The tabulated
the distribution functions Fj satisfy Fj = F (j :1: i) an d values here are the smallest integrai h (if such an h~ m exists) for which
P{M {l, n);;;:::: h}~ a for a= 0.05, and 0.01. (There will of course be an
~=(l- y)F+ yp2 (8< 'Y < 1). obvious duai version of this test based on ranking samples in terms of their
Applying results of Lehmann (1953) on rank tests they demonstrate that minimum observations.)
the above test is Iocally most powerful invariant (for small y, and with As with the test of Granger and Neave (1968), Conover's test does not
respect to monotone transformations). They also presenta different slippage employ an alternative hypothesis which is specific with regard to slippage (in
test for the case where the slipped population has the different type of spite of the author's description of the test as a 'slippage test'). The
aberrant distribution function: alternative hypothesis is expressed in the form: 'the distribution functions
Fi = p~+ç (ç> O). differ, a t Ieast with respect to their location parameters'. Conover reports on
an empirica! power comparison of the M(l, n) test with the traditional
Ode h (1967) has calculated the null distribution of T max and has presented F-test for an analysis of variance, in cases where the underlying distributions
a table of criticai values Àa for m= 2(1)8, n= 2(1)6 and a= 0.20, 0.10, have different forms: concluding that the M{l, n) test is more powerful for
0.05, 0.025, 0.01, 0.005, 0.001. The 5 per cent and l per cent criticai values uniform distributions or normal distributions with unequal variances. But it
are reproduced as Tabie XXII o n p age 325. The italicized figures are would seem inappropriate to consider the F-test in such cases!
obtained from an asymptotic form and should be accurate for a;;;:::: 0.01 (that Slippage tests based on rank ordering of the samples, and the statistics
is, for the values quoted in Table XXII). For a= 0.005, 0.001 the error is at M( i, j), clearly piace great emphasis on the extent to which extreme values
most one unit. (It is interesting to note that asymptotically the quantities in a sample reflect its location (or dispersion). Conover (1968), for example,
V,= Ti- m(mn + 1)/2 (' ) warns against the use of his test when 'a shift in the population means does
1 2
' [nm 2 (mn + l)/12p
1
l= ' '· · ·' n no t affect the upper tail of the distribution in the same way'.
Indeed, we encounter another prospect here which does not seem to have
are jointly distributed as the set ~ = ui- [J (i= l, 2, ... ' n) where the ui been investigated. It could be that the sample extremes are themselves
are independent N(O, 1). See Odeh, 1967 .)
individuai outliers within the particular samples, perhaps indicating contami-
Note. Using Odeh's tables a result is significant if it is strictly greater than nation of a form different from any slippage of one popuiation relative to
the relevant entry in the table; using the earlier Tabie XXI a resuit was another. The resolution of intra-sample outliers and slipped samples seems
significant if it was greater than or equal to the appropriate tabulated value; fraught with difficulty, and many non-parametric extreme rank slippage tests
Table XXII re-expresses Odeh's results in this latter form. will be far from robust against individuai outliers. Neave (1975) suggests
that intra-sampie outliers will tend to reduce the power of slippage tests
There is a simple relationship between the criticai Ievel Àa and the
rather than to induce wrong decisions about slippage, but this is by no
corresponding vaiue À ~ for a test for slippage to the left. W e h ave
means obvious an d merits much wider study. With parametric models we
À~ + Àa = m(mn +l) {5.1.14) can of course attempt to test individuai outliers using the results of Chapters
so that Tabie XXII also serves for testing slippage to the Ieft for a single 3 or 7 on outliers in single samples or in designed experiments.
population.
The principle of ranking samples in terms of the maximum observations, 5.1.2 Non-parametric tests for slippage of several popu.lations: multiple
which was described above, appears to have been proposed independently at comparisons
about the same time by Bofinger (1965) and Conover (1965). Conover We bave remarked on the non-specific form of the alternative hypothesis in
(1968) uses it in proposing a 'slippage test' for equal-sized samples based on many slippage tests. It is natural that such tests should be accompanied by
the test statistic M(l, n). This is the number of observations in the rank l proposals for multiple comparisons between the populations, and this is
184 Outliers in statistical data Outlying sub-samples: slippage tests 185

often so. The aim is to group the populations into sub-groups within which populations are similar with respect to location, dispersion, or distribution (although the informality of structure of many published tests usually makes it impossible to attribute a particular similarity criterion).
Before considering such general grouping methods it is interesting to enquire if there are any slippage tests which directly parallel work on multiple outliers either in the sense of consecutive tests or of tests for a prescribed number of slipped populations.
Bofinger (1965) provides an example of the latter interest. To test if a prescribed number, k, of n populations have slipped to the right he proposes the test statistic

$$ \sum_{i=1}^{k} M(i, k+1) $$

based on ranking n samples of m observations. If the test statistic is sufficiently large we conclude that the k populations which yield the k highest-ranking samples have slipped. (There will be an obvious dual test for slippage to the left of a prescribed set of k populations, based on sample ordering with respect to minimum observations in the samples.) An expression is given for the upper-tail probabilities in the null distribution, and selected values are tabulated ($k = 2, 3$; $n = 3(1)6$; $m = 3, 5, 7, 10(5)25, \infty$).
Conover (1968) suggests two 'consecutive' procedures. In the first we consider M(1, 2), M(1, 3), M(1, 4) and so on until a significant result is first obtained. Suppose this happens with M(1, j); we conclude that populations yielding the samples of rank 1, 2, ..., j - 1 are indistinguishable but they have all slipped relative to the others. Alternatively we might consider M(1, n), M(1, n - 1), M(1, n - 2) and so on until the first insignificant result is obtained. If this arises with M(1, j), we conclude that the populations yielding the j - 1 lowest ranking samples are indistinguishable and have slipped relative to the others. Other possible consecutive schemes might be to consider

$$ M(1, n),\ M(2, n),\ M(3, n),\ \ldots $$

or

$$ M(n-1, n),\ M(n-2, n),\ M(n-3, n),\ \ldots $$

or

$$ M(1, 2),\ M(2, 3),\ M(3, 4),\ \ldots $$

In all these cases the aim is to dichotomize the set of populations into two subsets where one has slipped relative to the other.
To obtain a more refined grouping of the populations Conover (1968) suggests consecutive examinations of the members of the last of the above sets of statistics:

$$ M(1, 2),\ M(2, 3),\ M(3, 4),\ \ldots $$

In this way we look for significant differences between samples of consecutive rank. Using an approximation to the upper-tail probabilities for the null distribution of M(j, j + 1), Conover presents algebraic expressions, tabulated values and illustrations of the technique (avoiding the restriction of equal sample sizes). It is tentatively suggested that an overall significance level $\alpha$ can be achieved by employing a significance level of $\alpha/(n - 1)$ for each of the individual comparisons.
Another multiple comparison approach to slippage is proposed by Neave (1975). Again the alternative hypothesis is not specific: it merely states that not all location parameters are equal, but suggests that non-equality is usually manifest in a minority of the location parameters 'straying in one direction or another'. This attitude once more supports the quest for a means of dividing the populations into distinct groups with regard to their location.
A consecutive-type procedure is proposed, with no restriction on sample sizes. Firstly we examine M(1, 2) to test for upward slippage of precisely one population. We then consider

$$ M(1, 3) + M(2, 3) - M(1, 2), $$

i.e. the number of observations in the samples with the two greatest maxima which exceed all other observations less the number in the highest rank sample which exceed all others. Then we examine

$$ M(1, 4) + M(2, 4) + M(3, 4) - M(1, 3) - M(2, 3), $$

and so on. Assessing significance appropriately at each stage using conditional probability distributions (described by Neave, 1975, and exhibited in tables of critical values for 5 per cent and 1 per cent tests) we continue until stage [n/2]. Beyond this we would be implicitly examining slippage to the left (downwards) of the residual smaller group of lower rank samples and it is suggested that this be approached directly from the opposite end using the complementary argument for study of downward slippage (up to stage [(n - 1)/2]). The switchover is admittedly arbitrary in its effect.
The type of conclusion that might be drawn from the Neave (1975) test is illustrated by an example he discusses. Using six samples each of size 20, he finds from study of

$$ M(1, 3) + M(2, 3) - M(1, 2) $$

strong evidence of slippage to the right in two populations. M(1, 2) provides less convincing evidence that one of the two has slipped to the right further than the other. At the third stage the result is insignificant, but the complementary analysis reveals significant evidence of slippage to the left for one population. Thus the six populations become partitioned in the form

*   ***   *   *

All the non-parametric slippage procedures that we have considered have the double advantage of being distribution-free and very simple to implement. Set against these advantages, however, are a variety of disadvantages. Non-parametric tests may have relatively low power. Alternative hypotheses

are often unspecified, so that it is not clear what the implications of rejecting the null hypothesis are. Tests based on ranks will not necessarily reflect in any immediate sense slippage of location, dispersion, or distribution; they are also likely to be affected in an unpredictable way by individual intra-sample outliers. Multiple comparison methods have a renowned difficulty in relating the overall significance level of the battery of tests to the significance levels of individual comparisons. Finally, we might also query the extent to which the masking effect in consecutive tests of outliers has a counterpart in the Conover, or Neave, consecutive multiple comparison slippage procedures. There is clearly much work to be done to resolve the difficulties and to effect a valid comparison of the advantages and disadvantages.

5.2 THE SLIPPAGE MODEL

We have already encountered a degree of confusion in the literature on what is meant by slippage. Before proceeding to study parametric slippage tests it is necessary to spend some time defining what we mean in this book by the slippage problem, how it is distinguished from the wider study of outliers and where we draw the line in relation to the general statistical problem of examining homogeneity of distribution.
We suppose that the data set can be represented as n samples of sizes $m_1, m_2, \ldots, m_n$ where at least one of the $m_j$ exceeds one (usually most of the sample sizes will be greater than one). Our interest is in testing if one (or a few) of the complete samples come from population(s) which have slipped relative to the others either in having different location, or dispersion, from that of the majority of the populations. This will be reflected in an alternative hypothesis which expresses such change of location or dispersion either explicitly, or implicitly through a change of distribution. The working (null) hypothesis declares homogeneity of distribution for all n populations. Thus we extend the one-sample outlier problem naturally to a problem of outlying samples in a set of samples.
If interest centres on outliers within particular samples we can use results in other parts of the book on individual or multiple outliers in single samples, designed experiments, or regression or time-series models, as appropriate.
The distinction is least clear if all $m_j = 1$, where results on outliers in single samples with a slippage-type alternative hypothesis would seem to be just a special case of the slippage problem defined above. This is formally true, but to distinguish cases where at least one of the $m_j$ exceeds one has operational importance in two respects.
(i) The slippage problem with all $m_j = 1$ is just one version of the single-sample outlier problem, where the alternative hypothesis is of a slippage type. Many other forms of alternative hypothesis are feasible as we have seen, and it seems best to consider en masse the variety of single-sample outlier problems to represent reasonably the range of work on outliers.
(ii) The n-sample slippage problem (without the restriction $m_j = 1$, all j) figures in the literature as a topic in its own right. It is this body of work which is covered in the present chapter.

Notwithstanding (ii), emphasis in the literature is somewhat confused. For example, in a seminal paper 'on slippage', Karlin and Truax (1960) consider observations $x_1, x_2, \ldots, x_n$ from populations $\pi_1, \pi_2, \ldots, \pi_n$ and examine a means of determining whether one population has slipped. Thus it appears as if the term 'slippage' is used in relation to samples each of size 1. However, the authors explain that the $x_j$ are typically sufficient statistics based on a sample from each population. The text entitled Slippage Tests by Doornbos (1966) is, in spite of its title, largely concerned with testing outliers in single samples: it is again only to the extent that 'observations' may be interpreted as summary statistics from samples that the term 'slippage' corresponds with our usage. Both these works, and many other important contributions, are discussed in Section 5.3 on parametric slippage tests.
Other tangential topics which we will not include further under the 'slippage' label include the following.
(a) Tests (parametric or non-parametric) of homogeneity of distribution against highly structured alternative hypotheses, such as analysis of variance, and generalizations of two-sample Wilcoxon and Mann-Whitney tests (see, for example, Friedman, 1940; Kruskal and Wallis, 1952; Downton, 1976).
(b) Inference under an ordered alternative hypothesis. For example, if the populations have means $\mu_1, \mu_2, \ldots, \mu_n$;

$$ H: \mu_1 = \mu_2 = \cdots = \mu_n $$
$$ \bar{H}: \mu_1 \le \mu_2 \le \cdots \le \mu_n $$

See Barlow, Bartholomew, Bremner, and Brunk (1972) for general results in this area.
(c) Identification and ranking procedures for distributions or means. For example, the selection of a subset of l populations which contain those with the $k\ (\le l)$ largest means. To a degree the informal classification methods of Conover, and Neave, described in Section 5.1.2 above, are in this category but we shall not further pursue such work. Bechhofer, Kiefer, and Sobel (1968) expound a sequential approach.

5.3 PARAMETRIC SLIPPAGE TESTS

If we know, or are prepared to make precise assumptions about, the form of the populations from which our sub-samples arise, then it becomes relevant to use population-specific slippage tests.
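Before turning to the parametric development, it may help to see the rank-based counts of Section 5.1.2 computed concretely. The sketch below is a minimal illustration and not the published procedures. It assumes a reading of M(i, k) consistent with the verbal description of M(1, 3) + M(2, 3) - M(1, 2) given there: M(i, k) counts the observations in the sample of rank i (samples ranked by their maxima) that exceed every observation in the samples of rank k, k + 1, ..., n. The function names are hypothetical. Ties are not handled, and no null distributions are computed, so significance must still be judged from the published tables.

```python
from typing import List, Sequence


def rank_by_maximum(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Order samples so that rank 1 has the largest maximum (ties not handled)."""
    return [list(s) for s in sorted(samples, key=max, reverse=True)]


def m_count(ranked: List[List[float]], i: int, k: int) -> int:
    """Hypothetical helper for M(i, k): observations in the rank-i sample that
    exceed every observation in the samples of rank k, k+1, ..., n.

    Because samples are ranked by their maxima, the largest value among ranks
    k..n is simply the maximum of the rank-k sample.
    """
    threshold = max(ranked[k - 1])
    return sum(1 for x in ranked[i - 1] if x > threshold)


def bofinger_statistic(ranked: List[List[float]], k: int) -> int:
    """Sum of M(i, k+1) over i = 1..k (k prescribed slipped populations)."""
    return sum(m_count(ranked, i, k + 1) for i in range(1, k + 1))


def neave_stage(ranked: List[List[float]], s: int) -> int:
    """Stage-s count: sum_{i<=s} M(i, s+1) minus sum_{i<=s-1} M(i, s)."""
    return bofinger_statistic(ranked, s) - bofinger_statistic(ranked, s - 1)


def conover_per_comparison_level(alpha: float, n: int) -> float:
    """Per-comparison level alpha/(n-1) suggested for the M(j, j+1) sequence."""
    return alpha / (n - 1)


if __name__ == "__main__":
    # Small made-up samples, purely for illustration.
    samples = [[1, 2, 3], [10, 11, 4], [5, 6, 12]]
    ranked = rank_by_maximum(samples)
    print(ranked)                         # [[5, 6, 12], [10, 11, 4], [1, 2, 3]]
    print(m_count(ranked, 1, 2))          # 1
    print(bofinger_statistic(ranked, 2))  # 6 = M(1, 3) + M(2, 3)
    print(neave_stage(ranked, 2))         # 5 = 6 - M(1, 2)
```

Stage 1 of the Neave-style sequence reduces to M(1, 2), because the subtracted sum is empty. The helpers also work for samples of unequal size.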
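The normal-theory statistics of Section 5.3.1 and Examples 5.2 and 5.3, which follow, are straightforward to reproduce numerically. The sketch below is a minimal illustration using only the Python standard library, and the function names are hypothetical. It uses the breaking strains of Table 5.1 (the process means and variances it gives match those printed in the table) and the sixth process of Example 5.3. It computes the following:

- the downward-slippage form of T in (5.3.1) and its rescaled value $T\sqrt{(mn-1)/m}$;
- the one-way analysis of variance ratio;
- Bartlett's statistic, in its usual textbook form, which is assumed here;
- Cochran's $s^2_{\max}/\sum s_j^2$.

Critical values are not computed; they must still be taken from the tables cited in the text.

```python
import math
from statistics import fmean, variance

# Table 5.1: breaking strains for processes 1-5 (ten castings each).
TABLE_5_1 = [
    [52, 58, 49, 45, 54, 40, 66, 67, 73, 46],
    [64, 47, 50, 44, 37, 51, 30, 52, 76, 56],
    [80, 74, 64, 84, 44, 60, 55, 47, 63, 70],
    [33, 58, 43, 51, 25, 40, 55, 37, 50, 15],
    [68, 52, 75, 56, 60, 49, 53, 41, 62, 56],
]
# Example 5.3: the sixth process.
PROCESS_6 = [87, 39, 55, 30, 72, 77, 51, 44, 42, 94]


def slippage_T(samples, direction="up"):
    """T of (5.3.1) for n equal-sized samples; 'down' uses xbar - xbar_min."""
    m = len(samples[0])
    if any(len(s) != m for s in samples):
        raise ValueError("samples must all have the same size m")
    pooled = [x for s in samples for x in s]
    grand = fmean(pooled)
    means = [fmean(s) for s in samples]
    total_ss = sum((x - grand) ** 2 for x in pooled)
    diff = max(means) - grand if direction == "up" else grand - min(means)
    return m * diff / math.sqrt(total_ss)


def rescaled_T(T, n, m):
    """T * sqrt((mn - 1)/m), the form referred to the tabulated d_alpha."""
    return T * math.sqrt((m * n - 1) / m)


def one_way_F(samples):
    """Between-sample mean square divided by within-sample mean square."""
    pooled = [x for s in samples for x in s]
    grand, k, N = fmean(pooled), len(samples), len(pooled)
    between = sum(len(s) * (fmean(s) - grand) ** 2 for s in samples)
    within = sum((len(s) - 1) * variance(s) for s in samples)
    return (between / (k - 1)) / (within / (N - k))


def bartlett_statistic(samples):
    """Bartlett's homogeneity-of-variance statistic (chi-squared, k-1 df)."""
    k = len(samples)
    dfs = [len(s) - 1 for s in samples]
    variances = [variance(s) for s in samples]
    df_total = sum(dfs)
    pooled_var = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled_var) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction


def cochran_ratio(samples):
    """Cochran's s^2_max divided by the sum of the unbiased sample variances."""
    variances = [variance(s) for s in samples]
    return max(variances) / sum(variances)


if __name__ == "__main__":
    n, m = len(TABLE_5_1), len(TABLE_5_1[0])
    T = slippage_T(TABLE_5_1, direction="down")
    print(round(T, 3), round(rescaled_T(T, n, m), 2))        # 1.305 2.89
    print(round(one_way_F(TABLE_5_1), 2))                     # 5.06
    print(round(bartlett_statistic(TABLE_5_1), 2))            # Example 5.2 reports 1.48
    print(round(cochran_ratio(TABLE_5_1 + [PROCESS_6]), 3))   # 0.394
```

Computing T from the pooled total sum of squares mirrors the inclusive denominator of (5.3.1) directly. The identity relating it to $S_1 + S_2$ means the same value would result from summing between- and within-sample parts.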

5.3.1 Normal samples

The typical situation might be one in which we consider n independent normal random samples, each of size m, from distributions $N(\mu_j, \sigma^2)$ $(j = 1, 2, \ldots, n)$ where the $\mu_j$, and $\sigma^2$, are unknown and we are interested in testing for upward slippage in the mean of at most one distribution. The working hypothesis is

$$ H: \mu_j = \mu \qquad (j = 1, 2, \ldots, n) $$

with $\mu$ unspecified, whilst the alternative hypothesis is

$$ \bar{H}: \mu_j = \mu \quad (j \neq i), \qquad \mu_i = \mu + a $$

where $i$, $\mu$, and $a$ are unspecified, with $a > 0$.
An appropriate test statistic is

$$ T = \frac{m(\bar{x}_{\max} - \bar{x})}{\left[\sum_{j=1}^{n}\sum_{l=1}^{m}(x_{jl} - \bar{x})^2\right]^{1/2}} \qquad (5.3.1) $$

where $x_{jl}$ is the $l$th observation in the $j$th sample $(j = 1, 2, \ldots, n;\ l = 1, 2, \ldots, m)$, $\bar{x}$ is the overall sample mean and $\bar{x}_{\max}$ is the largest of the individual sample means, $\bar{x}_j = \frac{1}{m}\sum_{l=1}^{m} x_{jl}$.
We see that T is an inclusive statistic in the sense of Section 3.1.1 since the denominator involves all the data. We accept H if T is sufficiently small, otherwise we reject H and conclude that the sample yielding $\bar{x}_{\max}$ comes from a distribution whose mean $\mu_M$ has slipped upwards in the sense of $\bar{H}$ (that is, $i = M$).
We shall examine the credentials of the test statistic (5.3.1) later. For the moment we note its intuitive appeal as a standardized measure of aberrance of the sample with the largest mean. Specifically, H is rejected at level $\alpha$ if

$$ T > h_\alpha $$

where $h_\alpha$ is the upper $\alpha$-point of the null distribution of T.
If $m = 1$, (5.3.1) is just the test statistic for the Pearson and Chandra Sekar (1936) single outlier test (see Section 3.4.3; Test N1) and has the optimality properties of that test against a slippage-type alternative hypothesis. We shall see that analogous properties hold for the $(m > 1)$ slippage test. Another connection with single outlier tests arises as follows. If $x_1, x_2, \ldots, x_n$ is (under a null hypothesis) a random sample from $N(\mu, \sigma^2)$, and $s_\nu^2$ is an independent external estimate of $\sigma^2$ with $\nu s_\nu^2/\sigma^2 \sim \chi^2_\nu$, then we have considered (Section 3.4.3; Test Nv2) testing the discordancy of a single upper outlier by means of the test statistic

$$ \frac{(x_{(n)} - \bar{x})\sqrt{n + \nu - 1}}{\left[\sum_{j=1}^{n}(x_j - \bar{x})^2 + \nu s_\nu^2\right]^{1/2}} \qquad (5.3.2) $$

But in the slippage problem described above, the sample means are, under H, independent $N(\mu, \sigma^2/m)$, and we could test for discordancy of the upper mean outlier by using the corresponding form,

$$ \frac{(\bar{x}_{\max} - \bar{x})\sqrt{n + \nu - 1}}{\left[\sum_{j=1}^{n}(\bar{x}_j - \bar{x})^2 + \nu s_\nu^2\right]^{1/2}} $$

where $\bar{x}$ is now the overall sample mean and $s_\nu^2$ is some appropriate external statistic where $m\nu s_\nu^2/\sigma^2 \sim \chi^2_\nu$. Now (5.3.1) is (apart from a factor $\sqrt{m/(n + \nu - 1)}$) of precisely this form with $\nu = n(m - 1)$ since

$$ \frac{1}{m}\sum_{j=1}^{n}\sum_{l=1}^{m}(x_{jl} - \bar{x})^2 = \sum_{j=1}^{n}(\bar{x}_j - \bar{x})^2 + \frac{1}{m}\sum_{j=1}^{n}\sum_{l=1}^{m}(x_{jl} - \bar{x}_j)^2 = S_1 + S_2 $$

where $S_2$, independent of $S_1$, is such that $mS_2/\sigma^2 \sim \chi^2_{n(m-1)}$ irrespective of the values of the $\mu_j$.
Thus the slippage test based on the test statistic T in (5.3.1) is just a familiar single outlier test applied to the means of the equal-sized samples. The test can be implemented using appropriately the tabulated critical values for the single outlier test of discordancy based on (5.3.2). Specifically we need to refer T to critical values $h_\alpha = d_\alpha\sqrt{m/(mn - 1)}$ where $d_\alpha$ are the tabulated critical values in Table VIIIe on page 302. Alternatively the test can be conducted by referring $T\sqrt{(mn - 1)/m}$ to the values in Table VIIIe.

Example 5.2. Stress trials were conducted on five processes producing castings for oil pipes. Five random samples of ten castings were chosen from the outputs of the processes, yielding the (linearly transformed) breaking strains shown in Table 5.1, expressed in appropriate units.
Sample sizes are small so that any formal test of normality is not feasible; probability plots provide no serious contra-indication. Proceeding to a Bartlett homogeneity of variance test we obtain a test statistic value of 1.48 which is not significant as $\chi^2_4$. Thus we shall model the data set by normal distributions of constant variance. The null hypothesis of equality of mean breaking strain could be tested against a global alternative by means of a one-way analysis of variance. This yields a mean square ratio of 5.06: highly significant as $F_{4,45}$. So the means do not seem to be equal. But special interests might dictate a more specific alternative hypothesis of a slippage type. Suppose at least four processes are to be used in the mass-production of the castings. A prudent policy might be to test for downward slippage in the mean of at most one

population. If we reject the null hypothesis in favour of this alternative it might be best to operate just four (rather than five) processes, omitting the one which seems to yield an inferior mean breaking strain.

Table 5.1

Process                 1        2        3        4        5
                       52       64       80       33       68
                       58       47       74       58       52
                       49       50       64       43       75
                       45       44       84       51       56
                       54       37       44       25       60
                       40       51       60       40       49
                       66       30       55       55       53
                       67       52       47       37       41
                       73       76       63       50       62
                       46       56       70       15       56
Means, $\bar{x}_j$        55.00    50.70    64.10    40.70    57.20
Variances, $s_j^2$       116.67   169.12   175.43   186.90    93.51

Employing (5.3.1) suitably modified for downward slippage we calculate

$$ T = \frac{m(\bar{x} - \bar{x}_{\min})}{\left[\sum_{j=1}^{n}\sum_{l=1}^{m}(x_{jl} - \bar{x})^2\right]^{1/2}} = \frac{10(53.54 - 40.70)}{(9674.42)^{1/2}} = 1.305 $$

where $\bar{x}_{\min}$ denotes the smallest of the sample means. Thus $T\sqrt{(mn - 1)/m} = 2.89$.
Table VIIIe on page 302 with $n = 5$ and $\nu = 45$ shows that this result is highly significant (the upper 1 per cent point is about 2.5) and it is sensible to exclude process 4.
(In passing we might note that process 3 supports an alternative hypothesis of upward slippage of the mean of one process, to almost the same degree. But this is less relevant to the practical interest described above. However, the two significant extreme means presumably account for such a highly significant F-ratio in the analysis of variance.)
Another case of special interest is in testing for slippage of the variance of one population on the basis of n equal-sized samples of m observations from normal distributions, $N(\mu_j, \sigma_j^2)$ $(j = 1, 2, \ldots, n)$. Irrespective of the values of the means we can test

$$ H: \sigma_j^2 = \sigma^2 \ (\text{unspecified}) \qquad (j = 1, 2, \ldots, n) $$

using the statistic, proposed by Cochran (1941),

$$ s^2_{\max} \Big/ \sum_{j=1}^{n} s_j^2 $$

where $s^2_{\max}$ is the largest of the (unbiased) sample variance estimates, $s_j^2$.
For downward slippage in the variance, that is where under $\bar{H}$: $\sigma_j^2 = \sigma^2$ $(j \neq i)$, $\sigma_i^2 = b\sigma^2$ $(0 < b < 1)$, we would use

$$ s^2_{\min} \Big/ \sum_{j=1}^{n} s_j^2 $$

where $s^2_{\min}$ is the smallest of the $s_j^2$.
The distributional properties of the test statistics $s^2_{\max}/\sum s_j^2$ and $s^2_{\min}/\sum s_j^2$ in the null cases have been quite widely studied (Cochran, 1941; Chandra Sekar and Francis, 1941; Eisenhart and Solomon, 1947; and Chambers, 1967, for the first statistic, and Doornbos, 1956, for the second statistic). Tables I and XXIV on pages 290-291 and 327 give approximate critical values for 5 per cent and 1 per cent tests in the two cases, respectively. These are reproduced from Eisenhart, Hastay, and Wallis (1947), and from Doornbos (1956).
Note. Table I is styled for testing discordancy in gamma samples. For present purposes it needs to be entered with $r = (m - 1)/2$.

Example 5.3 (continuation of Example 5.2). A sixth process yields a random sample of ten castings with breaking strains:

87, 39, 55, 30, 72, 77, 51, 44, 42, 94.

This gives $\bar{x}_6 = 59.10$, $s_6^2 = 481.88$. The value of the sample variance is disquieting. For the earlier five samples we accepted the homogeneity of their population variances after an appropriate test. Perhaps we should now conduct a test for upward slippage of the variance in at most one of the six populations. We consider the test statistic

$$ \frac{s^2_{\max}}{\sum_{j=1}^{6} s_j^2} = \frac{481.88}{1223.51} = 0.394 $$

From Table I with $n = 6$ and $r = 4.5$, we see that this is significant at the 5 per cent level and we would be advised to act cautiously in relation to process 6, in view of the manifest relative lack of consistency of the standard of castings it produces as reflected by the excess variability of their breaking strains. (Note that a test for downward slippage of variance does not show
against
process 5 as being significantly less variable than the others.)

H: CTJ= CT 2 (j# i) Having illustrated slippage tests for means and varìances in normal
samples we now proceed to a more systematic review of the nature and
CTf = bCT 2 (b >l) properties of such tests.
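The mean-slippage calculation of Example 5.2 can be reproduced with a short standard-library Python sketch. Two assumptions should be flagged: the per-process lists read Table 5.1 column by column, with the two rows of values that were spilled in the extracted table assigned so that the tabulated means and variances are reproduced (the order of values within a column does not affect the statistic); and the helper name `mean_slippage_statistic` is hypothetical, not taken from the text. The sketch only computes T and the scaled value $T\sqrt{(mn-1)/m}$; critical values still have to be read from the tables.

```python
import math
from statistics import fmean

# Breaking strains from Table 5.1, one list per process (m = 10 castings each).
# Assumption: rows 7 and 8 are reconstructed from values spilled in the
# extracted table; they reproduce the tabulated means and variances.
SAMPLES = {
    1: [52, 58, 49, 45, 54, 40, 66, 67, 73, 46],
    2: [64, 47, 50, 44, 37, 51, 30, 52, 76, 56],
    3: [80, 74, 64, 84, 44, 60, 55, 47, 63, 70],
    4: [33, 58, 43, 51, 25, 40, 55, 37, 50, 15],
    5: [68, 52, 75, 56, 60, 49, 53, 41, 62, 56],
}


def mean_slippage_statistic(samples, direction="down"):
    """Hypothetical helper: Paulson-type statistic for n equal samples of size m.

    'up'   -> T = m (xbar_max - xbar) / sqrt(total SS)
    'down' -> T = m (xbar - xbar_min) / sqrt(total SS)
    Returns (label of the extreme sample, T, T * sqrt((mn - 1) / m)).
    """
    groups = list(samples.values())
    n, m = len(groups), len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("all samples must have the same size m")
    pooled = [x for g in groups for x in g]
    grand_mean = fmean(pooled)
    # Sum of squares of all mn observations about the overall mean.
    total_ss = sum((x - grand_mean) ** 2 for x in pooled)
    means = {label: fmean(g) for label, g in samples.items()}
    if direction == "up":
        label = max(means, key=means.get)
        diff = means[label] - grand_mean
    elif direction == "down":
        label = min(means, key=means.get)
        diff = grand_mean - means[label]
    else:
        raise ValueError("direction must be 'up' or 'down'")
    t = m * diff / math.sqrt(total_ss)
    return label, t, t * math.sqrt((m * n - 1) / m)


label, t, scaled = mean_slippage_statistic(SAMPLES, "down")
print(f"process {label}: T = {t:.3f}, T*sqrt((mn-1)/m) = {scaled:.2f}")
```

Calling the same helper with `direction="up"` picks out process 3 with a scaled value of about 2.38, which is the upward slippage "to almost the same degree" remarked on in the example.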
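The variance-slippage statistics of Example 5.3 can be sketched the same way, again as a hedged standard-library illustration rather than a definitive implementation. The five earlier variances are taken from the summary row of Table 5.1, and the helper name `cochran_ratios` is hypothetical. It returns both $s_{\max}^2/\sum s_j^2$ (upward slippage) and $s_{\min}^2/\sum s_j^2$ (downward slippage); whether either is significant must still be judged from the tables, with Table I entered at $r = (m-1)/2$.

```python
from statistics import fmean, variance

# Unbiased sample variances of processes 1-5 (summary row of Table 5.1).
TABLE_VARIANCES = [116.67, 169.12, 175.43, 186.90, 93.51]
# Breaking strains of the sixth process (Example 5.3).
PROCESS_6 = [87, 39, 55, 30, 72, 77, 51, 44, 42, 94]


def cochran_ratios(variances):
    """Hypothetical helper: return (s2_max / sum s2_j, s2_min / sum s2_j)."""
    total = sum(variances)
    return max(variances) / total, min(variances) / total


m = len(PROCESS_6)
s2_6 = variance(PROCESS_6)          # unbiased estimate, divisor m - 1
u_up, u_down = cochran_ratios(TABLE_VARIANCES + [s2_6])
r = (m - 1) / 2                     # entry value for Table I

print(f"mean = {fmean(PROCESS_6):.2f}, variance = {s2_6:.2f}")
print(f"upward statistic = {u_up:.3f}, downward statistic = {u_down:.4f}, r = {r}")
```

The first line printed should agree with the $\bar{x}_6$ and $s_6^2$ quoted in the example, and the upward statistic with the value 0.394 referred to Table I.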
Slippage in the mean

The first formal study of the problem of slippage in the mean for samples from normal distributions is that of Paulson (1952b). This work remains a cornerstone for such study.

Concerned with n independent samples, each of size m, from normal distributions $N(\mu_j, \sigma^2)$ $(j = 1, 2, \ldots, n)$, Paulson adopts a multiple decision approach by defining n + 1 possible decisions $D_0, D_1, \ldots, D_n$. $D_0$ declares that all the means are equal. $D_i$ declares that $\mu_i = \mu + a$ $(a > 0)$, $\mu_j = \mu$ $(j \neq i)$. He seeks a statistical procedure which is in some sense optimal for judging which of the n + 1 decisions to take, given the sample data. Subject to the restrictions:

(i) that when $D_0$ is appropriate (that is, all means are equal) it will be selected with probability $(1 - \alpha)$,
(ii) that the decision procedure is invariant with respect to changes in the origin, or positive changes in the scale, of the measurement basis,
(iii) that the probability of adopting the correct decision, $D_i$, about slippage of the mean, does not depend on i,

a decision procedure is sought which maximizes the probability of making the correct decision when one population has slipped to the right in the sense described above.

It turns out that the optimal procedure is to use the statistic T of (5.3.1), concluding either that all the means are equal ($D_0$) if

$$T \le h_\alpha$$

or that the population yielding the largest sample mean has slipped ($D_i$) if

$$T > h_\alpha,$$

where i is the subscript of the sample yielding $\bar{x}_{\max}$ and $h_\alpha$ is some constant chosen to ensure that $P(T > h_\alpha) = \alpha$ in the null case. This procedure is optimal irrespective of the (unknown) values of a and σ.

Paulson (1952b) demonstrates that the procedure is the Bayes solution for a uniform prior distribution over $(D_0, D_1, \ldots, D_n)$; David (1970) modifies the proof by using the Neyman-Pearson lemma.

Operationally the determination of the critical values $h_\alpha$, for a procedure of size α, is vitally important. The precise null distribution of T is complicated, but Paulson uses a first-order Bonferroni inequality (see, for example, Feller, 1968, page 110) to approximate $h_\alpha$ by

$$h'_\alpha = \left[\frac{m(n-1)F_\alpha}{n(mn - 2 + F_\alpha)}\right]^{1/2} \qquad (5.3.3)$$

where $F_\alpha$ is the upper 2α/n point of the F-distribution with degrees of freedom $n_1 = 1$, $n_2 = mn - 2$. Equivalently, $F_\alpha = t_\alpha^2$, where $t_\alpha$ is the upper α/n point (that is, the two-sided 2α/n point) of the t-distribution with mn − 2 degrees of freedom. Paulson shows that (5.3.3) is conservative in that $P(T > h'_\alpha) \le \alpha$ in the null case, and that for large n the discrepancy from the true probability is less than $\alpha^2/2$. This is typical of the accuracy to which the size of slippage tests can be controlled. General results on this topic, with various applications, are presented by Doornbos (1966) and will be reviewed in Section 5.3.2 below.

There have been proposed many extensions of Paulson's result, new methods of proof and refined approximations to $h_\alpha$, and it is useful to summarize some of these.

Considering first of all the determination of $h_\alpha$, Quesenberry and David (1961) develop a computational procedure which is relevant to the calculation of approximate values of $h_\alpha$. Specifically they consider the distribution of $(x_{(n)} - \bar{x})/S$ for a normal random sample $x_1, \ldots, x_n$ from $N(\mu, \sigma^2)$ having sample mean $\bar{x}$ and where

$$S^2 = (n-1)s^2 + \nu s_\nu^2,$$

with $s^2$ as the unbiased sample variance estimate $\sum (x_i - \bar{x})^2/(n-1)$ and $s_\nu^2$ an independent estimate of $\sigma^2$ where $\nu s_\nu^2/\sigma^2 \sim \chi^2_\nu$. They use a similar Bonferroni-type approximation to that employed by Paulson (1952b). If $c_\alpha$ is the upper α-point of the distribution of $(x_{(n)} - \bar{x})/S$, they show that this approach leads to an exact value of $c_\alpha$ whenever $c_\alpha > [(n-2)/(2n)]^{1/2}$, which holds for low values of n and ν; they present tables of upper 5 per cent and 1 per cent points for n = 3(1)10, 12, 15, 20 and ν = 0(1)10, 12, 15, 20, 24, 30, 40, 50. For small n and ν these are exact to the 4 d.p. accuracy quoted; otherwise the lower and upper bounds yielded by their approach 'agree so well ... that only one value had to be tabulated'. Their tables are reproduced as Table 26a in Pearson and Hartley (1966) and, in modified form, are presented as Table VIIIc on page 302. On the relationship shown earlier in this section between the corresponding slippage and single outlier tests, we have that

$$h_\alpha = \sqrt{m}\, c_\alpha, \quad \text{with } \nu = n(m-1). \qquad (5.3.4)$$

Quesenberry and David (1961) discuss an interesting example on rocket motor ignitors in which they first apply the single outlier discordancy test based on the appropriate form of (5.3.1) (that is, the Pearson and Chandra Sekar test) to each of a set of samples, and then proceed to apply the mean slippage test to the set of samples.

The optimal multiple-decision procedure of Paulson (1952b) arises also as a special case of the general Bayesian decision-theoretic approach to slippage problems described by Karlin and Truax (1960; see also Section 5.3.2).

Kudo (1956b) shows that Paulson's procedure retains its optimality even if slippage in the mean of one population is accompanied by reduction in the variance of the 'slipped' population. Kapur (1957) demonstrates an unbiasedness property of the Paulson procedure when used to decide which
population mean, $\mu_i$, is largest (rather than to decide between the more restricted $D_i$ of the slippage model).

Certain modifications of aim or model need to be considered.

(i) Variance known. If $\sigma^2$ is known (unlikely as this may be) the slippage test reduces to use of $\sqrt{m}(\bar{x}_{\max} - \bar{x})/\sigma$ as test statistic, with rejection of the null hypothesis of equality of means in favour of upward slippage of $\mu_{\max}$ if

$$\sqrt{m}(\bar{x}_{\max} - \bar{x})/\sigma > \lambda_\alpha,$$

where the α-point can be found for α = 0.05, 0.01, from Table VIIc on page 299.

(ii) Slippage to the left. Clearly all that is needed is to replace $\bar{x}_{\max}$ in T by $\bar{x}_{\min}$, the lowest of the sample means, with rejection at level α if $T < -h_\alpha$.

(iii) Slippage in an unspecified direction. Kudo (1956a) points out that Paulson's optimum procedure is easily modified for this situation, yielding a decision procedure based on

$$T' = m \max_{j=1,2,\ldots,n} |\bar{x}_j - \bar{x}| \Big/ \left[\sum_{j=1}^{n}\sum_{i=1}^{m}(x_{ji} - \bar{x})^2\right]^{1/2} \qquad (5.3.5)$$

where if $T' \le h^*_\alpha$ we conclude that all means are equal; if $T' > h^*_\alpha$, that the mean of population M has slipped, where M is the index of the sample for which $|\bar{x}_j - \bar{x}|$ is a maximum. The critical values $h^*_\alpha$ can again be obtained from corresponding single sample tables. Table VIIId on page 303 presents, for α = 0.05, 0.01 and the same range of values of n and ν as for the one-sided test discussed above, approximate values of $h^*_\alpha\sqrt{(mn-1)/m}$. Thus $h^*_\alpha$ is obtained as $\sqrt{m/(mn-1)}$ times the entry for the appropriate number of samples, n, and for ν = n(m − 1).

(iv) Additional information about σ². If in addition to the set of samples we have an independent estimate of $\sigma^2$ in the form of a quantity $s_\eta^2$ where $\eta s_\eta^2/\sigma^2 \sim \chi^2_\eta$, then this can be profitably incorporated in T or T' by changing the denominator to

$$\left[\sum_{j=1}^{n}\sum_{i=1}^{m}(x_{ji} - \bar{x})^2 + \eta s_\eta^2\right]^{1/2}.$$

The appropriate $h_\alpha$ or $h^*_\alpha$ now correspond with ν = n(m − 1) + η.

(v) Unequal sample sizes. Suppose the samples are of sizes $m_1, m_2, \ldots, m_n$. This important prospect can be accommodated by modifying Paulson's procedure as follows. Put

$$y_i = m_i(\bar{x}_i - \bar{x}) \Big/ \left[\sum_{j=1}^{n}\sum_{l=1}^{m_j}(x_{jl} - \bar{x})^2\right]^{1/2}$$

and use as test statistic

$$T' = \max_{i=1,2,\ldots,n} y_i \quad \left(\text{or } \max_{i=1,2,\ldots,n} |y_i|\right) \qquad (5.3.6)$$

in place of T (or T'). Pfanzagl (1959) demonstrates that this procedure is locally optimum (for small a). New critical values will now be needed and do not appear to have been tabulated. Approximate values can be determined using, as before, Bonferroni-type inequalities (see Doornbos, 1966; Doornbos and Prins, 1958; also Section 5.3.2). Quesenberry and David (1961) suggest that if the $m_i$ are not too dissimilar we can obtain an approximate test by putting

and using in the numerator of the test statistic

$$\max_{i=1,2,\ldots,n}(\sqrt{m_i}\,\bar{x}_i - \bar{x})$$

with reference to the tabulated $h_\alpha$ (or $h^*_\alpha$) with $\nu = \sum_{i=1}^{n} m_i - n$. They do not examine the properties of this approximate test.

(vi) Other modifications. Kudo (1956a) examines a slippage problem in a rather special situation. We have three sets of data: in the first set observations arise at random from $m_1$ normal distributions $N(\mu_j, \sigma^2)$, in the second set $m_2$ observations arise from a common normal distribution $N(\mu', \sigma^2)$, in the third set $m_3$ observations arise from a common normal distribution $N(\mu'', \sigma^2)$, with all parameters unknown. No assumptions are made about $\mu''$ or $\sigma^2$, but the null hypothesis declares that $\mu_j = \mu'$ $(j = 1, 2, \ldots, m_1)$ whilst as an alternative hypothesis we postulate that one of the $\mu_j$ (in the first set) has slipped upward from $\mu'$. He derives a multiple decision procedure under similar restrictions and with a similar optimality property to those of Paulson's procedure. The decision rule utilizes the value of

$$(x_M - \bar{x})/S$$

where $x_M$ is the largest observation in the first set of data, $\bar{x}$ is the overall mean for the first two sets of data and $S^2$ is an overall sum of squares made up of the sum of squares about $\bar{x}$ of the observations in the first two sets of data, plus the sum of squares about its mean of the observations in the third set. No percentage points of the null distribution have been calculated for general $m_1, m_2, m_3$. Special cases (e.g. $m_2 = 0$, or $m_2 = m_3 = 0$) are of course easily handled in terms of single-sample outlier tests.

Karlin and Truax (1960) include in their general Bayesian study of slippage tests a particular case of normal samples where there are n − 1

samples of size m from $N(\mu_j, \sigma^2)$ $(j = 1, 2, \ldots, n-1)$ and an additional control sample (index 0) of size m from $N(\mu_0, \sigma^2)$. The basic (null) model postulates $\mu_j = \mu_0$ $(j = 1, 2, \ldots, n-1)$, whilst under an alternative model one of the $\mu_j$ has slipped to some value in excess of $\mu_0$. They show that when $\sigma^2$ is unknown the optimum (Bayes) procedure uses the statistic

$$\max_{j=1,2,\ldots,n-1}(\bar{x}_j - \bar{x}) \Big/ \left[\sum_{j=0}^{n-1}\sum_{i=1}^{m}(x_{ji} - \bar{x})^2\right]^{1/2}$$

(that is, of a form similar to (5.3.1) but where the control sample mean has been excluded from the maximization process).

Slippage in the variance

Paulson's multiple-decision approach has been applied by Truax (1953) to determine an optimal procedure for (upward) slippage in the variance of one of a set of normal distributions. Again we have n samples, each of size m, from normal distributions, $N(\mu_j, \sigma_j^2)$. If $D_0$ is the decision that

$$\sigma_j^2 = \sigma^2 \text{ (unspecified)} \quad (j = 1, 2, \ldots, n)$$

and $D_i$ the decision that

$$\sigma_j^2 = \sigma^2 \text{ (unspecified)} \quad (j \neq i), \qquad \sigma_i^2 = b\sigma^2 \quad (b > 1),$$

then with similar invariance assumptions to those employed by Paulson, and with the same optimality principle of maximizing the probability of correctly adopting $D_i$ $(i = 1, 2, \ldots, n)$ subject to $D_0$ being chosen with probability $1 - \alpha$ when no slippage has occurred, the optimal procedure turns out to be based on the statistic considered earlier by Cochran (1941),

$$U = s_{\max}^2 \Big/ \sum_{j=1}^{n} s_j^2. \qquad (5.3.7)$$

Here $s_{\max}^2$ is the largest of the unbiased sample variance estimates, $s_j^2$ $(j = 1, 2, \ldots, n)$. If $U \le d_\alpha$ we adopt $D_0$ and conclude that no slippage has taken place. If $U > d_\alpha$ we adopt $D_M$, concluding that the population yielding the largest sample variance $s_{\max}^2$ has slipped upwards in its variance in comparison with the rest. The optimality of the procedure is demonstrated, in the proof by Truax (1953), by considering the Bayes solution relating to a prior distribution over the $D_i$ which assigns probability $p < 1/n$ to $D_i$ $(i = 1, 2, \ldots, n)$ and probability $1 - np$ to $D_0$.

The determination of the percentage points, $d_\alpha$, has again been approached using Bonferroni-type inequalities. Cochran (1941) used this approach; some tables are published in Eisenhart, Hastay, and Wallis (1947). Table 31a in Pearson and Hartley (1966) presents the upper 5 per cent and 1 per cent points of U for n = 2(1)10, 12, 15, 20 and m = 2(1)11, 17, 37, 145, ∞, reproduced from Eisenhart, Hastay, and Wallis (1947). This table is presented as Table I on pages 290-291.

For unequal sample sizes, Pfanzagl (1959) derives a locally optimal procedure (b near 1) which uses the statistic

$$\max_{j=1,2,\ldots,n} (m_j - 1)\left[\frac{s_j^2}{s^2} - 1\right]$$

where $s^2 = \sum_{j=1}^{n}\sum_{i=1}^{m_j}(x_{ji} - \bar{x}_j)^2 \Big/ \left(\sum_{j=1}^{n} m_j - n\right)$, with rejection of the hypothesis of equality of the $\sigma_j^2$ for sufficiently large values of the statistic, but no tabulation appears to be available in easily accessible form.

For downward slippage (0 < b < 1) the corresponding statistic is

$$U' = s_{\min}^2 \Big/ \sum_{j=1}^{n} s_j^2$$

where $s_{\min}^2$ is the smallest sample variance. We conclude that slippage has occurred (in the population yielding $s_{\min}^2$) if U' is sufficiently small, and approximate lower 5 per cent and 1 per cent points of U' are tabulated in Doornbos (1956). These are reproduced as Table XXIV on page 327.

Further relevant results are to be found in discussion of slippage in gamma distributions. (See Doornbos, 1966; Doornbos and Prins, 1958; and Section 5.3.3.)

A quick variance slippage test based on ranges of equal-sized samples rejects the null hypothesis of equality of variances in favour of upward (downward) slippage of one variance if

$$w_{\max}\Big/\sum_{j=1}^{n} w_j \quad \left(\text{or } w_{\min}\Big/\sum_{j=1}^{n} w_j\right)$$

is sufficiently large (small). Here $w_{\max}$ ($w_{\min}$) is the largest (smallest) of the n sample ranges $w_j$. Bliss, Cochran, and Tukey (1956) give approximate upper 5 per cent points for $w_{\max}/\sum w_j$ in the null case, for n = 2(1)10, 12, 15, 20 and m = 2(1)10. These are reproduced as Table 31b in Pearson and Hartley (1966) (note the use of k for n, and n for m in this table).

5.3.2 General slippage tests

In his book entitled Slippage Tests, Doornbos (1966) reviews a general method for constructing parametric slippage tests based on earlier work by himself and Prins (Doornbos and Prins, 1958).

Independent random samples of sizes $m_1, m_2, \ldots, m_n$ arise from distributions within the same family, indexed by a (vector) parameter $\boldsymbol{\theta}$. One component, $\theta_1$, is of prime interest from the slippage viewpoint; the others are nuisance parameters. We want to test the working hypothesis

$$H: \theta_{11} = \theta_{12} = \cdots = \theta_{1n},$$

where $\theta_{1j}$ is the value that the parameter $\theta_1$ assumes in the population yielding the jth sample, against one- (or two-)sided slippage alternatives for
proach; some tables are published in Eisenhart, Hastay, and Wallis (1947). H: 811 = 812 · · · = 8ln•
Table 3la in Pearson and Hartley (1966) presents the upper 5 per cent and where 81i is the value that the parameter 81 assumes in the population
l per cent points of U for n= 2(1)10, 12, 15, 20 and m= 2(1)11, 17, 37, yielding the jth sample, against o ne- (or two-)sided slippage alternatives for
198 Outliers in statistical data
Outlying sub-samples: slippage tests 199

the parameter $\theta_1$. Thus we contemplate the prospect, if $H$ is untrue, that $\theta_{1i}$ may be larger than (smaller than, different from) some common value taken by $\theta_{1j}$ $(j \neq i)$. We consider a transformation of the original data where each sample is now represented by a single sufficient statistic $X_j$ $(j = 1, 2, \ldots, n)$, in such a way that the $X_j$ are identically distributed with a known distribution not depending on the nuisance parameters and, if $H$ is true, independent of $\theta_1$ also.

To test for slippage to the right of a single population we consider

$$d_j = P(X_j > x_j) \qquad (j = 1, 2, \ldots, n), \tag{5.3.8}$$

where $x_j$ is the observed value of $X_j$, and reject $H$ if

$$D = \min_j d_j \le \alpha/n. \tag{5.3.9}$$

If (5.3.9) holds we conclude that slippage has occurred in the population yielding the minimum value $D$.

For slippage to the left, or slippage in either direction, we consider also

$$e_j = P(X_j \le x_j) \qquad (j = 1, 2, \ldots, n) \tag{5.3.10}$$

and conclude that slippage has taken place if

$$E = \min_j e_j \le \alpha/n \tag{5.3.11}$$

or

$$\min(D, E) \le \alpha/(2n), \tag{5.3.12}$$

respectively.

To see that these proposals lead to tests of level $\alpha$ we need to study the probabilities of type I errors. Consider, for an arbitrary set of $n$ real numbers $u_1, u_2, \ldots, u_n$,

$$p_j = P(X_j \le u_j), \qquad q_j = P(X_j > u_j),$$

$$p_{jl} = P(X_j \le u_j,\ X_l \le u_l), \qquad q_{jl} = P(X_j > u_j,\ X_l > u_l) \qquad (j \neq l),$$

determined under $H$. Suppose $P$ is the probability that at least one of the $X_j$ does not exceed the corresponding $u_j$. If $A_j$ is the event $X_j \le u_j$, then

$$P = P(A_1 \cup A_2 \cup \cdots \cup A_n)$$

and by the first Bonferroni inequality (Feller, 1968, page 110)

$$\sum_{j=1}^{n} p_j - \sum_{j<l} p_{jl} \le P \le \sum_{j=1}^{n} p_j. \tag{5.3.13}$$

Equivalently,

$$\sum_{j=1}^{n} q_j - \sum_{j<l} q_{jl} \le Q \le \sum_{j=1}^{n} q_j, \tag{5.3.14}$$

where $Q$ is the probability that at least one of the $X_j$ exceeds the corresponding $u_j$.

Now if it happened that

$$p_{jl} \le p_j p_l \tag{5.3.15}$$

or, equivalently, that

$$q_{jl} \le q_j q_l, \tag{5.3.16}$$

we have immediately, from (5.3.13) and (5.3.14), that

$$\sum_{j=1}^{n} p_j - \sum_{j<l} p_j p_l \le P \le \sum_{j=1}^{n} p_j, \tag{5.3.17}$$

$$\sum_{j=1}^{n} q_j - \sum_{j<l} q_j q_l \le Q \le \sum_{j=1}^{n} q_j. \tag{5.3.18}$$

Aggregating the $p_j$ and $q_j$, and putting $p = \sum_{j=1}^{n} p_j$, $q = \sum_{j=1}^{n} q_j$, leads immediately to

$$p - \tfrac{1}{2}p^2 \le P \le p, \tag{5.3.19}$$

$$q - \tfrac{1}{2}q^2 \le Q \le q, \tag{5.3.20}$$

since $\sum_{j<l} p_j p_l = \tfrac{1}{2}\big(p^2 - \sum_j p_j^2\big) \le \tfrac{1}{2}p^2$ (and similarly for the $q_j$). These enable bounds to be placed on the probabilities of type I error for the general slippage tests. For if (in the case of continuous $X_j$) we choose values $u_{j\alpha}$ for the $u_j$ where

$$q_{j\alpha} = P(X_j > u_{j\alpha}) = \alpha/n, \tag{5.3.21}$$

then on the basis of the test described in terms of (5.3.8) and (5.3.9) we reject $H$ in favour of the alternative hypothesis of slippage to the right if at least one of the $x_j$ exceeds $u_{j\alpha}$ (for a common distribution of the $X_j$, as postulated in the structure above, all the $u_{j\alpha}$ will of course be equal). Thus $Q_\alpha$, the probability that at least one $X_j$ exceeds the corresponding $u_{j\alpha}$, is the probability of type I error, or significance level, of the test of slippage to the right. So from (5.3.20), with $q = \sum_j q_{j\alpha} = \alpha$,

$$\alpha - \tfrac{1}{2}\alpha^2 \le Q_\alpha \le \alpha, \tag{5.3.22}$$

and the test has level $\alpha$ and size which is not less than $\alpha - \alpha^2/2$. A similar result can be readily demonstrated for the test of slippage to the left.

Certain important points should be noted.

(i) If the $X_j$ are discrete, precise $\alpha$-points may not exist, but we can take $u_{j\alpha}$ in (5.3.21) as the smallest integer for which $q_{j\alpha} \le \alpha/n$. Then (5.3.22) holds with $\alpha$ replaced by $\alpha' = \sum_{j=1}^{n} q_{j\alpha}$.
(ii) We remarked on (5.3.22) in relation to the Paulson (1952b) test for slippage of a normal mean with large equal-sized samples. In fact, it holds under the above conditions even if the sample sizes differ and are not necessarily large.

(iii) (5.3.22) depends crucially (in the above proof) on the inequality (5.3.15), or (5.3.16), holding. Doornbos (1966) shows this to be true for the case of normal means (see also Quesenberry and David, 1961) and demonstrates its propriety for other parametric slippage tests (including normal variance tests and some others we discuss in Section 5.3.3 below). When (5.3.15) does not hold, the upper bound in (5.3.22) still prevails, so that the test has level $\alpha$. The lower bound does not, so that we are uncertain how close the size of the test is to the level $\alpha$. Doornbos (1966) discusses conditions for the validity of (5.3.15) in the continuous case.

Following a similar multiple decision approach to that advanced by Paulson (1952b) for normal means, and employed by Truax (1953) for normal variances, Doornbos (1966) shows that certain discrete slippage tests have a similar optimality. These include tests for slippage of means of Poisson distributions, and of proportions in binomial or negative binomial distributions. See below.

Karlin and Truax (1960) also present a rather general approach to slippage tests, with particular illustrative cases discussed in detail. Their approach is entirely Bayesian decision-theoretic, exhibiting optimum procedures in the form of Bayes solutions for loss structures which do not need to be specified in great detail. Some of their specific proposals are discussed elsewhere in this chapter. At this stage we consider briefly the methodological basis of their approach. They consider a one-parameter problem in which decision rules are required to be invariant with respect to permutation of the labels of the samples. The loss functions $h_i(\theta)$, relating to the taking of decision $\mathcal{D}_i$ when the values of the parameter $\theta$ for all populations are specified, are assumed to have certain desirable properties, including permutation invariance and reduced values when slippage occurs for population $i$. The special form of the Bayes solution when sufficient statistics exist is given particular attention. It amounts to accepting $\mathcal{D}_0$ (no slippage) unless the maximum discrepancy between the sample point $x_i$ and the MLE $\hat{\theta}$ of the parameter $\theta$ (under the assumption of no slippage) is particularly large. The use of a sample point $x_i$ highlights the transference from an initial statement of the problem in terms of samples from each of a set of populations to detailed study involving a single observed value from each population. In this respect (as remarked in Section 5.2) the Karlin and Truax (1960) Bayesian analysis is more in keeping with single outlier study than with slippage based on multi-observation samples, in spite of the authors' description of their work.

Karlin and Truax show that their symmetric invariant Bayes solutions are (largely) uniformly most powerful among symmetric invariant procedures with a prescribed probability of incorrectly adopting $\mathcal{D}_0$. In this part of their work a simple (0, 1) loss structure is employed.

5.3.3 Non-normal samples

We have implicitly considered non-normal samples in the test of slippage of variance based on normal samples described in Section 5.3.1. For if $s_j^2$ $(j = 1, 2, \ldots, n)$ are the sample variances based on samples of size $m$, they are, in the null case, independent observations from $\sigma^2\chi^2_{m-1}/(m-1)$. Thus in testing for slippage of a population variance we are testing for discordancy of an outlier in a $\chi^2$ (gamma) sample, against a slippage-type alternative hypothesis. It is but a minor modification to proceed to test for slippage of the scale parameter in a gamma distribution on the basis of a set of samples from gamma populations.

Gamma samples

Here, and throughout most of this section, we use or extend some of the results of Doornbos (1966).

Suppose we have $n$ samples, of sizes $m_j$ $(j = 1, 2, \ldots, n)$, from gamma distributions $\Gamma(r_j, \lambda_j)$ $(j = 1, 2, \ldots, n)$. Under the null hypothesis all the $r_j$ are equal, as are the $\lambda_j$, with common (unknown) values $r$ and $\lambda$ respectively. Under the alternative hypothesis of upward (downward) slippage of the scale parameter we have

$$\lambda_i = c\lambda \quad (i \text{ unspecified}), \qquad \lambda_j = \lambda \quad (j \neq i)$$

with $c > 1$ ($c < 1$). In view of the additivity property of the gamma distribution, the sample sums $t_1, t_2, \ldots, t_n$ are independent $\Gamma(m_j r, \lambda_j)$, and we can test for upward (downward) slippage by using a test statistic

$$\max_j t_j \Big/ \sum_{l=1}^{n} t_l \qquad \Big(\min_j t_j \Big/ \sum_{l=1}^{n} t_l\Big).$$

The null distribution of this test statistic is complicated. But following the general approach to slippage tests described in Section 5.3.2 we conclude (equivalently) that slippage has occurred, for a test with significance level $\alpha$, if

$$\min_j P\Big\{ t_j \Big/ \sum_l t_l > t_{j0} \Big/ \sum_l t_{l0} \Big\} \le \alpha/n$$

$$\Big(\min_j P\Big\{ t_j \Big/ \sum_l t_l \le t_{j0} \Big/ \sum_l t_{l0} \Big\} \le \alpha/n\Big)$$

where $t_{j0}$ is the observed value of the $j$th sample sum.
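These tail probabilities are easy to compute once their Beta form is recognised. The Python sketch below is an illustration under stated assumptions, not Doornbos's own procedure. It assumes the common shape $r$ is known, so that under $H$ the ratio $t_j/\sum_l t_l$ has a Beta$(m_j r, (M - m_j) r)$ distribution with $M = \sum_l m_l$. Because the Python standard library has no incomplete Beta function, it evaluates $I_x(a, b)$ with the standard continued-fraction (modified Lentz) method. The names `betainc_reg` and `gamma_slippage_test` are hypothetical.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete Beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        # Each iteration applies the even and then the odd term of the fraction.
        even = k * (b - k) * x / ((a + 2 * k - 1) * (a + 2 * k))
        odd = -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete Beta function I_x(a, b), cf. (5.3.26)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the fraction directly where it converges fast, else the symmetry I_x(a,b) = 1 - I_{1-x}(b,a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def gamma_slippage_test(sums, shapes, alpha=0.05, upward=True):
    """Bonferroni-type slippage test on gamma sample sums (hypothetical helper).

    sums   -- observed sample sums t_j0
    shapes -- gamma shape of each t_j under H (m_j * r, with r ASSUMED known)
    Returns (index of suspect population, its tail probability, reject H?).
    """
    total, total_shape, n = sum(sums), sum(shapes), len(sums)
    probs = []
    for t, a in zip(sums, shapes):
        # Under H, t_j / sum_l t_l ~ Beta(a, total_shape - a).
        cdf = betainc_reg(a, total_shape - a, t / total)
        probs.append(1.0 - cdf if upward else cdf)  # upper tail (upward) or lower tail (downward)
    j = min(range(n), key=probs.__getitem__)
    return j, probs[j], probs[j] <= alpha / n


# Usage with hypothetical data: sums of squares (m - 1) s_j^2 from five normal
# samples of size m = 10 are sigma^2 chi^2_9 variables, i.e. gamma with shape 4.5.
sums_of_squares = [8.1, 9.7, 7.5, 10.2, 38.6]
suspect, tail_prob, reject = gamma_slippage_test(sums_of_squares, [4.5] * 5)
```

With equal sample sizes the upward test ranks the samples exactly as Cochran's $U$ in (5.3.7) does, so the usage lines amount to a Bonferroni version of Cochran's test. The design choice of working with tail probabilities rather than tabulated percentage points is what allows unequal $m_j$.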
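The level and size bounds used for these tests, (5.3.20) to (5.3.22), can be illustrated numerically in the boundary case where the $X_j$ are independent, so that (5.3.16) holds with equality and, for continuous $X_j$, each $d_j$ is uniform on (0, 1) under $H$. The exact size of the rule "reject if $\min_j d_j \le \alpha/n$" is then $1 - (1 - \alpha/n)^n$. The minimal Python sketch below compares this with the bounds and with a Monte Carlo estimate; the independence assumption and the function names are illustrative only (in the gamma and Poisson tests the $X_j$ are dependent through the common total).

```python
import random


def exact_size_independent(alpha: float, n: int) -> float:
    """Exact size of 'reject if min_j d_j <= alpha/n' for n independent continuous X_j."""
    return 1.0 - (1.0 - alpha / n) ** n


def bonferroni_bounds(alpha: float) -> tuple:
    """Lower and upper bounds of (5.3.22): alpha - alpha**2/2 <= Q_alpha <= alpha."""
    return alpha - alpha ** 2 / 2.0, alpha


def simulated_size(alpha: float, n: int, reps: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same size, drawing each d_j as Uniform(0, 1) under H."""
    rng = random.Random(seed)
    hits = sum(
        min(rng.random() for _ in range(n)) <= alpha / n
        for _ in range(reps)
    )
    return hits / reps


alpha, n = 0.05, 5
lower, upper = bonferroni_bounds(alpha)
print(lower, exact_size_independent(alpha, n), upper, simulated_size(alpha, n))
```

For $\alpha = 0.05$ and $n = 5$ the exact size is $1 - 0.99^5 \approx 0.0490$, inside the interval $[0.04875, 0.05]$ given by (5.3.22).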

This test has level $\alpha$ on the usual Bonferroni-type inequality argument and, again, size within $\alpha^2/2$ of $\alpha$ (for details see Doornbos, 1966; Doornbos and Prins, 1956). The probabilities $p(x) = P(t_j/\sum_l t_l > x)$ (or $q(x) = 1 - p(x)$) can of course be expressed in terms of incomplete Beta functions and evaluated from published tables or nomograms (Pearson, 1968; Pearson and Hartley, 1966, Table 17).

Doornbos (1966) also considers slippage tests for some discrete distributions.

Poisson samples

We have $n$ samples of sizes $m_j$ $(j = 1, 2, \ldots, n)$ from Poisson distributions $P(\mu_j)$ with means $\mu_j$ $(j = 1, 2, \ldots, n)$. The sample totals $t_j$ have Poisson distributions $P(m_j\mu_j)$, and the null hypothesis $H$ of equality of the $\mu_j$ (at a value $\mu$) is equivalent to declaring specific values for the ratios of the means of the $t_j$. Under $H$ the $t_j$ constitute independent single observations from Poisson distributions $P(\mu_j^*)$, where $\mu_j^* = m_j\mu$, so that

$$\mu_j^* \Big/ \sum_{l=1}^{n} \mu_l^* = m_j/M = \pi_j \tag{5.3.23}$$

(say), where $M = \sum_{l=1}^{n} m_l$. Note that this structure arises also from observing numbers of events in different time intervals in independent Poisson processes where, as a working hypothesis, we declare that the processes all have the same rate.

The alternative hypothesis of upward (downward) slippage of one of the means $\mu_j$ becomes transformed to

$$\mu_i^* \Big/ \sum_l \mu_l^* = c\pi_i \quad (i \text{ unspecified}), \qquad \mu_j^* \Big/ \sum_l \mu_l^* = \frac{1 - c\pi_i}{1 - \pi_i}\,\pi_j \quad (j \neq i) \tag{5.3.24}$$

with $1/\pi_i > c > 1$ ($0 < c < 1$).

Since the joint distribution of $t_1, t_2, \ldots, t_n$ conditional on the sum $t = \sum_l t_l$ is multinomial, a simple conditional slippage test is easily constructed. Given $t$, each $t_j$ has a binomial distribution, so under $H$ we can readily express

$$d_j = P(t_j \ge t_{j0} \mid t) = \sum_{k=t_{j0}}^{t} \binom{t}{k} \pi_j^k (1 - \pi_j)^{t-k} = I_{\pi_j}(t_{j0},\, t - t_{j0} + 1) \tag{5.3.25}$$

where $I_p(a, \beta)$ is the incomplete Beta function satisfying

$$\frac{\Gamma(a)\Gamma(\beta)}{\Gamma(a+\beta)}\, I_p(a, \beta) = \int_0^p u^{a-1}(1-u)^{\beta-1}\, du. \tag{5.3.26}$$

So on the now familiar argument we again reject $H$ at level $\alpha$ in favour of upward slippage of one of the Poisson means if

$$\min_j d_j \le \alpha/n,$$

concluding that the population yielding the minimum probability $d_j$ is the one that has slipped. (For downward slippage we employ the corresponding argument in terms of $e_j = P(t_j \le t_{j0} \mid t)$.)

Note that the probability calculations involve the well-tabulated incomplete Beta function (or tail probabilities for the binomial distribution). The test can obviously be applied in an unconditional form for testing slippage in multinomial populations.

When the sample sizes $m_j$ are equal, the $\pi_j$ will also have equal values, the minimum value of $d_j$ occurs at the maximum value of $t_{j0}$, and we will reject $H$ if this value is sufficiently large. Table XXV (from Doornbos, 1966) gives corresponding (approximate) critical values for 5 per cent and 1 per cent upward slippage tests for $n = 2(1)10$ and $t = 2(1)25$, together with test sizes (which in view of the discreteness of the data may differ noticeably from the significance levels). The upper limit of 25 on the value of $t$ is of course a serious practical restriction on the use of Table XXV (page 328).

Binomial samples

Sets of $m_j$ independent Bernoulli observations arise from $n$ populations where the probabilities of 'success' are $p_j$ $(j = 1, 2, \ldots, n)$. The respective numbers of successes are $r_1, r_2, \ldots, r_n$. The null hypothesis $H$ declares that all $p_j$ are equal. Under the alternative hypothesis of upward (downward) slippage of one of the $p_j$ we have

$$p_i = cp \quad (i \text{ unspecified}), \qquad p_j = p \quad (j \neq i)$$

with $1 < c \le 1/p$ ($0 < c < 1$). The slippage test is again of a conditional form, based now on the hypergeometric distribution. Under $H$, given the total number of successes $r = \sum_l r_l$, we have

$$P(r_j = k \mid r) = \binom{m_j}{k}\binom{M - m_j}{r - k} \Big/ \binom{M}{r}$$

where $M = \sum_l m_l$, and we can determine the tail probabilities

$$d_j = P(r_j \ge r_{j0} \mid r), \qquad e_j = P(r_j \le r_{j0} \mid r),$$

and hence conclude, at level $\alpha$, that upward (downward) slippage has occurred in the population yielding $\min_j d_j$ ($\min_j e_j$) if

$$\min_j d_j \le \alpha/n \qquad \Big(\min_j e_j \le \alpha/n\Big).$$

Tables of hypergeometric probabilities for the determination of critical values can be found in Owen (1962) up to $M = 20$, or in more extensive form in Lieberman and Owen (1961).

Again, if the sample sizes $m_j$ are equal, detection of the slipped population, and assessment of its discordancy, are based simply on $\max_j r_{j0}$ ($\min_j r_{j0}$). See Tests B1 and B2 in Chapter 3 and associated tables.

Doornbos (1966) also considers a slippage test for negative binomial samples.

5.3.4 Group parametric slippage tests

We may wish to test if a group of $k$ $(> 1)$ out of $n$ populations have slipped, based on samples of $m_j$ observations from the $j$th population $(j = 1, 2, \ldots, n)$. Doornbos (1966) extends the Bonferroni-type inequality approach to this more general situation. We lose the facility of setting a lower bound to the size of the slippage test but can obtain a straightforward (if on occasions somewhat time-consuming) slippage test of a given significance level.

The approach is applied to derive specific tests for sets of normal, gamma, Poisson, binomial (and other) samples. We shall illustrate it only for the normal case with upward slippage in the mean.

We have $n$ normal samples of sizes $m_1, m_2, \ldots, m_n$ from distributions with means $\mu_1, \mu_2, \ldots, \mu_n$ and common unknown variance $\sigma^2$. Under $H$, $\mu_j = \mu$ (unspecified), whilst under $\bar{H}$ we have

$$\mu_i > \mu \quad (i \in I, \text{ unspecified}), \qquad \mu_j = \mu \quad (j \notin I)$$

where $I$ is some subset of subscripts: $I = (i_1, i_2, \ldots, i_k)$.

We combine the data into two sets, one consisting of the $k$ samples with total sample size $M_I$, the other of the $n - k$ samples with total sample size $M - M_I$, where $M = \sum_l m_l$, corresponding with a particular choice of $I$. For any $I$ we determine the critical level for the usual one-sided two-sample $t$-test for equality of two normal means against the alternative that the first sample comes from a distribution with larger mean. Let the critical level be $d_I$. We repeat this calculation for all $\binom{n}{k}$ choices of the set $I$ and reject $H$ in favour of $\bar{H}$ if

$$\min_I d_I \le \alpha \Big/ \binom{n}{k}.$$

Such a test is easily seen to have significance level $\alpha$: the probability of incorrectly rejecting $H$ is less than or equal to $\alpha$. Note, however, that with $n = 20$, $k = 3$ we would have to determine 1140 values $d_I$!

A more specific group test for normal samples of equal size $m$, with $k = 2$ but where the alternative hypothesis specifies slippage of one mean upwards and the other downwards, is described by Ramachandran and Khatri (1957). It amounts to considering a test statistic

$$(\bar{x}_{\max} - \bar{x}_{\min}) \Big/ \Big[\sum_{j=1}^{n}\sum_{i=1}^{m} (x_{ji} - \bar{x})^2\Big]^{1/2}$$

and concluding that slippage has occurred if this statistic is sufficiently large, slippage then being attributed to the two populations yielding $\bar{x}_{\min}$ and $\bar{x}_{\max}$. The test has the Paulson-type multiple decision optimality, but critical values do not seem to have been published.

5.4 OTHER SLIPPAGE WORK

In the earlier sections of this chapter we have reviewed and categorized the published work on slippage tests, viewed as a natural generalization of tests of discordancy for outliers. Not all outlier considerations have been paralleled in the slippage ('outlying sub-sample') context. The absence of detailed comment on multivariate slippage, accommodation of slipped populations in the framework of testing or estimating parameters, and effects of slippage in studies of structured models such as regression or designed experiments all reflect the apparent lack of much published work in these areas. (Although, in a general sense, the vast heritage of linear model study utilizing analysis of variance techniques can be viewed, in the case of replicated observations, as concerned with slippage of the means of a set of populations.)

Some isolated topics under the above headings merit brief mention.

Karlin and Truax (1960) include comment on multivariate slippage in their Bayesian decision-theoretic examination of slippage problems, but they are concerned more particularly with samples of size 1 (outliers rather than
sample size Mr, the other of n - k samples with total sample size M- Mr, slipped samples).
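Returning briefly to the Bonferroni-type group slippage test of Section 5.3.4, the procedure can be sketched in a few lines of Python. This is an illustration under our own assumptions, not Doornbos's (1966) own formulation. Each critical level $d_I$ is taken to be the one-sided p-value of the pooled-variance two-sample $t$-test, comparing the $k$ combined samples in $I$ against the remaining $n - k$. The Student $t$ tail area is computed by Simpson's rule so that no statistics library is needed. The names `t_upper_tail` and `group_slippage_test` are hypothetical.

```python
import math
from itertools import combinations

import numpy as np


def t_upper_tail(t, df, steps=4000):
    """P(T > t) for Student's t on df degrees of freedom (Simpson's rule)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - 0.5 * (df + 1) * math.log1p(x * x / df))

    a = abs(float(t))
    h = a / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3            # integral of the density over [0, |t|]
    tail = 0.5 - central if t >= 0 else 0.5 + central
    return min(1.0, max(0.0, tail))    # guard against rounding outside [0, 1]


def group_slippage_test(samples, k, alpha=0.05):
    """Bonferroni-type test for upward slippage of some k of n normal means.

    Returns (reject, I_min, d_min): d_min is the smallest critical level d_I
    over all C(n, k) subsets I, and I_min is the subset attaining it.
    """
    samples = [np.asarray(s, dtype=float) for s in samples]
    n = len(samples)
    n_sets = math.comb(n, k)
    best_d, best_I = math.inf, None
    for I in combinations(range(n), k):
        g1 = np.concatenate([samples[j] for j in I])            # size M_I
        g2 = np.concatenate([samples[j] for j in range(n) if j not in I])
        df = g1.size + g2.size - 2
        # pooled estimate of the common variance sigma^2
        s2 = (((g1 - g1.mean()) ** 2).sum() + ((g2 - g2.mean()) ** 2).sum()) / df
        t = (g1.mean() - g2.mean()) / math.sqrt(s2 * (1 / g1.size + 1 / g2.size))
        d = t_upper_tail(t, df)                                  # one-sided p-value
        if d < best_d:
            best_d, best_I = d, I
    return best_d <= alpha / n_sets, best_I, best_d
```

The loop makes the computational burden explicit. With $n = 20$ and $k = 3$ it visits `math.comb(20, 3)` = 1140 subsets, the count quoted in the text.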

Naik (1972) has considered a Bayesian analysis of 'contaminated samples', in which each of $n$ samples arises either from a population $\mathcal{P}$ or from a population $\mathcal{P}^*$. The populations $\mathcal{P}$ and $\mathcal{P}^*$ are from a common family of populations, distinguished only by different values $\theta$ and $\theta^*$ of some important parameter. The model is typified by a system in which certain samples correspond with 'good runs', for which the parameter value is $\theta$, and the others with 'bad runs', for which the parameter value is $\theta^*$. An event $a^{(r)}$, with probability $p^{(r)}$, is defined under which a particular $r$ of the $n$ samples arise from good runs. Starting with a non-informative prior distribution for $(a^{(r)}, \theta, \theta^*)$, its posterior distribution is determined.

In the special cases considered in detail, the populations are either normal or exponential. Squared error loss functions for $\theta$ are assumed in the process of estimating the (uncontaminated) value $\theta$ a posteriori. For the normal case three situations are discussed. $\mathcal{P}$ is $N(\mu, \sigma^2)$ and $\mathcal{P}^*$ is $N(\mu^*, \sigma^{*2})$, and it is assumed either that

$$\mu^* = \mu + a, \qquad \sigma^* = \sigma$$

where $a$ is either known or unknown, or else that $\mu^*$ and $\sigma^*$ are arbitrarily different from $\mu$ and $\sigma$. For $\mu^* = \mu + a$, $\sigma^* = \sigma$ with $a$ specified, a numerical example is presented for $n = 3$ and equal sample sizes $m_j = 10$ ($j = 1, 2, 3$). Parallel situations for exponential samples are also considered.

This work by Naik (1972) is an example of the accommodation of slipped samples in the estimation of crucial parameters in the unslipped population.

A sequential approach to testing for upward slippage in the mean of one of $n$ normal populations is proposed by Srivastava (1973). The normal distributions have common unknown variance. Observations are taken one by one from each population, and on this basis the aim is to decide which decision $\mathcal{D}_i$ to adopt, where under $\mathcal{D}_i$ ($i = 1, 2, \ldots, n$)

$$\mu_i = \mu + a, \qquad \mu_j = \mu \quad (j \ne i)$$

with $\mu$ known and $a > 0$ specified. Under $\mathcal{D}_0$ no slippage has occurred:

$$\mu_j = \mu \quad (j = 1, 2, \ldots, n).$$

An asymptotically efficient sequential procedure is presented for prescribed $\alpha$ and $\beta$. Under it, the probability of correctly adopting $\mathcal{D}_0$ exceeds $1 - \alpha$, and the probability of correctly adopting $\mathcal{D}_i$ ($i \ne 0$) exceeds $1 - \beta$.

A numerical study is presented, for small $n = 2(2)6$, of the error probabilities and expected sample numbers for the case $\alpha = \beta = 0.05$. Both depend strongly on the value of $\Delta = \sigma/a$. $\Delta$ needs to be quite large for the error probabilities to become close to $\alpha$ and $\beta$, that is, for the 'size' of the procedure to approach its 'level'. When $n = 2, 4, 6$ we need $\Delta$ to exceed about $n/2$ to obtain error probabilities in excess of about 0.046.

This sequential procedure, whatever its potential efficiency, has crucial practical limitations. It requires $\mu$ and $a$ to be known, and considerable computational effort is needed to carry it out.

A sequential procedure utilizing an independent estimate of $\sigma^2$ is described by Paulson (1962).

An 'optimal' distribution-free slippage test based on normal scores rather than ranks is described by Paulson (1961).
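The flavour of Naik's (1972) 'good run'/'bad run' model can be conveyed by a much simplified sketch. It is our own assumption-laden illustration, not Naik's analysis, and the names `normal_loglik` and `contamination_posterior` are hypothetical. Naik places a non-informative prior on the unknowns; here, by contrast, $\mu$, $\sigma$ and $a$ are all taken as known. $\mathcal{P}^*$ is $N(\mu + a, \sigma^2)$, and each of the $2^n$ good/bad labellings of the samples receives equal prior probability.

```python
import math
from itertools import product

import numpy as np


def normal_loglik(x, mean, sd):
    """Log-likelihood of an i.i.d. N(mean, sd^2) sample x."""
    x = np.asarray(x, dtype=float)
    return float(-0.5 * np.sum(((x - mean) / sd) ** 2)
                 - x.size * math.log(sd * math.sqrt(2 * math.pi)))


def contamination_posterior(samples, mu, sigma, a):
    """Posterior probability of each good/bad labelling of the n samples.

    A labelling is a tuple c with c[j] = 1 if sample j comes from a 'bad run'
    (N(mu + a, sigma^2)) and 0 if from a 'good run' (N(mu, sigma^2)).
    The prior is uniform over all 2^n labellings (an assumption of this sketch).
    """
    good = [normal_loglik(s, mu, sigma) for s in samples]
    bad = [normal_loglik(s, mu + a, sigma) for s in samples]
    labellings = list(product((0, 1), repeat=len(samples)))
    logp = np.array([sum(b if c_j else g for c_j, g, b in zip(c, good, bad))
                     for c in labellings])
    w = np.exp(logp - logp.max())      # subtract the max to avoid underflow
    return dict(zip(labellings, w / w.sum()))
```

Summing the returned probabilities over labellings with exactly $r$ zeros gives the posterior probability that $r$ of the samples come from good runs, under these simplifying assumptions. The test uses simulated data in the $n = 3$, $m_j = 10$ configuration of Naik's example, not his data.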

CHAPTER 6

Outliers in Multivariate Data

Tests of discordancy of outliers have as much relevance and importance for multivariate data as they do for univariate samples. Many factors carry over immediately. The two basic notions, of an outlier as an observation which engenders surprise owing to its extremeness, and of its discordancy in the sense of that 'extremeness' being statistically unreasonable in terms of some basic model, are not constrained by the dimensionality of the data. But their expression is by no means as straightforward in more than one dimension. Any formal, or indeed even subjective, idea of extremeness is obtuse. As Gnanadesikan and Kettenring (1972) remark, the multivariate outlier no longer has a simple manifestation as the observation which 'sticks out at the end' of the sample. But, notably in bivariate data, we may still perceive an observation as suspiciously aberrant from the data mass, particularly so if the data is represented in the form of a scatter diagram.

The idea of extremeness inevitably arises from some form of 'ordering' of the data. Unfortunately, no unique unambiguous form of total ordering is possible for multivariate data, although different types of sub- (less than total) ordering principle may be defined and employed. Barnett (1976) surveys the role of sub-ordering in multivariate analysis. Thus in attempting to express mathematically the subjective stimulus to the declaration of an outlier in a multivariate sample we will have to employ some sub-ordering principle.

Suppose we were to represent a multivariate observation $x$ by means of some univariate metric, or distance measure,

$$R(x; x_0, \Gamma) = (x - x_0)'\Gamma^{-1}(x - x_0)$$

where $x_0$ reflects the location of the data set or underlying distribution, and $\Gamma^{-1}$ applies a differential weighting to the components of the multivariate observation related to their scatter or to the population variability. For example, $x_0$ might be the zero vector $0$, or the true mean $\mu$, or the sample mean $\bar{x}$. Similarly, $\Gamma$ might be the variance-covariance matrix $V$ or its sample equivalent $S$, depending on the state of our knowledge about $\mu$ and $V$. One means of ordering the sample $x_1, x_2, \ldots, x_n$, termed reduced ordering, is achieved by ordering the values $R_j(x_0, \Gamma) = R(x_j; x_0, \Gamma)$. This may be employed as a framework on which to express the extremeness of certain members of the sample. Needless to say, we will (except in special circumstances) be sacrificing certain information on the multivariate structure by employing such a reduction of the data. This would be true also if we order in terms of, say, the first principal component, or restrict attention to marginal ordering based on some favoured single component of $x$.

Ideally we would hope that an appropriate sub-ordering principle would implicitly emerge in outlier analysis from a specification of a basic model, an alternative hypothesis, and a statistical test principle. But as with much of the work on univariate outliers, this ideal is not often realized. Again, tests for discordancy have emerged from 'intuitively reasonable' test statistics and criteria, which may indeed be more 'reasonable' in one context than in some other. We must acknowledge the degree of arbitrariness which arises from having to employ sub-ordering, rather than the unattainable total ordering, as the framework for expressing extremeness. But this need not always be too serious. For example, when the basic model is multivariate normal we find that reduced ordering of the distances

$$R(x; \mu, V) = (x - \mu)'V^{-1}(x - \mu)$$

has substantial appeal in terms of probability ellipsoids (an appeal less evident for non-normal data). It also arises naturally from a likelihood ratio approach to outlier discordancy tests.

Once we have set up some pragmatic test statistic for a test of discordancy of a multivariate outlier, no fundamental obstacle remains to its application. The only problems are computational and manipulative. Appropriate percentage points must be determined and tabulated, in the null distribution of the test statistic for assessment of significance, or in the non-null distribution as an expression of the power of the test. These problems can be severe. Manipulation of multivariate distributions is notoriously complicated, and it is little wonder that few detailed calculations have been published.

In this chapter we shall be examining, and illustrating, the rather sparse amount of material extant for testing discordancy of multivariate outliers. Many proposals remain in qualitative rather than quantitative form. Even so, it is informative to consider some of the recent qualitative proposals for outlier assessment which rest on various forms of graphical representation of the data. The dual problem of accommodation of outliers in robust multivariate inference procedures is correspondingly ill-represented in the literature. A few comments on this topic appear later, in Section 6.3.

6.1 OUTLIERS IN MULTIVARIATE NORMAL SAMPLES

As may be anticipated, most of the sparse study of outliers in multivariate data deals either with the case of an underlying normal distribution, or is
non-specific in relation to the form of the basic model. We maintain this distinction by considering first the results available for normal samples. In the subsequent section we examine the less formal proposals that have been made for data from a probabilistically unspecified source.

Suppose $x_1, x_2, \ldots, x_n$ is a sample of $n$ observations of a $p$-component normal random variable $X$. We initially assume these to have arisen at random from a $p$-dimensional normal distribution, $N(\mu, V)$, where $\mu$ is the $p$-vector of means and $V$ the $p \times p$ variance-covariance matrix. A possible alternative model which would account for a single outlier is the slippage alternative. It is obtained as a multivariate adaptation of the univariate models A (slippage of the mean) and B (slippage of the variance) discussed by Ferguson (1961a); see Section 2.3. Specifically, the alternative hypotheses are:

model A: $E(X_i) = \mu + a$ (some $i$), $E(X_j) = \mu$ ($j \ne i$), with variance-covariance matrix $V(X_j) = V$ ($j = 1, 2, \ldots, n$);

model B: $V(X_i) = bV$ (some $i$) ($b > 1$), $V(X_j) = V$ ($j \ne i$), with mean vector $E(X_j) = \mu$ ($j = 1, 2, \ldots, n$).

We shall consider later (in Chapter 8) the Bayesian analysis of model A presented by Guttman (1973b); reference has already been made to the corresponding work of Karlin and Truax (Section 5.4).

For the present we adopt a more traditional viewpoint, in considering a test of discordancy based on the two-stage maximum likelihood ratio principle as explained in Section 3.1. We consider models A and B separately, with and without the assumption that parameter values are known.

Model A, V known

On the basic model the likelihood of the sample $x_1, x_2, \ldots, x_n$ is proportional to

$$P_{\mu}(x \mid V) = |V|^{-n/2}\exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n}(x_j - \mu)'V^{-1}(x_j - \mu)\right\}. \qquad (6.1.1)$$

The maximized log-likelihood is (apart from the constant factor)

$$L(x \mid V) = -\tfrac{1}{2}\sum_{j=1}^{n}(x_j - \bar{x})'V^{-1}(x_j - \bar{x}). \qquad (6.1.2)$$

Under the alternative (model A) hypothesis of a single outlier the corresponding maximized log-likelihood is

$$L_A(x \mid V) = -\tfrac{1}{2}\sum_{j \ne i}(x_j - \bar{x}')'V^{-1}(x_j - \bar{x}') \qquad (6.1.3)$$

where $\bar{x}'$ is the sample mean of the $(n - 1)$ observations excluding $x_i$, and $i$ is chosen to maximize

$$L_A(x \mid V) - L(x \mid V).$$

Thus we are led to declare as the outlier $x_{(n)}$ that observation $x_i$ for which $R_i(\bar{x}, V)$ is a maximum. Implicitly, then, the observations have been ordered in terms of the reduced form of sub-ordering based on the distance measure $R(x; \bar{x}, V)$. Furthermore, we will declare $x_{(n)}$ a discordant outlier if

$$R_{(n)}(\bar{x}, V) = (x_{(n)} - \bar{x})'V^{-1}(x_{(n)} - \bar{x}) = \max_{j=1,\ldots,n} R_j(\bar{x}, V)$$

is significantly large.

The null distribution of $R_{(n)}(\bar{x}, V)$ is neither readily determined in exact form nor very tractable. However, it has been studied by Siotani (1959), who discusses the problems associated with determining percentage points of $R_{(n)}(x_0, \Gamma)$ when $\Gamma = V$ and $x_0$ is either $0$, $\mu$ or $\bar{x}$. The last case is of immediate relevance to us at the present stage of the discussion. For it he presents approximate upper 5, 2½ and 1 per cent points of $R_{(n)}(\bar{x}, V)$ for $p = 2(1)4$ and $n = 3(1)10(2)20(5)30$. The critical values for 5 per cent and 1 per cent tests of discordancy of a single outlier in a multivariate normal sample where $V$ is known are reproduced as Table XXVI on page 329.

If $\mu$ were known, the corresponding $R_j(\mu, V)$ would be independent $\chi^2_p$ variates. We would then have to relate their maximum $R_{(n)}(\mu, V)$ to the distribution of the maximum observation in a random sample of size $n$ from a $\chi^2_p$ distribution.

Gupta (1960) has considered the distribution of the order statistics from gamma samples, and has tabulated percentage points for a range of sample sizes and values of the two parameters. (Note, however, that the last six lines of some of the tables in Gupta (1960) are incorrect; they are revised in the 'Errata' section of the journal Technometrics, 1960, 2, 523.) Suitably extracted and modified values serve for the outlier problem with $\mu$ and $V$ known. Table XXVII on page 329 presents upper 5 per cent and 1 per cent points of $R_{(n)}(\mu, V)$ for $p = 2(2)10$ and $n = 3(1)10$, 25, 50, 100, 200, 500, 1000. Note that only even values of $p$ ($\ge 2$) are accessible from this table.

In the particular case of a bivariate sample ($p = 2$), $R_{(n)}(\mu, V)/2$ has the distribution of the maximum of $n$ independent exponential variates (mean 1), and its percentage points are easily determined. For a level-$\alpha$ test we would conclude that $x_{(n)}$ (the observation $x_j$ yielding $R_{(n)}(\mu, V)$) is a discordant outlier if $R_{(n)}(\mu, V) > \xi_\alpha$, where

$$\alpha = P\{R_{(n)}(\mu, V) > \xi_\alpha\} = 1 - \{F(\xi_\alpha/2)\}^n \qquad (6.1.4)$$

with

$$F(x) = 1 - e^{-x}.$$

Thus

$$\xi_\alpha = -2\ln\{1 - (1 - \alpha)^{1/n}\} \qquad (6.1.5)$$

provides an explicit value for use in the test.
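Equation (6.1.5), and its extension to other even $p$, can be evaluated directly. In the Python sketch below the function names are our own. `xi_alpha_bivariate` implements (6.1.5). For general even $p$, `xi_alpha` solves $\{F_p(\xi)\}^n = 1 - \alpha$ by bisection, where $F_p$ is the $\chi^2_p$ distribution function. It uses the standard closed form $F_p(x) = 1 - e^{-x/2}\sum_{k=0}^{p/2-1}(x/2)^k/k!$, which holds only for even $p$. We would expect the results to match Table XXVII up to rounding, but we have not checked them against it.

```python
import math


def xi_alpha_bivariate(alpha, n):
    """Critical value (6.1.5) for R_(n)(mu, V) when p = 2."""
    return -2.0 * math.log(1.0 - (1.0 - alpha) ** (1.0 / n))


def chi2_cdf_even(x, p):
    """P(chi^2_p <= x) for even p, via the finite Poisson-sum identity."""
    if p <= 0 or p % 2:
        raise ValueError("p must be a positive even integer")
    h = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, p // 2):
        term *= h / k                  # (x/2)^k / k!
        total += term
    return 1.0 - math.exp(-h) * total


def xi_alpha(alpha, n, p, tol=1e-10):
    """Upper alpha point of the maximum of n independent chi^2_p variates (even p)."""
    target = (1.0 - alpha) ** (1.0 / n)   # F_p(xi) must equal this value
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, p) < target:  # bracket the root
        hi *= 2.0
    while hi - lo > tol:                  # bisection on the increasing CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, p) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $n = 1$ the value reduces to the ordinary upper percentage point of $\chi^2_p$. For example, `xi_alpha_bivariate(0.05, 1)` equals $-2\ln 0.05 \approx 5.991$.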

An informal assessment of discordancy might be based on a graphical plot. The ordered $R_j(\mu, V)$ [$R_{(1)}(\mu, V), R_{(2)}(\mu, V), \ldots, R_{(n)}(\mu, V)$] are plotted as ordinates against the expected values of the order statistics of a random sample of size $n$ from $\chi^2_p$ as abscissae. If $R_{(n)}(\mu, V)$ appears to be aberrant on this basis (lying above the expected straight line), we would adjudge the corresponding observation $x_{(n)}$ to be a discordant outlier. With $\mu$ unknown and replaced by $\bar{x}$, or $V$ replaced by $S$, the same procedure retains a measure of informal propriety and appeal.
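The informal plot just described rests on the reduced ordering of the $R_j$, and its coordinates can be computed as in the following Python sketch. The sketch rests on our own assumptions. The expected values of the $\chi^2_p$ order statistics are estimated by simulation; exact or tabulated values could be substituted. The plotting itself is left to whatever graphics library is at hand, and all function names are hypothetical.

```python
import numpy as np


def reduced_distances(X, x0, G):
    """R_j(x0, G) = (x_j - x0)' G^{-1} (x_j - x0) for each row x_j of X."""
    D = np.asarray(X, dtype=float) - np.asarray(x0, dtype=float)
    # G^{-1} d_j for every j, via a linear solve rather than an explicit inverse
    W = np.linalg.solve(G, D.T).T
    return np.einsum("ij,ij->i", D, W)


def expected_chi2_order_stats(n, p, reps=20000, seed=0):
    """Monte Carlo estimates of E[R_(1)], ..., E[R_(n)] for n draws from chi^2_p."""
    rng = np.random.default_rng(seed)
    draws = np.sort(rng.chisquare(p, size=(reps, n)), axis=1)
    return draws.mean(axis=0)


def informal_plot_points(X, mu, V, reps=20000, seed=0):
    """Abscissae (expected chi^2_p order statistics) and ordinates (ordered R_j)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R_ordered = np.sort(reduced_distances(X, mu, V))   # the reduced ordering
    return expected_chi2_order_stats(n, p, reps, seed), R_ordered
```

A point lying well above the line through the bulk of the plot, especially at the right-hand end, is the informal signal of discordancy described above. For the version with estimated parameters, pass `X.mean(axis=0)` and `np.cov(X, rowvar=False)` in place of `mu` and `V`.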

Healy (1968) advances just such a graphical procedure for the detection of non-normality, and of outliers, for (principally) bivariate data. His basis for considering the distance measure $R(x; \bar{x}, S)$ is its intuitive appeal. He offers no justification in terms of a prescribed alternative model for explaining outliers, or in terms of any specific test construction principle (such as the maximum likelihood ratio procedure). It is informative to reproduce a detailed example he discusses.

Example 6.1. The data form a bivariate sample of 39 observations of the logarithms of daily fat intake and serum cholesterol level for a group of hospital patients (data taken from Begg, Preston, and Healy, 1966). For these data the values of $R_j(\bar{x}, S)$ are calculated, where $S$ is the sample variance-covariance matrix. Ignoring the inaccuracies introduced by estimating $\mu$ and $V$, we might consider plotting the ordered 'distances' $R_j$ against the expected values

$$\frac{2}{n},\quad \frac{2}{n} + \frac{2}{n-1},\quad \frac{2}{n} + \frac{2}{n-1} + \frac{2}{n-2},\quad \ldots,\quad \frac{2}{n} + \frac{2}{n-1} + \cdots + \frac{2}{1}$$

(with $n = 39$) of the order statistics for a sample from $\chi^2_2$. Instead, Healy introduces a further level of approximation, based on the approximate normality of $\sqrt{\chi^2_2}$. He does so on the grounds that this simplifies the graphical process. It may even enhance the prospects of distinguishing outliers, in view of the reduced coefficient of variation of upper extreme sample values. Thus the square roots of the ordered distances for the sample of 39 observations are plotted on normal probability paper, with the results shown in Figure 6.1(a).

[Figure 6.1 Normal plots of $\sqrt{R_j(\bar{x}, S)}$ for the distribution of log daily fat intake and log serum cholesterol in a sample of 39 hospital patients. (a) Complete sample; (b) omitting four extreme values (reproduced by permission of the Royal Statistical Society)]
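The plotting step of Example 6.1 might be sketched in Python as below. We do not reproduce the Begg, Preston, and Healy (1966) data. The sketch simply computes, for any $n \times 2$ array, the square roots of the ordered $R_j(\bar{x}, S)$ and pairs them with standard normal quantiles. The plotting positions $(k - \tfrac{1}{2})/n$ and the divisor $n - 1$ in $S$ are our assumptions, not details taken from Healy (1968). The helper `expected_r_bivariate` gives the exact expected values $2/n,\ 2/n + 2/(n-1), \ldots$ quoted in the example, for anyone preferring the unapproximated plot. Both function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def healy_plot_points(X):
    """Normal-plot coordinates for sqrt(R_j(xbar, S)), in the spirit of Healy (1968).

    Returns (z, r, order): r holds the square roots of the ordered distances,
    z the matching standard normal quantiles, and order the row indices of X
    in increasing order of distance.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    D = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # divisor n - 1 (assumed)
    R = np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)
    order = np.argsort(R)
    # plotting positions (k - 1/2)/n are an assumption of this sketch
    z = np.array([NormalDist().inv_cdf((k - 0.5) / n) for k in range(1, n + 1)])
    return z, np.sqrt(R[order]), order


def expected_r_bivariate(n):
    """Exact E[R_(k)] for chi^2_2: 2/n, 2/n + 2/(n-1), ..., as in Example 6.1."""
    return np.cumsum([2.0 / i for i in range(n, 0, -1)])
```

In the test, simulated data with one planted outlier stand in for the real sample. The planted row ends up with the largest distance, at the top right of the plot.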
Healy concludes that the bivariate normal model is not unreasonable, but that there seem to be four outliers. Figure 6.1(b) is the corresponding plot with the four outliers removed. (Figures 6.1(a) and 6.1(b) are reproduced from Healy, 1968.)

We shall consider graphical procedures in more detail later (Section 6.2). For the moment we continue with a more formal approach to the assessment of discordancy, based on the actual null distribution of $R_{(n)}$ when $\mu$ and $V$ are unknown. The assumption of known $V$ is in general unrealistic. We therefore proceed to examine the two-stage maximum likelihood ratio test for model A when both $\mu$ and $V$ are unknown. For the moment we eschew Healy's pragmatic simultaneous replacement of $\mu$ by $\bar{x}$ and $V$ by $S$ in the distance measure $R(x; \mu, V)$.
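Before leaving the case of known $V$, the earlier claim can be made exact. That claim was that maximizing $L_A(x \mid V) - L(x \mid V)$ singles out the observation with largest $R_i(\bar{x}, V)$. Deleting $x_i$ reduces the sum of quadratic forms in (6.1.2) by $\{n/(n-1)\}R_i(\bar{x}, V)$. Hence $L_A - L = \tfrac{1}{2}\{n/(n-1)\}R_i(\bar{x}, V)$, which is increasing in $R_i$. The short Python check below is our own illustration; `model_a_statistics` is a hypothetical name. It evaluates (6.1.2) and (6.1.3) directly and compares them with this identity.

```python
import numpy as np


def model_a_statistics(X, V):
    """Return (L_A - L for each i, R_i(xbar, V)) for model A with V known."""
    X = np.asarray(X, dtype=float)
    Vinv = np.linalg.inv(V)

    def centred_quadratic_sum(Y):
        # sum over rows y_j of (y_j - ybar)' V^{-1} (y_j - ybar)
        D = Y - Y.mean(axis=0)
        return np.einsum("ij,jk,ik->", D, Vinv, D)

    L = -0.5 * centred_quadratic_sum(X)                              # (6.1.2)
    L_A = np.array([-0.5 * centred_quadratic_sum(np.delete(X, i, axis=0))
                    for i in range(X.shape[0])])                     # (6.1.3) for each i
    D = X - X.mean(axis=0)
    R = np.einsum("ij,jk,ik->i", D, Vinv, D)                         # R_i(xbar, V)
    return L_A - L, R
```

Because $L_A - L$ is a fixed positive multiple of $R_i(\bar{x}, V)$, both quantities pick out the same $x_{(n)}$. This is the reduced ordering on $R(x; \bar{x}, V)$ noted in the text.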
Model A, $V$ unknown

With $V$ unknown (as well as $\mu$), the maximized log-likelihood under the basic model is (apart from the constant factor)

$$L(x) = -\frac{n}{2}\log|A| \qquad (6.1.6)$$

where $|A|$ is the determinant of the matrix of sums of squares and cross-products of the observations about the component sample means: that is

$$A = \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})'. \qquad (6.1.7)$$

Under the model A alternative the maximized log-likelihood is

$$L_A(x) = -\frac{n}{2}\log|A_{(i)}| \qquad (6.1.8)$$

where $A_{(i)}$ is the restricted matrix obtained on omission of $x_i$, and $i$ is chosen to maximize

$$L_A(x) - L(x).$$

Thus when $V$ is unknown it seems at first sight that quite a different principle is advanced for the declaration of an outlier $x_i$ and for the assessment of its discordancy. Here we are implicitly ordering the multivariate observations in terms of an aggregated form of reduced sub-ordering, based on the values of $|A_{(i)}|$. The $|A_{(i)}|$ are ordered, and the observation corresponding with the smallest value of $|A_{(i)}|$ is declared an outlier. Equivalently, if we denote

$$\mathcal{R}_j = \frac{|A_{(j)}|}{|A|}$$

the sample points are 'ordered' in accord with the ordered $\mathcal{R}_j$, and the outlier is that observation corresponding with the smallest $\mathcal{R}_j$, $\mathcal{R}_{(1)}$. If $\mathcal{R}_{(1)}$ is significantly low in value the outlier is adjudged discordant. Thus the outlier is that observation whose removal from the sample effects the greatest reduction in the 'internal scatter' of the data set. But the distinction of principle for declaring an outlier in the case of unknown $V$, compared with the case where $V$ is known, is less profound than might appear at first sight. Clearly we can rewrite

$$\mathcal{R}_j = \frac{\left|A - \frac{n}{n-1}(x_j - \bar{x})(x_j - \bar{x})'\right|}{|A|} = 1 - \frac{n}{n-1}R_j(\bar{x}, A) \qquad (6.1.9)$$

and minimization of $\mathcal{R}_j$ becomes equivalent to maximization of

$$R_j(\bar{x}, A) = R_j(\bar{x}, S)/(n-1). \qquad (6.1.10)$$

Thus the outlier is again that observation whose distance from the body of the data is maximal, provided we estimate $\mu$ and $V$ by $\bar{x}$ and $S$ in the distance function. (This supports Healy's informal proposals.)

Once more the distribution of the test statistic is highly complicated. Little is known in detail of the joint distribution of the $\mathcal{R}_j$ or, more particularly, of the distribution of the minimum, $\mathcal{R}_{(1)}$.

However, there is a good deal of useful tabulated material on approximate percentage points for $\mathcal{R}_{(1)}$, and on the corresponding statistic for assessing the discordancy of a pair of outliers in a multivariate normal sample. The tables appear in work by Wilks (1963), which remains the most detailed applications-oriented study of outlier detection in multivariate data. We shall consider this work in some detail and apply it to a problem related to employment prospects of engineering graduates.

Concerned with testing outlying observations in a sample from a multivariate normal distribution with unknown mean vector and variance-covariance matrix, Wilks (1963) proposes an intuitively based representation of the sample. This is the sum of squares of the volumes of all simplexes that can be formed from $p$ of the sample points augmented by the sample mean $\bar{x}$. He shows (Wilks, 1962) that this is just $(p!)^{-2}|A|$, where $A$ is the matrix defined above. He calls $|A|$ the internal scatter of the sample. He then suggests that a sensible criterion for declaring an outlier is to choose the sample member whose omission leads to the least value of the so-called one-outlier scatter ratio $\mathcal{R}_j$.

But this is precisely the likelihood ratio criterion and corresponding test statistic. Wilks shows that the $\mathcal{R}_j$ $(j = 1, 2, \ldots, n)$ are identically distributed Beta variates $\mathrm{Be}\left(\frac{n-p-1}{2}, \frac{p}{2}\right)$, with a joint distribution symmetric over $R^n$ subject to

$$\sum_{j=1}^{n} \mathcal{R}_j = n\left(1 - \frac{p}{n-1}\right).$$

The joint distribution is intractable, but Wilks obtains an upper bound for the distribution function of $\mathcal{R}_{(1)}$ (which he denotes $r_1$). This yields lower bounds for the lower percentage points of $\mathcal{R}_{(1)}$, and hence conservative tests of significance for a single outlier. In comparison with exact results due to Grubbs (1950) for the case $p = 1$, the approximate values seem reasonable. It must be stressed, however, that their accuracy for $p > 1$ has not been assessed, since there is at present no yardstick (in terms of exact probabilities) for comparison.
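A short numerical check may help to fix ideas. The sketch below assumes NumPy and uses hypothetical function names. It computes the one-outlier scatter ratios both by refitting $A_{(j)}$ and through (6.1.9). It then maps a ratio onto the $R(\bar{x}, S)$ scale via (6.1.9) and (6.1.10), and estimates a lower percentage point of $\mathcal{R}_{(1)}$ by simulation. The simulation is our own substitute for illustration, not Wilks's bounding method, and its accuracy depends on the number of replications.

```python
import numpy as np


def scatter_matrix(X):
    """A = sum_j (x_j - xbar)(x_j - xbar)': sums of squares and cross-products."""
    D = X - X.mean(axis=0)
    return D.T @ D


def scatter_ratios_direct(X):
    """One-outlier scatter ratios |A_(j)| / |A|, refitting A_(j) for each j."""
    X = np.asarray(X, dtype=float)
    det_A = np.linalg.det(scatter_matrix(X))
    return np.array([np.linalg.det(scatter_matrix(np.delete(X, j, axis=0))) / det_A
                     for j in range(len(X))])


def scatter_ratios(X):
    """The same ratios via (6.1.9): 1 - n/(n-1) * R_j(xbar, A), with no refitting."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = X - X.mean(axis=0)
    R_A = np.einsum("ij,ij->i", D, np.linalg.solve(D.T @ D, D.T).T)
    return 1.0 - n / (n - 1) * R_A


def ratio_to_distance_S(r, n):
    """Map a ratio value onto the R(xbar, S) scale using (6.1.9) and (6.1.10).

    With R(xbar, A) = R(xbar, S)/(n - 1), r = 1 - n R(xbar, S)/(n - 1)^2.
    """
    return (n - 1) ** 2 * (1.0 - np.asarray(r)) / n


def simulated_lower_point(n, p, alpha, reps=20000, seed=0):
    """Monte Carlo estimate of the lower alpha point of min_j R_j under the null.

    The ratios are unchanged by x -> a + C x (C non-singular), so standard
    normal samples suffice. This is a simulation, not Wilks's bound.
    """
    rng = np.random.default_rng(seed)
    mins = [scatter_ratios(rng.standard_normal((n, p))).min() for _ in range(reps)]
    return float(np.quantile(mins, alpha))
```

Because of (6.1.9), the refitting version is needed only as a check. In practice one pass over the distances gives all $n$ ratios.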
Wilks (1963) tabulates lower bounds to the lower 10, 5, 2½, and 1 per cent points of $\mathcal{R}_{(1)}$ for $p = 1(1)5$ and $n = 5(1)30(5)100(100)500$. For closer comparability with Tables XXVI and XXVII, selected values from Wilks's tables have been transformed via (6.1.9) and (6.1.10) into approximate upper percentage points for $R_{(n)}(\bar{x}, S)$. Table XXVIII presents critical values for 5 per cent and 1 per cent tests of discordancy of a single outlier in a multivariate normal sample where $\mu$ and $V$ are estimated by $\bar{x}$ and $S$, respectively. The table covers the ranges $p = 2(1)5$, $n = 5(1)10(2)20(5)50, 100, 200, 500$. Note that the elements of $S$ are unbiased sample variance and covariance estimates with divisors $n - 1$.

Wilks adopts a similar approach to the testing en bloc of 2, 3, or 4 outliers in a multivariate sample. For the $s$-outlier case $(s = 2, 3, 4)$ he considers the $s$-outlier scatter ratios

$$\mathcal{R}_{j_1, j_2, \ldots, j_s} = \frac{|A_{(j_1, j_2, \ldots, j_s)}|}{|A|}$$

where $|A_{(j_1, j_2, \ldots, j_s)}|$ is the internal scatter when $x_{j_1}, x_{j_2}, \ldots, x_{j_s}$ are omitted from the sample. Again it is the subset of observations that minimizes $\mathcal{R}_{j_1, j_2, \ldots, j_s}$ which is declared the outlying subset, and their discordancy must be assessed in terms of how small is

$$r_s = \min \mathcal{R}_{j_1, j_2, \ldots, j_s}.$$

For $s = 2$, an outlying subset of two observations, Wilks tabulates square roots of lower bounds for the lower percentage points of $r_2$. Extracted values are reproduced as Table XXIX on page 331, for precisely the same set of significance levels and values of $p$ and $n$ as are used in Table XXVIII. Thus Table XXIX gives critical values for 5 per cent and 1 per cent tests of discordancy (based on $\sqrt{r_2}$) of outlier pairs in multivariate normal samples where $\mu$ and $V$ are unknown, for $p = 2(1)5$ and $n = 5(1)10(2)20(5)50, 100, 200, 500$.

Example 6.2. In 1974, the Southern Graduate and Student Section of the Institution of Electrical Engineers conducted a salary survey of its members, recording for each their age (to the nearest month) and their current annual salary (in £). Table 6.1 presents (by kind permission of the authors of the survey) the data for a sub-sample of 55 of the 374 returns. Figure 6.2 is a scatter diagram of the sub-sample. The observations L, M, and N appear as outliers, and it is interesting to test their discordancy. For illustrative purposes we assume a bivariate normal distribution of ages and salaries, although no detailed study has been made of the underlying distribution.

Table 6.1 Ages and salaries of a sample of 55 electrical engineers in the UK in 1974

Age (years)  Annual salary (£)    Age (years)  Annual salary (£)
27.67        2930                 26.58        2600
23.42        2330                 25.50        2250
24.67        2480                 29.25        3600
27.92        4100                 26.00        2750
26.92        2500                 29.33        3500
28.92        3380                 26.25        3400
26.08        2720                 26.83        4500
29.92        4930                 27.92        2800
29.50        3020                 27.08        3610
22.42        1970                 28.33        3100
23.42        1700                 28.33        2900
28.00        3100                 30.00        3600
23.00        1950                 28.25        3600
25.17        2320                 24.67        2030
26.58        2750                 25.42        3520
22.75        1960                 22.67        1900
25.00        2300                 25.92        3230
30.00        4120                 25.25        2500
23.50        3900 (L)             25.58        3020
26.58        5200 (M)             29.92        3610
28.25        3200                 30.58        4200
25.33        2300                 23.83        3050
25.25        2200                 26.33        2760
27.92        3500                 26.83        4000
29.50        3600                 28.25        3100
21.17        1470 (N)
25.67        2690
28.42        2860
26.00        3000
27.42        3100

Calculations show that M, L, and N yield (in decreasing order) the three largest values of $R_j(\bar{x}, S)$. We have

M: $R_{(55)}(\bar{x}, S) = 13.7$,
L: $R_{(54)}(\bar{x}, S) = 9.2$,
N: $R_{(53)}(\bar{x}, S) = 6.0$.

From Table XXVIII we see that M is a discordant outlier. Explicit calculation (for $n = 55$) from Wilks's tables gives 5 per cent and 2½ per cent critical values of 12.?2 and 13.58 respectively. Thus M is discordant at the 2½ per cent level. Examination of the outlier pair (M, L) on Wilks's test gives $\sqrt{r_2} = 0.769$, which is significant at the 5 per cent level. (Extrapolation in Table XXIX, or inspection of Wilks's tables, gives a 5 per cent critical value of 0.778; the 2½ per cent critical value is 0.767.)
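The distances quoted for Example 6.2 can be recomputed along the following lines. This is a sketch that assumes NumPy and assumes Table 6.1 has been transcribed here without error; the function name is hypothetical. Under those assumptions it should give values close to those quoted in the text, and any transcription error would shift them.

```python
import numpy as np

# Table 6.1 as (age in years, annual salary in pounds): the thirty left-hand
# pairs first, then the twenty-five right-hand pairs.
DATA = [
    (27.67, 2930), (23.42, 2330), (24.67, 2480), (27.92, 4100), (26.92, 2500),
    (28.92, 3380), (26.08, 2720), (29.92, 4930), (29.50, 3020), (22.42, 1970),
    (23.42, 1700), (28.00, 3100), (23.00, 1950), (25.17, 2320), (26.58, 2750),
    (22.75, 1960), (25.00, 2300), (30.00, 4120), (23.50, 3900), (26.58, 5200),
    (28.25, 3200), (25.33, 2300), (25.25, 2200), (27.92, 3500), (29.50, 3600),
    (21.17, 1470), (25.67, 2690), (28.42, 2860), (26.00, 3000), (27.42, 3100),
    (26.58, 2600), (25.50, 2250), (29.25, 3600), (26.00, 2750), (29.33, 3500),
    (26.25, 3400), (26.83, 4500), (27.92, 2800), (27.08, 3610), (28.33, 3100),
    (28.33, 2900), (30.00, 3600), (28.25, 3600), (24.67, 2030), (25.42, 3520),
    (22.67, 1900), (25.92, 3230), (25.25, 2500), (25.58, 3020), (29.92, 3610),
    (30.58, 4200), (23.83, 3050), (26.33, 2760), (26.83, 4000), (28.25, 3100),
]

# Positions of the rows marked L, M and N in Table 6.1.
LABELS = {"L": DATA.index((23.50, 3900)),
          "M": DATA.index((26.58, 5200)),
          "N": DATA.index((21.17, 1470))}


def distances_S(X):
    """R_j(xbar, S) for every row, with S the unbiased covariance (divisor n - 1)."""
    X = np.asarray(X, dtype=float)
    D = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    return np.einsum("ij,ij->i", D, np.linalg.solve(S, D.T).T)


R = distances_S(DATA)
# Report the labelled observations in decreasing order of distance.
for name in sorted(LABELS, key=lambda k: -R[LABELS[k]]):
    print(f"{name}: R = {R[LABELS[name]]:.1f}")
```

The largest value would then be compared with the appropriate critical value in Table XXVIII.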
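Wilks's $s$-outlier scatter ratios can likewise be computed by brute force. The sketch below assumes NumPy and uses hypothetical names. It searches all subsets of size $s$ for the minimum $r_s$, so its cost grows combinatorially with $s$; it is intended only for small illustrative samples. Wilks's tables are expressed in terms of $\sqrt{r_2}$, so a computed $r_2$ would be square-rooted before comparison with Table XXIX.

```python
from itertools import combinations

import numpy as np


def internal_scatter(X):
    """|A|: determinant of the sums-of-squares-and-products matrix about the mean."""
    D = X - X.mean(axis=0)
    return np.linalg.det(D.T @ D)


def min_scatter_ratio(X, s):
    """Return (r_s, subset): the smallest s-outlier scatter ratio and its subset.

    Brute force over all C(n, s) subsets, so only practical for small n and s.
    """
    X = np.asarray(X, dtype=float)
    det_A = internal_scatter(X)
    best_ratio, best_subset = np.inf, ()
    for subset in combinations(range(len(X)), s):
        ratio = internal_scatter(np.delete(X, list(subset), axis=0)) / det_A
        if ratio < best_ratio:
            best_ratio, best_subset = ratio, subset
    return best_ratio, best_subset
```

Removing a further point can never increase the internal scatter, so the minimum over pairs is never larger than the minimum over single observations.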
Figure 6.2 Ages and salaries of electrical engineers (UK, 1974) [scatter diagram: annual salary (£) against age (years)]

Of course the two tests of discordancy (for a single outlier, and for an outlier-pair) are not independent. It seems reasonable to assess M as a discordant outlier; the status of L is more questionable.

We should remark that the various tests of discordancy for multivariate normal outliers discussed in this section have the desirable property of being invariant with respect to the location and scale of the measurement basis of the observations.

For model B, with a single possible discordant value in a normal sample, Ferguson (1961a) has derived a multi-decision procedure (see Section 2.5) with certain optimal properties. See also Kudo (1957).

Suppose again that $x_1, \ldots, x_n$ is a random sample initially assumed to arise from $N(\mu, V)$, with $\mu$ and $V$ unknown. Under the alternative (model B) hypothesis of a single outlier, we have

$$E(X_j) = \mu \quad (j = 1, 2, \ldots, n),$$
$$V(X_i) = bV \quad (\text{some } i;\ b > 1),$$
$$V(X_j) = V \quad (j \neq i).$$

Let $D_i$ denote the decision to regard $x_i$ as the discordant value $(i = 1, 2, \ldots, n)$, and $D_0$ the decision to declare no discordant value. Ferguson considers those decision rules which satisfy four conditions:

(a) each is invariant under the addition to $X_j$ of a constant vector;
(b) each is invariant under the multiplication of $X_j$ by a common non-singular matrix;
(c) the probability $p_i(D_i)$ of declaring $x_i$ the discordant value when this is true is independent of $i$;
(d) the probability of correctly declaring no discordant value is $1 - \alpha$, for a preassigned $\alpha$ in $(0, 1)$; that is, the procedure has size $\alpha$.

He seeks the decision rule which maximizes $p_i(D_i)$. It turns out to have a familiar form: the optimum rule is to reject $x_i$ as the discordant value if $x_i$ yields the maximum value $R_{(n)}(\bar{x}, S)$ and

$$R_{(n)}(\bar{x}, S) > K$$

where $K$ is chosen to satisfy the test size condition (d).

Thus, as for model A, we again declare the outlier to be the observation with maximum generalized distance $R_j(\bar{x}, S)$, and assess it as discordant if that maximum, $R_{(n)}(\bar{x}, S)$, is sufficiently large. Additionally, however, Ferguson demonstrates that this procedure is uniformly best over all values of $b > 1$.

Thus when $\mu$ and $V$ are unknown it is immaterial whether we adopt the model A or the model B formulation of the alternative hypothesis describing the occurrence of a single outlier. In either case the test has the same form, and can be implemented by using Table XXVIII on page 330.

To summarize, the model A and model B tests of discordancy are identical when $\mu$ and $V$ are unknown. In relation to model A the test is the likelihood ratio test. Relative to model B it has the uniform optimality property of maximizing $p_i(D_i)$, irrespective of the value of $b$.

Siotani (1959) tabulates approximate percentage points for a studentized form of $R_{(n)}(\bar{x}, S)$, where $S$ is replaced by an external unbiased estimate $S_\nu$ of $V$ having a Wishart distribution with $\nu$ degrees of freedom. These are of value for an informal test of discordancy of a single outlier in a multivariate sample, where $V$ is not estimated from the sample itself but by means of such an external estimate.

Table XXX on pages 332-333, extracted from Siotani (1959), presents approximate 5 per cent and 1 per cent critical values for a test of a single outlier in a bivariate normal sample $(p = 2)$ where $\mu$ and $V$ are unknown and $V$ is estimated by $S_\nu$. The table covers the values $n = 3(1)14$ and $\nu = 20(2)40(5)60, 100, 150, 200$.

6.2 INFORMAL DETECTION OF MULTIVARIATE OUTLIERS

A host of proposals have been made for informally detecting outliers in multivariate data by quantitative or graphical methods. These cannot be
regarded as tests of discordancy; they are designed more as aids to intuition in picking out multivariate observations which are suspiciously aberrant from the bulk of the sample.

In univariate samples the 'suspicious' observation which is to be declared an outlier is obvious on simple inspection: it is an extreme observation in the sample. In multivariate data the extremeness concept is a nebulous one, as we have remarked above. Various forms of initial processing of the data can all help to identify or highlight a suspicious observation. These include study of the individual marginal components of the observations, and judicious reduction of the multivariate observations to scalar quantities in the form of distance measures or linear combinations of components. Changes in the coordinate bases of the observations, and appropriate methods of graphical representation, can also help. If several such procedures are applied simultaneously (or individually) to a set of data, they can help to overcome the difficulty caused by the absence of a natural overall ordering of the sample members. An observation which clearly stands out on one, or preferably more, processed re-representations of the sample becomes a firm candidate for identification as an outlier.

Gnanadesikan and Kettenring (1972) remark:

The consequences of having ... [outliers] in a multivariate sample are intrinsically more complex than in the much discussed univariate case. One reason for this is that a multivariate outlier can distort not only measures of location and scale but also those of orientation (i.e. correlation). A second reason is that it is much more difficult to characterize the multivariate outlier. A single univariate outlier may be typically thought of as 'the one which sticks out on the end', but no such simple idea suffices in higher dimensions. A third reason is the variety of types of multivariate outliers which may arise: a vector response may be faulty because of a gross error in one of its components or because of systematic mild errors in all of its components.
The complexity of the multivariate case suggests that it would be fruitless to search for a truly omnibus outlier protection procedure. A more reasonable approach seems to be to tailor detection procedures to protect against specific types of situations, e.g., correlation distortion, thus building up an arsenal of techniques with different sensitivities. This approach recognizes that an outlier for one purpose may not necessarily be one for another purpose! However, if several analyses are to be performed on the same sample, the result of selective segregation of outliers should be more efficient and effective use of the available data.

Such methods do not (in general) lead to any formal test of discordancy. They seldom even adopt any specific assumptions about the distribution from which the sample has arisen, or about the nature of an alternative (outlier-generating) hypothesis. They are to be viewed as initial data screening procedures, in the spirit of the current interest in 'data analysis' methods for the representation and summary of large-scale sets of data.

6.2.1 Marginal outliers

We should not underestimate the role to be played by the marginal samples (that is, the univariate samples of each component value in the multivariate data) in the identification of outliers. Firstly, we know what we mean by a marginal outlier: it is an extreme member of the marginal sample. Secondly, we have facilities for testing the discordancy of such univariate outliers for a range of different basic models (and we can adopt models to explain the outliers). Clearly outliers in different marginal samples need not be independent, and conclusions about discordancy will need to reflect this. And thirdly, perhaps most important, it is quite plausible to expect outliers to be exhibited within specific components of the multivariate observations. This is particularly true when the outliers arise from gross errors of measurement or recording, where almost inevitably it will be a single component in the multivariate observation which will suffer.

The techniques described in Chapter 3 will be appropriate to the testing of marginal outliers.

6.2.2 Linear constraints

Another situation in which we may have easy access to the detection of outliers is where we anticipate a simple (usually linear) relationship between the components of the multivariate observation, or between the expected values of the components. For example, our multivariate observation might consist of proportionate measurements, such as the proportions of the total body length of a reptile corresponding with biologically distinct sections of the body. Alternatively, it may be the three angles of a triangle in a geographic survey. In the first case consistency of representation demands that the proportions should have unit sum; in the second the components should, apart from errors of measurement, add to 180°. In either case, marked departures of the sum of the component values from their expected sum can highlight gross errors of measurement or of recording as an indication of outliers. Fellegi (1975) comments on the presence of outliers in the editing of multivariate data where just such 'pre-identified relationships' hold. He includes consideration of less specific forms of relationship where, for example, we have information on the reasonable range of values which may be taken by some ratio of marginal components.

Note that outliers identified in this way need not (indeed are unlikely to) show up merely on consideration of the marginal samples.

6.2.3 Graphical methods

In relation to multivariate outliers, Rohlf (1975) remarks as follows:

Despite the apparent complexity of the problem, one can still characterize outliers by the fact that they are somewhat isolated from the main cloud of points. They may not 'stick out on the end' of the distribution as univariate outliers must, but they must 'stick out' somewhere. Points which are not internal to the cloud of points (i.e. which are somewhere on the surface of the cloud) are potentially outliers. Techniques
220 Outliers in statistical data Outliers in multivariate data 221

regarded as tests of discordancy; they are designed more as aids to intuition in picking out multivariate observations which are suspiciously aberrant from the bulk of the sample.

In univariate samples the 'suspicious' observation which is to be declared an outlier is obvious on simple inspection. It is an extreme observation in the sample. In multivariate data the extremeness concept is a nebulous one, as we have remarked above. Various forms of initial processing of the data, involving study of individual marginal components of the observations, judicious reduction of the multivariate observations to scalar quantities in the forms of distance measures or linear combinations of components, changes in the coordinate bases of the observations, and appropriate methods of graphical representation, can all help to identify or highlight a suspicious observation. If several such procedures are applied simultaneously (or individually) to a set of data they can help to overcome the difficulty caused by the absence of a natural overall ordering of the sample members. An observation which clearly stands out on one, or preferably more, processed re-representations of the sample becomes a firm candidate for identification as an outlier.

Gnanadesikan and Kettenring (1972) remark:

The consequences of having ... [outliers] in a multivariate sample are intrinsically more complex than in the much discussed univariate case. One reason for this is that a multivariate outlier can distort not only measures of location and scale but also those of orientation (i.e. correlation). A second reason is that it is much more difficult to characterize the multivariate outlier. A single univariate outlier may be typically thought of as 'the one which sticks out on the end', but no such simple idea suffices in higher dimensions. A third reason is the variety of types of multivariate outliers which may arise: a vector response may be faulty because of a gross error in one of its components or because of systematic mild errors in all of its components.

The complexity of the multivariate case suggests that it would be fruitless to search for a truly omnibus outlier protection procedure. A more reasonable approach seems to be to tailor detection procedures to protect against specific types of situations, e.g., correlation distortion, thus building up an arsenal of techniques with different sensitivities. This approach recognizes that an outlier for one purpose may not necessarily be one for another purpose! However, if several analyses are to be performed on the same sample, the result of selective segregation of outliers should be more efficient and effective use of the available data.

Such methods do not (in general) lead to any formal test of discordancy; they seldom even adopt any specific assumptions about the distribution from which the sample has arisen or about the nature of an alternative (outlier-generating) hypothesis. They are to be viewed as initial data screening procedures, in the spirit of the current interest in 'data analysis' methods for the representation and summary of large-scale sets of data.

6.2.1 Marginal outliers

We should not underestimate the role to be played by the marginal samples (that is, the univariate samples of each component value in the multivariate data) in the identification of outliers. Firstly, we know what we mean by a marginal outlier: it is an extreme member of the marginal sample. Secondly, we have facilities for testing the discordancy of such univariate outliers for a range of different basic models (and we can adopt models to explain the outliers). Clearly outliers in different marginal samples need not be independent, and conclusions about discordancy will need to reflect this. And thirdly, perhaps most important, it is quite plausible to expect outliers to be exhibited within specific components of the multivariate observations. This is particularly true when the outliers arise from gross errors of measurement or recording, where almost inevitably it will be a single component in the multivariate observation which will suffer.

The techniques described in Chapter 3 will be appropriate to the testing of marginal outliers.

6.2.2 Linear constraints

Another situation in which we may have easy access to the detection of outliers is where we anticipate a simple (usually linear) relationship between the components of the multivariate observation or between the expected values of the components. For example our multivariate observation might consist of proportionate measurements, such as the proportions of the total body length of a reptile corresponding with biologically distinct sections of the body, or it may be the three angles of a triangle in a geographic survey. In the first case consistency of representation demands that the proportions should have unit sum; in the second the sum of the components should, apart from errors of measurement, add to 180°. In either case, marked departures of the sum of the component values from their expected sum can highlight gross errors of measurement or of recording as an indication of outliers. Fellegi (1975) comments on the presence of outliers in the editing of multivariate data where just such 'pre-identified relationships' hold. He includes consideration of less specific forms of relationship where, for example, we have information on the reasonable range of values which may be taken by some ratio of marginal components.

Note that outliers identified in this way need not (indeed are unlikely to) show up merely on consideration of the marginal samples.

6.2.3 Graphical methods

In relation to multivariate outliers, Rohlf (1975) remarks as follows:

Despite the apparent complexity of the problem, one can still characterize outliers by the fact that they are somewhat isolated from the main cloud of points. They may not 'stick out on the end' of the distribution as univariate outliers must, but they must 'stick out' somewhere. Points which are not internal to the cloud of points (i.e. which are somewhere on the surface of the cloud) are potentially outliers. Techniques
which determine the position of a point relative to the others would seem to be useful. A second important consideration is that outliers must be separated from the other points by distinct gaps.

With this emphasis it is natural to consider different ways in which we can merely look at the data to see if they seem to contain outliers. A variety of methods employing different forms of pictorial or graphical representation have been proposed with varying degrees of sophistication in terms of the pre-processing of the data prior to its display. For obvious reasons, bivariate data is the most amenable to informative display, although it will be apparent that some of the approaches do not depend vitally on the dimensionality of the data. The basic ideas and interests in such methods of 'informal inference' applied to general problems of analysis of multivariate data (not solely the detection of outliers) are described by Gnanadesikan (1973). A variety of papers over several years by the same author, often in conjunction with others, expand and illustrate these methods; see, for example, Wilk and Gnanadesikan (1964), Gnanadesikan and Wilk (1969), and the lengthy review paper by Gnanadesikan and Kettenring (1972) on which much of the summary below is based. This paper contains a multitude of informative illustrative examples.

We shall consider some possibilities for demonstrating the presence of outliers in (predominantly) bivariate data. The most rudimentary form of representation is the scatter diagram of two, out of the p, components. Figure 6.2 shows the scatter diagram for the data of salaries and ages of electrical engineers (see page 218). Some observations do seem to 'stick out' and to be separated from others by 'distinct gaps'; notably the observations L and M previously identified as discordant outliers. But in a different respect, the observation N also seems somewhat suspicious. The observation L (and to a lesser degree M) might well have the effect of reducing the apparent correlation between age and salary in the population, whilst N leads one to assume a larger variation in ages, or salaries, than would have appeared plausible in its absence. This effect of N is heightened if we consider the projection of the observations on to the line AB through the data set as shown on Figure 6.2 (roughly the regression line of salary on age), whilst L and M now appear in no way aberrant. In contrast, if we project onto the perpendicular to AB then L and M are particularly extreme, whilst N appears more reasonable.

This example embodies many of the considerations employed in designing graphical methods for exhibiting multivariate outliers. In the first place, the scatter diagram itself may throw up outliers as observations on the periphery of the 'data cloud', distinctly separated from others. The marginal samples may or may not endorse such a declaration, and yield a formal assessment of discordancy (this could be possible for N, not for L or M). The perturbation of some aggregate measure, such as the correlation coefficient, from what is anticipated may reveal the presence of outliers (likely here for L and M, not
for N). A change of coordinate basis, and re-representation of the data on the new basis, can reveal outliers not immediately apparent previously. Rotation of the axes of Figure 6.2 in the direction of AB (or of its perpendicular) can help to identify L and M (or N), depending on which of the new axes is considered. An appropriate graphical representation of the ordered sample values (either marginally in the original data, or for particular linear combinations of the components corresponding with a transformation of axes) can dramatically augment the visual impact of outliers.

We shall consider in more detail some work which utilizes such ideas.
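The effect of rotating the axes can be checked numerically. The sketch below is hypothetical throughout: the bivariate data are made up (they are not the engineers' salary/age data, which are not reproduced here), the direction AB is approximated by the major axis of the sample covariance ellipse (an assumption, since the text describes AB only as roughly the regression line of salary on age), and the labels `P` and `Q` play roughly the roles of L and N. Each observation is projected onto the major axis and onto its perpendicular, and the most extreme standardized projection on each axis is reported.

```python
import math
from statistics import mean, stdev

# Hypothetical bivariate sample (NOT the engineers' data): y is roughly 2x.
# "P" sits off the main linear trend; "Q" lies far out along it.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
points = {f"obs{x}": (float(x), 2.0 * x + e) for x, e in zip(range(1, 11), noise)}
points["P"] = (5.5, 5.0)
points["Q"] = (20.0, 40.2)


def rotated_projections(data):
    """Project centred points onto the major axis (u) and its perpendicular (v)."""
    xs = [p[0] for p in data.values()]
    ys = [p[1] for p in data.values()]
    xbar, ybar = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    syy = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    # Angle of the leading eigenvector of the 2 x 2 sample covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    return {
        label: ((x - xbar) * c + (y - ybar) * s, -(x - xbar) * s + (y - ybar) * c)
        for label, (x, y) in data.items()
    }


def most_extreme(proj, axis):
    """Label with the largest standardized |projection| on axis 0 (u) or 1 (v)."""
    vals = [p[axis] for p in proj.values()]
    m, sd = mean(vals), stdev(vals)
    return max(proj, key=lambda k: abs(proj[k][axis] - m) / sd)


proj = rotated_projections(points)
print("most extreme along major axis:", most_extreme(proj, 0))
print("most extreme on perpendicular:", most_extreme(proj, 1))
```

For these made-up numbers `Q` is the most extreme point along the major axis and `P` the most extreme on the perpendicular, mirroring the contrast drawn between N and L, M in the text.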
6.2.4 Principal component analysis method

Several writers have suggested performing a preliminary principal component analysis on the data, and looking at sample values of the projection of the observations on to the principal components of different order. The example above, on electrical engineers' salaries and ages, shows how projecting the observations on the leading or secondary principal component axes (roughly AB and its perpendicular) can highlight different types of outlier. This distinction in the relative utility of the first few, and last few, principal components in outlier detection is basic to the methods described in the literature. Gnanadesikan and Kettenring (1972) discuss this in some detail, remarking how the first few principal components are sensitive to outliers inflating variances or covariances (or correlations, if the principal component analysis has been conducted in terms of the sample correlation matrix, rather than the sample covariance matrix), whilst the last few are sensitive to outliers adding spurious dimensions to the data or obscuring singularities.

Suppose we write

$$Z = LX \tag{6.2.1}$$

where L is a $p \times p$ orthogonal matrix whose rows, $l_i'$, are the eigenvectors of S corresponding with its eigenvalues, $c_i$, expressed in descending order of magnitude, and X is the $p \times n$ matrix whose jth column is the transformed observation $x_j - \bar{x}$. Then the $l_i$ are the principal component coordinates and the ith row of Z, $z_i'$, gives the projections on to the ith principal component coordinate of the deviations of the n original observations about $\bar{x}$. Thus the top few, or lower few, rows of Z provide the means of investigating the presence of outliers affecting the first few, or last few, principal components.
The construction of scatter diagrams for pairs of $z_i$ (among the first few, or last few, principal components) can graphically exhibit outliers. Additionally, univariate outlier tests can be applied to individual $z_i$; or the ordered values in $z_i$ can usefully be plotted against an appropriate choice of plotting positions. What is 'appropriate' is not easily assessed in any exact form, especially in the absence of reliable distributional assumptions about the original data. However, if p is reasonably large, it is likely that the linear
transformations involved in the principal component analysis may lead (via Central Limit Theorem arguments) to the $z_i$ being samples from approximately normal distributions. In such cases normal probability plotting, in which the jth ordered value in $z_i$ is plotted against $a_j$, where

$$a_j = E[U_{(j)}]$$

with $U_{(j)}$ the jth order statistic of the normal distribution $N(0, 1)$, may well reveal outliers as extreme points in the plot lying off the linear relationship exhibited by the mass of points in the plot. Such an informal procedure has been found to be a useful aid to the identification of multivariate outliers.
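The plotting positions can be sketched in Python with only the standard library. The exact expected order statistics $a_j = E[U_{(j)}]$ are replaced here by Blom's approximation $\Phi^{-1}\{(j - 3/8)/(n + 1/4)\}$, a common substitute that is an assumption of this example rather than something the text prescribes, and the projections `z_i` are invented values with one deliberately large entry. As a crude numerical stand-in for inspecting the plot by eye, a least-squares line is fitted to all the points and the point furthest from it is reported.

```python
from statistics import NormalDist, mean


def normal_scores(n):
    """Approximate a_j = E[U_(j)], j = 1..n, for a N(0, 1) sample of size n.

    Uses Blom's approximation Phi^{-1}((j - 3/8) / (n + 1/4)) in place of
    the exact expected order statistics.
    """
    inv = NormalDist().inv_cdf
    return [inv((j - 0.375) / (n + 0.25)) for j in range(1, n + 1)]


def normal_plot(values):
    """Pairs (a_j, jth ordered value) for a normal probability plot."""
    ordered = sorted(values)
    return list(zip(normal_scores(len(ordered)), ordered))


def largest_departure(pairs):
    """Fit a least-squares line to the plot; return the point furthest off it."""
    a = [p[0] for p in pairs]
    z = [p[1] for p in pairs]
    abar, zbar = mean(a), mean(z)
    slope = sum((ai - abar) * (zi - zbar) for ai, zi in zip(a, z)) / sum(
        (ai - abar) ** 2 for ai in a
    )
    intercept = zbar - slope * abar
    return max(pairs, key=lambda p: abs(p[1] - (intercept + slope * p[0])))


# Hypothetical projections z_i on one principal component (illustrative only).
z_i = [0.2, -1.2, 0.6, -0.3, 4.5, 0.0, 1.1, -0.5, 0.9, -0.1, 0.4, -0.8]
pairs = normal_plot(z_i)
print(largest_departure(pairs))
```

Here the value 4.5 is the point furthest from the fitted line; the text, of course, recommends looking at the whole plot rather than relying on a single residual.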
To illustrate this we again consider the salary/age data for electrical engineers. Figures 6.3(a) and 6.3(b) show normal probability plots for the first and second principal components respectively. The first principal component is essentially just the salary. Figure 6.3(a) shows no marked contra-indication of normality. The outliers M and N show as extreme values (L is inconspicuous), although they do not lie off the linear relationship in the manner we would expect were they discordant outliers. In the plot (Figure 6.3(b)) of projections onto the second principal component, L and M are distinguished as extremes (N is inconspicuous) and they do lie below the linear relationship, indicative of discordancy.
Added flexibility of approach is provided by basing principal component analysis on the sample correlation matrix, R, instead of on S, and also by following the proposal of Gnanadesikan and Kettenring (1972) of replacing R or S by modified robust estimates. The robustness aspect will be taken up in the discussion of the accommodation of multivariate outliers in Section 6.3.

Some modifications of approach to outlier detection by principal component analysis are suggested by Hawkins (1974) and by Fellegi (1975). Hawkins considers specifically the case where X is multivariate normal, and gives some consideration to different alternative hypotheses explaining the presence of a single outlier.
Figure 6.3 Normal probability plots for principal components of engineers' salary/age data. (a) First principal component; (b) Second principal component. [Plots not reproduced.]

6.2.5 Use of generalized distances

Another way in which informal quantitative and graphical procedures may be used to exhibit outliers is to construct reduced univariate measures based on the observations $x_j$ (analogous to the distance functions more formally considered earlier). Gnanadesikan and Kettenring (1972) consider various possible measures in the classes:

$$\text{I:}\qquad (x_j - \bar{x})' S^{b} (x_j - \bar{x}), \tag{6.2.2}$$

$$\text{II:}\qquad (x_j - \bar{x})' S^{b} (x_j - \bar{x}) \big/ \left[(x_j - \bar{x})'(x_j - \bar{x})\right]. \tag{6.2.3}$$

Particularly extreme values of such statistics, possibly demonstrated by graphical display, may reveal outliers of different types.
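Measures of classes I and II can be computed directly once $S^b$ is formed from the spectral decomposition $S = \sum_i c_i l_i l_i'$, so that $S^b = \sum_i c_i^b l_i l_i'$. The NumPy sketch below assumes S is positive definite and uses invented bivariate data: observation 20 lies far out along the main trend, and observation 21 lies just off it near the centre. The function names are illustrative, not taken from any library.

```python
import numpy as np


def sym_matrix_power(S, b):
    """S**b for symmetric positive-definite S, via S = sum_i c_i l_i l_i'."""
    c, vecs = np.linalg.eigh(S)          # columns of vecs are the l_i
    return (vecs * c**b) @ vecs.T        # sum_i c_i**b l_i l_i'


def class_measures(x, b):
    """Class I (6.2.2) and class II (6.2.3) statistics for each row x_j of x."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean(axis=0)                       # rows are x_j - xbar
    Sb = sym_matrix_power(np.cov(x, rowvar=False), b)
    class_1 = np.einsum("ij,jk,ik->i", dev, Sb, dev)
    class_2 = class_1 / np.einsum("ij,ij->i", dev, dev)
    return class_1, class_2


# Hypothetical data: 20 points close to the line y = x, then
# observation 20 far out along that line and observation 21 just off it.
t = np.linspace(-2.0, 2.0, 20)
base = np.column_stack([t, t + 0.1 * np.sin(7 * t)])
data = np.vstack([base, [[6.0, 6.0], [1.0, -1.0]]])

q2, _ = class_measures(data, b=0)    # squared Euclidean distance from xbar
d2, _ = class_measures(data, b=-1)   # squared distance in the metric of S^-1
print("largest b = 0 value at observation", int(np.argmax(q2)))
print("largest b = -1 value at observation", int(np.argmax(d2)))
```

With $b = 0$ observation 20 has the largest value, while with $b = -1$ observation 21 does: the two choices of b pick out outliers of different types, as the text suggests.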
Such measures are
of course related to the projections on the principal components and Gnanadesikan and Kettenring (1972) remark that, with class I measures, as b increases above +1 more and more emphasis is placed on the first few principal components, whereas when b decreases below -1 this emphasis progressively shifts to the last few principal components (a similar effect holds for class II measures according as $b \gtrless 0$). Extra flexibility arises by considering $x_j - x_{j'}$ ($j \neq j'$) rather than $x_j - \bar{x}$ in the different measures, or R in place of S.

Let us consider some specific examples of the class I measures.

$$(b = 0)\qquad q_j^2 = (x_j - \bar{x})'(x_j - \bar{x}) = \frac{n-1}{n}\left[\operatorname{tr}(A) - \operatorname{tr}(A_{(j)})\right]. \tag{6.2.4}$$

This squared Euclidean distance from $\bar{x}$ is sensitive to outliers 'inflating the overall scale'.

$$(b = 1)\qquad t_j^2 = (x_j - \bar{x})' S (x_j - \bar{x}) = \sum_i c_i \left[l_i'(x_j - \bar{x})\right]^2 \tag{6.2.5}$$

is sensitive to outliers affecting the 'orientation and scale of the first few principal components of S'.

$$(b = -1)\qquad d_j^2 = (x_j - \bar{x})' S^{-1} (x_j - \bar{x}) = \sum_i c_i^{-1} \left[l_i'(x_j - \bar{x})\right]^2 \tag{6.2.6}$$

is particularly useful for 'uncovering observations which lie far afield from the general scatter of points'.

For graphical display of outliers, the 'gamma-type probability plots' of ordered values, with appropriately estimated shape parameter, are a useful approximate procedure. Essentially the argument is as follows. If the multivariate observation $x_j$ comes from a normal distribution, then the distance measures $R_j(\bar{x}, \Gamma)$ may be regarded as (approximately) independent observations from a gamma distribution, whatever form is taken for $\Gamma$. Thus if we knew, or could reasonably estimate, the parameters in the gamma distribution, a plot of the ordered $R_j(\bar{x}, \Gamma)$ against quantiles of the gamma distribution should be linear, with anomalous $R_j(\bar{x}, \Gamma)$ (for example, discordant outliers) showing as extreme values lying markedly off the overall linear relationship. (See Wilk and Gnanadesikan, 1964; Wilk, Gnanadesikan, and Huyett, 1962a, 1962b, and further comments in Section 6.2.8.) If X is multivariate normal then the exact marginal distribution of the $d_j^2$ is known to be related to a Beta form with parameters $(n-p-1)/2$ and $p/2$, but the $d_j^2$ are not, of course, independent (see Section 6.1). We note again that consideration of the maximum value of $d_j^2$ is equivalent to Wilks's (1963) method; see also (above) the proposals of Healy (1968) for plotting the $d_j^2$ when X is approximately normal.

Rao (1964) proposes examination of the sums of squares of the lengths of the projections of individual observations on the last few (q) principal component coordinates for assessing the propriety of individual observations. Thus outliers may be revealed by particularly large values of

$$\sum_{i=p-q+1}^{p} \left[l_i'(x_j - \bar{x})\right]^2.$$

The suggestions of Gnanadesikan and Kettenring (1972) for informally considering residuals in least-squares fits of structured models, as a means of identifying outliers, are more appropriate to the discussion of outliers in regression models and designed experiments (Chapter 7).

6.2.6 Fourier-type representation

Gnanadesikan (1973) gives an interesting illustration of the potential use, for detecting outliers, of a novel means of representing multivariate observations, due to Andrews (1972). Andrews suggests that $x_j = (x_{1j}, x_{2j}, \ldots, x_{pj})'$ should be represented by the function

$$f_{x_j}(t) = x_{1j}/\sqrt{2} + x_{2j}\sin t + x_{3j}\cos t + x_{4j}\sin 2t + x_{5j}\cos 2t + \cdots \tag{6.2.7}$$

over the range $(-\pi, \pi)$ for t. Each sample point in p-space then appears as a curve over such values of t. The idea is that this might reveal certain important qualitative features in the data.

Gnanadesikan (1973) shows in an example how it might usefully distinguish outliers. He considers a quadrivariate sample of 50 observations on log-lengths and log-widths of sepals and petals for Iris setosa described by Fisher (1936). He chooses a grid of values of t over $(-\pi, \pi)$, determines $f_{x_j}(t)$ ($j = 1, 2, \ldots, 50$) over the grid, and at each grid value estimates certain quantiles of $f_x(t)$ from the data. The quantiles chosen are the 10, 25, 50, 75, and 90 percentiles. These are presented graphically along with any individual $f_{x_j}(t)$ values, at each t, outside the deciles. The results are shown in Figure 6.4, where we see a very clear indication of the outlying nature of observations number 16 and 42. (The median values are labelled M, quartiles Q, and extreme deciles T.)

We would not claim that other techniques, such as residuals based on a principal components analysis, would fail to exhibit these outliers. But this use of Andrews' representation may prove to have interesting possibilities for the study of outliers.

6.2.7 Correlation methods

We have already remarked on the way in which outliers may affect, and be revealed by, the correlation structure in the data. Some proposals for identifying multivariate outliers specifically consider this matter.

Gnanadesikan and Kettenring (1972) suggest that we examine the product-moment correlation coefficients $r_{-j}(s, t)$ relating to the sth and tth
226 Outliers in statistica[ data Outliers in multivariate data 227

of course related to the projections on the principal components, and
Gnanadesikan and Kettenring (1972) remark that, with class I measures, as $b$ increases above $+1$ more and more emphasis is placed on the first few principal components, whereas when $b$ decreases below $-1$ this emphasis progressively shifts to the last few principal components (a similar effect holds for class II measures according as $b \gtrless 0$). Extra flexibility arises by considering $\mathbf{x}_j - \mathbf{x}_{j'}$ $(j \neq j')$ rather than $\mathbf{x}_j - \bar{\mathbf{x}}$ in the different measures, or $R$ in place of $S$.

Let us consider some specific examples of the class I measures.

$$(b = 0)\qquad q_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})'(\mathbf{x}_j - \bar{\mathbf{x}}) = \frac{n-1}{n}\left[\operatorname{tr}(A) - \operatorname{tr}(A_{(j)})\right]. \tag{6.2.4}$$

This squared Euclidean distance from $\bar{\mathbf{x}}$ is sensitive to outliers 'inflating the overall scale'.

$$(b = 1)\qquad t_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})'S(\mathbf{x}_j - \bar{\mathbf{x}}) = \sum_i c_i\left[\mathbf{l}_i'(\mathbf{x}_j - \bar{\mathbf{x}})\right]^2 \tag{6.2.5}$$

is sensitive to outliers affecting the 'orientation and scale of the first few principal components of $S$'.

$$(b = -1)\qquad d_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})'S^{-1}(\mathbf{x}_j - \bar{\mathbf{x}}) = \sum_i c_i^{-1}\left[\mathbf{l}_i'(\mathbf{x}_j - \bar{\mathbf{x}})\right]^2 \tag{6.2.6}$$

is particularly useful for 'uncovering observations which lie far afield from the general scatter of points'.

For graphical display of outliers, the 'gamma-type probability plots' of ordered values, with appropriately estimated shape parameter, are a useful approximate procedure. Essentially the argument is as follows. If the multivariate observation $\mathbf{x}_j$ comes from a normal distribution, then the distance measures $R_j(\bar{\mathbf{x}}, \Gamma)$ may be regarded as (approximately) independent observations from a gamma distribution, whatever form is taken for $\Gamma$. Thus if we knew, or could reasonably estimate, the parameters in the gamma distribution, a plot of the ordered $R_j(\bar{\mathbf{x}}, \Gamma)$ against quantiles of the gamma distribution should be linear, with anomalous $R_j(\bar{\mathbf{x}}, \Gamma)$ (for example, discordant outliers) showing as extreme values lying markedly off the overall linear relationship. (See Wilk and Gnanadesikan, 1964; Wilk, Gnanadesikan, and Huyett, 1962a, 1962b, and further comments in Section 6.2.8.) If $X$ is multivariate normal then the exact marginal distribution of the $d_j^2$ is known to be related to a Beta form with parameters $(n-p-1)/2$ and $p/2$, but the $d_j^2$ are not, of course, independent (see Section 6.1). We note again that consideration of the maximum value of $d_j^2$ is equivalent to Wilks's (1963) method; see also (above) the proposals of Healy (1968) for plotting the $d_j^2$ when $X$ is approximately normal.

Rao (1964) proposes examination of the sums of squares of the lengths of the projections of individual observations on the last few ($q$) principal component coordinates for assessing the propriety of individual observations. Thus outliers may be revealed by particularly large values of

$$\sum_{i=p-q+1}^{p}\left[\mathbf{l}_i'(\mathbf{x}_j - \bar{\mathbf{x}})\right]^2.$$

The suggestions of Gnanadesikan and Kettenring (1972) for informally considering residuals in least-squares fits of structured models, as a means of identifying outliers, are more appropriate to the discussion of outliers in regression models and designed experiments (Chapter 7).

6.2.6 Fourier-type representation

Gnanadesikan (1973) gives an interesting illustration of the potential use, for detecting outliers, of a novel means of representing multivariate observations, due to Andrews (1972). Andrews suggests that $\mathbf{x}_j = (x_{1j}, x_{2j}, \ldots, x_{pj})'$ should be represented by the function

$$f_{\mathbf{x}_j}(t) = x_{1j}/\sqrt{2} + x_{2j}\sin t + x_{3j}\cos t + x_{4j}\sin 2t + x_{5j}\cos 2t + \cdots \tag{6.2.7}$$

over the range $(-\pi, \pi)$ for $t$. Each sample point in $p$-space then appears as a curve over such values of $t$. The idea is that this might reveal certain important qualitative features in the data.

Gnanadesikan (1973) shows in an example how it might usefully distinguish outliers. He considers a quadrivariate sample of 50 observations on log-lengths and log-widths of sepals and petals for Iris setosa described by Fisher (1936). He chooses a grid of values of $t$ over $(-\pi, \pi)$, determines $f_{\mathbf{x}_j}(t)$ $(j = 1, 2, \ldots, 50)$ over the grid, and at each grid value estimates certain quantiles of $f_{\mathbf{x}}(t)$ from the data. The quantiles chosen are the 10, 25, 50, 75, and 90 percentiles. These are presented graphically along with any individual $f_{\mathbf{x}_j}(t)$ values, at each $t$, outside the deciles. The results are shown in Figure 6.4, where we see a very clear indication of the outlying nature of observations number 16 and 42. (The median values are labelled M, quartiles Q, and extreme deciles T.)

We would not claim that other techniques, such as residuals based on a principal components analysis, would fail to exhibit these outliers. But this use of Andrews' representation may prove to have interesting possibilities for the study of outliers.

6.2.7 Correlation methods

We have already remarked on the way in which outliers may affect, and be revealed by, the correlation structure in the data. Some proposals for identifying multivariate outliers specifically consider this matter. Gnanadesikan and Kettenring (1972) suggest that we examine the product-moment correlation coefficients $r_{-j}(s, t)$ relating to the $s$th and $t$th
marginal samples after the omission of the single observation $\mathbf{x}_j$. As we vary $j$ we can examine, for any choice of $s$ and $t$, the way in which the correlation changes, substantial variations reflecting possible outliers.

[Figure 6.4 Andrews' plot of the Iris setosa data (reproduced by permission of R. Gnanadesikan and the International Statistical Institute)]

Devlin, Gnanadesikan, and Kettenring (1975) make use of the influence function of Hampel (1974) to investigate how outliers affect correlation estimates in bivariate data ($p = 2$). Their main interest is in robust estimation of correlation (see Section 6.3). But they are also concerned with the detection of outliers per se. They consider a multivariate distribution indexed by a parameter $\theta$ and define in relation to an estimator $\hat{\theta}$ the 'sample influence function'

$$I_{-j}(\mathbf{x}_j; \hat{\theta}) = (n-1)(\hat{\theta} - \hat{\theta}_{-j}) \qquad (j = 1, 2, \ldots, n) \tag{6.2.8}$$

where $\hat{\theta}_{-j}$ is an estimator of the same form as $\hat{\theta}$ based on the sample omitting the observation $\mathbf{x}_j$. We see that $\hat{\theta} + I_{-j}$ is just the $j$th jackknife pseudo-value. As a convenient first-order approximation to the sample influence function of $r$, the product-moment correlation estimate in a bivariate sample, they propose (with an obvious notation)

$$I_{-j}(x_{1j}, x_{2j}; r) = (n-1)(r - r_{-j}). \tag{6.2.9}$$

$I_{-j}(x_{1j}, x_{2j}; r)$ provides an estimate of the influence on $r$ of the omission of the observation $(x_{1j}, x_{2j})$.

Two suggestions are made for presenting graphically how $I_{-j}(x_{1j}, x_{2j}; r)$ varies over the sample, with a view to identifying as outliers the observations which exhibit a particularly strong influence on $r$. The first amounts to superimposing selected (hyperbolic) contours of $I_{-}(x_1, x_2; r)$ on the scatter diagram, thus distinguishing the outliers. Some qualitative comments are made (and illustrated) concerning the choice of which contours to plot. The second relates to the sample influence function of the Fisher transformation $z(r) = \tanh^{-1}(r)$:

$$I_{-j}(x_{1j}, x_{2j}; z(r)) = (n-1)\left[z(r) - z(r_{-j})\right]. \tag{6.2.10}$$

For large $n$, the distribution of $I_{-j}(x_{1j}, x_{2j}; z(r))$ is approximately that of the product of two independent standard normal variables, and it is proposed that ordered values of $I_{-j}$ be plotted against the appropriate quantiles. The distinct $I_{-j}$ values over the sample are not seriously correlated, and a further normalizing transformation is proposed prior to the probability plotting. Again, it will be extreme values in the plot, lying away from the overall linear relationship, which indicate outliers.

6.2.8 A 'gap test' for multivariate outliers

We noted earlier the characterization of multivariate outliers suggested by Rohlf (1975): that they are separated from other observations 'by distinct gaps'. Rohlf has used this idea to develop a gap test for multivariate outliers based on minimum spanning trees. Eschewing the nearest-neighbour distances as measures of separation, in view of the masking effect a cluster of outliers may exert on each other, he considers instead the lengths of edges in the minimum spanning tree MST (or shortest simply connected graph) of the data set as measures of adjacency. He argues that a single isolated point will be connected to only one other point in the MST by a relatively large distance, and that at least one edge connection from a cluster of outliers must also be relatively large. Accordingly a gap test for outliers is proposed with the following form. Firstly, examination of the marginal samples yields estimates $s_k$ $(k = 1, 2, \ldots, p)$ of the standard deviations. The observations are rescaled as $x'_{kj} = x_{kj}/s_k$ $(k = 1, 2, \ldots, p;\ j = 1, 2, \ldots, n)$. Distances between $\mathbf{x}'_i$ and $\mathbf{x}'_j$ in the MST are calculated as

$$d_{ij} = \left[\sum_{k=1}^{p}(x'_{ki} - x'_{kj})^2 \Big/ p\right]^{1/2} \tag{6.2.11}$$

and in particular we denote by $z_i$ the lengths of the $n - 1$ edges of the MST.
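A minimal sketch of the first stage of the gap test, written under our own assumptions: each variable is rescaled by its marginal standard deviation (the plain sample standard deviation stands in for the unspecified estimates $s_k$), the distances $d_{ij}$ of (6.2.11) are formed, and the $n - 1$ MST edge lengths $z_i$ are extracted with Prim's algorithm, which is our choice rather than anything prescribed by Rohlf. The function names are hypothetical. The helper `gap_ratio` returns $z_{(n-1)}^2 / \sum_i z_i^2$, the largest squared edge relative to the total, which is the quantity the homogeneity check that follows is built on.

```python
import math
import statistics


def rohlf_distances(data):
    """Distances d_ij of (6.2.11): rescale variable k by its marginal standard
    deviation s_k, then take the root-mean-square coordinate difference."""
    n, p = len(data), len(data[0])
    # s_k: plain sample standard deviation (an assumption; any marginal
    # scale estimate could be substituted here).
    scales = [statistics.stdev([row[k] for row in data]) for k in range(p)]
    scaled = [[row[k] / scales[k] for k in range(p)] for row in data]
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum((scaled[i][k] - scaled[j][k]) ** 2 for k in range(p))
            dist[i][j] = dist[j][i] = math.sqrt(sq / p)
    return dist


def mst_edge_lengths(dist):
    """Lengths z_1, ..., z_{n-1} of the minimum spanning tree edges (Prim's algorithm)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from each vertex to the growing tree
    best[0] = 0.0
    edges = []
    for step in range(n):
        # Add the vertex with the cheapest link to the tree.
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        if step > 0:  # vertex 0 starts the tree and contributes no edge
            edges.append(best[v])
        for u in range(n):
            if not in_tree[u] and dist[v][u] < best[u]:
                best[u] = dist[v][u]
    return edges


def gap_ratio(edges):
    """Largest squared edge over the sum of squared edges,
    i.e. z_(n-1)^2 / [(n-1) * mean of the z_i^2]."""
    squares = [z * z for z in edges]
    return max(squares) / sum(squares)
```

For example, four points at the corners of a unit square plus a fifth point at (10, 10) give three short edges and one long edge joining the isolated point, so `gap_ratio` is close to 1. Prim's algorithm on the full distance matrix costs $O(n^2)$ operations, which is adequate for samples of modest size.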
The $z_i$ are now examined for homogeneity, either by means of a probability plot of their ordered values or by testing if the ratio of the square of the maximum, $z_{(n-1)}^2$, to the average, $\overline{z^2}$, of the squares is of reasonable value.

The 'gamma-type plot' of the $z_{(i)}^2$ against quantiles of a gamma distribution has heuristic justification if $X$ comes from a $p$-variate normal distribution, on the following argument. If the components of $X$ were independent normal (each with unit variance) the inter-point squared Euclidean distances would be independently distributed as $2\chi^2_p$. If the components of $X$ are not independent, these distances will be dependent and may not follow too closely a gamma distribution. However, Rohlf claims that empirical investigations demonstrate that the particular subset of squared edge distances, $z_i^2$, do appear to have approximately independent common gamma distributions (on the assumption of homogeneity, that is, absence of discordant outliers). The relevant shape parameter will need to be estimated either (iteratively) by the maximum likelihood method, or by using the order statistics approach by Wilk, Gnanadesikan, and Huyett (1962a, 1962b). The value of the scale parameter will not need to be estimated since its value affects only the slope of the gamma plot and not its linearity, and it is therefore irrelevant to the detection of markedly anomalous values (here discordant outliers).

Wilk, Gnanadesikan, and Huyett (1962b) consider maximum likelihood estimation of the scale and shape parameters, $\lambda$ and $\eta$, in the gamma distribution, $\Gamma(\eta, \lambda)$, based on an ordered random sample of observations, $y_{(1)}, y_{(2)}, \ldots, y_{(n)}$. They show that $\hat{\eta}$ satisfies

$$\frac{\Gamma'(\hat{\eta})}{\Gamma(\hat{\eta})} - \ln\hat{\eta} = \ln Q \tag{6.2.12}$$

where $\Gamma(\cdot)$ denotes the gamma function and $Q$ is the ratio of the geometric and arithmetic means of the $y_{(i)}$. They present useful tabulated aids for determining $\hat{\eta}$. It is interesting to note that $\eta$ may be estimated separately from $\lambda$, an important consideration in that the probability plotting procedure does not require $\lambda$ to be known (it can be arbitrarily assigned). Rohlf (1975) also remarks on a further advantage of such a means of estimation for current purposes: $\hat{\eta}$ does not depend strongly on the larger values in the ordered sample, so that $\hat{\eta}$ will be reasonably robust against the very outliers we are seeking to identify. In Wilk, Gnanadesikan, and Huyett (1962a) the 'gamma-type' probability plot is described in detail and useful tables of quantiles of the gamma distribution are presented.

If something nearer a formal test of discordancy for a single outlier is required, Rohlf makes the following proposals. If we knew $\lambda$ and $\eta$ in the (approximate) gamma distribution for which the $z_i^2$ are (approximately) independent observations, then we could compare $z_{(n-1)}^2/\lambda$ with the upper percentage points for the maximum observation in a sample of size $n - 1$ from a gamma distribution with shape parameter $\eta$. Equivalently $z_{(n-1)}^2/[(n-1)\overline{z^2}]$, where $\overline{z^2} = \frac{1}{n-1}\sum_{i=1}^{n-1} z_i^2$, has a Beta distribution with parameters $\eta$ and $(n-1)\eta$ (independent of $\lambda$), and use can be made of existing results on (approximate) upper percentage points of such a Beta distribution (see Rohlf, 1975, for details). Not knowing the value of $\eta$, it is proposed that we should relate $z_{(n-1)}^2/[(n-1)\overline{z^2}]$ to the approximate upper percentage points of the appropriate Beta distribution for an estimated value $\hat{\eta}$. Rohlf presents a table of upper bounds to the upper 5 per cent and 1 per cent points for $n = 10, 20(20)100, 200$ and $\eta = 0.1(0.1)1.0(0.5)5.0, 6.0(2.0)12.0$.

The idea of using the MST to reflect outliers is interesting but clearly needs more detailed study and illustration. Rohlf's proposals are specifically concerned with normal data, and we have of course dealt at length in Section 6.1 with other proposals for this case.

6.3 ACCOMMODATION OF MULTIVARIATE OUTLIERS

We have noted the relative paucity of methods for detecting, or assessing the discordancy of, multivariate outliers. Inference techniques for the accommodation of multivariate outliers are even less in evidence. The few proposals in the literature again concentrate on a basic normal distribution or have an informal structure with intuitive, rather than theoretically justified, appeal. We shall review briefly some of the techniques available for estimating parameters (in multivariate distributions) in ways which are likely to be robust against the presence of outliers.

We have referred in Chapters 2 and 4 to the 'premium-protection' rules of Anscombe (1960a) which take the form of joint rejection/estimation procedures. For a univariate sample $x_1, x_2, \ldots, x_n$ from $N(\mu, \sigma^2)$, with a location-slippage alternative model $N(\mu + a, \sigma^2)$ for the generation of at most one of the $x_j$, we examine the maximum absolute residual

$$\max_{j = 1, 2, \ldots, n} |x_j - \bar{x}|.$$

If this is sufficiently large we omit the observation $x_j$ yielding the maximum absolute residual and estimate $\mu$ by the sample mean of the remaining $n - 1$ observations; otherwise we estimate $\mu$ merely by $\bar{x}$. There is an obvious multivariate generalization in which we consider $R_{(n)}(\bar{\mathbf{x}}, S)$ [or $R_{(n)}(\bar{\mathbf{x}}, V)$, depending on our state of knowledge about $V$] and, if it is sufficiently large, omit the observation $\mathbf{x}_j$ yielding $R_{(n)}$ before estimating $\boldsymbol{\mu}$ from the residual sample; if $R_{(n)}$ is not sufficiently large we estimate $\boldsymbol{\mu}$ from the total sample by means of $\bar{\mathbf{x}}$. Such an approach is implicitly taken up by Golub, Guttman, and Dutter (1973) in greater detail and generality. Their
work is more general in that they consider a general normal linear model and augment the Anscombe-type rule by corresponding rules based on Winsorization and 'semi-Winsorization' of residuals. The greater detail is reflected in their discussion of the problems of determining the premium and protection. To ease the task of such determinations they also propose the use of orthogonalized ('adjusted') residuals as a basis for approximating the premium and protection. This aspect is discussed more fully in our study of the linear model situation in Section 7.3. However, since an uncorrelated error structure is assumed in the detailed discussion, the results are not immediately applicable to the case of a general multivariate normal sample.

A different approach to multivariate outlier accommodation can be based on the Bayesian analysis by Guttman (1973b). Guttman is concerned with the posterior distribution of $\mathbf{a}$ for a basic normal model $N(\boldsymbol{\mu}, V)$, with a mean-slippage alternative model $N(\boldsymbol{\mu} + \mathbf{a}, V)$ for at most one of the observations. Examination of the posterior distribution of $\boldsymbol{\mu}$ (and $V$) is germane to the accommodation issue. Further reference to this approach to the multivariate problem, with some detailed results for the corresponding univariate case, appears in Section 8.1.

So far we have concentrated on outlier-robust estimation of the mean $\boldsymbol{\mu}$. An alternative concern might be with outlier-robust estimation of the variance-covariance matrix $V$, or with derived quantities such as correlation coefficients. Gnanadesikan and Kettenring (1972) are concerned with robust estimation of multivariate location and dispersion. Whilst not preoccupied with the outlier problem, some of their proposed robust estimators will en passant provide protection against outliers. (But we must recall our earlier discussion of the multifarious effects and manifestations of outliers in the multivariate case, influencing as they may scale, correlation structure, high- or low-order components, and so on.) Various robust estimators of $\boldsymbol{\mu}$ are reviewed (and have their performance characteristics examined by simulation in the bivariate normal case). The estimators mostly take the form of vectors of robust univariate estimators for the distinct marginal components of $\boldsymbol{\mu}$, such as the vector of sample medians, or of trimmed means. The further prospect of using the vector of Winsorized means is not examined in detail. As far as robust estimation of dispersion parameters is concerned, they consider the usual variance estimates based on trimmed or Winsorized marginal samples, or examination of the slope of appropriate probability plots. Covariances and correlations are less readily estimated robustly. One possibility is to use the relationship that, for any two random variables $X_1$ and $X_2$,

$$\operatorname{cov}(X_1, X_2) = \tfrac{1}{4}\left[\operatorname{var}(X_1 + X_2) - \operatorname{var}(X_1 - X_2)\right] \tag{6.3.1}$$

and to obtain robust estimates of the variances from trimmed or Winsorized versions of the transformed samples $x_{1j} + x_{2j}$ and $x_{1j} - x_{2j}$ $(j = 1, 2, \ldots, n)$.

To ensure estimated covariance matrices which are positive definite, Gnanadesikan and Kettenring suggest ranking the multivariate sample in terms of some distance measure $R(\mathbf{x}; \mathbf{x}^*, I)$, where $\mathbf{x}^*$ is a robust estimator of $\boldsymbol{\mu}$, omitting a small proportion of the sample having the largest $R(\mathbf{x}; \mathbf{x}^*, I)$ values, and computing a matrix

$$A_0 = \sum{}^{*}(\mathbf{x}_j - \mathbf{x}^*)(\mathbf{x}_j - \mathbf{x}^*)' \tag{6.3.2}$$

where the summation extends over the retained sample members. The whole sample is then ranked in terms of $R(\mathbf{x}; \mathbf{x}^*, A_0)$, again a small proportion of the observations having the largest $R(\mathbf{x}; \mathbf{x}^*, A_0)$ are omitted, and $V$ is estimated as the appropriate multiple of the matrix of sums of squares and cross-products of the finally retained sample members. The procedure is intuitively appealing, but only limited empirical investigation is reported. Another ingenious method of constructing a robust estimator of $V$, without recourse to estimation of $\boldsymbol{\mu}$, is also presented.

A more specific contribution to outlier-robust estimation relates to estimating the correlation coefficient in a bivariate normal distribution. This is examined by Devlin, Gnanadesikan, and Kettenring (1975). After reviewing various ad hoc estimators based on partitioning of the sample space, on transformations of Kendall's $\tau$, or on normal scores, they proceed to investigate (by means of an extensive simulation exercise) estimators based on (6.3.1), on trimming, or on Winsorization.
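One way to turn the covariance identity $\operatorname{cov}(X_1, X_2) = \frac{1}{4}[\operatorname{var}(X_1 + X_2) - \operatorname{var}(X_1 - X_2)]$ into a robust correlation estimate is sketched below, under our own assumptions. Each margin is scaled by its trimmed standard deviation, trimmed variances $V_+$ and $V_-$ of the sums and differences of the scaled variables are computed, and the ratio $(V_+ - V_-)/(V_+ + V_-)$ is returned. For standardized variables $\operatorname{var}(Z_1 \pm Z_2) = 2(1 \pm \rho)$, so this ratio estimates $\rho$ and always lies in $[-1, 1]$. The 10 per cent trimming, the standardization, and the function names are illustrative choices, not the specific estimators simulated by Devlin, Gnanadesikan, and Kettenring.

```python
import statistics


def trimmed_variance(values, alpha=0.1):
    """Sample variance after dropping int(alpha * n) observations from each end."""
    xs = sorted(values)
    g = int(alpha * len(xs))
    kept = xs[g:len(xs) - g]
    if len(kept) < 2:
        raise ValueError("too few observations left after trimming")
    return statistics.variance(kept)


def robust_correlation(x1, x2, alpha=0.1):
    """Correlation estimate built on cov = [var(X1+X2) - var(X1-X2)] / 4,
    with every variance replaced by a trimmed variance."""
    # Scale each margin by its trimmed standard deviation. Variances are
    # unaffected by shifts, so no centring is needed.
    s1 = trimmed_variance(x1, alpha) ** 0.5
    s2 = trimmed_variance(x2, alpha) ** 0.5
    z1 = [a / s1 for a in x1]
    z2 = [b / s2 for b in x2]
    v_plus = trimmed_variance([a + b for a, b in zip(z1, z2)], alpha)
    v_minus = trimmed_variance([a - b for a, b in zip(z1, z2)], alpha)
    if v_plus + v_minus == 0.0:
        raise ValueError("degenerate sample")
    # For standardized variables var(Z1 +/- Z2) = 2(1 +/- rho), so this ratio
    # recovers rho and is confined to [-1, 1].
    return (v_plus - v_minus) / (v_plus + v_minus)
```

With $x_1 = 1, 2, \ldots, 20$ and $x_2 = x_1$ except that the last value is replaced by $-100$, the ordinary product-moment $r$ is negative, whereas `robust_correlation(x1, x2)` returns 1.0, because the single wild pair is trimmed from the sample of differences. Under normality the trimming shrinks the variances of both margins and of the sums and differences by a common factor, which cancels in the ratio, so no consistency correction is applied here.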
232 Outliers in statistica[ data Outliers in multivariate data 233

work is more generai in that they consider a generai normal linear model To ensure estimated covariance matrices which are positive definite,
and augment the Anscombe-type rule by corresponding rules based on Gnanadesikan and Kettenring suggest ranking the multivariate sample in
Winsorization and 'semi-Winsorization' of residuals. The greater detail is terms of some distance measure R(x; x*, I) where x* is a robust estimator of
reflected in their discussion of the problems of determining the premium and p., omitting a smaJl proportion of the sample having the largest x*, I)
protection. To ease the task of such determinations they also propose the values, and computing a matrix
use of orthogonalized ('adjusted') residuals as a basis for approximating the
A 0 = ~l * (xi-x *)'
/..J (xi-x) (6.3.2)
premium and protection. This aspect is discussed more fully in our study of
the linear mode! situation in Section 7 .3. However, since an uncorrelated where the summation extends over the retained sample members. The whole
error structure is assumed in the detailed discussion, the results are not sample is then ranked in terms of R(x; x*, A 0 ), again a small proportion of
immediately applicable to the case of a generai multivariate normal sample. the observations having the largest R(x; x*, A 0 ) are omitted, and V is
A different approach to multivariate outlier accommodation can be based on the Bayesian analysis by Guttman (1973b). Guttman is concerned with the posterior distribution of $a$ for a basic normal model $N(\mu, V)$, with a mean-slippage alternative model $N(\mu + a, V)$ for at most one of the observations. Examination of the posterior distribution of $\mu$ (and $V$) is germane to the accommodation issue. Further reference to this approach to the multivariate problem, with some detailed results for the corresponding univariate case, appears in Section 8.1.

So far we have concentrated on outlier-robust estimation of the mean $\mu$. An alternative concern might be with outlier-robust estimation of the variance-covariance matrix $V$, or with derived quantities such as correlation coefficients. Gnanadesikan and Kettenring (1972) are concerned with robust estimation of multivariate location and dispersion. Whilst not preoccupied with the outlier problem, some of their proposed robust estimators will en passant provide protection against outliers. (But we must recall our earlier discussion of the multifarious effects and manifestations of outliers in the multivariate case, influencing as they may scale, correlation structure, high- or low-order components, and so on.) Various robust estimators of $\mu$ are reviewed (and have their performance characteristics examined by simulation in the bivariate normal case). The estimators mostly take the form of vectors of robust univariate estimators for the distinct marginal components of $\mu$, such as the vector of sample medians, or of trimmed means. The further prospect of using the vector of Winsorized means is not examined in detail. As far as robust estimation of dispersion parameters is concerned, they consider the usual variance estimates based on trimmed or Winsorized marginal samples, or examination of the slope of appropriate probability plots. Covariances and correlations are less readily estimated robustly. One possibility is to use the relationship that, for any two random variables $X_1$ and $X_2$,

$$\operatorname{cov}(X_1, X_2) = \tfrac{1}{4}\{\operatorname{var}(X_1 + X_2) - \operatorname{var}(X_1 - X_2)\} \qquad (6.3.1)$$

and to obtain robust estimates of the variances from trimmed or Winsorized versions of the transformed samples

$$x_{1j} + x_{2j}, \quad x_{1j} - x_{2j} \qquad (j = 1, 2, \ldots, n).$$

estimated as the appropriate multiple of the matrix of sums of squares and cross-products of the finally retained sample members. The procedure is intuitively appealing, but only limited empirical investigation is reported. Another ingenious method of constructing a robust estimator of $V$, without recourse to estimation of $\mu$, is also presented.

A more specific contribution to outlier-robust estimation relates to estimating the correlation coefficient in a bivariate normal distribution. This is examined by Devlin, Gnanadesikan, and Kettenring (1975). After reviewing various ad hoc estimators based on partitioning of the sample space, on transformations of Kendall's $\tau$, or on normal scores, they proceed to investigate (by means of an extensive simulation exercise) estimators based on (6.3.1), on trimming, or on Winsorization.
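A few lines of Python can sketch how a robust covariance is obtained from (6.3.1) using trimmed variances. The sketch works under stated assumptions and is not the estimator of Gnanadesikan and Kettenring (1972) or of Devlin, Gnanadesikan, and Kettenring (1975). The function names and the trimming count `g` are hypothetical. Each variance in (6.3.1) is replaced by the sample variance of the transformed sample after its `g` smallest and `g` largest values have been dropped. No consistency factor is applied, so for normal data the estimate understates the covariance in magnitude.

```python
import statistics


def trimmed_variance(values, g=1):
    """Sample variance after discarding the g smallest and g largest values.

    No consistency factor is applied, so for normal data this
    understates the true variance.
    """
    data = sorted(values)
    if len(data) - 2 * g < 2:
        raise ValueError("too few observations left after trimming")
    return statistics.variance(data[g:len(data) - g])


def robust_covariance(x1, x2, g=1):
    """Covariance via (6.3.1): cov = {var(X1 + X2) - var(X1 - X2)} / 4,
    with each variance replaced by a trimmed variance of the
    transformed samples x1j + x2j and x1j - x2j."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must be paired samples of equal length")
    sums = [a + b for a, b in zip(x1, x2)]
    diffs = [a - b for a, b in zip(x1, x2)]
    return (trimmed_variance(sums, g) - trimmed_variance(diffs, g)) / 4


x1 = list(range(1, 11))
x2 = list(range(1, 11))
x2[-1] = 100  # a single gross value in the second variable

# Classical (unbiased) sample covariance, for comparison.
m1, m2 = statistics.mean(x1), statistics.mean(x2)
classical = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (len(x1) - 1)
print(f"classical covariance: {classical:.2f}")
print(f"trimmed-variance covariance (g=1): {robust_covariance(x1, x2):.2f}")
```

In this example the single gross value raises the classical covariance to about 54.2. The trimmed version returns 6.0, which is the same value it gives for the uncontaminated pair `(x1, x1)`. A Winsorized variance could replace `trimmed_variance` without any change to the rest of the sketch.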
CHAPTER 7

Outliers in Designed Experiments, Regression, and in Time-Series

We have concentrated so far on the examination of outliers in a single sample of observations from a common distribution, or of outlying samples in a set of independent samples from a common distribution. It has been assumed that the only departure from the null hypothesis of homogeneity of distribution arises in explanation of discordant outliers, or outlying samples, on the basis of one of the models for outlier generation. But in a great deal of statistical analysis, departure from the assumption of homogeneity of distribution need have nothing to do with outliers. It is a natural, and often welcome, manifestation of the appropriateness of some linear model explaining how mean values vary with different levels of factors of classification or with different values of a set of independent variables; or it may express a time-dependent effect in the generation of the data. This applies, of course, to the whole range of designed experiments, regression situations, or time-series. Even so, a further degree of inhomogeneity may be revealed by the presence of outliers, which express ad hoc influences additional to the linear model or time-series effects. Thus it is appropriate to extend our study of outliers to such more highly structured situations.

A crucial distinction must now be recognized in the occurrence of outliers. In a single univariate sample an outlier was identified subjectively as an observation which engenders 'surprise' in its value relative to the other sample members: it 'sticks out at the end' of the sample. Subjective identification is succeeded by formal processing in the sense of a test of discordancy and (perhaps) a contingent adjustment in the value prior to further processing of the data. For more structured data we would wish to retain the stimulus of 'surprise', but this concept is now far more nebulous.

For example, consider the two sets of data shown in Tables 7.1 and 7.2. In Table 7.1 values $x_i$ and $y_i$ are respectively the loads (in lb) applied to similar structures and their resulting extensions of length (in inches). We must expect some relationship between the loads and extensions and there is no point in seeking outlying extensions of length in terms of the literal $y_i$ values alone. If we plot the data (as in Figure 7.1) we find confirmation of the relationship. Perhaps a linear regression model is appropriate. But now an anomaly does arise; the observation (53.4, 3.1) shows up as an outlier in the sense that it disturbs the general pattern to a degree which is discomforting. An appropriate visual inspection still suffices to detect the outlier, but it is clear that in a more complicated regression situation with many independent variables this may well not be possible. Thus we now need to refine our notion of an outlier. It is not a simple extreme value, but it has a more general pattern-disrupting form. Note, however, that compensatory effects can arise for multiple outliers, making them less readily detected through simple pattern-disruption. We can no longer rely on direct subjective impact but may need (as in multivariate samples) to adopt an appropriate outlier detection process before we can even contemplate testing discordancy or refining inference procedures to take specific regard of outliers. (This latter consideration is, of course, the stimulus for developing outlier-robust procedures which provide protection against possible outliers, rather than being designed to accommodate pre-identified outliers. Trimming or Winsorization exemplify this approach for univariate samples.)

Table 7.1 Extension of a structure under different loads

x_i (lb)   11.2   21.1   29.9   34.1   43.8   53.4   59.9   61.2   68.9
y_i (in)    1.6    2.1    3.4    3.3    4.2    3.1    4.3    6.2    6.3

Table 7.2 (extracted from Bross, 1961; earlier presented by Daniel, 1960) presents hypothetical yields of some product at different levels of two chemicals, A and B. (The labels A and B have been interchanged for notational convenience.)

Table 7.2 Yields of a process at different levels of two chemicals, A and B

                 Levels of B          Total
Levels of A    35    32    37    40     144
               29    29    34    36     128
               25    29    30    20     104
               19    25    25    35     104
               22    20    29    29     100
Total         130   135   155   160     580

Presumably we should not be particularly surprised in such a situation if the means of the underlying distributions differed at different levels of A or B; indeed we might wish to examine the propriety of some linear model for
the means. Thus again, the extreme values (19 and 40) in the data set are not the only (or even the predominant) candidates as outliers, and we must employ some basis other than the literal values of the yields for detecting any outliers.

[Figure 7.1 Relationship between load and extension for data of Table 7.1: extension y (in) plotted against load x (lb)]

If we consider the first row of the data (corresponding with the first level of A), the observations are uniformly the largest in their respective columns (i.e. at each level of B). It might happen, in the terms of the types of slippage test described in Chapter 5, that this constitutes a discordant outlying sub-sample of the data. But this would be no basis for suspecting the integrity of the data; it is just one rather specific manifestation of the type of effect we are investigating in the analysis of such a two-way experimental design. It would point to a significant influence on yield by chemical A, of a rather more specific style than arises from merely rejecting the null hypothesis of no A-effect. Such a 'discordant outlier' result is typical of the identification aspect of outliers described in Section 2.2 as the third of the three possible interests (as distinct from rejection and accommodation). We should wish to acknowledge the outlying sub-sample as a positive result of the overall analysis.

On the other hand, the single observation 20 in the last column of the data stands out in individual isolation as being relatively more extreme within its column (or row) than do any other observations in the data. It appears to disrupt seriously the overall pattern of results where (roughly speaking) yields decrease with the levels of A and increase with the levels of B. We could formalize this impression by considering the estimated residuals, $\hat\varepsilon_{ij}$, in relation to a fitted additive linear model

$$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij},$$

where $x_{ij}$ is the yield at levels $i$ and $j$ of A and B, respectively, $\varepsilon_{ij}$ is the corresponding true residual and $\sum_i \alpha_i = \sum_j \beta_j = 0$. We have

$$\hat\varepsilon_{ij} = x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot}$$

(on the usual dot convention for aggregation over the levels of the two factors). The table of estimated residuals appears as follows, highlighting the aberrance of the observation 20:

 2   -2   -1    1
 0   -1    0    1
 2    5    2   -9
-4    1   -3    6
 0   -3    2    1

It is this type of individual disruption of pattern that we must regard as an expression of the outlying nature of a single observation. We need (at least in complex experiments) to develop procedures for formal detection of such outliers and for proper assessment of their statistical significance (discordancy). Additionally, the accommodation aspect is vital. A prime aim is to examine our linear model in a designed experiment (or estimate and test parameters in a regression, or time-series, model) with as little interference as possible from isolated outlying observations.

In all of the structured models relevant to the different topics of this chapter it is appealing to examine disruption of pattern through the behaviour of residuals, and many of the published results on outliers approach the problem in this way. But we must recognize some shortcomings in the use of residuals. They are inevitably inter-correlated (and may even have differing variances by virtue of the assumed model, except perhaps in the null cases of zero regression parameters, no treatment effects or nil time-dependence). Any outliers affect not only their own residuals but have a carry-over influence on others. Thus their aberrance tends to be somewhat smoothed out: they hide behind the skirts of their neighbours in the data set. Extreme examples of this include two-way experiments with one factor having only two levels, where residuals arise in pairs of identical value and opposite sign, or a 3 × 3 Latin square where residuals take equal values in groups of three and inter-residual correlations are either 1 or −0.5. Accordingly, other principles for outlier detection, testing, or accommodation are also to be found in the literature and we shall be examining them. Often they involve non-parametric techniques, or informal graphical display procedures.

In principle the study of outliers in regression situations, or in designed experiments, can be subsumed in a wider investigation of outliers in general linear models. Much published work has this wider emphasis, and will be discussed later in the chapter. However, we shall consider first the more specific proposals which have been made for dealing with outliers in designed experiments (usually two-way) and in (principally linear) regression situations. These tend to be more applications-oriented, with discussion of the implementation of particular techniques, than does the work on the general linear model.
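Both examples in Tables 7.1 and 7.2 can be checked numerically. The Python sketch below uses hypothetical helper names (`line_residuals`, `two_way_residuals`, `residual_correlation`) and makes the following assumptions:

- For Table 7.1 it uses an ordinary least-squares straight line, the tentative model the text suggests.
- For Table 7.2 it uses the additive two-way fit above, reading the dot convention as averaging.
- For the null correlation between residuals it uses the standard covariance factor $(\delta_{ii'} - 1/r)(\delta_{jj'} - 1/c)$ of the additive layout. This is a textbook result, not one quoted above.

The sketch reproduces the table of residuals and locates the largest absolute residual in each data set. It also shows the inter-correlation discussed above.

```python
from fractions import Fraction

# Table 7.1: loads x_i (lb) and extensions y_i (in).
LOADS = [11.2, 21.1, 29.9, 34.1, 43.8, 53.4, 59.9, 61.2, 68.9]
EXTENSIONS = [1.6, 2.1, 3.4, 3.3, 4.2, 3.1, 4.3, 6.2, 6.3]

# Table 7.2: rows are the levels of A, columns the levels of B.
YIELDS = [
    [35, 32, 37, 40],
    [29, 29, 34, 36],
    [25, 29, 30, 20],
    [19, 25, 25, 35],
    [22, 20, 29, 29],
]


def line_residuals(x, y):
    """Residuals from an ordinary least-squares straight line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def two_way_residuals(table):
    """Exact residuals x_ij - x_i. - x_.j + x_.. (dots denote means)."""
    r, c = len(table), len(table[0])
    grand = Fraction(sum(map(sum, table)), r * c)
    row_means = [Fraction(sum(row), c) for row in table]
    col_means = [Fraction(sum(row[j] for row in table), r) for j in range(c)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand for j in range(c)]
            for i in range(r)]


def residual_correlation(r, c, cell, other):
    """Null correlation between estimated residuals at two cells of an
    unreplicated r x c additive layout."""
    def cov(p, q):
        # Covariance factor (delta_ii' - 1/r)(delta_jj' - 1/c), in units of sigma^2.
        return (int(p[0] == q[0]) - Fraction(1, r)) * (int(p[1] == q[1]) - Fraction(1, c))
    return cov(cell, other) / cov(cell, cell)


res = line_residuals(LOADS, EXTENSIONS)
k = max(range(len(res)), key=lambda i: abs(res[i]))
print("Table 7.1: largest |residual| at", (LOADS[k], EXTENSIONS[k]))

resid = two_way_residuals(YIELDS)
for row in resid:
    print(" ".join(f"{str(e):>3}" for e in row))
cells = [(i, j) for i in range(len(YIELDS)) for j in range(len(YIELDS[0]))]
worst = max(cells, key=lambda ij: abs(resid[ij[0]][ij[1]]))
print("Table 7.2: largest |residual| for the yield", YIELDS[worst[0]][worst[1]])

# One factor at only two levels: residuals in the same row are perfectly
# negatively correlated, so they are equal in size and opposite in sign.
print(residual_correlation(5, 2, (0, 0), (0, 1)))
```

When the script runs, the residual grid matches the table above. The point (53.4, 3.1) and the yield 20 are flagged, and the last line prints -1. All the residuals here are raw residuals, so the caveats above about correlation and carry-over effects still apply.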
Study of outliers in time-series data is a distinct problem on which little has been published. We review the few known results in a short final section of the chapter.

7.1 OUTLIERS IN DESIGNED EXPERIMENTS

Basic problems in the study of outliers in data from designed experiments are that they are difficult to detect (in the respect discussed above) and that their presence influences the analysis of variance of the data set in a way which may cloak significant effects or exhibit apparent effects which, were it not for the outliers, would not arise. Most techniques for outlier detection, for testing their discordancy, or for minimizing their influences in analysing the underlying linear additive model take as a basic measure of the import of individual observations their residuals about the fitted linear model. A concern about the inter-correlation, and carry-over influences, between residuals has prompted some use of modified (outlier-robust) residuals or of measures not directly based on residuals. In all cases the aim is to exhibit, and assess, the extent to which individual observations disrupt the overall pattern anticipated in the data, by virtue of the linear model for the means which is implied by the experimental design.

Bross (1961) presents what he describes as a 'strategic appraisal' of the problems of handling outliers in 'patterned experiments'. He stresses the special difficulties of their detection: an outlier disrupts an anticipated pattern of interrelationship in the data, but the pattern is itself a non-null representation and needs first to be characterized before it can be 'disrupted'. He develops this theme in terms of isolated departures in the values of observations relative to those of neighbouring observations. In describing the influence of outliers he stresses the combined effect of the outlier and of the analysis of variance techniques applied to the overall data set. Although Bross proposes no formal set of procedures for coping with outliers, he sketches a non-parametric principle which we shall return to in Section 7.1.5.

7.1.1 Discordancy tests based on residuals

A natural starting point for studying discordancy tests for outliers using residuals is found in the work of Daniel (1960).

Into a set of artificial data purporting to be the outcomes of a 4 × 5 factorial experiment, he introduces a substantial contamination of a single observation (the data are the same, in a reordered form, as the data of Table 7.2 from Bross, 1961, briefly examined above). He stresses how the outlier introduces substantial bias in the fitted value and residuals not only in its own specific location but throughout the row and column in which it appears.

To consider the implications of these results, let us start with an unreplicated two-way design where the (crossed) factors A and B have $r$ and $c$ levels, respectively, and we suspect no discordant values in the resulting data. Under the usual additive linear model for the means, with normal error structure, the observations $x_{ij}$ ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$) can be written

$$x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \qquad (7.1.1)$$

where $\sum \alpha_i = \sum \beta_j = 0$ and the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$. The estimated means (fitted values) are

$$\hat\mu_{ij} = \bar{x}_{i\cdot} + \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot} \qquad (7.1.2)$$

and the estimated residuals are

$$\hat\varepsilon_{ij} = x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot} \qquad (7.1.3)$$

The $\hat\varepsilon_{ij}$ will be $N(0, \nu\sigma^2/(rc))$, where $\nu = (r-1)(c-1)$, but they will not be independent. The linear constraints on the $\hat\varepsilon_{ij}$ imply correlation between $\hat\varepsilon_{ij}$ and $\hat\varepsilon_{i'j'}$ in the form

$$\rho_{ij,i'j'} = \begin{cases} -1/(c-1) & i = i',\ j \neq j' \\ -1/(r-1) & i \neq i',\ j = j' \\ 1/\nu & i \neq i',\ j \neq j' \end{cases} \qquad (7.1.4)$$

Formalizing his numerical example, Daniel considers the expected bias in the residuals due to a single outlier at position $(i, j)$ in the two-way layout, reflecting a contamination in the mean value of order $rc\theta$. The expected bias will be

$$\delta_{i'j'} = \begin{cases} \nu\theta & i' = i,\ j' = j \\ -(r-1)\theta & i' = i,\ j' \neq j \\ -(c-1)\theta & i' \neq i,\ j' = j \\ \theta & i' \neq i,\ j' \neq j \end{cases} \qquad (7.1.5)$$

He argues that the correlation between the new (outlier-affected) estimated residuals, $\hat\varepsilon_{ij}$, and their biases will be high, and proposes that the value of this correlation be used as an indication of the presence of the outlier and as a basis for testing its discordancy. The correlation can (after appropriate manipulation) be expressed through its square in the form

$$\rho^2 = \frac{rc}{\nu}\,\frac{\hat\varepsilon^2_{\max}}{\sum \hat\varepsilon^2_{ij}} \qquad (7.1.6)$$

where $\hat\varepsilon^2_{\max} = \max_{i,j}\{\hat\varepsilon^2_{ij}\}$. Thus, in spite of the carry-over effect of the outlier, the indication of its presence resides solely in the largest absolute value of the estimated residuals. (This remains true for other designs where the estimated residuals have common variance.) To assess whether the outlier corresponding with this 'largest residual' is discordant we need to compare $\hat\varepsilon^2_{\max}$
appropriately with the true error variance $\sigma^2$. The residual sum of squares $\sum \hat\varepsilon^2_{ij}$ reflects the influence of the outlier and will be correspondingly inflated, to an unknown extent, relative to the situation where there is no contamination in the mean at an isolated point. Thus to assess discordancy where (typically) $\sigma^2$ is unknown, we need to replace $\sum \hat\varepsilon^2_{ij}$ by a measure of residual variability which is not influenced by the outlier. To this end it is proposed (implicitly) that we estimate residual variation by removing the outlier, treating it as a missing value and replacing it by the corresponding least-squares estimate. We are led on this argument to a test statistic for discordancy in the form

(7.1.7)

where

$$s^2 = \Big(\sum \hat\varepsilon^2_{ij} - \frac{rc}{\nu}\,\hat\varepsilon^2_{\max}\Big)\Big/(\nu - 1) = S_M/(\nu - 1)$$

is an estimate of $\sigma^2$ based on the residual sum of squares $S_M$ in an analysis of variance where the outlier is regarded as a missing observation.

Daniel's arguments concerning the null distribution of this statistic are in error (as remarked in a footnote in Daniel, 1960). However, the assessment of discordancy has been taken up by others, as we shall see shortly. But the general principle is not in dispute. We see that a single outlier is detected as the observation yielding the largest absolute value among the residuals; its discordancy is to be assessed in terms of its null (no-outlier) distribution.

Use of the maximum absolute (so-called) studentized residual by Daniel (1960), and also, in the context of accommodation, by Anscombe (1960a) (see Section 7.1.2), was based on intuitive arguments with no overt consideration of the outlier model or of any resulting optimality of the proposed procedures. Noting its form as a natural extension of the optimum single-sample procedures of Paulson (1952b), Ferguson (1961a) enquired whether the optimality properties also extended to the designed experiment situation and found this to be so. He adopted a mean-slippage model to explain a single outlier in a general linear model, developed the appropriate, rather complex, sampling distribution theory, and proceeded to investigate the more general linear model formulation in terms of a multiple-decision approach (see Section 2.5). (For the two-way design the model is

$$x_{ij} = \mu + \alpha_i + \beta_j + \sigma a_{ij} + \varepsilon_{ij} \qquad (7.1.8)$$

where $a_{ij} \neq 0$ for precisely one pair of values $(i, j)$, and is zero otherwise.) Subject to the assumptions that no estimated residuals have zero variance and no two estimated residuals have unit correlation, the outlier test based

many others including all equally replicated ordinary factorial designs, Latin squares, and balanced incomplete blocks; see Anscombe, 1960a) the optimality property can be re-expressed in the terms that the outlier test is a Bayes solution in respect of a uniform prior distribution over the set of hypotheses specifying equal shifts in the mean for the outlier.

We must note how the work by Anscombe and by Ferguson extends the range of applicability and propriety of Daniel's proposal to a wider set of designed experiments and to many general linear model problems. We return to this latter point in Section 7.3.

Later study of this test centres on the determination of critical values for attributing discordancy, and on placing it in the perspective of alternative proposals that have been made for examining the assumptions underlying the analysis of variance (for example, additivity, normality, and homoscedasticity, in addition to non-contamination by discordant outliers).

Let us remind ourselves of the basic nature of the Daniel test. It uses as test statistic the ratio of the maximum squared (estimated) residual to the residual sum of squares in a least-squares analysis which regards as missing the observation yielding the maximum squared residual. For a general model it can be expressed as

(7.1.9)

where $l$ ($< n$) is the number of degrees of freedom associated with the residual sum of squares $S_c$ when the model is fitted to the complete data set of $n$ observations, and $S_M$ is the corresponding residual sum of squares when the observation yielding $\hat\varepsilon^2_{\max}$ is regarded as missing.

Related statistics have been proposed by others, as John and Prescott (1975) point out. Over 25 years ago, Quenouille (1953) suggested that we investigate an outlier in a designed experiment by considering a statistic which can be formally expressed as

(7.1.10)

with obvious intuitive appeal. Recognizing that the missing observation is not arbitrary, but corresponds with the largest (absolute) residual, he suggested that the critical level of $T_1$ be assessed as

$$nP(F_{1,l} > T_1) \qquad (7.1.11)$$

where $F_{1,l}$ has an $F$-distribution with 1 and $l$ degrees of freedom.

In an interesting paper using a simulation method (Goldsmith and Boddy, 1973; discussed in more detail below), the authors suggested using the statistic

(7.1.12)
on maximum absolute studentized residua! proves to be optimal in tbe sense and, supported by their simulation study, proposed assessing its criticallevel
of being invariant admissible. In the case of designs wbere ali estimated as
residuals bave equal variance (true of tbe unreplicated two-way design and 1.25{1-l)P(Fl,l-1 >T~). (7.1.13)
appropriately with the true error variance $\sigma^2$. The residual sum of squares $\sum \hat{e}_i^2$ reflects the influence of the outlier and will be correspondingly inflated (relative to the situation where there is no contamination in the mean at an isolated point) to an unknown extent. Thus to assess discordancy where (typically) $\sigma^2$ is unknown we need to replace $\sum \hat{e}_i^2$ by a measure of residual variability which is not influenced by the outlier. To this end it is proposed (implicitly) that we estimate residual variation by removing the outlier, treating it as a missing value and replacing it by the corresponding least-squares estimate. We are led on this argument to a test statistic for discordancy in the form

$$t^2 = \hat{e}_{\max}^2 / s^2 \qquad (7.1.7)$$

where

$$s^2 = \Big(\sum \hat{e}_i^2 - \frac{n}{\nu}\,\hat{e}_{\max}^2\Big)\Big/(\nu - 1) = S_M/(\nu - 1)$$

is an estimate of $\sigma^2$ based on the residual sum of squares $S_M$ in an analysis of variance where the outlier is regarded as a missing observation.

Daniel's arguments concerning the null distribution of $t^2$ are in error (as remarked in a footnote in Daniel, 1960). However, the assessment of discordancy has been taken up by others, as we shall see shortly. But the general principle is not in dispute. We see that a single outlier is detected as the observation yielding the largest absolute value among the residuals; its discordancy is to be assessed in terms of its null (no-outlier) distribution.

Use of the maximum absolute (so-called) studentized residual by Daniel (1960) and also, in the context of accommodation, by Anscombe (1960a) (see Section 7.1.2) was based on intuitive arguments with no overt consideration of the outlier model or of any resulting optimality of the proposed procedures. Noting its form as a natural extension of the optimum single-sample procedures of Paulson (1952b), Ferguson (1961a) enquired whether the optimality properties also extended to the designed-experiment situation and found this to be so. He adopted a mean-slippage model to explain a single outlier in a general linear model, developed the appropriate (rather complex) sampling distribution theory, and proceeded to investigate the more general linear model formulation in terms of a multiple-decision approach (see Section 2.5). (For the two-way design the model is

$$x_{ij} = \mu + \alpha_i + \beta_j + \sigma a_{ij} + e_{ij} \qquad (7.1.8)$$

where $a_{ij} \neq 0$ for precisely one pair of values $(i, j)$, and is zero otherwise.)

Subject to assumptions that no estimated residuals have zero variance and no two estimated residuals have unit correlation, the outlier test based on the maximum absolute studentized residual proves to be optimal in the sense of being invariant admissible. In the case of designs where all estimated residuals have equal variance (true of the unreplicated two-way design and many others, including all equally replicated ordinary factorial designs, Latin squares, and balanced incomplete blocks; see Anscombe, 1960a) the optimality property can be re-expressed in the terms that the outlier test is a Bayes solution in respect of a uniform prior distribution over the set of hypotheses specifying equal shifts in the mean for the outlier.

We must note how the work by Anscombe and by Ferguson extends the range of applicability and propriety of Daniel's proposal to a wider set of designed experiments and to many general linear model problems. We return to this latter point in Section 7.3.

Later study of this test centres on the determination of critical values for attributing discordancy, and on placing it in the perspective of alternative proposals that have been made for examining the assumptions underlying the analysis of variance (for example, additivity, normality, and homoscedasticity, in addition to non-contamination by discordant outliers).

Let us remind ourselves of the basic nature of the Daniel test. It uses as test statistic the ratio of the maximum squared (estimated) residual to the residual sum of squares in a least-squares analysis which regards as missing the observation yielding the maximum squared residual. For a general model it can be expressed as

$$t^2 = (l - 1)\,\hat{e}_{\max}^2 / S_M \qquad (7.1.9)$$

where $l$ ($< n$) is the number of degrees of freedom associated with the residual sum of squares $S_e$ when the model is fitted to the complete data set of $n$ observations; $S_M$ is the corresponding residual sum of squares when the observation yielding $\hat{e}_{\max}$ is regarded as missing.

Related statistics have been proposed by others, as John and Prescott (1975) point out. Over 25 years ago, Quenouille (1953) suggested that we investigate an outlier in a designed experiment by considering a statistic which can be formally expressed as

$$T_1^2 = l\,(S_e - S_M)/S_e \qquad (7.1.10)$$

with obvious intuitive appeal. Recognizing the fact that the missing observation is not arbitrary, but corresponds with the largest (absolute) residual, he suggested that the critical level of $T_1^2$ be assessed as

$$n\,P(F_{1,l} > T_1^2) \qquad (7.1.11)$$

where $F_{1,l}$ has an F-distribution with 1 and $l$ degrees of freedom.

In an interesting paper using a simulation method (Goldsmith and Boddy, 1973; discussed in more detail below) the authors suggested using the statistic

$$T_2^2 = (l - 1)(S_e - S_M)/S_M \qquad (7.1.12)$$

and, supported by their simulation study, proposed assessing its critical level as

$$1.25\,(l - 1)\,P(F_{1,l-1} > T_2^2). \qquad (7.1.13)$$
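The three statistics can be computed directly from the two residual sums of squares. The sketch below takes Daniel's statistic as $t^2 = (l-1)\hat{e}_{\max}^2/S_M$, Quenouille's as $T_1^2 = l(S_e - S_M)/S_e$ and Goldsmith and Boddy's as $T_2^2 = (l-1)(S_e - S_M)/S_M$ (our reading of (7.1.9), (7.1.10) and (7.1.12)), and, purely for illustration, plugs in the figures of Example 7.1 ($S_e = 2152.5$, $S_M = 104.5$, $\hat{e}_{\max} = 32$, $n = 8$, $l = 4$). The function name `outlier_statistics` is our own.

```python
def outlier_statistics(s_e, s_m, e_max, n, l):
    """Daniel's t^2, Quenouille's T_1^2 and Goldsmith and Boddy's T_2^2.

    s_e   -- residual sum of squares from the full-data fit (l d.f.)
    s_m   -- residual sum of squares with the suspect treated as missing
    e_max -- full-data residual of largest absolute value
    n, l  -- number of observations and residual degrees of freedom
    The forms used are our reading of (7.1.9), (7.1.10) and (7.1.12).
    """
    t2 = (l - 1) * e_max ** 2 / s_m
    T1_sq = l * (s_e - s_m) / s_e
    T2_sq = (l - 1) * (s_e - s_m) / s_m
    return t2, T1_sq, T2_sq


# Figures from Example 7.1.
t2, T1_sq, T2_sq = outlier_statistics(2152.5, 104.5, 32.0, 8, 4)
print(round(t2, 2), round(T1_sq, 3), round(T2_sq, 2))   # 29.4 3.806 58.79

# With equal residual variances, S_e - S_M = n * e_max**2 / l,
# so T_2^2 is simply (n / l) * t^2.
print(2152.5 - 104.5 == 8 * 32.0 ** 2 / 4)               # True
```

Because each statistic is a monotone function of $\hat{e}_{\max}^2$ (for fixed $S_e$), they order the data identically; they differ only in how their critical levels are assessed.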
Daniel (1960) originally suggested that the critical level of $t^2$ be assessed as

$$P(F_{2,l-1} > n t^2/l) \qquad (7.1.14)$$

but he pointed out a flaw in his argument (with unexamined implications).

Simulation studies by John and Prescott (1975) suggest that all three distributional proposals, (7.1.11), (7.1.13), and (7.1.14), are, in general, unsatisfactory. Incidentally, the three statistics $t^2$, $T_1^2$, and $T_2^2$ are all equivalent, and amount to representing the position and import of a single outlier in terms merely of the largest (absolute) residual. (This is important to recognize in relation to the work of Goldsmith and Boddy, 1973, discussed further below.) Specifically we have that

$$T_2^2 = \frac{n}{l}\,t^2 = (l - 1)\left[\frac{l}{T_1^2} - 1\right]^{-1}. \qquad (7.1.15)$$

John and Prescott suggest that the critical level of $T_2^2$ should be determined as

$$n\,P(F_{1,l-1} > T_2^2) \qquad (7.1.16)$$

and show by simulation that the Quenouille and Daniel proposals are 'far too conservative' and those of Goldsmith and Boddy too liberal, whilst their own suggestion appears to be quite accurate for a range of factorial designs with factors at two or three levels. They assess the accuracy of their simulation results by comparison with exact critical values determined by Stefansky (1972) for a range of designs: $3^2$, $4^2$, $4 \times 3$, $5 \times 3$, $6 \times 4$ and $8 \times 7$.

Although the John and Prescott proposal is simple and appears reasonable in many cases, the most accurate, and useful, results to date on the null distribution of the test statistic (and hence the best prescription for application of the outlier test of discordancy) derive from the work of Stefansky (1971, 1972).

Stefansky re-expresses $t^2$ in terms of the maximum normed residual (MNR). If the estimated residuals are $\hat{e}_i$ ($i = 1, 2, \ldots, n$), the normed residuals are

$$z_i = \hat{e}_i \Big/ \Big(\sum_{j=1}^{n} \hat{e}_j^2\Big)^{1/2} \qquad (7.1.17)$$

and the MNR, denoted $|z|_{(n)}$, is the largest of the absolute values $|z_i|$ ($i = 1, 2, \ldots, n$).

For the two-way design, the statistic (7.1.7) of Daniel can be expressed as

$$t^2 = (\nu - 1)\,|z|_{(n)}^2 \Big/ \left[1 - n\,|z|_{(n)}^2/\nu\right], \qquad (7.1.18)$$

a strictly increasing function of $|z|_{(n)}$. Thus we can conduct the test in terms of the equivalent test statistic $|z|_{(n)}$, or some simple function of it, provided its null distribution is known. Stefansky addresses herself to determining the null distribution.

The equivalence of $t^2$ and $|z|_{(n)}$ holds for designs where the residuals (in the absence of a discordant outlier) have equal variances, and the following results hold in such situations. The relationship (7.1.18) can be written, in the wider context, as

$$t^2 = (l - 1)\,|z|_{(n)}^2 \Big/ \left\{1 - n\,|z|_{(n)}^2/l\right\} \qquad (7.1.19)$$

where $l$ is as defined below (7.1.9).

Stefansky (1971) extends an earlier observation of Pearson and Chandra Sekar (1936) that critical values of statistics based on the MNR in the single-sample case could be calculated exactly from tables of the t-distribution for sufficiently large values of the statistic, provided we know the largest value that can be taken by the second largest of the absolute values of the normed residuals. Specifically, the quantities

$$F_i = n(l - 1)\,z_i^2 / (l - n z_i^2) \qquad (7.1.20)$$

are considered. Then $F_{(n)} = \max\{F_i\} = n t^2/l = T_2^2$.

The method uses Bonferroni-type inequalities (see Section 5.3) to provide lower and upper bounds for the critical values of the MNR (or related quantities). Different 'orders' of inequality are available; the higher the order, the sharper the bounds but the more complex the calculations. However, the crucial result is that, sufficiently far out in the tail of the distribution, exact critical values are obtained. The stage at which this facility holds depends on the values of quantities $M_k$: the greatest obtainable values of the $k$th-largest $|z_i|$. Stefansky (1971) shows how to determine $M_k$ for the range of designs with homoscedastic residuals discussed above. In Stefansky (1972) it is shown that $P(|z|_{(n)} > z)$ can be determined exactly from the $r$th Bonferroni upper (lower) bound if $z > M_{2r}$ ($M_{2r+1}$).

It is demonstrated that, for the unreplicated two-way design, application of this principle to the first-order inequalities is of little practical interest since it requires $z$ to be unreasonably large. However, the second-order inequalities yield exact critical values from about the upper 10 per cent point of the distribution, for two-way designs with up to about nine levels for each factor.

Assessment of the precise range over which exact results can be obtained depends of course on knowing $M_2$, $M_3$, $M_4$, etc. For the $r \times c$ design with one observation per cell we have

$$M_2 = [m(M - 1)/2]^{1/2},$$

where $m = \min(r, c)$ and $M = \max(r, c)$,

$$M_3 = [(r - 1)(c - 1)/(3rc - 2r - 2c - 2)]^{1/2},$$

$$M_4 = 0.5.$$

Correspondingly, exact upper 1 per cent and 5 per cent points of $|z|_{(n)}$ are obtained for $r = 3(1)9$, $c = 3(1)9$, and these are presented as Table XXXI on page 334 (extracted from Stefansky, 1972).
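Both the Quenouille rule (7.1.11) and the John and Prescott rule (7.1.16) need only the upper tail of an F-distribution with 1 numerator degree of freedom, which equals the two-sided tail of Student's t. A minimal standard-library sketch follows; it integrates the t density numerically with Simpson's rule rather than calling a statistics package, and the function names are our own. The Example 7.1 inputs assume $T_1^2 = l(S_e - S_M)/S_e$ and $T_2^2 = (l-1)(S_e - S_M)/S_M$ as the readings of (7.1.10) and (7.1.12); with them the John and Prescott level comes out at about 0.04, the value quoted in that example, while Quenouille's comes out near 1.

```python
import math


def t_density(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def f1_upper_tail(x, nu, steps=2000):
    """P(F_{1,nu} > x) = P(|T_nu| > sqrt(x)), via Simpson's rule on [0, sqrt(x)].

    steps must be even for Simpson's rule.
    """
    s = math.sqrt(x)
    h = s / steps
    total = t_density(0.0, nu) + t_density(s, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, nu)
    central = 2 * total * h / 3          # P(|T_nu| <= s), by symmetry
    return max(0.0, 1.0 - central)


def quenouille_level(T1_sq, n, l):
    """Quenouille's critical level (7.1.11): n P(F_{1,l} > T_1^2)."""
    return n * f1_upper_tail(T1_sq, l)


def john_prescott_level(T2_sq, n, l):
    """John and Prescott's critical level (7.1.16): n P(F_{1,l-1} > T_2^2)."""
    return n * f1_upper_tail(T2_sq, l - 1)


# Example 7.1: S_e - S_M = 2048, S_e = 2152.5, S_M = 104.5, n = 8, l = 4.
print(round(quenouille_level(4 * 2048 / 2152.5, 8, 4), 3))     # 0.983
print(round(john_prescott_level(3 * 2048 / 104.5, 8, 4), 3))   # 0.037
```

Where SciPy is available, `scipy.stats.f.sf(x, 1, nu)` gives the same tail probability; it is avoided here only to keep the sketch dependency-free.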
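For the unreplicated $r \times c$ layout, the normed residuals $z_i = \hat{e}_i/(\sum_j \hat{e}_j^2)^{1/2}$, the MNR and Daniel's $t^2$ via (7.1.18) can be computed as below. This is a minimal sketch assuming one observation per cell and the additive model, so that $n = rc$ and $\nu = (r-1)(c-1)$; the 3 × 3 table is hypothetical, invented only to exercise the code. As a check on (7.1.18) it also recomputes $t^2$ directly as $(\nu - 1)\hat{e}_{\max}^2/S_M$, obtaining $S_M$ by filling the suspect cell with the standard one-missing-value estimate for a two-way layout (a textbook device, not one described in this section). All function names are our own.

```python
import math


def two_way_residuals(table):
    """Residuals x_ij - row mean - column mean + grand mean (additive fit)."""
    r, c = len(table), len(table[0])
    row = [sum(x) / c for x in table]
    col = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r
    return [[table[i][j] - row[i] - col[j] + grand for j in range(c)]
            for i in range(r)]


def mnr_test(table):
    """Cell of the MNR, the MNR |z|_(n) itself, and Daniel's t^2 via (7.1.18)."""
    r, c = len(table), len(table[0])
    n, nu = r * c, (r - 1) * (c - 1)
    res = two_way_residuals(table)
    s_e = sum(e * e for row in res for e in row)
    cell = max(((i, j) for i in range(r) for j in range(c)),
               key=lambda ij: abs(res[ij[0]][ij[1]]))
    z = abs(res[cell[0]][cell[1]]) / math.sqrt(s_e)
    t2 = (nu - 1) * z ** 2 / (1 - n * z ** 2 / nu)
    return cell, z, t2


def rss_with_missing(table, i, j):
    """Residual SS when cell (i, j) is treated as missing.

    The cell is filled with (r*R + c*C - G) / ((r-1)(c-1)), where R, C and G
    are the row, column and grand totals of the remaining observations.
    """
    r, c = len(table), len(table[0])
    R = sum(table[i]) - table[i][j]
    C = sum(table[k][j] for k in range(r)) - table[i][j]
    G = sum(map(sum, table)) - table[i][j]
    filled = [list(row) for row in table]
    filled[i][j] = (r * R + c * C - G) / ((r - 1) * (c - 1))
    return sum(e * e for row in two_way_residuals(filled) for e in row)


# Hypothetical 3 x 3 table, invented for illustration (not from the text).
table = [[10, 12, 11], [13, 16, 14], [9, 11, 18]]
cell, z, t2 = mnr_test(table)
print(cell, round(z, 4), round(t2, 2))          # (2, 2) 0.6621 96.8

# Direct form t^2 = (nu - 1) e_max^2 / S_M for comparison.
nu = (len(table) - 1) * (len(table[0]) - 1)
res = two_way_residuals(table)
s_m = rss_with_missing(table, *cell)
direct = (nu - 1) * res[cell[0]][cell[1]] ** 2 / s_m
print(round(direct, 2))                         # 96.8
```

The agreement of the two values reflects the fact, used in (7.1.7), that for this design treating the suspect cell as missing lowers the residual sum of squares by exactly $(n/\nu)\hat{e}_{\max}^2$.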
Multiway designs can be collapsed to two-way structure by sacrificing access to certain interactions. Apart from this prospect, Stefansky's tabulated values are restricted to two-way designs. The best prescription for factorial designs with more than two factors appears to be that of John and Prescott (1975), although we must bear in mind the limited range of their simulation studies, which encompass only small designs (e.g. $2^3$, $2 \times 3^3$, $2^4$, etc.). Even in this range, the rather poor performance of their critical-level proposal when main effects and first-order interactions are fitted in the $2 \times 3^2$ design sounds a warning note.

Some interesting practical examples of residual-based methods of testing outliers in designed experiments are given by Goldsmith and Boddy (1973). 23 sets of data, previously discussed in the statistical literature, are re-analysed to detect and test outliers. A 'consecutive' style of analysis is adopted, proceeding from a discordant outlier to the search for further outliers. Computer-based application of the technique is discussed in some detail in relation to any orthogonal design (admitting possible missing values). The authors express dissatisfaction with methods based on the largest residual, in that its determination on the basis of fitting the model to the whole data set (including any outlier) will produce a test of relatively low power. Their 'alternative' proposal is to regard each observation in turn as missing and to scan the set of $n$ residual sums of squares to see if one of them is noticeably smaller than the others, indicating an outlier. Concentrating on the minimum residual sum of squares, they employ as test statistic the quantity $T_2^2$ of (7.1.12) above. However, we have noted that this is equivalent to other proposals based on the largest (absolute) residual, so that it is not clear just how a re-emphasis, or broadening of approach, is manifest in this work of Goldsmith and Boddy (1973).

Example 7.1. To illustrate some of the above ideas we consider a set of data from an unreplicated $2^3$ experiment attributed by Goldsmith and Boddy (1973) to an unpublished lecture by C. Daniel in 1960. The first three columns in Table 7.3 below describe (in common notation) the eight treatment combinations, the corresponding yields, and the treatment effect totals respectively. All the effects are of similar order of magnitude and any main-effects analysis will not show up significant treatment effects. There are no immediately obvious outliers in the data. But perhaps outliers are present, masking genuinely significant treatment effects. According to Goldsmith and Boddy (1973), Daniel argued that the largeness of all the interaction terms was in itself suspicious and might indicate one discordant outlier: specifically the yield at 'a', in view of the pattern of signs (-1, -1, +1, +1) of the interaction effects. Accordingly he estimated the outlier discrepancy as the mean of the absolute values of the interaction effects (64) and produced 'amended effects' as shown in column 4, in which the interaction terms are now low in comparison with main effects.

Table 7.3

| (1) Treatment combination | (2) Yield | (3) Effect total | (4) 'Amended effects' | (5) Residuals | (6) 'New residuals' | (7) Residual mean square (corresponding yield missing) |
|---|---|---|---|---|---|---|
| (1) | 121 | 1129 | 1065 | -15.25 | 0.75 | 562.5 |
| a | 145 | -93 | -157 | 32.00 | 0 | 34.8 |
| b | 150 | 53 | 117 | 0.50 | 0.50 | 717.3 |
| ab | 109 | -45 | 19 | -17.25 | -1.25 | 519.1 |
| c | 160 | 79 | 143 | 4.00 | 4.00 | 706.8 |
| ac | 112 | -59 | 5 | -20.75 | -4.75 | 430.5 |
| bc | 180 | 67 | 3 | 10.75 | -5.25 | 640.5 |
| abc | 152 | 85 | 21 | 6.00 | 6.00 | 693.5 |

But let us look at the residuals after fitting a main-effects model to the original data. These have values as shown in column 5, with corresponding residual sum of squares 2152.5. The residual at the treatment combination a is indeed the largest, although those at (1), ab and ac are also large. Are the residuals random, or is there contamination at the treatment combination a transmitted (via the correlation between the residuals in the same row or column of the A × B × C design) as larger residuals at (1), ab, ac (and abc)? If we regard the yield at a as missing, its least-squares estimate is 81 (vastly different from 145, by precisely the discrepancy 64 proposed by Daniel). The new residuals are shown in column 6; the residual sum of squares is reduced from 2152.5 (on four degrees of freedom) to 104.5 (on three degrees of freedom), a reduction of 2048.

The outlier test statistic value is

$$T_2^2 = \frac{(l - 1)(S_e - S_M)}{S_M} = \frac{3 \times 2048}{104.5} = 58.79$$

with critical level (on the John and Prescott proposal) 0.04. Thus at the 5 per cent level we would reject the a-yield of 145 as a discordant outlier.

A major difficulty in handling outliers in designed experiments lies in their initial detection. The residual 32.00 in column (5) hardly renders the a-yield a compelling candidate. Goldsmith and Boddy (1973) suggest that clearer initial detection may result from scanning the values of the residual mean squares when each yield in turn is regarded as a missing value. This requires a lot of calculation if the number of observations is at all large. However, for the current data, the residual mean squares (shown in column 7) really do highlight the outlier; all but the one corresponding with treatment combination a are of similar order to the original (full-data) mean square 538.1, whereas the mean square when a is missing is dramatically smaller at 34.8.
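The effect totals of column (3) and Daniel's 'amended effects' of column (4) can be reproduced with the sketch below. It forms each contrast from the usual ± signs (equivalent to Yates's algorithm for a $2^3$ design), estimates the discrepancy as the mean absolute interaction total, and, as one way of making Daniel's sign argument mechanical, picks the treatment combination whose signs in the four interaction contrasts match the observed ones. The sign-matching step and all names are our own illustration, not a procedure given by Goldsmith and Boddy or by Daniel.

```python
# Treatment combinations of the 2^3 design in standard order; "" stands for (1).
# Read as effect names, the same labels give the Yates order of effects,
# with "" standing for the grand total.
RUNS = ["", "a", "b", "ab", "c", "ac", "bc", "abc"]
YIELDS = [121, 145, 150, 109, 160, 112, 180, 152]      # column (2), Table 7.3


def sign(effect, run):
    """+1/-1 coefficient of a treatment combination in an effect contrast."""
    s = 1
    for factor in effect:
        s *= 1 if factor in run else -1
    return s


def effect_totals(yields):
    """Contrast totals for every effect, keyed by effect label."""
    return {e: sum(sign(e, r) * y for r, y in zip(RUNS, yields)) for e in RUNS}


totals = effect_totals(YIELDS)
interactions = [e for e in RUNS if len(e) >= 2]         # ab, ac, bc, abc
d = sum(abs(totals[e]) for e in interactions) / len(interactions)

# Treatment combination whose interaction signs match the observed signs.
observed = [1 if totals[e] > 0 else -1 for e in interactions]
suspects = [r for r in RUNS if [sign(e, r) for e in interactions] == observed]

# Daniel's amendment: subtracting d from the suspect yield shifts each
# effect total by -d times that yield's sign in the contrast.
amended = {e: totals[e] - d * sign(e, suspects[0]) for e in RUNS}

print([totals[e] for e in RUNS])    # [1129, -93, 53, -45, 79, -59, 67, 85]
print(d, suspects)                  # 64.0 ['a']
print([amended[e] for e in RUNS])   # [1065.0, -157.0, 117.0, 19.0, 143.0, 5.0, 3.0, 21.0]
```

Each treatment combination has a distinct sign pattern across the four interactions in a $2^3$ design, so the match, when one exists, picks out a single candidate.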
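Columns (5) to (7), the missing-value estimate of 81 and the value $T_2^2 = 58.79$ can be checked in the same way. The sketch assumes, as holds for this design, that all residuals share the same variance $l\sigma^2/n$, so that treating yield $i$ as missing lowers the residual sum of squares by $(n/l)\hat{e}_i^2$ and the least-squares estimate of that yield is $y_i - (n/l)\hat{e}_i$; it then scans the $n$ residual sums of squares in the manner Goldsmith and Boddy suggest. The names are our own.

```python
RUNS = ["", "a", "b", "ab", "c", "ac", "bc", "abc"]    # "" stands for (1)
YIELDS = [121, 145, 150, 109, 160, 112, 180, 152]      # column (2), Table 7.3


def sign(effect, run):
    """+1/-1 coefficient of a treatment combination in an effect contrast."""
    s = 1
    for factor in effect:
        s *= 1 if factor in run else -1
    return s


def main_effect_residuals(yields):
    """Residuals after fitting mean + A + B + C to an unreplicated 2^3 design."""
    terms = ["", "a", "b", "c"]
    totals = {e: sum(sign(e, r) * y for r, y in zip(RUNS, yields)) for e in terms}
    return [y - sum(sign(e, r) * totals[e] for e in terms) / len(yields)
            for r, y in zip(RUNS, yields)]


n, l = 8, 4                          # 8 yields; 4 residual d.f. (the interactions)
res = main_effect_residuals(YIELDS)  # column (5)
s_e = sum(e * e for e in res)        # 2152.5

# Treating yield i as missing removes (n/l) * e_i**2 from the residual SS.
s_m = [s_e - n / l * e * e for e in res]
ms_missing = [s / (l - 1) for s in s_m]        # column (7)

k = min(range(n), key=lambda i: s_m[i])        # smallest residual SS
estimate = YIELDS[k] - n / l * res[k]          # least-squares missing-value estimate
new_res = main_effect_residuals(YIELDS[:k] + [estimate] + YIELDS[k + 1:])  # column (6)
T2_sq = (l - 1) * (s_e - s_m[k]) / s_m[k]      # Goldsmith and Boddy's statistic

print(RUNS[k], estimate, round(T2_sq, 2))      # a 81.0 58.79
print([round(m, 1) for m in ms_missing])       # [562.5, 34.8, 717.3, 519.1, 706.8, 430.5, 640.5, 693.5]
```

The shortcut through $(n/l)\hat{e}_i^2$ avoids refitting the model $n$ times; for designs with unequal residual variances each deletion would need its own leverage term.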
John (1978) gives two detailed practical examples of applying residual-based outlier methods to data from designed experiments. One example is a 1/3 replicate of a $3^4$, the other a confounded $2^5$.

7.1.2 Residual-based accommodation procedures

We discussed earlier (Chapter 4) the range of techniques available for accommodating outliers in robust analyses of univariate samples. Included in the discussion was the premium-protection approach of Anscombe (1960a), where in estimating the mean of the underlying distribution a location-slippage outlier model is employed and the mean is estimated either by the overall sample mean for the sample of size $n$, or by the sample mean of $(n - 1)$ observations omitting an extreme sample value if it was sufficiently large (or small) in value. The outlier is thus either rejected, or retained with full weight, in the estimation process (cf. partial retention employed in trimming, Winsorization, or more general differential weighting procedures). The procedure is assessed in terms of two criteria: the premium paid in terms of increase of variance (or some other measure of expected loss) of the estimator when the sample comes in fact from a homogeneous source, and the protection provided in terms of decrease of variance (or mean square error) when a discordant observation is present.

Anscombe (1960a) describes how the same approach can be used in designed experiments (or in analysing general linear models). Attention is restricted to situations where, in the absence of discordant values, all residuals have common variance, $l\sigma^2/n$, in the notation of Section 7.1.1, and where inter-residual correlation is nowhere 1 (or -1).

Only the case where $\sigma$ is known is considered in detail. If $\hat{e}_{\max}$ is again the estimated residual having greatest absolute value, it is proposed that: if $|\hat{e}_{\max}| > h\sigma$, we reject the observation yielding $\hat{e}_{\max}$, treat it as a missing value, and estimate the unknown parameters (means) by a least-squares analysis; if $|\hat{e}_{\max}| \leq h\sigma$, we retain all observations and conduct a full least-squares analysis.

The constant $h$ needs to be chosen to produce acceptable premium and protection guarantees. Anscombe shows that the proposed rule has a simple interpretation. If $|\hat{e}_{\max}| > h\sigma$, we merely replace $x_m$ (the observation yielding $\hat{e}_{\max}$) by $x_m - n\hat{e}_{\max}/l$ in the estimating equations for the means. Determination of premium and of protection is less straightforward than in the single-sample case. We are concerned with the effect of the rule on the variances of estimates. But we have a number of parameters of interest. One possibility is to consider the way in which application of the rule affects the determinant of the corresponding variance matrix (in the absence of discordant observations). Anscombe argues that an appropriate notion of premium is in

discordant observations. He shows how, on this definition, the premium can be approximately determined.

An approximation to the protection provided by the rule, when one of the observations is discordant and has bias $\beta\sigma$ in the mean, is also given by Anscombe. He reviews the numerical properties of the rule by tabulating values of the protection, of the cut-off level $h$ and of the probability of inappropriate rejection of $x_m$ as discordant, for premium levels of 2 per cent and 1 per cent and a range of values of $l/n$.

There is much that requires further investigation in this approach: the accuracy of the various approximate results, and the robustness (or form of necessary modifications) of the procedure in the face of non-normality, non-additivity, etc.; and it is of paramount concern to develop a feel for appropriate levels of, and balances between, the premium and protection elements. Anscombe deals briefly with two other matters: multiple outliers and unknown $\sigma$. For the former case he suggests that the rule is applied consecutively to successive smaller samples until we reach the stage that no further rejection of observations takes place. For the (commonly encountered) situation where $\sigma$ is unknown he gives some approximate results for the equivalent 'studentized' rule, where $\sigma^2$ is merely replaced by $s^2$, an estimate of $\sigma^2$ based on $l + l_0$ degrees of freedom ($l_0$ corresponding with additional external or prior information about $\sigma^2$). The results are not pursued to the level of useful application, and Anscombe conjectures that the rule will have 'low power' unless $l + l_0$ is reasonably large (say, 30 or so).

Some further observations are given by Anscombe and Tukey (1963) in the context of a wider discussion of the analysis of residuals in designed experiments and general linear models. See also Anscombe (1961), and the discussion of the general linear model in Section 7.3.

7.1.3 Graphical methods

A variety of graphical procedures have been proposed for investigating the validity of the various assumptions underlying the analysis of variance of data arising from a designed experiment. Sometimes these procedures are aimed specifically at exhibiting, or examining, outliers. More often other assumptions such as normality, additivity, or homoscedasticity are under investigation, but the performances of the relevant procedures are also sensitive to the presence of outliers, and en passant provide indications of outlying behaviour of data points. Frequently it is difficult to distinguish which specific departure from the underlying assumptions is manifest in an unacceptable graphical plot; sometimes the procedure is more sensitive to one departure than to another. Most of the procedures regard residuals as the natural reflection of the impropriety of the assumptions, although non-
terms of tbat proportional increase in u 2 wbich would increase tbe variance- residual-based methods have also been advanced. A generai study of tbe
matrix determinant by as much as the proposed rule does, in tbe absence of assumptions underlying an analysis of variance is beyond our brief, but we
246 Outliers in statistica[ data Outliers in designed experiments, regression, and in time-series 24 7

John (1978) gives two detailed practical examples of applying residual-based outlier methods to data from designed experiments. One example is a 1/3 replicate of a $3^4$, the other a confounded $2^5$.

7.1.2 Residual-based accommodation procedures

We discussed earlier (Chapter 4) the range of techniques available for accommodating outliers in robust analyses of univariate samples. Included in the discussion was the premium-protection approach of Anscombe (1960a). There, in estimating the mean of the underlying distribution, a location-slippage outlier model is employed. The mean is estimated either by the overall sample mean for the sample of size $n$, or by the sample mean of the $(n-1)$ observations omitting an extreme sample value if it is sufficiently large (or small) in value. The outlier is thus either rejected, or retained with full weight, in the estimation process (cf. the partial retention employed in trimming, Winsorization, or more general differential weighting procedures). The procedure is assessed in terms of two criteria. The first is the premium paid, in terms of increase of variance (or some other measure of expected loss) of the estimator, when the sample in fact comes from a homogeneous source. The second is the protection provided, in terms of decrease of variance (or mean square error), when a discordant observation is present.

Anscombe (1960a) describes how the same approach can be used in designed experiments (or in analysing general linear models). Attention is restricted to situations where, in the absence of discordant values, all residuals have common variance, $l\sigma^2/n$, in the notation of Section 7.1.1, and where inter-residual correlation is nowhere 1 (or $-1$).

Only the case where $\sigma$ is known is considered in detail. If $\hat{\varepsilon}_{\max}$ is again the estimated residual having greatest absolute value, it is proposed that:
- if $|\hat{\varepsilon}_{\max}| > h\sigma$, we reject the observation yielding $\hat{\varepsilon}_{\max}$, treat it as a missing value, and estimate the unknown parameters (means) by a least-squares analysis;
- if $|\hat{\varepsilon}_{\max}| \le h\sigma$, we retain all observations and conduct a full least-squares analysis.

The constant $h$ needs to be chosen to produce acceptable premium and protection guarantees. Anscombe shows that the proposed rule has a simple interpretation. If $|\hat{\varepsilon}_{\max}| > h\sigma$, we merely replace $x_m$ (the observation yielding $\hat{\varepsilon}_{\max}$) by $x_m - n\hat{\varepsilon}_{\max}/l$ in the estimating equations for the means.

Determining premium and protection is less straightforward than in the single-sample case. We are still concerned with the effect of the rule on the variances of estimates, but we now have a number of parameters of interest. One possibility is to consider how applying the rule affects the determinant of the corresponding variance matrix (in the absence of discordant observations). Anscombe argues that the premium should then be measured as the proportional increase in $\sigma^2$ that would raise the variance-matrix determinant by as much as the proposed rule does, in the absence of discordant observations. He shows how, on this definition, the premium can be approximately determined.

Anscombe also gives an approximation to the protection provided by the rule when one of the observations is discordant and has bias $\beta\sigma$ in the mean. He reviews the numerical properties of the rule by tabulating values of the protection, of the cut-off level $h$, and of the probability of inappropriately rejecting $x_m$ as discordant. These are given for premium levels of 2 per cent and 1 per cent, and for a range of values of $l/n$.

Much in this approach requires further investigation. This includes the accuracy of the various approximate results, and the robustness (or form of necessary modifications) of the procedure in the face of non-normality, non-additivity, etc. It is also of paramount concern to develop a feel for appropriate levels of, and balances between, the premium and protection elements.

Anscombe deals briefly with two other matters: multiple outliers and unknown $\sigma$. For multiple outliers he suggests applying the rule consecutively to successively smaller samples, until no further observations are rejected. For the (commonly encountered) situation where $\sigma$ is unknown, he gives some approximate results for the equivalent 'studentized' rule. Here $\sigma^2$ is merely replaced by $s^2$, an estimate of $\sigma^2$ based on $l+\nu$ degrees of freedom ($\nu$ corresponding to additional external or prior information about $\sigma^2$). The results are not pursued to the level of useful application. Anscombe conjectures that the rule will have 'low power' unless $l+\nu$ is reasonably large (say, 30 or so).

Anscombe and Tukey (1963) give some further observations, within a wider discussion of the analysis of residuals in designed experiments and general linear models. See also Anscombe (1961), and the discussion of the general linear model in Section 7.3.

7.1.3 Graphical methods

A variety of graphical procedures have been proposed for investigating the validity of the assumptions underlying the analysis of variance of data from a designed experiment. Sometimes these procedures are aimed specifically at exhibiting, or examining, outliers. More often other assumptions, such as normality, additivity, or homoscedasticity, are under investigation. Even then, the relevant procedures are also sensitive to the presence of outliers, and en passant indicate outlying behaviour of data points. It is frequently difficult to tell which specific departure from the underlying assumptions is shown by an unacceptable graphical plot, and sometimes a procedure is more sensitive to one departure than to another. Most of the procedures treat residuals as the natural reflection of the impropriety of the assumptions, although non-residual-based methods have also been advanced. A general study of the assumptions underlying an analysis of variance is beyond our brief.
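Most of the procedures that follow, like Anscombe's rule of Section 7.1.2, work with least-squares residuals from the additive model. As a concrete illustration of that rule, the following Python sketch applies the known-$\sigma$ version to an $r \times c$ two-way layout with one observation per cell. For this layout $n = rc$, $l = (r-1)(c-1)$, and every residual has variance $l\sigma^2/n$. The function names, the choice of design and the data are illustrative assumptions, not taken from Anscombe (1960a). In particular, the cut-off $h$ is simply supplied by the caller; choosing it is where the premium and protection calculations would enter.

```python
from statistics import mean


def additive_residuals(x):
    """Least-squares residuals x_ij - xbar_i. - xbar_.j + xbar_.. for an
    r x c two-way layout with one observation per cell (additive model)."""
    r, c = len(x), len(x[0])
    row_means = [mean(row) for row in x]
    col_means = [mean(x[i][j] for i in range(r)) for j in range(c)]
    grand = mean(row_means)
    return [[x[i][j] - row_means[i] - col_means[j] + grand for j in range(c)]
            for i in range(r)]


def anscombe_rule(x, sigma, h):
    """Sketch of Anscombe's rule with known sigma for an r x c layout.

    n = rc observations, l = (r-1)(c-1) residual degrees of freedom, so every
    residual has variance l*sigma^2/n. Assumes r, c >= 3 (in a 2 x 2 layout
    all four residuals are equal in magnitude). If the largest absolute
    residual exceeds h*sigma, the offending value x_m is replaced by
    x_m - n*e_max/l, the least-squares estimate of x_m treated as missing.
    Returns (adjusted copy of the table, flagged cell or None).
    """
    r, c = len(x), len(x[0])
    n, l = r * c, (r - 1) * (c - 1)
    e = additive_residuals(x)
    i, j = max(((i, j) for i in range(r) for j in range(c)),
               key=lambda cell: abs(e[cell[0]][cell[1]]))
    adjusted = [row[:] for row in x]
    if abs(e[i][j]) <= h * sigma:
        return adjusted, None          # retain all observations
    adjusted[i][j] = x[i][j] - n * e[i][j] / l
    return adjusted, (i, j)


# Illustrative 3 x 4 table: exactly additive except for a bias of 10 in cell (1, 2).
x = [[1, 3, 5, 7],
     [2, 4, 16, 8],
     [3, 5, 7, 9]]
adjusted, cell = anscombe_rule(x, sigma=1.0, h=2.5)
print(cell, adjusted[1][2])  # the flagged cell and its replacement value
```

In the example, rows and columns are counted from 0. The table is exactly additive apart from a bias of 10 in cell (1, 2), so the residual there is $10\,l/n = 5$. With $\sigma = 1$ and $h = 2.5$ the cell is flagged. The replacement $x_m - n\hat{\varepsilon}_{\max}/l$ then recovers (up to rounding error) the value 6 predicted by the additive pattern.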
Here we consider only some of the methods with greatest relevance to the outlier issue.

Again we can conveniently begin with some work by Daniel (1959) on half-normal plots. Daniel considers factorial experiments of the form $2^p$: $p$ factors each at two levels. He considers the ordered absolute values of the effect totals. He remarks that, in the absence of any real effects, these are observations of the order statistics from a known distribution. If the error distribution is normal, the absolute effect totals will behave as independent half-normal deviates with common variance. That is, their probability density function has the form

$$f(x) = \frac{2}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right) \qquad (x \ge 0). \qquad (7.1.21)$$

In the null case, plotting the ordered values on appropriately constructed probability paper will produce points lying close to a straight line through the origin. Departures from linearity will indicate real effects (large values lying off the straight line) or violations of the basic assumptions of the model. In particular, a single outlier will similarly inflate the absolute values of all effect totals. It will therefore show up in the probability plot, which may still be linear apart from a few high values indicating real effects. The outlier's signature is that the plot is directed not towards the origin but towards a value similar to the contamination bias of the outlier.

Note that this makes us aware of the outlier, but does not identify the offending individual observation. With more than two outliers the plot does not necessarily reveal their presence very dramatically, unless the biases happen to be of the same sign and similar magnitude. We must also recognize that other aberrations, such as non-normality or non-constant error variances, can affect the linearity of the plot. They may do so, particularly in small data sets, in a way which is indistinguishable from the effect of an outlier. This problem, however, is not restricted to this particular plotting method.

See also Birnbaum (1959) on half-normal plotting methods.

Anscombe and Tukey (1963) make another wide-ranging study of how to assess the validity of the assumptions underlying analysis of variance. Within it they consider graphical displays of residuals, including probability plots and plots against fitted values or external concomitant variables. They stress that these influences and indications overlap. Apparent non-normality, non-additivity, different error variances, and isolated discordant values can all show up in similar (and indistinguishable) ways.

One simple graphical presentation is obtained by plotting the ordered values of the residuals on normal probability paper. With normal error structure we expect a straight-line relationship. Non-linearity will indicate skewness or flatness of the error distribution, while outliers will appear as marked isolated departures at either end of the plot.
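Daniel's half-normal plot can be sketched in Python as follows, for a $2^p$ factorial with one observation per run. Some details are illustrative assumptions rather than Daniel's (1959) own conventions: the run ordering (bit $k$ of the run index gives the level of factor $k$, counting factors from 0), the function names, and the plotting position $(i-0.5)/m$. The example also checks the point made above about a single outlier: adding a bias to one run moves every effect total by the same amount, apart from sign.

```python
from itertools import combinations
from statistics import NormalDist


def effect_totals(y, p):
    """Effect totals for a 2^p factorial with one observation per run.

    Assumed run ordering: bit k of the run index gives the level of factor k
    (0 = low, 1 = high). Each total is the sum of the observations weighted
    by the +/-1 contrast for that main effect or interaction, giving
    2^p - 1 totals keyed by tuples of factor indices.
    """
    if len(y) != 2 ** p:
        raise ValueError("need exactly 2^p observations")
    totals = {}
    for size in range(1, p + 1):
        for effect in combinations(range(p), size):
            total = 0.0
            for run, value in enumerate(y):
                sign = 1
                for k in effect:
                    sign *= 1 if (run >> k) & 1 else -1
                total += sign * value
            totals[effect] = total
    return totals


def half_normal_points(totals):
    """Pairs (half-normal score, |effect total|) for a half-normal plot.

    The i-th smallest of m absolute totals is paired with the half-normal
    quantile at probability (i - 0.5)/m, i.e. the standard normal quantile
    at 0.5 + 0.5*(i - 0.5)/m; this plotting position is an assumption.
    """
    ordered = sorted(abs(t) for t in totals.values())
    m = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / m), a)
            for i, a in enumerate(ordered, start=1)]


# Illustrative 2^3 data, and the same data with a bias of 4 added to run 5.
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
y_out = y[:]
y_out[5] += 4.0
clean, contaminated = effect_totals(y, 3), effect_totals(y_out, 3)
for effect in clean:
    # Each total moves by exactly +4 or -4.
    print(effect, clean[effect], contaminated[effect] - clean[effect])
for score, value in half_normal_points(contaminated):
    print(f"{score:6.3f}  {value:6.2f}")
```

Every contrast gives each run a coefficient of $+1$ or $-1$. A bias $b$ in a single run therefore shifts all $2^p-1$ effect totals by exactly $\pm b$. When the null totals are small, the lower end of the plotted points is pulled towards $|b|$ rather than towards the origin.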
Such a procedure is termed FUNOP, for 'full normal plot'. It has its limitations, of course, for the detection of outliers, arising from the intercorrelation of residuals, or non-attributability of effects. FUNOP was introduced by Tukey (1962). Tukey also proposed a procedure entitled FUNOR-FUNOM, which compresses the values of larger residuals rather than reducing them to zero. Reduction to zero is what outright rejection of the corresponding observations as outliers implies.

One advantage of this simple approach is that it considers extreme values of residuals, not merely the extreme absolute values emphasized in Sections 7.1.1 and 7.1.2. As we shall see later (Section 7.3), in relation to work by Andrews (1971), it can be important to consider the values of residuals relative to the design matrix, rather than merely in absolute terms.

Gnanadesikan and Kettenring (1972) also consider the self-camouflaging effect of outliers. An outlier's influence carries over to other residuals, which makes its detection problematical. They suggest a remedy: 'modified residuals', based on outlier-robust fitted values (e.g. estimating the means by medians, trimmed means, etc.) rather than on full-sample least-squares estimates. Probability plots of such ordered modified residuals may then be more informative about outliers.

Probability plotting methods are also used to augment other methods of studying outliers in the battery of procedures described by Gentleman and Wilk (1975a, 1975b). Gentleman and Wilk (1975b) re-examine full- and half-normal plots of residuals. They confirm their usefulness in detecting an outlier when a single discordant value is present in a two-way design. They also show the confusion that compensating effects can cause when there are two or more discordant outliers, and suggest that probability plots have little value in such cases.

Regarding the distribution of the residuals, they use the Shapiro and Wilk (1965) W-test for normality to show that the intercorrelation of the residuals does not appear as apparent non-normality. Indeed, in terms of W, the residuals on the null additive model can exhibit a degree of 'super-normality'. This remains so for certain small configurations (e.g. $2 \times 3$, $3 \times 4$) even if a discordant value is present. As Chen (1971) shows, however, W is in general sensitive to the presence of outliers. Prescott (1976) uses sensitivity contours to compare the effect of outliers on the Shapiro-Wilk W-test and on an entropy-based test of normality.

7.1.4 Non-residual-based methods

One approach to the detection of outliers in linear models (principally two-way designs) is that of Gentleman and Wilk (1975a). It does not rely exclusively on examining residuals about the no-outlier model. They present a method for detecting the 'k most likely outlier subset', the set of $k$ observations that most warrants attention as outliers. This comes before any further examination of their discordancy, or before any attempt to analyse the data in a way
which is (relatively) insensitive to their presence. The procedure has four stages. We specify $k$ and determine the 'k most likely outlier subset'. We then assess its statistical significance. If it is not significant, we consider successively smaller numbers of outliers, $k-1, k-2, \ldots$, until a significant outlier subset is detected (if at all).

The method involves a great deal of computational effort and needs to be conducted on a computer. It proceeds as follows.

For given $k$ we consider all $\binom{n}{k}$ partitions of the data set obtained by specifying particular subsets of $k$ observations. If there are truly $k$ discordant values present, then $\binom{n-k}{k}$ of these subsets will contain no discordant values, and the other $\binom{n}{k}-\binom{n-k}{k}$ will contain at least one. We suppose (in general linear model terms) that the observation vector $x$ has the form

$$x = A\theta + \delta + \varepsilon \qquad (7.1.22)$$

where $A$ is an $n \times q$ design matrix, $\theta$ a vector of $q$ parameters, and $\varepsilon$ the error vector. The $n \times 1$ vector $\delta$ has $n-k$ zeros and $k$ unknown non-zero values, corresponding to the $k$ mean biases of the discordant observations.

Suppose a particular set of $k$ observations is under consideration. Let $\hat{\varepsilon}$ and $\tilde{\varepsilon}$ be the estimated residuals obtained by fitting the null model $x = A\theta + \varepsilon$ and the outlier model (7.1.22), respectively. Then the difference in the sums of squares

$$Q_k = \hat{\varepsilon}'\hat{\varepsilon} - \tilde{\varepsilon}'\tilde{\varepsilon} \qquad (7.1.23)$$

(which is inevitably non-negative) measures the effect of assuming that the $k$ chosen observations are in fact discordant. (Note that $\tilde{\varepsilon}'\tilde{\varepsilon}$ is also the residual sum of squares from fitting the null model to the reduced data set of $n-k$ observations.)

If we evaluate $Q_k$ for all $\binom{n}{k}$ partitions of the data, their relative sizes indicate how likely the corresponding $k$ observations are to be outliers. The largest $Q_k$ identifies the $k$ most likely outlier subset. To decide whether it is too large to have arisen purely by chance under the null model, we would really need to know its null distribution. This distribution is unknown, and informal methods are proposed to assess the significance of the largest $Q_k$. One is to plot some large subset of the larger $Q_k$ values against 'typical values' obtained (presumably by simulation) under the null model. Another is to plot the residuals on the outlier model (7.1.22) for the set of 'outliers' identified by the largest $Q_k$. The first of these leans very heavily on the assumptions of the model. It will presumably be influenced by non-normality, heteroscedasticity, and non-additivity, as well as by discordant observations.

If the 'k most likely outlier subset' is not statistically significant, we repeat the exercise with $k$ reduced to $k-1$, and so on, until we detect a significant outlier subset.

The whole approach appears cumbersome, although some computational simplification is possible (e.g. for two-way designs; see Gentleman and Wilk, 1975a). It is interesting in principle, but clearly needs much examination and refinement before it becomes a practical proposition. The reason for starting with a specific $k$ and then considering smaller values is as follows. If we obtain significance at any stage, there is arguably little point in considering even smaller $k$ (still fewer outliers); the reverse policy lacks this apparent advantage. However, we must take care to avoid 'swamping', where a non-outlier is fortuitously included along with an outlying subgroup (see Fieller, 1976). This approach does help, though, to reduce the opposite phenomenon of 'masking', provided $k$ is initially chosen sufficiently (but not unreasonably) large.

See also John (1978) and John and Draper (1978).
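For a small two-way layout, the search for the 'k most likely outlier subset' can be carried out by brute force, evaluating $Q_k$ of (7.1.23) for each of the $\binom{n}{k}$ subsets. Since $\tilde{\varepsilon}'\tilde{\varepsilon}$ is the residual sum of squares from fitting the null model to the $n-k$ retained observations, only the null model ever needs to be fitted. The Python sketch below assumes an additive two-way model with one observation per cell. It fits that model by backfitting (alternating row and column means), a method chosen here for brevity rather than one prescribed by Gentleman and Wilk (1975a). The function names and data are hypothetical. The sketch does not attempt the significance assessment of the largest $Q_k$ discussed above.

```python
from itertools import combinations


def additive_rss(x, omit=frozenset(), max_iter=1000, tol=1e-12):
    """Residual sum of squares after least-squares fitting of the additive
    two-way model x_ij = a_i + b_j + error to the cells of x not in `omit`.

    Fitted by backfitting: each row effect, then each column effect, is set
    to the mean of the current partial residuals in that row or column.
    This is Gauss-Seidel on the normal equations and converges to the
    least-squares fit.
    """
    r, c = len(x), len(x[0])
    keep = [(i, j) for i in range(r) for j in range(c) if (i, j) not in omit]
    row_cells = [[j for (i2, j) in keep if i2 == i] for i in range(r)]
    col_cells = [[i for (i, j2) in keep if j2 == j] for j in range(c)]
    a, b = [0.0] * r, [0.0] * c
    for _ in range(max_iter):
        previous = a + b
        for i in range(r):
            if row_cells[i]:
                a[i] = sum(x[i][j] - b[j] for j in row_cells[i]) / len(row_cells[i])
        for j in range(c):
            if col_cells[j]:
                b[j] = sum(x[i][j] - a[i] for i in col_cells[j]) / len(col_cells[j])
        if max(abs(p - q) for p, q in zip(previous, a + b)) < tol:
            break
    return sum((x[i][j] - a[i] - b[j]) ** 2 for (i, j) in keep)


def most_likely_outlier_subset(x, k):
    """Brute-force search over all C(n, k) subsets of cells for the largest
    Q_k = RSS(null model, all data) - RSS(null model, subset omitted).
    Returns (Q_k, subset)."""
    r, c = len(x), len(x[0])
    full = additive_rss(x)
    cells = [(i, j) for i in range(r) for j in range(c)]
    scored = ((full - additive_rss(x, frozenset(s)), s)
              for s in combinations(cells, k))
    return max(scored, key=lambda pair: pair[0])


# Illustrative 3 x 4 table: additive apart from biases in cells (1, 2) and (2, 0).
x = [[1, 3, 5, 7],
     [2, 4, 16, 8],
     [-5, 5, 7, 9]]
for k in (2, 1):
    q, subset = most_likely_outlier_subset(x, k)
    print(k, subset, round(q, 3))
```

Here two cells are contaminated: a bias of 10 in cell (1, 2) and of $-8$ in cell (2, 0), counting from 0. With $k = 2$ the search returns exactly these two cells. The remaining data are then exactly additive, so $\tilde{\varepsilon}'\tilde{\varepsilon} = 0$ and $Q_2$ attains its largest possible value, $\hat{\varepsilon}'\hat{\varepsilon}$. The number of fits grows as $\binom{n}{k}$, which is the computational burden referred to in the text.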
7.1.5 Non-parametric, and Bayesian, methods

Some non-parametric methods have been proposed for detecting outliers in designed experiments. We have already referred to the general proposals of Bross (1961), who treats outliers as disrupters of an anticipated pattern in the data. He sketches a non-parametric approach that detects discordant outliers through pairwise inversions of the observations relative to the anticipated pattern, using (inter alia) a sequence sign test.

Suppose that in a two-way design we expect real effects to show up as a monotone change in the means within each row and each column. We can look for this relationship in the actual data: anomalous inversions in the values of successive observations within a row or a column may indicate outliers. Appropriate test statistics can be constructed from accumulated numbers of inversions.

This approach has problems, however, arising from the hierarchy of models we must consider and the intangibility of 'anticipated pattern'. We do not know at the outset what pattern to expect for the means under the additive model. We do not even know whether the data support an additive model, or whether apparent non-additivity reflects interactions or isolated discordant values. More fundamentally, the data may be just a random sample of observations from a common basic distribution; that is to say, there may be no real effects. This is the null model in terms of which we wish to conduct the analysis of variance. But if there are no real effects we have no structured pattern
against which to detect outliers; inversions, for example, will be irrelevant in this context. Of course a single-sample test of discordancy would be appropriate if there are no real effects, but we have no way of knowing whether effects are present or not. This uncertainty is the stimulus for studying the additive model; we want such study to be safeguarded from outliers, and so we find ourselves once more in a vicious circle of conflicting aims and indications.

Other non-parametric methods of a more detailed form (but with similar conceptual difficulties) have been proposed. Brown (1975) develops an approximate $\chi^2$ test of discordancy for outliers based on the signs of the estimated residuals in the rows and columns of the data for a two-way design, and considers its extension to more complicated designs. For the two-way design he proposes the statistic

$$ c^{-1}\sum_{i=1}^{r} R_i^2 + r^{-1}\sum_{j=1}^{c} C_j^2 - (rc)^{-1}T^2 \tag{7.1.24} $$

where $R_i$ and $C_j$ are the sums of the signs of the residuals in the $i$th row and $j$th column, respectively, and $T$ the overall sum of the signs of the residuals. This is sensitive to the presence of outliers and will have a no-outlier distribution which is approximately a multiple $(1 - 2/\pi)$ of $\chi^2$ with $r + c - 1$ degrees of freedom. Unfortunately the method does not pinpoint the outlier; also a significant result could arise from non-null manifestations other than a single discordant value.

The 'extreme rank sum test for outliers' of Thompson and Willke (1963) is not a test of discordancy for individual outliers; it is a non-parametric slippage test for outlying rows or columns in a two-way design. See Chapter 5.

A Bayesian approach to handling outliers in general linear models is presented by Box and Tiao (1968). It is not specifically directed towards designed experiments; some details are given in Section 8.1.2.

7.2 OUTLIERS IN REGRESSION

As with designed experiments, so a study of outliers in regression situations can be regarded as a particular case of the study of outliers in general linear models. However, there are practical advantages in examining the regression situation per se, and in methodological terms it represents a further step towards the general case. But the boundary becomes somewhat blurred; often regression problems are considered in the literature as illustrative examples in wider investigations and the present section thus tends to overlap with the later discussion (Section 7.3) of the general linear model.

7.2.1 Outliers in linear regression

The simplest case is where we have observations $x_j$ of independent random variables $X_j$ $(j = 1, 2, \ldots, n)$ whose means depend linearly on predetermined values $u_j$ of a variable $U$. Thus

$$ x_j = \theta_0 + \theta_1 u_j + \varepsilon_j \tag{7.2.1} $$

where the $\varepsilon_j$ are independent with zero mean. Usually the $\varepsilon_j$ are assumed to come from a common distribution; more specifically we might assume $\varepsilon_j \sim N(0, \sigma^2)$. We shall consider the detection, testing, and accommodation of outliers in this situation. If $\theta_0$ and $\theta_1$ are estimated by least squares (or equivalently by maximum likelihood on the normal error model) as $\hat\theta_0$ and $\hat\theta_1$, we can estimate the residuals $\varepsilon_j$ as

$$ \hat\varepsilon_j = x_j - \hat\theta_0 - \hat\theta_1 u_j \tag{7.2.2} $$

and in seeking outliers it is again sensible to examine the relative sizes of the $\hat\varepsilon_j$.

In the discussion of designed experiments we restricted attention to designs where the estimated residuals had equal variance. Even for the simple linear regression model (7.2.1), however, we lose this simplifying feature since

$$ \operatorname{var}(\hat\varepsilon_j) = \sigma^2\left\{\frac{n-1}{n} - \frac{(u_j - \bar u)^2}{\sum_{i=1}^{n}(u_i - \bar u)^2}\right\}. \tag{7.2.3} $$

Thus the $\hat\varepsilon_j$ are more variable the closer $u_j$ is to $\bar u$. This 'ballooning' effect of the residuals (Behnken and Draper, 1972) needs to be taken into account, if it is at all marked, in examining the size of the residuals $\hat\varepsilon_j$ as a reflection of outliers. The effect is not restricted to the simple linear regression model (7.2.1). For the general linear model

$$ x = A\theta + \varepsilon \tag{7.2.4} $$

we have (in the full rank case)

$$ \operatorname{var}(\hat\varepsilon) = \sigma^2 (I_n - R) \tag{7.2.5} $$

with $R = A(A'A)^{-1}A'$, so that we will need later to consider the implications for outlier detection of the extent to which the form of the design matrix $A$ induces inhomogeneity of variance in the estimated residuals.

Returning to the simple model (7.2.1), one possible approach is to examine appropriately weighted estimated residuals

$$ \frac{\hat\varepsilon_j}{s_j} = \frac{\hat\varepsilon_j}{s\left\{\dfrac{n-1}{n} - \dfrac{(u_j - \bar u)^2}{\sum_{i=1}^{n}(u_i - \bar u)^2}\right\}^{1/2}} \tag{7.2.6} $$

where $s^2 = \sum \hat\varepsilon_j^2/(n-2)$ is the unbiased estimate of $\sigma^2$, and $s_j^2$ is an unbiased estimate of $\operatorname{var}(\hat\varepsilon_j)$. Corresponding with the earlier use (Section 7.1.1) of the
maximum absolute studentized residual $|\hat\varepsilon|_{\max}/s$, we might now seek to detect and test the discordancy of a single outlier in terms of the statistic

$$ t = \max_j |\hat\varepsilon_j / s_j|. \tag{7.2.7} $$

If $t$ is sufficiently large, we adjudge the observation yielding $\max_j |\hat\varepsilon_j/s_j|$ to be a discordant outlier. Note the implication of this policy. It is no longer necessarily the most extreme residual which is the prime candidate for designation as an outlier. Even a modest residual, corresponding with a small variance (7.2.3), can be promoted into an outlying role.

To conduct the test we need to know the distribution of $t$; again its exact form is intractable, but much work has been published on approximate forms for the distribution or its percentage points, based on Bonferroni inequalities or large-scale simulation studies.

In the first category is the discussion by Srikantan (1961) of tests for outliers in the general multilinear regression model. He proposes use of a test statistic which particularizes (in the case of the simple model (7.2.1)) to $t^2$, where $t$ is given by (7.2.7), and also considers the corresponding statistics $\max_j\{\hat\varepsilon_j|\hat\varepsilon_j|/s_j^2\}$ and $\max_j\{-\hat\varepsilon_j|\hat\varepsilon_j|/s_j^2\}$ for one-sided tests (where the outlier model contemplates slippage in a single mean). He uses the first Bonferroni inequality to derive upper bounds for the 5 per cent and 1 per cent critical values of the three test statistics, based on the F-distribution. Of particular note is his demonstration that the upper bounds are the exact percentage points for reasonably small samples (this predates and generalizes Stefansky, 1971 and 1972, who considers only models where the $\hat\varepsilon_j$ have common variance, although Stefansky's use of higher-order inequalities extends the exact results to larger sample sizes).

Implicit in Srikantan's result about exact critical values is that the extreme tail behaviour of the distribution of $t^2$ (and of the one-sided equivalents) will be independent of the configuration of values of the independent variables in the multilinear model. What constitutes the 'extreme tail' will depend, however, on this configuration, which will thus determine whether the conventional significance levels (5 per cent, 1 per cent, say) are included in the 'extreme tail'. Tietjen, Moore, and Beckman (1973) conduct a large-scale simulation to obtain approximate critical values for $t$. They reconfirm empirically the known relative insensitivity of the critical values to the configuration of the $u_j$ for reasonably small $n$, and conclude that one can employ the tabulated critical values of the single-sample statistic $(x_{(n)} - \bar x)/s$ due to Grubbs (1950) (Thompson, 1935), with appropriate minor modifications to allow for its one-sided form and the discrepancy of 1 in the degrees of freedom associated with $s^2$. But caution is needed in three respects. The one- and two-sided distinctions imply the use of a further Bonferroni inequality, so that at best the comparison provides only upper bounds for the critical values; for larger sample sizes the correspondence will deteriorate (in the light of the discussion above on the effect of the configuration of the independent variable values); and the suggested modification to deal with the different degrees of freedom appears to be incorrectly (or at least ambiguously) described.

Prescott (1975b) takes up the insensitivity of the critical values of $t$ to the $u$-values in another respect. He suggests that we ignore the differing variances of the $\hat\varepsilon_j$ and replace $s_j$ in (7.2.7) by $s$, where $s^2$ is the 'average variance' $\sum \hat\varepsilon_j^2/n$ introduced by Behnken and Draper (1972). (7.2.7) then reduces to a multiple of the MNR, viz. $n^{1/2}\max_j|\hat\varepsilon_j|/(\sum \hat\varepsilon_j^2)^{1/2}$, and approximate critical values are obtained (invoking Stefansky, 1971, 1972) on the assumption that the estimated residuals have common variance equal to the population 'average variance' $(n-2)\sigma^2/n$.

Although formal distinctions exist in the principles invoked by Srikantan (1961), Tietjen, Moore, and Beckman (1973), and Prescott (1975b), their tabulated critical values differ little for practical purposes when appropriately compared. However, perhaps the most useful tabulation of critical values for the simple linear regression model is the appropriate section of the table by Lund (1975) of critical values of the generalization of $t$ for the general linear model. Ellenberg (1973) had investigated the joint distribution of the studentized residuals for the general case. Lund makes use of Ellenberg's results, invoking the first-order Bonferroni inequality, to examine the distribution of the maximum individually studentized residual (7.2.7). The resulting upper bounds for the critical values depend on percentage points of F-distributions at levels not conventionally tabulated. This was the problem which faced all the previously discussed efforts to determine critical values, but Lund approaches it directly by numerically determining the specific critical F-values which are required. For the linear regression model (7.2.1), upper bounds for the 5 per cent and 1 per cent critical values of $t$ are to be found in the second columns ($q = 2$) of the appropriate tables presented by Lund and reproduced as Table XXXII on pages 335-336, for selected values of $n$ from 5 to 100. We must recall, however, that the value of $n$ at which the entries change from being exact to being upper bounds is unknown.

Whilst space does not permit an extended discussion of other aspects of the study of outliers in simple linear regression, we must refer briefly to some further contributions. Elashoff (1972) considers the estimation of $\theta_0$ and $\theta_1$ in the special case of outliers generated by a particular combined mixture-slippage model. She assumes, for example, that the $\varepsilon_j$ are uncorrelated and arise from a mixed normal distribution

with $\gamma$ known and $\lambda(u_j) = c(u_j - u_{(1)})^2$, where $c$ is constant. This is a highly specific study of the accommodation of outliers in linear regression, and is likely to be of limited applicability.
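As an illustration only, the following Python sketch computes the quantities discussed for the simple model (7.2.1): the least-squares estimates, the estimated residuals (7.2.2), the studentized residuals (7.2.6) built from the variance formula (7.2.3), the statistic $t$ of (7.2.7), and Prescott's variant in which every $s_j$ is replaced by the 'average variance' estimate. The function names and the six data pairs are hypothetical, chosen only to exercise the code; no critical values are computed, so $t$ would still have to be referred to tables such as Table XXXII.

```python
import math


def simple_regression_residuals(u, x):
    """Least-squares fit of x_j = theta0 + theta1*u_j + eps_j (model 7.2.1).

    Returns (theta0, theta1, residuals, studentized residuals).
    """
    n = len(u)
    if n != len(x) or n < 3:
        raise ValueError("need paired observations with n >= 3")
    u_bar = sum(u) / n
    x_bar = sum(x) / n
    s_uu = sum((uj - u_bar) ** 2 for uj in u)
    theta1 = sum((uj - u_bar) * (xj - x_bar) for uj, xj in zip(u, x)) / s_uu
    theta0 = x_bar - theta1 * u_bar
    # Estimated residuals, equation (7.2.2).
    resid = [xj - theta0 - theta1 * uj for uj, xj in zip(u, x)]
    # s^2 = (sum of squared residuals) / (n - 2), the unbiased estimate of sigma^2.
    s2 = sum(e * e for e in resid) / (n - 2)
    stud = []
    for uj, e in zip(u, resid):
        # var(e_j) / sigma^2 from (7.2.3): smallest where u_j is far from u_bar.
        factor = (n - 1) / n - (uj - u_bar) ** 2 / s_uu
        stud.append(e / math.sqrt(s2 * factor))  # e_j / s_j, equation (7.2.6)
    return theta0, theta1, resid, stud


def max_studentized_residual(u, x):
    """Statistic t of (7.2.7) and the (zero-based) index of the observation attaining it."""
    stud = simple_regression_residuals(u, x)[3]
    j = max(range(len(stud)), key=lambda k: abs(stud[k]))
    return abs(stud[j]), j


def prescott_statistic(u, x):
    """t with every s_j replaced by s, where s^2 = sum(e_j^2) / n ('average variance')."""
    resid = simple_regression_residuals(u, x)[2]
    n = len(resid)
    return math.sqrt(n) * max(abs(e) for e in resid) / math.sqrt(sum(e * e for e in resid))


if __name__ == "__main__":
    u = [1, 2, 3, 4, 5, 6]               # hypothetical predictor values
    x = [2.0, 4.1, 5.9, 8.2, 3.0, 12.1]  # hypothetical responses
    t, j = max_studentized_residual(u, x)
    print(f"t = {t:.3f} at observation index {j}")
    print(f"Prescott variant = {prescott_statistic(u, x):.3f}")
```

With these invented data the largest $|\hat\varepsilon_j/s_j|$ belongs to the fifth pair (index 4); whether it is discordant would depend on comparing $t$ with a tabulated critical value for $n = 6$, which the sketch deliberately does not attempt.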

Methods based on division of the data into sub-samples, for detecting outliers and for estimating parameters in a manner which is robust against outliers, are described by Schweder (1976) and by Hinich and Talwar (1975), respectively. In the former case it is assumed that an uncontaminated sub-sample can be identified; in the latter case trimmed means of the sets of sub-sample estimated regression coefficients are utilized.

Other robust procedures for estimating or testing the regression coefficients are presented by Adichie (1967a, 1967b); they are non-parametric (based on rank tests) and provide en passant a degree of protection against outliers.

Example 7.2. Consider the load/extension data of Table 7.1. Fitting the linear regression model (7.2.1) we obtain

$$ \hat\theta_0 = 0.67656, \qquad \hat\theta_1 = 0.07565. $$

The estimated residuals, $\hat\varepsilon_j$, and studentized residuals, $\hat\varepsilon_j/s_j$, are as given in Table 7.4. Thus the observation (53.4, 3.1) stands out as an outlier. However, it is not statistically significant (the 5 per cent and 1 per cent critical values being 2.29 and 2.44: see Table XXXII).

Table 7.4

$x_j$    $\hat\varepsilon_j$    $\hat\varepsilon_j/s_j$
1.6       0.076      0.123
2.1      -0.173     -0.245
3.4       0.462      0.616
3.3       0.044      0.058
4.2       0.210      0.272
3.1      -1.616     -2.137
4.9      -0.308     -0.422
6.2       0.894      1.236
6.3       0.411      0.614

7.2.2 Multiple regression

The immediate generalization of the simple linear regression model (7.2.1) declares that

$$ x_j = \theta_0 + \theta_1 u_{1j} + \theta_2 u_{2j} + \cdots + \theta_{q-1} u_{q-1,j} + \varepsilon_j \tag{7.2.8} $$

where $u_{1j}, \ldots, u_{q-1,j}$ $(j = 1, 2, \ldots, n)$ are the values taken by $q - 1$ independent variables.

There is little that is special to be said about outliers in the specific context of such multilinear or polynomial regression models. Many of the formal methodological considerations of the early sections carry over in obvious ways: use of residuals, scanning by graphical procedures, implications of non-constant variance of estimated residuals, difficulties in precise determination of critical values for test statistics, and so on.

In principle the $u_{ij}$ could take any values and $q$ could be of any order. Thus (7.2.8) is really just a general linear model; it encompasses simple linear regression and all designed experiments. We have thus reached a stage where it is expedient to proceed directly to the general case, and we do so in the following section.

The literature contains few specific studies of multilinear or polynomial regression per se. One example is considered in Srikantan (1961), where $q = 3$ and the $u_{ij}$ $(i = 1, 2)$ are simple trigonometric functions. But later work on the relative insensitivity of critical values to the configuration of the values of the independent variables (at least for residual-based methods of testing discordancy) implies that we do not need to be so specific and we can reasonably resort to results for the general case.

7.3 OUTLIERS WITH GENERAL LINEAR MODELS

Many of the general results on outlier detection, testing discordancy and accommodating outliers in the presence of a general linear model have already been indicated or illustrated by the more specific studies earlier in the chapter. However, we need to draw the threads together in re-examining the range of approaches and describing the extent and manner of generalization of the earlier results. The basic distinction between methods based exclusively on residuals, and other methods, is again evident. Most published results relate to the totally residual-based approach.

7.3.1 Residual-based methods

We are concerned with situations where the observation vector $x = (x_1, x_2, \ldots, x_n)'$ is represented by a basic general linear model

$$ x = A\theta + \varepsilon \tag{7.3.1} $$

with $\theta$ a $q \times 1$ vector of parameters, $A$ a known $n \times q$ matrix of coefficients (assumed here to be of full rank) and $\varepsilon$ an $n \times 1$ vector of residuals. We assume that $\varepsilon$ has zero mean vector and variance matrix $V(\varepsilon) = \sigma^2 I_n$, where $I_n$ is the $n \times n$ identity matrix, so that the true residuals have common variance and are uncorrelated. (If this were not so it could be created by an appropriate orthogonal transformation.) Any distribution-theoretic results (as needed, for example, in tests of significance) will be based on the assumption that $\varepsilon$ is multivariate normal.

In the absence of any prospect of discordant values, requiring a modification of the model (7.3.1) and possibly revealed as outliers, we have the familiar least-squares analysis of the linear model (7.3.1). The least-squares estimate of $\theta$ is

$$ \hat\theta = (A'A)^{-1}A'x \tag{7.3.2} $$
witb Sbortly we will need to consider in some detail wby tbe estimated
residuals, and studentized residuals, are particularly relevant to tbe study of
(7.3.3)

and of $\epsilon$ is

$$\hat\epsilon = x - A\hat\theta = (I_n - R)x = (I_n - R)\epsilon \tag{7.3.4}$$

where

$$R = A(A'A)^{-1}A' \tag{7.3.5}$$

and with

$$V(\hat\epsilon) = (I_n - R)\sigma^2. \tag{7.3.6}$$

The last term of (7.3.4) shows how the estimated residuals $\hat\epsilon$ relate to the unknown true residuals $\epsilon$. However, the determination of $\hat\epsilon$ must be sought in terms of known quantities such as $(I_n - R)x$. The estimated residuals $\hat\epsilon_j$ have zero means. From (7.3.6) we see that they are typically correlated and have differing variances. Explicitly we can write

$$\operatorname{var}(\hat\epsilon_j) = \left(1 - A_j(A'A)^{-1}A_j'\right)\sigma^2 = (1 - r_{jj})\sigma^2 \tag{7.3.7}$$

(say), where $A_j$ is the $j$th row of $A$.

The error variance $\sigma^2$ will be unknown. An unbiased estimate is obtained as

$$\hat\sigma^2 = \hat\epsilon'\hat\epsilon/(n - q) = \epsilon'(I_n - R)\epsilon/(n - q) \tag{7.3.8}$$

in view of the idempotency of $(I_n - R)$. The quantity $\hat\epsilon'\hat\epsilon$ is termed the residual sum of squares and is denoted $S^2$. $V(\hat\epsilon)$ can now be estimated as

$$S^2(\hat\epsilon) = (I_n - R)\hat\sigma^2 \tag{7.3.9}$$

so that the estimated variance of $\hat\epsilon_j$ is

$$s_j^2 = (1 - r_{jj})\hat\sigma^2 = (1 - r_{jj})\,\hat\epsilon'\hat\epsilon/(n - q) = (1 - r_{jj})S^2/(n - q). \tag{7.3.10}$$

We shall have reason to consider the studentized residuals

$$e_j = \frac{\hat\epsilon_j}{s_j} = \frac{(n - q)^{1/2}\,\hat\epsilon_j}{(1 - r_{jj})^{1/2}\,S}. \tag{7.3.11}$$

They have an immediate intuitive appeal in that they constitute weighted versions of the estimated residuals $\hat\epsilon_j$. The weights are inversely proportional to estimates of the standard deviations of the $\hat\epsilon_j$. The variances of the $e_j$ should thus be more or less constant (precisely so if $S$ in (7.3.11) were replaced by $\sigma\sqrt{n - q}$), avoiding the inconvenience of the disparate variances of the $\hat\epsilon_j$.

If $\epsilon$ is normally distributed, $\hat\theta$ and $\hat\epsilon$ are of course maximum likelihood estimators, and $\hat\sigma^2$ is the maximum likelihood estimator $S^2/n$ rescaled by $n/(n - q)$. Their distributional behaviour is well known: $\hat\theta_j$ and $\hat\epsilon_j$ are normally distributed, and $S^2$ has a $\sigma^2\chi^2$ distribution with $(n - q)$ degrees of freedom.

Shortly we will need to consider in some detail why the estimated residuals, and studentized residuals, are particularly relevant to the study of outliers. But first it is appropriate to review the state of knowledge about these quantities in their own right, and their use in informal methods of examining outliers.

A fundamental work on the analysis of least-squares residuals is by Anscombe (1961). He proposes methods using estimated residuals to examine the assumption that the $\epsilon_j$ are independent and normally distributed with constant variance. The methods include a study of the regression of the estimated residuals on the corresponding fitted values, and take account of the intercorrelation of the estimated residuals. Tukey's test of non-additivity is also considered. Particular attention is given to the case where all the residuals have equal variance, typical of factorial design experiments with equal replication (see Section 7.1.1). Outliers receive only passing mention.

Srikantan (1961), however, is concerned with using estimated residuals specifically to investigate outliers. He adopts a mean-slippage alternative model for a single outlier and assumes a normal error structure. Consider the labelled slippage model (see Section 3.1), where the index of the discordant value is specified. For this model he shows that, for a one-sided (two-sided) test of discordancy, the corresponding studentized residual is the test statistic of the uniformly most powerful (unbiased) test of discordancy. Thus suppose the alternative hypothesis is

$$H: E(x) = A\theta + a \tag{7.3.12}$$

where $a$ is an $n \times 1$ vector of zeros apart from one possible non-zero value $a_i$ in the $i$th position. The tests are then based on $d_i = \hat\epsilon_i/s_i$ in the following way. Suppose

$$t_i = d_i^2, \qquad u_i = \begin{cases} d_i^2 & (d_i \ge 0)\\ 0 & (d_i < 0)\end{cases}, \qquad v_i = \begin{cases} 0 & (d_i \ge 0)\\ d_i^2 & (d_i < 0).\end{cases} \tag{7.3.13}$$

The uniformly most powerful (one-sided) tests of $a_i = 0$ against $a_i > 0$ and against $a_i < 0$ have rejection regions $u_i > R_\alpha$ and $v_i > R_\alpha$, respectively. Here $R_\alpha$ has to be determined to produce a test of size $\alpha$. Against $a_i \ne 0$, the two-sided test with rejection region $t_i \ge R'_\alpha$ is uniformly most powerful unbiased (again $R'_\alpha$ has to be chosen to yield a test of size $\alpha$).

For the (more realistic) unlabelled slippage model, the index $i$ of the discordant value is unspecified. Here similar tests based on $t = \max_i d_i^2$, $u = \max_i u_i$, and $v = \max_i v_i$ are recommended, but there is no consideration of their optimality properties. We shall return to the thorny issue of the determination of the critical values (the corresponding $R'_\alpha$ and $R_\alpha$) of these tests.
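The quantities (7.3.4) to (7.3.11) can be computed directly from a design matrix $A$ and data vector $x$. What follows is a minimal sketch under our own assumptions: the choice of Python with NumPy, the helper name `studentized_residuals`, and the illustrative data are all ours, not the text's. It forms the hat matrix $R$ explicitly, which is reasonable for small $n$ but not how one would treat large problems.

```python
import numpy as np


def studentized_residuals(A, x):
    """Residual quantities for the linear model x = A theta + eps.

    Hypothetical helper (name is ours). Returns the estimated residuals
    eps_hat = (I_n - R) x of (7.3.4), the leverages r_jj (diagonal of
    R = A (A'A)^{-1} A'), the residual sum of squares S^2, and the
    studentized residuals e_j = eps_hat_j / s_j of (7.3.11), with
    s_j^2 = (1 - r_jj) S^2 / (n - q) as in (7.3.10).
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n, q = A.shape
    R = A @ np.linalg.solve(A.T @ A, A.T)     # hat matrix, (7.3.5)
    eps_hat = x - R @ x                        # estimated residuals, (7.3.4)
    r_diag = np.diag(R)                        # r_jj of (7.3.7)
    S2 = float(eps_hat @ eps_hat)              # residual sum of squares
    sigma2_hat = S2 / (n - q)                  # unbiased estimate, (7.3.8)
    s = np.sqrt((1.0 - r_diag) * sigma2_hat)   # estimated s.d. of eps_hat_j, (7.3.10)
    e = eps_hat / s                            # studentized residuals, (7.3.11)
    return eps_hat, r_diag, S2, e


# Hypothetical data: a straight line plus small fixed perturbations,
# with the observation at index 5 shifted upwards by 6.
t = np.arange(8.0)
A = np.column_stack([np.ones_like(t), t])
x = 2.0 + 0.5 * t + np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05])
x[5] += 6.0

eps_hat, r_diag, S2, e = studentized_residuals(A, x)
print(np.round(e, 3))
print("largest |e_j| at index", int(np.argmax(np.abs(e))))
```

Two useful checks follow from the definitions. First, $\sum_j r_{jj} = q$ (the trace of $R$). Second, from (7.3.10) and (7.3.11), $\sum_j (1 - r_{jj})e_j^2 = n - q$.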
258 Outliers in statistical data Outliers in designed experiments, regression, and in time-series 259

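Given studentized residuals $d_i$, the statistics of (7.3.13) and their unlabelled-model maxima $t$, $u$, $v$ are simple to form. The sketch below does only that, with a hypothetical helper name and made-up residual values chosen so their squares are exact. It does not attempt the critical values $R_\alpha$ and $R'_\alpha$, which, as noted above, are the difficult part.

```python
import numpy as np


def srikantan_statistics(d):
    """Form t_i, u_i, v_i of (7.3.13) from studentized residuals d_i and
    return, for each, the maximum over i with the index attaining it
    (the unlabelled-model statistics t, u, v). Hypothetical helper name."""
    d = np.asarray(d, dtype=float)
    t_i = d ** 2                            # two-sided: any large |d_i|
    u_i = np.where(d >= 0, d ** 2, 0.0)     # upper one-sided: positive d_i only
    v_i = np.where(d < 0, d ** 2, 0.0)      # lower one-sided: negative d_i only
    out = {}
    for name, values in (("t", t_i), ("u", u_i), ("v", v_i)):
        # If, say, no d_i is positive, u_i is all zero and index 0 is reported.
        k = int(np.argmax(values))
        out[name] = (float(values[k]), k)
    return out


# Hypothetical studentized residuals (illustrative values only).
stats = srikantan_statistics([0.5, -1.5, 2.5, -0.25, 0.75])
print(stats)
```

Each maximum would then be compared with the appropriate critical value, and the index returned alongside it identifies the candidate discordant observation.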

Tiao and Guttman (1967) reconsider the premium-protection approach of Anscombe (1960a) in some detail for the case of a single univariate sample, depending on the extent of knowledge of the residual variance $\sigma^2$ (see Section 4.1.1). A brief comment on the general linear model is interesting. Recognizing the difficulties arising from the intercorrelation of the estimated residuals, they propose using uncorrelated modified residuals, having the form (when $\sigma^2$ is known)

$$z = \hat\epsilon + \sigma A P u \tag{7.3.14}$$

where $P$ is any $q \times q$ matrix satisfying

$$PP' = (A'A)^{-1} \tag{7.3.15}$$

and $u$ is $N(0, I_q)$ independent of $x$. From (7.3.5) and (7.3.6) we clearly have

$$V(z) = \left[\left(I_n - A(A'A)^{-1}A'\right) + APP'A'\right]\sigma^2 = I_n\sigma^2. \tag{7.3.16}$$

Such modified residuals (or adjusted residuals) might seem attractive in view of their independence when the error distribution is normal. Golub, Guttman, and Dutter (1973) discuss in detail various accommodation procedures utilizing them (including premium-protection rules and Winsorization), for a location-slippage model for possible discordant values. They remark that for large $n$ the correction factors in (7.3.14), arising from introducing alien independent perturbations of the residuals to '"break" the correlation pattern', are small. But there is no detailed consideration of how large $n$ needs to be. Nor is there any comparison of their approach with others based on the unadjusted residuals taking proper account of their intercorrelations.

A different approach to tests of discordancy of outliers, based on residuals, is described by Andrews (1971). He considers the unit vector of normalized residuals $r = \hat\epsilon/(\hat\epsilon'\hat\epsilon)^{1/2} = \hat\epsilon/S$. He develops tests of discordancy based on a projection of this vector onto a suitable subspace. Adopting the outlier model (7.3.12) we have

$$Sr = S'r' + (I_n - R)a \tag{7.3.17}$$

where $(S')^2$ and $r'$ are the residual sum of squares and the vector of normalized estimated residuals, respectively, when the basic (no-outlier) model is true, i.e. $a = 0$. Thus if $a \ne 0$, $r'$ is perturbed by a vector in the direction $(I_n - R)g$. Here $g$ is an $n \times 1$ vector with $n - 1$ zeros and a unit value at the position corresponding with the discordant value, as indicated by $a$. Discordancy is revealed by $r$ being too close to $(I_n - R)g$. We can assess this in terms of how small is the norm $\|r^*\|$ of the orthogonal complement $r^*$ of the projection of $r$ on $(I_n - R)g$.

Of course, we will not often wish to specify the index of the potential discordant value. Instead we will need to test the unlabelled mean-slippage model corresponding with (7.3.12). This leads to a test which attributes discordancy to a single outlier when $\min_j \|r_j^*\|$ is sufficiently small, where $j$ denotes the index of the potential discordant value. Andrews develops this test in the cases where the error distribution is normal, and exponential. The approach is also extended to multiple outliers. He claims that his approach is essentially different from others which examine merely the absolute values of the residuals (or normed residuals). He suggests that it is an improvement in view of its ability to take account of the form of $A$. But this facility, as expressed through the use of projection vectors, seems to reduce merely to the use of the individually studentized residuals $e_j$ of (7.3.11) rather than of the undifferentially weighted normed residuals. See also John and Draper (1978).

For a generalized approach to the analysis of residuals (but with only passing reference to outliers) see Cox and Snell (1968, 1971). See also Behnken and Draper (1972), Draper and Smith (1966), and Wooding (1969). Ellenberg (1973) considers in some detail the joint distribution of the studentized residuals (7.3.11) in a general linear model with normal error structure. This yields what he terms 'a standardised version of the Inverted-Student Function'. The various graphical methods described in earlier sections will also be useful for exhibiting outliers. However, it will be more appropriate to plot the individually studentized residuals $e_j$ rather than the ordinary estimated residuals $\hat\epsilon_j$.

Summarizing the current attitude to tests of discordancy (and accommodation) of outliers in the general linear model situation, we find almost total preoccupation with one statistic. This is the maximum (positive, negative, or absolute) studentized residual, with discordancy attributed to the observation yielding the maximum provided that maximum is sufficiently large. This is precisely the Srikantan (1961) prescription. Illustrating it for the two-sided test, we reject the basic hypothesis of (7.3.1) in favour of one which postulates a single discordant value, e.g. (7.3.12), if

$$T = \max_j |e_j| = \max_j |\hat\epsilon_j/s_j| > h_\alpha \tag{7.3.18}$$

where, for a test of size $\alpha$, the critical value $h_\alpha$ needs to be chosen to ensure that, under (7.3.1), (7.3.18) holds with probability $\alpha$. The observation which yields the maximum value of $|e_j|$ is declared to be the discordant outlier.

Most discussion of such a test centres on the problem of determining the critical values $h_\alpha$. We assume first of all that $\sigma^2$ is unknown and has to be estimated solely from the current data. Srikantan (1961) used the first Bonferroni inequality to determine upper bounds for the critical values of his statistics (discussed above: defined in terms of squared studentized residuals). As we saw, he was able to present results for some models with $q$ up to 3 in value. We have also noted the special case approximations of Prescott (1975a, 1975b), and simulation results of Tietjen, Moore, and

Beckman (1973), both limited to simple linear regression. Lund (1975) presents the most useful tabulation to date. Using the results of Ellenberg (1973), and again the first Bonferroni inequality, he determines the required (unavailable) percentage points of the $F$-distribution. This leads to a fairly comprehensive tabulation of upper bounds to the 10, 5, and 1 per cent points of $T$ under the basic model, for $n = 5(1)10(2)20(5)50(10)100$ and $q = 1(1)6(2)10, 15, 25$. The 5 per cent and 1 per cent points are reproduced as Table XXXII on pages 335-336.

When we have some knowledge about $\sigma^2$, external to the data, the discordancy test needs appropriate modification. Joshi (1972a) considers this for the normal error model in two cases. In the first, we have an external estimate $s_\nu^2$ of $\sigma^2$, distributed as $\sigma^2\chi_\nu^2/\nu$ independently of $x$. In the second, we know $\sigma^2$ precisely. The test structure is the same as before except that the (internally) studentized residuals $\hat\epsilon_j/s_j$ are replaced by externally studentized, pooled studentized, or standardized residuals

$$\frac{\hat\epsilon_j}{[(1 - r_{jj})s_\nu^2]^{1/2}}, \qquad \frac{(\nu + n - q)^{1/2}\,\hat\epsilon_j}{[(1 - r_{jj})(\nu s_\nu^2 + S^2)]^{1/2}}, \qquad \text{and} \qquad \frac{\hat\epsilon_j}{[(1 - r_{jj})\sigma^2]^{1/2}}.$$

A test of discordancy again proceeds in terms of the maximum (positive, negative, or absolute) value of the weighted residuals. For the pooled studentized residuals, $\nu = 0$ reduces to the original test based on the $\hat\epsilon_j/s_j$. Joshi's approach to the determination of the critical values of the tests again yields only upper bounds. However, they turn out to be intermediate between those given by the first Bonferroni inequality and the more precise (but less computationally tractable) second Bonferroni inequality. The performance of the tests is considered, and illustrated numerically for the simplest case of a single univariate normal sample. Throughout, it is assumed under the alternative hypothesis that there is a single discordant value corresponding with a constant slippage of the mean. Its index is unknown and is assumed to be chosen at random from the set $\{1, 2, \ldots, n\}$.

In concluding this section we return to the basic question. Apart from intuitive appeal, why should we regard the individually studentized residuals as appropriate representations of the data for detecting outliers and for testing discordancy?

The basic model is (7.3.1): $x = A\theta + \epsilon$. The prospect of a single discordant value reflecting slippage in the mean can be expressed in terms of the set of alternative hypotheses

$$\bar H_j: x = A\theta + a_j + \epsilon \qquad (j = 1, 2, \ldots, n) \tag{7.3.19}$$

where $a_j$ is $n \times 1$ with a value $a$ in the $j$th position and zeros elsewhere. Thus $\bar H_j$ declares that $x_j$ is the discordant value, from the distribution $N(A_j\theta + a, \sigma^2)$, where $A_j$ is the $j$th row of $A$. As in our study of multivariate outliers, for example, we might choose to detect a single outlier in terms of greatest increase in the maximized likelihood under the set of hypotheses $\bar H_j$ relative to the basic model $H: x = A\theta + \epsilon$.

Under $H$ the likelihood is maximized by putting $\theta = \hat\theta$ and $\sigma^2 = S^2/n = \hat\epsilon'\hat\epsilon/n$. The maximized log-likelihood is

$$-\frac{n}{2}\log\left(\frac{2\pi S^2}{n}\right) - \frac{n}{2}.$$

Under $\bar H_j$, $x_j$ arises from $N(\mu, \sigma^2)$ whilst $x_{-j}$ arises (independently) from $N(A_{-j}\theta, \sigma^2 I_{n-1})$. Here $x_{-j}$ is the set of observations excluding $x_j$, and $A_{-j}$ is the reduced matrix $A$ obtained on deletion of the $j$th row. The likelihood is now maximized by putting $\mu = x_j$ and

$$\tilde\theta = (A_{-j}'A_{-j})^{-1}A_{-j}'x_{-j}, \qquad \tilde\sigma^2 = S_{-j}^2/n$$

where $S_{-j}^2$ is the sum of squares of the estimated residuals when the reduced data vector $x_{-j}$ is fitted by least-squares to the model $x_{-j} = A_{-j}\theta + \epsilon_{-j}$. The maximized log-likelihood now becomes

$$-\frac{n}{2}\log\left(\frac{2\pi S_{-j}^2}{n}\right) - \frac{n}{2}.$$

Thus the increase in the maximized log-likelihood is

$$\frac{n}{2}\log\left(\frac{S^2}{S_{-j}^2}\right).$$

So, on the above criterion (with no restriction on the value of $a$), a single outlier is detected as that observation whose omission from the sample effects the greatest reduction in the residual sum of squares. Thus the outlier is the observation yielding

$$\max_j \left(S^2/S_{-j}^2\right).$$

It is to be adjudged discordant if this maximum is sufficiently large relative to its null distribution.

In fact we can show that the simple relationship holds:

$$S^2 = S_{-j}^2 + \frac{\hat\epsilon_j^2}{1 - A_j(A'A)^{-1}A_j'} = S_{-j}^2 + \frac{\hat\epsilon_j^2}{1 - r_{jj}}. \tag{7.3.20}$$

Hence

$$\frac{S^2}{S_{-j}^2} = \frac{S^2}{S^2 - \hat\epsilon_j^2/(1 - r_{jj})} = \left\{1 - \frac{\hat\epsilon_j^2}{S^2(1 - r_{jj})}\right\}^{-1} = \left[1 - \frac{e_j^2}{n - q}\right]^{-1}. \tag{7.3.21}$$
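The identities (7.3.20) and (7.3.21) can be checked by brute force. The sketch below refits the model $n$ times, deleting one observation each time (the fit under $\bar H_j$). It compares the resulting $S_{-j}^2$ with the one-fit shortcut. It also confirms that the largest log-likelihood increase $\frac{n}{2}\log(S^2/S_{-j}^2)$ falls at the largest $|e_j|$. The helper name and the data are hypothetical illustrations, and NumPy is our choice of tool.

```python
import numpy as np


def deletion_identities(A, x):
    """Compare leave-one-out residual sums of squares S^2_{-j}, found by
    refitting, with the one-fit shortcuts (7.3.20) and (7.3.21).
    Hypothetical helper name."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n, q = A.shape
    R = A @ np.linalg.solve(A.T @ A, A.T)
    eps_hat = x - R @ x
    r = np.diag(R)
    S2 = float(eps_hat @ eps_hat)
    e = eps_hat / np.sqrt((1.0 - r) * S2 / (n - q))   # studentized residuals (7.3.11)

    # Brute force: refit with observation j deleted (the fit under H_j-bar).
    S2_minus = np.empty(n)
    for j in range(n):
        keep = np.arange(n) != j
        theta_j, *_ = np.linalg.lstsq(A[keep], x[keep], rcond=None)
        resid = x[keep] - A[keep] @ theta_j
        S2_minus[j] = resid @ resid

    return {
        "e": e,
        "S2_minus": S2_minus,
        "shortcut": S2 - eps_hat ** 2 / (1.0 - r),            # (7.3.20)
        "ratio": S2 / S2_minus,
        "ratio_formula": 1.0 / (1.0 - e ** 2 / (n - q)),      # (7.3.21)
        "loglik_increase": 0.5 * n * np.log(S2 / S2_minus),   # (n/2) log(S^2 / S^2_{-j})
    }


# Hypothetical data: a straight line with small fixed perturbations and
# one observation (index 5) shifted upwards by 6.
t = np.arange(8.0)
A = np.column_stack([np.ones_like(t), t])
x = 2.0 + 0.5 * t + np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05])
x[5] += 6.0

out = deletion_identities(A, x)
print(np.round(out["S2_minus"], 4))
print(np.round(out["shortcut"], 4))
```

The refitting loop is there only to check the identities. In practice (7.3.20) gives every $S_{-j}^2$ from a single fit.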
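Joshi's (1972a) alternative weightings, described earlier in this section, differ from $e_j$ only in the variance estimate used. Here is a minimal sketch, assuming the user supplies $s_\nu^2$ and $\nu$, and (for the standardized form) a known $\sigma^2$. The helper name and all numerical values are illustrative only. It confirms that the pooled form with $\nu = 0$ reproduces $\hat\epsilon_j/s_j$.

```python
import numpy as np


def joshi_residuals(A, x, s2_nu, nu, sigma2):
    """Internally studentized residuals and the three alternatives of
    Joshi (1972a): externally studentized, pooled studentized, standardized.

    s2_nu is an external estimate of sigma^2 on nu degrees of freedom and
    sigma2 a known value of sigma^2; both are user inputs (assumptions),
    not estimated here. Hypothetical helper name.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n, q = A.shape
    R = A @ np.linalg.solve(A.T @ A, A.T)
    eps_hat = x - R @ x
    w = 1.0 - np.diag(R)                       # 1 - r_jj
    S2 = float(eps_hat @ eps_hat)
    return {
        "internal": eps_hat / np.sqrt(w * S2 / (n - q)),
        "external": eps_hat / np.sqrt(w * s2_nu),
        "pooled": np.sqrt(nu + n - q) * eps_hat / np.sqrt(w * (nu * s2_nu + S2)),
        "standardized": eps_hat / np.sqrt(w * sigma2),
    }


# Hypothetical data and variance information (illustrative values only).
t = np.arange(8.0)
A = np.column_stack([np.ones_like(t), t])
x = 2.0 + 0.5 * t + np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05])
x[5] += 6.0

no_external = joshi_residuals(A, x, s2_nu=0.04, nu=0, sigma2=0.04)
with_external = joshi_residuals(A, x, s2_nu=0.04, nu=10, sigma2=0.04)
print(np.round(with_external["pooled"], 3))
```

The discordancy test then uses the maximum (positive, negative, or absolute) of whichever set of weighted residuals matches the information available about $\sigma^2$.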
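The earlier remark that Andrews' (1971) projection approach "seems to reduce merely to" the studentized residuals can be made precise. Since $(I_n - R)$ is symmetric and idempotent, $r'(I_n - R)g_j = \hat\epsilon_j/S$ and $g_j'(I_n - R)g_j = 1 - r_{jj}$. Hence $\|r_j^*\|^2 = 1 - \hat\epsilon_j^2/[S^2(1 - r_{jj})] = 1 - e_j^2/(n - q)$, so $\min_j\|r_j^*\|$ and $\max_j|e_j|$ pick out the same observation. The sketch below checks this numerically. The helper name and data are hypothetical, and Andrews' critical values are not covered.

```python
import numpy as np


def andrews_norms(A, x):
    """||r_j*|| for each j: the norm of the part of r = eps_hat / S that is
    orthogonal to (I_n - R) g_j, with g_j the jth unit vector. A small value
    points to x_j as the outlier. Also returns the studentized residuals.
    Hypothetical helper name."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    n, q = A.shape
    M = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)     # I_n - R
    eps_hat = M @ x
    S = np.sqrt(eps_hat @ eps_hat)
    r = eps_hat / S                                        # unit vector of normalized residuals
    norms = np.empty(n)
    for j in range(n):
        d = M[:, j]                                        # (I_n - R) g_j
        r_star = r - (r @ d) / (d @ d) * d                 # remove the projection of r on d
        norms[j] = np.linalg.norm(r_star)
    e = eps_hat / np.sqrt(np.diag(M) * S ** 2 / (n - q))   # studentized residuals (7.3.11)
    return norms, e


# Hypothetical data: a straight line with one observation shifted by 6.
t = np.arange(8.0)
A = np.column_stack([np.ones_like(t), t])
x = 2.0 + 0.5 * t + np.array([0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.05])
x[5] += 6.0

norms, e = andrews_norms(A, x)
n, q = A.shape
print(np.round(norms, 4))
```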

From (7.3.21) we see that maximization of $S^2/S_{-j}^2$ is merely equivalent to maximization of the squares of the (individually) studentized residuals $e_j$. So the maximum likelihood ratio procedure involves three steps:

- examination of the squares (or absolute values) of the studentized residuals;
- detection of the outlier as the observation yielding the maximum absolute studentized residual;
- assessment of discordancy if it is statistically too large relative to the basic model.

This equivalence between reduction in the residual sum of squares and the absolute values of the studentized residuals is demonstrated by Ellenberg (1973, 1976).

So the preoccupation with studentized residuals as an indication of outliers in the general linear model finds a sound foundation on the maximum likelihood ratio principle. Discordancy tests based on $\max_j(S^2/S_{-j}^2)$ and on $\max_j|e_j|$ are equivalent. The maximum likelihood ratio basis for these tests is exhibited by Fieller (1976).

This equivalence has important implications, as we shall see in the next section.

7.3.2 Non-residual-based methods

There are few methods available for dealing with outliers in the general linear model which do not make use of residuals in some way. Various methods based on residual sums of squares may seem at first sight not to involve direct study of the estimated residuals (and indeed are often claimed by their progenitors not to). It is sometimes advanced that this avoids a disadvantageous cloaking (or confounding) effect of an outlier. That effect arises from the fact that all the estimated residuals reflect the influence of the outlier. The view is expressed that there is a contradiction in trying to detect an outlier by examining estimated residuals which are 'biased by the presence' of the outlier we are trying to detect. We have examined a method due to Goldsmith and Boddy (1973) for outlier detection in factorial experiments. It consists of examining the set of $n$ residual sums of squares obtained by regarding each observation separately as a missing value. In Example 7.1 we noticed how this approach highlighted a single outlier, in a manner which seemed more dramatic than the corresponding indication from the values of the estimated residuals.

However, the results at the end of the previous section cast doubt on any advantage in the use of residual sums of squares rather than of estimated (studentized) residuals. Regarding an observation $x_j$ as missing will have the same effect as estimating the parameters under the model $\bar H_j$, and the resulting residual sum of squares will be just $S_{-j}^2$. But in view of (7.3.20) the separate residual sums of squares are readily obtained merely by reducing the overall residual sum of squares $S^2$ by the corresponding weighted squared estimated residual $\hat\epsilon_j^2/(1 - r_{jj})$. Thus three conclusions arise.

- The calculation is the same in both approaches, involving determination of the estimated residuals and of $S^2$.
- The visual impact is equivalent, being the reflection of the values of weighted squared estimated residuals. It might appear heightened on the residual sum of squares approach merely because we are there looking at a reduction from $S^2$ by the square of the estimated residual, rather than at just the absolute value of the residual.
- Finally, no statistical distinction exists. In particular, the method is still subject to the cloaking, or confounding, influence of the outliers we are seeking to detect.

Example 7.3. Returning to Example 7.1 we can demonstrate this equivalence. In this factorial experiment all the estimated residuals have equal variance. In fact $r_{jj} = r = 0.5$. Now $S^2 = 2152.5$. Suppose we reduce $S^2$ separately by the $\hat\epsilon_j^2/(1 - r)$, using the values for $\hat\epsilon_j$ in column (5). After appropriate weighting by the degrees of freedom, we obtain the residual mean squares corresponding with the entries in column (7). Alternatively, if we examine the squares of the estimated residuals, we see that the outlier at 'a' shows up just as graphically as in column (7).

Mickey (1974) and Mickey, Dunn, and Clark (1967) also proposed such examination of the separate residual sums of squares after omission of the separate observations singly. They suggest that a large enough reduction is evidence of a discordant outlier. Their test statistic is

$$\max_j \left\{\frac{S^2 - S_{-j}^2}{S_{-j}^2/(n - q - 1)}\right\}$$

with attribution of discordancy if it is sufficiently large. (They seem to think it necessary, however, to conduct $n$ separate regression analyses, rather than just one yielding $S^2$ and the $e_j$.)

Snedecor and Cochran (1967, page 157) discuss a test of discordancy based on the maximum of the studentized differences, i.e. the test statistic is

$$\max_j \left\{\frac{x_j - \tilde x_j}{[V(x_j - \tilde x_j)]^{1/2}}\right\}$$

where $\tilde x_j$ is the least-squares estimate of $x_j$ when $x_j$ is regarded as a missing observation. Again the detected outlier is acclaimed discordant if the test statistic is sufficiently large.

Ellenberg (1976) demonstrates that these two methods are equivalent, and that both coincide with the test based on the maximum absolute studentized residual. He employs the result (7.3.20), which he derived earlier (Ellenberg, 1973).

The proposal of Gentleman and Wilk (1975a) to examine the 'k most likely outlier subset' (see Section 7.1.4) is also applicable to the general linear model. It will, however, be more computationally laborious than in the case of a two-way design examined in detail by the authors. It also possesses a corresponding link with studentized residuals.
and we see that maximization of $S^2/S^2_{-j}$ is merely equivalent to maximization of the squares of the (individually) studentized residuals, $e_j$. So the maximum likelihood ratio procedure involves examination of the squares (or absolute values) of the studentized residuals, detection of the outlier as the observation yielding the maximum absolute studentized residual, and assessment of discordancy if it is statistically too large relative to the basic model. This equivalence between reduction in the residual sum of squares and the absolute values of the studentized residuals is demonstrated by Ellenberg (1973, 1976).

So the preoccupation with studentized residuals as an indication of outliers in the general linear model finds a sound foundation on the maximum likelihood ratio principle, and discordancy tests based on $\max_j (S^2/S^2_{-j})$ and on $\max_j |e_j|$ are equivalent. The maximum likelihood ratio basis for these tests is exhibited by Fieller (1976).

This equivalence has important implications as we shall see in the next section.

7.3.2 Non-residual-based methods

There are few methods available for dealing with outliers with the general linear model which do not make use of residuals in some way. Various methods based on residual sums of squares may seem at first sight (and indeed are often claimed by their progenitors) not to involve direct study of the estimated residuals. It is sometimes advanced that this avoids a disadvantageous cloaking (or confounding) effect of an outlier arising from the fact that all the estimated residuals reflect the influence of the outlier. The view is expressed that there is a contradiction in trying to detect an outlier by examining estimated residuals which are 'biased by the presence' of the outlier we are trying to detect. We have examined a method due to Goldsmith and Boddy (1973) for outlier detection in factorial experiments which consists of examining the set of $n$ residual sums of squares obtained by regarding each observation separately as a missing value. In Example 7.1 we noticed how this approach highlighted a single outlier, in a manner which seemed more dramatic than the corresponding indication from the values of the estimated residuals.

However, the results at the end of the previous section cast doubt on any advantage in the use of residual sums of squares rather than of estimated (studentized) residuals. Regarding an observation $x_j$ as missing will have the same effect as estimating the parameters under the model ~ and the resulting residual sum of squares will be just $S^2_{-j}$. But in view of (7.3.20) the separate residual sums of squares are readily obtained merely by reducing the overall residual sum of squares $S^2$ by the corresponding weighted squared estimated residual $\hat{e}_j^2/(1 - r_{jj})$. Thus three conclusions arise. The calculation is the same in both approaches, involving determination of the estimated residuals and of $S^2$. The visual impact is equivalent, being the reflection of the values of weighted squared estimated residuals (it might appear heightened on the residual sum of squares approach merely because we are there looking at a reduction from $S^2$ by the square of the estimated residual, rather than at just the absolute value of the residual). Finally, no statistical distinction exists; in particular the method is still subject to the cloaking, or confounding, influence of the outliers we are seeking to detect.

Example 7.3. Returning to Example 7.1 we can demonstrate this equivalence. In this factorial experiment all the estimated residuals have equal variance. In fact $r_{jj} = r = 0.5$. Now $S^2 = 2152.5$ and if we reduce $S^2$ separately by the $\hat{e}_j^2/(1 - r)$ using the values for $\hat{e}_j$ in column (5) we obtain (after appropriate weighting by the degrees of freedom) the residual mean squares corresponding with the entries in column (7). Alternatively, if we examine the squares of the estimated residuals we see that the outlier at 'a' shows up just as graphically as in column (7).

Mickey (1974) and Mickey, Dunn, and Clark (1967) also proposed such examination of the separate residual sums of squares after omission of the separate residuals singly, suggesting that a large enough reduction is evidence of a discordant outlier. Their test statistic is

$$ \max_j \left\{ \frac{S^2 - S^2_{-j}}{S^2_{-j}/(n - q - 1)} \right\} $$

with attribution of discordancy if it is sufficiently large. (They seem to think it necessary, however, to conduct $n$ separate regression analyses, rather than just one yielding $S^2$ and the $\hat{e}_j$.)

Snedecor and Cochran (1967, page 157) discuss a test of discordancy based on the maximum of the studentized differences, i.e. the test statistic is

$$ \max_j \left\{ \frac{x_j - \hat{x}_{-j}}{[V(x_j - \hat{x}_{-j})]^{1/2}} \right\} $$

where $\hat{x}_{-j}$ is the least-squares estimate of $x_j$ when $x_j$ is regarded as a missing observation. Again the detected outlier is acclaimed discordant if the test statistic is sufficiently large.

Ellenberg (1976) demonstrates that these two methods are equivalent, and both coincide with the test based on the maximum absolute studentized residual. He employs the result (7.3.20) which he derived earlier (Ellenberg, 1973).

The proposal of Gentleman and Wilk (1975a) to examine the 'k most likely outlier subset' (see Section 7.1.4) is also applicable to the general linear model although it will be more computationally laborious than in the case of a two-way design examined in detail by the authors. It also possesses a corresponding link with studentized residuals.
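The equivalences above can be checked numerically. The following is a minimal sketch in pure Python (standard library only) on a small hypothetical straight-line data set containing one gross error. From a single least-squares fit it computes the estimated residuals $\hat{e}_j$, the diagonal elements $r_{jj}$, the studentized residuals (taken here, as our assumption about the scaling, to be $e_j = \hat{e}_j/[s^2(1 - r_{jj})]^{1/2}$ with $s^2 = S^2/(n - q)$), the reduced sums of squares $S^2_{-j} = S^2 - \hat{e}_j^2/(1 - r_{jj})$ and Mickey's statistic. It then refits the model $n$ times, omitting each observation in turn, to confirm the one-fit shortcut, and checks the algebraic link

$$ \frac{S^2 - S^2_{-j}}{S^2_{-j}/(n - q - 1)} = \frac{(n - q - 1)\, e_j^2}{n - q - e_j^2}, $$

which is increasing in $e_j^2$, so both statistics pick out the same observation. For the Snedecor and Cochran difference we assume $V(x_j - \hat{x}_{-j}) = \sigma^2/(1 - r_{jj})$ is estimated with $S^2_{-j}/(n - q - 1)$; under that assumption its square coincides with Mickey's statistic. The helper names are illustrative only.

```python
import math


def inverse(m):
    """Gauss-Jordan inverse of a small non-singular square matrix (list of lists)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == k else 0.0 for k in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares(rows, x):
    """Return (coefficients, fitted values, residual SS S^2, diagonal elements r_jj)."""
    q = len(rows[0])
    inv = inverse([[sum(r[a] * r[b] for r in rows) for b in range(q)] for a in range(q)])
    atx = [sum(r[a] * xi for r, xi in zip(rows, x)) for a in range(q)]
    theta = [sum(inv[a][b] * atx[b] for b in range(q)) for a in range(q)]
    fitted = [sum(ra * ta for ra, ta in zip(r, theta)) for r in rows]
    ss = sum((xi - fi) ** 2 for xi, fi in zip(x, fitted))
    hat = [sum(r[a] * inv[a][b] * r[b] for a in range(q) for b in range(q)) for r in rows]
    return theta, fitted, ss, hat


def outlier_statistics(rows, x):
    """All leave-one-out quantities from ONE fit, via S^2_{-j} = S^2 - e_j^2 / (1 - r_jj)."""
    n, q = len(rows), len(rows[0])
    _, fitted, S2, r = least_squares(rows, x)
    s2 = S2 / (n - q)
    stats = []
    for j in range(n):
        e_hat = x[j] - fitted[j]                          # estimated residual
        e_stud = e_hat / math.sqrt(s2 * (1.0 - r[j]))     # studentized residual
        S2_minus_j = S2 - e_hat ** 2 / (1.0 - r[j])       # residual SS with x_j omitted
        mickey = (S2 - S2_minus_j) / (S2_minus_j / (n - q - 1))
        stats.append({"r_jj": r[j], "e_hat": e_hat, "e_stud": e_stud,
                      "S2_minus_j": S2_minus_j, "mickey": mickey})
    return S2, stats


# Hypothetical straight-line data (q = 2 parameters) with a gross error at index 4.
rows = [[1.0, float(t)] for t in range(8)]
x = [2.1, 2.4, 3.1, 3.4, 9.0, 4.6, 5.0, 5.4]
n, q = len(rows), len(rows[0])
S2, stats = outlier_statistics(rows, x)

by_stud = max(range(n), key=lambda j: abs(stats[j]["e_stud"]))
by_mickey = max(range(n), key=lambda j: stats[j]["mickey"])
print("max |e_j| at", by_stud, "; max Mickey statistic at", by_mickey)

# Brute-force check: refit n times, omitting each observation in turn.
for j in range(n):
    keep = [i for i in range(n) if i != j]
    theta_j, _, S2_j, _ = least_squares([rows[i] for i in keep], [x[i] for i in keep])
    x_hat_minus_j = sum(a * b for a, b in zip(rows[j], theta_j))
    st = stats[j]
    assert math.isclose(S2_j, st["S2_minus_j"], rel_tol=1e-9)
    diff = x[j] - x_hat_minus_j                            # Snedecor-Cochran difference
    assert math.isclose(diff, st["e_hat"] / (1.0 - st["r_jj"]), rel_tol=1e-9)
    var_diff = (S2_j / (n - q - 1)) / (1.0 - st["r_jj"])   # assumed variance estimate
    assert math.isclose(diff ** 2 / var_diff, st["mickey"], rel_tol=1e-9)
    e2 = st["e_stud"] ** 2
    assert math.isclose(st["mickey"], (n - q - 1) * e2 / (n - q - e2), rel_tol=1e-9)
```

The single fit replaces the $n$ refits, and both maximisers select the planted error. Because each statistic is a monotone function of $e_j^2$, none of them escapes the cloaking effect discussed above.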
7.4 OUTLIERS IN TIME-SERIES

We noted in Chapter 1 two examples of time-series exhibiting outliers. In the first example, illustrated in Figure 1.3, a realization of a non-stationary series of sales figures showed a distinct disruption of the quarterly cyclic pattern of sales, possibly reflecting an outlier. It could be that adverse trading conditions resulting from government action or fiscal policy, short-term emergency company action or even delays in returns of sales figures untypically depressed the sales figure at A but also induced a compensatory untypical sales figure at B. The second example, of moisture content of tobacco (Figure 1.4), exhibited apparently isolated outliers at A, B, etc. which might have reflected malfunction of the recording equipment.

Notice again the complications involved in detecting an outlier. As in general linear model data, it is not necessarily an extreme value and it can be cloaked to some extent by the general structure of the process. In particular, we can experience a smoothing-out effect when we attempt to examine outliers in terms of derived quantities such as the values of estimated residuals about a fitted model. Whilst in the linear model any outlier does not tend to influence adjacent observations per se, merely the estimated residuals, etc., the same need not be true for time-series data in view of the correlational pattern of the basic process. It is fruitful therefore to consider the prospect of two types of outlier.

In the first case an isolated measurement, or execution, error is superimposed on an otherwise reasonable realization of the process. This will not be reflected in the values of adjacent observations and its manifestation can be dramatic and obvious. Such a 'hiccough' effect is possibly what we are noticing in the data of Figure 1.4.

Alternatively, a more inherent discordancy can arise and be reflected by the correlation structure of the process in neighbouring (usually later) observations. The data of Figure 1.3 possibly illustrate this effect. For this type of outlier there is a prospect that the realization itself conspires to conceal the outlier and the detection of outliers becomes more problematical. On the other hand any smoothing-out effected by autocorrelations can have an intrinsic role in accommodating the outlier. It is possible that its influence on parameter estimation or testing in the basic time-series model may be less acute than with independent error structure.

Huber (1972) claims that the 'hiccough' effect is rare; the more usual outlier is of the inherent type revealed more obscurely in 'bumps' and 'quakes'. These are respectively local changes in the mean and variance (requiring corresponding slippage-type alternative models) whose effect extends to influence subsequent observations. From the test-of-discordancy, and accommodation, viewpoints Huber suggests examining coefficients of skewness or kurtosis or applying a smoothing process, respectively, but offers little by way of detailed prescriptions.

Indeed there is little published work on outliers in time-series in terms either of frequency-domain or time-domain analyses and this would seem to be an important and highly challenging area for further study.

One of the few contributions to date is that of Fox (1972) who defines two types of outlier which might occur in time-series data. His type I and type II outliers are precisely the isolated independent gross execution or recording errors, independent of other observations, and the 'inherent' type of anomalous observation which influences succeeding observations, which we distinguished above. Four situations are postulated:

(i) all outliers are of type I,
(ii) all outliers are of type II,
(iii) all outliers are of the same type, but we do not know which type, and
(iv) both types of outlier are present.

How we are to assess which of these situations prevails is not considered apart from remarking that (ii) will be distinguished from (i) by the presence of the carry-over effect. Only situations (i) and (ii) are examined in some detail, on the assumption that the process is free of trend or seasonal factors; the possible effect of their removal on the examination of outliers is not discussed. Thus the methods presented have obvious limitations, compounded by an additional assumption that there is at most one outlier, but they do provide a starting point in this difficult area of study.

A test for type I outliers is developed in relation to the mean-slippage outlier model for a discrete time-series

$$ x_t = u_t + \delta_{jt}\, a \tag{7.4.1} $$

where $\delta_{jt}$ is the Kronecker delta function and the $u_t$ satisfy an autoregressive scheme of order $p$

$$ u_t = \sum_{l=1}^{p} a_l u_{t-l} + z_t \qquad (t = p+1, \ldots, n) \tag{7.4.2} $$

where the $z_t$ are independent $N(0, \sigma^2)$. Thus we have a set of $n$ observations of a discrete process, further restricted by the assumption that $p$ is known and that $\{u_t\}$ is a stationary process, with a superimposed discordant value at time point $j$. Both the cases of prescribed $j$ and unknown $j$ are considered and maximum likelihood ratio tests of $H: a = 0$ against $\bar{H}: a \neq 0$ are developed. For the latter more realistic case ($j$ unknown) the maximum likelihood ratio statistic is equivalent to

$$ \max_{j = p+1, \ldots, n-p} (k_{j,n}) $$

where

(7.4.3)
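The difference between the two types of outlier can be made concrete with a small simulation. This is a minimal sketch in Python, assuming $p = 1$ with the single autoregressive coefficient ($a_1$ in the text) written `phi`, and a shift `shift` (the $a$ of the slippage model) placed at time point $j$; the settings and random seed are hypothetical. The type I series adds the shift to the clean realization at $t = j$ only, as in the mean-slippage model of Fox described above. For type II we assume the shift is added to the innovation at time $j$, so that it is fed through the autoregression.

```python
import random


def ar1_with_outliers(n, phi, j, shift, seed=1):
    """Simulate an AR(1) baseline and two contaminated versions sharing its innovations.

    Type I (isolated error):          x_t = u_t + delta_{jt} * shift
    Type II (inherent, assumed form): x_t = phi * x_{t-1} + delta_{jt} * shift + z_t
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]   # independent N(0, 1) innovations
    u = [z[0] / (1.0 - phi ** 2) ** 0.5]         # start in the stationary distribution
    for t in range(1, n):
        u.append(phi * u[t - 1] + z[t])
    type1 = [ut + (shift if t == j else 0.0) for t, ut in enumerate(u)]
    type2 = [u[0] + (shift if j == 0 else 0.0)]
    for t in range(1, n):
        type2.append(phi * type2[t - 1] + (shift if t == j else 0.0) + z[t])
    return u, type1, type2


# Hypothetical settings: AR coefficient 0.7, a shift of 5 at time point j = 4.
n, phi, j, shift = 12, 0.7, 4, 5.0
u, type1, type2 = ar1_with_outliers(n, phi, j, shift)
for t in range(n):
    print(f"t={t:2d}  type I excess={type1[t] - u[t]:6.3f}  "
          f"type II excess={type2[t] - u[t]:6.3f}")
```

The type I excess over the clean series is zero except at $t = j$, whereas the type II excess is `shift * phi ** (t - j)` for $t \ge j$: this is the carry-over effect by which situation (ii) can be told apart from (i). Since the excess decays geometrically, the two types become hard to distinguish when `phi` is close to zero.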
In (7.4.3) $\hat{\mathbf{a}}$ is $\hat{a}(0, 0, \ldots, 0, 1, 0, \ldots, 0)'$ where the 1 appears in position $j$ and $\hat{a}$ is the maximum likelihood estimate of $a$ under (7.4.1) and (7.4.2); $\hat{W}^{-1}$ and $\tilde{W}^{-1}$ are the maximum likelihood estimates of $W^{-1}$ under $H$ and $\bar{H}$, respectively, where the covariance matrix of the process has the form

$$ V = W\sigma^2 \tag{7.4.4} $$

which depends only on $p$ and the autoregressive coefficients $a_l$ $(l = 1, 2, \ldots, p)$. Note that the elements of $W$ have the form $w_{t,t'} = w_{|t-t'|}$.

The test of discordancy detects the outlier as the observation maximising the $k_{j,n}$ and declares it discordant if the maximum value is sufficiently large. Under $H$ we are involved in determining the distribution of the maximum of a set of $n$ correlated $F$-variates. Significance levels, power calculations, and the behaviour of modified tests are all examined by Fox using simulation methods.

The outlier model used by him for a test of discordancy for a type II outlier has the form

$$ x_t = \sum_{l=1}^{p} a_l x_{t-l} + \delta_{jt}\, a + z_t \tag{7.4.5} $$

with all quantities defined and limited as before. We can see here the carry-over effect of the discordant value. Again the maximum likelihood ratio test of $H: a = 0$ against $\bar{H}: a \neq 0$ is developed, and studied by simulation, for the case where $j$ is specified. The more important case of an unspecified value for $j$ is not pursued. Some implications of employing the wrong model ((7.4.5) instead of (7.4.1) and (7.4.2) or vice versa) are also examined by simulation.

Another approach to the study of outliers in correlated data appears in a paper by Guttman and Tiao (1978). They consider the premium-protection approach to estimating the mean of certain stationary autocorrelated discrete time processes.

CHAPTER 8

Bayesian and Non-Parametric Approaches

Passing reference has already been made to the use of Bayesian, and of non-parametric, methods in different aspects of the study of outliers. In this chapter we draw together the threads of each of these two approaches by considering in more detail the specific proposals that have been made for testing the discordancy of outliers or of coping with their presence in 'contaminated' data. We concentrate mainly on univariate data since most contributions are in this area.

8.1 BAYESIAN METHODS

In the context of the Bayesian approach a test of significance has little relevance. It is useful, for the sake of continuity of argument, to maintain a distinction between the statistical detection of outliers and their accommodation within a broader analysis of the data. The notion of a test of discordancy for assessing the import of outliers has, however, to be appropriately re-expressed.

8.1.1 Bayesian 'tests of discordancy'

In Chapter 1 we remarked briefly on what might seem to be the somewhat anomalous role of Bayesian methods in the study of outliers. The essential nature of an outlier is found in the degree of 'surprise' it engenders when we examine a set of data. Early informal methods of handling outliers consisted in developing procedures for detecting and rejecting them as 'foreign influences' reflecting undesirable errors in the data collection process. Such an attitude does not really fit the Bayesian idiom with its dual regard for the total data set as the basic information ingredient from which conditional inferences are to be drawn and with the likelihood as the full statistical expression of the information in the data. Preliminary processing of the data for detection, and possible rejection, of outliers implies a possibly unwarranted preoccupation with a specific feature of the data with insufficient
regard to its total import. The crucial statement of the likelihood involves a commitment to a fully specified model, which was certainly not a feature of the ad hoc studies of outliers.

Again, the Bayesian approach requires an a priori statement about the propriety of possible models, or about possible values of the parameters in a parametric family of models. This would have to include a prior assessment of probabilities attaching to the presence and form of outliers; the assessment must be made before the data are available, and irrespective of the characteristics of any realized sample. But before we have collected our data how are we to recognize the prospect of outliers; there is nothing to surprise us? There seems to be a degree of conflict here, between a data-keyed response to anomalous observations and a data-independent incorporation of prospective outliers in the likelihood and prior probability assignment.

We have been at pains to stress throughout this book the need to advance beyond the early informal view of outliers to a recognition of the importance of adopting a specific form of outlier-generating model in any development of statistical technique for handling outliers. The inescapable modelling element in the Bayesian approach is thus welcome. Its refinement through the attribution of prior probabilities is also a potentially valuable component in outlier study, provided we really do have some tangible prior information. But to be compelled to produce a prior assessment in any circumstance might be more of an embarrassment than an aid. When all is considered, however, perhaps the major philosophical distinction that remains is found in the irrelevance of the data to the outlier-model specification, and of the sampling rule to the final inference, both of which are essential attitudes in the Bayesian approach. They are both in conflict with the view we have advanced for recognizing, interpreting, and handling outliers on a more classical approach.

The Bayesian attitude asks that we anticipate the possible presence of outliers and structure our data-generating model accordingly before we observe any data. This is a tenable standpoint, and one which some may feel 'more objective', though whether it is an honest expression of what happens in data analysis is another matter.

Notwithstanding the philosophical issues, many Bayesian methods for outlier study have been advanced and we shall proceed to examine some of them. Inevitably the test of discordancy does not have an immediate parallel in Bayesian terms, the final inference being in the form of a posterior distribution. We carry over the term to those situations where the major interest is in drawing conclusions about the outliers, rather than about other parameters (with the outlier merely 'accommodated' at a lower level of interest).

One of the earliest discussions of the Bayesian approach to outliers is that of de Finetti (1961). He is primarily concerned with exploring basic attitudes rather than developing technique. The discussion is set in the context of some quantity $X$ having an initial (prior) distribution which becomes modified in the light of a sample of observations $x_1, x_2, \ldots, x_n$ to yield a final (posterior) distribution for $X$. There is no estimation or testing problem; the total inference is expressed by the final distribution of $X$. Claiming that all inference problems are so represented, de Finetti argues that any reasonable approach to outlier rejection needs to be couched in such terms. He stresses that this raises a fundamental difficulty in that the final distribution (total inference) depends on all the data, an attitude which conflicts with the preliminary rejection of some observations (as outliers). He concludes that if rejection of outliers has any propriety this must hinge on the fact that any observation serving as a candidate for rejection has a 'weak or practically negligible' influence on the final distribution.

This viewpoint opposes much of the rationale for outlier processing described in the earlier chapters of this book. Apart perhaps from the identification of outliers as observations of special intrinsic interest, there would seem to be little point in, or basis for, either rejecting or accommodating outliers if their presence has negligible influence on the inferential import of the data!

In exploring his viewpoint, de Finetti insists that outlier rejection 'could be justified only as an approximation to the exact Bayesian rule (under well-specified hypotheses), but never by empirical ad hoc reasoning'. He seeks substance in an example where interest centres on some 'estimator' x obtained by a Bayesian analysis in the form of a weighted average of the $x_i$:

If the weights are complicated functions of the $x_i$ but a simpler rule yields a good approximation to x and also takes the form of a weighted average but with roughly equal weights for most $x_i$ and negligible weights for the others then these latter observations 'are outliers in regard to this method'. But de Finetti finds such a notion 'vague and rather arbitrary'. To make it more substantial he restricts attention to an 'estimate' expressible as the mean of the final (posterior) distribution, where the initial (prior) distribution is uniform. To explore the possibility of realizing the above manifestation of outliers (as lowly weighted observations in a linear form of inferential interest) de Finetti examines cases where the $x_i$ are independent, exchangeable, or partially exchangeable. Some success is achieved (for uniform prior distributions) when the error distributions (the distributions of $X_i - X$) are rather simple mixtures of some common distributions.

The approach yields no direct, generally applicable, procedures for outlier rejection and the final message is that the Bayesian approach militates against such a prospect. However the author's pessimistically expressed conclusion that the data structure and error model are crucial to what procedure should be employed seems no different to the message this book
in Bayesian terms, the final inference being in the form of a posterior able, or partially exchangeable. Some success is achieved (for uniform prior
distribution. We carry over the term to those situations where the major distributions) wben the error distributions (the distributions of Xi- X) are
interest is in drawing conclusions about the outliers, rather than about other rather simple mixtures of some common distributions.
parameters (with the outlier merely 'accommodated' at a lower level of The approach yields no direct, generally applicable, procedures for outlier
interest). rejection and the final message is that the Bayesian approach militates
One of the earliest discussions of the Bayesian approach to outliers is that against such a prospect. However the author's pessimistically expressed
of de Finetti (1961). He is primarily concerned with exploring basic attitudes conclusion that the data structure and error model are crucial to what
rather than developing technique. The discussion is set in the context of procedure should be employed seems no different to the message this book

has been presenting in relation to more traditional approaches to outlier treatment.

We have already referred (in Sections 2.3 and 4.4) to a novel approach to the modelling of outliers employed by Kale, Sinha, Veale, and others (see Kale and Sinha, 1971). Here it is assumed that $n - k$ of the observations $x_1, x_2, \dots, x_n$ arise from some basic population $F$ whilst the remainder (the outliers) arise from populations $G_1, G_2, \dots, G_k$ different from $F$. It is assumed that prior to taking the observations there is no way of identifying the anomalous subset of size $k$. Furthermore, such identification does not arise from the observed values. Instead, it is assumed that any subset of $k$ of the $n$ observations is equally likely to be the set of observations arising from $G_1, G_2, \dots, G_k$. We termed this the exchangeable model.

It is a moot point whether the use of the uniform distribution for the indices of the anomalous subset in this model implies that the approach is Bayesian in spirit. For a full Bayesian approach a specification of prior probabilities for the forms of the populations $F$ and $G_i$ ($i = 1, 2, \dots, k$) would be required, and inferences would need to be expressed in terms of an appropriate posterior distribution. Kale (1974b) considers this extended prospect in the case where $F$ and $G_i$ ($i = 1, 2, \dots, k$) are all members of the single-parameter family of exponential distributions with parameter values $\theta$ and $\theta_i$ ($i = 1, 2, \dots, k$), respectively. He shows that with minimal restrictions on the prior distribution $p(\theta, \theta_1, \theta_2, \dots, \theta_k)$ we obtain the same prescription for identifying the anomalous subset in terms of that set of $k$ indices which has maximum posterior probability of corresponding with $\theta_1, \theta_2, \dots, \theta_k$: namely that if $\theta_j \le \theta$ ($j = 1, 2, \dots, k_1$) and $\theta_j \ge \theta$ ($j = k_1 + 1, k_1 + 2, \dots, k$), the anomalous observations are $x_{(1)}, x_{(2)}, \dots, x_{(k_1)}$ and $x_{(n-k+k_1+1)}, x_{(n-k+k_1+2)}, \dots, x_{(n)}$. This is of course the intuitively sensible conclusion. Kale also postulates that since no serious restriction was placed on $p(\theta, \theta_1, \theta_2, \dots, \theta_k)$ this result will hold on a more classical approach employing no specification of prior attitudes about the parameters. Indeed, Kale (1974a) has proved the corresponding result for the case $\theta_j = \theta'$ ($j = 1, 2, \dots, k$) with $\theta' \ge \theta$ or $\theta' \le \theta$.
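The prescription amounts to a simple rule on order statistics. The Python sketch below (the function name `kale_anomalous_indices` is hypothetical) assumes that $k$ and $k_1$ are known, with the first $k_1$ alternative populations being those the prescription associates with the lower order statistics. It returns the positions in the original sample of $x_{(1)}, \dots, x_{(k_1)}$ and $x_{(n-k+k_1+1)}, \dots, x_{(n)}$.

```python
def kale_anomalous_indices(x, k, k1):
    """Positions (in x) of the observations the prescription labels anomalous:
    the k1 smallest, x_(1), ..., x_(k1), and the k - k1 largest,
    x_(n-k+k1+1), ..., x_(n)."""
    n = len(x)
    if not 0 <= k1 <= k <= n:
        raise ValueError("need 0 <= k1 <= k <= n")
    order = sorted(range(n), key=lambda i: x[i])  # positions of x_(1), ..., x_(n)
    lower = order[:k1]                            # x_(1), ..., x_(k1)
    upper = order[n - (k - k1):]                  # x_(n-k+k1+1), ..., x_(n); empty if k1 == k
    return sorted(lower + upper)


sample = [2.1, 0.3, 5.7, 1.8, 0.02, 2.4, 9.6]
print(kale_anomalous_indices(sample, 2, 1))
```

With $k = 2$ and $k_1 = 1$ the rule flags the smallest value (0.02) and the largest (9.6). Setting $k_1 = 0$ or $k_1 = k$ gives the purely upper or purely lower cases.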
Most of the following proposals for Bayesian analysis of outliers involve a similar form of exchangeable model to account for the presence of outliers, but take the discussion further by considering the choice of $k$, other forms of distribution $F$ and $G_i$ (although usually limited to $G_i = G$; $i = 1, 2, \dots, k$) and the estimation of parameters in the face of outliers. These include the work of Dempster and Rosner (1971) and Guttman (1973b) on identification and discordancy; and of Box and Tiao (1968) and Sinha (1972, 1973b) on accommodation.

The first example we consider of a more detailed Bayesian approach to detecting and testing outliers is embodied in the proposals of Dempster and Rosner (1971). They describe their approach as 'semi-Bayesian' in that the detection and informal assessment of discordancy of a set of $k$ outliers (for

prescribed $k$) is in the Bayesian mould, but the choice of $k$ does not proceed from any prior probability distribution of possible values for $k$. Instead, the choice of $k$ involves a combination of the fixed-$k$ analyses and classical significance test ideas. This latter aspect will not be considered. We concentrate on the fundamental Bayesian aspects, suggesting minor extensions of interpretation where the authors are not too specific in their proposals.

Suppose that $x_1, x_2, \dots, x_n$ are independent observations from normal distributions with common unknown variance $\sigma^2$. If no outliers are present the distributions all have zero mean. Alternatively, a location-shift model prevails with $k$ of the means different from zero. A Bayesian analysis for a prescribed value of $k$ proceeds as follows.

For given $k$, there are $\binom{n}{k}$ subsets of observations which are candidates for assessment as sets of $k$ outliers. Assuming each subset to be equally likely, a priori, to fulfil this role, the posterior probability $\pi(I)$ that subset $I$ is the outlier subset is proportional to

$$\Bigl[\sum_{i \notin I} x_i^2\Bigr]^{-n/2}$$

on the assumption of 'relatively innocuous' uniform prior distributions for $\log \sigma$ and for the unknown location parameters. (But we are advised to recall some of the anomalies that can arise from multi-parameter uniform prior distributions; see Dawid, Stone, and Zidek, 1973.)

Clearly $\pi(I)$ is maximized when $I$ consists of the $k$ observations with largest absolute values. Such would have been the observations singled out as outlying in a more traditional data-oriented approach where the detection stage proceeds intuitively in terms of the degree of 'surprise'. Here, no such pre-detection is admitted; indeed it is ruled out by the adopted uniform prior distribution of $I$.

It is proposed that the (marginal) posterior probability that $x_i$ is an outlier is measured by

$$p_i = \sum_{I \ni i} \pi(I).$$

The question of whether or not the detected set of $k$ outliers is discordant presumably hinges on how large is

$$\pi_k = \max_{I \in \mathcal{I}} \pi(I) \qquad (8.1.1)$$

where $\mathcal{I}$ is the set of all subsets $I$ of size $k$. We might decide that this needs to exceed, say, 0.95 before we attribute discordancy to the outlier subset.
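When $\binom{n}{k}$ is small, the posterior over outlier subsets can be evaluated by brute force. The Python sketch below enumerates every subset $I$ of size $k$, weights it by $[\sum_{i \notin I} x_i^2]^{-n/2}$ and normalizes. It returns $\pi(I)$, the marginal probabilities $p_i$, the maximizing subset and $\pi_k$ of (8.1.1). The function name `dempster_rosner` and the toy data are ours, not the authors'. The sketch assumes the zero-mean location-shift model above and is only meant for modest $n$ and $k$.

```python
import math
from itertools import combinations


def dempster_rosner(x, k):
    """Posterior quantities for the zero-mean location-shift model with k given.

    pi(I) is taken proportional to [sum over i not in I of x_i^2]^(-n/2).
    Returns (posterior dict over subsets, marginal p_i list, best subset, pi_k).
    """
    n = len(x)
    sq = [v * v for v in x]
    total = sum(sq)
    subsets = list(combinations(range(n), k))  # every candidate outlier subset I
    # log weights avoid underflow of the (-n/2) power for larger n
    logw = [-(n / 2) * math.log(total - sum(sq[i] for i in I)) for I in subsets]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    post = {I: wi / z for I, wi in zip(subsets, w)}
    # p_i: total posterior probability of the subsets that contain i
    p = [0.0] * n
    for I, prob in post.items():
        for i in I:
            p[i] += prob
    best = max(post, key=post.get)  # subset attaining pi_k in (8.1.1)
    return post, p, best, post[best]


toy = [0.3, -0.5, 0.1, 0.8, -0.2, 4.0, -0.6]
post, p, best, pi_1 = dempster_rosner(toy, 1)
print(best, round(pi_1, 4))
```

For the toy sample the maximizing subset is the single value 4.0 (index 5). This agrees with the remark that $\pi(I)$ is maximized by the observations of largest absolute value. The $p_i$ always sum to $k$, because each subset contributes its probability to exactly $k$ of them.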
The question of choice of $k$ is crucial. Dempster and Rosner discuss some fundamental obstacles to a full Bayesian analysis of this matter. Instead, they propose that we consider the sequence of maximized posterior probabilities $\pi_k$ for $k = 1, 2, 3, \dots$ and suggest that we choose $k$ to yield $\pi_k$

'large enough to provide reasonable assurance that the $k$ most discrepant data points are outliers', coupling this rather general prescription with informal aids involving significance testing concepts. An alternative might be to choose $k$ to maximize $\pi_k$ and to conclude that the $k$ detected outliers are discordant if $\max_k \pi_k$ is sufficiently large. But clearly any such proposal would need a careful study of its implications.

For prescribed $k$, Dempster and Rosner suggest estimators of $\sigma^2$ and of the anomalous means $\mu_i$ which robustly accommodate the set of $k$ outliers. In the former case they propose

$$\hat\sigma^2 = \sum_{I \in \mathcal{I}} \pi(I)\, s_I^2 \qquad (8.1.2)$$

where $s_I^2$ is the estimate of $\sigma^2$ appropriate when $I$ is taken as the outlier subset (8.1.3).

For the $\mu_i$ the estimators are

$$\hat\mu_i = \sum_{I \in \mathcal{I}} \pi(I)\, \hat\mu_i(I) \qquad (8.1.4)$$

where

$$\hat\mu_i(I) = \begin{cases} x_i, & \text{if } i \in I,\\ 0, & \text{otherwise.} \end{cases} \qquad (8.1.5)$$

To illustrate their proposals, Dempster and Rosner reconsider the data discussed in Daniel (1959) in his work on half-normal plots. (See Sections 2.2 and 7.1.3.) It is interesting to reproduce some of the results. The Daniel data (the 31 contrasts in a $2^5$ experiment) arranged in ascending order of magnitude appear thus:

 0.0000   0.0281  -0.0561  -0.0842  -0.0982   0.1263   0.1684
 0.1964   0.2245  -0.2526   0.2947  -0.3087   0.3929   0.4069
 0.4209   0.4350   0.4630  -0.4771   0.5472   0.6595   0.7437
-0.7437  -0.7577  -0.8138  -0.8138  -0.8980   1.080   -1.305
 2.147   -2.666   -3.143

Taking the data at face value as observations from $N(\mu_i, \sigma^2)$ (i.e. ignoring their structured origin in the $2^5$ experiment), Dempster and Rosner seek outliers using the ideas above. They tabulate for $k = 1(1)5$ the $\hat\mu_i$ and $p_i$, together with $\hat\sigma_k$ and $\pi_k$. The results are reproduced as Table 8.1 and provide rather compelling evidence for the three observations with largest absolute value being discordant outliers. We notice in particular how $\pi_k$ builds up to 0.9247 at $k = 3$, dropping to 0.3311 at $k = 4$, and the maintenance of anomalously high $\hat\mu_i$ and $p_i$ for the last three observations for $k = 3$ and $k = 4$.
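As a check on the formulas, the sketch below recomputes part of Table 8.1 from the Daniel data. It assumes the signs given in the listing above together with the four-decimal magnitudes of the table, and it implements (8.1.4)-(8.1.5) directly. The helper names are hypothetical. Because $\hat\mu_i(I)$ is $x_i$ when $i \in I$ and 0 otherwise, $\hat\mu_i$ reduces to $x_i p_i$. For $k = 1$ the printed $\pi_1$ should agree with the tabulated 0.7995 to about the printed precision. The $\hat\mu_i$ for the largest observation should be close to 2.5128 in magnitude; it is negative here because the table lists magnitudes only. The sketch omits (8.1.2), since that estimator requires $s_I^2$.

```python
import math
from itertools import combinations

# Daniel (1959) contrasts: signs from the listing in the text, magnitudes
# to four decimals as in Table 8.1 (this pairing is our assumption)
DANIEL = [0.0000, 0.0281, -0.0561, -0.0842, -0.0982, 0.1263, 0.1684,
          0.1964, 0.2245, -0.2526, 0.2947, -0.3087, 0.3929, 0.4069,
          0.4209, 0.4350, 0.4630, -0.4771, 0.5472, 0.6595, 0.7437,
          -0.7437, -0.7577, -0.8138, -0.8138, -0.8980, 1.0804, -1.3049,
          2.1468, -2.6659, -3.1430]


def subset_posterior(x, k):
    """pi(I) proportional to [sum_{i not in I} x_i^2]^(-n/2) over all size-k subsets."""
    n = len(x)
    sq = [v * v for v in x]
    total = sum(sq)
    subsets = list(combinations(range(n), k))
    logw = [-(n / 2) * math.log(total - sum(sq[i] for i in I)) for I in subsets]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    return subsets, [wi / z for wi in w]


def mu_hat(x, k):
    """(8.1.4)-(8.1.5): sum over I of pi(I) * (x_i if i in I else 0)."""
    subsets, post = subset_posterior(x, k)
    est = [0.0] * len(x)
    for I, prob in zip(subsets, post):
        for i in I:
            est[i] += prob * x[i]
    return est, max(post)  # estimates and pi_k of (8.1.1)


for k in (1, 2, 3):
    est, pi_k = mu_hat(DANIEL, k)
    # the last entry is the observation -3.1430
    print(f"k={k}  pi_k={pi_k:.4f}  mu_hat for -3.1430 = {est[-1]:.4f}")
```

The loop over $k$ gives the sequence $\pi_1, \pi_2, \pi_3$ on which Dempster and Rosner base the choice of $k$. Enumerating $\binom{31}{k}$ subsets remains cheap for these small $k$.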

Table 8.1 Bayesian outlier analysis of the Daniel data (reproduced by permission of Academic Press)

         k = 1            k = 2            k = 3            k = 4            k = 5
|x_i|    μ̂_i    p_i       μ̂_i    p_i       μ̂_i    p_i       μ̂_i    p_i       μ̂_i    p_i
0.0000   0.0000 0.0019    0.0000 0.0015    0.0000 0.0016    0.0000 0.0127    0.0000 0.0160
0.0281   0.0001 0.0019    0.0000 0.0015    0.0000 0.0016    0.0004 0.0127    0.0004 0.0160
0.0561   0.0001 0.0019    0.0001 0.0015    0.0001 0.0016    0.0007 0.0128    0.0009 0.0161
0.0842   0.0002 0.0019    0.0001 0.0015    0.0001 0.0016    0.0011 0.0129    0.0014 0.0162
0.0982   0.0002 0.0019    0.0001 0.0015    0.0002 0.0016    0.0013 0.0129    0.0016 0.0163
0.1263   0.0002 0.0019    0.0002 0.0015    0.0002 0.0016    0.0016 0.0131    0.0021 0.0165
0.1684   0.0003 0.0019    0.0002 0.0015    0.0003 0.0016    0.0022 0.0133    0.0029 0.0169
0.1964   0.0004 0.0019    0.0003 0.0015    0.0003 0.0017    0.0027 0.0136    0.0034 0.0173
0.2245   0.0004 0.0019    0.0003 0.0015    0.0004 0.0017    0.0031 0.0139    0.0040 0.0177
0.2526   0.0005 0.0019    0.0004 0.0015    0.0004 0.0017    0.0036 0.0142    0.0046 0.0183
0.2947   0.0006 0.0020    0.0005 0.0015    0.0005 0.0018    0.0044 0.0148    0.0057 0.0192
0.3087   0.0006 0.0020    0.0005 0.0016    0.0005 0.0018    0.0046 0.0150    0.0060 0.0195
0.3929   0.0008 0.0020    0.0006 0.0016    0.0007 0.0019    0.0065 0.0166    0.0087 0.0222
0.4069   0.0008 0.0020    0.0007 0.0016    0.0008 0.0019    0.0069 0.0170    0.0092 0.0227
0.4209   0.0009 0.0020    0.0007 0.0017    0.0008 0.0019    0.0073 0.0173    0.0098 0.0233
0.4350   0.0009 0.0021    0.0007 0.0017    0.0009 0.0020    0.0077 0.0177    0.0104 0.0239
0.4630   0.0010 0.0021    0.0008 0.0017    0.0009 0.0020    0.0086 0.0185    0.0117 0.0252
0.4771   0.0010 0.0021    0.0008 0.0017    0.0010 0.0021    0.0090 0.0189    0.0124 0.0260
0.5472   0.0012 0.0022    0.0010 0.0018    0.0012 0.0022    0.0118 0.0215    0.0166 0.0304
0.6595   0.0015 0.0023    0.0013 0.0020    0.0017 0.0026    0.0181 0.0275    0.0270 0.0410
0.7437   0.0018 0.0025    0.0016 0.0022    0.0022 0.0030    0.0253 0.0341    0.0398 0.0535
0.7437   0.0018 0.0025    0.0016 0.0022    0.0022 0.0030    0.0253 0.0341    0.0398 0.0535
0.7577   0.0019 0.0025    0.0017 0.0022    0.0023 0.0031    0.0268 0.0354    0.0425 0.0561
0.8138   0.0021 0.0026    0.0019 0.0024    0.0028 0.0034    0.0340 0.0417    0.0559 0.0687
0.8138   0.0021 0.0026    0.0019 0.0024    0.0028 0.0034    0.0340 0.0417    0.0559 0.0687
0.8980   0.0025 0.0028    0.0024 0.0027    0.0037 0.0041    0.0492 0.0547    0.0865 0.0963
1.0804   0.0037 0.0034    0.0038 0.0035    0.0068 0.0063    0.1193 0.1104    0.4076 0.3773
1.3049   0.0059 0.0046    0.0071 0.0054    0.0163 0.0125    0.4414 0.3382    1.0523 0.8064
2.1468   0.0507 0.0236    0.1482 0.0690    1.9957 0.9296    2.1321 0.9932    2.1443 0.9989
2.6659   0.3029 0.1136    2.3677 0.8882    2.6537 0.9954    2.6650 0.9997    2.6658 1.0000
3.1430   2.5128 0.7995    3.1055 0.9881    3.1419 0.9996    3.1429 1.0000    3.1430 1.0000
σ̂_k      0.8650           0.7102           0.5856           0.5572           0.5221
π_k      0.7995           0.8762           0.9247           0.3311           0.1825

A somewhat similar, if more specific, application of Bayesian methods to the 'detection of spuriosity' is described by Guttman (1973b). Adopting a slippage-type alternative hypothesis to describe the occurrence of outliers, Guttman produces a procedure for determining whether or not a 'spurious observation' has occurred in the data. This interest in identification and discordancy can be contrasted with the work of Box and Tiao (1968; see also Section 8.1.2), who employ a somewhat similar model to investigate the accommodation issue: the way in which estimates of basic parameters are influenced by the presence of outliers.

Guttman concentrates on a set of independent normal observations $x_1, x_2, \dots, x_n$ arising, in the absence of 'spuriosity', from a common normal distribution, $N(\mu, \sigma^2)$. Under the alternative model, one observation comes

from $N(\mu + a, \sigma^2)$. This location-shift model is a special case of model A of Ferguson (1961a); we shall see that Box and Tiao (1968) deal with the dispersion-shift analogue (model B). It is assumed that any of the observations is equally likely to be the one that is spurious (see Section 2.3 for a discussion of the restriction to one, or at least very few, possible spurious observations), and Guttman offers a 'succinct description' of his model in terms of the likelihood

$$L(\mu, a, \sigma^2 \mid x) = \frac{1}{n}\,(2\pi\sigma^2)^{-n/2} \sum_{i=1}^{n} \Bigl\{ \exp\Bigl[-\frac{1}{2\sigma^2}(x_i - \mu - a)^2\Bigr] \exp\Bigl[-\frac{1}{2\sigma^2}\sum_{j \ne i}(x_j - \mu)^2\Bigr] \Bigr\}. \qquad (8.1.6)$$
Note that this is not equivalent to a mixture-type model where each observation has some small probability of arising from $N(\mu + a, \sigma^2)$ or a larger complementary probability of arising from $N(\mu, \sigma^2)$.
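To make the distinction concrete, the sketch below evaluates the likelihood (8.1.6) alongside a mixture-type likelihood in which each observation independently comes from $N(\mu + a, \sigma^2)$ with probability $\lambda$. The mixing probability `lam` and all function names are our own illustrative assumptions and are not part of Guttman's model. Under (8.1.6) exactly one observation is shifted, so the likelihood averages the $n$ cases 'observation $i$ is the spurious one'. The mixture also allows zero, two or more shifted observations.

```python
import math


def normal_pdf(v, mean, var):
    """Density of N(mean, var) at v."""
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def guttman_likelihood(x, mu, a, var):
    """(8.1.6): exactly one observation, equally likely to be any x_i,
    comes from N(mu + a, var); all the others come from N(mu, var)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        term = normal_pdf(x[i], mu + a, var)   # x_i is the spurious one
        for j in range(n):
            if j != i:
                term *= normal_pdf(x[j], mu, var)
        total += term
    return total / n                           # uniform 1/n over which i slipped


def mixture_likelihood(x, mu, a, var, lam):
    """Mixture alternative (hypothetical lam): each observation independently
    comes from N(mu + a, var) with probability lam, else from N(mu, var)."""
    out = 1.0
    for v in x:
        out *= (1 - lam) * normal_pdf(v, mu, var) + lam * normal_pdf(v, mu + a, var)
    return out


data = [0.2, -0.4, 0.1, 3.5]
print(guttman_likelihood(data, 0.0, 3.0, 1.0))
print(mixture_likelihood(data, 0.0, 3.0, 1.0, 1 / len(data)))
```

With $a = 0$ both functions reduce to the ordinary $N(\mu, \sigma^2)$ likelihood. For $a \ne 0$ they differ, even when $\lambda$ is set to $1/n$.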
Adopting a non-informative prior distribution for $\mu$ and $\sigma^2$ with density proportional to $\sigma^{-2}$, the posterior distribution of $(\mu, \sigma^2, a)$ is obtained. It has probability density function proportional to

$$\sigma^{-(n+2)} \sum_{i=1}^{n} \exp\Bigl[-\frac{1}{2\sigma^2}\Bigl\{A_i + \frac{n-1}{n}\Bigl[a - \frac{n}{n-1}(x_i - \bar{x})\Bigr]^2 + n\Bigl(\mu - \bar{x} + \frac{a}{n}\Bigr)^2\Bigr\}\Bigr] \qquad (8.1.7)$$

where $A_i = \sum_{j \ne i}(x_j - \bar{x}_i)^2$ with $\bar{x}_i = (n-1)^{-1}\sum_{j \ne i} x_j$. Integrating out $\mu$ and $\sigma^2$, the posterior distribution of $a$ is obtained. It has probability density function proportional to

$$\sum_{i=1}^{n}\Bigl\{A_i + \frac{n-1}{n}\Bigl[a - \frac{n}{n-1}(x_i - \bar{x})\Bigr]^2\Bigr\}^{-(n-1)/2} \qquad (8.1.8)$$

which can be regarded as a weighted combination of densities of the Student's $t$ type.

This latter characteristic enables the posterior mean $\mu_a$ and variance $\sigma_a^2$ of $a$ to be obtained explicitly, in terms of weights $c_i$ attached to the separate observations (8.1.9)-(8.1.11).
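The sketch below shows how the $t$-mixture form of (8.1.8) yields the posterior mean and variance of $a$. The algebra is our own reduction of (8.1.8) and is an assumption, not a transcription of (8.1.9)-(8.1.11). Term $i$ is a scaled $t$ density with $n - 2$ degrees of freedom, centred at $n(x_i - \bar{x})/(n - 1)$, and its total mass is proportional to $A_i^{-(n-2)/2}$. Normalizing these masses gives observation weights that we take to play the role of the $c_i$. The function name is hypothetical.

```python
def guttman_posterior_a(x):
    """Treat (8.1.8) as a mixture of n scaled Student t densities with n - 2
    degrees of freedom. Term i is centred at m_i = n (x_i - xbar) / (n - 1),
    has squared scale n A_i / ((n - 1)(n - 2)) and total mass proportional
    to A_i^(-(n - 2)/2). Returns (weights, centres, posterior mean of a,
    posterior variance of a); the variance is finite only for n >= 5."""
    n = len(x)
    if n < 5:
        raise ValueError("the posterior variance of a is finite only for n >= 5")
    xbar = sum(x) / n
    A, m = [], []
    for i in range(n):
        others = x[:i] + x[i + 1:]
        xbar_i = sum(others) / (n - 1)
        A.append(sum((v - xbar_i) ** 2 for v in others))  # A_i
        m.append(n * (x[i] - xbar) / (n - 1))             # centre of term i
    mass = [Ai ** (-(n - 2) / 2) for Ai in A]
    total = sum(mass)
    w = [q / total for q in mass]                          # weights, summing to 1
    mean = sum(wi * mi for wi, mi in zip(w, m))
    # term i has variance (n - 2)/(n - 4) times its squared scale,
    # i.e. n A_i / ((n - 1)(n - 4))
    second = sum(wi * (n * Ai / ((n - 1) * (n - 4)) + mi ** 2)
                 for wi, Ai, mi in zip(w, A, m))
    return w, m, mean, second - mean ** 2


sample = [0.1, -0.3, 0.2, 0.5, -0.1, 0.0, 0.3, 4.0]
weights, centres, post_mean, post_var = guttman_posterior_a(sample)
print([round(c, 4) for c in weights], round(post_mean, 3), round(post_var, 3))
```

For a sample with one clearly discrepant value, nearly all the weight falls on that observation. The posterior mean of $a$ then sits close to its centre $n(x_i - \bar{x})/(n - 1)$. Roughly equal weights near $n^{-1}$ give little reason to suspect spuriosity.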

The attribution of spuriosity to a member of the sample is approached in terms of the posterior distribution of $a$, and in particular of the values of $\mu_a$ and $\sigma_a^2$. If the weights $c_i$ are roughly equal (to $n^{-1}$) we have little evidence of spuriosity. A rough argument is given to support the attribution of spuriosity to an observation $x_i$ whose weight $c_i$ exceeds a certain threshold depending on $n$ (for $n \ge 3$).

Alternatively, Bayesian confidence (credibility) intervals for the parameter $a$ yield criteria for ascribing spuriosity expressed in terms of the distribution function of the $t$-distribution with $(n-2)$ degrees of freedom. Guttman illustrates his recommendations by reference to some sets of simulated data.

The approach can be immediately extended to multivariate data $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ arising from $N(\boldsymbol{\mu}, \mathbf{V})$ if there is no spurious observation, with spuriosity manifest in a single observation from $N(\boldsymbol{\mu} + \mathbf{a}, \mathbf{V})$. Again it is through the posterior distribution of $\mathbf{a}$ that we seek to detect a discordant outlier. The detection criteria again revolve around the values taken by a set of weights attached, in the posterior distribution of $\mathbf{a}$, to the separate observations. It is interesting to note that the implicit concept of extremeness used to detect the outlier is again expressible in terms of the distance metric, or scatter-ratio, discussed in Chapter 6.

8.1.2 Bayesian accommodation of outliers

We have maintained a distinction between two basic attitudes in the study of outliers: identification and rejection on the one hand, accommodation on the other. This distinction also appears in the use of Bayesian methods, and we find several contributions of the accommodation type, where methods of estimation or testing of parameters in a model are proposed which are robust against the presence of outliers.

A major contribution in this category is the work of Box and Tiao (1968), who consider a Bayesian analysis of the linear model when outliers may be present in the data. They particularize their results to a linear model where the error terms arise as independent observations from normal distributions with zero mean but where slippage in the variance may have occurred for a limited number of observations (a more structured form of the Ferguson, 1961a, model B).

We start by considering proposals for the general linear model where the observation vector $x$ has the form

$$x = A\theta + \epsilon \qquad (8.1.12)$$

with $\theta$ a $p \times 1$ vector of parameters, $A$ an $n \times p$ design matrix and $\epsilon$ an $n \times 1$ vector of independent random errors. It is supposed that the individual
278 Outliers in statistical data Bayesian and non-parametric approaches 279

errors may have arisen from one or other of two distributions: a basic distribution f(ε | β₁) or an alternative (outlier-generating) distribution g(ε | β₂). Interest centres on drawing inferences about θ, with the parameter sets β₁ and β₂ regarded as nuisance parameters. Attribution of the individual errors ε_i to f( ) or to g( ) is not triggered by the corresponding observed x_i. Indeed the structure of the model (8.1.12) may render intuitive detection of outliers impossible (see Chapter 7). Instead, events a_(k) are defined under which a specific k of the ε_i come from g( ) and the remainder from f( ), and inferences employ the corresponding likelihood, which is made up of 2ⁿ components p(a_(k) | θ, β₁, β₂) corresponding with all possible a_(k). A general theory is developed leading to a formal expression for the posterior distribution of θ based on general prior distributions {p_(k)} for {a_(k)} and p(θ, β₁, β₂) for (θ, β₁, β₂).

Tangible expression is given to this in terms of the above normal-error model with possible scale-shift. Thus the errors ε_i arise either from N(0, σ²) or from N(0, bσ²). A particular case is studied where each ε_i arises with probability (1 − λ) from N(0, σ²) or with probability λ from N(0, bσ²). It is assumed that b is prescribed (presumably b > 1, to make sense of the notion of outliers) and that (θ, log σ) is independent, uniform, a priori. The posterior distribution of θ is exhibited in the form of a p-dimensional multivariate t-distribution. Marginal distributions of the components θ_i are also derived.

It is interesting to note that the particular explanation adopted for the way in which the errors arise from N(0, σ²) or N(0, bσ²) implies a mixture-type (rather than slippage-type) model for outlier generation. Box and Tiao point out that a modified approach making formal recognition of the 'mixing' leads to the same results as were obtained under their wider formulation, where the likelihood consists of contributions from each of the 2ⁿ configurations of error source.

The method is illustrated for estimation of a single mean, μ. We have x_1, x_2, …, x_n as independent observations, each arising with probability (1 − λ) from N(μ, σ²) or with probability λ from N(μ, bσ²), with λ and b prescribed and σ² unknown. The posterior distribution of μ (adopting uniform, independent, prior distributions for μ and log σ) turns out to have the form

$$\pi(\mu \mid \mathbf{x}) = \sum_{(k)} w_{(k)}\, \frac{(n - \phi k)^{1/2}}{s_{(k)}}\, f_{n-1}\!\left[\frac{\mu - \bar{\mu}_{(k)}}{s_{(k)}/\sqrt{n - \phi k}}\right]. \tag{8.1.13}$$

The summation in (8.1.13) ranges over all events a_(k), and the weights w_(k) are proportional to

$$\left(\frac{\lambda}{1-\lambda}\right)^{k} b^{-k/2} \left(\frac{n}{n - k\phi}\right)^{1/2} \left(\frac{s^2_{(k)}}{s^2}\right)^{-(n-1)/2} \tag{8.1.14}$$

with x̄, s² the sample mean and variance, respectively, x̄_(k) the mean of those x_i attributed to N(0, bσ²) under a_(k), and φ = 1 − b⁻¹,

$$\bar{\mu}_{(k)} = \bar{x} - \frac{k\phi}{n - k\phi}\left(\bar{x}_{(k)} - \bar{x}\right) \tag{8.1.15}$$

…

where Σ′ implies summation over all x_i attributed to N(0, bσ²) under a_(k). The function f_{n−1}( ) is the probability density function of the t-distribution, so that the posterior distribution of μ is a weighted average of 2ⁿ scaled t-distributions with n − 1 degrees of freedom.

Determination of (8.1.13) is most tedious. Proposals are made by Box and Tiao for easing the load which make the exercise feasible, at least for moderate n (up to 20 or so) and small λ. The method is illustrated on a classical set of data due to Darwin on heights of plants, quoted by Fisher (1960, page 37) and examined from an alternative viewpoint to the present one by Box and Tiao (1962). There they attempted to accommodate two lower outliers by using a broader model than that previously employed. In the current context the posterior distribution of the mean μ, allowing for possible discordant outliers, is exhibited in relation to the extreme alternatives that there are no outliers, or that the two outliers are discordant and genuinely arise from the alternative model N(μ, 25σ²). A value for λ of 0.05 is arbitrarily employed, although efforts are made to study the sensitivity of the analysis to the choice of values of λ and b. Within the limited study it appears that the posterior mean and standard deviation of μ are far more sensitive to the value of λ than to the value of b.

In passing we should recall the use of Bayesian methods in the proof of optimality properties of slippage procedures based on the Paulson type of multiple decision approach (see Chapter 5 for some details).

The exchangeable model for outliers has been used by Kale, Sinha, and Veale in developing classical methods for estimating or testing the mean of an exponential distribution, where outliers may be present in the data. (Section 4.4 presented some details of this work.) In the same applications context of life-testing and reliability, with exponentially distributed lifetimes, Sinha (1972, 1973b) has considered corresponding Bayesian methods.

Sinha (1972) considers n independent observations x_1, x_2, …, x_n where all but one (x_i) arise from an exponential distribution with p.d.f.

$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta} \qquad (\theta > 0) \tag{8.1.16}$$
whilst x_i arises from an exponential distribution with p.d.f. f(x, θ/α), where 0 < α ≤ 1. The index i is assumed, a priori, to be equally likely to take any of the values 1, 2, …, n. If (8.1.16) is the basic lifetime distribution, whilst f(x, θ/α) is an inconvenient intrusion representing perhaps an unidentified alien component in the sample, a quantity of basic interest in reliability is the survivor function

$$R_\theta(T) = P(X \ge T) = e^{-T/\theta}$$

and we might wish to estimate this free from serious influence of x_i. In the absence of contamination in the data (i.e. if α = 1) a desirable estimator is

$$\hat{R}(T) = \begin{cases} \left[1 - T/(n\bar{x})\right]^{n-1} & (T \le n\bar{x}) \\ 0 & (\text{otherwise}) \end{cases} \tag{8.1.17}$$

R̂(T) is the uniform minimum variance unbiased estimator of R(T). Sinha examines the variance var[R̂(T)]. Clearly this depends on both T and θ, but this joint dependence takes a simple form in that var[R̂(T)] is a function of the ratio T/θ, and we will denote it V(T/θ).

One aspect of the influence of an outlier on estimation of R(T) is the effect of x_i on E[R̂(T)] and on the mean square error of R̂(T). Both of these are also functions of T/θ, and of α. We will denote them μ_α(T/θ) and MSE_α(T/θ). Their explicit forms are intractable, but Sinha derives lower and upper bounds for each of them.

An alternative approach to investigating μ_α(T/θ) and MSE_α(T/θ) for fixed α is to determine the distribution of the basic statistic nx̄/θ arising from some prescribed prior probability distribution for α, and thence to set bounds on the mean and MSE of R̂(T). No prior distribution is assigned to θ; the approach is accordingly termed 'semi-Bayesian'. For convenience a prior Beta distribution is adopted for α, with p.d.f.

$$p(\alpha) \propto \alpha^{p-1}(1 - \alpha)^{q-1} \qquad (p, q > 0). \tag{8.1.18}$$

Denoting the posterior mean and MSE of R̂(T) by μ_{p,q}(T/θ) and MSE_{p,q}(T/θ) respectively, Sinha shows that

$$\frac{p}{p+q}\, e^{-T/\theta}\, {}_2F_2(1, q;\, n, p+q+1;\, T/\theta) \le \mu_{p,q}(T/\theta) \le e^{-T/\theta} \tag{8.1.19}$$

and

$$\frac{p}{p+q}\, {}_2F_2(1, q;\, n, p+q+1;\, T/\theta)\, k(T/\theta) - e^{-2T/\theta} \le \mathrm{MSE}_{p,q}(T/\theta) \le k(T/\theta) - \frac{2p}{p+q}\, e^{-2T/\theta}\, {}_2F_2(1, q;\, n, p+q+1;\, T/\theta) + e^{-2T/\theta} \tag{8.1.20}$$

where ₂F₂( ) is the hypergeometric function, and

$$k(T/\theta) = \frac{(T/\theta)^n\, e^{-T/\theta}}{\Gamma(n)}\, J \tag{8.1.21}$$

with

$$J = \int_0^\infty \frac{e^{-uT/\theta}\, u^{2n-2}}{(1+u)^{n-1}}\, du. \tag{8.1.22}$$

In principle these results give some indication of how the sampling behaviour of R̂(T) is affected by the presence of a single outlier, but their complexity militates against any simple interpretation of this influence; the prior probability structure for α is arbitrary, and there seems no good reason why we should maintain R̂(T) as an estimator when an outlier is present. Indeed Kale and Sinha (1971) proposed the use of

$$s = \sum_{j=1}^{n-1} x_{(j)} + x_{(n-1)}$$

instead of nx̄ when a single spurious observation is present, since this is most likely to correspond with x_(n). Accordingly

$$R^*(T) = \begin{cases} (1 - T/s)^{n-1} & (T \le s) \\ 0 & (\text{otherwise}) \end{cases} \tag{8.1.23}$$

has some appeal as an estimator of R(T), but no Bayesian analysis of R*(T) is offered.

Sinha (1973b) considers a fuller Bayesian treatment, employing again the prior distribution (8.1.18) for α, and three possible families of prior distributions for θ (independent of α). He derives the Bayes estimators of α, of θ (the mean life-time), and of the survivor function R(T). The forms are again highly complicated, and specific to the chosen prior structures. No simple qualitative interpretation of the influence of the outlier is offered, nor does it seem feasible.

An alternative basic exponential model with p.d.f.

$$g(x, \mu) = \exp[-(x - \mu)] \qquad (x > \mu) \tag{8.1.24}$$

where the outlier arises from an exponential distribution with p.d.f. g(x, μ + θ) (x > μ + θ, θ > 0), is also considered in Sinha (1972) and Sinha (1973b).

Finally, in this brief review of some Bayesian and 'semi-Bayesian' methods for studying outliers, we must mention a proposal by Lingappaiah (1976) for estimating the shape parameter in a different wide-ranging family of distributions (including the Weibull and gamma) where several outliers may be present, arising from different members of the same family of distributions. The basic model has p.d.f.

… (x > 0). (8.1.25)

In a sample of size n we contemplate the prospect that k of the n observations arise from (8.1.25) with β replaced by θ_iβ (i = 1, 2, …, k; 0 < θ_i ≤ 1).
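Sinha's estimators (8.1.17) and (8.1.23), and the function k(T/θ) of (8.1.21) and (8.1.22), are easy to explore numerically. The sketch below computes R̂(T), the Kale and Sinha variant R*(T), and k(T/θ) by composite Simpson quadrature of J. Two statements here are our own derivation rather than quotations: first, that k(T/θ) equals E[R̂(T)²] for an uncontaminated sample (substitute nx̄ = T(1 + u) in the gamma density of nx̄); second, that unbiasedness then gives V(T/θ) = k(T/θ) − e^{−2T/θ}. The function names, the truncation point of the integral and the Monte Carlo check are illustrative assumptions.

```python
import math
import random


def r_hat(sample, T):
    """UMVUE (8.1.17) of R(T) = exp(-T/theta) for an exponential sample."""
    n = len(sample)
    total = sum(sample)  # n * xbar
    if T > total:
        return 0.0
    return (1.0 - T / total) ** (n - 1)


def r_star(sample, T):
    """Kale-Sinha variant (8.1.23): x_(n) is replaced by x_(n-1) in the total."""
    xs = sorted(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    s = sum(xs[:-1]) + xs[-2]
    if T > s:
        return 0.0
    return (1.0 - T / s) ** (n - 1)


def k_second_moment(n, ratio, steps=20000):
    """k(T/theta) of (8.1.21)-(8.1.22), with ratio = T/theta, by Simpson's rule."""
    if n < 2 or ratio <= 0:
        raise ValueError("need n >= 2 and T/theta > 0")
    log_front = n * math.log(ratio) - ratio - math.lgamma(n)

    def integrand(u):
        if u == 0.0:
            return 0.0  # u**(2n-2) vanishes at u = 0 when n >= 2
        return math.exp(log_front + (2 * n - 2) * math.log(u)
                        - u * ratio - (n - 1) * math.log1p(u))

    # For large u the integrand behaves like a Gamma(n, rate=ratio) kernel, so
    # truncating far beyond its mean n/ratio discards a negligible tail.
    upper = (n + 12.0 * math.sqrt(n) + 40.0) / ratio
    h = upper / steps  # steps is even, as Simpson's rule requires
    acc = integrand(0.0) + integrand(upper)
    for j in range(1, steps):
        acc += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return acc * h / 3.0


def variance_r_hat(n, ratio):
    """V(T/theta) = k(T/theta) - exp(-2T/theta), assuming no contamination."""
    return k_second_moment(n, ratio) - math.exp(-2.0 * ratio)


def monte_carlo_moments(n, ratio, reps=100000, seed=0):
    """Simulated E[R_hat] and E[R_hat^2] with theta = 1 and T = ratio."""
    rng = random.Random(seed)
    m1 = m2 = 0.0
    for _ in range(reps):
        r = r_hat([rng.expovariate(1.0) for _ in range(n)], ratio)
        m1 += r
        m2 += r * r
    return m1 / reps, m2 / reps
```

As an example, take the hypothetical sample [0.5, 1.2, 0.8, 0.3, 20.0] with T = 1. Here r_hat gives (1 − 1/22.8)⁴ ≈ 0.836, while r_star uses s = 4.0 and gives 0.75⁴ ≈ 0.316. The single large value therefore dominates R̂(T) but not R*(T).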
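Guttman's single-outlier posterior (8.1.8) can also be evaluated directly. We integrated each term of (8.1.8) over a ourselves; this is our derivation, not a quotation, because (8.1.9) to (8.1.11) are not legible in this copy. The result is that the i-th term is a t density with n − 2 degrees of freedom. It is centred at m_i = n(x_i − x̄)/(n − 1), which equals x_i − x̄_i, has squared scale nA_i/{(n − 1)(n − 2)}, and carries relative weight proportional to A_i^{−(n−2)/2}. The sketch takes these normalised weights as the c_i of the text. That identification, the function name and the illustrative data are assumptions.

```python
import math


def guttman_shift_posterior(x):
    """Mixture summary of the posterior (8.1.8) for the single-outlier shift a.

    Returns (weights, centres, mean, variance); weights[i] is the normalised
    weight of the i-th Student t term, taken here as Guttman's c_i.
    """
    n = len(x)
    if n < 5:
        raise ValueError("n >= 5 is needed for a finite posterior variance")
    total = sum(x)
    xbar = total / n
    nu = n - 2  # degrees of freedom of each t term
    log_w, centres, scales2 = [], [], []
    for i, xi in enumerate(x):
        xbar_i = (total - xi) / (n - 1)  # mean of the other n - 1 values
        a_i = sum((xj - xbar_i) ** 2 for j, xj in enumerate(x) if j != i)  # A_i
        log_w.append(-0.5 * nu * math.log(a_i))  # weight proportional to A_i^(-(n-2)/2)
        centres.append(n * (xi - xbar) / (n - 1))  # equals xi - xbar_i
        scales2.append(n * a_i / ((n - 1) * nu))
    top = max(log_w)
    raw = [math.exp(v - top) for v in log_w]
    z = sum(raw)
    weights = [r / z for r in raw]
    mean = sum(w * m for w, m in zip(weights, centres))
    second = sum(w * (s2 * nu / (nu - 2) + m * m)
                 for w, m, s2 in zip(weights, centres, scales2))
    return weights, centres, mean, second - mean ** 2
```

For the hypothetical sample [1.0, 1.2, 0.9, 1.1, 1.0, 5.0], almost all the weight falls on the last observation. The posterior mean of a is then close to 5.0 − 1.04 = 3.96, the gap between that value and the mean of the other five. This is the pattern in which one c_i greatly exceeds n⁻¹.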
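For small n, the Box and Tiao (1968) mixture (8.1.13) for a single mean can be evaluated by brute force over all 2ⁿ configurations a_(k). The sketch uses the location (8.1.15) and weights of the form (8.1.14). The defining expression for s²_(k) is not legible in this copy, so the sketch uses a version we derived from the stated model (uniform prior on μ and log σ): (n − 1)s²_(k) = Σ(x_i − x̄)² − φΣ′(x_i − x̄)² − φ²k²(x̄_(k) − x̄)²/(n − kφ). Treat that expression, the function names and the data as assumptions. This is not Box and Tiao's code, and it makes no use of their short cuts, so it is practical only for n up to about 20.

```python
import itertools
import math


def box_tiao_components(x, lam, b):
    """(weight, location, scale) of each of the 2**n scaled t terms in (8.1.13)."""
    n = len(x)
    if n < 3 or not 0.0 < lam < 1.0 or b <= 0.0:
        raise ValueError("need n >= 3, 0 < lam < 1 and b > 0")
    phi = 1.0 - 1.0 / b
    xbar = sum(x) / n
    d = [xi - xbar for xi in x]
    ss = sum(di * di for di in d)  # (n - 1) s^2
    if ss == 0.0:
        raise ValueError("all observations are equal")
    comps = []
    for config in itertools.product((0, 1), repeat=n):
        alt = [i for i, c in enumerate(config) if c]  # x_i from N(mu, b sigma^2)
        k = len(alt)
        n_eff = n - k * phi
        dk = sum(d[i] for i in alt)        # k (xbar_(k) - xbar)
        ssk = sum(d[i] ** 2 for i in alt)  # Sigma' (x_i - xbar)^2
        loc = xbar - phi * dk / n_eff      # (8.1.15)
        resid = ss - phi * ssk - (phi * dk) ** 2 / n_eff  # (n - 1) s_(k)^2, derived
        log_w = (k * math.log(lam / (1.0 - lam)) - 0.5 * k * math.log(b)
                 + 0.5 * math.log(n / n_eff)
                 - 0.5 * (n - 1) * math.log(resid / ss))   # form of (8.1.14)
        scale = math.sqrt(resid / ((n - 1) * n_eff))       # s_(k) / sqrt(n - k phi)
        comps.append((log_w, loc, scale))
    top = max(c[0] for c in comps)
    total = sum(math.exp(c[0] - top) for c in comps)
    return [(math.exp(lw - top) / total, loc, sc) for lw, loc, sc in comps]


def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(t * t / nu))


def posterior_density(mu, comps, n):
    """pi(mu | x) as the weighted sum of scaled t densities, as in (8.1.13)."""
    return sum(w * t_pdf((mu - loc) / sc, n - 1) / sc for w, loc, sc in comps)


def posterior_mean(comps):
    return sum(w * loc for w, loc, _ in comps)
```

With the hypothetical data [10.1, 9.8, 10.3, 9.9, 10.0, 14.0], λ = 0.05 and b = 25 (the values quoted for the Darwin illustration), the posterior mean moves most of the way from x̄ ≈ 10.68 towards 10.02, the mean of the five clustered values. With b = 1 every configuration is centred at x̄, and the posterior mean reduces to x̄.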

Adopting an exponential prior distribution for β and Beta prior distributions for the θ_i, and assuming for fixed k that the set of outliers is equally likely to be any set of k (< n) observations, the posterior distribution of β is obtained for fixed (a, b). The Bayes estimator of β is also derived. Particular cases are derived for k = 1, and where (8.1.25) reduces to a Weibull, gamma, or exponential distribution.

This contribution by Lingappaiah exemplifies the impracticality of much of the Bayesian contribution to the study of outliers in the literature to date! Notwithstanding the fundamental obstacles confronting a Bayesian approach in this branch of statistics, some offerings have been made at the Bayesian altar. But they are far from acceptable in terms of the arbitrary mix of Bayesian and sampling-theoretic components, uniform distributions of the outliers over the set of observations (denying the principle of 'surprise' in the identification of outliers), expedient (unjustified and undiscussed) choice of prior distributions for an arbitrary subset of the basic parameters in the model, and formal and unmanageable results with little interpretation or application. Not all such unsatisfactory components are present at the same time, but some exist in almost all the Bayesian contributions, and we must conclude that very much remains to be done to achieve a convincing advance in the Bayesian study of outliers.

8.2 NON-PARAMETRIC METHODS

Non-parametric procedures for identifying outliers, testing their discordancy, or rendering them uninfluential in a statistical analysis of the bulk of the data have been presented in a variety of contexts. The original approach to the slippage problem (Mosteller, 1948) was non-parametric and generated a flood of refinements or modifications. Chapter 5 has discussed these in some detail.

Non-parametric methods for the analysis of data arising from designed experiments extend the slippage problem in a more structured form; looking for effects of different factors in the designed experiment can be interpreted (to a degree) as identifying extreme, or outlying, sub-samples with special intrinsic interest. The outlying sub-samples are not an inconvenience; they are the very manifestations we seek in order to express the statistical import of the data. But in designed experiments it may happen that individual observations, rather than sub-samples corresponding with certain factor levels, are anomalous. Such outliers have a different role: they serve to cloud the treatment effects we are investigating, and need either to be rejected on a sound statistical basis or accommodated with minimal import in a robust analysis of the data from the standpoint of principal interest. Of course, such outliers may not be immediately apparent on simple inspection. They are extreme only relative to some peer group, e.g. the members of the sub-sample corresponding with a certain factor level. Thus we are seeking to express anomalies of pattern, or of residuals, in the data.

Chapter 7 has considered this problem in some detail, together with its corresponding form in regression or time-series analyses. Many appropriate procedures are non-parametric in form, based on signs or ranks or inversions. An interesting paper on the basic philosophy of identifying outliers in designed experiments, in the sense of disruptors of overall pattern, is by Bross (1961). It employs the notion of inversion in the data as a principal basis for identifying outliers.

Non-parametric proposals for the more fundamental outlier problem (identification and discordancy testing in a single unstructured univariate sample) are less in evidence. Some contributions have been made by Walsh [1950 (and correction, 1953), 1959, 1965]; the accommodation issue is briefly considered by Walsh and Kelleher (1973). We conclude this chapter by considering some of Walsh's proposals, and by posing a few questions about the value of non-parametric and distribution-free methods in the context of outliers.

Some non-parametric tests of outliers are discussed and described by Walsh (1965, in the second volume of his book on non-parametric statistics), including the proposals in Walsh (1950). For one of the proposed tests the basic model assumes that the data consist of independent observations from common symmetric distributions with median φ; the alternative model postulates upward slippage in location of a prescribed number, k, of the distributions. The test statistic employs the values of the k extremes x_(n−k+1), …, x_(n). It has a complicated form, is laborious to operate, has unknown but bounded significance level, and is useful only for k ≥ 4 (at least four upper outliers).

The test operates as follows. T is some integer less than or equal to k, and {u_t} and {v_t} are chosen sequences of T numbers, monotone increasing in t, with u_T = k. We reject the basic model H₀ if

$$S(\alpha) = \min_{1 \le t \le T}\left[x_{(n+1-u_t)} - x_{(v_t)}\right] - 2x_{(w(\alpha))} > 0 \tag{8.2.1}$$

where

$$\alpha = P\left\{ \bigcup_{t=1}^{T} \left[X_{(n+1-u_t)} - X_{(v_t)} > 2\phi\right] \,\middle|\, H_0 \right\} \tag{8.2.2}$$

and w(α) is the smallest integer such that

… (8.2.3)

The significance level of the test is bounded above by 2α, and approaches α for n sufficiently large, provided certain additional assumptions about the distributions of the x_(i) are satisfied. The restrictions of large n, at least four outliers, and computational complexity severely limit the usefulness of the test.

Obvious modifications produce tests for k lower outliers or two-sided
tests.

Another type of test for large samples is proposed by Walsh (1959). The
distributions need not now be symmetric; the alternative model is again of
the location slippage type. The asymptotic forms of the moments, and
distributional behaviour, of order statistics are invoked to produce tests
for k upper outliers (or for k lower, or a corresponding two-sided test).
The one-sided tests for k upper outliers have rejection criterion of the
form

$$x_{(n+1-k)} - [1 + A_n(\alpha)]\,x_{(n-k)} + A_n(\alpha)\,x_{(n+1-s)} > 0 \qquad (8.2.4)$$

where s is the largest integer less than $k + \sqrt{2n}$ and $A_n(\alpha) > 0$
depends on n and on an upper bound $\alpha$ for the significance level of the
test. The test statistic is chosen as a particular case of the more general
form

$$S = x_{(n+1-k)} - (1 + A)\,x_{(r)} + A\,x_{(n+1-s)}$$

to meet certain requirements about the test behaviour. Given k and n,
$r = n - k$ effects an approximate large-sample minimization of var(S) under
the basic model, which postulates homogeneity of distribution for the
sample observations. Subject to this, s and A are determined by the
additional assumptions that $A > 0$ and that

$$E(S) = K\sqrt{\operatorname{var}(S)}\,[1 + o(1)] \qquad (8.2.5)$$

under the basic model, for prescribed K. Chebyshev's inequality implies

$$P(S < 0) \lesssim 1/K^2 \qquad (8.2.6)$$

so that $1/K^2$ is an estimate of the significance level of the test.
Specifically, we are led to the prescription

$$s = k + [\sqrt{2n}] \qquad (8.2.7)$$

and

$$A = \frac{1 + K\sqrt{([\sqrt{2n}] - K^2)/([\sqrt{2n}] - 1)}}{[\sqrt{2n}] - K^2 - 1}. \qquad (8.2.8)$$

The test is considered applicable for large enough n: namely where

$$\sqrt{2n} > K^2 + 1. \qquad (8.2.9)$$

A fairly detailed discussion of the form and properties of this test is
given by Walsh (1959).

Such a test inevitably suffers from a variety of difficulties of conception
and application. Determination of its precise form is again rather tedious.
The structure (8.2.5) and the rough probabilistic assessment provided by
Chebyshev's inequality must seriously limit any assessment of test
properties and applicability. The need to specify K (and k) introduces an
unreasonable degree of arbitrariness. Limitation to very large n (as is
often implied) takes us away from the most contentious and important area
of application of outlier tests. Finally, we need to consider the problem
of the likely robustness or power of non-parametric tests in the particular
context of outlier studies. We return to this shortly.

One further contribution of non-parametric methods for outliers appears in
the work of Walsh and Kelleher (1973). They consider the unbiased
estimation of the mean and variance of a continuous distribution from
which a set of n independent observations is purported to arise, but where
there is the possibility of some upper and lower outliers. The numbers of
upper and lower outliers are prescribed, and small in relation to n. The
assumptions in the work are similar to those employed in Walsh (1959).

It is a feature of non-parametric procedures that we adopt a minimum of
distributional assumptions about the data-generating mechanism. Such
procedures can have relatively low power in comparison with procedures
specifically geared to a particular detailed parametric model. If such a
model can be justified we would often wish to employ methods tailored to
it. Otherwise, non-parametric methods are appealing in their lack of
commitment to a model, that is, in their ubiquity or robustness. But this
type of appeal has an element of delusion in the outlier context. Outliers
are 'atypical' observations; they impress themselves on us by appearing to
be unrepresentative of the overall sample data. With no knowledge (or
assumptions) about the general distributional structure of the
data-generation process we have no grounds for 'surprise', nothing
'typical' against which to ascribe 'atypicality'. The macrolepidoptera
light-trap data at the end of Section 1.2 illustrate this well. It is only
by considering (at least informally) the way in which the data might
reasonably have been generated that we have any basis for examining the
possibility of discordant outliers. There must always be the possibility of
a homogeneous explanation of the values in any sample.

In its most extreme form a non-parametric approach makes no assumptions
about the basic data-generating mechanism. At this level it seems a
contradiction to seek to investigate outliers. Broad specifications such as
symmetry of the basic distribution, or location-slippage explanations of
outliers, raise some prospect for outlier study but seem bound to be highly
speculative in their conclusions. If such specifications are as much as we
dare contemplate, then perhaps we have no alternative but to accept the
highly limited assessment of outliers yielded by a non-parametric approach.
But more than in almost any other area of statistical enquiry, the study of
outliers hinges on as precise a model formulation as is feasible. To
deliberately abandon the model, by seeking non-parametric (or
distribution-free) methods in some broad aim of robustness, smacks of
throwing out the bathwater before the baby has even been immersed.
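
The Walsh (1959) rule (8.2.4), with constants (8.2.7) and (8.2.8), is mechanical enough to sketch in a few lines of Python. This is a minimal illustration and not Walsh's own formulation. It makes three assumptions not stated above. First, K is taken as $1/\sqrt{\alpha}$, so that the Chebyshev bound $1/K^2$ of (8.2.6) equals the nominal level. Second, $[\sqrt{2n}]$ is read as the integer part. Third, the slightly stricter integer form $[\sqrt{2n}] > K^2 + 1$ of condition (8.2.9) is enforced, which keeps the denominator of (8.2.8) positive. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def walsh_constants(n: int, k: int, alpha: float) -> Tuple[int, float]:
    """Return (s, A) for the large-sample rule (8.2.4), via (8.2.7) and (8.2.8).

    Assumption: K = 1/sqrt(alpha), so the Chebyshev bound 1/K**2 equals alpha.
    """
    K2 = 1.0 / alpha                 # K^2
    c = math.isqrt(2 * n)            # [sqrt(2n)]: integer part, computed exactly
    # Integer form of (8.2.9); it also keeps the denominator of (8.2.8) positive.
    if c <= K2 + 1:
        raise ValueError("n too small for this alpha: need [sqrt(2n)] > K^2 + 1")
    s = k + c                                                               # (8.2.7)
    A = (1 + math.sqrt(K2) * math.sqrt((c - K2) / (c - 1))) / (c - K2 - 1)  # (8.2.8)
    return s, A


def walsh_upper_test(x: Sequence[float], k: int, alpha: float) -> bool:
    """Return True if the k largest values are declared discordant by (8.2.4)."""
    xs = sorted(x)
    n = len(xs)
    s, A = walsh_constants(n, k, alpha)
    if n + 1 - s < 1:
        raise ValueError("sample too small: need n >= s")

    def order(i: int) -> float:      # 1-based order statistic x_(i)
        return xs[i - 1]

    stat = order(n + 1 - k) - (1 + A) * order(n - k) + A * order(n + 1 - s)
    return stat > 0
```

Take $\alpha = 0.10$ (so $K^2 = 10$) and n = 72. Then $[\sqrt{144}] = 12$, so s = k + 12 and A is about 2.348. A sample 1, 2, ..., 71 with one further value of 1000 is rejected for k = 1, while 1, 2, ..., 72 is not. At $\alpha = 0.05$, $K^2 = 20$ and (8.2.9) already requires n of at least 221; the integer check in the sketch needs n of at least 242. This illustrates the limitation to very large n noted above.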
CHAPTER 9

Perspective

We have covered a lot of ground in this study of outliers. We hope that in
the process we have both armed the experimental scientist with a range of
useful techniques and provided some food for thought for the professional
statistician in clarifying basic principles and indicating some of the
methodological gaps.

The outlier problem seems to be arousing more interest today than it has
ever done, in spite of its long history. The pages of many statistical
journals frequently contain new contributions, and the reader who wants to
keep abreast of the subject would find it profitable to keep a regular eye
on journals such as Technometrics, the Journal of the American Statistical
Association, and Applied Statistics.

Current research activity seems to centre largely on informal methods for
outliers in highly structured situations (with an emphasis on computer
application and graphical display) and on refinements of understanding of
familiar univariate single-sample procedures. On both fronts useful results
abound, but much remains to be done. Some of the directions for the study
of univariate single-sample procedures are clear. We need to know much
more about performance criteria of tests of discordancy: this embodies the
need for a greater understanding of distributional theory elements. Many of
the outlier-generating models warrant further investigation; the
interactions between, and respective relevances of, the different models
have not been appropriately examined. The whole question of block and
consecutive testing procedures needs a thorough investigation such as it
has never yet received. These important practical and conceptual matters
are also likely to involve some challenging mathematical work.

But when all is said and done, the major problem in outlier study remains
the one that faced the very earliest workers in the subject: what is an
outlier? We have taken the view that the stimulus lies in the subjective
concept of surprise engendered by one, or a few, observations in a set of
data, and that this surprise initiates an investigation of the statistical
propriety (or influence) of the detected outliers. We have noted that
surprise is not always immediate; for example, in multivariate or highly
structured data an explicit detection process may be required to reveal the
outliers (the 'surprising' observations). However, the chain of operations
remains the same. We find an observation surprising, we then proceed to
investigate it from the statistical viewpoint, and conclude by rejecting
it, welcoming it, or accommodating it. If accommodation is the aim, the
analysis proceeds a further stage, to a treatment of the overall data set
with allowance for the presence of the outlier.

Other viewpoints exist, however. We must face the fact that such a
sequence of operations was, and is, not universally accepted as reasonable.
From the Bayesian viewpoint, there are philosophical obstacles to the
pre-processing of selected observations from a larger data set (though less
serious objections to the subjectivity of the outlier detection process, at
least if formalized). The statistician adopting a more classical approach
may find the subjective element not to his taste. Insofar as he is
prepared to countenance the rejection, for example, of individual members
of a sample, he may require this to be done 'objectively' by a routine
procedure carried out regularly and indiscriminately on every similar set
of data that arises. Concerning such fundamental objections, it is indeed
not clear that the implications of the subjective detection of outliers
are, or even can be, appropriately measured and reflected in the fuller
statistical analysis of a set of data. Of course, proper concern for the
construction of the outlier model can go a long way to resolve this
problem.

Another difficulty which we have to keep firmly in view is that of masking.
Its manifestation in single univariate samples is straightforward. But in
more complex situations it needs to be viewed in a much wider context. In a
designed experiment it remains true that one outlier may not be declared
discordant because of the masking effect of another. In another respect the
very presence of outliers may be masked by particularly strong real effects
or by breakdowns in the conventional assumptions, such as normality,
homoscedasticity, additivity, and so on. In reverse we may falsely
attribute idiosyncratic facets of the behaviour of the data to such
breakdowns, whilst in reality they truly reflect the presence of individual
discordant values.

Multivariate or highly structured data further highlight the 'subjectivity'
versus 'objectivity' argument. We have remarked above on the more nebulous
nature of 'surprise' in such situations. Explicit detection procedures are
needed for outliers here, and almost inevitably data analysis methods will
be computerized, perhaps augmented by graphical display. The modern
preoccupation with thoughtless do-it-yourself computer packages has a
spill-over effect. We will not be surprised at pleas for outlier packages
which relieve the analyst of any responsibility. But computers are not
easily taught to be surprised. The concept is a human one. It can only to
a limited extent be translated into a mechanized form. On the extreme
viewpoint that the element of surprise must be central to the study of
outliers, so, it would
follow, must the individual continue to shoulder the major burden of outlier
detection through his personal and regular intervention in any
data-screening procedure. Of course we can, and should, react to summarized
or graphically presented data from the computer to assist in the detection
of outliers. Of course we can build into the computer formal procedures for
testing discordancy. But we do this to a large extent at the sacrifice of
the subjective 'surprise' stimulus. It is too much to expect to be able to
teach the computer just what it is that would engender the surprise in you
(or me) necessary as the precursor of a test of discordancy.

The very fact that we cannot formalize the 'surprise' is only part of the
problem. Even on an 'objective' approach to outlier processing the computer
may have its disadvantages to be set against its indispensability as a
digester and presenter of large-scale data. In a highly structured
situation such as a designed experiment the notion of an outlier remains so
primitive (perhaps an extreme residual, perhaps some sort of unexpected
break in pattern) that any total replacement of situation-specific analysis
by depersonalized routine computer processing could inhibit the development
of clearer understanding of outliers in such areas.

What of the future? There are fashions in statistics as in all things.
Outliers were out of fashion for long enough. They exist as part of the
experimentalist's reality; they are part of the analyst's inescapable
responsibility. Surely the professional statistician cannot reasonably
withhold his contribution to outlier methodology on the grounds that he is
not too sure how to define an outlier?

APPENDIX

Statistical Tables

Table I Critical values for 5% and 1% tests of discordancy for an upper outlier in a gamma sample, using the ratio $x_{(n)}/\sum x_i$ as test
statistic. This table is reproduced, with permission from McGraw-Hill Book Company, from Eisenhart, Hastay, and Wallis (1947),
Tables 15.1 and 15.2, with appropriate change of notation
Ga1(Ea1)
5% critical values

n \ r  0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 8 18 72 ∞

2 0.9985 0.9750 0.9392 0.9057 0.8772 0.8534 0.8332 0.8159 0.8010 0.7880 0.7341 0.6602 0.5813 0.5000
3 0.9669 0.8709 0.7977 0.7457 0.7071 0.6771 0.6530 0.6333 0.6167 0.6025 0.5466 0.4748 0.4031 0.3333
4 0.9065 0.7679 0.6841 0.6287 0.5895 0.5598 0.5365 0.5175 0.5017 0.4884 0.4366 0.3720 0.3093 0.2500
5 0.8412 0.6838 0.5981 0.5441 0.5065 0.4783 0.4564 0.4387 0.4241 0.4118 0.3645 0.3066 0.2513 0.2000
6 0.7808 0.6161 0.5321 0.4803 0.4447 0.4184 0.3980 0.3817 0.3682 0.3568 0.3135 0.2612 0.2119 0.1667
7 0.7271 0.5612 0.4800 0.4307 0.3974 0.3726 0.3535 0.3384 0.3259 0.3154 0.2756 0.2278 0.1833 0.1429
8 0.6798 0.5157 0.4377 0.3910 0.3595 0.3362 0.3185 0.3043 0.2926 0.2829 0.2462 0.2022 0.1616 0.1250
9 0.6385 0.4775 0.4027 0.3584 0.3286 0.3067 0.2901 0.2768 0.2659 0.2568 0.2226 0.1820 0.1446 0.1111
10 0.6020 0.4450 0.3733 0.3311 0.3029 0.2823 0.2666 0.2541 0.2439 0.2353 0.2032 0.1655 0.1308 0.1000
12 0.5410 0.3924 0.3264 0.2880 0.2624 0.2439 0.2299 0.2187 0.2098 0.2020 0.1737 0.1403 0.1100 0.0833
15 0.4709 0.3346 0.2758 0.2419 0.2195 0.2034 0.1911 0.1815 0.1736 0.1671 0.1429 0.1144 0.0889 0.0667
20 0.3894 0.2705 0.2205 0.1921 0.1735 0.1602 0.1501 0.1422 0.1357 0.1303 0.1108 0.0879 0.0675 0.0500
24 0.3434 0.2354 0.1907 0.1656 0.1493 0.1374 0.1286 0.1216 0.1160 0.1113 0.0942 0.0743 0.0567 0.0417
30 0.2929 0.1980 0.1593 0.1377 0.1237 0.1137 0.1061 0.1002 0.0958 0.0921 0.0771 0.0604 0.0457 0.0333
40 0.2370 0.1576 0.1259 0.1082 0.0968 0.0887 0.0827 0.0780 0.0745 0.0713 0.0595 0.0462 0.0347 0.0250
60 0.1737 0.1131 0.0895 0.0765 0.0682 0.0623 0.0583 0.0552 0.0520 0.0497 0.0411 0.0316 0.0234 0.0167
120 0.0998 0.0632 0.0495 0.0419 0.0371 0.0337 0.0312 0.0292 0.0279 0.0266 0.0218 0.0165 0.0120 0.0083
∞ 0 0 0 0 0 0 0 0 0 0 0 0 0 0

1% critical values

n \ r  0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 8 18 72 ∞

2 0.9999 0.9950 0.9794 0.9586 0.9373 0.9172 0.8988 0.8823 0.8674 0.8539 0.7949 0.7067 0.6062 0.5000
3 0.9933 0.9423 0.8831 0.8335 0.7933 0.7606 0.7335 0.7107 0.6912 0.6743 0.6059 0.5153 0.4230 0.3333
4 0.9676 0.8643 0.7814 0.7212 0.6761 0.6410 0.6129 0.5897 0.5702 0.5536 0.4884 0.4057 0.3251 0.2500
5 0.9279 0.7885 0.6957 0.6329 0.5875 0.5531 0.5259 0.5037 0.4854 0.4697 0.4094 0.3351 0.2644 0.2000
6 0.8828 0.7218 0.6258 0.5635 0.5195 0.4866 0.4608 0.4401 0.4229 0.4084 0.3529 0.2858 0.2229 0.1667
7 0.8376 0.6644 0.5685 0.5080 0.4659 0.4347 0.4105 0.3911 0.3751 0.3616 0.3105 0.2494 0.1929 0.1429
8 0.7945 0.6152 0.5209 0.4627 0.4226 0.3932 0.3704 0.3522 0.3373 0.3248 0.2779 0.2214 0.1700 0.1250
9 0.7544 0.5727 0.4810 0.4251 0.3870 0.3592 0.3378 0.3207 0.3067 0.2950 0.2514 0.1992 0.1521 0.1111
10 0.7175 0.5358 0.4469 0.3934 0.3572 0.3308 0.3106 0.2945 0.2813 0.2704 0.2297 0.1811 0.1376 0.1000
12 0.6528 0.4751 0.3919 0.3428 0.3099 0.2861 0.2680 0.2535 0.2419 0.2320 0.1961 0.1535 0.1157 0.0833
15 0.5747 0.4069 0.3317 0.2882 0.2593 0.2386 0.2228 0.2104 0.2002 0.1918 0.1612 0.1251 0.0934 0.0667
20 0.4799 0.3297 0.2654 0.2288 0.2048 0.1877 0.1748 0.1646 0.1567 0.1501 0.1248 0.0960 0.0709 0.0500
24 0.4247 0.2871 0.2295 0.1970 0.1759 0.1608 0.1495 0.1406 0.1338 0.1283 0.1060 0.0810 0.0595 0.0417
30 0.3632 0.2412 0.1913 0.1635 0.1454 0.1327 0.1232 0.1157 0.1100 0.1054 0.0867 0.0658 0.0480 0.0333
40 0.2940 0.1915 0.1508 0.1281 0.1135 0.1033 0.0957 0.0898 0.0853 0.0816 0.0668 0.0503 0.0363 0.0250
60 0.2151 0.1371 0.1069 0.0902 0.0796 0.0722 0.0668 0.0625 0.0594 0.0567 0.0461 0.0344 0.0245 0.0167
120 0.1225 0.0759 0.0585 0.0489 0.0429 0.0387 0.0357 0.0334 0.0316 0.0302 0.0242 0.0178 0.0125 0.0083
∞ 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n = number of observations.
r = shape parameter of the gamma distribution (r = 1 for exponential distribution).
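
For r = 1 the gamma sample is exponential and the statistic is that of test Ea1, so that column of Table I can be reproduced without simulation. The Python sketch below assumes a standard result not given in the table source. For n independent exponential observations, $x_{(n)}/\sum x_i$ is distributed as the largest of n spacings of the unit interval, with upper tail $\Pr(x_{(n)}/\sum x_i > c) = \sum_{j \ge 1,\ jc < 1} (-1)^{j-1}\binom{n}{j}(1 - jc)^{n-1}$. The critical value is then found by bisection. The function names are hypothetical, and the sketch does not cover shape parameters other than r = 1.

```python
from math import comb


def exp_max_ratio_tail(c: float, n: int) -> float:
    """P(x_(n) / sum(x) > c) for n i.i.d. exponential observations.

    Inclusion-exclusion form for the largest of n uniform spacings; only terms
    with j*c < 1 contribute.
    """
    total = 0.0
    j = 1
    while j <= n and j * c < 1.0:
        total += (-1) ** (j - 1) * comb(n, j) * (1.0 - j * c) ** (n - 1)
        j += 1
    return total


def ea1_critical_value(n: int, alpha: float, tol: float = 1e-10) -> float:
    """Critical value c with P(x_(n)/sum(x) > c) = alpha (r = 1 column of Table I)."""
    lo, hi = 1.0 / n, 1.0        # the tail is 1 at c = 1/n and 0 at c = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exp_max_ratio_tail(mid, n) > alpha:
            lo = mid             # tail still too heavy: critical value lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `ea1_critical_value(10, 0.05)` should agree with the tabulated 0.4450 to the four decimals shown. For c above 1/2 only the j = 1 term survives, which gives the simple check $n(1 - c)^{n-1} = \alpha$ for the small-n rows.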
Table I Criticai values for 5% and 1% tests of discordancy for an upper outlier in a gamma sample, using the ratio x<nli. xi as test
statistic. This table is reproduced, with permission from McGraw-Hill Book Company, from Eisenhart, Hastay, and Wallis (1947),
Tables 15.1 and 15.2, with appropriate change of notation
Ga1(Ea1)
5% criticai values

r 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 8 18 72 00


n

2 0.9985 0.9750 0.9392 0.9057 0.8772 0.8534 0.8332 0.8159 0.8010 0.7880 0.7341 0.6602 0.5813 0.5000
3 0.9669 0.8709 0.7977 0.7457 0.7071 0.6771 0.6530 0.6333 0.6167 0.6025 0.5466 0.4748 0.4031 0.3333
4 0.9065 0.7679 0.6841 0.6287 0.5895 0.5598 0.5365 0.5175 0.5017 0.4884 0.4366 0.3720 0.3093 0.2500
5 0.8412 0.6838 0.5981 0.5441 0.5065 0.4783 0.4564 0.4387 0.4241 0.4118 0.3645 0.3066 0.2513 0.2000
6 0.7808 0.6161 0.5321 0.4803 0.4447 0.4184 0.3980 0.3817 0.3682 0.3568 0.3135 0.2612 0.2119 0.1667
N
\0 7 0.7271 0.5612 0.4800 0.4307 0.3974 0.3726 0.3535 0.3384 0.3259 0.3154 0.2756 0.2278 0.1833 0.1429
o
8 0.6798 0.5157 0.4377 0.3910 0.3595 0.3362 0.3185 0.3043 0.2926 0.2829 0.2462 0.2022 0.1616 0.1250
9 0.6385 0.4775 0.4027 0.3584 0.3286 0.3067 0.2901 0.2768 0.2659 0.2568 0.2226 0.1820 0.1446 0.1111
10 0.6020 0.4450 0.3733 0.3311 0.3029 0.2823 0.2666 0.2541 0.2439 0.2353 0.2032 0.1655 0.1308 0.1000
12 0.5410 0.3924 0.3264 0.2880 0.2624 0.2439 0.2299 0.2187 0.2098 0.2020 0.1737 0.1403 0.1100 0.0833
15 0.4709 0.3346 0.2758 0.2419 0.2195 0.2034 0.1911 0.1815 0.1736 0.1671 0.1429 0.1144 0.0889 0.0667
20 0.3894 0.2705 0.2205 0.1921 0.1735 0.1602 0.1501 0.1422 0.1357 0.1303 0.1108 0.0879 0.0675 0.0500
24 0.3434 0.2354 0.1907 0.1656 0.1493 0.1374 0.1286 0.1216 0.1160 0.1113 0.0942 0.0743 0.0567 0.0417
30 0.2929 0.1980 0.1593 0.1377 0.1237 0.1137 0.1061 0.1002 0.0958 0.0921 0.0771 0.0604 0.0457 0.0333
40 0.2370 0.1576 0.1259 0.1082 0.0968 0.0887 0.0827 0.0780 0.0745 0.0713 0.0595 0.0462 0.0347 0.0250
60 0.1737 0.1131 0.0895 0.0765 0.0682 0.0623 0.0583 0.0552 0.0520 0.0497 0.0411 0.0316 0.0234 0.0167
120 0.0998 0.0632 0.0495 0.0419 0.0371 0.0337 0.0312 0.0292 0.0279 0.0266 0.0218 0.0165 0.0120 0.0083
00 o o o o o o o o o o o o o o

1% critical values

r 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 8 18 72 ∞
n
2 0.9999 0.9950 0.9794 0.9586 0.9373 0.9172 0.8988 0.8823 0.8674 0.8539 0.7949 0.7067 0.6062 0.5000
3 0.9933 0.9423 0.8831 0.8335 0.7933 0.7606 0.7335 0.7107 0.6912 0.6743 0.6059 0.5153 0.4230 0.3333
4 0.9676 0.8643 0.7814 0.7212 0.6761 0.6410 0.6129 0.5897 0.5702 0.5536 0.4884 0.4057 0.3251 0.2500
5 0.9279 0.7885 0.6957 0.6329 0.5875 0.5531 0.5259 0.5037 0.4854 0.4697 0.4094 0.3351 0.2644 0.2000
6 0.8828 0.7218 0.6258 0.5635 0.5195 0.4866 0.4608 0.4401 0.4229 0.4084 0.3529 0.2858 0.2229 0.1667
7 0.8376 0.6644 0.5685 0.5080 0.4659 0.4347 0.4105 0.3911 0.3751 0.3616 0.3105 0.2494 0.1929 0.1429
8 0.7945 0.6152 0.5209 0.4627 0.4226 0.3932 0.3704 0.3522 0.3373 0.3248 0.2779 0.2214 0.1700 0.1250
9 0.7544 0.5727 0.4810 0.4251 0.3870 0.3592 0.3378 0.3207 0.3067 0.2950 0.2514 0.1992 0.1521 0.1111
10 0.7175 0.5358 0.4469 0.3934 0.3572 0.3308 0.3106 0.2945 0.2813 0.2704 0.2297 0.1811 0.1376 0.1000
12 0.6528 0.4751 0.3919 0.3428 0.3099 0.2861 0.2680 0.2535 0.2419 0.2320 0.1961 0.1535 0.1157 0.0833
15 0.5747 0.4069 0.3317 0.2882 0.2593 0.2386 0.2228 0.2104 0.2002 0.1918 0.1612 0.1251 0.0934 0.0667
20 0.4799 0.3297 0.2654 0.2288 0.2048 0.1877 0.1748 0.1646 0.1567 0.1501 0.1248 0.0960 0.0709 0.0500
24 0.4247 0.2871 0.2295 0.1970 0.1759 0.1608 0.1495 0.1406 0.1338 0.1283 0.1060 0.0810 0.0595 0.0417
30 0.3632 0.2412 0.1913 0.1635 0.1454 0.1327 0.1232 0.1157 0.1100 0.1054 0.0867 0.0658 0.0480 0.0333
40 0.2940 0.1915 0.1508 0.1281 0.1135 0.1033 0.0957 0.0898 0.0853 0.0816 0.0668 0.0503 0.0363 0.0250
60 0.2151 0.1371 0.1069 0.0902 0.0796 0.0722 0.0668 0.0625 0.0594 0.0567 0.0461 0.0344 0.0245 0.0167
120 0.1225 0.0759 0.0585 0.0489 0.0429 0.0387 0.0357 0.0334 0.0316 0.0302 0.0242 0.0178 0.0125 0.0083
∞ 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n = number of observations.
r = shape parameter of the gamma distribution (r = 1 for the exponential distribution).
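For r = 1 the sample is exponential, and the ratios x_i/Σx behave like the spacings of n − 1 uniform points on (0, 1). Assuming, as the r = 1 entries suggest, that the statistic here is x(n)/Σx, this gives the closed form P(x(n)/Σx > c) = Σ_{j ≥ 1} (−1)^{j+1} C(n, j)(1 − jc)₊^{n−1}. The minimal Python sketch below solves that equation for c by bisection as a check on the r = 1 column. The function names are illustrative, and this is not necessarily how the table itself was computed.

```python
from math import comb


def tail_prob(n: int, c: float) -> float:
    """P(x(n)/sum(x) > c) for an exponential (r = 1) sample of size n.

    Uniform-spacings formula: only terms with 1 - j*c > 0 contribute.
    """
    total = 0.0
    j = 1
    while j <= n and 1.0 - j * c > 0.0:
        total += (-1) ** (j + 1) * comb(n, j) * (1.0 - j * c) ** (n - 1)
        j += 1
    return total


def critical_value(n: int, alpha: float, iters: int = 200) -> float:
    """Solve tail_prob(n, c) = alpha by bisection on (1/n, 1)."""
    lo, hi = 1.0 / n, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_prob(n, mid) > alpha:
            lo = mid  # tail still too large: the critical value lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (2, 3, 4, 5, 10):
    print(n, round(critical_value(n, 0.01), 4))
```

At the 1% level for n ≤ 10 only the j = 1 term is active, so c = 1 − (0.01/n)^{1/(n−1)}. For n = 10 this is 1 − 10^{−1/3} ≈ 0.5358, in agreement with the table.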
Table II Critical values for 5% and 1% tests of discordancy for a lower outlier in an exponential sample, using x(1)/Σxi as test statistic. Values of the statistic lower than the critical value are significant

n 5% 1%
3 0.00844 0.00167
4 0.00424 0.000836
5 0.00255 0.000502
6 0.00170 0.000335
7 0.00122 0.000239
8 0.000913 0.000179
9 0.000710 0.000140
10 0.000568 0.000112
12 0.000388 0.0000761
14 0.000281 0.0000552
16 0.000213 0.0000419
18 0.000167 0.0000328
20 0.000135 0.0000264
30 0.0000589 0.0000116
40 0.0000329 0.00000644
50 0.0000209 0.00000410
100 0.00000518 0.00000102

n = number of observations.

Table III Critical values for 5% and 1% Dixon-type tests of discordancy for an upper outlier in an exponential sample, using (x(n) − x(n−1))/x(n) or (x(n) − x(n−1))/(x(n) − x(1)) as test statistic

For testing (x(n) − x(n−1))/x(n), read n from the first column; for testing (x(n) − x(n−1))/(x(n) − x(1)), read n from the second column.
n n 5% 1%
2 3 0.974 0.995
3 4 0.894 0.957
4 5 0.830 0.912
5 6 0.782 0.875
6 7 0.746 0.845
7 8 0.717 0.821
8 9 0.694 0.800
9 10 0.675 0.783
10 11 0.658 0.768
11 12 0.644 0.755
12 13 0.631 0.743
13 14 0.620 0.733
14 15 0.610 0.724
15 16 0.601 0.715
16 17 0.593 0.707
17 18 0.586 0.700
18 19 0.579 0.694
19 20 0.573 0.687
20 21 0.567 0.682

n = number of observations.
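Under the exponential model both tables reduce to short equations. Write the order statistics through normalized spacings, x(j) − x(j−1) = E_j/(n − j + 1) with x(0) = 0 and E_j independent unit exponentials. This gives P(x(1)/Σx > c) = (1 − nc)^{n−1} for Table II. For the first Table III statistic it gives P[(x(n) − x(n−1))/x(n) > c] = Π_{m=2}^{n} m/(m + k), where k = c/(1 − c).

Above x(1), an exponential sample behaves like a fresh exponential sample of size n − 1. The second statistic at sample size n + 1 therefore has the same distribution as the first at size n, which is why the two n columns are offset by one. The Python sketch below is our own check of a few entries, not the authors' computation, and the function names are illustrative.

```python
from math import prod


def table2_critical(n: int, alpha: float) -> float:
    """Lower alpha point of x(1)/sum(x): solves 1 - (1 - n*c)**(n - 1) = alpha."""
    return (1.0 - (1.0 - alpha) ** (1.0 / (n - 1))) / n


def dixon_upper_tail(n: int, c: float) -> float:
    """P((x(n) - x(n-1))/x(n) > c) for an exponential sample of size n."""
    k = c / (1.0 - c)
    return prod(m / (m + k) for m in range(2, n + 1))


def table3_critical(n: int, alpha: float, iters: int = 200) -> float:
    """Upper alpha point of (x(n) - x(n-1))/x(n), found by bisection."""
    lo, hi = 0.0, 1.0 - 1e-12  # stay below c = 1, where k is undefined
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dixon_upper_tail(n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(table2_critical(3, 0.05), 5), round(table2_critical(3, 0.01), 5))
print(round(table3_critical(2, 0.05), 3), round(table3_critical(3, 0.05), 3))
```

For n = 2 the Table III equation is solved exactly by c = 38/39 at 5% and c = 198/199 at 1%.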
Table IV Critical values for 5% and 1% tests of discordancy for a lower and upper outlier-pair in a gamma sample, using the ratio x(n)/x(1) as test statistic. This table is reproduced, with permission of the Biometrika Trustees, from Pearson and Hartley (1966), Table 31, after changing the notation where appropriate
Ga7 (Ea7)

Upper 5% points

n 2 3 4 5 6 7 8 9 10 11 12
r
1 39.0 87.5 142 202 266 333 403 475 550 626 704
1.5 15.4 27.8 39.2 50.7 62.0 72.9 83.5 93.9 104 114 124
2 9.60 15.5 20.6 25.2 29.5 33.6 37.5 41.1 44.6 48.0 51.4
2.5 7.15 10.8 13.7 16.3 18.7 20.8 22.9 24.7 26.5 28.2 29.9
3 5.82 8.38 10.4 12.1 13.7 15.0 16.3 17.5 18.6 19.7 20.7
3.5 4.99 6.94 8.44 9.70 10.8 11.8 12.7 13.5 14.3 15.1 15.8
4 4.43 6.00 7.18 8.12 9.03 9.78 10.5 11.1 11.7 12.2 12.7
4.5 4.03 5.34 6.31 7.11 7.80 8.41 8.95 9.45 9.91 10.3 10.7
5 3.72 4.85 5.67 6.34 6.92 7.42 7.87 8.28 8.66 9.01 9.34
6 3.28 4.16 4.79 5.30 5.72 6.09 6.42 6.72 7.00 7.25 7.48
7.5 2.86 3.54 4.01 4.37 4.68 4.95 5.19 5.40 5.59 5.77 5.93
10 2.46 2.95 3.29 3.54 3.76 3.94 4.10 4.24 4.37 4.49 4.59
15 2.07 2.40 2.61 2.78 2.91 3.02 3.12 3.21 3.29 3.36 3.39
30 1.67 1.85 1.96 2.04 2.11 2.17 2.22 2.26 2.30 2.33 2.36
∞ 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

Upper 1% points

n 2 3 4 5 6 7 8 9 10 11 12
r
1 199 448 729 1036 1362 1705 2063 2432 2813
1.5 47.5 85 120 151 184 21(6) 24(9) 28(1) 31(0) 33(7) 36(1)
2 23.2 37 49 59 69 79 89 97 106 113 120
2.5 14.9 22 28 33 38 42 46 50 54 57 60
3 11.1 15.5 19.1 22 25 27 30 32 34 36 37
3.5 8.89 12.1 14.5 16.5 18.4 20 22 23 24 26 27
4 7.50 9.9 11.7 13.2 14.5 15.8 16.9 17.9 18.9 19.8 21
4.5 6.34 8.5 9.9 11.1 12.1 13.1 13.9 14.7 15.3 16.0 16.6
5 5.85 7.4 8.6 9.6 10.4 11.1 11.8 12.4 12.9 13.4 13.9
6 4.91 6.1 6.9 7.6 8.2 8.7 9.1 9.5 9.9 10.2 10.6
7.5 4.07 4.9 5.5 6.0 6.4 6.7 7.1 7.3 7.5 7.8 8.0
10 3.32 3.8 4.3 4.6 4.9 5.1 5.3 5.5 5.6 5.8 5.9
15 2.63 3.0 3.3 3.4 3.6 3.7 3.8 3.9 4.0 4.1 4.2
30 1.96 2.2 2.3 2.4 2.4 2.5 2.5 2.6 2.6 2.7 2.7
∞ 1.00 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

Values in the column n = 2 and in the rows r = 1 and ∞ are exact. Elsewhere the third digit may be in error by a few units for the 5% points and several units for the 1% points. The third-digit figures in parentheses for r = 1.5 are the most uncertain.
n = number of observations.
r = shape parameter of the gamma distribution.
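The r = 1 row (exponential samples) can be checked against a closed form. Condition on the minimum x(1) = m. The other n − 1 observations are then m plus independent unit exponentials, so P(x(n)/x(1) ≤ c) = Σ_{j=0}^{n−1} (−1)^j C(n − 1, j) n/[n + j(c − 1)]. The Python sketch below solves the upper-α equation by bisection. It is an illustrative check, and the function names are ours.

```python
from math import comb


def ratio_cdf(n: int, c: float) -> float:
    """P(x(n)/x(1) <= c) for an exponential sample of size n, with c >= 1."""
    return sum(
        (-1) ** j * comb(n - 1, j) * n / (n + j * (c - 1.0))
        for j in range(n)
    )


def ratio_critical(n: int, alpha: float, iters: int = 200) -> float:
    """Upper alpha point of x(n)/x(1), by bisection on [1, 1e6]."""
    lo, hi = 1.0, 1e6  # wide bracket, well above the tabulated r = 1 values
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1.0 - ratio_cdf(n, mid) > alpha:
            lo = mid  # upper tail still exceeds alpha
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (2, 3, 4):
    print(n, round(ratio_critical(n, 0.05), 1))
```

For n = 2 the tail probability is 2/(c + 1), which gives the exact entries 39 (5%) and 199 (1%).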
Table V Critical values for 5% and 1% Dixon-type tests of discordancy for a lower outlier in an exponential sample, using [x(2) − x(1)]/[x(n) − x(1)] as test statistic
E4

n 5% 1%
3 0.905 0.980
4 0.618 0.808
5 0.429 0.618
6 0.316 0.479
7 0.246 0.381
8 0.198 0.312
9 0.165 0.262
10 0.140 0.224
11 0.121 0.194
12 0.106 0.171
13 0.094 0.152
14 0.085 0.136
15 0.077 0.124
16 0.070 0.113
17 0.064 0.103
18 0.059 0.095
19 0.055 0.088
20 0.051 0.082

n = number of observations.

Table VI Critical values for 5% and 1% tests for the presence of an undefined number of discordant values in an exponential sample, using Shapiro and Wilk's 'W-Exponential' statistic
E12

n Lower 1% Lower 5% Upper 5% Upper 1%
3 0.254 0.270 0.993 0.9997
4 0.130 0.160 0.858 0.968
5 0.0905 0.119 0.668 0.860
6 0.0665 0.0956 0.509 0.678
7 0.0591 0.0810 0.416 0.571
8 0.0512 0.0710 0.350 0.485
9 0.0442 0.0633 0.300 0.401
10 0.0404 0.0568 0.253 0.339
12 0.0358 0.0494 0.202 0.272
14 0.0317 0.0428 0.165 0.213
16 0.0280 0.0374 0.136 0.177
18 0.0250 0.0332 0.116 0.148
20 0.0227 0.0302 0.100 0.129
30 0.0164 0.0213 0.0593 0.0719
40 0.0131 0.0164 0.0414 0.0499
50 0.0111 0.0137 0.0317 0.0360
60 0.0095 0.0117 0.0252 0.0291
70 0.0084 0.0103 0.0209 0.0241
80 0.0075 0.0091 0.0177 0.0205
90 0.0069 0.0082 0.0156 0.0176
100 0.0063 0.0074 0.0139 0.0153

n = number of observations.
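Table V can be checked with the same spacing representation used for Table III. It gives P{[x(2) − x(1)]/[x(n) − x(1)] > c} = Π_{m=1}^{n−2} m/(m + k), where k = (n − 1)c/(1 − c). For n = 3 this is (1 − c)/(1 + c), so the 5% point is 0.95/1.05 ≈ 0.905.

For Table VI we assume the statistic has the form W_E = n(x̄ − x(1))²/[(n − 1)Σ(x_i − x̄)²]. For n = 3 it ranges from 0.25 to 1, which is consistent with the first row (lower 1% point 0.254, upper 1% point 0.9997). Check this form against Shapiro and Wilk's paper before relying on it. All function names below are illustrative.

```python
from math import prod
from statistics import mean
from typing import Sequence


def dixon_lower_tail(n: int, c: float) -> float:
    """P([x(2) - x(1)]/[x(n) - x(1)] > c) for an exponential sample, n >= 3."""
    k = (n - 1) * c / (1.0 - c)
    return prod(m / (m + k) for m in range(1, n - 1))


def dixon_lower_critical(n: int, alpha: float, iters: int = 200) -> float:
    """Upper alpha point of the Table V statistic, by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0 - 1e-12  # stay below c = 1, where k is undefined
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dixon_lower_tail(n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def w_exponential(xs: Sequence[float]) -> float:
    """Assumed form of the W-Exponential statistic (see the lead-in)."""
    n = len(xs)
    xbar = mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    return n * (xbar - min(xs)) ** 2 / ((n - 1) * ss)


print(round(dixon_lower_critical(3, 0.05), 3), round(dixon_lower_critical(4, 0.05), 3))
print(w_exponential([0.0, 0.0, 1.0]), w_exponential([0.0, 1.0, 1.0]))
```

Table VI is two-sided: a value of W_E below the lower point or above the upper point is significant.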
Table VII Critical values for 5% and 1% tests of discordancy for a single outlier in a normal sample, using the deviation from the sample mean or population mean, studentized or standardized, as test statistic

Table VIIa Table VIIb Table VIIc Table VIId
N1 N2 Nμ1 Nμ2
n 5% 1% 5% 1% 5% 1% 5% 1%
3 1.15 1.15 1.15 1.15 1.68 1.72 1.70 1.73
4 1.46 1.49 1.48 1.50 1.85 1.95 1.90 1.97
5 1.67 1.75 1.71 1.76 1.98 2.12 2.05 2.15
6 1.82 1.94 1.89 1.97 2.01 2.25 2.16 2.30
7 1.94 2.10 2.02 2.14 2.15 2.36 2.26 2.42
8 2.03 2.22 2.13 2.28 2.21 2.45 2.33 2.52
9 2.11 2.32 2.21 2.38 2.26 2.52 2.40 2.61
10 2.18 2.41 2.29 2.48 2.31 2.59 2.45 2.68
12 2.29 2.55 2.41 2.63 2.40 2.70 2.55 2.80
14 2.37 2.66 – – 2.47 2.79 – –
15 2.41 2.71 2.55 2.81 2.50 2.83 2.66 2.94
16 2.44 2.75 – – 2.53 2.86 – –
18 2.50 2.82 – – 2.58 2.91 – –
20 2.56 2.88 2.71 3.00 2.62 2.95 2.79 3.10
30 2.74 3.10 – – 2.80 3.17 2.96 3.30
40 2.87 3.24 – – 2.92 3.30 3.08 3.43
50 2.96 3.34 – – 2.98 3.39 – –
60 3.03 3.41 – – – – 3.23 3.59
100 3.21 3.60 – – 3.23 3.61 – –
120 3.27 3.66 – – – – 3.46 3.83

n = number of observations.

Table VII (Continued)
Table VIIe Table VIIf Table VIIg
Nσ1 Nσ2 Nμσ1
n 5% 1% 5% 1% 5% 1%
3 1.74 2.22 1.93 2.39 2.12 2.71
4 1.94 2.43 2.15 2.61 2.23 2.81
5 2.08 2.57 2.29 2.76 2.32 2.88
6 2.18 2.68 2.40 2.87 2.39 2.93
7 2.27 2.76 2.48 2.95 2.44 2.98
8 2.33 2.83 2.55 3.02 2.49 3.02
9 2.39 2.88 2.60 3.07 2.53 3.06
10 2.44 2.93 2.65 3.12 2.57 3.09
12 2.52 3.01 – – 2.63 3.14
14 2.59 3.07 – – 2.68 3.19
15 2.62 3.10 2.82 3.29 2.71 3.21
16 2.64 3.12 – – 2.73 3.23
18 2.69 3.17 – – 2.77 3.26
20 2.73 3.21 2.94 3.39 2.80 3.29
30 2.88 3.38 3.08 3.53 2.93 3.40
40 2.99 3.44 3.18 3.62 3.02 3.48
50 3.05 3.53 – – 3.08 3.54
60 – – 3.30 3.73 3.14 3.59
100 3.27 3.67 – – 3.28 3.72
120 – – – – 3.33 3.76
200 – – – – 3.47 3.89
500 – – – – 3.71 4.11
1000 – – – – 3.88 4.26

n = number of observations.
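Two of the Table VII statistics are easy to compute directly. N1 = (x(n) − x̄)/s, with s the sample standard deviation (divisor n − 1), cannot exceed (n − 1)/√n ≈ 1.1547 at n = 3. This bound is why both n = 3 entries of Table VIIa are 1.15. For Nμσ1 = (x(n) − μ)/σ with μ and σ known, the largest of n independent standard normals gives the exact critical value Φ⁻¹[(1 − α)^{1/n}], which reproduces the Table VIIg column.

The Python sketch below uses only the standard library. The function names are ours, and the sample in the usage lines is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def n1_statistic(xs):
    """(x(n) - xbar) / s, with s the sample standard deviation (divisor n - 1)."""
    return (max(xs) - mean(xs)) / stdev(xs)


def n_mu_sigma1_critical(n: int, alpha: float) -> float:
    """Exact upper-alpha point of (x(n) - mu)/sigma for n independent normals.

    P(max Z_i <= c) = Phi(c)**n, so c = Phi^{-1}((1 - alpha)**(1/n)).
    """
    return NormalDist().inv_cdf((1.0 - alpha) ** (1.0 / n))


# Hypothetical sample of n = 6 with one suspiciously high value.
sample = [2.1, 2.4, 2.2, 2.5, 2.3, 3.6]
t = n1_statistic(sample)
print(round(t, 3), "vs Table VIIa, n = 6: 1.82 (5%), 1.94 (1%)")
print(round(n_mu_sigma1_critical(3, 0.05), 2), round(n_mu_sigma1_critical(3, 0.01), 2))
```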
Table VIIIa Critical values for 5% and 1% tests of discordancy for a single outlier in a normal sample, using the externally studentized deviation from the mean as test statistic
Nν1

5% critical values
ν 5 6 8 10 15 20 30 40 60 ∞
n
3 2.37 2.24 2.09 2.01 1.91 1.87 1.82 1.80 1.78 1.74
4 2.71 2.55 2.37 2.27 2.15 2.10 2.04 2.02 1.99 1.94
5 2.95 2.78 2.57 2.46 2.32 2.26 2.20 2.17 2.14 2.08
6 3.15 2.95 2.72 2.60 2.45 2.38 2.31 2.28 2.25 2.18
7 3.30 3.09 2.85 2.72 2.55 2.47 2.40 2.37 2.33 2.27
8 3.43 3.21 2.95 2.81 2.64 2.56 2.48 2.44 2.41 2.33
9 3.54 3.31 3.04 2.89 2.71 2.63 2.54 2.50 2.47 2.39
10 3.64 3.39 3.12 2.96 2.77 2.68 2.60 2.56 2.52 2.44
12 3.80 3.54 3.25 3.08 2.88 2.78 2.69 2.65 2.61 2.52

1% critical values
ν 5 6 8 10 15 20 30 40 60 ∞
n
3 3.65 3.32 2.96 2.78 2.57 2.47 2.38 2.34 2.29 2.22
4 4.11 3.72 3.31 3.10 2.84 2.73 2.62 2.57 2.52 2.43
5 4.45 4.02 3.56 3.32 3.03 2.91 2.79 2.73 2.68 2.57
6 4.70 4.24 3.74 3.48 3.17 3.04 2.91 2.85 2.79 2.68
7 4.93 4.43 3.89 3.62 3.29 3.14 3.01 2.94 2.88 2.76
8 5.11 4.58 4.02 3.73 3.38 3.23 3.08 3.02 2.95 2.83
9 5.26 4.71 4.13 3.82 3.46 3.30 3.15 3.08 3.01 2.88
10 5.39 4.82 4.22 3.90 3.53 3.37 3.21 3.13 3.06 2.93
12 5.62 5.01 4.38 4.04 3.65 3.47 3.30 3.22 3.15 3.01

n = number of observations.
ν = degrees of freedom of independent estimate of σ².

Table VIIIb Critical values for 5% and 1% tests of discordancy for a single outlier in a normal sample, using the greatest externally studentized deviation from the mean as test statistic
Nν3

5% critical values
ν 6 8 10 15 20 30 40 60 ∞
n
3 2.6 2.4 2.3 2.2 2.1 2.0 2.0 2.0 1.9
4 – 2.7 2.6 2.4 2.3 2.3 2.2 2.2 2.1
5 – 2.9 2.8 2.6 2.5 2.4 2.4 2.4 2.3
6 – – 2.9 2.7 2.6 2.5 2.5 2.5 2.4
8 – – – 2.9 2.8 2.7 2.7 2.6 2.6
10 – – – 3.1 3.0 2.9 2.8 2.8 2.7
15 – – – 3.3 3.2 3.1 3.0 2.9 2.8
20 – – – – 3.3 3.2 3.1 3.0 2.9
30 – – – – – 3.4 3.3 3.2 3.1
40 – – – – – – 3.4 3.3 3.2
60 – – – – – – – 3.5 3.3

1% critical values
ν 8 10 15 20 30 40 60 ∞
n
3 3.3 3.1 2.8 2.7 2.6 2.5 2.5 2.4
4 3.7 3.4 3.1 3.0 2.9 2.8 2.7 2.6
5 4.0 3.7 3.3 3.2 3.0 2.9 2.9 2.8
6 – 3.9 3.5 3.3 3.2 3.1 3.0 2.9
8 – 4.1 3.7 3.5 3.3 3.2 3.2 3.0
10 – 4.3 3.8 3.6 3.5 3.4 3.3 3.1
15 – – 4.1 3.9 3.7 3.6 3.5 3.3
20 – – – 4.0 3.9 3.7 3.6 3.4
30 – – – – 4.0 3.8 3.7 3.5
40 – – – – – 4.0 3.8 3.6
60 – – – – – – 4.0 3.7

n = number of observations.
ν = degrees of freedom of independent estimate of σ².
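These two statistics replace the sample standard deviation with an external estimate s_ν of σ on ν degrees of freedom, which must come from data independent of the sample being tested. As ν → ∞, s_ν → σ, so the ∞ columns coincide with the standardized statistics of Table VII. For instance, Table VIIIa's ∞ column gives 1.74 and 2.22 at n = 3, the same as Table VIIe. The minimal Python sketch below computes both statistics. The function names are illustrative, and the example numbers are hypothetical.

```python
from statistics import mean


def externally_studentized(xs, s_nu):
    """(x(n) - xbar) / s_nu, with s_nu an independent estimate of sigma."""
    return (max(xs) - mean(xs)) / s_nu


def greatest_externally_studentized(xs, s_nu):
    """max |x_i - xbar| / s_nu, for an outlier in either direction."""
    xbar = mean(xs)
    return max(abs(x - xbar) for x in xs) / s_nu


# Hypothetical: n = 5 observations, s_nu = 0.4 estimated from nu = 20 earlier readings.
obs = [10.1, 10.4, 9.8, 10.0, 11.3]
print(round(externally_studentized(obs, 0.4), 3), "vs Table VIIIa, n = 5, nu = 20: 2.26 (5%)")
print(round(greatest_externally_studentized(obs, 0.4), 3), "vs Table VIIIb, n = 5, nu = 20: 2.5 (5%)")
```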
Table VIIIc Critical values for 5% and 1% tests of discordancy for a single outlier in a normal sample, using the externally and internally studentized deviation from the mean as test statistic
Nν2

5% critical values
ν 2 3 4 6 12 50
n
3 1.37 1.48 1.55 1.59 1.63 1.68 1.72
4 1.60 1.68 1.73 1.77 1.81 1.87 1.92
5 1.76 1.82 1.87 1.90 1.94 2.00 2.06
6 1.89 1.94 1.97 2.00 2.04 2.09 2.16
7 1.99 2.03 2.06 2.08 2.11 2.17 2.24
8 2.07 2.10 2.13 2.15 2.18 2.23 2.30
10 2.20 2.23 2.24 2.26 2.29 2.33 2.40
12 2.31 2.32 2.34 2.35 2.37 2.41 2.48
15 2.42 2.44 2.45 2.46 2.47 2.51 2.58
20 2.57 2.58 2.58 2.59 2.60 2.63 2.68

1% critical values
ν 2 3 4 6 12 50
n
3 1.40 1.58 1.70 1.79 1.90 2.04 2.17
4 1.69 1.82 1.92 1.99 2.09 2.22 2.30
5 1.90 2.00 2.08 2.14 2.23 2.36 2.51
6 2.06 2.14 2.21 2.26 2.33 2.46 2.61
7 2.19 2.25 2.31 2.35 2.42 2.53 2.69
8 2.29 2.35 2.40 2.43 2.49 2.60 2.75
10 2.46 2.50 2.54 2.57 2.61 2.70 2.85
12 2.59 2.62 2.65 2.67 2.70 2.79 2.92
15 2.73 2.75 2.77 2.79 2.82 2.88 3.01
20 2.90 2.91 2.93 2.94 2.96 3.01 3.12

n = number of observations.
ν = degrees of freedom of independent estimate of σ².

Table VIIId Critical values for 5% and 1% tests of discordancy for a single outlier in a normal sample, using the greatest externally and internally studentized deviation from the mean as test statistic
Nν4

5% critical values
ν 2 3 4 6 12 50
n
3 1.39 1.54 1.63 1.69 1.76 1.8 1.9
4 1.65 1.76 1.83 1.88 1.95 2.03 2.1
5 1.83 1.92 1.97 2.02 2.08 2.16 2.2
6 1.98 2.04 2.09 2.12 2.18 2.26 2.35
7 2.09 2.14 2.18 2.21 2.26 2.34 2.43
8 2.18 2.22 2.26 2.29 2.33 2.40 2.49
10 2.33 2.36 2.38 2.40 2.44 2.50 2.59
12 2.44 2.46 2.48 2.50 2.53 2.58 2.67
15 2.57 2.58 2.60 2.61 2.63 2.68 2.77
20 2.72 2.73 2.74 2.75 2.77 2.80 2.87

1% critical values
ν 2 3 4 6 12 50
n
3 1.41 1.60 1.74 1.84 1.97 2.15 2.3
4 1.70 1.86 1.97 2.06 2.18 2.35 2.53
5 1.93 2.05 2.14 2.21 2.32 2.48 2.67
6 2.10 2.20 2.28 2.34 2.43 2.58 2.77
7 2.24 2.32 2.39 2.44 2.52 2.66 2.85
8 2.36 2.42 2.48 2.53 2.60 2.73 2.92
10 2.54 2.59 2.63 2.67 2.73 2.84 3.01
12 2.68 2.71 2.75 2.78 2.82 2.92 3.09
15 2.84 2.86 2.89 2.91 2.94 3.02 3.18
20 3.01 3.03 3.05 3.06 3.09 3.15 3.28

n = number of observations.
ν = degrees of freedom of independent estimate of σ².
Table IXc Table IXd
Nμ3 Nμ4
k=2 k=3 k=4 k=2
n 5% 1% 5% 1% 5% 1% 5% 1%
4 2.68 2.77 – – – – 0.034 0.0072
5 2.90 3.05 – – – – 0.086 0.027
6 3.10 3.27 3.85 4.04 – – 0.128 0.052
7 3.27 3.47 4.07 4.29 – – 0.169 0.091
8 3.41 3.63 4.30 4.52 4.98 5.25 0.215 0.125
9 3.53 3.77 4.50 4.72 5.23 5.52 0.255 0.157
10 3.63 3.89 4.66 4.90 5.44 5.76 0.286 0.186
12 3.83 4.12 4.94 5.24 5.84 6.17 0.349 0.244
14 3.98 4.30 5.18 5.53 6.18 6.55 0.400 0.298
16 4.10 4.45 5.39 5.75 6.47 6.84 0.448 0.346
18 4.22 4.59 5.57 5.95 6.72 7.10 0.484 0.386
20 4.34 4.70 5.75 6.13 6.94 7.35 0.510 0.416
30 4.71 5.13 6.32 6.78 7.73 8.17 0.616 0.543
40 4.96 5.44 6.74 7.26 8.32 8.83 0.683 0.618
50 5.12 5.65 6.99 7.52 8.65 9.19 0.729 0.670
100 5.68 6.13 7.87 8.38 9.88 10.5 0.836 0.808

Table IXe Table IXf
Nσ3 Nσ4
k=2 k=3 k=4 k=2
n 5% 1% 5% 1% 5% 1% 5% 1%
4 2.39 2.93 – – – – 0.0012 0.00004
5 2.81 3.38 – – – – 0.037 0.0077
6 3.10 3.69 3.37 3.97 – – 0.162 0.049
7 3.33 3.93 3.81 4.45 – – 0.351 0.137
8 3.51 4.12 4.13 4.80 4.35 5.02 0.585 0.268
9 3.66 4.28 4.45 5.14 4.83 5.61 0.869 0.438
10 3.79 4.41 4.66 5.40 5.16 5.93 1.22 0.631
12 4.00 4.63 5.03 5.81 5.74 6.56 2.03 1.21
14 4.17 4.79 5.31 6.09 6.18 7.01 2.94 1.95
16 4.31 4.93 5.60 6.34 6.56 7.38 4.03 2.77
18 4.43 5.05 5.80 6.53 6.86 7.67 5.11 3.69
20 4.53 5.14 5.96 6.66 7.10 7.90 6.21 4.42
30 4.89 5.52 6.57 7.31 7.97 8.83 12.4 9.8
40 5.13 5.76 6.95 7.67 8.55 9.36 19.5 16.2
50 5.28 5.94 7.18 7.94 8.89 9.74 26.8 22.9
100 5.77 6.32 8.02 8.73 10.1 10.9 67.7 60.0
Table IXg Table IXh
Nμσ3 Nμσ4
k=2 k=3 k=4 k=2
n 5% 1% 5% 1% 5% 1% 5% 1%
4 3.27 4.14 – – – – 0.071 0.015
5 3.44 4.26 – – – – 0.232 0.067
6 3.65 4.46 4.48 5.47 – – 0.450 0.166
7 3.82 4.58 4.72 5.69 – – 0.683 0.335
8 3.95 4.70 4.91 5.91 5.57 6.72 1.05 0.539
9 4.04 4.78 5.10 6.08 5.88 6.99 1.41 0.741
10 4.12 4.87 5.25 6.23 6.12 7.23 1.82 0.984
12 4.28 5.00 5.54 6.51 6.54 7.63 2.66 1.57
14 4.42 5.14 5.81 6.70 6.89 7.94 3.63 2.40
16 4.54 5.25 6.02 6.87 7.18 8.21 4.73 3.24
18 4.64 5.35 6.18 7.03 7.40 8.43 5.89 4.24
20 4.73 5.44 6.30 7.16 7.60 8.63 7.12 5.28
30 5.01 5.68 6.77 7.61 8.31 9.26 13.3 10.7
40 5.24 5.85 7.15 7.90 8.81 9.73 20.4 16.9
50 5.36 6.03 7.34 8.17 9.12 10.1 27.9 23.8
100 5.81 6.39 8.11 8.83 10.2 11.1 68.6 61.4

n = number of observations.
k = number of outliers.

Table X Critical values for 5% and 1% tests of discordancy for a lower and upper outlier-pair in a normal sample, using as statistic the ratio of the reduced sum of squares to either the total sum of squares or the population variance
Table Xa Table Xb Table Xc Table Xd
N5 Nμ5 Nσ5 Nμσ5
n 5% 1% 5% 1% 5% 1% 5% 1%
4 0.00044 0.00001 0.019 0.0030 0.00009 0.00002 0.038 0.0062
5 0.011 0.002 0.049 0.017 0.025 0.0049 0.131 0.042
6 0.044 0.014 0.092 0.040 0.123 0.036 0.308 0.126
7 0.078 0.033 0.127 0.062 0.265 0.100 0.511 0.224
8 0.120 0.060 0.169 0.093 0.492 0.239 0.801 0.400
9 0.159 0.093 0.205 0.127 0.727 0.370 1.09 0.583
10 0.195 0.122 0.239 0.158 1.01 0.562 1.46 0.839
12 0.266 0.181 0.299 0.214 1.80 1.09 2.24 1.43
14 0.320 0.236 0.352 0.265 2.66 1.75 3.15 2.11
16 0.369 0.288 0.393 0.308 3.62 2.54 4.20 2.88
18 0.411 0.325 0.432 0.347 4.69 3.35 5.22 3.74
20 0.448 0.363 0.468 0.381 5.78 4.16 6.43 4.71
30 0.571 0.509 0.581 0.517 11.7 9.45 12.5 10.2
40 0.644 0.584 0.651 0.591 18.8 15.6 19.4 16.3
50 0.699 0.648 0.703 0.654 26.0 22.2 26.8 22.8
100 0.821 0.794 0.823 0.796 66.6 59.2 67.4 60.0

n = number of observations.

Table XIa,b Critical values for 5% and 1% tests of discordancy for a lower and upper outlier-pair in a normal sample, using the studentized range (XIa) or the standardized range (XIb) as test statistic
Table XIa Table XIb
N6 Nσ6 (Nμσ6)
n 5% 1% 5% 1%
3 2.00 2.00 3.31 4.12
4 2.43 2.45 3.63 4.40
5 2.75 2.80 3.86 4.60
6 3.01 3.10 4.03 4.76
7 3.22 3.34 4.17 4.88
8 3.40 3.54 4.29 4.99
9 3.55 3.72 4.39 5.08
10 3.69 3.87 4.47 5.16
12 3.91 4.13 4.62 5.29
14 4.09 4.34 4.74 5.40
16 4.24 4.52 4.85 5.49
18 4.37 4.67 4.93 5.57
20 4.49 4.80 5.01 5.65
30 4.89 5.26 5.30 5.91
40 5.16 5.56 5.50 6.09
50 5.35 5.77 5.65 6.23
60 5.51 5.94 5.76 6.34
100 5.90 6.36 6.08 6.64
200 – – 6.39 6.84
500 – – 6.94 7.42
1000 – – 7.33 7.80

n = number of observations.
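The outlier-pair statistics of Tables Xa and XIa can be computed as follows. The reduced sum of squares is the sum of squares about the mean after both x(1) and x(n) are removed. Small values of its ratio to the total sum of squares (Table Xa) indicate a discordant pair, and so do large values of the studentized range (x(n) − x(1))/s (Table XIa). With s taken to have divisor n − 1, as in Table VII, the studentized range cannot exceed √[2(n − 1)]. That bound is 2 at n = 3, which matches the 2.00 entries in that row.

The Python sketch below uses illustrative names, and its sample is hypothetical.

```python
from statistics import mean, stdev


def sum_of_squares(xs):
    """Sum of squared deviations about the mean of xs."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def reduced_ss_ratio(xs):
    """Sum of squares without x(1) and x(n), divided by the full sum of squares."""
    ys = sorted(xs)
    return sum_of_squares(ys[1:-1]) / sum_of_squares(ys)


def studentized_range(xs):
    """(x(n) - x(1)) / s, with s the sample standard deviation (divisor n - 1)."""
    return (max(xs) - min(xs)) / stdev(xs)


# Hypothetical sample of n = 8 with one low and one high value.
data = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 3.6, 6.5]
print(round(reduced_ss_ratio(data), 4), "vs Table Xa, n = 8: 0.120 (5%), 0.060 (1%)")
print(round(studentized_range(data), 3), "vs Table XIa, n = 8: 3.40 (5%), 3.54 (1%)")
```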
12 0.266 0.181 0.299 0.214 1.80 1.09 2.24 1.43
14 0.320 0.236 0.352 0.265 2.66 1.75 3.15 2.11
16 0.369 0.288 0.393 0.308 3.62 2.54 4.20 2.88
18 0.411 0.325 0.432 0.347 4.69 3.35 5.22 3.74
20 0.448 0.363 0.468 0.381 5.78 4.16 6.43 4.71
30 0.571 0.509 0.581 0.517 11.7 9.45 12.5 10.2
40 0.644 0.584 0.651 0.591 18.8 15.6 19.4 16.3
50 0.699 0.648 0.703 0.654 26.0 22.2 26.8 22.8
100 0.821 0.794 0.823 0.796 66.6 59.2 67.4 60.0

n = number of observations.
306
307
Table XIc Critical values for 5% and 1% tests of discordancy for a lower and upper outlier-pair in a normal sample, using the externally studentized range as test statistic
Nν6

5% critical values

                                     ν
n        1     2     3     4     5     6     8    10    12    15    20    30    60
3     27.0  8.33  5.91  5.04  4.60  4.34  4.04  3.88  3.77  3.67  3.58  3.49  3.40
4     32.8  9.80  6.82  5.76  5.22  4.90  4.53  4.33  4.20  4.08  3.96  3.85  3.74
5     37.1  10.9  7.50  6.29  5.67  5.30  4.89  4.65  4.51  4.37  4.23  4.10  3.98
6     40.4  11.7  8.04  6.71  6.03  5.63  5.17  4.91  4.75  4.59  4.45  4.30  4.16
7     43.1  12.4  8.48  7.05  6.33  5.90  5.40  5.12  4.95  4.78  4.62  4.46  4.31
8     45.4  13.0  8.85  7.35  6.58  6.12  5.60  5.30  5.12  4.94  4.77  4.60  4.44
9     47.4  13.5  9.18  7.60  6.80  6.32  5.77  5.46  5.27  5.08  4.90  4.72  4.55
10    49.1  14.0  9.46  7.83  6.99  6.49  5.92  5.60  5.39  5.20  5.01  4.82  4.65
12    52.0  14.8  9.95  8.21  7.32  6.79  6.18  5.83  5.61  5.40  5.20  5.00  4.81
14    54.3  15.4  10.3  8.52  7.60  7.03  6.39  6.03  5.80  5.57  5.36  5.15  4.94
16    56.3  15.9  10.7  8.79  7.83  7.24  6.57  6.19  5.95  5.72  5.49  5.27  5.06
18    58.0  16.4  11.0  9.03  8.03  7.43  6.73  6.34  6.09  5.85  5.61  5.38  5.15
20    59.6  16.8  11.2  9.23  8.21  7.59  6.87  6.47  6.21  5.96  5.71  5.47  5.24

1% critical values

                                     ν
n        1     2     3     4     5     6     8    10    12    15    20    30    60
3    135.0  19.0  10.6  8.12  6.98  6.33  5.64  5.27  5.05  4.84  4.64  4.45  4.28
4    164.3  22.3  12.2  9.17  7.80  7.03  6.20  5.77  5.50  5.25  5.02  4.80  4.59
5    185.6  24.7  13.3  9.96  8.42  7.56  6.62  6.14  5.84  5.56  5.29  5.05  4.82
6    202.2  26.6  14.2  10.6  8.91  7.97  6.96  6.43  6.10  5.80  5.51  5.24  4.99
7    215.8  28.2  15.0  11.1  9.32  8.32  7.24  6.67  6.32  5.99  5.69  5.40  5.13
8    227.2  29.5  15.6  11.5  9.67  8.61  7.47  6.87  6.51  6.16  5.84  5.54  5.25
9    237.0  30.7  16.2  11.9  9.97  8.87  7.68  7.05  6.67  6.31  5.97  5.65  5.36
10   245.6  31.7  16.7  12.3  10.2  9.10  7.86  7.21  6.81  6.44  6.09  5.76  5.45
12   260.0  33.4  17.5  12.8  10.7  9.48  8.18  7.49  7.06  6.66  6.28  5.93  5.60
14   271.8  34.8  18.2  13.3  11.1  9.81  8.44  7.71  7.26  6.84  6.45  6.08  5.73
16   281.8  36.0  18.8  13.7  11.4  10.1  8.66  7.91  7.44  7.00  6.59  6.20  5.84
18   290.4  37.0  19.3  14.1  11.7  10.3  8.85  8.08  7.59  7.14  6.71  6.31  5.93
20   298.0  37.9  19.8  14.4  11.9  10.5  9.03  8.23  7.73  7.26  6.82  6.41  6.01

n = number of observations.
ν = degrees of freedom of the independent estimate of σ². The critical values for ν = ∞ are given by Table XIb.
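A minimal sketch of the externally studentized range test follows. The estimate s of σ is assumed to come from a separate body of data, independent of the sample under test, with ν = (number of independent values) - 1 degrees of freedom; the data, the dictionary name and the function name are hypothetical. The outlier-pair x(1), x(n) is taken to be discordant when (x(n) - x(1))/s exceeds the Table XIc entry for the given n and ν.

```python
import statistics

# 5% critical values for nu = 10, copied from Table XIc (n = 3, ..., 10).
TABLE_XIC_5PCT_NU10 = {3: 3.88, 4: 4.33, 5: 4.65, 6: 4.91, 7: 5.12, 8: 5.30, 9: 5.46, 10: 5.60}


def externally_studentized_range(sample, independent):
    """Return ((x(n) - x(1)) / s, nu), where s is the sample standard deviation
    of `independent` (hypothetical data assumed independent of `sample`)
    and nu = len(independent) - 1 is its degrees of freedom."""
    s = statistics.stdev(independent)
    return (max(sample) - min(sample)) / s, len(independent) - 1


sample = [10.1, 10.4, 9.8, 10.0, 12.9, 7.2]    # n = 6; suspect pair 7.2 and 12.9
independent = [9.0] * 5 + [11.0] * 5 + [10.0]  # gives s = 1 with nu = 10
stat, nu = externally_studentized_range(sample, independent)
print(stat, nu, "discordant at 5%:", stat > TABLE_XIC_5PCT_NU10[len(sample)])
```

Here the statistic is 5.7, above the 5% value 4.91 but below the 1% value 6.43 for n = 6, ν = 10.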
Table XII Critical values for 5% and 1% tests of discordancy for k lower outliers and k upper outliers in a normal sample, using the standardized (k-1)th quasi-range as test statistic
Nσ9 (Nμσ9)

       k=2          k=3          k=4
n      5%    1%     5%    1%     5%    1%
4      1.58  2.17
5      2.05  2.61
6      2.35  2.89   1.11  1.56
7      2.58  3.10   1.51  1.95
8      2.75  3.26   1.78  2.21   0.859 1.23
9      2.90  3.40   1.99  2.41   1.20  1.56
10     3.03  3.52   2.15  2.57   1.44  1.80
12     3.23  3.71   2.42  2.82   1.79  2.14
14     3.39  3.86   2.62  3.01   2.04  2.38
16     3.53  3.98   2.79  3.17   2.24  2.57
18     3.64  4.09   2.93  3.30   2.40  2.73
20     3.74  4.18   3.05  3.41   2.54  2.86
30     4.11  4.52   3.47  3.81   3.02  3.32
40     4.35  4.75   3.74  4.07   3.32  3.61
50     4.53  4.92   3.94  4.26   3.54  3.82
60     4.67  5.05   4.10  4.41   3.71  3.98
70     4.79  5.16   4.23  4.54   3.85  4.12
80     4.88  5.26   4.34  4.64   3.97  4.23
90     4.97  5.34   4.43  4.73   4.07  4.33
100    5.05  5.41   4.52  4.81   4.16  4.41

n = number of observations.
2k = number of outliers (k upper and k lower).

Table XIIIa,b,c,d,e,f,g Critical values for 5% and 1% Dixon-type tests of discordancy for one or more outliers in a normal sample

       Table XIIIa    Table XIIIb    Table XIIIc    Table XIIId
       N7 (Nμ7)       N8 (Nμ8)       N9 (Nμ9)       N10 (Nμ10)
n      5%     1%      5%     1%      5%     1%      5%     1%
3      0.941  0.988
4      0.765  0.889   0.831  0.922   0.955  0.991
5      0.642  0.780   0.717  0.831   0.807  0.916   0.960  0.992
6      0.560  0.698   0.621  0.737   0.689  0.805   0.824  0.925
7      0.507  0.637   0.570  0.694   0.610  0.740   0.712  0.836
8      0.468  0.590   0.524  0.638   0.554  0.683   0.632  0.760
9      0.437  0.555   0.492  0.594   0.512  0.635   0.580  0.701
10     0.412  0.527   0.464  0.564   0.477  0.597   0.537  0.655
12     0.376  0.482   0.429  0.520   0.428  0.541   0.473  0.590
14     0.349  0.450   0.397  0.485   0.395  0.502   0.432  0.542
16     0.329  0.426   0.376  0.461   0.369  0.472   0.401  0.508
18     0.313  0.407   0.354  0.438   0.349  0.449   0.377  0.480
20     0.300  0.391   0.340  0.417   0.334  0.430   0.358  0.458
25     0.277  0.362   0.316  0.386   0.304  0.394   0.324  0.417
30     0.260  0.341   0.300  0.368   0.283  0.369   0.301  0.389

       Table XIIIe    Table XIIIf    Table XIIIg
       N11 (Nμ11)     N12 (Nμ12)     N13 (Nμ13)
n      5%     1%      5%     1%      5%     1%
4      0.967  0.992
5      0.845  0.929   0.976  0.995
6      0.736  0.836   0.872  0.951   0.983  0.995
7      0.661  0.778   0.780  0.885   0.881  0.945
8      0.607  0.710   0.710  0.829   0.803  0.890
9      0.565  0.667   0.657  0.776   0.737  0.840
10     0.531  0.632   0.612  0.726   0.682  0.791
12     0.481  0.579   0.546  0.642   0.600  0.704
14     0.445  0.538   0.501  0.593   0.546  0.641
16     0.418  0.508   0.467  0.557   0.507  0.595
18     0.397  0.484   0.440  0.529   0.475  0.561
20     0.372  0.464   0.419  0.506   0.450  0.535
25     0.343  0.428   0.382  0.464   0.406  0.489
30     0.322  0.402   0.355  0.433   0.376  0.457

n = number of observations.
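The statistics N7 to N13 are defined in the text rather than in these tables. The sketch below assumes that N7 is Dixon's ratio r10 = (x(n) - x(n-1))/(x(n) - x(1)) for a single upper outlier, an identification suggested by the agreement of Table XIIIa with published critical values of r10; it should be checked against the book's own definition before use. The data and names are illustrative.

```python
# (5%, 1%) critical values of N7, copied from Table XIIIa for n = 3, ..., 10.
TABLE_XIIIA = {
    3: (0.941, 0.988), 4: (0.765, 0.889), 5: (0.642, 0.780), 6: (0.560, 0.698),
    7: (0.507, 0.637), 8: (0.468, 0.590), 9: (0.437, 0.555), 10: (0.412, 0.527),
}


def dixon_r10_upper(sample):
    """Assumption: N7 = (x(n) - x(n-1)) / (x(n) - x(1)), the gap of the
    largest observation as a fraction of the sample range."""
    x = sorted(sample)
    spread = x[-1] - x[0]
    if spread == 0:
        raise ValueError("all observations are equal")
    return (x[-1] - x[-2]) / spread


sample = [1.0, 2.0, 3.0, 4.0, 10.0]
r = dixon_r10_upper(sample)
crit5, crit1 = TABLE_XIIIA[len(sample)]
print(f"r10 = {r:.3f}; discordant at 5%: {r > crit5}; at 1%: {r > crit1}")
```

Because both numerator and denominator are differences of order statistics, the ratio needs no estimate of σ, which is the attraction of the Dixon-type tests for small samples.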

Table XIIIh,i Critical values for 5% and 1% tests of discordancy for one, or two, upper outliers in a normal sample, using [x(n) - x(n-1)]/σ (XIIIh) or [x(n-1) - x(n-2)]/σ (XIIIi) as test statistic

        Table XIIIh     Table XIIIi
        Nσ7 (Nμσ7)      Nσ8 (Nμσ8)
n       5%     1%       5%     1%
3       2.17   2.90     2.17   2.90
10      1.46   2.03     0.96   1.38
20      1.28   1.80     0.79   1.14
30      1.20   1.70     0.73   1.05
40      1.14   1.63     0.68   1.00
60      1.08   1.56     0.63   0.93
80      1.04   1.50     0.61   0.90
100     1.02   1.47     0.58   0.86
200     0.95   1.38     0.54   0.81
500     0.87   1.28     0.48   0.73
1000    0.83   1.22     0.45   0.67

n = number of observations.

Table XIVa,b,c Critical values for 5% and 1% tests of discordancy for one or more outliers in a normal sample, using as test statistic the sample skewness (XIVa), the sample kurtosis (XIVb), or the sample kurtosis based on deviations from the population mean (XIVc)

       Table XIVa     Table XIVb
       N14            N15
n      5%    1%       5%    1%
5      1.0   1.3      2.9   3.1
10     0.9   1.3      3.9   4.8
15     0.8   1.2      4.1   5.1
20     0.8   1.1      4.1   5.2
25     0.71  1.06     4.0   5.0
30     0.66  0.99
40     0.59  0.87
50     0.53  0.79     3.99  4.88
60     0.49  0.72
70     0.46  0.67
75     -     -        3.87  4.59
80     0.43  0.63
90     0.41  0.60
100    0.39  0.57     3.77  4.39
200    0.28  0.40     3.57  3.98
500    0.18  0.26     3.37  3.60
1000   0.13  0.18     3.26  3.41

Table XIVc
NJL6
n      5%    1%
3      2.8   3.0
4      3.3   3.7
5      3.6   4.3
6      3.8   4.7
7      3.9   4.9
8      4.0   5.1
9      4.0   5.3
10     4.1   5.4
12     4.2   5.5
14     4.2   5.5
16     4.2   5.5
18     4.2   5.5
20     4.2   5.4
30     4.2   5.2
40     4.1   5.0
50     4.0   4.9
100    3.8   4.3

n = number of observations.

Table XV Critical values for 5% and 1% tests of discordancy for k outliers in a normal sample, using Tietjen and Moore's E_k-statistic
N16

      k=2            k=3            k=4
n     5%     1%      5%     1%      5%     1%
6     0.034  0.012
7     0.065  0.028   0.016  0.006
8     0.099  0.050   0.034  0.014
9     0.137  0.078   0.057  0.026   0.021  0.009
10    0.172  0.101   0.083  0.043   0.037  0.018
12    0.234  0.159   0.133  0.083   0.073  0.042
14    0.293  0.207   0.179  0.123   0.112  0.072
16    0.340  0.263   0.227  0.166   0.153  0.107
18    0.382  0.306   0.267  0.206   0.187  0.141
20    0.416  0.339   0.302  0.236   0.221  0.170
30    0.549  0.482   0.443  0.386   0.364  0.308
40    0.629  0.574   0.534  0.480   0.458  0.408
50    0.684  0.636   0.599  0.550   0.529  0.482

n = number of observations.
k = number of outliers.
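The skewness, kurtosis and Tietjen-Moore statistics of Tables XIVa, XIVb and XV can be computed as in the sketch below. It assumes the usual moment definitions √b1 = m3/m2^{3/2} and b2 = m4/m2² (moments about the sample mean, divisor n), and takes E_k to be the sum of squares of the n - k observations nearest the sample mean, about their own mean, divided by the total sum of squares. Large √b1 or b2, or small E_k, point to discordancy; the critical values used are the n = 10 rows of the tables, and the data are hypothetical.

```python
def central_moment(x, order):
    """Moment of the given order about the sample mean, divisor n."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def sample_skewness(x):
    """Assumed definition: sqrt(b1) = m3 / m2 ** 1.5 (Table XIVa)."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def sample_kurtosis(x):
    """Assumed definition: b2 = m4 / m2 ** 2 (Table XIVb)."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def tietjen_moore_ek(x, k):
    """Assumed definition of E_k (Table XV): keep the n - k observations
    closest to the sample mean, take their sum of squares about their own
    mean, and divide by the total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    total_ss = sum((v - mean) ** 2 for v in x)
    kept = sorted(x, key=lambda v: abs(v - mean))[: n - k]
    kept_mean = sum(kept) / len(kept)
    return sum((v - kept_mean) ** 2 for v in kept) / total_ss


# n = 10 rows of Tables XIVa, XIVb and XV (k = 2), as (5%, 1%).
SKEW_N10, KURT_N10, EK2_N10 = (0.9, 1.3), (3.9, 4.8), (0.172, 0.101)

one_high = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
two_wild = [1, 2, 3, 4, 5, 6, 7, 8, 30, -20]
print(sample_skewness(one_high), sample_kurtosis(one_high))
print(tietjen_moore_ek(two_wild, 2))
```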
Table XVIa Critical values for 5% and 1% tests for the presence of an undefined number of discordant values in a normal sample, using Shapiro and Wilk's W-statistic
N17

n     5%     1%
3     0.767  0.753
4     0.748  0.687
5     0.762  0.686
6     0.788  0.713
7     0.803  0.730
8     0.818  0.749
9     0.829  0.764
10    0.842  0.781
12    0.859  0.805
14    0.874  0.825
16    0.887  0.844
18    0.897  0.858
20    0.905  0.868
25    0.918  0.888
30    0.927  0.900
35    0.934  0.910
40    0.940  0.919
45    0.945  0.926
50    0.947  0.930

n = number of observations.

Table XVIb Values of the constants $a_{n,n-i+1}$ required for calculating Shapiro and Wilk's W-statistic $T_{N17} = L^2/S^2$, where

$$L = \sum_{i=1}^{[n/2]} a_{n,n-i+1}\left[x_{(n-i+1)} - x_{(i)}\right] \quad \text{and} \quad S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

                                         i
n   1       2       3       4       5       6       7       8       9       10
3   0.7071
4   0.6872  0.1677
5   0.6646  0.2413
6   0.6431  0.2806  0.0875
7   0.6233  0.3031  0.1401
8   0.6052  0.3164  0.1743  0.0561
9   0.5888  0.3244  0.1976  0.0947
10  0.5739  0.3291  0.2141  0.1224  0.0399
12  0.5475  0.3325  0.2347  0.1586  0.0922  0.0303
14  0.5251  0.3318  0.2460  0.1802  0.1240  0.0727  0.0240
16  0.5056  0.3290  0.2521  0.1939  0.1447  0.1005  0.0593  0.0196
18  0.4886  0.3253  0.2553  0.2027  0.1587  0.1197  0.0837  0.0496  0.0163
20  0.4734  0.3211  0.2565  0.2085  0.1686  0.1334  0.1013  0.0711  0.0422  0.0140
25  0.4450  0.3069  0.2543  0.2148  0.1822  0.1539  0.1283  0.1046  0.0823  0.0610
30  0.4254  0.2944  0.2487  0.2148  0.1870  0.1630  0.1415  0.1219  0.1036  0.0862
35  0.4096  0.2834  0.2427  0.2127  0.1883  0.1673  0.1487  0.1317  0.1160  0.1013
40  0.3964  0.2737  0.2368  0.2098  0.1878  0.1691  0.1526  0.1376  0.1237  0.1108
45  0.3850  0.2651  0.2313  0.2065  0.1865  0.1695  0.1545  0.1410  0.1286  0.1170
50  0.3751  0.2574  0.2260  0.2032  0.1847  0.1691  0.1554  0.1430  0.1317  0.1212

                                         i
n   11      12      13      14      15      16      17      18      19      20
25  0.0403  0.0200
30  0.0697  0.0537  0.0381  0.0227  0.0076
35  0.0873  0.0739  0.0610  0.0484  0.0361  0.0239  0.0119
40  0.0986  0.0870  0.0759  0.0651  0.0546  0.0444  0.0343  0.0244  0.0146  0.0049
45  0.1062  0.0959  0.0860  0.0765  0.0673  0.0584  0.0497  0.0412  0.0328  0.0245
50  0.1113  0.1020  0.0932  0.0846  0.0764  0.0685  0.0608  0.0532  0.0459  0.0386
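To show how Tables XVIa and XVIb fit together, the sketch below computes $T_{N17} = L^2/S^2$ for n = 10 from the tabulated coefficients $a_{10,10-i+1}$ (i = 1, ..., 5) and compares it with the 5% and 1% critical values. It assumes, as usual for this statistic, that small values of W indicate the presence of discordant values; the function name and the data are illustrative.

```python
import statistics

# Coefficients a_{10,10-i+1}, i = 1, ..., 5, copied from Table XVIb.
A_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]
# Critical values for n = 10, copied from Table XVIa.
W_CRIT_N10 = {0.05: 0.842, 0.01: 0.781}


def shapiro_wilk_w(sample, coeffs):
    """W = L**2 / S**2, with L = sum_i a_{n,n-i+1} (x(n-i+1) - x(i))."""
    x = sorted(sample)
    n = len(x)
    if len(coeffs) != n // 2:
        raise ValueError("expected [n/2] coefficients")
    # With 0-based index i, x[i] is x(i+1) and x[n-1-i] is x(n-i).
    L = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    mean = statistics.fmean(x)
    S2 = sum((v - mean) ** 2 for v in x)
    return L * L / S2


for data in ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]):
    w = shapiro_wilk_w(data, A_N10)
    # Assumption: discordant values are indicated when W falls below the critical value.
    print(f"W = {w:.3f}; significant at 1%: {w < W_CRIT_N10[0.01]}")
```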
Table XVIIa Critical values for 5% and 1% tests of discordancy for an upper outlier x(n) in a Poisson sample, using x(n) conditional on Σxi as test statistic
P1

5% critical values
                              Σxi - x(n)
n     0   1   2   3   4   5   6   8  10  12  14  16  18  20  22  24
3     4   7   9  11  13  15  16  20  24  27  31  34  38  41  44  48
4     4   6   8   9  11  13  14  17  21  24  27  30  33  36  39  42
5     3   5   7   9  10  12  13  16  19  22  25  28  30  33  36  39
6     3   5   7   8  10  11  12  15  18  21  24  26  29  32  34  37
8     3   5   6   7   9  10  11  14  17  19  22  25  27  30  32  35
10    3   4   6   7   8  10  11  14  16  19  21  24  26  28  31  33
12    3   4   6   7   8  10  11  13  16  18  21  23  25  28  30  32
16    3   4   5   7   8   9  10  13  15  17  20  22  24  27  29  31
20    2   4   5   6   8   9  10  12  15  17  19  22  24  26  28  31
25    2   4   5   6   7   9  10  12  14  17  19  21  23  26  28  30
50    2   4   5   6   7   8   9  12  14  16  18  20  22  25  27  29
100   2   3   5   6   7   8   9  11  13  15  18  20  22  24  26  28

The outlier x(n) is significant at 5% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of Σxi - x(n).

1% critical values
                              Σxi - x(n)
n     0   1   2   3   4   5   6   8  10  12  14  16  18  20  22  24
3     6   8  11  13  15  17  19  23  26  30  34  37  41  44  48  51
4     5   7   9  11  13  14  16  19  23  26  29  32  35  38  41  45
5     4   6   8  10  11  13  15  18  21  24  27  30  32  35  38  41
6     4   6   8   9  11  12  14  17  20  22  25  28  31  33  36  39
8     4   5   7   8  10  11  13  15  18  21  23  26  29  31  34  36
10    3   5   7   8   9  11  12  15  17  20  22  25  27  30  32  35
12    3   5   6   8   9  10  12  14  17  19  22  24  27  29  31  34
16    3   5   6   7   9  10  11  14  16  18  21  23  26  28  30  32
20    3   4   6   7   8  10  11  13  16  18  20  23  25  27  29  32
25    3   4   6   7   8   9  11  13  15  18  20  22  24  27  29  31
50    3   4   5   6   8   9  10  12  14  17  19  21  23  25  27  30
100   2   4   5   6   7   8   9  12  14  16  18  20  22  24  27  29

The outlier x(n) is significant at 1% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of Σxi - x(n).

n = number of observations.
x(n) = greatest observation.
Σxi = sum of observations.

Table XVIIb Critical values for 5% and 1% tests of discordancy for a lower outlier x(1) in a Poisson sample, using x(1) conditional on Σxi as test statistic
P2

5% critical values
                                 x(1)
n     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
3    11  16  21  25  29  33  37  41  45  49  53  57  61  64  68  72
4    16  23  30  36  42  47  53  58  64  69  74  80  85  90  95
5    21  31  39  47  55  62  69  76  83  89  96
6    27  38  49  58  68  76  85  93

The outlier x(1) is significant at 5% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of x(1).

1% critical values
                                 x(1)
n     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
3    15  20  26  30  35  39  44  48  52  56  60  64  68  72  76  80
4    21  30  37  43  50  56  62  67  73  79  84  90  95 100
5    28  39  48  57  65  72  80  87  95
6    35  48  60  70  80  89  98

The outlier x(1) is significant at 1% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of x(1).

n = number of observations.
x(1) = smallest observation.
Σxi = sum of observations.
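Conditional on Σxi, a Poisson sample with a common mean behaves as Σxi counts allocated at random, with equal probabilities, to n cells, so the conditional tail probabilities behind Tables XVIIa and XVIIb can be computed exactly. The sketch below does this in rational arithmetic using truncated exponential generating functions, a technique chosen here for illustration and not necessarily the method used to construct the tables. Assuming each entry is the smallest Σxi at which the tail probability falls to the nominal level or below, the sketch reproduces, for example, the n = 3 entries 4 and 7 of Table XVIIa and 11 of Table XVIIb at 5%. Function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def prob_all_counts_within(lo, hi, total, n):
    """P(every cell count lies in [lo, hi]) when `total` events fall
    independently and uniformly into n cells (equiprobable multinomial)."""
    hi = min(hi, total)
    if lo > hi:
        return Fraction(0)
    # Exponential generating function of one cell, truncated to [lo, hi].
    cell = [Fraction(1, factorial(j)) if j >= lo else Fraction(0) for j in range(hi + 1)]
    poly = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * min(len(poly) + len(cell) - 1, total + 1)
        for i, a in enumerate(poly):
            for j, c in enumerate(cell):
                if i + j > total:
                    break
                nxt[i + j] += a * c
        poly = nxt
    coeff = poly[total] if total < len(poly) else Fraction(0)
    return coeff * factorial(total) / Fraction(n) ** total


def poisson_upper_pvalue(sample):
    """P(x(n) >= observed maximum | sum of observations)."""
    n, total = len(sample), sum(sample)
    return 1 - prob_all_counts_within(0, max(sample) - 1, total, n)


def poisson_lower_pvalue(sample):
    """P(x(1) <= observed minimum | sum of observations)."""
    n, total = len(sample), sum(sample)
    return 1 - prob_all_counts_within(min(sample) + 1, total, total, n)


# Assumption: an entry is the smallest sum whose tail probability is <= the level.
print(poisson_upper_pvalue([0, 0, 4]), poisson_upper_pvalue([0, 1, 6]))
print(float(poisson_lower_pvalue([0, 5, 6])))
```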

Table XVIIIa Critical values for 5% and 1% tests of discordancy for an upper outlier-pair x(n-1), x(n) in a Poisson sample, using x(n-1) + x(n) conditional on Σxi as test statistic
P3

5% critical values
                     Σxi - x(n-1) - x(n)
n     0   1   2   3   4   5   6   8  10  12  14
4     7  11  14  17  20  23  25  30  35  40  45
5     6   9  12  14  16  19  21  25  29  34  38
6     6   8  11  13  15  17  19  23  26  30  34
8     5   7   9  11  13  15  16  20  23  26  29
10    5   7   9  10  12  14  15  18  21  24  27
12    4   6   8  10  11  13  14  17  20  23  26
16    4   6   8   9  11  12  13  16  19  22  24
20    4   6   7   9  10  12  13  16  18  21  23

The outlier-pair x(n-1), x(n) is significant at 5% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of Σxi - x(n-1) - x(n).

1% critical values
                     Σxi - x(n-1) - x(n)
n     0   1   2   3   4   5   6   8  10  12  14
4    10  14  17  20  23  26  29  34  39  45  50
5     8  11  14  16  19  21  24  28  32  37  41
6     7  10  12  15  17  19  21  25  29  33  36
8     6   8  11  13  14  16  18  21  25  28  31
10    6   8  10  12  13  15  17  20  23  26  29
12    5   7   9  11  12  14  16  19  22  25  27
16    5   7   8  10  12  13  14  17  20  23  26
20    5   7   8  10  11  12  14  17  19  21  24

The outlier-pair x(n-1), x(n) is significant at 1% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of Σxi - x(n-1) - x(n).

n = number of observations.
x(n) = greatest observation.
x(n-1) = second greatest observation.
Σxi = sum of observations.

Table XVIIIb Critical values for 5% and 1% tests of discordancy for a lower outlier-pair x(1), x(2) in a Poisson sample, using x(1) + x(2) conditional on Σxi as test statistic
P4

5% critical values
                x(1) + x(2)
n     0   1   2   3   4   5   6   7   8
4     7  11  14  17  20  23  25  28  30
5    11  16  20  24  27  31  34  38  41
6    15  20  26  30  35  39  44  48  52
8    22  31  38  45  51  57  63  69  75
10   31  42  51  60  68  76  84  91  99

The outlier-pair x(1), x(2) is significant at 5% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of x(1) + x(2).

1% critical values
                x(1) + x(2)
n     0   1   2   3   4   5   6   7   8
4    10  14  17  20  23  26  29  31  34
5    14  19  23  28  32  35  39  43  46
6    19  25  30  35  40  45  50  54  58
8    28  37  45  52  59  65  71  78  84
10   38  50  60  69  78  86  94

The outlier-pair x(1), x(2) is significant at 1% if Σxi is greater than or equal to the tabulated value in the column corresponding to the observed value of x(1) + x(2).

n = number of observations.
x(1) = smallest observation.
x(2) = next smallest observation.
Σxi = sum of observations.
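For the outlier-pair tests of Tables XVIIIa and XVIIIb the same conditional (equiprobable multinomial) model applies, but the statistic is now the sum of the two largest, or the two smallest, counts. The sketch below simply enumerates the sorted count vectors, so it is practical only for small n and Σxi; the function names are hypothetical. For n = 4 it gives a tail probability of 95/2048 (about 0.046) when Σxi = 7 and the remaining counts are zero, but 47/512 (about 0.092) when Σxi = 6, in line with the 5% entry 7 in the first column of both tables.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def sorted_count_vectors(total, cells, cap):
    """Yield non-increasing tuples of `cells` non-negative integers, each at
    most `cap`, that sum to `total`."""
    if cells == 0:
        if total == 0:
            yield ()
        return
    for first in range(min(total, cap), -1, -1):
        if first * cells < total:  # even `cells` copies of `first` fall short
            break
        for rest in sorted_count_vectors(total - first, cells - 1, first):
            yield (first,) + rest


def sorted_vector_probability(v, total):
    """P(the sorted equiprobable multinomial counts equal v), given `total` events."""
    n = len(v)
    arrangements = factorial(n)
    for multiplicity in Counter(v).values():
        arrangements //= factorial(multiplicity)
    one_arrangement = Fraction(factorial(total), prod(factorial(c) for c in v) * n ** total)
    return arrangements * one_arrangement


def poisson_pair_pvalue(sample, upper=True):
    """Conditional on the total: P(x(n-1) + x(n) >= observed) if upper,
    otherwise P(x(1) + x(2) <= observed)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    n, total = len(sample), sum(sample)
    x = sorted(sample)
    observed = x[-1] + x[-2] if upper else x[0] + x[1]
    p = Fraction(0)
    for v in sorted_count_vectors(total, n, total):  # each v is non-increasing
        if upper:
            extreme = v[0] + v[1] >= observed
        else:
            extreme = v[-1] + v[-2] <= observed
        if extreme:
            p += sorted_vector_probability(v, total)
    return p


print(poisson_pair_pvalue([0, 0, 3, 4]), float(poisson_pair_pvalue([0, 0, 3, 4])))
```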

Table XIX  Critical values for 5% and 1% tests of discordancy for an upper outlier $x_{(n)}$ in a binomial sample, using $x_{(n)}$ conditional on $\sum x_i$ as test statistic
B1, B2

5% critical values

$x_{(n)} = m$
           m
   n   3   4   5   6   7   8   9  10
   3   3   5   7  10  12  15  18  21
   4   3   6   9  12  16  20  23  27
   5   4   7  11  15  19  24  28  33
   6   4   8  12  17  22  28  33  38
   7   4   9  14  19  25  31  37  44
   8   5  10  15  22  28  35  42  50
   9   5  11  17  24  31  39  47  55
  10   6  11  18  26  34  43  51  60

An outlier $x_{(n)}$ equal to $m$ is judged discordant at level 5% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.

$x_{(n)} = m - 1$
           m
   n   3   4   5   6   7   8   9  10
   3           4   7   9  12  14  17
   4       3   5   8  11  15  18  22
   5       3   6  10  14  18  22  26
   6       4   7  11  16  20  25  31
   7       4   8  12  18  23  29  35
   8       4   9  14  20  26  32  39
   9       5   9  15  21  28  36  43
  10       5  10  16  23  31  39  47

An outlier $x_{(n)}$ equal to $m - 1$ is judged discordant at level 5% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.

$x_{(n)} = m - 2$
           m
   n   3   4   5   6   7   8   9  10
   3               4   6   9  11  14
   4           3   5   8  11  14  17
   5           3   6   9  13  17  21
   6           3   7  11  15  19  24
   7           4   7  12  17  22  27
   8           4   8  13  19  25  31
   9           4   9  14  20  27  34
  10           5   9  15  22  29  37

An outlier $x_{(n)}$ equal to $m - 2$ is judged discordant at level 5% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.

1% critical values

$x_{(n)} = m$
           m
   n   3   4   5   6   7   8   9  10
   3       4   6   8  10  13  16  19
   4       4   7  10  13  17  20  24
   5       5   8  12  16  20  24  29
   6   3   6   9  14  18  23  27  33
   7   3   6  11  15  21  26  31  38
   8   3   7  12  17  23  29  36  43
   9   3   7  13  19  25  33  40  47
  10   4   8  14  20  28  35  44  52

An outlier $x_{(n)}$ equal to $m$ is judged discordant at level 1% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.
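The $x_{(n)} = m$ entries can be recomputed directly. Under the null hypothesis the n counts are independent Binomial(m, p); given their total T, the successes fall like a random choice of T out of the nm trials, so p drops out and P(x_(n) = m | T) follows by inclusion-exclusion over how many samples are completely "full". The sketch below (function names are ours, not the book's) returns the largest T whose conditional probability is at most α, assuming the table is built from this exact conditional distribution. In the cells we spot-checked it agrees with the table: n = 5, m = 6 gives 15 at 5%, and n = 5, m = 3 has no attainable 1% value, matching the blank cell. The $m - 1$ and $m - 2$ cases need a longer enumeration and are not covered here.

```python
from math import comb


def prob_some_sample_full(n, m, total):
    """P(at least one of the n counts equals m | sum of counts = total).

    Under H0 the counts are independent Binomial(m, p). Given their total, the
    successes fall like a random choice of `total` out of the n*m trials, so p
    drops out. Inclusion-exclusion runs over k = number of 'full' samples.
    """
    denom = comb(n * m, total)
    prob = 0.0
    for k in range(1, min(n, total // m) + 1):
        ways = comb(n, k) * comb(n * m - k * m, total - k * m)
        prob += (-1) ** (k + 1) * ways / denom
    return prob


def critical_total(n, m, alpha=0.05):
    """Largest total T with P(x_(n) = m | T) <= alpha, or None if none exists.

    An observed x_(n) equal to m is then judged discordant when sum(x) <= T.
    """
    crit = None
    for total in range(m, n * m + 1):
        if prob_some_sample_full(n, m, total) > alpha:
            break
        crit = total
    return crit


# Hypothetical sample of n = 5 binomial counts with m = 6 trials each.
sample, m = [6, 1, 2, 1, 3], 6
limit = critical_total(len(sample), m)
print(max(sample) == m and sum(sample) <= limit)
```

Here `critical_total(5, 6)` is 15 and the sample total is 13, so the maximum count of 6 would be judged discordant at 5%.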

Table XIX (Continued)

1% critical values

$x_{(n)} = m - 1$
           m
   n   3   4   5   6   7   8   9  10
   3               5   8  10  12  15
   4           4   6   9  12  15  19
   5           5   7  11  14  18  22
   6           5   8  12  17  21  26
   7       3   6   9  14  19  24  29
   8       3   6  10  15  21  27  33
   9       3   7  11  17  23  29  36
  10       3   7  12  18  25  32  40

An outlier $x_{(n)}$ equal to $m - 1$ is judged discordant at level 1% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.

$x_{(n)} = m - 2$
           m
   n   3   4   5   6   7   8   9  10
   3                   5   7   9  12
   4               4   6   9  12  15
   5               4   7  10  14  17
   6               5   8  12  16  20
   7               5   9  13  18  22
   8           3   6  10  14  19  25
   9           3   6  10  16  21  28
  10           3   7  11  17  23  30

An outlier $x_{(n)}$ equal to $m - 2$ is judged discordant at level 1% if $\sum x_i$ is less than or equal to the tabulated value in the column corresponding to $m$.

n = number of observations.
$x_{(n)}$ = greatest observation.
$\sum x_i$ = sum of observations.
m = parameter of binomial distribution.

Table XX  Optimal choice of the number, $m^*$, of lower ordered sample values out of $n$ used in estimating the scale $\sigma$ of an exponential distribution, and associated relative efficiency $e_{m^*}$, when one observation has slipped in scale to $\sigma/h$ (reproduced by permission of the author and the American Statistical Association)

                                   h
            0.05        0.10        0.15        0.20        0.25
    n   m*   e_m*   m*   e_m*   m*   e_m*   m*   e_m*   m*   e_m*
    2    1  88.59    1  21.96    1   9.66    1   5.38    1   3.43
    3    1  74.71    2  17.58    2   7.88    2   4.49    2   2.93
    4    2  67.10    2  15.97    2   6.83    3   3.95    3   2.63
    5    3  60.42    3  14.51    3   6.29    4   3.57    4   2.41
    6    4  54.84    4  13.27    4   5.81    4   3.29    5   2.25
    7    5  50.19    5  12.22    5   5.40    5   3.10    6   2.12
    8    6  46.28    6  11.33    6   5.05    6   2.93    7   2.01
    9    7  42.96    7  10.57    7   4.75    7   2.79    8   1.93
   10    8  40.10    8   9.91    8   4.49    8   2.66    9   1.85
   15   12  30.73   13   7.65   13   3.59   13   2.22   13   1.62
   20   17  25.06   17   6.36   18   3.06   18   1.96   18   1.49
   30   27  18.45   27   4.87   27   2.46   28   1.68   28   1.34
   40   36  14.77   37   4.03   37   2.14   38   1.52   38   1.25
   50   46  12.39   47   3.49   47   1.93   48   1.42   48   1.20

                                   h
            0.30        0.35        0.40        0.45        0.50
    n   m*   e_m*   m*   e_m*   m*   e_m*   m*   e_m*   m*   e_m*
    2    1   2.39    1   1.79    1   1.41    1   1.16    1   1.00
    3    2   2.11    2   1.62    2   1.32    2   1.13    2   1.00
    4    3   1.93    3   1.52    3   1.27    3   1.11    3   1.00
    5    4   1.80    4   1.45    4   1.23    4   1.09    4   1.00
    6    5   1.70    5   1.39    5   1.20    5   1.08    6   1.00
    7    6   1.63    6   1.35    6   1.17    6   1.07    7   1.00
    8    7   1.56    7   1.31    7   1.16    7   1.06    8   1.00
    9    8   1.51    8   1.28    8   1.14    8   1.05    9   1.00
   10    9   1.47    9   1.26    9   1.13    9   1.05   10   1.00
   15   14   1.33   14   1.17   14   1.08   14   1.03   14   1.00
   20   19   1.25   19   1.13   19   1.06   19   1.02   20   1.00
   30   29   1.17   29   1.08   29   1.04   29   1.01   30   1.00
   40   39   1.12   39   1.06   39   1.03   39   1.01   40   1.00
   50   49   1.10   49   1.05   49   1.02   49   1.00   50   1.00
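Table XX says how many of the smallest observations to keep, but this excerpt does not restate the estimator itself. A minimal sketch, assuming the usual censored-sample estimator for a one-parameter exponential built on the "total time on test" $T_m = x_{(1)} + \dots + x_{(m)} + (n - m)x_{(m)}$: since $T_m/\sigma$ is Gamma(m, 1), dividing by m gives an unbiased estimate and dividing by m + 1 minimises mean squared error among multiples of $T_m$. Which divisor underlies the tabulated efficiencies is an assumption; `M_STAR_N10` (a hypothetical helper) just copies the n = 10 row of the table.

```python
def censored_exponential_scale(sample, m, divisor=None):
    """Estimate the exponential scale sigma from the m smallest observations.

    T_m = x_(1) + ... + x_(m) + (n - m) * x_(m) is the 'total time on test'.
    For a one-parameter exponential, T_m / sigma ~ Gamma(m, 1), so divisor m
    (the default) is unbiased and divisor m + 1 minimises mean squared error.
    Which divisor the Table XX efficiencies refer to is an assumption here.
    """
    xs = sorted(sample)
    n = len(xs)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    t_m = sum(xs[:m]) + (n - m) * xs[m - 1]
    return t_m / (m if divisor is None else divisor)


# m* for n = 10, copied from the n = 10 row of Table XX and keyed by h.
M_STAR_N10 = {0.05: 8, 0.10: 8, 0.15: 8, 0.20: 8, 0.25: 9,
              0.30: 9, 0.35: 9, 0.40: 9, 0.45: 9, 0.50: 10}
```

For a sample of ten with a suspected slip to σ/0.20, one would call `censored_exponential_scale(sample, M_STAR_N10[0.20])`, that is, use the 8 smallest values and let the largest two enter only through $x_{(8)}$.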

Table XXI  Critical values for the Mosteller non-parametric slippage test (5% and 1%)

[Values for m = 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 100 and ∞ (rows) by n = 2, 3, 4, 5, 6 (columns) are not legible in this copy.]

n = number of samples; m = sample size.

Table XXII  Critical values of $T_{\max}$ for the Doornbos and Prins rank test of slippage

Each entry is the 5% value / the 1% value; ? marks a value not legible in this copy.

                                 n
    m        2          3          4          5          6
    3        ?       24 / ?     32 / ?     40 / ?        ?
    4     26 / ?        ?          ?          ?          ?
    5        ?          ?          ?          ?      120 / 128
    6        ?          ?      109 / 116  139 / 150  168 / 181
    7        ?      106 / 113  145 / 155  183 / 197  222 / 238
    8        ?      136 / 145  184 / 198  234 / 250  283 / 303

n = number of samples; m = sample size.

Table XXIII  Critical values of $M(1, n)$ for the Conover non-parametric slippage test (5% and 1%) (reproduced by permission of the American Statistical Association)

[Tabulated values are not legible in this copy.]

n = number of samples; m = sample size.
Table XXV  Approximate critical values and sizes for tests for upward slippage in one of $n$ Poisson distributions (5% and 1%) (reproduced by permission of Mathematisch Centrum, Amsterdam)

Each entry gives a critical value followed by the size of the test; ? marks a value not legible in this copy.

                                                n
  t  level       2       3       4       5       6       7       8       9      10
  3     5%                          3 .040  3 .028  3 .020  3 .016  3 .012  3 .010
        1%                                                                       ?
  4     5%          4 .037  4 .016  4 .008  4 .005  4 .003  4 .002  3 .045  3 .037
        1%                          4 .008  4 .005  4 .003  4 .002  4 .001  4 .001
  5     5%          5 .012  5 .004  4 .034  4 .020  4 .013  4 .009  4 .006  4 .005
        1%                  5 .004  5 .002  5 .001  5 .000  4 .009  4 .006  4 .005
  6     5%  6 .031  6 .004  5 .019  5 .008  5 .004  4 .035  4 .024  4 .017  4 .013
        1%          6 .004  6 .001  5 .008  5 .004  5 .002  5 .001  5 .001  5 .001
  7     5%  7 .016  6 .021  6 .005  5 .023  5 .012  5 .007  5 .004  4 .037  4 .027
        1%          7 .001  6 .005  6 .002  6 .001  5 .007  5 .004  5 .003  5 .002
  8     5%  8 .008  7 .008  6 .017  6 .006  5 .028  5 .016  5 .010  5 .006  5 .004
        1%  8 .008  7 .008  7 .002  6 .006  6 .003  6 .001  5 .010  5 .006  5 .004
  9     5%  8 .039  7 .025  6 .040  6 .015  6 .007  5 .032  5 .020  5 .013  5 .009
        1%  9 .004  8 .003  7 .005  7 .002  6 .007  6 .003  6 .002  6 .001  5 .009
 10     5%  9 .021  8 .010  7 .014  6 .032  6 .015  6 .008  5 .036  5 .024  5 .016
        1% 10 .002  9 .001  8 .002  7 .004  7 .002  6 .008  6 .004  6 .002  6 .001
 11     5% 10 .012  8 .027  7 .030  7 .010  6 .028  6 .015  6 .008  5 .040  5 .028
        1% 11 .001  9 .004  8 .005  7 .010  7 .004  7 .002  6 .008  6 .005  6 .003
 12     5% 10 .039  9 .012  8 .011  7 .020  6 .048  6 .026  6 .015  6 .009  5 .043
        1% 11 .006 10 .002  9 .002  8 .003  7 .008  7 .004  7 .002  6 .009  6 .005
 13     5% 11 .022  9 .027  8 .023  7 .035  7 .015  6 .042  6 .024  6 .015  6 .009
        1% 12 .003 10 .005  9 .004  8 .006  8 .002  7 .007  7 .003  7 .002  6 .009
 14     5% 12 .013 10 .012  8 .041  8 .012  7 .025  7 .012  6 .038  6 .023  6 .015
        1% 13 .002 11 .002  9 .009  9 .002  8 .004  8 .002  7 .006  7 .003  7 .002
 15     5% 12 .035 10 .026  9 .017  8 .021  7 .040  7 .019  7 .010  6 .035  6 .022
        1% 13 .007 11 .005 10 .003  9 .004  8 .008  8 .003  8 .001  7 .005  7 .003
 16     5% 13 .021 10 .048  9 .030  8 .035  8 .013  7 .030  7 .016  7 .009  6 .033
        1% 14 .004 12 .002 10 .007  9 .007  9 .002  8 .005  8 .002  7 .009  7 .005
 17     5% 13 .049 11 .024  9 .050  9 .013  8 .021  7 .045  7 .024  7 .013  6 .047
        1% 15 .002 12 .006 11 .002 10 .002  9 .004  8 .009  8 .004  8 .002  7 .008
 18     5% 14 .031 11 .044 10 .022  9 .021  8 .032  8 .014  7 .035  7 .020  7 .012
        1% 15 .008 13 .003 11 .005 10 .005  9 .007  9 .003  8 .007  8 .003  8 .002
 19     5% 15 .019 12 .022 10 .036  9 .033  8 .048  8 .021  7 .050  7 .028  7 .017
        1% 16 .004 13 .006 11 .009 10 .008 10 .002  9 .004  9 .002  8 .005  8 .003
 20     5% 15 .041 12 .039 11 .016  9 .050  9 .017  8 .031  8 .015  7 .040  7 .024
        1% 17 .003 14 .003 12 .004 11 .003 10 .004  9 .007  9 .003  8 .008  8 .004
 21     5% 16 .027 13 .021 11 .026 10 .020  9 .026  8 .044  8 .022  8 .011  7 .033
        1% 17 .007 14 .006 12 .007 11 .005 10 .006 10 .002  9 .004  9 .002  8 .006
 22     5% 17 .017 13 .035 11 .040 10 .031  9 .037  9 .015  8 .031  8 .016  7 .044
        1% 18 .004 15 .003 13 .003 11 .008 10 .009 10 .003  9 .007  9 .003  8 .009
 23     5% 17 .035 14 .019 12 .019 10 .045 10 .014  9 .022  8 .042  8 .022  8 .012
        1% 19 .003 15 .005 13 .005 12 .003 11 .003 10 .005  9 .010  9 .004  9 .002
 24     5% 18 .023 14 .031 12 .029 11 .019 10 .020  9 .030  9 .014  8 .030  8 .017
        1% 19 .007 15 .010 13 .008 12 .005 11 .005 10 .007 10 .003  9 .006  9 .003
 25     5% 18 .043 14 .049 12 .043 11 .028 10 .029  9 .041  9 .019  8 .040  8 .023
        1% 20 .004 16 .005 14 .004 12 .008 11 .008 11 .002 10 .004  9 .009  9 .005

t = sum of all observations.

Table XXVI  Critical values for 5% and 1% tests of discordancy of a single outlier in a multivariate normal sample where $V$ is known and the test statistic is $\max_{j=1,2,\ldots,n} (x_j - \bar{x})' V^{-1} (x_j - \bar{x})$ (reproduced by permission of The Institute of Statistical Mathematics)

             p = 2           p = 3           p = 4
    n      5%      1%      5%      1%      5%      1%
    3    5.32    7.53    6.69    9.07    7.92   10.45
    4    6.48    8.95    8.05   10.70    9.47   12.28
    5    7.29    9.92    9.00   11.81   10.54   13.51
    6    7.91   10.64    9.72   12.63   11.34   14.41
    7    8.41   11.21   10.28   13.28   11.97   15.12
    8    8.82   11.68   10.74   13.80   12.49   15.70
    9    9.18   12.08   11.15   14.24   12.93   16.19
   10    9.48   12.42   11.49   14.62   13.31   16.61
   12    9.99   12.98   12.05   15.26   13.94   17.29
   14   10.40   13.44   12.53   15.76   14.45   17.83
   16   10.77   13.88   12.93   16.18   14.87   18.28
   18   11.06   14.13   13.26   16.53   15.23   18.66
   20   11.32   14.42   13.55   16.84   15.55   18.99
   25   11.88   15.02   14.15   17.47   16.19   19.67
   30   12.31   15.49   14.63   17.96   16.70   20.21

n = number of observations; p = dimension.

Table XXVII  Critical values for 5% and 1% tests of discordancy of a single outlier in a multivariate normal sample where $\mu$ and $V$ are known, and the test statistic is $R_{(n)}(\mu, V) = \max_{j=1,2,\ldots,n} (x_j - \mu)' V^{-1} (x_j - \mu)$; ? marks a value not legible in this copy

             p = 2           p = 4           p = 6           p = 8          p = 10
    n      5%      1%      5%      1%      5%      1%      5%      1%      5%      1%
    3    8.15   11.40   12.05   15.77   15.46   19.54   18.63   23.02   21.66       ?
    4    8.73   11.98   12.72   16.42   16.20   20.24   19.43   23.76   22.50   27.10
    5    9.17   12.42   13.23   16.91   16.76   20.78   20.03   24.34   23.15   27.71
    6    9.53   12.79   13.65   17.32   17.22   21.22   20.53   24.81   23.67   28.21
    7    9.84   13.09   14.00   17.66   17.60   21.59   20.94   25.20   24.12   28.62
    8   10.11   13.36   14.30   17.96   17.94   21.91   21.30   25.54   24.49   28.98
    9   10.34   13.61   14.57   18.24   18.23   22.21   21.61   25.86   24.83   29.31
   10   10.55   13.81   14.81   18.46   18.49   22.45   21.89   26.11   25.12   29.58
   25   12.38   15.64   16.87   20.48   20.73   24.62   24.29   28.41   27.65   31.99
   50   13.77   17.02   18.41   21.99   22.40   26.24   26.06   30.12   29.52   33.78
  100   15.15   18.41   19.94   23.50   24.04   27.84   27.80   31.82   31.35   35.55
  200   16.54   19.80   21.46   25.00   25.67   29.44   29.52   33.49   33.16   37.30
  500   18.37   21.63   23.46   26.98   27.80   31.95   31.77   35.68   35.50   39.58
 1000   19.76   23.03   24.96   28.47   29.39   33.11   33.44   39.33   37.25   41.30

n = number of observations; p = dimension.
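A sketch of where the Table XXV entries come from, assuming the test declares upward slippage when the largest of the n counts reaches the critical value c. Given the total t, Poisson counts with a common mean are multinomial with equal cell probabilities 1/n, so the size of "max ≥ c" is n·P(Bin(t, 1/n) ≥ c) exactly when 2c > t (two cells cannot both reach c), and that sum is an upper bound otherwise. Function names are illustrative, not from the source. For the cells we spot-checked it reproduces the tabulated pair, for example n = 5, t = 5 gives c = 4 with size 0.0336 (tabulated 4 .034).

```python
from math import comb


def slippage_size(n, t, c):
    """n * P(Bin(t, 1/n) >= c): size of the rule 'max count >= c' given total t.

    Given t, under H0 the n Poisson counts are multinomial with equal cell
    probabilities 1/n. The value is exact when 2c > t (two cells cannot both
    reach c); otherwise it is a Bonferroni upper bound on the size.
    """
    p = 1.0 / n
    tail = sum(comb(t, k) * p**k * (1 - p) ** (t - k) for k in range(c, t + 1))
    return n * tail


def slippage_critical_value(n, t, alpha=0.05):
    """Smallest c whose size is <= alpha, returned with that size; else None."""
    for c in range(1, t + 1):
        size = slippage_size(n, t, c)
        if size <= alpha:
            return c, size
    return None
```

Searching from the smallest c upward gives the least conservative rule whose size stays within α, which is why several tabulated sizes sit well below 0.05: the next smaller c would overshoot.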


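When V is known, the statistics of Tables XXVI and XXVII are quadratic forms that are cheap to compute. Table XXVII allows more: with μ and V both known, the n forms $(x_j - \mu)' V^{-1} (x_j - \mu)$ are independent $\chi^2_p$ variables under the null hypothesis, so the critical value c solves $(1 - P(\chi^2_p > c))^n = 1 - \alpha$. Every tabulated dimension is even, and for even p the tail is $e^{-x/2}\sum_{k<p/2} (x/2)^k/k!$, so bisection with the standard library suffices. This sketch assumes NumPy is available and uses function names of our own; it gives 8.15 and 11.40 (to two decimals) for n = 3, p = 2, the first row of Table XXVII. The Table XXVI statistic is centred at $\bar{x}$, which makes the terms dependent, so its critical values are still read from the table.

```python
import math

import numpy as np


def max_quadratic_form(x, v, mu=None):
    """Return (max_j q_j, argmax j) with q_j = (x_j - c)' V^{-1} (x_j - c).

    c is the sample mean when mu is None (Table XXVI statistic) and the known
    mean mu otherwise (Table XXVII statistic R_(n)(mu, V)).
    """
    x = np.asarray(x, dtype=float)
    centre = x.mean(axis=0) if mu is None else np.asarray(mu, dtype=float)
    dev = x - centre
    # Solve V z_j = dev_j instead of forming V^{-1} explicitly.
    z = np.linalg.solve(np.asarray(v, dtype=float), dev.T).T
    q = np.einsum("ij,ij->i", dev, z)
    j = int(np.argmax(q))
    return float(q[j]), j


def chi2_sf_even(x, p):
    """Upper tail P(chi^2_p > x) for even p via the Poisson-sum identity."""
    half = x / 2.0
    term = total = 1.0
    for k in range(1, p // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def known_mu_v_critical_value(n, p, alpha):
    """c with P(max of n iid chi^2_p > c) = alpha, found by bisection."""
    target = 1.0 - (1.0 - alpha) ** (1.0 / n)
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, p) > target:
            lo = mid  # tail still too heavy: c must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```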
Table XXVIII  Critical values for 5% and 1% tests of discordancy of a single outlier in a multivariate normal sample where $\mu$ and $V$ are unknown and the test statistic is $R_{(n)}(\bar{x}, S) = \max_{j=1,2,\ldots,n} (x_j - \bar{x})' S^{-1} (x_j - \bar{x})$

             p = 2           p = 3           p = 4           p = 5
    n      5%      1%      5%      1%      5%      1%      5%      1%
    5    3.17    3.19
    6    4.00    4.11    4.14    4.16
    7    4.71    4.95    5.01    5.10    5.12    5.14
    8    5.32    5.70    5.77    5.97    6.01    6.09    6.11    6.12
    9    5.85    6.37    6.43    6.76    6.80    6.97    7.01    7.08
   10    6.32    6.97    7.01    7.47    7.50    7.79    7.82    7.98
   12    7.10    8.00    7.99    8.70    8.67    9.20    9.19    9.57
   14    7.74    8.84    8.78    9.71    9.61   10.37   10.29   10.90
   16    8.27    9.54    9.44   10.56   10.39   11.36   11.20   12.02
   18    8.73   10.15   10.00   11.28   11.06   12.20   11.96   12.98
   20    9.13   10.67   10.49   11.91   11.63   12.93   12.62   13.81
   25    9.94   11.73   11.48   13.18   12.78   14.40   13.94   15.47
   30   10.58   12.54   12.24   14.14   13.67   15.51   14.95   16.73
   35   11.10   13.20   12.85   14.92   14.37   16.40   15.75   17.73
   40   11.53   13.74   13.36   15.56   14.96   17.13   16.41   18.55
   45   11.90   14.20   13.80   16.10   15.46   17.74   16.97   19.24
   50   12.23   14.60   14.18   16.56   15.89   18.27   17.45   19.83
  100   14.22   16.95   16.45   19.26   18.43   21.30   20.26   23.17
  200   15.99   18.94   18.42   21.47   20.59   23.72   22.59   25.82
  500   18.12   21.22   20.75   23.95   23.06   26.37   25.21   28.62

n = number of observations; p = dimension.

Table XXIX  Critical values of $r^2$ for 5% and 1% tests of discordancy for a pair of outliers in a multivariate normal sample where $\mu$ and $V$ are unknown, using the test statistic $r^2 = \min_{j_1, j_2} R_{j_1, j_2}$

                p = 2               p = 3               p = 4               p = 5
    n        5%        1%        5%        1%        5%        1%        5%        1%
    5    0.0025    0.0005    0.0000
    6    0.0337    0.0150    0.0011    0.0002
    7    0.0860    0.0498    0.0202    0.0090    0.0006    0.0001
    8    0.1417    0.0937    0.0580    0.0335    0.0136    0.0060    0.0004    0.0001
    9    0.1942    0.1393    0.1024    0.0674    0.0425    0.0245    0.0098    0.0043
   10    0.2419    0.1831    0.1470    0.1049    0.0788    0.0518    0.0327    0.0189
   12    0.3229    0.2616    0.2288    0.1791    0.1549    0.1163    0.0966    0.0686
   14    0.3879    0.3276    0.2982    0.2460    0.2246    0.1804    0.1631    0.1270
   16    0.4410    0.3828    0.3563    0.3040    0.2853    0.2389    0.2242    0.1838
   18    0.4850    0.4295    0.4054    0.3542    0.3376    0.2908    0.2782    0.2360
   20    0.5221    0.4694    0.4472    0.3976    0.3828    0.3366    0.3257    0.2830
   25    0.5935    0.5472    0.5288    0.4839    0.4722    0.4290    0.4211    0.3798
   30    0.6451    0.6041    0.5882    0.5478    0.5380    0.4984    0.4923    0.4537
   35    0.6842    0.6475    0.6335    0.5969    0.5885    0.5523    0.5473    0.5116
   40    0.7150    0.6818    0.6693    0.6360    0.6285    0.5953    0.5911    0.5580
   45    0.7399    0.7097    0.6982    0.6677    0.6610    0.6304    0.6267    0.5961
   50    0.7605    0.7328    0.7222    0.6941    0.6880    0.6596    0.6564    0.6279
  100    0.8629    0.8477    0.8417    0.8260    0.8225    0.8065    0.8047    0.7883
  200    0.9232    0.9152    0.9118    0.9035    0.9015    0.8929    0.8918    0.8830
  500    0.9650    0.9618    0.9602    0.9568    0.9558    0.9523    0.9517    0.9480

n = number of observations; p = dimension.
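A minimal NumPy sketch of the Table XXVIII statistic, assuming S is the sample covariance matrix with divisor n - 1. With that divisor the statistic can never exceed $(n-1)^2/n$, which fits the small-n rows (3.17 and 3.19 for n = 5, p = 2, just below 3.2). The function name and the data in the usage lines are invented for illustration.

```python
import numpy as np


def max_sample_mahalanobis(x):
    """R_(n)(xbar, S): largest (x_j - xbar)' S^{-1} (x_j - xbar) and its index.

    S is the sample covariance with divisor n - 1 (an assumption here); with
    that divisor the statistic is bounded above by (n - 1)**2 / n.
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean(axis=0)
    s = np.cov(x, rowvar=False)  # rows are observations; divisor n - 1
    d2 = np.einsum("ij,ij->i", dev, np.linalg.solve(s, dev.T).T)
    j = int(np.argmax(d2))
    return float(d2[j]), j


# Hypothetical data: ten bivariate points, the last far from the rest.
data = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
        [0.2, 0.8], [0.8, 0.3], [0.4, 0.6], [0.3, 0.1], [50, 50]]
stat, idx = max_sample_mahalanobis(data)
# Compare with the n = 10, p = 2 entry of Table XXVIII (5% value 6.32).
print(idx, round(stat, 2), stat > 6.32)
```

Because one remote point inflates S mainly along its own direction, its distance approaches but never reaches the $(n-1)^2/n = 8.1$ ceiling, so a single gross outlier in a sample of ten is still comfortably above the 5% value.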

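For the pair test of Table XXIX we read $R_{j_1, j_2}$ as the ratio $|A_{(j_1, j_2)}| / |A|$, where A is the sum-of-squares-and-products matrix of the full sample and $A_{(j_1, j_2)}$ that of the sample with $x_{j_1}$ and $x_{j_2}$ deleted; this Wilks-type reading of the statistic is an assumption. Small values of $r^2$ point to a discordant pair, so we also assume the pair is declared discordant when $r^2$ is at or below the tabulated value. The brute-force loop (function names are ours) examines all $n(n-1)/2$ pairs, which is cheap at the sample sizes tabulated.

```python
from itertools import combinations

import numpy as np


def ssp(x):
    """Sum-of-squares-and-products matrix of the rows of x about their mean."""
    dev = x - x.mean(axis=0)
    return dev.T @ dev


def wilks_pair_statistic(x):
    """r^2 = min over pairs of |A_(j1,j2)| / |A|, with the minimising pair.

    A_(j1,j2) is recomputed from the remaining n - 2 rows, so the mean is
    re-estimated after deletion (an assumption about the tabulated statistic).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    det_all = np.linalg.det(ssp(x))
    best, best_pair = np.inf, None
    for j1, j2 in combinations(range(n), 2):
        keep = [k for k in range(n) if k not in (j1, j2)]
        ratio = np.linalg.det(ssp(x[keep])) / det_all
        if ratio < best:
            best, best_pair = ratio, (j1, j2)
    return float(best), best_pair


# Hypothetical data: eight clustered points plus two distant ones.
data = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5],
        [0.2, 0.8], [0.8, 0.3], [0.4, 0.6], [30, 31], [31, 30]]
r2, pair = wilks_pair_statistic(data)
# Table XXIX, n = 10, p = 2: 5% critical value 0.2419.
print(pair, r2, r2 <= 0.2419)
```

Deleting only one of two nearby outliers leaves the other still inflating the determinant, which is why a joint (pair) statistic is used rather than two single-outlier tests in sequence.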
Table XXX  Critical values for 5% and 1% tests of discordancy of a single outlier in a bivariate normal sample where $\mu$ and $V$ are unknown, and the test statistic is $R_{(n)}(\bar{x}, S_\nu) = \max_{j=1,2,\ldots,n} (x_j - \bar{x})' S_\nu^{-1} (x_j - \bar{x})$, where $S_\nu$ is an 'external' estimate of $V$ (reproduced by permission of The Institute of Statistical Mathematics); ? marks a digit not legible in this copy

5% test
                                        n
    ν      3      4      5      6      7      8      9     10     11     12     14
   20   6.88   8.53   9.72  10.64  11.44  12.10  12.67  13.18  13.56  14.04  14.76
   22   6.72   8.30   9.45  10.36  11.10  11.73  12.28  12.76  13.19  13.58  14.26
   24   6.58   8.13   9.24  10.12  10.83  11.44  11.97  12.43  12.84  13.21  13.86
   26   6.47   7.98   9.06   9.92  10.61  11.20  11.71  12.16  12.55  12.91  13.54
   28   6.37   7.86   8.92   9.75  10.42  11.00  11.49  11.93  12.31  12.66  13.28
   30   6.29   7.75   8.79   9.61  10.27  10.83  11.31  11.74  12.11  12.45  13.05
   32   6.23   7.66   8.69   9.49  10.14  10.69  11.16  11.57  11.94  12.27  12.86
   34   6.17   7.58   8.60   9.38  10.02  10.56  11.02  11.43  11.79  12.12  12.69
   36   6.12   7.51   8.51   9.29   9.92  10.45  10.91  11.31  11.67  11.98  12.55
   38   6.07   7.45   8.44   9.21   9.83  10.35  10.81  11.20  11.55  11.87  12.42
   40   6.03   7.40   8.38   9.14   9.75  10.27  10.71  11.10  11.45  11.76  12.30
   45   5.94   7.29   8.25   8.99   9.59  10.09  10.53  10.91  11.24  11.54  12.07
   50   5.88   7.20   8.14   8.87   9.46   9.95  10.38  10.75  11.08  11.37  11.89
   55   5.82   7.13   8.06   8.78   9.36   9.84  10.26  10.63  10.95  11.24  11.74
   60   5.78   7.07   7.99   8.70   9.27   9.75  10.16  10.52  10.84  11.13  11.62
  100   5.59   6.82   7.70   8.37   8.91   9.36   9.75  10.09  10.39  10.65  11.12
  150   5.50   6.71   7.56   8.21   8.74   9.18   9.55   9.8?  10.17  10.43  10.88
  200   5.45   6.65   7.49   8.13   8.64   9.09   9.46   9.78  10.06  10.32  10.76

Table XXX (Continued)

1% test
                                        n
    ν      3      4      5      6      7      8      9     10     11     12     14
   20  10.72  12.99  14.61  15.85  16.86  17.71  18.44  19.08  19.66  20.17  21.07
   22  10.36  12.53  14.07  15.24  16.20  17.00  17.69  18.29  18.83  19.31  20.16
   24  10.07  12.16  13.63  14.76  15.68  16.44  17.10  17.67  18.18  18.64  19.44
   26   9.84  11.86  13.28  14.37  15.25  15.98  16.62  17.16  17.66  18.09  18.86
   28   9.64  11.63  12.99  14.05  14.90  15.62  16.22  16.75  17.22  17.64  18.37
   30   9.47  11.40  12.74  13.77  14.60  15.30  15.88  16.40  16.85  17.26  17.97
   32   9.33  11.22  12.54  13.54  14.35  15.02  15.60  16.10  16.54  16.94  17.63
   34   9.21  11.06  12.36  13.34  14.13  14.79  15.35  15.84  16.28  16.66  17.34
   36   9.10  10.93  12.20  13.17  13.95  14.59  15.14  15.62  16.04  16.42  17.08
   38   9.01  10.81  12.06  13.01  13.78  14.41  14.95  15.42  15.84  16.21  16.86
   40   8.93  10.70  11.94  12.88  13.63  14.26  14.79  15.25  15.66  16.03  16.66
   45   8.75  10.48  11.69  12.60  13.33  13.93  14.45  14.89  15.29  15.64  16.25
   50   8.62  10.31  11.49  12.38  13.09  13.68  14.18  14.62  15.00  15.34  15.93
   55   8.51  10.18  11.33  12.21  12.90  13.48  13.97  14.39  14.77  15.10  15.68
   60   8.42  10.07  11.20  12.06  12.75  13.32  13.80  14.21  14.58  14.91  15.48
  100   8.05   9.59  10.66  11.46  12.10  12.63  13.07  13.45  13.79  14.09  14.61
  150   7.87   9.37  10.40  11.18  11.79  12.30  12.73  13.10  13.42  13.71  14.21
  200   7.79   9.26  10.28  11.04  11.62  12.14  12.57  12.92  13.24  13.52  14.01

n = number of observations; ν = number of degrees of freedom.

332 333
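Table XXX tabulates critical values of R = max over j = 1, ..., n of (x_j - xbar)' S_v^{-1} (x_j - xbar). Here xbar is the sample mean vector and S_v is an 'external' estimate of the covariance matrix V on ν degrees of freedom.

The Python sketch below computes R for bivariate data. It is a sketch under stated assumptions, not the authors' procedure:

- The function name `max_quadratic_form_external` is hypothetical.
- S_v is assumed to be supplied by the user.
- S_v is assumed to be positive definite and independent of the sample.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def max_quadratic_form_external(points: Sequence[Point], s_v: Matrix2) -> Tuple[float, int]:
    """Hypothetical helper: return (R, j), the largest (x_j - xbar)' S_v^{-1} (x_j - xbar)
    and the index j attaining it.

    s_v is an 'external' 2x2 covariance estimate, assumed positive definite and
    independent of the sample.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations")
    (a, b), (c, d) = s_v
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("s_v must be positive definite")
    # Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-c, a]]
    i11, i12, i21, i22 = d / det, -b / det, -c / det, a / det
    # Sample mean vector xbar = (xbar, ybar)
    xbar = sum(p[0] for p in points) / n
    ybar = sum(p[1] for p in points) / n
    best, best_j = float("-inf"), -1
    for j, (x, y) in enumerate(points):
        dx, dy = x - xbar, y - ybar
        # Quadratic form (x_j - xbar)' S_v^{-1} (x_j - xbar)
        q = dx * (i11 * dx + i12 * dy) + dy * (i21 * dx + i22 * dy)
        if q > best:  # strict '>' keeps the first index on ties
            best, best_j = q, j
    return best, best_j


# Illustrative data: four points, the last one displaced along the first axis.
r_stat, j = max_quadratic_form_external([(0, 0), (0, 0), (0, 0), (4, 0)], ((2.0, 0.0), (0.0, 1.0)))
print(r_stat, j)  # prints 4.5 3
```

The value of R is then referred to the row for ν and the column for n. For instance, with n = 10 and ν = 20 the 5% point is 13.18, so a larger R would be judged discordant at that level.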
Table XXXI 5% and 1% critical values for the maximum normed residual, for testing the discordancy of a single outlier in an r × c factorial experiment (reproduced by permission of the American Statistical Association and the American Society for Quality Control)

5% critical values
    r    c=3      4      5      6      7      8      9
    3  0.648  0.645  0.624  0.600  0.577  0.555  0.535
    4         0.621  0.590  0.561  0.535  0.513  0.493
    5                0.555  0.525  0.499  0.477  0.457
    6                       0.495  0.469  0.447  0.428
    7                              0.444  0.423  0.405
    8                                     0.402  0.385
    9                                            0.368

1% critical values
    r    c=3      4      5      6      7      8      9
    3  0.660  0.675  0.664  0.646  0.626  0.606  0.587
    4         0.665  0.640  0.613  0.588  0.565  0.544
    5                0.608  0.578  0.551  0.527  0.506
    6                       0.546  0.519  0.495  0.475
    7                              0.492  0.469  0.449
    8                                     0.446  0.426
    9                                            0.407

Table XXXII Critical values for 5% and 1% tests of discordancy for a single outlier in a general linear model with normal error structure, using the studentized residual as test statistic (reproduced by permission of the American Statistical Association and the American Society for Quality Control)

5% critical values
    n    q=1      2      3      4      5      6      8     10     15     25
    5   1.92
    6   2.07   1.93
    7   2.19   2.08   1.94
    8   2.28   2.20   2.10   1.94
    9   2.35   2.29   2.21   2.10   1.95
   10   2.42   2.37   2.31   2.22   2.11   1.95
   12   2.52   2.49   2.45   2.39   2.33   2.24   1.96
   14   2.61   2.58   2.55   2.51   2.47   2.41   2.25   1.96
   16   2.68   2.66   2.63   2.60   2.57   2.53   2.43   2.26
   18   2.73   2.72   2.70   2.68   2.65   2.62   2.55   2.44
   20   2.78   2.77   2.76   2.74   2.72   2.70   2.64   2.57   2.15
   25   2.89   2.88   2.87   2.86   2.84   2.83   2.80   2.76   2.60
   30   2.96   2.96   2.95   2.94   2.93   2.93   2.90   2.88   2.79   2.17
   35   3.03   3.02   3.02   3.01   3.00   3.00   2.93   2.97   2.91   2.64
   40   3.08   3.08   3.07   3.07   3.06   3.06   3.05   3.03   3.00   2.84
   45   3.13   3.12   3.12   3.12   3.11   3.11   3.10   3.09   3.06   2.96
   50   3.17   3.16   3.16   3.16   3.15   3.15   3.14   3.14   3.11   3.04
   60   3.23   3.23   3.23   3.23   3.22   3.22   3.22   3.21   3.20   3.15
   70   3.29   3.29   3.28   3.28   3.28   3.28   3.27   3.27   3.26   3.23
   80   3.33   3.33   3.33   3.33   3.33   3.33   3.32   3.32   3.31   3.29
   90   3.37   3.37   3.37   3.37   3.37   3.37   3.36   3.36   3.36   3.34
  100   3.41   3.41   3.40   3.40   3.40   3.40   3.40   3.40   3.39   3.38
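Table XXXI can be used with the following sketch of the maximum normed residual. It assumes the usual definition for an r × c table with one observation per cell:

- residuals come from the additive row-plus-column fit, e_ij = x_ij - xbar_i. - xbar_.j + xbar_..
- MNR = max |e_ij| / (sum of e_ij^2)^{1/2}

The function name `max_normed_residual` is hypothetical. Transposing the table leaves the statistic unchanged. This is presumably why only r ≤ c is tabulated, so an r × c table with r > c can be looked up as c × r.

```python
import math
from typing import Sequence, Tuple


def max_normed_residual(table: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, int]]:
    """Hypothetical helper: return (MNR, (i, j)) for an r x c table, one observation per cell.

    Residuals from the additive model: e_ij = x_ij - xbar_i. - xbar_.j + xbar_..
    MNR = max |e_ij| / sqrt(sum of e_ij^2).
    """
    r = len(table)
    c = len(table[0]) if r else 0
    if r < 2 or c < 2 or any(len(row) != c for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    resid = [[table[i][j] - row_means[i] - col_means[j] + grand for j in range(c)]
             for i in range(r)]
    ss = sum(e * e for row in resid for e in row)
    if ss == 0:
        raise ValueError("all residuals are zero; the statistic is undefined")
    # Cell with the largest absolute residual (first one found on ties)
    i, j = max(((i, j) for i in range(r) for j in range(c)),
               key=lambda ij: abs(resid[ij[0]][ij[1]]))
    return abs(resid[i][j]) / math.sqrt(ss), (i, j)


# Illustrative 3 x 3 table with a single disturbed cell.
spike = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mnr, cell = max_normed_residual(spike)
print(round(mnr, 4), cell)  # prints 0.6667 (0, 0)
```

A single disturbed cell in an r × c table gives MNR = ((r - 1)(c - 1)/(rc))^{1/2}, the largest value the statistic can take. For a 3 × 3 table this is 2/3, which exceeds both the 5% point 0.648 and the 1% point 0.660 above.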
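Table XXXII refers the largest studentized residual of a least-squares fit to its critical value. The NumPy sketch below assumes the internally studentized form t_i = e_i / (s (1 - h_ii)^{1/2}), where:

- h_ii is the i-th leverage;
- s^2 is the residual mean square on n - q degrees of freedom.

This form can never exceed (n - q)^{1/2}, which is consistent with the tabulated points.

Further assumptions:

- The function name `max_studentized_residual` is hypothetical.
- The design matrix is assumed to have full column rank q < n.
- No observation is assumed to have leverage 1.

```python
import numpy as np


def max_studentized_residual(X, y):
    """Hypothetical helper: return (T, i), the largest |e_i| / (s * sqrt(1 - h_ii)).

    X is the n x q design matrix (include a column of ones if an intercept is fitted),
    assumed of full column rank with q < n; s^2 is the residual mean square on n - q d.f.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, q = X.shape
    if n <= q:
        raise ValueError("need more observations than parameters")
    Q, _ = np.linalg.qr(X)          # reduced QR: columns of Q span the column space of X
    h = np.sum(Q * Q, axis=1)       # leverages h_ii, the diagonal of the hat matrix Q Q'
    e = y - Q @ (Q.T @ y)           # least-squares residuals
    s2 = float(e @ e) / (n - q)     # residual mean square
    if s2 == 0.0:
        raise ValueError("perfect fit; studentized residuals are undefined")
    t = np.abs(e) / np.sqrt(s2 * (1.0 - h))
    i = int(np.argmax(t))
    return float(t[i]), i


# Illustrative straight-line data: y = 2x + 1 except at x = 4.
x = np.arange(6.0)
X = np.column_stack([np.ones_like(x), x])  # intercept and slope, so q = 2
y = np.array([1.0, 3.0, 5.0, 7.0, 30.0, 11.0])
T, i = max_studentized_residual(X, y)
print(round(T, 4), i)  # prints 2.0 4
```

In this example, deleting the fifth observation leaves an exact straight line, so T attains its upper bound (6 - 2)^{1/2} = 2. With n = 6 and q = 2 this exceeds both the 5% point 1.93 and the 1% point 1.98.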
Table XXXII (Continued)

1% critical values
    n    q=1      2      3      4      5      6      8     10     15     25
    5   1.98
    6   2.17   1.98
    7   2.32   2.17   1.98
    8   2.44   2.32   2.18   1.98
    9   2.54   2.44   2.33   2.18   1.99
   10   2.62   2.55   2.45   2.33   2.18   1.99
   12   2.76   2.70   2.64   2.56   2.46   2.34   1.99
   14   2.86   2.82   2.78   2.72   2.65   2.57   2.35   1.99
   16   2.95   2.92   2.88   2.84   2.79   2.73   2.58   2.35
   18   3.02   3.00   2.97   2.94   2.90   2.85   2.75   2.59
   20   3.08   3.06   3.04   3.01   2.98   2.95   2.87   2.76   2.20
   25   3.21   3.19   3.18   3.16   3.14   3.12   3.07   3.01   2.75
   30   3.30   3.29   3.28   3.26   3.25   3.24   3.21   3.17   3.04   2.21
   35   3.37   3.36   3.35   3.34   3.34   3.33   3.30   3.25   3.19   2.81
   40   3.43   3.42   3.42   3.41   3.40   3.40   3.38   3.36   3.30   3.05
   45   3.48   3.47   3.47   3.46   3.46   3.45   3.44   3.43   3.38   3.23
   50   3.52   3.52   3.51   3.51   3.51   3.50   3.49   3.48   3.45   3.34
   60   3.60   3.59   3.59   3.59   3.58   3.58   3.57   3.56   3.54   3.48
   70   3.65   3.65   3.65   3.65   3.64   3.64   3.64   3.63   3.61   3.57
   80   3.70   3.70   3.70   3.70   3.69   3.69   3.69   3.68   3.67   3.64
   90   3.74   3.74   3.74   3.74   3.74   3.74   3.73   3.73   3.72   3.70
  100   3.78   3.78   3.78   3.77   3.77   3.77   3.77   3.77   3.76   3.74

n = number of observations.
q = number of independent variables (including count for intercept if fitted).

References and Bibliography

Most of the works listed here have been referred to in the text; the pages on which principal or substantial mention is made of any work are shown in parentheses at the end of the reference. An example is:

Behnken, D. W., and Draper, N. R. (1972). 'Residuals and their variance patterns'. Technometrics, 14, 101-111. (253, 255, 261)

Additional works which have not been specifically mentioned in the text, but which are likely to assist with further study of the history or development of outlier problems, are also listed. Where appropriate we have indicated the area of relevance by means of a chapter reference, accompanied by the symbol H if the work is of particular historical interest, for example:

Glaisher, J. W. L. (1874). Note on a paper by Mr. Stone 'On the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 34, 251. (Ch. 2, H)

Acton, F. S. (1959). Analysis of Straight-Line Data. Wiley, New York. (Ch. 7)
Adichie, J. N. (1967a). 'Asymptotic efficiency of a class of non-parametric tests for regression parameters'. Ann. Math. Statist., 38, 884-893. (256)
Adichie, J. N. (1967b). 'Estimates of regression parameters based on rank tests'. Ann. Math. Statist., 38, 894-903. (256)
Airy, G. B. (1856). Letter from Professor Airy, Astronomer Royal, to the Editor. Astr. J., 4, 137-138. (Ch. 2, H)
Airy, G. B. (1861). On the Algebraical and Numerical Theory of Errors of Observations and the Combination of Observations. MacMillan, London. (Ch. 2, H)
Allen, G. C. (1961). See Bernoulli, D. (1777).
Andrews, D. F. (1971). 'Significance tests based on residuals'. Biometrika, 58, 139-148. (260)
Andrews, D. F. (1972). 'Plots of high-dimensional data'. Biometrics, 28, 125-136. (227)
Andrews, D. F. (1973). 'Robust estimation for multiple linear regression models'. Bull. Int. Statist. Inst., 45, 105-111. (Ch. 7)
Andrews, D. F. (1974). 'A robust method for multiple linear regression'. Technometrics, 16, 523-531. (Ch. 7)
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton University Press, Princeton, N.J. (48, 135, 150-155, 157, 158, 163-165)
Anonymous (1821). 'Dissertation sur la recherche du milieu le plus probable, entre les résultats de plusieurs observations ou expériences'. Annales de Mathématiques Pures et Appliquées, 12, 181-204. (Ch. 2, H)
Anscombe, F. J. (1960a). 'Rejection of outliers'. Technometrics, 2, 123-147. (26, 27, 34, 50, 127, 131, 231, 240, 246, 247, 260)
Anscombe, F. J. (1960b). 'Discussion by Kruskal, W., Ferguson, T. S., Tukey, J. W., Gumbel, E. J. and Anscombe, F. J.'. Technometrics, 2, 157-166. (Ch. 2)
Anscombe, F. J. (1961). 'Examination of residuals'. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 1-36. (247, 259)
Anscombe, F. J. (1968). 'Statistical analysis, special problems of outliers'. In International Encyclopaedia Social Sciences. MacMillan, New York, Vol. 15, pp. 178-182. (Ch. 2)
Anscombe, F. J., and Barron, B. A. (1966). 'Treatment of outliers in samples of size three'. J. Res. Nat. Bur. Standards, B, 70, 141-147. (51, 126, 132, 135)
Anscombe, F. J., and Tukey, J. W. (1963). 'The examination and analysis of residuals'. Technometrics, 5, 141-160. (247, 248)
Ansell, M. (1973). 'Robustness of location estimators to asymmetry'. Applied Statistics, 22, 249-254. (Ch. 4)
Antille, A. (1974). 'A linearized version of the Hodges-Lehmann estimator'. Ann. Statist., 2, 1308-1313. (Ch. 4)
Arley, N. (1940). 'On the distribution of relative errors from a normal population of errors. A discussion of some problems in the theory of errors'. Mathematisk-Fysiske Meddelelser udgivet af det Kgl. Danske Videnskabernes Selskab, 18. (Ch. 2)
Arley, N., and Buch, K. (1950). Introduction to the Theory of Probability and Statistics. Chapman & Hall, London; Wiley, New York. (Ch. 2)
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972). Statistical Inference under Order Restrictions. Wiley, New York. (187)
Barnett, V. D. (1966). 'Order statistics estimators of the location of the Cauchy distribution'. J. Amer. Statist. Ass., 61, 1205-1218. (48)
Barnett, V. D. (1976). 'The ordering of multivariate data' (with Discussion). J. Roy. Statist. Soc. A, 139, (208)
Barnett, V. D., and Lewis, T. (1967). 'A study of low-temperature probabilities in the context of an industrial problem' (with Discussion). J. Roy. Statist. Soc. A, 130, 177-206. (6)
Basu, A. P. (1965). 'On some tests of hypotheses relating to the exponential distribution when some outliers are present'. J. Amer. Statist. Ass., 60, 548-559. Corr. J. Amer. Statist. Ass., 60, 1249. (77)
Bechhofer, R. E., Kiefer, J., and Sobel, M. (1968). Sequential Identification and Ranking Procedures. The University of Chicago Press, Chicago. (187)
Beckman, R. J., and Trussell, H. J. (1974). 'The distribution of an arbitrary studentized residual and the effects of updating in multiple regression'. J. Amer. Statist. Ass., 69, 199-201. (Ch. 7)
Begg, T. B., Preston, S. R., and Healy, M. J. R. (1966). 'The dietary habits of patients with occlusive arterial disease'. Atti V Conv. internat. Asp. diet. Inf. Senesc., pp. 66-79. (212)
Behnken, D. W., and Draper, N. R. (1972). 'Residuals and their variance patterns'. Technometrics, 14, 101-111. (253, 255, 261)
Beran, R. (1974). 'Asymptotically efficient adaptive rank estimates in location models'. Ann. Statist., 2, 63-74. (Ch. 4)
Berman, S. (1962). 'Limiting distribution of the Studentized largest observation'. Skand. Akt., 45, 154-161. (Ch. 3)
Bernoulli, D. (1777). 'Dijudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inductio inde formanda'. Acta Academiae Scientiarum Petropolitanae, 1, 3-33. English translation by C. G. Allen (1961), Biometrika, 48, 3-13. (18)
Bernoulli, J. III (1785). 'Milieu'. Encyclopédie Méthodique, II, 404-409. (Ch. 2, H)
Bertrand, J. (1888a). 'Sur la loi de probabilité des erreurs d'observation'. C. R. Acad. Sci. Paris, 106, 153-156. (Ch. 2, H)
Bertrand, J. (1888b). 'Sur la combinaison des mesures d'une même grandeur'. C. R. Acad. Sci. Paris, 106, 701-704. (Ch. 2, H)
Bertrand, J. (1889). Calcul des Probabilités. Gauthier-Villars, Paris. (Ch. 2, H)
Bessel, F. W., and Baeyer, J. J. (1838). Gradmessung in Ostpreussen und ihre Verbindung mit Preussischen und Russischen Dreiecksketten. Berlin. Reprinted in Abhandlungen von F. W. Bessel (ed. R. Engelmann), Leipzig, 1876.
Bickel, P. J. (1965). 'On some robust estimates of location'. Ann. Math. Statist., 36, 847-858. (46, 135, 152, 156, 157)
Bickel, P. J. (1967). 'Some contributions to the theory of order statistics'. Proc. Fifth Berkeley Symp. Math. Statist. Prob., Vol. 1, 575-591. (167)
Bickel, P. J., and Hodges, J. L. Jr. (1967). 'The asymptotic theory of Galton's test and a related simple estimate of location'. Ann. Math. Statist., 38, 73-89. (49, 165)
Birnbaum, A. (1959). 'On the analysis of factorial experiments without replication'. Technometrics, 1, 343-357. (248)
Birnbaum, A., and Laska, E. M. (1967). 'Optimal robustness: A general method with applications to linear estimators of location'. J. Amer. Statist. Ass., 62, 1230-1240. (135)
Birnbaum, A., and Miké, V. (1970). 'Asymptotically robust estimators of location'. J. Amer. Statist. Ass., 65, 1265-1282. (153)
Birnbaum, A., Laska, E. M., and Meisner, M. (1971). 'Optimally robust linear estimators of location'. J. Amer. Statist. Ass., 66, 302-310. (135)
Bliss, C. I., Cochran, W. G., and Tukey, J. W. (1956). 'A rejection criterion based upon the range'. Biometrika, 43, 418-422. (197)
Bofinger, V. J. (1965). 'The k-sample slippage problem'. Austral. J. Statist., 7, 20-31. (178, 179, 182, 184)
Boscovich, R. J. (1757). 'De litteraria expeditione per pontificiam ditionem, et synopsis amplioris operis, ac habentur plura ejus ex exemplaria etiam sensorum impressa'. Bononiensi Scientiarum et Artium Instituto Atque Academia Commentarii, 4, 353-396. (Ch. 2, H)
Bowley, A. L. (1928). F. Y. Edgeworth's Contributions to Mathematical Statistics. Royal Statistical Society, London. (Ch. 2, H)
Box, G. E. P., and Tiao, G. C. (1962). 'A further look at robustness via Bayes' theorem'. Biometrika, 49, 419-432. (279)
Box, G. E. P., and Tiao, G. C. (1968). 'A Bayesian approach to some outlier problems'. Biometrika, 55, 119-129. (32, 46, 155, 252, 275, 276, 277)
Bross, I. D. J. (1961). 'Outliers in patterned experiments: a strategic re-appraisal'. Technometrics, 3, 91-102. (235, 238, 251)
Brown, B. M. (1975). 'A short-cut test for outliers using residuals'. Biometrika, 62, 623-629. (252)
Brunt, D. (1917). The Combination of Observations. University Press, Cambridge. (2nd edn. 1931). (Ch. 2, H)
Cacoullos, T. (1968). 'A sequential scheme for detecting outliers'. Bulletin de la Société Mathématique de Grèce, 9, 113-123. (Ch. 3)
Calvin, M., Heidelberger, C., Reid, J. C., Tolbert, B. M., and Yankwich, P. F.
338 Outliers in statistica[ data References and bibliography 339

Andrews, D. F., Bickel, P. J., Hampel, F. R., I-iuber, P. J., Rogers, W. H., and Berman, S. (1962). 'Limiting distribution of the Studentized largest observation'.
Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Prin- Skand. Akt., 45, 154-161. (Cb. 3)
ceton University Press, Princeton, N.J. (48, 135, 150-155, 157, 158, 163-165) Bernoulli, D. (1777). 'Dijudicatio maxime probabilis plurium observationum dis-
Anonymous (1821). 'Dissertation sur la recherche du milieu le plus probable, entre crepantium atque verisimillima inductio inde formanda'. Acta Academiae Scien-
les résultats de plusieurs observations ou expériences'. Annales de Mathématiques tiorum Petropolitanae, 1, 3-33. English translation by C. G. Allen (1961), Biomet-
Pure et Appliquées, 12, 181-204. (Cb. 2, H) rika, 48, 3-13. (18)
Anscombe, F. J. (1960a). 'Rejection of outliers'. Technometrics, 2, 123-147. (26, 27, Bernoulli, J. III (1785). 'Milieu'. Encyclopédie Méthodique, II, 404-409. (Cb. 2, H)
34, 50, 127, 131, 231, 240, 246, 247, 260) Bertrand, J. (1888a). 'Sur la loi de probabilité des erreurs d'observation'. C. R.
Anscombe, F. J. (1960b). 'Discussion by Kruskal, W., Ferguson, T. S., Tukey, J. W., Acad. Sci. Paris, 106, 153-156. (Cb. 2, H)
Gumbel, E. J. and Anscombe, F. J.'. Technometrics, 2, 157-166. (Cb. 2) Bertrand, J. (1888b). 'Sur la combination des mesures d'une meme grandeur'. C. R.
Anscombe, F. J. (1961). 'Examination of residuals'. Proceedings of the Fourth Acad. Sci. Paris, 106, 701-704. (Cb. 2, H)
Berkeley Symposium on Mathematical Statistics and Probability, Vol. l, pp. 1-36. Bertrand, J. (1889). Calcul des Probabilités. Gauthier-Villars, Paris. (Cb. 2, H)
(247, 259) Besse!, F. W., and Baeuer, J. J. (1838). Gradmessung in Ostpreussen und ihre
Anscombe, F. J. (1968). 'Statistica! analysis, special problems of outliers'. In Inter- Verbindung mit Preussischen und Russischen Dreiecksketten. Berlin. Reprinted in
national Encyclopaedia Social Sciences. MacMillan, New York, Vol. 15, pp. Abhendlungen von F. W. Besse[ (ed. R. Engleman), Leipzig, 1876.
178-182. (Cb. 2) Bickel, P. J. (1965). 'On some robust estimates of location'. Ann. Math. Statist., 36,
Anscombe, F. J., and Barron, B. A. (1966). 'Treatment of outliers in samples of size 847-858. (46, 135, 152, 156, 157)
three'. J. Res. Nat. Bur. Standards, B, 70, 141-147. (51, 126, 132, 135) Bickel, P. J. (1967). 'Some contributions to the theory of order statistics'. Proc. Fifth
Anscombe, F. J., and Tukey, J. W. (1963). 'The examination and analysis of Berkely Symp. Math. Statist. Prob. Vol. l, 575-591. (167)
residuals'. Technometrics, 5, 141-160. (247, 248) Bickel, P. J., and Hodges, J. L. Jr. (1967). 'The asymptotic theory of Galton's test
Ansell, M. (1973). 'Robustness of location estimators to asymmetry'. Applied Statis- and a related simple estimate of location'. Ann. Math. Statist., 38, 73-89. (49, 165)
tics, 22, 249-254. (Cb. 4) Birnbaum, A. (1959). 'On the analysis of factorial experiments without replication'.
Antille, A. (1974). 'A linearized version of the Hodges-Lehmann estimator'. Ann. Technometrics, 1, 343-357. (248)
Statist., 2, 1308-1313 (Cb. 4) Birnbaum, A., and Laska, E. M. (1967). 'Optimal robustness: A generai method
Arley, N. (1940). 'On the distribution of relative errors from a normal population of with applications to linear estimators of location'. J. Amer. Statist. Ass., 62,
errors. A discussion of some problems in the theory of errors'. Mathematisk- 1230-1240. (135)
Fysiske Meddelelser udgivet af det Kgl. Danske Videnskabernes Selskab, 18. (Cb. 2) Birnbaum, A., and Miké, V. (1970). 'Asymptotically robust estimators of location'. J.
Arley, N., and Buch, K. (1950). Introduction to the Theory of Probability and Amer. Statist. Ass., 65, 1265-1282. (153)
Statistics. Chapman & Hall, London; Wiley, New York. (Cb. 2) Birnbaum, A., Laska, E. M., and Meisner, M. (1971). 'Optimally robust linear
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. (1972). estimators of location'. J. Amer. Statist. Ass., 66, 302-310. (135)
Statistica[ Inference under Order Restrictions. Wiley, New York. (187) Bliss, C. 1., Cochran, W. G., and Tukey, J. W. (1956). 'A rejection criterion based
Barnett, V. D. (1966). 'Order statistics estimators of the location of the Cauchy upon the range'. Biometrika, 43, 418-422. (197)
distribution'. J. Amer. Statist. Ass., 61, 1205-1218. (48) Bofinger, V. J. (1965). 'The k-sample slippage problem'. Austral. J. Statist., 7,
Barnett, V. D. (1976). 'The ordering of multivariate data' (with Discussion). J. Roy. 20-31. (178, 179, 182, 184)
Statist. Soc. A, 139, (208) Boscovich, R. J. (1757). 'De litteraria expeditione per pontificiam ditionem, et
Barnett, V. D., and Lewis, T. (1967). 'A study of low-temperature probabilities in synopsis amplioris operis, ac habentur plura ejus ex exemplaria etiam sensorum
the context of an industriai problem' (with Discussion). J. Roy. Statist. Soc. A, 130, impressa. Bononiensi Scientiarum et Artum Instuto Atque Academia Commentarii,
177-206. (6) 4, 353-396. (Cb. 2, H)
Basu, A. P. (1965). 'On some tests of hypotheses relating to the exponential Bowley, A. L. (1928). F. Y. Edgeworth's Contributions to Mathematical Statistics.
distribution when some outliers are present'. J. Amer. Statist. Ass., 60, 548-559. Royal Statistica! Society, London. (Cb. 2. H)
Corr. J. Amer. Statist. Ass., 60, 1249. (77) Box, G. E. P., and Tiao, G. C. (1962). 'A further look at robustness via Bayes'
Bechhofer, R. E., Kiefer, J., and Sobel, M. (1968). Sequential Identification and theorem'. Biometrika, 49, 419-432. (279)
Ranking Procedures. The University of Chicago Press, Chicago. (187) Box, G. E. P., and Tiao, G. C. (1968). 'A Bayesian approach to some outlier
Beckman, R. J., and Trussell, H. J. (1974). 'The distribution of an arbitrary problems'. Biometrika, 55, 119-129. (32, 46, 155, 252, 275, 276, 277)
studentized residual and the effects of updating in multiple regression'. J. Amer. Bross, l. D. J. (1961). 'Outliers in patterned experiments: a strategie re-appraisal'.
Statist. Ass., 69, 199-201. (Cb. 7) Technometrics, 3, 91-102. (235, 238, 251)
Begg, T. B., Preston, S. R., and Healy, M. J. R. (1966). 'The dietary habits of Brown, B. M. (1975). 'A short-cut test for outliers using residuals'. Biometrika, 62,
patients with occlusive arterial disease'. Atti V Conv. internat. Asp. diet. Inf. 623-629. (252)
Senesc., pp. 66-79. (212) Brunt, D. (1917). The Combination of Observations. University Press, Cambridge.
Behnken, D. W., and Draper, N. R. (1972). 'Residuals and their variance patterns'. (2nd edn. 1931). (Cb. 2, H)
Technometrics, 14, 101-111. (253, 255, 261) Cacoullos, T. (1968). 'A sequential scheme for detecting outliers'. Bulletin de la
Beran, R. (1974). 'Asymptotically efficient adaptive rank estimates in location Société Mathématique de Grèce, 9, 113-123. (Cb. 3)
models'. Ann. Statist., 2, 63-74. (Cb. 4) Calvin, M., Heidelberger, C., Reid, J. C., Tolbert, B. M., and Yankwich, P. F.
(1949). Isotopic Carbon: Techniques in its measurement and chemical manipulation. Wiley, New York. (2)
Chambers, C. (1967). 'Extension of tables of percentage points of the largest variance ratio s_max²/s_0²'. Biometrika, 54, 225-228. (191)
Chandra Sekar, C., and Francis, M. G. (1941). 'A method to get the significance limit of a type of test criteria'. Sankhya, 5, 165-168. (191)
Chase, G. R., and Bulgren, W. G. (1971). 'A Monte Carlo investigation of the robustness of T²'. J. Amer. Statist. Ass., 66, 499-503. (Ch. 4)
Chatfield, C. (1974). Personal correspondence. (7)
Chatfield, C. (1975). The Analysis of Time Series: Theory and Practice. Chapman and Hall, London (p. 102). (11)
Chatfield, C., Ehrenberg, A. S. C., and Goodhardt, G. J. (1966). 'Progress on a simplified model of stationary purchasing behaviour' (with Discussion). J. Roy. Statist. Soc. A, 129, 317-367. (9)
Chauvenet, W. (1863). 'Method of least squares'. Appendix to Manual of Spherical and Practical Astronomy, Vol. 2, Lippincott, Philadelphia, pp. 469-566; tables 593-599. Reprinted (1960) 5th edn. Dover, New York. (2, 19, 24)
Chedzoy, O. B. (1973). Paper read at annual conference of Royal Statistical Society, Newcastle upon Tyne. (13)
Chen, E. H. (1971). 'The power of the Shapiro-Wilk W-test for normality in samples from contaminated distributions'. J. Amer. Statist. Ass., 66, 760-762. (249)
Chen, E. H., and Dixon, W. J. (1972). 'Estimates of parameters of a censored regression sample'. J. Amer. Statist. Ass., 67, 664-671. (Ch. 7)
Chernoff, H., Gastwirth, J. L., and Johns, M. V. Jr. (1967). 'Asymptotic distribution of linear combinations of order statistics'. Ann. Math. Statist., 31, 52-72. (153)
Chew, V. (1964). 'Tests for the rejection of outlying observations'. RCA Systems Analysis Technical Memorandum No. 64-7, Missile Test Project, Patrick Air Force Base, Florida. (Ch. 3)
Claridge, P. N., and Potter, I. C. (1974). 'Heart ratios at different stages in the life cycle of lampreys'. Acta Zoologica, 55, 61-69. (175)
Cochran, W. G. (1941). 'The distribution of the largest of a set of estimated variances as a fraction of their total'. Ann. Eugen., 11, 47-52. (79, 108, 191)
Cochran, W. G. (1968). 'Errors of measurement in statistics'. Technometrics, 10, 637-666. (Ch. 3)
Collett, D., and Lewis, T. (1976). 'The subjective nature of outlier rejection procedures'. Applied Statistics, 25, 228-237. (64)
Conover, W. J. (1965). 'Several k-sample Kolmogorov-Smirnov tests'. Ann. Math. Statist., 36, 1019-1026. (Ch. 5)
Conover, W. J. (1968). 'Two k-sample slippage tests'. J. Amer. Statist. Ass., 63, 614-626. (183, 184)
Coolidge, J. L. (1925). An Introduction to Mathematical Probability. Oxford University Press, London. Reprint (1962) Dover, New York. (Ch. 2, H)
Cox, D. R., and Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London. (48)
Cox, D. R., and Snell, E. J. (1968). 'A general definition of residuals' (with Discussion). J. Roy. Statist. Soc. B, 30, 248-275. (261)
Cox, D. R., and Snell, E. J. (1971). 'On test statistics calculated from residuals'. Biometrika, 58, 589-594. (261)
Crow, E. L., and Siddiqui, M. M. (1967). 'Robust estimation of location'. J. Amer. Statist. Ass., 62, 353-389. (135, 156, 157)
Cucconi, O. (1962). 'Un criterio per il rigetto delle osservazioni spurie'. Scuola in Azione, 21, 92-106. (Ch. 3)
Daniel, C. (1959). 'Use of half-normal plots in interpreting factorial two-level experiments'. Technometrics, 1, 311-341. (25, 248, 274)
Daniel, C. (1960). 'Locating outliers in factorial experiments'. Technometrics, 2, 149-156. (235, 238, 240)
Daniell, P. J. (1920). 'Observations weighted according to order'. Amer. J. Math., 42, 222-236. (Ch. 2, H)
Darling, D. A. (1952). 'On a test for homogeneity and extreme values'. Ann. Math. Statist., 23, 450-456. (Ch. 3)
David, H. A. (1952). 'Upper 5 and 1% points of the maximum F-ratio'. Biometrika, 39, 422-424. (84)
David, H. A. (1956a). 'On the application to statistics of an elementary theorem in probability'. Biometrika, 43, 85-91. (105)
David, H. A. (1956b). 'Revised upper percentage points of the extreme studentized deviate from the sample mean'. Biometrika, 43, 449-451. (105, 110, 111)
David, H. A. (1962). 'Order statistics in short-cut tests'. In Sarhan and Greenberg (1962). (107)
David, H. A. (1970). Order Statistics. Wiley, New York. (43, 44, 45, 65-67, 73, 152, 159, 192)
David, H. A., and Paulson, A. S. (1965). 'The performance of several tests for outliers'. Biometrika, 52, 429-436. (45, 94, 104, 105, 111)
David, H. A., Hartley, H. O., and Pearson, E. S. (1954). 'The distribution of the ratio, in a single normal sample, of range to standard deviation'. Biometrika, 41, 482-493. (39, 97)
David, F. N., Barton, D. E., Ganeshalingam, S., Harter, H. L., Kim, P. J., Merrington, M., and Walley, D. (1968). Normal Centroids, Medians and Scores for Ordinal Data. Tracts for Computers No. XXIX, Cambridge University Press, London. (139)
Dawid, A. P., Stone, M., and Zidek, J. V. (1973). 'Marginalisation paradoxes in Bayesian and structural inference' (with Discussion). J. Roy. Statist. Soc. B, 35, 189-233. (273)
DeAlba Guerra, E., and Van Ryzin, J. (1974). 'An empirical Bayes approach to the outlier problem' (Abstract). Inst. Math. Statist. Bull., 3, 125. (Ch. 8)
De Finetti, B. (1961). 'The Bayesian approach to the rejection of outliers'. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 199-210. (46, 270)
Dempster, A. P., and Rosner, B. (1971). 'Detection of outliers'. In Gupta and Yackel (1971). (46, 272)
Desu, M. M., Gehan, E. A., and Severo, N. C. (1974). 'A two-stage estimation procedure when there may be spurious observations'. Biometrika, 61, 593-599. (51, 132)
Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1975). 'Robust estimation and outlier detection with correlation coefficients'. Biometrika, 62, 531-545. (228, 233)
Dixon, W. J. (1950). 'Analysis of extreme values'. Ann. Math. Statist., 21, 488-506. (34, 54, 66, 69, 70, 87, 94, 96, 98, 99, 100, 107, 110, 111, 114)
Dixon, W. J. (1951). 'Ratios involving extreme values'. Ann. Math. Statist., 22, 68-78. (38, 54, 87, 98, 99, 100)
Dixon, W. J. (1953). 'Processing data for outliers'. Biometrics, 9, 74-89. (33, 41, 73)
Dixon, W. J. (1960). 'Simplified estimation from censored normal samples'. Ann. Math. Statist., 31, 385-391. (135, 160, 169)
Dixon, W. J. (1962). 'Rejection of observations'. In Sarhan and Greenberg (1962). (94, 111, 114)
Dixon, W. J. (1964). 'Query 4: Rejection of outlying values'. Technometrics, 6, 238.
Czuber, E. (1891). Theorie der Beobachtungsfehler. Teubner, Leipzig. (Ch. 2, H) (Ch. 3)
340 Outliers in statistical data
References and bibliography 341
(1949). Isotopic Carbon: Techniques in its measurement and chemical manipulation. Wiley, New York. (2)
Chambers, C. (1967). 'Extension of tables of percentage points of the largest variance ratio s²max/s²0'. Biometrika, 54, 225-228. (191)
Chandra Sekar, C., and Francis, M. G. (1941). 'A method to get the significance limit of a type of test criteria'. Sankhya, 5, 165-168. (191)
Chase, G. R., and Bulgren, W. G. (1971). 'A Monte Carlo investigation of the robustness of T²'. J. Amer. Statist. Ass., 66, 499-503. (Ch. 4)
Chatfield, C. (1974). Personal correspondence. (7)
Chatfield, C. (1975). The Analysis of Time Series: Theory and Practice. Chapman and Hall, London (p. 102). (11)
Chatfield, C., Ehrenberg, A. S. C., and Goodhardt, G. J. (1966). 'Progress on a simplified model of stationary purchasing behaviour' (with Discussion). J. Roy. Statist. Soc. A, 129, 317-367. (9)
Chauvenet, W. (1863). 'Method of least squares'. Appendix to Manual of Spherical and Practical Astronomy, Vol. 2, Lippincott, Philadelphia, pp. 469-566; tables 593-599. Reprinted (1960) 5th edn. Dover, New York. (2, 19, 24)
Chedzoy, O. B. (1973). Paper read at annual conference of Royal Statistical Society, Newcastle upon Tyne. (13)
Chen, E. H. (1971). 'The power of the Shapiro-Wilk W-test for normality in samples from contaminated distributions'. J. Amer. Statist. Ass., 66, 760-762. (249)
Chen, E. H., and Dixon, W. J. (1972). 'Estimates of parameters of a censored regression sample'. J. Amer. Statist. Ass., 67, 664-671. (Ch. 7)
Chernoff, H., Gastwirth, J. L., and Johns, M. V. Jr. (1967). 'Asymptotic distribution of linear combinations of order statistics'. Ann. Math. Statist., 31, 52-72. (153)
Chew, V. (1964). 'Tests for the rejection of outlying observations'. RCA Systems Analysis Technical Memorandum No. 64-7, Missile Test Project, Patrick Air Force Base, Florida. (Ch. 3)
Claridge, P. N., and Potter, I. C. (1974). 'Heart ratios at different stages in the life cycle of lampreys'. Acta Zoologica, 55, 61-69. (175)
Cochran, W. G. (1941). 'The distribution of the largest of a set of estimated variances as a fraction of their total'. Ann. Eugen., 11, 47-52. (79, 108, 191)
Cochran, W. G. (1968). 'Errors of measurement in statistics'. Technometrics, 10, 637-666. (Ch. 3)
Collett, D., and Lewis, T. (1976). 'The subjective nature of outlier rejection procedures'. Applied Statistics, 25, 228-237. (64)
Conover, W. J. (1965). 'Several k-sample Kolmogorov-Smirnov tests'. Ann. Math. Statist., 36, 1019-1026. (Ch. 5)
Conover, W. J. (1968). 'Two k-sample slippage tests'. J. Amer. Statist. Ass., 63, 614-626. (183, 184)
Coolidge, J. L. (1925). An Introduction to Mathematical Probability. Oxford University Press, London. Reprint (1962) Dover, New York. (Ch. 2, H)
Cox, D. R., and Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall, London. (48)
Cox, D. R., and Snell, E. J. (1968). 'A general definition of residuals' (with Discussion). J. Roy. Statist. Soc. B, 30, 248-275. (261)
Cox, D. R., and Snell, E. J. (1971). 'On test statistics calculated from residuals'. Biometrika, 58, 589-594. (261)
Crow, E. L., and Siddiqui, M. M. (1967). 'Robust estimation of location'. J. Amer. Statist. Ass., 62, 353-389. (135, 156, 157)
Cucconi, O. (1962). 'Un criterio per il rigetto delle osservazioni spurie'. Scuola in Azione, 21, 92-106. (Ch. 3)
Czuber, E. (1891). Theorie der Beobachtungsfehler. Teubner, Leipzig. (Ch. 2, H)
Daniel, C. (1959). 'Use of half-normal plots in interpreting factorial two-level experiments'. Technometrics, 1, 311-341. (25, 248, 274)
Daniel, C. (1960). 'Locating outliers in factorial experiments'. Technometrics, 2, 149-156. (235, 238, 240)
Daniell, P. J. (1920). 'Observations weighted according to order'. Amer. J. Math., 42, 222-236. (Ch. 2, H)
Darling, D. A. (1952). 'On a test for homogeneity and extreme values'. Ann. Math. Statist., 23, 450-456. (Ch. 3)
David, H. A. (1952). 'Upper 5 and 1% points of the maximum F-ratio'. Biometrika, 39, 422-424. (84)
David, H. A. (1956a). 'On the application to statistics of an elementary theorem in probability'. Biometrika, 43, 85-91. (105)
David, H. A. (1956b). 'Revised upper percentage points of the extreme studentized deviate from the sample mean'. Biometrika, 43, 449-451. (105, 110, 111)
David, H. A. (1962). 'Order statistics in short-cut tests'. In Sarhan and Greenberg (1962). (107)
David, H. A. (1970). Order Statistics. Wiley, New York. (43, 44, 45, 65-67, 73, 152, 159, 192)
David, H. A., and Paulson, A. S. (1965). 'The performance of several tests for outliers'. Biometrika, 52, 429-436. (45, 94, 104, 105, 111)
David, H. A., Hartley, H. O., and Pearson, E. S. (1954). 'The distribution of the ratio, in a single normal sample, of range to standard deviation'. Biometrika, 41, 482-493. (39, 97)
David, F. N., Barton, D. E., Ganeshalingam, S., Harter, H. L., Kim, P. J., Merrington, M., and Walley, D. (1968). Normal Centroids, Medians and Scores for Ordinal Data, Tracts for Computers No. XXIX, Cambridge University Press, London. (139)
Dawid, A. P., Stone, M., and Zidek, J. V. (1973). 'Marginalisation paradoxes in Bayesian and structural inference' (with Discussion). J. Roy. Statist. Soc. B, 35, 189-233. (273)
DeAlba Guerra, E., and Van Ryzin, J. (1974). 'An empirical Bayes approach to the outlier problem' (Abstract). Inst. Math. Statist. Bull., 3, 125. (Ch. 8)
De Finetti, B. (1961). 'The Bayesian approach to the rejection of outliers'. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 199-210. (46, 270)
Dempster, A. P., and Rosner, B. (1971). 'Detection of outliers'. In Gupta and Yackel (1971). (46, 272)
Desu, M. M., Gehan, E. A., and Severo, N. C. (1974). 'A two-stage estimation procedure when there may be spurious observations'. Biometrika, 61, 593-599. (51, 132)
Devlin, S. J., Gnanadesikan, R., and Kettenring, J. R. (1975). 'Robust estimation and outlier detection with correlation coefficients'. Biometrika, 62, 531-545. (228, 233)
Dixon, W. J. (1950). 'Analysis of extreme values'. Ann. Math. Statist., 21, 488-506. (34, 54, 66, 69, 70, 87, 94, 96, 98, 99, 100, 107, 110, 111, 114)
Dixon, W. J. (1951). 'Ratios involving extreme values'. Ann. Math. Statist., 22, 68-78. (38, 54, 87, 98, 99, 100)
Dixon, W. J. (1953). 'Processing data for outliers'. Biometrics, 9, 74-89. (33, 41, 73)
Dixon, W. J. (1960). 'Simplified estimation from censored normal samples'. Ann. Math. Statist., 31, 385-391. (135, 160, 169)
Dixon, W. J. (1962). 'Rejection of observations'. In Sarhan and Greenberg (1962). (94, 111, 114)
Dixon, W. J. (1964). 'Query 4: Rejection of outlying values'. Technometrics, 6, 238. (Ch. 3)
Dixon, W. J., and Tukey, J. W. (1968). 'Approximate behaviour of the distribution of Winsorized t (Trimming/Winsorization 2)'. Technometrics, 10, 83-98. (142, 159, 168, 169)
Doolittle, H. M. (1884). 'The rejection of doubtful observations' (Abstract). Bulletin of the Philosophical Society of Washington, (Math. Soc.), 6, 153-156. (Ch. 2, H)
Doornbos, R. (1956). 'Significance of the smallest of a set of estimated normal variances'. Statistica Neerlandica, 10, 117-126. (191, 197)
Doornbos, R. (1966). Slippage Tests. Mathematical Centre Tracts, No. 15, Mathematisch Centrum, Amsterdam. (120, 187, 191, 193, 195, 197, 200, 201-204)
Doornbos, R., and Prins, H. J. (1956). 'Slippage tests for a set of gamma-variates'. Indag. Math., 18, 329-337. (202)
Doornbos, R., and Prins, H. J. (1958). 'On slippage tests'. Indag. Math., 20, 38-55, 438-447. (181, 195, 197)
Downton, F. D. (1976). 'Nonparametric tests for block experiments'. Biometrika, 63, 137-141. (187)
Draper, N. R., and Smith, H. (1966). Applied Regression Analysis. Wiley, New York. (261)
Edgeworth, F. Y. (1883). 'The method of least squares'. Philosophical Magazine, 16, 360-375. (20)
Edgeworth, F. Y. (1887). 'On discordant observations'. Philosophical Magazine, 23, 364-375. (20)
van Eeden, C. (1970). 'Efficiency-robust estimation of location'. Ann. Math. Statist., 41, 172-181. (Ch. 4)
Eisenhart, C., and Solomon, H. (1947). 'Significance of the largest of a set of sample estimates of variance'. In Eisenhart, Hastay and Wallis (1947). (191)
Eisenhart, C., Hastay, M. W., and Wallis, W. A. (Eds.) (1947). Selected Techniques of Statistical Analysis, McGraw-Hill, New York. (79, 108, 191, 196, 197)
Elashoff, J. D. (1972). 'A model for quadratic outliers in linear regression'. J. Amer. Statist. Ass., 67, 478-485. (255)
Ellenberg, J. H. (1973). 'The joint distribution of the standardized least squares residuals from a general linear regression'. J. Amer. Statist. Ass., 68, 941-943. (255, 262, 264, 265)
Ellenberg, J. H. (1976). 'Testing for a single outlier from a general linear regression'. Biometrics, 32, 637-645. (264, 265)
Epstein, B. (1960a). 'Tests for the validity of the assumption that the underlying distribution of life is exponential: Part I'. Technometrics, 2, 83-101. (40, 77)
Epstein, B. (1960b). 'Tests for the validity of the assumption that the underlying distribution of life is exponential: Part II'. Technometrics, 2, 167-183. (40, 77, 85)
Faye, H. E. (1888). 'Sur certain points de la théorie des erreurs accidentelles'. C. R. Acad. Sci. Paris, 106, 783-786. (Ch. 3, H)
Fellegi, I. P. (1975). 'Automatic editing and imputation of quantitative data' (Summary). Bull. Int. Statist. Inst., XLVI, 249-253. (221, 224)
Fellegi, I. P., and Holt, D. (1976). 'A systematic approach to automatic edit and imputation'. J. Amer. Statist. Ass., 71, 17-35. (Ch. 6)
Feller, W. (1968). An Introduction to Probability Theory and its Applications. Vol. I, 3rd edn. Wiley, New York. (192)
Fenton, R. (1975). Personal correspondence. (13)
Ferguson, T. S. (1961a). 'On the rejection of outliers'. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 253-287. (1, 34, 39, 41, 42, 58, 60, 94, 95, 98, 101, 109, 210, 218, 240, 276)
Ferguson, T. S. (1961b). 'Rules for rejection of outliers'. Rev. Inst. Int. de Statist., 29, 29-43. (73, 94-98, 100, 111)
Fieller, N. R. J. (1976). Some Problems related to the Rejection of Outlying Observations. Ph.D. Thesis, University of Sheffield. (71, 83, 85, 95-97, 106-108, 110-112, 251, 264)
Finney, D. J. (1974). 'Problems, data and inference: The Address of the President' (with Proceedings). J. Roy. Statist. Soc. A, 137, 1-23. (6)
Fisher, R. A. (1929). 'Tests of significance in harmonic analysis'. Proc. Roy. Soc. A, 125, 54-59. (64, 79)
Fisher, R. A. (1936). 'The use of multiple measurements in taxonomic problems'. Ann. Eugen., 7, 179-188. (227)
Fisher, R. A. (1960). The Design of Experiments. 7th edn. Oliver & Boyd, Edinburgh. (279)
Fisher, R. A., Corbet, A. S., and Williams, C. B. (1943). 'The relation between the number of species and the number of individuals in a random sample of an animal population'. J. Animal Ecol., 12, 42-57. (10)
Forsythe, A. B. (1972). 'Robust estimation of straight line regression coefficients by minimizing pth power deviations'. Technometrics, 14, 159-166. (Ch. 7)
Fox, A. J. (1972). 'Outliers in time series'. J. Roy. Statist. Soc. B, 43, 350-363. (267)
Friedman, M. (1940). 'A comparison of alternative tests of significance for the problem of m-rankings'. Ann. Math. Statist., 11, 86-92. (187)
Gastwirth, J. L. (1966). 'On robust procedures'. J. Amer. Statist. Ass., 61, 929-948. (48, 135, 153, 164, 166)
Gastwirth, J. L., and Cohen, M. L. (1970). 'Small sample behaviour of some robust linear estimators of location'. J. Amer. Statist. Ass., 65, 946-973. (135, 153, 157, 164)
Gastwirth, J. L., and Rubin, H. (1969). 'On robust linear estimators'. Ann. Math. Statist., 40, 24-39. (156)
Gebhardt, F. (1964). 'On the risk of some strategies for outlying observations'. Ann. Math. Statist., 35, 1524-1536. (51)
Gebhardt, F. (1966). 'On the effect of stragglers on the risk of some mean estimators in small samples'. Ann. Math. Statist., 37, 441-450. (Ch. 4)
Gentleman, J. F., and Wilk, M. B. (1975a). 'Detecting outliers: II Supplementing the direct analysis of residuals'. Biometrics, 31, 387-410. (249, 251, 265)
Gentleman, J. F., and Wilk, M. B. (1975b). 'Detecting outliers in a two-way table. I. Statistical behaviour of residuals'. Technometrics, 17, 1-14. (249)
Glaisher, J. W. L. (1872). 'On the law of facility of errors of observations and on the method of least squares'. Mem. Roy. Astr. Soc., 39, 75-124. (47)
Glaisher, J. W. L. (1872-73). 'On the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 33, 391-402. (20)
Glaisher, J. W. L. (1874). Note on a paper by Mr. Stone 'On the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 34, 251. (Ch. 2, H)
Gnanadesikan, R. (1973). 'Graphical methods for informal inference in multivariate data analysis'. Bull. Int. Statist. Inst., 45, Book 4, 195-206. (222, 227)
Gnanadesikan, R., and Kettenring, J. R. (1972). 'Robust estimates, residuals and outlier detection with multiresponse data'. Biometrics, 28, 81-124. (208, 220, 222-224, 226, 227, 232, 249)
Gnanadesikan, R., and Wilk, M. B. (1968). 'Probability plotting methods for the analysis of data'. Biometrika, 55, 1-17. (Ch. 6)
Gnanadesikan, R., and Wilk, M. B. (1969). 'Data analytic methods in multivariate statistical analysis'. In Krishnaiah (1969). (222)
Goldsmith, P. L., and Boddy, R. (1973). 'Critical analysis of factorial experiments and orthogonal fractions'. Applied Statistics, 22, 141-160. (241, 244, 264)
Golub, G. H., Guttman, I., and Dutter, R. (1973). 'Examination of pseudo-residuals of outliers for detecting spuriosity in the general univariate linear model'. In Kabe and Gupta (1973). (231, 260)
342 Outliers in statistica[ data References and bibliography 343

Dixon, W. J., and Tukey, J. W. (1968). 'Approximate behaviour of the distribution Fieller, N. R. J. (1976). Some Problems related to the Rejection of Outlying Obser-
of Winsorized t (Trimming/Winsorization 2)'. Technometrics, 10, 83-98. (142, 159, vations. Ph.D. Thesis, University of Sheffield. (71, 83, 85, 95-97, 106-108,
168, 169) 110-112, 251, 264)
Doolittle, H. M. (1884). 'The rejection of doubtful observations' (Abstract). Bulletin Finney, D. J. (1974). 'Problems, data and inference: The Address of the President'
of the Philosophical Society of Washington, (Math. Soc.), 6, 153-156. (Ch. 2, H) (with Proceedi~gs). J. Roy. Statist. Soc. A, 137, 1-23, (6)
Doornbos, R. (1956). 'Significance of the smallest of a set of estimated norma! Fisher, R. A. (1929). 'Tests of significance in harmonic analysis'. Proc. Roy. Soc. A,
variances'. Statistica Neerlandica, 10, 117-126. (191, 197) 125, 54-59. (64, 79)
Doornbos, R. (1966). Slippage Tests. Mathematical Centre Tracts, No. 15, Fisher, R. A. (1936). 'The use of multiple measurements in taxonomic problems'.
Mathematisch Centrum, Amsterdam. (120, 187, 191, 193, 195, 197, 200, 201- Ann. Eugen, 7, 179-188. (227)
204) Fisher, R. A. (1960). The Design of Experiments. 7th edn. Oliver & Boyd, Edin-
Doornbos, R., and Prins, H. J. (1956). 'Slippage t,ests for a set of gamma-variates'. burgh. (279)
Indag. Math., 18, 329-337. (202) Fisher, R. A., Corbet, A. S., and Williams, C. B. (1943). 'The relation between the
Doornbos, R., and Prins, H. J. (1958). 'On slippage tests'. Indag. Math., 20, 38-55, number of species and the number of individuals in a random sample of an animai
438-447. (181, 195, 197) population'. J. Anima[ Ecol., 12, 42-57. (lO)
Downton, F. D. (1976). 'Nonparametric tests for block experiments'. Biometrika, 63, Forsythe, A. B. (1972). 'Robust estimation of straight line regression coefficients by
137-141. (187) minimizing pth power deviations'. Technometrics, 14, 159-166. (Ch. 7)
Draper, N. R., and Smith, H. (1966). Applied Regression Analysis. Wiley, New York. Fox, A. J. (1972). 'Outliers intime series'. J. Roy. Statist. Soc. B, 43, 350-363. (267)
(261) Friedman, M. (1940). 'A comparison of alternative tests of significance for the
Edgeworth, F. Y. (1883). 'The method of least squares'. Philosophical Magazine, 16, problem of m-rankings'. Ann. Math. Statist., 11, 86-92. (187)
360-375. (20) Gastwirth, J. L. (1966). 'On robust procedures'. J. Amer. Statist. Ass., 61, 929-948.
Edgeworth, F. Y. (1887). 'On discordant observations'. Philosophical Magazine, 23, (48, 135, 153, 164, 166)
364-375. (20) Gastwirth, J. L., and Cohen, M. L. (1970). 'Small sample behaviour of some robust
van Eeden, C. (1970). 'Efficiency-robust estimation of location'. Ann. Math. Statist., linear estimators of location'. J. Amer. Statist. Ass., 65, 946-973. (135, 153, 157,
41, 172-181. (Ch. 4) 164)
Eisenhart, C., and Solomon, H. (1947). 'Significance of the largest of a set of sample Gastwirth, J. L., and Rubin, H. (1969). 'On robust linear estimators'. Ann. Math.
estimates of variance'. In Eisenhart, Hastay and Wallis (1947). (191) Statist., 40, 24-39. (156)
Eisenhart, C., Hastay, M. W., and Wallis, W. A. (Eds.) (1947). Selected Techniques Gebhardt, F. (1964. 'On the risk of some strategies for outlying observations'. Ann.
of Statistica[ Analysis, McGraw-Hill, New York. (79, 108, 191, 196, 197) Math. Statist., 35, 1524-1536. (51)
Elashoff, J. D. (1972). 'A model for quadratic outliers in linear regression'. J. Amer. Gebhardt, F. (1966). 'On the effect of stragglers on the risk of some mean estimators
Statist. Ass., 67, 478-485. (255) in small samples'. Ann. Math. Statist., 37, 441-450. (Ch. 4)
Ellenberg, J. H. (1973). 'The joint distribution of the standardized least squares Gentleman, J. F., and Wilk, M. B. (1975a). 'Detecting Outliers: II Supplementing
residuals from a generai linear regression'. J. Amer. Statist. Ass., 68, 941-943. the direct analysis of residuals'. Biometrics, 31, 387-410. (249, 251, 265)
(255, 262, 264, 265) Gentleman, J. F., and Wilk, M. B. (1975b). 'Detecting outliers in a two-way table. I.
Ellenberg, J. H. (1976). 'Testing fora single outlier from a generallinear regression'. Statistica! behaviour of residuals'. Technometrics, 17, 1-14. (249)
Biometrics, 32, 637-645. (264, 265) Glaisher, J. W. L. (1872). 'On the law of facility of errors of observations and on the
Epstein, B. (1960a). 'Tests for the validity of the assumption that the underlying method of least squares'. Mem. Roy. Astr. Soc., 39, 75-124. (47)
distribution of life is exponential: Part I'. Technometrics, 2, 83-101. (40, 77) Glaisher, J. W. L. (1872-73). 'On the rejection of discordant observations'. Monthly
Epstein, B. (1960b). 'Tests for the validity of the assumption that the underlying Notices Roy. Astr. Soc., 33, 391-402. (20)
distribution of life is exponential: PartII'. Technometrics, 2, 167-183. (40, 77, 85) Glaisher, J. W. L. (1874). Note on a paper by Mr. Stone 'On the rejection of
Faye, H. E. (1888). 'Sur certain points de la théorie des erreurs accidentelles'. C. R. discordant observations'. Monthly Notices Roy. Astr. Soc., 34, 251. (Ch. 2, H)
Acad. Sci. Paris, 106, 783-786. (Ch. 3, H) Gnanadesikan, R. (1973). 'Graphical methods for informai inference in multivariate
Fellegi, I. P. (1975). 'Automatic editing and imputation of quantitiative data' data analysis'. Bull. Int. Statist. Inst., 45, Book 4, 195-206. (222, 227)
(Summary). Bull. Int. Statist. Inst., XLVI, 249-253. (221, 224) Gnanadesikan, R., and Kettenring, J. R. (1972). 'Robust estimates, residuals and
Fellegi, I. P., and Holt, D. (1976). 'A systematic approach to automatic edit and outlier detection with multiresponse data'. Biometrics, 28, 81-124. (208, 220,
imputation'. J. Amer. Statist. Ass., 71, 17-35. (Ch. 6) 222-224, 226, 227' 232, 249)
Feller, W. (1968). An Introduction to Probability Theory and its Applications. Vol. I, Gnanadesikan, R., and Wilk, M. B. (1968). 'Probability plotting methods for the
3rd edn. Wiley, New York. (192) analysis of data'. Biometrika, 55, 1-17. (Ch. 6)
Fenton, R. (1975). Personal correspondence. (13) Gnanadesikan, R., and Wilk, M. B. (1969). 'Data analytic methods in multivariate
Ferguson, T. S. (1961a). 'On the rejection of outliers'. Proceedings of the Fourth statistica! analysis'. In Krishnaiah (1969). (222)
Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 253- Goldsmith, P. L., and Boddy, R. (1973). 'Criticai analysis of factorial experiments
287. (1, 34, 39, 41, 42, 58, 60, 94, 95, 98, 101, 109, 210, 218, 240, 276) and orthogonal fractions'. Applied Statistics, 22, 141-160. (241, 244, 264)
Ferguson, T. S. (1961b). 'Rules for rejection of outliers'. Rev. Inst. Int. de Statist., 29, Golub, G. H., Guttman, I., and Dutter, R. (1973). 'Examination of pseudo-residuals
29-43. (73, 94-98, 100, 111) of outliers for detecting spurosity in the generai univariate linear mode l'. In Kabe
and Gupta (1973). (231, 260)
344 Outliers in statistica[ data References and bibliography 345

Goodwin, H. M. (1913). Elements of the Precision of Measurements and Graphical Methods. McGraw-Hill, New York. (21)
Gould, B. A. Jr. (1855). 'On Peirce's criterion for the rejection of doubtful observations, with tables for facilitating its application'. Astr. J., 4, 81-87. (Ch. 2, H)
Granger, C. W. J., and Neave, H. R. (1968). 'A quick test for slippage'. Rev. Int. Statist. Inst., 36, 309-312. (180, 183)
Green, R. F. (1974). 'A note on outlier-prone families of distributions'. Ann. Statist., 2, 1293-1295. (37)
Green, R. F. (1976). 'Outlier-prone and outlier-resistant distributions'. J. Amer. Statist. Ass., 71, 502-505. (Ch. 2)
Grubbs, F. E. (1950). 'Sample criteria for testing outlying observations'. Ann. Math. Statist., 21, 27-58. (23, 34, 39, 40, 54, 73, 94, 96, 97, 110, 111, 215, 254)
Grubbs, F. E. (1969). 'Procedures for detecting outlying observations in samples'. Technometrics, 11, 1-21. (22, 26, 40, 94, 96, 111)
Grubbs, F. E., and Beck, G. (1972). 'Extension of sample sizes and percentage points for significance tests of outlying observations'. Technometrics, 14, 847-854. (94, 96)
Gupta, S. S. (1960). 'Order statistics from the gamma distribution'. Technometrics, 2, 243-262. Correction Technometrics, 2, 523. (211)
Gupta, S. S. (Ed.) (1975). Applied Statistics. North Holland, Amsterdam.
Gupta, S. S., and Yackel, J. (Eds.) (1971). Statistical Decision Theory and Related Topics. Academic Press, New York.
Guttman, I. (1973a). 'Premium and protection of several procedures for dealing with outliers when sample sizes are moderate to large'. Technometrics, 15, 385-404. (127, 132, 135, 166, 168)
Guttman, I. (1973b). 'Care and handling of univariate or multivariate outliers in detecting spuriosity: a Bayesian approach'. Technometrics, 15, 723-738. (32, 34, 36, 45, 210, 232, 272, 275)
Guttman, I., and Smith, D. E. (1969). 'Investigation of rules for dealing with outliers in small samples from the normal distribution. I: Estimation of the mean'. Technometrics, 11, 527-550. (50, 51, 127, 132, 148, 166, 167, 168)
Guttman, I., and Smith, D. E. (1971). 'Investigation of rules for dealing with outliers in small samples from the normal distribution. II: Estimation of the variance'. Technometrics, 13, 101-111. (50, 127, 132, 160, 166, 169, 170)
Guttman, I., and Tiao, G. C. (1978). 'Effect of correlation on the estimation of a mean in the presence of spurious observations'. To appear in Technometrics. (268)
Halperin, M., Greenhouse, S. W., Cornfield, J., and Zalokar, J. (1955). 'Tables of percentage points for the studentized maximum absolute deviate in normal samples'. J. Amer. Statist. Ass., 50, 185-195. (39, 105, 112)
Hampel, F. R. (1968). Contributions to the Theory of Robust Estimation. Ph.D. dissertation, University of California, Berkeley. University Microfilms Inc., Ann Arbor, Mich. (136)
Hampel, F. R. (1971). 'A generalized qualitative definition of robustness'. Ann. Math. Statist., 42, 1887-1896. (136, 141, 157)
Hampel, F. R. (1974). 'The influence curve and its role in robust estimation'. J. Amer. Statist. Ass., 69, 383-393. (46, 136, 140, 147, 151, 157, 158, 165, 169, 228)
Harris, T. E., and Tukey, J. W. (1949). 'Measures of location and scale which are relatively insensitive to contamination'. Memorandum Report No. 31, Statistical Research Group, Princeton University, Princeton, N.J. (Ch. 4)
Harter, H. L. (1969a). Order Statistics and their Use in Testing and Estimation, Vol. 1: Tests Based on Range and Studentized Range of Samples from a Normal Population. U.S. Air Force, Aerospace Research Laboratories, Washington, D.C. (107, 114)
Harter, H. L. (1969b). Order Statistics and their Use in Testing and Estimation, Vol. 2: Estimates Based on Order Statistics of Samples from Various Populations. U.S. Air Force, Aerospace Research Laboratories, Washington, D.C. (115)
Harter, H. L. (1974-1976). 'The method of least squares and some alternatives, Parts I-VI'. Rev. Int. Inst. de Statist., 42, 147-174 (Part I); 42, 235-264 (Part II); 43, 1-44 (Part III); 43, 125-190 (Part IV); 43, 269-278 (Part V); 44, 113-159 (Part VI). (21, 22)
Hartigan, J. A. (1968). 'Note on discordant observations'. J. Roy. Statist. Soc. B, 30, 545-550. (Ch. 3)
Hartley, H. O. (1950). 'The maximum F-ratio as a short-cut test for heterogeneity of variance'. Biometrika, 37, 308-312. (84)
Hawkins, D. M. (1969). 'On the distribution and power of a test for a single outlier'. South Afr. Statist. J., 3, 9-15. (Ch. 3)
Hawkins, D. M. (1973). 'Repeated testing for outliers'. Statistica Neerlandica, 27, 1-10. (71, 73)
Hawkins, D. M. (1974). 'The detection of errors in multivariate data using principal components'. J. Amer. Statist. Ass., 69, 340-344. (224)
Healy, M. J. R. (1968). 'Multivariate normal plotting'. Appl. Statist., 17, 157-161. (212, 226)
Henry, F. M. (1950). 'The loss of precision from discarding discrepant data'. Res. Qty. Amer. Ass. Health, 21, 145-152. (Ch. 3)
Hinich, M. J., and Talwar, P. P. (1975). 'A simple method for robust estimation'. J. Amer. Statist. Ass., 70, 113-119. (256)
Hodges, J. L. Jr. (1967). 'Efficiency in normal samples and tolerance of extreme values for some estimates of location'. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California, Berkeley, Calif. (141, 154, 161, 165)
Hodges, J. L. Jr., and Lehmann, E. L. (1963). 'Estimates of location based on rank tests'. Ann. Math. Statist., 34, 598-611. (49, 154, 161)
Hoenig, J., and Crotty, I. M. (1958). International J. Social Psychiatry, 3, 260-277. (55)
Hogg, R. V. (1967). 'Some observations on robust estimation'. J. Amer. Statist. Ass., 62, 1179-1186. (135, 154, 155, 157, 166)
Hogg, R. V. (1974). 'Adaptive robust procedures: a partial review and some suggestions for future applications and theory (with comments)'. J. Amer. Statist. Ass., 69, 909-927. (46, 148, 149, 150, 151, 157)
Hogg, R. V., Uthoff, V. A., Randles, R. J., and Davenport, A. S. (1972). 'On the selection of the underlying distribution and adaptive estimation'. J. Amer. Statist. Ass., 67, 597-600. (Ch. 4)
Huber, P. J. (1964). 'Robust estimation of a location parameter'. Ann. Math. Statist., 35, 73-101. (46, 134, 142, 150, 151, 152, 156, 160, 163, 166, 169, 170)
Huber, P. J. (1967). 'The behaviour of maximum likelihood estimates under nonstandard conditions'. Proc. Fifth Berkeley Symp. Math. Statist. Prob., Vol. I, pp. 221-233. (149)
Huber, P. J. (1968). 'Robust estimation'. Mathematical Centre Tracts, Selected Statistical Papers, 27, 3-25. Mathematisch Centrum, Amsterdam. (142, 162)
Huber, P. J. (1970). 'Studentizing robust estimates'. In Puri (1970). (142, 160, 161, 162)
Huber, P. J. (1972). 'Robust statistics: a review (The 1972 Wald Lecture)'. Ann. Math. Statist., 43, 1041-1067. (26, 46, 48, 126, 141, 149, 154, 156, 161, 162, 165, 266)
344 Outliers in statistica[ data References and bibliography 345

Goodwin, H. M. (1913). Elements of the Precision of Measurements and Graphical Population. U.S. Air Force, Aerospace Researcb Laboratories, Wasbington, D.C.
Methods. McGraw-Hill, New York. (21) (107, 114)
Gould, B. A. Jr. (1855). 'On Peirce's criterion for tbe rejection of doubtful Harter, H. L. (1969b). Order Statistics and their Use in Testing and Estimation, Vol.
observations, witb tables for facilitating its application'. Astr. J., 4, 81-87. (Cb. 2, 2: Estimates Based on Order Statistics of Samples from Various Populations. U.S.
H) Air Force, Aero~pace Researcb Laboratories, Wasbington, D.C. (115)
Granger, C. W. J., and Neave, H. R. (1968). 'A quick test for slippage'. Rev. Int. Harter, H. L. (1974-1976). 'Tbe metbod of least squares and some alternatives Parts
Statist. Inst., 36, 309-312. (180, 183) I-VI'. Rev. Int. Inst. de Statist., 42, 147-174, PartI; 42, 235-264, (PartII); 43,
Green, R. F. (1974). 'A note on outlier-prone families of distributions'. Ann. Statist., 1-44, (Part III); 43, 125-190, (Part IV); 43,269-278, (Part V); 44, 113-159, (Part
2, 1293-1295. (37) VI). (21, 22)
Green, R. F. (1976). 'Outlier-prone and outlier-resistant distributions'. J. Amer. Hartigan, J. A. (1968). 'Note on discordant observations'. J. Roy. Statist. Soc. B, 30,
Statist. Ass., 71, 502-505. (Cb. 2) 545-550. (Cb. 3)
Grubbs, F. E. (1950). 'Sample criteria for testing outlying observations'. Ann. Math. Hartley, H. O. (1950). 'The maximum F-ratio as a sbort-cut test for beterogeneity of
Statist., 21, 27-58. (23, 34, 39, 40, 54, 73, 94, 96, 97, 110, 111, 2, 15, 254) variance'. Biometrika, 37, 308-312. (84)
Grubbs, F. E. (1969). 'Procedures for detecting outlying observations in samples'. Hawkins, D. M. (1969). 'On tbe distribution and power of a test fora single outlier'.
Technometrics, 11, 1-21. (22, 26, 40, 94, 96, 111) South Afr. Statist. 1., 3, 9-15. (Cb. 3)
Grubbs, F. E., and Beck, G. (1972). 'Extension of sample sizes and percentage Hawkins, D. M. (1973). 'Repeated testing for outliers'. Statistica Neerlandica, 27,
points for significance tests of outlying observations'. Technometrics, 14, 847-854. 1-10. (71, 73}
(94, 96) Hawkins, D. M. (1974). 'The detection of errors in multivariate data using principal
Gupta, S. S. (1960). 'Order statistics from tbe gamma distribution'. Technometrics, 2, components'. J. Amer. Statist. Ass., 69, 340-344. (224)
243-262. Correction Technometrics, 2, 523. (211) Healy, M. J. R. (1968). 'Multivariate norma! plotting'. Appl. Statist., 17, 157-161.
Gupta, S. S. (Ed.) (1975). Applied Statistics. Nortb Holland, Amsterdam. (212, 226)
Gupta, S. S., and Yackel, J. (Eds.) (1971). Statistica/ Decision Theory and Related Henry, F. M. (1950). 'Tbe loss of precision from discarding discrepant data'. Res.
Topics. Academic Press, New York. Qty. Amer. Ass. Health, 21, 145-152. (Cb. 3)
Guttman, I. (1973a). 'Premium and protection of severa! procedures for dealing witb Hinicb, M. J., and Talwar, P. P. (1975). 'A simple metbod for robust estimation'. J.
outliers wben sample sizes are moderate to large'. Technometrics, 15, 385-404. Amer. Statist. Ass., 70, 113-119. (256)
(127, 132, 135, 166, 168) Hodges, J. L. Jr. (1967). 'Efficiency in norma! samples and tolerance of extreme
Guttman, I. (1973b). 'Care and bandling of univariate or multivariate outliners in values for some estimates of location'. Proceedings of the 5th Berkeley Symposium
detecting spuriosity-a Bayesian approacb'. Technometrics, 15, 723-738. (32, 34, on Mathematical Statistics and Probability, University of California, Berkeley,
36, 45, 210, 232, 272, 275) Calif. (141, 154, 161, 165)
Guttman, 1., and Smitb, D. E. (1969). 'lnvestigation of rules for dealing witb outliers Hodges, J. L. Jr., and Lebmann, E. L. (1963). 'Estimates of location based on rank
in small samples from tbe norma! distribution. 1: Estimation of tbe mean'. tests'. Ann. Math. Statist., 34, 598-611. (49, 154, 161)
Technometrics, 11, 527-550. (50, 51, 127, 132, 148, 166, 167, 168) Hoenig, J., and Crotty, l. M. (1958). Intemational J. Social Psychiatry, 3, 260-277.
Guttman, 1., and Smitb, D. E. (1971). 'lnvestigation of rules for dealing witb outliers (55)
in small samples from tbe norma! distribution II: Estimation of tbe variance'. Hogg, R. V. (1967). 'Some observations on robust estimation'. J. Amer. Statist. Ass.,
Technometrics, 13, 101-111. (50, 127, 132, 160, 166, 169, 170) 62, 1179-1186. (135, 154, 155, 157, 166)
Guttman, I., and Tiao, G. C. (1978). 'Effect of correlation on tbe estimation of a Hogg, R. V. (1974). 'Adaptive robust procedures: a partial review and some
mean in tbe presence of spurious observations'. T o appear. Technometrics (268) suggestions for future applications and tbeory (witb comments)'. J. Amer. Statist.
Halperin, M., Greenbouse, S. W., Corn:field, J., and Zalokar, J. (1955). 'Tables of Ass., 69, 909-927. (46, 148, 149, 150, 151, 157)
percentage points for tbe studentized maximum absolute deviate in normal sam- Hogg, R. V., Utboff, V. A., Randles, R. J., and Davenport, A. S. (1972). 'On tbe
ples'. J. Amer. Statist. Ass., 50, 185-195. (39, 105, 112) selection of tbe underlying distribution and adaptive estimation'. J. Amer. Statist.
Hampel, F. R. (1968). Contributions to the Theory of Robust Estimation. Pb.D. Ass., 67, 597-600. (Cb. 4)
dissertation, University of California-Berkeley, University Microfilms lnc., Ann Huber, P. J. (1964). 'Robust estimation of a location parameter'. Ann. Math. Statist.,
Arbor Micb. (136) 35, 73-101. (46, 134, 142, 150, 151, 152, 156, 160, 163, 166, 169, 170)
Hampel, F. R. (1971). 'A generalized qualitative definition of robustness'. Ann. Huber, P. J. (1967). 'The bebaviour of maximum likelibood estimates under
Math. Statist., 42, 1887-1896. (136, 141, 157) nonstandard conditions'. Proc. Fifth Berkeley Symp. Math. Statist. Prob., Vol. I, pp.
Hampel, F. R. (1974). 'Tbe influence curve and its role in robust estimation'. J. 221-233. (149)
Amer. Statist. Ass., 69, 383-393. (46, 136, 140, 147, 151, 157, 158, 165, 169, Huber, P. J. (1968). 'Robust estimation'. Mathematical Centre Tracts, Selected
228) Statistica[ Papers, 27, 3-25. Matbematiscb Centrum Amsterdam. (142, 162)
Harris, T. E., and Tukey, J. W. (1949). 'Measures of location and scale wbicb are Huber, P. J. (1970). 'Studentizing robust estimates'. In Puri (1970). (142, 160, 161,
relatively insensitive to contamination'. Memorandum Report No. 31, Statistica! 162)
Researcb Group, Princeton University, Princeton, N.J. (Cb. 4) Huber, P. J. (1972). 'Robust statistics: a review (Tbe 1972 Wald Lecture)' Ann.
Harter, H. L. (1969a). Order Statistics and their Use in Testing and Estimation, Vol. Math. Statist., 43, 1041-1067. (26, 46, 48, 126, 141, 149, 154, 156, 161, 162, 165,
l : Tests Based on Range and Studentized Range of Samples from a Normal 266)
346 Outliers in statistical data References and bibliography 347

Huber, P. J. (1973). 'Robust regression: Asymptotics, conjectures and Monte Carlo'. Ann. Statist., 1, 799-821. (Ch. 7)
Irwin, J. O. (1925). 'On a criterion for the rejection of outlying observations'. Biometrika, 17, 238-250. (21, 38, 114, 115)
Jaeckel, L. A. (1969). Robust Estimates of Location. Ph.D. dissertation, University of California-Berkeley, University Microfilms Inc., Ann Arbor, Mich. (166)
Jaeckel, L. A. (1971a). 'Robust estimates of location: Symmetry and asymmetric contamination'. Ann. Math. Statist., 42, 1020-1034. (46, 49, 135, 151, 153, 154, 156)
Jaeckel, L. A. (1971b). 'Some flexible estimates of location'. Ann. Math. Statist., 42, 1540-1552. (135, 148, 157)
Jeffreys, H. (1932). 'An alternative to the rejection of observations'. Proc. Roy. Soc. London, A, 137, 78-87. (47)
Jeffreys, H. (1938). 'The law of error and the combination of observations'. Phil. Trans. Roy. Soc. London, A, 237, 231-271. (Ch. 2)
Jevons, W. S. (1874). The Principles of Science. Macmillan, London (latest edn. 1958). (Ch. 2, H)
John, J. A. (1978). 'Outliers in factorial experiments'. Applied Statistics, 27. (246, 251)
John, J. A., and Draper, N. R. (1978). 'On testing for two or one outliers in two-way tables'. Technometrics, 20. (251, 261)
John, J. A., and Prescott, P. (1975). 'Critical values of a test to detect outliers in factorial experiments'. Appl. Statistics, 24, 56-59. (241, 244)
Johns, M. V. Jr. (1974). 'Nonparametric estimates of location'. J. Amer. Statist. Ass., 69, 453-460. (49)
Joshi, P. C. (1972a). 'Some slippage tests of mean for a single outlier in linear regression'. Biometrika, 59, 109-120. (262)
Joshi, P. C. (1972b). 'Efficient estimation of a mean of an exponential distribution when an outlier is present'. Technometrics, 14, 137-144. (36, 77, 171)
Joshi, P. C. (1975). 'Some distribution theory results for a regression model'. Ann. Inst. Statist. Math., Tokyo, 27, 309-317. (Ch. 7)
Kabe, D. G. (1970). 'Testing outliers from an exponential population'. Metrika, 15, 15-18. (77, 81, 82, 85, 86, 87)
Kabe, D. G., and Gupta, R. P. (Eds.) (1973). Multivariate Statistical Inference. North-Holland, Amsterdam.
Kale, B. K. (1974a). 'Detection of outliers'. Technical Report No. 63, Department of Statistics, University of Winnipeg, Canada. (77)
Kale, B. K. (1974b). 'Detection of outliers-a semi-Bayesian approach (preliminary report) (abstract)'. Inst. Math. Statist. Bull., 3, 153. (272)
Kale, B. K. (1975a). 'A note on outlier-resistant families and mixtures of distributions'. Technical Report No. 66, Department of Statistics, University of Manitoba, Winnipeg, Canada. (38)
Kale, B. K. (1975b). 'On outlier-proneness of some families of distributions'. Technical Report No. 68, Department of Statistics, University of Manitoba, Winnipeg, Canada. (38)
Kale, B. K. (1975c). 'Trimmed means and the method of maximum likelihood when spurious observations are present'. In Gupta (1975). (50, 51, 77, 172)
Kale, B. K., and Sinha, S. K. (1971). 'Estimation of expected life in the presence of an outlier observation'. Technometrics, 13, 755-759. (35, 50, 77, 155, 171, 173, 272, 281)
Kapur, M. N. (1957). 'A property of the optimum solution suggested by Paulson for the k-sample slippage problem for the normal distribution'. Ind. Soc. Agric. Statist., 9, 179-190. (43, 193)
Karlin, S., and Truax, D. R. (1960). 'Slippage problems'. Ann. Math. Statist., 31, 296-324. (181, 187, 193, 195, 200, 205)
Kelleher, G. J. (1974). 'Exact two-sample exceedance tests when one observation possibly an outlier'. Sankhya, B, 36, 187-193. (Ch. 3)
King, E. P. (1953). 'On some procedures for the rejection of suspected data'. J. Amer. Statist. Ass., 48, 531-533. (98)
Kraft, C. H., and van Eeden, C. (1970). 'Efficient linearized estimates based on ranks'. In Puri (1970). (Ch. 4)
Kraft, C. H., and van Eeden, C. (1972). '"Asymptotic" efficiencies of quick methods of computing efficient estimates'. J. Amer. Statist. Ass., 67, 199-202. (Ch. 4)
Krishnaiah, P. R. (Ed.) (1969). Multivariate Analysis, Vol. II. Academic Press, New York.
Kruskal, W. H. (1960a). 'Some remarks on wild observations'. Technometrics, 2, 1-3. (Ch. 2)
Kruskal, W. H. (1960b). 'Discussion of the papers of Messrs. Anscombe and Daniel'. Technometrics, 2, 157-158. (25, 37)
Kruskal, W. H., and Wallis, W. A. (1952). 'Use of ranks in one-criterion variance analysis'. J. Amer. Statist. Ass., 47, 583-612. (187)
Kudo, A. (1956a). 'On the testing of outlying observations'. Sankhya, 17, 67-76. (43, 94, 95, 106, 111, 112, 194, 195)
Kudo, A. (1956b). 'On the invariant multiple decision procedures'. Bull. Math. Statist., 6, 57-68. (43, 193)
Kudo, A. (1957). 'The extreme value in a multiple normal sample'. Mem. Fac. Sci. Kyushu Univ., A, 11, 143-156. (218)
Larson, W. A., and McCleary, S. J. (1972). 'The use of partial residuals in regression analysis'. Technometrics, 14, 781-790. (Ch. 7)
Laurent, A. G. (1963). 'Conditional distribution of order statistics and distribution of the reduced ith order statistic of the exponential model'. Ann. Math. Statist., 34, 652-657. (77)
Legendre, A. M. (1805). Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Courcier, Paris (especially 'Appendice sur la méthode des moindres quarrés', pp. 72-80). (19)
Legendre, A. M. (1814). 'Méthode des moindres quarrés, pour trouver le milieu le plus probable entre les résultats de différentes observations'. Mémoires de la Classe des Sciences Mathématiques et Physiques de l'Institut de France, Année 1810, 149-154. (Ch. 2, H)
Lehmann, E. L. (1953). 'The power of rank tests'. Ann. Math. Statist., 24, 23-43. (182)
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods based on Ranks. McGraw-Hill, New York. (162)
Leone, F. C., and Moussa-Hamouda, E. (1973). 'Relative efficiencies of "O-BLUE" estimators in simple linear regression'. J. Amer. Statist. Ass., 68, 953-959. (Ch. 4, Ch. 7)
Leone, F. C., Jayachandran, T., and Eisenstat, S. (1967). 'A study of robust estimators'. Technometrics, 9, 652-660. (151, 161, 168)
Lewis, T., and Fieller, N. R. J. (1978). 'A recursive algorithm for null distributions for outliers: I. Gamma samples'. To appear, Technometrics, 20. (40, 73, 77, 82, 85, 108, 110)
Liberman, G. J., and Owen, D. B. (1961). Tables of the Hypergeometric Probability Distribution. University Press, Stanford, California. (204)
Lieblein, J. (1952). 'Properties of certain statistics involving the closest pair in a sample of three observations'. J. Res. Nat. Bur. Stands., 48, 225-268. (51)
Lieblein, J. (1962). 'The closest two out of three observations'. In Sarhan and Greenberg (1962). (51)
346 Outliers in statistica[ data References and bibliography 347

Huber, P. J. (1973). 'Robust regression: Asymptotics, conjectures and Monte Carlo'. Karlin, S., and Truax, D. R. (1960). 'Slippage Problems'. Ann. Math. Statist., 31,
Ann. Statist., 1, 799-821. (Cb. 7) 296-324. (181, 187, 193, 195, 200, 205)
Irwin, J. O. (1925). 'On a criterion for tbe rejection of outlying observations'. Kelleber, G. J. (1974). 'Exact two-sample exceedance tests wben one observation
Biometrika, 17, 238-250. (21, 38, 114, 115) possibly an outlier'. Sankhya, B, 36, 187-193. (Cb. 3)
Jaeckel, L. A. (1969). Robust Estimates of Location. Pb.D. dissertation, University of King, E. P. (1953). 'On some procedures for tbe rejection of suspected data'. J.
California-Berkeley, University Microfilms Inc., Ann Arbor, Micb. (166) Amer. Statist. Ass., 48, 531-533. (98)
Jaeckel, L. A. (1971a). 'Robust estimates of location: Syrnmetry and asymmetric Kraft, C. H., and van Eeden, C. (1970). 'Efficient linearized estirnates based on
contamination'. Ann. Math. Statist., 42, 1020-1034. (46, 49, 135, 151, 153, 154, ranks'. In Puri (1970). (Cb. 4)
156) Kraft, C. H., and van Eeden, C. (1972). '"Asymptotic" efficiencies of quick metbods
Jaeckel, L. A. (1971b). 'Some ftexible estimates of location'. Ann. Math. Statist., 42, of computing efficient estimates'. J. Amer. Statist. Ass., 67, 199-202. (Cb. 4)
1540-1552. (135, 148, 157) Krisbnaiab, P. R. (Ed.) (1969). Multivariate Analysis, Vol. Il. Academic Press, New
Jeffreys, H. (1932). 'An alternative to tbe rejection of observations'. Proc. Roy. Soc. York.
London, A, 137, 78-87. (47) Kruskal, W. H. (1960a). 'Some remarks on wild observations'. Technometrics, 2, 1-3.
Jeffreys, H. (1938). 'Tbe law of error and tbe combination of observations'. Phil. (Cb. 2)
Trans. Roy. Soc. London, A, 237, 231-271. (Cb. 2) Kruskal, W. H. (1960b). 'Discussion of tbe papers of Messrs. Anscombe and Daniel'.
Jevons, W. S. (1874). The Principles of Science. Macmillan, London, (latest edn. Technometrics, 2, 157-158. (25, 37)
1958). (Cb. 2, H) Kruskal, W. H., and Wallis, W. A. (1952). 'Use of ranks in one-criterion variance
Jobn, J. A. (1978). 'Outliers in factorial experiments'. Applied Statistics, 27, (246, analysis'. J. Amer. Statist. Ass., 47, 583-612. (187)
251) Kudo, A. (1956a). 'On tbe testing of outlying observations'. Sankhya, 17, 67-76.
Jobn, J. A., and Draper, N. R. (1978). 'On testing for two or one outliers in two-way (43, 94, 95, 106, 111, 112, 194, 195)
tables'. Technometrics, 20, (251, 261) Kudo, A. (1956b). 'On tbe invariant multiple decisiol) procedures'. Bull. Math.
Jobn, J. A., and Prescott, P. (1975). 'Criticai values of a test to detect outliers in Statist., 6, 57-68. (43, 193)
factorial experiments'. Appl. Statistics, 24, 56-59. (241, 244) Kudo, A. (1957). 'The extreme value in a multiple norrnal sample'. Mem. Fac. Sci.
Jobns, M. V. Jr. (1974). 'Nonparametric estimates of location'. J. Amer. Statist. Ass., Kyushu Univ., A, 11, 143-156. (218)
69, 453-460. (49) Larson, W. A., and McCleary, S. J. (1972). 'Tbe use of partial residuals in regression
Josbi, P. C. (1972a). 'Some slippage tests of mean for a single outlier in linear analysis'. Technometrics, 14, 781-790. (Cb. 7)
regression'. Biometrika, 59, 109-120. (262) Laurent, A. G. (1963). 'Conditional distribution of order statistics and distribution of
Josbi, P. C. (1972b). 'Efficient estimation of a mean of an exponential distribution tbe reduced itb order statistic of tbe exponential mode!'. Ann. Math. Statist., 34,
wben an outlier is present'. Technometrics, 14,137-144. (36, 77, 171) 652-657. (77)
Josbi, P. C. (197 5). 'Some distribution tbeory results for a regression mode l'. A nn. Legendre, A. M. (1805). Nouvelles Methodes pour la Determination des Orbites des
Inst. Statist. Math., Tokyo, 27, 309-317. (Cb. 7) Cometes. Courcier, Paris (especially 'Appendice sur la métbode des moindres
Kabe, D. G. (1970). 'Testing outliers from an exponential population'. Metrika, 15, quarrés, pp. 72-80). (19)
15-18. (77, 81, 82, 85, 86, 87) Legendre, A. M. (1814). 'Metb6de des moindres quarrés, pour trouver le milieu le
Kabe, D. G., and Gupta, R. P. (Eds.) (1973). Multivariate Statistica/ Inference. plus probable entre les résultats de différentes observations'. Mémoires de la Classe
Nortb-Holland, Amsterdam. des Sciences Mathématiques et Physiques de l' Institut de France, ANNÉE 1810,
Kale, B. K. (1974a). 'Detection of outliers'. Technical Report No. 63, Department of 149-154. (Cb. 2, H)
Statistics, University of Winnipeg, Canada. (77) Lebmann, E. L. (1953). 'Tbe power of rank tests'. Ann. Math. Statist., 24, 23-43.
Kale, B. K. (1974b). 'Detection of outliers-a semi-Bayesian approacb (preliminary (182)
report) (abstract)'. Inst. Math. Statist. Bull., 3, 153. (272) Lebmann, E. L. (1975). Nonparametrics: Statistica/ Methods based on Ranks.
Kale, B. K. (1975a). 'A note on outlier-resistant families and mixtures of distribu- McGraw-Hill, New York. (162)
tions'. Technical Report No. 66, Department of Statistics, University of Manitoba, Leone, F. C., and Moussa-Hamouda, E. (1973). 'Relative efficiencies of 'O-BLUE'
Winnipeg, Canada. (38) estimators in simple linear regression'. J. Amer. Statist. Ass., 68, 953-959. (Cb. 4,
Kale, B. K. (1975b). 'On outlier-proneness of some families of distributions'. Cb. 7)
Technical Report No. 68, Department of Statistics, University of Manitoba, Leone, F. C., Jayacbandran, T., and Eisenstat, S. (1967). 'A study of robust
Winnipeg, Canada. (38) estimators'. Technometrics, 9, 652-660. (151, 161, 168)
Kale, B. K. (1975c). 'Trimmed means and tbe metbod of maximum likelibood wben Lewis, T., and Fieller, N. R. J. (1978). 'A recursive algoritbm for null distributions
spurious observations are present'. In Gupta (1975). (50, 51, 77, 172) for outliers: I. Gamma samples. To appear, Technometrics, 20, (40, 73, 77, 82, 85,
Kale, B. K., and Sinba, S. K. (1971). 'Estimation of expected life in tbe presence of 108, 110)
an outlier observation'. Technometrics, 13, 755-759. (35, 50, 77, 155, 171, 173, Liberman, G. J., and Owen, D. B. (1961). Tables of the Hypergeometric Probability
272, 281) Distribution'. University Press, Stanford, California. (204)
Kapur, M. N. (1957). 'A property of tbe optimum solution suggested by Paulson for Lieblein, J. (1952). 'Properties of certain statistics involving tbe closest pair in a
tbe k-sample slippage problem for tbe normal distribution'. Ind. Soc. Agric. sample of tbree observations'. J. Res. Nat. Bur. Stands., 48, 225-268. (51)
Statist., 9, 179-190. (43, 193) Lieblein, J. (1962). 'Tbe closest two out of tbree observations'. In Sarban and
Greenberg (1962). (51)
Likes, J. (1966). 'Distribution of Dixon's statistics in the case of an exponential population'. Metrika, 11, 46-54. (40, 54, 73, 77, 80, 81, 82, 83, 86, 87)
Lingappaiah, G. S. (1976). 'Effect of outliers in the estimation of parameters'. Metrika, 23, 27-30. (281)
Lund, R. E. (1975). 'Tables for an approximate test for outliers in linear models'. Technometrics, 17, 473-476. (255, 262)
McCarthy, P. J. (1972). 'The effects of discarding inliers when binomial data are subject to classification errors'. J. Amer. Statist. Ass., 67, 515-529. (Ch. 3)
McKay, A. T. (1935). 'The distribution of the difference between the extreme observation and the sample mean in samples of n from a normal universe'. Biometrika, 27, 466-471. (111)
McMillan, R. G. (1971). 'Tests for one or two outliers in normal samples with unknown variance'. Technometrics, 13, 87-100. (34, 40, 44, 71, 73, 94, 95, 96, 104, 105, 106)
McMillan, R. G., and David, H. A. (1971). 'Tests for one or two outliers in normal samples with known variance'. Technometrics, 13, 75-85. (34, 40, 71, 73, 110, 111, 112)
Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952). 'The time intervals between industrial accidents'. Biometrika, 39, 168-180. (119)
Maire, C., and Boscovich, R. J. (1755). De litteraria Expeditione per Pontificiam ditionem ad dimetiendas duas Meridiani gradus, et corrigendam mappam geographicam, jussu, et auspiciis Benedicti XIV Pont. Max. Suscepta. Romae. (French translation: Voyage Astronomique et Géographique dans l'État de l'Église, entrepris par l'Ordre et sous les Auspices du Pape Benoit XIV, pour mesurer deux degrés du méridien, et corriger la Carte de l'État ecclésiastique. Paris, 1770.) (18)
Martin, R. D., Masreliez, C. J., and Goodfellow, D. M. (1973). 'Robust location estimates and confidence intervals via stochastic approximation: small sample behaviour'. Inst. Math. Statist. Bull., 2, 138. (Ch. 4)
Mendeleev, D. I. (1895). 'Course of work on the renewal of prototypes or standard measures of lengths and weights' (Russian). Vremennik Glavnoi Palaty Mer i Vesov, 2, 157-185. (Reprinted 1950; Collected Writings (Sochineniya), 22, 175-213, izdat. Akad. Nauk SSSR, Leningrad-Moscow.) (21, 47)
Mercer, W. B., and Hall, A. D. (1912). 'The experimental error of field trials'. J. Agric. Sci., 4, 107-132. (2)
Merriman, M. (1877). 'List of writings relating to the method of least squares with historical and critical notes'. Transactions of the Connecticut Academy of Arts and Sciences, 4, 151-232. (Ch. 2, H)
Merriman, M. (1884). A Textbook on the Method of Least Squares. Wiley, New York. (Ch. 2, H)
Meshalkin, L. D., Smirnov, N. P., and Sosnovskii, N. N. (1969). 'On the stability of estimates of the distribution center' (Review) (Russian). Zavodskaya Laboratoriya, 35, 51-61. English translation: Industrial Laboratory, 35, 712-716. (Ch. 4)
Mickey, M. R. (1974). 'Detecting outliers with stepwise regression'. Communications-UCLA Health Sciences Facility, 1, 1. (265)
Mickey, M. R., Dunn, O. J., and Clark, V. (1967). 'Note on use of stepwise regression in detecting outliers'. Computers & Biomed. Res., 1, 105-111. (265)
Miké, V. (1971). 'Efficiency-robust systematic linear estimators of location'. J. Amer. Statist. Ass., 66, 594-601. (Ch. 4)
Miké, V. (1973). 'Robust Pitman-type estimators of location'. Ann. Inst. Statist. Math., Tokyo, 25, 65-86. (Ch. 4)
Moore, P. G. (1957). 'The two-sample t-test based on range'. Biometrika, 44, 482-485. (107)
Moran, M. A., and McMillan, R. G. (1973). 'Tests for one or two outliers in normal samples with unknown variance: a correction'. Technometrics, 15, 637-640. (41, 94, 104, 105)
Moshman, J. (1952). 'Testing a straggler mean in a 2-way classification using the range'. Ann. Math. Statist., 23, 126-132. (Ch. 7)
Mosteller, F. (1948). 'A k-sample slippage test for an extreme population'. Ann. Math. Statist., 19, 58-65. (176, 283)
Mosteller, F., and Tukey, J. W. (1950). 'Significance levels for a k-sample slippage test'. Ann. Math. Statist., 21, 120-123. (177)
Mount, K. S., and Kale, B. K. (1973). 'On selecting a spurious observation'. Can. Math. Bull., 16, 75-78. (37, 77)
Moussa-Hamouda, E., and Leone, F. C. (1974). 'The O-BLUE estimators for complete and censored samples in linear regression'. Technometrics, 16, 441-446. (Ch. 4, Ch. 7)
Mudrov, V. I., Kushko, V. L., Mikhailov, V. I., and Osovitskii, E. M. (1968). 'Some experiments on the use of the least-moduli method in processing orbital data' (Russian). Kosmicheskie Issledovaniya, 6, 502-504. English translation: Cosmic Research, 6, 421-431. (Ch. 2)
Muncke, G. W. (1825). 'Beobachtung'. Gehler's Physikalisches Wörterbuch, 2nd edn. Leipzig, Vol. I, pp. 884-912. (Ch. 2, H)
Murphy, R. B. (1951). On Tests for Outlying Observations. Ph.D. thesis, Princeton University, University Microfilms Inc., Ann Arbor, Mich. (40, 44, 71, 95)
Naik, U. D. (1972). 'A Bayesian analysis of certain contaminated samples'. Research Report No. 104, Department of Probability and Statistics, Sheffield University. (206)
Nair, K. R. (1948). 'The distribution of the extreme deviate from the sample mean and its studentized form'. Biometrika, 35, 118-144. (104, 110, 111)
Nair, K. R. (1952). 'Tables of percentage points of the "Studentized" extreme deviate from the sample mean'. Biometrika, 39, 189-191. (104)
Neave, H. R. (1972). 'Some quick tests for slippage'. The Statistician, 21, 197-208. (178, 180)
Neave, H. R. (1973). 'A power study of some tests for slippage'. The Statistician, 22, 269-280. (180)
Neave, H. R. (1975). 'A quick and simple technique for general slippage problems'. J. Amer. Statist. Ass., 70, 721-726. (183, 185)
Newcomb, S. (1886). 'A generalized theory of the combination of observations so as to obtain the best result'. Amer. J. Math., 8, 343-366. (21, 47)
Newcomb, S. (1912). 'Researches on the motion of the moon, Part II. The mean motion of the moon and other astronomical elements derived from observations of eclipses and occultations extending from the period of the Babylonians until A.D. 1908'. Astronomical Papers, 9, 1-249, U.S. Government Printing Office, Washington. (Ch. 2, H)
Neyman, J., and Scott, E. L. (1971). 'Outlier proneness of phenomena and of related distributions'. In Rustagi (1971). (37)
Noether, G. E. (1967). 'Wilcoxon confidence intervals for location parameters in the discrete case'. J. Amer. Statist. Ass., 62, 184-188. (162)
Noether, G. E. (1973). 'Some simple distribution-free confidence intervals for the center of a symmetric distribution'. J. Amer. Statist. Ass., 68, 716-719. (162)
Noether, G. E. (1974). 'Distribution-free confidence intervals based on linear rank statistics'. In Williams (1974). (162)
Odeh, R. E. (1967). 'The distribution of the maximum sum of ranks'. Technometrics, 9, 271-278. (181, 182)
Ogrodnikoff, K. (1928). 'On the occurrence of discordant observations and a new method of treating them'. Monthly Notices Roy. Astr. Soc., 88, 523-532. (Ch. 3, H)
348 Outliers in statistica[ data References and bibliography 349

Likes, J. (1966). 'Distribution of Dixon's statistics in the case of an exponential samples with unknown variance: a correction'. Technometrics, 15, 637-640. (41,
population'. Metrika, 11, 46-54. (40, 54, 73, 77, 80, 81, 82, 83, 86, 87) 94, 104, 105)
Lingappaiah, G. S. (1976). 'Effect of outliers in the estimation of parameters'. Moshman, J. (1952). 'Testing a straggler mean in a 2-way classification using the
Metrika, 23, 27-30. (281) range'. Ann. Math. Statist., 23, 126-132. (Ch. 7)
Lund, R. E. (1975). 'Tables for an approximate test for outliers in linear models'. Mosteller, F. (1948). 'A k-sample slippage test for an extreme population'. Ann.
Technometrics, 17, 473-476. (255, 262) Math. Statist., 19, 58-65. (176, 283)
McCarthy, P. J. (1972). 'The effects of discarding inliers when binomia} data are Mosteller, F., and Tukey, J. W. (1950). 'Significance levels for a k-sample slippage
subject to classification errors'. J. Amer. Statist. Ass., 67, 515-529. (Ch. 3) test'. Ann. Math. Statist., 21, 120-123. (177)
McKay, A. T. (1935). 'The distribution of the difference between the extreme Mount, K. S., and Kale, B. K. (1973). 'On selecting a spurious observation'. Can.
observation and the sample mean in samples of n from a normal universe'. Math. Bull., 16, 75-78. (37, 77)
Biometrika, 27, 466-471. (111) Moussa-Hamouda, E., and Leone, F. C. (1974). 'The O-BLUE estimators for
McMillan, R. G. (1971). 'Tests for one or two outliers in normal samples with complete and censored samples in linear regression'. Technometrics, 16, 441-446.
unknown variance'. Technometrics, 13, 87-100. (34, 40, 44, 71, 73, 94, 95, 96, (Ch. 4, Ch. 7)
104, 105, 106) Mudrov, V. 1., Kushko, V. L., Mikhailov, V. 1., and Osovitskii, E. M. (1968). 'Some
McMillan, R. G., and David, H. A. (1971). 'Tests for one of two outliers in norma! experiments on the use of the least-moduli method in processing orbita! data'
samples with known variance'. Technometrics, 13, 75-85. (34, 40, 71, 73, 110, (Russian). Kosmicheskie Issledovaniya, 6, 502-504. English translation: Cosmic
111, 112) Research, 6, 421-431. (Ch. 2)
Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952). 'The time intervals Muncke, G. W. (1825). 'Beobachtung'. Gehler's Physikalisches Worterbuch, 2nd edn.
between industriai accidents'. Biometrika, 39, 168-180. (119) Leipzig, Vol. I, pp. 884-912. (Ch. 2, H)
M aire, C.; Boscovich, R. J. (17 55). De litteraria Expeditione per Pontificiam ditionem Murphy, R. B. (1951). On Tests for Outlying Observations. Ph.D. thesis, Princeton
ad dimetiendas duas Meridiani gradus, et corrigendam mappam geographicam, University, University Microfilms Inc., Ann Arbor, Mich. (40, 44, 71, 95)
jussu, et auspiciis Benedicti XIV Pont. Max. Suscepta. Ramae. (French translation: Naik, U. D. (1972). 'A Bayesian analysis of certain contaminated samples'. Research
Voyage Astronomique et Géographique dans l'État de l'Église, entrepis par l'Ordre Report No. 104, Department of Probability and Statistics, Sheffield University.
et sous les Auspices du Pope Benoit XIV, pour mesurer deux degrés du méridien, et (206)
corriger la Carte de l' État ecclesiastique. Paris, l, 770. (18) Nair, K. R. (1948). 'The distribution of the extreme deviate from the sample mean
Martin, R. D., Masreliez, C. J., and Goodfellow, D. M. (1973). 'Robust location and its studentized from'. Biometrika, 35, 118-144. (104, 110, 111)
estimates and confidence intervals via stochastic approximation: small sample Nair, K. R. (1952). 'Tables of percentage points of the "Studentized" extreme
behaviour'. Inst. Math. Statist. Bull., 2, 138. (Ch. 4) deviate from the sample mean'. Biometrika, 39, 189-191. (104)
Mendeleev, D. l. (1895). 'Course of work on the renewal of prototypes or standard Neave, H. R. (1972). 'Some quick tests for slippage'. The Statistician, 21, 197-208.
measures of lengths and weights' (Russian). Vremennik Glavnoi Palaty Mer i (178, 180)
Vesov, 2, 157-185. (Reprinted 1950; Collected Writings (Socheneniya), 22, 175- Neave, H. R. (1973). 'A power study of some tests for slippage'. The Statistician, 22,
213, izdat. Akad. Nauk, SSSR, Leningrad-Moscow.) (21, 47) 269-280. (180)
Mercer, W. B., and Hall, A. D. (1912). 'The experimental error of field trials'. J. Neave, H. R. (1975). 'A quick and simple technique for generai slippage problems'.
Agric. Sci., 4, 107-132. (2) J. Amer. Statist. Ass., 70, 721-726. (183, 185)
Merriman, M. (1877). 'List of writings relating to the method of least squares with Newcomb, S. (1886). 'A generalized theory of the combination of observations so as
historical and criticai notes'. Transactions of the Connecticut Academy of Arts and to obtain the best result'. Amer. J. Math., 8, 343-366. (21, 47)
Sciences, 4, 151-232. (Ch. 2, H) Newcomb, S. (1912). 'Researches on the motion of the moon, Part IL The mean
Merriman, M. (1884). A Textbook on the Method of Least Squares. Wiley. New York. motion of the moon and other astronomica! elements derived from observations of
(Ch. 2, H) eclipses and occultations extending from the period of the Babylonians until A.D.
Meshalkin, L. D., Smirnov, N. P., and Sosnovskii, N. N. (1969). 'On the stability of 1908'. Astronomica/ papers, 9, 1-249, U.S. Government Printing Office, Washing-
estimates of the distribution center' (Review) (Russian). Zavodskaya Laboratoriya, ton. (Ch. 2, H)
35, 51-61. English translation: Industriai Laboratory, 35, 712-716. (Ch. 4) Neyman, J., and Scott, E. L. (1971). 'Outlier proneness of phenomena and of related
Mickey, M. R. (1974). 'Detecting outliers with stepwise regression'. distribution. In Rustagi (1971). (37)
Communications-UCLA Health Sciences Facility, 1, l. (265) Noether, G. E. (1967). 'Wilcoxon con:fidence intervals for location parameters in the
Mickey, M. R., Dunn, O. J., and Clark, V. (1967). 'Note on use of stepwise discrete case'. J. Amer. Statist. Ass., 62, 184-188. (162)
regression in detecting outliers'. Computers & Biomed. Res., 1, 105-111. (265) Noether, G. E. (1973). 'Some simple distribution-free confidence intervals for the
Miké, V. (1971). 'Efficiency-robust systematic linear estimators of location'. J. Amer. center of a symmetric distribution'. J. Amer. Statist. Ass., 68, 716-719. (162)
Statist. Ass., 66, 594-601. (Ch. 4) Noether, G. E. (1974). 'Distribution-free confidence intervals based on linear rank
Miké, V. (1973). 'Robust Pitman-type estimators of location'. Ann. Inst. Statist. statistics'. In Williams (1974) (162)
Math., Tokyo, 25, 65-86. (Ch. 4) Odeh, R. E. (1967). 'The distribution of the maximum sum of ranks'. Technometrics,
Moore, P. G. (1957). 'The two-sample t-test based on range'. Biometrika, 44, 9, 271-278. (181, 182)
482-485. (107) Ogrodnikoff, K. (1928). 'On the occurrence of discordant observations and a new
Moran, M. A., and McMillan, R. G. (1973). 'Tests for one or two outliers in normal method of treating them'. Monthly Notices Roy. Astr. Soc., 88, 523-532. (Ch. 3,
H)
350 Outliers in statistical data References and bibliography 351

Olkin, I. (Ed.) (1960). Contributions to Probability and Statistics. University Press, Stanford, Calif.
Owen, D. B. (1962). Handbook of Statistical Tables. Addison-Wesley, Reading, Mass. (204)
Paulson, E. (1952a). 'On the comparison of several experimental categories with a control'. Ann. Math. Statist., 23, 239-246. (Ch. 5)
Paulson, E. (1952b). 'An optimum solution to the k-sample slippage problem for the normal distribution'. Ann. Math. Statist., 23, 610-616. (192, 200, 240)
Paulson, E. (1961). 'A non-parametric solution for the k-sample slippage problem'. In Solomon (1961). (207)
Paulson, E. (1962). 'A sequential procedure for comparing several experimental categories with a standard or control'. Ann. Math. Statist., 33, 438-443. (207)
Pearson, E. S. (1926). 'A further note on the distribution of range in samples taken from a normal population'. Biometrika, 18, 173-194. (114)
Pearson, E. S. (1932). 'The percentage limits for the distribution of range in samples from a normal population (n ≤ 100)'. Biometrika, 24, 404-417. (114)
Pearson, E. S., and Chandra Sekar, C. (1936). 'The efficiency of statistical tools and a criterion for the rejection of outlying observations'. Biometrika, 28, 308-320. (22, 40, 54, 71, 73, 94, 243)
Pearson, E. S., and Hartley, H. O. (1942). 'The probability integral of the range in samples of n observations from a normal population'. Biometrika, 32, 301-310. (114)
Pearson, E. S., and Hartley, H. O. (Eds.) (1966). Biometrika Tables for Statisticians, Vol. 1, 3rd edn., Cambridge University Press, London. (84, 95, 97, 101, 104, 107, 110, 111, 193, 196, 197, 202)
Pearson, E. S., and Stephens, M. A. (1964). 'The ratio of range to standard deviation in the same normal sample'. Biometrika, 51, 484-487. (39, 97)
Pearson, K. (Ed.) (1931). Tables for Statisticians and Biometricians. Biometric Lab., University College, London. (25)
Pearson, K. (Ed.) (1968). Tables of the Incomplete Beta-Function. 2nd edn. (with new Introduction by Pearson, E. S., and Johnson, N. L.). Cambridge University Press, London. (202)
Peirce, B. (1852). 'Criterion for the rejection of doubtful observations'. Astr. J., 2, 161-163. (19)
Peirce, B. (1878). 'On Peirce's criterion' (with remarks by Scott, C. A.). Proceedings of the American Academy of Arts and Sciences, 13, 348-351. (Ch. 2, H)
Peirce, C. S. (1873). 'On the theory of errors of observations'. Report of the Superintendent of the United States Coast Survey, (for the year ending 1 November 1870) U.S. Government Printing Office, Washington. (Ch. 2, H)
Pfanzagl, J. (1959). 'Ein kombiniertes Test und Klassifikations-Problem'. Metrika, 2, 11-45. (195-197)
Pillai, K. C. S. (1959). 'Upper percentage points of the extreme studentized deviate from the sample mean'. Biometrika, 46, 473-474. (Ch. 3)
Pillai, K. C. S., and Tienzo, B. P. (1959). 'On the distribution of the extreme studentized deviate from the sample mean'. Biometrika, 46, 467-472. (Ch. 3)
Prescott, P. (1975a). 'An approximate test for outliers in linear regression'. Technometrics, 17, 127-128. (261)
Prescott, P. (1975b). 'An approximate test for outliers in linear models'. Technometrics, 17, 129-132. (255, 261)
Prescott, P. (1976). 'On a test for normality based on sample entropy'. J. Roy. Statist. Soc. B, 38, 254-256. (249)
Proschan, F. (1953). 'Rejection of outlying observations'. Amer. J. Phys., 21, 520-525. (Ch. 3)
Proschan, F. (1957a). 'Testing suspected observations'. Sankhya, A, 17, 67-76. (Ch. 3)
Proschan, F. (1957b). 'Testing suspected observations'. Ind. Qual. C.XIII, 14-19. (Ch. 3)
Puri, M. L. (Ed.) (1970). Nonparametric Techniques in Statistical Inference. Cambridge University Press, London.
Quenouille, M. H. (1953). The Design and Analysis of Experiment. Griffin, London. (11, 241)
Quenouille, M. H. (1956). 'Notes on bias in estimation'. Biometrika, 43, 353-360. (48)
Quesenberry, C. P., and David, H. A. (1961). 'Some tests for outliers'. Biometrika, 48, 379-387. (94, 95, 104, 105, 106, 193, 195, 200)
Rahman, N. A. (1972). Practical Exercises in Probability and Statistics. Griffin, London. (11)
Ramachandran, K. V., and Khatri, C. G. (1957). 'On a decision procedure based on the Tukey statistic'. Ann. Math. Statist., 28, 802-806. (43, 205)
Randles, R. H., Ramberg, J. S., and Hogg, R. V. (1973). 'An adaptive procedure for selecting the population with the largest location parameter'. Technometrics, 15, 769-778. (Ch. 5)
Rao, C. R. (1964). 'The use and interpretation of principal component analysis in applied research'. Sankhya, A, 26, 329-358. (226)
Rao, P. V. (1972). 'Robust estimation for a simple exponential model'. Australian J. Statist., 14, 54-62. (Ch. 4)
Rao, P. V., and Thornby, J. I. (1969). 'A robust point estimator in a generalized regression model'. Ann. Math. Statist., 40, 1784-1790. (Ch. 7)
Rider, P. R. (1933). 'Criteria for rejection of observations'. Washington University Studies-New Series, Science and Technology, 8, 3-23. (20)
Rohlf, F. J. (1975). 'Generalisation of the gap test for the detection of multivariate outliers'. Biometrics, 31, 93-101. (221, 229)
Rosner, B. (1975). 'On the detection of many outliers'. Technometrics, 17, 221-227. (Ch. 3)
Rustagi, J. (Ed.) (1971). Optimising Methods in Statistics. Academic Press, New York.
Sacks, J., and Ylvisaker, D. (1972). 'A note on Huber's robust estimation of a location parameter'. Ann. Math. Statist., 43, 1068-1075. (Ch. 4)
Samuelson, P. A. (1968). 'How deviant can you be?'. J. Amer. Statist. Ass., 63, 1522-1525. (Ch. 2)
Sarhan, A. E., and Greenberg, B. G. (Eds.) (1962). Contributions to Order Statistics. Wiley, New York. (159)
Saunder, S. A. (1903). 'Note on the use of Peirce's criterion for the rejection of doubtful observations'. Monthly Notices Roy. Astr. Soc., 63, 432-436. (19)
Scholz, F. (1974). 'A comparison of efficient location estimators'. Ann. Statist., 2, 1323-1326. (Ch. 4)
Schuster, E. F., and Narvarte, J. A. (1973). 'A new nonparametric estimation of the center of a symmetric distribution'. Ann. Statist., 1, 1096-1104. (Ch. 4)
Schweder, T. (1973). 'Window estimation of the asymptotic variance of the Hodges-Lehmann estimator'. Inst. Math. Statist. Bull., 2, 92. (Ch. 4)
Schweder, T. (1976). 'Some "optimal" methods to detect structural shift or outliers in regression'. J. Amer. Statist. Ass., 71, 491-501. (256)
Searls, D. T. (1966). 'An estimator for a population mean which reduces the effect of large true observations'. J. Amer. Statist. Ass., 61, 1200-1204. (Ch. 4)
Sen, P. K. (1968). 'On a further robustness property of the test and estimator based on Wilcoxon's signed rank statistic'. Ann. Math. Statist., 39, 282-285. (Ch. 4)
350 Outliers in statistica[ data References and bibliography 351

Olkin, I. (Ed.) (1960). Contributions to Probability and Statistics'. University Press, Proschan, F. (1975a). 'Testing suspected observations'. Sankhya, A, 17, 67-76. (Ch.
Stanford, Calif. 3)
Owen, D .. B. (1962). Handbook of Statistica[ Tables. Addison-Wesley, Reading, Proschan, F. (1957b). 'Testing suspected observations'. Ind. Qual. C.XIII, 14-19.
Mass. (204) (Ch. 3)
Paulson, E. (1952a). 'On the comparison of severa! experimental categories with a Puri, M. L. (Ed.) (1970). Nonparametric Techniques in Statistica[ Inference. Cam-
contrai'. A nn. Math. Statist., 23, 239-246. (Ch. 5) bridge University Press, London.
Paulson, E. (1925b). 'A optimum solution to the k-sample slippage problem for the Quenouille, M. H. (1953). The Design and Analysis of Experiment. Griffin, London.
normal distribution'. Ann. Math. Statist., 23, 610-616. (192, 200, 240) (11, 241)
Paulson, E. (1961). 'A non-parametric solution for the k-sample slippage problem'. Quenouille, M. H. (1956). 'Notes on bias in estimation'. Biometrika, 43, 353-360.
In Solomon (1961). (207) (48)
Paulson, E. (1962). 'A sequential procedure for"' comparing severa! experimental Quesenberry, C. P., and David, H. A. (1961). 'Some tests for outliers'. Biometrika,
categories with a standard or contrai'. Ann. Math. Statist., 33, 438-443. (207) 48, 379-387. (94, 95, 104, 105, 106, 193, 195, 200)
Pearson, E. S. (1926). 'A further note on the distribution of range in samples taken Rahman, N. A. (1972). Practical Exercises in Probability and Statistics. Gri:ffin,
from a normal population'. Biometrika, 18, 173-194. (114) London. (11)
Pearson, E. S. (1932). 'The percentage limits for the distribution of range in samples Ramachandran, K. V., and Khatri, C. G. (1957). 'On a decision procedure based on
from a normal population (n~lOO)'. Biometrika, 24,404-417. (114) the Tukey statistic'. Ann. Math. Statist., 28, 802-806. (43, 205)
Pearson, E. S., and Chandra Sekar, C. (1936). 'The e:fficiency of statistica! tools and a Randles, R. H., Ramberg, J. S., and Hogg, R. V. (1973). 'An adaptive procedure for
criterion for the rejection of outlying observations'. Biometrika, 28, 308-320. (22, selecting the population with the largest location parameter'. Technometrics, 15,
40, 54, 71, 73, 94, 243) 769-778. (Ch. 5)
Pearson, E. S., and Hartley, H. O. (1942). 'The probability integrai of the range in Rao, C. R. (1964). 'The use and interpretation of principal component analysis in
samples of n observations from a normal population'. Biometrika, 32, 301-310. applied research'. Sankhya A, 26, 329-358. (226)
(114) Rao, P. V. (1972). 'Robust estimation fora simple exponential model'. Australian 1.
Pearson, E. S., and Hartley, H. O. (Eds.) (1966). Biometrika Tables for Statisticians, Statist., 14, 54-62. (Ch. 4)
Vol. l, 3rd edn., Cambridge University Press, London. (84, 95, 97, 101, 014, 107, Rao, P. V., and Thornby, J. I. (1969). 'A robust point estimator in a generalized
110, 111, 193, 196, 197, 202) regression model'. A nn. Math. Statist., 40, 1784-1790. (Ch. 7)
Pearson, E. S., and Stephens, M. A. (1964). 'The ratio of range to standard deviation Rider, P. R. (19-33). 'Criteria for rejection of observations'. Washington University
in the same normal sample'. Biometrika, 51, 484-487. (39, 97) Studies-New Series, Science and Technology, 8, 3-23. (20)
Pearson, K. (Ed.) (1931). Tables for Statisticians and Biometricians. Biometrie Lab., Rohlf, F. J. (1975). 'Generalisation of the gap test for the detection of multivariate
University College, London. (25) outliers'. Biometrics, 31, 93-101. (221, 229)
Pearson, K. (Ed.) (1968). Tables ofthe Incomplete Beta-Function. 2nd edn. (with new Rosner, B. (1975). 'On the detection of many outliers'. Technometrics, 17, 221-227.
Introduction by Pearson, E. S., and Johnson, N. L.). Cambridge University Press, (Ch. 3)
London. (202) Rustagi, J. (Ed.) (1971). Optimising Methods in Statistics. Academic Press, New
Peirce, B. (1852). 'Criterion for the rejection of doubtful observations'. Astr. 1., 2, York.
161-163. (19) Sacks, J., and Ylvisaker, D. (1972). 'A note on Huber's robust estimation of a
Peirce, B. (1878). 'On Peirce's criterion' (with remarks by Scott, C. A.). Proceedings location parameter'. Ann. Math. Statist., 43, 1068-1075. (Ch. 4)
of the American Academy of Arts and Sciences, 13, 348-351. (Ch. 2, H) Samuelson, P. A. (1968). 'How deviant can you be?'. 1. Amer. Statist. Ass., 63,
Peirce, C. S. (1873). 'On the theory of errors of observations'. Report of the 1522-1525. (Ch. 2)
Superintendent of the United States Coast Survey, (for the year ending l November Sarhan, A. E., and Greenberg, B. G. (Eds.) (1962). Contributions to Order Statistics.
1870) U.S. Government Printing O:ffice, Washington. (Ch. 2, H) Wiley, New York. (159)
Pfanzagl, J. (1959). 'Ein kombiniertes Test und Klassifikations-Problem'. Metrika, 2, Saunder, S. A. (1903). 'Note on the use of Peirce's criterion for the rejection of
11-45. (195-197) doubtful observations'. Monthly Notices Roy. Astr. Soc., 63, 432-436. (19)
Pillai, K. C. S. (1959). 'Upper percentage points of the extreme studentized deviate Scholz, F. (1974). 'A comparison of e:fficient location estimators'. Ann. Statist., 2,
from the sample mean'. Biometrika, 46, 473-474. (Ch. 3) 1323-1326. (Ch. 4)
Pillai, K. C. S., and Tienzo, B. P. (1959). 'On the distribution of the extreme Schuster, E. F., and Narvarte, J. A. (1973). 'A new nonparametric estimation of the
studentized deviate from the sample mean'. Biometrika, 46, 467-472. (Ch. 3) center of a symmetric distribution'. Ann. Statist., 1, 1096-1104. (Ch. 4)
Prescott, P. (1975a). 'An approximate test for outliers in linear regression'. Tech- Schweder, T. (1973). 'Window estimation of the asymptotic variance of the Hodges-
nometrics, 17, 127-128. (261) Lehmann estimator'. Inst. Math. Statist. Bull., 2, .92. (Ch. 4)
Prescott, P. (1975b). 'An approximate test for outliers in linear models'. Technomet- Schweder, T. (1976). 'Some "optimal" methods to detect structural shift or outliers
rics, 17, 129-132. (255, 261) in regression'. 1. Amer. Statist. Ass., 71, 491-501. (256)
Prescott, P. (1976). 'On a test for normality based on sample entropy'. 1. Roy. Statist. Searls, D. T. (1966). 'An estimator fora population mean which reduces the effect of
Soc. B, 38, 254-256. (249) large true observations'. 1. Amer. Statist. Ass., 61, 1200-1204. (Ch. 4)
Proschan, F. (1953). 'Rejection of outlying observations'. Amer. 1. Phys., 21, Sen, P. K. (1968). 'On a further robustness property of the test and estimator based
520-525. (Ch. 3) on Wilcoxon's signed rank statistic'. Ann. Math. Statist., 39, 282-285. (Ch. 4)
Sen, P. K., and Puri, M. L. (1969). 'On robust nonparametric estimation in some multivariate linear models'. In Krishnaiah (1969). (Ch. 7)
Seth, G. R. (1950). 'On the distribution of the two closest among a set of three observations'. Ann. Math. Statist., 21, 298-301. (51)
Shapiro, S. S., and Wilk, M. B. (1965). 'An analysis of variance test for normality (complete samples)'. Biometrika, 52, 591-611. (31, 40, 88, 103, 249)
Shapiro, S. S., and Wilk, M. B. (1972). 'An analysis of variance test for the exponential distribution (complete samples)'. Technometrics, 14, 355-370. (31, 40)
Shapiro, S. S., Wilk, M. B., and Chen, M. J. (1968). 'A comparative study of various tests for normality'. J. Amer. Statist. Ass., 63, 1343-1372. (31, 40, 97, 101, 103)
Sheynin, O. B. (1966a). 'Origin of the theory of errors'. Nature, 211, 1003-1004. (Ch. 2, H)
Sheynin, O. B. (1966b). 'On selection and adjustment of direct observations' (Russian). Izvestiia Vysshikh Uchebnykh Zavedenii. Geodeziia i Aerofotos'emka, 1966. English translation: Geodesy and Aero-photography, 1966 (1967), 114-117. (Ch.
Sheynin, O. B. (1971). 'J. H. Lambert's work on probability'. Archive for History of Exact Sciences, 7, 244-256. (Ch. 2, H)
Shorack, G. R. (1974). 'Random means'. Ann. Statist., 2, 661-675. (Ch. 4)
Siddiqui, M. M., and Raghunandanan, K. (1967). 'Asymptotically robust estimators of location'. J. Amer. Statist. Ass., 62, 950-953. (135, 157, 164)
Sinha, S. K. (1972). 'Reliability estimation in life testing in the presence of an outlier observation'. Op. Res., 20, 888-894. (51, 77, 155, 272, 279, 281)
Sinha, S. K. (1973a). 'Distributions of order statistics and estimation of mean life when an outlier may be present'. Canad. J. Statist., 1, 119-121. (50, 51, 77, 172)
Sinha, S. K. (1973b). 'Life testing and reliability estimation for non-homogeneous data - a Bayesian approach'. Comm. Statist., 2, 235-243. (50, 51, 77, 155, 272, 279, 281)
Sinha, S. K. (1973c). 'Estimation of the parameters of a two-parameter exponential distribution when an outlier may be present'. Utilitas Mathematica, 3, 75-82. Correction (1974), Utilitas Mathematica, 4, 333-334. (51, 77, 172)
Sinha, S. K. (1973d). 'Some distributions relevant in life testing when an outlier may be present'. Technical Report No. 42, Department of Statistics, University of Manitoba, Winnipeg, Canada. (172)
Siotani, M. (1959). 'The extreme value of the generalised distances of the individual points in the multivariate normal sample'. Ann. Inst. Statist. Math. Tokyo, 10, 183-208. (211, 219)
Snedecor, G. W., and Cochran, W. G. (1967). Statistical Methods. 6th edn. The Iowa State University Press, Ames, Iowa. (265)
Solomon, H. (Ed.) (1961). Studies in Item Analysis and Prediction. University Press, Stanford, Calif.
Srikantan, K. S. (1961). 'Testing for the single outlier in a regression model'. Sankhya, A, 23, 251-260. (254, 257, 259, 266)
Srivastava, M. S. (1973). 'The performance of a sequential procedure for a slippage problem'. J. Roy. Statist. Soc. B, 35, 97-103. (206)
Stampfer, S. (1839). 'Ueber das Verhältniss der Wiener Klafter zum Meter'. Jahrbücher des K. K. Polytechnisches Institutes (Vienna), 20, 145-176. (Ch. 2, H)
Stefansky, W. (1971). 'Rejecting outliers by maximum normed residual'. Ann. Math. Statist., 42, 35-45. (94, 242, 254, 255)
Stefansky, W. (1972). 'Rejecting outliers in factorial designs'. Technometrics, 14, 469-479. (242, 254, 255)
Stewart, R. M. (1920a). 'Pierce's criterion'. Popular Astronomy, 28, 2-3. (Ch. 2, H)
Stewart, R. M. (1920b). 'The treatment of discordant observations'. Popular Astronomy, 28, 4-6. (Ch. 2, H)
Stigler, S. M. (1973a). 'The asymptotic distribution of the trimmed mean'. Ann. Statist., 1, 472-477. (Ch. 4)
Stigler, S. M. (1973b). 'Simon Newcomb, Percy Daniell, and the history of robust estimation 1885-1920'. J. Amer. Statist. Ass., 68, 872-879. (21, 158)
Stigler, S. M. (1974). 'Linear functions of order statistics with smooth weight functions'. Ann. Statist., 2, 676-693. (Ch. 4)
Stone, E. J. (1868). 'On the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 28, 165-168. (19)
Stone, E. J. (1873). 'On the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 34, 9-15. (20)
Stone, E. J. (1874). 'Note on a discussion relating to the rejection of discordant observations'. Monthly Notices Roy. Astr. Soc., 35, 107-108. (Ch. 2, H)
Student (1927). 'Errors of routine analysis'. Biometrika, 19, 151-164. (47)
Sukhov, A. N. (1971). 'Comparison of the median and the arithmetic mean in the case of a small sample' (Russian). Izvestiia Vysshikh Uchebnykh Zavedenii. Geodeziia i Aerofotos'emka, 1971, 59-65. English translation: Geodesy and Aerophotography, 1971 (1973), 326-329. (Ch. 4)
Swaroop, R., and Winter, W. R. (1971). 'A statistical technique for computer identification of outliers in multivariate data'. NASA TN D-6472. National Aeronautics and Space Administration, Washington, D.C. (Ch. 7)
Swaroop, R., West, K. A., and Lewis, C. E. Jr. (1969). 'A simple technique for automatic computer editing of biodata'. NASA TN D-5275. National Aeronautics and Space Administration, Washington, D.C. (Ch. 3)
Takeuchi, K. (1971). 'A uniformly asymptotically efficient estimator of a location parameter'. J. Amer. Statist. Ass., 66, 292-301. (49, 142, 153, 166)
Thomas, J. (1969). 'Monte Carlo investigation of the robustness of Dixon's criteria for testing outlying observations'. Proceedings of the Fourteenth Conference on the Design of Experiments in Army Research, Development and Testing, pp. 437-483. (Ch. 3)
Thompson, G. W. (1955). 'Bounds for the ratio of range to standard deviation'. Biometrika, 42, 268-269. (107)
Thompson, W. A. Jr., and Willke, T. A. (1963). 'On an extreme rank sum test for outliers'. Biometrika, 50, 375-383. (252)
Thompson, W. R. (1935). 'On a criterion for the rejection of observations and the distribution of the ratio of the deviation to the sample standard deviation'. Ann. Math. Statist., 6, 214-219. (22, 254)
Thompson, W. R. (1942). 'On a criterion of the difference between the extreme observation and the sample mean in samples of n from a normal universe'. Biometrika, 32, 301-310. (Ch. 3)
Tiao, G. C., and Guttman, I. (1967). 'Analysis of outliers with adjusted residuals'. Technometrics, 9, 541-559. (127, 132, 168, 260)
Tietjen, G. L., and Moore, R. H. (1972). 'Some Grubbs-type statistics for the detection of several outliers'. Technometrics, 14, 583-597. (38, 40, 73, 95, 96, 102)
Tietjen, G. L., Moore, R. H., and Beckman, R. J. (1973). 'Testing for a single outlier in simple linear regression'. Technometrics, 15, 717-721. (254, 261)
Tippett, L. H. C. (1925). 'On the extreme individuals and the range of samples taken from a normal population'. Biometrika, 17, 364-387. (114)
Torgerson, E. N. (1971). 'A counterexample on translation invariant estimators'. Ann. Math. Statist., 42, 1450-1451. (Ch. 4)
Truax, D. R. (1953). 'An optimum slippage test for the variances of k normal distributions'. Ann. Math. Statist., 24, 669-674. (43, 196, 200)
Tukey, J. W. (1949). 'The truncated mean in moderately large samples'. Memorandum Report No. 32. Statistical Research Group, Princeton University, Princeton, N.J. (Reports Nos. 25, 33, and 34 relate.) (Ch. 4)
Tukey, J. W. (1960). 'A survey of sampling from contaminated distributions'. In Olkin (1960). (33, 46, 127, 130, 158)
Tukey, J. W. (1962). 'The future of data analysis'. Ann. Math. Statist., 33, 1-67. (249)
Tukey, J. W. (1977). Exploratory Data Analysis, Vol. 1. Addison-Wesley, Reading, Mass. (153)
Tukey, J. W., and McLaughlin, D. M. (1963). 'Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization'. Sankhya, A, 25, 331-352. (48, 142, 161)
Veale, J. R., and Huntsberger, D. V. (1969). 'Estimation of a mean when one observation may be spurious'. Technometrics, 11, 331-339. (51, 127)
Veale, J. R., and Kale, B. K. (1972). 'Tests of hypotheses for expected life in the presence of a spurious observation'. Utilitas Mathematica, 2, 9-23. (35, 50, 51, 77, 144, 172)
Walsh, J. E. (1950). 'Some nonparametric tests of whether the largest observations of a set are too large or too small'. Ann. Math. Statist., 21, 583-592. Correction (1953), Ann. Math. Statist., 24, 134-135. (44, 283)
Walsh, J. E. (1959). 'Large sample non-parametric rejection of outlying observations'. Ann. Inst. Statist. Math. Tokyo, 10, 223-232. (44, 283, 285)
Walsh, J. E. (1965). Handbook of Non-parametric Statistics, II. Van Nostrand, Princeton, N.J. (44, 283)
Walsh, J. E., and Kelleher, G. J. (1973). 'Nonparametric estimation of mean and variance when a few "sample" values possibly outliers'. Ann. Inst. Statist. Math. Tokyo, 25, 87-90. (285)
Wani, J. K., and Kabe, D. G. (1971). 'Distributions of Dixon's statistics for the truncated exponential, rectangular, and random intervals population'. Metron, 29, 151-160. (124, 125)
West, S. A. (1975). 'Bias in the estimator of Kendall's rank correlation when extreme pairs are removed from the sample'. J. Amer. Statist. Ass., 70, 439-442. (Ch. 6)
Wilk, M. B., and Gnanadesikan, R. (1964). 'Graphical methods for internal comparisons in multiresponse experiments'. Ann. Math. Statist., 35, 613-631. (222, 226, 230)
Wilk, M. B., Gnanadesikan, R., and Huyett, M. J. (1962a). 'Probability plots for the gamma distribution'. Technometrics, 4, 1-20. (226, 230)
Wilk, M. B., Gnanadesikan, R., and Huyett, M. J. (1962b). 'Estimation of parameters of the gamma distribution using order statistics'. Biometrika, 49, 525-545. (226)
Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. (215)
Wilks, S. S. (1963). 'Multivariate statistical outliers'. Sankhya, A, 25, 407-426. (215, 226)
Williams, E. J. (Ed.) (1974). Studies in Probability and Statistics. Jerusalem Academic Press, Jerusalem.
Willke, T. A. (1966). 'A note on contaminated samples of size three'. Journal of Research of the National Bureau of Standards, B, 70, 149-151. (51, 127, 135)
Winlock, J. (1856). 'On Professor Airy's objections to Peirce's criterion'. Astr. J., 4, 145-147. (Ch. 2, H)
Wooding, W. M. (1969). 'The computation and use of residuals in the analysis of experimental data'. J. Quality Technology, 1, 175-188. Correction, 1, 294. (226)
Wright, T. W. (1884). A Treatise on the Adjustment of Observations by the Method of Least Squares. Van Nostrand, New York. (2, 21)
Wright, T. W., and Hayford, J. F. (1906). Adjustment of Observations. Van Nostrand, New York. (21)
Yanagawa, T. (1969). 'A small sample robust competitor of Hodges-Lehmann estimate'. Bull. Math. Statist. (Fukuoka), 13, (3-4), 1-14. (Ch. 4)
Yhap, E. F. (1967). An Asymptotic Optimally Robust Linear Estimation of Location for Symmetric Shapes. Doctoral dissertation, New York University, University Microfilms Inc., Ann Arbor, Mich. (Ch. 4)
Yohai, V. J. (1974). 'Robust estimation in the linear model'. Ann. Statist., 2, 562-567. (Ch. 7)
Youden, W. J. (1949). 'The fallacy of the best two out of three'. National Bureau of Standards Technical Bulletin, 33, 77-78. (Ch. 2)
Youden, W. J. (1953). 'Sets of three measurements'. The Scientific Monthly, 77, 143-147. (Ch. 2)
von Zach, F. X. (1805). 'Versuch einer auf Erfahrung gegründeten Bestimmung terrestrischer Refractionen'. Monatliche Correspondenz zur Beförderung der Erd- und Himmels-Kunde, 11, 389-415. (Ch. 2, H)
Zelenen'kiy, V. P. (1969). 'Application of statistical decision theory to the exclusion of anomalous measurements'. Izvestiia Akademii Nauk SSR, Tekhnicheskaya Kibernetika, 1969, (2), 139-142 (Russian). Translated in Engineering Cybernetics (1969) (2), 122-126. (Ch. 3, Ch. 7)
Zinger, A. (1961). 'Detection of best and outlying normal distributions with known variances'. Biometrika, 48, 457. (Ch. 3)
354 Outliers in statistical data References and bibliography 355

dum Report No. 32. Statistical Research Group, Princeton University, Princeton, N.J. (Reports Nos. 25, 33, and 34 relate.) (Ch. 4)
Tukey, J. W. (1960). 'A survey of sampling from contaminated distributions'. In Olkin (1960). (33, 46, 127, 130, 158)
Tukey, J. W. (1962). 'The future of data analysis'. Ann. Math. Statist., 33, 1-67. (249)
Tukey, J. W. (1977). Exploratory Data Analysis, Vol. 1. Addison-Wesley, Reading, Mass. (153)
Tukey, J. W., and McLaughlin, D. M. (1963). 'Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization'. Sankhyā, A, 25, 331-352. (48, 142, 161)
Veale, J. R., and Huntsberger, D. V. (1969). 'Estimation of a mean when one observation may be spurious'. Technometrics, 11, 331-339. (51, 127)
Veale, J. R., and Kale, B. K. (1972). 'Tests of hypotheses for expected life in the presence of a spurious observation'. Utilitas Mathematica, 2, 9-23. (35, 50, 51, 77, 144, 172)
Walsh, J. E. (1950). 'Some nonparametric tests of whether the largest observations of a set are too large or too small'. Ann. Math. Statist., 21, 583-592. Correction (1953), Ann. Math. Statist., 24, 134-135. (44, 283)
Walsh, J. E. (1959). 'Large sample non-parametric rejection of outlying observations'. Ann. Inst. Statist. Math. Tokyo, 10, 223-232. (44, 283, 285)
Walsh, J. E. (1965). Handbook of Non-parametric Statistics, II. Van Nostrand, Princeton N.J. (44, 283)
Walsh, J. E., and Kelleher, G. J. (1973). 'Nonparametric estimation of mean and variance when a few "sample" values possibly outliers'. Ann. Inst. Statist. Math. Tokyo, 25, 87-90. (285)
Wani, J. K., and Kabe, D. G. (1971). 'Distributions of Dixon's statistics for the truncated exponential, rectangular, and random intervals population'. Metron, 29, 151-160. (124, 125)
West, S. A. (1975). 'Bias in the estimator of Kendall's rank correlation when extreme pairs are removed from the sample'. J. Amer. Statist. Ass., 70, 439-442. (Ch. 6)
Wilk, M. B., and Gnanadesikan, R. (1964). 'Graphical methods for internal comparisons in multiresponse experiments'. Ann. Math. Statist., 35, 613-631. (222, 226, 230)
Wilk, M. B., Gnanadesikan, R., and Huyett, M. J. (1962a). 'Probability plots for the gamma distribution'. Technometrics, 4, 1-20. (226, 230)
Wilk, M. B., Gnanadesikan, R., and Huyett, M. J. (1962b). 'Estimation of parameters of the gamma distribution using order statistics'. Biometrika, 49, 525-545. (226)
Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York. (215)
Wilks, S. S. (1963). 'Multivariate statistical outliers'. Sankhyā, A, 25, 407-426. (215, 226)
Williams, E. J. (Ed.) (1974). Studies in Probability and Statistics. Jerusalem Academic Press, Jerusalem.
Willke, T. A. (1966). 'A note on contaminated samples of size three'. Journal of Research of the National Bureau of Standards, B, 70, 149-151. (51, 127, 135)
Winlock, J. (1856). 'On Professor Airy's objections to Peirce's criterion'. Astr. J., 4, 145-147. (Ch. 2, H)
Wooding, W. M. (1969). 'The computation and use of residuals in the analysis of experiment data'. J. Quality Technology, 1, 175-188. Correction, 1, 294. (226)
Wright, T. W. (1884). A Treatise on the Adjustment of Observations by the Method of Least Squares. Van Nostrand, New York. (2, 21)
Wright, T. W., and Hayford, J. F. (1906). Adjustment of Observations. Van Nostrand, New York. (21)
Yanagawa, T. (1969). 'A small sample robust competitor of Hodges-Lehmann estimate'. Bull. Math. Statist. (Fukuoka), 13, (3-4), 1-14. (Ch. 4)
Yhap, E. F. (1967). An Asymptotic Optimally Robust Linear Estimation of Location for Symmetric Shapes. Doctoral dissertation, New York University, University Microfilms Inc., Ann Arbor, Mich. (Ch. 4)
Yohai, V. J. (1974). 'Robust estimation in the linear model'. Ann. Statist., 2, 562-567. (Ch. 7)
Youden, W. J. (1949). 'The fallacy of the best two out of three'. National Bureau of Standards Technical Bulletin, 33, 77-78. (Ch. 2)
Youden, W. J. (1953). 'Sets of three measurements'. The Scientific Monthly, 77, 143-147. (Ch. 2)
von Zach, F. X. (1805). 'Versuch einer auf Erfahrung gegründeten Bestimmung terrestrischer Refractionen'. Monatliche Correspondenz zur Beförderung der Erd- und Himmels-Kunde, 11, 389-415. (Ch. 2, H)
Zelenen'kiy, V. P. (1969). 'Application of statistical decision theory to the exclusion of anomalous measurements'. Izvestiia Akademii Nauk SSR, Tekhnicheskaya Kibernetika, 1969, (2), 139-142 (Russian). Translated in Engineering Cybernetics (1969) (2), 122-126. (Ch. 3, Ch. 7)
Zinger, A. (1961). 'Detection of best and outlying normal distributions with known variances'. Biometrika, 48, 457. (Ch. 3)
Index
'Aberrant' observation, 36, 51, 53, 208, 237 (but see Outlier)
'Abnormal' observation, 19 (but see Outlier)
Accommodation of outliers (robust inference), 4, 20, 26, 46-51
  adaptive estimators, 49
  blanket procedures, 47-49
  correlation coefficient, 233
  exponential samples, 171-173
  Hodges-Lehmann estimator, 49, 154
  in confidence intervals, 141-142, 160-162
  in designed experiments, 246-247
  in estimating dispersion, 129, 133-134, 150, 158-160, 169-171
  in estimating location, 47-49, 128-129, 130, 144-158, 163-169
  in general linear model, 246-247, 260
  in regression, 256
  in significance tests, 142-144, 160-162
  in time-series, 268
  influence functions and influence curves, 136-141, 228-229
  linear order statistics estimators (L-estimators), 48, 152-153, 171-172
  maximin robust estimator, 135, 156, 157
  maximum likelihood type estimators (M-estimators), 48-49, 149-152, 163-166, 169-170
  median, 46
  minimax robust estimator, 135, 156, 157
  multivariate data, 231-233
  multivariate mean, 231-232
  multivariate normal data, 231-232
  performance criteria (univariate inference), 130-144
  premium-protection rules, 50, 131
  rank test estimators (R-estimators), 49, 153-154
  robustness of performance, 126
  robustness of validity, 126
  specific procedures, 47, 49-51
  studentized location estimators, 160-162, 168-169
  trimming, 21, 26, 48
  univariate, 126-173
  univariate normal data, 163-171
  using Bayesian methods, 274-275, 277-282
  using non-parametric methods, 285
  variance-covariance matrix, 232-233
  Winsorization, 26, 48, 50
  (For further details see under specific headings: Trimming, etc.)
Adaptive inference procedures, 49, 148-149, 157
Adjusted residuals, 168, 232, 260
Andrews' Fourier-type plot, 227, 228
'Anomalous' observation, 8, 18, 19, 230 (but see Outlier)

'Ballooning', 253
Basic model, 26, 29, 285
Bayesian treatment of outliers, 15, 45, 46, 269-282
  accommodation, 45, 274-275, 277-282
  'detection of spuriosity', 275-277
  exchangeable model, 36, 272, 279
  in general linear model, 252
  in multivariate data, 277
  multiple decision approach, 279
  multiple outliers, 272-275
  philosophical considerations, 269-271, 283
  'semi-Bayesian' methods, 272, 280, 281
  slippage model, 33, 36
  slippage tests, 192, 193, 200-201, 206
  'tests of discordancy', 269-277
Bickel-Hodges estimator (see Folded median)
Binomial distribution
  details of discordancy tests for practical use, 75-76, 115-116, 123-124; guide to use of tables, 75-76; worksheets, 123-124; tables, 320-322
  slippage tests, 203-204; tables, 320-322
Block procedure for slippage, 184
Block tests of discordancy (see Multiple outliers)
Breakdown point, 140-141

Carelessness (modulus of), 19
Cauchy distribution
  compared with normal distribution, 9
  L-estimators, 48
  outlier proneness, 37
Chauvenet's rejection test, 2, 19
'Cloaking' of outliers by other non-null manifestations in structured data, 238, 247, 249, 265, 287
Confidence intervals, robust against outliers, 141-142, 160-162
  error frequency, 142
  error probability, 142
Consecutive procedures for slippage, 184
Consecutive tests of discordancy (see Multiple outliers)
Contaminant, 65, 127
Contaminated distribution, 127, 255
Contamination, 47, 127-130
  asymmetric, 128, 132-133, 135-136
  effect on estimating normal mean and variance, 128-129
  symmetric, 128
  undetectability in normal mixture, 129-130

Designed experiments (outliers in), 14, 234-252
  accommodation, 246-247
  Bayesian method, example, 274-275
  detected by two-stage maximum likelihood procedure, 60
  effect confounded with non-normality, non-additivity, etc., 238, 247, 249
  graphical methods, 247-249
  half-normal plots, 248
  indicated by pattern disruption, 235, 238, 245
  masking, 251
  multiple decision approach, 240
  multiple outliers, 244, 247, 248, 249-251
  non-parametric methods, 251-252
  non-residual-based methods, 249-251
  sensitivity contours, 249
  swamping, 251
  table, 334
  tests of discordancy, 238-246
  use of residuals, 60, 236-237, 238-247 (see also Residuals)
Detection of outliers, 52
  in multivariate data, 208-209, 210-211, 214-215
  in time-series, 266-267
  pattern disruption in designed experiments, 235, 238, 245
  two-stage maximum likelihood procedure, 59, 60-61, 210-211, 214-215
Deterministic model for explaining outliers, 6-7, 14, 18, 23, 30-31
Discordancy, 23
  discordancy test, 24 (see also Tests of discordancy)
Discordant observations, 23
  multiple, 37
Distance measures (multivariate data), 208, 209
  generalized distances, 224-227
  graphical plots, 212-213
  use in outlier detection, 208, 209, 215
Distribution (see under specific heading: Binomial distribution, etc.)
Dixon statistics for tests of discordancy, 54, 55-56, 60
  for exponential and gamma data, 78-79, 81, 82, 83, 85, 86-88
  for univariate normal data, 91, 97-100
'Doubtful' observation, 18, 19 (but see Outlier)

'Error', 32
Evil, 21
Exchangeable model for outliers, 35-37, 50
  exponential distribution, 36
  in Bayesian context, 272, 279
Execution error, 27-28, 46
Exponential distribution
  accommodation of outliers, 171-173
  Bayesian accommodation, 279-281
  Bayesian procedure for slippage, 206
  details of discordancy tests for practical use, 75-88, 124-125; guide to use of tests, 75-76; contents list, 77-79; worksheets, 79-88, 124-125; tables, 290-297
  estimating the mean, 35-36, 50, 51
  exchangeable model for outliers, 35-36, 50
  L-estimator, 51
  labelled slippage model, 72-73
  locally optimal test for outlier, 59
  maximum likelihood ratio test, 56-58
  multiple decision procedure, 58-59
  multiple outliers, 36-37
  recursive derivation of discordancy-test statistic, 62-64
  robust tests, 35-36, 50, 51
  shifted origin, 77-78
  trimmed mean, 51
  truncated, 124-125
  two-stage maximum likelihood ratio test, 59
  Winsorized mean, 51
Extreme-value (Gumbel, Fréchet, Weibull) distributions
  discordancy tests for practical use, 115-116, 117-118
  in Bayesian treatment of (Weibull) outliers, 281, 282

Ferguson's slippage models A and B for normal outliers, 34
  model A, 34, 42, 167-168, 276
  model B, 34, 42, 167-168, 276, 277
  multivariate model A, 210-218, 219
  multivariate model B, 59, 210, 218-219
Folded median (Bickel-Hodges estimator), 154
  in normal sample, 165-166
Fréchet distribution (see Extreme-value distributions)

Gamma distribution
  Bayesian treatment of outliers, 281, 282
  details of discordancy tests for practical use, 75-88; guide to use of tests, 75-76; contents list, 77-79; worksheets, 79-88; tables, 290-291, 294-295
  distribution of X(n), 211
  extreme/location statistic, 40
  in relation to test for slippage of normal variance, 201
  outlier proneness, 37
  shifted origin, 77-78
  slippage tests, 197, 201-202
Gamma-type probability plots, 226, 230
Glaisher's accommodation procedure, 20
Goodwin's rejection test, 21
Gross-error sensitivity, 140
Gumbel distribution (see Extreme-value distributions)

Hinge, 153
Historical background, 18-22, 47
Hodges-Lehmann estimator, 49, 154
  in normal data, 164, 169
Huber's proposal 2, 151-152, 158
  Bickel one-step modification, 158
  in normal data, 163-164, 169

Inclusive and exclusive measures of discordancy, 61-64, 72
Individual outliers in slipped samples, 183
Influence curve, 136-141
  for correlation coefficient, 228-229
  for trimmed mean, 145-147
  for Winsorized mean, 146-147
  mean-squared value as asymptotic variance, 139
Influence function (see Influence curve)
Inherent model for outliers, 9-10, 23, 31

Jackknifing, 48, 161

Kurtosis, 39, 42, 102, 109, 266
L-estimators (see Linear order statistics estimators)
Labelled slippage model (see Slippage model for outliers)
Linear model (outlier in), 253
  accommodation, 260
  Bayesian methods, 277
  equivalence of use of residuals and of residual sums of squares, 262-265
  graphical methods, 261
  multiple decision approach, 240
  multiple outliers, 249-251, 261, 265
  non-residual-based methods, 264-265
  residual-based methods, 257-264
  slippage model, 262-264
  table, 335-336
  tests of discordancy, 257-265
  two-stage maximum likelihood ratio procedure, 262-264
  (see also Designed experiments, Regression)
Linear order statistics estimators (L-estimators), 48, 152-153
  asymptotic normality, 162
  Gastwirth's estimator, 153, 164
  in exponential samples, 171-172
  modified trimmed mean, 152
  modified Winsorized mean, 152
  Trimean, 153
Local-shift sensitivity, 140
Log-normal distribution
  discordancy tests for practical use, 115-116, 118
  outlier proneness, 37

M-estimators (see Maximum likelihood type estimators)
Masking, 38, 40, 43, 71, 74, 251
  generalized, 287 (see also 'Cloaking')
Maximin robust estimator, 135, 156, 157
Maximum likelihood ratio principle
  in linear model, 262-264
  in tests for multivariate outliers, 210-211, 214-215, 219
  in tests of discordancy, 41, 56-58
  in time-series, 267-268
  two-stage, 59, 210-211, 214-215, 262-264
Maximum likelihood type estimators (M-estimators), 48-49, 149-152
  asymptotic normality, 162
  for dispersion in normal data, 169-170
  for location in normal data, 163-166
  Huber's proposal 2, 151-152
  one-step Huber estimators, 151
  three-part descending M-estimators, 151
  trimmed mean, 150
  Winsorized mean, 150
Mean deviation, 159
Measurement error, 27-28, 37, 46
Median, 46
  influence of contaminants on median and mean, 136-137, 138-139, 140
Median deviation, 150, 158
Mendeleev's accommodation procedure, 21
Mid-mean, 145
Minimax robust estimator, 134, 156, 157, 169
Mixture model for outliers, 23, 31-33
  as model for contamination: accommodation aspect, 127-130
  normal, 33
'Modified' residuals, 249
Multinomial distribution (slippage test), 203
Multiple decision procedure
  for outlier in general linear model, 240
  in Bayesian treatment, 279
  in multivariate normal tests of discordancy, 218-219
  in slippage tests, 192, 193-194, 196-197, 200
  in univariate tests of discordancy, 43, 44, 58-59
Multiple outliers, 44, 68-74
  block tests of discordancy, 40, 69-73
  consecutive tests of discordancy, 40-41, 69-71, 73-74
  in Bayesian context, 272-275
  in designed experiments, 244, 247, 248, 249-251
  in exponential samples, 36-37
  in linear models, 249-251, 265
  in multivariate normal samples, 215-218
  masking effect (see Masking)
  number of outliers, 68-69
  relative performance of block and consecutive tests, 70-71
  'sequential' tests of discordancy, 40, 41, 69, 73
  swamping effect (see Swamping)
Multivariate normal distribution
  tables for tests of discordancy, 329-333
  tests of discordancy for outliers, 209-219, 230-231
Multivariate outliers, 14, 59, 208-233
  accommodation, 231-233
  Andrews' Fourier-type plot, 217, 218
  Bayesian method, 277
  detection, 208-209
  detection by two-stage maximum likelihood procedure, 60-61, 210-211, 214-215
  distance measures, 208, 209, 224-227
  estimation of correlation coefficient, 233
  estimation of mean, 231-232
  estimation of variance-covariance matrix, 232-233
  Ferguson models A and B, 210-219
  gamma-type plots, 226, 230
  'gap test', 229-231
  generalized distances, 224-227
  graphical methods, 221-223, 224-225, 226, 230
  influence function for correlation coefficient, 228-229
  informal methods, 219-233
  internal scatter, 214, 215
  linear constraints, 221
  marginal outliers, 220
  maximum likelihood ratio procedure, 210-211, 214-215, 219
  minimum spanning tree, 229-230
  multiple decision approach, 218-219
  multiple outliers, 215-218
  normal samples, 209-219
  outlier scatter ratio, 215, 216
  plots of distance measures, 212-213
  premium-protection procedures (normal samples), 231-232
  tables for tests of discordancy, 329-333
  use of correlation coefficients, 227-229
  use of principal components, 223-226, 227

Negative binomial distribution (slippage test), 204
Newcomb's accommodation procedure, 21
Non-parametric slippage tests of location, 176-187
  block procedure, 184
  consecutive procedures, 184-185
  equal sized samples, 176-177, 178-183
  models, 182-183
  multiple sample slippage, 183-186
  performance (power), 179
  rank methods, 180-183
  single sample slippage, 176-183
  tables, 324-326
  unequal sized samples, 177-178, 179
Non-parametric treatment of outliers, 15, 44, 282-285
  accommodation, 285
  in designed experiments, 251-252
  non-parametric slippage tests, 176-187 (see Slippage tests)
  philosophical considerations, 282-283, 285
  tests of discordancy, 283-285
Normal distribution
  accommodation of outliers in multivariate samples, 209, 231-233
  accommodation of outliers in univariate samples, 163-171
  Bayesian accommodation of outliers, 33, 277-282
  Bayesian 'detection of spuriosity', 275-277
  details of univariate discordancy tests for practical use, 75-76, 89-115; guide to use of tests, 75-76; content list, 91-93; worksheets, 93-115; tables, 298-315
  effect of contaminant on estimation of mean and variance, 128-129, 132-133
  Ferguson models A and B (see Ferguson models)
  maximum likelihood ratio test, 59-60
  multiple decision procedure, 59-60
  multivariate discordancy tests (see Multivariate normal distribution)
  multivariate slippage model, 59
  outlier proneness, 37
  slippage models, 33, 34-35, 42-44, 45-46, 59-60
  slippage tests (mean and variance), 188-197; tables, 290-291, 302, 303, 327
  testing of outliers in multivariate samples, 209-219
  undetectability of contamination, 129-130

Performance criteria for discordancy location estimates (normal samples),


Old French custom, 26 'surprise', 1, 10, 14, 32, 37, 234, 270, tests, 30, 41, 43-44, 45, 64-68 167-168
One-step Huber estimators, 151 273, 282, 285, 286-288 Single outliers: premium, 50, 131-132
Ordering in multivariate data, 208, 209 Outlier-generating models, 23, 28-37 David's measures P1 (power), P2, P3, premium-protection rules, 50, 131,
distance measures, 208, 209 contamination models, 47 P4, P5 231-232
marginal ordering, 209 deterministic, 6-7, 14, 18, 23, 27-28, for slippage model, 65-67 protection, 50, 131-132
reduced ordering, 209 30-31 measures for inherent model, 68 scale estimates (exponential samples),
sub-ordering, 208-209 exchangeable, 35-37, 50, 272 measures for mixture model, 67-68 173
(see also Multivariate outliers) in time-series, 266-267 Multiple outliers: Principal component analysis (in study of
Outlier inherent, 9-10, 23, 31 comparison of consecutive and block multivariate outliers), 223-226, 227
accommodation, 4, 20, 24-26, 236, intangibility in non-parametric con- tests, 70-71 Protection (see Premium and protection)
271 (see also Accommodation of text, 285 measures for block tests, 73
outliers) irrelevant in Bayesian approach?, 15 measures for consecutive tests, 73-7 4 R-estimators (see Rank test estimators)
as distinct from discordant observa- mixture, 23, 31-33, 278 Performance criteria in outlier-robust in- Rank test estimators (R-estimators), 49,
tion, 23 non-specific with respect to the out- ference 153-154
as distinct from extreme observation, lier, 33, 45 asymptotic measures, 134-136, 137 asymptotic normality, 162
23 slippage, 34-35, 42, 43 beakdown point, 140-141 folded-median estimator (Bickel-
basic model (working hypothesis) (see also under specific headings) errar frequency, 142 Hodges estimator), 154
against which outliers are ex- Outlier-robust methods (see Accommo- errar probability, 142 Hodges-Lehmann estimator, 154
amined, 26, 29, 285 dation of outliers) exponential samples, 171-173 Recursive algorithm for null distribution
Bayesian methods, 200-201, 269-282 Outlier proneness, 37 finite-sample measures, 131-133, 135, of discordancy test statistic, 62-64,
(see also Bayesian treatment of outlier-prone, 38 137 72, 75
outliers) outlier-prone completely, 38 for confidence intervals, 141-142 Regression (outliers in), 11, 252-257
causes, 26-28 outlier-resistant, 38 for location estimators, 155-158 accommodation, 256
classification of outlier problems, 5, 16 Outlying sub-sample, 236, 282 (see Slip- for significance tests, 142-144 detection by two-stage maximum
definition, 4, 22-23, 286-287 page tests) gross-error sensitivity, 140 likelihood procedure, 60-61
detection (non-subjective), 52 (see influence curve, 136-141 example, 11-12
Detection of outliers) Parametric slippage tests, 187-205 local-shift sensitivity, 140 linear regression, 252-255
different aims in handling outliers Bayes solution, 192, 193 rejection point, 140 multiple regression, 256-257
(schematic diagram), 28 binomia! samples, 203-204 univariate normal samples, 163-171 tests of discordancy, 252-257
identification, 8, 24-26, 236, 271 gamma samples, 197, 201-202, 206 univariate samples (generai), 130-144 use of residuals, 11, 60-61, 252-257
in designed experiment (see Designed group tests (multiple slipped samples), Poisson distribution Rejection point, 140
experiments) 204-205 details of discordancy tests for practi- Residuals, 11, 60, 236-237, 238-247,
in regression (see Regression) multinomial samples, 203 cal use, 75-76, 115-116, 120- 257-264
in time-series ( see Time-series) multiple decision approach, 192, 193- 122; guide to use of tests, 75-76; adjusted residuals, 168, 232, 260
masking effect (see Masking) 194, 196-197, 200 worksheets, 121-122; tables, 'ballooning', 253
models for explaining outliers (see negative binomia! samples, 204 316-319, 328 estimated residuals, 236, 239, 250,
Outlier-generating models) normal samples, 188-197 modified, 8 253
multiple, 40, 44, 68-74 (see also Mul- normal samples, mean, 188-190, 192- slippage tests for Poisson samples, graphical display, 247-249
tiple outliers) 196 202-203; table, 328 in designed experiments, 236-237,
multivartiate (see Multivariate out- normal samples, variance, 190-191, Power of discordancy test, 65-66, 70 238-247
liers) 196-197 (see also Performance criteria for in linear model, 253, 257, 264
non-parametric methods, 176-187, optimality of Paulson procedure, 192 discordancy tests) largest absolute residua!, 240, 242,
282-285 (see also Non- Poisson samples, 202-203 Premium and protection 246
parametric treatment of outliers) relationship with single outlier tests, for multivariate normal samples, 231 ..... maximum absolute studentized re-
outlying sub-sample (see Slippage 188-189 232 sidua!, 240
tests) slippage in unspecified direction, 194 in designed experiments, 246-247 maximum normal residua!, 242-243
rejection, l, 18-19, 24-26, 236, 271 tables, 290-291, 302, 303, 327, 328 in generai linear models, 246-247, 'modified' residuals, 249
relative nature (to model), 5, 7-10, 16 unequal sample sizes, 194-195, 197 260 normalized residuals, 260
significance level (subjective aspect), use of ranges, 197 in relation to hypothesis testing, 51, studentized residuals, 240, 258-264
64 Pareto distribution, discordancy tests for 144, 172 (see also Designed experiments,
subjective nature, 4, 15, 22-23, 45, practical use, 115-116 in time-series, 268 Linear model, Regression)
64, 286-287 Peirce's rejection test, 19
Robust inference procedures (robust against outliers) (see Accommodation of outliers)
Robustness of efficiency, 142
Robustness of performance, 126, 142, 161
Robustness of validity, 126, 142, 161
'Rogue' observation, 1 (but see Outlier)

'Scatter' in multivariate data
  internal scatter, 214, 215
  outlier scatter-ratios, 215, 216
Self-camouflaging effect of outliers (see 'Cloaking')
Semi-interquartile range, 158
Sensitivity contours, 249
Sensitivity curve, 139
Sequential methods for slippage, 206-207
'Sequential' tests of discordancy (see Multiple outliers)
'Shorth', 153
Significance tests robust against outliers, 142-144, 160-162
  performance criteria, 142-144
  'premium' and 'protection', 144
Skewness, 39, 42, 100-101, 266
Skipping, 153
  iterative, 153
  multiple, 153
Slippage model for outliers, 34-35
  gamma, 44
  in linear models, 262-264
  in time-series, 266-267
  labelled and unlabelled slippage models, 57-58, 59-60, 72-73, 259
  normal, 33, 34-35
  slippage in dispersion, 34-35, 43
  slippage in location, 34-35, 43
  (see also Ferguson's slippage models A and B)
Slippage tests, 42-43, 174-207
  Bayesian approach, 192, 193, 200-201, 206
  general method for constructing slippage tests, 197-200
  individual outliers in slipped samples, 183
  masking, 186
  non-parametric tests of location, 176-187 (see Non-parametric slippage tests of location)
  parametric tests, 187-205 (see Parametric slippage tests)
  sequential approach, 206-207
  slippage of mean, 174
  slippage of variance, 174
  the slippage model, 186-187
'Spurious' observation, 1, 8, 27, 45-46, 275-277 (but see Outlier)
Stability aspect of robustness, 141
Stone's rejection test, 19-20
'Suspect' observation, 23 (but see Outlier)
'Suspicious' observation, 4, 10, 15, 29, 220, 222 (but see Outlier)
Swamping, 71, 251

Tests of discordancy for outliers, 24, 29, 47, 52-56
  block tests (see Multiple outliers)
  consecutive tests (see Multiple outliers)
  details of univariate tests for practical use, 75-125; tables, 290-322, 328
  Dixon statistics (see Dixon statistics)
  in context of Bayesian approach, 269-277
  in designed experiments, 238-246
  in linear models, 257-265
  in regression, 252-257
  in relation to slippage test, 201
  in time-series, 267-268
  intuitively constructed tests, 52-56, 72
  invariance considerations, 52-56
  maximum likelihood ratio procedure, 41
  measures of performance (e.g. power), 30, 41, 43-44, 45, 64-68 (see also Performance criteria for discordancy tests)
  multiple decision procedure, 43, 44
  multivariate normal samples, 209, 219, 230-231
  non-parametric, 283-285
  one-sided versus two-sided tests, 44-45
  optimality criteria, 41-42
  recursive algorithm for null distribution of test statistic, 62-64
  'sequential' (see Multiple outliers)
  statistical principles, 41-46, 56-61
  types of test statistic, 38-41; deviation/spread, 39-61; excess/spread, 38-39, 61; extreme/location, 40-41; high-order, 39-40; inclusive and exclusive measures, 61-64; omnibus, 40; range/spread, 39; sums of squares, 39
  univariate samples, 52-125
Time-series (outliers in), 11, 266-268
  accommodation, 268
  'bumps and quakes', 266-267
  detection, 266-267
  maximum likelihood ratio tests, 267-268
  tests of discordancy, 267-268
  types of outlier, 266-267
Transformation of variables for discordancy testing, 77, 88, 89-90, 115, 116, 117, 118, 120-121, 123
Trimean, 153
Trimmed mean (see Trimming)
Trimming, 21, 26, 144
  as L-estimator, 48
  α-trimmed mean, 26, 48
  in exponential samples, 173
  in normal samples, 163-166, 167-169, 170-171, 232, 233
  influence curve for trimmed mean, 145-147
  mid-mean, 145
  modified trimming, 147-148, 166-168
  r-fold symmetrically trimmed mean, 145
  (r, s)-fold trimmed mean, 144
  scale estimators, 162, 170-171
  trimmed mean, 26, 51

Uniform distribution, details of discordancy tests for practical use, 115-116, 118-120
'Unreasonable' observation, 23 (but see Outlier)
'Unrepresentative' observation, 1, 10 (but see Outlier)

W-statistics of Shapiro and Wilk
  for exponential sample, 88
  for normal sample, 40, 102
Weibull distribution (see Extreme-value distributions)
Weighting of observations, 20
Winsorization, 26, 50, 144
  and L-estimators, 48
  α-Winsorized mean, 145
  in exponential samples, 171-172
  in normal samples, 163-166, 167-169, 170-171, 232, 233
  influence function for Winsorized mean, 146-147
  modified Winsorization, 50, 147-148, 166-168, 170-171
  r-fold symmetrically Winsorized mean, 145
  (r, s)-fold Winsorized mean, 145
  scale estimators, 159
  semi-Winsorization, 148, 166-169, 170-171, 232
  Winsorized mean, 48, 51
Winsorized mean (see Winsorization)
Working hypothesis (see Basic model)
Wright's rejection test, 2, 21