
Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, 2nd Edition
Steven Brown (Editor)
COMPREHENSIVE
CHEMOMETRICS: CHEMICAL
AND BIOCHEMICAL
DATA ANALYSIS

SECOND EDITION
EDITORS IN CHIEF
Steven Brown
Department of Chemistry and Biochemistry
University of Delaware
USA
Romà Tauler
Department of Environmental Chemistry
Institute of Environmental Assessment and Water Research (IDÆA)
Spanish Council of Scientific Research (CSIC)
Spain
Beata Walczak
Department of Analytical Chemistry
Institute of Chemistry
Silesian University
Poland

VOLUME 1
Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge MA 02139, United States

Copyright © 2020 ELSEVIER B.V. All rights reserved

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on
how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the
Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted
herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in
research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods,
compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the
safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or
damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods,
products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

ISBN 978-0-444-64165-6

For information on all publications visit our website at http://store.elsevier.com

Publisher: Oliver Walter


Acquisition Editors: Sean Simms & Rachel Conway
Content Project Manager: Paula Davies
Associate Content Project Manager: Brinda Subramanian
Designer: Mark Rogers
EDITORS IN CHIEF

Steven Brown obtained his PhD in analytical chemistry from the University of Washington in 1978,
where he worked with Bruce Kowalski. That year he was appointed Assistant Professor at the University
of California, Berkeley, with a joint appointment at Lawrence Berkeley Laboratory. In 1981, he moved
to Washington State University, and in 1986 to the Department of Chemistry and Biochemistry at the
University of Delaware, where he is presently Willis F. Harrington Professor.
He served as a Section President of the American Chemical Society from 1982 to 1984, as Chair
of the Department of Chemistry and Biochemistry at the University of Delaware from 1997 to 2002, and
as President of the North American Chapter of the International Chemometrics Society from 1986 to
1988. He has served on the Editorial Staff of the Journal of Chemometrics since its inception, first as
a founding Editor and then as Editor in Chief, a role from which he stepped down in 2007. He
now serves on the journal’s editorial board.
His research interests concern a wide range of problems in chemometrics and machine learning,
with over 300 publications in the scientific literature. He has edited several books, including the first
edition of the four-volume set Comprehensive Chemometrics, published by Elsevier in 2009. A focus of
his research has been the development of new instrumental methods through the use of multivariate
mathematical methods for multicomponent analysis, including calibration transfer, and the novel
use of data fusion methods. He won the first EAS Award in Chemometrics in 1996 and the Kowalski Award in 2015. His work
has had applications in biomedical analysis, food science, plant science, forensic science, pharmaceutical characterization, and process
chemistry.

Romà Tauler (Barcelona, Spain, 1955) is a research professor at IDAEA-CSIC. He obtained his doctorate
in Analytical Chemistry at the University of Barcelona in 1977. He was Assistant and Associate Professor
at the University of Barcelona (1978–2002) and has been CSIC Research Full Professor since 2003. He
is Chief Editor of Journal of Chemometrics and Intelligent Laboratory Systems and of the Comprehensive
Chemometrics major reference work. He received the Award for Achievements in Chemometrics from the
Eastern Analytical Symposium and, in 2009, the Kowalski Prize from J. of Chemometrics (Wiley).
He served as President of the Catalan Chemistry Society from 2008 to 2013 and was the recipient of
the Eu-ERC Advanced Grant award Nr. 320337 (2013–2018) for the CHEMAGEB project. Romà Tauler
has published more than 430 research publications and 60 book chapters, with more than 16,000
citations and an h-index of 63. His main research field is Chemometrics and its applications to
Environmental Chemistry, Omic Sciences, and Bioanalytical Chemistry.


Prof. Beata Walczak graduated in chemistry from the Faculty of Mathematics, Physics and Chemistry,
Silesian University, Katowice, Poland, in 1979. Since then she has worked in the Institute of
Chemistry, Silesian University, where she is now Head of the Department of Analytical Chemistry.
She was a postdoc at the University of Orleans (France) and at the Graz University of
Technology (Austria), and has held visiting professorships at Vrije Universiteit Brussel
(Belgium), Rome University “La Sapienza” (Italy), AgroParisTech University (France), the
University of Modena and Reggio Emilia (Italy), and Radboud University (The Netherlands).
Since the early 1990s she has been involved in chemometrics, and her main scientific interests are
in all aspects of data exploration and modeling (dealing with missing and censored data, dealing with
outliers, data representativity, enhancement of instrumental signals, signal warping, data compression,
linear and nonlinear projections, development of modeling approaches, feature selection techniques,
etc.). She has authored or coauthored ca. 170 scientific papers and 400 conference papers, and has delivered
many invited lectures at numerous international chemistry meetings. She was an editor and
coauthor of the book Wavelets in Chemistry, Vol. 22 in the series “Data Handling in Science and
Technology,” Elsevier, Amsterdam, 2000, and a coeditor of the four-volume Comprehensive Chemometrics,
Elsevier, Amsterdam, 2009. Currently she is Editor of the journal Chemometrics and Intelligent
Laboratory Systems and of “Data Handling in Science and Technology” (the Elsevier book series), and a member of the editorial
boards of Talanta, Analytical Letters, J. of Chemometrics, and Acta Chromatographica.
SECTION EDITORS

Richard Brereton completed his BA, MA, PhD and postdoctoral work at the University of Cambridge, after which
he joined the staff of the University of Bristol, where he was successively Lecturer, Reader,
Professor and Emeritus. He is a Fellow of the Royal Society of Chemistry, the Royal Statistical Society
and the Royal Society of Medicine, and a Chartered Chemist. He is Editor-in-Chief of the journal Heritage
Science, a columnist for J Chemometrics, and an editorial board member for several journals. He has given
around 200 invited lectures in over 30 countries, been main or sole supervisor of 33 PhD and 6
research masters students, served on around 50 conference organising committees, and acted as an expert
witness in 13 court cases. He has published over 400 recorded articles, including 8 books, 209 papers
in Web of Knowledge, 20 book chapters, 32 book reviews, and 9 conference papers, almost all as main or
sole author. His papers have been cited over 5000 times according to Web of Knowledge, and his
books over 3000 times according to Google Scholar. He has published the most cited paper in J Chemometrics
over the last decade, and the eighth most cited paper in the Analyst since 2000.

Marina Cocchi currently serves as an associate professor in Analytical Chemistry-Chemometrics at
the Department of Chemical and Geological Sciences of the University of Modena and Reggio Emilia
(Italy), teaching chemometrics at undergraduate and graduate levels. She holds a degree (cum
laude) and a Ph.D. in Chemical Sciences from the University of Modena. As part of her Ph.D.
she worked with Professor S. Wold on the development of chemometric approaches for 3D QSAR.
She has published more than 100 papers in international journals and books covering a range of
topics including Multivariate, Multi-way and Multiset methods; Data Fusion; 2D WT in Multivariate
Image Analysis for fault detection and pattern recognition; algorithms for feature selection in the
Wavelet Domain; MSPC; Food Authenticity; Chemical fingerprinting by spectroscopy (MIR, NIR,
NMR) and chromatography.
She has supervised twelve PhD theses in chemometrics. She was on the board of the Italian
Chemometrics Group from 2001 to 2015, serving as President from 2007 to 2011. Since 2010 she has been
a member of the editorial board of Chemometrics and Intelligent Laboratory Systems. She is
Editor of the book Data Fusion: Methods and Applications, Data Handling in Science and
Technology series, vol. 31, Elsevier 2019.

Anna de Juan has been an associate professor at the Department of Chemical Engineering and Analytical
Chemistry at the University of Barcelona since 2003, teaching chemometrics at undergraduate and
graduate levels. She holds a degree and PhD in Chemistry from the University of Barcelona (UB),
and her expertise is in Multivariate Curve Resolution (MCR) methods: theoretical development
and application to bioanalytical and analytical problems. She has been a member of the Editorial
Advisory Board of Chemometrics and Intelligent Laboratory Systems since 2002 and of Analytica
Chimica Acta since 2006. Recently, she acted as section editor for the reference work Comprehensive
Chemometrics, Elsevier (2009). She has had research stays, in the framework of collaborations, at Vrije
Universiteit Brussel, Brussels (1995), Virginia Commonwealth University, Richmond, US (1998), The
University of Newcastle, Australia (2002), Université des Sciences et Technologies de Lille, France
(2004) and the University of Dalhousie, Canada (2016). In 2004 she received the 4th Chemometrics
Elsevier Award together with Karl Booksh. She has published more than 130 papers in international
journals and books and has given more than 180 presentations at international conferences,
50 of them plenary or keynote lectures, mainly on the design of chemometric tools, multivariate
curve resolution developments and related methods, and applications to process analysis,
hyperspectral image analysis and general analytical applications.


Rafael Cela Torrijos is a professor of analytical chemistry at the University of Santiago de Compostela,
Spain, and the Head of the Laboratory for Analytical Chemistry in the Research Institute of Chemical
and Biological Analyses (IAQBUS) at the same university. Previously, he was at the Universities
of Madrid (Complutense) and Cádiz, belonging to the group of analytical chemists that started the
development of chemometrics in Spain in the 1980s. His research has focused on the analytical
applications of separation science, particularly the development and optimization of sample
preparation techniques in chromatographic analysis, including experimental designs and the development
of computer-assisted chromatographic methods; he is the author of the Mchrom Scout software,
distributed by Mestrelab Research S.L. He is the author or co-author of more than
300 scientific papers and several textbooks.

Riccardo Leardi was born in Novi Ligure (Italy) on October 17, 1959.
In 1983 he graduated cum laude in Pharmaceutical Chemistry and Technology at the Faculty of
Pharmacy of the University of Genoa.
His current position is Associate Professor at the Department of Pharmacy of the School of
Medical and Pharmaceutical Sciences of the University of Genoa. In 2013 he obtained the qualification
for full professor in Analytical Chemistry.
Since 1985 he has been working in the section of Analytical Chemistry of the Department of
Pharmacy of the University of Genoa, and his research field is Chemometrics.
His interests are mainly devoted to problems of classification and regression (applied especially
to food, environmental and clinical data), experimental design, process optimization, multivariate
process monitoring and multivariate quality control.
His original research focused mainly on genetic algorithms, especially in their application to the
problem of variable selection, and three-way methods.
He developed the chemometric software packages CAT (Chemometric Agile Tool) and BasiCAT, both
freely downloadable from the site http://gruppochemiometria.it/index.php/software.
He is the author of almost 150 papers and more than 130 communications at national and
international meetings, several of them as invited speaker; he has been invited to give talks and courses in several
industries and research centers.
He organizes two schools of Chemometrics (Multivariate Analysis; Experimental Design), each
held twice a year at the University of Genoa.
In November 2002 he began working as a chemometrics consultant.

Prof. Federico Marini was born in Rome, Italy, in 1977 and received his MSc (2000) and PhD
(2004) from Sapienza University of Rome. He is currently associate professor of Chemometrics at
Sapienza University of Rome. In 2006, he was awarded the Young Researcher Prize from the Italian
Chemical Society and in 2012 he won the Chemometrics and Intelligent Laboratory Systems Award
“for his achievements in chemometrics”. He has been a visiting researcher at various universities
(Copenhagen, Stellenbosch, Silesia, Lille). His research activity is focused on all aspects of
chemometrics, ranging from the application of existing methods to real-world problems in different fields
to the design and development of novel algorithms. He is the author of more than 150 papers in
international journals, and recently he edited and coauthored the book Chemometrics in Food Chemistry
(Elsevier). He is a member of the editorial boards of Chemolab, Analytica Chimica Acta, J. of
Chemometrics, J. of NIR Spectroscopy, and J. of Spectral Imaging, and he serves as Associate Editor for
Chemometrics in Wiley’s Encyclopedia of Analytical Chemistry. He is the past coordinator of the
Chemometric group of the Italian Chemical Society and the coordinator of the Chemometric study
group of EUCheMS.

Alejandro C. Olivieri was born in Rosario, Argentina (07/28/1958). He obtained his B.Sc. from the
Catholic University (1982), and his Ph.D. from the National University of Rosario (1986). He did
Postdoctoral Research at the University of Illinois, Urbana-Champaign, USA. In 1990, he returned
to the University of Rosario, and joined the National Research Council (CONICET). He founded
a research group in chemometrics in analytical chemistry, publishing about 250 papers in
international journals, books and book chapters, and supervised nine Ph.D. theses. He received several
national and international awards, including the John Simon Guggenheim Memorial Foundation
fellowship (2001–2002). In 2018 he published “Introduction to multivariate calibration. A practical
approach”, a book which was selected by Choice, from the Association of College and Research
Libraries (ACRL) as one of the Outstanding Academic Titles for 2019.

Dr. William Rayens is Professor and the Dr. Bing Zhang Endowed Department Chair in the
Department of Statistics at the University of Kentucky. Rayens has an extensive research record focused
primarily on the development of multivariate and multi-way statistical methodologies, mostly
related to problems in chemistry and the neurosciences. He has mentored several Ph.D. students
and has been honored at both the College and the University level as an outstanding teacher. Rayens
also served as Assistant Provost for General Education during which time he was tasked with
implementing new general education reforms at the University of Kentucky, the first changes to that
program in almost 30 years. He designed the one-of-a-kind Technologically Enhanced Active
Learning rooms in the University’s multi-million dollar Jacobs Science Building. Dr. Rayens received
his Ph.D. in mathematics from Duke University in 1986.

Luis A. Sarabia received his Ph.D. in Statistics from the University of Valladolid (Spain) in 1979.
Since 1974, he has been teaching Statistics, Mathematics and Design of Experiments, mostly to
graduate and postgraduate students of Chemistry. In 2000 he reached the position of professor at the
University of Burgos. He is currently director of the Department of Mathematics and Computation
and a member of the research group Chemometrics and Qualimetrics (officially recognized as a
consolidated research group, UIC-237). He is the author or co-author of around 140 papers, 7 book
chapters and approximately 160 communications at international meetings. He was a co-founder of
Colloquium Chemiometricum Mediterraneum. His research interests are in multivariate statistics,
n-way procedures, evolutionary algorithms for multiresponse optimization, design of experiments,
QbD and PAT with application to regulated chemical analysis, food safety, food characterization and
fraud detection.
IN MEMORY OF ROGER PHAN TAN LUU

“Science evolves by means of research. Scientific research is based on experiments (Galileo Galilei, Two new
sciences, 1638). With time experiments have become more and more complex, so scientists have used
‘experimental design (DOE, design of experiments)’ as a useful tool. Only 40 years ago Roger Phan Tan Luu suggested
that DOE is the core of the Methodology of experimental research: much more than a tool, a philosophy.
The research work of Roger Phan Tan Luu includes new tools, the organization of designs for complex
experiments, and a great many applications. To perform research work on a real problem it is necessary to know
the problem very well. Obvious. When the expert on the problem asks for the help of the expert on design, the
problem expert need know only some basic elements of the Methodology. The expert on Methodology instead
must study to gain a deep knowledge of the problem, in much detail. In this way Roger accumulated
knowledge, i.e., experience, in many fields of research. The result of nearly 50 years of activity was a complete scientist,
from theory to new tools, to applications.
A complete scientist is always a teacher. Roger was a great teacher, not only at his Aix-Marseille University, but
throughout the world. He had the rare ability to capture the attention of students, transforming a boring, heavy
sequence of theorems into a compelling, instructive show. For the students a pleasure, not a pain.
This description is that of the public man.
The private man was (outside the family) a loner. Only in the last 20 years of the past century did he make
contact, through the organization of schools (Eurochemometrics, Erasmus stages) of chemometrics and experimental
design, with a few people of great affinity. With these people, he opened himself to friendship. As one of his
friends, I received very much from him. I was able to appreciate his generosity, his humor, and his readiness
to answer an invitation (and of course to work together).
The details of a friendship are very intimate, not ready to be described. For friends the final separation is a very
hard, continuous sentiment. Memories help. As a friend, as a common scientist, I will always remember.”
Michele Forina

CONTRIBUTORS TO VOLUME 1

BW Bader
Sandia National Laboratories, Albuquerque, NM, USA

A Beal
NemrodW SAS, Marseille, France

JM Bernardo
Universitat de València, Valencia, Spain

Andrey Bogomolov
Endress+Hauser Liquid Analysis GmbH+Co. KG, Gerlingen, Germany; and Samara State Technical University, Samara, Russia

Richard G Brereton
School of Chemistry, University of Bristol, Bristol, United Kingdom

B Campisi
Department of Economics, Business, Mathematics and Statistics, University of Trieste, Trieste, Italy

Johan E Carlson
Department of Computer Science and Electrical Engineering, Luleå University of Technology, Luleå, Sweden

Rolf Carlson
Department of Chemistry, Faculty of Science, University of Tromsoe, Tromsoe, Norway

Georgia Charkoftaki
Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven, CT, United States

Bieke Dejaegher
Analytical Chemistry, Applied Chemometrics and Molecular Modelling, Vrije Universiteit Brussel – VUB, Brussels, Belgium

Paul HC Eilers
Department of Biostatistics, Erasmus University Medical Center Rotterdam, The Netherlands

SLR Ellison
LGC Limited, Teddington, Middlesex, United Kingdom

Jasper Engel
Biometris, Wageningen University & Research, Wageningen, The Netherlands

KH Esbensen
KHE Consulting, Copenhagen, Denmark

Bernard Francq
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Louvain Institute for Data Analysis and Modeling (LIDAM), Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium; and CMC Statistical Sciences, GSK Vaccines, Rixensart, Belgium

Bernadette Govaerts
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Louvain Institute for Data Analysis and Modeling (LIDAM), Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium

Ana Herrero
Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain

HCJ Hoefsloot
University of Amsterdam, Amsterdam, Netherlands

Kas J Houthuijs
Analytical Chemistry, Institute for Molecules and Materials, Radboud University Nijmegen, Nijmegen, The Netherlands

Mia Hubert
Department of Mathematics, KU Leuven, Leuven, Belgium

JJ Jansen
Netherlands Institute for Ecology, Heteren, Netherlands

LP Julius
Glycom A/S, Esbjerg, Denmark

Riccardo Leardi
Department of Pharmacy, University of Genoa, Genoa, Italy

Federico Marini
Department of Chemistry, University of Rome La Sapienza, Rome, Italy

Rebecca Marion
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Louvain Institute for Data Analysis and Modeling (LIDAM), Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium

Manon Martin
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Louvain Institute for Data Analysis and Modeling (LIDAM), Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium; and Fonds National de la Recherche Scientifique, Brussels, Belgium

M Cruz Ortiz
Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain

M Pavan
Joint Research Centre, European Commission, Ispra, Italy

R Phan-Tan-Luu
NemrodW SAS, Marseille, France; and University Paul Cezanne, Marseille Cedex, France

FF Pitard
Francis Pitard Sampling Consultants, Broomfield, CO, USA

M Sagrario Sánchez
Departamento de Matemáticas y Computación, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain

Luis A Sarabia
Departamento de Matemáticas y Computación, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain

AK Smilde
University of Amsterdam, Amsterdam, Netherlands

Michel Thiel
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Louvain Institute for Data Analysis and Modeling (LIDAM), Université catholique de Louvain (UCLouvain), Louvain-la-Neuve, Belgium; and Statistics and Decision Sciences, Janssen Pharmaceutical, Beerse, Belgium

M Thompson
Birkbeck College, University of London, London, United Kingdom

R Todeschini
University of Milano-Bicocca, Milan, Italy

Rafael Cela Torrijos
CRLF University of Santiago de Compostela, Santiago, Spain

Yvan Vander Heyden
Analytical Chemistry, Applied Chemometrics and Molecular Modelling, Vrije Universiteit Brussel – VUB, Brussels, Belgium

Vasilis Vasiliou
Department of Environmental Health Sciences, Yale School of Public Health, Yale University, New Haven, CT, United States

DJ Vis
University of Amsterdam, Amsterdam, Netherlands

D Voinovich
Department of Chemical and Pharmaceutical Sciences, University of Trieste, Trieste, Italy

Beata Walczak
Institute of Chemistry, University of Silesia, Katowice, Poland

JA Westerhuis
University of Amsterdam, Amsterdam, Netherlands
SUBJECT CLASSIFICATION

Statistics
Quality of Analytical Measurements: Statistical Methods for Internal Validation
Proficiency Testing in Analytical Chemistry
Quality of Analytical Measurements: Univariate Regression
Robust and Nonparametric Statistical Methods
Bayesian Methodology in Statistics
Robust multivariate statistical methods
An Introduction to the Theory of Sampling: An Essential Part of Total Quality Management
Representative Sampling, Data Quality, Validation: A Necessary Trinity in Chemometrics

Experimental Design
Introduction: Experimental Designs
Screening Strategies
The Study of Experimental Factors
Response Surface Methodology
Experimental Design for Mixture Studies
Nonclassical Experimental Designs
Designing a multicomponent calibration experiment: basic principles and diagonal approach

Analysis of Variance
General Linear Models
Multiset Data Analysis: ANOVA Simultaneous Component Analysis and Related Methods
Regularized Manova
ANOVA-TP

Optimization
Constrained and Unconstrained Optimization
Sequential Optimization Methods
Optimization: Steepest Ascent, Steepest Descent, and Gradient Methods
Multicriteria Decision-Making Methods
Genetic Algorithms in Chemistry
A Guided Tour of Penalties
Particle Swarm Optimization

Linear Soft-Modeling
Linear Soft-Modeling: Introduction
Principal Component Analysis: Concept, Geometrical Interpretation, Mathematical Background, Algorithms, History, Practice
Principal Component Analysis
Independent Component Analysis
Independent component analysis in Analytical Chemistry
Introduction to Multivariate Curve Resolution
Two-Way Data Analysis: Evolving Factor Analysis
Two-Way Data Analysis: Detection of Purest Variables
Two-Way Data Analysis: Multivariate Curve Resolution | Noniterative Resolution Methods
Two-Way Data Analysis: Multivariate Curve Resolution | Iterative Resolution Methods
Two-Way Data Analysis: Multivariate Curve Resolution | Error in Curve Resolution
Estimation of feasible bands in Multivariate Curve Resolution
Multiway Data Analysis: Eigenvector-Based Methods
Multilinear Models: Iterative Methods
Multiset Data Analysis: Extended Multivariate Curve Resolution
Tensor Similarity in Chemometrics
Bayesian Methods for Factor Analysis in Chemometrics
Time Series Modeling
Other Topics in Soft-Modeling: Maximum Likelihood-Based Soft-Modeling Methods
Figures of Merit

Unsupervised Learning
Unsupervised Data Mining: Introduction
Data Mapping: Linear Methods versus Nonlinear Techniques
Tree-Based Clustering and Extensions
Model-Based Clustering
Common Clustering Algorithms
Density-Based Clustering Methods

Other Problems With Data Analysis
Feature Selection: Introduction
Feature Selection in the Wavelet Domain: Adaptive Wavelets
Missing Data
Compositional Data Analysis in Chemometrics
Sparse Methods

Data Preprocessing
Preprocessing Methods
Evaluation of Preprocessing Methods
Model-based preprocessing in vibrational spectroscopy
Normalization and Closure
Variable Shift and Alignment
Background Estimation, Denoising, and Preprocessing
Denoising and Signal-to-Noise Ratio Enhancement: Classical Filtering
Denoising and Signal-to-Noise Ratio Enhancement: Derivatives
Denoising and Signal-to-Noise Ratio Enhancement: Splines
Data Quality and Denoising: a Review
Model-Based Preprocessing and Background Elimination: OSC, OPLS, and O2PLS

Regression and Classification
Calibration Methodologies
Variable Selection
Partial Least Squares
Multivariate Approaches: UVE-PLS
Data Fusion
Multiblock and Three-Way Data Analysis
Transfer of Multivariate Calibration Models
Robust Multivariate Methods in Chemometrics/Robust and Sparse Multivariate Methods in Chemometrics
Regression Diagnostics
Model-Based Data Fitting
Linear Approaches for Nonlinear Modeling
Computationally Intensive Nonlinear Regression Methods
Neural Networks
Feedforward Neural Networks
Kernel Methods
Classification: Basic Concepts
Validation of Classifiers
Statistical Discriminant Analysis
Soft Independent Modeling by Class Analogy
Decision Tree Modeling in Classification
Random Forest and Ensemble Methods
Multivariate Approaches to Classification Using Genetic Algorithms
Multiway Classification
Deep Learning Theoretical Chapter for chemometrician

Applications
Chemometrics in Electrochemistry
Chemometrics in the Pharmaceutical Industry
Environmental Chemometrics
Resampling and Testing in Regression Models with Environmetrical Applications
Application of Chemometrics in the Food Sciences
Chemometrics in Forensics
Chemometric Analysis of Sensory Data
Smart Sensors
Statistical Control of Measures and Processes
Best Practice and Performance of Hardware in Process Analytical Technology (PAT): A Prerequisite to Avoid
Pitfalls in Data Analytics
Multivariate Statistical Process Control and Process Control, Using Latent Variables
Batch Process Modeling and MSPC
Chemometrics in Raman spectroscopy
Chemometrics in NIR Hyperspectral Imaging: Theory and Applications in the Agricultural Crops and Products Sector
Mass Spectroscopic Imaging: Chemometric Data Analysis
Fast analysis, Processing, and Modeling of Hyperspectral Videos: Challenges and Possible Solutions
Image Processing
Chemometrics Analysis of Big Data
Systems Biology
Analysis of Metabolomics Data
Data Processing for RNA/DNA Sequencing
Analysis of Megavariate Data in Functional Omics
Spectral Map Analysis of Microarray Data
Chemometrics in Flow Cytometry
Chemometrics for QSAR Modeling
Chemoinformatics
High-Performance GRID Computing in Chemoinformatics
PREFACE

Some 50 years ago, the first publications appeared on the use of computer-aided mathematics to analyze
chemical data. With those publications, the modern field of chemometrics was launched. Both the speed and
power of computers and the sophistication of analytical instrumentation have made great leaps in the intervening
time. The ready availability of chemometric software, coupled with the increasing need for rigorous,
systematic examination of ever-larger and more sophisticated sets of measurements from instrumentation has
generated strong interest in reliable methods for converting the mountains of measurements into more
manageable piles of results, and for converting those results into nuggets of useful information. Interest in
application of chemometrics has spread well beyond chemists with a need to understand and interpret their
measurements; now chemometrics is helping to make important contributions in process engineering, in
systems biology, in environmental science, and other disciplines that rely on chemical instrumentation, to name
only a few areas.
In the 12 years since the first edition of this book appeared, there has been considerable change in the fields
of data science and of chemometrics. As applications of chemometrics continue to grow, so too does the
methodology of chemometrics. After 50 years, chemometrics is a scientific field with mature areas, but it is also
a field where change continues to occur at a rapid pace, driven both by advances in chemical instrumentation
and measurement and by close connection of chemometrics with the data science, machine learning, statistics
and signal processing research communities. The interfacial location of chemometrics, falling between
measurements on the one side and statistical and computational theory and methods on the other, poses
a challenge to the new practitioner: gaining sufficient breadth and depth of understanding in data science and
learning in what ways data science connects with measurement chemistry, in order to use chemometrics
effectively.
The four volumes of Comprehensive Chemometrics, 2nd Ed. are the result of a meeting in Oxford in January,
2017, where the editors planned a revised work that would update most of the work covered in the first edition
and would cover emerging areas of chemometric research, while providing a sampling of current applications.
Our goal was to bring our reference work current with the advances in chemometrics that have occurred since
2006, with a treatment that would serve both the new and the experienced practitioner.
What has resulted from this collaboration is a resource that captures the practice of chemometrics now. The
four volumes in the revised work now include 119 chapters, with 33 new, 35 reprinted and 51 updated chapters,
making this the most wide-reaching and detailed overview of the field of chemometrics ever published.
Comprehensive Chemometrics 2nd Ed. offers depth and rigor to the new practitioner entering the field, and breadth
and varied perspectives on current literature to more experienced practitioners aiming to expand their horizons.
Software and datasets, both of which are especially valuable to those learning the methods, are provided in
some chapters. The coverage is not only comprehensive, it is authoritative as well; authors contributing to
Comprehensive Chemometrics 2nd Ed. are among the most distinguished practitioners of the field.
Comprehensive Chemometrics 2nd Ed. would not have been possible without the considerable help of the
Editorial Board, who assisted in selecting authors and reviewing chapters. For this edition, our Board included
Richard Brereton, Marina Cocchi, Anna De Juan, Riccardo Leardi, Roger Phan Tan Luu, Federico Marini, William
Rayens, Luis Sarabia, Alejandro Olivieri and Rafael Cela.
This new edition would not have been possible without the hard work of the staff at Elsevier. We also owe
thanks to Rachel Conway, Senior Acquisitions Editor at Elsevier, for supporting the project and seeing the
project off, to Sean Simms, who took over the task of Acquisitions Editor in 2019, to Dhivya Karunagaran and
Brinda Subramanian for their help in ensuring that submissions met the requirements for publication, and
especially to Paula Davies, our Content Project Manager, for overseeing the entire project, keeping track of the
due dates and submissions, encouraging authors as needed, and helping us to keep to the production schedule.
Finally, we extend special thanks to all of our authors whose efforts have made the work the valuable reference
that it is.

Steven Brown
Romà Tauler
Beata Walczak
March, 2020
CONTENTS OF ALL VOLUMES

Editors in Chief v
Section Editors vii
In memory of Roger Phan Tan Luu xi
Contributors to Volume 1 xiii
Subject Classification xv
Preface xix

VOLUME 1

1.01 Quality of Analytical Measurements: Statistical Methods for Internal Validation 1


M Cruz Ortiz, Luis A Sarabia, M Sagrario Sánchez, and Ana Herrero
1.02 Proficiency Testing in Analytical Chemistry 53
M Thompson and SLR Ellison
1.03 Quality of Analytical Measurements: Univariate Regression 71
MC Ortiz, MS Sánchez, and LA Sarabia
1.04 Robust Multivariate Statistical Methods 107
Mia Hubert
1.05 Bayesian Methodology in Statistics 123
JM Bernardo
1.06 Robust Methods for High-Dimensional Data 149
Mia Hubert
1.07 An Introduction to the Theory of Sampling: An Essential Part of Total Quality Management 173
FF Pitard
1.08 Representative Sampling, Data Quality, Validation: A Necessary Trinity in Chemometrics 185
KH Esbensen and LP Julius
1.09 Introduction Experimental Designs 205
R Cela and R Phan-Tan-Luu

1.10 Screening Strategies 209


Rafael Cela Torrijos and Roger Phan-Tan-Luu
1.11 The Study of Experimental Factors 251
Rolf Carlson and Johan E Carlson
1.12 Response Surface Methodology 287
Luis A Sarabia, M Cruz Ortiz, and M Sagrario Sánchez
1.13 Experimental Design for Mixture Studies 327
D Voinovich, B Campisi, R Phan-Tan-Luu, and A Beal
1.14 Nonclassical Experimental Designs 385
Aurélie Beal and Roger Phan-Tan-Luu
1.15 Designing a Multi-Component Calibration Experiment: Basic Principles and Diagonal Approach 411
Andrey Bogomolov
1.16 The Essentials on Linear Regression, ANOVA, General Linear and Linear Mixed Models for the
Chemist 431
Bernadette Govaerts, Bernard Francq, Rebecca Marion, Manon Martin, and Michel Thiel
1.17 Multiset Data Analysis: ANOVA Simultaneous Component Analysis and Related Methods 465
HCJ Hoefsloot, DJ Vis, JA Westerhuis, AK Smilde, and JJ Jansen
1.18 Regularized Multivariate Analysis of Variance 479
Jasper Engel, Kas J Houthuijs, Vasilis Vasiliou, and Georgia Charkoftaki
1.19 ANOVA-Target Projection (ANOVA-TP) 495
Federico Marini and Beata Walczak
1.20 Constrained and Unconstrained Optimization 521
BW Bader
1.21 Sequential Optimization Methods 553
Bieke Dejaegher and Yvan Vander Heyden
1.22 Optimisation: Steepest Ascent, Steepest Descent and Gradient Methods 573
Richard G Brereton
1.23 Multicriteria Decision-Making Methods 585
M Pavan and R Todeschini
1.24 Genetic Algorithms in Chemistry 617
Riccardo Leardi
1.25 A Guided Tour of Penalties 635
Paul HC Eilers
1.26 Particle Swarm Optimization 649
Federico Marini and Beata Walczak

VOLUME 2

2.01 Introduction to Linear Soft-Modeling 1


Anna de Juan and Romà Tauler
2.02 Principal Component Analysis: Concept, Geometrical Interpretation, Mathematical Background,
Algorithms, History, Practice 3
KH Esbensen and P Geladi
2.03 Principal Component Analysis 17
Paul Geladi and Johan Linderholm
2.04 Independent Component Analysis 39
F Westad and M Kermit
2.05 Independent Component Analysis in Analytical Chemistry 57
Hadi Parastar
2.06 Introduction to Multivariate Curve Resolution 85
Sarah C Rutan, Anna de Juan, and Romà Tauler
2.07 Two-Way Data Analysis: Evolving Factor Analysis 95
M Maeder and A de Juan
2.08 Two-Way Data Analysis: Detection of Purest Variables 107
Willem Windig, Andrey Bogomolov, and Sergey Kucheryavskiy
2.09 Two-Way Data Analysis: Multivariate Curve Resolution: Noniterative Resolution Methods 137
Zhimin Zhang, Pan Ma, and Hongmei Lu
2.10 Two-Way Data Analysis: Multivariate Curve Resolution, Iterative Methods 153
Anna de Juan, Sarah C Rutan, and Romà Tauler
2.11 Multivariate Curve Resolution: Error in Curve Resolution 173
Romà Tauler and Marcel Maeder
2.12 On the Ambiguity Underlying Multivariate Curve Resolution Methods 199
Mathias Sawall, Henning Schröder, Denise Meinhardt, and Klaus Neymeyr
2.13 Multiway Data Analysis: Eigenvector-Based Methods 233
J Ferré, R Boqué, and NM Faber
2.14 Multilinear Models, Iterative Methods 267
Giorgio Tomasi, Evrim Acar, and Rasmus Bro
2.15 Multiset Data Analysis: Extended Multivariate Curve Resolution 305
Romà Tauler, Marcel Maeder, and Anna de Juan
2.16 Tensor Similarity in Chemometrics 337
Frederik Van Eeghem and Lieven De Lathauwer
2.17 Bayesian Methods for Factor Analysis in Chemometrics 355
Eun Sug Park and Romà Tauler
2.18 Time Series Analysis Methods in Chemometrics 371
Steven D Brown
2.19 Other Topics in Soft-Modeling: Maximum Likelihood-Based Soft-Modeling Methods 399
PD Wentzell
2.20 Figures of Merit 441
Franco Allegrini and Alejandro C Olivieri
2.21 Unsupervised Data Mining: Introduction 465
D Coomans, C Smyth, I Lee, T Hancock, and J Yang
2.22 Data Mapping: Linear Methods versus Nonlinear Techniques 479


R Wehrens
2.23 Tree-Based Clustering and Extensions 491
T Hancock and C Smyth
2.24 Model-Based Clustering 509
GJ McLachlan, SI Rathnayake, and SX Lee
2.25 Common Clustering Algorithms 531
Ickjai Lee and Jianhua Yang
2.26 Density-Based Clustering Methods 565
M Daszykowski and B Walczak
2.27 Feature Selection: Introduction 581
BK Lavine
2.28 Feature Selection in the Wavelet Domain: Adaptive Wavelets 587
DA Donald, YL Everingham, LW McKinna, and D Coomans
2.29 Missing Data 615
F Arteaga, A Folch-Fortuny, and A Ferrer
2.30 Compositional Data Analysis in Chemometrics 641
Peter Filzmoser and Karel Hron
2.31 Sparse Methods 663
Ahmad Mani-Varnosfaderani

VOLUME 3

3.01 Pre-processing Methods 1


Jean-Michel Roger, Jean-Claude Boulet, Magida Zeaiter, and Douglas N Rutledge
3.02 Evaluation of Preprocessing Methods 77
H Jonsson and J Gabrielsson
3.03 Model-Based Pre-Processing in Vibrational Spectroscopy 83
Achim Kohler, Johanne Heitmann Solheim, Valeria Tafintseva, Boris Zimmermann, and Volha Shapaval
3.04 Normalization and Closure 101
M Bylesjö, O Cloarec, and M Rantalainen
3.05 Variable Shift and Alignment 115
Renger H Jellema, Abel Folch-Fortuny, and Margriet MWB Hendriks
3.06 Background Estimation, Denoising, and Preprocessing 137
J Trygg, J Gabrielsson, and T Lundstedt
3.07 Denoising and Signal-to-Noise Ratio Enhancement: Classical Filtering 143
DF Thekkudan and SC Rutan
3.08 Denoising and Signal-to-Noise Ratio Enhancement: Derivatives 157
V-M Taavitsainen
3.09 Denoising and Signal-to-Noise Ratio Enhancement: Splines 165


V-M Taavitsainen
3.10 Data Quality and Denoising: A Review 179
MS Reis, PM Saraiva, and BR Bakshi
3.11 Model Based Preprocessing and Background Elimination: OSC, OPLS, and O2PLS 205
M Bylesjö and M Rantalainen
3.12 Calibration Methodologies 213
John H Kalivas and Steven D Brown
3.13 Linear Regression Modeling: Variable Selection 249
Roberto Kawakami Harrop Galvão, Mário César Ugulino de Araújo, and
Sófacles Figueredo Carreiro Soares
3.14 An Elemental Perspective on Partial Least Squares 295
William S Rayens
3.15 Multivariate Approaches: UVE-PLS 309
V Centner
3.16 Data and Model Fusion in Chemometrics 317
Steven D Brown
3.17 Multi-Block and Three-Way Data Analysis 341
Mohamed Hanafi, El Mostafa Qannari, and Benoit Jaillais
3.18 Transfer of Multivariate Calibration Models 359
Steven D Brown
3.19 Robust Multivariate Methods in Chemometrics 393
Peter Filzmoser, Sven Serneels, Ricardo Maronna, and Christophe Croux
3.20 Regression Diagnostics 431
Joan Ferré Baldrich
3.21 Model-Based Data Fitting 477
M Maeder, N McCann, S Clifford, and G Puxty
3.22 Linear Approaches for Nonlinear Modeling 497
H Chen and BR Bakshi
3.23 Computationally Intensive Nonlinear Regression Methods 505
Bin Li, Bhavik R Bakshi, and Prem Goel
3.24 Non-linear Modeling: Neural Networks 519
Federico Marini
3.25 Feed-Forward Neural Networks 543
BK Lavine and TR Blank
3.26 Kernel Methods 555
J Suykens
3.27 Classification: Basic Concepts 567
BK Lavine and WS Rayens
3.28 Validation of Classifiers 575
BK Lavine
3.29 Statistical Discriminant Analysis 585


BK Lavine and WS Rayens
3.30 Soft Independent Modeling by Class Analogy 605
Alexey L Pomerantsev and Oxana Ye Rodionova
3.31 Decision Tree Modeling 625
Steven D Brown and Anthony J Myles
3.32 Random Forest and Ensemble Methods 661
George Stavropoulos, Robert van Voorstenbosch, Frederik-Jan van Schooten, and Agnieszka Smolinska
3.33 Genetic Algorithms for Variable Selection and Pattern Recognition 673
Barry K Lavine, Collin G White, and Charles E Davidson
3.34 Multi Way Classification 701
Marina Cocchi, Mario Li Vigni, and Caterina Durante
3.35 Deep Learning Theoretical Chapter for Chemometrician 723
Robert van Vorstenbosch, Agnieszka Smolinska, and Lionel Blanchet

VOLUME 4

4.01 Chemometrics in Electrochemistry 1


M Esteban, C Ariño, and JM Díaz-Cruz
4.02 Chemometrics in the Pharmaceutical Industry 33
Benoît Igne, Christian Airiau, Sameer Talwar, and Elyse Towns
4.03 Environmental Chemometrics 69
Philip K Hopke
4.04 Resampling and Testing in Regression Models with Environmetrical Applications 87
J Roca-Pardiñas, C Cadarso-Suárez, and W González-Manteiga
4.05 Application of Chemometrics in the Food Sciences 99
Paolo Oliveri, Cristina Malegori, Eleonora Mustorgi, and Monica Casale
4.06 Chemometrics in Forensics 113
Marcelo M Sena, Werickson FC Rocha, Jez WB Braga, Carolina S Silva, and Aaron Urbas
4.07 Chemometric Analysis of Sensory Data 149
D Brynn Hibbert
4.08 Smart Sensors 193
Jordi Fonollosa
4.09 Statistical Control of Measures and Processes 215
AJ Ferrer-Riquelme
4.10 Best Practice and Performance of Hardware in Process Analytical Technology (PAT) 237
Rudolf W Kessler and Waltraud Kessler
4.11 Multivariate Statistical Process Control and Process Control, Using Latent Variables 275
T Kourti
4.12 Batch Process Modeling and MSPC 305


S Wold, N Kettaneh-Wold, JF MacGregor, and KG Dunn
4.13 Chemometrics in Raman Spectroscopy 333
Shuxia Guo, Oleg Ryabchykov, Nairveen Ali, Rola Houhou, and Thomas Bocklitz
4.14 Chemometrics in NIR Hyperspectral Imaging: Theory and Applications in the Agricultural
Crops and Products Sector 361
Juan Antonio Fernández Pierna, Philippe Vermeulen, Damien Eylenbosch, James Burger,
Bernard Bodson, Pierre Dardenne, and Vincent Baeten
4.15 Mass Spectrometry Imaging: Chemometric Data Analysis 381
Joaquim Jaumot and Carmen Bedia
4.16 Fast Analysis, Processing and Modeling of Hyperspectral Videos: Challenges and
Possible Solutions 395
Raffaele Vitale, Petter Stefansson, Federico Marini, Cyril Ruckebusch, Ingunn Burud,
and Harald Martens
4.17 Image Processing in Chemometrics 411
Siewert Hugelier, Raffaele Vitale, and Cyril Ruckebusch
4.18 Chemometrics Analysis of Big Data 437
José Camacho and Edoardo Saccenti
4.19 Systems Biology 459
L Coulier, S Wopereis, C Rubingh, H Hendriks, M Radonjic, and RH Jellema
4.20 Analysis of Metabolomics Data: A Chemometrics Perspective 483
Julien Boccard and Serge Rudaz
4.21 Data Processing for RNA/DNA Sequencing 507
Inmaculada Fuertes, Maria Vila-Costa, Jana Asselman, Benjamín Piña, and Carlos Barata
4.22 Analysis of Megavariate Data in Functional Omics 515
EF Mosleth, A McLeod, I Rud, L Axelsson, LE Solberg, B Moen, KME Gilman, EM Færgestad,
A Lysenko, C Rawlings, SN Dankel, G Mellgren, F Barajas-Olmos, LS Orozco, S Sæbø, L Gidskehaug,
A Oust, A Kohler, H Martens, and KH Liland
4.23 Spectral Map Analysis of Microarray Data 569
L Bijnens, R Verbeeck, HW Göhlmann, W Talloen, RA Ion, PJ Lewi, and L Wouters
4.24 Chemometrics in Flow Cytometry 585
Gerjen H Tinnevelt and Jeroen J Jansen
4.25 Chemometrics for QSAR Modeling 599
Roberto Todeschini, Viviana Consonni, Davide Ballabio, and Francesca Grisoni
4.26 Chemoinformatics 635
J Polanski
4.27 High-Performance GRID Computing in Chemoinformatics 677
N Sim, D Konovalov, and D Coomans

Index 703
1.01 Quality of Analytical Measurements: Statistical Methods for
Internal Validation
M Cruz Ortiz, Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain
Luis A Sarabia and M Sagrario Sánchez, Departamento de Matemáticas y Computación, Facultad de Ciencias, Universidad de
Burgos, Burgos, Spain
Ana Herrero, Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Burgos, Spain
© 2020 Elsevier B.V. All rights reserved.
This is an update of M.C. Ortiz, L.A. Sarabia, M.S. Sánchez, A. Herrero, 1.02 - Quality of Analytical Measurements: Statistical Methods for Internal
Validation, in Comprehensive Chemometrics, edited by Steven D. Brown, Romà Tauler, Beata Walczak, Elsevier, 2009,
https://doi.org/10.1016/B978-044452701-1.00090-9.

1.01.1 Introduction 3
1.01.2 Confidence and Tolerance Intervals 7
1.01.2.1 Confidence Interval 8
1.01.2.2 Confidence Interval on the Mean of a Normal Distribution 9
1.01.2.2.1 Case 1: Known variance 9
1.01.2.2.2 Case 2: Unknown variance 10
1.01.2.3 Confidence Interval on the Variance of a Normal Distribution 10
1.01.2.4 Confidence Interval on the Difference in Two Means 11
1.01.2.4.1 Case 1: Known variances 11
1.01.2.4.2 Case 2: Unknown variances 11
1.01.2.4.3 Case 3: Confidence interval for paired samples 12
1.01.2.5 Confidence Interval on the Ratio of Variances of Two Normal Distributions 12
1.01.2.6 Confidence Interval on the Median 13
1.01.2.7 Joint Confidence Intervals 13
1.01.2.8 Tolerance Intervals 13
1.01.2.8.1 Case 1: β-content tolerance interval 13
1.01.2.8.2 Case 2: β-expectation tolerance interval 14
1.01.2.8.3 Case 3: Distribution free intervals 14
1.01.3 Hypothesis Tests 15
1.01.3.1 Elements of a Hypothesis Test 15
1.01.3.2 Hypothesis Test on the Mean of a Normal Distribution 19
1.01.3.2.1 Case 1: Known variance 19
1.01.3.2.2 Case 2: Unknown variance 19
1.01.3.2.3 Case 3: The paired t-test 19
1.01.3.3 Hypothesis Test on the Variance of a Normal Distribution 20
1.01.3.4 Hypothesis Test on the Difference in Two Means 20
1.01.3.4.1 Case 1: Known variances 20
1.01.3.4.2 Case 2: Unknown variances 21
1.01.3.5 Test Based on Intervals 22
1.01.3.6 Hypothesis Test on the Variances of Two Normal Distributions 23
1.01.3.7 Hypothesis Test on the Comparison of Several Independent Variances 24
1.01.3.7.1 Case 1: Cochran’s test 24
1.01.3.7.2 Case 2: Bartlett’s test 25
1.01.3.7.3 Case 3: Levene’s test 25
1.01.3.8 Goodness-of-Fit Tests: Normality Tests 26
1.01.3.8.1 Case 1: Chi-square test 26
1.01.3.8.2 Case 2: D’Agostino normality test 27
1.01.4 One-Way Analysis of Variance 28
1.01.4.1 The Fixed Effects Model 28
1.01.4.2 Power of the Fixed Effects ANOVA model 30
1.01.4.3 Uncertainty and Testing of the Estimated Parameters in the Fixed Effects Model 31
1.01.4.3.1 Case 1: Orthogonal contrasts 32
1.01.4.3.2 Case 2: Comparison of several means 32
1.01.4.4 The Random Effects Model 33

Change History: October 2019. M. Cruz Ortiz, Luis A. Sarabia, M. Sagrario Sánchez, Ana Herrero added MATLAB live-scripts for the computations; rewrote the
introduction to tolerance intervals; corrected estimates in Table 13; updated texts; corrected mistakes and updated references.

Comprehensive Chemometrics, 2nd edition, Volume 1 https://doi.org/10.1016/B978-0-12-409547-2.14746-8 1


1.01.4.5 Power of the Random Effects ANOVA model 35


1.01.4.6 Confidence Intervals for the Estimated Parameters in the Random Effects Model 35
1.01.5 Statistical Inference and Validation 35
1.01.5.1 Trueness 35
1.01.5.2 Precision 36
1.01.5.3 Statistical Aspects of the Experiments to Determine Precision 39
1.01.5.4 Consistency Analysis and Incompatibility of Data 39
1.01.5.4.1 Case 1: Elimination of data 39
1.01.5.4.2 Case 2: Robust methods 41
1.01.5.5 Accuracy 43
1.01.5.6 Ruggedness 43
1.01.6 Appendix 45
1.01.6.1 Some Basic Elements of Statistics 45
1.01.6.2 The Normal Distribution 46
1.01.6.3 Student’s t Distribution 46
1.01.6.4 The χ² (Chi-square) Distribution 47
1.01.6.5 The F Distribution 48
1.01.6.6 Convergence of Random Variables 48
1.01.6.7 Some Computational Aspects 48
1.01.6.7.1 Normal distribution 49
1.01.6.7.2 Student’s t distribution with ν degrees of freedom 49
1.01.6.7.3 χ² distribution with ν degrees of freedom 49
1.01.6.7.4 F_{ν1,ν2} distribution with ν1 and ν2 degrees of freedom 49
1.01.6.7.5 Power for the z-test, Eq. 50
1.01.6.7.6 Power for the t-test, Eq. 50
1.01.6.7.7 Power for the chi-square test, Eq. 50
1.01.6.7.8 Power for the F-test, Eq. 50
1.01.6.7.9 Power for fixed effects ANOVA, Eq. 50
1.01.6.7.10 Power for random effects ANOVA, Eq. 50
References 50

Nomenclature
1 − α Confidence level
1 − β Power
CCα Limit of decision
CCβ Capability of detection
F_{ν1,ν2} F distribution with ν1 and ν2 degrees of freedom (d.f.)
H0 Null hypothesis
H1 Alternative hypothesis
N(μ,σ) Normal distribution with mean μ and standard deviation σ
NID(μ,σ) (Normally and Independently Distributed) independent random variables equally distributed as normal with mean
μ and standard deviation σ
s Sample standard deviation
s² Sample variance
t_ν Student’s t distribution with ν degrees of freedom (d.f.)
x̄ Sample mean
V(X) Variance of the random variable X
α Significance level, probability of type I error
β Probability of type II error
Δ Bias (systematic error)
ε Random error

μ Mean
ν Degree(s) of freedom, d.f.
σ Standard deviation
σ² Variance
σ_R Reproducibility (as standard deviation)
σ_r Repeatability (as standard deviation)
χ²_ν χ² (chi-square) distribution with ν degrees of freedom

1.01.1 Introduction

Every day, millions of analytical determinations are made in thousands of laboratories around the world. These measurements
are needed to assess merchandise in commercial exchanges, to support health care, to maintain security, to control the quality
of water and the environment, to characterize raw materials and manufactured products, and to carry out forensic analyses.
Practically every aspect of contemporary social activity relies in some way on analytical measurements. The cost of
these measurements is high, but the cost of decisions based on incorrect results is much greater. For example, a test that
wrongly shows the presence of a forbidden substance in a food destined for human consumption can result in an expensive claim,
confirmation of the presence of a drug of abuse can lead to serious judicial sentences, and a finding of doping in sport may
result in severe sanctions. The importance of providing a correct result is evident, but it is equally important to be able
to prove that the result is correct.
Once an analytical problem is posed to a laboratory and the analytical method is selected, the next step is the in-house validation
of the method. This is the process of defining the analytical requirements that respond to the problem and confirming that the
method under consideration has performance characteristics consistent with those requirements. The results of the validation
experiments must be evaluated to ensure that the method meets the required measurement specifications.
The set of operations used to determine the value of a suitably defined quantity (the measurand) is called the measurement.
The method of measurement is the sequence of operations used when conducting the measurements. It is documented in enough
detail that the measurement can be carried out without additional information.
Once a method is designed or selected, it is necessary to evaluate its performance characteristics and to identify the factors that
can change these characteristics, and to what extent. If, in addition, the method is developed to solve a particular
analytical problem, it is necessary to verify that the method is fit for purpose.¹ This process of evaluation is called validation of the
method. It implies the determination of several parameters that characterize the method performance: decision limit, capability of
detection, selectivity, specificity, ruggedness, and accuracy (trueness and precision). In every case, it is the measurements
themselves that allow the performance characteristics of the method, and its fitness for purpose, to be evaluated. In addition, when
the method is in use, the measurements obtained are also the basis for decisions about the analyzed sample, for example,
whether the amount of an analyte fulfills a legal specification. Therefore, it is necessary to model suitably the data that a method
provides. In what follows we will consider that the data provided by the analytical method are real numbers. Other possibilities
exist: for example, counts of bacteria or of impacts in a detector take only (discrete) natural values, and sometimes the data
resulting from an analysis are qualitative, for example, the identification of an analyte through its m/z ratios in a mass
spectrometry-chromatography analysis.
With regard to the analytical measurement, it is accepted that the value, x, provided by the method of analysis consists of three
terms: the true value of the parameter, μ; a systematic error (bias), Δ; and a random error, ε, with zero mean. The terms combine
additively, as expressed in Eq. (1):
$$x = \mu + \Delta + \varepsilon \tag{1}$$
All the possible measurements that a method can provide when analyzing a sample constitute the population of the measurements.
This is a theoretical situation because it assumes that there are infinitely many samples and that the method
of analysis remains unaltered. Under these conditions, the model of the analytical method, Eq. (1), is mathematically a random
variable, X, with mathematical expectation μ + Δ and variance equal to the variance of ε; in statistical notation, E(X) = μ + Δ and
V(X) = V(ε), respectively.
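The additive model of Eq. (1) can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the random error is taken to be normal (the text only requires zero mean), and the values of `MU`, `DELTA`, and `SIGMA_EPS`, as well as the helper `simulate_measurements`, are hypothetical choices that do not come from the chapter.

```python
import random
import statistics


def simulate_measurements(mu, delta, sigma_eps, n, seed=0):
    """Draw n results x = mu + delta + eps, with eps ~ N(0, sigma_eps).

    The normal shape of eps is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return [mu + delta + rng.gauss(0.0, sigma_eps) for _ in range(n)]


# Hypothetical values, chosen only for illustration.
MU, DELTA, SIGMA_EPS = 6.00, 0.10, 0.61

x = simulate_measurements(MU, DELTA, SIGMA_EPS, n=100_000)
sample_mean = statistics.fmean(x)    # estimates E(X) = mu + delta
sample_var = statistics.variance(x)  # estimates V(X) = V(eps)

print(f"sample mean     = {sample_mean:.3f} (model: {MU + DELTA:.3f})")
print(f"sample variance = {sample_var:.3f} (model: {SIGMA_EPS ** 2:.3f})")
```

With a large number of simulated results, the sample mean should approach μ + Δ rather than μ: the bias shifts the expectation, while the variance depends only on the random error.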
A random variable, and thus the analytical method, is described by its cumulative distribution function F_X(x), that is, the
probability that the method provides measurements less than or equal to x for any value x. Symbolically, this is written as
F_X(x) = pr{X ≤ x} for any real value x. In most applications, it is assumed that F_X(x) is differentiable, which implies, among
other things, that the probability of obtaining exactly a specific value is zero. In the case of a differentiable cumulative
distribution function, the derivative of F_X(x) is the probability density function (pdf) f_X(x). Any function f(x) that is
nonnegative, f(x) ≥ 0, and whose area under the curve is 1, ∫_R f(x) dx = 1, is the pdf of a random variable. The probability that
the random variable X takes values in the interval [a, b] is the area under the pdf over the interval [a, b], that is,
$$\mathrm{pr}\{X \in [a, b]\} = \int_a^b f(x)\,dx \tag{2}$$

and the mean and variance of X are written as in Eqs. (3) and (4), respectively:
$$E(X) = \int_{\mathbb{R}} x\,f(x)\,dx \tag{3}$$
$$V(X) = \int_{\mathbb{R}} \bigl(x - E(X)\bigr)^2 f(x)\,dx \tag{4}$$
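Eqs. (2) to (4) can be checked numerically for any pdf. The following sketch, which is an illustration and not part of the chapter's own computations, applies them to a symmetric triangular density on [4.5, 7.5]. The midpoint integration rule and the function names `triangular_pdf` and `integrate` are choices made here.

```python
def triangular_pdf(x, lower=4.5, mode=6.0, upper=7.5):
    """pdf of a triangular distribution on [lower, upper] with its peak at mode."""
    if x < lower or x > upper:
        return 0.0
    height = 2.0 / (upper - lower)  # peak height that makes the total area 1
    if x <= mode:
        return height * (x - lower) / (mode - lower)
    return height * (upper - x) / (upper - mode)


def integrate(g, a, b, n=20_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


LOW, HIGH = 4.5, 7.5  # the pdf is zero outside this interval

area = integrate(triangular_pdf, LOW, HIGH)                    # should be 1
prob_5_7 = integrate(triangular_pdf, 5.0, 7.0)                 # Eq. (2)
mean = integrate(lambda x: x * triangular_pdf(x), LOW, HIGH)   # Eq. (3)
var = integrate(lambda x: (x - mean) ** 2 * triangular_pdf(x), LOW, HIGH)  # Eq. (4)

print(f"area = {area:.4f}, pr[5, 7] = {prob_5_7:.4f}, "
      f"mean = {mean:.4f}, variance = {var:.4f}")
```

Because the pdf vanishes outside [4.5, 7.5], integrating over that interval is equivalent to integrating over the whole real line.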

In general, mean and variance do not uniquely characterize a random variable, and therefore do not uniquely characterize the
method of analysis either. Fig. 1 shows the pdfs of four random variables with the same mean, 6.00, and standard deviation, 0.61.

[Fig. 1: four panels, (A) to (D); in each panel the horizontal axis runs from 4 to 8 and the vertical axis from 0 to 1.2.]
Fig. 1 Probability density functions of four random variables with mean 6 and variance 0.375. (A) Uniform in [4.94, 7.06]; (B) Symmetric triangular
in [4.5, 7.5]; (C) Normal N(6, 0.61); (D) Weibull with shape 1.103 and scale 0.7 shifted to give a mean of 6. Dotted vertical lines mark the interval
[5.0, 7.0].

These four distributions, uniform or rectangular (Fig. 1A), triangular (Fig. 1B), normal (Fig. 1C), and Weibull (Fig. 1D), are
frequent in the scope of analytical determinations; they appear in Appendix E of the EURACHEM/CITAC Guide¹ and are also
used in metrology.²
If the only available information regarding a quantity X is the lower limit, l, and the upper limit, u, but the quantity could be
anywhere in between, with no idea of whether any part of the range is more likely, then a rectangular distribution in the interval [l, u]
would be assigned to X. This is because the rectangular distribution is the pdf that maximizes Shannon's “information entropy”, in
other words the pdf that adequately characterizes the incomplete knowledge about X. Frequently, in reference materials, the certified
concentration is expressed as a number with unqualified limits (e.g., 1000 ± 2 mg L⁻¹). In this case, a rectangular distribution
should be used (Fig. 1A).
When the available information concerning X includes the knowledge that values close to c (between l and u) are more likely
than those near the bounds, the adequate distribution is a triangular one (Fig. 1B), with the maximum of its pdf in c.
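Once a rectangular or triangular distribution has been assigned from the limits l and u, its standard deviation follows directly from the width of the interval: (u − l)/√12 for the rectangular case and (u − l)/√24 for a triangular distribution whose mode lies at the midpoint. The sketch below applies these standard results to limits of 998 and 1002 (a stated value of 1000 with unqualified limits of ± 2) and to the limits given in the caption of Fig. 1. The function names are hypothetical, and the symmetric shape of the triangular distribution is an assumption; the text only requires the mode c to lie between l and u.

```python
import math


def rectangular_sd(lower, upper):
    """Standard deviation of a rectangular (uniform) distribution on [lower, upper]."""
    return (upper - lower) / math.sqrt(12.0)


def symmetric_triangular_sd(lower, upper):
    """Standard deviation of a triangular distribution whose mode is the midpoint."""
    return (upper - lower) / math.sqrt(24.0)


certified_sd = rectangular_sd(998.0, 1002.0)       # 1000 +/- 2, rectangular
fig1a_sd = rectangular_sd(4.94, 7.06)              # uniform of Fig. 1A
fig1b_sd = symmetric_triangular_sd(4.5, 7.5)       # triangular of Fig. 1B

print(f"certified value, rectangular: sd = {certified_sd:.3f}")
print(f"Fig. 1A: sd = {fig1a_sd:.3f}; Fig. 1B: sd = {fig1b_sd:.3f}")
```

Both sets of limits from Fig. 1 give a standard deviation of about 0.61, which is how two differently shaped distributions can share the same mean and variance.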
If a good location estimate, μ, and a scale estimate, σ, are the only information available regarding X, then, according to the
principle of maximum entropy, a normal probability distribution N(μ,σ) (Fig. 1C) would be assigned to X (remember that μ and σ may
have been obtained from repeated applications of a measurement method).
Finally, the Weibull distribution (Fig. 1D) is very versatile; it can mimic the behavior of other distributions such as the normal or
exponential. It is adequate for the analysis of the reliability of processes, and in chemical analysis it is useful in describing the
behavior of the figures of merit of a long-term procedure. For example, the capability of detection, CCβ,³ follows a Weibull
distribution, as do the determinations of ammonia in water by UV-vis spectroscopy on 350 different days in Aldama.⁴
In the four cases given in Fig. 1, the probability of obtaining values between 5 and 7 has been computed with Eq. (2). This probability is 0.94 for the uniform distribution (Fig. 1A), 0.88 for the triangular distribution (Fig. 1B), 0.90 for the normal distribution (Fig. 1C), and 0.93 for the Weibull distribution (Fig. 1D). Sorted in decreasing order of the proportion of values that each distribution accumulates in the interval [5.0, 7.0], we have uniform, Weibull, normal, and triangular, although the triangular and normal distributions give values symmetrically around the mean and the Weibull distribution does not. If another interval is considered, say [5.4, 6.6], the distributions accumulate probabilities of 0.57, 0.64, 0.67, and 0.54, respectively, in which
Quality of Analytical Measurements: Statistical Methods for Internal Validation 5

Table 1 Values of b such that p = pr{X < b}, where X is each one of the random variables defined in the caption of Fig. 1.

        Random variable
p       Uniform     Triangular  Normal      Weibull
0.01    4.96        4.71        4.58 (n)    5.34 (m)
0.05    5.05        4.97 (n)    5.00        5.37 (m)
0.50    6.00 (m)    6.00 (m)    6.00 (m)    5.83 (n)
0.95    6.95 (n)    7.03        7.01        7.22 (m)
0.99    7.04 (n)    7.29        7.42        8.12 (m)

(n), Minimum b among the four distributions; (m), Maximum b among the four distributions.

the differences among the values are larger than before and, in addition, the distributions are now sorted as normal, triangular, uniform, and Weibull.
If, for each of those variables, the value b is determined so that there is a fixed probability, p, of obtaining values below b (i.e., the value b such that p = pr{X < b} for each distribution X), the results of Table 1 are obtained. For example, in the second row, the uniform distribution at hand gives values less than b = 5.05 in 5% of the cases, the triangular distribution gives values less than 4.97, and so on. In the table, the extreme values among the four distributions for each probability p have been identified, and large differences are observed, caused by the form in which the values far from 6 are distributed (notice the differences in Fig. 1 for the normal, the triangular, and the uniform distributions) and also by the asymmetry of the Weibull distribution.
Therefore, the mean and variance of a random variable give very limited information on the values provided by the random
variable, unless additional information is at hand about the form of its density (pdf). For example, if one knows that the distribu-
tion is uniform or symmetrical triangular or normal, the random variable is completely characterized by its mean and variance.
In practice, the pdf of a method of analysis is unknown. We only have a finite number, n, of measurements, which are the
outcomes obtained when applying repeatedly (n times) the same method to the same sample. These n measurements constitute
a statistical sample of the random variable X defined by the method of analysis.
Fig. 2 shows histograms of 100 results obtained when applying four methods of analysis, named A, B, C, and D, to aliquot parts
of a sample to determine an analyte. Clearly, the four methods behave differently.
From the experimental data, the (sample) mean and variance are computed as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{5}$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \tag{6}$$

x̄ and s² are estimates of the mean and variance of the distribution of X. These estimates with the data in Fig. 2 are shown in Table 2.
According to the model of Eq. (1), E(X) = μ + Δ ≈ x̄, that is, the sample mean estimates the true value μ plus the bias Δ. Assuming that the true value is μ = 6 and subtracting it from the sample means in the first row of Table 2, the estimated bias would be 0.66 for methods A and B and 0.16 for methods C and D. The bias of a method is one of its performance characteristics and must be evaluated during the validation of the method. In fact, technical guides, for example, the one by the International Organization for Standardization (ISO), state that, for a method, better trueness means less bias. To estimate the bias, it is necessary to have samples with known concentration μ (e.g., certified material, spiked samples).
The value of the variance is independent of the true content, μ, of the sample. For this reason, to estimate the variance, it is only necessary to have replicated measurements on aliquot parts of the same sample. The second row of Table 2 shows that methods B and C have the same variance, 1.26, which is 5 times greater than that of methods A and D, 0.25. The dispersion of the data obtained with a method is the precision of the method and constitutes another performance characteristic to be determined in the validation of the method. In agreement with the model in Eq. (1), a measure of the dispersion is the variance V(X), which is estimated by means of s².
On some occasions, for evaluating trueness and precision, it is more descriptive to use statistics other than mean and variance. For
example, when the distribution is rather asymmetric, as in Fig. 1D, it is more reasonable to use the median than the mean. The
median is the value in which the distribution accumulates 50% of the probability, 5.83 for the pdf in Fig. 1D and 6.00 for the other
three distributions, which are symmetric around their mean. In practice, it is frequent to see the presence of anomalous data
(outliers) that influence the mean and above all the variance, which is improperly increased; in these cases, it is advisable to use
robust estimates of central tendency and spread (dispersion).5–7 Details can be found in the chapter of the present book devoted
to robust procedures.
Fig. 2 and Table 2 show that the two characteristics of a measurement method, trueness and precision, are independent of one another, in the sense that a method with better trueness (less bias), such as methods C and D, can be more precise (case D) or less precise (case C). Analogously, methods A and B have an appreciable bias but A is more precise than B. A method is said to be accurate when it is precise and fulfills trueness.

[Figure: four histogram panels, (A) to (D); horizontal axis from 3 to 10, vertical axis (frequency) from 0 to 40.]
Fig. 2 Frequency histograms of 100 measurements obtained with four different analytical methods, named (A), (B), (C), and (D), on aliquot parts of a sample. Dotted vertical lines mark the interval [5.0, 7.0].

Table 2 Some characteristics of the distributions in Fig. 2.

                         Method
                         A       B       C       D
Mean, x̄                  6.66    6.66    6.16    6.16
Variance, s²             0.25    1.26    1.26    0.25
fr{5 < X < 7}            0.70    0.56    0.58    0.98
fr{X < 6}                0.08    0.29    0.49    0.39
pr{5 < N(x̄, s) < 7}      0.75    0.55    0.62    0.94
pr{N(x̄, s) < 6}          0.09    0.28    0.44    0.37

fr, frequencies; pr, probabilities.

Histograms are estimates of the pdf and allow evaluation of the performance of each method in a more detailed way than when
only considering trueness and precision. For example, the probability of obtaining values in any interval can be estimated with the
histogram. The third row in Table 2 shows the frequencies for the interval [5.0, 7.0]. Method D (best trueness and precision among
the four) provides 98% of the values in the interval, whereas method B (worst trueness and precision) provides only 56% of the
values in the interval. Nonetheless, trueness and precision should be considered jointly. According to the data in Table 2, when the bias is “high”, improving the precision (using method A instead of B) increases the proportion of results in the interval [5.0, 7.0] by 14 percentage points, whereas when the bias is small (using method D instead of C) the increase is 40 percentage points. This behavior should be taken into account when optimizing a method and also in the ruggedness analysis, which is another performance characteristic to be validated according to most of the guides. As can be seen in the fourth row of Table 2, if the method that provides the most results below 6 is needed, C would be the method selected.

The previous explanations show the usefulness of knowing the pdf of the results of a method of analysis. As in practice we have
only a limited number of results, two basic strategies are possible to estimate the pdf: (1) to assess that the experimental data are
compatible with a known distribution (e.g., normal) and then use the corresponding pdf; (2) to estimate the pdf by a data-driven
technique based on a computer-intensive method such as the kernel method8 or by using other methods such as adaptive or penal-
ized likelihood.9,10 The data of Fig. 2 can be adequately modeled by a normal distribution, according to normality hypothesis tests
whose details are explained later in Section “Goodness-of-Fit Tests: Normality Tests”. The fitted normal distributions are used to
compute the probabilities of obtaining values in the interval [5.0, 7.0] or less than 6, last two rows in Table 2. When comparing
these values with those computed with the empirical histograms (compare rows 3 and 5, and rows 4 and 6), there are no appreciable
differences and the normal pdf can be used instead.
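As a rough check of the last two rows of Table 2, the following sketch evaluates the fitted normal distributions with Python's standard-library `statistics.NormalDist`. The means and variances are copied from Table 2; the dictionary `TABLE_2` and the function name `normal_probabilities` are illustrative choices and are not part of the chapter's MATLAB supplementary material.

```python
from math import sqrt
from statistics import NormalDist

# Sample means and variances of methods A-D, copied from Table 2.
TABLE_2 = {
    "A": (6.66, 0.25),
    "B": (6.66, 1.26),
    "C": (6.16, 1.26),
    "D": (6.16, 0.25),
}


def normal_probabilities(mean, variance, low=5.0, high=7.0, threshold=6.0):
    """Return (pr{low < X < high}, pr{X < threshold}) for X ~ N(mean, sqrt(variance))."""
    fitted = NormalDist(mu=mean, sigma=sqrt(variance))
    in_interval = fitted.cdf(high) - fitted.cdf(low)
    below = fitted.cdf(threshold)
    return in_interval, below


for method, (mean, variance) in TABLE_2.items():
    in_interval, below = normal_probabilities(mean, variance)
    print(f"{method}: pr{{5 < X < 7}} = {in_interval:.2f}, pr{{X < 6}} = {below:.2f}")
```

Rounded to two decimals, the printed values should agree with rows 5 and 6 of Table 2.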
In the validation of an analytical method and during its later use, statistical methodological strategies are needed to make decisions from the available experimental data. Knowing these strategies entails a way of thinking and acting that, subordinated to chemical knowledge, makes objective both the analytical results and their comparison with those of other researchers and/or other analytical methods.
Ultimately, a good method of analysis is a serious attempt to come close to the true value of the measurement, always unknown.
For this reason, the result of a measurement has to be accompanied by an evaluation of uncertainty or its degree of reliability. This is
done by means of a confidence interval. When the requirement is to establish the quality of an analytical method, then its capability
of detection, precision, etc. must be compared with those corresponding to other methods. This is formalized with a hypothesis test.
Confidence intervals and test of hypothesis are the basic tools in the validation of analytical methods.
In this introduction, the word sample has been used with two different meanings. Usually, there is no confusion because the
context allows one to distinguish whether it is a sample in the statistical or chemical sense.
In Chemistry, according to the International Union for Pure and Applied Chemistry (IUPAC) (Page 50 in Section 18.3.2 of
Inczédy et al.11), “sample” should be used only when it is a part of a selected material of a great amount of material. This meaning
coincides with that of a statistical sample and implies the existence of sampling error, that is, error caused by the fact that the sample
can be more or less representative of the amount in the material. For example, suppose that we want to measure the amount of
pesticide that remains in the ground of an arable land after a certain time. We take several samples “representative” of the ground
of the parcel (statistical sampling) and this introduces an uncertainty in the results characterized by a (theoretical) variance σ_s².
Afterward, the quantity of pesticide in each chemical sample is determined by an analytical method, which has its own uncertainty,
characterized by σ_m², in such a way that the uncertainty in the quantity of pesticide in the parcel is σ_s² + σ_m² provided that the
method gives results independent of the location of the sample. Sometimes, when evaluating whether a method is adequate for
a task, the sampling error can be an important part of the uncertainty in the result and, of course, should be taken into account
to plan the experimentation.
When the sampling error is negligible, for example, when a portion is taken from a homogeneous solution, the IUPAC recom-
mends using words such as test portion, aliquot, or specimen.
In summary, there is a clear link between measurement method and a random variable which is why the probability is the
natural form of expressing experimental uncertainty. This is thus the focus of the present article that is organized as follows:
Section “Confidence and Tolerance Intervals” describes confidence intervals to measure bias and precision under the normality
hypothesis and tolerance intervals, useful in evaluating the fit for purpose of a method. Also, a nonparametric interval on the
median is described.
Section “Hypothesis Test” is devoted to making decisions based on experimental data that, as such, are affected by uncertainty. In
this section, the computation of the power of a test is systematically proposed as a key element to evaluate the quality of the decision
at the desired significance level. A brief incursion into tests based on intervals is also made as they solve the problem of deciding
whether an interval of values is acceptable, for example, a relative error less than 10% in absolute value. The section ends with some
goodness-of-fit tests to evaluate the compatibility of a theoretical probability distribution with some experimental data.
Section “One-Way Analysis of Variance” is dedicated to the analysis of variance (ANOVA) for both fixed and random effects, and
in Section “Statistical Inference and Validation” some more specific questions related to the usual parameters of the analytical
method validation and their relation with the developed statistical methodologies are analyzed.
Mathematical proofs are not covered in this article and, to be operative from a practical point of view, several examples have
been included so that the reader can verify the understanding of the formulas and the argumentation for their thoughtful use.
This aspect is completed with the inclusion of an Appendix where some essential aspects related to the effectiveness of the statistical models and the limit laws are described. The Appendix also contains the necessary statements, in MATLAB code, to repeat all the calculations proposed throughout the article. The same statements are also available as supplementary material in the form of MATLAB .mlx live scripts (at least release R2016a is needed to read and execute them).

1.01.2 Confidence and Tolerance Intervals

There are some important questions when evaluating a method, for example, “in a given sample, what is the maximum value that it
provides?” that, due to the random character of the results, cannot be answered with just a number.
In order to include the degree of certainty in the answer, the question should be reformulated as: What is the maximum value, U,
that will be obtained 95% of the times that the method is used in the sample? The answer to the question thus posed would be

[Diagram: confidence intervals under normal distribution(s). One sample: for the mean μ0 (known variance; unknown variance); for the standard deviation σ0. Two independent samples: for the difference in means μ1 − μ2 (known variances; unknown variances, equal or unequal); for the ratio of standard deviations σ1/σ2.]
Fig. 3 Diagram summarizing the different cases for computing confidence intervals.

a tolerance interval, and to build it the probability distribution must be known. For instance, let us suppose that it is a N(μ,σ) and we denote by z_{0.05} the critical value of a N(0,1) = Z distribution, the one that accumulates probability 0.95. Then, a possible answer is U = μ + z_{0.05}σ because then the probability that the analytical method gives values greater than U is pr{method > U} = pr{N(μ,σ) > μ + z_{0.05}σ}, which, according to the result in Appendix, is equal to pr{Z > z_{0.05}} = 0.05. In general, for any percentage of results 100(1 − α)%, the maximum value provided by the method would be

$$U = \mu + z_{\alpha}\,\sigma \tag{7}$$

with a probability α that the aforementioned assertion is false.
If, instead, the interest was in the value L so that 100(1 − α)% of the results are greater than L, then the answer would be

$$L = \mu - z_{\alpha}\,\sigma \tag{8}$$

Finally, the interval [L, U] that contains 100(1 − α)% of the values obtained with the method would be

$$[L, U] = \left[\mu - z_{\alpha/2}\,\sigma,\; \mu + z_{\alpha/2}\,\sigma\right] \tag{9}$$

An analytical example where one of these tolerance intervals with a normal distribution N(μ,σ) needs to be computed would be the following. An analytical method gives values (mg L⁻¹) that follow a N(9, 0.5) distribution when measuring a standard with 9 mg L⁻¹. To assess whether the method is still working properly, ten standards are included in the daily sequence of determinations. The probability distribution of the mean of these ten values is a N(9, 0.5/√10). Following Eq. (9), the tolerance interval at 95% level is 9 ± 1.96 × 0.5/√10 = 9 ± 0.31 mg L⁻¹. Consequently, if one day a mean of, say, 9.5 mg L⁻¹ is obtained, the method is not working properly because 9.5 does not belong to the tolerance interval, and the method should be revised, at the risk of doing this revision needlessly 5% of the times. Notice that the tolerance interval is always the same, built at the desired confidence level 100(1 − α)% with the distribution N(9, 0.5/√10), and it is not updated daily with the new samples.
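The daily check described in this example can be sketched in a few lines of Python. The function name `normal_tolerance_interval` is hypothetical, and the standard normal critical value is obtained from the standard-library `statistics.NormalDist` rather than from the MATLAB statements of the Appendix.

```python
from math import sqrt
from statistics import NormalDist


def normal_tolerance_interval(mu, sigma, alpha=0.05):
    """Two-sided interval of Eq. (9): mu -/+ z_{alpha/2} * sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return mu - z * sigma, mu + z * sigma


# Mean of ten standards measured with a N(9, 0.5) method: N(9, 0.5 / sqrt(10)).
low, high = normal_tolerance_interval(mu=9.0, sigma=0.5 / sqrt(10))

daily_mean = 9.5  # the value used in the text
status = "in control" if low <= daily_mean <= high else "revise the method"
print(f"tolerance interval: [{low:.2f}, {high:.2f}] mg/L -> {status}")
```

Because the interval depends only on the assumed N(9, 0.5) model and on the number of standards, it can be computed once and reused every day, as the text points out.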
Unlike Eq. (9), two variants of tolerance intervals, namely the β-content and the β-expectation tolerance intervals, are explained in Section “Tolerance Intervals” due to their relevance in the context of validation of analytical methods. In any case, all of them are completely different from the confidence intervals introduced and developed in the following sections (from Section “Confidence Interval” to Section “Joint Confidence Intervals”).
After explaining all the studied cases, the section finishes with a comparative analysis of both concepts (tolerance and confidence
intervals).

1.01.2.1 Confidence Interval


We have already remarked that estimation of solely the mean, x̄, and variance, s², from n independent results provides very limited information on the method performance. The objective now is to make affirmations of the type “in the sample, the amount of the analyte μ, estimated by x̄, is between L and U (μ ∈ [L, U])” with a certain probability that the statement is true. Following this particular example, we should consider that x̄ is a value taken by the random variable X̄ (sample mean) and use its distribution to answer the new question. Its distribution function is obtained mathematically from the one of X, F_X(x), and thus depends on the information we have about F_X(x) (e.g., if the variance is known or should be also estimated, etc.).

In the general case, with a random variable X, obtaining a confidence interval for X from a sample x_1, x_2, …, x_n consists of obtaining two functions l(x_1, x_2, …, x_n) and u(x_1, x_2, …, x_n) such that

$$\mathrm{pr}\{X \in [l, u]\} = \mathrm{pr}\{l \le X \le u\} = 1 - \alpha \tag{10}$$

1 − α is the confidence level and α is the significance level, meaning that the statement that the value of X is between l and u will be false 100α% of the times.
In the next sections this idea will be particularized for some different cases, according to the random variable X of interest. Fig. 3
is a diagram that summarizes the cases studied in the following sections. All the examples are written in MATLAB live-script file
Intervals_section1022_live.mlx, in the supplementary material, so that they can be easily repeated or adapted for the reader’s
own data.

1.01.2.2 Confidence Interval on the Mean of a Normal Distribution


1.01.2.2.1 Case 1: Known variance
Suppose that we have a random variable that follows a normal distribution with known variance. This will be the case, for example, when using an already validated method of analysis. The assumption means that we know that ε in Eq. (1) is normally distributed and we also know its variance. If we are using samples of size n and taking into account the properties of the normal distribution (see Appendix), the sample mean, X̄, is a random variable N(μ, σ/√n); thus, the particular expression of Eq. (10) for this random variable is

$$\mathrm{pr}\left\{\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha \tag{11}$$

that is, 100(1 − α)% of the values of the sample mean are in the interval in Eq. (11). A simple algebraic manipulation (subtract μ and X̄, multiply by −1) gives

$$\mathrm{pr}\left\{\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha \tag{12}$$

Therefore, according to Eq. (10), the confidence interval on the mean that is obtained from Eq. (12) is

$$\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] \tag{13}$$

Analogously, the confidence intervals at confidence level 100(1 − α)% for the maximum and minimum values of the mean are computed from Eqs. (14), (15), respectively

$$\mathrm{pr}\left\{\mu \le \bar{X} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha \tag{14}$$

$$\mathrm{pr}\left\{\bar{X} - z_{\alpha}\frac{\sigma}{\sqrt{n}} \le \mu\right\} = 1 - \alpha \tag{15}$$

and, thus, the corresponding one-sided intervals would be (−∞, X̄ + z_α σ/√n] and [X̄ − z_α σ/√n, ∞).

In an experimental context, when measuring n aliquot parts of a test sample, we obtain n values x_1, x_2, …, x_n. Their sample mean x̄ is the particular value taken by the random variable X̄ and is also an estimate of the true value μ.
Example 1: Suppose that an analytical method follows a N(μ,4) and we have a sample of size 10 with values 98.87, 92.54, 99.42, 105.66, 98.70, 97.23, 98.44, 103.73, 94.45 and 101.08. With this sample, the mean is 99.01 and, using Eq. (13), the interval at 95% confidence level is [99.01 − 1.96 × 4/√10, 99.01 + 1.96 × 4/√10] = [96.53, 101.49].
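The arithmetic of Example 1 can be reproduced with the short Python sketch below. It is only an illustration written with the standard library (the chapter's own calculations are provided as MATLAB live scripts); the names `EXAMPLE_1` and `mean_ci_known_sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

# The ten measurements listed in Example 1.
EXAMPLE_1 = [98.87, 92.54, 99.42, 105.66, 98.70,
             97.23, 98.44, 103.73, 94.45, 101.08]


def mean_ci_known_sigma(data, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval of Eq. (13) for the mean, sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / sqrt(len(data))
    center = mean(data)
    return center - half_width, center + half_width


low, high = mean_ci_known_sigma(EXAMPLE_1, sigma=4.0)
print(f"95% confidence interval: [{low:.2f}, {high:.2f}]")
```

Changing `alpha` gives the interval at another confidence level, and replacing `alpha / 2` by `alpha` inside the function would give the bound needed for the one-sided intervals of Eqs. (14) and (15).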
For the interpretation of this interval, notice that with different samples of size 10 (same analytical method), different intervals will be obtained at the same 95% confidence level. The endpoints of these intervals are nonrandom values, and the unknown mean value, which is also a specific value, will or will not belong to the interval. Therefore, the affirmation “the interval contains the mean” is a deterministic assertion that is true or false for each of the intervals. What one knows is that it is true for 100(1 − α)% of those intervals. In our case, as 95% of the constructed intervals will contain the true value, we say, at 95% confidence level, that the interval [96.53, 101.49] contains μ.
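This frequentist reading can be illustrated by simulation. The sketch below is not part of the chapter's material: it draws many samples of size 10 from a normal distribution with σ = 4 and an assumed true mean (99.0, a hypothetical value chosen only for illustration), builds the interval of Eq. (13) for each sample, and counts how often the interval contains the true mean. The function name `coverage`, the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(mu, sigma, n, alpha=0.05, trials=5000, seed=0):
    """Fraction of simulated Eq. (13) intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = mean([rng.gauss(mu, sigma) for _ in range(n)])
        # Each simulated interval either contains mu or it does not.
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


# mu = 99.0 is an assumed "true value", used only for this illustration.
print(f"observed coverage: {coverage(mu=99.0, sigma=4.0, n=10):.3f}")
```

The observed fraction should be close to 0.95, and it approaches that value as the number of simulated samples grows.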
This is the interpretation with the frequentist approach adopted in this article, that is to say that the information on random variables is obtained by means of samples of them and that the parameters to be estimated are not known but are fixed amounts (e.g., the amount of analyte in a sample, μ, is estimated by the measurement results obtained by analyzing it n times). With a Bayesian approach to the problem, a probability distribution is attributed to the amount of analyte μ and, once an interval of interest [a,b] has been fixed, the “a priori” distribution of μ, the experimental results, and Bayes’ theorem are used to calculate the probability

a posteriori that μ belongs to the interval [a,b]. It is shown that, although in most practical cases the uncertainty intervals obtained from repeated measurements using either theory may be similar, their interpretation is completely different. The works by Lira and Wöger12 and Zech13 are devoted to comparing both approaches from the point of view of the experimental data and their uncertainty.
Also, an introduction to Bayesian methods for analyzing chemical data can be seen in Armstrong and Hibbert.14,15

1.01.2.2.2 Case 2: Unknown variance


Suppose a normally distributed random variable with unknown variance that must be estimated, together with the mean, from n experimental data. The confidence interval is computed as in Case 1, but now the standardized sample mean, (X̄ − μ)/(s/√n), follows (see Appendix) a Student’s t distribution with n − 1 degrees of freedom (d.f.); thus, the interval at the 100(1 − α)% confidence level is obtained from

$$\mathrm{pr}\left\{\bar{X} - t_{\alpha/2,\nu}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}\right\} = 1 - \alpha \tag{16}$$

where t_{α/2,ν} is the upper percentage point (100α/2 %) of the Student’s t distribution with ν = n − 1 d.f. and s is the sample standard deviation. Analogously, the one-sided intervals at the 100(1 − α)% confidence level come from

$$\mathrm{pr}\left\{\mu \le \bar{X} + t_{\alpha,\nu}\frac{s}{\sqrt{n}}\right\} = 1 - \alpha \tag{17}$$

$$\mathrm{pr}\left\{\bar{X} - t_{\alpha,\nu}\frac{s}{\sqrt{n}} \le \mu\right\} = 1 - \alpha \tag{18}$$

Example 2: Suppose that the probability distribution of an analytical method is normal, but its standard deviation is unknown. With the data of Example 1, the sample standard deviation, s, is computed as 3.90, so that s/√10 = 1.24. As t_{0.025,9} = 2.262 (see Appendix), the confidence interval at 95% level is [99.01 − 2.26 × 1.24, 99.01 + 2.26 × 1.24] = [96.21, 101.81]. The 95% confidence interval on the minimum of the mean (i.e., the 95.0% lower confidence bound) is made up, according to Eq. (18), by all the values greater than 96.74 = 99.01 − 1.83 × 1.24. The corresponding interval on the maximum (upper confidence bound for the mean), Eq. (17), will be made up by the values less than 101.28 = 99.01 + 1.83 × 1.24.
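A minimal Python sketch of Example 2 follows. Since the Python standard library has no Student's t quantile function, the critical values are entered by hand: 2.262 is the value quoted in the text, and 1.833 is the usual table value of t_{0.05,9}, which the text rounds to 1.83. The names `EXAMPLE_1`, `T_0025_9`, `T_005_9` and `t_interval` are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# The ten measurements listed in Example 1.
EXAMPLE_1 = [98.87, 92.54, 99.42, 105.66, 98.70,
             97.23, 98.44, 103.73, 94.45, 101.08]

# Student's t critical values for 9 d.f., typed in from tables (assumption:
# they are not computed here because the standard library lacks a t quantile).
T_0025_9 = 2.262  # t_{0.025,9}, two-sided 95% interval
T_005_9 = 1.833   # t_{0.05,9}, one-sided 95% bounds


def t_interval(data, t_crit):
    """Return (mean - t*s/sqrt(n), mean + t*s/sqrt(n)), as in Eqs. (16)-(18)."""
    center = mean(data)
    half_width = t_crit * stdev(data) / sqrt(len(data))  # stdev uses n - 1
    return center - half_width, center + half_width


two_sided = t_interval(EXAMPLE_1, T_0025_9)
lower_bound, upper_bound = t_interval(EXAMPLE_1, T_005_9)
print(f"s = {stdev(EXAMPLE_1):.2f}")
print(f"two-sided 95% interval: [{two_sided[0]:.2f}, {two_sided[1]:.2f}]")
print(f"95% lower bound: {lower_bound:.2f}, 95% upper bound: {upper_bound:.2f}")
```

Because the sketch carries unrounded intermediate values, its results can differ from those of the text in the second decimal place (the text rounds s/√10 to 1.24 and the critical values to two decimals before multiplying).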
The length of the confidence intervals from Eqs. (12)–(15) is a function of the sample size and tends towards zero when the sample size tends to infinity. This functional relation permits the computation of the sample size needed to obtain an interval of given length, d. It suffices to set d/2 = z_{α/2} σ/√n and take as n the nearest integer greater than (2 z_{α/2} σ/d)². For example, if we want a 95% confidence interval with length d less than 2, in the hypothesis of Example 1, we will need a sample size greater than or equal to 62.
The same argument can be applied when the standard deviation is unknown. However, in this case, to compute n by (2 t_{α/2,ν} s/d)² it is necessary to have an initial estimation of s, which, in general, is obtained in a pilot study with size n′, in such a way that in the previous expression the d.f., ν, are n′ − 1. An alternative is to define the desired length of the interval in standard deviation units (remember that the standard deviation is unknown). For instance, in Example 2, if we want d = 0.5s, we will need a sample size greater than (4z_{α/2})² = 61.5; note the substitution of t_{α/2,ν} by z_{α/2}, which is mandatory because we do not have the sample size needed to compute t_{α/2,ν}, which is precisely what we want to estimate.
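The sample-size rule of this paragraph, n greater than (2 z_{α/2} σ/d)², can be sketched as follows. The function name `sample_size_for_length` is hypothetical, and `math.ceil` is used as a stand-in for "the nearest integer greater than", which coincides with it except when the expression is exactly an integer.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_length(length, sigma, alpha=0.05):
    """Smallest n such that 2 * z_{alpha/2} * sigma / sqrt(n) <= length."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((2 * z * sigma / length) ** 2)


# Setting of Example 1: sigma = 4 and a 95% interval of length 2.
n_known = sample_size_for_length(length=2.0, sigma=4.0)

# Length expressed in standard deviation units, d = 0.5 s (sigma set to 1).
n_relative = sample_size_for_length(length=0.5, sigma=1.0)

print(n_known, n_relative)
```

Both calls evaluate the same quantity, (4 z_{α/2})², which is why the two cases discussed in the text lead to the same sample size.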

1.01.2.3 Confidence Interval on the Variance of a Normal Distribution


In this case, the data come from a N(μ,σ) distribution with μ and σ unknown, and we have a sample with values x_1, x_2, …, x_n. The distribution of the random variable “sample variance” S² is related to the chi-square distribution, χ² (see Appendix). As a consequence, the 100(1 − α)% confidence interval for the variance σ² is obtained from

$$\mathrm{pr}\left\{\frac{(n-1)S^2}{\chi^2_{\alpha/2,\nu}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\nu}}\right\} = 1 - \alpha \tag{19}$$

where χ²_{α/2,ν} is the critical value of a χ² distribution with ν = n − 1 d.f. at significance level α/2. As in the previous case for the sample mean, we should distinguish between the random variable sample variance S² and one of its values, s², computed with Eq. (6) from sample x_1, x_2, …, x_n.
The intervals for the maximum and minimum of the variance at 100(1 − α)% confidence level are obtained from Eqs. (20), (21), respectively.

$$\mathrm{pr}\left\{\sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,\nu}}\right\} = 1 - \alpha \tag{20}$$

$$\mathrm{pr}\left\{\frac{(n-1)S^2}{\chi^2_{\alpha,\nu}} \le \sigma^2\right\} = 1 - \alpha \tag{21}$$

Example 3: Knowing that the n = 10 data of Example 2 come from a normal distribution with both mean and variance unknown, the 95% confidence interval on σ² is found from Eq. (19) as [7.21, 50.81] because s² = 15.25, χ²_{0.025,9} = 19.02, and χ²_{0.975,9} = 2.70. If the analyst is interested in obtaining a confidence interval for the maximum variance, the 95% upper confidence interval is found from Eq. (20) as [0, 41.27] because χ²_{0.95,9} = 3.33, that is, the upper bound for the variance is 41.27 with 95% confidence. Notice the lower bound at 0. To obtain confidence intervals on the standard deviation, it suffices to take the square root of the aforementioned intervals because this operation is a monotonically increasing transformation; therefore, the intervals at 95% confidence level on the standard deviation are [2.69, 7.13] and [0, 6.42], respectively.
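Example 3 can be checked with the Python sketch below. The chi-square critical values are typed in from standard tables to three decimals (19.023, 2.700 and 3.325; the text quotes them rounded to two decimals), because the Python standard library offers no chi-square quantile function. The names `EXAMPLE_1`, `CHI2_9` and `variance_ci` are illustrative assumptions.

```python
from math import sqrt
from statistics import variance

# The ten measurements of Examples 1 and 2.
EXAMPLE_1 = [98.87, 92.54, 99.42, 105.66, 98.70,
             97.23, 98.44, 103.73, 94.45, 101.08]

# Upper-tail chi-square critical values for 9 d.f., keyed by the tail
# probability used in the text (table values, entered by hand).
CHI2_9 = {0.025: 19.023, 0.95: 3.325, 0.975: 2.700}


def variance_ci(data, chi2_for_lower, chi2_for_upper):
    """Eq. (19): [(n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2}]."""
    scaled = (len(data) - 1) * variance(data)  # (n - 1) * s^2
    return scaled / chi2_for_lower, scaled / chi2_for_upper


low, high = variance_ci(EXAMPLE_1, CHI2_9[0.025], CHI2_9[0.975])
# Eq. (20): one-sided upper bound for the variance.
upper_only = (len(EXAMPLE_1) - 1) * variance(EXAMPLE_1) / CHI2_9[0.95]

print(f"s^2 = {variance(EXAMPLE_1):.2f}")
print(f"95% interval for the variance: [{low:.2f}, {high:.2f}]")
print(f"95% interval for the standard deviation: [{sqrt(low):.2f}, {sqrt(high):.2f}]")
print(f"95% upper bounds: variance {upper_only:.2f}, standard deviation {sqrt(upper_only):.2f}")
```

Small differences in the second decimal with respect to the text are expected, since they depend on how many digits of s² and of the critical values are carried through the division.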
The sample size, n, needed so that s²/σ² is between 1 − k and 1 + k is given by the nearest integer greater than

$$1 + \frac{1}{2}\left[\frac{z_{\alpha/2}\left(\sqrt{1+k} + 1\right)}{k}\right]^2$$

For example, for k = 0.5, such that the confidence interval verifies 0.5 < s²/σ² < 1.5, we would need n = 40 data (at least). Just for comparative purposes, we will assume in the example that with the sample of size 40 we obtain the same variance s² = 15.25. As χ²_{0.025,39} = 58.12 and χ²_{0.975,39} = 23.65, the two-sided interval at 95% confidence level is now [10.23, 25.15], which fulfills the required specifications.
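The sample-size expression of this paragraph, read here as 1 + ½[z_{α/2}(√(1 + k) + 1)/k]² (a reading that reproduces the n = 40 of the text for k = 0.5), can be evaluated with the sketch below. The function name `sample_size_for_variance` is hypothetical. The second part recomputes the interval for n = 40 with the chi-square values quoted in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_for_variance(k, alpha=0.05):
    """Smallest integer above 1 + 0.5 * [z_{alpha/2} * (sqrt(1 + k) + 1) / k]^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(1 + 0.5 * (z * (sqrt(1 + k) + 1) / k) ** 2)


n = sample_size_for_variance(k=0.5)

# Interval of Eq. (19) for n = 40 and s^2 = 15.25, using the critical
# values for 39 d.f. quoted in the text (58.12 and 23.65).
s2 = 15.25
low, high = (n - 1) * s2 / 58.12, (n - 1) * s2 / 23.65

print(n, f"[{low:.2f}, {high:.2f}]")
```

Smaller values of k (a tighter requirement on s²/σ²) increase the required sample size quickly, since n grows roughly as 1/k².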

1.01.2.4 Confidence Interval on the Difference in Two Means


1.01.2.4.1 Case 1: Known variances
Consider two independent random variables, N_1 and N_2, distributed as N(μ_1,σ_1) and N(μ_2,σ_2) with unknown means and known variances σ_1² and σ_2². We wish to find a 100(1 − α)% confidence interval on the difference in means μ_1 − μ_2. With a random sample of n_1 observations from the first distribution, x_{11}, x_{12}, …, x_{1n_1}, and n_2 observations from the second one, x_{21}, x_{22}, …, x_{2n_2}, the 100(1 − α)% confidence interval on μ_1 − μ_2 is obtained from the equation

$$\mathrm{pr}\left\{(\bar{X}_1 - \bar{X}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{X}_1 - \bar{X}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right\} = 1 - \alpha \tag{22}$$

where X̄_1 and X̄_2 are the random variables of the sample mean, which take the values x̄_1 and x̄_2. The reader can easily write the expressions analogous to Eqs. (14), (15) for the one-sided intervals.

1.01.2.4.2 Case 2: Unknown variances


The approach to this topic is similar to the previous case, but here even the variances σ_1² and σ_2² are unknown. However, it can be reasonable to assume that they are equal, σ_1² = σ_2² = σ², and that the differences observed in their estimates with the samples, s_1² and s_2², are not significant. The methodology to decide whether this can be assumed, or not, is explained later, in Section “Hypothesis Test”.
An estimate of the common variance σ² is given by the pooled sample variance in Eq. (23), which is an arithmetic average of both variances weighted by the corresponding d.f.,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{23}$$

The 100(1 − α)% confidence interval is obtained from the following equation:

$$\mathrm{pr}\left\{\bar{X}_1 - \bar{X}_2 - t_{\alpha/2,\nu}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le \bar{X}_1 - \bar{X}_2 + t_{\alpha/2,\nu}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right\} = 1 - \alpha \tag{24}$$

where ν = n_1 + n_2 − 2 are the d.f. of the Student’s t distribution. The one-sided intervals at 100(1 − α)% confidence level have the analogous expressions deduced from Eq. (24) by replacing t_{α/2,ν} by t_{α,ν}. If a fixed length is desired for the confidence interval, the computation explained in Section “Confidence Interval on the Mean of a Normal Distribution” can be immediately adapted to obtain the needed sample size.
Example 4: We want to study the stability of a substance after being stored for a month. Here stability means that the content of the substance remains unchanged. Two series of measurements (n_1 = n_2 = 8) were carried out before and after the storage period and we will estimate the difference in means by a 95% confidence interval. The results were x̄_1 = 90.8, s_1² = 3.89 and x̄_2 = 92.7, s_2² = 4.02, respectively. Therefore, the two-sided interval when assuming equal variances (s_p² = 3.96, Eq. (23)) is (90.8 − 92.7) ± 2.1448 × √3.96 × √(1/8 + 1/8), that is, [−4.03, 0.23]. Therefore, at 95% confidence level, the difference of the means belongs to this interval, which includes the null difference, that is, the substance is stable.
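The calculation of Example 4 is sketched below in Python from the summary statistics given in the text. The critical value t_{0.025,14} = 2.1448 is the one quoted in the example; the function names `pooled_variance` and `pooled_t_interval` are illustrative and do not come from the chapter's MATLAB scripts.

```python
from math import sqrt

T_0025_14 = 2.1448  # t_{0.025,14}, as quoted in Example 4


def pooled_variance(n1, var1, n2, var2):
    """Eq. (23): d.f.-weighted average of the two sample variances."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def pooled_t_interval(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Eq. (24): two-sided interval on mu1 - mu2 assuming equal variances."""
    sp = sqrt(pooled_variance(n1, var1, n2, var2))
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


sp2 = pooled_variance(8, 3.89, 8, 4.02)
low, high = pooled_t_interval(90.8, 3.89, 8, 92.7, 4.02, 8, T_0025_14)
contains_zero = low <= 0.0 <= high

print(f"pooled variance: {sp2:.2f}")
print(f"95% interval on the difference: [{low:.2f}, {high:.2f}]")
print("zero inside the interval:", contains_zero)
```

Checking whether zero lies inside the interval is the step that supports the stability conclusion of the example.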

When the assumption σ_1² = σ_2² is not reasonable, we can still obtain an interval on the difference μ_1 − μ_2 by using the fact that the statistic

$$\frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

is distributed approximately as a t with d.f. given by

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \tag{25}$$

The 100(1 − α)% confidence interval is obtained from the following equation:

$$\mathrm{pr}\left\{\bar{X}_1 - \bar{X}_2 - t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{X}_1 - \bar{X}_2 + t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right\} = 1 - \alpha \tag{26}$$

Example 5: We want to compute a confidence interval on the difference of two means with unknown and unequal variances, using the results of an experiment carried out with four aliquot samples by two different analysts. The first analyst obtains $\bar{x}_1 = 3.285$, and the second $\bar{x}_2 = 3.257$. The variances were $s_1^2 = 3.33 \times 10^{-5}$ and $s_2^2 = 9.17 \times 10^{-5}$, respectively. Assuming that $\sigma_1^2 \neq \sigma_2^2$, Eq. (25) gives $\nu = 4.9$, so the d.f. to apply Eq. (26) are 5 and $t_{0.025,5} = 2.571$. Thus, the 95% confidence interval is $(3.285 - 3.257) \pm 2.571\sqrt{\dfrac{3.33\times10^{-5}}{4}+\dfrac{9.17\times10^{-5}}{4}}$, that is, $[0.014, 0.042]$. So, at 95% confidence, the two analysts provide unequal measurements because zero is not in the interval.
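As a check on Example 5, the following Python sketch evaluates Eq. (25) and the interval of Eq. (26). The function names are ours, and the critical value $t_{0.025,5} = 2.571$ is the tabulated value quoted in the example, not something the code derives.

```python
import math

def welch_dof(var1, n1, var2, n2):
    """Approximate degrees of freedom of Eq. (25)."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def unequal_variance_ci(mean1, var1, n1, mean2, var2, n2, t_crit):
    """Two-sided interval of Eq. (26); t_crit is the tabulated t_{alpha/2, nu}."""
    half_width = t_crit * math.sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width

nu = welch_dof(3.33e-5, 4, 9.17e-5, 4)
low, high = unequal_variance_ci(3.285, 3.33e-5, 4, 3.257, 9.17e-5, 4, t_crit=2.571)
print(f"nu = {nu:.1f}, interval = [{low:.3f}, {high:.3f}]")
```

The computed d.f. is not an integer; the example rounds it to the nearest integer before looking up the critical value.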
The one-sided intervals that bound the difference from above or from below are obtained by keeping only the last or the first term, respectively, in Eq. (26) and replacing $t_{\alpha/2,\nu}$ by $t_{\alpha,\nu}$.

1.01.2.4.3 Case 3: Confidence interval for paired samples


Sometimes we are interested in evaluating an effect (e.g., the reduction of a polluting agent in an industrial spill by means of a catalyst) but it is impossible to have two homogeneous populations of samples without and with treatment to obtain the two means of the recoveries, because the amount of polluting agent may change, for example, over time. In these cases, the solution is to determine the polluting agent before and after applying the procedure to the same spill. The difference between both determinations is a measure of the effect of the catalyst. The (statistical) samples obtained in this way are known as paired samples. Formally, with the two paired samples of size n, $x_{11}, x_{12}, \ldots, x_{1n}$ and $x_{21}, x_{22}, \ldots, x_{2n}$, we compute the differences between each pair of data, $d_i = x_{1i} - x_{2i}$, $i = 1, 2, \ldots, n$. If these differences follow a normal distribution, the $100(1-\alpha)\%$ confidence interval is obtained from
$$\mathrm{pr}\left\{\bar{d}-t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}\;\le\;\mu_d\;\le\;\bar{d}+t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}\right\}=1-\alpha \tag{27}$$

where $\bar{d}$ and $s_d$ are the mean and standard deviation of the differences $d_i$, $\mu_d$ is the mean of the distribution of the differences, and $\nu = n - 1$ are the d.f. of the t distribution.

1.01.2.5 Confidence Interval on the Ratio of Variances of Two Normal Distributions


This section approaches the question of giving a confidence interval on the ratio $\sigma_1^2/\sigma_2^2$ of the variances of two distributions $N_1 \equiv N(\mu_1, \sigma_1)$ and $N_2 \equiv N(\mu_2, \sigma_2)$ with unknown means and variances. Let $x_{11}, x_{12}, \ldots, x_{1n_1}$ be a random sample of $n_1$ observations from $N_1$ and $x_{21}, x_{22}, \ldots, x_{2n_2}$ be a random sample of $n_2$ observations from $N_2$. The sample variances obtained with these two samples, $s_1^2$ and $s_2^2$, are the particular values of the random variables $S_1^2$ and $S_2^2$, and the $100(1-\alpha)\%$ confidence interval on the ratio of variances is computed from the following equation:
$$\mathrm{pr}\left\{F_{1-\alpha/2,\nu_1,\nu_2}\frac{S_1^2}{S_2^2}\;\le\;\frac{\sigma_1^2}{\sigma_2^2}\;\le\;F_{\alpha/2,\nu_1,\nu_2}\frac{S_1^2}{S_2^2}\right\}=1-\alpha \tag{28}$$

where $F_{1-\alpha/2,\nu_1,\nu_2}$ and $F_{\alpha/2,\nu_1,\nu_2}$ are the critical values (upper tail) of an F distribution with $\nu_1 = n_2 - 1$ d.f. in the numerator and $\nu_2 = n_1 - 1$ d.f. in the denominator. The Appendix contains a description of some relevant properties of the F distribution.
We can also compute one-sided confidence intervals. The $100(1-\alpha)\%$ upper or lower confidence bound on $\sigma_1^2/\sigma_2^2$ is obtained from Eqs. (29), (30), respectively. Remember that, when computing the intervals by using Eq. (29), the lower bound is always 0.
$$\mathrm{pr}\left\{\frac{\sigma_1^2}{\sigma_2^2}\;\le\;F_{\alpha,\nu_1,\nu_2}\frac{S_1^2}{S_2^2}\right\}=1-\alpha \tag{29}$$
$$\mathrm{pr}\left\{F_{1-\alpha,\nu_1,\nu_2}\frac{S_1^2}{S_2^2}\;\le\;\frac{\sigma_1^2}{\sigma_2^2}\right\}=1-\alpha \tag{30}$$

Example 6: In this example, we compute a two-sided 95% confidence interval for the ratio of the variances in Example 4 ($n_1 = n_2 = 8$, $s_1^2 = 3.89$, $s_2^2 = 4.02$). The resulting interval is $[0.20 \times (3.89/4.02),\ 4.99 \times (3.89/4.02)] = [0.19, 4.83]$. As 1 belongs to this interval, we can admit that both variances are equal.
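The interval of Example 6 follows directly from Eq. (28) once the two critical values are known. In this minimal Python sketch the function name is ours and the values 0.20 and 4.99 are the tabulated F critical values quoted in the example for 7 and 7 d.f.

```python
def variance_ratio_ci(var1, var2, f_lower, f_upper):
    """Two-sided interval of Eq. (28) for sigma1^2 / sigma2^2.

    f_lower and f_upper are the tabulated values F_{1-alpha/2,nu1,nu2} and
    F_{alpha/2,nu1,nu2}; they are inputs because the standard library has
    no F quantile function.
    """
    ratio = var1 / var2
    return f_lower * ratio, f_upper * ratio

low, high = variance_ratio_ci(3.89, 4.02, f_lower=0.20, f_upper=4.99)
print(f"[{low:.2f}, {high:.2f}]")  # [0.19, 4.83]
print(low <= 1 <= high)            # True: equal variances are compatible with the data
```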

1.01.2.6 Confidence Interval on the Median


This case is different from the previous cases, because the confidence interval is a “distribution-free” interval, that is, there is no distribution assumed for the data. As it is known, a percentile (pct) is the value $x_{pct}$ such that $100\,pct\%$ of the values are less than or equal to $x_{pct}$. It is possible to compute confidence intervals on any pct, but for values of pct near one or zero we need very large sample sizes, n, because the values $n \times pct$ and $n \times (1 - pct)$ must be greater than 5. For the median (pct = 0.5), it suffices to consider samples of size 10 or more.
The fundamentals of these confidence intervals are based on the binomial distribution whose details are outside the scope of this article and can be found in Sprent.16 We use the data of Example 1 to show step by step how a $100(1-\alpha)\%$ confidence interval on the median is computed (the guided example is for $\alpha = 0.05$ with $z_{\alpha/2} = 1.96$). The procedure consists of three steps:
1. Sort the data in ascending order. In our case, 92.54, 94.45, 97.23, 98.44, 98.70, 98.87, 99.42, 101.08, 103.73, and 105.66. The rank of each datum is the position that it occupies in the sorted list, for example, the rank of 98.44 is four.
2. Calculate the rank, $r_l$, of the value that will be the lower endpoint of the interval. It is the nearest integer less than $\tfrac{1}{2}\left(n - z_{\alpha/2}\sqrt{n} + 1\right)$. In our case, this value is $0.5\,(10 - 1.96\sqrt{10} + 1) = 2.40$, thus $r_l = 2$.
3. Calculate the rank, $r_u$, of the value that will be the upper endpoint of the interval, which is the nearest integer greater than $\tfrac{1}{2}\left(n + z_{\alpha/2}\sqrt{n} - 1\right)$. In our case, this value is $0.5\,(10 + 1.96\sqrt{10} - 1) = 7.60$, then $r_u = 8$.

Hence, the 95% confidence interval on the median is delimited by the values that occupy ranks 2 and 8 in the sorted list, that is, [94.45, 101.08].
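The three steps can be sketched in Python as follows. The function name `median_ci` is ours; the normal quantile comes from the standard library instead of the rounded value 1.96, which does not change the ranks in this example. The sketch assumes a sample large enough (10 or more values, as stated above) for both ranks to fall inside the sample.

```python
import math
from statistics import NormalDist

def median_ci(data, alpha=0.05):
    """Distribution-free interval on the median from the rank formulas above."""
    x = sorted(data)                           # step 1: ascending order
    n = len(x)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, about 1.96 for alpha = 0.05
    lower_value = 0.5 * (n - z * math.sqrt(n) + 1)
    upper_value = 0.5 * (n + z * math.sqrt(n) - 1)
    r_l = math.ceil(lower_value) - 1           # step 2: nearest integer less than the value
    r_u = math.floor(upper_value) + 1          # step 3: nearest integer greater than the value
    return x[r_l - 1], x[r_u - 1]              # ranks are 1-based, list indices 0-based

data = [92.54, 94.45, 97.23, 98.44, 98.70, 98.87, 99.42, 101.08, 103.73, 105.66]
print(median_ci(data))  # (94.45, 101.08)
```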

1.01.2.7 Joint Confidence Intervals


Sometimes it is necessary to compute confidence intervals for several parameters while maintaining a $100(1-\alpha)\%$ confidence that all of them contain the true value of the corresponding parameter. For example, for two statistically independent parameters, we can assure a $100(1-\alpha)\%$ joint confidence level by taking separately the corresponding $100(1-\alpha)^{1/2}\%$ confidence intervals because $(1-\alpha)^{1/2} \times (1-\alpha)^{1/2} = (1-\alpha)$. In general, if there are k parameters, we will compute the $100(1-\alpha)^{1/k}\%$ confidence interval for each of them.
However, if the sample statistics used are not independent of one another, the above computation is not valid. The Bonferroni inequality states that the probability that all the affirmations are true at $100(1-\alpha)\%$ confidence level is greater than or equal to $1 - \sum_{i=1}^{k}\alpha_i$, where $1 - \alpha_i$ is the confidence level of the i-th interval (usually $\alpha_i = \alpha/k$). For example, if a joint 90% confidence interval is needed for the mean of two distributions, according to the Bonferroni inequality $\alpha_i = \alpha/2 = 0.10/2 = 0.05$; thus, each individual interval should be the corresponding 95% confidence interval.
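The two rules can be compared numerically with a minimal Python sketch; the function names are ours. For a joint 90% level and two parameters, the Bonferroni rule asks for 95% individual intervals, slightly more conservative than the level required under independence.

```python
def individual_level_independent(joint_level, k):
    """Individual confidence level when the k statistics are independent."""
    return joint_level ** (1 / k)

def individual_level_bonferroni(joint_level, k):
    """Individual confidence level from the Bonferroni inequality with alpha_i = alpha / k."""
    alpha = 1 - joint_level
    return 1 - alpha / k

independent = individual_level_independent(0.90, 2)
bonferroni = individual_level_bonferroni(0.90, 2)
print(f"independent: {independent:.4f}, Bonferroni: {bonferroni:.4f}")
```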

1.01.2.8 Tolerance Intervals


In the introduction to the present Section “Confidence and Tolerance Intervals”, the tolerance intervals of a normal distribution have been calculated knowing its mean and variance. Remember that the tolerance interval [l, u] contains $100(1-\alpha)\%$ of the values of the distribution of X or, equivalently, $\mathrm{pr}\{X \notin [l, u]\} = \alpha$. Actually, the values of the parameters that define the probability distribution are unknown and this uncertainty should be transferred into the endpoints of the interval. There are several types of tolerance regions, but in this article, we will restrict ourselves to two common cases.

1.01.2.8.1 Case 1: β-content tolerance interval


Given a random variable X, an interval [l, u] is a β-content tolerance interval at γ confidence level if the following holds:
$$\mathrm{pr}\left\{\mathrm{pr}\{X \in [l, u]\} \ge \beta\right\} \ge \gamma \tag{31}$$
Expressed in words, [l, u] contains at least 100β% of the values of X with γ confidence level. For the case of an analytical method, this is to say that we have to determine, based on a sample of size n, for instance, the interval that will contain 95% (β = 0.95) of the results and this assertion must be true 90% of the times (γ = 0.90). Evidently, β-content tolerance intervals can be one-sided, which means that the procedure will provide 95% of its results above l (respectively, below u) 90% of the times. We leave to the reader the corresponding formal definitions.
One-sided and two-sided β-content tolerance intervals can be computed either by controlling the center or by controlling the tails, and for both continuous and discrete random variables (a review can be seen in Patel17 and applications in Analytical Chemistry in Meléndez et al.18 and Reguera et al.19).
Here we will only describe the case of a normally distributed X with unknown mean and variance. From this distribution, we have a sample of size n that is used to compute the mean $\bar{x}$ and standard deviation s. We want to obtain a two-sided β-content tolerance interval controlling the center, that is, an interval such that
$$\mathrm{pr}\left\{\mathrm{pr}\{X \in [\bar{x} - ks, \bar{x} + ks]\} \ge \beta\right\} \ge \gamma \tag{32}$$
To determine k, several approximations have been reported; consult Patel17 for a discussion on them. The approach by Wald and Wolfowitz20 is based on determining $k_1$ such that
$$\mathrm{pr}\left\{N(0,1) \le \frac{1}{\sqrt{n}} + k_1\right\} - \mathrm{pr}\left\{N(0,1) \le \frac{1}{\sqrt{n}} - k_1\right\} = \beta \tag{33}$$
Therefore
$$k = k_1\sqrt{\frac{n-1}{\chi^2_{\gamma,n-1}}} \tag{34}$$

where $\chi^2_{\gamma,n-1}$ is the point exceeded with probability γ when using the $\chi^2$ distribution with n − 1 d.f.
Example 7: With the data in Example 1, and β = γ = 0.95, we have $\bar{x} = 99.01$, s = 3.91, $k_1 = 2.054$, and $\chi^2_{0.95,9} = 3.33$; thus, according to Eq. (34), k = 3.379 and, as a consequence, the interval [99.01 − 3.38 × 3.91, 99.01 + 3.38 × 3.91] = [85.79, 112.23] contains 95% of the results of the method 95% of the times that the procedure is repeated with a sample of size 10.
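The value of $k_1$ in Eq. (33) has no closed form, but the left-hand side increases with $k_1$, so it can be found by bisection. The Python sketch below does this with the standard-library normal distribution and then applies Eq. (34). The function names, the search bracket [0, 10], and the tolerance are our choices; the $\chi^2$ critical value is taken from tables as 3.325 (rounded to 3.33 in the example) because the standard library does not provide it.

```python
import math
from statistics import NormalDist

def wald_wolfowitz_k1(n, beta, tol=1e-10):
    """Solve Eq. (33) for k1 by bisection."""
    phi = NormalDist().cdf
    shift = 1 / math.sqrt(n)
    lo, hi = 0.0, 10.0                  # content is 0 at k1 = 0 and practically 1 at k1 = 10
    while hi - lo > tol:
        mid = (lo + hi) / 2
        content = phi(shift + mid) - phi(shift - mid)
        if content < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def beta_content_k(n, beta, chi2_crit):
    """Factor k of Eq. (34); chi2_crit is the tabulated point exceeded with probability gamma."""
    return wald_wolfowitz_k1(n, beta) * math.sqrt((n - 1) / chi2_crit)

k1 = wald_wolfowitz_k1(10, 0.95)
k = beta_content_k(10, 0.95, chi2_crit=3.325)
print(f"k1 = {k1:.3f}, k = {k:.2f}")
```

The results agree with the values $k_1 = 2.054$ and $k \approx 3.38$ used in Example 7.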

1.01.2.8.2 Case 2: β-expectation tolerance interval


The interval [l, u] is called a β-expectation tolerance interval if
$$E\left(\mathrm{pr}\{X \in [l, u]\}\right) = \beta \tag{35}$$
Unlike the β-content tolerance interval, the condition in Eq. (35) only demands that, on average, the probability that the random variable takes values between l and u is β.
As in the previous case, we limit ourselves to obtaining intervals of the form $[\bar{x} - ks, \bar{x} + ks]$. When the distribution of the random variable is normal and we have a sample of size n, the solution was obtained for the first time by Wilks21 and is
$$k = t_{(1-\beta)/2,\nu}\sqrt{\frac{n+1}{n}} \tag{36}$$
where $t_{(1-\beta)/2,\nu}$ is the upper (1 − β)/2 point of the t distribution with ν = n − 1 d.f.
Example 7 (continuation): With the same data, the 95% expectation tolerance interval would be [99.01 − 2.37 × 3.91, 99.01 + 2.37 × 3.91] = [89.74, 108.28] as now k is directly computed with the critical value $t_{0.025,9} = 2.262$.
This interval is shorter than the β-content tolerance interval because it only assures the expected value (the mean) of the probabilities that the individual values belong to the interval. In fact, the interval [89.74, 108.28] contains 95% of the values of X only 64% of the times, a conclusion drawn by applying Eq. (32) with k = 2.37. Also, note that when the sample size tends to infinity, the value of k in Eq. (36) tends towards $z_{(1-\beta)/2}$, which gives the half-length (in units of the standard deviation) of the theoretical interval; in our example, this interval would be [91.35, 106.67], obtained by substituting k by $z_{0.025} = 1.96$.
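Eq. (36) is a one-line computation once the t critical value is known. In this Python sketch the function name is ours and $t_{0.025,9} = 2.262$ is the tabulated value quoted in the example; k is rounded to two decimals before forming the interval, as the example does.

```python
import math

def beta_expectation_k(n, t_crit):
    """Factor k of Eq. (36); t_crit is the tabulated t_{(1-beta)/2, n-1}."""
    return t_crit * math.sqrt((n + 1) / n)

mean, s, n = 99.01, 3.91, 10
k = round(beta_expectation_k(n, t_crit=2.262), 2)
low, high = mean - k * s, mean + k * s
print(f"k = {k}, interval = [{low:.2f}, {high:.2f}]")  # k = 2.37, interval = [89.74, 108.28]
```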

1.01.2.8.3 Case 3: Distribution free intervals


It is also possible to obtain tolerance intervals independent of the distribution (provided it is continuous) of variable X. These intervals are based on the rank of the observations, but they demand very large sample sizes, which makes them quite useless in practice. For example, to guarantee that the β-content tolerance interval [l, u] is $[x_{(1)}, x_{(n)}]$ (i.e., the endpoints are the smallest and the greatest values in the sample), the sample size n must approximately fulfill the equation $\log(n) + (n-1)\log(\gamma) = \log(1-\beta) - \log(1-\gamma)$.22 If we need, as in Example 7, β = γ = 0.95, the value of n has to be 89. Nevertheless, Willinks23 used the Monte Carlo method to compute shorter “distribution-free” uncertainty intervals proposed in Draft Supplement2 but it still requires sample sizes that are rather large in the scope of chemical analysis. A complete theoretical development on tolerance intervals (including their estimation by means of Bayesian methods) is in the book by Guttman.24
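The sample size quoted for β = γ = 0.95 can be reproduced by a simple integer search. The Python sketch below takes the approximate equation exactly as printed above; the function name and the decision to report the first integer n ≥ 2 at which the left-hand side no longer exceeds the right-hand side are our assumptions (n = 1 is skipped because it satisfies the equation trivially when β = γ).

```python
import math

def distribution_free_n(beta, gamma):
    """Smallest n >= 2 satisfying the approximate sample-size equation as printed."""
    rhs = math.log(1 - beta) - math.log(1 - gamma)
    n = 2
    # log(gamma) < 0, so the linear term eventually dominates and the loop terminates.
    while math.log(n) + (n - 1) * math.log(gamma) > rhs:
        n += 1
    return n

print(distribution_free_n(0.95, 0.95))  # 89
```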
The tolerance intervals are of interest to show that a method is fit for purpose because when establishing that the interval $[\bar{x} - ks, \bar{x} + ks]$ will contain, on average, 100β% of the values provided by the method (or 100β% of the values with γ confidence level), we are including precision and trueness. To assess that the method is “fit for purpose” it suffices that the tolerance interval $[\bar{x} - ks, \bar{x} + ks]$ is included in the specifications that the method should fulfill. Note that a method with high precision (small value of s) but with a significant bias can get to fulfill the specifications in the sense that a high proportion of its values are within the specifications. In addition, in the estimation of s, the repeatability can be introduced as the intermediate precision or the reproducibility to consider the scope of application of the method. The use of a tolerance interval solves the problem of introducing the bias as a component of the uncertainty.
With the aim of developing analytical fit for purpose methods, the Société Française des Sciences et Techniques Pharmaceutiques (SFSTP) proposed25–28 the use of β-expectation tolerance intervals in the validation of quantitative methods. In four case studies, it has shown the validity of β-expectation tolerance intervals as an adequate way to conciliate both the objectives of the analytical method in routine analysis and those of the validation step, and it proposes them29 as a criterion to select the calibration curve. Also, it has analyzed30 their adequacy to the guides that establish the performance criteria that should be validated and their usefulness31 in the problem of the transference of an analytical method. González and Herrador32 have proposed their computation for the estimation of uncertainty of the analytical assay. In all these cases, β-expectation tolerance intervals based on the normality of data are used, that is, using Eq. (36). To avoid dependence on the underlying distribution and the use of the classic distribution-free
methods, Rebafka et al.33 proposed the use of a bootstrap technique to calculate β-expectation tolerance intervals, whereas Fernholz and Gillespie34 studied the estimation of the β-content tolerance intervals by using bootstrap.
To summarize this whole section about tolerance and confidence intervals, it is worth pointing out some comparative aspects because there is a tendency to confuse both concepts, which have nothing in common but the word interval. The difference between them is clear: the confidence interval is the set that is supposed to contain (with a $100(1-\alpha)\%$ confidence) the true value of the unknown parameter; the tolerance interval is the set that contains a proportion β of the values taken by the random variable, with a given confidence γ.
In particular, confidence intervals must be used in the process of evaluating trueness and precision of a method when there is no need to fulfill external requirements but just to compare with other methods or to quantify uncertainty and bias of the results obtained with it.
A usual error is to mistake a confidence interval for a tolerance interval when the difference between them is important. For instance, with the data of Example 7, notice that to compute the confidence interval, the standard deviation of the mean is estimated as $s/\sqrt{n} = 1.24$, whereas the standard deviation of the individual results of the method is estimated as s = 3.91, which is very different.
Also, it is important to remember that when the sample size n tends to infinity, the length of a confidence interval tends toward zero, independently of the chosen confidence level. For example, with the confidence intervals for the mean, in the limit we will have $\bar{x} = \mu$; thus the estimator and the true parameter will be equal for sure (1 − α = 1). On the contrary, the length of a β-content tolerance interval does not tend towards zero when increasing the sample size; the interval tends instead to the one that contains for sure (γ = 1) 100β% of the values.
There are other aspects of the determination of the uncertainty that are of practical interest, for example, the problem that arises from the fact that any uncertainty interval, particularly an expanded uncertainty interval, should be restricted to the range of feasible values of the measurand. Cowen and Ellison35 analyzed how to modify the interval when the data are close to a natural limit in a feasible range such as 0 or 100% mass or mole fraction.

1.01.3 Hypothesis Tests

This section is devoted to the introduction of a statistical methodology to decide whether an affirmation is false, for example, the affirmation “this method of analysis applied to this reference sample provides the certified value”. If, on the basis of the experimental results, it is decided that it is false, we will conclude that the method has bias. The affirmation is customarily called a hypothesis and the decision-making procedure is called hypothesis testing. A statistical hypothesis is an assertion about the probability distribution that a random variable follows. Sometimes one has to decide on a parameter, for example, whether the mean of a normal distribution is a specific value. On other occasions it may be required to decide on other characteristics of the distribution, for example, whether the experimental data are compatible with the hypothesis that they come from a normal or uniform distribution.

1.01.3.1 Elements of a Hypothesis Test


As the results obtained with analytical methods are modeled by a probability distribution, it is evident that both the validation of a method and its routine use involve making decisions that are naturally formulated as problems of hypothesis testing. In order to describe the elements of a hypothesis test, we will use a concrete case. As in the case of intervals, all the examples can be followed with the live script in the supplementary material entitled Tests_section1023_live.mlx.
Example 8: For an experimental procedure, we need solutions with pH values less than 2. The preparation of these solutions provides pH values that follow a normal distribution with σ = 0.55. The pH values obtained from 10 measurements were 2.09, 1.53, 1.70, 1.65, 2.00, 1.68, 1.52, 1.71, 1.62, and 1.58. The question to be answered is whether the pH of the resulting solution is adequate to proceed with the experiment.
We express this formally as
$$\begin{aligned} H_0 &: \mu = 2.00 \quad (\text{inadequate solution})\\ H_1 &: \mu < 2.00 \quad (\text{valid solution}) \end{aligned} \tag{37}$$
The statement “μ = 2.00” in Eq. (37) is called the null hypothesis, denoted as $H_0$, and the statement “μ < 2.00” is called the alternative hypothesis, $H_1$. As the alternative hypothesis specifies values of μ that are less than 2.00, it is called a one-sided alternative. In some situations, we may wish to formulate a two-sided alternative hypothesis to specify values of μ that could be either greater or less than 2.00 as in
$$\begin{aligned} H_0 &: \mu = 2.00\\ H_1 &: \mu \neq 2.00 \end{aligned} \tag{38}$$

The hypotheses are not affirmations about the sample but about the distribution from which those values come, that is to say, μ is the value, unknown, of the pH of the solution that will be the same as the value provided by the procedure if the bias is zero (see the model of Eq. (1)). In general, to test a hypothesis, the analyst must consider the experimental goal and define, accordingly, the null hypothesis for the test, as in Eq. (37). Hypothesis-testing procedures rely on using the information in a random sample; if this information is inconsistent with the null hypothesis, we would conclude that the hypothesis is false. If there is not enough evidence to prove falseness, the test defaults to the decision of not rejecting the null hypothesis though this does not actually prove that it is correct. It is therefore critical to choose carefully the null hypothesis in each problem.
In practice, to test a hypothesis, we must take a random sample, compute an appropriate test statistic from the sample data, and then use the information contained in this statistic to make a decision. However, as the decision is based on a random sample, it is subject to error. Two kinds of potential errors may be made when testing hypotheses. If the null hypothesis is rejected when it is true, then a type I error has been made. A type II error occurs when the researcher accepts the null hypothesis when it is false. The situation is described in Table 3.
In Example 8, if the experimental data lead to rejection of the null hypothesis $H_0$ when it is true, our (wrong) conclusion is that the pH of the solution is less than 2. A type I error has been made and the analyst will use the solution in the procedure when in fact it is not chemically valid. If, on the contrary, the experimental data lead to acceptance of the null hypothesis when it is false, the analyst will not use the solution when in fact the pH is less than 2 and a type II error has been made. Note that both types of error have to be considered because their consequences are very different. In the case of type I error, an unsuitable solution is accepted, the procedure will be inadequate, and the analytical result will be wrong with the subsequent damages that it may cause (e.g., the loss of a client, or a mistaken environmental diagnosis). On the contrary, the type II error implies that a valid solution is not used, with the corresponding extra cost of the analysis. It is clear that the analyst has to specify the assumable risk of making these errors, and this is done in terms of the probability that they will occur.
The probabilities of occurrence of type I and II errors are denoted by specific symbols, defined in Eq. (39). The probability α of the test is called the significance level, and the power of the test is 1 − β, which measures the probability of correctly rejecting the null hypothesis.
$$\begin{aligned} \alpha &= \mathrm{pr}\{\text{type I error}\} = \mathrm{pr}\{\text{reject } H_0 \mid H_0 \text{ is true}\}\\ \beta &= \mathrm{pr}\{\text{type II error}\} = \mathrm{pr}\{\text{accept } H_0 \mid H_0 \text{ is false}\} \end{aligned} \tag{39}$$
In Eq. (39), the symbol “|” indicates that the probability is calculated under that condition. In the example we are following, α will be calculated with the normal distribution of mean 2 and standard deviation 0.55.
Statistically expressed, with the n = 10 results in Example 8 (sample mean $\bar{x} = 1.708$), one wants to decide about the value of the mean of a normal distribution with known variance and one-sided alternative hypothesis (a one-tail test). With these premises, the related statistic is written in Table 4 (second row) and gives $Z_{calc} = (\bar{x} - \mu_0)/(\sigma/\sqrt{n}) = (1.708 - 2.0)/(0.55/\sqrt{10}) = -1.679$.
In addition, the analyst must assume the risk α, say 0.05. This means that the decision rule that is going to be applied to the experimental results will accept an inadequate (chemical) solution 5% of the times. Therefore, the critical or rejection region is written in Table 4, second row, as CR = {$Z_{calc} < -1.645$}, meaning that the null hypothesis will be rejected for the samples of size 10 that provide values of the statistic less than −1.645. In the example, the actual value $Z_{calc} = -1.679$ belongs to the critical region; thus, the decision is to reject the null hypothesis at 5% significance level.
Given the present facilities of computation, instead of the CR, the available statistical software calculates the so-called P-value, which is the probability, under the null hypothesis $H_0$, of obtaining a value of the statistic at least as extreme as the one actually observed. In our case, P-value = pr{Z ≤ −1.679} = 0.0466. When the P-value is less than the significance level α, the null hypothesis is rejected because this is the same as saying that the value of the statistic belongs to the critical region.
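The statistic, the critical value, and the P-value of Example 8 can be reproduced with the Python standard library alone. This is only an illustrative sketch (the chapter's own worked examples are in the supplementary live script); the variable names are ours.

```python
import math
from statistics import NormalDist, mean

ph = [2.09, 1.53, 1.70, 1.65, 2.00, 1.68, 1.52, 1.71, 1.62, 1.58]
mu0, sigma, alpha = 2.00, 0.55, 0.05          # H0 value, known sigma, significance level

std_normal = NormalDist()
z_calc = (mean(ph) - mu0) / (sigma / math.sqrt(len(ph)))
z_crit = std_normal.inv_cdf(alpha)            # lower-tail critical value, about -1.645
p_value = std_normal.cdf(z_calc)              # pr{Z <= z_calc} for H1: mu < mu0
reject_h0 = z_calc < z_crit                   # same decision as p_value < alpha

print(f"Z = {z_calc:.3f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Both decision rules (critical region and P-value) lead to rejecting the null hypothesis at the 5% significance level, as stated in the text.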
The next question that immediately arises is about the power of the applied decision rule (statistic and critical region). To calculate β, defined in Eq. (39), it is necessary to exactly specify the meaning of the alternative hypothesis; in our case, what is meant by a pH smaller than 2. From a mathematical point of view, the answer is clear: any number less than 2, for example, 1.9999, which clearly does not make sense from the point of view of the analyst. In this context, sometimes due to previous knowledge, in other cases because of the regulatory stipulations or simply by the detail of the standardized work procedure, the analyst can decide the value of pH that is considered to be less than 2.00, for example, a pH less than 1.60. This is the same as assuming that “pH equal to 2” is any smaller value whose distance to 2 is less than 0.40. In these conditions,
$$\beta = \mathrm{pr}\left\{N(0,1) < z_\alpha - \frac{|\delta|}{\sigma}\sqrt{n}\right\} \tag{40}$$
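Eq. (40) can be evaluated directly with the standard normal distribution. The Python sketch below assumes that δ is the distance of 0.40 between 2.00 and the pH of 1.60 chosen in the preceding paragraph, and that $z_\alpha$ is the upper-tail critical value (1.645 for α = 0.05); the function name is ours.

```python
import math
from statistics import NormalDist

def type_ii_error(delta, sigma, n, alpha):
    """Probability beta of Eq. (40) for a one-sided test on the mean with known sigma."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # upper-tail critical value z_alpha
    return std_normal.cdf(z_alpha - abs(delta) / sigma * math.sqrt(n))

beta = type_ii_error(delta=0.40, sigma=0.55, n=10, alpha=0.05)
power = 1 - beta
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Increasing n or |δ| makes the argument of the normal distribution more negative, so β decreases and the power 1 − β increases.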

Table 3 Decisions in hypothesis testing.

                         The unknown truth
Researcher’s decision    H0 is true       H0 is false
Accept H0                No error         Type II error
Reject H0                Type I error     No error
Another random document with
no related content on Scribd:
live in peace, with this and many other crimes staring him in the
face.”
The heart sickens at such a recital of cold-blooded murder; and the
evidence of savage, not to say inhuman, barbarity that characterized
the horrible crime is sufficient to humiliate the whole race of men
and send our much vaunted Christian civilization reeling back into
the dark ages. The shadow on the dial of Ahaz went back ten degrees
—it was a wonderful miracle—but here, in the noon of the nineteenth
century, the shadow on the dial of human progress and Christian
civilization has gone down forty degrees without a miracle, and
reaches the grosser, the darker and the baser passions of our fallen
nature, which instigate and then execute deeds of horror at which all
Christendom revolts.
CHAPTER XXI.
REV. B. H. SPENCER.

His Character and Position as a Minister—Order of Banishment—


Interview with General Merrill—Note to Colonel Kettle—Cause of
Banishment—Letter to A. C. Stewart—Provost-Marshal at
Danville—Frank, Manly Reply—Second Letter to Mr. Stewart,
and Petition to General McKean—The Latter Treated with Silent
Contempt—Strong Loyal Petition Endorsed by H. S. Lane, U. S.
Senator, and O. P. Morton; Governor of Indiana—“Red Tape”—
Petition Returned—Hon. S. C. Wilson Counsel for the Exiles—
General Schofield Finally and Unconditionally Revokes the Order
of Banishment—Indictment for Preaching Without Taking the
“Test Oath.”—Why he Declined to Take the Oath—Prayer for his
Persecutors.
Rev. B. H. Spencer.
Neither goodness, kindness, humility nor usefulness in a minister
of the gospel could disarm malice or shield the servant of God from
the persecutions of wicked men. It is truly astonishing how many and
how diverse the pretexts framed for the arrest, robbery, banishment,
imprisonment or murder of those whose only crime was that they
were ministers of the gospel in connection with the M. E. Church,
South. Infidelity was never at a loss for expedients and Antichrist
was never without efficient agents.
The Rev. B. H. Spencer is almost a native of Missouri, being only
six months of age when his parents came to Missouri from North
Carolina, and has received regular appointments from the Missouri
Annual Conference, M. E. Church, South, consecutively since 1843,
when he was first admitted on trial. No man has a cleaner and purer
record in the Church, both in his personal and ministerial character;
and few men have occupied so many places of high trust and
responsibility. He is one of the old Presiding Elders, and has often
been called to represent his Conference on the floor of the General
Conference, and has always proved himself to be prudent in council,
wise in legislation, correct in administration and eminently useful in
the pulpit; distinguished, perhaps, for his scriptural, practical and
forcible expositions of the distinctive doctrines and duties of Bible
Christianity. He is zealous, humble, earnest, energetic and
Methodistic in all his ministerial work; extensively known and highly
esteemed in love for his works’ sake all over the State.
Long associated with the honored names that will live in the
annals of Missouri Methodism, and taking a high rank with them,
the sentiments that introduced the Rev. W. M. Rush to these pages,
and the reader, may, with but little alteration, introduce Mr. Spencer.
Mr. Spencer is a representative man in his character and position
in Missouri, and while his persecutions were severe and protracted,
his was not an isolated case. He represents in his cruel and wanton
exile a large class of Missourians, and especially of Missouri
ministers, some of whom will, perhaps, never return to this State. B.
T. Kavanaugh, L. M. Lewis, E. K. Miller, B. R. Baxter and many
others are possibly lost to the State forever. They may have gone out
for different causes, but the peculiar proscription and persecutions to
which ministers in Missouri have been subjected kept them out.
Few if any cases of persecution in Missouri present more
deliberate meditation, cooler cruelty and more heartless inhumanity
than the one disclosed in the following narrative, made in Mr.
Spencer’s own quiet, clear and forcible style. His letters to the
various military officials, written in exile, and while all the finer
sentiments and feelings of his manly, Christian heart were writhing
under the cruel injustice he had to bear without the means of
vindication or the hope of redress, are worthy the pen of Cranmer,
and would have given a higher tone and temper to the moral courage
of Latimore.
The reader must, however, measure the man and his persecutors
by the following paper:
“Order of Banishment.
“Dear Doctor: The first item that I send you is in regard to my
banishment, as an act of ecclesiastical persecution.
“In the town of High Hill, Mo., on the 16th January, 1863, I
received from the hands of a Federal soldier the following order, viz.:

“‘Headquarters N. E. District Missouri, }


“‘Warrenton, Mo., Jan. 13, 1863. }

“‘Provost-Marshal, or Commanding Officer, Danville, Mo.:

“‘Sir: You will cause the following persons to leave the State of
Missouri, within a reasonable time after the receipt of this order, and
reside, during the war or until permitted to return, at some place
north of Indianapolis, Indiana, and east of Illinois. They will be
required to report to you, by letter, once a month, and are not
permitted to leave the State by way of St. Louis, but directed to go by
Macon City and Hannibal, Missouri. Rev. B. H. Spencer, * * * * * * *.
“‘By command of Brigadier-General Merrill.

“‘Geo. M. Houston, A. A. G.’

“The above order was accompanied by the following:

“‘Headquarters 67th Regiment E. M. M., }


“‘Danville, Mo., Jan. 16, 1863. }

“‘Rev. B. H. Spencer:

“‘Sir: The above is a true copy of Gen. Merrill’s order to me. You
will obey said order within six days from this date. You will report to
these headquarters on the day of departure.
“‘By order of J. G. Kettle, Col. Commanding.

“‘J. F. Anderson, Adjutant.’


“On the day of receiving this order I went to Warrenton, being
Gen. Merrill’s headquarters, to see if I could not induce him to
revoke it. I found him at the supper table, and unwilling to give me a
hearing anywhere else, when the following conversation took place
between us:
“‘Gen. Merrill, I have received from you an order of banishment
from the State, and wish to see you in regard to it.’
“‘Then what is your name and place of residence?’
“‘My name is B. H. Spencer, High Hill, Mo.’
“General (in a passion)—‘I can do nothing for you!’
“I replied—‘It seems that the tongue of slander has reached you
concerning me; will you hear evidence in my favor?’
“His reply was peremptorily, ‘No, sir!’
“I inquired, ‘Will you then read documents?’
“Answer in same manner—‘No, sir!’
“He then inquired—‘Does the order allow you to go by St. Louis?’
“I answered, ‘No, sir.’
“‘Then,’ said he, ‘see that you don’t go that way!’
“I replied, ‘I don’t expect to.’
“He said, ‘see that you don’t!’ And then added, ‘You may think
yourself very fortunate that you are not hung, and should feel that
you are very mercifully dealt by!’
“So the conversation ended, and I returned home and wrote the
following note to Colonel Kettle:

“‘High Hill, Mo., Jan. 19, 1863.

“‘Col. J. G. Kettle, Danville, Mo.:

“‘Honored Sir: Some time ago I promised to marry a couple in
this vicinity on to-morrow night, and as it will not be in violation of
Gen. Merrill’s order, and will furnish me some means with which to
carry out that order, will you permit me to do so?

“‘I am, very respectfully, B. H. Spencer.’


“The following is his reply:

“‘Headquarters 67th Regiment E. M. M.,
“‘Danville, Mo., Jan. 19, 1863.

“‘Rev. B. H. Spencer:

“‘Sir: Your request to marry the couple and to preach is granted. I
would say that you had better not speak of your banishment in your
sermon.

“‘Yours, &c., J. G. Kettle, Colonel.’

“On the 25th of January, 1863, I preached the sermon alluded to;
and then, in company with four others, made my report to military
headquarters at Danville, Mo. But, in consequence of an accident on
the railroad, I was permitted to remain with my family until the 28th
of that month, when, with a sad heart, I was compelled to leave my
distressed wife and six little children and go into a land of strangers,
and remain in exile for ten long months.
“Dr. H. W. Pitman, Rev. D. W. Nowlin, Rev. J. D. Gregory and Rev.
Wm. A. Taylor were banished in company with me. We had no trial,
either civil or military, nor would they condescend to tell us what
were the charges against us, or whether, indeed, there were any. Nor
to this day—September 7th, 1869—have we found out why it was
done, except through private and unofficial sources. The information
thus received as to the cause of my banishment was as I expected—I
was banished because I was a Southern Methodist preacher! One of
the officers was asked by one of my friends: ‘What are the charges
against Spencer?’ He answered, ‘I never heard that there are any;
but he is a man of influence, and, if disposed, can do a great deal of
harm!’ Another officer was asked by another friend, and he replied,
‘The fact that he is a Southern Methodist preacher is all I want to
know!’ There never was a more clear case of ecclesiastical
persecution than was my banishment. Certain men sought to
produce secession, treason and rebellion in the M. E. Church, South,
by way of showing how they professed to hate these things in the
nation; I opposed them, and they became my enemies and had me
banished. If any one doubts this let him attend to the following
documents:

“‘Ashby’s Mills, Ind., April 22, 1863.

“‘Mr. A. C. Stewart, Provost-Marshal, Danville, Mo.:

“‘Sir—There are reasons which induce me to believe that my case
is wholly at the disposal of the officers and Union men of Danville
and vicinity. If this be so, I wish to solicit your attention to a few
considerations in regard to my case. And, first, I was banished from
my home and family without a trial or a knowledge of the charges
against me, or who preferred them. Now, sir, is this right? Is there
any law, civil or military, that will punish an innocent man? How
could the officer who banished me know that I was guilty of any
crime without giving me a trial and hearing evidence in the case?
Have I ever had such a trial? When? Where? Who were the judge,
jury, witnesses pro and con? Where was the prisoner during the
trial? And where was my legal counsel to see that justice was done
me? With what was I charged, and who were my accusers? Three
months have passed since my banishment, and I am still left in
ignorance of why it was done. Was it done merely to gratify official
ambition? or rather, was it not done to gratify the malice of secret
enemies? Can the interests of the Government be secured or
protected or its dignity increased by such treatment of one of its
citizens? Do you say that I am a great rebel, and therefore such
treatment is good enough for me? How do you know that I am a
rebel at all, much less a great one? Did you learn it from mere rumor,
or from a trustworthy witness, sworn to tell the truth before a proper
tribunal and in the presence of the accused? In the absence of such
evidence how can an intelligent gentleman make such a charge, if,
indeed, any one does make it? If it be stated, or insinuated, that I
have been, or am, disloyal or disobedient to the Constitution of the
United States, or to any of the laws made in pursuance thereof, or to
the constitution and laws of any State where I have ever lived, or to
any military order or edict—this most unjust and oppressive one
banishing me from my home and family not excepted—I deny the
allegation and defy proof by competent testimony! Have I not
silently borne injustice and oppression long enough? Can you blame
me for entering my earnest protest against such treatment? Has it
not been said by officers who ought to know, ‘that there are no
charges against me, but that I am a man of influence, and, if
disposed, could do a great deal of harm?’ Now, if there are no
charges against me, in the name of everything that an American
citizen holds dear, why suffer me to be thus persecuted and
oppressed without an effort to prevent it? Are you not a sworn officer
—sworn to support and defend the Constitution of the United States?
and does that Constitution allow such treatment of an American
citizen against whom there are no charges? and can you allow it to be
done without an effort to prevent it and be innocent? And suppose I
have influence, is that a crime? and what reason has any one to fear
that I would use it for evil? Is it proposed to banish men of character
and influence from the State for fear they will exert their influence
for evil? If not, why send off, and keep off, so humble a person as
myself? Is this the way an officer should fulfill his oath of office? Was
he clothed with authority for this purpose? Is this the only protection
I am to expect from the officers of my native State? Is not my
banishment, under the circumstances, an unmitigated outrage upon
civil and military order, as well as upon my liberties as a citizen? I
love and almost venerate the Government of the United States as
established by our patriotic ancestors! Among earthly institutions I
expect and want nothing better. With it I find no fault. My complaint
is against certain of its officers for the injustice and oppression with
which they treat me. If you were in my place and I in yours, what
course would you wish me to pursue? If a peaceable and quiet
citizen, such as I have always been, is not free from imprisonment or
banishment, who is safe? Has justice forsaken the land? And is there
no place where the oppressed may find redress? If there be any place
where justice may be had, will you tell me where it is, and how to
approach it? I must candidly believe that my banishment was caused
by ecclesiastical persecution—that I am banished for an
ecclesiastical and not for a political reason! Certain persons sought
to produce secession, treason and rebellion in the M. E. Church,
South, by way of showing how they professed to hate these things in
the nation, and I opposed them, because I not only loved union in the
nation, but also in the Church—hence they became my enemies, and
for this cause alone, as I believe, they secured my banishment! I
believe the officer who did it was deceived, and induced to believe me
a bad and dangerous man, or surely he would not have acted so
hastily and rashly! But you know, and so do all my enemies, that
such is not my character. Who would be injured by my return to my
family? Can anybody tell? Does anybody fear it? Shall my secret
enemies be allowed to continue the gratification of their malignity at
my expense under pretense of friendship to the Government? Will
my continued religious persecution do the Government any good?
Why, then, suffer its continuance? Why keep a man in exile without
just cause, who is in feeble health, with limited means, and a wife
and six dependent children needing his attention? Will you not then
allow me to come home at once? Do not even the instincts of
humanity, to say nothing of the higher obligations of justice and
official duty, urge compliance with this request? I honestly believe
that you and the Union men of your vicinity can get me home if you
will—just as easily as to say the word. I may be mistaken, but I
honestly believe that my whole case is in your hands, and that I
remain in exile or return to my family, just as you will the one or the
other. I have reasons for this opinion, and if I am mistaken would
like to know it. I wish to say that in all that I have written I have not
intentionally used a single word that was disrespectful toward those
in authority. In all that I have said, I have aimed to speak plainly,
candidly and earnestly, but also respectfully. I respect you on
account of the authority with which you are invested and the
Government which you represent. But I protest against the way I am
treated, and who can blame me for it? And if this protest shall be
disregarded now, perhaps it may live and speak in vindication of my
character when I am dead, and when the voice of injured justice shall
be heard and respected. If you can not release me, will you tell me
who can? And will you answer this at your earliest possible
convenience, and let me know what you intend to do in my case.

“‘I am, most respectfully,
“‘B. H. Spencer.’

“The answer of the Provost-Marshal was prompt, frank and
manly, and does honor to the head and heart of its author. Unlike
every other officer, civil or military, to whom I had applied for
information or redress, he did not treat me with silent contempt. He
answered. And the answer is important, because it shows clearly that
he not only had no hand in the banishment of myself and my
companions in exile, but that he also had been kept in ignorance of
the intention to do it, as also for the reasons why it was done. Surely
there could have been no public charges against us, or proper trial in
our case, or the Provost-Marshal in our immediate vicinity could not
have thus been kept in ignorance of such an intention till after it was
done.
“It proves, furthermore, that by order of Gen. McKean, it was left
to the so-called loyal men of Montgomery county, Mo., to say
whether we should return or not. And we have the names of those
who gave their sworn opinions as to whether it was proper for us to
return or not, and could give them, but in mercy we withhold them.
And, finally, it proves that our efforts to obtain a revocation of our
order of banishment, to be successful, had to be kept to ourselves.
Why? Simply because if our secret enemies found it out they would
thwart our efforts at the headquarters of the Commanding General of
the district. But the letter speaks for itself. It is as follows:

“‘Office Provost-Marshal, Danville,
“‘Montgomery Co., Mo., April 26, 1863.

“‘Rev. B. H. Spencer, Ashby’s Mills, Ind.:

“‘Dear Sir—I have just received yours of 22d inst., and must
acknowledge I am utterly at a loss to comprehend it.
“‘I want to say, once for all, to yourself, as also to Doct. Pitman and
Judge Nowlin, that I had no hand in your banishment whatever,
either as a private citizen or as an officer; that I never had, either
directly or indirectly, an intimation that such a thing was
contemplated. An order was issued by General McKean, who is
Commanding General of this district, headquarters at Palmyra, to J.
G. Lane, Provost-Marshal of Wellsville district, to take the testimony
of the loyal men of Montgomery county in relation to the propriety
of your return home. Lane was removed from office and his district
thrown into mine, and the order was sent to me by General McKean,
which I executed by taking the evidence of loyal men, both at High
Hill and Montgomery City, as well as Danville. The evidence was
sworn to and sent by order of the commanding General to his
headquarters.
“‘Now, sir, I have given you the facts in regard to everything I have
had to do with this case. And, although you protest against any
intention to insult or offend in your communication, I must frankly
admit that the whole tenor of your letter seems to savor of both.
‘How can you consent, without just cause, to keep one in exile who is
in feeble health,’ &c., is one extract from your letter. ‘Will you not
then allow me to come home at once?’ is another. Now, sir, you must
know that I have no direct control of this matter! Why ask me such
questions? Why not ask me, as a private citizen, to use my influence
to obtain a revocation of the order? The authorities that issued the
order of your banishment have never asked, neither have I given, my
opinion as to the propriety of the order. Notwithstanding I consider
your letter as invidious, and, as I understand it, full of insinuations
against me, yet, under the circumstances, I will allow humanity to
step in, discard all feeling that your letter may have excited, and give
you the best advice I am capable of.
“‘Judge Nowlin, Doct. Pitman and yourself get up a letter, directed
to Brig.-Gen. McKean, Palmyra, Mo., through me as Provost-Marshal
of Montgomery county. Take humanity for your text; appeal to him
through the tears of your wife and helpless children; let Government
officers alone; agree to report to me once a week in person, if it
should be considered necessary; give every assurance that your lips
will be sealed in future as to the utterance of treason, directly or
indirectly; send the letter to me and I will forward it, with such
recommendation as I may deem proper and right, and, if that fails, I
am at the end of my row. The success of this thing will very much
depend on keeping my advice to yourselves. I may be mistaken, but I
believe your liberation may be effected in that way. Give my respects
to Judge Nowlin and Doctor Pitman.

“‘Yours, &c.,
“‘A. C. Stewart, Prov.-Marshal.’

“To the above noble letter I made the following reply:

“‘Ashby’s Mills, Montgomery Co., Indiana,
“‘May 4, 1863.

“‘Mr. A. C. Stewart, Provost-Marshal, Danville, Mo.:


“‘Dear Sir: Yours of the 26th April is to hand, has been read and
contents noted. And in reply let me say, I regret that you considered
my letter in its whole tenor ‘invidious, offensive and insulting,’
notwithstanding my protest against such a construction. I knew the
task I had undertaken was difficult, for there seems to be something
about official position which is always more or less impatient of
contradiction. And hence it was reasonable to conclude that this is
true of military officers, who feel that it is theirs to command and for
others to obey or submit, and not to reason or question. The
difficulty was to so employ language as to convey some idea of my
righteous indignation at the injustice of my treatment, and which
would at the same time be respectful and courteous toward those in
authority. And I question very much whether you yourself, in my
circumstances, would, if you could, have done better. I was, with only
a few days’ notice, forced away from the fellowship and pastoral
oversight of hundreds of beloved brethren; from a most dependent
and afflicted family; from my only means of their support; from the
graves of my kindred, and every thing of earth that was dear; was
denied the privilege of going by St. Louis, where I might have
reached the ears of power and have gained a revocation of my order
of banishment; with limited means, was compelled to travel a
circuitous and expensive route to my place of exile; was denied the
privilege of living in the loyal State of Illinois, where I had kindred,
and it would have cost me nothing; was denied the sympathy of
friends who would have helped me financially, but were afraid; was
sent into a land of strangers, under Government censure, where,
without sympathy, if without money, a man had better be dead; was
not allowed to know the charges against me, who were my accusers,
or even the semblance of a trial, though I had sought one of Gen.
Merrill, of Gen. Curtis, of Gen. Halleck, of Gov. Gamble, of Attorney-
General Bates, of Secretary Stanton and of President Lincoln, and
had done this, directly and indirectly, through men of commanding
influence, whose loyalty was above suspicion, and all this without
success; felt, yea knew, that I was innocent; that there could be no
truthful evidence of my being guilty of any crime; knew that I was
suffering all this to gratify the malignity of secret enemies who had
deceived the military commander and secured my banishment;
enemies who, like the midnight assassin, did their work and then
slunk away to gloat over the misery they had caused; felt satisfied
that I was thus persecuted for an ecclesiastical and not for a political
reason; was sure the Government could not be benefited by my
persecution nor injured by my return to my family; and, finally,
became thoroughly convinced that the influence that controlled the
action of those who had the power to release me from the binding
force of this order, or to keep me in exile, was in or near Danville;
and, in a word, was satisfied that I had found out the locality of the
authors of my trouble and why they persecuted me, but the identical
names of my persecutors I did not know; and hence, in view of the
foregoing considerations, I wrote you in the way I did. Now, interpret
my letter in the light of my circumstances, and imagine yourself in
my condition, and you will be able to ‘comprehend it,’ and to excuse
anything that may seem ‘discourteous or insulting,’ especially when I
assure you nothing of the kind was intended. You have my thanks for
your prompt and manly reply to my letter. There are times when I
would rather a man would abuse me a little than not answer me at
all, and this is one of those times. You are the only officer who had
the condescension, kindness, humanity, or whatever else you may
please to call it, to answer a single one of my numerous appeals for
deliverance from oppression, or for instruction as to where or how I
might obtain it. To your praise be this spoken. It affords me much
pleasure, also, to learn from yourself that you had no hand in
securing my banishment, or knowledge of it until after it occurred. I
wish I could think the same of every other citizen of Danville.
“‘And now that, in accordance with your wish, I am addressing you
as a private citizen, may I ask, and confidently expect, that you will
give me the names of my accusers, and the nature of their
accusations against me, if there are any, together with the names of
those loyal men whose sworn testimony was sent to Gen. McKean in
regard to the ‘propriety’ of allowing me to come home, and the
substance of what each one said? As that is the nearest a trial of
anything else I have had, should not the accused be allowed to know
his accusers, the names of the witnesses and the nature of their
testimony against him? You reprehend me very severely for
insinuating that you have any ‘direct control of my case.’ Well, I did
not suppose you had authority to revoke the order of banishment;
but I did suppose, and do still suppose, that you and your friends of
that vicinity can influence Gen. McKean to revoke the order or not,
just as you wish; and that you have control of my case in that way.
And hence it is that I am so thankful to you, and so much encouraged
by your kind offer to use your influence with the commanding officer
to set aside this order and permit me to return home. And I am sure
if you do promptly and vigorously exert your influence in that
direction you are certain of success.
“‘Among your items of advice you say, ‘Give every assurance that
your lips will be sealed in future as to the utterance of treason,
directly or indirectly.’ Now, as this is, to my mind, an intimation that
some one, or all three of us, are charged with having been guilty of
treasonable utterances, and hence are required to give assurance that
we will do so no more, I wish to say for myself that, if such be the
intimation, I deny the allegation in toto; for I have neither uttered
nor acted treason, nor do I expect to do either in future. And if I am
permitted to return, and you can protect me from the tongue of
slander, and the secret enemies that with consummate mendacity
hound my steps and torture and misrepresent my language and
conduct, you will hear nothing of treason, either in utterance or
action. But, if that can not be done—if the tongue of slander and
falsehood against me can not be silenced in any other way—then give
a fair trial, and make these secret liars, who whisper falsehoods into
official ears against those they hate, ‘face the music,’ and I will
vindicate my innocence. Upon that subject I can make no further
promises. A mere charge of treason, you know, is no evidence of
guilt. The immaculate Son of Man was accused of rebellion, sedition
and treason, with blasphemy, and with being the agent of the prince
of devils! Of Innocence itself they said, ‘He is not fit to live; away
with him! crucify him! crucify him!’ And ‘If they have done these
things in the green tree, what will they not do in the dry?’ And the
same divine authority has said, ‘If any man will live godly in Christ
Jesus, he must suffer persecution,’ and I have made my calculations
accordingly. As to your other suggestions, I wish to say that I will
herewith transmit to Gen. McKean, through you, a request, or
petition, for the revocation of this order in my case, accompanied
with a few of the reasons why I make it, which I will thank you to
send to him, if you please, together with such remarks and
recommendations as you may think proper to make. Please let me
hear from you at an early day, and much oblige,

“‘Most respectfully,
“‘B. H. Spencer.’

“The petition was sent to General McKean, through the Provost-Marshal
of Montgomery county, Mo., together with the best appeal
that he could make in our favor. But the only notice he seems to have
given it was to treat it with silent contempt.
“The following is a copy of that petition:

“‘Ashby’s Mills, Ind., May 7, 1863.

“‘Brigadier-Gen. McKean, Com., Palmyra, Mo.:

“‘Dear Sir—Will you please to revoke the order of Gen. Merrill, of
the 13th January, 1863, banishing me from the State of Missouri? A
few of the reasons why I ask you to do this are—
“‘1st. The order was unjust. The General who issued this order did
not know me, was dependent upon others for his information
concerning me, and was evidently deceived by my personal enemies,
or he never would have issued it.
“‘2d. I have never engaged in this rebellion in any way, nor violated
any law, civil or military; and, therefore, am not deserving of this
punishment.
“‘3d. I have a wife and six small, helpless children, whose ages
range from two to twelve years, from whom I have been forcibly
separated for more than three months, and who very much need my
attention, and, therefore, humanity, to say nothing of the higher
claims of truth and justice, demands compliance with this request.
“‘4th. If permitted to return, I expect to be, as I have ever been, a
law-abiding and good citizen, and, therefore, the Government can
not be benefited by my remaining in exile nor injured by my return
to my family.
“‘5th. As it is the duty and glory of a Government to protect its
citizens in the possession of all their legitimate rights, I ask, and
hope it will be your pleasure to grant, that I may return to my family
in the enjoyment of the untrammeled liberty that I had before my
banishment.
“‘This petition will be sent to your headquarters by Mr. A. C.
Stewart, Provost-Marshal, Danville, Mo., accompanied by such
remarks and recommendations as he may think proper to make.
“‘In the confident expectation that you will grant this just and
reasonable request at an early day,

“‘I am, most respectfully,
“‘B. H. Spencer.’

“After being compelled to remain long enough in exile to form
character and make friends amongst strangers, at the end of nine
months some of the most prominent Union men of Indiana, on the
31st August, 1863, sent the following petition to the Provost-Marshal
General of the department of the Missouri:
“‘To Lieut.-Col. J. O. Broadhead, P. M. G. of Missouri, St. Louis,
Mo., or to whomsoever this petition should be addressed:

“‘The undersigned petitioners beg leave respectfully to represent to
the proper authorities in the State of Missouri, that we are citizens of
the United States, residents of the counties of Montgomery and
Putnam, in the State of Indiana; that we are now and ever have been
loyal and devoted to the Government of the United States; that we
are supporters of the present Administration thereof, and that we are
in favor of using all lawful ways and means for suppressing the
present rebellion and preserving the Union established by our
fathers; we, therefore, cordially endorse all and every one of the
measures of the Government having these much desired objects in
view.
“‘We beg leave further to represent that there have been residing in
our midst, in our immediate vicinity, for the past six or seven
months, three individuals, said to be citizens of Montgomery county,
in the State of Missouri, and to have been banished from that State
by the military authorities there, viz.: H. W. Pitman, B. H. Spencer
and David W. Nowlin. While we can not know the causes that led to
the banishment of these men, we would state that they came among
us under the ban of the Government, and we looked upon them as
objects of suspicion. They and their conduct have been closely
observed and narrowly scrutinized, not to say strictly watched by our
party, and we deem it but sheer justice to declare, candidly and
emphatically, that after an observation of the length of time
indicated above we have seen nothing in these men that in our
judgment would require that they longer be kept in exile.
“‘They are represented to us as men having families dependent
greatly on them for support, and every feeling of humanity is enlisted
in their behalf, if the interests of the Government do not imperatively
require their continuance in exile. With the lights before us, and in
view of the facts that these men have resided for the past six or seven
months in a population greatly excited on political issues, and among
whom sundry disloyal practices have been rife, in which they have
had ample opportunities to have partaken if they had been so
inclined, and yet our observation has not been sufficient to detect
them as aiders or abettors in these disloyal practices; we feel free,
therefore, to declare emphatically our convictions that the interests
of the Government will not be advanced by a longer continuance of
their exile; but, on the contrary, we are satisfied that those interests
would be promoted by a revocation of the order banishing them from
Missouri. We, therefore, in behalf of these exiles, pray the authorities
in Missouri who are empowered to do so to revoke the order
banishing the said H. W. Pitman, B. H. Spencer and David W.
Nowlin from the said State of Missouri, and to release them from
further pains and penalties in the premises; and as loyal citizens in
duty bound, we will ever pray, &c.

(Signed) “‘John W. Harrison,
“‘Dr. H. Labarre,
“‘Franklin M. McMurray,
“‘Dr. George W. Miller,
“‘James Knox,
“‘J. J. Billingsley,
“‘A. D. Billingsley.’

“The undoubted loyalty of these petitioners, and their prominence
in social and political circles during Mr. Lincoln’s Administration,
received the following endorsement, which accompanied their
petition and formed a part of it:
“‘I have known the signers of this paper long and well; they are
true and loyal citizens of Indiana, and are all supporters of the
Administration. They are gentlemen of the highest character, and
their statements are entitled to full credit.

“‘H. S. Lane, U. S. Senator.’

“‘The gentlemen who signed the foregoing statement are of
undoubted loyalty, and their representations are worthy of credit.

“‘O. P. Morton, Gov. of Indiana.’

“And now, by way of showing how difficult it was for those in
prison or exile to obtain a hearing at headquarters, in consequence of
official routine, etiquette, or what is technically called ‘Red Tape,’ I
give the following inscription, which was written on the outside of
the above petition before it was returned to the petitioners. It seems
first to have come into the hands of some sub-official, who read it
and then wrote on it a digest of its contents, as follows:
“‘Petition. Citizens of Indiana. P. 102 (P. M. G.) 63. That H. W.
Pitman, B. H. Spencer and D. W. Nowlin, exiles from Montgomery
county, Mo., be permitted to return to their families and homes, as
they have been closely watched while here and have always
conducted themselves as Union men. These petitioners are indorsed
by the Governor of Indiana.’
“This sub-official then seems to have sent it to the P. M. General of
the Department, who, without granting or promising to grant the
petition, sent it back to Gov. Morton, with the following explanation
written on it:

“‘Headquarters Department of the Missouri,
“‘Office of the P. M. G.,
“‘St. Louis, Mo., Sept. 3, 1863.

“‘Respectfully returned to his Excellency, O. P. Morton, Governor
of Indiana, with the information that there are no papers on the
cases of the persons named in the within petition in this office.
Neither does their names appear upon the records. They were
probably banished by order of some district commander.
“‘By order of Lieut.-Col. J. O. Broadhead.
“‘H. H. Haine,
“‘Lieut. and A. P. M. G. Dept. of the Missouri.’

“Upon receiving it Governor Morton sent it to Senator Lane, who
sent it to the petitioners with the following explanation:
“‘This paper was to-day returned to me by Governor Morton, with
the indorsements on it. Sept. 7, 1863.

“‘H. S. Lane.’

“Just think of it! No trial, no charges, nothing for us or against us,
not on the records, no papers in our cases, and yet we in exile and
compelled to stay there! But we employed one of Indiana’s noblest
lawyers, the Hon. Samuel C. Wilson, of Crawfordsville, to take that
petition and go with it in person to Gen. Schofield’s headquarters.
The result was an unconditional revocation of the order of
banishment, on the 16th Sept., 1863, which is as follows:

“‘Headq’rs Department of the Missouri,
“‘St. Louis, Mo., Sept. 15th, 1863.

“‘Special Orders No. 252.]

“‘I. Dr. H. W. Pitman, David Nowlin and B. H. Spencer, citizens of
Montgomery county, Missouri, heretofore banished to Indiana, to
remain there during the war, are permitted to remain in any part of
the United States, outside of the limits of this Department. They will
report their places of residence the first of each month during the
war to the Provost-Marshal General of this Department.
“‘By command of Major-General Schofield.
“‘Wm. W. Eno, Ass’t Adj’t-Gen’l.
“‘B. H. Spencer, per Maj. Dunn.’
“The foregoing facts and documents are a mere tithing of what
might be given to the same effect, and go to show most clearly that I
was persecuted in various ways, and banished from my helpless
family for ten long months, for no higher and no other crime than
that I was a Southern Methodist preacher!
