Multivariate Statistical Quality Control of a
Pharmaceutical Manufacturing Process Using
Near Infrared Spectroscopy And Imaging
Microscopy
ANDREW JAMES O’NEIL
A thesis submitted in partial fulfilment of the requirements of The
University of London for the degree of Doctor of Philosophy in the Faculty
of Medicine
The School of Pharmacy,
University of London,
29/39 Brunswick Square,
London WC1N 1AX.
May 2000.
ProQuest Number: 10104305
Published by ProQuest LLC (2016). Copyright of the Dissertation is held by the Author.
ABSTRACT
Multivariate Statistical Quality Control of a Pharmaceutical
Manufacturing Process Using Near Infrared Spectroscopy
And Imaging Microscopy
The ability of near infrared (NIR) reflectance and transmittance spectroscopy and near
infrared imaging microscopy to enable multivariate statistical quality control of an
entire pharmaceutical tablet manufacturing process has been demonstrated.
Statistical quality control of process intermediates at each stage of the process (raw
materials dispensing, powder blending and tabletting of blend) required construction of
a multivariate model from a reference set of NIR spectroscopic measurements of
process intermediates. With blends and tablets, these measurements were collected from
a number of different batches where the process was known to have operated within
specification and within a state of statistical control. With the powdered raw materials,
measurements were made of pharmaceutical grade materials.
Using the multivariate models, it was shown possible to assess the quality of raw
materials by NIR spectroscopy and determine their suitability for use in manufacture.
The identities of powdered pharmaceutical raw materials and their accurate particle size
distributions could be determined simultaneously from a single averaged NIR
spectrum.
The models developed from NIR measurements of blends and tablets enabled a level of
quality control at each of these process stages superior to that of current reference
analytical laboratory measurements. Significant trends in process deviation could be
identified from an averaged NIR spectrum even at the blend stage, despite
within-specification reference laboratory data. These batches of blends ultimately produced
tablets of lower quality, as shown by tablet friability, increased tablet thickness and
prolonged tablet dissolution time.
NIR microscopic imaging of these lower quality pharmaceutical blends was examined
to provide more detailed diagnostic information. The spatial locations and size of drug
substance particles could be readily identified and in some instances showed unmilled
drug substance particles. This demonstrated the potential of NIR imaging microscopy
for on-line or at-line quality control of the blending stage.
ACKNOWLEDGEMENTS
I would like to thank those who have helped me in my studies at The School of
Pharmacy. My supervisors, Prof. Tony Moffat and Dr. Roger Jee, deserve special
thanks. Throughout my research training, they provided invaluable comments and
guidance. I particularly enjoyed the useful discussions in the bar, and participation in
several international conferences.
I am also grateful to my industrial supervisor, Dr. Perry A. Hailey, Pfizer Central
Research, Pfizer Ltd., and to Pfizer Ltd. for funding my Ph.D. studentship.
Perry devised the concepts for my Ph.D. research and was an excellent source of advice
on matters of chemometrics and computing. He also arranged for me to be provided
with an extensive set of pharmaceutical process materials. John Wakeman, Pfizer
Central Research, Quality Operations, deserves many thanks for kindly selecting
appropriate batches of normal and unusual process materials.
I also thank the Mathworks Inc. for providing Matlab 5.2 software, FMC International
for supplying Avicel samples and Foss NIRSystems, Buhler AG and Spectral
Dimensions Inc. for use of near infrared instruments.
I greatly appreciate the support and encouragement of my parents, Steve and Angela,
over the years - especially during my Ph.D.
Contents
Title
Abstract
Acknowledgements
List of Abbreviations
List of Symbols
1. Introduction
1.1 Pharmaceutical Quality Control
1.2 Process Based Measurements
1.3 Aim
1.4 Principles of Near Infrared Spectroscopy
1.5 Diffuse Reflectance Spectroscopy
1.6 Near Infrared Spectral Data Pre-processing
1.7 Multivariate Analysis
1.8 Multivariate Statistical Process Control
2. Measurement of Powdered Pharmaceutical Material Particle Size by Near Infrared Spectroscopy
2.1 Introduction
2.2 Review of The Literature
2.3 Materials Used
2.4 Sample Preparation
2.5 Reference Particle Size Analysis
2.6 Near Infrared Reflectance Measurements
2.7 Data Analysis
2.8 Measurement of The Number Median Particle Size
2.9 Measurement of The Cumulative Percentage Frequency Particle Size Distribution
2.10 Measurement of The Percentage Frequency Particle Size Distribution
2.11 Classification of Excipient Grades by Cluster Analysis Methods
2.12 Conclusion
3. Multivariate Statistical Process Control of a Pharmaceutical Manufacturing Process Using Principal Components Analysis of Near Infrared Measurements
3.2 Background and Overview of The Process
3.3 Materials
3.4 Reference Analytical Data
3.5 Near Infrared Measurements
3.6 Data Analysis And Pre-treatment
3.7 Multivariate Statistical Quality Control Methods
3.8 Multivariate Statistical Quality Control of Pharmaceutical Blends
3.9 Multivariate Statistical Quality Control of Pharmaceutical Tablets
3.10 Multivariate Statistical Quality Control of The Entire Process
3.11 Summary of Results
3.12 Conclusion
4. Multivariate Statistical Process Control of a Pharmaceutical Process Using Partial Least Squares Regression (PLSR) of Near Infrared And Reference Analysis Measurements
4.2 Near Infrared And Reference Analysis Data Sets Used
4.3 Statistical Quality of Pharmaceutical Blends And Tablets by Singleblock PLSR
4.4 Statistical Quality Control of The Entire Process by Multiblock PLSR
4.5 Summary of Results
4.6 Conclusion
5. Multivariate Image Analysis of Near Infrared Multispectral Blend Images For Quality Control of Pharmaceutical Blending
5.2 Materials Studied
5.3 Sample Preparation
5.4 Liquid Crystal Tuneable Filter InSb Focal Plane Array Near Infrared Imaging Microscopy
5.5 Multivariate Image Analysis
5.6 Image Cube Data Pre-treatments
5.7 Multiway Principal Components Analysis of Multispectral Images
5.8 Particle Size Analysis of Unmilled Crystalline Material and Drug Substance
5.9 Multivariate Monitoring of Blend Quality
5.10 Alternative Approaches to Monitoring Blend Quality
5.11 Conclusion
6. Conclusion
References
Appendix A. List of Publications
Appendix B. Tables for PCA And Multiway PCA Data Sets
Appendix C. Tables for Singleblock PLS And Multiblock PLS Data Sets
Appendix D. Data Sets for Chapter 2
(CD-ROM, inside back cover (files in ASCII tab delimited format))
Sections 2.8 and 2.9
avicelcal - Near infrared reflectance spectra of microcrystalline cellulose
samples (calibration set)
avicelval - Near infrared reflectance spectra of microcrystalline cellulose
samples (validation set)
avicelquantilescal - Particle size data for avicelcal (calibration set)
avicelquantilesval - Particle size data for avicelval (validation set)

Section 2.10
avicelmastersizer - Percentage frequency particle size data of
microcrystalline cellulose samples for
mastersizerspectra
highsize - Particle size intervals for avicelmastersizer
mastersizerspectra - Near infrared absorbance spectra of
microcrystalline cellulose samples for
avicelmastersizer
Section 2.11
knnlactosefastflo - Near infrared reflectance spectra of lactose Fastflo
knnlactoseregular - Near infrared reflectance spectra of lactose Regular
knnavicelph101 - Near infrared reflectance spectra of Avicel PH 101
knnavicelph101 - Near infrared reflectance spectra of Avicel PH 102

LIST OF ABBREVIATIONS
approx. approximation.
BN batch number.
BaF2 barium fluoride.
CD-ROM compact disc read only memory.
CL control limit.
C. of A. certificate of analysis.
CV coefficient of variation.
d. f. degrees of freedom.
DT quadratic baseline detrend.
FALLS forward angle laser light scattering.
Fig. figure.
FT-NIR Fourier-transform near infrared.
HPLC high performance liquid chromatography.
InSb indium antimonide solid state detector.
Intact near infrared tablet transmission module.
knn k nearest neighbour observations in pattern space to a target pattern
vector, calculated from the Euclidean distance.
LC-TF liquid-crystal tuneable filter.
Max. maximum.
MHz megahertz.
MPCA multiway principal components analysis.
MB PLS multiblock partial least squares.
MB PLSR multiblock partial least squares regression.
MEWMA multivariate exponentially weighted moving average.

MIA multivariate image analysis.
MLR multiple linear regression.
MSC multiplicative scatter correction.
MSPC multivariate statistical process control.
NIPALS non-linear iterative partial least squares.
NIR near infrared.
NIRS near infrared spectroscopy.
PCA principal components analysis.
PC principal component.
PCR principal components regression.
PLS partial least squares or projection to latent structures.
PLSR partial least squares regression.
PLSR1 non-linear iterative partial least squares regression algorithm 1.
PLSR2 non-linear iterative partial least squares regression algorithm 2.
PRESS predicted residual error sum of squares statistic.
PTFE polytetrafluoroethylene.
RAM random access memory.
RCA near infrared Rapid Content Analyser module for diffusely reflective
materials (e.g. powders).
RMSEC root mean square error of calibration.
RMSEP root mean square error of prediction.
RSS residual sum of squares.
SEC standard error of calibration.
SEM scanning electron microscopy.
SEP standard error of prediction.
Sg2d11 Savitzky-Golay 11 point quadratic second derivative digital smoothing
filter.
SIMCA soft independent modelling of class analogy.
SNV standard normal variate.
SNV-DT standard normal variate transformation followed by quadratic baseline
detrend.
SS residual sum of squares.
%SS cumulative percentage sum of squares.
UCL upper control limit.
UV ultraviolet.
LIST OF SYMBOLS
a 1. index for principal component or partial least squares dimension; 2.
confidence interval.
a index for multiple linear regression wavelength or coefficient.
a proportionality constant in apparent absorbance equation.
A(1%, 1cm) specific absorbance of a filtered, 1% solution of 1 cm pathlength.
β predictive partial least squares model matrix of regression coefficients.
θ trace of residual covariance matrix.
λ 1. wavelength; 2. multivariate exponentially weighted moving average
weight coefficient.
λa eigenvalue of ath principal component.
μ 10⁻⁶.
μ 1. reduced mass; 2. mean signal of a near infrared spectrum; 3.
population mean vector.
σ 1. standard deviation of a near infrared spectrum; 2. population standard
deviation.
ε absorption coefficient.
Σ population variance-covariance matrix.
|Σ| theoretical sample generalised variance.
χ² chi-squared statistic.
A 1. absorbance; 2. rank of multivariate model (e.g. principal components
analysis or partial least squares); 3. Anderson’s asymptotic normal
approximation.
b0 multiple linear regression equation intercept.
b1 multiple linear regression equation coefficient.
b2 multiple linear regression equation coefficient.
c 1. speed of light in a vacuum; 2. intercept of linear regression of near
infrared predicted and reference analysis measurements; 3. molar
concentration.
D Mahalanobis distance.
Dpq Euclidean distance between observation p and q.
d50 number median particle size.
dx number percentage quantile particle size.
E residual matrix after multivariate modelling (e.g. principal components
analysis or partial least squares).
Ea residual intensity image matrix in multivariate image analysis.
E1 residual matrix after multiblock partial least squares modelling of X1
block.
E2 residual matrix after multiblock partial least squares modelling of X2
block.
Ev potential energy of a vibrational energy level of quantum number, v.
F residual matrix of unmodelled variance in reference analysis data, Y.
F critical point of the F distribution.
f vibrational frequency of a diatomic molecule.
fe uniform spacing between vibrational energy levels.
F(R∞) Kubelka-Munk function.
g weight coefficient for weighted chi-squared distribution.
h 1. higher order term in a vibrational energy level, Ev; 2. degrees of
freedom of weighted chi-squared distribution.
I0 Intensity of incident near infrared radiation.
Irefl Intensity of reflected near infrared radiation.
k 1. bonding force constant; 2. molar absorption coefficient in Kubelka-
Munk theory of diffuse reflectance.
constant term in Kubelka-Munk function equation, equivalent to scatter
coefficient, s, divided by the product of 2.303 times the absorption
coefficient, e.
log(1/R) absorbance of diffuse reflectance near infrared spectral data.
log(1/T) absorbance of transmission near infrared spectral data.
m 1. slope of linear regression of near infrared predicted and reference
analysis measurements; 2. number of observations; 3. sample mean of Q
statistic.
fraction of mass of a component in a material.
n 1. refractive index in Fresnel equation of regular reflectance; 2. number
of variables or principal components.
⊗ Kronecker product.
P 1. matrix of principal components analysis loadings; 2. matrix of partial
least squares loading vectors for X block.
P1 matrix of multiblock partial least squares loading vectors for X1 block.
P2 matrix of multiblock partial least squares loading vectors for X2 block.
p 1. probability level; 2. number of variables.
pa partial least squares loading vector for matrix X in one partial least
squares dimension, a.
qa partial least squares loading vector for Y matrix in partial least squares
dimension, a.
Q 1. matrix of loading vectors for Y matrix; 2. squared prediction error
matrix.
Qa residual distance to model intensity image matrix in multivariate image
analysis.
Qα critical value for Q statistic.
Q95 95% critical value for Q statistic.
Q99 99% critical value for Q statistic.
R 1. reflectance; 2. multiple correlation coefficient; 3. principal components
analysis cross-validation statistic.
R'∞ absolute diffuse reflectance from an opaque, diffusely reflecting material
of infinite thickness.
R∞ relative diffuse reflectance from an opaque, diffusely reflecting material
of infinite thickness.
Rreg regular reflectance.
r simple linear correlation coefficient.
s scattering coefficient in Kubelka-Munk theory of diffuse reflectance.
S sample variance-covariance matrix.
|S| sample generalised variance of sample variance-covariance matrix, S.
T transmission.
T² Hotelling’s T² statistic.
T²i multivariate exponentially weighted moving average Hotelling’s T²
statistic for observation i.
T²n,m−n,α Hotelling’s T² control limit for principal components analysis or partial
least squares model of dimension n, with m control batches and m−n d. f.
and confidence interval α.
T principal components analysis or partial least squares scores matrix for X
matrix.
ta partial least squares score vector for X matrix in partial least squares
dimension, a.
tCa consensus multiblock partial least squares score vector in partial least
squares dimension, a.
t1a multiblock partial least squares score vector for X1 matrix in partial least
squares dimension, a.
t2a multiblock partial least squares score vector for X2 matrix in partial least
squares dimension, a.
ua partial least squares latent vector for Y matrix.
v 1. vibrational quantum number; 2. variance of Q statistic sample.
W matrix of partial least squares weight vectors for X matrix.
W principal components analysis cross-validation statistic.
wa partial least squares weight vector for X matrix in dimension, a.
X matrix of near infrared observations-by-spectral wavelength.
X1 multiblock partial least squares matrix of near infrared observations-by-spectral
wavelength at first process stage.
X2 multiblock partial least squares matrix of near infrared observations-by-spectral
wavelength at second process stage.
x pattern space vector.
xe anharmonicity constant.
Y matrix of reference analysis measurements-by-variable.
yλj near infrared spectral data at wavelength λj.
zi multivariate exponentially weighted moving average vector for
observation i.
CHAPTER 1
Introduction
1.1 Pharmaceutical Quality Control
Quality control of pharmaceutical manufacturing processes requires physical and
chemical characterisation of raw materials, process intermediates and finished products.
This is most often achieved through a series of tests which are performed in a laboratory
situated away from the production area. Analysis usually requires destruction of the
matrix of a representative sample of product, either for separation of the individual
components to facilitate their quantitative measurement or to determine physical
characteristics. For example, chemical tests may include tablet strength assay by high
performance liquid chromatography (HPLC) and total moisture content assay by Karl
Fischer titration. Physical tests may include tablet friability or particle size analysis.
These conventional tests measure the average amount of a component and its variance
within a batch but do not assess the distribution of these components within an
individual dosage unit or small sample of process material. With well controlled
processes, knowledge of the average quantity of a component and its variance within a
batch provides assurance that a batch of product will conform to its specification.
However, there exists the potential for such a batch to produce a product of unexpectedly
low quality if the distribution of materials within dosage units is not uniform and if this
results in physicochemical instability. For example, the spatial distribution of
components, such as moisture (which may exist as surface and free moisture and water
of crystallisation), drug substance (which may include polymorphic crystalline forms)
and excipients (for example disintegrant) may not be homogeneous within an individual
dosage unit or sample of blended powder. Non-uniform distribution of free moisture
within a dosage unit may result in instability of the drug substance which would alter its
biological efficacy. As conventional analytical tests examine representative samples of
product, this type of variation may go unnoticed.
An added disadvantage with remote laboratory analysis is the often lengthy time for
analysis at each process stage. Pharmaceutical products are therefore mostly produced
in batches, rather than through more cost effective continuous processes.
1.2 Process Based Measurements
The emergence of modern spectrometric technologies that provide rapid,
non-invasive measurements of samples has generated considerable interest in their application to
measurement and control of pharmaceutical processes. Near infrared spectroscopy
(NIRS) is regarded as a viable alternative to the traditional pharmacopoeial analyses
owing to its speed of measurement and high signal to noise ratio and the ability to
perform both reflectance and transmittance measurements of intact dosage forms and
powders. The near infrared (NIR) spectrum of a sample matrix contains a wealth of
chemical and physical information. NIRS has
therefore received recognition as a highly valuable technology within the
pharmaceutical industry.
Rapid measurements of process performance may be taken in the production area in
real-time, rather than in a remote laboratory, by use of fibre-optic probe instruments.
These may be used throughout a manufacturing process: for identification and
qualification of excipients through to analysis of powder blending by incorporation of a
fibre-optic probe through the wall of a blender (Hailey 1996; Maesschalck et al, 1998) and of
the tablets by reflectance and transmission spectroscopy (Andersson et al, 1999). The
advantage of process based measurements is that data are generated and analysed in
real-time thus providing the potential for efficient control of the process.
NIR measurements acquired throughout a manufacturing process are likely to contain
information that relates to the process performance history of a batch. Collectively these
measurements would form a ‘process fingerprint’. To date, NIRS pharmaceutical
process control applications have only focused on discrete parts of a process, such as
blending (Hailey et al, 1994, 1996). Complete statistical control of processes from start
to finish has, however, been demonstrated in chemical engineering processes. These
applications made multivariate statistical comparison of batch process data against a
reference set of data from past successful batches and produced excellent monitoring
results (Nomikos and MacGregor, 1994; MacGregor et al, 1994).
For similar monitoring to be successfully demonstrated with NIRS of pharmaceutical
processes, the results must compare favourably with those of existing conventional
analyses. Once demonstrated, such models could be used as part of an intelligent system
for future process control by NIRS. This ‘smart’ manufacturing system, envisaged by
Hailey (1996), is also referred to as parametric release. This is defined by the European
Organisation for Quality, EOQ 111/91 as:
“A system of release that gives assurance that the product is of the intended quality
based on information collected during the manufacturing process”.
Currently, the only European Pharmacopoeia approved method of parametric release is
for terminally sterilised sterile products (The Industrial Pharmacist, 1999, 10, 9).
However, the potential exists for applying these principles to other pharmaceutical
processes, such as a tabletting process, since the British Pharmacopoeia 1999 states
that: “...parametric release is, in appropriate circumstances, not precluded by the need
to comply with the pharmacopoeia”.
1.3 Aim
The aim of this investigation was to determine the ability of NIR spectroscopy for
statistical process control of a pharmaceutical tablet manufacturing process, from raw
materials through to blends and tablets. The process studied was the manufacture of a
Pfizer Ltd. marketed tablet with which some manufacturing anomalies had been
experienced over a manufacturing period. Samples from the unusual batches and also
from a considerable number of normal batches were obtained from the Quality
Assurance laboratory for retrospective NIR analysis.
Though NIR spectra of solid materials contain considerable chemical and physical
information, the physical aspect, which is largely due to particle size, is often quoted as
being a hindrance to chemical analysis. However, particle size information is useful
because it exerts a major role in determining process performance. Both chemical and
particle size analyses are required of all raw materials prior to manufacture and the
ability to determine both with a single, rapid NIR measurement would be invaluable.
The use of NIRS for chemical identification has been described (Candolfi et al, 1999a,
1999b); however, an NIR method of identification and qualification which includes
particle size analysis has not been described. The ability to both identify raw materials
and to determine their particle size distributions by NIRS was therefore examined
(Chapter 2).
NIRS has also been used for on-line and at-line control of other pharmaceutical process
stages. These include blending applications (Maesschalck et al 1998; Sekulic et al,
1998) and control of tablet film coating (Andersson et al, 1999). At present, the use of
NIRS for statistical process control of an entire pharmaceutical manufacturing process
has not been demonstrated. In this study, the ability of NIRS to allow statistical process
control of an entire pharmaceutical tabletting process was investigated. Multivariate
statistical models of NIR data of blends and tablets were generated and their ability to
enable discrimination of high quality batches from anomalous batches was determined.
Consistency between blend and tablet models was examined as this would indicate
whether NIR measurements form a process performance fingerprint. Multivariate
models were used since these reduced the dimensionality of the data from several
hundred correlated variables to a few orthogonal variables which described the
systematic variance in the data. These models were therefore easier to interpret by
multivariate statistical process control (MSPC), which is in keeping with the concept of
statistical process control (Statistical Process Control, ed. Mamzic, 1995). The
multivariate models examined were ‘model-free’ models derived solely from the NIR
data (Chapter 3) and models derived from both NIR measurements and certificate of
analysis (C. of A.) data (Chapter 4). Comparison of the results of these two model types
was made to determine the necessity of C. of A. data.
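The reduction from many correlated wavelengths to a few orthogonal variables can be illustrated with a minimal sketch of principal components analysis computed by the NIPALS iteration (listed in the abbreviations). The data below are synthetic, and the function name `nipals_pca`, the two Gaussian band profiles and the noise level are illustrative assumptions rather than anything taken from the process data; the sketch is not a reproduction of the models built in Chapter 3.

```python
import math
import random


def nipals_pca(X, n_components, max_iter=500, tol=1e-18):
    """Principal components of X (a list of rows) by the NIPALS iteration.

    Returns (scores, loadings, explained), one entry per component.
    """
    m, n = len(X), len(X[0])
    # Mean-centre each column (wavelength) before modelling.
    means = [sum(row[j] for row in X) / m for j in range(n)]
    E = [[row[j] - means[j] for j in range(n)] for row in X]
    total_ss = sum(v * v for row in E for v in row)

    scores, loadings, explained = [], [], []
    for _component in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(n), key=lambda j: sum(E[i][j] ** 2 for i in range(m)))
        t = [E[i][start] for i in range(m)]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(m)) / tt for j in range(n)]
            p_norm = math.sqrt(sum(v * v for v in p))
            p = [v / p_norm for v in p]  # unit-length loading vector
            t_new = [sum(E[i][j] * p[j] for j in range(n)) for i in range(m)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the variation described by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(n)] for i in range(m)]
        scores.append(t)
        loadings.append(p)
        explained.append(sum(v * v for v in t) / total_ss)
    return scores, loadings, explained


# Synthetic "spectra": two hypothetical overlapping bands plus small noise.
random.seed(0)
n_samples, n_wavelengths = 12, 60
band_a = [math.exp(-((j - 20) / 6.0) ** 2) for j in range(n_wavelengths)]
band_b = [math.exp(-((j - 40) / 9.0) ** 2) for j in range(n_wavelengths)]
X = []
for _sample in range(n_samples):
    ca, cb = random.uniform(0.5, 1.5), random.uniform(0.2, 0.6)
    X.append([ca * a + cb * b + random.gauss(0.0, 0.001)
              for a, b in zip(band_a, band_b)])

scores, loadings, explained = nipals_pca(X, 2)
print("fraction of variance described by each component:", explained)
```

In this sketch, 60 correlated wavelength variables are summarised by two score vectors that are mutually orthogonal, and it is score vectors of this kind that are monitored in multivariate statistical process control.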
A collection of the poor blends was also studied by NIR imaging microscopy
(Chapter 5). This technique was investigated in the hope that it would provide detailed,
spatially resolved chemical and physical information of the blends and thereby enable
diagnosis of reasons for poor blend performance. The NIR image data were subjected to
multivariate analysis to enable this to be achieved.
Overall conclusions of the suitability of NIRS for multivariate statistical process control
of this tablet manufacturing process were drawn after consideration of the pooled results
(Chapter 6).
1.4 Principles of Near Infrared Spectroscopy
The near infrared region of the electromagnetic spectrum was the first non-visible
portion of the electromagnetic spectrum identified, and was discovered by William
Herschel in 1800 (Stark et al, 1986). This region lies between the visible and mid
infrared regions of the electromagnetic spectrum and is defined by the American
Society for Testing and Materials’ (ASTM) Working Group on NIRS as the spectral
region spanning the wavelength range 780 to 2526 nm (Stark et al, 1986). In this region,
absorption of NIR radiation is due to overtones and combinations of fundamental mid
infrared vibration bands (Whitfield, 1986). These produce overlapping absorption
bands, hence visual identification of chemical groupings of a molecule from its NIR
spectrum is more difficult than from the ‘finger-print’ region of its mid infrared
spectrum. The overtone and combination bands are one to three orders of magnitude
weaker than the fundamental bands. This is advantageous for sampling of solids and
liquids, as NIR radiation tends to penetrate further into samples than mid infrared
radiation (Blanco et al., 1998).
For a material to absorb in the infrared region, the incident light must be of sufficiently
high energy to produce vibrational transitions in the molecules of the material. This
means that the frequency of the infrared light should match the fundamental vibration
frequency of a given molecule, and that a change in its dipole moment should occur due
to the fundamental vibration (Blanco et al, 1998).
The vibrational frequency, f, of a diatomic molecule may be assumed to follow the
harmonic oscillator model (Osborne et al, 1993), which obeys Hooke’s law as:
$$f = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}} \qquad (1.4.1)$$
where c is the speed of light in a vacuum, k is the bonding force constant and μ is the
reduced mass (Blanco et al, 1998).
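As a worked illustration of the harmonic oscillator relationship, the following sketch evaluates the frequency (as a wavenumber, since the expression is divided by c) for a single diatomic molecule. The choice of hydrogen chloride, the force constant of about 516 N/m and the isotopic masses are approximate textbook values assumed for the example; they are not taken from this work, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10   # c, expressed in cm/s
ATOMIC_MASS_UNIT_KG = 1.66053907e-27      # kg per atomic mass unit


def reduced_mass_kg(m1_amu, m2_amu):
    """Reduced mass of a diatomic molecule, in kg, from atomic masses in amu."""
    return (m1_amu * m2_amu) / (m1_amu + m2_amu) * ATOMIC_MASS_UNIT_KG


def harmonic_wavenumber(k_newton_per_m, mu_kg):
    """Harmonic oscillator frequency in cm^-1: (1 / (2 pi c)) * sqrt(k / mu)."""
    return math.sqrt(k_newton_per_m / mu_kg) / (2.0 * math.pi * SPEED_OF_LIGHT_CM_PER_S)


# Assumed, approximate values for 1H-35Cl (illustrative only).
mu_hcl = reduced_mass_kg(1.00783, 34.96885)
nu_hcl = harmonic_wavenumber(516.0, mu_hcl)
print(f"reduced mass = {mu_hcl:.4e} kg, harmonic wavenumber = {nu_hcl:.0f} cm^-1")
```

The small reduced mass of a bond to hydrogen places its fundamental at a high wavenumber; doubling the reduced mass in this sketch lowers the result by a factor of √2.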
The variation of the potential energy with bond distance may be described by a parabola
centred about the equilibrium distance and has evenly spaced vibrational energy levels.
Each energy level, Ev, is given by:
$$E_v = f\left(v + \tfrac{1}{2}\right) \qquad (1.4.2)$$
where f is the vibrational frequency and v is the vibrational quantum number. The
selection rule for harmonic oscillator transitions is Δv = ±1, hence the energy difference
between two consecutive energy levels will always be E(v+1) − Ev = f, which is the
‘fundamental frequency’ of the vibration band (Blanco et al, 1998).
In polyatomic molecules, vibrations tend to involve complex movements of their
constituent atoms and in practice, the vibrations tend to be non-harmonic. This is
because real bonds, though elastic, do not exactly obey Hooke’s law due to coulombic
repulsion between nuclei (Osborne et al, 1993). The result of this is that the potential
energy curve for real bonds is only approximately parabolic. Deviation from the
parabola is most pronounced at the upper energy levels, where spacing between energy
levels also decreases with energy level. The harmonic oscillator model may be
improved by adding higher order terms to equation (1.4.2). The energy, Ev, for each
energy level is therefore described by (Blanco et al, 1998):
$$E_v = f_e\left(v + \tfrac{1}{2}\right) - f_e x_e\left(v + \tfrac{1}{2}\right)^2 + h \qquad (1.4.3)$$
where v is the vibrational quantum number, xe is the anharmonicity constant, fe is the
uniform spacing between levels corresponding to a parabola with its centre at the
equilibrium distance and with the same curvature as the real potential energy function, and
h is a higher order term. Neglecting higher order terms, the frequency of a transition
between adjacent energy levels (v → v + 1) is dependent on the vibrational quantum
number (Blanco et al, 1998):
$$f = f_e\left[1 - 2x_e(v + 1)\right] \qquad (1.4.4)$$
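The step from equation (1.4.3) to equation (1.4.4) may be written out as follows. This is a short derivation sketch which assumes that the anharmonic term is quadratic in (v + ½) and that the higher order term h is neglected:

```latex
\begin{align*}
f &= E_{v+1} - E_v \\
  &= f_e\left[\left(v + \tfrac{3}{2}\right) - \left(v + \tfrac{1}{2}\right)\right]
     - f_e x_e\left[\left(v + \tfrac{3}{2}\right)^2 - \left(v + \tfrac{1}{2}\right)^2\right] \\
  &= f_e - f_e x_e\,(2v + 2) \\
  &= f_e\left[1 - 2x_e(v + 1)\right]
\end{align*}
```

The spacing between adjacent levels therefore decreases linearly with v, which is consistent with the narrowing of the energy levels at the upper part of the potential energy curve.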
A consequence of introducing the quadratic term into Hooke’s law is that the selection
rule becomes Δv = ±1, ±2, ..., ±n, where n is an integer. As a result, other higher
frequencies, known as overtones or harmonics, appear in addition to the fundamental
band, at frequencies approximately n times greater than the fundamental frequencies.
The intensity of the overtones decays abruptly since transition probability falls rapidly
with increase in vibrational quantum number. In practice, just the first two to three
overtones are observable (Osborne et al, 1993). Polyatomic molecules may possess
several fundamental frequencies, and therefore will show simultaneous changes in the
energies of two or more vibrational modes. The frequency observed will either be the
sum of, or the difference between, fundamental frequencies. The result is very weak
bands which are known as ‘combination’ and ‘difference’ bands (Osborne et al, 1993).
Combination bands are unlikely to be observed in NIR spectra unless they arise from
two vibrations, linked either through a common atom or through several bonds;
difference bands arise from absorption of molecules which are in an excited state, and
have a very low probability of being observed in NIR spectra at room temperature
(Osborne et al, 1993). Anharmonicity produces combination bands at frequencies which are
slightly lower than the sum of the fundamental frequencies involved.
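The positions of the overtones relative to the fundamental can be illustrated numerically. The sketch below uses the anharmonic energy expression with higher order terms neglected, for which the transition from v = 0 to v = n lies at fe·n·[1 − xe(n + 1)]. The values fe = 3000 cm⁻¹ and xe = 0.02 are hypothetical, chosen only to resemble a bond to hydrogen, and the function names are assumptions made for the example.

```python
NIR_RANGE_NM = (780.0, 2526.0)  # ASTM wavelength range quoted in the text


def transition_wavenumber(f_e, x_e, n):
    """Wavenumber (cm^-1) of the v = 0 -> v = n transition, neglecting h."""
    return f_e * n * (1.0 - x_e * (n + 1))


def to_nanometres(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm


f_e, x_e = 3000.0, 0.02  # hypothetical values for illustration
for n in (1, 2, 3, 4):
    nu = transition_wavenumber(f_e, x_e, n)
    wavelength = to_nanometres(nu)
    in_nir = NIR_RANGE_NM[0] <= wavelength <= NIR_RANGE_NM[1]
    name = "fundamental" if n == 1 else f"overtone {n - 1}"
    print(f"{name:12s} {nu:8.1f} cm^-1  {wavelength:7.1f} nm  in NIR region: {in_nir}")
```

With these assumed values the fundamental falls in the mid infrared region, whereas the overtones fall within the NIR range, each at slightly less than n times the fundamental wavenumber.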
Most NIR bands arise due to overtones and combinations of vibrations of bonds to hydrogen,
e.g. C-H, N-H, O-H and S-H. These overtones and combinations are
observed in the NIR region due to the small mass of hydrogen and the large force constants of
these bonds (Blanco et al, 1998). Other groups, such as C=O, C-C, C-F and C-Cl, exhibit
very weak overtone bands in the NIR region, which in practice may be difficult to
observe.
1.5 Diffuse Reflectance Spectroscopy
NIR light which penetrates a powder’s surface is scattered by particles many times
before emerging back through the surface. This is known as diffuse reflectance. In the
NIR region, solid materials tend to exhibit low molar absorptivity (typically 0.01 to 0.1
mol⁻¹ dm³ cm⁻¹) (Blanco et al, 1998). This enables NIR diffuse reflectance
measurements to be made of solid materials.
1.5.1 Kubelka-Munk Theory of Diffuse Reflectance
The most widely accepted theory which describes diffuse reflectance and the
transparency of light-scattering and absorbing layers of solid materials is the Kubelka-
Munk theory (Frei and MacNeil, 1973). This theory was developed for infinitely thick
opaque layers, and may be written as:
\[ F(R'_\infty) = \frac{(1 - R'_\infty)^2}{2R'_\infty} = \frac{k}{s} \tag{1.5.1} \]
where R'∞ is the absolute reflectance of the layer, k is its molar absorption coefficient,
and s is the scattering coefficient. Instead of determining R'∞, it is usual to work with
the more convenient relative diffuse reflectance, R∞, which is measured against a
standard typically made of either MgO, BaSO4, polytetrafluoroethylene (PTFE) or
ceramic. For the standard, k is assumed to have a value of zero and the absolute reflectance
is assumed to be one. However, since the absolute reflectance of standards exhibiting
the highest R'∞ values never exceeds 0.98 to 0.99, the actual relationship becomes:
\[ \frac{R'_{\infty,\text{sample}}}{R'_{\infty,\text{standard}}} = R_\infty \tag{1.5.2} \]
and it is essential to specify the standard used (in Section 2, the diffuse reflectance
standards used were Spectralon (PTFE) (Labsphere Inc., North Sutton, NH, USA) and a
ceramic tile; in Section 3, the diffuse reflectance standard used was a ceramic tile).
Equation (1.5.1) therefore approximates to:
\[ F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{k}{s} \tag{1.5.3} \]
which shows that a linear relationship should be observed between F(R∞) and the
absorption coefficient, k, provided that s remains constant. The scattering coefficient, s,
is rendered independent of wavelength by using particles whose size is large in relation
to the wavelength used.
Reflectance measurements of a sample, diluted with a non- or low-absorbing powder,
which are measured against the pure powder, have an absorption coefficient which may
be re-written as the product 2.303εc, where ε is the molar absorptivity and c is the
molar concentration. The Kubelka-Munk equation (1.5.3) may then be written as:
\[ F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{c}{k'} \tag{1.5.4} \]
where k' is a constant equal to s/2.303ε. As F(R∞) is proportional to the molar
concentration under constant conditions, the Kubelka-Munk relationship is analogous to
the Beer-Lambert law of absorption spectrophotometry. At sufficiently high
dilution, the regular reflection from the sample approximates that from the reflectance
standard and is therefore cancelled out in any comparison measurement.
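As a minimal illustration of equation (1.5.3), the sketch below evaluates the Kubelka-Munk function for a few relative reflectance values. The function name `kubelka_munk` and the reflectance values are hypothetical and are not taken from this work.

```python
def kubelka_munk(reflectance):
    """Return F(R) = (1 - R)^2 / (2R) for a relative diffuse reflectance R in (0, 1]."""
    if not 0.0 < reflectance <= 1.0:
        raise ValueError("reflectance must lie in the interval (0, 1]")
    return (1.0 - reflectance) ** 2 / (2.0 * reflectance)


# Hypothetical reflectance readings: lower reflectance gives a larger F(R).
for r in (0.9, 0.7, 0.5):
    print(f"R = {r:.1f}  F(R) = {kubelka_munk(r):.4f}")
```

Under the assumptions stated in the text (constant s, weak absorption), F(R∞) computed this way would be expected to scale linearly with concentration.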
A linear relationship between F(R∞) and c is only observed when dealing with weakly
absorbing substances (such as powdered pharmaceutical materials in the NIR region)
and only when the particle size used is relatively small (ideally around 1 µm in
diameter) (Kortum, 1969). In addition, any significant departure from the state of
infinite thickness of the adsorbent layer assumed in the derivation of equation (1.5.3)
results in background interference, which in turn causes non-ideal diffuse reflectance.
When either adsorbents with large particle size or large concentrations of the absorbing
species are used, plots of F(R∞) versus c become markedly non-linear at higher
concentrations. Kortum (1969) attributes this reflectance phenomenon to a combination
of both regular and diffuse reflectance. Regular reflection is a mirror reflection, whereas
diffuse reflection occurs when impinging radiation is partly absorbed and partly
scattered by a material such that it is reflected in a diffuse manner, i.e. with no defined
angle of emergence. Regular reflectance is described by the Fresnel equation (Kortum,
1969):
\[ R_{\text{reg}} = \frac{I_{\text{refl}}}{I_0} = \frac{(n - 1)^2 + n^2 k^2}{(n + 1)^2 + n^2 k^2} \tag{1.5.5} \]
where k is the absorption coefficient and n is the refractive index. Regular reflectance is
superimposed on diffuse reflectance and this is postulated to be the cause of deviation
from linearity between F(R∞) and c at high concentrations of the absorbing species. It is
therefore recommended that interference caused by regular reflection, R_reg, is eliminated
as much as possible. This may be achieved by using powders with small particle size
and by diluting the absorbing species with suitable diluents.
Other theories developed to describe diffuse reflectance have been shown to be special
cases or adaptations of the Kubelka-Munk theory (Kortum, 1969). A summarised theory
and derivation of the Kubelka-Munk function, which is applicable to opaque infinitely
thick powders (approximately 1 mm in thickness, or greater, for fine powders), has been
described (Kortum, 1969).
In practice, however, NIR spectroscopic data tend to be used as either raw data (relative
reflectance which is henceforth abbreviated to reflectance, R; relative transmission
which is henceforth abbreviated to transmission, T) or as apparent absorbance, A
(Osborne et al, 1993):
\[ A = \log_{10}\left(\frac{1}{R}\right) = a'c \tag{1.5.6} \]
where c is concentration and a' is a proportionality constant. Equation (1.5.6) applies to
diffuse reflectance measurements and similarly, with transmission measurements,
transformation to apparent absorbance, A, is given by:
\[ A = \log_{10}\left(\frac{1}{T}\right) = a'c \tag{1.5.7} \]
Though equations (1.5.6) and (1.5.7) are not based on the Kubelka-Munk theory, highly
satisfactory results are often obtained with these transformations in NIR spectroscopic
applications.
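The transformations in equations (1.5.6) and (1.5.7) can be sketched in a few lines. The function name `apparent_absorbance` and the example values are illustrative assumptions only.

```python
import math


def apparent_absorbance(signal):
    """Return A = log10(1 / signal) for a relative reflectance R or transmission T."""
    if signal <= 0.0:
        raise ValueError("reflectance or transmission must be positive")
    return math.log10(1.0 / signal)


# Hypothetical relative reflectance values at three wavelengths.
reflectance = [0.80, 0.55, 0.30]
absorbance = [apparent_absorbance(r) for r in reflectance]
print([round(a, 4) for a in absorbance])
```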
1.6 Near Infrared Spectral Data Pre-processing
The NIR spectral data obtained from diffuse reflectance and transmission measurements
of solid materials comprise chemical information, described in Section 1.5, and physical
information arising due to multiple scatter. This latter effect results in spectral offset and
curvature or ‘non-uniform’ baselines. For chemical constituent identification and
quantification, this physical aspect of the spectrum may not be considered useful.
Various mathematical transformations have been developed to eliminate the effect of
multiple scatter from the spectra. These are applied to individual NIR spectra prior to
quantitative or qualitative data analysis. This is frequently referred to as ‘data
pre-processing’. Commonly applied pre-processing transformations include: absorbance
(described in Section 1.5); standard normal variate (SNV) (Barnes et al, 1989);
multiplicative scatter correction (MSC) (Ilari et al, 1988); quadratic baseline detrend
(DT) (Barnes et al, 1989) and first and second derivatives (Advances in Near-Infrared
Measurements, ed. Patonay, 1993).
The mathematical details for transforming reflectance and transmission measurements
to apparent absorbance were described in Section 1.5, equations (1.5.6) and (1.5.7)
respectively.
SNV transformation of an NIR spectrum is applied via the equation:
\[ y_j = \frac{r_j - \mu}{\sigma} \tag{1.6.1} \]
where y and r are the SNV transformed and original signals, respectively, at wavelength
λj, and μ and σ are the mean signal of the original spectrum and the standard deviation of
the spectrum respectively, over all J wavelengths. The effect of SNV transformation on
NIR spectra of materials of identical chemical composition but different particle size is
removal of most spectral offset and considerable reduction in scatter-induced variation
between spectra of the same material.
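The SNV transformation of equation (1.6.1) may be sketched as follows. The function name `snv` and the example spectra are hypothetical, and the use of the sample standard deviation (J - 1 denominator) is an assumption of this sketch.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Centre a single spectrum on its own mean and scale by its own standard deviation."""
    mu = mean(spectrum)
    sigma = stdev(spectrum)  # assumption: sample standard deviation (J - 1 denominator)
    return [(r - mu) / sigma for r in spectrum]


# Two hypothetical spectra that differ only by an offset and a multiplicative factor,
# mimicking a particle size effect on the same material.
fine = [0.20, 0.40, 0.90, 0.50]
coarse = [2.0 * r + 1.0 for r in fine]
print([round(v, 3) for v in snv(fine)])
print([round(v, 3) for v in snv(coarse)])
```

Because SNV operates on each spectrum individually, the two transformed spectra coincide even though the raw spectra do not.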
MSC is a scatter correction transformation which produces results similar to SNV
(Dhanoa et al, 1994). However, this method of scatter correction requires that the mean
spectral response of a data set be calculated. Individual NIR spectra are then linearly
regressed on the mean spectrum according to the following equation:
\[ y_j = a + b\,\bar{r}_j + e_j \tag{1.6.2} \]
where y is the original signal at wavelength λj, j = 1, ..., J wavelengths, a is the offset
of the regression equation, b is the slope of the linear least
squares regression, r̄ is the mean spectral response of the data set at wavelength λj, and e
is the residual signal at wavelength λj. The MSC transformed spectrum is obtained by
subtraction of the offset constant, a, from the spectral response at each wavelength,
followed by division by the slope term, b, at each wavelength.
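A minimal sketch of the MSC procedure described above is given below, using ordinary least squares of each spectrum on the mean spectrum. The function name `msc` and the example data are illustrative assumptions, not the implementation used in this work.

```python
def msc(spectra):
    """Apply multiplicative scatter correction to a list of equal-length spectra."""
    n_wl = len(spectra[0])
    # Mean spectral response of the data set at each wavelength.
    mean_spec = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    m_bar = sum(mean_spec) / n_wl
    sxx = sum((m - m_bar) ** 2 for m in mean_spec)
    corrected = []
    for s in spectra:
        s_bar = sum(s) / n_wl
        # Slope b and offset a of the regression of this spectrum on the mean spectrum.
        b = sum((m - m_bar) * (y - s_bar) for m, y in zip(mean_spec, s)) / sxx
        a = s_bar - b * m_bar
        corrected.append([(y - a) / b for y in s])
    return corrected


# Hypothetical spectra: the second is an offset and scaled copy of the first.
base = [0.10, 0.30, 0.20, 0.60]
spectra = [base, [2.0 * r + 0.5 for r in base]]
for row in msc(spectra):
    print([round(v, 3) for v in row])
```

Unlike SNV, this correction depends on the data set through its mean spectrum, so adding or removing spectra changes the result.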
DT is another popular scatter correction method which is commonly applied to NIR
spectra. The transformation involves fitting the original spectrum to a second order
polynomial:
\[ y_j = a + b\lambda_j + c\lambda_j^2 + e_j \tag{1.6.3} \]
where y is the original spectral response at wavelength λj, j = 1, ..., J wavelengths, a
represents the spectral offset, b and c are the coefficients of the quadratic least squares
equation, and e is the residual signal at wavelength λj. An estimate of the spectral
baseline is obtained from the first three terms on the right hand side of equation (1.6.3):
\[ \hat{y}_j = a + b\lambda_j + c\lambda_j^2 \tag{1.6.4} \]
The estimated baseline is subtracted from the original spectrum to produce the residual
or ‘detrend’ spectrum. This scatter correction removes both spectral offset and curvature
from the original spectra.
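The detrend step may be sketched with a quadratic least squares fit. This is an illustrative sketch only: the function name `detrend`, the wavelength grid and the synthetic spectrum are assumptions, and NumPy's `polyfit` is used as a convenient least squares routine.

```python
import numpy as np


def detrend(wavelengths, spectrum):
    """Subtract a fitted second order polynomial baseline from a spectrum."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = np.polyfit(wavelengths, spectrum, deg=2)  # quadratic least squares fit
    baseline = np.polyval(coeffs, wavelengths)         # estimated baseline, as in (1.6.4)
    return spectrum - baseline


# Hypothetical spectrum: a curved baseline plus one narrow absorption band.
wl = np.linspace(1100.0, 2500.0, 50)
curved_baseline = 0.5 + 1e-4 * wl + 2e-8 * wl**2
band = 0.05 * np.exp(-((wl - 1900.0) / 40.0) ** 2)
residual = detrend(wl, curved_baseline + band)
print(float(residual.max()), float(wl[residual.argmax()]))
```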
The high signal to noise ratio of NIR spectral data enables derivative spectra to be
calculated by the difference method. The procedure to calculate difference derivative
spectra involves calculating the difference between spectral data points at evenly spaced
wavelengths. Transformation of an NIR spectrum to its first derivative may be
calculated according to:
where y is the first derivative of the original signal at λj and r is the original signal at λj.
The number of data points used in the smoothing may be varied depending upon the
required amount of smoothing.
The second derivative difference spectrum may be calculated in a similar fashion to the
first derivative difference spectrum, according to:
= (%.. - )/ 2 ( 1 .6 .6 )
Derivative spectra may also be calculated by applying a least squares digital polynomial
smoothing filter, such as a Savitzky-Golay smoothing filter (Bromba, M. and Ziegler,
H., 1981, 1983). The latter method of calculation of the derivative spectrum is better
able to preserve peak heights than the difference method. Calculation of the second
derivative spectrum largely eliminates spectral offset and baseline curvature which
result from multiple scatter. In addition, chemical peaks in the spectra are resolved. If
the original spectral data is reflectance, the second derivative peaks have a positive
value; if the original spectral data is absorbance, the second derivative peaks have
negative values. The Savitzky-Golay method for calculating second derivative spectra was
used (quadratic, 11 data point: Chapters 2, 3 and 4; quadratic, 15 data point: Chapter 5).
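A Savitzky-Golay second derivative with a quadratic polynomial and an 11 point window can be sketched as below. This is not the software used in this work: the function name `savgol_second_derivative` is hypothetical, the filter weights are derived here from a least squares polynomial fit, and the end points of the spectrum are simply dropped.

```python
import numpy as np


def savgol_second_derivative(spectrum, window=11, spacing=1.0):
    """Second derivative from a local quadratic least squares fit (window must be odd)."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns 1, z, z^2 for the local polynomial fit.
    design = np.vander(offsets, 3, increasing=True).astype(float)
    # Row 2 of the pseudo-inverse yields the z^2 coefficient; its double is the 2nd derivative.
    weights = 2.0 * np.linalg.pinv(design)[2] / spacing**2
    # Reversing the weights turns the convolution into a sliding dot product.
    return np.convolve(np.asarray(spectrum, dtype=float), weights[::-1], mode="valid")


# Check on a synthetic parabola, whose second derivative is exactly 2 everywhere.
x = np.arange(30.0)
d2 = savgol_second_derivative(x**2)
print(d2.shape, float(d2.min()), float(d2.max()))
```

A constant offset or a linear slope in the input gives a zero second derivative, which is the baseline-removing property described above.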
1.7 Multivariate Analysis
Multivariate analysis methods are commonly employed in NIR spectrometry. Typical
techniques used include principal components analysis (PCA) and partial least squares
regression (PLSR). These methods are used to reduce the dimensionality of the often
highly collinear spectral data, from several hundred variables (wavelengths), to a set of
new variables in reduced dimensional space. PCA is a useful technique that allows
exploratory data analysis and qualitative analysis of a spectral data set. PLSR is a biased
regression method that produces a set of new latent variables that maximise the
covariance between the spectral data and a set of reference data. With both of these
latent variable techniques, the various pre-processing transformations, described in
Section 1.6, may be applied to the spectral data, and their effect on the qualitative and
quantitative models produced determined.
1.7.1 Principal Components Analysis
As NIR spectra are highly collinear, it is often advantageous to transform them to their
principal components via PCA. This is a multivariate data reduction method, which
produces linear combinations of the original variables. These may be thought of as a set
of new variables that have the property of being uncorrelated (Jackson, 1991). The PCA
transformation is a two step transformation (Kirsch and Drennen, 1995). In the first
step, the Cartesian co-ordinate system defined in multidimensional space is translated to
the centre of a spectral cluster. This is achieved through variable-wise mean centring
of the spectral data. The next step is the rotation of the Cartesian co-ordinate system to
describe, as nearly as possible, all the variations present in the spectral cluster. The
co-ordinate system remains rectangular throughout the process and is moved rigidly from
one position to the next. This rotation step decomposes the spectral variation into
orthogonal (independent) components.
In the process of calculating each principal axis, the perpendicular distances between
the spectral data points and each axis are minimised. Hence, the first principal
component (PC) describes the largest amount of spectral variation. This is then
effectively subtracted from the data cluster. The second principal component is defined
in a similar manner to the first, except that during the rotation to the second principal
axis, the second axis remains perpendicular to the first. This orthogonal condition forces
each component to account for the maximum spectral variation remaining in the cluster.
Typically, only a small number of these iterations are required before additional
components account for random noise. These components are therefore usually ignored.
The transformation process ultimately produces spectral co-ordinates that are expressed
in a reduced dimensional space and effectively overcomes the problems of collinearity
and noise. The transformation to principal axes simplifies the selection of variables for
regression in quantitative analysis (variables may be added to or deleted from regression
equations without changing the coefficients of the remaining variables).
Mathematically, PCA decomposition of the variable-wise mean centred spectral data, X,
produces a score matrix, T, a loadings matrix, P, and a residuals matrix, E:
\[ X = TP^{\mathsf{T}} + E \tag{1.7.1} \]
The loadings matrix comprises a loadings vector for each extracted PC. Each of these is
a set of weights that identifies the variables (wavelengths) which contribute most to
that PC. As a result, it may be possible to assign physical or chemical interpretation to a
PC’s loadings vector. For example, the loadings may represent chemical peaks. The PC
score vector of each spectral observation is produced by the product of the original
spectrum with the loadings vector, and therefore will have score values for each PC
dimension that are related to the magnitude of each of the principal components in the
original spectra.
The PC score value may therefore reflect the amount of a chemical or
physical constituent present in the sample. PC scores are therefore useful in cluster
analysis, principal components regression and statistical process control techniques.
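The decomposition in equation (1.7.1) can be sketched with a singular value decomposition. This is one of several equivalent ways to compute PCA and is not necessarily the algorithm used in this work; the function name `pca` and the random example data are assumptions.

```python
import numpy as np


def pca(spectra, n_components):
    """Return scores T, loadings P and residuals E of the mean centred data."""
    spectra = np.asarray(spectra, dtype=float)
    X = spectra - spectra.mean(axis=0)        # variable-wise mean centring (translation)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T                   # loadings: one column per PC (rotation)
    T = X @ P                                 # scores: projection onto the loadings
    E = X - T @ P.T                           # residuals left unexplained by the model
    return T, P, E


# Hypothetical data set: 10 "spectra" measured at 5 "wavelengths".
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 5))
T, P, E = pca(data, n_components=2)
print(T.shape, P.shape, E.shape)
```

The score vectors produced this way are mutually uncorrelated and the loading vectors are orthonormal, which is the property exploited in the regression and process control methods described later.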
1.7.2 PCA Model Rank Determination
The number of principal components extracted from the spectral data may be selected
according to a number of criteria. One simple method for estimating the number of
useful components, the model rank (i.e. those that account for systematic variance),
involves visual inspection of the loading vectors. PCs with loadings which appear to
represent noise, may be discarded. Another popular method is to calculate the
percentage sum of squares accounted for by the model. This involves calculation of the
sum of squares (%SS) of the original variable-wise mean centred data, X, and of the
residual matrix, E, after n PCs have been extracted, over all observations, m, and
variables, n:
\[ \%SS = 100 \times \left( 1 - \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} e_{ij}^2}{\sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij}^2} \right) \tag{1.7.2} \]
Where this method is used to determine the number of PCs to retain, the process of
extracting PCs is terminated after an arbitrary %SS has been reached. Typically this
may be 95%SS or 99%SS.
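The %SS criterion can be sketched using the singular values of the mean centred data, since the sum of squares explained by each PC equals its squared singular value. The function name `rank_by_percent_ss` and the small example matrix are hypothetical.

```python
import numpy as np


def rank_by_percent_ss(spectra, threshold=95.0):
    """Return the smallest number of PCs whose cumulative %SS reaches the threshold."""
    spectra = np.asarray(spectra, dtype=float)
    X = spectra - spectra.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    explained = 100.0 * np.cumsum(s**2) / np.sum(s**2)  # cumulative %SS per model rank
    rank = int(np.searchsorted(explained, threshold) + 1)
    return rank, explained


# Hypothetical data with one dominant and one minor direction of variation.
demo = np.array([[3.0, 0.0, 0.0],
                 [-3.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, -1.0, 0.0]])
rank, explained = rank_by_percent_ss(demo, threshold=95.0)
print(rank, np.round(explained, 1))
```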
Cross-validation is another popular method for determining the rank of a PC model. The
process involves randomly dividing a data set into a number of sub-groups. A PCA
model is calculated for the data set after removal of one of the sub-groups. The
withheld sub-group is then projected onto the eigenvectors of the model, according to:
\[ T = XP \tag{1.7.3} \]
The resultant scores of this sub-group are then used to estimate the original data of the
sub-group according to:
\[ \hat{X} = TP^{\mathsf{T}} \tag{1.7.4} \]
This procedure is repeated for all subgroups, and for n PCs. The predicted residual error
matrix for all observations in a sub-group is calculated according to:
\[ E = X - \hat{X} \tag{1.7.5} \]
The predicted residual error sum of squares (PRESS) for n PCs and all sub-groups is the
sum of the squared residual error matrix, E, in equation (1.7.5), over all observations, m
and variables, n, divided by the number of elements in the original spectral data set. For
zero PCs extracted, the value of PRESS used is the sum of squares of the original
spectral data, X, divided by the number of elements. The rank of the PCA model is the
number of PCs which provide the smallest PRESS value (Jackson, 1991).
Another statistic used to determine the rank of a PCA model is the R statistic (Lindberg
et al, 1983; Wold, 1978). This involves calculation of PRESS, as in equation (1.7.5). In
addition, PCA models are calculated with n PCs, using all of the spectral data. The
residual error sum of squares (RSS) is the sum of the squares of E, from equation (1.7.5), over
all observations, m, and variables, n, divided by the number of elements in the spectral
data. The R statistic for each PC is then calculated according to:
\[ R = \frac{PRESS(n + 1)}{RSS(n)} \tag{1.7.6} \]
where n is the number of PCs extracted. The first value for n is zero, and the RSS for
zero PCs extracted is the sum of squares of the spectral data, X, over all observations, m,
and over all wavelengths, n, divided by the number of elements in the spectral data.
With successive PCs extracted, the value of R should increase from near zero to unity.
Extraction of PCs terminates at n PCs when R exceeds unity. This implies that the latest
component extracted did not improve the prediction error relative to the previous component,
hence n − 1 PCs is the rank of the model. The W statistic (Eastment and Krzanowski,
1982) is another PCA cross-validation statistic. It is calculated according to:
\[ W = \frac{\left[ PRESS(n - 1) - PRESS(n) \right] / D_n}{PRESS(n) / D_r} \tag{1.7.7} \]
where
\[ D_n = m + p - 2n \tag{1.7.8} \]
\[ D_r = p(m - 1) - \sum_{i=1}^{n} (m + p - 2i) \tag{1.7.9} \]
and m is the total number of observations and p is the number of variables (wavelengths).
With successive PCs extracted, the value of W should fall. PCs are included in the
model up to the number, n, which return values of W greater than or equal to unity, with
additional PCs returning values for W less than unity.
1.7.3 Q Statistic PCA Residual Analysis
PCA models of NIR spectral data may be used for subsequent qualification of
unclassified NIR spectra, and for detection of outliers, using the Q statistic (Jackson,
1991; MacGregor et al, 1994; Nomikos and MacGregor, 1994). This statistic gives a
measure of the distance of the observations from the n-dimensional space and is
calculated as (Jackson, 1991):
\[ Q = (x - \hat{x})^{\mathsf{T}} (x - \hat{x}) \tag{1.7.10} \]
where x represents the original observation and x̂ is the value of the observation
predicted by the PCA model. The critical value for Q may be calculated as (Jackson,
1991):
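\[ Q_\alpha = \theta_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^2}}{\theta_1} + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + 1 \right]^{1/h_0} \tag{1.7.11} \]
\[ \theta_1 = \mathrm{Tr}(E) \tag{1.7.12} \]
\[ \theta_2 = \mathrm{Tr}(E^2) \tag{1.7.13} \]
\[ \theta_3 = \mathrm{Tr}(E^3) \tag{1.7.14} \]
where E is the residual covariance matrix after n PCs have been extracted and:
\[ h_0 = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^2} \tag{1.7.15} \]
\[ c = \frac{\theta_1 \left[ (Q/\theta_1)^{h_0} - 1 - \theta_2 h_0 (h_0 - 1)/\theta_1^2 \right]}{\sqrt{2\theta_2 h_0^2}} \tag{1.7.16} \]
Here c is approximately normally distributed with zero mean and unit variance, and c_α is
the normal deviate corresponding to the upper (1 − α) percentile.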
Observations whose Q values exceed the upper limit do not belong to the modelled
class. Such observations should be removed from the data set and the model
re-calculated.
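The Q statistic and a critical value of the form given by Jackson (1991) may be sketched as follows. The sketch assumes that the eigenvalues of the residual covariance matrix are available, so that each θ is a sum of powers of those eigenvalues (which equals the corresponding trace). The function names `q_statistic` and `q_limit`, and all numerical values, are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def q_statistic(x_centred, P):
    """Squared distance between a mean centred observation and its PCA reconstruction."""
    x_centred = np.asarray(x_centred, dtype=float)
    residual = x_centred - P @ (P.T @ x_centred)
    return float(residual @ residual)


def q_limit(residual_eigenvalues, alpha=0.05):
    """Critical value of Q from the eigenvalues of the residual covariance matrix."""
    lam = np.asarray(residual_eigenvalues, dtype=float)
    th1, th2, th3 = (float(np.sum(lam**k)) for k in (1, 2, 3))
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2**2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper (1 - alpha) normal deviate
    term = (c_alpha * np.sqrt(2.0 * th2 * h0**2) / th1
            + th2 * h0 * (h0 - 1.0) / th1**2 + 1.0)
    return float(th1 * term ** (1.0 / h0))


# Hypothetical one-PC model in three variables; the loading is the first axis.
P = np.array([[1.0], [0.0], [0.0]])
print(q_statistic([2.0, 3.0, 4.0], P))
print(q_limit([0.5, 0.3, 0.2], alpha=0.05), q_limit([0.5, 0.3, 0.2], alpha=0.01))
```

An observation would be flagged as not belonging to the modelled class when its Q value exceeds the limit returned by `q_limit`.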
The confidence interval (or control limit) for Q requires calculation of the square and
third power of the residual covariance matrix, which is computationally lengthy. An
alternative and computationally faster method of calculating the control limits has been
described (Nomikos and MacGregor, 1995). This method approximates the squared
residuals, Q, to a weighted chi-squared distribution (gχ²_h). The weight (g) and the
degrees of freedom (h) are both functions of the eigenvalues of Σ. Estimation of g and h
is based on matching moments between a gχ²_h distribution and the reference distribution
of Q. The mean and variance of the gχ²_h distribution (μ = gh, σ² = 2g²h) are equated to
the sample mean (m) and variance (v) of the Q sample. Previous studies (Nomikos and
MacGregor, 1995) have found this to be a quick and reliable method to estimate g and h
provided that the number of Q observations is sufficiently large. The control limit on Q
at significance level α for batch k is given by (Nomikos and MacGregor, 1995):
\[ Q_\alpha = \frac{v}{2m}\, \chi^2_{2m^2/v,\,\alpha} \tag{1.7.17} \]
where χ²_{2m²/v, α} is the critical value of the chi-squared variable with 2m²/v d.f. at
significance level α.
1.7.4 Partial Least Squares Regression
Partial least squares regression (PLSR) is similar to principal components regression,
and provides a set of latent vectors that are analogous to PCs (Kirsch and Drennen,
1995). Partial least squares (PLS) attempts to summarise as much of the variation in the
dependent variables as possible using only the relevant factors contained in the spectral
data. PLS has the advantage that it allows for measurement error in both the spectral and
reference data sets, whilst modelling the spectral data and correlating to the reference
data (chemical or physical). Only significant PLS components are retained, which
provides a noise reducing effect (Kirsch and Drennen, 1995).
The PLS procedure projects the information in the high-dimensional data spaces (X, Y)
down onto low-dimensional spaces defined by a small number of latent variables. The
NIR (X) and reference analytical (Y) data sets are usually mean-centred and scaled to
unit variance and then decomposed as (MacGregor et al, 1994):
\[ X = \sum_{a=1}^{A} t_a p_a^{\mathsf{T}} + E \tag{1.7.18} \]
\[ Y = \sum_{a=1}^{A} t_a q_a^{\mathsf{T}} + F \tag{1.7.19} \]
where the latent vectors t_a are sequentially computed from the data for each PLS
dimension (a = 1, 2, ..., A) such that the linear combination of the x vectors defined by
the latent variable:
\[ t_a = X w_a \tag{1.7.20} \]
and the linear combination of y vectors defined by the latent variable:
\[ u_a = Y q_a \tag{1.7.21} \]
maximise the covariance between X and Y that is explained at each dimension. The
vectors w_a and q_a are loading vectors whose elements w_aj and q_aj express the contribution
of each variable x_j and y_j, respectively, towards defining the new latent variables t_a and
u_a.
The predictive PLS model is a biased regression model:
\[ Y = X\beta + F \tag{1.7.22} \]
where the matrix of regression coefficients is given by:
\[ \beta = W (P^{\mathsf{T}} W)^{-1} Q^{\mathsf{T}} \tag{1.7.23} \]
where W, P, and Q are (k×A), (k×A) and (m×A) matrices whose columns are the vectors
w_a, p_a, and q_a. The number of PLS dimensions, A, required to extract the information
from X and Y and provide the lowest prediction error in Y is usually determined by
cross validation. This is performed in a similar way to PCA, by randomly dividing
observations in X and their corresponding reference values in Y into subgroups. One
sub-group from X and its corresponding observations in Y are withheld from calculation
of the PLS model. Equation (1.7.22) is then used to predict the reference analytical
values of the spectral sub-group. The sum of squares of the differences in the predicted
and measured reference sub-group are calculated, over all observations, m, and PLS
components, a. The process is repeated for all subgroups, and the sum of squared
residuals of each sub-group summed over all sub-groups, to provide the value of
PRESS. The number of PLS components retained, a, is that value with the minimum
PRESS value.
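For a single reference variable, the regression coefficients of equation (1.7.23) can be sketched with the NIPALS form of PLS. This is a simplified illustration and not the software used in this work: the function name `pls1` is hypothetical, the data are mean-centred but not scaled to unit variance, and the number of components is fixed rather than chosen by cross validation.

```python
import numpy as np


def pls1(X, y, n_components):
    """Return PLS regression coefficients for one y variable, plus the centring terms."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    k = X.shape[1]
    W = np.zeros((k, n_components))
    P = np.zeros((k, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector w_a
        t = Xr @ w                    # latent vector t_a = X w_a
        tt = t @ t
        p = Xr.T @ t / tt             # X loading p_a
        qa = yr @ t / tt              # y loading q_a
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - qa * t              # deflate y
        W[:, a], P[:, a], q[a] = w, p, qa
    beta = W @ np.linalg.solve(P.T @ W, q)  # beta = W (P'W)^-1 q
    return beta, x_mean, y_mean


# Hypothetical noise-free data: with all three components PLS reproduces the true model.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0
beta, x_mean, y_mean = pls1(X_demo, y_demo, n_components=3)
print(np.round(beta, 6))
```

Predictions for new spectra would then be formed as `(X_new - x_mean) @ beta + y_mean`.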
Similar to PCA, the PLS loading vectors associated with the spectral data set, P, may
have physical or chemical interpretation. Their score values, T, and sum of squared
residuals, Q, may also be used in cluster analysis and statistical process control
techniques as described for PCA.
1.8 Multivariate Statistical Process Control
Statistical process control (SPC) is a technique which employs a related set of
statistically based tools for monitoring, analysing, controlling and effecting
improvements in the performance of a process (Statistical Process Control, ed. Mamzic,
1995). SPC is a proven technique that is capable of producing dramatic results, and is
widely applicable to many processes where the output varies and where minimising
variability would improve operation.
The original concepts of SPC were devised by Walter A. Shewhart in the early 1920s
(Statistical Process Control, ed. Mamzic, 1995). Importantly, he observed that when a
process remains in a state of statistical control, the random distribution of each output
variable is repeatable and is therefore predictable from one period to another. In this
state, a process is said to be affected only by ‘common’ causes of variation. These are
random, uncontrollable phenomena that are inherent in the process. This observation led
to development of the Shewhart control chart. This chart plots sampled data with respect
to time, in a form which is visually easy to interpret and thereby determine whether a
process is operating normally (i.e. where only common causes of variation are
affecting the process), or whether assignable causes have affected the process and
moved it out of control. Periodically, a number of measurements of the process are
made and the average value plotted on a chart. The chart also shows upper and lower
control limits which are based on the standard deviation of the process variable when
only common causes of variation are affecting the process. Establishment of these
control limits is referred to as control phase 1. Control phase 2 monitoring involves
monitoring and control of future process observations using the control phase 1 limits.
(Statistical Process Control, ed. Mamzic, 1995, is located in References under the title
Statistical Process Control.)
With multivariate process measurements, it is often desirable to simultaneously monitor
and control a number of process characteristics. This is known as multivariate quality
control or multivariate statistical process control (MSPC) (Montgomery, 1997). Original
work in this area of SPC was carried out by Hotelling in 1947 in analysis of bombsight
data. With multivariate data, however, use of individual monitoring control charts for
each variable is associated with an increase in the overall type I error, α. This is the
probability of rejecting the null hypothesis (i.e. that the process is operating in a state of
statistical control) when it is correct. With a set of n independent measured and
controlled variables, the overall type I error, α', is calculated according to (Montgomery,
1997):
\[ \alpha' = 1 - (1 - \alpha)^n \tag{1.8.1} \]
However, if the n simultaneously controlled variables are not totally independent,
equation (1.8.1) does not hold. An alternative approach, recommended by Jackson
(1980), is to use the PCs of the process data for multivariate monitoring. This has the
advantages of reducing the number of variables for monitoring and providing a known
type I error since the PCs are independent.
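Equation (1.8.1) shows how quickly the overall type I error grows when independent variables are charted separately. The short sketch below evaluates it; the function name `overall_type_i_error` and the chosen values of α and n are illustrative only.

```python
def overall_type_i_error(alpha, n_variables):
    """Overall type I error for n independent charts, each with individual error alpha."""
    return 1.0 - (1.0 - alpha) ** n_variables


# With alpha = 0.05 per chart, the overall false alarm rate rises rapidly with n.
for n in (1, 2, 5, 10):
    print(n, round(overall_type_i_error(0.05, n), 4))
```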
Jackson (1991) has identified four requirements to achieve MSPC:

1. A single answer should be available to answer the question: ‘Is the
process in control?’
2. An overall Type I error should be specified.
3. The procedure should take into account the relationships among the
variables.
4. Procedures should be available to answer the question: ‘If the process is
out of control, what is the problem?’
PCs of NIR spectra may be used for MSPC purposes. From these, sample variance-
covariance matrices of the process data may be estimated. This is known as ‘control
phase 1’ (Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and is performed to
establish statistical control levels of multivariate process data.
1.8.1 Mahalanobis Distance and Hotelling's T² Control Ellipses
During control phase 1, control ellipses (Montgomery, 1997) are calculated for the PCs
extracted from the spectral data. Assuming that the data for n variables follows a
multivariate normal distribution, the probability function (Massart et al, 1988), f(x), is
given by:
\[ f(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} \left[ (x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu) \right] \right\} \tag{1.8.2} \]
where μ represents the population mean vector (the centroid in the pattern space) and Σ is
the population variance-covariance matrix. The square root of the expression in square brackets
is the generalised or Mahalanobis distance (Massart et al, 1988), D:
\[ D^2 = (x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu) \tag{1.8.3} \]
This method of classification assumes that each class may be modelled by a multivariate
normal distribution. D² is computed for each object, x, in the learning class and follows
a chi-squared distribution. This enables 95% confidence limits for the ellipse to be
calculated. In control phase 2, an unknown sample is classified by measuring the
distance between itself and each of the modelled classes’ centroids. In practice,
however, the population variance-covariance matrix requires estimation from the data.
Hence equation (1.8.3) becomes:
\[ T^2 = (x - \mu)^{\mathsf{T}} S^{-1} (x - \mu) \tag{1.8.4} \]
where T² is the Hotelling’s T² distance, x is the unknown sample vector to be classified,
μ is the target class mean vector and S is the target class sample variance-covariance
matrix. This statistic has been shown to follow an F distribution (MacGregor et al,
1994; Neave, 1995) with upper control limit (UCL) given by (MacGregor et al, 1994):
where n is the number of variables (principal components), m is the number of samples
(batches), and F_α(n, m − n) is the upper 100α% critical point of the F distribution with (n, m − n)
degrees of freedom. The value of α is typically set to correspond to 95% or 99% confidence.
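The T² distance of equation (1.8.4) can be sketched as below, estimating the mean vector and variance-covariance matrix from a reference (control phase 1) set of PC scores. The function name `hotelling_t2` and the reference scores are hypothetical, and no control limit is computed in this sketch.

```python
import numpy as np


def hotelling_t2(reference_scores, x):
    """Hotelling's T^2 of observation x relative to a reference set of score vectors."""
    reference_scores = np.asarray(reference_scores, dtype=float)
    mu = reference_scores.mean(axis=0)            # target class mean vector
    S = np.cov(reference_scores, rowvar=False)    # sample variance-covariance matrix
    d = np.asarray(x, dtype=float) - mu
    return float(d @ np.linalg.solve(S, d))


# Hypothetical phase 1 scores on two PCs (uncorrelated, zero mean).
reference = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
print(hotelling_t2(reference, [1.0, 2.0]))
print(hotelling_t2(reference, [0.0, 0.0]))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of S explicitly, which is a common numerical choice rather than anything required by the method.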
1.8.2 Multivariate Exponentially Weighted Moving Average (MEWMA)
The Hotelling’s T² statistic uses information from only the current sample and is therefore
insensitive to small or moderate drift in the mean vector. The MEWMA (Lowry et al,
1992) control chart is a moving average control chart which can show trends in the
process, such as drift and systematic variation. The MEWMA Z_i is given by (Lowry et
al, 1992):
\[ Z_i = \lambda x_i + (1 - \lambda) Z_{i-1} \tag{1.8.6} \]
where λ lies in the range 0 to 1, x_i is the vector of the ith sample and Z_{i−1} is the value of Z
for the (i−1)th sample. The value of T_i² plotted on the control chart is given by:
\[ T_i^2 = Z_i^{\mathsf{T}} \Sigma_{Z_i}^{-1} Z_i \tag{1.8.7} \]
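where the variance-covariance matrix, Σ_{Z_i}, is given by:
\[ \Sigma_{Z_i} = \frac{\lambda \left[ 1 - (1 - \lambda)^{2i} \right]}{2 - \lambda}\, \Sigma \tag{1.8.8} \]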
This statistic uses the process mean score vector and sample variance-covariance matrix
determined in control phase 1. The upper control limits used with this chart are based on
tabulated data by Lowry et al. (1992) for PCA models with four or fewer PCs. PCA models
with a greater number of PCs may be assigned control limits based on the chi-squared
distribution (Neave, 1995) multiplied by a factor of 1.05, as suggested by Lowry et al
(1992).
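The MEWMA recursion can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the observations are already centred on the control phase 1 mean, the recursion starts from Z_0 = 0, and the exact (sample-number dependent) covariance of Z_i described by Lowry et al (1992) is used. The function name `mewma` and the example data are hypothetical.

```python
import numpy as np


def mewma(X, sigma, lam=0.2):
    """Return the MEWMA chart statistic for each row of the centred data X."""
    X = np.asarray(X, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = np.zeros(X.shape[1])               # assumption: Z_0 = 0
    stats = []
    for i, x in enumerate(X, start=1):
        z = lam * x + (1.0 - lam) * z      # exponentially weighted moving average
        # Assumed exact covariance of Z_i for the ith sample.
        cov_z = lam * (1.0 - (1.0 - lam) ** (2 * i)) / (2.0 - lam) * sigma
        stats.append(float(z @ np.linalg.solve(cov_z, z)))
    return np.array(stats)


# Hypothetical scores on two PCs with a small sustained shift in the first PC.
scores = np.array([[0.5, 0.0]] * 8)
print(np.round(mewma(scores, np.eye(2), lam=0.2), 3))
```

With λ = 1 the statistic reduces to the T² of each individual sample, while smaller values of λ give more weight to past samples and therefore more sensitivity to small persistent drifts.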
1.8.3 Process Variance: Sample Generalised Variance
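The sample generalised variance (Montgomery, 1997), |S|, is the determinant of the
sample variance-covariance matrix, S, and is a widely used measure of multivariate
dispersion that may be used to monitor the variability of a process over time.
With two variables (in this case PCs), the UCL and mean control limit, CL, are
calculated from (Montgomery, 1997):
\[ UCL = |\Sigma| \left( b_1 + 3 b_2^{1/2} \right) \tag{1.8.9} \]
\[ CL = b_1 |\Sigma| \tag{1.8.10} \]
where b_1 and b_2 are given by:
\[ b_1 = \frac{1}{(n - 1)^p} \prod_{i=1}^{p} (n - i) \tag{1.8.11} \]
and
\[ b_2 = \frac{1}{(n - 1)^{2p}} \prod_{i=1}^{p} (n - i) \left[ \prod_{j=1}^{p} (n - j + 2) - \prod_{j=1}^{p} (n - j) \right] \tag{1.8.12} \]
In equations (1.8.11) and (1.8.12), n is the sample size and p is the number of variables.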
However, the control limits for the sample generalised variance above are applicable
only for situations where two variables are monitored. With more than two variables,
Alt (1985) recommends the use of Anderson's asymptotic normal approximation
(Anderson, 1984):
\[ \sqrt{m} \left( \frac{|S|}{|\Sigma|} - 1 \right) \tag{1.8.13} \]
where S is the sample covariance matrix of a batch with n variables and m degrees of
freedom. This statistic is asymptotically normally distributed with a
mean of zero and variance 2n; |Σ| is a theoretical generalised variance and is
equivalent to the mean sample generalised variance of all batches used in control phase
1 (Alt, 1985; Anderson, 1984).
1.8.4 Diagnosis of Out-of-Control Observations
Hotelling's $T^2$ Stacked Bar Charts And Shewhart-Type Plots on Individual PC Scores
Where the process mean vector of batch data exceeds the upper control limit in
Hotelling's $T^2$ control phase 2, it is useful to attempt a diagnosis of the problem (Kourti
and MacGregor, 1996), if the PCA models have physical or chemical interpretation
(Nomikos and MacGregor, 1995) (i.e. if the loadings of the PCs appear to represent
physical or chemical components).
The simplest method of achieving this is to examine the individual PC contributions to
the multivariate $T^2$. These contributions, $t_i^2$, may be plotted as a stacked bar chart
(Jackson, 1980, 1981a, 1981b), or individually. Large values of $t_i^2$ would indicate
which PCs account for the problem.
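The decomposition can be sketched as follows. The sketch assumes that the PC scores are uncorrelated, so that the multivariate statistic is the sum of the squared scores, each divided by the variance of that PC in the reference data. The function name and the example scores and variances are hypothetical.

```python
def t2_contributions(scores, score_variances):
    """Contribution of each PC to Hotelling's T-squared for one sample,
    assuming uncorrelated scores: t_a^2 / s_a^2 for each PC a."""
    return [t * t / v for t, v in zip(scores, score_variances)]


# Hypothetical out-of-control sample described by three PC scores.
scores = [1.2, -0.4, 2.5]
variances = [4.0, 1.0, 0.25]  # score variances from the reference set

contrib = t2_contributions(scores, variances)
t2 = sum(contrib)
# Index of the PC with the largest contribution (0-based).
worst_pc = max(range(len(contrib)), key=contrib.__getitem__)
print(contrib, t2, worst_pc)
```

Here the third PC dominates the statistic even though its raw score is of similar magnitude to the first, because its reference variance is small.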
In addition, Shewhart control charts may be constructed for the individual PC scores
(Jackson, 1991). These use the mean and range for each PC score sample, measured for
the historical data set. The overall or grand mean is equal to zero (unless additional
observations are projected into the model space) with each model, whilst the range is
equal to the standard deviation of the observations multiplied by the corresponding
value of the t-distribution (95% and 99% control limits). This is used instead of the
traditional three-sigma limits to avoid Type 1 errors (Jackson, 1991).
CHAPTER 2
Measurement of Powdered Pharmaceutical
Material Particle Size by Near Infrared
Spectroscopy
2.1 Introduction
The measurement of particle size for pharmaceutical materials is important (Aulton,
1988; Barth et al, 1987) because it influences bulk physical properties (Washington,
1992), and determines the ability of powders to flow, mix, granulate and dissolve. It is
also often a requirement in pharmaceutical manufacturing processes that particle size
measurements are performed on raw materials (British Pharmacopoeia 1999, 1999).
Commonly employed methods of measurement are forward angle laser light scattering
(FALLS) and electrical zone sensing (Simmons, 1993). A disadvantage with these
methods is that samples generally need to be analysed away from the production area,
which is time consuming and leads to manufacturing delays (Hailey et al, 1996).
In this chapter, the ability of NIRS to determine powdered pharmaceutical material
particle size is examined. Section 2.8 examines measurement of median particle size by
multiple linear regression of NIR measurements and FALLS measurements. In Section
2.9 this method is further developed to measure the cumulative percentage frequency
particle size distribution of microcrystalline cellulose by NIRS. In Section 2.10, the
percentage frequency particle size distribution of this material is measured by NIRS
using PLSR. Section 2.11 deals with classification of grades of powdered material by
cluster analysis methods.
2.2 Review of The Literature
The potential application of NIRS for particle size determination of powdered
pharmaceuticals has long been suggested (Ciurczak et al, 1986; Plugge and Vlies, van
der, 1993; Vlies, van der, 1996). Despite these suggestions, NIRS has remained largely
a technique used for chemical analysis (Dubois et al, 1987; Aucott et al, 1988; Cowe et
al, 1989; Dreassi et al, 1995a, 1995b, 1995c; Wargo and Drennen, 1996; Forbes et al,
1996).
Powdered pharmaceutical materials may be suited for NIR particle size determinations
since they are diffusely reflecting materials (Ciurczak et al, 1986). In the NIR region
(1000 nm to 2500 nm) these materials both absorb and scatter light, resulting in spectra
with non uniform baselines and varying offsets. These scatter effects vary with the
particle size (Vlies, van der, 1996), sample porosity (Hailey et al, 1996) (and hence
compaction pressure) and with the wavelength (Bull, 1991) and can be described using
Rayleigh and Mie theory (Kortum, 1969), or alternatively using the Kubelka-Munk
theory of diffuse reflectance (Kortum, 1969).
Previous studies that have examined the effects of particle size on NIR spectra have
demonstrated that reflectance varies non-linearly with particle size (Kortum, 1969;
Norris and Williams, 1984; Ilari et al, 1988; Ciurczak et al, 1986). Ciurczak et al
(1986) found that reflectance exhibited an inverse relationship with mean particle size in
agreement with Mie theory (Kortum, 1969). However, this relationship does not
necessarily apply in all cases and is dependent on the shape of the particle size
distribution of the sample (Kortum, 1969), the particle shape (Kortum, 1969) and the
material's refractive index (Kortum, 1969). The presence of very small particles will
further complicate the relationship as these may exhibit Rayleigh scatter, which is
proportional to the third power of the particle size (Kortum, 1969).
The complicated relationship between reflectance at a spectral wavelength and particle
size has resulted in most work in this area focusing on chemometric calibration
methods, rather than theoretical models (Vlies, van der et al, 1995; Plugge and Vlies,
van der, 1996; Ilari et al, 1988). A novel method for classifying pharmaceutical powders
involved transforming second derivative NIR spectra to polar co-ordinates (Vlies, van
der, et al 1995). Each NIR spectrum was reduced to a single quality point in a plane,
that could therefore be plotted in cartesian co-ordinates. Linear plots of the logarithm of
the particle size versus x or y co-ordinate were found to show significant correlation.
Multivariate calibration methods have proven most successful in calibrating NIR data to
measure particle size (Ilari et al, 1988). The first method to appear in the literature
produced NIR calibrations to determine particle size in organic and inorganic powders.
Importantly in this paper, a new method of scatter correction, MSC, was described (Ilari
et al, 1988). The scatter correction technique linearly regressed each spectrum to the
mean of the data set and resulted in two coefficients: an intercept and a slope
coefficient. For each spectrum, these were used to correct for scatter. Reflectance
spectra for each material were recorded at 19 wavelengths and transformed to Kubelka-
Munk function. PLSR models were produced using the transformed spectra and particle
size measurements. Inclusion of the scatter coefficients was found to improve
calibration precision. Another multivariate method of calibrating NIR spectra to
measure particle size that has shown some success is artificial neural networks (Frake et
al, 1998a, 1998b).
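The scatter correction described above (regression of each spectrum on the mean spectrum, giving an intercept and a slope per spectrum) can be sketched as follows. This is a minimal illustration rather than the implementation of the cited work. It assumes the usual correction step, in which the intercept is subtracted and the result divided by the slope. The function names and the two synthetic spectra are hypothetical.

```python
def mean_spectrum(spectra):
    """Mean of a set of spectra at each wavelength."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def msc(spectra):
    """Regress each spectrum on the mean spectrum and remove the fitted
    offset (intercept) and multiplicative (slope) scatter terms."""
    reference = mean_spectrum(spectra)
    corrected, coefficients = [], []
    for spectrum in spectra:
        a, b = fit_line(reference, spectrum)
        corrected.append([(value - a) / b for value in spectrum])
        coefficients.append((a, b))
    return corrected, coefficients


# Two synthetic spectra that differ only by an offset and a scale factor.
base = [0.20, 0.40, 0.30, 0.60]
spectra = [base, [0.1 + 1.2 * value for value in base]]
corrected, coefficients = msc(spectra)
print(coefficients)
```

The pairs in `coefficients` are the scatter coefficients that, according to the text above, were also included in the calibration.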
2.3 Materials Used
Section 2.8 Measurement of The Number Median Particle Size
Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co., St Louis,
USA) and paracetamol (Boots Pharmaceuticals, Nottingham, UK) were used.
Microcrystalline cellulose: Avicel PH 101 (16 batches), Avicel PH 102 (19 batches) and
Avicel PH200 (single batch) were all from FMC International, Wallingstown, Little
Island, Co Cork, Ireland. The batches of lactose monohydrate used were a single batch
of a reagent grade material (Avocado Research Chemicals Ltd., Heysham, UK), 9
samples of 110 mesh obtained from two manufacturers (DMV International, Veghel,
Netherlands and Lactose New Zealand, Hawera, New Zealand) and 18 samples of
Fastflo (Foremost Ingredients Group, Wisconsin, USA).
Section 2.9 Measurement of The Cumulative Percentage Frequency Particle Size
Grades of microcrystalline cellulose used were: Avicel PH 101 (16 batches), Avicel
PH 102 (19 batches) and Avicel PH200 (single batch), all from FMC International,
Wallingstown, Little Island, Co Cork, Ireland. These batches of Avicel were those used
in Section 2.8.
Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution
The microcrystalline cellulose samples used were obtained from one supplier (FMC
International, Wallingstown, Little Island, Co Cork, Ireland). These samples (n = 113)
were from six different grades with different particle size distributions and nominal
moisture contents which ranged from 0.8 to 4.8% m/m.
Section 2.11 Classification of Excipient Grades by Cluster Analysis Methods
Grades of microcrystalline cellulose used were: Avicel PH 101 (n = 9 batches), Avicel
PH 102 (n = 9 batches), all from FMC International, Wallingstown, Little Island, Co
Cork, Ireland. The batches of lactose monohydrate used were: lactose Regular (n = 9
batches, obtained from two manufacturers: DMV International, Veghel, Netherlands
and Lactose New Zealand, Hawera, New Zealand) and lactose Fastflo (n = 9, all from
Foremost Ingredients Group, Wisconsin, USA). The batches of Avicel used were those
of Sections 2.8 and 2.9. The batches of lactose used were those of Section 2.8.
2.4 Sample Preparation
Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and
Measurement of The Cumulative Percentage Frequency Particle Size
A range of aspirin samples of different particle size distributions were obtained by
grinding the coarse bulk material with a mortar and pestle (approximately 50 g).
Samples were taken successively with every few minutes of grinding. The ground
aspirin samples were air-jet sieved to remove fines.
Air-jet sieve fractions of the aspirin, anhydrous caffeine and paracetamol (in each case
approximately 50 g of material was used) were produced using an Alpine air-jet sieve
(Alpine, Augsburg, Germany) with stainless steel wire mesh sieves of different sieve
diameter (75, 56, 50, 40 and 36 µm). With use of the smallest air-jet sieve, an additional
sample was collected from the sieve filter paper.
Sieve fractions of approximately 200 g of material, from single batches of Avicel
PH 101, Avicel PH 102, Avicel PH200 and reagent grade lactose monohydrate were
produced by machine sieving (Endecotts Ltd., London, UK) for 20 minutes using a nest
of progressively finer stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm).
In addition, material falling through the 32 µm sieve was collected.
For each prepared sample (sieve fraction or bulk material), approximately 8 g of
material was collected and used to fill a narrow soda glass vial (25 mm wide by 50 mm
deep) for NIR and reference particle size analyses.
Section 2.10: Measurement of The Percentage Frequency Particle Size Distribution
A single narrow soda glass vial was filled with material for each powdered sample (n =
113) (approximately 8 g of material per vial). These were allowed to settle overnight.
Section 2.11: Classification of Excipient Grades by Cluster Analysis Methods
Bulk samples of lactose monohydrate and microcrystalline cellulose were used as
obtained from the manufacturers. A single narrow soda glass vial was filled with
material for each powdered sample (n = 18 for each material) (approximately 8 g of
material per vial). These were allowed to settle overnight.
2.5 Reference Particle Size Analysis
Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and
Measurement of The Cumulative Percentage Frequency Particle Size Distribution
The particle size distributions of sieve fractions and of the remaining batches of bulk
lactose 110 mesh, Fastflo, Avicel PH 101 and Avicel PH 102 were measured by FALLS
(Malvern 2600C, Malvern Instruments, Malvern, UK). A sample from each soda glass
vial (approximately 100 mg in each case) was suspended in a disperse medium in which
it was practically insoluble with surfactant (sorbitan trioleate or dilute household
detergent) prior to particle sizing and was gently shaken using a vortex-mixer to prevent
formation of agglomerates. Avicel and aspirin samples were dispersed in cold, distilled
water using dilute detergent. Anhydrous caffeine and lactose monohydrate were
suspended in cyclohexane with sorbitan trioleate. Paracetamol was suspended in
pentane with sorbitan trioleate. Microcrystalline cellulose and lactose monohydrate
particle shapes were assessed by scanning electron microscopy using a Philips XL20
scanning electron microscope (Philips Electron Optics, Eindhoven, Netherlands).
Cumulative percentage frequency particle size data for Avicel samples are supplied in
Appendix D, CD-ROM.
Section 2.10: Measurement of The Percentage Frequency Particle Size Distribution
Particle size distribution data for the microcrystalline cellulose samples were acquired
by laser diffraction using a Malvern Mastersizer X Laser Diffraction instrument
(Malvern Instruments, Malvern, UK) equipped with a Dry Powder Feeder. These data
were supplied by the manufacturer, FMC International, Wallingstown, Little Island, Co
Cork, Ireland, and are provided in Appendix D, CD-ROM (inside back cover).
Section 2.11: Classification of Excipient Grades by Cluster Analysis Methods
The materials were not particle sized. Instead, they were classified by their nominal
grade. NIR spectral data are supplied in Appendix D, CD-ROM (inside back cover).
2.6 Near Infrared Reflectance Measurements
Sections 2.8, 2.9 and 2.11: Measurement of The Number Median Particle Size,
Measurement of The Cumulative Percentage Frequency Particle Size and
Classification of Excipient Grades by Cluster Analysis Methods
NIR measurements were made using a FT-NIR NIRVIS spectrometer (No. 100.1,
Buhler AG, Uzwil, Switzerland) fitted with a Buhler fibre-optic probe (No. 110.2).
Reflectance spectra were acquired by inserting the probe into the sample, and were
recorded over the range 4008 to 9996 cm⁻¹ (500 data points), each spectrum being the
average of six scans. The reflectance reference used was Spectralon (Labsphere Inc.,
North Sutton, NH, USA). Microcrystalline cellulose data (Sections 2.8, 2.9 and 2.11)
are supplied in Appendix D, CD-ROM (inside back cover).
Section 2.10: Measurement of The Percentage Frequency Particle Size Distribution
Near infrared reflectance measurements of the powdered samples were made using a
Foss NIRSystems grating spectrometer (model 6500) (Foss NIRSystems, Silver
Springs, MD, USA) equipped with a Rapid Content Analyser (RCA) module. Each
sample was centrally positioned on the window of the RCA stage, using an iris
mechanism, above the lead sulphide detectors. A diffuse reflectance spectrum of each
sample was recorded with the lid of the RCA closed. Each spectrum was the average of
32 scans and was recorded over the range 1100 to 2500 nm, at 2 nm increments (700
data points). The reflectance reference used was a ceramic tile. Data are supplied in
Appendix D, CD-ROM (inside back cover).
2.7 Data Analysis
All programs were written in Matlab 5.2 Scientific and Technical Language, except for
the MLR program. This was written in-house in C and used routine svdfit, available in
the literature (Press et al, 1992).
2.8 Measurement of The Number Median Particle Size
2.8.1 Preliminary Results
Reflectance at any wavenumber versus number median particle size, d50 or 1/d50,
exhibited a curvilinear relationship. To allow for this, two different approaches were
compared: single wavenumber quadratic least squares regression and full two-
wavenumber search MLR. With each of these calibration methods, different pre-
treatments of the NIR spectral and FALLS d50 data were applied and their effects on
standard errors of calibration (SEC) and prediction (SEP) (Mark, 1991) observed:
$SEC = \sqrt{\frac{\sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2}{m - a - 1}}$ (2.8.1)
$SEP = \sqrt{\frac{\sum_{i=1}^{m} (Y_i - \hat{Y}_i)^2}{m}}$ (2.8.2)
where $Y_i$ is the measured value of the ith sample in the calibration or prediction set, $\hat{Y}_i$
is the calibration estimated or predicted value of the ith sample, a is the number of
wavelengths used in the regression, m is the number of samples in the calibration or
prediction set. The effects of the data pre-treatments on bias and linearity were also
investigated.
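The two error statistics can be sketched as follows. The sketch follows the definitions given above (residual sum of squares divided by m - a - 1 for SEC and by m for SEP). The function names and the five measured and predicted values are hypothetical.

```python
import math


def sum_squared_residuals(measured, predicted):
    """Sum of squared differences between measured and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))


def sec(measured, predicted, n_wavelengths):
    """Standard error of calibration with m - a - 1 degrees of freedom."""
    m = len(measured)
    return math.sqrt(sum_squared_residuals(measured, predicted)
                     / (m - n_wavelengths - 1))


def sep(measured, predicted):
    """Standard error of prediction for an independent prediction set."""
    m = len(measured)
    return math.sqrt(sum_squared_residuals(measured, predicted) / m)


# Hypothetical reference values and NIR estimates (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(sec(measured, predicted, n_wavelengths=2))
print(sep(measured, predicted))
```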
Quadratic least squares fits of NIR spectral and FALLS data were used to allow for
gentle curvature in calibrations. The NIR spectral data were diffuse reflectance (of
infinite thickness for all practical purposes), R; mean-corrected reflectance (where the
mean reflectance value of an individual spectrum was subtracted from the reflectance at
each of its spectral wavenumbers); absorbance, log(1/R); and Kubelka-Munk function,
f(R). A search of all 500 data points was used to select the wavenumber giving the
smallest SEC for each NIR data pre-treatment.
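The spectral pre-treatments listed above can be sketched as follows. Mean-correction and absorbance follow the definitions in the text. For the Kubelka-Munk function the standard expression f(R) = (1 - R)^2 / (2R) is assumed. The function names and the three-point example spectrum are hypothetical.

```python
import math


def mean_correct(reflectance):
    """Subtract the mean reflectance of the spectrum from every point."""
    mean_r = sum(reflectance) / len(reflectance)
    return [r - mean_r for r in reflectance]


def absorbance(reflectance):
    """Apparent absorbance, log10(1/R), at every point."""
    return [math.log10(1.0 / r) for r in reflectance]


def kubelka_munk(reflectance):
    """Kubelka-Munk function, (1 - R)^2 / (2R), at every point."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in reflectance]


# Hypothetical diffuse reflectance values at three wavenumbers.
spectrum = [0.50, 0.25, 0.75]
print(mean_correct(spectrum))
print(absorbance(spectrum))
print(kubelka_munk(spectrum))
```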
The second calibration technique applied was two wavenumber MLR (Osborne et al,
1993). A search of all combinations of two wavenumbers from the 500 measured by the
spectrometer was carried out. The NIR spectral data used were again reflectance, mean-
corrected reflectance, absorbance and Kubelka-Munk function. FALLS data pre-
treatments investigated were d50, 1/d50 and ln(FALLS d50). All possible
combinations of pretreated NIR and FALLS data were tested.
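A full two-wavenumber search of this kind can be sketched as follows. This is a minimal illustration and not the in-house C program used in this work: the least squares fit is obtained here from the normal equations by Gaussian elimination rather than by singular value decomposition. The function names and the small synthetic data set (six spectra, four data points) are hypothetical.

```python
import math
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # singular system (collinear wavenumbers)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(columns, y):
    """Fit y = b0 + sum(b_k * column_k); return (coefficients, SEC)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(design, y)) for p in range(k)]
    coeffs = solve(xtx, xty)
    if coeffs is None:
        return None, float("inf")
    ss = sum((yi - sum(c * v for c, v in zip(coeffs, r))) ** 2
             for r, yi in zip(design, y))
    return coeffs, math.sqrt(ss / (len(y) - len(columns) - 1))


def two_wavenumber_search(spectra, y):
    """Try every pair of data points and keep the pair with the lowest SEC."""
    columns = [list(col) for col in zip(*spectra)]
    best = None
    for i, j in combinations(range(len(columns)), 2):
        coeffs, sec = fit_mlr([columns[i], columns[j]], y)
        if best is None or sec < best[0]:
            best = (sec, (i, j), coeffs)
    return best


# Synthetic reflectance spectra (rows: samples, columns: data points).
spectra = [[0.1, 0.3, 0.5, 0.2], [0.4, 0.1, 0.9, 0.6], [0.2, 0.8, 0.1, 0.4],
           [0.9, 0.2, 0.3, 0.8], [0.5, 0.6, 0.7, 0.1], [0.7, 0.4, 0.2, 0.9]]
# Synthetic response built from data points 1 and 3, standing in for ln(d50).
ln_d50 = [1.0 + 2.0 * s[1] - 3.0 * s[3] for s in spectra]
best = two_wavenumber_search(spectra, ln_d50)
print(best)
```

With 500 data points the same loop would visit 124 750 pairs, which remains a modest computation.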
Investigation with both calibration methods revealed that the reflectance data recorded
by the NIR spectrometer produced calibrations with significant correlation, low scatter
and low bias. Kubelka-Munk function data and absorbance also showed significant
correlation between NIR predicted d50 and FALLS d50; however, these pre-treatments
introduced significant fixed bias into the calibration. Reflectance data were therefore
subsequently used. The ln(FALLS d50) was found to give two wavenumber MLR
calibrations with a lower SEC than 1/d50 or d50 and was the FALLS data used in
subsequent MLR calibrations.
To demonstrate the feasibility of the MLR calibration method, sieve fractions of a single
batch of three drugs were tried initially (aspirin, anhydrous caffeine and paracetamol).
Subsequent working MLR calibrations for the two pharmaceutical excipients
(microcrystalline cellulose and lactose monohydrate) were produced from a larger data
set using either machine sieve fractions or a combination of machine sieve fractions and
bulk samples from a number of different batches. The quadratic least squares
calibrations were performed using the microcrystalline cellulose and lactose
monohydrate data sets as these had the largest number of data.
2.8.2 Spectral Characteristics
Scans of each powdered sample exhibited the characteristic overlapping combinations
and overtones arising from the fundamentals of the mid infrared, with non-uniform
baselines resulting from multiple scattering. Spectra also showed different offset values,
which appear to increase with wavenumber (Fig. 2.1). This has previously been
attributed to variation in pathlength (Mark, 1991), which is influenced by particle size
and sample porosity.
2.8.3 Single Wavenumber Quadratic Least Squares Calibration
Reflectance at any wavenumber showed a generally inverse linear trend with median
particle size up to approximately 100 µm (Fig. 2.2) broadly agreeing with Mie and
Fraunhofer theory for particles of comparable size to the wavelength (Kortum, 1969).
Beyond this particle size, the relationship becomes markedly non-linear (Fig. 2.2). A
quadratic least squares fit of the data (Fig. 2.2) between reflectance and median particle
size showed useful correlations between the NIR predicted and FALLS d50 values
(microcrystalline cellulose: r = 0.96, 9012 cm⁻¹ (n = 57); lactose monohydrate: r = 0.90,
7428 cm⁻¹ (n = 33)).
Fig. 2.1. NIR reflectance spectra (reflectance versus wavenumber/cm⁻¹) for different
median particle sizes. (A) microcrystalline cellulose: (a) 24 µm, (b) 45.8 µm, (c) 93.4 µm,
(d) 261 µm, (e) 406 µm and (B) lactose monohydrate: (a) 44.7 µm, (b) 66.3 µm, (c) 98 µm,
(d) 132 µm, (e) 168 µm.
Fig. 2.2. Single wavenumber quadratic least squares fit of NIR spectral data and
median particle size, d50: (A) microcrystalline cellulose reflectance data (9012
cm⁻¹), (B) microcrystalline cellulose mean-corrected reflectance data (7128 cm⁻¹),
(C) lactose reflectance data (7428 cm⁻¹) and (D) lactose mean-corrected reflectance
data (7056 cm⁻¹).
Mean-correction of each spectrum was found to improve the correlation between NIR
predicted and FALLS d50 with the microcrystalline cellulose data set (r = 0.98, 7128
cm⁻¹ (n = 57)). This pre-treatment acts to centre the data of individual spectra (Rmean =
0) and can help to eliminate baseline differences that occur as a result of variable
sample porosity and pressure applied with the fibre-optic probe. The variation in offset
will also be influenced by the flow properties of the material. Use of this pre-treatment
is likely to be appropriate in single wavenumber least squares calibrations where the
material exhibits variable compaction properties, such as with different grades of
microcrystalline cellulose (Handbook of Pharmaceutical Excipients, 1994), and also
where the NIR measurements are recorded using a fibre-optic probe. However, with
lactose monohydrate (reagent grade, 110 mesh and Fastflo), which tends to have good
flow and compaction properties (Handbook of Pharmaceutical Excipients, 1994;
Pearce, 1986), this pre-treatment was not appropriate and gave a poorer fit between NIR
predicted and FALLS d50 (r = 0.68, 7056 cm⁻¹ (n = 33)).
2.8.4 MLR Calibration Using Aspirin, Anhydrous Caffeine And Paracetamol
The results of these calibrations, which applied MLR to all two wavenumber
combinations and contained 7 or 10 data points, clearly demonstrated a relationship
between NIR reflectance and the ln(FALLS d50) values (Fig. 2.3). The ln(FALLS d50)
and reflectance data, R, were fitted to an equation of the general form:
$\ln(\mathrm{FALLS}\ d_{50}) = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2}$ (2.8.3)
where b0 is the intercept, and b1 and b2 are the MLR coefficients for the two wavenumbers, λ1
and λ2 respectively. Significant linear association was found between NIR predicted
ln d50 and ln(FALLS d50) values in each case (aspirin: r = 0.99 (n = 7), anhydrous caffeine: r =
0.99 (n = 7) and paracetamol: r = 0.96 (n = 10); in each case p < 0.005).
Fig. 2.3. Feasibility study. Results of MLR calibration. NIR measured
median particle size, d50, versus FALLS d50: (A) aspirin, (B) anhydrous
caffeine and (C) paracetamol.
2.8.5 Full Two Wavenumber Search MLR Using Microcrystalline Cellulose And
Lactose
With each of these materials, preliminary data processing revealed that MLR of
reflectance versus ln(FALLS d50) produced the most linear calibrations. In addition,
mean-correction of these spectra was not found to improve calibration results. The MLR
two-wavenumber model therefore compensates for variation in baseline offset.
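The compensation for baseline offset can be illustrated numerically. If the same offset is added to the reflectance at both wavenumbers, the predicted ln d50 changes by (b1 + b2) times the offset, which is small when the two coefficients are of similar magnitude and opposite sign. The sketch below uses the microcrystalline cellulose coefficients reported in Table 2.1. The two reflectance values and the offset of 0.01 are hypothetical.

```python
import math

# MLR coefficients for microcrystalline cellulose as reported in Table 2.1.
B0, B1, B2 = -3.99, 59.65, -57.99


def predict_ln_d50(r1, r2):
    """Two-wavenumber MLR prediction of ln(d50) from reflectances r1, r2."""
    return B0 + B1 * r1 + B2 * r2


# Hypothetical reflectances at the two selected wavenumbers.
r1, r2 = 0.63, 0.50
offset = 0.01  # hypothetical baseline shift applied to the whole spectrum

shift_two_term = predict_ln_d50(r1 + offset, r2 + offset) - predict_ln_d50(r1, r2)
shift_one_term = B1 * offset  # effect on a single-wavenumber term alone

print(f"predicted d50 = {math.exp(predict_ln_d50(r1, r2)):.1f} um")
print(f"shift with two wavenumbers: {shift_two_term:.4f}")
print(f"shift of one term alone:    {shift_one_term:.4f}")
```

The residual sensitivity, b1 + b2 = 1.66, is far smaller than either coefficient on its own, which is consistent with the observation that mean-correction brought no further improvement.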
2.8.5.1 Particle Size Calibration Using Sieve Fraction Data
Before calibrations were attempted for both materials, the data of each were split into
two sets: a calibration set of mainly sieved fractions (microcrystalline cellulose (n = 24)
and lactose (n = 15)) and a validation set of bulk samples (microcrystalline cellulose (n
= 33) and lactose (n = 18)). With each calibration set, highly significant correlation (p <
0.005) was obtained between NIR predicted ln d50 and ln(FALLS d50) values (Table 2.1).
However, the validation sets for each material exhibited more scatter (Table 2.1). With
the microcrystalline cellulose prediction set of bulk samples (Avicel grades PH 101 and
PH 102), significant correlation (p < 0.005) between NIR predicted ln d50 and ln(FALLS
d50) was obtained with a SEP greater than the SEC (Table 2.1). The high SEP is
possibly accounted for by the FALLS and scanning electron microscopy (SEM) results
(Fig 2.4A & 2.4B) which showed that these bulk samples had broad distributions and
comprised a mixture of irregularly shaped fines and large spherical particles. Previous
work (Kortum, 1969) has shown that this can produce more variable results than the use
of narrow or uniform-size distributions as the NIR scattering and absorbing properties
of these particles will be different to that of median sized particles.
Table 2.1. Microcrystalline cellulose and lactose MLR calibration (sieve fraction
data) and validation (bulk sample data) results.
Material                Microcrystalline cellulose             Lactose monohydrate
b0*                     -3.99                                  4.59
b1*                     59.65                                  -172.3
b2*                     -57.99                                 167.8
Wavenumber 1 (cm⁻¹)     8244                                   6012
Wavenumber 2 (cm⁻¹)     5964                                   5940
SEC (ln(d50/µm))        0.067                                  0.097
SEP (ln(d50/µm))        0.17                                   0.18
ln P = c + m ln(FALLS d50)
Calibration set
r                       0.99                                   0.99
m                       0.99                                   0.98
c                       0.035                                  [illegible]
n                       24 (PH 101, PH 102, PH200 sieved)      15 (sieved and 110 mesh)
Validation set
r                       0.84                                   0.014
m                       0.96                                   0.018
c                       0.17                                   4.38
n                       33 (PH 101 & PH 102, bulk)             18 (Fastflo samples)
* MLR coefficients: b0 - intercept, b1 - wavenumber 1, and b2 - wavenumber 2.
r is correlation coefficient; m and c are slope and intercept of plots of NIR predicted ln d50 vs. FALLS measured ln d50; n is the
number of samples in each data set.
Fig. 2.4. SEM photographs. (A) Avicel PH 101 bulk sample, (B) Avicel PH200 > 200
µm sieve fraction, (C) lactose monohydrate < 31 µm sieve fraction and (D) lactose
monohydrate > 150 µm sieve fraction.
Validation of the lactose calibration used bulk samples of Fastflo from 18 different
batches. This spray dried material generally has a relatively uniform and spherical
particle size (Pearce, 1986). A narrow range of d50 was confirmed by FALLS (range of
d50: 81.1 - 115.7 µm). Poor correlation was obtained between NIR predicted ln d50 and
ln(FALLS d50) with these samples and is probably due to the narrow range of particle
size in the prediction set as the SEP is not significantly different from that of the
microcrystalline cellulose prediction set of bulk samples (Table 2.1).
SEM results of the lactose sieve fractions used in the calibration set showed small,
irregularly shaped fines in the smallest sieve fractions and large spherical particles in
the largest sieve fractions (Fig 2.4C & 2.4D), much the same as with the
microcrystalline cellulose calibration set.
2.8.5.2 Particle Size Calibration Using Randomised Sieve Fraction And Bulk Sample
Data
To produce working calibrations with the microcrystalline cellulose and lactose data
sets, the sieve fraction and bulk sample data were randomly assigned to either the
calibration set (67% of spectra) or validation set (33% of spectra). This procedure was
repeated three times to test the robustness of the method, giving three different
calibration and validation sets for the two materials. Both sieve fractions and bulk
samples were used in calibrations as preliminary investigation showed that this
produced more robust calibrations.
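The repeated random assignment can be sketched as follows. The sketch assumes a simple shuffle of sample indices with the calibration set size rounded to the nearest whole sample. The function name and the seed are hypothetical. With 57 microcrystalline cellulose samples, a 67% calibration fraction gives 38 calibration and 19 validation samples.

```python
import random


def random_split(n_samples, calibration_fraction, rng):
    """Randomly assign sample indices to calibration and validation sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_cal = round(n_samples * calibration_fraction)
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])


rng = random.Random(0)  # fixed seed so that the splits are reproducible
# Three independent splits of 57 samples, as a robustness check.
splits = [random_split(57, 0.67, rng) for _ in range(3)]
for calibration, validation in splits:
    print(len(calibration), len(validation))
```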
In each case, all three calibrations employed slightly different combinations of
wavenumbers. The selected wavenumbers were found to occur on the slopes of
overtone peaks; the selection of each wavenumber is therefore likely to have been
influenced by the random noise in each data set. With both materials, each of the three
MLR calibrations showed a good fit between NIR spectral and FALLS data
(microcrystalline cellulose: SEC (ln(d50/µm)) = 0.10 - 0.11¹ and lactose monohydrate:
SEC (ln(d50/µm)) = 0.12 - 0.13¹). This was confirmed by plots of NIR predicted ln d50
versus ln(FALLS d50) which showed significant linear association (microcrystalline
cellulose: r = 0.98 (n = 38 for each set) and lactose monohydrate: r = 0.97 - 0.98¹ (n =
22 for each set); in each case with p < 0.005) (Figs 2.5A & 2.6A). The three validation
sets for each material showed similar results (Figs 2.5B & 2.6B) with highly significant
correlation between NIR predicted ln d50 and ln(FALLS d50) (microcrystalline cellulose
(n = 19): r = 0.98, lactose monohydrate (n = 11): r = 0.93 - 0.97;¹ in each case p <
0.005), and SEP comparable to SEC (microcrystalline cellulose: SEP (ln(d50/µm)) =
0.12 - 0.14¹ and lactose monohydrate: SEP (ln(d50/µm)) = 0.15 - 0.21¹).
2.9 Measurement of The Cumulative Percentage Frequency Particle Size
Distribution
In this Section, NIR measurements of powdered microcrystalline cellulose are
calibrated to measure the cumulative percentage frequency particle size distribution.
Two different chemometric methods are compared: 3-wavenumber MLR and 3
principal components regression (PCR).
¹ The range gives the minimum and maximum values observed for the three randomly
selected calibration and validation sets.
Fig. 2.5. Results of microcrystalline cellulose MLR calibration with
randomised sieve fraction and bulk sample data. NIR measured median
particle size, d50, versus FALLS d50. (A) Calibration set and (B)
validation set.
Fig. 2.6. Results of lactose monohydrate MLR calibration with
randomised sieve fraction and bulk sample data. NIR measured
median particle size, d50, versus FALLS d50. (A) Calibration set and (B)
validation set.
2.9.1 Preliminary Investigation
In Section 2.8, it was shown that useful calibrations for median particle size can be
obtained by using NIR reflectance data with a logarithmic transform of the FALLS
particle-size data, hence these data have been used in this section.
With MLR calibrations, preliminary work showed that a 3 wavelength linear regression
at any of the FALLS quantiles produced calibrations more robust than a two wavelength
fit. It was therefore decided that three wavelength MLR calibrations would be employed
subsequently. With PCR models, three principal components were required to produce
satisfactory calibrations and this number was used for all subsequent calibrations.
The spectra of each powdered sample exhibited the effects of multiple scatter, as
described in Section 2.8.2 (Fig. 2.1A).
2.9.3 Model Generation
The FALLS instrument gives values of the cumulative percentage frequency particle-
size distribution at 64 particle sizes (range: 564 to 5.8 µm), at intervals which follow a
geometric progression. For each sample, linear interpolation of the measured FALLS
values was used to calculate the particle size values corresponding to the 5, 10, 20, 30,
40, 50, 60, 70, 80, 90 and 95% quantiles (Appendix D, CD-ROM inside back cover).
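The interpolation step can be sketched as follows. The sketch assumes that the cumulative percentage undersize values are available at ascending particle sizes and that interpolation is linear in particle size between adjacent measured points. The function name and the four-point example distribution are hypothetical.

```python
def size_at_quantile(sizes, cumulative_percent, quantile):
    """Particle size at which the cumulative percentage reaches `quantile`,
    found by linear interpolation between adjacent measured points.

    sizes must be ascending and cumulative_percent non-decreasing.
    """
    for k in range(1, len(sizes)):
        low, high = cumulative_percent[k - 1], cumulative_percent[k]
        if low <= quantile <= high and high > low:
            fraction = (quantile - low) / (high - low)
            return sizes[k - 1] + fraction * (sizes[k] - sizes[k - 1])
    raise ValueError("quantile lies outside the measured distribution")


# Hypothetical cumulative undersize distribution (sizes in micrometres).
sizes = [10.0, 20.0, 40.0, 80.0]
cumulative = [0.0, 25.0, 75.0, 100.0]

d50 = size_at_quantile(sizes, cumulative, 50.0)
d90 = size_at_quantile(sizes, cumulative, 90.0)
print(d50, d90)
```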
The samples exhibited a wide range of particle sizes at each quantile (Table 2.2) and a
wide variety of distributional shapes. Of the 57 samples available, 34 were chosen at
random for the calibration set; the remaining 23 samples were used as an independent
validation set. To aid comparison of the two calibration methods, the same calibration
and validation data were used for each method.
Table 2.2. Particle size ranges at each quantile for the calibration and validation
sets as determined by FALLS.
Quantile   Particle size/µm
           Calibration set (n = 34)         Validation set (n = 23)
           Minimum   Median   Maximum       Minimum   Median   Maximum
5%         6.45      25.72    216.52        7.21      23.06    167.11
10%        9.92      37.14    268.91        11.44     32.34    187.67
20%        14.48     52.92    311.96        18.05     45.40    219.33
30%        18.39     67.10    345.62        22.55     56.36    251.13
40%        21.40     81.41    376.44        26.27     67.35    283.66
50%        23.99     96.59    406.07        29.82     78.98    319.67
60%        26.47     112.82   436.21        33.71     91.57    359.67
70%        29.25     131.29   466.55        38.30     105.95   402.81
80%        33.11     154.78   497.51        44.94     124.03   451.54
90%        40.62     197.11   529.54        57.16     152.54   504.66
95%        48.47     240.07   546.76        70.34     184.74   533.21
2.9.4 Three-Wavelength Multiple Linear Regression
Data from the calibration samples were used to generate calibration equations for each
quantile by fitting the log10 dx values to the NIR reflectance values according to
equation (2.9.1):
$\log_{10} d_x = b_0 + \sum_{a=1}^{3} b_a R_{\lambda_a}$ (2.9.1)
where dx is the FALLS interpolated particle size at quantile x, b0 the intercept, R the reflectance at
wavelength λa and ba the MLR coefficient for each wavelength. The selection of
wavelengths was performed on a reduced data set of every other wavelength to reduce
the computation time required. A full 3 wavelength search for each particle size quantile
calibrated therefore used 250 of the 500 available wavelengths. This reduced the total
computation time for all eleven calibrations to about 10 hours (on an Acer Pentium II
333 MHz PC), compared with an estimated 80 hours if all 500 wavelengths had been
searched. Though setting up the eleven calibrations is time consuming, the cumulative
particle size distributions of future samples may be calculated from their NIR spectra
virtually instantaneously.
For each calibration equation, the three chosen wavelengths (Table 2.3) were those
which gave the smallest standard error of calibration (SEC). The optimum wavelengths
were similar for the 30 to 60 percent quantiles, but varied somewhat for the extreme
quantiles. The calibration equations were then used to predict the validation set (n = 23)
to give an indication of the robustness of the method (Table 2.4).
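The exhaustive search described above can be sketched as follows. This is a minimal
illustration on synthetic data, not the program used in this work: the function names,
the intercept term and the degrees-of-freedom convention used for the SEC are all
assumptions made for the example.

```python
import itertools
import math

import numpy as np


def standard_error(y, y_hat, n_terms):
    """Residual standard error with n - n_terms - 1 degrees of freedom (assumed SEC form)."""
    resid = y - y_hat
    return float(np.sqrt(resid @ resid / (len(y) - n_terms - 1)))


def best_three_wavelengths(R, log_d, step=2):
    """Exhaustive search over wavelength triplets; returns (SEC, indices, coefficients)."""
    candidates = range(0, R.shape[1], step)      # every `step`-th wavelength only
    ones = np.ones((R.shape[0], 1))
    best = (np.inf, None, None)
    for idx in itertools.combinations(candidates, 3):
        A = np.hstack([ones, R[:, list(idx)]])   # intercept plus three reflectances
        b, *_ = np.linalg.lstsq(A, log_d, rcond=None)
        s = standard_error(log_d, A @ b, n_terms=3)
        if s < best[0]:
            best = (s, idx, b)
    return best


# Synthetic stand-in for 34 calibration spectra at 20 wavelengths (not thesis data).
rng = np.random.default_rng(0)
R = rng.uniform(0.4, 0.8, size=(34, 20))
log_d = (1.0 + 2.0 * R[:, 2] - 1.5 * R[:, 8] + 0.5 * R[:, 14]
         + rng.normal(0.0, 0.01, size=34))

sec, chosen, coeffs = best_three_wavelengths(R, log_d, step=2)
print(chosen, round(sec, 3))

# Size of the search: triplets drawn from 250 versus 500 wavelengths.
print(math.comb(250, 3), math.comb(500, 3))
```

The number of triplets grows roughly with the cube of the number of wavelengths, so
halving the wavelength set cuts the search about eightfold, in line with the 10 hour
and 80 hour figures quoted above.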
2.9.5 Principal Components Regression
This calibration method required generation of a principal components analysis (PCA)
model. This consists of a set of new variables which are uncorrelated and represent
linear combinations of the original NIR reflectance data.

Table 2.3. MLR wavelengths and PCs selected for each percentage quantile
calibration.

Percentage    MLR wavenumbers/cm⁻¹        PCs
5             4008    9300    9528        28   22   17
10            4008    9300    9528        29   27   14
20            5640    5676    6216        29   27   14
30            4464    9852    9864        29   28   27
40            5736    9852    9864        27   15    1
50            5736    9852    9864        20   15    1
60            5496    9852    9864        20   15    1
70            6024    6948    9168        15   14    1
80            5664    5796    9432        23    9    3
90            5952    6996    8280        28   18    6
95            7632    8532    8664        28   18    6

Table 2.4. MLR & PCR calibration and validation results at various percentage
quantiles.

Percentage                    5       10      20      30      40      50      60      70      80      90      95
MLR
Calibration set (n = 34)
  R                         0.977   0.980   0.984   0.987   0.989   0.988   0.984   0.979   0.972   0.932   0.889
  m                         0.954   0.960   0.968   0.974   0.978   0.975   0.968   0.958   0.945   0.869*  0.791*
  c                         0.065   0.063   0.055   0.048   0.042   0.049   0.065   0.088   0.121   0.301*  0.497*
  SEC (log10(d_x/µm))       0.084   0.071   0.055   0.046   0.039   0.039   0.042   0.046   0.052   0.080   0.104
  CV (%)                    19.3    16.3    12.7    10.6    9.0     9.0     9.7     10.6    12.0    18.4    23.9
Validation set (n = 23)
  R                         0.951   0.951   0.965   0.971   0.959   0.955   0.950   0.959   0.964   0.897   0.822
  m                         0.876   0.950   0.977   0.978   0.984   0.973   0.943   0.986   0.980   0.921   0.734*
  c                         0.140   0.064   0.046   0.021   0.013   0.040   0.108   0.050   0.057   0.216   0.657*
  SEP (log10(d_x/µm))       0.131   0.109   0.074   0.066   0.074   0.073   0.073   0.070   0.061   0.106   0.132
  CV (%)                    30.1    25.1    17.0    15.2    17.0    16.8    16.8    16.1    14.0    24.4    30.4
PCR
Calibration set (n = 34)
  R                         0.969   0.973   0.978   0.980   0.981   0.981   0.976   0.968   0.959   0.898   0.858
  m                         0.939   0.946   0.956   0.960   0.963   0.961   0.953   0.937   0.921   0.806   0.737
  c                         0.086   0.084   0.076   0.074   0.070   0.077   0.096   0.133   0.174   0.445*  0.627*
  SEC (log10(d_x/µm))       0.096   0.082   0.065   0.057   0.051   0.049   0.051   0.057   0.062   0.097   0.116
  CV (%)                    22.1    18.9    15.0    13.1    11.7    11.3    11.7    13.1    14.3    22.3    36.8
Validation set (n = 23)
  R                         0.980   0.981   0.981   0.978   0.984   0.981   0.969   0.965   0.953   0.924   0.842
  m                         1.128*  1.045   0.998   0.969   0.967   0.970   0.977   0.975   0.946   0.932   0.861
  c                        -0.16*  -0.071   0.005   0.053   0.051   0.042   0.024   0.034   0.079   0.092   0.258
  SEP (log10(d_x/µm))       0.085   0.062   0.051   0.050   0.041   0.045   0.056   0.057   0.071   0.094   0.124
  CV (%)                    19.6    14.3    11.7    11.5    9.4     10.4    12.9    13.1    16.3    21.6    28.5

R is the multiple correlation coefficient; m and c are the slope and intercept of plots of
NIR predicted log10 d_x vs. FALLS measured log10 d_x; n is the number of samples in each
data set; * m significantly different from 1, or c significantly different from 0;
CV is the coefficient of variation.
The PCA model was obtained as the product of a score matrix, T, with a loadings
matrix, P, plus a residuals matrix, E, according to equation (1.7.1), from variable mean-
centred spectral data. Regression of FALLS data was as described above for MLR,
except that PC scores were used in place of reflectance values (Naes and Martens,
1988). For each calibration, the 3 PCs selected were those that gave the highest
correlations with the FALLS data (Table 2.3). The total time required to compute the PCs
and PCR calibration equations was much shorter than for MLR, at only about 20
minutes.
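The PCR procedure described above (mean-centred PCA, followed by regression on the three
PC scores most highly correlated with the reference values) can be sketched as follows.
This is an illustrative sketch on synthetic data; the function names, the number of PCs
computed and the use of the absolute correlation coefficient are assumptions, not the
program used in this work.

```python
import numpy as np


def pca(X, n_pcs):
    """Mean-centred PCA by SVD: X - mean = T P' + E. Returns T, P and the mean."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return U[:, :n_pcs] * s[:n_pcs], Vt[:n_pcs].T, mean


def pcr_fit(X, y, n_pcs=10, n_select=3):
    """Regress y on the n_select PC scores most correlated (in absolute value) with y."""
    T, P, mean = pca(X, n_pcs)
    r = np.array([abs(np.corrcoef(T[:, a], y)[0, 1]) for a in range(n_pcs)])
    chosen = np.sort(np.argsort(r)[-n_select:])
    A = np.column_stack([np.ones(len(y)), T[:, chosen]])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mean": mean, "P": P[:, chosen], "b": b, "pcs": (chosen + 1).tolist()}


def pcr_predict(model, X_new):
    """Project new spectra onto the selected loadings and apply the regression."""
    scores = (X_new - model["mean"]) @ model["P"]
    return model["b"][0] + scores @ model["b"][1:]


# Synthetic stand-in data: five latent factors, y depends on factors 1, 3 and 4.
rng = np.random.default_rng(1)
n_cal, n_val, n_wl = 34, 23, 40
scales = np.array([5.0, 3.0, 2.0, 1.0, 0.5])
c = np.array([2.0, 0.0, -3.0, 1.0, 0.0])
L = np.linalg.qr(rng.normal(size=(n_wl, 5)))[0].T      # five orthonormal loading rows

Z = rng.normal(size=(n_cal, 5))
t_cal = np.linalg.qr(Z - Z.mean(axis=0))[0]            # orthonormal, zero-mean scores
t_val = rng.normal(size=(n_val, 5)) / np.sqrt(n_cal)


def spectra(t):
    """Synthetic spectra: constant offset + latent structure + small noise."""
    return 0.5 + (t * scales) @ L + rng.normal(0.0, 1e-3, size=(len(t), n_wl))


X_cal, X_val = spectra(t_cal), spectra(t_val)
y_cal = 1.9 + t_cal @ c + rng.normal(0.0, 0.01, size=n_cal)
y_val = 1.9 + t_val @ c

model = pcr_fit(X_cal, y_cal)
rmse_val = float(np.sqrt(np.mean((pcr_predict(model, X_val) - y_val) ** 2)))
print(model["pcs"], round(rmse_val, 4))
```

Because the PC scores are mutually orthogonal, selecting PCs by their individual
correlation with the reference values is equivalent to selecting those that explain the
most variance in the reference values.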
2.9.6 Calibration And Validation Precision
With both methods, individual calibrations were the most precise at the 40% and 50%
quantiles (Table 2.4). This is clearly seen from the plot of SEC versus percentage
quantile (Fig. 2.7). The falling off in the precision of individual calibrations at the
extreme quantiles most probably reflects the shape of the distribution curves for the
particle-sizes in the calibration sets. The shapes of the distributions become more
skewed at the extreme quantiles (Appendix D, avicelquantilescal).
With both MLR and PCR excellent calibration results were obtained, with low SEC
(Table 2.4). The SECs at each quantile are smaller with MLR; however, the standard
errors of prediction (SEPs) for the independent validation set are smaller with the PCR
model at all quantiles except 80% (Table 2.4). This suggests that the PCR model is more
robust. Table 2.4 also gives the slopes and intercepts for the plots of NIR predicted
$\log_{10} d_x$ versus FALLS measured $\log_{10} d_x$ values at each quantile. The slopes
and intercepts were not significantly (5% probability level) different from 1 and 0
respectively, apart from a few values (marked with an asterisk) which occurred at some of
the extreme quantiles.
Fig. 2.7. Standard errors of calibration (SEC) versus cumulative percentage
quantile: (A) MLR, and (B) PCR.
2.9.7 Cumulative Particle Size Distributions
The percentage quantile value was plotted against the NIR predicted $\log_{10} d_x$ of each
sample in the calibration and validation sets to give cumulative particle-size distribution
curves for both the MLR and PCR methods. The MLR and PCR results for the first 4
validation samples are shown in Fig. 2.8, which also shows the FALLS measured
cumulative percentage frequency distributions overlaid. Predicted distributions for both
calibration methods closely follow those obtained by FALLS, although PCR predicted
distributions match the FALLS measured distributions more closely than with MLR.
In this work, the number of quantiles at which calibration equations were set up was
restricted to 11. In principle, more or fewer could be used. With the present data sets the
errors do not justify the need for smaller intervals (Table 2.4).
[Figure: eight panels; y-axis: Cumulative % frequency.]
2.10 Measurement of The Percentage Frequency Particle Size Distribution
In this Section, the percentage frequency particle size distribution of microcrystalline
cellulose is measured by NIRS. Calibrations of NIR and laser diffraction analysis
reference particle size measurements of this powdered material were produced by
partial least squares regression (PLSR) (Section 1.7.4).
The spectra showed the effects of multiple scatter, as described in Section 2.8.2. This is
due to differences in particle size and surface moisture content (Kortum, 1969)
(Fig. 2.9).
2.10.2 Spectral Data Pre-treatments
A range of data pre-treatments commonly employed in NIR spectrometry were applied
to the spectral data and their effects on calibration and prediction precision were
compared with results for raw absorbance (log(1/R)) data. The data pre-treatments
tested were:
1. SNV;
2. DT;
3. SNV-DT;
4. Sg2d11.
Owing to the increase in noise in the Savitzky-Golay smoothed second derivative
spectra beyond 2200 nm, the wavelength range used with this pre-treated data was
truncated to 2200 nm.*
*Due to the Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm (n = 546 data points).
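The four pre-treatments listed above can be sketched as follows. This is a minimal
illustration, not the software used in this work: the function names, the use of the
sample standard deviation in SNV, the quadratic fitting polynomial in the Savitzky-Golay
filter and the synthetic wavelength grid are assumptions. The 11-point window is
consistent with the loss of five data points at the start of the wavelength range noted
in the footnote.

```python
import numpy as np


def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)


def detrend(X, wavelengths, degree=2):
    """Subtract a least-squares polynomial baseline (quadratic by default) from each row."""
    z = (wavelengths - wavelengths.mean()) / np.ptp(wavelengths)  # rescaled for conditioning
    coefs = np.polyfit(z, X.T, degree)             # one column of coefficients per spectrum
    return X - (np.vander(z, degree + 1) @ coefs).T


def savgol_second_derivative(x, window=11, polyorder=2, spacing=1.0):
    """Second derivative of a polynomial fitted by least squares to each moving window."""
    half = window // 2
    z = np.arange(-half, half + 1) * spacing
    V = np.vander(z, polyorder + 1, increasing=True)    # columns 1, z, z**2, ...
    weights = 2.0 * np.linalg.pinv(V)[2]                # 2 x quadratic coefficient at z = 0
    return np.convolve(x, weights[::-1], mode="valid")  # loses `half` points at each end


# Synthetic check data on a hypothetical 2 nm grid: two purely quadratic "spectra".
wl = np.arange(1100.0, 2500.0, 2.0)
baseline = 0.2 + 1e-4 * wl + 2e-8 * wl ** 2
X = np.vstack([baseline, 1.5 * baseline + 0.1])

X_snv = snv(X)                                   # 1. SNV
X_dt = detrend(X, wl)                            # 2. DT
X_snv_dt = detrend(snv(X), wl)                   # 3. SNV-DT
X_sg = np.apply_along_axis(savgol_second_derivative, 1, X, 11, 2, 2.0)   # 4. Sg2d11
print(X_snv.shape, X_dt.shape, X_sg.shape)
```

On these test spectra the detrend removes the quadratic baseline entirely, and the
second derivative is constant along each spectrum, which gives a simple check that the
filters behave as intended.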
Fig. 2.9. NIR absorbance spectra of microcrystalline cellulose samples (n = 113).
2.10.3 Preliminary Data Analysis
In Section 2.8 it was shown that NIR data may be calibrated to measure the percentage
cumulative frequency particle size distribution of this powdered material by MLR and
by PCR. That was an extension of the NIR method for calibration of median particle
size, described in Section 2.7. With the data sets and chemometric techniques used
previously, it was not found possible to produce accurate calibrations for the percentage
frequency particle size distribution by MLR or PCR. However, with this larger
NIR-particle size data set, preliminary investigation showed that the percentage
frequency distribution could be calibrated by PLSR. Conversely, it was not possible to
calibrate the larger data set for the percentage cumulative particle size distribution by
PLSR.
A Malvern Mastersizer X Laser Diffraction instrument was used to provide percentage
frequency and percentage cumulative frequency particle size distributions. Each
measured distribution comprised 32 different channels, with intervals which follow a
geometric progression. For calibration purposes, the mean of the low and
high particle sizes for a given channel was used (range: 0.9 - 448.34 µm). Preliminary
investigation of the total particle size data set revealed that the largest particle size
channel had values which were zero for all samples. Since the variance in this channel
was zero, and therefore could not be modelled by PLSR, it was removed from the
particle size data set, leaving 31 channels for calibration.
2.10.4 Partial Least Squares Model Generation
Biased regression models for particle size distribution data were produced with raw and
pre-treated NIR spectral data by partial least squares regression (PLSR2), according to
equations (1.7.18) to (1.7.23). The number of PLS dimensions, A, for each of the
pre-treated data sets (X, Y) was estimated by cross-validation, using the PRESS statistic.
For this, the first 110 of 113 observations (X, Y) were divided into 11 subsets of 10
observations. Next, PLS models of rank A were calculated for all
combinations of 10 of the 11 subsets of data. With each PLS model, the remaining
subset of NIR spectral and particle size data was used to test the goodness of fit of the
model (A PLS dimensions) by measuring the sum of squares between model predicted
and reference particle size data. This approach of dividing the data into subsets was
preferable to 'leave-one-out' cross-validation, as it required considerably less computation
time (less than one minute compared with 30 minutes for a 'leave-one-out'
cross-validation). The number of PLS components required to extract the information from X
and Y was that number A with the lowest PRESS value.
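The subset cross-validation described above can be sketched as follows. This is an
illustrative sketch on synthetic data: the PLS2 routine is a generic textbook variant
(weights taken from the dominant singular vector of X'Y, followed by deflation) and is
not necessarily identical to equations (1.7.18) to (1.7.23); the function names and the
use of contiguous blocks of ten observations are also assumptions.

```python
import numpy as np


def pls2_fit(X, Y, n_components):
    """Generic PLS2: returns column means of X and Y and the regression matrix B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]   # weight vector
        t = E @ w                                                  # scores
        tt = t @ t
        p, q = E.T @ t / tt, F.T @ t / tt                          # X and Y loadings
        E, F = E - np.outer(t, p), F - np.outer(t, q)              # deflation
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.column_stack(W), np.column_stack(P), np.column_stack(Q)
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return x_mean, y_mean, B


def press_by_blocks(X, Y, max_components, n_blocks=11, block_size=10):
    """PRESS for 1..max_components, leaving out one block of observations at a time."""
    n_used = n_blocks * block_size
    X, Y = X[:n_used], Y[:n_used]
    blocks = np.arange(n_used).reshape(n_blocks, block_size)
    press = np.zeros(max_components)
    for held_out in blocks:
        train = np.setdiff1d(np.arange(n_used), held_out)
        for a in range(1, max_components + 1):
            x_mean, y_mean, B = pls2_fit(X[train], Y[train], a)
            Y_hat = y_mean + (X[held_out] - x_mean) @ B
            press[a - 1] += np.sum((Y[held_out] - Y_hat) ** 2)
    return press


# Synthetic stand-in data: 113 observations driven by three latent variables.
rng = np.random.default_rng(2)
T = rng.normal(size=(113, 3))
loadings = rng.normal(size=(3, 20))
C = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]])
X = T @ loadings + rng.normal(0.0, 0.01, size=(113, 20))
Y = T @ C + rng.normal(0.0, 0.01, size=(113, 3))

press = press_by_blocks(X, Y, max_components=6)
best_A = int(np.argmin(press)) + 1
print(np.round(press, 3), best_A)
```

With eleven blocks, only eleven models are fitted per value of A, compared with one model
per observation for leave-one-out cross-validation, which accounts for the saving in
computation time.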
2.10.5 Cross Validation
With the exception of the model produced with Savitzky-Golay second derivative data, 6
PLS components were required to fit the data and give the lowest PRESS value (Table
2.5). The Savitzky-Golay smoothed second derivative evidently removed more scatter
and baseline drift information from the NIR data than the other pre-treatments, requiring
only 4 components to model the data. With all data pre-treatments tested, typical
idealised plots of PRESS versus number of PLS components were obtained, with clear
minima. This is shown for the absorbance data set in Fig. 2.10. The SNV transformation
provided the model with the lowest PRESS value (Table 2.5). The other pre-treated NIR
data sets also produced models with low PRESS values. Absorbance data gave the
highest PRESS value, which was 33% higher than for the model derived from
SNV data (Table 2.5).
Table 2.5. PLS results of cross validation, calibration and prediction for the NIR
data sets tested.

Data                              PLS components   PRESS (%)^a   MSEP (%)^b   RMSEP (%)^c
Absorbance                              6            0.21960       0.8181       0.9045
SNV                                     6            0.16478       1.0672       1.0331
SNV detrend                             6            0.16723       1.1609       1.0774
Detrend                                 6            0.19251       1.1351       1.0654
Savitzky-Golay 2nd derivative           4            0.20079       2.8711       1.6944

a PRESS is the predicted residual error sum of squares between NIR predicted and reference measured particle size data.
b Mean square error of prediction (MSEP) for predicted particle size data, rescaled by standard deviation and mean of reference
particle size data used to calculate model.
c Root mean square error of prediction (RMSEP) for predicted particle size data, rescaled by standard deviation and mean of
reference particle size data used to calculate model.

Fig. 2.10. Predicted residual error sum of squares (PRESS) for successive PLS
components extracted using NIR absorbance and laser diffraction data.
2.10.6 Calibration And Validation Precision
To test the robustness of the PLS models, the NIR-particle size data set was split into a
calibration and a validation set. For PLS modelling, 90 spectra and particle size results
(80%) were randomly selected for calibration. The remaining 23 samples (20%) were
used to test the predictive abilities of the models. PLS models for raw and pre-treated
NIR-particle size data were created, each having the number of components determined
by cross validation. The predictive ability of the models was determined by calculation
of the mean square error of prediction (MSEP) (Beebe et al, 1998) and root mean square
error of prediction (RMSEP) (Beebe et al, 1998) between PLS model predicted and
reference percentage frequency, y, over all samples, m, and channels, n:
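$$\mathrm{MSEP} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{y}_{ij} - y_{ij}\right)^2 \qquad (2.10.1)$$

$$\mathrm{RMSEP} = \sqrt{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{y}_{ij} - y_{ij}\right)^2} \qquad (2.10.2)$$

where $\hat{y}_{ij}$ and $y_{ij}$ are the predicted and reference percentage frequencies
for sample $i$ and channel $j$.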
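As a worked illustration of equations (2.10.1) and (2.10.2), the two statistics can be
computed as below. The function names and the small arrays are hypothetical and serve
only to show the arithmetic.

```python
import numpy as np


def msep(y_pred, y_ref):
    """Mean square error of prediction over all samples (rows) and channels (columns)."""
    y_pred, y_ref = np.asarray(y_pred, float), np.asarray(y_ref, float)
    return float(np.mean((y_pred - y_ref) ** 2))


def rmsep(y_pred, y_ref):
    """Root mean square error of prediction: the square root of MSEP."""
    return float(np.sqrt(msep(y_pred, y_ref)))


# Hypothetical percentage frequencies for m = 2 samples and n = 3 channels.
y_ref = [[2.0, 5.0, 3.0], [1.0, 6.0, 3.0]]
y_pred = [[2.5, 4.5, 3.0], [2.0, 6.0, 3.0]]

print(msep(y_pred, y_ref))    # (0.25 + 0.25 + 0 + 1 + 0 + 0) / 6 = 0.25
print(rmsep(y_pred, y_ref))   # sqrt(0.25) = 0.5
```

The same relationship holds between the tabulated values: the square root of the MSEP
for absorbance data (0.8181) is 0.9045, the corresponding RMSEP in Table 2.5.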
The best predictive model was obtained using absorbance data, which produced the lowest
prediction errors: MSEP = 0.82% and RMSEP = 0.90%. The SNV, DT and SNV-DT
pre-treatments also produced models which showed low prediction errors (RMSEP range:
1.0 to 1.1%); however, these were 14 to 19% higher than those obtained with
absorbance data (Table 2.5). The Sg2d11 transformation produced a model with far
higher prediction errors, with an RMSEP 87% higher than that obtained with absorbance
data (Table 2.5). Plots of percentage frequency particle size distributions for the 23
validation samples are shown in Fig. 2.11 and show the results obtained by laser
diffraction with the NIR predicted results overlaid. The linear association between NIR
predicted and laser diffraction measured particle size percentage frequency values for
the entire validation set (all 23 samples and 31 channels) was found to have a highly
significant correlation, r, of 0.973 (n = 713, p = 0.005), with slope of 0.940 and
intercept of 0.194% (Fig. 2.12).

Fig. 2.11. Plots of laser diffraction measured percentage frequency distributions
for the validation samples (n = 23) with NIR values overlaid (+).
Fig. 2.12. NIR predicted percentage frequencies versus Malvern Mastersizer
measured percentage frequencies for the 23 validation samples (n = 713).
2.11 Classification of Excipient Grades by Cluster Analysis Methods
Cluster analysis and pattern recognition methods have been used to identify
pharmaceutical excipients by NIRS (Candolfi et al, 1999a, 1999b). The multivariate
techniques used were soft independent modelling of class analogy (SIMCA),
wavelength distance, k nearest neighbour (knn), Hotelling's $T^2$ control ellipses and
triangular potential functions (Candolfi et al, 1999a, 1999b). However, these
multivariate methods have not been investigated for classification of different grades of
materials. In this Section, grades of lactose monohydrate and microcrystalline cellulose
are classified and identified by cluster analysis (Massart et al, 1988) of their NIR
spectra. Two different chemometric methods are compared: knn, using reflectance
values at all combinations of two wavelengths and score values at combinations of two
PCs, and Hotelling's $T^2$ control ellipses of PC scores.
2.11.1 Near Infrared Spectra
The reflectance spectra for all samples showed characteristic non-uniform baselines
arising from multiple scatter. Between grades of the same material, the differences in
offset and baseline curvature were consistent with the median particle sizes of the
grades. Hence, with the two grades of microcrystalline cellulose, the spectra fall into two
groups (Fig. 2.13A). With lactose monohydrate, the Regular grade material from two
different manufacturers had median particle sizes greater and less than for lactose
Fastflo, hence spectra for the Regular grade appear at both higher and lower reflectance
than for Fastflo (Fig. 2.13B).
Fig. 2.13. Near infrared reflectance spectra of microcrystalline cellulose and
lactose monohydrate grades: A) microcrystalline cellulose (grades PH101 and
PH102, 5909 to 7406 cm⁻¹), B) lactose monohydrate (grades Regular and Fastflo,
4008 to 9996 cm⁻¹).
2.11.2 Spectral Data Pre-treatments
A range of scatter correcting data pre-treatments commonly employed in NIR
spectrometry were applied to the spectral data and their effects on classification were
compared against results for reflectance data. The data pre-treatments tested were:
1. SNV;
2. DT;
3. SNV-DT;
4. Sg2d11.
2.11.3 k Nearest Neighbour
k Nearest neighbour (knn) is a mathematically very simple non-parametric classification
procedure (Massart et al, 1988). It involves computing the Euclidean distance between
an unknown sample's pattern vector and n samples in a given training set, resulting in n
distances. The Euclidean distance, D, between sample p and q is given by:

$$D_{pq} = \sqrt{\sum_{j=1}^{m}\left(x_{pj} - x_{qj}\right)^2} \qquad (2.11.1)$$

where m represents the number of variables. The unknown sample is assigned to
whichever training class contains its k nearest samples. The value of k is determined by
optimisation to give the best prediction ability. Small values are normally preferred, with
typical values for k of 3 or 5 having been reported. It has also been reported that with
large data sets this method is capable of yielding classification results close to or better
than more complicated methods, despite its simplicity (Massart et al, 1988). In this work,
classification has been attempted using spectral values at pairs of wavelengths and
principal components (PCs) scores on two components. With both materials, each
sample of a grade has been treated as an unknown and classified according to the class
of its k nearest neighbours.
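The procedure of treating each sample in turn as an unknown can be sketched as follows.
This is a minimal illustration on synthetic two-variable data, not the program used in
this work; the function names, the class labels attached to the synthetic points and
the tie-breaking rule for even k are assumptions.

```python
import numpy as np


def knn_predict(x, X_train, y_train, k=1):
    """Assign x to the majority class among its k nearest training samples."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))       # Euclidean distance, eq. (2.11.1)
    nearest = y_train[np.argsort(d, kind="stable")[:k]]
    classes, votes = np.unique(nearest, return_counts=True)
    tied = classes[votes == votes.max()]
    # Assumed tie-break: among tied classes, take the one with the closest sample.
    return next(c for c in nearest if c in tied)


def leave_one_out(X, y, k=1):
    """Treat each sample in turn as the unknown; count correct assignments per class."""
    correct = {str(c): 0 for c in np.unique(y)}
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        if knn_predict(X[i], X[keep], y[keep], k) == y[i]:
            correct[str(y[i])] += 1
    return correct


# Synthetic stand-in: two well separated groups of 15 points in two variables.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(15, 2)),
               rng.normal([5.0, 5.0], 0.3, size=(15, 2))])
y = np.array(["PH101"] * 15 + ["PH102"] * 15)

result_k1 = leave_one_out(X, y, k=1)
result_k2 = leave_one_out(X, y, k=2)
print(result_k1, result_k2)
```

The counts returned per class correspond to the "correct identification" columns
reported in the tables of knn results.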
2.11.4 Wavelength And Principal Components Selection
The algorithm used searched all possible combinations of pairs of wavelengths and, with
PCA, pairs of PCs. At each of the two-wavelength and two-PC combinations, the
classification ability was determined (i.e. the numbers of correct identifications and the
numbers of incorrect identifications). Combinations yielding 100% correct classification
for the two groups and their total number were recorded. The procedure was also
repeated for different data pre-treatments.
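The search over all pairs of variables can be sketched as follows. This is a minimal
illustration for k = 1 on synthetic data; the function names and the data are
assumptions, and the sketch records only those pairs giving 100% correct
leave-one-out classification.

```python
import itertools
import math

import numpy as np


def loo_1nn_all_correct(X2, labels):
    """True if every sample's nearest other sample belongs to the same class (k = 1)."""
    diff = X2[:, None, :] - X2[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))        # all pairwise Euclidean distances
    np.fill_diagonal(D, np.inf)                 # a sample may not be its own neighbour
    return bool(np.all(labels[D.argmin(axis=1)] == labels))


def suitable_pairs(X, labels):
    """All pairs of variables (columns of X) that give 100% correct classification."""
    return [pair for pair in itertools.combinations(range(X.shape[1]), 2)
            if loo_1nn_all_correct(X[:, list(pair)], labels)]


# Synthetic stand-in: six variables, of which only variables 1 and 4 separate the classes.
rng = np.random.default_rng(4)
labels = np.array([0] * 15 + [1] * 15)
X = rng.normal(size=(30, 6))
X[labels == 1, 1] += 20.0
X[labels == 1, 4] += 20.0

pairs = suitable_pairs(X, labels)
print(len(pairs), pairs)

# Number of pairs available from 500 and from 489 wavelengths.
print(math.comb(500, 2), math.comb(489, 2))
```

The number of pairs to be tested grows with the square of the number of variables,
which is why a search over pairs of a few PCs is far quicker than a search over pairs
of several hundred wavelengths.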
The choice of using two wavelengths and PCs was made to maintain mathematical
simplicity and also because principal components analysis often results in clusters
which are discernible on plots of two PCs, frequently with the first two components.
Hence, the results of the two wavelength and two PC approaches could be directly
compared.
The number of wavelengths included in the search was 500 for all pre-treatments except
the Savitzky-Golay second derivative, which used 489 wavelengths. With PCA, the number
of PCs searched depended on the rank of the model. With each model, this was
determined by calculation of the % sum of squares (%SS) (Lindberg et al, 1983) and the
R statistic (Wold, 1978) for successive PCs extracted (Tables 2.6 & 2.7).
2.11.5 k-Nearest Neighbour With Spectral Wavelengths
For the two grades of microcrystalline cellulose, 100% correct identification of the
samples was obtained using pairs of wavelengths for all spectral pre-treatments (Table
2.8). Optimisation of k produced excellent classification results with a value of just one
or two nearest neighbours. With a value for k of two, reflectance data yielded only 12
pairs of wavelengths out of 124,750 possible combinations which were suitable (0.01%)
(Table 2.8). The best data pre-treatment was the Savitzky-Golay second derivative. With k
equal to one, 100% correct identification of the two grades was obtained for 8381
combinations of wavelength pairs out of 119,316 possible combinations (7%) (Table
2.8).

Table 2.6. knn results for microcrystalline cellulose grades PH101 (n = 15) and
PH102 (n = 15) using pairs of principal components (PCs).

Data                              PCs extracted   %SS^a    Correct identification   Example PCs
                                                           PH101     PH102
(k = 1)
Reflectance                             4         99.33      15        13
SNV                                     4         94.36      15        13
Detrend                                 4         94.31      12        14           -
SNV detrend                             4         93.85      13        14
Savitzky-Golay 2nd derivative           4         65.86      15        15           1, 4
(k = 2)
SNV                                     4         94.36      14        15
Detrend                                 4         94.31       8        14
SNV detrend                             4         93.85      13        15           -
Savitzky-Golay 2nd derivative           4         65.86      14        15

Table 2.7. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo
(n = 9) using pairs of principal components (PCs).

Data                              PCs extracted   %SS^a    Correct identification   Example PCs
                                                           Regular   Fastflo
(k = 1)
SNV                                     3         94.68       7         9
Detrend                                 4         96.07       9         9           1, 2 and 2, 3
SNV detrend                             3         92.72       7         9           -
Savitzky-Golay 2nd derivative           3         70.21       8         9
(k = 2)
SNV                                     3         94.68       6         9
Detrend                                 4         96.07       8         9
SNV detrend                             3         92.72       6         9
Savitzky-Golay 2nd derivative           3         70.21       7         9

a %SS is the sum of squares of the differences between the NIR data and the NIR data
estimated by the principal components analysis model.
For the two grades of lactose monohydrate, 100% correct identification was obtained
with all pre-treatments except for reflectance data (Table 2.9). These results could be
obtained with a value for k of one or two. The best data pre-treatment for this material
was the quadratic baseline detrend. This returned 34,543 possible combinations of
wavelength pairs that yielded 100% correct identification (27.7%) with k = 1 (Table
2.9).
For both materials, the computation time of a full two-wavelength search was
approximately 47 minutes. However, the grade of future samples could be
determined from their NIR spectra virtually instantaneously.
2.11.6 k-Nearest Neighbour With Principal Components Scores
Between two and four PCs were required to model the pre-treated data sets for both
materials (Tables 2.6 & 2.7). These numbers of PCs were used in the two PC search.
For microcrystalline cellulose grades, best classification was obtained using PCs 1 and 4
(Fig. 2.14A) and a value of one for k*. The only pre-treated data set which yielded 100%
correct identification for all samples in both classes was the Savitzky-Golay second
derivative data set. This pre-treatment agrees with the optimum pre-treatment for the
method using pairs of wavelengths.
*However, a value for k of 4 also yielded the same result.
Table 2.8. knn results for microcrystalline cellulose grades PH101 (n = 15) and
PH102 (n = 15) using pairs of wavelengths.

Data                              Correct identification   Suitable wavelength   Example wavelength
                                  PH101     PH102          pairs                 pair/cm⁻¹
(k = 1)
Reflectance                        15        15               685                8868, 9900
SNV                                15        15              3785                9720, 9935
Detrend                            15        15              2416                9072, 9996
SNV detrend                        15        15              2555                9096, 9804
Savitzky-Golay 2nd derivative      15        15              8381                9792, 9852
(k = 2)
Reflectance                        15        15                12                8424, 9672
SNV                                15        15              3337                9768, 9936
Detrend                            15        15              2363                9072, 9996
SNV detrend                        15        15              1991                9096, 9804
Savitzky-Golay 2nd derivative      15        15              5908                9792, 9852

Table 2.9. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo
(n = 9) using pairs of wavelengths.

Data                              Correct identification   Suitable wavelength   Example wavelength
                                  Regular   Fastflo        pairs                 pair/cm⁻¹
(k = 1)
Reflectance                         9         8                                  8916, 9936
SNV                                 9         9              2602                9756, 9828
Detrend                             9         9             34543                9804, 9924
SNV detrend                         9         9              5101                9756, 9840
Savitzky-Golay 2nd derivative       9         9              6144                9792, 9852
(k = 2)
Reflectance                         8         9                 -                9624, 9996
SNV                                 9         9               364                8028, 8136
Detrend                             9         9              7942                9096, 9432
SNV detrend                         9         9               414                9240, 9360
Savitzky-Golay 2nd derivative       9         9              6144                9792, 9852
Fig. 2.14. k nearest neighbour (knn) and Hotelling's $T^2$ classification (95% control
ellipses) of microcrystalline cellulose and lactose monohydrate grades using
principal components scores. Microcrystalline cellulose (Savitzky-Golay second
derivative of reflectance data): A) knn (100% correct identification, k = 1), B)
Hotelling's $T^2$ control ellipses (overlapped); lactose monohydrate (quadratic
baseline corrected reflectance data): C) knn (100% correct identification, k = 1), D)
Hotelling's $T^2$ control ellipses (overlapped).
For lactose monohydrate grades, the best classification results were obtained using PCs
1 and 2 (Fig. 2.14C) or 2 and 3, a value of one for k and quadratic baseline corrected
spectra (Table 2.7). This yielded 100% correct identification for all samples in both
grades. This optimum pre-treatment agrees with that found for the method using pairs
of wavelengths.
Overall, the method using PC scores was preferred owing to the greatly reduced
computation time required to set up the method (approximately one minute). Prediction
of the grades of future samples could be achieved virtually instantaneously.
2.11.7 Hotelling's $T^2$ Control Ellipses
As a comparison with the predictive ability of the knn method, Hotelling's $T^2$ control
ellipses (Montgomery, 1997) were produced for both materials with the PCs that
yielded best identification, as described in Section 1.8.1. The confidence level of the
ellipses was set to 95%. With both microcrystalline cellulose and lactose monohydrate,
the 95% Hotelling's $T^2$ ellipses for the two grades were found to overlap, resulting in
some incorrect assignment of samples for each grade (Fig. 2.14B & D). The two materials are
chemically different and should belong to different multinormal distributions.
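The test of whether a sample falls inside a Hotelling's $T^2$ control ellipse for two PC
scores can be sketched as follows. This is an illustrative sketch only: the function
names and the data are hypothetical, and the control limit shown is the common form for
individual future observations, which is assumed here and may differ from the expression
in Section 1.8.1. For two variables, the required F quantile has a closed form, so no
statistical tables are needed.

```python
import numpy as np


def f_quantile_2(conf, dof2):
    """Quantile of the F distribution with (2, dof2) degrees of freedom (closed form)."""
    return 0.5 * dof2 * ((1.0 - conf) ** (-2.0 / dof2) - 1.0)


def t2_limit(n, conf=0.95):
    """Assumed T2 limit for a future observation, with p = 2 variables and n samples."""
    p = 2
    return p * (n - 1) * (n + 1) / (n * (n - p)) * f_quantile_2(conf, n - p)


def hotelling_t2(scores, new_points):
    """T2 distance of each new point from the mean of the reference scores."""
    mean = scores.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    d = np.atleast_2d(new_points) - mean
    return np.einsum("ij,jk,ik->i", d, S_inv, d)


# Hypothetical scores on two PCs for 15 reference samples of one grade.
rng = np.random.default_rng(5)
scores = rng.normal(size=(15, 2))

limit = t2_limit(len(scores), conf=0.95)
test_points = np.vstack([scores.mean(axis=0), scores.mean(axis=0) + 100.0])
inside = hotelling_t2(scores, test_points) <= limit
print(round(limit, 2), inside)
```

A point is assigned to the modelled class when its $T^2$ value does not exceed the limit;
where the ellipses of two classes overlap, a point can satisfy this condition for both.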
2.11.8 Identification of The Materials Using Hotelling's $T^2$ Control Ellipses
To confirm whether Hotelling's $T^2$ control ellipses are able to distinguish the two
materials, principal components models were constructed for each material using the
pre-treatments found optimum with knn. Hotelling's $T^2$ control ellipses were produced
for the samples of the material (both grades) used to calculate the PC model. With both
PC models (microcrystalline cellulose and lactose monohydrate), the other material's
pre-treated spectra were centred using the mean of the data set for the model. These
were then projected into the pattern space. For both materials, 100% correct
identification of the modelled class was obtained, with all samples of the non-modelled
material falling outside the 99% control ellipse (Fig. 2.15A & B).
2.12 Conclusion
The shapes of near infrared spectra (i.e. baseline curvature, spectral offset, absorption
peak height and width) of powdered pharmaceutical materials are clearly influenced by
the particle size distribution of the material. This enables calibration of the data to
measure and classify the particle size of materials. Extraction of particle size
distribution information requires multivariate calibration of a reference set of NIR
spectral data with reference particle size data. This was found to be most accurate with
raw spectral data and PLSR. Alternatively, materials may be classified by grade
using simple non-parametric methods, e.g. knn. This method does not require
reference particle size data; however, scatter correction of the spectral data was found to
be necessary for the two materials studied, most likely to reduce
within-class variance and increase between-class variance. As the optimum scatter
correction was found to be different for the two materials studied in this chapter,
implementation of this method for other materials requires that the optimum scatter
correction be determined first.
Fig. 2.15. Identification of microcrystalline cellulose and lactose monohydrate
using Hotelling's $T^2$ control ellipses (99% control ellipses) of principal components
(PC) models for each material. Microcrystalline cellulose PC model (Savitzky-Golay
second derivative data used): A) 100% correct identification of microcrystalline
cellulose (n = 30). Lactose monohydrate PC model (quadratic baseline detrend
data used): B) 100% correct identification of lactose monohydrate (n = 18).
CHAPTER 3
Multivariate Statistical Process Control of a
Pharmaceutical Manufacturing Process
Using Principal Components Analysis of
Near Infrared Measurements
3.1 Introduction
Traditional quality control of pharmaceutical manufacturing processes involves
collection and analysis of process intermediates and final product to ensure that the
product meets the requirements of its specification (Candolfi et al, 1999a, 1999b;
Sekulic et al, 1998). This approach to process monitoring is time consuming and does
not monitor the performance of the process over time.
In this chapter, an alternative method for pharmaceutical quality control that enables
monitoring of the process operating performance of a tabletting process is discussed. In
Section 3.5, near infrared spectrometric analysis is used to provide process based
measurements at each process stage. Section 3.7 describes the use of the data reduction
method, principal components analysis (PCA), for the multivariate statistical process
control methods used. In Sections 3.8 and 3.9, multivariate models are generated and
monitored for the first process stage (blend) and second process stage (tablet). The
combined process measurements are used in Section 3.10 to develop an overall process
fingerprint from which unusual product batches may be identified. In Section 3.12
conclusions are drawn as to the advantages of this form of quality control over
traditional methods.
3.2 Background And Overview of The Process
The manufacturing process of the marketed tablet can be viewed as three steps:
1. Raw materials dispensing,
2. Blending of excipients and active,
3. Tabletting of blend.
The powdered raw materials used in the blends are: microcrystalline cellulose, sodium
starch glycolate, dibasic calcium phosphate anhydrous, drug substance and magnesium
stearate. The first four of these raw materials are blended together for a set time before
being passed through a Fitzmill screen, to remove lumps from the blender contents.
Magnesium stearate is then added to the blender, and the contents are further blended
for a specified time. At the end of the blend stage, samples of blend are collected for
analysis (usually a three-point sample) by traditional analytical techniques (Section 3.4)
before proceeding to tabletting. At this stage, the process can be delayed as samples
must be analysed in the laboratory, away from the production area.
Tabletting of the blend produces one of two strengths of tablet, depending on the fill
weight and size of die chosen. Samples of tablets are then analysed by reference
analytical methods (See Section 3.4).
The pharmaceutical process studied was one which was normally well controlled.
However, during a period of manufacture, a number of batches of blends and tablets
were produced which were of lower quality. Some of these problem batches were
identifiable from reference analysis of the blends, however most could not be. These
produced batches of tablets which were friable, had high average tablet thickness and
which showed prolonged tablet dissolution time. All of these unusual batches of tablets
had analytical results within the limits of the product specification; however, they each
showed results close to the limits and were therefore considered unusual.
3.3 Materials
A total of 205 production batches of blends and tablets were examined. These were
supplied by the manufacturer (Pfizer Ltd., Pfizer Central Research, 1 Ramsgate Road,
Sandwich, Kent, CT13 9NJ, UK.).
Blends were analysed for 193 of the batches studied.
Of these batches, 13 were re-blended batches (as a result of unusual reference analysis
results) and one was a placebo batch. With each batch of blend, specimens were
supplied which were taken from different locations within the blender (Flobin). The
number of these sampling locations was either three or seven (the latter if the batch
was re-blended). Samples were collected by the Quality Assurance
laboratory over a six year period. A total of 1881 blend NIR absorbance spectra were
measured.
The tablets obtained were of two strengths of the drug substance. Both strengths are
manufactured from an identical blend composition and hence differ in size. The number
of batches used was 44 for the lower strength tablets and 43 for the higher strength
tablets. The tablets themselves were white and were embossed with the Pfizer logo on
one side and embossed with distinguishing lettering on the other side.
For the two strengths of tablet, 39 batches of lower strength tablets were supplied with
their corresponding blends and 41 batches of the higher strength tablets were supplied
with their corresponding blends.
3.4 Reference Analytical Data
Reference laboratory data were provided for most batches of blends and tablets.
For blended powders, the variables measured were:
1. drug substance content/mg g⁻¹;
2. blend uniformity (%);
3. moisture content (%);
4. moisture deviation (%);
5. UV specific absorbance, A(1%, 1 cm), of a 1% filtered solution of 1 cm
pathlength.
Blend uniformity is the maximum absolute difference in drug substance contents, of
replicate assays for a batch, from their mean value, expressed as a percentage. Moisture
deviation is the maximum absolute difference in moisture contents of replicate assays
for a batch from their mean value, expressed as a percentage. These tests were performed
for blends used to produce the lower and higher strength tablets (n= 193 batches).
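The definitions of blend uniformity and moisture deviation given above can be illustrated with a short sketch. It is written in Python with NumPy rather than the Matlab used in this work, and it assumes that "expressed as a percentage" means relative to the batch mean, which the text does not state explicitly. The function name and the example values are hypothetical.

```python
import numpy as np

def max_abs_deviation_percent(replicates):
    """Largest absolute deviation of replicate assays from their mean,
    expressed as a percentage of that mean (assumed denominator)."""
    values = np.asarray(replicates, dtype=float)
    mean = values.mean()
    return float(np.max(np.abs(values - mean)) / mean * 100.0)

# Hypothetical three-point blend sample (drug substance content, mg per g).
example = max_abs_deviation_percent([24.8, 25.0, 25.2])
```

The same function applies to replicate moisture contents to give the moisture deviation.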
For tablets, the variables measured were:
1. drug substance content per tablet/mg;
2. content uniformity (%);
3. moisture (%);
4. dissolution (%);
5. disintegration/secs;
6. average tablet weight/mg;
7. average tablet hardness/kPa;
8. average tablet thickness/mm;
9. friability/mg.
The tablet tests are standard pharmacopoeial tests and were performed for the lower
strength tablets (n = 44 batches) and for the higher strength tablets (n = 43 batches).
Summaries of the reference analytical data for the blends (Section 3.8) and lower and
higher strength tablets (Section 3.9) are provided in Tables 3.1, 3.2 and 3.3 respectively.
Summaries of the combined blend and tablet data sets (multiway data sets*) used in
Section 3.10 are provided in Tables 3.4 and 3.5 respectively. The batch numbers of
product, and their indices in data sets (blend (Section 3.8), lower strength tablet (Section
3.9), higher strength tablet (Section 3.9), and multiway blend and tablet data sets
(Section 3.10)) are provided in Table 3.6 (N.B. some C. of A. reference analysis results
were missing for some batches).
* Multiway data sets are three dimensional data sets that may be unfolded to form a two-dimensional array.
Table 3.1. Summary of blend C. of A. data (n = 193 batches).
Variable                              Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg g⁻¹         24.98       0.41                25.8         24.2         185
Blend uniformity (%)                  0.88        0.63                3.0          0.0          185
Moisture content (%)                  2.98        0.23                4.0          -            185
Moisture deviation (%)                2.23        1.64                5.0          0.0          172
A(1%, 1 cm)                           115.32      1.19                -            -            182
Table 3.2. Summary of lower strength tablet C. of A. data (n = 44 batches).
Variable                              Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg             4.98        0.08                5.15         4.85         40
Content uniformity (%)                1.70        0.93                6.0          0.0          40
Dissolution (%)                       99.15       1.88                100          90           40
Disintegration/secs                   9.63        0.95                600          -            40
Mean weight/mg                        199.52      0.37                -            -            40
Hardness/kPa                          13.60       0.94                17           9            40
Thickness/mm                          3.31        0.04                3.5          3.2          40
Friability/mg                         0.28        0.60                -            -            40
Table 3.3. Summary of higher strength tablet C. of A. data (n = 43 batches).
Variable                              Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg             10.00       0.10                10.3         9.7          43
Content uniformity (%)                1.36        0.69                6.0          0.0          43
Dissolution (%)                       98.85       1.20                100          90           43
Disintegration/secs                   10.71       1.19                600          -            43
Mean weight/mg                        399.70      0.65                -            -            43
Hardness/kPa                          14.32       0.71                17           9            43
Thickness/mm                          4.58        0.05                4.6          4.1          43
Friability/mg                         1.58        1.16                -            -            43
Table 3.4. Summary of multiway lower strength blend and tablet C. of A. data (n =
39 batches).
Variable                              Mean value  Standard deviation  Upper limit  Lower limit  Batches
Blend drug substance content/mg g⁻¹   24.81       0.33                25.8         24.2         37
Blend moisture content (%)            2.96        0.16                4.0          -            37
Blend moisture deviation (%)          2.52        1.66                5.0          0.0          37
Blend A(1%, 1 cm)                     115.62      1.41                -            -            37
Tablet drug substance content/mg      4.98        0.08                5.15         4.85         35
Tablet content uniformity (%)         1.85        0.95                6.0          0.0          35
Tablet moisture content (%)           3.23        0.30                4.5          -            35
Tablet dissolution (%)                98.96       1.95                100          90           35
Tablet disintegration/secs            9.43        0.92                600          -            35
Tablet mean weight/mg                 199.50      0.33                -            -            35
Tablet hardness/kPa                   13.40       0.84                17           9            35
Tablet thickness/mm                   3.32        0.04                3.5          3.2          35
Tablet friability/mg                  0.20        0.47                -            -            35
Table 3.5. Summary of multiway higher strength blend and tablet C. of A. data (n
= 41 batches).
Variable                              Mean value  Standard deviation  Upper limit  Lower limit  Batches
Blend drug substance content/mg g⁻¹   24.87       0.33                25.8         24.2         40
Blend moisture content (%)            2.92        0.32                4.0          -            40
Blend moisture deviation (%)          2.16        1.29                5.0          0.0          40
Blend A(1%, 1 cm)                     115.45      0.97                -            -            38
Tablet drug substance content/mg      10.00       0.10                10.3         9.7          41
Tablet content uniformity (%)         1.46        0.76                6.0          0.0          41
Tablet moisture content (%)           3.18        0.31                4.5          -            41
Tablet dissolution (%)                98.73       1.20                100          90           41
Tablet disintegration/secs            10.67       1.22                600          -            39
Tablet mean weight/mg                 399.69      0.67                -            -            41
Tablet hardness/kPa                   14.25       0.72                17           9            41
Tablet thickness/mm                   4.58        0.04                4.6          4.1          41
Tablet friability/mg                  0.41        0.95                -            -            41
Table 3.6. Batch numbers and indices of blend, tablet and multiway blend and
tablet data sets (n = 205 production batches).
Batch number; Blend batch index; Lower strength tablet batch index; Higher strength tablet batch index; MPCA/MBPLS lower strength batch index; MPCA/MBPLS higher strength batch index
1 1 0 0 0 0
967 2 0 0 0 0
968 3 0 0 0 0
969 4 0 0 0 0
973 5 0 0 0 0
974 6 0 0 0 0
975 7 0 0 0 0
976 8 0 0 0 0
983 9 0 0 0 0
984 10 0 0 0 0
985 11 0 0 0 0
989 12 0 0 0 0
990 13 0 0 0 0
993 14 0 0 0 0
997 15 0 0 0 0
998 16 0 0 0 0
999 17 0 0 0 0
1006 18 0 0 0 0
1009 19 0 0 0 0
1010 20 0 0 0 0
1015 21 0 0 0 0
1016 22 0 0 0 0
1017 23 0 0 0 0
1024 24 0 0 0 0
1025 25 0 0 0 0
1029 26 0 0 0 0
1031 27 0 0 0 0
1034 28 0 0 0 0
1469 29 0 0 0 0
1969 30 0 0 0 0
1971 31 0 0 0 0
1973 32 0 0 0 0
1975 33 0 0 0 0
1977 34 0 0 0 0
1979 35 0 0 0 0
1981 36 0 0 0 0
1983 37 0 0 0 0
1987 38 0 0 0 0
1990 39 0 0 0 0
1991 40 0 0 0 0
1992 41 0 0 0 0
1997 42 0 0 0 0
1998 43 0 0 0 0
1999 44 0 0 0 0
2000 45 0 0 0 0
2005 46 0 0 0 0
2007 47 0 0 0 0
2011 48 0 0 0 0
2011 (re-blend) 49 0 0 0 0
2017 50 0 0 0 0
2019 51 0 0 0 0
2021 52 0 0 0 0
2023 53 0 0 0 0
2025 54 0 0 0 0
2027 55 0 0 0 0
2029 56 0 0 0 0
2031 57 0 0 0 0
2033 58 0 0 0 0
2035 59 0 0 0 0
2039 60 0 0 0 0
2041 61 0 0 0 0
2043 62 0 0 0 0
2045 63 0 0 0 0
2047 64 0 0 0 0
2049 65 0 0 0 0
2051 66 0 0 0 0
2055 67 0 0 0 0
2057 68 0 0 0 0
2061 69 0 0 0 0
2063 70 0 0 0 0
2967 71 0 0 0 0
2967 (re-blend) 72 0 0 0 0
2969 73 0 0 0 0
2981 74 0 0 0 0
2983 75 0 0 0 0
2985 76 0 0 0 0
2987 77 0 0 0 0
2988 78 0 0 0 0
2989 79 0 0 0 0
2990 80 0 0 0 0
2994 81 0 0 0 0
3011 82 0 0 0 0
4050 0 0 1 0 0
4058 83 0 0 0 0
4059 84 0 0 0 0
4978 0 0 2 0 0
4980 0 0 3 0 0
4983 0 1 0 0 0
4984 0 2 0 0 0
4985 0 3 0 0 0
4986 0 4 0 0 0
4987 0 5 0 0 0
4988 0 6 0 0 0
4989 0 7 0 0 0
4990 0 0 4 0 0
4991 85 0 5 0 1
4992 86 0 6 0 2
4993 87 8 0 1 0
4994 88 9 0 2 0
4995 0 10 0 0 0
4996 89 11 0 3 0
4997 90 12 0 4 0
4998 91 13 0 5 0
4999 92 0 7 0 3
5000 93 0 8 0 4
5001 94 0 9 0 5
5002 95 14 0 6 0
5003 96 15 0 7 0
5004 97 0 0 0 0
5005 98 0 0 0 0
5006 99 0 0 0 0
5007 100 0 0 0 0
5008 101 0 0 0 0
5008 (re-blend) 102 0 0 0 0
5009 103 0 10 0 6
5010 104 0 11 0 7
5011 105 0 12 0 8
5012 106 0 0 0 0
5013 107 16 0 8 0
5014 108 0 0 0 0
5015 109 0 0 0 0
5016 110 17 0 9 0
5017 111 18 0 10 0
5018 112 19 0 11 0
5019 113 0 0 0 0
5020 114 0 0 0 0
5021 115 0 13 0 9
5022 116 0 14 0 10
5023 117 0 15 0 11
5024 118 0 16 0 12
5025 119 0 0 12 0
5025 (re-blend) 120 20 0 13 0
5026 121 21 0 14 0
5027 122 22 0 15 0
5028 123 0 0 16 0
5028 (re-blend) 124 23 0 17 0
5029 125 24 0 18 0
5030 126 0 0 0 0
5031 127 0 17 0 13
5032 128 0 18 0 14
5033 129 0 19 0 15
5035 130 25 0 19 0
5036 131 26 0 20 0
5037 132 27 0 21 0
5038 133 28 0 22 0
5039 134 0 0 23 0
5039 (re-blend) 135 29 0 24 0
5040 136 30 0 25 0
5041 137 31 0 26 0
5042 138 32 0 27 0
5043 139 0 0 0 16
5043 (re-blend) 140 0 0 0 17
5043 (re-blend) 141 0 20 0 18
5044 142 0 21 0 19
5045 143 0 0 0 0
5045 (re-blend) 144 0 0 0 0
5045 (re-blend) 145 0 0 0 0
5045 (re-blend) 146 0 0 0 0
5045 (re-blend) 147 0 0 0 0
5045 (re-blend) 148 0 0 0 0
5046 149 0 22 0 20
5047 150 0 23 0 21
5048 151 0 24 0 22
5049 152 0 25 0 23
5050 153 0 26 0 24
5051 154 0 27 0 25
5052 155 0 28 0 26
5053 156 0 29 0 27
5054 157 0 30 0 28
5055 158 33 0 28 0
5056 159 34 0 29 0
5057 160 35 0 30 0
5058 161 0 31 0 29
5059 162 0 32 0 30
5060 163 0 33 0 31
5061 164 36 0 31 0
5062 165 37 0 32 0
5063 166 0 0 0 0
5064 167 38 0 33 0
5065 168 39 0 34 0
5067 169 40 0 35 0
5068 170 0 34 0 32
5069 171 0 35 0 33
5070 172 0 36 0 34
5071 173 0 37 0 35
5072 174 0 0 0 0
5073 175 0 38 0 36
5074 176 0 39 0 37
5075 177 41 0 36 0
5076 178 42 0 37 0
5077 179 43 0 38 0
5078 180 0 40 0 38
5079 181 0 41 0 39
5080 182 0 42 0 40
5081 183 0 43 0 41
5967* 184 44 0 39 0
5968 185 0 0 0 0
5969 186 0 0 0 0
5970 187 0 0 0 0
5970 (re-blend) 188 0 0 0 0
5971 189 0 0 0 0
5972 190 0 0 0 0
5972 (re-blend) 191 0 0 0 0
5974 192 0 0 0 0
5975 193 0 0 0 0
* Example - blend BN5967 has: batch index 184 in the blend data set; batch index 44 in the lower strength
tablet data set and batch index 39 in the multiway PCA and multiblock PLS data sets.
3.5 Near Infrared Measurements
In this study, NIR measurements were made using a grating instrument (Foss
NIRSystems, Maryland, USA) equipped with either a diffuse reflectance module (Rapid
Content Analyser) or a transmission module (Intact Analyser). The diffuse reflectance
module records spectra over the range 1100 to 2498 nm, at 2 nm intervals (700 data
points) and outputs spectra as absorbance (log10(1/R)). The reflectance reference used
with the RCA module was a ceramic tile. The transmission module records spectra over
the range 600 to 1898 nm, at 2 nm intervals (650 data points) and outputs spectra as
absorbance (log10(1/T)). The spectra recorded with the transmission module used air as
the reference.
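The wavelength axes and the absorbance conversion described above can be checked with a brief sketch (Python with NumPy, not the instrument vendor's software). The function name is hypothetical; the input is assumed to be the ratio of sample signal to reference signal.

```python
import numpy as np

# Wavelength axes as described in the text: 2 nm steps, end points inclusive.
reflectance_nm = np.arange(1100, 2498 + 2, 2)    # diffuse reflectance module
transmission_nm = np.arange(600, 1898 + 2, 2)    # transmission module

def apparent_absorbance(ratio):
    """Convert reflectance R or transmittance T (sample/reference) to log10(1/ratio)."""
    ratio = np.asarray(ratio, dtype=float)
    return np.log10(1.0 / ratio)
```

The two axes contain 700 and 650 points respectively, in agreement with the figures quoted for each module.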
3.5.1 Sample Preparation And Presentation
All spectra recorded were the average of 50 scans, which is the default setting on the
instrument vendor’s software (NSAS).
Blends were scanned using the diffuse reflectance RCA module (as described in Section
2.6), in narrow soda-glass vials (henceforth these data are referred to as absorbance).
For each specimen, 3 soda glass vials (50 mm deep by 25 mm wide) were filled with
blend (approximately 8 g) and their NIR absorbance spectra recorded.
The tablets were scanned in both diffuse reflectance (RCA module, described in Section
2.6) and transmission modes (Intact module). Transmission measurements of the two
strengths of tablet were made by placing a tablet of either strength into a specially
machined aluminium template, with a circular hole underneath the tablet to allow
transmission of NIR radiation. For each measurement, the template and tablet were
placed inside the module, between the fibre optic probe and the lead sulphide detectors,
and the lid closed. To allow consistency of measurement, each tablet was scanned with
the Pfizer logo adjacent to the source of incident NIR radiation. Henceforth, diffuse
reflectance measurements of tablets are referred to as absorbance data; transmission
measurements of tablets, though converted to apparent absorbance data, are referred to
as transmission data. The numbers of NIR spectra measured for the lower strength
tablets were 4911 absorbance spectra (mean = 112 tablets per batch) and 4904
transmission spectra (mean = 111 tablets per batch). The numbers of NIR spectra
measured for the higher strength tablets were 2716 absorbance spectra (mean = 63
tablets per batch) and 2721 transmission spectra (mean = 63 tablets per batch). Spectra
were acquired over several weeks.
3.6 Data Analysis And Pre-treatment
As NIR spectral data are often highly collinear, they generally require pre-processing
(to remove multiple scatter effects) followed by multivariate analysis to extract useful
chemical and physical information. A number of different scatter pre-treatments of
blend and tablet data were therefore examined. These were:
1. Raw spectral data (absorbance/transmission);
2. SNV;
3. DT;
4. SNV-DT;
5. Sg2d11.
These 4 pre-treatments were applied to blend absorbance data, and absorbance and
transmission data for each of the two strengths of tablet. This allowed for a total of 55
data sets (including multiway data sets of appended blend and tablet data) to be studied
via principal components analysis (equation (1.7.1)) (Sections 3.6.2 and 3.6.3).
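The pre-treatments listed above can be sketched as follows, in Python with NumPy rather than the Matlab code used in this work. Several details are assumptions, not taken from the text: DT is implemented as subtraction of a second-order polynomial baseline, and Sg2d11 is read as an 11-point Savitzky-Golay second derivative with a quadratic fitting polynomial. All function names are hypothetical.

```python
import math
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    x = np.asarray(spectra, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)

def detrend(spectra, wavelengths, degree=2):
    """DT: subtract a least-squares polynomial baseline fitted to each spectrum."""
    x = np.asarray(spectra, dtype=float)
    w = np.asarray(wavelengths, dtype=float)
    w = (w - w.mean()) / (w.max() - w.min())       # rescale for numerical stability
    design = np.vander(w, degree + 1)
    coeffs, *_ = np.linalg.lstsq(design, x.T, rcond=None)
    return x - (design @ coeffs).T

def savgol_second_derivative(spectra, step_nm=2.0, window=11, polyorder=2):
    """Savitzky-Golay second derivative; 'valid' mode drops window // 2 points per end."""
    half = window // 2
    z = np.arange(-half, half + 1, dtype=float)
    design = np.vander(z, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse estimates the quadratic coefficient;
    # multiplying by 2! converts it to the second derivative at the window centre.
    kernel = math.factorial(2) * np.linalg.pinv(design)[2] / step_nm**2
    x = np.asarray(spectra, dtype=float)
    return np.array([np.convolve(row, kernel[::-1], mode="valid") for row in x])
```

Under these assumptions SNV-DT corresponds to `detrend(snv(spectra), wavelengths)`. With an 11-point window and 2 nm spacing, five points (10 nm) are lost at each end of the spectrum.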
The data were analysed using code programmed in Matlab 5.2 Scientific and Technical
Programming Language (The Mathworks Inc., Natick, MA, USA).
3.6.1 Wavelength Selection
Spectral Characteristics
All spectra exhibited non-uniform baselines arising from multiple scatter. The SNV
transform was able to correct for variation in scatter within each data set. SNV coupled
with DT was able to correct for both variation in scatter within each data set and
non-uniform baselines. The Sg2d11 transformation was also able to correct both
variation in scatter and non-uniform baselines. In addition, it resolved chemical peaks in
the spectra which in the raw data were obscured as overlapped combinations and
overtones of fundamental mid infrared absorptions. However, in absorbance mode the
Sg2d11 spectra appeared noisier beyond 2200 nm.
Absorbance Spectra
The wavelength range scanned was used to generate PCA models for absorbance, SNV,
DT and SNV-DT pre-treated spectral data. However, with the Sg2d11 transform, due to
the increased noise beyond 2200 nm, the wavelength range used was truncated to 2200
nm†.
Transmission Spectra
With transmission measurements (raw, SNV, DT and SNV-DT pre-treated spectral
data), the wavelength range selected was 750 to 1208 nm for higher strength tablets and
750 to 1350 nm for lower strength tablets. These wavelength ranges were selected to
provide spectra within the dynamic range of the detector* and were different for the two
strengths of tablet due to different tablet thickness. With the Sg2d11 transformation, this
range was truncated to 1196 nm and 1338 nm respectively‡.
† Owing to Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm.
* The dynamic range of the detector is approximately 2 absorbance units on a baseline. The maximum absorbance at any wavelength
should not exceed 6 absorbance units.
3.6.2 Principal Components Analysis Model Generation
PCA models were generated for all data pre-treatments (absorbance/transmission, SNV,
DT, SNV-DT and Sg2d11) of each data set: blends (absorbance data); lower strength
tablets; higher strength tablets (absorbance data and transmission data for each tablet
strength); multiway data sets containing blend and tablet data (absorbance and/or
transmission). To eliminate systematic variation in the data due to scatter effects from
sample presentation geometry and variable optical properties of the soda glass vials, the
average spectrum for each batch was used for PCA. This was found to produce more
acceptable PCA models, requiring both fewer components and less cross validation
computation time.
With each PCA model, the variable mean-centred spectral data, X, were decomposed as
the product of a score matrix, T, and a loadings matrix, P, according to equation (1.7.1)
(Piovoso et al, 1992).
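The decomposition of the mean-centred batch-average spectra into scores and loadings can be sketched as follows. This is a generic singular value decomposition route in Python with NumPy, offered as an illustration and not as the code used in this work; the function name and return layout are hypothetical.

```python
import numpy as np

def pca_fit(batch_spectra, n_components):
    """Decompose variable mean-centred X into scores T and loadings P, X = T P' + E."""
    x = np.asarray(batch_spectra, dtype=float)
    mean = x.mean(axis=0)                          # mean spectrum over batches
    centred = x - mean
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T                 # P: wavelengths x components
    scores = centred @ loadings                    # T: batches x components
    residuals = centred - scores @ loadings.T      # E: part not captured by the model
    return mean, scores, loadings, residuals
```

The loadings returned are orthonormal and the score columns are mutually uncorrelated, which is what later allows the PC scores to be monitored with a diagonal covariance structure.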
The rank of each PCA model was determined by ‘leave-one-out’ cross-validation and
involved calculation of predicted residual error sum of squares (PRESS) and R and W
(Eastment and Krzanowski, 1982) statistics. In addition, the cumulative percentage sum
of squares (%SS) (Lindberg et al, 1983) for each extracted PC was calculated and a
chi-squared significance test was performed on the eigenvalues (Jackson, 1991) (Appendix
B). Once generated, these PCA models were amenable to multivariate statistical process
control (MSPC) methods, described in Section 3.7.
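Of the rank criteria named above, the cumulative %SS is the simplest to illustrate. The sketch below (Python with NumPy; the function name is hypothetical) computes it from the singular values of the mean-centred data. It does not reproduce the PRESS, R, W or chi-squared calculations, whose exact forms are given in the cited references and Appendix B.

```python
import numpy as np

def cumulative_percent_ss(batch_spectra):
    """Cumulative percentage of the total sum of squares explained by successive PCs."""
    x = np.asarray(batch_spectra, dtype=float)
    centred = x - x.mean(axis=0)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    ss_per_pc = singular_values**2                 # sum of squares carried by each PC
    return 100.0 * np.cumsum(ss_per_pc) / ss_per_pc.sum()
```

The value at position a - 1 of the returned array is the %SS explained by a model of rank a.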
‡ Owing to Sg2d11 transformation, the wavelength ranges used were: 760 to 1196 nm with transmission spectra of higher strength
tablets and 760 to 1338 nm with transmission spectra of lower strength tablets.
Blend And Tablet PCA Model Loadings
With the blend data-set, the PCA loadings for each pre-treatment appeared to contain
physical and chemical information. An example of the PCA loadings for absorbance
spectra (n = 193 batches) is given in Fig. 3.1. The first two loadings appeared to
represent physical information, and showed a general reduction in value across the
wavelength range. One PC’s loadings represent absorptions at 1930 nm and 1410 nm,
which are characteristic absorptions of water (Osborne et al, 1993) (Fig. 3.1E).
The loadings of PCA models generated from tablet absorbance and transmission data
also appeared to contain physical and chemical information. An example of the loadings
of absorbance and transmission spectra of lower strength tablets is given in Fig. 3.2 and
Fig. 3.3 respectively (n = 44 batches). The loadings of the third PC of the lower strength
tablet absorbance spectra also seemed to contain features characteristic of water (Fig.
3.2C).
3.6.3 Characterising The Entire Process Using Multiway PCA
In Section 3.6.2, the method of PCA is described for the blend and tablet stage data sets.
The PC scores from these data sets may be monitored by MSPC procedures so that
deviations in process performance at each stage may be identified. However, it was also
considered desirable to examine how the process performed overall. Multiway PCA
(MPCA) (Kresta et al, 1991; Skagerberg et al, 1992; Nomikos and MacGregor, 1994) is
a data reduction method that enables the data of an entire chemical process to be
described by a few latent variables and was therefore examined. The variability among
batches with respect to their variables and time variation were studied by MPCA which
enabled the process to be summarised as a ‘fingerprint’ in reduced dimensional space.
Fig. 3.1. Blends NIR absorbance spectra PCA loadings (n = 193 batches): A) PC1;
B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8. (Panels plotted against Wavelength/nm.)
Fig. 3.2. Lower strength tablet NIR absorbance spectra PCA loadings (n = 44
batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8. (Panels plotted against Wavelength/nm.)
Fig. 3.3. Lower strength tablet NIR transmission spectra PCA loadings (n = 44
batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7; H) PC8 &
I) PC9. (Panels plotted against Wavelength/nm.)
In MPCA, the three-way array of spectral data, X, may be decomposed into a series of A
principal components consisting of scores matrices, $T_a$, and loadings vectors, $p_a$, plus a
residual array, E, which is as small as possible in a least squares sense (Bharati and
MacGregor, 1998; Wold et al, 1987; Westerhuis and Coenegracht, 1997):
$$X = \sum_{a=1}^{A} T_a \otimes p_a + E \qquad (3.6.1)$$
where $\otimes$ represents the Kronecker product. Mathematically, MPCA is equivalent to
performing ordinary PCA (Section 3.6.2) on the unfolded array, X (Wold et al, 1987). In
this chapter, data reduction was achieved by performing MPCA on variable mean-
centred unfolded three-way data sets (blend and tablet NIR data [absorbance and/or
transmission]). Two-way arrays of data were produced by slicing each three-way array,
X, vertically and appending the tablet absorbance and/or transmission data set(s) to the
right of the blend absorbance data set. This provided two-way arrays of rows of
observations (i.e. batch) versus columns equal to the number of combined wavelengths.
For the lower strength tablet process, the two-way arrays therefore contained NIR
spectral data of 39 batches of blends and their corresponding tablets. The higher
strength tablet process two-way arrays therefore contained NIR spectral data for 41
batches of blends and their corresponding tablets. PCA was then performed on these
unfolded three-way arrays, as for the discrete data sets. The models were derived from
batch average spectra with rank determined as for two-way PCA models (Section 3.6.2).
The effects of the different data pre-treatments on MPCA models were also
investigated. Once generated, these MPCA models were amenable to multivariate
statistical process control (MSPC) methods, described in Section 3.7.
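The unfolding step described above, in which tablet data are appended to the right of the blend data before ordinary PCA is applied, can be sketched as follows (Python with NumPy; function names are hypothetical). The sketch assumes that each block already holds one batch-average spectrum per row and that the rows of all blocks are in the same batch order.

```python
import numpy as np

def unfold_blocks(*blocks):
    """Append per-stage batch-average spectra side by side
    (rows: batches; columns: combined wavelengths)."""
    arrays = [np.asarray(b, dtype=float) for b in blocks]
    n_batches = arrays[0].shape[0]
    if any(a.shape[0] != n_batches for a in arrays):
        raise ValueError("every block needs one row per batch, in the same batch order")
    return np.hstack(arrays)

def mpca_scores(unfolded, n_components):
    """Ordinary PCA on the variable mean-centred unfolded array."""
    centred = unfolded - unfolded.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T
    return centred @ loadings, loadings
```

For the lower strength process, for example, a 39 x 700 blend absorbance block combined with a 39 x 301 tablet transmission block (750 to 1350 nm at 2 nm intervals) would give a 39 x 1001 unfolded array.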
3.7 Multivariate Statistical Process Control Methods
NIR spectra of the blends and tablets were used for MSPC purposes. The dimensionality
of these data was first reduced by subjecting them to PCA (equation (1.7.1), Section 1.7). Using
the resulting PC scores, sample variance-covariance matrices of the process at each
stage and of the overall process, were estimated. This is known as ‘Control phase 1’
(Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and was performed to
establish statistical control levels. The method employed a Monte Carlo search as an
optimisation procedure to determine batches produced whilst the process operated
within statistical control (Section 3.8.3). The optimisation procedure was therefore used
to estimate both the theoretical process mean PC score vector and the theoretical sample
variance-covariance matrix at each stage.
Once these were estimated, all batches examined for each model were then treated as
future production samples for control phase 2 monitoring (Alt, 1985). This involved
monitoring the PC scores of these batches, at each process stage, by means of residual
analysis, using the sample variance-covariance matrix and process mean PC score
vector determined in control phase 1. The ability of the method to monitor the process
overall, from the raw materials through to the finished dosage form, was determined.
3.7.1 Hotelling's T² Control Phase 1
The generalised Hotelling’s T² was calculated for the average spectrum of each batch
with the PCA models examined and was calculated according to equation (1.8.4) with
upper control limit, UCL, for this statistic calculated according to equation (1.8.5). The
value of α of the upper 100α% critical point of the F distribution was set to 0.95
(warning level) and 0.99 (control level).
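The T² calculation can be illustrated with the generic quadratic form below (Python with NumPy; the function name is hypothetical). Equations (1.8.4) and (1.8.5) are not reproduced in this section, so the sketch assumes the usual form T² = (t - m)' S⁻¹ (t - m), where t is a batch PC score vector, m the process mean score vector and S the sample variance-covariance matrix. The upper control limit is deliberately omitted.

```python
import numpy as np

def hotelling_t2(scores, mean_vector, covariance):
    """T2 = (t - m)' S^-1 (t - m) for each row of PC scores."""
    d = np.asarray(scores, dtype=float) - np.asarray(mean_vector, dtype=float)
    # Solve S y = d for every batch instead of forming the explicit inverse.
    solved = np.linalg.solve(np.asarray(covariance, dtype=float), d.T).T
    return np.einsum("ij,ij->i", d, solved)
```

When the mean vector and covariance matrix are estimated from the same n batches that are being charted, the T² values sum to (n - 1) multiplied by the number of PCs, which provides a quick check on an implementation.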
3.7.2 Multivariate Exponentially Weighted Moving Average (MEWMA)
This statistic was calculated for the batches examined with each PCA model, and used
the process mean score vector and sample variance-covariance matrix determined in
control phase 1. The upper control limits used with this chart were based on tabulated
data by Lowry et al. (1992) for PCA models with 4 or fewer PCs. PCA models with a
greater number of PCs were assigned control limits based on the chi-squared
distribution (Neave, 1995) multiplied by a factor of 1.05, as suggested by Lowry et al.
(1992).
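A minimal sketch of the MEWMA statistic, following the general form published by Lowry et al. (1992), is given below in Python with NumPy. The smoothing constant r is not stated in this section, so the default of 0.1 is purely an assumption, as is the function name. Control limits are not computed here.

```python
import numpy as np

def mewma_statistic(scores, mean_vector, covariance, r=0.1):
    """MEWMA chart statistic for successive batches (rows of `scores`).
    z_i = r * x_i + (1 - r) * z_(i-1), starting from z_0 = 0, and
    T2_i = z_i' Sigma_z_i^-1 z_i with the exact (time-dependent) covariance of z_i."""
    x = np.asarray(scores, dtype=float) - np.asarray(mean_vector, dtype=float)
    sigma = np.asarray(covariance, dtype=float)
    z = np.zeros(x.shape[1])
    out = np.empty(x.shape[0])
    for i, row in enumerate(x, start=1):
        z = r * row + (1.0 - r) * z
        sigma_z = (r / (2.0 - r)) * (1.0 - (1.0 - r) ** (2 * i)) * sigma
        out[i - 1] = z @ np.linalg.solve(sigma_z, z)
    return out
```

With r = 1 the moving average has no memory and the statistic reduces to the quadratic form x' Σ⁻¹ x evaluated for each batch individually.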
3.7.3 Process Variance: Sample Generalised Variance
All PCA models calculated in this chapter comprised more than two PCs, hence process
variance was monitored using Anderson’s asymptotic normal approximation.
Process Variance Control Phases 1 and 2
To implement this MSPC procedure, 9 spectra were selected at random for each batch,
from the library of data. This number was chosen as it was greater than the maximum
number of retained PCs in any model (8 PCs) and was equivalent to the lowest number
of samples measured for a blend (9 spectra). This number was also used for tablet
models and is close to the number used in British Pharmacopoeial assay methods (9
tablets out of 10 must have uniformity of content within 80 and 120% of specified
amount) (British Pharmacopoeia 1999, 1999); however, more tablet spectra could have
been used. Multiway PCA models which combined blend and tablet data were limited to
9 samples per batch.
The randomly selected spectra were mean-centred and projected onto the eigenvectors
for each model. The resulting PC scores were then used to calculate Anderson’s
asymptotic normal approximation for each batch.
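The procedure above can be sketched as follows (Python with NumPy). The exact expression used in this work is not reproduced in this section; the sketch assumes the commonly quoted form of Anderson's result, in which sqrt(N - 1)(|S|/|Σ₀| - 1) is approximately normal with mean 0 and variance 2p for p PCs, where S is the covariance matrix of the N sampled score vectors and Σ₀ the reference covariance matrix from control phase 1. The function name and arguments are hypothetical.

```python
import numpy as np

def anderson_statistic(batch_spectra, model_mean, loadings, reference_cov,
                       n_draw=9, rng=None):
    """Generalised-variance statistic for one batch from randomly drawn spectra."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(batch_spectra, dtype=float)
    picked = x[rng.choice(x.shape[0], size=n_draw, replace=False)]
    scores = (picked - model_mean) @ loadings      # project onto the model eigenvectors
    s = np.atleast_2d(np.cov(scores, rowvar=False))
    ratio = np.linalg.det(s) / np.linalg.det(np.atleast_2d(reference_cov))
    return float(np.sqrt(n_draw - 1) * (ratio - 1.0))
```

Because a covariance matrix is unaffected by a constant shift, the choice of centring vector does not change the statistic; only the dispersion of the sampled scores matters.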
Preliminary control charts for this sample generalised variance were constructed with
various statistical control limits; however, with most PCA models, this approach did not
produce control charts with a mean Anderson's asymptotic normal approximation of 0
and variance 2n (where n is the number of PCs) because some batches had significant
generalised variances above 99.9% limits. To overcome this problem, a recursive
algorithm was implemented for control phase 1 which excluded these batches from the
calculation of the theoretical sample generalised variance. The control phase 1 process
terminated when the mean of Anderson’s asymptotic normal approximation for the
batches used in this calculation was 0. This was found to be effective, and in control
phase 2 monitoring produced improved discrimination of batches with significantly high
Anderson’s asymptotic normal approximation.
3.7.4 Diagnosis of Out-of-Control Batches:
Hotelling's T² Stacked Bar Charts And Shewhart-Type Plots on Individual PC Scores
In this study, the individual PC contributions to the multivariate T² of batches found to
be out of control were examined. The results of these plots may be found in Appendix B
(Tables: B35 to B49).
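The individual PC contributions plotted in a stacked bar chart can be sketched as below (Python with NumPy; the function name is hypothetical). The sketch assumes that the PC scores have zero mean and are mutually uncorrelated, as is the case for the batches used to build a PCA model, so that T² separates into one term per PC.

```python
import numpy as np

def t2_contributions(scores, eigenvalues):
    """Per-PC terms t_a^2 / lambda_a; each row sums to that batch's T2.
    `eigenvalues` are the variances of the corresponding PC scores."""
    t = np.asarray(scores, dtype=float)
    return t**2 / np.asarray(eigenvalues, dtype=float)
```

A batch with a large T² can then be diagnosed by inspecting which columns of its row dominate the total.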
In addition, Shewhart control charts were constructed for the individual PC scores
(Jackson, 1991). These used the mean and range for each PC score sample, measured
for the historical data set. The overall or grand mean was, in most cases, equal to zero
(unless additional batches were projected into the model space) with each model, whilst
the range was equal to the standard deviation of the observations multiplied by the
corresponding value of the t-distribution (95% and 99% control limits). With blends, a
similar approach to the PC Shewhart method was adopted to diagnose PCs responsible
for significant process variance for batches which showed significant multivariate
dispersion (using Anderson’s asymptotic normal approximation) (Wise et al, 1991). To
determine the PCs on which significant variance occurred, the data set of each batch
was first sub-group mean-centred. The range was calculated from the centred batch data
with overall mean equal to zero.
3.7.5 Generation of PCA Models of Normal Batches: Q Statistic PCA Residual
Analysis
The Q statistic was used to identify batches which had not processed in the normal
manner (Nomikos and MacGregor, 1994) and which were located outside the plane
defined by the principal axes. The implementation of this control chart on any PCA
model, involved an initial screening of the model with rank determined by the cross-
validation R statistic. Batches whose Q values exceeded 95% and 99% limits were
deemed to have processed unusually and were excluded from the 'normal' data set. A
full PCA cross-validation was then performed on the updated data set, and the control
limits for Q were recalculated based on the rank of the updated model. Those batches
which were excluded from the model were centred and projected onto the eigenvectors,
and their residuals and Q statistics calculated. Additional batches falling beyond the
99% limits were excluded, as were any batches whose Q values were above the 95%
control limits and were consecutive in product batch number with any batches already
eliminated from the 'normal' data set. This process was repeated until all unusual
batches had Q values exceeding the 99% level with no consecutive batches having Q
values above the 95% level.
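The Q statistic itself, the squared distance of a batch from the plane defined by the retained PCs, can be sketched as follows (Python with NumPy; the function name is hypothetical). Only the statistic is shown; the 95% and 99% limits and the iterative exclusion of batches described above are not reproduced.

```python
import numpy as np

def q_statistic(spectra, model_mean, loadings):
    """Sum of squared residuals after projection onto the retained PC loadings."""
    centred = np.asarray(spectra, dtype=float) - model_mean
    scores = centred @ loadings                    # position within the model plane
    residuals = centred - scores @ loadings.T      # part lying outside the plane
    return np.sum(residuals**2, axis=1)
```

Batches excluded from the 'normal' data set are handled in the same way: they are centred with the model mean and projected onto the loadings of the updated model before Q is evaluated.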
3.8 Multivariate Statistical Quality Control of Pharmaceutical Blends
3.8.1 Principal Components Analysis
A visual inspection of PC loadings confirmed that all PCs selected with the R statistic
described systematic variation (loadings showed chemical or physical information);
extra PC loadings appeared more random and were deemed to represent noise. The W
statistic, which is another popular stopping criterion, tended to select more components
than R before terminating and was therefore considered less useful. The cumulative
%SS explained by models was in excess of 99.5% for all models except the Sg2d11 data
set. This model explained only 94 %SS, probably owing to increased noise from the
derivative transformation.
Between 6 and 8 PCs were required. The absorbance data set required most PCs; its first
two component loadings appear to represent scatter information, and show a general
decrease across the wavelengths (Fig. 3.1A & B). With all PCA models, at least one PC
has high positive loadings at 1400 and 1900 nm and may represent water. This is found
on the 5th PC with absorbance data, 2nd PC with SNV data; and 2nd PCs with DT
data; 1st PC with SNV-DT data and on the first PC with Sg2d11 data. The PC loadings
of the models also showed significant correlations with NIR absorbance spectra of the
excipients (Appendix B: Table B51).
Examination of the PC scores for all data sets reveals systematic variation which can be
clearly observed on the first PC for all PCA models. With the first PC for each model, the
batches were clearly divided into three main groups over time: low level; high level and
optimum level (about zero vector). This is shown for the PC scores of blend Sg2d11
data (Fig. 3.4). This systematic process variation on the first PC was considered to relate
to a change in the efficiency of the milling step during blending. Over the period of high
systematic process variation, the batches of drug substance used were more brittle and
less free flowing. These were consequently harder to mill. This was later confirmed by
NIR imaging microscopy of some of these blends (Chapter 5).
Fig. 3.4. PC Shewhart control charts for PCA scores of NIR Savitzky-Golay 2nd
derivative of absorbance spectra of blends (n = 193 batches). A) PC1 scores; B)
PC2 scores; C) PC3 scores; D) PC4 scores; E) PC5 scores; F) PC6 scores & G) PC7
scores (95% and 99% control limits shown).
3.8.2 Unusual Batch Detection: Q Statistic
With this statistic, similar batches for all data sets were found to have processed
unusually and thereby not fit the plane defined by the model. The placebo blend (29)
was easily identified as an outlier with all models. Consideration of results from data
analysis of multiway PCA models suggested that the results of SNV DT and Sg2d11
models, which were consistent, provided the best indication of unusual blends. This is
shown for SNV DT data set in Fig. 3.5. With these models, a number of consecutive
batch numbers were found to have processed unusually (Batches: 125 (BN5029), 126
(BN5030); 167 (BN5064), 168 (BN5065), 169 (BN5067); 180 (BN5078), 181
(BN5079); 192 (BN5974), 193 (BN5975)).
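As an illustration of what the Q statistic measures, the following minimal sketch computes the sum of squared residuals of each observation about the plane spanned by the retained PC loadings. It assumes mean-centred data and an SVD-based PCA; the function name `q_statistic` and the synthetic data are hypothetical, and the calculation of the 95% and 99% limits is not shown.

```python
import numpy as np

def q_statistic(X_ref, X_new, n_pcs):
    """Sum of squared residuals of X_new about a PCA model fitted to X_ref."""
    mean = X_ref.mean(axis=0)
    _, _, vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    P = vt[:n_pcs].T                       # loadings, shape (variables, n_pcs)
    Xc = X_new - mean
    residual = Xc - (Xc @ P) @ P.T         # part of each row outside the model plane
    return np.sum(residual ** 2, axis=1)

# Synthetic example: reference rows lie close to a 2-PC plane, one row does not.
rng = np.random.default_rng(0)
plane = rng.normal(size=(2, 20))
X_ref = rng.normal(size=(60, 2)) @ plane + 0.01 * rng.normal(size=(60, 20))
unusual = rng.normal(size=(1, 2)) @ plane + 5.0 * rng.normal(size=(1, 20))
q_ref = q_statistic(X_ref, X_ref, n_pcs=2)
q_unusual = q_statistic(X_ref, unusual, n_pcs=2)
```

A batch such as the placebo blend, whose spectrum contains variation not described by the retained PCs, would be expected to give a Q value far above those of the reference batches.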
Fig. 3.5. Q Statistic control chart of PCA model of blends (n = 193 batches) (NIR
SNV-DT absorbance spectra) (95% and 99% limits shown).
3.8.3 Hotelling’s T² Control Charts
Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm
For statistical process control monitoring, it is necessary to establish control limits
based on data from previous successful batch runs. This entails estimation of an
'in-control' sample covariance matrix and construction of a Hotelling’s T² control ellipse
which encompasses these measurements. With NIR process measurements, no product
quality data may be available from which a suitable set of reference data may be
selected and an alternative approach is required to identify those batches from which the
‘in control’ process covariance matrix may be estimated. The approach employed here
utilised a Monte Carlo search algorithm as an optimisation procedure to identify
'in-control' batches. The algorithm initially selected a subset of batches at random -
typically 2 or 3 more batches than the number of retained PCs - from which a sample
covariance matrix and 99% Hotelling’s T² limits were calculated. The Hotelling’s T²
distances of these batches were then measured and if all batches were located within the
99% confidence ellipsoid, control phase 2 was implemented. This involved measuring
the T² for all other batches and recording the frequency of each batch failing the test.
Typically this process was repeated 100 times and the results were plotted as a
frequency bar chart of outliers. These outlier batches were excluded from control phase
1; the remaining batches were used to estimate the process covariance matrix and grand
mean vector. If however any batch fell outside the 99% control ellipsoid, it was
removed from the phase 1 data set and the covariance matrix re-estimated. Once all
control batches were found to lie within the 99% ellipsoid, the Hotelling’s T² was
calculated for all batches and plotted as a control chart. Batches exceeding 95 and 99%
control limits were recorded.
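The random search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: the control limit is passed in as a precomputed number (`limit`, here an illustrative value), a draw is simply discarded if any member of the random subset exceeds that limit (in place of the removal and re-estimation step), and the names `t2_distances` and `monte_carlo_outlier_counts` are hypothetical.

```python
import numpy as np

def t2_distances(scores, mean, cov):
    """Hotelling's T2 of each row of `scores` from `mean` under `cov`."""
    d = scores - mean
    return np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)

def monte_carlo_outlier_counts(scores, limit, n_extra=2, n_searches=100, seed=0):
    """Count how often each batch fails the T2 test over random reference subsets."""
    rng = np.random.default_rng(seed)
    n_batches, n_pcs = scores.shape
    counts = np.zeros(n_batches, dtype=int)
    for _ in range(n_searches):
        # Random subset: a few more batches than the number of retained PCs.
        subset = rng.choice(n_batches, size=n_pcs + n_extra, replace=False)
        mean = scores[subset].mean(axis=0)
        cov = np.atleast_2d(np.cov(scores[subset], rowvar=False))
        # Assumption: discard the draw if a subset member is itself outside the limit.
        if np.any(t2_distances(scores[subset], mean, cov) > limit):
            continue
        others = np.setdiff1d(np.arange(n_batches), subset)
        failed = others[t2_distances(scores[others], mean, cov) > limit]
        counts[failed] += 1
    return counts

# Synthetic PC scores: 60 in-control batches and 3 displaced batches at the end.
rng = np.random.default_rng(1)
scores = rng.normal(size=(63, 2))
scores[-3:] += 50.0
counts = monte_carlo_outlier_counts(scores, limit=371.0)  # illustrative limit only
```

Plotting `counts` against batch number gives the kind of outlier frequency bar chart described in the text; batches with high counts would be excluded before the covariance matrix and grand mean vector are estimated.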
With up to 6 PCs included in the model, this approach was found to work well with 8 to
10 preliminary batches and 99% control limits. However with 8 PCs, the search
required considerable computation time (several hours), and was therefore considered
unacceptable. Two alternative strategies were found to provide useful results:
1. Use (n PCs + 1) batches in the preliminary search with T²(PCs, 1, α) limits, where n is
the number of PCs in the model and α = 99 or 99.9%, to identify potential
outliers, followed by 95 and 99% control limits thereafter.
2. Use PCs 1+2 (or 1 to 3), which describe most variance in the data, and
repeat the procedure described above in instances with 8 PCs or more, using
approximately 6 batches in the preliminary Monte Carlo search. Then use
selected batches and repeat control phase 1 monitoring with inclusion of all
PCs. Exclude any batches which lie outside the 99% ellipsoid from the
control phase 1 batches. Proceed to control phase 2 monitoring.
An example bar chart for blend DT data is shown in Fig. 3.6.
Fig. 3.6. Results of Monte Carlo search of PC scores of blends (n = 193 batches)
(DT data) showing outlier frequency with batch (100 searches).
Hotelling’s T² Control Phase 2
All production batches were then monitored using the estimated process covariance
matrix and PC scores to measure Hotelling’s T². The results for absorbance and
pre-treated data sets were all similar using 99% confidence limits: the placebo blend 29
(BN1469) was identified as an outlier (PC6, Fig. 3.4 & 3.10); batch 21 (BN1015) was
an outlier with all models and with absorbance and SNV consecutive batches 22
(BN1016), 23 (BN1017), and 24 (BN1024) were also outliers. With all control phase 2
Hotelling’s T² charts most batches between 83 and 154 (BN4058 and BN5051) were
out-of-control (i.e. showed systematic process variation). An example control chart for
DT blend data is shown in Fig. 3.7B. This is further illustrated on plots of two PC
scores with the 95% and 99% Hotelling’s T² control ellipses (Figs. 3.8, 3.9 & 3.10).
Hotelling’s T² MEWMA Control Charts
This control chart exhibited very similar behaviour with all models and clearly showed
systematic drift in the process with time. With all data sets, the process mean vector had
reached statistical process control (99% limit) by batch 82; however, significant drift
then occurred, taking the process mean vector out-of-control. The level of this statistic
fell sharply from batch 154 onwards, indicating that the process mean vector was shifting
towards the theoretical value determined by the search algorithm. Only with the detrend
model did the value again fall within control; with all data sets, however, the value
lowered and reached a constant level. This chart is consistent with the Hotelling’s T²
control phase 2 charts which identified most batches between 83 and 154 as
out-of-control. An example of this control chart for DT data is shown in Fig. 3.7C.
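To show why this chart responds to sustained drift, the sketch below computes a multivariate EWMA statistic in the commonly used form: an exponentially weighted vector of deviations from the target, scaled by its exact covariance. The smoothing constant `lam = 0.2` is an assumption made for illustration (no value is stated here), and the function name `mewma_t2` and the synthetic scores are hypothetical.

```python
import numpy as np

def mewma_t2(scores, target, cov, lam=0.2):
    """T2-type statistic of the exponentially weighted mean of PC scores."""
    z = np.zeros(scores.shape[1])
    cov_inv = np.linalg.inv(cov)
    out = np.empty(len(scores))
    for i, x in enumerate(scores, start=1):
        # Exponentially weighted deviation from the target mean vector.
        z = lam * (x - target) + (1.0 - lam) * z
        # Exact variance factor of z after i observations.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        out[i - 1] = (z @ cov_inv @ z) / scale
    return out

# Synthetic scores: 40 batches on target, then 40 batches with a shifted mean.
rng = np.random.default_rng(0)
scores = rng.normal(size=(80, 2))
scores[40:] += 3.0
t2 = mewma_t2(scores, target=np.zeros(2), cov=np.eye(2))
```

Because each value carries a memory of earlier batches, a persistent shift in the mean vector accumulates in the statistic, whereas an isolated unusual batch has only a damped effect.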
Fig. 3.7. MSPC Control charts of blend PC scores (DT data) (n = 193 batches): A)
Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C) Hotelling’s T²
MEWMA & D) Anderson’s asymptotic normal approximation.
Fig. 3.8. PC2 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
Fig. 3.9. PC6 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
Fig. 3.10. PC6 versus PC2 scores plot of PCA scores of blends (n = 193 batches)
(DT NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
3.8.4 PC Shewhart Control Charts
To minimise Type I error, 99% control limits were examined. With all data sets, the
placebo blend 29 (BN1469) was clearly identified as being out-of-control (Appendix B:
Table B23). In addition, batches 21 (BN1015), 22 (BN1016), 23 (BN1017) and 24
(BN1024) were mostly found to be out-of-control with absorbance, SNV, DT and
SNV-DT data models (Appendix B: Table B23). Batches 64 (BN2047), 74 (BN2981), 127
(BN5031) and 192 (BN5974) were out-of-control with all models except Sg2d11
(Appendix B: Table B23). This model however identified groups of consecutive
batches as outliers: 136 (BN5040), 137 (BN5041) and 138 (BN5042) (Fig. 3.4E); 171
(BN5069), 173 (BN5071) (Fig. 3.4C) and 174 (BN5072) (Fig. 3.4D); 161 (BN5058)
(Fig. 3.4F) (Appendix B: Table B23). Most of these batches were subsequently found to
be out-of-control with multiway PCA models.
Clearly, the individual PC score control charts were unable to identify drift in the
process as with the Monte Carlo search algorithm and Hotelling’s T² charts. This is
because all available data are used with no optimisation procedure. Nevertheless, with all
blend models, visual inspection of the scores on the 1st PC clearly shows three levels,
indicating systematic variance in this component. The charts were also able to identify,
at the 99% control level, groups of consecutive batches which were also out-of-control
with Hotelling’s T² charts, e.g. batches 21, 22, 23, 24 and placebo batch 29. The Sg2d11
transformation produced the best PC Shewhart charts, capable of identifying a number of
batches which were subsequently found to be out-of-control with MPCA models (Fig. 3.4)
(Appendix B: Table B23).
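A minimal sketch of an individual PC Shewhart test is given below. It assumes normal-theory limits (centre line ± 2.576 standard deviations for 99%) estimated from all available batches, which mirrors the remark that no optimisation procedure is applied; the function name `shewhart_flags` and the synthetic scores are hypothetical.

```python
import numpy as np

def shewhart_flags(scores, z=2.576):
    """Flag scores outside centre +/- z standard deviations on each PC."""
    centre = scores.mean(axis=0)
    sd = scores.std(axis=0, ddof=1)      # limits estimated from all batches
    return np.abs(scores - centre) > z * sd

# Synthetic scores for 100 batches on 3 PCs; one batch is displaced on the last PC.
rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 3))
scores[29, 2] += 10.0
flags = shewhart_flags(scores)
out_of_control = np.argwhere(flags)      # (batch index, PC index) pairs
```

Because the unusual batches themselves contribute to the estimated centre line and standard deviation, the limits are inflated, which is one reason such charts can miss gradual drift that a chart built on screened in-control batches would detect.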
3.8.5 Process Variance:
Anderson’s Asymptotic Normal Approximation Sample Generalised Variance
With all blend PCA models, similar batches were identified as having shown significant
process variance at the 99.9% confidence interval (For all data pre-treatments: mean
sample generalised variance of all batches used in control phase 1 was zero, variance of
sample generalised variance of the control phase 1 batches was approximately 2n).
Across all models, the earliest batches seemed to show more process variance (batches
14, 15, 16, 26, 27, 28, 33, 37, 39, 43, 64, 67), with a cluster of values above 99.9%
confidence limits for each data set (Appendix B: Tables B35 - B39). With the exception
of the SNV-DT model, batch 109 (BN5015) was out-of-control. Most data sets
identified batches 64 (BN2047) and 67 (BN2055) as having significant process variance
and Sg2dl 1 model identified batches 125 (BN5029) and 126 (BN5030) as out-of-
control. All of these were found to have processed unusually with some blend data-sets,
having significant Q values (p = 0.01) (Appendix B: Table B12). Batches 142
(BN5044), 144 (BN5045 [3rd re-blend]), 146 (BN5045 [4th re-blend]), 152 (BN5049)
and 186 (BN5969) showed highly significant process variance on most data sets. None
of these batches had processed unusually, with Q values below 99% limits, however the
highest scoring batches, 144 and 146, were re-blends of the same production batch
(BN5045). Interestingly, this batch had a mean total water content below specification
limit, however the deviation of water content throughout the re-blends had been found
to be excessive in both cases (moisture deviations of 6.62% and 5.74% respectively,
maximum moisture content = 5%) and these were therefore rejected.
An example control chart for DT data is shown in Fig. 3.7D.
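The process variability statistic can be sketched as follows, assuming Anderson's large-sample result that sqrt(n)(|S|/|Σ| - 1) is approximately normal with mean zero and variance 2p, where S is the covariance matrix of a sub-group, Σ the in-control covariance matrix, n the sub-group degrees of freedom and p the number of PCs. The function name `anderson_statistic`, the sub-group size and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def anderson_statistic(subgroup, cov_ref):
    """Standardised sample generalised variance of one sub-group of PC scores."""
    m, p = subgroup.shape
    s = np.atleast_2d(np.cov(subgroup, rowvar=False))
    ratio = np.linalg.det(s) / np.linalg.det(cov_ref)   # |S| / |Sigma|
    # Asymptotically standard normal when the sub-group is in control.
    return np.sqrt(m - 1) * (ratio - 1.0) / np.sqrt(2.0 * p)

# Synthetic sub-groups of 9 spectra on 2 PCs: one typical, one five times as spread.
rng = np.random.default_rng(0)
typical = rng.normal(size=(9, 2))
z_typical = anderson_statistic(typical, np.eye(2))
z_spread = anderson_statistic(5.0 * typical, np.eye(2))
```

A value well above the chosen normal quantile indicates a sub-group whose scatter is larger than that of the in-control batches; with sub-groups as small as nine spectra the normal approximation is necessarily rough.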
Sub-Group Mean-Centred PC Shewhart Plots
These plots were also examined to assess whether significant process variance was
traceable to the PC scores. The results are summarised in Appendix B: Table B24.
Batches considered to have exhibited excessive sub-group variance on a given PC were
restricted to cases where a minimum of 3 observations were above the 99% control limit.
This corresponds to one blend sample, and should minimise Type I errors. With batches
64, 142, 146 and 186 it was possible to trace the excessive process variance to 1 or 2
PCs. Batch 146, which exhibited excessive variation in moisture content throughout the
blend, showed significant variance on PC1 for all models. Hence this PC must represent
either scatter (which is affected by surface moisture) or moisture content. An example
chart for Sg2d11 sub-group mean centred scores is shown in Fig. 3.11. The PC
Shewhart plots for PC1 (original PC scores) with all models also showed systematic
variation which is consistent with the Hotelling’s T² MEWMA charts.
The certificate of analysis total moisture contents for these blends were plotted and did
not show the same trend; it is possible that this component represents scatter which is
affected by surface moisture.
Fig. 3.11. PC Shewhart control charts for sub-group mean centred PCA scores (n =
9 spectra per batch) of NIR Savitzky-Golay 2nd derivative of absorbance spectra of
blends (n = 193 batches). A) PC1 scores; B) PC2 scores; C) PC3 scores; D) PC4
scores; E) PC5 scores; F) PC6 scores & G) PC7 scores (95% and 99% control
limits shown).
3.8.6 Correlation of Blend MSPC Results With Raw Material Usage Batch Data
The results of MSPC of blends (Sections 3.8.2 and 3.8.5) - Q statistic, Anderson’s
asymptotic normal approximation and the contributions of the individual PC scores to
the Hotelling’s T² - were examined for correlations with raw material batch numbers
used. Significant correlations with these statistics would indicate that a particular batch
of a raw material resulted in poor process performance and excess process variance.
Principal Factor Analysis
To examine the correlations between raw materials used and the MSPC indicators of
poor process performance, a data array was constructed and ordered such that the rows
corresponded to a particular process batch number, and the columns corresponded to all
of the raw material batch numbers used. The elements of the array contained the masses
(kg) of each raw material batch used in a particular blend batch. For each batch of blend
produced, each of the five raw materials could comprise several different raw material
batches in different amounts. Where none of a given raw material batch was used, the
element was made zero. The product batches used in this study were numbers 85 to 154,
which corresponds to the batches with systematic variation. The total number of
variables (raw material batches) used was 52, and the total number of blend batches
studied was 70. Appended to this array were column vectors of Q statistic, Anderson’s
asymptotic normal approximation and the individual PC contributions to the Hotelling’s
T², with these values depending on the NIR blend data set examined.
Data Analysis
The combined data set (70-by-54) was autoscaled and subjected to a PCA. The number
of vectors retained included those with eigenvalues greater than or equal to unity, which
is typical in a factor analysis (Appendix B: Table B50). For each data set, the vectors
were then normal varimax rotated into terminal vectors (Harman, 1976).
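The analysis steps above (autoscaling, PCA, retention of vectors with eigenvalues of at least unity, and normal varimax rotation) can be sketched as below. This is a generic illustration using a standard SVD-based varimax iteration with Kaiser normalisation, not the routine used in this work; the function names and the synthetic data are hypothetical, and every variable is assumed to have non-zero variance and non-zero communality.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Normal (Kaiser-normalised) varimax rotation of a loadings matrix."""
    h = np.sqrt((loadings ** 2).sum(axis=1, keepdims=True))   # root communalities
    L = loadings / h                                          # Kaiser normalisation
    p, k = L.shape
    R = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr ** 3 - Lr * (Lr ** 2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                            # updated orthogonal rotation
        if s.sum() - criterion < tol:
            break
        criterion = s.sum()
    return (L @ R) * h                                        # undo the normalisation

def rotated_factor_loadings(data):
    """Autoscale, extract factors with eigenvalue >= 1, and varimax rotate them."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # autoscaling
    corr = (z.T @ z) / (len(z) - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0
    unrotated = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return unrotated, varimax(unrotated)

# Synthetic array: 70 rows, two groups of three correlated columns.
rng = np.random.default_rng(0)
f = rng.normal(size=(70, 2))
data = np.hstack([f[:, [0]] + 0.3 * rng.normal(size=(70, 3)),
                  f[:, [1]] + 0.3 * rng.normal(size=(70, 3))])
unrotated, rotated = rotated_factor_loadings(data)
```

Because the rotation is orthogonal, the communality of each variable, and hence the total variance accounted for by the retained factors, is unchanged; only its distribution between the factors is altered.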
Interpretation of Principal Factor Loadings
With principal factor analysis, no significance is attached to the order of rotated factors
(however they still collectively account for the same amount of variance as before
rotation) (Dillon and Goldstein, 1984). Instead, the factor loadings are examined in turn
for large positive or negative values. Their values correspond to correlations of that
factor with the variable of high loading value (positive or negative), and may be
subjected to statistical significance test (Dillon and Goldstein, 1984). With 70
observations, loadings must be greater than or equal to 0.3 to be considered significant
(p = 0.01) (most loadings on a principal factor analysis are typically close to zero).
Meaning may also be attached to the pattern of a factor’s loadings, for example if there
exists bipolarity on a factor (Dillon and Goldstein, 1984).
With this study, important correlations would be those which include Q statistic,
Anderson’s asymptotic normal approximation or the Hotelling’s T² PC contributions
with raw material batch numbers.
Results
The SNV and SNV-DT data sets showed significant correlations (p = 0.01, n = 70)
between EX005147 (dibasic calcium phosphate anhydrous), EX004245
(microcrystalline cellulose) and Anderson’s asymptotic normal approximation
(Appendix B: Table B50). This is important as these raw material batches were used in
BN5045, which exhibited excessive moisture variation throughout the blend.
The Sg2d11 data set showed significant correlations (p = 0.01, n = 70) between
EX008202 (dibasic calcium phosphate anhydrous), EX007173 (microcrystalline
cellulose), EX008015 (sodium starch glycolate) and Q statistic (Appendix B: Table
B50). This is important as these batch numbers were used in varying proportions in
blends 136 to 139 (BNs 5040, 5041, 5042 and 5043) which were found to have
processed unusually.
3.9 Multivariate Statistical Quality Control of Pharmaceutical Tablets
This Section deals with MSPC for the lower and higher strength tablet data sets. For
each strength of tablet, models were examined from tablet absorbance data and from
tablet transmission data.
3.9.1 Lower Strength Tablet Absorbance Data Sets
Principal Components Analysis
All 44 batches were examined in this study. The rank of each pre-treated data set was
determined by recursive 'leave one out' cross validation in conjunction with Q statistic.
Between 6 and 8 PCs were required (Appendix B: Table B2). All models except Sg2d11
explained in excess of 99%SS while the Sg2d11 model accounted for 96.68%SS,
probably due to reduction in signal to noise. All PCs were found to be significant
(Appendix B: Table B2). All of the PCA models’ loadings appear to represent physical
and chemical information.
The absorbance model’s first two PC loadings show a general trend of increasing value
across the variables suggesting scatter (Fig. 3.2A & B). The third PC’s loadings have
high negative loadings at 1400 and 1900 nm, characteristic of water (Fig. 3.2C). With
SNV, DT and SNV-DT models, the loadings of the first component are characteristic of
water (Fig. 3.12). The other PCs’ loadings of these models resemble quadratic baseline
corrected spectra of microcrystalline cellulose and magnesium stearate (Appendix B:
Table B51).
Unusual Batch Detection: Q Statistic
Some similarity exists in tablet batches found to have processed unusually with this
control chart. Batches 21 and 31 (BN5026 and BN5041) had significant Q values (p =
0.01) for absorbance, DT and SNV-DT models (Appendix B: Table B13). Batches 30,
31 and 32 were found to have processed unusually with absorbance data (BNs 5040,
5041 and 5042). Batch 7 had significant Q values with all models except Sg2dl 1.
Consideration of these control charts with their blend counterparts showed good
agreement for absorbance and SNV-DT models (Fig. 3.13). With both of these models,
batches 38, 39 and 40 had significant Q values (p = 0.01) (BNs 5064, 5065, 5067
respectively) (Appendix B: Table B13) (Fig. 3.13). Overall, absorbance data was
considered to provide the most consistent results.
Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm
This procedure was repeated with the 44 batches as for the blends. The batches
examined were produced from blends selected from batches 85 to 184. Most of these
were found to have shown process drift, however from batch 154 onwards, the process
had drifted back in control.
Between 17 and 35 batches were included in estimation of the process covariance
matrix (Appendix B: Table B40). With 99% and 95% control limits established, the PC
scores were tested in control phase 2.
Fig. 3.12. Lower strength tablet NIR absorbance spectra PCA loadings (SNV DT
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G)
PC7.
Fig. 3.13. Q Statistic control chart of PCA model of lower strength tablets (n = 44
batches) (NIR SNV-DT absorbance spectra) (95% and 99% limits shown).
Hotelling’s T² Control Phase 2
The results of this control phase were found to depend on whether batches had been
included in the PCA model generation, or whether they had been excluded on the basis
of significant Q values. With absorbance and SNV-DT models, batches 38, 39 and 40
(BN5064, BN5065, BN5067) (excluded from PCA but projected onto eigenvectors)
were not found to be out of control. With these models, batches that were found to be
out of control were early batches that correspond to those batches which exhibited
systematic drift in the blend models. The detrend model did not identify any batches as
out of control.
The SNV and Sg2d11 models both identified batches 38, 39 and 40 (BN5064, BN5065,
BN5067) as out of control, which is consistent with blend Q statistics (these batches
were not identified as unusual with tablet Q statistics) (Appendix B: Table B40). Both
models also identified batch 30 (BN5040) as being out of control. Interestingly, the
Sg2d11 model also identified batches 31 and 32 as being out of control (BNs 5041 and
5042) (Fig. 3.14B). This result agrees with SNV-DT and Sg2d11 blend Q statistics. This
is shown clearly on a plot of PC scores 5 versus 4 with the Hotelling’s T² 95% and 99%
control ellipses (Fig. 3.15), which also shows batches 38 and 39 lying close to the 95%
limit and batch 40 lying outside the 95% limit. The Sg2d11 pre-treatment provided the
most consistent results.
Raw absorbance data PCA models clearly did not model the differences in NIR spectra
arising from physical differences in the surface of the friable tablets. This was detected
in the residual space with the Q statistic charts. The physical differences which affected
the spectra were emphasised with the Sg2dl 1 and SNV transformation and were thus
detected from their PCA models.
Fig. 3.14. MSPC Control charts of lower strength tablet NIR absorbance spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation.
Fig. 3.15. PC5 versus PC4 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 absorbance data) with Hotelling’s T² 95% and 99% control
ellipses.
Hotelling’s T² MEWMA Control Charts
With absorbance, DT and SNV-DT, this chart shows the exponentially weighted
Hotelling’s T² drifting from an out-of-control state to an in-control state. This is an
interesting result which confirms the results of blend MSPC. This tablet data set
includes batches which as blends had shown significant drift in the process mean vector
and then moved to a near or in-control level (e.g. DT model).
With the SNV and Sg2d11 models batches 38, 39 and 40 (BN5064, BN5065, BN5067)
had not shown significant sum of squared residuals, Q, and were therefore included in
the PCA. These batches were found to be out of control with the Hotelling’s T² phase 2
control charts. The SNV Hotelling’s T² MEWMA moved from an out of control state, to
an in control state and then out of control with these batches. The Sg2d11 Hotelling’s T²
MEWMA was much better able to respond to process drift (Fig. 3.14C): it moved out of
control and then drifted to an in control level. At batch 30 (BN5040), it again drifted out
of control (see batches 30, 31 and 32, Appendix B: Table B40), returning again to an in
control level at batch 33 (BN5055). However, at batch 38 (BN5064) it drifted out of
control and then began to drift to a lower level from batch 42 (BN5076) onwards.
PC Shewhart Control Charts
These charts used 99% control limits as with blends. These results were less consistent
than those of Q statistic and Hotelling’s T² control charts for both blends and tablets.
Batches 2, 5 and 10 were found to be out of control on PC Shewhart control charts with
SNV, DT and SNV DT. Batches 2 and 5 (BN4984 and BN4987) were out of control on
a PC with Sg2d11. Absorbance PC Shewhart plots did identify batches 39 and 40
(BN5065 and BN5067) as out of control on PC6. The loadings on this PC were difficult to
interpret. Sg2dl 1 PC control charts identified batches 31 and 42 (BN5041 and BN5076)
as outliers on PCs 4 and 5 respectively. Again the loadings were difficult to interpret.
Process Variance: Anderson’s Asymptotic Normal Approximation
These control charts showed fairly consistent results across models: batches 4, 9, 15 and
16 (BNs: 4986, 4994, 5003, 5013 respectively) showed significant generalised variance
(Appendix B: Table B40). These were not found to have exhibited significant
generalised variance with blend data, however batch 15 was found to have exhibited
excess generalised variance with SNV-DT multiway PCA model [(blend and tablet
absorbance and transmission data) and (blend absorbance and tablet transmission data)
and (blend and tablet absorbance)]. Batches 15 and 16 were also found to have shown
significant process variance with the Sg2dl 1 multiway PCA model (blend absorbance
and tablet absorbance data). With most models, batch 15 had exhibited the highest
sample generalised variance.
3.9.2 Lower Strength Tablet Transmission Data Sets
outlier detection. The R statistic was used as stopping criterion.
Between 6 and 9 components were used for the models. The raw transmission data
required most components to model the data whereas the Sg2dl 1 model required only 6
PCs to fit the data (Appendix B: Table B4). Most models accounted for more than
99.9%SS except Sg2d11 which accounted for 98.2%SS. All PCs extracted were found to
represent significant amounts of variance. The PC loadings for all models appeared to
represent physical and chemical information. The transmission model’s first 3 PC
loadings did not contain any peaks and may represent scatter (Fig. 3.3). The SNV
model’s first PC loadings were similar and may also represent scatter. The DT and
SNV-DT models showed a broad peak on the first PC loading; however, these may still
represent scatter as the peaks are less resolved than on the higher order PC loadings. All PC
loadings with Sg2dl 1 model contain sharply defined peaks, which suggests that they
represent chemical information (Fig. 3.16).
Some similarity exists in batches found to be unusual and not fit a given model across
all models (Appendix B: Table B15). Batch 12 (BN4997) was an outlier with all
models. Batch 21 (BN5026) was an outlier with all models except Sg2d11. With the
SNV-DT model, batches 20, 21 and 23 (BN5025, BN5026, BN5028), which are virtually
consecutive, were all found to be outliers. Batch 28 (BN5038) was an outlier on SNV, DT
and SNV DT models. Batch 30 (BN5040) was an outlier on SNV-DT and Sg2d11 models.
The Sg2d11 model identified batches 30, 31 and 32 as unusual (BN5040, BN5041,
BN5042). These have consistently been found to be unusual with blend and tablet
absorbance models and are shown to be unusual with multiway PCA models (Section
3.10); this is therefore a consistent result. As the loadings for this model all contain
chemical absorption peaks and are therefore likely to be modelling just chemical
information, it is probable that these batches are physically different. This is unlikely to
be modelled by the PCA model. This interpretation was supported by the C. of A.
results which showed that tablets produced from batch 30 (BN5040) were friable (1
mg). No C. of A. data were recorded for batches 31 and 32 (BN5041 and BN5042).
Fig. 3.16. Lower strength tablet NIR transmission spectra PCA loadings (Sg2d11
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5 & F) PC6.
Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm
With the exception of Sg2d11, all models required an initial control phase 1 batch
screening using the first two PCs. With 6 batches and 100 Monte Carlo searches,
potentially unusual batches were identified and excluded. These batches were then
further screened in control phase 1 by inclusion of all PCs and construction of 99%
control ellipsoids (Appendix B: Table B42).
Hotelling’s T² Control Phase 2
The Sg2d11 model did not require an initial screening, and identified batches 5, 6, 38,
39 and 40 as unusual (Fig. 3.17B). The latter 3 batches (BNs 5064, 5065 and 5067)
were successfully identified as unusual with all other pre-treatment models, however
this required an initial screening of PCs 1 and 2 (Fig. 3.18). Batch 38 produced tablets
of 1 mg friability; no C. of A. data were recorded for batches 39 and 40.
The SNV model also identified batches 30, 31 and 32 (BNs 5040, 5041, 5042) and 38,
39 and 40 as unusual however these batches were not found to have processed unusually
with Q statistic control chart.
The PCA models of the transmission data sets contained chemical information and
physical information and were more effective at detecting anomalies than the Q statistic
control chart. The SNV transformation was more effective at detecting anomalous
batches than the Sg2dl 1 transformation, probably because more physical information is
modelled in its first two PC loadings. The Sg2dl 1 model appears to effectively remove
physical information, hence batches which have physically processed in an unusual
manner do not fit the model and are detected on the Q statistic chart. With the same
transformation, these batches cannot be detected as unusual on the Hotelling’s T²
control charts. However, batches which have chemical differences are easily detected on
these control charts and more easily than with the other transformations. The SNV
model produced less consistent results with the Q chart, however all batches thus far
found to be unusual were detected on the Hotelling’s T² control charts. This is because
this model still retains physical information in its first 2 PCs.
Fig. 3.17. MSPC Control charts of lower strength tablet NIR transmission spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation.
Fig. 3.18. PC2 versus PC1 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 transmission data) with Hotelling’s T² 95% and 99% control
ellipses.
The Sg2d11 and SNV transformations appear to provide the most consistent results for
these data.
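For readers unfamiliar with the scatter-correction pre-treatments compared here, the following is a minimal sketch of SNV and detrending (DT). It assumes the usual definitions: SNV centres and scales each spectrum by its own mean and standard deviation, and DT subtracts a least-squares second-order polynomial fitted across wavelength. The function names, the polynomial order and the synthetic band are assumptions for illustration only; the thesis implementation is not shown in this section.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, wavelengths, order=2):
    """Subtract a least-squares polynomial baseline from each spectrum (row)."""
    spectra = np.asarray(spectra, dtype=float)
    x = np.asarray(wavelengths, dtype=float)
    x = (x - x.mean()) / x.std()              # rescale for a well-conditioned fit
    coeffs = np.polynomial.polynomial.polyfit(x, spectra.T, order)
    baseline = np.polynomial.polynomial.polyval(x, coeffs)  # (spectra x wavelengths)
    return spectra - baseline

# Hypothetical spectrum with a single band, plus a copy with added scatter effects.
wavelengths = np.linspace(1100.0, 2500.0, 100)
base = np.exp(-((wavelengths - 1940.0) / 40.0) ** 2)
scattered = 2.0 * base + 0.5                   # multiplicative and additive effects
treated = snv(np.vstack([base, scattered]))

# A purely curved baseline is removed completely by detrending.
baseline_only = (0.3 + 1e-7 * (wavelengths - 1100.0) ** 2)[None, :]
flattened = detrend(baseline_only, wavelengths)
```

After SNV the two spectra coincide, which is why a model built on SNV data carries less offset and scaling information than one built on raw data. SNV-DT would correspond to applying `detrend` to the output of `snv`.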
Hotelling's T² MEWMA Control Charts
With all pre-treatment models, the Hotelling's T² MEWMA control charts generally
showed drift towards the theoretical grand mean vector. With transmission data, this
value was within the 99% limits, showing improvement up to batch 27 (BN5037) (Fig.
3.19C). Drift could then be observed, peaking at batch 32 (BN5042). This statistic
drifted out of control from batch 38 onwards (BN5064). The SNV transformation was
less useful in showing this drift: it identified earlier batches as out of control on the
Hotelling's T² control phase 2 chart, hence this statistic did not shift back within
control. The DT, SNV-DT and Sg2d11 charts all showed an improvement in drift to
within a state of control. With the DT model, drift occurred from batch 38 (BN5064)
onwards. With the SNV-DT and Sg2d11 models, drift occurred at batch 28 (BN5038),
taking this statistic out of control. An improvement in drift could be seen from batch 32
(BN5042) onwards with both, but drift increased again from batch 38 (BN5064). With the
Sg2d11 model, this statistic reduced to within control at batch 37 before again drifting
out of control with batch 38 (BN5064). The Sg2d11 chart was considered to perform
best and could detect drift which was not detected on the Hotelling's T² phase 2 control
chart.
With these charts, 99% control limits were used. Batches 5 and 6 signalled out of
control across all pre-treatments.
Fig. 3.19. MSPC control charts of lower strength tablet NIR transmission spectra
PC scores (raw data): A) Hotelling's T² control phase 1 (in-control batches); B)
Hotelling's T² control phase 2 (future production batches); C) Hotelling's T² MEWMA
(future production batches) & D) Anderson's asymptotic normal approximation (process
variability control chart, phase 2).
Batch 30 (BN5040) signalled out of control with DT and SNV-DT on PC2. The SNV-DT
model also detected batch 31 (BN5041) as unusual on PC5.
With this control chart, batches 30, 32 and 39 (BN5040, BN5042, BN5065) were found
to have significant process variance (Appendix B: Table B42). These results are
consistent with multiway PCA results (MPCA lower strength tablet batch indices 25, 27
and 34 respectively (refer to Table 3.6), Appendix B: Table B48). The Sg2d11 model
did not identify these batches as having shown excess process variance (however,
BN5065 was detected with multiway PCA analysis with this transformation (Appendix
B: Table B48)). Transmission and detrend data performed the best (Appendix B: Table
B42).
3.9.3 Higher Strength Tablet Absorbance Data Sets
outlier detection. The R statistic was used as stopping criterion.
Between 6 and 7 PCs were extracted for the different pre-treatments, with models
explaining more than 99.6 %SS for all pre-treatments except Sg2d11 (Appendix B: Table
B3). This model only accounted for 95 A%SS with 6PCs. All extracted PCs were found
to be significant using Anderson’s likelihood ratio chi-squared test (Jackson, 1991).
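As an illustration of this kind of significance test, the sketch below computes a likelihood ratio chi-squared statistic for the hypothesis that the eigenvalues remaining after the first k are equal, in the form commonly given for PCA of a covariance matrix. It is assumed, not confirmed by this section, that the thesis used this form; the function name and the example eigenvalues are hypothetical. The eigenvalues passed in must all be positive.

```python
import math

def equal_roots_statistic(eigenvalues, k, n):
    """Chi-squared statistic and degrees of freedom for testing whether the
    eigenvalues after the first k are equal (i.e. only unstructured noise remains).

    eigenvalues: positive eigenvalues in descending order
    k: number of PCs already retained
    n: number of observations (batches)
    """
    tail = list(eigenvalues[k:])
    m = len(tail)                      # number of roots being compared
    nu = n - 1                         # degrees of freedom of the covariance estimate
    mean_tail = sum(tail) / m
    chi2 = -nu * sum(math.log(l) for l in tail) + nu * m * math.log(mean_tail)
    df = (m - 1) * (m + 2) // 2
    return chi2, df

# Hypothetical eigenvalues: equal remaining roots give a statistic of zero,
# unequal remaining roots give a positive statistic.
chi2_equal, df_equal = equal_roots_statistic([5.0, 1.0, 1.0, 1.0], k=1, n=40)
chi2_unequal, df_unequal = equal_roots_statistic([5.0, 3.0, 1.0, 0.5], k=1, n=40)
```

The statistic would be compared with the upper percentage point of a chi-squared distribution with `df` degrees of freedom; a significant result suggests that at least one further PC describes structured variation.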
The PC loadings for these models appear to show physical and chemical information.
The absorbance model's first 2 PC loadings appeared to represent scatter (Fig. 3.20A &
B). The 3rd, 4th and 5th PC loadings appear to contain features characteristic of water,
microcrystalline cellulose and magnesium stearate respectively (Fig. 3.20C, D & E). All
other models did not appear to have loadings that represented scatter; the 1st, 2nd and
3rd PC loadings of SNV, DT and SNV-DT models contained features characteristic of
water, microcrystalline cellulose and magnesium stearate respectively. The Sg2d11
model showed loadings with chemical features; the PC loadings appeared to
represent water.
Fig. 3.20. Higher strength tablet NIR absorbance spectra PCA loadings (raw data)
(n = 43 batches), plotted against wavelength/nm: A) PC1; B) PC2; C) PC3; D) PC4;
E) PC5; F) PC6 & G) PC7.
Consistent batches were found to be outliers with the Q statistic control charts across
models (Appendix B: Table B14). Batches 32 and 33 (BNs 5059 and 5060) were
outliers with absorbance and SNV-DT; batch 33 was an outlier with SNV and DT.
Batches 4 and 6 (BNs 4990 and 4992) were outliers with SNV, DT, SNV-DT and
Sg2d11. Batch 42 (BN5080) was an outlier with absorbance and SNV. SNV also
showed batch 43 (BN5081) as an outlier. Comparison of these results with those for
blends, multiway PCA and transmission models suggested that tablet absorbance
measurements did not provide as reliable a model for Q statistic identification of
batches of higher strength tablets that have processed unusually. The transmission
models provided a better indicator of unusual batches (Appendix B: Table B16). This
can be explained by the certificate of analysis data, which showed that unusual batches
(detected on blend and multiway models) had a greater thickness than normal (these
tablets may have exhibited greater elasticity). Transmission measurements pass through
the entire tablet and are likely to be sensitive to the increase in pathlength of the tablet,
whereas reflectance measurements were not sensitive to this. (N.B. The opposite of this
was true for some lower strength tablets (Section 3.9.2). Unusual batches, detected on
blend and multiway model Q statistic control charts, were found to be friable in
certificate of analysis tests. These tablets are therefore likely to be more brittle than
normal and will show increased fragmentation. This occurs at the surface of the tablet
and will affect surface texture and therefore also affect NIR absorbance measurements.
The results of Q statistic control charts for absorbance data for lower strength tablets
(Appendix B: Table B13) agreed with this finding and provided a much better
indication of this than for transmission higher strength tablet data sets (Appendix B:
Table B15).)
Hotelling's T² Control Phase 1: Monte Carlo Genetic Algorithm
A preliminary search using 6 randomly selected batches and PCs 1 and 2 was required
with absorbance data only. The other models produced satisfactory results using one
more batch than the number of retained PCs and 90% control levels, except Sg2d11,
which used 12 batches with all 6 PCs in the search.
Hotelling's T² Control Phase 2
The effect of the different data pre-treatments on the MSPC performance of their PCA
models was considered alongside multiway and blend MSPC results. This indicated that
SNV and DT pre-treatments provided the most reliable models (Appendix B: Tables 41, 12
and 49) (Fig. 3.21B): batches 36, 40 and 41 (BN5070, BN5078 and BN5079) were
identified as unusual at the 99% confidence level.
The absorbance data identified batch 36 as unusual at the 95% level. Interestingly, with
this model, batch 41 can be seen to fall outside the 95% confidence ellipse for PCs 1
and 2. Batches 40 and 36 lie close to the 95% confidence limit. The Sg2d11 data set
identified batch 41 as unusual at the 99% confidence level and batch 40 as unusual at the
95% confidence level. On the PC1 versus PC2 Hotelling's T² control ellipse, batch 36
can be seen to lie very close to the 99% confidence limit (Fig. 3.22).
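The 95% and 99% control ellipses for two PCs referred to above can be sketched as follows. The sketch assumes the standard phase 2 limit for a future observation, T² ≤ p(n+1)(n-1)/(n(n-p)) F(α; p, n-p), with p = 2 uncorrelated PC scores; the exact limits used in the thesis are not given in this section. For p = 2 the F quantile has a closed form, so no statistics library is needed. The function names and the example values are hypothetical.

```python
def t2_limit_two_pcs(n, alpha):
    """Phase 2 T2 limit for p = 2 scores, with the model estimated from n batches.

    Uses the closed-form upper-alpha quantile of the F(2, n - 2) distribution.
    """
    p = 2
    m = n - p
    f_quantile = (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)
    return p * (n + 1) * (n - 1) / (n * (n - p)) * f_quantile

def inside_ellipse(score1, score2, var1, var2, limit):
    """True if the (PC1, PC2) score pair lies within the control ellipse.

    var1 and var2 are the in-control variances of the PC1 and PC2 scores.
    """
    return score1 ** 2 / var1 + score2 ** 2 / var2 <= limit

# Hypothetical example: limits for a model estimated from 44 batches.
limit95 = t2_limit_two_pcs(44, 0.05)
limit99 = t2_limit_two_pcs(44, 0.01)
near_centre = inside_ellipse(0.5, -0.2, var1=1.0, var2=0.25, limit=limit95)
far_out = inside_ellipse(4.0, 0.0, var1=1.0, var2=0.25, limit=limit99)
```

A batch such as batch 36 above, which lies close to the 99% limit, would give a T² value only slightly below `limit99` in this formulation.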
Fig. 3.21. MSPC control charts of higher strength tablet NIR absorbance spectra
PC scores (SNV data) (n = 43 batches): A) Hotelling's T² control phase 1; B)
Hotelling's T² control phase 2; C) Hotelling's T² MEWMA & D) Anderson's
asymptotic normal approximation.
Fig. 3.22. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR absorbance data) with Hotelling's T² 95% and 99%
control ellipses.
Hotelling's T² MEWMA Control Charts
The ability of these charts to show drift in the process representatively was found to
depend on how well the Hotelling's T² control phase 2 charts were able to identify
unusual batches. The absorbance and Sg2d11 charts were considered the best indicators
of process drift, and showed the value of this statistic moving out of control for batches
36, 40 and 41 (BN5070, BN5078 and BN5079) (Fig. 3.23C). This is because with these
charts some batches fall above the 95% confidence level but not the 99% confidence
level. This appeared to cause the exponentially weighted Hotelling's T² value to shift
without greatly falling beyond the 99% level. Evidently, this chart is very sensitive to
large Hotelling's T² control phase 2 values in excess of the 99% control limit (a
different value for λ could be examined; in this study a value of 0.1 was used). The
MEWMA chart for DT data was quite a good indicator of drift: it showed the process
drifting to a state of control by batch 35 (BN5069), then drifting out of control with
batch 36 (BN5070), moving again within control by batch 38 (BN5073) and then
drifting out of control from batch 39 (BN5074) onwards. The SNV-DT model showed
the process drifting to a state of control by batch 29 (BN5053) and then drifting
out of control thereafter. The SNV model appeared more erratic and shifted
considerably.
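The exponentially weighted statistic discussed above can be sketched as follows, using the weighting constant of 0.1 stated in the text. The sketch assumes the usual MEWMA recursion applied to mean-centred, uncorrelated PC scores with the exact (time-dependent) covariance of the weighted vector; the function name and the example data are hypothetical, and control limits are not computed.

```python
def mewma_t2(scores, variances, lam=0.1):
    """MEWMA T2 statistic for a sequence of PC score vectors.

    scores: list of score vectors, centred on the in-control mean
    variances: in-control variance of each PC score
    lam: weighting constant (0.1 in this study)
    """
    z = [0.0] * len(variances)
    values = []
    for i, t in enumerate(scores, start=1):
        # Exponentially weighted average of the current and past score vectors.
        z = [lam * ti + (1.0 - lam) * zi for ti, zi in zip(t, z)]
        # Variance of the weighted average relative to that of a single score.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        values.append(sum(zi ** 2 / (scale * v) for zi, v in zip(z, variances)))
    return values

# Hypothetical example: five on-target batches followed by a sustained shift on PC1.
scores = [[0.0, 0.0]] * 5 + [[2.0, 0.0]] * 10
values = mewma_t2(scores, variances=[1.0, 1.0])
```

Because each value depends on all earlier batches, a sustained shift accumulates over successive batches, and a single very large score continues to influence the chart for several batches afterwards. A larger weighting constant would shorten this memory.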
With these charts, 99% confidence limits were used. The best performing model was
DT, which identified batches 36 and 41 (BN5070 and BN5079) as significantly different
for PCs 1 and 5 respectively. The loadings for PC1 suggest moisture; for PC5 this is
more difficult to interpret. The SNV-DT and Sg2d11 models both identified batch 41
(BN5079) as unusual on PCs 5 and 6 respectively. Again, the loadings on these PCs
were difficult to interpret.
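One simple way of tracing an unusual batch to individual PCs, as described above, is to compare each standardised PC score with two-sided normal limits. The sketch below assumes normally distributed, mean-centred scores and 99% limits; it is an illustration only and is not necessarily the procedure used in the thesis. The function name and the example values are hypothetical.

```python
from statistics import NormalDist

def flag_scores(scores, variances, confidence=0.99):
    """Return (batch_index, pc_index) pairs whose standardised score lies
    outside two-sided normal limits. Indices start at 1."""
    z_limit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    flagged = []
    for b, row in enumerate(scores, start=1):
        for j, (t, v) in enumerate(zip(row, variances), start=1):
            if abs(t) / v ** 0.5 > z_limit:
                flagged.append((b, j))
    return flagged

# Hypothetical scores for three batches on two PCs.
scores = [[0.5, -1.0],
          [3.0, 0.2],
          [0.1, -6.0]]
flagged = flag_scores(scores, variances=[1.0, 4.0])
```

Here the second batch is flagged on PC1 and the third on PC2. Interpretation of a flagged PC still relies on inspection of its loadings.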
Fig. 3.23. MSPC control charts of higher strength tablet NIR absorbance spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling's T² control phase 1; B)
The results of these charts did not appear to be consistent with blend and multiway
control charts. This is probably because reflectance measurements are not sensitive to
differences in pathlength. Increased tablet thickness (i.e. pathlength) was found to have
occurred with some tablet batches identified as unusual in multiway models (compare
Tables B42, B45, B47 and B49, Appendix B).
3.9.4 Higher Strength Tablet Transmission Data Sets
outlier detection. The R statistic was used as stopping criterion (Appendix B: Table B5).
Between 3 and 7 PCs were required for PCA models. The Sg2d11 model required the
fewest PCs; transmission and DT models required 7 PCs. Most models explained more
than 99.5 %SS, except Sg2d11, which accounted for just 98.9 %SS.
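The percentage sum of squares (%SS) explained by successive PCs can be computed from the singular values of the data matrix, as in the minimal sketch below. Mean-centring before decomposition is an assumption, as are the function names and the synthetic data. Note that the number of PCs in this study was chosen with the R statistic as stopping criterion, not by a %SS threshold; the threshold in the sketch only illustrates how the quantity is read.

```python
import numpy as np

def percent_ss_explained(X, center=True):
    """Cumulative percentage of the sum of squares explained by successive PCs."""
    X = np.asarray(X, dtype=float)
    if center:
        X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)   # singular values, largest first
    return 100.0 * np.cumsum(s ** 2) / np.sum(s ** 2)

def pcs_needed(cumulative_percent, target):
    """Smallest number of PCs whose cumulative %SS reaches the target (below 100)."""
    return int(np.searchsorted(cumulative_percent, target) + 1)

# Hypothetical data: two structured sources of variation plus a little noise.
rng = np.random.default_rng(1)
loadings = np.zeros((2, 30))
loadings[0, 0] = 1.0
loadings[1, 1] = 1.0
batch_scores = rng.normal(size=(40, 2)) * np.array([3.0, 1.0])
X = batch_scores @ loadings + 1e-3 * rng.normal(size=(40, 30))

cumulative = percent_ss_explained(X)
n_pcs = pcs_needed(cumulative, 99.5)
```

A derivative pre-treatment such as Sg2d11 removes much of the large baseline variation, so a lower %SS for that model does not by itself indicate a poorer model.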
All PCs extracted were found to represent significant amounts of variance using
Anderson's likelihood ratio test (Appendix B: Table B5) (p = 0.01). The loadings for
these models appeared to contain physical and chemical information: with transmission,
this was evident with the first few PCs (Fig. 3.24); with SNV and DT the first PC
loadings appeared to represent physical information. The SNV-DT and Sg2d11 models
had loadings which appeared to represent chemical information (Fig. 3.25).
With these models, a number of batches could be identified as having processed
unusually and had significant (p = 0.01) Q statistics. These results agreed with those of
blend and multiway models (Appendix B: Tables 12 and 49) and were in contrast to
those of absorbance models for these batches. Previous results indicated that these
batches are physically different from normal batches. Transmission measurements pass
through the entire tablet and are likely to be more sensitive to pathlength differences
than diffuse reflectance measurements. Batches 26, 27, 28, 29, 30, 31, 32, 38, 40 and 41
(BNs 5050, 5051, 5052, 5053, 5054, 5058, 5059, 5073, 5078, 5079) were found to be
unusual. These results were consistent between the models (Appendix B: Table B16),
and include runs of consecutive batches, suggesting deviation in process performance
with time. Examination of certificate of analysis data showed that these batches were
thicker than normal, confirming the NIR model predictions.
Fig. 3.24. Higher strength tablet NIR transmission spectra PCA loadings (raw
data) (n = 43 batches), plotted against wavelength/nm: A) PC1; B) PC2; C) PC3;
D) PC4; E) PC5; F) PC6 & G) PC7.
Fig. 3.25. Higher strength tablet NIR transmission spectra PCA loadings (SNV-DT
data) (n = 43 batches): A) PC1; B) PC2; C) PC3 & D) PC4.
Between 23 and 32 batches were selected for control phase 1 (Appendix B: Table B43).
The PCA models produced from transmission and DT data required a preliminary
Monte Carlo search using the first two PCs and random selection of 6 batches to
identify unusual batches. All other models were able to identify unusual batches using
all of the modelled components in the Monte Carlo search.
Hotelling's T² Control Phase 2
All models identified similar batches of tablets as unusual (Appendix B: Table B43),
with a number of groups of consecutive batches identified as unusual, suggesting a drift
from normal process operating performance. Batches 24 to 31 (BN5048, BN5049,
BN5050, BN5051, BN5052, BN5053, BN5054, BN5058) and 40 and 41 (BN5078 and
BN5079) were identified as unusual (Appendix B: Table B43). These batches produced
tablets with average thickness above the limit of 4.6 mm, and batches 28 to 30
(BN5052, BN5053, BN5054) also produced tablets which were friable (2 mg, 4 mg and
1 mg respectively). Batches 40 and 41 produced tablets of average thickness above the
maximum limit and batch 40 produced tablets which were friable (1 mg). These tablets
had very large transmission values across the spectral range scanned. An example
control chart for Sg2d11 data is shown in Fig. 3.26B. These batches were clearly
observed as falling outside the 99% Hotelling's T² control ellipse on the PC1 versus PC2
score plot of Sg2d11 data (Fig. 3.27).
Hotelling's T² MEWMA Control Charts
With this control chart, the statistic was largely out of control for all batches with the
models produced from SNV and SNV-DT transmission measurements. The charts using
PC scores from transmission, DT and Sg2d11 transmission measurements were more
consistent with the Hotelling's T² control phase 2 charts (Fig. 3.26C). Significant drift
(p = 0.01) in the process mean vector was observed from batch 24 (BN5048) onwards.
The value fell from batch 31 (BN5058), consistent with Hotelling's T² control phase 2
charts; however, it began drifting again at batch 40 (BN5078), also consistent with
previous findings.
Consistent results were obtained with this control chart across the models (Appendix B:
Table B28). Batches 30, 36 and 40 (BN5054, BN5070, BN5078) were identified as
significantly different (p = 0.01 level).
Process Variance: Anderson's Asymptotic Normal Approximation
With this control chart, batches 30, 40 and 41 (BN5054, BN5078 and BN5079) were
found to have exhibited significant process variance across the models tested (Appendix
B: Table B43). Batches 30 and 40 produced tablets of an average thickness above the
maximum limit which were also friable (1 mg); batch 41 produced tablets which were
above the maximum average thickness.
Fig. 3.26. MSPC control charts of higher strength tablet NIR transmission spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling's T² control phase 1; B)
Fig. 3.27. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR transmission data) with Hotelling's T² 95% and 99%
control ellipses.
3.10 Multivariate Statistical Quality Control of The Entire Process
In this section multiway PCA models are examined. For the two strengths of tablet, the
models were derived from both blend NIR spectral data and tablet NIR spectral data.
Different combinations of blend and tablet spectral data were examined. Consistency
between the results of these models was examined, and compared with reference
analysis data. The combinations of blend and tablet spectral data examined were:
1. Blend absorbance and tablet absorbance NIR data (including pre-treatments,
Section 3.6)
2. Blend absorbance and tablet transmission NIR data (including pre-treatments,
Section 3.6)
3. Blend absorbance and tablet absorbance and transmission NIR data (including
pre-treatments, Section 3.6)
The results of MSPC of each of these models were compared between MPCA models
and against results of MSPC of PCA models for the blend and tablet stage. They were
also compared against reference analysis data. These comparisons were made so that the
relative importance of tablet absorbance and transmission data for modelling and
monitoring process performance could be determined.
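The combination of blend and tablet spectra into a single multiway data set can be pictured as placing the batch-averaged spectra from each stage side by side, one row per batch, before PCA. The sketch below is an assumption about the data arrangement only: the block scaling to unit sum of squares, the function name and the array sizes are hypothetical and are not taken from the thesis.

```python
import numpy as np

def build_multiway_matrix(blocks, block_scale=True):
    """Concatenate per-batch spectra from several process stages, one row per batch.

    blocks: list of arrays, each (batches x wavelengths), with batches in the same order
    block_scale: if True, scale each centred block to unit total sum of squares so
                 that no single measurement type dominates (an assumption here)
    """
    arrays = [np.asarray(b, dtype=float) for b in blocks]
    n_batches = arrays[0].shape[0]
    prepared = []
    for block in arrays:
        if block.shape[0] != n_batches:
            raise ValueError("every block needs one row per batch")
        centred = block - block.mean(axis=0)
        if block_scale:
            centred = centred / np.sqrt(np.sum(centred ** 2))
        prepared.append(centred)
    return np.hstack(prepared)

# Hypothetical sizes: 39 batches, three measurement types.
rng = np.random.default_rng(2)
blend_abs = rng.normal(size=(39, 20))
tablet_abs = rng.normal(size=(39, 15))
tablet_trans = rng.normal(size=(39, 10))
X_multiway = build_multiway_matrix([blend_abs, tablet_abs, tablet_trans])
```

The three combinations listed above would correspond to passing different subsets of blocks. Each row then describes one batch across the blending and tabletting stages, so a single PCA model and one set of control charts can monitor the whole process.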
3.10.1 Lower Strength Tablet Process
Multiway Principal Components Analysis
For the lower strength multiway models tested, between 3 and 8 PCs were required to
model the data sets (Appendix B: Tables B6, B8 & B10). The Sg2d11 transformation
produced models which required the fewest components, and the multiway
model produced from raw blend and tablet absorbance and transmission data required
the most PCs. The loadings for each model appeared to represent chemical and physical
information; however, their precise interpretation with these data sets was difficult.
Of all the models examined, the multiway models produced from SNV blend absorbance
and tablet absorbance data (Appendix B: Table B17) and Sg2d11 blend absorbance and
tablet transmission data (Appendix B: Table B19) were
able to identify groups of batches previously found to have processed unusually. These
were batches 25 to 28 (BN5040, BN5041, BN5042, BN5055), 33 (BN5064) and 35
to 37 (BN5067, BN5075, BN5076).
Between 11 and 39 batches were selected in control phase 1 using this algorithm, as for
other PCA models.
Hotelling's T² Control Phase 2
MSPC control charts were found to perform best with blend and tablet transmission data
(Appendix B: Table B46) and blend and tablet absorbance and transmission data
(Appendix B: Table B48) (Fig. 3.28B). The MPCA models derived from pre-treated
blend and tablet absorbance data identified systematic variation in the process with the
first 18 batches selected for the control ellipsoid (Appendix B: Table B44). With these
batches, this systematic variation was traced to the blend data which showed two groups
of spectra differing only in offset.
Fig. 3.28. MSPC control charts of lower strength tablet multiway PCA scores
(SNV-DT blend absorbance and tablet absorbance and transmission data) (n = 39
batches): A) Hotelling's T² control phase 1; B) Hotelling's T² control phase 2; C)
Hotelling's T² MEWMA & D) Anderson's asymptotic normal approximation.
With the other MPCA models, batches 25, 26, 27 (BN5040, BN5041, BN5042) and 33,
34, 35 and 39 (BN5064, BN5065, BN5067, BN5967) were found to have processed
unusually with this statistic (p = 0.01). These results agree with previous blend and
tablet PCA models and could be traced to one or two PCs (Appendix B: Tables B44,
B46 and B48). An example PC2 versus PC1 score plot with Hotelling's T² 95% and
99% control ellipses is shown in Fig. 3.29 for the SNV-DT multiway data set.
Hotelling's T² MEWMA Control Charts
With the charts produced from the MPCA models containing transmission data, process
drift was observed from batch 23 (BN5039) onwards. The process mean vector drifted
out of control (p = 0.01) and, from batch 27 (BN5042), began to shift towards the grand
mean vector; however, further drift in process operating performance was observed from
batch 32 (Fig. 3.28C). These charts were very similar to those for the tablet models.
With the MPCA models derived from blend and either tablet absorbance or transmission
measurements, batches 10 (BN5017) and 25 (BN5040) were found to be unusual at the
p = 0.01 level (Appendix B: Tables B29 & B31). The MPCA model derived from blend
absorbance and combined tablet absorbance and transmission measurements identified
batches 25 (BN5040), 35 & 36 (BN5067 and BN5075) as having processed unusually
(Appendix B: Table B33).
Process Variance: Anderson's Asymptotic Normal Approximation
With this control chart, some consistent results between MPCA charts were observed.
The overall MPCA model incorporating all NIR measurements identified batches 24
(BN5039 [re-blend]), 25 (BN5040) and 27 (BN5042) as having exhibited excessive
process variance (Appendix B: Table B48). The most consistent results with this were
obtained from the MPCA models calculated from blend and tablet transmission data
(Appendix B: Table B46) (Fig. 3.28D).
Fig. 3.29. PC2 versus PC1 scores plot of multiway PCA scores of lower strength
tablet process (SNV-DT blend absorbance and tablet absorbance and transmission
data) (n = 39 batches) with Hotelling's T² 95% and 99% control ellipses.
3.10.2 Higher Strength Tablet Process
Multiway Principal Components Analysis
Between 3 and 8 PCs were required to model the data sets. Raw data tended to produce
models requiring the most PCs (Appendix B: Tables B7, B9 & B11). Models produced
from SNV data which included SNV transmission spectra required only 3 PCs
(Appendix B: Tables B9 & B11), as did the Sg2d11 model produced from blend and
tablet absorbance and transmission measurements (Appendix B: Table B11).
With this control chart, consistent results were produced between models derived from
blend and tablet absorbance data and blend and tablet transmission data (Appendix B:
Tables B18 & B20). Batches found to have processed unusually and therefore not fit the
model plane were: 23 to 25 (BN5049, BN5050, BN5051); 27 and 28 (BN5053 and
BN5054) and 38 and 39 (BN5078 and BN5079). These results are consistent with
previous PCA models. These groups of consecutive batch numbers show that the
process is deviating from its optimum conditions with time.
Between 18 and 41 batches were selected in control phase 1 using this algorithm, as for
other PCA models.
Hotelling's T² Control Phase 2
With this control chart, models produced from blend and tablet transmission and blend
and tablet absorbance and transmission measurements produced the most consistent
results, which were also in agreement with those of previous PCA models (Appendix B:
Tables B47 & B49). This is shown in Fig. 3.30B for the Sg2d11 multiway data set.
Batches 23 to 29 (BN5049, BN5050, BN5051, BN5052, BN5053, BN5054, BN5058),
34 (BN5070) and 38 and 39 (BN5078 and BN5079) were found to have processed
unusually, typically on the first PC (Appendix B: Tables B47 & B49). These results
were clearly observed on PC1 versus PC2 scores plots for Sg2d11 data with Hotelling's
T² 95% and 99% control ellipses (Fig. 3.31).
Hotelling's T² MEWMA Control Charts
The control charts produced from MPCA models which included transmission data
were very consistent with those of PCA data for higher strength tablet transmission
measurements. With the overall MPCA models (blend and tablet absorbance and
transmission measurements), drift in the process mean vector was observed for all
control charts. With raw, DT and Sg2d11 data (Fig. 3.30C), this occurred from batch 23
(BN5049) onwards, and shifted toward the process mean vector at batch 27 (BN5053),
and then drifted further from batch 33 (BN5069). With the SNV-DT model, the process
mean vector drifted within control (p = 0.01) at batch 30 (BN5059); however, it then
drifted out of control from batch 32 (BN5068). The SNV-DT Hotelling's T² MEWMA
chart performed best.
With these control charts, batches 24 (BN5050), 34 (BN5070) and 38 (BN5078) were
identified as unusual (Appendix B: Tables B30, B32 & B34). As with the Hotelling's T²
control charts, MPCA models which included tablet transmission data produced the most
consistent results (Appendix B: Tables B30 & B34).
Fig. 3.30. MSPC control charts of higher strength tablet multiway PCA scores
(Sg2d11 blend absorbance and tablet absorbance and transmission data) (n = 41
batches): A) Hotelling's T² control phase 1; B) Hotelling's T² control phase 2; C)
Hotelling's T² MEWMA & D) Anderson's asymptotic normal approximation.
Fig. 3.31. PC2 versus PC1 scores plot of multiway PCA scores of higher strength
tablet process (Sg2d11 blend absorbance and tablet absorbance and transmission
data) (n = 41 batches) with Hotelling's T² 95% and 99% control ellipses.
Process Variance: Anderson's Asymptotic Normal Approximation
With these control charts, batches 20 (BN5046), 23 (BN5049), 28 (BN5054), 38
(BN5078) and 40 (BN5080) were found to have exhibited significant process variance
with the MPCA data sets which included transmission measurements (Appendix B:
Tables B47 & B49) (Fig. 3.30D).
3.11 Summary of Results
The results of this chapter are summarised in Tables 3.7 and 3.8.
Table 3.7. Summary of results for lower strength tablet process.

NIR blend absorbance NIR tablet absorbance NIR tablet transmission Multiway NIR data Unusual C. of A. Comment
number data predicted unusual data predicted unusual data predicted unusual predicted unusual data
Q A" Q r* A Q A Q A
4993 1 •
4994 2 • • •
4996 3 • • • •
4997 4 • . . • .
4998 5
5002 6
5003 7 • •
5013 8 . •
5015 9
5017 10 • Increase in content uniformity
5018 11 • . Increase in content uniformity
5025 12 •
5025 (rb) 13 •
5026 14 •
5027 15
5028 16
5028 (rb) 17
5029 18 . •
5035 19
5036 20
5037 21
5038 22
5039 23 Excessive moisture deviation
5039 (rb) 24 Excessive moisture deviation
5040 25 . •
5041 26 Friability of 1 mg
5042 27 Friability of 2 mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 Friability of 1 mg
5065 34 • N/A C. of A. data missing
5067 35 N/A C. of A. data missing
5075 36
5076 37 • Low drug content per tablet
5077 38
5967 39 •
A represents Anderson's asymptotic normal approximation.
rb denotes a re-blended batch.
N/A denotes certificate of analysis data not available.
Table 3.8. Summary of results for higher strength tablet process.
NIR blend absorbance NIR tablet absorbance NIR tablet transmission Multiway NIR data Unusual C. of A. Comment
number data predicted unusual data predicted unusual data predicted unusual predicted unusual
Q A" Q A Q A Q 7" A
4991 1
4992 2
4999 3
5000 4
5001 5
5009 6
5010 7
5011 8
5021 9
5022 10
5023 11
5024 12
5031 13
5032 14
5033 15
5043 16
5043 (rb) 17
5043 (rb) 18
5044 19 .
5046 20 •
5047 21
5048 22
5049 23 • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • • Friability 4 mg; tablet thickness = 4.63 mm
5058 29 . • Thick tablets (>4.6 mm)
5059 30 Thick tablets (>4.6 mm)
5070 34 • • Thick tablets (4.61 mm)
5071 35 Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36 •
5074 37 Friability of 4 mg
5078 38 • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 . • Thick tablets (4.62 mm)
5080 40 •
5081 41 •
A represents Anderson's asymptotic normal approximation.
rb denotes a re-blended batch.
3.12 Conclusion
The aim of this study was to determine whether NIR spectrometry could be used for
quality control of a pharmaceutical tablet manufacturing process through application of
MSPC procedures. The ability of the method to identify trends in process performance
and their relationship to reference analytical product quality data were therefore
determined. Statistical correlations were also made of MSPC results with raw material
batch usage data, in an attempt to determine whether particular batches of raw materials
lead to poor process performance.
The MSPC procedures were applied to PCs of the NIR spectral data. These were used as
they are linear combinations of the original data which summarise the systematic
variability in a 'least squares sense'. The dimensionality of the data was therefore
reduced to a few variables with little loss of systematic information. This allowed easier
process monitoring.
The statistical assumptions made of the data were that most batches of product were
from the same multinormal population and were produced whilst the process operated
within statistical control. This assumption was verified by reference of MSPC control
phase batches to their reference analytical data, i.e. all control phase 1 batches produced
high quality tablets.
Trends in process performance could be detected at each process stage by the MSPC
procedures. Systematic variability could be identified in the blending process from raw
or pretreated spectral data. Within this period of systematic variability, a number of
batches of blend were produced which were of unusual quality or which produced
tablets of unusual quality. These exhibited excessive blend moisture deviation and
deviation in drug substance content, and were identified as significantly different by the
NIR method. Some of these blends were not tabletted. The placebo blend was also
identified as unusual by the NIR method, which indicates that this method may be useful
for surveillance of counterfeit medicines. Many of the other blends identified as unusual
by the NIR method, but not having shown unusual reference analysis results, ultimately
produced batches of unusual tablets. The unusual product quality attributes included
tablet friability, increased tablet thickness and prolonged dissolution time. The
differences in unusual product quality may relate to the particular batch numbers of raw
materials used, as significant correlations were observed with MSPC results.
The NIR measurements required to implement the MSPC method include diffuse
reflectance measurements of the blends and both diffuse reflectance and transmission
measurements of the tablets (in this study these raw data were transformed to apparent
absorbance data). Though batch averaged spectra were used for PCA, several
measurements of blends and tablets are required for each batch for detection of process
variability by the NIR method, e.g. excessive moisture deviation in blends.
With both strengths of tablet, diffuse reflectance and transmission measurements are
necessary and provide different, but complementary information. The diffuse
reflectance measurements (expressed as apparent absorbance) were useful for detecting
physical anomalies which affected the tablet surface, e.g. friability. The transmission
measurements (expressed as apparent absorbance) were useful for qualifying tablet
thickness and drug substance content. Hence both measurements should be made. The
most useful scatter correction pre-treatments were SNV-DT and Sg2d11.
Overall, the results suggest that with a properly validated reference set of blend and
tablet NIR measurements, the MSPC method could be used on its own for quality control
and assurance of the process at the blending and tabletting stages and to monitor process
performance with time.
CHAPTER 4
Multivariate Statistical Process Control of a
Pharmaceutical Process Using Partial Least
Squares Regression (PLSR) of Near Infrared
and Reference Analysis Measurements
4.1 Introduction
In Chapter 3, MSPC procedures were applied to PC scores of NIR spectral
measurements of blends and tablets. The ability of this ‘model-free’ approach to process
control and monitoring was determined by comparison of MSPC results with reference
analytical measurements. In this chapter, the multivariate methods known as partial
least squares regression (PLSR) and multiblock* PLSR are examined. These methods
maximise the covariance between NIR spectral measurements and reference analytical
measurements and therefore produce latent vectors which are most closely related to the
reference analytical values. The PLS scores produced may be monitored in the same
manner as described in Chapter 3 for PC scores.
In Section 4.3, singleblock† PLS models of blends and tablets are
subjected to MSPC. The predictive abilities of these models are compared with those of
Chapter 3.
Section 4.4 models the entire process by multiblock PLSR. Conclusions regarding these
methods of process monitoring are discussed in Section 4.6.
* multiblock data sets are a collection of three-way data sets of process data at each process stage.
† singleblock data sets are three-way data sets of process data of one process stage.
4.2 Near Infrared And Reference Analysis Data Sets Used
In this study, the NIR spectral and reference analysis data sets (Section 3.4) used were
those of Section 3.6.3 for multiway PCA of the process for the two strengths of tablet
(Tables 3.4 and 3.5).
The number of batches of blends and tablets used in the manufacture of the lower
strength tablet was therefore 39 (Table 3.6). For the higher strength tablet process, 41
batches of blends and their corresponding tablets were examined (Table 3.6).
For a few batches, some reference analysis data were missing.
4.2.1 Data Analysis And Pre-treatment
The spectral data were analysed using code programmed in Matlab 5.2 Scientific and
Technical Programming Language (The Mathworks Inc., Natick, MA, USA). A number
of different pre-treatments of blend and tablet data were examined. The data
pre-treatments examined were those which were previously shown to produce the best
multivariate principal components analysis process models for such data (Section 3.11).
These were:
1. Raw spectral data (absorbance/transmission);
2. SNV-DT;
3. Sg2d11.
These three treatments were applied to blend absorbance data, and absorbance and
transmission data for each of the two strengths of tablet. This provided 18 data sets
(including multiway data sets of blend and appended tablet absorbance and transmission
data) to be studied via projection to latent structures and multiblock projection to latent
structures.
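As a rough illustration of two of the pre-treatments listed above, the following Python sketch applies the standard normal variate (SNV) transformation to a single spectrum and computes an 11 point quadratic Savitzky-Golay second derivative. It is not the Matlab code used in this study. The function names are hypothetical, the detrend step of SNV-DT is omitted for brevity, and the five points at each end of the spectrum are simply dropped; all of these are assumptions of the sketch.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def sg_second_derivative_weights(half_width=5):
    """Convolution weights of a quadratic Savitzky-Golay second derivative.

    For a quadratic fit the weights are proportional to 3*i**2 - m*(m + 1),
    i = -m..m. They are normalised here so that the sequence y = i**2
    returns its true second derivative, 2.
    """
    m = half_width
    offsets = range(-m, m + 1)
    raw = [3 * i * i - m * (m + 1) for i in offsets]
    scale = sum(w * i * i for w, i in zip(raw, offsets)) / 2.0
    return [w / scale for w in raw]


def sg_second_derivative(spectrum, half_width=5):
    """Second derivative per unit point spacing; the end points are dropped (assumption)."""
    m = half_width
    weights = sg_second_derivative_weights(m)
    return [
        sum(w * spectrum[k + j] for j, w in zip(range(-m, m + 1), weights))
        for k in range(m, len(spectrum) - m)
    ]
```

With half_width=5 the window spans 11 points. SNV removes the additive offset and multiplicative scaling of each spectrum, whereas the second derivative removes offsets and linear baselines, which is in keeping with the scatter-correction role that both pre-treatments play in this chapter.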
4.3 Statistical Quality Control of Pharmaceutical Blends And Tablets by Single Block PLSR
The method of PLS summarises the important variability in both the process (NIR
spectral data) (X) and the final product quality data (Y) (Morud, 1996). This procedure
projects the information in the high-dimensional data spaces (X, Y) down onto
low-dimensional spaces defined by a small number of latent variables. The NIR (X) and
final product quality (Y) data sets were first mean-centred and scaled to unit variance
and then decomposed according to equations (1.7.18) and (1.7.19) respectively.
The number of PLS components required to extract the information from X and Y was
judged to be 6 components for all models. This number of components was selected as
it modelled a considerable amount of the Y data and was also the number of
components used with most principal components analysis models in Chapter 3. An
advantage of this algorithm is its ability to handle missing data.
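The scaling and decomposition described above can be sketched with the NIPALS algorithm, which extracts one PLS component at a time. The following Python code is an illustration only and is not the Matlab implementation used in this study. The function names are hypothetical, and the sketch assumes complete data (no missing values) and no constant columns, so the missing-data handling mentioned above is not reproduced.

```python
import math


def autoscale(matrix):
    """Mean-centre each column and scale it to unit variance."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


def pls_component(X, Y, max_iter=500, tol=1e-12):
    """Extract one PLS component from scaled X (n x m) and Y (n x k) by NIPALS.

    Returns the X scores t, X weights w, X loadings p and Y loadings q.
    """
    n, m, k = len(X), len(X[0]), len(Y[0])
    u = [row[0] for row in Y]  # start from the first Y column
    t_old = None
    for _ in range(max_iter):
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]  # unit-length weight vector
        t = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        q = [sum(Y[i][c] * t[i] for i in range(n)) / tt for c in range(k)]
        qq = sum(v * v for v in q)
        u = [sum(Y[i][c] * q[c] for c in range(k)) / qq for i in range(n)]
        if t_old is not None and sum((a - b) ** 2 for a, b in zip(t, t_old)) < tol * tt:
            break
        t_old = t
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
    return t, w, p, q


def deflate(M, t, loadings):
    """Remove the rank-one contribution t * loadings' from M."""
    return [
        [M[i][j] - t[i] * loadings[j] for j in range(len(loadings))]
        for i in range(len(M))
    ]
```

A model of rank 6, as used here, would be obtained by calling pls_component six times, deflating X with p and Y with q after each call. The matrices left after the final deflation are the residuals from which the percentage of variability explained in X and Y can be calculated.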
4.3.1 Singleblock PLS Model Variability
Singleblock PLS models (Wangen and Kowalski, 1988) were created for raw and
pre-treated spectral data sets (SNV DT, 11 point quadratic Savitzky-Golay smoothed 2nd
derivative) for the blends used to produce lower strength tablets (n = 39 batches) and for
combined absorbance and transmission measurements of the lower strength tablets (n =
39 batches). The models were calculated using the average spectrum of those recorded
for each batch. The average spectrum of each batch was used instead of several
measurements to eliminate systematic variability within measurements of the same
batch introduced by particle size effects and differences in scatter from the surfaces of
the glass vials and tablets and also because average measurements tend to follow a
normal distribution.
Single block PLS models were also created for raw and pre-treated NIR data sets of
blends used to produce higher strength tablets (n = 41 batches) and of combined NIR
absorbance and transmission measurements of higher strength tablets (n = 41 batches)
using the average spectrum of each batch. This produced a total of 12 models of blends
and tablets for each strength and for raw and pre-treated data sets which were monitored
subsequently.
For each of the singleblock PLS models 6 components were considered to be an
appropriate rank for the models. The decision to use this size of model was based on
previous experience of multivariate PCA projection of these data and because this rank
explained most variability within the NIR data sets and significant amounts of
variability within the certificate of analysis data sets (Appendix C: Tables C1 to C4).
With raw data sets for blend and tablet models, the high amount of variance explained,
typically above 99.6%, is due to multiplicative scatter within the data sets which
accounts for most variability in the spectra. The amount of variance accounted for by
these models of the certificate of analysis data was therefore not surprisingly lower: 37
and 53% for lower strength blend and tablet models respectively and 27 and 29% for
higher strength blend and tablet models respectively. The lower variability accounted
for in the certificate of analysis data probably arises from the fact that these data do not
contain any particle size information.
With the SNV DT singleblock PLS models, slightly lower variability in the NIR data
sets is accounted for by the models (Appendix C: Tables C1 to C4) due to scatter
correction. The 11 point Savitzky-Golay 2nd derivative transformation effectively
removed much of the multiple scatter information from the NIR data sets and produced
single block PLS models which accounted for similar amounts of variability for both the NIR
and certificate of analysis data sets (Appendix C: Tables C1 to C4).
4.3.2 Quantitative Calibration of Individual Certificate of Analysis Blend and
Tablet Variables by Partial Least Squares Regression
Multivariate projection methods for monitoring process operating performance of
multivariate processes have been shown to work well where all process data and
product quality data are monitored (MacGregor et al., 1994). This is because the
variability within and between process and product quality variables is required to
model the process (MacGregor et al., 1994). In this study, the ability to produce
quantitative models of the individual certificate of analysis variables was investigated.
Only low amounts of variability could be modelled for some of the individual blend and
tablet variables, and these models were not considered accurate enough for future
prediction of individual reference analysis variables (Appendix C: Tables C7 & C8). The
low variance within each variable is likely to be the reason for this.
4.3.3 Singleblock PLS Loadings
The loadings of the single block PLS models appeared to represent physical and
chemical information. However, their precise interpretation was difficult, especially
with the combined tablet absorbance and transmission data.
4.3.4 Q Statistic Monitoring of Unusual Batches
The performance of this control chart was determined for each model type and for the
different data pre-treatments by comparison of results above the 99% significance level
with the results from the certificate of analysis reference data (i.e. significant Q values
and corresponding unusual reference analysis values). In particular, the ability of this
chart to identify anomalies in the process in batches preceding those which produced
lower quality product and failed reference laboratory tests was examined, as this would
be useful for detecting trends in the process over time. PLS models for each process
stage (blend and tablet) were created for the different pre-treated data sets in a recursive
fashion, with batches whose Q statistic exceeded the 99% significance level excluded
from the model. Each PLS model had 6 components.
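The Q statistic of a batch is the sum of squared residuals of its spectrum after projection onto the model, and the recursive modelling described above can be read as refitting until no retained batch exceeds the limit. The Python sketch below illustrates this reading only. The function names are hypothetical, fit_model and q_limit are placeholder callables standing in for PLS model fitting and for calculation of the 99% limit, and the exact recursion used in this study may have differed.

```python
def q_statistic(x_row, scores, loadings):
    """Q (squared prediction error) of one pre-treated, scaled spectrum.

    scores:   the A score values of this batch
    loadings: the A loading vectors, each as long as x_row
    """
    fitted = [
        sum(t * p[j] for t, p in zip(scores, loadings))
        for j in range(len(x_row))
    ]
    return sum((x - f) ** 2 for x, f in zip(x_row, fitted))


def recursive_exclusion(batches, fit_model, q_limit, max_rounds=20):
    """Refit until no retained batch has a Q value above the limit.

    fit_model(rows) -> function returning the Q value of any row (hypothetical)
    q_limit(rows)   -> control limit for the fitted model (hypothetical)
    Returns the indices of the batches kept for the final model.
    """
    keep = list(range(len(batches)))
    for _ in range(max_rounds):
        rows = [batches[i] for i in keep]
        q_of = fit_model(rows)
        limit = q_limit(rows)
        flagged = [i for i in keep if q_of(batches[i]) > limit]
        if not flagged:
            break
        keep = [i for i in keep if i not in flagged]
    return keep
```

All batches, including those excluded, can then be charted against the limit of the final model, which is how excluded batches appear as points above the limit on a Q control chart.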
Singleblock PLS Models
With the lower strength tablet data set, raw data showed batches 22 to 26 as having
processed unusually at the blend stage (Appendix C: Table C9). Of these batches,
batches 23 (BN5039) and 24 (BN5039 re-blend) were re-blends of a blend which was found
to exhibit excessive moisture deviation throughout (moisture deviation for batch 23 =
5.94%, limit: < 5%). With the SNV DT and Savitzky-Golay 2nd derivative blend data
sets, batch 25 (BN5040) was found to have processed unusually despite normal
certificate of analysis data (Appendix C: Table C9). With the SNV DT data set, batches
34 (BN5065), 35 (BN5067) and 36 (BN5075) were found to have processed unusually
at the blend stage (Appendix C: Table C9) but had reference laboratory results within
limits. The PLS models for lower strength tablets showed some agreement with this
result: batches 36 and 37 (BN5076) were outside the 99% limit with the SNV DT and
Savitzky-Golay 2nd derivative models; batch 32 (BN5062) was outside the control limit
with raw and SNV DT models and batch 33 (BN5064) was outside the limit for raw
data (Appendix C: Table C10). The certificate of analysis results showed that batch 33
produced tablets which were friable (1 mg). Reference data for batches 34 and 35 were
missing. For batch 37 the drug substance content was found to be low (4.83 mg/tablet,
range: 4.85 to 5.15 mg/tablet). With the Savitzky-Golay tablet model, batches 26
(BN5041), 27 (BN5042) and 28 (BN5055) were found to have processed unusually
(Appendix C: Table C10). Their certificate of analysis data revealed that batches 26 and
27 showed friability of 1 mg and 2 mg respectively.
Clearly, these control charts were able to detect unusual process behaviour at the blend
and tablet stages with consistency between results for the two stages. Raw data and
SNV DT data sets were able to detect process anomalies at the blend stage which were
not detected by the reference analytical data.
With the higher strength tablet data set, batches 35 (BN5071), 36 (BN5073), 37
(BN5074), 38 (BN5078), and 39 (BN5079) were all found to exceed the 99% limit with
the SNV DT blend data set (Appendix C: Table C11). The certificate of analysis data
for the blends of these batches were all within limits. However, the reference analytical
data showed that batches 35, 37 and 38 produced tablets which exhibited friability of 1
mg, 4 mg and 1 mg respectively. In addition, batches 35, 38 and 39 also had average
tablet thicknesses which were above the limit of 4.6 mm (all 4.62 mm). Batches 32
(BN5068), 35 and 38 were outside the 99% limit for the Q statistic with the Savitzky-Golay
2nd derivative blend data (Appendix C: Table C11) and batches 33 (BN5069), 34
(BN5070) and 39 were outside the 99% limit with the blend raw data (Appendix C:
Table C11). Batches 32 and 33 did not produce tablets which were friable, but the
average tablet thicknesses for these batches were 4.62 mm and 4.61 mm respectively,
both outside the limit. Batch 34 produced tablets which showed longer than normal
disintegration time (17 seconds) and had an average tablet thickness of 4.66 mm, which
is above the limit. The tablet Q statistic control charts which performed best were those
for raw and Savitzky-Golay 2nd derivative data. These identified batches 35 and 38 and
batches 35, 37 and 39 as unusual respectively (Appendix C: Table C12). With the
Savitzky-Golay 2nd derivative tablet PLS model, batches 27 (BN5053) and 28
(BN5054) were identified as unusual (Appendix C: Table C12). These batches were
both friable (4 mg and 1 mg respectively) and batch 28 had an average tablet thickness
of 4.63 mm. With the higher strength tablet raw data PLS model, batches 28 (BN5054),
29 (BN5058), 30 (BN5059) and 31 (BN5060) were identified as unusual (Appendix C:
Table C12). Batches 29 to 31 had average tablet thicknesses above the limit of 4.6 mm.
The Q statistic control charts for the higher strength tablet process were also able to
identify batches and groups of consecutive batches which differed from the normal data
set at both the blend and tablet stages. The manufacture of these batches ultimately
produced tablets of lower quality. This deviation from normal process operating
performance was identifiable from blend data despite showing no unusual reference
analytical results at that stage. Overall, the SNV DT data set showed the best
performance for the higher strength tablet process of the pre-treatments tested.
4.3.5 MSPC of Singleblock PLS Models of Blends And Tablets
The PLS scores of the single block models corresponding to the latent vectors of the
NIR data were used for MSPC monitoring. Estimation of the control phase 1 batches
was performed by a Monte Carlo simulation. This involved random selection of the
scores of 8 or 9 batches and construction of 99% Hotelling's T² control ellipses as
described in Chapter 3. The Hotelling's T² distance was then measured for the scores of
the remaining batches from this ellipse and batches which had significant values were
recorded. This process was repeated 200 times to produce a frequency bar chart which
showed the frequency that any batch had been found to be significantly different from
the control group. Batches which had a frequency greater than zero were not used for
estimation of the process variance-covariance matrix and process mean vector. All
batches were then monitored in control phase 2 using those batches which were deemed
to be in-control (i.e. Monte Carlo bar chart frequency = 0) as the control phase 1 group.
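The Monte Carlo search described above can be sketched for the simplest case of two score dimensions. The Python code below is an illustration under stated assumptions and is not the procedure of Chapter 3: the function names are hypothetical, only two PLS components are used, and the control limit is the usual F-based limit for a new observation, which may differ from the limit used in this study. With two dimensions the F quantile has a closed form, so no statistical library is needed.

```python
import random


def f_quantile_2(prob, dof2):
    """Quantile of the F(2, dof2) distribution (closed form for numerator dof = 2)."""
    return (dof2 / 2.0) * ((1.0 - prob) ** (-2.0 / dof2) - 1.0)


def t2_limit(n, prob=0.99):
    """T-squared limit for a new 2-D observation judged against n reference batches."""
    p = 2
    return p * (n + 1) * (n - 1) / (n * (n - p)) * f_quantile_2(prob, n - p)


def mean_and_cov(points):
    """Mean vector and sample covariance (sxx, sxy, syy) of 2-D score pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def hotelling_t2(point, mean, cov):
    """Hotelling's T-squared distance of one 2-D point from the reference group."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def monte_carlo_frequency(scores, n_trials=200, seed=0):
    """Count how often each batch falls outside the 99% limit of a random subset."""
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    freq = [0] * len(scores)
    for _ in range(n_trials):
        k = rng.choice((8, 9))           # 8 or 9 batches, as in the text
        ref = rng.sample(indices, k)
        mean, cov = mean_and_cov([scores[i] for i in ref])
        limit = t2_limit(k)
        for i in indices:
            if i not in ref and hotelling_t2(scores[i], mean, cov) > limit:
                freq[i] += 1
    return freq
```

Batches for which the returned frequency is zero would form the control phase 1 group. Because each trial tests at the 99% level, an in-control batch can occasionally be flagged by chance, so the zero-frequency rule is a conservative choice.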
Hotelling's T² Control Phase 1 For Blends And Tablets
With raw data for the blends of lower strength tablet process and higher strength tablet
process batches, the Monte Carlo search was unable to identify any unusual batches
(Appendix C: Tables C17 and C18). Most batches were therefore considered by the
algorithm to be in-control (n = 39 batches of lower strength tablet process blends, n = 36
batches of higher strength tablet process blends). For both processes, examination of the
scores revealed that they were evenly divided into two clusters. This was found to be
due to a difference in offset in the original blend absorbance data (Fig. 4.1), probably
arising from different particle size distributions and porosity. With the PLS models
produced from raw tablet data, these distinct clusters were not observed on the score
plots, with some batches identified as unusual at control phase 1. The results of the
tablet models produced from raw data for both strengths of tablet showed agreement
with results from the Q statistic control charts (Appendix C: Tables C19 & C20). Raw
data were therefore not considered useful for monitoring the blends, and spectral scatter
correction was considered necessary. With the SNV DT transformation, 17 and 31
batches were used in control phase 1 for the lower strength tablet process and higher
strength tablet process blends respectively (Appendix C: Tables C17 & C18). With the
Savitzky-Golay 2nd derivative data, 29 and 32 batches were used in control phase 1 for
the lower strength tablet process and higher strength tablet process blends (Appendix C:
Tables C17 & C18). PLS models for the two strengths of tablet used between 17 and 28
batches for the lower strength tablet process tablets (Appendix C: Table C19) and used
between 19 and 34 batches for the higher strength tablet process tablets (Appendix C:
Table C20).
Hotelling's T² Control Phase 2 For Blends And Tablets
Scatter correction was considered a useful pre-treatment of the blend absorbance data.
With the SNV DT scatter correction of the lower strength tablet process data set, a
number of batches, some consecutive in batch number, were found to have significant
Hotelling's T² values (Appendix C: Table C17). These were compared with results for
the SNV DT lower strength tablet data set. Batches
25 (BN5040), 26 (BN5041) and 27 (BN5042) and batches 33 (BN5064) and 35
(BN5067) were found to have significant values at both the blend and tablet stage
(Appendix C: Tables C17 & C19). Batches 26 and 27 produced tablets with friability of
1 mg and 2 mg; batch 33 produced tablets with friability of 1 mg (tablet reference
analysis data for batch 35 were not recorded). The Savitzky-Golay 2nd derivative
transformation did not detect these batches as unusual from their blends; however,
batches 33, 34 (BN5065) and 35 were detected as unusual from the tablet PLS model
(Appendix C: Table C19) (tablet reference analysis data for batch 34 were not recorded).
For the lower strength tablet process, the SNV DT transformation was considered to be
the most appropriate pre-treatment of those tested for blend and tablet data.
[Figure 4.1: panels A and B; x-axis: Wavelength/nm (1200 to 2400 nm)]
Fig. 4.1. Blends batch mean absorbance spectra for: A) lower strength tablet
process (n = 39 spectra) and B) higher strength tablet process (n = 41 spectra),
showing spectra for each process divided into two classes with different offsets
(batches 1 to 18 for lower strength process blends have lowest offsets; batches 1 to
17, 19 to 20 and 23 to 25 for higher strength process blends have lowest offsets).
With the higher strength tablet process data sets, the SNV DT transformation detected
batches 23 (BN5049) and 24 (BN5050) as unusual at both the blend and tablet stages.
Batch 25 (BN5051) was detected as unusual at the blend stage. The blends were not
found to have unusual reference analysis values; however, they produced tablets with
average thicknesses of 4.62 mm, above the limit of 4.6 mm. The SNV DT and Savitzky-Golay
tablet models detected batches 34 (BN5070), 38 (BN5078) and 39 (BN5079)
as unusual (Appendix C: Table C20). Batch 34 produced tablets with unusually long
disintegration time (17 seconds) and which had an average thickness greater than the
maximum limit (average thickness = 4.66 mm). Batches 38 and 39 had average tablet
thicknesses above the maximum limit (4.62 mm for both batches) and batch 38
produced tablets with 1 mg friability. With the Savitzky-Golay blend model, batch 38
could be detected as unusual despite having apparently normal reference analysis results
(Appendix C: Table C18). Both the SNV DT and Savitzky-Golay 2nd derivative
pre-treatments produced useful blend and tablet models.
4.4 Statistical Quality Control of The Entire Process by Multiblock PLSR
4.4.1 Multiblock Partial Least Squares Model Generation
Multiblock PLS has been proposed as an alternative projection method to single block
PLS for situations with large numbers of variables that can be divided into distinct
process sections (X blocks) (MacGregor et al., 1994; Wangen and Kowalski, 1988). The
data sets used in this study may be considered as two process X blocks: blend stage, X1
(blend spectra) and tablet stage, X2 (combined tablet absorbance and transmission data).
The final product quality data, Y, are the blend and tablet certificate of analysis
reference data combined (14 variables). MacGregor (1994) states that multiblock
projection methods allow for easier interpretation of process data because smaller
meaningful blocks can be individually monitored, as may the relationship between these
blocks.
The multiblock PLS algorithms used in this study were variations of those of Wold et al.
(1987) and Wangen and Kowalski (1988). This algorithm leads to a set of orthogonal
loading vectors (w_{ia}, a = 1, 2, ...) and orthogonal latent vectors (t_{ia}, a = 1, 2, ...) for
each block X_i. The X_i blocks are then represented in terms of their leading A PLS
components as:
X_1 = \sum_{a=1}^{A} t_{1a} p_{1a}^{T} + E_1    (4.4.1)
X_2 = \sum_{a=1}^{A} t_{2a} p_{2a}^{T} + E_2    (4.4.2)
This enables monitoring and construction of diagnostic plots for each block separately,
as previously described for singleblock PLS. This algorithm is also able to effectively
handle missing data. An overall monitoring space for the process may be obtained by
using projections in the latent vector space (t_{Ca}, a = 1, 2, ...) of the consensus matrix T
formed by collecting the latent vectors from the individual blocks. The score vectors of
this consensus matrix are no longer orthogonal; however, it has been shown that where
blocking of the process variables has been done in a meaningful fashion, these vectors
should continue to define the same plane as the latent vectors obtained by single block
PLS, and provide essentially the same predictions of Y:
Y = \sum_{a=1}^{A} t_{Ca} q_{a}^{T}    (4.4.3)
A check on whether the blocking has been done well is to compare predictions of Y
obtained from the singleblock and multiblock algorithms for the same number of
dimensions (A). These should be comparable.
4.4.2 Multiblock PLS Model Variability
The multiblock PLS models were found to account for similar amounts of variability
within each process stage NIR data set and for the certificate of analysis data as was
explained by the single block PLS models (Appendix C: Tables C5 to C6). The raw and
SNV DT models explained considerably more variability within the NIR data sets than
in the certificate of analysis data sets, as with single block PLS models. The Savitzky-
Golay smoothed second derivative produced multiblock PLS models which accounted
for similar amounts of variability within the NIR and certificate of analysis data sets
(Appendix C: Tables C5 to C6).
These results suggest that the multiblock models are able to model the data as
effectively as the single block PLS models for each stage of the process and that the
Savitzky-Golay 2nd derivative transformation produces models which explain similar
amounts of variation in both the NIR and reference analytical data.
4.4.3 Multiblock PLS Loadings
The loadings of the multiblock PLS models appeared to represent physical and chemical
information. However, their precise interpretation was difficult, especially with the
combined tablet absorbance and transmission data.
4.4.4 Q Statistic Monitoring of Unusual Batches
The performance of this control chart was determined for each model type and for the
different data pre-treatments by comparison of results above the 99% significance level
with the results from the certificate of analysis reference data (i.e. significant Q values
and corresponding unusual reference analysis values). In particular, the ability of this
chart to identify anomalies in the process in batches preceding those which produced
lower quality product and failed reference laboratory tests was examined, as this would
be useful for detecting trends in the process over time. MB PLS models for each process
stage (blend and tablet) were created for the different pre-treated data sets in a recursive
fashion, with batches whose Q statistic exceeded the 99% significance level excluded
from the model. Each MB PLS model had 6 components.
Multiblock PLS Models
With the lower strength process multiblock PLS models, batch 36 (BN5075) was found
to have processed unusually at both the blend and tablet stages for all data sets,
consistent with the single block PLS results for the tablet models and the SNV DT
blend PLS model (Appendix C: Tables C13 & C14) (Fig. 4.2). This batch did not show
unusual reference analytical results at either process stage; however, it occurred within a
period where unusual product was produced. Batches 25 and 26 (BN5040 and BN5041)
were found to have exceeded the 99% limit for Q at both the blend and tablet stage with
the Savitzky-Golay 2nd derivative data, consistent with singleblock PLS models
(Appendix C: Tables C13 & C14) (Fig. 4.3). With the SNV DT and Savitzky-Golay 2nd
derivative lower strength tablet models, both batches 36 and 37 exceeded the 99% limit
for Q (Appendix C: Table C14); batch 37 (BN5076) was found to have an average drug
substance content per tablet below the minimum limit. This result is consistent with
those of single block PLS models and also confirms the results of this control chart at
the blend stage where batch 36 was identified as having processed unusually. With the
lower strength process multiblock PLS models, the SNV DT model performed best at
the blend stage and the Savitzky-Golay 2nd derivative model performed best at the tablet
stage.
With the higher strength tablet process data sets, consistent results were obtained
between blend and tablet control charts. Batch 34 (BN5070) and batches 36 to 39
(BN5073, BN5074, BN5078, BN5079) were found to have processed unusually at the
blend and tablet stage with raw data (Appendix C: Tables C15 & C16) (Fig. 4.4). With
SNV DT data, batches 28 to 30 (BN5054, BN5058, BN5059) were found to have
processed unusually at the blend stage, and batches 28 and 30 were found to have
processed unusually at the tablet stage (Appendix C: Tables C15 & C16) (Fig. 4.5). The
results for Savitzky-Golay 2nd derivative data were very consistent: batches 27 to 33
(BN5053, BN5054, BN5058, BN5059, BN5068, BN5069) and batches 35, 38 and 39
(BN5071, BN5078, BN5079) were found to have processed unusually at both the blend
and tablet stages (Appendix C: Tables C15 & C16) (Fig. 4.6). These results are
consistent with those of the single block PLS models. For the higher strength tablet
process multiblock PLS models, the Savitzky-Golay 2nd derivative model performed
best.
[Figure 4.2: panels A and B; x-axis: Observation; y-axis: Q]
Fig. 4.2. Multiblock PLS Q statistic control charts for the lower strength tablet
process: A) blends, B) tablets (SNV detrend data, n = 29 batches for PLS
modelling).
[Figure 4.3: panels A and B; x-axis: Observation; y-axis: Q]
Fig. 4.3. Multiblock PLS Q statistic control charts for lower strength tablet
manufacturing process: A) blends; B) tablets (11 point quadratic Savitzky-Golay
smoothed second derivative data, n = 29 batches for PLS modelling).
[Figure 4.4: panels A and B; x-axis: Observation]
Fig. 4.4. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (raw data, n = 28 batches out of 41 used for
PLS model).
[Figure 4.5: panels A and B; x-axis: Observation; y-axis: Q]
Fig. 4.5. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (SNV detrend data, n = 32 batches out of 41
used for PLS model).
[Figure 4.6: panels A and B; x-axis: Observation]
Fig. 4.6. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (11 point quadratic Savitzky-Golay smoothed
second derivative data, n = 22 batches out of 41 used for PLS model).
Overall, the Q statistic monitoring charts enabled construction of single block and
multiblock PLS models which represented the variability present within batches
produced whilst the process was operating in a normal manner. The results of
multiblock models were consistent with those of singleblock PLS models.
4.4.5 MSPC of Multiblock PLS Blend And Tablet Models
Preliminary work with multiblock PLS models showed that these produced consistent
results between blocks with the Q statistic control charts and with the Hotelling's T²
control charts. The MEWMA Hotelling's T² and Anderson's asymptotic normal
approximation control charts were therefore only considered with these models.
Hotelling's T² Control Phase 1 For Blends And Tablets
With both lower strength and higher strength tablet process multiblock PLS models,
Savitzky-Golay 2nd derivative transformation of the blend absorbance data was found to
be necessary. The Monte Carlo search algorithm was unable to detect any unusual
batches with blend absorbance data for the lower strength tablet process (Appendix C:
Table C21) and could only detect two batches as unusual with higher strength tablet
process SNV DT blend data set (Appendix C: Table C22). With the higher strength
process blend absorbance data, this search algorithm identified a cluster of batches
which have been shown to have a different spectral offset only (Appendix C: Table
C22) (Fig. 4.1B). With the lower strength process SNV DT blend data, the first 18
batches were identified as unusual by the search algorithm; however, these have been
shown to differ only in their spectral offset from the remaining batches (Appendix C:
Table C21) (Fig. 4.1A). The Savitzky-Golay 2nd derivative pre-treatment effectively
removed the offset and was considered to be useful for the blend data set. The number
of batches selected by the Monte Carlo search with this blend data set was 28.
With the lower strength tablet data sets, scatter correction was not necessary and all
models used between 25 and 29 batches (Appendix C: Table C23). With the higher
strength tablet data sets, scatter correction of the raw spectra was necessary (Appendix
C: Table C24).
Hotelling^s Control Phase 2 For Blends A nd Tablets
With the lower strength tablet process model generated from Savitzky-Golay 2nd
derivative blend data, batches 33 (BN5064), 34 (BN5065) and 35 (BN5067) were
identified as unusual; batch 33 produced tablets of 1 mg friability (Appendix C: Table
C21; note that no reference analysis data were recorded for batches 34 and 35). These
results are consistent with those for the single block PLS models, and show that the
model can detect deviations in the process from the blend stage which are not detectable
with current reference analysis. These results are shown clearly with the MSPC control
charts (Fig. 4.7B) and on the PLS components 1 and 2 score plot (Fig. 4.8). The score
plot shows that these batches have deviated away from the normal operating region
defined by the 99% control ellipse; in addition, batch 26 (BN5041) lies outside the 95%
control ellipse. With the lower strength tablet models, all three of these batches were
identified as unusual; in addition, with the Savitzky-Golay 2nd derivative model, batches
25 (BN5040) and 26 (BN5041) were identified as unusual (Appendix C: Table C23).
These results are shown with the MSPC control charts (Fig. 4.9B) and on the PLS
components 2 and 1 score plot (Fig. 4.10). With this plot, batch 25 lies outside the 99%
control region, and deviation from the normal process operating region was observed
for batches 33, 34 and 35 with increasing distance.
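The way a batch is judged against the 95% and 99% control ellipses on a two-component score plot can be sketched as follows. This is an illustration under stated assumptions, not the thesis's own procedure: the limit uses the common phase 2 form p(n + 1)(n - 1) / (n(n - p)) multiplied by an F quantile with p = 2, the two scores are treated as uncorrelated and mean-centred, and the score values and variances are hypothetical. Only n = 28 is taken from the text.

```python
def f_quantile_2_df(prob, denom_df):
    """Quantile of the F distribution with 2 numerator degrees of freedom.

    For F(2, d) the survival function is (1 + 2x/d) ** (-d/2), which can be
    inverted in closed form, so no statistics library is needed.
    """
    return (denom_df / 2.0) * ((1.0 - prob) ** (-2.0 / denom_df) - 1.0)


def t2_limit(n_reference, prob):
    """Upper T2 limit for a new batch on two scores (assumed phase 2 form)."""
    p = 2
    scale = p * (n_reference + 1) * (n_reference - 1) / (
        n_reference * (n_reference - p))
    return scale * f_quantile_2_df(prob, n_reference - p)


def hotelling_t2(t_a, t_b, var_a, var_b):
    """T2 of one batch from two uncorrelated, mean-centred scores."""
    return t_a ** 2 / var_a + t_b ** 2 / var_b


n = 28                                  # control phase 1 batches (from the text)
limit_95 = t2_limit(n, 0.95)
limit_99 = t2_limit(n, 0.99)

# Hypothetical score pairs and score variances.
inside = hotelling_t2(3.0, -2.0, 25.0, 16.0)
outside = hotelling_t2(-20.0, 12.0, 25.0, 16.0)
```

A batch whose T² exceeds `limit_99` plots outside the 99% ellipse. The direction in which it leaves the ellipse is what the score plots use for diagnosis, and that information is lost if only the T² value is charted.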
Fig. 4.7. Multivariate statistical process control charts for multiblock PLS model of
lower strength tablet process blends (11 point Savitzky-Golay smoothed second
derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1
chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate
exponentially weighted moving average control chart; D) Anderson’s asymptotic
normal approximation control chart.
Fig. 4.8. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 2
and 1 scores of multiblock PLS model of lower strength tablet process blends (11
point Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS
modelling; 28 batches used for optimising control limits).
Fig. 4.9. Multivariate statistical process control charts for multiblock PLS model of
lower strength tablets (11 point Savitzky-Golay smoothed second derivative data, n
= 25 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B)
Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially
weighted moving average control chart; D) Anderson’s asymptotic normal
approximation control chart.
Fig. 4.10. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
2 and 1 scores of multiblock PLS model of lower strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS
modelling; 25 batches used for optimising control limits).
With the higher strength tablet process blend data sets, the Savitzky-Golay 2nd
derivative produced results which were most consistent with single block PLS models.
Batches 28 (BN5054), 29 (BN5058) and 38 (BN5078) were found to have processed
unusually with this pre-treatment (Appendix C: Table C22) (Fig. 4.11B). This is a
useful result as these batches produced lower quality tablets despite having normal
blend reference analysis results. The higher strength tablet models produced results
consistent with single block PLS and reference analysis data for the SNV DT and
Savitzky-Golay 2nd derivative transformations respectively (Appendix C: Table C24)
(Figs. 4.12B & 4.13B). Batches 34, 38 and 39 (BN5070, BN5078 and BN5079) were
found to be unusual with SNV DT tablet data, and batches 34 and 38 were unusual with
Savitzky-Golay 2nd derivative data. This is shown clearly with the MSPC control charts
(Figs. 4.12B & 4.13B) and with PLS components score plots (Figs. 4.14 to 4.17). With
the PLS component 6 and 1 score plot of SNV DT higher strength tablet data (Fig.
4.15), deviation from normal process operating performance can be observed from
batch 27 (BN5053) to 28 (BN5054). These batches were above the 95% Hotelling’s T²
limit with the MSPC charts. Deviation from normal operating performance can be
clearly observed from batches 38 and 39. In addition, batch 37 (BN5074) lies just
outside the 95% ellipse but at the opposite end of the ellipse from batches 38 and 39.
Interestingly, this batch did not exhibit excessive average tablet thickness as batches 38
and 39 did; however, the tablets showed 4 mg friability. Similar results were observed
with the PLS component 6 and 5 score plot (Fig. 4.14). With the Savitzky-Golay 2nd
derivative data, batches 38 and 39 also lay outside the 99% control region on PLS
components score plots (Figs. 4.16 & 4.17). Batch 27 lay outside the 95% control limit
on the PLS component 1 and 4 score plot (Fig. 4.16).
Fig. 4.11. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (11 point Savitzky-Golay smoothed second
derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1
chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate
exponentially weighted moving average control chart; D) Anderson’s asymptotic
normal approximation control chart.

Fig. 4.12. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablets (11 point Savitzky-Golay smoothed second derivative
data, n = 21 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B)
Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially
weighted moving average control chart; D) Anderson’s asymptotic normal
approximation control chart.

Fig. 4.13. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablets (SNV detrend data, n = 24 control phase 1 batches): A)
Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

Fig. 4.14. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
6 and 5 scores of multiblock PLS model of higher strength tablets (SNV detrend
data, n = 28 batches for PLS modelling; 24 batches used for optimising control
limits).

Fig. 4.15. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
6 and 1 scores of multiblock PLS model of higher strength tablets (SNV detrend
data, n = 28 batches for PLS modelling; 24 batches used for optimising control
limits).

Fig. 4.16. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
4 and 1 scores of multiblock PLS model of higher strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS
modelling; 21 batches used for optimising control limits).

Fig. 4.17. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
4 and 5 scores of multiblock PLS model of higher strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS
modelling; 21 batches used for optimising control limits).

These results clearly show that the multivariate PLS projection method provides
excellent diagnostic ability for identifying deviations from normal process operating
conditions. Deviation may be observed at the blend stage before tabletting, despite
normal reference analysis results. The monitoring plots may be further simplified to two
PLS scores plots once statistical limits and in-control batches have been established.
These enable both determination of process deviation and also diagnosis of the problem,
as the regions in which the scores are located are characteristic of the problem.
Multivariate Exponentially Weighted Moving Average Control Charts
These control charts were able to successfully identify drift in the process mean vector.
With the lower strength and higher strength process blend models produced from raw
data, approximately half of the batches exceeded the MEWMA Hotelling’s T² limit of
17.72. An example for higher strength tablet process blends is given in Fig. 4.18C. This
chart was sensitive to the systematic variation in the blending process (see Chapter 3)
which resulted in an offset difference in the absorbance spectra (Fig. 4.1A). However,
scatter correction of the blend and tablet data sets was necessary for monitoring the
blending stage. The MEWMA control charts for SNV DT and Savitzky-Golay 2nd
derivative data of the lower strength tablet process blends were both able to detect drift
in the process mean vector, which reached a state of statistical control and then drifted
out of control, above the MEWMA Hotelling’s T² limit of 17.72, from blend 33
(BN5064) onwards (Fig. 4.7C). With the higher strength tablet process blend data sets,
the Savitzky-Golay data and SNV DT data both show the MEWMA Hotelling’s T²
drifting in and out of control, above the limit of 17.72 (Figs 4.11C & 4.19C
respectively). The Savitzky-Golay data were better able to identify drift in the process
mean vector, as they performed better with the Hotelling’s T² control phase 2 charts.
Fig. 4.18. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (raw data, n = 25 control phase 1 batches):
A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

Fig. 4.19. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (SNV detrend data, n = 35 control phase 1
batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2
chart; C) Hotelling’s T² multivariate exponentially weighted moving average
control chart; D) Anderson’s asymptotic normal approximation control chart.

With the lower strength tablet data sets, the performance of the MEWMA Hotelling’s T²
chart also depended on the ability of the model to identify unusual batches from the
Hotelling’s T² control phase 2 charts. This control chart showed drift out of control with
the SNV DT data from batch 33 onwards (BN5064) (Fig. 4.20C); however, the chart
using Savitzky-Golay 2nd derivative data also identified systematic drift from batch 24
to 27 (BN5039 to BN5042) and was therefore considered to perform better (Fig. 4.9C).
For the higher strength tablet data sets, the SNV DT and Savitzky-Golay models
showed the MEWMA Hotelling’s T² mostly above the limit of 17.72 (Figs. 4.13C &
4.12C). This was due to batches exceeding the Hotelling’s T² 99% limits over the
production period.
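The behaviour of the MEWMA chart under a sustained drift can be sketched as below. The smoothing constant r = 0.1, the score variances and the simulated batches are all assumptions made for illustration; only the limit of 17.72 is taken from the text, and the scores are treated as mean-centred and uncorrelated so that the covariance matrix is diagonal.

```python
def mewma_t2(observations, variances, r=0.1):
    """MEWMA T2 statistic per batch for mean-centred, uncorrelated scores.

    z_i = r * x_i + (1 - r) * z_(i-1), with z_0 = 0, and the covariance of
    z_i is (r / (2 - r)) * (1 - (1 - r) ** (2 * i)) times that of x.
    """
    z = [0.0] * len(variances)
    stats = []
    for i, x in enumerate(observations, start=1):
        z = [r * xj + (1.0 - r) * zj for xj, zj in zip(x, z)]
        factor = (r / (2.0 - r)) * (1.0 - (1.0 - r) ** (2 * i))
        stats.append(sum(zj ** 2 / (factor * vj)
                         for zj, vj in zip(z, variances)))
    return stats


LIMIT = 17.72  # MEWMA Hotelling's T2 limit quoted in the text

# Hypothetical scores: 10 batches on target, then a sustained shift of
# 1.5 standard deviations on each of two scores.
variances = [4.0, 1.0]
batches = [[0.0, 0.0]] * 10 + [[3.0, 1.5]] * 20
t2 = mewma_t2(batches, variances)
first_signal = next(i for i, v in enumerate(t2, start=1) if v > LIMIT)
```

Because each batch contributes only a fraction r of its value to the moving average, the chart signals several batches after the shift begins rather than immediately. This is the trade-off that makes it sensitive to small persistent drifts which an individual T² chart would not flag.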
Anderson Asymptotic Normal Approximation
With the lower strength tablet process data sets, the SNV DT model detected significant
process variance in the tablets of batch 26 (BN5041), which produced friable tablets,
and batch 36 (BN5075) (Appendix C: Table C23) (Fig. 4.20D). With raw tablet data, batch 37
(BN5076) was also detected as having exhibited excessive process variation at the tablet
stage (Appendix C: Table C23). The tablets from this batch were found to have low
drug substance content (4.83 mg per tablet). Batches 10 (BN5017) and 11 (BN5018)
were also found to have shown significant process variance with raw data (Appendix C:
Table C23). Examination of the reference analysis data revealed an increase in content
uniformity with these batches. Batch 10 had a content uniformity of 2.43% and batch 11
had a content uniformity of 5.1%. Although less than the maximum limit of 6%, this
was still a high value (mean content uniformity = 1.66%, standard deviation = 1.06%, n
= 39 batches).
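How high these content uniformity values are can be gauged by expressing them in standard deviations from the mean, using only the figures quoted above. This is an illustrative calculation, not part of the original analysis, and the quoted mean and standard deviation include the two batches themselves.

```python
def z_score(value, mean, sd):
    """Distance of a value from the mean in units of standard deviation."""
    return (value - mean) / sd


mean_cu, sd_cu, limit_cu = 1.66, 1.06, 6.0   # % content uniformity, from the text

z_batch_10 = z_score(2.43, mean_cu, sd_cu)   # batch 10 (BN5017)
z_batch_11 = z_score(5.1, mean_cu, sd_cu)    # batch 11 (BN5018)
z_limit = z_score(limit_cu, mean_cu, sd_cu)  # the specification limit itself
```

On this scale batch 11 lies more than three standard deviations above the mean while still passing the 6% limit, and batch 10 lies less than one standard deviation above it.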
Fig. 4.20. Multivariate statistical process control charts for multiblock PLS model
of lower strength tablets (SNV detrend data, n = 29 control phase 1 batches):
A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

With higher strength tablet process blend data, the SNV DT model identified batches 23
(BN5049) and 38 (BN5078) as having exhibited significant process variance (Appendix
C: Table C22) (Fig. 4.19D). Both of these batches produced tablets of average
thickness 4.62 mm (above the maximum limit of 4.6 mm); batch 23 (BN5049) produced
tablets with a high moisture content of 4.44%, close to the maximum limit of 4.5%
(mean moisture content = 3.18%, standard deviation of moisture content = 0.31%, n =
41 batches) and batch 38 (BN5078) produced tablets which were friable (1 mg). Batch
38 was also identified as having shown significant process variance at the tablet stage
with Savitzky-Golay 2nd derivative data (Appendix C: Table C24) (Fig. 4.12D).
The Anderson’s asymptotic normal approximation control chart is able to detect an
increase in process variance that leads to extreme final product quality on a number of
variables. With the multiblock models, this may also be traced back to the blend stage
despite apparently normal blend reference analysis values. This demonstrates the power
of the ‘multivariate projection of NIR process data’ method over the traditional wet
chemical tests used.
4.5 Summary of Results
The results of this chapter are summarised in Tables 4.1 (lower strength tablet process)
and 4.2 (higher strength tablet process).
Table 4.1. Summary of results for lower strength tablet process.
Batch number; NIR blend absorbance data predicted unusual; NIR tablet combined data
predicted unusual; NIR MBPLS blend data predicted unusual; NIR MBPLS combined
tablet data predicted unusual; Unusual C. of A.; Comment
Q T² A(a) Q T² A Q A
4993 1 •
4994 2
4996 3 .
4997 4 . •
4998 5
5002 6
5003 7 .
5013 8
5016 9 .
5017 10 . • . Increase in content uniformity
5018 11 • . • Increase in content uniformity
5025 12
5025 (rb) 13
5026 14
5027 15
5028 16
5028 (rb) 17 •
5029 18 .
5035 19
5036 20
5037 21
5038 22
5039 23 • Excessive moisture deviation
5039 (rb) 24 . • Excessive moisture deviation
5040 25 • • • • .
5041 26 • • • • • • • . • Friability of 1 mg
5042 27 • • • • Friability of 2 mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 . • Friability of 1 mg
5065 34 N/A C. of A. data missing
5067 35 N/A C. of A. data missing
5075 36 • • • •
5076 37 • Low drug content per tablet
5077 38
5967 39
(a) represents Anderson’s asymptotic normal approximation
(rb) denotes a re-blended batch
N/A denotes certificate of analysis data not available
Table 4.2. Summary of results for higher strength tablet process.
Batch number; NIR PLS blend absorbance data predicted unusual; NIR PLS tablet
combined data predicted unusual; NIR MBPLS blend data predicted unusual; NIR
MBPLS combined tablet data predicted unusual; Unusual C. of A.; Comment
Q T² Q T² A(a) Q A Q A
4991 1 •
4992 2 •
4999 3 •
5000 4 • •
5001 5
5009 6
5010 7
5011 8 •
5021 9 . . •
5022 10
5023 11
5024 12
5031 13 • • • • . •
5032 14 • • •
5033 15 • •
5043 16
5043 (rb) 17
5043 (rb) 18 •
5044 19 • • .
5046 20
5047 21 • •
5048 22
5049 23 • • • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 • . Thick tablets (4.62 mm)
5051 25 • Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • Friability 4 mg; tablet thickness = 4.63 mm
5058 29 • Thick tablets (>4.6 mm)
5068 32 Thick tablets (4.62 mm)
5069 33 Thick tablets (4.62 mm)
5070 34 . Thick tablets (4.61 mm)
5071 35 • • • Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36
5074 37 • Friability of 4 mg
5078 38 • • • • • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 • • • • Thick tablets (4.62 mm)
5080 40
5081 41
(a) represents Anderson’s asymptotic normal approximation
(rb) denotes re-blended batch
4.6 Conclusion
Both the singleblock and multiblock PLS models appear to be equally effective in
modelling the variability within the reference analytical data at both process stages. The
loadings of the models suggest that physical and chemical information is modelled.
For identification of batches which processed unusually (Q statistic), scatter correction
(SNV-DT or Sg2d11) of the NIR data was useful. This was especially the case with
blend NIR measurements. Batches which processed in an unusual manner could be
detected at the blend stage, even though the reference analytical data did not detect
anything unusual, for example batches of unusual blends which produced friable
tablets. The Q statistic control charts of singleblock and multiblock models were also
able to identify trends in process deviation at both process stages: consecutive batches
which exhibited unusual reference analytical values, e.g. excessive blend moisture
deviation, low average drug substance content per tablet, friability, high average tablet
thickness and prolonged tablet dissolution time. Overall, the SNV-DT scatter correction
was considered the most appropriate of those tested for blend data for detection of
unusual processing by this control chart. For tablets, the Sg2d11 was considered the best
scatter correction method tested.
The MSPC control charts of the singleblock and multiblock PLS scores were also able
to identify unusual batches, as with the Q statistic charts. For example, batches which
produced friable tablets could be detected at the blend stage, despite normal reference
analytical data, and also from their tablet NIR measurements. Scatter correction by
Sg2d11 transformation was found to be the best data pre-treatment of blends, as this
produced the most consistent (i.e. best pre-treatment for detection of process deviation)
results between process stages. Interestingly, with the lower strength tablet models,
scatter correction was not necessary to identify unusual batches (friable or increased
average tablet thickness); however, for detection of process variation (drift), the Sg2d11
produced the best results. The models of the higher strength tablets did require scatter
correction, e.g. SNV-DT or Sg2d11.
Once models have been calculated from scatter corrected data, monitoring of the PLS or
MBPLS scores enables diagnostic control charts to be constructed. These may be
constructed from just two PLS scores and are able to provide diagnosis of the process
problem as the region in which the unusual batch scores are located is characteristic. For
example, the scores of blends or tablets which produce friable tablets or tablets which
have high average thickness tend to locate in specific areas of the score plot, and outside
the 99% Hotelling’s T² control ellipse. This was also found to be the case with batches
for which no reference analytical data were provided (e.g. BN5041 and BN5042). These
batches were consistently found to be unusual throughout the process, and had
preceding batches which were shown to be unusual from C. of A. data. These batches
would therefore be expected to show the same trend as the preceding batches, i.e. show
friability or increased tablet thickness etc. The process variance control charts were also
found to be useful for process monitoring. With the lower strength tablets, batches
which showed significant process variation corresponded to: friable tablet batches;
batches with low average drug substance content per tablet and batches which occurred
during a manufacturing period where a trend in increase in content uniformity was
observed (although remaining within limits). These control charts for the higher strength
tablet process showed consistency in results at the blend and tablet stages. For example,
batches which showed significant process variability at both stages produced tablets
which had high average thickness and which were friable.
Overall, the PLS and MBPLS models produced consistent results between process
stages and results which are consistent with the PCA methods (Chapter 3). This
suggests that these PLS methods may also be solely used to control and monitor the
quality of product produced in this process.
CHAPTER 5
Multivariate Image Analysis of Near
Infrared Multispectral Blend Images For
Quality Control of Pharmaceutical Blending
5.1 Introduction
In Chapters 3 and 4, multivariate procedures were described for statistical quality
control of blending and tabletting using near infrared spectroscopy. The methods
described required construction of a reference set of batches which processed normally.
Unusual batches could be identified easily since they did not belong to the same
multinormal distribution. In some instances, chemical and physical information could be
directly obtained from the NIR data, enabling diagnosis of the process problem; for
example, significant variation in the PC scores of a batch of blend on a PC known to be
related to moisture content. However, this was not always possible.
In this chapter, near infrared multispectral imaging is used to study some of the unusual
blends. The aim is to generate detailed, spatially resolved images which provide
diagnostic chemical and physical information at the microscopic level. Section 5.4
describes liquid crystal tuneable filter InSb focal plane array NIR imaging microscopy.
Multivariate methods and data pre-treatments are described in Sections 5.5 and 5.6
respectively. In Sections 5.7 to 5.9, methods are described for statistical control and
monitoring of blend quality. Section 5.10 describes alternative approaches to monitoring
blend quality. Conclusions regarding this method are described in Section 5.11.
5.2 Materials Studied
A selection of blends which exhibited unusual reference analytical results at the blend
and tablet stage was studied. Their batch numbers (BNs) were: 5039, 5040, 5045,
5064, 5065, 5067, 5070 and 5078. Blend BN5045 exhibited excessive moisture deviation
and poor uniformity of content of the drug substance; throughout the blend, large lumps
of drug substance were found. These did not dissolve prior to high performance liquid
chromatography (HPLC) analysis, and were observed by visible light microscopy in this
study. This blend was therefore not tabletted and was considered an important blend for
study by NIR imaging microscopy. The drug substance content of each blend should lie
between 2.42 and 2.58% (2.5% nominal value); between different blend samples of the
same batch, the maximum absolute deviation in active content from their mean value
should not exceed 3%. In addition to these blends, powdered crystalline drug substance
was also studied.
5.3 Sample Preparation
A single sample from each batch was prepared for imaging using a powder sampling
accessory with a barium fluoride window. Samples were prepared by compressing
approximately 200 mg of powder between two flat circular barium fluoride discs. The
discs were held together between three stainless steel pins mounted on a steel plate and
were tightly secured to the plate by a steel screw cap. A circular hole in the screw cap
allowed for passage of light through the barium fluoride disc and onto the sample. The
samples were positioned on the microscope stage with the barium fluoride window
aligned beneath the objective lens.
5.4 Liquid Crystal Tuneable Filter InSb Focal Plane Array Near Infrared
Imaging Microscopy
5.4.1 Instrumentation
The near infrared (NIR) imaging system (MatrixNIR, Spectral Dimensions Inc.,
Maryland, MD, USA) incorporated a high resolution focal plane array InSb detector,
capable of producing NIR-images at high frame rate and with resolution of 256-by-256
pixels. The focal plane array was cooled by a Dewar flask filled with liquid nitrogen.
Light from the 100 W tungsten halogen lamp was directed through the liquid crystal
tuneable filter (LCTF) which had a tuneable range of 1100 to 1900 nm and wavelength
resolution of 1 nm (bandpass of 6 nm). The light emitted through the LCTF was then
reflected by a beam splitter through an Olympus microscope (objective lens
magnification of five times), equipped with a tungsten halogen lamp of variable
intensity, and onto the sample. Adjustment of shutters on the beam splitter directed
light diffusely-reflected from the sample up through the objective lens and onto the
focal plane array detector via a mirror.
5.4.2 Sample Imaging
A sample prepared from each batch was imaged through the barium fluoride window of
the sample accessory. The microscope stage height required to focus light from the
LCTF on the powder surface was determined by manually focusing visible light on to
the sample using the lamp attached to the microscope. The shutters of the beam splitter
were adjusted for this purpose. When focused, the microscope lamp was then switched
off and the shutters of the beam splitter re-adjusted to allow light from the LCTF to
reach the powder surface and be diffusely reflected onto the InSb detector. The InSb
detector was maintained at low temperature by means of a liquid nitrogen filled Dewar
flask. All images were produced in a darkened room with the camera frame rate and
gain adjusted to provide optimum image quality with minimal saturation of the focal
plane array pixels. A frame rate of 7.89 Hz and 2000 accumulations per image plane
(wavelength) were deemed to provide optimum image quality and were used
subsequently. Images were recorded at each wavelength across the spectral region used.
For each sample, this provided a three dimensional image cube where the first two
dimensions (x,y) represent spatial location (pixel) and the third dimension represents
wavelength. Each pixel represented 36 µm² of sample.
A background reflectance intensity image was produced using a ceramic reflectance
standard and with the same camera settings and wavelengths as for the blends.
5.4.3 Data Analysis
The multispectral images of the blends were individually transformed to
diffuse-reflectance NIR images by division of each pixel by the corresponding pixel
of the ceramic background image. The resultant images were used for multivariate
image analysis (MIA). All programs used were written in Matlab 5.3 Scientific and
Technical Programming Language (The Mathworks Inc., Natick, MA, USA) and were
run on an Acer Pentium II 333 MHz PC fitted with 320 MB RAM.
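The pixel-wise transformation to reflectance can be sketched as follows, written in Python rather than the Matlab used for the original programs. The function name and the tiny example cubes are hypothetical. The sketch performs only the division described above and assumes no additional dark-current correction.

```python
def to_reflectance(sample_cube, background_cube):
    """Divide every pixel of every image plane by the matching background pixel.

    Both cubes are nested lists indexed as cube[row][column][wavelength].
    """
    return [
        [
            [s / b for s, b in zip(sample_px, background_px)]
            for sample_px, background_px in zip(sample_row, background_row)
        ]
        for sample_row, background_row in zip(sample_cube, background_cube)
    ]


# Hypothetical 2-by-2 pixel cubes with 3 wavelengths (raw detector counts).
sample = [[[50.0, 40.0, 30.0], [60.0, 45.0, 30.0]],
          [[25.0, 20.0, 15.0], [80.0, 60.0, 40.0]]]
background = [[[100.0, 80.0, 60.0], [120.0, 90.0, 60.0]],
              [[50.0, 40.0, 30.0], [160.0, 120.0, 80.0]]]

reflectance = to_reflectance(sample, background)
```

In this example the raw counts differ from pixel to pixel because the background illumination differs, but every reflectance value is 0.5, which is the purpose of ratioing against the ceramic standard.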
5.5 Multivariate Image Analysis
5.5.1 Wavelength Range Selection
Preliminary data analysis using NIR absorbance spectra of blend BN5045 (n = 42
spectra) and a placebo blend (n = 9 spectra) (Chapter 3) was performed in order to
determine the spectral region where the active drug absorbs. This was investigated to
permit imaging over the minimum number of wavelengths necessary, reducing both the
computation time and the image file size. Principal components analysis of 11 point
quadratic Savitzky-Golay 2nd derivative smoothed absorbance spectra (1110 to 2200
nm, 2 nm increments) (Fig. 5.1) was able to separate the placebo blend and blend
BN5045 data on the first component (Fig. 5.2). The loadings for the first component
revealed that the important wavelengths to monitor (i.e. highest absolute loading values)
were 1132, 1135, 1649, 1665, 1701 and 2145 nm (Fig. 5.3), although the last of these
wavelengths was outside the range of the NIR imaging microscope. The loadings from
1649 to 1701 nm were most important, hence the wavelength range selected was restricted
to 1600 to 1750 nm. The wavelength increment was carefully set to 6 nm and therefore
incorporated these important peaks. The number of wavelengths selected was therefore
26 out of the 800 possible. Imaging time for each blend was approximately 1.8
hours.
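The wavelength count and imaging time quoted above can be checked with a short calculation from the stated settings (1600 to 1750 nm in 6 nm steps, 2000 accumulations per image plane at 7.89 Hz). The sketch ignores filter tuning and data transfer overheads. Treating the 6 nm bandpass as a window of 3 nm either side of each selected wavelength is an assumption made for illustration.

```python
start_nm, stop_nm, step_nm = 1600, 1750, 6
wavelengths = list(range(start_nm, stop_nm + 1, step_nm))

frame_rate_hz = 7.89          # camera frame rate (from the text)
accumulations = 2000          # frames accumulated per image plane (from the text)
seconds = len(wavelengths) * accumulations / frame_rate_hz
hours = seconds / 3600.0

# Distance from each important loading peak to the nearest imaged wavelength.
key_bands_nm = [1649, 1665, 1701]
nearest_gap = {b: min(abs(b - w) for w in wavelengths) for b in key_bands_nm}
all_covered = all(gap <= 3 for gap in nearest_gap.values())

print(len(wavelengths), round(hours, 2), nearest_gap)
```

The result is 26 image planes and an acquisition time of roughly 1.8 hours, in agreement with the figures in the text, and each of the three important peaks lies within 1 nm of an imaged wavelength.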
Fig. 5.1. Savitzky-Golay 2nd derivative (11 point quadratic) of absorbance NIR
spectra of blend BN5045 (n = 42) and a placebo blend (n = 9) over the wavelength
range 1110 to 2200 nm (2 nm increments).

Fig. 5.2. Hotelling’s T² control ellipse (99%) of PC1 and PC2 scores for blend
BN5045 (n = 42 blend BN5045 spectra used in PCA, 11 point quadratic Savitzky-Golay
second derivative of absorbance data) showing separation of blend BN5045
from placebo blend (n = 9 placebo blend spectra used in PCA) (control limits set
using blend BN5045 PC scores).

Fig. 5.3. Principal component loadings for the first component for 11 point
quadratic Savitzky-Golay second derivative of absorbance spectra of blend
BN5045 (n = 42) and a placebo blend (n = 9).

5.5.2 Multiway Principal Components Analysis
Multiway principal components analysis of three dimensional image data produces a
loading vector and a score image for each principal component extracted (Esbensen et
al., 1992; Geladi et al., 1989a, 1989b, 1994, 1996; Geladi, 1992). Mathematically, the 3-
way algebra is equivalent to unfolding the 3-way array, X, to a two way array followed
by ordinary principal components analysis (Section 3.6.3, equation (3.6.1)) (Bharati and
MacGregor, 1998). This unfolding results in only transient loss of information as the
resulting scores, T, may be rearranged back to form an image (three-way array) (Geladi
et al., 1996). This procedure has been termed unfolding and backfolding (Wold et al.,
1987). The principal components algorithm used here was the power method. This was
considered to be more suitable and computationally faster than the NIPALS method as
the kernel matrix is formed only once. Successive PCs are extracted by deflating the
residual kernel matrix. Different methods of calculating the kernel matrix were
examined. These were: the cross products matrix; the variance-covariance matrix and
the correlation matrix (Geladi et al., 1996). Preliminary data analysis showed these
methods to produce similar results. All PC1 and PC2 images formed from these kernel
matrices were considered to represent an intensity and contrast image respectively
(Geladi et al., 1996). The cross products matrix has been suggested as a method for
generating intensity PC images (Geladi et al., 1996); however, the first two PC images
from this kernel matrix were not visually different from those obtained using the other
two kernel matrices. Subsequent MIA used the variance-covariance matrix and not the
correlation matrix to avoid scaling up variables representing small amounts of variance
(i.e. noise) to unit variance.
5.6 Image Cube Data Pre-treatments
5.6.1 Sample Illumination And Multivariate Shading Correction
A preliminary multivariate image analysis was performed on the images using
reflectance data and the variance-covariance matrix. The first principal component
image for each blend was found to represent the intensity of reflected light (e.g. BN5045
(Fig. 5.4A)). This image clearly shows that the illumination of the sample was
non-uniform. The score values on the first component confirm this and show a trend of
increasing intensity with pixel number (Fig. 5.5). The effect of pixel norm scaling
(multivariate shading correction) (Geladi et al, 1996) was also studied. Improvement in
intensity across the first PC image was observed (Fig. 5.4B); however, lower intensity
still remained around the borders of the first 100 rows of the image. MIA was therefore
restricted to rows 100 to 256 for each image (i.e. 157 rows). Pixel norm scaling was not
considered necessary for this image region and was therefore not used.
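As an illustration of the pixel norm scaling discussed above, the following is a minimal sketch only. It assumes that the image cube is held as a NumPy array of shape (rows, columns, wavelengths) and that pixel norm scaling means dividing each pixel spectrum by its Euclidean norm; the function name `pixel_norm_scale` is hypothetical and is not taken from this work.

```python
import numpy as np

def pixel_norm_scale(cube):
    """Scale every pixel spectrum of a (rows, cols, K) cube to unit Euclidean norm.

    Assumption: 'pixel norm scaling' is taken to mean division of each
    pixel vector by its own length, which removes overall intensity
    (shading) differences between pixels.
    """
    cube = np.asarray(cube, dtype=float)
    norms = np.linalg.norm(cube, axis=2, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero pixels unchanged
    return cube / norms
```

After scaling, two pixels with the same spectral shape but different illumination intensity become identical, which is why the intensity gradient in the first PC image is reduced.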
5.6.2 Noise Reduction by Median Filtering
The effect of an m-by-n median filter on signal to noise ratio was investigated for all
images. This involved replacing each pixel reflectance value by the median value of itself
Fig. 5.4. Principal component one image (PC1) of blend BN5045 (reflectance data)
showing non-uniform illumination of sample. A) Before multivariate shading
correction, B) after multivariate shading correction.
Fig. 5.5. Principal component one score plot (PC1) for blend BN5045 (reflectance
data) showing variation in illumination intensity (score value) with pixel number.
and the pixels surrounding it in a 3-by-3 pixel filter at each image plane (wavelength).
The filter clearly improved signal to noise, probably by removing stray light and ‘salt
and pepper’ noise, producing smoother images (Fig. 5.6). Median filtering also
produced better visual contrast between true image features for each principal
component image (Fig. 5.6) because the 256 intensity levels were set from proper
intensity levels and not from extreme noise values. In Fig. 5.6, a surface scratch on the
barium fluoride disc of image BN5045 (lower left part of image) shows up more clearly
as a dark line in the PC1 image after median filtering (compare Fig. 5.6A & 5.6B). Other
spectral features, such as crystalline material, appear brighter after median filtering
(compare Fig. 5.6A & 5.6B). Examination of the first four principal components
loadings for PCA models derived from raw reflectance data and from median filtered
reflectance data showed that the third and fourth PCs appeared to contain more
chemical information after median filtering (Fig. 5.7). The cumulative percentage sum
of squares explained by the PCs extracted was greater after filtering, especially with the
first PC, which confirms the reduction in random noise in the data set (Table 5.1). Median
filtering was therefore considered very useful and was used in all subsequent MIA.
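The 3-by-3 median filtering described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the cube is a NumPy array of shape (rows, columns, wavelengths), and border pixels are handled by replicating the edge values, a choice which is not specified in the text. The function name `median_filter_3x3` is hypothetical.

```python
import numpy as np

def median_filter_3x3(cube):
    """Apply a 3-by-3 median filter to each image plane of a (rows, cols, K) cube."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, _ = cube.shape
    # Assumption: replicate edge pixels so that border pixels also have nine neighbours.
    padded = np.pad(cube, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Collect the nine shifted copies of the cube (the pixel and its eight neighbours).
    shifted = [padded[r:r + rows, c:c + cols, :] for r in range(3) for c in range(3)]
    # The median over the nine copies is taken independently at each wavelength.
    return np.median(np.stack(shifted, axis=0), axis=0)
```

An isolated extreme value (‘salt and pepper’ noise) is replaced by the median of its neighbourhood, so it no longer determines the range of the 256 display intensity levels.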
5.6.3 Effect of Pixel-Wise Digital Smoothing And Baseline Correction on MIA
The effect of a range of data pre-treatments, commonly applied to near infrared spectra,
was studied with the blend images. These were: SNV; Sg2d (11 and 15 point quadratic
Savitzky-Golay smoothed second derivative) and DT transformation. Each was applied
pixel-wise to the 26 reflectance values. All pre-treatments were found to remove much
of the intensity information from the images. The first two principal components
loadings showed high positive and negative peak values across the wavelength range
and were considered to represent chemical information. The loss of intensity
information from the images is likely to make detection of unmilled crystalline material
Fig. 5.6. The effect of 3-by-3 median filtering on image signal to noise ratio (first
principal component image, reflectance data) for blend BN5045. A) No filtering, B)
3-by-3 median filtered data.
Fig. 5.7. Principal component loadings for blend BN5045 (reflectance data) for
first four components. Raw reflectance data: (A) to (D); median filtered reflectance
data: (E) to (H).
Table 5.1. Cumulative percentage sum of squares (%SS) explained by principal
components models (reflectance and median filtered reflectance data sets) for
blend BN5045.
Data set                       Principal components    %SS
Reflectance                    1                       57.19
                               2                       80.01
                               3                       84.56
                               4                       87.79
Median filtered reflectance    1                       76.99
                               2                       87.23
                               3                       89.76
                               4                       91.70
harder as unmilled material appears to reflect with higher intensity, probably due to
some specular reflectance. This was observed with the first PC image for all blends (Fig.
5.8), especially BN5045 (Fig. 5.6A & 5.6B). For this study, use of reflectance data was
therefore preferred.
5.7 Multiway Principal Components Analysis of Multispectral Images
5.7.1 Multivariate Segmentation: Detection of Spatial Locations of Unmilled
Material And Drug Substance For Blends
Multivariate segmentation of images (pixel class delineation) identifies regions in the
image where a class of material is located (Geladi et al, 1996). In this study three
different approaches to identifying regions of drug substance and unmilled crystalline
material were examined.
The first approach attempted to classify by selection of pixels on a principal component
image above an arbitrary threshold intensity.
Another method studied involved construction of a pixel density map. This method
required selection of two PC images to monitor and therefore necessitated visual
interpretation of the PC loadings (for example the first and third PCs would be selected,
representing intensity and drug substance respectively). A pixel density map is a three
dimensional histogram which classifies pixels according to their intensities (0 to 255
intensity levels) on the two selected PCs. Mathematically, it is a two way matrix where
the row and column dimensions represent the intensity on the two PCs and the value of
an element in the array is the number of pixels with these intensities (Bharati and
MacGregor, 1998). Once calculated, the array is then displayed as an image. The
reasoning behind this approach is that pixels representing the same class of material will
show similar
Fig. 5.8. Principal component one intensity images of blends (median filtered
reflectance data). A) BN5039, B) BN5040, C) BN5045, D) BN5064, E) BN5065, F)
BN5067, G) BN5070, H) BN5078.
intensities on the two PCs and will be grouped together in the density map (Bharati and
MacGregor, 1998). Regions of interest in these maps were highlighted by placing them
inside a rectangle, selected using a computer mouse and an on-screen cross-hair pointer.
Pixels with the range of selected intensities for the two PCs were then projected back
into the image score space for visual interpretation.
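The pixel density map and the back-projection of a selected rectangle can be sketched as below. This is an illustrative sketch only: it assumes that the two PC score images are NumPy arrays of equal shape and that scores are mapped linearly onto the 0 to 255 intensity levels. The function names `to_levels`, `pixel_density_map` and `select_rectangle` are hypothetical.

```python
import numpy as np

def to_levels(score_image, n_levels=256):
    """Map PC scores linearly onto integer intensity levels 0 .. n_levels - 1."""
    s = np.asarray(score_image, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros(s.shape, dtype=int)
    return np.round((s - lo) / (hi - lo) * (n_levels - 1)).astype(int)

def pixel_density_map(levels_a, levels_b, n_levels=256):
    """Two-way array whose element (p, q) counts pixels with level p on one PC and q on the other."""
    density = np.zeros((n_levels, n_levels), dtype=int)
    np.add.at(density, (np.ravel(levels_a), np.ravel(levels_b)), 1)
    return density

def select_rectangle(levels_a, levels_b, a_range, b_range):
    """Boolean image of the pixels whose levels fall inside a rectangle of the density map."""
    levels_a, levels_b = np.asarray(levels_a), np.asarray(levels_b)
    return ((levels_a >= a_range[0]) & (levels_a <= a_range[1]) &
            (levels_b >= b_range[0]) & (levels_b <= b_range[1]))
```

The boolean image returned by `select_rectangle` has the same shape as the score images, so the selected pixels can be displayed directly in the image space.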
The final method studied involved construction of Shewhart type control charts for the
PCs studied. The control limits used were those suggested by Jackson (1991) (Section
3.7). The reasoning behind this method is that pixels which represent a high
concentration of a material will show high positive or negative values on the PC which
represents that material.
With each method, the pixels which were selected were also used to construct binary
images. Selected pixels were given a value of one (white), non-selected pixels were
given a value of zero (black).
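The construction of a binary image from a PC score Shewhart chart can be sketched as follows. The sketch assumes control limits of the form mean ± z × standard deviation of the score values (z = 1.96 for 95% and approximately 2.58 for 99%); the exact limits of Jackson (1991) given in Section 3.7 may differ. The function name `shewhart_binary` is hypothetical.

```python
import numpy as np

def shewhart_binary(score_image, z=1.96, side="upper"):
    """Binary image (1 = selected, 0 = not selected) of pixels outside Shewhart-type limits.

    Assumption: limits are mean +/- z * standard deviation of all score values.
    side = "upper", "lower" or "both" chooses which limit is monitored.
    """
    s = np.asarray(score_image, dtype=float)
    centre, sd = s.mean(), s.std(ddof=1)
    upper, lower = centre + z * sd, centre - z * sd
    if side == "upper":
        selected = s > upper
    elif side == "lower":
        selected = s < lower
    else:
        selected = (s > upper) | (s < lower)
    return selected.astype(np.uint8)
```

Where the polarity of a component is reversed for a given image, the lower limit would be monitored instead of the upper limit.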
Overall, the most effective method for these data sets was found to be the Shewhart
chart method. This was able to locate areas of unmilled crystalline material with both
the blends and powdered drug substance from the first PC (intensity). In addition, with
the blends, the spatial locations of drug substance (both fine and unmilled clumps) were
located by use of a control chart for the third PC. This PC showed loadings similar to
spectra of the placebo and active blends (Fig. 5.9A and 5.9B) and importantly, similar to
the loadings of the third PC of drug substance (Fig. 5.10C). In this spectral region, the
absorbance was found to be higher if active material was present (Fig. 5.9), especially
from 1648 to 1672 nm. It therefore follows that drug substance shows lower reflectance
in this region. Interpretation of the loadings for the third component revealed that areas
of blend with no drug substance would have lower (more negative) score values than
areas of blend with drug substance. This is because regions of blend with low drug
Fig. 5.9. Near infrared absorbance spectra of a placebo blend (n = 9) and a blend
(n = 42) (1600 to 1750 nm, 6 nm increment). Absorbance spectra: A) placebo blend,
B) blend. Second derivative of absorbance: C) placebo blend, D) blend.
Fig. 5.10. Principal components (PC) loadings for the first four components for
powdered drug substance. A) First PC, B) second PC, C) third PC, D) fourth PC.
substance content will, similar to placebo blend, show higher reflectance. The
contributions to lower score values for such pixels will therefore be mostly from the
high negative loadings. With the careful selection of the wavelength range used (i.e.
where most differences in reflectance occur between drug substance and the bulk), the
drug substance should be identified from the Shewhart PC chart above a threshold level
(95% or 99% confidence). A comparison of the first and third PC images of blend
BN5045 provided evidence which supported this. The areas of crystalline material in
this blend (Fig. 5.11C) are known from certificate of analysis tests to represent clumps
of unmilled drug substance. Although these regions showed high reflectance in the PC1
image, probably due to specular reflectance from the crystals, some absorption of light
also occurred as the same areas also appear as high score values for the third PC image.
With blend BN5067, the polarity of the loadings on PC3 was reversed. Pixels with low
score values were therefore deemed to represent drug substance, and also had identical
spatial locations to the crystalline material identified on the first PC image.
Binary images were produced for each blend using the third PC image. The percentage
of pixels found to represent drug substance was approximately 2.5% for the blends.
This is not inconsistent with the concentration of drug substance in the blends
(2.5% m/m).
5.7.2 ‘On-line’ Type Analysis of Blends From PC Shewhart Control Chart Binary
Images
The binary images (95% control limits on PC scores) representing unmilled crystalline
material (PC1) and drug substance (PC3) were monitored by dividing the images into
16 sub-images (Fig. 5.12) (each image used just the first 156 rows, and all 256
columns). With each image sub-area (39-by-64 pixels), the number of pixels classified
as unmilled or drug substance was recorded, and the results for the 16 sub-areas were
Fig. 5.11. Binary images of crystalline areas produced from principal component
one intensity images of blends (median filtered reflectance data). A) BN5039, B)
BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H) BN5078.
 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
Fig. 5.12. Image sub-areas (n = 16).
displayed as bar charts. This allowed for monitoring of areas with high concentration of
either unmilled material or drug substance.
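The division of a binary image into 16 sub-areas can be sketched as follows. This is a minimal sketch assuming that the binary image is a NumPy array of 156 rows by 256 columns and that sub-areas are numbered row by row as in Fig. 5.12; the function name `sub_area_percentages` is hypothetical.

```python
import numpy as np

def sub_area_percentages(binary, n_rows=4, n_cols=4):
    """Percentage of selected pixels in each sub-area, ordered row by row (1 to 16)."""
    b = np.asarray(binary)
    h, w = b.shape[0] // n_rows, b.shape[1] // n_cols   # 39-by-64 for a 156-by-256 image
    b = b[:h * n_rows, :w * n_cols]                     # drop rows/columns that do not fill a block
    blocks = b.reshape(n_rows, h, n_cols, w)
    counts = blocks.sum(axis=(1, 3))                    # selected pixels per sub-area
    return 100.0 * counts.ravel() / (h * w)
```

The 16 returned values correspond to the heights of the bars in a bar chart of the kind described above.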
Detection of Unmilled Material
The results of this monitoring method allowed easy identification of areas with high
pixel concentration of unmilled material (Fig. 5.13). An average percentage of
unmilled material was calculated for each image, and control limits corresponding to
95% and 99%, based on the t-distribution (Neave, 1995), were used (Table 5.2).
These limits differ between blends and were used as a rough guide for identifying high
concentration areas; future monitoring would probably use limits set empirically. All
images showed at least one region with a high pixel concentration of unmilled
crystalline material (Fig. 5.13). Blend BN5045 showed two areas with high pixel
concentration of unmilled material (Fig. 5.13C). Blends BN5039, BN5040, BN5064, BN5065
and BN5067 all showed one area with a pixel concentration of 15% or greater (Fig. 5.13).
Detection of Drug Substance
Monitoring of the PC3 binary images used the same method as for unmilled crystalline
material. For each image, the percentages of drug substance for the 16 sub-areas were
averaged and the standard deviation calculated (Table 5.3). From the results for these
images, blends BN5040, BN5064, BN5070 and BN5078 appear to have mean drug substance
contents by surface area which are reasonably consistent with the specification limits of
2.42 to 2.58% m/m. Blends BN5039 and BN5045 showed high drug
substance content in these images and their respective certificate of analysis results
confirmed that some blend samples showed excessive drug substance content. The
standard deviation of drug substance content was higher for BN5045 than for BN5039,
which agrees with the certificate of analysis content uniformity results. Two batches,
Fig. 5.13. On-line Shewhart PC control bar charts showing percentage of unmilled
crystalline material in 16 sub-areas of blend images, with mean, sigma and 2
sigma limits. Blend image: A) BN5039; B) BN5040; C) BN5045; D) BN5064; E)
BN5065; F) BN5067; G) BN5070; H) BN5078.
Table 5.2. PC1 unmilled crystalline material Shewhart control bar charts of blend
images, each divided into 16 image sub-areas.
Blend     Percentage unmilled    Standard deviation of      Content
          material (mean)        unmilled material (%)      uniformity(a)
BN5039    1.888                  4.064                      729.71
BN5040    1.845                  3.900                      705.43
BN5045    1.470                  2.854                      464.22
BN5064    2.487                  5.064                      676.64
BN5065    2.101                  4.263                      618.95
BN5067    2.229                  5.264                      780.90
BN5070    1.360                  1.726                      315.47
BN5078    1.608                  3.427                      700
Table 5.3. PC3 ‘drug substance’ Shewhart control bar charts of blend images, each
divided into 16 image sub-areas.
Blend     Image calculated percentage    Certificate of    Standard        Content
          of drug substance by           analysis drug     deviation of    uniformity(a)
          surface area (mean)            content           drug content
BN5039    2.659                          2.523             0.694           68.362
BN5040    2.574                          2.462             1.280           105.5
BN5045    2.654                          2.579             1.230           140.0
BN5064    2.544                          2.455             1.076           68.50
BN5065    2.637                          2.448             1.579           155.3
BN5067    2.284                          2.443             1.180           143.9
BN5070    2.478                          2.502             0.981           110.3
BN5078    2.514                          2.521             1.037           81.67
(a) ‘Content uniformity’ is the maximum absolute difference between the image sub-area
with the highest number of pixels classified as unmilled and the mean value, divided by
the mean value over all sub-areas, expressed as a percentage.
BN5065 and BN5067, were produced consecutively and were subsequently found by NIR
spectroscopy to have processed unusually (Chapters 3 and 4); they showed high and
low drug substance contents respectively by this method. These results are shown in
Figs. 5.14 and 5.15.
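The summary statistics reported in Tables 5.2 and 5.3 can be sketched as below, following the definition of ‘content uniformity’ given in the table footnote. The use of the sample standard deviation (n − 1 divisor) is an assumption, and the function name `sub_area_summary` is hypothetical.

```python
import numpy as np

def sub_area_summary(percentages):
    """Mean, standard deviation and 'content uniformity' of the sub-area percentages.

    Content uniformity = 100 * |highest sub-area value - mean| / mean,
    as defined in the footnote to Tables 5.2 and 5.3.
    Assumption: the standard deviation uses the n - 1 divisor.
    The mean must be non-zero for the content uniformity to be defined.
    """
    p = np.asarray(percentages, dtype=float)
    mean = p.mean()
    sd = p.std(ddof=1)
    content_uniformity = 100.0 * abs(p.max() - mean) / mean
    return mean, sd, content_uniformity
```

For example, a mean of 1.888% with a content uniformity of 729.71 (BN5039, Table 5.2) implies a highest sub-area value of roughly 15.7%, which agrees with the sub-area of 15% or greater noted for this blend.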
5.8 Particle Size Analysis of Unmilled Crystalline Material and Drug
Substance.
The binary images of unmilled crystalline material and of powdered drug substance
were subjected to particle size analysis. With each image, all 157 rows and 256 columns
were used. Particle-sizing was performed by scanning horizontally and vertically across
each binary image (PC1 ‘unmilled crystalline material’ and PC3 ‘powdered drug
substance’ binary images) and measuring the diameter of each particle encountered.
Hence for each particle, a number of measurements of its size were recorded (Table
5.4).
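One way of realising the horizontal and vertical scanning described above is to record the length of every run of consecutive selected pixels along each row and each column. The sketch below makes that assumption (the exact measuring rule used in this work is not stated) and takes a pixel size of 6 microns; the function names `run_lengths` and `chord_sizes` are hypothetical.

```python
import numpy as np

def run_lengths(line):
    """Lengths (in pixels) of the runs of consecutive ones in a 1-D binary array."""
    padded = np.concatenate(([0], np.asarray(line, dtype=int), [0]))
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)    # positions where a run begins
    ends = np.flatnonzero(edges == -1)     # positions just after a run ends
    return ends - starts

def chord_sizes(binary, pixel_size=6.0):
    """All horizontal and vertical run lengths of a binary image, in microns."""
    b = np.asarray(binary)
    runs = [run_lengths(row) for row in b] + [run_lengths(col) for col in b.T]
    return pixel_size * np.concatenate(runs)
```

Under this assumption each particle contributes several measurements, one for every row and every column which crosses it, and the smallest measurable size equals the pixel size.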
The data for each image were used to construct percentage frequency particle size
distributions (Fig. 5.16 and 5.17) and to estimate the mean, median and maximum
particle sizes for each image, and the standard deviation of each percentage frequency
particle size distribution. The resolution of the NIR imaging microscope restricted the
lowest measurable particle size to 6 microns.
Particles identified as crystalline material in each blend’s PC1 binary images showed
mean and median particle sizes greater than those measured for the corresponding PC3
powdered drug substance images (Table 5.4). In addition, with the exception of blend
BN5070, the maximum measured particle sizes of each blend’s PC1 unmilled
crystalline material image were much larger than with the corresponding PC3 powdered
drug substance image. Blends: BN5040; BN5045; BN5065 and BN5078 showed the
Fig. 5.14. Binary images of areas of drug substance produced from principal
component three images of blends (median filtered reflectance data). A) BN5039,
B) BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H)
BN5078.
Fig. 5.15. On-line Shewhart PC control bar charts showing percentage of drug
substance in 16 sub-areas of blend images, with mean, sigma and 2 sigma
limits. Blend image: A) BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065;
F) BN5067; G) BN5070; H) BN5078.
Table 5.4. Particle size analysis of unmilled crystalline material and powdered
drug substance from PC1 and PC3 binary images respectively.
          Crystalline material (PC1 images)              Powdered drug substance (PC3 images)
Batch     n*     Mean      d50/µm   σ/µm    Max.         n*     Mean      d50/µm   σ/µm    Max.
                 size/µm                    size/µm             size/µm                    size/µm
BN5039    707    13.26     12       9.30    60           1356   9.45      6        5.42    48
BN5040    638    14.65     12       10.98   102          1365   9.20      6        5.07    42
BN5045    443    16.19     12       15.63   120          1386   9.27      6        5.44    42
BN5064    914    13.43     12       9.49    72           1377   8.98      6        4.80    36
BN5065    683    15.14     12       11.76   108          1364   9.50      6        5.18    42
BN5067    721    15.55     12       10.86   96           1231   9.03      6        4.98    36
BN5070    528    12.41     12       8.5     48           1365   8.86      6        4.63    48
BN5078    592    13.58     12       9.88    114          1319   9.23      6        4.94    36
* n is the number of particle size measurements recorded for each image.
Fig. 5.16. Percentage frequency particle size distributions of unmilled crystalline
drug substance measured from blend PC1 binary images (spatial resolution = 6
µm): A) BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G)
BN5070 and H) BN5078.
Fig. 5.17. Percentage frequency particle size distributions of powdered drug
substance measured from blend PC3 binary images (spatial resolution = 6 µm): A)
BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G) BN5070
and H) BN5078.
largest unmilled crystalline material particle size.
5.9 Multivariate Monitoring of Blend Quality
In this Section, all 157 rows and 256 columns of each image were used.
5.9.1 Monitoring Chemical Composition of Unmilled Crystalline Material by
Residual Analysis
With these images, most pixels have been shown to fall within control limits on a given
PC, that is, most are grouped within the same multinormal distribution. Preliminary Q
residual analysis for blend BN5045, with a 7PC model, showed that most of the 40192
pixels fall within the model space, as predicted by the PC Shewhart control charts.
Unusual pixels, which did not fit the model, were found to be those with high positive or
negative score values on a PC and included pixels representing crystalline areas,
crystalline drug substance and holes in the powder surface or scratches on the barium
fluoride disc. Any of these groups of pixels should therefore be amenable to Q
residual analysis, once identified from the PC score Shewhart control charts. A PCA
model specific to the chosen group would need to be constructed, with an
appropriate rank determined by cross validation. In this work, location and chemical
characterisation of areas of large unmilled crystalline material was considered important
and was therefore monitored.
The method of detecting unmilled crystalline material on the PC1 intensity images by
Shewhart control charts was used to identify the spatial locations with highly reflective
dense crystalline material. Blend BN5045 was treated as a reference blend, from which
the Q residual model was built. This blend was used as a reference as its crystalline
areas are known to be unmilled drug substance. With the other unusual blends, the
crystalline areas do not all necessarily contain the same chemical composition, i.e. drug
substance. For these, the composition was therefore monitored by residual analysis using the Q
statistic and also by visual inspection of residual distance, Q, to model images.
Cross Validatory Estimation of The Number of Principal Components to Retain
A PCA model was built from blend BN5045 using pixels which had intensity values
above the PC Shewhart 99% control limit (n = 196 pixels). The crystalline material on
this image is known from certificate of analysis data and from examination of both PC1
and PC3 images to be drug substance crystals. Cross validation using 14 subgroups each
with 14 pixels was performed. The W, R and percentage sum of squares (%SS) statistics
were examined. The rank of the data set was determined to be 7 PCs (R = 1.03, %SS =
88.66, Table 5.5).
Residual Distance, Q, to Model Images
Residual analysis of PCA models was examined after extraction of 3 components,
which account for intensity and drug substance content, and after extraction of 7
components, the model rank determined by cross validation for pixels with PC1 score
values above the 99% control limit (Fig. 5.18). For calculation of the residual distance to
model images (Geladi et al, 1996), PCA models were constructed from the entire image.
The residual distance, $Q_A$, of each pixel from the 3PC and from the 7PC models was
calculated as:
\[ Q_{ijA} = \sum_{k=1}^{K} e_{ijkA}^{2} \tag{5.8.1} \]
where $\mathbf{e}_{ijA}$ is a vector of size (K-by-1) with indices i and j, extracted from $\mathbf{E}_A$, $d_{ijA}$ is a
pixel with indices i and j for the PCA models with rank A, i = 1, ..., I is the column
Table 5.5. PCA results for modelling unmilled crystalline drug substance in blend
BN5045 (n = 196 pixels) (pixels selected from PC3 Shewhart control chart with
99% control limit).
Principal component    W statistic    R statistic    Cumulative %SS*
0                      -
1                      14.25          0.61           40.83
2                      8.89           0.72           59.05
3                      10.21          0.70           73.07
4                      9.34           0.72           81.65
5                      3.08           0.91           84.64
6                      2.70           0.94           86.97
7                      1.29           1.03           88.66
* Cumulative %SS is the cumulative percentage sum of squares of the original data set
explained by the model with n PCs.
Fig. 5.18. Q Residual control chart for PCA model of unmilled crystalline material
identified from PC Shewhart control chart of blend BN5045 with a 99%
confidence level (n = 196 observations, 7PCs) (95% and 99% limits for Q shown).
index in the image, j = 1, ..., J is the row index in the image, k = 1, ..., K is the variable
index for the multivariate residual image from MPCA, $\mathbf{E}_A$ (Section 3.6.3, equation
(3.6.1)), and A is the effective rank of the model considered (3 or 7PCs). The $Q_{ijA}$ values form
an intensity image $Q_A$ which shows distance, location and clustering (Geladi et al,
1996). Regions which fit the model well will show small distances; regions which do
not fit the model will show up as large distances. After 3PCs were extracted, most of the
crystalline areas fit the model, although one large crystal is still observed in the upper right
region (Fig. 5.19A). After 7PCs, however, little structure remains in the residual image
$\mathbf{E}_A$, with only a few pixels showing high distance (Fig. 5.19B). The 7PC model
therefore adequately describes most systematic variation in the image.
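The calculation of a residual distance to model image can be sketched as follows. This is an illustrative sketch under stated assumptions: the image cube is unfolded to a pixels-by-wavelengths matrix, the data are mean-centred (corresponding to the variance-covariance kernel), and the loadings are obtained by singular value decomposition rather than by the kernel algorithm used in this work. The function name `residual_distance_image` is hypothetical.

```python
import numpy as np

def residual_distance_image(cube, n_components):
    """Image of Q (sum of squared residuals over wavelengths) after A components."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, K = cube.shape
    X = cube.reshape(rows * cols, K)           # unfold: one row per pixel
    Xc = X - X.mean(axis=0)                    # assumption: mean-centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                    # loadings, K-by-A
    E = Xc - (Xc @ P) @ P.T                    # residual matrix after A components
    Q = np.sum(E ** 2, axis=1)                 # one Q value per pixel
    return Q.reshape(rows, cols)
```

Displaying the returned array as an intensity image shows where in the sample the model with A components fits poorly; comparing the images for A = 3 and A = 7 corresponds to the comparison made in Fig. 5.19.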
Q Residual Control Charts For Monitoring Crystalline Material
Using blend BN5045, a 7PC model was constructed from pixels above their PC1
Shewhart 99% control limits (n = 196 pixels) (Fig. 5.18). This limit was used instead of
the lower 95% limit to reduce Type I errors (random). Using these pixels, control limits
for Q were estimated (Section 3.7.5, equation (3.7.16)).
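Equation (3.7.16) is not reproduced in this Section. As an illustration only, the sketch below assumes the widely used approximation of Jackson and Mudholkar for the upper control limit of Q, calculated from the eigenvalues of the components not retained in the model; this may differ in detail from the equation used in this work. The function name `q_limit` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def q_limit(eigenvalues, n_components, confidence=0.95):
    """Approximate upper control limit for Q (Jackson and Mudholkar approximation).

    eigenvalues: all eigenvalues of the covariance matrix, largest first.
    n_components: number of components retained in the model (A).
    Assumption: at least one eigenvalue is left in the residual space.
    """
    lam = np.asarray(eigenvalues, dtype=float)[n_components:]
    theta1, theta2, theta3 = (np.sum(lam ** i) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c = NormalDist().inv_cdf(confidence)       # standard normal deviate
    term = (c * np.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0 + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)
```

Pixels whose Q value exceeds the limit returned for a confidence of 0.95 or 0.99 would be flagged on a Q residual control chart of the kind shown in Fig. 5.18.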
The 95% and 99% control limits for Q were 3.4644 × 10⁻³ and 4.5704 × 10⁻³
respectively. The type I errors were found to be satisfactory, with one pixel out of 196
exceeding the 99% control limit and 10 pixels exceeding the 95% control limit. Type I
errors were also estimated for this model using pixels identified as crystalline on the PC1
Shewhart chart with a 95% control limit. Out of 598 pixels above the 95% PC1
Shewhart control limit, only 19 had Q values which exceeded the Q 95% control limit
(3.18%). Future monitoring of crystalline material for the other blends used this model.
The pixels which were monitored for each blend were those which exceeded the 95%
PC1 Shewhart control limits for those images (Table 5.6). Blends BN5040, BN5064,
BN5067, BN5070 and BN5078 all showed greater than 1% of their crystalline pixels to exceed the
Fig. 5.19. Residual distance, d, to model image for blend BN5045 after principal
components analysis. Principal components extracted: A) 3; B) 7.
Table 5.6. Q Residual analysis of blend pixels classed as unmilled crystalline
material on their respective PC1 control charts (95% limits). Model built from
blend BN5045 pixels above 99% control limit on PC1 (n = 7PCs).
Batch     Pixels above 95% control    Q > Q 95% control    Q > Q 99% control
number    limit on PC1                limit (%)            limit (%)
5039      781                         5.2                  1.4
5040      779                         7.8                  3.9
5045      598                         3.2                  0.3
5064      1023                        6.5                  1.5
5065      862                         3.0                  0.46
5067      934                         9.4                  4.8
5070      546                         7.7                  3.3
5078      670                         11.5                 4.3
99% Q limit (Table 5.6). Their control charts showed that those pixels which exceeded
the 99% limit for Q were not randomly distributed, but were located close together spatially
(Fig. 5.20). Hence these pixels are unusual and, although they are crystalline, they may
not be drug substance. Further investigation of these pixels could involve projecting
them back into the image space, to identify their spatial location, and in addition the
spectra of these pixels could be examined. However, identification of their composition
may require imaging over the full NIR spectral wavelength range.
5.9.2 Multivariate Monitoring of Blend Quality Using Hotelling’s Control
Ellipses
The monitoring of blend quality with individual PC score Shewhart control charts was
compared with a more sophisticated approach: Hotelling’s T² control ellipses of PC
scores. This method simultaneously monitors all blend quality variables with a specific
overall type I error. The PC score Shewhart control charts used each had a type I error
of 5%; overall this equates to a type I error of 14.26% for a 3PC model.
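The overall type I error quoted above follows if the three PC score charts are treated as independent (the PC scores are uncorrelated), each with an individual type I error of 0.05. As a worked check under that assumption:

```latex
\[
\alpha_{\text{overall}} = 1 - (1 - 0.05)^{3} = 1 - 0.857375 = 0.142625 \approx 14.26\%
\]
```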
With this method, the first three principal components scores were monitored with 95%
and 99% control limits. With the PCA models, Hotelling’s T² was calculated according
to equation (1.8.4). The upper control limit, UCL, for this statistic was calculated
according to equation (1.8.5). The value of α for the 100α% critical point of the F
distribution was set to 0.95 (warning level) and 0.99 (control level).
With a PCA model derived from mean-centred data, Hotelling’s T² may also be
calculated as (Kourti and MacGregor, 1996):
\[ T^{2} = \sum_{a=1}^{n} \frac{t_a^{2}}{\lambda_a} = \sum_{a=1}^{n} \frac{t_a^{2}}{s_{t_a}^{2}} \tag{5.8.2} \]
where $\lambda_a$, a = 1, 2, ..., n, are the eigenvalues of the covariance matrix, S, and $t_a$ are the
Fig. 5.20. Q Statistic control charts for blends using local PCA model derived from
unmilled crystalline drug substance pixels (99% control limit on PC3 Shewhart
chart) of blend BN5045 (95% and 99% Q control limits). Blend: A) BN5039; B)
BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G) BN5070; H) BN5078.
scores from the principal component transformation, $s_{t_a}^{2}$ is the variance of $t_a$ (the
variances of the principal components are the eigenvalues of S).
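The calculation of Hotelling's T² from PC scores can be sketched as follows. This is a minimal sketch assuming that the scores are held as a pixels-by-components NumPy array and that the score variances are estimated from the same pixels with an n − 1 divisor; the function name `hotelling_t2` is hypothetical.

```python
import numpy as np

def hotelling_t2(scores):
    """Hotelling's T-squared and normalised scores for each row of a scores matrix.

    scores: array of shape (pixels, components).
    Each score is divided by the standard deviation of its component; the sum of
    squares of these normalised scores over the components is T-squared.
    """
    t = np.asarray(scores, dtype=float)
    t = t - t.mean(axis=0)                      # scores of mean-centred data have zero mean
    variances = t.var(axis=0, ddof=1)           # equal to the eigenvalues for PCA scores
    normalised = t / np.sqrt(variances)
    return np.sum(normalised ** 2, axis=1), normalised
```

The normalised scores retain their sign, so they indicate which component, and in which direction, is responsible for a high T² value.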
Interpretation of The Principal Components Used For Control
Prior interpretation of the PC loadings and of the PC score images has shown that the
first three components represent physical and chemical information. The first
component represents intensity, and is therefore useful for monitoring physical
characteristics, such as the presence of large crystalline particles. The second
component is a contrast of the first. The third component represents chemical
information and allows identification of areas with high concentration of drug
substance. The first and third PCs were considered most important for monitoring and
are shown in Fig. 5.21. This figure shows the Hotelling’s T² 95% and 99% control
ellipses and the 95% and 99% PC Shewhart control limits. This is an informative plot
which allows pixels with high values of Hotelling’s T² to be classified as either
crystalline (high positive PC1 score value), drug substance (high positive PC3 score
value) or both (high positive score values on PC1 and PC3) or as a surface scratch on
the barium fluoride disc of the sampling accessory (high negative PC1 score
value). A simple computer program was written to assign pixels with Hotelling’s T²
above the 95% limit (UCL = 8.99, α = 0.95) to these classes. This involved monitoring normalised
scores. The sum of squares of these values corresponds to Hotelling’s T², and
monitoring of the normalised values for a pixel provides directional information.
Similar to the PC Shewhart control charts, a 95% control limit based on the t-distribution
was used. Hence any normalised score value exceeding a value of 1.96 was
deemed to represent a high value on the PC. An example of this is shown as bar charts
for two pixels in Fig. 5.22. The two pixels which had the highest score values on PC1
and PC3 are shown with their normalised score values. The first pixel was classified as
259
0.6
Unmilled crystalline materilal I Unmilled drug su b stan ce
I I
I I •
0.4
!.. ••• . .
0.2
Milled drug su b stan ce
O
^ - 0.2
-0 .4
Barium fluoride disip

surface scratch
- 0.6
- 0.8 _i_L
-0.15 - 0.1 -0.05 0 0.05 0.1 0.15 0.2
PC3 S cores
Fig. 5.21. Hotelling’s 7^ control ellipses (95% and 99%) and PC Shewhart control
limits (95% and 99%, dotted line and dotted-dashed line respectively) for principal
components 1 and 3 of blend BN5045 showing regions of: unmilled crystalline
material and crystalline drug substance; drug substance and surface scratches on
the barium fluoride (BaFi) disc of the sampling accessory window.
260
1 2 3 1 2 3
P rin cip al c o m p o n e n t P rin cip al c o m p o n e n t
« 0.3
o 0 .2
1 2 3 1 2 3
P rin cip al c o m p o n e n t P rin cip al c o m p o n e n t
Fig. 5.22. Bar charts of principal components (PC) scores and normalised PC
scores of pixels from blend BN5045 PC1 to 3 images (T²_{3, 40192-3, 0.95} = 8.99,
T²_{3, 40192-3, 0.99} = 13.82). Pixel 34159 (drug substance), T² = 91.50: A) PC scores & B)
normalised PC scores. Pixel 34992 (unmilled crystalline material), T² = 33.45: C)
PC scores & D) normalised PC scores.
both drug substance and crystalline (Fig. 5.22B); the second pixel was classified as
crystalline (Fig. 5.22D). This agrees with the PC1 versus PC3 plot (Fig. 5.21).
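The pixel assignment described above may be sketched in Python as follows. This is a minimal illustration, not the program written for this work: it assumes that a normalised score is the PC score divided by the square root of the corresponding eigenvalue (so that the squared normalised scores sum to Hotelling's T²), and the function name, class labels and example scores are hypothetical.

```python
import math

T2_LIMIT = 8.99   # 95% Hotelling's T2 limit quoted in the text
Z_LIMIT = 1.96    # 95% limit applied to each normalised score


def classify_pixel(scores, eigenvalues, t2_limit=T2_LIMIT, z_limit=Z_LIMIT):
    """Return (T2, normalised scores, class labels) for one pixel.

    scores      -- PC1 to PC3 scores of the pixel
    eigenvalues -- variances of PC1 to PC3 in the reference model (assumed scaling)
    """
    normalised = [t / math.sqrt(lam) for t, lam in zip(scores, eigenvalues)]
    t2 = sum(z * z for z in normalised)
    labels = set()
    if t2 > t2_limit:  # only pixels outside the control limit are assigned
        if normalised[0] > z_limit:
            labels.add("crystalline")      # high positive PC1
        if normalised[0] < -z_limit:
            labels.add("surface scratch")  # high negative PC1
        if normalised[2] > z_limit:
            labels.add("drug substance")   # high positive PC3
    return t2, normalised, labels


# Hypothetical pixel that is high on both PC1 and PC3.
t2, z, labels = classify_pixel([3.0, 0.0, 4.0], [1.0, 1.0, 4.0])
```

For the hypothetical pixel the normalised scores are 3, 0 and 2, giving T² = 13, so the pixel lies outside the 95% limit and is assigned to both the crystalline and the drug substance classes.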
‘On-line’ Type Imaging of Multivariate Statistical Control Blend Results
As an example, the results of the multivariate control approach for blend BN5045 were
represented as an image. A colour scheme was chosen which would be suitable for an
engineer to monitor. Areas which were identified as crystalline were given the colour
red (denoting a warning colour). Areas which were identified as drug substance were
given the colour green (denoting an acceptable colour). The background colour was
blue. This is shown in Fig. 5.23. Areas which were classified as both crystalline and as
drug substance were shown as green. The advantage with this multivariate approach is
that now both areas with dense unmilled particles and drug substance may be shown on
the same graph. In this image, both unmilled crystals and drug substance occur together
in agreement with previous findings. In total 715 pixels were identified as unmilled
crystals, 850 were identified as active, and 747 as surface scratches. Further analysis of
these results showed that only 332 out of the 715 crystalline pixels were identified
solely as crystalline material, the rest were also identified as drug substance. With the
surface scratch pixels, only 399 were solely identified as being that. Therefore, with this
image, 2.97% (n = 850 + 332 pixels) may be considered to represent drug substance in
total, with 0.84% being unmilled large crystals. This result is not inconsistent with the
certificate of analysis results. These results may also be subjected to an ‘on-line’ type of
analysis in the same manner as with the binary images from the PC Shewhart control
charts. The identification of areas with unmilled crystals and drug substance should be
as straightforward.
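The colour scheme described above can be sketched as a simple lookup, with drug substance taking precedence so that pixels in both classes appear green. The sketch is illustrative only: the function names and the small label grid are hypothetical, and showing surface scratch pixels in the background colour is an assumption, since the text does not state how they were drawn.

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)


def pixel_colour(labels):
    """Map the set of class labels of one pixel to an RGB colour."""
    if "drug substance" in labels:
        return GREEN   # also covers pixels classified as both classes
    if "crystalline" in labels:
        return RED
    return BLUE        # background (scratch pixels included, by assumption)


def render(label_grid):
    """Convert a 2-D grid of label sets into a 2-D grid of RGB tuples."""
    return [[pixel_colour(labels) for labels in row] for row in label_grid]


# Hypothetical 2 x 2 image.
grid = [
    [set(), {"crystalline"}],
    [{"crystalline", "drug substance"}, {"surface scratch"}],
]
image = render(grid)
```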
Fig. 5.23. Image of blend BN5045 showing areas identified as drug substance
(green, n = 850 out of 40192 pixels) and crystalline material (red, n = 715 out of
40192 pixels) from Hotelling’s T² control ellipse (T² 95% control limit = 8.99) of
principal components 1 to 3 scores. Pixels classified as both drug substance and
crystalline are shown as green.
Future Monitoring of Blend Images Using The Multivariate Model
The multivariate control model developed from blend BN5045 was then tested for its
ability to detect both crystalline regions and drug substance, using an
image of pure drug substance. In Fig. 5.24, the first PC images of drug substance and
blend BN5045 are shown. From these, it is readily apparent that the image of drug
substance is darker. This is consistent with spectroscopic analysis, as drug
substance absorbs more strongly over this spectral region than the remaining bulk
components of the blend. In addition, a considerable number of dark regions are
identifiable on the drug substance image. These probably represent holes in the powder
surface. The sample of drug substance imaged was a very fine, micronised and porous
sample. In both images, crystalline areas are visible as bright yellow spots.
The multivariate monitoring procedure was the same as for blend BN5045. First, the
unfolded image array of drug substance was centred and projected onto the eigenvectors
from the 3PC model of blend BN5045. The normalised scores were then monitored and
assigned as for BN5045. In this case, a considerable number of pixels were found to
have high negative score values on PC1 (n = 10660 out of 40192 pixels). These were
projected back into the image score space (Fig. 5.25) and appear to overlap with the
dark areas observed in the first PC image (Fig. 5.24). These were considered to
represent holes in the powder surface. This was confirmed by a pixel density map of
score intensities on PCs 1 and 3 for the drug substance and blend BN5045 combined
image. This showed that many of the drug substance pixels had lower intensities on the
first PC (Fig. 5.26); this PC had a bimodal distribution of pixel intensities (Fig. 5.27).
Further analysis of the scores revealed that of these 10660 pixels, 4334 pixels were not
assigned to another class. The number of crystalline pixels was found to be 391, with
153 of these classified solely as crystalline. The number of pixels found to be drug
Fig. 5.24. Principal component 1 image of drug substance (upper half) and blend
BN5045 (lower half) (PCA model calculated for blend BN5045) showing an overall
darker and more porous image for the drug substance.
Fig. 5.25. ‘On-line monitoring of powder porosity’ image for micronised drug
substance powder produced from 3PC model derived from blend BN5045 with
Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as holes in
the powder surface from high normalised PC2 scores (red, n = 10660 pixels out of
40192).
[Fig. 5.26 plot labels: powdered drug substance; blend BN5045. x-axis: PC3 pixel intensity.]
Fig. 5.26. Pixel density image of principal components 1 and 3 scores for blend
BN5045 and powdered drug substance (PCA model calculated from blend
BN5045) showing separation in intensity along PC1.
[Fig. 5.27 panels: frequency histograms with x-axes "PC1 pixel intensity" and "PC3 pixel intensity".]
Fig. 5.27. Pixel intensity frequency histograms of principal components (PC) 1 and
3 scores for blend BN5045 and powdered drug substance (PCA model calculated
from blend BN5045). PC1 scores showed bimodal distribution of pixel intensity
frequencies (0 to 255 intensities): A) PC1 (mode intensities: 72, n = 757 pixels; 135,
n = 1423 pixels) and B) PC3 (mode intensity: 123, n = 4067 pixels).
substance was 22241. An image showing crystalline and drug substance regions of the
powdered drug substance is shown in Fig. 5.28. By correcting the total number of pixels
for those believed to represent holes in the powder surface, the drug substance content
of this image was 62.03% by surface area. This value falls short of 100%, possibly because
of the signal-to-noise ratio of the image: spectra (pixels) of very fine particles are likely to
have a lower signal-to-noise ratio and might be classified as holes in the powder surface
rather than as drug substance. Visually, however, the image shows
uniform coverage of the image area with drug substance.
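The projection step used in this section (centring the unfolded image array and projecting it onto the eigenvectors of the 3PC model) may be sketched as below. This is an illustration under stated assumptions: the text does not say whether the new image was centred with the mean of blend BN5045 or with its own mean, so the centring vector is passed in explicitly, and the function name and toy numbers are hypothetical.

```python
def project_pixels(pixels, centre, loadings):
    """Project unfolded image pixels onto the loadings of an existing PCA model.

    pixels   -- list of pixel spectra (one list of intensities per pixel)
    centre   -- centring vector (which mean to use is an assumption)
    loadings -- list of eigenvectors, one per retained PC
    """
    scores = []
    for spectrum in pixels:
        centred = [x - c for x, c in zip(spectrum, centre)]
        scores.append(
            [sum(x * p for x, p in zip(centred, loading)) for loading in loadings]
        )
    return scores


# Toy example with two wavelengths and two PCs (identity loadings).
new_scores = project_pixels(
    pixels=[[1.0, 2.0], [3.0, 4.0]],
    centre=[2.0, 3.0],
    loadings=[[1.0, 0.0], [0.0, 1.0]],
)
```

The resulting scores can then be normalised and compared with the control limits in the same way as the scores of the reference blend.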
The calculated active content value of 62.03% by surface area is less than the content by
mass. This may be due to the porous nature of the powder. Importantly, though, this
result confirms the ability of the model to identify a greater abundance of drug
substance.
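The surface area percentages quoted in this section can be reproduced with the short calculation below. The denominators are not stated in the text; excluding only the pixels classified solely as surface scratches (399, blend BN5045) or solely as holes (4334, drug substance image) is an assumption, adopted because it returns the quoted 2.97% and 62.03%. The function name is hypothetical.

```python
TOTAL_PIXELS = 40192  # pixels per image, from the text


def percent_area(class_pixels, excluded_pixels, total_pixels=TOTAL_PIXELS):
    """Percentage of the non-excluded image area occupied by a class."""
    return 100.0 * class_pixels / (total_pixels - excluded_pixels)


# Blend BN5045: 850 + 332 pixels; the 399 scratch-only pixels are excluded (assumed).
blend_drug = percent_area(850 + 332, excluded_pixels=399)

# Drug substance image: 22241 pixels; the 4334 hole-only pixels are excluded (assumed).
pure_drug = percent_area(22241, excluded_pixels=4334)

print(f"{blend_drug:.2f}% {pure_drug:.2f}%")  # 2.97% 62.03%
```

With the same denominator, the 332 solely crystalline pixels of the blend come to about 0.83%, close to the 0.84% quoted earlier.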
5.10 Alternative Approaches to Monitoring Blend Quality
Although not tested here, the specificity of the model towards the drug substance could
be investigated by projection of an image of a chemically different powder into the
model space, followed by similar multivariate monitoring. In this study, testing the
model’s specificity to this drug substance was not necessary since all imaged blends
have been identified and qualified by NIR spectroscopy (Chapter 3). However, other
powdered materials which absorb in this same narrow NIR region might be incorrectly
classified as containing this drug substance if the same MPCA model of blend BN5045
were to be used for monitoring. In this event, it is likely that either identification and
qualification of the tested material by NIR spectroscopy, or imaging over a wider NIR
wavelength range would be required to prevent this.
In addition, model specificity might be determined by Q residual analysis of images of
other powders, using the PCA model of blend BN5045.
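A Q residual of this kind may be sketched as the squared distance between a centred pixel spectrum and its reconstruction from the retained PCs. The sketch assumes orthonormal loadings and uses hypothetical names and toy numbers; it does not include a control limit for Q, which would have to be derived separately.

```python
def q_residual(spectrum, centre, loadings):
    """Sum of squared residuals of one spectrum after projection onto the model.

    Assumes the loadings (eigenvectors) are orthonormal.
    """
    centred = [x - c for x, c in zip(spectrum, centre)]
    scores = [sum(x * p for x, p in zip(centred, loading)) for loading in loadings]
    reconstructed = [
        sum(t * loading[j] for t, loading in zip(scores, loadings))
        for j in range(len(centred))
    ]
    return sum((x - r) ** 2 for x, r in zip(centred, reconstructed))


# Toy model with one PC along the first of three wavelengths.
q_outside = q_residual([2.0, 3.0, 4.0], [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]])
q_inside = q_residual([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]])
```

A spectrum lying within the model space gives a Q of zero, whereas a chemically different powder would be expected to leave a large residual.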
Fig. 5.28. ‘On-line type monitoring of drug substance content’ image of micronised
drug substance powder produced from 3PC model derived from blend BN5045
with Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as
drug substance from high normalised PC3 scores (green, n = 22241 out of 40192
pixels) and crystalline material from high normalised PC1 scores (red, n = 391 out
of 40192 pixels). (Dark areas corresponding to holes in powder surface and surface
scratches on barium fluoride sampling accessory window not shown, n = 10660 out
of 40192 pixels). In total, 35412 pixels were outside the 95% control ellipse.
Another method of classification that could be tested is k-nearest neighbour. This
method could be used to classify pixels as crystalline, drug substance or
barium fluoride disc surface scratches. This would require a multivariate model with
one or all of these classes of pixels, e.g. blend BN5045. The classification results by this
method might be as good as those based on Hotelling’s T² and normalised PC
scores.
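As this approach was not tested here, the following is only a sketch of how k-nearest neighbour classification might be applied to pixels in the PC score space. The training scores, the choice of k = 3 and the use of Euclidean distance are all illustrative assumptions.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Assign the majority class among the k training pixels nearest to query.

    training -- list of (score_vector, class_label) pairs
    """
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (PC1, PC3) normalised scores of already classified pixels.
training = [
    ((3.0, 0.1), "crystalline"),
    ((2.8, -0.2), "crystalline"),
    ((0.2, 3.1), "drug substance"),
    ((-0.1, 2.9), "drug substance"),
    ((-3.0, 0.0), "surface scratch"),
    ((-2.7, 0.3), "surface scratch"),
]
label = knn_classify((2.5, 0.4), training, k=3)
```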
5.11 Conclusion
Multivariate image analysis of NIR multispectral images may be used to monitor the
quality of blends. The method allows characterisation of physical and chemical
parameters such as drug substance content, dense crystalline material content, blend
uniformity, particle size distribution and powder porosity. Monitoring of future blends
requires construction of a multivariate model. The blend used for this purpose should
contain all of the features which are to be monitored in the future.
CHAPTER 6
Conclusion
The results of this study clearly demonstrate the ability of near infrared spectroscopy
and imaging microscopy to allow multivariate statistical quality control of an entire
pharmaceutical manufacturing process. This includes identification and qualification of
powdered pharmaceutical raw materials through to statistical process control of a tablet
manufacturing process at both the blending and tabletting stages.
Each of the process stages requires the construction of a multivariate model of NIR
measurements of high quality process intermediates. These measurements should
include diffuse reflectance measurements of blends and tablets and transmission
measurements of tablets (in Chapters 3 and 4, these raw data were transformed to
apparent absorbance). These data should be transformed to either SNV-DT or Sg2d11
data. For each process stage, the multivariate model derived from either of these data
requires validation by statistical means and by comparison of model validation results
with reference analytical data. The initial implementation of this MSPC method for
control of a pharmaceutical tabletting process will therefore require reference analysis
data. However, as the multivariate models derived from NIR measurements and from
both NIR measurements and reference analytical data were equally effective in
monitoring process operating performance, subsequent process monitoring could be
achieved solely with NIR measurements.
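For illustration, the standard normal variate (SNV) part of the SNV-DT transformation may be sketched as below: each spectrum is centred on its own mean and scaled by its own standard deviation. The use of the sample standard deviation (n - 1 divisor) is an assumption, and the detrending step and the Savitzky-Golay second derivative are not shown.

```python
import statistics


def snv(spectrum):
    """Standard normal variate transform of a single spectrum."""
    mean = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)  # sample standard deviation (assumed)
    return [(x - mean) / sd for x in spectrum]


# Toy three-point "spectrum".
transformed = snv([1.0, 2.0, 3.0])
```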
The NIR method has several advantages over the reference analytical tests. These relate
to both the analysis and the analytical results obtained. The NIR analysis is considerably
faster and allows non-destructive analysis of the process material’s matrix. The
multivariate analysis of these measurements allows the process performance at all
process stages to be monitored over time. Significant deviations in process performance,
that lead to lower quality product, are readily identified and diagnosed throughout the
process. Importantly, deviations in process performance which affect the blending stage
may be observed at this stage by the NIR method. These are not always identified with
the reference analyses. NIR imaging of these blends may be used for at-line quality
control of the blending stage and provides useful diagnostic information of blending
performance. This information may allow for improved control of blending to ensure
that the tablets produced are of the highest possible quality.
273
References
Advances in Near-Infrared Measurements, ed. Patonay, G., JAI Press Inc., Greenwich,
Connecticut, 1993.
Alt, F. B., in Encyclopedia o f Statistical Sciences, ed. Kotz, S., and Johnson, N. L., John
Wiley and Sons Inc., New York, 1985.
Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley and
Sons Inc., New York, 2"^ edn., 1984.
Andersson, M., Josefson, M., Langkilde, F. W., and Wahlund, K. G., J. Pharm. Biomed.
Anal, 1999,20,27.
Aucott, L. S., Garthwaite, P. H., and Buckland, S. T., Analyst, 1988,113, 1849.
Aulton, M. E., Pharmaceutics: The Science o f Dosage Form Design, Churchill
Livingstone, Edinburgh, 1988.
Barnes, I. J., Dhanoa, M. S., and Lister, S. J., Applied Spectroscopy, 1989, 43, 772.
Barth, H. G., Sun, S., and Nickol, R. M., Anal Chem., 1987, 59, 142.
Beebe, K. R., Pell, R. J., and Seasholtz, M. B., Chemometrics: A Practical Guide, John
Wiley & Sons Inc., New York, 1998.
Bharati, M. H., and MacGregor, J. F., Ind. Eng. Chem. Res., 1998, 37, 4715.
Blanco, M., Coello, J., Iturriaga, H., Maspoch, S., and Pezuela, de la, C., Analyst, 1998,
123, 135R.
Bromba, M. U. A., and Ziegler, H., Anal Chem., 1981, 53, 1583.
Bromba, M. U. A., and Ziegler, H., Anal. Chem., 1983, 55, 1299.
Bull, C. K., Analyst, 1991,116, 781.
British Pharmacopoeia 1999, HM Stationery Office, London, 1999, vol. 2, Appendix
XVII, A & B .
274
Candolfi, A., Maesschalck, de, R., Massart, D. L., Hailey, P. A., and Harrington, A. C.
E., J. Pharm. Biomed. Anal, 1999a, 19, 923.
Candolfi, A., Maesschalck, de, R., Jouan-Rimbaud, D., Hailey, P. A., and Massait, D.
L., J. Pharm. Biomed. Anal., 1999b, 21, 115.
Ciurczak, E., Torlini, P., and Demkowicz, P., Spectroscopy, 1986,1, 36.
Cowe, I. A., McNicol, J. W., and Clifford Cuthbertson, D., Analyst, 1989, 114, 683.
Dhanoa, M. S., Lister, S. J., Sanderson, R., and Barnes, R. J., J. Near Infrared
Spectrosc., 1994, 2, 43.
Dillon, W. R., and Goldstein, M., Multivariate Analysis Methods And Applications,
John Wiley & Sons Inc., New York, 1984.
Dreassi, E., Ceramelli, G., Savini, L., Corti, P., Peruccio, P. L., and Lonardi, S., Analyst,
1995a, 120,319.
Dreassi, E., Ceramelli, G., and Corti, P., Analyst, 1995b, 120, 1005.
Dreassi, E., Ceramelli, G., Corti, P., Massacesi, M., and Perruccio, P. L., Analyst,
1995c, 120, 2361.
Dubois, P., Martinez, J., and Levillain, P., Analyst, 1987,112, 1675.
Eastment, H. T., and Krzanowski, W. J., Technometrics, 1982, 24, 73.
Esbensen, K. H., Geladi, P., and Grahn, H. P., Chemometrics and Intelligent Laboratory
Systems, 1992, 14, 357.
Forbes, R. A., Persinger, M. L., and Smith, D. R., J. Pharm. Biomed. Anal., 1996,15,
315.
Frake, P., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U. A.,
Analytical Communications, 1998a, 35, 133.
Frake, P., Gill, I., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U.
A.., Analyst, \99^h, 123,2043.
Frei, R. W. and MacNeil, J. D., Diffuse Reflectance Spectroscopy in Environmental
275
Problem-Solving, CRC Press, Cleveland, Ohio, 1973.
Geladi, P., Isaksson, H., Lindqvist, L., Wold, S., and Esbensen, K., Chemometrics and
Intelligent Laboratory Systems, 1989a, 5, 209.
Geladi, P., and Esbensen, K., Journal o f Chemometrics, 1989b, 3, 419.
Geladi, P., Chemometrics and Intelligent Laboratory Systems, 1992,14, 375.
Geladi, P., Swerts, J., and Lindgren, F., Chemometrics and Intelligent Laboratory
Systems, 1994, 24, 145.
Geladi, P., and Grahn, H., Multivariate Image Analysis, John Wiley & Sons Inc., New
York, 1996.
Hailey, P. A., Oakley, A. C. E., Doherty, P., Pettman, A. J., Sharp,D. C. A., and
Barnes, D. M. H., NIR News, 1994, 5, 10.
Hailey, P. A., European Pharmaceutical Review, 1996, 1 (2), 45.
Hailey, P. A., Doherty, P., Tapsell, P., Oliver, T., and Aldridge, P.K., J. Pharm.
Biomed. Anal., 1996, 14, 551.
Handbook o f Pharmaceutical Excipients, ed. Wade, A., and Weller, P. J., American
Pharmaceutical Association, Washington, The Pharmaceutical Press, London, 2nd
edn., 1994.
Harman, H., Modem Factor Analysis, The University of Chicago Press, Chicago, 3"^^
edn., 1976.
Dari, J. L., Martens, H., and Isaksson, T., Applied Spectroscopy, 1988, 42, 722.
The Industrial Pharmacist, 1999,10, 9.
Jackson, J. E., Journal o f Quality Technology, 1980,12, 201.
Jackson, J. E., Journal o f Quality Technology, 1981a, 13, 46.
Jackson, J. E., Journal o f Quality Technology, 1981b, 13, 125.
Jackson, J. E., A User’s Guide To Principal Components, John Wiley & Sons Inc., New
York, 1991.
276
Kortum, G., Reflectance Spectroscopy, Principles, Methods, Applications, Springer-
verlag, Berlin, 1969.
Kourti, T., and MacGregor, J. F., Journal o f Quality Technology, 1996, 28, 409.
Kresta, J. V., MacGregor, J. P., and Marlin, T. E., The Canadian Journal o f Chemical
Engineering, 1991, 69, 35.
Lindberg, W., Persson, J. A., and Wold, S., Anal Chem., 1983, 55, 643.
Lowry, C. A., Woodhall, W. H., Champ, C. W., and Rigdon, S. E., Technometrics,
1992, 34, 46.
MacGregor, J. P., Jaeckle, C., Kiparissides, C., and Koutoudi, M., AIChE Journal,
1994, 40, 826.
Mark, H., Principles and Practice o f Spectroscopic Calibration, John Wiley & Sons
Inc., New York, 1991.
Maesschalck, de, R., Cuesta Sanchez, P., Massart, D. L., Doherty, P., and Hailey, P.,
Applied Spectroscopy, 1998, 52, 725.
Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y., and Kaufman, L.,
Chemometrics: a textbook, Elsevier Science Publishers B. V., Amsterdam, 1988.
Montgomery, D. C., Introduction To Statistical Quality Control, John Wiley & Sons
Inc., New York, 1997.
Morud, T. E., Journal o f Chemometrics, 1996,10, 669.
Naes, T., and Martens, H., Journal o f Chemometrics, 1988, 2, 155.
Neave, H. R., Elementary Statistics Tables, Routledge, London, 1995.
Nomikos, P., and MacGregor, J. P., AIChE Journal, 1994, 40, 1361.
Nomikos, P., and MacGregor, J. P., Technometrics, 1995, 37,41.
Norris, K. H., and Williams, P. C., Cereal Chemistry, 1984, 61, 158.
Osborne, B. G., Peam, T., Hindle, P. H., Practical NIR Spectroscopy With Applications
in Food and Beverage Analysis, Longman Scientific & Technical, UK, 2nd edn..
277
1993.
Pearce, S., Manufacturing Chemist, 1986, 57, 77.
Piovoso, M. J., Kosanovich, K. A., and Pearson, R. K., Proc. Amer. Control Conf,
1992, 2359.
Plugge, W., and Vlies, van der, C., J. Pharm. Biomed. Anal., 1993,11, 435.
Plugge, W., and Vlies, van der, C , J. Pharm. Biomed Anal., 1996,14, 891.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical
Recipes in C, The Art o f Scientific Computing, Cambridge University Press,
Cambridge, 2nd edn., 1992.
Sekulic, S. S., Wakeman, J., Doherty, P., and Hailey, P. A., J. Pharm. Biomed. A nal,
1998,17,1285.
Simmons, A., International LABMATE, 1993,17, 23.
Skagerberg, B., MacGregor, J. P., and Kiparissides, C., Chemometrics and Intelligent
Laboratory Systems, 1992,14, 341.
Stark, E., Luchter, K., and Margoshes, M., Applied Spectroscopy Reviews, 1986, 22,
335.
Statistical Process Control, ed. Mamzic, C. L., Instrument Society of America, NC,
USA, 1995.
Vlies, van der, C., Plugge, W., and Kaffka, K. J., Spectroscopy, 1995, 10, 46.
Vlies, van der, C., European Pharmaceutical Review, 1996,1(1), 49.
Wangen, L. E., and Kowalski, B. R., Journal o f Chemometrics, 1988, 3, 3.
Wargo, D. J., and Drennen, J. K., J. Pharm. Biomed. Anal., 1996,14, 1415.
Washington, C., Particle Size Analysis in Pharmaceutics and Other Industries, Ellis
Horwood, New York, 1992.
Westerhuis, J. A., and Coenegracht, P. M. J., Journal o f Chemometrics, 1997,11, 379.
Whitfield, G., Pharmaceutical Manufacturing, 1986, 3, 31.
278
Wise, B. M., Veltkamp, D. J., Ricker, N. L., Kowalski, B. R., Bames, S. M., and
Arakali, V., Waste Management Proc., Tuscon, 1991, 169.
Wold, S., Technometrics, 1978, 20, 397.
Wold, S., Geladi, P., Esbensen, K., and Ohman, J., Journal o f Chemometrics, 1987,1,
41.
279
APPENDIX A
List of Publications
O'Neil, A. J., Jee, R. D., Watt, R. A., and Moffat, A. C., J. Pharm. Pharmacol., 1997,
49 (Supplement): 19.
O'Neil, A. J., Jee, R. D. and Moffat, A. C., J. Pharm. Pharmacol., 1998a, 50
(Supplement): 45.
O'Neil, A. J., Jee, R. D. and Moffat, A. C., Analyst, 1998b, 123, 2297 - 2302.
O'Neil, A. J., Jee, R. D. and Moffat, A. C., Analyst, 1999a, 124, 33 - 36.
O'Neil, A. J., Jee, R. D. and Moffat, A. C., J. Pharm. Pharmacol., 1999b, 51
(Supplement): 47.
APPENDIX B
Tables For PCA And Multiway PCA Data Sets
Table B1. PCA results: blends.
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Raw data                       0    -            3.4034e-004  0         0       0            -
                               1    2.3765e-005  2.3300e-005  93.1539   0.0698  1.2769e+004  77
                               2    2.0786e-006  2.0070e-006  99.4103   0.0892  1.3567e+004  65
                               3    6.2922e-007  5.9890e-007  99.8240   0.3135  1.3688e+004  54
                               4    3.5143e-007  3.2341e-007  99.9050   0.5868  1.3260e+004  44
                               5    1.0715e-007  9.8904e-008  99.9709   0.3313  1.2497e+004  35
                               6    6.4774e-008  5.8307e-008  99.9829   0.6549  1.1670e+004  27
                               7    4.2406e-008  3.7018e-008  99.9891   0.7273  1.0487e+004  20
                               8    2.6310e-008  2.0742e-008  99.9939   0.7107  9.1472e+003  14
                               9*   2.1502e-008  1.3155e-008  99.9961   1.0366  7.7176e+003  9
SNV                            0    -            6.7573e-004  0         0       0            -
                               1    8.5795e-005  8.3710e-005  87.6119   0.1270  6.6698e+003  65
                               2    3.5162e-005  3.3626e-005  95.0238   0.4200  7.4692e+003  54
                               3    1.2058e-005  1.1431e-005  98.3083   0.3586  7.6622e+003  44
                               4    5.4395e-006  5.0691e-006  99.2498   0.4758  7.6271e+003  35
                               5    3.6739e-006  3.2611e-006  99.5174   0.7248  7.3151e+003  27
                               6    2.0019e-006  1.7481e-006  99.7413   0.6139  6.7325e+003  20
                               7    1.7173e-006  1.0332e-006  99.8471   0.9824  6.0815e+003  14
Detrend                        0    -            3.6857e-006  0         0       -            -
                               1    7.2327e-007  7.0743e-007  82.2622   0.1962  9.1077e+003  119
                               2    3.8726e-007  3.6801e-007  90.3347   0.5474  1.0103e+004  104
                               3    1.2486e-007  1.1867e-007  97.0801   0.3393  1.0538e+004  90
                               4    7.4748e-008  6.9656e-008  98.4290   0.6299  1.0736e+004  77
                               5    4.4516e-008  4.0115e-008  99.2071   0.6391  1.0588e+004  65
                               6    2.7663e-008  2.2799e-008  99.6229   0.6896  1.0316e+004  54
                               7    2.3569e-008  1.5094e-008  99.7491   1.0338  9.9154e+003  44
SNV Detrend                    0    -            2.9907e-004  0         0       0            -
                               1    5.1065e-005  4.9855e-005  83.3297   0.1707  6.5708e+003  77
                               2    2.5235e-005  2.3939e-005  91.9955   0.5062  7.4406e+003  65
                               3    8.3422e-006  7.8986e-006  97.3589   0.3485  7.7392e+003  54
                               4    4.2461e-006  3.9428e-006  98.6816   0.5376  7.8302e+003  44
                               5    2.4987e-006  2.2542e-006  99.2463   0.6337  7.6138e+003  35
                               6    2.0691e-006  1.4533e-006  99.5141   0.9179  7.2036e+003  27
                               7    1.6120e-006  9.3497e-007  99.6874   1.1092  6.6206e+003  20
Savitzky-Golay 2nd derivative  0    -            3.6738e-011  0         0       0            0
                               1    1.7291e-011  1.6920e-011  53.9447   0.4706  1.3112e+003  77
                               2    1.2038e-011  1.1599e-011  68.4283   0.7115  2.4743e+003  65
                               3    1.0619e-011  9.0654e-012  75.3244   0.9155  3.0039e+003  54
                               4    8.1489e-012  6.6312e-012  81.9503   0.8989  3.3147e+003  44
                               5    6.4114e-012  4.3682e-012  88.1101   0.9669  3.5459e+003  35
                               6    5.1445e-012  3.1088e-012  91.5380   1.1777  .6651e+003   27
                               7    3.4186e-012  2.1806e-012  94.0646   1.0996  3.6170e+003  20
* 8 PCs extracted after residual analysis, based on visual inspection of loadings and %SS
(= 99.993).
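As a reading aid for these tables, the short check below suggests how the %SS and R columns relate to the PRESS and SS columns. The two relations used, %SS = 100 (1 - SS_k / SS_0) and R = PRESS_k / SS_(k-1), are inferred from the tabulated numbers and are an assumption here, not a statement of the definitions used in this work; the function names are hypothetical. The values are the raw data rows of Table B1.

```python
def percent_ss(ss_k, ss_0):
    """Percentage of the initial sum of squares explained after k PCs (assumed relation)."""
    return 100.0 * (1.0 - ss_k / ss_0)


def r_ratio(press_k, ss_previous):
    """Ratio of PRESS with k PCs to the residual SS with k - 1 PCs (assumed relation)."""
    return press_k / ss_previous


# Raw data rows of Table B1: PCs 0, 1 and 2.
ss = [3.4034e-004, 2.3300e-005, 2.0070e-006]
press = [None, 2.3765e-005, 2.0786e-006]

pct_1 = percent_ss(ss[1], ss[0])   # tabulated as 93.1539
pct_2 = percent_ss(ss[2], ss[0])   # tabulated as 99.4103
r_1 = r_ratio(press[1], ss[0])     # tabulated as 0.0698
r_2 = r_ratio(press[2], ss[1])     # tabulated as 0.0892
```

Not every row group in Appendix B follows these relations exactly, so the check is best treated as a guide to the column meanings rather than as a validation of the tables.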
Table B2. PCA results: lower strength tablet RCA.
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Absorbance                     0    -            1.5141e-004  0         0       -            -
                               1    9.4706e-006  7.9859e-006  94.73     0.06    1261.32      44
                               2    1.6827e-006  1.3494e-006  99.11     0.21    1362.85      35
                               3    3.9786e-007  2.8244e-007  99.81     0.29    1361.27      27
                               4    1.1642e-007  7.5907e-008  99.95     0.41    1301.11      20
                               5    6.9585e-008  3.9056e-008  99.97     0.92    1188.24      14
                               6    3.1565e-008  1.7063e-008  99.99     0.81    1019.52      9
                               7    1.5746e-008  8.1669e-009  99.99     0.92    828.48       5
                               8    8.8043e-009  3.8448e-009  100.00    1.08    601.34       2
SNV                            0    -            1.0928e-004  0         0       -            -
                               1    4.3295e-005  3.8458e-005  64.81     0.40    663.75       44
                               2    9.6814e-006  8.3578e-006  92.35     0.25    908.64       35
                               3    4.2721e-006  3.4629e-006  96.83     0.51    996.20       27
                               4    1.8986e-006  1.4246e-006  98.70     0.55    992.58       20
                               5    1.2809e-006  8.7922e-007  99.20     0.90    941.74       14
                               6    9.0963e-007  5.8307e-007  99.47     1.03    827.79       9
Detrend                        0    -            2.2620e-006  0         0       -            -
                               1    9.2012e-007  8.0456e-007  64.43     0.41    491.17       35
                               2    2.1666e-007  1.8132e-007  91.98     0.27    700.40       27
                               3    9.6564e-008  7.8296e-008  96.54     0.53    774.70       20
                               4    3.7717e-008  2.7805e-008  98.77     0.48    765.73       14
                               5    2.9147e-008  1.8019e-008  99.20     1.05    717.45       9
                               6    1.6825e-008  1.0247e-008  99.55     0.93    596.79       5
                               7    1.0453e-008  5.9254e-009  99.74     1.02    454.07       2
SNV detrend                    0    -            9.2491e-005  0         0       -            -
                               1    2.6515e-005  2.3281e-005  74.83     0.29    745.16       44
                               2    5.1537e-006  4.4064e-006  95.24     0.22    927.63       35
                               3    2.6891e-006  1.9790e-006  97.86     0.61    987.53       27
                               4    1.2890e-006  9.3724e-007  98.99     0.65    963.08       20
                               5    8.8885e-007  5.8454e-007  99.37     0.95    899.96       14
                               6    5.4824e-007  3.3678e-007  99.64     0.94    790.16       9
                               7    3.6598e-007  2.1064e-007  99.77     1.09    658.20       5
Savitzky-Golay 2nd derivative  0    -            6.1338e-011  0         0       -            -
                               1    3.0265e-011  2.6700e-011  56.47     0.49    178.00       35
                               2    1.6247e-011  1.3391e-011  78.17     0.61    379.04       27
                               3    9.4102e-012  7.2495e-012  88.18     0.70    466.90       20
                               4    6.2523e-012  4.4111e-012  92.81     0.86    496.67       14
                               5    4.2918e-012  2.8648e-012  95.33     0.97    481.88       9
                               6    3.2307e-012  2.0362e-012  96.68     1.13    430.92       5
Table B3. PCA results: higher strength tablet RCA.
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Absorbance                     0    -            2.8673e-005  0         0       0            0
                               1    5.5138e-006  4.9724e-006  82.66     0.1923  1163.52      44
                               2    1.6202e-006  1.3475e-006  95.30     0.3258  1330.37      35
                               3    4.7295e-007  3.6299e-007  98.73     0.3510  1378.80      27
                               4    8.4391e-008  6.3968e-008  99.78     0.2325  1362.22      20
                               5    5.5398e-008  3.7337e-008  99.87     0.8660  1289.96      14
                               6    2.6475e-008  1.7377e-008  99.94     0.7091  1115.41      9
                               7    1.8597e-008  1.1116e-008  99.96     1.0702  919.66       5
SNV                            0    0            1.1126e-004  0         0       0            0
                               1    1.1126e-004  3.1988e-005  71.25     0.34    553.93       35
                               2    3.7539e-005  8.2639e-006  92.57     0.31    723.56       27
                               3    9.9305e-006  2.9176e-006  97.38     0.45    783.28       20
                               4    3.7155e-006  1.0414e-006  99.06     0.48    775.59       14
                               5    1.4108e-006  5.9575e-007  99.46     0.84    721.16       9
                               6    8.7558e-007  3.6048e-007  99.68     0.97    607.10       5
                               7    5.7592e-007  2.7077e-007  99.76     1.35    458.54       2
Detrend                        0    -            1.6017e-006  0         0       0            0
                               1    1.6017e-006  4.9061e-007  69.37     0.34    539.02       35
                               2    5.4476e-007  1.4136e-007  91.17     0.34    731.19       27
                               3    1.6899e-007  7.4979e-008  95.32     0.81    797.76       20
                               4    1.1408e-007  1.9035e-008  98.81     0.33    787.93       14
                               5    2.4661e-008  1.2179e-008  99.24     0.92    755.82       9
                               6    1.7555e-008  7.6559e-009  99.52     0.97    631.49       5
                               7    1.1757e-008  5.6942e-009  99.64     1.31    477.33       2
SNV detrend                    0    0            1.0151e-004  0         0       0            0
                               1    1.0151e-004  2.6390e-005  74.00     0.30    655.26       35
                               2    3.0373e-005  5.6355e-006  94.45     0.25    834.69       27
                               3    6.7015e-006  1.5607e-006  98.46     0.34    895.12       20
                               4    1.9060e-006  9.8005e-007  99.03     0.85    877.35       14
                               5    1.3251e-006  5.6182e-007  99.45     0.82    777.21       9
                               6    8.0055e-007  3.5296e-007  99.65     0.98    654.44       5
                               7    5.4903e-007  2.5956e-007  99.74     1.26    492.12       2
Savitzky-Golay 2nd derivative  0    0            4.0630e-011  0         0       0            0
                               1    4.0630e-011  2.2071e-011  45.68     0.62    60.96        35
                               2    2.5278e-011  1.0814e-011  73.38     0.58    301.61       27
                               3    1.2858e-011  6.5609e-012  83.85     0.77    404.08       20
                               4    8.3584e-012  3.7225e-012  90.84     0.76    442.94       14
                               5    4.9766e-012  2.5260e-012  93.78     0.98    445.39       9
                               6    3.6538e-012  1.8671e-012  95.40     1.15    401.31       5
Table B4. PCA results: lower strength tablet Intact.
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Transmission                   0    -            1.9309e-002  0         0       -            -
                               1    1.6216e-003  1.3995e-003  92.75     0.08    2513.19      54
                               2    1.1444e-004  9.0285e-005  99.53     0.08    2668.62      44
                               3    1.1508e-005  9.1151e-006  99.95     0.13    2688.09      35
                               4    2.6575e-006  1.9877e-006  99.99     0.29    2589.91      27
                               5    1.3875e-006  9.6032e-007  100.00    0.70    2388.58      20
                               6    7.8370e-007  4.8750e-007  100.00    0.82    2103.07      14
                               7    3.7168e-007  2.2196e-007  100.00    0.76    1778.46      9
                               8    1.8946e-007  1.0396e-007  100.00    0.85    1420.67      5
                               9    1.1290e-007  5.7926e-008  100.00    1.09    1016.83      2
SNV                            0    -            1.3275e-003  0         0       -            -
                               1    1.0250e-004  8.7701e-005  93.39     0.08    1369.42      44
                               2    2.1154e-005  1.6823e-005  98.73     0.24    1502.04      35
                               3    5.9883e-006  4.7138e-006  99.64     0.36    1508.00      27
                               4    2.7927e-006  2.0684e-006  99.84     0.59    1441.00      20
                               5    1.3794e-006  9.1907e-007  99.93     0.67    1311.17      14
                               6    7.6756e-007  4.6957e-007  99.96     0.84    1142.90      9
                               7    4.4474e-007  2.4801e-007  99.98     0.95    930.50       5
                               8    3.0003e-007  1.5154e-007  99.99     1.21    679.33       2
Detrend                        0    -            1.2686e-004  0         0       -            -
                               1    3.1805e-005  2.8447e-005  77.58     0.25    938.43       44
                               2    9.8271e-006  7.8311e-006  93.83     0.35    1106.23      35
                               3    3.8546e-006  2.6945e-006  97.88     0.49    1162.27      27
                               4    6.6619e-007  5.0676e-007  99.60     0.25    1154.72      20
                               5    4.1237e-007  2.5811e-007  99.80     0.81    1106.38      14
                               6    1.7928e-007  1.1244e-007  99.91     0.69    972.24       9
                               7    9.7649e-008  5.3988e-008  99.96     0.87    809.59       5
                               8    5.6415e-008  2.8888e-008  99.98     1.04    601.79       2
SNV detrend                    0    -            2.1205e-004  0         0       -            -
                               1    6.3504e-005  5.3239e-005  74.89     0.30    833.47       44
                               2    2.1988e-005  1.6236e-005  92.34     0.41    992.37       35
                               3    2.9974e-006  2.4629e-006  98.84     0.18    1052.55      27
                               4    1.9451e-006  1.3016e-006  99.39     0.79    1058.54      20
                               5    6.6538e-007  4.3415e-007  99.80     0.51    976.80       14
                               6    4.7093e-007  2.5773e-007  99.88     1.08    878.07       9
                               7    2.1661e-007  1.1992e-007  99.94     0.84    718.38       5
                               8    1.1890e-007  6.0651e-008  99.97     0.99    539.25       2
Savitzky-Golay 2nd derivative  0    -            6.3958e-009  0         0       -            -
                               1    3.3879e-009  2.8325e-009  55.71     0.53    161.72       27
                               2    1.2977e-009  1.0147e-009  84.14     0.46    354.11       20
                               3    5.8567e-010  4.5514e-010  92.88     0.58    430.16       14
                               4    4.1493e-010  2.8737e-010  95.51     0.91    439.20       9
                               5    2.6904e-010  1.7286e-010  97.30     0.94    392.77       5
                               6    1.8724e-010  1.1325e-010  98.23     1.08    318.36       2
Table B5. PCA results: higher strength tablet Intact.
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Transmission                   0    -            2.6399e-002  0         0       -            -
                               1    1.2531e-003  1.1360e-003  95.70     0.05    1581.86      35
                               2    7.9033e-005  6.6845e-005  99.75     0.07    1700.81      27
                               3    1.6135e-005  1.2157e-005  99.95     0.24    1689.27      20
                               4    5.7359e-006  3.7653e-006  99.99     0.47    1564.20      14
                               5    2.5204e-006  1.5297e-006  99.99     0.67    1364.36      9
                               6    1.3679e-006  6.6336e-007  100.00    0.89    1107.78      5
                               7    7.8812e-007  3.9265e-007  100.00    1.19    804.84       2
SNV                            0    -            8.0087e-003  0         0       -            -
                               1    5.1457e-004  3.9485e-004  95.07     0.06    985.63       35
                               2    3.7113e-004  1.2975e-004  98.38     0.94    1099.51      27
                               3    1.7685e-004  5.0638e-005  99.37     1.36    1083.37      20
                               4    1.1255e-004  2.3430e-005  99.71     2.22    1017.20      14
                               5    2.8991e-005  8.0100e-006  99.90     1.24    906.17       9
Detrend                        0    -            5.6523e-005  0         0       -            -
                               1    2.3931e-005  2.0798e-005  63.20     0.42    398.75       35
                               2    1.3644e-005  1.0400e-005  81.60     0.66    566.83       27
                               3    3.3333e-006  2.5707e-006  95.45     0.32    648.10       20
                               4    1.6904e-006  1.1839e-006  97.91     0.66    679.60       14
                               5    7.4021e-007  4.9321e-007  99.13     0.63    639.88       9
                               6    4.0631e-007  2.5389e-007  99.55     0.82    565.20       5
                               7    3.3351e-007  1.8709e-007  99.67     1.31    441.01       2
SNV detrend                    0    -            1.8997e-003  0         0       -            -
                               1    3.4320e-004  2.3451e-004  87.66     0.18    761.39       35
                               2    2.2666e-004  6.3937e-005  96.63     0.97    883.39       27
                               3    4.3565e-005  1.6059e-005  99.15     0.68    904.81       20
                               4    3.0980e-005  8.2879e-006  99.56     1.93    873.52       14
Savitzky-Golay 2nd derivative  0    -            2.5273e-007  0         0       -            -
                               1    8.6205e-009  6.4296e-009  97.46     0.03    507.65       14
                               2    6.0060e-009  4.0871e-009  98.38     0.93    601.63       9
                               3    4.3278e-009  2.7883e-009  98.90     1.06    512.83       5
Table B6. Multiway PCA results: lower strength blend (RCA) and tablet (RCA).
Data Set                       PCs  PRESS        SS           %SS       R       χ²           d.f.
Absorbance                     0    -            2.2114e-004  0         0       -            -
                               1    5.1864e-005  4.5513e-005  79.42     0.23    1258.04      54
                               2    4.5078e-006  3.8245e-006  98.27     0.10    1444.08      44
                               3    2.6324e-006  1.9818e-006  99.10     0.69    1500.09      35
                               4    1.1562e-006  8.3717e-007  99.62     0.58    1437.89      27
                               5    5.9752e-007  4.0400e-007  99.82     0.71    1352.31      20
                               6    3.3556e-007  2.0858e-007  99.91     0.83    1225.38      14
                               7    1.9792e-007  1.1063e-007  99.95     0.95    1062.42      9
SNV                            0    -            2.0617e-004  0         0       -            -
                               1    4.5335e-005  3.9636e-005  80.78     0.22    388.45       35
                               2    3.3000e-005  2.3310e-005  88.69     0.83    498.52       27
                               3    1.4935e-005  1.0887e-005  94.72     0.64    524.00       20
                               4    8.2068e-006  5.4248e-006  97.37     0.75    526.59       14
                               5    4.6967e-006  2.8593e-006  98.61     0.87    495.05       9
Detrend                        0    -            1.9936e-006  0         0       -            -
                               1    8.2742e-007  7.2626e-007  63.57     0.42    226.74       35
                               2    5.1091e-007  3.9394e-007  80.24     0.70    384.64       27
                               3    2.0096e-007  1.5679e-007  92.14     0.51    457.02       20
                               4    1.6188e-007  1.0628e-007  94.67     1.03    483.95       14
SNV detrend                    0    -            1.1895e-004  0         0       -            -
                               1    3.5454e-005  3.1488e-005  73.53     0.30    359.00       35
                               2    2.1297e-005  1.7173e-005  85.56     0.68    505.71       27
                               3    1.0739e-005  8.0632e-006  93.22     0.63    556.97       20
                               4    5.8931e-006  4.2767e-006  96.40     0.73    570.37       14
                               5    4.0154e-006  2.7064e-006  97.72     0.94    539.62       9
                               6    3.0641e-006  1.9023e-006  98.40     1.13    466.38       5
Savitzky-Golay 2nd derivative  0    -            3.8694e-011  0         0       -            -
                               1    2.1007e-011  1.8852e-011  51.28     0.54    46.82        54
                               2    1.6016e-011  1.3415e-011  65.33     0.85    253.17       44
                               3    1.4216e-011  1.0387e-011  73.16     1.06    348.46       35
Table B7. Multiway PCA results: higher strength blend (RCA) and tablet (RCA).
Data Set PCs PRESS SS %SS R x" d. f.
Absorbance 0 - 1,8636e-004 0 0 -
1 1,7958e-005 I.5929e-005 91.45 0.10 769.35 35

2 6.6317e-006 5.4135e-006 97.10 0.42 889.52 27
3 3.7125e-006 2.7465e-006 98.53 0.69 893.48 20
4 1.6192e-006 1.1086e-006 99.41 0.59 841.54 14
5 1.0191e-006 6.2059e-007 99.67 0.92 763.99 9
6 4.7642e-007 2.8372e-007 99.85 0.77 634.94 5
7 2.4213e-007 1.4028e-007 99.92 0.85 480.84 2
8 1.4030e-007 7.4985e-008 99.96 1.00 278.46 0
SNV 0 2.2307e-004 0 0
1 5.8589e-005 5.0881e-005 77.19 0.26 340.44 35
2 3.5179e-005 2.7080e-005 87.86 0.69 461.98 27
3 2.2223e-005 1.5434e-005 93.08 0.82 496.35 20
4 1.1666e-005 7.6827e-006 96.56 0.76 496.53 14
5 6.8397e-006 4.0711e-006 98.17 0.89 473.09 9
6 3.6997e-006 2.1260e-006 99.05 0.91 416.39 5
Detrend 0 1.8364e-006 0 0
1 8.1432e-007 7.2469e-007 60.54 0.44 168.12 35
2 5.0930e-007 4.1116e-007 77.61 0.70 339.23 27
3 3.9002e-007 2.6049e-007 85.82 0.95 409.77 20
4 1.9347e-007 1.3238e-007 92.79 0.74 436.65 14
5 1.1370e-007 7.3184e-008 96.01 0.86 438.48 9
6 7.0525e-008 4.2038e-008 97.71 0.96 399.16 5
SNV detrend 0 1.2769e-004 0 0

1 4.7403e-005 4.2703e-005 66.56 0.37 195.17 27
2 2.8092e-005 2.1595e-005 83.09 0.66 359.05 20
3 1.3839e-005 1.0741e-005 91.59 0.64 420.50 14
4 9.8221e-006 6.9032e-006 94.59 0.91 430.41 9
5 5.5516e-006 3.8135e-006 97.01 0.80 386.57 5
6 3.8402e-006 2.3709e-006 98.14 1.01 317.99 2
Savitzky-Golay 2nd derivative 0 - 3.0210e-011 0 0
1 2.0085e-011 1.7638e-011 41.62 0.66 -67.51 14
2 1.7120e-011 1.2664e-011 58.08 0.97 62.89 9
3 1.1656e-011 8.5891e-012 71.57 0.92 117.81 5
4 8.6818e-012 6.0954e-012 79.82 1.01 133.42 2
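The %SS and R columns in Tables B6 to B11 appear to follow directly from the SS and PRESS columns: %SS is the percentage of the zero-PC sum of squares removed by the model, and R is PRESS for the current number of PCs divided by SS for the previous number. These definitions are inferred from the tabulated values rather than stated in this appendix, so the sketch below (Python, with hypothetical function names) is only a consistency check on the absorbance rows of Table B7.

```python
def percent_ss_explained(ss_zero_pcs, ss_residual):
    """Percentage of the zero-PC sum of squares removed by the model (assumed definition of %SS)."""
    return 100.0 * (1.0 - ss_residual / ss_zero_pcs)


def press_ratio(press_current, ss_previous):
    """PRESS with k PCs divided by SS with k-1 PCs (assumed definition of R)."""
    return press_current / ss_previous


# Values transcribed from Table B7, absorbance data set (0, 1 and 2 PCs).
ss = [1.8636e-004, 1.5929e-005, 5.4135e-006]
press = [None, 1.7958e-005, 6.6317e-006]

for k in (1, 2):
    pct = percent_ss_explained(ss[0], ss[k])
    r = press_ratio(press[k], ss[k - 1])
    print(f"PCs={k}  %SS={pct:.2f}  R={r:.2f}")
```

For the first two PCs this gives 91.45 / 0.10 and 97.10 / 0.42, which agrees with the printed rows.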
Table B8. Multiway PCA results: lower strength blend (RCA) and tablet (Intact).
Raw* 0 - 6.0221e-003 0 0 -
1 5.7318e-004 5.0057e-004 91.69 0.10 1435.57 44

2 1.6859e-004 1.4586e-004 97.58 0.34 1570.16 35
3 3.6290e-005 2.7431e-005 99.54 0.25 1582.98 27
4 5.9125e-006 4.4696e-006 99.93 0.22 1544.85 20
5 3.6375e-006 2.3943e-006 99.96 0.81 1438.90 14
6 1.4189e-006 9.3101e-007 99.98 0.59 1236.52 9
7 9.0220e-007 5.3771e-007 99.99 0.97 1011.82 5
SNV 0 5.3543e-004 0 0
1 2.2612e-004 2.0357e-004 61.98 0.42 440.32 35
2 4.8841e-005 4.0050e-005 92.52 0.24 655.60 27
3 2.9911e-005 2.1992e-005 95.89 0.75 728.76 20
4 1.4351e-005 9.8170e-006 98.17 0.65 709.27 14
5 7.1781e-006 4.9250e-006 99.08 0.73 663.90 9
6 4.7181e-006 3.0184e-006 99.44 0.96 574.13 5
7 3.7015e-006 2.1273e-006 99.60 1.23 436.52 2
Detrend 0 3.8312e-005 0 0
1 8.4304e-006 7.4780e-006 80.48 0.22 403.20 27
2 4.0596e-006 3.2836e-006 91.43 0.54 530.47 20
3 2.0885e-006 1.5127e-006 96.05 0.64 559.43 14
4 9.4316e-007 6.3848e-007 98.33 0.62 544.10 9
5 3.6735e-007 2.6052e-007 99.32 0.58 492.15 5
6 2.7006e-007 1.7042e-007 99.56 1.04 397.74 2
SNV detrend 0 1.6356e-004 0 0

1 7.2459e-005 6.4047e-005 60.84 0.44 278.89 35
2 2.8052e-005 2.2855e-005 86.03 0.44 480.13 27
3 1.5287e-005 1.1520e-005 92.96 0.67 558.10 20
4 8.8599e-006 6.3751e-006 96.10 0.77 568.22 14
5 5.7952e-006 3.7922e-006 97.68 0.91 536.69 9
6 3.4829e-006 2.2624e-006 98.62 0.92 468.35 5
7 2.4475e-006 1.4597e-006 99.11 1.08 367.02 2
Savitzky-Golay 2nd derivative 0 - 1.9480e-009 0 0
1 9.5597e-010 7.7635e-010 60.15 0.49 104.79 20
2 4.2379e-010 3.2825e-010 83.15 0.55 238.79 14
3 1.8666e-010 1.4089e-010 92.77 0.57 290.09 9
4 1.2009e-010 8.6205e-011 95.57 0.85 292.33 5
5 8.5508e-011 5.6108e-011 97.12 0.99 243.16 2
* Multiway PCA model: blend absorbance data and tablet transmission data.
Table B9. Multiway PCA results: higher strength blend (RCA) and tablet (Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 5.3808e-003 0 0 -
1 5.0828e-004 4.3756e-004 91.87 0.09 1018.76 35

2 2.1086e-004 1.5104e-004 97.19 0.48 1131.24 27
3 1.9931e-005 1.4898e-005 99.72 0.13 1135.22 20
4 8.3505e-006 5.7367e-006 99.89 0.56 1103.47 14

5 5.1774e-006 2.9962e-006 99.94 0.90 975.74 9
6 1.7527e-006 1.0858e-006 99.98 0.58 799.80 5
7 1.2705e-006 6.8042e-007 99.99 1.17 600.70 2
SNV 0 2.0576e-003 0 0
1 4.0494e-004 3.3618e-004 83.66 0.20 430.34 27
2 1.4897e-004 1.0745e-004 94.78 0.44 549.78 20
3 1.0474e-004 4.6409e-005 97.74 0.97 573.35 14
Detrend 0 1.8813e-005 0 0
1 1.4680e-005 9.7381e-006 48.24 0.78 66.68 27
2 6.0791e-006 4.2724e-006 77.29 0.62 277.56 20
3 3.4738e-006 1.9104e-006 89.85 0.81 369.23 14

4 1.9776e-006 1.0109e-006 94.63 1.04 396.66 9
SNV detrend 0 6.1921e-004 0 0
1 2.0730e-004 1.7706e-004 71.40 0.33 325.11 27
2 8.1592e-005 6.2649e-005 89.88 0.46 490.92 20
3 4.5951e-005 2.8093e-005 95.46 0.73 546.41 14

4 3.2545e-005 1.8449e-005 97.02 1.16 535.77 9
Savitzky-Golay 2nd derivative 0 - 7.0949e-008 0 0 - -
1 2.4743e-009 1.8460e-009 97.40 0.03 486.50 14
2 1.7267e-009 1.1667e-009 98.36 0.94 577.98 9
3 1.3286e-009 8.0216e-010 98.87 1.14 492.77 5
* Multiway PCA model: blend absorbance data and tablet transmission data.
Table B10. Multiway PCA results: lower strength blend (RCA) and tablet (RCA and Intact).
Raw* 0 - 3.5069e-003 0 0 -
1 3.6294e-004 3.1752e-004 90.95 0.10 1523.90 54

2 1.0694e-004 9.2927e-005 97.35 0.34 1670.71 44
3 2.8177e-005 2.1658e-005 99.38 0.30 1692.88 35
4 1.0961e-005 7.9323e-006 99.77 0.51 1660.65 27
5 4.4789e-006 3.1368e-006 99.91 0.56 1559.49 20
6 3.1193e-006 1.8856e-006 99.95 0.99 1411.72 14
7 1.6664e-006 9.9394e-007 99.97 0.88 1205.20 9
8 1.0195e-006 5.8816e-007 99.98 1.03 976.16 5
SNV 0 3.6179e-004 0 0
1 1.7427e-004 1.5482e-004 57.21 0.48 235.04 35

2 5.8638e-005 4.7749e-005 86.80 0.38 442.69 27
3 3.7213e-005 2.8030e-005 92.25 0.78 518.30 20
4 2.4744e-005 1.6579e-005 95.42 0.88 519.81 14
5 1.4978e-005 9.4908e-006 97.38 0.90 491.02 9
6 8.5909e-006 5.0959e-006 98.59 0.91 434.33 5
7 5.5894e-006 3.1719e-006 99.12 1.10 347.67 2
Detrend 0 2.3182e-005 0 0
1 5.5527e-006 4.8936e-006 78.89 0.24 421.93 35

2 2.9994e-006 2.3925e-006 89.68 0.61 554.92 27
3 1.7340e-006 1.2448e-006 94.63 0.72 591.49 20
4 8.7454e-007 6.1198e-007 97.36 0.70 587.92 14

5 5.5607e-007 3.6856e-007 98.41 0.91 552.93 9
6 3.4827e-007 2.1550e-007 99.07 0.94 475.72 5
7 2.0691e-007 1.2032e-007 99.48 0.96 370.55 2
SNV detrend 0 1.2983e-004 0 0
1 6.3945e-005 5.6604e-005 56.40 0.49 144.76 35

2 3.6285e-005 2.9715e-005 77.11 0.64 332.09 27
3 2.3326e-005 1.7353e-005 86.63 0.78 413.26 20
4 1.4153e-005 9.6101e-006 92.60 0.82 443.07 14
5 8.4503e-006 5.4496e-006 95.80 0.88 439.31 9
6 5.6561e-006 3.5124e-006 97.29 1.04 399.64 5
Savitzky-Golay 2nd derivative 0 1.1640e-009 0 0
1 6.1913e-010 5.2262e-010 55.10 0.53 334.59 54
2 2.6854e-010 2.1528e-010 81.51 0.51 592.01 44
3 1.3474e-010 1.0650e-010 90.85 0.63 701.48 35
4 9.8797e-011 7.4375e-011 93.61 0.93 737.55 27
5 8.1261e-011 5.5706e-011 95.21 1.09 716.60 20
* Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B11. Multiway PCA results: higher strength blend (RCA) and tablet (RCA and Intact).
Raw* 0 - 5.1944e-003 0 0 -
1 3.0869e-004 2.7908e-004 94.63 0.06 1677.58 54

2 1.2682e-004 1.0546e-004 97.97 0.45 1820.11 44
3 2.2252e-005 1.6983e-005 99.67 0.21 1816.08 35
4 1.1619e-005 8.3142e-006 99.84 0.68 1774.34 27
5 7.8440e-006 5.1694e-006 99.90 0.94 1633.56 20
6 5.8631e-006 3.4698e-006 99.93 1.13 1445.42 14
SNV 0 1.4184e-003 0 0
1 2.6442e-004 2.3269e-004 83.60 0.19 822.42 54

2 1.1613e-004 9.4948e-005 93.31 0.50 992.02 44
3 1.0342e-004 6.3332e-005 95.54 1.09 1034.64 35
Detrend 0 1.1324e-005 0 0
1 9.2578e-006 6.2702e-006 44.63 0.82 68.84 35

2 4.2032e-006 2.9978e-006 73.53 0.67 322.69 27
3 2.3736e-006 1.4636e-006 87.07 0.79 437.81 20
4 1.7556e-006 8.9420e-007 92.10 1.20 481.62 14
5 9.1283e-007 4.9293e-007 95.65 1.02 471.69 9
SNV detrend 0 3.9001e-004 0 0
1 1.5753e-004 1.3564e-004 65.22 0.40 290.04 35

2 7.1520e-005 5.7048e-005 85.37 0.53 481.10 27
3 5.1308e-005 3.2879e-005 91.57 0.90 553.15 20
4 3.7524e-005 1.9292e-005 95.05 1.14 560.83 14
Savitzky-Golay 2nd derivative 0 4.1191e-008 0 0
1 1.3994e-009 1.1078e-009 97.31 0.03 1296.77 54
2 9.8406e-010 7.2603e-010 98.24 0.89 1423.54 44
3 7.8394e-010 5.1952e-010 98.74 1.08 1367.18 35
Table B12. Blend PCA Q statistics.
Batch Q > Q99
29 9.9989e-004
50 3.8650e-005
76 2.8328e-005
125 4.1612e-005
126 2.7180e-005
141 3.6368e-005
160 2.8360e-005
162 4.0577e-005
167 4.3216e-005
168 5.4408e-005
169 6.9286e-005
171 3.4115e-005
173 4.1010e-005
174 2.9097e-005
180 4.0547e-005
193 3.0612e-005
1.4924e-003 29 8.6033e-002
50 2.6435e-003
125 2.4948e-003
141 2.7244e-003
162 2.5285e-003
167 3.1581e-003
168 3.5827e-003
169 4.8918e-003
171 2.4006e-003
173 2.3999e-003
180 2.6290e-003
193 2.2485e-003
29 9.8422e-004
91 2.0972e-005
125 3.9374e-005
160 3.1835e-005
162 3.9545e-005
163 2.3036e-005
167 2.8634e-005
168 3.5780e-005
169 4.6057e-005
170 2.1468e-005
171 4.6062e-005
173 4.0453e-005
174 2.8311e-005
180 2.7550e-005
193 3.0926e-005
SNV Detrend 7.8427e-004 29 7.7650e-002
50 1.5451e-003
64 1.6558e-003
67 1.5464e-003
89 1.1422e-003
125 1.9475e-003
126 1.6499e-003
137 1.2814e-003
157 1.3927e-003
161 1.4698e-003
167 1.9590e-003
168 2.0532e-003
169 3.3514e-003
171 1.4774e-003
173 1.6096e-003
180 2.0019e-003
181 1.0975e-003
192 1.6587e-003
193 2.3615e-003
Savitzky-Golay 2nd derivative 1.4089e-009 1 1.9434e-009
17 1.7337e-009
20 1.7475e-009
21 2.8888e-009
22 3.1005e-009
23 2.9063e-009
28 1.9448e-009
29 8.2051e-008
38 2.0089e-009
55 2.4930e-009
56 2.5727e-009
62 2.2296e-009
64 2.0961e-009
67 2.1755e-009
68 1.9309e-009
138 2.0071e-009
167 2.2138e-009
168 2.2561e-009
169 3.1252e-009
180 2.2406e-009
181 1.7576e-009
192 2.0958e-009
193 3.8833e-009
Table B13. Q statistics for lower strength Tablet (RCA) PCA.
Data Set Q99 Q95 Batch Q > Q99
Absorbance 4.9744e-006 4.1779e-006 3 2.0823e-005
7 2.9661e-005
14 1.3612e-005
18 1.6689e-005
21 2.6919e-005
22 1.8446e-005
27 9.9993e-006
30 2.5717e-005
31 2.6398e-005
32 2.4187e-005
33 9.1554e-006
38 3.1898e-005
39 5.5400e-005
40 5.7075e-005
SNV 1.1333e-003 8.5092e-004 7 2.2698e-003

42 1.2746e-003
Detrend 9.8981e-006 7.7477e-006 7 2.5652e-005
9 1.3786e-005
11 1.7951e-005
21 2.2040e-005
31 2.0820e-005
SNV detrend 3.3199e-004 2.3178e-004 2 7.5289e-004

7 6.8202e-004
21 8.1532e-004
31 4.1489e-004
38 1.5914e-003
39 2.2810e-003
40 2.8171e-003
Savitzky-Golay 2nd derivative 2.0109e-009 1.6990e-009 11 2.8535e-009
13 3.4055e-009
20 2.4806e-009
21 3.9365e-009
41 2.7154e-009
Table B14. Q statistics for higher strength Tablet (RCA) PCA.

Data Set Q99 Q95 Batch Q > Q99
Absorbance 1.9648e-005 1.5142e-005 32 2.5085e-005
33 3.6351e-005
42 2.8011e-005
SNV 4.8838e-004 3.7428e-004 1 9.4335e-004

4 1.3209e-003
6 1.0309e-003
33 1.2482e-003
39 5.2010e-004
42 7.7018e-004
43 1.0678e-003
Detrend 8.8034e-006 7.0393e-006 4 2.4553e-005

6 2.1819e-005
33 2.5949e-005
SNV detrend 4.5336e-004 3.5054e-004 4 1.1607e-003

6 1.0696e-003
32 6.3294e-004
33 1.4386e-003
Savitzky-Golay 2nd derivative 1.8816e-009 1.5809e-009 4 7.5230e-009
20 2.6156e-009
25 5.9786e-009
Table B15. Q statistics for lower strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 5.2093e-005 3.8329e-005 12 1.7977e-004
21 1.3308e-004
31 1.0500e-004
SNV 1.1009e-004 8.5906e-005 12 5.6989e-004

13 1.5413e-004
21 3.1784e-004
27 2.1313e-004
28 2.3024e-004
29 1.4907e-004
Detrend 2.1911e-005 1.6897e-005 5 1.6097e-004

6 2.0328e-004
12 1.7702e-004
14 4.9454e-005
21 9.5759e-005
23 3.5307e-005
28 5.1909e-005
SNV detrend 3.8494e-005 3.1175e-005 5 3.1986e-004

6 4.3138e-004
12 4.5632e-004
14 1.3133e-004
20 6.3377e-005
21 2.6612e-004
23 1.0322e-004
28 1.5675e-004
30 2.0845e-004
33 7.4489e-005
Savitzky-Golay 2nd derivative 5.8641e-008 4.9724e-008 12 1.7618e-007

19 1.1108e-007
27 1.2536e-007
30 2.5247e-007
31 1.6950e-007
32 1.4492e-007
Table B16. Q statistics for higher strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 2.1780e-004 1.6979e-004 20 5.2657e-004
26 4.6967e-004
28 4.6604e-004
30 4.9237e-003
31 6.1234e-004
40 9.5833e-003
SNV 4.8715e-003 3.7027e-003 1 1.7733e-002

28 9.4336e-003
29 1.7096e-002
30 4.2784e-002
31 1.4916e-002
32 8.9986e-003
41 1.1527e-002
Detrend 9.5224e-005 7.6020e-005 14 1.2623e-004

27 1.4583e-004
29 4.3460e-004
30 5.9208e-003
38 1.3273e-004
40 1.5912e-002
41 1.4367e-003
SNV detrend 5.2424e-003 3.9417e-003 1 1.3028e-002

19 5.4406e-003
24 8.9175e-003
25 7.1171e-003
29 9.6550e-003
30 5.7097e-002
31 9.4374e-003
41 1.0339e-002
Savitzky-Golay 2nd derivative 1.7825e-006 1.3181e-006 1 2.0560e-006
2 1.8781e-006
21 2.6345e-006
29 2.0629e-006
30 1.0607e-005
Table B17. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 3.9809e-004 3.0530e-004 33 1.1911e-003
35 7.0804e-004
SNV 7.5891e-003 6.3292e-003 9 2.3266e-002

15 1.2068e-002
19 1.5666e-002
25 1.8175e-002
28 3.0518e-002
33 4.1400e-002
35 2.5518e-002
36 1.6482e-002
Detrend 2.9159e-004 2.4097e-004 10 7.6645e-004

11 3.7375e-004
19 3.1051e-004
36 4.1973e-004
SNV detrend 6.4019e-003 5.0012e-003 15 1.1284e-002

35 8.3091e-003
Savitzky-Golay 2nd derivative 3.1322e-008 2.3553e-008 - -
Table B18. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 2.7978e-004 2.1245e-004 1 9.0982e-004
2 1.2517e-003
6 8.1050e-004
8 8.4201e-004
39 4.7903e-004
SNV 7.6923e-003 5.8904e-003 13 1.8904e-002

14 1.5525e-002
18 1.6901e-002
23 2.8408e-002
24 2.7384e-002
25 1.8181e-002
33 8.1036e-003
38 1.7199e-002
39 1.5279e-002
Detrend 1.4100e-004 1.1026e-004 8 2.8793e-004

9 3.4958e-004
18 2.6217e-004
35 2.3900e-004
SNV detrend 7.7382e-003 6.0963e-003 18 1.4617e-002
Savitzky-Golay 2nd derivative 1.3829e-008 1.1246e-008 14 1.9051e-008

23 2.0146e-008
24 2.3255e-008
38 2.3272e-008
39 2.1624e-008
Table B19. Q statistics for multiway PCA: lower strength blend (RCA) and tablet (Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.5907e-003 1.1738e-003 28 2.7533e-003
SNV 5.1276e-003 4.0039e-003 30 7.5109e-003
Detrend 3.8712e-004 3.0723e-004 20 4.9826e-004
31 3.9653e-004
36 6.1691e-004
SNV detrend 3.6453e-003 2.8187e-003 18 6.7668e-003
Savitzky-Golay 2nd derivative 1.1668e-007 9.0296e-008 4 2.3788e-007
25 3.0874e-007
26 2.0481e-007
27 1.6877e-007
36 2.2012e-007
37 1.9747e-007
Table B20. Q statistics for multiway PCA: higher strength blend (RCA) and tablet (Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4708e-003 1.1593e-003 21 5.2403e-003
27 7.3431e-003
28 1.3981e-002
32 2.5276e-003
38 6.3302e-002
39 8.1246e-003
41 2.2253e-003
SNV 1.0076e-001 7.9322e-002 2 1.0352e-001
13 1.1048e-001
24 1.8528e-001
28 1.8429e-001
29 1.3560e-001
34 1.9496e-001
39 1.0656e-001
Detrend 2.3131e-003 1.7949e-003 24 7.0645e-003
27 4.5675e-003
28 4.8798e-003
SNV detrend 4.4515e-002 3.4041e-002 13 9.6856e-002
Savitzky-Golay 2nd derivative 1.7371e-006 1.2968e-006 19 2.2427e-006
27 2.0910e-006
28 1.0698e-005
Table B21. Q statistics for multiway PCA: lower strength blend (RCA) and tablet (RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 2.4705e-003 1.8427e-003 10 4.1491e-003
15 3.3556e-003
SNV 1.1685e-002 9.3937e-003 11 1.7378e-002
19 2.0498e-002
28 3.2384e-002
Detrend 5.1087e-004 3.9496e-004 20 1.0090e-003
21 8.9269e-004
36 1.5686e-003
37 1.0384e-003
SNV detrend 1.2784e-002 1.0312e-002 9 2.4090e-002
28 2.1033e-002
Savitzky-Golay 2nd derivative 2.1165e-007 1.5935e-007
* Multiway PCA model: blend absorbance and tablet transmission data.
Table B22. Q statistics for multiway PCA: higher strength blend (RCA) and tablet (RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4852e-002 1.1309e-002 - -
SNV 1.7927e-001 1.4223e-001 13 2.3560e-001
24 6.1258e-001
34 5.3285e-001
Detrend 2.0636e-003 1.5826e-003 28 5.8723e-003
SNV detrend 9.1369e-002 6.7729e-002 13 2.0121e-001
Savitzky-Golay 2nd derivative 1.9274e-006 1.4392e-006 19 2.3320e-006
* Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B23. Blend PC score Shewhart plots results.
Data Set Batch PC Score value 99% Control limit
Raw 21 5 -3.4255e-002 -3.2134e-002
23 6 -1.4635e-002 -1.3590e-002
29 5 4.6468e-002 3.3265e-002
29 8 1.3889e-002 8.6397e-003
32 4 3.9906e-002 3.5481e-002
37 8 -9.5603e-003 -8.5421e-003
50 8 -9.2291e-003 -8.5421e-003
64 7 1.0903e-002 1.0159e-002
68 7 1.0328e-002 1.0159e-002
71 6 1.8794e-002 1.4232e-002
74 4 3.7637e-002 3.5481e-002
127 4 3.9395e-002 3.5481e-002
127 7 -1.0824e-002 -1.0002e-002
192 6 1.6389e-002 1.4232e-002
SNV 21 2 -4.7707e-001 -3.4533e-001
22 2 -3.5495e-001 -3.4533e-001
24 2 -3.9561e-001 -3.4533e-001
28 3 -3.4789e-001 -2.7380e-001
29 2 4.9455e-001 3.5649e-001
29 6 -7.5455e-002 -5.6293e-002
29 7 7.3642e-002 4.2496e-002
31 3 -3.0399e-001 -2.7380e-001
33 3 -3.2266e-001 -2.7380e-001
63 5 -9.3944e-002 -9.0385e-002
64 5 -9.3676e-002 -9.0385e-002
69 6 -5.9951e-002 -5.6293e-002
74 3 3.3634e-001 2.7920e-001
110 6 5.8387e-002 5.3980e-002
127 6 7.3170e-002 5.3980e-002
192 6 -6.3662e-002 -5.6293e-002
Detrend 17 4 1.6418e-002 1.5168e-002
21 4 2.1026e-002 1.5168e-002
22 4 2.0456e-002 1.5168e-002
23 4 1.8950e-002 1.5168e-002
24 4 1.6609e-002 1.5168e-002
29 6 -1.3469e-002 -8.9690e-003
64 6 1.0136e-002 8.7876e-003
74 4 1.8273e-002 1.5168e-002
76 5 1.2618e-002 1.1761e-002
127 5 1.7068e-002 1.1761e-002
192 6 -1.0019e-002 -8.9690e-003
193 2 4.3861e-002 4.1437e-002
SNV Detrend 21 2 -4.7707e-001 -3.4533e-001
22 2 -3.5495e-001 -3.4533e-001
24 2 -3.9561e-001 -3.4533e-001
28 3 -3.4789e-001 -2.7380e-001
29 2 4.9455e-001 3.5649e-001
29 6 -7.5455e-002 -5.6293e-002
29 7 7.3642e-002 4.2496e-002
31 3 -3.0399e-001 -2.7380e-001
33 3 -3.2266e-001 -2.7380e-001
63 5 -9.3944e-002 -9.0385e-002
64 5 -9.3676e-002 -9.0385e-002
69 6 -5.9951e-002 -5.6293e-002
74 3 3.3634e-001 2.7920e-001
110 6 5.8387e-002 5.3980e-002
127 6 7.3170e-002 5.3980e-002
192 6 -6.3662e-002 -5.6293e-002
Savitzky-Golay 2nd derivative 29 5 7.5141e-005 7.2103e-005
29 6 6.3694e-005 6.0493e-005
29 7 -3.3216e-004 -7.8702e-005
125 3 -1.2585e-004 -9.7962e-005
136 5 -9.3770e-005 -7.0922e-005
137 5 -7.8075e-005 -7.0922e-005
138 5 -7.2381e-005 -7.0922e-005
152 5 -7.1721e-005 -7.0922e-005
157 6 7.8555e-005 6.0493e-005
161 6 7.5554e-005 6.0493e-005
171 3 -1.3075e-004 -9.7962e-005
173 3 -1.1868e-004 -9.7962e-005
174 4 1.2257e-004 9.5077e-005
Table B24. Blend PC score Shewhart plot results: dispersion of sub-group centred
scores.
Data Set Batch PC 99% Control limits (+/-) Observations exceeding 99% limits*
Raw data 8 1 +4.1520e-001, -4.1520e-001 5
21 1 +4.1520e-001, -4.1520e-001 3
48 1 +4.1520e-001, -4.1520e-001 3
49 1 +4.1520e-001, -4.1520e-001 5
49 +1.6328e-001, -1.6328e-001 4
53 1 +4.1520e-001, -4.1520e-001 8
53 +1.6328e-001, -1.6328e-001 3
146 1 +4.1520e-001, -4.1520e-001 5
189 1 +4.1520e-001, -4.1520e-001 4
189 +4.4435e-002, -4.4435e-002 6
192 +1.0889e-002, -1.0889e-002 4
SNV 8 1 +5.4989e-001, -5.4989e-001 4
8 2 +2.8536e-001, -2.8536e-001 3
15 1 +5.4989e-001, -5.4989e-001 3
16 1 +5.4989e-001, -5.4989e-001 4
25 1 +5.4989e-001, -5.4989e-001 3
27 1 +5.4989e-001, -5.4989e-001 3
27 2 +2.8536e-001, -2.8536e-001 3
39 1 +5.4989e-001, -5.4989e-001 5
39 2 +2.8536e-001, -2.8536e-001 3
40 1 +5.4989e-001, -5.4989e-001 3
49 1 +5.4989e-001, -5.4989e-001 5
49 2 +2.8536e-001, -2.8536e-001 4
58 1 +5.4989e-001, -5.4989e-001 6
58 2 +2.8536e-001, -2.8536e-001 3
59 1 +5.4989e-001, -5.4989e-001 6
59 2 +2.8536e-001, -2.8536e-001 3
64 1 +5.4989e-001, -5.4989e-001 6
64 2 +2.8536e-001, -2.8536e-001 4
71 +2.5687e-001, -2.5687e-001 4
146 1 +5.4989e-001, -5.4989e-001 6
154 6 +1.2447e-001, -1.2447e-001 3
186 6 +1.2447e-001, -1.2447e-001 3
189 1 +5.4989e-001, -5.4989e-001 8
189 2 +2.8536e-001, -2.8536e-001 7
192 6 +1.2447e-001, -1.2447e-001 4
Detrend 8 3 +2.1745e-002, -2.1745e-002 5
15 4 +5.6283e-003, -5.6283e-003 3
59 3 +2.1745e-002, -2.1745e-002 4
64 1 +2.9354e-002, -2.9354e-002 4
64 3 +2.1745e-002, -2.1745e-002 4
71 6 +1.2227e-002, - 3
146 1 +2.9354e-002, -2.9354e-002 8
186 5 +9.9877e-003, -9.9877e-003 3
186 7 +5.2675e-003, -5.2675e-003 3
189 1 +2.9354e-002, -2.9354e-002 6
189 3 +2.1745e-002, -2.1745e-002 7
192 6 +1.2227e-002, -1.2227e-002 4
SNV Detrend 8 1 +2.1727e-001, -2.1727e-001 3
8 3 +1.8377e-001, -1.8377e-001 3
31 7 +3.1738e-002, -3.1738e-002 3
37 4 +6.0075e-002, -6.0075e-002 5
39 1 +2.1727e-001, -2.1727e-001 4
43 1 +2.1727e-001, -2.1727e-001 3
43 4 +6.0075e-002, -6.0075e-002 4
43 7 +3.1738e-002, -3.1738e-002 5
44 4 +6.0075e-002, -6.0075e-002 4
53 7 +3.1738e-002, -3.1738e-002 6
58 1 +2.1727e-001, -2.1727e-001 4
59 1 +2.1727e-001, -2.1727e-001 3
64 1 +2.1727e-001, -2.1727e-001 6
64 3 +1.8377e-001, -1.8377e-001 5
146 1 +2.1727e-001, -2.1727e-001 9
146 4 +6.0075e-002, -6.0075e-002 3
150 5 +1.0842e-001, - 3
154 6 +7.4651e-002, -7.4651e-002 4
189 1 +2.1727e-001, -2.1727e-001 8
189 3 +1.8377e-001, -1.8377e-001 7
192 5 +1.0842e-001, -1.0842e-001 4
Savitzky-Golay 2nd derivative 21 5 +4.9628e-005, -4.9628e-005 3
53 7 +2.2707e-005, -2.2707e-005 4
64 6 +2.9123e-005, -2.9123e-005 4
142 4 +1.3328e-004, - 3
146 1 +6.5719e-005, -6.5719e-005 8
150 3 +6.4059e-005, - 3
150 4 +1.3328e-004, - 3
182 6 +2.9123e-005, -2.9123e-005 3
186 3 +6.4059e-005, -6.4059e-005 3
186 4 +1.3328e-004, -1.3328e-004 4
* Restricted to a minimum of 3 observations (equivalent to one blend sample out of three per batch). Control limits are shown as positive and negative where observations fall beyond upper and lower limits.
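The restriction described in the footnote to Table B24 amounts to counting, for each batch and PC, the sub-group centred scores that fall outside the 99% limits and reporting only counts of three or more. A minimal Python sketch of that bookkeeping follows; the function name, data layout and example scores are hypothetical, and only the PC 1 limit (4.1520e-001, raw data) is taken from the table.

```python
from collections import defaultdict


def flag_dispersion(observations, limits, min_count=3):
    """Count centred scores beyond the +/- control limit per (batch, PC).

    observations: iterable of (batch, pc, centred_score) tuples
    limits: dict mapping PC number to its symmetric 99% control limit
    Returns {(batch, pc): count} for pairs with at least min_count exceedances.
    """
    counts = defaultdict(int)
    for batch, pc, score in observations:
        if abs(score) > limits[pc]:
            counts[(batch, pc)] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Illustrative scores only (not thesis data); the limit is the raw-data PC 1 value.
example_limits = {1: 4.1520e-001}
example_scores = [
    (8, 1, 0.50), (8, 1, -0.60), (8, 1, 0.45), (8, 1, 0.10),
    (9, 1, 0.50), (9, 1, 0.10), (9, 1, -0.20),
]
print(flag_dispersion(example_scores, example_limits))
```

With these illustrative scores only batch 8 is reported, since batch 9 has a single observation beyond the limit.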
Table B25. PC score Shewhart plot results for lower strength Tablets (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 18 4 3.9412e-002 3.8330e-002
39 6 2.0824e-002 1.7222e-002
40 6 1.8152e-002 1.7222e-002
SNV 2 4 -8.6349e-002 -8.0326e-002

5 3 1.1017e-001 1.0719e-001
10 5 -4.2980e-002 -4.2878e-002
Detrend 2 2 -5.9015e-002 -5.6332e-002

5 4 1.9610e-002 1.6383e-002
SNV detrend 2 4 -8.6349e-002 -8.0326e-002

5 3 1.1017e-001 1.0719e-001
10 5 -4.2980e-002 -4.2878e-002
Savitzky-Golay 2nd derivative 2 2 -2.5744e-004 -2.2580e-004

5 3 1.8518e-004 1.4897e-004
31 4 1.1906e-004 1.0393e-004
42 5 9.0237e-005 8.0597e-005
Table B26. PC score Shewhart plot results for higher strength Tablet (RCA).
Absorbance 3 1 4.0676e-001 3.4801e-001
SNV - - -
Detrend 36 1 7.9890e-002 7.6085e-002

41 5 -6.7603e-003 -5.7996e-003
SNV detrend 41 5 -4.8264e-002 -4.4698e-002

42 6 -3.2585e-002 -3.1940e-002
Savitzky-Golay 2nd derivative 41 6 5.5241e-005 5.2591e-005
Table B27. PC score Shewhart plot results for lower strength Tablet (Intact).
Transmission 2 4 -1.2771e-001 -1.2343e-001
6 8 1.6276e-002 1.5812e-002
SNV 5 6 3.0174e-002 2.9772e-002

27 7 3.1849e-002 2.8843e-002
27 8 -2.2767e-002 -2.2317e-002
Detrend 5 7 -1.9020e-002 -1.5136e-002

30 2 -2.1765e-001 -2.0130e-001
SNV detrend 5 7 -2.7288e-002 -2.2561e-002

27 6 2.0466e-002 1.9752e-002
30 2 -3.6439e-001 -3.3250e-001
31 8 -1.3180e-002 1.2178e-002
Savitzky-Golay 2nd derivative 5 5 -4.8711e-004 -4.7223e-004
6 5 -5.1441e-004 -4.7223e-004
41 6 -3.9947e-004 3.5901e-004
Table B28. PC score Shewhart plot results for higher strength Tablet (Intact).
Transmission 36 4 1.3984e-001 1.2727e-001
40 1 9.7051e+000 8.7008e+000
40 3 -3.3700e-001 -3.2879e-001
40 6 -1.8427e-001 -9.8051e-002
40 7 -3.0900e-002 -2.5547e-002
43 5 -6.4261e-002 -6.2210e-002
SNV 30 5 2.0175e-001 1.7988e-001

36 3 -4.2159e-001 -4.0883e-001
40 2 -1.2174e+000 -6.6052e-001
Detrend 1 1 -2.6050e-001 -2.4802e-001

30 7 -1.6410e-002 -1.4472e-002
36 3 1.4791e-001 1.0737e-001
40 2 -3.0849e-001 -2.3694e-001
40 4 2.5635e-001 1.3788e-001
40 7 1.8963e-002 1.5376e-002
SNV detrend 36 3 -3.0932e-001 -2.9049e-001

40 2 8.9516e-001 5.4008e-001
Savitzky-Golay 2nd derivative 36 2 -2.3379e-003 -1.8498e-003
40 1 3.6740e-002 2.0559e-002
Table B29. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (Intact).
Raw* -
SNV 25 3 3.6550e-001 3.6412e-001
Detrend 25 2 -2.0095e-001 -1.7757e-001
SNV detrend 10 7 -8.6866e-002 -7.6995e-002

25 3 -3.2550e-001 -2.9671e-001
Savitzky-Golay 2nd derivative - - -
Table B30. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (Intact).
Raw* 21 7 7.5409e-002 7.3657e-002
38 1 9.4213e+000 8.3792e+000
38 4 -4.6845e-001 -3.4343e-001
SNV 38 3 -1.0872e+000 -6.2344e-001
Detrend 28 2 -2.6072e-001 -2.3232e-001

34 3 1.4293e-001 1.2496e-001
38 1 -3.7778e-001 -2.4695e-001
SNV detrend 38 3 7.5972e-001 4.8390e-001
Savitzky-Golay 2nd derivative 34 2 -2.2685e-003 -1.8924e-003
38 1 3.6585e-002 2.0853e-002
Table B31. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA).
Absorbance 10 7 -3.4132e-002 -3.1283e-002
15 6 4.6640e-002 4.4400e-002
25 5 6.7545e-002 6.7212e-002
SNV 25 4 4.0872e-001 3.5299e-001
Detrend 20 4 -2.3329e-002 -2.3318e-002
SNV detrend - -
Savitzky-Golay 2nd derivative 18 3 -1.8611e-004 -1.5796e-004
Table B32. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA).
Absorbance - - -
SNV 24 3 3.9296e-001 3.9053e-001
Detrend 24 2 7.4402e-002 5.5530e-002
SNV detrend 9 6 1.3182e-001 1.2260e-001

13 3 -3.4029e-001 -3.3292e-001
24 2 -6.1649e-001 -4.6858e-001
Savitzky-Golay 2nd derivative
* Multiway PCA model: blend absorbance and tablet transmission data.
Table B33. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA and Intact).
Raw* 36 8 -7.3195e-002 -7.1348e-002
SNV 25 3 5.3250e-001 4.8857e-001

35 6 -2.5187e-001 -2.3543e-001
Detrend 25 2 -2.0833e-001 -1.7886e-001
SNV detrend 25 3 -4.7052e-001 -3.9068e-001
Savitzky-Golay 2nd derivative - - - -
Table B34. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA and Intact).
Raw* 24 3 -1.1030e+000 -1.0395e+000
38 1 8.8263e+000 7.7480e+000
SNV 38 3 -1.0555e+000 -6.1915e-001
Detrend 24 5 -7.7750e-002 -6.9869e-002

28 2 -2.5461e-001 -2.3106e-001
34 3 1.6589e-001 1.3503e-001
38 1 -3.9147e-001 -2.4680e-001
SNV detrend 24 3 6.3643e-001 5.3612e-001

38 4 5.9262e-001 4.0872e-001
Savitzky-Golay 2nd derivative 34 2 -2.2751e-003 -1.8968e-003
38 1 3.6375e-002 2.0646e-002
Table B35. MSPC of blend PCA models (absorbance data).
Control Implicated Anderson's normal Anderson's
Batches (Control Phase 1) PCs approximation normal
(99.99% limit) approximation
42.42 11 1,5 15.56 14 235.00
47.86 21 5 15 1506.35
35.28 22 2,5,6 20 19.26
34.99 23 2,5,6 23 20.26
37.61 24 5,6 33 29.40
30.78 28 1,3,4 37 703.16
58.44 29 5 43 83.02
26.05 31 1,2,3,4 64 75.56
41.33 33 1,3,4 67 26.54
25.93 49 1 109 177.71
33.91 53 1,5 115 191.46
121.12 83 1 144 44.90
180.46 84 1,4,5,6 146 136.65
155.58 85 1,5 154 16.90
164.05 86 1,5 182 25.51
105.12 87 1
158.76 88 1,5
158.58 89 1,5
148.31 90 1,5
118.45 91 1,5
97.52 92 1
80.67 93 1
130.34 94 1,5
134.36 95 1,5
141.11 96 1,5
131.75 97 1,5
146.04 98 1,5
126.11 99 1,5
126.02 100 1,5
121.52 101 1,5
145.15 102 1,5
157.45 103 1,5
77.55 104 1
154.40 105 1,5
125.38 106 1,5
103.77 107 1,5
105.89 108 1
85.85 109 1,2
91.22 110 1
67.03 111 1
80.60 112 1
146.27 113 1,5
140.40 114 1,5
75.70 115 1,6
68.11 116 1
81.93 117 1
70.42 118 1
111.21 119 1
146.84 120 1
105.21 121 1
124.06 122 1
122.44 123 1
117.02 124 1
141.88 125 1,5
136.21 126 1,5
132.71 127 1
87.53 128 1,5,6
96.66 129 1,6
81.93 139 1,5
127.70 140 1,5
82.16 141 1,4,5
85.95 142 1
86.88 143 1
76.25 144 1,4,5
111.79 145 1,5
115.15 149 1
94.50 151 1,4
120.22 152 1
146.32 153 1
142.07 154 1
Table B36. MSPC of blend PCA models (SNV absorbance data).
Control Implicated Anderson's normal Anderson's
Batches (Control Phase 1) PCs approximation normal
(99.99% limit) approximation
24.90 1 1,3 13.48 8 14.24
40.81 8 1,2,4 14 450.36
64.86 11 1,2,4 15 114.50
28.20 12 4,1,2 16 57.91
46.04 14 1,2,4 27 25.95
50.39 21 2,4,3 37 621.11
31.48 22 4,2,3 39 18.19
29.17 23 4,2 40 75.23
50.11 24 2,4 43 48.93
33.13 25 1,2,4 44 13.68
19.94 26 1,4 59 78.21
38.03 27 1 64 21.51
63.31 28 1,3,2 67 23.34
53.98 29 2,1,3,4 109 23.47
44.65 31 1,2,3,4 144 194.38
74.33 33 1,2 146 273.34
29.53 40 4,1,2
38.38 48 1,2,4
46.14 49 1,2
49.63 53 1,2,4
21.39 64 1,5
21.78 67 1
93.94 83 1,2,4
142.89 84 1,2,4
117.64 85 1,2,4
131.87 86 1,2,4
76.66 87 1,2,4
122.11 88 1,2,4
117.46 89 1,2,4
119.29 90 1,2,4
93.01 91 1,2,4
73.97 92 1,2,4
55.95 93 1,2,4
102.14 94 1,2,4
112.52 95 1,2,4
118.24 96 1,2,4
102.25 97 1,2,4
111.25 98 1,2,4
106.37 99 1,2,4
106.17 100 1,2,4
100.64 101 1,2,4
110.94 102 1,2,4
131.39 103 1,2,4
56.92 104 1,4,6
120.23 105 1,2,4
106.86 106 1,2,4
80.93 107 1,2,4
79.12 108 1,2,4
66.23 109 1,2,6
64.45 110 1,6,4
49.54 111 1,2
57.11 112 1,2
116.17 113 1,2,4
109.96 114 1,2,4
61.92 115 1,4,6
48.04 116 1
56.70 117 1,2
48.78 118 1
65.65 119 1,2,6
95.97 120 2,1
77.02 121 1,2,4
76.39 122 2,1
77.15 123 1,2
84.44 124 1,2,4
105.84 125 1,2,4
103.68 126 1,2,4
93.25 127 1,2,6,4
50.49 128 1,2,6
68.91 129 1,2,4
56.70 139 1,2
93.70 140 1,2
76.78 141 1,4,5
54.77 142 1,2
59.85 143 1,2
62.12 144 1,2
90.93 145 1,2,4
77.23 149 1,2
69.34 151 1,2
71.88 152 1,2
90.85 153 2,1
87.36 154 2,1
22.98 189 3,2,1
Table B37. MSPC of blend PCA models (detrend absorbance data).
Data Set Control PCs T²(n, m-n, 99%) T² > T²(n, m-n, 99%) Batch Implicated Anderson's normal Batch Anderson's
Batches (Control Phase 1) PCs approximation normal
Detrend 139 7 20.51 28.98 21 1,3 14.5573 8 18.58
33.1572 29 6,2,3 14 317.40
25.0336 83 1 16 25.31
49.3930 84 1,5 37 160.49
42.6574 85 1,2 43 100.85
44.8700 86 1,2 59 635.11
21.4141 87 1 64 39.21
43.1488 88 1,2 67 35.78
40.9772 89 1,2 109 152.77
35.9743 90 1,2 115 30.85
27.6983 91 1,2 142 21.73
37.7630 94 1,2 144 94.74
33.0183 95 1 146 30.59
36.1442 96 1,2 152 56.67
35.7966 97 1,2 186 26.61
32.9009 98 1
30.9418 99 1
31.0504 100 1
30.7297 101 1,2
36.6090 102 1,2
42.2416 103 1,2
37.5513 105 1,2
32.8472 106 1
20.5247 107 1
21.0809 108 1
39.0639 113 1,2,4
31.5319 114 1,4
23.3561 119 1
36.2788 120 1,2,3
24.6408 121 1,2
25.4348 122 1,3
23.3730 123 1
31.6397 124 1
38.3432 125 1,2
44.4650 126 1,2,7
29.2552 127 5,1,6,3
32.7394 140 1,2
41.4892 141 1,5,7
23.5197 144 1
31.5817 145 1,2
31.7742 149 1,2
27.3713 151 1
27.2661 152 1
37.0055 153 1,3
36.5750 154 1,3
Table B38. MSPC of blend PCA models (SNV detrend absorbance data).
Data Set Control PCs T²(n, m-n, 99%) T² > T²(n, m-n, 99%) Batch Implicated Anderson's normal Batch Anderson's
SNV detrend 123 7 20.8037 35.22 21 1,4 14.5573 8 33.37
21.67 22 1,4 14 273.01
25.61 24 1,4 15 273.05
77.88 29 1,2,6,7 16 267.84
22.46 33 1,3 26 84.36
87.34 83 1,4 27 128.08
137.62 84 1,3,4 28 34.95
117.26 85 1,2,4 33 36.02
127.73 86 1 37 164.11
70.19 87 1 39 66.28
116.19 88 1 43 163.99
112.38 89 1 59 125.17
115.21 90 1 64 127.47
91.36 91 1 67 25.53
71.36 92 1 144 677.97
52.47 93 1 146 213.66
99.20 94 1,3,4
107.55 95 1,4
112.74 96 1,4
97.29 97 1,4
105.36 98 1,3,4
102.24 99 1,2,4
100.29 100 1,2,4
94.78 101 1,4
108.45 102 1,2,4
125.86 103 1,3,4
51.75 104 1,6
118.36 105 1,3,4
102.00 106 1,3,4
74.80 107 1,4
75.22 108 1
65.06 109 1,2,6
62.31 110 1,6
50.20 111 1
54.45 112 1
113.31 113 1,2,4
106.53 114 1,2,4
57.20 115 1,6
42.74 116 1
53.47 117 1
44.33 118 1
62.26 119 1
93.72 120 1,3
73.30 121 1
74.31 122 1,2,3
75.38 123 1,2,3
81.15 124 1,3
106.71 125 1,3
101.65 126 1,3
98.10 127 1,2,6
51.59 128 1
66.20 129 1
53.47 139 1
92.73 140 1
79.06 141 1,4
55.30 142 1
59.02 143 1
63.77 144 1
90.65 145 1
74.82 149 1
66.93 151 1,3
72.84 152 1,3
89.43 153 1,3
86.73 154 1,3
Table B39. MSPC of blend PCA models (Savitzky-Golay 11 point 2nd derivative of
absorbance data).
Data Set Control PCs T²(n, m-n, 99%) T² > T²(n, m-n, 99%) Batch Implicated Anderson's normal Batch Anderson's
Savitzky-Golay 2nd derivative 91 7 21.7160 25.6375 21 5,6 14.5573 33 18.79
623.2553 29 5,7 36 25.01
29.3948 32 2,5,7 89 44.95
22.5586 48 5 90 29.68
32.6354 50 3,5,7 102 16.18
32.3917 63 1,2,3 106 84.83
114.5680 83 1 109 62.79
184.4618 84 1 125 5851.04
164.7183 85 1,2 126 3989.45
195.6259 86 1,2 144 5686.18
93.6012 87 1,2,3 186 38.09
168.3523 1,2
170.9391 89 1,2
175.9201 90 1,2,3
132.6630 91 1,2
87.0157 92 1
65.1144 93 1
133.6834 94
156.8426 95
173.9118 96
157.7016 97 1,2,3
151.5180 98 1,2,3
144.9607 99 1
145.0235 100 1
151.8975 101 1,2
155.1517 102 1,2
193.0451 103 1,2,3
57.4501 104 1
154.7630 105 1,2
145.7133 106 1,2
98.8913 107 1,2
99.4680 108 1
77.8768 109 1,3
49.1095 110 1,2,3
56.2726 111 1,3
65.3618 112 1,3
154.3364 113 1,2
139.8857 114 1
53.3653 115 1,3,7
51.0597 116 1,3,7
76.4228 117 1,2
49.5355 118 1,2,3
106.1942 119 1,5
166.6243 120 1,5
120.4525 121 1,2,5,7
119.3954 122 1,5
133.2169 123 1,5
164.7556 124 1,5
156.3455 125 1,2
172.6288 126 1,2,5
70.4721 127 1,3,6
50.8225 128 1,3,6
83.0562 129 1,3
173.7180 136 5,6,7
104.4856 137 2,5
73.5134 138 2,5
76.4228 139 1,2
138.2863 140 1,2,3
104.4436 141 1,3
58.2344 142 1,3
63.3440 143 1
93.0315 144 1
122.1246 145 1
129.4913 149 1
104.9995 151 1,3
155.6240 152 1,5
184.8494 153 1,5
171.9002 154 1,5
34.8586 156 2,5
66.6923 157 5,6
80.8661 161 5,6
24.1009 162 3,4
34.6592 170 2,5
54.6877 171 3,5,6
62.9707 172 2,5,6
57.8558 173 2,3
30.1715 174 3,4
Table B40. MSPC of lower strength tablet PCA models (RCA).
Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation (99.99% limit) | Batch | Anderson's normal approximation
Absorbance 35 8 33.82 71.90 5 2, 1 15.56 4 76.78
47.12 6 2 9 689.51
35.06 8 1,8
36.82 10 1 ,7 ,6
48.93 16 1,8
SNV 24 6 32.06 89.69 5 4 13.48 4 30.76

47.60 6 4 9 91.82
38.16 26 1 15 187.56
71.99 30 1
35.65 38 5
80.85 39 5 ,3
46.31 40 5
54.96 44 1,5
Detrend 34 7 48.41 14.56 4 46.70

15 599.20
16 25.15
SNV detrend 17 7 61.67 62.74 1 6 14.56 4 19.98

385.00 2 1 ,6 ,4 9 17.94
161.66 3 6, 1,2 15 404.05
168.65 5 3 16 24.80
163.08 6 3
168.47 7 6, 1,2
68.29 9 3
148.74 15 6, 1
165.96 16 2 ,6
64.96 19 6, 5 ,4
84.15 29 3, 2 .4
Savitzky- 26 6 30.15 52.52 10 4 13.48 4 54.40

Golay 2nd 89.81 18 4 ,3 13 102.00
derivative 32.42 21 4 15 199.00
30.39 22 4 21 127.73
156.65 30 4, 1
197.62 31 4
96.29 32 4
40.96 38 4
37.84 39 4
38.01 40 4
Table B41. MSPC of higher strength tablet PCA models (RCA).

Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation (99.99% limit) | Batch | Anderson's normal approximation
Absorbance 32 7 30.94 48.58 3 1 14.56 4 24.47
11 292.02
12 129.88
19 47.18
43 110.57
SNV 20 7 47.71 64.31 3 3, 2. 1 14.56 4 64.71

53.53 23 3 11 116.65
60.38 36 1,5, 2 ,4 18 28.36
113.96 40 4, 5 .3 19 156.03
72.94 41 5 .4 43 40.96
Detrend 22 7 42.44 42.85 12 14.56 4 50.28

49.20 36 11 20.15
54.82 40 18 373.97
51.10 41 19 259.03
SNV detrend 21 7 44.82 64.04 6 1,3 14.56 18 189.67

44.93 7 1,3 19 371.18
71.33 8 1 ,3 .6 34 22.14
70.29 11 1,3 43 42.49
150.19 18 3. 1
166.88 26 1 ,6
105.72 30 1,5, 6 ,7
231.95 36 1 ,5 ,6
82.23 39 1 ,6
114.32 43 6, 5. 1
Savitzky- 28 6 28.66 40.29 3 2 13.48 4 745.15

Golay 28.98 13 2 11 33.54
derivative 39.50 19 4, 6 ,2 19 213.59
32.28 41 6, 4, 1 43 28.98
Table B42. MSPC of lower strength tablet PCA models (Intact).
Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Transmission 41 9 34.86 92.50 38 2. 1.3 16.51 1 -217.45
115.28 39 2. 1,3 6 120.46
135.78 40 2, 1,3 19 80.15
30 68.29
32 28.52
39 2519.17
SNV 17 8 82.33 394.65 5 6 15.56 6 39.78

181.01 6 6 ,3 30 39.91
91.66 8 6 39 2027.54
118.27 9 6
104.54 25 1
93.99 26 1
1413.65 30 6, 1
1491.62 31 6, 1,8
1565.12 32 6, 1,8
1220.39 38 1 ,3 ,6
1946.85 39 1 ,6 ,3
1943.94 40 1 ,6 ,3
Detrend 32 8 57.44 63.65 38 1,4 15.56 1 303.70

71.14 39 1,4 9 43.36
84.86 40 1 ,4 ,3 17 216.84
30 159.72
32 188.62
39 18.72
SNV detrend 33 8 35.07 39.71 5 1,7 15.56 17 66.50

52.38 30 2 39 633.97
159.73 38 1.3
183.35 39 1.3
191.50 40 1,3
Savitzky-Golay 35 6 25.24 47.51 5 5, 1 13.48 9 113.90

2nd derivative 29.92 6 5 18 21.97
63.09 38 1 ,2 ,3 19 46.16
90.75 39 2, 3, 1
92.94 40 2, 1,3
Table B43. MSPC of higher strength tablet PCA models (Intact).
Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Transmission 25 7 37.28 437.21 25 1 .5 ,3 14.56 22 95.19
261.36 26 1.5 40 15.61
237.65 27 1 .3 .5 41 455.95
239.99 28 1 .6
638.92 29 1 .6 ,5
2020.40 30 6. 1,2
289.32 31 1 .6
47.51 38 1
10423.23 40 6. 1 .3 .5
2196.21 41 1.6, 5 ,4
SNV 29 5 23.50 23.92 2 4 12.30 5 20.12

1865.05 13 1 .5 ,4 ,3 22 181.92
218.45 14 1 .4 30 35.99
1442.91 15 1 .4 ,5 40 128.54
397.21 22 1 41 16.98
139.15 23 1.3
479.36 24 1.5
87.29 28 1.3
1482.03 29 1 .4 , 3 ,5
4587.34 30 1 .5 ,4 , 3 ,2
329.09 31 1 .5 ,3
3077.95 40 1 .2 ,4 ,3
179.77 41 1 .2 ,3
Detrend 32 7 30.94 41.08 1 1 14.56 25 30.40

64.40 30 2, 4 ,7 40 35.76
275.75 40 4, 2 ,7 41 622.95
55.73 41 2 ,4
SNV detrend 23 4 21.75 33.27 2 1 11.00 22 58.49

31.63 4 1 26 20.50
-42.60 6 1 30 64.20
61.52 13 1 40 48.09
52.60 15 1 41 22.83
31.31 22 1
25.97 24 1
33.00 25 1.2
24.19 26
48.59 28 1
141.29 29 1
331.80 30 1.2
64.28 31 1
1067.67 40 2. 1
200.99 41 2, 1
Savitzky-Golay 29 3 15.51 60.60 25 1 9.53 29 23.12

2nd derivative 38.21 26 1 30 29.44
33.22 27 1 40 305.13
31.50 28 1 41 130.30
91.48 29 1
250.66 30 1
52.60 31 1
1316.8 40 1
273.22 41 1
Table B44. MSPC of lower strength blend (RCA) and tablet (RCA) multiway PCA
models.
Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
14,56 42,87
247,51
42,22 1 3. 1 12,30 22 12,95

34,49 4 1.3 24 19,07
52,10 8 3. 1
36,33 14 1
82,51 1 1 11,00 6 20,14

101.72 2 1 7 23,69
107,15 3 1
130,48 4 1
125,20 5 1
126,68 6 1
143,01 7 1
88,05 8 1
56,33 9 1
65,26 10 1
89,53 11 1
113,07 12 1
133,59 13 1,3
102,33 14 1
114,56 15 1
108,81 16 1
134,63 17 1
150,54 18 1.3
SNV detrend 30 40,24 25 2 .4 13.48 7 115,59

32,49 33 5 .4 ,2
Savitzky- 217,59 1 1 9,53 6 28,95

Golay 2nd 390.96 2 1.3 7 12,42
derivative 313.55 3 1 8 15,52
273,96 4 1 32 11,19
374,73 5 1.3
312,74 6 1
340,16 7 1
286,54 8 1
163,58 9 1
33,15 10 1
152,11 11 1
67,37 12 1.3
101,56 13 1.3
272,28 14 1.3
223,58 15 1.3
75,48 16 1.3
100,70 17 1.3
193,19 18 3. 1
Table B45. MSPC of higher strength blend (RCA) and tablet (RCA) multiway
PCA models.
Data Set | Control batches | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Absorbance 41 8 32.20 - 15.56 4 70.30
32 16.73
38 20.62
40 25.38
SNV 27 6 29.37 32.23 24 3 .4 13.48 2 39.06

30.06 32 1,2 16 109.60
17 18.54
Detrend 18 6 43.26 293.51 1 5 ,4 13.48 2 23.58

86.84 2 5, 3 ,4 7 15.11
93.36 8 5 15 16.83
400.07 13 5 ,2 16 44.65
95.87 14 4 ,3 19 105.55
262.87 15 1 .5 ,4 ,3
95.52 18 5
57.68 20 2
440.81 21 5 ,2
410.31 24 2 ,4
44.41 27 5 ,2
110.67 29 4
183.19 32 5 ,2
318.14 34 2 ,4
72.68 36 2
213.81 39 4 ,2 ,5
284.34 40 2 ,5
SNV detrend 35 6 25.35 64.29 1 6 ,3 13.48 2 51.70

41.85 8 6 15 18.46
56.08 9 6 16 20.88
61.20 13 3 19 47.64
45.70 19 6 ,3
27.00 24 2
Savitzky- 31 4 18.87 151.03 1 2 11.00 7 22.44

Golay 2nd 141.35 8 2 16 14.47
derivative 115.67 9 2 19 104.15
107.02 13 2
99.81 19 2
107.54 21 2
Table B46. MSPC of lower strength blend (RCA) and tablet (Intact) multiway
PCA models.
Data Set | Control batches (Control Phase 1) | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Raw" 26 7 36.01 53.49 1 1 14.56 9 17.72
46.10 8 1 10 34.83
21 40.10
25 36.12
SNV 28 7 33.93 101.38 25 1,3 14.56 25 640.75

61.68 26 1 27 17.14
58.62 27 1
149.91 33 3, 1,5
136.54 34 3,1
204.25 35 1,3, 6 ,4
Detrend 22 6 34.59 48.27 25 2 13.48 4 28.52

126.06 33 3, 1 25 476.38
130.06 34 3, 1
176.96 35 3, 1,5
SNV detrend 23 7 40.44 114.44 25 3 ,7 14.56 4 168.96

95.20 26 3 ,2 .7 7 42.34
92.28 27 2, 3, 1 17 22.19
199.08 33 2, 5 ,3 25 134.83
189.41 34 2, 3 ,5
225.25 35 2 ,5 ,3
Savitzky- 18 5 33.56 64.31 1 3 12.30 9 14.28

Golay 2nd 48.00 8 3 11 28.67
derivative 36.99 20 3
84.96 25 2
53.48 26 2
36.05 27 2
175.64 33 2 ,1 ,3
246.71 34 2 ,1 ,3
262.41 35 2,1
51.25 39 3, 1

Table B47. MSPC of higher strength blend (RCA) and tablet (Intact) multiway
PCA models.
Data Set | Control batches | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Raw ' 37 7 28.50 67.27 38 4, 1 14.56 9 15.85
13 490.92
41 49.67
SNV 27 3 15.90 48.06 9 1 9.53 13 11.35

41.22 11 1 20 155.42
19.61 20 1 38 43.41
24.76 21 40 12.55
20.55 22 1
52.09 27 1
83.50 28 1
17.45 29 1
191.72 38 3. 1
24.90 39 1
Detrend 27 4 19.99 23.85 27 2 11.00 20 18.65

58.15 28 2 38 56.46
22.37 34 3 40 21.18
257.54 38 1, 2
50.20 39 1,2
SNV detrend 26 4 20.36 33.36 2 1 11.00 15 25.92

38.75 9 1 20 291.46
30.91 11 1 38 79.53
21.05 20 1 40 13.36
24.39 26 1.2
70.43 27 1
153.31 28 1
35.61 29 1
402.77 38 1 .3 ,2
82.66 39 1.3
Savitzky- 19 3 18.80 19.69 1 1 9.53 20 17.07

Golay 2nd 30.19 2 1 23 15.46
derivative 42.15 23 1 28 19.61
65.45 24 1.3 38 1091.35
18.84 25 1
33.92 26 1
77.87 27 1
207.69 28 1
34.21 29 1
30.11 34
1091.89 38 1
325.52 39 1.3
Table B48. MSPC of lower strength blend (RCA) and tablet (RCA and Intact)
multiway PCA models.
Data Set | Control batches | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation (99.99% limit) | Batch | Anderson's normal approximation
Raw" 39 8 31.91 15.56
SNV 32 7 30.94 60.22 25 1.3 14.56 24 307.98

35.47 27 1 25 331.61
86.48 33 1 .6 ,3
67.50 34 1.3
95.97 35 6, 1
Detrend 29 7 33.06 33.84 25 2 14.56 25 23.52

113.28 33 3. 1.4 27 16.22
143.00 34 3. 1.4
131.55 35 3. 1
SNV detrend 29 6 28.03 70.74 25 3 13.48 7 39.13

42.54 26 5 25 44.45
44.57 27 5 .2
70.31 33 2
74.99 34 2
121.79 35 2 .6
Savitzky- 11 5 79.51 121.83 25 2 12.30 9 14.17

Golay 2nd 108.98 33 2 11 14.31
derivative 166.56 34 2 34 36.23
196.42 35 2
99.46 39 1.3

Table B49. MSPC of higher strength blend (RCA) and tablet (RCA and Intact)
multiway PCA models.
Data Set | Control batches | PCs | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated PCs | Anderson's normal approximation | Batch | Anderson's normal approximation
Raw ' 25 6 31.05 32.28 2 1 13.48 13 329.20
35.31 21 3 20 229.50
100.22 23 1,3 38 27.25
116.25 24 3, 1
87.00 25 3, 1
36.59 27 1
55.92 28 1
240.66 38 1,4
109.02 39 1
SNV 21 3 17.78 35.70 9 1 9.53 13 14.65

31.78 11 1 20 104.04
55.75 21 2 ,3 38 19.60
35.30 26 2
88.07 27 1,2
93.12 28 1
26.78 29 1
24.91 32 2
20.88 34 2
182.40 38 3. 1,2
40.24 39 2, 1
Detrend 33 5 22.14 36.85 24 5 12.30

24.77 27 1,2
52.25 28 2
41.10 34 3
299.88 38 1,2.4
52.67 39 1
SNV detrend 36 4 17.87 30.97 24 3 11.00 7 12.15

25.42 34 2 ,4 15 89.13
126.15 38 4, 3. 1 20 97.53
Savitzky- 28 3 15.69 19.46 1 1 9.53 20 16.67

Golay 2nd 26.31 2 1 23 15.00
derivative 17.17 11 1 28 20.60
77.40 23 1 38 1111.89
67.11 24 1
39.15 25 1
42.99 26 1
117.90 27 1
323.83 28 1
60.24 29 1
16.07 34
1715.01 38 1
427.71 39 1
Table B50. Principal Factor Analysis results (normal varimax): raw material usage
data (kg) and blend PCA results (Q statistic and Anderson's asymptotic normal
approximation).
Data | Factor | Variable | Batch Number | Loading*
Reflectance | 14 | Magnesium stearate | EX 004181 | 0.35
 |  | T² PC5 | - | -0.39
 |  | T² PC6 | - | 0.54
 |  | Anderson's asymptotic normal approx. | - | 0.44
SNV | 10 | Dibasic calcium phosphate anhydrous | EX 005147 | 0.49
 |  | Microcrystalline cellulose | EX 004245 | 0.51
 |  | Anderson's asymptotic normal approx. | - | 0.36
 | 14 | Dibasic calcium phosphate anhydrous | EX 006189 | -0.40
 |  | Dibasic calcium phosphate anhydrous | EX 005148 | 0.77
 |  | T² PC3 | - | 0.43
Detrend | 8 | Drug substance | 7DRB 041A | 0.37
 |  | Dibasic calcium phosphate anhydrous | EX 006245 | 0.48
 |  | Microcrystalline cellulose | EX 006279 | 0.51
 |  | T² PC6 | - | 0.39
 | 14 | Dibasic calcium phosphate anhydrous | EX 006189 | -0.36
 |  | Dibasic calcium phosphate anhydrous | EX 005148 | 0.73
 |  | Q | - | -0.39
SNV Detrend | 3 | Dibasic calcium phosphate anhydrous | EX 005147 | -0.48
 |  | Microcrystalline cellulose | EX 004245 | -0.52
 |  | Anderson's asymptotic normal approx. | - | -0.37
 | 10 | Drug substance | 7DRB 041A | 0.35
 |  | Dibasic calcium phosphate anhydrous | EX 006245 | 0.51
 |  | Microcrystalline cellulose | EX 006279 | 0.55
 |  | T² PC3 | - | 0.34
 | 14 | T² PC5 | - | 0.33
 |  | T² PC6 | - | 0.51
 |  | T² PC7 | - | 0.48
 | 15 | Dibasic calcium phosphate anhydrous | EX 006189 | 0.40
 |  | Dibasic calcium phosphate anhydrous | EX 005148 | -0.81
 |  | Q | - | 0.34
Savitzky-Golay 2nd Derivative | 2 | Dibasic calcium phosphate anhydrous | EX 008202 | 0.44
 |  | Microcrystalline cellulose | EX 007173 | 0.44
 |  | Sodium starch glycolate | EX 008015 | 0.44
 |  | Q | - | 0.37
 | 13 | Magnesium stearate | EX 008090 | 0.51
 |  | T² PC4 | - | 0.36
 |  | T² PC6 | - | 0.50
 |  | Q | - | -0.45
 | 16 | T² PC3 | - | 0.61
 |  | Anderson's asymptotic normal approx. | - | -0.54
* Significant correlation (p = 0.01, n = 70)
Table B51. Blend PCA loading correlations with excipient NIR spectra
(absorbance, DT absorbance and Savitzky-Golay 11 point 2nd derivative of
absorbance spectra).
NIR data used for PCA | Spectral Data Points | Raw Material | PC Loadings | Correlation, r
Reflectance^a | 700 | Dibasic calcium phosphate anhydrous | 2 | -0.969
 |  | Magnesium stearate | 2, 1 | -0.837, -0.694
 |  | Microcrystalline cellulose | 2, 1 | -0.961, -0.859
 |  | Sodium starch glycolate | 1 | -0.829
SNV^a | 700 | Dibasic calcium phosphate anhydrous | - | -
 |  | Magnesium stearate | 5, 6 | 0.656, 0.624
 |  | Microcrystalline cellulose | 3 | 0.861
 |  | Sodium starch glycolate | 3, 2* | 0.685, -0.508
Detrend^b | 700 | Dibasic calcium phosphate anhydrous | 1 | 0.636
 |  | Magnesium stearate | 6 | -0.703
 |  | Microcrystalline cellulose | 2 | 0.819
 |  | Sodium starch glycolate | 2 | 0.900
SNV Detrend^b | 700 | Dibasic calcium phosphate anhydrous | - | -
 |  | Magnesium stearate | 5 | 0.800
 |  | Microcrystalline cellulose | 2, 3 | 0.746, -0.568
 |  | Sodium starch glycolate | 2 | 0.757
Savitzky-Golay 2nd derivative^c | 546 | Dibasic calcium phosphate anhydrous | - | -
 |  | Magnesium stearate | 4 | -0.810
 |  | Microcrystalline cellulose | 2 | -0.759
 |  | Sodium starch glycolate | 2 | -0.808
^a Correlation produced using raw absorbance spectral data.
^b Correlation produced using detrend of absorbance spectral data.
^c Correlation produced using Savitzky-Golay 11 point 2nd derivative spectral data.
* Loadings characteristic of water.
APPENDIX C
Tables For Singleblock PLS And Multiblock PLS
Data Sets
Table C1. Cumulative percentage sum of squares (%SS) accounted for by lower
strength blend PLS models (n = 39 batch average observations) for X (NIR) and
Y (Certificate of Analysis data) blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 98.5278 11.2094
2 99.5886 12.5387
3 99.9250 15.4238
4 99.9655 18.8266
5 99.9723 33.4962
6 99.9891 36.8413
SNV detrend 0 0 0
1 63.0350 9.6071
2 73.4564 13.6424
3 83.1379 19.2952
4 92.3981 24.0503
5 96.1400 32.7977
6 97.3619 41.1443
Savitzky-Golay 2nd derivative 0 0 0
1 35.1658 12.4350
2 43.6291 38.5427
3 53.1901 45.7549
4 60.1101 53.5354
5 65.3785 63.8242
6 71.0695 69.6576
Table C2. Cumulative percentage sum of squares (%SS) accounted for by lower
strength tablet (combined absorbance and transmission data sets) PLS models (n =
39 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 37.4909 16.6747
2 93.2834 22.2812
3 98.6305 36.0010
4 99.2602 46.0702
5 99.6718 51.7658
6 99.8546 52.6298
SNV detrend 0 0 0
1 24.1250 24.4640
2 53.4029 34.8386
3 67.3050 38.7702
4 82.9736 39.8916
5 94.3031 41.3013
6 96.5111 45.9564
Savitzky-Golay 2nd derivative 0 0 0

1 17.2797 33.7190
2 51.3616 37.6372
3 66.3221 46.5585
4 74.1837 56.2960
5 77.4930 66.1358
6 80.8087 71.1353
Table C3. Cumulative percentage sum of squares (%SS) accounted for by higher
strength blend (absorbance and pre-treated absorbance data sets) PLS models (n =
41 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 97.0926 4.2075
2 99.4767 5.3678
3 99.7981 12.3122
4 99.9240 20.1134
5 99.9713 22.4132
6 99.9897 26.9295
SNV detrend 0 0 0
1 54.1217 6.4363
2 79.3772 15.4671
3 88.5172 18.5090
4 91.4556 23.9379
5 97.0076 25.7904
6 98.5628 30.6108
Savitzky-Golay 2nd derivative 0 0 0

1 35.3910 8.1480
2 42.9317 21.7321
3 55.5738 27.8595
4 65.2647 33.4627
5 69.2866 41.3344
6 71.7891 49.1541
Table C4. Cumulative percentage sum of squares (%SS) accounted for by higher
strength tablet (combined absorbance and transmission and pre-treated
absorbance and transmittance data sets) PLS models (n = 41 batch average
observations) for X (NIR) and Y (Certificate of Analysis data) blocks and for
different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 60.6272 11.1973
2 88.9116 15.2110
3 95.9915 18.7508
4 98.2400 21.5196
5 99.3493 23.8571
6 99.6597 28.7177
SNV detrend 0 0 0
1 46.7215 9.4302
2 71.0492 12.7547
3 79.8127 19.2646
4 91.7310 22.1985
5 94.5895 26.8946
6 96.2716 32.4628
Savitzky-Golay 2nd derivative 0 0 0

1 32.7060 13.0083
2 46.2040 21.6517
3 53.8930 30.4609
4 63.4475 36.4756
5 70.7722 40.0074
6 74.1100 45.2025
Table C5. Cumulative percentage sum of squares (%SS) explained by multiblock
PLS models for each subsection of the lower strength manufacturing process: X1
(blend: absorbance/pre-treated absorbance data) and X2 (tablet: combined
absorbance/pre-treated absorbance and transmission/pre-treated transmission
data) and certificate of analysis data, Y (n = 39 batch average observations) for
NIR data set PLS components X1 (blend, %SS) X2 (tablet, %SS) Y (Certificate of analysis data, %SS)
Raw* 0 0 0 0
1 98.5270 21.3553 13.2083
2 98.7977 43.9091 21.5690
3 99.8554 98.6277 23.1329
4 99.9655 99.1732 30.8496
5 99.9829 99.6724 34.7930
6 99.9901 99.7694 36.8361
SNV detrend 0 0 0 0
1 42.0676 14.0736 19.2026
2 75.2256 51.3683 23.9042
3 85.4275 68.6934 26.2144
4 88.4289 82.1438 30.7211
5 96.1761 91.1881 34.7918
6 97.1229 96.4851 36.3713
Savitzky-Golay 2nd 0 0 0 0
derivative 1 26.1042 18.8555 23.8181
2 44.8448 57.2383 27.2238
3 53.2462 63.7748 36.4666
4 60.8855 67.1647 45.4472
5 67.5435 75.4463 48.8964
6 72.0485 80.2112 53.0074
Table C6. Cumulative percentage sum of squares (%SS) explained by multiblock
PLS models for each subsection of the higher strength manufacturing process: X1
(blend: absorbance/pre-treated absorbance data) and X2 (tablet: combined
absorbance/pre-treated absorbance and transmission/pre-treated transmission
data) and certificate of analysis data, Y (n = 41 batch average observations) for
NIR data set PLS components X1 (blend, %SS) X2 (tablet, %SS) Y (Certificate of analysis data, %SS)
Raw* 0 0 0 0
1 97.0868 62.6657 10.5780
2 99.5043 79.4850 13.9584
3 99.6162 94.8990 17.8981
4 99.9178 97.2575 21.3120
5 99.9751 99.0602 23.1101
6 99.9864 99.5196 31.5082
SNV detrend 0 0 0 0
1 49.0888 42.9342 8.4910
2 79.4721 67.4368 12.2611
3 83.4558 79.4804 19.0699
4 90.7284 89.2626 24.2839
5 96.5455 92.4355 28.7376
6 98.3923 95.7558 31.2060
Savitzky-Golay 2nd 0 0 0 0
derivative 1 30.7872 19.9184 15.1495
2 46.3398 40.3850 21.5796
3 59.4878 50.9751 24.6986
4 63.2746 57.4616 29.0838
5 68.6447 62.7568 34.7574
6 74.8242 67.8627 36.9821
* Raw data includes blend absorbance data (X1) and combined tablet absorbance and
transmittance data (X2).
Table C7. Partial least squares regression modelling (PLSR1 algorithm) of
individual certificate of analysis (C. of A.) variables of lower strength blends and
tablets with their near infrared blend absorbance/pre-treated absorbance and
tablet combined absorbance/transmission or pre-treated absorbance/transmission
data (n = 36 observations).
NIR data set | C. of A. variable modelled | PLS components | PRESS minimum^a | % Sum of squares
Blend data (Sum of squares of autoscaled C. of A. data = 0.97222)
Absorbance Blend uniformity 0.81764 15.90

Moisture deviation 0.79157 18.58
Blend uniformity 0.88841 8.62

Savitzky-Golay 2nd derivative Blend uniformity 0.85003 12.57

Tablet data (Sum of squares of autoscaled C. of A. data = 0.97222)
Raw Drug substance content 0.85185 12.38

Content uniformity 0.89283 8.17
Moisture 0.81383 16.29

Moisture 0.92766 4.58
Disintegration 0.94034 3.279
Savitzky-Golay 2nd derivative Drug substance content 0.96839 0.39

Moisture 0.86472 11.06
Disintegration 0.88985 8.47
Thickness 0.95229 2.05
Table C8. Partial least squares regression modelling (PLSR1 algorithm) of
individual certificate of analysis (C. of A.) variables of higher strength blends and
tablets with their near infrared blend absorbance/pre-treated absorbance and
tablet combined absorbance/transmission or pre-treated absorbance/transmission
data (n = 36 observations).
NIR data set | C. of A. variable modelled | PLS components | PRESS minimum^a | % Sum of squares
Blend data (Sum of squares of autoscaled C. of A. data = 0.9750)
Absorbance Content uniformity
SNV detrend
Savitzky-Golay 2nd derivative
Tablet data (Sum of squares of autoscaled C. of A. data = 0.9750)
Raw Average weight 0.75624 22.44

Tablet thickness 0.58611 39.89
SNV detrend Average weight 0.84602 13.23

Savitzky-Golay 2nd derivative Average weight 0.74988 23.09

^a PRESS is the predicted residual error sum of squares between PLS model predicted
and certificate of analysis measured values after cross validation.
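
The "% Sum of squares" column in Tables C7 and C8 appears numerically consistent with 100 × (SS − PRESS) / SS, where SS is the sum of squares of the autoscaled C. of A. data quoted in each sub-heading. The short sketch below illustrates that reading. The function names are hypothetical, and the relationship is inferred from the tabulated values, not stated in the text.

```python
def press(measured, predicted_cv):
    """Predicted residual error sum of squares over cross-validated predictions."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted_cv))


def percent_ss_explained(press_min, ss_total):
    """Percentage of the total sum of squares accounted for at the PRESS minimum.

    Assumed relationship: 100 * (SS - PRESS) / SS.
    """
    return 100.0 * (ss_total - press_min) / ss_total


# Table C7, blend data, absorbance, blend uniformity (tabulated: 15.90).
pct_blend = percent_ss_explained(0.81764, 0.97222)

# Table C8, tablet data, raw, average weight (tabulated: 22.44).
pct_tablet = percent_ss_explained(0.75624, 0.9750)

print(round(pct_blend, 2), round(pct_tablet, 2))
```

Under this reading, a PRESS minimum close to the total sum of squares (for example 0.96839 against 0.97222) indicates that the cross-validated model explains almost none of the variation in that C. of A. variable.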
Table C9. Q statistics for singleblock PLS models of lower strength blend.
Data Set | Q99 | Q95 | Batch | Q > Q99
1 0.1374
2 0.1612
3 0.2119
10 0.2066
19 0.2217
22 0.1681
23 0.1754
24 0.2106
25 0.1740
26 0.1313
30 0.1703
35 0.1076
39 0.1159
1 31.0732
2 58.4866
3 66.2464
9 112.4084
10 43.6827
25 63.3014
28 54.5769
31 27.4952
34 32.9563
35 26.6006
36 50.0634
39 23.5542
Savitzky-Golay 2nd derivative 14 480.1945

25 418.6779
Table C10. Q statistics for singleblock PLS models of lower strength tablets (n = 39
batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
1 9.9436
5 2.8456
6 2.9097
8 5.0869
9 1.7279
10 4.1349
11 3.9836
21 2.8280
32 2.3869
33 4.5181
36 4.7909
38.3869 5 84.7026
7 80.5027
10 112.7194
11 132.7731
21 57.7824
32 84.8975
36 131.0980
37 143.3551
Savitzky-Golay 2nd derivative 1 251.4722

5 327.7355
11 384.2397
21 266.5947
22 250.0825
26 526.7971
27 330.1475
28 246.5633
31 291.0530
36 386.5780
37 486.9569
Table C11. Q statistics for singleblock PLS models of higher strength blends (n =
41 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
18 0.3257
21 0.2699
33 0.1929
34 0.1732
39 0.2439
SNV detrend 4 13.4550

15 15.3647
18 67.7308
30 21.8702
33 28.9836
35 32.4789
36 23.6168
37 14.2792
38 69.1630
39 41.3193
Savitzky-Golay 2nd derivative 1 424.1561

2 413.2710
6 453.1932
9 257.3108
18 353.7704
21 260.2007
32 350.7126
35 328.6117
38 286.3250
Table C12. Q statistics for singleblock PLS models of higher strength tablets (n =
41 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
3 10.8397
8 19.3108
12 4.2183
14 24.4520
15 14.7481
21 13.2031
23 9.2130
28 6.9264
29 5.0530
30 6.7363
31 4.8932
35 12.1203
38 12.5183
2 119.7522
30 104.9829
Savitzky-Golay 2nd derivative 2 632.6778

12 396.8873
24 508.0848
27 364.4079
28 738.8445
35 368.9999
37 547.1033
39 823.7451
40 459.3705
41 671.8081
Table C13. Q statistic monitoring of multiblock PLS models of lower strength
blends (n = 39 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
5 0.1106
9 0.2374
18 0.1950
19 0.2284
21 0.1176
30 0.1502
36 0.3784
SNV detrend 1 37.2855

5 36.1332
9 36.8617
18 66.1289
36 44.3539
Savitzky-Golay 2nd derivative 299.6489 2 662.9428

3 444.4115
5 668.3952
10 397.1586
18 812.1618
25 587.9371
26 371.3142
36 319.7184
Table C14. Q statistic monitoring of multiblock PLS models of lower strength
tablets (n = 39 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
1 4.0697
4 1.2737
5 2.7040
6 4.0430
9 4.3076
10 2.1581
11 4.0817
21 2.2855
32 2.0614
33 3.7292
36 4.2138
5 141.7461
6 82.5636
7 74.1785
9 89.9190
11 284.0160
21 79.1252
36 213.1416
37 117.3819
Savitzky-Golay 2nd derivative 2 364.2942

3 402.7941
5 396.1163
10 601.3112
14 450.2222
25 722.4410
26 710.5008
36 391.8748
37 548.6745
Table C15. Q statistic monitoring of multiblock PLS models of higher strength
blends (n = 41 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
13 0.3501
18 0.4603
21 0.4079
34 0.3011
36 0.4371
37 0.2378
38 0.2379
39 0.2998
8 23.4291
14 26.4575
18 57.6511
28 20.9336
29 38.7516
30 23.9276
38 55.3172
39 34.4153
Savitzky-Golay 2nd derivative 1 829.7140

2 531.1372
6 546.0073
8 453.8431
15 359.1001
18 397.5697
21 364.2394
27 368.0456
28 402.0031
29 445.5624
30 523.9487
31 336.3478
32 445.7456
33 633.3479
35 717.5016
38 597.7105
39 369.1461
Table C16. Q statistic monitoring of multiblock PLS models of higher strength
tablets (n = 41 batches).
Data Set | Q99 | Q95 | Batch | Q > Q99
13 20.3496
14 42.0068
15 50.0628
28 34.1517
34 8.2974
36 13.3758
37 27.2729
38 27.9747
39 7.6078
41 9.4235
28 293.69
30 106.02
38 1507.04
39 119.99
Savitzky-Golay 2nd derivative 353.2041 276.5355 1 447.87

2 729.30
8 356.96
15 373.96
23 1266.44
25 657.69
27 950.02
28 2316.70
29 873.21
30 601.94
31 420.30
32 411.18
33 434.62
35 673.49
38 9760.45
39 2527.45
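
The Q statistic monitored in Tables C9 to C16 is commonly defined as the sum of squared residuals left in a spectrum after projection onto the model subspace. The following is a minimal sketch of that definition, not the thesis implementation: it uses a PCA projection obtained by SVD where the thesis uses PLS models, the data are synthetic, and the function name `q_statistic` is hypothetical.

```python
import numpy as np


def q_statistic(x_ref, x_new, n_components):
    """Sum of squared residuals of x_new after projection onto a PCA model of x_ref."""
    mean = x_ref.mean(axis=0)
    # Loadings are the leading right singular vectors of the mean-centred reference data.
    _, _, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    loadings = vt[:n_components].T
    centred = x_new - mean
    residual = centred - centred @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=1)


rng = np.random.default_rng(0)
x_ref = rng.normal(size=(20, 5))   # synthetic "control" observations
x_new = rng.normal(size=(3, 5))    # synthetic observations to be monitored

q_two = q_statistic(x_ref, x_new, 2)
q_three = q_statistic(x_ref, x_new, 3)
q_full = q_statistic(x_ref, x_new, 5)
```

A batch would be flagged, as in the tables, when its Q value exceeds the chosen control limit (Q99 or Q95). The limits themselves are not computed in this sketch.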
Table C17. MSPC of single block PLS models of lower strength blends (n = 39
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 39 6 24.1641 - - - 13.4774 17 231.4818
24 18.2005
27 17.6282
32 65.7944
SNV 17 6 46.8408 51.4504 10 6 13.4774 4 18.6013

detrend 162.3735 12 3 ,5 10 17.5367
166.7626 13 2. 3, 4, 5 25 46.7734
178.8413 15 3 ,5 ,4
127.4049 16 3 ,5 ,4
96.9685 18 4, 2 ,6
64.8746 22 3
129.6379 25 3 ,2
146.3403 26 6, 3, 2, 4
128.8867 27 3, 2, 5, 4
87.1072 28 5 ,6
70.4428 29 3 ,2
56.2280 33 4, 3 ,6
67.2597 35 6, 3 ,4
Savitzky- 29 6 28.0336 35.3602 2 2 ,4 13.4774 3 43.9022

Golay 2nd 50.5780 5 2 4 139.4466
derivative 32.7429 13 3 7 75.1900
80.9381 18 2 ,4 9 74.8522
18 71.0997
Table C18. MSPC of single block PLS models of higher strength blends (n = 41
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw data 36 6 24.9863 - - - 13.4774 9 55.2053
23 13.9840
40 224.0407
SNV 31 6 26.9568 56.2801 13 1 ,4 , 3, 2 ,6 13.4774 9 27.9771

detrend 39.7230 14 5 ,3 19 34.9401
42.3505 18 5 23 109.5630
34.5354 19 5, 3, 1
35.6836 23 3
36.5784 24 3
39.4643 25 3
39.7649 29 3 ,5
Savitzky- 32 6 26.0762 30.0860 13 2 13.4774 7 42.7164

Golay 2nd 41.4962 28 2 ,6 9 42.5453
derivative 30.7203 29 3 ,6 ,2 13 24.2530
41.9554 38 6 ,2 23 18.1693
Table C19. MSPC of singleblock PLS models of lower strength tablets (n = 39
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 28 6 28.6609 32.5607 2 1 ,3 13.4774 10 28.2784
data 33.4329 3 3, 1 11 14.9977
33.1637 8 1 ,5 30 19.5014
56.0507 33 1 ,3 ,5 37 23.1500
97.4051 34 1 ,3 ,5
70.4464 35 1 ,2 ,3
SNV 25 6 31.0476 67.4030 1 1 ,6 ,5 13.4774 26 17.8473

detrend 36.6776 2 4 36 58.3499
42.5664 19 1 ,6
39.5029 20 1
120.2734 25 4, 1
62.9724 26 4
78.8773 27 4, 1
116.3010 33 1 ,2
136.8727 34 1 ,2 ,4
172.8398 35 1 ,2 ,4
Savitzky 17 6 46.8408 85.6728 6 5 ,2 13.4774 1 62.0617

-Golay 47.5518 8 2 ,4 4 41.0105
2nd
51.3959 10 6 ,5 11 37.1803
derivati 74.8446 14 5, 2 ,4
ve 51.6367 15 5 ,2
107.0301 33 1 ,5 ,3
118.1172 34 1 ,3 ,6
122.4901 35 1
Table C20. MSPC of single block PLS models of higher strength tablets (n = 41
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 34 6 25.7327 37.6401 1 2 13.4774 2 20.6533
data 43.3983 9 3 ,2 4 107.2205
45.3390 13 2 ,3 39 20.1148
37.7641 19 3 ,2
31.3083 38 3, 5 ,6
SNV 19 6 40.4016 136.4086 1 1 ,4 ,3 13.4774 3 16.0606

detrend 48.8874 3 1 4 41.1726
66.3386 8 1 ,3 ,5 15 143.9220
109.6429 9 3. 1 ,4 35 30.0806
179.9242 13 1 ,4
172.5391 14 4, 1 ,3
87.2163 15 4, 2 ,5
85.5137 19 3, 1 ,4
77.0827 21 3
48.3603 23 6 ,5
93.3001 24 1
56.8649 34 1 ,2
255.1997 38 4, 6 , 2
88.8304 39 6, 1 ,4
Savitzky 23 6 33.2356 79.5435 23 5 13.4774 2 22.7688

-Golay 54.7429 34 3, 4, 1 ,5 13 290.4568
2nd 270.5364 38 1 ,3 , 2 ,4 15 36.1492
derivati 66.2157 39 2, 1
Table C21. MSPC of multiblock PLS models of lower strength blends (n = 39
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 39 6 24.1641 - - - 13.4774 17 31.0109
data 22 24.0181
SNV 19 6 40.4016 336.9790 1 1, 2 13.4774 10 14.8050

detrend 454.4193 2 1, 2 12 41.5443
478.3653 3 1, 2 18 34.5997
536.2881 4 1, 2
441.1569 5 1, 2
490.5406 6 1, 2
511.1302 7 1, 2
392.6983 8 1 ,2
327.7269 9 1, 2
415.2102 10 1 ,4 .2
384.9173 11 1, 2
641.8274 12 1 ,5
722.5097 13 1 ,5 ,2
446.3537 14 1 ,2
713.7106 15 1 ,5
637.7805 16 1, 5
519.8125 17 2, 1 ,5
573.8687 18 1 ,2 , 5
Savitzky 28 6 28.6609 55.7324 4 2, 3 ,5 13.4774 4 240.0300

-Golay 145.1090 6 3, 2, 4, 6 6 24.4122
2nd 75.2717 7 3, 2 , 4 9 4 8.2270
derivati 30.7492 31 5, 3
ve 93.7048 33 1 ,2 ,5
108.5247 34 1 ,2 , 3 ,6
101.7803 35 1 ,2 ,5
Table C22. MSPC of multiblock PLS models of higher strength blends (n = 41
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 25 6 31.0476 98.8444 21 1 ,3 ,4 13.4774 9 33.8573
data 80.5131 26 1 .3 ,4 40 205.7495
117.2681 27 1 ,3 ,4
69.7281 30 1 ,3
66.5197 31 1
161.0069 32 1 ,4 ,3
68.6803 33 1
103.4204 34 1
69.5886 35 1 ,5
114.5315 36 1
108.2342 37 1
59.8452 39 1 ,3
80.2606 40 1 ,4
60.3487 41 1 ,3 ,5
SNV 35 6 25.3531 27.2447 9 6, 1 ,3 13.4774 9 20.4434

detrend 47.7170 13 6, 2, 1 19 55.4270
23 25.7298
38 50.0795
Savitzky 28 6 28.66 31.9045 13 3, 2 , 1 13.4774 7 21.0214

-Golay 78.6975 14 4 ,3 9 25.8174
2nd
43.3303 19 4 ,3 13 21.4501
derivati 99.0945 28 6, 3, 4, 1 23 160.3362
ve 80.8732 29 6, 3 ,4
71.2137 38 3, 6, 1
Table C23. MSPC of multiblock PLS models of lower strength tablets (n = 39
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 27 6 29.3665 56.1923 33 1 ,5 , 2 ,3 13.4774 10 18.4722
data 93.9374 34 1 ,5 , 2 ,3 11 26.2823
98.3632 35 1 ,2 37 29.5160
32.9918 39 1 ,4 ,5
SNV 29 6 28.0336 33.3677 25 5, 4 ,2 13.4774 26 24.2920

detrend 79.7102 33 1 ,2 36 74.5298
104.7714 34 1 ,2 ,6
117.9409 35 1 ,2 ,6
Savitzky 25 6 31.0476 39.3699 3 4, 3 ,2 13.4774 1 16.3947

-Golay 56.3101 25 3, 1 ,4 , 2 ,6 4 13.8552
2nd
32.2647 26 3 ,6 7 15.1962
derivati 102.7039 33 1 ,4 ,6
109.6028 34 1 ,5 ,2
132.3347 35 1 ,2
Table C24. MSPC of multiblock PLS models of higher strength tablets (n = 41
batches).
Data Set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. | Batch | Anderson's normal approx.
Raw 40 6 23.9074 - - - 13.4774 2 99.4951
data 4 25.4433
32 21.5128
SNV 24 6 32.0642 88.5502 1 1 ,4 13.4774 4 42.3555

detrend 32.2035 8 6, 1 .5 7 14.6677
81.4838 9 4. 3, 2, 1 ,6 15 108.0377
110.2670 13 1 ,2 ,4 35 28.6275
64.2434 14 1 ,6 ,2
67.3319 15 1 ,2 , 6 ,5
82.7787 19 1 ,6 ,2 , 3
42.9276 21 6, 4, 2, 3
35.1013 23 5, 2, 1
66.7054 34 2, 3 ,5
108.2507 38 5, 4 , 6
48.6369 39 5, 6, 1
Savitzky 21 6 36.1890 62.2460 1 1 ,6 ,3 13.4774 4 30.13

-Golay 111.8127 8 1 ,6 , 4 ,5 13 1144.13
2nd
91.5308 9 4, 1 ,3 15 23.93
derivati 194.0301 13 1 ,4 , 6 ,5 38 14.23
123.3179 16 1 ,3 ,4
123.3179 17 1 ,3 ,4
123.3179 18 1 ,3 ,4
91.4559 19 1 ,3 ,4
85.7615 21 4, 1 ,6
51.9810 34 4, 3 ,2
64.5257 38 4, 5 ,3
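
The T² limits tabulated in Tables C17 to C24 appear numerically consistent with a commonly used F-distribution based limit, T²(n, m-n, 99%) = n(m² − 1) / (m(m − n)) × F(0.99; n, m − n), where m is the number of control batches and n the model rank. The sketch below checks this against one tabulated value. It is an inference from the numbers, not a formula quoted from this appendix. The function name is hypothetical, and the F critical value of 3.41 for (6, 33) degrees of freedom is an assumed approximation that would in practice be read from F tables or computed with a statistics library.

```python
def t2_limit(n_components, n_batches, f_critical):
    """F-based Hotelling T2 control limit (assumed form).

    n_components: model rank n
    n_batches:    number of control batches m
    f_critical:   upper critical value of F with (n, m - n) degrees of freedom
    """
    n, m = n_components, n_batches
    return n * (m * m - 1) / (m * (m - n)) * f_critical


# Table C17, raw data: 39 control batches, rank 6, tabulated limit 24.1641.
F_CRIT_6_33 = 3.41  # assumed approximate 99% critical value of F(6, 33)
limit_c17 = t2_limit(6, 39, F_CRIT_6_33)
print(round(limit_c17, 2))
```

Because the multiplier n(m² − 1) / (m(m − n)) grows as the control set shrinks, the same model rank gives a larger limit for smaller control sets, which matches the trend across the tables (for example 46.8408 with 17 control batches against 24.1641 with 39).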
Measurement of the cumulative particle size distribution of
microcrystalline cellulose using near infrared reflectance
spectroscopy
Andrew J. O'Neil,* Roger D. Jee and Anthony C. Moffat
Centre for Pharmaceutical Analysis, The School of Pharmacy, University of London, 29-39
Brunswick Square, London, UK WC1N 1AX
Received 14th September 1998, Accepted 9th November 1998
The cumulative particle size distribution of microcrystalline cellulose, a widely used pharmaceutical excipient, was
determined using near infrared (NIR) reflectance spectroscopy. Forward angle laser light scattering measurements
were used to provide reference particle size values corresponding to different quantiles and then used to calibrate
the NIR data. Two different chemometric methods, three wavelength multiple linear regression and principal
components regression (three components), were compared. For each method, calibration equations were produced
at each of eleven quantiles (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95%). NIR predicted cumulative frequency
particle-size distributions were calculated for each of the calibration samples (n = 34) and for an independent test
set (n = 23). The NIR procedure was able to predict the distributions obtained via forward angle laser light scattering.
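
The calibration scheme summarised above, one regression equation per quantile, can be sketched as follows. This is an illustrative outline only: it uses synthetic, noise-free data, assumes a base-10 logarithm of particle size as the response, and takes the three reflectance wavelengths as already selected. The function and variable names are hypothetical and do not come from the paper.

```python
import numpy as np

QUANTILES = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]


def design(reflectance):
    """Design matrix: intercept plus reflectance at three chosen wavelengths."""
    return np.column_stack([np.ones(len(reflectance)), reflectance])


def fit_calibrations(reflectance, log_sizes):
    """Least-squares fit of one three-wavelength equation per quantile (one column each)."""
    coef, *_ = np.linalg.lstsq(design(reflectance), log_sizes, rcond=None)
    return coef


def predict_sizes(coef, reflectance):
    """Predicted particle size at every quantile, back-transformed from log10."""
    return 10.0 ** (design(reflectance) @ coef)


# Synthetic example: intercepts rise with quantile, slopes are shared.
rng = np.random.default_rng(0)
true_coef = np.vstack([
    np.linspace(1.0, 2.3, len(QUANTILES)),
    np.tile([[-0.5], [0.3], [-0.2]], (1, len(QUANTILES))),
])
r_cal = rng.uniform(0.2, 0.8, size=(34, 3))    # 34 calibration samples
r_test = rng.uniform(0.2, 0.8, size=(23, 3))   # 23 independent test samples

coef = fit_calibrations(r_cal, design(r_cal) @ true_coef)
predicted = predict_sizes(coef, r_test)
expected = 10.0 ** (design(r_test) @ true_coef)
```

Each row of `predicted` is then a cumulative particle size distribution for one sample, with one size per quantile. Principal components regression would replace the three selected wavelengths with three score vectors computed from the full spectrum.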
Measurement of the particle size distribution of powdered pharmaceutical raw materials is an important task that must be performed prior to manufacturing processes. This is because the distribution determines physical properties such as powder flow (Hausner ratio), dissolution rate and compressibility.‘ With microcrystalline cellulose, a range of different grades are commercially available, each with different physico-chemical properties.^ These grades are classified by their median particle size and by their cumulative particle size distribution.^ Each grade should have a nominal median particle size and a cumulative particle size distribution which agrees with the range set by a pharmacopoeial monograph.^

Typically, particle size analysis of this material is by forward angle laser light scattering (FALLS) or sieve analysis.^ However, a drawback with these methods is that the analysis is time consuming and sample destructive.^ Recently, a near infrared (NIR) spectroscopic method of analysis has been described which is capable of measuring the median particle size using a two wavelength multiple linear regression of NIR reflectance, R, and FALLS data.^

The best calibration results were obtained using the logarithm of the FALLS median particle size versus reflectance data.^ Taking this method a step further, it should be possible to produce calibrations for particle size quantiles other than just 50% (median size), for example, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 95%. These calibrations would be produced in the same manner as described in our previous paper,^ and thus permit the measurement of a sample’s cumulative frequency particle-size distribution from its NIR spectrum. A manufacturer should then have sufficient information to assess the likely physico-chemical properties of the sample.^

The aim of this work was to measure the cumulative percentage frequency particle-size distribution of microcrystalline cellulose. Two chemometric methods of calibration were compared: three wavelength multiple linear regression (MLR)^ and principal components regression (PCR)^ using three principal components, each model using log (FALLS particle size) values and NIR reflectance data. The robustness of the calibrations was assessed using an independent validation set.

Experimental

Instrumentation

NIR measurements were made using an FT-NIR NIRVIS spectrometer (Model 100.1, Buhler, Uzwil, Switzerland) fitted with a Buhler fibre-optic probe (Model 110.2). Reflectance spectra were recorded over the range 4008-9996 cm⁻¹ (500 data points), each spectrum being the average of six scans. Particle size data were acquired by forward angle laser light scattering using a Malvern 2600C particle sizer (Malvern Instruments, Malvern, UK). Sieve fractions were produced using a machine sieve (Endecotts, London, UK).

Materials

The grades of microcrystalline cellulose used were Avicel PH101 (16 batches), Avicel PH102 (19 batches) and Avicel PH200 (single batch), all from FMC International (Wallingstown, Little Island, Co. Cork, Ireland).

Sample preparation and presentation

Sieve fractions from single batches of Avicel PH101, Avicel PH102 and Avicel PH200 were produced by machine sieving using a nest of progressively finer stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm). In addition, material falling through the 32 µm sieve was collected.

The sieved fractions and samples of all the original Avicel batches were particle sized by FALLS. Each sample was suspended in cold, distilled water with surfactant (dilute household detergent) prior to particle sizing and was gently shaken using a vortex mixer to prevent the formation of agglomerates. NIR diffuse reflectance measurements were made on the samples of sieved and bulk materials in narrow disposable glass vials to permit a consistent compaction pressure. Sieved and bulk materials were scanned on different days over the course of several weeks.
Analyst, 1999, 124, 33-36 33

Data analysis

Data were processed using in-house computer programs written in C and in Matlab 5 Scientific and Technical Programming language (Mathworks, Natick, MA, USA). The MLR program was based on the routine svdfit, available in the literature.* Programs were run on an Acer Pentium II 333 MHz machine.

Results and discussion

Preliminary investigation

The results of previous work^ have shown that useful calibrations for median particle size can be obtained by using NIR reflectance data with a logarithmic transform of the FALLS particle-size data, hence these data were used in this work.

With MLR calibrations, preliminary work showed that a three wavelength linear regression at any of the FALLS quantiles produced calibrations more robust than a two wavelength fit. It was therefore decided that three wavelength MLR calibrations would be employed subsequently. With PCR models, three principal components were required to obtain satisfactory calibrations and this was used for all subsequent calibrations.

Spectral characteristics

Each powdered sample exhibited an NIR reflectance spectrum with a curved baseline resulting from multiple scattering (Fig. 1). Across the spectrum of each sample, the apparent offset appears to increase and this has previously been attributed to variations in pathlength,^ which in turn is dependent on particle size and sample porosity.

Model generation

The FALLS instrument gives values of the cumulative percentage frequency particle-size distribution at 64 particle sizes (range 564-5.8 µm) at intervals which follow a geometric progression. For each sample, linear interpolation of the measured FALLS values was used to calculate the particle size values corresponding to the 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 95% quantiles. The samples exhibited a wide range of particle sizes at each quantile (Table 1) and a wide variety of distributional shapes. Of the 57 samples available, 34 were chosen at random for the calibration set; the remaining 23 samples were used as an independent validation set. To aid comparison of the two calibration methods, the same calibration and validation data were used for each method.

Multiple linear regression. Data from the calibration samples were used to generate calibration equations for each quantile by fitting the log d_x values to the NIR reflectance values according to the equation

\log d_x = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} + b_3 R_{\lambda_3} \qquad (1)

where d_x is the FALLS interpolated particle size at quantile x, R_λ is the reflectance at wavelength λ and b are the MLR coefficients. The selection of wavelengths was performed on a reduced data set of every other wavelength to reduce the computation time required. A full three wavelength search for each particle size quantile calibrated therefore used 250 of the 500 available wavelengths. This reduced the total computation time for all 11 calibrations to about 10 h, compared with an estimated 80 h if all 500 wavelengths had been searched.

For each calibration equation, the three chosen wavenumbers (Table 2) were those which gave the smallest standard error of calibration (SEC).® The optimum wavenumbers were similar for the 30-60% quantiles, but varied for the extreme quantiles.

The calibration equations were then used to predict the validation set (n = 23) to give an indication of the robustness of the method (Table 3).

Principal components regression. This calibration method required the generation of a principal components analysis (PCA) model. This consists of a set of new variables which are uncorrelated and represent linear combinations of the original NIR reflectance data.

Table 1 Particle size ranges at each quantile for the calibration and validation sets as determined by FALLS

                           Particle size/µm
Quantile    Calibration set (n = 34)         Validation set (n = 23)
(%)         Minimum   Median   Maximum       Minimum   Median   Maximum
 5            6.45     25.72    216.52         7.21     23.06    167.11
10            9.92     37.14    268.91        11.44     32.34    187.67
20           14.48     52.92    311.96        18.05     45.40    219.33
30           18.39     67.10    345.62        22.55     56.36    251.13
40           21.40     81.41    376.44        26.27     67.35    283.66
50           23.99     96.59    406.07        2&82      78.98    319.67
60           26.47    112.82    436.21        33.71     91.57    359.67
70           29.25    131.29    466.55        38.30    105.95    402.81
80           33.11    154.78    497.51        44.94    124.03    451.54
90           40.62    197.11    529.54        57.16    152.54    504.66
95           48.47    240.07    546.76        70.34    184.74    533.21
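The interpolation step described under Model generation can be sketched as follows. This is a minimal illustration, not the authors' in-house C or Matlab code. It assumes that the FALLS output is a cumulative percentage undersize at each measured size and that the interpolation is linear in size (the text does not say whether size or log size was used). The function names and the six-point demonstration distribution are hypothetical.

```python
# Sketch: particle size at fixed cumulative-percentage quantiles by linear
# interpolation of a measured cumulative distribution (assumed % undersize).

QUANTILES = (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95)


def quantile_size(sizes_um, cum_percent, q):
    """Return the particle size (µm) at cumulative percentage q."""
    # Sort the measured points by cumulative percentage so that neighbouring
    # pairs bracket each quantile.
    pairs = sorted(zip(cum_percent, sizes_um))
    for (p_lo, d_lo), (p_hi, d_hi) in zip(pairs, pairs[1:]):
        if p_lo <= q <= p_hi:
            if p_hi == p_lo:
                return d_lo
            frac = (q - p_lo) / (p_hi - p_lo)
            return d_lo + frac * (d_hi - d_lo)
    raise ValueError("quantile lies outside the measured range")


def quantile_profile(sizes_um, cum_percent):
    """Sizes at the eleven quantiles used for calibration in the text."""
    return {q: quantile_size(sizes_um, cum_percent, q) for q in QUANTILES}


# Hypothetical distribution: sizes in geometric progression, as on the
# instrument, but with only six points instead of 64.
demo_sizes = [5.8, 11.6, 23.2, 46.4, 92.8, 185.6]
demo_cum = [2.0, 12.0, 35.0, 70.0, 92.0, 100.0]
profile = quantile_profile(demo_sizes, demo_cum)
```

Each sample then contributes one value of d_x per quantile, and the logarithm of that value is the quantity regressed against the NIR data.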
Fig. 1 NIR spectra of microcrystalline cellulose samples with different particle-size distributions and median particle sizes: (a) 24, (b) 45.8, (c) 93.4, (d) 261 and (e) 406 µm. (Plot not reproduced; x-axis: wavenumber/cm⁻¹, 4500-9500.)

Table 2 MLR wavelengths and PCs selected for each percentage quantile calibration

Percentage    MLR wavenumber/cm⁻¹       PCs
 5            4008   9300   9528        28   22   17
10            4008   9300   9528        29   27   14
20            5640   5676   6216        29   27   14
30            4464   9852   9864        29   28   27
40            5736   9852   9864        27   15    1
50            5736   9852   9864        20   15    1
60            5496   9852   9864        20   15    1
70            6024   6948   9168        15   14    1
80            5664   5796   9432        23    9    3
90            5952   6996   8280        28   18    6
95            7632   8532   8664        28   18    6
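The exhaustive search that yields wavenumber triples such as those in Table 2 can be sketched as below. This is an illustration under stated assumptions, not the authors' svdfit-based program: the least-squares fit uses plain normal equations rather than singular value decomposition, SEC is assumed to be the residual standard deviation with n - 4 degrees of freedom, and all function names are hypothetical. Searching every other point means C(250, 3) = 2 573 000 fits per quantile instead of C(500, 3) = 20 708 500, a factor of about 8, which is consistent with the quoted 10 h against an estimated 80 h.

```python
# Sketch: exhaustive three-wavelength MLR search minimising SEC.
import math
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(spectra, log_d, idx):
    """Fit log d = b0 + sum_k b_k * R[idx_k]; return (coefficients, SEC)."""
    rows = [[1.0] + [s[i] for i in idx] for s in spectra]
    p = len(idx) + 1
    dof = len(rows) - p
    if dof <= 0:
        raise ValueError("too few samples for this many terms")
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, log_d)) for i in range(p)]
    b = solve_linear(ata, aty)
    rss = sum((y - sum(bi * ri for bi, ri in zip(b, r))) ** 2
              for r, y in zip(rows, log_d))
    return b, math.sqrt(rss / dof)  # assumed SEC definition


def search_wavelengths(spectra, log_d, n_terms=3, step=2):
    """Try every combination of n_terms data points, taking every step-th point."""
    best = None
    for idx in combinations(range(0, len(spectra[0]), step), n_terms):
        try:
            b, sec = fit_mlr(spectra, log_d, idx)
        except ValueError:
            continue  # skip collinear combinations
        if best is None or sec < best[2]:
            best = (idx, b, sec)
    return best  # (indices, coefficients, SEC)
```

In use, `spectra` would hold one reflectance list per calibration sample and `log_d` the logarithm of the FALLS particle size at one quantile; the search is repeated for each of the eleven quantiles.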

The PCA model, X, was obtained as the product of a score matrix, T, with a loadings matrix, U, plus a residuals matrix, E:

X = TU + E \qquad (2)

where X represents the original spectral data. Principal components (PCs) are arranged such that the first represents the variable describing the largest amount of variance in the data set, the next represents the largest residual variance, and so on until all PCs are extracted. Regression of FALLS data was as described above for MLR, except that PC scores were used in place of reflectance values. For each calibration, the three PCs selected were those which gave the highest correlations with the FALLS data (Table 2). The total time required to compute PCs and PCR calibration equations was much shorter than for MLR, being only about 20 min.

Calibration and validation precision

With both methods, individual calibrations were the most precise at the 40% and 50% quantiles (Table 3). This is clearly seen from the plot of SEC versus percentage quantile (Fig. 2). The decrease in the precision of individual calibrations at the extreme quantiles probably reflects the shape of the distribution curves for the particle sizes in the calibration sets. The shapes of the distributions become more skewed at the extreme quantiles.

With both MLR and PCR, excellent calibration results were obtained, with low SECs (Table 3). The SECs at each quantile are smaller with MLR; however, the standard errors of prediction (SEPs)^ for the independent validation set are smaller with the PCR model (Table 3). This suggests that the PCR model is more robust. Table 3 also gives the slopes and intercepts for the plots of NIR predicted log d_x versus FALLS measured log d_x values at each quantile. The slopes and intercepts were not significantly (5% probability level) different from 1 and 0, respectively, apart from a few values (indicated with superscript b) which occurred at some of the extreme quantiles.

Fig. 2 Standard errors of calibration (SEC) versus cumulative percentage quantile: (A) MLR; and (B) PCR. (Plot not reproduced; x-axis: percentage quantile, 0-100; y-axis: SEC, 0.03-0.12.)

Table 3 MLR and PCR calibration and validation results at various percentage quantiles

                                             Percentage
Parameterᵃ            5       10      20      30      40      50      60      70      80      90      95
MLR calibration set (n = 34)
R                     0.977   0.980   0.984   0.987   0.989   0.988   0.984   0.979   0.972   0.932   0.889
m                     0.954   0.960   0.968   0.974   0.978   0.975   0.968   0.958   0.945   0.869ᵇ  0.791ᵇ
c                     0.065   0.063   0.055   0.048   0.042   0.049   0.065   0.088   0.121   0.301ᵇ  0.497ᵇ
SEC [log(d_x/µm)]     0.084   0.071   0.055   0.046   0.039   0.039   0.042   0.046   0.052   0.080   0.104
RSD (%)               19.3    16.3    12.7    10.6    9.0     9.0     9.7     10.6    12.0    18.4    23.9
MLR validation set (n = 23)
R                     0.951   0.951   0.965   0.971   0.959   0.955   0.950   0.959   0.964   0.897   0.822
m                     0.876   0.950   0.977   0.978   0.984   0.973   0.943   0.986   0.980   0.921   0.734ᵇ
c                     0.140   0.064   0.046   0.021   0.013   0.040   0.108   0.050   0.057   0.216   0.657ᵇ
SEP [log(d_x/µm)]     0.131   0.109   0.074   0.066   0.074   0.073   0.073   0.070   0.061   0.106   0.132
RSD (%)               30.1    25.1    17.0    15.2    17.0    16.8    16.8    16.1    14.0    24.4    30.4
PCR calibration set (n = 34)
R                     0.969   0.973   0.978   0.980   0.981   0.981   0.976   0.968   0.959   0.898   0.858
m                     0.939   0.946   0.956   0.960   0.963   0.961   0.953   0.937   0.921   0.806   0.737
c                     0.086   0.084   0.076   0.074   0.070   0.077   0.096   0.133   0.174   0.445ᵇ  0.627ᵇ
SEC [log(d_x/µm)]     0.096   0.082   0.065   0.057   0.051   0.049   0.051   0.057   0.062   0.097   0.116
RSD (%)               22.1    18.9    15.0    13.1    11.7    11.3    11.7    13.1    14.3    22.3    3&8
PCR validation set (n = 23)
R                     0.980   0.981   0.981   0.978   0.984   0.981   0.969   0.965   0.953   0.924   0.842
m                     1.128ᵇ  1.045   0.998   0.969   0.967   0.970   0.977   0.975   0.946   0.932   0.861
c                     -0.16ᵇ  -0.071  0.005   0.053   0.051   0.042   0.024   0.034   0.079   0.092   0.258
SEP [log(d_x/µm)]     0.085   0.062   0.051   0.050   0.041   0.045   0.056   0.057   0.071   0.094   0.124
RSD (%)               19.6    14.3    11.7    11.5    9.4     10.4    12.9    13.1    16.3    21.6    28.5

ᵃ R is multiple correlation coefficient, m and c are slope and intercept of plots of NIR predicted log d_x versus FALLS measured log d_x; n is the number of samples in each data set. ᵇ m significantly different from 1, or c significantly different from 0.

Cumulative particle size distributions

The percentage quantile value was plotted against the NIR predicted log d_x of each sample in the calibration and validation sets to give cumulative particle-size distribution curves for both the MLR and PCR methods. The MLR and PCR results for the first four validation samples are shown in Fig. 3, which also shows the FALLS measured cumulative percentage frequency distributions overlaid. Predicted distributions for both calibration methods closely follow those obtained by FALLS, although PCR predicted distributions match the FALLS measured distributions more closely than with MLR.

In this work, the number of quantiles at which calibration equations were set up was restricted to 11. In principle, more or

less could be used. With the present data sets the errors do not justify the need for smaller intervals (Table 3).

Conclusion

NIR spectroscopy may be used to measure the cumulative percentage frequency particle-size distribution of powdered microcrystalline cellulose. This represents a development over previous studies which have focused on measurement of only the median or mean particle size.^-'o-^^ Although setting up the calibration equations is time consuming, once generated they allow the rapid determination of the cumulative frequency distribution of subsequent samples. Both MLR and PCR provide excellent results; however, the PCR method is computationally faster and slightly more robust. The method should be applicable to other powdered pharmaceutical materials.

The authors are grateful to Buhler for the loan of the NIR instrument and to Mathworks for providing Matlab 5 software. They thank P. A. Hailey, Pfizer, Sandwich, UK and FMC International for advice and providing samples of pharmaceutical excipients and A. J. O’Neil thanks Pfizer for a research grant. Kevin Taylor and Keith Barnes, The School of Pharmacy, University of London, are thanked for assistance with forward angle laser light scattering and sieving.
Fig. 3 Cumulative percentage frequency particle-size distributions for the first four validation samples with FALLS measured values overlaid. Sample 1: (A) MLR and (B) PCR. Sample 2: (C) MLR and (D) PCR. Sample 3: (E) MLR and (F) PCR. Sample 4: (G) MLR and (H) PCR. (Plots not reproduced; x-axis: particle size/µm, logarithmic scale; y-axis: 0-100.)

References
1 C. Washington, Particle Size Analysis in Pharmaceutics and Other Industries, Ellis Horwood, New York, 1992.
2 Handbook of Pharmaceutical Excipients, ed. A. Wade and P. J. Weller, American Pharmaceutical Association, Washington, DC and Pharmaceutical Press, London, 2nd edn., 1994.
3 British Pharmacopoeia 1993, H.M. Stationery Office, London, 1993, vol. 1.
4 M. E. Aulton, Pharmaceutics: the Science of Dosage Form Design, Churchill Livingstone, Edinburgh, 1988.
5 P. A. Hailey, P. Doherty, P. Tapsell, T. Oliver and P. K. Aldridge, J. Pharm. Biomed. Anal., 1996, 14, 551.
6 A. J. O’Neil, R. D. Jee and A. C. Moffat, Analyst, submitted for publication.
7 B. G. Osborne, T. Fearn and P. H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, Longman, Harlow, 2nd edn., 1993.
8 W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes in C. The Art of Scientific Computing, Cambridge University Press, Cambridge, 2nd edn., 1992.
9 H. Mark, Principles and Practice of Spectroscopic Calibration, J. Wiley, New York, 1991.
10 J. L. Ilari, H. Martens and T. Isaksson, Appl. Spectrosc., 1988, 42, 722.
11 E. Ciurczak, P. Torlini and P. Demkowicz, Spectroscopy, 1986, 1, 36.
12 P. Frake, C. N. Luscombe, D. R. Rudd, J. Waterhouse and U. A. Jayasooriya, Anal. Commun., 1998, 35, 133.

Paper 8/07134I

The application of multiple linear regression to the
measurement of the median particle size of drugs and
pharmaceutical excipients by near-infrared spectroscopy
Andrew J. O’Neil,* Roger D. Jee and Anthony C. Moffat
Centre for Pharmaceutical Analysis, The School of Pharmacy, University of London, 29/39
Brunswick Square, London, UK WC1N 1AX
Received 30th July 1998, Accepted 14th September 1998
A number of powdered drugs and pharmaceutical excipients were used to demonstrate the ability of near-infrared
spectroscopy to measure median particle size (d50). Sieved fractions and bulk samples of aspirin, anhydrous
caffeine, paracetamol, lactose monohydrate and microcrystalline cellulose were particle sized by forward angle
laser light scattering (FALLS) and scanned by fibre-optic probe FT-NIR spectroscopy. Two-wavenumber multiple
linear regression (MLR) calibrations were produced using NIR reflectance, absorbance and Kubelka-Munk
function data with each of median particle size, reciprocal median particle size and the logarithm of median
particle size. Best calibrations were obtained using reflectance data versus the logarithm of median particle size
(NIR predicted ln d50 versus ln(FALLS d50) for microcrystalline cellulose and lactose monohydrate sieve fraction
calibrations: r = 0.99 in each case). Working calibrations for lactose monohydrate (median particle size range:
19.2-183 µm) and microcrystalline cellulose (median particle size range: 24-406 µm) were set up using
combinations of machine sieve-fractions and bulk samples. This approach was found to produce more robust
calibrations than just the use of sieved fractions. The method has been compared with single wavenumber
quadratic least squares regression using reflectance and mean-corrected reflectance data with median particle size.
Correlation between NIR predicted and FALLS values was significantly better using the MLR method.
The measurement of particle size for pharmaceutical materials is important^ ' 2 because it influences bulk physical properties,^ and determines the ability of powders to flow, mix, granulate and dissolve. It is also often a requirement in pharmaceutical manufacturing processes that particle size measurements are performed on raw materials.^ Commonly employed methods of measurement are forward angle laser light scattering (FALLS) and electrical zone sensing.^ A disadvantage with these methods is that samples generally need to be analysed away from the production area, which is time consuming and leads to manufacturing delays. These problems can be effectively overcome by the use of near-infrared spectroscopy (NIRS).^ This technique has the advantages that in addition to providing physical information about the raw material, such as its particle size,^ it can simultaneously provide useful chemical information, * - ' 9 and the analysis can be performed in a few minutes in the pharmaceutical warehouse using fibre-optic probe instruments.^

With diffusely reflecting materials, such as pharmaceutical powders, scattering of light in the NIR region (4000 to 10 000 cm⁻¹) produces spectra with non-uniform baselines and varying offsets. These scatter effects vary with the particle size,’® sample porosity® (and hence compaction pressure) and with the wavelength^® and can be described using Rayleigh and Mie theory,21 or alternatively using the Kubelka-Munk theory of diffuse reflectance.21

Previous studies that have examined the effects of particle size on NIR spectra have demonstrated that reflectance varies non-linearly with particle size.2*-25 Ciurczak et al.3^ found that reflectance exhibited an inverse relationship with mean particle size in agreement with Mie theory.^’ However, this relationship does not necessarily apply in all cases and is dependent on the shape of the particle size distribution of the sample,^’ the particle shape?* and the material's refractive index.?* The presence of very small particles will further complicate the relationship as these may exhibit Rayleigh scatter, which is proportional to the third power of the particle size.?*

The aim of this study has been to measure the number median particle size, d50,* in drugs and pharmaceutical excipients by NIR spectroscopy using relatively simple chemometrics. Single wavenumber quadratic least squares regression of NIR and particle size data has been compared with a two-wavenumber multiple linear regression (MLR) of the same data.?®- ?? The effect of different pre-treatments of NIR data and FALLS data on calibration has also been investigated.

Experimental

Instrumentation

NIR measurements were made using a FT-NIR NIRVIS (No. 100.1, Buhler AG, Uzwil, Switzerland) spectrometer fitted with a fibre-optic probe (No. 110.2). Spectra were recorded over the range 4008 to 9996 cm⁻¹ (500 data points), each spectrum being the average of six scans. Particle size data were acquired by FALLS using a Malvern 2600C particle-sizer (Malvern Instruments Ltd, Malvern, UK). A Philips XL20 scanning electron microscope (Philips Electron Optics, Eindhoven, Netherlands) was used to determine particle shapes. Sieve fractions were produced using a machine sieve (Endecotts Ltd, London, UK) and an air-jet sieve (Alpine, Augsburg, Germany).

Materials

Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co, St Louis, MO, USA) and paracetamol (Boots Pharmaceuticals, Nottingham, UK) were used. Microcrystalline cellulose: Avicel PH101 (16 batches), Avicel PH102 (19
Analyst, 1998, 123, 2297-2302 2297

batches) and Avicel PH200 (single batch) were all from FMC International (Wallingstown, Little Island, Co Cork, Ireland). The batches of lactose monohydrate used were a single batch of a reagent grade material (Avocado Research Chemicals Ltd, Heysham, UK), 9 samples of 110 mesh obtained from two manufacturers (DMV International, Veghel, Netherlands and Lactose New Zealand, Hawera, New Zealand) and 18 samples of Fast-Flo (Foremost Ingredients Group, Baraboo, WI, USA).

Fig. 1 NIR reflectance spectra for different median particle sizes. A, microcrystalline cellulose: a, 24 µm; b, 45.8 µm; c, 93.4 µm; d, 261 µm; e, 406 µm. B, lactose monohydrate: a, 44.7 µm; b, 66.3 µm; c, 98 µm; d, 132 µm; e, 168 µm. (Plots not reproduced; x-axis: wavenumber/cm⁻¹, 4000-10000.)

Fig. 2 Single wavenumber quadratic least squares fit of NIR spectral data and median particle size, d50. A, microcrystalline cellulose reflectance data; B, microcrystalline cellulose mean-corrected reflectance data; C, lactose reflectance data, and D, lactose mean-corrected reflectance data. (Plots not reproduced; x-axis: FALLS d50/µm.)

Sample preparation and presentation

A range of aspirin samples of different particle size distributions were obtained by grinding the coarse bulk material with a mortar and pestle. Samples were taken successively with every few minutes of grinding. The ground aspirin samples were air-jet sieved to remove fines.

Air-jet sieve fractions of the aspirin, anhydrous caffeine and paracetamol were produced using stainless steel wire mesh sieves of different sieve diameter (75, 56, 50, 40 and 36 µm). With use of the smallest air-jet sieve, an additional sample was collected from the sieve filter paper.

Sieve fractions of a single batch of Avicel PH101, Avicel PH102, Avicel PH200 and the reagent grade lactose monohydrate were produced by machine sieving using a nest of progressively finer stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm). In addition, material falling through the 32 µm sieve was collected.

These size fractions and the remaining batches of bulk lactose 110 mesh, Fast-Flo, Avicel PH101 and Avicel PH102 were particle sized by FALLS. A sample from each was suspended in a practically insoluble disperse medium with surfactant (sorbitan trioleate or dilute household detergent) prior to particle sizing and was gently shaken using a vortex-mixer to prevent formation of agglomerates. Avicel and aspirin samples were dispersed in cold, distilled water using dilute detergent.

Anhydrous caffeine and lactose monohydrate were suspended in cyclohexane with sorbitan trioleate. Paracetamol was suspended in pentane with sorbitan trioleate. NIR diffuse reflectance measurements were made on the samples of sieved, ground and bulk material in narrow disposable glass vials to enable consistent compaction pressure. Sieved and bulk materials were scanned on different days, over the course of several weeks.

Data analysis

Data were processed using code programmed in Matlab 5 Scientific and Technical Programming language (The Mathworks Inc, Natick, MA, USA).

Results and discussion

Preliminary results

Reflectance at any wavenumber versus d50 or 1/d50 exhibited a curvilinear relationship. To allow for this, two different approaches were compared: single wavenumber quadratic least squares regression and full two-wavenumber search MLR. With each of these calibration methods, different pretreatments of the NIR spectral and FALLS d50 data were applied and their effects on standard errors of calibration and prediction (SEC and SEP respectively), bias and linearity investigated.

Quadratic least squares fits of NIR spectral and FALLS d50 data were used to allow for gentle curvature in calibrations. The NIR spectral data were diffuse reflectance (of infinite thickness for all practical purposes), R; mean corrected reflectance (where the mean reflectance value of an individual spectrum is subtracted from the reflectance at each of its spectral wavenumbers); absorbance, log (1/R) and Kubelka-Munk function, f(R):

f(R) = \frac{(1 - R)^2}{2R} = \frac{K}{S} \qquad (1)

where K and S are the Kubelka-Munk absorption and scatter coefficients respectively. A search of all 500 datapoints was used to select the wavenumber giving the smallest SEC for each NIR data pretreatment.

The second calibration technique applied was two wavenumber MLR. A search of all combinations of two wavenumbers from the 500 measured by the spectrometer was carried out. The NIR spectral data used were again reflectance, mean-corrected reflectance, absorbance and Kubelka-Munk function. FALLS data pretreatments investigated were d50, 1/d50 and the ln(FALLS d50). All possible combinations of pretreated NIR and FALLS data were tested.

Investigation with both calibration methods revealed that the reflectance data recorded by the NIR spectrometer produced calibrations with significant correlation, low scatter and low bias. Kubelka-Munk function data and absorbance also showed significant correlation between NIR predicted d50 and FALLS d50, however these pretreatments introduced significant fixed bias into the calibration. Reflectance data were therefore subsequently used. The ln(FALLS d50) was found to give two wavenumber MLR calibrations with a lower SEC than 1/d50 or d50 and were the FALLS data used in subsequent MLR calibrations.

To demonstrate the feasibility of the MLR calibration method, sieve fractions of a single batch of three drugs were tried initially (aspirin, anhydrous caffeine and paracetamol). Subsequent working MLR calibrations for the two pharmaceutical excipients (microcrystalline cellulose and lactose monohydrate) were produced from a larger data set using either machine sieve fractions or a combination of machine sieve fractions and bulk samples from a number of different batches.

Table 1 Microcrystalline cellulose and lactose MLR calibration (sieve fraction data) and validation (bulk sample data) results

Material                 Microcrystalline cellulose          Lactose monohydrate
b0ᵃ                      -3.99                               4.59
b1ᵃ                      59.65                               -172.3
b2ᵃ                      -57.99                              167.8
Wavenumber 1 (cm⁻¹)      8244                                6012
Wavenumber 2 (cm⁻¹)      5964                                5940
SEC [ln(d50/µm)]         0.067                               0.097
SEP [ln(d50/µm)]         0.17                                0.18

ln P = c + m ln(FALLS d50)
Calibration setᵇ
r                        0.99                                0.99
m                        0.99                                0.98
c                        0.035                               0.068
n                        24 (PH101, PH102, PH200 sieved)     15 (Sieved and 110 mesh)
Validation setᵇ
r                        0.84                                0.014
m                        0.96                                0.018
c                        0.17                                4.38
n                        33 (PH101 and PH102, bulk)          18 (Fast-Flo samples)

ᵃ MLR coefficients: b0, intercept; b1, wavenumber 1, and b2, wavenumber 2. ᵇ r is correlation coefficient; m and c are slope and intercept, respectively, of plots of NIR predicted ln d50 vs. FALLS measured ln d50; n number of samples in each data set.
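As a worked illustration of how a calibration such as those in Table 1 would be applied, the sketch below assumes the two-wavenumber form ln d50 = b0 + b1 R1 + b2 R2, where R1 and R2 are the reflectances at wavenumbers 1 and 2. The coefficients are copied from Table 1 as they appear in this copy; the function name and the example reflectance values are hypothetical and are not measurements from the study.

```python
# Sketch: predicting median particle size from two reflectance readings,
# assuming ln(d50) = b0 + b1*R1 + b2*R2 with the Table 1 coefficients.
import math

# material: (b0, b1, b2, wavenumber 1 / cm^-1, wavenumber 2 / cm^-1)
MLR_COEFFS = {
    "microcrystalline cellulose": (-3.99, 59.65, -57.99, 8244, 5964),
    "lactose monohydrate": (4.59, -172.3, 167.8, 6012, 5940),
}


def predict_d50(material, r1, r2):
    """Return the NIR predicted d50 in µm for reflectances r1 and r2."""
    b0, b1, b2, _, _ = MLR_COEFFS[material]
    ln_d50 = b0 + b1 * r1 + b2 * r2
    return math.exp(ln_d50)


# Hypothetical reflectances at 8244 and 5964 cm^-1 for one sample.
example_d50 = predict_d50("microcrystalline cellulose", 0.60, 0.47)
```

Because the two coefficients are large and of opposite sign, the prediction depends mainly on the difference between the two reflectances, so a small reading error at either wavenumber changes the predicted size appreciably.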
Fig. 3 Feasibility study. Results of MLR calibration. NIR measured median particle size, d50, versus FALLS d50: A, aspirin; B, anhydrous caffeine, and C, paracetamol. (Plots not reproduced; x-axis: FALLS d50/µm.)
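The NIR data pretreatments compared under Preliminary results can be written compactly as below. This is a minimal sketch rather than the authors' Matlab code: the function names are hypothetical, reflectance is assumed to be expressed as a fraction between 0 and 1, and the logarithm in the absorbance is assumed to be base 10.

```python
# Sketch: the three pretreatments applied to a reflectance spectrum R.
import math


def mean_correct(spectrum):
    """Subtract the spectrum's own mean reflectance from every point."""
    mean = sum(spectrum) / len(spectrum)
    return [r - mean for r in spectrum]


def absorbance(spectrum):
    """Apparent absorbance, log(1/R), taken here as log base 10."""
    return [math.log10(1.0 / r) for r in spectrum]


def kubelka_munk(spectrum):
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R), equal to K/S."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in spectrum]
```

Mean-correction removes only a constant offset from each spectrum, whereas the absorbance and Kubelka-Munk transforms are non-linear in R and therefore change the shape of the relationship with particle size.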

Fig. 4 SEM results. A, Avicel PH101 bulk sample; B, Avicel PH200 > 200 µm sieve fraction; C, lactose monohydrate < 31 µm sieve fraction, and D, lactose monohydrate > 150 µm sieve fraction.

The quadratic least squares calibrations were performed using the microcrystalline cellulose and lactose monohydrate data sets as these had the largest number of data.

Spectral characteristics

Scans of each powdered sample exhibited the characteristic overlapping combinations and overtones arising from the fundamentals of the mid-infrared, with non-uniform baselines resulting from multiple scattering. Spectra also showed different offset values, which appear to increase with wavenumber (Fig. 1). This has previously been attributed to variation in pathlength,26 which is influenced by particle size and sample porosity.

Single wavenumber quadratic least squares calibration

Reflectance at any wavenumber showed a generally inverse linear trend with median particle size up to approximately 100 µm (Fig. 2), broadly agreeing with Mie and Fraunhofer theory for particles of comparable size to the wavelength.^' Beyond this particle size, the relationship becomes markedly non-linear (Fig. 2). A quadratic least squares fit of the data (Fig. 2) between reflectance and median particle size showed useful correlations between the NIR predicted and FALLS d50 [microcrystalline cellulose: r = 0.96, 9012 cm⁻¹ (n = 57); lactose monohydrate: r = 0.90, 7428 cm⁻¹ (n = 33)].

Mean-correction of each spectrum was found to improve correlation between NIR predicted and FALLS d50 with the microcrystalline cellulose data set [r = 0.98, 7128 cm⁻¹ (n = 57)]. This pre-treatment acts to centre the data of individual spectra (mean = 0) and can help to eliminate baseline differences that occur as a result of variable sample porosity and pressure applied with the fibre-optic probe. The variation in offset will also be influenced by the flow properties of the material. Use of this pretreatment is likely to be appropriate in single wavenumber least squares calibrations where the material exhibits variable compaction properties, such as with different grades of microcrystalline cellulose,2 » and also where the NIR measurements are recorded using a fibre-optic probe. However, with lactose monohydrate (Reagent grade, 110 mesh and Fast-Flo), which tends to have good flow and compaction properties,29 pretreatment was not appropriate and gave a poorer fit between NIR predicted and FALLS d50 [r = 0.68, 7056 cm⁻¹ (n = 33)].

MLR calibration using aspirin, anhydrous caffeine, and paracetamol

The results of these calibrations, which applied MLR to all two wavenumber combinations and contained 7 or 10 data points, clearly demonstrate a relationship between NIR reflectance and the ln(FALLS d50) (Fig. 3). The ln(FALLS d50) and reflectance data, R, were fitted to an equation of the general form:

$$\ln(\mathrm{FALLS}\ d_{50}) = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} \qquad (2)$$

where b0 is the intercept, and b1 and b2 are the MLR coefficients for the two wavenumbers, λ1 and
λ2 respectively. Significant linear association was found between NIR predicted ln d50 and
ln(FALLS d50) values in each case [aspirin: r = 0.99 (n = 7), anhydrous caffeine: r = 0.99
(n = 7) and paracetamol: r = 0.96 (n = 10); in each case p < 0.005].

Full two wavenumber search MLR using microcrystalline cellulose and lactose

With each of these materials, preliminary data processing revealed that MLR of reflectance versus
ln(FALLS d50) produced the most linear calibrations. In addition, mean-correction of these
spectra was not found to improve calibration results. The MLR two wavenumber model therefore
compensates for variation in baseline offset.

Particle size calibration using sieve fraction data. Before calibrations were attempted for both
materials, the data of each was split into two sets: a calibration set of mainly sieved fractions
[microcrystalline cellulose (n = 24) and lactose (n = 15)] and a validation set of bulk samples
[microcrystalline cellulose (n = 33) and lactose (n = 18)]. With each calibration set, highly
significant correlation (p < 0.005) was obtained between NIR predicted ln d50 and ln(FALLS d50)
values (Table 1). However, the validation sets for each material exhibited more scatter
(Table 1).

With the microcrystalline cellulose prediction set of bulk samples (Avicel grades PH101 and
PH102), significant correlation (p < 0.005) between NIR predicted ln d50 and ln(FALLS d50) was
obtained with a SEP greater than the SEC (Table 1). The high SEP is possibly accounted for by the
FALLS and scanning electron microscopy (SEM) results (Fig. 4A and 4B), which showed that these
bulk samples had broad distributions and comprised a mixture of irregularly shaped fines and
large spherical particles. Previous work has shown that this can produce more variable results
than the use of narrow or uniform-size distributions, as the NIR scattering and absorbing
properties of these particles will be different to those of median sized particles.

Validation of the lactose calibration used bulk samples of Fast-Flo from 18 different batches.
This spray dried material generally has a relatively uniform and spherical particle size.29 A
narrow range of d50 was confirmed by FALLS (range d50: 81.1-115.7 µm). Poor correlation was
obtained between NIR predicted ln d50 and ln(FALLS d50) with these samples and is probably due to
the narrow range of particle size in the prediction set, as the SEP is not significantly
different from that of the microcrystalline cellulose prediction set of bulk samples (Table 1).

SEM results of the lactose sieve fractions used in the calibration set showed small, irregularly
shaped fines in the smallest sieve fractions and large spherical particles in the largest sieve
fractions (Fig. 4C and 4D), much the same as with the microcrystalline cellulose calibration set.
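
As a rough illustration of the two wavenumber search described above, the following Python
sketch fits eqn. (2) for every pair of wavenumbers and keeps the pair with the lowest standard
error of calibration (SEC). It is not the authors' Matlab code. The function name
two_wavenumber_mlr, the synthetic reflectance data, the choice of lowest SEC as the selection
criterion and the SEC definition (residual sum of squares divided by n - 3) are all assumptions
made here for illustration.

```python
import itertools
import numpy as np


def two_wavenumber_mlr(R, ln_d50):
    """Search all wavenumber pairs for the two-term MLR fit with the lowest SEC.

    R      : (n_samples, n_wavenumbers) reflectance matrix
    ln_d50 : (n_samples,) natural log of the reference median particle size
    Returns (i, j, b, sec) where b = [b0, b1, b2] as in eqn. (2).
    """
    n = R.shape[0]
    best = None
    for i, j in itertools.combinations(range(R.shape[1]), 2):
        # Design matrix: intercept plus reflectance at the two candidate wavenumbers
        X = np.column_stack([np.ones(n), R[:, i], R[:, j]])
        b, *_ = np.linalg.lstsq(X, ln_d50, rcond=None)
        resid = ln_d50 - X @ b
        # Assumed SEC definition: three fitted parameters, so n - 3 degrees of freedom
        sec = np.sqrt(resid @ resid / (n - 3))
        if best is None or sec < best[3]:
            best = (i, j, b, sec)
    return best


# Synthetic data for illustration only (not the measurements reported in the paper)
rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 24, 20
ln_d50 = rng.uniform(np.log(20.0), np.log(300.0), n_samples)
offset = rng.normal(0.0, 0.05, n_samples)  # sample-to-sample baseline offset
R = 0.5 + offset[:, None] + rng.normal(0.0, 0.001, (n_samples, n_wavenumbers))
R[:, 5] -= 0.05 * ln_d50  # only column 5 responds to particle size

i, j, b, sec = two_wavenumber_mlr(R, ln_d50)
print(f"selected columns: {i}, {j}; SEC = {sec:.3f}")
```

In this synthetic case the second selected column carries only the baseline offset, so the
fitted coefficients take opposite signs and the offset cancels, which is one way to read the
statement that the two wavenumber model compensates for variation in baseline offset.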
Fig. 5 Results of microcrystalline cellulose MLR calibration with randomised sieve fraction and
bulk sample data. NIR measured median particle size, d50, versus FALLS d50 (x-axis: FALLS
d50/µm). A, Calibration set and B, validation set.

Fig. 6 Results of lactose monohydrate MLR calibration with randomised sieve fraction and bulk
sample data. NIR measured median particle size, d50, versus FALLS d50 (x-axis: FALLS d50/µm).
A, Calibration set and B, validation set.
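
The randomised calibrations shown in Fig. 5 and 6 assign about 67% of the spectra to a
calibration set and the remainder to a validation set, repeated three times. A minimal Python
sketch of that procedure is given below. The helper names, the synthetic data and the SEC and
SEP formulas (SEC with n - p - 1 degrees of freedom, SEP as the root mean square validation
error) are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b


def predict(b, X):
    """Apply fitted coefficients to new predictor rows."""
    return b[0] + X @ b[1:]


def random_split_validation(X, y, cal_fraction=0.67, repeats=3, seed=0):
    """Repeat a random calibration/validation split and report (SEC, SEP) each time."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_cal = int(round(cal_fraction * n))
    results = []
    for _ in range(repeats):
        order = rng.permutation(n)
        cal, val = order[:n_cal], order[n_cal:]
        b = fit_mlr(X[cal], y[cal])
        res_cal = y[cal] - predict(b, X[cal])
        res_val = y[val] - predict(b, X[val])
        sec = np.sqrt(res_cal @ res_cal / (n_cal - p - 1))  # assumed SEC definition
        sep = np.sqrt(res_val @ res_val / len(val))         # assumed SEP definition
        results.append((sec, sep))
    return results


# Synthetic stand-in: 57 samples, reflectance at two wavenumbers, ln(d50) as response
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (57, 2))
y = 4.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(0.0, 0.1, 57)

results = random_split_validation(X, y)
for k, (sec, sep) in enumerate(results, start=1):
    print(f"split {k}: SEC = {sec:.3f}, SEP = {sep:.3f}")
```

With 57 samples this split gives 38 calibration and 19 validation samples per repeat. Comparing
SEC with SEP across the repeats gives a simple check on how sensitive a calibration is to the
particular samples chosen.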

Particle size calibration using randomised sieve fraction and bulk sample data. To produce
working calibrations with the microcrystalline cellulose and lactose data sets, the sieve
fraction and bulk sample data were randomly assigned to either the calibration set (67% of
spectra) or validation set (33% of spectra). This procedure was repeated three times to test the
robustness of the method, giving three different calibration and validation sets for the two
materials. In each case, all three calibrations employed slightly different combinations of
wavenumbers. The selected wavenumbers were found to occur on the slopes of overtone peaks; the
selection of each wavenumber is therefore likely to have been influenced by the random noise in
each data set. With both materials, each of the three MLR calibrations showed a good fit between
NIR spectral and FALLS data {microcrystalline cellulose: SEC [ln(d50/µm)] = 0.10-0.11† and
lactose monohydrate: SEC [ln(d50/µm)] = 0.12-0.13†}. This was confirmed by plots of NIR
predicted ln d50 versus ln(FALLS d50), which showed significant linear association
[microcrystalline cellulose: r = 0.98 (n = 38 for each set) and lactose monohydrate:
r = 0.97-0.98† (n = 22 for each set); in each case with p < 0.005] (Fig. 5A and 6A). The three
validation sets for each material showed similar results (Fig. 5B and 6B) with highly significant
correlation between NIR predicted ln d50 and ln(FALLS d50) [microcrystalline cellulose (n = 19):
r = 0.98, lactose monohydrate (n = 11): r = 0.93-0.97;† in each case p < 0.005], and SEP
comparable to SEC {microcrystalline cellulose: SEP [ln(d50/µm)] = 0.12-0.14,† and lactose
monohydrate: SEP [ln(d50/µm)] = 0.15-0.21†}.

† The range gives the minimum and maximum values observed for the three randomly selected
calibration and validation sets.

Conclusion

The accurate measurement of median particle size in pharmaceutical powders can be achieved using
NIR spectroscopy. Excellent calibration results can be achieved by applying the MLR method to
sieved fractions; however, in practice, as bulk samples are more likely to be particle sized,
calibrations also require inclusion of bulk samples. Use of both bulk samples and sieve fractions
produces robust calibrations that can be used over a wide range of particle size.

Acknowledgements

The authors are grateful to Buhler AG for the loan of the NIR instrument, and to The Mathworks
Inc for providing Matlab 5 software. We thank P. A. Hailey, Pfizer Ltd and FMC International for
advice and providing samples of pharmaceutical excipients and A. J. O'Neil thanks Pfizer Ltd for
a research grant. Keith Barnes and Dave McCarthy, The School of Pharmacy, University of London
are thanked for assistance with sieving and SEM.

References

1 M. E. Aulton, Pharmaceutics: The Science of Dosage Form Design, Churchill Livingstone, Edinburgh, 1988.
2 H. G. Barth, S. Sun and R. M. Nickol, Anal. Chem., 1987, 59, 142.
3 C. Washington, Particle Size Analysis in Pharmaceutics and Other Industries, Ellis Horwood, New York, 1992.
4 British Pharmacopoeia 1993, HM Stationery Office, London, 1993, vol. 2, Appendix XVII, A & B.
5 A. Simmons, International LABMATE, 1993, 17, 23.
6 P. A. Hailey, P. Doherty, P. Tapsell, T. Oliver and P. K. Aldridge, J. Pharm. Biomed. Anal., 1996, 14, 551.
7 P. Frake, C. N. Luscombe, D. R. Rudd, J. Waterhouse and U. A. Jayasooriya, Anal. Commun., 1998, 35, 133.
8 P. Dubois, J. Martinez and P. Levillain, Analyst, 1987, 112, 1675.
9 W. Plugge and C. van der Vlies, J. Pharm. Biomed. Anal., 1993, 11, 435.
10 C. van der Vlies, W. Plugge and K. J. Kaffka, Spectroscopy, 1995, 10, 46.
11 W. Plugge and C. van der Vlies, J. Pharm. Biomed. Anal., 1996, 14, 891.
12 E. Dreassi, G. Ceramelli, L. Savini, P. Corti, P. L. Peruccio and S. Lonardi, Analyst, 1995, 120, 319.
13 E. Dreassi, G. Ceramelli and P. Corti, Analyst, 1995, 120, 1005.
14 E. Dreassi, G. Ceramelli, P. Corti, M. Massacesi and P. L. Perruccio, Analyst, 1995, 120, 2361.
15 D. J. Wargo and J. K. Drennen, J. Pharm. Biomed. Anal., 1996, 14, 1415.
16 R. A. Forbes, M. L. Persinger and D. R. Smith, J. Pharm. Biomed. Anal., 1996, 15, 315.
17 I. A. Cowe, J. W. McNicol and D. C. Cuthbertson, Analyst, 1989, 114, 683.
18 L. S. Aucott, P. H. Garthwaite and S. T. Buckland, Analyst, 1988, 113, 1849.
19 R. J. Barnes, M. S. Dhanoa and S. J. Lister, Appl. Spectrosc., 1989, 43, 772.
20 C. R. Bull, Analyst, 1991, 116, 781.
21 G. Kortüm, Reflectance Spectroscopy, Principles, Methods, Applications, Springer-Verlag, Berlin, 1969.
22 K. H. Norris and P. C. Williams, Cereal Chem., 1984, 61, 158.
23 A. J. O'Neil, R. D. Jee, R. A. Watt and A. C. Moffat, J. Pharm. Pharmacol., 1997, 49, Suppl. 4, 19.
24 J. L. Ilari, H. Martens and T. Isaksson, Appl. Spectrosc., 1988, 42, 722.
25 E. Ciurczak, P. Torlini and P. Demkowicz, Spectroscopy, 1986, 1, 36.
26 H. Mark, Principles and Practice of Spectroscopic Calibration, John Wiley & Sons Inc., New York, 1991.
27 J. C. Miller and J. N. Miller, Statistics For Analytical Chemistry, Ellis Horwood, New York, 3rd edn., 1993.
28 Handbook of Pharmaceutical Excipients, ed. A. Wade and P. J. Weller, American Pharmaceutical Association, Washington, The Pharmaceutical Press, London, 2nd edn., 1994.
29 S. Pearce, Manuf. Chem., 1986, 57, 77.

Paper 8/06001K


