Multivariate Statistical Quality Control of a Pharmaceutical
Manufacturing Process Using Near Infrared Spectroscopy
And Imaging Microscopy
of Medicine
University of London,
May 2000.
ProQuest Number: 10104305
ABSTRACT
Multivariate Statistical Quality Control of a Pharmaceutical
Manufacturing Process Using Near Infrared Spectroscopy
And Imaging Microscopy
The ability of near infrared (NIR) reflectance and transmittance spectroscopy and near
infrared imaging microscopy to enable multivariate statistical quality control of an
entire pharmaceutical tablet manufacturing process has been demonstrated.
Statistical quality control of process intermediates at each of the process stages (raw
materials dispensing, powder blending and tabletting of blend) required construction of
a multivariate model from a reference set of NIR spectroscopic measurements of
process intermediates. With blends and tablets, these measurements were collected from
a number of different batches where the process was known to have operated within
specification and within a state of statistical control. With the powdered raw materials,
measurements were made of pharmaceutical grade materials.
Using the multivariate models, it was shown to be possible to assess the quality of raw
materials by NIR spectroscopy and determine their suitability for use in manufacture.
Simultaneous determination of powdered pharmaceutical raw materials’ identities and
their accurate particle size distributions were obtainable from a single averaged NIR
spectrum.
The models developed from NIR measurements of blends and tablets enabled a level of
quality control at each of these process stages superior to current reference analytical
laboratory measurements of these. Significant trends in process deviation could be
identified from an averaged NIR spectrum even at the blend stage despite within
specification reference laboratory data. These batches of blends ultimately produced
tablets of lower quality, as shown by tablet friability, increased tablet thickness and
prolonged tablet dissolution time.
NIR microscopic imaging of these lower quality pharmaceutical blends was examined
to provide more detailed diagnostic information. The spatial locations and size of drug
substance particles could be readily identified and in some instances showed unmilled
drug substance particles. This demonstrated the potential of NIR imaging microscopy
for on-line or at-line quality control of the blending stage.
ACKNOWLEDGEMENTS
I would like to thank those who have helped me in my studies at The School of
Pharmacy. My supervisors, Prof. Tony Moffat and Dr. Roger Jee, deserve special
thanks. Throughout my research training, they provided invaluable comments and
guidance. I particularly enjoyed the useful discussions in the bar, and participation in
several international conferences.
I am also grateful to my industrial supervisor, Dr. Perry A. Hailey, Pfizer Central
Research, Pfizer Ltd., and to Pfizer Ltd. for funding my Ph.D. studentship.
Perry devised the concepts for my Ph.D. research and was an excellent source of advice
on matters of chemometrics and computing. He also arranged for me to be provided
with an extensive set of pharmaceutical process materials. John Wakeman, Pfizer
Central Research, Quality Operations, deserves many thanks for kindly selecting
appropriate batches of normal and unusual process materials.
I also thank the Mathworks Inc. for providing Matlab 5.2 software, FMC International
for supplying Avicel samples and Foss NIRSystems, Buhler AG and Spectral
Dimensions Inc. for use of near infrared instruments.
I greatly appreciate the support and encouragement of my parents, Steve and Angela,
over the years - especially during my Ph.D.
Contents
Title
Abstract
Acknowledgements
List of Abbreviations
List of Symbols
1. Introduction
1.3 Aim
2.1 Introduction
Distribution
2.12 Conclusion
3.1 Introduction
3.3 Materials
4.2 Near Infrared And Reference Analysis Data Sets Used
PLSR
Images
Substance
6. Conclusion
References
Appendix C. Tables for Single-block PLS and Multiblock PLS Data Sets
mastersizerspectra
avicelmastersizer
Section 2.11
approx. approximation.
BN batch number.
CL control limit.
C. of A. certificate of analysis.
CV coefficient of variation.
d.f. degrees of freedom.
Fig. figure.
Max. maximum.
MHz megahertz.
PC principal component.
PTFE polytetrafluoroethylene.
RCA near infrared Rapid Content Analyser module for diffusely reflective
detrend.
UV ultraviolet.
LIST OF SYMBOLS
confidence interval.
weight coefficient.
μ 10⁻⁶.
deviation.
e absorption coefficient.
chi-squared statistic.
approximation.
bo multiple linear regression equation intercept.
concentration.
D Mahalanobis distance.
block.
block.
coefficient, e.
statistic.
Kronecker product.
Pa partial least squares loading vector for matrix X in one partial least
squares dimension, a.
qa partial least squares loading vector for Y matrix in partial least squares
dimension, a.
analysis.
of infinite thickness.
of infinite thickness.
T transmission.
T² Hotelling’s T² statistic.
matrix.
ta partial least squares score vector for X matrix in partial least squares
dimension, a.
tCa consensus multiblock partial least squares score vector in partial least
squares dimension, a.
t1a multiblock partial least squares score vector for X1 matrix in partial least
squares dimension, a.
t2a multiblock partial least squares score vector for X2 matrix in partial least
squares dimension, a.
Xe anharmonicity constant.
observation i.
CHAPTER 1
Introduction
1.1 Pharmaceutical Quality Control
This is most often achieved through a series of tests which are performed in a laboratory
situated away from the production area. Analysis usually requires destruction of the
characteristics. For example, chemical tests may include tablet strength assay by high
performance liquid chromatography (HPLC) and total moisture content assay by Karl
Fischer titration. Physical tests may include tablet friability or particle size analysis.
These conventional tests measure the average amount of a component and its variance
within a batch but do not assess the distribution of these components within an
individual dosage unit or small sample of process material. With well controlled
processes, knowledge of the average quantity of a component and its variance within a
batch provides assurance that a batch of product will conform to its specification.
However there exists the potential for such a batch to produce a product of unexpectedly
low quality if the distribution of materials within dosage units is not uniform and if this
components, such as moisture (which may exist as surface and free moisture and water
and excipients (for example disintegrant) may not be homogeneous within an individual
An added disadvantage with remote laboratory analysis is the often lengthy time for
analysis at each process stage. Pharmaceutical products are therefore mostly produced
The emergence of modern spectrometric technologies that provide rapid and sample
owing to its speed of measurement and high signal to noise ratio and the ability to
perform both reflectance and transmittance measurements of intact dosage forms and
powders. The wealth of information recorded in the near infrared (NIR) spectrum of a
sample matrix contains considerable chemical and physical information. NIRS has
pharmaceutical industry.
fibre-optic through the wall of a blender (Hailey 1996; Maesschalck et al, 1998) and of
the tablets by reflectance and transmission spectroscopy (Andersson et al, 1999). The
advantage of process based measurements is that data are generated and analysed in
real-time thus providing the potential for efficient control of the process.
NIR measurements acquired throughout a manufacturing process are likely to contain
information that relates to the process performance history of a batch. Collectively these
process control applications have only focused on discrete parts of a process, such as
blending (Hailey et al, 1994, 1996). Complete statistical control of processes from start
reference set of data from past successful batches and produced excellent monitoring
processes, the results must compare favourably with those of existing conventional
analyses. Once demonstrated, such models could be used as part of an intelligent system
for future process control by NIRS. This ‘smart’ manufacturing system, envisaged by
Hailey (1996), is also referred to as parametric release. This is defined by the European
“A system of release that gives assurance that the product is of the intended quality
for terminally sterilised sterile products (The Industrial Pharmacist, 1999, 10, 9).
However, the potential exists for applying these principles to other pharmaceutical
processes, such as a tabletting process, since the British Pharmacopoeia 1999 states
that: “...parametric release is, in appropriate circumstances, not precluded by the need
1.3 Aim.
The aim of this investigation was to determine the ability of NIR spectroscopy for
materials through to blends and tablets. The process studied was the manufacture of a
Pfizer Ltd. marketed tablet with which some manufacturing anomalies had been
experienced over a manufacturing period. Samples from the unusual batches and also
from a considerable number of normal batches were obtained from the Quality
Though NIR spectra of solid materials contain considerable chemical and physical
information, the physical aspect, which is largely due to particle size, is often quoted as
because it exerts a major role in determining process performance. Both chemical and
particle size analyses are required of all raw materials prior to manufacture and the
ability to determine both with a single, rapid NIR measurement would be invaluable.
The use of NIRS for chemical identification has been described (Candolfi et al, 1999a,
particle size analysis has not been described. The ability to both identify raw materials
and to determine their particle size distributions by NIRS was therefore examined
(Chapter 2).
NIRS has also been used for on-line and at-line control of other pharmaceutical process
1998) and control of tablet film coating (Andersson et al, 1999). At present, the use of
has not been demonstrated. In this study, the ability of NIRS to allow statistical process
statistical models of NIR data of blends and tablets were generated and their ability to
enable discrimination of high quality batches from anomalous batches was determined.
Consistency between blend and tablet models was examined as this would indicate
models were used since these reduced the dimensionality of the data from several
systematic variance in the data. These models were therefore easier to interpret by
multivariate statistical process control (MSPC), which is in keeping with the concept of
statistical process control (Statistical Process Control, ed. Mamzic, 1995). The
multivariate models examined were ‘model-free’ models derived solely from the NIR
data (Chapter 3) and models derived from both NIR measurements and certificate of
analysis (C. of A.) data (Chapter 4). Comparison of the results of these two model types
A collection of some of the poor blends were also studied by NIR imaging microscopy
(Chapter 5). This technique was investigated in the hope that it would provide detailed,
spatially resolved chemical and physical information of the blends and thereby enable
diagnosis of reasons for poor blend performance. The NIR image data were subjected to
Overall conclusions of the suitability of NIRS for multivariate statistical process control
of this tablet manufacturing process were drawn after consideration of the pooled results
(Chapter 6).
1.4 Principles of Near Infrared Spectroscopy
The near infrared region of the electromagnetic spectrum was the first non-visible
Herschel in 1800 (Stark et al, 1986). This region lies between the visible and mid
Society for Testing and Materials’ (ASTM) Working Group on NIRS as the spectral
region spanning the wavelength range 780 to 2526 nm (Stark et al, 1986). In this region,
bands, hence visual identification of chemical groupings of a molecule from its NIR
spectrum is more difficult than from the ‘finger-print’ region of its mid infrared
spectrum. The overtone and combination bands are one to three orders of magnitude
weaker than the fundamental bands. This is advantageous for sampling of solids and
liquids, as NIR radiation tends to penetrate further into samples than mid infrared
For a material to absorb in the infrared region, the incident light must be of sufficiently
high energy to produce vibrational transitions in the molecules of the material. This
means that the frequency of the infrared light should match the fundamental vibration
frequency of a given molecule, and that a change in its dipole moment should occur due
harmonic oscillator model (Osborne et al, 1993), which obeys Hooke’s law as:
$$\bar{\nu} = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}} \qquad (1.4.1)$$
where c is the speed of light in a vacuum, k is the bonding force constant and the
The variation of the potential energy with bond distance may be described by a parabola
centred about the equilibrium distance and has evenly spaced vibrational energy levels.
$$E_v = f\left(v + \tfrac{1}{2}\right) \qquad (1.4.2)$$
where f is the vibrational frequency and v is the vibrational quantum number. The
selection rule for harmonic oscillator transitions is Δv = ±1, hence the energy difference
between two consecutive energy levels will always be E(v+1) - E(v) = f, which is the
because real bonds, though elastic, do not exactly obey Hooke’s law due to coulombic
repulsion between nuclei (Osborne et al, 1993). The result of this is that the potential
energy curve for real bonds is only approximately parabolic. Deviation from the
parabola is most pronounced at the upper energy levels, where spacing between energy
levels also decreases with energy level. The harmonic oscillator model may be
improved by adding higher order terms to equation (1.4.2). The energy, Ev, for each
$$E_v = f\left(v + \tfrac{1}{2}\right) - f\,x_e\left(v + \tfrac{1}{2}\right)^{2} + h \qquad (1.4.3)$$
uniform spacing between levels corresponding to a parabola with its centre at the
equilibrium distance and the same curvature as the real potential energy function and
h is a higher order term. Neglecting higher order terms, the frequency of a transition
A consequence of introducing the quadratic term into Hooke’s law is that the selection
rule becomes Δv = ±1, ±2, ..., ±n, where n is an integer. As a result, other higher
The intensity of the overtones decays abruptly since transition probability falls rapidly
with increase in vibrational quantum number. In practice, just the first two to three
overtones are observable (Osborne et al, 1993). Polyatomic molecules may possess
several fundamental frequencies, and therefore will show simultaneous changes in the
energies of two or more vibrational modes. The frequency observed will either be the
sum of, or the difference between, fundamental frequencies. The result is very weak
bands which are known as ‘combination’ and ‘difference’ bands (Osborne et al, 1993).
Combination bands are unlikely to be observed in NIR spectra unless they arise from
two vibrations, linked either through a common atom or through several bonds;
difference bands arise from absorption of molecules which are in an excited state, and
have a very low probability of being observed in NIR spectra at room temperature
(Osborne et al, 1993). Anharmonicity produces combination bands which are slightly
Most NIR bands arise due to overtones and combinations of various bonds to hydrogen,
e.g. C-H, N-H, O-H and S-H. These overtones and combinations of bonds to hydrogen
are observed in the NIR region due to the small mass and large force constant of
hydrogen (Blanco et al, 1998). Other groups, such as C=O, C-C, C-F and C-Cl exhibit
very weak overtone bands in the NIR region, which in practice may be difficult to
observe.
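The effect of anharmonicity on overtone positions can be illustrated numerically. The following is a minimal sketch only, assuming the anharmonic energy expression E_v = f(v + 1/2) - f·xe·(v + 1/2)² with the higher order term h neglected. The values of f and xe are hypothetical, chosen for illustration and not taken from this work, and f is expressed in wavenumbers (cm⁻¹) so that transition energies convert directly to wavelength.

```python
def transition_energy(f, x_e, v):
    """Energy of the 0 -> v transition for E_v = f(v + 1/2) - f*x_e*(v + 1/2)**2.

    Higher order terms are neglected. f is in cm^-1 (assumption made here
    so that the result can be converted to a wavelength).
    """
    def level(n):
        return f * (n + 0.5) - f * x_e * (n + 0.5) ** 2

    return level(v) - level(0)


def wavenumber_to_nm(wavenumber_cm):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm


# Hypothetical values, for illustration only.
f = 3000.0    # cm^-1
x_e = 0.02    # anharmonicity constant

for v in (1, 2, 3):  # fundamental, first overtone, second overtone
    energy = transition_energy(f, x_e, v)
    print(v, round(energy, 1), "cm^-1", round(wavenumber_to_nm(energy), 1), "nm")
```

With these illustrative values the first and second overtones fall at slightly lower wavenumbers than two and three times the fundamental, which is the behaviour expected of an anharmonic oscillator.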
NIR light which penetrates a powder’s surface is scattered by particles many times
before emerging back through the surface. This is known as diffuse reflectance. In the
NIR region, solid materials tend to exhibit low molar absorptivity (typically 0.01 to 0.1
mol⁻¹ dm³ cm⁻¹) (Blanco et al, 1998). This enables NIR diffuse reflectance
The most widely accepted theory which describes diffuse reflectance and the
Munk theory (Frei and MacNeil, 1973). This theory was developed for infinitely thick
$$\frac{(1 - R'_\infty)^{2}}{2R'_\infty} = \frac{k}{s} \qquad (1.5.1)$$
where R'∞ is the absolute reflectance of the layer, k is its molar absorption coefficient,
and s is the scattering coefficient. Instead of determining R'∞, it is usual to work with
ceramic. In these cases, k is assumed to have a value of zero and the absolute reflectance
is assumed to be one. However, since the absolute reflectance of standards exhibiting
the highest R'∞ values never exceeds 0.98 to 0.99, the actual relationship becomes:
$$R_\infty = \frac{R'_{\infty,\,\mathrm{sample}}}{R'_{\infty,\,\mathrm{standard}}} \qquad (1.5.2)$$
and it is essential to specify the standard used (in Section 2, the diffuse reflectance
standards used were Spectralon (PTFE) (Labsphere Inc., North Sutton, NH, USA) and a
ceramic tile; in Section 3, the diffuse reflectance standard used was a ceramic tile).
$$F(R_\infty) = \frac{(1 - R_\infty)^{2}}{2R_\infty} = \frac{k}{s} \qquad (1.5.3)$$
which shows that a linear relationship should be observed between F(R∞) and the
which are measured against the pure powder, have an absorption coefficient which may
be re-written as the product 2.303εc, where ε is the absorption coefficient and c is the
molar concentration. The Kubelka-Munk equation (1.5.3) may then be written as:
$$F(R_\infty) = \frac{(1 - R_\infty)^{2}}{2R_\infty} = \frac{c}{k'} \qquad (1.5.4)$$
where k' is a constant equal to s/2.303ε. As F(R∞) is proportional to the molar
dilution, the regular reflection from the sample approximates that from the reflectance
A linear relationship between F(R∞) and c is only observed when dealing with weakly
and only when the particle size used is relatively small (ideally around 1 μm in
diameter) (Kortum, 1969). In addition, any significant departure from the state of
infinite thickness of the adsorbent layer assumed in the derivation of equation (1.5.3)
When either adsorbents with large particle size or large concentrations of the absorbing
species are used, plots of F(R∞) versus c become markedly non-linear at higher
of both regular and diffuse reflectance. Regular reflection is a mirror reflection, whereas
diffuse reflection occurs when impinging radiation is partly absorbed and partly
scattered by a material such that it is reflected in a diffuse manner, i.e. with no defined
1969):
where k is the absorption coefficient and n is the refractive index. Regular reflectance is
from linearity between F(R∞) and c at high concentrations of the absorbing species. It is
as much as possible. This may be achieved by using powders with small particle size
Other theories developed to describe diffuse reflectance have been shown to be special
thick powders (approximately 1 mm in thickness, or greater, for fine powders), has been
In practice, however, NIR spectroscopic data tend to be used as either raw data (relative
$$A = \log_{10}\left(\frac{1}{R}\right) = ac \qquad (1.5.6)$$
$$A = \log_{10}\left(\frac{1}{T}\right) = ac \qquad (1.5.7)$$
Though equations (1.5.6) and (1.5.7) are not based on the Kubelka-Munk theory, highly
satisfactory results are often obtained with these transformations in NIR spectroscopic
applications.
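The two treatments of reflectance data discussed in this section can be compared with a short calculation. The following is a minimal sketch, assuming the Kubelka-Munk function takes the form F(R∞) = (1 - R∞)²/(2R∞) and that apparent absorbance is log10(1/R) for reflectance or log10(1/T) for transmittance. The function names are hypothetical and are used for illustration only.

```python
import math


def kubelka_munk(r_inf):
    """Kubelka-Munk function F(R) = (1 - R)**2 / (2R) for relative reflectance R."""
    if not 0.0 < r_inf <= 1.0:
        raise ValueError("relative reflectance must lie in (0, 1]")
    return (1.0 - r_inf) ** 2 / (2.0 * r_inf)


def apparent_absorbance(r_or_t):
    """Apparent absorbance log10(1/R) or log10(1/T)."""
    if r_or_t <= 0.0:
        raise ValueError("reflectance or transmittance must be positive")
    return math.log10(1.0 / r_or_t)


# Compare the two transformations over a range of relative reflectance values.
for r in (0.9, 0.5, 0.1):
    print(r, round(kubelka_munk(r), 4), round(apparent_absorbance(r), 4))
```

Both quantities increase as reflectance falls, but they are different functions of R, so spectra transformed by one method are not simply rescaled versions of spectra transformed by the other.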
1.6 Near Infrared Spectral Data Pre-processing
The NIR spectral data obtained from diffuse reflectance and transmission measurements
of solid materials comprise chemical information, described in Section 1.5, and physical
information arising due to multiple scatter. This latter effect results in spectral offset and
quantification, this physical aspect of the spectrum may not be considered useful.
multiple scatter from the spectra. These are applied to individual NIR spectra prior to
(described in Section 1.3); standard normal variate (SNV) (Barnes et al, 1989);
multiplicative scatter correction (MSC) (Ilari et al, 1988); quadratic baseline detrend
(DT) (Barnes et al, 1989) and first and second derivatives (Advances in Near-Infrared
to apparent absorbance were described in Section 1.5, equations (1.5.6) and (1.5.7)
respectively.
$$y_j = \frac{r_j - \mu}{\sigma} \qquad (1.6.1)$$
where y and r are the SNV transformed and original signals, respectively, at wavelength
λj, and μ and σ are the mean signal of the original spectrum and the standard deviation of
the spectrum respectively, over all J wavelengths. The effect of SNV transformation on
NIR spectra of materials of identical chemical composition but different particle size is
(Dhanoa et al, 1994). However, this method of scatter correction requires that the mean
spectral response of a data set be calculated. Individual NIR spectra are then linearly
where y and r are the original and mean signals, respectively, at wavelength λj, j = 1, ...,
J wavelengths, a is the offset of the regression equation, b is the slope of the linear least
squares regression, r is the mean spectral response of the data set at wavelength λj, and e
is the residual signal at wavelength λj. The MSC transformed spectrum is obtained by
subtraction of the offset constant, a, from the spectral response at each wavelength,
spectra. The transformation involves fitting the original spectrum to a second order
polynomial:
represents the spectral offset, b and c are the coefficients of the quadratic least squares
equation, and e is the residual signal at wavelength λj. An estimate of the spectral
baseline is obtained from the first three terms on the right hand side of equation (1.6.3):
$$\hat{y}_j = a + b\lambda_j + c\lambda_j^{2} \qquad (1.6.4)$$
The estimated baseline is subtracted from the original spectrum to produce the residual
or ‘detrend’ spectrum. This scatter correction removes both spectral offset and curvature
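The SNV and MSC scatter corrections described above can be sketched in a few lines. This is an illustrative sketch only and not the implementation used in this work. It assumes that SNV uses the sample standard deviation (divisor J - 1), and that MSC subtracts the regression offset a and then divides by the slope b, which is the usual form of the correction. The function names are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and standard deviation over all J wavelengths."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (divisor J - 1) is assumed here.
    sd = math.sqrt(sum((r - mean) ** 2 for r in spectrum) / (n - 1))
    return [(r - mean) / sd for r in spectrum]


def msc(spectrum, mean_spectrum):
    """Multiplicative scatter correction: regress one spectrum on the mean
    spectrum of the data set, then remove offset a and slope b."""
    n = len(spectrum)
    mx = sum(mean_spectrum) / n
    my = sum(spectrum) / n
    sxx = sum((x - mx) ** 2 for x in mean_spectrum)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mean_spectrum, spectrum))
    b = sxy / sxx          # slope of the linear least squares regression
    a = my - b * mx        # offset of the regression equation
    return [(y - a) / b for y in spectrum]


# A spectrum that differs from the mean spectrum only by offset and slope
# is mapped back onto the mean spectrum by MSC.
mean_spectrum = [1.0, 2.0, 4.0]
shifted = [0.5 + 2.0 * r for r in mean_spectrum]
print(msc(shifted, mean_spectrum))
print(snv([1.0, 2.0, 3.0]))
```

SNV needs only the spectrum itself, whereas MSC depends on the mean spectrum of the data set, so MSC corrected spectra change if the reference set changes.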
The high signal to noise ratio of NIR spectral data enable derivative spectra to be
spectra involves calculating the difference between spectral data points at evenly spaced
where y is the first derivative of the original signal at λj and r is the original signal at λj.
The number of data points used in the smoothing may be varied depending upon the
The second derivative difference spectrum may be calculated in a similar fashion to the
= (%.. - )/ 2 ( 1 .6 .6 )
Derivative spectra may also be calculated by applying a least squares digital polynomial
H., 1981, 1983). The latter method of calculation of the derivative spectrum is better
able to preserve peak heights than the difference method. Calculation of the second
derivative spectrum largely eliminates spectral offset and baseline curvature which
result from multiple scatter. In addition, chemical peaks in the spectra are resolved. If
the original spectral data is reflectance, the second derivative peaks have a positive
value; if the original spectral data is absorbance, the second derivative peaks have
negative values. The Savitzky-Golay method for calculating 2nd derivative spectra was
used (quadratic, 11 data point: Chapters 2, 3 and 4; quadratic 15 data point: Chapter 5).
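A quadratic Savitzky-Golay second derivative of the kind described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the routine used in this work: the data points are assumed to be evenly spaced with unit spacing, the convolution weights are derived from a least squares quadratic fit over a symmetric window, and the points at each end of the spectrum, where the window does not fit, are simply dropped. The function names are hypothetical.

```python
def savgol_second_derivative_weights(half_width):
    """Weights giving the second derivative of a least squares quadratic
    fitted over 2*half_width + 1 evenly spaced points (unit spacing)."""
    idx = range(-half_width, half_width + 1)
    n = 2 * half_width + 1
    s2 = sum(i ** 2 for i in idx)
    s4 = sum(i ** 4 for i in idx)
    denom = n * s4 - s2 ** 2
    # Second derivative is twice the fitted quadratic coefficient.
    return [2.0 * (n * i ** 2 - s2) / denom for i in idx]


def second_derivative(spectrum, half_width=5):
    """Apply the weights as a moving window; half_width=5 is an 11 point window.
    End points where the window does not fit are omitted."""
    weights = savgol_second_derivative_weights(half_width)
    result = []
    for j in range(half_width, len(spectrum) - half_width):
        window = spectrum[j - half_width:j + half_width + 1]
        result.append(sum(w * r for w, r in zip(weights, window)))
    return result


# A pure quadratic signal has a constant second derivative of 2.
signal = [float(x ** 2) for x in range(20)]
print(second_derivative(signal, half_width=5))
```

Widening the window increases smoothing at the cost of resolving narrow peaks, which is one reason different window sizes may suit different instruments.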
techniques used include principal components analysis (PCA) and partial least squares
regression (PLSR). These methods are used to reduce the dimensionality of the often
highly collinear spectral data, from several hundred variables (wavelengths), to a set of
new variables in reduced dimensional space. PCA is a useful technique that allows
exploratory data analysis and qualitative analysis of a spectral data set. PLSR is a biased
regression method that produces a set of new latent variables that maximise the
covariance between the spectral data and a set of reference data. With both of these
Section 1.6, may be applied to the spectral data, and their effect on the qualitative and
As NIR spectra are highly collinear, it is often advantageous to transform them to their
principal components via PCA. This is a multivariate data reduction method, which
produces linear combinations of the original variables. These may be thought of as a set
of new variables that have the property of being uncorrelated (Jackson, 1991). The PCA
transformation is a two step transformation (Kirsch and Drennen, 1995). In the first
the centre of a spectral cluster. This is achieved through variable-wise mean centring
of the spectral data. The next step is the rotation of the cartesian co-ordinate system to
describe, as nearly as possible, all the variations present in the spectral cluster. The co
ordinate system remains rectangular throughout the process and is moved rigidly from
one position to the next. This rotation step decomposes the spectral variation into
In the process of calculating each principal axis, the perpendicular distances between
the spectral data points and each axis are minimised. Hence, the first principal
component (PC) describes the largest amount of spectral variation. This is then
effectively subtracted from the data cluster. The second principal component is defined
in a similar manner to the first, except that during the rotation to the second principal
axis, the second axis remains perpendicular to the first. This orthogonal condition forces
each component to account for the maximum spectral variation remaining in the cluster.
Typically, only a small number of these iterations are required before additional
components account for random noise. These components are therefore usually ignored.
The transformation process ultimately produces spectral co-ordinates that are expressed
and noise. The transformation to principal axes simplifies the selection of variables for
$$X = TP^{\mathsf{T}} + E \qquad (1.7.1)$$
The loadings matrix comprises a loadings vector, for each extracted PC. Each of these is
a set of weights that identifies the variables (wavelengths) which contribute most to
that PC. As a result, it may be possible to assign physical or chemical interpretation to a
PC’s loadings vector. For example, the loadings may represent chemical peaks. The PC
score vector of each spectral observation is produced by the product of the original
spectrum with the loadings vector, and therefore will have score values for each PC
dimension that are related to the magnitude of each of the principal components in the
original spectra.
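The two step PCA transformation (mean centring followed by extraction of successive orthogonal components) can be illustrated with a small iterative calculation. The following is a minimal sketch, assuming a NIPALS style iteration, which is one common way of extracting principal components one at a time. It is not necessarily the algorithm used in this work. The function names and the toy data are hypothetical.

```python
import math


def mean_centre(X):
    """Variable-wise mean centring. X is a list of spectra (rows)."""
    m = len(X)
    means = [sum(col) / m for col in zip(*X)]
    return [[x - mu for x, mu in zip(row, means)] for row in X], means


def nipals_pca(X, n_components, n_iter=500, tol=1e-12):
    """Extract principal components one at a time from mean centred X.

    Returns score vectors, unit length loading vectors and the residual
    matrix E, so that X = T P' + E. Assumes the residual still contains
    variance when each component is extracted.
    """
    E = [row[:] for row in X]
    scores, loadings = [], []
    for _ in range(n_components):
        cols = list(zip(*E))
        # Start from the column with the largest sum of squares.
        start = max(range(len(cols)), key=lambda j: sum(v * v for v in cols[j]))
        t = list(cols[start])
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            p = [sum(ti * row[j] for ti, row in zip(t, E)) / tt
                 for j in range(len(E[0]))]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            t_new = [sum(pj * xj for pj, xj in zip(p, row)) for row in E]
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change < tol:
                break
        # Subtract the variation described by this component.
        E = [[x - ti * pj for x, pj in zip(row, p)] for ti, row in zip(t, E)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, E


# Toy data lying on a straight line: one component describes all variation.
X, means = mean_centre([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
scores, loadings, residual = nipals_pca(X, 1)
print(loadings[0], scores[0])
```

Each loading vector is scaled to unit length, so the score of an observation on a component is its co-ordinate along that principal axis.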
The value of the PC score value may therefore reflect the amount of a chemical or
physical constituent present in the sample. PC scores are therefore useful in cluster
The number of principal components extracted from the spectral data may be selected
according to a number of criteria. One simple method for estimating the number of
useful components, the model rank, (i.e. those that account for systematic variance)
involves visual inspection of the loading vectors. PCs with loadings which appear to
percentage sum of squares accounted for by the model. This involves calculation of the
sum of squares (%SS) of the original variable-wise mean centred data, X, and of the
residual matrix, E, after n PCs have been extracted, over all observations, m, and
variables, n:
$$\%SS = 100 \times \left(1 - \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} e_{ij}^{2}}{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^{2}}\right) \qquad (1.7.2)$$
Where this method is used to determine the number of PCs to retain, the process of
extracting PCs is terminated after an arbitrary %SS has been reached. Typically this
may be 95%SS or 99%SS.
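The percentage sum of squares criterion can be sketched as a short calculation. This is an illustrative sketch only, assuming that %SS is 100 × (1 - SS of the residual matrix E / SS of the mean centred data X), summed over all observations and variables. The function name and the numbers are hypothetical.

```python
def percent_sum_of_squares(X, E):
    """Percentage of the sum of squares of mean centred data X that is
    accounted for by the model, given the residual matrix E."""
    ss_x = sum(v * v for row in X for v in row)
    ss_e = sum(v * v for row in E for v in row)
    return 100.0 * (1.0 - ss_e / ss_x)


# Hypothetical mean centred data and residuals after some PCs.
X = [[3.0, 4.0], [-3.0, -4.0]]
E = [[0.0, 1.0], [0.0, -1.0]]
print(percent_sum_of_squares(X, E))
```

In use, this value would be recalculated after each PC is extracted, and extraction would stop once the chosen threshold had been reached.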
Cross-validation is another popular method for determining the rank of a PC model. The
process involves randomly dividing a data set into a number of sub-groups. A PCA
model is calculated for the data set after removal of one of the sub-groups. The
removed sub-group is then projected onto the eigenvectors of the model, according to:
$$T = XP \qquad (1.7.3)$$
The resultant scores of this sub-group are then used to estimate the original data of the
$$\hat{X} = TP^{\mathsf{T}} \qquad (1.7.4)$$
This procedure is repeated for all subgroups, and for n PCs. The predicted residual error
$$E = X - \hat{X} \qquad (1.7.5)$$
The predicted residual error sum of squares (PRESS) for n PCs and all sub-groups is the
sum of the squared residual error matrix, E, in equation (1.7.5), over all observations, m
and variables, n, divided by the number of elements in the original spectral data set. For
zero PCs extracted, the value of PRESS used is the sum of squares of the original
spectral data, X, divided by the number of elements. The rank of the PCA model is the
number of PCs which provide the smallest PRESS value (Jackson, 1991).
Another statistic used to determine the rank of a PCA model is the R statistic (Lindberg
et al, 1983; Wold, 1978). This involves calculation of PRESS, as in equation (1.7.5). In
addition, PCA models are calculated with n PCs, using all of the spectral data. The
residual error sum of squares, is the sum of the square of E, from equation (1.7.5), over
all observations, m, and variables, n, divided by the number of elements in the spectral
$$R = \frac{\mathrm{PRESS}(n + 1)}{\mathrm{RSS}(n)} \qquad (1.7.6)$$
where n is the number of PCs extracted. The first value for n is zero, and the RSS for
zero PCs extracted is the sum of squares of the spectral data, X, over all observations, m,
and over all wavelengths, n, divided by the number of elements in the spectral data.
With successive PCs extracted, the value of R should increase from near zero to unity.
Extraction of PCs terminates at n PCs when R exceeds unity. This implies that the latest
component extracted did not better the prediction errors than the previous component,
hence n - 1 PCs is the rank of the model. The W statistic (Eastment and Krzanowski,
where
$$D_n = m + p - 2n \qquad (1.7.8)$$
$$D_r = p(m - 1) - \sum_{i=1}^{n}(m + p - 2i) \qquad (1.7.9)$$
m is the total number of observations and p is the number of variables (wavelengths).
With successive PCs extracted, the value of W should fall. PCs are included in the
model up to the number, n, which return values of W greater than or equal to unity, with
PCA models of NIR spectral data may be used for subsequent qualification of
unclassified NIR spectra, and for detection of outliers, using the Q statistic (Jackson,
1991; MacGregor et al, 1994; Nomikos and MacGregor, 1994). This statistic gives a
measure of the distance of the observations from the n-dimensional space and is
$$Q = (x - \hat{x})^{\mathsf{T}}(x - \hat{x}) \qquad (1.7.10)$$
where x represents the original observation and x̂ is the value of the observation
predicted by the PCA model. The critical value for Q may be calculated as (Jackson,
1991):
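$$Q_\alpha = \theta_1\left[\frac{c_\alpha\sqrt{2\theta_2 h_0^{2}}}{\theta_1} + \frac{\theta_2 h_0(h_0 - 1)}{\theta_1^{2}} + 1\right]^{1/h_0} \qquad (1.7.11)$$
$$\theta_1 = \mathrm{Tr}(E) \qquad (1.7.12)$$
$$\theta_2 = \mathrm{Tr}(E^{2}) \qquad (1.7.13)$$
$$\theta_3 = \mathrm{Tr}(E^{3}) \qquad (1.7.14)$$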
where E is the residual covariance matrix after n PCs have been extracted and:
$$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^{2}} \qquad (1.7.15)$$
(1.7.16)
y [ïë X
Observations whose Q values exceed the upper limit do not belong to the modelled
class. Such observations should be removed from the data set and the model re
calculated.
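The calculation of Q for a single observation can be sketched as follows. This is a minimal illustration, assuming that the observation has already been centred using the variable means of the reference set, and that the PCA loading vectors are orthonormal so that the model prediction is the projection of the observation onto the retained loadings. The function name and numbers are hypothetical.

```python
def q_statistic(x, loadings):
    """Squared distance of a mean centred observation x from the space
    spanned by the retained loading vectors: Q = (x - x_hat)'(x - x_hat)."""
    # Scores of the observation on each retained PC.
    scores = [sum(pj * xj for pj, xj in zip(p, x)) for p in loadings]
    # Observation as predicted by the PCA model.
    x_hat = [sum(t * p[j] for t, p in zip(scores, loadings))
             for j in range(len(x))]
    return sum((xj - xh) ** 2 for xj, xh in zip(x, x_hat))


# Hypothetical one component model whose loading lies along the first variable.
loadings = [[1.0, 0.0, 0.0]]
print(q_statistic([2.0, 3.0, 4.0], loadings))
```

An observation lying entirely within the model space returns a Q of zero, while variation in directions not described by the model increases Q.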
The confidence interval (or control limit) for Q requires calculation of the square and
alternative and computationally faster method of calculating the control limits has been
described (Nomikos and MacGregor, 1995). This method approximates the squared
residuals, Q, to a weighted chi-squared distribution (gχ²h). The weight (g) and the
degrees of freedom (h) are both functions of the eigenvalues of Z. Estimation of g and h
of Q. The mean and variance of the gχ²h distribution (μ = gh, σ² = 2g²h) are equated to
the sample mean (m) and variance (v) of the Q sample. Previous studies (Nomikos and
MacGregor, 1995) have found this to be a quick and reliable method to estimate g and h
provided that the number of Q observations is sufficiently large. The control limit on Q
significance level α.
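The matching of moments used to estimate g and h can be written out directly. This is a minimal sketch, assuming that the mean gh and variance 2g²h of the weighted chi-squared distribution are equated to the sample mean m and sample variance v of the Q values, which gives g = v/(2m) and h = 2m²/v. The function name and the Q values are hypothetical.

```python
def fit_weighted_chi_squared(q_values):
    """Estimate weight g and degrees of freedom h of a g*chi-squared(h)
    distribution by matching the sample mean and variance of Q."""
    n = len(q_values)
    m = sum(q_values) / n
    v = sum((q - m) ** 2 for q in q_values) / (n - 1)  # sample variance
    g = v / (2.0 * m)
    h = 2.0 * m * m / v
    return g, h


# Hypothetical Q values from a reference set.
g, h = fit_weighted_chi_squared([1.0, 2.0, 3.0, 4.0, 5.0])
print(g, h, g * h)  # g*h reproduces the sample mean
```

The control limit would then be g multiplied by the chi-squared quantile with h degrees of freedom at the chosen significance level. A routine for that quantile is not included here.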
and provides a set of latent vectors that are analogous to PCs (Kirsch and Drennen,
1995). Partial least squares (PLS) attempts to summarise as much of the variation in the
dependent variables as possible using only the relevant factors contained in the spectral
data. PLS has the advantage that it allows for measurement error in both the spectral and
independent data sets, whilst modelling the spectral data and correlating to the reference
data (chemical or physical). Only significant PLS components are retained, which
The PLS procedure projects the information in the high-dimensional data spaces (X, Y)
down onto low-dimensional spaces defined by a small number of latent variables. The
NIR (X) and reference analytical data set (Y) are usually mean-centred and scaled to
unit variance. The PLS model is:
$$X = \sum_{a=1}^{A} t_a p_a^{T} + E \tag{1.7.18}$$
$$Y = \sum_{a=1}^{A} t_a q_a^{T} + F \tag{1.7.19}$$
where the latent vectors $t_a$ are sequentially computed from the data for each PLS
dimension (a = 1, 2, ..., A) such that the linear combination of the x vectors defined by
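the latent variable:
$$t_a = w_a^{T} x \tag{1.7.20}$$
and the linear combination of the y vectors defined by the latent variable:
$$u_a = q_a^{T} y \tag{1.7.21}$$
maximise the covariance between X and Y explained at each dimension. The vectors $w_a$
and $q_a$ are loading vectors whose elements $w_{aj}$ and $q_{aj}$ express the contribution
of each variable $x_j$ and $y_j$, respectively, towards defining the new latent variables $t_a$ and
$u_a$.
$$Y = X\beta + F \tag{1.7.22}$$
where W, P, and Q are (k×A), (k×A) and (m×A) matrices whose columns are the vectors
$w_a$, $p_a$, and $q_a$. The number of PLS dimensions, A, required to extract the information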
from X and Y and provide the lowest prediction error in Y is usually determined by
cross-validation. A sub-group from X and its corresponding observations in Y are
withheld from calculation of the PLS model. Equation (1.7.22) is then used to predict the
reference analytical values of the spectral sub-group. The sum of squares of the
differences between the predicted and measured reference values of the sub-group is
calculated, over all observations, m, and PLS components, a. The process is repeated for
all sub-groups, and the sum of squared residuals of each sub-group is summed over all
sub-groups, to provide the value of PRESS. The number of PLS components retained, a,
is that value with the minimum PRESS value.
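The cross-validation loop described above can be sketched generically. To keep the example short and self-contained, two simple stand-in models (a mean-only model and a straight-line fit) take the place of PLS models with different numbers of components; the data, the function names and the assignment of samples to sub-groups are all hypothetical.

```python
def press(x, y, fit, n_groups=4):
    """Sum of squared prediction errors over the withheld sub-groups."""
    total = 0.0
    for g in range(n_groups):
        held_out = [i for i in range(len(x)) if i % n_groups == g]
        kept = [i for i in range(len(x)) if i % n_groups != g]
        # Build the model without the sub-group, then predict the sub-group.
        predict = fit([x[i] for i in kept], [y[i] for i in kept])
        total += sum((y[i] - predict(x[i])) ** 2 for i in held_out)
    return total


def fit_mean(xs, ys):
    """Stand-in for the simplest model: predict the mean of y."""
    m = sum(ys) / len(ys)
    return lambda x_new: m


def fit_line(xs, ys):
    """Stand-in for a more complex model: least squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda x_new: my + slope * (x_new - mx)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
press_values = {0: press(x, y, fit_mean), 1: press(x, y, fit_line)}
best_complexity = min(press_values, key=press_values.get)
```

The model complexity retained is the one with the minimum PRESS, mirroring the selection of the number of PLS components.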
Similar to PCA, the PLS loading vectors associated with the spectral data set, P, may
have physical or chemical interpretation. Their score values, T, and sum of squared
residuals, Q, may also be used in cluster analysis and statistical process control
1995). SPC is a proven technique, that is capable of producing dramatic results, and is
widely applicable to many processes where the output varies and where minimising
The original concepts of SPC were devised by Walter A. Shewhart in the early 1920s
(Statistical Process Control, ed. Mamzic, 1995). Importantly, he observed that when a
process remains in a state of statistical control, the random distribution of each output
variable is repeatable and is therefore predictable from one period to another. In this
state, a process is said to be affected only by ‘common’ causes of variation. These are
random, uncontrollable phenomena that are inherent in the process. This observation led
to development of the Shewhart control chart. This chart plots sampled data with respect
to time, in a form which is visually easy to interpret and thereby determine whether a
process is operating normally (i.e. where only common causes of variation are
made and the average value plotted on a chart. The chart also shows upper and lower
control limits which are based on the standard deviation of the process variable when
only common causes of variation are affecting the process. Establishment of these
monitoring and control of future process observations using the control phase 1 limits.
work in this area of SPC was carried out by Hotelling in 1947 in analysis of bombsight
data. With multivariate data, however, use of individual monitoring control charts for
each variable is associated with an increase in the overall type I error, $\alpha$. This is the
probability of rejecting the null hypothesis (i.e. that the process is operating in a state of
1997):
$$\alpha' = 1 - (1 - \alpha)^{p} \tag{1.8.1}$$
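A short numerical illustration of this inflation of the type I error is given below. It assumes that the monitored variables are independent and that each chart uses the same individual significance level; the function name and the numbers are illustrative only.

```python
def overall_type_i_error(alpha, n_charts):
    """Overall false-alarm probability for n_charts independent charts."""
    return 1.0 - (1.0 - alpha) ** n_charts


single_chart = overall_type_i_error(0.05, 1)
ten_charts = overall_type_i_error(0.05, 10)
```

Under these assumptions, ten charts each operated at the 5% level give an overall false-alarm probability of about 40%.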
(1980), is to use the PCs of the process data for multivariate monitoring. This has the
advantages of reducing the number of variables for monitoring and provides a known
process in control?’
3. The procedure should take into account the relationships among the
variables.
PCs of NIR spectra may be used for MSPC purposes. From these, sample variance-
covariance matrices of the process data may be estimated. This is known as ‘Control
phase 1’ (Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and is performed to
During control phase 1, control ellipses (Montgomery, 1997) are calculated for the PCs
extracted from the spectral data. Assuming that the data for n variables follow a
multivariate normal distribution, the probability function (Massart et al, 1988), f(x), is
given by:
$$f(x) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}\left[(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right]\right\} \tag{1.8.2}$$
where $\mu$ represents the population mean vector (the centroid in the pattern space), $\Sigma$ is
the population variance-covariance matrix. The square root of the expression in brackets
is the Mahalanobis distance, D:
$$D^{2} = (x-\mu)^{T}\Sigma^{-1}(x-\mu) \tag{1.8.3}$$
This method of classification assumes that each class may be modelled by a multivariate
normal distribution. $D^{2}$ is computed for each object, x, in the learning class and follows
a chi-squared distribution. This enables 95% confidence limits for the ellipse to be
distance between itself and each of the modelled classes’ centroids. In practice,
however, the population variance-covariance matrix requires estimation from the data.
$\bar{x}$ is the target class mean vector and S is the target class sample variance-covariance
matrix. This statistic has been shown to follow an F distribution (MacGregor et al,
1994; Neave, 1995) with upper control limit (UCL) given by (MacGregor et al, 1994):
(batches), $F_{\alpha}(n, m-n)$ is the upper 100$\alpha$% critical point of the F distribution with (n, m - n)
The Hotelling’s $T^{2}$ uses information from only current samples and is therefore
insensitive to small or moderate drift in the mean vector. The MEWMA (Lowry et al,
1992) control chart is a moving average control-chart which can show trends in the
process, such as drift and systematic variation. The MEWMA $Z_i$ is given by (Lowry et
al, 1992):
$$Z_i = \lambda x_i + (1-\lambda)Z_{i-1} \tag{1.8.6}$$
where $\lambda$ lies in the range 0 to 1; $x_i$ is the vector of the ith sample and $Z_{i-1}$ is the value of Z
for the (i-1)th sample. The value of $T_i^{2}$ plotted on the control chart is given by:
$$T_i^{2} = Z_i^{T}\Sigma_{Z_i}^{-1}Z_i \tag{1.8.7}$$
$$\Sigma_{Z_i} = \frac{\lambda}{2-\lambda}\left[1-(1-\lambda)^{2i}\right]\Sigma \tag{1.8.8}$$
This statistic uses the process mean score vector and sample variance-covariance matrix
determined in control phase 1. The upper control limits used with this chart are based on
tabulated data by Lowry et al. (1992) for PCA models with four or fewer PCs. PCA models
with a greater number of PCs may be assigned control limits based on the chi-squared
(1992).
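A minimal sketch of the MEWMA recursion for two monitored scores is given below. It assumes that the samples are expressed as deviations from the in-control mean, that the in-control covariance matrix is known, and that the exact covariance of the moving average given by Lowry et al. (1992) is used; the function name and the data are hypothetical.

```python
def mewma_t2(deviations, cov, lam=0.2):
    """Return the MEWMA T^2 value for each two-variable sample.

    deviations: samples as deviations from the in-control mean.
    cov: 2x2 in-control covariance matrix of the variables.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    z = [0.0, 0.0]
    t2_values = []
    for i, x in enumerate(deviations, start=1):
        # Exponentially weighted moving average of the sample vectors.
        z = [lam * xj + (1.0 - lam) * zj for xj, zj in zip(x, z)]
        # Scalar factor relating the covariance of z to the covariance of x.
        k = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        quad = (d * z[0] ** 2 - (b + c) * z[0] * z[1] + a * z[1] ** 2) / det
        t2_values.append(quad / k)
    return t2_values


identity = [[1.0, 0.0], [0.0, 1.0]]
# A sustained shift of one unit in the first score.
drift = mewma_t2([(1.0, 0.0)] * 5, identity, lam=0.2)
```

With a sustained shift the plotted value rises from sample to sample, which is the sensitivity to drift that the chart is intended to provide.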
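The sample generalised variance (Montgomery, 1997), |S|, is the determinant of the
sample variance-covariance matrix, S.
With two variables (in this case PCs), the UCL and mean control limit, CL are:
$$UCL = |\Sigma|\left(b_1 + 3b_2^{1/2}\right) \tag{1.8.9}$$
$$CL = b_1|\Sigma| \tag{1.8.10}$$
where, for samples of size n and p variables:
$$b_1 = \frac{1}{(n-1)^{p}}\prod_{i=1}^{p}(n-i) \tag{1.8.11}$$
and
$$b_2 = \frac{1}{(n-1)^{2p}}\prod_{i=1}^{p}(n-i)\left[\prod_{j=1}^{p}(n-j+2) - \prod_{j=1}^{p}(n-j)\right] \tag{1.8.12}$$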
However, the control limits for the sample generalised variance above are applicable
only for situations where two variables are monitored. With more than two variables,
(Anderson, 1984):
$$\sqrt{m}\left(\frac{|S|}{|\Sigma|} - 1\right) \tag{1.8.13}$$
Where S is the sample covariance matrix of a batch with n variables and m degrees of
mean of zero and variance 2n; $|\Sigma|$ is a theoretical sample generalised variance and is
equivalent to the mean sample generalised variance of all batches used in control phase
Where the process mean vector of batch data exceeds the upper control limit in
and MacGregor, 1996), if the PCA models have physical or chemical interpretation
(Nomikos and MacGregor, 1995) (i.e. if the loadings of the PCs appear to represent
the multivariate $T^{2}$. These contributions may be plotted as a stacked bar chart
(Jackson, 1980, 1981a, 1981b), or individually. Large values of f^i would indicate
In addition, Shewhart control charts may be constructed for the individual PC scores
(Jackson, 1991). These use the mean and range for each PC score sample, measured for
the historical data set. The overall or grand mean is equal to zero (unless additional
observations are projected into the model space) with each model, whilst the range is
value of the t-distribution (95% and 99% control limits). This is used instead of the
CHAPTER 2
Spectroscopy
2.1 Introduction
1988; Barth et al, 1987) because it influences bulk physical properties (Washington,
1992), and determines the ability of powders to flow, mix, granulate and dissolve. It is
Commonly employed methods of measurement are forward angle laser light scattering
(FALLS) and electrical zone sensing (Simmons, 1993). A disadvantage with these
methods is that samples generally need to be analysed away from the production area,
which is time consuming and leads to manufacturing delays (Hailey et al, 1996).
particle size is examined. Section 2.8 examines measurement of median particle size by
2.9 this method is further developed to measure the cumulative percentage frequency
using PLSR. Section 2.11 deals with classification of grades of powdered material by
cluster analysis methods.
pharmaceuticals has long been suggested (Ciurczak et al, 1986; Plugge and Vlies, van
der, 1993; Vlies, van der, 1996). Despite these suggestions, NIRS has remained largely
a technique used for chemical analysis (Dubois et al, 1987; Aucott et al, 1988, Cowe et
al, 1989, Dreassi et al, 1995a, 1995b, 1995c; Wargo and Drennen, 1996; Forbes et al,
1996).
Powdered pharmaceutical materials may be suited for NIR particle size determinations
since they are diffusely reflecting materials (Ciurczak et al, 1986). In the NIR region
(1000 nm to 2500 nm) these materials both absorb and scatter light, resulting in spectra
with non uniform baselines and varying offsets. These scatter effects vary with the
particle size (Vlies, van der, 1996), sample porosity (Hailey et al, 1996) (and hence
compaction pressure) and with the wavelength (Bull, 1991) and can be described using
Rayleigh and Mie theory (Kortum, 1969), or alternatively using the Kubelka-Munk
Previous studies that have examined the effects of particle size on NIR spectra have
demonstrated that reflectance varies non-linearly with particle size (Kortum, 1969;
Norris and Williams, 1984; Ilari et al, 1988; Ciurczak et al, 1986). Ciurczak et al
(1986) found that reflectance exhibited an inverse relationship with mean particle size in
agreement with Mie theory (Kortum, 1969). However, this relationship does not
necessarily apply in all cases and is dependent on the shape of the particle size
distribution of the sample (Kortum, 1969), the particle shape (Kortum, 1969) and the
material's refractive index (Kortum, 1969). The presence of very small particles will
further complicate the relationship as these may exhibit Rayleigh scatter, which is
proportional to the third power of the particle size (Kortum, 1969).
size has resulted in most work in this area focusing on chemometric calibration
methods, rather than theoretical models (Vlies, van der et al, 1995; Plugge and Vlies,
van der, 1996; Ilari et al, 1988). A novel method for classifying pharmaceutical powders
involved transforming second derivative NIR spectra to polar co-ordinates (Vlies, van
der, et al 1995). Each NIR spectrum was reduced to a single quality point in a plane,
that could therefore be plotted in Cartesian co-ordinates. Linear plots of the logarithm of
the particle size versus x or y co-ordinate were found to show significant correlation.
Multivariate calibration methods have proven most successful in calibrating NIR data to
measure particle size (Ilari et al, 1988). The first method to appear in the literature
produced NIR calibrations to determine particle size in organic and inorganic powders.
Importantly in this paper, a new method of scatter correction, MSC, was described (Ilari
et al, 1988). The scatter correction technique linearly regressed each spectrum to the
mean of the data set and resulted in two coefficients: an intercept and a slope
coefficient. For each spectrum, these were used to correct for scatter. Reflectance
spectra for each material were recorded at 19 wavelengths and transformed to Kubelka-
Munk function. PLSR models were produced using the transformed spectra and particle
measure particle size that has shown some success is artificial neural networks (Frake et
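The scatter correction described above (each spectrum regressed on the mean spectrum, giving an intercept and a slope that are then used to correct that spectrum) can be sketched as follows. The correction step used here, subtracting the intercept and dividing by the slope, is the form in which MSC is usually applied and is an assumption of this sketch; the function name and the spectra are hypothetical.

```python
def msc(spectra):
    """Scatter-correct each spectrum against the mean spectrum of the set."""
    n, p = len(spectra), len(spectra[0])
    mean_spec = [sum(s[j] for s in spectra) / n for j in range(p)]
    m_bar = sum(mean_spec) / p
    sxx = sum((v - m_bar) ** 2 for v in mean_spec)
    corrected = []
    for s in spectra:
        s_bar = sum(s) / p
        # Least squares fit: spectrum = intercept + slope * mean spectrum.
        slope = sum((m - m_bar) * (v - s_bar) for m, v in zip(mean_spec, s)) / sxx
        intercept = s_bar - slope * m_bar
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Hypothetical spectra differing only by an offset and a multiplicative factor.
base = [0.2, 0.4, 0.3, 0.6]
spectra = [base,
           [0.10 + 1.5 * v for v in base],
           [-0.05 + 0.5 * v for v in base]]
corrected = msc(spectra)
```

In this idealised case, where the spectra differ only in offset and slope, the corrected spectra coincide.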
Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co., St Louis,
USA) and paracetamol (Boots Pharmaceuticals, Nottingham, UK) were used.
Microcrystalline cellulose: Avicel PH 101 (16 batches), Avicel PH 102 (19 batches) and
Avicel PH200 (single batch) were all from FMC International, Wallingstown, Little
Island, Co Cork, Ireland. The batches of lactose monohydrate used were a single batch
of a reagent grade material (Avocado Research Chemicals Ltd., Heysham, UK), 9
samples of 110 mesh obtained from two manufacturers (DMV International, Veghel,
Netherlands and Lactose New Zealand, Hawera, New Zealand) and 18 samples of
Grades of microcrystalline cellulose used were: Avicel PH 101 (16 batches), Avicel
PH 102 (19 batches) and Avicel PH200 (single batch), all from FMC International,
Wallingstown, Little Island, Co Cork, Ireland. These batches of Avicel were those used
in Section 2.8.
The microcrystalline cellulose samples used were obtained from one supplier (FMC
were from six different grades with different particle size distributions and nominal
Cork, Ireland. The batches of lactose monohydrate used were: lactose Regular (n = 9
Foremost Ingredients Group, Wisconsin, USA). The batches of Avicel used were those
of Sections 2.8 and 2.9. The batches of lactose used were those of Section 2.8.
Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and
grinding the coarse bulk material with a mortar and pestle (approximately 50 g).
Samples were taken successively with every few minutes of grinding. The ground
Air-jet sieve fractions of the aspirin, anhydrous caffeine and paracetamol (in each case
approximately 50 g of material was used) were produced using an Alpine air-jet sieve
(Alpine, Augsburg, Germany) with stainless steel wire mesh sieves of different sieve
diameter (75, 56, 50, 40 and 36 µm). With use of the smallest air-jet sieve, an additional
PH 101, Avicel PH 102, Avicel PH200 and reagent grade lactose monohydrate were
produced by machine sieving (Endecotts Ltd., London, UK) for 20 minutes using a nest
of progressively finer stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm).
material was collected and used to fill a narrow soda glass vial (25 mm wide by 50 mm
Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution
A single narrow soda glass vial was filled with material for each powdered sample (n =
113) (approximately 8 g of material per vial). These were allowed to settle overnight.
obtained from the manufacturers. A single narrow soda glass vial was filled with
Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and
Measurement of The Cumulative Percentage Frequency Particle Size Distribution
The particle size distributions of sieve fractions and of the remaining batches of bulk
lactose 110 mesh, Fastflo, Avicel PH 101 and Avicel PH 102 were measured by FALLS
(Malvern 2600C, Malvern Instruments, Malvern, UK). A sample from each soda glass
vial (approximately 100 mg in each case) was suspended in a disperse medium in which
detergent) prior to particle sizing and was gently shaken using a vortex-mixer to prevent
formation of agglomerates. Avicel and aspirin samples were dispersed in cold, distilled
water using dilute detergent. Anhydrous caffeine and lactose monohydrate were
particle shapes were assessed by scanning electron microscopy using a Philips XL20
' Cumulative percentage frequency particle size data for Avicel samples are supplied in
Appendix D, CD-ROM.
Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution
Particle size distribution data for the microcrystalline cellulose samples were acquired
(Malvern Instruments, Malvern, UK) equipped with a Dry Powder Feeder. These data
Cork, Ireland. These data are supplied in Appendix D, CD-ROM (inside back cover).
The materials were not particle sized. Instead, they were classified by their nominal
grade. NIR spectral data are supplied in Appendix D, CD-ROM (inside back cover).
Sections 2.8, 2.9 and 2.11 Measurement of The Number Median Particle Size,
NIR measurements were made using a FT-NIR NIRVIS spectrometer (No. 100.1,
Buhler AG, Uzwil, Switzerland) fitted with a Buhler fibre-optic probe (No. 110.2).
Reflectance spectra were acquired by inserting the probe into the sample, and were
recorded over the range 4008 to 9996 cm⁻¹ (500 data points), each spectrum being the
average of six scans. The reflectance reference used was Spectralon (Labsphere Inc.,
North Sutton, NH, USA). Microcrystalline cellulose data (Sections 2.8, 2.9 and 2.11)
Near infrared reflectance measurements of the powdered samples were made using a
sample was centrally positioned on the window of the RCA stage, using an iris
mechanism, above the lead sulphide detectors. A diffuse reflectance spectrum of each
sample was recorded with the lid of the RCA closed. Each spectrum was the average of
32 scans and was recorded over the range 1100 to 2500 nm, at 2 nm increments (700
data points). The reflectance reference used was a ceramic tile. Data are supplied in
All programs were written in Matlab 5.2 Scientific and Technical Language, except for
the MLR program. This was written in-house in C and used routine svdfit, available in
Reflectance at any wavenumber versus number median particle size, $d_{50}$ or $1/d_{50}$,
exhibited a curvilinear relationship. To allow for this, two different approaches were
compared: single wavenumber quadratic least squares regression and full two-
wavenumber search MLR. With each of these calibration methods, different pre-
treatments of the NIR spectral and FALLS $d_{50}$ data were applied and their effects on
standard errors of calibration (SEC) and prediction (SEP) (Mark, 1991) observed:
$$SEC = \sqrt{\frac{\sum_{i=1}^{m}(Y_i-\hat{Y}_i)^{2}}{m-a-1}} \tag{2.8.1}$$
$$SEP = \sqrt{\frac{\sum_{i=1}^{m}(Y_i-\hat{Y}_i)^{2}}{m}} \tag{2.8.2}$$
where $Y_i$ is the measured value of the ith sample in the calibration or prediction set, $\hat{Y}_i$
is the calibration estimated or predicted value of the ith sample, a is the number of
prediction set. The effects of the data pre-treatments on bias and linearity were also
investigated.
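The two standard errors can be computed directly from measured and estimated values, as in the following sketch. The function names and the data are hypothetical; `n_terms` stands for the quantity a in the SEC denominator m - a - 1.

```python
from math import sqrt


def sum_squared_residuals(measured, predicted):
    return sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))


def sec(measured, predicted, n_terms):
    """Standard error of calibration with m - a - 1 degrees of freedom."""
    m = len(measured)
    return sqrt(sum_squared_residuals(measured, predicted) / (m - n_terms - 1))


def sep(measured, predicted):
    """Standard error of prediction for an independent prediction set."""
    return sqrt(sum_squared_residuals(measured, predicted) / len(measured))


measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
sec_value = sec(measured, predicted, n_terms=2)
sep_value = sep(measured, predicted)
```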
Quadratic least squares fits of NIR spectral and FALLS data were used to allow for
gentle curvature in calibrations. The NIR spectral data were diffuse reflectance (of
infinite thickness for all practical purposes), R; mean corrected reflectance (where the
mean reflectance value of an individual spectrum was subtracted from the reflectance at
each of its spectral wavenumbers); absorbance, log(1/R); and Kubelka-Munk function,
f(R). A search of all 500 data points was used to select the wavenumber giving the
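The spectral pre-treatments listed above can be sketched as simple element-wise transforms. The sketch assumes base-10 logarithms for absorbance and the usual form of the Kubelka-Munk function, f(R) = (1 - R)²/(2R); the function names and the reflectance values are hypothetical.

```python
from math import log10


def mean_correct(reflectance):
    """Subtract the mean reflectance of the spectrum from every point."""
    mean_r = sum(reflectance) / len(reflectance)
    return [r - mean_r for r in reflectance]


def absorbance(reflectance):
    """Apparent absorbance, log10(1/R)."""
    return [log10(1.0 / r) for r in reflectance]


def kubelka_munk(reflectance):
    """Kubelka-Munk function, (1 - R)^2 / (2R)."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in reflectance]


spectrum = [0.50, 0.10, 0.30]
centred = mean_correct(spectrum)
log_inverse_r = absorbance(spectrum)
km = kubelka_munk(spectrum)
```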
The second calibration technique applied was two wavenumber MLR (Osborne et al,
1993). A search of all combinations of two wavenumbers from the 500 measured by the
spectrometer was carried out. The NIR spectral data used were again reflectance, mean-
treatments investigated were $d_{50}$, $1/d_{50}$ and the ln(FALLS $d_{50}$). All possible
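An exhaustive two-wavenumber search of this kind can be sketched as follows: for every pair of wavenumbers a three-term least squares model (intercept plus two coefficients) is fitted and the pair with the smallest SEC is kept. This is an illustrative sketch in Python, not the in-house C program; the solver, the function names and the small synthetic data set are assumptions.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pair(r1, r2, y):
    """Fit y = b0 + b1*r1 + b2*r2; return the coefficients and the SEC."""
    cols = [[1.0] * len(y), r1, r2]
    ata = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    aty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    b0, b1, b2 = solve(ata, aty)
    ss = sum((yi - (b0 + b1 * u + b2 * v)) ** 2 for yi, u, v in zip(y, r1, r2))
    return (b0, b1, b2), sqrt(ss / (len(y) - 2 - 1))


def search_two_wavenumbers(spectra, y):
    """Return ((j, k), SEC, coefficients) for the pair with the lowest SEC."""
    columns = list(zip(*spectra))
    best = None
    for j, k in combinations(range(len(columns)), 2):
        try:
            coef, sec = fit_pair(list(columns[j]), list(columns[k]), y)
        except ValueError:
            continue  # skip collinear pairs
        if best is None or sec < best[1]:
            best = ((j, k), sec, coef)
    return best


# Synthetic data: y depends exactly on columns 0 and 2.
cols = [[0.1, 0.4, 0.3, 0.8, 0.5, 0.9], [0.7, 0.2, 0.9, 0.1, 0.6, 0.3],
        [0.5, 0.1, 0.6, 0.2, 0.9, 0.4], [0.3, 0.8, 0.2, 0.7, 0.1, 0.5]]
spectra = [list(row) for row in zip(*cols)]
y = [1.0 + 2.0 * u - 3.0 * v for u, v in zip(cols[0], cols[2])]
best = search_two_wavenumbers(spectra, y)
```

For 500 wavenumbers the same loop would visit 124,750 pairs, which is why the search is practical for two terms but grows quickly with the number of terms.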
Investigation with both calibration methods revealed that the reflectance data recorded
by the NIR spectrometer produced calibrations with significant correlation, low scatter
and low bias. Kubelka-Munk function data and absorbance also showed significant
correlation between NIR predicted $d_{50}$ and FALLS $d_{50}$; however, these pre-treatments
introduced significant fixed bias into the calibration. Reflectance data were therefore
subsequently used. The ln(FALLS $d_{50}$) was found to give two wavenumber MLR
calibrations with a lower SEC than $1/d_{50}$ or $d_{50}$ and were the FALLS data used in
To demonstrate the feasibility of the MLR calibration method, sieve fractions of a single
batch of three drugs were tried initially (aspirin, anhydrous caffeine and paracetamol).
(microcrystalline cellulose and lactose monohydrate) were produced from a larger data
set using either machine sieve fractions or a combination of machine sieve fractions and
bulk samples from a number of different batches. The quadratic least squares
and overtones arising from the fundamentals of the mid infrared, with non-uniform
baselines resulting from multiple scattering. Spectra also showed different offset values,
which appear to increase with wavenumber (Fig. 2.1). This has previously been
Reflectance at any wavenumber showed a generally inverse linear trend with median
particle size up to approximately 100 µm (Fig. 2.2) broadly agreeing with Mie and
Fraunhofer theory for particles of comparable size to the wavelength (Kortum, 1969).
Beyond this particle size, the relationship becomes markedly non-linear (Fig. 2.2). A
Fig. 2.1. NIR reflectance spectra for different median particle sizes. (A)
microcrystalline cellulose: (a) 24 µm, (b) 45.8 µm, (c) 93.4 µm, (d) 261 µm, (e)
406 µm and (B) lactose monohydrate: (a) 44.7 µm, (b) 66.3 µm, (c) 98 µm, (d)
132 µm, (e) 168 µm.
Fig. 2.2. Single wavenumber quadratic least squares fit of NIR spectral data and
median particle size, $d_{50}$: (A) microcrystalline cellulose reflectance data (9012
(C) lactose reflectance data (7428 cm⁻¹) and (D) lactose mean-corrected reflectance
quadratic least squares fit of the data (Fig. 2.2) between reflectance and median particle
showed useful correlations between the NIR predicted and FALLS $d_{50}$ values
Mean-correction of each spectrum was found to improve the correlation between NIR
predicted and FALLS $d_{50}$ with the microcrystalline cellulose data set (r = 0.98, 7128
cm⁻¹ (n = 57)). This pre-treatment acts to centre the data of individual spectra (Rmean =
0) and can help to eliminate baseline differences that occur as a result of variable
sample porosity and pressure applied with the fibre-optic probe. The variation in offset
will also be influenced by the flow properties of the material. Use of this pre-treatment
where the NIR measurements are recorded using a fibre-optic probe. However, with
lactose monohydrate (reagent grade, 110 mesh and Fastflo), which tends to have good
Pearce, 1986), this pre-treatment was not appropriate and gave a poorer fit between NIR
The results of these calibrations which applied MLR to all two wavenumber
between NIR reflectance and the ln(FALLS $d_{50}$) values (Fig. 2.3). The ln(FALLS $d_{50}$)
[Fig. 2.3: three panels plotted against FALLS $d_{50}$/µm; caption missing.]
$$\ln(\mathrm{FALLS}\ d_{50}) = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} \tag{2.8.3}$$
where $b_0$ is the intercept, $b_1$ and $b_2$ are the MLR coefficients for the two wavenumbers, $\lambda_1$
and $\lambda_2$ respectively. Significant linear association was found between NIR predicted
ln $d_{50}$ and ln(FALLS $d_{50}$) values in each case (r = 0.99 (n = 7), anhydrous caffeine: r =
2.8.5 Full Two Wavenumber Search MLR Using Microcrystalline Cellulose And
Lactose
With each of these materials, preliminary data processing revealed that MLR of
reflectance versus ln(FALLS $d_{50}$) produced the most linear calibrations. In addition,
mean-correction of these spectra was not found to improve calibration results. The MLR
Before calibrations were attempted for both materials, the data of each were split into
two sets: a calibration set of mainly sieved fractions (microcrystalline cellulose (n = 24)
and lactose (n = 15)) and a validation set of bulk samples (microcrystalline cellulose (n
= 33) and lactose (n = 18)). With each calibration set, highly significant correlation (p <
0.005) was obtained between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) values (Table 2.1).
However, the validation sets for each material exhibited more scatter (Table 2.1). With
the microcrystalline cellulose prediction set of bulk samples (Avicel grades PH 101 and
PH 102), significant correlation (p < 0.005) between NIR predicted ln $d_{50}$ and ln(FALLS
$d_{50}$) was obtained with a SEP greater than the SEC (Table 2.1). The high SEP is
possibly accounted for by the FALLS and scanning electron microscopy (SEM) results
Table 2.1. Microcrystalline cellulose and lactose MLR calibration (sieve fraction

                       Microcrystalline cellulose         Lactose monohydrate
b0*                    -3.99                              4.59
b1*                    59.65                              -172.3
b2*                    -57.99                             167.8
SEC (ln(d50/µm))       0.067                              0.097
SEP (ln(d50/µm))       0.17                               0.18
ln P = c + m ln(FALLS d50)
Calibration set
r                      0.99                               0.99
m                      0.99                               0.98
c                      0.035                              [illegible]
n                      24 (PH101, PH102, PH200 sieved)    15 (sieved and 110 ...

* MLR coefficients: b0 - intercept, b1 - wavenumber 1, and b2 - wavenumber 2.
r is correlation coefficient; m and c are slope and intercept of plots of NIR predicted ln d50 vs. FALLS measured ln d50; n is the
number of samples in each data set.
Fig. 2.4. SEM photographs. (A) Avicel PH101 bulk sample, (B) Avicel PH200 > 200
µm sieve fraction, (C) lactose monohydrate < 31 µm sieve fraction and (D) lactose
monohydrate > 150 µm sieve fraction.
(Fig 2.4A & 2.4B) which showed that these bulk samples had broad distributions and
comprised a mixture of irregularly shaped fines and large spherical particles. Previous
work (Kortum, 1969) has shown that this can produce more variable results than the use
Validation of the lactose calibration used bulk samples of Fastflo from 18 different
batches. This spray dried material generally has a relatively uniform and spherical
particle size (Pearce, 1986). A narrow range of $d_{50}$ was confirmed by FALLS (range of
$d_{50}$: 81.1 - 115.7 µm). Poor correlation was obtained between NIR predicted ln $d_{50}$ and
ln(FALLS $d_{50}$) with these samples and is probably due to the narrow range of particle
size in the prediction set as the SEP is not significantly different from that of the
SEM results of the lactose sieve fractions used in the calibration set showed small,
irregularly shaped fines in the smallest sieve fractions and large spherical particles in
the largest sieve fractions (Fig 2.4C & 2.4D), much the same as with the
2.8.5.2 Particle Size Calibration Using Randomised Sieve Fraction And Bulk Sample
Data
To produce working calibrations with the microcrystalline cellulose and lactose data
sets, the sieve fraction and bulk sample data were randomly assigned to either the
calibration set (67% of spectra) or validation set (33% of spectra). This procedure was
repeated three times to test the robustness of the method, giving three different
calibration and validation sets for the two materials. Both sieve fractions and bulk
overtone peaks; the selection of each wavenumber is therefore likely to have been
influenced by the random noise in each data set. With both materials, each of the three
MLR calibrations showed a good fit between NIR spectral and FALLS data
(microcrystalline cellulose: SEC (ln($d_{50}$/µm)) = 0.10 - 0.11^ and lactose monohydrate:
SEC (ln($d_{50}$/µm)) = 0.12 - 0.13^). This was confirmed by plots of NIR predicted ln $d_{50}$
cellulose: r = 0.98 (n = 38 for each set) and lactose monohydrate: r = 0.97 - 0.98^ (n =
22 for each set); in each case with p < 0.005) (Figs 2.5A & 2.6A). The three validation
sets for each material showed similar results (Figs 2.5B & 2.6B) with highly significant
(n = 19): r = 0.98, lactose monohydrate (n = 11): r = 0.93 - 0.97;^ in each case p <
0.005), and SEP comparable to SEC, microcrystalline cellulose: SEP (ln($d_{50}$/µm)) =
Distribution
^ The range gives the minimum and maximum values observed for the three randomly
selected calibration and validation sets.
[Figs 2.5 and 2.6: plots on logarithmic axes against FALLS $d_{50}$/µm; captions missing.]
2.9.1 Preliminary Investigation
In Section 2.7, it was shown that useful calibrations for median particle size can be
obtained by using NIR reflectance data with a logarithmic transform of the FALLS
particle-size data, hence these data have been used in this section.
With MLR calibrations, preliminary work showed that a 3 wavelength linear regression
at any of the FALLS quantiles produced calibrations more robust than a two wavelength
fit. It was therefore decided that three wavelength MLR calibrations would be employed
subsequently. With PCR models, three principal components were required to produce
satisfactory calibrations and this number was used for all subsequent calibrations.
The spectra of each powdered sample exhibited the effects of multiple scatter, as
The FALLS instrument gives values of the cumulative percentage frequency particle-
size distribution at 64 particle sizes (range: 564 to 5.8 µm), at intervals which follow a
geometric progression. For each sample, linear interpolation of the measured FALLS
values was used to calculate the particle size values corresponding to the 5, 10, 20, 30,
40, 50, 60, 70, 80, 90 and 95% quantiles (Appendix D, CD-ROM inside back cover).
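The interpolation step can be sketched as below. The sketch assumes that the cumulative percentage undersize is tabulated against ascending particle size and that interpolation is carried out on the linear size scale; the function name and the small distribution used are hypothetical.

```python
def size_at_quantile(sizes, cumulative_percent, quantile):
    """Particle size at a given cumulative percentage, by linear interpolation.

    sizes: particle sizes in ascending order.
    cumulative_percent: cumulative percentage undersize at each size.
    """
    points = list(zip(sizes, cumulative_percent))
    for (s0, c0), (s1, c1) in zip(points, points[1:]):
        if c0 <= quantile <= c1 and c1 > c0:
            return s0 + (quantile - c0) * (s1 - s0) / (c1 - c0)
    raise ValueError("quantile lies outside the measured distribution")


# Hypothetical cumulative distribution (sizes in µm).
sizes = [10.0, 20.0, 40.0, 80.0]
cumulative = [0.0, 25.0, 75.0, 100.0]
d50 = size_at_quantile(sizes, cumulative, 50.0)
d90 = size_at_quantile(sizes, cumulative, 90.0)
```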
The samples exhibited a wide range of particle sizes at each quantile (Table 2.2) and a
random for the calibration set; the remaining 23 samples were used as an independent
validation set. To aid comparison of the two calibration methods, the same calibration
Table 2.2. Particle size ranges at each quantile for the calibration and validation
2.9.4 Three-Wavelength Multiple Linear Regression
Data from the calibration samples were used to generate calibration equations for each
quantile by fitting the $\log_{10} d_x$ values to the NIR reflectance values according to
equation (2.9.1):
$$\log_{10} d_x = b_0 + \sum_{a=1}^{3} b_a R_{\lambda_a} \tag{2.9.1}$$
where $R_{\lambda_a}$ is the reflectance at wavelength $\lambda_a$ and $b_a$ the MLR coefficient for each
wavelength. The selection of
wavelengths was performed on a reduced data set of every other wavelength to reduce
the computation time required. A full 3 wavelength search for each particle size quantile
calibrated therefore used 250 of the 500 available wavelengths. This reduced the total
computation time for all eleven calibrations to about 10 hours (on an Acer Pentium II
333 MHz PC), compared with an estimated 80 hours if all 500 wavelengths had been
searched. Though setting up the eleven calibrations is time consuming, the cumulative
particle size distributions of future samples may be calculated from their NIR spectra
virtually instantaneously.
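The saving from searching every other wavelength can be checked by counting the three-wavelength combinations involved. This is an arithmetic illustration only; it assumes that computation time is proportional to the number of combinations fitted.

```python
from math import comb

combinations_reduced = comb(250, 3)  # every other wavelength
combinations_full = comb(500, 3)     # all measured wavelengths
ratio = combinations_full / combinations_reduced
estimated_full_hours = 10 * ratio    # scaled from about 10 hours for the reduced search
```

The ratio of roughly 8 is consistent with the estimate of about 80 hours for a search of all 500 wavelengths.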
For each calibration equation, the three chosen wavelengths (Table 2.3) were those
which gave the smallest standard error of calibration (SEC). The optimum wavelengths
were similar for the 30 to 60 percent quantiles, but varied somewhat for the extreme
quantiles. The calibration equations were then used to predict the validation set (n = 23)
Table 2.3. MLR wavelengths and PCs selected for each percentage quantile
calibration.
Table 2.4. MLR & PCR calibration and validation results at various percentage
quantiles.
Percentage  5  10  20  30  40  50  60  70  80  90  95
MLR
Calibration set (n = 34)
R  0.977  0.980  0.984  0.987  0.989  0.988  0.984  0.979  0.972  0.932  0.889
m  0.954  0.960  0.968  0.974  0.978  0.975  0.968  0.958  0.945  0.869*  0.791*
c  0.065  0.063  0.055  0.048  0.042  0.049  0.065  0.088  0.121  0.301*  0.497*
SEC (log10(d_x/µm))  0.084  0.071  0.055  0.046  0.039  0.039  0.042  0.046  0.052  0.080  0.104
CV (%)  19.3  16.3  12.7  10.6  9.0  9.0  9.7  10.6  12.0  18.4  23.9
PCR
Calibration set (n = 34)
R  0.969  0.973  0.978  0.980  0.981  0.981  0.976  0.968  0.959  0.898  0.858
m  0.939  0.946  0.956  0.960  0.963  0.961  0.953  0.937  0.921  0.806  0.737
c  0.086  0.084  0.076  0.074  0.070  0.077  0.096  0.133  0.174  0.445*  0.627*
SEC (log10(d_x/µm))  0.096  0.082  0.065  0.057  0.051  0.049  0.051  0.057  0.062  0.097  0.116
CV (%)  22.1  18.9  15.0  13.1  11.7  11.3  11.7  13.1  14.3  22.3  36.8
SEP (log10(d_x/µm))  0.085  0.062  0.051  0.050  0.041  0.045  0.056  0.057  0.071  0.094  0.124
CV (%)  19.6  14.3  11.7  11.5  9.4  10.4  12.9  13.1  16.3  21.6  28.5
R is the multiple correlation coefficient; m and c are the slope and intercept of plots of NIR predicted log10(d_x) vs. FALLS measured log10(d_x); n is the number of samples in each data
set; * m significantly different from 1, or c significantly different from 0; CV = coefficient of variation.
model. This consists of a set of new variables which are uncorrelated and represent
The PCA model was obtained as the product of a score matrix, T, with a loadings
matrix, P, plus a residuals matrix, E, according to equation (1.7.1), from variable mean-
centred spectral data. Regression of FALLS data was as described above for MLR,
except that PC scores were used in place of reflectance values (Naes and Martens,
1988). For each calibration, the 3 PCs selected were those that gave the highest
correlations with the FALLS data (Table 2.3). Computing the PCs and the PCR
calibration equations was much faster than for MLR, requiring only about 20
minutes.
With both methods, individual calibrations were the most precise at the 40% and 50%
quantiles (Table 2.4). This is clearly seen from the plot of SEC versus percentage
quantile (Fig. 2.7). The falling off in the precision of individual calibrations at the
extreme quantiles most probably reflects the shape of the distribution curves for the
particle-sizes in the calibration sets. The shapes of the distributions become more
With both MLR and PCR excellent calibration results were obtained, with low SEC
(Table 2.4). The SECs at each quantile are smaller with MLR; however, the standard
errors of prediction (SEPs) for the independent validation set are smaller with the PCR
model (Table 2.4). This suggests that the PCR model is more robust. Table 2.4 also
gives the slopes and intercepts for the plots of NIR predicted log10(d_x) versus FALLS
measured log10(d_x) values at each quantile. The slopes and intercepts were not
significantly (5% probability level) different from 1 and 0 respectively, apart from a few
values (marked with an asterisk) which occurred at some of the extreme quantiles.
[Fig. 2.7. Plot of SEC versus percentage quantile; x-axis: Percentage quantile.]
2.9.7 Cumulative Particle Size Distributions
The percentage quantile value was plotted against the NIR predicted log10(d_x) of each
sample in the calibration and validation sets to give cumulative particle-size distribution
curves for both the MLR and PCR methods. The MLR and PCR results for the first 4
validation samples are shown in Fig. 2.8, which also shows the FALLS measured
calibration methods closely follow those obtained by FALLS, although PCR predicted
distributions match the FALLS measured distributions more closely than with MLR.
In this work, the number of quantiles at which calibration equations were set up was
restricted to 11. In principle, more or fewer could be used. With the present data sets the
errors do not justify the need for smaller intervals (Table 2.4).
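The way a cumulative curve is read off the eleven quantile predictions can be sketched as follows. This is a minimal illustration, not the method used in this work: the function name, the linear interpolation between quantiles, the clamping outside the calibrated range and the example log10 sizes are all assumptions made for the sketch.

```python
from bisect import bisect_left

# The eleven percentage quantiles at which calibrations were set up
QUANTILES = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

def cumulative_percent(size_um, log10_sizes, quantiles=QUANTILES):
    """Interpolate the cumulative % undersize at a given particle size.

    log10_sizes holds the predicted log10 particle size (µm) at each quantile
    and is assumed to increase monotonically.
    """
    sizes = [10 ** v for v in log10_sizes]
    # Outside the calibrated quantiles the value is simply clamped (assumption)
    if size_um <= sizes[0]:
        return float(quantiles[0])
    if size_um >= sizes[-1]:
        return float(quantiles[-1])
    j = bisect_left(sizes, size_um)
    x0, x1 = sizes[j - 1], sizes[j]
    q0, q1 = quantiles[j - 1], quantiles[j]
    return q0 + (q1 - q0) * (size_um - x0) / (x1 - x0)

# Hypothetical predicted log10 sizes for one sample (not thesis data)
predicted = [1.0, 1.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.2, 2.4]
median_check = cumulative_percent(10 ** 1.7, predicted)
```

With only 11 quantiles, the interpolation scheme chosen between them affects the curve shape more than it would with finer intervals.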
[Fig. 2.8. Cumulative particle-size distribution curves for the first four validation samples; y-axis: Cumulative % frequency.]
2.10 Measurement of The Percentage Frequency Particle Size Distribution
The spectra showed the effects of multiple scatter, as described in section 2.8.2. This is
due to differences in particle size and surface moisture content (Kortum, 1969) (Fig.
2.9).
to the spectral data and their effects on calibration and prediction precision were
compared with results for raw absorbance (log(1/R)) data. The data pre-treatments
tested were:
1. SNV;
2. DT;
3. SNV-DT;
4. Sg2d11.
spectra beyond 2200 nm, the wavelength range used with this pre-treated data was
*due to Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm (n = 546 data points).
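As a hedged illustration of the first pre-treatment listed above, the standard normal variate (SNV) transform centres each spectrum on its own mean and scales it by its own standard deviation. The function name, the use of the sample standard deviation and the absorbance values are assumptions for this sketch; the thesis does not give its implementation.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample SD (n - 1 denominator); an assumption
    return [(a - mu) / sd for a in spectrum]

# Hypothetical absorbance values (not thesis data)
raw = [0.30, 0.32, 0.35, 0.41, 0.38]
shifted = [a + 0.10 for a in raw]   # same spectrum with an additive baseline offset
scaled = [2.0 * a for a in raw]     # same spectrum with a multiplicative scatter effect

snv_raw = snv(raw)
snv_shifted = snv(shifted)
snv_scaled = snv(scaled)
```

Both the offset and the scaled copies give the same SNV spectrum as the original, which is why the transform suppresses scatter-related differences between spectra.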
2.10.3 Preliminary Data Analysis
In Section 2.8 it was shown that NIR data may be calibrated to measure the percentage
cumulative frequency particle size distribution of this powdered material by MLR and
by PCR. That was an extension of the NIR method for calibration of median particle
size, described in Section 2.7. With the data sets and chemometric techniques used
previously, it was not found possible to produce accurate calibrations for the percentage
frequency particle size distribution by MLR or PCR. However, with this larger NIR-
particle size data set, preliminary investigation showed that this could be calibrated for
by PLSR - although it was not also possible to calibrate the larger data set to measure
geometric progression. For calibration purposes, the mean value between the low and
high particle sizes for a given channel was used (range: 0.9 - 448.34 µm). Preliminary
investigation of the total particle size data set revealed that the largest particle size
channel had values which were zero for all samples. Since the variance in this channel
was zero, and therefore could not be modelled by PLSR, it was removed from the
Biased regression models for particle size distribution data were produced with raw and
pre-treated NIR spectral data by partial least squares regression (PLSR2), according to
equations (1.7.18) to (1.7.23). The number of PLS dimensions, A, for each of the
pre-treated data sets (X, Y) was estimated by cross-validation, using the PRESS statistic.
For this, the first 110 of 113 observations (X, Y) were divided into 11 subsets of 10
observations. Next, calculation of PLS models of rank A was performed for all
combinations of 10 of the 11 subsets of data. With each PLS model, the remaining
subset of NIR spectral and particle size data was used to test the goodness of fit of the
model (A PLS dimensions) by measuring the sum of squares between model predicted
and reference particle size data. This approach of dividing the data into subsets was
time (less than one minute compared with 30 minutes for a 'leave-one-out' cross
validation). The number of PLS components required to extract the information from X
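The subset cross-validation described above can be sketched as below. This is an illustration under stated assumptions rather than the code used in this work: the subsets are taken as contiguous blocks, `fit` and `predict` are hypothetical callables standing in for a PLS model of rank A, and a simple mean predictor is used so that the example runs without a PLS library.

```python
def make_subsets(n_obs=110, n_subsets=11):
    """Split the first n_obs observation indices into equal contiguous subsets."""
    size = n_obs // n_subsets
    return [list(range(i * size, (i + 1) * size)) for i in range(n_subsets)]

def press(X, Y, fit, predict, subsets):
    """Accumulate the predicted residual error sum of squares over all held-out subsets."""
    total = 0.0
    for held_out in subsets:
        held = set(held_out)
        train = [i for s in subsets for i in s if i not in held]
        model = fit([X[i] for i in train], [Y[i] for i in train])
        for i in held_out:
            y_hat = predict(model, X[i])
            total += sum((p - r) ** 2 for p, r in zip(y_hat, Y[i]))
    return total

# Placeholder model (assumption): predicts the column means of the training Y
def fit_mean(X_train, Y_train):
    n = len(Y_train)
    return [sum(col) / n for col in zip(*Y_train)]

def predict_mean(model, x):
    return model

subsets = make_subsets()
X = [[float(i)] for i in range(110)]      # hypothetical spectra
Y = [[1.0, 2.0] for _ in range(110)]      # hypothetical particle size data
press_value = press(X, Y, fit_mean, predict_mean, subsets)
```

In practice the function would be called once per candidate rank A, and the rank giving the lowest PRESS retained.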
With the exception of the model produced with Savitzky-Golay 2nd derivative data, 6
PLS components were required to fit the data and give the lowest PRESS value (Table
2.5). Clearly, the Savitzky-Golay smoothed second derivative did remove more scatter
and baseline drift information from the NIR data than the other pre-treatments, requiring
only 4 components to model the data. With all data pre-treatments tested, typical
idealised plots of PRESS versus number of PLS components were obtained, with clear
minima. This is shown for the absorbance data set in Fig. 2.10. The SNV transformation
provided a model with the lowest PRESS value (Table 2.5). All other pre-treated NIR
data sets produced models with low PRESS values except for absorbance data which
had the highest PRESS value which was 33% higher than for the model derived from
Table 2.5. PLS results of cross validation, calibration and prediction for the NIR
Fig. 2.10. Predicted residual error sum of squares (PRESS) for successive PLS
components extracted using NIR absorbance and laser diffraction data.
PRESS is the predicted residual error sum of squares between NIR predicted and reference measured particle size data.
MSEP is the mean square error of prediction for predicted particle size data, rescaled by standard deviation and mean of reference
particle size data used to calculate model.
RMSEP is the root mean square error of prediction for predicted particle size data, rescaled by standard deviation and mean of
reference particle size data used to calculate model.
2.10.6 Calibration And Validation Precision
To test the robustness of the PLS models, the NIR-particle size data set was split into a
calibration and a validation set. For PLS modelling, 90 spectra and particle size results
(80%) were randomly selected for calibration. The remaining 23 samples (20%) were
used to test the predictive abilities of the models. PLS models for raw and pre-treated
NIR-particle size data were created, each having the number of components determined
by cross validation. The predictive ability of the models was determined by calculation
of the mean square error of prediction (MSEP) (Beebe et al, 1998) and root mean square
error of prediction (RMSEP) (Beebe et al, 1998) between PLS model predicted and
$$\mathrm{MSEP} = \frac{1}{mn}\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\hat{y}_{ij} - y_{ij}\right)^{2} \qquad (2.10.1)$$
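A small worked sketch of the two prediction statistics is given here. It assumes that the squared differences are summed over all n validation samples and all m particle size channels and divided by mn, with RMSEP taken as the square root of MSEP; the function names and the numbers are illustrative only.

```python
import math

def msep(predicted, reference):
    """Mean square error of prediction over n samples (rows) and m channels (columns)."""
    n = len(reference)
    m = len(reference[0])
    ss = sum((p - r) ** 2
             for p_row, r_row in zip(predicted, reference)
             for p, r in zip(p_row, r_row))
    return ss / (m * n)

def rmsep(predicted, reference):
    """Root mean square error of prediction."""
    return math.sqrt(msep(predicted, reference))

# Hypothetical percentage frequency values: 2 samples x 2 channels
pred = [[1.0, 2.0], [3.0, 4.0]]
ref = [[1.0, 1.0], [3.0, 2.0]]
example_msep = msep(pred, ref)    # (0 + 1 + 0 + 4) / 4
example_rmsep = rmsep(pred, ref)
```

Under this definition the two statistics are tied together: an MSEP of 0.82 corresponds to an RMSEP of about 0.91.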
The best predictive model was obtained using absorbance data which produced lowest
prediction errors: MSEP = 0.82% and RMSEP = 0.90%. The SNV, DT and SNV-DT
pre-treatments also produced models which showed low prediction errors (range: 1.0 to
1.1%), however these ranged from 14 to 19% higher than those obtained with
absorbance data (Table 2.5). The Sg2d11 transformation produced a model with far
higher prediction errors, with an RMSEP 87% higher than that obtained with absorbance
data (Table 2.5). Plots of percentage frequency particle size distributions for the 23
validation samples are shown in Fig. 2.11 and show the results obtained by laser
diffraction with the NIR predicted results overlaid. The linear association between NIR
predicted and laser diffraction measured particle size percentage frequency values for
the entire validation set (all 23 samples and 31 channels) was found to have a highly
2.11 Classification of Excipient Grades by Cluster Analysis Methods
Cluster analysis and pattern recognition methods have been used to identify
multivariate methods have not been investigated for classification of different grades of
are classified and identified by cluster analysis (Massart et al, 1988) of their NIR
spectra. Two different chemometric methods are compared: knn using reflectance
values at all combinations of two wavelengths and score values of combinations of two
The reflectance spectra for all samples showed characteristic non-uniform baselines
arising from multiple scatter. Between grades of the same material, the differences in
offset and baseline curvature were consistent with the median particle sizes of the
grades. Hence with the two grades of microcrystalline cellulose, the spectra fall into two
groups (Fig. 2.13 A). With lactose monohydrate, the Regular grade material from two
different manufacturers had median particle sizes greater and less than for lactose
Fastflo, hence spectra for the Regular grade appear at both higher and lower reflectance
[Fig. 2.13. Reflectance spectra of the excipient grades (legend labels: Regular, Fastflo, PH101, PH102); x-axis: Wavenumber/cm⁻¹.]
2.11.2 Spectral Data Pre-treatments
spectrometry were applied to the spectral data and their effects on classification were
compared against results for reflectance data. The data pre-treatments tested were:
1. SNV;
2. DT;
3. SNV-DT;
4. Sg2d11.
procedure (Massart et al, 1988). It involves computing the Euclidean distance between
an unknown sample’s pattern vector and n samples in a given training set resulting in n
(2.11.1)
where m represents the number of variables. The unknown sample is assigned to
whichever class is most represented among its k nearest training samples. The value of k is determined by optimisation
to determine the best prediction ability. Small values are normally preferred, with typical
values for k of 3 or 5 having been reported. It has also been reported that with large data
sets this method is capable of yielding classification results close to or better than more
complicated methods, despite its simplicity (Massart et al, 1988). In this work,
classification has been attempted using spectral values at pairs of wavelengths and
principal components (PCs) scores on two components. With both materials, each
sample of a grade has been treated as an unknown and classified according to the class
The algorithm used searched all possible combinations of pairs of wavelengths and with
PCA, pairs of PCs. At each of the two-wavelength and two-PC combinations, the
classification ability was determined (i.e. the numbers of correct identifications and the
for the two groups and their total number were recorded. The procedure was also
The choice of using two wavelengths and PCs was made to maintain mathematical
simplicity and also because principal components analysis often results in clusters
which are discernible on plots of two PCs, frequently with the first two components.
Hence, the results of the two wavelength and two PC approaches could be directly
compared.
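The search over pairs of variables can be sketched as follows. This is a minimal illustration and not the algorithm as implemented in this work: each sample is treated in turn as an unknown, classified by k nearest neighbours (Euclidean distance) using only the two chosen variables, and a pair is kept when every sample is identified correctly. The function names, the tie-breaking rule and the toy data are assumptions.

```python
from collections import Counter
from itertools import combinations
import math

def knn_predict(train, labels, x, k):
    """Assign x to the class most represented among its k nearest training samples."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], x))
    votes = Counter(labels[i] for i in order[:k])
    # On a tied vote the class of the nearest neighbour wins (assumption)
    return votes.most_common(1)[0][0]

def loo_correct(data, labels, cols, k):
    """Count samples identified correctly when each is left out and treated as unknown."""
    pts = [[row[c] for c in cols] for row in data]
    correct = 0
    for i, x in enumerate(pts):
        train = pts[:i] + pts[i + 1:]
        lab = labels[:i] + labels[i + 1:]
        if knn_predict(train, lab, x, k) == labels[i]:
            correct += 1
    return correct

def suitable_pairs(data, labels, k):
    """Return every pair of variables giving 100% correct identification."""
    n_vars = len(data[0])
    return [pair for pair in combinations(range(n_vars), 2)
            if loo_correct(data, labels, pair, k) == len(data)]

# Toy data (hypothetical): variables 0 and 1 separate the grades, variable 2 is noise
data = [[0.10, 0.20, 5.0], [0.12, 0.22, 1.0], [0.11, 0.19, 9.0],
        [0.90, 0.80, 5.0], [0.88, 0.82, 1.0], [0.91, 0.79, 9.0]]
labels = ["A", "A", "A", "B", "B", "B"]
pairs_k1 = suitable_pairs(data, labels, k=1)
```

The same routine applies unchanged to PC scores: the columns of `data` are then score values rather than spectral values, and the number of pairs to search is far smaller.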
The number of wavelengths included in the search was 500 for all pre-treatments except
the Savitzky-Golay 2nd derivative which used 489 wavelengths. With PCA, the number
of PCs searched depended on the rank of the model. With each model, this was
determined by calculation of the % sum of squares (%SS) (Lindberg et al, 1983) and the
R statistic (Wold, 1978) for successive PCs extracted (Tables 2.6 & 2.7 ).
For the two grades of microcrystalline cellulose, 100% correct identification of the
samples was obtained using pairs of wavelengths for all spectral pre-treatments (Table
2.8). Optimisation of k produced excellent classification results with a value of just one
Table 2.6. knn results for microcrystalline cellulose grades PH101 (n = 15) and
PH101 PH102
(k = 1)
Reflectance 4 99.33 15 13
SNV 4 94.36 15 13
Detrend 4 94.31 12 14 -
(k = 2)
Reflectance 4 99.33 13 13
SNV 4 94.36 14 15
Detrend 4 94.31 8 14
Table 2.7. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo
Regular Fastflo
(k = 1)
Reflectance 2 99.44 8 8
SNV 3 94.68 7 9
(k = 2)
Reflectance 2 99.44 7 8
SNV 3 94.68 6 9
Detrend 4 96.07 8 9
%SS is the sum of squares of the differences between the NIR data and the NIR data
estimated by the principal components analysis model.
or two nearest neighbours. With a value for k of two, reflectance data yielded only 12
pairs of wavelengths out of 124,750 possible combinations which were suitable (0.01%)
(Table 2.8). The best data pre-treatment was the Savitzky-Golay 2nd derivative. With k
equal to one, 100% correct identification of the two grades was obtained for 8381
2.8).
For the two grades of lactose monohydrate, 100% correct identification was obtained
with all pre-treatments except for reflectance data (Table 2.9). These results could be
obtained with a value for k of one or two. The best data pre-treatment for this material
was the quadratic baseline detrend. This returned 34,543 possible combinations of
wavelength pairs that yielded 100% correct identification (27.7%) with k = 1 (Table
2.9).
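The counts quoted above can be checked with a short calculation. The sketch assumes that the 124,750 combinations are the unordered pairs drawn from the 500 wavelengths searched; the helper name is illustrative.

```python
import math

# Number of unordered wavelength pairs from 500 wavelengths
n_pairs = math.comb(500, 2)

def percent_suitable(count, total=n_pairs):
    """Percentage of all wavelength pairs that gave 100% correct identification."""
    return 100.0 * count / total

mcc_reflectance_k2 = percent_suitable(12)      # microcrystalline cellulose, reflectance, k = 2
lactose_detrend_k1 = percent_suitable(34543)   # lactose monohydrate, quadratic detrend, k = 1
```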
For both materials, the computation time of a full two-wavelength search was
Between two and four PCs were required to model the pre-treated data sets for both
materials (Tables 2.6 & 2.7). These numbers of PCs were used in the two PC search.
For microcrystalline cellulose grades, best classification was obtained using PCs 1 and 4
(Fig 2.14A) and a value of one for k*. The only pre-treated data set which yielded 100%
correct identification for all samples in both classes was the Savitzky-Golay 2nd
derivative data set. This pre-treatment agrees with the optimum pre-treatment for the
Data Correct identification Suitable wavelength pairs Example wavelength pair/cm⁻¹
PH101 PH102
(k = 1)
(k = 2)
Table 2.9. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo
Data Correct identification Suitable wavelength pairs Example wavelength pair/cm⁻¹
Regular Fastflo
(k = 1)
Reflectance 9 8 8916, 9936
(k = 2)
Reflectance 8 9 - 9624, 9996
Fig. 2.14. k nearest neighbour (knn) and Hotelling's T² classification (95% control
ellipses) of microcrystalline cellulose and lactose monohydrate grades using
principal components scores. Microcrystalline cellulose (Savitzky-Golay 2nd
derivative of reflectance data): A) knn (100% correct identification, k = 1), B)
Hotelling's T² control ellipses (overlapped); lactose monohydrate (quadratic
baseline corrected reflectance data): C) knn (100% correct identification, k = 1), D)
Hotelling's T² control ellipses (overlapped).
For lactose monohydrate grades, the best classification results were obtained using PCs
1 and 2 (Fig 2.14C) or 2 and 3 and a value of one for k and quadratic baseline corrected
spectra (Table 2.7). This yielded 100% correct identification for all samples in both
grades. This optimum pre-treatment also confirms that found for the method using pairs
of wavelengths.
Overall, the method using PC scores was preferred owing to the greatly reduced
computation time required to set-up the method (approximately one minute). Prediction
2.11.7 Hotelling's T² Control Ellipses
ellipses (Montgomery, 1997) were produced for both materials with the PCs that
yielded best identification, as described in Section 1.8.1. The value of 100α was set to
95%. With both microcrystalline cellulose and lactose monohydrate, the 95%
Hotelling's T² ellipses for the two grades were found to overlap, resulting in some
incorrect assignment of samples for each grade (Fig. 2.14B & D). The two materials are
To confirm whether the Hotelling's T² control ellipses are able to distinguish the two
materials, principal components models were constructed for each material using the
pre-treatments found optimum with knn. Hotelling's T² control ellipses were produced
for the samples of the material (both grades) used to calculate the PC model. With both
pre-treated spectra were centred using the mean of the data set for the model. These
were then projected into the pattern space. For both materials, 100% correct
identification of the modelled class was obtained with all samples for the non-modelled
material falling outside the 99% control ellipse (Fig. 2.15A & B).
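The test of whether a projected sample falls inside a control ellipse can be sketched for two PC scores as below. This is an illustration under stated assumptions, not the procedure of Section 1.8.1: the T² statistic is computed from the mean and sample covariance of the modelled class, and the control limit is passed in as a number because the critical value from the F-distribution is not computed here. All names and data are hypothetical.

```python
def mean_and_cov(scores):
    """Mean vector and sample covariance terms (sxx, sxy, syy) of 2-D score pairs."""
    n = len(scores)
    mx = sum(s[0] for s in scores) / n
    my = sum(s[1] for s in scores) / n
    sxx = sum((s[0] - mx) ** 2 for s in scores) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in scores) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in scores) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def t_squared(point, centre, cov):
    """Hotelling's T² of one point: (x - mean)' S^-1 (x - mean) for a 2 x 2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx = point[0] - centre[0]
    dy = point[1] - centre[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def inside_ellipse(point, centre, cov, limit):
    """True when the point lies on or within the control ellipse defined by the limit."""
    return t_squared(point, centre, cov) <= limit

# Hypothetical scores of the modelled class on two PCs
scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centre, cov = mean_and_cov(scores)
t2_example = t_squared((2.0, 1.0), centre, cov)
```

A sample of the non-modelled material would be centred with the model mean and projected into the same score space before the test is applied.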
2.12 Conclusion
The shapes of near infrared spectra (i.e. baseline curvature, spectral offset, absorption
peak height and width) of powdered pharmaceutical materials are clearly influenced by
the particle size distribution of the material. This enables calibration of the data to
measure and classify the particle size of materials. Extraction of particle size
spectral data with reference particle size data. This was found to be most accurate with
raw spectral data and by PLSR. Alternatively, materials may be classified by grade
using simple non-parametric methods, e.g. knn. This method does not require
reference particle size data, however scatter correction of the spectral data was found to
be necessary for the two materials studied. This is most likely necessary to reduce
within class variance and increase between class variance. As the optimum scatter
correction was found to be different for the two materials studied in this chapter,
implementation of this method for other materials requires that the optimum scatter
[Fig. 2.15. Principal components score plots with control ellipses; recovered labels: Microcrystalline cellulose, Lactose; axis: PC4 scores.]
CHAPTER 3
3.1 Introduction
collection and analysis of process intermediates and final product to ensure that the
product meets the requirements of its specification (Candolfi et al, 1999a, 1999b;
Sekulic et al, 1998). This approach to process monitoring is time consuming and does
In this chapter, an alternative method for pharmaceutical quality control that enables
Section 3.5, near infrared spectrometric analysis is used to provide process based
measurements at each process stage. Section 3.7 describes the use of the data reduction
method, principal components analysis (PCA), for the multivariate statistical process
control methods used. In Sections 3.8 and 3.9, multivariate models are generated and
monitored for the first process stage (blend) and second process stage (tablet). The
combined process measurements are used in Section 3.10 to develop an overall process
fingerprint from which unusual product batches may be identified. In Section 3.12
conclusions are drawn as to the advantages of this form of quality control over
traditional methods.
The manufacturing process of the marketed tablet can be viewed as three steps:
1. Raw materials dispensing.
2. Powder blending.
3. Tabletting of blend.
The powdered raw materials used in the blends are: microcrystalline cellulose, sodium
starch glycolate, dibasic calcium phosphate anhydrous, drug substance and magnesium
stearate. The first four of these raw materials are blended together for a set time before
being passed through a Fitzmill screen, to remove lumps from the blender contents.
Magnesium stearate is then added to the blender, and the contents are further blended
for a specified time. At the end of the blend stage, samples of blend are collected for
before proceeding to tabletting. At this stage, the process can be delayed as samples
Tabletting of the blend produces one of two strengths of tablet, depending on the fill
weight and size of die chosen. Samples of tablets are then analysed by reference
The pharmaceutical process studied was one which was normally well controlled.
were produced which were of lower quality. Some of these problem batches were
identifiable from reference analysis of the blends, however most could not be. These
produced batches of tablets which were friable, had high average tablet thickness and
which showed prolonged tablet dissolution time. All of these unusual batches of tablets
had analytical results within the limits of the product specification, however they each
showed results close to the limits and were therefore considered unusual.
3.3 Materials
A total of 205 production batches of blends and tablets were examined. These were
supplied by the manufacturer (Pfizer Ltd., Pfizer Central Research, 1 Ramsgate Road,
The blends analysed were 193 different batches of the total number of batches studied.
Of these batches, 13 were re-blended batches (as a result of unusual reference analysis
results) and one was a placebo batch. With each batch of blend, specimens were
supplied which were taken from different locations within the blender (Flobin). The
number of these sampling locations was either three or, if the batch was
re-blended, seven. Samples were collected by the Quality Assurance
laboratory over a six year period. A total of 1881 blend NIR absorbance spectra were
measured.
The tablets obtained were of two strengths of the drug substance. Both are
manufactured from an identical blend composition and hence differ in size. The
batches used comprised 44 batches of lower strength tablets and 43 batches of higher strength
tablets. The tablets themselves were white and were embossed with the Pfizer logo on
one side and embossed with distinguishing lettering on the other side.
For the two strengths of tablet, 39 batches of lower strength tablets were supplied with
their corresponding blends and 41 batches of the higher strength tablets were supplied
3.4 Reference Analytical Data
Reference laboratory data were provided for most batches of blends and tablets.
pathlength.
replicate assays for a batch, from their mean value, expressed as a percentage. Moisture
for a batch from their mean value, expressed as a percentage. These tests were performed
for blends used to produce the lower and higher strength tablets (n = 193 batches).
3. moisture (%);
4. dissolution (%);
5. disintegration/secs;
9. friability/mg.
The tablet tests are standard pharmacopoeial tests and were performed for the lower
strength tablets (n = 44 batches) and for the higher strength tablets (n = 43 batches).
Summaries of the reference analytical data for the blends (Section 3.8) and lower and
higher strength tablets (Section 3.9) are provided in Tables 3.1, 3.2 and 3.3 respectively.
Summaries of the combined blend and tablet data sets (multiway data sets*) used in
Section 3.10 are provided in Tables 3.4 and 3.5 respectively. The batch numbers of
product, and their indices in data sets (blend (Section 3.8), lower strength tablet (Section
3.9), higher strength tablet (Section 3.9), and multiway blend and tablet data sets
(Section 3.10)) are provided in Table 3.6 (N.B. some C. of A. reference analysis results
* Multiway data sets are three dimensional data sets that may be unfolded to form a two-dimensional array.
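Unfolding of a three dimensional data set can be sketched as below. The arrangement assumed here (batches by spectra by wavelengths, unfolded so that each batch becomes one row) is an illustrative choice; the thesis does not state the direction of unfolding at this point, and the function name and data are hypothetical.

```python
def unfold(cube):
    """Unfold a batches x spectra x wavelengths array into batches x (spectra * wavelengths)."""
    return [[value for spectrum in batch for value in spectrum] for batch in cube]

# Hypothetical data: 2 batches, 3 spectra per batch, 4 wavelengths per spectrum
cube = [[[b * 100 + s * 10 + w for w in range(4)] for s in range(3)] for b in range(2)]
flat = unfold(cube)   # 2 rows of 12 values
```

Once unfolded, each batch is a single observation, so the two-dimensional array can be treated by the same principal components analysis used for ordinary spectral data.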
Table 3.1. Summary of blend C. of A. data (n = 193 batches).
Variable                        Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg g⁻¹   24.98       0.41                25.8         24.2         185
Blend uniformity (%)            0.88        0.63                3.0          0.0          185
Moisture content (%)            2.98        0.23                4.0          -            185
Moisture deviation (%)          2.23        1.64                5.0          0.0          172
A(1%, 1 cm)                     115.32      1.19                -            -            182

Variable                        Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg       4.98        0.08                5.15         4.85         40
Content uniformity (%)          1.70        0.93                6.0          0.0          40
Moisture content (%)            3.14        0.31                4.5          -            40
Dissolution (%)                 99.15       1.88                100          90           40
Disintegration/secs             9.63        0.95                600          -            40
Mean weight/mg                  199.52      0.37                -            -            40
Hardness/kPa                    13.60       0.94                17           9            40
Thickness/mm                    3.31        0.04                3.5          3.2          40
Friability/mg                   0.28        0.60                -            -            40

Variable                        Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg       10.00       0.10                10.3         9.7          43
Content uniformity (%)          1.36        0.69                6.0          0.0          43
Moisture content (%)            3.13        0.32                4.5          -            43
Dissolution (%)                 98.85       1.20                100          90           43
Disintegration/secs             10.71       1.19                600          -            43
Mean weight/mg                  399.70      0.65                -            -            43
Hardness/kPa                    14.32       0.71                17           9            43
Thickness/mm                    4.58        0.05                4.6          4.1          43
Friability/mg                   1.58        1.16                -            -            43
Table 3.4. Summary of multiway lower strength blend and tablet C. of A. data (n = 39 batches).
Table 3.5. Summary of multiway higher strength blend and tablet C. of A. data (n = 41 batches).
Table 3.6. Batch numbers and indices of blend, tablet and multiway blend and
Batch number   Blend batch index   Lower strength tablet batch index   Higher strength tablet batch index   MPCA/MBPLS lower strength batch index   MPCA/MBPLS higher strength batch index
1 1 0 0 0 0
967 2 0 0 0 0
968 3 0 0 0 0
969 4 0 0 0 0
973 5 0 0 0 0
974 6 0 0 0 0
975 7 0 0 0 0
976 8 0 0 0 0
983 9 0 0 0 0
984 10 0 0 0 0
985 11 0 0 0 0
989 12 0 0 0 0
990 13 0 0 0 0
993 14 0 0 0 0
997 15 0 0 0 0
998 16 0 0 0 0
999 17 0 0 0 0
1006 18 0 0 0 0
1009 19 0 0 0 0
1010 20 0 0 0 0
1015 21 0 0 0 0
1016 22 0 0 0 0
1017 23 0 0 0 0
1024 24 0 0 0 0
1025 25 0 0 0 0
1029 26 0 0 0 0
1031 27 0 0 0 0
1034 28 0 0 0 0
1469 29 0 0 0 0
1969 30 0 0 0 0
1971 31 0 0 0 0
1973 32 0 0 0 0
1975 33 0 0 0 0
1977 34 0 0 0 0
1979 35 0 0 0 0
1981 36 0 0 0 0
1983 37 0 0 0 0
1987 38 0 0 0 0
1990 39 0 0 0 0
1991 40 0 0 0 0
1992 41 0 0 0 0
1997 42 0 0 0 0
1998 43 0 0 0 0
1999 44 0 0 0 0
2000 45 0 0 0 0
2005 46 0 0 0 0
2007 47 0 0 0 0
2011 48 0 0 0 0
2011 (re-blend) 49 0 0 0 0
2017 50 0 0 0 0
2019 51 0 0 0 0
2021 52 0 0 0 0
2023 53 0 0 0 0
2025 54 0 0 0 0
2027 55 0 0 0 0
2029 56 0 0 0 0
2031 57 0 0 0 0
2033 58 0 0 0 0
2035 59 0 0 0 0
2039 60 0 0 0 0
2041 61 0 0 0 0
2043 62 0 0 0 0
2045 63 0 0 0 0
2047 64 0 0 0 0
2049 65 0 0 0 0
2051 66 0 0 0 0
2055 67 0 0 0 0
2057 68 0 0 0 0
2061 69 0 0 0 0
2063 70 0 0 0 0
2967 71 0 0 0 0
2967 (re-blend) 72 0 0 0 0
2969 73 0 0 0 0
2981 74 0 0 0 0
2983 75 0 0 0 0
2985 76 0 0 0 0
2987 77 0 0 0 0
2988 78 0 0 0 0
2989 79 0 0 0 0
2990 80 0 0 0 0
2994 81 0 0 0 0
3011 82 0 0 0 0
4050 0 0 1 0 0
4058 83 0 0 0 0
4059 84 0 0 0 0
4978 0 0 2 0 0
4980 0 0 3 0 0
4983 0 1 0 0 0
4984 0 2 0 0 0
4985 0 3 0 0 0
4986 0 4 0 0 0
4987 0 5 0 0 0
4988 0 6 0 0 0
4989 0 7 0 0 0
4990 0 0 4 0 0
4991 85 0 5 0 1
4992 86 0 6 0 2
4993 87 8 0 1 0
4994 88 9 0 2 0
4995 0 10 0 0 0
4996 89 11 0 3 0
4997 90 12 0 4 0
4998 91 13 0 5 0
4999 92 0 7 0 3
5000 93 0 8 0 4
5001 94 0 9 0 5
5002 95 14 0 6 0
5003 96 15 0 7 0
5004 97 0 0 0 0
5005 98 0 0 0 0
5006 99 0 0 0 0
5007 100 0 0 0 0
5008 101 0 0 0 0
5008 (re-blend) 102 0 0 0 0
5009 103 0 10 0 6
5010 104 0 11 0 7
5011 105 0 12 0 8
5012 106 0 0 0 0
5013 107 16 0 8 0
5014 108 0 0 0 0
5015 109 0 0 0 0
5016 110 17 0 9 0
5017 111 18 0 10 0
5018 112 19 0 11 0
5019 113 0 0 0 0
5020 114 0 0 0 0
5021 115 0 13 0 9
5022 116 0 14 0 10
5023 117 0 15 0 11
5024 118 0 16 0 12
5025 119 0 0 12 0
5025 (re-blend) 120 20 0 13 0
5026 121 21 0 14 0
5027 122 22 0 15 0
5028 123 0 0 16 0
5028 (re-blend) 124 23 0 17 0
5029 125 24 0 18 0
5030 126 0 0 0 0
5031 127 0 17 0 13
5032 128 0 18 0 14
5033 129 0 19 0 15
5035 130 25 0 19 0
5036 131 26 0 20 0
5037 132 27 0 21 0
5038 133 28 0 22 0
5039 134 0 0 23 0
5039 (re-blend) 135 29 0 24 0
5040 136 30 0 25 0
5041 137 31 0 26 0
5042 138 32 0 27 0
5043 139 0 0 0 16
5043 (re-blend) 140 0 0 0 17
5043 (re-blend) 141 0 20 0 18
5044 142 0 21 0 19
5045 143 0 0 0 0
5045 (re-blend) 144 0 0 0 0
5045 (re-blend) 145 0 0 0 0
5045 (re-blend) 146 0 0 0 0
5045 (re-blend) 147 0 0 0 0
5045 (re-blend) 148 0 0 0 0
5046 149 0 22 0 20
5047 150 0 23 0 21
5048 151 0 24 0 22
5049 152 0 25 0 23
5050 153 0 26 0 24
5051 154 0 27 0 25
5052 155 0 28 0 26
5053 156 0 29 0 27
5054 157 0 30 0 28
5055 158 33 0 28 0
5056 159 34 0 29 0
5057 160 35 0 30 0
5058 161 0 31 0 29
5059 162 0 32 0 30
5060 163 0 33 0 31
5061 164 36 0 31 0
5062 165 37 0 32 0
5063 166 0 0 0 0
5064 167 38 0 33 0
5065 168 39 0 34 0
5067 169 40 0 35 0
5068 170 0 34 0 32
5069 171 0 35 0 33
5070 172 0 36 0 34
5071 173 0 37 0 35
5072 174 0 0 0 0
5073 175 0 38 0 36
5074 176 0 39 0 37
5075 177 41 0 36 0
5076 178 42 0 37 0
5077 179 43 0 38 0
5078 180 0 40 0 38
5079 181 0 41 0 39
5080 182 0 42 0 40
5081 183 0 43 0 41
5967* 184 44 0 39 0
5968 185 0 0 0 0
5969 186 0 0 0 0
5970 187 0 0 0 0
5970 (re-blend) 188 0 0 0 0
5971 189 0 0 0 0
5972 190 0 0 0 0
5972 (re-blend) 191 0 0 0 0
5974 192 0 0 0 0
5975 193 0 0 0 0
* Example - blend BN5967 has: batch index 184 in the blend data set; batch index 44 in the lower strength
tablet data set and batch index 39 in the multiway PCA and multiblock PLS data sets.
3.5 Near Infrared Measurements
In this study, NIR measurements were made using a grating instrument (Foss
NIRSystems, Maryland, USA) equipped with either a diffuse reflectance module (Rapid
module records spectra over the range 1100 to 2498 nm, at 2 nm intervals (700 data
points) and outputs spectra as absorbance (log10(1/R)). The reflectance reference used
with the RCA module was a ceramic tile. The transmission module records spectra over
the range 600 to 1898 nm, at 2 nm intervals (650 data points) and outputs spectra as
absorbance (log10(1/T)). The spectra recorded with the transmission module used air as
the reference.
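The numbers of data points quoted for each module follow directly from the scan range and the 2 nm interval, as the short check below shows. The helper name is illustrative; the third range is the truncated 1110 to 2200 nm range used with Savitzky-Golay second derivative data in Chapter 2.

```python
def n_points(start_nm, stop_nm, step_nm=2):
    """Number of data points on an inclusive wavelength grid."""
    return (stop_nm - start_nm) // step_nm + 1

reflectance_points = n_points(1100, 2498)   # diffuse reflectance module
transmission_points = n_points(600, 1898)   # transmission module
truncated_points = n_points(1110, 2200)     # truncated second derivative range
```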
All spectra recorded were the average of 50 scans, which is the default setting on the
Blends were scanned using the diffuse reflectance RCA module (as described in Section
2.6), in narrow soda-glass vials (henceforth these data are referred to as absorbance).
For each specimen, 3 soda glass vials (50 mm deep by 25 mm wide) were filled with
The tablets were scanned in both diffuse reflectance (RCA module, described in Section
2.6) and transmission modes (Intact module). Transmission measurements of the two
strengths of tablet were made by placing a tablet of either strength into a specially
machined aluminium template, with a circular hole underneath the tablet to allow
transmission of NIR radiation. For each measurement, the template and tablet were
placed inside the module, between the fibre optic probe and the lead sulphide detectors,
and the lid closed. To allow consistency of measurement, each tablet was scanned with
the Pfizer logo adjacent to the source of incident NIR radiation. Henceforth, diffuse
reflectance measurements of tablets are referred to as absorbance data; transmission
as transmission data. The numbers of NIR spectra measured for the lower strength
tablets were 4911 absorbance spectra (mean = 112 tablets per batch) and 4904
transmission spectra (mean = 111 tablets per batch). The numbers of NIR spectra
measured for the higher strength tablets were 2716 absorbance spectra (mean = 63
tablets per batch) and 2721 transmission spectra (mean = 63 tablets per batch). Spectra
As NIR spectral data are often highly collinear, they generally require pre-processing -
2. SNV;
3. DT;
4. SNV-DT;
5. Sg2d11.
These 4 pre-treatments were applied to blend absorbance data, and absorbance and
transmission data for each of the two strengths of tablet. This allowed for a total of 55
data sets (including multiway data sets of appended blend and tablet data) to be studied
via principal components analysis (equation (1.7.1)) (Sections 3.6.2 and 3.6.3).
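As an illustration of the SNV pre-treatment listed above, the following is a minimal sketch that assumes the usual definition of the transform: each spectrum is centred on its own mean and divided by its own standard deviation across wavelengths. The function name `snv` and the absorbance values are hypothetical, and the sketch is not the Matlab code used in this work.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator) taken across wavelengths
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]

# Hypothetical absorbance values at five wavelengths, for illustration only
raw = [0.42, 0.45, 0.51, 0.60, 0.58]
corrected = snv(raw)

# A spectrum with an added offset and a multiplicative scatter factor
# gives the same SNV result, which is why SNV reduces scatter variation
scattered = snv([2.0 * a + 1.0 for a in raw])
```

Because the transform is applied spectrum by spectrum, it needs no reference set and can be applied to new measurements without refitting.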
The data were analysed using code programmed in Matlab 5.2 Scientific and Technical
Programming Language (The Mathworks Inc., Natick, MA, USA).
Spectral Characteristics
All spectra exhibited non-uniform baselines arising from multiple scatter. The SNV
transform was able to correct for variation in scatter within each data set. SNV coupled
with DT was able to correct for both variation in scatter within each data set as well as
correct for non-uniform baselines. The Sg2d11 transformation was able to correct both
the spectra which in the raw data were obscured as overlapped combinations and
Absorbance Spectra
The wavelength range scanned was used to generate PCA models for absorbance, SNV,
DT and SNV-DT pre-treated spectral data. However, with the Sg2d11 transform, due to
the increased noise beyond 2200 nm, the wavelength range used was truncated to 2200
nm^
Transmission Spectra
data), the wavelength range selected was 750 to 1208 nm for higher strength tablets and
750 to 1350 nm for lower strength tablets. These wavelength ranges were selected to
provide spectra within the dynamic range of the detector* and were different for the two
^ Owing to Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm.
* The dynamic range of the detector is approximately 2 absorbance units on a baseline. The maximum absorbance at any wavelength
should not exceed 6 absorbance units.
strengths of tablet due to different tablet thickness. With the Sg2d11 transformation, this
PCA models were generated for all data pre-treatments (absorbance/transmission, SNV,
DT, SNV-DT and Sg2d11) of each data set: blends (absorbance data); lower strength
tablets; higher strength tablets (absorbance data and transmission data for each tablet
strength); multiway data sets containing blend and tablet data (absorbance and/or
transmission). To eliminate systematic variation in the data due to scatter effects from
sample presentation geometry and variable optical properties of the soda glass vials, the
average spectrum for each batch was used for PCA. This was found to produce more
acceptable PCA models, requiring both fewer components and less cross validation
computation time.
With each PCA model, the variable mean-centred spectral data, X, were decomposed as
the product of a score matrix, T, and a loadings matrix, P, according to equation (1.7.1)
The rank of each PCA model was determined by ‘leave-one-out’ cross-validation and
involved calculation of predicted residual error sum of squares (PRESS) and R and W
(Eastment and Krzanowski, 1982) statistics. In addition, the cumulative percentage sum
of squares (%SS) (Lindberg et al, 1983) for each extracted PC was calculated and a chi-
squared significance test was performed on the eigenvalues (Jackson, 1991) (Appendix
B). Once generated, these PCA models were amenable to multivariate statistical process
^ Owing to Sg2d11 transformation, the wavelength ranges used were: 760 to 1196 nm with transmission spectra of higher strength
tablets and 760 to 1338 nm with transmission spectra of lower strength tablets.
Blend And Tablet PCA Model Loadings
With the blend data-set, the PCA loadings for each pre-treatment appeared to contain
physical and chemical information. An example of the PCA loadings for absorbance
spectra (n = 193 batches) is given in Fig. 3.1. The first two loadings appeared to
represent physical information, and showed a general reduction in value across the
wavelength range. One PC’s loadings represent absorptions at 1930 nm and 1410 nm,
which are characteristic absorptions of water (Osborne et al, 1993) (Fig. 3.1E).
The loadings of PCA models generated from tablet absorbance and transmission data
also appeared to contain physical and chemical information. An example of the loadings
of absorbance and transmission spectra of lower strength tablets is given in Fig. 3.2 and
Fig. 3.3 respectively (n = 44 batches). The loadings of the third PC of the lower strength
tablet absorbance spectra also seemed to contain features characteristic of water (Fig.
3.2C).
In section 3.6.2, the method of PCA is described for the blend and tablet stage data sets.
The PC scores from these data sets may be monitored by MSPC procedures so that
deviations in process performance at each stage may be identified. However, it was also
considered desirable to examine how the process performed overall. Multiway PCA
(MPCA) (Kresta et al, 1991; Skagerberg et al, 1992; Nomikos and MacGregor, 1994) is
a data reduction method that enables the data of an entire chemical process to be
described by a few latent variables and was therefore examined. The variability among
batches with respect to their variables and time variation were studied by MPCA which
In MPCA, the three-way array of spectral data, X, may be decomposed into a series of A
[Figure: eight PCA loading plots, panels A to H; x-axis: Wavelength/nm (1200 to 2400 nm); y-axis: PC loading.]
Fig. 3.1. Blends NIR absorbance spectra PCA loadings (n = 193 batches): A) PC1;
B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8.
[Figure: eight PCA loading plots, panels A to H; x-axis: Wavelength/nm (1200 to 2400 nm); y-axis: PC loading.]
Fig. 3.2. Lower strength tablet NIR absorbance spectra PCA loadings (n = 44
batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8.
[Figure: nine PCA loading plots, panels A to I; x-axis: Wavelength/nm (800 to 1300 nm); y-axis: PC loading.]
Fig. 3.3. Lower strength tablet NIR transmission spectra PCA loadings (n = 44
batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7; H) PC8 &
I) PC9.
residual array, E, which is as small as possible in a least squares sense (Bharati and
\[ X = \sum_{a=1}^{A} t_a \otimes P_a + E \qquad (3.6.1) \]
performing ordinary PCA (Section 3.6.2) on the unfolded array, X (Wold et al, 1987). In
this chapter, data reduction was achieved by performing MPCA on variable mean-
centred unfolded three-way data sets (blend and tablet NIR data [absorbance and/or
transmission]). Two-way arrays of data were produced by slicing each three-way array,
X, vertically and appending the tablet absorbance and/or transmission data set(s) to the
right of the blend absorbance data set. This provided two-way arrays of rows of
observations (i.e. batch) versus columns equal to the number of combined wavelengths.
For the lower strength tablet process, the two-way arrays therefore contained NIR
spectral data of 39 batches of blends and their corresponding tablets. The higher
strength tablet process two-way arrays therefore contained NIR spectral data for 41
batches of blends and their corresponding tablets. PCA was then performed on these
unfolded three-way arrays, as for the discrete data sets. The models were derived from
batch average spectra with rank determined as for two-way PCA models (Section 3.6.2).
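The appending and mean-centring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: each block holds one batch-average spectrum per batch, the rows of the two blocks refer to the same batches in the same order, and the function names and numerical values are hypothetical.

```python
def append_blocks(blend, tablet):
    """Append tablet batch-average spectra to the right of the blend spectra, row by row."""
    if len(blend) != len(tablet):
        raise ValueError("each block needs exactly one row per batch")
    return [b_row + t_row for b_row, t_row in zip(blend, tablet)]

def mean_centre(rows):
    """Subtract each column (variable) mean from the unfolded two-way array."""
    n = len(rows)
    col_means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, col_means)] for row in rows]

# Hypothetical data: 3 batches, 4 blend wavelengths and 2 tablet wavelengths
blend = [[0.10, 0.20, 0.30, 0.40],
         [0.12, 0.21, 0.33, 0.41],
         [0.11, 0.19, 0.31, 0.39]]
tablet = [[0.50, 0.60],
          [0.52, 0.61],
          [0.49, 0.62]]

# Rows stay as batches; columns become the combined wavelengths of both stages
unfolded = mean_centre(append_blocks(blend, tablet))
```

Ordinary PCA can then be applied to `unfolded` exactly as for a single-stage data set.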
The effects of the different data pre-treatments on MPCA models were also
NIR spectra of the blends and tablets were used for MSPC purposes. The dimensionality
of these data were first reduced by subjecting them to PCA (1.7.1) (Section 1.7). Using
the resulting PC scores, sample variance-covariance matrices of the process at each
stage and of the overall process, were estimated. This is known as ‘Control phase 1’
(Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and was performed to
establish statistical control levels. The method employed a Monte Carlo search as an
within statistical control (Section 3.8.3). The optimisation procedure was therefore used
to estimate both the theoretical process mean PC score vector and the theoretical sample
Once these were estimated, all batches examined for each model were then treated as
future production samples for control phase 2 monitoring (Alt, 1985). This involved
monitoring the PC scores of these batches, at each process stage, by means of residual
analysis, using the sample variance-covariance matrix and process mean PC score
vector determined in control phase 1. The ability of the method to monitor the process
overall, from the raw materials through to the finished dosage form, was determined.
The generalised Hotelling’s T² was calculated for the average spectrum of each batch
with the PCA models examined, according to equation (1.8.4), with the
upper control limit, UCL, for this statistic calculated according to equation (1.8.5). The
value of α of the upper 100α% critical point of the F distribution was set to 0.95
This statistic was calculated for the batches examined with each PCA model, and used
the process mean score vector and sample variance-covariance matrix determined in
control phase 1. The upper control limits used with this chart were based on tabulated
data by Lowry et al. (1992) for PCA models with 4 or fewer PCs. PCA models with a
greater number of PCs were assigned control limits based on the chi-squared
(1992).
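The statistic is not named in the surviving text at this point, but the citation of Lowry et al. (1992) and the caption of Fig. 3.7 suggest a multivariate exponentially weighted moving average (MEWMA) of the PC scores. A minimal sketch of that chart statistic follows, assuming that the in-control mean score vector is zero, that the PC scores are uncorrelated (so the covariance matrix is diagonal), and that the smoothing constant is 0.2. All three are assumptions made for illustration, not values taken from this work.

```python
def mewma_t2(scores, variances, lam=0.2):
    """MEWMA statistic for successive PC score vectors with a diagonal covariance."""
    z = [0.0] * len(variances)  # EWMA vector starts at the assumed target of zero
    out = []
    for i, x in enumerate(scores, start=1):
        # Exponentially weighted moving average of the score vectors
        z = [lam * xj + (1.0 - lam) * zj for xj, zj in zip(x, z)]
        # Covariance of z at step i is c times the covariance of the scores
        c = (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * i))
        out.append(sum(zj * zj / (c * vj) for zj, vj in zip(z, variances)))
    return out

# Hypothetical sustained shift of one standard deviation on two PCs
shifted = mewma_t2([[1.0, 1.0]] * 5, variances=[1.0, 1.0])
# Scores exactly on target give a statistic of zero at every step
on_target = mewma_t2([[0.0, 0.0]] * 3, variances=[1.0, 1.0])
```

With a sustained shift the statistic grows from batch to batch, which is the behaviour that makes a chart of this kind sensitive to slow drift in the process mean.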
All PCA models calculated in this chapter comprised more than two PCs, hence process
To implement this MSPC procedure, 9 spectra were selected at random for each batch,
from the library of data. This number was chosen as it was greater than the maximum
number of retained PCs in any model (8 PCs) and was equivalent to the lowest number
of samples measured for a blend (9 spectra). This number was also used for tablet
models and is close to the number used in British Pharmacopoeial assay methods (9
tablets out of 10 must have uniformity of content within 80 and 120% of specified
amount) (British Pharmacopoeia 1999, 1999) - however more tablet spectra could have
been used. Multiway PCA models which combined blend and tablet data were limited to
The randomly selected spectra were mean-centred and projected onto the eigenvectors
for each model. The resulting PC scores were then used to calculate Anderson’s
Preliminary control charts for this sample generalised variance were constructed with
various statistical control limits, however with most PCA models, this approach did not
and variance 2n (where n is the number of PCs) because some batches had significant
generalised variances above 99.9% limits. To overcome this problem, a recursive
algorithm was implemented for control phase 1 which excluded these batches from the
calculation of the theoretical sample generalised variance. The control phase 1 process
terminated when the mean of Anderson’s asymptotic normal approximation for the
batches used in this calculation was 0. This was found to be effective, and in control
be out of control were examined. The results of these plots may be found in Appendix B
In addition, Shewhart control charts were constructed for the individual PC scores
(Jackson, 1991). These used the mean and range for each PC score sample, measured
for the historical data set. The overall or grand mean was, in most cases, equal to zero
(unless additional batches were projected into the model space) with each model, whilst
the range was equal to the standard deviation of the observations multiplied by the
corresponding value of the t-distribution (95% and 99% control limits). With blends, a
similar approach to the PC Shewhart method was adopted to diagnose PCs responsible
for significant process variance for batches which showed significant multivariate
determine the PCs on which significant variance occurred, the data set of each batch
was first sub-group mean-centred. The range was calculated from the centred batch data
3.7.5 Generation of PCA Models of Normal Batches: Q Statistic PCA Residual Analysis
The Q statistic was used to identify batches which had not processed in the normal
manner (Nomikos and MacGregor, 1994) - and which were located outside the plane
defined by the principal axes. The implementation of this control chart on any PCA
model, involved an initial screening of the model with rank determined by the cross-
validation R statistic. Batches whose Q values exceeded 95% and 99% limits were
deemed to have processed unusually and were excluded from the 'normal' data set. A
full PCA cross-validation was then performed on the updated data set, and the control
limits for Q were recalculated based on the rank of the updated model. Those batches
which were excluded from the model were centred and projected onto the eigenvectors,
and their residuals and Q statistics calculated. Additional batches falling beyond the
99% limits were excluded, as were any batches whose Q values were above the 95%
control limits and were consecutive in product batch number with any batches already
eliminated from the 'normal' data set. This process was repeated until all unusual
batches had Q values exceeding the 99% level with no consecutive batches having Q
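The residual calculation and one pass of the exclusion rule described above can be sketched as follows. The sketch assumes orthonormal loading vectors and a spectrum that has already been centred; the control limits are passed in rather than computed, since their formula is not reproduced in this section. Adjacency in list position stands in for "consecutive in product batch number", and all names and values are hypothetical.

```python
def q_statistic(x, loadings):
    """Sum of squared residuals of a centred spectrum x after projection onto the PCs."""
    # Scores: projection of x onto each orthonormal loading vector
    scores = [sum(xi * pi for xi, pi in zip(x, p)) for p in loadings]
    # Part of x reproduced by the retained PCs
    fitted = [sum(t * p[j] for t, p in zip(scores, loadings)) for j in range(len(x))]
    return sum((xi - fi) ** 2 for xi, fi in zip(x, fitted))

def flag_unusual(q_values, limit95, limit99, excluded):
    """One screening pass: flag batches above the 99% limit, or above the 95%
    limit when adjacent in batch order to a batch that is already excluded."""
    already = set(excluded)
    flagged = set(already)
    for i, q in enumerate(q_values):
        if q > limit99:
            flagged.add(i)
        elif q > limit95 and ({i - 1, i + 1} & already):
            flagged.add(i)
    return flagged

# Hypothetical two-PC model in a three-variable space
loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
q_in_plane = q_statistic([0.3, -0.2, 0.0], loadings)   # lies in the model plane
q_off_plane = q_statistic([0.3, -0.2, 0.5], loadings)  # 0.5 units off the plane

# Hypothetical Q values and limits: batch 2 is already excluded
second_pass = flag_unusual([0.1, 0.6, 1.2, 0.6, 0.1], 0.5, 1.0, excluded={2})
```

In the procedure described in the text, a pass of this kind would be followed by re-running the cross-validation on the reduced data set and recalculating the limits before the next pass.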
A visual inspection of PC loadings confirmed that all PCs selected with the R statistic
extra PC loadings appeared more random and were deemed to represent noise. The W
statistic which is another popular stopping criterion tended to select more components
than R before terminating and was therefore considered less useful. The cumulative
%SS explained by models was in excess of 99.5% for all models except the Sg2d11 data
set. This model explained only 94%SS, probably owing to increased noise from the
derivative transformation.
Between 6 and 8 PCs were required. The absorbance data set required most PCs, its first
two component loadings appear to represent scatter information, and show a general
decrease across the wavelengths (Fig. 3.1A & B). With all PCA models, at least one PC
has high positive loadings at 1400 and 1900 nm and may represent water. This is found
on the 5th PC with absorbance data, 2nd PC with SNV data; and 2nd PCs with DT
data; 1st PC with SNV-DT data and on the first PC with Sg2d11 data. The PC loadings
of the models also showed significant correlations with NIR absorbance spectra of the
Examination of the PC scores for all data sets reveals systematic variation which can be
clearly observed on the first PC for all PCA models. With the first PC for each model, the
batches were clearly divided into three main groups over time: low level; high level and
optimum level (about zero vector). This is shown for the PC scores of blend Sg2d11
data (Fig. 3.4). This systematic process variation on the first PC was considered to relate
to a change in the efficiency of the milling step during blending. Over the period of high
systematic process variation, the batches of drug substance used were more brittle and
less free flowing. These were consequently harder to mill. This was later confirmed by
With this statistic, similar batches for all data sets were found to have processed
unusually and thereby not fit the plane defined by the model. The placebo blend (29)
was easily identified as an outlier with all models. Consideration of results from data
analysis of multiway PCA models suggested that the results of SNV-DT and Sg2d11
models, which were consistent, provided the best indication of unusual blends. This is
[Figure: seven control charts of PC scores, panels A to G; x-axis: Batch; y-axis: PC scores.]
Fig. 3.4. PC Shewhart control charts for PCA scores of NIR Savitzky-Golay 2nd
derivative of absorbance spectra of blends (n = 193 batches). A) PC1 scores; B)
PC2 scores; C) PC3 scores; D) PC4 scores; E) PC5 scores; F) PC6 scores & G) PC7
scores (95% and 99% control limits shown).
shown for the SNV-DT data set in Fig. 3.5. With these models, a number of consecutive
batch numbers were found to have processed unusually (Batches: 125 (BN5029), 126
(BN5030); 167 (BN5064), 168 (BN5065), 169 (BN5067); 180 (BN5078), 181
based on data from previous successful batch runs. This entails estimation of an 'in-
quality data may be available from which a suitable set of reference data may be
selected and an alternative approach is required to identify those batches from which the
‘in control’ process covariance matrix may be estimated. The approach employed here
typically 2 or 3 more batches than the number of retained PCs - from which a sample
covariance matrix and 99% Hotelling’s T² limits were calculated. The Hotelling’s T²
distances of these batches were then measured and if all batches were located within the
99% confidence ellipsoid, control phase 2 was implemented. This involved measuring
the T² for all other batches and recording the frequency of each batch failing the test.
Typically this process was repeated 100 times and the results were plotted as a
frequency bar chart of outliers. These outlier batches were excluded from control phase
1; the remaining batches were used to estimate the process covariance matrix and grand
mean vector. If however any batch fell outside the 99% control ellipsoid, it was
removed from the phase 1 data set and the covariance matrix re-estimated. Once all
[Figure: Q statistic control chart; x-axis: Batch.]
Fig. 3.5. Q Statistic control chart of PCA model of blends (n = 193 batches) (NIR
SNV-DT absorbance spectra) (95% and 99% limits shown).
control batches were found to lie within the 99% ellipsoid, the Hotelling’s T² was
calculated for all batches and plotted as a control chart. Batches exceeding 95 and 99%
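The random search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure as implemented in this work: only two PCs are used so that the covariance matrix can be inverted by hand, the two T² limits are passed in as hypothetical numbers instead of being derived from equation (1.8.5), and a draw in which any trial batch exceeds the phase 1 limit is discarded outright rather than having the offending batch removed and the covariance re-estimated. The scores are synthetic.

```python
import random

def mean_cov(points):
    """Sample mean vector and 2x2 covariance (sxx, sxy, syy) of 2-D score vectors."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    return (mx, my), (sxx, sxy, syy)

def t2(point, mean, cov):
    """Squared statistical distance d' S^-1 d for a 2-D score vector."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def monte_carlo_search(scores, m, limit1, limit2, n_searches=100, seed=0):
    """Count how often each batch fails phase 2 over repeated random phase 1 sets."""
    rng = random.Random(seed)
    freq = [0] * len(scores)
    accepted = 0
    for _ in range(n_searches):
        trial = set(rng.sample(range(len(scores)), m))
        mean, cov = mean_cov([scores[i] for i in trial])
        if cov[0] * cov[2] - cov[1] ** 2 < 1e-12:
            continue  # degenerate covariance: skip this draw
        # Phase 1: every trial batch must lie inside the control ellipse
        if any(t2(scores[i], mean, cov) > limit1 for i in trial):
            continue
        accepted += 1
        # Phase 2: test all remaining batches against the trial estimates
        for i, s in enumerate(scores):
            if i not in trial and t2(s, mean, cov) > limit2:
                freq[i] += 1
    return freq, accepted

# Synthetic scores: 20 batches near the origin plus one gross outlier (index 20)
data_rng = random.Random(1)
scores = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)) for _ in range(20)]
scores.append((70.0, 70.0))
freq, accepted = monte_carlo_search(scores, m=5, limit1=3.0, limit2=100.0)
```

Plotting `freq` against batch index would give a frequency bar chart of the kind described in the text, and batches with high counts are the candidates for exclusion before the covariance matrix and grand mean are estimated.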
With up to 6 PCs included in the model, this approach was found to work well with 8 to
10 preliminary batches and 99% control limits. However with 8 PCs, the search
required considerable computation time (several hours), and was therefore considered
2. Use PCs 1+2 (or 1 to 3), which describe most variance in the data, and
repeat the procedure described above in instances with 8 PCs or more, using
selected batches and repeat control phase 1 monitoring with inclusion of all
PCs. Exclude any batches which lie outside the 99% ellipsoid from the
All production batches were then monitored using the estimated process covariance
matrix and PC scores to measure Hotelling’s T². The results for absorbance and
pre-treated data sets were all similar using 99% confidence limits: the placebo blend 29
(BN1469) was identified as an outlier (PC6, Fig. 3.4 & 3.10); batch 21 (BN1015) was
an outlier with all models and with absorbance and SNV consecutive batches 22
[Figure: bar chart of outlier frequency against batch.]
Fig. 3.6. Results of Monte Carlo search of PC scores of blends (n = 193 batches)
(DT data) showing outlier frequency with batch (100 searches).
(BN1016), 23 (BN1017), and 24 (BN1024) were also outliers. With all control phase 2
Hotelling’s T² charts most batches between 83 and 154 (BN4058 and BN5051) were
out-of-control (i.e. showed systematic process variation). An example control chart for
DT blend data is shown in Fig. 3.7B. This is further illustrated on plots of two PC
scores with the 95% and 99% Hotelling’s T² control ellipses (Figs. 3.8, 3.9 & 3.10).
This control chart exhibited very similar behaviour with all models and clearly showed
systematic drift in the process with time. With all data sets, the process mean vector had
reached statistical process control (99% limit) by batch 82, however significant drift
occurred taking the process mean vector out-of-control. The level of this statistic fell
sharply from batch 154 onwards indicating that the process mean vector was shifting
towards the theoretical value determined by the search algorithm. Only with the detrend
model did the value again fall within control - however with all data sets the value
lowered and reached a constant level. This chart is consistent with the Hotelling’s T²
control phase 2 charts which identified most batches between 83 and 154 as out-of-
control. An example of this control chart for DT data is shown in Fig. 3.7C.
To minimise Type I error, 99% control limits were examined. With all data sets, the
placebo blend 29 (BN1469) was clearly identified as being out-of-control (Appendix B:
(BN1024) were mostly found to be out-of-control with absorbance, SNV, DT and SNV-
(BN5031) and 192 (BN5974) were out-of-control with all models except Sg2d11
[Figure: four control charts, panels A to D, including "Hotelling's T² control chart: phase 1" and "Hotelling's T² control chart: phase 2".]
Fig. 3.7. MSPC Control charts of blend PC scores (DT data) (n = 193 batches): A)
Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C) Hotelling’s T²
MEWMA & D) Anderson’s asymptotic normal approximation.
[Figure: PC scores scatter plot with control ellipses.]
Fig. 3.8. PC2 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
[Figure: PC scores scatter plot with control ellipses; points labelled by batch index.]
Fig. 3.9. PC6 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
[Figure: PC scores scatter plot with control ellipses; points labelled by batch index.]
Fig. 3.10. PC6 versus PC2 scores plot of PCA scores of blends (n = 193 batches)
(DT NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.
batches as outliers: 136 (BN5040), 137 (BN5041) and 138 (BN5042) (Fig 3.4E); 171
(BN5069), 173 (BN5071) (Fig. 3.4C) and 174 (BN5072) (Fig. 3.4D); 161 (BN5058)
(Fig. 3.4F) (Appendix B: Table B23). Most of these batches were subsequently found to
Clearly, the individual PC score control charts were unable to identify drift in the
process as with the Monte Carlo search algorithm and Hotelling’s T² charts. This is
because all available data are used with no optimisation procedure. However, with all
blend models, visual inspection of the scores on the 1st PC clearly shows three levels,
indicating systematic variance in this component. However, they were able to identify at
the 99% control level, groups of consecutive batches which were also out-of-control
with Hotelling’s T² charts, e.g. batches 21, 22, 23, 24 and placebo batch 29. The Sg2d11
batches which were subsequently out-of-control with MPCA models (Fig. 3.4)
3.8.5 Process Variance
With all blend PCA models, similar batches were identified as having shown significant
process variance at the 99.9% confidence interval (For all data pre-treatments: mean
sample generalised variance of all batches used in control phase 1 was zero, variance of
sample generalised variance of the control phase 1 batches was approximately 2n).
Across all models, the earliest batches seemed to show more process variance (batches
14, 15, 16, 26, 27, 28, 33, 37, 39, 43, 64, 67), with a cluster of values above 99.9%
confidence limits for each data set (Appendix B: Tables B35 - B39). With the exception
of the SNV-DT model, batch 109 (BN5015) was out-of-control. Most data sets
and Sg2d11 model identified batches 125 (BN5029) and 126 (BN5030) as out-of-
control. All of these were found to have processed unusually with some blend data-sets,
(BN5044), 144 (BN5045 [S* re-blend]), 146 (BN5045 [4th re-blend]), 152 (BN5049)
and 186 (BN5969) showed highly significant process variance on most data sets. None
of these batches had processed unusually, with Q values below 99% limits, however the
highest scoring batches, 144 and 146, were re-blends of the same production batch
(BN5045). Interestingly, this batch had a mean total water content below specification
limit, however the deviation of water content throughout the re-blends had been found
These plots were also examined to assess whether significant process variance was
traceable to the PC scores. The results are summarised in Appendix B: Table B24.
restricted to cases where a minimum of 3 observations were above 99% control limit.
This corresponds to one blend sample, and should minimise Type I errors. With batches
64, 142, 146 and 186 it was possible to trace the excessive process variance to 1 or 2
PCs. Batch 146, which exhibited excessive variation in moisture content throughout the
blend, showed significant variance on PC1 for all models. Hence this PC must represent
chart for Sg2d11 sub-group mean centred scores is shown in Fig. 3.11. The PC
Shewhart plots for PC1 (original PC scores) with all models also showed systematic
[Figure: seven control charts of sub-group mean centred PC scores, panels A to G.]
Fig. 3.11. PC Shewhart control charts for sub-group mean centred PCA scores (n =
9 spectra per batch) of NIR Savitzky-Golay 2nd derivative of absorbance spectra of
blends (n = 193 batches). A) PC1 scores; B) PC2 scores; C) PC3 scores; D) PC4
scores; E) PC5 scores; F) PC6 scores & G) PC7 scores (95% and 99% control
limits shown).
The certificate of analysis total moisture contents for these blends were plotted and did
not show the same trend - it is possible that this component represents scatter which is
3.8.6 Correlation of Blend MSPC Results With Raw Material Usage Batch Data
The results of MSPC of blends (Sections 3.8.2 and 3.8.5) - Q statistic, Anderson’s
the Hotelling’s T² - were examined for correlations with raw material batch numbers
used. Significant correlations with these statistics would indicate that a particular batch
of a raw material resulted in poor process performance and excess process variance.
To examine the correlations between raw materials used and the MSPC indicators of
poor process performance, a data array was constructed and ordered such that the rows
corresponded to a particular process batch number, and the columns corresponded to all
of the raw material batch numbers used. The elements of the array contained the masses
(kg) of each raw material batch used in a particular blend batch. For each batch of blend
produced, each of the five raw materials could comprise several different raw material
batches in different amounts. Where none of a given raw material batch was used, the
element was made zero. The product batches used in this study were numbers 85 to 154,
which corresponds to the batches with systematic variation. The total number of
variables (raw material batches) used was 52, and the total number of blend batches
studied was 70. Appended to this array were column vectors of Q statistic, Anderson’s
T², with these values depending on the NIR blend data set examined.
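The layout of the raw material usage array can be sketched as follows: one row per blend batch, one column per raw material batch, masses in kilograms as entries, and zero where a raw material batch was not used. The blend indices, raw material batch labels and masses are hypothetical and serve only to show the structure.

```python
def usage_array(usage, material_batches):
    """Rows are blend batches; columns are raw material batches; entries are kg used."""
    blend_batches = sorted(usage)
    rows = []
    for blend in blend_batches:
        used = usage[blend]
        # Zero where a raw material batch was not used in this blend
        rows.append([used.get(mat, 0.0) for mat in material_batches])
    return blend_batches, rows

# Hypothetical usage records: blend batch index -> {raw material batch: mass in kg}
usage = {
    85: {"RM-A1": 12.0, "RM-B1": 3.5},
    86: {"RM-A1": 6.0, "RM-A2": 6.0, "RM-B1": 3.5},
}
columns = ["RM-A1", "RM-A2", "RM-B1"]
batches, array = usage_array(usage, columns)
```

The MSPC statistics for each blend would then be appended as further columns, giving one combined row per blend batch.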
Data Analysis
The combined data set (70-by-54) was autoscaled and subjected to a PCA. The number
of vectors retained included those with eigenvalues greater than or equal to unity, which
is typical in a factor analysis (Appendix B: Table B50). For each data set, the vectors
were then normal varimax rotated into terminal vectors (Harman, 1976).
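The autoscaling step and the eigenvalue-greater-than-or-equal-to-unity retention rule can be sketched as follows. The sketch assumes the sample standard deviation (n - 1 denominator) for scaling and takes the eigenvalues as given, since the eigen-decomposition itself is not shown; the data and eigenvalues are hypothetical. The varimax rotation is not covered.

```python
import math

def autoscale(rows):
    """Centre each column and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    # Note: a constant column has zero standard deviation and must be removed first
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def n_factors_to_retain(eigenvalues):
    """Count the eigenvalues of the autoscaled data that are at least 1."""
    return sum(1 for ev in eigenvalues if ev >= 1.0)

# Hypothetical 3-by-2 data set and hypothetical eigenvalues
data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
scaled = autoscale(data)
retained = n_factors_to_retain([2.7, 1.4, 1.0, 0.6, 0.3])
```

After autoscaling every variable has unit variance, so an eigenvalue of 1 corresponds to a factor that explains as much variance as a single original variable, which is the rationale usually given for the cut-off.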
With principal factor analysis, no significance is attached to the order of rotated factors
(however they still collectively account for the same amount of variance as before
rotation) (Dillon and Goldstein, 1984). Instead, the factor loadings are examined in turn
for large positive or negative values. Their values correspond to correlations of that
factor with the variable of high loading value (positive or negative), and may be
(p = 0.01) (most loadings on a principal factor analysis are typically close to zero).
Meaning may also be attached to the pattern of a factor’s loadings. For example, if there
With this study, important correlations would be those which include Q statistic,
Results
The SNV and SNV-DT data sets showed significant correlations (p = 0.01, n = 70)
(Appendix B: Table B50). This is important with these batch numbers being used in
BN5045, which exhibited excessive moisture variation throughout the blend.
The Sg2d11 data set showed significant correlations (p = 0.01, n = 70) between
B50). This is important as these batch numbers were used in varying proportions in
blends 136 to 139 (BNs 5040, 5041, 5042 and 5043) which were found to have
processed unusually.
This Section deals with MSPC for the lower and higher strength tablet data sets. For
each strength of tablet, models were examined from tablet absorbance data and from
All 44 batches were examined in this study. The rank of each pre-treated data set was
determined by recursive 'leave one out' cross validation in conjunction with Q statistic
Between 6 and 8 PCs were required (Appendix B: Table B2). All models except Sg2d11
explained in excess of 99%SS while the Sg2d11 model accounted for 96.68%SS,
probably due to reduction in signal to noise. All PCs were found to be significant
(Appendix B: Table B2). All of the PCA models’ loadings appear to represent physical
The absorbance model’s first two PC loadings show a general trend of increasing value
across the variables suggesting scatter (Fig. 3.2A & B). The third PC’s loadings have
high negative loadings at 1400 and 1900 nm, characteristic of water (Fig. 3.2C). With
SNV, DT and SNV-DT models, the loadings of the first component are characteristic of
water (Fig. 3.12). The other PCs’ loadings of these models resemble quadratic baseline
Table B51).
Some similarity exists in tablet batches found to have processed unusually with this
control chart. Batches 21 and 31 (BN5026 and BN5041) had significant Q values (p =
0.01) for absorbance, DT and SNV-DT models (Appendix B: Table B13). Batches 30,
31 and 32 were found to have processed unusually with absorbance data (BNs 5040,
5041 and 5042). Batch 7 had significant Q values with all models except Sg2d11.
Consideration of these control charts with their blend counterparts showed good
agreement for absorbance and SNV-DT models (Fig. 3.13). With both of these models,
batches 38, 39 and 40 had significant Q values (p = 0.01) (BNs 5064, 5065, 5067
respectively) (Appendix B: Table B13) (Fig. 3.13). Overall, absorbance data was
This procedure was repeated with the 44 batches as for the blends. The batches
examined were produced from blends selected from batches 85 to 184. Most of these
were found to have shown process drift, however from batch 154 onwards, the process
matrix (Appendix B: Table B40). With 99% and 95% control limits established, the PC
Fig. 3.12. Lower strength tablet NIR absorbance spectra PCA loadings (SNV-DT
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G)
PC7. [Loadings plotted against wavelength/nm, 1200 to 2400 nm.]
Fig. 3.13. Q Statistic control chart of PCA model of lower strength tablets (n = 44
batches) (NIR SNV-DT absorbance spectra) (95% and 99% limits shown). [x-axis: observation.]
Hotelling’s T² Control Phase 2
The results of this control phase were found to depend on whether batches had been
included in the PCA model generation, or whether they had been excluded on the basis
of significant Q values. With absorbance and SNV-DT models, batches 38, 39 and 40
(BN5064, BN5065, BN5067) (excluded from PCA but projected onto eigenvectors)
were not found to be out of control. With these models, batches that were found to be
out of control were early batches that correspond to those batches which exhibited
systematic drift in the blend models. The detrend model did not identify any batches as
out of control.
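The projection of excluded batches onto the eigenvectors and the calculation of their Hotelling’s T² values can be sketched as below. This is an illustrative Python/NumPy outline, not the procedure as implemented in this study. The phase 2 limit shown is the usual F-based limit for future individual observations, p(n+1)(n-1)/(n(n-p)) F(p, n-p), which is assumed here; the F quantile is estimated by simulation to avoid further dependencies, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_pcs):
    """Mean-centre X (batches x wavelengths); return mean, loadings, retained eigenvalues."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = s**2 / (X.shape[0] - 1)
    return mean, vt[:n_pcs].T, eigvals[:n_pcs]

def hotelling_t2(X_new, mean, loadings, eigvals):
    """Project rows of X_new onto the loadings; return T^2 = sum(t_a^2 / lambda_a)."""
    scores = (np.atleast_2d(X_new) - mean) @ loadings
    return np.sum(scores**2 / eigvals, axis=1)

def phase2_limit(n_ref, n_pcs, alpha=0.01, n_draws=200_000, seed=0):
    """Assumed phase 2 limit; the F(1 - alpha; p, n - p) quantile is simulated."""
    rng = np.random.default_rng(seed)
    f_quantile = np.quantile(rng.f(n_pcs, n_ref - n_pcs, size=n_draws), 1.0 - alpha)
    scale = n_pcs * (n_ref + 1) * (n_ref - 1) / (n_ref * (n_ref - n_pcs))
    return scale * f_quantile
```

Because the excluded batches play no part in estimating the loadings or eigenvalues, their T² values are judged against a model built only from the in-control reference set.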
The SNV and Sg2d11 models both identified batches 38, 39 and 40 (BN5064, BN5065,
BN5067) as out of control, which is consistent with blend Q statistics (these batches
were not identified as unusual with tablet Q statistics) (Appendix B: Table B40). Both
models also identified batch 30 (BN5040) as being out of control. Interestingly, the
Sg2d11 model also identified batches 31 and 32 as being out of control (BNs 5041 and
5042) (Fig. 3.14B). This result agrees with SNV-DT and Sg2d11 blend Q statistics. This
is shown clearly on a plot of PC scores 5 versus 4 with the Hotelling’s T² 95% and 99%
control ellipses (Fig. 3.15), which also shows batches 38 and 39 lying close to the 95%
limit and batch 40 lying outside the 95% limit. The Sg2d11 pre-treatment provided the most
consistent results.
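The Sg2d11 pre-treatment is taken here to be a Savitzky-Golay second derivative with an 11-point window; that reading of the label, the polynomial order of 2 and the function name are all assumptions. The Python/NumPy sketch below derives the filter weights from a least-squares polynomial fit within the window and applies them to each spectrum.

```python
import numpy as np

def savgol_second_derivative(spectra, window=11, polyorder=2, spacing=1.0):
    """Savitzky-Golay second derivative; half a window is dropped at each edge."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ...; the pseudo-inverse rows give the
    # least-squares polynomial coefficients as linear combinations of the data.
    vander = np.vander(offsets, polyorder + 1, increasing=True)
    weights = 2.0 * np.linalg.pinv(vander)[2] / spacing**2  # d2/dx2 at centre = 2 * a2
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    # np.convolve flips its second argument, so reverse the weights to correlate.
    return np.array([np.convolve(row, weights[::-1], mode="valid") for row in spectra])
```

Differentiating twice removes constant and linear baseline terms, which is consistent with the sharply defined loadings obtained with this pre-treatment, at the cost of a lower signal to noise ratio.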
Raw absorbance data PCA models clearly did not model the differences in NIR spectra
arising from physical differences in the surface of the friable tablets. This was detected
in the residual space with the Q statistic charts. The physical differences which affected
the spectra were emphasised with the Sg2d11 and SNV transformations and were thus
Fig. 3.14. MSPC Control charts of lower strength tablet NIR absorbance spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Fig. 3.15. PC5 versus PC4 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 absorbance data) with Hotelling’s T² 95% and 99% control
ellipses.
Hotelling’s T² MEWMA Control Charts
With absorbance, DT and SNV-DT, this chart shows the exponentially weighted
interesting result which confirms the results of blend MSPC. This tablet data set
includes batches which as blends had shown significant drift in the process mean vector
With the SNV and Sg2d11 models, batches 38, 39 and 40 (BN5064, BN5065, BN5067)
had not shown significant sum of squared residuals, Q, and were therefore included in
the PCA. These batches were found to be out of control with the Hotelling’s T² phase 2
control charts. The SNV Hotelling’s T² MEWMA moved from an out of control state to
an in control state and then out of control again with these batches. The Sg2d11 Hotelling’s T²
MEWMA was much better able to respond to process drift (Fig. 3.14C): it moved out of
control and then drifted to an in control level. At batch 30 (BN5040), it again drifted out
of control (see batches 30, 31 and 32, Appendix B: Table B40), returned to an in
control level and then began to drift to a lower level from batch 42 (BN5076) onwards.
These charts used 99% control limits as with blends. These results were less consistent
than those of Q statistic and Hotelling’s T² control charts for both blends and tablets.
Batches 2, 5 and 10 were found to be out of control on PC Shewhart control charts with
SNV, DT and SNV-DT. Batches 2 and 5 (BN4984 and BN4987) were out of control on
(BN5065 and BN5067) out of control on PC6. The loadings on this PC were difficult to
interpret. Sg2d11 PC control charts identified batches 31 and 42 (BN5041 and BN5076)
as outliers on PCs 4 and 5 respectively. Again the loadings were difficult to interpret.
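A Shewhart chart on an individual PC can be sketched as follows, assuming that the scores are mean-centred and approximately normal so that the two-sided limits are placed at a normal quantile times the square root of the corresponding eigenvalue. The function name and this choice of limit are assumptions; the sketch is in Python with NumPy.

```python
import numpy as np
from statistics import NormalDist

def pc_shewhart_signals(scores, eigvals, alpha=0.01):
    """Boolean array (batches x PCs): True where a score lies outside the limits."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    limits = z * np.sqrt(np.asarray(eigvals, dtype=float))
    return np.abs(np.asarray(scores, dtype=float)) > limits
```

A signal on a single PC is only as interpretable as that PC’s loadings, which is the difficulty noted above for the higher order PCs.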
Process Variance: Anderson’s Asymptotic Normal Approximation
These control charts showed fairly consistent results across models: batches 4, 9, 15 and
16 (BNs: 4986, 4994, 5003, 5013 respectively) showed significant generalised variance
(Appendix B: Table B40). These were not found to have exhibited significant
generalised variance with blend data, however batch 15 was found to have exhibited
excess generalised variance with SNV-DT multiway PCA model [(blend and tablet
absorbance and transmission data) and (blend absorbance and tablet transmission data)
and (blend and tablet absorbance)]. Batches 15 and 16 were also found to have shown
significant process variance with the Sg2d11 multiway PCA model (blend absorbance
and tablet absorbance data). With most models, batch 15 had exhibited the highest
All 44 batches were examined in this study. The rank of each pre-treated data set was
determined by recursive 'leave one out' cross validation in conjunction with Q statistic
Between 6 and 9 components were used for the models. The raw transmission data
required most components to model the data whereas the Sg2d11 model required only 6
PCs to fit the data (Appendix B: Table B4). Most models accounted for more than
99.9%SS except Sg2d11 which accounted for 98.2%SS. All PCs extracted were found to
represent significant amounts of variance. The PC loadings for all models appeared to
loadings did not contain any peaks and may represent scatter (Fig. 3.3). The SNV
model’s first PC loadings were similar and may also represent scatter. The DT and SNV-DT
models showed a broad peak on the first PC loading, however these may still represent
scatter as the peaks are less resolved than on the higher order PC loadings. All PC
loadings with the Sg2d11 model contain sharply defined peaks, which suggests that they
Some similarity exists in batches found to be unusual and not fit a given model across
all models (Appendix B: Table B15). Batch 12 (BN4997) was an outlier with all
models. Batch 21 (BN5026) was an outlier with all models except Sg2d11. With the SNV-DT
model, batches 20, 21 and 23 (BN5025, BN5026, BN5028), which are virtually consecutive,
were all found to be outliers. Batch 28 (BN5038) was an outlier on SNV, DT and SNV-DT
models. Batch 30 (BN5040) was an outlier on SNV-DT and Sg2d11 models. The
BN5042). These have consistently been found to be unusual with blend and tablet
absorbance models and are shown to be unusual with multiway PCA models (Section
3.10), and is therefore a consistent result. As the loadings for this model all contain
chemical absorption peaks and therefore are likely to be modelling just chemical
information, it is probable that these batches are physically different. This is unlikely to
results which showed that tablets produced from batch 30 (BN5040) were friable (1
mg). No C. of A. data were recorded for batches 31 and 32 (BN5041 and BN5042).
With the exception of Sg2d11, all models required an initial control phase 1 batch
screening using the first two PCs. With 6 batches and 100 Monte Carlo searches,
Fig. 3.16. Lower strength tablet NIR transmission spectra PCA loadings (Sg2d11
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5 & F) PC6.
[Loadings plotted against wavelength/nm, 800 to 1300 nm.]
potentially unusual batches were identified and excluded. These batches were then
further screened in control phase 1 by inclusion of all PCs and construction of 99%
The Sg2d11 model did not require an initial screening, and identified batches 5, 6, 38,
39 and 40 as unusual (Fig. 3.17B). The latter 3 batches (BNs 5064, 5065 and 5067)
were successfully identified as unusual with all other pre-treatment models, however
this required an initial screening of PCs 1 and 2 (Fig. 3.18). Batch 38 produced tablets
The SNV model also identified batches 30, 31 and 32 (BNs 5040, 5041, 5042) and 38,
39 and 40 as unusual however these batches were not found to have processed unusually
The PCA models of the transmission data sets contained chemical information and
physical information and were more effective at detecting anomalies than the Q statistic
control chart. The SNV transformation was more effective at detecting anomalous
batches than the Sg2d11 transformation, probably because more physical information is
modelled in its first two PC loadings. The Sg2d11 model appears to effectively remove
manner do not fit the model and are detected on the Q statistic chart. With the same
control charts. However, batches which have chemical differences are easily detected on
these control charts and more easily than with the other transformations. The SNV
model produced less consistent results with the Q chart, however all batches thus far
found to be unusual were detected on the Hotelling’s T² control charts. This is because
Fig. 3.17. MSPC Control charts of lower strength tablet NIR transmission spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Fig. 3.18. PC2 versus PC1 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 transmission data) with Hotelling’s T² 95% and 99% control
ellipses.
The Sg2d11 and SNV models for these data appear to be the best transformations for
With all pre-treatment models, the Hotelling’s MEWMA control charts generally
showed drift towards the theoretical grand mean vector. With transmission data, this
value was within the 99% limits, showing improvement up to batch 27 (BN5037) (Fig.
3.19C). Drift could then be observed, peaking at batch 32 (BN5042). This statistic
drifted out of control from batch 38 onwards (BN5064). The SNV transformation was
less useful in showing this drift - it identified earlier batches as out of control on
Hotelling’s T² control phase 2 chart, hence this statistic did not shift back within
control. The DT, SNV-DT and Sg2d11 charts all showed an improvement in drift to
onwards. With SNV-DT and Sg2d11 models, drift occurred at batch 28 (BN5038),
taking this statistic out-of-control. An improvement in drift could be seen from batch 32
(BN5042) onwards with both, but again increasing from batch 38 (BN5064). With the
Sg2d11 model, this statistic reduced to within control at batch 37 before again drifting
out of control with batch 38 (BN5064). The Sg2d11 chart was considered to perform
best and could detect drift which was not detected on the Hotelling’s T² phase 2 control
chart.
With these charts, 99% control limits were used. Batches 5 and 6 signalled out of
Fig. 3.19. MSPC Control charts of lower strength tablet NIR transmission spectra
PC scores (raw data): A) Hotelling’s T² control phase 1; B) Hotelling’s T² control
phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Batch 30 (BN5040) signalled out of control with DT and SNV-DT on PC2. The SNV
With this control chart, batches 30, 32 and 39 (BN5040, BN5042, BN5065) were found
to have significant process variance (Appendix B: Table B42). These results are
consistent with multiway PCA results (MPCA lower strength tablet batch indices 25, 27
and 34 respectively (refer to Table 3.6), Appendix B: Table B48). The Sg2d11 model
did not identify these batches as having shown excess process variance (however
BN5065 was detected with multiway PCA analysis with this transformation (Appendix
B: Table B48)). Transmission and detrend data performed the best (Appendix B: Table
B42).
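The process variance chart can be illustrated with the following Python/NumPy sketch. It assumes that the chart rests on the asymptotic result that sqrt(n)(|S|/|Σ| - 1) is approximately normal with mean 0 and variance 2p, where S is a sample covariance matrix with n degrees of freedom, Σ is the in-control covariance matrix and p is the number of PCs. How the statistic was computed in this study is not stated in this section, so the within-batch grouping of replicate score vectors and the function names are assumptions.

```python
import numpy as np
from statistics import NormalDist

def generalised_variance_z(batch_scores, reference_cov):
    """z = sqrt((m - 1) / (2p)) * (|S| / |Sigma0| - 1) for m replicate score vectors."""
    batch_scores = np.asarray(batch_scores, dtype=float)
    m, p = batch_scores.shape
    S = np.cov(batch_scores, rowvar=False)  # within-batch covariance, m - 1 d.f.
    ratio = np.linalg.det(S) / np.linalg.det(reference_cov)
    return np.sqrt((m - 1) / (2.0 * p)) * (ratio - 1.0)

def excess_variance(z, alpha=0.01):
    """One-sided test: True when the generalised variance is significantly inflated."""
    return z > NormalDist().inv_cdf(1.0 - alpha)
```

The approximation is asymptotic, so with few replicates per batch the nominal significance level should be treated with caution.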
All 43 batches were examined in this study. The rank of each pre-treated data set was
determined by recursive 'leave one out' cross validation in conjunction with Q statistic
Between 6 and 7 PCs were extracted for the different pre-treatments, with models
explaining more than 99.6%SS for all pre-treatments except Sg2d11 (Appendix B: Table
B3). This model only accounted for 95 A%SS with 6PCs. All extracted PCs were found
The PC loadings for these models appear to show physical and chemical information.
The absorbance model’s first 2 PC loadings appeared to represent scatter (Fig. 3.20A &
B). The 3rd, 4th and 5th PC loadings appear to contain features characteristic of water,
microcrystalline cellulose and magnesium stearate respectively (Fig. 3.20C, D & E). All
Fig. 3.20. Higher strength tablet NIR absorbance spectra PCA loadings (raw data)
(n = 43 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G) PC7.
[Loadings plotted against wavelength/nm, 1200 to 2400 nm.]
other models did not appear to have loadings that represented scatter, the first, 2nd and
represent water.
Consistent batches were found to be outliers with the Q statistic control charts across
models (Appendix B: Table B14). Batches 32 and 33 (BNs 5059 and 5060) were
outliers with absorbance and SNV-DT; batch 33 was an outlier with SNV and DT.
Batches 4 and 6 (BNs 4990 and 4992) were outliers with SNV, DT, SNV-DT and
Sg2d11. Batch 42 (BN5080) was an outlier with absorbance and SNV. SNV also
showed batch 43 (BN5081) as an outlier. Comparison of these results with those for
blends, multiway PCA and transmission models suggested that tablet absorbance
batches of higher strength tablets that have processed unusually. The transmission
models provided a better indicator of unusual batches (Appendix B: Table B16). This
can be explained by the certificate of analysis data which showed that unusual batches
(detected on blend and multiway models) had a greater thickness than normal (these
tablets may have exhibited greater elasticity). Transmission measurements pass through
the entire tablet and are likely to be sensitive to the increase in pathlength of the tablet,
whereas reflectance measurements were not sensitive to this. (N.B. The opposite of this
was true for some lower strength tablets (Section 3.9.2). Unusual batches, detected at
blend and multiway model Q statistic control charts were found to be friable in
certificate of analysis tests. These tablets are therefore likely to be more brittle than
normal and will show increased fragmentation. This occurs at the surface of the tablet
and will affect surface texture and therefore also affect NIR absorbance measurements.
The results of Q statistic control charts for absorbance data for lower strength tablets
(Appendix B: Table B13) agreed with this finding and provided a much better
indication of this than for transmission higher strength tablet data sets (Appendix B:
Table B15)).
A preliminary search using 6 randomly selected batches and PCs 1 and 2 was required
with absorbance data only. The other models produced satisfactory results using one
more batch than the number of retained PCs and 90% control levels except Sg2d11
The effect of different data pre-treatments on the ability of MSPC of their PCA models
was considered alongside multiway and blend MSPC results. This indicated that SNV
and DT pre-treatments provided the most reliable models (Appendix B: Tables 41, 12
and 49) (Fig. 3.21B): batches 36, 40 and 41 (BN5070, BN5078 and BN5079) were
The absorbance data identified batch 36 as unusual at the 95% level. Interestingly, with
this model, batch 41 can be seen as falling outside the 95% confidence ellipse for PCs 1
and 2. Batches 40 and 36 lie close to the 95% confidence limit. The Sg2d11 data set
identified batch 41 as unusual at 99% confidence level and batch 40 as unusual with a
95% confidence level. On the PC1 versus PC2 Hotelling’s T² control ellipse, batch 36
can be seen to lie very close to the 99% confidence limit (Fig. 3.22).
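Whether a batch falls inside or outside a control ellipse on a pair of PCs can be checked directly from its two scores. The Python/NumPy sketch below assumes the ellipse is the contour t_a²/λ_a + t_b²/λ_b = limit, with the T² limit for two components supplied by the caller; the function names are hypothetical.

```python
import numpy as np

def inside_ellipse(t_a, t_b, eig_a, eig_b, t2_limit):
    """True where the score pair lies on or inside the control ellipse."""
    return t_a**2 / eig_a + t_b**2 / eig_b <= t2_limit

def ellipse_outline(eig_a, eig_b, t2_limit, n_points=200):
    """Boundary points of the ellipse, for overlaying on a scores plot."""
    angle = np.linspace(0.0, 2.0 * np.pi, n_points)
    return (np.sqrt(t2_limit * eig_a) * np.cos(angle),
            np.sqrt(t2_limit * eig_b) * np.sin(angle))
```

Since the ellipse uses only two PCs, a batch can lie inside it and still signal on the full T² chart, and the reverse can also occur when the limits are computed for different numbers of components.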
Fig. 3.21. MSPC Control charts of higher strength tablet NIR absorbance spectra
PC scores (SNV data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Fig. 3.22. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR absorbance data) with Hotelling’s T² 95% and 99%
control ellipses.
Hotelling’s T² MEWMA Control Charts
The ability of these charts to representatively show drift in the process was found to
depend on how well the Hotelling’s T² control phase 2 charts were able to identify
unusual batches. The absorbance and Sg2d11 charts were considered the best indicators of
process drift, and showed the value of this statistic moving out of control for batches 36,
40 and 41 (BN5070, BN5078 and BN5079) (Fig. 3.23C). This is because with these
charts some batches fall above the 95% confidence level but not the 99% confidence
level. This appeared to cause the exponentially weighted Hotelling’s T² value to shift
without greatly exceeding the 99% level. Evidently, this chart is very sensitive to
large Hotelling’s T² control phase 2 values in excess of the 99% control limit (a
different value for λ could be examined; in this study a value of 0.1 was used). The
MEWMA chart for DT data was quite a good indicator of drift - it showed the process
drifting to a state of control by batch 35 (BN5069), then drifting out of control with
batch 36 (BN5070), moving again within control by batch 38 (BN5073) and then
drifting out of control from batch 39 (BN5074) onwards. The SNV-DT model showed
the process drift to a state of control by batch 29 (BN5053) and then continued to drift
out of control thereafter. The SNV model appeared more erratic and shifted
considerably.
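The exponentially weighted statistic plotted on these charts can be sketched in Python with NumPy as below. The recursion and the exact (time-dependent) covariance of the weighted vector follow the standard MEWMA formulation, which is assumed to match the chart used here; the weighting constant defaults to the value of 0.1 used in this study, the scores are assumed uncorrelated with variances equal to the PCA eigenvalues, and the function name is hypothetical. The control limit is left to the caller.

```python
import numpy as np

def mewma_t2(scores, eigvals, lam=0.1):
    """MEWMA T^2 per batch for PC scores taken in production order."""
    scores = np.asarray(scores, dtype=float)
    eigvals = np.asarray(eigvals, dtype=float)
    z = np.zeros(scores.shape[1])          # weighted score vector, started at zero
    t2 = np.empty(scores.shape[0])
    for i, t in enumerate(scores, start=1):
        z = lam * t + (1.0 - lam) * z
        # Variance of z after i batches (diagonal, because PC scores are uncorrelated).
        var_z = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i)) * eigvals
        t2[i - 1] = np.sum(z**2 / var_z)
    return t2
```

With a small weighting constant the statistic accumulates evidence over consecutive batches, so a sustained small shift grows steadily, while one very large T² value continues to influence the chart for several batches afterwards.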
With these charts, 99% confidence limits were used. The best performing model was
for PCs 1 and 5 respectively. The loadings for PC1 suggest moisture, for PC5 this is
more difficult to interpret. SNV-DT and Sg2d11 models both identified batch 41
(BN5079) as unusual on PCs 5 and 6 respectively. Again, the loadings on these PCs
Fig. 3.23. MSPC Control charts of higher strength tablet NIR absorbance spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Process Variance: Anderson’s Asymptotic Normal Approximation
The results of these charts did not appear to be consistent with blend and multiway
control charts. This is probably because reflectance measurements are not sensitive to
differences in pathlength. Increased tablet thickness (i.e. pathlength) was found to have
occurred with some tablet batches identified as unusual in multiway models (compare
All 43 batches were examined in this study. The rank of each pre-treated data set was
determined by recursive 'leave one out' cross validation in conjunction with Q statistic
outlier detection. The R statistic was used as stopping criterion (Appendix B: Table B5).
Between 3 and 7 PCs were required for PCA models. The Sg2d11 model required
fewest PCs; transmission and DT models required 7 PCs. Most models explained more
than 99.5%SS except Sg2d11 which accounted for just 98.9%SS.
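The %SS figures quoted for each model can be reproduced in outline as follows. The Python/NumPy sketch reads %SS as the cumulative percentage of the total sum of squares of the mean-centred data captured by the retained PCs; the mean-centring and the function name are assumptions.

```python
import numpy as np

def percent_ss_explained(X):
    """Cumulative %SS captured by 1, 2, ... PCs of the mean-centred data."""
    X = np.asarray(X, dtype=float)
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return 100.0 * np.cumsum(s**2) / np.sum(s**2)
```

The entry at index k - 1 of the returned array is the %SS for a k-component model, so the figure for a 6 PC model is `percent_ss_explained(X)[5]`.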
All PCs extracted were found to represent significant amounts of variance using
Anderson’s likelihood ratio test (Appendix B: Table B5) (p = 0.01). The loadings for
these models appeared to contain physical and chemical information: with transmission,
this was evident with the first few PCs (Fig. 3.24); with SNV and DT the first PC
loadings appeared to represent physical information. The SNV-DT and Sg2d11 models
unusually and had significant (p = 0.01) Q statistics. These results agreed with those of
blend and multiway models (Appendix B: Tables 12 and 49) and were in contrast to
Fig. 3.24. Higher strength tablet NIR transmission spectra PCA loadings (raw
data) (n = 43 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G)
PC7. [Loadings plotted against wavelength/nm, 800 to 1200 nm.]
Fig. 3.25. Higher strength tablet NIR transmission spectra PCA loadings (SNV-DT
data) (n = 43 batches): A) PC1; B) PC2; C) PC3 & D) PC4.
[Loadings plotted against wavelength/nm, 800 to 1200 nm.]
those of absorbance models for these batches. Previous results indicated that these
batches are physically different from normal batches. Transmission measurements pass
through the entire tablet and are likely to be more sensitive to pathlength differences
than diffuse reflectance measurements. Batches 26, 27, 28, 29, 30, 31, 32, 38, 40 and 41
(BNs 5050, 5051, 5052, 5053, 5054, 5058, 5059, 5073, 5078, 5079) were found to be
unusual. These results were consistent between the models (Appendix B: Table B16),
and are consecutive batches, suggesting deviation in process performance with time.
Examination of certificate of analysis data showed that these batches were thicker than
Between 23 and 32 batches were selected for control phase 1 (Appendix B: Table B43).
The PCA models produced from transmission and DT data required a preliminary
Monte Carlo search using the first two PCs and random selection of 6 batches to
identify unusual batches. All other models were able to identify unusual batches using
All models identified similar batches of tablets as unusual (Appendix B: Table B43),
BN5050, BN5051, BN5052, BN5053, BN5054, BN5058) and 40 and 41 (BN5078 and
BN5079) were identified as unusual (Appendix B: Table B43). These batches produced
tablets with average thickness above the limit of 4.6 mm, and from batch 28 to 30
(BN5052, BN5053, BN5054) also produced tablets which were friable (2 mg, 4 mg and
maximum limit and batch 40 produced tablets which were friable (1 mg). These tablets
had very large transmission values across the spectral range scanned. An example
control chart for Sg2d11 data is shown in Fig. 3.26B. These batches were clearly
observed as falling outside the 99% Hotelling’s T² control ellipse on the PC1 versus 2
Hotelling’s T² MEWMA Control Charts
With this control chart, the statistic was largely out of control for all batches with the
models produced from SNV and SNV-DT transmission measurements. The charts using
consistent with the Hotelling’s T² control phase 2 charts (Fig. 3.26C). Significant drift
(p = 0.01) in the process mean vector was observed from batch 24 (BN5048) onwards.
The value fell from batch 31 (BN5058), consistent with Hotelling’s T² control phase 2
charts, however it began drifting at batch 40 (BN5078), also consistent with previous
findings.
Consistent results were obtained with this control chart across the models (Appendix B:
Table B28). Batches 30, 36 and 40 (BN5054, BN5070, BN5078) were identified as
With this control chart, batches 30, 40 and 41 (BN5054, BN5078 and BN5079) were
found to have exhibited significant process variance across the models tested (Appendix
B: Table B43). Batch 30 and 40 produced tablets of an average thickness above the
maximum limit which were also friable (1 mg); batch 41 produced tablets which were
Fig. 3.26. MSPC Control charts of higher strength tablet NIR transmission spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation. [x-axes: in-control batches (A); future production batches (B to D).]
Fig. 3.27. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR transmission data) with Hotelling’s T² 95% and 99%
control ellipses.
above the maximum average thickness.
In this section multiway PCA models are examined. For the two strengths of tablet, the
models were derived from both blend NIR spectral data and tablet NIR spectral data.
Different combinations of blend and tablet spectral data were examined. Consistency
between the results of these models was examined, and compared with reference
analysis data. The combinations of blend and tablet spectral data examined were:
Section 3.6)
Section 3.6)
3. Blend absorbance and tablet absorbance and transmission NIR data (including
The results of MSPC of each of these models were compared between MPCA models
and against results of MSPC of PCA models for the blend and tablet stage. They were
also compared against reference analysis data. These comparisons were made so that the
relative importance of tablet absorbance and transmission data for modelling and
For the lower strength multiway models tested, between 3 and 8 PCs were required to
model the data sets (Appendix B: Tables B6, B8 & B10). The Sg2d11 transformation
produced models which required the least number of components and the multiway
model produced from raw blend and tablet absorbance and transmission data required
most PCs. The loadings for each model appeared to represent chemical and physical
information, however their precise interpretation with these data sets was difficult.
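One way of assembling the data matrix for such a multiway model is to place the blend and tablet spectra for each batch side by side, so that every batch contributes a single row. The Python/NumPy sketch below illustrates this under stated assumptions: the way the blocks were combined in this study is not described in this section, and the scaling of each block to unit total variance (so that no block dominates merely through its number of wavelengths or its intensity scale) and the function name are hypothetical choices.

```python
import numpy as np

def unfold_blocks(*blocks):
    """Join per-batch blocks column-wise after centring and block-scaling each one."""
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        centred = block - block.mean(axis=0)
        total_var = np.sum(centred.var(axis=0, ddof=1))
        scaled.append(centred / np.sqrt(total_var))  # block now has total variance 1
    return np.hstack(scaled)
```

For example, `unfold_blocks(blend_absorbance, tablet_absorbance, tablet_transmission)` would give one matrix on which an ordinary PCA can be run, provided the rows of each block refer to the same batches in the same order.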
With all models examined, the multiway models produced from SNV blend absorbance
and tablet absorbance data (Appendix B: Table B17) and Sg2d11 blend absorbance and
tablet transmission data (Appendix B: Table B19) produced MPCA models which were
able to identify groups of batches previously found to have processed unusually. These
Between 11 and 39 batches were selected in control phase 1 using this algorithm, as for
MSPC control charts were found to perform best with blend and tablet transmission data
(Appendix B: Table B46) and blend and tablet absorbance and transmission data
(Appendix B: Table B48) (Fig. 3.28B). The MPCA models derived from pre-treated
blend and tablet absorbance data identified systematic variation in the process with the
first 18 batches selected for the control ellipsoid (Appendix B: Table B44). With these
batches, this systematic variation was traced to the blend data which showed two groups
Fig. 3.28. MSPC Control charts of lower strength tablet multiway PCA scores
(SNV-DT blend absorbance and tablet absorbance and transmission data) (n = 39
batches): A) Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C)
Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal approximation.
[x-axes: in-control batches (A); future production batches (B to D).]
With the other MPCA models, batches 25, 26, 27 (BN5040, BN5041, BN5042) and 33,
34, 35 and 39 (BN5064, BN5065, BN5067, BN5967) were found to have processed
unusually with this statistic (p = 0.01). These results agree with previous blend and
tablet PCA models and could be traced to one or two PCs (Appendix B: Tables B44,
B46 and B48). An example PC2 versus PC1 score plot with Hotelling’s T² 95% and
99% control ellipses is shown in Fig. 3.29 for the SNV-DT multiway data set.
With the charts produced from the MPCA models containing transmission data, process
drift was observed from batch 23 (BN5039) onwards. The process mean vector drifted
out of control (p = 0.01) and, from batch 27 (BN5042), began to shift towards the grand
mean vector; however, further drift in process operating performance was observed
from batch 32 (Fig. 3.28C). These charts were very similar to those for the tablet models.
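The drift in the process mean vector reported here is detected by an exponentially weighted form of Hotelling’s T². A minimal sketch of the standard MEWMA recursion follows, restricted to two-dimensional score vectors so that it needs no matrix library. The smoothing weight lam, the function names and the restriction to two scores are illustrative assumptions; this is not the Matlab code used in the study.

```python
# Sketch only: MEWMA T^2 for two-dimensional score vectors (pure Python).
# lam, the in-control mean and the in-control covariance are assumed inputs.

def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular covariance matrix")
    return ((d / det, -b / det), (-c / det, a / det))


def mewma_t2(scores, mean, cov, lam=0.2):
    """Return one MEWMA T^2 value per batch, in production order."""
    cov_inv = inv2(cov)
    z = [0.0, 0.0]  # EWMA of deviations, started at zero deviation
    out = []
    for t, x in enumerate(scores, start=1):
        dev = [x[0] - mean[0], x[1] - mean[1]]
        z = [lam * dev[i] + (1 - lam) * z[i] for i in range(2)]
        # Exact variance factor of z after t batches.
        factor = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        quad = sum(z[i] * cov_inv[i][j] * z[j]
                   for i in range(2) for j in range(2))
        out.append(quad / factor)
    return out
```

With lam = 1 the recursion reduces to the ordinary Hotelling’s T² of each batch; smaller values of lam give more weight to preceding batches, which is what makes sustained drift visible earlier than on the phase 2 chart.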
With the MPCA models derived from blend and either tablet absorbance or transmission
p = 0.01 level (Appendix B: Tables B29 & B31). The MPCA model derived from blend
With this control chart, some consistent results between MPCA charts were observed.
The overall MPCA model incorporating all NIR measurements identified batches 24
Fig. 3.29. PC2 versus PC1 scores plot of multiway PCA scores of lower strength
tablet process (SNV DT blend absorbance and tablet absorbance and transmission
data) (n = 39 batches) with Hotelling’s T² 95% and 99% control ellipses.
(BN5039 [re-blend]), 25 (BN5040) and 27 (BN5042) as having exhibited excessive
process variance (Appendix B: Table B48). The most consistent results with this were
obtained from the MPCA models calculated from blend and tablet transmission data
Between 3 and 8 PCs were required to model the data sets. Raw data tended to produce
models requiring most PCs (Appendix B: Tables B7, B9 & B11). Models produced
from SNV data which included SNV transmission spectra required only 3 PCs
(Appendix B: Tables B9 & B11) as did the Sg2d11 model produced from blend and
With this control chart, consistent results were produced between models derived from
blend and tablet absorbance data and blend and tablet transmission data (Appendix B:
Tables B18 & B20). Batches found to have processed unusually and therefore not fit the
BN5054) and 38 and 39 (BN5078 and BN5079). These results are consistent with
previous PCA models. These groups of consecutive batch numbers show that the
Between 18 and 41 batches were selected in control phase 1 using this algorithm, as for
Hotelling’s T² Control Phase 2
With this control chart, models produced from blend and tablet transmission and blend
and tablet absorbance and transmission measurements produced the most consistent
results which were also in agreement with those of previous PCA models (Appendix B:
Tables B47 & B49). This is shown in Fig. 3.30B for the Sg2d11 multiway data set.
34 (BN5070) and 38 and 39 (BN5078 and BN5079) were found to have processed
unusually, typically on the first PC (Appendix B: Tables B47 & B49). These results
were clearly observed on PC1 versus PC2 scores plots for Sg2d11 data with Hotelling’s
The control charts produced from MPCA models which included transmission data
were very consistent with those of PCA data for higher strength tablet transmission
measurements. With the overall MPCA models (blend and tablet absorbance and
transmission measurements), drift in the process mean vector was observed for all
control charts. With raw, DT and Sg2d11 data (Fig. 3.30C), this occurred from batch 23
(BN5049) onwards, and shifted toward the process mean vector at batch 27 (BN5053),
and then drifted further from batch 33 (BN5069). With the SNV DT model, the process
mean vector drifted within control (p = 0.01) at batch 30 (BN5059); however, it then
drifted out of control from batch 32 (BN5068). The SNV DT Hotelling’s T² MEWMA
With these control charts, batches 24 (BN5050), 34 (BN5070) and 38 (BN5078) were
identified as unusual (Appendix B: Tables B30, B32 & B34). As with the Hotelling’s T²
Fig. 3.30. MSPC control charts of higher strength tablet multiway PCA scores
(Sg2d11 blend absorbance and tablet absorbance and transmission data) (n = 41
batches): A) Hotelling’s T² control phase 1 (x-axis: in-control batches); B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA; D) Anderson’s asymptotic normal
approximation (x-axes of B to D: future production batches).
Fig. 3.31. PC2 versus PC1 scores plot of multiway PCA scores of higher strength
tablet process (Sg2d11 blend absorbance and tablet absorbance and transmission
data) (n = 41 batches) with Hotelling’s T² 95% and 99% control ellipses.
control charts, MPCA models which included tablet transmission data produced most
(BN5078) and 40 (BN5080) were found to have exhibited significant process variance
with the MPCA data sets which included transmission measurements (Appendix B:
3.11 Summary of Results
The results of this chapter are summarised in Tables 3.7 and 3.8.
Q A" Q r* A Q A Q A
4993 ! •
4994 2 • • •
4996 3 • • • •
4997 4 • . . • .
4998 5
5002 6
5003 7 • •
5013 8 . •
5015 9
5017 10 • Increase in content uniformity
5018 11 • . Increase in content uniformity
5025 12 •
5025"' 13 •
5026 14 •
5027 15
5028 16
5028"’ 17
5029 18 . •
5035 19
5036 20
5037 21
5038 22
5039 23 Excessive moisture deviation
5039"’ 24 Excessive moisture deviation
5040 25 . •
5041 26 Friability of 1 mg
5042 27 Friability of 2 mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 Friability of 1 mg
5065 34 • N/A' C of A. data missing
5067 35 N/A' C. of A. data missing
5075 36
5076 37 • Low drug content per tablet
5077 38
5967 39 •
Table 3.8. Summary of results for higher strength tablet process.
NIR blend absorbance NIR tablet absorbance NIR tablet transmission Multiway NIR data Unusual C. of A. Comment
number data predicted unusual data predicted unusual data predicted unusual predicted unusual
Q A" Q A Q A Q 7" A
4991 1
4992 2
4999 3
5000 4
5001 5
5009 6
5010 7
5011 8
5021 9
5022 10
5023 11
5024 12
5031 13
5032 14
5033 15
5043 16
5043"’ 17
5043™ 18
5044 19 .
5046 20 •
5047 21
5048 22
5049 23 • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 Thick tablets (4.62 mm)
5051 25 Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • • Friability 4 mg; tablet thickness = 4.63 mm
5058 29 . • Thick tablets (>4.6 mm)
5059 30 Thick tablets (>4.6 mm)
5060 31 Thick tablets (>4.6 mm)
5068 32 Thick tablets (4.62 mm)
5069 33 Thick tablets (4.62 mm)
5070 34 • • Thick tablets (4.61 mm)
5071 35 Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36 •
5074 37 Friability of 4 mg
5078 38 • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 . • Thick tablets (4.62 mm)
5080 40 •
5081 41 •
3.12 Conclusion
The aim of this study was to determine whether NIR spectrometry could be used for
MSPC procedures. The ability of the method to identify trends in process performance
and their relationship to reference analytical product quality data were therefore
determined. Statistical correlations were also made of MSPC results with raw material
batch usage data, in an attempt to determine whether particular batches of raw materials
The MSPC procedures were applied to PCs of the NIR spectral data. These were used as
they are linear combinations of the original data which summarise the systematic
variability in a ‘least squares sense’. The dimensionality of the data was therefore
reduced to a few variables, without loss of information. This allowed easier process
monitoring.
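The statement that PCs are linear combinations of the original variables can be illustrated with a short sketch. It extracts only the first PC by power iteration on the cross-product matrix of the mean-centred data; the function name, the fixed iteration count and the use of power iteration are illustrative assumptions and do not reproduce the routine used in this study.

```python
# Sketch only: first principal component by power iteration (pure Python).

def first_pc(data, iters=200):
    """Return (loading vector, scores) for the first PC of `data` (rows = batches)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in data]  # mean-centre
    # Cross-product matrix X'X of the centred data.
    s = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # starting vector (assumed not orthogonal to the first PC)
    for _ in range(iters):
        w = [sum(s[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Each score is a linear combination of the centred original variables.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in xc]
    return v, scores
```

Each score is the weighted sum of the centred variables for one batch, with the weights given by the loading vector, so a spectrum of many wavelengths is summarised by a single number per retained PC.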
The statistical assumptions made of the data were that most batches of product were
from the same multinormal population and were produced whilst the process operated
within statistical control. This assumption was verified by reference of MSPC control
phase batches to their reference analytical data, i.e. all control phase 1 batches produced
Trends in process performance could be detected at each process stage by the MSPC
procedures. Systematic variability could be identified in the blending process from raw
batches of blend were produced which were of unusual quality or which produced
tablets of unusual quality. These exhibited excessive blend moisture deviation and
deviation in drug substance content, and were identified as significantly different by the
NIR method. Some of these blends were not tabletted. The placebo blend was also
identified as unusual by the NIR method and indicates that this method may be useful
for surveillance of counterfeit medicines. Many of the other blends identified as unusual
by the NIR method, but not having shown unusual reference analysis results, ultimately
produced batches of unusual tablets. The unusual product quality attributes included tablet
friability, increased tablet thickness and prolonged dissolution time. The differences in
unusual product quality may relate to the particular batch numbers of raw materials used
The NIR measurements required to implement the MSPC method include diffuse
reflectance measurements of the blends and both diffuse reflectance and transmission
measurements of the tablets (in this study these raw data were transformed to apparent
absorbance data). Though batch averaged spectra were used for PCA, several
measurements of blends and tablets are required for each batch for detection of process
With both strengths of tablet, diffuse reflectance and transmission measurements are
physical anomalies which affected the tablet surface, e.g. friability. The transmission
thickness and drug substance content. Hence both measurements should be made. The
Overall, the results suggest that with a properly validated reference set of blend and
tablet NIR measurements, the MSPC method could be solely used for quality control
and assurance of the process at the blending and tabletting stage and to monitor process
CHAPTER 4
4.1 Introduction
measurements of blends and tablets. The ability of this ‘model-free’ approach to process
control and monitoring was determined by comparison of MSPC results with reference
least squares regression (PLSR) and multiblock* PLSR are examined. These methods
maximise the covariance between NIR spectral measurements and reference analytical
measurements and therefore produce latent vectors which are most closely related to the
reference analytical values. The PLS scores produced may be monitored in the same
In Sections 4.3, singleblock† PLS models of blends and tablets, respectively, are
subjected to MSPC. The predictive abilities of these models are compared with those of
Chapter 3.
Section 4.4 models the entire process by multiblock PLSR. Conclusions regarding these
* Multiblock data sets are a collection of three-way data sets of process data at each process stage.
† Singleblock data sets are three-way data sets of process data of one process stage.
4.2 Near Infrared And Reference Analysis Data Sets Used
In this study, the NIR spectral and reference analysis data sets (Section 3.4) used were
those of Section 3.6.3 for multiway PCA of the process for the two strengths of tablet
The number of batches of blends and tablets used in the manufacture of the lower
strength tablet was therefore 39 (Table 3.6). For the higher strength tablet process, 41
batches of blends and their corresponding tablets were examined (Table 3.6).
For a few of the batches produced, some reference analysis data were missing.
The spectral data were analysed using code programmed in Matlab 5.2 Scientific and
Technical Programming Language (The Mathworks Inc., Natick, MA, USA). A number
of different pre-treatments of blend and tablet data were examined. The data
pre-treatments examined were those which were previously shown to produce the best
multivariate principal components analysis process models for such data (Section 3.11).
These were:
2. SNV-DT;
3. Sg2d11.
These 2 pre-treatments were applied to blend absorbance data, and absorbance and
transmission data for each of the two strengths of tablet. This provided 18 data sets
(including multiway data sets of blend and appended tablet absorbance and transmission
data) to be studied via projection to latent structures and multiblock projection to latent
structures.
4.3 Statistical Quality Control of Pharmaceutical Blends And Tablets by Single
Block PLSR
The method of PLS summarises the important variability in both the process (NIR
spectral data) (X) and the final product quality data (Y) (Morud, 1996). This procedure
projects the information in the high-dimensional data spaces (X, Y) down onto
low-dimensional spaces defined by a small number of latent variables. The NIR (X) and
final product quality (Y) data sets were first mean-centred and scaled to unit variance
The number of PLS components required to extract the information from X and Y was
judged to be 6 components for all models. This number of components was selected as
it modelled a considerable amount of the Y data and was also the number of
Singleblock PLS models (Wangen and Kowalski, 1988) were created for raw and
pre-treated spectral data sets (SNV DT, 11 point quadratic Savitzky-Golay smoothed 2nd
derivative) for the blends used to produce lower strength tablets (n = 39 batches) and for
39 batches). The models were calculated using the average spectrum of those recorded
for each batch. The average spectrum of each batch was used instead of several
batch introduced by particle size effects and differences in scatter from the surfaces of
the glass vials and tablets and also because average measurements tend to follow a
normal distribution.
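The batch averaging step described here amounts to taking the mean absorbance at each wavelength over the replicate spectra of a batch. A minimal sketch, in which the function name and the list-of-lists layout (one inner list per replicate spectrum) are illustrative assumptions:

```python
# Sketch only: wavelength-wise mean of the replicate spectra of one batch.

def batch_mean_spectrum(spectra):
    """spectra: list of replicate spectra, each a list of equal length."""
    n = len(spectra)
    # zip(*spectra) groups the readings taken at the same wavelength.
    return [sum(readings) / n for readings in zip(*spectra)]
```

Averaging reduces the within-batch scatter contributed by individual vials or tablets, which is the stated reason for modelling one spectrum per batch.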
Single block PLS models were also created for raw and pre-treated NIR data sets of
blends used to produce higher strength tablets (n = 41 batches) and of combined NIR
using the average spectrum of each batch. This produced a total of 12 models of blends
and tablets for each strength and for raw and pre-treated data sets which were monitored
subsequently.
appropriate rank for the models. The decision to use this size of model was based on
previous experience of multivariate PCA projection of these data and because this rank
explained most variability within the NIR data sets and significant amounts of
variability within the certificate of analysis data sets (Appendix C: Tables C1 to C4).
With raw data sets for blend and tablet models, the high amount of variance explained,
typically above 99.6%, is due to multiplicative scatter within the data sets which
accounts for most variability in the spectra. The amount of variance accounted for by
these models of the certificate of analysis data was therefore not surprisingly lower: 37
and 53% for lower strength blend and tablet models respectively and 27 and 29% for
higher strength blend and tablet models respectively. The lower variability accounted
for in the certificate of analysis data probably arises from the fact that these data do not
With the SNV DT singleblock PLS models, slightly lower variability in the NIR data
sets is accounted for by the models (Appendix C: Tables C1 to C4) due to scatter
removed much of the multiple scatter information from the NIR data sets and produced
single block PLS which accounted for similar amounts of variability for both the NIR
4.3.2 Quantitative Calibration of Individual Certificate of Analysis Blend and
multivariate processes have been shown to work well where all process data and
product quality data are monitored (MacGregor et al, 1994). This is because the
variability within and between process and product quality variables is required to
model the process (MacGregor et al, 1994). In this study, the ability to produce
Only low amounts of variability could be modelled for some of the individual blend and
tablet variables, and the models were not considered accurate enough for future prediction of
individual reference analysis variables (Appendix C: Tables C7 & C8). The low
The loadings of the single block PLS models appeared to represent physical and
The performance of this control chart was determined for each model type and for the
different data pre-treatments by comparison of results above the 99% significance level
with the results from the certificate of analysis reference data (i.e. significant Q values
and corresponding unusual reference analysis values). In particular, the ability of this
chart to identify anomalies in the process in batches preceding those which produced
lower quality product and failed reference laboratory tests, was examined as this would
be useful for detecting trends in the process over time. PLS models for each process
stage (blend and tablet) were created for the different pre-treated data sets in a recursive
fashion, with batches whose Q statistic exceeded the 99% significance level excluded
With the lower strength tablet data set, raw data showed batches 22 to 26 as having
processed unusually at the blend stage (Appendix C: Table C9). Of these batches,
batches 23 (BN5039) and 24 (BN5039 re-blend) were re-blends of a blend which was found
5.94%, limit: < 5%). With the SNV DT and Savitzky-Golay 2nd derivative blend data
sets, batch 25 (BN5040) was found to have processed unusually despite normal
certificate of analysis data (Appendix C: Table C9). With the SNV DT data set, batches
at the blend stage (Appendix C: Table C9) but had reference laboratory results within
limits. The PLS models for lower strength tablets showed some agreement with this
result: batches 36 and 37 (BN5076) were outside the 99% limit with the SNV DT and
Savitzky-Golay 2nd derivative models; batch 32 (BN5062) was outside the control limit
with raw and SNV DT models and batch 33 (BN5064) was outside the limit for raw
data (Appendix C: Table C10). The certificate of analysis results showed that batch 33
produced tablets which were friable (1 mg). Reference data for batches 34 and 35 were
missing. For batch 37 the drug substance content was found to be low (4.83 mg/tablet,
range: 4.85 to 5.15 mg/tablet). With the Savitzky-Golay tablet model, batches 26
(Appendix C: Table C10). Their certificate of analysis data revealed that batches 26 and
Clearly, these control charts were able to detect unusual process behaviour at the blend
and tablet stages with consistency between results for the two stages. Raw data and
SNV DT data sets were able to detect process anomalies at the blend stage which were
With the higher strength tablet data set, batches 35 (BN5071), 36 (BN5073), 37
(BN5074), 38 (BN5078), and 39 (BN5079) were all found to exceed the 99% limit with
the SNV DT blend data set (Appendix C: Table C11). The certificate of analysis data
for the blends of these batches were all within limits. However, the reference analytical
data showed that batches 35, 37 and 38 produced tablets which exhibited friability of 1
mg, 4 mg and 1 mg respectively. In addition, batches 35, 38 and 39 also had average
tablet thicknesses which were above the limit of 4.6 mm (all 4.62 mm). Batches 32
(BN5068), 35 and 38 were outside the 99% limit for the Q statistic with the
Savitzky-Golay 2nd derivative blend data (Appendix C: Table C11) and batches 33 (BN5069), 34
(BN5070) and 39 were outside the 99% limit with the blend raw data (Appendix C:
Table C11). Batches 32 and 33 did not produce tablets which were friable, but the
average tablet thicknesses for these batches were 4.62 mm and 4.61 mm respectively -
outside the limit. Batch 34 produced tablets which showed longer than normal
disintegration time (17 seconds) and had an average tablet thickness of 4.66 mm, which
is above the limit. The tablet Q statistic control charts which performed best were those
for raw and Savitzky-Golay 2nd derivative data. These identified batches 35 and 38 and
batches 35, 37 and 39 as unusual respectively (Appendix C: Table C12). With the
(BN5054) were identified as unusual (Appendix C: Table C12). These batches were
both friable (4 mg and 1 mg respectively) and batch 28 had an average tablet thickness
of 4.63 mm. With the higher strength tablet raw data PLS model, batches 28 (BN5054),
Table C12). Batches 29 to 31 had average tablet thicknesses above the limit of 4.6 mm.
The Q statistic control charts for the higher strength tablet process were also able to
identify batches and groups of consecutive batches which differed from the normal data
set at both the blend and tablet stages. The manufacture of these batches ultimately
produced tablets of lower quality. This deviation from normal process operating
performance was identifiable from blend data despite showing no unusual reference
analytical results at that stage. Overall, the SNV DT data set showed the best
performance for the higher strength tablet process of the pre-treatments tested.
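The Q statistic referred to throughout this section is the sum of squared residuals of a batch spectrum after its projection onto the model. A minimal sketch follows. It assumes mean-centred data and orthonormal loading vectors, as in PCA; PLS scores are strictly obtained through the weight vectors, so this is a simplification, and the function name is illustrative.

```python
# Sketch only: Q statistic (squared prediction error) of one batch spectrum.
# Assumes `x` is already mean-centred and the loadings are orthonormal.

def q_statistic(x, loadings):
    """Sum of squares of the part of x not explained by the loadings."""
    residual = list(x)
    for p in loadings:
        t = sum(xi * pi for xi, pi in zip(x, p))  # score on this component
        residual = [r - t * pi for r, pi in zip(residual, p)]
    return sum(r * r for r in residual)
```

A batch is flagged when its Q value exceeds the chosen significance limit (99% in this study); the calculation of that limit is not shown here.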
The PLS scores of the single block models corresponding to the latent vectors of the
NIR data were used for MSPC monitoring. Estimation of the control phase 1 batches
was performed by a Monte Carlo simulation. This involved random selection of the
described in Chapter 3. The Hotelling’s T² distance was then measured for the scores of
the remaining batches from this ellipse and batches which had significant values were
recorded. This process was repeated 200 times to produce a frequency bar chart which
showed the frequency that any batch had been found to be significantly different from
the control group. Batches which had a frequency greater than zero were not used for
estimation of the process variance-covariance matrix and process mean vector. All
batches were then monitored in control phase 2 using those batches which were deemed
to be in-control (i.e. Monte Carlo bar chart frequency = 0) as the control phase 1 group.
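The Monte Carlo selection of control phase 1 batches can be sketched as follows for two-dimensional scores. Several details are assumptions made for illustration only: the subset size, the use of the large-sample chi-square limit with 2 degrees of freedom (whose 99% point is exactly -2 ln 0.01) in place of the limits of Chapter 3, the fixed random seed, and all function names.

```python
# Sketch only: Monte Carlo search for batches that are unusual relative to
# randomly drawn control groups, using Hotelling's T^2 on 2-D scores.
import math
import random


def mean_cov(points):
    """Sample mean and covariance matrix of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def t2(point, mean, cov):
    """Hotelling's T^2 distance of one point from the control group."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def monte_carlo_flags(scores, subset_size, runs=200,
                      limit=-2 * math.log(0.01), seed=0):
    """Count, per batch, how often it exceeded the limit when left out."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    counts = [0] * len(scores)
    for _ in range(runs):
        chosen = set(rng.sample(idx, subset_size))
        mean, cov = mean_cov([scores[i] for i in chosen])
        for i in idx:
            if i not in chosen and t2(scores[i], mean, cov) > limit:
                counts[i] += 1
    return counts
```

Batches whose count is zero after all runs form the control phase 1 group; the counts themselves correspond to the frequency bar chart described in the text.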
With raw data for the blends of lower strength tablet process and higher strength tablet
process batches, the Monte Carlo search was unable to identify any unusual batches
(Appendix C: Tables C17 and C18). Most batches were therefore considered by the
algorithm to be in-control (n = 39 batches of lower strength tablet process blends, n = 36
batches of higher strength tablet process blends). For both processes, examination of the
scores revealed that they were evenly divided into two clusters. This was found to be
due to a difference in offset in the original blend absorbance data (Fig. 4.1), probably
arising from different particle size distributions and porosity. With the PLS models
produced from raw tablet data, these distinct clusters were not observed on the score
plots, with some batches identified as unusual at control phase 1. The results of the
tablet models produced from raw data for both strengths of tablet showed agreement
with results from the Q statistic control charts (Appendix C: Tables C19 & C20), hence
raw data was not considered useful for monitoring the blends and spectral scatter
batches were used in control phase 1 for the lower strength tablet process and higher
strength tablet process blends respectively (Appendix C: Tables C17 & C18). With the
Savitzky-Golay 2nd derivative data, 29 and 32 batches were used in control phase 1 for
the lower strength tablet process and higher strength tablet process blends (Appendix C:
Tables C17 & C18). PLS models for the two strengths of tablet used between 17 and 2S
batches for the lower strength tablet process tablets (Appendix C: Table C19) and used
between 19 and 34 batches for the higher strength tablet process tablets (Appendix C:
Table C20).
Scatter correction was considered a useful pre-treatment of the blend absorbance data.
With the SNV DT scatter correction of the lower strength tablet process data set, a
number of batches, some consecutive in batch number, were found to have significant
Hotelling’s T² values (Appendix C: Table C17). These were compared with results of
Fig. 4.1. Blends batch mean absorbance spectra (absorbance versus wavelength/nm,
1200 to 2400 nm) for: A) lower strength tablet process (n = 39 spectra) and B) higher
strength tablet process (n = 41 spectra), showing spectra for each process divided into
two classes with different offsets (batches 1 to 18 for lower strength process blends
have lowest offsets; batches 1 to 17, 19 to 20 and 23 to 25 for higher strength process
blends have lowest offsets).
the lower strength tablet process SNV DT tablet data set. Batches
(BN5067) were found to have significant values at both the blend and tablet stage
(Appendix C: Tables C17 & C19). Batches 26 and 27 produced tablets with friability of
analysis data for batch 35 was not recorded). The Savitzky-Golay 2"^ derivative
transformation did not detect these batches as unusual from their blends, however
batches 33, 34 (BN5065) and 35 were detected as unusual from the tablet PLS model
(Appendix C: Table C19) (tablet reference analysis data for batch 34 was not recorded).
For the lower strength tablet process, the SNV DT transformation was considered to be
the most appropriate pre-treatment of those tested for blend and tablet data.
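The scatter correction part of the SNV DT pre-treatment can be illustrated by the standard normal variate transform, which centres and scales each spectrum by its own mean and standard deviation. The sketch below shows SNV only; the detrending (DT) step is omitted, and the function name is an illustrative assumption.

```python
# Sketch only: standard normal variate (SNV) transform of one spectrum.
# The detrending step of SNV DT is not included.

def snv(spectrum):
    """Centre a spectrum on its own mean and scale by its own std. deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = (sum((a - mean) ** 2 for a in spectrum) / (n - 1)) ** 0.5
    return [(a - mean) / sd for a in spectrum]
```

Two spectra which differ only by a constant offset and a multiplicative factor give the same SNV spectrum, which is why offset differences such as those of Fig. 4.1 no longer split the batches into clusters after this pre-treatment.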
With the higher strength tablet process data sets, the SNV DT transformation detected
batches 23 (BN5049) and 24 (BN5050) as unusual at both the blend and tablet stages.
Batch 25 (BN5051) was detected as unusual at the blend stage. The blends were not
found to have unusual reference analysis values, however they produced tablets with
average thicknesses of 4.62 mm, above the limit of 4.6 mm. The SNV DT and Savitzky-
Golay tablet models detected batches 34 (BN5070) and 38 (BN5078) and 39 (BN5079)
as unusual (Appendix C: Table C20). Batch 34 produced tablets with unusually long
disintegration time (17 seconds) and which had an average thickness greater than the
maximum limit (average thickness = 4.66 mm). Batches 38 and 39 had average tablet
thicknesses above the maximum limit (4.62 mm for both batches) and batch 38
produced tablets with 1 mg friability. With the Savitzky-Golay blend model, batch 38
could be detected as unusual despite having apparently normal reference analysis results
(Appendix C: Table C18). Both the SNV DT and Savitzky-Golay 2nd derivative pre-
4.4 Statistical Quality Control of The Entire Process by Multiblock PLSR
Multiblock PLS has been proposed as an alternative projection method to single block
PLS for situations with large numbers of variables that can be divided into distinct
process sections (X blocks) (MacGregor et al, 1994; Wangen and Kowalski, 1988). The
data sets used in this study may be considered as two process X blocks: blend stage, X1
(blend spectra) and tablet stage, X2 (combined tablet absorbance and transmission data).
The final product quality data, Y, are the blend and tablet certificate of analysis
reference data combined (14 variables). MacGregor (1994) states that multiblock
projection methods allow for easier interpretation of process data because smaller
meaningful blocks can be individually monitored as may the relationship between these
blocks.
The multiblock PLS algorithms used in this study were variations of those of Wold et al
(1987) and Wangen and Kowalski (1988). This algorithm leads to a set of orthogonal
each block X_i. The X_i blocks are then represented in terms of their leading A PLS
components as:
X_1 = \sum_{a=1}^{A} t_{1a} p_{1a}^{T} + E_1    (4.4.1)
X_2 = \sum_{a=1}^{A} t_{2a} p_{2a}^{T} + E_2    (4.4.2)
This enables monitoring and construction of diagnostic plots for each block separately,
as previously described for singleblock PLS. This algorithm is also able to effectively
handle missing data. An overall monitoring space for the process may be obtained by
using projections in the latent vector space (t_{Ca}, a = 1, 2, ...) of the consensus matrix T
formed by collecting the latent vectors from the individual blocks. The score vectors of
this consensus matrix are no longer orthogonal, however it has been shown that where
blocking of the process variables has been done in a meaningful fashion, these vectors
should continue to define the same plane as the latent vectors obtained by single block
Y = \sum_{a=1}^{A} t_{Ca} q_{a}^{T}    (4.4.3)
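The prediction of Y from the consensus scores is a sum of outer products of each consensus score vector with its Y loading vector. A minimal sketch of this calculation follows; the function name and the storage of the Y loadings as one row per component are illustrative assumptions.

```python
# Sketch only: predicted Y from consensus scores and Y loadings.

def predict_y(consensus_scores, y_loadings):
    """consensus_scores: n batches x A components; y_loadings: A x m variables."""
    n_comp = len(y_loadings)
    n_vars = len(y_loadings[0])
    return [[sum(t[a] * y_loadings[a][k] for a in range(n_comp))
             for k in range(n_vars)]
            for t in consensus_scores]
```

Applying the same function to singleblock scores and loadings with the same number of components gives the predictions against which the multiblock result can be compared.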
A check on whether the blocking has been done well is to compare predictions of Y
obtained from the singleblock and multiblock algorithms for the same number of
The multiblock PLS models were found to account for similar amounts of variability
within each process stage NIR data set and for the certificate of analysis data as was
explained by the single block PLS models (Appendix C: Tables C5 to C6). The raw and
SNV DT models explained considerably more variability within the NIR data sets than
in the certificate of analysis data sets, as with single block PLS models. The Savitzky-
Golay smoothed second derivative produced multiblock PLS models which accounted
for similar amounts of variability within the NIR and certificate of analysis data sets
These results suggest that the multiblock models are able to model the data as
effectively as the single block PLS models for each stage of the process and that the
4.4.3 Multiblock PLS Loadings
The loadings of the multiblock PLS models appeared to represent physical and chemical
information. However, their precise interpretation was difficult, especially with the
The performance of this control chart was determined for each model type and for the
different data pre-treatments by comparison of results above the 99% significance level
with the results from the certificate of analysis reference data (i.e. significant Q values
and corresponding unusual reference analysis values). In particular, the ability of this
chart to identify anomalies in the process in batches preceding those which produced
lower quality product and failed reference laboratory tests, was examined as this would
be useful for detecting trends in the process over time. MB PLS models for each process
stage (blend and tablet) were created for the different pre-treated data sets in a recursive
fashion, with batches whose Q statistic exceeded the 99% significance level excluded
With the lower strength process multiblock PLS models, batch 36 (BN5075) was found
to have processed unusually at both the blend and tablet stages for all data sets,
consistent with results for the single block PLS results of tablet models and SNV DT
blend PLS model (Appendix C: Tables C13 & C14) (Fig. 4.2). This batch did not show
unusual reference analytical results at either process stage, however it occurred within a
period where unusual product was produced. Batches 25 and 26 (BN5040 and BN5041)
were found to have exceeded the 99% limit for Q at both the blend and tablet stage with
the Savitzky-Golay 2nd derivative data consistent with singleblock PLS models
(Appendix C: Tables C13 & C14) (Fig. 4.3). With the SNV DT and Savitzky-Golay 2nd
derivative lower strength tablet models, both batches 36 and 37 exceeded the 99% limit
for Q (Appendix C: Table C14); batch 37 (BN5076) was found to have an average drug
substance content per tablet below the minimum limit. This result is consistent with
those of single block PLS models and also confirms the results of this control chart at
the blend stage where batch 36 was identified as having processed unusually. With the
lower strength process multiblock PLS models, the SNV DT model performed best at
the blend stage and the Savitzky-Golay 2nd derivative model performed best at the tablet
stage.
With the higher strength tablet process data sets, consistent results were obtained
between blend and tablet control charts. Batch 34 (BN5070) and batches 36 to 39
(BN5073, BN5074, BN5078, BN5079) were found to have processed unusually at the
blend and tablet stage with raw data (Appendix C: Tables C15 & C16) (Fig. 4.4). With
processed unusually at the blend stage, and batches 28 and 30 were found to have
processed unusually at the tablet stage (Appendix C: Tables C15 & C16) (Fig. 4.5). The
results for Savitzky-Golay 2nd derivative data were very consistent: batches 27 to 33
(BN5053, BN5054, BN5058, BN5059, BN5068, BN5069) and batches 35, 38 and 39
(BN5071, BN5078, BN5079) were found to have processed unusually at both the blend
and tablet stages (Appendix C: Tables C15 & C16) (Fig. 4.6). These results are
consistent with those of the single block PLS models. For the higher strength tablet
process multiblock PLS models, the Savitzky-Golay 2nd derivative model performed
best.

Fig. 4.2. Multiblock PLS Q statistic control charts for the lower strength tablet
process: A) blends, B) tablets (SNV detrend data, n = 29 batches for PLS
modelling).

Fig. 4.3. Multiblock PLS Q statistic control charts for lower strength tablet
manufacturing process: A) blends; B) tablets (11 point quadratic Savitzky-Golay
smoothed second derivative data, n = 29 batches for PLS modelling).

Fig. 4.4. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (raw data, n = 28 batches out of 41 used for
PLS model).

Fig. 4.5. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (SNV detrend data, n = 32 batches out of 41
used for PLS model).

Fig. 4.6. Q statistic control charts for multiblock PLS model of the higher strength
tablet process: A) blends, B) tablets (11 point quadratic Savitzky-Golay smoothed
second derivative data, n = 22 batches out of 41 used for PLS model).

Overall, the Q statistic monitoring charts enabled construction of single block and
multiblock PLS models which represented the variability present within batches
produced whilst the process was operating in a normal manner. The results of
Preliminary work with multiblock PLS models showed that these produced consistent
results between blocks with the Q statistic control charts and with the Hotelling’s
approximation control charts were therefore only considered with these models.
With both lower strength and higher strength tablet process multiblock PLS models,
Savitzky-Golay 2nd derivative transformation of the blend absorbance data was found to
be necessary. The Monte Carlo search algorithm was unable to detect any unusual
batches with blend absorbance data for the lower strength tablet process (Appendix C:
Table C21) and could only detect two batches as unusual with higher strength tablet
process SNV DT blend data set (Appendix C: Table C22). With the higher strength
process blend absorbance data, this search algorithm identified a cluster of batches
which have been shown to have a different spectral offset only (Appendix C: Table
C22) (Fig. 4.1B). With the lower strength process SNV DT blend data, the first 18
batches were identified as unusual by the search algorithm; however, these have been
shown to differ only in their spectral offset from the remaining batches (Appendix C:
Table C21) (Fig. 4.1A). The Savitzky-Golay 2nd derivative pre-treatment effectively
removed the offset and was considered to be useful for the blend data set. The number
of batches selected by the Monte Carlo search with this blend data set was 28.
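
The removal of a spectral offset by the 11 point quadratic Savitzky-Golay second derivative can be illustrated with a short Python sketch. It is not the Matlab code used in this work. The weights follow from a least-squares quadratic fit over 11 equally spaced points, and the synthetic band, offset and slope values are hypothetical.

```python
import math


def savgol_second_derivative(values, spacing=1.0):
    """11 point quadratic Savitzky-Golay second derivative (interior points only)."""
    half = 5
    # Least-squares weights for the second derivative: (i^2 - 10) / 429, i = -5..5
    weights = [(i * i - 10) / 429.0 for i in range(-half, half + 1)]
    derivative = []
    for centre in range(half, len(values) - half):
        window = values[centre - half:centre + half + 1]
        derivative.append(sum(w * v for w, v in zip(weights, window)) / spacing ** 2)
    return derivative


# Synthetic absorbance band, and the same band with an added offset and slope
band = [math.exp(-((j - 20) / 4.0) ** 2) for j in range(41)]
shifted = [b + 0.3 + 0.01 * j for j, b in enumerate(band)]

d2_band = savgol_second_derivative(band)
d2_shifted = savgol_second_derivative(shifted)
max_difference = max(abs(a - b) for a, b in zip(d2_band, d2_shifted))
```

Because the weights sum to zero and are symmetric about the centre point, any constant offset or linear baseline contributes nothing to the derivative, so the two derivative spectra agree to within rounding error.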
With the lower strength tablet data sets, scatter correction was not necessary and all
models used between 25 and 29 batches (Appendix C: Table C23). With the higher
strength tablet data sets, scatter correction of the raw spectra was necessary (Appendix
C: Table C24).
With the lower strength tablet process model generated from Savitzky-Golay 2nd
C21, note: no reference analysis data were recorded for batches 34 and 35). These
results are consistent with those for the single block PLS models, and show that the
model can detect deviations in the process from the blend stage which are not detectable
with current reference analysis. These results are shown clearly with the MSPC control
charts (Fig. 4.7B) and on the PLS components 1 and 2 score plot (Fig. 4.8). The score
plot shows that these batches have deviated away from the normal operating region
defined by the 99% control ellipse; in addition, batch 26 (BN5041) lies outside the 95%
control ellipse. With the lower strength tablet models, all three of these batches were
identified as unusual; in addition, with the Savitzky-Golay 2nd derivative model, batches
These results are shown with the MSPC control charts (Fig. 4.9B) and on the PLS
component 2 and 1 score plot (Fig. 4.10). With this plot, batch 25 lies outside the 99%
control region, and deviation from the normal process operating region was observed

Fig. 4.7. Multivariate statistical process control charts for multiblock PLS model of
lower strength tablet process blends (11 point Savitzky-Golay smoothed second
derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1
chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate
exponentially weighted moving average control chart; D) Anderson’s asymptotic
normal approximation control chart.

Fig. 4.8. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 2
and 1 scores of multiblock PLS model of lower strength tablet process blends (11
point Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS
modelling; 28 batches used for optimising control limits).

Fig. 4.9. Multivariate statistical process control charts for multiblock PLS model of
lower strength tablets (11 point Savitzky-Golay smoothed second derivative data, n
= 25 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B)
Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially
weighted moving average control chart; D) Anderson’s asymptotic normal
approximation control chart.

Fig. 4.10. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
2 and 1 scores of multiblock PLS model of lower strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS
modelling; 25 batches used for optimising control limits).

With the higher strength tablet process blend data sets, the Savitzky-Golay 2nd
derivative produced results which were most consistent with single block PLS models.
unusually with this pre-treatment (Appendix C: Table C22) (Fig. 4.11B). This is a
useful result as these batches produced lower quality tablets despite having normal
blend reference analysis results. The higher strength tablet models produced results
consistent with single block PLS and reference analysis data for the SNV DT and
(Figs. 4.12B & 4.13B). Batches 34, 38 and 39 (BN5070, BN5078 and BN5079) were
found to be unusual with SNV DT tablet data, and batches 34 and 38 were unusual with
Savitzky-Golay 2nd derivative data. This is shown clearly with the MSPC control charts
(Figs. 4.12B & 4.13B) and with PLS components score plots (Figs. 4.14 to 4.17). With
the PLS component 6 and 1 score plot of SNV DT higher strength tablet data (Fig.
4.15), deviation from normal process operating performance can be observed from
batch 27 (BN5053) to 28 (BN5054). These batches were above the 95% Hotelling’s T²
limit with the MSPC charts. Deviation from normal operating performance can be
clearly observed from batches 38 and 39. In addition, batch 37 (BN5074) lies just
outside the 95% ellipse but at the opposite end of the ellipse from batches 38 and 39.
Interestingly, this batch did not exhibit excessive average tablet thickness as batches 38
and 39 did; however, the tablets showed 4 mg friability. Similar results were observed
with the PLS component 6 and 5 score plot (Fig. 4.14). With the Savitzky-Golay 2nd
derivative data, batches 38 and 39 also lay outside the 99% control region on PLS
components score plots (Figs. 4.16 & 4.17). Batch 27 lay outside the 95% control limit

Fig. 4.11. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (11 point Savitzky-Golay smoothed second
derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1
chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate
exponentially weighted moving average control chart; D) Anderson’s asymptotic
normal approximation control chart.

Fig. 4.12. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablets (11 point Savitzky-Golay smoothed second derivative
data, n = 21 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B)
Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially
weighted moving average control chart; D) Anderson’s asymptotic normal
approximation control chart.

Fig. 4.13. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablets (SNV detrend data, n = 24 control phase 1 batches): A)
Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

Fig. 4.14. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
6 and 5 scores of multiblock PLS model of higher strength tablets (SNV detrend
data, n = 28 batches for PLS modelling; 24 batches used for optimising control
limits).

Fig. 4.15. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
6 and 1 scores of multiblock PLS model of higher strength tablets (SNV detrend
data, n = 28 batches for PLS modelling; 24 batches used for optimising control
limits).

Fig. 4.16. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
4 and 1 scores of multiblock PLS model of higher strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS
modelling; 21 batches used for optimising control limits).

Fig. 4.17. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components
4 and 5 scores of multiblock PLS model of higher strength tablets (11 point
Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS
modelling; 21 batches used for optimising control limits).

These results clearly show that the multivariate PLS projection method produces
excellent diagnostic ability for identifying deviations from normal process operating
conditions. Deviation may be observed at the blend stage before tabletting, despite
normal reference analysis results. The monitoring plots may be further simplified to two
PLS scores plots once statistical limits and in-control batches have been established.
These enable both determination of process deviation and also diagnosis of the problem
as the regions in which the scores are located are characteristic of the problem.
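
For a plot of two PLS scores, a Hotelling’s T² control ellipse can be sketched in a few lines of Python. The limit formula below is the commonly used one for a new observation judged against n in-control batches, and is assumed here rather than taken from this work. The scores are treated as mean-centred and uncorrelated, and the score values and variances are hypothetical. With two scores, the required F quantile has a closed form, so no statistics library is needed.

```python
def f_quantile_two_numerator_dof(prob, dof2):
    """Quantile of the F distribution with 2 and dof2 degrees of freedom.

    For 2 numerator degrees of freedom the CDF is 1 - (1 + 2x/dof2)^(-dof2/2),
    which can be inverted directly.
    """
    return (dof2 / 2.0) * ((1.0 - prob) ** (-2.0 / dof2) - 1.0)


def t2_limit_two_scores(n_batches, prob):
    """T² limit for a new batch plotted on two scores (assumed formula)."""
    p = 2
    scale = p * (n_batches + 1) * (n_batches - 1) / (n_batches * (n_batches - p))
    return scale * f_quantile_two_numerator_dof(prob, n_batches - p)


def hotelling_t2(score_a, score_b, var_a, var_b):
    """T² of one batch from two uncorrelated, mean-centred scores."""
    return score_a ** 2 / var_a + score_b ** 2 / var_b


# Hypothetical example: limits from 28 in-control batches, one new batch
limit_95 = t2_limit_two_scores(28, 0.95)
limit_99 = t2_limit_two_scores(28, 0.99)
t2_new = hotelling_t2(12.0, -20.0, 36.0, 64.0)
outside_95 = t2_new > limit_95
outside_99 = t2_new > limit_99
```

A batch lies outside a control ellipse exactly when its T² exceeds the corresponding limit, so the ellipse and the control chart carry the same decision; the score plot adds the direction of the deviation, which is what makes it useful for diagnosis.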
These control charts were able to successfully identify drift in the process mean vector.
With the lower strength and higher strength process blend models produced from raw
data, approximately half of the batches exceeded the MEWMA Hotelling’s T² limit of
17.72. An example for higher strength tablet process blends is given in Fig. 4.18C. This
chart was sensitive to the systematic variation in the blending process (see Chapter 3)
which resulted in an offset difference in their absorbance spectra (Fig. 4.1A). However,
scatter correction of the blend and tablet data sets was necessary for monitoring the
blending stage. The MEWMA control charts for SNV DT and Savitzky-Golay 2nd
derivative data of the lower strength tablet process blends were both able to detect drift
in the process mean vector, which reached a state of statistical control and then drifted
out of control, above the MEWMA Hotelling’s T² limit of 17.72, from blend 33
(BN5064) onwards (Fig. 4.7C). With the higher strength tablet process blend data sets,
the Savitzky-Golay data and SNV DT data both show the MEWMA Hotelling’s T²
drifting in and out of control, above the limit of 17.72 (Figs 4.11C & 4.19C
respectively). The Savitzky-Golay data were better able to identify drift in the process
mean vector as they performed better with the Hotelling’s T² control phase 2 charts.

Fig. 4.18. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (raw data, n = 25 control phase 1 batches):
A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

Fig. 4.19. Multivariate statistical process control charts for multiblock PLS model
of higher strength tablet process blends (SNV detrend data, n = 35 control phase 1
batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2
chart; C) Hotelling’s T² multivariate exponentially weighted moving average
control chart; D) Anderson’s asymptotic normal approximation control chart.

With the lower strength tablet data sets, the performance of the MEWMA Hotelling’s T²
chart also depended on the ability of the model to identify unusual batches from the
Hotelling’s T² control phase 2 charts. This control chart showed drift out of control with
the SNV DT data from batch 33 onwards (BN5064) (Fig. 4.20C); however, the chart
using Savitzky-Golay 2nd derivative data also identified systematic drift from batch 24
to 27 (BN5039 to BN5042) and was therefore considered to perform better (Fig. 4.9C).
For the higher strength tablet data sets, the SNV DT and Savitzky-Golay models
showed the MEWMA Hotelling’s T² mostly above the limit of 17.72 (Figs. 4.13C &
4.12C). This was due to batches exceeding the Hotelling’s T² 99% limits over the
production period.
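
The behaviour of the MEWMA Hotelling’s T² chart can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the scores are taken to be mean-centred and uncorrelated with known in-control variances, and the smoothing constant of 0.1 is hypothetical, since its value is not given in this section. Only the limit of 17.72 comes from the text.

```python
def mewma_t2(score_rows, variances, smoothing=0.1):
    """MEWMA T² statistic for successive batches.

    score_rows: one list of scores per batch, in production order.
    variances: in-control variance of each score (scores assumed uncorrelated).
    """
    ewma = [0.0] * len(variances)
    statistics = []
    for i, row in enumerate(score_rows, start=1):
        # Exponentially weighted moving average of each score
        ewma = [smoothing * x + (1.0 - smoothing) * z for x, z in zip(row, ewma)]
        # Exact variance factor of the EWMA vector after i batches
        factor = smoothing / (2.0 - smoothing) * (1.0 - (1.0 - smoothing) ** (2 * i))
        statistics.append(sum(z * z / (factor * v) for z, v in zip(ewma, variances)))
    return statistics


# Hypothetical sustained shift of one standard deviation in the first score
shifted_batches = [[1.0, 0.0]] * 40
t2_values = mewma_t2(shifted_batches, variances=[1.0, 1.0])
first_out_of_control = next(
    (i for i, value in enumerate(t2_values, start=1) if value > 17.72), None
)
```

Because each point accumulates information from earlier batches, a small sustained shift that would not breach a phase 2 Hotelling’s T² limit on any single batch eventually carries the MEWMA statistic over its limit, which is the drift behaviour described above.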
With the lower strength tablet process data sets, the SNV DT model detected significant
process variance in the tablets of batches 26 (BN5041), which produced friable tablets,
and 36 (BN5075) (Appendix C: Table C23) (Fig. 4.20D). With raw tablet data, batch 37
(BN5076) was also detected as having exhibited excessive process variation at the tablet
stage (Appendix C: Table C23). The tablets from this batch were found to have low
drug substance content (4.83 mg per tablet). Batches 10 (BN5017) and 11 (BN5018)
were also found to have shown significant process variance with raw data (Appendix C:
Table C23). Examination of the reference analysis data revealed an increase in content
uniformity with these batches. Batch 10 had content uniformity of 2.43% and batch 11
had a content uniformity of 5.1%. Although less than the maximum limit of 6%, this
was still a high value (mean content uniformity = 1.66%, standard deviation = 1.06%, n
= 39 batches).
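
The statement that a content uniformity of 5.1% was high can be made concrete by expressing each value in standard deviations from the mean of the 39 batches. The short Python sketch below uses only the figures quoted above; the function name is illustrative.

```python
def standard_score(value, mean, standard_deviation):
    """Number of standard deviations by which a value lies above the mean."""
    return (value - mean) / standard_deviation


mean_cu = 1.66  # mean content uniformity (%), n = 39 batches
sd_cu = 1.06    # standard deviation of content uniformity (%)

z_batch_10 = standard_score(2.43, mean_cu, sd_cu)
z_batch_11 = standard_score(5.1, mean_cu, sd_cu)
```

On these figures batch 10 lies less than one standard deviation above the mean, whereas batch 11 lies more than three standard deviations above it, although both remain below the 6% limit.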

Fig. 4.20. Multivariate statistical process control charts for multiblock PLS model
of lower strength tablets (SNV detrend data, n = 29 control phase 1 batches):
A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C)
Hotelling’s T² multivariate exponentially weighted moving average control chart;
D) Anderson’s asymptotic normal approximation control chart.

With higher strength tablet process blend data, the SNV DT model identified batches 23
C: Table C22) (Fig. 4.19D). Both of these batches produced tablets of average
thickness 4.62 mm (above the maximum limit of 4.6 mm); batch 23 (BN5049) produced
tablets with a high moisture content of 4.44%, close to the maximum limit of 4.5%
41 batches) and batch 38 (BN5078) produced tablets which were friable (1 mg). Batch
38 was also identified as having shown significant process variance at the tablet stage
with Savitzky-Golay 2nd derivative data (Appendix C: Table C24) (Fig. 4.12D).
increase in process variance that leads to extreme final product quality on a number of
variables. With the multiblock models, this may also be traced back to the blend stage
despite apparently normal blend reference analysis values. This demonstrates the power
of the ‘multivariate projection of NIR process data’ method over the traditional wet
The results of this chapter are summarised in Tables 4.1 (lower strength tablet process)
Table 4.1. Summary of results for lower strength tablet process.
NIR blend absorbance NIR tablet combined data NIR MBPLS blend data NIR MBPLS combined Unusual C. of A. Comment
number data predicted unusual predicted unusual predicted unusual tablet data predicted
unusual
G f A" G 0 A G A
4993 1 •
4994 2
4996 3 .
4997 4 . •
4998 5
5002 6
5003 7 .
5013 8
5016 9 .
5017 10 . • . Increase in content uniformity
5018 11 • . • Increase in content uniformity
5025 12
5025"* 13
5026 14
5027 15
5028 16
5028"* 17 •
5029 18 .
5035 19
5036 20
5037 21
5038 22
5039 23 • Excessive moisture deviation
5039"’ 24 . • Excessive moisture deviation
5040 25 • • • • .
5041 26 • • • • • • • . • Friability of 1 mg
5042 27 • • • • Friability of 2 mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 . • Friability of 1 mg
5065 34 N/A' C of A. data missing
5067 35 N/A' C of A. data missing
5075 36 • • • •
5076 37 • Low drug content per tablet
5077 38
5967 39
Table 4.2. Summary of results for higher strength tablet process.
NIR PLS blend NIR PLS tablet combined NIR MBPLS blend data NIR MBPLS combined Unusual C. of A. Comment
number absorbance data predicted data predicted unusual predicted unusual tablet data predicted
unusual unusual
Q t Q t A Q A Q A
4991 1 •
4992 2 •
4999 3 •
5000 4 • •
5001 5
5009 6
5010 7
5011 8 •
5021 9 . . •
5022 10
5023 11
5024 12
5031 13 • • • • . •
5032 14 • • •
5033 15 • •
5043 16
5043"’ 17
5043" 18 •
5044 19 • • .
5046 20
5047 21 • •
5048 22
5049 23 • • • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 • . Thick tablets (4.62 mm)
5051 25 • Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • Friability 4 mg; tablet thickness = 4.63 mm
5058 29 • Thick tablets (>4.6 mm)
5059 30 Thick tablets (>4.6 mm)
5060 31 Thick tablets (>4.6 mm)
5068 32 Thick tablets (4.62 mm)
5069 33 Thick tablets (4.62 mm)
5070 34 . Thick tablets (4.61 mm)
5071 35 • • • Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36
5074 37 • Friability of 4 mg
5078 38 • • • • • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 • • • • Thick tablets (4.62 mm)
5080 40
5081 41
4.6 Conclusion
Both the singleblock and multiblock PLS models appear to be equally effective in
modelling the variability within the reference analytical data at both process stages. The
loadings of the models suggest that physical and chemical information is modelled.
(SNV-DT or Sg2d11) of the NIR data was useful. This was especially the case with
detected at the blend stage, even though the reference analytical data did not detect
anything unusual. For example, batches of unusual blends which produced friable
tablets. The Q statistic control charts of singleblock and multiblock models were also
able to identify trends in process deviation at both process stages: consecutive batches
deviation, low average drug substance content per tablet, friability, high average tablet
thickness and prolonged tablet dissolution time. Overall, the SNV-DT scatter correction
was considered the most appropriate, of those tested for blend data, for detection of
unusual processing by this control chart. For tablets, the Sg2d11 was considered the best
The MSPC control charts of the singleblock and multiblock PLS scores were also able
to identify unusual batches, as with the Q statistic charts. For example, batches which
produced friable tablets could be detected at the blend stage, despite normal reference
analytical data, and also from their tablet NIR measurements. Scatter correction by
Sg2d11 transformation was found to be the best data pre-treatment of blends, as this
produced the most consistent (i.e. best pre-treatment for detection of process deviation)
results between process stages. Interestingly, with the lower strength tablet models,
scatter correction was not necessary to identify unusual batches (friable or increased
average tablet thickness); however, for detection of process variation (drift), the Sg2d11
produced best results. The models of the higher strength tablets did require scatter
Once models have been calculated from scatter corrected data, monitoring of the PLS or
constructed from just two PLS scores and are able to provide diagnosis of the process
problem as the region in which the unusual batch scores are located is characteristic. For
example, the scores of blends or tablets which produce friable tablets or tablets which
have high average thickness tend to locate in specific areas of the score plot, and outside
the 99% Hotelling’s T² control ellipse. This was also found to be the case with batches
for which no reference analytical data were provided (e.g. BN5041 and BN5042). These
batches were consistently found to be unusual throughout the process, and had
preceding batches which were shown to be unusual from C. of A. data. These batches
would therefore be expected to show the same trend as the preceding batches, i.e. show
friability or increased tablet thickness etc. The process variance control charts were also
found to be useful for process monitoring. With the lower strength tablets, batches
which showed significant process variation corresponded to: friable tablet batches;
batches with low average drug substance content per tablet and batches which occurred
observed (although remaining within limits). These control charts for the higher strength
tablet process showed consistency in results at the blend and tablet stages. For example,
batches which showed significant process variability at both stages produced tablets
Overall, the PLS and MBPLS models produced consistent results between process
stages and results which are consistent with the PCA methods (Chapter 3). This
suggests that these PLS methods may also be solely used to control and monitor the
CHAPTER 5
5.1 Introduction
control of blending and tabletting using near infrared spectroscopy. The methods
Unusual batches could be identified easily since they did not belong to the same
directly obtained from the NIR data, enabling diagnosis of the process problem. For
In this chapter, near infrared multispectral imaging is used to study some of the unusual
blends. The aim is to generate detailed, spatially resolved images which provide
diagnostic chemical and physical information at the microscopic level. Section 5.4
describes liquid crystal tuneable filter InSb focal plane array NIR imaging microscopy.
Multivariate methods and data pre-treatments are described in Sections 5.5 and 5.6
respectively. In Sections 5.7 to 5.9, methods are described for statistical control and
blend quality. Conclusions regarding this method are described in Section 5.11.
5.2 Materials Studied
A selection of blends which exhibited unusual reference analytical results at the blend
and tablet stage were studied. Their batch numbers (BNs) were: 5039, 5040, 5045,
5064, 5065, 5067, 5070, 5078. Blend BN5045 exhibited excessive moisture deviation
and poor uniformity of content of the drug substance; throughout the blend, large lumps
of drug substance were found. These did not dissolve prior to high performance liquid
chromatography (HPLC) analysis, and were observed by visible light microscopy in this
study. This blend was therefore not tabletted and was considered an important blend for
study by NIR imaging microscopy. The drug substance content of each blend should lie
between 2.42 and 2.58% (2.5% nominal value); between different blend samples of the
same batch, the maximum absolute deviation in active content from their mean value
should not exceed 3%. In addition to these blends, powdered crystalline drug substance
5.3 Sample Preparation
A single sample from each batch was prepared for imaging using a powder sampling
approximately 200 mg of powder between two flat circular barium fluoride discs. The
discs were held together between three stainless steel pins mounted on a steel plate and
were tightly secured to the plate by a steel screw cap. A circular hole in the screw cap
allowed for passage of light through the barium fluoride disc and onto the sample. The
samples were positioned on the microscope stage with the barium fluoride window
5.4 Liquid Crystal Tuneable Filter InSb Focal Plane Array Near Infrared
Imaging Microscopy
5.4.1 Instrumentation
The near infrared (NIR) imaging system (MatrixNIR, Spectral Dimensions Inc.,
Maryland, MD, USA) incorporated a high resolution focal plane array InSb detector,
capable of producing NIR-images at high frame rate and with resolution of 256-by-256
pixels. The focal plane array is cooled by a Dewar flask filled with liquid nitrogen.
Light from the 100 W tungsten halogen lamp was directed through the liquid crystal
tuneable filter (LCTF) which had a tuneable range of 1100 to 1900 nm and wavelength
resolution of 1 nm (bandpass of 6 nm). The light emitted through the LCTF was then
intensity, and onto the sample. Adjustment of shutters on the beam splitter directed
light diffusely-reflected from the sample up through the objective lens and onto the
5.4.2 Sample Imaging
A sample prepared from each batch was imaged through the barium fluoride window of
the sample accessory. The microscope stage height required to focus light from the
LCTF on the powder surface was determined by manually focusing visible light on to
the sample using the lamp attached to the microscope. The shutters of the beam splitter
were adjusted for this purpose. When focused, the microscope lamp was then switched
off and the shutters of the beam splitter re-adjusted to allow light from the LCTF to
reach the powder surface and be diffusely reflected onto the InSb detector. The InSb
detector was maintained at low temperature by means of a liquid nitrogen filled Dewar
flask. All images were produced in a darkened room with the camera frame rate and
gain adjusted to provide optimum image quality with minimal saturation of the focal
plane array pixels. A frame rate of 7.89 Hz and 2000 accumulations per image plane
(wavelength) were deemed to provide optimum image quality and were used
subsequently. Images were recorded at each wavelength across the spectral region used.
For each sample, this provided a three dimensional image cube where the first two
dimensions (x,y) represent spatial location (pixel) and the third dimension represents
standard and with the same camera settings and wavelengths as for the blends.
5.4.3 Data Analysis
reflectance NIR images by pixel-wise division of each pixel by the corresponding pixel
of the ceramic background image. The resultant images were used for multivariate
image analysis (MIA). All programs used were written in Matlab 5.3 Scientific and
Technical Programming Language (The Mathworks Inc., Natick, MA, USA) and were
determine the spectral region where the active drug absorbs. This was investigated to
permit imaging over the minimum number of wavelengths necessary to reduce both the
computation time and the image file size. Principal components analysis of 11 point
nm, 2 nm increments) (Fig. 5.1) was able to separate the placebo blend and blend
BN5045 data on the first component (Fig. 5.2). The loadings for the first component
revealed that the important wavelengths to monitor (i.e. highest absolute loading values)
were 1132, 1135, 1649, 1665, 1701 and 2145 nm (Fig. 5.3), although the last of these
wavelengths was outside the range of the NIR imaging microscope. The loadings from
1649 to 1701 nm were most important, hence the wavelength range selected was restricted
to 1600 to 1750 nm. The wavelength increment was set to 6 nm and therefore
incorporated these important peaks. The number of wavelengths selected was therefore
26 out of the 800 possible. Imaging time for each blend was approximately 1.8
hours.
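
The wavelength count and imaging time quoted above can be checked with a few lines of Python. The sketch assumes that acquisition time is simply the number of accumulated frames divided by the camera frame rate (7.89 Hz and 2000 accumulations per image plane, as given in Section 5.4.2), ignoring any overhead such as filter tuning; the variable names are illustrative.

```python
start_nm, stop_nm, increment_nm = 1600, 1750, 6
wavelengths = list(range(start_nm, stop_nm + 1, increment_nm))

frame_rate_hz = 7.89
accumulations_per_plane = 2000

# Total frames divided by frame rate gives the acquisition time in seconds
imaging_seconds = len(wavelengths) * accumulations_per_plane / frame_rate_hz
imaging_hours = imaging_seconds / 3600.0

# Nearest sampled wavelength to each important loading peak in the range
peaks_nm = [1649, 1665, 1701]
nearest = {peak: min(wavelengths, key=lambda w: abs(w - peak)) for peak in peaks_nm}
largest_gap_nm = max(abs(w - peak) for peak, w in nearest.items())
```

Under this assumption the grid contains 26 wavelengths and the acquisition takes roughly 1.8 hours, in agreement with the text. Each of the three peaks lies within 1 nm of a sampled wavelength, well inside the 6 nm bandpass of the filter.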
loading vector and a score image for each principal component extracted (Esbensen et
al., 1992; Geladi et al., 1989a, 1989b, 1994, 1996; Geladi, 1992). Mathematically, the 3-
way algebra is equivalent to unfolding the 3-way array, X, to a two way array followed
by ordinary principal components analysis (Section 3.6.3, equation (3.6.1)) (Bharati and
MacGregor, 1998). This unfolding results in only transient loss of information as the
resulting scores, T, may be rearranged back to form an image (three-way array) (Geladi
et al., 1996). This procedure has been termed unfolding and backfolding (Wold et al.,
1987). The principal components algorithm used here was the power method. This was
considered to be more suitable and computationally faster than the NIPALS method as
the kernel matrix is formed only once. Successive PCs are extracted by deflating the
residual kernel matrix. Different methods of calculating the kernel matrix were
examined. These were: the cross products matrix; the variance-covariance matrix and
the correlation matrix (Geladi et al, 1996). Preliminary data analysis showed these
Fig. 5.1. Savitzky-Golay 2nd derivative (11 point quadratic) of absorbance NIR
spectra of a blend BN5045 (n = 9) and a placebo blend (n = 42) over the wavelength
range 1110 to 2200 nm (2 nm increments).

Fig. 5.2. Hotelling’s control ellipse (99%) of PC1 and PC2 scores for a blend
BN5045 (n = 42 blend BN5045 spectra used in PCA, 11 point quadratic Savitzky-
Golay second derivative of absorbance data) showing separation of blend BN5045
from placebo blend (n = 9 placebo blend spectra used in PCA) (control limits set
using blend BN5045 PC scores).

Fig. 5.3. Principal component loadings for the first component for 11 point
quadratic Savitzky-Golay second derivative of absorbance spectra of a blend
BN5045 (n = 42) and a placebo blend (n = 9).
methods to produce similar results. All PC1 and PC2 images formed from these kernel
(Geladi et al, 1996). The cross products matrix has been suggested as a method for
generating intensity PC images (Geladi et al, 1996); however, the first two PC images
from this kernel matrix were not visually different from those obtained using the other
two kernel matrices. Subsequent MIA used the variance-covariance matrix and not the
reflectance data and the variance-covariance matrix. The first principal component
image for each blend was found to represent the intensity of reflected light (e.g. BN5045
(Fig. 5.4A)). This image clearly shows that the illumination of the sample was
non-uniform. The score values on the first component confirm this and show a trend of
increasing intensity with pixel number (Fig. 5.5). The effect of pixel norm scaling
(multivariate shading correction) (Geladi et al, 1996) was also studied. Improvement in
intensity across the first PC image was observed (Fig. 5.4B); however, lower intensity
still remained around the borders of the first 100 rows of the image. MIA was therefore
restricted to rows 100 to 256 for each image (i.e. 157 rows). Pixel norm scaling was not
considered necessary for this image region and was therefore not used.
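The unfolding, kernel-matrix power method and backfolding steps described earlier in this section can be sketched as follows. This is a minimal illustration in Python rather than the original Matlab 5.3 code. It assumes the variance-covariance kernel, a fixed arbitrary start vector and a simple convergence test, and all function names are hypothetical.

```python
import math


def unfold(image):
    """Unfold an I-by-J-by-K image (nested lists) into an (I*J)-by-K matrix."""
    return [list(pixel) for row in image for pixel in row]


def covariance_kernel(X):
    """Mean-centre X column-wise; return the centred data and the K-by-K kernel."""
    n, k = len(X), len(X[0])
    means = [sum(row[c] for row in X) / n for c in range(k)]
    Xc = [[row[c] - means[c] for c in range(k)] for row in X]
    S = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(k)] for a in range(k)]
    return Xc, S


def power_method(S, n_components, n_iter=1000, tol=1e-24):
    """Leading eigenpairs of the symmetric kernel S, deflating S after each PC."""
    k = len(S)
    S = [row[:] for row in S]  # the kernel is formed once and then only deflated
    loadings, eigenvalues = [], []
    for _component in range(n_components):
        p = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-zero start vector
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        for _step in range(n_iter):
            q = [sum(S[a][b] * p[b] for b in range(k)) for a in range(k)]
            norm = math.sqrt(sum(v * v for v in q))
            if norm == 0.0:
                break
            q = [v / norm for v in q]
            change = sum((u - v) ** 2 for u, v in zip(q, p))
            p = q
            if change < tol:
                break
        lam = sum(p[a] * S[a][b] * p[b] for a in range(k) for b in range(k))
        loadings.append(p)
        eigenvalues.append(lam)
        # Deflate the residual kernel matrix before extracting the next PC.
        S = [[S[a][b] - lam * p[a] * p[b] for b in range(k)] for a in range(k)]
    return loadings, eigenvalues


def score_images(image, n_components):
    """PCA of an image: unfold, decompose the kernel, then backfold the scores."""
    rows, cols = len(image), len(image[0])
    Xc, S = covariance_kernel(unfold(image))
    loadings, eigenvalues = power_method(S, n_components)
    scores = [[sum(x * w for x, w in zip(pixel, p)) for p in loadings] for pixel in Xc]
    images = [[[scores[i * cols + j][a] for j in range(cols)] for i in range(rows)]
              for a in range(n_components)]
    return images, loadings, eigenvalues
```

Because the kernel is only K-by-K (26-by-26 for the wavelength range used here), each iteration is cheap irrespective of the number of pixels. Swapping `covariance_kernel` for a cross products or correlation kernel would give the other two variants examined.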
The effect of an m-by-n median filter on signal to noise ratio was investigated for all
images. This involved replacing each pixel reflectance value by the median value of itself
Fig. 5.4. Principal component one image (PC1) of blend BN5045 (reflectance data)
showing non-uniform illumination of sample. A) Before multivariate shading
correction, B) after multivariate shading correction.

Fig. 5.5. Principal component one score plot (PC1) for blend BN5045 (reflectance
data) showing variation in illumination intensity (score value) with pixel number.
and the pixels surrounding it in a 3-by-3 pixel filter at each image plane (wavelength).
The filter clearly improved signal to noise, probably by removing stray light and ‘salt
and pepper’ noise, producing smoother images (Fig. 5.6). Median filtering also
produced better visual contrast between true image features for each principal
component image (Fig. 5.6) because the 256 intensity levels were set from proper
intensity levels and not from extreme noise values. In Fig. 5.6, a surface scratch on the
barium fluoride disc of image BN5045 (lower left part of image) shows up more clearly
as a dark line in the PC1 image after median filtering (compare Fig. 5.6A & 5.6B). Other
spectral features, such as crystalline material, appear brighter after median filtering
(compare Fig. 5.6A & 5.6B). Examination of the first four principal components
loadings for PCA models derived from raw reflectance data and from median filtered
reflectance data showed that the third and fourth PCs appeared to contain more
chemical information after median filtering (Fig. 5.7). The cumulative percentage sum
of squares explained by the PCs extracted was greater after filtering, especially with the
first PC, and confirms the reduction in random noise in the data set (Table 5.1). Median
filtering was therefore considered very useful and was used in all subsequent MIA.
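The 3-by-3 median filter applied at each image plane can be illustrated as below. This is a sketch only: the treatment of border pixels (using whichever neighbours exist) is an assumption, since the text does not state how image edges were handled, and the function names are hypothetical.

```python
from statistics import median


def median_filter_plane(plane, size=3):
    """Replace each pixel by the median of itself and its neighbours in a size-by-size window."""
    rows, cols = len(plane), len(plane[0])
    half = size // 2
    filtered = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Assumption: at the borders the window is truncated to the pixels available.
            window = [plane[r][c]
                      for r in range(max(0, i - half), min(rows, i + half + 1))
                      for c in range(max(0, j - half), min(cols, j + half + 1))]
            filtered[i][j] = median(window)
    return filtered


def median_filter_image(planes, size=3):
    """Apply the filter independently to every wavelength plane of the image."""
    return [median_filter_plane(plane, size) for plane in planes]
```

A single extreme pixel surrounded by normal values is removed entirely, which is why the filter suppresses 'salt and pepper' noise while leaving larger features such as scratches and crystals in place.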
The effect of a range of data pre-treatments, commonly applied to near infrared spectra,
was studied with the blend images. These were: SNV; Sg2d 11 and 15 point quadratic
pixel-wise to the 26 reflectance values. All pre-treatments were found to remove much
of the intensity information from the images. The first two principal components
loadings showed high positive and negative peak values across the wavelength range
information from the images is likely to make detection of unmilled crystalline material
Fig. 5.6. The effect of 3-by-3 median filtering on image signal to noise ratio (first
principal component image, reflectance data) for blend BN5045. A) No filtering, B)
3-by-3 median filtered data.

Fig. 5.7. Principal component loadings for blend BN5045 (reflectance data) for
first four components. Raw reflectance data: (A) to (D); median filtered reflectance
data: (E) to (H).
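Table 5.1. Cumulative percentage sum of squares explained by principal
components models (reflectance and median filtered reflectance data sets) for
blend BN5045.

Data set                       PCs   Cumulative % SS
Reflectance                    1     57.19
                               2     80.01
                               3     84.56
                               4     87.79
Median filtered reflectance    2     87.23
                               3     89.76
                               4     91.70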
harder as unmilled material appears to reflect with higher intensity, probably due to
some specular reflectance. This was observed with the first PC image for all blends (Fig
5.8), especially BN5045 (Fig. 5.6A & 5.6B). For this study, use of reflectance data was
therefore preferred.
image where a class of material is located (Geladi et al, 1996). In this study three
Another method studied involved construction of a pixel density map. This method
interpretation of the PC loadings (for example the first and third PCs would be selected,
representing intensity and drug substance respectively). A pixel density map is a three
intensity levels) on the two selected PCs. Mathematically, it is a two way matrix where
the row and column dimensions represent the intensity on the two PCs and the value of
an element in the array is the number of pixels with these intensities (Bharati and
MacGregor, 1998). Once calculated, the array is then displayed as an image. The
reasoning behind this approach is that pixels representing the same class of material will
show similar
Fig. 5.8. Principal component one intensity images of blends (median filtered
reflectance data). A) BN5039, B) BN5040, C) BN5045, D) BN5064, E) BN5065, F)
BN5067, G) BN5070, H) BN5078.
intensities on the two PCs and will be grouped together in the density map (Bharati and
MacGregor, 1998). Regions of interest in these maps were highlighted by placing them
inside a rectangle, selected using a computer mouse and an on-screen cross-hair pointer.
Pixels with the range of selected intensities for the two PCs were then projected back into the image space.
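The pixel density map and the rectangular selection described above can be sketched as follows. The scaling of scores to 256 intensity levels by a linear min-max stretch is an assumption, as are the function names; the original programs may have scaled the scores differently.

```python
def scale_to_levels(values, levels=256):
    """Linearly map score values onto integer intensity levels 0 .. levels-1 (assumed scaling)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0] * len(values)
    return [min(levels - 1, int((v - lo) / span * (levels - 1) + 0.5)) for v in values]


def pixel_density_map(scores_a, scores_b, levels=256):
    """Two-way array whose element [a][b] counts pixels with intensity a on one PC and b on the other."""
    a_levels = scale_to_levels(scores_a, levels)
    b_levels = scale_to_levels(scores_b, levels)
    density = [[0] * levels for _ in range(levels)]
    for a, b in zip(a_levels, b_levels):
        density[a][b] += 1
    return density, a_levels, b_levels


def select_pixels(a_levels, b_levels, a_range, b_range):
    """Indices of unfolded pixels falling inside a rectangle drawn on the density map."""
    return [index for index, (a, b) in enumerate(zip(a_levels, b_levels))
            if a_range[0] <= a <= a_range[1] and b_range[0] <= b <= b_range[1]]
```

The indices returned by `select_pixels` refer to the unfolded image, so for an image with J columns pixel `index` lies at row `index // J` and column `index % J` when projected back.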
The final method studied involved construction of Shewhart type control charts for the
PCs studied. The control limits used were those suggested by Jackson (1991) (Section
3.7). The reasoning behind this method is that pixels which represent a high
concentration of a material will show high positive or negative values on the PC which
With each method, the pixels which were selected were also used to construct binary
images. Selected pixels were given a value of one (white), non-selected pixels were given a value of zero (black).
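A Shewhart type chart on PC scores and the resulting binary image can be sketched as below. The limits are taken here as the mean plus or minus z standard deviations (z = 1.96 for 95%, 2.576 for 99%). This is an assumption standing in for the limits of Jackson (1991) referred to in Section 3.7, and the function names are hypothetical.

```python
from statistics import mean, stdev


def shewhart_limits(scores, z=1.96):
    """Lower and upper control limits for a set of PC scores (assumed mean +/- z*sd form)."""
    centre = mean(scores)
    sigma = stdev(scores)
    return centre - z * sigma, centre + z * sigma


def binary_image(score_image, z=1.96, side="both"):
    """Set pixels outside the control limits to one (white) and all others to zero (black)."""
    flat = [score for row in score_image for score in row]
    lower, upper = shewhart_limits(flat, z)

    def selected(score):
        if side == "upper":
            return score > upper
        if side == "lower":
            return score < lower
        return score > upper or score < lower

    return [[1 if selected(score) else 0 for score in row] for row in score_image]
```

For the PC1 intensity image, `side="upper"` would pick out only the highly reflective pixels, whereas `side="both"` also flags unusually dark pixels.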
Overall, the most effective method for these data sets was found to be the Shewhart
chart method. This was able to locate areas of unmilled crystalline material with both
the blends and powdered drug substance from the first PC (intensity). In addition, with
the blends, the spatial locations of drug substance (both fine and unmilled clumps) were
located by use of a control chart for the third PC. This PC showed loadings similar to
spectra of the placebo and active blends (Fig. 5.9A and 5.9B) and importantly, similar to
the loadings of the third PC of drug substance (Fig. 5.10C). In this spectral region, the
absorbance was found to be higher if active material was present (Fig. 5.9), especially
from 1648 to 1672 nm. It therefore follows that drug substance shows lower reflectance
in this region. Interpretation of the loadings for the third component revealed that areas
of blend with no drug substance would have lower (more negative) score values than
areas of blend with drug substance. This is because regions of blend with low drug
Fig. 5.9. Near infrared absorbance spectra of a placebo blend (n = 9) and a blend
(n = 42) (1600 to 1750 nm, 6 nm increment). Absorbance spectra: A) placebo blend,
B) blend. Second derivative of absorbance: C) placebo blend, D) blend.

Fig. 5.10. Principal components (PC) loadings for the first four components for
powdered drug substance. A) First PC, B) second PC, C) third PC, D) fourth PC.
substance content will, similar to placebo blend, show higher reflectance. The
contributions to lower score values for such pixels will therefore be mostly from the
high negative loadings. With the careful selection of the wavelength range used (i.e.
where most differences in reflectance occur between drug substance and the bulk), the
drug substance should be identified from the Shewhart PC chart above a threshold level
(95% or 99% confidence). A comparison of the first and third PC images of blend
BN5045 provided evidence which supported this. The areas of crystalline material in
this blend (Fig. 5.11C) are known from certificate of analysis tests to represent clumps
of unmilled drug substance. Although these regions showed high reflectance in the PC1
image, probably due to specular reflectance from the crystals, some absorption of light
also occurred as the same areas also appear as high score values for the third PC image.
With blend BN5067, the polarity of the loadings on PC3 was reversed. Pixels with low
score values were therefore deemed to represent drug substance, and also had identical
Binary images were produced for each blend using the third PC image. The percentage
of pixels found to represent drug substance was approximately 2.5% for the blends.
This is not inconsistent with the concentration of drug substance in the blends
(2.5% m/m).
Images
The binary images (95% control limits on PC scores) representing unmilled crystalline
material (PC1) and drug substance (PC3) were monitored by dividing the images into
16 sub-images (Fig. 5.12) (each image used just the first 156 rows, and all 256
columns). With each image sub-area (39-by-64 pixels), the number of pixels classified
as unmilled or drug substance was recorded, and the results for the 16 sub-areas were
Fig. 5.11. Binary images of crystalline areas produced from principal component
one intensity images of blends (median filtered reflectance data). A) BN5039, B)
BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H) BN5078.

 1  2  3  4
 5  6  7  8
 9 10 11 12
13 14 15 16
displayed as bar charts. This allowed for monitoring of areas with high concentration of
The results of this monitoring method allowed easy identification of areas with high
unmilled material was calculated for each image and control limits, based on the t-
distribution (Neave, 1995), were used corresponding to 95% and 99% (Table 5.2).
These limits differ between blends and were used as a rough guide for identifying high
concentration areas; future monitoring would probably use limits set empirically. All
images showed at least one region with a high pixel concentration of unmilled
crystalline material (Fig. 5.13). Blend BN5045 showed two areas with high pixel
concentration of unmilled material (Fig. 5.13C). Blends BN: 5039; 5040; 5064; 5065
and 5067 all showed one area with a pixel concentration of 15% or greater (Fig. 5.13).
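The sub-area monitoring can be sketched as below: the binary image is divided into a 4-by-4 grid (39-by-64 pixels per sub-area for a 156-by-256 image), the percentage of selected pixels is recorded for each sub-area, and the mean, standard deviation and content uniformity are computed. Content uniformity follows the definition given in the footnote to Tables 5.2 and 5.3. The one and two sigma limits shown are those named in the figure captions; the t-distribution based limits are not reproduced, and the function names are hypothetical.

```python
from statistics import mean, stdev


def sub_area_percentages(binary, grid_rows=4, grid_cols=4):
    """Percentage of selected (value one) pixels in each sub-area, numbered row-wise."""
    height = len(binary) // grid_rows
    width = len(binary[0]) // grid_cols
    percentages = []
    for a in range(grid_rows):
        for b in range(grid_cols):
            count = sum(binary[i][j]
                        for i in range(a * height, (a + 1) * height)
                        for j in range(b * width, (b + 1) * width))
            percentages.append(100.0 * count / (height * width))
    return percentages


def summarise(percentages):
    """Mean, standard deviation, sigma limits and content uniformity over the sub-areas."""
    m = mean(percentages)
    s = stdev(percentages)
    # Content uniformity: (highest sub-area value - mean) / mean, as a percentage.
    uniformity = 100.0 * abs(max(percentages) - m) / m if m else float("nan")
    return {"mean": m, "sd": s, "one_sigma": m + s, "two_sigma": m + 2 * s,
            "content_uniformity": uniformity}
```

Sub-areas whose percentage exceeds `two_sigma` would be the candidates for regions of high pixel concentration in the bar charts.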
Monitoring of the PC3 binary images used the same method as for unmilled crystalline
material. For each image, the percentages of drug substance for the 16 sub-areas were
averaged and the standard deviation calculated (Table 5.3). From the results for these
images, blends BN: 5040, 5064; 5070 and 5078 appear to have mean drug substance
contents by surface area which are reasonably consistent with the specification limits of
2.42 to 2.58% m/m. Blends BN: 5039 and 5045 showed high drug
substance content in these images and their respective certificate of analysis results
confirmed that some blend samples showed excessive drug substance content. The
standard deviation of drug substance content was higher for BN5045 than for BN5039,
which agrees with the certificate of analysis content uniformity results. Two batches,
Fig. 5.13. On-line Shewhart PC control bar charts showing percentage of unmilled
crystalline material in 16 sub-areas of blend images, with mean and sigma and 2
sigma limits.
Table 5.2. PC1 unmilled crystalline material Shewhart control bar charts of blend
images, each divided into 16 image sub-areas.
Blend | Percentage unmilled material (mean) | Standard deviation of unmilled material (%) | Content uniformity (a)

Table 5.3. PC3 ‘drug substance’ Shewhart control bar charts of blend images, each
divided into 16 image sub-areas.
Blend | Image calculated percentage of drug | Certificate of analysis drug content | Standard deviation of drug | Content uniformity (a)

(a) ‘Content uniformity’ is the maximum absolute difference between the image sub-area
with the highest number of pixels classified as unmilled and the mean value, divided by
the mean value over all sub-areas, expressed as a percentage.
BN5065 and BN5067, were produced consecutively and subsequently found to have
processed unusually by NIR spectroscopy (Chapters 3 and 4), having shown high and
low drug substance contents respectively by this method. These results are shown in
Substance.
The binary images of unmilled crystalline material and of powdered drug substance
were subjected to particle size analysis. With each image, all 157 rows and 256 columns
were used. Particle-sizing was performed by scanning horizontally and vertically across
each binary image (PC1 ‘unmilled crystalline material’ and PC3 ‘powdered drug
substance’ binary images) and measuring the diameter of each particle encountered.
Hence for each particle, a number of measurements of its size were recorded (Table
5.4).
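The scanning procedure can be illustrated by measuring the length of every run of selected pixels along each row and each column of a binary image, so that one particle contributes several measurements. The sketch below is a simplified reading of that procedure. The pixel size of 6 µm is an assumed value used only for illustration, and the function names are hypothetical.

```python
from statistics import mean, median


def run_lengths(line):
    """Lengths of consecutive runs of selected (value one) pixels along one scan line."""
    runs, current = [], 0
    for value in line:
        if value:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def particle_sizes(binary, pixel_size_um=6.0):
    """Particle diameters (micrometres) from horizontal and vertical scans of a binary image."""
    sizes = []
    for row in binary:                      # horizontal scan
        sizes.extend(n * pixel_size_um for n in run_lengths(row))
    for column in zip(*binary):             # vertical scan
        sizes.extend(n * pixel_size_um for n in run_lengths(column))
    return sizes


def size_summary(sizes):
    """Number of measurements with the mean, median and maximum particle size."""
    return {"n": len(sizes), "mean": mean(sizes), "median": median(sizes), "max": max(sizes)}
```

A percentage frequency distribution then follows by counting the sizes in each bin and dividing by `n`.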
The data for each image were used to construct percentage frequency particle size
distributions (Fig. 5.16 and 5.17) and to estimate the mean, median and maximum
particle sizes for each image, and the standard deviation of each percentage frequency
particle size distribution. The resolution of the NIR imaging microscope restricted the
Particles identified as crystalline material in each blend’s PC1 binary images showed
mean and median particle sizes greater than those measured for the corresponding PC3
powdered drug substance images (Table 5.4). In addition, with the exception of blend
BN5070, the maximum measured particle sizes of each blend’s PC1 unmilled
crystalline material image were much larger than with the corresponding PC3 powdered
drug substance image. Blends: BN5040; BN5045; BN5065 and BN5078 showed the
Fig. 5.14. Binary images of areas of drug substance produced from principal
component three images of blends (median filtered reflectance data). A) BN5039,
B) BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H)
BN5078.

Fig. 5.15. On-line Shewhart PC control bar charts showing percentage of drug
substance in 16 sub-areas of blend images, with mean and sigma and 2 sigma
limits. Blend image: A) BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065;
F) BN5067; G) BN5070; H) BN5078.
Table 5.4. Particle size analysis of unmilled crystalline material and powdered
drug substance.
Batch | Crystalline material (PC1 images), size/µm | Powdered drug substance (PC3 images), size/µm
where n is the number of particle size measurements recorded for each image.
largest unmilled crystalline material particle size.
In this Section, all 157 rows and 256 columns of each image were used.
Residual Analysis
With these images, most pixels have been shown to fall within control limits on a given
PC, that is most are grouped within the same multinormal distribution. Preliminary Q
residual analysis for blend BN5045, with a 7PC model, showed that most of the 40192
pixels fall within the model space, as predicted by the PC Shewhart control charts.
Unusual pixels, which did not fit the model were found to be those with high positive or
crystalline drug substance and holes in the powder surface or scratches on the barium
residual analysis, once identified from the PC score Shewhart control charts. A PCA
appropriate rank determined by cross validation. In this work, location and chemical
The method of detecting unmilled crystalline material on the PC1 intensity images by
Shewhart control charts was used to identify the spatial locations with highly reflective
dense crystalline material. Blend BN5045 was treated as a reference blend, from which
the Q residual model was built. This blend was used as a reference as its crystalline
areas are known to be unmilled drug substance. With the other unusual blends, the
crystalline areas do not all necessarily contain the same chemical composition, i.e. drug
substance. For these, this was therefore monitored by residual analysis using the Q
A PCA model was built from blend BN5045 using pixels which had intensity values
above the PC Shewhart 99% control limit (n = 196 pixels). The crystalline material on
this image is known from certificate of analysis data and from examination of both PC1
and PC3 images to be drug substance crystals. Cross validation using 14 subgroups each
with 14 pixels was performed. The W, R and percentage sum of squares (%SS) statistics
were examined. The rank of the data set was determined to be 7 PCs (R = 1.03, %SS =
88.66, Table 5.5).
which accounts for intensity and drug substance content, and after extraction of 7
components, the model rank determined by cross validation for pixels with PC1 score
values above 99% control limit (Fig. 5.18). For calculation of the residual distance to
model images (Geladi et al, 1996), PCA models were constructed from the entire image.
The residual distance, $Q_{ijA}$, of each pixel from the 3PC and from the 7PC models was
calculated as:

$$Q_{ijA} = \sum_{k=1}^{K} e_{ijkA}^{2} \qquad (5.8.1)$$

where $e_{ijA}$ is a vector of size (K-by-1) with indices i and j, extracted from $E_A$, $d_{ijA}$ is a
pixel with indices i and j for the PCA models with rank A, i = 1, ..., I is the column
Table 5.5. PCA results for modelling unmilled crystalline drug substance in blend
BN5045 (n = 196 pixels) (pixels selected from PC3 Shewhart control chart with
(a) Cumulative % SS is the cumulative percentage sum of squares of the original data set
explained by the model with n PCs.

Fig. 5.18. Q Residual control chart for PCA model of unmilled crystalline material
identified from PC Shewhart control chart of blend BN5045 with a 99%
confidence level (n = 196 observations, 7PCs) (95% and 99% limits for Q shown).
index in the image, j = 1, ..., J is the row index in the image, k = 1, ..., K is the variable
index for the multivariate residual image from MPCA, $E_A$ (Section 3.6.3, equation
(3.6.1)), and A is the effective rank of the model considered (3 or 7PCs). The $Q_{ijA}$ values form
an intensity image $Q_A$ which shows distance and location and clustering (Geladi et al,
1996). Regions which fit the model well will show small distances; regions which do
not fit the model will show up as large distances. After 3PCs were extracted, most of the
crystalline areas fit the model; however, one large crystal is observed in the upper right
region (Fig. 5.19A). After 7PCs, little structure remains in the residual image
$E_A$, with only a few pixels showing high distance (Fig. 5.19B). The 7PC model
Using blend BN5045, a 7PC model was constructed from pixels above their PC1
Shewhart 99% control limits (n = 196 pixels) (Fig. 5.18). This limit was used instead of
the lower 95% limit to reduce Type I errors (random). Using these pixels, control limits
The 95% and 99% control limits for Q were 3.4644 × 10⁻³ and 4.5704 × 10⁻³
respectively. The type I errors were found to be satisfactory, with one pixel out of 196
exceeding the 99% control limit and 10 pixels exceeding the 95% control limit. Type I
errors were also estimated for this model using pixels identified as crystalline on the PC1
Shewhart chart with a 95% control limit. Out of 598 pixels above the 95% PC1
Shewhart control limit, only 19 had Q values which exceeded the Q 95% control limit
(3.18%). Future monitoring of crystalline material for the other blends used this model.
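The Q statistic for a pixel is the sum of squared residuals left after projecting its centred spectrum onto the model loadings. The sketch below shows that calculation and the type I error percentages quoted above. It assumes orthonormal loadings and spectra already centred with the model mean; the control limit is passed in as a number because its derivation is not reproduced here, and the function names are hypothetical.

```python
def q_residuals(centred_spectra, loadings):
    """Sum of squared residuals of each spectrum after projection onto the PCA loadings."""
    q_values = []
    for x in centred_spectra:
        scores = [sum(xi * pi for xi, pi in zip(x, p)) for p in loadings]
        fitted = [sum(t * p[k] for t, p in zip(scores, loadings)) for k in range(len(x))]
        q_values.append(sum((xi - fi) ** 2 for xi, fi in zip(x, fitted)))
    return q_values


def exceedance_rate(q_values, limit):
    """Percentage of observations whose Q value exceeds a given control limit."""
    return 100.0 * sum(1 for q in q_values if q > limit) / len(q_values)


# Type I error rates reported for the BN5045 model.
rate_99 = 100.0 * 1 / 196     # one pixel above the 99% limit
rate_95 = 100.0 * 10 / 196    # ten pixels above the 95% limit
rate_95_wider = 100.0 * 19 / 598
print(round(rate_99, 2), round(rate_95, 2), round(rate_95_wider, 2))
```

The first two rates are close to the nominal 1% and 5%, which is the sense in which the type I errors were judged satisfactory.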
The pixels which were monitored for each blend were those which exceeded the 95%
PC1 Shewhart control limits for those images (Table 5.6). Blends BN: 5040; 5064;
5067; 5070 and 5078 all showed greater than 1% of their crystalline pixels to exceed the
Fig. 5.19. Residual distance, d, to model image for blend BN5045 after principal
components analysis. Principal components extracted: A) 3; B) 7.

Table 5.6. Q Residual analysis of blend pixels classed as unmilled crystalline
material on their respective PC1 control charts (95% limits). Model built from
blend BN5045 pixels above 99% control limit on PC1 (n = 7PCs).
Batch | Pixels above 95% control limit on | Q > Q 95% control | Q > Q 99% control
99% Q limit (Table 5.6). Their control charts showed that those pixels which exceeded
the 99% limit for Q were not randomly distributed, but spatially were closely located
(Fig. 5.20). Hence these pixels are unusual and, although they are crystalline, they may
not be drug substance. Further investigation of these pixels could involve projecting
them back into the image space, to identify their spatial location, and in addition the
may require imaging over the full NIR spectral wavelength range.
Ellipses
The monitoring of blend quality with individual PC score Shewhart control charts was
scores. This method simultaneously monitors all blend quality variables with a specific
overall type I error. The PC score Shewhart control charts used each had a type I error
of 5%; overall this equates to a type I error of 14.26% for a 3PC model.
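The overall type I error quoted above is the probability that at least one of the individual charts signals by chance. A short check of the arithmetic is given below; it assumes the charts are independent, which is reasonable for PC scores because they are uncorrelated. The function name is illustrative.

```python
def overall_type_one_error(alpha_per_chart, n_charts):
    """Probability of at least one false alarm across independent control charts."""
    return 1.0 - (1.0 - alpha_per_chart) ** n_charts


overall = overall_type_one_error(0.05, 3)
print(round(100.0 * overall, 2))
```

The same function shows how quickly the overall error grows with model rank, for example to about 30% for seven charts at 5% each.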
With this method, the first three principal components scores were monitored with 95%
and 99% control limits. With the PCA models, Hotelling’s T² was calculated according
to equation (1.8.4). The upper control limit, UCL, for this statistic was calculated
according to equation (1.8.5). The value of α for the 100α% critical point of the F
distribution was set to 0.95 (warning level) and 0.99 (control level).
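With a PCA model derived from mean-centred data, Hotelling’s T² may also be
calculated as:

$$T^{2} = \sum_{a=1}^{n} \frac{t_a^{2}}{\lambda_a} = \sum_{a=1}^{n} \frac{t_a^{2}}{s_{t_a}^{2}}$$

where $\lambda_a$, a = 1, 2, ..., n, are the eigenvalues of the covariance matrix, S, and $t_a$ are the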
Fig. 5.20. Q Statistic control charts for blends using local PCA model derived from
unmilled crystalline drug substance pixels (99% control limit on PC3 Shewhart
chart) of blend BN5045 (95% and 99% Q control limits). Blend: A) BN5039; B)
BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G) BN5070; H) BN5078.
scores from the principal component transformation, $s_{t_a}^{2}$ is the variance of $t_a$ (the
Prior interpretation of the PC loadings and of the PC score images has shown that the
first three components represent physical and chemical information. The first
substance. The first and third PCs were considered most important for monitoring and
are shown in Fig. 5.21. This figure shows the Hotelling’s T² 95% and 99% control
ellipses and the 95% and 99% PC Shewhart control limits. This is an informative plot
crystalline (high positive PC1 score value), drug substance (high positive PC3 score
value) or both (high positive score values on PC1 and PC3) or as a surface scratch on
the barium fluoride disc of the sampling accessory (high negative PC1 score
value). A simple computer program was written to assign pixels with Hotelling’s T²
above the 95% limit (UCL = 8.99, α = 0.95). This involved monitoring normalised
scores. The sum of squares of each of these values corresponds to Hotelling’s T² and
Similar to the PC Shewhart control charts, a 95% control limit based on the
t-distribution was used. Hence any normalised score value exceeding a value of 1.96 was
deemed to represent a high value on the PC. An example of this is shown as bar charts
for two pixels in Fig. 5.22. The two pixels which had the highest score values on PC1
and PC3 are shown with their normalised score values. The first pixel was classified as
Fig. 5.21. Hotelling’s T² control ellipses (95% and 99%) and PC Shewhart control
limits (95% and 99%, dotted line and dotted-dashed line respectively) for principal
components 1 and 3 of blend BN5045 showing regions of: unmilled crystalline
material and crystalline drug substance; drug substance and surface scratches on
the barium fluoride (BaF2) disc of the sampling accessory window.

Fig. 5.22. Bar charts of principal components (PC) scores and normalised PC
scores of pixels from blend BN5045 PC1 to 3 images (T² 3, 40192-3, 0.95 = 8.99, T² 3, 40192-
3, 0.99 = 13.82). Pixel 34159 (drug substance), T² = 91.50: A) PC scores & B)
normalised PC scores. Pixel 34992 (unmilled crystalline material), T² = 33.45: C)
PC scores & D) normalised PC scores.
both drug substance and crystalline (Fig 5.22B), the second pixel was classified as
crystalline (Fig. 5.22D). This agrees with the PC1 versus PC3 plot (Fig. 5.21).
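The assignment of pixels from normalised scores can be sketched as follows. Each score is divided by the standard deviation of that PC, the squares are summed to give Hotelling's T², and pixels above the upper control limit are labelled according to which normalised scores exceed 1.96. The limit of 8.99 and the sign conventions are those reported for blend BN5045; the function name and the labels are illustrative only.

```python
def classify_pixel(scores, score_sds, ucl=8.99, z=1.96):
    """Return Hotelling's T2 and the set of classes assigned to one pixel (PC1 to PC3 scores)."""
    normalised = [t / s for t, s in zip(scores, score_sds)]
    t_squared = sum(v * v for v in normalised)
    labels = set()
    if t_squared > ucl:
        if normalised[0] > z:       # high positive PC1: highly reflective crystals
            labels.add("crystalline")
        if normalised[0] < -z:      # high negative PC1: scratch on the BaF2 disc
            labels.add("surface scratch")
        if normalised[2] > z:       # high positive PC3: drug substance
            labels.add("drug substance")
    return t_squared, labels
```

A pixel may receive more than one label, which corresponds to the pixels classified as both crystalline and drug substance. For a blend such as BN5067, where the polarity of the PC3 loadings was reversed, the PC3 test would need its sign changed.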
As an example, the results of the multivariate control approach for blend BN5045 were
represented as an image. A colour scheme was chosen which would be suitable for an
engineer to monitor. Areas which were identified as crystalline were given the colour
red (denoting a warning colour). Areas which were identified as drug substance were
given the colour green (denoting an acceptable colour). The background colour was
blue. This is shown in Fig. 5.23. Areas which were classified as both crystalline and as
drug substance were shown as green. The advantage with this multivariate approach is
that now both areas with dense unmilled particles and drug substance may be shown on
the same graph. In this image, both unmilled crystals and drug substance occur together
in agreement with previous findings. In total 715 pixels were identified as unmilled
crystals, 850 were identified as active, and 747 as surface scratches. Further analysis of
these results showed that only 332 out of the 715 crystalline pixels were identified
solely as crystalline material, the rest were also identified as drug substance. With the
surface scratch pixels, only 399 were solely identified as being that. Therefore, with this
total, with 0.84% being unmilled large crystals. This result is not inconsistent with the
certificate of analysis results. These results may also be subjected to an ‘on-line’ type of
analysis in the same manner as with the binary images from the PC Shewhart control
charts. The identification of areas with unmilled crystals and drug substance should be
as straightforward.
Fig. 5.23. Image of blend BN5045 showing areas identified as drug substance
(green, n = 850 out of 40192 pixels) and crystalline material (red, n = 715 out of
40192 pixels) from Hotelling’s T² control ellipse (T² 95% control limit = 8.99) of
principal components 1 to 3 scores. Pixels classified as both drug substance and
crystalline are shown as green.
Future Monitoring of Blend Images Using The Multivariate Model
The multivariate control model developed from blend BN5045 was used to verify its
ability to detect both crystalline regions and drug substance. This was tested using an
image of pure drug substance. In Fig. 5.24, the first PC images of drug substance and
blend BN5045 are shown. From these, it is readily apparent that the image of drug
substance is visually darker. This is consistent with spectroscopic analysis as drug
substance absorbs more strongly over this spectral region than the remaining bulk
identifiable on the drug substance image. These probably represent holes in the powder
surface. The sample of drug substance imaged was a very fine, micronised and porous
sample. In both images, crystalline areas are visible as bright yellow spots.
The multivariate monitoring procedure was the same as for blend BN5045. First, the
unfolded image array of drug substance was centred and projected onto the eigenvectors
from the 3PC model of blend BN5045. The normalised scores were then monitored and
assigned as for BN5045. In this case, a considerable number of pixels were found to
have high negative score values on PC1 (n = 10660 out of 40192 pixels). These were
projected back into the image score space (Fig. 5.25) and appear to overlap with the
dark areas observed in the first PC image (Fig. 5.24). These were considered to
represent holes in the powder surface. This was confirmed by a pixel density map of
score intensities on PCs 1 and 3 for the drug substance and blend BN5045 combined
image. This showed that many of the drug substance pixels had lower intensities on the
first PC (Fig. 5.26); this PC had a bimodal distribution of pixel intensities (Fig. 5.27).
Further analysis of the scores revealed that of these 10660 pixels, 4334 pixels were not
assigned to another class. The number of crystalline pixels was found to be 391, with
153 of these classified solely as crystalline. The number of pixels found to be drug
Fig. 5.24. Principal component 1 image of drug substance (upper half) and blend
BN5045 (lower half) (PCA model calculated for blend BN5045) showing an overall
darker and more porous image for the drug substance.
Fig. 5.25. ‘On-line monitoring of powder porosity’ image for micronised drug
substance powder produced from 3PC model derived from blend BN5045 with
Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as holes in
the powder surface from high normalised PC2 scores (red, n = 10660 pixels out of
40192).
[Figure 5.26 image: pixel density plot with regions labelled "Powdered drug substance" and "Blend BN5045"; axis: PC3 pixel intensity]
Fig. 5.26. Pixel density image of principal components 1 and 3 scores for blend
BN5045 and powdered drug substance (PCA model calculated from blend
BN5045) showing separation in intensity along PC1.
[Figure 5.27 image: pixel intensity frequency histograms; axis: PC1 pixel intensity]
Fig. 5.27. Pixel intensity frequency histograms of principal components (PC) 1 and
3 scores for blend BN5045 and powdered drug substance (PCA model calculated
from blend BN5045). PC1 scores showed bimodal distribution of pixel intensity
frequencies (0 to 255 intensities): A) PC1 (mode intensities: 72, n = 757 pixels; 135,
n = 1423 pixels) and B) PC3 (mode intensity: 123, n = 4067 pixels).
substance was 22241. An image showing crystalline and drug substance regions of the
powdered drug substance is shown in Fig. 5.28. By correcting the total number of pixels
for those believed to represent holes in the powder surface, the drug substance content
of this image was 62.03% by surface area. This value falls short of 100%, possibly because
of the signal-to-noise ratio in this image; spectra (pixels) of very fine particles are likely to
have a lower signal-to-noise ratio and might not be classified as drug substance. Instead these
might be classified as holes in the powder surface. However, visually, this image shows
The calculated active content value of 62.03% by surface area is less than the content by
mass. This may be due to the porous nature of the powder. Importantly, though, this
result confirms the ability of the model to identify a greater abundance of drug
substance.
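The surface-area figure quoted above can be checked with a short calculation. The sketch below is illustrative only: it assumes that the correction for holes subtracts the 4334 hole pixels that were not assigned to any other class, since that choice reproduces the reported 62.03%. The function and constant names are hypothetical and do not come from the software used in this work.

```python
# Pixel counts reported in the text for the drug substance image.
TOTAL_PIXELS = 40192
DRUG_SUBSTANCE_PIXELS = 22241
# Assumption: only the hole pixels not assigned to another class are excluded.
HOLE_ONLY_PIXELS = 4334


def surface_area_content(class_pixels, total_pixels, excluded_pixels):
    """Percentage of the non-excluded image area occupied by one pixel class."""
    usable = total_pixels - excluded_pixels
    if usable <= 0:
        raise ValueError("no usable pixels remain after exclusion")
    return 100.0 * class_pixels / usable


content = surface_area_content(DRUG_SUBSTANCE_PIXELS, TOTAL_PIXELS, HOLE_ONLY_PIXELS)
print(f"{content:.2f}% by surface area")  # prints "62.03% by surface area"
```

Subtracting all 10660 pixels with high negative PC1 scores instead would give a content of about 75%, which does not match the reported value; this is why the 4334-pixel correction is assumed here.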
Although not tested here, the specificity of the model towards the drug substance could
model space, followed by similar multivariate monitoring. In this study, testing the
model’s specificity to this drug substance was not necessary since all imaged blends
have been identified and qualified by NIR spectroscopy (Chapter 3). However, other
powdered materials which absorb in this same narrow NIR region might be incorrectly
classified as containing this drug substance if the same MPCA model of blend BN5045
were to be used for monitoring. In this event, it is likely that either identification and
qualification of the tested material by NIR spectroscopy, or imaging over a wider NIR
Fig. 5.28. ‘On-line type monitoring of drug substance content’ image of micronised
drug substance powder produced from 3PC model derived from blend BN5045
with Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as
drug substance from high normalised PC3 scores (green, n = 22241 out of 40192
pixels) and crystalline material from high normalised PC1 scores (red, n = 391 out
of 40192 pixels). (Dark areas corresponding to holes in powder surface and surface
scratches on barium fluoride sampling accessory window not shown, n = 10660 out
of 40192 pixels). In total, 35412 pixels were outside the 95% control ellipse.
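The monitoring procedure described in this section (centring the unfolded image, projecting it onto the eigenvectors of the 3PC model, normalising the scores and comparing them with the Hotelling's T² limit) can be sketched as follows. This is a minimal illustration and not the code used in this work. It assumes that a normalised score is the raw score divided by the square root of its eigenvalue, and that T² is the sum of the squared normalised scores. The mean spectrum, eigenvectors, eigenvalues and pixel spectra are hypothetical toy values; only the limit of 8.99 is taken from the figure captions.

```python
from math import sqrt

T2_LIMIT_95 = 8.99  # 95% control limit quoted in the captions of Figs 5.25 and 5.28


def normalised_scores(spectrum, mean_spectrum, eigenvectors, eigenvalues):
    """Centre one pixel spectrum, project it onto each eigenvector, then
    divide each score by the square root of its eigenvalue (assumed scaling)."""
    centred = [x - m for x, m in zip(spectrum, mean_spectrum)]
    scores = [sum(c * v for c, v in zip(centred, vec)) for vec in eigenvectors]
    return [t / sqrt(lam) for t, lam in zip(scores, eigenvalues)]


def hotelling_t2(norm_scores):
    """Hotelling's T2 computed as the sum of squared normalised scores."""
    return sum(z * z for z in norm_scores)


# Hypothetical toy model: 3 spectral channels and 3 PCs (not thesis values).
mean_spectrum = [0.5, 0.5, 0.5]
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eigenvalues = [4.0, 1.0, 0.25]

pixels = {
    "typical": [2.5, 1.5, 1.0],  # normalised scores (1, 1, 1), T2 = 3
    "extreme": [6.5, 0.5, 1.0],  # normalised scores (3, 0, 1), T2 = 10
}

t2_values = {
    name: hotelling_t2(normalised_scores(s, mean_spectrum, eigenvectors, eigenvalues))
    for name, s in pixels.items()
}
# Pixels whose T2 exceeds the limit fall outside the 95% control ellipse.
flagged = [name for name, t2 in t2_values.items() if t2 > T2_LIMIT_95]
print(flagged)  # prints ['extreme']
```

Pixels flagged in this way would then be assigned to a class according to which normalised PC score is high, as described for blend BN5045.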
Another method of classification that could be tested is k-nearest neighbour. This
method could be used to classify pixels classed as crystalline, drug substance and as
barium fluoride disc surface scratches. This would require a multivariate model with
one or all of these classes of pixels, e.g. blend BN5045. This method might yield
classification results as good as those based on Hotelling’s T² and normalised PC
scores.
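As this approach was not tested here, the following is only a sketch of how k-nearest neighbour classification of pixels might look when applied to normalised PC scores. The training scores, class labels and the choice of k = 3 are hypothetical and chosen for illustration; Euclidean distance in score space is also an assumption.

```python
from collections import Counter
from math import dist


def knn_classify(query, training, k=3):
    """Assign the majority class among the k training pixels nearest to
    the query pixel (Euclidean distance in PC score space)."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the class of the closest neighbour among the tied classes wins.
    return votes.most_common(1)[0][0]


# Hypothetical labelled pixels: (normalised scores on PCs 1 to 3, class).
training = [
    ((3.0, 0.1, 0.0), "crystalline"),
    ((2.8, -0.2, 0.1), "crystalline"),
    ((3.2, 0.0, -0.1), "crystalline"),
    ((0.0, 0.1, 3.0), "drug substance"),
    ((0.2, -0.1, 2.7), "drug substance"),
    ((-0.1, 0.0, 3.1), "drug substance"),
    ((-3.0, 0.0, 0.1), "scratch"),
    ((-2.9, 0.2, 0.0), "scratch"),
    ((-3.1, -0.1, -0.2), "scratch"),
]

print(knn_classify((2.9, 0.0, 0.05), training))  # prints "crystalline"
print(knn_classify((0.1, 0.0, 2.9), training))   # prints "drug substance"
```

In practice the labelled training pixels would have to come from a reference image in which the classes had already been assigned, for example by the T² and normalised score procedure.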
5.11 Conclusion
Multivariate image analysis of NIR multispectral images may be used to monitor the
parameters such as drug substance content, dense crystalline material content, blend
uniformity, particle size distribution and powder porosity. Monitoring of future blends
requires construction of a multivariate model. The blend used for this purpose should
CHAPTER 6
Conclusion
The results of this study clearly demonstrate the ability of near infrared spectroscopy
Each of the process stages requires the construction of a multivariate model of NIR
measurements of tablets (in Chapters 3 and 4, these raw data were transformed to
data. For each process stage, the multivariate model derived from either of these data
with reference analytical data. The initial implementation of this MSPC method for
data. However, as the multivariate models derived from NIR measurements and from
both NIR measurements and reference analytical data were equally effective in
The NIR method has several advantages over the reference analytical tests. These relate
to both the analysis and the analytical results obtained. The NIR analysis is considerably
faster and allows non-destructive analysis of the process material’s matrix. The
multivariate analysis of these measurements allows the process performance at all
that lead to lower quality product, are readily identified and diagnosed throughout the
process. Importantly, deviations in process performance which affect the blending stage
may be observed at this stage by the NIR method. These are not always identified with
the reference analyses. NIR imaging of these blends may be used for at-line quality
control of the blending stage and provides useful diagnostic information of blending
performance. This information may allow for improved control of blending to ensure
References
Advances in Near-Infrared Measurements, ed. Patonay, G., JAI Press Inc., Greenwich,
Connecticut, 1993.
Alt, F. B., in Encyclopedia o f Statistical Sciences, ed. Kotz, S., and Johnson, N. L., John
Andersson, M., Josefson, M., Langkilde, F. W., and Wahlund, K. G., J. Pharm. Biomed.
Anal, 1999,20,27.
Aucott, L. S., Garthwaite, P. H., and Buckland, S. T., Analyst, 1988,113, 1849.
Barnes, I. J., Dhanoa, M. S., and Lister, S. J., Applied Spectroscopy, 1989, 43, 772.
Barth, H. G., Sun, S., and Nickol, R. M., Anal Chem., 1987, 59, 142.
Beebe, K. R., Pell, R. J., and Seasholtz, M. B., Chemometrics: A Practical Guide, John
Bharati, M. H., and MacGregor, J. F., Ind. Eng. Chem. Res., 1998, 37, 4715.
Blanco, M., Coello, J., Iturriaga, H., Maspoch, S., and Pezuela, de la, C., Analyst, 1998,
123, 135R.
Bromba, M. U. A., and Ziegler, H., Anal Chem., 1981, 53, 1583.
Bromba, M. U. A., and Ziegler, H., Anal. Chem., 1983, 55, 1299.
XVII, A & B .
274
Candolfi, A., Maesschalck, de, R., Massart, D. L., Hailey, P. A., and Harrington, A. C.
Candolfi, A., Maesschalck, de, R., Jouan-Rimbaud, D., Hailey, P. A., and Massait, D.
Ciurczak, E., Torlini, P., and Demkowicz, P., Spectroscopy, 1986,1, 36.
Cowe, I. A., McNicol, J. W., and Clifford Cuthbertson, D., Analyst, 1989, 114, 683.
Dhanoa, M. S., Lister, S. J., Sanderson, R., and Barnes, R. J., J. Near Infrared
Dillon, W. R., and Goldstein, M., Multivariate Analysis Methods And Applications,
Dreassi, E., Ceramelli, G., Savini, L., Corti, P., Peruccio, P. L., and Lonardi, S., Analyst,
1995a, 120,319.
Dreassi, E., Ceramelli, G., and Corti, P., Analyst, 1995b, 120, 1005.
Dreassi, E., Ceramelli, G., Corti, P., Massacesi, M., and Perruccio, P. L., Analyst,
Dubois, P., Martinez, J., and Levillain, P., Analyst, 1987,112, 1675.
Esbensen, K. H., Geladi, P., and Grahn, H. P., Chemometrics and Intelligent Laboratory
Forbes, R. A., Persinger, M. L., and Smith, D. R., J. Pharm. Biomed. Anal., 1996,15,
315.
Frake, P., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U. A.,
Frake, P., Gill, I., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U.
275
Problem-Solving, CRC Press, Cleveland, Ohio, 1973.
Geladi, P., Isaksson, H., Lindqvist, L., Wold, S., and Esbensen, K., Chemometrics and
Geladi, P., Swerts, J., and Lindgren, F., Chemometrics and Intelligent Laboratory
Geladi, P., and Grahn, H., Multivariate Image Analysis, John Wiley & Sons Inc., New
York, 1996.
Hailey, P. A., Oakley, A. C. E., Doherty, P., Pettman, A. J., Sharp,D. C. A., and
Hailey, P. A., Doherty, P., Tapsell, P., Oliver, T., and Aldridge, P.K., J. Pharm.
Handbook o f Pharmaceutical Excipients, ed. Wade, A., and Weller, P. J., American
edn., 1994.
Harman, H., Modem Factor Analysis, The University of Chicago Press, Chicago, 3"^^
edn., 1976.
Dari, J. L., Martens, H., and Isaksson, T., Applied Spectroscopy, 1988, 42, 722.
Jackson, J. E., A User’s Guide To Principal Components, John Wiley & Sons Inc., New
York, 1991.
276
Kortum, G., Reflectance Spectroscopy, Principles, Methods, Applications, Springer-
Kourti, T., and MacGregor, J. F., Journal o f Quality Technology, 1996, 28, 409.
Kresta, J. V., MacGregor, J. P., and Marlin, T. E., The Canadian Journal o f Chemical
Lindberg, W., Persson, J. A., and Wold, S., Anal Chem., 1983, 55, 643.
Lowry, C. A., Woodhall, W. H., Champ, C. W., and Rigdon, S. E., Technometrics,
MacGregor, J. P., Jaeckle, C., Kiparissides, C., and Koutoudi, M., AIChE Journal,
Mark, H., Principles and Practice o f Spectroscopic Calibration, John Wiley & Sons
Maesschalck, de, R., Cuesta Sanchez, P., Massart, D. L., Doherty, P., and Hailey, P.,
Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y., and Kaufman, L.,
Montgomery, D. C., Introduction To Statistical Quality Control, John Wiley & Sons
Nomikos, P., and MacGregor, J. P., AIChE Journal, 1994, 40, 1361.
Norris, K. H., and Williams, P. C., Cereal Chemistry, 1984, 61, 158.
Osborne, B. G., Peam, T., Hindle, P. H., Practical NIR Spectroscopy With Applications
in Food and Beverage Analysis, Longman Scientific & Technical, UK, 2nd edn..
277
1993.
Piovoso, M. J., Kosanovich, K. A., and Pearson, R. K., Proc. Amer. Control Conf,
1992, 2359.
Plugge, W., and Vlies, van der, C., J. Pharm. Biomed. Anal., 1993,11, 435.
Plugge, W., and Vlies, van der, C , J. Pharm. Biomed Anal., 1996,14, 891.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical
Sekulic, S. S., Wakeman, J., Doherty, P., and Hailey, P. A., J. Pharm. Biomed. A nal,
1998,17,1285.
Skagerberg, B., MacGregor, J. P., and Kiparissides, C., Chemometrics and Intelligent
Stark, E., Luchter, K., and Margoshes, M., Applied Spectroscopy Reviews, 1986, 22,
335.
Statistical Process Control, ed. Mamzic, C. L., Instrument Society of America, NC,
USA, 1995.
Vlies, van der, C., Plugge, W., and Kaffka, K. J., Spectroscopy, 1995, 10, 46.
Wargo, D. J., and Drennen, J. K., J. Pharm. Biomed. Anal., 1996,14, 1415.
Washington, C., Particle Size Analysis in Pharmaceutics and Other Industries, Ellis
278
Wise, B. M., Veltkamp, D. J., Ricker, N. L., Kowalski, B. R., Bames, S. M., and
Wold, S., Geladi, P., Esbensen, K., and Ohman, J., Journal o f Chemometrics, 1987,1,
41.
279
APPENDIX A
List of Publications
O'Neil, A. J., Jee, R. D., Watt, R. A., and Moffat, A. C., J. Pharm. Pharmacol., 1997,
49 (Supplement): 19.
O'Neil, A. J., Jee, R. D., and Moffat, A. C., J. Pharm. Pharmacol., 1998a, 50
(Supplement): 45.
O'Neil, A. J., Jee, R. D., and Moffat, A. C., Analyst, 1998b, 123, 2297-2302.
O'Neil, A. J., Jee, R. D., and Moffat, A. C., Analyst, 1999a, 124, 33-36.
O'Neil, A. J., Jee, R. D., and Moffat, A. C., J. Pharm. Pharmacol., 1999b, 51
(Supplement): 47.
APPENDIX B
Table B1. PCA results: blends.
Data Set PCs PRESS SS %SS R χ² d.f.
Raw data 0 - 3.4034e-004 0 0 0 -
SNV 0 6.7573e-004 0 0 0
1 8.5795e-005 8.3710e-005 87.6119 0.1270 6.6698e+003 65
2 3.5162e-005 3.3626e-005 95.0238 0.4200 7.4692e+003 54
3 1.2058e-005 1.1431e-005 98.3083 0.3586 7.6622e+003 44
4 5.4395e-006 5.0691e-006 99.2498 0.4758 7.6271e+003 35
5 3.6739e-006 3.2611e-006 99.5174 0.7248 7.3151e+003 27
6 2.0019e-006 1.7481e-006 99.7413 0.6139 6.7325e+003 20
7 1.7173e-006 1.0332e-006 99.8471 0.9824 6.0815e+003 14
Detrend 0 3.6857e-006 0 0
1 7.2327e-007 7.0743e-007 82.2622 0.1962 9.1077e+003 119
2 3.8726e-007 3.6801e-007 90.3347 0.5474 1.0103e+004 104
3 1.2486e-007 1.1867e-007 97.0801 0.3393 1.0538e+004 90
4 7.4748e-008 6.9656e-008 98.4290 0.6299 1.0736e+004 77
5 4.4516e-008 4.0115e-008 99.2071 0.6391 1.0588e+004 65
6 2.7663e-008 2.2799e-008 99.6229 0.6896 1.0316e+004 54
7 2.3569e-008 1.5094e-008 99.7491 1.0338 9.9154e+003 44
8 PCs extracted after residual analysis, based on visual inspection of loadings and %SS
(= 99.993).
Table B2. PCA results: lower strength tablet RCA.
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 1.5141e-004 0 0 -
SNV 0 1.0928e-004 0 0
1 4.3295e-005 3.8458e-005 64.81 0.40 663.75 44
2 9.6814e-006 8.3578e-006 92.35 0.25 908.64 35
3 4.2721e-006 3.4629e-006 96.83 0.51 996.20 27
4 1.8986e-006 1.4246e-006 98.70 0.55 992.58 20
5 1.2809e-006 8.7922e-007 99.20 0.90 941.74 14
6 9.0963e-007 5.8307e-007 99.47 1.03 827.79 9
Detrend 0 2.2620e-006 0 0
1 9.2012e-007 8.0456e-007 64.43 0.41 491.17 35
2 2.1666e-007 1.8132e-007 91.98 0.27 700.40 27
3 9.6564e-008 7.8296e-008 96.54 0.53 774.70 20
4 3.7717e-008 2.7805e-008 98.77 0.48 765.73 14
5 2.9147e-008 1.8019e-008 99.20 1.05 717.45 9
6 1.6825e-008 1.0247e-008 99.55 0.93 596.79 5
7 1.0453e-008 5.9254e-009 99.74 1.02 454.07 2
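As a reading aid for the %SS and R columns, the SNV rows of Table B2 appear to be consistent with %SS = 100(1 - SS_k / SS_0) and R = PRESS_k / SS_(k-1), where SS_0 is the sum of squares with no PCs extracted. This relationship is inferred from the tabulated numbers and is stated here as an assumption; the sketch below simply checks it for the first three PCs.

```python
# SNV rows of Table B2: (number of PCs, PRESS, SS).
SS_0 = 1.0928e-4  # sum of squares with 0 PCs
ROWS = [
    (1, 4.3295e-5, 3.8458e-5),
    (2, 9.6814e-6, 8.3578e-6),
    (3, 4.2721e-6, 3.4629e-6),
]


def summarise(ss_0, rows):
    """Return (PCs, %SS, R) assuming %SS = 100 * (1 - SS_k / SS_0)
    and R = PRESS_k / SS_(k-1)."""
    results = []
    previous_ss = ss_0
    for pcs, press, ss in rows:
        percent_ss = 100.0 * (1.0 - ss / ss_0)
        r_ratio = press / previous_ss
        results.append((pcs, round(percent_ss, 2), round(r_ratio, 2)))
        previous_ss = ss
    return results


for pcs, percent_ss, r_ratio in summarise(SS_0, ROWS):
    print(pcs, percent_ss, r_ratio)
```

Under this reading, an R value near or above 1 indicates that an additional PC no longer reduces the prediction error.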
Table B3. PCA results: higher strength tablet RCA.
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 2.8673e-005 0 0 0 0
1 5.5138e-006 4.9724e-006 82.66 0.1923 1163.52 44
2 1.6202e-006 1.3475e-006 95.30 0.3258 1330.37 35
3 4.7295e-007 3.6299e-007 98.73 0.3510 1378.80 27
4 8.4391e-008 6.3968e-008 99.78 0.2325 1362.22 20
5 5.5398e-008 3.7337e-008 99.87 0.8660 1289.96 14
6 2.6475e-008 1.7377e-008 99.94 0.7091 1115.41 9
7 1.8597e-008 1.1116e-008 99.96 1.0702 919.66 5
SNV 0 0 1.1126e-004 0 0 0 0
1 1.1126e-004 3.1988e-005 71.25 0.34 553.93 35
2 3.7539e-005 8.2639e-006 92.57 0.31 723.56 27
3 9.9305e-006 2.9176e-006 97.38 0.45 783.28 20
4 3.7155e-006 1.0414e-006 99.06 0.48 775.59 14
5 1.4108e-006 5.9575e-007 99.46 0.84 721.16 9
6 8.7558e-007 3.6048e-007 99.68 0.97 607.10 5
7 5.7592e-007 2.7077e-007 99.76 1.35 458.54 2
Detrend 0 1.6017e-006 0 0 0 0
1 1.6017e-006 4.9061e-007 69.37 0.34 539.02 35
2 5.4476e-007 1.4136e-007 91.17 0.34 731.19 27
3 1.6899e-007 7.4979e-008 95.32 0.81 797.76 20
4 1.1408e-007 1.9035e-008 98.81 0.33 787.93 14
5 2.4661e-008 1.2179e-008 99.24 0.92 755.82 9
6 1.7555e-008 7.6559e-009 99.52 0.97 631.49 5
7 1.1757e-008 5.6942e-009 99.64 1.31 477.33 2
Table B4. PCA results: lower strength tablet Intact.
Data Set PCs PRESS SS %SS R χ² d.f.
Transmission 0 - 1.9309e-002 0 0 -
SNV 0 1.3275e-003 0 0
1 1.0250e-004 8.7701e-005 93.39 0.08 1369.42 44
2 2.1154e-005 1.6823e-005 98.73 0.24 1502.04 35
3 5.9883e-006 4.7138e-006 99.64 0.36 1508.00 27
4 2.7927e-006 2.0684e-006 99.84 0.59 1441.00 20
5 1.3794e-006 9.1907e-007 99.93 0.67 1311.17 14
6 7.6756e-007 4.6957e-007 99.96 0.84 1142.90 9
7 4.4474e-007 2.4801e-007 99.98 0.95 930.50 5
8 3.0003e-007 1.5154e-007 99.99 1.21 679.33 2
Detrend 0 1.2686e-004 0 0
1 3.1805e-005 2.8447e-005 77.58 0.25 938.43 44
2 9.8271e-006 7.8311e-006 93.83 0.35 1106.23 35
3 3.8546e-006 2.6945e-006 97.88 0.49 1162.27 27
4 6.6619e-007 5.0676e-007 99.60 0.25 1154.72 20
5 4.1237e-007 2.5811e-007 99.80 0.81 1106.38 14
6 1.7928e-007 1.1244e-007 99.91 0.69 972.24 9
7 9.7649e-008 5.3988e-008 99.96 0.87 809.59 5
8 5.6415e-008 2.8888e-008 99.98 1.04 601.79 2
Savitzky-Golay 2nd derivative 0 6.3958e-009 0 0
1 3.3879e-009 2.8325e-009 55.71 0.53 161.72 27
2 1.2977e-009 1.0147e-009 84.14 0.46 354.11 20
3 5.8567e-010 4.5514e-010 92.88 0.58 430.16 14
4 4.1493e-010 2.8737e-010 95.51 0.91 439.20 9
5 2.6904e-010 1.7286e-010 97.30 0.94 392.77 5
6 1.8724e-010 1.1325e-010 98.23 1.08 318.36 2
Table B5. PCA results: higher strength tablet Intact.
Data Set PCs PRESS SS %SS R χ² d.f.
Transmission 0 - 2.6399e-002 0 0 -
SNV 0 8.0087e-003 0 0
1 5.1457e-004 3.9485e-004 95.07 0.06 985.63 35
2 3.7113e-004 1.2975e-004 98.38 0.94 1099.51 27
3 1.7685e-004 5.0638e-005 99.37 1.36 1083.37 20
4 1.1255e-004 2.3430e-005 99.71 2.22 1017.20 14
5 2.8991e-005 8.0100e-006 99.90 1.24 906.17 9
Detrend 0 5.6523e-005 0 0
1 2.3931e-005 2.0798e-005 63.20 0.42 398.75 35
2 1.3644e-005 1.0400e-005 81.60 0.66 566.83 27
3 3.3333e-006 2.5707e-006 95.45 0.32 648.10 20
4 1.6904e-006 1.1839e-006 97.91 0.66 679.60 14
5 7.4021e-007 4.9321e-007 99.13 0.63 639.88 9
6 4.0631e-007 2.5389e-007 99.55 0.82 565.20 5
7 3.3351e-007 1.8709e-007 99.67 1.31 441.01 2
Savitzky-Golay 2nd derivative 0 2.5273e-007 0 0
1 8.6205e-009 6.4296e-009 97.46 0.03 507.65 14
2 6.0060e-009 4.0871e-009 98.38 0.93 601.63 9
3 4.3278e-009 2.7883e-009 98.90 1.06 512.83 5
Table B6. Multiway PCA results: lower strength blend (RCA) and tablet (RCA).
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 2.2114e-004 0 0 -
SNV 0 2.0617e-004 0 0
1 4.5335e-005 3.9636e-005 80.78 0.22 388.45 35
2 3.3000e-005 2.3310e-005 88.69 0.83 498.52 27
3 1.4935e-005 1.0887e-005 94.72 0.64 524.00 20
4 8.2068e-006 5.4248e-006 97.37 0.75 526.59 14
5 4.6967e-006 2.8593e-006 98.61 0.87 495.05 9
Detrend 0 1.9936e-006 0 0
1 8.2742e-007 7.2626e-007 63.57 0.42 226.74 35
2 5.1091e-007 3.9394e-007 80.24 0.70 384.64 27
3 2.0096e-007 1.5679e-007 92.14 0.51 457.02 20
4 1.6188e-007 1.0628e-007 94.67 1.03 483.95 14
Savitzky-Golay 2nd derivative 0 - 3.8694e-011 0 0 -
Table B7. Multiway PCA results: higher strength blend (RCA) and tablet (RCA).
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 1.8636e-004 0 0 -
SNV 0 2.2307e-004 0 0
1 5.8589e-005 5.0881e-005 77.19 0.26 340.44 35
2 3.5179e-005 2.7080e-005 87.86 0.69 461.98 27
3 2.2223e-005 1.5434e-005 93.08 0.82 496.35 20
4 1.1666e-005 7.6827e-006 96.56 0.76 496.53 14
5 6.8397e-006 4.0711e-006 98.17 0.89 473.09 9
6 3.6997e-006 2.1260e-006 99.05 0.91 416.39 5
Detrend 0 1.8364e-006 0 0
1 8.1432e-007 7.2469e-007 60.54 0.44 168.12 35
2 5.0930e-007 4.1116e-007 77.61 0.70 339.23 27
3 3.9002e-007 2.6049e-007 85.82 0.95 409.77 20
4 1.9347e-007 1.3238e-007 92.79 0.74 436.65 14
5 1.1370e-007 7.3184e-008 96.01 0.86 438.48 9
6 7.0525e-008 4.2038e-008 97.71 0.96 399.16 5
Savitzky-Golay 2nd derivative 0 - 3.0210e-011 0 0
1 2.0085e-011 1.7638e-011 41.62 0.66 -67.51 14
2 1.7120e-011 1.2664e-011 58.08 0.97 62.89 9
3 1.1656e-011 8.5891e-012 71.57 0.92 117.81 5
4 8.6818e-012 6.0954e-012 79.82 1.01 133.42 2
Table B8. Multiway PCA results: lower strength blend (RCA) and tablet (Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 6.0221e-003 0 0 -
SNV 0 5.3543e-004 0 0
1 2.2612e-004 2.0357e-004 61.98 0.42 440.32 35
2 4.8841e-005 4.0050e-005 92.52 0.24 655.60 27
3 2.9911e-005 2.1992e-005 95.89 0.75 728.76 20
4 1.4351e-005 9.8170e-006 98.17 0.65 709.27 14
5 7.1781e-006 4.9250e-006 99.08 0.73 663.90 9
6 4.7181e-006 3.0184e-006 99.44 0.96 574.13 5
7 3.7015e-006 2.1273e-006 99.60 1.23 436.52 2
Detrend 0 3.8312e-005 0 0
1 8.4304e-006 7.4780e-006 80.48 0.22 403.20 27
2 4.0596e-006 3.2836e-006 91.43 0.54 530.47 20
3 2.0885e-006 1.5127e-006 96.05 0.64 559.43 14
4 9.4316e-007 6.3848e-007 98.33 0.62 544.10 9
5 3.6735e-007 2.6052e-007 99.32 0.58 492.15 5
6 2.7006e-007 1.7042e-007 99.56 1.04 397.74 2
Savitzky-Golay 2nd derivative 0 - 1.9480e-009 0 0
1 9.5597e-010 7.7635e-010 60.15 0.49 104.79 20
2 4.2379e-010 3.2825e-010 83.15 0.55 238.79 14
3 1.8666e-010 1.4089e-010 92.77 0.57 290.09 9
4 1.2009e-010 8.6205e-011 95.57 0.85 292.33 5
5 8.5508e-011 5.6108e-011 97.12 0.99 243.16 2
Multiway PCA model: Blend absorbance data and tablet transmission data.
Table B9. Multiway PCA results: higher strength blend (RCA) and tablet (Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 5.3808e-003 0 0 -
SNV 0 2.0576e-003 0 0
Detrend 0 1.8813e-005 0 0
Savitzky-Golay 2nd derivative 0 - 7.0949e-008 0 0 - -
Multiway PCA model: blend absorbance data and tablet transmission data
Table B10. Multiway PCA results: lower strength blend (RCA) and tablet (RCA
and Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 3.5069e-003 0 0 -
SNV 0 3.6179e-004 0 0
Detrend 0 2.3182e-005 0 0
Savitzky-Golay 2nd derivative 0 1.1640e-009 0 0
Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B11. Multiway PCA results: higher strength blend (RCA) and tablet (RCA
and Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 5.1944e-003 0 0 -
SNV 0 1.4184e-003 0 0
Detrend 0 1.1324e-005 0 0
Savitzky-Golay 2nd derivative 0 4.1191e-008 0 0
Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B12. Blend PCA Q statistics.
Batch Q > Q99
29 9.9989e-004
50 3.8650e-005
76 2.8328e-005
125 4.1612e-005
126 2.7180e-005
141 3.6368e-005
160 2.8360e-005
162 4.0577e-005
167 4.3216e-005
168 5.4408e-005
169 6.9286e-005
171 3.4115e-005
173 4.1010e-005
174 2.9097e-005
180 4.0547e-005
193 3.0612e-005
1.4924e-003 29 8.6033e-002
50 2.6435e-003
125 2.4948e-003
141 2.7244e-003
162 2.5285e-003
167 3.1581e-003
168 3.5827e-003
169 4.8918e-003
171 2.4006e-003
173 2.3999e-003
180 2.6290e-003
193 2.2485e-003
29 9.8422e-004
91 2.0972e-005
125 3.9374e-005
160 3.1835e-005
162 3.9545e-005
163 2.3036e-005
167 2.8634e-005
168 3.5780e-005
169 4.6057e-005
170 2.1468e-005
171 4.6062e-005
173 4.0453e-005
174 2.8311e-005
180 2.7550e-005
193 3.0926e-005
Table B13. Q statistics for lower strength Tablet (RCA) PCA.
Data Set Q99 Q95 Batch Q > Q99
Absorbance 4.9744e-006 4.1779e-006 3 2.0823e-005
7 2.9661e-005
14 1.3612e-005
18 1.6689e-005
21 2.6919e-005
22 1.8446e-005
27 9.9993e-006
30 2.5717e-005
31 2.6398e-005
32 2.4187e-005
33 9.1554e-006
38 3.1898e-005
39 5.5400e-005
40 5.7075e-005
Table B15. Q statistics for lower strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 5.2093e-005 3.8329e-005 12 1.7977e-004
21 1.3308e-004
31 1.0500e-004
Table B16. Q statistics for higher strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 2.1780e-004 1.6979e-004 20 5.2657e-004
26 4.6967e-004
28 4.6604e-004
30 4.9237e-003
31 6.1234e-004
40 9.5833e-003
Table B17. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 3.9809e-004 3.0530e-004 33 1.1911e-003
35 7.0804e-004
derivative
Table B18. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 2.7978e-004 2.1245e-004 1 9.0982e-004
2 1.2517e-003
6 8.1050e-004
8 8.4201e-004
39 4.7903e-004
Table B19. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.5907e-003 1.1738e-003 28 2.7533e-003
Table B20. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4708e-003 1.1593e-003 21 5.2403e-003
27 7.3431e-003
28 1.3981e-002
32 2.5276e-003
38 6.3302e-002
39 8.1246e-003
41 2.2253e-003
Table B21. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 2.4705e-003 1.8427e-003 10 4.1491e-003
15 3.3556e-003
Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B22. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4852e-002 1.1309e-002 - -
Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B23. Blend PC score Shewhart plots results.
Data Set Batch PC Score value 99% Control limit
Raw 21 5 -3.4255e-002 -3.2134e-002
23 6 -1.4635e-002 -1.3590e-002
29 5 4.6468e-002 3.3265e-002
29 8 1.3889e-002 8.6397e-003
32 4 3.9906e-002 3.5481e-002
37 8 -9.5603e-003 -8.5421e-003
50 8 -9.2291e-003 -8.5421e-003
64 7 1.0903e-002 1.0159e-002
68 7 1.0328e-002 1.0159e-002
71 6 1.8794e-002 1.4232e-002
74 4 3.7637e-002 3.5481e-002
127 4 3.9395e-002 3.5481e-002
127 7 -1.0824e-002 -1.0002e-002
192 6 1.6389e-002 1.4232e-002
Table B24. Blend PC score Shewhart plot results: dispersion of sub-group centred
scores.
Data Set Batch PC 99% Control limits (+/-) Observations exceeding 99% limits*
Raw data g 1 +4.1520e-001, -4.1520e-001 5
21 1 +4.1520e-001, -4.1520e-001 3
48 1 +4.1520e-001, -4.1520e-001 3
49 1 +4.1520e-001, -4.1520e-001 5
49 +1.6328e-001, -1.6328e-001 4
53 1 +4.1520e-001, -4.1520e-001 8
53 +1.6328e-001, -1.6328e-001 3
146 1 +4.1520e-001, -4.1520e-001 5
189 1 +4.1520e-001, -4.1520e-001 4
189 +4.4435e-002, -4.4435e-002 6
192 +1.0889e-002, -1.0889e-002 4
* Restricted to a minimum of 3 observations (equivalent to one blend sample out of three per batch). Control limits are shown as positive and negative where observations fall
beyond upper and lower limits.
Table B25. PC score Shewhart plot results for lower strength Tablets (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 18 4 3.9412e-002 3.8330e-002
39 6 2.0824e-002 1.7222e-002
40 6 1.8152e-002 1.7222e-002
Table B26. PC score Shewhart plot results for higher strength Tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 3 1 4.0676e-001 3.4801e-001
SNV - - -
Table B27. PC score Shewhart plot results for lower strength Tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Transmission 2 4 -1.2771e-001 -1.2343e-001
6 8 1.6276e-002 1.5812e-002
Table B28. PC score Shewhart plot results for higher strength Tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Transmission 36 4 1.3984e-001 1.2727e-001
40 1 9.7051e+000 8.7008e+000
40 3 -3.3700e-001 -3.2879e-001
40 6 -1.8427e-001 -9.8051e-002
40 7 -3.0900e-002 -2.5547e-002
43 5 -6.4261e-002 -6.2210e-002
Table B29. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Raw* -
Savitzky-Golay 2nd derivative - - -
Table B30. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 21 7 7.5409e-002 7.3657e-002
38 1 9.4213e+000 8.3792e+000
38 4 ^ .6845e-001 -3.4343e-001
Table B31. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 10 7 -3.4132e-002 -3.1283e-002
15 6 4.6640e-002 4.4400e-002
25 5 6.7545e-002 6.7212e-002
SNV detrend - -
Table B32. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance - - -
Table B33. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA and Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 36 8 -7.3195e-002 -7.1348e-002
Savitzky-Golay 2nd derivative - - - -
Table B34. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA and Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 24 3 -1.1030e+000 -1.0395e+000
38 1 8.8263e+000 7.7480e+000
Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B35. MSPC of blend PCA models (absorbance data).
Control Implicated Anderson's normal Anderson's
Batches (Control Phase 1) PCs approximation normal
(99.99% limit)
42.42 11 1,5 15.56 14 235.00
47.86 21 5 15 1506.35
35.28 22 2, 5 ,6 20 19.26
34.99 23 2 ,5 ,6 23 20.26
37.61 24 5 ,6 33 29.40
30.78 28 1 ,3 ,4 37 703.16
58.44 29 5 43 83.02
26.05 31 1,2, 3 ,4 64 75.56
41.33 33 1 .3 ,4 67 26.54
25.93 49 1 109 177.71
33.91 53 1,5 115 191.46
121.12 83 1 144 44.90
180.46 84 1, 4, 5, 6 146 136.65
155.58 85 1.5 154 16.90
164.05 86 1.5 182 25.51
105.12 87 1
158.76 88 1.5
158.58 89 1.5
148.31 90 1.5
118.45 91 1,5
97.52 92 1
80.67 93 1
130.34 94 1.5
134.36 95 1,5
141.11 96 1,5
131.75 97 1.5
146.04 98 1,5
126.11 99 1,5
126.02 100 1,5
121.52 101 1.5
145.15 102 1,5
157.45 103 1.5
77.55 104 1
154.40 105 1.5
125.38 106 1,5
103.77 107 1.5
105.89 108 1
85.85 109 1.2
91.22 110 1
67.03 111 1
80.60 112 1
146.27 113 1.5
140.40 114 1,5
75.70 115 1,6
68.11 116 1
81.93 117 1
70.42 118 1
111.21 119 1
146.84 120 1
105.21 121 1
124.06 122 1
122.44 123 1
117.02 124 1
141.88 125 1,5
136.21 126 1.5
132.71 127 1
87.53 128 1 .5 ,6
96.66 129 1 ,6
81.93 139 1,5
127.70 140 1.5
82.16 141 1 ,4 ,5
85.95 142 1
86.88 143 1
76.25 144 1 ,4 ,5
111.79 145 1,5
115.15 149 1
94.50 151 1,4
120.22 152 1
146.32 153 1
142.07 154 1
Table B36. MSPC of blend PCA models (SNV absorbance data).
Control Implicated Anderson's normal Anderson's
Batches (Control Phase 1) PCs approximation normal
(99.99% limit) approximation
24.90 1 1,3 13.48 8 14.24
40.81 8 1 ,2 ,4 14 450.36
64.86 11 1 ,2 ,4 15 114.50
28.20 12 4. 1,2 16 57.91
46.04 14 1 .2 ,4 27 25.95
50.39 21 2, 4 ,3 37 621.11
31.48 22 4, 2 ,3 39 18.19
29.17 23 4 ,2 40 75.23
50.11 24 2 ,4 43 48.93
33.13 25 1 .2 ,4 44 13.68
19.94 26 1 .4 59 78.21
38.03 27 1 64 21.51
63.31 28 1 .3 ,2 67 23.34
53.98 29 2, 1 ,3 ,4 109 23.47
44.65 31 1.2, 3 ,4 144 194.38
74.33 33 1.2 146 273.34
29.53 40 4. 1.2
38.38 48 1 .2 ,4
46.14 49 1.2
49.63 53 1 .2 ,4
21.39 64 1.5
21.78 67 1
93.94 83 1 .2 ,4
142.89 84 1 .2 ,4
117.64 85 1 ,2 ,4
131.87 86 1 .2 ,4
76.66 87 1 .2 ,4
122.11 88 1 .2 ,4
117.46 89 1 ,2 ,4
119.29 90 1 .2 ,4
93.01 91 1 .2 ,4
73.97 92 1 .2 ,4
55.95 93 1 .2 ,4
102.14 94 1 ,2 ,4
112.52 95 1 ,2 ,4
118.24 96 1 .2 ,4
102.25 97 1 .2 ,4
111.25 98 1 .2 ,4
106.37 99 1 .2 ,4
106.17 100 1 .2 ,4
100.64 101 1 .2 ,4
110.94 102 1 .2 ,4
131.39 103 1 .2 ,4
56.92 104 1 .4 ,6
120.23 105 1 .2 ,4
106.86 106 1 .2 ,4
80.93 107 1 .2 ,4
79.12 108 1 ,2 ,4
66.23 109 1 .2 ,6
64.45 110 1 .6 ,4
49.54 111 1. 2
57.11 112 1. 2
116.17 113 1 .2 ,4
109.96 114 1 .2 ,4
61.92 115 1 .4 ,6
48.04 116 1
56.70 117 1.2
48.78 118 1
65.65 119 1 .2 ,6
95.97 120 2, 1
77.02 121 1 ,2 ,4
76.39 122 2, 1
77.15 123 1. 2
84.44 124 1 .2 ,4
105.84 125 1 .2 ,4
103.68 126 1 .2 .4
93.25 127 1.2, 6 ,4
50.49 128 1 .2 ,6
68.91 129 1 .2 ,4
56.70 139 1. 2
93.70 140 1.2
76.78 141 1 .4 ,5
54.77 142 1.2
59.85 143 1.2
62.12 144 1.2
90.93 145 1 .2 ,4
77.23 149 1.2
69.34 151 1.2
71.88 152 1.2
90.85 153 2. 1
87.36 154 2. 1
22.98 189 3 .2 ,1
Table B37. MSPC of blend PCA models (detrend absorbance data).
Data Set Control PCs T²(n, m-n, 99%) T² > T²(n, m-n, 99%) Batch Implicated Anderson's normal Batch Anderson's
Batches (Control Phase 1) PCs approximation normal
(99.99% limit) approximation
Detrend 139 7 20.51 28.98 21 1.3 14.5573 8 18.58
33.1572 29 6, 2 ,3 14 317.40
25.0336 83 1 16 25.31
49.3930 84 1,5 37 160.49
42.6574 85 1 ,2 43 100.85
44.8700 86 1 ,2 59 635.11
21.4141 87 1 64 39.21
43.1488 88 1 ,2 67 35.78
40.9772 89 1 ,2 109 152.77
35.9743 90 1 ,2 115 30.85
27.6983 91 1 ,2 142 21.73
37.7630 94 1 ,2 144 94.74
33.0183 95 1 146 30.59
36.1442 96 1,2 152 56.67
35.7966 97 1 ,2 186 26.61
32.9009 98 1
30.9418 99 1
31.0504 100 1
30.7297 101 1 ,2
36.6090 102 1 ,2
42.2416 103 1 ,2
37.5513 105 1 ,2
32.8472 106 1
20.5247 107 1
21.0809 108 1
39.0639 113 1 ,2 ,4
31.5319 114 1,4
23.3561 119 1
36.2788 120 1 ,2 ,3
24.6408 121 1,2
25.4348 122 1,3
23.3730 123 1
31.6397 124 1
38.3432 125 1,2
44.4650 126 1 ,2 ,7
29.2552 127 5, 1 ,6 ,3
32.7394 140 1,2
41.4892 141 1 ,5 ,7
23.5197 144 1
31.5817 145 1,2
31.7742 149 1,2
27.3713 151 1
27.2661 152 1
37.0055 153 1,3
36.5750 154 1,3
Table B38. MSPC of blend PCA models (SNV detrend absorbance data).
Data Set Control PCs T²(n, m-n, 99%) T² > T²(n, m-n, 99%) Batch Implicated Anderson's normal Batch Anderson’s
Batches (Control Phase 1) PCs approximation normal
(99.99% limit) approximation
SNV 123 7 20.8037 35.22 21 1,4 14.5573 8 33.37
Detrend 21.67 22 1 .4 14 273.01
25.61 24 1 ,4 15 273.05
77.88 29 1 ,2 , 6 ,7 16 267.84
22.46 33 1,3 26 84.36
87.34 83 1 ,4 27 128.08
137.62 84 1 ,3 ,4 28 34.95
117.26 85 1 ,2 ,4 33 36.02
127.73 86 1 37 164.11
70.19 87 1 39 66.28
116.19 88 1 43 163.99
112.38 89 1 59 125.17
115.21 90 1 64 127.47
91.36 91 1 67 25.53
71.36 92 1 144 677.97
52.47 93 1 146 213.66
99.20 94 1 ,3 ,4
107.55 95 1 ,4
112.74 96 1,4
97.29 97 1,4
105.36 98 1 ,3 ,4
102.24 99 1 ,2 ,4
100.29 100 1 ,2 ,4
94.78 101 1 ,4
108.45 102 1 ,2 ,4
125.86 103 1 ,3 ,4
51.75 104 1 ,6
118.36 105 1 ,3 ,4
102.00 106 1 ,3 ,4
74.80 107 1,4
75.22 108 1
65.06 109 1 ,2 ,6
62.31 110 1 ,6
50.20 111 1
54.45 112 1
113.31 113 1 ,2 ,4
106.53 114 1 ,2 ,4
57.20 115 1 ,6
42.74 116 1
53.47 117 1
44.33 118 1
62.26 119 1
93.72 120 1,3
73.30 121 1
74.31 122 1 ,2 ,3
75.38 123 1 ,2 ,3
81.15 124 1,3
106.71 125 1,3
101.65 126 1,3
98.10 127 1 ,2 ,6
51.59 128 1
66.20 129 1
53.47 139 1
92.73 140 1
79.06 141 1 ,4
55.30 142 1
59.02 143 1
63.77 144 1
90.65 145 1
74.82 149 1
66.93 151 1,3
72.84 152 1,3
89.43 153 1,3
86.73 _______ 154 1,3
Table B39. MSPC of blend PCA models (Savitzky-Golay 11 point 2nd derivative of
absorbance data).
Data Set   Control Batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Savitzky- 91 7 21.7160 25.6375 21 5 .6 14.5573 33 18.79
Golay 2nd 623.2553 29 5 .7 36 25.01
derivative 29.3948 32 2. 5 ,7 89 44.95
22.5586 48 5 90 29.68
32.6354 50 3 .5 .7 102 16.18
32.3917 63 1 .2 .3 106 84.83
114.5680 83 1 109 62.79
184.4618 84 1 125 5851.04
164.7183 85 1.2 126 3989.45
195.6259 86 1.2 144 5686.18
93.6012 87 1 .2 ,3 186 38.09
168.3523 1.2
170.9391 89 1.2
175.9201 90 1 .2 ,3
132.6630 91 1.2
87.0157 92 1
65.1144 93 1
133.6834 94
156.8426 95
173.9118 96
157.7016 97 1 .2 .3
151.5180 98 1 .2 .3
144.9607 99 1
145.0235 100 1
151.8975 101 1.2
155.1517 102 1.2
193.0451 103 1 .2 .3
57.4501 104 1
154.7630 105 1.2
145.7133 106 1.2
98.8913 107 1.2
99.4680 108 1
77.8768 109 1.3
49.1095 110 1 .2 .3
56.2726 111 1.3
65.3618 112 1.3
154.3364 113 1.2
139.8857 114 1
53.3653 115 1 .3 .7
51.0597 116 1 .3 .7
76.4228 117 1.2
49.5355 118 1 .2 .3
106.1942 119 1.5
166.6243 120 1.5
120.4525 121 1,2. 5 ,7
119.3954 122 1.5
133.2169 123 1.5
164.7556 124 1.5
156.3455 125 1.2
172.6288 126 1 .2 .5
70.4721 127 1 .3 .6
50.8225 128 1 .3 .6
83.0562 129 1.3
173.7180 136 5. 6 ,7
104.4856 137 2 .5
73.5134 138 2 .5
76.4228 139 1.2
138.2863 140 1 .2 .3
104.4436 141 1.3
58.2344 142 1.3
63.3440 143 1
93.0315 144 1
122.1246 145 1
129.4913 149 1
104.9995 151 1.3
155.6240 152 1.5
184.8494 153 1.5
171.9002 154 1.5
34.8586 156 2 .5
66.6923 157 5 .6
80.8661 161 5 .6
24.1009 162 3 .4
34.6592 170 2 .5
54.6877 171 3. 5 .6
62.9707 172 2 .5 .6
57.8558 173 2 .3
30.1715 174 3 .4
Table B40. MSPC of lower strength tablet PCA models (RCA).
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Absorbance 35 8 33.82 71.90 5 2, 1 15.56 4 76.78
47.12 6 2 9 689.51
35.06 8 1,8
36.82 10 1 ,7 ,6
48.93 16 1,8
Table B42. MSPC of lower strength tablet PCA models (Intact).
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Transmission 41 9 34.86 92.50 38 2. 1.3 16.51 1 -217.45
115.28 39 2. 1,3 6 120.46
135.78 40 2, 1,3 19 80.15
30 68.29
32 28.52
39 2519.17
Table B43. MSPC of higher strength tablet PCA models (Intact).
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Transmission 25 7 37.28 437.21 25 1 .5 ,3 14.56 22 95.19
261.36 26 1.5 40 15.61
237.65 27 1 .3 .5 41 455.95
239.99 28 1 .6
638.92 29 1 .6 ,5
2020.40 30 6. 1,2
289.32 31 1 .6
47.51 38 1
10423.23 40 6. 1 .3 .5
2196.21 41 1.6, 5 ,4
Table B44. MSPC of lower strength blend (RCA) and tablet (RCA) multiway PCA
models.
Control batches   PCs (Control Phase 1)   T² > T²(n, m−n, 99%)   Implicated PCs   Anderson's normal approximation (99.99% limit)   Anderson's normal approximation
14.56 42.87
247.51
Table B45. MSPC of higher strength blend (RCA) and tablet (RCA) multiway
PCA models.
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Absorbance 41 8 32.20 - 15.56 4 70.30
32 16.73
38 20.62
40 25.38
Table B46. MSPC of lower strength blend (RCA) and tablet (Intact) multiway
PCA models.
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Raw" 26 7 36.01 53.49 1 1 14.56 9 17.72
46.10 8 1 10 34.83
21 40.10
25 36.12
Table B48. MSPC of lower strength blend (RCA) and tablet (RCA and Intact)
multiway PCA models.
Data Set   Control batches   PCs (Control Phase 1)   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated PCs   Anderson's normal approximation (99.99% limit)   Batch   Anderson's normal approximation
Raw" 39 8 31.91 15.56
Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B50. Principal Factor Analysis results (normal varimax): raw material usage
data (kg) and blend PCA results (Q statistic and Anderson's asymptotic normal
approximation).
Data Factor Variable Batch Number Loading*
Reflectance 14 Magnesium stearate EX 004181 0.35
T² PC 5 - - 0.39
fPC6 - 0.54
Anderson's asymptotic normal approx. - 0.44
Table B51. Blend PCA loading correlations with excipient NIR spectra
(absorbance, DT absorbance and Savitzky-Golay 11 point 2nd derivative of
absorbance spectra).
NIR Spectral Data Raw Material PC Loadings Correlation, r
data used for Points
PCA
Reflectance^ 700 Dibasic calcium phosphate 2 -0.969
anhydrous
Magnesium stearate 2,1 -0.837, -0.694
Microcrystalline cellulose 2,1 -0.961, -0.859
Sodium starch glycolate 1 -0.829
anhydrous
Magnesium stearate 5,6 0.656, 0.624
Microcrystalline cellulose 3 0.861
Sodium starch glycolate 3,2* 0.685, -0.508
anhydrous
Magnesium stearate 5 0.800
Microcrystalline cellulose 2,3 0.746, -0.568
Sodium starch glycolate 2 0.757
APPENDIX C
Data Sets
Table C1. Cumulative percentage sum of squares (%SS) accounted for by lower
strength blend PLS models (n = 39 batch average observations) for X (NIR) and
Y (Certificate of Analysis data) blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 98.5278 11.2094
2 99.5886 12.5387
3 99.9250 15.4238
4 99.9655 18.8266
5 99.9723 33.4962
6 99.9891 36.8413
SNV detrend 0 0 0
1 63.0350 9.6071
2 73.4564 13.6424
3 83.1379 19.2952
4 92.3981 24.0503
5 96.1400 32.7977
6 97.3619 41.1443
Table C2. Cumulative percentage sum of squares (%SS) accounted for by lower
strength tablet (combined absorbance and transmission data sets) PLS models (n =
39 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 37.4909 16.6747
2 93.2834 22.2812
3 98.6305 36.0010
4 99.2602 46.0702
5 99.6718 51.7658
6 99.8546 52.6298
SNV detrend 0 0 0
1 24.1250 24.4640
2 53.4029 34.8386
3 67.3050 38.7702
4 82.9736 39.8916
5 94.3031 41.3013
6 96.5111 45.9564
Table C3. Cumulative percentage sum of squares (%SS) accounted for by higher
strength blend (absorbance and pre-treated absorbance data sets) PLS models (n =
41 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 97.0926 4.2075
2 99.4767 5.3678
3 99.7981 12.3122
4 99.9240 20.1134
5 99.9713 22.4132
6 99.9897 26.9295
SNV detrend 0 0 0
1 54.1217 6.4363
2 79.3772 15.4671
3 88.5172 18.5090
4 91.4556 23.9379
5 97.0076 25.7904
6 98.5628 30.6108
Table C4. Cumulative percentage sum of squares (%SS) accounted for by higher
strength tablet (combined absorbance and transmission and pre-treated
absorbance and transmittance data sets) PLS models (n = 41 batch average
observations) for X (NIR) and Y (Certificate of Analysis data) blocks and for
different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 60.6272 11.1973
2 88.9116 15.2110
3 95.9915 18.7508
4 98.2400 21.5196
5 99.3493 23.8571
6 99.6597 28.7177
SNV detrend 0 0 0
1 46.7215 9.4302
2 71.0492 12.7547
3 79.8127 19.2646
4 91.7310 22.1985
5 94.5895 26.8946
6 96.2716 32.4628
Table C5. Cumulative percentage sum of squares (%SS) explained by multiblock
PLS models for each subsection of the lower strength manufacturing process: X1
(blend: absorbance/pre-treated absorbance data) and X2 (tablet: combined
absorbance/pre-treated absorbance and transmission/pre-treated transmission
data) and certificate of analysis data, Y (n = 39 batch average observations) for
different NIR spectral data sets.
NIR data set PLS components X1 (blend, %SS) X2 (tablet, %SS) Y (Certificate of analysis data, %SS)
Raw* 0 0 0 0
1 98.5270 21.3553 13.2083
2 98.7977 43.9091 21.5690
3 99.8554 98.6277 23.1329
4 99.9655 99.1732 30.8496
5 99.9829 99.6724 34.7930
6 99.9901 99.7694 36.8361
SNV detrend 0 0 0 0
1 42.0676 14.0736 19.2026
2 75.2256 51.3683 23.9042
3 85.4275 68.6934 26.2144
4 88.4289 82.1438 30.7211
5 96.1761 91.1881 34.7918
6 97.1229 96.4851 36.3713
Savitzky-Golay 2nd 0 0 0 0
derivative 1 26.1042 18.8555 23.8181
2 44.8448 57.2383 27.2238
3 53.2462 63.7748 36.4666
4 60.8855 67.1647 45.4472
5 67.5435 75.4463 48.8964
6 72.0485 80.2112 53.0074
SNV detrend 0 0 0 0
1 49.0888 42.9342 8.4910
2 79.4721 67.4368 12.2611
3 83.4558 79.4804 19.0699
4 90.7284 89.2626 24.2839
5 96.5455 92.4355 28.7376
6 98.3923 95.7558 31.2060
Savitzky-Golay 2nd 0 0 0 0
derivative 1 30.7872 19.9184 15.1495
2 46.3398 40.3850 21.5796
3 59.4878 50.9751 24.6986
4 63.2746 57.4616 29.0838
5 68.6447 62.7568 34.7574
6 74.8242 67.8627 36.9821
* Raw data includes blend absorbance data (X1) and combined tablet absorbance and
transmittance data (X2).
Table C7. Partial least squares regression modelling (PLSR1 algorithm) of
individual certificate of analysis (C. of A.) variables of lower strength blends and
tablets with their near infrared blend absorbance/pre-treated absorbance and
tablet combined absorbance/transmission or pre-treated absorbance/transmission
data (n = 36 observations).
C. of A. variable modelled   PLS components   PRESS minimum (a)   Sum of squares
SNV detrend
(a) PRESS is the predicted residual error sum of squares between PLS model predicted
and certificate of analysis measured values after cross validation.
Table C9. Q statistics for singleblock PLS models of lower strength blend.
Q99 Q9! Q>Q^
1 0.1374
2 0.1612
3 0.2119
10 0.2066
19 0.2217
22 0.1681
23 0.1754
24 0.2106
25 0.1740
26 0.1313
30 0.1703
35 0.1076
39 0.1159
1 31.0732
2 58.4866
3 66.2464
9 112.4084
10 43.6827
25 63.3014
28 54.5769
31 27.4952
34 32.9563
35 26.6006
36 50.0634
39 23.5542
Table C10. Q statistics for singleblock PLS models of lower strength tablets (n = 39
batches).
Q99 Q95
1 9.9436
5 2.8456
6 2.9097
8 5.0869
9 1.7279
10 4.1349
11 3.9836
21 2.8280
32 2.3869
33 4.5181
36 4.7909
38.3869 5 84.7026
7 80.5027
10 112.7194
11 132.7731
21 57.7824
32 84.8975
36 131.0980
37 143.3551
Table C11. Q statistics for singleblock PLS models of higher strength blends (n =
41 batches).
Q„s__________________ Baich_________ Q> Qvt
18 0.3257
21 0.2699
33 0.1929
34 0.1732
39 0.2439
Table C12. Q statistics for singleblock PLS models of higher strength tablets (n =
41 batches).
3
-£>J2a-
10.8397
8 19.3108
12 4.2183
14 24.4520
15 14.7481
21 13.2031
23 9.2130
28 6.9264
29 5.0530
30 6.7363
31 4.8932
35 12.1203
38 12.5183
2 119.7522
30 104.9829
Table C14. Q statistic monitoring of multiblock PLS models of lower strength
tablets (n = 39 batches).
Data Set__________________________________ g » _______________________________ BatchQ> Q 99_________________
1 4.0697
4 1.2737
5 2.7040
6 4.0430
9 4.3076
10 2.1581
11 4.0817
21 2.2855
32 2.0614
33 3.7292
36 4.2138
5 141.7461
6 82.5636
7 74.1785
9 89.9190
11 284.0160
21 79.1252
36 213.1416
37 117.3819
8 23.4291
14 26.4575
18 57.6511
28 20.9336
29 38.7516
30 23.9276
38 55.3172
39 34.4153
Table C16. Q statistic monitoring of multiblock PLS models of higher strength
tablets (n = 41 batches).
Q>Q^
13 20.3496
14 42.0068
15 50.0628
28 34.1517
34 8.2974
36 13.3758
37 27.2729
38 27.9747
39 7.6078
41 9.4235
28 293.69
30 106.02
38 1507.04
39 119.99
Table C17. MSPC of single block PLS models of lower strength blends (n = 39
batches).
Data Set   Control Batches   PLS model rank   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated components   Anderson's normal approx. (99.99% limit)   Batch   Anderson's normal approx.
Raw data 39 6 24.1641 - - - 13.4774 17 231.4818
24 18.2005
27 17.6282
32 65.7944
Table C18. MSPC of single block PLS models of higher strength blends (n = 41
batches).
Data Set   Control Batches   PLS model rank   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated components   Anderson's normal approx. (99.99% limit)   Batch   Anderson's normal approx.
Raw data 36 6 24.9863 - - - 13.4774 9 55.2053
23 13.9840
40 224.0407
Table C20. MSPC of single block PLS models of higher strength tablets (n = 41
batches).
Data Set   Control Batches   PLS model rank   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated components   Anderson's normal approx. (99.99% limit)   Batch   Anderson's normal approx.
Raw 34 6 25.7327 37.6401 1 2 13.4774 2 20.6533
data 43.3983 9 3 ,2 4 107.2205
45.3390 13 2 ,3 39 20.1148
37.7641 19 3 ,2
31.3083 38 3, 5 ,6
Table C22. MSPC of multiblock PLS models of higher strength blends (n = 41
batches).
Data Set   Control Batches   PLS model rank   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated components   Anderson's normal approx. (99.99% limit)   Batch   Anderson's normal approx.
Raw 25 6 31.0476 98.8444 21 1 ,3 ,4 13.4774 9 33.8573
data 80.5131 26 1 .3 ,4 40 205.7495
117.2681 27 1 ,3 ,4
69.7281 30 1 ,3
66.5197 31 1
161.0069 32 1 ,4 ,3
68.6803 33 1
103.4204 34 1
69.5886 35 1 ,5
114.5315 36 1
108.2342 37 1
59.8452 39 1 ,3
80.2606 40 1 ,4
60.3487 41 1 ,3 ,5
Table C24. MSPC of multiblock PLS models of higher strength tablets (n = 41
batches).
Data Set   Control Batches   PLS model rank   T²(n, m−n, 99%)   T² > T²(n, m−n, 99%)   Batch   Implicated components   Anderson's normal approx. (99.99% limit)   Batch   Anderson's normal approx.
Raw 40 6 23.9074 - - - 13.4774 2 99.4951
data 4 25.4433
32 21.5128
Measurement of the cumulative particle size distribution of
microcrystalline cellulose using near infrared reflectance
spectroscopy
The cumulative particle size distribution of microcrystalline cellulose, a widely used pharmaceutical excipient, was
determined using near infrared (NIR) reflectance spectroscopy. Forward angle laser light scattering measurements
were used to provide reference particle size values corresponding to different quantiles and then used to calibrate
the NIR data. Two different chemometric methods, three wavelength multiple linear regression and principal
components regression (three components), were compared. For each method, calibration equations were produced
at each of eleven quantiles (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95%). NIR predicted cumulative frequency
particle-size distributions were calculated for each of the calibration samples (n = 34) and for an independent test
set (n = 23). The NIR procedure was able to predict those obtained via forward angle laser light scattering.
Each powdered sample exhibited an NIR reflectance spectrum with a curved baseline resulting from multiple
scattering (Fig. 1). Across the spectrum of each sample, the apparent offset appears to increase and this has
previously been attributed to variations in pathlength, which in turn is dependent on particle size and sample
porosity.
Principal components regression. This calibration method required the generation of a principal components
analysis (PCA) model. This consists of a set of new variables which are uncorrelated and represent linear
combinations of the original NIR reflectance data.
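The principal components regression step described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it assumes a NIPALS-style extraction of components from mean-centred spectra, and the function names `pcr_fit` and `pcr_predict` are hypothetical. Because the scores are mutually uncorrelated, each regression coefficient is obtained independently.

```python
def pcr_fit(X, y, n_components, tol=1e-18, max_iter=1000):
    """Principal components regression sketch (NIPALS PCA, then regression on scores).

    X: list of spectra (rows), y: reference values. All names are illustrative.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred residual matrix
    yc = [v - y_mean for v in y]
    loadings, coefs = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(p), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[j0] for r in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:
                break
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = sum(v * v for v in load) ** 0.5
            load = [v / norm for v in load]
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        tt = sum(v * v for v in t)
        if tt == 0.0:
            break  # no variance left to model
        # Scores are orthogonal, so y is regressed on each score separately.
        coefs.append(sum(a * b for a, b in zip(t, yc)) / tt)
        loadings.append(load)
        # Deflate: remove the modelled component from the residual matrix.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coefs": coefs}


def pcr_predict(model, x):
    """Predict y for one new spectrum x using the fitted components."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for load, b in zip(model["loadings"], model["coefs"]):
        t = sum(a * c for a, c in zip(e, load))  # score of x on this component
        y_hat += b * t
        e = [a - t * c for a, c in zip(e, load)]
    return y_hat
```

In the setting of the text, one such model would be fitted per quantile, with `y` holding the FALLS particle size at that quantile for each calibration sample.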
Model generation
The FALLS instrument gives values of the cumulative percentage frequency particle-size distribution at 64
particle sizes (range 564-5.8 µm) at intervals which follow a geometric progression. For each sample, linear
interpolation of the measured FALLS values was used to calculate the particle size values corresponding to the
5, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 95% quantiles. The samples exhibited a wide range of particle sizes at
each quantile (Table 1) and a wide variety of
Table 1 Particle size ranges at each quantile for the calibration and validation sets as determined by FALLS
Particle size/µm
               Calibration set (n = 34)        Validation set (n = 23)
Quantile (%)   Minimum   Median   Maximum     Minimum   Median   Maximum
5              6.45      25.72    216.52      7.21      23.06    167.11
10             9.92      37.14    268.91      11.44     32.34    187.67
20             14.48     52.92    311.96      18.05     45.40    219.33
30             18.39     67.10    345.62      22.55     56.36    251.13
40             21.40     81.41    376.44      26.27     67.35    283.66
50             23.99     96.59    406.07      2&82      78.98    319.67
60             26.47     112.82   436.21      33.71     91.57    359.67
70             29.25     131.29   466.55      38.30     105.95   402.81
80             33.11     154.78   497.51      44.94     124.03   451.54
90             40.62     197.11   529.54      57.16     152.54   504.66
95             48.47     240.07   546.76      70.34     184.74   533.21
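The linear interpolation described under Model generation can be illustrated with a short sketch. It assumes the FALLS output is a cumulative undersize percentage at each measured size, which the text does not state explicitly; the function name `quantile_sizes` and the example numbers are hypothetical.

```python
def quantile_sizes(sizes, cum_pct, quantiles):
    """Linearly interpolate particle sizes at the requested cumulative percentages.

    sizes:     measured particle sizes (e.g. in micrometres)
    cum_pct:   cumulative percentage frequency at each size (assumed undersize)
    quantiles: percentages at which a size is wanted, e.g. [5, 10, ..., 95]
    """
    pairs = sorted(zip(cum_pct, sizes))  # order the points by cumulative percentage
    result = []
    for q in quantiles:
        if not pairs[0][0] <= q <= pairs[-1][0]:
            raise ValueError("quantile %s lies outside the measured range" % q)
        for (c0, s0), (c1, s1) in zip(pairs, pairs[1:]):
            if c0 <= q <= c1:
                if c1 == c0:
                    result.append(float(s0))  # flat segment: no interpolation possible
                else:
                    result.append(s0 + (q - c0) * (s1 - s0) / (c1 - c0))
                break
    return result
```

For the quantiles used in the text the call would be `quantile_sizes(sizes, cum_pct, [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95])`, giving one reference value per quantile for each sample.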
Table 2 MLR wavelengths and PCs selected for each percentage quantile calibration
Percentage   MLR wavenumber/cm⁻¹    PCs
5            4008   9300   9528     28   22   17
10           4008   9300   9528     29   27   14
20           5640   5676   6216     29   27   14
30           4464   9852   9864     29   28   27
40           5736   9M2    9864     27   15   1
50           5736   9852   9864     20   15   1
60           5496   9852   9864     20   15   1
70           6024   6948   9168     15   14   1
80           5664   5796   9432     23   9    3
90           5952   6996   8280     28   18   6
95           7632   8532   8664     28   18   6
Fig. 1 NIR spectra of microcrystalline cellulose samples with different particle-size distributions and median
particle sizes, (a) 24, (b) 45.8, (c) 93.4, (d) 261 and (e) 406 µm. Horizontal axis: Wavenumber/cm⁻¹ (4500 to 9500).
Table 3 MLR and PCR calibration and validation results at various percentage quantiles
Percentage
Parameter^ 5 10 20 30 40 50 60 70 80 90 95
A number of powdered drugs and pharmaceutical excipients were used to demonstrate the ability of near-infrared
spectroscopy to measure median particle size (d50). Sieved fractions and bulk samples of aspirin, anhydrous
caffeine, paracetamol, lactose monohydrate and microcrystalline cellulose were particle sized by forward angle
laser light scattering (FALLS) and scanned by fibre-optic probe FT-NIR spectroscopy. Two-wavenumber multiple
linear regression (MLR) calibrations were produced using NIR reflectance, absorbance and Kubelka-Munk
function data with each of median particle size, reciprocal median particle size and the logarithm of median
particle size. Best calibrations were obtained using reflectance data versus the logarithm of median particle size
(NIR predicted ln d50 versus ln(FALLS d50) for microcrystalline cellulose and lactose monohydrate sieve fraction
calibrations: r = 0.99 in each case). Working calibrations for lactose monohydrate (median particle size range:
19.2-183 µm) and microcrystalline cellulose (median particle size range: 24-406 µm) were set up using
combinations of machine sieve-fractions and bulk samples. This approach was found to produce more robust
calibrations than just the use of sieved fractions. The method has been compared with single wavenumber
quadratic least squares regression using reflectance and mean-corrected reflectance data with median particle size.
Correlation between NIR predicted and FALLS values was significantly better using the MLR method.
The measurement of particle size for pharmaceutical materials is important because it influences bulk physical
properties, and determines the ability of powders to flow, mix, granulate and dissolve. It is also often a
requirement in pharmaceutical manufacturing processes that particle size measurements are performed on raw
materials. Commonly employed methods of measurement are forward angle laser light scattering (FALLS) and
electrical zone sensing. A disadvantage with these methods is that samples generally need to be analysed away
from the production area, which is time consuming and leads to manufacturing delays. These problems can be
effectively overcome by the use of near-infrared spectroscopy (NIRS). This technique has the advantages that in
addition to providing physical information about the raw material, such as its particle size, it can simultaneously
provide useful chemical information, and the analysis can be performed in a few minutes in the pharmaceutical
warehouse using fibre-optic probe instruments.
With diffusely reflecting materials, such as pharmaceutical powders, scattering of light in the NIR region (4000 to
10 000 cm⁻¹) produces spectra with non-uniform baselines and varying offsets. These scatter effects vary with the
particle size, sample porosity (and hence compaction pressure) and with the wavelength and can be described
using Rayleigh and Mie theory,21 or alternatively using the Kubelka-Munk theory of diffuse reflectance.21
Previous studies that have examined the effects of particle size on NIR spectra have demonstrated that reflectance
varies non-linearly with particle size. Ciurczak et al. found that reflectance exhibited an inverse relationship with
mean particle size in agreement with Mie theory. However, this relationship does not necessarily apply in all
cases and is dependent on the shape of the particle size distribution of the sample, the particle shape and the
material's refractive index. The presence of very small particles will further complicate the relationship as these
may exhibit Rayleigh scatter, which is proportional to the third power of the particle size.
The aim of this study has been to measure the number median particle size, d50, in drugs and pharmaceutical
excipients by NIR spectroscopy using relatively simple chemometrics. Single wavenumber quadratic least squares
regression of NIR and particle size data has been compared with a two-wavenumber multiple linear regression
(MLR) of the same data. The effect of different pre-treatments of NIR data and FALLS data on calibration has
also been investigated.
Experimental
Instrumentation
NIR measurements were made using a FT-NIR NIRVIS (No. 100.1, Buhler AG, Uzwil, Switzerland) spectrometer
fitted with a fibre-optic probe (No. 110.2). Spectra were recorded over the range 4008 to 9996 cm⁻¹ (500 data
points), each spectrum being the average of six scans. Particle size data were acquired by FALLS using a Malvern
2600C particle-sizer (Malvern Instruments Ltd, Malvern, UK). A Philips XL20 scanning electron microscope
(Philips Electron Optics, Eindhoven, Netherlands) was used to determine particle shapes. Sieve fractions were
produced using a machine sieve (Endecotts Ltd, London, UK) and an air-jet sieve (Alpine, Augsburg, Germany).
Materials
Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co, St Louis, MO, USA) and paracetamol
(Boots Pharmaceuticals, Nottingham, UK) were used. Microcrystalline cellulose: Avicel PH 101 (16 batches),
Avicel PH 102 (19
Sample preparation and presentation
Fig. 2 Single wavenumber quadratic least squares fit of NIR spectral data and median particle size, d50. A,
microcrystalline cellulose reflectance data; B, microcrystalline cellulose mean-corrected reflectance data; C,
lactose reflectance data, and D, lactose mean-corrected reflectance data. Horizontal axes: FALLS d50/µm.
Reflectance at any wavenumber versus d50 or 1/d50 exhibited a curvilinear relationship. To allow for this, two
different approaches were compared: single wavenumber quadratic least squares regression and full
two-wavenumber search MLR. With each of these calibration methods, different pretreatments of the NIR spectral
and FALLS d50 data were applied and their effects on standard errors of calibration and prediction (SEC and SEP
respectively), bias and linearity investigated.
Quadratic least squares fits of NIR spectral and FALLS d50 data were used to allow for gentle curvature in
calibrations. The NIR spectral data were diffuse reflectance (of infinite thickness for all practical purposes), R;
mean corrected reflectance (where the mean reflectance value of an individual spectrum is subtracted from the
reflectance at each of its spectral wavenumbers); absorbance, log (1/R) and Kubelka-Munk function, f(R):
$$f(R) = \frac{(1 - R)^2}{2R} = \frac{K}{S} \qquad (1)$$
where K and S are the Kubelka-Munk absorption and scatter coefficients respectively. A search of all 500
datapoints was used to select the wavenumber giving the smallest SEC for each NIR data pretreatment.
The second calibration technique applied was two wavenumber MLR. A search of all combinations of two
wavenumbers from the 500 measured by the spectrometer was carried out. The NIR spectral data used were again
reflectance, mean-corrected reflectance, absorbance and Kubelka-Munk function. FALLS data pretreatments
investigated were d50,
Table 1 Microcrystalline cellulose and lactose MLR calibration (sieve fraction data) and validation (bulk sample
data) results
Material               Microcrystalline cellulose         Lactose monohydrate
b0 (a)                 -3.99                              4.59
b1 (a)                 59.65                              -172.3
b2 (a)                 -57.99                             167.8
Wavenumber 1 (cm⁻¹)    8244                               6012
Wavenumber 2 (cm⁻¹)    5964                               5940
SEC [ln(d50/µm)]       0.067                              0.097
SEP [ln(d50/µm)]       0.17                               0.18
ln P = c + m ln(FALLS d50)
Calibration set (b)
r                      0.99                               0.99
m                      0.99                               0.98
c                      0.035                              0.068
n                      24 (PH101, PH102, PH200 sieved)    15 (Sieved and 110 mesh)
Validation set (b)
r                      0.84                               0.014
m                      0.96                               0.018
c                      0.17                               4.38
n                      33 (PH101 and PH102, bulk)         18 (Fast-flo samples)
(a) MLR coefficients: b0, intercept; b1, wavenumber 1, and b2, wavenumber 2.
(b) r is correlation coefficient; m and c are slope and intercept, respectively, of plots of NIR predicted ln d50 vs.
FALLS measured ln d50; n, number of samples in each data set.
Fig. 3 Feasibility study. Results of MLR calibration. NIR measured median particle size, d50, versus FALLS d50:
A, aspirin; B, anhydrous caffeine, and C, paracetamol. Horizontal axes: FALLS d50/µm.
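The spectral pretreatments named in the text (mean correction, absorbance as log(1/R), and the Kubelka-Munk function of equation (1)) are simple element-wise transforms of a reflectance spectrum. The sketch below shows one way to write them; the function names are hypothetical, base-10 logarithms are assumed for absorbance, and reflectance values are assumed to lie strictly between 0 and 1.

```python
import math


def mean_correct(spectrum):
    """Subtract the spectrum's own mean reflectance from every point."""
    mean = sum(spectrum) / len(spectrum)
    return [r - mean for r in spectrum]


def absorbance(spectrum):
    """Apparent absorbance, log10(1/R), at every point (assumes R > 0)."""
    return [math.log10(1.0 / r) for r in spectrum]


def kubelka_munk(spectrum):
    """Kubelka-Munk function f(R) = (1 - R)^2 / (2R) at every point (assumes R > 0)."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in spectrum]
```

Each function takes one spectrum (a list of reflectance values, one per wavenumber) and returns a list of the same length, so the pretreated spectra can be passed unchanged to whichever calibration routine follows.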
The quadratic least squares calibrations were performed using the microcrystalline cellulose and lactose
monohydrate data sets as these had the largest number of data.
Spectral characteristics
Scans of each powdered sample exhibited the characteristic overlapping combinations and overtones arising from
the fundamentals of the mid-infrared, with non-uniform baselines resulting from multiple scattering. Spectra also
showed different offset values, which appear to increase with wavenumber (Fig. 1). This has previously been
attributed to variation in pathlength,26 which is influenced by particle size and sample porosity.
Single wavenumber quadratic least squares calibration
Mean-correction of each spectrum was found to improve correlation between NIR predicted and FALLS d50 with
the microcrystalline cellulose data set [r = 0.98, 7128 cm⁻¹ (n = 57)]. This pre-treatment acts to centre the data of
individual spectra (mean = 0) and can help to eliminate baseline differences that occur as a result of variable
sample porosity and pressure applied with the fibre-optic probe. The variation in offset will also be influenced by
the flow properties of the material. Use of this pretreatment is likely to be appropriate in single wavenumber least
squares calibrations where the material exhibits variable compaction properties, such as with different grades of
microcrystalline cellulose, and also where the NIR measurements are recorded using a fibre-optic probe.
However, with lactose monohydrate (Reagent grade, 110 mesh and Fast-Flo), which tends to have good flow and
compaction properties,29 pretreatment was not appropriate and gave a poorer fit between NIR predicted and
FALLS d50 [r = 0.68, 7056 cm⁻¹ (n = 33)].
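A single wavenumber quadratic least squares search of the kind described here can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: d50 is modelled as a quadratic in the spectral value at one wavenumber, SEC is taken as the square root of the residual sum of squares divided by (n − 3), and the names `solve`, `quadratic_fit` and `best_single_wavenumber` are hypothetical.

```python
import math


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def quadratic_fit(x, y):
    """Least squares fit of y = a + b*x + c*x^2; returns (coefficients, SEC)."""
    cols = [[1.0] * len(x), list(x), [v * v for v in x]]
    normal = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    coef = solve(normal, rhs)
    fitted = [sum(c * col[i] for c, col in zip(coef, cols)) for i in range(len(y))]
    sse = sum((f - v) ** 2 for f, v in zip(fitted, y))
    sec = math.sqrt(sse / (len(y) - 3))  # assumed SEC definition: 3 fitted coefficients
    return coef, sec


def best_single_wavenumber(spectra, d50):
    """Try every spectral point and keep the one giving the smallest SEC."""
    best = None
    for j in range(len(spectra[0])):
        try:
            coef, sec = quadratic_fit([s[j] for s in spectra], d50)
        except ZeroDivisionError:
            continue  # degenerate column (e.g. constant values): skip it
        if best is None or sec < best[2]:
            best = (j, coef, sec)
    return best  # (point index, [a, b, c], SEC)
```

The same routine can be run on reflectance or on mean-corrected reflectance simply by pretreating `spectra` first, which is how the two sets of results quoted in the text would be compared.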
Fig. 5 Results of microcrystalline cellulose MLR calibration with randomised sieve fraction and bulk sample data.
NIR measured median particle size, d50, versus FALLS d50. A, Calibration set and B, validation set.
Fig. 6 Results of lactose monohydrate MLR calibration with randomised sieve fraction and bulk sample data. NIR
measured median particle size, d50, versus FALLS d50. A, Calibration set and B, validation set.
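The two-wavenumber MLR calibrations shown in these figures come from an exhaustive search over pairs of spectral points. A minimal sketch of such a search follows, assuming the model ln d50 = b0 + b1·R(ν1) + b2·R(ν2) and the same assumed SEC definition (residual sum of squares over n − 3); the names `fit_two_predictors` and `best_wavenumber_pair` are hypothetical.

```python
import math
from itertools import combinations


def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2; returns ((b0, b1, b2), SEC) or None."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        return None  # the two predictors are collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((b0 + b1 * a + b2 * b - v) ** 2 for a, b, v in zip(x1, x2, y))
    return (b0, b1, b2), math.sqrt(sse / (n - 3))


def best_wavenumber_pair(spectra, d50):
    """Search all pairs of spectral points; the response is ln(d50)."""
    y = [math.log(d) for d in d50]
    best = None
    for i, j in combinations(range(len(spectra[0])), 2):
        fit = fit_two_predictors([s[i] for s in spectra], [s[j] for s in spectra], y)
        if fit is not None and (best is None or fit[1] < best[3]):
            best = (i, j, fit[0], fit[1])
    return best  # (index 1, index 2, (b0, b1, b2), SEC)
```

With 500 spectral points the search covers 124 750 pairs, which is small enough to evaluate directly. A validation set would then be scored by applying the returned coefficients to unseen spectra and computing the error in ln d50.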