
Multivariate Statistical Quality Control of a

Pharmaceutical Manufacturing Process Using

Near Infrared Spectroscopy And Imaging

Microscopy

ANDREW JAMES O’NEIL

A thesis submitted in partial fulfilment of the requirements of The

University of London for the degree of Doctor of Philosophy in the Faculty

of Medicine

The School of Pharmacy,

University of London,

29/39 Brunswick Square,

London WC1N 1AX.

May 2000.
ProQuest Number: 10104305

All rights reserved

INFORMATION TO ALL USERS


The quality of this reproduction is dependent upon the quality of the copy submitted.

In the unlikely event that the author did not send a complete manuscript
and there are missing pages, these will be noted. Also, if material had to be removed,
a note will indicate the deletion.


Published by ProQuest LLC (2016). Copyright of the Dissertation is held by the Author.

All rights reserved.


This work is protected against unauthorized copying under Title 17, United States Code.
Microform Edition © ProQuest LLC.

ProQuest LLC
789 East Eisenhower Parkway
P.O. Box 1346
Ann Arbor, MI 48106-1346
ABSTRACT
Multivariate Statistical Quality Control of a Pharmaceutical
Manufacturing Process Using Near Infrared Spectroscopy
And Imaging Microscopy

The ability of near infrared (NIR) reflectance and transmittance spectroscopy and near
infrared imaging microscopy to enable multivariate statistical quality control of an
entire pharmaceutical tablet manufacturing process has been demonstrated.
Statistical quality control of process intermediates at each stage of the process (raw
materials dispensing, powder blending and tabletting of blend) required construction of
a multivariate model from a reference set of NIR spectroscopic measurements of
process intermediates. With blends and tablets, these measurements were collected from
a number of different batches where the process was known to have operated within
specification and within a state of statistical control. With the powdered raw materials,
measurements were made of pharmaceutical grade materials.
Using the multivariate models, it was shown possible to assess the quality of raw
materials by NIR spectroscopy and determine their suitability for use in manufacture.
Simultaneous determination of powdered pharmaceutical raw materials’ identities and
their accurate particle size distributions were obtainable from a single averaged NIR
spectrum.
The models developed from NIR measurements of blends and tablets enabled a level of
quality control at each of these process stages superior to that of the current reference
analytical laboratory measurements. Significant trends in process deviation could be
identified from an averaged NIR spectrum even at the blend stage, despite
within-specification reference laboratory data. These batches of blends ultimately produced
tablets of lower quality, with defects that included tablet friability, increased tablet
thickness and prolonged tablet dissolution time.
NIR microscopic imaging of these lower quality pharmaceutical blends was examined
to provide more detailed diagnostic information. The spatial locations and size of drug
substance particles could be readily identified and in some instances showed unmilled
drug substance particles. This demonstrated the potential of NIR imaging microscopy
for on-line or at-line quality control of the blending stage.
ACKNOWLEDGEMENTS

I would like to thank those who have helped me in my studies at The School of
Pharmacy. My supervisors, Prof. Tony Moffat and Dr. Roger Jee, deserve special
thanks. Throughout my research training, they provided invaluable comments and
guidance. I particularly enjoyed the useful discussions in the bar, and participation in
several international conferences.
I am also grateful to my industrial supervisor, Dr. Perry A. Hailey, Pfizer Central
Research, Pfizer Ltd., and to Pfizer Ltd. for funding my Ph.D. studentship.
Perry devised the concepts for my Ph.D. research and was an excellent source of advice
on matters of chemometrics and computing. He also arranged for me to be provided
with an extensive set of pharmaceutical process materials. John Wakeman, Pfizer
Central Research, Quality Operations, deserves many thanks for kindly selecting
appropriate batches of normal and unusual process materials.
I also thank the Mathworks Inc. for providing Matlab 5.2 software, FMC International
for supplying Avicel samples and Foss NIRSystems, Buhler AG and Spectral
Dimensions Inc. for use of near infrared instruments.
I greatly appreciate the support and encouragement of my parents, Steve and Angela,
over the years - especially during my Ph.D.
Contents

Title
Abstract
Acknowledgements
List of Abbreviations
List of Symbols

1. Introduction
1.1 Pharmaceutical Quality Control
1.2 Process Based Measurements
1.3 Aim
1.4 Principles of Near Infrared Spectroscopy
1.5 Diffuse Reflectance Spectroscopy
1.6 Near Infrared Spectral Data Pre-processing
1.7 Multivariate Analysis
1.8 Multivariate Statistical Process Control

2. Measurement of Powdered Pharmaceutical Material Particle Size by Near Infrared Spectroscopy
2.1 Introduction
2.2 Review of The Literature
2.3 Materials Used
2.4 Sample Preparation
2.5 Reference Particle Size Analysis
2.6 Near Infrared Reflectance Measurements
2.7 Data Analysis
2.8 Measurement of The Number Median Particle Size
2.9 Measurement of The Cumulative Percentage Frequency Particle Size Distribution
2.10 Measurement of The Percentage Frequency Particle Size Distribution
2.11 Classification of Excipient Grades by Cluster Analysis Methods
2.12 Conclusion

3. Multivariate Statistical Process Control of a Pharmaceutical Manufacturing Process Using Principal Components Analysis of Near Infrared Measurements
3.1 Introduction
3.2 Background and Overview of The Process
3.3 Materials
3.4 Reference Analytical Data
3.5 Near Infrared Measurements
3.6 Data Analysis And Pre-treatment
3.7 Multivariate Statistical Quality Control Methods
3.8 Multivariate Statistical Quality Control of Pharmaceutical Blends
3.9 Multivariate Statistical Quality Control of Pharmaceutical Tablets
3.10 Multivariate Statistical Quality Control of The Entire Process
3.11 Summary of Results
3.12 Conclusion

4. Multivariate Statistical Process Control of a Pharmaceutical Process Using Partial Least Squares Regression (PLSR) of Near Infrared And Reference Analysis Measurements
4.1 Introduction
4.2 Near Infrared And Reference Analysis Data Sets Used
4.3 Statistical Quality of Pharmaceutical Blends And Tablets by Singleblock PLSR
4.4 Statistical Quality Control of The Entire Process by Multiblock PLSR
4.5 Summary of Results
4.6 Conclusion

5. Multivariate Image Analysis of Near Infrared Multispectral Blend Images For Quality Control of Pharmaceutical Blending
5.1 Introduction
5.2 Materials Studied
5.3 Sample Preparation
5.4 Liquid Crystal Tuneable Filter InSb Focal Plane Array Near Infrared Imaging Microscopy
5.5 Multivariate Image Analysis
5.6 Image Cube Data Pre-treatments
5.7 Multiway Principal Components Analysis of Multispectral Images
5.8 Particle Size Analysis of Unmilled Crystalline Material and Drug Substance
5.9 Multivariate Monitoring of Blend Quality
5.10 Alternative Approaches to Monitoring Blend Quality
5.11 Conclusion

6. Conclusion

References

Appendix A. List of Publications

Appendix B. Tables for PCA And Multiway PCA Data Sets

Appendix C. Tables for Singleblock PLS And Multiblock PLS Data Sets

Appendix D. Data Sets for Chapter 2

(CD-ROM, inside back cover (files in ASCII tab delimited format))

Sections 2.8 and 2.9

avicelcal - Near infrared reflectance spectra of microcrystalline cellulose samples (calibration set)
avicelval - Near infrared reflectance spectra of microcrystalline cellulose samples (validation set)
avicelquantilescal - Particle size data for avicelcal (calibration set)
avicelquantilesval - Particle size data for avicelval (validation set)

Section 2.10

avicelmastersizer - Percentage frequency particle size data of microcrystalline cellulose samples for mastersizerspectra
highsize - Particle size intervals for avicelmastersizer
mastersizerspectra - Near infrared absorbance spectra of microcrystalline cellulose samples for avicelmastersizer

Section 2.11

knnlactosefastflo - Near infrared reflectance spectra of lactose Fastflo
knnlactoseregular - Near infrared reflectance spectra of lactose Regular
knnavicelph101 - Near infrared reflectance spectra of Avicel PH 101
knnavicelph101 - Near infrared reflectance spectra of Avicel PH 102

LIST OF ABBREVIATIONS

approx. approximation.

BN batch number.

BaF2 barium fluoride.

CD-ROM compact disc read only memory.

CL control limit.

C. of A. certificate of analysis.

CV coefficient of variation.

d. f. degrees of freedom.

DT quadratic baseline detrend.

FALLS forward angle laser light scattering.

Fig. figure.

FT-NIR Fourier-transform near infrared.

HPLC high performance liquid chromatography.

InSb indium antimonide solid state detector.

Intact near infrared tablet transmission module.

knn k nearest neighbour observations in pattern space to a target pattern

vector, calculated from the Euclidean distance.

LC-TF liquid-crystal tuneable filter.

Max. maximum.

MHz megahertz.

MPCA multiway principal components analysis.

MB PLS multiblock partial least squares.

MB PLSR multiblock partial least squares regression.

MEWMA multivariate exponentially weighted moving average.


MIA multivariate image analysis.

MLR multiple linear regression.

MSC multiplicative scatter correction.

MSPC multivariate statistical process control.

NIPALS non-linear iterative partial least squares.

NIR near infrared.

NIRS near infrared spectroscopy.

PCA principal components analysis.

PC principal component.

PCR principal components regression.

PLS partial least squares or projection to latent structures.

PLSR partial least squares regression.

PLSR1 non-linear iterative partial least squares regression algorithm 1.

PLSR2 non-linear iterative partial least squares regression algorithm 2.

PRESS predicted residual error sum of squares statistic.

PTFE polytetrafluoroethylene.

RAM random access memory.

RCA near infrared Rapid Content Analyser module for diffusely reflective materials (e.g. powders).

RMSEC root mean square error of calibration.

RMSEP root mean square error of prediction.

RSS residual sum of squares.

SEC standard error of calibration.

SEM scanning electron microscopy.

SEP standard error of prediction.

Sg2d11 Savitzky-Golay 11 point quadratic second derivative digital smoothing filter.

SIMCA soft independent modelling of class analogy.

SNV standard normal variate.

SNV-DT standard normal variate transformation followed by quadratic baseline

detrend.

SS residual sum of squares.

%SS cumulative percentage sum of squares.

UCL upper control limit.

UV ultraviolet.

LIST OF SYMBOLS

$a$ 1. index for principal component or partial least squares dimension; 2. confidence interval.

$a$ index for multiple linear regression wavelength or coefficient.

$a$ proportionality constant in apparent absorbance equation.

$A(1\%, 1\,\mathrm{cm})$ specific absorbance of a filtered, 1% solution of 1 cm pathlength.

$\beta$ predictive partial least squares model matrix of regression coefficients.

$\theta$ trace of residual covariance matrix.

$\lambda$ 1. wavelength; 2. multivariate exponentially weighted moving average weight coefficient.

$\lambda_a$ eigenvalue of $a$th principal component.

$\mu$ $10^{-6}$.

$\mu$ 1. reduced mass; 2. mean signal of a near infrared spectrum; 3. population mean vector.

$\sigma$ 1. standard deviation of a near infrared spectrum; 2. population standard deviation.

$\varepsilon$ absorption coefficient.

$\Sigma$ population variance-covariance matrix.

$|\Sigma|$ theoretical sample generalised variance.

$\chi^2$ chi-squared statistic.

$A$ 1. absorbance; 2. rank of multivariate model (e.g. principal components analysis or partial least squares); 3. Anderson’s asymptotic normal approximation.

$b_0$ multiple linear regression equation intercept.

$b_1$ multiple linear regression equation coefficient.

$b_2$ multiple linear regression equation coefficient.

$c$ 1. speed of light in a vacuum; 2. intercept of linear regression of near infrared predicted and reference analysis measurements; 3. molar concentration.

$D$ Mahalanobis distance.

$D_{pq}$ Euclidean distance between observation $p$ and $q$.

$d_{50}$ number median particle size.

$d_x$ number percentage quantile particle size.

$E$ residual matrix after multivariate modelling (e.g. principal components analysis or partial least squares).

$E_a$ residual intensity image matrix in multivariate image analysis.

$E1$ residual matrix after multiblock partial least squares modelling of $X1$ block.

$E2$ residual matrix after multiblock partial least squares modelling of $X2$ block.

$E_v$ potential energy of a vibrational energy level of quantum number, $v$.

$F$ residual matrix of unmodelled variance in reference analysis data, $Y$.

$F$ critical point of the $F$ distribution.

$f$ vibrational frequency of a diatomic molecule.

$f_e$ uniform spacing between vibrational energy levels.

$F(R_\infty)$ Kubelka-Munk function.

$g$ weight coefficient for weighted chi-squared distribution.

$h$ 1. higher order term in a vibrational energy level, $E_v$; 2. degrees of freedom of weighted chi-squared distribution.

$I_0$ intensity of incident near infrared radiation.

$I_{refl}$ intensity of reflected near infrared radiation.

$k$ 1. bonding force constant; 2. molar absorption coefficient in Kubelka-Munk theory of diffuse reflectance.

constant term in Kubelka-Munk function equation, equivalent to scatter coefficient, $s$, divided by the product of 2.303 times the absorption coefficient, $\varepsilon$.

$\log(1/R)$ absorbance of diffuse reflectance near infrared spectral data.

$\log(1/T)$ absorbance of transmission near infrared spectral data.

$m$ 1. slope of linear regression of near infrared predicted and reference analysis measurements; 2. number of observations; 3. sample mean of $Q$ statistic.

fraction of mass of a component in a material.

$n$ 1. refractive index in Fresnel equation of regular reflectance; 2. number of variables or principal components.

$\otimes$ Kronecker product.

$P$ 1. matrix of principal components analysis loadings; 2. matrix of partial least squares loading vectors for $X$ block.

$P1$ matrix of multiblock partial least squares loading vectors for $X1$ block.

$P2$ matrix of multiblock partial least squares loading vectors for $X2$ block.

$p$ 1. probability level; 2. number of variables.

$p_a$ partial least squares loading vector for matrix $X$ in one partial least squares dimension, $a$.

$q_a$ partial least squares loading vector for $Y$ matrix in partial least squares dimension, $a$.

$Q$ 1. matrix of loading vectors for $Y$ matrix; 2. squared prediction error matrix.

$Q_a$ residual distance to model intensity image matrix in multivariate image analysis.

$Q_a$ critical value for $Q$ statistic.

$Q_{95}$ 95% critical value for $Q$ statistic.

$Q_{99}$ 99% critical value for $Q$ statistic.

$R$ 1. reflectance; 2. multiple correlation coefficient; 3. principal components analysis cross-validation statistic.

$R'_\infty$ absolute diffuse reflectance from an opaque, diffusely reflecting material of infinite thickness.

$R_\infty$ relative diffuse reflectance from an opaque, diffusely reflecting material of infinite thickness.

$R_{reg}$ regular reflectance.

$r$ simple linear correlation coefficient.

$s$ scattering coefficient in Kubelka-Munk theory of diffuse reflectance.

$S$ sample variance-covariance matrix.

$|S|$ sample generalised variance of sample variance-covariance matrix, $S$.

$T$ transmission.

$T^2$ Hotelling’s $T^2$ statistic.

multivariate exponentially weighted moving average Hotelling’s $T^2$ statistic for observation $i$.

$T^2_{n,\,m-n,\,a}$ Hotelling’s $T^2$ control limit for principal components analysis or partial least squares model of dimension $n$, with $m$ control batches and $m-n$ d. f. and confidence interval $a$.

$T$ principal components analysis or partial least squares scores matrix for $X$ matrix.

$t_a$ partial least squares score vector for $X$ matrix in partial least squares dimension, $a$.

$tC_a$ consensus multiblock partial least squares score vector in partial least squares dimension, $a$.

$t1_a$ multiblock partial least squares score vector for $X1$ matrix in partial least squares dimension, $a$.

$t2_a$ multiblock partial least squares score vector for $X2$ matrix in partial least squares dimension, $a$.

$u_a$ partial least squares latent vector for $Y$ matrix.

$v$ 1. vibrational quantum number; 2. variance of $Q$ statistic sample.

$W$ matrix of partial least squares weight vectors for $X$ matrix.

$W$ principal components analysis cross-validation statistic.

$w_a$ partial least squares weight vector for $X$ matrix in dimension, $a$.

$X$ matrix of near infrared observations-by-spectral wavelength.

$X1$ multiblock partial least squares matrix of near infrared observations-by-spectral wavelength at first process stage.

$X2$ multiblock partial least squares matrix of near infrared observations-by-spectral wavelength at second process stage.

$x$ pattern space vector.

$x_e$ anharmonicity constant.

$Y$ matrix of reference analysis measurements-by-variable.

$y_{\lambda_j}$ near infrared spectral data at wavelength $\lambda_j$.

$Z_i$ multivariate exponentially weighted moving average vector for observation $i$.
CHAPTER 1

Introduction
1.1 Pharmaceutical Quality Control

Quality control of pharmaceutical manufacturing processes requires physical and

chemical characterisation of raw materials, process intermediates and finished products.

This is most often achieved through a series of tests which are performed in a laboratory

situated away from the production area. Analysis usually requires destruction of the

matrix of a representative sample of product, either for separation of the individual

components to facilitate their quantitative measurement or to determine physical

characteristics. For example, chemical tests may include tablet strength assay by high

performance liquid chromatography (HPLC) and total moisture content assay by Karl

Fischer titration. Physical tests may include tablet friability or particle size analysis.

These conventional tests measure the average amount of a component and its variance

within a batch but do not assess the distribution of these components within an

individual dosage unit or small sample of process material. With well controlled

processes, knowledge of the average quantity of a component and its variance within a

batch provides assurance that a batch of product will conform to its specification.

However there exists the potential for such a batch to produce a product of unexpectedly

low quality if the distribution of materials within dosage units is not uniform and if this

results in physicochemical instability. For example the spatial distribution of

components, such as moisture (which may exist as surface and free moisture and water

of crystallisation), drug substance (which may include polymorphic crystalline forms)

and excipients (for example disintegrant) may not be homogeneous within an individual

dosage unit or sample of blended powder. Non-uniform distribution of free moisture


within a dosage unit may result in instability of the drug substance which would alter its

biological efficacy. As conventional analytical tests examine representative samples of

product, this type of variation may go unnoticed.

An added disadvantage with remote laboratory analysis is the often lengthy time for

analysis at each process stage. Pharmaceutical products are therefore mostly produced

in batches, rather than through more cost effective continuous processes.

1.2 Process Based Measurements

The emergence of modern spectrometric technologies that provide rapid and

non-invasive sample measurements has generated considerable interest in their application to

measurement and control of pharmaceutical processes. Near infrared spectroscopy

(NIRS) is regarded as a viable alternative to the traditional pharmacopoeial analyses

owing to its speed of measurement and high signal to noise ratio and the ability to

perform both reflectance and transmittance measurements of intact dosage forms and

powders. The near infrared (NIR) spectrum of a sample matrix contains a wealth of

chemical and physical information. NIRS has

therefore received recognition as a highly valuable technology within the

pharmaceutical industry.

Rapid measurements of process performance may be taken in the production area in

real-time, rather than in a remote laboratory, by use of fibre-optic probe instruments.

These may be used throughout a manufacturing process: for identification and

qualification of excipients through to analysis of powder blending by incorporation of a

fibre-optic probe through the wall of a blender (Hailey, 1996; Maesschalck et al, 1998) and of

the tablets by reflectance and transmission spectroscopy (Andersson et al, 1999). The

advantage of process based measurements is that data are generated and analysed in

real-time thus providing the potential for efficient control of the process.
NIR measurements acquired throughout a manufacturing process are likely to contain

information that relates to the process performance history of a batch. Collectively these

measurements would form a ‘process fingerprint’. To date, NIRS pharmaceutical

process control applications have only focused on discrete parts of a process, such as

blending (Hailey et al, 1994, 1996). Complete statistical control of processes, from start

to finish, has however been demonstrated in chemical engineering processes. These

applications made multivariate statistical comparison of batch process data against a

reference set of data from past successful batches and produced excellent monitoring

results (Nomikos and MacGregor, 1994; MacGregor et al, 1994).

For similar monitoring to be successfully demonstrated with NIRS of pharmaceutical

processes, the results must compare favourably with those of existing conventional

analyses. Once demonstrated, such models could be used as part of an intelligent system

for future process control by NIRS. This ‘smart’ manufacturing system, envisaged by

Hailey (1996), is also referred to as parametric release. This is defined by the European

Organisation for Quality, EOQ 111/91 as:

“A system of release that gives assurance that the product is of the intended quality

based on information collected during the manufacturing process”.

Currently, the only European Pharmacopoeia approved method of parametric release is

for terminally sterilised sterile products (The Industrial Pharmacist, 1999, 10, 9).

However, the potential exists for applying these principles to other pharmaceutical

processes, such as a tabletting process, since the British Pharmacopoeia 1999 states

that: “...parametric release is, in appropriate circumstances, not precluded by the need

to comply with the pharmacopoeia”.

1.3 Aim

The aim of this investigation was to determine the suitability of NIR spectroscopy for

statistical process control of a pharmaceutical tablet manufacturing process, from raw

materials through to blends and tablets. The process studied was the manufacture of a

Pfizer Ltd. marketed tablet with which some manufacturing anomalies had been

experienced over a manufacturing period. Samples from the unusual batches and also

from a considerable number of normal batches were obtained from the Quality

Assurance laboratory for retrospective NIR analysis.

Though NIR spectra of solid materials contain considerable chemical and physical

information, the physical aspect, which is largely due to particle size, is often quoted as

being a hindrance to chemical analysis. However, particle size information is useful

because particle size plays a major role in determining process performance. Both chemical and

particle size analyses are required of all raw materials prior to manufacture and the

ability to determine both with a single, rapid NIR measurement would be invaluable.

The use of NIRS for chemical identification has been described (Candolfi et al, 1999a,

1999b); however, an NIR method of identification and qualification which includes

particle size analysis has not been described. The ability to both identify raw materials

and to determine their particle size distributions by NIRS was therefore examined

(Chapter 2).

NIRS has also been used for on-line and at-line control of other pharmaceutical process

stages. These include blending applications (Maesschalck et al 1998; Sekulic et al,

1998) and control of tablet film coating (Andersson et al, 1999). At present, the use of

NIRS for statistical process control of an entire pharmaceutical manufacturing process

has not been demonstrated. In this study, the ability of NIRS to allow statistical process

control of an entire pharmaceutical tabletting process was investigated. Multivariate

statistical models of NIR data of blends and tablets were generated and their ability to
enable discrimination of high quality batches from anomalous batches was determined.

Consistency between blend and tablet models was examined as this would indicate

whether NIR measurements form a process performance fingerprint. Multivariate

models were used since these reduced the dimensionality of the data from several

hundred correlated variables to a few orthogonal variables which described the

systematic variance in the data. These models were therefore easier to interpret by

multivariate statistical process control (MSPC), which is in keeping with the concept of

statistical process control (Statistical Process Control, ed. Mamzic, 1995). The

multivariate models examined were ‘model-free’ models derived solely from the NIR

data (Chapter 3) and models derived from both NIR measurements and certificate of

analysis (C. of A.) data (Chapter 4). Comparison of the results of these two model types

was made to determine the necessity of C. of A. data.

Some of the poor blends were also studied by NIR imaging microscopy

(Chapter 5). This technique was investigated in the hope that it would provide detailed,

spatially resolved chemical and physical information of the blends and thereby enable

diagnosis of reasons for poor blend performance. The NIR image data were subjected to

multivariate analysis to enable this to be achieved.

Overall conclusions of the suitability of NIRS for multivariate statistical process control

of this tablet manufacturing process were drawn after consideration of the pooled results

(Chapter 6).

1.4 Principles of Near Infrared Spectroscopy

The near infrared region of the electromagnetic spectrum was the first non-visible

portion of the electromagnetic spectrum identified, and was discovered by William

Herschel in 1800 (Stark et al, 1986). This region lies between the visible and mid

infrared regions of the electromagnetic spectrum and is defined by the American

Society for Testing and Materials’ (ASTM) Working Group on NIRS as the spectral

region spanning the wavelength range 780 to 2526 nm (Stark et al, 1986). In this region,

absorption of NIR radiation is due to overtones and combinations of fundamental mid

infrared vibration bands (Whitfield, 1986). These produce overlapping absorption

bands, hence visual identification of chemical groupings of a molecule from its NIR

spectrum is more difficult than from the ‘finger-print’ region of its mid infrared

spectrum. The overtone and combination bands are one to three orders of magnitude

weaker than the fundamental bands. This is advantageous for sampling of solids and

liquids, as NIR radiation tends to penetrate further into samples than mid infrared

radiation (Blanco et al., 1998).

For a material to absorb in the infrared region, the incident light must be of sufficiently

high energy to produce vibrational transitions in the molecules of the material. This

means that the frequency of the infrared light should match the fundamental vibration

frequency of a given molecule, and that a change in its dipole moment should occur due

to the fundamental vibration (Blanco et al, 1998).

The vibrational frequency, $f$, of a diatomic molecule may be assumed to follow the

harmonic oscillator model (Osborne et al, 1993), which obeys Hooke’s law as:

$$f = \frac{1}{2\pi c}\sqrt{\frac{k}{\mu}} \qquad (1.4.1)$$

where $c$ is the speed of light in a vacuum, $k$ is the bonding force constant and $\mu$ is the

reduced mass (Blanco et al, 1998).
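
As a rough numerical check of equation (1.4.1), the sketch below evaluates it for a C-H bond treated as an isolated diatomic oscillator. The force constant of 500 N m$^{-1}$, the atomic masses and the function names are illustrative assumptions, not values taken from this work. Because $c$ appears in the denominator, the result is a wavenumber (cm$^{-1}$) when $c$ is expressed in cm s$^{-1}$.

```python
import math

C_CM_PER_S = 2.99792458e10   # speed of light in a vacuum, cm/s
AMU_KG = 1.66053907e-27      # atomic mass unit, kg


def reduced_mass(m1_amu, m2_amu):
    """Reduced mass (kg) of two atoms whose masses are given in atomic mass units."""
    return m1_amu * m2_amu / (m1_amu + m2_amu) * AMU_KG


def harmonic_wavenumber(k_newton_per_m, mu_kg):
    """Equation (1.4.1): f = (1 / (2*pi*c)) * sqrt(k / mu), returned in cm^-1."""
    return math.sqrt(k_newton_per_m / mu_kg) / (2.0 * math.pi * C_CM_PER_S)


# Assumed, illustrative inputs: carbon-12 and hydrogen masses, k = 500 N/m
mu_ch = reduced_mass(12.000, 1.008)
nu_ch = harmonic_wavenumber(500.0, mu_ch)
print(f"Harmonic C-H stretch: {nu_ch:.0f} cm^-1")
```

With these assumed inputs the result is close to 3000 cm$^{-1}$, the mid infrared region in which C-H stretching fundamentals are typically observed.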

The variation of the potential energy with bond distance may be described by a parabola

centred about the equilibrium distance and has evenly spaced vibrational energy levels.

Each energy level, $E_v$, is given by:

$$E_v = f\left(v + \tfrac{1}{2}\right) \qquad (1.4.2)$$

where $f$ is the vibrational frequency and $v$ is the vibrational quantum number. The

selection rule for harmonic oscillator transitions is $\Delta v = \pm 1$, hence the energy difference

between two consecutive energy levels will always be $E_{v+1} - E_v = f$, which is the

‘fundamental frequency’ of the vibration band (Blanco et al, 1998).

In polyatomic molecules, vibrations tend to involve complex movements of their

constituent atoms and in practice, the vibrations tend to be non-harmonic. This is

because real bonds, though elastic, do not exactly obey Hooke’s law due to coulombic

repulsion between nuclei (Osborne et al, 1993). The result of this is that the potential

energy curve for real bonds is only approximately parabolic. Deviation from the

parabola is most pronounced at the upper energy levels, where spacing between energy

levels also decreases with energy level. The harmonic oscillator model may be

improved by adding higher order terms to equation (1.4.2). The energy, $E_v$, for each

energy level is therefore described by (Blanco et al, 1998):

$$E_v = f_e\left(v + \tfrac{1}{2}\right) - f_e x_e\left(v + \tfrac{1}{2}\right)^2 + h \qquad (1.4.3)$$

where $v$ is the vibrational quantum number, $x_e$ is the anharmonicity constant, $f_e$ is the

uniform spacing between levels corresponding to a parabola with its centre at the

equilibrium distance and the same curvature as the real potential energy function, and

$h$ is a higher order term. Neglecting higher order terms, the frequency of a transition

between adjacent energy levels ($v \to v + 1$) is dependent on the vibrational quantum

number (Blanco et al, 1998):

$$f = f_e\left[1 - 2x_e(v + 1)\right] \qquad (1.4.4)$$
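
Equation (1.4.4) follows from taking the difference between consecutive energy levels in equation (1.4.3) with the higher order term $h$ neglected. A short worked derivation, using only the symbols already defined, is:

```latex
\begin{align*}
E_{v+1} - E_v
  &= f_e\left[\left(v + \tfrac{3}{2}\right) - \left(v + \tfrac{1}{2}\right)\right]
   - f_e x_e\left[\left(v + \tfrac{3}{2}\right)^2 - \left(v + \tfrac{1}{2}\right)^2\right] \\
  &= f_e - f_e x_e\,(2v + 2) \\
  &= f_e\left[1 - 2x_e\,(v + 1)\right]
\end{align*}
```

Setting $x_e = 0$ recovers the constant spacing of the harmonic oscillator in equation (1.4.2).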

A consequence of introducing the quadratic term into Hooke’s law is that the selection

rule becomes $\Delta v = \pm 1, \pm 2, \ldots, \pm n$, where $n$ is an integer. As a result, other higher

frequencies, known as overtones or harmonics, appear in addition to the fundamental

band, at frequencies approximately n times greater than the fundamental frequencies.

The intensity of the overtones decays abruptly since transition probability falls rapidly

with increase in vibrational quantum number. In practice, just the first two to three

overtones are observable (Osborne et al, 1993). Polyatomic molecules may possess

several fundamental frequencies, and therefore will show simultaneous changes in the

energies of two or more vibrational modes. The frequency observed will either be the

sum of, or the difference between, fundamental frequencies. The result is very weak

bands which are known as ‘combination’ and ‘difference’ bands (Osborne et al, 1993).

Combination bands are unlikely to be observed in NIR spectra unless they arise from

two vibrations, linked either through a common atom or through several bonds;

difference bands arise from absorption of molecules which are in an excited state, and

have a very low probability of being observed in NIR spectra at room temperature

(Osborne et al, 1993). Anharmonicity produces combination bands which are slightly

smaller than the combined fundamental frequencies involved.
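
The effect of anharmonicity on overtone positions can be illustrated with a minimal sketch of equation (1.4.3), neglecting the higher order term h. The values of `f_e` and `x_e` below are hypothetical and chosen only for illustration; they are not taken from the cited references.

```python
def energy_level(v, f_e, x_e):
    """Energy of vibrational level v from equation (1.4.3), with h neglected."""
    return f_e * (v + 0.5) - f_e * x_e * (v + 0.5) ** 2


def transition(v_from, v_to, f_e, x_e):
    """Frequency of the transition between two vibrational levels."""
    return energy_level(v_to, f_e, x_e) - energy_level(v_from, f_e, x_e)


# Hypothetical oscillator: f_e in cm^-1, x_e dimensionless (assumed values).
f_e, x_e = 3000.0, 0.02
fundamental = transition(0, 1, f_e, x_e)      # v = 0 -> 1
first_overtone = transition(0, 2, f_e, x_e)   # v = 0 -> 2
second_overtone = transition(0, 3, f_e, x_e)  # v = 0 -> 3
```

With these assumed values the first overtone falls slightly below twice the fundamental frequency, which is the behaviour described above for overtone and combination bands.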

Most NIR bands arise due to overtones and combinations of various bonds to hydrogen,

e.g. C-H, N-H, O-H and S-H. These overtones and combinations of X-H bonds

are observed in the NIR region due to the small mass of hydrogen and the large force constants of

X-H bonds (Blanco et al, 1998). Other groups, such as C=O, C-C, C-F and C-Cl, exhibit

very weak overtone bands in the NIR region, which in practice may be difficult to

observe.

1.5 Diffuse Reflectance Spectroscopy

NIR light which penetrates a powder’s surface is scattered by particles many times

before emerging back through the surface. This is known as diffuse reflectance. In the

NIR region, solid materials tend to exhibit low molar absorptivity (typically 0.01 to 0.1

mol$^{-1}$ dm$^{3}$ cm$^{-1}$) (Blanco et al, 1998). This enables NIR diffuse reflectance

measurements to be made of solid materials.

1.5.1 Kubelka-Munk Theory of Diffuse Reflectance

The most widely accepted theory which describes diffuse reflectance and the

transparency of light-scattering and absorbing layers of solid materials is the Kubelka-

Munk theory (Frei and MacNeil, 1973). This theory was developed for infinitely thick

opaque layers, and may be written as:

$F(R'_\infty) = \dfrac{(1 - R'_\infty)^2}{2R'_\infty} = \dfrac{k}{s}$ (1.5.1)

where $R'_\infty$ is the absolute reflectance of the layer, k is its molar absorption coefficient,

and s is the scattering coefficient. Instead of determining $R'_\infty$, it is usual to work with

the more convenient relative diffuse reflectance, $R_\infty$, which is measured against a

standard typically made of either MgO, BaSO$_4$, polytetrafluoroethylene (PTFE) or

ceramic. In these cases, k is assumed to have a value of zero and the absolute reflectance

is assumed to be one. However, since the absolute reflectance of standards exhibiting

the highest $R'_\infty$ values never exceeds 0.98 to 0.99, the actual relationship becomes:

$\dfrac{R'_{\infty,\,\mathrm{sample}}}{R'_{\infty,\,\mathrm{standard}}} = R_\infty$ (1.5.2)

and it is essential to specify the standard used (in Section 2, the diffuse reflectance

standards used were Spectralon (PTFE) (Labsphere Inc., North Sutton, NH, USA) and a

ceramic tile; in Section 3, the diffuse reflectance standard used was a ceramic tile).

Equation (1.5.1) therefore approximates to:

$F(R_\infty) = \dfrac{(1 - R_\infty)^2}{2R_\infty} = \dfrac{k}{s}$ (1.5.3)

which shows that a linear relationship should be observed between $F(R_\infty)$ and the

absorption coefficient, k, provided that s remains constant. The scattering coefficient, s,

is rendered independent of wavelength by using particles whose size is large in relation

to the wavelength used.

Reflectance measurements of a sample, diluted with a non- or low-absorbing powder,

which are measured against the pure powder, have an absorption coefficient which may

be re-written as the product $2.303\,\varepsilon c$, where $\varepsilon$ is the molar absorptivity and c is the

molar concentration. The Kubelka-Munk equation (1.5.3) may then be written as:

$F(R_\infty) = \dfrac{(1 - R_\infty)^2}{2R_\infty} = \dfrac{c}{k'}$ (1.5.4)

where $k'$ is a constant equal to $s / (2.303\,\varepsilon)$. As $F(R_\infty)$ is proportional to the molar

concentration under constant conditions, the Kubelka-Munk relationship is analogous to

the Beer-Lambert law of absorption spectrophotometry. At sufficiently high

dilution, the regular reflection from the sample approximates that from the reflectance

standard and is therefore cancelled out in any comparison measurement.
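
The proportionality between the Kubelka-Munk function and concentration can be checked numerically. The following is a minimal sketch only: the values of `eps` and `s` are hypothetical, and `reflectance_from_ks` is an illustrative helper that inverts equation (1.5.3) to simulate reflectance values.

```python
import numpy as np


def kubelka_munk(R):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R), equation (1.5.3)."""
    R = np.asarray(R, dtype=float)
    return (1.0 - R) ** 2 / (2.0 * R)


def reflectance_from_ks(ks):
    """Invert F(R) = k/s for the root lying in (0, 1]."""
    ks = np.asarray(ks, dtype=float)
    return 1.0 + ks - np.sqrt(ks ** 2 + 2.0 * ks)


# Hypothetical molar absorptivity and scattering coefficient (assumed values).
eps, s = 0.05, 2.0
c = np.array([0.1, 0.5, 1.0, 2.0])          # molar concentrations
R = reflectance_from_ks(2.303 * eps * c / s)  # simulated relative reflectance
F = kubelka_munk(R)
slope = F / c                                # constant, equal to 1 / k'
```

Under these assumptions `slope` is constant and equal to 2.303·eps/s, i.e. 1/k' in equation (1.5.4).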

A linear relationship between $F(R_\infty)$ and c is only observed when dealing with weakly

absorbing substances (such as powdered pharmaceutical materials in the NIR region)

and only when the particle size used is relatively small (ideally around 1 µm in

diameter) (Kortum, 1969). In addition, any significant departure from the state of

infinite thickness of the adsorbent layer assumed in the derivation of equation (1.5.3)

results in background interference, which in turn causes non-ideal diffuse reflectance.

When either adsorbents with large particle size or large concentrations of the absorbing

species are used, plots of $F(R_\infty)$ versus c become markedly non-linear at higher

concentrations. Kortum (1969) attributes this reflectance phenomenon to a combination

of both regular and diffuse reflectance. Regular reflection is a mirror reflection, whereas

diffuse reflection occurs when impinging radiation is partly absorbed and partly

scattered by a material such that it is reflected in a diffuse manner, i.e. with no defined

angle of emergence. Regular reflectance is described by the Fresnel equation (Kortum,

1969):

$R_{\mathrm{reg}} = \dfrac{I_{\mathrm{refl}}}{I_0} = \dfrac{(n - 1)^2 + n^2 k^2}{(n + 1)^2 + n^2 k^2}$ (1.5.5)

where k is the absorption coefficient and n is the refractive index. Regular reflectance is

superimposed on diffuse reflectance and this is postulated to be the cause of deviation

from linearity between $F(R_\infty)$ and c at high concentrations of the absorbing species. It is

therefore recommended that interference caused by regular reflection, $R_{\mathrm{reg}}$, is eliminated

as much as possible. This may be achieved by using powders with small particle size

and by diluting the absorbing species with suitable diluents.

Other theories developed to describe diffuse reflectance have been shown to be special

cases or adaptations of the Kubelka-Munk theory (Kortum, 1969). A summarised theory

and derivation of the Kubelka-Munk function, which is applicable to opaque infinitely

thick powders (approximately 1 mm in thickness, or greater, for fine powders), has been

described (Kortum, 1969).

In practice, however, NIR spectroscopic data tend to be used as either raw data (relative

reflectance which is henceforth abbreviated to reflectance, R; relative transmission

which is henceforth abbreviated to transmission, T) or as apparent absorbance, A

(Osborne et al, 1993):

$A = \log_{10}(1/R) = a'c$ (1.5.6)

where c is concentration and $a'$ is a proportionality constant. Equation (1.5.6) applies to

diffuse reflectance measurements and similarly, with transmission measurements,

transformation to apparent absorbance, A, is given by:

$A = \log_{10}(1/T) = a'c$ (1.5.7)

Though equations (1.5.6) and (1.5.7) are not based on the Kubelka-Munk theory, highly

satisfactory results are often obtained with these transformations in NIR spectroscopic

applications.
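
The apparent absorbance transformation of equations (1.5.6) and (1.5.7) is a one-line operation. A minimal sketch, in which the function name and the example values are illustrative only:

```python
import numpy as np


def apparent_absorbance(signal):
    """Apparent absorbance log10(1/R) or log10(1/T) of a relative
    reflectance or transmission spectrum (values in the range 0 to 1)."""
    signal = np.asarray(signal, dtype=float)
    return np.log10(1.0 / signal)


# Illustrative reflectance values, not measured data.
A = apparent_absorbance([1.0, 0.1, 0.01])
```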

1.6 Near Infrared Spectral Data Pre-processing

The NIR spectral data obtained from diffuse reflectance and transmission measurements

of solid materials comprise chemical information, described in Section 1.5, and physical

information arising due to multiple scatter. This latter effect results in spectral offset and

curvature or ‘non-uniform’ baselines. For chemical constituent identification and

quantification, this physical aspect of the spectrum may not be considered useful.

Various mathematical transformations have been developed to eliminate the effect of

multiple scatter from the spectra. These are applied to individual NIR spectra prior to

quantitative or qualitative data analysis. This is frequently referred to as ‘data

pre-processing’. Commonly applied pre-processing transformations include: absorbance

(described in Section 1.3); standard normal variate (SNV) (Barnes et al, 1989);

multiplicative scatter correction (MSC) (Ilari et al, 1988); quadratic baseline detrend

(DT) (Barnes et al, 1989) and first and second derivatives (Advances in Near-Infrared

Measurements, ed. Patonay, 1993).

The mathematical details for transforming reflectance and transmission measurements

to apparent absorbance were described in Section 1.5, equations (1.5.6) and (1.5.7)

respectively.

SNV transformation of an NIR spectrum is applied via the equation:

$y_j = \dfrac{r_j - \mu}{\sigma}$ (1.6.1)

where $y_j$ and $r_j$ are the SNV transformed and original signals, respectively, at wavelength

$\lambda_j$, and $\mu$ and $\sigma$ are the mean signal of the original spectrum and the standard deviation of

the spectrum respectively, over all J wavelengths. The effect of SNV transformation on

NIR spectra of materials of identical chemical composition but different particle size is

removal of most spectral offset and considerable reduction in scatter-induced variation

between spectra of the same material.
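
A minimal sketch of the SNV transformation of equation (1.6.1) follows. It assumes spectra are stored row-wise in a NumPy array and that the sample standard deviation (J − 1 divisor) is intended; both are assumptions rather than details given in the text.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)
    by its own mean and standard deviation over all wavelengths."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # J - 1 divisor (assumed)
    return (spectra - mean) / std


# Two synthetic 'spectra' differing only by an offset and a scale factor.
base = np.sin(np.linspace(0.0, 3.0, 50))
snv_spectra = snv(np.vstack([base, 2.0 * base + 0.5]))
```

In this synthetic case the two transformed spectra coincide, illustrating the removal of additive offset and multiplicative scale differences.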
MSC is a scatter correction transformation which produces results similar to SNV

(Dhanoa et al, 1994). However, this method of scatter correction requires that the mean

spectral response of a data set be calculated. Individual NIR spectra are then linearly

regressed on the mean spectrum according to the following equation:

$y_j = a + b\,\bar{r}_j + e_j$ (1.6.2)

where $y_j$ is the original signal and $\bar{r}_j$ is the mean spectral response of the data set at

wavelength $\lambda_j$, j = 1, ..., J wavelengths, a is the offset of the regression equation, b is the

slope of the linear least squares regression, and $e_j$

is the residual signal at wavelength $\lambda_j$. The MSC transformed spectrum is obtained by

subtraction of the offset constant, a, from the spectral response at each wavelength,

followed by division by the slope term, b, at each wavelength.
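
The MSC steps described above can be sketched as follows. This is an illustrative implementation under the assumption that spectra are stored row-wise and that the reference is the mean spectrum of the supplied data set unless another reference is given; the optional `reference` argument is a convenience introduced here.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on the
    reference (mean) spectrum, then remove the offset and slope."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, y in enumerate(spectra):
        b, a = np.polyfit(ref, y, 1)   # slope b and offset a of y = a + b*ref
        corrected[i] = (y - a) / b
    return corrected


# Synthetic spectra that differ only by offset and slope (illustrative data).
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
raw = np.vstack([0.1 + 0.8 * base, -0.2 + 1.3 * base, 0.3 + 0.9 * base])
msc_spectra = msc(raw)
```

For these synthetic spectra every corrected spectrum collapses onto the mean spectrum, since the only variation present is the offset and slope that MSC removes.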

DT is another popular scatter correction method which is commonly applied to NIR

spectra. The transformation involves fitting the original spectrum to a second order

polynomial:

$y_j = a + b\lambda_j + c\lambda_j^2 + e_j$ (1.6.3)

where $y_j$ is the original spectral response at wavelength $\lambda_j$, j = 1, ..., J wavelengths, a

represents the spectral offset, b and c are the coefficients of the quadratic least squares

equation, and $e_j$ is the residual signal at wavelength $\lambda_j$. An estimate of the spectral

baseline is obtained from the first three terms on the right hand side of equation (1.6.3):

$\hat{y}_j = a + b\lambda_j + c\lambda_j^2$ (1.6.4)

The estimated baseline is subtracted from the original spectrum to produce the residual

or ‘detrend’ spectrum. This scatter correction removes both spectral offset and curvature

from the original spectra.
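
A minimal sketch of the quadratic detrend of equations (1.6.3) and (1.6.4) is given below, using a least squares polynomial fit. The wavelength range and the synthetic baseline and peak are assumptions made for illustration only.

```python
import numpy as np


def detrend(spectrum, wavelengths):
    """Fit a second order polynomial to the spectrum and subtract it.
    Returns the detrended (residual) spectrum and the fitted baseline."""
    spectrum = np.asarray(spectrum, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    coeffs = np.polyfit(wavelengths, spectrum, 2)   # c, b, a (highest power first)
    baseline = np.polyval(coeffs, wavelengths)
    return spectrum - baseline, baseline


# Synthetic example: a curved baseline with one Gaussian band (illustrative).
wl = np.linspace(1100.0, 2500.0, 200)
curved_baseline = 0.2 + 1e-4 * wl + 2e-7 * wl ** 2
band = np.exp(-0.5 * ((wl - 1900.0) / 20.0) ** 2)
flat_residual, _ = detrend(curved_baseline, wl)
dt_spectrum, fitted = detrend(curved_baseline + band, wl)
```

Note that the quadratic is fitted to the whole spectrum, so strong absorption bands influence the estimated baseline to some extent.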

The high signal to noise ratio of NIR spectral data enables derivative spectra to be

calculated by the difference method. The procedure to calculate difference derivative

spectra involves calculating the difference between spectral data points at evenly spaced

wavelengths. Transformation of an NIR spectrum to its first derivative may be

calculated according to:

where $y_j$ is the first derivative of the original signal at $\lambda_j$ and $r_j$ is the original signal at $\lambda_j$.

The number of data points used in the smoothing may be varied depending upon the

required amount of smoothing.

The second derivative difference spectrum may be calculated in a similar fashion to the

first derivative difference spectrum, according to:

$y''_j = (y_{j+1} - y_{j-1}) / 2$ (1.6.6)

Derivative spectra may also be calculated by applying a least squares digital polynomial

smoothing filter, such as a Savitzky-Golay smoothing filter (Bromba, M. and Ziegler,

H., 1981, 1983). The latter method of calculation of the derivative spectrum is better

able to preserve peak heights than the difference method. Calculation of the second

derivative spectrum largely eliminates spectral offset and baseline curvature which

result from multiple scatter. In addition, chemical peaks in the spectra are resolved. If

the original spectral data is reflectance, the second derivative peaks have positive

values; if the original spectral data is absorbance, the second derivative peaks have

negative values. The Savitzky-Golay method for calculating second derivative spectra was

used (quadratic, 11 data point: Chapters 2, 3 and 4; quadratic, 15 data point: Chapter 5).
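
A Savitzky-Golay second derivative can be sketched with NumPy alone by fitting a least squares polynomial to each moving window. This is an illustrative implementation, not the software used in this work; the function names, the `delta` wavelength spacing argument and the decision to discard the window edges (`mode="valid"`) are assumptions.

```python
import math

import numpy as np


def savgol_coefficients(window=11, polyorder=2, deriv=2):
    """Convolution weights giving the deriv-th derivative, at the window
    centre, of the least squares polynomial fitted to the window."""
    half = window // 2
    z = np.arange(-half, half + 1)
    V = np.vander(z, polyorder + 1, increasing=True)  # columns z^0, z^1, ...
    # Row `deriv` of the pseudo-inverse returns the fitted coefficient of z^deriv.
    return math.factorial(deriv) * np.linalg.pinv(V)[deriv]


def savgol_derivative(spectrum, window=11, polyorder=2, deriv=2, delta=1.0):
    """Apply the filter; the output is shorter by (window - 1) points."""
    c = savgol_coefficients(window, polyorder, deriv)
    # np.convolve reverses its kernel, so reverse it first to correlate.
    return np.convolve(spectrum, c[::-1], mode="valid") / delta ** deriv


# Check on a quadratic (second derivative 6) and on a Gaussian 'absorbance' band.
x = np.arange(50) * 0.5
d2_quadratic = savgol_derivative(3 * x ** 2 + 2 * x + 1, 11, 2, 2, delta=0.5)
band = np.exp(-0.5 * ((np.arange(50) - 25) / 3.0) ** 2)
d2_band = savgol_derivative(band)
```

The band centre (index 25 in the input) maps to index 20 of the output, where the second derivative is negative, in line with the sign convention stated above for absorbance data.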

1.7 Multivariate Analysis

Multivariate analysis methods are commonly employed in NIR spectrometry. Typical

techniques used include principal components analysis (PCA) and partial least squares

regression (PLSR). These methods are used to reduce the dimensionality of the often

highly collinear spectral data, from several hundred variables (wavelengths), to a set of

new variables in reduced dimensional space. PCA is a useful technique that allows

exploratory data analysis and qualitative analysis of a spectral data set. PLSR is a biased

regression method that produces a set of new latent variables that maximise the

covariance between the spectral data and a set of reference data. With both of these

latent variable techniques, the various pre-processing transformations, described in

Section 1.6, may be applied to the spectral data, and their effect on the qualitative and

quantitative models produced determined.

1.7.1 Principal Components Analysis

As NIR spectra are highly collinear, it is often advantageous to transform them to their

principal components via PCA. This is a multivariate data reduction method, which

produces linear combinations of the original variables. These may be thought of as a set

of new variables that have the property of being uncorrelated (Jackson, 1991). The PCA

transformation is a two step transformation (Kirsch and Drennen, 1995). In the first

step, the cartesian co-ordinate system defined in multidimensional space, is translated to

the centre of a spectral cluster. This is achieved through variable-wise mean centring

of the spectral data. The next step is the rotation of the cartesian co-ordinate system to

describe, as nearly as possible, all the variations present in the spectral cluster. The

co-ordinate system remains rectangular throughout the process and is moved rigidly from

one position to the next. This rotation step decomposes the spectral variation into

orthogonal (independent) components.

In the process of calculating each principal axis, the perpendicular distances between

the spectral data points and each axis are minimised. Hence, the first principal

component (PC) describes the largest amount of spectral variation. This is then

effectively subtracted from the data cluster. The second principal component is defined

in a similar manner to the first, except that during the rotation to the second principal

axis, the second axis remains perpendicular to the first. This orthogonal condition forces

each component to account for the maximum spectral variation remaining in the cluster.

Typically, only a small number of these iterations are required before additional

components account for random noise. These components are therefore usually ignored.

The transformation process ultimately produces spectral co-ordinates that are expressed

in a reduced dimensional space and effectively overcomes the problems of collinearity

and noise. The transformation to principal axes simplifies the selection of variables for

regression in quantitative analysis (variables may be added to or deleted from regression

equations without changing the coefficients of the remaining variables).

Mathematically, PCA decomposition of the variable-wise mean centred spectral data, X,

produces a score matrix, T, a loadings matrix, P, and a residuals matrix, E:

$X = TP^{\mathsf{T}} + E$ (1.7.1)

The loadings matrix comprises a loadings vector for each extracted PC. Each of these is

a set of weights that identifies the variables (wavelengths) which contribute most to

that PC. As a result, it may be possible to assign physical or chemical interpretation to a

PC’s loadings vector. For example, the loadings may represent chemical peaks. The PC

score vector of each spectral observation is produced by the product of the original

spectrum with the loadings vector, and therefore will have score values for each PC

dimension that are related to the magnitude of each of the principal components in the

original spectra.

The PC score value may therefore reflect the amount of a chemical or

physical constituent present in the sample. PC scores are therefore useful in cluster

analysis, principal components regression and statistical process control techniques.
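
The two-step PCA transformation (mean centring followed by rotation) can be sketched via the singular value decomposition. This is one of several equivalent ways of computing the decomposition and is an assumption made here for brevity; the random data are purely illustrative.

```python
import numpy as np


def pca(X, n_components):
    """Mean centre X (observations x wavelengths) and decompose it into
    scores T, loadings P and residuals E for the requested number of PCs."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                        # step 1: translate to the cluster centre
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T              # step 2: rotated axes (loadings)
    T = Xc @ P                           # scores
    E = Xc - T @ P.T                     # residuals
    return mean, T, P, E


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 6)) @ rng.normal(size=(6, 6))
mean_demo, T_demo, P_demo, E_demo = pca(X_demo, 2)
score_cov = T_demo.T @ T_demo
```

The loadings returned are orthonormal and the score vectors are mutually uncorrelated, which is the property exploited in the sections that follow.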

1.7.2 PCA Model Rank Determination

The number of principal components extracted from the spectral data may be selected

according to a number of criteria. One simple method for estimating the number of

useful components, the model rank (i.e. those that account for systematic variance),

involves visual inspection of the loading vectors. PCs with loadings which appear to

represent noise, may be discarded. Another popular method is to calculate the

percentage sum of squares accounted for by the model. This involves calculation of the

sum of squares (%SS) of the original variable-wise mean centred data, X, and of the

residual matrix, E, after n PCs have been extracted, over all observations, m, and

variables, n:

$\%SS = 100 \times \left(1 - \dfrac{\sum_{i=1}^{m}\sum_{j=1}^{n} e_{ij}^2}{\sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2}\right)$ (1.7.2)

Where this method is used to determine the number of PCs to retain, the process of

extracting PCs is terminated after an arbitrary %SS has been reached. Typically this
may be 95%SS or 99%SS.
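
A minimal sketch of the %SS criterion follows. It assumes that the percentage sum of squares accounted for is 100 × (1 − SS(E)/SS(X)), which for PCA equals the cumulative sum of squared singular values; the helper names and synthetic data are illustrative.

```python
import numpy as np


def percent_ss(X, max_pcs):
    """Cumulative percentage sum of squares explained by 1..max_pcs PCs."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return 100.0 * np.cumsum(s[:max_pcs] ** 2) / np.sum(s ** 2)


def rank_for_threshold(pss, threshold=95.0):
    """Smallest number of PCs reaching the threshold (assumes it is reached)."""
    return int(np.argmax(pss >= threshold)) + 1


# Synthetic data with two systematic components plus a little noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
X_demo = X_demo + 1e-3 * rng.normal(size=(30, 10))
pss = percent_ss(X_demo, 5)
rank = rank_for_threshold(pss, 99.0)
```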

Cross-validation is another popular method for determining the rank of a PC model. The

process involves randomly dividing a data set into a number of sub-groups. A PCA

model is calculated for the data set after removal of one of the sub-groups. The

withheld sub-group is then projected onto the eigenvectors of the model, according to:

$T = XP$ (1.7.3)

The resultant scores of this sub-group are then used to estimate the original data of the

sub-group according to:

$\hat{X} = TP^{\mathsf{T}}$ (1.7.4)

This procedure is repeated for all subgroups, and for n PCs. The predicted residual error

matrix for all observations in a sub-group is calculated according to:

$E = X - \hat{X}$ (1.7.5)

The predicted residual error sum of squares (PRESS) for n PCs and all sub-groups is the

sum of the squared residual error matrix, E, in equation (1.7.5), over all observations, m

and variables, n, divided by the number of elements in the original spectral data set. For

zero PCs extracted, the value of PRESS used is the sum of squares of the original

spectral data, X, divided by the number of elements. The rank of the PCA model is the

number of PCs which provide the smallest PRESS value (Jackson, 1991).
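
The cross-validation procedure described above can be sketched as follows. The number of sub-groups, the random seed and the use of the training-set mean for centring are assumptions introduced for this illustration.

```python
import numpy as np


def pca_press(X, max_pcs, n_groups=5, seed=0):
    """PRESS for 0..max_pcs PCs by leaving out random sub-groups of rows."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(X)), n_groups)
    press = np.zeros(max_pcs + 1)
    for held_out in groups:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        mean = X[train].mean(axis=0)
        X_train, X_test = X[train] - mean, X[held_out] - mean
        _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
        press[0] += np.sum(X_test ** 2)            # zero PCs extracted
        for n in range(1, max_pcs + 1):
            P = Vt[:n].T
            T = X_test @ P                         # equation (1.7.3)
            E = X_test - T @ P.T                   # equations (1.7.4), (1.7.5)
            press[n] += np.sum(E ** 2)
    return press / X.size


rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
X_demo = X_demo + 1e-2 * rng.normal(size=(30, 10))
press = pca_press(X_demo, 4)
```

With this simple row-wise projection, PRESS cannot increase as further PCs are added, so in this sketch it is the point at which PRESS levels off, rather than a strict minimum, that indicates the rank. The R and W statistics are designed to locate that point.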

Another statistic used to determine the rank of a PCA model is the R statistic (Lindberg

et al, 1983; Wold, 1978). This involves calculation of PRESS, as in equation (1.7.5). In

addition, PCA models are calculated with n PCs, using all of the spectral data. The

residual error sum of squares (RSS) is the sum of the square of E, from equation (1.7.5), over

all observations, m, and variables, n, divided by the number of elements in the spectral

data. The R statistic for each PC is then calculated according to:

$R = \dfrac{PRESS(n + 1)}{RSS(n)}$ (1.7.6)

where n is the number of PCs extracted. The first value for n is zero, and the RSS for

zero PCs extracted is the sum of squares of the spectral data, X, over all observations, m,

and over all wavelengths, n, divided by the number of elements in the spectral data.

With successive PCs extracted, the value of R should increase from near zero to unity.

Extraction of PCs terminates at n PCs when R exceeds unity. This implies that the latest

component extracted did not improve on the prediction error of the previous component,

hence n − 1 PCs is the rank of the model. The W statistic (Eastment and Krzanowski,

1982) is another PCA cross-validation statistic. It is calculated according to (1.7.7):

$W = \dfrac{[PRESS(n - 1) - PRESS(n)] / D_n}{PRESS(n) / D_r}$ (1.7.7)

where

$D_n = m + p - 2n$ (1.7.8)

$D_r = p(m - 1) - \sum_{i=1}^{n}(m + p - 2i)$ (1.7.9)

m is the total number of observations and p is the number of variables (wavelengths).

With successive PCs extracted, the value of W should fall. PCs are included in the

model up to the number, n, which return values of W greater than or equal to unity, with

additional PCs returning values for W less than unity.
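
The W statistic can be computed directly from a sequence of PRESS values. A minimal sketch, using made-up PRESS values purely to show the arithmetic:

```python
def w_statistic(press, m, p):
    """W for n = 1..N PCs, where press[n] is PRESS with n PCs (press[0]
    is the value for zero PCs), m observations and p variables."""
    values = []
    for n in range(1, len(press)):
        d_n = m + p - 2 * n                                        # (1.7.8)
        d_r = p * (m - 1) - sum(m + p - 2 * i for i in range(1, n + 1))  # (1.7.9)
        values.append(((press[n - 1] - press[n]) / d_n) / (press[n] / d_r))
    return values


# Hypothetical PRESS values for 0, 1 and 2 PCs; 30 observations, 20 variables.
W = w_statistic([100.0, 10.0, 9.9], m=30, p=20)
rank = sum(1 for w in W if w >= 1.0)
```

Here the first PC returns a W value well above unity and the second a value below unity, so a rank of one would be retained for these illustrative numbers.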

1.7.3 Q Statistic PCA Residual Analysis

PCA models of NIR spectral data may be used for subsequent qualification of

unclassified NIR spectra, and for detection of outliers, using the Q statistic (Jackson,

1991; MacGregor et al, 1994; Nomikos and MacGregor, 1994). This statistic gives a

measure of the distance of the observations from the n-dimensional space and is

calculated as (Jackson, 1991):

$Q = (x - \hat{x})^{\mathsf{T}}(x - \hat{x})$ (1.7.10)

where x represents the original observation and $\hat{x}$ is the value of the observation

predicted by the PCA model. The critical value for Q may be calculated as (Jackson,

1991):

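$Q_\alpha = \theta_1\left[\dfrac{c_\alpha\sqrt{2\theta_2 h_0^2}}{\theta_1} + \dfrac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + 1\right]^{1/h_0}$ (1.7.11)

$\theta_1 = \mathrm{Tr}(E)$ (1.7.12)

$\theta_2 = \mathrm{Tr}(E^2)$ (1.7.13)

$\theta_3 = \mathrm{Tr}(E^3)$ (1.7.14)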

where E is the residual covariance matrix after n PCs have been extracted and:

$h_0 = 1 - \dfrac{2\theta_1\theta_3}{3\theta_2^2}$ (1.7.16)

Observations whose Q values exceed the upper limit do not belong to the modelled

class. Such observations should be removed from the data set and the model

re-calculated.
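
A sketch of the Q statistic and its critical value is given below. It assumes the Jackson and Mudholkar form of the limit as given by Jackson (1991), with the traces of powers of the residual covariance matrix computed from the eigenvalues of the discarded PCs, and with $c_\alpha$ taken as the standard normal deviate for the upper tail. The function names are illustrative.

```python
from statistics import NormalDist

import numpy as np


def q_statistic(X, mean, P):
    """Sum of squared residuals of each observation (row of X) from the
    PCA model defined by the mean vector and loadings matrix P."""
    Xc = np.atleast_2d(X) - mean
    E = Xc - (Xc @ P) @ P.T
    return np.sum(E ** 2, axis=1)


def q_limit_jackson(residual_eigenvalues, alpha=0.05):
    """Critical value of Q from the eigenvalues of the discarded PCs."""
    lam = np.asarray(residual_eigenvalues, dtype=float)
    th1, th2, th3 = lam.sum(), (lam ** 2).sum(), (lam ** 3).sum()
    h0 = 1.0 - 2.0 * th1 * th3 / (3.0 * th2 ** 2)
    c = NormalDist().inv_cdf(1.0 - alpha)
    term = c * np.sqrt(2.0 * th2 * h0 ** 2) / th1 + 1.0 + th2 * h0 * (h0 - 1.0) / th1 ** 2
    return th1 * term ** (1.0 / h0)


# Toy model: three variables, two PCs along the first two axes.
P_demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Q_demo = q_statistic([[1.0, 2.0, 0.0], [1.0, 2.0, 3.0]], np.zeros(3), P_demo)
# Ten equal residual eigenvalues: Q then follows a chi-squared law with 10 d.f.
limit_demo = q_limit_jackson(np.ones(10), alpha=0.05)
```

For ten equal unit eigenvalues the limit returned is close to the tabulated 95% point of the chi-squared distribution with 10 degrees of freedom (18.31), which provides a simple check on the formula.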

The confidence interval (or control limit) for Q requires calculation of the square and

third power of the residual covariance matrix, which is computationally lengthy. An

alternative and computationally faster method of calculating the control limits has been

described (Nomikos and MacGregor, 1995). This method approximates the squared

residuals, Q, to a weighted chi-squared distribution ($g\chi^2_h$). The weight (g) and the

degrees of freedom (h) are both functions of the eigenvalues of $\Sigma$. Estimation of g and h

is based on matching moments between a $g\chi^2_h$ distribution and the reference distribution

of Q. The mean and variance of the $g\chi^2_h$ distribution ($\mu = gh$, $\sigma^2 = 2g^2h$) are equated to

the sample mean (m) and variance (v) of the Q sample. Previous studies (Nomikos and

MacGregor, 1995) have found this to be a quick and reliable method to estimate g and h

provided that the number of Q observations is sufficiently large. The control limit on Q

at significance level $\alpha$ for batch k is given by (Nomikos and MacGregor, 1995):


$Q_\alpha = \dfrac{v}{2m}\,\chi^2_{2m^2/v,\,\alpha}$ (1.7.17)

where $\chi^2_{2m^2/v,\,\alpha}$ is the critical value of the chi-squared variable with $2m^2/v$ d.f. at

significance level $\alpha$.
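
The moment-matching estimate can be sketched as follows. Because the degrees of freedom are generally non-integer, the chi-squared critical value is obtained here with the Wilson-Hilferty approximation; that choice is an assumption made to keep the sketch free of external dependencies, and an exact quantile function would normally be preferred. The simulated Q values are illustrative.

```python
from statistics import NormalDist

import numpy as np


def q_limit_chi2(q_values, alpha=0.05):
    """Control limit on Q from a weighted chi-squared approximation,
    with g and h estimated by matching the sample mean and variance."""
    q = np.asarray(q_values, dtype=float)
    m, v = q.mean(), q.var(ddof=1)
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Wilson-Hilferty approximation to the upper chi-squared quantile (assumed).
    chi2_crit = h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3
    return g * chi2_crit, g, h


# Simulated reference Q values: 0.5 times a chi-squared variable with 8 d.f.
rng = np.random.default_rng(3)
q_ref = 0.5 * rng.chisquare(8, size=20000)
limit, g_hat, h_hat = q_limit_chi2(q_ref, alpha=0.05)
```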

1.7.4 Partial Least Squares Regression

Partial least squares regression (PLSR) is similar to principal components regression,

and provides a set of latent vectors that are analogous to PCs (Kirsch and Drennen,

1995). Partial least squares (PLS) attempts to summarise as much of the variation in the

dependent variables as possible using only the relevant factors contained in the spectral

data. PLS has the advantage that it allows for measurement error in both the spectral and

reference data sets, whilst modelling the spectral data and correlating to the reference

data (chemical or physical). Only significant PLS components are retained, which

provides a noise reducing effect (Kirsch and Drennen, 1995).

The PLS procedure projects the information in the high-dimensional data spaces (X, Y)

down onto low-dimensional spaces defined by a small number of latent variables. The

NIR (X) and reference analytical (Y) data sets are usually mean-centred and scaled to

unit variance and then decomposed as (MacGregor et al, 1994):

$X = \sum_{a=1}^{A} t_a p_a^{\mathsf{T}} + E$ (1.7.18)

$Y = \sum_{a=1}^{A} t_a q_a^{\mathsf{T}} + F$ (1.7.19)

where the latent vectors $t_a$ are sequentially computed from the data for each PLS

dimension (a = 1, 2, ..., A) such that the linear combination of the x vectors defined by

the latent variable:

$t_a = w_a^{\mathsf{T}} x$ (1.7.20)

and the linear combination of y vectors defined by the latent variable:

$u_a = q_a^{\mathsf{T}} y$ (1.7.21)

maximise the covariance between X and Y that is explained at each dimension. The

vectors $w_a$ and $q_a$ are loading vectors whose elements $w_{aj}$ and $q_{aj}$ express the contribution

of each variable $x_j$ and $y_j$, respectively, towards defining the new latent variables $t_a$ and

$u_a$.

The predictive PLS model is a biased regression model:

$Y = X\beta + F$ (1.7.22)

where the matrix of regression coefficients is given by:

$\beta = W(P^{\mathsf{T}}W)^{-1}Q^{\mathsf{T}}$ (1.7.23)

where W, P, and Q are (k×A), (k×A) and (m×A) matrices whose columns are the vectors

$w_a$, $p_a$, and $q_a$. The number of PLS dimensions, A, required to extract the information

from X and Y and provide the lowest prediction error in Y is usually determined by

cross validation. This is performed similarly to PCA, by randomly dividing

observations in X and their corresponding reference values in Y into subgroups. One

sub-group from X and its corresponding observations in Y are withheld from calculation

of the PLS model. Equation (1.7.22) is then used to predict the reference analytical

values of the spectral sub-group. The sum of squares of the differences in the predicted
and measured reference sub-group are calculated, over all observations, m, and PLS

components, a. The process is repeated for all subgroups, and the sum of squared

residuals of each sub-group summed over all sub-groups, to provide the value of

PRESS. The number of PLS components retained, a, is that value with the minimum

PRESS value.
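
A minimal sketch of PLS regression for a single reference variable follows. It uses the NIPALS algorithm for one y variable (often called PLS1), which is an assumption: the text does not state which PLS algorithm was used. The regression vector is formed as in equation (1.7.23), with $Q^{\mathsf{T}}$ reducing to the vector q. The synthetic data are illustrative.

```python
import numpy as np


def pls1(X, y, n_components):
    """NIPALS PLS1 on mean-centred X (m x k) and y (m,).
    Returns weights W, loadings P, y-loadings q and coefficients beta."""
    Xr, yr = np.array(X, dtype=float), np.array(y, dtype=float)
    k = Xr.shape[1]
    W = np.zeros((k, n_components))
    P = np.zeros((k, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector w_a
        t = Xr @ w                    # latent variable (score) t_a
        tt = t @ t
        p = Xr.T @ t / tt             # spectral loading p_a
        q[a] = yr @ t / tt            # reference loading q_a
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q[a] * t            # deflate y
        W[:, a], P[:, a] = w, p
    beta = W @ np.linalg.solve(P.T @ W, q)   # equation (1.7.23)
    return W, P, q, beta


rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 4))
X_demo = X_demo - X_demo.mean(axis=0)
b_true = np.array([1.0, -2.0, 0.5, 3.0])
y_demo = X_demo @ b_true
W_demo, P_demo, q_demo, beta_demo = pls1(X_demo, y_demo, 4)
```

When all four possible components are retained for this noise-free example, the PLS coefficients coincide with the ordinary least squares solution; with fewer components the estimate is biased, which is the sense in which PLS is a biased regression method.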

Similar to PCA, the PLS loading vectors associated with the spectral data set, P, may

have physical or chemical interpretation. Their score values, T, and sum of squared

residuals, Q, may also be used in cluster analysis and statistical process control

techniques as described for PCA.

1.8 Multivariate Statistical Process Control

Statistical process control (SPC), is a technique which employs a related set of

statistically based tools for monitoring, analysing, controlling and effecting

improvements in the performance of a process (Statistical Process Control, ed. Mamzic,

1995). SPC is a proven technique, that is capable of producing dramatic results, and is

widely applicable to many processes where the output varies and where minimising

variability would improve operation.

The original concepts of SPC were devised by Walter A. Shewhart in the early 1920s

(Statistical Process Control, ed. Mamzic, 1995). Importantly, he observed that when a

process remains in a state of statistical control, the random distribution of each output

variable is repeatable and is therefore predictable from one period to another. In this

state, a process is said to be affected only by ‘common’ causes of variation. These are

random, uncontrollable phenomena that are inherent in the process. This observation led

to development of the Shewhart control chart. This chart plots sampled data with respect

to time, in a form which is visually easy to interpret and thereby determine whether a

process is operating normally (i.e. where only common causes of variation are

affecting the process), or whether assignable causes have affected the process and

moved it out of control. Periodically, a number of measurements of the process are

made and the average value plotted on a chart. The chart also shows upper and lower

control limits which are based on the standard deviation of the process variable when

only common causes of variation are affecting the process. Establishment of these

control limits is referred to as control phase 1. Control phase 2 monitoring involves

monitoring and control of future process observations using the control phase 1 limits.

With multivariate process measurements, it is often desirable to simultaneously monitor

and control a number of process characteristics. This is known as multivariate quality

control or multivariate statistical process control (MSPC) (Montgomery, 1997). Original

work in this area of SPC was carried out by Hotelling in 1947 in analysis of bombsight

data. With multivariate data, however, use of individual monitoring control charts for

each variable is associated with an increase in the overall type I error, $\alpha$. This is the

probability of rejecting the null hypothesis (i.e. that the process is operating in a state of

statistical control) when it is correct. With a set of n independent measured and

controlled variables, the overall type I error, $\alpha'$, is calculated according to (Montgomery,

1997):

$\alpha' = 1 - (1 - \alpha)^n$ (1.8.1)

However, if the n simultaneously controlled variables are not totally independent,

equation (1.8.1) does not hold. An alternative approach, recommended by Jackson

(1980), is to use the PCs of the process data for multivariate monitoring. This has the

advantages of reducing the number of variables for monitoring and providing a known

type I error since the PCs are independent.
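
Equation (1.8.1) shows how quickly the overall type I error grows when independent variables are monitored on separate charts. A short worked sketch, with illustrative values of $\alpha$ and n:

```python
def overall_type_one_error(alpha, n):
    """Overall type I error for n independent charts, each operated
    at an individual type I error of alpha (equation 1.8.1)."""
    return 1.0 - (1.0 - alpha) ** n


# Ten independent variables, each monitored at alpha = 0.05.
overall = overall_type_one_error(0.05, 10)
```

With ten independent charts each at the 5% level, the overall probability of at least one false alarm is approximately 40%.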

Jackson (1991) has identified four requirements to achieve MSPC:


1. A single answer should be available to answer the question: ‘Is the

process in control?’

2. An overall type I error should be specified.

3. The procedure should take into account the relationships among the

variables.

4. Procedures should be available to answer the question: ‘If the process is

out of control, what is the problem?’

PCs of NIR spectra may be used for MSPC purposes. From these, sample variance-

covariance matrices of the process data may be estimated. This is known as ‘control

phase 1’ (Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and is performed to

establish statistical control levels of multivariate process data.

1.8.1 Mahalanobis Distance and Hotelling's $T^2$ Control Ellipses

During control phase 1, control ellipses (Montgomery, 1997) are calculated for the PCs

extracted from the spectral data. Assuming that the data for n variables follows a

multivariate normal distribution, the probability function (Massart et al, 1988), f(x), is

given by:

$f(x) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left\{-\tfrac{1}{2}\left[(x - \mu)^{\mathsf{T}}\Sigma^{-1}(x - \mu)\right]\right\}$ (1.8.2)

where $\mu$ represents the population mean vector (the centroid in the pattern space) and $\Sigma$ is

the population variance-covariance matrix. The square root of the expression in brackets

is the generalised or Mahalanobis distance (Massart et al, 1988), D:

$D^2 = (x - \mu)^{\mathsf{T}}\Sigma^{-1}(x - \mu)$ (1.8.3)

This method of classification assumes that each class may be modelled by a multivariate

normal distribution. $D^2$ is computed for each object, x, in the learning class and follows

a chi-squared distribution. This enables 95% confidence limits for the ellipse to be

calculated. In control phase 2, an unknown sample is classified by measuring the

distance between itself and each of the modelled classes’ centroids. In practice,

however, the population variance-covariance matrix requires estimation from the data.

Hence equation (1.8.3) becomes:

$T^2 = (x - \mu)^{\mathsf{T}}S^{-1}(x - \mu)$ (1.8.4)

where $T^2$ is the Hotelling's $T^2$ distance, x is the unknown sample vector to be classified,

$\mu$ is the target class mean vector and S is the target class sample variance-covariance

matrix. This statistic has been shown to follow an F distribution (MacGregor et al,

1994; Neave, 1995) with upper control limit (UCL) given by (MacGregor et al, 1994):

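$T^2_{UCL} = \dfrac{n(m - 1)(m + 1)}{m(m - n)}\,F_\alpha(n, m - n)$ (1.8.5)

where n is the number of variables (principal components), m is the number of samples

(batches), and $F_\alpha(n, m - n)$ is the upper 100$\alpha$% critical point of the F distribution with (n, m − n)

degrees of freedom. The value of $\alpha$ is typically chosen to give 95% or 99% confidence limits.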
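
Hotelling's $T^2$ and its upper control limit can be sketched as below. The scaling factor n(m − 1)(m + 1)/[m(m − n)] is the form commonly used for new observations and is assumed here; the F critical value is passed in as an argument (for example from statistical tables) rather than computed. The numerical values are illustrative.

```python
import numpy as np


def hotelling_t2(x, mean, S):
    """Hotelling's T^2 of each observation (row of x) from the class
    mean vector, using the sample variance-covariance matrix S."""
    d = np.atleast_2d(np.asarray(x, dtype=float)) - mean
    return np.einsum("ij,ij->i", d @ np.linalg.inv(S), d)


def t2_ucl(n, m, f_crit):
    """Upper control limit for n variables (PCs) and m reference samples,
    given the F critical value with (n, m - n) degrees of freedom."""
    return n * (m - 1) * (m + 1) / (m * (m - n)) * f_crit


# Two PC scores with variances 4 and 1 (uncorrelated, as PC scores are).
t2_demo = hotelling_t2([2.0, 1.0], np.zeros(2), np.diag([4.0, 1.0]))
# 3.34 is the tabulated upper 5% point of F with (2, 28) degrees of freedom.
ucl_demo = t2_ucl(n=2, m=30, f_crit=3.34)
```

Because PC scores are uncorrelated, S is diagonal and $T^2$ reduces to the sum of the squared scores divided by their variances.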

1.8.2 Multivariate Exponentially Weighted Moving Average (MEWMA)

The Hotelling's $T^2$ uses information from only current samples and is therefore

insensitive to small or moderate drift in the mean vector. The MEWMA (Lowry et al,

1992) control chart is a moving average control chart which can show trends in the

process, such as drift and systematic variation. The MEWMA $Z_i$ is given by (Lowry et

al, 1992):

$Z_i = \lambda x_i + (1 - \lambda)Z_{i-1}$ (1.8.6)

where $\lambda$ lies in the range 0 to 1, $x_i$ is the vector of the ith sample and $Z_{i-1}$ is the value of Z

for the (i − 1)th sample. The value of $T_i^2$ plotted on the control chart is given by:

$T_i^2 = Z_i^{\mathsf{T}}\,\Sigma_{Z_i}^{-1}\,Z_i$ (1.8.7)

where the variance-covariance matrix, $\Sigma_{Z_i}$, is given by:

$\Sigma_{Z_i} = \dfrac{\lambda\left[1 - (1 - \lambda)^{2i}\right]}{2 - \lambda}\,\Sigma$ (1.8.8)

This statistic uses the process mean score vector and sample variance-covariance matrix

determined in control phase 1. The upper control limits used with this chart are based on

tabulated data by Lowry et al. (1992) for PCA models with four or fewer PCs. PCA models

with a greater number of PCs may be assigned control limits based on the chi-squared

distribution (Neave, 1995) multiplied by a factor of 1.05, as suggested by Lowry et al

(1992).
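
A minimal sketch of the MEWMA statistic follows. It assumes that observations are expressed as deviations from the control phase 1 mean (so that the recursion starts from a zero vector) and that the exact, sample-number-dependent covariance of Lowry et al. (1992) is used; the value of `lam` and the data are illustrative.

```python
import numpy as np


def mewma_t2(X, S, lam=0.2):
    """MEWMA chart statistic for each successive observation (row of X).
    X holds deviations from the in-control mean; S is the in-control
    variance-covariance matrix; lam is the smoothing constant (0 to 1)."""
    X = np.asarray(X, dtype=float)
    z = np.zeros(X.shape[1])
    values = []
    for i, x in enumerate(X, start=1):
        z = lam * x + (1.0 - lam) * z
        factor = lam * (1.0 - (1.0 - lam) ** (2 * i)) / (2.0 - lam)
        values.append(z @ np.linalg.solve(factor * S, z))
    return np.array(values)


# A small sustained shift of 0.5 in both of two uncorrelated unit-variance scores.
shifted = np.full((15, 2), 0.5)
mewma_demo = mewma_t2(shifted, np.eye(2), lam=0.2)
```

For the first sample the statistic equals the ordinary Hotelling's $T^2$ of that sample; under the sustained shift it then rises steadily, which is the sensitivity to drift described above.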

1.8.3 Process Variance: Sample Generalised Variance

The sample generalised variance (Montgomery, 1997), |S|, is the determinant of the

sample variance-covariance matrix, S, and is a widely used measure of multivariate


dispersion that may be used to monitor the variability of a process over time.

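With two variables (in this case PCs), the UCL and mean control limit, CL, are calculated from (Montgomery, 1997):

$$\mathrm{UCL} = |\Sigma|\left(b_1 + 3 b_2^{1/2}\right) \qquad (1.8.9)$$

$$\mathrm{CL} = b_1 |\Sigma| \qquad (1.8.10)$$

where $b_1$ and $b_2$ are given by:

$$b_1 = \frac{1}{(m-1)^{n}} \prod_{i=1}^{n} (m-i) \qquad (1.8.11)$$

and

$$b_2 = \frac{1}{(m-1)^{2n}} \prod_{i=1}^{n}(m-i)\left[\prod_{j=1}^{n}(m-j+2) - \prod_{j=1}^{n}(m-j)\right] \qquad (1.8.12)$$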

However, the control limits for the sample generalised variance above are applicable only for situations where two variables are monitored. With more than two variables, Alt (1985) recommends the use of Anderson's asymptotic normal approximation (Anderson, 1984):

$$\sqrt{m}\left(\frac{|S|}{|\Sigma|} - 1\right) \qquad (1.8.13)$$

where S is the sample covariance matrix of a batch with n variables and m degrees of freedom. This statistic is asymptotically normally distributed with a mean of zero and variance 2n; $|\Sigma|$ is a theoretical sample generalised variance and is equivalent to the mean sample generalised variance of all batches used in control phase 1 (Alt, 1985; Anderson, 1984).
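The control limits for the generalised variance chart can be sketched as follows. The sketch assumes that the constants take the usual product form given by Montgomery (1997), with the number of variables written as `n_vars` and the number of observations per sample as `m`, and that the reference generalised variance $|\Sigma|$ has already been estimated from the phase 1 batches. The function names are hypothetical.

```python
import math


def generalised_variance_constants(n_vars, m):
    """Return (b1, b2) for n_vars variables and m observations per sample."""
    if m <= n_vars:
        raise ValueError("need more observations than variables")
    prod_i = math.prod(m - i for i in range(1, n_vars + 1))
    prod_j = math.prod(m - j + 2 for j in range(1, n_vars + 1))
    b1 = prod_i / (m - 1) ** n_vars
    b2 = prod_i * (prod_j - prod_i) / (m - 1) ** (2 * n_vars)
    return b1, b2


def generalised_variance_limits(det_sigma, n_vars, m):
    """Return (CL, UCL) for the chart of |S|, given the reference |Sigma|."""
    b1, b2 = generalised_variance_constants(n_vars, m)
    return b1 * det_sigma, det_sigma * (b1 + 3.0 * math.sqrt(b2))


def anderson_z(det_s, det_sigma, n_vars, m):
    """Standardised sqrt(m) * (|S|/|Sigma| - 1), whose variance is 2 * n_vars."""
    return math.sqrt(m) * (det_s / det_sigma - 1.0) / math.sqrt(2.0 * n_vars)
```

For example, with two PCs and ten observations per batch, `generalised_variance_constants(2, 10)` gives $b_1 = 8/9$, so the centre line sits slightly below $|\Sigma|$. Dividing the Anderson statistic by $\sqrt{2n}$, as in `anderson_z`, lets it be compared directly with standard normal limits.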

1.8.4 Diagnosis of Out-of-Control Observations

Hotelling's $T^2$ Stacked Bar Charts And Shewhart-Type Plots on Individual PC Scores

Where the process mean vector of batch data exceeds the upper control limit in Hotelling's $T^2$ control phase 2, it is useful to attempt a diagnosis of the problem (Kourti and MacGregor, 1996), if the PCA models have physical or chemical interpretation (Nomikos and MacGregor, 1995) (i.e. if the loadings of the PCs appear to represent physical or chemical components).

The simplest method of achieving this is to examine the individual PC contributions to the multivariate $T^2$. These contributions, $t_i^2$, may be plotted as a stacked bar chart (Jackson, 1980, 1981a, 1981b), or individually. Large values of $t_i^2$ would indicate which PCs account for the problem.
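The decomposition described above can be illustrated with a short Python sketch. It assumes that the scores are uncorrelated PC scores, so that $T^2$ is simply the sum of the squared scores each divided by the variance of its PC; the scores and variances shown are made-up numbers, and the function name is hypothetical.

```python
def t2_contributions(scores, variances):
    """Split T^2 into one term per principal component: score^2 / variance."""
    return [t * t / v for t, v in zip(scores, variances)]


# Hypothetical out-of-control observation described by three PC scores.
contrib = t2_contributions([3.0, -0.5, 0.2], [1.0, 0.5, 0.1])
total_t2 = sum(contrib)  # the multivariate T^2
largest = max(range(len(contrib)), key=contrib.__getitem__)
print(largest)  # 0, i.e. the first PC dominates
```

Plotting `contrib` as a stacked bar for each batch gives the chart described by Jackson; the index of the largest term points to the PC whose loadings should be examined first.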

In addition, Shewhart control charts may be constructed for the individual PC scores (Jackson, 1991). These use the mean and range for each PC score sample, measured for the historical data set. With each model, the overall or grand mean is equal to zero (unless additional observations are projected into the model space), whilst the range is equal to the standard deviation of the observations multiplied by the corresponding value of the t-distribution (95% and 99% control limits). This is used instead of the traditional three-sigma limits to avoid Type I errors (Jackson, 1991).
CHAPTER 2

Measurement of Powdered Pharmaceutical

Material Particle Size by Near Infrared

Spectroscopy

2.1 Introduction

The measurement of particle size for pharmaceutical materials is important (Aulton,

1988; Barth et al, 1987) because it influences bulk physical properties (Washington,

1992), and determines the ability of powders to flow, mix, granulate and dissolve. It is

also often a requirement in pharmaceutical manufacturing processes that particle size

measurements are performed on raw materials (British Pharmacopoeia 1999, 1999).

Commonly employed methods of measurement are forward angle laser light scattering

(FALLS) and electrical zone sensing (Simmons, 1993). A disadvantage with these

methods is that samples generally need to be analysed away from the production area,

which is time consuming and leads to manufacturing delays (Hailey et al, 1996).

In this chapter, the ability of NIRS to determine powdered pharmaceutical material

particle size is examined. Section 2.8 examines measurement of median particle size by

multiple linear regression of NIR measurements and FALLS measurements. In Section

2.9 this method is further developed to measure the cumulative percentage frequency

particle size distribution of microcrystalline cellulose by NIRS. In Section 2.10, the

percentage frequency particle size distribution of this material is measured by NIRS

using PLSR. Section 2.11 deals with classification of grades of powdered material by

cluster analysis methods.

2.2 Review of The Literature

The potential application of NIRS for particle size determination of powdered

pharmaceuticals has long been suggested (Ciurczak et al, 1986; Plugge and Vlies, van

der, 1993; Vlies, van der, 1996). Despite these suggestions, NIRS has remained largely

a technique used for chemical analysis (Dubois et al, 1987; Aucott et al, 1988, Cowe et

al, 1989, Dreassi et al, 1995a, 1995b, 1995c; Wargo and Drennen, 1996; Forbes et al,

1996).

Powdered pharmaceutical materials may be suited for NIR particle size determinations

since they are diffusely reflecting materials (Ciurczak et al, 1986). In the NIR region

(1000 nm to 2500 nm) these materials both absorb and scatter light, resulting in spectra

with non uniform baselines and varying offsets. These scatter effects vary with the

particle size (Vlies, van der, 1996), sample porosity (Hailey et al, 1996) (and hence

compaction pressure) and with the wavelength (Bull, 1991) and can be described using

Rayleigh and Mie theory (Kortum, 1969), or alternatively using the Kubelka-Munk

theory of diffuse reflectance (Kortum, 1969).

Previous studies that have examined the effects of particle size on NIR spectra have

demonstrated that reflectance varies non-linearly with particle size (Kortum, 1969;

Norris and Williams, 1984; Ilari et al, 1988; Ciurczak et al, 1986). Ciurczak et al

(1986) found that reflectance exhibited an inverse relationship with mean particle size in

agreement with Mie theory (Kortum, 1969). However, this relationship does not

necessarily apply in all cases and is dependent on the shape of the particle size

distribution of the sample (Kortum, 1969), the particle shape (Kortum, 1969) and the

material's refractive index (Kortum, 1969). The presence of very small particles will

further complicate the relationship as these may exhibit Rayleigh scatter, which is
proportional to the third power of the particle size (Kortum, 1969).

The complicated relationship between reflectance at a spectral wavelength and particle

size has resulted in most work in this area focusing on chemometric calibration

methods, rather than theoretical models (Vlies, van der et al, 1995; Plugge and Vlies,

van der, 1996; Ilari et al, 1988). A novel method for classifying pharmaceutical powders

involved transforming second derivative NIR spectra to polar co-ordinates (Vlies, van

der, et al 1995). Each NIR spectrum was reduced to a single quality point in a plane,

that could therefore be plotted in cartesian co-ordinates. Linear plots of the logarithm of

the particle size versus x or y co-ordinate were found to show significant correlation.

Multivariate calibration methods have proven most successful in calibrating NIR data to

measure particle size (Ilari et al, 1988). The first method to appear in the literature

produced NIR calibrations to determine particle size in organic and inorganic powders.

Importantly in this paper, a new method of scatter correction, MSC, was described (Ilari

et al, 1988). The scatter correction technique linearly regressed each spectrum to the

mean of the data set and resulted in two coefficients: an intercept and a slope

coefficient. For each spectrum, these were used to correct for scatter. Reflectance

spectra for each material were recorded at 19 wavelengths and transformed to Kubelka-

Munk function. PLSR models were produced using the transformed spectra and particle

size measurements. Inclusion of the scatter coefficients was found to improve

calibration precision. Another multivariate method of calibrating NIR spectra to

measure particle size that has shown some success is artificial neural networks (Frake et

al, 1998a, 1998b).
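The scatter correction described above (MSC) can be sketched in plain Python. Each spectrum is regressed on the mean spectrum of the data set to give an intercept and a slope, and the spectrum is then corrected by subtracting the intercept and dividing by the slope. The correction step shown is the usual form of MSC and is an assumption here, since the text only states that the two coefficients were used; the function name is hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Returns (corrected_spectra, coefficients), where coefficients holds one
    (intercept, slope) pair per input spectrum.
    """
    n = len(spectra)
    p = len(spectra[0])
    mean = [sum(s[k] for s in spectra) / n for k in range(p)]
    mbar = sum(mean) / p
    sxx = sum((v - mbar) ** 2 for v in mean)
    corrected, coefs = [], []
    for s in spectra:
        sbar = sum(s) / p
        # Ordinary least squares of the spectrum on the mean spectrum.
        sxy = sum((mv - mbar) * (sv - sbar) for mv, sv in zip(mean, s))
        slope = sxy / sxx
        intercept = sbar - slope * mbar
        corrected.append([(sv - intercept) / slope for sv in s])
        coefs.append((intercept, slope))
    return corrected, coefs
```

The returned `(intercept, slope)` pairs are the scatter coefficients that, according to the study reviewed above, improved calibration precision when included as additional predictors.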

2.3 Materials Used

Section 2.8 Measurement of The Number Median Particle Size

Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co., St Louis, USA) and paracetamol (Boots Pharmaceuticals, Nottingham, UK) were used. Microcrystalline cellulose: Avicel PH101 (16 batches), Avicel PH102 (19 batches) and Avicel PH200 (single batch) were all from FMC International, Wallingstown, Little Island, Co Cork, Ireland. The batches of lactose monohydrate used were a single batch of a reagent grade material (Avocado Research Chemicals Ltd., Heysham, UK), 9 samples of 110 mesh obtained from two manufacturers (DMV International, Veghel, Netherlands and Lactose New Zealand, Hawera, New Zealand) and 18 samples of Fastflo (Foremost Ingredients Group, Wisconsin, USA).

Section 2.9 Measurement of The Cumulative Percentage Frequency Particle Size

Grades of microcrystalline cellulose used were: Avicel PH101 (16 batches), Avicel PH102 (19 batches) and Avicel PH200 (single batch), all from FMC International, Wallingstown, Little Island, Co Cork, Ireland. These batches of Avicel were those used in Section 2.8.

Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution

The microcrystalline cellulose samples used were obtained from one supplier (FMC International, Wallingstown, Little Island, Co Cork, Ireland). These samples (n = 113) were from six different grades with different particle size distributions and nominal moisture contents which ranged from 0.8 to 4.8% m/m.

Section 2.11 Classification of Excipient Grades by Cluster Analysis Methods

Grades of microcrystalline cellulose used were: Avicel PH101 (n = 9 batches), Avicel PH102 (n = 9 batches), all from FMC International, Wallingstown, Little Island, Co Cork, Ireland. The batches of lactose monohydrate used were: lactose Regular (n = 9 batches, obtained from two manufacturers: DMV International, Veghel, Netherlands and Lactose New Zealand, Hawera, New Zealand) and lactose Fastflo (n = 9, all from Foremost Ingredients Group, Wisconsin, USA). The batches of Avicel used were those of Sections 2.8 and 2.9. The batches of lactose used were those of Section 2.8.

2.4 Sample Preparation

Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and Measurement of The Cumulative Percentage Frequency Particle Size

A range of aspirin samples of different particle size distributions were obtained by

grinding the coarse bulk material with a mortar and pestle (approximately 50 g).

Samples were taken successively with every few minutes of grinding. The ground

aspirin samples were air-jet sieved to remove fines.

Air-jet sieve fractions of the aspirin, anhydrous caffeine and paracetamol (in each case

approximately 50 g of material was used) were produced using an Alpine air-jet sieve

(Alpine, Augsburg, Germany) with stainless steel wire mesh sieves of different sieve

diameter (75, 56, 50, 40 and 36 µm). With use of the smallest air-jet sieve, an additional

sample was collected from the sieve filter paper.

Sieve fractions of approximately 200 g of material, from single batches of Avicel PH101, Avicel PH102, Avicel PH200 and reagent grade lactose monohydrate, were produced by machine sieving (Endecotts Ltd., London, UK) for 20 minutes using a nest of progressively finer stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm). In addition, material falling through the 32 µm sieve was collected.

For each prepared sample (sieve fraction or bulk material), approximately 8 g of

material was collected and used to fill a narrow soda glass vial (25 mm wide by 50 mm

deep) for NIR and reference particle size analyses.

Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution

A single narrow soda glass vial was filled with material for each powdered sample (n =

113) (approximately 8 g of material per vial). These were allowed to settle overnight.

Section 2.11 Classification of Excipient Grades by Cluster Analysis Methods

Bulk samples of lactose monohydrate and microcrystalline cellulose were used as

obtained from the manufacturers. A single narrow soda glass vial was filled with

material for each powdered sample (n = 18 for each material) (approximately 8 g of

material per vial). These were allowed to settle overnight.

2.5 Reference Particle Size Analysis

Sections 2.8 and 2.9: Measurement of The Number Median Particle Size and Measurement of The Cumulative Percentage Frequency Particle Size Distribution†

The particle size distributions of sieve fractions and of the remaining batches of bulk

lactose 110 mesh, Fastflo, Avicel PH101 and Avicel PH102 were measured by FALLS

(Malvern 2600C, Malvern Instruments, Malvern, UK). A sample from each soda glass

vial (approximately 100 mg in each case) was suspended in a disperse medium in which

it was practically insoluble with surfactant (sorbitan trioleate or dilute household

detergent) prior to particle sizing and was gently shaken using a vortex-mixer to prevent

formation of agglomerates. Avicel and aspirin samples were dispersed in cold, distilled

water using dilute detergent. Anhydrous caffeine and lactose monohydrate were

suspended in cyclohexane with sorbitan trioleate. Paracetamol was suspended in

pentane with sorbitan trioleate. Microcrystalline cellulose and lactose monohydrate

particle shapes were assessed by scanning electron microscopy using a Philips XL20

scanning electron microscope (Philips Electron Optics, Eindhoven, Netherlands).

† Cumulative percentage frequency particle size data for Avicel samples are supplied in Appendix D, CD-ROM.

Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution

Particle size distribution data for the microcrystalline cellulose samples were acquired

by laser diffraction using a Malvern Mastersizer X Laser Diffraction instrument

(Malvern Instruments, Malvern, UK) equipped with a Dry Powder Feeder. These data

were supplied by the manufacturer, FMC International, Wallingstown, Little Island, Co

Cork, Ireland. These data are supplied in Appendix D, CD-ROM (inside back cover).

Section 2.11 Classification of Excipient Grades by Cluster Analysis Methods

The materials were not particle sized. Instead, they were classified by their nominal

grade. NIR spectral data are supplied in Appendix D, CD-ROM (inside back cover).

2.6 Near Infrared Reflectance Measurements

Sections 2.8, 2.9 and 2.11: Measurement of The Number Median Particle Size, Measurement of The Cumulative Percentage Frequency Particle Size and Classification of Excipient Grades by Cluster Analysis Methods

NIR measurements were made using a FT-NIR NIRVIS spectrometer (No. 100.1,

Buhler AG, Uzwil, Switzerland) fitted with a Buhler fibre-optic probe (No. 110.2).

Reflectance spectra were acquired by inserting the probe into the sample, and were

recorded over the range 4008 to 9996 cm⁻¹ (500 data points), each spectrum being the

average of six scans. The reflectance reference used was Spectralon (Labsphere Inc.,

North Sutton, NH, USA). Microcrystalline cellulose data (Sections 2.8, 2.9 and 2.11)

are supplied in Appendix D, CD-ROM (inside back cover).

Section 2.10 Measurement of The Percentage Frequency Particle Size Distribution

Near infrared reflectance measurements of the powdered samples were made using a

Foss NIRSystems grating spectrometer (model 6500) (Foss NIRSystems, Silver


Springs, MD, USA) equipped with a Rapid Content Analyser (RCA) module. Each

sample was centrally positioned on the window of the RCA stage, using an iris

mechanism, above the lead sulphide detectors. A diffuse reflectance spectrum of each

sample was recorded with the lid of the RCA closed. Each spectrum was the average of

32 scans and was recorded over the range 1100 to 2500 nm, at 2 nm increments (700

data points). The reflectance reference used was a ceramic tile. Data are supplied in

Appendix D, CD-ROM (inside back cover).

2.7 Data Analysis

All programs were written in Matlab 5.2 Scientific and Technical Language, except for

the MLR program. This was written in-house in C and used routine svdfit, available in

the literature (Press et al, 1992).

2.8 Measurement of The Number Median Particle Size

2.8.1 Preliminary Results

Reflectance at any wavenumber versus number median particle size, $d_{50}$ or $1/d_{50}$, exhibited a curvilinear relationship. To allow for this, two different approaches were compared: single wavenumber quadratic least squares regression and full two-wavenumber search MLR. With each of these calibration methods, different pre-treatments of the NIR spectral and FALLS $d_{50}$ data were applied and their effects on standard errors of calibration (SEC) and prediction (SEP) (Mark, 1991) observed:

$$\mathrm{SEC} = \sqrt{\frac{\sum_{i=1}^{m}\left(Y_i - \hat{Y}_i\right)^2}{m - a - 1}} \qquad (2.8.1)$$

$$\mathrm{SEP} = \sqrt{\frac{\sum_{i=1}^{m}\left(Y_i - \hat{Y}_i\right)^2}{m}} \qquad (2.8.2)$$

where $Y_i$ is the measured value of the ith sample in the calibration or prediction set, $\hat{Y}_i$ is the calibration estimated or predicted value of the ith sample, a is the number of wavelengths used in the regression and m is the number of samples in the calibration or prediction set. The effects of the data pre-treatments on bias and linearity were also investigated.
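The two error statistics can be computed as in the following sketch, which assumes the root-mean-square forms in which the residual sum of squares is divided by m − a − 1 for the SEC and by m for the SEP. The function and argument names are illustrative, and the numbers in the usage note are made up.

```python
import math


def sec(measured, estimated, n_wavelengths):
    """Standard error of calibration: sqrt(RSS / (m - a - 1))."""
    m = len(measured)
    rss = sum((y - yh) ** 2 for y, yh in zip(measured, estimated))
    return math.sqrt(rss / (m - n_wavelengths - 1))


def sep(measured, predicted):
    """Standard error of prediction: sqrt(RSS / m)."""
    m = len(measured)
    rss = sum((y - yh) ** 2 for y, yh in zip(measured, predicted))
    return math.sqrt(rss / m)
```

For five samples whose estimates are each 0.1 away from the measured values, `sep` returns 0.1, whereas `sec` with two wavelengths returns about 0.158, because only two degrees of freedom remain once the intercept and the two coefficients have been fitted.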

Quadratic least squares fits of NIR spectral and FALLS data were used to allow for gentle curvature in calibrations. The NIR spectral data were diffuse reflectance (of infinite thickness for all practical purposes), R; mean-corrected reflectance (where the mean reflectance value of an individual spectrum was subtracted from the reflectance at each of its spectral wavenumbers); absorbance, log(1/R); and Kubelka-Munk function, f(R). A search of all 500 data points was used to select the wavenumber giving the smallest SEC for each NIR data pre-treatment.
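The three spectral pre-treatments listed above can be written compactly. The sketch assumes that the logarithm in log(1/R) is to base 10 and that the Kubelka-Munk function takes its usual form, $f(R) = (1-R)^2/2R$; the function names are hypothetical.

```python
import math


def mean_correct(spectrum):
    """Subtract the spectrum's own mean reflectance from every point."""
    mean = sum(spectrum) / len(spectrum)
    return [r - mean for r in spectrum]


def absorbance(spectrum):
    """Apparent absorbance, log10(1/R) (base 10 is an assumption)."""
    return [math.log10(1.0 / r) for r in spectrum]


def kubelka_munk(spectrum):
    """Kubelka-Munk function, f(R) = (1 - R)^2 / (2R)."""
    return [(1.0 - r) ** 2 / (2.0 * r) for r in spectrum]
```

Each function takes one reflectance spectrum as a list of values between 0 and 1 and returns a list of the same length, so the pre-treatments can be swapped in and out ahead of the same calibration routine.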

The second calibration technique applied was two-wavenumber MLR (Osborne et al, 1993). A search of all combinations of two wavenumbers from the 500 measured by the spectrometer was carried out. The NIR spectral data used were again reflectance, mean-corrected reflectance, absorbance and Kubelka-Munk function. FALLS data pre-treatments investigated were $d_{50}$, $1/d_{50}$ and ln(FALLS $d_{50}$). All possible combinations of pretreated NIR and FALLS data were tested.
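An exhaustive two-wavenumber search of this kind can be sketched as below. This is an illustration only, not the in-house C program: it solves the normal equations directly rather than using a singular value decomposition, it ranks the pairs by SEC, and all names are hypothetical.

```python
import math
from itertools import combinations


def fit_two(x1, x2, y):
    """Least-squares fit y = b0 + b1*x1 + b2*x2 via the normal equations."""
    cols = [[1.0] * len(y), x1, x2]
    # Normal equations (X'X) b = X'y, a 3 x 3 system held in augmented form.
    aug = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols]
           + [sum(u * v for u, v in zip(ci, y))] for ci in cols]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            return None  # collinear pair, skip it
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, 3):
            f = aug[r][c] / aug[c][c]
            aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    b = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(aug[r][k] * b[k] for k in range(r + 1, 3))
        b[r] = (aug[r][3] - tail) / aug[r][r]
    return b


def search_two_wavenumbers(spectra, y):
    """Try every pair of spectral columns; return (SEC, (i, j), coefficients)."""
    m, p = len(y), len(spectra[0])
    columns = [[s[k] for s in spectra] for k in range(p)]
    best = None
    for i, j in combinations(range(p), 2):
        b = fit_two(columns[i], columns[j], y)
        if b is None:
            continue
        rss = sum((yv - (b[0] + b[1] * u + b[2] * v)) ** 2
                  for yv, u, v in zip(y, columns[i], columns[j]))
        sec = math.sqrt(rss / (m - 2 - 1))  # a = 2 wavenumbers
        if best is None or sec < best[0]:
            best = (sec, (i, j), b)
    return best
```

With 500 data points there are 124,750 pairs to test, which is why the search is cheap for two wavenumbers but grows quickly as further terms are added. Here `spectra` holds one row per sample and `y` holds the transformed reference values, for example ln(FALLS $d_{50}$).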

Investigation with both calibration methods revealed that the reflectance data recorded by the NIR spectrometer produced calibrations with significant correlation, low scatter and low bias. Kubelka-Munk function data and absorbance also showed significant correlation between NIR predicted $d_{50}$ and FALLS $d_{50}$; however, these pre-treatments introduced significant fixed bias into the calibration. Reflectance data were therefore subsequently used. The ln(FALLS $d_{50}$) was found to give two-wavenumber MLR calibrations with a lower SEC than $1/d_{50}$ or $d_{50}$, and was the form of the FALLS data used in subsequent MLR calibrations.

To demonstrate the feasibility of the MLR calibration method, sieve fractions of a single

batch of three drugs were tried initially (aspirin, anhydrous caffeine and paracetamol).

Subsequent working MLR calibrations for the two pharmaceutical excipients

(microcrystalline cellulose and lactose monohydrate) were produced from a larger data

set using either machine sieve fractions or a combination of machine sieve fractions and

bulk samples from a number of different batches. The quadratic least squares

calibrations were performed using the microcrystalline cellulose and lactose

monohydrate data sets as these had the largest number of data.

2.8.2 Spectral Characteristics

Scans of each powdered sample exhibited the characteristic overlapping combinations

and overtones arising from the fundamentals of the mid infrared, with non-uniform

baselines resulting from multiple scattering. Spectra also showed different offset values,

which appear to increase with wavenumber (Fig. 2.1). This has previously been

attributed to variation in pathlength (Mark, 1991), which is influenced by particle size

and sample porosity.

2.8.3 Single Wavenumber Quadratic Least Squares Calibration

Reflectance at any wavenumber showed a generally inverse linear trend with median particle size up to approximately 100 µm (Fig. 2.2), broadly agreeing with Mie and Fraunhofer theory for particles of comparable size to the wavelength (Kortum, 1969). Beyond this particle size, the relationship becomes markedly non-linear (Fig. 2.2). A quadratic least squares fit of the data (Fig. 2.2) between reflectance and median particle size showed useful correlations between the NIR predicted and FALLS $d_{50}$ values (microcrystalline cellulose: r = 0.96, 9012 cm⁻¹ (n = 57); lactose monohydrate: r = 0.90, 7428 cm⁻¹ (n = 33)).

Fig. 2.1. NIR reflectance spectra for different median particle sizes. (A) microcrystalline cellulose: (a) 24 µm, (b) 45.8 µm, (c) 93.4 µm, (d) 261 µm, (e) 406 µm and (B) lactose monohydrate: (a) 44.7 µm, (b) 66.3 µm, (c) 98 µm, (d) 132 µm, (e) 168 µm. (Horizontal axis of both panels: wavenumber/cm⁻¹.)

Fig. 2.2. Single wavenumber quadratic least squares fit of NIR spectral data and median particle size, $d_{50}$: (A) microcrystalline cellulose reflectance data (9012 cm⁻¹), (B) microcrystalline cellulose mean-corrected reflectance data (7128 cm⁻¹), (C) lactose reflectance data (7428 cm⁻¹) and (D) lactose mean-corrected reflectance data (7056 cm⁻¹). (Horizontal axis of each panel: FALLS $d_{50}$.)

Mean-correction of each spectrum was found to improve the correlation between NIR predicted and FALLS $d_{50}$ with the microcrystalline cellulose data set (r = 0.98, 7128 cm⁻¹ (n = 57)). This pre-treatment acts to centre the data of individual spectra ($R_{mean}$ = 0) and can help to eliminate baseline differences that occur as a result of variable sample porosity and pressure applied with the fibre-optic probe. The variation in offset will also be influenced by the flow properties of the material. Use of this pre-treatment is likely to be appropriate in single wavenumber least squares calibrations where the material exhibits variable compaction properties, such as with different grades of microcrystalline cellulose (Handbook of Pharmaceutical Excipients, 1994), and also where the NIR measurements are recorded using a fibre-optic probe. However, with lactose monohydrate (reagent grade, 110 mesh and Fastflo), which tends to have good flow and compaction properties (Handbook of Pharmaceutical Excipients, 1994; Pearce, 1986), this pre-treatment was not appropriate and gave a poorer fit between NIR predicted and FALLS $d_{50}$ (r = 0.68, 7056 cm⁻¹ (n = 33)).

2.8.4 MLR Calibration Using Aspirin, Anhydrous Caffeine And Paracetamol

The results of these calibrations, which applied MLR to all two-wavenumber combinations and contained 7 or 10 data points, clearly demonstrated a relationship between NIR reflectance and the ln(FALLS $d_{50}$) values (Fig. 2.3). The ln(FALLS $d_{50}$) and reflectance data, R, were fitted to an equation of the general form:

$$\ln(\mathrm{FALLS}\ d_{50}) = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} \qquad (2.8.3)$$

where $b_0$ is the intercept, and $b_1$ and $b_2$ are the MLR coefficients for the two wavenumbers, $\lambda_1$ and $\lambda_2$ respectively. Significant linear association was found between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) values in each case (aspirin: r = 0.99 (n = 7), anhydrous caffeine: r = 0.99 (n = 7) and paracetamol: r = 0.96 (n = 10); in each case p < 0.005).

Fig. 2.3. Feasibility study. Results of MLR calibration. NIR measured median particle size, $d_{50}$, versus FALLS $d_{50}$: (A) aspirin, (B) anhydrous caffeine and (C) paracetamol. (Horizontal axis of each panel: FALLS $d_{50}$/µm.)

2.8.5 Full Two Wavenumber Search MLR Using Microcrystalline Cellulose And

Lactose

With each of these materials, preliminary data processing revealed that MLR of

reflectance versus ln(FALLS $d_{50}$) produced the most linear calibrations. In addition,

mean-correction of these spectra was not found to improve calibration results. The MLR

two-wavenumber model therefore compensates for variation in baseline offset.

2.8.5.1 Particle Size Calibration Using Sieve Fraction Data

Before calibrations were attempted for both materials, the data of each was split into two sets: a calibration set of mainly sieved fractions (microcrystalline cellulose (n = 24) and lactose (n = 15)) and a validation set of bulk samples (microcrystalline cellulose (n = 33) and lactose (n = 18)). With each calibration set, highly significant correlation (p < 0.005) was obtained between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) values (Table 2.1).

However, the validation sets for each material exhibited more scatter (Table 2.1). With the microcrystalline cellulose prediction set of bulk samples (Avicel grades PH101 and PH102), significant correlation (p < 0.005) between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) was obtained with a SEP greater than the SEC (Table 2.1). The high SEP is possibly accounted for by the FALLS and scanning electron microscopy (SEM) results (Fig. 2.4A & 2.4B), which showed that these bulk samples had broad distributions and comprised a mixture of irregularly shaped fines and large spherical particles. Previous work (Kortum, 1969) has shown that this can produce more variable results than the use of narrow or uniform-size distributions, as the NIR scattering and absorbing properties of these particles will be different to those of median sized particles.

Table 2.1. Microcrystalline cellulose and lactose MLR calibration (sieve fraction data) and validation (bulk sample data) results.

Material                      Microcrystalline cellulose             Lactose monohydrate

b0*                           -3.99                                  4.59
b1*                           59.65                                  -172.3
b2*                           -57.99                                 167.8
Wavenumber 1 (cm⁻¹)           8244                                   6012
Wavenumber 2 (cm⁻¹)           5964                                   5940
SEC (ln($d_{50}$/µm))         0.067                                  0.097
SEP (ln($d_{50}$/µm))         0.17                                   0.18

ln P = c + m ln(FALLS $d_{50}$)

Calibration set
r                             0.99                                   0.99
m                             0.99                                   0.98
c                             0.035                                  [illegible]
n                             24 (PH101, PH102, PH200 sieved)        15 (sieved and 110 ...)

Validation set
r                             0.84                                   0.014
m                             0.96                                   0.018
c                             0.17                                   4.38
n                             33 (PH101 & PH102, bulk)               18 (Fastflo samples)

* MLR coefficients: b0 - intercept, b1 - wavenumber 1, and b2 - wavenumber 2. r is the correlation coefficient; m and c are the slope and intercept of plots of NIR predicted ln $d_{50}$ vs. FALLS measured ln $d_{50}$; n is the number of samples in each data set.

Fig. 2.4. SEM photographs. (A) Avicel PH101 bulk sample, (B) Avicel PH200 > 200 µm sieve fraction, (C) lactose monohydrate < 31 µm sieve fraction and (D) lactose monohydrate > 150 µm sieve fraction.

Validation of the lactose calibration used bulk samples of Fastflo from 18 different batches. This spray dried material generally has a relatively uniform particle size and spherical particle shape (Pearce, 1986). A narrow range of $d_{50}$ was confirmed by FALLS (range of $d_{50}$: 81.1 to 115.7 µm). Poor correlation was obtained between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) with these samples; this is probably due to the narrow range of particle size in the prediction set, as the SEP is not significantly different from that of the microcrystalline cellulose prediction set of bulk samples (Table 2.1).

SEM results of the lactose sieve fractions used in the calibration set showed small,

irregularly shaped fines in the smallest sieve fractions and large spherical particles in

the largest sieve fractions (Fig 2.4C & 2.4D), much the same as with the

microcrystalline cellulose calibration set.

2.8.5.2 Particle Size Calibration Using Randomised Sieve Fraction And Bulk Sample Data

To produce working calibrations with the microcrystalline cellulose and lactose data

sets, the sieve fraction and bulk sample data were randomly assigned to either the

calibration set (67% of spectra) or validation set (33% of spectra). This procedure was

repeated three times to test the robustness of the method, giving three different

calibration and validation sets for the two materials. Both sieve fractions and bulk

samples were used in calibrations as preliminary investigation showed that this

produced more robust calibrations.
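The random assignment described above can be sketched with the Python standard library. The 67% calibration fraction follows the text; the function name, the seeds and the use of `round` to fix the calibration set size are illustrative assumptions.

```python
import random


def random_split(n_samples, calibration_fraction=0.67, seed=None):
    """Randomly assign sample indices to calibration and validation sets."""
    rng = random.Random(seed)  # a private generator keeps splits reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_cal = round(n_samples * calibration_fraction)
    return sorted(indices[:n_cal]), sorted(indices[n_cal:])


# Three independent splits of a 57-sample data set, as a robustness check.
splits = [random_split(57, seed=s) for s in (1, 2, 3)]
```

With 57 samples each split gives 38 calibration and 19 validation samples, and with 33 samples it gives 22 and 11, which matches the set sizes quoted for the two materials.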


In each case, all three calibrations employed slightly different combinations of wavenumbers. The selected wavenumbers were found to occur on the slopes of overtone peaks; the selection of each wavenumber is therefore likely to have been influenced by the random noise in each data set. With both materials, each of the three MLR calibrations showed a good fit between NIR spectral and FALLS data (microcrystalline cellulose: SEC (ln($d_{50}$/µm)) = 0.10 - 0.11‡ and lactose monohydrate: SEC (ln($d_{50}$/µm)) = 0.12 - 0.13‡). This was confirmed by plots of NIR predicted ln $d_{50}$ versus ln(FALLS $d_{50}$), which showed significant linear association (microcrystalline cellulose: r = 0.98 (n = 38 for each set) and lactose monohydrate: r = 0.97 - 0.98‡ (n = 22 for each set); in each case with p < 0.005) (Figs 2.5A & 2.6A). The three validation sets for each material showed similar results (Figs 2.5B & 2.6B) with highly significant correlation between NIR predicted ln $d_{50}$ and ln(FALLS $d_{50}$) (microcrystalline cellulose (n = 19): r = 0.98, lactose monohydrate (n = 11): r = 0.93 - 0.97;‡ in each case p < 0.005), and SEP comparable to SEC (microcrystalline cellulose: SEP (ln($d_{50}$/µm)) = 0.12 - 0.14‡ and lactose monohydrate: SEP (ln($d_{50}$/µm)) = 0.15 - 0.21‡).

2.9 Measurement of The Cumulative Percentage Frequency Particle Size

Distribution

In this Section, NIR measurements of powdered microcrystalline cellulose are

calibrated to measure the cumulative percentage frequency particle size distribution.

Two different chemometric methods are compared: 3-wavenumber MLR and 3

principal components regression (PCR).

‡ The range gives the minimum and maximum values observed for the three randomly selected calibration and validation sets.
Fig. 2.5. Results of microcrystalline cellulose MLR calibration with randomised sieve fraction and bulk sample data. NIR measured median particle size, $d_{50}$, versus FALLS $d_{50}$. (A) Calibration set and (B) validation set. (Logarithmic axes; horizontal axis of each panel: FALLS $d_{50}$/µm.)

Fig. 2.6. Results of lactose monohydrate MLR calibration with randomised sieve fraction and bulk sample data. NIR measured median particle size, $d_{50}$, versus FALLS $d_{50}$. (A) Calibration set and (B) validation set. (Logarithmic axes; horizontal axis of each panel: FALLS $d_{50}$/µm.)
2.9.1 Preliminary Investigation

In Section 2.8, it was shown that useful calibrations for median particle size can be obtained by using NIR reflectance data with a logarithmic transform of the FALLS particle size data; hence these data have been used in this section.

With MLR calibrations, preliminary work showed that a three-wavelength linear regression at any of the FALLS quantiles produced calibrations more robust than a two-wavelength fit. It was therefore decided that three-wavelength MLR calibrations would be employed subsequently. With PCR models, three principal components were required to produce satisfactory calibrations and this number was used for all subsequent calibrations.

2.9.2 Spectral Characteristics

The spectra of each powdered sample exhibited the effects of multiple scatter, as

described in Section 2.8.2 (Fig. 2.1A).

2.9.3 Model Generation

The FALLS instrument gives values of the cumulative percentage frequency particle size distribution at 64 particle sizes (range: 564 to 5.8 µm), at intervals which follow a geometric progression. For each sample, linear interpolation of the measured FALLS values was used to calculate the particle size values corresponding to the 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 95% quantiles (Appendix D, CD-ROM inside back cover).

The samples exhibited a wide range of particle sizes at each quantile (Table 2.2) and a

wide variety of distributional shapes. Of the 57 samples available, 34 were chosen at

random for the calibration set; the remaining 23 samples were used as an independent

validation set. To aid comparison of the two calibration methods, the same calibration

and validation data were used for each method.
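The interpolation step described above can be sketched as follows. This is a minimal illustration, not the code used in this work: the function name `sizes_at_quantiles` and the synthetic cumulative curve are assumptions, and only the 64-point geometric size grid and the eleven quantiles are taken from the text.

```python
import numpy as np

def sizes_at_quantiles(sizes_um, cum_percent, quantiles):
    """Particle sizes at the requested cumulative percentages, by linear interpolation."""
    sizes = np.asarray(sizes_um, dtype=float)
    cum = np.asarray(cum_percent, dtype=float)
    order = np.argsort(sizes)  # np.interp needs the cumulative curve in increasing order
    return np.interp(quantiles, cum[order], sizes[order])

# Hypothetical sample: 64 sizes in geometric progression between 5.8 and 564 µm,
# with a smooth synthetic cumulative undersize curve centred on 80 µm.
sizes = np.geomspace(5.8, 564.0, 64)
cum = 100.0 / (1.0 + np.exp(-(np.log(sizes) - np.log(80.0)) / 0.5))

quantiles = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95]
d_x = sizes_at_quantiles(sizes, cum, quantiles)
print(np.round(d_x, 1))
```

The sketch assumes the measured cumulative curve is strictly increasing over the range of interest; flat regions would need separate handling.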

Table 2.2. Particle size ranges at each quantile for the calibration and validation sets as determined by FALLS.

Particle size/µm

            Calibration set (n = 34)          Validation set (n = 23)
Quantile    Minimum   Median    Maximum       Minimum   Median    Maximum
 5%           6.45     25.72    216.52          7.21     23.06    167.11
10%           9.92     37.14    268.91         11.44     32.34    187.67
20%          14.48     52.92    311.96         18.05     45.40    219.33
30%          18.39     67.10    345.62         22.55     56.36    251.13
40%          21.40     81.41    376.44         26.27     67.35    283.66
50%          23.99     96.59    406.07         29.82     78.98    319.67
60%          26.47    112.82    436.21         33.71     91.57    359.67
70%          29.25    131.29    466.55         38.30    105.95    402.81
80%          33.11    154.78    497.51         44.94    124.03    451.54
90%          40.62    197.11    529.54         57.16    152.54    504.66
95%          48.47    240.07    546.76         70.34    184.74    533.21
2.9.4 Three-Wavelength Multiple Linear Regression

Data from the calibration samples were used to generate calibration equations for each

quantile by fitting the log₁₀ dₓ values to the NIR reflectance values according to

equation (2.9.1):

$$\log_{10} d_x = b_0 + \sum_{a=1}^{3} b_a R_{\lambda_a} \qquad (2.9.1)$$

where $d_x$ is the FALLS interpolated particle size at quantile $x$, $R_{\lambda_a}$ the reflectance at

wavelength $\lambda_a$, $b_a$ the MLR coefficient for each wavelength and $b_0$ the intercept. The selection of

wavelengths was performed on a reduced data set of every other wavelength to reduce

the computation time required. A full 3 wavelength search for each particle size quantile

calibrated therefore used 250 of the 500 available wavelengths. This reduced the total

computation time for all eleven calibrations to about 10 hours (on an Acer Pentium II

333 MHz PC), compared with an estimated 80 hours if all 500 wavelengths had been

searched. Though setting up the eleven calibrations is time consuming, the cumulative

particle size distributions of future samples may be calculated from their NIR spectra

virtually instantaneously.
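The exhaustive three-wavelength search can be illustrated with the sketch below. It is not the original program: the function name, the synthetic data and the definition of SEC as the residual standard deviation with n − 4 degrees of freedom are assumptions. The count of wavelength triples shows why halving the number of candidate wavelengths cuts the work roughly eightfold, in line with the 10 hours versus 80 hours quoted above.

```python
import itertools
import math
import numpy as np

def best_three_wavelengths(R, y, candidates):
    """Exhaustive search over wavelength triples, keeping the triple with the smallest SEC."""
    n = len(y)
    best_sec, best_trio, best_coef = np.inf, None, None
    for trio in itertools.combinations(candidates, 3):
        A = np.column_stack([np.ones(n), R[:, list(trio)]])  # intercept + 3 reflectances
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        sec = np.sqrt(resid @ resid / (n - 4))  # assumed SEC definition
        if sec < best_sec:
            best_sec, best_trio, best_coef = sec, trio, coef
    return best_sec, best_trio, best_coef

# Synthetic stand-in for 34 calibration spectra at 20 wavelengths (hypothetical data).
rng = np.random.default_rng(0)
R = rng.normal(size=(34, 20))
y = 1.0 + 2.0 * R[:, 2] - 1.0 * R[:, 8] + 0.5 * R[:, 14] + 0.01 * rng.normal(size=34)

# Search every other wavelength only, as in the reduced search described above.
sec, trio, coef = best_three_wavelengths(R, y, range(0, 20, 2))
print(trio, round(float(sec), 4))

# Number of triples: 250 versus 500 candidate wavelengths.
n_reduced, n_full = math.comb(250, 3), math.comb(500, 3)
print(n_reduced, n_full, round(n_full / n_reduced, 2))
```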

For each calibration equation, the three chosen wavelengths (Table 2.3) were those

which gave the smallest standard error of calibration (SEC). The optimum wavelengths

were similar for the 30 to 60 percent quantiles, but varied somewhat for the extreme

quantiles. The calibration equations were then used to predict the validation set (n = 23)

to give an indication of the robustness of the method (Table 2.4).

2.9.5 Principal Components Regression

This calibration method required generation of a principal components analysis (PCA) model. This consists of a set of new variables which are uncorrelated and represent linear combinations of the original NIR reflectance data.

Table 2.3. MLR wavelengths and PCs selected for each percentage quantile calibration.

Percentage    MLR wavenumbers/cm⁻¹        PCs
 5            4008   9300   9528          28   22   17
10            4008   9300   9528          29   27   14
20            5640   5676   6216          29   27   14
30            4464   9852   9864          29   28   27
40            5736   9852   9864          27   15    1
50            5736   9852   9864          20   15    1
60            5496   9852   9864          20   15    1
70            6024   6948   9168          15   14    1
80            5664   5796   9432          23    9    3
90            5952   6996   8280          28   18    6
95            7632   8532   8664          28   18    6

Table 2.4. MLR & PCR calibration and validation results at various percentage quantiles.

Percentage            5       10      20      30      40      50      60      70      80      90      95

MLR
Calibration set (n = 34)
R                     0.977   0.980   0.984   0.987   0.989   0.988   0.984   0.979   0.972   0.932   0.889
m                     0.954   0.960   0.968   0.974   0.978   0.975   0.968   0.958   0.945   0.869*  0.791*
c                     0.065   0.063   0.055   0.048   0.042   0.049   0.065   0.088   0.121   0.301*  0.497*
SEC (log₁₀(dₓ/µm))    0.084   0.071   0.055   0.046   0.039   0.039   0.042   0.046   0.052   0.080   0.104
CV (%)                19.3    16.3    12.7    10.6    9.0     9.0     9.7     10.6    12.0    18.4    23.9

Validation set (n = 23)
R                     0.951   0.951   0.965   0.971   0.959   0.955   0.950   0.959   0.964   0.897   0.822
m                     0.876   0.950   0.977   0.978   0.984   0.973   0.943   0.986   0.980   0.921   0.734*
c                     0.140   0.064   0.046   0.021   0.013   0.040   0.108   0.050   0.057   0.216   0.657*
SEP (log₁₀(dₓ/µm))    0.131   0.109   0.074   0.066   0.074   0.073   0.073   0.070   0.061   0.106   0.132
CV (%)                30.1    25.1    17.0    15.2    17.0    16.8    16.8    16.1    14.0    24.4    30.4

PCR
Calibration set (n = 34)
R                     0.969   0.973   0.978   0.980   0.981   0.981   0.976   0.968   0.959   0.898   0.858
m                     0.939   0.946   0.956   0.960   0.963   0.961   0.953   0.937   0.921   0.806   0.737
c                     0.086   0.084   0.076   0.074   0.070   0.077   0.096   0.133   0.174   0.445*  0.627*
SEC (log₁₀(dₓ/µm))    0.096   0.082   0.065   0.057   0.051   0.049   0.051   0.057   0.062   0.097   0.116
CV (%)                22.1    18.9    15.0    13.1    11.7    11.3    11.7    13.1    14.3    22.3    36.8

Validation set (n = 23)
R                     0.980   0.981   0.981   0.978   0.984   0.981   0.969   0.965   0.953   0.924   0.842
m                     1.128*  1.045   0.998   0.969   0.967   0.970   0.977   0.975   0.946   0.932   0.861
c                     -0.16*  -0.071  0.005   0.053   0.051   0.042   0.024   0.034   0.079   0.092   0.258
SEP (log₁₀(dₓ/µm))    0.085   0.062   0.051   0.050   0.041   0.045   0.056   0.057   0.071   0.094   0.124
CV (%)                19.6    14.3    11.7    11.5    9.4     10.4    12.9    13.1    16.3    21.6    28.5

R is the multiple correlation coefficient; m and c are the slope and intercept of plots of NIR predicted log₁₀ dₓ vs. FALLS measured log₁₀ dₓ; n is the number of samples in each data set; * m significantly different from 1, or c significantly different from 0; CV is the coefficient of variation.

The PCA model was obtained as the product of a score matrix, T, with a loadings

matrix, P, plus a residuals matrix, E, according to equation (1.7.1), from variable mean-

centred spectral data. Regression of FALLS data was as described above for MLR,

except that PC scores were used in place of reflectance values (Naes and Martens,

1988). For each calibration, the 3 PCs selected were those that gave the highest

correlations with the FALLS data (Table 2.3). The total time required to compute PCs

and PCR calibration equations was much shorter than for MLR, at only about 20

minutes.
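A minimal sketch of this principal components regression is given below, assuming the PCA is computed by singular value decomposition of the mean-centred spectra. The function names, the number of candidate PCs (`n_pcs=10`) and the synthetic data are illustrative assumptions; only the selection of the three PCs most correlated with the reference values and the 34/23 split follow the text.

```python
import numpy as np

def pcr_fit(X, y, n_keep=3, n_pcs=10):
    """PCA on mean-centred spectra, then regress y on the n_keep most correlated PC scores."""
    x_mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    T = U[:, :n_pcs] * s[:n_pcs]   # score matrix T
    P = Vt[:n_pcs].T               # loadings matrix P
    r = np.array([abs(np.corrcoef(T[:, a], y)[0, 1]) for a in range(n_pcs)])
    keep = np.sort(np.argsort(r)[::-1][:n_keep])  # PCs with the highest |correlation|
    A = np.column_stack([np.ones(len(y)), T[:, keep]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_mean, P[:, keep], coef, keep

def pcr_predict(model, X_new):
    x_mean, P, coef, _ = model
    return coef[0] + ((X_new - x_mean) @ P) @ coef[1:]

# Synthetic spectra driven by three latent factors (hypothetical data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(57, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(57, 50))
y = 2.0 + latent @ np.array([0.5, -0.3, 0.2]) + 0.01 * rng.normal(size=57)

model = pcr_fit(X[:34], y[:34])          # 34 calibration samples
y_hat = pcr_predict(model, X[34:])       # 23 validation samples
r_val = np.corrcoef(y_hat, y[34:])[0, 1]
print(model[3] + 1, round(float(r_val), 3))  # selected PCs (1-based) and validation r
```

Selecting PCs by correlation rather than by variance explained allows low-variance components to enter the model, which is consistent with the high-numbered PCs listed in Table 2.3.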

2.9.6 Calibration And Validation Precision

With both methods, individual calibrations were the most precise at the 40% and 50%

quantiles (Table 2.4). This is clearly seen from the plot of SEC versus percentage

quantile (Fig. 2.7). The falling off in the precision of individual calibrations at the

extreme quantiles most probably reflects the shape of the distribution curves for the

particle-sizes in the calibration sets. The shapes of the distributions become more

skewed at the extreme quantiles (Appendix D, avicelquantilescal).

With both MLR and PCR excellent calibration results were obtained, with low SEC

(Table 2.4). The SECs at each quantile are smaller with MLR; however, the standard

errors of prediction (SEPs) for the independent validation set are smaller with the PCR

model (Table 2.4). This suggests that the PCR model is more robust. Table 2.4 also

gives the slopes and intercepts for the plots of NIR predicted log₁₀ dₓ versus FALLS

measured log₁₀ dₓ values at each quantile. The slopes and intercepts were not

significantly (5% probability level) different from 1 and 0 respectively, apart from a few

values (marked with an asterisk) which occurred at some of the extreme quantiles.

Fig. 2.7. Standard errors of calibration (SEC) versus cumulative percentage quantile: (A) MLR, and (B) PCR.

2.9.7 Cumulative Particle Size Distributions

The percentage quantile value was plotted against the NIR predicted log₁₀ dₓ of each

sample in the calibration and validation sets to give cumulative particle-size distribution

curves for both the MLR and PCR methods. The MLR and PCR results for the first 4

validation samples are shown in Fig. 2.8, which also shows the FALLS measured

cumulative percentage frequency distributions overlaid. Predicted distributions for both

calibration methods closely follow those obtained by FALLS, although PCR predicted

distributions match the FALLS measured distributions more closely than with MLR.

In this work, the number of quantiles at which calibration equations were set up was

restricted to 11. In principle, more or fewer could be used. With the present data sets the

errors do not justify the need for smaller intervals (Table 2.4).

Fig. 2.8. [Figure and caption not recoverable from the source; the panels plot cumulative % frequency for the first four validation samples, with the MLR and PCR predictions and the FALLS measured distributions overlaid.]

2.10 Measurement of The Percentage Frequency Particle Size Distribution

In this Section, the percentage frequency particle size distribution of microcrystalline

cellulose is measured by NIRS. Calibrations of NIR and laser diffraction analysis

reference particle size measurements of this powdered material were produced by

partial least squares regression (PLSR) (Section 1.7.4).

2.10.1 Spectral Characteristics

The spectra showed the effects of multiple scatter, as described in section 2.8.2. This is

due to differences in particle size and surface moisture content (Kortum, 1969) (Fig.

2.9).

2.10.2 Spectral Data Pre-treatments

A range of data pre-treatments commonly employed in NIR spectrometry were applied

to the spectral data and their effects on calibration and prediction precision were

compared with results for raw absorbance (log(l/R)) data. The data pre-treatments

tested were:

1. SNV;

2. DT;

3. SNV-DT;

4. Sg2d11.

Owing to the increase in noise in the Savitzky-Golay smoothed second derivative

spectra beyond 2200 nm, the wavelength range used with this pre-treated data was

truncated to 2200 nm.*

*Due to the Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm (n = 546 data points).
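The pre-treatments listed above can be sketched in a few lines each. The following is a hedged illustration rather than the software used in this work: the function names are hypothetical, the detrend is assumed to be a quadratic polynomial fitted to each spectrum, and "Sg2d11" is assumed to denote a Savitzky-Golay second derivative with an 11-point window and a quadratic fit.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, wavelengths, degree=2):
    """Subtract a polynomial baseline fitted to each spectrum (quadratic assumed)."""
    x = (wavelengths - wavelengths.mean()) / wavelengths.std()  # scaled for conditioning
    coefs = np.polyfit(x, spectra.T, degree)                    # one column per spectrum
    baseline = np.vander(x, degree + 1) @ coefs
    return spectra - baseline.T

def savgol_second_derivative(spectra, window=11, polyorder=2, step=1.0):
    """Savitzky-Golay second derivative; half a window is lost at each end."""
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, polyorder + 1, increasing=True)
    # Row 2 of the pseudo-inverse gives the fitted z**2 coefficient; d2/dz2 = 2 * a2.
    kernel = 2.0 * np.linalg.pinv(A)[2] / step**2
    return np.array([np.correlate(s, kernel, mode="valid") for s in spectra])

# Check on a known quadratic "spectrum": 3x^2 + 2x + 1 has second derivative 6.
x = np.arange(50.0)
quad = (3.0 * x**2 + 2.0 * x + 1.0)[None, :]
sg = savgol_second_derivative(quad)
flat = detrend(quad, x)
scaled = snv(np.random.default_rng(0).normal(size=(4, 50)))
print(sg.shape, float(sg[0, 0]))
```

SNV followed by detrend (SNV-DT) is then simply `detrend(snv(spectra), wavelengths)`. The loss of five points at each end of an 11-point window is consistent with the truncated wavelength range starting at 1110 nm noted above.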

Fig. 2.9. NIR absorbance spectra of microcrystalline cellulose samples (n = 113).

2.10.3 Preliminary Data Analysis

In Section 2.8 it was shown that NIR data may be calibrated to measure the percentage

cumulative frequency particle size distribution of this powdered material by MLR and

by PCR. That was an extension of the NIR method for calibration of median particle

size, described in Section 2.7. With the data sets and chemometric techniques used

previously, it was not found possible to produce accurate calibrations for the percentage

frequency particle size distribution by MLR or PCR. However, with this larger NIR-

particle size data set, preliminary investigation showed that this could be calibrated for

by PLSR, although it was not possible to calibrate the larger data set to measure

the percentage cumulative particle size distribution by this chemometric method.

A Malvern Mastersizer X Laser Diffraction instrument was used to provide percentage

frequency and percentage cumulative frequency particle size distributions. Each

measured distribution comprised 32 different channels, with intervals which follow a

geometric progression. For calibration purposes, the mean of the low and

high particle sizes for a given channel was used (range: 0.9 to 448.34 µm). Preliminary

investigation of the total particle size data set revealed that the largest particle size

channel had values which were zero for all samples. Since the variance in this channel

was zero, and therefore could not be modelled by PLSR, it was removed from the

particle size data set providing 31 channels for calibration.

2.10.4 Partial Least Squares Model Generation

Biased regression models for particle size distribution data were produced with raw and

pre-treated NIR spectral data by partial least squares regression (PLSR2), according to

equations (1.7.18) to (1.7.23). The number of PLS dimensions, A, for each of the

pre-treated data sets (X, Y) was estimated by cross-validation, using the PRESS statistic.

For this, the first 110 of 113 observations (X, Y) were divided into 11 subsets of 10
observations. Next, calculation of PLS models of rank A was performed for all

combinations of 10 of the 11 subsets of data. With each PLS model, the remaining

subset of NIR spectral and particle size data was used to test the goodness of fit of the

model (A PLS dimensions) by measuring the sum of squares between model predicted

and reference particle size data. This approach of dividing the data into subsets was

preferable to ‘leave-one-out-cross-validation’, requiring considerably less computation

time (less than one minute compared with 30 minutes for a ‘leave-one-out’ cross

validation). The number of PLS components required to extract the information from X

and Y was that number A with the lowest PRESS value.
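The subset cross-validation described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of equations (1.7.18) to (1.7.23): the PLS2 routine is a generic NIPALS-style algorithm, the function names are hypothetical and the data are synthetic. Only the division of 110 observations into 11 subsets of 10 and the choice of A at the minimum PRESS follow the text.

```python
import numpy as np

def pls2_fit(X, Y, n_comp):
    """Generic PLS2 fit; returns the means and the regression matrix B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # Weight vector: dominant left singular vector of E'F.
        w = np.linalg.svd(E.T @ F, full_matrices=False)[0][:, 0]
        t = E @ w
        tt = t @ t
        p, q = E.T @ t / tt, F.T @ t / tt
        E, F = E - np.outer(t, p), F - np.outer(t, q)  # deflate X and Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return x_mean, y_mean, B

def pls2_predict(model, X_new):
    x_mean, y_mean, B = model
    return (X_new - x_mean) @ B + y_mean

def press_by_components(X, Y, max_comp, n_subsets):
    """PRESS for 1..max_comp components, leaving out one contiguous subset at a time."""
    idx = np.arange(len(X))
    press = np.zeros(max_comp)
    for held_out in np.array_split(idx, n_subsets):
        train = np.setdiff1d(idx, held_out)
        for a in range(1, max_comp + 1):
            resid = Y[held_out] - pls2_predict(pls2_fit(X[train], Y[train], a), X[held_out])
            press[a - 1] += np.sum(resid**2)
    return press

# Synthetic data with three latent factors (hypothetical): 110 observations.
rng = np.random.default_rng(2)
L = rng.normal(size=(110, 3))
X = L @ rng.normal(size=(3, 40)) + 0.05 * rng.normal(size=(110, 40))
Y = L @ rng.normal(size=(3, 5)) + 0.05 * rng.normal(size=(110, 5))

press = press_by_components(X, Y, max_comp=6, n_subsets=11)
best_A = int(np.argmin(press)) + 1
print(np.round(press, 2), best_A)
```

With 11 subsets only 11 models are fitted per value of A, against 110 for leave-one-out, which is the source of the saving in computation time noted above.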

2.10.5 Cross Validation

With the exception of the model produced with Savitzky-Golay second derivative data, 6

PLS components were required to fit the data and give the lowest PRESS value (Table

2.5). Clearly, the Savitzky-Golay smoothed second derivative did remove more scatter

and baseline drift information from the NIR data than the other pre-treatments, requiring

only 4 components to model the data. With all data pre-treatments tested, typical

idealised plots of PRESS versus number of PLS components were obtained, with clear

minima. This is shown for the absorbance data set in Fig. 2.10. The SNV transformation

provided a model with the lowest PRESS value (Table 2.5). The other pre-treated NIR

data sets also produced models with low PRESS values. The raw absorbance data gave

the highest PRESS value, 33% higher than that of the model derived from

SNV data (Table 2.5).

Table 2.5. PLS results of cross validation, calibration and prediction for the NIR data sets tested.

Data                               PLS components   PRESS (%)ᵃ   MSEP (%)ᵇ   RMSEP (%)ᶜ
Absorbance                         6                0.21960      0.8181      0.9045
SNV                                6                0.16478      1.0672      1.0331
SNV detrend                        6                0.16723      1.1609      1.0774
Detrend                            6                0.19251      1.1351      1.0654
Savitzky-Golay 2nd derivative      4                0.20079      2.8711      1.6944

ᵃ PRESS is the predicted residual error sum of squares between NIR predicted and reference measured particle size data.
ᵇ Mean square error of prediction (MSEP) for predicted particle size data, rescaled by standard deviation and mean of reference particle size data used to calculate model.
ᶜ Root mean square error of prediction (RMSEP) for predicted particle size data, rescaled by standard deviation and mean of reference particle size data used to calculate model.

Fig. 2.10. Predicted residual error sum of squares (PRESS) for successive PLS components extracted using NIR absorbance and laser diffraction data.

2.10.6 Calibration And Validation Precision

To test the robustness of the PLS models, the NIR-particle size data set was split into a

calibration and a validation set. For PLS modelling, 90 spectra and particle size results

(80%) were randomly selected for calibration. The remaining 23 samples (20%) were

used to test the predictive abilities of the models. PLS models for raw and pre-treated

NIR-particle size data were created, each having the number of components determined

by cross validation. The predictive ability of the models was determined by calculation

of the mean square error of prediction {MSEP) (Beebe et al, 1998) and root mean square

error of prediction (RMSEP) (Beebe et al, 1998) between PLS model predicted and

reference percentage frequency, y, over all samples, m, and channels, n:

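$$\mathrm{MSEP} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{y}_{ij} - y_{ij}\right)^{2} \qquad (2.10.1)$$

$$\mathrm{RMSEP} = \sqrt{\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\hat{y}_{ij} - y_{ij}\right)^{2}} \qquad (2.10.2)$$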
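As a worked illustration of the two statistics defined above, the sketch below computes them for a small made-up example and then checks the relationship RMSEP = √MSEP against the values in Table 2.5. The function names and the 2 × 2 example are assumptions; the tabulated values are copied from Table 2.5.

```python
import numpy as np

def msep(y_pred, y_ref):
    """Mean square error of prediction over all samples and channels."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_ref, dtype=float)
    return float(np.mean(diff**2))

def rmsep(y_pred, y_ref):
    return float(np.sqrt(msep(y_pred, y_ref)))

# Made-up example: 2 samples x 2 channels, squared errors 0, 1, 0 and 4.
example_msep = msep([[1, 2], [3, 4]], [[1, 1], [3, 6]])
example_rmsep = rmsep([[1, 2], [3, 4]], [[1, 1], [3, 6]])
print(example_msep, round(example_rmsep, 4))

# RMSEP values from Table 2.5 and their increase relative to absorbance data.
table_rmsep = {"Absorbance": 0.9045, "SNV": 1.0331, "SNV detrend": 1.0774,
               "Detrend": 1.0654, "Savitzky-Golay 2nd derivative": 1.6944}
increase = {k: 100.0 * (v / table_rmsep["Absorbance"] - 1.0) for k, v in table_rmsep.items()}
for name, pct in increase.items():
    print(f"{name}: {pct:.1f}% higher RMSEP than absorbance")
```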

The best predictive model was obtained using absorbance data, which produced the lowest prediction errors: MSEP = 0.82% and RMSEP = 0.90%. The SNV, DT and SNV-DT pre-treatments also produced models which showed low prediction errors (range: 1.0 to 1.1%); however, these were 14 to 19% higher than those obtained with absorbance data (Table 2.5). The Sg2d11 transformation produced a model with far higher prediction errors, with an RMSEP 87% higher than that obtained with absorbance data (Table 2.5). Plots of percentage frequency particle size distributions for the 23 validation samples are shown in Fig. 2.11 and show the results obtained by laser diffraction with the NIR predicted results overlaid. The linear association between NIR predicted and laser diffraction measured particle size percentage frequency values for the entire validation set (all 23 samples and 31 channels) was found to have a highly significant correlation, r, of 0.973 (n = 713, p = 0.005), with slope of 0.940 and intercept of 0.194% (Fig. 2.12).

Fig. 2.11. Plots of laser diffraction measured percentage frequency distributions for the validation samples (n = 23) with NIR values overlaid (+). Each panel plots percentage frequency against particle size/µm.

Fig. 2.12. NIR predicted percentage frequencies versus Malvern Mastersizer measured percentage frequencies for the 23 validation samples (n = 713).

2.11 Classification of Excipient Grades by Cluster Analysis Methods

Cluster analysis and pattern recognition methods have been used to identify

pharmaceutical excipients by NIRS (Candolfi et al, 1999a, 1999b). The multivariate

techniques used were soft independent modelling of class analogy (SIMCA),

wavelength distance, k nearest neighbour (knn), Hotelling’s T² control ellipses and

triangular potential functions (Candolfi et al, 1999a, 1999b). However, these

multivariate methods have not been investigated for classification of different grades of

materials. In this Section, grades of lactose monohydrate and microcrystalline cellulose

are classified and identified by cluster analysis (Massart et al, 1988) of their NIR

spectra. Two different chemometric methods are compared: knn, using reflectance

values at all combinations of two wavelengths and score values of combinations of two

PCs, and Hotelling’s T² control ellipses of PC scores.

2.11.1 Near Infrared Spectra

The reflectance spectra for all samples showed characteristic non-uniform baselines

arising from multiple scatter. Between grades of the same material, the differences in

offset and baseline curvature were consistent with the median particle sizes of the

grades. Hence with the two grades of microcrystalline cellulose, the spectra fall into two

groups (Fig. 2.13 A). With lactose monohydrate, the Regular grade material from two

different manufacturers had median particle sizes greater and less than for lactose

Fastflo, hence spectra for the Regular grade appear at both higher and lower reflectance

than for Fastflo (Fig. 2.13B).

Fig. 2.13. Near infrared reflectance spectra of microcrystalline cellulose and lactose monohydrate grades: A) microcrystalline cellulose (grades PH101 and PH102, 5909 to 7406 cm⁻¹), B) lactose monohydrate (grades Regular and Fastflo, 4008 to 9996 cm⁻¹).

2.11.2 Spectral Data Pre-treatments

A range of scatter correcting data pre-treatments commonly employed in NIR

spectrometry were applied to the spectral data and their effects on classification were

compared against results for reflectance data. The data pre-treatments tested were:

1. SNV;

2. DT;

3. SNV-DT;

4. Sg2d11.

2.11.3 k Nearest Neighbour

k Nearest neighbour (knn) is a mathematically very simple non-parametric classification

procedure (Massart et al, 1988). It involves computing the Euclidean distance between

an unknown sample’s pattern vector and n samples in a given training set resulting in n

distances. The Euclidean distance, D, between sample p and q is given by:

$$D = \sqrt{\sum_{i=1}^{m}\left(p_i - q_i\right)^{2}} \qquad (2.11.1)$$

where m represents the number of variables and $p_i$ and $q_i$ are the values of variable i for

the two samples. The unknown sample is assigned to the class to which the majority of its

k nearest training samples belong. The value of k is determined by optimisation of the

prediction ability. Small values are normally preferred, with typical

values for k of 3 or 5 having been reported. It has also been reported that with large data

sets this method is capable of yielding classification results close to or better than more

complicated methods, despite its simplicity (Massart et al, 1988). In this work,

classification has been attempted using spectral values at pairs of wavelengths and

principal components (PCs) scores on two components. With both materials, each

sample of a grade has been treated as an unknown and classified according to the class

of its k nearest neighbours.
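The procedure of treating each sample in turn as an unknown can be sketched as below. This is a minimal illustration, not the program used in this work: the function names, the tie-breaking rule (the class of the closest of the tied neighbours) and the synthetic two-class data are assumptions.

```python
from collections import Counter
import numpy as np

def knn_predict(train_X, train_labels, x, k=1):
    """Assign x to the majority class among its k nearest training samples."""
    d = np.sqrt(np.sum((train_X - x) ** 2, axis=1))  # Euclidean distances
    nearest = np.argsort(d, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    top = max(votes.values())
    # Assumed tie-break: among tied classes, take the class of the closest neighbour.
    for i in nearest:
        if votes[train_labels[i]] == top:
            return train_labels[i]

def leave_one_out_correct(X, labels, k=1):
    """Number of samples correctly classified when each is treated as an unknown."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        if knn_predict(X[mask], labels[mask], X[i], k) == labels[i]:
            correct += 1
    return correct

# Two well-separated synthetic classes of 15 samples, two variables each (hypothetical).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.3, size=(15, 2)), rng.normal(5.0, 0.3, size=(15, 2))])
labels = np.array(["A"] * 15 + ["B"] * 15)

n_correct_k1 = leave_one_out_correct(X, labels, k=1)
n_correct_k3 = leave_one_out_correct(X, labels, k=3)
print(n_correct_k1, n_correct_k3)
```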

2.11.4 Wavelength And Principal Components Selection

The algorithm used searched all possible combinations of pairs of wavelengths and with

PCA, pairs of PCs. At each of the two-wavelength and two-PC combinations, the

classification ability was determined (i.e. the numbers of correct identifications and the

numbers of incorrect identifications). Combinations yielding 100% correct classification

for the two groups and their total number were recorded. The procedure was also

repeated for different data pre-treatments.
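The search over all pairs of variables can be sketched as follows. The code is an assumption-laden illustration: it uses k = 1 only, a vectorised leave-one-out check, hypothetical function names and synthetic data in which two of six variables separate the classes. The final lines give the number of pairs for 500 and 489 variables, which correspond to the totals of possible combinations quoted in Section 2.11.5.

```python
import itertools
import math
import numpy as np

def loo_1nn_all_correct(X2, labels):
    """True if every sample's nearest other sample (k = 1) has the same label."""
    d = np.linalg.norm(X2[:, None, :] - X2[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample may not be its own neighbour
    return np.array_equal(labels[d.argmin(axis=1)], labels)

def suitable_pairs(X, labels):
    """All pairs of variables giving 100% correct leave-one-out identification."""
    return [pair for pair in itertools.combinations(range(X.shape[1]), 2)
            if loo_1nn_all_correct(X[:, list(pair)], labels)]

# Synthetic data: 30 samples, 6 variables; only variables 1 and 4 separate the classes.
rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
labels = np.array(["A"] * 15 + ["B"] * 15)
X[15:, 1] += 10.0
X[15:, 4] += 10.0

pairs = suitable_pairs(X, labels)
print(len(pairs), "suitable pairs out of", math.comb(6, 2))

# Number of possible pairs for 500 and 489 variables.
print(math.comb(500, 2), math.comb(489, 2))
```

The same routine applies unchanged to pairs of PC scores, where the number of candidate pairs is far smaller.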

The choice of using two wavelengths and PCs was made to maintain mathematical

simplicity and also because principal components analysis often results in clusters

which are discernible on plots of two PCs, frequently with the first two components.

Hence, the results of the two wavelength and two PC approaches could be directly

compared.

The number of wavelengths included in the search was 500 for all pre-treatments except

the Savitzky-Golay second derivative, which used 489 wavelengths. With PCA, the number

of PCs searched depended on the rank of the model. With each model, this was

determined by calculation of the % sum of squares (%SS) (Lindberg et al, 1983) and the

R statistic (Wold, 1978) for successive PCs extracted (Tables 2.6 & 2.7).

2.11.5 k-Nearest Neighbour With Spectral Wavelengths

For the two grades of microcrystalline cellulose, 100% correct identification of the samples was obtained using pairs of wavelengths for all spectral pre-treatments (Table 2.8). Optimisation of k produced excellent classification results with a value of just one or two nearest neighbours. With a value for k of two, reflectance data yielded only 12 pairs of wavelengths out of 124,750 possible combinations which were suitable (0.01%) (Table 2.8). The best data pre-treatment was the Savitzky-Golay second derivative. With k equal to one, 100% correct identification of the two grades was obtained for 8381 combinations of wavelength pairs out of 119,316 possible combinations (5%) (Table 2.8).

Table 2.6. knn results for microcrystalline cellulose grades PH101 (n = 15) and PH102 (n = 15) using pairs of principal components (PCs).

                                                          Correct identification
Data                             PCs extracted   %SSᵃ     PH101     PH102     Example PCs
(k = 1)
Reflectance                      4               99.33    15        13        -
SNV                              4               94.36    15        13        -
Detrend                          4               94.31    12        14        -
SNV detrend                      4               93.85    13        14        -
Savitzky-Golay 2nd derivative    4               65.86    15        15        1, 4
(k = 2)
Reflectance                      4               99.33    13        13        -
SNV                              4               94.36    14        15        -
Detrend                          4               94.31    8         14        -
SNV detrend                      4               93.85    13        15        -
Savitzky-Golay 2nd derivative    4               65.86    14        15        -

Table 2.7. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo (n = 9) using pairs of principal components (PCs).

                                                          Correct identification
Data                             PCs extracted   %SSᵃ     Regular   Fastflo   Example PCs
(k = 1)
Reflectance                      2               99.44    8         8         -
SNV                              3               94.68    7         9         -
Detrend                          4               96.07    9         9         1, 2 and 2, 3
SNV detrend                      3               92.72    7         9         -
Savitzky-Golay 2nd derivative    3               70.21    8         9         -
(k = 2)
Reflectance                      2               99.44    7         8         -
SNV                              3               94.68    6         9         -
Detrend                          4               96.07    8         9         -
SNV detrend                      3               92.72    6         9         -
Savitzky-Golay 2nd derivative    3               70.21    7         9         -

ᵃ %SS is the sum of squares of the differences between the NIR data and the NIR data estimated by the principal components analysis model.

For the two grades of lactose monohydrate, 100% correct identification was obtained

with all pre-treatments except for reflectance data (Table 2.9). These results could be

obtained with a value for k of one or two. The best data pre-treatment for this material

was the quadratic baseline detrend. This returned 34,543 possible combinations of

wavelength pairs that yielded 100% correct identification (27.7%) with k = 1 (Table

2.9).

For both materials, the computation time of a full two-wavelength search was

approximately 47 minutes. However, the grade of subsequent future samples could be

determined from their NIR spectra virtually instantaneously.

2.11.6 k-Nearest Neighbour With Principal Components Scores

Between two and four PCs were required to model the pre-treated data sets for both

materials (Tables 2.6 & 2.7). These numbers of PCs were used in the two PC search.

For microcrystalline cellulose grades, best classification was obtained using PCs 1 and 4

(Fig 2.14A) and a value of one for k*. The only pre-treated data set which yielded 100%

correct identification for all samples in both classes was the Savitzky-Golay second

derivative data set. This pre-treatment agrees with the optimum pre-treatment for the

method using pairs of wavelengths.

*However, a value for k of 4 also yielded the same result.


Table 2.8. knn results for microcrystalline cellulose grades PH101 (n = 15) and PH102 (n = 15) using pairs of wavelengths.

                                 Correct identification
Data                             PH101     PH102     Suitable wavelength pairs   Example wavelength pair/cm⁻¹
(k = 1)
Reflectance                      15        15        685                         8868, 9900
SNV                              15        15        3785                        9720, 9935
Detrend                          15        15        2416                        9072, 9996
SNV detrend                      15        15        2555                        9096, 9804
Savitzky-Golay 2nd derivative    15        15        8381                        9792, 9852
(k = 2)
Reflectance                      15        15        12                          8424, 9672
SNV                              15        15        3337                        9768, 9936
Detrend                          15        15        2363                        9072, 9996
SNV detrend                      15        15        1991                        9096, 9804
Savitzky-Golay 2nd derivative    15        15        5908                        9792, 9852

Table 2.9. knn results for lactose monohydrate grades Regular (n = 9) and Fastflo (n = 9) using pairs of wavelengths.

                                 Correct identification
Data                             Regular   Fastflo   Suitable wavelength pairs   Example wavelength pair/cm⁻¹
(k = 1)
Reflectance                      9         8         -                           8916, 9936
SNV                              9         9         2602                        9756, 9828
Detrend                          9         9         34543                       9804, 9924
SNV detrend                      9         9         5101                        9756, 9840
Savitzky-Golay 2nd derivative    9         9         6144                        9792, 9852
(k = 2)
Reflectance                      8         9         -                           9624, 9996
SNV                              9         9         364                         8028, 8136
Detrend                          9         9         7942                        9096, 9432
SNV detrend                      9         9         414                         9240, 9360
Savitzky-Golay 2nd derivative    9         9         6144                        9792, 9852

Fig. 2.14. k nearest neighbour (knn) and Hotelling's T² classification (95% control ellipses) of microcrystalline cellulose and lactose monohydrate grades using principal components scores. Microcrystalline cellulose (Savitzky-Golay 2nd derivative of reflectance data): A) knn (100% correct identification, k = 1), B) Hotelling's T² control ellipses (overlapped); lactose monohydrate (quadratic baseline corrected reflectance data): C) knn (100% correct identification, k = 1), D) Hotelling's T² control ellipses (overlapped).

For lactose monohydrate grades, the best classification results were obtained using PCs

1 and 2 (Fig 2.14C) or 2 and 3 and a value of one for k and quadratic baseline corrected

spectra (Table 2.7). This yielded 100% correct identification for all samples in both

grades. This optimum pre-treatment also confirms that found for the method using pairs

of wavelengths.

Overall, the method using PC scores was preferred owing to the greatly reduced

computation time required to set-up the method (approximately one minute). Prediction

of the grades of future samples could be achieved virtually instantaneously.

2.11.7 Hotelling's T² Control Ellipses

For comparison with the predictive ability of the knn method, Hotelling's T² control

ellipses (Montgomery, 1997) were produced for both materials with the PCs that

yielded best identification, as described in Section 1.8.1. The confidence level was set to

95%. With both microcrystalline cellulose and lactose monohydrate, the 95%

Hotelling’s T² ellipses for the two grades were found to overlap, resulting in some

incorrect assignment of samples for each grade (Fig. 2.14B & D). The two materials are

chemically different and should belong to different multinormal distributions.
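A control ellipse of this kind can be sketched for two PC scores as below. The sketch is an assumption, not the procedure of Section 1.8.1: it uses a commonly quoted limit for individual future observations, T² ≤ p(n − 1)(n + 1) / (n(n − p)) · F(p, n − p), with p = 2, and a closed-form quantile of the F distribution that is valid only when the numerator degrees of freedom equal 2. Function names and data are hypothetical.

```python
import numpy as np

def f_quantile_2(nu, conf):
    """Quantile of the F(2, nu) distribution (closed form for 2 numerator d.f.)."""
    return 0.5 * nu * ((1.0 - conf) ** (-2.0 / nu) - 1.0)

def t2_limit(n, conf=0.95):
    """Assumed T2 limit for a new observation with p = 2 scores and n reference samples."""
    p = 2
    return p * (n - 1) * (n + 1) / (n * (n - p)) * f_quantile_2(n - p, conf)

def t2_values(scores, points):
    """Hotelling's T2 of each point relative to the reference score cloud."""
    mean = scores.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(scores, rowvar=False))
    diff = np.atleast_2d(points) - mean
    return np.einsum("ij,jk,ik->i", diff, S_inv, diff)

# Hypothetical reference scores for one grade on two PCs (15 samples).
rng = np.random.default_rng(5)
scores = rng.normal(size=(15, 2))

limit = t2_limit(len(scores), conf=0.95)
t2_centre = t2_values(scores, scores.mean(axis=0))[0]
t2_far = t2_values(scores, [10.0, 10.0])[0]
print(round(limit, 2), t2_centre <= limit, t2_far <= limit)
```

A sample is assigned to a grade when its T² falls inside that grade's limit; where the ellipses of two grades overlap, a sample can fall inside both, which is the source of the incorrect assignments noted above.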

2.11.8 Identification of The Materials Using Hotelling's T² Control Ellipses

To confirm whether the Hotelling’s T² control ellipses are able to distinguish the two

materials, principal components models were constructed for each material using the

pre-treatments found optimum with knn. Hotelling’s T² control ellipses were produced

for the samples of the material (both grades) used to calculate the PC model. With both

PC models (microcrystalline cellulose and lactose monohydrate) the other material’s

pre-treated spectra were centred using the mean of the data set for the model. These

were then projected into the pattern space. For both materials, 100% correct

identification of the modelled class was obtained with all samples for the non-modelled

material falling outside the 99% control ellipse (Fig. 2.15A &B).

2.12 Conclusion

The shapes of near infrared spectra (i.e. baseline curvature, spectral offset, absorption

peak height and width) of powdered pharmaceutical materials are clearly influenced by

the particle size distribution of the material. This enables calibration of the data to

measure and classify the particle size of materials. Extraction of particle size

distribution information requires multivariate calibration of a reference set of NIR

spectral data with reference particle size data. This was found to be most accurate with

raw spectral data and by PLSR. Alternatively, materials may be classified by grade

using simple non-parametric methods, e.g. knn. This method does not require

reference particle size data; however, scatter correction of the spectral data was found to

be necessary for the two materials studied. This is most likely necessary to reduce

within class variance and increase between class variance. As the optimum scatter

correction was found to be different for the two materials studied in this chapter,

implementation of this method for other materials requires that the optimum scatter

correction be determined first.

Fig. 2.15. Identification of microcrystalline cellulose and lactose monohydrate using Hotelling's T² control ellipses (99% control ellipses) of principal components (PC) models for each material. Microcrystalline cellulose PC model (Savitzky-Golay 2nd derivative data used): A) 100% correct identification of microcrystalline cellulose (n = 30). Lactose monohydrate PC model (quadratic baseline detrend data used): B) 100% correct identification of lactose monohydrate (n = 18).

CHAPTER 3

Multivariate Statistical Process Control of a

Pharmaceutical Manufacturing Process

Using Principal Components Analysis of

Near Infrared Measurements

3.1 Introduction

Traditional quality control of pharmaceutical manufacturing processes involves

collection and analysis of process intermediates and final product to ensure that the

product meets the requirements of its specification (Candolfi et al, 1999a, 1999b;

Sekulic et al, 1998). This approach to process monitoring is time consuming and does

not monitor the performance of the process over time.

In this chapter, an alternative method for pharmaceutical quality control that enables

monitoring of the process operating performance of a tabletting process is discussed. In

Section 3.5, near infrared spectrometric analysis is used to provide process based

measurements at each process stage. Section 3.7 describes the use of the data reduction

method, principal components analysis (PCA), for the multivariate statistical process

control methods used. In Sections 3.8 and 3.9, multivariate models are generated and

monitored for the first process stage (blend) and second process stage (tablet). The

combined process measurements are used in Section 3.10 to develop an overall process

fingerprint from which unusual product batches may be identified. In Section 3.12

conclusions are drawn as to the advantages of this form of quality control over

traditional methods.

3.2 Background And Overview of The Process

The manufacturing process of the marketed tablet can be viewed as three steps:

1. Raw materials dispensing,

2. Blending of excipients and active,

3. Tabletting of blend.

The powdered raw materials used in the blends are: microcrystalline cellulose, sodium

starch glycolate, dibasic calcium phosphate anhydrous, drug substance and magnesium

stearate. The first four of these raw materials are blended together for a set time before

being passed through a Fitzmill screen, to remove lumps from the blender contents.

Magnesium stearate is then added to the blender, and the contents are further blended

for a specified time. At the end of the blend stage, samples of blend are collected for

analysis (usually a three-point sample) by traditional analytical techniques (Section 3.4)

before proceeding to tabletting. At this stage, the process can be delayed as samples

must be analysed in the laboratory, away from the production area.

Tabletting of the blend produces one of two strengths of tablet, depending on the fill

weight and size of die chosen. Samples of tablets are then analysed by reference

analytical methods (See Section 3.4).

The pharmaceutical process studied was one which was normally well controlled.

However, during a period of manufacture, a number of batches of blends and tablets

were produced which were of lower quality. Some of these problem batches were

identifiable from reference analysis of the blends, however most could not be. These

produced batches of tablets which were friable, had high average tablet thickness and

which showed prolonged tablet dissolution time. All of these unusual batches of tablets

had analytical results within the limits of the product specification, however they each

showed results close to the limits and were therefore considered unusual.

3.3 Materials

A total of 205 production batches of blends and tablets were examined. These were

supplied by the manufacturer (Pfizer Ltd., Pfizer Central Research, 1 Ramsgate Road,

Sandwich, Kent, CT13 9NJ, UK.).

Blends were analysed for 193 of the total number of batches studied.

Of these batches, 13 were re-blended batches (as a result of unusual reference analysis

results) and one was a placebo batch. With each batch of blend, specimens were

supplied which were taken from different locations within the blender (Flobin): three

locations, or seven if the batch was re-blended. Samples were collected by the Quality Assurance

laboratory over a six year period. A total of 1881 blend NIR absorbance spectra were

measured.

The tablets obtained were of two strengths of the drug substance. Both strengths are

manufactured from an identical blend composition and hence differ in size. There were

44 batches of lower strength tablets and 43 batches of higher strength

tablets. The tablets themselves were white and were embossed with the Pfizer logo on

one side and embossed with distinguishing lettering on the other side.

For the two strengths of tablet, 39 batches of lower strength tablets were supplied with

their corresponding blends and 41 batches of the higher strength tablets were supplied

with their corresponding blends.

3.4 Reference Analytical Data

Reference laboratory data were provided for most batches of blends and tablets.

For blended powders, the variables measured were:

1. drug substance content/mg g⁻¹;

2. blend uniformity (%);

3. moisture content (%);

4. moisture deviation (%);

5. UV specific absorbance, A(1%, 1 cm), of a 1% filtered solution of 1 cm pathlength.

Blend uniformity is the maximum absolute difference in drug substance contents, of

replicate assays for a batch, from their mean value, expressed as a percentage. Moisture

deviation is the maximum absolute difference in moisture contents of replicate assays

for a batch from their mean value, expressed as a percentage. These tests were performed

for blends used to produce the lower and higher strength tablets (n = 193 batches).
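The two deviation measures defined above can be sketched as follows. This is only an illustration: the function name and the replicate values are hypothetical, and it assumes that "expressed as a percentage" means a percentage of the batch mean, which the text does not state explicitly.

```python
def max_abs_deviation_percent(replicates):
    """Largest absolute deviation of any replicate from the batch mean.

    Assumption: the deviation is reported as a percentage of the batch mean.
    """
    mean = sum(replicates) / len(replicates)
    return 100.0 * max(abs(x - mean) for x in replicates) / mean


# Hypothetical three-point sample of drug substance content (mg per g of blend).
assays = [24.9, 25.1, 25.3]
uniformity = max_abs_deviation_percent(assays)
print(f"blend uniformity = {uniformity:.2f} %")
```

The same function would apply to replicate moisture contents to give the moisture deviation.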

For tablets, the variables measured were:

1. drug substance content per tablet/mg;

2. content uniformity (%);

3. moisture (%);

4. dissolution (%);

5. disintegration/secs;

6. average tablet weight/mg;

7. average tablet hardness/kPa;

8. average tablet thickness/mm;

9. friability/mg.

The tablet tests are standard pharmacopoeial tests and were performed for the lower

strength tablets (n = 44 batches) and for the higher strength tablets (n = 43 batches).

Summaries of the reference analytical data for the blends (Section 3.8) and lower and

higher strength tablets (Section 3.9) are provided in Tables 3.1, 3.2 and 3.3 respectively.

Summaries of the combined blend and tablet data sets (multiway data sets*) used in

Section 3.10 are provided in Tables 3.4 and 3.5 respectively. The batch numbers of

product, and their indices in data sets (blend (Section 3.8), lower strength tablet (Section

3.9), higher strength tablet (Section 3.9), and multiway blend and tablet data sets

(Section 3.10)) are provided in Table 3.6 (N.B. some C. of A. reference analysis results

were missing for some batches).

* Multiway data sets are three-dimensional data sets that may be unfolded to form a two-dimensional array.
Table 3.1. Summary of blend C. of A. data (n = 193 batches).

Variable                         Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg g⁻¹    24.98       0.41                25.8         24.2         185
Blend uniformity (%)             0.88        0.63                3.0          0.0          185
Moisture content (%)             2.98        0.23                4.0          -            185
Moisture deviation (%)           2.23        1.64                5.0          0.0          172
A(1%, 1 cm)                      115.32      1.19                -            -            182

Table 3.2. Summary of lower strength tablet C. of A. data (n = 44 batches).

Variable                         Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg        4.98        0.08                5.15         4.85         40
Content uniformity (%)           1.70        0.93                6.0          0.0          40
Moisture content (%)             3.14        0.31                4.5          -            40
Dissolution (%)                  99.15       1.88                100          90           40
Disintegration/secs              9.63        0.95                600          -            40
Mean weight/mg                   199.52      0.37                -            -            40
Hardness/kPa                     13.60       0.94                17           9            40
Thickness/mm                     3.31        0.04                3.5          3.2          40
Friability/mg                    0.28        0.60                -            -            40

Table 3.3. Summary of higher strength tablet C. of A. data (n = 43 batches).

Variable                         Mean value  Standard deviation  Upper limit  Lower limit  Batches
Drug substance content/mg        10.00       0.10                10.3         9.7          43
Content uniformity (%)           1.36        0.69                6.0          0.0          43
Moisture content (%)             3.13        0.32                4.5          -            43
Dissolution (%)                  98.85       1.20                100          90           43
Disintegration/secs              10.71       1.19                600          -            43
Mean weight/mg                   399.70      0.65                -            -            43
Hardness/kPa                     14.32       0.71                17           9            43
Thickness/mm                     4.58        0.05                4.6          4.1          43
Friability/mg                    1.58        1.16                -            -            43

Table 3.4. Summary of multiway lower strength blend and tablet C. of A. data (n = 39 batches).

Variable                               Mean value  Standard deviation  Upper limit  Lower limit  Batches
Blend drug substance content/mg g⁻¹    24.81       0.33                25.8         24.2         37
Blend uniformity (%)                   0.90        0.50                3.0          0.0          37
Blend moisture content (%)             2.96        0.16                4.0          -            37
Blend moisture deviation (%)           2.52        1.66                5.0          0.0          37
Blend A(1%, 1 cm)                      115.62      1.41                -            -            37
Tablet drug substance content/mg       4.98        0.08                5.15         4.85         35
Tablet content uniformity (%)          1.85        0.95                6.0          0.0          35
Tablet moisture content (%)            3.23        0.30                4.5          -            35
Tablet dissolution (%)                 98.96       1.95                100          90           35
Tablet disintegration/secs             9.43        0.92                600          -            35
Tablet mean weight/mg                  199.50      0.33                -            -            35
Tablet hardness/kPa                    13.40       0.84                17           9            35
Tablet thickness/mm                    3.32        0.04                3.5          3.2          35
Tablet friability/mg                   0.20        0.47                -            -            35

Table 3.5. Summary of multiway higher strength blend and tablet C. of A. data (n = 41 batches).

Variable                               Mean value  Standard deviation  Upper limit  Lower limit  Batches
Blend drug substance content/mg g⁻¹    24.87       0.33                25.8         24.2         40
Blend uniformity (%)                   0.93        0.55                3.0          0.0          40
Blend moisture content (%)             2.92        0.32                4.0          -            40
Blend moisture deviation (%)           2.16        1.29                5.0          0.0          40
Blend A(1%, 1 cm)                      115.45      0.97                -            -            38
Tablet drug substance content/mg       10.00       0.10                10.3         9.7          41
Tablet content uniformity (%)          1.46        0.76                6.0          0.0          41
Tablet moisture content (%)            3.18        0.31                4.5          -            41
Tablet dissolution (%)                 98.73       1.20                100          90           41
Tablet disintegration/secs             10.67       1.22                600          -            39
Tablet mean weight/mg                  399.69      0.67                -            -            41
Tablet hardness/kPa                    14.25       0.72                17           9            41
Tablet thickness/mm                    4.58        0.04                4.6          4.1          41
Tablet friability/mg                   0.41        0.95                -            -            41

Table 3.6. Batch numbers and indices of blend, tablet and multiway blend and tablet data sets (n = 205 production batches).

Columns, left to right: batch number; blend batch index; lower strength tablet batch index; higher strength tablet batch index; MPCA/MBPLS lower strength batch index; MPCA/MBPLS higher strength batch index.
1 1 0 0 0 0
967 2 0 0 0 0
968 3 0 0 0 0
969 4 0 0 0 0
973 5 0 0 0 0
974 6 0 0 0 0
975 7 0 0 0 0
976 8 0 0 0 0
983 9 0 0 0 0
984 10 0 0 0 0
985 11 0 0 0 0
989 12 0 0 0 0
990 13 0 0 0 0
993 14 0 0 0 0
997 15 0 0 0 0
998 16 0 0 0 0
999 17 0 0 0 0
1006 18 0 0 0 0
1009 19 0 0 0 0
1010 20 0 0 0 0
1015 21 0 0 0 0
1016 22 0 0 0 0
1017 23 0 0 0 0
1024 24 0 0 0 0
1025 25 0 0 0 0
1029 26 0 0 0 0
1031 27 0 0 0 0
1034 28 0 0 0 0
1469 29 0 0 0 0
1969 30 0 0 0 0
1971 31 0 0 0 0
1973 32 0 0 0 0
1975 33 0 0 0 0
1977 34 0 0 0 0
1979 35 0 0 0 0
1981 36 0 0 0 0
1983 37 0 0 0 0
1987 38 0 0 0 0
1990 39 0 0 0 0
1991 40 0 0 0 0
1992 41 0 0 0 0
1997 42 0 0 0 0
1998 43 0 0 0 0
1999 44 0 0 0 0
2000 45 0 0 0 0
2005 46 0 0 0 0
2007 47 0 0 0 0
2011 48 0 0 0 0
2011 (re-blend) 49 0 0 0 0
2017 50 0 0 0 0
2019 51 0 0 0 0
2021 52 0 0 0 0
2023 53 0 0 0 0
2025 54 0 0 0 0
2027 55 0 0 0 0
2029 56 0 0 0 0
2031 57 0 0 0 0
2033 58 0 0 0 0
2035 59 0 0 0 0
2039 60 0 0 0 0

2041 61 0 0 0 0
2043 62 0 0 0 0
2045 63 0 0 0 0
2047 64 0 0 0 0
2049 65 0 0 0 0
2051 66 0 0 0 0
2055 67 0 0 0 0
2057 68 0 0 0 0
2061 69 0 0 0 0
2063 70 0 0 0 0
2967 71 0 0 0 0
2967 (re-blend) 72 0 0 0 0
2969 73 0 0 0 0
2981 74 0 0 0 0
2983 75 0 0 0 0
2985 76 0 0 0 0
2987 77 0 0 0 0
2988 78 0 0 0 0
2989 79 0 0 0 0
2990 80 0 0 0 0
2994 81 0 0 0 0
3011 82 0 0 0 0
4050 0 0 1 0 0
4058 83 0 0 0 0
4059 84 0 0 0 0
4978 0 0 2 0 0
4980 0 0 3 0 0
4983 0 1 0 0 0
4984 0 2 0 0 0
4985 0 3 0 0 0
4986 0 4 0 0 0
4987 0 5 0 0 0
4988 0 6 0 0 0
4989 0 7 0 0 0
4990 0 0 4 0 0
4991 85 0 5 0 1
4992 86 0 6 0 2
4993 87 8 0 1 0
4994 88 9 0 2 0
4995 0 10 0 0 0
4996 89 11 0 3 0
4997 90 12 0 4 0
4998 91 13 0 5 0
4999 92 0 7 0 3
5000 93 0 8 0 4
5001 94 0 9 0 5
5002 95 14 0 6 0
5003 96 15 0 7 0
5004 97 0 0 0 0
5005 98 0 0 0 0
5006 99 0 0 0 0
5007 100 0 0 0 0
5008 101 0 0 0 0
5008 (re-blend) 102 0 0 0 0
5009 103 0 10 0 6
5010 104 0 11 0 7
5011 105 0 12 0 8
5012 106 0 0 0 0
5013 107 16 0 8 0
5014 108 0 0 0 0
5015 109 0 0 0 0
5016 110 17 0 9 0
5017 111 18 0 10 0
5018 112 19 0 11 0
5019 113 0 0 0 0
5020 114 0 0 0 0
5021 115 0 13 0 9

5022 116 0 14 0 10
5023 117 0 15 0 11
5024 118 0 16 0 12
5025 119 0 0 12 0
5025 (re-blend) 120 20 0 13 0
5026 121 21 0 14 0
5027 122 22 0 15 0
5028 123 0 0 16 0
5028 (re-blend) 124 23 0 17 0
5029 125 24 0 18 0
5030 126 0 0 0 0
5031 127 0 17 0 13
5032 128 0 18 0 14
5033 129 0 19 0 15
5035 130 25 0 19 0
5036 131 26 0 20 0
5037 132 27 0 21 0
5038 133 28 0 22 0
5039 134 0 0 23 0
5039 (re-blend) 135 29 0 24 0
5040 136 30 0 25 0
5041 137 31 0 26 0
5042 138 32 0 27 0
5043 139 0 0 0 16
5043 (re-blend) 140 0 0 0 17
5043 (re-blend) 141 0 20 0 18
5044 142 0 21 0 19
5045 143 0 0 0 0
5045 (re-blend) 144 0 0 0 0
5045 (re-blend) 145 0 0 0 0
5045 (re-blend) 146 0 0 0 0
5045 (re-blend) 147 0 0 0 0
5045 (re-blend) 148 0 0 0 0
5046 149 0 22 0 20
5047 150 0 23 0 21
5048 151 0 24 0 22
5049 152 0 25 0 23
5050 153 0 26 0 24
5051 154 0 27 0 25
5052 155 0 28 0 26
5053 156 0 29 0 27
5054 157 0 30 0 28
5055 158 33 0 28 0
5056 159 34 0 29 0
5057 160 35 0 30 0
5058 161 0 31 0 29
5059 162 0 32 0 30
5060 163 0 33 0 31
5061 164 36 0 31 0
5062 165 37 0 32 0
5063 166 0 0 0 0
5064 167 38 0 33 0
5065 168 39 0 34 0
5067 169 40 0 35 0
5068 170 0 34 0 32
5069 171 0 35 0 33
5070 172 0 36 0 34
5071 173 0 37 0 35
5072 174 0 0 0 0
5073 175 0 38 0 36

5074 176 0 39 0 37
5075 177 41 0 36 0
5076 178 42 0 37 0
5077 179 43 0 38 0
5078 180 0 40 0 38
5079 181 0 41 0 39
5080 182 0 42 0 40
5081 183 0 43 0 41
5967* 184 44 0 39 0
5968 185 0 0 0 0
5969 186 0 0 0 0
5970 187 0 0 0 0
5970 (re-blend) 188 0 0 0 0
5971 189 0 0 0 0
5972 190 0 0 0 0
5972 (re-blend) 191 0 0 0 0
5974 192 0 0 0 0
5975 193 0 0 0 0

* Example - blend BN5967 has: batch index 184 in the blend data set; batch index 44 in the lower strength
tablet data set and batch index 39 in the multiway PCA and multiblock PLS data sets.
3.5 Near Infrared Measurements

In this study, NIR measurements were made using a grating instrument (Foss

NIRSystems, Maryland, USA) equipped with either a diffuse reflectance module (Rapid

Content Analyser) or a transmission module (Intact Analyser). The diffuse reflectance

module records spectra over the range 1100 to 2498 nm, at 2 nm intervals (700 data

points) and outputs spectra as absorbance (log10(1/R)). The reflectance reference used

with the RCA module was a ceramic tile. The transmission module records spectra over

the range 600 to 1898 nm, at 2 nm intervals (650 data points) and outputs spectra as

absorbance (log10(1/T)). The spectra recorded with the transmission module used air as

the reference.
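As a quick check of the figures quoted above, the sketch below rebuilds the two wavelength axes and the conversion to apparent absorbance. The function names are hypothetical, and the intensity ratio (sample relative to the reference tile or to air) is an illustrative value, not a measurement from this study.

```python
import math


def wavelength_axis(start_nm, stop_nm, step_nm=2):
    """Wavelengths from start to stop inclusive, at the stated interval."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def apparent_absorbance(ratio):
    """log10(1/R) for reflectance or log10(1/T) for transmission.

    `ratio` is the sample signal divided by the reference signal.
    """
    return math.log10(1.0 / ratio)


reflectance_axis = wavelength_axis(1100, 2498)   # diffuse reflectance module
transmission_axis = wavelength_axis(600, 1898)   # transmission module

print(len(reflectance_axis), len(transmission_axis))
print(apparent_absorbance(0.1))  # a sample returning 10% of the reference signal
```

The axis lengths agree with the 700 and 650 data points stated for the two modules.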

3.5.1 Sample Preparation And Presentation

All spectra recorded were the average of 50 scans, which is the default setting on the

instrument vendor’s software (NSAS).

Blends were scanned using the diffuse reflectance RCA module (as described in Section

2.6), in narrow soda-glass vials (henceforth these data are referred to as absorbance).

For each specimen, 3 soda glass vials (50 mm deep by 25 mm wide) were filled with

blend (approximately 8 g) and their NIR absorbance spectra recorded.

The tablets were scanned in both diffuse reflectance (RCA module, described in Section

2.6) and transmission modes (Intact module). Transmission measurements of the two

strengths of tablet were made by placing a tablet of either strength into a specially

machined aluminium template, with a circular hole underneath the tablet to allow

transmission of NIR radiation. For each measurement, the template and tablet were

placed inside the module, between the fibre optic probe and the lead sulphide detectors,

and the lid closed. To allow consistency of measurement, each tablet was scanned with

the Pfizer logo adjacent to the source of incident NIR radiation. Henceforth, diffuse

reflectance measurements of tablets are referred to as absorbance data; transmission

measurements of tablets, though converted to apparent absorbance data, are referred to

as transmission data. The numbers of NIR spectra measured for the lower strength

tablets were 4911 absorbance spectra (mean = 112 tablets per batch) and 4904

transmission spectra (mean = 111 tablets per batch). The numbers of NIR spectra

measured for the higher strength tablets were 2716 absorbance spectra (mean = 63

tablets per batch) and 2721 transmission spectra (mean = 63 tablets per batch). Spectra

were acquired over several weeks.

3.6 Data Analysis And Pre-treatment

As NIR spectral data are often highly collinear, they generally require pre-processing -

to remove multiple scatter effects - followed by multivariate analysis to extract useful

chemical and physical information. A number of different scatter pre-treatments of

blend and tablet data were therefore examined. These were:

1. Raw spectral data (absorbance/transmission);

2. SNV;

3. DT;

4. SNV-DT;

5. Sg2d11.

The raw data and these 4 pre-treatments were examined for blend absorbance data, and for absorbance and

transmission data for each of the two strengths of tablet. This allowed for a total of 55

data sets (including multiway data sets of appended blend and tablet data) to be studied

via principal components analysis (equation (1.7.1)) (Sections 3.6.2 and 3.6.3).
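To illustrate what a scatter pre-treatment does, here is a minimal sketch of the SNV transform, assuming the usual definition in which each spectrum is centred on its own mean and scaled by its own standard deviation (n - 1 in the denominator). The spectra are made-up values; DT and Sg2d11 are not shown.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    return [(a - mean) / sd for a in spectrum]


# Two hypothetical spectra of the same material: the second differs from the
# first only by an additive offset and a multiplicative scatter factor.
spectrum_a = [0.20, 0.40, 0.90, 0.50]
spectrum_b = [2.0 * a + 0.3 for a in spectrum_a]

snv_a = snv(spectrum_a)
snv_b = snv(spectrum_b)
print(max(abs(x - y) for x, y in zip(snv_a, snv_b)))  # effectively zero
```

After SNV the two spectra coincide, which is the sense in which the transform removes offset and multiplicative scatter variation within a data set.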

The data were analysed using code programmed in Matlab 5.2 Scientific and Technical

Programming Language (The Mathworks Inc., Natick, MA, USA).

3.6.1 Wavelength Selection

Spectral Characteristics

All spectra exhibited non-uniform baselines arising from multiple scatter. The SNV

transform was able to correct for variation in scatter within each data set. SNV coupled

with DT was able to correct for both variation in scatter within each data set as well as

correct for non-uniform baselines. The Sg2d11 transformation was able to correct both

variation in scatter and non-uniform baselines. In addition, it resolved chemical peaks in

the spectra which in the raw data were obscured as overlapped combinations and

overtones of fundamental mid infrared absorptions. However, in absorbance mode the

Sg2d11 spectra appeared noisier beyond 2200 nm.

Absorbance Spectra

The wavelength range scanned was used to generate PCA models for absorbance, SNV, DT and SNV-DT pre-treated spectral data. However, with the Sg2d11 transform, due to the increased noise beyond 2200 nm, the wavelength range used was truncated to 2200 nm†.

Transmission Spectra

With transmission measurements (raw, SNV, DT and SNV-DT pre-treated spectral data), the wavelength range selected was 750 to 1208 nm for higher strength tablets and 750 to 1350 nm for lower strength tablets. These wavelength ranges were selected to provide spectra within the dynamic range of the detector* and were different for the two strengths of tablet due to different tablet thickness. With the Sg2d11 transformation, this range was truncated to 1196 nm and 1338 nm respectively‡.

† Owing to the Sg2d11 transformation, the wavelength range used was 1110 to 2200 nm.
* The dynamic range of the detector is approximately 2 absorbance units on a baseline. The maximum absorbance at any wavelength should not exceed 6 absorbance units.

3.6.2 Principal Components Analysis Model Generation

PCA Models were generated for all data pre-treatments (absorbance/transmission, SNV,

DT, SNV-DT and Sg2d11) of each data set: blends (absorbance data); lower strength

tablets; higher strength tablets (absorbance data and transmission data for each tablet

strength); multiway data sets containing blend and tablet data (absorbance and/or

transmission). To eliminate systematic variation in the data due to scatter effects from

sample presentation geometry and variable optical properties of the soda glass vials, the

average spectrum for each batch was used for PCA. This was found to produce more

acceptable PCA models, requiring both fewer components and less cross validation

computation time.

With each PCA model, the variable mean-centred spectral data, X, were decomposed as

the product of a score matrix, T, and a loadings matrix, P, according to equation (1.7.1)

(Piovoso et al, 1992).
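The decomposition of mean-centred data into scores and loadings can be sketched with the NIPALS iteration, a common way of extracting principal components one at a time. This is an illustration only: the thesis code was written in Matlab and its algorithm is not stated here, the function names are hypothetical, and the small data matrix stands in for batch-average spectra.

```python
def mean_centre(X):
    """Subtract each column (variable) mean; return centred data and the means."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X], means


def nipals_pca(Xc, n_components, tol=1e-12, max_iter=500):
    """Return scores T (rows x A), loadings P (A x columns) and residuals E.

    Assumes Xc is already mean-centred and has at least n_components
    directions of non-zero variance.
    """
    E = [row[:] for row in Xc]          # residual matrix, deflated per component
    n_cols = len(E[0])
    score_vectors, loadings = [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        j = max(range(n_cols), key=lambda c: sum(row[c] ** 2 for row in E))
        t = [row[j] for row in E]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(row[c] * ti for row, ti in zip(E, t)) / tt for c in range(n_cols)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]   # unit-length loading vector
            t_new = [sum(a * b for a, b in zip(row, p)) for row in E]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Remove the variation explained by this component.
        E = [[e - ti * pc for e, pc in zip(row, p)] for row, ti in zip(E, t)]
        score_vectors.append(t)
        loadings.append(p)
    T = [list(r) for r in zip(*score_vectors)]
    return T, loadings, E


# Hypothetical batch-average spectra: 5 batches x 4 wavelengths.
spectra = [
    [0.51, 0.62, 0.80, 0.70],
    [0.55, 0.69, 0.91, 0.78],
    [0.49, 0.60, 0.77, 0.69],
    [0.60, 0.75, 0.95, 0.80],
    [0.53, 0.66, 0.88, 0.74],
]
Xc, column_means = mean_centre(spectra)
T, P, E = nipals_pca(Xc, n_components=2)
```

Here `T` holds one row of PC scores per batch and `P` one loading vector per component, so that the centred data equal the product of scores and loadings plus the residuals `E`.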

The rank of each PCA model was determined by ‘leave-one-out’ cross-validation and

involved calculation of predicted residual error sum of squares (PRESS) and R and W

(Eastment and Krzanowski, 1982) statistics. In addition, the cumulative percentage sum

of squares (%SS) (Lindberg et al, 1983) for each extracted PC was calculated and a chi-

squared significance test was performed on the eigenvalues (Jackson, 1991) (Appendix

B). Once generated, these PCA models were amenable to multivariate statistical process

control (MSPC) methods, described in Section 3.7.

‡ Owing to the Sg2d11 transformation, the wavelength ranges used were: 760 to 1196 nm with transmission spectra of higher strength tablets and 760 to 1338 nm with transmission spectra of lower strength tablets.
Blend And Tablet PCA Model Loadings

With the blend data-set, the PCA loadings for each pre-treatment appeared to contain

physical and chemical information. An example of the PCA loadings for absorbance

spectra (n = 193 batches) is given in Fig. 3.1. The first two loadings appeared to

represent physical information, and showed a general reduction in value across the

wavelength range. One PC’s loadings represent absorptions at 1930 nm and 1410 nm,

which are characteristic absorptions of water (Osborne et al, 1993) (Fig. 3.1E).

The loadings of PCA models generated from tablet absorbance and transmission data

also appeared to contain physical and chemical information. An example of the loadings

of absorbance and transmission spectra of lower strength tablets is given in Fig. 3.2 and

Fig. 3.3 respectively (n = 44 batches). The loadings of the third PC of the lower strength

tablet absorbance spectra also seemed to contain features characteristic of water (Fig.

3.2C).

3.6.3 Characterising The Entire Process Using Multiway PCA

In Section 3.6.2, the method of PCA is described for the blend and tablet stage data sets.

The PC scores from these data sets may be monitored by MSPC procedures so that

deviations in process performance at each stage may be identified. However, it was also

considered desirable to examine how the process performed overall. Multiway PCA

(MPCA) (Kresta et al, 1991; Skagerberg et al, 1992; Nomikos and MacGregor, 1994) is

a data reduction method that enables the data of an entire chemical process to be

described by a few latent variables and was therefore examined. The variability among

batches with respect to their variables and time variation were studied by MPCA which

enabled the process to be summarised as a ‘fingerprint’ in reduced dimensional space.

Fig. 3.1. Blends NIR absorbance spectra PCA loadings (n = 193 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8. [Eight panels of loadings plotted against wavelength/nm, 1200 to 2400 nm.]

Fig. 3.2. Lower strength tablet NIR absorbance spectra PCA loadings (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7 & H) PC8. [Eight panels of loadings plotted against wavelength/nm, 1200 to 2400 nm.]

Fig. 3.3. Lower strength tablet NIR transmission spectra PCA loadings (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6; G) PC7; H) PC8 & I) PC9. [Nine panels of loadings plotted against wavelength/nm, 800 to 1300 nm.]

In MPCA, the three-way array of spectral data, X, may be decomposed into a series of A principal components consisting of scores matrices, $T_a$, and loadings vectors, $p_a$, plus a residual array, E, which is as small as possible in a least squares sense (Bharati and MacGregor, 1998; Wold et al, 1987; Westerhuis and Coenegracht, 1997):

$$X = \sum_{a=1}^{A} T_a \otimes p_a + E \qquad (3.6.1)$$

where ⊗ represents the Kronecker product. Mathematically, MPCA is equivalent to

performing ordinary PCA (Section 3.6.2) on the unfolded array, X (Wold et al, 1987). In

this chapter, data reduction was achieved by performing MPCA on variable mean-

centred unfolded three-way data sets (blend and tablet NIR data [absorbance and/or

transmission]). Two-way arrays of data were produced by slicing each three-way array,

X, vertically and appending the tablet absorbance and/or transmission data set(s) to the

right of the blend absorbance data set. This provided two-way arrays of rows of

observations {i.e. batch) versus columns equal to the number of combined wavelengths.

For the lower strength tablet process, the two-way arrays therefore contained NIR

spectral data of 39 batches of blends and their corresponding tablets. The higher

strength tablet process two-way arrays therefore contained NIR spectral data for 41

batches of blends and their corresponding tablets. PCA was then performed on these

unfolded three-way arrays, as for the discrete data sets. The models were derived from

batch average spectra with rank determined as for two-way PCA models (Section 3.6.2).

The effects of the different data pre-treatments on MPCA models were also

investigated. Once generated, these MPCA models were amenable to multivariate

statistical process control (MSPC) methods, described in Section 3.7.
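The unfolding step described above, in which the tablet data are appended to the right of the blend data for each batch and the result is variable mean-centred, can be sketched as follows. The function names and the tiny data blocks are hypothetical; real blocks would hold one batch-average spectrum per row with several hundred wavelengths each.

```python
def append_blocks(blend, *tablet_blocks):
    """Build the two-way array: one row per batch, blocks joined left to right."""
    for block in tablet_blocks:
        if len(block) != len(blend):
            raise ValueError("every block needs one row per batch")
    unfolded = []
    for i, blend_row in enumerate(blend):
        row = list(blend_row)
        for block in tablet_blocks:
            row.extend(block[i])        # tablet data go to the right of the blend data
        unfolded.append(row)
    return unfolded


def mean_centre_columns(X):
    """Subtract the mean of each column (combined wavelength variable)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


# Hypothetical data: 3 batches; 4 blend, 4 tablet absorbance and 3 tablet
# transmission wavelengths.
blend_abs = [[0.50, 0.61, 0.80, 0.70], [0.52, 0.65, 0.84, 0.72], [0.49, 0.60, 0.79, 0.69]]
tablet_abs = [[0.40, 0.55, 0.75, 0.66], [0.43, 0.57, 0.78, 0.68], [0.41, 0.54, 0.74, 0.65]]
tablet_trans = [[1.20, 1.35, 1.50], [1.25, 1.38, 1.56], [1.19, 1.33, 1.49]]

unfolded = append_blocks(blend_abs, tablet_abs, tablet_trans)
centred = mean_centre_columns(unfolded)
print(len(unfolded), "batches x", len(unfolded[0]), "combined wavelengths")
```

Ordinary PCA applied to `centred` then gives the MPCA model, with each batch summarised by a single row of scores.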

3.7 Multivariate Statistical Process Control Methods

NIR spectra of the blends and tablets were used for MSPC purposes. The dimensionality

of these data was first reduced by subjecting them to PCA (equation (1.7.1), Section 1.7). Using

the resulting PC scores, sample variance-covariance matrices of the process at each

stage and of the overall process, were estimated. This is known as ‘Control phase 1’

(Alt, 1985; Statistical Process Control, ed. Mamzic, 1995) and was performed to

establish statistical control levels. The method employed a Monte Carlo search as an

optimisation procedure to determine batches produced whilst the process operated

within statistical control (Section 3.8.3). The optimisation procedure was therefore used

to estimate both the theoretical process mean PC score vector and the theoretical sample

variance-covariance matrix at each stage.

Once these were estimated, all batches examined for each model were then treated as

future production samples for control phase 2 monitoring (Alt, 1985). This involved

monitoring the PC scores of these batches, at each process stage, by means of residual

analysis, using the sample variance-covariance matrix and process mean PC score

vector determined in control phase 1. The ability of the method to monitor the process

overall, from the raw materials through to the finished dosage form, was determined.

3.7.1 Hotelling's T² Control Phase 1

The generalised Hotelling's T² was calculated for the average spectrum of each batch with the PCA models examined, according to equation (1.8.4), with the upper control limit, UCL, for this statistic calculated according to equation (1.8.5). The value of α of the upper 100α% critical point of the F distribution was set to 0.95 (warning level) and 0.99 (control level).
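For orientation, the sketch below computes a Hotelling's T² value in its standard quadratic form, (t - m)' S⁻¹ (t - m), from a batch's PC score vector t, a process mean score vector m and a variance-covariance matrix S. It is an assumption that equation (1.8.4) takes this form, since that equation is not reproduced in this chapter; the upper control limit is not computed here because it needs F distribution quantiles. All names and numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def hotelling_t2(score, mean, cov):
    """Quadratic-form T2 of one score vector about the process mean."""
    d = [s - m for s, m in zip(score, mean)]
    w = solve(cov, d)                   # w = S^-1 d without forming the inverse
    return sum(a * b for a, b in zip(d, w))


# Hypothetical two-PC example.
process_mean = [0.0, 0.0]
process_cov = [[2.0, 1.0], [1.0, 2.0]]
t2 = hotelling_t2([1.0, 1.0], process_mean, process_cov)
print(t2)
```

A batch would be flagged when its T² exceeds the warning or control limit obtained from equation (1.8.5).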

3.7.2 Multivariate Exponentially Weighted Moving Average (MEWMA)

This statistic was calculated for the batches examined with each PCA model, and used

the process mean score vector and sample variance-covariance matrix determined in

control phase 1. The upper control limits used with this chart were based on tabulated

data by Lowry et al. (1992) for PCA models with 4 or fewer PCs. PCA models with a

greater number of PCs were assigned control limits based on the chi-squared

distribution (Neave, 1995) multiplied by a factor of 1.05, as suggested by Lowry et al

(1992).
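A minimal sketch of the MEWMA recursion of Lowry et al. (1992) follows. Two simplifications are assumptions made for this example rather than details from the study: the smoothing constant r = 0.1 is arbitrary, and the score covariance matrix is taken to be diagonal so that no matrix inversion is needed. Control limits are not computed; in the text they come from tables or from a scaled chi-squared value.

```python
def mewma_chart(scores, mean, variances, r=0.1):
    """MEWMA statistic for a sequence of PC score vectors.

    z_i = r (x_i - mean) + (1 - r) z_(i-1), starting from z_0 = 0, and the
    charted value is z_i' Cov(z_i)^-1 z_i with
    Cov(z_i) = r [1 - (1 - r)^(2 i)] / (2 - r) * Cov(x).
    Assumption: Cov(x) is diagonal with the given per-PC variances.
    """
    z = [0.0] * len(mean)
    chart = []
    for i, x in enumerate(scores, start=1):
        z = [r * (xk - mk) + (1.0 - r) * zk for xk, mk, zk in zip(x, mean, z)]
        scale = r * (1.0 - (1.0 - r) ** (2 * i)) / (2.0 - r)
        chart.append(sum(zk * zk / (scale * vk) for zk, vk in zip(z, variances)))
    return chart


# Hypothetical sequence of batches with a small sustained shift on the first PC.
shifted_scores = [[1.0, 0.0]] * 10
chart = mewma_chart(shifted_scores, mean=[0.0, 0.0], variances=[1.0, 1.0])
print([round(v, 2) for v in chart])
```

The charted value grows from batch to batch under a sustained shift, which is why this chart is sensitive to small persistent deviations that a single-batch T² would miss.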

3.7.3 Process Variance: Sample Generalised Variance

All PCA models calculated in this chapter comprised more than two PCs, hence process

variance was monitored using Anderson’s asymptotic normal approximation.

Process Variance Control Phases 1 and 2

To implement this MSPC procedure, 9 spectra were selected at random for each batch,

from the library of data. This number was chosen as it was greater than the maximum

number of retained PCs in any model (8 PCs) and was equivalent to the lowest number

of samples measured for a blend (9 spectra). This number was also used for tablet

models and is close to the number used in British Pharmacopoeial assay methods (9

tablets out of 10 must have uniformity of content within 80 and 120% of specified

amount) (British Pharmacopoeia 1999, 1999); however, more tablet spectra could have

been used. Multiway PCA models which combined blend and tablet data were limited to

9 samples per batch.

The randomly selected spectra were mean-centred and projected onto the eigenvectors

for each model. The resulting PC scores were then used to calculate Anderson’s

asymptotic normal approximation for each batch.
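The calculation can be sketched as below, assuming the commonly quoted form of Anderson's approximation, in which sqrt(m - 1) (|S| / |S0| - 1) is approximately normal with mean 0 and variance 2p for m observations on p PCs, S being the batch covariance matrix of the scores and S0 the reference covariance matrix. The exact expression used in the thesis is defined in an earlier section and may differ; the one-sided limit, function names and scores are illustrative.

```python
import math
from statistics import NormalDist


def covariance(rows):
    """Sample variance-covariance matrix (n - 1 denominator) of row observations."""
    m = len(rows)
    means = [sum(c) / m for c in zip(*rows)]
    centred = [[x - mu for x, mu in zip(r, means)] for r in rows]
    p = len(means)
    return [[sum(r[a] * r[b] for r in centred) / (m - 1) for b in range(p)]
            for a in range(p)]


def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(r) for r in A]
    n = len(M)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det                  # a row swap changes the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return det


def anderson_statistic(batch_scores, reference_cov):
    """sqrt(m - 1) * (|S| / |S0| - 1) for one batch of PC scores."""
    m = len(batch_scores)
    ratio = determinant(covariance(batch_scores)) / determinant(reference_cov)
    return math.sqrt(m - 1) * (ratio - 1.0)


# Hypothetical PC scores for 9 spectra of one batch on 2 PCs.
batch_scores = [[0.1, -0.2], [0.4, 0.3], [-0.3, 0.1], [0.2, -0.1], [-0.5, 0.2],
                [0.0, -0.4], [0.3, 0.5], [-0.1, -0.3], [-0.1, -0.1]]
reference_cov = covariance(batch_scores)             # stand-in for the phase 1 estimate
inflated = [[2.0 * v for v in row] for row in batch_scores]

p = len(reference_cov)
upper_limit = NormalDist(0.0, math.sqrt(2 * p)).inv_cdf(0.999)  # assumed one-sided limit
print(anderson_statistic(batch_scores, reference_cov))
print(anderson_statistic(inflated, reference_cov), upper_limit)
```

Doubling the spread of the scores multiplies the generalised variance by 16 for two PCs, so the second batch lies far above the limit while the first gives a statistic of zero.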

Preliminary control charts for this sample generalised variance were constructed with

various statistical control limits, however with most PCA models, this approach did not

produce control charts with a mean Anderson's asymptotic normal approximation of 0

and variance 2n (where n is the number of PCs) because some batches had significant

generalised variances above 99.9% limits. To overcome this problem, a recursive

algorithm was implemented for control phase 1 which excluded these batches from the

calculation of the theoretical sample generalised variance. The control phase 1 process

terminated when the mean of Anderson’s asymptotic normal approximation for the

batches used in this calculation was 0. This was found to be effective, and in control

phase 2 monitoring produced improved discrimination of batches with significantly high

Anderson’s asymptotic normal approximation.

3.7.4 Diagnosis of Out-of-Control Batches: Hotelling's T² Stacked Bar Charts And Shewhart-Type Plots on Individual PC Scores

In this study, the individual PC contributions to the multivariate T² of batches found to

be out of control were examined. The results of these plots may be found in Appendix B

(Tables: B35 to B49).
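As an illustration of such a decomposition, the sketch below assumes the PC scores are uncorrelated, so that T² is the sum of squared scores divided by their variances and each term is one segment of a stacked bar. The thesis does not state the exact decomposition used; `t2_contributions` is a hypothetical name.

```python
import numpy as np

def t2_contributions(scores, variances):
    """Per-PC terms t_a**2 / s_a**2; each row sums to that observation's T2."""
    scores = np.asarray(scores, dtype=float)
    return scores ** 2 / np.asarray(variances, dtype=float)

# One observation on three PCs with assumed score variances.
contrib = t2_contributions([[2.0, 0.5, -3.0]], [4.0, 1.0, 1.0])
print(contrib)              # [[1.   0.25 9.  ]]
print(contrib.sum(axis=1))  # [10.25]
```

In this made-up example the third PC accounts for most of the T² value, which is the kind of reading the stacked bar charts are intended to support.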

In addition, Shewhart control charts were constructed for the individual PC scores

(Jackson, 1991). These used the mean and range for each PC score sample, measured

for the historical data set. The overall or grand mean was, in most cases, equal to zero

(unless additional batches were projected into the model space) with each model, whilst

the range was equal to the standard deviation of the observations multiplied by the

corresponding value of the t-distribution (95% and 99% control limits). With blends, a

similar approach to the PC Shewhart method was adopted to diagnose PCs responsible

for significant process variance for batches which showed significant multivariate

dispersion (using Anderson’s asymptotic normal approximation) (Wise et al., 1991). To

determine the PCs on which significant variance occurred, the data set of each batch

was first sub-group mean-centred. The range was calculated from the centred batch data

with overall mean equal to zero.
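A minimal sketch of the sub-group mean-centring and the limit calculation is given below. The multiplier would be the t-distribution value referred to above; because the degrees of freedom are not stated, the sketch accepts it as an argument and falls back on a normal quantile, which is an assumption. Function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def subgroup_centre(scores, batch_ids):
    """Subtract each batch's own mean from its PC scores."""
    scores = np.asarray(scores, dtype=float)
    batch_ids = np.asarray(batch_ids)
    centred = scores.copy()
    for b in np.unique(batch_ids):
        rows = batch_ids == b
        centred[rows] -= scores[rows].mean(axis=0)
    return centred

def shewhart_limits(scores, conf=0.99, multiplier=None):
    """Lower and upper limits per PC: mean -/+ multiplier * standard deviation."""
    scores = np.asarray(scores, dtype=float)
    if multiplier is None:
        # Large-sample stand-in for the t-distribution value (assumption).
        multiplier = NormalDist().inv_cdf(0.5 + conf / 2.0)
    centre = scores.mean(axis=0)
    half_width = multiplier * scores.std(axis=0, ddof=1)
    return centre - half_width, centre + half_width
```

After sub-group centring the overall mean of each PC is zero, so the limits are symmetric about zero, as stated in the text.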

3.7.5 Generation of PCA Models of Normal Batches: Q Statistic PCA Residual Analysis

The Q statistic was used to identify batches which had not processed in the normal

manner (Nomikos and MacGregor, 1994) - and which were located outside the plane

defined by the principal axes. The implementation of this control chart on any PCA

model, involved an initial screening of the model with rank determined by the cross-

validation R statistic. Batches whose Q values exceeded 95% and 99% limits were

deemed to have processed unusually and were excluded from the 'normal' data set. A

full PCA cross-validation was then performed on the updated data set, and the control

limits for Q were recalculated based on the rank of the updated model. Those batches

which were excluded from the model were centred and projected onto the eigenvectors,

and their residuals and Q statistics calculated. Additional batches falling beyond the

99% limits were excluded, as were any batches whose Q values were above the 95%

control limits and were consecutive in product batch number with any batches already

eliminated from the 'normal' data set. This process was repeated until all unusual

batches had Q values exceeding the 99% level with no consecutive batches having Q

values above the 95% level.
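The Q statistic itself is the sum of squared residuals left after projecting a centred spectrum onto the retained eigenvectors. The sketch below shows only that calculation; the control limits and the cross-validation step are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_pcs):
    """Return the column means and the first n_pcs loading vectors (as columns)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_pcs].T

def q_statistic(X, mean, loadings):
    """Sum of squared residuals of each row after projection onto the model plane."""
    Xc = X - mean
    residual = Xc - Xc @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=1)
```

Batches excluded from the model can be passed to `q_statistic` with the stored mean and loadings, which corresponds to the centring and projection step described above.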

3.8 Multivariate Statistical Quality Control of Pharmaceutical Blends

3.8.1 Principal Components Analysis

A visual inspection of PC loadings confirmed that all PCs selected with the R statistic

described systematic variation (loadings showed chemical or physical information);

extra PC loadings appeared more random and were deemed to represent noise. The W

statistic which is another popular stopping criterion tended to select more components

than R before terminating and was therefore considered less useful. The cumulative

%SS explained by models was in excess of 99.5% for all models except the Sg2d11 data
set. This model explained only 94%SS, probably owing to increased noise from the

derivative transformation.

Between 6 and 8 PCs were required. The absorbance data set required the most PCs; its
first two component loadings appear to represent scatter information and show a general
decrease across the wavelengths (Fig. 3.1 A & B). With all PCA models, at least one PC
has high positive loadings at 1400 and 1900 nm and may represent water. This is found
on the 5th PC with absorbance data, the 2nd PC with SNV data and the 2nd PC with DT
data, the 1st PC with SNV DT data and the first PC with Sg2d11 data. The PC loadings
of the models also showed significant correlations with NIR absorbance spectra of the
excipients (Appendix B: Table B51).

Examination of the PC scores for all data sets reveals systematic variation which can be
clearly observed on the first PC for all PCA models. With the first PC for each model, the

batches were clearly divided into three main groups over time: low level; high level and

optimum level (about zero vector). This is shown for the PC scores of blend Sg2dl 1

data (Fig. 3.4). This systematic process variation on the first PC was considered to relate

to a change in the efficiency of the milling step during blending. Over the period of high

systematic process variation, the batches of drug substance used were more brittle and

less free flowing. These were consequently harder to mill. This was later confirmed by

NIR imaging microscopy of some of these blends (Chapter 5).

3.8.2 Unusual Batch Detection: Q Statistic

With this statistic, similar batches for all data sets were found to have processed

unusually and thereby not fit the plane defined by the model. The placebo blend (29)

was easily identified as an outlier with all models. Consideration of results from data

analysis of multiway PCA models suggested that the results of SNV DT and Sg2dl 1

models, which were consistent, provided the best indication of unusual blends. This is

Fig. 3.4. PC Shewhart control charts for PCA scores of NIR Savitzky-Golay 2nd
derivative of absorbance spectra of blends (n = 193 batches). A) PC1 scores; B)
PC2 scores; C) PC3 scores; D) PC4 scores; E) PC5 scores; F) PC6 scores & G) PC7
scores (95% and 99% control limits shown).

shown for SNV DT data set in Fig. 3.5. With these models, a number of consecutive

batch numbers were found to have processed unusually (Batches: 125 (BN5029), 126

(BN5030); 167 (BN5064), 168 (BN5065), 169 (BN5067); 180 (BN5078), 181

(BN5079); 192 (BN5974), 193 (BN5975)).

3.8.3 Hotelling’s T² Control Charts

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

For statistical process control monitoring, it is necessary to establish control limits

based on data from previous successful batch runs. This entails estimation of an
'in-control' sample covariance matrix and construction of a Hotelling’s T² control ellipse

which encompasses these measurements. With NIR process measurements, no product

quality data may be available from which a suitable set of reference data may be

selected and an alternative approach is required to identify those batches from which the

‘in control’ process covariance matrix may be estimated. The approach employed here

utilised a Monte Carlo search algorithm as an optimisation procedure to identify 'in-

control' batches. The algorithm initially selected a subset of batches at random -

typically 2 or 3 more batches than the number of retained PCs - from which a sample

covariance matrix and 99% Hotelling’s T² limits were calculated. The Hotelling’s T²

distances of these batches were then measured and if all batches were located within the

99% confidence ellipsoid, control phase 2 was implemented. This involved measuring

the T² for all other batches and recording the frequency of each batch failing the test.

Typically this process was repeated 100 times and the results were plotted as a

frequency bar chart of outliers. These outlier batches were excluded from control phase

1; the remaining batches were used to estimate the process covariance matrix and grand

mean vector. If however any batch fell outside the 99% control ellipsoid, it was

removed from the phase 1 data set and the covariance matrix re-estimated. Once all

Fig. 3.5. Q Statistic control chart of PCA model of blends (n = 193 batches) (NIR
SNV-DT absorbance spectra) (95% and 99% limits shown).

control batches were found to lie within the 99% ellipsoid, the Hotelling’s T² was

calculated for all batches and plotted as a control chart. Batches exceeding 95 and 99%

control limits were recorded.
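The random-subset stage of this search can be sketched as below. This is an illustration under stated assumptions rather than the thesis code: the T² limit is supplied by the caller (the text uses 99% limits, whose exact formula is not given here), the subset must contain more batches than PCs so that its covariance matrix is invertible, and `max_draws` is a hypothetical guard against a limit that no subset can satisfy.

```python
import numpy as np

def t2_distances(scores, mean, cov):
    """Hotelling's T2 of each row of `scores` from `mean` under `cov`."""
    diff = scores - mean
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def monte_carlo_outlier_frequency(scores, subset_size, limit,
                                  n_searches=100, max_draws=10000, seed=0):
    """Count how often each batch fails the T2 test against random subsets."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    counts = np.zeros(n, dtype=int)
    accepted = 0
    for _ in range(max_draws):
        if accepted == n_searches:
            break
        subset = rng.choice(n, size=subset_size, replace=False)
        mean = scores[subset].mean(axis=0)
        cov = np.cov(scores[subset], rowvar=False)
        # Phase 1: every subset member must lie inside its own ellipsoid.
        if np.any(t2_distances(scores[subset], mean, cov) > limit):
            continue
        # Phase 2: test all other batches and record the failures.
        others = np.setdiff1d(np.arange(n), subset)
        counts[others] += t2_distances(scores[others], mean, cov) > limit
        accepted += 1
    return counts, accepted
```

Plotting `counts` against batch number gives the kind of outlier frequency bar chart described in the text; batches with high counts would then be excluded before the covariance matrix and grand mean are estimated.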

With up to 6 PCs included in the model, this approach was found to work well with 8 to

10 preliminary batches and 99% control limits. However with 8 PCs, the search

required considerable computation time (several hours), and was therefore considered

unacceptable. Two alternative strategies were found to provide useful results:

1. Use (n PCs + 1) batches in the preliminary search with a Hotelling’s T² limit at
confidence level α, where n is the number of PCs in the model and α = 99 or 99.9%,
to identify potential outliers, followed by 95 and 99% control limits thereafter.

2. Use PCs 1+2 (or 1 to 3), which describe most variance in the data, and

repeat the procedure described above in instances with 8 PCs or more, using

approximately 6 batches in the preliminary Monte Carlo search. Then use

selected batches and repeat control phase 1 monitoring with inclusion of all

PCs. Exclude any batches which lie outside the 99% ellipsoid from the

control phase 1 batches. Proceed to control phase 2 monitoring.

An example bar chart for blend DT data is shown in Fig. 3.6.

Hotelling’s T² Control Phase 2

All production batches were then monitored using the estimated process covariance

matrix and PC scores to measure Hotelling’s T². The results for absorbance and
pre-treated data sets were all similar using 99% confidence limits: the placebo blend 29
(BN1469) was identified as an outlier (PC6, Fig. 3.4 & 3.10); batch 21 (BN1015) was

an outlier with all models and with absorbance and SNV consecutive batches 22

Fig. 3.6. Results of Monte Carlo search of PC scores of blends (n = 193 batches)
(DT data) showing outlier frequency with batch (100 searches).

(BN1016), 23 (BN1017), and 24 (BN1024) were also outliers. With all control phase 2
Hotelling’s T² charts most batches between 83 and 154 (BN4058 and BN5051) were
out-of-control (i.e. showed systematic process variation). An example control chart for
DT blend data is shown in Fig. 3.7B. This is further illustrated on plots of two PC
scores with the 95% and 99% Hotelling’s T² control ellipses (Figs. 3.8, 3.9 & 3.10).

Hotelling’s T² MEWMA Control Charts

This control chart exhibited very similar behaviour with all models and clearly showed

systematic drift in the process with time. With all data sets, the process mean vector had

reached statistical process control (99% limit) by batch 82, however significant drift

occurred taking the process mean vector out-of-control. The level of this statistic fell

sharply from batch 154 onwards indicating that the process mean vector was shifting

towards the theoretical value determined by the search algorithm. Only with the detrend

model did the value again fall within control - however with all data sets the value

lowered and reached a constant level. This chart is consistent with the Hotelling’s T²

control phase 2 charts which identified most batches between 83 and 154 as out-of-

control. An example of this control chart for DT data is shown in Fig. 3.7C.
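For reference, a multivariate EWMA T² of this kind can be sketched as follows. The thesis does not state the weighting constant or the exact covariance scaling, so the sketch assumes the standard form z_i = r(x_i - target) + (1 - r)z_(i-1) with covariance (r/(2 - r))[1 - (1 - r)^(2i)]Sigma; `r = 0.2` is an arbitrary illustrative value.

```python
import numpy as np

def mewma_t2(batch_means, target, cov, r=0.2):
    """Exponentially weighted Hotelling's T2 for a sequence of batch mean vectors."""
    target = np.asarray(target, dtype=float)
    cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))
    z = np.zeros_like(target)
    out = []
    for i, x in enumerate(np.asarray(batch_means, dtype=float), start=1):
        z = r * (x - target) + (1.0 - r) * z
        # Exact (time-dependent) scaling of the EWMA covariance.
        scale = (r / (2.0 - r)) * (1.0 - (1.0 - r) ** (2 * i))
        out.append(z @ cov_inv @ z / scale)
    return np.array(out)
```

With `r = 1` the statistic reduces to the ordinary T² of each batch; smaller values give more weight to past batches, which is what allows a sustained drift in the process mean vector to accumulate on the chart.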

3.8.4 PC Shewhart Control Charts

To minimise Type I error, 99% control limits were examined. With all data sets, the

placebo blend 29 (BN1469) was clearly identified as being out-of-control (Appendix B:
Table B23). In addition, batches 21 (BN1015), 22 (BN1016), 23 (BN1017) and 24
(BN1024) were mostly found to be out-of-control with absorbance, SNV, DT and
SNV-DT data models (Appendix B: Table B23). Batches 64 (BN2047), 74 (BN2981), 127
(BN5031) and 192 (BN5974) were out-of-control with all models except Sg2d11
(Appendix B: Table B23). This model however identified groups of consecutive

Fig. 3.7. MSPC control charts of blend PC scores (DT data) (n = 193 batches): A)
Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C) Hotelling’s T²
MEWMA & D) Anderson’s asymptotic normal approximation.

Fig. 3.8. PC2 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.

Fig. 3.9. PC6 versus PC1 scores plot of PCA scores of blends (n = 193 batches) (DT
NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.

Fig. 3.10. PC6 versus PC2 scores plot of PCA scores of blends (n = 193 batches)
(DT NIR absorbance data) with Hotelling’s T² 95% and 99% control ellipses.

batches as outliers: 136 (BN5040), 137 (BN5041) and 138 (BN5042) (Fig 3.4E); 171

(BN5069), 173 (BN5071) (Fig. 3.4C) and 174 (BN5072) (Fig. 3.4D); 161 (BN5058)

(Fig. 3.4F) (Appendix B: Table B23). Most of these batches were subsequently found to

be out-of-control with multiway PCA models.

Clearly, the individual PC score control charts were unable to identify drift in the
process, as was possible with the Monte Carlo search algorithm and Hotelling’s T²
charts. This is because all available data are used with no optimisation procedure.
However, with all blend models, visual inspection of the scores on the 1st PC clearly
shows three levels, indicating systematic variance in this component. The PC Shewhart
charts were also able to identify, at the 99% control level, groups of consecutive
batches which were also out-of-control with Hotelling’s T² charts, e.g. batches 21, 22,
23, 24 and placebo batch 29. The Sg2d11 transformation produced the best PC Shewhart
charts, capable of identifying a number of batches which were subsequently
out-of-control with MPCA models (Fig. 3.4) (Appendix B: Table B23).

3.8.5 Process Variance:

Anderson’s Asymptotic Normal Approximation Sample Generalised Variance

With all blend PCA models, similar batches were identified as having shown significant

process variance at the 99.9% confidence interval (For all data pre-treatments: mean

sample generalised variance of all batches used in control phase 1 was zero, variance of

sample generalised variance of the control phase 1 batches was approximately 2n).

Across all models, the earliest batches seemed to show more process variance (batches

14, 15, 16, 26, 27, 28, 33, 37, 39, 43, 64, 67), with a cluster of values above 99.9%

confidence limits for each data set (Appendix B: Tables B35 - B39). With the exception

of the SNV-DT model, batch 109 (BN5015) was out-of-control. Most data sets

identified batches 64 (BN2047) and 67 (BN2055) as having significant process variance

and the Sg2d11 model identified batches 125 (BN5029) and 126 (BN5030) as
out-of-control. All of these were found to have processed unusually with some blend data-sets,

having significant Q values (p = 0.01) (Appendix B: Table B12). Batches 142

(BN5044), 144 (BN5045 [S* re-blend]), 146 (BN5045 [4th re-blend]), 152 (BN5049)

and 186 (BN5969) showed highly significant process variance on most data sets. None

of these batches had processed unusually, with Q values below 99% limits, however the

highest scoring batches, 144 and 146, were re-blends of the same production batch

(BN5045). Interestingly, this batch had a mean total water content below specification

limit, however the deviation of water content throughout the re-blends had been found

to be excessive in both cases (moisture deviations of 6.62% and 5.74% respectively,

maximum moisture content = 5%) and these were therefore rejected.

An example control chart for DT data is shown in Fig. 3.7D.

Sub-Group Mean-Centred PC Shewhart Plots

These plots were also examined to assess whether significant process variance was

traceable to the PC scores. The results are summarised in Appendix B: Table B24.

Batches considered to have exhibited excessive sub-group variance on a given PC were

restricted to cases where a minimum of 3 observations were above 99% control limit.

This corresponds to one blend sample, and should minimise Type I errors. With batches

64, 142, 146 and 186 it was possible to trace the excessive process variance to 1 or 2

PCs. Batch 146, which exhibited excessive variation in moisture content throughout the
blend, showed significant variance on PC1 for all models. Hence this PC must represent
either moisture content or scatter, which is affected by surface moisture. An example
chart for Sg2d11 sub-group mean centred scores is shown in Fig. 3.11. The PC
Shewhart plots for PC1 (original PC scores) with all models also showed systematic
variation which is consistent with the Hotelling’s T² MEWMA charts.

Fig. 3.11. PC Shewhart control charts for sub-group mean centred PCA scores (n =
9 spectra per batch) of NIR Savitzky-Golay 2nd derivative of absorbance spectra of
blends (n = 193 batches). A) PC1 scores; B) PC2 scores; C) PC3 scores; D) PC4
scores; E) PC5 scores; F) PC6 scores & G) PC7 scores (95% and 99% control
limits shown).

The certificate of analysis total moisture contents for these blends were plotted and did

not show the same trend - it is possible that this component represents scatter which is

affected by surface moisture.

3.8.6 Correlation of Blend MSPC Results With Raw Material Usage Batch Data

The results of MSPC of blends (Sections 3.8.2 and 3.8.5) - Q statistic, Anderson’s

asymptotic normal approximation and the contributions of the individual PC scores to

the Hotelling’s T² - were examined for correlations with raw material batch numbers

used. Significant correlations with these statistics would indicate that a particular batch

of a raw material resulted in poor process performance and excess process variance.

Principal Factor Analysis

To examine the correlations between raw materials used and the MSPC indicators of

poor process performance, a data array was constructed and ordered such that the rows

corresponded to a particular process batch number, and the columns corresponded to all

of the raw material batch numbers used. The elements of the array contained the masses

(kg) of each raw material batch used in a particular blend batch. For each batch of blend

produced, each of the five raw materials could comprise several different raw material

batches in different amounts. Where none of a given raw material batch was used, the

element was made zero. The product batches used in this study were numbers 85 to 154,

which corresponds to the batches with systematic variation. The total number of

variables (raw material batches) used was 52, and the total number of blend batches

studied was 70. Appended to this array were column vectors of Q statistic, Anderson’s

asymptotic normal approximation and the individual PC contributions to the Hotelling’s
T², with these values depending on the NIR blend data set examined.
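The construction of this array can be sketched as below. The record format, the function name and the labels in the example are hypothetical and carry no real batch data; only the layout (blend batches as rows, raw material batches as columns, masses in kg, zero where a batch was not used) follows the description above.

```python
import numpy as np

def usage_matrix(usage, blend_batches, material_batches):
    """Rows: blend batches; columns: raw material batches; entries: mass used (kg)."""
    row = {b: i for i, b in enumerate(blend_batches)}
    col = {m: j for j, m in enumerate(material_batches)}
    X = np.zeros((len(blend_batches), len(material_batches)))
    for blend, material, mass_kg in usage:
        X[row[blend], col[material]] += mass_kg
    return X

# Hypothetical records: (blend batch, raw material batch, mass in kg).
records = [("blend_1", "material_A", 10.0),
           ("blend_1", "material_B", 5.0),
           ("blend_2", "material_B", 15.0)]
X = usage_matrix(records, ["blend_1", "blend_2"], ["material_A", "material_B"])
print(X)
```

The MSPC indicator columns would then be appended to such an array, for example with `np.column_stack`, before the analysis.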

Data Analysis

The combined data set (70-by-54) was autoscaled and subjected to a PCA. The number

of vectors retained included those with eigenvalues greater than or equal to unity, which

is typical in a factor analysis (Appendix B: Table B50). For each data set, the vectors

were then normal varimax rotated into terminal vectors (Harman, 1976).
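The steps above (autoscaling, PCA, retention of vectors with eigenvalues of at least unity, and normal varimax rotation) can be sketched as follows. This is a generic implementation under assumptions, not the routine used in the thesis: loadings are scaled as correlations, "normal" is taken to mean Kaiser row normalisation, and every column is assumed to have non-zero variance. Function names are hypothetical.

```python
import numpy as np

def principal_factors(X):
    """Loadings of the factors whose correlation-matrix eigenvalues are >= 1."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale
    corr = Z.T @ Z / (X.shape[0] - 1)                  # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= 1.0
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, n_iter=100, tol=1e-8):
    """Varimax rotation with Kaiser (row) normalisation."""
    L = np.asarray(loadings, dtype=float)
    h = np.sqrt((L ** 2).sum(axis=1))   # square roots of the communalities
    A = L / h[:, None]
    n, k = A.shape
    R = np.eye(k)
    last = 0.0
    for _ in range(n_iter):
        B = A @ R
        grad = A.T @ (B ** 3 - B @ np.diag((B ** 2).sum(axis=0)) / n)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                       # orthogonal rotation matrix
        if s.sum() < last * (1.0 + tol):
            break
        last = s.sum()
    return (A @ R) * h[:, None]
```

Because the rotation is orthogonal, the rotated factors account collectively for the same variance as the unrotated ones, and each variable's communality is unchanged.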

Interpretation of Principal Factor Loadings

With principal factor analysis, no significance is attached to the order of rotated factors

(however they still collectively account for the same amount of variance as before

rotation) (Dillon and Goldstein, 1984). Instead, the factor loadings are examined in turn

for large positive or negative values. Their values correspond to correlations of that

factor with the variable of high loading value (positive or negative), and may be

subjected to statistical significance test (Dillon and Goldstein, 1984). With 70

observations, loadings must be greater than or equal to 0.3 to be considered significant

(p = 0.01) (most loadings on a principal factor analysis are typically close to zero).
Meaning may also be attached to the pattern of a factor’s loadings, for example if there
exists bipolarity on a factor (Dillon and Goldstein, 1984).
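As a rough check on the 0.3 threshold, a loading can be treated as a Pearson correlation and converted to a t value with n - 2 degrees of freedom. This is an assumption about the form of the test; the exact procedure is the one given by Dillon and Goldstein (1984).

```python
import math

def loading_t_value(r, n_obs):
    """t statistic for a correlation r observed on n_obs observations."""
    return r * math.sqrt(n_obs - 2) / math.sqrt(1.0 - r * r)

print(round(loading_t_value(0.3, 70), 2))  # 2.59
```

The value obtained can be compared with a tabulated critical t value for 68 degrees of freedom at the chosen significance level.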

With this study, important correlations would be those which include Q statistic,

Anderson’s asymptotic normal approximation or the Hotelling’s T² PC contributions

with raw material batch numbers.

Results

The SNV and SNV-DT data sets showed significant correlations (p = 0.01, n = 70)

between EX005147 (dibasic calcium phosphate anhydrous), EX004245

(microcrystalline cellulose) and Anderson’s asymptotic normal approximation

(Appendix B: Table B50). This is important with these batch numbers being used in

BN5045, which exhibited excessive moisture variation throughout the blend.

The Sg2d11 data set showed significant correlations (p = 0.01, n = 70) between

EX008202 (dibasic calcium phosphate anhydrous), EX007173 (microcrystalline

cellulose), EX008015 (sodium starch glycolate) and Q statistic (Appendix B: Table

B50). This is important as these batch numbers were used in varying proportions in

blends 136 to 139 (BNs 5040, 5041, 5042 and 5043) which were found to have

processed unusually.

3.9 Multivariate Statistical Quality Control of Pharmaceutical Tablets

This Section deals with MSPC for the lower and higher strength tablet data sets. For

each strength of tablet, models were examined from tablet absorbance data and from

tablet transmission data.

3.9.1 Lower Strength Tablet Absorbance Data Sets

Principal Components Analysis

All 44 batches were examined in this study. The rank of each pre-treated data set was

determined by recursive 'leave one out' cross validation in conjunction with Q statistic
outlier detection. Between 6 and 8 PCs were required (Appendix B: Table B2). All models
except Sg2d11 explained in excess of 99%SS while the Sg2d11 model accounted for 96.68%SS,
probably due to reduction in signal to noise. All PCs were found to be significant

(Appendix B: Table B2). All of the PCA models’ loadings appear to represent physical

and chemical information.

The absorbance model’s first two PC loadings show a general trend of increasing value

across the variables suggesting scatter (Fig. 3.2A & B). The third PC’s loadings have

high negative loadings at 1400 and 1900 nm, characteristic of water (Fig. 3.2C). With

SNV, DT and SNV-DT models, the loadings of the first component are characteristic of

water (Fig. 3.12). The other PCs’ loadings of these models resemble quadratic baseline

corrected spectra of microcrystalline cellulose and magnesium stearate (Appendix B:

Table B51).

Unusual Batch Detection: Q Statistic

Some similarity exists in tablet batches found to have processed unusually with this

control chart. Batches 21 and 31 (BN5026 and BN5041) had significant Q values (p =

0.01) for absorbance, DT and SNV-DT models (Appendix B: Table B13). Batches 30,

31 and 32 were found to have processed unusually with absorbance data (BNs 5040,

5041 and 5042). Batch 7 had significant Q values with all models except Sg2dl 1.

Consideration of these control charts with their blend counterparts showed good

agreement for absorbance and SNV-DT models (Fig. 3.13). With both of these models,

batches 38, 39 and 40 had significant Q values (p = 0.01) (BNs 5064, 5065, 5067

respectively) (Appendix B: Table B13) (Fig. 3.13). Overall, absorbance data was

considered to provide the most consistent results.

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

This procedure was repeated with the 44 batches as for the blends. The batches

examined were produced from blends selected from batches 85 to 184. Most of these

were found to have shown process drift, however from batch 154 onwards, the process

had drifted back in control.

Between 17 and 35 batches were included in estimation of the process covariance

matrix (Appendix B: Table B40). With 99% and 95% control limits established, the PC

scores were tested in control phase 2.

Fig. 3.12. Lower strength tablet NIR absorbance spectra PCA loadings (SNV DT
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G)
PC7. (Loadings plotted against wavelength/nm, 1200 to 2400 nm.)

Fig. 3.13. Q Statistic control chart of PCA model of lower strength tablets (n = 44
batches) (NIR SNV-DT absorbance spectra) (95% and 99% limits shown).

Hotelling’s T² Control Phase 2

The results of this control phase were found to depend on whether batches had been

included in the PCA model generation, or whether they had been excluded on the basis

of significant Q values. With absorbance and SNV-DT models, batches 38, 39 and 40

(BN5064, BN5065, BN5067) (excluded from PCA but projected onto eigenvectors)

were not found to be out of control. With these models, batches that were found to be

out of control were early batches that correspond to those batches which exhibited

systematic drift in the blend models. The detrend model did not identify any batches as

out of control.

The SNV and Sg2dl 1 models both identified batches 38, 39 and 40 (BN5064, BN5065,

BN5067) as out of control, which is consistent with blend Q statistics (these batches

were not identified as unusual with tablet Q statistics) (Appendix B: Table B40). Both

models also identified batch 30 (BN5040) as being out of control. Interestingly, the

Sg2d11 model also identified batches 31 and 32 as being out of control (BNs 5041 and
5042) (Fig. 3.14B). This result agrees with SNV-DT and Sg2d11 blend Q statistics. This
is shown clearly on a plot of PC scores 5 versus 4 with the Hotelling’s T² 95% and 99%
control ellipses (Fig 3.15), which also shows batches 38 and 39 lying close to the 95%
limit and batch 40 lying outside the 95% limit. The Sg2d11 pre-treatment provided the most

consistent results.

Raw absorbance data PCA models clearly did not model the differences in NIR spectra

arising from physical differences in the surface of the friable tablets. This was detected

in the residual space with the Q statistic charts. The physical differences which affected

the spectra were emphasised with the Sg2dl 1 and SNV transformation and were thus

detected from their PCA models.

Fig. 3.14. MSPC control charts of lower strength tablet NIR absorbance spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation.

Fig. 3.15. PC5 versus PC4 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 absorbance data) with Hotelling’s T² 95% and 99% control
ellipses.

Hotelling’s T² MEWMA Control Charts

With absorbance, DT and SNV-DT, this chart shows the exponentially weighted

Hotelling’s T² drifting from an out-of-control state to an in control state. This is an

interesting result which confirms the results of blend MSPC. This tablet data set

includes batches which as blends had shown significant drift in the process mean vector

and then moved to a near (or) in control level (e.g. DT model).

With the SNV and Sg2d11 models batches 38, 39 and 40 (BN5064, BN5065, BN5067)
had not shown significant sum of squared residuals, Q, and were therefore included in
the PCA. These batches were found to be out of control with the Hotelling’s T² phase 2
control charts. The SNV Hotelling’s T² MEWMA moved from an out of control state, to
an in control state and then out of control with these batches. The Sg2d11 Hotelling’s T²
MEWMA was much better able to respond to process drift (Fig. 3.14C): it moved out of

control and then drifted to an in control level. At batch 30 (BN5040), it again drifted out

of control (See batches 30, 31 and 32, Appendix B: Table B40), returning again to an in

control level at batch 33 (BN5055). However, at batch 38 (BN5064) it drifted out of

control and then began to drift to a lower level from batch 42 (BN5076) onwards.

PC Shewhart Control Charts

These charts used 99% control limits as with blends. These results were less consistent

than those of Q statistic and Hotelling’s T² control charts for both blends and tablets.

Batches 2, 5 and 10 were found to be out of control on PC Shewhart control charts with

SNV, DT and SNV DT. Batches 2 and 5 (BN4984 and BN4987) were out of control on

a PC with Sg2d11. Absorbance PC Shewhart plots did identify batches 39 and 40
(BN5065 and BN5067) as out of control on PC6. The loadings on this PC were difficult to
interpret. Sg2d11 PC control charts identified batches 31 and 42 (BN5041 and BN5076)

as outliers on PCs 4 and 5 respectively. Again the loadings were difficult to interpret.

Process Variance: Anderson’s Asymptotic Normal Approximation

These control charts showed fairly consistent results across models: batches 4, 9, 15 and

16 (BNs: 4986, 4994, 5003, 5013 respectively) showed significant generalised variance

(Appendix B: Table B40). These were not found to have exhibited significant

generalised variance with blend data, however batch 15 was found to have exhibited

excess generalised variance with SNV-DT multiway PCA model [(blend and tablet

absorbance and transmission data) and (blend absorbance and tablet transmission data)

and (blend and tablet absorbance)]. Batches 15 and 16 were also found to have shown

significant process variance with the Sg2dl 1 multiway PCA model (blend absorbance

and tablet absorbance data). With most models, batch 15 had exhibited the highest

sample generalised variance.

3.9.2 Lower Strength Tablet Transmission Data Sets

Principal Components Analysis

All 44 batches were examined in this study. The rank of each pre-treated data set was

determined by recursive 'leave one out' cross validation in conjunction with Q statistic

outlier detection. The R statistic was used as stopping criterion.

Between 6 and 9 components were used for the models. The raw transmission data

required most components to model the data whereas the Sg2dl 1 model required only 6

PCs to fit the data (Appendix B: Table B4). Most models accounted for more than

99.9%SS except Sg2d11 which accounted for 98.2%SS. All PCs extracted were found to

represent significant amounts of variance. The PC loadings for all models appeared to

represent physical and chemical information. The transmission model’s first 3 PC

loadings did not contain any peaks and may represent scatter (Fig. 3.3). The SNV

model’s first PC were similar and may also represent scatter. The DT and SNV-DT

models showed a broad peak on the first PC loading, however these may still represent

143
scatter as the peaks are less resolved than on the higher order PC loadings. All PC

loadings with Sg2dl 1 model contain sharply defined peaks, which suggests that they

represent chemical information (Fig. 3.16).
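As an illustration of how the scores, loadings and cumulative %SS quoted above relate to one another, the following sketch fits a PCA by singular value decomposition of mean-centred data. The synthetic data and all names are assumptions made for illustration; it is not the software used in this work.

```python
import numpy as np

def pca_percent_ss(x, n_components):
    """Return scores, loadings and cumulative %SS for mean-centred data.

    x: (batches, wavelengths) array of spectra.
    """
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T            # one column per PC
    percent_ss = 100.0 * np.cumsum(s ** 2)[:n_components] / np.sum(s ** 2)
    return scores, loadings, percent_ss

# Hypothetical data: 44 batches, 120 wavelengths, three underlying factors.
rng = np.random.default_rng(0)
x = (rng.normal(size=(44, 3)) @ rng.normal(size=(3, 120))
     + 0.01 * rng.normal(size=(44, 120)))
scores, loadings, percent_ss = pca_percent_ss(x, 6)
```

Plotting each column of `loadings` against wavelength gives loading plots of the kind discussed above; smooth, featureless loadings would be consistent with scatter, and sharply peaked loadings with chemical absorption.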

Unusual Batch Detection: Q Statistic

Some similarity exists across all models in the batches found to be unusual and not to

fit a given model (Appendix B: Table B15). Batch 12 (BN4997) was an outlier with all

models. Batch 21 (BN5026) was an outlier with all models except Sg2d11. With the SNV-DT

model, batches 20, 21 and 23 (BN5025, BN5026 and BN5028), which are virtually consecutive,

were all found to be outliers. Batch 28 (BN5038) was an outlier on the SNV, DT and SNV-DT

models. Batch 30 (BN5040) was an outlier on the SNV-DT and Sg2d11 models. The

Sg2d11 model identified batches 30, 31 and 32 as unusual (BN5040, BN5041,

BN5042). These have consistently been found to be unusual with blend and tablet

absorbance models and are shown to be unusual with multiway PCA models (Section

3.10); this is therefore a consistent result. As the loadings for this model all contain

chemical absorption peaks, and are therefore likely to be modelling just chemical

information, it is probable that these batches are physically different. Such a physical

difference is unlikely to be modelled by the PCA model. This interpretation was supported by the C. of A.

results which showed that tablets produced from batch 30 (BN5040) were friable (1

mg). No C. of A. data were recorded for batches 31 and 32 (BN5041 and BN5042).
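The Q statistic used above is the sum of squared residuals left after a spectrum is projected onto the PCA model plane. The sketch below is illustrative only: it assumes the Jackson and Mudholkar approximation for the control limit, uses synthetic data, and all function and variable names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def fit_q_model(x_ref, n_components, alpha=0.01):
    """Fit PCA to reference spectra and return (mean, loadings, Q limit)."""
    mean = x_ref.mean(axis=0)
    _, s, vt = np.linalg.svd(x_ref - mean, full_matrices=False)
    loadings = vt[:n_components].T
    # Eigenvalues of the discarded (residual) components.
    lam = s[n_components:] ** 2 / (x_ref.shape[0] - 1)
    t1, t2, t3 = lam.sum(), (lam ** 2).sum(), (lam ** 3).sum()
    h0 = 1.0 - 2.0 * t1 * t3 / (3.0 * t2 ** 2)
    c = NormalDist().inv_cdf(1.0 - alpha)
    limit = t1 * (c * np.sqrt(2.0 * t2 * h0 ** 2) / t1
                  + 1.0 + t2 * h0 * (h0 - 1.0) / t1 ** 2) ** (1.0 / h0)
    return mean, loadings, limit

def q_statistic(x, mean, loadings):
    """Sum of squared residuals of each spectrum after projection onto the model."""
    xc = np.atleast_2d(x) - mean
    residual = xc - xc @ loadings @ loadings.T
    return np.sum(residual ** 2, axis=1)

# Hypothetical reference set: 40 batches, 50 wavelengths, two factors plus noise.
rng = np.random.default_rng(1)
n_wl = 50
profiles = rng.normal(size=(2, n_wl))
x_ref = rng.normal(size=(40, 2)) @ profiles + 0.05 * rng.normal(size=(40, n_wl))
mean, loadings, q_limit = fit_q_model(x_ref, n_components=2)

typical = rng.normal(size=2) @ profiles + 0.05 * rng.normal(size=n_wl)
odd = typical.copy()
odd[10] += 2.0     # a feature the two-component model cannot describe
q_typical, q_odd = q_statistic(np.vstack([typical, odd]), mean, loadings)
```

A batch with a large Q differs from the reference set in a way the retained PCs do not describe, which is why a model whose loadings are essentially chemical can still flag a physically unusual batch.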

Fig. 3.16. Lower strength tablet NIR transmission spectra PCA loadings (Sg2d11
data) (n = 44 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5 & F) PC6.
[Six panels: PC loading against wavelength/nm.]

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

With the exception of Sg2d11, all models required an initial control phase 1 batch

screening using the first two PCs. With 6 batches and 100 Monte Carlo searches,

potentially unusual batches were identified and excluded. These batches were then

further screened in control phase 1 by inclusion of all PCs and construction of 99%

control ellipsoids (Appendix B: Table B42).

Hotelling’s T² Control Phase 2

The Sg2d11 model did not require an initial screening, and identified batches 5, 6, 38,

39 and 40 as unusual (Fig. 3.17B). The latter 3 batches (BNs 5064, 5065 and 5067)

were successfully identified as unusual with all other pre-treatment models; however,

this required an initial screening of PCs 1 and 2 (Fig. 3.18). Batch 38 produced tablets

of 1 mg friability; no C. of A. data were recorded for batches 39 and 40.

The SNV model also identified batches 30, 31 and 32 (BNs 5040, 5041, 5042) and 38,

39 and 40 as unusual; however, these batches were not found to have processed unusually

with the Q statistic control chart.

The PCA models of the transmission data sets contained chemical information and

physical information and were more effective at detecting anomalies than the Q statistic

control chart. The SNV transformation was more effective at detecting anomalous

batches than the Sg2d11 transformation, probably because more physical information is

modelled in its first two PC loadings. The Sg2d11 model appears to effectively remove

physical information, hence batches which have physically processed in an unusual

manner do not fit the model and are detected on the Q statistic chart. With the same

transformation, these batches cannot be detected as unusual on the Hotelling’s T²

control charts. However, batches which have chemical differences are easily detected on

these control charts, and more easily than with the other transformations. The SNV

model produced less consistent results with the Q chart; however, all batches thus far

found to be unusual were detected on the Hotelling’s T² control charts. This is because

this model still retains physical information in its first 2 PCs.

Fig. 3.17. MSPC Control charts of lower strength tablet NIR transmission spectra
PC scores (Sg2d11 data): A) Hotelling’s T² control phase 1; B) Hotelling’s T²
control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

Fig. 3.18. PC2 versus PC1 scores plot of PCA scores of lower strength tablets (n =
44 batches) (Sg2d11 transmission data) with Hotelling’s T² 95% and 99% control
ellipses.

The Sg2d11 and SNV transformations appear to provide the most consistent results

for these data.
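The different behaviour of the two transformations can be illustrated with a short sketch. It assumes that SNV is the usual row-wise centring and scaling of each spectrum, and that Sg2d11 denotes a Savitzky-Golay second derivative with an 11-point window (a local quadratic fit is used here; a cubic fit gives the same second-derivative weights for a symmetric window). The pre-treatments actually applied are those defined in Section 3.6; the function names and example spectra below are hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.atleast_2d(spectra)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def sg_second_derivative(spectrum, window=11):
    """Savitzky-Golay second derivative from a local quadratic fit (unit spacing)."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    design = np.vander(offsets, 3, increasing=True)   # columns: 1, x, x**2
    weights = 2.0 * np.linalg.pinv(design)[2]         # d2/dx2 = 2 * quadratic coefficient
    # Reversing the kernel turns the convolution into a sliding dot product.
    return np.convolve(spectrum, weights[::-1], mode="valid")

# Hypothetical spectra: a smooth curve, then the same curve with offset and scaling.
x = np.arange(30.0)
raw = np.sin(x / 5.0)
shifted = 2.5 * raw + 0.7

snv_raw, snv_shifted = snv(raw), snv(shifted)        # identical after SNV
quadratic = 3.0 * x ** 2 + 2.0 * x + 1.0
d2 = sg_second_derivative(quadratic)                 # constant second derivative
d2_offset = sg_second_derivative(quadratic + 5.0)    # baseline offset removed
```

Both operations remove a constant offset, but the second derivative also removes a sloping baseline, which is consistent with the Sg2d11 loadings showing little scatter-like structure.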

Hotelling’s T² MEWMA Control Charts

With all pre-treatment models, the Hotelling’s T² MEWMA control charts generally

showed drift towards the theoretical grand mean vector. With transmission data, this

value was within the 99% limits, showing improvement up to batch 27 (BN5037) (Fig.

3.19C). Drift could then be observed, peaking at batch 32 (BN5042). This statistic

drifted out of control from batch 38 onwards (BN5064). The SNV transformation was

less useful in showing this drift: it identified earlier batches as out of control on the

Hotelling’s T² control phase 2 chart, hence this statistic did not shift back within

control. The DT, SNV-DT and Sg2d11 charts all showed an improvement in drift to

within a state of control. With the DT model, drift occurred from batch 38 (BN5064)

onwards. With the SNV-DT and Sg2d11 models, drift occurred at batch 28 (BN5038),

taking this statistic out of control. An improvement in drift could be seen from batch 32

(BN5042) onwards with both, but drift increased again from batch 38 (BN5064). With the

Sg2d11 model, this statistic reduced to within control at batch 37 before again drifting

out of control with batch 38 (BN5064). The Sg2d11 chart was considered to perform

best and could detect drift which was not detected on the Hotelling’s T² phase 2 control

chart.

PC Shewhart Control Charts

With these charts, 99% control limits were used. Batches 5 and 6 signalled out of

control across all pre-treatments.

Fig. 3.19. MSPC Control charts of lower strength tablet NIR transmission spectra
PC scores (raw data): A) Hotelling’s T² control phase 1; B) Hotelling’s T² control
phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal
approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

Batch 30 (BN5040) signalled out of control with DT and SNV-DT on PC2. The SNV-DT

model also detected batch 31 (BN5041) as unusual on PC5.
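A PC Shewhart chart treats each retained PC separately. The following is a minimal sketch, assuming normally distributed scores and two-sided 99% limits placed about the mean of the in-control reference scores; the names and the small example arrays are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def pc_shewhart_signals(ref_scores, new_scores, confidence=0.99):
    """Flag scores lying outside two-sided Shewhart limits, one chart per PC."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 2.576 for 99%
    centre = ref_scores.mean(axis=0)
    sd = ref_scores.std(axis=0, ddof=1)
    return np.abs(new_scores - centre) > z * sd

# Hypothetical in-control scores on two PCs (standard deviations 1 and 2).
ref = np.array([[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]])
new = np.array([[0.5, 6.0],    # unusual on PC2 only
                [3.0, 1.0]])   # unusual on PC1 only
signals = pc_shewhart_signals(ref, new)
```

Because each column of the result corresponds to one PC, a signal can be related directly to the loading of that PC, which is how the batches above were traced to individual components.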

Process Variance: Anderson’s Asymptotic Normal Approximation

With this control chart, batches 30, 32 and 39 (BN5040, BN5042, BN5065) were found

to have significant process variance (Appendix B: Table B42). These results are

consistent with multiway PCA results (MPCA lower strength tablet batch indices 25, 27

and 34 respectively (refer to Table 3.6), Appendix B: Table B48). The Sg2d11 model

did not identify these batches as having shown excess process variance (however,

BN5065 was detected with multiway PCA analysis with this transformation (Appendix

B: Table B48)). Transmission and detrend data performed the best (Appendix B: Table

B42).

3.9.3 Higher Strength Tablet Absorbance Data Sets

Principal Components Analysis

All 43 batches were examined in this study. The rank of each pre-treated data set was

determined by recursive 'leave one out' cross validation in conjunction with Q statistic

outlier detection. The R statistic was used as the stopping criterion.

Between 6 and 7 PCs were extracted for the different pre-treatments, with models

explaining more than 99.6 %SS for all pre-treatments except Sg2d11 (Appendix B: Table

B3). This model only accounted for 95.4 %SS with 6 PCs. All extracted PCs were found

to be significant using Anderson’s likelihood ratio chi-squared test (Jackson, 1991).

The PC loadings for these models appear to show physical and chemical information.

The absorbance model’s first 2 PC loadings appeared to represent scatter (Fig. 3.20A &

B). The 3rd, 4th and 5th PC loadings appear to contain features characteristic of water,

microcrystalline cellulose and magnesium stearate respectively (Fig. 3.20C, D & E). The

other models did not appear to have loadings that represented scatter: the 1st, 2nd and

3rd PC loadings of the SNV, DT and SNV-DT models contained features characteristic of

water, microcrystalline cellulose and magnesium stearate respectively. The Sg2d11

model showed loadings with chemical features; the PC loadings appeared to

represent water.

Fig. 3.20. Higher strength tablet NIR absorbance spectra PCA loadings (raw data)
(n = 43 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G) PC7.
[Seven panels: PC loading against wavelength/nm.]

Unusual Batch Detection: Q Statistic

Consistent batches were found to be outliers with the Q statistic control charts across

models (Appendix B: Table B14). Batches 32 and 33 (BNs 5059 and 5060) were

outliers with absorbance and SNV-DT; batch 33 was an outlier with SNV and DT.

Batches 4 and 6 (BNs 4990 and 4992) were outliers with SNV, DT, SNV-DT and

Sg2d11. Batch 42 (BN5080) was an outlier with absorbance and SNV. SNV also

showed batch 43 (BN5081) as an outlier. Comparison of these results with those for

blends, multiway PCA and transmission models suggested that tablet absorbance

measurements did not provide as reliable a model for Q statistic identification of

batches of higher strength tablets that have processed unusually. The transmission

models provided a better indicator of unusual batches (Appendix B: Table B16). This

can be explained by the certificate of analysis data which showed that unusual batches

(detected on blend and multiway models) had a greater thickness than normal (these

tablets may have exhibited greater elasticity). Transmission measurements pass through

the entire tablet and are likely to be sensitive to the increase in pathlength of the tablet,

whereas reflectance measurements were not sensitive to this. (N. B. The opposite of this

was true for some lower strength tablets (Section 3.9.2). Unusual batches, detected at

blend and multiway model Q statistic control charts were found to be friable in

certificate of analysis tests. These tablets are therefore likely to be more brittle than

normal and will show increased fragmentation. This occurs at the surface of the tablet

and will affect surface texture and therefore also affect NIR absorbance measurements.

The results of Q statistic control charts for absorbance data for lower strength tablets

(Appendix B: Table B13) agreed with this finding and provided a much better

indication of this than for transmission higher strength tablet data sets (Appendix B:

Table B15)).

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

A preliminary search using 6 randomly selected batches and PCs 1 and 2 was required

with absorbance data only. The other models produced satisfactory results using one

more batch than the number of retained PCs and 90% control levels, except Sg2d11,

which used 12 batches with all 6 PCs in the search.

Hotelling’s T² Control Phase 2

The effect of the different data pre-treatments on the MSPC performance of their PCA models

was considered alongside multiway and blend MSPC results. This indicated that SNV

and DT pre-treatments provided the most reliable models (Appendix B: Tables B41, B12

and B49) (Fig. 3.21B): batches 36, 40 and 41 (BN5070, BN5078 and BN5079) were

identified as unusual at the 99% confidence level.

The absorbance data identified batch 36 as unusual at the 95% level. Interestingly, with

this model, batch 41 can be seen as falling outside the 95% confidence ellipse for PCs 1

and 2. Batches 40 and 36 lie close to the 95% confidence limit. The Sg2d11 data set

identified batch 41 as unusual at the 99% confidence level and batch 40 as unusual at the

95% confidence level. On the PC1 versus PC2 Hotelling’s T² control ellipse, batch 36

can be seen to lie very close to the 99% confidence limit (Fig. 3.22).
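Whether a batch lies inside or outside a 95% or 99% control ellipse on the PC1 versus PC2 plot can be checked numerically. The sketch below assumes the usual phase 2 limit for a future observation, T² <= [p(n + 1)(n - 1) / (n(n - p))] F(p, n - p), and uses the closed-form quantile of the F distribution that is available when p = 2. The limits used in this work may have been computed differently, and all names and data are hypothetical.

```python
import numpy as np

def t2_phase2(ref_scores, new_scores):
    """Hotelling's T2 of new score vectors about the in-control mean and covariance."""
    centre = ref_scores.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(ref_scores, rowvar=False))
    d = np.atleast_2d(new_scores) - centre
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

def t2_limit_two_pcs(n_ref, alpha):
    """Phase 2 T2 limit for two PCs, using the exact F(2, n_ref - 2) quantile."""
    nu = n_ref - 2
    f_crit = (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)
    return 2.0 * (n_ref + 1) * (n_ref - 1) / (n_ref * nu) * f_crit

# Hypothetical in-control scores for 30 batches on PC1 and PC2.
rng = np.random.default_rng(2)
ref_scores = rng.normal(size=(30, 2))
new_scores = np.array([[0.0, 0.0],    # close to the centre of the ellipse
                       [6.0, 6.0]])   # far outside it

t2 = t2_phase2(ref_scores, new_scores)
limit_95 = t2_limit_two_pcs(len(ref_scores), alpha=0.05)
limit_99 = t2_limit_two_pcs(len(ref_scores), alpha=0.01)
```

A batch with T² between the two limits falls outside the 95% ellipse but inside the 99% ellipse, which is the situation described above for batches lying close to a confidence limit.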

Fig. 3.21. MSPC Control charts of higher strength tablet NIR absorbance spectra
PC scores (SNV data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

Fig. 3.22. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR absorbance data) with Hotelling’s T² 95% and 99%
control ellipses.

Hotelling’s T² MEWMA Control Charts

The ability of these charts to representatively show drift in the process was found to

depend on how well the Hotelling’s T² control phase 2 charts were able to identify

unusual batches. The absorbance and Sg2d11 charts were considered the best indicators of

process drift, and showed the value of this statistic moving out of control for batches 36,

40 and 41 (BN5070, BN5078 and BN5079) (Fig. 3.23C). This is because with these

charts some batches fall above the 95% confidence level but not the 99% confidence

level. This appeared to cause the exponentially weighted Hotelling’s T² value to shift

without greatly falling beyond the 99% level. Evidently, this chart is very sensitive to

large Hotelling’s T² control phase 2 values in excess of the 99% control limit (a

different value for λ could be examined; in this study a value of 0.1 was used). The

MEWMA chart for DT data was quite a good indicator of drift - it showed the process

drifting to a state of control by batch 35 (BN5069), then drifting out of control with

batch 36 (BN5070), moving again within control by batch 38 (BN5073) and then

drifting out of control from batch 39 (BN5074) onwards. The SNV-DT model showed

the process drift to a state of control by batch 29 (BN5053) and then continued to drift

out of control thereafter. The SNV model appeared more erratic and shifted

considerably.
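The sensitivity of the MEWMA chart to the weighting constant can be explored with a short sketch. It assumes the standard recursion z_i = λ(x_i - mean) + (1 - λ)z_(i-1), with the exact covariance of z_i, (λ / (2 - λ))(1 - (1 - λ)^(2i)) Σ, used to form the charted statistic. The names and the step-change example are hypothetical, and no control limit is computed because MEWMA limits are normally taken from published tables or simulation.

```python
import numpy as np

def mewma_t2(scores, mean, cov, lam=0.1):
    """Exponentially weighted Hotelling's T2 for a sequence of score vectors."""
    cov_inv = np.linalg.inv(cov)
    z = np.zeros(scores.shape[1])
    stats = []
    for i, x in enumerate(scores, start=1):
        z = lam * (x - mean) + (1.0 - lam) * z
        # Exact variance factor of z after i observations.
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        stats.append(z @ cov_inv @ z / scale)
    return np.array(stats)

# Hypothetical sequence: 10 on-target batches followed by a sustained shift.
scores = np.vstack([np.zeros((10, 2)), np.ones((20, 2))])
in_control_mean = np.zeros(2)
in_control_cov = np.eye(2)

drift = mewma_t2(scores, in_control_mean, in_control_cov, lam=0.1)
plain_t2 = mewma_t2(scores, in_control_mean, in_control_cov, lam=1.0)
```

With λ = 1 the statistic reduces to the ordinary Hotelling's T² of each batch, whereas a small λ such as 0.1 accumulates a sustained shift over successive batches, so the choice of λ governs how quickly the chart responds and recovers.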

PC Shewhart Control Charts

With these charts, 99% confidence limits were used. The best performing model was

DT which identified batches 36 and 41 (BN5070 and BN5079) as significantly different

for PCs 1 and 5 respectively. The loadings for PC1 suggest moisture; for PC5 this is

more difficult to interpret. SNV-DT and Sg2d11 models both identified batch 41

(BN5079) as unusual on PCs 5 and 6 respectively. Again, the loadings on these PCs

were difficult to interpret.

Fig. 3.23. MSPC Control charts of higher strength tablet NIR absorbance spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

Process Variance: Anderson’s Asymptotic Normal Approximation

The results of these charts did not appear to be consistent with blend and multiway

control charts. This is probably because reflectance measurements are not sensitive to

differences in pathlength. Increased tablet thickness (i.e. pathlength) was found to have

occurred with some tablet batches identified as unusual in multiway models (compare

Tables: B42, B45, B47 and B49, Appendix B).

3.9.4 Higher Strength Tablet Transmission Data Sets

Principal Components Analysis

All 43 batches were examined in this study. The rank of each pre-treated data set was

determined by recursive 'leave one out' cross validation in conjunction with Q statistic

outlier detection. The R statistic was used as the stopping criterion (Appendix B: Table B5).

Between 3 and 7 PCs were required for PCA models. The Sg2d11 model required the

fewest PCs; the transmission and DT models required 7 PCs. Most models explained more

than 99.5 %SS, except Sg2d11, which accounted for just 98.9 %SS.

All PCs extracted were found to represent significant amounts of variance using

Anderson’s likelihood ratio test (Appendix B: Table B5) (p = 0.01). The loadings for

these models appeared to contain physical and chemical information: with transmission,

this was evident with the first few PCs (Fig. 3.24); with SNV and DT the first PC

loadings appeared to represent physical information. The SNV-DT and Sg2d11 models

had loadings which appeared to represent chemical information (Fig. 3.25).

Fig. 3.24. Higher strength tablet NIR transmission spectra PCA loadings (raw
data) (n = 43 batches): A) PC1; B) PC2; C) PC3; D) PC4; E) PC5; F) PC6 & G)
PC7.
[Seven panels: PC loading against wavelength/nm.]

Fig. 3.25. Higher strength tablet NIR transmission spectra PCA loadings (SNV-DT
data) (n = 43 batches): A) PC1; B) PC2; C) PC3 & D) PC4.
[Four panels: PC loading against wavelength/nm.]

Unusual Batch Detection: Q Statistic

With these models, a number of batches could be identified as having processed

unusually, with significant (p = 0.01) Q statistics. These results agreed with those of

blend and multiway models (Appendix B: Tables B12 and B49) and were in contrast to

those of absorbance models for these batches. Previous results indicated that these

batches are physically different from normal batches. Transmission measurements pass

through the entire tablet and are likely to be more sensitive to pathlength differences

than diffuse reflectance measurements. Batches 26, 27, 28, 29, 30, 31, 32, 38, 40 and 41

(BNs 5050, 5051, 5052, 5053, 5054, 5058, 5059, 5073, 5078, 5079) were found to be

unusual. These results were consistent between the models (Appendix B: Table B16),

and many of these batches are consecutive, suggesting deviation in process performance with time.

Examination of certificate of analysis data showed that these batches were thicker than

normal, confirming the NIR model predictions.

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

Between 23 and 32 batches were selected for control phase 1 (Appendix B: Table B43).

The PCA models produced from transmission and DT data required a preliminary

Monte Carlo search using the first two PCs and random selection of 6 batches to

identify unusual batches. All other models were able to identify unusual batches using

all of the modelled components in the Monte Carlo search.
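The idea of a preliminary search on the first two PCs with random selection of 6 batches can be sketched as follows. This is a plain random-subset search written for illustration only: the genetic algorithm element of the method is not reproduced, and the decision rule (the fraction of searches in which a batch falls outside a 99% T² limit built from the selected batches) is an assumption, as are all names and the synthetic scores.

```python
import numpy as np

def monte_carlo_flag_rate(scores, subset_size=6, n_searches=100, alpha=0.01, seed=0):
    """Fraction of random searches in which each batch falls outside the T2 limit.

    scores: (n_batches, 2) scores on the first two PCs.
    """
    rng = np.random.default_rng(seed)
    n_batches = scores.shape[0]
    nu = subset_size - 2
    f_crit = (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)      # exact F(2, nu) quantile
    limit = 2.0 * (subset_size + 1) * (subset_size - 1) / (subset_size * nu) * f_crit
    flagged = np.zeros(n_batches)
    tested = np.zeros(n_batches)
    for _ in range(n_searches):
        subset = rng.choice(n_batches, size=subset_size, replace=False)
        centre = scores[subset].mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(scores[subset], rowvar=False))
        d = scores - centre
        t2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)
        outside = np.ones(n_batches, dtype=bool)
        outside[subset] = False           # only judge batches not used as reference
        tested += outside
        flagged += outside & (t2 > limit)
    return flagged / np.maximum(tested, 1)

# Hypothetical PC1/PC2 scores for 40 batches, one of which is clearly unusual.
rng = np.random.default_rng(4)
scores = rng.normal(size=(40, 2))
scores[7] = [20.0, -20.0]
rates = monte_carlo_flag_rate(scores)
```

Batches with a high flag rate would be candidates for exclusion before the remaining batches are screened with all of the retained PCs.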

Hotelling’s T² Control Phase 2

All models identified similar batches of tablets as unusual (Appendix B: Table B43),

with a number of groups of consecutive batches identified as unusual, suggesting a drift

from normal process operating performance. Batches 24 to 31 (BN5048, BN5049,

BN5050, BN5051, BN5052, BN5053, BN5054, BN5058) and 40 and 41 (BN5078 and

BN5079) were identified as unusual (Appendix B: Table B43). These batches produced

tablets with average thickness above the limit of 4.6 mm, and batches 28 to 30

(BN5052, BN5053, BN5054) also produced tablets which were friable (2 mg, 4 mg and

1 mg respectively). Batches 40 and 41 produced tablets of average thickness above the

maximum limit and batch 40 produced tablets which were friable (1 mg). These tablets

had very large transmission values across the spectral range scanned. An example

control chart for Sg2d11 data is shown in Fig. 3.26B. These batches were clearly

observed as falling outside the 99% Hotelling’s T² control ellipse on the PC1 versus PC2

score plot of Sg2d11 data (Fig. 3.27).

Hotelling’s T² MEWMA Control Charts

With this control chart, the statistic was largely out of control for all batches with the

models produced from SNV and SNV-DT transmission measurements. The charts using

PC scores from transmission, DT and Sg2d11 transmission measurements were more

consistent with the Hotelling’s T² control phase 2 charts (Fig. 3.26C). Significant drift

(p = 0.01) in the process mean vector was observed from batch 24 (BN5048) onwards.

The value fell from batch 31 (BN5058), consistent with Hotelling’s T² control phase 2

charts; however, it began drifting again at batch 40 (BN5078), also consistent with previous

findings.

PC Shewhart Control Charts

Consistent results were obtained with this control chart across the models (Appendix B:

Table B28). Batches 30, 36 and 40 (BN5054, BN5070, BN5078) were identified as

significantly different (p = 0.01).

Process Variance: Anderson’s Asymptotic Normal Approximation

With this control chart, batches 30, 40 and 41 (BN5054, BN5078 and BN5079) were

found to have exhibited significant process variance across the models tested (Appendix

B: Table B43). Batches 30 and 40 produced tablets of an average thickness above the

maximum limit which were also friable (1 mg); batch 41 produced tablets which were

above the maximum average thickness.

Fig. 3.26. MSPC Control charts of higher strength tablet NIR transmission spectra
PC scores (Sg2d11 data) (n = 43 batches): A) Hotelling’s T² control phase 1; B)
Hotelling’s T² control phase 2; C) Hotelling’s T² MEWMA & D) Anderson’s
asymptotic normal approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

Fig. 3.27. PC2 versus PC1 scores plot of PCA scores of higher strength tablets (n =
43 batches) (Sg2d11 NIR transmission data) with Hotelling’s T² 95% and 99%
control ellipses.

3.10 Multivariate Statistical Quality Control of The Entire Process

In this section multiway PCA models are examined. For the two strengths of tablet, the

models were derived from both blend NIR spectral data and tablet NIR spectral data.

Different combinations of blend and tablet spectral data were examined. Consistency

between the results of these models was examined, and compared with reference

analysis data. The combinations of blend and tablet spectral data examined were:

1. Blend absorbance and tablet absorbance NIR data (including pre-treatments,

Section 3.6)

2. Blend absorbance and tablet transmission NIR data (including pre-treatments,

Section 3.6)

3. Blend absorbance and tablet absorbance and transmission NIR data (including

pre-treatments, Section 3.6)

The results of MSPC of each of these models were compared between MPCA models

and against results of MSPC of PCA models for the blend and tablet stage. They were

also compared against reference analysis data. These comparisons were made so that the

relative importance of tablet absorbance and transmission data for modelling and

monitoring process performance could be determined.
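One way of combining blend and tablet spectra into a single multiway data set is sketched below. It assumes one spectrum (for example a mean spectrum) per batch in each block, and scales each block to unit sum of squares so that no block dominates simply because of its measurement scale. The arrangement and scaling actually used for the MPCA models may differ; all names and data are hypothetical.

```python
import numpy as np

def unfold_blocks(*blocks):
    """Join per-batch data blocks side by side after centring and block scaling."""
    scaled = []
    for block in blocks:
        centred = block - block.mean(axis=0)
        scaled.append(centred / np.linalg.norm(centred))   # unit total sum of squares
    return np.hstack(scaled)

# Hypothetical blocks for 39 batches with different numbers of wavelengths.
rng = np.random.default_rng(3)
n_batches = 39
blend_abs = rng.normal(size=(n_batches, 100))
tablet_abs = rng.normal(size=(n_batches, 80))
tablet_trans = 50.0 * rng.normal(size=(n_batches, 60))   # deliberately larger scale

x_multi = unfold_blocks(blend_abs, tablet_abs, tablet_trans)
_, s, _ = np.linalg.svd(x_multi, full_matrices=False)
percent_ss = 100.0 * np.cumsum(s ** 2) / np.sum(s ** 2)
```

Dropping one of the tablet blocks from the call gives the other combinations listed above, so the same PCA and control chart calculations can be repeated for each combination and the results compared.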

3.10.1 Lower Strength Tablet Process

Multiway Principal Components Analysis

For the lower strength multiway models tested, between 3 and 8 PCs were required to

model the data sets (Appendix B: Tables B6, B8 & B10). The Sg2d11 transformation

produced models which required the fewest components and the multiway

model produced from raw blend and tablet absorbance and transmission data required

the most PCs. The loadings for each model appeared to represent chemical and physical

information; however, their precise interpretation with these data sets was difficult.

Unusual Batch Detection: Q Statistic

Among the models examined, the multiway models produced from SNV blend absorbance

and tablet absorbance data (Appendix B: Table B17) and Sg2d11 blend absorbance and

tablet transmission data (Appendix B: Table B19) were

able to identify groups of batches previously found to have processed unusually. These

were batches: 25 to 28 (BN5040, BN5041, BN5042, BN5055) and 33 (BN5064) and 35

to 37 (BN5067, BN5075, BN5076).

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

Between 11 and 39 batches were selected in control phase 1 using this algorithm, as for

other PCA models.

Hotelling’s T² Control Phase 2

MSPC control charts were found to perform best with blend and tablet transmission data

(Appendix B: Table B46) and blend and tablet absorbance and transmission data

(Appendix B: Table B48) (Fig. 3.28B). The MPCA models derived from pre-treated

blend and tablet absorbance data identified systematic variation in the process with the

first 18 batches selected for the control ellipsoid (Appendix B: Table B44). With these

batches, this systematic variation was traced to the blend data which showed two groups

of spectra differing only in offset.

Fig. 3.28. MSPC Control charts of lower strength tablet multiway PCA scores
(SNV-DT blend absorbance and tablet absorbance and transmission data) (n = 39
batches): A) Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C)
Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal approximation.
[Four panels: phase 1 chart against in-control batches; phase 2, MEWMA and process variability charts against future production batches.]

With the other MPCA models, batches 25, 26, 27 (BN5040, BN5041, BN5042) and 33,

34, 35 and 39 (BN5064, BN5065, BN5067, BN5967) were found to have processed

unusually with this statistic (p = 0.01). These results agree with previous blend and

tablet PCA models and could be traced to one or two PCs (Appendix B: Tables B44,

B46 and B48). An example PC2 versus PC1 score plot with Hotelling’s T² 95% and

99% control ellipses is shown in Fig. 3.29 for the SNV-DT multiway data set.

Hotelling’s T² MEWMA Control Charts

With the charts produced from the MPCA models containing transmission data, process

drift was observed from batch 23 (BN5039) onwards. The process mean vector drifted

out of control (p = 0.01) and, from batch 27 (BN5042), began to shift towards the grand

mean vector; however, further drift in process operating performance from batch 32 was

observed (Fig. 3.28C). These charts were very similar to those for the tablet models.

PC Shewhart Control Charts

With the MPCA models derived from blend and either tablet absorbance or transmission

measurements, batches 10 (BN5017) and 25 (BN5040) were found to be unusual at the

p = 0.01 level (Appendix B: Tables B29 & B31). The MPCA model derived from blend

absorbance and combined tablet absorbance and transmission measurements identified

batches 25 (BN5040), 35 & 36 (BN5067 and BN5075) as having processed unusually

(Appendix B: Table B33).

Fig. 3.29. PC2 versus PC1 scores plot of multiway PCA scores of lower strength
tablet process (SNV-DT blend absorbance and tablet absorbance and transmission
data) (n = 39 batches) with Hotelling’s T² 95% and 99% control ellipses.

Process Variance: Anderson’s Asymptotic Normal Approximation

With this control chart, some consistent results between MPCA charts were observed.

The overall MPCA model incorporating all NIR measurements identified batches 24

(BN5039 [re-blend]), 25 (BN5040) and 27 (BN5042) as having exhibited excessive

process variance (Appendix B: Table B48). The results most consistent with this were

obtained from the MPCA models calculated from blend and tablet transmission data

(Appendix B: Table B46) (Fig. 3.28D).

3.10.2 Higher Strength Tablet Process

Multiway Principal Components Analysis

Between 3 and 8 PCs were required to model the data sets. Raw data tended to produce

models requiring the most PCs (Appendix B: Tables B7, B9 & B11). Models produced

from SNV data which included SNV transmission spectra required only 3 PCs

(Appendix B: Tables B9 & B11), as did the Sg2d11 model produced from blend and

tablet absorbance and transmission measurements (Appendix B: Table B11).

Unusual Batch Detection: Q Statistic

With this control chart, consistent results were produced between models derived from

blend and tablet absorbance data and blend and tablet transmission data (Appendix B:

Tables B18 & B20). Batches found to have processed unusually and therefore not fit the

model plane were: 23 to 25 (BN5049, BN5050, BN5051); 27 and 28 (BN5053 and

BN5054) and 38 and 39 (BN5078 and BN5079). These results are consistent with

previous PCA models. These groups of consecutive batch numbers show that the

process is deviating from its optimum conditions with time.

Hotelling’s T² Control Phase 1: Monte Carlo Genetic Algorithm

Between 18 and 41 batches were selected in control phase 1 using this algorithm, as for

other PCA models.

Hotelling’s T² Control Phase 2

With this control chart, models produced from blend and tablet transmission and blend

and tablet absorbance and transmission measurements produced the most consistent

results which were also in agreement with those of previous PCA models (Appendix B:

Tables B47 & B49). This is shown in Fig. 3.30B for the Sg2d11 multiway data set.

Batches: 23 to 29 (BN5049, BN5050, BN5051, BN5052, BN5053, BN5054, BN5058);

34 (BN5070) and 38 and 39 (BN5078 and BN5079) were found to have processed

unusually, typically on the first PC (Appendix B: Tables B47 & B49). These results

were clearly observed on PC1 versus PC2 scores plots for Sg2d11 data with Hotelling’s

T² 95% and 99% control ellipses (Fig. 3.31).

Hotelling’s T² MEWMA Control Charts

The control charts produced from MPCA models which included transmission data

were very consistent with those of PCA data for higher strength tablet transmission

measurements. With the overall MPCA models (blend and tablet absorbance and

transmission measurements), drift in the process mean vector was observed for all

control charts. With raw, DT and Sg2d11 data (Fig. 3.30C), this occurred from batch 23

(BN5049) onwards, and shifted toward the process mean vector at batch 27 (BN5053),

and then drifted further from batch 33 (BN5069). With the SNV DT model, the process

mean vector drifted within control (p = 0.01) at batch 30 (BN5059), however it then

drifted out of control from batch 32 (BN5068). The SNV DT Hotelling’s T² MEWMA

chart performed best.
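
The MEWMA statistic itself is not written out in this section. For orientation, its usual textbook form is sketched below. The notation is assumed here rather than taken from the thesis: x_i is the vector of PC scores of batch i expressed as a deviation from the in-control mean vector, Σ is the in-control variance-covariance matrix of the scores, and λ (0 < λ ≤ 1) is the smoothing constant, whose value in this study is not stated in this section.

```latex
\begin{aligned}
\mathbf{z}_i &= \lambda\,\mathbf{x}_i + (1-\lambda)\,\mathbf{z}_{i-1},
  \qquad \mathbf{z}_0 = \mathbf{0},\\[4pt]
\boldsymbol{\Sigma}_{z_i} &= \frac{\lambda}{2-\lambda}
  \left[\,1-(1-\lambda)^{2i}\right]\boldsymbol{\Sigma},\\[4pt]
T_i^{2} &= \mathbf{z}_i^{\mathsf T}\,\boldsymbol{\Sigma}_{z_i}^{-1}\,\mathbf{z}_i .
\end{aligned}
```

Because each z_i accumulates information from preceding batches, a small but persistent shift in the mean vector raises T_i² over successive batches, which is why this chart responds to gradual drift of the kind described above. With λ = 1 the statistic reduces to the ordinary Hotelling’s T² of each batch.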

PC Shewhart Control Charts

With these control charts, batches 24 (BN5050), 34 (BN5070) and 38 (BN5078) were

identified as unusual (Appendix B: Tables B30, B32 & B34). As with the Hotelling’s T²

Fig. 3.30. MSPC control charts of higher strength tablet multiway PCA scores
(Sg2d11 blend absorbance and tablet absorbance and transmission data) (n = 41
batches): A) Hotelling’s T² control phase 1; B) Hotelling’s T² control phase 2; C)
Hotelling’s T² MEWMA & D) Anderson’s asymptotic normal approximation.
Fig. 3.31. PC2 versus PC1 scores plot of multiway PCA scores of higher strength
tablet process (Sg2d11 blend absorbance and tablet absorbance and transmission
data) (n = 41 batches) with Hotelling’s T² 95% and 99% control ellipses.
control charts, MPCA models which included tablet transmission data produced the most

consistent results (Appendix B: Tables B30 & B34).

Process Variance: Anderson’s Asymptotic Normal Approximation

With these control charts, batches 20 (BN5046), 23 (BN5049), 28 (BN5054), 38

(BN5078) and 40 (BN5080) were found to have exhibited significant process variance

with the MPCA data sets which included transmission measurements (Appendix B:

Tables B47 & B49) (Fig. 3.30D).

3.11 Summary of Results

The results of this chapter are summarised in Tables 3.7 and 3.8.

Table 3.7. Summary of results for lower strength tablet process.


Batch NIR blend absorbance NIR tablet absorbance NIR tablet transmission Multiway NIR data Unusual C. of A. Comment
number data predicted unusual data predicted unusual data predicted unusual predicted unusual data

Q T² A Q T² A Q T² A Q T² A

4993 1 •
4994 2 • • •
4996 3 • • • •
4997 4 • . . • .
4998 5
5002 6
5003 7 • •
5013 8 . •
5015 9
5017 10 • Increase in content uniformity
5018 11 • . Increase in content uniformity
5025 12 •
5025 rb 13 •
5026 14 •
5027 15
5028 16
5028 rb 17
5029 18 . •
5035 19
5036 20
5037 21
5038 22
5039 23 Excessive moisture deviation
5039 rb 24 Excessive moisture deviation
5040 25 . •
5041 26 Friability of Img
5042 27 Friability of 2mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 Friability of Img
5065 34 • N/A C. of A. data missing
5067 35 N/A C. of A. data missing
5075 36
5076 37 • Low drug content per tablet
5077 38
5967 39 •

A represents Anderson’s asymptotic normal approximation.

rb denotes a re-blended batch.

N/A denotes certificate of analysis data not available.

Table 3.8. Summary of results for higher strength tablet process.
Batch NIR blend absorbance NIR tablet absorbance NIR tablet transmission Multiway NIR data Unusual C. of A. Comment
number data predicted unusual data predicted unusual data predicted unusual predicted unusual data

Q T² A Q T² A Q T² A Q T² A

4991 1
4992 2
4999 3
5000 4
5001 5
5009 6
5010 7
5011 8
5021 9
5022 10
5023 11
5024 12
5031 13
5032 14
5033 15
5043 16
5043 rb 17
5043 rb 18
5044 19 .
5046 20 •
5047 21
5048 22
5049 23 • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 Thick tablets (4.62 mm)
5051 25 Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • • Friability 4 mg; tablet thickness = 4.63 mm
5058 29 . • Thick tablets (>4.6 mm)
5059 30 Thick tablets (>4.6 mm)
5060 31 Thick tablets (>4.6 mm)
5068 32 Thick tablets (4.62 mm)
5069 33 Thick tablets (4.62 mm)
5070 34 • • Thick tablets (4.61 mm)
5071 35 Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36 •
5074 37 Friability of 4 mg
5078 38 • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 . • Thick tablets (4.62 mm)
5080 40 •
5081 41 •

A represents Anderson’s asymptotic normal approximation.

rb denotes a re-blended batch.

3.12 Conclusion

The aim of this study was to determine whether NIR spectrometry could be used for

quality control of a pharmaceutical tablet manufacturing process through application of

MSPC procedures. The ability of the method to identify trends in process performance

and their relationship to reference analytical product quality data were therefore

determined. Statistical correlations were also made of MSPC results with raw material

batch usage data, in an attempt to determine whether particular batches of raw materials

lead to poor process performance.

The MSPC procedures were applied to PCs of the NIR spectral data. These were used as

they are linear combinations of the original data which summarise the systematic

variability in a ‘least squares sense’. The dimensionality of the data was therefore

reduced to a few variables, without loss of information. This allowed easier process

monitoring.

The statistical assumptions made of the data were that most batches of product were

from the same multinormal population and were produced whilst the process operated

within statistical control. This assumption was verified by reference of MSPC control

phase batches to their reference analytical data, i.e. all control phase 1 batches produced

high quality tablets.

Trends in process performance could be detected at each process stage by the MSPC

procedures. Systematic variability could be identified in the blending process from raw

or pretreated spectral data. Within this period of systematic variability, a number of

batches of blend were produced which were of unusual quality or which produced

tablets of unusual quality. These exhibited excessive blend moisture deviation and

deviation in drug substance content, and were identified as significantly different by the

NIR method. Some of these blends were not tabletted. The placebo blend was also

identified as unusual by the NIR method and indicates that this method may be useful

for surveillance of counterfeit medicines. Many of the other blends identified as unusual

by the NIR method, but not having shown unusual reference analysis results, ultimately

produced batches of unusual tablets. The unusual product qualities included tablet

friability, increased tablet thickness and prolonged dissolution time. The differences in

unusual product quality may relate to the particular batch numbers of raw materials used

as significant correlations were observed with MSPC results.

The NIR measurements required to implement the MSPC method include diffuse

reflectance measurements of the blends and both diffuse reflectance and transmission

measurements of the tablets (in this study these raw data were transformed to apparent

absorbance data). Though batch averaged spectra were used for PCA, several

measurements of blends and tablets are required for each batch for detection of process

variability by the NIR method, e.g. excessive moisture deviation in blends.

With both strengths of tablet, diffuse reflectance and transmission measurements are

necessary and provide different, but complementary information. The diffuse

reflectance measurements (expressed as apparent absorbance) were useful for detecting

physical anomalies which affected the tablet surface, e.g. friability. The transmission

measurements (expressed as apparent absorbance) were useful for qualifying tablet

thickness and drug substance content. Hence both measurements should be made. The

most useful scatter correction pre-treatments were SNV-DT and Sg2d11.

Overall, the results suggest that with a properly validated reference set of blend and

tablet NIR measurements, the MSPC method could be solely used for quality control

and assurance of the process at the blending and tabletting stage and to monitor process

performance with time.

CHAPTER 4

Multivariate Statistical Process Control of a

Pharmaceutical Process Using Partial Least

Squares Regression (PLSR) of Near Infrared

and Reference Analysis Measurements

4.1 Introduction

In Chapter 3, MSPC procedures were applied to PC scores of NIR spectral

measurements of blends and tablets. The ability of this ‘model-free’ approach to process

control and monitoring was determined by comparison of MSPC results with reference

analytical measurements. In this chapter, the multivariate methods known as partial

least squares regression (PLSR) and multiblock* PLSR are examined. These methods

maximise the covariance between NIR spectral measurements and reference analytical

measurements and therefore produce latent vectors which are most closely related to the

reference analytical values. The PLS scores produced may be monitored in the same

manner as described in Chapter 3 for PC scores.

In Section 4.3, singleblock† PLS models of blends and tablets are

subjected to MSPC. The predictive abilities of these models are compared with those of

Chapter 3.

Section 4.4 models the entire process by multiblock PLSR. Conclusions regarding these

methods of process monitoring are discussed in Section 4.6.

* multiblock data sets are a collection of three-way data sets of process data at each process stage.
† singleblock data sets are three-way data sets of process data of one process stage.
4.2 Near Infrared And Reference Analysis Data Sets Used

In this study, the NIR spectral and reference analysis data sets (Section 3.4) used were

those of Section 3.6.3 for multiway PCA of the process for the two strengths of tablet

(Tables 3.4 and 3.5).

The number of batches of blends and tablets used in the manufacture of the lower

strength tablet was therefore 39 (Table 3.6). For the higher strength tablet process, 41

batches of blends and their corresponding tablets were examined (Table 3.6).

For a few of the batches produced, some reference analysis data were missing.

4.2.1 Data Analysis And Pre-treatment

The spectral data were analysed using code programmed in Matlab 5.2 Scientific and

Technical Programming Language (The Mathworks Inc., Natick, MA, USA). A number

of different pre-treatments of blend and tablet data were examined. The data

pre-treatments examined were those which were previously shown to produce the best

multivariate principal components analysis process models for such data (Section 3.11).

These were:

1. Raw spectral data (absorbance/transmission);

2. SNV-DT;

3. Sg2d11.

These 2 pre-treatments were applied to blend absorbance data, and absorbance and

transmission data for each of the two strengths of tablet. This provided 18 data sets

(including multiway data sets of blend and appended tablet absorbance and transmission

data) to be studied via projection to latent structures and multiblock projection to latent

structures.
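
To make the scatter-correction step more concrete, the following is a minimal sketch of the standard normal variate (SNV) part of the SNV-DT pre-treatment: each spectrum is centred and scaled by its own mean and standard deviation. The function name and the example spectra are hypothetical, and the de-trending (DT) step and the Savitzky-Golay derivative are deliberately omitted, so this is not the full pre-treatment used in the study.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own
    mean and (sample) standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]

# Two hypothetical spectra that differ only by an offset and a scale factor,
# mimicking additive and multiplicative scatter effects.
base = [0.20, 0.25, 0.40, 0.35, 0.30]
shifted = [2.0 * a + 0.1 for a in base]

snv_base = snv(base)
snv_shifted = snv(shifted)

# After SNV the two spectra coincide (to rounding error).
print(max(abs(a - b) for a, b in zip(snv_base, snv_shifted)))
```

The point of the example is that offset and scaling differences between spectra, which would otherwise dominate the first latent variables, are removed before modelling.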

4.3 Statistical Quality Control of Pharmaceutical Blends And Tablets by Single

Block PLSR

The method of PLS summarises the important variability in both the process (NIR

spectral data) (X) and the final product quality data (Y) (Morud, 1996). This procedure

projects the information in the high-dimensional data spaces (X, Y) down onto

low-dimensional spaces defined by a small number of latent variables. The NIR (X) and

final product quality (Y) data sets were first mean-centred and scaled to unit variance

and then decomposed according to equations (1.7.18) and (1.7.19) respectively.

The number of PLS components required to extract the information from X and Y was

judged to be 6 components for all models. This number of components was selected as

it modelled a considerable amount of the Y data and was also the number of

components used with most principal components analysis models in Chapter 3. An

advantage of this algorithm is its ability to handle missing data.
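
The mean-centring and unit-variance scaling applied to X and Y before decomposition can be sketched as follows. This is an illustrative helper only: the function name and the example values are hypothetical, and the sketch assumes complete data, so it does not reproduce the missing-data handling mentioned above.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Mean-centre each column and scale it to unit variance.

    rows: list of equal-length lists (one row per batch).
    Returns (scaled_rows, column_means, column_sds).
    A constant column would give a zero standard deviation and is not handled.
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]
    return scaled, means, sds

# Hypothetical quality data: 4 batches x 2 variables
# (for example tablet thickness in mm and moisture in %).
Y = [[4.58, 3.9], [4.60, 4.1], [4.62, 4.4], [4.60, 4.0]]
Y_scaled, Y_means, Y_sds = autoscale(Y)
print(Y_scaled[0])
```

Scaling to unit variance gives each quality variable the same weight in the model regardless of its units; the stored means and standard deviations would be reused to scale any new batch in the same way.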

4.3.1 Singleblock PLS Model Variability

Singleblock PLS models (Wangen and Kowalski, 1988) were created for raw and

pre-treated spectral data sets (SNV DT, 11 point quadratic Savitzky-Golay smoothed 2nd

derivative) for the blends used to produce lower strength tablets (n = 39 batches) and for

combined absorbance and transmission measurements of the lower strength tablets (n =

39 batches). The models were calculated using the average spectrum of those recorded

for each batch. The average spectrum of each batch was used instead of several

measurements to eliminate systematic variability within measurements of the same

batch introduced by particle size effects and differences in scatter from the surfaces of

the glass vials and tablets and also because average measurements tend to follow a

normal distribution.

Single block PLS models were also created for raw and pre-treated NIR data sets of

blends used to produce higher strength tablets (n = 41 batches) and of combined NIR

absorbance and transmission measurements of higher strength tablets (n = 41 batches)

using the average spectrum of each batch. This produced a total of 12 models of blends

and tablets for each strength and for raw and pre-treated data sets which were monitored

subsequently.

For each of the singleblock PLS models, 6 components were considered to be an

appropriate rank for the models. The decision to use this size of model was based on

previous experience of multivariate PCA projection of these data and because this rank

explained most variability within the NIR data sets and significant amounts of

variability within the certificate of analysis data sets (Appendix C: Tables C1 to C4).

With raw data sets for blend and tablet models, the high amount of variance explained,

typically above 99.6%, is due to multiplicative scatter within the data sets which

accounts for most variability in the spectra. The amount of variance accounted for by

these models of the certificate of analysis data was therefore not surprisingly lower: 37

and 53% for lower strength blend and tablet models respectively and 27 and 29% for

higher strength blend and tablet models respectively. The lower variability accounted

for in the certificate of analysis data probably arises from the fact that these data do not

contain any particle size information.

With the SNV DT singleblock PLS models, slightly lower variability in the NIR data

sets is accounted for by the models (Appendix C: Tables C1 to C4) due to scatter

correction. The 11 point Savitzky-Golay 2nd derivative transformation effectively

removed much of the multiple scatter information from the NIR data sets and produced

single block PLS models which accounted for similar amounts of variability for both the NIR

and certificate of analysis data sets (Appendix C: Tables C1 to C4).

4.3.2 Quantitative Calibration of Individual Certificate of Analysis Blend and

Tablet Variables by Partial Least Squares Regression

Multivariate projection methods for monitoring process operating performance of

multivariate processes have been shown to work well where all process data and

product quality data are monitored (MacGregor et al, 1994). This is because the

variability within and between process and product quality variables is required to

model the process (MacGregor et al, 1994). In this study, the ability to produce

quantitative models of the individual certificate of analysis variables was investigated.

Only low amounts of variability could be modelled for some of the individual blend and

tablet variables, and the models were not considered accurate enough for future prediction of

individual reference analysis variables (Appendix C: Tables C7 & C8). The low

variance within each variable is likely to be the reason for this.

4.3.3 Singleblock PLS Loadings

The loadings of the single block PLS models appeared to represent physical and

chemical information. However, their precise interpretation was difficult, especially

with the combined tablet absorbance and transmission data.

4.3.4 Q Statistic Monitoring of Unusual Batches

The performance of this control chart was determined for each model type and for the

different data pre-treatments by comparison of results above the 99% significance level

with the results from the certificate of analysis reference data (i.e. significant Q values

and corresponding unusual reference analysis values). In particular, the ability of this

chart to identify anomalies in the process in batches preceding those which produced

lower quality product and failed reference laboratory tests, was examined as this would

be useful for detecting trends in the process over time. PLS models for each process

stage (blend and tablet) were created for the different pre-treated data sets in a recursive

fashion, with batches whose Q statistic exceeded the 99% significance level excluded

from the model. Each PLS model had 6 components.
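
The quantity charted here can be illustrated with a short sketch: for one batch, the Q statistic is taken as the sum of squared residuals left after the pre-processed spectrum is reconstructed from its scores and the model loadings. The function name, the one-component model and the numbers are hypothetical; the calculation of the 99% limit and the recursive refitting after excluding batches are not shown.

```python
def q_statistic(x_row, scores, loadings):
    """Sum of squared residuals of one (pre-processed) spectrum after
    reconstruction from its scores.

    x_row: list of J pre-processed values for one batch
    scores: list of A scores for that batch
    loadings: list of A loading vectors, each of length J
    """
    reconstructed = [
        sum(t * p[j] for t, p in zip(scores, loadings))
        for j in range(len(x_row))
    ]
    return sum((x - r) ** 2 for x, r in zip(x_row, reconstructed))

# Hypothetical one-component model in three variables.
loadings = [[0.6, 0.8, 0.0]]
on_plane = q_statistic([1.2, 1.6, 0.0], [2.0], loadings)   # lies in the model plane
off_plane = q_statistic([1.2, 1.6, 0.5], [2.0], loadings)  # 0.5 left unexplained
print(on_plane, off_plane)
```

A batch with a large Q value therefore contains variation that the retained components do not describe, which is the sense in which such a batch does not fit the model plane.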

Singleblock PLS Models

With the lower strength tablet data set, raw data showed batches 22 to 26 as having

processed unusually at the blend stage (Appendix C: Table C9). Of these batches,

batches 23 (BN5039) and 24 (BN5039 re-blend) were re-blends of a blend which was found

to exhibit excessive moisture deviation throughout (moisture deviation for batch 23 =

5.94%, limit: < 5%). With the SNV DT and Savitzky-Golay 2nd derivative blend data

sets, batch 25 (BN5040) was found to have processed unusually despite normal

certificate of analysis data (Appendix C: Table C9). With the SNV DT data set, batches

34 (BN5065), 35 (BN5067) and 36 (BN5075) were found to have processed unusually

at the blend stage (Appendix C: Table C9) but had reference laboratory results within

limits. The PLS models for lower strength tablets showed some agreement with this

result: batches 36 and 37 (BN5076) were outside the 99% limit with the SNV DT and

Savitzky-Golay 2nd derivative models; batch 32 (BN5062) was outside the control limit

with raw and SNV DT models and batch 33 (BN5064) was outside the limit for raw

data (Appendix C: Table C10). The certificate of analysis results showed that batch 33

produced tablets which were friable (1 mg). Reference data for batches 34 and 35 were

missing. For batch 37 the drug substance content was found to be low (4.83 mg/tablet,

range: 4.85 to 5.15 mg/tablet). With the Savitzky-Golay tablet model, batches 26

(BN5041), 27 (BN5042) and 28 (BN5055) were found to have processed unusually

(Appendix C: Table C10). Their certificate of analysis data revealed that batches 26 and

27 showed friability of 1 mg and 2 mg respectively.

Clearly, these control charts were able to detect unusual process behaviour at the blend

and tablet stages with consistency between results for the two stages. Raw data and

SNV DT data sets were able to detect process anomalies at the blend stage which were

not detected by the reference analytical data.

With the higher strength tablet data set, batches 35 (BN5071), 36 (BN5073), 37

(BN5074), 38 (BN5078), and 39 (BN5079) were all found to exceed the 99% limit with

the SNV DT blend data set (Appendix C: Table C11). The certificate of analysis data

for the blends of these batches were all within limits. However, the reference analytical

data showed that batches 35, 37 and 38 produced tablets which exhibited friability of 1

mg, 4 mg and 1 mg respectively. In addition, batches 35, 38 and 39 also had average

tablet thicknesses which were above the limit of 4.6 mm (all 4.62 mm). Batches 32

(BN5068), 35 and 38 were outside the 99% limit for the Q statistic with the Savitzky-

Golay 2nd derivative blend data (Appendix C: Table C11) and batches 33 (BN5069), 34

(BN5070) and 39 were outside the 99% limit with the blend raw data (Appendix C:

Table C11). Batches 32 and 33 did not produce tablets which were friable, but the

average tablet thicknesses for these batches were 4.62 mm and 4.61 mm respectively -

outside the limit. Batch 34 produced tablets which showed longer than normal

disintegration time (17 seconds) and had an average tablet thickness of 4.66 mm, which

is above the limit. The tablet Q statistic control charts which performed best were those

for raw and Savitzky-Golay 2nd derivative data. These identified batches 35 and 38 and

batches 35, 37 and 39 as unusual respectively (Appendix C: Table C12). With the

Savitzky-Golay 2nd derivative tablet PLS model, batches 27 (BN5053) and 28

(BN5054) were identified as unusual (Appendix C: Table C12). These batches were

both friable (4 mg and 1 mg respectively) and batch 28 had an average tablet thickness

of 4.63 mm. With the higher strength tablet raw data PLS model, batches 28 (BN5054),

29 (BN5058), 30 (BN5059) and 31 (BN5060) were identified as unusual (Appendix C:

Table C12). Batches 29 to 31 had average tablet thicknesses above the limit of 4.6 mm.

The Q statistic control charts for the higher strength tablet process were also able to

identify batches and groups of consecutive batches which differed from the normal data

set at both the blend and tablet stages. The manufacture of these batches ultimately

produced tablets of lower quality. This deviation from normal process operating

performance was identifiable from blend data despite showing no unusual reference

analytical results at that stage. Overall, the SNV DT data set showed the best

performance for the higher strength tablet process of the pre-treatments tested.

4.3.5 MSPC of Singleblock PLS Models of Blends And Tablets

The PLS scores of the single block models corresponding to the latent vectors of the

NIR data were used for MSPC monitoring. Estimation of the control phase 1 batches

was performed by a Monte Carlo simulation. This involved random selection of the

scores of 8 or 9 batches and construction of 99% Hotelling’s T² control ellipses as

described in Chapter 3. The Hotelling’s T² distance was then measured for the scores of

the remaining batches from this ellipse and batches which had significant values were

recorded. This process was repeated 200 times to produce a frequency bar chart which

showed the frequency that any batch had been found to be significantly different from

the control group. Batches which had a frequency greater than zero were not used for

estimation of the process variance-covariance matrix and process mean vector. All

batches were then monitored in control phase 2 using those batches which were deemed

to be in-control (i.e. Monte Carlo bar chart frequency = 0) as the control phase 1 group.
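
The Monte Carlo selection of control phase 1 batches described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in the study: only two score dimensions are used so that the covariance matrix can be inverted by hand, the 99% limit is taken from the chi-square distribution with 2 degrees of freedom instead of an F-based Hotelling’s T² limit, and the function names, the scores and the random seed are hypothetical.

```python
import math
import random

def mean_and_cov(points):
    """Sample mean vector and 2x2 covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def t_squared(point, centre, cov):
    """Hotelling's T^2 distance of one 2-D point from the centre."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def monte_carlo_phase1(scores, subset_size=8, repeats=200, p=0.01, seed=0):
    """Count how often each batch falls outside the ellipse of a random subset."""
    rng = random.Random(seed)
    limit = -2.0 * math.log(p)  # chi-square (2 df) quantile; an assumption
    counts = [0] * len(scores)
    for _ in range(repeats):
        chosen = set(rng.sample(range(len(scores)), subset_size))
        centre, cov = mean_and_cov([scores[i] for i in chosen])
        for i, point in enumerate(scores):
            if i not in chosen and t_squared(point, centre, cov) > limit:
                counts[i] += 1
    # Batches never flagged (frequency = 0) form the control phase 1 group.
    in_control = [i + 1 for i, c in enumerate(counts) if c == 0]
    return counts, in_control

# Hypothetical scores: 12 similar batches and one clearly different batch (13).
scores = [(0.1, 0.2), (-0.3, 0.1), (0.4, -0.2), (-0.1, -0.4), (0.3, 0.3),
          (-0.2, 0.5), (0.5, 0.1), (-0.4, -0.3), (0.2, -0.5), (0.0, 0.4),
          (-0.5, 0.0), (0.1, -0.1), (3.0, 3.0)]
counts, in_control = monte_carlo_phase1(scores)
print(counts)
print(in_control)
```

The frequency counts play the role of the bar chart described above. Requiring a frequency of zero is a strict criterion, so with small random subsets some ordinary batches may also be excluded from the phase 1 group.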

Hotelling’s T² Control Phase 1 For Blends And Tablets

With raw data for the blends of lower strength tablet process and higher strength tablet

process batches, the Monte Carlo search was unable to identify any unusual batches

(Appendix C: Tables C17 and C18). Most batches were therefore considered by the

algorithm to be in-control (n = 39 batches of lower strength tablet process blends, n = 36

batches of higher strength tablet process blends). For both processes, examination of the

scores revealed that they were evenly divided into two clusters. This was found to be

due to a difference in offset in the original blend absorbance data (Fig. 4.1), probably

arising from different particle size distributions and porosity. With the PLS models

produced from raw tablet data, these distinct clusters were not observed on the score

plots, with some batches identified as unusual at control phase 1. The results of the

tablet models produced from raw data for both strengths of tablet showed agreement

with results from the Q statistic control charts (Appendix C: Tables C19 & C20). Because of

the offset in the blend spectra, raw data were not considered useful for monitoring the blends and spectral scatter

correction was considered necessary. With the SNV DT transformation, 17 and 31

batches were used in control phase 1 for the lower strength tablet process and higher

strength tablet process blends respectively (Appendix C: Tables C17 & C18). With the

Savitzky-Golay 2nd derivative data, 29 and 32 batches were used in control phase 1 for

the lower strength tablet process and higher strength tablet process blends (Appendix C:

Tables C17 & C18). PLS models for the two strengths of tablet used between 17 and 28

batches for the lower strength tablet process tablets (Appendix C: Table C19) and used

between 19 and 34 batches for the higher strength tablet process tablets (Appendix C:

Table C20).

Hotelling’s T² Control Phase 2 For Blends And Tablets

Scatter correction was considered a useful pre-treatment of the blend absorbance data.

With the SNV DT scatter correction of the lower strength tablet process data set, a

number of batches, some consecutive in batch number, were found to have significant

Hotelling’s T² values (Appendix C: Table C17). These were compared with results of

Fig. 4.1. Blends batch mean absorbance spectra (absorbance versus wavelength/nm) for: A) lower strength tablet
process (n = 39 spectra) and B) higher strength tablet process (n = 41 spectra),
showing spectra for each process divided into two classes with different offsets
(batches 1 to 18 for lower strength process blends have lowest offsets, batches: 1 to
17; 19 to 20; and 23 to 25 for higher strength process blends have lowest offsets).
the SNV DT lower strength tablet data set. Batches

25 (BN5040), 26 (BN5041) and 27 (BN5042) and batches 33 (BN5064) and 35

(BN5067) were found to have significant values at both the blend and tablet stage

(Appendix C: Tables C17 & C19). Batches 26 and 27 produced tablets with friability of

1 mg and 2 mg; batch 33 produced tablets with friability of 1 mg (tablet reference

analysis data for batch 35 was not recorded). The Savitzky-Golay 2nd derivative

transformation did not detect these batches as unusual from their blends, however

batches 33, 34 (BN5065) and 35 were detected as unusual from the tablet PLS model

(Appendix C: Table C19) (tablet reference analysis data for batch 34 was not recorded).

For the lower strength tablet process, the SNV DT transformation was considered to be

the most appropriate pre-treatment of those tested for blend and tablet data.

With the higher strength tablet process data sets, the SNV DT transformation detected

batches 23 (BN5049) and 24 (BN5050) as unusual at both the blend and tablet stages.

Batch 25 (BN5051) was detected as unusual at the blend stage. The blends were not

found to have unusual reference analysis values, however they produced tablets with

average thicknesses of 4.62 mm, above the limit of 4.6 mm. The SNV DT and Savitzky-

Golay tablet models detected batches 34 (BN5070) and 38 (BN5078) and 39 (BN5079)

as unusual (Appendix C: Table C20). Batch 34 produced tablets with unusually long

disintegration time (17 seconds) and which had an average thickness greater than the

maximum limit (average thickness = 4.66 mm). Batches 38 and 39 had average tablet

thicknesses above the maximum limit (4.62 mm for both batches) and batch 38

produced tablets with 1 mg friability. With the Savitzky-Golay blend model, batch 38

could be detected as unusual despite having apparently normal reference analysis results

(Appendix C: Table C18). Both the SNV DT and Savitzky-Golay 2nd derivative

pre-treatments produced useful blend and tablet models.

4.4 Statistical Quality Control of The Entire Process by Multiblock PLSR

4.4.1 Multiblock Partial Least Squares Model Generation

Multiblock PLS has been proposed as an alternative projection method to single block

PLS for situations with large numbers of variables that can be divided into distinct

process sections (X blocks) (MacGregor et al, 1994; Wangen and Kowalski, 1988). The

data sets used in this study may be considered as two process X blocks: blend stage, X1

(blend spectra) and tablet stage, X2 (combined tablet absorbance and transmission data).

The final product quality data, Y, are the blend and tablet certificate of analysis

reference data combined (14 variables). MacGregor (1994) states that multiblock

projection methods allow for easier interpretation of process data because smaller

meaningful blocks can be individually monitored as may the relationship between these

blocks.

The multiblock PLS algorithms used in this study were variations of those of Wold et al

(1987) and Wangen and Kowalski (1988). This algorithm leads to a set of orthogonal

loading vectors ($w_{i,a}$, a = 1, 2, ...) and orthogonal latent vectors ($t_{i,a}$, a = 1, 2, ...) for

each block $X_i$. The $X_i$ blocks are then represented in terms of their leading A PLS

components as:

$$X_1 = \sum_{a=1}^{A} t_{1,a}\, p_{1,a}^{T} + E_1 \qquad (4.4.1)$$

$$X_2 = \sum_{a=1}^{A} t_{2,a}\, p_{2,a}^{T} + E_2 \qquad (4.4.2)$$

This enables monitoring and construction of diagnostic plots for each block separately,

as previously described for singleblock PLS. This algorithm is also able to effectively

handle missing data. An overall monitoring space for the process may be obtained by

using projections in the latent vector space ($t_{C,a}$, a = 1, 2, ...) of the consensus matrix T

formed by collecting the latent vectors from the individual blocks. The score vectors of

this consensus matrix are no longer orthogonal, however it has been shown that where

blocking of the process variables has been done in a meaningful fashion, these vectors

should continue to define the same plane as the latent vectors obtained by single block

PLS, and provide essentially the same predictions of Y:

$$Y = \sum_{a=1}^{A} t_{C,a}\, q_{a}^{T} \qquad (4.4.3)$$

A check on whether the blocking has been done well is to compare predictions of Y

obtained from the singleblock and multiblock algorithms for the same number of

dimensions (A). These should be comparable.

4.4.2 Multiblock PLS Model Variability

The multiblock PLS models were found to account for similar amounts of variability

within each process stage NIR data set and for the certificate of analysis data as was

explained by the single block PLS models (Appendix C: Tables C5 to C6). The raw and

SNV DT models explained considerably more variability within the NIR data sets than

in the certificate of analysis data sets, as with single block PLS models. The Savitzky-

Golay smoothed second derivative produced multiblock PLS models which accounted

for similar amounts of variability within the NIR and certificate of analysis data sets

(Appendix C: Tables C5 to C6).

These results suggest that the multiblock models are able to model the data as

effectively as the single block PLS models for each stage of the process and that the

Savitzky-Golay 2nd derivative transformation produces models which explain similar

amounts of variation in both the NIR and reference analytical data.

4.4.3 Multiblock PLS Loadings

The loadings of the multiblock PLS models appeared to represent physical and chemical

information. However, their precise interpretation was difficult, especially with the

combined tablet absorbance and transmission data.

4.4.4 Q Statistic Monitoring of Unusual Batches

The performance of this control chart was determined for each model type and for the

different data pre-treatments by comparison of results above the 99% significance level

with the results from the certificate of analysis reference data (i.e. significant Q values

and corresponding unusual reference analysis values). In particular, the ability of this

chart to identify anomalies in the process in batches preceding those which produced

lower quality product and failed reference laboratory tests, was examined as this would

be useful for detecting trends in the process over time. MB PLS models for each process

stage (blend and tablet) were created for the different pre-treated data sets in a recursive

fashion, with batches whose Q statistic exceeded the 99% significance level excluded

from the model. Each MB PLS model had 6 components.
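
The Q statistic used for these charts can be illustrated with a minimal, dependency-free sketch. It assumes the loading vectors are orthonormal (as in PCA), so that scores are simple projections; the PLS and multiblock PLS models of this study would compute scores from their own weights. The function names, the toy spectra and the limit of 1.0 are hypothetical and chosen only for illustration; they are not the 99% limits used in the thesis.

```python
def q_statistic(x, mean, loadings):
    """Squared prediction error (Q) of one spectrum.

    x        : list of floats, one spectrum
    mean     : list of floats, mean spectrum of the reference set
    loadings : list of loading vectors, assumed orthonormal (illustrative assumption)
    """
    xc = [xi - mi for xi, mi in zip(x, mean)]
    # Scores: projection of the centred spectrum onto each loading vector.
    scores = [sum(pj * xj for pj, xj in zip(p, xc)) for p in loadings]
    # Reconstruction of the spectrum from the retained components only.
    x_hat = [sum(t * p[j] for t, p in zip(scores, loadings)) for j in range(len(xc))]
    # Q is the sum of squared residuals left unexplained by the model.
    return sum((a - b) ** 2 for a, b in zip(xc, x_hat))


def flag_unusual(q_values, limit):
    """Return 1-based observation numbers whose Q exceeds the limit."""
    return [i for i, q in enumerate(q_values, start=1) if q > limit]


# Toy data: three "wavelengths", one retained component along the first axis.
mean = [0.0, 0.0, 0.0]
loadings = [[1.0, 0.0, 0.0]]
spectra = [[2.0, 0.1, 0.0], [1.0, 0.0, 0.1], [1.5, 2.0, 1.0]]
q_values = [q_statistic(s, mean, loadings) for s in spectra]
unusual = flag_unusual(q_values, limit=1.0)  # arbitrary limit for the toy data
```

In a recursive scheme such as the one described above, the flagged observations would be removed from the reference set and the model refitted until no further batches exceed the limit.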

Multiblock PLS Models

With the lower strength process multiblock PLS models, batch 36 (BN5075) was found to have processed unusually at both the blend and tablet stages for all data sets, consistent with the results of the single block PLS tablet models and the SNV DT blend PLS model (Appendix C: Tables C13 & C14) (Fig. 4.2). This batch did not show unusual reference analytical results at either process stage; however, it occurred within a period where unusual product was produced. Batches 25 and 26 (BN5040 and BN5041) were found to have exceeded the 99% limit for Q at both the blend and tablet stage with the Savitzky-Golay 2nd derivative data, consistent with singleblock PLS models
(Appendix C: Tables C13 & C14) (Fig. 4.3). With the SNV DT and Savitzky-Golay 2nd

derivative lower strength tablet models, both batches 36 and 37 exceeded the 99% limit

for Q (Appendix C: Table C14); batch 37 (BN5076) was found to have an average drug

substance content per tablet below the minimum limit. This result is consistent with

those of single block PLS models and also confirms the results of this control chart at

the blend stage where batch 36 was identified as having processed unusually. With the

lower strength process multiblock PLS models, the SNV DT model performed best at

the blend stage and the Savitzky-Golay 2nd derivative model performed best at the tablet

stage.

With the higher strength tablet process data sets, consistent results were obtained

between blend and tablet control charts. Batch 34 (BN5070) and batches 36 to 39

(BN5073, BN5074, BN5078, BN5079) were found to have processed unusually at the

blend and tablet stage with raw data (Appendix C: Tables C15 & C16) (Fig. 4.4). With SNV DT data, batches 28 to 30 (BN5054, BN5058, BN5059) were found to have processed unusually at the blend stage, and batches 28 and 30 were found to have processed unusually at the tablet stage (Appendix C: Tables C15 & C16) (Fig. 4.5). The results for Savitzky-Golay 2nd derivative data were very consistent: batches 27 to 33 (BN5053, BN5054, BN5058, BN5059, BN5068, BN5069) and batches 35, 38 and 39 (BN5071, BN5078, BN5079) were found to have processed unusually at both the blend and tablet stages (Appendix C: Tables C15 & C16) (Fig. 4.6). These results are consistent with those of the single block PLS models. For the higher strength tablet process multiblock PLS models, the Savitzky-Golay 2nd derivative model performed best.

[Figure: Q statistic plotted against observation number; panels A and B.]

Fig. 4.2. Multiblock PLS Q statistic control charts for the lower strength tablet process: A) blends, B) tablets (SNV detrend data, n = 29 batches for PLS modelling).

[Figure: Q statistic plotted against observation number; panels A and B.]

Fig. 4.3. Multiblock PLS Q statistic control charts for lower strength tablet manufacturing process: A) blends; B) tablets (11 point quadratic Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS modelling).

[Figure: Q statistic plotted against observation number; panels A and B.]

Fig. 4.4. Q statistic control charts for multiblock PLS model of the higher strength tablet process: A) blends, B) tablets (raw data, n = 28 batches out of 41 used for PLS model).

[Figure: Q statistic plotted against observation number; panels A and B.]

Fig. 4.5. Q statistic control charts for multiblock PLS model of the higher strength tablet process: A) blends, B) tablets (SNV detrend data, n = 32 batches out of 41 used for PLS model).

[Figure: Q statistic plotted against observation number; panels A and B.]

Fig. 4.6. Q statistic control charts for multiblock PLS model of the higher strength tablet process: A) blends, B) tablets (11 point quadratic Savitzky-Golay smoothed second derivative data, n = 22 batches out of 41 used for PLS model).
Overall, the Q statistic monitoring charts enabled construction of single block and

multiblock PLS models which represented the variability present within batches

produced whilst the process was operating in a normal manner. The results of

multiblock models were consistent with those of singleblock PLS models.

4.4.5 MSPC of Multiblock PLS Blend And Tablet Models

Preliminary work with multiblock PLS models showed that these produced consistent

results between blocks with the Q statistic control charts and with the Hotelling’s T²

control charts. The MEWMA Hotelling’s T² and Anderson’s asymptotic normal

approximation control charts were therefore only considered with these models.

Hotelling’s T² Control Phase 1 For Blends And Tablets

With both lower strength and higher strength tablet process multiblock PLS models,

Savitzky-Golay 2nd derivative transformation of the blend absorbance data was found to

be necessary. The Monte Carlo search algorithm was unable to detect any unusual

batches with blend absorbance data for the lower strength tablet process (Appendix C:

Table C21) and could only detect two batches as unusual with higher strength tablet

process SNV DT blend data set (Appendix C: Table C22). With the higher strength

process blend absorbance data, this search algorithm identified a cluster of batches

which have been shown to have a different spectral offset only (Appendix C: Table

C22) (Fig. 4.1B). With the lower strength process SNV DT blend data, the first 18 batches were identified as unusual by the search algorithm; however, these have been shown to differ only in their spectral offset from the remaining batches (Appendix C: Table C21) (Fig. 4.1A). The Savitzky-Golay 2nd derivative pre-treatment effectively

removed the offset and was considered to be useful for the blend data set. The number

of batches selected by the Monte Carlo search with this blend data set was 28.
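
Why a second derivative removes a spectral offset can be seen in a short sketch. It is an illustration under stated assumptions, not the software used in this study: the weights are the standard tabulated 11 point quadratic Savitzky-Golay second derivative coefficients (normalising factor 429), the derivative is taken per data point rather than per nanometre, the detrend step of SNV DT is omitted, and the Gaussian "spectrum" is synthetic.

```python
import math

# Tabulated 11 point quadratic Savitzky-Golay second-derivative weights.
SG2D11 = [c / 429.0 for c in (15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15)]


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def sg_second_derivative(spectrum, weights=SG2D11):
    """Smoothed second derivative; the half-window at each end is dropped."""
    half = len(weights) // 2
    return [
        sum(w * spectrum[i + k - half] for k, w in enumerate(weights))
        for i in range(half, len(spectrum) - half)
    ]


# Synthetic band and the same band with a constant baseline offset added.
base = [math.exp(-((i - 20) / 6.0) ** 2) for i in range(41)]
shifted = [v + 0.3 for v in base]

d_base = sg_second_derivative(base)
d_shifted = sg_second_derivative(shifted)
max_difference = max(abs(a - b) for a, b in zip(d_base, d_shifted))
```

Because the weights sum to zero, `max_difference` is zero to within rounding error: a constant offset between batches has no effect on the derivative spectra, which is consistent with the behaviour described above.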

With the lower strength tablet data sets, scatter correction was not necessary and all

models used between 25 and 29 batches (Appendix C: Table C23). With the higher

strength tablet data sets, scatter correction of the raw spectra was necessary (Appendix

C: Table C24).

Hotelling’s T² Control Phase 2 For Blends And Tablets

With the lower strength tablet process model generated from Savitzky-Golay 2nd

derivative blend data, batches 33 (BN5064), 34 (BN5065) and 35 (BN5067) were

identified as unusual; batch 33 produced tablets of 1 mg friability (Appendix C: Table

C21, note: no reference analysis data were recorded for batches 34 and 35). These

results are consistent with those for the single block PLS models, and show that the

model can detect deviations in the process from the blend stage which are not detectable

with current reference analysis. These results are shown clearly with the MSPC control

charts (Fig. 4.7B) and on the PLS components 1 and 2 score plot (Fig. 4.8). The score

plot shows that these batches have deviated away from the normal operating region

defined by the 99% control ellipse; in addition, batch 26 (BN5041) lies outside the 95% control ellipse. With the lower strength tablet models, all three of these batches were identified as unusual; in addition, with the Savitzky-Golay 2nd derivative model, batches

25 (BN5040) and 26 (BN5041) were identified as unusual (Appendix C: Table C23).

These results are shown with the MSPC control charts (Fig. 4.9B) and on the PLS

component 2 and 1 score plot (Fig. 4.10). With this plot, batch 25 lies outside the 99%

control region, and deviation from the normal process operating region was observed

for batches 33, 34 and 35 with increasing distance.
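
The position of a batch relative to the control ellipses on a two-component score plot can be sketched as below. This is a simplified illustration: it assumes the two scores are uncorrelated with known in-control variances, and it uses the large-sample chi-square limit for two degrees of freedom, which has the closed form -2 ln(1 - p). The limits in this study were optimised from the phase 1 batches, so the actual ellipses may differ; the variances and score values here are invented.

```python
import math


def t2_two_scores(t1, t2, var1, var2):
    """Hotelling's T² for two uncorrelated scores with in-control variances var1, var2."""
    return t1 ** 2 / var1 + t2 ** 2 / var2


def chi2_limit_2dof(confidence):
    """Chi-square quantile for 2 degrees of freedom (large-sample approximation)."""
    return -2.0 * math.log(1.0 - confidence)


def classify(t1, t2, var1, var2):
    """Place a batch relative to the 95% and 99% control ellipses."""
    value = t2_two_scores(t1, t2, var1, var2)
    if value > chi2_limit_2dof(0.99):
        return "outside 99%"
    if value > chi2_limit_2dof(0.95):
        return "outside 95%"
    return "in control"


# Hypothetical in-control score variances and three hypothetical batches.
var1, var2 = 100.0, 64.0
results = [
    classify(5.0, 4.0, var1, var2),    # T² = 0.5
    classify(25.0, 0.0, var1, var2),   # T² = 6.25
    classify(0.0, -25.0, var1, var2),  # T² is about 9.77
]
```

The direction in which a batch leaves the ellipse is retained in the score plot but not in the T² value itself, which is why the score plots are useful for diagnosis as well as detection.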

[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.7. Multivariate statistical process control charts for multiblock PLS model of lower strength tablet process blends (11 point Savitzky-Golay smoothed second derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: score plot with batch number labels; x-axis: PLS component 2 scores.]

Fig. 4.8. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 2 and 1 scores of multiblock PLS model of lower strength tablet process blends (11 point Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS modelling; 28 batches used for optimising control limits).
[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.9. Multivariate statistical process control charts for multiblock PLS model of lower strength tablets (11 point Savitzky-Golay smoothed second derivative data, n = 25 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: score plot with batch number labels; x-axis: PLS component 2 scores.]

Fig. 4.10. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 2 and 1 scores of multiblock PLS model of lower strength tablets (11 point Savitzky-Golay smoothed second derivative data, n = 29 batches for PLS modelling; 25 batches used for optimising control limits).
With the higher strength tablet process blend data sets, the Savitzky-Golay 2nd derivative produced results which were most consistent with single block PLS models. Batches 28 (BN5054), 29 (BN5058) and 38 (BN5078) were found to have processed unusually with this pre-treatment (Appendix C: Table C22) (Fig. 4.11B). This is a useful result as these batches produced lower quality tablets despite having normal blend reference analysis results. The higher strength tablet models produced results consistent with single block PLS and reference analysis data for the SNV DT and Savitzky-Golay 2nd derivative transformations respectively (Appendix C: Table C24) (Figs. 4.12B & 4.13B). Batches 34, 38 and 39 (BN5070, BN5078 and BN5079) were found to be unusual with SNV DT tablet data, and batches 34 and 38 were unusual with Savitzky-Golay 2nd derivative data. This is shown clearly with the MSPC control charts (Figs. 4.12B & 4.13B) and with PLS components score plots (Figs. 4.14 to 4.17). With the PLS component 6 and 1 score plot of SNV DT higher strength tablet data (Fig. 4.15), deviation from normal process operating performance can be observed from batch 27 (BN5053) to 28 (BN5054). These batches were above the 95% Hotelling’s T² limit with the MSPC charts. Deviation from normal operating performance can be clearly observed from batches 38 and 39. In addition, batch 37 (BN5074) lies just outside the 95% ellipse but at the opposite end of the ellipse from batches 38 and 39. Interestingly, this batch did not exhibit excessive average tablet thickness as batches 38 and 39 did; however, the tablets showed 4 mg friability. Similar results were observed with the PLS component 6 and 5 score plot (Fig. 4.14). With the Savitzky-Golay 2nd derivative data, batches 38 and 39 also lay outside the 99% control region on PLS components score plots (Figs. 4.16 & 4.17). Batch 27 lay outside the 95% control limit on the PLS component 1 and 4 score plot (Fig. 4.16).

These results clearly show that the multivariate PLS projection method produces

excellent diagnostic ability for identifying deviations from normal process operating

[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.11. Multivariate statistical process control charts for multiblock PLS model of higher strength tablet process blends (11 point Savitzky-Golay smoothed second derivative data, n = 28 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.12. Multivariate statistical process control charts for multiblock PLS model of higher strength tablets (11 point Savitzky-Golay smoothed second derivative data, n = 21 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.13. Multivariate statistical process control charts for multiblock PLS model of higher strength tablets (SNV detrend data, n = 24 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: score plot with batch number labels; x-axis: PLS component 6 scores.]

Fig. 4.14. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 6 and 5 scores of multiblock PLS model of higher strength tablets (SNV detrend data, n = 28 batches for PLS modelling; 24 batches used for optimising control limits).
[Figure: score plot with batch number labels; x-axis: PLS component 6 scores.]

Fig. 4.15. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 6 and 1 scores of multiblock PLS model of higher strength tablets (SNV detrend data, n = 28 batches for PLS modelling; 24 batches used for optimising control limits).
[Figure: score plot with batch number labels; x-axis: PLS component 4 scores.]

Fig. 4.16. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 4 and 1 scores of multiblock PLS model of higher strength tablets (11 point Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS modelling; 21 batches used for optimising control limits).
[Figure: score plot with batch number labels; x-axis: PLS component 5 scores.]

Fig. 4.17. Hotelling’s T² control ellipses (95% and 99% limits) for PLS components 4 and 5 scores of multiblock PLS model of higher strength tablets (11 point Savitzky-Golay smoothed second derivative data, n = 22 batches for PLS modelling; 21 batches used for optimising control limits).
conditions. Deviation may be observed at the blend stage before tabletting, despite

normal reference analysis results. The monitoring plots may be further simplified to two

PLS scores plots once statistical limits and in-control batches have been established.

These enable both determination of process deviation and also diagnosis of the problem

as the regions in which the scores are located are characteristic of the problem.

Multivariate Exponentially Weighted Moving Average Control Charts

These control charts were able to successfully identify drift in the process mean vector. With the lower strength and higher strength process blend models produced from raw data, approximately half of the batches exceeded the MEWMA Hotelling’s T² limit of 17.72. An example for higher strength tablet process blends is given in Fig. 4.18C. This chart was sensitive to the systematic variation in the blending process (see Chapter 3) which resulted in an offset difference in the blend absorbance spectra (Fig. 4.1A). However, scatter correction of the blend and tablet data sets was necessary for monitoring the blending stage. The MEWMA control charts for SNV DT and Savitzky-Golay 2nd derivative data of the lower strength tablet process blends were both able to detect drift in the process mean vector, which reached a state of statistical control and then drifted out of control, above the MEWMA Hotelling’s T² limit of 17.72, from blend 33 (BN5064) onwards (Fig. 4.7C). With the higher strength tablet process blend data sets, the Savitzky-Golay data and SNV DT data both show the MEWMA Hotelling’s T² drifting in and out of control, above the limit of 17.72 (Figs 4.11C & 4.19C respectively). The Savitzky-Golay data were better able to identify drift in the process mean vector, as they performed better with the Hotelling’s T² control phase 2 charts.

With the lower strength tablet data sets, the performance of the MEWMA Hotelling’s T²

chart also depended on the ability of the model to identify unusual batches from the

[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.18. Multivariate statistical process control charts for multiblock PLS model of higher strength tablet process blends (raw data, n = 25 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.19. Multivariate statistical process control charts for multiblock PLS model of higher strength tablet process blends (SNV detrend data, n = 35 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
Hotelling’s T² control phase 2 charts. This control chart showed drift out of control with the SNV DT data from batch 33 onwards (BN5064) (Fig. 4.20C); however, the chart using Savitzky-Golay 2nd derivative data also identified systematic drift from batch 24 to 27 (BN5039 to BN5042) and was therefore considered to perform better (Fig. 4.9C). For the higher strength tablet data sets, the SNV DT and Savitzky-Golay models showed the MEWMA Hotelling’s T² mostly above the limit of 17.72 (Figs. 4.13C & 4.12C). This was due to batches exceeding the Hotelling’s T² 99% limits over the production period.
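
How a MEWMA chart accumulates a sustained shift in the score mean can be sketched as follows. This is a simplified illustration and not the implementation used for the charts above: it assumes centred, mutually uncorrelated scores with known in-control variances, and a smoothing constant of 0.2, which is a hypothetical choice since the value used in this study is not restated here. Only the limit of 17.72 is taken from the text, and it is applied to invented two-component data purely for demonstration.

```python
def mewma_t2(scores, variances, lam=0.2):
    """MEWMA T² for each batch, in production order.

    scores    : list of centred score vectors, one per batch
    variances : in-control variance of each score (scores assumed uncorrelated)
    lam       : smoothing constant, 0 < lam <= 1 (hypothetical default)
    """
    z = [0.0] * len(variances)
    values = []
    for i, t in enumerate(scores, start=1):
        # Exponentially weighted moving average of the score vector.
        z = [lam * ta + (1.0 - lam) * za for ta, za in zip(t, z)]
        # Exact variance factor of z at step i; it tends to lam / (2 - lam).
        c = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        values.append(sum(za ** 2 / (c * v) for za, v in zip(z, variances)))
    return values


def first_signal(values, limit=17.72):
    """1-based index of the first batch above the limit, or None if there is none."""
    for i, value in enumerate(values, start=1):
        if value > limit:
            return i
    return None


# Invented data: five on-target batches, then a sustained shift in both scores.
scores = [[0.0, 0.0]] * 5 + [[3.0, 3.0]] * 15
values = mewma_t2(scores, variances=[1.0, 1.0])
signal_at = first_signal(values)
```

With these invented data the chart signals at the second shifted batch rather than the first, because the moving average needs a few observations to build up; this memory is what makes the chart suited to drift rather than to isolated unusual batches.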

Anderson Asymptotic Normal Approximation

With the lower strength tablet process data sets, the SNV DT model detected significant

process variance in the tablets of batches 26 (BN5041), which produced friable tablets,

and 36 (BN5075) (Appendix C: Table C23) (Fig. 4.20D). With raw tablet data, batch 37

(BN5076) was also detected as having exhibited excessive process variation at the tablet

stage (Appendix C: Table C23). The tablets from this batch were found to have low

drug substance content (4.83 mg per tablet). Batches 10 (BN5017) and 11 (BN5018)

were also found to have shown significant process variance with raw data (Appendix C:

Table C23). Examination of the reference analysis data revealed an increase in content

uniformity with these batches. Batch 10 had content uniformity of 2.43% and batch 11

had a content uniformity of 5.1%. Although less than the maximum limit of 6%, this

was still a high value (mean content uniformity = 1.66%, standard deviation = 1.06%, n

= 39 batches).
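
As a rough worked check (not part of the original analysis), the two content uniformity values can be expressed as standardised distances from the mean of the 39 batches, using only the figures quoted above. The mean and standard deviation presumably include batches 10 and 11 themselves, so the values are indicative only:

$$z_{10} = \frac{2.43 - 1.66}{1.06} \approx 0.73, \qquad z_{11} = \frac{5.1 - 1.66}{1.06} \approx 3.25$$

On this univariate scale batch 11 lies more than three standard deviations above the mean, whereas batch 10 lies within one standard deviation.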

With higher strength tablet process blend data, the SNV DT model identified batches 23

(BN5049) and 38 (BN5078) as having exhibited significant process variance (Appendix

C: Table C22) (Fig. 4.19D). Both of these batches produced tablets of average

[Figure: four control chart panels. Panel titles: Hotelling’s T² control chart: phase 1 (x-axis: in-control batches); Hotelling’s T² control chart: phase 2; MEWMA control chart: phase 2; Process variability control chart: phase 2 (x-axes: future production batches).]

Fig. 4.20. Multivariate statistical process control charts for multiblock PLS model of lower strength tablets (SNV detrend data, n = 29 control phase 1 batches): A) Hotelling’s T² control phase 1 chart; B) Hotelling’s T² control phase 2 chart; C) Hotelling’s T² multivariate exponentially weighted moving average control chart; D) Anderson’s asymptotic normal approximation control chart.
thickness 4.62 mm (above the maximum limit of 4.6 mm); batch 23 (BN5049) produced

tablets with a high moisture content of 4.44%, close to the maximum limit of 4.5%

(mean moisture content = 3.18%, standard deviation of moisture content = 0.31%, n =

41 batches) and batch 38 (BN5078) produced tablets which were friable (1 mg). Batch

38 was also identified as having shown significant process variance at the tablet stage

with Savitzky-Golay 2nd derivative data (Appendix C: Table C24) (Fig. 4.12D).

The Anderson’s asymptotic normal approximation control chart is able to detect an

increase in process variance that leads to extreme final product quality on a number of

variables. With the multiblock models, this may also be traced back to the blend stage

despite apparently normal blend reference analysis values. This demonstrates the power

of the ‘multivariate projection of NIR process data’ method over the traditional wet

chemical tests used.

4.5 Summary of Results

The results of this chapter are summarised in Tables 4.1 (lower strength tablet process)

and 4.2 (higher strength tablet process).

Table 4.1. Summary of results for lower strength tablet process.
Columns: BN; batch number; NIR blend absorbance data predicted unusual; NIR tablet combined data predicted unusual; NIR MBPLS blend data predicted unusual; NIR MBPLS combined tablet data predicted unusual; unusual C. of A.; comment

G f A" G 0 A G A

4993 1 •
4994 2
4996 3 .
4997 4 . •
4998 5
5002 6
5003 7 .
5013 8
5016 9 .
5017 10 . • . Increase in content uniformity
5018 11 • . • Increase in content uniformity
5025 12
5025rb 13
5026 14
5027 15
5028 16
5028rb 17 •
5029 18 .
5035 19
5036 20
5037 21
5038 22
5039 23 • Excessive moisture deviation
5039rb 24 . • Excessive moisture deviation
5040 25 • • • • .
5041 26 • • • • • • • . • Friability of 1 mg
5042 27 • • • • Friability of 2 mg
5055 28
5056 29
5057 30
5061 31
5062 32
5064 33 . • Friability of 1 mg
5065 34 N/A C. of A. data missing
5067 35 N/A C. of A. data missing
5075 36 • • • •
5076 37 • Low drug content per tablet
5077 38
5967 39

A represents Anderson’s asymptotic normal approximation

rb denotes a re-blended batch

N/A denotes certificate of analysis data not available

Table 4.2. Summary of results for higher strength tablet process.
Columns: BN; batch number; NIR PLS blend absorbance data predicted unusual; NIR PLS tablet combined data predicted unusual; NIR MBPLS blend data predicted unusual; NIR MBPLS combined tablet data predicted unusual; unusual C. of A.; comment

Q t Q t A Q A Q A

4991 1 •
4992 2 •
4999 3 •
5000 4 • •
5001 5
5009 6
5010 7
5011 8 •
5021 9 . . •
5022 10
5023 11
5024 12
5031 13 • • • • . •
5032 14 • • •
5033 15 • •
5043 16
5043rb 17
5043rb 18 •
5044 19 • • .
5046 20
5047 21 • •
5048 22
5049 23 • • • • • Thick tablets (4.62 mm) of high moisture (4.44%)
5050 24 • . Thick tablets (4.62 mm)
5051 25 • Thick tablets (4.62 mm)
5052 26
5053 27 Friability of 1 mg
5054 28 • • Friability of 4 mg; tablet thickness = 4.63 mm
5058 29 • Thick tablets (>4.6 mm)
5059 30 Thick tablets (>4.6 mm)
5060 31 Thick tablets (>4.6 mm)
5068 32 Thick tablets (4.62 mm)
5069 33 Thick tablets (4.62 mm)
5070 34 . Thick tablets (4.61 mm)
5071 35 • • • Long tablet disintegration time (17 s); thick tablets (4.66 mm)
5073 36
5074 37 • Friability of 4 mg
5078 38 • • • • • • Friability of 1 mg; thick tablets (4.62 mm)
5079 39 • • • • Thick tablets (4.62 mm)
5080 40
5081 41

A represents Anderson’s asymptotic normal approximation

rb denotes a re-blended batch

4.6 Conclusion

Both the singleblock and multiblock PLS models appear to be equally effective in

modelling the variability within the reference analytical data at both process stages. The

loadings of the models suggest that physical and chemical information is modelled.

For identification of batches which processed unusually (Q statistic), scatter correction (SNV-DT or Sg2d11) of the NIR data was useful. This was especially the case with blend NIR measurements. Batches which processed in an unusual manner could be detected at the blend stage, even though the reference analytical data did not detect anything unusual, for example batches of unusual blends which produced friable tablets. The Q statistic control charts of singleblock and multiblock models were also able to identify trends in process deviation at both process stages: consecutive batches which exhibited unusual reference analytical values, e.g. excessive blend moisture deviation, low average drug substance content per tablet, friability, high average tablet thickness and prolonged tablet dissolution time. Overall, of the scatter corrections tested for blend data, SNV-DT was considered the most appropriate for detection of unusual processing by this control chart. For tablets, Sg2d11 was considered the best scatter correction method tested.

The MSPC control charts of the singleblock and multiblock PLS scores were also able to identify unusual batches, as with the Q statistic charts. For example, batches which produced friable tablets could be detected at the blend stage, despite normal reference analytical data, and also from their tablet NIR measurements. Scatter correction by Sg2d11 transformation was found to be the best data pre-treatment of blends, as this produced the most consistent (i.e. best pre-treatment for detection of process deviation) results between process stages. Interestingly, with the lower strength tablet models, scatter correction was not necessary to identify unusual batches (friable or increased average tablet thickness); however, for detection of process variation (drift), the Sg2d11

produced best results. The models of the higher strength tablets did require scatter correction, e.g. SNV-DT or Sg2d11.

Once models have been calculated from scatter corrected data, monitoring of the PLS or

MBPLS scores enables diagnostic control charts to be constructed. These may be

constructed from just two PLS scores and are able to provide diagnosis of the process

problem as the region in which the unusual batch scores are located is characteristic. For

example, the scores of blends or tablets which produce friable tablets or tablets which

have high average thickness tend to locate in specific areas of the score plot, and outside

the 99% Hotelling’s T² control ellipse. This was also found to be the case with batches

for which no reference analytical data were provided (e.g. BN5041 and BN5042). These

batches were consistently found to be unusual throughout the process, and had

preceding batches which were shown to be unusual from C. of A. data. These batches

would therefore be expected to show the same trend as the preceding batches, i.e. show

friability or increased tablet thickness etc. The process variance control charts were also

found to be useful for process monitoring. With the lower strength tablets, batches

which showed significant process variation corresponded to: friable tablet batches;

batches with low average drug substance content per tablet and batches which occurred

during a manufacturing period where a trend in increase in content uniformity was

observed (although remaining within limits). These control charts for the higher strength

tablet process showed consistency in results at the blend and tablet stages. For example,

batches which showed significant process variability at both stages produced tablets

which had high average thickness and which were friable.

Overall, the PLS and MBPLS models produced consistent results between process

stages and results which are consistent with the PCA methods (Chapter 3). This

suggests that these PLS methods may also be solely used to control and monitor the

quality of product produced in this process.

CHAPTER 5

Multivariate Image Analysis of Near

Infrared Multispectral Blend Images For

Quality Control of Pharmaceutical Blending

5.1 Introduction

In Chapters 3 and 4, multivariate procedures were described for statistical quality

control of blending and tabletting using near infrared spectroscopy. The methods

described required construction of a reference set of batches which processed normally.

Unusual batches could be identified easily since they did not belong to the same

multinormal distribution. In some instances, chemical and physical information could be

directly obtained from the NIR data, enabling diagnosis of the process problem. For

example, a batch of blend could show significant variation in its scores on a PC known to be

related to moisture content. However, this was not always possible.

In this chapter, near infrared multispectral imaging is used to study some of the unusual

blends. The aim is to generate detailed, spatially resolved images which provide

diagnostic chemical and physical information at the microscopic level. Section 5.4

describes liquid crystal tuneable filter InSb focal plane array NIR imaging microscopy.

Multivariate methods and data pre-treatments are described in Sections 5.5 and 5.6

respectively. In Sections 5.7 to 5.9, methods are described for statistical control and

monitoring of blend quality. Section 5.10 describes alternative approaches to monitoring

blend quality. Conclusions regarding this method are described in Section 5.11.

5.2 Materials Studied

A selection of blends which exhibited unusual reference analytical results at the blend

and tablet stage were studied. Their batch numbers (BNs) were: 5039, 5040, 5045,

5064, 5065, 5067, 5070, 5078. Blend BN5045 exhibited excessive moisture deviation

and poor uniformity of content of the drug substance; throughout the blend, large lumps

of drug substance were found. These did not dissolve prior to high performance liquid

chromatography (HPLC) analysis, and were observed by visible light microscopy in this

study. This blend was therefore not tabletted and was considered an important blend for

study by NIR imaging microscopy. The drug substance content of each blend should lie

between 2.42 and 2.58% (2.5% nominal value); between different blend samples of the

same batch, the maximum absolute deviation in active content from their mean value

should not exceed 3%. In addition to these blends, powdered crystalline drug substance

was also studied.

5.3 Sample Preparation

A single sample from each batch was prepared for imaging using a powder sampling

accessory with a barium fluoride window. Samples were prepared by compressing

approximately 200 mg of powder between two flat circular barium fluoride discs. The

discs were held together between three stainless steel pins mounted on a steel plate and

were tightly secured to the plate by a steel screw cap. A circular hole in the screw cap

allowed for passage of light through the barium fluoride disc and onto the sample. The

samples were positioned on the microscope stage with the barium fluoride window

aligned beneath the objective lens.

5.4 Liquid Crystal Tuneable Filter InSb Focal Plane Array Near Infrared

Imaging Microscopy

5.4.1 Instrumentation

The near infrared (NIR) imaging system (MatrixNIR, Spectral Dimensions Inc.,

Maryland, MD, USA) incorporated a high resolution focal plane array InSb detector,

capable of producing NIR-images at high frame rate and with resolution of 256-by-256

pixels. The focal plane array is cooled by a Dewar flask filled with liquid nitrogen.

Light from the 100 W tungsten halogen lamp was directed through the liquid crystal

tuneable filter (LCTF) which had a tuneable range of 1100 to 1900 nm and wavelength

resolution of 1 nm (bandpass of 6 nm). The light emitted through the LCTF was then

reflected by a beam splitter through an Olympus microscope (objective lens

magnification of five times), equipped with a tungsten halogen lamp of variable

intensity, and onto the sample. Adjustment of shutters on the beam splitter directed

light diffusely-reflected from the sample up through the objective lens and onto the

focal plane array detector via a mirror.

5.4.2 Sample Imaging

A sample prepared from each batch was imaged through the barium fluoride window of

the sample accessory. The microscope stage height required to focus light from the

LCTF on the powder surface was determined by manually focusing visible light on to

the sample using the lamp attached to the microscope. The shutters of the beam splitter

were adjusted for this purpose. When focused, the microscope lamp was then switched

off and the shutters of the beam splitter re-adjusted to allow light from the LCTF to

reach the powder surface and be diffusely reflected onto the InSb detector. The InSb

detector was maintained at low temperature by means of a liquid nitrogen filled Dewar

flask. All images were produced in a darkened room with the camera frame rate and

gain adjusted to provide optimum image quality with minimal saturation of the focal

plane array pixels. A frame rate of 7.89 Hz and 2000 accumulations per image plane

(wavelength) were deemed to provide optimum image quality and were used

subsequently. Images were recorded at each wavelength across the spectral region used.

For each sample, this provided a three dimensional image cube where the first two

dimensions (x,y) represent spatial location (pixel) and the third dimension represents

wavelength. Each pixel represented 36 µm² of sample.

A background reflectance intensity image was produced using a ceramic reflectance

standard and with the same camera settings and wavelengths as for the blends.

5.4.3 Data Analysis

The multispectral images of the blends were individually transformed to diffuse-

reflectance NIR images by pixel-wise division of each pixel by the corresponding pixel

of the ceramic background image. The resultant images were used for multivariate

image analysis (MIA). All programs used were written in Matlab 5.3 Scientific and

Technical Programming Language (The Mathworks Inc., Natick, MA, USA) and were

run on an Acer Pentium II 333 MHz PC fitted with 320 Mb RAM.
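
The pixel-wise conversion to diffuse reflectance described above can be sketched as follows. This is a minimal illustration in Python rather than the Matlab programs used in this work; the function name, the nested-list cube layout and the intensity values are assumptions made for the example.

```python
def to_reflectance(sample_cube, background_cube):
    """Divide each sample pixel by the matching background pixel.

    Both cubes are nested lists indexed as cube[row][col][wavelength].
    """
    return [
        [
            [s / b for s, b in zip(s_pix, b_pix)]
            for s_pix, b_pix in zip(s_row, b_row)
        ]
        for s_row, b_row in zip(sample_cube, background_cube)
    ]

# Hypothetical 1-by-2 pixel cube with 2 wavelengths (made-up intensities).
sample = [[[50.0, 40.0], [30.0, 20.0]]]
background = [[[100.0, 80.0], [60.0, 80.0]]]
reflectance = to_reflectance(sample, background)
```

Because the division is carried out separately for every pixel and wavelength, spatial differences in the response of the detector and in the illumination that are common to the sample and ceramic images are divided out.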

5.5 Multivariate Image Analysis

5.5.1 Wavelength Range Selection

Preliminary data analysis using NIR absorbance spectra of blend BN5045 (n = 42

spectra) and a placebo blend (n = 9 spectra) (Chapter 3) was performed in order to

determine the spectral region where the active drug absorbs. This was investigated to

permit imaging over the minimum number of wavelengths necessary to reduce both the

computation time and the image file size. Principal components analysis of 11 point

quadratic Savitzky-Golay 2nd derivative smoothed absorbance spectra (1110 to 2200

nm, 2 nm increments) (Fig. 5.1) was able to separate the placebo blend and blend

BN5045 data on the first component (Fig. 5.2). The loadings for the first component

revealed that the important wavelengths to monitor (i.e. highest absolute loading values)

were 1132, 1135, 1649, 1665, 1701 and 2145 nm (Fig. 5.3), although the latter

wavelength was outside the range of the NIR imaging microscope. The loadings from

1649 to 1701 were most important, hence the wavelength range selected was restricted

to 1600 to 1750 nm. The wavelength increment was carefully set to 6 nm and therefore

incorporated these important peaks. The number of wavelengths selected was therefore

26 out of the 800 possible. Imaging time for each blend was approximately 1.8

hours.
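
The number of image planes and the approximate imaging time can be checked with a short calculation. The sketch below assumes that the acquisition time is dominated by the 2000 accumulations per image plane at a frame rate of 7.89 Hz (Section 5.4.2) and ignores any time needed to tune the filter or store data; the variable names are illustrative only.

```python
wavelengths = list(range(1600, 1751, 6))  # 1600, 1606, ..., 1750 nm
n_planes = len(wavelengths)

frame_rate_hz = 7.89   # frames per second (Section 5.4.2)
accumulations = 2000   # frames co-added per image plane (Section 5.4.2)

acquisition_s = n_planes * accumulations / frame_rate_hz
acquisition_h = acquisition_s / 3600.0

# Distance from each important loading peak to the nearest sampled wavelength.
peaks = (1649, 1665, 1701)
nearest_gap = [min(abs(w - p) for w in wavelengths) for p in peaks]
```

Under these assumptions the grid contains 26 wavelengths, the estimated acquisition time is close to 1.8 hours, and each of the three loading peaks lies within 1 nm of a sampled wavelength.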

5.5.2 Multiway Principal Components Analysis

Multiway principal components analysis of three dimensional image data produces a

loading vector and a score image for each principal component extracted (Esbensen et

al, 1992; Geladi et al, 1989a, 1989b, 1994, 1996; Geladi, 1992). Mathematically, the 3-

way algebra is equivalent to unfolding the 3-way array, X, to a two way array followed

by ordinary principal components analysis (Section 3.6.3, equation (3.6.1)) (Bharati and

MacGregor, 1998). This unfolding results in only transient loss of information as the

resulting scores, T, may be rearranged back to form an image (three-way array) (Geladi

et al, 1996). This procedure has been termed unfolding and backfolding (Wold et al.,

1987). The principal components algorithm used here was the power method. This was

considered to be more suitable and computationally faster than the NIPALS method as

the kernel matrix is formed only once. Successive PCs are extracted by deflating the

residual kernel matrix. Different methods of calculating the kernel matrix were

examined. These were: the cross products matrix; the variance-covariance matrix and

the correlation matrix (Geladi et al, 1996). Preliminary data analysis showed these

Fig. 5.1. Savitzky-Golay 2nd derivative (11 point quadratic) of absorbance NIR
spectra of a blend BN5045 (n = 42) and a placebo blend (n = 9) over the wavelength
range 1110 to 2200 nm (2 nm increments). [x-axis: Wavelength/nm]

Fig. 5.2. Hotelling’s T² control ellipse (99%) of PC1 and PC2 scores for a blend
BN5045 (n = 42 blend BN5045 spectra used in PCA, 11 point quadratic Savitzky-
Golay second derivative of absorbance data) showing separation of blend BN5045
from placebo blend (n = 9 placebo blend spectra used in PCA) (control limits set
using blend BN5045 PC scores). [x-axis: PC2 Scores]

Fig. 5.3. Principal component loadings for the first component for 11 point
quadratic Savitzky-Golay second derivative of absorbance spectra of a blend
BN5045 (n = 42) and a placebo blend (n = 9). [x-axis: Wavelength/nm]

methods to produce similar results. All PC1 and PC2 images formed from these kernel

matrices were considered to represent an intensity and a contrast image respectively

(Geladi et al, 1996). The cross products matrix has been suggested as a method for

generating intensity PC images (Geladi et al, 1996); however, the first two PC images

from this kernel matrix were not visually different from those obtained using the other

two kernel matrices. Subsequent MIA used the variance-covariance matrix rather than the

correlation matrix, to avoid scaling up variables representing small amounts of variance

(i.e. noise) to unit variance.
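
The unfolding, kernel calculation, power method extraction and backfolding steps of this section can be sketched as below. This is an illustrative Python outline, not the Matlab code used in this work: the function names, the start vector, the convergence tolerance and the small test cube are all assumptions, and the variance-covariance kernel is used as in the subsequent MIA.

```python
import math

def unfold(cube):
    """Unfold cube[row][col][wavelength] into a (rows*cols)-by-wavelengths table."""
    return [list(pixel) for row in cube for pixel in row]

def covariance_kernel(X):
    """Mean-centre X and return (centred X, variance-covariance kernel matrix)."""
    n, k = len(X), len(X[0])
    means = [sum(x[j] for x in X) / n for j in range(k)]
    Xc = [[x[j] - means[j] for j in range(k)] for x in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(k)]
         for a in range(k)]
    return Xc, C

def matvec(C, p):
    return [sum(cij * pj for cij, pj in zip(row, p)) for row in C]

def power_method(C, n_iter=500, tol=1e-14):
    """Dominant eigenvector (loading) and eigenvalue of a symmetric kernel C."""
    k = len(C)
    p = [1.0 + 0.1 * j for j in range(k)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in p))
    p = [v / norm for v in p]
    for _ in range(n_iter):
        q = matvec(C, p)
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:
            break
        q = [v / norm for v in q]
        change = sum((a - b) ** 2 for a, b in zip(p, q))
        p = q
        if change < tol:
            break
    eigenvalue = sum(a * b for a, b in zip(p, matvec(C, p)))
    return p, eigenvalue

def multiway_pca(cube, n_components):
    """Return loading vectors and backfolded score images for an image cube."""
    n_cols = len(cube[0])
    Xc, C = covariance_kernel(unfold(cube))
    loadings, score_images = [], []
    for _ in range(n_components):
        p, lam = power_method(C)
        t = [sum(xj * pj for xj, pj in zip(x, p)) for x in Xc]
        loadings.append(p)
        # Backfold the score vector into a rows-by-cols score image.
        score_images.append([t[i:i + n_cols] for i in range(0, len(t), n_cols)])
        # Deflate the kernel so the next pass extracts the next component.
        C = [[C[a][b] - lam * p[a] * p[b] for b in range(len(C))]
             for a in range(len(C))]
    return loadings, score_images

# Hypothetical 2-by-2 pixel cube with 2 wavelengths (made-up values).
cube = [[[1.0, 2.0], [2.0, 4.0]],
        [[3.0, 6.0], [4.0, 8.0]]]
loadings, score_images = multiway_pca(cube, n_components=1)
```

The kernel here has only as many rows and columns as there are wavelengths (26 in this study), so it is formed once from the unfolded data and only this small matrix is deflated between components, which is the reason given above for preferring the power method to NIPALS.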

5.6 Image Cube Data Pre-treatments

5.6.1 Sample Illumination And Multivariate Shading Correction

A preliminary multivariate image analysis was performed on the images using

reflectance data and the variance-covariance matrix. The first principal component

image for each blend was found to represent the intensity of reflected light (e.g. BN5045

(Fig. 5.4A.)). This image clearly shows that the illumination of the sample was non-

uniform. The score values on the first component confirm this and show a trend of

increasing intensity with pixel number (Fig. 5.5). The effect of pixel norm scaling

(multivariate shading correction) (Geladi et al, 1996) was also studied. Improvement in

intensity across the first PC image was observed (Fig. 5.4B); however, lower intensity

still remained around the borders of the first 100 rows of the image. MIA was therefore

restricted to rows 100 to 256 for each image (i.e. 157 rows). Pixel norm scaling was not

considered necessary for this image region and was therefore not used.
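
Pixel norm scaling can be illustrated with the short sketch below. It assumes that the correction divides each pixel spectrum by its Euclidean norm, which is one common form of multivariate shading correction; the function name and the example values are hypothetical and the exact scaling used by Geladi et al (1996) is not reproduced here.

```python
import math

def pixel_norm_scale(cube):
    """Scale every pixel spectrum in cube[row][col][wavelength] to unit Euclidean length."""
    scaled = []
    for row in cube:
        new_row = []
        for pixel in row:
            norm = math.sqrt(sum(v * v for v in pixel))
            # Leave all-zero pixels unchanged to avoid division by zero.
            new_row.append([v / norm for v in pixel] if norm > 0 else list(pixel))
        scaled.append(new_row)
    return scaled

# Two hypothetical pixels with the same spectral shape but different brightness.
cube = [[[3.0, 4.0], [6.0, 8.0]]]
scaled = pixel_norm_scale(cube)
```

After scaling, the two pixels have identical spectra, which shows how the correction removes differences in overall illumination intensity while retaining spectral shape.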

5.6.2 Noise Reduction by Median Filtering

The effect of an m-by-n median filter on signal to noise ratio was investigated for all

images. This involved replacing each pixel reflectance value by the median value of itself

Fig. 5.4. Principal component one image (PC1) of blend BN5045 (reflectance data)
showing non-uniform illumination of sample. A) Before multivariate shading
correction, B) after multivariate shading correction.

Fig. 5.5. Principal component one score plot (PC1) for blend BN5045 (reflectance
data) showing variation in illumination intensity (score value) with pixel number.
[x-axis: Pixel]

and the pixels surrounding it in a 3-by-3 pixel filter at each image plane (wavelength).

The filter clearly improved signal to noise, probably by removing stray light and ‘salt

and pepper’ noise, producing smoother images (Fig. 5.6). Median filtering also

produced better visual contrast between true image features for each principal

component image (Fig. 5.6) because the 256 intensity levels were set from proper

intensity levels and not from extreme noise values. In Fig. 5.6, a surface scratch on the

barium fluoride disc of image BN5045 (lower left part of image) shows up more clearly

as a dark line in the PC1 image after median filtering (compare Fig. 5.6A & 5.6B). Other

spectral features, such as crystalline material, appear brighter after median filtering

(compare Fig. 5.6A & 5.6B). Examination of the first four principal components

loadings for PCA models derived from raw reflectance data and from median filtered

reflectance data showed that the third and fourth PCs appeared to contain more

chemical information after median filtering (Fig. 5.7). The cumulative percentage sum

of squares explained by the PCs extracted was greater after filtering, especially with the

first PC, and confirms the reduction in random noise in the data set (Table 5.1). Median

filtering was therefore considered very useful and was used in all subsequent MIA.
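
A 3-by-3 median filter applied plane by plane can be sketched as follows. The treatment of border pixels (using only the neighbours that exist) is an assumption, since the edge handling used in this work is not stated; the function names and the test plane are illustrative.

```python
from statistics import median

def median_filter_plane(plane, size=3):
    """Apply a size-by-size median filter to one image plane (list of rows)."""
    half = size // 2
    n_rows, n_cols = len(plane), len(plane[0])
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            # Window is truncated at the image border (assumed edge handling).
            window = [
                plane[rr][cc]
                for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                for cc in range(max(0, c - half), min(n_cols, c + half + 1))
            ]
            out_row.append(median(window))
        out.append(out_row)
    return out

def median_filter_cube(cube):
    """Filter each wavelength plane of cube[row][col][wavelength] independently."""
    n_rows, n_cols, n_wl = len(cube), len(cube[0]), len(cube[0][0])
    filtered = [[[0.0] * n_wl for _ in range(n_cols)] for _ in range(n_rows)]
    for w in range(n_wl):
        plane = [[cube[r][c][w] for c in range(n_cols)] for r in range(n_rows)]
        smooth = median_filter_plane(plane)
        for r in range(n_rows):
            for c in range(n_cols):
                filtered[r][c][w] = smooth[r][c]
    return filtered

# A flat 3-by-3 plane with one 'salt' spike in the centre (made-up values).
plane = [[1.0, 1.0, 1.0],
         [1.0, 9.0, 1.0],
         [1.0, 1.0, 1.0]]
filtered = median_filter_plane(plane)
filtered_cube = median_filter_cube([[[v] for v in row] for row in plane])
```

The isolated spike is removed completely while the flat background is unchanged, which is the behaviour that makes the median filter suitable for 'salt and pepper' noise.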

5.6.3 Effect of Pixel-Wise Digital Smoothing And Baseline Correction on MIA

The effect of a range of data pre-treatments, commonly applied to near infrared spectra,

was studied with the blend images. These were: SNV; Sg2d11 and 15 point quadratic

Savitzky-Golay smoothed second derivative; and DT transformation. Each was applied

pixel-wise to the 26 reflectance values. All pre-treatments were found to remove much

of the intensity information from the images. The first two principal components

loadings showed high positive and negative peak values across the wavelength range

and were considered to represent chemical information. The loss of intensity

information from the images is likely to make detection of unmilled crystalline material


Fig. 5.6. The effect of 3-by-3 median filtering on image signal to noise ratio (first
principal component image, reflectance data) for blend BN5045. A) No filtering, B)
3-by-3 median filtered data.

Fig. 5.7. Principal component loadings for blend BN5045 (reflectance data) for
first four components. Raw reflectance data: (A) to (D); median filtered reflectance
data: (E) to (H). [x-axis: Wavelength/nm]

Table 5.1. Cumulative percentage sum of squares (%SS) explained by principal
components models (reflectance and median filtered reflectance data sets) for
blend BN5045.

Data set                       Principal components    %SS
Reflectance                    1                       57.19
                               2                       80.01
                               3                       84.56
                               4                       87.79
Median filtered reflectance    1                       76.99
                               2                       87.23
                               3                       89.76
                               4                       91.70

harder as unmilled material appears to reflect with higher intensity, probably due to

some specular reflectance. This was observed with the first PC image for all blends (Fig

5.8), especially BN5045 (Fig. 5.6A & 5.6B). For this study, use of reflectance data was

therefore preferred.

5.7 Multiway Principal Components Analysis of Multispectral Images

5.7.1 Multivariate Segmentation: Detection of Spatial Locations of Unmilled

Material And Drug Substance For Blends

Multivariate segmentation of images (pixel class delineation) identifies regions in the

image where a class of material is located (Geladi et al, 1996). In this study three

different approaches to identifying regions of drug substance and unmilled crystalline

material were examined.

The first approach attempted to classify by selection of pixels on a principal component

image above an arbitrary threshold intensity.

Another method studied involved construction of a pixel density map. This method

required selection of two PC images to monitor and therefore necessitated visual

interpretation of the PC loadings (for example the first and third PCs would be selected,

representing intensity and drug substance respectively). A pixel density map is a three

dimensional histogram which classifies pixels according to their intensities (0 to 255

intensity levels) on the two selected PCs. Mathematically, it is a two way matrix where

the row and column dimensions represent the intensity on the two PCs and the value of

an element in the array is the number of pixels with these intensities (Bharati and

MacGregor, 1998). Once calculated, the array is then displayed as an image. The

reasoning behind this approach is that pixels representing the same class of material will

show similar

Fig. 5.8. Principal component one intensity images of blends (median filtered
reflectance data). A) BN5039, B) BN5040, C) BN5045, D) BN5064, E) BN5065, F)
BN5067, G) BN5070, H) BN5078.

intensities on the two PCs and will be grouped together in the density map (Bharati and

MacGregor, 1998). Regions of interest in these maps were highlighted by placing them

inside a rectangle, selected using a computer mouse and an on-screen cross-hair pointer.

Pixels with the range of selected intensities for the two PCs were then projected back

into the image score space for visual interpretation.
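
The pixel density map and the projection of a selected rectangle back into the image can be sketched as below. The linear rescaling of scores to 256 intensity levels, the function names and the example scores are assumptions made for illustration; pixels are held as flat lists in the unfolded order.

```python
def to_levels(scores, n_levels=256):
    """Linearly rescale a flat list of scores to integer levels 0..n_levels-1."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [min(n_levels - 1, int((s - lo) / span * n_levels)) for s in scores]

def density_map(levels_a, levels_b, n_levels=256):
    """counts[i][j] = number of pixels with level i on PC a and level j on PC b."""
    counts = [[0] * n_levels for _ in range(n_levels)]
    for i, j in zip(levels_a, levels_b):
        counts[i][j] += 1
    return counts

def select_rectangle(levels_a, levels_b, a_range, b_range):
    """Binary mask (1 = selected) for pixels whose levels fall inside the rectangle."""
    return [
        1 if a_range[0] <= i <= a_range[1] and b_range[0] <= j <= b_range[1] else 0
        for i, j in zip(levels_a, levels_b)
    ]

# Hypothetical scores for six pixels on two PCs (made-up numbers).
pc_a = [0.0, 0.1, 0.1, 5.0, 5.1, 10.0]
pc_b = [1.0, 1.0, 1.1, 4.0, 4.0, 0.0]
la, lb = to_levels(pc_a), to_levels(pc_b)
counts = density_map(la, lb)
mask = select_rectangle(la, lb, a_range=(120, 140), b_range=(240, 255))
```

The mask plays the role of the rectangle drawn with the mouse: it marks the pixels that share the selected intensity ranges on both PCs, and it can be backfolded to the image dimensions for display.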

The final method studied involved construction of Shewhart type control charts for the

PCs studied. The control limits used were those suggested by Jackson (1991) (Section

3.7). The reasoning behind this method is that pixels which represent a high

concentration of a material will show high positive or negative values on the PC which

represents that material.

With each method, the pixels which were selected were also used to construct binary

images. Selected pixels were given a value of one (white), non-selected pixels were

given a value of zero (black).
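
The Shewhart type selection of pixels and the construction of a binary image can be sketched as follows. The limits suggested by Jackson (1991) (Section 3.7) are not reproduced in this chapter, so the sketch assumes the usual normal-theory form, mean plus or minus z standard deviations with z = 1.96 for 95% and z = 2.576 for 99%; the function name, the `side` argument and the score values are hypothetical.

```python
from statistics import mean, stdev

def shewhart_binary(score_image, z=1.96, side="both"):
    """Flag pixels whose PC score lies outside mean +/- z*sd (1 = flagged)."""
    flat = [s for row in score_image for s in row]
    centre, sd = mean(flat), stdev(flat)
    upper, lower = centre + z * sd, centre - z * sd

    def flagged(s):
        if side == "upper":
            return s > upper
        if side == "lower":
            return s < lower
        return s > upper or s < lower

    return [[1 if flagged(s) else 0 for s in row] for row in score_image]

# Hypothetical 4-by-5 score image: background near zero plus one bright pixel.
scores = [[0.1, -0.1, 0.0, 0.2, -0.2],
          [0.0, 0.1, -0.1, 0.0, 0.1],
          [-0.1, 0.0, 8.0, 0.1, 0.0],
          [0.2, -0.2, 0.0, -0.1, 0.1]]
binary = shewhart_binary(scores, z=1.96, side="upper")
flagged_percent = 100.0 * sum(map(sum, binary)) / (len(scores) * len(scores[0]))
```

Only the single high-score pixel is flagged in this example. Whether the upper limit, the lower limit or both are of interest depends on the sign of the loadings for the PC concerned.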

Overall, the most effective method for these data sets was found to be the Shewhart

chart method. This was able to locate areas of unmilled crystalline material with both

the blends and powdered drug substance from the first PC (intensity). In addition, with

the blends, the spatial locations of drug substance (both fine and unmilled clumps) were

located by use of a control chart for the third PC. This PC showed loadings similar to

spectra of the placebo and active blends (Fig. 5.9A and 5.9B) and importantly, similar to

the loadings of the third PC of drug substance (Fig. 5.10C). In this spectral region, the

absorbance was found to be higher if active material was present (Fig. 5.9), especially

from 1648 to 1672 nm. It therefore follows that drug substance shows lower reflectance

in this region. Interpretation of the loadings for the third component revealed that areas

of blend with no drug substance would have lower (more negative) score values than

areas of blend with drug substance. This is because regions of blend with low drug

Fig. 5.9. Near infrared absorbance spectra of a placebo blend (n = 9) and a blend
(n = 42) (1600 to 1750 nm, 6 nm increment). Absorbance spectra: A) placebo blend,
B) blend. Second derivative of absorbance: C) placebo blend, D) blend.
[x-axis: Wavelength/nm]

Fig. 5.10. Principal components (PC) loadings for the first four components for
powdered drug substance. A) First PC, B) second PC, C) third PC, D) fourth PC.
[x-axis: Wavelength/nm]

substance content will, similar to placebo blend, show higher reflectance. The

contributions to lower score values for such pixels will therefore be mostly from the

high negative loadings. With the careful selection of the wavelength range used (i.e.

where most differences in reflectance occur between drug substance and the bulk), the

drug substance should be identified from the Shewhart PC chart above a threshold level

(95% or 99% confidence). A comparison of the first and third PC images of blend

BN5045 provided evidence which supported this. The areas of crystalline material in

this blend (Fig. 5.11C) are known from certificate of analysis tests to represent clumps

of unmilled drug substance. Although these regions showed high reflectance in the PC1

image, probably due to specular reflectance from the crystals, some absorption of light

also occurred as the same areas also appear as high score values for the third PC image.

With blend BN5067, the polarity of the loadings on PC3 was reversed. Pixels with low

score values were therefore deemed to represent drug substance, and also had identical

spatial locations to the crystalline material identified on the first PC image.

Binary images were produced for each blend using the third PC image. The percentage

of pixels found to represent drug substance was approximately 2.5% for the blends.

This is not inconsistent with the concentration of drug substance in the blends

(2.5% m/m).

5.7.2 ‘On-line’ Type Analysis of Blends From PC Shewhart Control Chart Binary

Images

The binary images (95% control limits on PC scores) representing unmilled crystalline

material (PC1) and drug substance (PC3) were monitored by dividing the images into

16 sub-images (Fig. 5.12) (each image used just the first 156 rows, and all 256

columns). With each image sub-area (39-by-64 pixels), the number of pixels classified

as unmilled or drug substance was recorded, and the results for the 16 sub-areas were

Fig. 5.11. Binary images of crystalline areas produced from principal component
one intensity images of blends (median filtered reflectance data). A) BN5039, B)
BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H) BN5078.

 1   2   3   4

 5   6   7   8

 9  10  11  12

13  14  15  16

Fig. 5.12. Image sub-areas (n = 16).

displayed as bar charts. This allowed for monitoring of areas with high concentration of

either unmilled material or drug substance.
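
The division of a binary image into 16 sub-areas and the calculation of the percentage of classified pixels in each can be sketched as follows. The function name and the example image are hypothetical; the block size of 39-by-64 pixels follows from using 156 rows and 256 columns in a 4-by-4 grid.

```python
def subarea_percentages(binary, n_block_rows=4, n_block_cols=4):
    """Percentage of flagged pixels in each sub-area, numbered row by row."""
    block_h = len(binary) // n_block_rows
    block_w = len(binary[0]) // n_block_cols
    percentages = []
    for br in range(n_block_rows):
        for bc in range(n_block_cols):
            count = sum(
                binary[r][c]
                for r in range(br * block_h, (br + 1) * block_h)
                for c in range(bc * block_w, (bc + 1) * block_w)
            )
            percentages.append(100.0 * count / (block_h * block_w))
    return percentages

# Hypothetical 156-by-256 binary image with a clump in the top-left corner.
binary = [[0] * 256 for _ in range(156)]
for r in range(10):
    for c in range(16):
        binary[r][c] = 1
percentages = subarea_percentages(binary)
```

The list returned follows the numbering of Fig. 5.12, so the first element corresponds to sub-area 1 and the last to sub-area 16, and it can be plotted directly as a bar chart.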

Detection of Unmilled Material

The results of this monitoring method allowed easy identification of areas with high

pixel concentration of unmilled material (Fig. 5.13). Overall, an average percentage of

unmilled material was calculated for each image and control limits, based on the t-

distribution (Neave, 1995), were used corresponding to 95% and 99% (Table 5.2).

These limits differ between blends and were used as a rough guide for identifying high

concentration areas; future monitoring would probably use limits set empirically. All

images showed at least one region with a high pixel concentration of unmilled

crystalline material (Fig. 5.13). Blend BN5045 showed two areas with high pixel

concentration of unmilled material (Fig. 5.13C). Blends BN: 5039; 5040; 5064; 5065

and 5067 all showed one area with a pixel concentration of 15% or greater (Fig. 5.13).

Detection of Drug Substance

Monitoring of the PC3 binary images used the same method as for unmilled crystalline

material. For each image, the percentages of drug substance for the 16 sub-areas were

averaged and the standard deviation calculated (Table 5.3). From the results for these

images, blends BN: 5040; 5064; 5070 and 5078 appear to have mean drug substance

contents by surface area which are reasonably consistent with the specification limits of

2.42 to 2.58% m/m. Blends BN: 5039 and 5045 showed high concentration of drug

substance content in these images and their respective certificate of analysis results

confirmed that some blend samples showed excessive drug substance content. The

standard deviation of drug substance content was higher for BN5045 than for BN5039,

which agrees with the certificate of analysis content uniformity results. Two batches,

Fig. 5.13. On-line Shewhart PC control bar charts showing percentage of unmilled
crystalline material in 16 sub-areas of blend images, with mean and sigma and 2
sigma limits. Blend image: A) BN5039; B) BN5040; C) BN5045; D) BN5064; E)
BN5065; F) BN5067; G) BN5070; H) BN5078. [x-axis: Image area]

Table 5.2. PC1 unmilled crystalline material Shewhart control bar charts of blend
images, each divided into 16 image sub-areas.

Blend     Percentage unmilled    Standard deviation of     Content
          material (mean)        unmilled material (%)     uniformity (a)
BN5039    1.888                  4.064                     729.71
BN5040    1.845                  3.900                     705.43
BN5045    1.470                  2.854                     464.22
BN5064    2.487                  5.064                     676.64
BN5065    2.101                  4.263                     618.95
BN5067    2.229                  5.264                     780.90
BN5070    1.360                  1.726                     315.47
BN5078    1.608                  3.427                     700

Table 5.3. PC3 ‘drug substance’ Shewhart control bar charts of blend images, each
divided into 16 image sub-areas.

Blend     Image calculated percentage    Certificate of      Standard deviation    Content
          of drug substance by           analysis drug       of drug content       uniformity (a)
          surface area (mean)            content
BN5039    2.659                          2.523               0.694                 68.362
BN5040    2.574                          2.462               1.280                 105.5
BN5045    2.654                          2.579               1.230                 140.0
BN5064    2.544                          2.455               1.076                 68.50
BN5065    2.637                          2.448               1.579                 155.3
BN5067    2.284                          2.443               1.180                 143.9
BN5070    2.478                          2.502               0.981                 110.3
BN5078    2.514                          2.521               1.037                 81.67

(a) ‘Content uniformity’ is the absolute difference between the value for the image
sub-area with the highest number of pixels classified as unmilled and the mean value,
divided by the mean value over all sub-areas, expressed as a percentage.

BN5065 and BN5067, were produced consecutively and subsequently found to have

processed unusually by NIR spectroscopy (Chapters 3 and 4), having shown high and

low drug substance contents respectively by this method. These results are shown in

Figs. 5.14 and 5.15.
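
The summary statistics reported in Tables 5.2 and 5.3 can be illustrated with the sketch below, which follows the definition of ‘content uniformity’ given in the table footnote. The function name and the 16 sub-area percentages are hypothetical and are not taken from the blend images.

```python
from statistics import mean, stdev

def content_uniformity(percentages):
    """Deviation of the highest sub-area from the mean, as a percentage of the mean."""
    centre = mean(percentages)
    return 100.0 * abs(max(percentages) - centre) / centre

# Hypothetical percentages for 16 sub-areas (made-up numbers, not thesis data).
percentages = [2.0] * 15 + [10.0]
summary = {
    "mean": mean(percentages),
    "sd": stdev(percentages),
    "content_uniformity": content_uniformity(percentages),
}
```

In this example a single sub-area at 10% against a mean of 2.5% gives a content uniformity of 300%, which shows how one localised clump can dominate the statistic even when the mean over the image is close to the nominal value.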

5.8 Particle Size Analysis of Unmilled Crystalline Material and Drug

Substance.

The binary images of unmilled crystalline material and of powdered drug substance

were subjected to particle size analysis. With each image, all 157 rows and 256 columns

were used. Particle-sizing was performed by scanning horizontally and vertically across

each binary image (PC1 ‘unmilled crystalline material’ and PC3 ‘powdered drug

substance’ binary images) and measuring the diameter of each particle encountered.

Hence for each particle, a number of measurements of its size were recorded (Table

5.4).

The data for each image were used to construct percentage frequency particle size

distributions (Fig. 5.16 and 5.17) and to estimate the mean, median and maximum

particle sizes for each image, and the standard deviation of each percentage frequency

particle size distribution. The resolution of the NIR imaging microscope restricted the

lowest measurable particle size to 6 microns.
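
The scanning procedure can be sketched as a run-length measurement along every row and column of a binary image. This is one possible reading of the method, assuming that each unbroken run of selected pixels along a scan line counts as one size measurement and that each pixel is 6 µm across; the function names and the small example image are hypothetical.

```python
from statistics import mean, median

PIXEL_UM = 6  # edge length of one pixel in microns (36 square microns per pixel)

def run_lengths(line):
    """Lengths of consecutive runs of 1s in a list of 0/1 values."""
    runs, current = [], 0
    for v in line:
        if v:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def chord_sizes_um(binary):
    """Particle chord lengths in microns from horizontal and vertical scans."""
    sizes = []
    for row in binary:
        sizes += [n * PIXEL_UM for n in run_lengths(row)]
    for col in zip(*binary):
        sizes += [n * PIXEL_UM for n in run_lengths(col)]
    return sizes

# Hypothetical binary image: one 2-by-3 pixel particle and one single pixel.
binary = [[0, 0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0, 0],
          [0, 1, 1, 1, 0, 1],
          [0, 0, 0, 0, 0, 0]]
sizes = chord_sizes_um(binary)
stats = {"n": len(sizes), "mean": mean(sizes), "median": median(sizes), "max": max(sizes)}
```

Because each particle is crossed by several scan lines, it contributes several measurements, and all measured sizes are multiples of the 6 µm pixel size.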

Particles identified as crystalline material in each blend’s PC1 binary images showed

mean and median particle sizes greater than those measured for the corresponding PC3

powdered drug substance images (Table 5.4). In addition, with the exception of blend

BN5070, the maximum measured particle sizes of each blend’s PC1 unmilled

crystalline material image were much larger than with the corresponding PC3 powdered

drug substance image. Blends: BN5040; BN5045; BN5065 and BN5078 showed the

Fig. 5.14. Binary images of areas of drug substance produced from principal
component three images of blends (median filtered reflectance data). A) BN5039,
B) BN5040, C) BN5045, D) BN5064, E) BN5065, F) BN5067, G) BN5070, H)
BN5078.

Fig. 5.15. On-line Shewhart PC control bar charts showing percentage of drug
substance in 16 sub-areas of blend images, with mean and sigma and 2 sigma
limits. Blend image: A) BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065;
F) BN5067; G) BN5070; H) BN5078. [x-axis: Image area]

Table 5.4. Particle size analysis of unmilled crystalline material and powdered
drug substance from PC1 and PC3 binary images respectively.

          Crystalline material (PC1 images)                  Powdered drug substance (PC3 images)
Batch     n*     Mean       d50/µm   σ/µm     Max.           n*      Mean       d50/µm   σ/µm    Max.
                 size/µm                      size/µm                size/µm                     size/µm
BN5039    707    13.26      12       9.30     60             1356    9.45       6        5.42    48
BN5040    638    14.65      12       10.98    102            1365    9.20       6        5.07    42
BN5045    443    16.19      12       15.63    120            1386    9.27       6        5.44    42
BN5064    914    13.43      12       9.49     72             1377    8.98       6        4.80    36
BN5065    683    15.14      12       11.76    108            1364    9.50       6        5.18    42
BN5067    721    15.55      12       10.86    96             1231    9.03       6        4.98    36
BN5070    528    12.41      12       8.5      48             1365    8.86       6        4.63    48
BN5078    592    13.58      12       9.88     114            1319    9.23       6        4.94    36

* n is the number of particle size measurements recorded for each image.

Fig. 5.16. Percentage frequency particle size distributions of unmilled crystalline
drug substance measured from blend PC1 binary images (spatial resolution = 6
µm): A) BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G)
BN5070 and H) BN5078. [x-axis: Particle size/µm]

Fig. 5.17. Percentage frequency particle size distributions of powdered drug
substance measured from blend PC3 binary images (spatial resolution = 6 µm): A)
BN5039; B) BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G) BN5070
and H) BN5078. [x-axis: Particle size/µm]

largest unmilled crystalline material particle size.

5.9 Multivariate Monitoring of Blend Quality

In this Section, all 157 rows and 256 columns of each image were used.

5.9.1 Monitoring Chemical Composition of Unmilled Crystalline Material by

Residual Analysis

With these images, most pixels have been shown to fall within control limits on a given

PC; that is, most are grouped within the same multinormal distribution. Preliminary Q

residual analysis for blend BN5045, with a 7PC model, showed that most of the 40192

pixels fall within the model space, as predicted by the PC Shewhart control charts.

Unusual pixels, which did not fit the model, were found to be those with high positive or

negative score values on a PC and included pixels representing crystalline areas,

crystalline drug substance and holes in the powder surface or scratches on the barium

fluoride disc. Either of these groups of pixels should therefore be amenable to Q

residual analysis, once identified from the PC score Shewhart control charts. A PCA

model specific to either of these groups would need to be constructed, with an

appropriate rank determined by cross validation. In this work, location and chemical

characterisation of areas of large unmilled crystalline material was considered important

and was therefore monitored.

The method of detecting unmilled crystalline material on the PC1 intensity images by

Shewhart control charts was used to identify the spatial locations with highly reflective

dense crystalline material. Blend BN5045 was treated as a reference blend, from which

the Q residual model was built. This blend was used as a reference as its crystalline

areas are known to be unmilled drug substance. With the other unusual blends, the

crystalline areas do not all necessarily contain the same chemical composition, i.e. drug

substance. For these, this was therefore monitored by residual analysis using the Q

statistic and also by visual inspection of residual distance, Q, to model images.

Cross Validatory Estimation of The Number of Principal Components to Retain

A PCA model was built from blend BN5045 using pixels which had intensity values

above the PC Shewhart 99% control limit (n = 196 pixels). The crystalline material on

this image is known from certificate of analysis data and from examination of both PC1

and PC3 images to be drug substance crystals. Cross validation using 14 subgroups each

with 14 pixels was performed. The W, R and percentage sum of squares (%SS) statistics

were examined. The rank of the data set was determined to be 7 PCs (R = 1.03, %SS =

88.66, Table 5.5).

Residual Distance, Q, to Model Images

Residual analysis of PCA models was examined after extraction of 3 components,

which accounts for intensity and drug substance content, and after extraction of 7

components, the model rank determined by cross validation for pixels with PC1 score

values above 99% control limit (Fig. 5.18). For calculation of the residual distance to

model images (Geladi et al, 1996), PCA models were constructed from the entire image.

The residual distance, Q_A, of each pixel from the 3PC and from the 7PC models was

calculated as:

Q_{ijA} = \sum_{k=1}^{K} e_{ijkA}^{2}    (5.8.1)

where e_{ijA} is a vector of size (K-by-1) with indices i and j, extracted from E_A, Q_{ijA} is a

pixel with indices i and j for the PCA models with rank A, i = 1, ..., I is the column

Table 5.5. PCA results for modelling unmilled crystalline drug substance in blend
BN5045 (n = 196 pixels) (pixels selected from PC3 Shewhart control chart with
99% control limit).

Principal component    W statistic    R statistic    Cumulative %SS*

0                      -
1                      14.25          0.61           40.83
2                      8.89           0.72           59.05
3                      10.21          0.70           73.07
4                      9.34           0.72           81.65
5                      3.08           0.91           84.64
6                      2.70           0.94           86.97
7                      1.29           1.03           88.66

Fig. 5.18. Q Residual control chart for PCA model of unmilled crystalline material
identified from PC Shewhart control chart of blend BN5045 with a 99%
confidence level (n = 196 observations, 7PCs) (95% and 99% limits for Q shown;
x-axis: observation).

* Cumulative %SS is the cumulative percentage sum of squares of the original data set
explained by the model with n PCs.

index in the image, j = 1, ..., J is the row index in the image, k = 1, ..., K is the variable

index for the multivariate residual image from MPCA, E_A (Section 3.6.3, equation

(3.6.1)), and A is the effective rank of the model considered (3 or 7 PCs). These distances form

an intensity image, Q_A, which shows distance, location and clustering (Geladi et al,

1996). Regions which fit the model well will show small distances; regions which do

not fit the model will show up as large distances. After 3PCs were extracted, most of the

crystalline areas fit the model; however, one large crystal is observed in the upper right

region (Fig. 5.19A). However, after 7PCs, little structure remains in the residual image

E_A, with only a few pixels showing high distance (Fig. 5.19B). The 7PC model

therefore adequately describes most systematic variation in the image.
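As a minimal sketch of the residual distance image described above, the following assumes that Q for a pixel is the sum of squared residuals over the K variables once A components have been removed. The function name, the nested-list layout and the toy residual values are illustrative assumptions and are not taken from this work.

```python
def residual_distance_image(residuals):
    """Return Q for every pixel.

    residuals[i][j] is the length-K residual vector of the pixel at
    position (i, j) in the residual image (hypothetical layout).
    """
    return [[sum(e * e for e in pixel) for pixel in row] for row in residuals]


# Hypothetical 2 x 2 residual image with K = 3 variables per pixel.
E_A = [
    [[0.0, 0.0, 0.0], [0.1, -0.2, 0.2]],
    [[0.3, 0.0, -0.4], [0.01, 0.02, -0.02]],
]
Q_A = residual_distance_image(E_A)

# Pixels that fit the model give small Q; poorly modelled pixels give large Q.
for row in Q_A:
    print(["%.4f" % q for q in row])
```

Displaying Q_A as an intensity image then places large residual distances at their spatial locations, which is how clustering of poorly modelled pixels becomes visible.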

Q Residual Control Charts For Monitoring Crystalline Material

Using blend BN5045, a 7PC model was constructed from pixels above their PC1

Shewhart 99% control limits (n = 196 pixels) (Fig. 5.18). This limit was used instead of

the lower 95% limit to reduce Type I errors (random). Using these pixels, control limits

for Q were estimated (Section 3.7.5, equation (3.7.16)).
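Equation (3.7.16) is not reproduced in this section. A common choice for Q control limits is the Jackson and Mudholkar approximation, and the sketch below assumes that form; whether it matches equation (3.7.16) exactly is an assumption. The function name and the eigenvalues of the discarded components are hypothetical.

```python
from statistics import NormalDist


def q_control_limit(residual_eigenvalues, confidence):
    """Approximate upper control limit for Q (Jackson-Mudholkar form).

    residual_eigenvalues: eigenvalues of the components NOT retained
    in the model. confidence: e.g. 0.95 or 0.99.
    No guard is included for the degenerate case h0 <= 0.
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    z = NormalDist().inv_cdf(confidence)  # standard normal deviate
    bracket = (
        z * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1 ** 2
    )
    return theta1 * bracket ** (1.0 / h0)


# Hypothetical eigenvalues of the discarded components.
discarded = [4e-4, 3e-4, 2e-4, 1e-4]
q95 = q_control_limit(discarded, 0.95)
q99 = q_control_limit(discarded, 0.99)
print(q95, q99)
```

The 99% limit is always the larger of the two, so fewer reference pixels are expected to exceed it by chance.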

The 95% and 99% control limits for Q were 3.4644 × 10⁻³ and 4.5704 × 10⁻³

respectively. The type I errors were found to be satisfactory with one pixel out of 196

exceeding the 99% control limit, and 10 pixels exceeding the 95% control limit. Type I

errors were also estimated for this model using pixels identified as crystalline on the PC1

Shewhart chart with a 95% control limit. Out of 598 pixels above the 95% PC1

Shewhart control limit, only 19 had Q values which exceeded the Q 95% control limit

(3.18%). Future monitoring of crystalline material for the other blends used this model.

The pixels which were monitored for each blend were those which exceeded the 95%

PC1 Shewhart control limits for those images (Table 5.6). Blends BN5040, BN5064,

BN5067, BN5070 and BN5078 all showed greater than 1% of their crystalline pixels to exceed the

Fig. 5.19. Residual distance, d, to model image for blend BN5045 after principal
components analysis. Principal components extracted: A) 3; B) 7.

Table 5.6. Q Residual analysis of blend pixels classed as unmilled crystalline
material on their respective PC1 control charts (95% limits). Model built from
blend BN5045 pixels above 99% control limit on PC1 (n = 7PCs).

Batch     Pixels above 95%        Q > Q 95% control    Q > Q 99% control
number    control limit on PC1    limit (%)            limit (%)

5039      781                     5.2                  1.4
5040      779                     7.8                  3.9
5045      598                     3.2                  0.3
5064      1023                    6.5                  1.5
5065      862                     3.0                  0.46
5067      934                     9.4                  4.8
5070      546                     7.7                  3.3
5078      670                     11.5                 4.3
99% Q limit (Table 5.6). Their control charts showed that those pixels which exceeded

the 99% limit for Q were not randomly distributed, but spatially were closely located

(Fig. 5.20). Hence these pixels are unusual and, although they are crystalline, they may

not be drug substance. Further investigation of these pixels could involve projecting

them back into the image space, to identify their spatial location, and in addition the

spectra of these pixels could be examined. However, identification of their composition

may require imaging over the full NIR spectral wavelength range.

5.9.2 Multivariate Monitoring of Blend Quality Using Hotelling’s Control

Ellipses

The monitoring of blend quality with individual PC score Shewhart control charts was

compared with a more sophisticated approach, Hotelling’s T² control ellipses of PC

scores. This method simultaneously monitors all blend quality variables with a specific

overall type I error. The PC score Shewhart control charts used each had a type I error

of 5%; overall this equates to a type I error of 14.26% for a 3PC model.
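The overall figure quoted above can be reproduced with a short calculation, assuming the three PC score charts behave as independent tests each with a 5% type I error. The function name is illustrative.

```python
def overall_type_i_error(alpha, n_charts):
    """Probability of at least one false alarm across n independent charts."""
    return 1.0 - (1.0 - alpha) ** n_charts


overall = overall_type_i_error(0.05, 3)
print(f"{100 * overall:.2f}%")  # 14.26%
```

This inflation of the false alarm rate is the motivation for monitoring all scores jointly with a single statistic that has a stated overall type I error.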

With this method, the first three principal components scores were monitored with 95%

and 99% control limits. With the PCA models, Hotelling’s T² was calculated according

to equation (1.8.4). The upper control limit, UCL, for this statistic was calculated

according to equation (1.8.5). The value of α for the 100α% critical point of the F

distribution was set to 0.95 (warning level) and 0.99 (control level).

With a PCA model derived from mean-centred data, Hotelling’s T² may also be

calculated as (Kourti and MacGregor, 1996):

T^{2} = \sum_{a=1}^{n} \frac{t_{a}^{2}}{\lambda_{a}} = \sum_{a=1}^{n} \frac{t_{a}^{2}}{s_{t_{a}}^{2}}    (5.8.2)

where \lambda_{a}, a = 1, 2, ..., n, are the eigenvalues of the covariance matrix, S, and t_{a} are the

Fig. 5.20. Q Statistic control charts for blends using local PCA model derived from
unmilled crystalline drug substance pixels (99% control limit on PC3 Shewhart
chart) of blend BN5045 (95% and 99% Q control limits; x-axis: observation;
y-axis: Q). Blend: A) BN5039; B)
BN5040; C) BN5045; D) BN5064; E) BN5065; F) BN5067; G) BN5070; H) BN5078.
scores from the principal component transformation, s²_{t_a} is the variance of t_a (the

variances of the principal components are the eigenvalues of S).
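A minimal sketch of this calculation for a single pixel is given below, assuming the scores and the corresponding eigenvalues are already available. The function name and the numerical values are hypothetical.

```python
def hotelling_t2(scores, eigenvalues):
    """Sum of squared scores, each scaled by the variance of its component."""
    return sum(t * t / lam for t, lam in zip(scores, eigenvalues))


# Hypothetical scores of one pixel on three PCs and the PC variances.
pixel_scores = [0.3, -0.1, 0.05]
pc_variances = [0.04, 0.01, 0.0025]
t2 = hotelling_t2(pixel_scores, pc_variances)
print(round(t2, 4))
```

Because each term is a squared score divided by its variance, T² is also the sum of squares of the normalised scores, which is what makes the individual normalised values useful for diagnosing which PC is responsible for a high T².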

Interpretation of The Principal Components Used For Control

Prior interpretation of the PC loadings and of the PC score images has shown that the

first three components represent physical and chemical information. The first

component represents intensity, and is therefore useful for monitoring physical

characteristics, such as the presence of large crystalline particles. The second

component is a contrast of the first. The third component represents chemical

information and allows identification of areas with high concentration of drug

substance. The first and third PCs were considered most important for monitoring and

are shown in Fig. 5.21. This figure shows the Hotelling’s T² 95% and 99% control

ellipses and the 95% and 99% PC Shewhart control limits. This is an informative plot

which allows pixels with high values of Hotelling’s T² to be classified as either

crystalline (high positive PC1 score value), drug substance (high positive PC3 score

value) or both (high positive score values on PC1 and PC3) or as a surface scratch on

the barium fluoride disc of the sampling accessory (high negative PC1 score

value). A simple computer program was written to assign pixels with Hotelling’s T²

above the 95% limit (UCL = 8.99, α = 0.95). This involved monitoring normalised

scores. The sum of squares of these values corresponds to Hotelling’s T², and

monitoring of their normalised values for a pixel provides directional information.

Similar to the PC Shewhart control charts, a 95% control limit based on the t-

distribution was used. Hence any normalised score value exceeding a value of 1.96 was

deemed to represent a high value on the PC. An example of this is shown as bar charts

for two pixels in Fig. 5.22. The two pixels which had the highest score values on PC1

and PC3 are shown with their normalised score values. The first pixel was classified as

Fig. 5.21. Hotelling’s T² control ellipses (95% and 99%) and PC Shewhart control
limits (95% and 99%, dotted line and dotted-dashed line respectively) for principal
components 1 and 3 of blend BN5045 showing regions of: unmilled crystalline
material and crystalline drug substance; drug substance and surface scratches on
the barium fluoride (BaF2) disc of the sampling accessory window. (Region labels:
unmilled crystalline material; unmilled drug substance; milled drug substance;
barium fluoride disc surface scratch. x-axis: PC3 scores.)

Fig. 5.22. Bar charts of principal components (PC) scores and normalised PC
scores of pixels from blend BN5045 PC1 to 3 images (T²_{3, 40192-3, 0.95} = 8.99,
T²_{3, 40192-3, 0.99} = 13.82; x-axis: principal component). Pixel 34159 (drug substance), T² = 91.50: A) PC scores & B)
normalised PC scores. Pixel 34992 (unmilled crystalline material), T² = 33.45: C)
PC scores & D) normalised PC scores.

both drug substance and crystalline (Fig. 5.22B); the second pixel was classified as

crystalline (Fig. 5.22D). This agrees with the PC1 versus PC3 plot (Fig. 5.21).
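The assignment program itself is not listed in this work. The sketch below is one plausible reading of the rules described above: a pixel is examined only if its T² exceeds the 95% limit of 8.99, and a normalised score beyond 1.96 in magnitude is treated as high. The function name, the label strings and the example scores are assumptions for illustration.

```python
import math

UCL_95 = 8.99       # Hotelling's T² 95% limit quoted in the text
SCORE_LIMIT = 1.96  # limit for a "high" normalised score quoted in the text


def classify_pixel(scores, eigenvalues):
    """Return the set of labels for one pixel from its PC1 to PC3 scores."""
    z = [t / math.sqrt(lam) for t, lam in zip(scores, eigenvalues)]
    t2 = sum(v * v for v in z)  # sum of squared normalised scores
    labels = set()
    if t2 <= UCL_95:
        return labels  # inside the control ellipse, nothing to assign
    if z[0] > SCORE_LIMIT:
        labels.add("crystalline")      # high positive PC1
    if z[0] < -SCORE_LIMIT:
        labels.add("surface scratch")  # high negative PC1
    if z[2] > SCORE_LIMIT:
        labels.add("drug substance")   # high positive PC3
    return labels


# Hypothetical scores with unit eigenvalues, so scores are already normalised.
unit = [1.0, 1.0, 1.0]
print(classify_pixel([3.0, 0.0, 2.5], unit))
print(classify_pixel([-3.5, 0.0, 0.0], unit))
print(classify_pixel([1.0, 1.0, 1.0], unit))
```

Returning a set rather than a single label allows a pixel to be both crystalline and drug substance, as is the case for the first pixel in Fig. 5.22.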

‘On-line’ Type Imaging of Multivariate Statistical Control Blend Results

As an example, the results of the multivariate control approach for blend BN5045 were

represented as an image. A colour scheme was chosen which would be suitable for an

engineer to monitor. Areas which were identified as crystalline were given the colour

red (denoting a warning colour). Areas which were identified as drug substance were

given the colour green (denoting an acceptable colour). The background colour was

blue. This is shown in Fig. 5.23. Areas which were classified as both crystalline and as

drug substance were shown as green. The advantage with this multivariate approach is

that now both areas with dense unmilled particles and drug substance may be shown on

the same graph. In this image, both unmilled crystals and drug substance occur together

in agreement with previous findings. In total 715 pixels were identified as unmilled

crystals, 850 were identified as active, and 747 as surface scratches. Further analysis of

these results showed that only 332 out of the 715 crystalline pixels were identified

solely as crystalline material, the rest were also identified as drug substance. With the

surface scratch pixels, only 399 were solely identified as being that. Therefore, with this

image, 2.97% (n = 850 + 332 pixels) may be considered to represent drug substance in

total, with 0.84% being unmilled large crystals. This result is not inconsistent with the

certificate of analysis results. These results may also be subjected to an ‘on-line’ type of

analysis in the same manner as with the binary images from the PC Shewhart control

charts. The identification of areas with unmilled crystals and drug substance should be

as straightforward.

Fig. 5.23. Image of blend BN5045 showing areas identified as drug substance
(green, n = 850 out of 40192 pixels) and crystalline material (red, n = 715 out of
40192 pixels) from Hotelling’s T² control ellipse (T² 95% control limit = 8.99) of
principal components 1 to 3 scores. Pixels classified as both drug substance and
crystalline are shown as green.

Future Monitoring of Blend Images Using The Multivariate Model

The multivariate control model developed from blend BN5045 was used to verify its

ability to detect both crystalline regions and drug substance. This was tested using an

image of pure drug substance. In Fig. 5.24, the first PC images of drug substance and

blend BN5045 are shown. From these, it is readily apparent that the image of drug

substance is visually darker. This is consistent with spectroscopic analysis as drug

substance absorbs more strongly over this spectral region than the remaining bulk

components of the blend. In addition, a considerable number of dark regions are

identifiable on the drug substance image. These probably represent holes in the powder

surface. The sample of drug substance imaged was a very fine, micronised and porous

sample. In both images, crystalline areas are visible as bright yellow spots.

The multivariate monitoring procedure was the same as for blend BN5045. First, the

unfolded image array of drug substance was centred and projected onto the eigenvectors

from the 3PC model of blend BN5045. The normalised scores were then monitored and

assigned as for BN5045. In this case, a considerable number of pixels were found to

have high negative score values on PC1 (n = 10660 out of 40192 pixels). These were

projected back into the image score space (Fig. 5.25) and appear to overlap with the

dark areas observed in the first PC image (Fig. 5.24). These were considered to

represent holes in the powder surface. This was confirmed by a pixel density map of

score intensities on PCs 1 and 3 for the drug substance and blend BN5045 combined

image. This showed that many of the drug substance pixels had lower intensities on the

first PC (Fig. 5.26); this PC had a bimodal distribution of pixel intensities (Fig. 5.27).

Further analysis of the scores revealed that of these 10660 pixels, 4334 pixels were not

assigned to another class. The number of crystalline pixels was found to be 391, with

153 of these classified solely as crystalline. The number of pixels found to be drug

Fig. 5.24. Principal component 1 image of drug substance (upper half) and blend
BN5045 (lower half) (PCA model calculated for blend BN5045) showing an overall
darker and more porous image for the drug substance.

Fig. 5.25. ‘On-line monitoring of powder porosity’ image for micronised drug
substance powder produced from 3PC model derived from blend BN5045 with
Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as holes in
the powder surface from high normalised PC2 scores (red, n = 10660 pixels out of
40192).

Fig. 5.26. Pixel density image of principal components 1 and 3 scores for blend
BN5045 and powdered drug substance (PCA model calculated from blend
BN5045) showing separation in intensity along PC1. (Region labels: powdered drug
substance; blend BN5045. x-axis: PC3 pixel intensity.)

Fig. 5.27. Pixel intensity frequency histograms of principal components (PC) 1 and
3 scores for blend BN5045 and powdered drug substance (PCA model calculated
from blend BN5045). PC1 scores showed bimodal distribution of pixel intensity
frequencies (0 to 255 intensities): A) PC1 (mode intensities: 72, n = 757 pixels; 135,
n = 1423 pixels) and B) PC3 (mode intensity: 123, n = 4067 pixels).
substance was 22241. An image showing crystalline and drug substance regions of the

powdered drug substance is shown in Fig. 5.28. By correcting the total number of pixels

for those believed to represent holes in the powder surface, the drug substance content

of this image was 62.03% by surface area. This value is less than 100%, which may be a result

of the signal to noise ratio in this image; spectra (pixels) of very fine particles are likely to

have lower signal to noise and might not be classified as drug substance. Instead these

might be classified as holes in the powder surface. However, visually, this image shows

uniform coverage of the image area with drug substance.
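The centring and projection step used to monitor a new image with the existing model can be sketched as follows. The section does not state which mean is used for centring, so the centring vector is passed in explicitly. The function name, the two-variable example and the identity loadings are hypothetical.

```python
def project_onto_model(pixels, centre, loadings):
    """Centre each unfolded pixel spectrum and project it onto the loadings.

    pixels:   list of pixel spectra (one list of K values per pixel)
    centre:   length-K vector subtracted from every spectrum (assumed input)
    loadings: list of eigenvectors, each of length K
    Returns one list of scores per pixel, one score per eigenvector.
    """
    scores = []
    for spectrum in pixels:
        centred = [x - m for x, m in zip(spectrum, centre)]
        scores.append(
            [sum(c * p for c, p in zip(centred, vector)) for vector in loadings]
        )
    return scores


# Hypothetical two-pixel, two-variable example with identity loadings.
new_pixels = [[1.0, 2.0], [3.0, 4.0]]
model_centre = [2.0, 3.0]
model_loadings = [[1.0, 0.0], [0.0, 1.0]]
new_scores = project_onto_model(new_pixels, model_centre, model_loadings)
print(new_scores)
```

The resulting scores can then be normalised and assigned to classes in the same way as for the reference blend.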

The calculated active content value of 62.03% by surface area is less than the content by

mass. This may be due to the porous nature of the powder. Importantly, though, this

result confirms the ability of the model to identify a greater abundance of drug

substance.
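The correction behind the 62.03% figure is not spelled out. One reading that reproduces it is to remove from the total the 4334 pixels assigned only to holes in the powder surface; this interpretation is an assumption, and the variable names are illustrative.

```python
total_pixels = 40192
hole_only_pixels = 4334        # high negative PC1 score and no other class
drug_substance_pixels = 22241

usable_pixels = total_pixels - hole_only_pixels
content_percent = 100.0 * drug_substance_pixels / usable_pixels
print(f"{content_percent:.2f}% of {usable_pixels} pixels")  # 62.03% of 35858 pixels
```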

5.10 Alternative Approaches to Monitoring Blend Quality

Although not tested here, the specificity of the model towards the drug substance could

be investigated by projection of an image of a chemically different powder into the

model space, followed by similar multivariate monitoring. In this study, testing the

model’s specificity to this drug substance was not necessary since all imaged blends

have been identified and qualified by NIR spectroscopy (Chapter 3). However, other

powdered materials which absorb in this same narrow NIR region might be incorrectly

classified as containing this drug substance if the same MPCA model of blend BN5045

were to be used for monitoring. In this event, it is likely that either identification and

qualification of the tested material by NIR spectroscopy, or imaging over a wider NIR

wavelength range would be required to prevent this.

In addition, model specificity might be determined by Q residual analysis of images of

other powders, using the PCA model of blend BN5045.

Fig. 5.28. ‘On-line type monitoring of drug substance content’ image of micronised
drug substance powder produced from 3PC model derived from blend BN5045
with Hotelling’s T² control limit of 95% (T² = 8.99) showing areas identified as
drug substance from high normalised PC3 scores (green, n = 22241 out of 40192
pixels) and crystalline material from high normalised PC1 scores (red, n = 391 out
of 40192 pixels). (Dark areas corresponding to holes in powder surface and surface
scratches on barium fluoride sampling accessory window not shown, n = 10660 out
of 40192 pixels). In total, 35412 pixels were outside the 95% control ellipse.
Another method of classification that could be tested is k-nearest neighbour. This

method could be used to classify pixels classed as crystalline, drug substance and as

barium fluoride disc surface scratches. This would require a multivariate model with

one or all of these classes of pixels, e.g. blend BN5045. The classification results by this

method might yield results as good as those based on Hotelling’s T² and normalised PC

scores.
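As this approach was not tested here, the following is only a minimal sketch of k-nearest neighbour classification applied to pixel scores. The training points, their labels, the choice of k = 3 and the use of Euclidean distance on (PC1, PC3) normalised scores are all illustrative assumptions.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Assign the majority label among the k training points nearest to query."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled pixels: (normalised PC1 score, normalised PC3 score).
training_pixels = [
    ((3.0, 0.2), "crystalline"),
    ((2.8, -0.1), "crystalline"),
    ((0.1, 3.1), "drug substance"),
    ((0.3, 2.7), "drug substance"),
    ((-3.2, 0.0), "surface scratch"),
    ((-2.9, 0.4), "surface scratch"),
]

print(knn_classify((2.5, 0.5), training_pixels))
print(knn_classify((-3.0, 0.1), training_pixels))
```

In practice the labelled training pixels would have to come from an image in which each class has already been identified, such as the reference blend.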

5.11 Conclusion

Multivariate image analysis of NIR multispectral images may be used to monitor the

quality of blends. The method allows characterisation of physical and chemical

parameters such as drug substance content, dense crystalline material content, blend

uniformity, particle size distribution and powder porosity. Monitoring of future blends

requires construction of a multivariate model. The blend used for this purpose should

contain all of the features which are to be monitored in the future.

CHAPTER 6

Conclusion

The results of this study clearly demonstrate the ability of near infrared spectroscopy

and imaging microscopy to allow multivariate statistical quality control of an entire

pharmaceutical manufacturing process. This includes identification and qualification of

powdered pharmaceutical raw materials through to statistical process control of a tablet

manufacturing process at both the blending and tabletting stages.

Each of the process stages requires the construction of a multivariate model of NIR

measurements of high quality process intermediates. These measurements should

include diffuse reflectance measurements of blends and tablets and transmission

measurements of tablets (in Chapters 3 and 4, these raw data were transformed to

apparent absorbance). These data should be transformed to either SNV-DT or Sg2d11

data. For each process stage, the multivariate model derived from either of these data

requires validation by statistical means and by comparison of model validation results

with reference analytical data. The initial implementation of this MSPC method for

control of a pharmaceutical tabletting process will therefore require reference analysis

data. However, as the multivariate models derived from NIR measurements and from

both NIR measurements and reference analytical data were equally effective in

monitoring process operating performance, subsequent process monitoring could be

achieved solely with NIR measurements.

The NIR method has several advantages over the reference analytical tests. These relate

to both the analysis and the analytical results obtained. The NIR analysis is considerably

faster and allows non-destructive analysis of the process material’s matrix. The

multivariate analysis of these measurements allows the process performance at all

process stages to be monitored over time. Significant deviations in process performance,

that lead to lower quality product, are readily identified and diagnosed throughout the

process. Importantly, deviations in process performance which affect the blending stage

may be observed at this stage by the NIR method. These are not always identified with

the reference analyses. NIR imaging of these blends may be used for at-line quality

control of the blending stage and provides useful diagnostic information of blending

performance. This information may allow for improved control of blending to ensure

that the tablets produced are of the highest possible quality.

References

Advances in Near-Infrared Measurements, ed. Patonay, G., JAI Press Inc., Greenwich, Connecticut, 1993.

Alt, F. B., in Encyclopedia of Statistical Sciences, ed. Kotz, S., and Johnson, N. L., John Wiley and Sons Inc., New York, 1985.

Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley and Sons Inc., New York, 2nd edn., 1984.

Andersson, M., Josefson, M., Langkilde, F. W., and Wahlund, K. G., J. Pharm. Biomed. Anal., 1999, 20, 27.

Aucott, L. S., Garthwaite, P. H., and Buckland, S. T., Analyst, 1988, 113, 1849.

Aulton, M. E., Pharmaceutics: The Science of Dosage Form Design, Churchill Livingstone, Edinburgh, 1988.

Barnes, R. J., Dhanoa, M. S., and Lister, S. J., Applied Spectroscopy, 1989, 43, 772.

Barth, H. G., Sun, S., and Nickol, R. M., Anal. Chem., 1987, 59, 142.

Beebe, K. R., Pell, R. J., and Seasholtz, M. B., Chemometrics: A Practical Guide, John Wiley & Sons Inc., New York, 1998.

Bharati, M. H., and MacGregor, J. F., Ind. Eng. Chem. Res., 1998, 37, 4715.

Blanco, M., Coello, J., Iturriaga, H., Maspoch, S., and Pezuela, de la, C., Analyst, 1998, 123, 135R.

Bromba, M. U. A., and Ziegler, H., Anal. Chem., 1981, 53, 1583.

Bromba, M. U. A., and Ziegler, H., Anal. Chem., 1983, 55, 1299.

Bull, C. K., Analyst, 1991, 116, 781.

British Pharmacopoeia 1999, HM Stationery Office, London, 1999, vol. 2, Appendix XVII, A & B.

Candolfi, A., Maesschalck, de, R., Massart, D. L., Hailey, P. A., and Harrington, A. C. E., J. Pharm. Biomed. Anal., 1999a, 19, 923.

Candolfi, A., Maesschalck, de, R., Jouan-Rimbaud, D., Hailey, P. A., and Massart, D. L., J. Pharm. Biomed. Anal., 1999b, 21, 115.

Ciurczak, E., Torlini, P., and Demkowicz, P., Spectroscopy, 1986, 1, 36.

Cowe, I. A., McNicol, J. W., and Clifford Cuthbertson, D., Analyst, 1989, 114, 683.

Dhanoa, M. S., Lister, S. J., Sanderson, R., and Barnes, R. J., J. Near Infrared Spectrosc., 1994, 2, 43.

Dillon, W. R., and Goldstein, M., Multivariate Analysis Methods And Applications, John Wiley & Sons Inc., New York, 1984.

Dreassi, E., Ceramelli, G., Savini, L., Corti, P., Peruccio, P. L., and Lonardi, S., Analyst, 1995a, 120, 319.

Dreassi, E., Ceramelli, G., and Corti, P., Analyst, 1995b, 120, 1005.

Dreassi, E., Ceramelli, G., Corti, P., Massacesi, M., and Perruccio, P. L., Analyst, 1995c, 120, 2361.

Dubois, P., Martinez, J., and Levillain, P., Analyst, 1987, 112, 1675.

Eastment, H. T., and Krzanowski, W. J., Technometrics, 1982, 24, 73.

Esbensen, K. H., Geladi, P., and Grahn, H. P., Chemometrics and Intelligent Laboratory Systems, 1992, 14, 357.

Forbes, R. A., Persinger, M. L., and Smith, D. R., J. Pharm. Biomed. Anal., 1996, 15, 315.

Frake, P., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U. A., Analytical Communications, 1998a, 35, 133.

Frake, P., Gill, I., Luscombe, C. N., Rudd, D. R., Waterhouse, J., and Jayasooriya, U. A., Analyst, 1998b, 123, 2043.

Frei, R. W. and MacNeil, J. D., Diffuse Reflectance Spectroscopy in Environmental Problem-Solving, CRC Press, Cleveland, Ohio, 1973.

Geladi, P., Isaksson, H., Lindqvist, L., Wold, S., and Esbensen, K., Chemometrics and Intelligent Laboratory Systems, 1989a, 5, 209.

Geladi, P., and Esbensen, K., Journal of Chemometrics, 1989b, 3, 419.

Geladi, P., Chemometrics and Intelligent Laboratory Systems, 1992, 14, 375.

Geladi, P., Swerts, J., and Lindgren, F., Chemometrics and Intelligent Laboratory Systems, 1994, 24, 145.

Geladi, P., and Grahn, H., Multivariate Image Analysis, John Wiley & Sons Inc., New York, 1996.

Hailey, P. A., Oakley, A. C. E., Doherty, P., Pettman, A. J., Sharp, D. C. A., and Barnes, D. M. H., NIR News, 1994, 5, 10.

Hailey, P. A., European Pharmaceutical Review, 1996, 1 (2), 45.

Hailey, P. A., Doherty, P., Tapsell, P., Oliver, T., and Aldridge, P. K., J. Pharm. Biomed. Anal., 1996, 14, 551.

Handbook of Pharmaceutical Excipients, ed. Wade, A., and Weller, P. J., American Pharmaceutical Association, Washington, The Pharmaceutical Press, London, 2nd edn., 1994.

Harman, H., Modern Factor Analysis, The University of Chicago Press, Chicago, 3rd edn., 1976.

Ilari, J. L., Martens, H., and Isaksson, T., Applied Spectroscopy, 1988, 42, 722.

The Industrial Pharmacist, 1999, 10, 9.

Jackson, J. E., Journal of Quality Technology, 1980, 12, 201.

Jackson, J. E., Journal of Quality Technology, 1981a, 13, 46.

Jackson, J. E., Journal of Quality Technology, 1981b, 13, 125.

Jackson, J. E., A User’s Guide To Principal Components, John Wiley & Sons Inc., New York, 1991.

Kortum, G., Reflectance Spectroscopy, Principles, Methods, Applications, Springer-Verlag, Berlin, 1969.

Kourti, T., and MacGregor, J. F., Journal of Quality Technology, 1996, 28, 409.

Kresta, J. V., MacGregor, J. F., and Marlin, T. E., The Canadian Journal of Chemical Engineering, 1991, 69, 35.

Lindberg, W., Persson, J. A., and Wold, S., Anal. Chem., 1983, 55, 643.

Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E., Technometrics, 1992, 34, 46.

MacGregor, J. F., Jaeckle, C., Kiparissides, C., and Koutoudi, M., AIChE Journal, 1994, 40, 826.

Mark, H., Principles and Practice of Spectroscopic Calibration, John Wiley & Sons Inc., New York, 1991.

Maesschalck, de, R., Cuesta Sanchez, P., Massart, D. L., Doherty, P., and Hailey, P., Applied Spectroscopy, 1998, 52, 725.

Massart, D. L., Vandeginste, B. G. M., Deming, S. N., Michotte, Y., and Kaufman, L., Chemometrics: a textbook, Elsevier Science Publishers B. V., Amsterdam, 1988.

Montgomery, D. C., Introduction To Statistical Quality Control, John Wiley & Sons Inc., New York, 1997.

Morud, T. E., Journal of Chemometrics, 1996, 10, 669.

Naes, T., and Martens, H., Journal of Chemometrics, 1988, 2, 155.

Neave, H. R., Elementary Statistics Tables, Routledge, London, 1995.

Nomikos, P., and MacGregor, J. F., AIChE Journal, 1994, 40, 1361.

Nomikos, P., and MacGregor, J. F., Technometrics, 1995, 37, 41.

Norris, K. H., and Williams, P. C., Cereal Chemistry, 1984, 61, 158.

Osborne, B. G., Fearn, T., Hindle, P. H., Practical NIR Spectroscopy With Applications in Food and Beverage Analysis, Longman Scientific & Technical, UK, 2nd edn., 1993.

Pearce, S., Manufacturing Chemist, 1986, 57, 77.

Piovoso, M. J., Kosanovich, K. A., and Pearson, R. K., Proc. Amer. Control Conf,

1992, 2359.

Plugge, W., and Vlies, van der, C., J. Pharm. Biomed. Anal., 1993,11, 435.

Plugge, W., and Vlies, van der, C , J. Pharm. Biomed Anal., 1996,14, 891.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., Numerical

Recipes in C, The Art o f Scientific Computing, Cambridge University Press,

Cambridge, 2nd edn., 1992.

Sekulic, S. S., Wakeman, J., Doherty, P., and Hailey, P. A., J. Pharm. Biomed. A nal,

1998,17,1285.

Simmons, A., International LABMATE, 1993,17, 23.

Skagerberg, B., MacGregor, J. P., and Kiparissides, C., Chemometrics and Intelligent

Laboratory Systems, 1992,14, 341.

Stark, E., Luchter, K., and Margoshes, M., Applied Spectroscopy Reviews, 1986, 22,

335.

Statistical Process Control, ed. Mamzic, C. L., Instrument Society of America, NC,

USA, 1995.

Vlies, van der, C., Plugge, W., and Kaffka, K. J., Spectroscopy, 1995, 10, 46.

Vlies, van der, C., European Pharmaceutical Review, 1996,1(1), 49.

Wangen, L. E., and Kowalski, B. R., Journal o f Chemometrics, 1988, 3, 3.

Wargo, D. J., and Drennen, J. K., J. Pharm. Biomed. Anal., 1996, 14, 1415.

Washington, C., Particle Size Analysis in Pharmaceutics and Other Industries, Ellis

Horwood, New York, 1992.

Westerhuis, J. A., and Coenegracht, P. M. J., Journal of Chemometrics, 1997, 11, 379.

Whitfield, G., Pharmaceutical Manufacturing, 1986, 3, 31.

Wise, B. M., Veltkamp, D. J., Ricker, N. L., Kowalski, B. R., Barnes, S. M., and

Arakali, V., Waste Management Proc., Tucson, 1991, 169.

Wold, S., Technometrics, 1978, 20, 397.

Wold, S., Geladi, P., Esbensen, K., and Ohman, J., Journal of Chemometrics, 1987, 1,

41.

APPENDIX A

List of Publications

O'Neil, A. J., Jee, R. D., Watt, R. A., and Moffat, A. C., J. Pharm. Pharmacol., 1997,
49 (Supplement): 19.

O'Neil, A. J., Jee, R. D. and Moffat, A. C., J. Pharm. Pharmacol., 1998a, 50
(Supplement): 45.

O'Neil, A. J., Jee, R. D. and Moffat, A. C., Analyst, 1998b, 123, 2297-2302.

O'Neil, A. J., Jee, R. D. and Moffat, A. C., Analyst, 1999a, 124, 33-36.

O'Neil, A. J., Jee, R. D. and Moffat, A. C., J. Pharm. Pharmacol., 1999b, 51
(Supplement): 47.

APPENDIX B

Tables For PCA And Multiway PCA Data Sets

Table B1. PCA results: blends.
Data Set PCs PRESS SS %SS R χ² d.f.
Raw data 0 - 3.4034e-004 0 0 0 -
1 2.3765e-005 2.3300e-005 93.1539 0.0698 1.2769e+004 77
2 2.0786e-006 2.0070e-006 99.4103 0.0892 1.3567e+004 65
3 6.2922e-007 5.9890e-007 99.8240 0.3135 1.3688e+004 54
4 3.5143e-007 3.2341e-007 99.9050 0.5868 1.3260e+004 44
5 1.0715e-007 9.8904e-008 99.9709 0.3313 1.2497e+004 35
6 6.4774e-008 5.8307e-008 99.9829 0.6549 1.1670e+004 27
7 4.2406e-008 3.7018e-008 99.9891 0.7273 1.0487e+004 20
8 2.6310e-008 2.0742e-008 99.9939 0.7107 9.1472e+003 14
9* 2.1502e-008 1.3155e-008 99.9961 1.0366 7.7176e+003 9

SNV 0 6.7573e-004 0 0 0
1 8.5795e-005 8.3710e-005 87.6119 0.1270 6.6698e+003 65
2 3.5162e-005 3.3626e-005 95.0238 0.4200 7.4692e+003 54
3 1.2058e-005 1.1431e-005 98.3083 0.3586 7.6622e+003 44
4 5.4395e-006 5.0691e-006 99.2498 0.4758 7.6271e+003 35
5 3.6739e-006 3.2611e-006 99.5174 0.7248 7.3151e+003 27
6 2.0019e-006 1.7481e-006 99.7413 0.6139 6.7325e+003 20
7 1.7173e-006 1.0332e-006 99.8471 0.9824 6.0815e+003 14

Detrend 0 3.6857e-006 0 0
1 7.2327e-007 7.0743e-007 82.2622 0.1962 9.1077e+003 119
2 3.8726e-007 3.6801e-007 90.3347 0.5474 1.0103e+004 104
3 1.2486e-007 1.1867e-007 97.0801 0.3393 1.0538e+004 90
4 7.4748e-008 6.9656e-008 98.4290 0.6299 1.0736e+004 77
5 4.4516e-008 4.0115e-008 99.2071 0.6391 1.0588e+004 65
6 2.7663e-008 2.2799e-008 99.6229 0.6896 1.0316e+004 54
7 2.3569e-008 1.5094e-008 99.7491 1.0338 9.9154e+003 44

SNV Detrend 0 2.9907e-004 0 0 0
1 5.1065e-005 4.9855e-005 83.3297 0.1707 6.5708e+003 77
2 2.5235e-005 2.3939e-005 91.9955 0.5062 7.4406e+003 65
3 8.3422e-006 7.8986e-006 97.3589 0.3485 7.7392e+003 54
4 4.2461e-006 3.9428e-006 98.6816 0.5376 7.8302e+003 44
5 2.4987e-006 2.2542e-006 99.2463 0.6337 7.6138e+003 35
6 2.0691e-006 1.4533e-006 99.5141 0.9179 7.2036e+003 27
7 1.6120e-006 9.3497e-007 99.6874 1.1092 6.6206e+003 20

Savitzky-Golay 2nd derivative 0 3.6738e-011 0 0 0 0
1 1.7291e-011 1.6920e-011 53.9447 0.4706 1.3112e+003 77
2 1.2038e-011 1.1599e-011 68.4283 0.7115 2.4743e+003 65
3 1.0619e-011 9.0654e-012 75.3244 0.9155 3.0039e+003 54
4 8.1489e-012 6.6312e-012 81.9503 0.8989 3.3147e+003 44
5 6.4114e-012 4.3682e-012 88.1101 0.9669 3.5459e+003 35
6 5.1445e-012 3.1088e-012 91.5380 1.1777 .6651e+003 27
7 3.4186e-012 2.1806e-012 94.0646 1.0996 3.6170e+003 20
8 PCs extracted after residual analysis, based on visual inspection of loadings and %SS
(= 99.993).
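
The %SS and R columns of Table B1 appear to follow directly from the SS and PRESS columns: for the raw-data rows, %SS agrees with 100(1 - SS_k/SS_0) and R agrees with PRESS_k/SS_(k-1). The Python sketch below reproduces the first three raw-data rows under that reading. The function name `pca_summary` is hypothetical, and the interpretation of R as a cross-validation ratio is inferred from the tabulated numbers rather than quoted from this appendix.

```python
def pca_summary(ss0, press, ss):
    """Return (%SS, R) lists for successive principal components.

    ss0   : sum of squares of the data before any PC is extracted
    press : PRESS after extracting 1, 2, ... PCs
    ss    : residual sum of squares after extracting 1, 2, ... PCs
    Assumed relations (inferred from Table B1, not stated in the source):
        %SS_k = 100 * (1 - SS_k / SS_0)
        R_k   = PRESS_k / SS_(k-1)
    """
    pct_ss = [100.0 * (1.0 - s / ss0) for s in ss]
    previous_ss = [ss0] + list(ss[:-1])
    r = [p / q for p, q in zip(press, previous_ss)]
    return pct_ss, r


# Raw-data rows for PCs 1 to 3, transcribed from Table B1.
ss0 = 3.4034e-4
press = [2.3765e-5, 2.0786e-6, 6.2922e-7]
ss = [2.3300e-5, 2.0070e-6, 5.9890e-7]

pct_ss, r = pca_summary(ss0, press, ss)
for k, (p, ratio) in enumerate(zip(pct_ss, r), start=1):
    print(f"PC{k}: %SS = {p:.4f}, R = {ratio:.4f}")
```

Under this reading, a component whose R value is at or above 1 predicts left-out data no better than the model without it.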
Table B2. PCA results: lower strength tablet RCA.
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 1.5141e-004 0 0 -

1 9.4706e-006 7.9859e-006 94.73 0.06 1261.32 44


2 1.6827e-006 1.3494e-006 99.11 0.21 1362.85 35
3 3.9786e-007 2.8244e-007 99.81 0.29 1361.27 27
4 1.1642e-007 7.5907e-008 99.95 0.41 1301.11 20
5 6.9585e-008 3.9056e-008 99.97 0.92 1188.24 14
6 3.1565e-008 1.7063e-008 99.99 0.81 1019.52 9
7 1.5746e-008 8.1669e-009 99.99 0.92 828.48 5
8 8.8043e-009 3.8448e-009 100.00 1.08 601.34 2

SNV 0 1.0928e-004 0 0
1 4.3295e-005 3.8458e-005 64.81 0.40 663.75 44
2 9.6814e-006 8.3578e-006 92.35 0.25 908.64 35
3 4.2721e-006 3.4629e-006 96.83 0.51 996.20 27
4 1.8986e-006 1.4246e-006 98.70 0.55 992.58 20
5 1.2809e-006 8.7922e-007 99.20 0.90 941.74 14
6 9.0963e-007 5.8307e-007 99.47 1.03 827.79 9

Detrend 0 2.2620e-006 0 0
1 9.2012e-007 8.0456e-007 64.43 0.41 491.17 35
2 2.1666e-007 1.8132e-007 91.98 0.27 700.40 27
3 9.6564e-008 7.8296e-008 96.54 0.53 774.70 20
4 3.7717e-008 2.7805e-008 98.77 0.48 765.73 14
5 2.9147e-008 1.8019e-008 99.20 1.05 717.45 9
6 1.6825e-008 1.0247e-008 99.55 0.93 596.79 5
7 1.0453e-008 5.9254e-009 99.74 1.02 454.07 2

SNV detrend 0 9.2491e-005 0 0
1 2.6515e-005 2.3281e-005 74.83 0.29 745.16 44
2 5.1537e-006 4.4064e-006 95.24 0.22 927.63 35
3 2.6891e-006 1.9790e-006 97.86 0.61 987.53 27
4 1.2890e-006 9.3724e-007 98.99 0.65 963.08 20
5 8.8885e-007 5.8454e-007 99.37 0.95 899.96 14
6 5.4824e-007 3.3678e-007 99.64 0.94 790.16 9
7 3.6598e-007 2.1064e-007 99.77 1.09 658.20 5

Savitzky-Golay 2nd derivative 0 - 6.1338e-011 0 0
1 3.0265e-011 2.6700e-011 56.47 0.49 178.00 35
2 1.6247e-011 1.3391e-011 78.17 0.61 379.04 27
3 9.4102e-012 7.2495e-012 88.18 0.70 466.90 20
4 6.2523e-012 4.4111e-012 92.81 0.86 496.67 14
5 4.2918e-012 2.8648e-012 95.33 0.97 481.88 9
6 3.2307e-012 2.0362e-012 96.68 1.13 430.92 5

Table B3. PCA results: higher strength tablet RCA.
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 2.8673e-005 0 0 0 0
1 5.5138e-006 4.9724e-006 82.66 0.1923 1163.52 44
2 1.6202e-006 1.3475e-006 95.30 0.3258 1330.37 35
3 4.7295e-007 3.6299e-007 98.73 0.3510 1378.80 27
4 8.4391e-008 6.3968e-008 99.78 0.2325 1362.22 20
5 5.5398e-008 3.7337e-008 99.87 0.8660 1289.96 14
6 2.6475e-008 1.7377e-008 99.94 0.7091 1115.41 9
7 1.8597e-008 1.1116e-008 99.96 1.0702 919.66 5

SNV 0 0 1.1126e-004 0 0 0 0
1 1.1126e-004 3.1988e-005 71.25 0.34 553.93 35
2 3.7539e-005 8.2639e-006 92.57 0.31 723.56 27
3 9.9305e-006 2.9176e-006 97.38 0.45 783.28 20
4 3.7155e-006 1.0414e-006 99.06 0.48 775.59 14
5 1.4108e-006 5.9575e-007 99.46 0.84 721.16 9
6 8.7558e-007 3.6048e-007 99.68 0.97 607.10 5
7 5.7592e-007 2.7077e-007 99.76 1.35 458.54 2

Detrend 0 1.6017e-006 0 0 0 0
1 1.6017e-006 4.9061e-007 69.37 0.34 539.02 35
2 5.4476e-007 1.4136e-007 91.17 0.34 731.19 27
3 1.6899e-007 7.4979e-008 95.32 0.81 797.76 20
4 1.1408e-007 1.9035e-008 98.81 0.33 787.93 14
5 2.4661e-008 1.2179e-008 99.24 0.92 755.82 9
6 1.7555e-008 7.6559e-009 99.52 0.97 631.49 5
7 1.1757e-008 5.6942e-009 99.64 1.31 477.33 2

SNV detrend 0 0 1.0151e-004 0 0 0 0
1 1.0151e-004 2.6390e-005 74.00 0.30 655.26 35
2 3.0373e-005 5.6355e-006 94.45 0.25 834.69 27
3 6.7015e-006 1.5607e-006 98.46 0.34 895.12 20
4 1.9060e-006 9.8005e-007 99.03 0.85 877.35 14
5 1.3251e-006 5.6182e-007 99.45 0.82 777.21 9
6 8.0055e-007 3.5296e-007 99.65 0.98 654.44 5
7 5.4903e-007 2.5956e-007 99.74 1.26 492.12 2

Savitzky-Golay 2nd derivative 0 0 4.0630e-011 0 0 0 0
1 4.0630e-011 2.2071e-011 45.68 0.62 60.96 35
2 2.5278e-011 1.0814e-011 73.38 0.58 301.61 27
3 1.2858e-011 6.5609e-012 83.85 0.77 404.08 20
4 8.3584e-012 3.7225e-012 90.84 0.76 442.94 14
5 4.9766e-012 2.5260e-012 93.78 0.98 445.39 9
6 3.6538e-012 1.8671e-012 95.40 1.15 401.31 5

Table B4. PCA results: lower strength tablet Intact.
Data Set PCs PRESS SS %SS R χ² d.f.
Transmission 0 - 1.9309e-002 0 0 -
1 1.6216e-003 1.3995e-003 92.75 0.08 2513.19 54
2 1.1444e-004 9.0285e-005 99.53 0.08 2668.62 44
3 1.1508e-005 9.1151e-006 99.95 0.13 2688.09 35
4 2.6575e-006 1.9877e-006 99.99 0.29 2589.91 27
5 1.3875e-006 9.6032e-007 100.00 0.70 2388.58 20
6 7.8370e-007 4.8750e-007 100.00 0.82 2103.07 14
7 3.7168e-007 2.2196e-007 100.00 0.76 1778.46 9
8 1.8946e-007 1.0396e-007 100.00 0.85 1420.67 5
9 1.1290e-007 5.7926e-008 100.00 1.09 1016.83 2

SNV 0 1.3275e-003 0 0
1 1.0250e-004 8.7701e-005 93.39 0.08 1369.42 44
2 2.1154e-005 1.6823e-005 98.73 0.24 1502.04 35
3 5.9883e-006 4.7138e-006 99.64 0.36 1508.00 27
4 2.7927e-006 2.0684e-006 99.84 0.59 1441.00 20
5 1.3794e-006 9.1907e-007 99.93 0.67 1311.17 14
6 7.6756e-007 4.6957e-007 99.96 0.84 1142.90 9
7 4.4474e-007 2.4801e-007 99.98 0.95 930.50 5
8 3.0003e-007 1.5154e-007 99.99 1.21 679.33 2

Detrend 0 1.2686e-004 0 0
1 3.1805e-005 2.8447e-005 77.58 0.25 938.43 44
2 9.8271e-006 7.8311e-006 93.83 0.35 1106.23 35
3 3.8546e-006 2.6945e-006 97.88 0.49 1162.27 27
4 6.6619e-007 5.0676e-007 99.60 0.25 1154.72 20
5 4.1237e-007 2.5811e-007 99.80 0.81 1106.38 14
6 1.7928e-007 1.1244e-007 99.91 0.69 972.24 9
7 9.7649e-008 5.3988e-008 99.96 0.87 809.59 5
8 5.6415e-008 2.8888e-008 99.98 1.04 601.79 2

SNV detrend 0 2.1205e-004 0 0


1 6.3504e-005 5.3239e-005 74.89 0.30 833.47 44
2 2.1988e-005 1.6236e-005 92.34 0.41 992.37 35
3 2.9974e-006 2.4629e-006 98.84 0.18 1052.55 27
4 1.9451e-006 1.3016e-006 99.39 0.79 1058.54 20
5 6.6538e-007 4.3415e-007 99.80 0.51 976.80 14
6 4.7093e-007 2.5773e-007 99.88 1.08 878.07 9
7 2.1661e-007 1.1992e-007 99.94 0.84 718.38 5
8 1.1890e-007 6.0651e-008 99.97 0.99 539.25 2

Savitzky-Golay 2nd derivative 0 6.3958e-009 0 0
1 3.3879e-009 2.8325e-009 55.71 0.53 161.72 27
2 1.2977e-009 1.0147e-009 84.14 0.46 354.11 20
3 5.8567e-010 4.5514e-010 92.88 0.58 430.16 14
4 4.1493e-010 2.8737e-010 95.51 0.91 439.20 9
5 2.6904e-010 1.7286e-010 97.30 0.94 392.77 5
6 1.8724e-010 1.1325e-010 98.23 1.08 318.36 2

Table B5. PCA results: higher strength tablet Intact.
Data Set PCs PRESS SS %SS R χ² d.f.
Transmission 0 - 2.6399e-002 0 0 -
1 1.2531e-003 1.1360e-003 95.70 0.05 1581.86 35


2 7.9033e-005 6.6845e-005 99.75 0.07 1700.81 27
3 1.6135e-005 1.2157e-005 99.95 0.24 1689.27 20
4 5.7359e-006 3.7653e-006 99.99 0.47 1564.20 14
5 2.5204e-006 1.5297e-006 99.99 0.67 1364.36 9
6 1.3679e-006 6.6336e-007 100.00 0.89 1107.78 5
7 7.8812e-007 3.9265e-007 100.00 1.19 804.84 2

SNV 0 8.0087e-003 0 0
1 5.1457e-004 3.9485e-004 95.07 0.06 985.63 35
2 3.7113e-004 1.2975e-004 98.38 0.94 1099.51 27
3 1.7685e-004 5.0638e-005 99.37 1.36 1083.37 20
4 1.1255e-004 2.3430e-005 99.71 2.22 1017.20 14
5 2.8991e-005 8.0100e-006 99.90 1.24 906.17 9

Detrend 0 5.6523e-005 0 0
1 2.3931e-005 2.0798e-005 63.20 0.42 398.75 35
2 1.3644e-005 1.0400e-005 81.60 0.66 566.83 27
3 3.3333e-006 2.5707e-006 95.45 0.32 648.10 20
4 1.6904e-006 1.1839e-006 97.91 0.66 679.60 14
5 7.4021e-007 4.9321e-007 99.13 0.63 639.88 9
6 4.0631e-007 2.5389e-007 99.55 0.82 565.20 5
7 3.3351e-007 1.8709e-007 99.67 1.31 441.01 2

SNV detrend 0 1.8997e-003 0 0


1 3.4320e-004 2.3451e-004 87.66 0.18 761.39 35
2 2.2666e-004 6.3937e-005 96.63 0.97 883.39 27
3 4.3565e-005 1.6059e-005 99.15 0.68 904.81 20
4 3.0980e-005 8.2879e-006 99.56 1.93 873.52 14

Savitzky-Golay 2nd derivative 0 2.5273e-007 0 0
1 8.6205e-009 6.4296e-009 97.46 0.03 507.65 14
2 6.0060e-009 4.0871e-009 98.38 0.93 601.63 9
3 4.3278e-009 2.7883e-009 98.90 1.06 512.83 5

Table B6. Multiway PCA results: lower strength blend (RCA) and tablet (RCA).
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 2.2114e-004 0 0 -

1 5.1864e-005 4.5513e-005 79.42 0.23 1258.04 54


2 4.5078e-006 3.8245e-006 98.27 0.10 1444.08 44
3 2.6324e-006 1.9818e-006 99.10 0.69 1500.09 35
4 1.1562e-006 8.3717e-007 99.62 0.58 1437.89 27
5 5.9752e-007 4.0400e-007 99.82 0.71 1352.31 20
6 3.3556e-007 2.0858e-007 99.91 0.83 1225.38 14
7 1.9792e-007 1.1063e-007 99.95 0.95 1062.42 9

SNV 0 2.0617e-004 0 0
1 4.5335e-005 3.9636e-005 80.78 0.22 388.45 35
2 3.3000e-005 2.3310e-005 88.69 0.83 498.52 27
3 1.4935e-005 1.0887e-005 94.72 0.64 524.00 20
4 8.2068e-006 5.4248e-006 97.37 0.75 526.59 14
5 4.6967e-006 2.8593e-006 98.61 0.87 495.05 9

Detrend 0 1.9936e-006 0 0
1 8.2742e-007 7.2626e-007 63.57 0.42 226.74 35
2 5.1091e-007 3.9394e-007 80.24 0.70 384.64 27
3 2.0096e-007 1.5679e-007 92.14 0.51 457.02 20
4 1.6188e-007 1.0628e-007 94.67 1.03 483.95 14

SNV detrend 0 1.1895e-004 0 0


1 3.5454e-005 3.1488e-005 73.53 0.30 359.00 35
2 2.1297e-005 1.7173e-005 85.56 0.68 505.71 27
3 1.0739e-005 8.0632e-006 93.22 0.63 556.97 20
4 5.8931e-006 4.2767e-006 96.40 0.73 570.37 14
5 4.0154e-006 2.7064e-006 97.72 0.94 539.62 9
6 3.0641e-006 1.9023e-006 98.40 1.13 466.38 5

Savitzky-Golay 2nd derivative 0 - 3.8694e-011 0 0 - -
1 2.1007e-011 1.8852e-011 51.28 0.54 46.82 54
2 1.6016e-011 1.3415e-011 65.33 0.85 253.17 44
3 1.4216e-011 1.0387e-011 73.16 1.06 348.46 35

Table B7. Multiway PCA results: higher strength blend (RCA) and tablet (RCA).
Data Set PCs PRESS SS %SS R χ² d.f.
Absorbance 0 - 1.8636e-004 0 0 -
1 1.7958e-005 1.5929e-005 91.45 0.10 769.35 35


2 6.6317e-006 5.4135e-006 97.10 0.42 889.52 27
3 3.7125e-006 2.7465e-006 98.53 0.69 893.48 20
4 1.6192e-006 1.1086e-006 99.41 0.59 841.54 14
5 1.0191e-006 6.2059e-007 99.67 0.92 763.99 9
6 4.7642e-007 2.8372e-007 99.85 0.77 634.94 5
7 2.4213e-007 1.4028e-007 99.92 0.85 480.84 2
8 1.4030e-007 7.4985e-008 99.96 1.00 278.46 0

SNV 0 2.2307e-004 0 0
1 5.8589e-005 5.0881e-005 77.19 0.26 340.44 35
2 3.5179e-005 2.7080e-005 87.86 0.69 461.98 27
3 2.2223e-005 1.5434e-005 93.08 0.82 496.35 20
4 1.1666e-005 7.6827e-006 96.56 0.76 496.53 14
5 6.8397e-006 4.0711e-006 98.17 0.89 473.09 9
6 3.6997e-006 2.1260e-006 99.05 0.91 416.39 5

Detrend 0 1.8364e-006 0 0
1 8.1432e-007 7.2469e-007 60.54 0.44 168.12 35
2 5.0930e-007 4.1116e-007 77.61 0.70 339.23 27
3 3.9002e-007 2.6049e-007 85.82 0.95 409.77 20
4 1.9347e-007 1.3238e-007 92.79 0.74 436.65 14
5 1.1370e-007 7.3184e-008 96.01 0.86 438.48 9
6 7.0525e-008 4.2038e-008 97.71 0.96 399.16 5

SNV detrend 0 1.2769e-004 0 0


1 4.7403e-005 4.2703e-005 66.56 0.37 195.17 27
2 2.8092e-005 2.1595e-005 83.09 0.66 359.05 20
3 1.3839e-005 1.0741e-005 91.59 0.64 420.50 14
4 9.8221e-006 6.9032e-006 94.59 0.91 430.41 9
5 5.5516e-006 3.8135e-006 97.01 0.80 386.57 5
6 3.8402e-006 2.3709e-006 98.14 1.01 317.99 2

Savitzky-Golay 2nd derivative 0 - 3.0210e-011 0 0
1 2.0085e-011 1.7638e-011 41.62 0.66 -67.51 14
2 1.7120e-011 1.2664e-011 58.08 0.97 62.89 9
3 1.1656e-011 8.5891e-012 71.57 0.92 117.81 5
4 8.6818e-012 6.0954e-012 79.82 1.01 133.42 2

Table B8. Multiway PCA results: lower strength blend (RCA) and tablet (Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 6.0221e-003 0 0 -

1 5.7318e-004 5.0057e-004 91.69 0.10 1435.57 44


2 1.6859e-004 1.4586e-004 97.58 0.34 1570.16 35
3 3.6290e-005 2.7431e-005 99.54 0.25 1582.98 27
4 5.9125e-006 4.4696e-006 99.93 0.22 1544.85 20
5 3.6375e-006 2.3943e-006 99.96 0.81 1438.90 14
6 1.4189e-006 9.3101e-007 99.98 0.59 1236.52 9
7 9.0220e-007 5.3771e-007 99.99 0.97 1011.82 5

SNV 0 5.3543e-004 0 0
1 2.2612e-004 2.0357e-004 61.98 0.42 440.32 35
2 4.8841e-005 4.0050e-005 92.52 0.24 655.60 27
3 2.9911e-005 2.1992e-005 95.89 0.75 728.76 20
4 1.4351e-005 9.8170e-006 98.17 0.65 709.27 14
5 7.1781e-006 4.9250e-006 99.08 0.73 663.90 9
6 4.7181e-006 3.0184e-006 99.44 0.96 574.13 5
7 3.7015e-006 2.1273e-006 99.60 1.23 436.52 2

Detrend 0 3.8312e-005 0 0
1 8.4304e-006 7.4780e-006 80.48 0.22 403.20 27
2 4.0596e-006 3.2836e-006 91.43 0.54 530.47 20
3 2.0885e-006 1.5127e-006 96.05 0.64 559.43 14
4 9.4316e-007 6.3848e-007 98.33 0.62 544.10 9
5 3.6735e-007 2.6052e-007 99.32 0.58 492.15 5
6 2.7006e-007 1.7042e-007 99.56 1.04 397.74 2

SNV detrend 0 1.6356e-004 0 0


1 7.2459e-005 6.4047e-005 60.84 0.44 278.89 35
2 2.8052e-005 2.2855e-005 86.03 0.44 480.13 27
3 1.5287e-005 1.1520e-005 92.96 0.67 558.10 20
4 8.8599e-006 6.3751e-006 96.10 0.77 568.22 14
5 5.7952e-006 3.7922e-006 97.68 0.91 536.69 9
6 3.4829e-006 2.2624e-006 98.62 0.92 468.35 5
7 2.4475e-006 1.4597e-006 99.11 1.08 367.02 2

Savitzky-Golay 2nd derivative 0 - 1.9480e-009 0 0
1 9.5597e-010 7.7635e-010 60.15 0.49 104.79 20
2 4.2379e-010 3.2825e-010 83.15 0.55 238.79 14
3 1.8666e-010 1.4089e-010 92.77 0.57 290.09 9
4 1.2009e-010 8.6205e-011 95.57 0.85 292.33 5
5 8.5508e-011 5.6108e-011 97.12 0.99 243.16 2

Multiway PCA model: Blend absorbance data and tablet transmission data.

Table B9. Multiway PCA results: higher strength blend (RCA) and tablet (Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 5.3808e-003 0 0 -
1 5.0828e-004 4.3756e-004 91.87 0.09 1018.76 35
2 2.1086e-004 1.5104e-004 97.19 0.48 1131.24 27
3 1.9931e-005 1.4898e-005 99.72 0.13 1135.22 20
4 8.3505e-006 5.7367e-006 99.89 0.56 1103.47 14
5 5.1774e-006 2.9962e-006 99.94 0.90 975.74 9
6 1.7527e-006 1.0858e-006 99.98 0.58 799.80 5
7 1.2705e-006 6.8042e-007 99.99 1.17 600.70 2

SNV 0 2.0576e-003 0 0
1 4.0494e-004 3.3618e-004 83.66 0.20 430.34 27
2 1.4897e-004 1.0745e-004 94.78 0.44 549.78 20
3 1.0474e-004 4.6409e-005 97.74 0.97 573.35 14

Detrend 0 1.8813e-005 0 0
1 1.4680e-005 9.7381e-006 48.24 0.78 66.68 27
2 6.0791e-006 4.2724e-006 77.29 0.62 277.56 20
3 3.4738e-006 1.9104e-006 89.85 0.81 369.23 14
4 1.9776e-006 1.0109e-006 94.63 1.04 396.66 9

SNV detrend 0 6.1921e-004 0 0
1 2.0730e-004 1.7706e-004 71.40 0.33 325.11 27
2 8.1592e-005 6.2649e-005 89.88 0.46 490.92 20
3 4.5951e-005 2.8093e-005 95.46 0.73 546.41 14
4 3.2545e-005 1.8449e-005 97.02 1.16 535.77 9

Savitzky-Golay 2nd derivative 0 - 7.0949e-008 0 0 - -
1 2.4743e-009 1.8460e-009 97.40 0.03 486.50 14
2 1.7267e-009 1.1667e-009 98.36 0.94 577.98 9
3 1.3286e-009 8.0216e-010 98.87 1.14 492.77 5

Multiway PCA model: blend absorbance data and tablet transmission data

Table B10. Multiway PCA results: lower strength blend (RCA) and tablet (RCA and Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 3.5069e-003 0 0 -
1 3.6294e-004 3.1752e-004 90.95 0.10 1523.90 54
2 1.0694e-004 9.2927e-005 97.35 0.34 1670.71 44
3 2.8177e-005 2.1658e-005 99.38 0.30 1692.88 35
4 1.0961e-005 7.9323e-006 99.77 0.51 1660.65 27
5 4.4789e-006 3.1368e-006 99.91 0.56 1559.49 20
6 3.1193e-006 1.8856e-006 99.95 0.99 1411.72 14
7 1.6664e-006 9.9394e-007 99.97 0.88 1205.20 9
8 1.0195e-006 5.8816e-007 99.98 1.03 976.16 5

SNV 0 3.6179e-004 0 0
1 1.7427e-004 1.5482e-004 57.21 0.48 235.04 35
2 5.8638e-005 4.7749e-005 86.80 0.38 442.69 27
3 3.7213e-005 2.8030e-005 92.25 0.78 518.30 20
4 2.4744e-005 1.6579e-005 95.42 0.88 519.81 14
5 1.4978e-005 9.4908e-006 97.38 0.90 491.02 9
6 8.5909e-006 5.0959e-006 98.59 0.91 434.33 5
7 5.5894e-006 3.1719e-006 99.12 1.10 347.67 2

Detrend 0 2.3182e-005 0 0
1 5.5527e-006 4.8936e-006 78.89 0.24 421.93 35
2 2.9994e-006 2.3925e-006 89.68 0.61 554.92 27
3 1.7340e-006 1.2448e-006 94.63 0.72 591.49 20
4 8.7454e-007 6.1198e-007 97.36 0.70 587.92 14
5 5.5607e-007 3.6856e-007 98.41 0.91 552.93 9
6 3.4827e-007 2.1550e-007 99.07 0.94 475.72 5
7 2.0691e-007 1.2032e-007 99.48 0.96 370.55 2

SNV detrend 0 1.2983e-004 0 0
1 6.3945e-005 5.6604e-005 56.40 0.49 144.76 35
2 3.6285e-005 2.9715e-005 77.11 0.64 332.09 27
3 2.3326e-005 1.7353e-005 86.63 0.78 413.26 20
4 1.4153e-005 9.6101e-006 92.60 0.82 443.07 14
5 8.4503e-006 5.4496e-006 95.80 0.88 439.31 9
6 5.6561e-006 3.5124e-006 97.29 1.04 399.64 5

Savitzky-Golay 2nd derivative 0 1.1640e-009 0 0
1 6.1913e-010 5.2262e-010 55.10 0.53 334.59 54
2 2.6854e-010 2.1528e-010 81.51 0.51 592.01 44
3 1.3474e-010 1.0650e-010 90.85 0.63 701.48 35
4 9.8797e-011 7.4375e-011 93.61 0.93 737.55 27
5 8.1261e-011 5.5706e-011 95.21 1.09 716.60 20

Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B11. Multiway PCA results: higher strength blend (RCA) and tablet (RCA and Intact).
Data Set PCs PRESS SS %SS R χ² d.f.
Raw* 0 - 5.1944e-003 0 0 -
1 3.0869e-004 2.7908e-004 94.63 0.06 1677.58 54
2 1.2682e-004 1.0546e-004 97.97 0.45 1820.11 44
3 2.2252e-005 1.6983e-005 99.67 0.21 1816.08 35
4 1.1619e-005 8.3142e-006 99.84 0.68 1774.34 27
5 7.8440e-006 5.1694e-006 99.90 0.94 1633.56 20
6 5.8631e-006 3.4698e-006 99.93 1.13 1445.42 14

SNV 0 1.4184e-003 0 0
1 2.6442e-004 2.3269e-004 83.60 0.19 822.42 54
2 1.1613e-004 9.4948e-005 93.31 0.50 992.02 44
3 1.0342e-004 6.3332e-005 95.54 1.09 1034.64 35

Detrend 0 1.1324e-005 0 0
1 9.2578e-006 6.2702e-006 44.63 0.82 68.84 35
2 4.2032e-006 2.9978e-006 73.53 0.67 322.69 27
3 2.3736e-006 1.4636e-006 87.07 0.79 437.81 20
4 1.7556e-006 8.9420e-007 92.10 1.20 481.62 14
5 9.1283e-007 4.9293e-007 95.65 1.02 471.69 9

SNV detrend 0 3.9001e-004 0 0
1 1.5753e-004 1.3564e-004 65.22 0.40 290.04 35
2 7.1520e-005 5.7048e-005 85.37 0.53 481.10 27
3 5.1308e-005 3.2879e-005 91.57 0.90 553.15 20
4 3.7524e-005 1.9292e-005 95.05 1.14 560.83 14

Savitzky-Golay 2nd derivative 0 4.1191e-008 0 0
1 1.3994e-009 1.1078e-009 97.31 0.03 1296.77 54
2 9.8406e-010 7.2603e-010 98.24 0.89 1423.54 44
3 7.8394e-010 5.1952e-010 98.74 1.08 1367.18 35

Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B12. Blend PCA Q statistics.
Data Set Q99 Batch Q > Q99
29 9.9989e-004
50 3.8650e-005
76 2.8328e-005
125 4.1612e-005
126 2.7180e-005
141 3.6368e-005
160 2.8360e-005
162 4.0577e-005
167 4.3216e-005
168 5.4408e-005
169 6.9286e-005
171 3.4115e-005
173 4.1010e-005
174 2.9097e-005
180 4.0547e-005
193 3.0612e-005

1.4924e-003 29 8.6033e-002
50 2.6435e-003
125 2.4948e-003
141 2.7244e-003
162 2.5285e-003
167 3.1581e-003
168 3.5827e-003
169 4.8918e-003
171 2.4006e-003
173 2.3999e-003
180 2.6290e-003
193 2.2485e-003

29 9.8422e-004
91 2.0972e-005
125 3.9374e-005
160 3.1835e-005
162 3.9545e-005
163 2.3036e-005
167 2.8634e-005
168 3.5780e-005
169 4.6057e-005
170 2.1468e-005
171 4.6062e-005
173 4.0453e-005
174 2.8311e-005
180 2.7550e-005
193 3.0926e-005

SNV Detrend 7.8427e-004 29 7.7650e-002
50 1.5451e-003
64 1.6558e-003
67 1.5464e-003
89 1.1422e-003
125 1.9475e-003
126 1.6499e-003
137 1.2814e-003
157 1.3927e-003
161 1.4698e-003
167 1.9590e-003
168 2.0532e-003
169 3.3514e-003
171 1.4774e-003
173 1.6096e-003
180 2.0019e-003
181 1.0975e-003
192 1.6587e-003
193 2.3615e-003

Savitzky-Golay 2nd derivative 1.4089e-009 1 1.9434e-009
17 1.7337e-009
20 1.7475e-009
21 2.8888e-009
22 3.1005e-009
23 2.9063e-009
28 1.9448e-009
29 8.2051e-008
38 2.0089e-009
55 2.4930e-009
56 2.5727e-009
62 2.2296e-009
64 2.0961e-009
67 2.1755e-009
68 1.9309e-009
138 2.0071e-009
167 2.2138e-009
168 2.2561e-009
169 3.1252e-009
180 2.2406e-009
181 1.7576e-009
192 2.0958e-009
193 3.8833e-009

Table B13. Q statistics for lower strength Tablet (RCA) PCA.
Data Set Q99 Q95 Batch Q > Q99
Absorbance 4.9744e-006 4.1779e-006 3 2.0823e-005
7 2.9661e-005
14 1.3612e-005
18 1.6689e-005
21 2.6919e-005
22 1.8446e-005
27 9.9993e-006
30 2.5717e-005
31 2.6398e-005
32 2.4187e-005
33 9.1554e-006
38 3.1898e-005
39 5.5400e-005
40 5.7075e-005

SNV 1.1333e-003 8.5092e-004 7 2.2698e-003


42 1.2746e-003

Detrend 9.8981e-006 7.7477e-006 7 2.5652e-005
9 1.3786e-005
11 1.7951e-005
21 2.2040e-005
31 2.0820e-005

SNV detrend 3.3199e-004 2.3178e-004 2 7.5289e-004


7 6.8202e-004
21 8.1532e-004
31 4.1489e-004
38 1.5914e-003
39 2.2810e-003
40 2.8171e-003

Savitzky-Golay 2nd derivative 2.0109e-009 1.6990e-009 11 2.8535e-009
13 3.4055e-009
20 2.4806e-009
21 3.9365e-009
41 2.7154e-009

Table B14. Q statistics for higher strength Tablet (RCA) PCA.


Data Set Q99 Q95 Batch Q > Q99
Absorbance 1.9648e-005 1.5142e-005 32 2.5085e-005
33 3.6351e-005
42 2.8011e-005

SNV 4.8838e-004 3.7428e-004 1 9.4335e-004


4 1.3209e-003
6 1.0309e-003
33 1.2482e-003
39 5.2010e-004
42 7.7018e-004
43 1.0678e-003

Detrend 8.8034e-006 7.0393e-006 4 2.4553e-005


6 2.1819e-005
33 2.5949e-005

SNV detrend 4.5336e-004 3.5054e-004 4 1.1607e-003


6 1.0696e-003
32 6.3294e-004
33 1.4386e-003

Savitzky-Golay 2nd derivative 1.8816e-009 1.5809e-009 4 7.5230e-009
6 3.6866e-009
20 2.6156e-009
25 5.9786e-009

Table B15. Q statistics for lower strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 5.2093e-005 3.8329e-005 12 1.7977e-004
21 1.3308e-004
31 1.0500e-004

SNV 1.1009e-004 8.5906e-005 12 5.6989e-004


13 1.5413e-004
21 3.1784e-004
27 2.1313e-004
28 2.3024e-004
29 1.4907e-004

Detrend 2.1911e-005 1.6897e-005 5 1.6097e-004


6 2.0328e-004
12 1.7702e-004
14 4.9454e-005
21 9.5759e-005
23 3.5307e-005
28 5.1909e-005

SNV detrend 3.8494e-005 3.1175e-005 5 3.1986e-004


6 4.3138e-004
12 4.5632e-004
14 1.3133e-004
20 6.3377e-005
21 2.6612e-004
23 1.0322e-004
28 1.5675e-004
30 2.0845e-004
33 7.4489e-005

Savitzky-Golay 2nd derivative 5.8641e-008 4.9724e-008 12 1.7618e-007
17 8.3382e-008
19 1.1108e-007
27 1.2536e-007
30 2.5247e-007
31 1.6950e-007
32 1.4492e-007

Table B16. Q statistics for higher strength Tablet (Intact) PCA.
Data Set Q99 Q95 Batch Q > Q99
Transmission 2.1780e-004 1.6979e-004 20 5.2657e-004
26 4.6967e-004
28 4.6604e-004
30 4.9237e-003
31 6.1234e-004
40 9.5833e-003

SNV 4.8715e-003 3.7027e-003 1 1.7733e-002


28 9.4336e-003
29 1.7096e-002
30 4.2784e-002
31 1.4916e-002
32 8.9986e-003
41 1.1527e-002

Detrend 9.5224e-005 7.6020e-005 14 1.2623e-004


27 1.4583e-004
29 4.3460e-004
30 5.9208e-003
38 1.3273e-004
40 1.5912e-002
41 1.4367e-003

SNV detrend 5.2424e-003 3.9417e-003 1 1.3028e-002


19 5.4406e-003
24 8.9175e-003
25 7.1171e-003
29 9.6550e-003
30 5.7097e-002
31 9.4374e-003
41 1.0339e-002

Savitzky-Golay 2nd derivative 1.7825e-006 1.3181e-006 1 2.0560e-006
2 1.8781e-006
21 2.6345e-006
29 2.0629e-006
30 1.0607e-005

Table B17. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 3.9809e-004 3.0530e-004 33 1.1911e-003
35 7.0804e-004

SNV 7.5891e-003 6.3292e-003 9 2.3266e-002


15 1.2068e-002
19 1.5666e-002
25 1.8175e-002
28 3.0518e-002
33 4.1400e-002
35 2.5518e-002
36 1.6482e-002

Detrend 2.9159e-004 2.4097e-004 10 7.6645e-004


11 3.7375e-004
19 3.1051e-004
36 4.1973e-004

SNV detrend 6.4019e-003 5.0012e-003 15 1.1284e-002


35 8.3091e-003

Savitzky-Golay 2nd derivative 3.1322e-008 2.3553e-008 - -

Table B18. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(RCA).
Data Set Q99 Q95 Batch Q > Q99
Absorbance 2.7978e-004 2.1245e-004 1 9.0982e-004
2 1.2517e-003
6 8.1050e-004
8 8.4201e-004
39 4.7903e-004

SNV 7.6923e-003 5.8904e-003 13 1.8904e-002


14 1.5525e-002
18 1.6901e-002
23 2.8408e-002
24 2.7384e-002
25 1.8181e-002
33 8.1036e-003
38 1.7199e-002
39 1.5279e-002

Detrend 1.4100e-004 1.1026e-004 8 2.8793e-004


9 3.4958e-004
18 2.6217e-004
35 2.3900e-004

SNV detrend 7.7382e-003 6.0963e-003 18 1.4617e-002

Savitzky-Golay 2nd derivative 1.3829e-008 1.1246e-008 14 1.9051e-008
15 2.7134e-008
23 2.0146e-008
24 2.3255e-008
38 2.3272e-008
39 2.1624e-008

Table B19. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.5907e-003 1.1738e-003 28 2.7533e-003

SNV 5.1276e-003 4.0039e-003 30 7.5109e-003

Detrend 3.8712e-004 3.0723e-004 20 4.9826e-004
31 3.9653e-004
36 6.1691e-004

SNV detrend 3.6453e-003 2.8187e-003 18 6.7668e-003

Savitzky-Golay 2nd derivative 1.1668e-007 9.0296e-008 4 2.3788e-007
21 1.6089e-007
25 3.0874e-007
26 2.0481e-007
27 1.6877e-007
36 2.2012e-007
37 1.9747e-007

Table B20. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4708e-003 1.1593e-003 21 5.2403e-003
27 7.3431e-003
28 1.3981e-002
32 2.5276e-003
38 6.3302e-002
39 8.1246e-003
41 2.2253e-003

SNV 1.0076e-001 7.9322e-002 2 1.0352e-001
13 1.1048e-001
24 1.8528e-001
28 1.8429e-001
29 1.3560e-001
34 1.9496e-001
39 1.0656e-001

Detrend 2.3131e-003 1.7949e-003 24 7.0645e-003
27 4.5675e-003
28 4.8798e-003

SNV detrend 4.4515e-002 3.4041e-002 13 9.6856e-002

Savitzky-Golay 2nd derivative 1.7371e-006 1.2968e-006 19 2.2427e-006
27 2.0910e-006
28 1.0698e-005

Table B21. Q statistics for multiway PCA: lower strength blend (RCA) and tablet
(RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 2.4705e-003 1.8427e-003 10 4.1491e-003
15 3.3556e-003

SNV 1.1685e-002 9.3937e-003 11 1.7378e-002
19 2.0498e-002
28 3.2384e-002

Detrend 5.1087e-004 3.9496e-004 20 1.0090e-003
21 8.9269e-004
36 1.5686e-003
37 1.0384e-003

SNV detrend 1.2784e-002 1.0312e-002 9 2.4090e-002
28 2.1033e-002

Savitzky-Golay 2nd derivative 2.1165e-007 1.5935e-007

Multiway PCA model: blend absorbance and tablet transmission data.

Multiway PCA model: blend absorbance, tablet absorbance and transmission data.
Table B22. Q statistics for multiway PCA: higher strength blend (RCA) and tablet
(RCA and Intact).
Data Set Q99 Q95 Batch Q > Q99
Raw* 1.4852e-002 1.1309e-002 - -

SNV 1.7927e-001 1.4223e-001 13 2.3560e-001


24 6.1258e-001
34 5.3285e-001

Detrend 2.0636e-003 1.5826e-003 28 5.8723e-003

SNV detrend 9.1369e-002 6.7729e-002 13 2.0121e-001

Savitzky-Golay 2nd derivative 1.9274e-006 1.4392e-006 19 2.3320e-006
28 1.0317e-005

Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B23. Blend PC score Shewhart plots results.
Data Set Batch PC Score value 99% Control limit
Raw 21 5 -3.4255e-002 -3.2134e-002
23 6 -1.4635e-002 -1.3590e-002
29 5 4.6468e-002 3.3265e-002
29 8 1.3889e-002 8.6397e-003
32 4 3.9906e-002 3.5481e-002
37 8 -9.5603e-003 -8.5421e-003
50 8 -9.2291e-003 -8.5421e-003
64 7 1.0903e-002 1.0159e-002
68 7 1.0328e-002 1.0159e-002
71 6 1.8794e-002 1.4232e-002
74 4 3.7637e-002 3.5481e-002
127 4 3.9395e-002 3.5481e-002
127 7 -1.0824e-002 -1.0002e-002
192 6 1.6389e-002 1.4232e-002

SNV 21 2 -4.7707e-001 -3.4533e-001
22 2 -3.5495e-001 -3.4533e-001
24 2 -3.9561e-001 -3.4533e-001
28 3 -3.4789e-001 -2.7380e-001
29 2 4.9455e-001 3.5649e-001
29 6 -7.5455e-002 -5.6293e-002
29 7 7.3642e-002 4.2496e-002
31 3 -3.0399e-001 -2.7380e-001
33 3 -3.2266e-001 -2.7380e-001
63 5 -9.3944e-002 -9.0385e-002
64 5 -9.3676e-002 -9.0385e-002
69 6 -5.9951e-002 -5.6293e-002
74 3 3.3634e-001 2.7920e-001
110 6 5.8387e-002 5.3980e-002
127 6 7.3170e-002 5.3980e-002
192 6 -6.3662e-002 -5.6293e-002

Detrend 17 4 1.6418e-002 1.5168e-002
21 4 2.1026e-002 1.5168e-002
22 4 2.0456e-002 1.5168e-002
23 4 1.8950e-002 1.5168e-002
24 4 1.6609e-002 1.5168e-002
29 6 -1.3469e-002 -8.9690e-003
64 6 1.0136e-002 8.7876e-003
74 4 1.8273e-002 1.5168e-002
76 5 1.2618e-002 1.1761e-002
127 5 1.7068e-002 1.1761e-002
192 6 -1.0019e-002 -8.9690e-003
193 2 4.3861e-002 4.1437e-002

SNV Detrend 21 2 -4.7707e-001 -3.4533e-001
22 2 -3.5495e-001 -3.4533e-001
24 2 -3.9561e-001 -3.4533e-001
28 3 -3.4789e-001 -2.7380e-001
29 2 4.9455e-001 3.5649e-001
29 6 -7.5455e-002 -5.6293e-002
29 7 7.3642e-002 4.2496e-002
31 3 -3.0399e-001 -2.7380e-001
33 3 -3.2266e-001 -2.7380e-001
63 5 -9.3944e-002 -9.0385e-002
64 5 -9.3676e-002 -9.0385e-002
69 6 -5.9951e-002 -5.6293e-002
74 3 3.3634e-001 2.7920e-001
110 6 5.8387e-002 5.3980e-002
127 6 7.3170e-002 5.3980e-002
192 6 -6.3662e-002 -5.6293e-002

Savitzky-Golay 2nd derivative 29 5 7.5141e-005 7.2103e-005
29 6 6.3694e-005 6.0493e-005
29 7 -3.3216e-004 -7.8702e-005
125 3 -1.2585e-004 -9.7962e-005
136 5 -9.3770e-005 -7.0922e-005
137 5 -7.8075e-005 -7.0922e-005
138 5 -7.2381e-005 -7.0922e-005
152 5 -7.1721e-005 -7.0922e-005
157 6 7.8555e-005 6.0493e-005
161 6 7.5554e-005 6.0493e-005
171 3 -1.3075e-004 -9.7962e-005
173 3 -1.1868e-004 -9.7962e-005
174 4 1.2257e-004 9.5077e-005
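
Each row of Table B23 pairs a batch's PC score with the 99% control limit it exceeded, and the limit carries the same sign as the score. The Python sketch below shows one way such a comparison could be made. The function name `exceeds_limit` is hypothetical, the three rows are transcribed from the Raw block (with OCR-damaged exponents read as `e`), and the sign-aware rule is an assumption drawn from the tabulated values, not the author's stated procedure.

```python
def exceeds_limit(score, limit):
    """Return True when a score lies beyond a signed control limit.

    A non-negative limit is treated as an upper bound and a negative
    limit as a lower bound (assumed convention for this sketch).
    """
    if limit >= 0:
        return score > limit
    return score < limit


# (batch, PC, score, 99% control limit), transcribed from Table B23, Raw block.
rows = [
    (21, 5, -3.4255e-2, -3.2134e-2),
    (29, 5, 4.6468e-2, 3.3265e-2),
    (68, 7, 1.0328e-2, 1.0159e-2),
]

flags = [exceeds_limit(score, limit) for _, _, score, limit in rows]
for (batch, pc, score, limit), flagged in zip(rows, flags):
    print(f"batch {batch}, PC {pc}: score {score:+.4e}, limit {limit:+.4e}, outside = {flagged}")
```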

Table B24. Blend PC score Shewhart plot results: dispersion of sub-group centred
scores.
Data Set Batch PC 99% Control limits (+/-) Observations exceeding 99% limits*
Raw data 8 1 +4.1520e-001, -4.1520e-001 5
21 1 +4.1520e-001, -4.1520e-001 3
48 1 +4.1520e-001, -4.1520e-001 3
49 1 +4.1520e-001, -4.1520e-001 5
49 +1.6328e-001, -1.6328e-001 4
53 1 +4.1520e-001, -4.1520e-001 8
53 +1.6328e-001, -1.6328e-001 3
146 1 +4.1520e-001, -4.1520e-001 5
189 1 +4.1520e-001, -4.1520e-001 4
189 +4.4435e-002, -4.4435e-002 6
192 +1.0889e-002, -1.0889e-002 4

SNV 8 1 +5.4989e-001, -5.4989e-001 4
8 +2.8536e-001, -2.8536e-001 3
15 1 +5.4989e-001, -5.4989e-001 3
16 1 +5.4989e-001, -5.4989e-001 4
25 1 +5.4989e-001, -5.4989e-001 3
27 1 +5.4989e-001, -5.4989e-001 3
27 +2.8536e-001, -2.8536e-001 3
39 1 +5.4989e-001, -5.4989e-001 5
39 +2.8536e-001, -2.8536e-001 3
40 1 +5.4989e-001, -5.4989e-001 3
49 1 +5.4989e-001, -5.4989e-001 5
49 +2.8536e-001, -2.8536e-001 4
58 1 +5.4989e-001, -5.4989e-001 6
58 +2.8536e-001, -2.8536e-001 3
59 1 +5.4989e-001, -5.4989e-001 6
59 +2.8536e-001, -2.8536e-001 3
64 1 +5.4989e-001, -5.4989e-001 6
64 +2.8536e-001, -2.8536e-001 4
71 +2.5687e-001, -2.5687e-001 4
146 1 +5.4989e-001, -5.4989e-001 6
154 +1.2447e-001, -1.2447e-001 3
186 +1.2447e-001, -1.2447e-001 3
189 1 +5.4989e-001, -5.4989e-001 8
189 2 +2.8536e-001, -2.8536e-001 7
192 6 +1.2447e-001, -1.2447e-001 4

Detrend 8 3 +2.1745e-002, -2.1745e-002 5
15 4 +5.6283e-003, -5.6283e-003 3
59 3 +2.1745e-002, -2.1745e-002 4
64 1 +2.9354e-002, -2.9354e-002 4
64 3 +2.1745e-002, -2.1745e-002 4
71 6 +1.2227e-002, - 3
146 1 +2.9354e-002, -2.9354e-002 8
186 5 +9.9877e-003, -9.9877e-003 3
186 7 +5.2675e-003, -5.2675e-003 3
189 1 +2.9354e-002, -2.9354e-002 6
189 3 +2.1745e-002, -2.1745e-002 7
192 6 +1.2227e-002, -1.2227e-002 4

SNV Detrend 8 1 +2.1727e-001, -2.1727e-001 3
8 3 +1.8377e-001, -1.8377e-001 3
31 7 +3.1738e-002, -3.1738e-002 3
37 4 +6.0075e-002, -6.0075e-002 5
39 1 +2.1727e-001, -2.1727e-001 4
43 1 +2.1727e-001, -2.1727e-001 3
43 4 +6.0075e-002, -6.0075e-002 4
43 7 +3.1738e-002, -3.1738e-002 5
44 4 +6.0075e-002, -6.0075e-002 4
53 7 +3.1738e-002, -3.1738e-002 6
58 1 +2.1727e-001, -2.1727e-001 4
59 1 +2.1727e-001, -2.1727e-001 3
64 1 +2.1727e-001, -2.1727e-001 6
64 3 +1.8377e-001, -1.8377e-001 5
146 1 +2.1727e-001, -2.1727e-001 9
146 4 +6.0075e-002, -6.0075e-002 3
150 5 +1.0842e-001, - 3
154 6 +7.4651e-002, -7.4651e-002 4
189 1 +2.1727e-001, -2.1727e-001 8
189 3 +1.8377e-001, -1.8377e-001 7
192 5 +1.0842e-001, -1.0842e-001 4

Savitzky-Golay 2nd derivative 21 5 +4.9628e-005, -4.9628e-005 3
53 7 +2.2707e-005, -2.2707e-005 4
64 6 +2.9123e-005, -2.9123e-005 4
142 4 +1.3328e-004, - 3
146 1 +6.5719e-005, -6.5719e-005 8
150 3 +6.4059e-005, - 3
150 4 +1.3328e-004, - 3
182 6 +2.9123e-005, -2.9123e-005 3
186 3 +6.4059e-005, -6.4059e-005 3
186 4 +1.3328e-004, -1.3328e-004 4

* Restricted to a minimum of 3 observations (equivalent to one blend sample out of three per batch). Control limits are shown as positive and negative where observations fall
beyond upper and lower limits.
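
The counting rule described in the footnote to Table B24 can be sketched in a few lines of Python. This is an illustration only, not the procedure used to produce the table: the function name `flag_batches`, the input layout and the example scores are assumptions. Only the symmetric 99% limit (here 4.1520e-001, the raw data PC 1 limit) and the three-observation minimum are taken from the table.

```python
from collections import defaultdict


def flag_batches(scores, limit, min_obs=3):
    """Count sub-group centred scores beyond a symmetric control limit.

    scores  : iterable of (batch, centred_score) pairs (hypothetical layout)
    limit   : positive 99% control limit; the lower limit is taken as -limit
    min_obs : minimum number of exceedances needed to report a batch
    Returns {batch: {"upper": count, "lower": count}} for reported batches.
    """
    counts = defaultdict(lambda: {"upper": 0, "lower": 0})
    for batch, score in scores:
        if score > limit:
            counts[batch]["upper"] += 1
        elif score < -limit:
            counts[batch]["lower"] += 1
    # Keep only batches with at least min_obs observations outside the limits.
    return {
        batch: c
        for batch, c in counts.items()
        if c["upper"] + c["lower"] >= min_obs
    }


# Hypothetical centred PC 1 scores for two batches (not thesis data).
example = [(8, 0.50), (8, -0.60), (8, 0.45), (21, 0.50), (21, 0.10)]
print(flag_batches(example, limit=0.41520))
```

Keeping separate upper and lower counts mirrors the way the table shows a dash in place of a limit that no observation exceeded.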
Table B25. PC score Shewhart plot results for lower strength Tablets (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 18 4 3.9412e-002 3.8330e-002
39 6 2.0824e-002 1.7222e-002
40 6 1.8152e-002 1.7222e-002

SNV 2 4 -8.6349e-002 -8.0326e-002


5 3 1.1017e-001 1.0719e-001
10 5 -4.2980e-002 -4.2878e-002

Detrend 2 2 -5.9015e-002 -5.6332e-002


5 4 1.9610e-002 1.6383e-002

SNV detrend 2 4 -8.6349e-002 -8.0326e-002


5 3 1.1017e-001 1.0719e-001
10 5 -4.2980e-002 -4.2878e-002

Savitzky-Golay 2nd derivative 2 2 -2.5744e-004 -2.2580e-004


5 3 1.8518e-004 1.4897e-004
31 4 1.1906e-004 1.0393e-004
42 5 9.0237e-005 8.0597e-005

Table B26. PC score Shewhart plot results for higher strength Tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 3 1 4.0676e-001 3.4801e-001

SNV - - -

Detrend 36 1 7.9890e-002 7.6085e-002


41 5 -6.7603e-003 -5.7996e-003

SNV detrend 41 5 -4.8264e-002 -4.4698e-002


42 6 -3.2585e-002 -3.1940e-002

Savitzky-Golay 2nd derivative 41 6 5.5241e-005 5.2591e-005

Table B27. PC score Shewhart plot results for lower strength Tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Transmission 2 4 -1.2771e-001 -1.2343e-001
6 8 1.6276e-002 1.5812e-002

SNV 5 6 3.0174e-002 2.9772e-002


27 7 3.1849e-002 2.8843e-002
27 8 -2.2767e-002 -2.2317e-002

Detrend 5 7 -1.9020e-002 -1.5136e-002


30 2 -2.1765e-001 -2.0130e-001

SNV detrend 5 7 -2.7288e-002 -2.2561e-002


27 6 2.0466e-002 1.9752e-002
30 2 -3.6439e-001 -3.3250e-001
31 8 -1.3180e-002 1.2178e-002

Savitzky-Golay 2nd derivative 5 5 -4.8711e-004 -4.7223e-004
6 5 -5.1441e-004 -4.7223e-004
41 6 -3.9947e-004 3.5901e-004

Table B28. PC score Shewhart plot results for higher strength Tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Transmission 36 4 1.3984e-001 1.2727e-001
40 1 9.7051e+000 8.7008e+000
40 3 -3.3700e-001 -3.2879e-001
40 6 -1.8427e-001 -9.8051e-002
40 7 -3.0900e-002 -2.5547e-002
43 5 -6.4261e-002 -6.2210e-002

SNV 30 5 2.0175e-001 1.7988e-001


36 3 -4.2159e-001 -4.0883e-001
40 2 -1.2174e+000 -6.6052e-001

Detrend 1 1 -2.6050e-001 -2.4802e-001


30 7 -1.6410e-002 -1.4472e-002
36 3 1.4791e-001 1.0737e-001
40 2 -3.0849e-001 -2.3694e-001
40 4 2.5635e-001 1.3788e-001
40 7 1.8963e-002 1.5376e-002

SNV detrend 36 3 -3.0932e-001 -2.9049e-001


40 2 8.9516e-001 5.4008e-001

Savitzky-Golay 2nd derivative 36 2 -2.3379e-003 -1.8498e-003
40 1 3.6740e-002 2.0559e-002

Table B29. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Raw* -

SNV 25 3 3.6550e-001 3.6412e-001

Detrend 25 2 -2.0095e-001 -1.7757e-001

SNV detrend 10 7 -8.6866e-002 -7.6995e-002


25 3 -3.2550e-001 -2.9671e-001

Savitzky-Golay 2nd derivative - - -

Table B30. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 21 7 7.5409e-002 7.3657e-002
38 1 9.4213e+000 8.3792e+000
38 4 -4.6845e-001 -3.4343e-001

SNV 38 3 -1.0872e+000 -6.2344e-001

Detrend 28 2 -2.6072e-001 -2.3232e-001


34 3 1.4293e-001 1.2496e-001
38 1 -3.7778e-001 -2.4695e-001

SNV detrend 38 3 7.5972e-001 4.8390e-001

Savitzky-Golay 2nd derivative 34 2 -2.2685e-003 -1.8924e-003
38 1 3.6585e-002 2.0853e-002

Table B31. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance 10 7 -3.4132e-002 -3.1283e-002
15 6 4.6640e-002 4.4400e-002
25 5 6.7545e-002 6.7212e-002

SNV 25 4 4.0872e-001 3.5299e-001

Detrend 20 4 -2.3329e-002 -2.3318e-002

SNV detrend - -

Savitzky-Golay 2nd derivative 18 3 -1.8611e-004 -1.5796e-004

Table B32. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA).
Data Set Batch PC Score value 99% Control limit
Absorbance - - -

SNV 24 3 3.9296e-001 3.9053e-001

Detrend 24 2 7.4402e-002 5.5530e-002

SNV detrend 9 6 1.3182e-001 1.2260e-001


13 3 -3.4029e-001 -3.3292e-001
24 2 -6.1649e-001 -4.6858e-001
Savitzky-Golay 2nd derivative

Multiway PCA model: blend absorbance and tablet transmission data.

Table B33. PC score Shewhart plot results for multiway PCA: lower strength
blend (RCA) and tablet (RCA and Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 36 8 -7.3195e-002 -7.1348e-002

SNV 25 3 5.3250e-001 4.8857e-001


35 6 -2.5187e-001 -2.3543e-001

Detrend 25 2 -2.0833e-001 -1.7886e-001

SNV detrend 25 3 -4.7052e-001 -3.9068e-001

Savitzky-Golay 2nd derivative - - - -

Table B34. PC score Shewhart plot results for multiway PCA: higher strength
blend (RCA) and tablet (RCA and Intact).
Data Set Batch PC Score value 99% Control limit
Raw* 24 3 -1.1030e+000 -1.0395e+000
38 1 8.8263e+000 7.7480e+000

SNV 38 3 -1.0555e+000 -6.1915e-001

Detrend 24 5 -7.7750e-002 -6.9869e-002


28 2 -2.5461e-001 -2.3106e-001
34 3 1.6589e-001 1.3503e-001
38 1 -3.9147e-001 -2.4680e-001

SNV detrend 24 3 6.3643e-001 5.3612e-001


38 4 5.9262e-001 4.0872e-001

Savitzky-Golay 2nd derivative 34 2 -2.2751e-003 -1.8968e-003
38 1 3.6375e-002 2.0646e-002

Multiway PCA model: blend absorbance and tablet absorbance and transmission data.

Table B35. MSPC of blend PCA models (absorbance data).
Control batches (Control Phase 1)  Implicated PCs  Anderson's normal approximation (99.99% limit)  Anderson's normal approximation
42.42 11 1,5 15.56 14 235.00
47.86 21 5 15 1506.35
35.28 22 2, 5 ,6 20 19.26
34.99 23 2 ,5 ,6 23 20.26
37.61 24 5 ,6 33 29.40
30.78 28 1 ,3 ,4 37 703.16
58.44 29 5 43 83.02
26.05 31 1,2, 3 ,4 64 75.56
41.33 33 1 .3 ,4 67 26.54
25.93 49 1 109 177.71
33.91 53 1,5 115 191.46
121.12 83 1 144 44.90
180.46 84 1, 4, 5, 6 146 136.65
155.58 85 1.5 154 16.90
164.05 86 1.5 182 25.51
105.12 87 1
158.76 88 1.5
158.58 89 1.5
148.31 90 1.5
118.45 91 1,5
97.52 92 1
80.67 93 1
130.34 94 1.5
134.36 95 1,5
141.11 96 1,5
131.75 97 1.5
146.04 98 1,5
126.11 99 1,5
126.02 100 1,5
121.52 101 1.5
145.15 102 1,5
157.45 103 1.5
77.55 104 1
154.40 105 1.5
125.38 106 1,5
103.77 107 1.5
105.89 108 1
85.85 109 1.2
91.22 110 1
67.03 111 1
80.60 112 1
146.27 113 1.5
140.40 114 1,5
75.70 115 1,6
68.11 116 1
81.93 117 1
70.42 118 1
111.21 119 1
146.84 120 1
105.21 121 1
124.06 122 1
122.44 123 1
117.02 124 1
141.88 125 1,5
136.21 126 1.5
132.71 127 1
87.53 128 1 .5 ,6
96.66 129 1 ,6
81.93 139 1,5
127.70 140 1.5
82.16 141 1 ,4 ,5
85.95 142 1
86.88 143 1
76.25 144 1 ,4 ,5
111.79 145 1,5
115.15 149 1
94.50 151 1,4
120.22 152 1
146.32 153 1
142.07 154 1

Table B36. MSPC of blend PCA models (SNV absorbance data).
Control batches (Control Phase 1)  Implicated PCs  Anderson's normal approximation (99.99% limit)  Anderson's normal approximation
24.90 1 1,3 13.48 8 14.24
40.81 8 1 ,2 ,4 14 450.36
64.86 11 1 ,2 ,4 15 114.50
28.20 12 4. 1,2 16 57.91
46.04 14 1 .2 ,4 27 25.95
50.39 21 2, 4 ,3 37 621.11
31.48 22 4, 2 ,3 39 18.19
29.17 23 4 ,2 40 75.23
50.11 24 2 ,4 43 48.93
33.13 25 1 .2 ,4 44 13.68
19.94 26 1 .4 59 78.21
38.03 27 1 64 21.51
63.31 28 1 .3 ,2 67 23.34
53.98 29 2, 1 ,3 ,4 109 23.47
44.65 31 1.2, 3 ,4 144 194.38
74.33 33 1.2 146 273.34
29.53 40 4. 1.2
38.38 48 1 .2 ,4
46.14 49 1.2
49.63 53 1 .2 ,4
21.39 64 1.5
21.78 67 1
93.94 83 1 .2 ,4
142.89 84 1 .2 ,4
117.64 85 1 ,2 ,4
131.87 86 1 .2 ,4
76.66 87 1 .2 ,4
122.11 88 1 .2 ,4
117.46 89 1 ,2 ,4
119.29 90 1 .2 ,4
93.01 91 1 .2 ,4
73.97 92 1 .2 ,4
55.95 93 1 .2 ,4
102.14 94 1 ,2 ,4
112.52 95 1 ,2 ,4
118.24 96 1 .2 ,4
102.25 97 1 .2 ,4
111.25 98 1 .2 ,4
106.37 99 1 .2 ,4
106.17 100 1 .2 ,4
100.64 101 1 .2 ,4
110.94 102 1 .2 ,4
131.39 103 1 .2 ,4
56.92 104 1 .4 ,6
120.23 105 1 .2 ,4
106.86 106 1 .2 ,4
80.93 107 1 .2 ,4
79.12 108 1 ,2 ,4
66.23 109 1 .2 ,6
64.45 110 1 .6 ,4
49.54 111 1. 2
57.11 112 1. 2
116.17 113 1 .2 ,4
109.96 114 1 .2 ,4
61.92 115 1 .4 ,6
48.04 116 1
56.70 117 1.2
48.78 118 1
65.65 119 1 .2 ,6
95.97 120 2, 1
77.02 121 1 ,2 ,4
76.39 122 2, 1
77.15 123 1. 2
84.44 124 1 .2 ,4
105.84 125 1 .2 ,4
103.68 126 1 .2 .4
93.25 127 1.2, 6 ,4
50.49 128 1 .2 ,6
68.91 129 1 .2 ,4
56.70 139 1. 2
93.70 140 1.2
76.78 141 1 .4 ,5
54.77 142 1.2
59.85 143 1.2
62.12 144 1.2
90.93 145 1 .2 ,4
77.23 149 1.2
69.34 151 1.2
71.88 152 1.2
90.85 153 2. 1
87.36 154 2. 1
22.98 189 3 .2 ,1

Table B37. MSPC of blend PCA models (detrend absorbance data).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Detrend 139 7 20.51 28.98 21 1.3 14.5573 8 18.58
33.1572 29 6, 2 ,3 14 317.40
25.0336 83 1 16 25.31
49.3930 84 1,5 37 160.49
42.6574 85 1 ,2 43 100.85
44.8700 86 1 ,2 59 635.11
21.4141 87 1 64 39.21
43.1488 88 1 ,2 67 35.78
40.9772 89 1 ,2 109 152.77
35.9743 90 1 ,2 115 30.85
27.6983 91 1 ,2 142 21.73
37.7630 94 1 ,2 144 94.74
33.0183 95 1 146 30.59
36.1442 96 1,2 152 56.67
35.7966 97 1 ,2 186 26.61
32.9009 98 1
30.9418 99 1
31.0504 100 1
30.7297 101 1 ,2
36.6090 102 1 ,2
42.2416 103 1 ,2
37.5513 105 1 ,2
32.8472 106 1
20.5247 107 1
21.0809 108 1
39.0639 113 1 ,2 ,4
31.5319 114 1,4
23.3561 119 1
36.2788 120 1 ,2 ,3
24.6408 121 1,2
25.4348 122 1,3
23.3730 123 1
31.6397 124 1
38.3432 125 1,2
44.4650 126 1 ,2 ,7
29.2552 127 5, 1 ,6 ,3
32.7394 140 1,2
41.4892 141 1 ,5 ,7
23.5197 144 1
31.5817 145 1,2
31.7742 149 1,2
27.3713 151 1
27.2661 152 1
37.0055 153 1,3
36.5750 154 1,3

Table B38. MSPC of blend PCA models (SNV detrend absorbance data).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
SNV 123 7 20.8037 35.22 21 1,4 14.5573 8 33.37
Detrend 21.67 22 1 .4 14 273.01
25.61 24 1 ,4 15 273.05
77.88 29 1 ,2 , 6 ,7 16 267.84
22.46 33 1,3 26 84.36
87.34 83 1 ,4 27 128.08
137.62 84 1 ,3 ,4 28 34.95
117.26 85 1 ,2 ,4 33 36.02
127.73 86 1 37 164.11
70.19 87 1 39 66.28
116.19 88 1 43 163.99
112.38 89 1 59 125.17
115.21 90 1 64 127.47
91.36 91 1 67 25.53
71.36 92 1 144 677.97
52.47 93 1 146 213.66
99.20 94 1 ,3 ,4
107.55 95 1 ,4
112.74 96 1,4
97.29 97 1,4
105.36 98 1 ,3 ,4
102.24 99 1 ,2 ,4
100.29 100 1 ,2 ,4
94.78 101 1 ,4
108.45 102 1 ,2 ,4
125.86 103 1 ,3 ,4
51.75 104 1 ,6
118.36 105 1 ,3 ,4
102.00 106 1 ,3 ,4
74.80 107 1,4
75.22 108 1
65.06 109 1 ,2 ,6
62.31 110 1 ,6
50.20 111 1
54.45 112 1
113.31 113 1 ,2 ,4
106.53 114 1 ,2 ,4
57.20 115 1 ,6
42.74 116 1
53.47 117 1
44.33 118 1
62.26 119 1
93.72 120 1,3
73.30 121 1
74.31 122 1 ,2 ,3
75.38 123 1 ,2 ,3
81.15 124 1,3
106.71 125 1,3
101.65 126 1,3
98.10 127 1 ,2 ,6
51.59 128 1
66.20 129 1
53.47 139 1
92.73 140 1
79.06 141 1 ,4
55.30 142 1
59.02 143 1
63.77 144 1
90.65 145 1
74.82 149 1
66.93 151 1,3
72.84 152 1,3
89.43 153 1,3
86.73 154 1,3

Table B39. MSPC of blend PCA models (Savitzky-Golay 11 point 2nd derivative of
absorbance data).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Savitzky- 91 7 21.7160 25.6375 21 5 .6 14.5573 33 18.79
Golay 2nd 623.2553 29 5 .7 36 25.01
derivative 29.3948 32 2. 5 ,7 89 44.95
22.5586 48 5 90 29.68
32.6354 50 3 .5 .7 102 16.18
32.3917 63 1 .2 .3 106 84.83
114.5680 83 1 109 62.79
184.4618 84 1 125 5851.04
164.7183 85 1.2 126 3989.45
195.6259 86 1.2 144 5686.18
93.6012 87 1 .2 ,3 186 38.09
168.3523 1.2
170.9391 89 1.2
175.9201 90 1 .2 ,3
132.6630 91 1.2
87.0157 92 1
65.1144 93 1
133.6834 94
156.8426 95
173.9118 96
157.7016 97 1 .2 .3
151.5180 98 1 .2 .3
144.9607 99 1
145.0235 100 1
151.8975 101 1.2
155.1517 102 1.2
193.0451 103 1 .2 .3
57.4501 104 1
154.7630 105 1.2
145.7133 106 1.2
98.8913 107 1.2
99.4680 108 1
77.8768 109 1.3
49.1095 110 1 .2 .3
56.2726 111 1.3
65.3618 112 1.3
154.3364 113 1.2
139.8857 114 1
53.3653 115 1 .3 .7
51.0597 116 1 .3 .7
76.4228 117 1.2
49.5355 118 1 .2 .3
106.1942 119 1.5
166.6243 120 1.5
120.4525 121 1,2. 5 ,7
119.3954 122 1.5
133.2169 123 1.5
164.7556 124 1.5
156.3455 125 1.2
172.6288 126 1 .2 .5
70.4721 127 1 .3 .6
50.8225 128 1 .3 .6
83.0562 129 1.3
173.7180 136 5. 6 ,7
104.4856 137 2 .5
73.5134 138 2 .5
76.4228 139 1.2
138.2863 140 1 .2 .3
104.4436 141 1.3
58.2344 142 1.3
63.3440 143 1
93.0315 144 1
122.1246 145 1
129.4913 149 1
104.9995 151 1.3
155.6240 152 1.5
184.8494 153 1.5
171.9002 154 1.5
34.8586 156 2 .5
66.6923 157 5 .6
80.8661 161 5 .6
24.1009 162 3 .4
34.6592 170 2 .5
54.6877 171 3. 5 .6
62.9707 172 2 .5 .6
57.8558 173 2 .3
30.1715 174 3 .4

Table B40. MSPC of lower strength tablet PCA models (RCA).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Absorbance 35 8 33.82 71.90 5 2, 1 15.56 4 76.78
47.12 6 2 9 689.51
35.06 8 1,8
36.82 10 1 ,7 ,6
48.93 16 1,8

SNV 24 6 32.06 89.69 5 4 13.48 4 30.76


47.60 6 4 9 91.82
38.16 26 1 15 187.56
71.99 30 1
35.65 38 5
80.85 39 5 ,3
46.31 40 5
54.96 44 1,5

Detrend 34 7 48.41 14.56 4 46.70


15 599.20
16 25.15

SNV detrend 17 7 61.67 62.74 1 6 14.56 4 19.98


385.00 2 1 ,6 ,4 9 17.94
161.66 3 6, 1,2 15 404.05
168.65 5 3 16 24.80
163.08 6 3
168.47 7 6, 1,2
68.29 9 3
148.74 15 6, 1
165.96 16 2 ,6
64.96 19 6, 5 ,4
84.15 29 3, 2 .4

Savitzky- 26 6 30.15 52.52 10 4 13.48 4 54.40


Golay 2nd 89.81 18 4 ,3 13 102.00
derivative 32.42 21 4 15 199.00
30.39 22 4 21 127.73
156.65 30 4, 1
197.62 31 4
96.29 32 4
40.96 38 4
37.84 39 4
38.01 40 4

Table B41. MSPC of higher strength tablet PCA models (RCA).


Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Absorbance 32 7 30.94 48.58 3 1 14.56 4 24.47
11 292.02
12 129.88
19 47.18
43 110.57

SNV 20 7 47.71 64.31 3 3, 2. 1 14.56 4 64.71


53.53 23 3 11 116.65
60.38 36 1,5, 2 ,4 18 28.36
113.96 40 4, 5 .3 19 156.03
72.94 41 5 .4 43 40.96

Detrend 22 7 42.44 42.85 12 14.56 4 50.28


49.20 36 11 20.15
54.82 40 18 373.97
51.10 41 19 259.03

SNV detrend 21 7 44.82 64.04 6 1,3 14.56 18 189.67


44.93 7 1,3 19 371.18
71.33 8 1 ,3 .6 34 22.14
70.29 11 1,3 43 42.49
150.19 18 3. 1
166.88 26 1 ,6
105.72 30 1,5, 6 ,7
231.95 36 1 ,5 ,6
82.23 39 1 ,6
114.32 43 6, 5. 1

Savitzky- 28 6 28.66 40.29 3 2 13.48 4 745.15


Golay 28.98 13 2 11 33.54
derivative 39.50 19 4, 6 ,2 19 213.59
32.28 41 6, 4, 1 43 28.98

Table B42. MSPC of lower strength tablet PCA models (Intact).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Transmission 41 9 34.86 92.50 38 2. 1.3 16.51 1 -217.45
115.28 39 2. 1,3 6 120.46
135.78 40 2, 1,3 19 80.15
30 68.29
32 28.52
39 2519.17

SNV 17 8 82.33 394.65 5 6 15.56 6 39.78


181.01 6 6 ,3 30 39.91
91.66 8 6 39 2027.54
118.27 9 6
104.54 25 1
93.99 26 1
1413.65 30 6, 1
1491.62 31 6, 1,8
1565.12 32 6, 1,8
1220.39 38 1 ,3 ,6
1946.85 39 1 ,6 ,3
1943.94 40 1 ,6 ,3

Detrend 32 8 57.44 63.65 38 1,4 15.56 1 303.70


71.14 39 1,4 9 43.36
84.86 40 1 ,4 ,3 17 216.84
30 159.72
32 188.62
39 18.72

SNV detrend 33 8 35.07 39.71 5 1,7 15.56 17 66.50


52.38 30 2 39 633.97
159.73 38 1.3
183.35 39 1.3
191.50 40 1,3

Savitzky-Golay 35 6 25.24 47.51 5 5, 1 13.48 9 113.90


2nd derivative 29.92 6 5 18 21.97
63.09 38 1 ,2 ,3 19 46.16
90.75 39 2, 3, 1
92.94 40 2, 1,3

Table B43. MSPC of higher strength tablet PCA models (Intact).
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Transmission 25 7 37.28 437.21 25 1 .5 ,3 14.56 22 95.19
261.36 26 1.5 40 15.61
237.65 27 1 .3 .5 41 455.95
239.99 28 1 .6
638.92 29 1 .6 ,5
2020.40 30 6. 1,2
289.32 31 1 .6
47.51 38 1
10423.23 40 6. 1 .3 .5
2196.21 41 1.6, 5 ,4

SNV 29 5 23.50 23.92 2 4 12.30 5 20.12


1865.05 13 1 .5 ,4 ,3 22 181.92
218.45 14 1 .4 30 35.99
1442.91 15 1 .4 ,5 40 128.54
397.21 22 1 41 16.98
139.15 23 1.3
479.36 24 1.5
87.29 28 1.3
1482.03 29 1 .4 , 3 ,5
4587.34 30 1 .5 ,4 , 3 ,2
329.09 31 1 .5 ,3
3077.95 40 1 .2 ,4 ,3
179.77 41 1 .2 ,3

Detrend 32 7 30.94 41.08 1 1 14.56 25 30.40


64.40 30 2, 4 ,7 40 35.76
275.75 40 4, 2 ,7 41 622.95
55.73 41 2 ,4

SNV detrend 23 4 21.75 33.27 2 1 11.00 22 58.49


31.63 4 1 26 20.50
-42.60 6 1 30 64.20
61.52 13 1 40 48.09
52.60 15 1 41 22.83
31.31 22 1
25.97 24 1
33.00 25 1.2
24.19 26
48.59 28 1
141.29 29 1
331.80 30 1.2
64.28 31 1
1067.67 40 2. 1
200.99 41 2, 1

Savitzky-Golay 29 3 15.51 60.60 25 1 9.53 29 23.12


2nd derivative 38.21 26 1 30 29.44
33.22 27 1 40 305.13
31.50 28 1 41 130.30
91.48 29 1
250.66 30 1
52.60 31 1
1316.8 40 1
273.22 41 1

Table B44. MSPC of lower strength blend (RCA) and tablet (RCA) multiway PCA
models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
14.56 42.87
247.51

42.22 1 3, 1 12.30 22 12.95
34.49 4 1, 3 24 19.07
52.10 8 3, 1
36.33 14 1

82.51 1 1 11.00 6 20.14
101.72 2 1 7 23.69
107.15 3 1
130.48 4 1
125.20 5 1
126.68 6 1
143.01 7 1
88.05 8 1
56.33 9 1
65.26 10 1
89.53 11 1
113.07 12 1
133.59 13 1, 3
102.33 14 1
114.56 15 1
108.81 16 1
134.63 17 1
150.54 18 1, 3

SNV detrend 30 40.24 25 2, 4 13.48 7 115.59
32.49 33 5, 4, 2

Savitzky-Golay 2nd derivative 217.59 1 1 9.53 6 28.95
390.96 2 1, 3 7 12.42
313.55 3 1 8 15.52
273.96 4 1 32 11.19
374.73 5 1, 3
312.74 6 1
340.16 7 1
286.54 8 1
163.58 9 1
33.15 10 1
152.11 11 1
67.37 12 1, 3
101.56 13 1, 3
272.28 14 1, 3
223.58 15 1, 3
75.48 16 1, 3
100.70 17 1, 3
193.19 18 3, 1

Table B45. MSPC of higher strength blend (RCA) and tablet (RCA) multiway
PCA models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Absorbance 41 8 32.20 - 15.56 4 70.30
32 16.73
38 20.62
40 25.38

SNV 27 6 29,37 32.23 24 3 .4 13.48 2 39.06


30.06 32 1,2 16 109.60
17 18.54

Detrend 18 6 43.26 293.51 1 5 ,4 13.48 2 23.58


86.84 2 5, 3 ,4 7 15.11
93.36 8 5 15 16.83
400.07 13 5 ,2 16 44.65
95.87 14 4 ,3 19 105.55
262.87 15 1 .5 ,4 ,3
95.52 18 5
57.68 20 2
440.81 21 5 ,2
410.31 24 2 ,4
44.41 27 5 ,2
110.67 29 4
183.19 32 5 ,2
318.14 34 2 ,4
72.68 36 2
213.81 39 4 ,2 ,5
284.34 40 2 ,5

SNV detrend 35 6 25.35 64.29 1 6 ,3 13.48 2 51.70


41.85 8 6 15 18.46
56.08 9 6 16 20.88
61.20 13 3 19 47.64
45.70 19 6 ,3
27.00 24 2

Savitzky- 31 4 18.87 151.03 1 2 11.00 7 22.44


Golay 2nd 141.35 8 2 16 14.47
derivative 115.67 9 2 19 104.15
107.02 13 2
99.81 19 2
107.54 21 2

Table B46. MSPC of lower strength blend (RCA) and tablet (Intact) multiway
PCA models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Raw" 26 7 36.01 53.49 1 1 14.56 9 17.72
46.10 8 1 10 34.83
21 40.10
25 36.12

SNV 28 7 33.93 101.38 25 1,3 14.56 25 640.75


61.68 26 1 27 17.14
58.62 27 1
149.91 33 3, 1,5
136.54 34 3,1
204.25 35 1,3, 6 ,4

Detrend 22 6 34.59 48.27 25 2 13.48 4 28.52


126.06 33 3, 1 25 476.38
130.06 34 3, 1
176.96 35 3, 1,5

SNV detrend 23 7 40.44 114.44 25 3 ,7 14.56 4 168.96


95.20 26 3 ,2 .7 7 42.34
92.28 27 2, 3, 1 17 22.19
199.08 33 2, 5 ,3 25 134.83
189.41 34 2, 3 ,5
225.25 35 2 ,5 ,3

Savitzky- 18 5 33.56 64.31 1 3 12.30 9 14.28


Golay 2nd 48.00 8 3 11 28.67
derivative 36.99 20 3
84.96 25 2
53.48 26 2
36.05 27 2
175.64 33 2 ,1 ,3
246.71 34 2 ,1 ,3
262.41 35 2,1
51.25 39 3, 1

Multiway PCA model: blend absorbance and tablet transmission data.


Table B47. MSPC of higher strength blend (RCA) and tablet (Intact) multiway
PCA models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Raw ' 37 7 28.50 67.27 38 4, 1 14.56 9 15.85
13 490.92
41 49.67

SNV 27 3 15.90 48.06 9 1 9.53 13 11.35


41.22 11 1 20 155.42
19.61 20 1 38 43.41
24.76 21 40 12.55
20.55 22 1
52.09 27 1
83.50 28 1
17.45 29 1
191.72 38 3. 1
24.90 39 1

Detrend 27 4 19.99 23.85 27 2 11.00 20 18.65


58.15 28 2 38 56.46
22.37 34 3 40 21.18
257.54 38 1, 2
50.20 39 1,2

SNV detrend 26 4 20.36 33.36 2 1 11.00 15 25.92


38.75 9 1 20 291.46
30.91 11 1 38 79.53
21.05 20 1 40 13.36
24.39 26 1.2
70.43 27 1
153.31 28 1
35.61 29 1
402.77 38 1 .3 ,2
82.66 39 1.3

Savitzky- 19 3 18.80 19.69 1 1 9.53 20 17.07


Golay 2nd 30.19 2 1 23 15.46
derivative 42.15 23 1 28 19.61
65.45 24 1.3 38 1091.35
18.84 25 1
33.92 26 1
77.87 27 1
207.69 28 1
34.21 29 1
30.11 34
1091.89 38 1
325.52 39 1.3

Table B48. MSPC of lower strength blend (RCA) and tablet (RCA and Intact)
multiway PCA models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Raw" 39 8 31.91 15.56

SNV 32 7 30.94 60.22 25 1.3 14.56 24 307.98


35.47 27 1 25 331.61
86.48 33 1 .6 ,3
67.50 34 1.3
95.97 35 6, 1

Detrend 29 7 33.06 33.84 25 2 14.56 25 23.52


113.28 33 3. 1.4 27 16.22
143.00 34 3. 1.4
131.55 35 3. 1

SNV detrend 29 6 28.03 70.74 25 3 13.48 7 39.13


42.54 26 5 25 44.45
44.57 27 5 .2
70.31 33 2
74.99 34 2
121.79 35 2 .6

Savitzky- 11 5 79.51 121.83 25 2 12.30 9 14.17


Golay 2nd 108.98 33 2 11 14.31
derivative 166.56 34 2 34 36.23
196.42 35 2
99.46 39 1.3

Multiway PCA model: blend absorbance and tablet transmission data.


Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B49. MSPC of higher strength blend (RCA) and tablet (RCA and Intact)
multiway PCA models.
Data Set  Control batches (Control Phase 1)  PCs  T² limit (n, m-n, 99%)  T² > T² limit  Batch  Implicated PCs  Anderson's normal approximation (99.99% limit)  Batch  Anderson's normal approximation
Raw ' 25 6 31.05 32.28 2 1 13.48 13 329.20
35.31 21 3 20 229.50
100.22 23 1,3 38 27.25
116.25 24 3, 1
87.00 25 3, 1
36.59 27 1
55.92 28 1
240.66 38 1,4
109.02 39 1

SNV 21 3 17.78 35.70 9 1 9.53 13 14.65


31.78 11 1 20 104.04
55.75 21 2 ,3 38 19.60
35.30 26 2
88.07 27 1,2
93.12 28 1
26.78 29 1
24.91 32 2
20.88 34 2
182.40 38 3. 1,2
40.24 39 2, 1

Detrend 33 5 22.14 36.85 24 5 12.30


24.77 27 1,2
52.25 28 2
41.10 34 3
299.88 38 1,2.4
52.67 39 1

SNV detrend 36 4 17.87 30.97 24 3 11.00 7 12.15


25.42 34 2 ,4 15 89.13
126.15 38 4, 3. 1 20 97.53

Savitzky- 28 3 15.69 19.46 1 1 9.53 20 16.67


Golay 2nd 26.31 2 1 23 15.00
derivative 17.17 11 1 28 20.60
77.40 23 1 38 1111.89
67.11 24 1
39.15 25 1
42.99 26 1
117.90 27 1
323.83 28 1
60.24 29 1
16.07 34
1715.01 38 1
427.71 39 1

Multiway PCA model: blend absorbance and tablet absorbance and transmission data.
Table B50. Principal Factor Analysis results (normal varimax): raw material usage
data (kg) and blend PCA results (Q statistic and Anderson's asymptotic normal
approximation).
Data Factor Variable Batch Number Loading*
Reflectance 14 Magnesium stearate EX 004181 0.35
T² PC 5 - -0.39
T² PC 6 - 0.54
Anderson's asymptotic normal approx. - 0.44

SNV 10 Dibasic calcium phosphate anhydrous EX 005147 0.49
Microcrystalline cellulose EX 004245 0.51
Anderson's asymptotic normal approx. - 0.36
14 Dibasic calcium phosphate anhydrous EX 006189 -0.40
Dibasic calcium phosphate anhydrous EX 005148 0.77
T² PC 3 - 0.43

Detrend 8 Drug substance 7DRB 041A 0.37
Dibasic calcium phosphate anhydrous EX 006245 0.48
Microcrystalline cellulose EX 006279 0.51
T² PC 6 - 0.39
14 Dibasic calcium phosphate anhydrous EX 006189 -0.36
Dibasic calcium phosphate anhydrous EX 005148 0.73
Q - -0.39

SNV Detrend 3 Dibasic calcium phosphate anhydrous EX 005147 -0.48
Microcrystalline cellulose EX 004245 -0.52
Anderson's asymptotic normal approx. - -0.37
10 Drug substance 7DRB 041A 0.35
Dibasic calcium phosphate anhydrous EX 006245 0.51
Microcrystalline cellulose EX 006279 0.55
T² PC 3 - 0.34
14 T² PC 5 - 0.33
T² PC 6 - 0.51
T² PC 7 - 0.48
15 Dibasic calcium phosphate anhydrous EX 006189 0.40
Dibasic calcium phosphate anhydrous EX 005148 -0.81
Q - 0.34

Savitzky-Golay 2nd derivative 2 Dibasic calcium phosphate anhydrous EX 008202 0.44
Microcrystalline cellulose EX 007173 0.44
Sodium starch glycolate EX 008015 0.44
Q - 0.37
13 Magnesium stearate EX 008090 0.51
T² PC 4 - 0.36
T² PC 6 - 0.50
Q - -0.45
16 T² PC 3 - 0.61
Anderson's asymptotic normal approx. - -0.54

* Significant correlation (p = 0.01, n = 70).
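
For orientation, the size of correlation needed to reach significance at p = 0.01 with n = 70 can be approximated as below. This is a rough sketch under stated assumptions, not the test used in the thesis: it assumes a two-sided test and uses the Fisher z approximation in place of the exact t-based critical value, and the function name `critical_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, p=0.01):
    """Approximate two-sided critical |r| for a Pearson correlation.

    Uses the Fisher z transform: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3) when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = NormalDist().inv_cdf(1.0 - p / 2.0)  # standard normal quantile
    return tanh(z / sqrt(n - 3))


# Threshold for the sample size quoted in the table footnote.
print(round(critical_r(70, 0.01), 3))
```

The exact critical value from the t distribution differs slightly from this approximation, so the result is only a guide to which loadings would be flagged.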

Table B51. Blend PCA loading correlations with excipient NIR spectra
(absorbance, DT absorbance and Savitzky-Golay 11 point 2nd derivative of
absorbance spectra).
NIR data used for PCA  Spectral data points  Raw material  PC loadings  Correlation, r
Reflectance 700 Dibasic calcium phosphate anhydrous 2 -0.969
Magnesium stearate 2, 1 -0.837, -0.694
Microcrystalline cellulose 2, 1 -0.961, -0.859
Sodium starch glycolate 1 -0.829

SNV 700 Dibasic calcium phosphate anhydrous - -
Magnesium stearate 5, 6 0.656, 0.624
Microcrystalline cellulose 3 0.861
Sodium starch glycolate 3, 2* 0.685, -0.508

Detrend 700 Dibasic calcium phosphate anhydrous 1 0.636
Magnesium stearate 6 -0.703
Microcrystalline cellulose 2 0.819
Sodium starch glycolate 2 0.900

SNV Detrend 700 Dibasic calcium phosphate anhydrous - -
Magnesium stearate 5 0.800
Microcrystalline cellulose 2, 3 0.746, -0.568
Sodium starch glycolate 2 0.757

Savitzky-Golay 2nd derivative 546 Dibasic calcium phosphate anhydrous - -
Magnesium stearate 4 -0.810
Microcrystalline cellulose 2 -0.759
Sodium starch glycolate 2 -0.808

Correlation produced using raw absorbance spectral data.
Correlation produced using detrend of absorbance spectral data.
Correlation produced using Sg2d11 spectral data.
* Loadings characteristic of water.
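
The correlation coefficients in Table B51 compare a PC loading vector with an excipient spectrum over the same spectral data points. A minimal sketch of that calculation is given below, assuming the standard Pearson coefficient; the function name `pearson_r` and the short example vectors are illustrative assumptions and are not thesis data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical loading vector and excipient spectrum (5 points instead of 700).
loading = [0.10, 0.25, 0.40, 0.30, 0.05]
spectrum = [0.52, 0.61, 0.70, 0.66, 0.50]
print(pearson_r(loading, spectrum))
```

Because the sign of a PC loading vector is arbitrary, the magnitude of r is the informative part when matching a loading to a raw material spectrum.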

APPENDIX C

Tables For Singleblock PLS And Multiblock PLS Data Sets

Table C1. Cumulative percentage sum of squares (%SS) accounted for by lower
strength blend PLS models (n = 39 batch average observations) for X (NIR) and
Y (Certificate of Analysis data) blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 98.5278 11.2094
2 99.5886 12.5387
3 99.9250 15.4238
4 99.9655 18.8266
5 99.9723 33.4962
6 99.9891 36.8413

SNV detrend 0 0 0
1 63.0350 9.6071
2 73.4564 13.6424
3 83.1379 19.2952
4 92.3981 24.0503
5 96.1400 32.7977
6 97.3619 41.1443

Savitzky-Golay 2nd derivative 0 0 0
1 35.1658 12.4350
2 43.6291 38.5427
3 53.1901 45.7549
4 60.1101 53.5354
5 65.3785 63.8242
6 71.0695 69.6576

Table C2. Cumulative percentage sum of squares (%SS) accounted for by lower
strength tablet (combined absorbance and transmission data sets) PLS models (n =
39 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 37.4909 16.6747
2 93.2834 22.2812
3 98.6305 36.0010
4 99.2602 46.0702
5 99.6718 51.7658
6 99.8546 52.6298

SNV detrend 0 0 0
1 24.1250 24.4640
2 53.4029 34.8386
3 67.3050 38.7702
4 82.9736 39.8916
5 94.3031 41.3013
6 96.5111 45.9564

Savitzky-Golay 2nd derivative 0 0 0


1 17.2797 33.7190
2 51.3616 37.6372
3 66.3221 46.5585
4 74.1837 56.2960
5 77.4930 66.1358
6 80.8087 71.1353

Table C3. Cumulative percentage sum of squares (%SS) accounted for by higher
strength blend (absorbance and pre-treated absorbance data sets) PLS models (n =
41 batch average observations) for X (NIR) and Y (Certificate of Analysis data)
blocks and for different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 97.0926 4.2075
2 99.4767 5.3678
3 99.7981 12.3122
4 99.9240 20.1134
5 99.9713 22.4132
6 99.9897 26.9295

SNV detrend 0 0 0
1 54.1217 6.4363
2 79.3772 15.4671
3 88.5172 18.5090
4 91.4556 23.9379
5 97.0076 25.7904
6 98.5628 30.6108

Savitzky-Golay 2nd derivative 0 0 0


1 35.3910 8.1480
2 42.9317 21.7321
3 55.5738 27.8595
4 65.2647 33.4627
5 69.2866 41.3344
6 71.7891 49.1541

Table C4. Cumulative percentage sum of squares (%SS) accounted for by higher
strength tablet (combined absorbance and transmission and pre-treated
absorbance and transmittance data sets) PLS models (n = 41 batch average
observations) for X (NIR) and Y (Certificate of Analysis data) blocks and for
different NIR spectral data sets.
NIR data set PLS components X block (%SS) Y block (%SS)
Absorbance 0 0 0
1 60.6272 11.1973
2 88.9116 15.2110
3 95.9915 18.7508
4 98.2400 21.5196
5 99.3493 23.8571
6 99.6597 28.7177

SNV detrend 0 0 0
1 46.7215 9.4302
2 71.0492 12.7547
3 79.8127 19.2646
4 91.7310 22.1985
5 94.5895 26.8946
6 96.2716 32.4628

Savitzky-Golay 2nd derivative 0 0 0


1 32.7060 13.0083
2 46.2040 21.6517
3 53.8930 30.4609
4 63.4475 36.4756
5 70.7722 40.0074
6 74.1100 45.2025

Table C5. Cumulative percentage sum of squares (%SS) explained by multiblock
PLS models for each subsection of the lower strength manufacturing process: X1
(blend: absorbance/pre-treated absorbance data) and X2 (tablet: combined
absorbance/pre-treated absorbance and transmission/pre-treated transmission
data) and certificate of analysis data, Y (n = 39 batch average observations) for
different NIR spectral data sets.
NIR data set PLS components X1 (blend, %SS) X2 (tablet, %SS) Y (Certificate of analysis data, %SS)
Raw* 0 0 0 0
1 98.5270 21.3553 13.2083
2 98.7977 43.9091 21.5690
3 99.8554 98.6277 23.1329
4 99.9655 99.1732 30.8496
5 99.9829 99.6724 34.7930
6 99.9901 99.7694 36.8361

SNV detrend 0 0 0 0
1 42.0676 14.0736 19.2026
2 75.2256 51.3683 23.9042
3 85.4275 68.6934 26.2144
4 88.4289 82.1438 30.7211
5 96.1761 91.1881 34.7918
6 97.1229 96.4851 36.3713

Savitzky-Golay 2nd derivative 0 0 0 0
1 26.1042 18.8555 23.8181
2 44.8448 57.2383 27.2238
3 53.2462 63.7748 36.4666
4 60.8855 67.1647 45.4472
5 67.5435 75.4463 48.8964
6 72.0485 80.2112 53.0074

Table C6. Cumulative percentage sum of squares (%SS) explained by multiblock
PLS models for each subsection of the higher strength manufacturing process: X1
(blend: absorbance/pre-treated absorbance data) and X2 (tablet: combined
absorbance/pre-treated absorbance and transmission/pre-treated transmission
data) and certificate of analysis data, Y (n = 41 batch average observations) for
different NIR spectral data sets.
NIR data set PLS components X1 (blend, %SS) X2 (tablet, %SS) Y (Certificate of analysis data, %SS)
Raw* 0 0 0 0
1 97.0868 62.6657 10.5780
2 99.5043 79.4850 13.9584
3 99.6162 94.8990 17.8981
4 99.9178 97.2575 21.3120
5 99.9751 99.0602 23.1101
6 99.9864 99.5196 31.5082

SNV detrend 0 0 0 0
1 49.0888 42.9342 8.4910
2 79.4721 67.4368 12.2611
3 83.4558 79.4804 19.0699
4 90.7284 89.2626 24.2839
5 96.5455 92.4355 28.7376
6 98.3923 95.7558 31.2060

Savitzky-Golay 2nd derivative 0 0 0 0
1 30.7872 19.9184 15.1495
2 46.3398 40.3850 21.5796
3 59.4878 50.9751 24.6986
4 63.2746 57.4616 29.0838
5 68.6447 62.7568 34.7574
6 74.8242 67.8627 36.9821

* Raw data includes blend absorbance data (X1) and combined tablet absorbance and
transmittance data (X2).

Table C7. Partial least squares regression modelling (PLSR1 algorithm) of
individual certificate of analysis (C. of A.) variables of lower strength blends and
tablets with their near infrared blend absorbance/pre-treated absorbance and
tablet combined absorbance/transmission or pre-treated absorbance/transmission
data (n = 36 observations).
NIR data set C. of A. variable modelled PLS components PRESS minimum^a % Sum of squares

Blend data (Sum of squares of autoscaled C. of A. data = 0.97222)

Absorbance Blend uniformity 0.81764 15.90


Moisture deviation 0.79157 18.58

Blend uniformity 0.88841 8.62


Moisture deviation 0.86741 10.78

Savitzky-Golay 2nd derivative Blend uniformity 0.85003 12.57


Moisture deviation 0.86956 10.56

Tablet data (Sum of squares of autoscaled C. of A. data = 0.97222)

Raw Drug substance content 0.85185 12.38


Content uniformity 0.89283 8.17
Moisture 0.81383 16.29

Content uniformity 0.9257 4.78


Moisture 0.92766 4.58
Disintegration 0.94034 3.279

Savitzky-Golay 2nd derivative Drug substance content 0.96839 0.39


Content uniformity 0.93631 3.69
Moisture 0.86472 11.06
Disintegration 0.88985 8.47
Thickness 0.95229 2.05

Table C8. Partial least squares regression modelling (PLSR1 algorithm) of
individual certificate of analysis (C. of A.) variables of higher strength blends and
tablets with their near infrared blend absorbance/pre-treated absorbance and
tablet combined absorbance/transmission or pre-treated absorbance/transmission
data (n = 36 observations).
NIR data set C. of A. variable modelled PLS components PRESS minimum^a % Sum of squares

Blend data (Sum of squares of autoscaled C. of A. data = 0.9750)

Absorbance Content uniformity

SNV detrend

Savitzky-Golay 2nd derivative

Tablet data (Sum of squares of autoscaled C. of A. data = 0.9750)

Raw Average weight 0.75624 22.44


Tablet thickness 0.58611 39.89

SNV detrend Average weight 0.84602 13.23


Tablet thickness 0.62250 36.15

Savitzky-Golay 2nd derivative Average weight 0.74988 23.09


Tablet thickness 0.58528 39.97

^a PRESS is the predicted residual error sum of squares between PLS model predicted
and certificate of analysis measured values after cross validation.
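
The "% Sum of squares" column in Tables C7 and C8 appears to follow from the PRESS minimum and the total sum of squares of the autoscaled C. of A. data quoted in each sub-heading. The sketch below assumes that relationship (it is inferred from the tabulated numbers, not stated in the tables), and the function name is hypothetical.

```python
def percent_ss_explained(press_min, total_ss):
    """Percentage of the total sum of squares accounted for after cross validation.

    Assumed relationship: 100 * (1 - PRESS / total sum of squares).
    """
    return 100.0 * (1.0 - press_min / total_ss)


# Table C7, blend absorbance data, blend uniformity:
# PRESS minimum 0.81764, total sum of squares 0.97222 (tabulated result 15.90).
blend_uniformity = percent_ss_explained(0.81764, 0.97222)

# Table C8, raw tablet data, tablet thickness:
# PRESS minimum 0.58611, total sum of squares 0.9750 (tabulated result 39.89).
tablet_thickness = percent_ss_explained(0.58611, 0.9750)

print(round(blend_uniformity, 2), round(tablet_thickness, 2))
```

Under this assumption a PRESS minimum close to the total sum of squares means the cross-validated model explains very little of the variation in that C. of A. variable.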

Table C9. Q statistics for singleblock PLS models of lower strength blend.
Q99 Q95 Q > Q99
1 0.1374
2 0.1612
3 0.2119
10 0.2066
19 0.2217
22 0.1681
23 0.1754
24 0.2106
25 0.1740
26 0.1313
30 0.1703
35 0.1076
39 0.1159

1 31.0732
2 58.4866
3 66.2464
9 112.4084
10 43.6827
25 63.3014
28 54.5769
31 27.4952
34 32.9563
35 26.6006
36 50.0634
39 23.5542

Savitzky-Golay 2nd derivative 14 480.1945
25 418.6779
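
For orientation, the Q statistic tabulated here is usually computed as the sum of squared spectral residuals left after a batch spectrum is reconstructed from the model. The sketch below assumes that usual definition and, for simplicity, orthonormal loading vectors and a plain projection (a PCA-style simplification; a PLS model would obtain its scores from weight vectors). All names are hypothetical.

```python
def q_statistic(spectrum, mean_spectrum, loadings):
    """Sum of squared residuals after projecting a spectrum onto the model loadings.

    loadings is a list of orthonormal loading vectors (assumption of this sketch).
    """
    centred = [x - m for x, m in zip(spectrum, mean_spectrum)]
    # Score of the batch on each component.
    scores = [sum(c * p for c, p in zip(centred, load)) for load in loadings]
    # Part of the centred spectrum described by the model.
    fitted = [
        sum(t * load[j] for t, load in zip(scores, loadings))
        for j in range(len(centred))
    ]
    return sum((c - f) ** 2 for c, f in zip(centred, fitted))


def batches_above_limit(q_values, q_limit):
    """Return the 1-based batch numbers whose Q exceeds the control limit."""
    return [i + 1 for i, q in enumerate(q_values) if q > q_limit]


# Tiny made-up example: one component along the first variable.
q = q_statistic([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [[1.0, 0.0, 0.0]])
print(q, batches_above_limit([0.5, q, 2.0], 1.0))
```

A batch appears in these tables when its Q value exceeds the stated limit, that is, when its spectrum contains variation the reference model does not describe.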

Table C10. Q statistics for singleblock PLS models of lower strength tablets (n = 39
batches).
Q99 Q95
1 9.9436
5 2.8456
6 2.9097
8 5.0869
9 1.7279
10 4.1349
11 3.9836
21 2.8280
32 2.3869
33 4.5181
36 4.7909

38.3869 5 84.7026
7 80.5027
10 112.7194
11 132.7731
21 57.7824
32 84.8975
36 131.0980
37 143.3551

Savitzky-Golay 2nd derivative 1 251.4722
5 327.7355
11 384.2397
21 266.5947
22 250.0825
26 526.7971
27 330.1475
28 246.5633
31 291.0530
36 386.5780
37 486.9569

Table C11. Q statistics for singleblock PLS models of higher strength blends (n =
41 batches).
Q99 Batch Q > Q99
18 0.3257
21 0.2699
33 0.1929
34 0.1732
39 0.2439

SNV detrend 4 13.4550
15 15.3647
18 67.7308
30 21.8702
33 28.9836
35 32.4789
36 23.6168
37 14.2792
38 69.1630
39 41.3193

Savitzky-Golay 2nd derivative 1 424.1561
2 413.2710
6 453.1932
9 257.3108
18 353.7704
21 260.2007
32 350.7126
35 328.6117
38 286.3250

Table C12. Q statistics for singleblock PLS models of higher strength tablets (n =
41 batches).

3 10.8397
8 19.3108
12 4.2183
14 24.4520
15 14.7481
21 13.2031
23 9.2130
28 6.9264
29 5.0530
30 6.7363
31 4.8932
35 12.1203
38 12.5183

2 119.7522
30 104.9829

Savitzky-Golay 2nd derivative 2 632.6778
12 396.8873
24 508.0848
27 364.4079
28 738.8445
35 368.9999
37 547.1033
39 823.7451
40 459.3705
41 671.8081

Table C13. Q statistic monitoring of multiblock PLS models of lower strength
blends (n = 39 batches).
Data Set Batch Q > Q99
5 0.1106
9 0.2374
18 0.1950
19 0.2284
21 0.1176
30 0.1502
36 0.3784

SNV detrend 1 37.2855
5 36.1332
9 36.8617
18 66.1289
36 44.3539

Savitzky-Golay 2nd derivative 299.6489 2 662.9428
3 444.4115
5 668.3952
10 397.1586
18 812.1618
25 587.9371
26 371.3142
36 319.7184

Table C14. Q statistic monitoring of multiblock PLS models of lower strength
tablets (n = 39 batches).
Data Set Q99 Batch Q > Q99
1 4.0697
4 1.2737
5 2.7040
6 4.0430
9 4.3076
10 2.1581
11 4.0817
21 2.2855
32 2.0614
33 3.7292
36 4.2138

5 141.7461
6 82.5636
7 74.1785
9 89.9190
11 284.0160
21 79.1252
36 213.1416
37 117.3819

Savitzky-Golay 2nd derivative 2 364.2942
3 402.7941
5 396.1163
10 601.3112
14 450.2222
25 722.4410
26 710.5008
36 391.8748
37 548.6745

Table C15. Q statistic monitoring of multiblock PLS models of higher strength
blends (n = 41 batches).
Data Set Q99 Batch Q > Q99
13 0.3501
18 0.4603
21 0.4079
34 0.3011
36 0.4371
37 0.2378
38 0.2379
39 0.2998

8 23.4291
14 26.4575
18 57.6511
28 20.9336
29 38.7516
30 23.9276
38 55.3172
39 34.4153

Savitzky-Golay 2nd derivative 1 829.7140
2 531.1372
6 546.0073
8 453.8431
15 359.1001
18 397.5697
21 364.2394
27 368.0456
28 402.0031
29 445.5624
30 523.9487
31 336.3478
32 445.7456
33 633.3479
35 717.5016
38 597.7105
39 369.1461

Table C16. Q statistic monitoring of multiblock PLS models of higher strength
tablets (n = 41 batches).
Q > Q99
13 20.3496
14 42.0068
15 50.0628
28 34.1517
34 8.2974
36 13.3758
37 27.2729
38 27.9747
39 7.6078
41 9.4235

28 293.69
30 106.02
38 1507.04
39 119.99

Savitzky-Golay 2nd derivative 353.2041 276.5355 1 447.87
2 729.30
8 356.96
15 373.96
23 1266.44
25 657.69
27 950.02
28 2316.70
29 873.21
30 601.94
31 420.30
32 411.18
33 434.62
35 673.49
38 9760.45
39 2527.45

Table C17. MSPC of single block PLS models of lower strength blends (n = 39
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 39 6 24.1641 - - - 13.4774 17 231.4818
24 18.2005
27 17.6282
32 65.7944

SNV detrend 17 6 46.8408 51.4504 10 6 13.4774 4 18.6013
162.3735 12 3, 5 10 17.5367
166.7626 13 2, 3, 4, 5 25 46.7734
178.8413 15 3 ,5 ,4
127.4049 16 3 ,5 ,4
96.9685 18 4, 2 ,6
64.8746 22 3
129.6379 25 3 ,2
146.3403 26 6, 3, 2, 4
128.8867 27 3, 2, 5, 4
87.1072 28 5 ,6
70.4428 29 3 ,2
56.2280 33 4, 3 ,6
67.2597 35 6, 3 ,4

Savitzky-Golay 2nd derivative 29 6 28.0336 35.3602 2 2, 4 13.4774 3 43.9022
50.5780 5 2 4 139.4466
32.7429 13 3 7 75.1900
80.9381 18 2 ,4 9 74.8522
18 71.0997
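
The T² control limits in the fourth column of this table appear consistent with the standard F-distribution limit for Hotelling's T². The expression below is a sketch under that assumption, taking n as the number of PLS components and m as the number of control batches (meanings inferred from the column heading, not stated in the table):

```latex
T^2_{n,\,m-n,\,99\%} \;=\; \frac{n\,(m^2-1)}{m\,(m-n)}\; F_{0.99}(n,\; m-n)

% Worked check for the SNV detrend row (m = 17 control batches, n = 6 components):
% n(m^2-1) / (m(m-n)) = 6 \times 288 / (17 \times 11) = 1728/187 \approx 9.241
% F_{0.99}(6, 11) \approx 5.07 (standard tables)
% T^2 limit \approx 9.241 \times 5.07 \approx 46.85, close to the tabulated 46.8408
```

Because the limit depends on the number of control batches, data sets with fewer retained control batches carry a noticeably larger T² limit.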

Table C18. MSPC of single block PLS models of higher strength blends (n = 41
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 36 6 24.9863 - - - 13.4774 9 55.2053
23 13.9840
40 224.0407

SNV detrend 31 6 26.9568 56.2801 13 1, 4, 3, 2, 6 13.4774 9 27.9771
39.7230 14 5, 3 19 34.9401
42.3505 18 5 23 109.5630
34.5354 19 5, 3, 1
35.6836 23 3
36.5784 24 3
39.4643 25 3
39.7649 29 3 ,5

Savitzky-Golay 2nd derivative 32 6 26.0762 30.0860 13 2 13.4774 7 42.7164
41.4962 28 2, 6 9 42.5453
30.7203 29 3, 6, 2 13 24.2530
41.9554 38 6 ,2 23 18.1693

Table C19. MSPC of singleblock PLS models of lower strength tablets (n = 39
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 28 6 28.6609 32.5607 2 1, 3 13.4774 10 28.2784
33.4329 3 3, 1 11 14.9977
33.1637 8 1 ,5 30 19.5014
56.0507 33 1 ,3 ,5 37 23.1500
97.4051 34 1 ,3 ,5
70.4464 35 1 ,2 ,3

SNV detrend 25 6 31.0476 67.4030 1 1, 6, 5 13.4774 26 17.8473
36.6776 2 4 36 58.3499
42.5664 19 1 ,6
39.5029 20 1
120.2734 25 4, 1
62.9724 26 4
78.8773 27 4, 1
116.3010 33 1 ,2
136.8727 34 1 ,2 ,4
172.8398 35 1 ,2 ,4

Savitzky-Golay 2nd derivative 17 6 46.8408 85.6728 6 5, 2 13.4774 1 62.0617
47.5518 8 2, 4 4 41.0105
51.3959 10 6, 5 11 37.1803
74.8446 14 5, 2, 4
51.6367 15 5, 2
107.0301 33 1 ,5 ,3
118.1172 34 1 ,3 ,6
122.4901 35 1

Table C20. MSPC of single block PLS models of higher strength tablets (n = 41
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 34 6 25.7327 37.6401 1 2 13.4774 2 20.6533
43.3983 9 3, 2 4 107.2205
45.3390 13 2 ,3 39 20.1148
37.7641 19 3 ,2
31.3083 38 3, 5 ,6

SNV detrend 19 6 40.4016 136.4086 1 1, 4, 3 13.4774 3 16.0606
48.8874 3 1 4 41.1726
66.3386 8 1 ,3 ,5 15 143.9220
109.6429 9 3. 1 ,4 35 30.0806
179.9242 13 1 ,4
172.5391 14 4, 1 ,3
87.2163 15 4, 2 ,5
85.5137 19 3, 1 ,4
77.0827 21 3
48.3603 23 6 ,5
93.3001 24 1
56.8649 34 1 ,2
255.1997 38 4, 6 , 2
88.8304 39 6, 1 ,4

Savitzky-Golay 2nd derivative 23 6 33.2356 79.5435 23 5 13.4774 2 22.7688
54.7429 34 3, 4, 1, 5 13 290.4568
270.5364 38 1, 3, 2, 4 15 36.1492
66.2157 39 2, 1

Table C21. MSPC of multiblock PLS models of lower strength blends (n = 39
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 39 6 24.1641 - - - 13.4774 17 31.0109
22 24.0181

SNV detrend 19 6 40.4016 336.9790 1 1, 2 13.4774 10 14.8050
454.4193 2 1, 2 12 41.5443
478.3653 3 1, 2 18 34.5997
536.2881 4 1, 2
441.1569 5 1, 2
490.5406 6 1, 2
511.1302 7 1, 2
392.6983 8 1 ,2
327.7269 9 1, 2
415.2102 10 1 ,4 .2
384.9173 11 1, 2
641.8274 12 1 ,5
722.5097 13 1 ,5 ,2
446.3537 14 1 ,2
713.7106 15 1 ,5
637.7805 16 1, 5
519.8125 17 2, 1 ,5
573.8687 18 1 ,2 , 5

Savitzky-Golay 2nd derivative 28 6 28.6609 55.7324 4 2, 3, 5 13.4774 4 240.0300
145.1090 6 3, 2, 4, 6 6 24.4122
75.2717 7 3, 2, 4 9 48.2270
30.7492 31 5, 3
93.7048 33 1, 2, 5
108.5247 34 1 ,2 , 3 ,6
101.7803 35 1 ,2 ,5

Table C22. MSPC of multiblock PLS models of higher strength blends (n = 41
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 25 6 31.0476 98.8444 21 1, 3, 4 13.4774 9 33.8573
80.5131 26 1, 3, 4 40 205.7495
117.2681 27 1 ,3 ,4
69.7281 30 1 ,3
66.5197 31 1
161.0069 32 1 ,4 ,3
68.6803 33 1
103.4204 34 1
69.5886 35 1 ,5
114.5315 36 1
108.2342 37 1
59.8452 39 1 ,3
80.2606 40 1 ,4
60.3487 41 1 ,3 ,5

SNV detrend 35 6 25.3531 27.2447 9 6, 1, 3 13.4774 9 20.4434
47.7170 13 6, 2, 1 19 55.4270
23 25.7298
38 50.0795

Savitzky-Golay 2nd derivative 28 6 28.66 31.9045 13 3, 2, 1 13.4774 7 21.0214
78.6975 14 4, 3 9 25.8174
43.3303 19 4, 3 13 21.4501
99.0945 28 6, 3, 4, 1 23 160.3362
80.8732 29 6, 3, 4
71.2137 38 3, 6, 1

Table C23. MSPC of multiblock PLS models of lower strength tablets (n = 39
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 27 6 29.3665 56.1923 33 1, 5, 2, 3 13.4774 10 18.4722
93.9374 34 1, 5, 2, 3 11 26.2823
98.3632 35 1 ,2 37 29.5160
32.9918 39 1 ,4 ,5

SNV detrend 29 6 28.0336 33.3677 25 5, 4, 2 13.4774 26 24.2920
79.7102 33 1, 2 36 74.5298
104.7714 34 1 ,2 ,6
117.9409 35 1 ,2 ,6

Savitzky-Golay 2nd derivative 25 6 31.0476 39.3699 3 4, 3, 2 13.4774 1 16.3947
56.3101 25 3, 1, 4, 2, 6 4 13.8552
32.2647 26 3, 6 7 15.1962
102.7039 33 1, 4, 6
109.6028 34 1 ,5 ,2
132.3347 35 1 ,2

Table C24. MSPC of multiblock PLS models of higher strength tablets (n = 41
batches).
Data set | Control batches | PLS model rank | T²(n, m-n, 99%) | T² > T²(n, m-n, 99%) | Batch | Implicated components | Anderson's normal approx. (99.99% limit) | Batch | Anderson's normal approx.
Raw data 40 6 23.9074 - - - 13.4774 2 99.4951
4 25.4433
32 21.5128

SNV detrend 24 6 32.0642 88.5502 1 1, 4 13.4774 4 42.3555
32.2035 8 6, 1, 5 7 14.6677
81.4838 9 4. 3, 2, 1 ,6 15 108.0377
110.2670 13 1 ,2 ,4 35 28.6275
64.2434 14 1 ,6 ,2
67.3319 15 1 ,2 , 6 ,5
82.7787 19 1 ,6 ,2 , 3
42.9276 21 6, 4, 2, 3
35.1013 23 5, 2, 1
66.7054 34 2, 3 ,5
108.2507 38 5, 4 , 6
48.6369 39 5, 6, 1

Savitzky-Golay 2nd derivative 21 6 36.1890 62.2460 1 1, 6, 3 13.4774 4 30.13
111.8127 8 1, 6, 4, 5 13 1144.13
91.5308 9 4, 1, 3 15 23.93
194.0301 13 1, 4, 6, 5 38 14.23
123.3179 16 1 ,3 ,4
123.3179 17 1 ,3 ,4
123.3179 18 1 ,3 ,4
91.4559 19 1 ,3 ,4
85.7615 21 4, 1 ,6
51.9810 34 4, 3 ,2
64.5257 38 4, 5 ,3

Measurement of the cumulative particle size distribution of
microcrystalline cellulose using near infrared reflectance
spectroscopy

Andrew J. O'Neil,* Roger D. Jee and Anthony C. Moffat

Centre for Pharmaceutical Analysis, The School of Pharmacy, University of London, 29-39
Brunswick Square, London, UK WC1N 1AX

Received 14th September 1998, Accepted 9th November 1998

The cumulative particle size distribution of microcrystalline cellulose, a widely used pharmaceutical excipient, was
determined using near infrared (NIR) reflectance spectroscopy. Forward angle laser light scattering measurements
were used to provide reference particle size values corresponding to different quantiles and then used to calibrate
the NIR data. Two different chemometric methods, three wavelength multiple linear regression and principal
components regression (three components), were compared. For each method, calibration equations were produced
at each of eleven quantiles (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95%). NIR predicted cumulative frequency
particle-size distributions were calculated for each of the calibration samples (n = 34) and for an independent test
set (n = 23). The NIR procedure was able to predict those obtained via forward angle laser light scattering.

Measurement of the particle size distribution of powdered
pharmaceutical raw materials is an important task that must be
performed prior to manufacturing processes. This is because the
distribution determines physical properties such as powder flow
(Hausner ratio), dissolution rate and compressibility.¹ With
microcrystalline cellulose, a range of different grades are
commercially available, each with different physico-chemical
properties. These grades are classified by their median particle
size and by their cumulative particle size distribution. Each
grade should have a nominal median particle size and a
cumulative particle size distribution which agrees with the
range set by a pharmacopoeial monograph.

Typically, particle size analysis of this material is by forward
angle laser light scattering (FALLS) or sieve analysis.
However, a drawback with these methods is that the analysis is
time consuming and sample destructive. Recently, a near
infrared (NIR) spectroscopic method of analysis has been
described which is capable of measuring the median particle
size using a two wavelength multiple linear regression of NIR
reflectance, R, and FALLS data.⁶

The best calibration results were obtained using the logarithm
of the FALLS median particle size versus reflectance data.⁶
Taking this method a step further, it should be possible to
produce calibrations for particle size quantiles other than just
50% (median size), for example, 5, 10, 20, 30, 40, 50, 60, 70, 80,
90 and 95%. These calibrations would be produced in the same
manner as described in our previous paper,⁶ and thus permit the
measurement of a sample's cumulative frequency particle-size
distribution from its NIR spectrum. A manufacturer should then
have sufficient information to assess the likely physico-chemical
properties of the sample.

The aim of this work was to measure the cumulative
percentage frequency particle-size distribution of microcrystalline
cellulose. Two chemometric methods of calibration were
compared: three wavelength multiple linear regression (MLR)
and principal components regression (PCR) using three
principal components, each model using log (FALLS particle
size) values and NIR reflectance data. The robustness of the
calibrations was assessed using an independent validation set.

Experimental

Instrumentation

NIR measurements were made using an FT-NIR NIRVIS
spectrometer (Model 100.1, Buhler, Uzwil, Switzerland) fitted
with a Buhler fibre-optic probe (Model 110.2). Reflectance
spectra were recorded over the range 4008-9996 cm⁻¹ (500 data
points), each spectrum being the average of six scans. Particle
size data were acquired by forward angle laser light scattering
using a Malvern 2600C particle sizer (Malvern Instruments,
Malvern, UK). Sieve fractions were produced using a machine
sieve (Endecotts, London, UK).

Materials

The grades of microcrystalline cellulose used were Avicel
PH101 (16 batches), Avicel PH102 (19 batches) and Avicel
PH200 (single batch), all from FMC International (Wallingstown,
Little Island, Co. Cork, Ireland).

Sample preparation and presentation

Sieve fractions from single batches of Avicel PH101, Avicel
PH102 and Avicel PH200 were produced by machine sieving
using a nest of progressively finer stainless steel wire mesh
sieves (150, 90, 63, 45, 38 and 32 µm). In addition, material
falling through the 32 µm sieve was collected.

The sieved fractions and samples of all the original Avicel
batches were particle sized by FALLS. Each sample was
suspended in cold, distilled water with surfactant (dilute
household detergent) prior to particle sizing and was gently
shaken using a vortex mixer to prevent the formation of
agglomerates. NIR diffuse reflectance measurements were
made on the samples of sieved and bulk materials in narrow
disposable glass vials to permit a consistent compaction
pressure. Sieved and bulk materials were scanned on different
days over the course of several weeks.
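
As a small illustration of the spectral range quoted under Instrumentation (4008-9996 cm⁻¹, 500 data points), the sketch below builds the wavenumber axis under the assumption that the points are evenly spaced. The function name is hypothetical; even spacing is an assumption, although it gives a 12 cm⁻¹ step on which the wavenumbers reported later for the MLR calibrations all fall.

```python
def wavenumber_axis(start=4008.0, stop=9996.0, points=500):
    """Evenly spaced wavenumber axis in cm^-1 (even spacing is assumed)."""
    step = (stop - start) / (points - 1)
    return [start + i * step for i in range(points)]


axis = wavenumber_axis()
step = axis[1] - axis[0]

# Every other point, as used later to shorten the wavelength search.
reduced_axis = axis[::2]

print(step, len(axis), len(reduced_axis))
```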

Analyst, 1999, 124, 33-36


Data analysis

Data were processed using in-house computer programs written
in C and in Matlab 5 Scientific and Technical Programming
language (Mathworks, Natick, MA, USA). The MLR program
was based on the routine svdfit, available in the literature.⁸
Programs were run on an Acer Pentium II 333 MHz machine.

Results and discussion

Preliminary investigation

The results of previous work⁶ have shown that useful calibrations
for median particle size can be obtained by using NIR
reflectance data with a logarithmic transform of the FALLS
particle-size data, hence these data were used in this work.

With MLR calibrations, preliminary work showed that a
three wavelength linear regression at any of the FALLS
quantiles produced calibrations more robust than a two
wavelength fit. It was therefore decided that three wavelength
MLR calibrations would be employed subsequently. With PCR
models, three principal components were required to obtain
satisfactory calibrations and this was used for all subsequent
calibrations.

Spectral characteristics

Each powdered sample exhibited an NIR reflectance spectrum
with a curved baseline resulting from multiple scattering (Fig.
1). Across the spectrum of each sample, the apparent offset
appears to increase and this has previously been attributed to
variations in pathlength, which in turn is dependent on particle
size and sample porosity.

Fig. 1 NIR spectra of microcrystalline cellulose samples with different
particle-size distributions and median particle sizes, (a) 24, (b) 45.8, (c)
93.4, (d) 261 and (e) 406 µm. [Plot against Wavenumber/cm⁻¹, 4500-9500.]

Model generation

The FALLS instrument gives values of the cumulative percentage
frequency particle-size distribution at 64 particle sizes
(range 564-5.8 µm) at intervals which follow a geometric
progression. For each sample, linear interpolation of the
measured FALLS values was used to calculate the particle size
values corresponding to the 5, 10, 20, 30, 40, 50, 60, 70, 80, 90
and 95% quantiles. The samples exhibited a wide range of
particle sizes at each quantile (Table 1) and a wide variety of
distributional shapes. Of the 57 samples available, 34 were
chosen at random for the calibration set; the remaining 23
samples were used as an independent validation set. To aid
comparison of the two calibration methods, the same calibration
and validation data were used for each method.

Table 1 Particle size ranges at each quantile for the calibration and
validation sets as determined by FALLS

                  Particle size/µm
                  Calibration set (n = 34)        Validation set (n = 23)
Quantile (%)      Minimum  Median   Maximum       Minimum  Median   Maximum
 5                 6.45     25.72   216.52          7.21    23.06   167.11
10                 9.92     37.14   268.91         11.44    32.34   187.67
20                14.48     52.92   311.96         18.05    45.40   219.33
30                18.39     67.10   345.62         22.55    56.36   251.13
40                21.40     81.41   376.44         26.27    67.35   283.66
50                23.99     96.59   406.07         2&82     78.98   319.67
60                26.47    112.82   436.21         33.71    91.57   359.67
70                29.25    131.29   466.55         38.30   105.95   402.81
80                33.11    154.78   497.51         44.94   124.03   451.54
90                40.62    197.11   529.54         57.16   152.54   504.66
95                48.47    240.07   546.76         70.34   184.74   533.21

Multiple linear regression. Data from the calibration
samples were used to generate calibration equations for each
quantile by fitting the log d_x values to the NIR reflectance values
according to the equation

$\log d_x = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} + b_3 R_{\lambda_3}$ (1)

where d is the FALLS interpolated particle size at quantile x, R
is the reflectance at wavelength λ and b are the MLR
coefficients. The selection of wavelengths was performed on a
reduced data set of every other wavelength to reduce the
computation time required. A full three wavelength search for
each particle size quantile calibrated therefore used 250 of the
500 available wavelengths. This reduced the total computation
time for all 11 calibrations to about 10 h, compared with an
estimated 80 h if all 500 wavelengths had been searched.

For each calibration equation, the three chosen wavenumbers
(Table 2) were those which gave the smallest standard error of
calibration (SEC). The optimum wavenumbers were similar
for the 30-60% quantiles, but varied for the extreme quantiles.
The calibration equations were then used to predict the
validation set (n = 23) to give an indication of the robustness of
the method (Table 3).

Principal components regression. This calibration method
required the generation of a principal components analysis
(PCA) model. This consists of a set of new variables which are
uncorrelated and represent linear combinations of the original
NIR reflectance data.

Table 2 MLR wavelengths and PCs selected for each percentage quantile
calibration

Percentage     MLR wavenumber/cm⁻¹     PCs
 5             4008  9300  9528        28  22  17
10             4008  9300  9528        29  27  14
20             5640  5676  6216        29  27  14
30             4464  9852  9864        29  28  27
40             5736  9852  9864        27  15   1
50             5736  9852  9864        20  15   1
60             5496  9852  9864        20  15   1
70             6024  6948  9168        15  14   1
80             5664  5796  9432        23   9   3
90             5952  6996  8280        28  18   6
95             7632  8532  8664        28  18   6
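
The exhaustive three-wavenumber search described for the MLR calibrations can be sketched as follows. This is a minimal illustration, not the authors' C or Matlab code: it fits eqn (1) by ordinary least squares through the normal equations rather than by the svdfit routine, it assumes the SEC is the residual standard deviation with n - 4 degrees of freedom, and every name and all of the synthetic data are hypothetical.

```python
from itertools import combinations
from math import comb, sin, sqrt


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sec_for(spectra, log_d, cols):
    """SEC of the fit log d = b0 + sum(b_k * R_k) over the chosen columns."""
    rows = [[1.0] + [s[c] for c in cols] for s in spectra]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_d)) for i in range(p)]
    try:
        b = solve(xtx, xty)
    except ZeroDivisionError:  # exactly collinear columns: reject this triple
        return float("inf")
    ss = sum((y - sum(bi * ri for bi, ri in zip(b, r))) ** 2
             for r, y in zip(rows, log_d))
    return sqrt(ss / (len(rows) - p))  # assumed degrees of freedom: n - 4


def best_triple(spectra, log_d, candidates):
    """Return the three column indices giving the smallest SEC."""
    return min(combinations(candidates, 3),
               key=lambda cols: sec_for(spectra, log_d, cols))


# Synthetic data: 12 "samples" by 5 "wavenumbers"; log d depends on columns 0, 2 and 4.
spectra = [[sin(0.7 * (i + 1) * (j + 1)) for j in range(5)] for i in range(12)]
log_d = [1.0 + 2.0 * s[0] - 3.0 * s[2] + 0.5 * s[4] for s in spectra]
chosen = best_triple(spectra, log_d, range(5))

# Number of triples to test with all 500 points relative to every other point (250).
search_ratio = comb(500, 3) / comb(250, 3)
print(chosen, round(search_ratio, 2))
```

The ratio of the two search sizes is roughly eight, which is in line with the reported 10 h against an estimated 80 h for the full search.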



The PCA model, X, was obtained as the product of a score
matrix, T, with a loadings matrix, U, plus a residuals matrix,
E:

$X = TU + E$ (2)

where X represents the original spectral data. Principal
components (PCs) are arranged such that the first represents the
variable describing the largest amount of variance in the data
set, the next represents the largest residual variance, and so on
until all PCs are extracted. Regression of FALLS data was as
described above for MLR, except that PC scores were used in
place of reflectance values. For each calibration, the three PCs
selected were those which gave the highest correlations with the
FALLS data (Table 2). The total time required to compute PCs
and PCR calibration equations was much shorter than for MLR,
requiring only about 20 min.

Calibration and validation precision

With both methods, individual calibrations were the most
precise at the 40% and 50% quantiles (Table 3). This is clearly
seen from the plot of SEC versus percentage quantile (Fig. 2).
The decrease in the precision of individual calibrations at the
extreme quantiles probably reflects the shape of the distribution
curves for the particle sizes in the calibration sets. The shapes of
the distributions become more skewed at the extreme quantiles.

With both MLR and PCR, excellent calibration results were
obtained, with low SECs (Table 3). The SECs at each quantile
are smaller with MLR; however, the standard errors of
prediction (SEPs) for the independent validation set are smaller
with the PCR model (Table 3). This suggests that the PCR
model is more robust. Table 3 also gives the slopes and
intercepts for the plots of NIR predicted log d_x versus FALLS
measured log d_x values at each quantile. The slopes and
intercepts were not significantly (5% probability level) different
from 1 and 0, respectively, apart from a few values (indicated
with superscript b) which occurred at some of the extreme
quantiles.

Fig. 2 Standard errors of calibration (SEC) versus cumulative percentage
quantile: (A) MLR; and (B) PCR.

Cumulative particle size distributions

The percentage quantile value was plotted against the NIR
predicted log d_x of each sample in the calibration and validation
sets to give cumulative particle-size distribution curves for both
the MLR and PCR methods. The MLR and PCR results for the
first four validation samples are shown in Fig. 3, which also
shows the FALLS measured cumulative percentage frequency
distributions overlaid. Predicted distributions for both calibration
methods closely follow those obtained by FALLS, although
PCR predicted distributions match the FALLS measured
distributions more closely than with MLR.

In this work, the number of quantiles at which calibration
equations were set up was restricted to 11. In principle, more or
less could be used. With the present data sets the errors do not
justify the need for smaller intervals (Table 3).

Table 3 MLR and PCR calibration and validation results at various percentage quantiles

                       Percentage
Parameterᵃ             5       10      20      30      40      50      60      70      80      90      95

MLR calibration set (n = 34)
R                      0.977   0.980   0.984   0.987   0.989   0.988   0.984   0.979   0.972   0.932   0.889
m                      0.954   0.960   0.968   0.974   0.978   0.975   0.968   0.958   0.945   0.869ᵇ  0.791ᵇ
c                      0.065   0.063   0.055   0.048   0.042   0.049   0.065   0.088   0.121   0.301ᵇ  0.497ᵇ
SEC [log(d_x/µm)]      0.084   0.071   0.055   0.046   0.039   0.039   0.042   0.046   0.052   0.080   0.104
RSD (%)                19.3    16.3    12.7    10.6    9.0     9.0     9.7     10.6    12.0    18.4    23.9

Validation set (n = 23)
R                      0.951   0.951   0.965   0.971   0.959   0.955   0.950   0.959   0.964   0.897   0.822
m                      0.876   0.950   0.977   0.978   0.984   0.973   0.943   0.986   0.980   0.921   0.734ᵇ
c                      0.140   0.064   0.046   0.021   0.013   0.040   0.108   0.050   0.057   0.216   0.657ᵇ
SEP [log(d_x/µm)]      0.131   0.109   0.074   0.066   0.074   0.073   0.073   0.070   0.061   0.106   0.132
RSD (%)                30.1    25.1    17.0    15.2    17.0    16.8    16.8    16.1    14.0    24.4    30.4

PCR calibration set (n = 34)
R                      0.969   0.973   0.978   0.980   0.981   0.981   0.976   0.968   0.959   0.898   0.858
m                      0.939   0.946   0.956   0.960   0.963   0.961   0.953   0.937   0.921   0.806   0.737
c                      0.086   0.084   0.076   0.074   0.070   0.077   0.096   0.133   0.174   0.445ᵇ  0.627ᵇ
SEC [log(d_x/µm)]      0.096   0.082   0.065   0.057   0.051   0.049   0.051   0.057   0.062   0.097   0.116
RSD (%)                22.1    18.9    15.0    13.1    11.7    11.3    11.7    13.1    14.3    22.3    26.7

Validation set (n = 23)
R                      0.980   0.981   0.981   0.978   0.984   0.981   0.969   0.965   0.953   0.924   0.842
m                      1.128ᵇ  1.045   0.998   0.969   0.967   0.970   0.977   0.975   0.946   0.932   0.861
c                      -0.16ᵇ  -0.071  0.005   0.053   0.051   0.042   0.024   0.034   0.079   0.092   0.258
SEP [log(d_x/µm)]      0.085   0.062   0.051   0.050   0.041   0.045   0.056   0.057   0.071   0.094   0.124
RSD (%)                19.6    14.3    11.7    11.5    9.4     10.4    12.9    13.1    16.3    21.6    28.5

ᵃ R is multiple correlation coefficient, m and c are slope and intercept of plots of NIR predicted log d_x versus FALLS measured log d_x; n is the number of
samples in each data set. ᵇ m significantly different from 1, or c significantly different from 0.

Conclusion

NIR spectroscopy may be used to measure the cumulative
percentage frequency particle-size distribution of powdered
microcrystalline cellulose. This represents a development over
previous studies which have focused on measurement of only
the median or mean particle size.⁶,¹⁰⁻¹² Although setting up the
calibration equations is time consuming, once generated they
allow the rapid determination of the cumulative frequency
distribution of subsequent samples. Both MLR and PCR
provide excellent results; however, the PCR method is
computationally faster and slightly more robust. The method should be
applicable to other powdered pharmaceutical materials.

The authors are grateful to Buhler for the loan of the NIR
instrument and to Mathworks for providing Matlab 5 software.
They thank P. A. Hailey, Pfizer, Sandwich, UK and FMC
International for advice and providing samples of pharmaceutical
excipients and A. J. O'Neil thanks Pfizer for a research
grant. Kevin Taylor and Keith Barnes, The School of Pharmacy,
University of London, are thanked for assistance with forward
angle laser light scattering and sieving.
Fig. 3 Cumulative percentage frequency particle-size distributions for the
first four validation samples with FALLS measured values overlaid. Sample
1: (A) MLR and (B) PCR. Sample 2: (C) MLR and (D) PCR. Sample 3: (E)
MLR and (F) PCR. Sample 4: (G) MLR and (H) PCR. [Plotted against Particle size/µm.]

References
1 C. Washington, Particle Size Analysis in Pharmaceutics and Other
Industries, Ellis Horwood, New York, 1992.
2 Handbook of Pharmaceutical Excipients, ed. A. Wade and P. J.
Weller, American Pharmaceutical Association, Washington, DC and
Pharmaceutical Press, London, 2nd edn., 1994.
3 British Pharmacopoeia 1993, H.M. Stationery Office, London, 1993,
vol. 1.
4 M. E. Aulton, Pharmaceutics: the Science of Dosage Form Design,
Churchill Livingstone, Edinburgh, 1988.
5 P. A. Hailey, P. Doherty, P. Tapsell, T. Oliver and P. K. Aldridge, J.
Pharm. Biomed. Anal., 1996, 14, 551.
6 A. J. O'Neil, R. D. Jee and A. C. Moffat, Analyst, submitted for
publication.
7 B. G. Osborne, T. Fearn and P. H. Hindle, Practical NIR
Spectroscopy with Applications in Food and Beverage Analysis,
Longman, Harlow, 2nd edn., 1993.
8 W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery,
Numerical Recipes in C. The Art of Scientific Computing, Cambridge
University Press, Cambridge, 2nd edn., 1992.
9 H. Mark, Principles and Practice of Spectroscopic Calibration, J.
Wiley, New York, 1991.
10 J. L. Ilari, H. Martens and T. Isaksson, Appl. Spectrosc., 1988, 42,
722.
11 E. Ciurczak, P. Torlini and P. Demkowicz, Spectroscopy, 1986, 1,
36.
12 P. Frake, C. N. Luscombe, D. R. Rudd, J. Waterhouse and U. A.
Jayasooriya, Anal. Commun., 1998, 35, 133.

Paper 8/07134I

36 Analyst, 1999, 124, 33-36


The application of multiple linear regression to the
measurement of the median particle size of drugs and
pharmaceutical excipients by near-infrared spectroscopy

Andrew J. O’Neil,* Roger D. Jee and Anthony C. Moffat

Centre for Pharmaceutical Analysis, The School of Pharmacy, University of London, 29/39
Brunswick Square, London, UK WC1N 1AX

Received 30th July 1998, Accepted 14th September 1998

A number of powdered drugs and pharmaceutical excipients were used to demonstrate the ability of near-infrared
spectroscopy to measure median particle size (d50). Sieved fractions and bulk samples of aspirin, anhydrous
caffeine, paracetamol, lactose monohydrate and microcrystalline cellulose were particle sized by forward angle
laser light scattering (FALLS) and scanned by fibre-optic probe FT-NIR spectroscopy. Two-wavenumber multiple
linear regression (MLR) calibrations were produced using NIR reflectance, absorbance and Kubelka-Munk
function data with each of median particle size, reciprocal median particle size and the logarithm of median
particle size. Best calibrations were obtained using reflectance data versus the logarithm of median particle size
(NIR predicted ln d50 versus ln(FALLS d50) for microcrystalline cellulose and lactose monohydrate sieve fraction
calibrations: r = 0.99 in each case). Working calibrations for lactose monohydrate (median particle size range:
19.2-183 µm) and microcrystalline cellulose (median particle size range: 24-406 µm) were set up using
combinations of machine sieve-fractions and bulk samples. This approach was found to produce more robust
calibrations than just the use of sieved fractions. The method has been compared with single wavenumber
quadratic least squares regression using reflectance and mean-corrected reflectance data with median particle size.
Correlation between NIR predicted and FALLS values was significantly better using the MLR method.

The measurement of particle size for pharmaceutical materials is important [1,2] because it
influences bulk physical properties [3] and determines the ability of powders to flow, mix,
granulate and dissolve. It is also often a requirement in pharmaceutical manufacturing processes
that particle size measurements are performed on raw materials [4]. Commonly employed methods of
measurement are forward angle laser light scattering (FALLS) and electrical zone sensing [5]. A
disadvantage with these methods is that samples generally need to be analysed away from the
production area, which is time consuming and leads to manufacturing delays. These problems can be
effectively overcome by the use of near-infrared spectroscopy (NIRS) [6]. This technique has the
advantages that in addition to providing physical information about the raw material, such as its
particle size [7], it can simultaneously provide useful chemical information [8-19], and the
analysis can be performed in a few minutes in the pharmaceutical warehouse using fibre-optic
probe instruments.

With diffusely reflecting materials, such as pharmaceutical powders, scattering of light in the
NIR region (4000 to 10 000 cm⁻¹) produces spectra with non-uniform baselines and varying offsets.
These scatter effects vary with the particle size, sample porosity (and hence compaction
pressure) and with the wavelength [20] and can be described using Rayleigh and Mie theory [21],
or alternatively using the Kubelka-Munk theory of diffuse reflectance [21].

Previous studies that have examined the effects of particle size on NIR spectra have demonstrated
that reflectance varies non-linearly with particle size [22-25]. Ciurczak et al. [25] found that
reflectance exhibited an inverse relationship with mean particle size in agreement with Mie
theory [21]. However, this relationship does not necessarily apply in all cases and is dependent
on the shape of the particle size distribution of the sample [21], the particle shape and the
material's refractive index. The presence of very small particles will further complicate the
relationship as these may exhibit Rayleigh scatter, which is proportional to the third power of
the particle size.

The aim of this study has been to measure the number median particle size, d50, in drugs and
pharmaceutical excipients by NIR spectroscopy using relatively simple chemometrics. Single
wavenumber quadratic least squares regression of NIR and particle size data has been compared
with a two-wavenumber multiple linear regression (MLR) of the same data [26,27]. The effect of
different pre-treatments of NIR data and FALLS data on calibration has also been investigated.

Experimental

Instrumentation

NIR measurements were made using a FT-NIR NIRVIS (No. 100.1, Buhler AG, Uzwil, Switzerland)
spectrometer fitted with a fibre-optic probe (No. 110.2). Spectra were recorded over the range
4008 to 9996 cm⁻¹ (500 data points), each spectrum being the average of six scans. Particle size
data were acquired by FALLS using a Malvern 2600C particle-sizer (Malvern Instruments Ltd,
Malvern, UK). A Philips XL20 scanning electron microscope (Philips Electron Optics, Eindhoven,
Netherlands) was used to determine particle shapes. Sieve fractions were produced using a machine
sieve (Endecotts Ltd, London, UK) and an air-jet sieve (Alpine, Augsburg, Germany).

Materials

Single batches of aspirin and anhydrous caffeine (Sigma Chemical Co, St Louis, MO, USA) and
paracetamol (Boots Pharmaceuticals, Nottingham, UK) were used. Microcrystalline cellulose: Avicel
PH101 (16 batches), Avicel PH102 (19

Analyst, 1998, 123, 2297-2302 2297


batches) and Avicel PH200 (single batch) were all from FMC International (Wallingstown, Little
Island, Co Cork, Ireland). The batches of lactose monohydrate used were a single batch of a
reagent grade material (Avocado Research Chemicals Ltd, Heysham, UK), 9 samples of 110 mesh
obtained from two manufacturers (DMV International, Veghel, Netherlands and Lactose New Zealand,
Hawera, New Zealand) and 18 samples of Fast-Flo (Foremost Ingredients Group, Baraboo, WI, USA).

Sample preparation and presentation

A range of aspirin samples of different particle size distributions were obtained by grinding the
coarse bulk material with a mortar and pestle. Samples were taken successively with every few
minutes of grinding. The ground aspirin samples were air-jet sieved to remove fines.

Air-jet sieve fractions of the aspirin, anhydrous caffeine and paracetamol were produced using
stainless steel wire mesh sieves of different sieve diameter (75, 56, 50, 40 and 36 µm). With use
of the smallest air-jet sieve, an additional sample was collected from the sieve filter paper.

Sieve fractions of a single batch of Avicel PH101, Avicel PH102, Avicel PH200 and the reagent
grade lactose monohydrate were produced by machine sieving using a nest of progressively finer
stainless steel wire mesh sieves (150, 90, 63, 45, 38 and 32 µm). In addition, material falling
through the 32 µm sieve was collected.

These size fractions and the remaining batches of bulk lactose 110 mesh, Fast-Flo, Avicel PH101
and Avicel PH102 were particle sized by FALLS. A sample from each was suspended in a practically
insoluble disperse medium with surfactant (sorbitan trioleate or dilute household detergent)
prior to particle sizing and was gently shaken using a vortex-mixer to prevent formation of
agglomerates. Avicel and aspirin samples were dispersed in cold, distilled water using dilute
detergent.

Fig. 1 NIR reflectance spectra for different median particle sizes. A, Microcrystalline
cellulose; a, 24 µm; b, 45.8 µm; c, 93.4 µm; d, 261 µm; e, 406 µm; B, lactose monohydrate; a,
44.7 µm; b, 66.3 µm; c, 98 µm; d, 132 µm; e, 168 µm. [x-axis in both panels: Wavenumber/cm⁻¹,
4000 to 10 000.]

Fig. 2 Single wavenumber quadratic least squares fit of NIR spectral data and median particle
size, d50. A, microcrystalline cellulose reflectance data; B, microcrystalline cellulose
mean-corrected reflectance data; C, lactose reflectance data, and D, lactose mean-corrected
reflectance data. [x-axis in all panels: FALLS d50/µm.]



Anhydrous caffeine and lactose monohydrate were suspended in cyclohexane with sorbitan trioleate.
Paracetamol was suspended in pentane with sorbitan trioleate. NIR diffuse reflectance
measurements were made on the samples of sieved, ground and bulk material in narrow disposable
glass vials to enable consistent compaction pressure. Sieved and bulk materials were scanned on
different days, over the course of several weeks.

Data analysis

Data were processed using code programmed in Matlab 5 Scientific and Technical Programming
language (The Mathworks Inc, Natick, MA, USA).

Results and discussion

Preliminary results

Reflectance at any wavenumber versus d50 or 1/d50 exhibited a curvilinear relationship. To allow
for this, two different approaches were compared: single wavenumber quadratic least squares
regression and full two-wavenumber search MLR. With each of these calibration methods, different
pretreatments of the NIR spectral and FALLS d50 data were applied and their effects on standard
errors of calibration and prediction (SEC and SEP respectively), bias and linearity investigated.

Quadratic least squares fits of NIR spectral and FALLS d50 data were used to allow for gentle
curvature in calibrations. The NIR spectral data were diffuse reflectance (of infinite thickness
for all practical purposes), R; mean corrected reflectance (where the mean reflectance value of
an individual spectrum is subtracted from the reflectance at each of its spectral wavenumbers);
absorbance, log (1/R) and Kubelka-Munk function, f(R):

$$f(R) = \frac{(1-R)^2}{2R} = \frac{K}{S} \qquad (1)$$

where K and S are the Kubelka-Munk absorption and scatter coefficients respectively. A search of
all 500 datapoints was used to select the wavenumber giving the smallest SEC for each NIR data
pretreatment.

The second calibration technique applied was two wavenumber MLR. A search of all combinations of
two wavenumbers from the 500 measured by the spectrometer was carried out. The NIR spectral data
used were again reflectance, mean-corrected reflectance, absorbance and Kubelka-Munk function.
FALLS data pretreatments investigated were d50, 1/d50 and the ln(FALLS d50). All possible
combinations of pretreated NIR and FALLS data were tested.

Investigation with both calibration methods revealed that the reflectance data recorded by the
NIR spectrometer produced calibrations with significant correlation, low scatter and low bias.
Kubelka-Munk function data and absorbance also showed significant correlation between NIR
predicted d50 and FALLS d50, however these pretreatments introduced significant fixed bias into
the calibration. Reflectance data were therefore subsequently used. The ln(FALLS d50) was found
to give two wavenumber MLR calibrations with a lower SEC than 1/d50 or d50 and were the FALLS
data used in subsequent MLR calibrations.

To demonstrate the feasibility of the MLR calibration method, sieve fractions of a single batch
of three drugs were tried initially (aspirin, anhydrous caffeine and paracetamol). Subsequent
working MLR calibrations for the two pharmaceutical excipients (microcrystalline cellulose and
lactose monohydrate) were produced from a larger data set using either machine sieve fractions or
a combination of machine sieve fractions and bulk samples from a number of different batches.

Table 1 Microcrystalline cellulose and lactose MLR calibration (sieve fraction data) and
validation (bulk sample data) results

Material                  Microcrystalline cellulose          Lactose monohydrate
b0 (a)                    -3.99                               4.59
b1 (a)                    59.65                               -172.3
b2 (a)                    -57.99                              167.8
Wavenumber 1 (cm⁻¹)       8244                                6012
Wavenumber 2 (cm⁻¹)       5964                                5940
SEC [ln(d50/µm)]          0.067                               0.097
SEP [ln(d50/µm)]          0.17                                0.18

ln(NIR predicted d50) = c + m ln(FALLS d50)

Calibration set (b)
r                         0.99                                0.99
m                         0.99                                0.98
c                         0.035                               0.068
n                         24 (PH101, PH102, PH200 sieved)     15 (Sieved and 110 mesh)

Validation set (b)
r                         0.84                                0.014
m                         0.96                                0.018
c                         0.17                                4.38
n                         33 (PH101 and PH102, bulk)          18 (Fast-Flo samples)

(a) MLR coefficients: b0, intercept; b1, wavenumber 1, and b2, wavenumber 2.
(b) r is correlation coefficient; m and c are slope and intercept, respectively, of plots of NIR
predicted ln d50 vs. FALLS measured ln d50; n number of samples in each data set.
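The NIR pretreatments and the single-wavenumber quadratic search described under Preliminary results can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' Matlab 5 code, which is not reproduced in the paper. The function names, the direction of the fit (d50 regressed on the pretreated signal) and the SEC definition (residual sum of squares divided by n minus the number of fitted coefficients) are assumptions made for the example.

```python
import numpy as np

def pretreat(R, kind):
    """Pretreat diffuse reflectance spectra R, shape (n_samples, n_wavenumbers)."""
    R = np.asarray(R, dtype=float)
    if kind == "reflectance":
        return R
    if kind == "mean_corrected":
        # Subtract each spectrum's own mean reflectance from all its points.
        return R - R.mean(axis=1, keepdims=True)
    if kind == "absorbance":
        return np.log10(1.0 / R)
    if kind == "kubelka_munk":
        # Eq. (1): f(R) = (1 - R)^2 / (2R)
        return (1.0 - R) ** 2 / (2.0 * R)
    raise ValueError(f"unknown pretreatment: {kind}")

def sec(y, y_hat, n_coef):
    """Standard error of calibration (assumed definition, see lead-in)."""
    resid = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.sum(resid ** 2) / (len(resid) - n_coef)))

def best_quadratic_wavenumber(X, d50):
    """Search every spectral point for the quadratic fit with the smallest SEC.

    Returns (column index, SEC, polynomial coefficients, highest power first).
    """
    X = np.asarray(X, dtype=float)
    d50 = np.asarray(d50, dtype=float)
    best = None
    for j in range(X.shape[1]):
        coefs = np.polyfit(X[:, j], d50, deg=2)
        s = sec(d50, np.polyval(coefs, X[:, j]), n_coef=3)
        if best is None or s < best[1]:
            best = (j, s, coefs)
    return best
```

Calling `best_quadratic_wavenumber(pretreat(R, "mean_corrected"), d50)` once per pretreatment would give the smallest-SEC wavenumber for each, which is the comparison the text describes.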

Fig. 3 Feasibility study. Results of MLR calibration. NIR measured median particle size, d50,
versus FALLS d50: A, aspirin; B, anhydrous caffeine, and C, paracetamol. [x-axis in all panels:
FALLS d50/µm.]



Fig. 4 SEM results. A, Avicel PH101 bulk sample; B, Avicel PH200 > 200 µm sieve fraction; C, lactose monohydrate < 31 µm sieve fraction, and D,
lactose monohydrate > 150 µm sieve fraction.

The quadratic least squares calibrations were performed using the microcrystalline cellulose and
lactose monohydrate data sets as these had the largest number of data.

Spectral characteristics

Scans of each powdered sample exhibited the characteristic overlapping combinations and overtones
arising from the fundamentals of the mid-infrared, with non-uniform baselines resulting from
multiple scattering. Spectra also showed different offset values, which appear to increase with
wavenumber (Fig. 1). This has previously been attributed to variation in pathlength [26], which
is influenced by particle size and sample porosity.

Single wavenumber quadratic least squares calibration

Reflectance at any wavenumber showed a generally inverse linear trend with median particle size
up to approximately 100 µm (Fig. 2), broadly agreeing with Mie and Fraunhofer theory for
particles of comparable size to the wavelength [21]. Beyond this particle size, the relationship
becomes markedly non-linear (Fig. 2). A quadratic least squares fit of the data (Fig. 2) between
reflectance and median particle size showed useful correlations between the NIR predicted and
FALLS d50 [microcrystalline cellulose: r = 0.96, 9012 cm⁻¹ (n = 57); lactose monohydrate:
r = 0.90, 7428 cm⁻¹ (n = 33)].

Mean-correction of each spectrum was found to improve correlation between NIR predicted and FALLS
d50 with the microcrystalline cellulose data set [r = 0.98, 7128 cm⁻¹ (n = 57)]. This
pre-treatment acts to centre the data of individual spectra (R_mean = 0) and can help to
eliminate baseline differences that occur as a result of variable sample porosity and pressure
applied with the fibre-optic probe. The variation in offset will also be influenced by the flow
properties of the material. Use of this pretreatment is likely to be appropriate in single
wavenumber least squares calibrations where the material exhibits variable compaction properties,
such as with different grades of microcrystalline cellulose [28], and also where NIR measurements
are recorded using a fibre-optic probe. However, with lactose monohydrate (Reagent grade, 110
mesh and Fast-Flo), which tends to have good flow and compaction properties [29], pretreatment
was not appropriate and gave a poorer fit between NIR predicted and FALLS d50 [r = 0.68,
7056 cm⁻¹ (n = 33)].

MLR calibration using aspirin, anhydrous caffeine, and paracetamol

The results of these calibrations, which applied MLR to all two wavenumber combinations and
contained 7 or 10 data points, clearly demonstrate a relationship between NIR reflectance and the
ln(FALLS d50) (Fig. 3). The ln(FALLS d50) and reflectance data, R, were fitted to an equation of
the general form:



$$\ln(\mathrm{FALLS}\ d_{50}) = b_0 + b_1 R_{\lambda_1} + b_2 R_{\lambda_2} \qquad (2)$$

where b0 is the intercept, b1 and b2 are the MLR coefficients for the two wavenumbers, λ1 and λ2
respectively. Significant linear association was found between NIR predicted ln d50 and
ln(FALLS d50) values in each case [aspirin: r = 0.99 (n = 7), anhydrous caffeine: r = 0.99
(n = 7) and paracetamol: r = 0.96 (n = 10); in each case p < 0.005].

Full two wavenumber search MLR using microcrystalline cellulose and lactose

With each of these materials, preliminary data processing revealed that MLR of reflectance versus
ln(FALLS d50) produced the most linear calibrations. In addition, mean-correction of these
spectra was not found to improve calibration results. The MLR two wavenumber model therefore
compensates for variation in baseline offset.

Particle size calibration using sieve fraction data. Before calibrations were attempted for both
materials, the data of each was split into two sets: a calibration set of mainly sieved fractions
[microcrystalline cellulose (n = 24) and lactose (n = 15)] and a validation set of bulk samples
[microcrystalline cellulose (n = 33) and lactose (n = 18)]. With each calibration set, highly
significant correlation (p < 0.005) was obtained between NIR predicted ln d50 and ln(FALLS d50)
values (Table 1). However, the validation sets for each material exhibited more scatter
(Table 1).

With the microcrystalline cellulose prediction set of bulk samples (Avicel grades PH101 and
PH102), significant correlation (p < 0.005) between NIR predicted ln d50 and ln(FALLS d50) was
obtained with a SEP greater than the SEC (Table 1). The high SEP is possibly accounted for by the
FALLS and scanning electron microscopy (SEM) results (Fig. 4A and 4B) which showed that these
bulk samples had broad distributions and comprised a mixture of irregularly shaped fines and
large spherical particles. Previous work has shown that this can produce more variable results
than the use of narrow or uniform-size distributions as the NIR scattering and absorbing
properties of these particles will be different to that of median sized particles.

Validation of the lactose calibration used bulk samples of Fast-Flo from 18 different batches.
This spray dried material generally has a relatively uniform and spherical particle size [29]. A
narrow range of d50 was confirmed by FALLS (range d50: 81.1-115.7 µm). Poor correlation was
obtained between NIR predicted ln d50 and ln(FALLS d50) with these samples and is probably due to
the narrow range of particle size in the prediction set as the SEP is not significantly different
from that of the microcrystalline cellulose prediction set of bulk samples (Table 1).

SEM results of the lactose sieve fractions used in the calibration set showed small, irregularly
shaped fines in the smallest sieve fractions and large spherical particles in the largest sieve
fractions (Fig. 4C and 4D), much the same as with the microcrystalline cellulose calibration set.
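The full two-wavenumber search MLR of Eq. (2) can be sketched as an exhaustive least squares fit over every pair of spectral points, keeping the pair with the smallest SEC. This is a minimal Python/NumPy illustration rather than the authors' Matlab 5 implementation. The function names and the exact SEC and SEP formulas (SEC with n - 3 degrees of freedom, SEP as the root mean square prediction error) are assumptions, since the paper does not state them.

```python
import itertools
import numpy as np

def two_wavenumber_search(R, ln_d50):
    """Fit ln(d50) = b0 + b1*R[:, i] + b2*R[:, j] for every pair (i, j), i < j.

    R is (n_samples, n_wavenumbers) reflectance; returns the pair with the
    smallest SEC together with its coefficients.
    """
    R = np.asarray(R, dtype=float)
    y = np.asarray(ln_d50, dtype=float)
    n = len(y)
    ones = np.ones(n)
    best = None
    for i, j in itertools.combinations(range(R.shape[1]), 2):
        A = np.column_stack([ones, R[:, i], R[:, j]])
        b, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ b
        # Assumed SEC: three fitted coefficients (b0, b1, b2).
        sec = float(np.sqrt(resid @ resid / (n - 3)))
        if best is None or sec < best["sec"]:
            best = {"i": i, "j": j, "b": b, "sec": sec}
    return best

def predict_ln_d50(R, model):
    """Apply a fitted two-wavenumber model to new reflectance spectra."""
    R = np.asarray(R, dtype=float)
    b = model["b"]
    return b[0] + b[1] * R[:, model["i"]] + b[2] * R[:, model["j"]]

def sep(R, ln_d50, model):
    """Assumed SEP: root mean square error on a validation set."""
    resid = np.asarray(ln_d50, dtype=float) - predict_ln_d50(R, model)
    return float(np.sqrt(np.mean(resid ** 2)))
```

With 500 spectral points there are 124 750 pairs, so the search is small enough to run exhaustively. Predicted median particle size in µm is then `np.exp(predict_ln_d50(R_new, model))`.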

Fig. 5 Results of microcrystalline cellulose MLR calibration with randomised sieve fraction and
bulk sample data. NIR measured median particle size, d50, versus FALLS d50. A, Calibration set
and B, validation set. [x-axis: FALLS d50/µm, logarithmic scale.]

Fig. 6 Results of lactose monohydrate MLR calibration with randomised sieve fraction and bulk
sample data. NIR measured median particle size, d50, versus FALLS d50. A, Calibration set and B,
validation set. [x-axis: FALLS d50/µm, logarithmic scale.]
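The randomised calibration and validation sets behind Figs. 5 and 6 (67% of spectra for calibration, 33% for validation, drawn three times) can be sketched as below. This is a hedged Python/NumPy illustration: the function name, the seed and the rounding rule are assumptions, and the paper does not say how its random assignment was generated.

```python
import numpy as np

def random_splits(n_samples, cal_fraction=0.67, n_repeats=3, seed=0):
    """Return n_repeats (calibration_indices, validation_indices) pairs.

    Each repeat is an independent random permutation of the sample indices;
    the first cal_fraction of it forms the calibration set.
    """
    rng = np.random.default_rng(seed)
    n_cal = int(round(cal_fraction * n_samples))
    splits = []
    for _ in range(n_repeats):
        order = rng.permutation(n_samples)
        splits.append((np.sort(order[:n_cal]), np.sort(order[n_cal:])))
    return splits
```

For the 57 microcrystalline cellulose spectra this gives 38 calibration and 19 validation samples per repeat, and for the 33 lactose spectra 22 and 11, matching the set sizes reported for the randomised calibrations.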



Particle size calibration using randomised sieve fraction and bulk sample data. To produce
working calibrations with the microcrystalline cellulose and lactose data sets, the sieve
fraction and bulk sample data were randomly assigned to either the calibration set (67% of
spectra) or validation set (33% of spectra). This procedure was repeated three times to test the
robustness of the method, giving three different calibration and validation sets for the two
materials. In each case, all three calibrations employed slightly different combinations of
wavenumbers. The selected wavenumbers were found to occur on the slopes of overtone peaks; the
selection of each wavenumber is therefore likely to have been influenced by the random noise in
each data set. With both materials, each of the three MLR calibrations showed a good fit between
NIR spectral and FALLS data {microcrystalline cellulose: SEC [ln(d50/µm)] = 0.10-0.11† and
lactose monohydrate: SEC [ln(d50/µm)] = 0.12-0.13†}. This was confirmed by plots of NIR predicted
ln d50 versus ln(FALLS d50) which showed significant linear association [microcrystalline
cellulose: r = 0.98 (n = 38 for each set) and lactose monohydrate: r = 0.97-0.98† (n = 22 for
each set); in each case with p < 0.005] (Fig. 5A and 6A). The three validation sets for each
material showed similar results (Fig. 5B and 6B) with highly significant correlation between NIR
predicted ln d50 and ln(FALLS d50) [microcrystalline cellulose (n = 19): r = 0.98, lactose
monohydrate (n = 11): r = 0.93-0.97;† in each case p < 0.005], and SEP comparable to SEC
{microcrystalline cellulose: SEP [ln(d50/µm)] = 0.12-0.14† and lactose monohydrate: SEP
[ln(d50/µm)] = 0.15-0.21†}.

† The range gives the minimum and maximum values observed for the three randomly selected
calibration and validation sets.

Conclusion

The accurate measurement of median particle size in pharmaceutical powders can be achieved using
NIR spectroscopy. Excellent calibration results can be achieved by applying the MLR method to
sieved fractions, however in practice as bulk samples are more likely to be particle sized,
calibrations also require inclusion of bulk samples. Use of both bulk samples and sieve fractions
produces robust calibrations that can be used over a wide range of particle size.

Acknowledgements

The authors are grateful to Buhler AG for the loan of the NIR instrument, and to The Mathworks
Inc for providing Matlab 5 software. We thank P. A. Hailey, Pfizer Ltd and FMC International for
advice and providing samples of pharmaceutical excipients and A. J. O’Neil thanks Pfizer Ltd for
a research grant. Keith Barnes and Dave McCarthy, The School of Pharmacy, University of London
are thanked for assistance with sieving and SEM.

References

1 M. E. Aulton, Pharmaceutics: The Science of Dosage Form Design, Churchill Livingstone,
Edinburgh, 1988.
2 H. G. Barth, S. Sun and R. M. Nickol, Anal. Chem., 1987, 59, 142.
3 C. Washington, Particle Size Analysis in Pharmaceutics and Other Industries, Ellis Horwood,
New York, 1992.
4 British Pharmacopoeia 1993, HM Stationery Office, London, 1993, vol. 2, Appendix XVII, A & B.
5 A. Simmons, International LABMATE, 1993, 17, 23.
6 P. A. Hailey, P. Doherty, P. Tapsell, T. Oliver and P. K. Aldridge, J. Pharm. Biomed. Anal.,
1996, 14, 551.
7 P. Frake, C. N. Luscombe, D. R. Rudd, J. Waterhouse and U. A. Jayasooriya, Anal. Commun.,
1998, 35, 133.
8 P. Dubois, J. Martinez and P. Levillain, Analyst, 1987, 112, 1675.
9 W. Plugge and C. van der Vlies, J. Pharm. Biomed. Anal., 1993, 11, 435.
10 C. van der Vlies, W. Plugge and K. J. Kaffka, Spectroscopy, 1995, 10, 46.
11 W. Plugge and C. van der Vlies, J. Pharm. Biomed. Anal., 1996, 14, 891.
12 E. Dreassi, G. Ceramelli, L. Savini, P. Corti, P. L. Peruccio and S. Lonardi, Analyst, 1995,
120, 319.
13 E. Dreassi, G. Ceramelli and P. Corti, Analyst, 1995, 120, 1005.
14 E. Dreassi, G. Ceramelli, P. Corti, M. Massacesi and P. L. Perruccio, Analyst, 1995, 120,
2361.
15 D. J. Wargo and J. K. Drennen, J. Pharm. Biomed. Anal., 1996, 14, 1415.
16 R. A. Forbes, M. L. Persinger and D. R. Smith, J. Pharm. Biomed. Anal., 1996, 15, 315.
17 I. A. Cowe, J. W. McNicol and D. C. Cuthbertson, Analyst, 1989, 114, 683.
18 L. S. Aucott, P. H. Garthwaite and S. T. Buckland, Analyst, 1988, 113, 1849.
19 R. J. Barnes, M. S. Dhanoa and S. J. Lister, Appl. Spectrosc., 1989, 43, 772.
20 C. R. Bull, Analyst, 1991, 116, 781.
21 G. Kortum, Reflectance Spectroscopy, Principles, Methods, Applications, Springer-Verlag,
Berlin, 1969.
22 K. H. Norris and P. C. Williams, Cereal Chem., 1984, 61, 158.
23 A. J. O’Neil, R. D. Jee, R. A. Watt and A. C. Moffat, J. Pharm. Pharmacol., 1997, 49,
Suppl. 4, 19.
24 J. L. Ilari, H. Martens and T. Isaksson, Appl. Spectrosc., 1988, 42, 722.
25 E. Ciurczak, P. Torlini and P. Demkowicz, Spectroscopy, 1986, 1, 36.
26 H. Mark, Principles and Practice of Spectroscopic Calibration, John Wiley & Sons Inc., New
York, 1991.
27 J. C. Miller and J. N. Miller, Statistics For Analytical Chemistry, Ellis Horwood, New York,
3rd edn., 1993.
28 Handbook of Pharmaceutical Excipients, ed. A. Wade and P. J. Weller, American Pharmaceutical
Association, Washington, The Pharmaceutical Press, London, 2nd edn., 1994.
29 S. Pearce, Manuf. Chem., 1986, 57, 77.

Paper 810600I K
