Investigation of Sources of Variance Which Contribute To NIRS Measurement, Candolfi
ELSEVIER Analytica Chimica Acta 345 (1997) 185-196
Received 17 October 1996; received in revised form 17 October 1996; accepted 31 December 1996
Abstract
The goal of this study is to improve NIR-spectroscopic measurement of pharmaceutical formulations. The variance
contributions of the following factors to the NIR-measurement of tablets, hard and soft gelatine capsules are estimated:
measurement repeatability, positioning, time, different samples of one batch and different batches. The data analysis is
performed on the one hand with PCA (Principal Component Analysis) as a display method and on the other hand numerically
with a nested ANOVA design.
These error sources are studied on three different drug delivery systems, namely on tablets, hard and soft
gelatine capsules. The final aim of the study is to develop procedures for identification of clinical study
lots by pattern recognition techniques.
2.1.3. Second derivative
Second derivative eliminates baseline shift and emphasises small features, such as shoulders on a
band. Therefore this pre-processing was selected and applied as described by Gorry [9].
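The second-derivative pre-processing described above can be illustrated with a minimal sketch. It uses the classical five-point quadratic Savitzky-Golay convolution weights for interior points only; the window size, the function name `second_derivative_sg5` and the `spacing` parameter are assumptions for illustration, and the end-point handling that Gorry's general least-squares method provides is not reproduced here.

```python
def second_derivative_sg5(spectrum, spacing=1.0):
    """Second derivative from a 5-point quadratic Savitzky-Golay filter.

    Assumption: equally spaced points. Only interior points are returned,
    so the output is four points shorter than the input.
    """
    # Classical convolution weights for the second derivative
    # (5-point window, quadratic fit), with normalisation factor 7.
    weights = (2, -1, -2, -1, 2)
    result = []
    for centre in range(2, len(spectrum) - 2):
        acc = sum(w * spectrum[centre + k]
                  for w, k in zip(weights, range(-2, 3)))
        result.append(acc / (7.0 * spacing ** 2))
    return result
```

A constant offset or a linear baseline gives a zero second derivative with these weights, which is the baseline-removal property mentioned in the text.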
random contribution to the total variance for the ith group of (the higher) level A, and $B_{ij}$ is the random
contribution for the jth subgroup (level B) of the ith group; $\varepsilon_{ijk}$ is the error term of the kth
item in the jth subgroup of the ith group [10,11].
[Fig. 1 labels: optical window; position for tablet; sample fixture; cell]
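For orientation, a two-level nested random-effects model that is consistent with the terms defined above is commonly written as follows. This is a sketch under assumptions: the symbols $Y_{ijk}$ (observed univariate value) and $\mu$ (overall mean) are not taken from the text, and the additive decomposition of the total variance holds for independent random effects.

```latex
Y_{ijk} = \mu + A_i + B_{ij} + \varepsilon_{ijk},
\qquad
\sigma^2_{\mathrm{total}} = \sigma^2_{A} + \sigma^2_{B} + \sigma^2_{\varepsilon}
```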
For tablets (Fig. 1(a)) the optical fibre is directed to the top and the sample is lying directly on the optical
window. The underside of the tablet is scanned. For capsules (Fig. 1(b)) the fibre is directed to the bottom.
The sample is lying below the optical window in a special cavity with the exact dimension of the capsule
size so that the capsule cannot move. The upper side of the capsule is scanned.
Fig. 1. (a) Fibre- and sample-holder for tablets. (b) Fibre- and sample-holder for capsules.
The data are NIR-spectra of log (1/R) of tablets, hard and soft gelatine capsules, where R is the reflectance
of the sample versus the reflectance of a white ceramic standard. The standard is measured at the
beginning of each measurement series and directly subtracted from all spectra of the corresponding
measurement series. The spectra are obtained over the range 10 000-4000 cm⁻¹ (1000-2500 nm), with 778
measuring points. The measurements were performed with a resolution of 16 cm⁻¹ and a number of scans of
10. The resulting spectrum is the average spectrum of these 10 scans.
The spectra were measured by using the optical fibre. The optical properties of the fibre do not allow to
measure accurately in the spectral range 4600-4000 cm⁻¹. For this reason 78 measurement points
were discarded at the end of the spectra. The resulting spectra contain 700 variables (measuring points).
The tablets contain 4% of an active drug and eight excipients. The main excipient is lactose. They are
coloured, uncoated, plain tablets with a diameter of 7.1 mm. The filling of the hard gelatine capsules
consists of four excipients, the main one being cellulose, and 4.5% of the active drug. The capsule shell is
yellow and the capsule size is #4. The filling of the soft gelatine capsules contains the active drug in 10%
concentration and five excipients. The pure gelatine shell takes 29% of the total weight of the capsule and
is red-brown in colour. The soft gelatine capsules are oblong, and 20 mm in length. It is known from the
hard and soft gelatine capsules analysed in this study that no cross-linking [12,13] occurs, if they are stored
at room conditions. The three drug delivery systems are completely different drugs, with different active
components.
Definitions of the investigated factors:
Replicates. Replicate measurements of one sample without moving it between the measurements.
Positioning. After a measurement the sample is removed from the position and put back for the next
measurement.
Different samples. Different samples of one production batch are measured.
Different batches. Samples of different batches of one product are measured.
The measurements were performed for following three measuring schemes: A - 30 replicates of one
sample, 30 measurements for positioning, 30 measurements of different samples of one batch; B - on five
consecutive days one sample is measured in three positions, for each position three replicates were
collected; C - measurement of eight samples taking into account positioning, time, different samples of
one batch and different batches. The scheme considers data for four similar batches of one product. It is built
as follows (see Fig. 2).
These measurements were repeated for each drug delivery system.
3.3. Computations of the variance contributions of the sources of variance
For each drug delivery system 90 spectra were collected with scheme A, 45 spectra with scheme B
and 120 spectra with scheme C. The spectra obtained for each scheme were investigated for outliers by
visual inspection of the plotted spectra and by PC score plots.
Since nested ANOVA is a univariate method, the first task was to select appropriate wavenumbers for
each drug delivery system for further computations. It was decided to perform PCA on the data matrix
containing the spectra including the most part of the investigated sources of variation (scheme C),
and select the wavenumbers according to high peak loadings on the first three PCs. The full wavenumber
range was used in a first step. SNV and column centering was carried out prior to PCA. The data were
pre-processed with SNV in order to remove uncontrolled variations of the baseline due to the different
particle sizes in solid dosage forms [8]. It turned out that high peak loadings were especially obtained for
the two main spectral ranges where water is absorbing (6896 cm⁻¹ resp. 1450 nm, 5154 cm⁻¹ resp. 1940 nm
[7]). To prevent making wrong decisions based on computations carried out in the range of the water
peaks, the following ranges for water were defined and eliminated from the spectra: 7263-6600 cm⁻¹ (74
measuring points) for the first water peak and 5351-4989 cm⁻¹ (48 measuring points) for the second
water peak. They were removed from the spectra as well as 15 measuring points at the beginning of the
spectra. The resulting spectra contain 563 variables. In a second step SNV was applied on the reduced original
data matrix, and the wavenumber selection was carried out again according to high peak loadings in the
first three PCs after PCA. In some of the cases high peak loadings were obtained for the same variable on
PC1, PC2 and PC3. This procedure was repeated for each drug delivery system.
Four wavenumbers were finally selected for tablets, five for hard gelatine capsules and five for soft gelatine
capsules. In order to make this univariate approach more robust, the sum of the absorbance at the selected
value and that at the two neighbouring wavenumbers on both sides, i.e. the sum of five wavenumbers, was
computed and used as input value for all the computations.
3.4. Influence of different pre-processing methods
The influence of two pre-processing methods, namely SNV and second derivative transformation,
was studied and compared with the results obtained from raw data.
The reduced matrix, without the two water ranges of spectra of measuring scheme C for hard gelatine
capsules was used once as original data, once after SNV and once after second derivative transformation.
Again variables with high loadings were selected after PCA for the corresponding type of pre-processing and
for the original data. The sum of the absorbance at the selected values and that at the two neighbouring
variables on both sides was computed and used as input for the nested ANOVA.
3.5. Computer programs
The data collected with the Bruker instrument were transformed from OPUS format to JCAMP format by
the OPUS V2.0 [14] software and exported from the NIR system. The files were later transformed into
ASCII-code by a program written in Visual Basic 3.0 [15] and imported into MATLAB [16], which
was used to perform most of the computations. The computations for the nested ANOVA were carried out
with Statgraphics Plus V.6.1 [17].
4. Results and discussion
4.1. Wavenumber selection
The features for the analysis with nested ANOVA were selected with PCA, carried out on the full data
matrix (120 × 700) of measuring scheme C. Scheme C includes measurements for several batches of one
product, different samples of one batch, measurements over a certain time and for different positions. The
loadings plots for the first three PCs show, that for the tablets the highest peak loading is found at 5166 cm⁻¹,
for hard gelatine capsules at 5143 cm⁻¹ and for soft gelatine capsules at 5120 cm⁻¹. The selected variables
correspond to the water peak at 5154 cm⁻¹ [7]. Therefore this spectral band contains the largest variation in
this measuring scheme. Since it was not the goal of this work to investigate the water variation, this factor
was not controlled while performing the measurements. As a result it was decided to eliminate the
two spectral ranges where water is mainly absorbing from the spectra. However, that water can affect the
entire baseline of NIR-spectra [7] has to be taken into account. Our results also show that, when carrying out
pattern recognition, one should take care to avoid regions where the absorption of water can play a role
or that one should correct for it.
The procedure for variable selection was performed again on the reduced data matrix (120 × 563). It
seemed reasonable to select variables lying between the two water peaks, because the absorbance values
are very low at high wavenumbers (10 000 cm⁻¹) and the noise level is increased at low wavenumbers
(4600 cm⁻¹). The chosen variables correspond to peaks in the spectrum of the active drug or of the
main excipient.
Fig. 3(a)-(c) show the loading plots on PC1 to PC3 for the three drug delivery systems with the selected
features. The indicated wavenumbers represent the selected variables. They are representative for peaks
of the active drug or the main excipient.
4.2. Measuring scheme A
The variance was computed from the univariate values for the 30 replicates, for the 30 measurements
for positioning and for the 30 different samples of one batch. Table 1 shows the results for the three drug
delivery systems.
The computations were carried out on all selected variables for the respective drug delivery systems. The
results for each investigated factor for the individual drug delivery systems are similar. The variance
obtained for positioning is significantly higher for all three systems than for replicates. It is especially
increased for hard gelatine capsules. This is an indication that for this drug delivery system the
inhomogeneity of the sample seems to be higher than for the other two. This is expected, since the filling of a
hard gelatine capsule is loose and therefore moving by new positioning of the sample. On the other hand tablets
Fig. 3. Loading plots on PC1, PC2 and PC3 from measuring scheme C for (a) tablets, (b) hard gelatine capsules and (c) soft gelatine capsules. Axes: loadings on PC1, PC2 and PC3 versus wavenumbers.
and soft gelatine capsules are fixed systems. From positioning to different samples the variance is
increased only a little for tablets; for the capsules it is slightly decreased. The differences for tablets
are probably due to additional physical parameters, i.e. hardness, thickness, surface.
4.3. Measuring scheme B
The data of measuring scheme B taking into account replicates, positioning and time are analysed
with PCA for the three drug delivery systems. Fig. 4(a)-(c) show the corresponding PC1-PC2 score
plots.
One can see on the score plot for the tablets (Fig. 4(a)) that the three scores for replicates lie close
to each other. Each day, nine measurements were performed, three replicates at three positions. The
scores representing positioning are already more distant. The measurements of different days are separated
and placed along the diagonal of the plot. The measurements of day 1 are located in the upper left corner
of the PC space, the measurements of the last (5th) day are located in the lower right corner. This effect can be
random or can be an indication of a drift in time. It shows that carrying out measurements on different
days is a non-negligible source of variance in this measuring scheme.
Table 1
Measuring scheme A: The variance obtained from 30 replicates, 30 times positioning and 30 different samples for each drug delivery system at
each selected variable
(? marks an exponent or digit that is illegible in the source)
Wavenumber (cm⁻¹)   Replicates    Positioning   Different samples
Tablets
6484                7.29×10^-?    1.54×10^?     3.62×10^?
6276                6.25×10^-?    0.85×10^?     2.81×10^?
6006                2.89×10^?     0.44×10^-?    0.70×10^?
5783                3.24×10^?     0.90×10^-?    3.41×10^?
Hard capsules
6122                4.84×10^?     4.86×10^-?    2.81×10^?
6037                5.76×10^-?    0.02          8.08×10^?
5914                6.25×10^?     0.01          9.98×10^?
5790                9.00×10^-?    0.01          0.01
5721                4.41×10^-?    0.01          8.56×10^-?
Soft capsules
5945                1.44×10^-5    7.13×10^?     5.3?×10^?
5891                1.85×10^-5    5.38×10^-?    3.13×10^?
5852                2.30×10^?     7.02×10^-?    3.69×10^?
5767                2.21×10^-5    4.28×10^-?    2.37×10^?
5428                1.94×10^-5    5.23×10^?     5.70×10^?
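Each entry of Table 1 is a variance over 30 univariate values, where a univariate value is the sum of the absorbance at the selected wavenumber and at the two neighbouring points on both sides. A minimal sketch of that computation follows; the function names and the `index` argument (position of the selected wavenumber in the spectrum) are hypothetical, and the use of the sample variance with n-1 in the denominator is an assumption, since the text does not state the estimator.

```python
from statistics import variance

def five_point_sum(spectrum, index):
    """Sum of the value at `index` and its two neighbours on both sides."""
    if index < 2 or index > len(spectrum) - 3:
        raise ValueError("index needs two neighbours on both sides")
    return sum(spectrum[index - 2:index + 3])

def factor_variance(spectra, index):
    """Variance of the five-point sums over a set of spectra.

    Assumption: sample variance (n-1 denominator).
    """
    values = [five_point_sum(s, index) for s in spectra]
    return variance(values)
```

Calling `factor_variance` once for the 30 replicate spectra, once for the 30 repositioned spectra and once for the 30 different samples would give one row of the table for a given selected wavenumber.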
For the hard gelatine capsules the main variance on PC1 is related to positioning, mainly occurring on day
1. The variance for positioning on the other four days is much smaller. This large variance for positioning
leads to the appearance of tightly clustered replicates. PC2 more or less separates the measurements
performed on day 2, 3, 4 and 5. The samples are ordered according to time, which might be again an indication
of drift in time.
The PC1-PC2 score plot for the soft gelatine capsules (Fig. 4(c)) shows, that the replicates contain
somewhat more variance, compared to the tablets and hard gelatine capsules. One can see the clusters
for the positions of one day. Again day 1 includes a large variance, which is observed on PC1. PC2
separates more or less the measurements performed between days 2 and 5, but does not order them
according to time.
For the same data a nested ANOVA was computed with the univariate data described above. The results
are summarised in Table 2 and numerically show the results already displayed by PCA.
The results obtained for the tablets show that the variance contribution of the replicates is very small.
The rest of the total variance is approximately divided into equal parts for the two remaining factors, days
and positioning. In the case of the hard gelatine capsules the variance contribution for the replicates
can be neglected. From the two other factors it seems that the factor time tends to have a somewhat higher
part of the total variance. This is also valid for the soft gelatine capsules. Investigating this drug delivery
system one observes, that the replicates have a larger variance contribution. The high variance contribution
for time can be an indication for a drift. An internal reference was periodically measured at the beginning
of each measuring series and directly subtracted from those measurements. However, this standard might not
correct for all changes in the measuring conditions (instrument, surroundings). Appropriate
measurements were performed over a time period of six months to investigate the long time stability of the
instrument performance. A small drift in time was observed.
The nested ANOVA results correspond with the results obtained by the interpretation of the PC score
plots as well as with the results obtained with measuring scheme A.
4.4. Measuring scheme C
The data of measuring scheme C taking into account positioning, time, different samples of one
batch and different batches are analysed with PCA for
Fig. 4. Score plots of the data obtained from measuring scheme B for (a) tablets, (b) hard gelatine capsules and (c) soft gelatine capsules. 1, 2,
3: sample measured after repositioning on day 1; 4, 5, 6: sample measured after repositioning on day 2 etc.
Table 2
Measuring scheme B: The variance contribution obtained with nested ANOVA for different sources of variance for each drug delivery system
at the corresponding selected variables
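The kind of computation behind such a table can be sketched for the balanced design of measuring scheme B (five days, three positions per day, three replicates per position). The sketch below estimates variance components by the usual method of moments from the nested mean squares; it is not the Statgraphics procedure used in the study. The function names are hypothetical, and setting negative estimates to zero is an assumption.

```python
from statistics import mean

def nested_variance_components(data):
    """Variance components for a balanced two-level nested design.

    data[i][j][k] is the univariate value for day i, position j within
    that day, and replicate k within that position.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell_means = [[mean(cell) for cell in day] for day in data]
    day_means = [mean(cells) for cells in cell_means]
    grand_mean = mean(day_means)

    # Sums of squares for replicates, positions within days, and days.
    ss_rep = sum((y - cell_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for y in data[i][j])
    ss_pos = n * sum((cell_means[i][j] - day_means[i]) ** 2
                     for i in range(a) for j in range(b))
    ss_day = b * n * sum((m - grand_mean) ** 2 for m in day_means)

    ms_rep = ss_rep / (a * b * (n - 1))
    ms_pos = ss_pos / (a * (b - 1))
    ms_day = ss_day / (a - 1)

    # Method-of-moments estimates; negative values are set to zero
    # (an assumption, not taken from the text).
    return {
        "replicates": ms_rep,
        "positioning": max(0.0, (ms_pos - ms_rep) / n),
        "time": max(0.0, (ms_day - ms_pos) / (b * n)),
    }

def percent_contributions(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}
```

For scheme C the same idea extends to more nesting levels (batch, sample, time, positioning), with one additional mean square per level.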
Fig. 5. Score plots of the data obtained from measuring scheme C for (a) tablets, (b) hard gelatine capsules and (c) soft gelatine capsules. 1, 2:
measurements of samples 1 and 2, for batch 1; 3, 4: measurements of samples 1 and 2, for batch 2 etc.
the three drug delivery systems. Fig. 5(a)-(c) show the PC1-PC2 score plots.
On the score plot for the tablets presented in Fig. 5(a) one can see that the major variation along
PC1, which represents 74% of all variation, is due to differences between batch 4 (score numbers 7 and 8)
and the others. This is probably due to physical parameters of the tablets, which are different for this
production batch compared with the other batches (another tablet machine, another person, who was
producing this batch etc.). PC1 also separates the two tablets of each batch. PC2 separates the three
other batches from each other. The two remaining factors, positioning and time, seem to be less important.
On the score plot for the hard gelatine capsules (Fig. 5(b)) no clusters for the batches and for the
different capsules can be seen. These factors will contain a smaller part of the total variance and
therefore the factor positioning and time affect to a higher extent the measurement. A separation along PC1 can
be recognised. The measurements performed between days 1 and 3 are found on the left and the
measurements for days 4 and 5 on the right.
The scores for the soft gelatine capsules (see Fig. 5(c)) for batch 1, 3 and 4 are randomly distributed
Table 3
Measuring scheme C: The variance contribution obtained with nested ANOVA for different sources of variance for each drug delivery system
at the corresponding selected variables
over the whole PC space. Batch 2 is somewhat separated along PC2 (25% of variance).
For the same data the nested ANOVA computations were carried out for the selected variables. The results
are summarised in Table 3. The computations were additionally performed for the selected variables of
the water peak (see wavenumber selection) before dropping the two spectral regions for humidity.
Results similar to those presented in Table 3 for non-water features are obtained for the water features.
The main individual parts of variance for hard gelatine capsules are positioning and time. As
explained earlier, if the capsules are not completely filled with powder, granules or pellets, the content
moves when the position of the sample is changed. Another reason for the effect of positioning could be
the round shape of the capsule and the smooth and brilliant shell. During the measurement the light is
scattered over the full range of reflection angles. Due to these two reasons positioning is especially
problematic. It is not possible to reproduce the same position twice. The factor time expresses the variance due to
alterations in the sample, surrounding conditions and the instability of the instrument. As described before a
small drift was observed over a long time period for the measurements. Alterations of the capsule might be
due to small changes in the water content of the gelatine shell. Nevertheless, this does not lead to
the cross-linking phenomenon described by Digenis et al. [13] in the investigated capsules. Small
alterations in the humidity can, however, lead to background effects, since the entire baseline of NIR-spectra is
influenced by water and not only the two main spectral regions [7]. The variance due to the two other factors is
rather small.
For the soft gelatine capsules not all results obtained at the different selected variables are similar. The
selected variable at 5428 cm⁻¹ leads to a different repartition of the total variance for all four sources of
variance. This might occur because this variable is situated close to the water band. For the other features
a clear trend is shown. The total variance is divided into approximately equal parts. These capsules are
filled with a solution and therefore the filling is more homogeneous than for hard gelatine capsules.
Additionally they are completely filled. However, the capsule shell is a source of inhomogeneity, and therefore
positioning is rather important. It is also possible that the water content of the gelatine shell is changing. For
this drug delivery system the variance due to different capsules is large because the capsule shape varies for
each sample. The production process for soft gelatine capsules is very complicated and leads to the high
variance contribution for different batches.
The factor positioning and time are less important for tablets. As long as the diameter of a tablet is larger
Table 4
Measuring scheme C of the hard gelatine capsules: The variance contribution obtained with nested ANOVA for the different sources of
variance for raw data, SNV data and second derivative data
than the diameter of the optical window, positioning will be less problematic. This drug delivery system
seems to be more stable than capsules since time affects the measurement less. The two other factors,
different tablets and different batches, are important. The physical complexity of the sample, i.e. hardness,
thickness, compression force etc. is expressed in the variance due to tablets. For the production process of
tablets such physical parameters are given in a range, wherein all samples are normally distributed.
Therefore each tablet is a little different. This is true also for several batches. However, more parameters can
influence a batch, e.g. the use of different tableting machines, other batches of active drug and excipients,
temperature, humidity etc. As a result the variance obtained for this level is the highest for tablets.
4.5. Influence of different pre-processing methods
All data of this study are pre-processed with SNV. We wanted to investigate if the variance contribution
found for the different factors depends on a specific type of pre-processing, in this situation, the SNV.
Therefore the influence of SNV and second derivative as pre-processing methods is studied and compared
with the results obtained from original data. The results are summarised in Table 4.
The variable selection is performed as discussed earlier for original data, for SNV and second derivative
pre-processed data. For the original data three variables are selected, 5 after SNV and second derivative
transformation. This means that after applying pre-processing more variables are selected. The reason for
this in the case of SNV is, that the variance included in the original peaks is distributed to the corresponding
peaks and some neighbouring variables. Second derivative transformation emphasises small features and
as a result more variables are selected.
For the selected variables of one type of pre-processed data, as well as for all three types of data, the
repartition of the total variance is similar. The highest variance part is obtained for the factor time followed
by positioning. The factors in batch to batch variability and capsule to capsule variability contain a smaller
part of the total variance. This indicates that the results obtained from these experiments are in this case rather
independent of pre-processing.
If one compares the results obtained with original data and with SNV data more in detail, one can see for
SNV, that the individual variance contribution of batch and of positioning are decreased. SNV removes linear
slope variations of the individual spectra [8]. This phenomenon depends on sample positioning, and this
contribution is therefore somewhat corrected with SNV. Second derivative also corrects for baseline drift
which explains why the variance contribution of the factor positioning is decreased, compared to the
results of the raw data.