
Chemometrics and Intelligent Laboratory Systems 44 (1998) 175-185

Orthogonal signal correction of near-infrared spectra


Svante Wold a,*, Henrik Antti a, Fredrik Lindgren b, Jerker Öhman c

a Research Group for Chemometrics, Department of Organic Chemistry, Umeå University, S-901 87 Umeå, Sweden
b Astra-Draco, Box 34, S-221 00 Lund, Sweden
c Umetri, Box 7960, S-907 19 Umeå, Sweden
Received 30 October 1997; revised 21 April 1998; accepted 12 June 1998

Abstract
Near-infrared (NIR) spectra are often pre-processed in order to remove systematic noise such as base-line variation and multiplicative scatter effects. This is done by differentiating the spectra to first or second derivatives, by multiplicative signal correction (MSC), or by similar mathematical filtering methods. This pre-processing may, however, also remove information from the spectra regarding Y (the measured response variable in multivariate calibration applications). We here show how a variant of PLS can be used to achieve a signal correction that is as close to orthogonal as possible to a given Y-vector or Y-matrix. Thus, one ensures that the signal correction removes as little information as possible regarding Y. In the case when the number of X-variables (K) exceeds the number of observations (N), strict orthogonality is obtained. The approach is called orthogonal signal correction (OSC) and is here applied to four different data sets of multivariate calibration. The results are compared with those of traditional signal correction as well as with those of no pre-processing, and OSC is shown to give substantial improvements. Prediction sets of new data, not used in the model development, are used for the comparisons.
© 1998 Elsevier Science B.V. All rights reserved.
Keywords: Orthogonal signal correction; Near-infrared spectra; Multiplicative signal correction

1. Introduction
Near-infrared (NIR) spectroscopy is being increasingly used for the characterisation of solid, semi-solid, fluid and vapour samples. Frequently the objective of this characterisation is to determine the value of one or several concentrations in the samples. Multivariate calibration is then used to develop a quantitative relation between the digitised spectra (the matrix X) and the concentrations (in the matrix Y), as reviewed by Martens and Naes [1]. NIR spectroscopy is also increasingly used to infer other properties (Y) of samples than concentrations, e.g., the strength and viscosity of polymers, the thickness of a tablet coating, or the octane number of gasoline.

* Corresponding author. Tel.: +46-90-138-885
The first step of a multivariate calibration based on NIR spectra is often to pre-process the data. The reason is that NIR spectra often contain systematic variation that is unrelated to the responses (Y). For solid samples this systematic variation is due to, among other things, light scattering and differences in spectroscopic path length, and may often constitute the major part of the variation of the sample spectra. Another reason for systematic but unwanted variation in the sample spectra may be that the analyte of interest absorbs only in small parts of the spectral region. The variation in X that is unrelated to Y may disturb the multivariate modelling and cause imprecise predictions for new samples.

0169-7439/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(98)00109-9
For removal of undesirable systematic variation in the data, two types of pre-processing are commonly reported in the analytical chemistry literature: differentiation and signal correction. Popular approaches to signal correction include Savitzky-Golay smoothing [2], multiplicative signal correction (MSC) [1,3], Fourier transformation [4], principal components analysis (PCA) [5], variable selection [1,6], and base line correction [1,11].
These signal corrections are actually different cases of filtering, where a signal (e.g., a NIR spectrum) is made to have "better properties" by passing it through a "filter", i.e., a mathematical function. The objectives of filtering are often rather vague; it is not always easy to specify what we mean by "better properties". Even if we, in the case of calibration, can specify this objective in terms of lowered prediction errors or simpler calibration models, it is difficult to construct general filters that indeed improve these properties of the data.
We here wish to report the first results of a very simple idea, namely to construct a filter that removes from the spectral matrix (X) only the part that definitely is unrelated to Y. This is done by ensuring that the removed part is mathematically orthogonal to Y, or as close to orthogonal as possible. We call this approach OSC, for orthogonal signal correction. As an illustration, we shall here use (a) the modelling of the viscosity of three sets of modified cellulose samples in terms of their NIR reflectance spectra and (b) a similar example where NIR spectra are used to model and predict 17 measured properties of pulp samples.
Differentiation to first and second derivatives, and variable selection, will not be further discussed, since these operations cannot easily be performed orthogonally to Y.
2. Notation
We shall use capital bold characters for matrices, e.g., X and Y, small bold characters for column vectors, e.g., v, non-bold characters for vector and matrix elements, e.g., $x_{ik}$ and $v_i$, and for indices, e.g., i, j, k, and l, and capital non-bold characters for index limits, e.g., K, N, and A. Row vectors are seen as transposed vectors, e.g., v', and hence transposition is indicated by a prime ('). The index i is used as sample index (rows in X and Y; i = 1, 2, ..., N) and the index k as index of X-variables (k = 1, 2, ..., K).
3. Filtering and calibration
Before multivariate calibration, the unanimous agreement was that "ideal" spectra looked like well resolved NMR or IR spectra, i.e., mainly a straight baseline plus some narrow and symmetrical peaks unambiguously rising above this baseline. Noise introduced wiggles, but could be removed by a judicious filtering of the spectra. Many of the objectives of filtering are still formulated accordingly, as a way to make signals and spectra smooth and pleasing to the eye, and thus easy to interpret.
With multivariate analysis, however, we evaluate
our spectra by means of mathematical methods buried
in computers. There is not much evidence that
smooth and eye-pleasing signals contain more information for computerised mathematical approaches
than do rough un-filtered spectra.
In essence, in order to construct an efficient filtering approach, we need to quantitatively formulate criteria for what the filtering is supposed to achieve. In multivariate calibration, we can quantitatively specify at least one objective of filtering, namely that the filtering should NOT remove information about Y from the spectra (X). Here Y is what we calibrate against, i.e., analyte concentrations or other sample properties. This non-removing can be stringently formulated mathematically, namely that the information in Y should be un-related, orthogonal, to what is removed from X by the filtering.
To achieve such orthogonality, however, we must express our filters in such a way that their orthogonality to a vector or matrix, Y, can be quantified. This seems most easily accomplished by expressing the filtering as removing a bilinear structure from X, i.e., a product of a score matrix, T, times a loading matrix, P'. Orthogonality to Y then means that both T'Y and Y'T are matrices with only zero-valued elements.
As discussed by Sun [5], we can formulate a class of filters as PCA-like multivariate projections. Basing the filter on unmodified PCA has the advantage that a PC model describes as much as possible of X, and hence should catch most systematic variation in X. However, the drawback is that this systematic variation contains both the part that is linearly unrelated to Y and the part that contains linear information about Y.
PCA-like filters can be written as:
$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E}$$
Here X denotes the (N × K) matrix of the unfiltered, uncorrected set of digitised spectra, and E is the (N × K) matrix of filtered spectra. The (N × A) score matrix is denoted by T, and P is a (K × A) matrix of "filters" (loadings). The numbers of samples and variables of the training set (calibration set) are N and K, and A is the number of components. Sun finds A = 1 or 2 to be optimal in the cases he reports [5].
If now T can be made orthogonal against Y, we
are not removing information from X that is linearly
related to Y. With this simple insight, we can now try
to develop filters, base line corrections, etc., with an
orthogonality constraint.
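The point made above, that the scores of an unmodified PCA filter are generally not orthogonal to Y, can be illustrated with a minimal NumPy sketch. The data are synthetic and the function name `pca_filter` is hypothetical; nothing here comes from the data sets of this paper.

```python
import numpy as np

def pca_filter(X, n_components):
    """Split centered X into T P' (removed part) and E (filtered spectra)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores, N x A
    P = Vt[:n_components].T                      # loadings, K x A
    E = X - T @ P.T                              # filtered spectra, N x K
    return T, P, E

rng = np.random.default_rng(0)
N, K = 20, 50
y = rng.normal(size=(N, 1))
# Synthetic "spectra": a Y-related direction plus a larger unrelated one.
X = y @ rng.normal(size=(1, K)) + 3.0 * rng.normal(size=(N, 1)) @ rng.normal(size=(1, K))
X = X - X.mean(axis=0)
y = y - y.mean(axis=0)

T, P, E = pca_filter(X, n_components=1)
print("T'Y =", (T.T @ y).ravel())   # generally non-zero for plain PCA
```

Because T'Y is not zero here, subtracting TP' would remove some variation that is linearly related to Y, which is what an orthogonality constraint is meant to prevent.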
4. Multiplicative signal correction (MSC)
Let us first consider the often used additive and multiplicative signal correction (MSC) of Geladi et al. [3], see also Martens and Naes [1]. Here each digitized spectrum, the row vector $\mathbf{x}_i'$, is normalised by regressing it against the average spectrum of the training set, m:
$$x_{ik} = a_i + b_i m_k + e_{ik}$$
The average training set spectrum has the elements:
$$m_k = \sum_{i} x_{ik} / N$$
Then, from each row, $\mathbf{x}_i'$, one subtracts the intercept ($a_i$) and divides by the multiplicative constant ($b_i$):
$$\mathbf{x}_{i,\mathrm{corr}}' = (\mathbf{x}_i' - a_i) / b_i$$
We see that this correction (filter) cannot be expressed as a bilinear, PCA-like, expansion. Hence, the MSC cannot in a simple way be orthogonalized against Y. We also realise that there is a substantial risk that the vectors a and b are correlated to Y, and hence that MSC may remove from X information related to Y.
The same argument applies to the standard normal variate (SNV) correction of Barnes et al. [11]. This filter has exactly the same mathematical form as the one of MSC, but the parameters $a_i$ and $b_i$ are calculated as the average and the standard deviation of the i-th row of X, respectively.
However, if the sample spectra are fairly similar to their mean, and if the slopes $b_i$ are not too far from 1.0 or otherwise fairly constant, a two component bilinear expansion can still fairly well approximate the above corrections. Then OSC as described below would accomplish a filtering similar to MSC and SNV, but without removing relevant information from X.
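The two row-wise corrections of this section can be sketched as follows. This is a minimal NumPy illustration on synthetic spectra, not the implementation used for the results in this paper; the function names and the use of the sample standard deviation (`ddof=1`) for SNV are assumptions.

```python
import numpy as np

def msc(X, reference=None):
    """MSC: regress each row on the mean spectrum, then remove a_i and b_i."""
    m = X.mean(axis=0) if reference is None else reference
    # Least-squares fit x_i = a_i + b_i * m for all rows at once.
    A = np.column_stack([np.ones_like(m), m])        # K x 2 design matrix
    coef, *_ = np.linalg.lstsq(A, X.T, rcond=None)   # 2 x N: rows are a and b
    a, b = coef
    return (X - a[:, None]) / b[:, None], a, b

def snv(X):
    """SNV: same form as MSC, but a_i and b_i are the row mean and row SD."""
    a = X.mean(axis=1, keepdims=True)
    b = X.std(axis=1, ddof=1, keepdims=True)         # assumption: sample SD
    return (X - a) / b

# Synthetic spectra that differ only by an offset and a slope.
rng = np.random.default_rng(1)
base = np.sin(np.linspace(0.0, 3.0, 100))
a_true = rng.normal(size=5)
b_true = rng.uniform(0.5, 1.5, size=5)
X = a_true[:, None] + b_true[:, None] * base

X_msc, a, b = msc(X)   # every corrected row collapses onto the mean spectrum
X_snv = snv(X)         # every row gets zero mean and unit standard deviation
```

For new samples, the training-set mean spectrum would be passed as `reference`, so that the correction does not depend on the prediction set.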

5. Orthogonal signal correction (OSC)


We will now investigate the possibility of removing bilinear components from X which are orthogonal to Y, i.e., of making a signal correction that does not remove information about Y from X.
We do this by setting up a PCA/PLS-related solution which removes only so much of X as is unrelated (orthogonal) to Y. This approach is based on the fact that as long as the steps of the NIPALS iterative algorithm of the classical two-block PLS regression are retained, the weight vector (w) can be modified in any way to encompass constraints, smoothness, or, as here, the objective that t = Xw is orthogonal to Y. This fact was shown some time ago by Höskuldsson [7]. Hence, the OSC algorithm will be identical to the ordinary PLS algorithm except for the crucial step of calculating the weights, w. Normally, these are calculated so as to maximize the covariance between X and Y, but here they will instead be calculated so as to minimize this covariance, i.e., to get as close to orthogonality between t and Y as possible.
Essential features of the OSC method are (a) that the signal correction (filtering) is expressed as the subtraction of a bilinear, PLS-like, expression, TP', and (b) that the OSC scores, $\mathbf{t}_a$, can be calculated from new sample spectra as $t_{\mathrm{new},a} = \mathbf{x}_{\mathrm{new}}'\mathbf{w}_a$.
When the number of X-variables, K, is larger than N (the number of training samples), it is always possible to find an exactly orthogonal OSC solution, while if K < N this is not always possible. This also means that for K > N, there are infinitely many OSC solutions, where the present algorithm is set up to find the one that models as much of X as possible in each component. This solution may not always be the "best", however, and additional constraints on the OSC w vectors may be warranted. This question and other possible OSC modifications will hopefully be investigated in this laboratory in the near future.
5.1. Outline of the algorithm
Below we give an overview and brief explanation of the present version of the OSC algorithm. The details of the algorithm are given in the Appendix.
Before the calculations, X and Y can be transformed, centered and scaled as usual, as discussed further below.
Our version of OSC calculates and removes from X one component at a time in the standard NIPALS way. This has the advantage of making the approach work also with missing data in the same way as ordinary PLS and PCA.
The first step in the calculation of each component is to calculate the first principal component of the current X. This ensures that the starting score (t) is an optimal linear summary of X. This score vector is then orthogonalized against Y to give $\mathbf{t}^*$. This is simply done as
$$\mathbf{t}^* = (\mathbf{1} - \mathbf{Y}(\mathbf{Y}'\mathbf{Y})^{-1}\mathbf{Y}')\,\mathbf{t}$$
We see that $\mathbf{t}^*$ is orthogonal to Y, since:
$$\mathbf{Y}'\mathbf{t}^* = \mathbf{Y}'(\mathbf{1} - \mathbf{Y}(\mathbf{Y}'\mathbf{Y})^{-1}\mathbf{Y}')\,\mathbf{t} = (\mathbf{Y}' - \mathbf{Y}'\mathbf{Y}(\mathbf{Y}'\mathbf{Y})^{-1}\mathbf{Y}')\,\mathbf{t} = 0$$
Thereafter the PLS weights (w) are calculated (by an embedded PLS regression) to make Xw as close as possible to $\mathbf{t}^*$. The orthogonal score vector, $\mathbf{t}^*$, is then taken a NIPALS round through X to give an updated score vector, t, which is then again orthogonalized against Y, and so on until convergence. In this way $\mathbf{t}^*$ is made to converge towards the longest vector that is orthogonal to Y and which still provides a good summary of X.
After convergence, a loading vector, p, is calculated, and the residual matrix $\mathbf{E}_1$ is then calculated by removing from X the orthogonalized scores times the final loadings. This residual matrix is then used as X in the calculation of the next OSC component. The final residual matrix, $\mathbf{E}_A$, after the final (A-th) OSC component, constitutes the filtered X-matrix. This filtered X is then, together with Y, used in an ordinary PLS-regression (or PC-regression if one so prefers) to develop the calibration model.
After each component, the normal projection diagnostics can be calculated, i.e., modelled variance of X (R2X), score plots of $\mathbf{t}^*$ from different components, etc. The latter should, as usual, be checked for outliers since aberrant sample spectra may seriously affect the OSC model. The OSC loadings may provide interesting hints about the causes of the undesired variation.
The resulting OSC-components (a = 1, 2, ..., A) can now be considered as part of the PLS calibration model. For a new observation (digitized spectrum of a new sample) with the raw data vector $\mathbf{x}_{\mathrm{new,raw}}'$ (a row vector), we wish to have predicted values of $\mathbf{y}_{\mathrm{new}}'$ (the row vector of responses). We proceed as follows, just like with an ordinary PLS model, but remembering that the contribution to $\mathbf{y}_{\mathrm{new}}'$ from the OSC components is zero. First the new x-vector is transformed and scaled in the same way as the training set, giving $\mathbf{x}_{\mathrm{new}}'$. Then the first score value ($t_{\mathrm{new},1}$) is calculated as
$$t_{\mathrm{new},1} = \mathbf{x}_{\mathrm{new}}'\mathbf{w}_1$$
The residual vector is then obtained as
$$\mathbf{e}_{\mathrm{new},1}' = \mathbf{x}_{\mathrm{new}}' - t_{\mathrm{new},1}\,\mathbf{p}_1'$$
This residual vector is then used as the x-vector in the second component, etc., until the corrected $\mathbf{x}_{\mathrm{new}}$ has been obtained as the residual vector after A OSC components. This corrected x-vector is then used in the calibration model to give predicted values, $\mathbf{y}_{\mathrm{new}}'$.
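The orthogonalization of a score vector against Y described in this subsection can be checked numerically with a small NumPy sketch. The data are random, and the helper name `orthogonalize` is hypothetical; a least-squares solve is used in place of forming the inverse of Y'Y explicitly, which gives the same projection when Y has full column rank.

```python
import numpy as np

def orthogonalize(t, Y):
    """Return the part of t that is orthogonal to the columns of Y."""
    # lstsq solves Y c ~= t, so Y @ c is the projection of t onto Y.
    coef, *_ = np.linalg.lstsq(Y, t, rcond=None)
    return t - Y @ coef

rng = np.random.default_rng(2)
Y = rng.normal(size=(30, 2))      # two responses, 30 samples
t = rng.normal(size=30)           # an arbitrary starting score vector
t_star = orthogonalize(t, Y)
print(Y.T @ t_star)               # numerically zero
```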
5.2. Why not orthogonalize X first?
After the presentation in Lahtis, several chemometricians have proposed that the simplest way to calculate an optimal OSC would be to use PCA of the matrix Z, where Z is the X-matrix orthogonalized to Y. Thus each column of Z is:
$$\mathbf{z}_k = (\mathbf{1} - \mathbf{Y}(\mathbf{Y}'\mathbf{Y})^{-1}\mathbf{Y}')\,\mathbf{x}_k$$
This calculation may work well for the training set, but since no y-values are available for new samples, the x-vectors of these new samples cannot be orthogonalized in the same manner as the training set. Hence this approach presently seems unworkable. However, this new sample problem may be solvable after further thinking, but this is left to others to pursue.


5.3. The number of OSC components


According to our limited experience, two OSC components are typically warranted with reflectance NIR spectra, where the first component often resembles a base-line correction, and the second often to some extent corrects for multiplicative effects. The filtration, i.e., removal of two OSC components from the raw reflectance NIR spectra (Fig. 1), provides quite different looking OSC corrected spectra (Fig. 2).
The number of OSC components needed in a given application can be inferred from the amount of X explained in each OSC component. When this goes below a value corresponding to an eigenvalue of 2.0, i.e., the modelled R2X of the component is less than 2/min(N,K), this indicates that more components are not warranted. This corresponds to the "eigenvalue > 1" criterion commonly used in PCA and factor analysis, but made a little safer by using 2.0 instead of 1.0. The orthogonality constraint adds some further rigor to this criterion. More stringent tests for when to terminate the OSC expansion can, of course, be developed based on cross-validation.
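The stopping rule above reduces to a one-line comparison. The sketch below is only an illustration: the function name is hypothetical, and the R2X values 0.30 and 0.005 are made up, not taken from the models in this paper.

```python
def osc_component_warranted(r2x, n_samples, n_variables, eigenvalue_limit=2.0):
    """True if a component explaining the fraction r2x of X passes the rule."""
    return r2x >= eigenvalue_limit / min(n_samples, n_variables)

# Dimensions of the ground cellulose calibration set: N = 161, K = 1201.
limit = 2.0 / min(161, 1201)
print(round(limit, 4))                             # 0.0124
print(osc_component_warranted(0.30, 161, 1201))    # True
print(osc_component_warranted(0.005, 161, 1201))   # False
```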
Since OSC with each component removes structure from X that is unrelated to Y, the remainder E will model Y better and better, also in the case when X actually contains no significant information about Y at all. Hence, OSC with the subsequent PLS (or PCR) analysis converges towards a multiple linear regression (MLR) solution. Since MLR is known to show a dangerous over-fit with multivariate collinear data, this shows that also OSC-PLS is risky if too many OSC components are allowed.
Hence, the number of OSC components should not exceed a small fraction of min(N,K), thus making the risk small for spurious relations between the corrected X and Y. In the present examples this ratio has been well below 1/10, which seems safe as long as the observations are fairly independent (i.e., not grouped). We also recommend that OSC results are validated with new prediction samples as done in the present examples.

Fig. 1. Raw NIR reflectance spectra.

Fig. 2. OSC corrected NIR reflectance spectra.

6. Related signal corrections


The target rotation method presented by Kvalheim and Karstang [8] and later reviewed by Christie [9] is a filtering method related to PCA and hence also to OSC. This target rotation method uses a specific target score vector to filter the X matrix. The idea with the target rotation is to remove known information from X (as expressed by the target vector) that masks the information of interest. But since no orthogonalization to Y is performed, there is a risk of removing information correlated to Y. Obviously, this problem can be solved by making an orthogonalization of the target score vector to Y before the target rotation. This will give an orthogonal target rotation method which may provide even better models than the original target rotation method.


Sun [5] recently published a paper using PCA for signal correction. He found the method useful for removing large base-line variation as well as varying background spectra. The present method of OSC can be seen as an orthogonalized and further modified PCA, which hence should have better properties than the approach of Sun.
Before the present approach, we investigated a direct PLS approach based on the signal matrix X and a contrast response matrix Z = 1 - YY', where Y first has been normalized to unit variance. This gave fairly good results, but the approach is less rigorous than the present OSC and was hence abandoned.

7. Scaling
The results of any projection, including OSC, are influenced by the scaling of the original data in X. In NIR applications one normally either uses un-scaled data, data scaled to unit variance (auto-scaling), or something in between these two, e.g., so called Pareto scaling [10].
A problem with scaling of the original data occurs when much of the variation in the X-data is due to light scattering and other phenomena which will be removed by the OSC-filtration. Then, the auto-scaling or Pareto scaling will be based on major variation in X that is irrelevant to the actual calibration model.
Hence the use of un-scaled X-data would seem to be more appropriate for OSC-filtering. To circumvent this difficulty, one can run the OSC-algorithm on the original (scaled or un-scaled) data and then use the filtered X-matrix to calculate a new scaling of the original data. The OSC-algorithm is then run again on the re-scaled X-data.
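The scaling options and the re-scaling step described above might be sketched as follows. This is a hedged illustration on synthetic data: the function names are hypothetical, Pareto scaling is taken in its usual sense of dividing each centered column by the square root of its standard deviation, and the OSC run that would produce the filtered matrix is left outside the sketch.

```python
import numpy as np

def scaling_weights(X, method="pareto"):
    """Column weights for un-scaled ('none'), auto ('auto') or Pareto scaling."""
    sd = X.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)           # leave constant columns unchanged
    if method == "none":
        return np.ones_like(sd)
    if method == "auto":
        return 1.0 / sd
    if method == "pareto":
        return 1.0 / np.sqrt(sd)
    raise ValueError(f"unknown scaling method: {method}")

def rescale_from_filtered(X, E_filtered, method="pareto"):
    """Center the original X and scale it with weights from the filtered matrix."""
    return (X - X.mean(axis=0)) * scaling_weights(E_filtered, method)

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 6)) * np.array([1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
X_auto = (X - X.mean(axis=0)) * scaling_weights(X, "auto")
X_pareto = (X - X.mean(axis=0)) * scaling_weights(X, "pareto")
```

In the two-pass procedure, `E_filtered` would be the output of a first OSC run, and the matrix returned by `rescale_from_filtered` would be the input to the second run.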

8. Data sets
In this study four different data sets were used for comparison of filtering (signal correction) methods. Three of the data sets are NIR data collected on cellulose derivatives in order to predict the measured viscosity. The fourth data set is NIR data on pulp samples from the pulp and paper industry on which 17 physical properties have been measured.
Each data set is divided into one calibration set and one external test set used for validation of the model's predictive ability.
8.1. Cellulose ground

This data set was collected at Akzo Nobel, Ö-vik, Sweden. The raw material for their cellulose derivative process is delivered to the factory in the form of cellulose sheets. Before entering the process the cellulose sheets are controlled by a viscosity measurement, which functions as a steering parameter for that particular batch.
For this data set NIR spectra for 181 cellulose sheets were collected after the sheets had been sent through a grinding process. Hence the NIR spectra were measured on the cellulose raw material in powder form. For calculation of a calibration model 161 sample spectra were used. The remaining 20 sample spectra were used for model validation.
X: 1201 wavelengths in the VIS-NIR region
Y: Viscosity
N (Calibration set) = 161
N (Test set) = 20
8.2. Cellulose XFM
This data set contains NIR spectra from 106 cellulose samples in powder form. The XFM cellulose is a raw material in the cellulose derivative process of Akzo Nobel, Ö-vik, Sweden.
For each of the 106 cellulose samples the viscosity was measured. N = 86 sample spectra were used for calibration, while the remaining 20 were used for model validation.
X: 1201 wavelengths in the VIS-NIR region
Y: Viscosity
N (Calibration set) = 86
N (Test set) = 20
8.3. Cellulose sheets
The data set cellulose sheets was collected at Akzo Nobel, Ö-vik, Sweden. In this data set the NIR measurements were carried out on the actual cellulose sheets. The measured viscosity for each sheet was used as response variable (Y) in the model calculations.
NIR spectra for 65 cellulose sheets were collected. N = 45 sample spectra were used for calibration and the remaining 20 were used for model validation.
X: 1201 wavelengths in the VIS-NIR region
Y: Viscosity
N (Calibration set) = 45
N (Test set) = 20

Table 2
RMSEP-values for the test set for each of the calculated models

          Raw       MSC       OSC
Ground    199.76    195.13    183.16
XFM       616.56    478.99    428.48
Sheets    138.55    130.54     78.43
Pulp       85.51     96.48     78.74

8.4. Pulp samples


This data set was collected at Assi-Domän Corporate Research, Piteå, Sweden. Here NIR spectra for 39 laboratory cooked pulps were collected. For each pulp 17 physical properties were measured. These correspond to product quality parameters commonly used in the pulp and paper industry. The 17 physical properties were used as response variables in the model calculations.
28 samples were used for the calibration and 11 samples were used for the external validation of the model.
X: 233 wavelengths in the VIS-NIR region
Y: 17 physical properties
N (Calibration set) = 28
N (Test set) = 11

9. Results and discussion


For all four data sets two OSC components were used for the filtering.
For the cellulose data sets we see that the OSC method gives better calibration models, i.e., higher Q², according to cross-validation (Table 1). Evaluation of the prediction errors for the external test sets reveals that the OSC treated data give substantially lower RMSEP values than the raw and MSC data (Table 2). Also, the OSC-filtered data give much simpler calibration models with fewer components than the ones based on the raw data (Table 3).
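The two figures of merit used in Tables 1 and 2 are not written out in the text. The sketch below assumes their conventional definitions: RMSEP as the root mean squared error over the external test set, and Q² as one minus the ratio of the cross-validated prediction error sum of squares (PRESS) to the total sum of squares of the centered response. The function names and the small numeric example are illustrative only, and pooling all elements of a multi-column Y is an assumption.

```python
import numpy as np

def rmsep(y_obs, y_pred):
    """Root mean squared error of prediction over a test set."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def q2(y_obs, y_cv_pred):
    """Cross-validated Q2 = 1 - PRESS / SS, pooled over all Y elements."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_cv_pred = np.asarray(y_cv_pred, dtype=float)
    press = np.sum((y_obs - y_cv_pred) ** 2)
    ss_total = np.sum((y_obs - y_obs.mean(axis=0)) ** 2)
    return float(1.0 - press / ss_total)

# Made-up observed and predicted values, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 2.0, 2.5, 4.0]
example_rmsep = rmsep(y_obs, y_hat)
example_q2 = q2(y_obs, y_hat)
```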
Table 1
Q²-values according to cross-validation for each of the calculated models

          Raw      MSC      OSC
Ground    0.72     0.623    0.94
XFM       0.614    0.618    0.839
Sheets    0.536    0.604    0.863
Pulp      0.54     0.54     0.57

The observed versus predicted plots for Y = viscosity show in a nice way how the fit and predictions develop from raw data via MSC to OSC, as exemplified by the data for ground samples (Fig. 3a-c).
For the pulp data set, where the Y-matrix consists of seventeen measured properties, the Q² values according to cross-validation are about the same for the three calibration models corresponding to unfiltered, MSC-filtered, and OSC-filtered data. The RMSEP value for the external test set is, however, lowest for the model calculated with OSC treated data.
The results imply that the OSC method indeed removes information from the NIR data that is not necessary for fitting of the Y-variables. Since the method gives better predictions of the test set objects, it seems that this pre-treatment removes disturbing noise from the NIR data, which also was the aim of the method. The results also demonstrate that the OSC method works for models using both single- and multi-Y responses.
For these data we see that to get good calibration models with low prediction errors some kind of scatter correction is needed. MSC seems to work in many cases but often does not really improve the models significantly. OSC, on the other hand, for these data sets gives calibration models with substantially better fit and predictive ability.
In some cases the OSC method also removes non-linear relationships between X and Y. This can, for example, be seen in the observed vs. predicted plots for the data set cellulose ground (Fig. 3a-c).
Table 3
Number of PLS components for each of the calculated models

          Raw    MSC    OSC
Ground    5      3      3
XFM       3      6      2
Sheets    4      6      1
Pulp      4      4      2

Fig. 3. Observed vs. predicted viscosity values for the PLS models calculated from the ground cellulose data set. Calibration set (filled circles), test set (open circles). (a) Results from a model based on raw data. (b) Results from a model based on MSC filtered data. (c) Results from a model based on OSC filtered data.

One can here clearly see the non-linearity in the plots for the raw data and the MSC model, while in the plot for the OSC treated data the non-linearity has disappeared. Why this removal of non-linearities between X and Y occurs is something that we cannot explain at this point. But since it may be a nice property of the method, this will be further investigated.
According to the results shown in this study, the OSC of NIR spectra seems to be a good approach for improving multivariate calibration models.

10. Conclusions
Since projection methods such as PLS evidently are affected by strong systematic variation in the predictor matrix (X) which is unrelated to the response matrix, Y, there is a need for removing such variation from X before further modelling. We have here presented an approach where signal correction (filtering) is made in such a way that the removed parts are linearly un-related (orthogonal) to the response matrix, Y. OSC seems to have additional advantages beyond improved predictability of the PLS model, such as substantially simpler (fewer components) calibration models, which facilitates the interpretation of the models.
Computationally, we have based OSC on PLS and the NIPALS algorithm, calculating one OSC component at a time. This choice was made because for us this is the easiest way to implement the orthogonality constraint, and also to make the method applicable to incomplete data matrices.
To apply OSC to the filtering of a signal matrix, one needs, of course, also a response vector or matrix, Y. This is always present in multivariate calibration applications, but in other cases of filtering it may not be available. In signal analysis, for instance, one often wants to filter time series from unwanted noise. Similarly, spectra used for the characterization of materials or products, such as pharmaceutical tablets, may look noisy and a filtering would be warranted. When one scrutinizes the objective of the characterization, however, it is often possible to construct a "fuzzy" or "soft" response matrix, Y. In time series analysis, for instance, we may want to use the signals to look for trends (linear or quadratic, or maybe exponential), and then we can construct a Y-matrix accordingly with columns varying linearly, quadratically, and exponentially with time. Analogously, if a spectral matrix from a material characterization is used for classification (e.g., bad materials vs. acceptable ones), one may be able to construct a Y-matrix corresponding to this classification.
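A "soft" response matrix of the kind suggested above for time series could be built as in the following sketch. The function name, the unit time interval and the `rate` parameter are illustrative assumptions, not part of the paper.

```python
import numpy as np

def trend_response_matrix(n_points, rate=1.0):
    """Centered Y with linear, quadratic and exponential columns over time."""
    time = np.linspace(0.0, 1.0, n_points)
    Y = np.column_stack([time, time ** 2, np.exp(rate * time)])
    return Y - Y.mean(axis=0)

Y_soft = trend_response_matrix(50)
print(Y_soft.shape)   # (50, 3)
```

Such a Y could then take the place of measured responses when a signal matrix with 50 time points per column is filtered by OSC.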
Other application areas where OSC may be found useful include 3D-QSAR, where the structures of a set of molecules are translated to a set of structure descriptor vectors by means of, e.g., CoMFA [12] or GRID [13]. The resulting X-matrix, which often has thousands of columns but just a few rows, say 15 to 50, is then related by PLS to a matrix Y with measured biological activity values. Since there often are large parts of X that are unrelated to Y, OSC may help to clean up the data before the analysis, and hence improve the predictivity and interpretability of the solution.
When the number of X-variables, K, is larger than N (the number of training samples), it is always possible to find an exactly orthogonal OSC solution, while if K < N this is not always possible. This also means that for K > N, there are infinitely many OSC solutions, where the present algorithm is set up to find the one that models as much of X as possible in each component. This solution may not always be the best, however, and additional constraints on the OSC w vectors may be warranted. This question and other possible OSC modifications will hopefully be investigated in this laboratory in the near future, together with the performance of OSC in other applications than multivariate calibration based on NIR spectroscopy.

Acknowledgements
Financial support to SW and HA by the Swedish Natural Science Research Council (NFR) and the Swedish Foundation for Strategic Research is gratefully acknowledged. We are most grateful for permission to use data from our collaborations with Akzo Nobel Surface Chemistry, Ö-vik, and Assi-Domän Corporate Research, Piteå, Sweden.
Appendix A. The OSC NIPALS algorithm
1. Optionally transform, center, and scale the data to give the "raw" matrices X and Y.
2. Start by calculating the first principal component of X, with the score vector, t.
3. Orthogonalize t to Y.
$$\mathbf{t}_{\mathrm{new}} = (\mathbf{1} - \mathbf{Y}(\mathbf{Y}'\mathbf{Y})^{-1}\mathbf{Y}')\,\mathbf{t}$$
4. Calculate a weight vector, w, that makes $\mathbf{X}\mathbf{w} = \mathbf{t}_{\mathrm{new}}$. This is done by a PLS estimation giving a generalised inverse $\mathbf{X}^{-}$.
$$\mathbf{w} = \mathbf{X}^{-}\mathbf{t}_{\mathrm{new}}$$
5. Calculate a new score vector from X and w.
$$\mathbf{t} = \mathbf{X}\mathbf{w}$$
6. Check for convergence by testing if t has stabilized. Convergence is reached if $\|\mathbf{t} - \mathbf{t}_{\mathrm{old}}\| / \|\mathbf{t}\| < 10^{-6}$. If not converged, return to step 3; otherwise, continue to step 7.
7. Compute a loading vector, p (needed for orthogonality between components).
$$\mathbf{p}' = \mathbf{t}'\mathbf{X} / (\mathbf{t}'\mathbf{t})$$
8. Subtract the "correction" from X, to give residuals, E.
$$\mathbf{E} = \mathbf{X} - \mathbf{t}\mathbf{p}'$$
9. Continue with the next component using E as X, then another one, etc., until satisfaction.
An important remark is that if too many OSC components are subtracted from the original X-matrix, we will end up with an MLR solution, since all information not correlated to Y then has been removed. Using only two or three components for the filtering will not, however, result in problems of that kind.
10. New samples (the prediction set) are corrected using W and P of the calibration model. For each new observation vector, $\mathbf{x}_{\mathrm{new}}'$:
$$t_1 = \mathbf{x}_{\mathrm{new}}'\mathbf{w}_1$$
$$\mathbf{e}_1' = \mathbf{x}_{\mathrm{new}}' - t_1\mathbf{p}_1'$$
$$t_2 = \mathbf{e}_1'\mathbf{w}_2$$
$$\mathbf{e}_2' = \mathbf{e}_1' - t_2\mathbf{p}_2'$$
and so on. After A components, the final residuals are the corrected x-data, which then give the predicted values of $\mathbf{y}_{\mathrm{new}}'$ (predicted response values for the new sample).
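The steps of this appendix can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the PLS-based generalised inverse of step 4 is replaced by the Moore-Penrose pseudoinverse (`np.linalg.pinv`), missing data are not handled, X and Y are assumed to be centered already, and the function names `osc_fit` and `osc_apply` are hypothetical.

```python
import numpy as np

def osc_fit(X, Y, n_components=2, tol=1e-6, max_iter=100):
    """Sketch of the OSC NIPALS loop for centered X (N x K) and Y (N x M)."""
    E = np.array(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(E), -1)
    W, P = [], []
    for _ in range(n_components):
        # Step 2: score vector of the first principal component of current X.
        U, s, _ = np.linalg.svd(E, full_matrices=False)
        t = U[:, 0] * s[0]
        # Assumption: pseudoinverse instead of a PLS-based generalised inverse.
        E_pinv = np.linalg.pinv(E, rcond=1e-10)
        for _ in range(max_iter):
            t_old = t
            # Step 3: orthogonalize t against Y.
            t_new = t - Y @ np.linalg.lstsq(Y, t, rcond=None)[0]
            # Steps 4-5: weights that reproduce t_new, then the new score.
            w = E_pinv @ t_new
            t = E @ w
            # Step 6: relative change of the score vector.
            if np.linalg.norm(t - t_old) / np.linalg.norm(t) < tol:
                break
        # Step 7: loading vector; step 8: subtract the correction.
        p = E.T @ t / (t @ t)
        E = E - np.outer(t, p)
        W.append(w)
        P.append(p)
    return np.array(W).T, np.array(P).T, E

def osc_apply(X_new, W, P):
    """Step 10: correct new spectra (rows, pre-processed like the training set)."""
    E = np.array(X_new, dtype=float)
    for w, p in zip(W.T, P.T):
        t = E @ w
        E = E - np.outer(t, p)
    return E

rng = np.random.default_rng(3)
N, K = 12, 40
X = rng.normal(size=(N, K)); X -= X.mean(axis=0)
y = rng.normal(size=(N, 1)); y -= y.mean(axis=0)
W, P, E = osc_fit(X, y, n_components=2)
print(np.abs(y.T @ (X - E)).max())   # removed part is orthogonal to y (K > N)
```

With K > N and a full-rank centered X, the pseudoinverse reproduces the orthogonalized score exactly, so the inner loop stops after two passes. A PLS estimate with few components, as in step 4, would instead regularize the weights, which matters when w is applied to new samples.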

References
[1] H. Martens, T. Naes, Multivariate Calibration, Wiley, New York, 1989.
[2] A. Savitzky, M.J.E. Golay, Anal. Chem. 65 (1993) 3279-3289.
[3] P. Geladi, D. MacDougall, H. Martens, Linearization and scatter-correction for near-infrared reflectance spectra of meat, Appl. Spectrosc. 3 (1985) 491-500.
[4] P.C. Williams, K. Norris, Near-Infrared Technology in Agricultural and Food Industries, American Cereal Association, St. Paul, MN, 1987.
[5] J. Sun, Statistical analysis of NIR data: data pretreatment, J. Chemometr. 11 (1997) 525-532.
[6] M. Baroni, S. Clementi, G. Cruciani, G. Constantino, D. Riganelli, Predictive ability of regression models: Part 2. Selection of the best predictive PLS model, J. Chemometr. 6 (1992) 347-356.
[7] A. Höskuldsson, PLS regression methods, J. Chemometr. 2 (1988) 211-228.
[8] O.M. Kvalheim, T.V. Karstang, Interpretation of latent-variable regression models, Chemometrics and Intelligent Laboratory Systems 7 (1989) 39-51.
[9] O.H.J. Christie, Data laundering by target rotation in chemistry-based oil exploration, J. Chemometr. 10 (1996) 453-461.
[10] S. Wold, PLS for multivariate linear modelling, in: H. van de Waterbeemd (Ed.), QSAR: Chemometric Methods in Molecular Design, Methods and Principles in Medicinal Chemistry, Vol 2, Verlag Chemie, Weinheim, Germany, 1995.
[11] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra, Appl. Spectrosc. 43 (1989) 772-777.
[12] R.D. Cramer III, D.E. Patterson, J.D. Bruce, Comparative molecular field analysis (CoMFA): I. Effect of shape on binding of steroids to carrier proteins, J. Am. Chem. Soc. 110 (1988) 5959-5967.
[13] P.J. Goodford, A computational procedure for determining energetically favourable binding sites on biologically important macromolecules, J. Med. Chem. 28 (1985) 849-857.