Quality Analysis With PCA and PLS - MTK337
Description
Main learning outcomes
Data table
Preparing the data
Insert category variables
Check column (variable) sets
Define sample sets from category variable column
Objective 1: Find the main sensory qualities
Make a PCA model
Interpret the variance plot in the PCA overview
Interpretation of the scores plot for the PCA
Interpretation of the correlation loadings plot
Interpretation of scores and loadings
Interpretation of the influence plot
Objective 2: Explore the relationships between instrumental/chemical data (X) and sensory data (Y)
Make a PLS regression model
Interpretation of the variance plot
Interpretation of the scores plot
Interpretation of the loadings and loading weights plot
Interpretation of the predicted vs. reference plot
Objective 3: Predict user preference from sensory measurements
Make a PLS regression model for preference
Interpretation of the regression overview
Interpretation of the regression coefficients
Open result matrices in the Editor
Predict preference for new samples
Interpretation of Predicted with Deviation
Check the error in original units – RMSE
Description
This tutorial aims to use multivariate techniques to analyze the quality of raspberry jam in order to
determine which sensory attributes are relevant to “perceived quality”. The analysis will cover three
aspects as follows.
1. A trained tasting panel has provided scores for a number of different variables using descriptive
sensory analysis. In this tutorial the first objective is to find the main sensory quality properties
relevant for raspberry jam.
2. The second objective is to find a way of rationalizing quality control, since the use of taste
panels is very costly. In this application a number of laboratory instrumental measurements
were investigated to potentially replace the sensory testing panel.
3. The third and final objective of this application is to predict consumer preference for
raspberry jam from descriptive sensory analysis. PLS regression modeling was investigated as a
way to find a relationship between sensory data and preference.
Data table
Click the following link to import the Tutorial B data set used in this tutorial.
The analysis is based on 12 samples of jam (objects), selected to span the expected, normal quality
variations inherent in such products. Several observations and measurements were made on the
samples.
The samples were taken from four different cultivars, at three different harvesting times. The table
below describes the sampling plan for this analysis.
Sample description
No  Name   Cultivar  Harvest time
1   C1-H1  1         1
2   C1-H2  1         2
3   C1-H3  1         3
4   C2-H1  2         1
5   C2-H2  2         2
6   C2-H3  2         3
7   C3-H1  3         1
8   C3-H2  3         2
9   C3-H3  3         3
10  C4-H1  4         1
11  C4-H2  4         2
12  C4-H3  4         3
Note that the agronomic production variables are not used as input variables in any of the matrices.
These represent known information which may be extremely valuable for the interpretation of the
results of the data analysis. They will be utilized as category variables in the analyses performed in
this tutorial.
Three chemical variables and three instrumental (APHA colorimetry) variables were also measured on
the samples tested by the sensory panel. These are described in the table below.
Instrumental variables
No Name Method
1 L Lightness
2 a Green-red axis
3 b Blue-yellow axis
4 Absorbance Absorbance
A trained sensory panel evaluated 12 different sensory attributes of the raspberries used to make the
jam, using a 1-9 point intensity scale. The entries in the data matrix are the average ratings over all
judges. The observed variables are listed in the table below.
Sensory variables
No Name Type
1 Redness Redness
3 Shininess Shininess
6 Sweetness Sweetness
7 Sourness Sourness
8 Bitterness Bitterness
9 Off-flav Off-flavor
10 Juiciness Juiciness
11 Thickness Viscosity/thickness
114 representative consumers were invited to taste the 12 jam samples used in this application. They
each provided an individual preference score on a scale from 1-9. The average over all consumers for
each sample is provided in the data table.
The data table, “JAMdemo”, consists of 20 samples. The first twelve samples will be used to develop
the models in this application and are hereafter referred to as training samples.
Eight new jam samples were assessed by the trained panel and given a sensory rating. These are the
last eight samples in the table and are referred to as Prediction samples. The preference and
instrumental values are missing for these samples because those measurements were not performed.
The calibration model will be used to predict the preference for these eight samples.
Task
How to do it
Open the data table by following the link above. It is already organized into two row sets, for
training and prediction. The different types of variables have been defined in the column sets
Instrumental, Sensory and Preference, based on the definitions in the data tables above. These
defined sets can be seen by expanding the folders in the project navigator.
Some additional information about the cultivar and harvest time now needs to be added to this data
as two new columns.
To select a column, click on the header cell containing the column number. Activate the first column
of the table, then right-click and select Insert - Category Variable, or use the menu and
select Edit - Insert - Category Variable.
In the dialog box, enter the category variable name “Harvest Time”. Keep the default option, Specify
the level manually, selected.
Enter the level names: “H1”, “H2” and “H3” followed by a click on Add.
Click OK.
In the new column, click in each cell and select the appropriate value for each sample as given in the
sample names.
Note: Category variable cells are orange in the editor to distinguish them
from ordinary variables.
Add a second column in the same way, after highlighting the first column: Edit - Insert - Category
Variable. In the dialog box, enter the category variable name “Cultivar”.
Enter the level names: “C1”, “C2”, “C3”, and “C4” followed by a click on Add.
Click OK.
In the new column, double click in each cell and select the appropriate value for each sample as given
in the sample names. Alternatively, select all cells of each cultivar in sequence and fill in the category
level using the right-click Fill function.
The Tutorial_b data table displayed in the Editor (after insertion of Cultivar and Harvest Time)
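Because the category levels are encoded in the sample names, they could also be derived programmatically instead of being selected cell by cell. The following Python sketch works outside The Unscrambler and is only an illustration: the function name `parse_levels` and the name pattern are assumptions based on the names in the sample description table.

```python
import re

# Sample names as listed in the sample description table (cultivar, then harvest time).
names = [f"C{c}-H{h}" for c in range(1, 5) for h in range(1, 4)]

def parse_levels(name):
    """Split a name such as 'C2-H3' into its cultivar and harvest-time levels."""
    match = re.fullmatch(r"(C\d)-(H\d)", name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    return {"Cultivar": match.group(1), "Harvest Time": match.group(2)}

# One dictionary of category levels per training sample.
categories = [parse_levels(n) for n in names]
```

Each entry of `categories` then holds the two levels that would be selected by hand in the Editor for the corresponding row.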
Task
Check that the three column (Variable) Sets: “Instrumental”, “Sensory” and “Preference” have been
defined.
Verify the existence of two sample sets “Training” samples and “Prediction” samples. These sets can
be visualized in the project navigator.
How to do it
To create column and row ranges, select Edit - Define Range to open the Define Range dialog.
Three sets have been predefined in the project Tutorial_B data set.
To verify these definitions use the Edit - Define range and inspect the information in this dialog.
Additional row sets will be added for the various levels of the category variables harvest time and
cultivar.
How to do it
Begin by selecting the column “Cultivar” in the data editor, then select Edit - Group Rows…, which
opens the Create row ranges from column dialog.
The column that was selected, “Cultivar”, is already in the Cols field.
Click OK.
Four row ranges have been added automatically. Look in the Row folder to see them:
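Grouping rows by a category column amounts to collecting the row numbers that share each level. A minimal Python sketch of that idea follows; the helper name `group_rows` is hypothetical and the code does not reproduce the software's internals.

```python
from collections import defaultdict

# Cultivar level of each of the 12 training samples, in table order.
cultivar = ["C1"] * 3 + ["C2"] * 3 + ["C3"] * 3 + ["C4"] * 3

def group_rows(levels):
    """Map each category level to the 1-based row numbers that carry it."""
    groups = defaultdict(list)
    for row, level in enumerate(levels, start=1):
        groups[level].append(row)
    return dict(groups)

row_sets = group_rows(cultivar)
```

With four cultivar levels, `row_sets` contains four row ranges, one per level, matching the four ranges added to the Row folder.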
Make a PCA model using the column set “Sensory” as the variable set.
How to do it
Select Tasks – Analyze - Principal Component Analysis… Specify the following parameters in the
dialog box:
Model inputs
Data matrix: “JAMdemo” (20x21)
Rows: Training (12)
Cols: Sensory (12)
Maximum components: 6
Check the Identify outliers and Mean center data boxes, if these check boxes are not
already selected.
Principal Component Analysis dialog: Model inputs
Weights
From the Weights tab verify that the weights are all 1.0 (constant).
No weighting is used in this model as the sensory panel is known to be well trained.
However, sensory variables are often weighted when there is evidence that the panel is
not well trained, or when investigating relationships with other variables. The most
common weighting to use is 1/SDev.
Weights tab dialog
Validation
From the Validation tab select the option Cross Validation and press Setup which
opens the Cross Validation Setup dialog. Here select Full from the drop-down list for
cross validation method.
Validation Dialog
This validation method is more time consuming than other options, but the estimate of the residual
variance is more reliable.
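Full cross validation is commonly understood as leave-one-out: each sample is held out once while the model is fitted on the remaining samples, which is why it costs more computation than segmented schemes. The sketch below, in Python, only enumerates the splits; the function name is hypothetical.

```python
def full_cross_validation_splits(n_samples):
    """Yield (training_rows, held_out_row) pairs, one pair per sample."""
    for held_out in range(n_samples):
        training = [i for i in range(n_samples) if i != held_out]
        yield training, held_out

# For the 12 training samples this gives 12 sub-models of 11 samples each.
splits = list(full_cross_validation_splits(12))
```

The number of sub-models grows with the number of samples, so the cost is modest for 12 jams but can become noticeable for large tables.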
Ensure that NIPALS is selected in the algorithm pane. Otherwise, the second principal component
might change direction. This has no impact on the drawn conclusions, but might be confusing for
beginners.
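The remark about direction reflects a general property of PCA: a component's scores and loadings can both be multiplied by -1 without changing the model, so different algorithms may return mirrored plots. The following is a textbook NIPALS sketch in Python with NumPy, run on made-up data rather than the jam table; it is not claimed to match the software's implementation in its starting vector or convergence criterion.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """Return scores T and loadings P of mean-centered X, one component at a time."""
    E = X - X.mean(axis=0)                              # mean centering
    T = np.zeros((X.shape[0], n_components))
    P = np.zeros((X.shape[1], n_components))
    for a in range(n_components):
        t = E[:, np.argmax(E.var(axis=0))].copy()       # start from the most variable column
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)                       # project the data onto the scores
            p /= np.linalg.norm(p)                      # normalize the loading vector
            t_new = E @ p                               # update the scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        E = E - np.outer(t, p)                          # deflate before the next component
    return T, P

# Made-up data with two dominant directions (an assumption for illustration only).
rng = np.random.default_rng(0)
scores = rng.normal(size=(12, 2)) * np.array([5.0, 2.0])
X = scores @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(12, 5))
T, P = nipals_pca(X, 2)
```

Flipping the sign of both `T[:, 1]` and `P[:, 1]` reconstructs exactly the same data, which is why a mirrored PC 2 does not affect any conclusion.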
Click OK to start the PCA. When the analysis is complete, the program asks, “Do you
want to view plots of model PCA now?”. Click Yes to see the PCA Overview plots. A new node has been
added to the project navigator containing all the PCA result matrices and plots.
How to do it
The PCA Overview contains the most commonly used plots for interpreting PCA models, including:
Scores plot
Loadings plot
Influence plot
The scores plot is a map of the samples, and shows how they are distributed. It can be used to isolate
samples that are similar, or dissimilar to one another. In this analysis, the plot labels show that PC-1
explains 58% and PC-2 28% of the total variance in the data. The explained variance curve (in the
lower right corner) is an excellent tool for selecting the optimal number of components in the model.
The explained variance increases until PC 5 is reached. The software does suggest the optimal
number of PCs for a model, but it is up to the user to analyze the data and confirm the optimal
number of PCs in this model, usually based on this plot.
The highest explained variance is found with 5 PCs, but a model using 3 PCs explains nearly as much
of the variation. A simple (parsimonious) model is usually more robust than a complex one, and
easier to interpret, so it is advisable to work with as few PCs as possible. The info box in the
lower left corner of the main workspace indicates that 3 PCs are considered optimal for this model.
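The trade-off between explained variance and model size can be made concrete with a small calculation. The Python sketch below computes cumulative calibration explained variance from the singular values of the centered data and applies a simple selection rule. The rule (`fewest_components` with a `margin` in percentage points) is a hypothetical stand-in, since the tutorial does not describe how the software arrives at its suggestion, and validation variance would additionally require cross validation.

```python
import numpy as np

def cumulative_explained_variance(X):
    """Percent of total calibration variance explained by 1, 2, ... PCs."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return 100.0 * np.cumsum(s**2) / np.sum(s**2)

def fewest_components(cumulative, margin=5.0):
    """Hypothetical rule: fewest PCs within `margin` points of the maximum."""
    target = np.max(cumulative) - margin
    return int(np.argmax(np.asarray(cumulative) >= target)) + 1

# Made-up data, only to exercise the functions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 6))
cumulative = cumulative_explained_variance(X_demo)
suggested = fewest_components(cumulative, margin=5.0)
```

Applied to the calibration figures quoted in this tutorial (58% after PC-1, 58% + 28% = 86% after PC-2, 96% after three PCs), a margin of 5 points would select three components.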
Info Box
Task
How to do it
Activate the lower right plot by clicking in it. Toggle between the Explained / Residual buttons in
the toolbar shortcuts.
The explained variance is now converted to residual variance. The information is the same, but
presented in another way. The residual variance is well suited to finding the optimal number of PCs to
use in a model, while the explained variance is a better measure of how much of the
variation is described by the model. The plot layout can be changed to a bar chart by using the plot
layout shortcut.
The model with 3 PCs describes 92% of the total validation variance in the data; for calibration it is
96%. These values may be obtained by clicking on the specific data point in the plot.
Use the toolbar buttons to plot only the calibration variance curve, only the validation variance
curve, or both.
Task
Interpret Scores plot. Use different plot options for ease of interpretation.
How to do it
The scores plot shows the projected locations of the samples onto the calculated PCs. By studying
patterns in the samples a meaningful interpretation of the PCs may be possible.
The scores plot for this analysis indicates that the 12 samples are not arranged in a random way. By
moving from left to right along this plot, a pattern can be observed where samples harvested at time
H1 are mainly found on the left. These then change to H2 and finally H3. Moreover, moving from the
top to the bottom, C1 samples occupy the top region, followed by C2, then C3, and finally C4.
The row sets based on the category variables that were inserted into the data table can be used to
better visualize these trends.
In the scores plot, right-click and select Sample Grouping to open the dialog where different
row sets can be used for grouping and color-coding the plot.
Select the Value of variable and the Cultivar category variable. In Labels, select Name to display the
real name of each sample.
The marker color, shape and size can be customized here for optimized viewing of the data.
When the desired settings have been defined, click OK to complete the operation.
Repeat the above sample grouping process, this time using the category variable Harvest Time.
Task
How to do it
Activate the X-Loadings plot by clicking in it, then use the corresponding shortcut button to switch
it to the correlation loadings plot.
The Correlation Loadings plot may be used to study the variable correlations that exist in a particular
data set.
The plot shows that two variables (redness and color) have an extreme position to the right of the
plot along PC1. They are close to each other (i.e. they are highly positively correlated), far from
the center, and very close to the edge of the 100% explained variance ellipse. This also means that
samples lying to the right of the scores plot have higher values for those two variables.
Along the vertical axis (PC2), two variables, R.SMELL and R.FLAV, have high negative values.
They lie opposite the variable OFF FLAV, which has high positive values for this PC. This
indicates that raspberry smell and flavor correlate positively with each
other, and negatively with off-flavor.
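Correlation loadings are usually defined as the correlation between each original variable and each score vector, so a point on the outer ellipse corresponds to a variable whose variance is fully explained by the plotted components. The Python sketch below uses that general definition on made-up data; the PCA is computed by SVD for brevity, and the function name is an assumption.

```python
import numpy as np

def correlation_loadings(X, n_components=2):
    """Correlation between every variable (rows) and every score vector (columns)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]          # PCA scores
    # Centered variables and scores both have zero mean, so the correlation
    # reduces to a normalized inner product.
    numerator = Xc.T @ T
    denominator = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(T, axis=0))
    return numerator / denominator

# Made-up data standing in for a samples-by-variables table.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(12, 4))
R = correlation_loadings(X_demo, n_components=2)
explained_fraction = (R**2).sum(axis=1)                 # per variable, for the 2 PCs shown
```

A variable with `explained_fraction` close to 1 would sit near the 100% ellipse, as redness and color do in the plot described above.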
Task
How to do it
The Scores plot and Correlation Loadings plot show that samples C2H3 and C1H3 have high color and
redness intensities, while sample C1H2 is more likely to have an off-flavor character. In general,
samples located in a given region of a two-component scores plot have high values of the variables
located in the same region of the corresponding loadings plot, provided that the plotted PCs
describe a large proportion of the variance.
PC 3 describes the variation in sweetness, bitterness and chewing resistance. Confirm this by
activating the loadings plot (upper right quadrant), selecting Plot - Loadings, and displaying
PC 1 vs. PC 3. In this new plot, the horizontal axis is unchanged (PC1) and the vertical axis now
shows PC3.
Interpret the influence plot, which is used for the detection of outliers.
How to do it
The influence plot is displayed in the lower left quadrant of the PCA Overview. The strongest outliers
are placed in the upper right corner of the plot, i.e. they have a large leverage and a high residual
variance. In the current analysis, there is no evidence of outliers.
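The two axes of an influence plot can be approximated with standard formulas: leverage measures how far a sample lies from the model center in score space, and residual variance measures how much of the sample the model leaves unexplained. The Python sketch below uses common textbook definitions on made-up data with one deliberately shifted row; the exact scaling used by the software may differ.

```python
import numpy as np

def influence_statistics(X, n_components):
    """Return per-sample leverage and residual variance for a PCA model."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]          # scores
    P = Vt[:n_components].T                             # loadings
    E = Xc - T @ P.T                                    # residuals after n_components
    leverage = 1.0 / X.shape[0] + np.sum(T**2 / np.sum(T**2, axis=0), axis=1)
    residual_variance = np.mean(E**2, axis=1)
    return leverage, residual_variance

# Made-up data; row 3 is shifted on purpose so that it behaves like an outlier.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(12, 5))
X_demo[3] += 10.0
leverage, residual_variance = influence_statistics(X_demo, n_components=2)
```

A sample that is high on both quantities would appear in the upper right corner of the plot and deserves a closer look before it is kept in the model.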
All of the results for the PCA are now part of the project Tutorial_B. Save the project to capture the
PCA results. The next steps in this tutorial will make use of the sensory, instrumental and preference
data.
Close the PCA overview by selecting its name in the navigation bar at the bottom of the viewer and
right clicking to select Close.
Task
Make a PLS regression model that predicts the variations in sensory variables from instrumental and
chemical variables.
How to do it
Select Tasks - Analyze - Partial Least Squares Regression… and specify the following parameters in
the Regression dialog:
Press All to change the weighting of all variables at the same time. Variables can also
be selected by clicking on them in the list; hold the Ctrl key down to
select several variables. Choose the A / (SDev + B) button with the constants A = 1
and B = 0. Check that the weights change in the list.
All variables are weighted by dividing them by their own standard deviations. This
allows all variables to contribute to the model, regardless of whether they have a small
or large standard deviation from the outset; only the systematic variation is of interest
here.
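The weighting A / (SDev + B) with A = 1 and B = 0 reduces to 1/SDev. A small Python sketch with made-up columns shows the effect; it assumes the sample standard deviation (divisor n - 1), and the function names are hypothetical.

```python
import statistics

def sdev_weights(columns, a=1.0, b=0.0):
    """Weight A / (SDev + B) for each column, using the sample standard deviation."""
    return [a / (statistics.stdev(col) + b) for col in columns]

def apply_weights(columns, weights):
    """Multiply every value in a column by that column's weight."""
    return [[w * v for v in col] for col, w in zip(columns, weights)]

# Made-up columns on very different scales.
columns = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
weights = sdev_weights(columns)            # A = 1, B = 0, i.e. 1/SDev
weighted = apply_weights(columns, weights)
```

After weighting, both columns have a standard deviation of 1, so neither dominates the model merely because of its measurement scale.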
Now go to the Y Weights tab and do the same. Do not click Finish, but go to the
Validation tab.
Validation tab
Select Cross validation from the Validation tab.
Press the Setup button to access the Cross Validation Setup dialog and choose Full
from the drop-down list. Test set validation or cross validation is always recommended
when developing final models. Ensure that NIPALS is selected in the algorithm pane.
Click Finish in the regression dialog when all parameters have been set up. The computation of the
model will begin. After PLS analysis is completed, the system will ask “Do you want to view the plots of
model PLS now?”.
Click Yes to see the PLS Overview plots. A new node, PLS, has been added to the project navigator.
This overview provides the most useful and common predefined result plots for PLS, including loading
weights and residuals. The model can be reviewed at any time by selecting
any of the result plots under the PLS - Plots node in the project navigator. In this exercise, several Y
responses were used for model development, so the overview results for each
response are available by choosing the Y variable of interest in the toolbar. With multiple
responses, the non-significant variables can be determined for each response, and the analysis
shows which sensory responses can best be predicted from the
instrumental measurements without making a separate PLS model for each response. When the
Predicted vs. reference plot (lower right quadrant) is active, the name of the Y variable being
analyzed appears in the toolbar. Another Y response can be chosen from the
drop-down menu, or one can scroll through the responses using the arrows on the right.
Interpret the explained variance curve, which can be shown as residual variance, or as explained
variance. The two different views are useful for different tasks.
How to do it
The Y-explained variance plot is in the lower left quadrant. This plot can be changed to the residual
variance plot by using the toolbar, and to the X-explained variance by clicking on the X button.
A local maximum is achieved for five PLS factors. The next task is to determine why the validation
curve does not follow the general trend. This can be done by looking at the explained variance for the
variables individually.
From the Plot menu select Variances and RMSEP - X- and Y-Variance… Make sure the bottom plot
shows the Explained Variance for the 12 individual Y variables. If not, change it by using the toolbar
shortcut. Also, select Cal rather than Total from the toolbar shortcuts.
Add a legend to the plot by right-clicking and selecting Properties. Select Legend, and check the
Visible box to add the legend to the plot.
PLS, Explained Validation Variance Plot displayed for the 12 individual Y-variables
The conclusion reached from the residual variance curve was that two PLS factors were optimal. The
variables that are well described are reflected in the information conveyed by these factors.
About 85% of the color variation (variables 1 and 2), and 80% of the variation in sweetness (variable 6)
can be explained by a combination of the chemical and instrumental variables.
Note that only 23% of the total Y-variance is explained by the model using two factors.
Task
How to do it
Return to the Regression Overview plot (by selecting it from the Plots node in the project navigator).
The Scores plot is always found in the upper left quadrant of the overview. The scores plot shows
patterns in the samples, which are often difficult to see without additional visual aids. Use the
category variables as markers, as was done in “Interpretation of the scores
plot” for the PCA model: highlight the scores plot and right-click to
select Sample Grouping. The category variable Harvest Time will be used for the sample grouping.
PLS factor 1 describes the harvesting time. Harvest time 3 is found on the right in the plot and harvest
time 1 on the left. The scores plot does not reveal information about the cultivars.
A comparison with the loadings plot provides more information. Interpret the two plots (Scores and
Loadings) by analyzing them together.
Task
How to do it
The loadings plot is located in the upper right quadrant of the Regression Overview. Activate it (if it is
present), or choose it from the project navigator under the PLS - Plots node. Make sure both X and Y
loadings are plotted.
To interpret variable relationships, visualize straight lines between the variables through the origin.
Variables along the same line, far from the origin, may be correlated. (Negatively correlated when
situated on opposite sides of the origin.)
The spectrophotometric color measurements (L, a, and b) appear to be strongly negatively correlated
with color intensity and redness. Sweetness is, as expected, strongly negatively correlated with
measured Acidity. R. Flavor, however, shows only weak correlation with the PLS factors (near the
origin = low PLS loadings).
The regression coefficients may also be analyzed to understand which X variables are important in
describing each of the Y responses. These can be selected from the project navigator, or from the
menu Plot - Regression Coefficients - Raw coefficients (B) - Line. The coefficients for each of the Y
responses can be displayed by selecting them from the drop-down list in the toolbar.
Objective 1 concluded that the jam quality varied with respect to color, flavor, and
sweetness. The results so far in Objective 2 show that the chemical and instrumental variables
mainly predict variations in color and sweetness, as indicated by the low explained Y-variance of
Flavor. The Y-variable Flavor therefore cannot be replaced by the present set of X-variables,
i.e. there is no information in the chemical and instrumental measurements related to the
Flavor of the jam samples.
Other instrumental X-variables, e.g. gas chromatographic data, might improve the prediction of
flavor for the raspberry jam data.
Task
How to do it
The predicted vs. reference plot in the regression overview currently displays the results for the first
Y-variable, in this case, “Redness”.
PLS, Predicted vs. Reference Plot for variable “Redness”, model with two factors
Use the drop-down list in the toolbar to observe the prediction quality for the other variables measured in
this analysis. Make sure these plots are displayed for two PLS factors, as this is the correct number for
this model. Note that for several of the properties, including raspberry flavor, raspberry smell, and
off-flavor, the instrumental values do not provide any real information. This analysis shows that the
chosen instrumental measurements are not a good substitute for the sensory analysis of these jams.
Task
Make a PLS regression model for describing the relationships between sensory data and preference.
How to do it
From the Main Menu, select Tasks - Analyze - Partial Least Squares Regression…, and specify the
following parameters in the PLS Regression dialog:
Model Inputs
Predictors
X data set: “JAMdemo”
Rows/Samples: Training (12)
Cols/X-variables: Sensory (12)
Responses
Y data set: “JAMdemo”
Rows/Samples: Training (12)
Cols/Y-variables: Preference (1)
Maximum components: 6
PLS Regression Dialog
Weights in X and Y
All variables must be standardized with the option 1/SDev.
Select the X Weights tab and weight all the X variables with 1/SDev so that each
variable contributes equally in the modeling step. Also weight the Preference values
(Y) by 1/SDev in the Y Weights tab.
Validation
Full Cross Validation
Press Setup to access the Cross Validation Setup dialog and choose Full cross
validation as the cross validation method.
Press OK.
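The model set up above (1/SDev weights, a single response, full cross validation) can be mimicked outside the software. The Python sketch below implements a textbook NIPALS PLS1 and a leave-one-out RMSECV on made-up data. It is an illustration under stated assumptions, not the software's implementation: here the weights are recomputed inside every cross-validation segment, and all function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """NIPALS PLS1 on mean-centered, 1/SDev-weighted data; returns raw-scale (b0, b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    x_sd, y_sd = X.std(axis=0, ddof=1), y.std(ddof=1)
    E = (X - x_mean) / x_sd                      # weighted X
    f = (y - y_mean) / y_sd                      # weighted y
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = E.T @ f
        w /= np.linalg.norm(w)                   # loading weights
        t = E @ w                                # scores
        tt = t @ t
        p = E.T @ t / tt                         # X loadings
        c = f @ t / tt                           # y loading
        E = E - np.outer(t, p)                   # deflate X
        f = f - c * t                            # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b_w = W @ np.linalg.solve(P.T @ W, q)        # coefficients for the weighted data
    b = b_w * y_sd / x_sd                        # coefficients in original units
    return y_mean - x_mean @ b, b

def rmsecv(X, y, n_factors):
    """Leave-one-out root mean square error of cross validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b0, b = pls1_fit(X[keep], y[keep], n_factors)
        errors.append(y[i] - (b0 + X[i] @ b))
    return float(np.sqrt(np.mean(np.square(errors))))

# Made-up data: 12 samples, 3 predictors, one response.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(12, 3))
y_demo = 3.0 + X_demo @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=12)
errors_by_factor = [rmsecv(X_demo, y_demo, a) for a in (1, 2, 3)]
```

Plotting `errors_by_factor` against the number of factors gives the same kind of curve that is used in the tutorial to choose the number of PLS factors.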
Task
A new PLS node has been added to the project navigator. Rename this to “PLS Sensory” by highlighting
it, then right clicking and selecting the Rename option. Interpret the model using the regression
overview plots and other diagnostic tools available.
How to do it
It is of primary interest to determine how well the model can predict new values. The
residual variance and the Predicted vs. reference plots are therefore the most informative.
Activate the explained variance plot in the lower left quadrant, and change it to the residual Y variance
plot by using the toolbar shortcuts. The prediction error tapers off significantly after two PLS
factors, so two factors are taken as optimal for this model.
Activate the predicted vs. reference plot and set it to display 2 PLS factors, using the arrows in
the toolbar.
Turn on the regression line and the target line with the toolbar shortcuts.
It can be observed that the predictions are of good quality. Some samples are not so well predicted,
but the overall correlation is satisfactory.
There are two kinds of regression coefficients, Bw and B. The Bw coefficients are calculated from the
weighted data table and are used for interpretation. The B coefficients (raw) are calculated from the
raw data table and are used for predictions.
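For any linear model fitted on weighted data, the weighted and raw coefficients differ only by a rescaling with the X and Y weights. The Python sketch below demonstrates this with ordinary least squares on made-up data, used here instead of PLS only to keep the example short; the function name `raw_from_weighted` is hypothetical.

```python
import numpy as np

def raw_from_weighted(b_w, x_weights, y_weight):
    """Convert coefficients fitted on weighted data back to original units."""
    return np.asarray(b_w) * np.asarray(x_weights) / y_weight

# Made-up predictors on very different scales.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(12, 3)) * np.array([1.0, 10.0, 100.0])
y_demo = X_demo @ np.array([2.0, 0.3, -0.01]) + 0.1 * rng.normal(size=12)

Xc, yc = X_demo - X_demo.mean(axis=0), y_demo - y_demo.mean()
wx = 1.0 / X_demo.std(axis=0, ddof=1)       # 1/SDev weights for X
wy = 1.0 / y_demo.std(ddof=1)               # 1/SDev weight for y

b_w = np.linalg.lstsq(Xc * wx, yc * wy, rcond=None)[0]   # fitted on weighted data
b_raw = np.linalg.lstsq(Xc, yc, rcond=None)[0]           # fitted on raw data
b_converted = raw_from_weighted(b_w, wx, wy)
```

The weighted coefficients are directly comparable across variables, which is why they suit interpretation, while the converted values apply to unweighted measurements and are therefore the ones needed for prediction.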
Task
Find which variables are important for predicting the Y-variable Preference.
How to do it
The estimated regression coefficients indicate the cumulative importance of each of the sensory
variables to the consumer preference.
Select Plot - Regression Coefficients. Choose the Weighted coefficients (Bw) option. Using the arrows
in the toolbar, change the plot to show regression coefficients for 2 PLS factors, and change the plot
layout to a bar chart.
Redness, Color and Sweetness (B1, B2 and B6) are significant in predicting Preference. Raspberry Smell
(B4) is also significant, but contributing negatively to the Preference. Thickness (B11) seems to be of
importance also as it has a large (negative) coefficient.
Save the project file with the name “Tutorial_B”. The model can also be saved on its own, giving
a smaller file with just the model information that can be used for predicting new samples in real time
using The Unscrambler® Prediction Engine and The Unscrambler® X Process Pulse products. To save
the model only, right-click on the model node in the project navigator and select the option Save
Model. In the dialog choose what size model to save. Models other than the full model do not include
all the result matrices, and therefore provide fewer results in addition to the predicted values when
used.
Save Model
The plot Raw Regression Coefficients (B) is available as a predefined plot from the Plot menu in the
regression results viewer. However, for this exercise the B coefficients will be viewed from the list of
result matrices.
Task
How to do it
Open the Results folder under the PLS node in the project navigator and select the Beta Coefficients
(raw) matrix. Any of the validation matrices may likewise be selected from the Validation folder of
the PLS model. The beta coefficients can then be treated like any other data in the Editor; for
example, they may be plotted from the Plot menu.
The purpose of the model previously developed was to predict the jam preference for some
consumers based on sensory values that were measured for the samples.
Task
Interpret the prediction results to see whether the predictions can be trusted.
How to do it
Activate the “JAMdemo” data matrix. Select Tasks - Predict - Regression… and specify the following
parameters in the Prediction dialog:
Check the boxes for Inlier statistics and Sample Inlier dist (Mahalanobis distance) to provide valuable
statistical measures of the similarity of the prediction samples to the calibration samples.
Task
Interpret the Predicted with Deviation plot, and other plots related to prediction results.
How to do it
Click OK in the Prediction dialog to display the predicted with deviation plot, and the tabulated
prediction results.
Prediction results
The predicted preferences for the “unknown” new jams come with uncertainty limits, i.e. the
new predictions are not very precise. However, the model can still be used to predict the preference
of new jam samples, giving an indication of which ones will be accepted by consumers.
View the Inlier vs. Hotelling’s T² plot by selecting Plot – Residuals and Influence - Inlier vs Hotelling’s
T². This plot shows how similar the new samples are to those used in developing the calibration
model. For a prediction to be trusted the predicted sample must not be too far from a calibration
sample. This is checked by the Inlier distance. The projection of the new sample onto the model also
should not be too far from the center. This may be checked using the Hotelling’s T² distance.
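Both checks can be expressed as distances in score space. The Python sketch below computes Hotelling's T² for projected samples and, as one possible reading of the inlier distance, the Mahalanobis distance to the nearest calibration sample. It uses PCA scores from an SVD on made-up data rather than the PLS scores of the tutorial model, omits the critical limits drawn in the plot, and all function names are assumptions.

```python
import numpy as np

def fit_scores(X_cal, n_components):
    """Return the calibration mean, loadings and scores of a PCA model."""
    mean = X_cal.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    P = Vt[:n_components].T
    return mean, P, (X_cal - mean) @ P

def hotelling_t2(T_new, T_cal):
    """Squared distance from the model center, scaled by the score variances."""
    return np.sum(T_new**2 / T_cal.var(axis=0, ddof=1), axis=1)

def inlier_distance(T_new, T_cal):
    """Mahalanobis distance in score space to the closest calibration sample."""
    diffs = (T_new[:, None, :] - T_cal[None, :, :]) / T_cal.std(axis=0, ddof=1)
    return np.sqrt((diffs**2).sum(axis=2)).min(axis=1)

# Made-up calibration data and three new samples: a copy of a calibration
# sample, the model center, and a point far out along the first component.
rng = np.random.default_rng(1)
X_cal = rng.normal(size=(12, 5))
mean, P, T_cal = fit_scores(X_cal, 2)
X_new = np.vstack([X_cal[0], mean, mean + 50.0 * P[:, 0]])
T_new = (X_new - mean) @ P
t2 = hotelling_t2(T_new, T_cal)
nearest = inlier_distance(T_new, T_cal)
```

In this example the third new sample is far from the center and from every calibration sample, so a prediction for it would be an extrapolation and should not be trusted.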
Save the project file under the name “Tutorial B_complete”. This now includes all the data, three
models, and the predicted results for preference.
Task
How to do it
Return to the PLS Sensory node in the project navigator. In the plots folder select Regression
Two curves are plotted, one for the calibration: RMSEC and one for validation. In this particular case it
is the cross-validation error: RMSECV.
To gain a better approximation of what to expect in future predictions, the RMSECV should be
analyzed.
The RMSECV may be studied for Preference for all PLS factors. With two factors, RMSECV is 0.83. This
means that a prediction for a new sample, on the scale from 1 to 9, can be expected to have an error
of around 0.8. This is an acceptable error level in sensory analysis, where all measurements carry
considerable uncertainty.
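The root mean square error is expressed in the same units as the response, which is what makes it easy to relate to the 1 to 9 preference scale. A minimal Python sketch of the calculation follows, using made-up reference and predicted values; for RMSECV the predictions would be those obtained while each sample was held out, and the divisor used by the software may differ from the plain mean shown here.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up preference scores, not taken from the jam data.
reference = [5.0, 6.0, 7.0, 4.0]
predicted = [5.5, 5.0, 7.5, 4.0]
error = rmse(reference, predicted)
```

Because the error is in original units, it can be compared directly with the spread of the reference values to judge whether the predictions are useful.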