Metabolomic Data Analysis with MetaboAnalyst 6.0

Name: guest10885850341536898571

June 25, 2024

1 Data Processing and Normalization

1.1 Reading and Processing the Raw Data

MetaboAnalyst accepts a variety of data types generated in metabolomic studies, including compound
concentration data, binned NMR/MS spectral data, NMR/MS peak list data, and MS spectra
(NetCDF, mzXML, mzDATA). Users need to specify the data type when uploading their data so that
MetaboAnalyst can select the correct algorithm to process it. Table 1 summarizes the results of the
data processing steps.

1.1.1 Reading Binned Spectral Data

The binned spectral data should be uploaded in comma-separated values (.csv) format. Samples can be
in rows or columns, with class labels immediately following the sample IDs.

Samples are in rows and features in columns. The uploaded file is in comma-separated values (.csv)
format. The uploaded data file contains an 18 (samples) by 562 (spectral bins) data matrix.
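
As a rough illustration of the expected layout (samples in rows, sample ID first, class label second, spectral bins after that), the following base R sketch writes a tiny made-up file and reads it back. The bin names, values, and the temporary file are assumptions for the example only; this is not how MetaboAnalyst itself parses uploads.

```r
# Write a tiny example file; the bin names and values are made up.
csv_lines <- c(
  "Sample,Class,bin_1,bin_2,bin_3",
  "S1,G,12.1,0,5.0",
  "S2,G,10.4,3.3,4.1",
  "S3,O,2.2,7.9,",
  "S4,O,1.8,8.5,6.6"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Samples in rows: column 1 = sample ID, column 2 = class label, rest = bins.
raw <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
sample_ids <- raw[[1]]
classes <- factor(raw[[2]])
x <- as.matrix(raw[, -(1:2)])
rownames(x) <- sample_ids

dim(x)          # samples by spectral bins
table(classes)  # number of samples per class
```

The empty field in the S3 row is read as a missing value (NA), which is the kind of entry the later processing steps replace.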

1.1.2 Data Integrity Check

Before data analysis, a data integrity check is performed to make sure that all the necessary information
has been collected. The class labels must be present and contain only two classes. If samples are paired,
the class labels must run from -n/2 to -1 for one group and from 1 to n/2 for the other group (n is the
number of samples and must be even). Class labels with the same absolute value are assumed to be pairs.
Compound concentration or peak intensity values should all be non-negative numbers. By default, all
missing values, zeros, and negative values are replaced by a small positive value derived from the
minimum positive value found in the data (see next section).
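
The paired-label rule can be sketched as a small check. The helper name `check_paired_labels` is hypothetical and is not part of MetaboAnalyst; it only encodes the rule as stated above (even n, labels -n/2 to -1 and 1 to n/2, each exactly once).

```r
# Hypothetical helper: TRUE if labels follow the paired-sample convention.
check_paired_labels <- function(labels) {
  n <- length(labels)
  if (n == 0 || n %% 2 != 0) return(FALSE)        # n must be even
  if (any(is.na(labels)) || any(labels != round(labels))) return(FALSE)
  expected <- c(seq(-n / 2, -1), seq(1, n / 2))   # -n/2..-1 and 1..n/2
  identical(sort(as.numeric(labels)), as.numeric(expected))
}

check_paired_labels(c(-1, -2, -3, 1, 2, 3))  # three complete pairs
check_paired_labels(c(-1, -2, 1, 3))         # label 2 missing, 3 unexpected
```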

1.1.3 Missing value imputations

Too many zeros or missing values will cause difficulties for downstream analysis. MetaboAnalyst offers
several different methods for handling them. The default method replaces all missing and zero values
with a small value (a fraction of the minimum positive value in the original data; 1/5 of the minimum
positive value of each variable in this analysis), which is assumed to be the detection limit. The
assumption behind this approach is that most missing values are caused by low-abundance metabolites
(i.e., below the detection limit). In addition, since zero values may cause problems for data
normalization (e.g., log transformation), they are also replaced with this small value. Users can also
specify other methods, such as replacing by the mean or median, or imputing the missing values with
K-Nearest Neighbours (KNN), Probabilistic PCA (PPCA), Bayesian PCA (BPCA), or Singular Value
Decomposition (SVD)¹. Please choose the one that is most appropriate for your data.

¹ Stacklies W, Redestig H, Scholz M, Walther D, Selbig J. pcaMethods: a Bioconductor package providing
PCA methods for incomplete data. Bioinformatics 2007; 23(9):1164-1167.

Zero or missing values were replaced by 1/5 of the min positive value for each variable.
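
A minimal base R sketch of this kind of replacement is shown below. It is not the source of MetaboAnalystR's `ReplaceMin`; the function name `replace_min`, the `fraction` argument, and the toy matrix are assumptions for illustration.

```r
# Replace NA, zero, and negative entries with a fraction of the smallest
# positive value in the same column (variable).
replace_min <- function(x, fraction = 1 / 5) {
  x[!is.na(x) & x <= 0] <- NA            # treat zeros/negatives as missing
  for (j in seq_len(ncol(x))) {
    col <- x[, j]
    pos <- col[!is.na(col)]              # remaining values are all positive
    if (length(pos) > 0) col[is.na(col)] <- min(pos) * fraction
    x[, j] <- col
  }
  x
}

# Toy data: bin1 has a zero, bin2 has a missing value.
m <- matrix(c(10, 0, 5, NA, 4, 8), nrow = 3,
            dimnames = list(paste0("S", 1:3), c("bin1", "bin2")))
out <- replace_min(m)
out
```

Columns with no positive value at all are left untouched by this sketch; a real pipeline would need an explicit policy for them.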

1.1.4 Data Filtering

The purpose of data filtering is to identify and remove variables that are unlikely to be of use
when modeling the data. No phenotype information is used in the filtering process, so the result
can be used with any downstream analysis. Data filtering is strongly recommended for datasets with
a large number of variables (> 250) and for datasets that contain much noise (e.g., chemometrics
data). Filtering can usually improve the results².

For data with fewer than 250 variables, this step removes 5% of the variables; for 250 to 500
variables, 10% are removed; for 500 to 1000 variables, 25% are removed; and for more than 1000
variables, 40% are removed. The None option is only available for data with fewer than 5000
features; above that, the IQR filter is still applied even if None is selected. In addition, the
maximum allowed number of variables is 10000.

No data filtering was performed.
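
For illustration only (no filtering was applied in this analysis), the size-dependent removal rule could be sketched with an interquartile range (IQR) filter as follows. The function names and the handling of the exact boundaries (250, 500, 1000) are assumptions; the text above does not say which side each boundary falls on, and this is not MetaboAnalyst's own code.

```r
# Fraction of variables to remove, by number of variables (boundaries assumed).
filter_fraction <- function(n_var) {
  if (n_var < 250) {
    0.05
  } else if (n_var <= 500) {
    0.10
  } else if (n_var <= 1000) {
    0.25
  } else {
    0.40
  }
}

# Drop the variables (columns) with the smallest IQR across samples.
iqr_filter <- function(x) {
  n_remove <- floor(ncol(x) * filter_fraction(ncol(x)))
  iqrs <- apply(x, 2, IQR, na.rm = TRUE)
  keep <- rank(-iqrs, ties.method = "first") <= ncol(x) - n_remove
  x[, keep, drop = FALSE]
}

set.seed(1)
x <- matrix(rexp(18 * 562), nrow = 18)  # random stand-in for 18 x 562 data
filtered <- iqr_filter(x)
dim(filtered)
```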

Table 1: Summary of data processing results

Sample        Features (positive)  Missing/Zero  Features (processed)
1G280424_T0   562                  0             562
1G280424_T1   562                  0             562
1G280524_T0   558                  4             562
2G280424_T0   556                  6             562
2G280424_T1   559                  3             562
2G280524_T0   557                  5             562
3G280424_T0   561                  1             562
3G280424_T1   562                  0             562
3G280524_T0   559                  3             562
1O280424_T0   541                  21            562
1O280424_T1   536                  26            562
1O280524_T0   536                  26            562
2O280424_T0   538                  24            562
2O280424_T1   537                  25            562
2O280524_T0   534                  28            562
3O280424_T0   542                  20            562
3O280424_T1   529                  33            562
3O280524_T0   549                  13            562
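
A per-sample summary of this shape can be derived from the raw data matrix by counting positive and non-positive (missing, zero, or negative) entries in each row. The sketch below assumes a numeric matrix with samples in rows; the function name and the toy values are hypothetical.

```r
# Count positive vs. missing/zero entries for each sample (row).
summarize_samples <- function(x) {
  is_pos <- !is.na(x) & x > 0
  data.frame(
    sample = rownames(x),
    features_positive = rowSums(is_pos),
    missing_zero = rowSums(!is_pos),
    features_processed = ncol(x),
    row.names = NULL
  )
}

x <- matrix(c(3.2, 0, 1.1,
              NA, 4.0, 2.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("S1", "S2"), paste0("bin", 1:3)))
res <- summarize_samples(x)
res
```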

² Hackstadt AJ, Hess AM. Filtering for increased power for microarray data analysis. BMC
Bioinformatics 2009; 10:11.

1.2 Data Normalization

The data is stored as a table with one sample per row and one variable (bin/peak/metabolite) per
column. The normalization procedures implemented below are grouped into four categories. Sample-
specific normalization allows users to manually adjust concentrations based on biological inputs (e.g.,
volume, mass); row-wise normalization allows general-purpose adjustment for differences among samples;
data transformation and scaling are two different approaches to make features more comparable. You
can use one or combine both to achieve better results.

The normalization consists of the following options:

1. Row-wise procedures:

   • Sample-specific normalization (e.g., normalize by dry weight, volume)
   • Normalization by the sum
   • Normalization by the sample median
   • Normalization by a reference sample (probabilistic quotient normalization)³
   • Normalization by a pooled or average sample from a particular group
   • Normalization by a reference feature (e.g., creatinine, internal control)
   • Quantile normalization

2. Data transformation:

   • Log transformation (base 10)
   • Square root transformation
   • Cube root transformation

3. Data scaling:

   • Mean centering (mean-centered only)
   • Auto scaling (mean-centered and divided by the standard deviation of each variable)
   • Pareto scaling (mean-centered and divided by the square root of the standard deviation of
     each variable)
   • Range scaling (mean-centered and divided by the value range of each variable)

Figure 1 shows the effects before and after normalization.

Row-wise normalization: Normalization to constant sum; Data transformation: Log10 Normalization;
Data scaling: Pareto Scaling.
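
The three selected steps can be sketched in base R using their textbook definitions. This is an illustration under assumptions, not MetaboAnalyst's internal code: the constant sum is taken as 1, a plain log10 is used, and the input is random positive data, so the exact formulas MetaboAnalyst applies may differ.

```r
set.seed(1)
# Random positive stand-in data: 18 samples (rows) by 5 bins (columns).
x <- matrix(runif(18 * 5, min = 1, max = 100), nrow = 18,
            dimnames = list(paste0("S", 1:18), paste0("bin", 1:5)))

# 1. Row-wise: normalize each sample to a constant sum (here 1).
x_sum <- x / rowSums(x)

# 2. Transformation: log10 (requires strictly positive values).
x_log <- log10(x_sum)

# 3. Scaling: Pareto = mean-center each variable, then divide by sqrt(sd).
pareto_scale <- function(m) {
  centered <- sweep(m, 2, colMeans(m), "-")
  sweep(centered, 2, sqrt(apply(m, 2, sd)), "/")
}
x_norm <- pareto_scale(x_log)

round(colMeans(x_norm), 10)  # each variable is centered at zero
```

Compared with auto scaling, dividing by the square root of the standard deviation shrinks large variables less aggressively, so high-variance bins keep somewhat more weight.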

³ Dieterle F, Ross A, Schlotterbeck G, Senn H. Probabilistic quotient normalization as robust method
to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics. Anal
Chem 2006; 78(13):4281-4290.

[Figure 1 image: two panels, "Before Normalization" and "After Normalization", each with a kernel
density plot (y-axis: Density) and box plots of individual spectral bins; x-axes: Intensity and
Normalized Intensity.]

Figure 1: Box plots and kernel density plots before and after normalization. The boxplots show at most
50 features due to space limit. The density plots are based on all samples.

2 Statistical and Machine Learning Data Analysis
MetaboAnalyst offers a variety of methods commonly used in metabolomic data analyses. They include:

1. Univariate analysis methods:

   • Fold Change Analysis
   • T-tests
   • Volcano Plot
   • One-way ANOVA and post-hoc analysis
   • Correlation analysis

2. Multivariate analysis methods:

   • Principal Component Analysis (PCA)
   • Partial Least Squares - Discriminant Analysis (PLS-DA)

3. Robust feature selection methods in microarray studies:

   • Significance Analysis of Microarray (SAM)
   • Empirical Bayesian Analysis of Microarray (EBAM)

4. Clustering analysis:

   • Hierarchical Clustering
      - Dendrogram
      - Heatmap
   • Partitional Clustering
      - K-means Clustering
      - Self-Organizing Map (SOM)

5. Supervised classification and feature selection methods:

   • Random Forest
   • Support Vector Machine (SVM)

Please note: some advanced methods are available only for two-group sample analysis.

2.1 Principal Component Analysis (PCA)

PCA is an unsupervised method that aims to find the directions that best explain the variance in a data
set (X) without referring to class labels (Y). The data are summarized into far fewer variables called
scores, which are weighted averages of the original variables. The weighting profiles are called loadings.
The PCA analysis is performed using the prcomp function in R. The calculation is based on singular value
decomposition.
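
A minimal sketch of how scores, loadings, and explained variance come out of `prcomp` is given below. The random matrix stands in for the normalized data, and the variable names are chosen for the example; this is not MetaboAnalyst's `PCA.Anal` code.

```r
set.seed(42)
# Random stand-in for a normalized data matrix: 18 samples by 6 variables.
x <- matrix(rnorm(18 * 6), nrow = 18,
            dimnames = list(paste0("S", 1:18), paste0("bin", 1:6)))

pca <- prcomp(x, center = TRUE, scale. = FALSE)

scores   <- pca$x         # one row per sample, one column per PC
loadings <- pca$rotation  # weights of the original variables on each PC

# Proportion of variance explained by each PC, and the running total
# (the two lines drawn in a scree plot).
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Scores are the centered data projected onto the loadings.
projected <- scale(x, center = TRUE, scale = FALSE) %*% loadings
all.equal(scores, projected, check.attributes = FALSE)
```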

The R script chemometrics.R is required. Figure 2 shows pairwise score plots providing an overview of
the various separation patterns among the most significant PCs; Figure 3 is the scree plot showing the
variances explained by the selected PCs; Figure 4 shows the 2-D scores plot between the selected PCs;
Figure 5 shows the biplot between the selected PCs. Interactive 3-D scores plots are not included here
and can be downloaded directly from the website.

[Figure 2 image: pairwise score plots for PC 1 (67.5 %) and PC 2 (11.8 %), with samples marked by
group (G, O).]

Figure 2: Pairwise score plots between the selected PCs. The explained variance of each PC is shown in
the corresponding diagonal cell.

[Figure 3 image: scree plot; x-axis: PC index (1 to 5); y-axis: Variance explained. Cumulative
variance: 67.5 %, 79.2 %, 85.1 %, 89.3 %, 91.9 %; individual variance: 67.5 %, 11.8 %, 5.8 %, 4.2 %,
2.6 %.]

Figure 3: Scree plot shows the variance explained by PCs. The green line on top shows the accumulated
variance explained; the blue line underneath shows the variance explained by individual PC.

[Figure 4 image: 2-D scores plot of the 18 samples, marked by group (G, O); x-axis: PC 1 (67.5 %);
y-axis: PC 2 (11.8 %).]

Figure 4: Scores plot between the selected PCs. The explained variances are shown in brackets.

[Figure 5 image: PCA biplot showing sample scores together with spectral-bin loadings; x-axis: PC1;
y-axis: PC2.]

Figure 5: PCA biplot between the selected PCs. Note that you may want to test different centering and
scaling normalization methods for the biplot to be displayed properly.

2.2 Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA)

OPLS-DA, like PLS-DA, is a powerful tool for dimension reduction and for identifying the spectral
features that drive group separation. It is a supervised modeling method and may be used instead
of PLS-DA because it can distinguish variation in a dataset that is relevant to predicting group
labels from variation that is irrelevant to predicting them. In this sense, OPLS-DA tends to produce
models that are less complex and more insightful than PLS-DA. However, both OPLS-DA and PLS-DA
are prone to over-fitting the data and therefore require cross-validation to ensure model
reliability. For further details, please refer to Worley and Powers 2013 (PMC4465187) and Worley and
Powers 2016 (PMC4990351). The permutation testing for OPLS-DA follows Szymanska et al. 2012.

Figure 6 shows the score plot for all metabolite features; Figure 7 shows the variable importance in
the OPLS-DA model; Figure 8 shows the model overview; Figure 9 shows the results of the permutation
tests for the model.

[Figure 6 image: OPLS-DA scores plot of the 18 samples, marked by group (G, O); x-axis: T score [1]
(64.5 %); y-axis: Orthogonal T score [1] (11.7 %).]

Figure 6: OPLS-DA score plot of all metabolite features.

[Figure 7 image: "Feature Importance" S-plot with one point per spectral bin; x-axis: p[1];
y-axis: p(corr)[1].]

Figure 7: OPLS-DA loadings S-plot showing the variable importance in a model, combining the covariance
(p) and the correlation (p(corr)) loading profiles.
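
In the usual definition of an S-plot, each variable is placed at the covariance (p[1]) and the correlation (p(corr)[1]) between that variable and the predictive score vector. The sketch below illustrates this with random data; since no OPLS-DA model is fitted here, the first principal component score is used as a stand-in for the predictive score, which is an assumption made only to keep the example self-contained.

```r
set.seed(7)
# Random stand-in data: 18 samples by 5 variables, mean-centered.
x <- matrix(rnorm(18 * 5), nrow = 18,
            dimnames = list(NULL, paste0("bin", 1:5)))
x <- scale(x, center = TRUE, scale = FALSE)

# Stand-in for the OPLS-DA predictive score t[1] (assumption: PC 1 score).
t_score <- prcomp(x)$x[, 1]

# S-plot coordinates: covariance and correlation of each variable with t.
splot <- data.frame(
  feature = colnames(x),
  p1      = as.vector(cov(t_score, x)),
  pcorr1  = as.vector(cor(t_score, x))
)
splot
```

Variables far from the origin on both axes have both a large contribution and a reliable association with the score, which is why they are read as the most important.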

[Figure 8 image: bar chart of R2X, R2Y, and Q2 for the components p1 and o1. p1: R2X = 0.645,
R2Y = 0.982, Q2 = 0.974; o1: R2X = 0.117, R2Y = 0.0138, Q2 = 0.0136.]

Figure 8: Model overview of the OPLS-DA model for the provided dataset. It shows the R2X, R2Y, and
Q2 coefficients for the groups.

[Figure 9 image: histograms of permuted R2Y and Q2 values (x-axis: Permutations; y-axis: Frequency).
Observed R2Y: 0.995, p < 0.001 (0/1000); observed Q2: 0.988, p < 0.001 (0/1000).]

Figure 9: Permutation analysis, showing the observed and cross-validated R2Y and Q2 coefficients.
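
A label such as "p < 0.001 (0/1000)" is what an empirical permutation p-value looks like when none of the permuted statistics reaches the observed one. The sketch below shows that counting logic only; the helper name `perm_p_value`, the random permuted values, and the exact comparison rule (greater than or equal) are assumptions, and MetaboAnalyst's own implementation may differ.

```r
# Empirical p-value: share of permuted statistics at least as large as
# the observed one. With zero hits, report an upper bound of 1/n_perm.
perm_p_value <- function(observed, permuted) {
  n_hits <- sum(permuted >= observed)
  n_perm <- length(permuted)
  label <- if (n_hits == 0) {
    sprintf("p < %g (0/%d)", 1 / n_perm, n_perm)
  } else {
    sprintf("p = %g (%d/%d)", n_hits / n_perm, n_hits, n_perm)
  }
  list(hits = n_hits, n_perm = n_perm, label = label)
}

set.seed(3)
permuted_q2 <- runif(1000, min = -2, max = 0.9)  # made-up permuted Q2 values
res <- perm_p_value(0.988, permuted_q2)
res$label
```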

3 Appendix: R Command History
[1] "mSet<-InitDataObjects(\"specbin\", \"stat\", FALSE)"
[2] "mSet<-Read.TextData(mSet, \"Replacing_with_your_file_path\", \"rowu\", \"disc\");"
[3] "mSet<-SanityCheckData(mSet)"
[4] "mSet<-ReplaceMin(mSet);"
[5] "mSet<-SanityCheckData(mSet)"
[6] "mSet<-FilterVariable(mSet, \"F\", 25, \"none\", -1, \"mean\", 0)"
[7] "mSet<-PreparePrenormData(mSet)"
[8] "mSet<-Normalization(mSet, \"SumNorm\", \"LogNorm\", \"ParetoNorm\", ratio=FALSE, ratioNum=20)"
[9] "mSet<-PlotNormSummary(mSet, \"norm_0_\", \"png\", 72, width=NA)"
[10] "mSet<-PlotSampleNormSummary(mSet, \"snorm_0_\", \"png\", 72, width=NA)"
[11] "mSet<-PCA.Anal(mSet)"
[12] "mSet<-PlotPCAPairSummary(mSet, \"pca_pair_0_\", \"png\", 72, width=NA, 5)"
[13] "mSet<-PlotPCAScree(mSet, \"pca_scree_0_\", \"png\", 72, width=NA, 5)"
[14] "mSet<-PlotPCA2DScore(mSet, \"pca_score2d_0_\", \"png\", 72, width=NA, 1,2,0.95,0,0, \"na\")"
[15] "mSet<-PlotPCALoading(mSet, \"pca_loading_0_\", \"png\", 72, width=NA, 1,2);"
[16] "mSet<-PlotPCABiplot(mSet, \"pca_biplot_0_\", \"png\", 72, width=NA, 1,2)"
[17] "mSet<-PlotPCA3DLoading(mSet, \"pca_loading3d_0_\", \"json\", 1,2,3)"
[18] "mSet<-PlotPCAPairSummary(mSet, \"pca_pair_1_\", \"png\", 72, width=NA, 2)"
[19] "mSet<-PlotPCA2DScore(mSet, \"pca_score2d_1_\", \"png\", 72, width=NA, 1,2,0.95,1,0, \"na\")"
[20] "mSet<-GetGroupNames(mSet, \"null\")"
[21] "colVec<-c(\"##e19017\",\"##619a38\")"
[22] "shapeVec<-c(17,16)"
[23] "mSet<-UpdateGraphSettings(mSet, colVec, shapeVec)"
[24] "mSet<-PlotPCA2DScore(mSet, \"pca_score2d_2_\", \"png\", 72, width=NA, 1,2,0.95,1,0, \"na\")"
[25] "mSet<-OPLSR.Anal(mSet, reg=TRUE)"
[26] "mSet<-PlotOPLS2DScore(mSet, \"opls_score2d_0_\", \"png\", 72, width=NA, 1,2,0.95,0,0, \"na\")"
[27] "mSet<-PlotOPLS.Splot(mSet, \"opls_splot_0_\", \"all\", \"png\", 72, width=NA);"
[28] "mSet<-PlotOPLS.Imp(mSet, \"opls_imp_0_\", \"png\", 72, width=NA, \"vip\", \"tscore\", 15,FALSE)
[29] "mSet<-PlotOPLS.MDL(mSet, \"opls_mdl_0_\", \"png\", 72, width=NA)"
[30] "mSet<-PlotOPLS2DScore(mSet, \"opls_score2d_1_\", \"png\", 72, width=NA, 1,2,0.95,1,0, \"na\")"
[31] "mSet<-PlotOPLS.Imp(mSet, \"opls_imp_1_\", \"png\", 72, width=NA, \"vip\", \"tscore\", 100,FALSE
[32] "mSet<-PlotOPLS.Imp(mSet, \"opls_imp_2_\", \"png\", 72, width=NA, \"vip\", \"tscore\", 200,FALSE
[33] "mSet<-PlotOPLS.Imp(mSet, \"opls_imp_3_\", \"png\", 72, width=NA, \"vip\", \"tscore\", 250,FALSE
[34] "mSet<-OPLSDA.Permut(mSet, 1000)"
[35] "mSet<-PlotOPLS.Permutation(mSet, \"opls_perm_0_\", \"png\", 72, width=NA)"
[36] "mSet<-SaveTransformedData(mSet)"
[37] "mSet<-PreparePDFReport(mSet, \"guest10885850341536898571\")\n"



The report was generated on Tue Jun 25 05:19:21 2024 with R version 4.2.2 (2022-10-31), OS system:
Linux.
