
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 242 (2020) 118718

Contents lists available at ScienceDirect

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
journal homepage: www.elsevier.com/locate/saa

Nondestructive detection of sunset yellow in cream based on near-infrared spectroscopy and interval random forest
Jun Liu a,b, Siqi Sun b, Zhenglin Tan c,⁎, Yang Liu b
a Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430205, China
b School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan 430205, China
c Department of Cuisine and Nutrition, Hubei University of Economics, Wuhan 430205, China

ARTICLE INFO

Article history:
Received 8 May 2020
Received in revised form 30 June 2020
Accepted 7 July 2020
Available online 20 July 2020

Keywords:
NIR
Interval random forest
Non-destructive
Sunset yellow

ABSTRACT

Based on near-infrared spectroscopy and interval random forest, a fast quantitative analysis method for the content of sunset yellow was established. The spectra of 132 cream pigment samples were obtained with an FT-NIR spectrometer, and various preprocessing methods such as standard normal variate (SNV), wavelet transform (WT), and Savitzky-Golay (SG) were used to smooth and denoise the original spectra. In this paper, WT and first-order differentiation were used as pretreatment and the Kennard-Stone algorithm was used to divide the data set. Finally, interval partial least squares, partial least squares, interval random forest and random forest were used to construct an optimal quantitative analysis model. The experimental results show that the interval random forest can find the best sub-interval and thereby improve the prediction ability of the model. The R² (the coefficient of determination) and RMSEP (root mean square error of the prediction) of the prediction set are 0.8965 and 0.2454, respectively. The research results show that near-infrared spectroscopy combined with the interval random forest algorithm is a fast and non-destructive method to detect the content of sunset yellow in cream.
© 2020 Elsevier B.V. All rights reserved.

⁎ Corresponding author.
E-mail address: tanzhenglin@hbue.edu.cn (Z. Tan).
https://doi.org/10.1016/j.saa.2020.118718
1386-1425/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Margarine is commonly used in baked goods such as cakes and pastries. To brighten its color, various artificial colors are often added, among them synthetic pigments. Generally made from coal-derived aniline dyes, these pigments have no nutritional value for the human body and can affect children's physical development and cause diseases and even cancer [1–4]. Thus, the analysis of artificial pigments in cream has become particularly urgent.

Since our previous research on indigotine [5] achieved good results, with R² (the coefficient of determination) reaching 0.9402 and RMSEP (root mean square error of the prediction) 0.2509, the focus of this study has shifted to sunset yellow, one of the artificial colors listed in China's hygienic use standards. Previous studies have shown that yellow pigments can harm human health, for example by damaging liver cells.

At present, artificial color detection methods include thin-layer chromatography, high-performance liquid chromatography, polarography, spectrophotometry, capillary electrophoresis and others [6–8], but all have their technical limitations. For example, thin-layer chromatography is cumbersome and has poor quantitative accuracy; spectrophotometry requires some chemometric methods, and data processing is very complicated [7]. Therefore, this paper chose NIR spectroscopy for this study.

Thanks to its rapid and non-destructive detection, NIR technology has been widely applied in many fields such as medical analysis, petroleum product analysis and molecular material analysis [9–11]. However, it also has a shortcoming: the noise in the acquired spectra is too cluttered for the raw data to be used on a large scale. Therefore, pre-processing is necessary before establishing the analytical model. Pre-processing generally includes SNV (standard normal variate), MSC (multiplicative scatter correction), SG (Savitzky-Golay), and others [12]. In practical analysis, the most widely used methods in NIR are PLSR (partial least squares regression) and MLR (multiple linear regression), as well as many other linear and nonlinear analysis methods [13–15].

With the popularity of machine learning, various machine learning algorithms have also been applied in the field of NIR. For example, in 2018, Liu et al. used the SVM (support vector machine) method to analyse the content of camelina protein and obtained RMSEC (root mean square error of calibration) and RMSEP (root mean square error of the prediction) of 0.83963 and 0.96578, respectively, which proved more efficient than PLSR and PCR (principal component regression) [16]. In recent years, Yang et al. employed machine learning methods such as SVM to analyse soil organic matter and pH [17]. Another common machine learning algorithm is Random Forest (RF), which is often used as a classification

method and regression analysis [18]. RF is an extension of the concept of the decision tree, where each tree is a separate model representing a characteristic in a given data set, and the combined analysis can obtain more objective and fair results [19,20].

In the field of near-infrared spectroscopy, random forest is widely used not only for classification but also for quantitative analysis [21,22]. In 2006, some scholars performed an adaptive discrete wavelet transform (DWT) on NIR spectra and then used penalized discriminant analysis (PDA), multivariate adaptive regression spline discriminant analysis (MARS-DA), and RF for modeling analysis to determine the quality of wine, with corresponding accuracy rates of 99.93%, 99.2% and 76.4% [23]. However, recent studies have shown that, when RF is used for quantitative analysis, multiple trees are combined to form a comprehensive learner; with suitable parameters, the prediction of concentration becomes more accurate and robust. In 2017, Chemura et al. [24] used the RF algorithm to test the ability of selected bands in the VIS/NIR range to predict plant water content (PWC) in coffee. Their research selected three bands after determining appropriate parameters, and the results showed that the selected bands could reliably predict PWC. In the establishment of RF, the trees are constructed randomly from the selected data set, so each tree is relatively independent. Comprehensive analysis and band selection are then performed to avoid over-fitting. Therefore, RF with improved band screening is an accurate approach to regression prediction.

In this study, we explored a new method of band screening for NIR spectroscopic analysis of the margarine pigment sunset yellow. It is worth noting that in previous studies where FT-NIR technology was used, the detection limit reached 1% or one thousandth [25,26]. However, it reached one in ten thousand in this study, which helps to make this study more accurate. Since the hydrocarbons of artificial colors have very similar structures, and some are even isomers, this study selected sunset yellow because of its high use in margarine foods. Fast and reliable testing is a key task in the field of food supervision. This study used partial least squares (PLS) and RF for quantitative analysis. The corresponding NIR spectra were then used to determine the spectrum of the samples in margarine and the corresponding artificial pigment concentrations. To better determine the model, this study also compared preprocessing methods such as WT, selected the best preprocessing method, and then optimized the final model. At the same time, the partial least squares (PLS) and random forest (RF) results were compared. In the analysis, the influence of the selected wavelength on the final model was tested, and the interval partial least squares method and the interval random forest were proposed. In addition, it was found that when the number of trees constructed in the random forest and the construction method changed, the evaluation indexes of the final model also changed.

2. Material and methods

2.1. Samples and instrumentation

The sample in this paper was margarine purchased in a supermarket. The pigment used was sunset yellow with a purity of more than 99%. A Sartorius CP224S electronic balance with an accuracy of 0.0001 g was used to accurately weigh the pigment (about 0.0006 g). The dosage of sunset yellow in China's food standards does not exceed 0.05 g/kg, so the concentration of pigment formulated in this experiment ranged from 0 g/kg to 0.10 g/kg.

After creams of different masses were mixed, weighed, and dosed with sunset yellow, near-infrared scans of 132 samples (including the control group) were performed. The experiment was carried out at stable room temperature and humidity. To obtain the experimental data comprehensively, a control group without any pigment was set up for comparison with the samples containing pigment. Spectral measurements were performed using a Fourier transform near-infrared spectrometer (InfraLUM FT-12, Russia). The cream samples containing sunset yellow pigment were carefully loaded into a 40 cm³ sample cell to avoid air bubbles (air bubbles affect the instrument's near-infrared scanning and result in highly inaccurate spectra). The near-infrared spectra of the samples in the range of 8000-14,000 cm⁻¹ were recorded by the spectrometer, with each recorded spectrum taken as the average of 3 scans. Further information about the samples in this experiment is shown in Table 1.

Table 1
Statistics of the calibration set and validation set.

Dataset    Number of samples    Minimum (g/kg)    Maximum (g/kg)    Mean value (g/kg)
Data set   132                  0.0               0.1014            0.04631

2.2. Data preprocessing

As the original data contain a lot of irrelevant information, the original spectra need to be preprocessed before a model is built. The standard normal variate transform (SNV) [27,28], Savitzky-Golay smoothing [29] and wavelet transform [30,31] are often used, and they lead to different results when used separately or in combination. Therefore, a large number of experiments are needed to verify the results so that the optimal model and the best preprocessing method can be obtained.

Near-infrared spectra usually contain a lot of unwanted physical information about non-target factors, such as background noise and baseline drift. To obtain useful feature information, SNV processing can be performed. SNV is used to eliminate the influence of uneven particle distribution, surface scattering and different particle sizes on the spectrum, as well as the influence of optical path changes on the diffuse reflection spectrum. The calculation formula is as follows:

$$X_{i,\mathrm{SNV}} = \frac{X_{i,k} - \bar{X}_i}{\sqrt{\dfrac{\sum_{k=1}^{m} \left(X_{i,k} - \bar{X}_i\right)^2}{m-1}}} \tag{1}$$

where $\bar{X}_i$ is the mean of the spectrum of the i-th sample, $X_{i,k}$ is its value at the k-th wavelength point, k = 1, 2, …, m, m is the number of wavelength points and $X_{i,\mathrm{SNV}}$ is the transformed spectrum.

Savitzky-Golay (SG) filtering aims to smooth noisy data and suppress data points with large disturbances so that the spectrum can be handled simply in a preliminary step [32]. SG is based on polynomials in the time domain. By moving a window along the data and fitting a polynomial to each successive subset, the convolution coefficients and the derivatives of the required order can be obtained for all data points. The formula is as follows:

$$x_{k,\mathrm{smooth}} = \bar{x}_k = \frac{1}{M} \sum_{i=-w}^{+w} x_{k+i}\, m_i \tag{2}$$

where M is the normalisation factor, $m_i$ is the smoothing coefficient and w is the half-width of the window, so that 2w + 1 points are fitted.

When the wavelet function is two-dimensional, the signal can be localised in both the time domain and the frequency domain, which accelerates the calculation [31,33]. The basic calculation formula of the continuous wavelet transform is as follows:

$$\mathrm{CWT}_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \tag{3}$$

where x(t) is the original time-domain signal, τ is the translation, s is the scale, and ψ(t) is a wavelet basis function, which can be chosen according to the situation.
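As an illustration of Eqs. (1) and (2), the following is a minimal pure-Python sketch of SNV scaling and moving-window smoothing. It is not the authors' code: the function names `snv` and `sg_smooth`, the toy absorbance values, and the choice to leave the w edge points unsmoothed are assumptions made for this example. The 5-point quadratic Savitzky-Golay weights (-3, 12, 17, 12, -3) with M = 35 are the standard tabulated values for that window.

```python
import math


def snv(spectrum):
    """Standard normal variate of one spectrum, following Eq. (1)."""
    m = len(spectrum)
    mean = sum(spectrum) / m
    # Sample standard deviation with m - 1 in the denominator, as in Eq. (1).
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (m - 1))
    return [(x - mean) / std for x in spectrum]


def sg_smooth(spectrum, coeffs, norm):
    """Moving-window smoothing of Eq. (2): (1/M) * sum_i x[k+i] * m_i."""
    w = len(coeffs) // 2
    out = list(spectrum)  # assumption: the w points at each edge are left as they are
    for k in range(w, len(spectrum) - w):
        out[k] = sum(spectrum[k + i] * coeffs[i + w] for i in range(-w, w + 1)) / norm
    return out


# Illustrative absorbance values (made up for this sketch, not measured data).
raw = [0.52, 0.55, 0.61, 0.58, 0.66, 0.71, 0.69]
scaled = snv(raw)
# 5-point quadratic Savitzky-Golay weights with normalisation factor M = 35.
smoothed = sg_smooth(raw, [-3, 12, 17, 12, -3], 35)
```

After `snv`, each spectrum has zero mean and unit sample standard deviation, which is what removes additive and multiplicative scatter effects between samples. The smoothing weights sum to M, so a straight-line segment passes through `sg_smooth` unchanged.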

Commonly used wavelet basis functions include Haar, Coiflets, Symlets, Daubechies and others. The basic characteristic of a wavelet basis function is:

$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0 \tag{4}$$

The wavelet basis adopted in this experiment is Daubechies12.
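To make the role of the wavelet basis concrete, here is a minimal pure-Python sketch of one-level wavelet denoising (decompose, threshold the detail coefficients, reconstruct). It deliberately swaps in the Haar wavelet, because its filters can be written by hand, whereas the experiment itself uses Daubechies12. The function names, the single decomposition level and the soft-thresholding rule are assumptions for illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)
# Haar wavelet filter; it sums to zero, the discrete analogue of Eq. (4).
HAAR_WAVELET = (1 / SQRT2, -1 / SQRT2)


def haar_decompose(signal):
    """One-level Haar transform; the signal length must be even."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_reconstruct(approx, detail):
    """Inverse of haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def soft_threshold(coeffs, t):
    """Shrink every coefficient towards zero by t (assumed thresholding rule)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(signal, threshold):
    """Decompose, threshold the high-frequency part, then reconstruct."""
    approx, detail = haar_decompose(signal)
    return haar_reconstruct(approx, soft_threshold(detail, threshold))


noisy = [1.0, 3.0, 2.0, 2.0]
unchanged = haar_denoise(noisy, 0.0)   # threshold 0 gives perfect reconstruction
flattened = haar_denoise(noisy, 10.0)  # a large threshold keeps only pairwise means
```

A multi-level transform would apply `haar_decompose` again to the approximation coefficients; higher-order Daubechies wavelets follow the same scheme with longer filters.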

Fig. 1. NIR raw absorbance spectra of samples.

2.3. Kennard-Stone algorithm

The Kennard-Stone algorithm [34,35] regards all samples as candidates for the training set and moves selected samples into the training set in turn. First, the two samples with the largest Euclidean distance between them are selected to enter the training set. Then, for each remaining sample, the Euclidean distance to every sample already in the training set is calculated and the smallest of these distances is recorded; the remaining sample whose smallest distance is the largest is added to the training set. This step is repeated until the number of selected samples meets the requirement.
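The selection procedure can be sketched in a few lines of pure Python. This is the usual max-min formulation of Kennard-Stone, written for illustration rather than taken from the authors' implementation; the function names and the one-dimensional toy samples are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples chosen by max-min distance."""
    n = len(samples)
    dist = [[euclidean(samples[p], samples[q]) for q in range(n)] for p in range(n)]
    # Start with the two samples that are farthest apart.
    first, second = max(
        ((p, q) for p in range(n) for q in range(p + 1, n)),
        key=lambda pq: dist[pq[0]][pq[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select and remaining:
        # Add the candidate whose nearest already-selected sample is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy one-dimensional "spectra" (made up for this sketch).
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
chosen = kennard_stone(toy, 4)
```

The indices returned form the calibration set, and the samples left over form the prediction set, so the calibration set covers the spread of the spectra as evenly as possible.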
The formula for Euclidean distance is:

$$d_x(p, q) = \sqrt{\sum_{j=1}^{N} \left[x_p(j) - x_q(j)\right]^2}, \quad p, q \in [1, N] \tag{5}$$

where $x_p$ and $x_q$ represent two different samples, and N is the number of spectral wave points of the sample.

2.4. Interval Random Forest

Random Forest is a very efficient method for processing nonlinear data [36]. Some scholars have applied it to multi-class analysis of near-infrared spectra and used Monte Carlo and uninformative variable elimination to improve the RF algorithm, with good results [37]. Therefore, this paper proposes an interval Random Forest algorithm based on random forests, which is expected to give a better regression prediction of the amount of sunset yellow in cream from the near-infrared spectrum. Interval Random Forest divides the original wavelength region from 700 nm to 1250 nm into n equal-width sub-intervals and performs random forest regression on each sub-interval to establish a local regression model of cream pigment. In this way, n local regression models are obtained, among which the best random forest model is found. Finally, the optimal model based on sub-intervals is obtained. To build a random forest calibration model, the contribution of the variables should be calculated first. When the variables that contribute the most to the final result are selected, the prediction performance of the random forest calibration model is better [38,39]. The experiment was implemented in Python and used the Random Forest model of the scikit-learn library for the initial modeling. In the interval Random Forest model, the important tuned parameters are n_estimators (the number of subtrees), set to 7, and max_depth (the maximum growth depth of a tree), for which the optimal value found by grid search is 5. In this paper, R² (coefficient of determination), RPD (ratio of the standard deviation to the standard error of prediction), RMSEP (root mean square error of the prediction) and MSE (mean square error) [40,41] were used as the evaluation indexes of model prediction performance.

3. Results and discussion

3.1. Analysis of raw near-infrared spectrum

The absorption spectra of the samples used in this paper are shown in Fig. 1. Samples with different concentrations of sunset yellow pigment have similar absorption spectra. As shown in Fig. 1, in the range of 700-900 nm, the change of absorbance mainly depended on the concentration. The absorption peak of aniline at 800 nm provided a theoretical basis for the detection of sunset yellow. Since sunset yellow has O-H (hydroxyl) groups in its molecular structure, FT-NIR uses the 700-1200 nm band to record the overtone and combination-band information of the infrared fundamental frequencies, as well as the vibrations of hydrogen-containing groups including C-H (hydrocarbon), O-H (hydroxyl), N-H (amine) and so on. Due to differences in temperature and humidity, the spectrum at the same concentration may vary. As can be seen from Fig. 1, the near-infrared spectra of margarine are neither smooth nor even and show problems such as baseline drift and scattering. Therefore, spectral pretreatment is required before quantitative analysis.

3.2. Comparison of preprocessing methods

At present, the methods used for near-infrared spectral preprocessing include multiplicative scatter correction (MSC), SNV, normalisation, first derivative (D1st), and WT. MSC is often used to correct the baseline drift of the sample spectrum; SNV is often used to correct the influence of scattering; normalisation is often used to reduce the influence of spectrum changes or sample dilution on the spectrum; first derivative methods are often used to eliminate constant translation of the background; WT has multi-resolution characteristics. Selecting a proper wavelet basis function can achieve an appropriate expression in the time and frequency domains of high-frequency, non-stationary, nonlinear signals.

When the wavelet transform is used to preprocess the near-infrared spectral data of cream pigment, it is necessary to select an appropriate wavelet basis function and number of wavelet decomposition layers to obtain a better preprocessing effect. The commonly used wavelet basis functions are Haar, Daubechies, Coiflets, Symlets, and others. Daubechies wavelet basis functions are wavelet functions constructed by Ingrid Daubechies, which can provide more effective analysis results. Therefore, after consulting the related literature [42–44], four different families (Haar, Daubechies, Coiflets and Symlets) and their different decompositions were compared. In this experiment, denoising with a wavelet function consisted of first decomposing the spectral signal, then setting thresholds at different frequencies, and finally reconstructing the selected spectral signal. Table 2 shows the comparison of the four kinds of wavelet basis functions and their different decompositions. It can be seen from Table 2 that when the wavelet function is Daubechies12 and the number of decomposition layers is 3 in the full wavelength range, the WT-RF calibration model obtains the best prediction effect. Its R², MSE, RPD and RMSEP on the prediction set are 0.8368, 0.0848, 2.3771, and 0.2912, respectively.
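The interval scheme of Section 2.4 and the four evaluation indexes can be sketched as follows. This is a hedged illustration, not the authors' code: the paper fits scikit-learn random forests on each sub-interval, whereas this sketch accepts any `fit_predict(X_train, y_train, X_test)` callable (a hypothetical interface) and demonstrates it with a tiny nearest-neighbour regressor so that it runs without third-party packages. Choosing the sub-interval with the lowest RMSEP, dividing MSE by n, and computing RPD as the sample standard deviation of the reference values divided by RMSEP are assumptions of this example.

```python
import math


def split_intervals(n_points, n_intervals):
    """Index ranges (start, stop) of n_intervals near-equal-width sub-intervals."""
    bounds = [round(i * n_points / n_intervals) for i in range(n_intervals + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n_intervals)]


def metrics(y_true, y_pred):
    """R2, MSE, RMSEP and RPD of a set of predictions."""
    n = len(y_true)
    mean = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    mse = sse / n
    rmsep = math.sqrt(mse)
    sd = math.sqrt(sst / (n - 1))
    rpd = sd / rmsep if rmsep > 0 else math.inf
    return {"R2": 1 - sse / sst, "MSE": mse, "RMSEP": rmsep, "RPD": rpd}


def best_interval(X_train, y_train, X_test, y_test, n_intervals, fit_predict):
    """Fit one local model per sub-interval and return the one with the lowest RMSEP."""
    results = []
    for start, stop in split_intervals(len(X_train[0]), n_intervals):
        sub_train = [row[start:stop] for row in X_train]
        sub_test = [row[start:stop] for row in X_test]
        y_pred = fit_predict(sub_train, y_train, sub_test)
        results.append(((start, stop), metrics(y_test, y_pred)))
    return min(results, key=lambda r: r[1]["RMSEP"])


def nearest_neighbour(X_train, y_train, X_test):
    """Stand-in regressor for the demo: copy the target of the closest training row."""
    preds = []
    for row in X_test:
        d = [sum((a - b) ** 2 for a, b in zip(row, tr)) for tr in X_train]
        preds.append(y_train[d.index(min(d))])
    return preds


# Toy data: the first two columns are unrelated to y, the last two track it.
X_train = [[5, 1, 0.0, 0.0], [1, 5, 1.0, 1.0], [4, 2, 2.0, 2.0], [2, 4, 3.0, 3.0]]
y_train = [0.0, 1.0, 2.0, 3.0]
X_test = [[1, 5, 0.1, 0.1], [5, 1, 2.9, 2.9]]
y_test = [0.0, 3.0]
interval, scores = best_interval(X_train, y_train, X_test, y_test, 2, nearest_neighbour)
```

To follow the paper's setup more closely, `fit_predict` could wrap a scikit-learn random forest regressor configured with the n_estimators and max_depth values reported in Section 2.4.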

Table 2
Predictive performance of random forest (RF) calibration models with different wavelet functions.

Wavelet functions    R²        MSE       RPD       RMSEP
Haar                 0.7895    0.3443    1.4473    0.5868
Daubechies(4,4)      0.8271    0.0908    2.2219    0.3014
Daubechies(12,3)     0.8386    0.0848    2.3771    0.2912
Symlet(5,3)          0.8193    0.0949    2.3037    0.3081
Symlet(4,2)          0.8147    0.0973    2.2475    0.3120
Coif(5,4)            0.8310    0.0888    2.3336    0.2980

Table 3
Predictive performance of interval random forest (RF) calibration models with different preprocessing methods.

Preprocessing            R²        MSE       RPD       RMSEP
None                     0.6943    0.3485    1.5371    0.5417
SG                       0.7826    0.2967    1.7562    0.3968
SNV                      0.8162    0.2583    1.9856    0.3749
WT                       0.8397    0.2357    2.1794    0.3268
D1st derivative + SNV    0.8549    0.1839    2.3146    0.2937
D1st derivative + WT     0.8965    0.0902    2.6195    0.2454

The best metrics, marked in bold in the original table, are those of D1st derivative + WT.

In this study, the five spectral preprocessing methods described above were each applied to the margarine sunset yellow near-infrared spectra, and the RF calibration model was constructed with the preprocessed spectral data as input variables. Table 3 shows the prediction performance of the RF calibration models for the different spectral preprocessing methods (with appropriate parameters set).

As can be seen from Table 3, when the RF calibration model was constructed directly from the original near-infrared spectrum of cream pigment, R², MSE, RPD and RMSEP on the prediction set were 0.6943, 0.3485, 1.5371 and 0.5417, respectively. When the spectrum processed by SG was used to construct the RF calibration model, the results on the prediction set changed significantly (R² and RPD increased by 0.0883 and 0.4485, MSE and RMSEP decreased by 0.0518 and 0.1449). When the near-infrared spectrum processed by SNV, WT and SG was used to construct the interval RF calibration model, the prediction results all improved. In particular, after WT preprocessing, compared with no preprocessing, R² and RPD increased by 0.1454 and 0.6593 respectively, while MSE and RMSEP decreased by 0.1182 and 0.2149 respectively. When WT and D1st processed spectra were combined to construct the interval RF, the model verification results R², MSE, RPD, and RMSEP were 0.8965, 0.0902, 2.6195, and 0.2454, respectively. The results showed that it was appropriate to use wavelet transform and first-order differentiation as the near-infrared spectrum preprocessing method for pigment content in cream.

3.3. NIR spectral band selections

In the quantitative analysis of the artificial pigment sunset yellow using cream pigment near-infrared spectroscopy combined with the D1st + WT-RF algorithm, too many variables lead to too much modeling data, which increases the calculation time, may add redundant information and reduces the accuracy of the model. Too few variables make insufficient use of the spectral information of the cream pigment samples and also reduce the accuracy of the quantitative analysis. Fig. 2 shows the corresponding coefficients for each band of the selected number of intervals. It can be seen that the contribution of some bands to the results is not large, so only the wavelength variables with a high contribution to the model calibration were selected. The results for each interval number are shown in Table 4. It can be seen from Table 4 that after the number of intervals reached 10, the predictive performance indicators such as R² and RPD did not increase but decreased, by 0.1134 and 0.3883, when the number of intervals rose to 15, so the number of intervals finally selected in this paper is 10. It can be seen from Fig. 2 that when the number of intervals was 10, the variables in the tenth interval had the largest contribution to the prediction effect, that is, the ordinate was the largest. In other words, the variables in the wavenumber range 13,400-14,000 cm⁻¹ (tenth interval) affect the final model prediction ability to a great extent.

Table 4
Prediction performance under different interval numbers.

Interval number selection    R²        MSE       RPD       RMSEP
5                            0.8476    0.1009    2.5173    0.2795
10                           0.8965    0.0902    2.6195    0.2454
15                           0.7831    0.1127    2.1290    0.2957
20                           0.7915    0.1013    2.2374    0.2864
30                           0.7264    0.1739    1.9452    0.3798

Fig. 2. The corresponding coefficients for different intervals.

Table 4 also shows whether more accurate predictions can be obtained with more intervals. When the number of intervals reached 30, R² dropped sharply to 0.7264 and RPD to 1.9452, and the other indicators also deteriorated significantly. Therefore, when the number of intervals reached 30, the performance of the model decreased instead.

3.4. Comparison of prediction performance of four models

To verify the difference between the prediction performance of the D1st + WT-RF calibration model and other calibration models, a wavelet transform partial least squares calibration model (D1st + WT-PLSR) for the near-infrared spectrum of cream pigment was constructed. In the construction of the D1st + WT-PLSR calibration model, the latent variable was optimized by using ten-fold cross-validation, and the optimal



latent variable was determined to be 10. Then, the optimization over the different intervals was carried out, and the comparison was made on the basis of wavelength interval selection. The specific results are shown in Table 5. Under the same preprocessing method, R², MSE, RPD, and RMSEP of RF were 0.8121, 0.0987, 2.4723 and 0.3142, respectively. Compared with PLSR, the effect was better. With the interval wavelength selection method, the R², MSE, RPD, and RMSEP of Inv-RF were 0.8965, 0.0902, 2.6195 and 0.2454, respectively. Compared with Inv-PLSR under the same conditions, the effect was better. After interval wavelength selection was used, the prediction effects of both the PLSR model and the RF model improved to some extent.

Table 5
Prediction performance under different models.

Method      R²        MSE       RPD       RMSEP
PLSR        0.7572    0.1275    1.8749    0.3572
RF          0.8121    0.0987    2.4723    0.3142
Inv-PLSR    0.8570    0.0779    2.5868    0.2792
Inv-RF      0.8965    0.0902    2.6195    0.2454

Fig. 3 shows the predictions of the four models for the validation set. As can be seen from Fig. 3, the scatter plot of the random forest model was more concentrated and closer to the 45-degree regression line than that of PLSR. The closer the points are to the 45-degree line, the better the regression fit. Comparing each model before and after wavelength selection shows that the prediction effect was better after wavelength selection.

Fig. 3. Scatter plots of PLSR, random forest, Inv-PLSR and Inv-RF.

4. Conclusion

Wavelet transform and the interval random forest algorithm were combined in this study to analyse the content of sunset yellow in margarine quickly and quantitatively. The FT-NIR spectra of 132 groups of cream were collected with a Fourier transform near-infrared spectrometer. Four different spectral preprocessing methods (MSC, SNV, D1st, WT) were compared, and the FT-NIR spectra of cream samples were processed by combining various preprocessing methods. WT combined with D1st was selected as the preprocessing method for the analysis of the cream pigment spectrum. Band selection of the NIR spectrum was performed, and the influence of the importance threshold of different bands on the prediction performance of the model was compared. To further explore the prediction performance of the calibration model in this study, it was compared with the PLSR model under the same preprocessing method. The results showed that the calibration model of this study had better prediction performance, and its prediction set R² and RMSEP were 0.8965 and 0.2454, respectively. The study showed that D1st + WT-Inv-RF can accurately and quickly quantify the amount of sunset yellow pigment in margarine. At the same time, this study provides a theoretical basis and technical support for the detection of baked foods and the analysis of other indicators in the field of food supervision.

CRediT authorship contribution statement

Jun Liu, Siqi Sun, Zhenglin Tan and Yang Liu hereby solemnly declare that the submitted paper "Nondestructive detection of sunset yellow in cream based on near-infrared spectroscopy and interval random



forest" is the result of our research work, and there is no intellectual [15] D.M.M. Gila, et al., Rapid quantification of total polyphenol content in EVOO using
NIR sensor with wavelength selection and FS-MLR, 2015 IEEE International Confer-
property dispute. The paper is completed by our cooperation. Except ence on Imaging Systems and Techniques (IST), 2015.
for the content quoted in the article, this paper does not contain any [16] J. Liu, et al., Predicting the content of camelina protein using FT-IR spectroscopy
work that has been published or written by any other individual or coupled with SVM model, Clust. Comput. (2018)https://doi.org/10.1007/s10586-
018-1838-3.
group. We fully understand that the legal results of this statement are [17] M. Yang, et al., Evaluation of machine learning approaches to predict soil organic
borne by us. matter and pH using vis-NIR spectra, Sensors (2019) 19(2).
[18] V. Svetnik, A. Liaw, C. Tong, J.C. Culberson, R.P. Sheridan, B.P. Feuston, Random for-
est: a classification and regression tool for compound classification and QSAR
modeling, J. Chem. Inf. Comput. Sci. 43 (6) (2003) 1947–1958, https://doi.org/10.
Declaration of competing interest 1021/ci034160g.
[19] C. Strobl, et al., Bias in random forest variable importance measures: illustrations,
sources and a solution 8 (1) (2007) (p. 25-0).
We declare that we have no financial and personal relationships [20] D.R. Cutler, J.T.C. E., Random forests for classification in ecology, Ecological Society of
with other people or organizations that can inappropriately influence America ESA Online Journals (2008)https://doi.org/10.1890/07-0539.1.
our work. There is no professional or other personal interest of any na- [21] N. Said, M. Abdul, Comparison between random forests, artificial neural networks
and gradient boosted machines methods of on-line vis-NIR spectroscopy measure-
ture or kind in any product, service and/or company that could be con-
ments of soil total nitrogen and total carbon, Sensors 17 (10) (2017) 2428.
strued as influencing the position presented in, or the review of, the [22] W. JI, et al., Using different data mining algorithms to predict soil organic matter
manuscript entitled. based on visible-near infrared spectroscopy, Spectroscopy & Spectral Analysis 32
(9) (2012) 2393.
Acknowledgements

This work was supported by the National Natural Science Foundation of China (61906139, 61172150, 61803286), Hubei Provincial Natural Science Foundation of China under Grant (2019CFB173), the Foundation of Hubei Provincial Key Laboratory of Intelligent Robot (HBIR 201802) and the eleventh Graduate Innovation Fund of Wuhan Institute of Technology (CX2019240, CX2019241).

References

[1] A. Shakeri, V. Soheili, M. Karimi, S.A. Hosseininia, B.S. Fazly Bazzaz, Biological activities of three natural plant pigments and their health benefits, J. Food Meas. Charact. 12 (1) (2017) 356–361, https://doi.org/10.1007/s11694-017-9647-6.
[2] N. Martins, C.L. Roriz, P. Morales, L. Barros, I.C.F.R. Ferreira, Food colorants: challenges, opportunities and current desires of agro-industries to ensure consumer expectations and regulatory practices, Trends Food Sci. Technol. 52 (2016) 1–15, https://doi.org/10.1016/j.tifs.2016.03.009.
[3] R.G. Ackman, S.N. Hooper, Isoprenoid fatty acids in the human diet: distinctive geographical features in butterfats and importance in margarines based on marine oils, Canadian Institute of Food Science & Technology Journal 6 (3) (1973) 159–165.
[4] S.A.H. Goli, et al., The production of an experimental table margarine enriched with conjugated linoleic acid (CLA): physical properties, Journal of the American Oil Chemists Society 86 (5) (2009) 453–458.
[5] Z.T.J.L. Supei Zhang, Determination of the food dye indigotine in cream by near-infrared spectroscopy technology combined with random forest model, Spectrochimica Acta Part A 2019.
[6] D.P. Song, H. Zhang, L.I. Qi, Comparison of national standards for edible pigments between China and foreign countries and progress on analytical techniques, Food Sci. 35 (3) (2014) 295–300.
[7] J. Wang, et al., Highly sensitive electrochemical determination of Sunset Yellow based on gold nanoparticles/graphene electrode, Anal. Chim. Acta 893 (2015), S0003267015010533.
[8] T. Pocock, M. Król, N.P. Huner, The determination and quantification of photosynthetic pigments by reverse phase high-performance liquid chromatography, thin-layer chromatography, and spectrophotometry, 274 (2004) 137–148.
[9] S. Jiang, et al., NIR-to-visible upconversion nanoparticles for fluorescent labeling and targeted delivery of siRNA, Nanotechnology 20 (15) (2009) 155101.
[10] F. Berset, Percentage of body fat and risk factors of coronary heart disease, Tidsskrift for Den Norske Lægeforening Tidsskrift for Praktisk Medicin Ny Række 112 (22) (1992) 2848–2851.
[11] M. Schnaiter, et al., UV-VIS-NIR spectral optical properties of soot and soot-containing aerosols, J. Aerosol Sci. 34 (10) (2003) 1421–1444.
[12] H. Fang, et al., Detection of activity of POD in tomato leaves based on hyperspectral imaging technology, Spectrosc. Spectr. Anal. 32 (8) (2012) 2228.
[13] Y. Liu, X. Sun, A. Ouyang, Nondestructive measurement of soluble solid content of navel orange fruit by visible–NIR spectrometric technique with PLSR and PCA-BPNN 43 (4) (2010) 0–607.
[14] N. Shetty, G. R., Quantification of fructan concentration in grasses using NIR spectroscopy and PLSR, Field Crop Res. 1 (120) (2011) 0–37.
[23] D. Donald, et al., Adaptive wavelet modelling of a nested 3 factor experimental design in NIR chemometrics, Chemometrics & Intelligent Laboratory Systems 82 (1–2) (2006) 122–129.
[24] A. Chemura, M.O.D. T., Remote sensing leaf water stress in coffee (Coffea arabica) using secondary effects of water absorption and random forests, Physics and Chemistry of the Earth, Parts A/B/C (2017), https://doi.org/10.1016/j.pce.2017.02.011.
[25] E. Ercioglu, H.M. Velioglu, I.H. Boyaci, Determination of terpenoid contents of aromatic plants using NIRS, 178 (2018) 716.
[26] A.L.D.O. Antonio José Steidle Neto, A.L.D.A. Lopes, C.L. Ferraza, Non-destructive prediction of pigment content in lettuce based on visible–NIR spectroscopy, Journal of the Science of Food & Agriculture 97 (2017) 2015–2022.
[27] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Letter: correction to the description of Standard Normal Variate (SNV) and De-Trend (DT) transformations in Practical Spectroscopy with Applications in Food and Beverage Analysis 2nd edition, J. Near Infrared Spectrosc. (1993) 1(1).
[28] T. Fearn, et al., On the geometry of SNV and MSC, Chemometrics & Intelligent Laboratory Systems 96 (1) (2009) 22–26.
[29] P.A. Gorry, General least-squares smoothing and differentiation by the convolution (Savitzky-Golay) method, Anal. Chem. 6 (62) (1990) 570–573.
[30] M. Antonini, et al., Image coding using wavelet transform 1 (2) (1992) 205–220.
[31] Y. Xu, et al., Wavelet transform domain filters: a spatially selective noise filtration technique 3 (6) (1994) 747–758.
[32] R.W. Schafer, What is a Savitzky-Golay filter? [lecture notes], Signal Processing Magazine IEEE 28 (4) (2011) 111–117.
[33] A.S. Lewis, G. Knowles, Image compression using the 2-D wavelet transform, IEEE Trans. Image Process. 1 (2) (2002) 244–250.
[34] A. Saptoro, T.M.O. V., A modified Kennard-Stone algorithm for optimal division of data for developing artificial neural network models, Chem. Prod. Process. Model. 1 (7) (2012) (p. 16-16).
[35] D.D. Claeys, T. Verstraelen, E. Pauwels, et al., Conformational sampling of macrocyclic alkenes using a Kennard-Stone-based algorithm, J. Phys. Chem. A 114 (25) (2010) 6879–6887.
[36] M. Zhu, J. Xia, M.L. Yan, S.Y. Zhang, G.L. Cai, J. Yan, G.M. Ning, Feature selection and optimization of random forest modeling, Appl. Mech. Mater. 687-691 (2014) 1416–1419, https://doi.org/10.4028/www.scientific.net/AMM.687-691.1416.
[37] J. Bin, A.F.F. F., A modified random forest approach to improve multi-class classification performance of tobacco leaf grades coupled with NIR spectroscopy, RSC Adv. 36 (6) (2016) 30353–30361.
[38] D. Sharma, De-Biased Random Forest Variable Selection, Social Science Electronic Publishing, 2011.
[39] K.J. Archer, R.V. Kimes, Empirical characterization of random forest variable importance measures, Computational Statistics & Data Analysis 52 (4) (2008) 2249–2260.
[40] J. Li, et al., Analysis of soil nutrient content based on near infrared reflectance spectroscopy in Beijing region, Transactions of the Chinese Society of Agricultural Engineering 28 (2) (2012) 176–179.
[41] Blakey, J. Robert, Evaluation of avocado fruit maturity with a portable near-infrared spectrometer, Postharvest Biology & Technology 121 (2016) 101–105.
[42] V.J.B.R. Hamilton, Anal. Chem. 1 (69) (1997) 78–90.
[43] K.L. Alex Ander, C.F.G. J., Application of wavelet transform in infrared spectrometry: spectral compression and library search, Chemometrics & Intelligent Laboratory Systems 1-2 (43) (1998) 69–88.
[44] H.Y. Yoo, L.K.W. J., Selecting optimal basis function with energy parameter in image classification based on wavelet coefficients, 대한원격탐사학회지 5 (24) (2008) 437–444.
