
Food Chemistry 351 (2021) 129314

Contents lists available at ScienceDirect

Food Chemistry
journal homepage: www.elsevier.com/locate/foodchem

Predicting oil content in ripe Macaw fruits (Acrocomia aculeata) from unripe
ones by near infrared spectroscopy and PLS regression
Ulisses F. Oliveira a, Annanda M. Costa b, Jussara V. Roque a, Wilson Cardoso a,
Sergio Y. Motoike c, Marcio H.P. Barbosa c, Reinaldo F. Teofilo a, *
a Multivariate Chemical Data Analysis Laboratory, Department of Chemistry, Universidade Federal de Viçosa, 36570-900 Viçosa, MG, Brazil
b Campus Ponta Porã, Instituto Federal de Mato Grosso do Sul, 79100-510 Campo Grande, MS, Brazil
c Department of Agronomy, Universidade Federal de Viçosa, 36570-900 Viçosa, MG, Brazil

ARTICLE INFO

Keywords:
Macaw palm
Multivariate regression
Near-infrared spectroscopy
Partial least squares
Early prediction

ABSTRACT

A method for early quantification of unripe macaw fruits oil content using near-infrared spectroscopy (NIR) and partial least squares (PLS) is presented. After harvest, the fruit takes about 30 days to reach its maximum oil accumulation. The oil content was quantified thirty days after harvest using Soxhlet extraction. PLS models were built using NIR spectra of shell obtained five days after harvest (Shell5). The Shell5 model was compared with models built using NIR spectra of the shell (Shell30) and mesocarp thirty days after harvest (Pulp30). Ordered predictors selection was used to select the most informative variables. The best models presented root mean square error of prediction and correlation coefficient of prediction of 4.87% and 0.89 for Shell5; 5.83% and 0.85 for Shell30; 4.76% and 0.92 for Pulp30. Thus, the anticipated prediction of oil content could reduce the time and costs of macaw palm quality control and storage.

Abbreviations: AutoiOPS, automatic interval ordered predictors selection; FeediOPS, feedback interval ordered predictors selection; hMod, number of latent variables of the model; hOPS, number of latent variables for ordered predictors selection; LOD, limit of detection; MPGAB, macaw palm germplasm active bank; NIR, near-infrared spectroscopy; Nlvs, number of latent variables; nVars, number of variables; OC, oil content; OPS, ordered predictors selection; PLS, partial least squares; Pred, prediction; Pulp30, spectra of macaw mesocarp thirty days post-harvest; R, correlation coefficient (Rc) of calibration, (Rcv) of cross-validation, and (Rp) of prediction; RMSE, root mean square error (RMSEC) of calibration, (RMSECV) of cross-validation, and (RMSEP) of prediction; SEL, selectivity; SEN, sensitivity; Shell5, spectra of macaw shell five days post-harvest; Shell30, spectra of macaw shell thirty days post-harvest.
* Corresponding author.
E-mail addresses: ulisses.oliveira@ufv.br (U.F. Oliveira), jussara.roque@ufv.br (J.V. Roque), wilson.cardoso@ufv.br (W. Cardoso), motoike@ufv.br (S.Y. Motoike), barbosa@ufv.br (M.H.P. Barbosa), rteofilo@ufv.br (R.F. Teofilo).

https://doi.org/10.1016/j.foodchem.2021.129314
Received 26 May 2020; Received in revised form 28 December 2020; Accepted 5 February 2021
Available online 18 February 2021
0308-8146/© 2021 Elsevier Ltd. This article is made available under the Elsevier license (http://www.elsevier.com/open-access/userlicense/1.0/).

1. Introduction

The demand for vegetable oils has increased over the last decades (Sajjadi, Raman, & Arandiyan, 2016; Senger et al., 2017; Yang, Zhang, Li, Yu, Mao, Wang, & Zhang, 2018). This may be attributable to food consumption and biofuel production related to the continual increase in the world's population and demand for cleaner and renewable energy sources (Muscat, de Olde, de Boer, & Ripoll-Bosch, 2019; Prates-Valério, Celayeta, & Cren, 2019).

The oleaginous palm, Acrocomia aculeata, commonly known as macaw palm, is found in savanna-like vegetation, semi-deciduous seasonal forests, and deforested areas, among other regions of Central and South America (Lanes, Motoike, Kuki, Nick, & Freitas, 2015; Machado, Figueiredo, & Guimarães, 2016). The estimated oil production from this crop is up to 6200 kg/ha, similar to oil palm (Elaeis guineensis), the major current source of vegetable oil. However, the macaw palm does not compete with rainforest and fertile land and presents higher storage stability (Evaristo et al., 2016; Prates-Valério et al., 2019). The macaw palm mesocarp oil comprises about 60% oleic acid and 29% palmitic acid and is rich in carotenoids, which play an essential role as an antioxidant agent (Nunes, Favaro, Galvani, & Miranda, 2015). It can be used in food, pharmaceutical, cosmetics, and biodiesel production due to its higher oxidation stability and operation at low temperatures (Montoya, Motoike, Kuki, & Couto, 2016; Simiqueli, de Resende, Motoike, & Henriques, 2018). The cake obtained after oil extraction can be used as animal feed since it contains no toxic compounds, and the shell could be used for energy generation (Tilahun et al., 2019). Therefore, macaw palm has the potential to be an economically viable crop.

Macaw palm is still under the process of domestication, and most studies aim at its breeding to increase its oil yield (Pires, dos Santos Souza, Kuki, & Motoike, 2013). An important characteristic of this fruit is that it continues to increase its oil content after harvest (Costa et al., 2018; de França, Reber, Meireles, Machado, & Brunner, 1999; Monselise, 2018). Current harvest practices, based on collecting dropped fruits and extracting their oil without adequate handling and storage, have resulted in low-quality oil. In addition, adequate practices can increase the fruit's oil content by over 20% during 30 days of storage after harvest (Evaristo et al., 2016; Prates-Valério et al., 2019; Tilahun et al., 2019).

The oil content is often quantified using solvent extraction (Danlami, Arsad, Ahmad Zaini, & Sulaiman, 2014; Richter et al., 1996). This method is laborious, time-consuming, and destructive, and it uses a considerable amount of sample and toxic solvents to perform the extraction (Richter et al., 1996).

Due to the continual increase in oil content, long maturation time, and limitations of the conventional procedures for oil quantification, a method capable of early prediction of the oil content of the ripe fruits would be advantageous for decision-making regarding the selection of high-yield fruits, transportation, and storage. Considering the routine analyses carried out at breeding programs and quality assessments at industrial sites, a quick, inexpensive, non-destructive, and reliable method is required to estimate the oil content of numerous samples (Evaristo et al., 2016).

Noninvasive and non-destructive methods such as near-infrared spectroscopy (NIR) coupled with partial least squares (PLS) regression have been widely used to quantify agronomic properties and assess the quality of agricultural and food products (Ferreira, Galão, Pallone, & Poppi, 2014; Tilahun et al., 2018). This technique provides rapid information on physical and chemical properties (Montoya et al., 2016; Nunes et al., 2015; Pires et al., 2013) with minimal or no sample preparation and with a high predictive ability for unknown samples (Pasquini, 2003).

Several works have presented the use of NIR to quantify the oil content of plants, such as avocado (Olarewaju, Bertling, & Magwaza, 2016), olive (Altieri, Matera, Genovese, & Di Renzo, 2020; Saha & Jackson, 2018), palm fruits (Sudarno, 2017), maize (Tallada, Palacios-Rojas, & Armstrong, 2009), and citrus (Steuer, Schulz, & Läger, 2001). Previous work reported the application of the VIS-NIR region from 30,770 to 9300 cm−1 to predict the oil content of the macaw palm in different maturation stages using the mesocarp (Matsimbe, Motoike, Pinto, Leite, & Marcatti, 2015). However, studies involving early prediction from plant products are still scarce due to their high complexity and possible matrix changes. Studies about the early prediction of sugarcane features (Porto et al., 2019) and cassava productivity (Vitor, Diniz, Morgante, Antônio, & de Oliveira, 2019) can be found in the literature. Early prediction of a fruit property is essential to assess the fruit before it reaches full maturity, improving its quality and saving time and storage costs.

Thus, the goal of this work was to establish for the first time a non-destructive, fast, inexpensive, and reliable method capable of early prediction of the mesocarp oil content of ripe macaw fruits (30 days after harvest) using NIR spectra of their respective unripe fruits (5 days after harvest) and PLS regression. Ordered predictors selection (OPS) (Roque, Cardoso, Peternelli, & Teófilo, 2019; Teófilo, Martins, & Ferreira, 2009) was used to improve the models' quality by selecting more informative and predictive variables. The method proposed was compared with methods using the NIR spectra of ripe fruits.

2. Experimental

2.1. Macaw palm samples

The macaw fruits used in this study were part of the macaw palm germplasm active bank (MPGAB) of the Universidade Federal de Viçosa (UFV) located in the municipality of Araponga, Minas Gerais State (lat 20°40′1″ S, long 42°31′15″ W, alt ~980 m). The MPGAB is registered under the number 084/2013 – SECEX/CGEN and consists of 253 accessions, collected in several parts of Brazil. This bank is considered the largest MPGAB in the world.

The fruits were collected in the MPGAB from 44 accessions (half-sib families). Three plants were sampled for each accession, and five fruits were chosen from each plant for oil content analysis. In total, 660 fruits were collected; however, some of them were lost in transportation or damaged; therefore, only 630 fruits were used. The fruits were taken to the post-harvest laboratory, where they remained stored at room temperature for ripening. After ripening, the fruits were frozen in a −20 °C freezer to stop the metabolism for conservation and further oil quantification.

Fig. 1. Macaw palm (Acrocomia aculeata) (A), NIR set up to acquire the macaw fruit's shell spectra (B), and NIR set up to acquire the macaw fruit's mesocarp spectra (C).


2.2. Spectral analysis

Due to the logistics of transporting the macaw fruits, the near-infrared (NIR) spectral analysis of the shell (Shell5) was performed only five days after harvest. Spectra were taken from the shell to avoid damage to the fruit before it reached full maturity. After harvest, under good storage practices, the macaw palm fruit takes thirty days to reach its maturity and, therefore, its maximum oil accumulation (Tilahun et al., 2019).

The Shell5 model was compared with models built from NIR spectra acquired on the shell (Shell30) and mesocarp (Pulp30) of the fruits after 30 days of maturation. Fig. 1A shows the fruits before harvest, and Fig. 1B and C show how the NIR spectra were obtained for the macaw shell and mesocarp, respectively.

The NIR spectra were obtained using a Fourier transform NIR (FT-NIR) 660 spectrometer (Agilent Technologies, Santa Clara, United States) with a reflectance-integrating sphere accessory from PIKE Technologies. The range investigated was 10,000–4000 cm−1 with an increment of 2 cm−1. The spectra were obtained using the software Pro Resolutions Version 5.1 and stored as log (1/R), where R is the reflectance collected. For each NIR spectrum (Shell5, Shell30, and Pulp30) acquired in the instrument, 32 scans were performed to increase the signal-to-noise ratio, and the average of these scans was stored.

For Shell5 and Shell30, each fruit was directly placed on the instrument window without sample preparation and scanned in two positions. Each position was measured three times, and the averaged spectrum of the two positions was used for modeling. For Pulp30, the fruit shell was removed using a knife to expose the mesocarp, and then the spectrum was obtained by placing the mesocarp directly on the instrument window. A spectrum was obtained from three different mesocarp positions, and the averaged spectrum was used for modeling. In total, 630 spectra were used for each model (Shell5, Shell30, and Pulp30). The spectra were acquired in a laboratory with a controlled temperature of 21 °C.

2.3. Oil content quantification

Oil extraction was performed using n-hexane (Sigma-Aldrich, St. Louis, United States) in a Soxhlet extractor (Marconi, Piracicaba, Brazil) according to the adapted procedure from Analytical Norms of Adolfo Lutz Institute (IAL, 2008). The fruit's mesocarps were dried in a ventilated oven at 65 °C for 72 h. After drying, 5 g of sample was placed in a filter paper cartridge and arranged in the Soxhlet extractor containing 150 mL of n-hexane; the extraction was carried out for eight hours. In sequence, the n-hexane extract was transferred to a beaker and placed in an oven at 105 °C for 24 h for n-hexane evaporation. Finally, the remaining oil was cooled down to room temperature and weighed.

Oil extraction was performed for each fruit, which generated an oil content (OC) value for each sample (Equation (1)).

$$\mathrm{OC} = 100 \left( \frac{M_0}{M_S} \right) \tag{1}$$

where OC stands for oil content percent, $M_0$ stands for oil mass in grams, and $M_S$ stands for the total mesocarp mass in grams.

2.4. Multivariate regression models

The y vector (dependent variable) is associated with oil content values. The X matrix of the NIR spectra (independent variables) was different for each condition (Shell5, Shell30, and Pulp30), but the y vector was the same. The Kennard and Stone algorithm (Kennard & Stone, 1969) was used to split the samples into calibration and prediction sets. For all models, 504 and 126 samples were selected for the calibration and prediction sets, respectively.

The spectra were imported to the Matlab 2019a environment (The Mathworks, Natick, USA). An inverse regression model (y = Xb) was built using the PLS regression. Random cross-validation with ten splits was used. Besides the raw data, two preprocessing methods (mean centering and autoscaling) and eight transformations (smoothing, first derivative, second derivative, multiplicative scatter/signal correction (MSC), detrend, normalize, baseline, and standard normal variate scaling (SNV)) were considered. Their combinations were tested to find the best regression model.

Two OPS methods (Roque et al., 2019), i.e., AutoiOPS and FeediOPS, were applied to select more predictive variables to improve the models. Firstly, the original response variables (X matrix columns) were subdivided into intervals, each one containing fifty variables. In each interval, informative vectors that contain information about the location of the best response variables for prediction were calculated. The original variables in each interval were differentiated according to the corresponding absolute values of the informative vector elements. The differentiated variables were sorted in descending order. An initial subset of two variables (window) for each interval was defined to build PLS models. The initial subset was extended by adding one variable (increment) over each window until all variables were taken. The variable subsets in each interval were compared using the cross-validation parameters calculated during validations, and the best variable subset was defined for each interval. These variable selection steps were performed using several informative vectors, and for each interval, the best vector was chosen. This method is called AutoiOPS.

Additionally, after performing the AutoiOPS selection, the selected variables were submitted to a new selection, and this procedure is called FeediOPS. This method was carried out until relative differences between two consecutive RMSECV values were less than 0.02% (Roque et al., 2019).

Two optimum numbers of latent variables (nlvs) were employed in this work: one representing the component number for model building (hMod) and the other representing the component number employed to generate the best informative vector in the OPS method (hOPS) (Roque et al., 2019). The OPS methods were applied using the algorithms available at www.deq.ufv.br/chemometrics. All the calculations were performed in Matlab 2019a.

2.5. Figures of merit

The quality of the models was evaluated using the parameters root mean square error (RMSE) and correlation coefficient (R), which were calculated by Equations (2) and (3), respectively.

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2 / N} \tag{2}$$

$$R = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (\hat{y}_i - \bar{\hat{y}})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}} \tag{3}$$

where $\hat{y}_i$ and $\bar{\hat{y}}$ are the estimated value and the mean of the estimated values, respectively, $y_i$ and $\bar{y}$ are the observed value and the mean of the observed values, respectively, and N represents the number of samples. When internal cross-validation (CV) is applied, N represents the number of samples in the cross-validation set, and the error and correlation coefficient are called root mean square error of cross-validation (RMSECV) and correlation coefficient of cross-validation (Rcv), respectively. For the external validation, N represents the number of samples predicted (P), and, in this case, the error and correlation coefficient are named root mean square error of prediction (RMSEP) and correlation coefficient of prediction (Rp), respectively.

The sensitivity (SEN), selectivity (SEL), and limit of detection (LOD) were calculated according to Equations (4), (5), and (6), respectively.

$$\mathrm{SEN} = 1 / \lVert \mathbf{b} \rVert \tag{4}$$
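
As a hedged illustration of Sections 2.4 and 2.5, the sketch below splits a toy data set with a Kennard-Stone-style selection and scores a model with RMSE (Equation (2)) and R (Equation (3)). It is a minimal pure-Python sketch, not the authors' Matlab code: the function names (`kennard_stone`, `rmse`, `corr`), the toy data, and the one-variable least-squares line that stands in for PLS are all assumptions made for illustration.

```python
import math


def kennard_stone(X, n_cal):
    """Pick n_cal (>= 2) calibration indices by the Kennard-Stone max-min rule."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    n = len(X)
    # Seed with the two most distant samples.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    first, second = max(pairs, key=lambda ij: dist(X[ij[0]], X[ij[1]]))
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_cal:
        # Add the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining,
                  key=lambda k: min(dist(X[k], X[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


def rmse(y, y_hat):
    """Equation (2): root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def corr(y, y_hat):
    """Equation (3): correlation between observed and estimated values."""
    y_bar = sum(y) / len(y)
    h_bar = sum(y_hat) / len(y_hat)
    num = sum((h - h_bar) * (a - y_bar) for a, h in zip(y, y_hat))
    den = math.sqrt(sum((h - h_bar) ** 2 for h in y_hat)
                    * sum((a - y_bar) ** 2 for a in y))
    return num / den


# Toy data (hypothetical): one "spectral" variable per sample, oil content in %.
X = [[0.10], [0.20], [0.25], [0.40], [0.55],
     [0.60], [0.70], [0.80], [0.90], [1.00]]
y = [31.0, 35.5, 38.0, 44.0, 52.5, 53.0, 58.5, 63.0, 66.5, 72.0]

# 80/20 split, the same proportion as the 504/126 split in the text.
cal, pred = kennard_stone(X, n_cal=8)

# Stand-in model (assumption): least-squares line fitted on the calibration set.
xc = [X[i][0] for i in cal]
yc = [y[i] for i in cal]
x_bar, yc_bar = sum(xc) / len(xc), sum(yc) / len(yc)
slope = (sum((a - x_bar) * (b - yc_bar) for a, b in zip(xc, yc))
         / sum((a - x_bar) ** 2 for a in xc))
intercept = yc_bar - slope * x_bar


def predict(indices):
    return [intercept + slope * X[i][0] for i in indices]


yp = [y[i] for i in pred]
print("RMSEC:", round(rmse(yc, predict(cal)), 2))
print("RMSEP:", round(rmse(yp, predict(pred)), 2))
```

The same `rmse` and `corr` functions give RMSECV and Rcv when they are fed the predictions collected over the cross-validation splits; only the set of samples changes, as the text describes.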


In Equation (4), ‖b‖ is the Euclidean norm of the vector of regression coefficients of the PLS model (Assis et al., 2017; Roque, Dias, & Teófilo, 2017).

$$\mathrm{SEL} = nas_i / \lVert \mathbf{x}_i \rVert \tag{5}$$

where $nas_i$ is the absolute scalar value of the net analytical signal for sample i and ‖x_i‖ represents the Euclidean norm of the instrumental response vector for sample i (Assis et al., 2017; Rambo, Amorim, & Ferreira, 2013; Roque et al., 2017).

$$\mathrm{LOD} = (3.3 \times nas \times \lVert \mathbf{b} \rVert) / \max(\mathbf{b}) \tag{6}$$

where ‖b‖ represents the Euclidean norm of the regression coefficients Δ(α, β).

Each model was verified for chance correlation. The y vector (dependent variable) was randomized 10,000 times, and models were built using these randomizations. The correlation coefficient between the reference and predicted values was evaluated. If the correlation coefficient of the model built with the authentic y was clearly separated from those of the models built with the randomized y, the model was considered not to have been obtained by chance.

3. Results and discussion

3.1. Oil content

The samples' mesocarp oil content ranged from 29.5 to 74.5%, with an average of 55.7% and a standard deviation of 10.7%. The variability in oil content is related to the plants' genetic variability (Costa et al., 2018). This variability is also due to the different regions where the fruits were harvested.

3.2. Spectral interpretation

For all raw spectra shown in Fig. 2 (A, C, and E), the band from 9000 to 7700 cm−1 (region 1) is related to the second overtone of C–H stretching, specifically related to oleic acid. The higher absorbance of this band is related to an oil rich in unsaturated fatty acids. The band from 7200 to 6500 cm−1 (region 2) is related to water and combination bands. The bands at 5800 cm−1 (region 3) and 5670 cm−1 (region 4) are related to saturated and unsaturated fatty acids, in this case, palmitic and oleic acids, respectively. The band around 5200 cm−1 (region 5) is related to water absorbance and combination bands. The region from 4500 to 4000 cm−1 (region 6) is related to combinations of bands of edible oils (Hourant, Baeten, Morales, Meurens, & Aparicio, 2000; Ozaki, McClure, & Christy, 2006; Sato, Kawano, & Iwamoto, 1991).

The spectra of Shell5 and Shell30 are similar, the latter presenting a more scattered profile. This may be due to physical changes in the shell during maturation. The Pulp30 spectra differ from the Shell5 and Shell30 spectra, presenting a more accentuated absorbance in the regions from 8700 to 8000 cm−1, 7200 to 6500 cm−1, 6000 to 5500 cm−1, 5300 to 5000 cm−1, and 4500 to 4000 cm−1. This difference may be because the Shell5 and Shell30 spectra were obtained from the shell, which is not rich in oil and water, while the Pulp30 spectra were obtained directly on the pulp, which is rich in oil and water. For that reason, the spectra of Pulp30 presented a more accentuated profile in the regions related to fatty acids and water.

Fig. 2. (A), (C), and (E) NIR spectra for Shell5 (Xshell5), Shell30 (Xshell30), and Pulp30 (Xpulp30), respectively; (B), (D), and (F) represent the spectra transformed using first derivative and smoothing. Region 1: 9000 to 7700 cm−1; Region 2: 7200 to 6500 cm−1; Region 3: 5800 cm−1; Region 4: 5670 cm−1; Region 5: 5200 cm−1; Region 6: 4500 to 4000 cm−1.
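
Panels (B), (D), and (F) of Fig. 2 show spectra after a first derivative followed by smoothing. The source does not state which derivative or smoothing filter was used, so the sketch below is only an illustration under stated assumptions: a central finite difference and a moving average, written in plain Python with hypothetical function names and a toy spectrum. It shows the order of operations and why a derivative removes a constant baseline offset.

```python
def first_derivative(spectrum, step=2.0):
    """Finite-difference derivative; step is the spacing between points
    (2 cm^-1 is the increment reported for the spectra)."""
    n = len(spectrum)
    deriv = [0.0] * n
    for i in range(1, n - 1):
        # Central difference for interior points.
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / (2 * step)
    # One-sided differences at the two ends.
    deriv[0] = (spectrum[1] - spectrum[0]) / step
    deriv[-1] = (spectrum[-1] - spectrum[-2]) / step
    return deriv


def moving_average(signal, window=5):
    """Smooth with a centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Toy log(1/R) spectrum (hypothetical values) and the same spectrum with a
# constant baseline offset, as light scattering might add.
raw = [0.50, 0.52, 0.57, 0.66, 0.71, 0.69, 0.62, 0.58, 0.55, 0.54]
shifted = [v + 0.30 for v in raw]

# Order used in the text: first derivative, then smoothing.
processed = moving_average(first_derivative(raw), window=3)
processed_shifted = moving_average(first_derivative(shifted), window=3)

# The offset vanishes after differentiation (up to rounding).
print(max(abs(a - b) for a, b in zip(processed, processed_shifted)))
```

A Savitzky-Golay filter is a common alternative that combines both steps, but whether the authors used it is not stated in the text.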

Table 1
Statistical parameters and figures of merit for the PLS models with all variables (Full) and variables selected using AutoiOPS and FeediOPS.

| Parameter | Shell5 Full | Shell5 AutoiOPS | Shell5 FeediOPS | Shell30 Full | Shell30 AutoiOPS | Shell30 FeediOPS | Pulp30 Full | Pulp30 AutoiOPS | Pulp30 FeediOPS |
|---|---|---|---|---|---|---|---|---|---|
| nlvs | 6 | 6 | 6 | 7 | 7 | 7 | 6 | 6 | 6 |
| nVars* | 3113 | 260 | 290 | 3113 | 200 | 155 | 3113 | 340 | 220 |
| RMSEC (%, g/g) | 3.81 | 3.33 | 3.31 | 3.65 | 3.49 | 3.99 | 3.14 | 2.52 | 2.19 |
| Rc | 0.93 | 0.95 | 0.96 | 0.94 | 0.94 | 0.93 | 0.95 | 0.97 | 0.98 |
| RMSECV (%, g/g) | 4.97 | 4.21 | 4.16 | 4.97 | 4.32 | 4.75 | 3.89 | 3.24 | 2.82 |
| Rcv | 0.88 | 0.91 | 0.92 | 0.88 | 0.91 | 0.89 | 0.93 | 0.95 | 0.96 |
| RMSEP (%, g/g) | 5.28 | 5.50 | 4.87 | 6.11 | 5.83 | 5.95 | 4.76 | 5.49 | 5.47 |
| Rp | 0.88 | 0.88 | 0.89 | 0.83 | 0.85 | 0.82 | 0.92 | 0.88 | 0.88 |
| SEN | 0.54 | 0.20 | 0.21 | 0.50 | 0.11 | 0.12 | 0.83 | 0.29 | 0.22 |
| SEL | 0.08 | 0.10 | 0.10 | 0.07 | 0.21 | 0.20 | 0.13 | 0.13 | 0.13 |
| LOD (%, g/g) | 26.19 | 11.43 | 18.32 | 25.89 | 13.23 | 7.01 | 13.66 | 8.83 | 8.66 |

nlvs: number of latent variables; nVars: number of variables; Rc: correlation coefficient of calibration; Rcv: correlation coefficient of cross-validation; Rp: correlation coefficient of prediction; RMSEC: root mean square error of calibration; RMSECV: root mean square error of cross-validation; RMSEP: root mean square error of prediction; SEN: sensitivity; SEL: selectivity; LOD: limit of detection.
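
The choice of a final model per data set can be checked against the RMSEP row of Table 1. The snippet below is a small sketch that copies the nine RMSEP values and the nVars row into dictionaries (the variable names are illustrative) and picks, for each data set, the variant with the lowest RMSEP. It also reports the fraction of the 3113 original variables kept by the chosen variant.

```python
# RMSEP values (%) copied from Table 1.
rmsep = {
    "Shell5": {"Full": 5.28, "AutoiOPS": 5.50, "FeediOPS": 4.87},
    "Shell30": {"Full": 6.11, "AutoiOPS": 5.83, "FeediOPS": 5.95},
    "Pulp30": {"Full": 4.76, "AutoiOPS": 5.49, "FeediOPS": 5.47},
}

# Number of variables (nVars) copied from Table 1.
n_vars = {
    "Shell5": {"Full": 3113, "AutoiOPS": 260, "FeediOPS": 290},
    "Shell30": {"Full": 3113, "AutoiOPS": 200, "FeediOPS": 155},
    "Pulp30": {"Full": 3113, "AutoiOPS": 340, "FeediOPS": 220},
}

# For each data set, keep the variant with the smallest prediction error.
best = {name: min(variants, key=variants.get) for name, variants in rmsep.items()}

for name, variant in best.items():
    kept = n_vars[name][variant] / n_vars[name]["Full"]
    print(f"{name}: {variant} (RMSEP {rmsep[name][variant]:.2f}%, "
          f"{kept:.1%} of the variables)")
```

Selecting on RMSEP alone reproduces the choices stated in the text: FeediOPS for Shell5, AutoiOPS for Shell30, and the Full model for Pulp30.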


3.3. Modeling

The original NIR spectra and their respective transformations are shown in Fig. 2. Mathematical transformations of the matrix rows are essential to increase the signal-to-noise ratio and reduce systematic errors from undesired light scattering variations. The preprocessing and transformation combination that produced the smallest RMSECV and highest Rcv was chosen. The transformation that presented the best results for all data sets was the first derivative, followed by smoothing.

Statistical parameters and figures of merit for the PLS models using all variables (Full) and selected variables (AutoiOPS and FeediOPS) are shown in Table 1. For Shell5, the model using the variables selected using AutoiOPS presented a lower RMSECV value than the Full model. However, a smaller RMSEP was achieved with the variables selected using FeediOPS, which means that it is a more predictive model than the model using all variables. For the Shell30 model, the variables selected improved both RMSECV and RMSEP values; AutoiOPS was the best approach for this model. For the Pulp30 model, the variables selected improved the RMSECV values but not the RMSEP values, so the Full model was chosen.

Comparing the models Shell5-FeediOPS, Shell30-AutoiOPS, and Pulp30-Full from the results shown in Table 1, it is possible to observe that the Shell5-FeediOPS model was quite similar to the models built with spectra of the ripe fruits, i.e., Shell30-AutoiOPS and Pulp30-Full. Furthermore, the Shell5-FeediOPS model presented lower RMSECV and RMSEP values and higher Rcv and Rp values than the Shell30-AutoiOPS model.

Matsimbe et al. determined the oil content in macaw palm using VIS-NIR spectra (30,770 to 9300 cm−1) of the mesocarp (Matsimbe et al., 2015) and found RMSEP, Rcv, and nlvs values equal to 7.08%, 0.88, and 9, respectively. Considering the Pulp30, the model presented better parameters, RMSEP of 4.76% and Rcv of 0.92, with six nlvs, a more parsimonious model. This may be due to the wavelength range used and a more extensive preprocessing screening, resulting in more corrected spectra, and therefore a better model.

For all models, low SEN values can be observed. A decrease in their values is noticed in the models using the selected variables. This may be due to the use of more interpretative and predictive variables. As expected, low SEL values were observed for all models as the spectra contain information about other components, which could be considered interferents. For both AutoiOPS and FeediOPS models, this parameter was higher as more selected regions of the NIR spectra related to the property were used, except for the Pulp30 model, where the selection did not improve the SEL values. The LOD values also decreased for the models using the selected variables.

Table 1 shows that FeediOPS models were chosen for Shell5, AutoiOPS for Shell30, and the Full model for Pulp30, as variable selection did not improve the RMSEP value. Fig. 3 presents the measured versus predicted oil content values (3A, 3D, and 3G) using the NIR spectra for the best model of each data set. A linear fit can be observed for all datasets, indicating that the models can accurately predict the oil content in macaw fruit.

Fig. 3. Measured and predicted oil content for (A) Shell5; (D) Shell30; (G) Pulp30 (● represents the regression sample set and the prediction sample set). Relative error for calibration set: (B) Shell5; (E) Shell30; (H) Pulp30. Relative error for prediction set: (C) Shell5; (F) Shell30; (I) Pulp30.

Fig. 3 also shows the relative errors of calibration (Fig. 3B, E, and H) and prediction (Fig. 3C, F, and I). Most of the relative errors were less than 10%, showing that the model Shell5 could be used to satisfactorily predict the amount of oil present in the fruit 25 days before its extraction.

The above models were evaluated in order to verify if there was a correlation by chance. Fig. 4 shows that the models presented in this work were distinguished from the models built by randomizing the y vector. Therefore, it can be concluded that the models were not obtained by chance.

Fig. 4. Correlation coefficient of cross-validation (Rcv) versus correlation coefficient of calibration (Rc) (chance correlation) for (A) Shell5; (B) Shell30; (C) Pulp30.

Fig. 5 presents the variables selected for Shell5 (Fig. 5A) and Shell30 (Fig. 5B). The variables selected for Shell5 and Shell30 were obtained by FeediOPS and AutoiOPS, respectively. The selected regions, 10,000–9000 cm−1, 8500–7500 cm−1, and 6100–5500 cm−1, were similar for both sets as the models were built using shell NIR spectra, differing only in the days after harvest. Also, it can be observed that the selected regions are related to absorbance bands of fatty acids, and the bands related to water absorbance were not selected.

Fig. 5. Variables selected for (A) Shell5 using FeediOPS and (B) Shell30 using AutoiOPS.

Thus, the prediction of the macaw fruit's oil content was possible through NIR of the shell or mesocarp and PLS regression. All models presented reliable predictive capacity. However, the model Shell5 was the most advantageous, as the ripe fruit's maximum oil content could be predicted 25 days in advance without damaging the fruit. This method can improve the harvest and storage practices aimed at improving the oil's quality and reducing the time needed for decision-making related to selecting the best fruits. Besides, this work opens the way to further studies relating to the early prediction of fruit properties.

4. Conclusions

From this study, NIR spectroscopy and PLS regression proved to be capable of early prediction of the maximum oil content of macaw fruits using spectra from the shell of unripe fruits. Comparing the results of the Shell5 with the Shell30 and Pulp30 models showed that the models using the unripe fruits were similar to the ones using the ripe fruits. The OPS improved the models Shell5 and Shell30 by selecting more predictive and informative variables. The method presented is faster, non-destructive, solvent-free, and low-cost when compared with reference methods.


CRediT authorship contribution statement

Ulisses F. Oliveira: Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing. Annanda M. Costa: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology. Jussara V. Roque: Supervision, Validation, Visualization, Writing - review & editing, Writing - original draft. Wilson Cardoso: Supervision, Validation, Visualization, Writing - review & editing, Writing - original draft. Sergio Y. Motoike: Supervision, Funding acquisition, Project administration, Resources. Marcio H.P. Barbosa: Supervision, Funding acquisition, Project administration, Resources. Reinaldo F. Teofilo: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Validation, Visualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001 and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) (Project: CEX - APQ-02254-15).

References

Altieri, G., Matera, A., Genovese, F., & Di Renzo, G. C. (2020). Models for the rapid assessment of water and oil content in olive pomace by near-infrared spectrometry. Journal of the Science of Food and Agriculture, 100(7), 3236–3245. https://doi.org/10.1002/jsfa.10361.
Assis, C., Ramos, R. S., Silva, L. A., Kist, V., Barbosa, M. H. P., & Teófilo, R. F. (2017). Prediction of Lignin Content in Different Parts of Sugarcane Using Near-Infrared Spectroscopy (NIR), Ordered Predictors Selection (OPS), and Partial Least Squares (PLS). Applied Spectroscopy, 71(8), 2001–2012. https://doi.org/10.1177/

spectrometry1. Revista Ciência Agronômica, 46(1), 21–28. https://doi.org/10.1590/S1806-66902015000100003.
Monselise, S. (2018). Handbook of Fruit Set and Development (Boca Raton). CRC Press. https://doi.org/10.1201/9781351073042.
Montoya, S. G., Motoike, S. Y., Kuki, K. N., & Couto, A. D. (2016). Fruit development, growth, and stored reserves in macauba palm (Acrocomia aculeata), an alternative bioenergy crop. Planta, 244(4), 927–938. https://doi.org/10.1007/s00425-016-2558-7.
Muscat, A., de Olde, E. M., de Boer, I. J. M., & Ripoll-Bosch, R. (2019). The battle for biomass: A systematic review of food-feed-fuel competition. Global Food Security, April, 100330. https://doi.org/10.1016/j.gfs.2019.100330.
Nunes, A. A., Favaro, S. P., Galvani, F., & Miranda, C. H. B. (2015). Good practices of harvest and processing provide high quality Macauba pulp oil. European Journal of Lipid Science and Technology, 117(12), 2036–2043. https://doi.org/10.1002/ejlt.201400577.
Olarewaju, O. O., Bertling, I., & Magwaza, L. S. (2016). Non-destructive evaluation of avocado fruit maturity using near infrared spectroscopy and PLS regression models. Scientia Horticulturae, 199, 229–236. https://doi.org/10.1016/j.scienta.2015.12.047.
Ozaki, Y., McClure, W. F., & Christy, A. A. (2006). Near-infrared spectroscopy in food science and technology. John Wiley & Sons.
Pasquini, C. (2003). Near Infrared Spectroscopy: Fundamentals, practical aspects and analytical applications. Journal of the Brazilian Chemical Society, 14(2), 198–219. https://doi.org/10.1590/S0103-50532003000200006.
Pires, T. P., dos Santos Souza, E., Kuki, K. N., & Motoike, S. Y. (2013). Ecophysiological traits of the macaw palm: A contribution towards the domestication of a novel oil crop. Industrial Crops and Products, 44, 200–210. https://doi.org/10.1016/j.indcrop.2012.09.029.
Porto, N. de A., Roque, J. V., Wartha, C. A., Cardoso, W., Peternelli, L. A., Barbosa, M. H. P., & Teófilo, R. F. (2019). Early prediction of sugarcane genotypes susceptible and resistant to Diatraea saccharalis using spectroscopies and classification techniques. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, 218, 69–75. https://doi.org/10.1016/j.saa.2019.03.114.
Prates-Valério, P., Celayeta, J. M. F., & Cren, E. C. (2019). Quality Parameters of Mechanically Extracted Edible Macauba Oils (Acrocomia aculeata) for Potential Food and Alternative Industrial Feedstock Application. European Journal of Lipid Science and Technology, 121(5), 1800329. https://doi.org/10.1002/ejlt.201800329.
Rambo, M. K. D., Amorim, E. P., & Ferreira, M. M. C. (2013). Potential of visible-near infrared spectroscopy combined with chemometrics for analysis of some constituents of coffee and banana residues. Analytica Chimica Acta, 775, 41–49. https://doi.org/10.1016/j.aca.2013.03.015.
Richter, B. E., Jones, B. A., Ezzell, J. L., Porter, N. L., Avdalovic, N., & Pohl, C. (1996). Accelerated Solvent Extraction: A Technique for Sample Preparation. Analytical Chemistry, 68(6), 1033–1039. https://doi.org/10.1021/ac9508199.
Roque, J. V., Cardoso, W., Peternelli, L. A., & Teófilo, R. F. (2019). Comprehensive new approaches for variable selection using ordered predictors selection. Analytica Chimica Acta, 1075, 57–70. https://doi.org/10.1016/j.aca.2019.05.039.
Roque, J. V., Dias, L. A. S., & Teófilo, R. F. (2017). Multivariate Calibration to Determine Phorbol Esters in Seeds of Jatropha curcas L. Using Near Infrared and Ultraviolet Spectroscopies. Journal of the Brazilian Chemical Society, 28(8), 1506–1516.
0003702817704147.
Saha, U., & Jackson, D. (2018). Analysis of moisture, oil, and fatty acid composition of
Costa, A. M., Motoike, S. Y., Corrêa, T. R., Silva, T. C., Coser, S. M., de Resende, M. D. V.,
olives by near-infrared spectroscopy: Development and validation calibration
& Teófilo, R. F. (2018). Genetic parameters and selection of macaw palm (Acrocomia
models. Journal of the Science of Food and Agriculture, 98(5), 1821–1831. https://doi.
aculeata) accessions: An alternative crop for biofuels. Crop Breeding and Applied
org/10.1002/jsfa.8658.
Biotechnology, 18(3), 259–266. https://doi.org/10.1590/1984-70332018v18n3a39.
Sajjadi, B., Raman, A. A. A., & Arandiyan, H. (2016). A comprehensive review on
Danlami, J. M., Arsad, A., Ahmad Zaini, M. A., & Sulaiman, H. (2014). A comparative
properties of edible and non-edible vegetable oil-based biodiesel: Composition,
study of various oil extraction techniques from plants. Reviews in Chemical
specifications and prediction models. Renewable and Sustainable Energy Reviews, 63,
Engineering, 30(6), 605. https://doi.org/10.1515/revce-2013-0038.
62–92. https://doi.org/10.1016/j.rser.2016.05.035.
Evaristo, A. B., Grossi, J. A. S., Pimentel, L. D., de Melo Goulart, S., Martins, A. D., dos
Sato, T., Kawano, S., & Iwamoto, M. (1991). Near infrared spectral patterns of fatty acid
Santos, V. L., & Motoike, S. (2016). Harvest and post-harvest conditions influencing
analysis from fats and oils. Journal of the American Oil Chemists’ Society, 68(11),
macauba (Acrocomia aculeata) oil quality attributes. Industrial Crops and Products,
827–833. https://doi.org/10.1007/BF02660596.
85, 63–73. https://doi.org/10.1016/j.indcrop.2016.02.052.
Senger, E., Bohlinger, B., Esgaib, S., Hernández-Cubero, L. C., Montes, J. M., & Becker, K.
Ferreira, D. S., Galão, O. F., Pallone, J. A. L., & Poppi, R. J. (2014). Comparison and
(2017). Chuta (edible Jatropha curcas L.), the newcomer among underutilized crops:
application of near-infrared (NIR) and mid-infrared (MIR) spectroscopy for
A rich source of vegetable oil and protein for human consumption. European Food
determination of quality parameters in soybean samples. Food Control, 35(1),
Research and Technology, 243(6), 987–997. https://doi.org/10.1007/s00217-016-
227–232. https://doi.org/10.1016/j.foodcont.2013.07.010.
2814-x.
de França, L. F., Reber, G., Meireles, M. A. A., Machado, N. T., & Brunner, G. (1999).
Simiqueli, G. F., de Resende, M. D. V., Motoike, S. Y., & Henriques, E. (2018). Inbreeding
Supercritical extraction of carotenoids and lipids from buriti (Mauritia flexuosa), a
depression as a cause of fruit abortion in structured populations of macaw palm
fruit from the Amazon region. The Journal of Supercritical Fluids, 14(3), 247–256.
(Acrocomia aculeata): Implications for breeding programs. Industrial Crops and
https://doi.org/10.1016/S0896-8446(98)00122-3.
Products, 112, 652–659. https://doi.org/10.1016/j.indcrop.2017.12.068.
Hourant, P., Baeten, V., Morales, M. T., Meurens, M., & Aparicio, R. (2000). Oil and Fat
Steuer, B., Schulz, H., & Läger, E. (2001). Classification and analysis of citrus oils by NIR
Classification by Selected Bands of Near-Infrared Spectroscopy. Applied Spectroscopy,
spectroscopy. Food Chemistry, 72(1), 113–117. https://doi.org/10.1016/S0308-8146
54(8), 1168–1174. https://doi.org/10.1366/0003702001950733.
(00)00209-0.
IAL. (2008). Métodos químicos e físicos para análise de alimentos. In O. Zenebon, N. S.
Sudarno, Silalahi, D. D., Risman, T., Widyastuti, B. L., Davrieux, F., Yuan, Y. Y., &
Pascuet, & P. Tiglea (Eds.), Normas Analíticas do Instituto Adolf Lutz (4th ed., pp.
Caliman, J. P. (2017). Rapid determination of oil content in dried-ground oil palm
116–118). Instituto Adolfo Lutz.
mesocarp and kernel using near infrared spectroscopy. Journal of Near Infrared
Kennard, R. W., & Stone, L. A. (1969). Computer Aided Design of Experiments.
Spectroscopy, 25(5), 338–347. https://doi.org/10.1177/0967033517732679.
Technometrics, 11(1), 137–148. https://doi.org/10.1080/00401706.1969.10490666.
Tallada, J. G., Palacios-Rojas, N., & Armstrong, P. R. (2009). Prediction of maize seed
Lanes, E. C. M., Motoike, S. Y., Kuki, K. N., Nick, C., & Freitas, R. D. (2015). Molecular
attributes using a rapid single kernel near infrared instrument. Journal of Cereal
Characterization and Population Structure of the Macaw Palm, Acrocomia aculeata
Science, 50(3), 381–387. https://doi.org/10.1016/j.jcs.2009.08.003.
(Arecaceae), Ex Situ Germplasm Collection Using Microsatellites Markers. Journal of
Teófilo, R. F., Martins, J. P. A., & Ferreira, M. M. C. (2009). Sorting variables by using
Heredity, 106(1), 102–112. https://doi.org/10.1093/jhered/esu073.
informative vectors as a strategy for feature selection in multivariate regression.
Machado, W., Figueiredo, A., & Guimarães, M. F. (2016). Initial development of
Journal of Chemometrics, 23(1), 32–48. https://doi.org/10.1002/cem.1192.
seedlings of macauba palm (Acrocomia aculeata). Industrial Crops and Products, 87,
Tilahun, S., Park, D. S., Seo, M. H., Hwang, I. G., Kim, S. H., Choi, H. R., & Jeong, C. S.
14–19. https://doi.org/10.1016/j.indcrop.2016.04.022.
(2018). Prediction of lycopene and β-carotene in tomatoes by portable chroma-meter
Matsimbe, S. F. S., Motoike, S. Y., Pinto, F. A. de C., Leite, H. G., & Marcatti, G. E. (2015).
Prediction of oil content in the mesocarp of fruit from the macauba palm using

7
U.F. Oliveira et al. Food Chemistry 351 (2021) 129314

and VIS/NIR spectra. Postharvest Biology and Technology, 136, 50–56. https://doi. Vitor, A. B., Diniz, R. P., Morgante, C. V., Antônio, R. P., & de Oliveira, E. J. (2019). Early
org/10.1016/j.postharvbio.2017.10.007. prediction models for cassava root yield in different water regimes. Field Crops
Tilahun, W. W., Grossi, J. A. S., Favaro, S. P., Sediyama, C. S., Goulart, S. D. M., Research, 239, 149–158. https://doi.org/10.1016/j.fcr.2019.05.017.
Pimentel, L. D., & Motoike, S. Y. (2019). Increase in oil content and changes in Yang, R., Zhang, L., Li, P., Yu, L., Mao, J., Wang, X., & Zhang, Q. (2018). A review of
quality of macauba mesocarp oil along storage. OCL, 26, 20. https://doi.org/ chemical composition and nutritional properties of minor vegetable oils in China.
10.1051/ocl/2019014. Trends in Food Science and Technology, 74(May 2017), 26–32. https://doi.org/
10.1016/j.tifs.2018.01.013.
