
M.B. dos Santos Scholz et al., J. Near Infrared Spectrosc. 22, 411–421 (2014)


Received: 13 October 2014; Revised: 10 December 2014; Accepted: 8 January 2015; Publication: 13 January 2015

JOURNAL OF NEAR INFRARED SPECTROSCOPY

Application of near infrared spectroscopy for green coffee biochemical phenotyping
Maria Brígida dos Santos Scholz,a,* Cíntia Sorane Good Kitzberger,a Luiz Filipe Protasio Pereira,b
Fabrice Davrieux,c David Pot,d Pierre Charmetantd and Thierry Leroyd
a IAPAR, Instituto Agronômico do Paraná, Departamento de Fisiologia Vegetal, Londrina, Brazil. E-mail: mbscholz@iapar.br
b Embrapa Café, Londrina, Paraná, Brazil
c Cirad-UMR Qualisud, F-34398 Montpellier, France
d Cirad-UMR AGAP, F-34398 Montpellier, France

Accessions resulting from surveys in Ethiopia (the centre of origin of Arabica coffee) can be used as a source of genetic variability in
breeding coffee plants. They may contain some genes of interest for coffee breeding, specifically in relation to beverage quality. Near
infrared (NIR) spectroscopy was used to develop models for predicting the major coffee constituents related to beverage quality (proteins,
caffeine, lipids, chlorogenic acids, phenolic compounds, total sugars and sucrose). We selected coffee samples listed in a database
containing data of chemical contents from samples of traditional and modern cultivars and of Ethiopian accessions to construct
models to predict these compounds. Spectra were collected between 1100 nm and 2500 nm, and mathematical pretreatments were
applied. The number of samples for the calibration step for each compound was set so as to be representative of distribution values.
Cross-validation was performed on the total set of samples to select the optimal number of terms for the prediction models of each
component. The prediction models were developed employing a modified partial least-squares regression. The total set of samples for
each component was divided randomly into two subsets: one for developing the prediction model and the other to evaluate the predicted
values. The best prediction models obtained were for chlorogenic acids (r2 = 0.94, RPD = 4.16), proteins (r2 = 0.94, RPD = 4.09) and caffeine
(r2 = 0.92, RPD = 3.32). Models for lipids and phenolic compounds were not as accurate (r2 = 0.87, RPD = 2.77 and r2 = 0.86, RPD = 2.62,
respectively), while models for sucrose (r2 = 0.84, RPD = 2.44) and total sugars (r2 = 0.85, RPD = 2.55) were even less accurate. All these
models can be used for identifying coffee lines with more desirable traits in breeding programmes. The models were effective in dis-
criminating Ethiopian coffee accessions from modern cultivars of coffee. Additionally, the NIR technique will make it possible to analyse
a large number of samples in breeding programmes and may be used as a high-throughput analysis for green coffee phenotyping.

Keywords: green coffee components, NIR spectroscopy, Ethiopian coffee accessions, coffee cultivars, high-throughput phenotyping

Introduction
In the global economy, coffee is a major commercialised agricultural product, the production of which involves thousands of people around the world. In 2010, there were approximately 26 million farmers and coffee workers in over 50 producing countries engaged in coffee production.1 Coffee production in the 2013–2014 season was 145 million 60 kg bags of coffee. Brazil is the largest producer and exporter of coffee followed by Colombia and Vietnam. Many producing countries are also large consumers of coffee, such as Brazil, which is currently the second largest consumer of this beverage.2
Coffee is consumed mainly for the stimulating effect of caffeine, but the aroma and flavor play a fundamental role in making coffee one of the most appreciated and consumed beverages in the world.3

ISSN: 0967-0335; doi: 10.1255/jnirs.1134
© IM Publications LLP 2014. All rights reserved

During roasting, the reactions between various constituents of green beans produce nearly 1000 compounds (volatile and nonvolatile), but approximately 30 compounds are responsible for the main impression of the coffee aroma.4 To identify the role of these compounds in beverage quality, the relationship between chemical compounds in green beans and beverage quality has been established over the years.5
The compounds in green beans, such as caffeine, proteins, lipids, phenolic compounds and sugars, are among the main precursors of the typical aroma and flavor of roasted coffee.6 Caffeine was the first compound studied in coffee, and the content varies greatly within a single species owing to the environmental and agronomic conditions.7 The protein content of coffee beans also varies greatly and is influenced by environmental conditions, fertilisation practices and the cultivar.8,9 The presence of lipids is associated with aroma retention and foaming in the beverage, and their accumulation depends on several factors, particularly the species and cultivars. Beans of the same cultivar of coffee grown in different environmental conditions show lipid contents ranging from 10.65 g 100 g–1 to 13.16 g 100 g–1.9 However, a wider range (10.69–16.75 g 100 g–1) was found by measuring different canephora cultivars.10
Sugars are important precursors of the aromatic compounds generated during roasting. Sucrose, fructose and glucose are the major sugars found in coffee, and an increase in their concentrations has been associated with the maturation stage of the beans. The sucrose level increases as the beans develop, and the glucose and fructose levels decrease at the end of the cycle.11
The concentrations of phenolic compounds, especially chlorogenic acids, are inversely associated with bean ripening. The immature beans contain large amounts of such compounds, and their presence in the beverage creates a high acidity and astringency.12
Despite the range of compositions observed for the chemical compounds, the genetic diversity in coffee varieties is very narrow. Basically, with rare exceptions, the cultivars of C. arabica plantations in Central and South America are derived from Typica and Bourbon cultivars.13 This narrow origin has limited the search for polymorphic markers associated with the formation of the main compounds in the coffee bean.14
Ethiopia is considered the centre of origin of Arabica coffee.13 Accessions resulting from surveys in Ethiopia can be used as a source of genetic variability. They may contain some genes of interest for coffee breeding related mainly to beverage quality. One hundred and thirty-two samples of accessions were collected in Ethiopia, and planted and maintained in the experimental fields of the Agronomic Institute of Paraná, Brazil (IAPAR) in 1976. The phenotypic and genotypic characterisation of this population is the starting point for studies of linkage disequilibrium mapping, or mapping by association, current tools that are used to accelerate genetic gain in breeding programmes.
Near infrared (NIR) spectroscopy is a technique that makes it possible to evaluate a large number of phenotypes in a short time. This technique has many advantages compared with traditional techniques: it is fast and inexpensive and requires only small quantities of samples without the use of chemical reagents. Furthermore, samples can be analysed in their natural state or in small preparations, and several compounds can be measured simultaneously.15,16
In the area of plant breeding, the composition and quality of new lines must be evaluated for at least three years to be confirmed, generating a large number of determinations. NIR has been widely used in this area to quantify the oil, protein and total and free fatty acids content in sunflower lines,17 to assess the seed quality of various species18,19 and to identify introgression between different coffee species.20 Prediction models were developed to determine caffeine21,22 and proteins,23 and more recently, this technique was also shown to be able to predict cafestol and kahweol in green coffee beans.24
The quantification of these compounds in green coffee by NIR was used to evaluate the influence of altitude and cultivar on the biochemical composition of Arabica hybrids5 as well as differentiation between Arabica non-introgressed and C. canephora derivate genotypes.20 Furthermore, NIR was employed to evaluate the quality of Arabica and Robusta coffees from different geographical locations25 and the quality of roasted coffee.26
The objective of this study was to evaluate the potential of the NIR technique for quantifying the main components of green coffee (caffeine, chlorogenic acids, proteins, lipids, phenolic compounds, total sugars and sucrose) and for discriminating cultivars from different origins (Ethiopian coffee accessions, traditional and modern cultivars).

Materials and methods
Samples
To construct prediction models of each of the proposed constituents, representative sampling of the coffee was conducted. Thus, a database containing data on chemical contents from coffee samples collected and analysed during various seasons in the coffee region of Paraná State was created. In the 2004 season, samples of one cultivar (Iapar 59), cultivated under different fertiliser levels, were collected from various agricultural properties. Samples were collected from Ethiopian coffee accessions at IAPAR for two seasons (2006 and 2007). Modern coffee cultivars developed by IAPAR (Iapar 59, IPR 97, IPR 98, IPR 99, IPR 100, IPR 101, IPR 102, IPR 103, IPR 104, IPR 105, IPR 106, IPR 107 and IPR 108) were collected for three seasons (2008, 2009 and 2010) in experimental plots in the coffee region of Paraná State. Finally, to complete the database, traditional and modern coffee cultivars (Iapar 59, IPR 99, Catuaí, Mundo Novo, Tupi, Obatã, and Bourbon) were harvested on coffee farms in the Norte Pioneiro region of Paraná State in the 2009 and 2010 seasons.
To ensure that the maximum range of values was covered for the constituents of green coffee, specific subsets were separated manually from the total set of samples. This action resulted in a variable number of samples for each constituent

model. The variability of the samples in the data sets (calibration and validation) was evaluated for each group by the coefficient of variation, standard deviation and range values of the contents.
In order to verify the applicability of the developed prediction models, another group of samples was analysed. This group was formed by traditional cultivars [Bourbon (BA10), Catuaí (Ctai)], modern cultivars (Iapar 59, IPR 97, IPR 98, IPR 99, IPR 100, IPR 101, IPR 102, IPR 103, IPR 104, IPR 105, IPR 106, IPR 107 and IPR 108) and coffee accessions from the collection of Ethiopian coffees (Eastern side of the Rift Valley: E007 and E237, Western side: E041, E044, E047, E087, E123a, E123b, E331, E338, E383, E464, E511 and E516). These accessions were planted in experimental fields at IAPAR, and the samples were collected for two seasons (2010 and 2011). The chemical composition of these samples was determined by applying the prediction model developed here.

Sample preparation
Ripe coffee beans were collected manually, washed and then dried in the sun until a moisture content of 12–12.5% was reached. The husk and parchment were removed mechanically in portions of approximately 300 g. The green coffee beans were dipped in liquid nitrogen and ground (0.5 mm particles) in a disk mill (Perten 3600, Kungens Kurva, Sweden). The ground samples were frozen at –20°C for further chemical analysis and spectral measurements.

Biochemical analyses
To quantify the caffeine content, the ground coffee was added to water and MgO, and heated to the boiling point for 30 min. After filtering, an aliquot of chloroform was evaporated completely and added to the boiling water. After cooling, the absorbance was read at 273 nm, and the concentration was calculated from a standard curve for caffeine. The total sugars (reducing and non-reducing) were extracted with water, which was maintained at 80°C, for 30 min. After cooling, the sample was clarified with Carrez reagent, and the reducing sugars were determined using the Somogyi–Nelson assay.27 To determine the total sugars, an aliquot of the clarified solution was hydrolysed before determining the reducing sugars. The sugar content was quantified using a standard curve prepared with glucose. The amount of sucrose was calculated by subtracting the amount of reducing sugars from the total sugar value. The protein content was determined by the Kjeldahl method.28 After digestion of the organic matter with sulfuric acid, the protein concentration was obtained by multiplying the value of the nitrogen by a factor of 6.25. The lipids were solvent-extracted (petroleum ether) over 16 h with heating (45–50°C). The phenolic compounds (PC) were determined in the aqueous extract with the Folin–Ciocalteau reagent, and the concentration of these compounds was obtained from a standard curve prepared with gallic acid.29 The total chlorogenic acids were extracted with isopropyl alcohol, and after a reaction with potassium periodate, the absorbance was measured at 530 nm. A standard curve was prepared with 5-CQA.30
All the biochemical determinations were performed in duplicate, all reagents were suitable for chemical analysis, and all results were expressed on a dry basis.

NIR spectrum acquisition
NIR spectra were collected using a spectrophotometer (model 6500, Foss NIRSystems, Silver Spring, MD) equipped with a reflectance detector. Approximately 6 g of ground coffee was placed in a rectangular cell, compressed gently with a metal spatula and closed with a cardboard lid. The ISIscan software package (Foss, Silver Spring, Maryland, USA) was used to control the spectrophotometer, collect the spectra, and import and analyse the data. Although spectra were collected between 400 nm and 2500 nm, only the region between 1100 nm and 2500 nm was used to develop prediction models. In this region, spectra were recorded between 1100 nm and 2498 nm at 2 nm intervals, and saved as the average of 32 scans. Scans were stored as log (1/R), where R was the reflectance at each wavelength. All calculations were performed in WinISI 4.5 software (Infrasoft International, Port Matilda, PA, USA).
The noise in the spectra caused by light scattering was corrected mathematically by applying standard normal variate and detrend corrections.15,31,32 Then, a second derivative was calculated on five points of the spectrum, and additional smoothing was performed using the Savitzky–Golay polynomial method with a gap of five points to reduce the peak overlap, eliminate the baseline shift and preserve the essential features of the data in the spectrum.15,31
Prior to developing any of the calibrations, outlier samples in the population were detected using the Mahalanobis distance (H) from a principal-component analysis (PCA) of corrected spectra. The Mahalanobis distance of each spectrum with respect to the average spectrum was calculated.18 Samples with an H statistic value greater than 3 units from the mean spectrum were defined as outliers and removed when calibrations were developed.15,31,32

NIR calibration development
The calibration model was developed using the regression algorithm of modified partial least squares (mPLS),19 and a cross-validation was performed to select the optimal number of terms for the prediction models of each component and avoid over-fitting.33 For cross-validation, the sample set was divided and analysed automatically using WinISI 4.5 software. During this procedure, 25% of the samples were randomly assigned as validation samples, and the calibration model was developed with the remaining 75%. This procedure was repeated four times, and the average standard error of cross-validation (SECV) was calculated. The optimal number is defined as the number of terms beyond which there is no significant decrease in the SECV.22,31
After this step, the total sample set was divided randomly using the WinISI software into two parts: a subset containing about 75% of the samples was used to construct the calibration model, and another subset (about 25% of the samples) was used as the validation set.
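The scatter correction and derivative pretreatment described under NIR spectrum acquisition can be illustrated with a minimal Python sketch. It is not the WinISI implementation: the detrend and Savitzky–Golay smoothing steps are left out, the second derivative is assumed to be a simple finite difference over a gap of five data points, and the function names and spectrum values are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def gap_second_derivative(spectrum, gap=5):
    """Finite-difference second derivative over a gap of `gap` data points.

    d2[i] = x[i - gap] - 2 * x[i] + x[i + gap]
    Points without a full window on both sides are dropped (assumed edge handling).
    """
    n = len(spectrum)
    return [
        spectrum[i - gap] - 2 * spectrum[i] + spectrum[i + gap]
        for i in range(gap, n - gap)
    ]


# Hypothetical log(1/R) values at 2 nm steps; these are not data from the study.
wavelengths = list(range(1100, 1140, 2))
raw = [0.30 + 0.001 * (w - 1100) + 0.00002 * (w - 1100) ** 2 for w in wavelengths]

corrected = snv(raw)                           # removes offset and scale differences
d2 = gap_second_derivative(corrected, gap=5)   # a linear baseline cancels out here
```

In this sketch a purely linear baseline yields a second derivative of zero, which is the behaviour the text attributes to the derivative step when it mentions eliminating the baseline shift.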

The calibration statistics included the following parameters: standard deviation (SD), coefficient of determination (R2) and standard error of calibration (SEC). The Student (t) test was used to identify t-outlier samples during calibration development, that is, samples that could not be predicted by the model. In this case, outlier detection was based on the standardised residuals (t = error/SECV), and the limit for acceptance was t ≤ 2.5.15,18
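As an illustration of the t-outlier rule just described, the following Python sketch flags samples whose standardised residual exceeds the 2.5 limit. The function name and the numbers are hypothetical, and taking the absolute value of t is an assumption, since the text gives the limit only as t ≤ 2.5.

```python
def t_outliers(reference, predicted, secv, limit=2.5):
    """Return (index, t) pairs for samples whose standardised residual exceeds the limit.

    t = (predicted - reference) / SECV; the absolute value is compared (assumption).
    """
    flagged = []
    for i, (y, y_hat) in enumerate(zip(reference, predicted)):
        t = (y_hat - y) / secv
        if abs(t) > limit:
            flagged.append((i, round(t, 2)))
    return flagged


# Hypothetical caffeine-like values in g 100 g-1; not data from the study.
reference = [1.10, 1.25, 0.95, 1.40]
predicted = [1.12, 1.22, 1.20, 1.38]
flagged = t_outliers(reference, predicted, secv=0.07)
```

Here only the third sample is flagged, because its residual of 0.25 is more than 2.5 times the assumed SECV of 0.07.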
In the validation step, the following statistical parameters were used to evaluate the models: SD, r2, standard error of prediction (SEP), bias, slope and ratio of performance deviation (RPD).13,34 This latter parameter is the ratio of the SD of the external validation set to the SEP (SD/SEP), and it is valuable when evaluating the predictive capability of the model. In contrast to the SEP and SECV, the RPD is a dimensionless parameter and aids comparisons between models. Williams and Sobering35 suggest that values between 5 and 10 are suitable for estimating absolute contents, and values in the range from 2.5 to 4.9 are satisfactory for comparison purposes and screening in breeding programmes.
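The RPD definition and the interpretation ranges quoted above can be written out as a short Python sketch. The function names are hypothetical, and values falling outside the two quoted ranges are simply labelled as such, because the text does not say how to treat them.

```python
def rpd(sd_validation, sep):
    """Ratio of performance deviation: SD of the validation set divided by SEP."""
    return sd_validation / sep


def interpret_rpd(value):
    """Map an RPD value onto the ranges attributed to Williams and Sobering."""
    if 5 <= value <= 10:
        return "suitable for estimating absolute contents"
    if 2.5 <= value < 5:
        return "satisfactory for comparison and screening"
    return "outside the quoted ranges"


# SD and SEP for proteins as printed in Table 3 (both rounded to two decimals).
protein_rpd = rpd(1.70, 0.42)
print(round(protein_rpd, 2), interpret_rpd(protein_rpd))
```

With the rounded table entries the ratio comes to about 4.05, close to the 4.09 reported for proteins; the small difference is what one would expect from recomputing with rounded inputs.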

Other statistical analysis


PCA was conducted using the program XLSTAT (Addinsoft,
2008).36

Results and discussion
NIR spectral analysis
Figure 1(a) represents a typical spectrum of green coffee beans: many overlaps in the vibrations of chemical bonds result in large bands or poor peak resolution. Using the second derivative and smoothing techniques solved these problems and produced a spectrum with defined peaks [Figure 1(b)]. Then, specific chemical bonds can be linked to these peaks.

Figure 1. Typical spectrum of green coffee in the wavelength range of 1100–2500 nm: (a) raw spectrum (log 1/R) and (b) second-derivative spectrum.

The spectrum for ground green coffee beans contained various peaks between 1100 nm and 2500 nm. The main absorption bands are observed at 1208 nm related to C–H second overtone stretching. These bands are associated with carbohydrate (e.g. sucrose) or lipids and amino acids (proteins) and caffeine. The region near 1400 nm contains the first overtones of C–H and O–H that could be attributed to water and other compounds of green coffee. The aromatic bonds can be found at around 1500 nm, corresponding to the first overtone of aromatic structures C–H, which is assumed to be related to PC and chlorogenic acids (CGA). On the other hand, the bands situated between 1700 nm and 1800 nm are linked to the first overtone of C–H corresponding to lipids, carbohydrates and caffeine. In the region 1800–2000 nm important bands related to water, carboxylic acids and carboxylates or amides are found. Next to this region (2016 nm), the 2 × O–H deformation and C–O deformation, associated with various chemical compounds, can be found.
Signals between 2200 nm and 2300 nm are associated with the O–H (water) bonds and C–C vibrations, and peaks above 2300 nm correspond to combinations of tones between C–H from CH2 of lipids. In the last region of the NIRS spectra (2300–2480 nm) C–H combinations related mainly to carbohydrates are also observed. Information on the functional group in the spectrum was obtained from WinISI software and previously published articles.26,37,38
These chemical bonds are found in several compounds of green coffee; thus, it was possible to establish the relation between these peaks and the concentration of the compounds to construct a prediction model.

NIR calibration
The numbers of samples and terms of the model and statistical description of samples employed in the cross-validation step for the components of green coffee are presented in Table 1.
The number of samples used to build the prediction model for each compound was variable. These samples were chosen at random within a database so as to cover broad ranges for each component (Table 1). Different numbers of samples were used because the quantity of components arises from different metabolic pathways, and thus it is not usual to have the appropriate concentration ranges of each component in the same group of samples.

Table 1. Number of samples (N), number of terms (T), mean and range values (g 100 g–1) for the cross-validation analysis.

Constituents   N     T    Mean    Range         SD
Proteins       309   10   13.81   9.62–18.34    1.87
Caffeine       361   11   1.98    0.66–1.57     0.18
Lipids         254   7    13.30   10.15–16.69   1.26
TS             355   10   8.26    5.01–10.94    1.47
Sucrose        358   12   7.96    4.72–10.58    1.37
CGA            427   10   8.39    3.92–13.71    2.85
PC             381   9    4.90    3.68–6.86     0.70

TS: total sugars. CGA: chlorogenic acids. PC: phenolic compounds.

The optimum number of terms defined in the cross-validation step ranged from seven for the lipids PLS model to 12 for the sucrose model (Table 1).
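The rule given under NIR calibration development, that the optimal number of terms is the one beyond which the SECV no longer decreases significantly, can be sketched in Python as follows. The paper does not state how significance was judged, so the 2% relative-improvement threshold, the function name and the SECV values below are all hypothetical.

```python
def optimal_terms(secv_by_terms, min_rel_gain=0.02):
    """Pick the number of terms after which SECV stops improving noticeably.

    secv_by_terms[k - 1] is the SECV obtained with k terms.
    A further term is accepted only if it lowers SECV by more than
    `min_rel_gain` (relative to the current best); the search stops at
    the first term that fails this check.
    """
    best = 1
    for k in range(2, len(secv_by_terms) + 1):
        previous = secv_by_terms[best - 1]
        current = secv_by_terms[k - 1]
        if (previous - current) / previous > min_rel_gain:
            best = k
        else:
            break
    return best


# Hypothetical SECV values for models with 1 to 6 terms; not data from the study.
secv = [0.80, 0.62, 0.51, 0.47, 0.465, 0.47]
n_terms = optimal_terms(secv)
```

In this example the fifth term lowers the SECV by only about 1%, so four terms are retained. Stopping at the first non-improving term is one simple way to limit over-fitting; other stopping criteria are possible.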
All constituents showed a large variability of values (Table 1). This distribution of the concentrations was sufficient for the purposes of calibration and validation, and suggests wide applicability for each of the developed models.39 Biotic and abiotic stress, different conditions of harvesting and drying, and interspecific crosses may modify the content of these compounds in green coffee.7,20,24 For studies of coffee biochemical phenotypes, samples with extremely low or extremely high values of chemical composition should be carefully reviewed.40 Therefore, values outside the limits described in the literature were included in this study to ensure wide applicability in the developed models.
A summary of all parameters for the prediction models is presented in Table 2 and Figures 2–8. The prediction models were formed using a variable number of PLS terms (between seven and 10) that did not exceed those established in the cross-validation (Table 1).

Figure 2. NIR-predicted values versus laboratory measurements of proteins in green coffee beans.

Proteins and caffeine
When coffee is cultivated using various agronomic practices (fertilisation, irrigation) or grown under different climatic conditions, the contents of compounds such as caffeine and proteins are modified. In our study, the range of caffeine and protein contents in the calibration models was large, covering almost all situations that could occur in a general study of coffee. Studies involving low caffeine values occur frequently because obtaining cultivars with low levels of caffeine is one of the objectives of coffee breeding.41,42
The calibration model developed for the prediction of proteins and caffeine produced coefficients of determination equal to 0.97 and 0.83, respectively (Table 2).
The comparison of the laboratory values and NIR predicted contents in the group of validation samples (Figures 2 and 3) resulted in coefficients of determination of 0.94 and 0.92 for protein and caffeine, respectively (Table 3). The different ranges of values in the calibration and validation groups for caffeine may have caused the higher value of R2 in the validation group of caffeine (Table 3). It is well known

Table 2. Number of samples (N), number of terms (T), mean values (g 100 g–1), range values (g 100 g–1), standard error of laboratory (SEL) and statistical parameters for the calibration models (SD, SEC, R2).

Constituents   N     T    Mean    Range         SEL    SD      SEC    R2
Proteins       254   10   13.82   9.62–18.34    0.35   1.85    0.32   0.97
Caffeine       305   10   1.19    0.69–1.57     0.05   0.18    0.07   0.83
Lipids         195   7    13.27   10.15–16.69   0.40   1.25    0.46   0.86
TS             288   10   8.24    5.01–10.81    0.25   1.448   0.45   0.90
Sucrose        287   10   7.97    4.72–10.51    0.25   1.37    0.44   0.90
CGA            368   10   8.41    3.93–13.71    0.30   2.86    0.73   0.93
PC             315   9    4.88    3.68–6.86     0.30   0.71    0.22   0.91

TS: total sugars. CGA: chlorogenic acids. PC: phenolic compounds. SD: standard deviation. R2: coefficient of determination. SEC: standard error of calibration.

that the range of values in the validation group affects the value of the coefficient of determination.18,43 Thus, in this case, other parameters such as SEP and RPD would be better indicators of the fit of the model.18 The SEP values for caffeine and proteins are near the values of SEC, indicating well-adjusted models for these compounds, and they are similar to the respective standard error of laboratory (SEL) values (Table 2). Published models developed for caffeine in green Arabica coffee beans grown in different geographical locations found SEC and R2 values of 0.07 and 0.88, respectively.22 On the other hand, models for caffeine employing Brazilian coffees showed a better performance (SEC = 0.02, R2 = 0.95) than models for protein (SEC = 0.064, R2 = 0.69).21,23

Figure 3. NIR-predicted values versus laboratory measurements of caffeine in green coffee beans.
Figure 4. NIR-predicted values versus laboratory measurements of CGA in green coffee beans.
Figure 5. NIR-predicted values versus laboratory measurements of PC in green coffee beans.
Figure 6. NIR-predicted values versus laboratory measurements of lipids in green coffee beans.

Furthermore, the models developed in the present study for protein and caffeine yield RPD values equal to 4.09 and 3.32,

respectively; these values were similar to those indicated for comparison purposes.

CGA and PC
The contents of CGA and PC in the green coffee beans depend primarily on climatic conditions and the degree of ripeness.12 The calibration models for CGA and PC showed a high precision for NIR determinations. The values of SEL and SEP for the PC model were also very close (Tables 2 and 3). The CGA model showed SEP values slightly higher than the respective SEL (Tables 2 and 3). The validation models of CGA and PC showed r2 values of 0.94 and 0.86, respectively (Table 3, Figures 4 and 5). The RPD values of 4.16 and 2.65 for CGA and PC, respectively, indicate that both models can be used to screen samples for these compounds in a coffee-plant breeding programme. Although these compounds have similar chemical structures, the prediction model of CGA was better than the PC model. The specificity of the chemical reaction in the laboratory determination probably contributed to a more accurate model for CGA.

Total sugars, sucrose and lipids
The contents of total sugars, sucrose and lipids in green coffee were strongly influenced by the stage of maturity.
The calibration models for lipids (Figure 6), total sugars (Figure 7) and sucrose (Figure 8) showed R2 values of 0.86, 0.90 and 0.90, respectively, and SEC values of 0.46, 0.45 and 0.44, respectively (Table 2). However, in the sample set used to validate the models of TS and sucrose, the r2 values of these compounds showed a slight decrease relative to calibration values (Table 3).

Figure 7. NIR-predicted values versus laboratory measurements of total sugar in green coffee beans.
Figure 8. NIR-predicted values versus laboratory measurements of sucrose in green coffee beans.

The RPD values for validation models for the lipids, total sugars and sucrose (2.77, 2.55 and 2.44, respectively)

Table 3. Number of samples (N), number of terms (T), mean values (g 100 g–1), range values (g 100 g–1) and statistics parameters for the validation analysis.

Constituents   N    SD     Mean    Range         SEP    r2     RPD
Proteins       56   1.70   13.86   9.94–16.90    0.42   0.94   4.09
Caffeine       65   0.19   1.20    0.76–1.63     0.06   0.92   3.32
Lipids         57   1.28   13.32   10.38–15.66   0.46   0.87   2.77
TS             62   1.35   8.34    5.47–10.71    0.53   0.85   2.55
Sucrose        62   1.30   8.04    5.28–10.37    0.53   0.84   2.44
CGA            55   2.97   8.42    4.28–12.89    0.72   0.94   4.16
PC             64   0.69   4.87    3.79–6.37     0.26   0.86   2.62

TS: total sugars. CGA: chlorogenic acids. PC: phenolic compounds. r2: coefficient of determination. SEL: standard error of laboratory. SEP: standard error of prediction. RPD: ratio of performance deviation.

indicated that these models can be used for comparison purposes.

Principal mPLS loadings for the formation of models of the constituents of green coffee beans
mPLS loadings (coefficients) were analysed for each constituent of green coffee beans. The first factor in the mPLS model for proteins was mainly associated with absorption bands of chemical bonds relevant to protein structure. High coefficient values are found in the region of 1212 nm for the second overtone of the C–H stretching bands from the –CH2 group, at 1388 nm associated with the first overtone bands of the C–H combinations from CH3 and at 1452 nm owing to the first overtone of N–H and O–H. The coefficients with high values found in the region between 1708 nm and 1764 nm were attributed to absorption bands of the first overtone of C–H. Furthermore, absorptions at 1972 nm were related to N–H and at 1978 nm to asymmetric N–H stretching and amide II, respectively. In addition, significant coefficients attributed to the absorption arising from the structure of proteins, N–H combinations (2060 nm and 2076 nm), CH + CC combination bands (2300 nm) and C–H + CC combination bands from CH, CH2 and CH3 groups (2276 nm and 2348 nm), were found. Similar bands were associated with models for predicting protein in oilseeds and press cakes of rosehips, hazelnut and soy.43
Many absorption bands that contributed to the prediction model of protein (1212 nm, 1388 nm, 1708 nm, 1760 nm and 1920 nm) were also present in the model of caffeine. However, in the caffeine model, the contribution of bands at 2276 nm owing to the absorption by N–H and O–H band combinations was noted. Models built for caffeine were formed by similar bands.39,44
The model for prediction of chlorogenic acid showed high coefficients in bands similar to those in models of caffeine and protein. However, the band at 2348 nm attributed to C–H + C–C bands from combinations CH, CH2 and CH3 made a greater contribution to the CGA and protein models than the caffeine model. Perhaps some chemical bonds in caffeine were unavailable because this compound was present as a caffeine chlorogenate complex in green coffee beans.45,46
The PC evaluated in this study were those containing many polymerised aromatic rings and reacted with the Folin–Ciocalteau reagent.12,29,47 The highest coefficients for the prediction model were found at 1388 nm and 1452 nm (first overtone of aromatic structures OH), 2060 nm and 2116 nm (first overtone of C = O and OH band combinations) and those in the regions 2300–2480 nm (CH, CH + C–H, C–H + CC and combinations). Compounds of olive oil determined with the same reagent also showed similar bands (1332–1470 nm and 1834–2175 nm), which were used to form the model for the prediction of these compounds.47
The highest coefficients for the prediction model of lipids were found at 1212 nm (second overtone of OH), 1388 nm (2 × CH stretching) and 1740 nm (first CH stretch). High coefficients were also associated with the first overtone of C = O and OH band combinations of various fatty acids (2052 nm) and absorption bands relative to C–H + CH combinations (2284 nm and 2356 nm). Models for determining cereal and roasted coffee lipids had high coefficients in similar spectral bands.26,48
The prediction models for total sugars showed similar sets of bands with high coefficients to those for sucrose, because sucrose is the main sugar of green coffee beans,6,7 and to other sugars present (glucose, fructose), as these have chemical structures very similar to sucrose.
The prediction models showed fits that allow their application in coffee breeding. The possibility of phenotyping the genotypes for these compounds in coffee will contribute to substantial research advances in functional genomics.

Composition variability of Ethiopian accessions and cultivars
Biochemical phenotyping is one of the tools that can be used by breeders for genetic mapping (quantitative trait loci) and for association studies. Therefore, chemical compounds related to beverage quality were determined by the prediction models developed in this study. A greater variability in the biochemical composition was observed in the Ethiopian accessions than in traditional and modern cultivars [Figure 9(a, b)]. This behaviour in the traditional and modern cultivars can be attributed mainly to the narrow genetic base from which these cultivars originated. In comparison, Ethiopian accessions showed higher contents for all constituents of coffee, with the exception of total sugars and sucrose contents [Figure 9(b)].

Figure 9. (a) Coefficient of variation and (b) mean values of chemical compounds for Ethiopian accessions and cultivars.

Usually, the constituents analysed in the present study play a fundamental role in the description and discrimination of
coffee cultivars. PCA has been frequently used to associate the chemical composition of coffee with the geographical origin,9,49 physicochemical and sensory traits50 and botanical origin.10,51–53 Here, we apply this technique to discriminate Ethiopian accessions and traditional/modern cultivars.

PCA applied to compounds predicted by NIR showed that the first two PCA axes accounted for 75.22% and 15.11% of the total variability. Proteins, caffeine, lipids, total sugars, sucrose and PC contributed the most to the first principal component. The CGA content had a similar correlation with both principal components.

Figure 10. PCA: plot of chemical contents and coffee cultivars/accessions in the space formed by PC1 and PC2.

PC1 showed a clear separation between Ethiopian accessions and modern cultivars. The Ethiopian accessions, to the right of the biplot (Figure 10), are associated with high values of caffeine, lipids, CGA and PC. Total sugar and sucrose are found in greatest concentrations in the traditional and modern cultivars. It is probable that the selection process has given priority to lines with higher levels of sucrose in relation to taste.

The compounds predicted with these models were effective for characterising and separating the biochemical composition of coffee from traditional and modern cultivars, and from Ethiopian accessions. Therefore, it was possible to identify biochemical features in these cultivars and accessions of interest for future crosses.

Conclusions

This study has demonstrated the effectiveness of near infrared spectroscopy in the biochemical characterisation of the most important constituents of green coffee (caffeine, proteins, lipids, total sugars, sucrose, CGA and PC). Prediction models were proposed for these compounds and appropriately adjusted.

This study also showed the capacity to separate coffee cultivars based on the biochemical composition determined by NIR. In addition, the description of biochemical characteristics and a multivariate analysis made it possible to identify which genotypes are appropriate for desired characteristics.

Knowledge of the concentration of the various constituents of coffee from a single spectrum will pave the way for studies that are currently limited by human and financial resources. Our study demonstrated how the NIR technique can be used to improve biochemical phenotyping in breeding programmes.

Acknowledgements

Thanks to Capes (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior-Brazil), to Agropolis Foundation France for their support to the joint Cirad–UEL–IAPAR Project No. 1002-02 and to Foss Company (now Metrohm NIRSystem) for 3 months of the WINISI 4.5 software licence.

References

1. International Coffee Organization. Promotion and Market Development. http://www.ico.org
2. Associação Brasileira de Indústria de Café (ABIC). Estatisticas. http://www.abic.com.br
3. N. Bhumiratana, K. Adhikari and E. Chambers IV, “Evolution of sensory aroma attributes from coffee beans to brewed coffee”, Food Sci. Technol-LEB 44(10), 2185–2192 (2011). doi: http://dx.doi.org/10.1016/j.lwt.2011.07.001
4. J. Baggenstoss, L. Poisson, R. Kaegi, R. Perren and F. Escher, “Coffee roasting and aroma formation: application of different time–temperature conditions”, J. Agr. Food Chem. 56(14), 5836–5846 (2008). doi: http://dx.doi.org/10.1021/jf800327j
5. B. Bertrand, P. Vaast, E. Alpizar, F. Davrieux and P. Charmetant, “Comparison of bean biochemical composition and beverage quality of Arabica hybrids involving Sudanese–Ethiopian origins with traditional varieties at various elevations in Central America”, Tree Physiol. 26(9), 1239–1248 (2006). doi: http://dx.doi.org/10.1093/treephys/26.9.1239
6. D. Selmar, G. Bytof and S. Knopp, “The storage of green coffee (Coffea arabica): decrease of viability and changes of potential aroma precursors”, An. Bot. 101(1), 31–38 (2008). doi: http://dx.doi.org/10.1093/aob/mcm277
7. T. Joët, A. Laffargue, F. Descroix, S. Doulbeau, B. Bertrand, A. Kochko and S. Dussert, “Influence of environmental factors, wet processing and their interactions on the biochemical composition of green coffee beans”, Food Chem. 118(3), 693–701 (2010). doi: http://dx.doi.org/10.1016/j.foodchem.2009.05.048
8. C.I. Rodrigues, R. Maia and C. Máguas, “Comparing total nitrogen and crude protein content of green coffee beans
(Coffea spp.) from different geographical origins”, Coffee Sci. 5(3), 197–205 (2010).
9. D. Borsato, M.V.R. Pina, K.R. Spacino, M.B.S. Scholz and A. Androcioli Filho, “Application of artificial neural networks in the geographical identification of coffee samples”, Eur. Food Res. Technol. 233(3), 533–543 (2011). doi: http://dx.doi.org/10.1007/s00217-011-1548-z
10. A.T.E. Aguiar, L.C. Fazuoli, T.J.G. Salva and J.L. Favarin, “Variação no teor de lipídios em grãos de variedades de Coffea canephora”, Pesqui. Agropecu. Bras. 40(12), 1251–1254 (2005). doi: http://dx.doi.org/10.1590/S0100-204X2005001200015
11. A.L. Vasconcelos, A.S. França, M.B.A. Glória and J.C.F. Mendonça, “A comparative study of chemical attributes and levels of amines in defective green and roasted coffee beans”, Food Chem. 100(1), 26–32 (2007). doi: http://dx.doi.org/10.1016/j.foodchem.2005.12.049
12. A. Farah and C.M. Donangelo, “Phenolic compounds in coffee”, Braz. J. Plant Phys. 18(1), 23–26 (2006). doi: http://dx.doi.org/10.1590/S1677-04202006000100003
13. F. Anthony, M.C. Combes, C. Astorga, B. Bertrand, G. Graziosi and P. Lashermes, “The origin of cultivated Coffea arabica L. varieties revealed by AFLP and SSR markers”, Theor. Appl. Genet. 104(5), 894–900 (2002). doi: http://dx.doi.org/10.1007/s00122-001-0798-8
14. D.L. Steiger, C. Nagai, P.H. Moore, C.W. Morden, R.V. Osgood and R. Ming, “AFLP analysis of genetic diversity within and among Coffea arabica cultivars”, Theor. Appl. Genet. 105(1–2), 209–215 (2002). doi: http://dx.doi.org/10.1007/s00122-002-0939-8
15. M. Plans, J. Simó, F. Casañas and J. Sabaté, “Near-infrared spectroscopy analysis of seed coats of common beans (Phaseolus vulgaris L): a potential tool for breeding and quality evaluation”, J. Agric. Food Chem. 60(3), 706–712 (2012). doi: http://dx.doi.org/10.1021/jf204110k
16. F. Davrieux, P.L.A. Rousset, T.C.M. Pastore, L.A. Macedo and W.F. Quirino, “Discrimination of native wood charcoal by infrared spectroscopy”, Química Nova 33(5), 1093–1097 (2010). doi: http://dx.doi.org/10.1590/S0100-40422010000500016
17. B. Biskupek-Korell and C.R. Moschner, “Near-infrared spectroscopy (NIRS) for quality assurance in breeding, cultivation and marketing of high-oleic sunflowers”, Helia 29(45), 73–80 (2006).
18. R. Font, M. Del Rio, J.M. Fernandez-Martinez and A. Haro-Bailon, “Use of near-infrared spectroscopy for screening the individual and total glucosinolate contents in Indian mustard seed (Brassica juncea L. Czern. & Coss.)”, J. Agric. Food Chem. 52(11), 3563–3569 (2004). doi: http://dx.doi.org/10.1021/jf0307649
19. R. Font, M. Rio-Celestino and A. Haro-Bailon, “The use of near-infrared spectroscopy (NIRS) in the study of seed quality components in plant breeding programs”, Ind. Crops. Prod. 24(3), 307–313 (2006). doi: http://dx.doi.org/10.1016/j.indcrop.2006.06.012
20. B. Bertrand, H. Etienne, P. Lashermes, B. Guyot and F. Davrieux, “Can near-infrared reflectance of green coffee be used to detect introgression in Coffea arabica cultivars?”, J. Sci. Food. Agr. 85(6), 955–962 (2005). doi: http://dx.doi.org/10.1002/jsfa.2049
21. M.A. Morgano, C. Camargo, A.P. Pagel, M.F. Ferrão, N. Bragagnolo and M.M.C. Ferreira, “Determinação simultânea dos teores de cafeína, trigonelina e ácidos clorogênicos em amostras de café cru por análise multivariada em dados de espectroscopia por reflexão difusa no infravermelho próximo”, II Simpósio de Pesquisas dos cafés do Brasil. 24–27 setembro de 2001, Vitória, ES. Resumo expandido do II Simpósio de Pesquisas dos cafés do Brasil (2001).
22. F. Davrieux, J.C. Manez, N. Durand and B. Guyot, “Determination of the content of six major biochemical compounds of green coffee using NIR”, NIR Publications, Proceedings of the 11th International Conference on Near Infrared Spectroscopy, Cordoba, Spain, pp. 441–444 (2004).
23. M.A. Morgano, C.C. Farias, M.F. Ferrão, N. Bragagnolo and M.M.C. Ferreira, “Determinação de proteínas em café cru por espectroscopia NIR e regressão PLS”, Cienc. Tecnol. Aliment. 25(1), 25–31 (2005). doi: http://dx.doi.org/10.1590/S0101-20612005000100005
24. M.B.S. Scholz, N.F. Pagiatto, C.S.G. Kitzberger, L.F.P. Pereira, F. Davrieux, P. Charmetant and T. Leroy, “Validation of near-infrared spectroscopy for the quantification of cafestol and kahweol in green coffee”, Food Res. Int. 61(1), 176–182 (2014). doi: http://dx.doi.org/10.1016/j.foodres.2013.12.008
25. J.R. Santos, M.C. Sarraguça, A.O.S.S. Rangel and J.A. Lopes, “Evaluation of green coffee beans quality using near infrared spectroscopy: A quantitative approach”, Food Chem. 135(3), 1828–1835 (2012). doi: http://dx.doi.org/10.1016/j.foodchem.2012.06.059
26. I. Esteban-Díez, J.M. González-Sáiz and C. Pizarro, “Prediction of sensory properties of espresso from roasted coffee samples by near-infrared spectroscopy”, Anal. Chim. Acta 525(2), 171–182 (2004). doi: http://dx.doi.org/10.1016/j.aca.2004.08.057
27. D.A.T. Southgate, Determination of Food Carbohydrates. Applied Science Publishers, London (1976).
28. P. Cuniff (Ed.), Official Methods of Analysis of AOAC International, 16th ed. AOAC International, Gaithersburg, MD (1995).
29. H. Zielinski and H. Kozlowska, “Antioxidant activity and total phenolics in selected cereal grains and their different morphological fractions”, J. Agric. Food Chem. 48(6), 2008–2016 (2000). doi: http://dx.doi.org/10.1021/jf990619o
30. M.N. Clifford and J.C. Wight, “The measurement of feruloylquinic acids and caffeoylquinic acid in coffee beans. Development of the technique and its preliminary application to green coffee beans”, J. Sci. Food Agric. 27(1), 73–84 (1976). doi: http://dx.doi.org/10.1002/jsfa.2740270112
31. F. Davrieux, F. Allal, G. Piombo, B. Kelly, J.B. Okulo, M. Thiam and O.B. Diallo, “Near infrared spectroscopy for
high-throughput characterization of shea tree (Vitellaria paradoxa) nut fat profiles”, J. Agr. Food Chem. 58(13), 7811–7819 (2010). doi: http://dx.doi.org/10.1021/jf100409v
32. D. Cozzolino and A. Morion, “Exploring the use of near infrared reflectance spectroscopy (NIRS) to predict trace minerals in legumes”, Anim. Feed Sci. Tech. 111(1), 161–173 (2004). doi: http://dx.doi.org/10.1016/j.anifeedsci.2003.08.001
33. D. Sabatier, P. Dardenne and L. Thuriès, “Near infrared reflectance calibration optimization to predict lignocellulosic compounds in sugarcane samples with coarse particle size”, J. Near Infrared Spectrosc. 19(3), 199–209 (2011). doi: http://dx.doi.org/10.1255/jnirs.929
34. J.S. Shenk and M.O. Westerhaus, “Population definition, sample selection, and calibration procedures for near infrared reflectance spectroscopy”, Crop Sci. 31(2), 469–474 (1991). doi: http://dx.doi.org/10.2135/cropsci1991.0011183X003100020049x
35. P.C. Williams and D.C. Sobering, “Comparison of commercial near infrared transmittance and reflectance instruments for analysis of whole grains and seeds”, J. Near Infrared Spectrosc. 1(1), 25–32 (1993). doi: http://dx.doi.org/10.1255/jnirs.3
36. Addinsoft. XLStat: Software for Statistical Analysis. Version 2008.4.02, 2008. Paris. CD-ROM (2008).
37. J. Workman and L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy. CRC Press, Boca Raton, FL (2007).
38. J.S. Ribeiro, M.M.C. Ferreira and T.J.G. Salva, “Chemometric models for the quantitative descriptive sensory analysis of arabica coffee beverages using near infrared spectroscopy”, Talanta 83(5), 1352–1358 (2011). doi: http://dx.doi.org/10.1016/j.talanta.2010.11.001
39. M.I. González-Martín, C. Fernandez Bermejo, J.M. Hernández Hierro and C.I. Sánchez González, “Determination of hydroxyproline in cured pork sausages and dry cured beef products by NIRS technology employing a fibre-optic probe”, Food Control 20(8), 752–755 (2009). doi: http://dx.doi.org/10.1016/j.foodcont.2008.09.015
40. T. Leroy, F. De Bellis, H. Legnate, E. Kananura, G. Gonzales, L.F. Pereira, A.C. Andrade, P. Charmetant, C. Montagnon, P. Cubry, P. Marraccini, D. Pot and A. De Kochko, “Improving the quality of African robustas: QTLs for yield- and quality-related traits in Coffea canephora”, Tree Genet. Genomes 7(4), 781–798 (2011). doi: http://dx.doi.org/10.1007/s11295-011-0374-6
41. M.B. Silvarolla, P. Mazzafera and M.M.A. Lima, “Caffeine content of Ethiopian coffee arabica beans”, Genet. Mol. Biol. 23(1), 213–215 (2000). doi: http://dx.doi.org/10.1590/S1415-47572000000100036
42. A.L. Teixeira, P.E.R. Prado, K.O.G. Dias, M.R. Malta and F.M.A. Gonçalves, “Avaliação do teor de cafeína em folhas e grãos de acessos de café arábica”, Rev. Cienc. Agron. 43(1), 129–137 (2012).
43. D. Franco, M.J. Nunez, M. Pinelo and J. Sineiro, “Applicability of NIR spectroscopy to determine oil and other physicochemical parameters in Rosa mosqueta and Chilean hazelnut”, Eur. Food Res. Technol. 222(3/4), 443–450 (2006). doi: http://dx.doi.org/10.1007/s00217-005-0084-0
44. C.W. Huck, W. Guggenbichler and G.K. Bonn, “Analysis of caffeine, theobromine and theophylline in coffee by near infrared spectroscopy (NIRS) compared to high-performance liquid chromatography (HPLC) coupled to mass spectrometry”, Anal. Chim. Acta 538(1/2), 195–203 (2005). doi: http://dx.doi.org/10.1016/j.aca.2005.01.064
45. C. Campa, S. Doulbeau, S. Dussert, S. Hamon and M. Noirot, “Qualitative relationship between caffeine and chlorogenic acid contents among wild Coffea species”, Food Chem. 93(1), 135–139 (2005). doi: http://dx.doi.org/10.1016/j.foodchem.2004.10.015
46. N. D’Amelio, L. Fontanive, F. Uggeri, F. Suggi-Liverani and L. Navarini, “NMR reinvestigation of the caffeine–chlorogenate complex in aqueous solution and in coffee brews”, Food Biophys. 4(4), 321–330 (2009). doi: http://dx.doi.org/10.1007/s11483-009-9130-y
47. A.M. Inajeros-Garcia, S. Gómez-Alonso, G. Frepane and M.D. Salvador, “Evaluation of minor compounds, sensory characteristics and quality of virgin olive oil by near infrared (NIR) spectroscopy”, Food Res. Int. 50(1), 250–258 (2013). doi: http://dx.doi.org/10.1016/j.foodres.2012.10.029
48. L.L. Vines, S.E. Kays and P.E. Koehler, “Near-infrared reflectance model for the rapid prediction of total fat in cereal foods”, J. Agric. Food Chem. 53(5), 1550–1555 (2005). doi: http://dx.doi.org/10.1021/jf040391r
49. B. Bertrand, D. Villarreal, A. Laffargue, H. Posada, P. Lashermes and S. Dussert, “Comparison of the effectiveness of fatty acids, chlorogenic acids, and elements for the chemometric discrimination of coffee (Coffea arabica L.) varieties and growing origins”, J. Sci. Food. Agr. 56(6), 2273–2280 (2008). doi: http://dx.doi.org/10.1021/jf073314f
50. M.B.S. Scholz, V.R.G. Figueiredo, J.V.N. Silva and C.S.G. Kitzberger, “Atributos sensoriais e características físico-químicas de bebida de cultivares de café do Iapar”, Coffee Sci. 8(1), 6–16 (2013).
51. R.M.N. Souza and M.T. Benassi, “Discrimination of commercial roasted and ground coffees according to chemical composition”, J. Braz. Chem. Soc. 23(7), 1347–1354 (2012). doi: http://dx.doi.org/10.1590/S0103-50532012000700020
52. C.S.G. Kitzberger, M.B.S. Scholz, L.F.P. Pereira and M.T. Benassi, “Composição química de cafés arábica de cultivares tradicionais e modernas”, Pesqui. Agropecu. Bras. 48(11), 1498–1506 (2013). doi: http://dx.doi.org/10.1590/S0100-204X2013001100011
53. M.B.S. Scholz, V.R.G. Figueiredo, J.V.N. Silva and C.S.G. Kitzberger, “Características físico-químicas de grãos verdes e torrados de cultivares de café do Iapar”, Coffee Sci. 6(3), 245–255 (2011).
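
As a supplementary illustration of the principal component analysis reported in the discussion, where the first two axes accounted for 75.22% and 15.11% of the total variability, the following Python sketch shows how explained-variance shares and sample scores can be obtained from a table of predicted constituents. It is a minimal sketch under stated assumptions and not the procedure used in this study: the data are synthetic, and the function name `pca_explained_variance`, the autoscaling step and the ordering of the seven variables are hypothetical choices made only for illustration.

```python
import numpy as np


def pca_explained_variance(X):
    """PCA of a samples x variables matrix via SVD of the autoscaled data.

    Returns (scores, loadings, ratio), where ratio[k] is the share of the
    total variance carried by principal component k.
    """
    X = np.asarray(X, dtype=float)
    # Autoscale (assumption): centre each column and divide by its sample
    # standard deviation so constituents in different units weigh equally.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; singular values come sorted.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s               # sample coordinates on each component
    loadings = Vt.T              # variable weights for each component
    ratio = s**2 / np.sum(s**2)  # explained-variance share per component
    return scores, loadings, ratio


# Synthetic, purely illustrative data: 40 samples x 7 constituents
# (proteins, caffeine, lipids, CGA, PC, total sugars, sucrose).
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
weights = np.array([[1.0, 1.0, 1.0, 0.5, 1.0, -1.0, -1.0]])
X = latent @ weights + 0.5 * rng.normal(size=(40, 7))

scores, loadings, ratio = pca_explained_variance(X)
print("Explained variance of PC1 and PC2 (%):", np.round(100 * ratio[:2], 2))
```

Plotting the first two columns of `scores` against each other, together with the first two columns of `loadings`, would give a biplot of the kind shown in Figure 10; with real data the percentages printed here would correspond to the shares quoted for the first two axes.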
