Computers and Electronics in Agriculture: Original Papers

Computers and Electronics in Agriculture 155 (2018) 324–338
Artificial intelligence approach for the prediction of Robusta coffee yield using soil fertility properties
Louis Kouadio a,⁎, Ravinesh C. Deo b,⁎, Vivekananda Byrareddy a, Jan F. Adamowski c, Shahbaz Mushtaq a, Van Phuong Nguyen d

a Centre for Applied Climate Sciences, Institute for Life Sciences and the Environment, University of Southern Queensland, Toowoomba, QLD 4350, Australia
b School of Agricultural, Computational and Environmental Science, Centre for Sustainable Agricultural Systems & Centre for Applied Climate Sciences, Institute for Life Sciences and the Environment, University of Southern Queensland, Springfield, QLD 4300, Australia
c Department of Bioresource Engineering, Faculty of Agricultural and Environmental Science, McGill University, Québec H9X 3V9, Canada
d Western Highlands Agriculture and Forestry Science Institute, Buon Ma Thuot, Dak Lak, Viet Nam
ARTICLE INFO
Keywords: Smallholder farms; Robusta coffee; Soil fertility; Extreme learning machine; Machine learning in agriculture

ABSTRACT
As a commodity for daily consumption, coffee plays a crucial role in the economy of several African, American and Asian countries; yet, the accurate prediction of coffee yield based on environmental, climatic and soil fertility conditions remains a challenge for agricultural system modellers. This study assessed the ability of an Extreme Learning Machine (ELM) model to analyse soil fertility properties and to generate accurate estimates of Robusta coffee yield. The performance of 18 different ELM-based models was evaluated. These models used single and multiple combinations of predictor variables drawn from soil organic matter (SOM), available potassium, boron, sulphur, zinc, phosphorus, nitrogen, exchangeable calcium, magnesium, and pH. The ELM model’s performance was compared to that of existing predictive tools: Multiple Linear Regression (MLR) and Random Forest (RF). Individual model performance and inter-model comparisons were based on the root mean square error (RMSE), mean absolute error (MAE), Willmott’s Index (WI), Nash-Sutcliffe efficiency coefficient (E_NS), and Legates and McCabe’s Index (E_LM) in the independent testing dataset. In the independent testing phase, an ELM model constructed with SOM, available potassium and available sulphur as predictor variables generated the most accurate coffee yield estimate (i.e., RMSE = 496.35 kg ha−1 or ± 13.6%, and MAE = 326.40 kg ha−1 or ± 7.9%). This contrasted with the less accurate MLR (RMSE = 1072.09 kg ha−1 and MAE = 797.60 kg ha−1) and RF (RMSE = 1087.35 kg ha−1 and MAE = 769.57 kg ha−1) models. The normalized metrics also favoured the ELM model: WI = 0.9952, E_NS = 0.406 and E_LM = 0.431. This study proposes the ELM model as an improved class of artificial intelligence model, relative to MLR and RF, for coffee yield prediction in smallholder farms. This constitutes an original contribution to the agronomic sector, particularly with respect to selecting the most relevant soil properties for predicting coffee yield. The results also confirm the potential utility of coupling artificial intelligence algorithms with biophysical crop models (i.e., as a data-intelligent automation tool) in decision-support systems for precision agriculture. Such systems could help improve yield in smallholder farms when they are based on carefully screened soil fertility data.
1. Introduction

Coffee is the second most important traded commodity in the world, playing an important role in the economy of several African, American and Asian countries. The two economically important coffee species worldwide are Arabica coffee (Coffea arabica L.) and Robusta coffee (C. canephora Pierre ex A. Froehner), with the latter representing about 40% of total bean production (ICO, 2016). The Vietnamese coffee industry is the second largest producer of coffee beans worldwide, and the first in Robusta coffee production (Marsh, 2007; FAO, 2016). A combination of factors has led to a significant increase in coffee production in Vietnam during the last three decades (20% year−1 between 1993 and 2000) (D’haeze et al., 2005; Marsh, 2007). These factors include government policies, state-sponsored migration, suitable land for production, infrastructure investment in irrigation, and the 1990s coffee price boom. The Vietnamese coffee industry is dominated by
⁎ Corresponding authors.
E-mail addresses: Louis.Kouadio@usq.edu.au (L. Kouadio), ravinesh.deo@usq.edu.au (R.C. Deo).
https://doi.org/10.1016/j.compag.2018.10.014
Received 3 April 2018; Received in revised form 31 August 2018; Accepted 9 October 2018
Available online 30 October 2018
0168-1699/ © 2018 Elsevier B.V. All rights reserved.
Nomenclature

a: hidden neuron parameter
b: hidden neuron parameter
h(x): hidden neuron outputs representing the randomized hidden features of predictor X_i
h_i(x): output of the ith hidden neuron
m: number of variables drawn randomly out of the M predictors (RF)
r: Pearson correlation coefficient
AI: artificial intelligence
ANN: artificial neural network
B: boron
C: ordinate axis intercept in MLR model
Ca: calcium
CEC: cation exchange capacity
Cl−: chloride
Cu: copper
D: dimension of predictor variables (ELM)/learning sample (RF)
ELM: extreme learning machine
E_LM: Legates and McCabe’s index
E_NS: Nash-Sutcliffe efficiency coefficient
FAOV: Food and Agriculture Organization of the United Nations
Fe: iron
GSOV: General Statistics Office of Vietnam
H: hidden layer output matrix
H+: Moore-Penrose generalized inverse of H
ICO: International Coffee Organization
K: potassium
L: number of hidden neurons
M: number of predictors (RF)
MAE: mean absolute error
MARS: multivariate adaptive regression spline
Mg: magnesium
Mn: manganese
Mo: molybdenum
MLR: multiple linear regression
N: number of training data pairs
N: nitrogen
Ni: nickel
P: phosphorus
PC: personal computer
RF: random forest
RMSE: root mean square error
S: sulphur
SLFN: single layer feed forward network
SOM: soil organic matter
SVM: support vector machine
T: target matrix
WI: Willmott’s index
X_i: predictor variables
X: (N × D) matrix of predictor variables in MLR model
Y_i: objective or output variable
Y: (N × 1) matrix of the (single) objective variable in MLR model
Y_i^m: ith observed value
Ȳ^m: mean observed value
Y_i^p: ith predicted value
Ȳ^p: mean predicted value
Zn: zinc
β: output layer/output weight matrix
β̂: optimal solution of linear equations (ELM)
β_1, …, β_k: MLR coefficients deduced from the training dataset
f_L(x): SLFN with L hidden neurons
G(a_i, b_i, X): hidden layer activation function satisfying the ELM approximation theorem
smallholders, with 85% of all farms being under 1 ha and only 1% exceeding 5 ha (World Bank, 2004; Marsh, 2007). Coffee production costs vary across provinces according to weather conditions, soil fertility, and farm management practices (e.g., fertilisation, irrigation, pruning, etc.). On smallholder farms, soil fertility data may be scarce throughout the coffee growing season for a number of reasons (e.g., lack of financial resources, time constraints, etc.). Yet such data can provide important information regarding the potential final yield and guide farmers towards better crop management practices.

Soil fertility is highly dynamic and is influenced by a wide range of factors. These include the proportion of nutrients, pH, organic matter content, cation exchange capacity (CEC), and soil texture and structure. Soils with a low pH have fewer binding sites for nutrients and are therefore generally less fertile. Soil organic matter (SOM) increases the soil’s water holding capacity, aeration, infiltration and CEC (the greater a soil’s CEC, the more fertile the soil may be). SOM also reduces soil erosion and supplies nutrients to plants. A high CEC also buffers the soil against rapid changes in pH. Most research sources (Mitchell, 1988; Kuit et al., 2004; Winston et al., 2005; Nair, 2010; Lambot et al., 2017) recognise 17 key chemical elements or nutrients as being essential to all crop plants. These are divided into two groups: non-mineral and mineral. The non-mineral nutrients are carbon, hydrogen and oxygen, which under normal conditions are freely available to the plant from air and water. The mineral nutrients are divided into two groups. The macronutrients are nitrogen (N), phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg) and sulphur (S). The micronutrients are boron (B), chloride (Cl−), copper (Cu), iron (Fe), manganese (Mn), molybdenum (Mo), nickel (Ni), and zinc (Zn). The macronutrients are required in greater amounts than micronutrients for normal coffee plant growth. All mineral elements are involved, to a greater or lesser degree, in crop growth and development processes. These processes include seed germination, root and leaf development, photosynthesis, fruit set and development, fruit quality, and disease resistance. Accordingly, deficiencies in any one of these mineral nutrients can have a negative impact on plant growth and lead to yield losses (Mitchell, 1988; Kuit et al., 2004; Winston et al., 2005).

Coffee crop dynamics have frequently been modelled using approaches ranging from statistical to process-based (Gutierrez et al., 1998; van Oijen et al., 2010; Rodríguez et al., 2011; Coltri et al., 2015). Process-based models draw on an understanding of weather- and nutrient-driven dynamics and constraints at the plant trophic level. They also build on individual plant growth processes (e.g., photosynthesis, respiration, assimilate partitioning, and phenology), and can integrate these factors to predict coffee growth. However, data collection for model parameterisation remains a major hurdle to applying such models successfully at greater spatial scales (district to region). Statistical modelling (integrated with data-intelligent algorithms) offers a path to represent and understand real-world agricultural production systems (Kouadio and Newlands, 2014). It also provides some insights into the calibration of complex process-based crop models.

In climate, water resources and energy studies, artificial intelligence-based simulation models are increasingly being applied to extract predictive features embedded in historical input data. Examples include artificial neural networks (ANN), support vector machines (SVM) and multivariate adaptive regression splines (MARS) (Jiang et al., 2004; Meena and Singh, 2013; Newlands et al., 2014; Deo and Şahin, 2015, 2016; Fieuzal et al., 2017; Barzegar et al., 2018; Fijani et al., 2019). Artificial intelligence models offer an alternative framework to the process-based
predictive models. They are easier to implement and less mathematically complex, and they do not require ‘initial condition’ parameter values for the equations that simulate physical processes. They also offer greater flexibility for incorporation into automated decision-support systems (Santamouris et al., 1999; Elminir et al., 2008; Evrendilek et al., 2012; Goyal et al., 2014; Kavousi-Fard et al., 2014; Deo et al., 2017). In the agronomic sciences, artificial intelligence approaches have recently been used for grain crop yield forecasting (Drummond et al., 2003; Jiang et al., 2004; Meena and Singh, 2013; Pantazi et al., 2016; Fieuzal et al., 2017). Other applications include the selection of soil properties for salinity investigation (Ghorbani et al., 2018), cotton yield prediction (Ali et al., 2018) and soil moisture forecasting (Prasad et al., 2018a and b). However, their application to the prediction of coffee bean yield, as attempted in the present study, has yet to be reported.

Accordingly, the purpose of this study was to investigate the design and application of artificial intelligence models to model the relationships between soil fertility properties and Robusta coffee yield. The models were the extreme learning machine (ELM), multiple linear regression (MLR) and random forest (RF). The data were collected for smallholder farms in the Lam Dong province of Vietnam. To the best of the authors’ knowledge, the capability of an ELM-based model to estimate Robusta coffee yield from soil fertility data has not previously been reported in the literature. Furthermore, artificial intelligence, used as an automation based on intelligent models, can help design and improve decision-support systems for precision agriculture (e.g., Nguyen-Huy et al., 2018a and b). It can also support the overall farming system decision-making process in the present big data era, in which field data guide the operation and management of the agricultural industry.

2. Theory of machine learning models

2.1. Extreme learning machines

The ELM is a relatively recent method based on a single layer feed forward network (SLFN) with input, feature and output spaces (Huang et al., 2006a). The generalized ELM model assigns the hidden layer weights and biases randomly and then uses least squares estimation to obtain a closed-form solution for the output weights. This allows ELMs to solve regression problems in a relatively short model run time and with greater accuracy than other AI learning methods (e.g., ANN, SVM) (Huang et al., 2015). Because the solution is obtained with the Moore-Penrose generalized inverse, the ELM avoids iterative training, which tends to converge to a local rather than a global minimum (Huang et al., 2006b).

ELM-based modelling proceeds in three stages:
(i) the hidden layer weights and biases are randomly generated;
(ii) the inputs are passed through the hidden layer to generate a hidden layer output matrix;
(iii) the output weights are obtained by multiplying the Moore-Penrose generalized inverse of that matrix by the target matrix, which amounts to solving a system of linear equations.
The optimal number of hidden neurons is typically identified by trial and error using cross-validation data. Advantages of ELM over other modelling systems include fast convergence, superior generalization, the absence of local minima issues, less over-fitting and no iterative tuning. In accordance with universal approximation theory, the randomly initiated hidden neurons remain fixed. Consequently, the model efficiently attains a global optimum, following the universal approximation capability of a single layer feed-forward network (Huang et al., 2006a and b; Huang et al., 2015).

Drawing on the work of Huang et al. (2006a, b and 2015), the ELM model in the present study was trained on N data pairs. Each pair consists of a D-dimensional X_i and a 1-dimensional Y_i. The predictors X_i are soil organic matter, available potassium, boron, sulphur, zinc, phosphorus, total nitrogen, exchangeable calcium, exchangeable magnesium and pH. The objective variable Y_i is Robusta coffee yield. For i = 1, 2, …, N, the SLFN with L hidden neurons, f_L(x), is expressed as (Huang et al., 2006a and b; Huang et al., 2015):

$$f_L(x) = \sum_{i=1}^{L} h_i(x)\,\beta_i = h(x)\,\beta \tag{1}$$

Here β = [β_1, β_2, …, β_L]^T is the output weight matrix between the hidden and the output neurons. The vector h(x) = [h_1(x), h_2(x), …, h_L(x)] contains the hidden neuron outputs representing the randomized hidden features of predictor X_i. Each h_i(x) is the output of the ith hidden neuron, given as (Huang et al., 2006a and b; Huang et al., 2015):

$$h_i(x) = G(a_i, b_i, X), \quad a_i \in \mathbb{R}^{D},\; b_i \in \mathbb{R} \tag{2}$$

The non-linear, piecewise-continuous hidden layer activation function h_i(x) is therefore defined by the hidden neuron parameters (a, b) and must satisfy the approximation theorem through G(a_i, b_i, X). In the present study, an optimal ELM model was obtained by comparing several hidden layer activation functions serving in feature extraction (Deo et al., 2017a,b):

Tangent sigmoid:
$$G(a, b, X) = \frac{2}{1 + e^{-2(aX + b)}} - 1 \tag{3.1}$$

Logarithmic sigmoid:
$$G(a, b, X) = \frac{1}{1 + e^{-(aX + b)}} \tag{3.2}$$

Hard limit:
$$G(a, b, X) = \begin{cases} 1 & \text{if } aX + b > 0 \\ 0 & \text{otherwise} \end{cases} \tag{3.3}$$

Triangular basis:
$$G(a, b, X) = \begin{cases} 1 - |aX + b| & \text{if } -1 \le aX + b \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{3.4}$$

Radial basis:
$$G(a, b, X) = e^{-(aX + b)^2} \tag{3.5}$$

The trained model’s approximation error is then minimised by solving for the weights connecting the hidden and output layers (β) with a least squares method (Huang et al., 2006a and b; Huang et al., 2015):

$$\min_{\beta \in \mathbb{R}^{L \times m}} \lVert H\beta - T \rVert^{2} \tag{4}$$

where ‖·‖ denotes the Frobenius norm, and H is the hidden layer output matrix, given as (Huang et al., 2006a and b; Huang et al., 2015):

$$H = \begin{bmatrix} g(x_1) \\ \vdots \\ g(x_N) \end{bmatrix} = \begin{bmatrix} g_1(a_1 x_1 + b_1) & \cdots & g_L(a_L x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g_1(a_1 x_N + b_1) & \cdots & g_L(a_L x_N + b_L) \end{bmatrix} \tag{5}$$

T is the target matrix, drawn from the training dataset, and given as (Huang et al., 2006a and b; Huang et al., 2015):

$$T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix} = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix} \tag{6}$$

An optimal solution is then determined by solving a system of linear equations (Huang et al., 2006a and b; Huang et al., 2015):

$$\hat{\beta} = H^{+} T \tag{7}$$

where H^+ is the Moore-Penrose generalized inverse of H.

2.2. Multiple linear regression

The comparison model, MLR, is a method used to analyse cause-and-effect relationships between the objective variable and the predictor variables. The objective is y ≡ yield. The predictors are x ≡ soil organic matter, available potassium, boron, sulphur, zinc, phosphorus, total nitrogen, exchangeable calcium, exchangeable magnesium and soil pH. Because it can incorporate several input matrices with respect to the target matrix, this method seeks to explain variations in the objective variable from the predictor data features using a set of fitted
multiple regression coefficients (Deo and Sahin, 2017). For a set of N observations over D-dimensional predictors, the MLR model takes the form of a regression equation (Draper and Smith, 1981; Montgomery et al., 2012):

$$Y = C + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k \tag{8}$$

where C is the ordinate axis intercept, X (N × D) is a matrix of predictor variables, Y (N × 1) is a matrix of the objective variable (i.e., yield, Y), and β_1, …, β_k represent the multiple linear regression coefficients deduced from the training dataset (Civelekoglu et al., 2007; Şahin et al., 2013).

To attain an optimal MLR model, the coefficients β are estimated through a least squares method (Apaydın et al., 1994; Ozdamar, 2004). Once the best equation has been determined from the training dataset, its coefficients and ordinate intercept (Eq. (8)) are applied to the test matrix to generate predictions in the testing phase. This is further detailed elsewhere (Draper and Smith, 1981; Montgomery et al., 2012).

Fig. 1. Location of the study area in Lam Dong Province, Vietnam. (Source: GADM database of Global administrative areas; http://www.gadm.org/).

2.3. Random forest

Random Forest (RF) is a popular and efficient family of ensemble-based data mining tools employed to generate accurate predictions without overfitting. RF is based on a learning algorithm relying on the concept of model aggregation (Breiman, 2001; Prasad et al., 2018). It combines binary decision trees built with bootstrapped samples from the learning sample D, with a subset of explanatory variables X chosen randomly at each node. In the RF algorithm, many random trees (typically up to 2000) are grown from the input data. Each tree is generated from a bootstrap sample, which leaves about a third of the overall samples for validation, or ‘out-of-bag’ predictions.

Tree branching draws on a randomized subset of predictors at each node, and the outcome is determined as the average of all trees (Breiman, 2001; Cutler et al., 2007). Importantly, RF uses the out-of-bag samples to estimate the model error on independent observations that were not used to grow each tree. Therefore, no cross-validation data are required (Prasad et al., 2006).

In a step-by-step process, the RF algorithm is executed as follows:

If N represents the data cases in the training set, then a sample of these cases is drawn randomly with replacement. These cases are used for training (growing the original trees).

If there are M predictors (or inputs), RF specifies a number m < M such that, at each node, m variables are drawn randomly out of the M predictors. The best split on these m variables is
adopted to split the node, whilst the value of m is held constant as the forest continues to grow.

During the execution of the RF algorithm, each resulting tree is grown to its maximum extent without any pruning.

New data are predicted by aggregating the predictions of the n trees (i.e., by taking the mean value for a regression problem).

When modelling a given type of data with the RF algorithm, note that the out-of-bag estimate of the error rate can be quite accurate as long as a sufficient number of trees have been grown (Bylander, 2002).

3. Materials and methods

3.1. Study area and data

3.1.1. Study area

Total Vietnamese coffee production between 2009 and 2013 averaged 1.2 Tg y−1 (FAO, 2016), of which over 95% was of the Robusta variety (D’haeze et al., 2005; GSOV, 2014). The main Robusta coffee producing areas are located in the Central Highlands of Southern Vietnam, encompassing the provinces of Dak Lak, Lam Dong, Dak Nong, Gia Lai, and Kon Tum. These provinces account for > 90% of the national Robusta coffee production, with Dak Lak and Lam Dong comprising 60–70% of the production (GSOV, 2014).

Fig. 2. Monthly averages of solar radiation, minimum and maximum temperatures and rainfall total during the 1981–2010 period in Lam Dong province, Vietnam (a), and comparison between the provincial 30-year total rainfall (minimum, maximum and average) and the 2014 total rainfall in the five study districts (b). 1981–2010 rainfall data were used to calculate the 30-year values. Tmax and Tmin refer to maximum and minimum temperatures, respectively.

During the 2013–2014 growing season, soil fertility parameters and coffee bean yield data were collected from 44 randomly selected smallholder farmers. The farmers were spread across five districts of Lam Dong province: Bao Lam, Bao Loc, Di Linh, Duc Trong and Lam Ha (Fig. 1). The number of farmers surveyed per district ranged from 2 to 14 (Table S1). The area of the surveyed coffee farms ranged from 0.6 to 7.4 ha. Of these farms, 45% had an area ≤ 2 ha, 41% had an area between 2 and 5 ha, and only 14% were ≥ 5 ha. More than half of the surveyed farms (57%) were located between 950 and 1015 m A.M.S.L., and the remainder between 850 and 950 m A.M.S.L. Only one farm had 7-year-old coffee trees. Coffee trees on most farms (34) were between 21 and 27 years old, and those on the remaining 9 farms were 11 to 20 years old. Two plantation densities were noted at the surveyed farms: 1100 and 1000 plants ha−1, in 52% and 48% of farms, respectively (Table S1). Crop management practices such as
pruning, fungicide spraying, harvest, etc., were similar among the producing provinces. The main difference among them is the irrigation technique (manual or sprinkler irrigation). The sampling was therefore designed to represent both techniques.

Long periods of observed climate data (≥ 30 years) were available only for Bao Loc district, owing to the presence of a synoptic climate station there. Data from this station were therefore used to characterize climate conditions in Lam Dong province. Mean monthly rainfall totals at Lam Dong, calculated over the 1981–2010 period, varied between 44 mm and 500 mm. Maximum temperatures were normally well above 25 °C, and the average monthly solar radiation ranged from 8 to 30 MJ m−2 (Fig. 2a). A comparison of the 2014 rainfall totals in each district with the 30-year average total rainfall at the provincial level (Fig. 2b) showed 2014 to be a below-normal rainfall year for all five districts.

3.1.2. Soil sampling and soil analyses

Soil sampling was carried out at the start of the cropping season (October 2013) by farmers. Scientists from Vietnam’s Western Highlands Agriculture and Forestry Science Institute had instructed them beforehand. Such soil sampling had been carried out over several previous cropping seasons, so the farmers were well trained to perform it autonomously. Five soil types were found at the surveyed farms: sandy, sandy-clay and gravel-sandy soils in the Bao Lam and Bao Loc districts; red loam soils in Di Linh district; and black clay in Duc Trong and Lam Ha districts (Table S1). For each farm, 5 to 10 points were selected homogeneously along 2 diagonal lines across the field (coffee fields are typically square or rectangular). At each point, the soil was sampled to a depth of 0.30 m, since this layer bears over 80% of coffee roots (Kuit et al., 2004). All point samples were thoroughly mixed together, and a 1 kg subsample was placed in a plastic bag or box, labelled, and sent for laboratory analysis. The soil properties determined were SOM, total N, available P, K, B, Zn, S, exchangeable Ca, Mg, and pH. Table 1 shows the soil fertility parameters used as predictor variables for data obtained in Lam Dong Province, Vietnam. SOM was determined according to the Walkley–Black method (Walkley and Black, 1934; Allison et al., 1965). Total nitrogen was determined by the Kjeldahl method using an automatic UDK 129 Kjeldahl Distillation Unit (VELP Scientifica, Italy). Available phosphorus was extracted according to the Bray No. 2 extraction method (Bray and Kurtz, 1945). Available potassium and zinc were extracted using 0.1 N solutions of H2SO4 and HCl, respectively, and then measured by atomic absorption spectrophotometry. Available B and S were extracted using Azomethine-H (Carrero et al., 2005) and CaCl2 solutions, respectively, and measured spectrophotometrically. Exchangeable Ca and Mg were extracted using ammonium acetate and determined by an ion selective electrode method (Cheng et al., 1973). The pH was measured potentiometrically in the supernatant of a 1:2.5 soil:KCl (1 M) suspension.

3.2. Predictive model development

All the models (ELM, RF and MLR) were developed on a Windows 10 platform using MATLAB sub-routines running on a PC equipped with an Intel(R) Core i7-4770 CPU (3.4 GHz). To predict Robusta coffee yield (Y) in smallholder farms, the ELM model drew on the patterns in the k (≡ 10) soil fertility variables of the data matrix, X_k = [B, Ca, K, Mg, N, OM, P, pH, S, Zn]. It also drew on the relationship between these variables and the objective variable, Y (Table 1). The performance of the ELM model was compared against the RF and MLR models. To explore the relationship between each soil fertility property and coffee yield, a cross-correlation analysis between X_k and Y was performed on the training data. The Pearson correlation coefficient (r) was computed to measure the linear association between X_k and Y.

The values of r for each soil parameter × yield combination are listed in Table 1. OM, S and Zn showed relatively high correlations with measured yield (0.241 ≤ r ≤ 0.266), followed by Mg (r = 0.163), Ca and P (0.142 ≤ r ≤ 0.144), and K (r = 0.132). This suggests that, for this study region, a predictive model for coffee yield is likely to draw the most useful yield-influencing information from OM, S and Zn. Soil pH, B and N showed the weakest correlations with coffee yield (0.006 ≤ r ≤ 0.091).

Measured data were partitioned into training (56.8%, n = 25), validation (20.5%, n = 9) and testing (22.7%, n = 10) sets. These sets were used independently for model development, evaluation/selection and testing, respectively. Descriptive statistics of the training, validation and testing phase soil fertility model input parameters are provided in Table 2.

The ELM model was built with a three-layer neuronal arrangement of input-hidden-output nodes (Fig. 3a). The input layer contained each of the single and combined soil fertility parameters (Table 3), and the output layer contained the predicted yield (Y^p). To achieve the best feature extraction from the predictor dataset, a number of neuronal activation functions were tested (e.g., sine, hard limit, radial basis, triangular basis, logarithmic sigmoid and tangent sigmoid; Eqs. (3.1)–(3.5)). This followed our earlier approach (Deo and Şahin, 2016; Deo et al., 2017; Barzegar et al., 2018; Fijani et al., 2019). The number of hidden neurons was varied from 1 to n + 1 in increments of 1, where n was the number of training/validation/testing data points in each predictor matrix. The MSE was used as the objective criterion and was monitored in each trial. Each neuronal architecture was evaluated on the validation dataset (set as 20% in this study). Table 3 shows the ELM model design parameters used in this study.

Since ELM requires random initialization of its hidden layer parameters, the model was executed 1000 times to identify the best hidden neuronal arrangement. From these 1000 ELM models, the architecture generating the smallest MSE on the independent validation set was selected (Table 4). This model was then run on the testing set to assess its performance. Since each of the prescribed soil fertility parameters was expected to contribute to the predicted overall coffee
Table 1
Soil fertility parameters used as predictor variables for data obtained in Lam Dong Province, Vietnam, and Pearson correlation of Robusta coffee yield with the respective predictor variables.
Predictor variable | Abbreviation (model subscript) | Units | Range [mean] | Pearson coefficient (r)
Available Boron | B | mg kg−1 | 0.38–1.88 [0.75] | 0.046
Exchangeable Calcium | Ca | meq kg−1 | 0.37–3.59 [0.74] | 0.144
Available Potassium | K | mg K2O kg−1 | 6.45–109.15 [16.89] | 0.132
Exchangeable Magnesium | Mg | meq kg−1 | 0.08–1.12 [0.21] | 0.163
Total Nitrogen | N | mg kg−1 | 2.24–20.71 [6.64] | 0.006
Organic Matter | OM | % | 1.47–7.89 [3.99] | 0.266
Available Phosphorus | P | mg P2O5 kg−1 | 0.54–29.8 [3.80] | 0.142
pH | pH | – | 3.62–4.26 [3.88] | 0.091
Available Sulphur | S | mg kg−1 | 20.37–390.66 [182.81] | 0.257
Available Zinc | Zn | mg kg−1 | 0.36–4.08 [1.02] | 0.241
Table 2
Descriptive statistics of training, validation and testing period crop yield (Y, kg ha−1) and the corresponding model input soil fertility parameters. Other symbols and units as in Table 1.
Statistics | Y | B | Ca | K | Mg | N | OM | P | pH | S | Zn
Training
Mean | 3744.77 | 0.84 | 0.69 | 15.84 | 0.21 | 7.17 | 4.30 | 3.31 | 3.92 | 240.83 | 1.26
Standard Deviation | 742.04 | 0.31 | 0.33 | 5.98 | 0.14 | 2.40 | 1.03 | 5.78 | 0.14 | 103.42 | 0.85
Median | 3800.00 | 0.72 | 0.53 | 13.89 | 0.18 | 6.72 | 4.36 | 1.51 | 3.90 | 247.89 | 1.00
Minimum | 2173.91 | 0.38 | 0.37 | 8.38 | 0.08 | 3.92 | 2.34 | 0.54 | 3.68 | 72.43 | 0.37
Maximum | 5555.56 | 1.88 | 1.69 | 31.70 | 0.79 | 12.32 | 6.25 | 29.80 | 4.26 | 390.66 | 4.08
Skewness | 0.27 | 1.51 | 1.64 | 0.90 | 3.11 | 0.81 | 0.00 | 4.35 | 0.43 | −0.09 | 1.73
Kurtosis | 1.12 | 3.79 | 2.70 | 0.52 | 11.68 | −0.15 | −0.43 | 20.29 | −0.34 | −1.40 | 3.80
Validation
Mean | 3286.98 | 0.67 | 0.66 | 11.89 | 0.18 | 5.39 | 3.01 | 3.83 | 3.82 | 98.84 | 0.62
Median | 3428.57 | 0.68 | 0.56 | 11.38 | 0.17 | 6.16 | 3.09 | 1.20 | 3.84 | 89.04 | 0.54
Minimum | 1500.00 | 0.44 | 0.41 | 7.31 | 0.08 | 2.24 | 1.47 | 0.86 | 3.62 | 20.37 | 0.36
Maximum | 4500.00 | 0.79 | 1.04 | 20.58 | 0.26 | 7.84 | 4.48 | 14.57 | 4.01 | 205.39 | 1.34
Skewness | −0.48 | −1.49 | 0.74 | 1.06 | −0.03 | −0.68 | −0.16 | 1.81 | −0.06 | 0.52 | 2.11
Kurtosis | −0.63 | 3.44 | −1.30 | 1.52 | −1.42 | −1.30 | −0.25 | 2.93 | −1.64 | −1.01 | 5.16
Testing
Mean | 3650.35 | 0.62 | 0.95 | 24.00 | 0.26 | 6.44 | 4.08 | 5.02 | 3.87 | 113.32 | 0.75
Median | 3416.67 | 0.58 | 0.71 | 15.87 | 0.17 | 4.76 | 3.66 | 1.26 | 3.84 | 81.89 | 0.58
Minimum | 2666.67 | 0.46 | 0.40 | 6.45 | 0.10 | 2.80 | 1.47 | 0.58 | 3.74 | 32.19 | 0.37
Maximum | 4736.84 | 1.03 | 3.59 | 109.15 | 1.12 | 20.71 | 7.89 | 29.20 | 4.05 | 260.08 | 1.46
Skewness | 0.30 | 1.80 | 2.85 | 3.01 | 3.11 | 2.73 | 0.70 | 2.73 | 0.49 | 0.96 | 0.90
Kurtosis | −1.13 | 3.52 | 8.52 | 9.29 | 9.77 | 7.97 | 0.51 | 7.78 | −1.30 | −0.47 | −0.51
yield, a total of 18 optimal ELM models were developed. This generated:
(a) a set of ten single soil fertility property input models (ELMOM, ELMK, ELMCa, ELMMg, ELMB, ELMN, ELMS, ELMZn, ELMpH, ELMP),
(b) a further seven combined soil fertility property models (ELMOM·S, ELMOM·K, ELMOM·S·K, ELMOM·K·Ca, ELMOM·S·K·Ca, ELMOM·S·K·Ca·B, ELMOM·S·K·Ca·B·Mg), and
(c) an ELM model with all soil fertility properties (ELMall).
Table 4 shows the optimal models with their neuronal arrangement, Pearson correlation coefficient (r) and RMSE for the training phase.
Serving as an objective comparison to the ELM model, the RF model employed a bootstrap aggregation ('bagging') approach to construct an ensemble of decision trees that regresses the relationships between the explanatory (training phase soil fertility) and response (coffee yield) data (e.g., Prasad et al., 2018a). A key parameter, the number of decision trees, was optimized by trialling tree numbers from 50 to 1600 in two-fold increments. In the present case, the optimal RF model was achieved with 800 trees, with 'leaf' set to 5 and 'fboot' set to 1 (Table 3).
In all RF models, the surrogate option was set to 'on', allowing predictive measures of association between variables to be computed for each decision tree, averaged over all splits. Each tree was grown independently on a bootstrap replica of the training set predictor data, drawn by sampling with replacement. Observations not included in this replica were termed "out of bag". The error of the bagged ensemble was estimated by averaging, over the ensemble, each tree's predictions for its out-of-bag observations and comparing this predicted out-of-bag response with the true values. The split criterion and the number of decision splits were found to be unique to each predictor.
Table 4 shows the optimal RF models (RFOM, RFOM·S·K, RFall); only these three input combinations are shown, for comparison with the ELM model. As an additional comparison, MLR models were developed for the same predictors (MLROM, MLROM·S·K, MLRall; Tables 3 and 4). In terms of training performance, it is interesting to note that the best RF model outperformed the best ELM model, but the best ELM model performed better than the best MLR model.
3.3. Model performance evaluation
To establish the accuracy of the ELM, RF and MLR models applied to the problem of coffee yield prediction, measured vs. predicted yield data in the test phase were compared using statistical performance metrics (Krause et al., 2005; Dawson et al., 2007). The RMSE, mean absolute error (MAE) and their normalized (%) equivalents, as well as the correlation coefficient (r), Willmott's Index (WI), the Nash-Sutcliffe Coefficient (E_NS) and the Legates and McCabe Index (E_LM), were used, viz. (Krause et al., 2005):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i^{m} - Y_i^{p}\right)^{2}} \tag{9}$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|Y_i^{m} - Y_i^{p}\right| \tag{10}$$
$$\mathrm{RRMSE} = 100 \cdot \frac{\mathrm{RMSE}}{\bar{Y}^{m}} \tag{11}$$
$$\mathrm{RMAE} = 100 \cdot \frac{\mathrm{MAE}}{\bar{Y}^{m}} \tag{12}$$
$$r = \frac{\sum_{i=1}^{N}\left(Y_i^{m} - \bar{Y}^{m}\right)\left(Y_i^{p} - \bar{Y}^{p}\right)}{\sqrt{\sum_{i=1}^{N}\left(Y_i^{m} - \bar{Y}^{m}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(Y_i^{p} - \bar{Y}^{p}\right)^{2}}}, \qquad -1 \le r \le 1 \tag{13}$$
$$E_{NS} = 1 - \frac{\sum_{i=1}^{N}\left(Y_i^{m} - Y_i^{p}\right)^{2}}{\sum_{i=1}^{N}\left(Y_i^{m} - \bar{Y}^{m}\right)^{2}} \tag{14}$$
$$WI = 1 - \frac{\sum_{i=1}^{N}\left(Y_i^{m} - Y_i^{p}\right)^{2}}{\sum_{i=1}^{N}\left(\left|Y_i^{p} - \bar{Y}^{m}\right| + \left|Y_i^{m} - \bar{Y}^{m}\right|\right)^{2}} \tag{15}$$
$$E_{LM} = 1 - \frac{\sum_{i=1}^{N}\left|Y_i^{m} - Y_i^{p}\right|}{\sum_{i=1}^{N}\left|Y_i^{m} - \bar{Y}^{m}\right|} \tag{16}$$
where N is the number of data points in the test period (test sample; N = 20), $Y_i^{m}$ is the ith observed value, $\bar{Y}^{m}$ is the mean observed value, $Y_i^{p}$ is the ith predicted value, and $\bar{Y}^{p}$ is the mean predicted value.
A combination of performance metrics is often required to assess overall model performance, since any single metric emphasizes only a certain aspect of the error characteristics (Chai and Draxler, 2014). Whilst the correlation coefficient (and correlation-based measures) and the RMSE have been widely used as goodness-of-fit statistics, these metrics are regarded as suboptimal measures of model accuracy, given their oversensitivity to extreme values (outliers) and their insensitivity to additive and proportional differences between model predictions and observations (Legates and McCabe, 1999; Willmott et al., 2012; Legates and McCabe, 2013). Statistical indices such as the normalized RMSE and MAE, WI, E_NS and E_LM are important in providing a consistent and multi-faceted baseline comparison between different groups of models (Willmott et al., 1985; Legates and McCabe, 1999; Willmott et al., 2012; Legates and McCabe, 2013; Chai and Draxler, 2014), as was the case in the present study.
Fig. 3. Schematic diagram of extreme learning machine (ELM) and random forest (RF) algorithms.
Table 3
Model design parameters for extreme learning machine (ELM), random forest (RF) and multiple linear regression (MLR) models.
Model Design parameters State of parameters in study
ELM No. layers 3 [Input-Hidden-Output]
No. input neurons 1, 2, …, 18 [i.e., B, Ca, K, Mg, N, OM, P, pH, S, Zn and their relevant combinations]
No. hidden neurons 1, 2, …, n + 1 (n = No. data for each input combination)
No. output neurons 1 (simulated yield, Y)
Activation functions sigmoid, sine, hard limit, triangular basis, radial basis, tangent sigmoid, or logarithmic sigmoid
RF No. trees 50, 100, 200, 400, 800, 1600 (optimal value = 800)
Leaf 5
fboot 1
Surrogate On; sample with replacement
MLR Model ordinate-intercept (b)
Coefficient(s) (βx)
General equation for yield (YS): YS = β1X1 + β2X2 + … + βNXN + b
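Table 3 summarises the ELM settings: one hidden layer with randomly assigned input weights, a choice of activation function, and a search over the number of hidden neurons. The following is a minimal Python/NumPy sketch of that idea, not the authors' implementation. It assumes uniform random weights and biases in [−1, 1], inputs already min-max scaled to [0, 1], output weights obtained with the Moore-Penrose pseudo-inverse (as in Huang et al., 2006b), and selection of the activation and hidden-layer size by validation RMSE. The names `ELMRegressor`, `select_elm` and `max_hidden` are hypothetical, and the logarithmic sigmoid is omitted because it has the same functional form as the sigmoid.

```python
import numpy as np

# Candidate hidden-layer activations, following the list in Table 3
# (log-sigmoid omitted: same formula as the sigmoid).
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sine": np.sin,
    "hardlim": lambda z: (z >= 0).astype(float),
    "tribas": lambda z: np.maximum(1.0 - np.abs(z), 0.0),
    "radbas": lambda z: np.exp(-z ** 2),
    "tansig": np.tanh,
}


class ELMRegressor:
    """Hypothetical single-hidden-layer ELM: random input layer, least-squares output layer."""

    def __init__(self, n_hidden, activation="sigmoid", seed=0):
        self.n_hidden = n_hidden
        self.g = ACTIVATIONS[activation]
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H = g(XW + b), shape (n_samples, n_hidden)
        return self.g(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn at random and are never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights: minimum-norm least-squares solution beta = pinv(H) y
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def select_elm(X_tr, y_tr, X_val, y_val, max_hidden, seed=0):
    """Grid search over activations and hidden-layer sizes, keeping the lowest validation RMSE."""
    best = None
    for act in ACTIVATIONS:
        for n_hidden in range(1, max_hidden + 1):
            model = ELMRegressor(n_hidden, act, seed).fit(X_tr, y_tr)
            rmse = np.sqrt(np.mean((y_val - model.predict(X_val)) ** 2))
            if best is None or rmse < best[0]:
                best = (rmse, act, n_hidden, model)
    return best


# Illustrative run on synthetic data (stand-ins for scaled OM, S and K; not study data).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = 3000 + 800 * X[:, 0] + 300 * X[:, 1] - 200 * X[:, 2] + rng.normal(0, 50, 60)
rmse, act, n_hidden, model = select_elm(X[:40], y[:40], X[40:], y[40:], max_hidden=10)
print(act, n_hidden, round(float(rmse), 1))
```

Because only the output weights are solved for, each candidate fit is a single least-squares solve, which is what keeps an exhaustive search over activations and hidden-layer sizes inexpensive.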
4. Results
For the testing phase, the degree of agreement between measured yield (Ym) and predicted yield (Yp) data was visually assessed by means
Table 4
Training phase single and combined predictor inputs (see Table 1), model structural parameters and training phase model accuracy assessment for ELM, RF and MLR models, using the correlation coefficient (r) between measured and predicted yield and the root mean square error (RMSE; kg ha−1).
Model Training Performance
Type Predictor(s) Properties r RMSE
ELM models
Neuronal architecture
Input Hidden Output
ELMOM 1 2 1 0.271 753.42
ELMK 1 2 1 0.259 755.18
ELMCa 1 2 1 0.324 742.44
ELMMg 1 3 1 0.167 782.42
ELMB 1 1 1 0.150 779.18
ELMN 1 1 1 0.186 775.06
ELMS 1 2 1 0.138 784.42
ELMZn 1 1 1 0.098 782.66
ELMpH 1 1 1 0.046 786.24
ELMP 1 2 1 0.068 784.38
ELMOM·S 2 5 1 0.312 742.89
ELMOM·K 2 1 1 0.305 747.09
ELMOM·S·K 3 3 1 0.364 728.48
ELMOM·K·Ca 3 3 1 0.311 745.14
ELMOM·S·K·Ca 4 3 1 0.368 728.02
ELMOM·S·K·Ca·B 5 4 1 0.378 723.46
ELMOM·S·K·Ca·B·Mg 6 13 1 0.464 710.07
ELMall 10 9 1 0.492 681.32
RF models* Optimal Tree Bagger
C Np ED
RFOM OM 0.1008 2240 −0.2385 0.569 634.21
RFOM·S·K OM 0.0147 786 −0.2249 0.718 453.86
S 0.0145 720 −0.1245
K 0.0153 764 −0.0935
RFall OM 0.0346 678 −0.0543 0.868 658.95
K 0.0232 685 −0.1024
B 0.0191 699 −0.1542
S 0.0416 740 0.0292
Zn 0.0447 743 0.0476
P 0.0235 756 −0.0772
N 0.0559 675 −0.1428
Ca 0.0159 680 0.1459
Mg 0.0192 788
pH 0.0514 649 0.056 0.031
MLR models**
b β
MLROM OM 0.4823 0.1616 0.142 719.70
MLROM·S·K OM 0.2983 0.0546 0.210 504.90
S −0.0999
K −0.0345
(continued on next page)
Table 4 (continued)
MLRall OM −0.1272 0.1967 0.784 561.97
K −0.0458
B 0.0865
S 0.4123
Zn 0.5568
P −0.4226
N 0.6468
Ca −0.1308
Mg 0.1111
pH 0.0179
* C, delta criterion decision split; Np, No. of predictor splits; ED, permuted predictor delta error.
** b is the ordinate intercept, and β values are the regression coefficients for each of the ten predictors: OM, K, B, S, Zn, P, N, Ca, Mg, pH.
of a series of scatterplots (Fig. 4). Among the 18 ELM models, for each model type, only the best performing (lowest RMSE, highest r) trained models were considered (Table 4). For each plot (Fig. 4), the coefficient of determination (r²) and the linear equation of best fit Yp = aYm + b (with a = 1 and b = 0 for an ideal model) are also shown.
Fig. 4. Scatterplot of the predicted (Yp) and measured (Ym) Robusta coffee yield within the testing phase (ELM: extreme learning machine; RF: random forest; MLR: multiple linear regression).
A direct graphic comparison of the testing phase Ym and Yp revealed that the highest coefficient of determination (r² ≈ 0.534) occurred for model ELMOM–S–K, which combines OM, available S and available K as predictor variables (Fig. 4). For the RF and MLR models, the best performing models were the single predictor models with OM (RFOM: r² ≈ 0.423; MLROM: r² ≈ 0.350). Although the ELMOM model did not perform as well as the RFOM or MLROM models, including multiple inputs in the ELM model led to a marked improvement in the accuracy of its coffee yield prediction over the equivalent RF and MLR models, highlighting the ELM model's capacity to outperform both RF and MLR models when a combination of soil fertility properties is employed. Also notable is that, given the same multiple input combination (OM, S, K), the RF and MLR models performed worse than their equivalent single input models (r² = 0.24 vs. 0.42, and r² = 0.02 vs. 0.35, respectively; Fig. 4), indicating the inability of these models to accurately capture predictive features from multiple parameter data.
Detailed assessments of testing phase model performance in terms of r, along with RMSE and MAE (both expressed in kg ha−1) and their relative percentages (i.e., RRMSE and RMAE), are presented in Table 5. For all ELM models evaluated, the results showed −0.471 ≤ r ≤ 0.737, 465.12 ≤ RMSE ≤ 764.67 kg ha−1, and 304.72 ≤ MAE ≤ 612.40 kg ha−1. Weak correlation coefficients and relatively high RMSE and MAE values were generally found for models with a single predictor variable, but the primary role of SOM in determining yield was clear, given that this single soil parameter yielded the strongest correlation and lowest error values among the single predictor models. Nevertheless, the RRMSE values of all ELM models except ELMOM–S–K–Ca–B–Mg were < 20%. Given the low RMSE of the ELM compared with that of the other two model types, and in accordance with existing literature on data-driven models (Mohammadi et al., 2015), this indicates that the ELM model generally performed well, particularly with respect to its low RRMSE compared with values reported in the literature. Overall, the optimal ELM model was ELMOM–S–K, with ELMall and ELMOM yielding marginally worse testing period results, especially in terms of r, RMSE and MAE.
Only the performance metrics of the RFOM, RFOM–S–K, RFall and MLROM, MLROM–S–K, MLRall models are given for comparison purposes (Table 5). It should be noted that, overall, the ranges of testing phase r and MAE values for the RF and MLR models were similar to those of the ELM models, with MAE being slightly higher for the RF and MLR models. For the RFOM–S–K and MLROM–S–K models, RMSE exceeded 1000 kg ha−1 in both cases, far above the value for the ELMOM–S–K model (RMSE = 496.35 kg ha−1). However, in contrast to the ELM model, the optimal RF and MLR models were RFOM (r = 0.592, RMSE = 560.87 kg ha−1, MAE = 498.99 kg ha−1) and MLROM (r = 0.633, RMSE = 581.42 kg ha−1, MAE = 516.26 kg ha−1). When normalized errors were assessed, the ELM models clearly outperformed both RF and MLR models (RRMSE = 13.60% vs. 44.89% and 44.26%; RMAE = 7.91% vs. 14.09% and 14.34%; Table 5). Consequently, and in agreement with the literature (Mohammadi et al., 2015; Deo and Şahin, 2016), the present ELM model proved more reliable and accurate than the RF and MLR models in predicting coffee yield from soil fertility properties. Although it did not use the optimal set of soil fertility parameters, the ELMall model, with all ten soil fertility properties as inputs, generated reasonably good results (RRMSE = 14.52% and RMAE = 7.49%).
A comparison of ELM, RF and MLR model accuracy for three distinct input combinations (OM, OM–S–K, and all ten soil fertility properties) using the robust normalised performance metrics WI (Willmott, 1984), E_NS (Nash and Sutcliffe, 1970) and E_LM (Nash and Sutcliffe, 1970; Willmott, 1984; Legates and McCabe, 1999) overcame some of the weaknesses of the RMSE/MAE analysis, allowing a more in-depth evaluation of model performance (Table 5). Consistent with previous results (Fig. 4), ELM stood out as the optimal predictive model, achieving considerably higher values of WI, E_NS and E_LM than either the RF or MLR models (Table 5). Based on WI and E_NS values alone, across model types, models for prediction of coffee yield constructed around OM, S and K outperformed those constructed around OM alone or all soil parameters. However, the E_LM index was slightly higher for the ELMall model (0.469) than for either the ELMOM–S–K (0.431) or ELMOM (0.254) models. Since E_LM can be considered a better evaluation metric than either WI or E_NS (Legates and McCabe, 2013),
Table 5
Testing period evaluation of the models for Robusta coffee yield prediction with different soil fertility properties in terms of r, MAE and RMSE, including the relative percentage errors RRMSE and RMAE, Willmott's Index (WI), the Nash-Sutcliffe Coefficient (E_NS) and the Legates and McCabe Index (E_LM). Optimal models are indicated in blue (boldfaced).
Designated model Testing performance metrics
r RMSE (kg ha−1) MAE (kg ha−1) RRMSE (%) RMAE (%) WI E_NS E_LM
ELMOM 0.670 509.84 427.91 13.97 11.80 0.9929 0.373 0.254
ELMK 0.499 560.81 514.93 15.36 14.01
ELMCa 0.500 570.22 528.08 15.62 14.57
ELMMg 0.486 610.21 567.40 16.72 16.04
ELMB 0.501 623.49 558.95 17.08 15.78
ELMN −0.003 648.42 583.30 17.76 16.38
ELMS 0.143 649.81 497.59 17.80 13.50
ELMZn −0.471 651.70 575.54 17.85 16.06
ELMpH −0.017 652.26 585.26 17.87 16.53
ELMP −0.231 698.29 612.40 19.13 17.71
ELMOM S 0.457 603.42 432.46 16.53 11.49
ELMOM K 0.700 468.19 401.49 12.83 10.58
ELMOM S K 0.737 496.35 326.40 13.60 7.91 0.9952 0.406 0.431
ELMOM K Ca 0.721 465.12 408.39 12.74 10.88
ELMOM S K Ca 0.700 537.79 343.16 14.73 8.24
ELMOM S K Ca B 0.657 517.91 362.62 14.19 9.06
ELMOM S K Ca B Mg 0.445 764.67 589.82 20.95 15.10
ELMall 0.626 529.92 304.72 14.52 7.49 0.9946 0.323 0.469
RFOM 0.592 560.87 498.99 15.36 14.09 0.1800 0.242 0.130
RFOM S K 0.229 1087.35 769.57 44.89 30.93 −0.2834 −0.001 −0.118
RFall 0.567 705.76 590.15 36.58 29.88 −0.1329 −2.319 −0.731
MLROM 0.633 581.42 516.26 15.93 14.34 0.2324 0.185 0.100
MLROM S K 0.507 1072.09 797.60 44.26 32.32 −0.3166 0.027 −0.159
MLRall 0.408 636.56 562.29 32.99 32.85 −3.4570 −1.700 −0.649
this indicates that the ten-predictor ELM is quite accurate, whereas the
equivalent RF and MLR models remain inferior. However, if the study’s
site-specificity is considered in conjunction with the absolute error
(Table 6), a lower error value and greater correlation between the
predicted and measured Robusta coffee yield can only be obtained by
the optimal combination of model inputs (i.e., OM, S and K). Although the exact cause of the discrepancy in accuracy between the optimal ELMOM–S–K and ELMall models is not clear, considering the ELM models alone, the ELM appears more efficient in feature extraction, and marginally better, when the optimal input combination of OM, S and K is used. Overall,
the normalised metrics clearly justify the ELM model as being the op-
timal predictive tool, in contrast with RF and MLR models which gen-
erally underestimated Robusta yields (Fig. 4; Table 5).
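The normalised metrics discussed above follow Eqs. (9)–(16). A minimal NumPy sketch of those equations is given below; the function name `evaluate` and the dictionary keys are illustrative choices rather than the authors' code, and RRMSE and RMAE are normalised by the mean observed yield as written in Eqs. (11) and (12).

```python
import numpy as np


def evaluate(y_obs, y_pred):
    """Testing-phase metrics of Eqs. (9)-(16); y_obs holds Y^m, y_pred holds Y^p."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_obs - y_pred
    m_obs, m_pred = y_obs.mean(), y_pred.mean()

    rmse = np.sqrt(np.mean(err ** 2))                                   # Eq. (9)
    mae = np.mean(np.abs(err))                                          # Eq. (10)
    r = np.sum((y_obs - m_obs) * (y_pred - m_pred)) / np.sqrt(
        np.sum((y_obs - m_obs) ** 2) * np.sum((y_pred - m_pred) ** 2))  # Eq. (13)
    e_ns = 1 - np.sum(err ** 2) / np.sum((y_obs - m_obs) ** 2)          # Eq. (14)
    wi = 1 - np.sum(err ** 2) / np.sum(
        (np.abs(y_pred - m_obs) + np.abs(y_obs - m_obs)) ** 2)          # Eq. (15)
    e_lm = 1 - np.sum(np.abs(err)) / np.sum(np.abs(y_obs - m_obs))      # Eq. (16)

    return {
        "r": r,
        "RMSE": rmse,
        "MAE": mae,
        "RRMSE": 100 * rmse / m_obs,  # Eq. (11)
        "RMAE": 100 * mae / m_obs,    # Eq. (12)
        "WI": wi,
        "E_NS": e_ns,
        "E_LM": e_lm,
    }


# Toy example with hand-checkable values (not study data).
metrics = evaluate([2, 4, 6, 8], [3, 3, 7, 7])
print({k: round(float(v), 3) for k, v in metrics.items()})
# r ≈ 0.894, RMSE = 1.0, E_NS = 0.8, E_LM = 0.5
```

In the toy example every error has magnitude 1, so RMSE and MAE coincide, while E_LM (0.5) penalises the errors more heavily than E_NS (0.8); this mirrors the point above that the absolute-error index is the stricter of the two.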
The distribution of predicted and observed Robusta coffee yields is illustrated in Fig. 5. For the ELM models, the median values of observed and predicted coffee yield were only slightly different, whereas for the RF and MLR models the differences were much more noticeable (notably RFOM–S–K, RFall, MLROM–S–K and MLRall; Table 6). Moreover, for the ELM models the upper and lower quartiles of the observed and predicted data showed only small discrepancies, although the upper quartile appeared to be under-predicted by all three ELM models: ELMOM, ELMOM–S–K and ELMall (Fig. 5). The lower quartiles were nearly identical between observed and predicted data for ELMOM–S–K and ELMall. The discrepancies observed in the case of RF- and MLR-predicted yield showed consistent yield under-prediction.
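The quartile comparisons above, and the error statistics of Table 6, reduce to a handful of descriptive statistics. The source does not state which estimators were used, so the minimal sketch below assumes NumPy's default linearly interpolated percentiles, the sample standard deviation (ddof = 1), and moment-based skewness and excess kurtosis (zero for a normal distribution). The names `describe` and `compare` are hypothetical.

```python
import numpy as np


def describe(x):
    """Quartiles, extremes, SD, skewness and excess kurtosis of a sample (assumed estimators)."""
    x = np.asarray(x, dtype=float)
    p25, p50, p75 = np.percentile(x, [25, 50, 75])  # linear interpolation (NumPy default)
    d = x - x.mean()
    m2 = np.mean(d ** 2)                            # population (biased) second moment
    if m2 == 0:                                     # constant sample: shape statistics undefined
        skew = kurt = float("nan")
    else:
        skew = np.mean(d ** 3) / m2 ** 1.5
        kurt = np.mean(d ** 4) / m2 ** 2 - 3.0      # excess kurtosis
    return {
        "p25": p25, "p50": p50, "p75": p75,
        "max": x.max(), "min": x.min(),
        "sd": x.std(ddof=1),                        # sample standard deviation
        "skewness": skew, "kurtosis": kurt,
    }


def compare(y_obs, y_pred):
    """Summaries of observed yield, predicted yield and absolute error |Y^m - Y^p|."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "observed": describe(y_obs),
        "predicted": describe(y_pred),
        "abs_error": describe(np.abs(y_obs - y_pred)),
    }


# Illustrative values only (not study data).
summary = compare([1, 2, 3, 4, 5], [1.5, 2, 2, 4, 6])
print(summary["abs_error"]["p50"], summary["abs_error"]["max"])
```

Comparing the "observed" and "predicted" quartiles corresponds to reading the boxplots in Fig. 5, while the "abs_error" block corresponds to one column of Table 6.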
As a further diagnostic tool, a Taylor diagram was employed (Fig. 6)
to collectively assess approximate representations of the developed
models for Robusta coffee yield estimation. The Taylor diagram is ap-
plied to quantify the degree of correspondence between predicted and
measured behaviour of coffee yield in the tested data in terms of three
primary statistics on a single diagram: the correlation coefficient, the
RMSE, and the standard deviation for the ELMOM–S–K, RFOM and MLROM models. Concurring with earlier results, the ELMOM–S–K model is located much closer to the measured reference data when a combined visual assessment of the statistics is made. In fact, the ranking of these models followed the order ELMOM–S–K, RFOM and MLROM, reaffirming our previous results. Taken together, the Taylor diagram provides compelling evidence that the ELMOM–S–K model was able to provide a more accurate simulation of Robusta coffee yield in smallholder farms where several soil fertility indicators need to be screened with respect to their relative contributions to the predicted yield.
Fig. 5. Boxplots of the measured Robusta coffee yield compared with the coffee yield predicted by the optimal ELM (i.e., ELMOM, ELMOM-S-K, ELMall), RF (i.e., RFOM, RFOM-S-K, RFall) and MLR (i.e., MLROM, MLROM-S-K, MLRall) models in the testing phase.
The differences in the distribution of error statistics, in terms of the quartiles p25, p50 (median) and p75, maximum, minimum, standard deviation, skewness and kurtosis of the absolute prediction errors for the three optimum input combinations (OM, OM–S–K and 'all') and all three algorithms (Table 6), provide a complete picture of the model error variation (Chai and Draxler, 2014). Although the overall median values of predicted yield for the optimal testing phase RF and MLR models were similar to those obtained with the optimal ELM model (Fig. 4), the lower performance errors (Table 5) of the ELM (vs. RF and MLR) model showed its superior accuracy in predicting coffee yield. For example, the maximum simulation error for the ELMOM–S–K model was 793.8 kg ha−1, compared with 889.4 and 1011.1 kg ha−1 for the optimal MLROM and RFOM models. Similar deductions were drawn when the minimum, standard deviation and the different quartiles were analysed (Table 6). Two deductions can be made:
i. Predicting Robusta coffee yield based on soil fertility properties requires the selection of the most relevant input variables through a careful and robust assessment of the statistical dependence between the input(s) and the target variable (Table 1); and
ii. ELM models were more efficient than RF or MLR models in
Table 6
Descriptive statistics of model performance errors in the testing period (absolute values; kg ha−1) for three selected input combinations (OM, OM-S-K and all soil
properties). Model designations are as per Table 4.
Univariate distribution statistics for performance error Model and input parameters
ELM RF MLR
ELMOM ELMOM S K ELMall RFOM RFOM S K RFall MLROM MLROM S K MLRall
Lower Quartile: p25 100.0 100.6 13.7 296.2 507.8 902.4 284.9 562.4 899.2
Median: p50 188.7 150.7 87.9 473.6 811.4 1036.5 487.5 901.2 1433.0
Upper Quartile: p75 772.0 226.5 146.9 709.8 1608.7 1642.0 653.6 1570.5 1756.6
Maximum 1320.4 793.8 814.8 1011.1 2104.8 2359.5 889.4 2011.6 2662.1
Minimum 54.8 22.4 0.5 165.8 31.1 43.6 111.4 60.3 90.7
Standard Deviation 443.6 245.2 274.7 286.6 711.0 702.4 270.0 672.7 784.2
Skewness 1.1 1.7 1.8 0.4 0.2 0.2 0.1 0.2 −0.1
Kurtosis −0.1 2.3 2.5 −0.9 −1.4 −0.3 −1.0 −1.4 −0.3
extracting the relationships between soil fertility properties and coffee yields, and more reliable in predicting coffee yield using several inputs. The RFOM–S–K, RFall, MLROM–S–K and MLRall models were relative failures.
The performance indicators for models including soil OM as a predictor (alone or in combination) indicate that this soil fertility parameter was an important determinant in coffee yield estimation. Of the 10 soil fertility parameters used as predictors with the ELM model, three (OM, S and K) proved relevant to estimating Robusta coffee yield, indicating that a proper selection of variables tends to achieve better simulation accuracy.
5. Discussion
Biophysical modelling in farming systems requires a precise knowledge of the different processes within the soil-plant-atmosphere continuum that act to generate an optimal crop yield. Owing to the complexity and parameterisation issues found in crop growth simulation models, statistical analytics built within a data intelligent system can be utilised as a predictive approach to generate optimal yield through a set of carefully selected soil fertility properties. Soil fertility is the result of physical, chemical and biological processes interacting together to alter crop growth (Stockdale et al., 2002). Various studies dealing with the impact of agrochemicals, liming and mulches on soil fertility parameters and yield have shown the importance of soil nutrients in achieving expected yields in coffee farms (Njoroge, 2001; Van Der Vossen, 2005; Paulo and Furlani, 2010; Chemura, 2014; PVFCCo, 2016). For instance, N, P and K are essential for optimum vegetative growth, cherry filling and increasing the tree's tolerance to diseases; calcium is needed to ensure good root and leaf growth, whereas zinc and boron are important at the flowering stage to improve berry set and overall coffee yield potential (Kuit et al., 2004). Any sulphur deficiency would impair basic plant metabolic functions, thus reducing both crop yield and quality (Kovar and Grant, 2011). SOM, which contains most of the soil's reserve of nitrogen and large portions of phosphorus and sulphur, plays a vital role in nutrient availability and absorption by plants (Berry et al., 2002; Shepherd et al., 2002; Stockdale et al., 2002). When the process of selecting and reducing variables is automated, the predictors chosen might not reflect the underlying biological assumptions. Our findings showed the best performing model to be an ELM model with organic matter, available potassium and sulphur as predictors. Such a selection concurs with the literature regarding the role of these nutrients in coffee growth and bean yield.
Modelling the complex components and processes of the atmosphere, soil and farming practices, and their impacts on crop growth, depends upon a strong understanding of these processes. Given the different challenges faced by the Vietnamese coffee industry (e.g., acid soils, intensive farming practices and declining soil fertility), improving crop nutrition practices and resource utilization efficiency on coffee farms is critical for sustainable production in future years. Through our methodology, the pertinent soil fertility parameters for yield estimation were identified. Although the non-selected nutrients may still play a role in the overall coffee growth process, our findings could, in part, help guide further investigations for improved fertilization decision-making in smallholder coffee farms. In the present study, other environmental conditions such as weather, fertilizer application, and pest and disease impacts, which are also important in Robusta coffee production and may have some influence on observed yields, were not considered. Using such variables as predictors for coffee yield might improve the overall performance of the selected models.
Fig. 6. Taylor diagram depicting the standard deviation (SD) and the correlation coefficient of the measured Robusta coffee yield with respect to predicted yield, applying in the testing phase the optimum models ELMOM–S–K, RFOM and MLROM.
A variety of machine learning approaches are being used in agriculture for purposes including growth modelling and yield prediction (Drummond et al., 2003; Kaul et al., 2005; Görgens et al., 2015; Pantazi et al., 2016; Fieuzal et al., 2017), crop pest prediction (Kim et al., 2014), modelling of groundwater level changes (Sahoo et al., 2017), optimization of irrigation strategies (Cheviron et al., 2016), precision agriculture (Dimitriadis and Goumopoulos, 2008) and crop area mapping (Chemura and Mutanga, 2017; Sonobe et al., 2017). For instance, Pantazi et al. (2016), working in the UK, found an average overall accuracy of 78–82% when using ANNs to predict wheat (Triticum æstivum L.) yield and classify the field area into different yield potential zones based on multi-layer soil data and remotely sensed crop growth characteristics. Our findings reveal that an ELM model constructed with organic matter, available potassium and available sulphur as the predictor data generated the most accurate yield predictions (relative RMSE and MAE of 13.6% and 7.9%, respectively) compared with the MLR and RF models.
In spite of the superior performance of the ELM model, there remain some limitations in this study that indicate a need for further tests. With further studies involving data from different coffee producing provinces, the ability of the ELM to rigorously select the pertinent soil fertility parameters for yield estimation will help elucidate such empirical relationships (e.g., penalization function), as relevant to the biophysical modelling of Robusta coffee yield at larger spatial scales. Also, the ELM model needs to be tested for both smallholder and larger farms, across a diverse range of conditions. For example, a committee of ELM models with such diverse inputs should be tested to validate their usage elsewhere. As conditions could vary among different farms and datasets, a random sampling approach leading to an ensemble ELM model may be able to further elucidate uncertainties in predictions, and lead to more confident predictions with statistical error bounds.
6. Conclusions
The utility of ELM as a robust data-driven tool to analyse predictive features in soil fertility data required to optimise Robusta coffee yield in smallholder farms in Vietnam was evaluated. The ELM, a novel machine learning tool for addressing complex and ill-defined problems, was developed using 10 soil fertility variables (organic matter, available potassium, boron, sulphur, zinc, phosphorus, total nitrogen, exchangeable calcium, magnesium and soil pH) as predictors, while coffee yield (Y) was the objective variable. Our findings reveal
that ELM models were more efficient than RF or MLR models in extracting the relationships between soil fertility properties and coffee yields, and more reliable for predicting coffee yield using several inputs. This study confirmed the potential utility of coupling artificial intelligence algorithms with biophysical-crop models in decision-support systems that implement precision agriculture, with the aim of improving yield in smallholder farms based on a set of carefully screened soil fertility datasets.
Acknowledgements
We would like to thank the University of Southern Queensland, Australia for the support through the University's Strategic Research Fund initiative. We are also thankful to ECOM's (Vietnam) field survey team for their assistance in collecting and processing field data. We thank both anonymous reviewers and the journal Editor for their constructive comments.
Appendix A. Supplementary material
Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2018.10.014.
References
Ali, M., Deo, R.C., Downs, N., Maraseni, T., 2018. Cotton yield prediction with Markov Chain Monte Carlo-based simulation model integrated with genetic programing algorithm: A new hybrid copula-driven approach. Agric. Forest Meteorol. 263, 428–443.
Allison, L.E., Bollen, W.B., Moodie, C.D., 1965. Total Carbon, In: Norman, A.G. (Ed.), Methods of Soil Analysis. Part 2. Chemical and Microbiological Properties. American Society of Agronomy, Soil Science Society of America, Madison, WI, pp. 1346–1366.
Apaydın, A., Kutsal, A., Atakan, C., Ankara, 1994. Uygulamalı İstatistik [Statistics in Practice], Baran Ofset, Ankara, Turkey.
Barzegar, R., Moghaddam, A.A., Deo, R., Fijani, E., Tziritis, E., 2018. Mapping groundwater contamination risk of multiple aquifers using multi-model ensemble of machine learning algorithms. Sci. Total Environ. 621, 697–712.
Berry, P.M., Sylvester-Bradley, R., Philipps, L., Hatch, D.J., Cuttle, S.P., Rayns, F.W., Gosling, P., 2002. Is the productivity of organic farms restricted by the supply of available nitrogen? Soil Use Manage. 18, 248–255.
Bray, R.H., Kurtz, L.T., 1945. Determination of total, organic, and available forms of phosphorus in soils. Soil Sci. 59, 39–45.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Bylander, T., 2002. Estimating generalization error on two-class datasets using out-of-bag estimates. Mach. Learn. 48, 287–297.
Carrero, P., Malavé, A., Rojas, E., Rondón, C., de Peña, Y.P., Burguera, J.L., Burguera, M., 2005. On-line generation and hydrolysis of methyl borate for the spectrophotometric determination of boron in soil and plants with azomethine-H. Talanta 68, 374–381.
Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geoscient. Model Devel. 7, 1247–1250.
Chemura, A., 2014. The growth response of coffee (Coffea arabica L) plants to organic manure, inorganic fertilizers and integrated soil fertility management under different irrigation water supply levels. Int. J. Recycling Org. Waste Agric. 3, 59.
Chemura, A., Mutanga, O., 2017. Developing detailed age-specific thematic maps for
Deo, R.C., Tiwari, M.K., Adamowski, J.F., Quilty, M.J., 2017a. Forecasting effective drought index using a wavelet extreme learning machine (W-ELM) model. Stoch. Environ. Res. Risk Assess. 31 (5), 1211–1240.
Deo, R.C., Downs, N., Parisi, A., Adamowski, J., Quilty, J., 2017b. Very short-term reactive forecasting of the solar ultraviolet index using an extreme learning machine integrated with the solar zenith angle. Environ. Res. 155, 141–166.
Deo, R.C., Sahin, M., 2017. Forecasting long-term global solar radiation with an ANN algorithm coupled with satellite-derived (MODIS) land surface temperature (LST) for regional locations in Queensland. Renew. Sustain. Energy Rev. 72, 828–848.
Deo, R.C., Şahin, M., 2015. Application of the artificial neural network model for prediction of monthly standardized precipitation and evapotranspiration index using hydrometeorological parameters and climate indices in eastern Australia. Atmos. Res. 161–162, 65–81.
Deo, R.C., Şahin, M., 2016. An extreme learning machine model for the simulation of monthly mean streamflow water level in eastern Queensland. Environ. Monit. Assess. 188, 90.
Dimitriadis, S., Goumopoulos, C., 2008. Applying machine learning to extract new knowledge in precision agriculture applications, 2008 Panhellenic Conference on Informatics, pp. 100–104.
Draper, N., Smith, H., 1981. Applied regression analysis, 709 pp. John Wiley, New York.
Drummond, S.T., Sudduth, K.A., Joshi, A., Birrell, S.J., Kitchen, N.R., 2003. Statistical and neural methods for site-specific yield prediction. Trans. ASAE 46, 5–14.
Elminir, H.K., Azzam, Y.A., Riad, A., 2008. Testing the applicability of artificial intelligence techniques to the subject of erythemal ultraviolet solar radiation. Part two: an intelligent system based on multi-classifier technique. J. Photochem. Photobiol., B 90, 198–206.
Evrendilek, F., Karakaya, N., Gungor, K., Aslan, G., 2012. Satellite-based and mesoscale regression modeling of monthly air and soil temperatures over complex terrain in Turkey. Expert Syst. Appl. 39, 2059–2066.
FAO, 2016. FAOSTAT, Crops. National Production, FAO, Rome, Italy.
Fieuzal, R., Marais Sicre, C., Baup, F., 2017. Estimation of corn yield using multi-temporal optical and radar satellite data and artificial neural networks. Int. J. Appl. Earth Obser. Geoinform. 57, 14–23.
Fijani, E., Barzegar, R., Deo, R., Tziritis, E., Konstantinos, S., 2019. Design and implementation of a hybrid model based on two-layer decomposition method coupled with extreme learning machines to support real-time environmental monitoring of water quality parameters. Sci. Total Environ. 648, 839–853.
Ghorbani, M.A., Deo, R.C., Kashani, M.-H., Shahabi, M., Ghorbani, S., 2018. Fast and efficient hybrid approach for spatial modelling of soil electrical conductivity. Soil Tillage Res. (in press).
Görgens, E.B., Montaghi, A., Rodriguez, L.C.E., 2015. A performance comparison of machine learning methods to estimate the fast-growing forest plantation yield based on laser scanning metrics. Comput. Electron. Agric. 116, 221–227.
Goyal, M.K., Bharti, B., Quilty, J., Adamowski, J., Pandey, A., 2014. Modeling of daily pan evaporation in sub tropical climates using ANN, LS-SVR, Fuzzy Logic, and ANFIS. Expert Syst. Appl. 41, 5267–5276.
GSOV, 2014. Table 196. Production of main perennial crops. In: Statistical Yearbook of Vietnam 2014. Statistical Documentation and Service Centre, General Statistics Office of Vietnam, Hanoi Vietnam p. 450 http://www.gso.gov.vn/default_en.aspx?tabid=515&idmid=5&ItemI (accessed 14.07.2017).
Gutierrez, A.P., Villacorta, A., Cure, J.R., Ellis, C.K., 1998. Tritrophic analysis of the coffee (Coffea arabica) - coffee berry borer [Hypothenemus hampei (Ferrari)] - parasitoid system. Anais da Sociedade Entomológica do Brasil 27, 357–385.
Huang, G.-B., Chen, L., Siew, C.K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17, 879–892.
Huang, G.-B., Zhu, Q.-Y., Siew, C.-K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70, 489–501.
Huang, G., Huang, G.-B., Song, S., You, K., 2015. Trends in extreme learning machines: a review. Neural Networks 61, 32–48.
ICO, 2016. Historical data on the global coffee trade. International Coffee Organization (ICO), London, UK. http://www.ico.org/new_historical.asp (accessed 14.07.2017).
Jiang, D., Yang, X., Clinton, N., Wang, N., 2004. An artificial neural network model for
coffee (Coffea arabica L.) in heterogeneous agricultural landscapes using random estimating crop yields using remotely sensed information. Int. J. Remote Sens. 25,
forests applied on Landsat 8 multispectral sensor. Geocarto Int. 32, 759–776. 1723–1732.
Cheng, K.L., Hung, J.-C., Prager, D.H., 1973. Determination of exchangeable calcium and Kaul, M., Hill, R.L., Walthall, C., 2005. Artificial neural networks for corn and soybean
magnesium in soil by ion-selective electrode method. Microchem. J. 18, 256–261. yield prediction. Agric. Syst. 85, 1–18.
Cheviron, B., Vervoort, R.W., Albasha, R., Dairon, R., Le Priol, C., Mailhol, J.-C., 2016. A Kavousi-Fard, A., Samet, H., Marzbani, F., 2014. A new hybrid modified firefly algorithm
framework to use crop models for multi-objective constrained optimization of irri- and support vector regression model for accurate short term load forecasting. Expert
gation strategies. Environ. Modell. Soft. 86, 145–157. Syst. Appl. 41, 6047–6056.
Civelekoglu, G., Yigit, N., Diamadopoulos, E., Kitis, M., 2007. Prediction of bromate Kim, Y.H., Yoo, S.J., Gu, Y.H., Lim, J.H., Han, D., Baik, S.W., 2014. Crop pests prediction
formation using multi-linear regression and artificial neural networks. Ozone Sci. method using regression and machine learning technology: survey. IERI Procedia 6,
Eng. 29, 353–362. 52–56.
Coltri, P.P., Zullo Junior, J., Dubreuil, V., Ramirez, G.M., Pinto, H.S., Coral, G., Lazarim, Kouadio, L., Newlands, N.K., 2014. Data hungry models in a food hungry world - an
C.G., 2015. Empirical models to predict LAI and aboveground biomass of Coffea interdisciplinary challenge bridged by statistics. In: Lawless, J.F. (Ed.), Statistics in
arabica under full sun and shaded plantation: a case study of South of Minas Gerais, Action: A Canadian Outlook. CRC Press, Taylor and Francis Group, New York, U.S.A.,
Brazil. Agroforest. Syst. 1–16. pp. 371–385.
Cutler, D.R., Edwards, T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., Kovar, J.L., Grant, C.A., 2011. Nutrient cycling in soils: Sulfur. In: Hatfield, J.L., Sauer,
2007. Random forests for classification in ecology. Ecology 88, 2783–2792. T.J. (Eds.), Soil Management: Building a Stable Base for Agriculture. Soil Science
D’haeze, D., Deckers, J., Raes, D., Phong, T.A., Loi, H.V., 2005. Environmental and socio- Society of America, Madison, WI, pp. 103–115.
economic impacts of institutional reforms on the agricultural sector of Vietnam: Land Krause, P., Boyle, D., Bäse, F., 2005. Comparison of different efficiency criteria for hy-
suitability assessment for Robusta coffee in the Dak Gan region. Agric. Ecosyst. drological model assessment. Adv. Geosci. 5, 89–97.
Environ. 105, 59–76. Kuit, M., Jansen, D.M., Thiet, N.V., 2004. Coffee handbook. Manual for Arabica culti-
Dawson, C.W., Abrahart, R.J., See, L.M., 2007. HydroTest: a web-based toolbox of eva- vation. Tan Lam Agricultural Product Joint Stock Company, Vietnam. 219 pp.
luation metrics for the standardised assessment of hydrological forecasts. Environ. https://bootcoffee.com/wp-content/uploads/2015/04/manual-for-arabica-cultiva-
Modell. Soft. 22, 1034–1052. tion-vs.pdf (accessed 14.03.2018).
Lambot, C., Herrera, J.C., Bertrand, B., Sadeghian, S., Benavides, P., Gaitán, A., 2017. Cultivating coffee quality – Terroir and agro-ecosystem. In: Folmer, B. (Ed.), The Craft and Science of Coffee. Academic Press, pp. 17–49.
Legates, D.R., McCabe, G.J., 1999. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour. Res. 35, 233–241.
Legates, D.R., McCabe, G.J., 2013. A refined index of model performance: a rejoinder. Int. J. Climatol. 33, 1053–1056.
Marsh, A., 2007. Diversification by smallholder farmers: Vietnam Robusta Coffee, Agricultural management, marketing and finance working document 19. Italy, FAO, Rome, pp. 50.
Meena, M., Singh, P.K., 2013. Crop yield forecasting using neural networks. In: Panigrahi, B.K., Suganthan, P.N., Das, S., Dash, S.S. (Eds.), Swarm, Evolutionary, and Memetic Computing: 4th International Conference, SEMCCO 2013, Chennai, India, December 19–21, 2013, Proceedings, Part II. Springer International Publishing, Cham, pp. 319–331.
Mitchell, H.W., 1988. Cultivation and harvesting of the Arabica coffee tree. In: Clarke, R.J. (Ed.), Coffee: Agronomy. Elsevier Applied Science, New York, USA.
Mohammadi, K., Shamshirband, S., Tong, C.W., Arif, M., Petković, D., Ch, S., 2015. A new hybrid support vector machine–wavelet transform approach for estimation of horizontal global solar radiation. Energy Convers. Manage. 92, 162–171.
Montgomery, D.C., Peck, E.A., Vining, G.G., 2012. Introduction to linear regression analysis. John Wiley & Sons.
Nair, K.P.P., 2010. Coffee. In: Nair, K.P.P. (Ed.), The agronomy and economy of important tree crops of the Developing World. Elsevier, London, pp. 181–208.
Nash, J., Sutcliffe, J., 1970. River flow forecasting through conceptual models part I – a discussion of principles. J. Hydrol. 10, 282–290.
Newlands, N.K., Zamar, D.S., Kouadio, L.A., Zhang, Y., Chipanshi, A., Potgieter, A., Toure, S., Hill, H.S.J., 2014. An integrated, probabilistic model for improved seasonal forecasting of agricultural crop yield under environmental uncertainty. Front. Environ. Sci. 2. https://doi.org/10.3389/fenvs.2014.00017.
Nguyen-Huy, T., Deo, R.C., Mushtaq, S., Kath, J., Khan, S., 2018a. Copula-based agricultural conditional value-at-risk modelling for geographical diversifications in wheat farming portfolio management. Weather Clim. Extremes 21, 75–78.
Nguyen-Huy, T., Deo, R.C., Mushtaq, S., An-Vo, D.-A., Khan, S., 2018b. Modeling the joint influence of multiple synoptic-scale, climate mode indices on Australian wheat yield using a vine copula-based approach. Eur. J. Agronomy 98, 65–81.
Njoroge, J.M., 2001. Advances in coffee agronomy. In Proceedings of the International Scientific Symposium on Coffee December 4, 2000, CBI-CCRI Bangalore India. 104–119.
Ozdamar, K., 2004. Paket Programlar ile Istatistiksel veri Analizi. [Statistical Data Analysis with Software Packages]. Pegem, Eskişehir, Turkey.
Pantazi, X.E., Moshou, D., Alexandridis, T., Whetton, R.L., Mouazen, A.M., 2016. Wheat yield prediction using machine learning and advanced sensing techniques. Comput. Electron. Agric. 121, 57–65.
Paulo, E.M., Furlani Jr., E., 2010. Yield performance and leaf nutrient levels of coffee cultivars under different plant densities. Scientia Agricola 67, 720–726.
Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199.
Prasad, R., Deo, R.C., Li, Y., Maraseni, T., 2018a. Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition. Geoderma 330, 136–161.
Prasad, R., Deo, R.C., Li, Y., Maraseni, T., 2018b. Ensemble committee-based data intelligent approach for generating soil moisture forecasts with multivariate hydro-meteorological predictors. Soil Tillage Res. 181, 63–81.
PVFCCo, 2016. Polyhalite application improves coffee (Coffea robusta) yield and quality in Vietnam. International Potash Institute Research Findings: e-ifc No 47, December 2016. pp 12–19. https://www.ipipotash.org/en/eifc/2016/47/2/english (accessed 14.07.2017).
Rodríguez, D., Cure, J.R., Cotes, J.M., Gutierrez, A.P., Cantor, F., 2011. A coffee agroecosystem model: I. Growth and development of the coffee plant. Ecol. Model. 222, 3626–3639.
Şahin, M., Kaya, Y., Uyar, M., 2013. Comparison of ANN and MLR models for estimating solar radiation in Turkey using NOAA/AVHRR data. Adv. Space Res. 51, 891–904.
Sahoo, S., Russo, T.A., Elliott, J., Foster, I., 2017. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the U.S. Water Resour. Res. 53, 3878–3895.
Santamouris, M., Mihalakakou, G., Psiloglou, B., Eftaxias, G., Asimakopoulos, D., 1999. Modeling the global solar radiation on the Earth's surface using atmospheric deterministic and intelligent data-driven techniques. J. Clim. 12, 3105–3116.
Shepherd, M.A., Harrison, R., Webb, J., 2002. Managing soil organic matter – implications for soil structure on organic farms. Soil Use Manage. 18, 284–292.
Sonobe, R., Tani, H., Wang, X., 2017. An experimental comparison between KELM and CART for crop classification using Landsat-8 OLI data. Geocarto Int. 32, 128–138.
Stockdale, E.A., Shepherd, M.A., Fortune, S., Cuttle, S.P., 2002. Soil fertility in organic farming systems – fundamentally different? Soil Use Manage. 18, 301–308.
Van Der Vossen, H.A.M., 2005. A critical analysis of the agronomic and economic sustainability of organic coffee production. Exp. Agric. 41, 449–473.
van Oijen, M., Dauzat, J., Harmand, J.-M., Lawson, G., Vaast, P., 2010. Coffee agroforestry systems in Central America: II. Development of a simple process-based model and preliminary results. Agroforest. Syst. 80, 361–378.
Walkley, A., Black, I.A., 1934. An examination of Degtjareff method for determining soil organic matter and a proposed modification of the chromic acid titration method. Soil Sci. 37, 29–37.
Willmott, C.J., 1984. On the evaluation of model performance in physical geography, Spatial Statistics and Models, Springer, pp. 443–460.
Willmott, C.J., Ackleson, S.G., Davis, R.E., Feddema, J.J., Klink, K.M., Legates, D.R., O'Donnell, J., Rowe, C.M., 1985. Statistics for the evaluation and comparison of models. J. Geophys. Res. Oceans 90, 8995–9005.
Willmott, C.J., Robeson, S.M., Matsuura, K., 2012. A refined index of model performance. Int. J. Climatol. 32, 2088–2094.
Winston, E., Op de Laak, J., Marsh, T., Lempke, H., Chapman, K., 2005. Arabica coffee manual for Lao-PDR. FAO Regional Office for Asia and the Pacific, Bangkok, Thailand. http://www.fao.org/docrep/008/ae939e/ae939e06.htm (accessed 14.03.2018).
World Bank, 2004. The socialist republic of Viet Nam – coffee sector report. World Bank, Hanoi.