
Computers and Electronics in Agriculture 181 (2021) 105945

Contents lists available at ScienceDirect

Computers and Electronics in Agriculture


journal homepage: www.elsevier.com/locate/compag

Original papers

Predicting the sugarcane yield in real-time by harvester engine parameters and machine learning approaches
Leonardo Felipe Maldaner*, Lucas de Paula Corrêdo, Tatiana Fernanda Canata, José Paulo Molin
Precision Agriculture Laboratory, Biosystems Engineering Department, University of São Paulo, Av. Pádua Dias 11, 13418-900 Piracicaba, São Paulo, Brazil

ARTICLE INFO

Keywords: Artificial intelligence; CAN data; Precision agriculture; Yield monitor

ABSTRACT

Information regarding sugarcane yield is the starting point for the application of Precision Agriculture (PA) strategies to the management of sugarcane fields. Over the years, many sugarcane yield monitoring approaches have been developed and tested, but their use is still limited. One option is to use data generated by the sugarcane harvester and provided on the controller area network (CAN), which carries the messages related to harvester operation. Our goal was to train and test random forest (RF), multiple linear regression (MLR), and artificial neural network (ANN) models using CAN data available on the on-board computer of the harvester. All predictive models were trained and tested using the main engine parameters as data input: fuel consumption, engine rotation, engine power, and specific fuel consumption (SFC). A scale yield monitor was installed on the harvester elevator to provide georeferenced yield data. The yield prediction accuracy of the predictive models using the four parameters was compared. All models followed the observed sugarcane yield and recognized its changes over the time of data collection. The MLR and ANN models had higher prediction errors at the extreme values of yield. Over the testing time, the ANN underestimated sugarcane yield, and the RF predictions from the harvester engine parameters came closest to the observed yield. The best results for predicting sugarcane yield from engine parameters were obtained with the RF model, with a mean absolute percent error (MAPE) of 5.6% and a root mean square error (RMSE) of 7.0 Mg ha−1. The MLR and ANN models had MAPEs of 7.8% and 5.6%, respectively. All the variables were necessary for better model performance; however, SFC was the most important variable for predicting sugarcane yield.

* Corresponding author.
E-mail addresses: leonardofm@usp.br (L. Felipe Maldaner), lucaspcorredo@usp.br (L. de Paula Corrêdo), tatiana.canata@usp.br (T. Fernanda Canata), jpmolin@usp.br (J. Paulo Molin).

https://doi.org/10.1016/j.compag.2020.105945
Received 1 July 2020; Received in revised form 28 November 2020; Accepted 4 December 2020
Available online 7 January 2021
0168-1699/© 2020 Elsevier B.V. All rights reserved.

1. Introduction

Achieving maximum crop yield at minimum cost with a healthy ecosystem is the main goal of Precision Agriculture (PA) (Chlingaryan et al., 2018). Knowledge of the crop cycle and of the growth and development patterns of the plants is essential for sugarcane site-specific management. In this regard, yield mapping is the most important data source, as it is the result of all occurrences and interventions carried out during the crop cycle. Sugarcane yield mapping provides quantitative information that decision-makers can more easily use (Fernandes et al., 2017) for agronomic intervention planning. The possibility of quantifying sugarcane yield variability in the fields, coupled with soil and crop data and knowledge of the real nutritional needs of the crop, allows an understanding of the cause-and-effect relationships that occur at the within-field level (Price et al., 2017; Bramley and Ouzman, 2019).

There are different approaches to generating data to estimate yield: from a yield monitor embedded in the harvester, called the proximal sensing technique, or by analyzing remotely sensed imagery, with remote sensing techniques (Momin et al., 2019). The sugarcane harvester cuts a single or double row and chops the sugarcane stalks into small pieces. A device that measures the weight or volume passing along the harvester might provide a reliable source of information to produce high-resolution maps (Maldaner and Molin, 2020). However, yield monitors are still rarely used by growers and sugar mills (Sanches et al., 2019). Studies have proposed different physical principles to measure sugarcane yield, such as load cells (Mailander et al., 2010; Magalhães and Cerri, 2007), fiber optics (Price et al., 2011), a laser distance sensor (Price et al., 2017), and a 3D sensor with a stereo camera system (Darr et al., 2019). However, according to Momin et al. (2019), a drawback of machine-embedded,
real-time yield monitoring systems is their reliance upon seasonal calibrations and maintenance. Sugarcane yield maps from imagery data can be inexpensive and easier to produce; however, their limited accuracy can be problematic (Momin et al., 2019). On the other hand, current advances in machine learning methods might help to obtain highly accurate yield data automatically and at a lower cost (van Klompenburg et al., 2020).

Currently, sugarcane harvesters already have information support and embedded instrumentation, generating data related to their operation in real time (Corrêdo et al., 2020). Modern harvesters contain a controller area network (CAN), which allows scores of on-board electronic control units (ECU) to communicate messages related to the harvester operation. These messages continuously update information about the engine, power train, equipment, power take-off, hydraulic system, and other parts of the machine (Darr, 2012) to support the automation of control processes performed by the harvester, increasing machine efficiency. Modern sugarcane harvesters are not only closed-loop systems; they also have several types of communication with the user's interface. With current advances in information and communication technology, it is possible to collect real-time information on the performance of each on-board ECU, an essential source of information on the performance of the machine during the harvest.

Agricultural machinery has an internal network based on a communication protocol called ISOBUS. ISOBUS is a communication standard regulated by ISO 11783 and established by the Agricultural Industry Electronics Foundation (AEF), which standardizes communication between tractors, implements, and on-board computers. Few recent experiments have performed machine data collection and statistical analysis using the ISOBUS diagnostic protocol (Wang et al., 2017); such studies have focused on how to monitor and control agricultural machinery functions to support decision-making (Al-Aani et al., 2018). Pitla et al. (2014) collected the engine fuel use rate from the CAN data of the tractor diagnostic port during field operations to determine its fuel efficiency. Wang et al. (2017) developed an open-source infrastructure for collecting and processing CAN data of agricultural machines, e.g., tractors and combines, to produce useful data analytics and visualizations in real time. Suvinen and Saarilahti (2006) demonstrated the applicability of forwarder hydraulic driveline CAN data for measuring mobility parameters. Ala-Ilomäki et al. (2020) evaluated the use of CAN data collected continuously by forest machines to map the trafficability of extraction trails. In the automotive industry, many studies have also used CAN data analytics to detect anomalous states in vehicles (Narayanan et al., 2015) and for predictive maintenance of the engine (Furch, 2014).

Most of these studies used supervised and unsupervised machine learning (ML) methods to transform a large amount of CAN data into useful information. An ML method involves a process of learning from a set of examples (training data) to perform a task. Usually, an individual example is described by a set of attributes, also known as features or variables (Liakos et al., 2018). In most practical cases, ML ultimately aims to learn, or choose from a pool of candidate probability models, the model that can best predict unobserved data (Morota et al., 2018). In other words, the method finds a deterministic function that relates each input attribute to its target, with the end goal of limiting the error in future forecasts or predictions (Elavarasan et al., 2018). One of the most frequently selected models in crop yield prediction studies is the RF model (Everingham et al., 2016). RF techniques build multiple regression trees by repeatedly taking random subsets of the data to determine the splits in the regression trees (Ip et al., 2018). Another supervised model typically used with large amounts of data to predict sugarcane yield is the ANN (Fernandes et al., 2017). Usually, the most common input attributes of RF and ANN models are satellite imagery reflectance, vegetation indices, and climate data. However, there are no published studies on the application of these ML models to the harvester CAN dataset to predict sugarcane yield in real time during the harvest.

Our goal was to create RF, MLR, and ANN models through supervised learning methods using CAN data available on the on-board computer of a sugarcane harvester. The objective was to compare the yield prediction accuracy of the predictive models using parameters related to harvester engine performance (fuel consumption, specific fuel consumption, engine rotation, and engine power) as input variables.

2. Material and methods

The study was divided into data collection, data pre-processing, and data split for training and validating the developed models. The data collected from the CAN of the sugarcane harvester, the pre-processing steps, the features that comprise the training data, and the ML models are shown in Fig. 1.

2.1. Data collection

Harvesting was carried out in a 3.3 ha field at the second ratoon stage, located in the central-western region of the state of São Paulo, Brazil (21°21′ S, 48°40′ W; 523 m altitude). The harvester ran with auto-steering, following the baselines defined at planting, with accurate traffic control. In this region, sugarcane is cultivated in contour lines and green harvested, so the harvester runs over crop residue on level ground and, after the first cut, soil compaction is relatively constant. In addition, the soil of the experimental area had a low moisture content because the harvest was conducted in the dry season. Therefore, variables related to traction were considered negligible for the conditions of the present study. Data collection was carried out with a single-row sugarcane harvester capable of registering real-time data from the sensors that assist its operation. These data were transmitted via CAN to the on-board harvester computer. The input variables for the models were the hourly fuel consumption of the engine (l h−1), engine rotation (RPM), engine power (kW), and specific fuel consumption (SFC, l kW−1 h−1). The specific fuel consumption is the ratio between fuel consumption and engine power (Wei et al., 2012). According to Eisenbies et al. (2020), sugarcane mass flow in the harvester directly influences the engine load and, consequently, the hourly fuel consumption of the engine. The mass flow also affects the travel speed, which influences engine rotation and hourly fuel consumption (Ramos et al., 2016; Martins et al., 2017). Travel speed was also collected to verify its relationship with the engine parameters.

In addition, sensors external to the harvester were needed to measure sugarcane yield in real time, both to train the predictive models and to compare the predicted values. A prototype scale yield monitor with load cells was installed on the harvester elevator (Fig. 1) to provide the yield data, measuring the amount of sugarcane mass that passes through the conveyor before being dumped into the infield wagon. Yield data have a time lag relative to the data generated by the engine because of the mass flow inside the harvester; the harvester's on-board computer synchronizes this time difference. Therefore, the yield data recorded by the yield sensor in the elevator were associated with the coordinates generated by the GNSS at harvest time, and the CAN data were associated with that same moment. The data of the four input variables, yield, and the time of collection were recorded on an SD card at 1 Hz, the maximum frequency allowed by the harvester computer (Fig. 2). The density of data collected was 4,800 points ha−1, totaling 16,000 points.

Fig. 1. Flowchart from data collection to the predictive models. Adapted from Bratsas et al. (2019).

Fig. 2. Raw data of the selected variables and the yield data collected in the sugarcane harvester over time. SFC: Specific Fuel Consumption.

2.2. Pre-processing data and training models

Original yield data were screened by filtering null values, global outliers, and local outliers considering georeferenced data. The global filtering method for identifying outliers was the Interquartile Range (IQR) (Tukey, 1977). Data points that fell more than 1.5 times the IQR
below the first quartile or 1.5 times the IQR above the third quartile were considered outliers and removed from the dataset. For sugarcane yield values, in-row filtering was performed using a median local filter (Maldaner and Molin, 2020) to remove local outliers. All input and output parameters were normalized to the range 0 to 1, as described by Eqs. (1) and (2) (Ahmad et al., 2017), to improve the accuracy of the predictive models.

$$x'_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

$$y'_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \tag{2}$$

where $x_i$ represents each input variable, $y_i$ is the sugarcane yield, $x_{\min}$, $x_{\max}$, $y_{\min}$, and $y_{\max}$ represent their corresponding minimum and maximum values, and $x'_i$ and $y'_i$ are the normalized input and output variables, respectively. The CADEX (computer aided design of experiments) sampling method was used to split the dataset into train and test data with a 70:30 ratio. CADEX selects samples with a uniform distribution over the predictor space. This split was performed using the kenStone function of the prospectr package (Stevens and Ramirez-Lopez, 2020). The same training dataset was used to train each model in this study.

2.3. Models

2.3.1. Multiple linear regression (MLR)

MLR is one of the most used methods to develop empirical models for large datasets. Linear regression is a parametric model and a supervised learning algorithm that assumes a linear relationship between the variables. The prediction function is assumed to be a linear combination of the features (Bratsas et al., 2020). It is used to analyze cause-and-effect relationships between the dependent variable and the input variables (Chatterjee and Price, 1991):

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_P x_P \tag{3}$$

where $b_0$ is the constant term, $b_p$ is the slope coefficient for each independent variable, deduced from the training dataset, $x_p$ is the independent variable ($p = 1, \ldots, P$), and $Y$ is the dependent variable (sugarcane yield). The independent variables considered were fuel consumption, engine rotation, engine power, and SFC. Despite its simplicity, the model has been shown in some cases to produce quite good results, as well as fast prediction, owing to its simple form (Bratsas et al., 2020).
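The pre-processing of Section 2.2 and the MLR of Eq. (3) can be illustrated with a short plain-Python sketch. It is not the authors' R workflow, and all function names (iqr_filter, min_max, kennard_stone, mlr_fit, mlr_predict) are hypothetical. Assumptions: quartiles come from the standard library's "inclusive" method, assumed to be close to R's default; the Kennard-Stone (CADEX) selection uses plain Euclidean distance, which may differ from the metric used by prospectr's kenStone; the MLR coefficients are obtained by ordinary least squares through the normal equations; and the median local filter is not shown.

```python
import statistics
from math import dist


def iqr_filter(values, k=1.5):
    """Drop global outliers with Tukey's rule: keep Q1 - k*IQR <= v <= Q3 + k*IQR."""
    # 'inclusive' treats the min and max as the 0th and 100th percentiles.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max(values):
    """Eqs. (1)-(2): rescale one variable to [0, 1] (assumes max > min)."""
    v_min, v_max = min(values), max(values)
    return [(v - v_min) / (v_max - v_min) for v in values]


def kennard_stone(rows, k):
    """Indices of k rows picked by the Kennard-Stone (CADEX) rule.

    Naive version with Euclidean distance; needs len(rows) >= k >= 2.
    """
    n = len(rows)
    # Start from the two most distant rows (O(n^2) search).
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda pair: dist(rows[pair[0]], rows[pair[1]]))
    selected = [first, second]
    # Distance from every row to its nearest already-selected row.
    nearest = [min(dist(r, rows[first]), dist(r, rows[second])) for r in rows]
    while len(selected) < k:
        # Add the unselected row that is farthest from the selected set.
        nxt = max((i for i in range(n) if i not in selected), key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, dist(r, rows[nxt])) for d, r in zip(nearest, rows)]
    return selected


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for cc in range(c, m + 1):
                aug[r][cc] -= factor * aug[c][cc]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][cc] * x[cc] for cc in range(r + 1, m))
        x[r] = (aug[r][m] - tail) / aug[r][r]
    return x


def mlr_fit(rows, y):
    """Least-squares coefficients [b0, b1, ..., bP] of Eq. (3) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the constant term b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def mlr_predict(coef, row):
    """Evaluate Eq. (3) for one row of predictors."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))
```

In the study's setting, rows would be the normalized (fuel consumption, engine rotation, engine power, SFC) tuples, kennard_stone(rows, round(0.7 * len(rows))) would return the training indices of a 70:30 split, and mlr_fit would be run on the training subset only. As a quick check, for exactly linear data y = 1 + 2x1 − 3x2, mlr_fit recovers approximately [1.0, 2.0, -3.0]. The naive Kennard-Stone search is quadratic in the number of rows, so a vectorized implementation would be preferable for 16,000 points, and a QR-based solver is numerically more robust than the normal equations for ill-conditioned predictors.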

2.3.2. Random forest (RF)

RF regression is a combination of decision trees in which each tree depends on the values of a random vector sampled independently from the input vector, with the same distribution for all trees in the forest (Breiman, 2001). According to Blanco et al. (2018), RF differs from single decision tree models because it relies on the average result of many trees (Fig. 3). For each tree, the data are recursively split into more homogeneous units, commonly referred to as nodes, to improve the predictability of the response variable. RF fits separate decision trees to a predefined number of bootstrapped datasets (Everingham et al., 2016). The predicted value of sugarcane yield is the mean fitted response from all the individual trees that resulted from each bootstrapped sample. In this study, the RF regression model was built using the "randomForest" package (Liaw and Wiener, 2002) in the free statistical software R (R Development Core Team, 2014). The best combination of the number of trees (ntree) and the number of variables available for splitting at each tree node (mtry) was selected; the performance indicators calculated to evaluate the model goodness of fit are described in Section 2.4. The RF regression model was built using 500 trees (ntree = 500). In each tree, RF introduces randomness into the regression process by selecting a random subset of variables (mtry) as candidates for the split at each node (Breiman, 2001). The predictor subset was set to two variables (mtry = 2).

2.3.3. Artificial neural network (ANN)

A feed-forward ANN structure trained by back-propagation was used to predict sugarcane yield. The ANN was developed by a supervised learning procedure in R using the package neuralnet (Günther and Fritsch, 2010), which contains a flexible function to train feed-forward neural networks. The ANN in this study comprises three layers: the input, hidden, and output layers (Fig. 4). The input layer contains four neurons that correspond to the input variables: fuel consumption, engine rotation, engine power, and SFC. The output layer contains one neuron that corresponds to the predicted sugarcane yield. In a feed-forward neural network, the neurons in one layer are connected to the neurons in the next layer, and the information flows forward from the input to the output through the hidden layers (Cherukuri et al., 2019). An ANN can have zero or more hidden layers. The data move between the layers across weighted connections in one direction, from the input through the hidden to the output layers. A neuron accepts data from the previous layer and calculates a weighted sum of all its inputs:

$$t_i = \sum_{j=1}^{n} w_{ij} x_j \tag{4}$$

where $t_i$ is the weighted sum of all inputs to neuron $i$, $n$ is the number of inputs, $w_{ij}$ is the weight of the connection between neurons $i$ and $j$, and $x_j$ is the input from neuron $j$. A transfer function (Eq. (5)) is applied to the weighted sum to calculate the neuron output:

$$o_i = f(t_i) \tag{5}$$

where $o_i$ is the neuron output and $f(t_i)$ is the transfer function applied to the weighted sum $t_i$ of the inputs of neuron $i$.

Fig. 4. Artificial Neural Network structures for sugarcane yield prediction.

The most used transfer function is the sigmoidal function for the hidden and output layers, and a linear transfer function is commonly used for the input layer. The number of hidden neurons determines the number of connections between inputs and outputs and may vary depending on the specific problem under study (Kaul et al., 2005). If too many neurons are used, the ANN may become over-trained, memorizing the training data and producing poor predictions (Lawrence, 1994). ANNs with one hidden layer and different numbers of neurons were tested with the training data; the number of hidden neurons selected in this study was 5,000, which generated the smallest error between the observed sugarcane yield and the ANN output (predicted sugarcane yield).
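Eqs. (4) and (5) can be made concrete with a forward pass through a network with one hidden layer. The sketch below is a hypothetical illustration, not the neuralnet implementation: the names sigmoid, layer_output, forward, and denormalize are illustrative; bias terms are omitted to mirror Eq. (4); a sigmoid is applied in the hidden and output layers with an identity (linear) input layer, as described above; the weights are made up; and back-propagation training is not shown. The helper denormalize inverts Eq. (2) to return a prediction in Mg ha−1.

```python
import math


def sigmoid(t):
    """Logistic transfer function f(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


def layer_output(weights, inputs, transfer):
    """Apply Eq. (4) and Eq. (5) to every neuron i of one layer.

    weights[i][j] is w_ij, the weight from input neuron j to neuron i;
    t_i = sum_j w_ij * x_j (Eq. 4) and o_i = f(t_i) (Eq. 5).
    No bias term is added, to mirror Eq. (4).
    """
    return [transfer(sum(w_ij * x_j for w_ij, x_j in zip(row, inputs)))
            for row in weights]


def forward(x, w_hidden, w_output):
    """Feed-forward pass through one hidden layer.

    x holds the four normalized inputs (fuel consumption, engine rotation,
    engine power, SFC). The input layer is linear (identity), so x is passed
    on unchanged; hidden and output neurons use the sigmoid.
    """
    hidden = layer_output(w_hidden, x, sigmoid)
    return layer_output(w_output, hidden, sigmoid)[0]


def denormalize(y_scaled, y_min, y_max):
    """Invert Eq. (2) to express a normalized prediction in Mg/ha."""
    return y_scaled * (y_max - y_min) + y_min


if __name__ == "__main__":
    # Hypothetical weights for 3 hidden neurons (the study selected 5,000).
    w_hidden = [[0.5, -0.2, 0.1, 0.3],
                [-0.4, 0.6, 0.2, -0.1],
                [0.1, 0.1, -0.3, 0.7]]
    w_output = [[0.8, -0.5, 0.4]]
    y_scaled = forward([0.2, 0.4, 0.6, 0.8], w_hidden, w_output)
    # The sigmoid output lies in (0, 1), so the yield lies in (45, 155) Mg/ha.
    print(denormalize(y_scaled, 45.0, 155.0))
```

The bounds 45 and 155 Mg ha−1 are the minimum and maximum of the training yields reported in Section 3.2. Because the output neuron is a sigmoid, the denormalized prediction can never leave that range.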

2.4. Evaluation metrics

To assess the quality of sugarcane yield prediction, it is essential to establish metrics that allow the comparison of different methods (Bratsas et al., 2020). This evaluation must consist of a comparison between the observed and the predicted sugarcane yield. For each sampled point, the prediction error was calculated as the difference between the sugarcane yield predicted (Mg ha−1) by the MLR, ANN, or RF model and the observed yield (Mg ha−1). We used the Root Mean Square Error (RMSE, Eq. (6)), Mean Absolute Error (MAE, Eq. (7)), and Mean Absolute Percent Error (MAPE, Eq. (8)) to evaluate and compare the MLR, ANN, and RF models.
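The three metrics named above can be computed directly from paired observed and predicted yields. The following plain-Python functions are a minimal sketch of Eqs. (6)-(8); the names rmse, mae, and mape are illustrative and not part of any package used in the study. The prediction error is taken as predicted minus observed, and MAPE assumes that no observed yield is zero.

```python
import math


def rmse(observed, predicted):
    """Eq. (6): square root of the mean squared prediction error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Eq. (7): mean absolute prediction error."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n


def mape(observed, predicted):
    """Eq. (8): mean absolute error relative to the observed value, in percent.

    Assumes no observed value is zero.
    """
    n = len(observed)
    return 100.0 * sum(abs(p - o) / o for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    observed = [100.0, 120.0, 80.0]    # hypothetical yields, Mg/ha
    predicted = [110.0, 114.0, 84.0]
    # Errors are +10, -6 and +4 Mg/ha: RMSE about 7.12, MAE and MAPE about 6.67.
    print(rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE, which is why RMSE is always at least as large as MAE for the same predictions.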
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_i - x_i\right)^2} \tag{6}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{x}_i - x_i\right| \tag{7}$$
Fig. 3. Random forest structure for sugarcane yield prediction.


$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\left|\hat{x}_i - x_i\right|}{x_i}\right] \times 100 \tag{8}$$

where $n$ is the number of samples, $\hat{x}_i$ is the predicted response, and $x_i$ is the observed response.

3. Results and discussion

3.1. Data analysis

The input variables engine power, engine rotation, and SFC do not correlate with the observed yield (Fig. 5). A study by Ramos et al. (2016) showed that the amount of sugarcane harvested had no influence on fuel consumption (l Mg−1) at different engine rotations. Furthermore, in the study by Martins et al. (2017), the average hourly fuel consumption (l h−1) did not differ with variations in sugarcane yield. Engine power and engine rotation had a moderate correlation coefficient of 0.4, and the data dispersion indicates a linear trend between these two variables, with power increasing as engine rotation increases.

During the cutting and processing of sugarcane by the harvester, there was no correlation between the travel speed of the harvester and either the input variables or the observed yield. The relatively constant traction conditions can explain the constant power demand for traction, which is reflected in the lack of correlation between speed and the engine parameters. However, we were able to collect data from only a single harvester, limited by access to a single prototype yield monitor, to test the concept of using harvester data for measuring sugarcane yield. In future research, more variables and different field conditions can be added, such as the traction demand for machine locomotion and its relationship with soil-condition variables such as moisture, compaction, and residue coverage, to increase the robustness of the model.

Fig. 5. Dispersion of the dataset and correlation between the variables used in this study.

3.2. Models performance

The supervised models MLR, ANN, and RF were trained with sugarcane yield data ranging from 45 to 155 Mg ha−1 (Fig. 6a). The average yield of the test data (105 Mg ha−1) differs slightly from that of the train data (112 Mg ha−1). However, CADEX clearly produced a training dataset that encompasses the variability of the test dataset, as can be seen by comparing the minimum and maximum values of the train dataset (45 and 155 Mg ha−1, respectively) and the test dataset (56.2 and 155 Mg ha−1, respectively). The yield predictions of the models narrowed the range of the yield data (Fig. 6b and c). The predicted values are concentrated close to the average value, which reduces the standard deviation of the dataset. Although in the training set the average yield predicted by the models equals the average observed yield, in the test data the average yield predicted by the MLR model was lower than the average observed yield. The average yield values predicted by the ANN and RF models were closer to the average observed yield.

The best results for predicting sugarcane yield from engine parameters in the train data were obtained from the RF model, with an RMSE of 7.0 Mg ha−1, a MAPE of 5.6%, and an MAE of 5.5 Mg ha−1. The results were similar for the test data, with an RMSE of 7.7 Mg ha−1, an MAE of 6.1 Mg ha−1, and a MAPE of 5.6% (Table 1). The performance of the MLR model was less satisfactory than that of the ANN and RF models, with a MAPE of 11.9% in the train set and 7.8% in the test set. The MAPE values of the ANN model for the train and test datasets were 9.4% and 5.6%, respectively. The ANN model performed differently on the two datasets; there was a considerable reduction in the
RMSE and MAE values in the test data compared to the train set. Nevertheless, the ANN sugarcane yield predictions on the test and train data were better than those of MLR and similar to the values obtained with RF regression. The sugarcane yield prediction from engine parameters based on the ML models proposed in this study presented results similar to those of sugarcane yield monitors using fiber optics (Price et al., 2011), a laser distance sensor (Price et al., 2017), and load cells (Mailander et al., 2010) embedded in the harvesters (Table 2).

Fig. 6. Kernel density and descriptive statistics of the yield raw data (A), yield predicted by multiple linear regression (MLR), Random Forest (RF), and artificial neural network (ANN) models in the train data (B) and test data (C).

Table 1
Machine learning models performance evaluation statistics.

                  Train dataset            Test dataset
                  MLR     ANN     RF       MLR     ANN     RF
MAE (Mg ha−1)     11.0    9.0     5.5      8.7     6.0     6.1
RMSE (Mg ha−1)    14.2    11.9    7.0      10.7    7.6     7.7
MAPE (%)          11.9    9.4     5.6      7.8     5.6     5.6

MAE: mean absolute error; RMSE: root mean square error; MAPE: mean absolute percent error; MLR: multiple linear regression; ANN: artificial neural network; RF: random forest.

Table 2
The overall average error of sugarcane yield prediction by different sensor-based techniques.

Sensor/Sensing technique    Error (%)    Reference
Weighing system             33.0         Benjamin et al. (2001)
Weighing system             8.3          Molin and Menegatti (2004)
Mass flow sensor            6.4          Magalhães and Cerri (2007)
Laser distance sensor       6.2          Price et al. (2017)
Load cell                   11.0         Mailander et al. (2010)
Fiber-optic                 7.5          Price et al. (2011)

Compared with some built-in systems, data obtained directly from the harvester CAN messages and processed with ML techniques do not provide the most accurate yield map. However, the approach has practical applications, as no additional sensor needs to be installed in the harvester, and it has the potential to reverse the low adoption of sugarcane yield maps pointed out by Silva et al. (2011) and Sanches et al. (2019).

The MLR and ANN models had a substantial dispersion of errors, ranging from −88 to 51 Mg ha−1 and from −75 to 48 Mg ha−1, respectively (Fig. 7). However, for these two models the error dispersion was much smaller for the test data than for the training data. The sugarcane yield estimated with RF showed the lowest error dispersion of all models, which suggests that its errors were concentrated closer to the mean value of sugarcane yield, while the MLR predictions had the highest error dispersion. For all three models, the error was greater at the extreme values and decreased close to the average value (Fig. 7), with an increasing linear tendency. These results indicate that estimating sugarcane yield with the supervised models MLR, ANN, and RF narrows the range of the data. Thus, using engine parameters as inputs to ML models may decrease the variability of the estimated sugarcane yield, which indicates an underestimation of the minimum values and an overestimation of the maximum values.

All models followed the observed sugarcane yield and recognized its changes over the time of data collection, but when sugarcane yield changed abruptly, the ANN and MLR models appeared to show large deviations (Fig. 8). The plot of yield over the harvest operation shows that the greater the yield peaks, the greater the deviation (yellow stripes in Fig. 8). That is, at low yield peaks, the models underestimated yield, with errors below zero, and at high yield peaks, the models overestimated yield. The RF model had the lowest prediction error at high and low yield peaks. However, in some cases yield varied upwards and the models underestimated it (blue stripes in Fig. 8). Yield predicted by MLR varied more than the observed yield values and, consequently, its error also varied more. Over the collection time, the ANN underestimated sugarcane yield, and the RF predictions from engine parameters came closest to the observed yield.

When yield data are used as a layer of information in the spatial management of a sugarcane field, the method used to map yield is expected to identify the higher- and lower-yield areas within the field. Yield maps are key to managing crops according to spatial variation, which is an important component of implementing precision agriculture practices (Momin et al., 2019; Bramley and Ouzman, 2019; Colaço et al., 2020). Accurate crop yield estimates serve a range of important purposes, helping to make agriculture more productive and more resilient (Hunt et al., 2019). Reliable yield estimates can be used to identify yield-limiting factors and guide the development of site-specific management strategies (Taylor et al., 2007; Leroux et al., 2018; Blasch et al., 2020). Although the models had greater prediction error at the extreme values of observed yield, they clearly identified the higher- and lower-yield patterns over the time of data collection. This work is the first approach to use CAN data generated by a sugarcane harvester to estimate sugarcane yield in real time. Studies with georeferenced data should be performed to monitor the spatial variability of sugarcane yield in commercial fields, adding different machines and field conditions to the model.

3.3. Variable importance

The variable importance (Fig. 9) was calculated by fitting the model using all the predictors and fitting the model after the permutation of each predictor variable. The relationship between each predictor and
L. Felipe Maldaner et al. Computers and Electronics in Agriculture 181 (2021) 105945

Fig. 7. Scatter plot of the error as a function of the yield predicted, and density of the error in the train and test dataset for all predictive models.

Fig. 8. Variation of the observed and predicted sugarcane yield over collection time.

the outcome was evaluated. A linear model was fitted, and the absolute value of the t-statistic for the slope of the predictor was used. Otherwise, a loess smoother was fitted between the outcome and the predictor, and the R² statistic of this model was calculated against the intercept-only null model. This number was returned as a relative measure of variable importance. All importance measurements were scaled to range from zero to one. The variables engine rotation and engine power were less important for estimating sugarcane yield than the variables based on the fuel demanded by the mechanized system, and SFC was the most important variable for predicting sugarcane yield.

It is widely known that the fuel consumption of sugarcane harvesters is an indicator of the efficiency of the energy conversion process of the mechanized system used in the harvesting operation (Kim et al. 2011). Consequently, the power requirement of the harvester is mainly estimated from the fuel consumption rate (Al-Aani et al. 2018). Variations in the engine power required to process the sugarcane could therefore be reflected in changes in this power requirement. Thus, the variation in the machine's energy requirements to process the raw material is reflected in the effort of the internal systems driven by the engine (fuel consumption, power, and engine rotation). Therefore, the variation in this set of variables is reflected in the ratio of fuel consumption to engine power, the SFC, possibly explaining the greater importance of this variable. However, all the variables were necessary for the best performance of the ML algorithm in interpreting the CAN signals from the harvester and relating them to the yield.

The satisfactory accuracy of the ML technique in estimating sugarcane yield, together with the simplicity of obtaining data from the harvester CAN, means it can provide yield data to produce maps, an essential information layer for crop management using PA strategies. Nevertheless, further studies are needed to evaluate the technique for monitoring the spatial variability of sugarcane yield in commercial fields.

Fig. 9. Variable importance of the model's performance.

4. Conclusion

The results were satisfactory for predicting sugarcane yield using CAN data from the sugarcane harvester with data post-processing and machine learning techniques. Predictive yield models using RF regression performed better than models using ANN and MLR, with a MAPE of 5.6% and an RMSE of 7.7 Mg ha⁻¹. All variables contributed substantially to the modeling; however, the specific fuel consumption was substantially more important to the predictive models than the fuel consumption, engine power, and engine speed parameters. The results are promising for obtaining yield data from the engine data collected through CAN, with the potential to replace conventional
yield sensors. It is an innovative approach focused on machine learning techniques and their application to CAN data, of great interest to precision agriculture and mechanization for site-specific sugarcane management. However, future studies are needed to evaluate the application of this technique to monitor the spatial variability of sugarcane yield in commercial fields by adding different machine and field conditions to the model.

CRediT authorship contribution statement

Leonardo Felipe Maldaner: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Lucas de Paula Corrêdo: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. Tatiana Fernanda Canata: Conceptualization, Methodology, Formal analysis, Writing - original draft, Writing - review & editing. José Paulo Molin: Conceptualization, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors gratefully acknowledge the National Council for Scientific and Technological Development (Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq) for granting a graduate scholarship to L.F.M. (Grant number 168643/2017-0). They also acknowledge the financial support of the São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP) for granting a graduate scholarship to L.P.C. (Grant number 2018/25008-8), and of the Coordination for the Improvement of Higher Education Personnel (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES) for the graduate scholarship provided to T.F.C. (Finance Code 001).

References

Ahmad, M.W., Mourshed, M., Rezgui, Y., 2017. Trees vs Neurons: Comparison between random forest and ANN for high-resolution prediction of building energy consumption. Energy Build. 147, 77–89.
Al-Aani, F.S., Darr, M.J., Powell, L.J., Convington, B.R., 2018. Design and validation of an electronic data logging system (CAN Bus) for monitoring machinery performance and management - Planting application. American Society of Agricultural and Biological Engineers Annual International Meeting 2018. ASABE 2018, 2–22.
Ala-Ilomäki, J., Salmivaara, A., Launiainen, S., Lindeman, H., Kulju, S., Finér, L., Uusitalo, J., 2020. Assessing extraction trail trafficability using harvester CAN-bus data. Int. J. Forest Eng. 1–8.
Benjamin, C.E., Mailander, M.P., Price, R.R., 2001. Sugar Cane Yield Monitoring System. Appl. Eng. Agric. 26, 965–969. https://doi.org/10.13031/2013.35905.
Blanco, C.M.G., Gomez, V.M.B., Crespo, P., Ließ, M., 2018. Spatial prediction of soil water retention in a Páramo landscape: Methodological insight into machine learning using random forest. Geoderma 316, 100–114.
Blasch, G., Li, Z., Taylor, J.A., 2020. Multi-temporal yield pattern analysis method for deriving yield zones in crop production systems. Precis. Agric. 21, 1263–1290. https://doi.org/10.1007/s11119-020-09719-1.
Bramley, R.G.V., Ouzman, J., 2019. Farmer attitudes to the use of sensors and automation in fertilizer decision-making: Nitrogen fertilization in the Australian grains sector. Precis. Agric. 20 (1), 157–175. https://doi.org/10.1007/s11119-018-9589-y.
Bratsas, C., Koupidis, K., Salanova, J.M., Giannakopoulos, K., Kaloudis, A., Aifadopoulou, G., 2020. A Comparison of Machine Learning Methods for the Prediction of Traffic Speed in Urban Places. Sustainability 12 (1), 142. https://doi.org/10.3390/su12010142.
Breiman, L., 2001. Random forests. Machine Learn. 45, 5–32.
Chatterjee, S., Price, B., 1991. Regression diagnostics. New York.
Cherukuri, H., Perez-Bernabeu, E., Selles, M., Schmitz, T., 2019. Machining chatter prediction using a data-learning model. J. Manufact. Mater. Process. 3 (2), 45. https://doi.org/10.3390/jmmp3020045.
Chlingaryan, A., Sukkarieh, S., Whelan, B., 2018. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron. Agric. 151, 61–69. https://doi.org/10.1016/j.compag.2018.05.012.
Colaço, A.F., Trevisan, R.G., Karp, F.H.S., Molin, J.P., 2020. Yield mapping methods for manually harvested crops. Comput. Electron. Agric. 177, 105693. https://doi.org/10.1016/j.compag.2020.105693.
Corrêdo, L.P., Canata, T.F., Maldaner, L.F., Lima, J.J.A., Molin, J.P. Sugarcane Harvester for In-field Data Collection: State of the Art, Its Applicability, and Future Perspectives. Sugar Tech, 1–14. https://doi.org/10.1007/s12355-020-00874-3.
Darr, M.J., 2012. CAN bus technology enables advanced machinery management. Resource: Engineering Technology for a Sustainable World 19 (5), 10. https://doi.org/10.13031/2013.42312.
Darr, M.J., Corbett, D.J., Herman, H., Vallespi-Gonzalez, C., Dugas, B.E., Badino, H., 2019. U.S. Patent No. 10,371,561. U.S. Patent and Trademark Office, Washington, DC.
Eisenbies, M.H., Volk, T.A., de Souza, D.P., Hallen, K.W., 2020. Cut-and-chip harvester material capacity and fuel performance on commercial-scale willow fields for varying ground and crop conditions. GCB Bioenergy 12 (6), 380–395. https://doi.org/10.1111/gcbb.12679.
Elavarasan, D., Vincent, D.R., Sharma, V., Zomaya, A.Y., Srinivasan, K., 2018. Forecasting yield by integrating agrarian factors and machine learning models: A survey. Comput. Electron. Agric. 155, 257–282. https://doi.org/10.1016/j.compag.2018.10.024.
Everingham, Y., Sexton, J., Skocaj, D., 2016. Accurate prediction of sugarcane yield using a random forest algorithm. Agron. Sustain. Dev. 36, 27. https://doi.org/10.1007/s13593-016-0364-z.
Fernandes, J.L., Ebecken, N.F.F., Esquerdo, J.C.D.M., 2017. Sugarcane yield prediction in Brazil using the NDVI time series and neural networks ensemble. Int. J. Remote Sens. 38 (16), 4631–4644. https://doi.org/10.1080/01431161.2017.1325531.
Furch, J., 2014. Proactive maintenance of motor vehicles. Machines. Technolog. Mater. 8 (4), 26–31.
Günther, F., Fritsch, S., 2010. neuralnet: Training of neural networks. The R Journal 2 (1), 30–38.
Hunt, M.L., Blackburn, G.A., Carrasco, L., Redhead, J.W., Rowland, C.S., 2019. High-resolution wheat yield mapping using Sentinel-2. Remote Sens. Environ. 233, 111410.
Ip, R.H., Ang, L.M., Seng, K.P., Broster, J.C., Pratley, J.E., 2018. Big data and machine learning for crop protection. Comput. Electron. Agric. 151, 376–383.
Kaul, M., Hill, R.L., Walthall, C., 2005. Artificial neural networks for corn and soybean yield prediction. Agric. Syst. 85 (1), 1–18.
Kim, S.C., Kim, K.U., Kim, D.C., 2011. Prediction of fuel consumption of agricultural tractors. Appl. Eng. Agric. 27 (5), 705–710.
Lawrence, J., 1994. Introduction to Neural Networks. California Scientific Software Press, Nevada City, CA.
Leroux, C., Jones, H., Taylor, J., Clenet, A., Tisseyre, B., 2018. A zone-based approach for processing and interpreting variability in multi-temporal yield data sets. Comput. Electron. Agric. 148, 299–308. https://doi.org/10.1016/j.compag.2018.03.029.
Liakos, K.G., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine learning in agriculture: A review. Sensors 18 (8), 2674.
Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22.
Magalhães, P.S.G., Cerri, D.G.P., 2007. Yield monitoring of sugar cane. Biosyst. Eng. 96 (1), 1–6.
Mailander, M., Benjamin, C., Price, R., Hall, S., 2010. Sugar cane yield monitoring system. Appl. Eng. Agric. 26 (6), 965–969.
Maldaner, L.F., Molin, J.P., 2020. Data processing within rows for sugarcane yield mapping. Scientia Agricola 77 (5). https://doi.org/10.1590/1678-992x-2018-0391.
Martins, M.B., Ramos, C.R.G., de Souza, F.L., Sartori, M.M.P., Lanças, K.P., 2017. Relationship between harvester speed, sugarcane yield, and fuel consumption of harvester (in Portuguese). Journal of Neotropical Agriculture 4 (1), 88–91.
Momin, M.A., Grift, T.E., Valente, D.S., Hansen, A.C., 2019. Sugarcane yield mapping based on vehicle tracking. Precis. Agric. 20 (5), 896–910. https://doi.org/10.1007/s11119-018-9621-2.
Molin, J.P., Menegatti, L.A.A., 2004. Field-testing of a sugar cane yield monitor in Brazil. ASAE Annual Int. Meeting 2004 (0300), 733–744.
Morota, G., Ventura, R., Silva, F.F., Koyama, M., Fernando, S.C., 2018. Machine learning and data mining advance predictive big data analysis in precision animal agriculture. J. Animal Sci. 96, 1540–1550. https://doi.org/10.1093/jas/sky014.
Narayanan, S.N., Mittal, S., Joshi, A., 2015. Using data analytics to detect anomalous states in vehicles. arXiv preprint arXiv:1512.08048.
Pitla, S.K., Lin, N., Shearer, S.A., Luck, J.D., 2014. Use of controller area network (CAN) data to determine field efficiencies of agricultural machinery. Appl. Eng. Agric. 30 (6), 829–839. https://doi.org/10.13031/aea.30.10618.
Price, R.R., Johnson, R.M., Viator, R.P., 2017. An overhead optical yield monitor for a sugarcane harvester based on two optical distance sensors mounted above the loading elevator. Appl. Eng. Agric. 33 (5), 687–693.
Price, R.R., Johnson, R.M., Viator, R.P., Larsen, J., Peters, A., 2011. Fiber optic yield monitor for a sugarcane harvester. Trans. ASABE 54 (2007), 31–39.
R Development Core Team, 2014. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. http://www.R-project.org.
Ramos, C.R., Lanças, K.P., Lyra, G.A.D., Sandi, J., 2016. Fuel consumption of a sugarcane harvester in different operational settings. Revista Brasileira de Engenharia Agrícola e Ambiental 20 (6), 588–592. https://doi.org/10.1590/1807-1929/agriambi.v20n6p588-592.
Sanches, G.M., Graziano Magalhães, P.S., Junqueira Franco, H.C., 2019. Site-specific assessment of spatial and temporal variability of sugarcane yield related to soil attributes. Geoderma 334, 90–98. https://doi.org/10.1016/j.geoderma.2018.07.051.
Silva, C.B., de Moraes, M.A.F.D., Molin, J.P., 2011. Adoption and use of precision agriculture technologies in the sugarcane industry of São Paulo state, Brazil. Precision Agri. 12 (1), 67–81. https://doi.org/10.1007/s11119-009-9155-8.
Stevens, A., Ramirez-Lopez, L., 2020. An introduction to the prospectr package. R package version (2).
Suvinen, A., Saarilahti, M., 2006. Measuring the mobility parameters of forwarders using GPS and CAN bus techniques. J. Terramech. 43 (2), 237–252.
Taylor, J.A., McBratney, A.B., Whelan, B.M., 2007. Establishing management classes for broadacre agricultural production. Agron. J. 99, 1366–1376.
Tukey, J.W., 1977. Exploratory Data Analysis. Addison-Wesley Publishing Company.
van Klompenburg, T., Kassahun, A., Catal, C., 2020. Crop yield prediction using machine learning: A systematic literature review. Comput. Electron. Agric. https://doi.org/10.1016/j.compag.2020.105709.
Wang, Y., Balmos, A.D., Layton, A.W., Noel, S., Ault, A., Krogmeier, J.V., Buckmaster, D.R., 2017. An Open-Source Infrastructure for Real-Time Automatic Agricultural Machine Data Processing. In: 2017 ASABE Annual International Meeting (p. 1). American Society of Agricultural and Biological Engineers.
Wei, Q., Mao, Z., Liu, Q., 2012. Fuel economy analysis of corn combine harvesters in-field operation. In: 2012 Dallas, Texas, July 29-August 1, 2012 (p. 1). American Society of Agricultural and Biological Engineers.
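The calculations referred to in Section 3.3 (SFC and the filter-style variable importance) and the error metrics reported in the Conclusion (MAPE and RMSE) can be illustrated with a short Python sketch. It is not the authors' code, and every function name in it is hypothetical. It assumes that SFC is simply the fuel consumption rate divided by engine power (for example, L/h over kW, giving L/kWh; a g/kWh figure would also need a fuel density, which is not given here), that the linear-model importance is the absolute t-statistic of the slope in a single-predictor regression, and that "scaled from zero to one" means min-max scaling. The loess/R² branch and the permutation-based importance are not reproduced.

```python
# Sketch only: hypothetical helpers, not the authors' implementation.
# Uses the Python standard library only.
import math
from statistics import mean


def specific_fuel_consumption(fuel_rate, engine_power):
    """SFC as the ratio of fuel consumption rate to engine power.

    Units are an assumption: L/h divided by kW gives L/kWh.
    """
    if engine_power <= 0:
        raise ValueError("engine power must be positive")
    return fuel_rate / engine_power


def slope_t_value(x, y):
    """Absolute t-statistic of the slope b1 in the simple regression y = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length, at least 3")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor is constant")
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    # Residual sum of squares of the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    if sse == 0:
        return math.inf  # perfect linear fit: the slope's standard error is zero
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return abs(b1 / se_b1)


def scaled_importance(predictors, y):
    """|t| per predictor, min-max scaled to [0, 1] (assumed scaling scheme)."""
    raw = {name: slope_t_value(x, y) for name, x in predictors.items()}
    if any(math.isinf(v) for v in raw.values()):
        raise ValueError("a predictor fits y perfectly; scaling is undefined")
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    # If all predictors tie, report them all as equally (fully) important.
    return {name: ((v - lo) / span if span > 0 else 1.0) for name, v in raw.items()}


def mape(observed, predicted):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100.0 * mean(abs((o - p) / o) for o, p in zip(observed, predicted))


def rmse(observed, predicted):
    """Root mean square error, in the units of the yield (e.g. Mg/ha)."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))
```

Given per-record series aligned with the yield monitor readings (an assumed data layout; `sfc`, `fuel`, `power`, `rpm`, and `yield_obs` are placeholder names), `scaled_importance({"SFC": sfc, "fuel": fuel, "power": power, "rotation": rpm}, yield_obs)` returns 1.0 for the predictor with the largest |t| and 0.0 for the smallest, while `mape` and `rmse` compare observed and predicted yield series. Min-max scaling was chosen because it produces exactly the zero-to-one range stated in Section 3.3, but the text does not specify which scaling was used.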
