
biosystems engineering 175 (2018) 156–167

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.elsevier.com/locate/issn/15375110

Research Paper

Detection of passion fruits and maturity classification using Red-Green-Blue Depth images

Shuqin Tu a, Yueju Xue b,c,*, Chan Zheng a, Yu Qi a, Hua Wan a, Liang Mao b

a College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
b College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China
c Guangdong Engineering Research Center for Datamation of Modern Pig Production, Guangzhou 510642, China

article info

Article history:
Received 12 October 2017
Received in revised form 27 August 2018
Accepted 4 September 2018
Published online 4 October 2018

Keywords:
Fruit detection and maturity identification
DSIFT
Locality-constrained linear coding
Faster R-CNN

abstract

A machine vision algorithm was developed to detect passion fruits and identify the maturity of the detected fruits using natural outdoor RGB-D images. As different passion fruits on the same branch can be in different maturity stages, detection and maturity classification on a complex background are very important for yield mapping and for the development of intelligent mobile fruit-picking robots. In this study, a Kinect sensor was used for data acquisition, and the maturity stages of the fruits were divided into five categories: young (Y), near-young (NY), near-mature (NM), mature (M) and after-mature (AM). The algorithm involved two stages. First, using colour and depth images, passion fruits were detected by faster region-based convolutional neural networks (Faster R-CNN), and colour-based detection was integrated with depth-based detection to improve detection performance. Second, for each detected fruit region, the dense scale invariant feature transform (DSIFT) algorithm combined with locality-constrained linear coding (LLC) was used to extract and represent the features of fruit maturity from the R, G, and B channels, respectively. The RGB-DSIFT-LLC features were then input into a linear support vector machine (SVM) classifier to identify the maturity of the fruits. An experimental study on a purpose-built dataset verified that the proposed method achieves 92.71% detection accuracy and 91.52% maturity classification accuracy.

© 2018 IAgrE. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Passion fruit contains a variety of functional active compounds, which have high nutritional and medicinal value (Lewis et al., 2013). With the expansion of passion fruit planting areas in South China, significant economic benefits can be obtained. Passion fruits on the same branch of a tree do not usually ripen at the same time during the harvesting season. The labour expense of handpicking passion fruit for fresh markets is increasing owing to a severe shortage of available farm workers. Efficient harvesting labour assignment in large passion fruit fields could significantly reduce the associated cost. In addition, yield estimation before fruit ripening would be valuable for efficient labour deployment. Furthermore, yield estimation prior to harvest helps growers to proactively detect problems. It is also useful for making decisions pertaining to irrigation, pest control, and weed control. Thus, there is an increasing demand for automatic detection of fruits and

* Corresponding author.
E-mail address: xueyueju@163.com (Y. Xue).
https://doi.org/10.1016/j.biosystemseng.2018.09.004
1537-5110/© 2018 IAgrE. Published by Elsevier Ltd. All rights reserved.

Nomenclature

Acronyms
AM      After-mature
BoF     Bag-of-features
CE      Circle Estimation
CHT     Circular Hough Transform
DSIFT   Dense Scale-Invariant Feature Transform
HOD     Histogram of Oriented Depths
HOG     Histogram of Oriented Gradients
LLC     Locality-constrained Linear Coding
NY      Near-young
NM      Near-mature
PCA     Principal Component Analysis
RG      Red-Green
RGB     Red-Green-Blue
RGB-D   Red-Green-Blue Depth
R-CNN   Region-based Convolutional Neural Networks
ScSPM   Sparse code Spatial Pyramid Matching
SVM     Support Vector Machine
VQ      Vector Quantisation
VGG     Visual Geometry Group

Parameters/Variables
b       Perpendicular distance from the optimal hyperplane to the origin
B       Visual codebook
C       Set of codes for X
di      Locality adaptor that measures the similarity between X and B
f       Function that estimates the label of a test vector
p       Resulting probability of RGB-D detection
pD      Resulting probability of detecting depth images
pRGB    Resulting probability of detecting RGB images
ti      SVM input
w       Normal vector of the hyperplane
X       Set of local descriptors of an image
Yi      SVM output label, assigned as either positive (+1) or negative (−1)
αi      Nonzero coefficients obtained with quadratic programming
λ       Constant value (detector fusion weight)
λ1      Scalar controlling the relative contribution of the sparsity and locality constraints
σ       Parameter adjusting the weight decay speed for di

accurate and efficient recognition of their maturity, given passion fruit tree canopy images.

Automated computer vision systems are commonly used for detecting and identifying the maturity stages of various fruits, such as apples, citruses, and blueberries. The detection accuracy for citruses and apples was reported to be between 70.0 and 92.0%. A recognition algorithm for apple detection was proposed based on colour differences, such as red minus blue (R−B) and green minus red (G−R) (Zhou, Damerow, Sun, & Blanke, 2012). The coefficients of determination (R²) between the apples detected by the fruit counting algorithm and the actual harvested yield ranged from 0.57 for young fruits to 0.70 for ripening ones. Ji et al. (2012) reported that the highest accuracy (92.0%) for identifying red apples was obtained using a colour camera and the support vector machine (SVM) for image classification. Image segmentation for fruit identification has also been investigated using texture, colour, and geometric properties. The fusion of blob analysis and the circular Hough transform (CHT) method was developed for detection of apples when the fruits in canopy images are occluded by leaves, branches, and other fruits (Gongal, Amatya, Karkee, Zhang, & Lewis, 2015). This algorithm was tested on 60 images of apple trees and yielded a 90% apple identification accuracy. For detection of citruses, a new 'eigenfruit' approach (Kurtulmus, Lee, & Vardar, 2011) was developed for detecting immature green citrus fruits in colour images acquired under natural outdoor conditions, and reported a detection accuracy of 75%. Bansal, Lee, and Satish (2013) used fast Fourier transform leakage values for detecting immature green citruses and obtained an accuracy of 82%. For detection of blueberries, Li, Lee, and Wang (2014) developed the colour component analysis-based detection (CCAD) method for identifying blueberries in different growth stages using natural outdoor colour images, and used the forward feature selection algorithm (FFSA) to separate all berries in four maturity stages using three classifiers, which yielded a high accuracy of 90%. The accuracy of fruit detection and maturity estimation was shown to be affected by uncertain and variable lighting conditions in the field environment, as well as by complex canopy structures (Karkee & Zhang, 2012).

Detection of green fruits is an especially difficult task, owing to the following issues: (1) the colours of green fruits and leaves in the same image are often very similar; (2) most of the existing detection algorithms are sensitive to various lighting conditions and occlusion of fruits. Various types of sensor systems and image processing methods have been studied for improving the detection accuracy of green fruits. An automated yield monitoring system was developed (Chang, Zaman, Farooque, Schumann, & Percival, 2012), consisting of two colour cameras and a real-time kinematic GPS receiver. Yang, Lee, and Gader (2014) explored the feasibility of hyperspectral imaging for classifying the growth stages of blueberries and achieved a nearly 88% classification accuracy. A stereovision camera was used for localisation of fruits in global coordinates, to identify repeated counting of apples owing to multiple imaging (Wang, Nuske, Bergerman, & Singh, 2013). A stereovision camera addresses the issues of a single-camera-based system, which is relatively complex and whose classification accuracy is low, particularly in outdoor environments where stereo matching is problematic. A laser range finder (Bulanon & Kataoka, 2010) has demonstrated better location accuracy compared with other currently used sensors. However, this system is comparatively bulky, slow, and costly. Nissimov, Goldberger, and Alchanatis (2015) presented an approach for obstacle detection in a greenhouse environment using the Kinect 3D sensor, and developed an obstacle detection system based on information about depth, colour, and texture, to achieve obstacle detection robustness. Because the Kinect sensor utilises the time-of-flight principle, which is similar to the working principle of three-dimensional photonic mixer device (3D PMD) cameras,

the Kinect sensor could potentially become a low-cost substitute for high-accuracy three-dimensional (3D) information. This sensor could also become a good option for RGB-D based fruit detection methods.

Moreover, the bag-of-features (BoF) approach now plays a leading role in the field of generic image classification research (Lazebnik, Schmid, & Ponce, 2006; Yang, Yu, Gong, & Huang, 2009; Wang et al., 2010). It commonly consists of feature extraction, codebook construction, feature coding, and feature pooling. Designing a suitable feature descriptor is a central problem in fruit maturity identification. A study on the comparison of feature descriptors for object classification (Hietanen, Lankinen, Kämäräinen, Buch, & Krüger, 2016) emphasised that the dense scale invariant feature transform (DSIFT) method is superior to other feature selection methods. However, the DSIFT method has been mainly developed for grey images, which implies a loss of colour-related information. To overcome this limitation, different colour scale invariant feature transform (CSIFT) descriptors have been proposed (Abdel-Hakim & Farag, 2006; Burghouts & Geusebroek, 2009; Gevers, Gijsenij, Weijer, & Geusebroek, 2012, pp. 138–148). With the enhancement of colour information, CSIFT descriptors are more stable than scale invariant feature transform (SIFT) descriptors under illumination changes (Abdel-Hakim & Farag, 2006). On the other hand, choosing an appropriate coding scheme significantly impacts the classification performance. Locality-constrained linear coding (LLC) is considered one of the most representative methods, providing both a fast coding speed and state-of-the-art classification accuracy (Chen, Li, Peng, & Wong, 2015). A linear SVM has also been selected as an optimised classifier. Therefore, an RGB-DSIFT feature algorithm based on LLC coding and a linear SVM classifier were used for identifying the maturity of passion fruits.

The overall objective of this study was to develop a method for detection of different growth stages of passion fruits and for identification of maturity using both colour and depth images acquired outdoors. Firstly, an improved fruit detection approach that combines a colour-based detector with a depth-based detector is proposed based on faster region-based convolutional neural networks (Faster R-CNN), which makes it possible to avoid the influence of illumination changes. Then, an efficient system is developed for identification of the different maturity stages of the detected passion fruits. Finally, the results obtained using different feature selection methods were compared, to select the feature representation method with the best performance on maturity classification. As a result, we propose a novel method for detection and maturity classification of passion fruits; this method can be used by passion fruit-picking robots.

2. Materials and methods

2.1. Acquisition of Red-Green-Blue Depth images

RGB-D images were obtained from a passion fruit farm in HeYuan city, Guangdong Province, China, from July 29th to August 12th in 2015 and 2016. On that farm, there were 50 rows with 80 strains per row. From each row, 40 strains were randomly selected; therefore, overall 4000 images were acquired. A Kinect V2.0 device, consisting of an infrared (IR) depth camera and a digital RGB camera, was used for image acquisition. The camera was affixed to a tripod with telescopic adjustment, as shown in Fig. 1a and b. The acquired images were saved in the JPG format, and contained 1024 × 1248 pixels (RGB) and 540 × 421 pixels (depth), corresponding to an approximately 13 cm × 13 cm actual scene. Image acquisition was performed between 3:00 PM and 6:00 PM local time. The illumination was affected by changes in sunlight, wind, and cloud conditions. Because the underlying structured-light technology (Gupta, Yin, & Nayar, 2013) cannot be used in sunlight conditions, the Kinect sensor was used in shady areas and was not exposed to direct sunlight. Trellis-grown vines of passion fruits protected the Kinect sensor from direct exposure to light sources. The Kinect sensor was moved along the trellis-grown vines, so that at all times the camera was positioned at approximately 1 m from the vines.

2.2. Building the data library

From the 4000 detection images of passion fruits, 2000 images, with 14000 fruits labelled using the LabelPad software, were randomly selected as training set examples, while the other images were assigned to the test set. A detection example is illustrated in Fig. 2, where (a) shows an RGB image with 1024 × 1248 pixels and (b) shows the corresponding depth image with 540 × 421 pixels, under normal illumination. The depth image in Fig. 2(c) was pre-processed by applying adaptive histogram equalisation and a median filter (window size 5 × 5), to improve the local contrast and to remove noise interference.

Fig. 1 – The detection system's hardware. (a) Kinect prototype, (b) Kinect V2.0.

Fig. 2 – An RGB image and the corresponding depth image, showing different growth stages. (a) The RGB image, (b) the depth image, (c) the pre-processed depth image.

Fig. 3 – Different levels of passion fruit maturity.

Based on those 4000 detection images, a maturity classification library with 8112 images was built under the supervision of biologists. The images in the library were categorised into five maturity classes: after-mature (AM), mature (M), near-mature (NM), near-young (NY), and young (Y). The five maturity phases of passion fruit are shown in Fig. 3, and a brief description is provided in Table 1. Half of the maturity classification data (4056 images) were randomly chosen as training set images, while the rest were used as test set images.

Table 1 – Brief description of the five maturity categories in the maturity classification database.

Class abbreviation   Class description     Colour description   Total number of images
AM                   After-mature fruit    Full red             1304
M                    Mature fruit          90–100% red          2404
NM                   Near-mature fruit     50–80% red           592
NY                   Near-young fruit      20–50% red           784
Y                    Young fruit           Green                3028

2.3. Methods of fruit detection and fruit maturity classification

The maturity classification process was divided into two stages: 1) detection of passion fruits using the Kinect device, and 2) classification of the detected passion fruits' maturity. The flowchart of this process is shown in Fig. 4. A Faster R-CNN detector of RGB-D images was introduced, which fused the detections from RGB data and depth data linearly. After detecting the passion fruits, an automated system was proposed for identifying the different maturity stages of the detected fruits, comprising three steps: 1) extraction of dense SIFT features, 2) application of LLC + spatial pyramid matching (SPM), and 3) classification with a linear SVM. The methods employed in each of these processes are detailed below.

Stage one, detection of fruits: RGB-D images → Faster R-CNN detector for RGB data and Faster R-CNN detector for depth data → fusion into an RGB-D detector. Stage two, maturity classification of fruits: detected RGB image → dense SIFT → LLC + SPM → linear SVM classifier.

Fig. 4 – Flowchart of the process of detection and maturity classification.
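The fusion box in the flowchart combines the two detector probabilities linearly, as given in Eq. (1) of Section 2.4. A minimal sketch, with λ = 0.2 as reported in the paper:

```python
def fuse_rgbd(p_depth: float, p_rgb: float, lam: float = 0.2) -> float:
    """Linear fusion of the depth- and RGB-detector probabilities:
    p = lam * p_depth + (1 - lam) * p_rgb (Eq. (1)).
    lam = 0.2 was selected by cross-validation in the paper."""
    return lam * p_depth + (1.0 - lam) * p_rgb

# With a small lam, a confident RGB detection dominates the fused score
p = fuse_rgbd(p_depth=0.5, p_rgb=1.0)  # 0.2*0.5 + 0.8*1.0 = 0.9
```

If the depth channel is unavailable, the pipeline degrades gracefully to the RGB detector alone, which corresponds to using p_rgb directly.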



2.4. Stage one: detection of passion fruits under natural outdoor conditions

Faster R-CNN (Ren, He, Girshick, & Sun, 2017) is currently one of the best-performing and most efficient methods for object detection; it is trained end-to-end and is widely used for image detection. Inspired by this, we used the Faster R-CNN method for detection of passion fruits using RGB-D images. Combining the RGB data and depth data appears promising: depth data are robust with respect to illumination changes but are sensitive to low-signal-strength returns and suffer from limited depth resolution. Image data are rich in colour and texture and have a high angular resolution, but break down quickly under non-ideal illumination.

To take advantage of the richness of RGB-D data, Spinello and Arras (2011) proposed the combo histogram of oriented depths (Combo-HOD) detector, an RGB-D detector for detection of humans based on the Kinect device. This detector exhibited an equal error rate of 85% at a range nearly fourfold larger than the operation range specified by the sensor's manufacturer. The Combo-HOD detector combined a HOG detector (Dalal & Triggs, 2005) for RGB data with a HOD detector for depth data. We followed the Combo-HOD process; however, our approach differed from the above one in that a HOG descriptor and an SVM classifier were used as detectors for RGB and depth images in the Combo-HOD, while in the present paper we used the Faster R-CNN. The RGB-D detector was trained separately, using an RGB Faster R-CNN detector trained on colour images and a depth Faster R-CNN detector trained on depth images. If no depth data are available, the detector gracefully degrades to the regular RGB detector. A linear strategy was used to combine the RGB-based Faster R-CNN detector with the depth-based Faster R-CNN detector:

p = λ pD + (1 − λ) pRGB    (1)

where p is the resulting probability of detecting passion fruits in RGB-D images, and λ is adjusted by the cross-validation method in the experiment (0.2 was used in this paper). The probability pD is the resulting probability of detecting depth images and pRGB is the resulting probability of detecting RGB images.

We also tried a popular vision pipeline of local feature descriptors (Dalal & Triggs, 2005), followed by a bag of visual words and an SVM classifier. This approach did not yield satisfactory detection performance. Thus, the Faster R-CNN method was eventually used for detection of passion fruits.

2.5. Stage two: identifying passion fruits in different maturity stages

After detecting the passion fruits in an image, the colour image was used for maturity classification of the detected fruits. Depth images provide less information for maturity identification; thus, they were not used in this stage. The maturity classification procedure was conducted using state-of-the-art feature extraction methods. As shown in Fig. 4, the DSIFT method was used first for feature extraction, followed by the application of LLC to project each descriptor onto its local coordinate system; the projected coordinates were then integrated by the SPM to generate a final representation, and finally a linear SVM classifier was applied to the final representation to identify the maturity of the fruits.

2.5.1. Dense scale invariant feature transform
SIFT (Lowe, 2004) is a computer vision algorithm, first proposed by Lowe (1999), for representing and identifying objects with local and distinctive features. The SIFT algorithm obtains descriptors that are invariant to rotation, scale, and variation in illumination, and is robust to geometric transformations such as isometry, similarity, affine, projective, and inversion transformations.

In our work, we used the DSIFT algorithm (Yang et al., 2009), which is derived from the SIFT algorithm and quickly computes descriptors for densely sampled key points with identical size and orientation. The main advantage of the DSIFT algorithm over the SIFT

Fig. 5 – DSIFT features of grey images of passion fruits in different stages of maturity.
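Unlike SIFT, which detects interest points, DSIFT places keypoints on a regular grid with fixed scale and orientation. A minimal sketch of the grid construction; the step and patch sizes are illustrative assumptions, not values from the paper:

```python
import numpy as np

def dense_keypoints(height: int, width: int, step: int = 8, patch: int = 16):
    """Return (row, col) centres of a regular keypoint grid, keeping each
    patch fully inside the image; scale and orientation are fixed."""
    half = patch // 2
    rows = np.arange(half, height - half, step)
    cols = np.arange(half, width - half, step)
    return [(r, c) for r in rows for c in cols]

grid = dense_keypoints(64, 64)  # 6 x 6 = 36 keypoints
```

Because the sampling locations are fixed in advance, no keypoint detection pass is needed, which is the source of DSIFT's speed advantage noted in the text.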

algorithm is that the former utilises a sampling procedure to reduce the computation time. Grey-based DSIFT features of passion fruits in five different stages of maturity are shown in Fig. 5.

2.5.2. Red-Green-Blue Dense scale invariant feature transform
Being the most popular colour model, the RGB colour space provides plenty of information for vision applications. To embed the RGB colour information into the DSIFT descriptor, we calculated traditional DSIFT descriptors for the R, G, and B colour space channels. By combining the extracted features, a 128 × 3 descriptor matrix was built (128 rows for each colour channel). Compared with the conventional grey-based DSIFT, the RGB colour gradients (or edges) in the analysed image can thereby be captured.

2.5.3. Locality-constrained linear coding
The BoF approach with the SPM kernel (Lazebnik et al., 2006) has been employed to build recent state-of-the-art image classification systems. The traditional SPM creates a codebook using clustering techniques, such as K-means vector quantisation (VQ), and uses classifiers with nonlinear Mercer kernels that incur additional computational complexity. For scalability improvement, Yang et al. (2009) proposed the ScSPM method, which uses sparse coding (SC) instead of VQ to obtain nonlinear codes. To develop a faster implementation of SC, Wang et al. (2010) presented a novel and practical coding scheme, LLC, which was also used for feature coding in our experiments.

Let X = {x1, x2, ..., xN} ∈ R^(D×N) denote a set of D-dimensional local descriptors in an image, let C = {c1, c2, ..., cN} ∈ R^(M×N) be the corresponding sparse coefficient matrix (Yang et al., 2009), and let B = [b1, b2, ..., bM] ∈ R^(D×M) be a visual codebook with M entries. The coding methods convert each descriptor into an M-dimensional code. Each descriptor is reconstructed from the codebook by optimising the following objective:

min_C Σ_{i=1}^{N} ‖xi − B ci‖² + λ1 ‖di ⊙ ci‖²,   s.t. 1ᵀ ci = 1, ∀i    (2)

where ⊙ denotes element-wise multiplication, and di ∈ R^M is a locality adaptor that assigns weights to basis vectors proportional to their similarity to the input descriptor xi (Wang et al., 2010). The scalar λ1 controls the relative contribution of the sparsity and locality constraints. The LLC method thus ensures that the selected basis vectors are proportionally similar to the input descriptor xi. Specifically,

di = exp(dist(xi, B) / σ)    (3)

where dist(xi, B) = [dist(xi, b1), dist(xi, b2), ..., dist(xi, bM)], and dist(xi, bj) is the Euclidean distance between xi and bj. The parameter σ adjusts the speed of the weight decay for the locality adaptor di.

2.5.4. Linear support vector machine
The SVM (Cortes & Vapnik, 1995; Rifkin & Klautau, 2004) is a well-known and widely used classifier. We used a simple implementation of linear SVMs in our experiments. Let the training data be T = {(t1, Y1), (t2, Y2), ..., (tN, YN)}, where T needs to be categorised into two groups using an SVM, ti denotes a datum vector, and Yi denotes the label of the datum vector, assigned as either positive (+1) or negative (−1). Then, any unknown vector t_test can be assigned to either one of the two groups based on the following criterion:

f(t_test) = Σ_{i=1}^{N} αi Yi (tiᵀ t_test) + b    (4)

where αi (i = 1, 2, ..., N) are nonzero coefficients obtained using quadratic programming methods, (|b|/‖w‖) is the perpendicular distance from the optimal hyperplane to the origin, and w is the normal vector of the hyperplane. For linearly separable problems, the function f estimates the label of the test vector. For multi-class classification, the two-class SVM can be used in the "one against all" paradigm.

2.6. The proposed method for maturity classification of passion fruits

The maturity classification procedure of passion fruits is shown in Fig. 6. The proposed algorithm mainly proceeds in four steps. In the first step, R-DSIFT, G-DSIFT and B-DSIFT (128-dimensional) feature vectors are obtained from the three different colour channels. Next, dictionaries are obtained using K-means clustering, where M is the size of a dictionary, and the three codes c-R, c-G and c-B, corresponding to the different descriptors, are computed using the dictionaries and LLC on the R-DSIFT, G-DSIFT, and B-DSIFT features. In the third step, the SPM for each image is divided into three layers (L0, L1, L2). The three codes for each sub-region in each layer are pooled together by max pooling and combined linearly to generate the final image representation,

Fig. 6 – Maturity classification based on RGB-SIFT-LLC.
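The LLC coding step used in this pipeline (Eqs. (2) and (3)) admits an analytical solution (Wang et al., 2010). A minimal NumPy sketch for a single descriptor; the λ1 and σ values, and the use of the exact (non-approximated) solver, are illustrative assumptions:

```python
import numpy as np

def llc_code(x: np.ndarray, B: np.ndarray, lam1: float = 1e-4,
             sigma: float = 1.0) -> np.ndarray:
    """LLC code of descriptor x (D,) on codebook B (D, M), per Eqs. (2)-(3)."""
    # Locality adaptor (Eq. (3)): bases far from x receive larger weights,
    # so the penalty pushes their coefficients towards zero
    dist = np.linalg.norm(B - x[:, None], axis=0)          # (M,)
    d = np.exp(dist / sigma)
    # Analytical solution of the regularised objective in Eq. (2)
    Z = B - x[:, None]                                     # basis shifted by x
    G = Z.T @ Z                                            # local Gram matrix (M, M)
    c = np.linalg.solve(G + lam1 * np.diag(d ** 2), np.ones(B.shape[1]))
    return c / c.sum()                                     # enforce 1^T c = 1

rng = np.random.default_rng(0)
B = rng.standard_normal((128, 16))  # toy codebook with M = 16 entries
x = rng.standard_normal(128)        # one 128-dimensional DSIFT descriptor
code = llc_code(x, B)               # sums to 1, concentrated on nearby bases
```

In the full pipeline this coding is applied per channel (R, G, B), and the resulting codes are max-pooled over the SPM sub-regions before the linear SVM.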



following which the dimensionality of the final representation is reduced by performing principal component analysis (PCA). Finally, the reduced representation of the images is fed into a linear SVM classifier to derive maturity labels for the categorised passion fruits.

2.7. Testing materials and experiment

To evaluate the performance of our proposed method for detection and maturity classification of passion fruits, the experiment was conducted on test set images. In stage one, the Faster R-CNN detectors for RGB and depth were created from 2000 RGB and 2000 depth images, respectively, and the other 2000 images were used to test the detectors. During stage two, as stated in Section 2.2, 4056 images were used to train the SVM classifiers of maturity, to determine the best hyperparameters that were difficult to learn using machine learning algorithms, for example the λ of feature coding based on the LLC method. The remaining 4056 images were used as a test set. The training set was partitioned into 10 equal-sized subsamples (folds). In each cross-validation run, nine folds were used to train the SVM, and the remaining left-out fold was used for validating the model. The cross-validation process was repeated 10 times so that each of the 10 folds was used exactly once as a validation set.

The computational experimental environment was as follows. PC: Intel® Xeon® E3-1245 v3 CPU @ 3.40 GHz × 8; memory, 32 GB; graphics, GeForce GTX TITAN X/PCIe/SSE2. OS and software: Ubuntu Kylin 14.04, CUDA 7.5, OpenCV 3.0, Caffe, and MATLAB 2012b.

3. Experimental results and discussion

3.1. Results of the passion fruit detection method using Red-Green-Blue Depth images

We used the public VGG-16 model (Simonyan & Zisserman, 2015), which has 13 convolutional layers and 3 fully connected layers, to establish the detectors of RGB and depth images, respectively. The accuracy rate, the error rate, and the detection time of the RGB, depth, and RGB-D detectors are listed in Table 2.

Table 2 – Comparison of passion fruit detectors.

Type of detector   Accuracy rate (%)   Error rate (%)   Time per image (ms)
RGB                89.68               4.60             72.14
Depth              76.26               19.2             50.07
RGB-D              92.71               3.50             72.14

With respect to detection accuracy, the RGB-D detector demonstrated an accuracy of 92.71%, outperforming the RGB detector and the depth detector by 3% and 16%, respectively. With respect to the error rate of detection, owing to noise in the depth image, the depth detector demonstrated poor results, with an error rate of 19.2%. The number of pixels in a depth image is smaller than that in an RGB image; thus, the depth detector is faster than the RGB one. With parallel employment of the RGB and depth detectors, the RGB-D detector required 72.14 ms for detecting an image, the same as the RGB detector.

Some example images, analysed using the Faster R-CNN approach, are shown in Fig. 7. The figure illustrates several passion fruits detected with different probabilities (p ≥ 0.7), under partial overlap conditions. For single fruit detection, both the colour and depth detectors achieved a good detection accuracy, above 90%. The fruits that were affected by partial overlap (Fig. 7(b), (c), top row) could not be individually distinguished by the colour and depth detectors. The Faster R-
Fig. 7 – Examples of passion fruit detection by RGB and depth detectors, for different overlap conditions. The top row shows the colour image detection results, and the bottom row shows the depth image detection results. (a) No overlap, (b) a relatively weak overlap, (c) a relatively strong overlap.

Fig. 8 – Detection results of passion fruits on RGB-D data, for different illumination conditions. (a) Normal illumination, (b) bright light, (c) light exposure.

CNN detector lacked any knowledge about tight cluster pas- CHT and the circle estimation (CE) algorithm were used to find
sion fruits size in the training set, which causes the over- circular objects in RGB and depth images, respectively,
lapping fruits are mistaken for a single fruit. For fruits strongly following which a CNN classifier that was trained on the same
affected by the partial overlap, the colour and depth detectors training data was tested for detecting fruits on the background
could not resolve tight clusters of passion fruits into individual fruit. Therefore, detection of highly overlapping passion fruits requires additional research.

We have further conducted experiments to analyse the contribution of the depth detector to fruit detection under illumination changes. Some examples are shown in Fig. 8. Under normal illumination, the RGB detector yielded some undetected fruits (e.g. Fig. 8(a), top row), while the depth detector (e.g. Fig. 8(a), bottom row) yielded very accurate detection. This occurred because stems and young fruits had the same colour (green) under different illumination conditions; consequently, several passion fruits were not correctly detected by the RGB detector under bright light and light exposure conditions (e.g. Fig. 8(b), (c), top row) because the colour of these fruits was very similar to that of the leaves and bushes. Using the depth detector, fruits under various light exposures (e.g. Fig. 8(b), (c), bottom row) could be detected quite reliably. This suggests that a detection system that combines depth and colour detection can improve the detection accuracy under different illumination conditions; this further suggests that multimodality can help detect fruits in situations that cannot be handled by single-cue detectors.

Based on the OpenCV platform, we have compared our approach with Choi's method (Choi et al., 2017), in which the CHT/CE + CNN approach is implemented, by using 100 randomly selected samples from the test set. The detailed results of this comparison are shown in Table 3 and Fig. 9. Compared with Choi's method, the recall rate of our detectors (either RGB or depth) was approximately 20% higher, mainly because Choi's method, whether the CHT on RGB images or CE on depth images, is prone to missing some circular objects.

In particular, it is difficult to accurately define the parameters for the CHT method in the case of leaf occlusion or fruit overlap. The sunlight noise contained in the depth images could restrict the performance of the CE method, which uses gradient vectors of depth values. Thus, fruits that are marked by blue circles would not be detected by the CHT/CE + CNN method (e.g. Fig. 9(b), (c)). However, the Faster R-CNN method, with its strong learning ability and automatic feature extraction, can detect objects accurately in both RGB images and depth images (Ren et al., 2017; Zheng et al., 2018). The first layers of the network learn simple features (such as edges, boundaries, and corners) from depth images, while the deeper layers learn more abstract feature representations, such as object parts or profiles. Thus, the features extracted by the Faster R-CNN can amplify aspects of the input that are important for discrimination and classification, and suppress irrelevant low-level variations. Therefore, the Faster R-CNN method can

Table 3 – Comparison of the Faster R-CNN and CHT/CE + CNN methods on the fruit detection task.

Type of    Actual number   Ours (Faster R-CNN)              CHT/CE + CNN (Choi et al., 2017)
detector   of fruits       True positives  Recall rate (%)  True positives  Recall rate (%)
RGB        477             403             84.49            306             64.15
Depth      345             252             73.04            185             53.62
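The recall rates in Table 3 follow directly from the true positive counts divided by the actual number of fruits. A quick sanity check (illustration only, not part of the authors' pipeline) reproduces all four table entries:

```python
def recall_pct(true_positives, actual):
    # Recall rate as a percentage, rounded to two decimals as in Table 3
    return round(100.0 * true_positives / actual, 2)

# Counts taken from Table 3
assert recall_pct(403, 477) == 84.49   # Faster R-CNN, RGB detector
assert recall_pct(252, 345) == 73.04   # Faster R-CNN, depth detector
assert recall_pct(306, 477) == 64.15   # CHT + CNN, RGB detector
assert recall_pct(185, 345) == 53.62   # CE + CNN, depth detector
```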

Fig. 9 – Examples of passion fruit detection by the RGB and depth detectors, using the Faster R-CNN and CHT/CE + CNN methods. (a) Results for the Faster R-CNN method. (b) Results for the CHT method on the RGB image and for the CE method on the depth image; the red circles and blue circles indicate the detected and undetected circular objects, respectively. (c) Results for the CHT + CNN method on the RGB image and for the CE + CNN method on the depth image; the red circles and blue circles indicate the detected and undetected fruits, respectively.
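The results above suggest merging the colour and depth detections into one set. The paper does not reproduce its exact fusion rule, so the following is a minimal sketch of one plausible rule, assuming axis-aligned boxes and an IoU threshold of 0.5 (both placeholder choices): keep every RGB detection and add any depth detection that does not overlap one.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def fuse_detections(rgb_boxes, depth_boxes, iou_thr=0.5):
    """Union of RGB and depth detections: a depth box that overlaps an
    RGB box (IoU >= iou_thr) is treated as a duplicate and dropped."""
    fused = list(rgb_boxes)
    for db in depth_boxes:
        if all(iou(db, rb) < iou_thr for rb in rgb_boxes):
            fused.append(db)
    return fused
```

For example, `fuse_detections([(0, 0, 10, 10)], [(1, 1, 10, 10), (20, 20, 30, 30)])` keeps the RGB box, discards the near-duplicate depth box, and adds the disjoint one.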

accurately detect fruits (e.g. Fig. 9(a)) in the presence of leaf occlusion.

3.2. The results of maturity classification based on the RGB-DSIFT-LLC system

To validate the performance of the proposed method, an extensive experimental study using the RGB-DSIFT-LLC system was conducted on the test dataset. The test set images were natural scene images that featured zoom, scale change, rotation, and illumination variations.

Table 4 shows the results obtained using different Colour-DSIFT models (Red-DSIFT, Green-DSIFT, Blue-DSIFT, RG-DSIFT, and RGB-DSIFT). The RGB-DSIFT model demonstrated the best classification accuracy (91.52%) among the five considered models. Each frame took 0.28 s to compute, which was almost the same time as that required by the other four models. Compared with the performance of the Red-based DSIFT model, the RGB-DSIFT model was approximately 13% better in terms of the average classification accuracy, which can be significant for image classification tasks. The RG-DSIFT

Table 4 – Results of maturity classification using the RGB-DSIFT-LLC system.

Feature          Feature     Classification   Time per
representation   dimension   accuracy (%)     frame (s)
Red-DSIFT        3735        78.40            0.27
Green-DSIFT      3718        82.52            0.26
Blue-DSIFT       3772        76.90            0.27
RG-DSIFT         3303        87.60            0.28
RGB-DSIFT        3933        91.52            0.28

The result of maturity classification in bold reflects the best result among the five models.

Table 5 – Comparison of classification results for three feature coding methods (RGB-DSIFT).

Coding    Classification   Time per
method    accuracy (%)     frame (s)
LLC       91.52            0.27
ScSPM     90.71            2.51
VQ        82.36            1.80

Fig. 10 – Comparison of classification accuracies of three feature coding methods (VQ, ScSPM, and LLC).

Fig. 11 – Confusion matrix for the proposed algorithm using the RGB-DSIFT model and the LLC method.
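The speed advantage of LLC over ScSPM in Table 5 comes from its closed-form, k-nearest-neighbour approximation (Wang et al., 2010): each descriptor is encoded over only its k nearest codebook atoms. The sketch below is a minimal NumPy illustration of that analytical solution, not the authors' implementation; the codebook size, k, and the regularisation constant are placeholder values.

```python
import numpy as np

def llc_code(x, codebook, k=5, beta=1e-4):
    """Approximated LLC coding (after Wang et al., 2010): encode
    descriptor x over its k nearest codebook atoms, with the code
    constrained to sum to one; all other entries stay zero."""
    d2 = np.sum((codebook - x) ** 2, axis=1)    # squared distances to atoms
    idx = np.argsort(d2)[:k]                    # k nearest atoms
    z = codebook[idx] - x                       # shift atoms to the origin
    C = z @ z.T                                 # local covariance
    C += np.eye(k) * beta * np.trace(C)         # regularisation for stability
    w = np.linalg.solve(C, np.ones(k))          # closed-form solution
    code = np.zeros(len(codebook))
    code[idx] = w / w.sum()                     # enforce the sum-to-one constraint
    return code
```

The resulting code vector is sparse (at most k non-zero entries), which is what makes max-pooling over an SPM grid fast compared with solving a full sparse-coding problem per descriptor, as ScSPM does.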

Fig. 12 – Classification performance of the proposed algorithm using the RG-DSIFT and RGB-DSIFT models. (a) The recognition rate. (b) The recall rate. (c) The false positives rate. (d) The false negatives rate.
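All four rates in Fig. 12 can be derived per class from a confusion matrix such as the one in Fig. 11. The paper does not spell out its exact formulas, so the sketch below uses the standard definitions as an assumption:

```python
import numpy as np

def per_class_rates(cm):
    """Per-class rates from a confusion matrix cm[true, predicted].
    Returns (recall, false-negative rate, false-positive rate) arrays
    using the standard definitions."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # samples of the class that were missed
    fp = cm.sum(axis=0) - tp          # samples wrongly assigned to the class
    tn = cm.sum() - tp - fn - fp
    recall = tp / (tp + fn)
    fnr = fn / (tp + fn)              # false negatives rate = 1 - recall
    fpr = fp / (fp + tn)
    return recall, fnr, fpr
```

For a toy two-class matrix `[[8, 2], [1, 9]]`, this gives recalls of 0.8 and 0.9, false-negative rates of 0.2 and 0.1, and false-positive rates of 0.1 and 0.2.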

model demonstrated the second best average accuracy (87.60%).

The results for the three feature coding methods, namely VQ (Lazebnik et al., 2006), ScSPM (Yang et al., 2009), and LLC, for the different Colour-DSIFT descriptors, are shown in Fig. 10. Notably, the LLC method significantly outperforms the VQ method (by approximately 10%) in terms of the classification accuracy. Compared with the ScSPM method, the LLC method was approximately 5% better on the classification accuracy. Table 5 compares the performances of the three coding methods (VQ, ScSPM, and LLC) using RGB-DSIFT models. The LLC method outperforms the VQ and ScSPM methods on both classification accuracy and runtime. More interestingly, the classification accuracies of the LLC and ScSPM methods are very close; however, the runtime of the LLC method is approximately 10 times smaller than that of the ScSPM method.

Figure 11 shows the results obtained using the RGB-DSIFT model and the LLC method, as visualised by a confusion matrix. The confusion matrix shows that the "young", "mature", and AM states were classified with accuracies above 95%. But the separation of NM and NY was not ideal, because the fruits in these categories had similar luminance and colours. Passion fruits in the NM state were incorrectly classified as being in the "mature" state, and the incorrect classification rate was 21.32%. Passion fruits in the NY state were incorrectly classified as being in the "young" state, and the error rate was 23.88%.

Based on the results in Table 4 and Figs. 10 and 11, the RGB-DSIFT and RG-DSIFT models were deemed to achieve the best classification accuracy; thus, in what follows, the performance of these two models was further analysed.

3.3. Classification performance evaluation and analysis

Figure 12 shows the recognition rate, the recall rate, the false positives rate, and the false negatives rate for the proposed algorithm when using the RG-DSIFT and RGB-DSIFT models. In Fig. 12a, an average recognition rate above 90% was achieved for fruits in the "young", "mature", and AM categories, using the RGB-DSIFT model. For the RG-DSIFT model, the recognition rates for fruits in the "young", NY, NM, and "mature" categories were worse than with the RGB-DSIFT model.

Figure 12b also shows that the RGB-DSIFT model performed better, with an average recall over 95% for fruits in the "young", "mature", and AM categories. The RG-DSIFT model yielded slightly lower recalls for fruits in the "young", NY, NM, and "mature" categories, compared with the RGB-DSIFT

Table 6 – Comparison of the colour histogram method and the proposed method.

Colour    Average accuracy (%)                     Time per frame (s)
channel   Colour histograms  The proposed method   Colour histograms  The proposed method
R         58.11              78.40                 0.21               0.27
G         59.52              82.52                 0.22               0.26
RG        79.04              87.60                 0.24               0.28
RGB       82.45              91.52                 0.25               0.28

The result of maturity classification in bold reflects the best result among the compared models.
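The colour histogram baseline compared in Table 6 builds a feature vector by concatenating per-channel intensity histograms (R, G, RG, or RGB) before coding and classification. The paper does not state the bin count it used, so the sketch below is an illustration with an arbitrary default of 32 bins per channel:

```python
import numpy as np

def colour_histogram(img, channels=(0, 1, 2), bins=32):
    """Concatenated per-channel histogram feature for an RGB image
    (H x W x 3, uint8). `channels` selects the R, G and/or B planes;
    the bin count is a placeholder, not the value used in the paper."""
    feats = []
    for c in channels:
        h, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        feats.append(h / h.sum())   # normalise so images of any size compare
    return np.concatenate(feats)
```

With `channels=(0, 1)` this yields the RG variant from Table 6, and with all three channels the RGB variant; the resulting vectors are far shorter than the DSIFT-based descriptors, which is consistent with the slightly lower per-frame times reported for the histogram baseline.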

model. However, the recall rates of the two models were below 70% for fruits in the NY and NM categories.

Figure 12c and d show that the RGB-DSIFT model achieved lower rates of false positives and false negatives (under 10% and 5%, respectively) for fruits in the "young", "mature", and AM categories. The RG-DSIFT model yielded higher rates of false positives (over 13%) and false negatives for fruits in the "young", NY, NM, and "mature" categories, compared with the RGB-DSIFT model. Thus, the rates of false positives and false negatives for the two models were above 11% and 31%, respectively, for NY and NM.

3.4. Discussion

Our approach was compared with colour histograms combined with the LLC and linear SVM classification method. For a fair comparison, we computed colour histograms of red, green, red-green (RG), and red-green-blue (RGB), with the same training and test datasets. Then, a comparison was made by analysing the average accuracies and the time costs of the different methods.

Table 6 lists the classification accuracies and time costs per frame for the colour histogram method and for the proposed method. The colour histogram method with the RG and RGB models was more accurate than with the R and G models, with an accuracy gain in excess of 20%. This occurred because combining the features of red, green, and blue in the DSIFT method allowed the changes across the different maturity stages of passion fruits to be represented comprehensively, and the differences between the analysed fruits' colours were much more distinct for the red and green components than for the blue component. This shows that the red and green channels are crucial for the presently considered classification task. The proposed method outperformed the colour histogram method for all of the colour models.

We also evaluated the computation time for single-image classification; this time was computed as an average over all the images in the test set. The colour histogram method using R, G, RG, and RGB required similar time (approximately 0.23 s), using parallel processing methods that increased the feature extraction efficiency and reduced the computation time. The same procedure for feature learning of the different colour components was used in our proposed method. As shown in Table 6 (right), the proposed method was slightly slower than the colour histogram method. This likely reflects the fact that our method adopts the LLC and SPM methods for learning higher feature representations, which costs about 0.04 s.

Comprehensively considering the classification accuracy and time performance, the proposed method yields an improvement of approximately 10% in terms of the average classification accuracy, which can be significant for applications that seek to determine passion fruit maturity. Although the proposed method requires more computation time than the colour histogram method, our algorithm met the needs of real-time control for fruit harvesting robots. Meanwhile, with the development of computer software and hardware, the computation time will continue to decrease.

4. Conclusions

In this paper, a new approach for detection of passion fruits and identification of their maturity using RGB-D images was developed, and some pilot tests were performed. The major contributions of this study can be summarised as follows:

(1) Faster R-CNN is a universal object detection method that can detect objects of any shape using colour and depth images. The Faster R-CNN method can be used for detection of passion fruits in natural canopies. The network learned to detect passion fruits based on positive and negative training examples. Therefore, it should be easy to adapt a detection method based on the Faster R-CNN for detection of new fruit species.

(2) We developed and evaluated an RGB-D image-based machine vision system. In the detection step, the fruit detector performed better when RGB information was combined with depth information. The RGB-D detector demonstrated an accuracy of 92.71%, outperforming the RGB and depth detectors by more than 3% and 16%, respectively. The Kinect device might be subsequently used for fruit detection in natural outdoor conditions, if the device is protected from direct exposure to sunlight.

(3) The maturity identification system based on RGB-DSIFT-LLC was discriminative and robust when performance was considered in terms of the identification accuracy. The misclassified samples could be attributed to the lack of stability of the selected features, owing to the high level of inter-class similarity (NM and NY). The obtained 91.52% accuracy clearly shows that the proposed method is robust and can be used in real applications.

Acknowledgments

This research was supported by the Science and Technology Planning Project of Guangdong Province under Grant Nos. 2015A020209148, 2015A020224038, 2014A020208108 and 2016A020210087, and the Guangzhou Municipal Science and Technology Planning Project under Grant Nos. 201605030013 and 201604016122.

References

Abdel-Hakim, A. E., & Farag, A. A. (2006). CSIFT: A SIFT descriptor with color invariant characteristics. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition, 2(February) (pp. 1978–1983).
Bansal, R., Lee, W. S., & Satish, S. (2013). Green citrus detection using fast Fourier transform (FFT) leakage. Precision Agriculture, 14(1), 59–70.
Bulanon, D. M., & Kataoka, T. (2010). A fruit detection system and an end effector for robotic harvesting of Fuji apples. Agricultural Engineering International: CIGR Journal, 12(1), 203–210.

Burghouts, G. J., & Geusebroek, J. M. (2009). Performance evaluation of local colour invariants. Computer Vision and Image Understanding, 113(1), 48–62.
Chang, Y. K., Zaman, Q., Farooque, A. A., Schumann, A. W., & Percival, D. C. (2012). An automated yield monitoring system II for commercial wild blueberry double-head harvester. Computers and Electronics in Agriculture, 81(4), 97–103.
Chen, J., Li, Q., Peng, Q., & Wong, K. H. (2015). CSIFT based locality-constrained linear coding for image classification. Pattern Analysis & Applications, 18(2), 441–450.
Choi, D., Lee, W. S., Schueller, J. K., Ehsani, R., Roka, F., & Diamond, J. (2017). A performance comparison of RGB, NIR, and depth images in immature citrus detection using deep learning algorithms for yield prediction. In 2017 ASABE Annual International Meeting.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning. https://doi.org/10.1023/A:1022627411411.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE computer society conference on computer vision and pattern recognition, CVPR 2005 (pp. 886–893).
Gevers, T., Gijsenij, A., Van De Weijer, J., & Geusebroek, J. M. (2012). Color in computer vision: Fundamentals and applications.
Gongal, A., Amatya, S., Karkee, M., Zhang, Q., & Lewis, K. (2015). Sensors and systems for fruit detection and localization: A review. Computers and Electronics in Agriculture, 116(C), 8–19.
Gupta, M., Yin, Q., & Nayar, S. K. (2013). Structured light in sunlight. In Proceedings of the IEEE international conference on computer vision (pp. 545–552).
Hietanen, A., Lankinen, J., Kämäräinen, J. K., Buch, A. G., & Krüger, N. (2016). A comparison of feature detectors and descriptors for object class matching. Neurocomputing, 184(C), 3–12.
Ji, W., Zhao, D., Cheng, F., Xu, B., Zhang, Y., & Wang, J. (2012). Automatic recognition vision system guided for apple harvesting robot. Computers & Electrical Engineering, 38(5), 1186–1195.
Karkee, M., & Zhang, Q. (2012). Mechanization and automation technologies in specialty crop production. Resource Engineering & Technology for Sustainable World, 19(5), 16–17.
Kurtulmus, F., Lee, W. S., & Vardar, A. (2011). Green citrus detection using "eigenfruit", color and circular Gabor texture features under natural outdoor conditions. Computers and Electronics in Agriculture, 78(2), 140–149.
Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 2169–2178).
Lewis, B. J., Herrlinger, K. A., Craig, T. A., Mehring-Franklin, C. E., DeFreitas, Z., & Hinojosa-Laborde, C. (2013). Antihypertensive effect of passion fruit peel extract and its major bioactive components following acute supplementation in spontaneously hypertensive rats. Journal of Nutritional Biochemistry, 24(7), 1359–1366.
Li, H., Lee, W. S., & Wang, K. (2014). Identifying blueberry fruit of different growth stages using natural outdoor color images. Computers and Electronics in Agriculture, 106(106), 91–101.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In Proceedings of the seventh IEEE international conference on computer vision (Vol. 2, pp. 1150–1157).
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Nissimov, S., Goldberger, J., & Alchanatis, V. (2015). Obstacle detection in a greenhouse environment using the Kinect sensor. Computers and Electronics in Agriculture, 113(C), 104–115.
Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137–1149.
Rifkin, R., & Klautau, A. (2004). In defense of one-vs-all classification. Journal of Machine Learning Research, 5(1), 101–141.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).
Spinello, L., & Arras, K. O. (2011). People detection in RGB-D data. In IEEE international conference on intelligent robots and systems (pp. 3838–3843). IEEE.
Wang, Q., Nuske, S., Bergerman, M., & Singh, S. (2013). Automated crop yield estimation for apple orchards. In Experimental Robotics (Vol. 88, pp. 745–758).
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of the IEEE computer society conference on computer vision and pattern recognition (pp. 3360–3367).
Yang, C., Lee, W. S., & Gader, P. (2014). Hyperspectral band selection for detecting different blueberry fruit maturity stages. Computers and Electronics in Agriculture, 109(109), 23–31.
Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In IEEE computer society conference on computer vision and pattern recognition workshops, CVPR workshops 2009 (pp. 1794–1801).
Zheng, C., Zhu, X., Yang, X., Wang, L., Tu, S., & Xue, Y. (2018). Automatic recognition of lactating sow postures from depth images by deep learning detector. Computers and Electronics in Agriculture, 147, 51–63.
Zhou, R., Damerow, L., Sun, Y., & Blanke, M. M. (2012). Using colour features of cv. "Gala" apple fruits in an orchard in image processing to predict yield. Precision Agriculture, 13(5), 568–580.
