Classification Applications With Deep Learning and Machine Learning Technologies
Studies in Computational Intelligence
Volume 1071
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes
new developments and advances in the various areas of computational
intelligence—quickly and with a high quality. The intent is to cover the theory,
applications, and design methods of computational intelligence, as embedded in
the fields of engineering, computer science, physics and life sciences, as well as
the methodologies behind them. The series contains monographs, lecture notes and
edited volumes in computational intelligence spanning the areas of neural networks,
connectionist systems, genetic algorithms, evolutionary computation, artificial
intelligence, cellular automata, self-organizing systems, soft computing, fuzzy
systems, and hybrid intelligent systems. Of particular value to both the contributors
and the readership are the short publication timeframe and the world-wide
distribution, which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Laith Abualigah
Editor
Classification Applications
with Deep Learning
and Machine Learning
Technologies
Editor
Laith Abualigah
Hourani Center for Applied Scientific
Research
Al-Ahliyya Amman University
Amman, Jordan
Faculty of Information Technology
Middle East University
Amman, Jordan
School of Computer Sciences
Universiti Sains Malaysia
George Town, Pulau Pinang, Malaysia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Deep learning and machine learning classification approaches have grown considerably
and are now applied to many real-world problems, such as Artocarpus classification,
rambutan classification, mango variety classification, salak classification, image
processing, sapodilla identification with transfer learning techniques, classification
of jackfruit (Artocarpus integer and Artocarpus heterophyllus), markisa/passion fruit
classification, big data classification, and Arabic text classification. Deep learning
and machine learning have become indispensable technologies in the current era of
artificial intelligence. These techniques find
their marks in data analysis, text mining, classification problems, computer vision,
image analysis, pattern recognition, medicine, etc. There is a continuous flow of data,
so it is impossible to manage and analyze these data manually. The outcome depends
on the processing of high-dimensional data. Most of it is irregular and unordered,
present in various forms like text, images, videos, audio, graphics, etc. Fruit image
recognition systems are used to classify different types of fruits and to differentiate
different fruit variants of a single fruit type. Rambutan is an exotic fruit found mainly in the
Southeast Asian region and is prevalent in Malaysia. It comes in different varieties
or cultivars. These cultivars look alike to the naked eye. Hence, an image
recognition system powered by deep learning methods can be applied in classifying
rambutan cultivars accurately. Currently, sorting and classifying mango cultivars are
manually done by observing the features or attributes of mango like size, skin color,
shape, sweetness and flesh color. Experienced taxonomists can generally identify the
different cultivars, but most people find them hard to tell apart. Computer vision
offers a way to make this distinction easier, and it is the solution proposed here.
Artificial intelligence trains computers to interpret and
understand the visual world, such as images and video. Deep learning, also known as deep
neural networks or deep neural learning, processes data and builds
patterns for decision making by imitating the human brain. It uses artificial neurons that are
linked together within a hierarchical neural network to analyze the incoming data. Image
recognition is one of the most popular deep learning applications that help many
fields, especially fruit agriculture, to classify fruit. This
book brings together researchers and developers from academia
and industry worldwide who work in the broad areas of deep learning and
machine learning, for a community-wide discussion of ideas that will influence and foster
continued research in this field for the benefit of humanity. It highlights
technology-based solutions that make the classification
process more efficient, and its chapters provide deep insight into classification
techniques.
Lee Zhi Pen, Kong Xian Xian, Ching Fum Yew, Ong Swee Hau,
Putra Sumari, Laith Abualigah, Absalom E. Ezugwu,
Mohammad Al Shinwan, Faiza Gul, and Ala Mughaid
Abstract There are many species of Artocarpus fruits in Malaysia, which have
different market potentials. This study classifies four species of Artocarpus fruits using
a deep learning approach, the Convolutional Neural Network (CNN). A newly
proposed CNN model is compared with pre-trained models, i.e., VGG-16, ResNet50,
and Xception. The effects of several variables, i.e., hidden layers, perceptrons, filter number,
optimizers, and learning rate, on the proposed model are also investigated in this
study. The best performing model in this study is the newly proposed model with 2
CNN layers (12, 96 filters) and 6 dense layers with 147 perceptrons, achieving an
accuracy of 87%.
1 Introduction
Agricultural fields face the challenge of labour costs, and automated agricultural
systems are in demand to overcome such challenges [1]. Computer vision
technology has contributed to automation, for example in weed removal robots that use
real-time weed recognition to remove weeds from the crop field, thus reducing both labour
and chemical costs [2, 3]. Fruit harvesting can harness this technology to enhance
the industry’s profitability, and fruit recognition is a crucial part of the solution [4].
Multiple works have been done on fruit recognition with machine learning approaches.
However, only a few address Malaysian fruits.
Previous works on fruit recognition or classification have used both
conventional machine learning approaches and deep learning approaches. By extracting
fruit color and fruit shape as features via specialized computing modules, a fruit
recognition system using KNN achieved accuracy ranging from 30 to 90%, although
the fruit types were highly distinct from each other [5]. This wide range of accuracies
(30–90%) raises doubts about the system’s capabilities, and optimizing
the feature-extraction modules for various fruit types would be time-consuming and
costly. Another study using conventional machine learning approaches
was done on the Supermarket Produce data set, which is very well documented and has
minimal noise. Although it scored high accuracy with a Support Vector Machine
model, the generalization of such a model to a complicated, real harvesting environment
remains questionable. A few studies using deep learning approaches were also able to
obtain high accuracy (>90%) with well-documented datasets, while researchers are still
investigating the effects of noise on the generalization of neural networks.
This study uses a deep learning approach to recognize four species of Artocarpus
fruit in Malaysia: breadfruit (Artocarpus altilis), Keledang (Artocarpus lanceifolius),
Nangka (Artocarpus heterophyllus), and Tarap (Artocarpus odoratissimus).
Figure 1 shows our proposed CNN architecture. It consists of two convolution
layers, two max pooling layers, one flatten layer, six dense
layers and one output layer [6, 7]. The hyperparameters are shown in Fig. 2. The first
convolution layer has 12 filters, a kernel size of 3 and the ReLU activation function.
It is followed by a max pooling layer of size 2. Next, the output is fed into the
second convolution layer, with 96 filters, a kernel size of 3 and the ReLU activation
function, and then into the second max pooling layer of size 2. The convolution layers
produce feature maps that summarize the presence of detected features in the input image, and
the max pooling layers reduce the dimensions of these feature maps so that
fewer parameters need to be trained. After that, the output is flattened with a flatten
layer and passed to 6 dense layers with 147 perceptrons. These dense layers
are used to identify the features in the input data and help the output layer to generate
a correct output. Before the output layer, a dropout layer with a rate
of 0.3 is applied. Lastly, the output layer uses the softmax activation function
to produce the 4 class labels: breadfruit (Artocarpus
altilis), Keledang (Artocarpus lanceifolius), Nangka (Artocarpus heterophyllus), and
Tarap (Artocarpus odoratissimus).
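To make the layer sizes concrete, the following sketch traces the output shape and parameter count of each layer described above. It is an illustration only: the chapter does not state the padding mode or strides, so 'same' padding, stride 1 and non-overlapping 2 × 2 pooling are assumed, as is the reading that each of the six dense layers has 147 perceptrons. The function names are hypothetical.

```python
# Sketch: trace shapes and parameter counts for the CNN described above.
# Assumptions (not stated in the chapter): 'same' padding, stride 1 for the
# convolutions, non-overlapping 2x2 max pooling, 147 units in every dense layer.

def conv2d_params(kernel, in_ch, out_ch):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    """Weight matrix plus one bias per unit."""
    return n_in * n_out + n_out


def trace_cnn(height=224, width=224, channels=3, filters=(12, 96), kernel=3,
              pool=2, dense_layers=6, units=147, classes=4):
    rows = []
    for f in filters:
        # 'same' padding keeps height and width unchanged.
        rows.append(("conv", (height, width, f), conv2d_params(kernel, channels, f)))
        channels = f
        height, width = height // pool, width // pool
        rows.append(("maxpool", (height, width, channels), 0))
    n = height * width * channels
    rows.append(("flatten", (n,), 0))
    for _ in range(dense_layers):
        rows.append(("dense", (units,), dense_params(n, units)))
        n = units
    rows.append(("dropout(0.3)", (n,), 0))  # dropout has no trainable weights
    rows.append(("softmax", (classes,), dense_params(n, classes)))
    return rows


layers = trace_cnn()
for name, shape, params in layers:
    print(f"{name:<13}{str(shape):<18}{params:>12,}")
print("total parameters:", sum(p for _, _, p in layers))
```

Under these assumptions the flatten layer has 56 × 56 × 96 = 301,056 outputs, so the first dense layer holds most of the parameters.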
Transfer learning is a method of transferring what has been learnt in a
previous application to a new application, which in our case is Artocarpus
classification [8–12]. Models that have been trained on a different application are
called pre-trained models. For our study, we selected three pre-trained models:
VGG16, ResNet50 and Xception. Some other optimization methods can
be used to optimize the problems as given in [13–18].
VGG16
VGG16 was proposed by Karen Simonyan and Andrew Zisserman in a
paper presented at the International Conference on Learning Representations in 2015
[19]. This model achieved 90.1% top-5 accuracy on the ImageNet validation set; ImageNet
as a whole consists of over 14 million images. The architecture of VGG16 is shown below in
Fig. 3.
A number of different configurations and fine-tuning settings were tried to identify the best
performing model for our Artocarpus image classification. For VGG16, the highest
accuracy achieved was 81.50%, using 4096 perceptrons, freezing the whole model
except the top layer and running it with a new classifier with 2 dense layers, as shown
in Fig. 4. Figure 5 shows the VGG16 transfer model with freeze all except top layer,
new classifier with 2 dense layers and 4096 perceptrons.
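As a rough illustration of what remains trainable when the pre-trained base is frozen, the sketch below counts the parameters of a new classifier head with two dense layers of 4096 perceptrons and a 4-class output. It assumes the head is attached to the flattened 7 × 7 × 512 output that the VGG16 convolutional base produces for 224 × 224 inputs; the chapter does not say how the head was attached, and the helper names are hypothetical.

```python
# Sketch: trainable parameters of a new classifier head on a frozen base.
# Assumption: the head receives the flattened 7 x 7 x 512 feature map of the
# VGG16 convolutional base (224 x 224 input); this is not stated in the chapter.

def dense_params(n_in, n_out):
    """Weight matrix plus one bias per unit."""
    return n_in * n_out + n_out


def head_params(n_features, hidden=(4096, 4096), classes=4):
    """Parameter count of each dense layer in the classifier head."""
    sizes = [n_features, *hidden, classes]
    return [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]


vgg16_features = 7 * 7 * 512
per_layer = head_params(vgg16_features)
print(per_layer, sum(per_layer))
```

With a frozen base, only these head parameters are updated during training, and under this assumption the first dense layer accounts for most of them.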
ResNet50
ResNet50 is a variant of the residual network that consists of 48 convolution layers,
1 max pooling layer and 1 average pooling layer. The residual architecture makes it
possible to train very deep networks (hundreds to thousands of layers) while maintaining
high performance, which earlier models could not achieve at
such depths. ResNet50 achieved 92.1% top-5 accuracy on the
ImageNet validation dataset. Figure 6 shows the ResNet50 architecture.
For our Artocarpus image classification using ResNet50, the highest accuracy
was achieved by freezing all except the top layer and running with a new classifier with
2 dense layers. The first layer uses 1024 perceptrons while the second layer uses 4096
perceptrons. This configuration achieved 86% accuracy on our Artocarpus
image classification. Figure 7 shows the performance of the ResNet50 transfer model
on Artocarpus image classification. Figure 8 shows the ResNet50 transfer model with
freeze all except top layer, new classifier with 2 dense layers with 1024 perceptrons
followed by 4096 perceptrons.
Xception
Xception is a deep convolutional neural network developed by François
Chollet at Google Inc. Figure 9 shows the Xception architecture. The name stands
for Extreme Inception; the model is based on Inception, but its modules are
replaced with depthwise separable convolutions. Xception achieved 94.5% top-5 accuracy
on the ImageNet validation dataset.
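The effect of replacing standard convolutions with depthwise separable ones can be seen by counting weights. The sketch below compares the two for a single layer; the channel sizes are illustrative values chosen here, not figures from the chapter, and biases are ignored.

```python
# Sketch: weight counts of a standard vs. a depthwise separable convolution.
# The sizes below are illustrative only and are not taken from the chapter.

def standard_conv_weights(kernel, in_ch, out_ch):
    # Every output channel has its own k x k filter over all input channels.
    return kernel * kernel * in_ch * out_ch


def separable_conv_weights(kernel, in_ch, out_ch):
    depthwise = kernel * kernel * in_ch   # one k x k filter per input channel
    pointwise = in_ch * out_ch            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


k, c_in, c_out = 3, 128, 256
std = standard_conv_weights(k, c_in, c_out)
sep = separable_conv_weights(k, c_in, c_out)
print(std, sep, round(std / sep, 1))
```

For these sizes the separable form needs roughly one ninth of the weights, which is why it allows deeper or wider models for the same parameter budget.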
Figure 10 shows the performance of Xception Transfer Model on Artocarpus
Image Classification. Figure 11 shows Xception Transfer Model with Freeze All
except Top Layer, New Classifier with 3 dense layers each with 4096 perceptrons.
For the Artocarpus image classification using Xception, the best performing model
only managed to achieve 66.50% accuracy. It was achieved using freeze all with new
classifier and 3 dense layers, each with 4096 perceptrons.
Fig. 5 VGG16 transfer model with freeze all except top layer, new classifier with 2 dense layers
and 4096 perceptron
Fig. 8 ResNet50 transfer model with freeze all except top layer, new classifier with 2 dense layers
with 1024 perceptron followed by 4096 perceptron
Fig. 11 Xception transfer model with freeze all except top layer, new classifier with 3 dense layers
each with 4096 perceptrons
Overall, ResNet50 is the best performing transfer model for Artocarpus image classification. This is because
ResNet50 maintains reasonably good accuracy (>70%) across all configurations
tested. VGG16 comes second, with around half of its configurations performing reasonably
well, whereas Xception does not reach 70% in any of the configurations tested.
We therefore conclude that the Xception model is not suitable for Artocarpus image
classification. However, it is worth noting that all three models might still achieve
much higher accuracy if the number of epochs were increased. To conclude, ResNet50
is the best transfer model to use for Artocarpus image classification as compared to
VGG16 and Xception.
2.3 Dataset
The Artocarpus genus consists of approximately 50 species of trees which are mainly
restricted to Southeast Asia [20]. For our study, we focused on 4 edible fruit species,
namely (1) Artocarpus altilis, (2) Artocarpus lanceifolius, (3) Artocarpus heterophyllus
and (4) Artocarpus odoratissimus. The dataset consists of a total of 1000
images, with 250 images per species. The images are resized to 224 ×
224 pixels. The dataset was then split into an 80% training set and a 20% test set. Sample
images can be seen in Fig. 13.
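An 80/20 split of a balanced dataset like this one can be sketched as follows. The split is done per class (stratified) so that each species keeps the same share in both sets; the chapter does not say whether its split was stratified, so that choice, the seed and the function name are assumptions made for illustration.

```python
import random

# Sketch: stratified 80/20 split over image indices.
# Assumption: splitting per class; the chapter only states the 80/20 ratio.

def stratified_split(labels, test_fraction=0.2, seed=0):
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


species = ["altilis", "lanceifolius", "heterophyllus", "odoratissimus"]
labels = [s for s in species for _ in range(250)]   # 4 classes x 250 images
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With 250 images per class, this yields 200 training and 50 test images for each species.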
2.4 Augmentation
A 90° image rotation was used to augment the images, with the aim of increasing accuracy and training
the model better. The code and sample images can be seen in Fig. 14.
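The code in Fig. 14 is not reproduced here, so the following is only a minimal sketch of the idea: each image gets one rotated copy, which doubles the dataset. Images are represented as plain lists of rows, the rotation direction (clockwise) is an assumption, and the function names are hypothetical.

```python
# Sketch: augment a dataset with one 90-degree rotated copy of each image.
# Assumption: clockwise rotation; the chapter does not state the direction.

def rotate90_clockwise(image):
    """Rotate a 2-D image (a list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotation(images):
    """Return the originals followed by one rotated copy of each image."""
    return list(images) + [rotate90_clockwise(img) for img in images]


tiny = [[1, 2],
        [3, 4]]
print(rotate90_clockwise(tiny))   # [[3, 1], [4, 2]]
```

Applied to 1000 images, this produces 2000 images in total.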
3 Performance Result
The original dataset consists of 1000 images in 4 classes: breadfruit (Artocarpus
altilis), Keledang (Artocarpus lanceifolius), Nangka (Artocarpus heterophyllus)
and Tarap (Artocarpus odoratissimus). Each class has 250 images, which have
already been preprocessed to 224 pixels × 224 pixels × 3 channels. We use the Python
programming language with the Keras and TensorFlow libraries in a Jupyter notebook to
build our program. First, we load all the images and then perform data augmentation
by rotating all the images by 90°. Then, we feed all 2000 images into the Keras
library function “image_dataset_from_directory()” to preprocess the data so that it
is converted to the format supported by the TensorFlow library. The dataset is further
split into a 20% test dataset and an 80% training dataset. Next, we perform hyperparameter
optimization, starting from the number of hidden layers (dense layers and CNN layers),
then the number of perceptrons, number of filters, optimizers, epochs and learning rate. To
reduce the tuning time needed for trying different combinations of hyperparameters,
we tune each hyperparameter separately, fixing all the
other hyperparameters while tuning a specific one. Once that
hyperparameter reaches its optimum, we proceed to the next hyperparameter. The detailed
illustration of the hyperparameter optimization workflow and the hyperparameters
used are given in Fig. 15 and Table 1.
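The one-hyperparameter-at-a-time procedure described above can be sketched as a simple loop. This is not the authors' code: `toy_accuracy` is a hypothetical stand-in for "train the model and return its test accuracy", and the search space values are chosen here for illustration.

```python
# Sketch: tune each hyperparameter separately while the others stay fixed.
# toy_accuracy is a hypothetical placeholder for a real train-and-evaluate run.

def tune_one_at_a_time(search_space, defaults, evaluate):
    best = dict(defaults)
    for name, candidates in search_space.items():
        scores = {}
        for value in candidates:
            trial = dict(best, **{name: value})   # change only this setting
            scores[value] = evaluate(trial)
        best[name] = max(scores, key=scores.get)  # keep the optimum, move on
    return best


def toy_accuracy(cfg):
    return (1.0
            - 0.05 * abs(cfg["dense_layers"] - 6)
            - 0.001 * abs(cfg["units"] - 147)
            - 10.0 * abs(cfg["learning_rate"] - 0.001))


search_space = {
    "dense_layers": [1, 2, 3, 4, 5, 6, 7],
    "units": [9408, 4704, 2352, 1176, 588, 294, 147, 73],
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
}
defaults = {"dense_layers": 1, "units": 9408, "learning_rate": 0.01}
best = tune_one_at_a_time(search_space, defaults, toy_accuracy)
print(best)
```

For this search space the loop needs 7 + 8 + 4 = 19 evaluations instead of the 224 a full grid would require, at the cost of ignoring interactions between hyperparameters.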
In this section, we discuss the effect of hidden layers, perceptrons, filter number,
optimizers, number of epochs and learning rate on the performance of our model.
After that, we identify the best hyperparameters for our proposed CNN model and
compare its accuracy with the performance of transfer learning with the VGG16,
ResNet50 and Xception models.
Hyperparameter tuning is done on the hidden layers which are the convolutional
layers and dense layers. The performance of the convolutional neural network has
been found to be greatly affected by varying the numbers of hidden layers. Figure 16
shows the accuracy results of the model when different combinations of convolutional
layers and dense layers are used to build the model. The convolutional layers are tested
out with 2, 3, 4 and 5 layers while the dense layers are tested out with 1, 2, 3, 4, 5, 6
and 7 layers. Different combinations are tested, such as 2 convolutional layers with 1
dense layer, 2 convolutional layers with 2 dense layers, 5 convolutional layers with
6 dense layers, 5 convolutional layers with 7 dense layers, etc. The
best result was obtained with 2 convolutional layers and 5 dense layers, giving an
accuracy of 76%.
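The layer combinations can be enumerated mechanically. The sketch below lists every pairing of the convolutional and dense layer counts given above; whether the study evaluated all of these pairings or only a subset is not stated, so the full grid is an assumption.

```python
from itertools import product

# Sketch: enumerate (convolutional layers, dense layers) combinations.
# Assumption: the full grid; the chapter only lists some example combinations.
conv_options = (2, 3, 4, 5)
dense_options = (1, 2, 3, 4, 5, 6, 7)

combinations = list(product(conv_options, dense_options))
print(len(combinations))
for n_conv, n_dense in combinations[:3]:
    print(f"{n_conv} convolutional layers with {n_dense} dense layer(s)")
```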
Table 1 The hyperparameter utilized in optimization and its values
No. | Hyperparameter | Hyperparameter explanation | Hidden layer | Number of perceptron | Number of filter | Optimizer | Number of epochs | Learning rate
1 | Hyperparameter 1 | Effect of hidden layer (CNN layer and dense layer) | Tuning | Same as the number of perceptron after flattening | 3 | Loss = ‘sparse_categorical_crossentropy’, optimizer = ‘adam’ | 15 | 0.01 (default)
2 | Hyperparameter 2 | Effect of number of perceptron | Optimum | Tuning | 3 | Loss = ‘sparse_categorical_crossentropy’, optimizer = ‘adam’ | 15 | 0.01 (default)
3 | Hyperparameter 3 | Effect of filter number | Optimum | Optimum | Tuning | Loss = ‘sparse_categorical_crossentropy’, optimizer = ‘adam’ | 15 | 0.01 (default)
Fig. 16 Accuracy results from different combinations of convolutional layers and dense layers
During the development of the CNN model, one of the hyperparameters tested is
the number of perceptrons. Using the best model obtained from 2.3, the number of
perceptrons in the dense layers is decreased to observe its effect on the model’s
accuracy. Originally, the number of perceptrons is 9408, following the number of
outputs from the flatten layer. Then, the 9408 perceptrons are
divided successively by 2, 4, 8, 16, 32, 64 and 128. Figure 17 shows that the accuracy of the
model varies when different numbers of perceptrons are applied. The model achieved its
highest accuracy, 81%, when the number of perceptrons in the dense layers was reduced
by a factor of 64, to 147 perceptrons.
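The figure of 9408 can be checked with a short calculation. Assuming 'same' padding, two convolution and pooling blocks, and 3 filters in the last convolutional layer (the value listed in Table 1 for this stage), a 224 × 224 input is halved twice to 56 × 56, and 56 × 56 × 3 = 9408. The padding mode is an assumption, and the function name is hypothetical.

```python
# Sketch: reproduce the flatten size and the candidate perceptron counts.
# Assumptions: 'same' padding, two conv + 2x2 pooling blocks, 3 filters.

def flatten_size(height=224, width=224, filters=3, conv_blocks=2, pool=2):
    for _ in range(conv_blocks):
        # 'same' convolution keeps the size; pooling halves it.
        height, width = height // pool, width // pool
    return height * width * filters


n_flat = flatten_size()
candidates = {d: n_flat / d for d in (1, 2, 4, 8, 16, 32, 64, 128)}
for divisor, units in candidates.items():
    print(divisor, units)
```

Dividing by 64 gives exactly 147, while dividing by 128 gives 73.5, so that last setting must have been rounded in some way that the chapter does not specify.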
The convolutional layers were tested with 3, 6, 12, 24, 48, 96 and
192 filters. According to Fig. 18, increasing the number of
filters in the convolutional layers from 3 to 192 decreases the accuracy from 81 to 54%.
Different combinations of filter numbers in convolutional layers are tested. The
results are shown in Fig. 19. For example, first convolutional layer uses 3 filters while
the second convolutional layer uses 24 filters. The highest accuracy, 85% is obtained
when the first convolutional layer uses 12 filters and the second convolutional layer
uses 96 filters. Based on the results gathered, the usage of different filter numbers in
convolutional layers achieved higher accuracy than using the same filter numbers in
convolutional layers.
Fig. 18 Accuracy of the model when convolutional layers use same filter numbers
Fig. 19 Accuracy of the model when convolutional layers use different filter numbers
Based on Fig. 21, Adam reaches its own highest
accuracy faster than the other optimizers. Adam achieved an accuracy of 78% at 7
epochs. The other optimizers only reached their own highest accuracy after 13 epochs.
Adagrad reached its highest accuracy at 14 epochs. Adamax and Adagrad had fairly
consistent accuracy after 4 epochs. RMSprop reached 71% accuracy at 1
epoch. However, its accuracy was not consistent: the highest accuracy of RMSprop
across the epochs run was 85% and the lowest was 60%, with a final
accuracy of 79%.
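The comparison above relies on a few summary numbers per optimizer: the best accuracy, the epoch at which it is first reached, the lowest accuracy and the final accuracy. A sketch of how these could be extracted from per-epoch accuracy histories follows. The histories and optimizer names here are made-up placeholders, not results from the study.

```python
# Sketch: summarise per-epoch accuracy histories for several optimizers.
# The histories below are placeholders for illustration, not measured data.

def peak_summary(history):
    best = max(history)
    return {
        "best": best,
        "epoch": history.index(best) + 1,   # first epoch reaching the best value
        "lowest": min(history),
        "final": history[-1],
    }


histories = {
    "opt_a": [0.2, 0.5, 0.6, 0.6],
    "opt_b": [0.1, 0.3, 0.4, 0.7],
}
summary = {name: peak_summary(h) for name, h in histories.items()}
fastest = min(summary, key=lambda name: summary[name]["epoch"])
print(summary)
print("reaches its own best accuracy first:", fastest)
```

Note that "fastest to its own peak" and "highest peak" can point to different optimizers, as they do in this toy input.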
Learning rate is one of the important hyperparameters used in training the CNN
model. The learning rates tested in this project are 0.1, 0.01, 0.001,
0.0001, 0.00001 and 0.000001. The model reached its highest accuracy of 87% when
the learning rate was 0.001. Figure 22 shows that the accuracy improves from 23 to
87% as the learning rate decreases from 0.1 to 0.001. However, the accuracy drops to
42% when the learning rate is 0.0001. It rises to 63% when the learning
rate is 0.00001 and falls again when the learning rate is 0.000001. Therefore, it
can be concluded that 0.001 is the optimum learning rate for the CNN model. Based
on Fig. 23, the number of epochs may need to be modified for other learning rates
to reach higher accuracy.
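Why both very large and very small learning rates do poorly within a fixed number of epochs can be illustrated on a toy problem. The sketch below runs 15 steps of plain gradient descent on the loss w², which is not the CNN from this chapter: a rate that is too large overshoots and diverges, while a rate that is too small barely moves in 15 steps. The specific rates that work best here differ from those in the study.

```python
# Sketch: effect of the learning rate on plain gradient descent for loss = w**2.
# This is a toy problem for illustration, not the CNN trained in the chapter.

def final_loss(learning_rate, steps=15, w0=1.0):
    w = w0
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad
    return w * w


for lr in (1.1, 0.1, 0.001, 0.00001):
    print(lr, final_loss(lr))
```

The slow but steady progress at small rates matches the remark above that more epochs may be needed for other learning rates to reach higher accuracy.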
Fig. 23 Accuracy of the model with different learning rates in each epoch
The accuracies of the pre-trained and proposed models are shown in Table 2, where bold font
marks the best result. The model with the best performance is
our proposed model, with an accuracy of 87.00%. It is followed by ResNet50
(Freeze all with new classifier, 1024 then 4096 perceptrons, 2 dense layers) and VGG-16
(Freeze all with new classifier, 4096 perceptrons, 2 dense layers), with accuracies
of 86.00% and 81.50% respectively. These models have similar accuracy and
did not improve even when we tried other combinations of hyperparameters.
This may be due to Bayes error in our dataset, in which there are images
with almost similar features but different targets. This is plausible because almost all of our
images contain a large number of green pixels but carry different labels. Such
images are difficult to learn from and contribute to the Bayes error, which is irreducible.
Thus, our model may have achieved its optimum performance. All the pre-trained
models in the freeze-all configuration show low prediction accuracy,
ranging from 22.00 to 30.00%. This is because the pre-trained
models are complex and require more epochs to converge to their optimum accuracy.
The accuracy of the pre-trained models and the proposed model over 15 consecutive epochs
is shown in Fig. 24. In this figure, the proposed model has the highest accuracy in the
first epoch. It then increases sharply and reaches its maximum accuracy at the tenth
epoch. After the tenth epoch, it settles at the level of 80–87%. For ResNet50
(Freeze all with new classifier, 1024 then 4096 perceptrons, 2 dense layers) and VGG-16
(Freeze all with new classifier, 4096 perceptrons, 2 dense layers), the accuracy increases
gradually from the first epoch to the fifteenth epoch. However, the increase
is smaller than that of the proposed model. This means that our proposed model needs
fewer training epochs to reach its optimum, and a higher accuracy, than these
models. The other pre-trained models show no significant improvement
from the first epoch to the fifteenth epoch, although they still show an
upward trend.
Fig. 24 Accuracy of the pretrained model and proposed model in each epoch
4 Conclusion
In conclusion, the best performing model is our proposed model, with a prediction
accuracy of 87% and an architecture of 2 CNN layers (12, 96 filters) and
6 dense layers with 147 perceptrons. It also requires fewer training epochs
than the pre-trained models to achieve its optimum accuracy.
References
1. Araújo, S. O., Peres, R. S., Barata, J., Lidon, F., & Ramalho, J. C. (2021). Characterising the
agriculture 4.0 landscape—Emerging trends, challenges and opportunities. Agronomy, 11(4),
667.
2. Fennimore, S. A., Slaughter, D. C., Siemens, M. C., Leon, R. G., & Saber, M. N. (2016).
Technology for automation of weed control in specialty crops. Weed Technology, 30(4), 823–
837.
3. Jamei, M., Karbasi, M., Malik, A., Abualigah, L., Islam, A. R. M. T., & Yaseen, Z. M. (2022).
Computational assessment of groundwater salinity distribution within coastal multi-aquifers
of Bangladesh. Scientific Reports, 12(1), 1–28.
4. Sarig, Y. (1993). Robotics of fruit harvesting: A state-of-the-art review. Journal of Agricultural
Engineering Research, 54(4), 265–280.
5. Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016). Deepfruits: A fruit
detection system using deep neural networks. Sensors, 16(8), 1222.
6. Daradkeh, M., Abualigah, L., Atalla, S., & Mansoor, W. (2022). Scientometric analysis and
classification of research using convolutional neural networks: A case study in data science
and analytics. Electronics, 11(13), 2066.
7. AlShourbaji, I., Kachare, P., Zogaan, W., Muhammad, L. J., & Abualigah, L. (2022). Learning
features using an optimized artificial neural network for breast cancer diagnosis. SN Computer
Science, 3(3), 1–8.
8. ud Din, A. F., Mir, I., Gul, F., Mir, S., Saeed, N., Althobaiti, T., Abbas, S. M., & Abualigah, L.
(2022). Deep reinforcement learning for integrated non-linear control of autonomous UAVs.
Processes, 10(7), 1307.
9. Alkhatib, K., Khazaleh, H., Alkhazaleh, H. A., Alsoud, A. R., & Abualigah, L. (2022). A
new stock price forecasting method using active deep learning approach. Journal of Open
Innovation: Technology, Market, and Complexity, 8(2), 96.
10. Shehab, M., Abualigah, L., Shambour, Q., Abu-Hashem, M. A., Shambour, M. K. Y., Alsalibi,
A. I., & Gandomi, A. H. (2022). Machine learning in medical applications: A review of state-
of-the-art methods. Computers in Biology and Medicine, 145, 105458.
11. Ezugwu, A. E., Ikotun, A. M., Oyelade, O. O., Abualigah, L., Agushaka, J. O., Eke, C.
I., & Akinyelu, A. A. (2022). A comprehensive survey of clustering algorithms: State-of-
the-art machine learning applications, taxonomy, challenges, and future research prospects.
Engineering Applications of Artificial Intelligence, 110, 104743.
12. Wu, D., Wang, S., Liu, Q., Abualigah, L., & Jia, H. (2022). An improved teaching-learning-
based optimization algorithm with reinforcement learning strategy for solving optimization
problems. Computational Intelligence and Neuroscience.
13. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
14. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
15. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
16. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
17. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
18. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
19. Hong, S., Noh, H., & Han, B. (2015). Decoupled deep neural network for semi-supervised
semantic segmentation. Advances in Neural Information Processing Systems, 28.
20. Jagtap, U. B., & Bapat, V. A. (2010). Artocarpus: A review of its traditional uses, phytochem-
istry and pharmacology. Journal of Ethnopharmacology, 129(2), 142–166.
Rambutan Image Classification Using
Various Deep Learning Approaches
Nur Alia Anuar, Loganathan Muniandy, Khairul Adli Bin Jaafar, Yi Lim,
Al Lami Lamyaa Sabeeh, Putra Sumari, Laith Abualigah,
Mohamed Abd Elaziz, Anas Ratib Alsoud, and Ahmad MohdAziz Hussein
Abstract Rambutan (Nephelium lappaceum L.) is a widely grown and favored fruit
in tropical countries such as Malaysia, Indonesia, Thailand, and the Philippines.
This fruit is classified into tens of different cultivars based on fruit, flesh, and tree
features. In this project, classification models for five different rambutan cultivars were
developed using deep learning techniques, based on a dataset of 1000 rambutan images.
Two common deep learning methods for image classification, the Convolutional
Neural Network (CNN) and transfer learning, were applied to recognize each
rambutan variant. Results show that the VGG16 pre-trained model performed
best, achieving 96% accuracy on the test dataset. This indicates the model is
reliable for the rambutan classification task.
1 Introduction
Computer vision is a subfield of Artificial Intelligence (AI) responsible for “teaching”
the machine to understand and interpret the visual world, such as digital images
or videos. The rise of big data, faster and cheaper computing resources, and new
algorithms have contributed to the widespread adoption of this domain. Image classification is one
of the computer vision approaches applied to various fields including technology,
medicine, manufacturing, and agriculture. In agriculture, automated fruit image
recognition can assist in quality control and in the development of robotic harvesting systems
for orchards [1].
Fruit image recognition systems are used to classify different types of fruits and
to differentiate different fruit variants of a single fruit type [2, 3]. Rambutan is an
exotic fruit that grows mainly in the Southeast Asian region and is particularly popular
in Malaysia. It comes in different varieties or cultivars such as Binjai, Gading,
Gula Batu, Jarum Mas, and Rongrien [4]. These cultivars look alike to the
naked eye. Hence, an image recognition system powered by deep learning methods
can be applied in classifying rambutan cultivars accurately [5–11].
The Convolutional Neural Network (CNN) algorithm consistently shows remarkable
performance on image classification tasks on image databases including the
MNIST database, the NORB database, and the CIFAR10 dataset [12]. Besides CNN,
transfer learning is among the popular methods used by researchers for image
classification. Transfer learning relies on a pre-trained model, that is, a network
that was trained on a huge dataset and achieved state-of-the-art performance.
In this paper, we study rambutan cultivar classification using deep learning
models, namely a CNN and transfer learning.
2 Literature Review
Deep learning enables a computer model to learn and perform classification tasks
directly from various types of data such as images, text, or audio [13–16]. It
provides a high accuracy rate when models are trained using a large amount of
labeled data and neural network architectures that contain many layers [17]. The
relevant features are learned while the network trains on a collection of data.
This feature extraction during training makes deep learning models highly accurate
for computer vision tasks such as object classification. It has become one of the
core technologies for critical artificial intelligence applications, including medical
diagnosis for screening various types of cancer [18]. Most recently, image
classification was used for Covid-19 screening based on chest X-ray and CT images
of patients [19].
Deep learning achieves strong performance in many applications, including
fruit classification. There are research works on fruit classification with different
goals and applications [20], one of which is agriculture. However, deep learning
has the drawback of requiring exceptionally high processing power due to its
massive number of parameters, which can easily reach the millions. Hence, a
lightweight deep learning architecture is needed to speed up the diagnosis
without sacrificing accuracy.
In this section, we review several previous attempts to use neural networks and
deep learning for fruit recognition. On the topic of detecting fruits from images
using deep neural networks, paper [21] presents a network trained to recognize fruits.
The authors adapt a Faster Region-based convolutional network. The objective is
to create a neural network that can be used by autonomous robots that harvest
fruits. The network is trained using RGB and NIR (near infra-red) images. The
RGB and NIR models are combined in two separate ways, named early and late
fusion. The result is a multi-modal network that obtains much better performance
than the existing networks.
Another paper [22] uses two backpropagation neural networks trained on images
of apple trees of the “Gala” variety to predict the yield for the upcoming season.
For this task, four features were extracted from the images: the total cross-sectional
area of fruits, the fruit number, the total cross-sectional area of small fruits, and
the cross-sectional area of foliage. It was found that deep learning methods were
highly useful for classifying the fruits effectively. Other optimization methods can
also be used to optimize such problems, as given in [23–28].
In this paper, we use several deep learning models, namely convolutional neural
networks (CNN), residual networks (ResNet), and VGG16.
3.1 CNN
Specifically, pooling is applied after a nonlinearity has been applied to the feature
maps output by a convolutional layer. Max pooling and average pooling are the most
common pooling operations, while ReLU is the common choice of activation function
for propagating gradients during training by backpropagation.
In our work, we propose a CNN model for classifying five rambutan types:
Binjai, Gading, Gula Batu, Jarum Mas, and Rongrien. The model consists
of four convolutional layers. The first convolutional layer uses 32 filters of size
3 × 3 with a kernel regularizer of 0.001. The regularizer adds penalties on the layer
during optimization; these penalties are included in the loss function that the
network optimizes. Padding is used to ensure that the input and output tensors keep
the same spatial size. The input image size is 224 × 224 × 3. Batch normalization is
applied after each convolution, before the activation. ReLU, the rectified linear
activation function, is used as the activation function at every convolution.
This activation function ensures that the output is either positive or zero. The
output of each convolutional layer is given as input to a max-pooling layer with a
pool size of 2 × 2. This layer down-samples the feature maps, which reduces the
number of parameters in the later layers and thus the memory and time required
for computation, so that only the features needed for the classification are
aggregated. Dropout rates of 0.3, 0.2, and 0.1 are applied, respectively, starting
from the second convolutional layer. This aims to reduce the model complexity, to
prevent overfitting, and to reduce the computation power and time at each
convolution. The second convolutional layer uses 64 filters, the third uses 128
filters, and the fourth uses 256 filters, each with a 2 × 2 kernel size. Before the
dense layers, the feature map of the fourth convolution is flattened. Finally, we use
a fully connected part with 4 dense layers and a dropout of 0.5, ending with a
SoftMax classifier. The loss function is categorical cross-entropy, and the Adam
optimizer is used with a learning rate of 0.001.
The architecture of the proposed CNN model is shown in Figs. 2, 3, and 4. Figure 5
shows the expected classification output from the model.
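To make the effect of the four convolution and pooling stages concrete, the following minimal Python sketch traces the feature-map shape through the network. It is an illustration rather than the authors' code: it assumes stride-1 convolutions with "same" padding (as the text states that padding keeps the tensor size) and non-overlapping 2 × 2 pooling; the helper names are hypothetical.

```python
def conv_block_shape(height, width, filters, pool=2):
    """Shape after a stride-1, 'same'-padded convolution and a pool x pool max-pool."""
    # 'same' padding keeps height and width; pooling divides them (floor division).
    return height // pool, width // pool, filters


def trace_shapes(input_shape=(224, 224, 3), filter_counts=(32, 64, 128, 256)):
    """Return the output shape of each convolution + pooling block."""
    height, width, _ = input_shape
    shapes = []
    for filters in filter_counts:
        height, width, channels = conv_block_shape(height, width, filters)
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes()
# Number of values handed to the first dense layer after flattening.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

for index, shape in enumerate(shapes, start=1):
    print(f"block {index}: {shape}")
print("flattened features:", flattened)
```

Under these assumptions each pooling stage halves the spatial size, which is why pooling lowers the cost of the dense layers that follow the flattening step.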
3.2.1 ResNet
Residual networks (ResNet) were developed by the Microsoft Research team for
image recognition using deep residual learning. This approach secured 1st place
in the ILSVRC 2015 classification task. The deep residual learning architecture
was developed to address the degradation problem, which occurs as the number of
stacked layers (depth) increases. Despite being considerably deeper than VGG nets,
the networks have a lower complexity [29]. The models were trained on over 1.28
million images and evaluated on 50,000 validation images. ResNet was constructed
with five convolutional blocks in variants of 18, 34, 50, 101, and 152 layers.
We apply the ResNet-50 and ResNet-101 pre-trained models to our rambutan type
classification task using the Keras library. Either all or only some of the ResNet
convolutional blocks are frozen, to study the effect of using a fully frozen
pre-trained model versus a partially retrained one. A new classifier is added,
consisting of two dense layers with 256 neuron units per layer and a SoftMax
activation function. The Adaptive Moment Estimation (Adam) optimizer is used to
compute the optimum weights of the classifier layers with different learning rates.
The categorical cross-entropy loss function is used to calculate the loss between
predictions and actual labels. Figure 6 shows the architecture of the ResNet model.
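As a small worked example of how a SoftMax output and the categorical cross-entropy loss fit together, the sketch below computes both for a single image in plain Python. The logit values are hypothetical and chosen only for illustration; the class order is likewise an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def categorical_cross_entropy(one_hot_label, probabilities, eps=1e-12):
    """Loss between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot_label, probabilities))


classes = ["Binjai", "Gading", "Gula Batu", "Jarum Mas", "Rongrien"]
logits = [0.2, 2.5, 0.1, -0.3, 0.4]  # hypothetical network outputs for one image
label = [0, 1, 0, 0, 0]              # one-hot label: the image shows Gading

probs = softmax(logits)
loss = categorical_cross_entropy(label, probs)
predicted = classes[probs.index(max(probs))]
print(predicted, round(loss, 4))
```

Because the label is one-hot, the loss reduces to the negative log of the probability assigned to the true class, so confident correct predictions give a loss close to zero.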
Fig. 5 Snapshot of a part of rambutan classifications output from the CNN model
3.2.2 VGG
Transfer learning is the reuse of a pre-trained model on a new problem. It is popular
in deep learning because it allows deep neural networks to be trained with
comparatively little data. This is very useful since most real-world problems typically
do not have millions of labeled data points to train such complex models [30].
In transfer learning, the knowledge of an already trained machine learning model
is applied to a different but related problem. We try to exploit what has been
learned in one task to improve generalization in another by transferring the weights
that a network has learned on “task A” to a new “task B” [31].
VGG16 is one of the pre-trained models used for transfer learning. The model
achieves 92.7% top-5 test accuracy on ImageNet, which is a dataset of over 14 million
images belonging to 1000 classes [32]. It was one of the famous models submitted to
ILSVRC-2014. VGG16 improved upon AlexNet by replacing its large kernels (11 × 11
and 5 × 5 in the first and second layers) with smaller 3 × 3 kernels.
The VGG16 architecture accepts a fixed input size of 224 × 224 RGB images and has
a total of 138 million parameters. The architecture comprises 5 blocks of
convolutional layers, each followed by a max-pool layer, and ends with three fully
connected layers with 4096, 4096, and 1000 neurons, respectively. The last fully
connected layer is the SoftMax layer for classification. VGG16 uses a very small
kernel size of 3 × 3, and every convolutional layer is followed by a non-linear
ReLU activation function. Every block contains at least two and at most three
convolutional layers, and the number of filters increases in powers of two from
64 to 512 [33]. Figure 9 shows the architecture of VGG16.
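The figure of 138 million parameters can be checked with a short calculation. The sketch below counts weights and biases layer by layer, assuming the standard VGG16 layout of 2, 2, 3, 3, and 3 convolutional layers per block; the helper function names are illustrative and not part of any library.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one kernel x kernel convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights plus biases of one fully connected layer."""
    return in_units * out_units + out_units


# (number of convolutional layers, number of filters) for each of the 5 blocks
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

channels, size, total_conv = 3, 224, 0
for layers, filters in blocks:
    for _ in range(layers):
        total_conv += conv_params(channels, filters)
        channels = filters
    size //= 2  # each block ends with a 2 x 2 max-pool

flat = size * size * channels  # values entering the first dense layer
total_fc, units_in = 0, flat
for units in (4096, 4096, 1000):
    total_fc += dense_params(units_in, units)
    units_in = units

total = total_conv + total_fc
print(f"conv: {total_conv:,}  dense: {total_fc:,}  total: {total:,}")
```

The calculation shows that most of the parameters sit in the three fully connected layers, which helps explain why replacing them with a smaller classifier is a common choice in transfer learning.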
After loading the VGG16 pre-trained model, all layers were frozen except for the
last 5, because the last few layers represent higher-level combinations of
lower-level features and we want to train these layers to suit our problem (Fig. 10).
Then a sequential model was created by adding the VGG convolutional base model and
some fully connected layers: a Flatten layer and 3 Dense layers with 1024, 1024,
and 5 units, respectively. The first 2 Dense layers use the ReLU activation function.
After the second Dense layer, a Dropout layer with a rate of 0.5 is added to
reduce overfitting. The final Dense layer is the classification layer with a SoftMax
activation function. The model is trained with the Adam optimizer at a learning
rate of 0.0001. The model summary is shown in Fig. 11.
3.3 Dataset
There are various types of rambutan. However, we collected images of only five
types (Gading, Binjai, Gula Batu, Jarum Mas, and Rongrien), with 200 images per
label, so the total size of the dataset used in this work is 1000 images. All images
are resized to 224 × 224 pixels and split into training (80%), validation (10%),
and testing (10%) sets. Figure 12 shows the types of rambutan available.
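A minimal sketch of an 80/10/10 split is given below. It assumes the split is done per class (stratified), which the text does not state, and it uses hypothetical file names; only the class names, the 200 images per class, and the split ratios come from the description above.

```python
import random


def stratified_split(paths_by_class, train=0.8, val=0.1, seed=0):
    """Split each class separately so every subset keeps the class balance."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in paths_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train)
        n_val = round(len(shuffled) * val)
        splits["train"] += [(p, label) for p in shuffled[:n_train]]
        splits["val"] += [(p, label) for p in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in shuffled[n_train + n_val:]]
    return splits


classes = ["Gading", "Binjai", "Gula Batu", "Jarum Mas", "Rongrien"]
# Hypothetical file names standing in for the 200 collected images per class.
paths_by_class = {name: [f"{name}/{i:03d}.jpg" for i in range(200)] for name in classes}

splits = stratified_split(paths_by_class)
sizes = {name: len(items) for name, items in splits.items()}
print(sizes)
```

With 200 images per class this yields 160 training, 20 validation, and 20 test images for each cultivar.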
Table 1 Segmentation of F1-score for CNN
Rambutan class | F1-score (%)
Binjai | 77
Gading | 99
Gula Batu | 82
Jarum Mas | 74
Rongrien | 68
possible. The ReLU activation function was used for all layers except the final
output layer for class prediction, which used the SoftMax activation function.
With all the parameters set, the overall accuracy of the model was 79% on
the test set. The model was trained until the 40th epoch, where it achieved its
lowest loss. Table 1 shows the per-class segmentation of the F1-score for
this baseline model.
Gading rambutan has the highest classification score at 99% and Rongrien has the
lowest score at 59%. Of the 5 classes of rambutan, Gading has a distinctive
yellow color while the others are red. This is well extracted by the model as the
defining feature for Gading. On the other hand, the visible features of the other
4 rambutans may overlap, hence giving lower classification performance. Looking
further into the performance of Rongrien (lowest performance), its recall score is
also significantly lower than that of the other classes, at only 47%. This means that
the false-negative rate for Rongrien is high, i.e., Rongrien is commonly mistaken
for other classes of rambutan.
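The relation between recall, false negatives, and the F1-score can be illustrated with a short calculation. The counts below are hypothetical and are chosen only so that the recall equals 47%; they are not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one class: 47 of 100 true samples are found (recall 47%),
# 53 are missed (false negatives), and 10 samples of other classes are mislabeled
# as this class (false positives).
precision, recall, f1 = precision_recall_f1(tp=47, fp=10, fn=53)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A low recall pulls the F1-score down even when precision is fairly high, because F1 is the harmonic mean of the two.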
Starting from the baseline model, we then varied the training parameters, namely
batch size, number of epochs, and number of convolutional layers, to observe the
model’s performance. The parameters were changed one at a time, with the rest of
the parameters fixed as in the baseline model. The observations are shown in
Tables 2, 3 and 4. For the convolutional layers, at most 6 layers can be used before
max-pooling causes a negative dimension, hence the number of layers was varied to
be slightly lower and higher than in the baseline model (2 to 6 layers).
Combining the best-performing value of each parameter gives the performance shown
in Table 5.
The overall performance is much worse than that of the baseline model when all the
best parameters are combined. Inspection of the per-class results shows that
predictions were made for only a single type of rambutan. The main contributor to
this is the small batch size: changing the batch size back to its baseline value of
128 gives an accuracy of 77%, which is still lower than the baseline model. Hence,
the baseline model with 4 convolutional layers and a batch size of 128, trained with
early stopping, gave the highest accuracy at 79%.
We used two different transfer learning models, as discussed in the previous
section: VGG16 and ResNet. For both transfer learning models, we unfroze some of
the layers for training.
4.2.1 ResNet
Two parameters were tested for the ResNet model, namely batch size and learning
rate.
Three batch sizes were tested for ResNet: 32, 64, and 128. Table 6 shows the
model accuracy summary for each batch size tested. One interesting observation
is that unfreezing some layers improved the model’s performance, and this effect is
more noticeable than the effect of batch size. Within each model, changing the batch
size does not significantly improve the accuracy, except for ResNet101, whose
accuracy jumped from 20 to 77% when the batch size increased from 32 to 64.
Doubling the batch size to 128, however, does not bring any further significant
improvement. As for the partially frozen models, the unfrozen layers can extract and
learn the distinctive features of our dataset, which improved their performance.
For the learning rate, we used two lower learning rates (0.01, 0.05) and two higher
learning rates (0.1, 0.5). Table 7 shows the summary of the model performance results.
For 3 models (ResNet50, ResNet50 partially frozen, and ResNet101), the observed
trend is that accuracy increased with the learning rate before reaching a plateau.
All models were trained for 50 epochs with early stopping. With a lower learning
rate, the model may still be far from the lowest-loss solution when training stops,
whereas with a higher learning rate it may be closer to the optimized solution when
training ends, either by reaching the final epoch or by early stopping. On the other
hand, increasing the learning rate for the ResNet101 partially frozen model caused
the performance to drop slightly, for the opposite reason of overshooting the
optimized solution (Fig. 13).
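The two effects described above, stopping too early with a small learning rate and overshooting with a large one, can be reproduced on a toy problem. The sketch below minimizes f(w) = w² with plain gradient descent for a fixed budget of 50 steps. It is not the ResNet training itself; the learning rate of 1.1 is a hypothetical value added only to show divergence on this particular function.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(w) = w**2 with plain gradient descent and return the final loss."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * w           # derivative of w**2
        w -= learning_rate * gradient
    return w * w


# 0.01 to 0.5 are the rates tested in the text; 1.1 is a hypothetical extreme.
for lr in (0.01, 0.05, 0.1, 0.5, 1.1):
    print(f"learning rate {lr}: final loss {gradient_descent(lr):.3e}")
```

On this toy function the smallest rate is still far from the minimum after 50 steps, the intermediate rates get very close, and the largest rate overshoots on every step so that the loss grows instead of shrinking.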
4.2.2 VGG16
The VGG16 model was tested with different architectures, batch sizes, epochs, and
optimizers. The batch sizes used for VGG16 training are 100, 128, and 256.
As mentioned previously, all the layers are frozen except for the last few layers.
The model’s performance differs across batch sizes and architectures.
Table 8 shows the VGG16 performance summary, where bold font refers to the
best result. Model 2 performed exceptionally well compared to the other models, with
96% accuracy. Model 2 was trained for 125 epochs with a batch size of 128 and the
Adam optimizer. On the other hand, the models with the same architecture and the
Adam optimizer (Model 1, Model 3) achieved a validation accuracy of 89% with a
batch size of 256 and 91% with a batch size of 100. Model 4, with the SGD
optimizer, achieved a validation accuracy of 87% for a batch size of 256. Model 5,
with the RMSprop optimizer, also performed well with a validation accuracy of 95%.
Model 6 and Model 7 used the same architecture and the Adam optimizer but
different batch sizes. Model 6, with a batch size of 256, achieved a validation
accuracy of 87%, whereas Model 7, with a batch size of 128, achieved a validation
accuracy of 94%. The performance of the model improves when the batch size
decreases from 256 to 128.
Within each model, changing the batch size does not significantly improve the
accuracy. With the Adam optimizer, however, changing the batch size from 256 to 128
improves the validation accuracy from 89 to 96%. Compared to the other optimizers,
RMSprop achieved a good validation accuracy of 95% for a batch size of 256.
Increasing the number of layers in the architecture does not bring any further
significant improvement in model performance. The training history and the
performance metrics of the best model are shown in Fig. 14 (Fig. 15; Table 9).
The overall accuracy of the best model was 96% on the validation set. The model
was trained for 125 epochs with a batch size of 128 and the Adam optimizer. The
per-class F1-scores of this best model are given in Table 10.
Gading rambutan has the highest F1-score at 100% and Binjai has the lowest
score at 92%. As discussed previously, Gading has a distinctive yellow color while
the others are red, which is well extracted by the model. On the other hand, the
visible features of the other 4 rambutans may overlap, hence giving lower
classification performance compared to Gading. Nevertheless, the model is still able
to classify each type of rambutan with high accuracy compared to the other models
discussed previously. Based on its having the highest accuracy, we recommend VGG16
as the classifier for the listed rambutan types.
Table 8 (continued)
Model (VGG16) | Optimizer | Batch size | Epochs | Fully connected layer | Training accuracy % | Testing accuracy %
Model 7 | Adam | 128 | 125 | Flatten + 3 dense layers with filter sizes (4098, 1024, 512) + dropout (0.5) + output layer | 98.87 | 94
Table 10 F1-score segmentation for VGG16 best model
Rambutan class | F1-score (%)
Binjai | 92
Gading | 100
Gula Batu | 95
Jarum Mas | 97
Rongrien | 98
5 Concluding Remarks
The use of a convolutional neural network to classify rambutan shows immense
potential to correctly identify the type of rambutan. The initial hypothesis that all
types of transfer learning models would outperform the conventional,
built-from-scratch CNN model is supported: both the ResNet and VGG models yield
higher accuracy than the conventional CNN model. Between the two transfer learning
models, VGG16 has the better accuracy in classifying all the types of rambutan,
achieving 96% overall accuracy compared to 85% for ResNet50. VGG16 also identifies
each type of rambutan well, with each type of rambutan correctly classified at more
than 90%. The built-from-scratch CNN model has the lowest accuracy, with the best
model achieving 79%. Rambutan Gading has the highest accuracy among the types of
rambutan, which is believed to be due to its distinct color being extracted well by
the model. For the next training iteration, we suggest removing Rambutan Gading so
that the model can fully extract the defining features of the other 4 types of
rambutan. This system is a basis for future work. It is recommended that future
research expand the size of the dataset to classify more varieties of rambutan so
that the system can be applied in the agriculture field.
References
1. Risdin, F., Mondal, P. K., & Hassan, K. M. (2020). Convolutional neural networks (CNN)
for detecting fruit information using machine learning techniques. IOSR Journal of Computer
Engineering (IOSR-JCE), 22(2), 1–13.
2. Morton, J. F. (1987). Fruits of warm climates. Morton.
3. Rojas-Aranda, J. L., Nunez-Varela, J. I., Cuevas-Tello, J. C., & Rangel-Ramirez, G. (2020).
Fruit classification for retail stores using deep learning. Lecture Notes in Computer Science,
12088, 3–13.
4. Goenaga, R., & Jenkins, D. (2011). Yield and fruit quality traits of rambutan cultivars grafted
onto a common rootstock and grown at two locations in Puerto Rico. HortTechnology, 21(1),
136–140.
5. Abualigah, L., Al-Okbi, N. K., Elaziz, M. A., & Houssein, E. H. (2022). Boosting marine
predators algorithm by salp swarm algorithm for multilevel thresholding image segmentation.
Multimedia Tools and Applications, 81(12), 16707–16742.
6. Mehbodniya, A., Douraki, B. K., Webber, J. L., Alkhazaleh, H. A., Elbasi, E., Dameshghi,
M., Abu Zitar, R., & Abualigah, L. (2022). Multilayer reversible data hiding based on the
difference expansion method using multilevel thresholding of host images based on the slime
mould algorithm. Processes, 10(5), 858.
7. Otair, M., Abualigah, L., & Qawaqzeh, M. K. (2022). Improved near-lossless technique using
the Huffman coding for enhancing the quality of image compression. Multimedia Tools and
Applications, 1–21.
8. Liu, Q., Li, N., Jia, H., Qi, Q., & Abualigah, L. (2022). Modified remora optimization algorithm
for global optimization and multilevel thresholding image segmentation. Mathematics, 10(7),
1014.
9. Lin, S., Jia, H., Abualigah, L., & Altalhi, M. (2021). Enhanced slime mould algorithm for
multilevel thresholding image segmentation using entropy measures. Entropy, 23(12), 1700.
10. Ewees, A. A., Abualigah, L., Yousri, D., Sahlol, A. T., Al-qaness, M. A., Alshathri, S., & Elaziz,
M. A. (2021). Modified artificial ecosystem-based optimization for multilevel thresholding
image segmentation. Mathematics, 9(19), 2363.
11. Abualigah, L., Diabat, A., Sumari, P., & Gandomi, A. H. (2021). A novel evolutionary arith-
metic optimization algorithm for multilevel thresholding segmentation of Covid-19 CT images.
Processes, 9(7), 1155.
12. Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification:
A comprehensive review. Neural Computation, 29(9), 2352–2449.
13. Sumari, P., Syed, S. J., & Abualigah, L. (2021). A novel deep learning pipeline architecture
based on CNN to detect Covid-19 in chest X-ray images. Turkish Journal of Computer and
Mathematics Education (TURCOMAT), 12(6), 2001–2011.
14. Kadyan, V., Singh, A., Mittal, M., & Abualigah, L. (2021). Deep learning approaches for
spoken and natural language processing.
15. Abuowaida, S. F. A., Chan, H. Y., Alshdaifat, N. F. F., & Abualigah, L. (2021). A novel instance
segmentation algorithm based on improved deep learning algorithm for multi-object images.
Jordanian Journal of Computer and Information Technology (JJCIT), 7(01), 10–5455.
16. Danandeh Mehr, A., Rikhtehgar Ghiasi, A., Yaseen, Z. M., Sorman, A. U., & Abualigah,
L. (2022). A novel intelligent deep learning predictive model for meteorological drought
forecasting. Journal of Ambient Intelligence and Humanized Computing, 1–15.
17. MathWorks. (2021). What is deep learning? How it works, techniques & applications. Math-
Works. [Online]. https://www.mathworks.com/discovery/deep-learning.html. Accessed July
01, 2021.
18. Ardila, D., Kiraly, A. P., Bharadwaj, S., Choi, B., Reicher, J. J., Peng, L., Tse, D., Etemadi, M.,
Ye, W., Corrado, G., Naidich, D. P., & Shetty, S. (2019). End-to-end lung cancer screening with
three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine,
25(6), 954–961.
19. Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., Cai, M., Yang, J., Li, Y., Meng, X., &
Xu, B. (2021). A deep learning algorithm using CT images to screen for Corona virus disease
(COVID-19). European Radiology, 31(8), 6096–6104.
20. Hameed, K., Chai, D., & Rassau, A. (2018). A comprehensive review of fruit and vegetable
classification techniques. Image and Vision Computing, 80, 24–44.
21. Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016). DeepFruits: A fruit
detection system using deep neural networks. Sensors, 16(8), 1222.
22. Cheng, H., Damerow, L., Sun, Y., & Blanke, M. (2017). Early yield prediction using image
analysis of apple fruit and tree canopy features with neural networks. Journal of Imaging, 3(1),
6.
23. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
24. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
25. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
26. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
27. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
28. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
29. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).
30. Qassim, H., Verma, A., & Feinzimer, D. (2018). Compressed residual-VGG16 CNN model for
big data places image recognition. In 2018 IEEE 8th annual computing and communication
workshop and conference (CCWC).
31. Ferguson, M., Ak, R., Lee, Y.-T. T., & Law, K. H. (2017). Automatic localization of casting
defects with convolutional neural networks. In 2017 IEEE international conference on big data
(big data) (pp. 1726–1735).
32. Naranjo-Torres, J., Mora, M., Hernández-García, R., Barrientos, R. J., Fredes, C., & Valenzuela,
A. (2020). A review of convolutional neural network applied to fruit image processing. Applied
Sciences, 10(10), 3443.
33. ul Hassan, M. (2021). VGG16—Convolutional network for classification and detection.
Neurohive, November 20, 2018. [Online]. https://neurohive.io/en/popular-networks/vgg16/.
Accessed July 31, 2021.
Mango Varieties Classification-Based
Optimization with Transfer Learning
and Deep Learning Approaches
Chen Ke, Ng Tee Weng, Yifan Yang, Zhang Ming Yang, Putra Sumari,
Laith Abualigah, Salah Kamel, Mohsen Ahmadi,
Mohammed A. A. Al-Qaness, Agostino Forestiero, and Anas Ratib Alsoud
Abstract Mango is one of the best-known tropical fruits native to South Asia, and
there are currently over 500 known varieties of mango. Depending on the variety,
mango fruit can vary in size, skin color, shape, sweetness, and flesh color, which
may be pale yellow, gold, or orange. However, it is sometimes difficult to tell
which type of mango a fruit is. Thus, in this paper, a classification approach for
four types of mango is presented. We use a convolutional neural network (CNN)
algorithm and transfer learning methods (VGG16 and Xception) to train on the 1000
mango images collected and obtain a deep learning model that is able to classify
four types of mango (Alampur Baneshan, Alphonso, Harum Manis and Keitt)
automatically. In summary, the objective of this paper is to develop a deep
learning algorithm to automatically classify four types of mango cultivar.
1 Introduction
2 Methodology
2.1 Dataset
The data set for this study consists of 1000 mango photographs divided into 4
categories (Alampur Baneshan, Alphonso, Harum Manis and Keitt), with 250 images
for each category, all collected from Google Images. Figure 1 shows some examples
for each type of mango.
All the images have 3 color channels and are resized to 224 × 224. Moreover, data
augmentation is used to increase the robustness of the model. Using this data, we
train models with three different deep learning algorithms: one convolutional
neural network and two transfer learning methods. In section two we discuss
literature related to the topic; in the following section we present the deep
learning models we designed and discuss the performance of the trained models.
2.2.1 Augmentation
Data augmentation is an important step in data processing. It increases the data
size by transforming the images, for example by rotating, magnifying, or changing
the color intensity. This helps to prevent overfitting and, at the same time,
enhances the generalization ability of the model. In all the experiments, we use the
ImageDataGenerator function to augment the input image data. Figure 2 shows the
augmentation code that we use in the experiments.
In the first row, we convert the RGB values from the range 0–255 to 0–1.
Secondly, we randomly rotate the image by an angle between 0 and 180 degrees. Next,
in the third and fourth rows, we randomly shift the image in the vertical or
horizontal direction. In the fifth row, we apply a random shear transform to the
image. Moreover, in the sixth row, the zoom function is used to randomly scale the
image to different sizes. Furthermore, horizontal_flip flips the image horizontally
with a random probability of 50%. Lastly, the nearest fill mode is the strategy used
to fill in the pixels left empty after augmentations such as rotation or translation.
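To show what three of these steps do to the pixel values, the sketch below re-implements rescaling, a random horizontal flip, and a shift with nearest-value filling on a tiny image stored as nested Python lists. This is an illustration of the operations only and is not the ImageDataGenerator code of Fig. 2; all function names are hypothetical.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Map pixel values from the range 0-255 to the range 0-1."""
    return [[pixel * factor for pixel in row] for row in image]


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror the image left to right with the given probability."""
    if rng.random() < probability:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


def shift_right_nearest(image, pixels):
    """Shift right by `pixels` columns, filling the gap with the nearest edge value."""
    return [[row[max(col - pixels, 0)] for col in range(len(row))] for row in image]


rng = random.Random(0)
image = [[0, 128, 255],
         [10, 20, 30]]  # a tiny 2 x 3 single-channel image

scaled = rescale(image)
flipped = random_horizontal_flip(image, rng, probability=1.0)  # force the flip
shifted = shift_right_nearest(image, pixels=1)
print(scaled[0], flipped[1], shifted[1])
```

In the shifted image the first column is repeated, which is the behavior the nearest fill strategy describes: empty pixels take the value of the closest original pixel.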
network consists of one or more convolution layers with fully connected layers at
the top, together with associated weights and pooling layers. Compared with other
deep learning structures, convolutional neural networks can give better results in
image and speech recognition.
Firstly, the training set data is augmented, because deep learning generally requires
a sufficient number of samples. The more samples there are, the better the trained
model performs and the stronger its generalization ability. The input images undergo
simple transformations such as translation, scaling, and color change.
As shown in Fig. 3, the CNN architecture consists of five convolution layers, each
followed by a maximum pooling layer, and two fully connected layers. The network
input is a 224 × 224 × 3 pixel RGB image. Each convolution layer uses kernels of
size 3 × 3 with ReLU as the activation function, and each maximum pooling layer has
a size of 2 × 2. Convolution layers 1 to 5 contain 32, 64, 128, 256, and 512
convolution kernels, respectively.
Flatten layer: Enter the fully connected layer from multi-dimensional input to
one-dimensional.
Full connection layer: (density (256, activation = ‘relu’)). Then dropout and relu
of 0.5 are used for faster convolution calculation. Finally, the classification layer
(density (4), activation = ‘softmax’) is used to predict the output of the model and
50 C. Ke et al.
represent the four different kinds of mangoes. SGD: we set the parameters of the SGD
optimizer to LR = 0.001, decay = 1e−6, momentum = 0.9, nesterov = True.
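The architecture described above could be expressed with the Keras Sequential API roughly as follows. This is a sketch under assumptions rather than the authors' code: it assumes TensorFlow 2.x, default ("valid") padding, and a categorical cross-entropy loss, none of which are stated in the text. The constant and function names are hypothetical.

```python
# Layer specification taken from the text. TensorFlow is imported lazily
# inside build_model() so the specification can be inspected without it.
CONV_FILTERS = (32, 64, 128, 256, 512)
DENSE_UNITS = 256
DROPOUT_RATE = 0.5
NUM_CLASSES = 4

def build_model(input_shape=(224, 224, 3)):
    """Build and compile the CNN (assumes TensorFlow 2.x is installed)."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([keras.Input(shape=input_shape)])
    for filters in CONV_FILTERS:
        model.add(layers.Conv2D(filters, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(DENSE_UNITS, activation="relu"))
    model.add(layers.Dropout(DROPOUT_RATE))
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    # The decay = 1e-6 setting from the text is omitted here because the
    # argument is not accepted by every Keras version.
    optimizer = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9,
                                     nesterov=True)
    # Assumption: the loss function is not named in the text.
    model.compile(optimizer=optimizer, loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_model().summary()` in an environment with TensorFlow would print the per-layer output shapes in the same form as a Keras model summary.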
2.4.1 VGG16
2.4.2 Xception
makes the architecture very easy to define and modify based on requirements. Using
advanced libraries such as Keras or TensorFlow-Slim requires very little code.
Fig. 5 Xception
3 Experiment Result
3.1 CNN
The data set contains 1000 mango images, all 224 × 224 pixels in size, covering four
varieties, namely Alampur Baneshan, Alphonso, HarumManis, and Keitt. It is split
into a 60% training set, a 20% validation set, and a 20% test set. The deep
learning experiment is carried out in a local Jupyter notebook. The model summary
in Fig. 6 shows the model architecture and the input and output of each
layer.
The Flatten layer is used to “flatten” the input, that is, to make the multi-dimensional
input one-dimensional. It is commonly used in the transition from the convolution
layer (Convolution) to the fully connected layer (Dense), as shown in Fig. 7.
In other words, a Dense fully connected layer cannot be connected directly after a
convolutional layer: the output of the convolution layer must first be flattened
(Flatten), and then the Dense layer can be added. The Dense(256, activation = ‘relu’)
layer is followed by conventional Dropout with a drop rate of 0.5, so each neuron in
that layer has a 50% probability of being dropped at each training step. The last
fully connected layer uses softmax to output 4 categories.
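The effect of a drop rate of 0.5 can be illustrated in plain Python. This is a minimal sketch of inverted dropout, the variant in which the surviving activations are scaled by 1 / (1 − rate) so that their expected value is unchanged; the function name and the toy activations are hypothetical.

```python
import random

def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training:
        return list(activations)      # dropout is inactive at test time
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)                # fixed seed for a repeatable run
hidden = [1.0] * 10000                # toy activations of a dense layer
dropped = dropout(hidden, rate=0.5, rng=rng)
fraction_zero = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"fraction dropped: {fraction_zero:.3f}, mean activation: {mean_after:.3f}")
```

Roughly half of the values are zeroed while the mean activation stays close to its original value, which is why no rescaling is needed when dropout is switched off for evaluation.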
The model uses the SGD optimizer with LR = 0.001, decay = 1e−6, momentum = 0.9, and
nesterov = True; gradient descent drives the training loss down. After the model is
trained, its accuracy is calculated on the test set.
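To make the role of each SGD setting concrete, the update rule can be written out by hand. The sketch below follows the style of the classic Keras SGD optimizer (time-based learning-rate decay plus Nesterov momentum) and applies it to a toy one-dimensional problem; the toy loss and the function name are assumptions made for illustration.

```python
def sgd_nesterov_step(w, grad, velocity, iteration,
                      lr=0.001, decay=1e-6, momentum=0.9):
    """One parameter update with time-based decay and Nesterov momentum."""
    lr_t = lr / (1.0 + decay * iteration)       # learning rate shrinks slowly
    velocity = momentum * velocity - lr_t * grad
    w = w + momentum * velocity - lr_t * grad   # Nesterov look-ahead step
    return w, velocity

# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w.
w, v = 5.0, 0.0
losses = []
for step in range(2000):
    w, v = sgd_nesterov_step(w, 2.0 * w, v, step)
    losses.append(w * w)
print(f"loss after 1 step: {losses[0]:.4f}, after 2000 steps: {losses[-1]:.3g}")
```

With decay = 1e−6 the learning rate barely changes over a few thousand updates, so in this setting the momentum term has a much larger influence on convergence speed than the decay.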
We trained the model for 10, 50, and 100 epochs. Figure 8 shows the accuracy and
loss for 50 and 100 epochs. With 10 epochs, the test accuracy is 0.65 and the loss
is 0.82. With 50 epochs, the test accuracy is 0.78 and the loss is 0.67. With 100
epochs, the test accuracy is 0.75 and the loss is 1.07, so 50 epochs gives the best
test result of the three.
In this section we conduct our transfer learning experiments, following the guideline
proposed in [20]. From the figure, the third and fourth quadrants suit our case best,
since our image data set has only 1000 samples, which is a small quantity. In the
first experiment we train the original model, as shown in Figs. 10 and 11, without
freezing any layer. In the second experiment we fine-tune the lower layers of the
pretrained model, and in the last experiment we fine-tune the output dense layers of
the pretrained model.
3.2.1 VGG16
Experiment 1: Train the entire model with the original design (no layers
frozen)
The hyperparameters for this experiment are a batch size of 2, a learning rate of
0.0001, and 18 and 100 epochs. The output layer is changed from 1000 units
to 4.
Mango Varieties Classification-Based Optimization … 55
Fig. 9 The impact of different LR on the accuracy of the training set and the accuracy of the
validation set, and the loss
In this first experiment the model does not perform well on our data. As the figure
and Table 3 show, the model overfits when trained for 100 epochs, and the accuracy
obtained for both models is lower than 0.4. We therefore proceed to experiment 2 to
test different methods and hyperparameters.
Experiment 2: Train the model by freezing the convolutional layers
In this experiment we froze all the convolutional layers and trained the model with
the original fully connected layers, and compared it with the fully connected layers
used in our CNN model shown in Fig. 12. The result obtained with the original VGG16
dense layers is shown in Fig. 12. Both models are trained for 100 epochs.
With this model we obtain an accuracy of 61.5% with a loss of 3.9735. The result is
not ideal, and the model also suffers from overfitting: as Fig. 12 shows, the
validation accuracy is far from the training accuracy. Next, we train again after
modifying the fully connected layer according to the method shown in [21]; the
results are shown in Fig. 12.
For this model the best accuracy is 0.61 and the loss is 1.1217. Figure 13 shows
that the model still suffers from overfitting and that its accuracy is not much
different from that of the original design, but its loss is better. We therefore use
this new design in the next experiment.
Furthermore, we also tried reducing the number of neurons from 4096 to 128
units. Surprisingly, the result is better than in the previous experiment, with an
accuracy of 66.5% and a loss of 0.5039. Figure 14 shows the result obtained
in this experiment.
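One way to see why the narrower head may help on a 1000-image data set is to count its trainable parameters. The sketch below assumes the standard VGG16 arrangement, in which a flattened 7 × 7 × 512 feature map feeds two equal-width dense layers and a 4-way classifier; the exact head used in the experiment is not specified in the text, so the numbers are indicative only.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

def head_params(flat_inputs, hidden_units, num_classes, hidden_layers=2):
    """Parameters of equal-width dense layers followed by the classifier."""
    total, inputs = 0, flat_inputs
    for _ in range(hidden_layers):
        total += dense_params(inputs, hidden_units)
        inputs = hidden_units
    return total + dense_params(inputs, num_classes)

FLAT = 7 * 7 * 512            # VGG16 feature map for a 224 x 224 input
wide = head_params(FLAT, 4096, 4)
narrow = head_params(FLAT, 128, 4)
print(f"4096-unit head: {wide:,} parameters")
print(f"128-unit head:  {narrow:,} parameters")
```

Under these assumptions the 128-unit head has roughly 3.2 million parameters against about 119.6 million for the 4096-unit head, which leaves far fewer weights to fit from a small training set.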
Experiment 3: Train some layers and leave others frozen
In this section we freeze the first few layers and keep the remaining layers
trainable. The model is trained for 100 epochs, and the other settings are the same
as in experiment 1.
In this section, we froze the first 10 and the first 15 layers, as shown in
Figs. 15 and 16. The best result is an accuracy of 72.5% with a loss of 0.4586,
obtained from the model with the first 15 layers frozen. This is a clear improvement
over the original model and over the previous experiment, by up to 10%. However, the
overfitting issue is still not solved; according to the literature, this problem may
be caused by the data we collected. We therefore conclude that the model trained with
128 neurons and with the first fifteen layers frozen is the best model obtained in
this experiment.
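Freezing the leading layers of a pretrained model amounts to setting a per-layer trainable flag. The following sketch shows the idea with hypothetical helper names; it assumes the Keras convention that each layer exposes a `trainable` attribute and that the VGG16 base without its classifier head has 19 layers (1 input, 13 convolution, 5 pooling).

```python
def trainable_flags(num_layers, num_frozen):
    """False for the frozen leading layers, True for the rest."""
    return [index >= num_frozen for index in range(num_layers)]

def freeze_leading_layers(model, num_frozen):
    """Apply the flags to any object with a .layers list (e.g. a Keras model)."""
    flags = trainable_flags(len(model.layers), num_frozen)
    for layer, flag in zip(model.layers, flags):
        layer.trainable = flag
    return model

# Freezing 15 of the 19 base layers leaves the last 4 trainable.
flags_15 = trainable_flags(19, 15)
flags_10 = trainable_flags(19, 10)
print(sum(flags_15), "trainable layers when 15 are frozen")
print(sum(flags_10), "trainable layers when 10 are frozen")
```

With a Keras model, `freeze_leading_layers(base_model, 15)` would need to be called before compiling so that the optimizer only updates the unfrozen weights.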
3.3 Xception
First, we create a baseline model, and then modify one parameter at a time and
compare the result with the baseline model to isolate the impact of that parameter. In
order to achieve this goal, a total of five experiments were designed. Experiment 1:
Create a baseline model, mainly by modifying the number of frozen layers. Experiment 2:
Modify the optimizer and compare the performance with the baseline model. Experiment
3: Modify the dense layer and compare the performance with the baseline model.
Experiment 4: Modify the number of epochs and compare the performance with the
baseline model. Experiment 5: Modify the learning rate and compare the performance
with the baseline model. We divide the dataset into three parts: training dataset,
validation dataset and testing dataset. We take the performance on the test dataset
as the evaluation standard of the model.
When creating the baseline model, we try freezing all, some, and none of the layers in
the original model. Table 4 shows the baseline model settings. Table 5 shows the
performance of the model with different frozen layers.
Table 5 Performance of model with different freezing layer
Setting        Accuracy (test)    Loss (test)
Freeze all     0.185              1.3884
Freeze part    0.42               1.4410
Freeze no      0.78               1.5862
Since the model performs best with no layers frozen, we choose the fully unfrozen
model as the baseline, and all the following experiments also leave every layer
unfrozen.
To provide a controlled comparison with experiment 1, only the optimizer was modified
here. Table 6 shows the experiment 2 model settings and Table 7 the experiment 2 model
comparison.
It can be seen from the experimental results in Fig. 17 that the accuracy of the
model has decreased a lot. For our dataset, RMSprop is a better choice. A likely
reason is that Adagrad accumulates all past squared gradients, so its effective
learning rate keeps shrinking, whereas RMSprop uses a decaying average; this leads
to the slow convergence of the Adagrad model.
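The difference between the two optimizers can be seen by tracking the effective step size each one applies to a constant gradient. This is a simplified sketch of the textbook update rules, not the exact Keras implementations; the learning rate, the decay factor rho and the constant gradient are illustrative assumptions.

```python
import math

def adagrad_steps(grads, lr=0.01, eps=1e-7):
    """Effective step size lr / sqrt(sum of all squared gradients)."""
    acc, steps = 0.0, []
    for g in grads:
        acc += g * g
        steps.append(lr / (math.sqrt(acc) + eps))
    return steps

def rmsprop_steps(grads, lr=0.01, rho=0.9, eps=1e-7):
    """Effective step size lr / sqrt(moving average of squared gradients)."""
    avg, steps = 0.0, []
    for g in grads:
        avg = rho * avg + (1.0 - rho) * g * g
        steps.append(lr / (math.sqrt(avg) + eps))
    return steps

grads = [1.0] * 100          # constant gradient, purely illustrative
ada = adagrad_steps(grads)
rms = rmsprop_steps(grads)
print(f"Adagrad step: first {ada[0]:.4f}, last {ada[-1]:.4f}")
print(f"RMSprop step: first {rms[0]:.4f}, last {rms[-1]:.4f}")
```

After 100 updates the Adagrad step has fallen to a tenth of its initial value, while the RMSprop step settles near the nominal learning rate, which is consistent with slower convergence for Adagrad over long training runs.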
Compared with experiment 1, we changed the setting of the dense layer. Table 8
shows the experiment 3 settings. Table 9 shows experiment 3 model comparison.
Obviously, the dense layer of the baseline model has better performance, which
shows that it can better distinguish image features.
Table 11 Experiment 4 model comparison
Model                 Accuracy (test)    Loss (test)
Baseline model        0.78               1.5862
Experiment 4 model    0.61               3.2966
In this paper we have trained models with three deep learning approaches: a
Convolutional Neural Network, transfer learning with Xception, and transfer learning
with VGG16. Table 14 shows the best result obtained for each model.
Table 13 Experiment 5 model comparison
Model                 Accuracy (test)    Loss (test)
Baseline model        0.78               1.5862
Experiment 5 model    0.675              3.872
Table 14 Accuracy comparison
Model       Accuracy    Loss
CNN         0.78        0.67
VGG16       0.725       0.4586
Xception    0.78        1.59
Two issues arose in these experiments. First, although the results are good, the
overfitting problem could not be eliminated. Second, the parameters of the
pre-trained models do not fit our dataset accurately.
To address the first issue, we may need to increase the number of samples in our
dataset and diversify the image collection, or improve the augmentation function,
which can reduce overfitting effectively. For the second issue, we would need to
retrain all the parameters on our training data, which is quite time consuming. To
save time, we may need to use a virtual machine in the cloud that can process the
data and produce results quickly, so that we can run more experiments in the
limited time available.
4 Conclusion
In this study, three variants of CNN model are proposed. One is a custom CNN
model, and the other two are transfer learning models, Xception and VGG16.
By comparing the accuracy of these three models, we conclude that
the CNN model shown in Sect. 3.1 is our best model. Xception gives the same
accuracy, but its loss is higher (1.59 versus 0.67). Compared with VGG16, the loss
of the CNN is less ideal (0.67 versus 0.4586), but its accuracy is higher (0.78
versus 0.725). So among the three models we choose the CNN as the best model, since
it gives the most balanced performance compared with the other two models.
References
1. Alhaj, Y. A., Dahou, A., Al-Qaness, M. A., Abualigah, L., Abbasi, A. A., Almaweri, N. A. O.,
Elaziz, M. A., & Damaševičius, R. (2022). A novel text classification technique using improved
particle swarm optimization: A case study of Arabic language. Future Internet, 14(7), 194.
2. Daradkeh, M., Abualigah, L., Atalla, S., & Mansoor, W. (2022). Scientometric analysis and
classification of research using convolutional neural networks: A case study in data science
and analytics. Electronics, 11(13), 2066.
3. Wu, D., Jia, H., Abualigah, L., Xing, Z., Zheng, R., Wang, H., & Altalhi, M. (2022). Enhance
teaching-learning-based optimization for tsallis-entropy-based feature selection classification
approach. Processes, 10(2), 360.
4. Ali, M. A., Balasubramanian, K., Krishnamoorthy, G. D., Muthusamy, S., Pandiyan, S.,
Panchal, H., Mann, S., Thangaraj, K., El-Attar, N. E., Abualigah, L., & Elminaam, A. (2022).
Classification of glaucoma based on elephant-herding optimization algorithm and deep belief
network. Electronics, 11(11), 1763.
5. Abualigah, L., Kareem, N. K., Omari, M., Elaziz, M. A., & Gandomi, A. H. (2021). Survey
on Twitter sentiment analysis: Architecture, classifications, and challenges. In Deep learning
approaches for spoken and natural language processing (pp. 1–18). Springer.
6. Fan, H., Du, W., Dahou, A., Ewees, A. A., Yousri, D., Elaziz, M. A., Elsheikh, A. H., Abualigah,
L., & Al-Qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
world application UK Brexit. Electronics, 10(11), 1332.
7. Alomari, O. A., Khader, A. T., Al-Betar, M. A., & Abualigah, L. M. (2017). MRMR BA: A
hybrid gene selection algorithm for cancer classification. Journal of Theoretical and Applied
Information Technology, 95(12), 2610–2618.
8. Alomari, O. A., Khader, A. T., Al-Betar, M. A., & Abualigah, L. M. (2017). Gene selection for
cancer classification by combining minimum redundancy maximum relevancy and bat-inspired
algorithm. International Journal of Data Mining and Bioinformatics, 19(1), 32–51.
9. Chung, D. T. P., & Van Tai, D. (2019). A fruit recognition system based on a modern deep
learning technique. Journal of Physics: Conference Series, 1327.
10. Andrea, L., Mauro, L., & Di Ruberto, C. (2021). A novel deep learning based approach for
seed image classification and retrieval. Computers and Electronics in Agriculture, 187.
11. Shaohua, W., & Guodos, S. (2019). Faster R-CNN for multi-class fruit detection using a robotic
vision system. School of Information and Safety Engineering.
12. Osako, Y., et al. (2020). Cultivar discrimination of litchi fruit images using deep learning.
Scientia Horticulturae, 269.
13. Jaswal, D., Vishvanathan, S., & Soman, K. P. (2014). Image classification using convolutional
neural networks. International Journal of Scientific and Engineering Research, 5(6), 1661–
1668.
14. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
15. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
16. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
17. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
18. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
19. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
20. Diahashree, G. (2017, June 1). Transfer learning and the art of using pre-trained models in deep
learning. https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model/
21. Transfer learning in Keras using VGG16, 2020. https://thebinarynotes.com/transfer-learning-keras-vgg16/
Salak Image Classification Method Based
Deep Learning Technique Using Two
Transfer Learning Models
Lau Wei Theng, Moo Mei San, Ong Zhi Cheng, Wong Wei Shen,
Putra Sumari, Laith Abualigah, Raed Abu Zitar, Davut Izci, Mehdi Jamei,
and Shadi Al-Zu’bi
Abstract Salak is a fruit plant of Southeast Asia with at least 30 cultivars. The
size, shape, skin color, sweetness and even flesh color differ depending on the
cultivar. Thus, classifying salak by cultivar has become a daily job for fruit
farmers. Many computer vision techniques can be used for fruit classification. Deep
learning is the most promising approach compared with other Machine Learning (ML)
algorithms. This paper presents an image classification method for 4 types of salak
(salak pondoh, salak gading, salak sideempuan and salak affinis) using a
Convolutional Neural Network (CNN), VGG16 and ResNet50. The dataset consists of 1000
images, with 250 images for each type of salak. The dataset is pre-processed to
standardize it by resizing the images to 224 * 224 pixels, converting them to jpg
format, and applying augmentation. Based on the accuracy of the models, the best
model for salak classification is ResNet50, with an accuracy of 84%, followed by
VGG16 with 77% and the CNN with 31%.
1 Introduction
Snake fruit, also known as salak or Salacca zalacca, is a species of palm tree
that is native to Indonesia but is now grown and produced across Southeast Asia
[1]. It is called snake fruit because of its reddish-brown scaly skin [2]. The inside
of the fruit consists of 3 lobes that resemble large, white, peeled garlic cloves. The
taste is commonly sweet and acidic, with an apple-like texture [2]. There are many
types of salak, such as salak pondoh, salak sidempuan, salak gading, salak affinis,
etc. They are very similar, and it is hard to differentiate among them. This is where
deep learning comes into the picture.
Deep learning, also known as deep neural networks or deep neural learning, processes
data and creates patterns for decision making by imitating the human brain [3]. It
uses neuron nodes that are linked together within a hierarchical neural network to
analyze the incoming data [3]. Image recognition is one of the most popular deep
learning applications and helps many fields, especially fruit agriculture, where it
is used to classify fruit.
In the past few decades, CNNs and deep learning have proven to be powerful tools
for handling large amounts of data, especially for classifying fruits, characters
and animals [4–8]. However, this is easier said than done, and image classification
has its challenges. Image classification is mainly a process of labelling an image
according to its patterns (classes) [9, 10]. For example, an apple can appear in at
least three colors, such as red, green and yellowish. Some common problems in fruit
detection are variations in size, color and viewpoint, so that an input image of a
red cherry or a tomato can look similar to a red apple [10]. A journal paper on fruit
classification systems using computer vision applies image classification and
processing to fruit quality grading, sorting and disease detection before the fruit
is sold to the market [11]. This implementation benefits the fruit industry by saving
time, reducing human error, improving speed and efficiency, and protecting good
consumer relations [11]. Fruit disease detection uses techniques that involve
clustering, color-based segmentation, and other disease categorization classifiers
[11].
Convolutional Neural Network (CNN) is one of the popular algorithms used to
identify patterns in an image [12]. An image is a picture that captures the
appearance of an object such as a durian, strawberry, or mango. It is easy for the
human eye to detect the object in an image, but a computer vision system only reads
it as pixels in binary format. A CNN is a kind of deep neural network that is
efficient and reliable for image processing. A CNN combines several convolution
layers, pooling layers and a fully connected neural network [13]. The first step
takes the input image and passes a section of it to the convolution layer. The
convolution layer consists of a number of filters that extract features from that
section of the input image with a kernel (K) of size 3 × 3 × 1 [13]. Next, the image
passes through a pooling layer that
uses non-linear down-sampling to halve the size of the image [13]. There are two
kinds of pooling layer, max pooling and average pooling, which are applied to the
activation maps [13]. Max pooling takes the largest value from each section of the
image, while average pooling takes the mean (total sum/number of values in the pool)
of each section. The next step repeats the convolution and pooling layers to extract
more information from the image. The last step uses one fully connected layer in
which all neurons are connected to the output classes. It assigns the image a
probability for each possible class, such as 0.97 for apple, 0.02 for banana and
0.01 for durian. Finally, the class with the highest probability is selected as the
result.
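The two pooling operations and the final class selection described above can be sketched in plain Python. The 4 × 4 image and the helper names are hypothetical, and the window is assumed to move without overlap (stride equal to the pool size).

```python
def pool2d(image, size, reduce):
    """Apply `reduce` to every non-overlapping size x size window of a 2-D list."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce([image[r + i][c + j] for i in range(size) for j in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

def mean(values):
    return sum(values) / len(values)

image = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 8, 1, 1],
    [2, 4, 3, 5],
]
max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)
print(max_pooled)   # [[7, 6], [9, 5]]
print(avg_pooled)   # [[4.0, 3.0], [5.75, 2.5]]

# Final step of the classifier: pick the class with the highest probability.
probabilities = {"apple": 0.97, "banana": 0.02, "durian": 0.01}
predicted = max(probabilities, key=probabilities.get)
print(predicted)    # apple
```

Both operations halve each spatial dimension of the 4 × 4 input; they differ only in how the four values inside each window are summarised.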
A good way to resolve an image classification problem quickly is to use transfer
learning models. One of the most significant advantages of transfer learning is that
it reduces the developer's work: there is no need to spend time building a new model
from scratch, because the pretrained model can be applied directly to the present
image classification problem [14]. Beyond directly applying a transfer learning
model, the developer or user should understand the definition of the image
classification problem at hand and fine-tune certain convolution layers, freezing
some layers and training others to fit the objective. Various transfer learning
models can be used, such as VGG, AlexNet, MobileNet and ResNet [14].
Beyond the basic CNN, Karen Simonyan and Andrew Zisserman of the University of
Oxford published the paper “Very Deep Convolutional Networks for Large-Scale Image
Recognition”, which introduced the VGG16 model [15]. Like AlexNet, VGG16 has a large
number of parameters, but it is deeper, with 16 weight layers (13 convolution layers
and 3 fully connected layers). The first block applies 3 × 3 convolutions to the
(224 × 224) RGB input and is followed by a (2 × 2) max pooling layer. The second
block applies convolutions at a size of (112 × 112) and is again followed by a
(2 × 2) max pooling layer. The third to fifth blocks each consist of three
convolution layers and one max pooling layer, and the network ends with three fully
connected (dense) layers. Each max pooling layer halves the spatial size of the
feature maps. This model achieved up to 92.7% top-5 accuracy on ImageNet [15].
Although the results are good, the model also has disadvantages: it takes more time
to train and the architecture is very large [16].
Another popular transfer learning model is ResNet50, a residual neural network [17].
ResNet50 uses fewer parameters than VGG16, so the model runs faster because it
carries fewer weights. For feature extraction and weight learning, ResNet50 uses a
softmax layer in the same way as a CNN [18]. First, pre-processing resizes all
images to (224 × 224) pixels to fit the model input size [18]. The CNN then filters
the image for feature extraction using the mask defined by a (3 × 3) kernel [18].
Next, each section of the input image goes through feature extraction with a
2D convolution filter [18]. Depending on the weights applied to the image, the more
valuable features are extracted. Each layer then passes through an activation layer
to capture
complex features. Lastly, the fully connected layer is trained by repeating the
backpropagation process for the given number of iterations [18]. According to the
Keras Applications results, this model achieves a top-5 accuracy of 92.1% with
25,636,712 parameters [19]. Other optimization methods can also be used to optimize
such problems, as given in [20–25].
The main goal of this paper is to develop a CNN model and 2 transfer learning
models, VGG16 and ResNet50, for image classification. The developed models should be
able to classify the salak images into 4 classes: salak pondoh, salak gading, salak
sideempuan and salak affinis.
2 Dataset
Dataset preparation processes or transforms the collected images into a form
that can be used to design the model. In this study, the images are resized,
augmented and converted into a standard format.
. Resizing—Image’s pixel is resized into 224 × 224 × 3 pixel.
. Image format—Converted into JPEG standard format.
In this study, a Convolutional Neural Network (CNN) and two transfer learning
models, VGG16 and ResNet50, are developed. All the models are trained and tested on
the salak dataset to select the one with the best accuracy.
3.1 CNN
the input images with the size of (224 × 224 × 3) and feed them into the 2 sets of
convolutional layers and pooling layers. The outputs are then flattened into a single
dimension and fed into 2 hidden layers before the final layer. The activation
function used for the dense layers is ReLU, and the final layer of the classifier
uses softmax as its activation function. Since there are 4 classes in the salak
dataset, the final output layer has 4 nodes (Fig. 7).
3.2 VGG16
In VGG16, the convolutional base model is frozen, and we unfreeze the top layer.
Two dense layers with 2048 and 1048 units respectively are added, followed by the
output layer with 4 units. The output layer indicates the predicted class. The
VGG16 model diagram is shown in Fig. 8.
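A transfer learning model of this shape could be assembled in Keras roughly as follows. This is a sketch under assumptions, not the authors' code: it assumes TensorFlow 2.x, ImageNet weights, and a Flatten layer between the frozen base and the new dense layers, none of which are confirmed by the text. The constant and function names are hypothetical.

```python
HEAD_UNITS = (2048, 1048)   # dense-layer widths given in the text
NUM_CLASSES = 4             # one output unit per salak class

def build_transfer_model(base_name="VGG16", input_shape=(224, 224, 3)):
    """Frozen ImageNet base plus a new dense head (assumes TensorFlow 2.x)."""
    from tensorflow import keras          # imported lazily; needs TensorFlow

    base_class = getattr(keras.applications, base_name)   # VGG16 or ResNet50
    base = base_class(weights="imagenet", include_top=False,
                      input_shape=input_shape)
    base.trainable = False                # keep the convolutional base frozen

    # Assumption: the feature maps are flattened before the dense head.
    model = keras.Sequential([base, keras.layers.Flatten()])
    for units in HEAD_UNITS:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(NUM_CLASSES, activation="softmax"))
    return model
```

Passing `base_name="ResNet50"` would give the corresponding ResNet50 variant with the same dense head.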
3.3 ResNet50
In ResNet50, the convolutional base model is frozen, and we unfreeze the top layer.
Two dense layers are added with 2048 and 1048 units respectively, and
the output layer with 4 units. The output layer indicates the predicted class.
Figure 9, 10, and 11
4 Performance Result
There are a total of 1000 color images in the salak dataset, with 250 images
for each of the classes (salak pondoh, salak affinis, salak gading and salak
sideempuan). All the images are resized to 224 × 224 pixels. The dataset is split
into 70% train, 20% validation and 10% test. The train dataset is used to train the
model, while the validation dataset is used to evaluate model performance while
tuning the model hyperparameters. The test dataset acts as new data to evaluate the
final model performance. Python is used in these experiments as it has an extensive
set of libraries for artificial intelligence and machine learning, such as
TensorFlow, Keras and Scikit-learn. We used the Keras API to build, train and
validate our models. The Google Colaboratory (Colab) platform is used to perform all
the experiments, as no setup
is required, code can be shared with others without any setup, and it is easy to
use. The dataset is uploaded to Google Drive and the path is shared with the team
members, who are required to add a shortcut to the shared path in their own Drive.
Colab allows us to access our Google Drive by using the drive module from
google.colab. Figure 12 shows the code for mounting the drive. Once the
authorization code obtained by clicking on the link is keyed in, the drive is
mounted. We can then all access the same dataset without downloading
it.
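The 70/20/10 split can be sketched as follows. The example assumes a stratified split, in which each class is divided separately so that every subset keeps 25% of each class; the text does not say whether the split was stratified, and the file names below are hypothetical placeholders.

```python
import random

def stratified_split(items_by_class, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle each class separately and cut it into train/validation/test."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = round(len(items) * fractions[0])
        n_val = round(len(items) * fractions[1])
        train += [(name, label) for name in items[:n_train]]
        val += [(name, label) for name in items[n_train:n_train + n_val]]
        test += [(name, label) for name in items[n_train + n_val:]]
    return train, val, test

classes = ("pondoh", "affinis", "gading", "sideempuan")
# Hypothetical file names standing in for the 250 images of each class.
dataset = {c: [f"{c}_{i:03d}.jpg" for i in range(250)] for c in classes}
train, val, test = stratified_split(dataset)
print(len(train), len(val), len(test))   # 700 200 100
```

With 250 images per class this gives 175 training, 50 validation and 25 test images for each of the four classes.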
The ImageDataGenerator API is used to return batches of images from the
subdirectories Sideempuan, Pondoh, Gading and Affinis. The model summaries for
VGG16 and ResNet50 are shown in Figs. 13, 14, 15, 16, 17, 18 and 19.
For the transfer learning models (VGG16 and ResNet50) and the CNN, we tune
several hyperparameters: the number of epochs, the optimizer, the learning
rate and the number of dense layers. For the CNN, the filter size is also tuned,
while for the transfer learning models the unfrozen percentage of the model is
tuned. The activation function is ReLU for every dense layer except the output
layer, which uses softmax in all the experiments. Validation and test accuracy are
used to evaluate the performance of the model.
Kernel size refers to the size of the filter, which convolves around the feature map.
In this experiment, 3 kernel sizes are used, which are 2, 3 and 4, in the CNN only,
while the VGG16 and ResNet50 models keep their default values.
Figures 20 and 21 show the test and validation accuracy obtained. The results
show that the validation accuracy is best, at 68%, when the kernel size is 3,
whereas the test accuracy is worst at that size, at only 20%. The test accuracy is
best, at 31%, when the kernel size is 4.
Pool size refers to the size of the window that is used to reduce the dimensions of
the feature maps. Pooling reduces the number of parameters to learn and the amount
of computation performed in the network. In this experiment, 3 pool sizes are used,
which are
2, 3 and 4, on the CNN model only, while the other models use the default
value.
Figures 22 and 23 show the results of the validation and test accuracy. A pattern
similar to that of the kernel size can be seen: the best validation accuracy of 36%
is obtained when the pool size is 3, and the best test accuracy of 31% when the pool
size is 2.
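How the kernel size and pool size change the size of a feature map can be computed directly. The sketch below assumes an unpadded, stride-1 convolution and pooling with a stride equal to the pool size, applied to a 224 × 224 RGB input; these settings are assumptions, since the text does not state the padding or strides used.

```python
def conv_output(size, kernel):
    """Output size of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output(size, pool):
    """Output size of pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

kernel_sizes = {k: conv_output(224, k) for k in (2, 3, 4)}
pool_sizes = {p: pool_output(224, p) for p in (2, 3, 4)}
weights_per_filter = {k: k * k * 3 for k in (2, 3, 4)}   # RGB input, no bias
print("conv output:", kernel_sizes)                # {2: 223, 3: 222, 4: 221}
print("pool output:", pool_sizes)                  # {2: 112, 3: 74, 4: 56}
print("weights per filter:", weights_per_filter)   # {2: 12, 3: 27, 4: 48}
```

A larger kernel adds weights to every filter but barely changes the output size, whereas a larger pool size shrinks the feature map quickly and so reduces the number of values passed to later layers.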
Epoch is a hyperparameter of neural network training with gradient descent that
controls the number of complete passes through the training dataset. In
this experiment, 3 different epoch values are used, which are 10, 20 and 50.
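The number of epochs translates into a number of weight updates once the batch size is fixed. The following sketch uses the 700 training images implied by the 70% split and assumes a batch size of 32, which is not given in the text.

```python
import math

def training_updates(num_samples, batch_size, epochs):
    """Number of weight updates: batches per epoch times number of epochs."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

TRAIN_SAMPLES = 700      # 70% of the 1000 salak images
BATCH_SIZE = 32          # assumption: the batch size is not given in the text
updates = {epochs: training_updates(TRAIN_SAMPLES, BATCH_SIZE, epochs)
           for epochs in (10, 20, 50)}
print(updates)           # {10: 220, 20: 440, 50: 1100}
```

Under this assumption, going from 10 to 50 epochs multiplies the number of gradient updates by five, which increases both the training time and the opportunity to overfit a small dataset.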
Based on Fig. 24, the validation accuracy is highest, at 35%, when the epoch
value is 10 and 50, and lowest, at 20%, when the epoch value is 20. The test
accuracy is 31%, followed by a steady 27%, as the epoch value goes through
10, 20 and 50, as shown in Fig. 25.
Figures 26 and 27 show the accuracy obtained on the test and validation datasets.
The validation accuracy is highest, at 75.5%, when the epoch value is 10, followed
by 71% when the epoch value is 50 and 69.5% when the epoch value is 20. The test
accuracy is 75% when the epoch value is 20, 73% when it is 50 and 68% when it is 10.
The test and validation accuracy are shown in Figs. 28 and 29. The validation
accuracy is highest, at 84%, when the epoch value is 10, and decreases as the epoch
value increases. The test accuracy peaks at 82% when the epoch value is 20.
Optimizers are algorithms that change the attributes of the neural network, such as
the weight parameters and learning rate. The objective of an optimizer is to reduce
the loss of the neural network function by adjusting the
parameters of the neural network. In this experiment, 4 optimizers are used,
which are Adam, SGD, Adadelta and Adagrad.
Figures 30 and 31 show the validation and test accuracy obtained with the
different optimizers. The Adagrad optimizer gave the best validation accuracy of 67%,
Adadelta gave 41.5%, Adam gave 35% and SGD gave 25%. For the test accuracy,
Adam gave the highest value of 31%, compared with 25% for SGD, 19% for Adagrad
and 17% for Adadelta.
Figures 32 and 33 show the accuracy on the test and validation datasets for the
VGG16 model. SGD is the best optimizer on the validation dataset with 71%, followed
by Adam and Adagrad with 69.5% each, and lastly Adadelta with 44%. On the test
dataset, Adam gives the best accuracy of 76%, followed by SGD with 69%, Adagrad
with 66% and Adadelta with 50%.
The effect of the optimizer on ResNet50 is shown in Figs. 34 and 35. For the
validation accuracy, Adadelta gives the highest accuracy of 86.5%, while Adagrad
gives 78%. Adam and SGD give the lowest accuracy of 25%. For the test accuracy,
Adagrad gives the best result of 82%. Adadelta also gives a fairly high accuracy of
79%, while Adam and SGD are the lowest at 25%.
The learning rate is a hyperparameter of a neural network that controls how much the
model changes in response to the estimated error each time the weights are updated.
Selecting the learning rate is a challenge: too small a value results in a long training
process, while too high a value makes the training process unstable. Four learning
rate values are used in this experiment: 0.1, 0.01, 0.001 and 0.0001.
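The trade-off between slow and unstable training can be illustrated on a toy problem. The sketch below is not the chapter's training code: it runs plain gradient descent on the one-dimensional function f(w) = w², chosen here only because its behaviour is easy to check by hand. The function name `gradient_descent` and the sample rates are illustrative assumptions.

```python
def gradient_descent(learning_rate, steps=50, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                  # derivative of w**2
        w = w - learning_rate * grad    # one weight update
    return w


# Illustrative rates for this toy function only; on a real network the
# stable range is different and has to be found experimentally.
for lr in (1.1, 0.1, 0.0001):
    print(f"lr={lr}: final |w| = {abs(gradient_descent(lr)):.6f}")
```

On this toy function a rate of 1.1 diverges, 0.1 converges quickly, and 0.0001 has barely moved away from the starting point after 50 steps.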
Figures 36 and 37 show the validation and test accuracy. The validation accuracy of
the CNN peaks at 82.14% when the learning rate is 0.01. A learning rate of 0.1 gives
an accuracy of 26.7%, followed by 25% for learning rates of 0.001 and 0.0001. The test
accuracy shows a similar pattern: a learning rate of 0.01 gives the highest test
accuracy of 35%, followed by 25% for learning rates of 0.1, 0.001 and 0.0001.
Figures 38 and 39 show the accuracy on the test and validation datasets for VGG16.
The highest accuracy is 76%, reached at a learning rate of 0.0001 for validation
accuracy and 0.001 for test accuracy. Overall, the higher the learning rate, the lower
the accuracy.
Figures 40 and 41 show the accuracy obtained for ResNet50 at each learning rate.
The results follow a similar pattern to VGG16: the higher the learning rate, the lower
the accuracy. The highest test and validation accuracy are both 83%, at learning rates
of 0.0001 and 0.001 respectively.
A dense layer is a fully connected neural network layer: every neuron in the dense
layer receives input from all neurons of the previous layer. In this experiment, four
different numbers of dense layers are used: 1, 2, 3 and 4.
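As a minimal sketch of what "fully connected" means, the forward pass of one dense layer can be written in plain Python. The ReLU activation, the function name `dense_forward` and the weight values are assumptions made for illustration; the chapter does not specify them.

```python
def dense_forward(inputs, weights, biases):
    """Fully connected layer: each output neuron sums over every input value."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = bias + sum(w * x for w, x in zip(neuron_weights, inputs))
        outputs.append(max(0.0, total))   # ReLU activation (an assumption)
    return outputs


# Hypothetical layer with 3 inputs and 2 neurons.
x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0, 0.5],    # weights of neuron 0, one per input
     [1.0, 1.0, -1.0]]    # weights of neuron 1, one per input
b = [0.0, 0.5]
print(dense_forward(x, W, b))
```

Stacking several such layers, each taking the previous layer's outputs as its inputs, gives the 1 to 4 dense-layer variants compared in this experiment.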
Figures 42 and 43 show the effect of the number of dense layers on validation and
test accuracy for CNN. The results show that as the number of dense layers increases,
the accuracy decreases. The validation accuracy is 51% for 1 dense layer, followed by
33.5%, 28% and 31.5% respectively. The test accuracy is highest at 25%, followed
by 16% and 23%.
Figures 44 and 45 show the validation and test accuracy. The highest validation
accuracy is 76% and the highest test accuracy is 77%, both with 3 dense layers.
The lowest value for both is 25%, with 2 dense layers.
Figures 46 and 47 show the validation and test accuracy. The validation accuracy
reaches its highest value of 86.5% as the number of dense layers increases. The test
accuracy is highest at 82% and lowest at 72%.
model base and attaching a few newly added classifier layers. Retraining of the newly
modified models is required to obtain the new weights and biases.
As shown in Tables 1 and 2, the pretrained models perform best when 100% of their
layers are frozen. ResNet-50 generally performs better than VGG-16 on this dataset,
as it reaches over 80% accuracy when 100% of the layers are frozen. Bold font marks
the best result. The performance of both models declines once layers are unfrozen.
VGG-16 gets 25% accuracy for every unfrozen percentage, whereas ResNet-50 obtains
high validation accuracy but very low test accuracy. This suggests that the ResNet-50
model overfits after the layers are unfrozen. In a nutshell, a pretrained model
performs better on the salak dataset when all of its layers are frozen.
Within the epoch range of 10, 20 and 50, all three models have their highest
validation accuracy at epoch = 10. The validation accuracy then drops at epoch = 20
and increases again at epoch = 50. As for the test accuracy, the graphs
(Figs. 48 and 49) show that the pretrained models achieve their highest test accuracy
at epoch = 20, whereas CNN has its highest test accuracy at epoch = 10. This shows
that CNN reaches its highest test accuracy in fewer epochs than the pretrained models.
From the two bar charts (Figs. 50 and 51), we can infer that Adadelta and Adagrad
yield better validation accuracy for all the models. For test accuracy, the pretrained
models that use Adadelta and Adagrad give higher values. The CNN model, however,
gives a higher test accuracy with the Adam optimizer than with the other optimizers.
In the learning rate charts, validation accuracy and test accuracy share quite similar
trends (Figs. 52 and 53). The pretrained models show a decreasing trend in validation
accuracy, while the CNN model has its optimal learning rate at 0.001 for both
validation accuracy and test accuracy. The pretrained models also reach their highest
test accuracy at a learning rate of 0.001. Therefore, we can deduce that a learning
rate of 0.001 works best for the salak dataset in this study.
Based on the two graphs (Figs. 54 and 55), we can see that increasing the number of
dense layers for the ResNet-50 pretrained model increases its validation accuracy but
decreases its test accuracy. VGG-16 shows a similar pattern for validation accuracy
and test accuracy, with the highest score when dense layer = 3 and the lowest when
dense layer = 2. For the CNN model, both validation accuracy and test accuracy
decrease as the number of dense layers increases. Therefore, CNN performs best when
dense layer = 1.
The best-performing model is ResNet-50 with 84% test accuracy, followed by the
VGG-16 model with 77% test accuracy (Fig. 56). CNN has the lowest test accuracy,
31%, for the salak dataset. The best combinations of parameters and hyperparameters
for the three models are presented in Table 3.
Fig. 56 VGG16, ResNet50 and CNN—comparison of best validation and test accuracy
5 Conclusion
In conclusion, the experiments were performed using a CNN model and two transfer
learning models, VGG16 and ResNet50. The results were compared using test accuracy
and validation accuracy to evaluate the performance of each model for each of the
fine-tuning parameters. Every model reaches its highest validation accuracy at
epoch 10. The transfer learning models (VGG16 and ResNet50) reach their highest test
accuracy at epoch 20, while the CNN does so at epoch 10. ResNet50 has the highest
test accuracy, 84%, compared to VGG16 and CNN. The transfer learning models
performed better than the CNN model. On this dataset there is overfitting, as the
model performs well on the training data but poorly on the validation data, which is
not used during training. Future work can be done to increase accuracy; for example,
a sampling method can be used to split the dataset into train, validation and test
datasets.
References
11. Naik, S., & Patel, B. (2017). Machine vision based fruit classification and grading-a review.
International Journal of Computer Applications, 170(9), 22–34.
12. What is a convolutional neural network? [Online]. https://poloclub.github.io/cnn-explainer/.
Accessed June 22, 2021.
13. Das, A. (2020). Convolution neural network for image processing—Using keras. towards
data science, August 21, 2020. [Online]. https://towardsdatascience.com/convolution-neural-
network-for-image-processing-using-keras-dc3429056306. Accessed June 22, 2021.
14. Marcelino, P. (2018). Solve any image classification problem quickly and easily. KDnuggets,
December 2018. [Online]. https://www.kdnuggets.com/2018/12/solve-image-classification-
problem-quickly-easily.html. Accessed June 24, 2021.
15. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale
image recognition, Cornel University, April 10, 2015. [Online]. https://arxiv.org/abs/1409.
1556. Accessed June 24, 2021.
16. VGG16—Convolutional network for classification and detection. Neurohive, November 20,
2018. [Online]. https://neurohive.io/en/popular-networks/vgg16/. Accessed June 24, 2021.
17. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition,
Cornell University, December 10, 2015. [Online]. https://arxiv.org/abs/1512.03385. Accessed
June 25, 2021.
18. Zahisham, Z., Lee, C. P., & Lim, K. M. (2020). Food recognition with ResNet-50. In IEEE 2nd
international conference on artificial intelligence in engineering and technology (IICAIET)
19. “Keras” [Online]. https://keras.io/api/applications/. Accessed June 6, 2021.
20. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
21. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
22. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
23. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
24. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
25. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
Image Processing Identification
for Sapodilla Using Convolution Neural
Network (CNN) and Transfer Learning
Techniques
Abstract Image identification is a useful tool for classifying and organizing fruits
in agribusiness. This study aims to use deep learning to construct a design for
Sapodilla identification and classification. Sapodilla comes in a variety of cultivars
from around the world and can differ in size, form and taste depending on species
and kind. The goal is to create a system that uses convolutional neural networks and
transfer learning to extract features and determine the type of Sapodilla, so that the
system can sort Sapodilla by type. This research uses a dataset of over 1000 pictures
to demonstrate classification approaches for four different types of Sapodilla. The
task was completed using Convolutional Neural Network (CNN) algorithms, a deep
learning technology widely utilised in image classification. Deep learning-based
classifiers have recently made it possible to distinguish Sapodilla in various images.
Furthermore, we used different numbers of hidden layers and epochs to improve
predictive performance. We also investigated transfer learning approaches to the
classification of Sapodilla. The proposed CNN model improves on transfer learning
techniques and state-of-the-art approaches in terms of results.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 107
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_5
108 A. Khazalah et al.
1 Introduction
2 Literature Survey
Although several researchers have addressed fruit recognition, for example in
[2, 15–17], the survey concluded that developing a quick and efficient fruit detector
remains difficult. This is owing to the variety of colors, dimensions, sizes and
textures of fruit, and to the constantly shifting lighting and shadow conditions in
the bulk of these scenarios. Many works in the literature have treated fruit
recognition as a segmentation problem (i.e., fruit vs. background). Wang et al. [2]
investigated apple recognition for yield estimation.
They developed a system that recognised apples primarily by their color and their
distinctive specular reflection pattern. Additional information, including the size
distribution of apples, was used either to eliminate false detections or to split
regions that might contain several apples. Another strategy was to accept only
detections from regions that were predominantly circular. Bac et al. [15] presented
a segmentation method for sweet peppers. They employed 6 multi-spectral cameras and
a variety of features, comprising raw spectral data, normalized difference indices,
and entropy-based texture descriptors. Experiments in a carefully controlled
glasshouse setting showed that this method yielded fairly accurate segmentation.
The authors noted, however, that it was not precise enough to build a reliable
obstacle map.
For almond identification, Hung et al. [16] proposed using conditional random fields
(CRF). They suggested a five-class segmentation method in which the features were
learnt by a Sparse Autoencoder (SAE). These features were then applied within a CRF
framework, which outperformed earlier research. They segmented the data quite well
but did not perform object detection. They also noted that occlusion was a
significant difficulty. Intuitively, such a strategy can only handle modest amounts
of occlusion.
Yamamoto et al. [15], for example, used color-based segmentation for tomato
identification. A Classification and Regression Trees (CART) classifier was then
trained using color and shape information. This produced a segmentation map in which
connected pixels were grouped into regions, and each region was treated as a
detection. To limit the number of false alarms, they trained a non-fruit classifier
using a random forest in controlled glasshouse conditions.
All of the studies mentioned above use a pixel-level segmentation approach, and the
majority of these efforts have focused on fruit recognition primarily for yield
estimation. In the few studies that have performed fruit detection, it has only been
done in controlled glasshouse conditions. All things considered, the problem of fruit
detection in highly challenging conditions remains unsolved. This is because of the
great variability in the appearance of the target objects in agricultural settings,
which means that classic sliding-window approaches, despite performing well when
tested on datasets of selected pictures, cannot deal with the variability in scale
and appearance of the target objects when deployed in real farm settings [2, 15].
Deep learning models have already made significant advances in the classification
and detection of objects. On PASCAL-VOC, the state-of-the-art detection architecture
consists of two stages. The first stage of the pipeline uses a method such as feature
extraction or edge boxes to pick regions of interest from a picture, which are then
sent to a deep neural network for classification. Despite its high recall, this
pipeline is computationally intensive, which prevents it from being used in real time
for an engineering application [16, 17]. RPNs
Artificial neural networks [18, 19] have produced the most effective results in the
domain of picture identification and classification. The majority of deep learning
methods are built on top of these networks.
Deep learning [18] is a type of machine learning technique that employs numerous
layers of nonlinear processing units. Each layer learns to transform its input into
a more abstract and composite representation [19]. Deep learning models have
outperformed other machine learning techniques.
In some domains, they have also achieved the first superhuman image recognition
[18]. This is reinforced by the fact that deep learning is seen as a vital step
towards achieving strong artificial intelligence. Second, deep learning models,
particularly convolutional neural networks (CNN), have been shown to provide
excellent performance in image classification and recognition.
A deep learning framework is used for the conceptual model. There are three
convolutional layers in the framework. A group of pixels in a picture might indicate
an edge, a shadow, or some other pattern. Convolution is one method for detecting
these patterns. During calculation, the picture is represented as a matrix of pixel
values. The CNN model's framework is shown in Fig. 2. It comprises feature extraction
and classification. Cropping removes any unnecessary data from the input photos, and
the pictures have all been resized. Convolution and pooling layers are applied
repeatedly to extract features. Each of the first two blocks contains one convolution
layer and one max pooling layer. To detect patterns, we use a "filter" matrix that is
multiplied with the image pixel matrix. Filter sizes may vary, and the multiplication
depends on the filter size: starting from the first pixel, a subset of the image pixel
matrix matching the filter size is taken for convolution. The convolution then moves
on to the next pixel, and this process is repeated until all the pixels in the matrix
have been covered. The pooling layer is the next kind of layer in the CNN. This layer
reduces the output size, i.e. the feature map, and hence helps to avoid overfitting.
A fully connected layer is used as the output layer. This layer "flattens" the output
of the preceding layers into a feature vector that can be used as the input for the
following stage. Figure 3 shows the training images of the CNN model.
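The sliding-filter and pooling steps described above can be sketched in plain Python. This is an illustration only, not the chapter's Keras model: it assumes a single-channel image, no padding, a stride of 1 for the convolution and non-overlapping 2 × 2 max pooling, and the helper names `conv2d` and `max_pool2d` are hypothetical.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window filter, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it and sum.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# Hypothetical 5x5 image containing a vertical edge, and a 2x2 edge filter.
image = [[0, 0, 1, 1, 1]] * 5
kernel = [[-1, 1],
          [-1, 1]]
features = conv2d(image, kernel)   # 4x4 feature map, non-zero only at the edge
pooled = max_pool2d(features)      # 2x2 map after pooling
print(features)
print(pooled)
```

The feature map responds only where the filter overlaps the edge, and pooling halves each spatial dimension while keeping that response.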
eliminated. Then, at the end of the network, we added a new dense layer whose number
of units equals the number of sapodilla types we wish to predict. Figure 4 shows the
transfer learning approach.
3.2.1 VGG16
Convolutional and fully connected layers make up the 16-layer network. For
simplicity, only 3 × 3 convolution layers are stacked on top of one another. The
first and second convolution layers each have 64 kernel filters of size 3 × 3. As the
input picture passes through the first and second convolution layers, its dimensions
become 224 × 224 × 64. The output is then passed to a max pooling layer with a stride
of 2. The third and fourth convolutional layers have 128 kernel filters of size
3 × 3. After these two layers, a 2 × 2 max pooling is applied, and the output shrinks
to 56 × 56 × 128. The fifth, sixth and seventh layers are convolutional layers with
3 × 3 kernels, and all three use 256 feature maps. These layers are followed by a
max pooling layer with a stride of 2. The eighth to thirteenth layers are two sets of
convolutional layers with 3 × 3 kernels, and every layer in these sets has 512 kernel
filters. Each set is followed by a max pooling layer with a stride of 2. Figure 5
shows the architecture of VGG16.
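The feature-map sizes in this description can be checked with a few lines of arithmetic. The sketch below assumes the standard VGG16 configuration: 3 × 3 convolutions with "same" padding, which leave height and width unchanged, and a 2 × 2 max pooling with stride 2 after each of the five blocks. The names `VGG16_BLOCKS` and `trace_shapes` are illustrative.

```python
# (number of 3x3 conv layers, number of filters) for the five VGG16 blocks,
# following the standard VGG16 configuration (an assumption of this sketch).
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_shapes(height=224, width=224, blocks=VGG16_BLOCKS):
    """Return the (height, width, channels) shape after each block's pooling."""
    shapes = []
    for _num_convs, filters in blocks:
        # "Same"-padded 3x3 convolutions keep height and width unchanged and
        # set the channel count to the number of filters; the 2x2 max pooling
        # with stride 2 that follows halves height and width.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


for block_number, shape in enumerate(trace_shapes(), start=1):
    print(f"after block {block_number}: {shape}")
```

Under these assumptions the second block ends at 56 × 56 × 128, matching the text, and the convolutional base ends at 7 × 7 × 512 before the fully connected layers.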
Fig. 5 Architecture of VGG16
3.2.2 VGG19
VGG19 is a more recent VGG architecture and looks very similar to VGG16. Comparing
its structure with that of VGG16, we notice that both are built on five convolutional
blocks. However, by adding a convolution layer to each of the last three blocks, the
network's depth is increased further. The input is an RGB picture of shape
(224, 224, 3), and the output is a feature vector. VGG19 has its own preprocessing
function in Keras; however, if we look at the source, we notice that it is exactly the
same as that of VGG16. As a result, we do not have to redefine anything.
3.2.3 MobileNet
3.3 Dataset
Pictures of four types of sapodilla are included in the dataset: ciku Subang, ciku
Mega, ciku Jantung and ciku Betawi. The pictures in the collection include ciku of
various sizes from each class. The photos do not have a uniform background. Various
poses of the same type of ciku can be found in the dataset. Ciku appear in a variety
of poses and viewpoints, including side view, back view, different backgrounds,
partially cut, sliced on a plate, chopped into pieces, and showing the seeds. The
ciku may be fresh, rotten, or packed in bunches. Many photos have poor or unusual
lighting, and some show fruit covered with a net, decorated, or on the tree among
leaves. The dataset consists of more than 1000 images. Figure 6 shows sample dataset
images. Table 1 shows the dataset description.
3.4 Augmentation
The availability of data frequently improves the effectiveness of deep learning
neural network models. Data augmentation is a method of artificially creating new
training data from existing training data. This is accomplished by applying
domain-specific techniques that transform examples from the training data into new
and different training examples. The best-known kind of data augmentation is image
data augmentation, which entails transforming pictures in the training dataset into
modified copies that belong to the same class as the original picture. The
transformations include shifts, flips, rotations, zooms and other operations from the
field of image manipulation. The goal is to add new, plausible examples to the
training set.
This means variations of the training set pictures that the model is likely to
encounter. A horizontal flip of a sapodilla picture, for instance, makes sense, as the
picture could have been taken from either the left or the right. A vertical flip of a
sapodilla image does not make sense and is probably not appropriate, given that the
model is unlikely to see an upside-down sapodilla picture.
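As a minimal sketch of the horizontal-flip idea, the example below mirrors an image stored as a list of pixel rows and adds the flipped copy to the training set under the same label. It is an illustration only, not the chapter's augmentation pipeline; the helper names and the tiny example image are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image left to right; `image` is a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def augment_with_flips(images, labels):
    """Return the original images plus their mirrored copies, labels unchanged."""
    augmented_images = list(images) + [horizontal_flip(img) for img in images]
    augmented_labels = list(labels) + list(labels)
    return augmented_images, augmented_labels


# Hypothetical 2x3 grayscale "image" with one of the dataset's class names.
tiny_image = [[1, 2, 3],
              [4, 5, 6]]
images, labels = augment_with_flips([tiny_image], ["ciku Mega"])
print(images[1])   # the mirrored copy
print(labels)
```

Because the flipped picture keeps its original label, the training set doubles in size without any new photographs being taken.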
As a result, it is clear that the specific data augmentation techniques used for a
training dataset must be chosen carefully, taking into account the training set as
well as knowledge of the problem domain. Furthermore, experimenting
4 Performance Result
To eliminate any extraneous information, the pictures in the collection are
normalised, resized and cropped. The data is split into two parts, train and
validation, in a ratio of 80% to 20%.
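A reproducible 80/20 split of this kind can be sketched as follows. The chapter does not say how its split was produced, so the shuffling, the fixed seed, the function name `train_validation_split` and the file names are all assumptions made for illustration.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into train and validation lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical file names standing in for the sapodilla images.
files = [f"img_{i:04d}.jpg" for i in range(1000)]
train_files, validation_files = train_validation_split(files)
print(len(train_files), len(validation_files))
```

Shuffling before the split keeps each part from being dominated by whichever class happens to come first in the file listing.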
employed a few models: our own model built from scratch and a few existing image
classification models used for transfer learning, as detailed below. Figure 8
shows the proposed model. Figure 9 shows the VGG16 model. Figure 10 shows the VGG19
model.
This method uses a 20% sample size, a learning rate, a batch size of 500, and 20
epochs. The model was trained on the sapodilla training dataset and then evaluated on
the test sample. The model's accuracy is 0.54. Figure 11 shows the plot of training
accuracy and validation accuracy.
Fig. 9 VGG16
Fig. 10 VGG19
$$a_t = \beta_1 a_{t-1} + (1 - \beta_1)\, d_t$$
$$u_t = \beta_2 u_{t-1} + (1 - \beta_2)\, d_t^2$$
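These two recurrences have the form of the exponentially decaying moment estimates used by the Adam optimizer. The surrounding text that defines the symbols is missing, so the sketch below assumes that d_t is the gradient at step t, a_t the first-moment estimate and u_t the second-moment estimate. The bias correction and the weight update follow the standard Adam formulation and are not taken from this chapter; the function name `adam_step` and the toy objective are illustrative.

```python
import math


def adam_step(w, d, a, u, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight w with gradient d at step t (t >= 1)."""
    a = beta1 * a + (1 - beta1) * d          # a_t: running mean of gradients
    u = beta2 * u + (1 - beta2) * d * d      # u_t: running mean of squared gradients
    a_hat = a / (1 - beta1 ** t)             # bias correction (standard Adam)
    u_hat = u / (1 - beta2 ** t)
    w = w - lr * a_hat / (math.sqrt(u_hat) + eps)
    return w, a, u


# Toy problem (an assumption): minimise f(w) = w**2, whose gradient is 2*w.
w, a, u = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, a, u = adam_step(w, 2.0 * w, a, u, t, lr=0.01)
print(abs(w))
```

Dividing by the square root of the second-moment estimate makes the first step size roughly equal to the learning rate regardless of the gradient's scale.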
In the first experiment, three convolution layers with 32 filters of size 3 × 3
pixels were used; in experiment 2, the number of filters was increased from 32 to 64
for the three convolutional layers, with the same filter size of 3 × 3 pixels; and in
experiment 3, 128 filters of size 3 × 3 pixels were applied. Table 3 shows the results
for each number of filters. The runtime was also affected by the filter size and the
number of filters. Table 3 shows that the model has greater accuracy when the number
of filters is 128.
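One reason the runtime grows with the number of filters is that the number of trainable weights grows with it. The sketch below counts the parameters of three stacked 3 × 3 convolution layers. It assumes an RGB input (3 channels), one bias per filter and layers connected directly to one another; the chapter's exact architecture may differ, and the function names are illustrative.

```python
def conv_params(kernel_size, in_channels, filters):
    """Parameters of one conv layer: a k x k x in_channels kernel plus a bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def stack_params(filters, num_layers=3, kernel_size=3, input_channels=3):
    """Total parameters of `num_layers` stacked conv layers with equal filter counts."""
    total, in_channels = 0, input_channels
    for _ in range(num_layers):
        total += conv_params(kernel_size, in_channels, filters)
        in_channels = filters    # the next layer sees `filters` feature maps
    return total


for filters in (32, 64, 128):
    print(filters, stack_params(filters))
```

Under these assumptions, doubling the number of filters roughly quadruples the parameter count of the convolutional stack, because both the input and output channel counts of the later layers double.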
Each output unit in the neural network's hidden layers has an adjustable decision
boundary. Because these boundaries are adaptable, we try to make them fit the
characteristics of the data. The boundary of a hidden unit is made up of a number of
lines, so we modify the weights of these lines to change the shape of the boundary.
Figure 14 and Table 4 show the accuracy for each number of epochs. The number of
epochs determines how many times the network's weights are updated. As the number of
epochs grows, so does the number of times the weights are changed, and the boundary
moves from underfitting to optimal to overfitting. In this experiment, when the
number of epochs is 30, the accuracy of the model increases to 0.99.
The step size, often known as the learning rate, is the amount by which the weights
are changed during training. Figure 15 shows the learning rates. The learning rate is
a configurable hyperparameter with a small positive value, usually between 0.0 and
1.0, that is used in the training of neural networks. It determines how quickly the
model adapts to the problem. Because of the smaller changes made to the weights at
each update, smaller learning rates require more training epochs, whereas larger
learning rates require fewer training epochs. A high learning rate lets the network
converge more quickly, but at the cost of a sub-optimal final set of weights. A lower
learning rate may allow the model to learn a more optimal, or even globally optimal,
set of weights, but training will take considerably longer.
Compared with deeper networks, CNNs with fewer layers have lower hardware
requirements and shorter training times. Table 5 shows the comparison of accuracy.
Short training times allow more parameters to be tested and make the whole
development process easier. Reduced computational needs also allow for higher image
resolution. The best model is the one obtained using the Adam optimizer, which
reached an accuracy of 0.99. Figure 16 shows the bar chart of the accuracy scores.
5 Conclusion
The study develops a deep convolutional neural network for sapodilla recognition and
classification. It describes a technique that performs automated detection of
sapodilla types. On this data, the CNN approach performs very well. In future
applications, the technique could be trained on a larger range of sapodilla. Future
work may also look at the effects of other variables such as optimizers, epochs,
dense layers, learning rates and pooling functions. We additionally ran several
quantitative tests using the Keras library to classify the photos based on their
content. With the aid of the proposed convolutional neural network, the method can
simplify the process of categorising the kind of sapodilla and minimize manual
mistakes in sapodilla classification. The proposed convolutional neural network has
a 99% accuracy rate.
References
1. ABARE. (2015). Australian vegetable growing farms: An economic survey, 2013–14 and 2014–
15. Australian Bureau of Agricultural and Resource Economics (ABARE), Canberra, Australia.
Research report.
2. Abualigah, L., Al-Okbi, N. K., Elaziz, M. A., & Houssein, E. H. (2022). Boosting marine
predators algorithm by salp swarm algorithm for multilevel thresholding image segmentation.
Multimedia Tools and Applications, 81(12), 16707–16742.
3. Palakodati, S. S. S., Chirra, V. R., Dasari, Y., & Bulla, S. (2020). Fresh and rotten fruits
classification using CNN and transfer learning. Revue d’Intelligence Artificielle, 34(5), 617–
622. https://doi.org/10.18280/ria.340512
4. Sakib, S., Ashrafi, Z., & Siddique, M. A. (2019). Implementation of fruits recognition classifier
using convolutional neural network algorithm for observation of accuracies for various hidden
layers. ArXiv, abs/1904.00783.
5. Mettleq, A. S. A., Dheir, I. M., Elsharif, A. A., & Abu-Naser, S. S. (2020). Mango classification
using deep learning. International Journal of Academic Engineering Research (IJAER), 3(12),
22–29.
6. Rojas-Arandra, J. L., Nunez-Varela, J. I., Cuevas-Tello, J. C., & Rangel-Ramirez, G. (2020).
Fruit classification for retail stores using deep learning. In Proceedings of pattern recognition
12th Mexican conference, Morelia, Mexico (pp. 3–13).
7. Risdin, F., Mondal, P., & Hassan, K. M. (2020). Convolutional neural networks (CNN) for
detecting fruit information using machine learning techniques.
8. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
9. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
10. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
11. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
12. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
13. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
14. Álvarez-Canchila, O. I., Arroyo-Pérez, D. E., Patino-Saucedo, A., González, H. R., & Patiño-
Vanegas, A. (2020). Colombian fruit and vegetables recognition using convolutional neural
networks and transfer learning.
15. Otair, M., Abualigah, L., & Qawaqzeh, M. K. (2022). Improved near-lossless technique using
the Huffman coding for enhancing the quality of image compression. Multimedia Tools and
Applications, 1–21.
16. Liu, Q., Li, N., Jia, H., Qi, Q., & Abualigah, L. (2022). Modified remora optimization algorithm
for global optimization and multilevel thresholding image segmentation. Mathematics, 10(7),
1014.
17. Lin, S., Jia, H., Abualigah, L., & Altalhi, M. (2021). Enhanced slime mould algorithm for
multilevel thresholding image segmentation using entropy measures. Entropy, 23(12), 1700.
18. Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). Flexible,
high performance convolutional neural networks for image classification. In Proceedings of the
twenty-second international joint conference on artificial intelligence, Volume Two, IJCAI'11
(pp. 1237–1242). AAAI Press.
19. Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. CoRR
abs/1507.06228.
Comparison of Pre-trained
and Convolutional Neural Networks
for Classification of Jackfruit Artocarpus
integer and Artocarpus heterophyllus
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_6
S.-Q. Ong et al.
architecture and transfer learning by five pre-trained CNNs. We also compared the
effect of the optimizer and of three levels of epochs on the performance of the model.
In general, transfer learning with a pre-trained VGG16 neural network gave the
highest accuracy on this dataset, and models trained with the SGD optimizer
performed better than those trained with Adam.
1 Introduction
(b) To compare the performance of a customized DCNN with that of transfer learning
using the pre-trained CNNs Xception, VGG16, VGG19, ResNet50 and
InceptionV3.
2 Literature Review
Fruit and vegetable classification is difficult because different classes can look
alike while features vary within a single cultivar [6–8]. Given the wide diversity
within each type, the choice of data collection sensors and feature representation
methods is particularly critical [9–12]. Methods for quality assessment and automated
harvesting of fruits and vegetables exist, but they have been developed for a limited
number of classes and small data sets. The problem is multidimensional, with many
hyperdimensional properties, which is one of the fundamental difficulties for current
machine learning techniques [13]. The authors of that review concluded that machine
vision methods are ineffective for classifying multi-characteristic, hyperdimensional
data. Fruits and vegetables fall into several groups, each with its own set of
characteristics. Because basic data sets are scarce, specific classification methods
remain limited, and most studies are restricted in either the number of categories or
the size of the data set. Current work on pre-trained CNNs is a step toward supplying
turnkey computer vision components. These pre-trained CNNs are, however,
data-driven, and large datasets of fruits and vegetables are scarce [13].
Rahnemoonfar and Sheppard [14] applied a deep neural network (DNN) to robotic
agriculture, focusing on tomato images found on the internet. They used a modified
Inception-ResNet architecture and trained the model on images covering a variety of
conditions (fruit under shade, surrounded by leaves, surrounded by branches, and
overlapping fruits). They reported an average test accuracy of 93% on synthetic
images and 91% on real photographs.
In another study [15], a CNN was used to build a model that alerts the driver of a
car who is becoming drowsy. A deep convolutional network extracts the features used
in the learning phase, and the CNN classifier uses a SoftMax layer to decide whether
the driver is sleeping or not. The Viola-Jones face detection method was adapted for
this research: once the face has been detected, the eye region is extracted from it.
The suggested stacked deep CNN overcomes drawbacks of a standard CNN, such as
location accuracy in regression, and reaches an accuracy of 96.42%. The researchers
suggest that transfer learning could be used in the future to improve the performance
of the model.
A further study [16] provides a method for recognising the kind of fruit among four
varieties (litchi, apple, grape and lemon). The photos were captured with smartphones
and processed using a contemporary detection framework. A CNN was trained on a new
data set of 2403 images from the four fruit classes. The model performed very well,
with a precision of 99.89%, showing that the CNN was successful in identifying the
sort of fruit. The researchers plan to use the algorithm to detect a wider variety of
fruits in the future. Other optimization methods that can be applied to such problems
are given in [17–22].
3 Methodology
3.1 Dataset
The fruit dataset was captured with a digital single-lens reflex (DSLR) camera (Canon
7D, 22.3 × 14.9 mm CMOS sensor, RGB Color Filter Array, 18 million effective
pixels). The data comprise two classes, cempedak (Artocarpus integer) and
nangka (Artocarpus heterophyllus), with 500 images per class (1000 images in total)
at a resolution of 4608 × 3456 pixels. For training the network, the entire data set
was sub-sampled by a factor of 72, producing images of 48 × 64 pixels (height ×
width). The images were collected under green, red and blue light (produced by an
external gel filter on the flashlight) and under white light. The aim was to obtain a
dataset with high variability in the position and number of fruits, approximating a
real scenario.
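The sub-sampling arithmetic can be checked with a short sketch. It assumes plain decimation (keeping every 72nd pixel along each axis); the text does not state which resampling method was actually used, and the function names are illustrative only.

```python
def subsampled_size(width, height, factor):
    """Size (width, height) after keeping every `factor`-th pixel on each axis."""
    return len(range(0, width, factor)), len(range(0, height, factor))


def subsample(image, factor):
    """Decimate a 2-D list of pixels by `factor` along rows and columns."""
    return [row[::factor] for row in image[::factor]]


# Full-resolution frame reported in the text: 4608 x 3456 pixels.
print(subsampled_size(4608, 3456, 72))  # (64, 48)

# Tiny stand-in image (4 rows x 6 columns) decimated by a factor of 2.
tiny = [[10 * r + c for c in range(6)] for r in range(4)]
print(subsample(tiny, 2))  # [[0, 2, 4], [20, 22, 24]]
```

The result, 64 pixels wide by 48 pixels high, agrees with the 48 × 64 image size quoted above.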
The entire dataset of images is resized to 224 × 224 × 3 and converted into a
NumPy array for faster convolution when building the CNN model. The converted
images are labelled according to the two classes. Training was conducted with
random image augmentation, validation was performed in parallel with training, and
the model was then evaluated on the test set. The data were partitioned into training
and test sets, as illustrated in Fig. 2.
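A split of this kind can be sketched in a few lines of Python. The 30% test proportion follows the 300-image test set shown in Fig. 2; the per-class (stratified) shuffling, the seed and the function name are assumptions made for illustration and are not taken from the chapter.

```python
import random


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 500 images per class, as described in the dataset section.
labels = ["cempedak"] * 500 + ["nangka"] * 500
train_idx, test_idx = stratified_split(labels, test_fraction=0.30)
print(len(train_idx), len(test_idx))  # 700 300
```

Shuffling within each class keeps 150 test images per class, so the test set stays balanced.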
For this research, the DCNN model for classifying cempedak (Artocarpus integer)
and nangka (Artocarpus heterophyllus) is shown in Fig. 3. It consists of 15
convolutional layers/blocks of deep learning. The first convolution layer uses 16
convolution filters of size 3 × 3, with a kernel regularizer and a bias regularizer of
0.05. It also uses random_uniform as the kernel initializer, which sets the initial
weights of the neural network before they are updated to better values at every
iteration. Random_uniform is an initializer that generates tensors with a uniform
distribution.
Fig. 2 Data splitting and process to be used for training and testing (test set: 30%, 300 images)
Its minimum value is −0.05 and its maximum value is 0.05. The regularizer adds
penalties on the layer parameters during optimization; these penalties are added to
the loss function that the network minimizes. Zero ("same") padding is used so that
the input and output tensors of the convolution have the same spatial shape; without
padding, a 3 × 3 convolution would shrink each spatial dimension by 2. The input
image size is 224 × 224 × 3. Before the output tensor is passed to the max-pooling
layer, batch normalization is applied at each convolution layer, which keeps the mean
activation close to zero and the activation standard deviation close to 1. After
normalization, the rectified linear unit (ReLU) activation function is applied at
every convolution. ReLU is a piecewise linear function: it outputs the input when the
input is positive and outputs zero otherwise. The output of each convolutional layer
is given as input to a max-pooling layer with a pool size of 2 × 2. This layer
down-samples the feature maps, which reduces the number of parameters in later layers
and the memory and time required for computation, so that only the strongest features
are retained for classification. Finally, a dropout of 0.5 is applied at each
convolution to reduce overfitting. The second convolution layer uses 16 convolution
filters with a 5 × 5 kernel, and the third uses 16 convolution filters with a 7 × 7
kernel. Finally, a fully connected (dense) layer is used; the feature map of the last
convolution must be flattened before it is passed to the dense layer. In our model,
the loss function is categorical cross-entropy, and we compare the Adam and SGD
optimizers at three levels of epochs (25, 50 and 75) with a learning rate of 0.001.
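The elementary operations named in this description can be illustrated without a deep learning library. The sketch below is a simplified illustration, not the authors' implementation: the helper names are hypothetical, the batch normalization omits the learned scale and shift, and the values fed through it are random numbers rather than real activations.

```python
import random


def random_uniform(n, low=-0.05, high=0.05, seed=0):
    """Draw n initial weights uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def batch_norm(values, eps=1e-8):
    """Standardise values to mean ~0 and standard deviation ~1."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def relu(x):
    """Return x when x is positive, otherwise zero."""
    return x if x > 0 else 0.0


def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a 2-D list with even height and width."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]


weights = random_uniform(1000)
normalised = batch_norm(weights)            # normalise first ...
activated = [relu(v) for v in normalised]   # ... then apply ReLU, as in the text
feature_map = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool_2x2(feature_map)
print(min(weights) >= -0.05 and max(weights) <= 0.05)  # True
print(pooled)  # [[6, 8], [14, 16]]
```

The pooling step halves each spatial dimension, which is why a 4 × 4 feature map becomes 2 × 2 here.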
Training a customised deep convolutional neural network from scratch may take a
long time. Transfer learning consists of taking features that have been learned on
one problem and reusing them on a new, similar problem. In this study, the model was
constructed as follows. First, layers were taken from a previously trained model
(VGG16, VGG19, Xception, ResNet50 or InceptionV3) and frozen, to avoid destroying
the information they contain during later training rounds. Next, new trainable layers
were added on top of the frozen layers. These new layers learn to turn the old
features into predictions on the new dataset. We compared five transfer learning
models (VGG16, VGG19, Xception, ResNet50 and InceptionV3) with the proposed
CNN model.
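The effect of freezing can be shown with a toy sketch: frozen layers are simply skipped when gradients are applied, so only the new layers change. The `Layer` class, the layer names and the numbers below are hypothetical stand-ins and not part of any library or of the chapter's code.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    weights: list
    trainable: bool = True


def sgd_step(layers, grads, lr=0.001):
    """Apply one gradient step, skipping layers marked as frozen."""
    for layer in layers:
        if not layer.trainable:
            continue  # frozen: pre-trained weights are left untouched
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads[layer.name])]


# Stand-ins for a pre-trained base and a newly added classification head.
base = [Layer("pretrained_conv_1", [0.5, -0.2]), Layer("pretrained_conv_2", [0.1, 0.3])]
for layer in base:
    layer.trainable = False
head = [Layer("new_dense", [0.0, 0.0])]

model = base + head
grads = {layer.name: [1.0, 1.0] for layer in model}
sgd_step(model, grads, lr=0.001)
print(base[0].weights)  # [0.5, -0.2]
print(head[0].weights)  # [-0.001, -0.001]
```

Deep learning frameworks expose the same idea through a per-layer trainable flag; the sketch only mirrors the principle.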
3.4.1 VGG16
VGG16 was developed by Simonyan and Zisserman for the ILSVRC 2014 competition.
It consists of 16 weight layers: 13 convolutional layers with only 3 × 3 kernels,
followed by 3 fully connected layers. The design is similar to AlexNet in that the
number of feature maps increases with the depth of the network. The network
comprises about 138 million parameters. In our model, this architecture is modified
at the last FC layer, which originally has 1000 classes: we replaced the 1000 classes
with our number of classes, i.e., 2. The Adam optimizer is used and the accuracy is
obtained. Similarly, the VGG19 architecture is defined by pushing the depth to 19
layers; as stated above, we changed the number of output classes to 2 in the last
layer.
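The figure of about 138 million parameters can be reproduced from the standard VGG16 configuration (13 convolutional layers with 3 × 3 kernels, then fully connected layers of 4096, 4096 and 1000 units on a 224 × 224 × 3 input). The sketch below assumes that standard configuration. It also shows how the count changes if only the last FC layer is resized; the two-output head reflects the two classes of this dataset and is an assumption about how the replacement was done.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Output channels of the 13 convolutional layers in standard VGG16.
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]


def vgg16_params(num_classes):
    total, c_in = 0, 3
    for c_out in CONV_CHANNELS:
        total += conv_params(3, c_in, c_out)
        c_in = c_out
    # Five 2 x 2 poolings reduce 224 x 224 to 7 x 7, giving 7 * 7 * 512 inputs.
    total += dense_params(7 * 7 * 512, 4096)
    total += dense_params(4096, 4096)
    total += dense_params(4096, num_classes)
    return total


print(vgg16_params(1000))  # 138357544
print(vgg16_params(2))     # 134268738
```

Most of the parameters sit in the first fully connected layer, so shrinking the output layer changes the total only slightly.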
3.4.2 VGG19
VGG19 is a deeper variant of the VGG16 model. It builds on the VGG16 architecture
by addressing AlexNet's flaws and increasing accuracy [3]. It is a 19-layer
convolutional neural network constructed by stacking convolutions; however, the
depth of the model is limited by the vanishing gradient problem, which makes deep
convolutional networks difficult to train.
3.4.3 ResNet50
3.4.4 Inception V3
3.4.5 Xception
Xception stands for “Extreme Inception”. This architecture was proposed by Google.
It has approximately the same number of parameters as Inception V3, so its
performance gain is attributed to more efficient use of the model parameters rather
than to increased capacity. The output maps in inception architecture
The dataset was processed and analysed using several methods. Because the customised
DCNN model has more trainable weights, its training takes longer. Table 1 shows that
the proposed DCNN architecture achieves an accuracy of 0.8933 to 0.9367. Figures 4
and 5 compare the proposed method (highlighted in yellow) with the other models for
the Adam and SGD optimizers, respectively. Of all configurations, VGG16 with the SGD
optimizer achieves the highest accuracy (0.9633 at 75 epochs). In addition, VGG16
provides more stable and consistent performance across epochs, as shown in Fig. 6.
Overall, accuracy tends to increase with the number of epochs, although not for every
model and optimizer (Table 1).
Table 1 Accuracy of the proposed DCNN and transfer learning models with optimizers of Adam
or SGD at three levels of epochs

Optimizer        Adam                      SGD
Epochs           25      50      75        25      50      75
Proposed model   0.8933  0.9267  0.9100    0.9233  0.9267  0.9367
Xception         0.8200  0.8800  0.9000    0.9000  0.9167  0.9000
VGG16            0.4733  0.8667  0.8700    0.6000  0.9567  0.9633
VGG19            0.7967  0.8567  0.8800    0.8800  0.8800  0.8800
ResNet50         0.6800  0.7200  0.7500    0.7933  0.6900  0.8000
InceptionV3      0.8800  0.8900  0.9167    0.9133  0.9000  0.9167
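The values in Table 1 can be queried directly. The short sketch below copies the accuracies from the table and looks up the best configuration and the model and optimizer pairs whose accuracy drops at some point as the number of epochs grows; the variable names are illustrative.

```python
EPOCHS = [25, 50, 75]
ACCURACY = {
    "Proposed model": {"Adam": [0.8933, 0.9267, 0.9100], "SGD": [0.9233, 0.9267, 0.9367]},
    "Xception": {"Adam": [0.8200, 0.8800, 0.9000], "SGD": [0.9000, 0.9167, 0.9000]},
    "VGG16": {"Adam": [0.4733, 0.8667, 0.8700], "SGD": [0.6000, 0.9567, 0.9633]},
    "VGG19": {"Adam": [0.7967, 0.8567, 0.8800], "SGD": [0.8800, 0.8800, 0.8800]},
    "ResNet50": {"Adam": [0.6800, 0.7200, 0.7500], "SGD": [0.7933, 0.6900, 0.8000]},
    "InceptionV3": {"Adam": [0.8800, 0.8900, 0.9167], "SGD": [0.9133, 0.9000, 0.9167]},
}

# One tuple per table cell: (accuracy, model, optimizer, epochs).
runs = [(acc, model, opt, epochs)
        for model, by_opt in ACCURACY.items()
        for opt, accs in by_opt.items()
        for epochs, acc in zip(EPOCHS, accs)]
best = max(runs)

# Pairs for which accuracy decreases between two consecutive epoch levels.
not_monotone = [(model, opt)
                for model, by_opt in ACCURACY.items()
                for opt, accs in by_opt.items()
                if any(later < earlier for earlier, later in zip(accs, accs[1:]))]

print(best)  # (0.9633, 'VGG16', 'SGD', 75)
print(not_monotone)
```

Four of the twelve model and optimizer pairs show a drop between epoch levels, so the upward trend with epochs is a tendency rather than a rule.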
Fig. 4 Accuracy of the model for Adam optimizer at three levels of epochs (values as listed in Table 1)
Fig. 5 Accuracy of the model for SGD optimizer at three levels of epochs (values as listed in Table 1)
Fig. 6 Performance of model on train and test set by using Adam or SGD optimizer at three levels
of epochs
5 Conclusion
The dataset was processed and analysed using several CNN methods. With the
methodology described, the proposed DCNN architecture achieves an accuracy of
89–93.67%. The highest accuracy was obtained with VGG16 and the SGD optimizer,
and VGG16 also provides more stable and consistent performance across epochs, as
shown in Fig. 6. Overall, accuracy tends to increase with the number of epochs.
References
1. Grimm, J. E., & Steinhaus, M. (2020). Characterization of the major odorants in Cempedak—
Differences to jackfruit. Journal of Agricultural and Food Chemistry, 68(1), 258–266.
2. Balamaze, J., Muyonga, J. H., & Byaruhanga, Y. B. (2019). Physico-chemical characteristics of
selected jackfruit (Artocarpus Heterophyllus Lam) varieties. Journal of Food Research, 8(4),
11.
3. Shaha, M., & Pawar, M. (2018). Transfer learning for image classification. In 2018 Second
international conference on electronics, communication and aerospace technology (ICECA)
(pp. 656–660). https://doi.org/10.1109/ICECA.2018.8474802
4. Wang, M. M. H., Gardner, E. M., Chung, R. C. K., Chew, M. Y., Milan, A. R., Pereira, J.
T., & Zerega, N. J. C. (2018). Origin and diversity of an underutilized fruit tree crop, cempedak
(Artocarpus integer, Moraceae). American Journal of Botany, 105(5), 898–914.
5. Sharma, N., Jain, V., & Mishra, A. (2018). An analysis of convolutional neural networks
for image classification. In International conference on computational intelligence and data
science (ICCIDS 2018); Procedia Computer Science, 132, 377–384. ISSN 1877-0509.
https://doi.org/10.1016/j.procs.2018.05.198
6. Alhaj, Y. A., Dahou, A., Al-qaness, M. A., Abualigah, L., Abbasi, A. A., Almaweri, N. A. O.,
Elaziz, M. A., & Damaševičius, R. (2022). A novel text classification technique using improved
particle swarm optimization: A case study of Arabic language. Future Internet, 14(7), 194.
7. Daradkeh, M., Abualigah, L., Atalla, S., & Mansoor, W. (2022). Scientometric analysis and
classification of research using convolutional neural networks: A case study in data science
and analytics. Electronics, 11(13), 2066.
8. Wu, D., Jia, H., Abualigah, L., Xing, Z., Zheng, R., Wang, H., & Altalhi, M. (2022). Enhance
teaching-learning-based optimization for tsallis-entropy-based feature selection classification
approach. Processes, 10(2), 360.
9. Ali, M. A., Balasubramanian, K., Krishnamoorthy, G. D., Muthusamy, S., Pandiyan, S.,
Panchal, H., Mann, S., Thangaraj, K., El-Attar, N. E., Abualigah, L., & Elminaam, A. (2022).
Classification of glaucoma based on elephant-herding optimization algorithm and deep belief
network. Electronics, 11(11), 1763.
10. Abualigah, L., Kareem, N. K., Omari, M., Elaziz, M. A., & Gandomi, A. H. (2021). Survey
on Twitter sentiment analysis: Architecture, classifications, and challenges. In Deep learning
approaches for spoken and natural language processing (pp. 1–18). Springer.
11. Fan, H., Du, W., Dahou, A., Ewees, A. A., Yousri, D., Elaziz, M. A., Elsheikh, A. H., Abualigah,
L., & Al-qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
world application UK Brexit. Electronics, 10(11), 1332.
12. Abualigah, L. M. Q. (2019). Feature selection and enhanced krill herd algorithm for text
document clustering (pp. 1–165). Springer.
13. Hameed, K., Chai, D., & Rassau, A. (2018). A comprehensive review of fruit and vegetable
classification techniques. Image and Vision Computing, 80(September), 24–44.
14. Rahnemoonfar, M., & Sheppard, C. (2017). Deep count: Fruit counting based on deep simulated
learning. Sensors (Switzerland), 17(4), 1–12.
15. Reddy Chirra, V. R., Uyyala, S. R., & Kishore Kolli, V. K. (2019). Deep CNN: A machine
learning approach for driver drowsiness detection based on eye state. Revue d’Intelligence
Artificielle, 33(6), 461–466.
16. Risdin, F., Mondal, P. K., & Hassan, K. M. (2020). Convolutional neural networks (CNN)
for detecting fruit information using machine learning techniques. IOSR Journal of Computer
Engineering, 22(2), 1–13.
17. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
18. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
19. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
20. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
21. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
22. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
23. Chollet, F. (2021). Xception: Deep learning with depthwise separable convolutions. [online]
arXiv.org. https://arxiv.org/abs/1610.02357v3. Accessed May 30, 2021.
Markisa/Passion Fruit Image
Classification Based Improved Deep
Learning Approach Using Transfer
Learning
Ahmed Abdo, Chin Jun Hong, Lee Meng Kuan, Maisarah Mohamed Pauzi,
Putra Sumari, Laith Abualigah, Raed Abu Zitar, and Diego Oliva
Abstract Fruit recognition is becoming increasingly important in the agricultural
industry. Traditionally, all the fruits on a production line are identified and
labelled manually, which is labor intensive, error-prone and inefficient. Many fruit
recognition systems have therefore been created to automate the process, but systems
for Malaysian local fruits are limited. This project focuses on classifying one
Malaysian local fruit, markisa (passion fruit). We propose two CNN models for
markisa classification. Evaluated on our own dataset, the proposed models achieve
accuracies of 97% and 65%, respectively. The results indicate that the architecture
of a CNN model is very important, because different architectures can produce
different results. The first CNN model is therefore selected, because it classifies
the 4 types of markisa with higher accuracy. We also investigated two transfer
learning methods for the classification of markisa, VGG-16 and InceptionV3. The
results show that the first proposed CNN model outperforms VGG-16 (95% accuracy)
and InceptionV3 (65% accuracy).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_7
1 Introduction
2 Literature Survey
In image object detection and classification, two approaches are available: deep
learning with Convolutional Neural Networks (CNN) and the traditional Computer
Vision (CV) approach [7, 8]. Traditional CV algorithms for feature extraction include
edge detection, corner detection and threshold segmentation [9–12]. Deep learning
achieves better accuracy in image classification than the traditional CV techniques
[13]. It also demands less expert effort for fine-tuning and feature extraction,
since the CNN learns the features itself, is highly flexible and can be re-trained to
obtain the optimum result. CNNs are therefore applied in many image classification
tasks, including fruit classification, which supports robotic harvesting systems and
fruit quality checking [14]. Risdin et al. [14] developed a CNN for fruit
classification that achieves 98.99% accuracy, better than traditional machine
learning techniques such as SVM with only 87% accuracy [5]. Moreover, Palakodati et
al. [15] developed a CNN model for classifying fresh and rotten fruit that achieves
accuracies of up to 97.82%.
VGG16 is among the best transfer learning models that have been tried on fruit and
vegetable datasets. Kishore et al. [16] showed that VGG16 achieves about 97%
accuracy on a dataset of 4 classes (Banana, Tomato, Carrot, and Potato), each with
600 images. Interestingly, the model was tested with different image sizes to show
that VGG16 works well with smaller and noisier images. Given this accuracy, VGG16 is
a good option for fruit or vegetable datasets.
Pardede et al. [17] also applied the VGG16 model to a fruit dataset. The aim of that
study was to build a deep learning model that detects fruit ripeness, which differs
from the previous study although the dataset is of the same nature. The dataset has
8 classes of fruit (Ripe Mango, Unripe Mango, Ripe Tomato, Unripe Tomato, Ripe
Orange, Unripe Orange, Ripe Apple, Unripe Apple). They achieved about 90% accuracy
with a dropout rate of 0.5 and concluded that dropout is the best technique for
reducing overfitting in transfer learning.
Inception-v3 is a convolutional neural network architecture named after the film
Inception, directed by Christopher Nolan. The model is mainly used for image
analysis and object detection and was introduced by Google for the ImageNet
Recognition Challenge [18]. Szegedy et al. [19] proposed the architecture of
Inception-v3 and studied it in the context of Inception architectures; the former has
been shown to give “high-performance vision networks that have a relatively modest
computation cost compared to simpler, more monolithic architectures”. In addition,
the highest quality trained version of Inception-v3 has “reached 21.2%, top-1 and
5.6% top-5 error for single crop evaluation” compared to other CNN architectures at
the time.
Lin et al. [20] explored the application of the Inception-v3 model through transfer
learning; the transfer learning-based model “is retrained for 5000 epochs at
different learning rates. The accuracy test results indicate that the transfer
learning-based method is robust for traffic sign recognition, with the best
recognition accuracy of 99.18 % at the learning rate of 0.05”. Other optimization
methods that can be applied to such problems are given in [8, 21–25].
3.1.1 Model 1
The proposed architecture of the CNN model for passion fruit classification is shown
in Figs. 2 and 3. The model consists of 4 convolutional layers, illustrated in
Fig. 2, and 3 dense layers for the classifier, excluding the input and output layers,
as shown in Fig. 3 [12, 26]. It achieves a testing accuracy of 97% in classifying the
4 types of passion fruit. As seen in Fig. 2, the training data, RGB color images of
size (224, 224), are passed into the first convolutional layer. This layer has 64
convolutional filters of size (3, 3) and a stride of (1, 1), so the filters move
across the input image one step at a time; padding is set to "same", so the output of
the first convolution has the same spatial size as its input. Next, batch
normalization is applied after the ReLU activation function so that the mean
activation is close to zero and the standard deviation close to 1 [3]. The result
then passes to a max-pooling layer of size (2, 2), which reduces the output size and
simplifies the model. Convolutional layer 2 uses the same number of filters, but the
filter size is increased to (5, 5) with no padding; the same ReLU activation and
batch normalization are applied before the max-pooling layer of size (2, 2). The
architecture of convolutional layer 2 is repeated in convolutional layer 3 with the
filter size increased to (7, 7); the same ReLU activation and batch normalization
are applied, to help prevent overfitting, before the max-pooling layer of size
(2, 2). In convolutional layer 4, the number of filters is reduced to 16, of size
(7, 7), and batch normalization is applied to the output without the ReLU function
or a max-pooling layer. The output of this convolutional base, which extracts the
features of the input images, becomes the input to the dense classifier.
After the convolutional layers have extracted the features of the fruit images, the
feature maps are flattened before being passed to the neural network, as seen in
Fig. 3. The neural network has 3 dense layers, not counting the input and output
layers. The first dense layer has 512 nodes and uses L2 (ridge) regularization, with
a value of 0.01 for both the kernel and the bias, which adds penalties on the weights
to create a simpler model and prevent overfitting [4]. A dropout rate of 0.25 is also
applied, which ignores 25% of the neurons during training to avoid overfitting. Two
regularization techniques, L2 regularization and dropout of 0.25, are therefore used
in the first dense layer because of its large number of trainable neurons. The first
dense layer uses the ReLU activation function and feeds the second dense layer, which
has only 64 neurons and the same dropout rate of 0.25 but no L2 regularization. Its
output passes through the ReLU activation function and becomes the input of the last
dense layer before the output layer. This third dense layer has only 32 nodes, with
no dropout or regularization, and uses the ReLU activation function. The output layer
has 4 neurons with the Softmax activation function for multiclass classification. The
loss function is categorical cross-entropy, because the labels are one-hot encoded,
and the weights and biases of the network are updated with the Adagrad optimizer at a
learning rate of 0.001. The number of epochs is set to 30 and the batch size to 10.
The evaluation metric is categorical accuracy for multiclass classification.
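The pairing of a softmax output with categorical cross-entropy on one-hot labels can be made concrete with a small sketch. The logits below are hypothetical values chosen for illustration, and the helper names are not taken from the chapter.

```python
import math

CLASSES = ["markisa besar", "markisa kuning", "markisa manis", "markisa ungu"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, n):
    """One-hot encode a class index as a list of n floats."""
    return [1.0 if i == index else 0.0 for i in range(n)]


def categorical_cross_entropy(target, probs, eps=1e-12):
    """Loss between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


logits = [2.0, 0.5, 0.1, -1.0]          # hypothetical output-layer scores
probs = softmax(logits)
target = one_hot(0, len(CLASSES))       # true class: markisa besar
loss = categorical_cross_entropy(target, probs)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, round(loss, 4))
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true class, so it falls toward zero as that probability approaches 1.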
3.1.2 Model 2
Figure 4 shows the second proposed CNN model architecture. This model has 6
convolution blocks, 2 pooling blocks, 2 fully connected layers and a SoftMax
classifier. All input images are color images of 224 × 224 pixels with 3 channels.
All the convolution blocks use the same filter size (3 × 3), and padding is applied
so that the output feature maps have the same size as the inputs. The number of
filters differs, however: 128 filters for convolution block 1, 96 for block 2, 64 for
block 3, 32 for blocks 4 and 5, and 12 for block 6. All convolution blocks use the
ReLU activation function. Max pooling of size 2 × 2 is applied after convolution
blocks 1 and 2, halving the image size twice (from 224 × 224 to 56 × 56). After the
convolution base, the feature maps have dimensions 56 × 56 × 12. They are then
flattened to vectors of size 37,632 before being fed into the fully connected layers.
Both fully connected layers have 1000 nodes and use ReLU as the activation function;
the only difference is that a dropout rate of 0.3 is applied to the first layer. The
SoftMax classifier then outputs the result, classifying each image as markisa besar,
markisa kuning, markisa manis or markisa ungu.
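The shapes quoted for Model 2 can be traced with simple arithmetic. The sketch below follows the description above (3 × 3 filters with size-preserving padding, pooling after blocks 1 and 2); the parameter counts it prints are derived from that description and are not reported in the chapter, and the function name is illustrative.

```python
FILTERS = [128, 96, 64, 32, 32, 12]   # filters per convolution block
POOL_AFTER = {1, 2}                   # 2 x 2 max pooling after these blocks


def trace_model2(size=224, channels=3):
    """Return (spatial size, channels, flattened length, conv parameters)."""
    conv_params = 0
    for block, n_filters in enumerate(FILTERS, start=1):
        conv_params += 3 * 3 * channels * n_filters + n_filters
        channels = n_filters          # padding keeps the spatial size unchanged
        if block in POOL_AFTER:
            size //= 2                # each pooling halves height and width
    return size, channels, size * size * channels, conv_params


size, channels, flat, conv_params = trace_model2()
first_dense_params = flat * 1000 + 1000   # first fully connected layer, 1000 nodes
print(size, channels, flat)               # 56 12 37632
print(conv_params, first_dense_params)
```

The flattened length of 37,632 matches the figure in the text, and the first fully connected layer dominates the parameter count because it connects all 37,632 inputs to 1000 nodes.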
As part of this study, we also include transfer learning models. Because of
limitations in our device and resources, we were only able to compare the VGG16 and
InceptionV3 models. In both models, we freeze the base convolutional layers and
remove the flatten layer and its classifier, while keeping the pre-trained weights by
using the ‘imagenet’ option. We then replace the removed classifier with one that
suits our dataset of 4 classes. We use ReLU as the dense layer activation function
and softmax as the output layer activation function, and implement early stopping to
reduce the training time: whenever the model reaches 99% accuracy, training stops.
This also reduces the possibility of overfitting.
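The stopping rule can be expressed as a simple loop. The sketch below is library-free and uses a hypothetical accuracy history; in Keras the same rule would typically live in a custom callback that sets the model's `stop_training` flag at the end of an epoch, which is an assumption about the implementation rather than something stated in the chapter.

```python
def epochs_until_stop(epoch_accuracies, target=0.99):
    """Return how many epochs run before the accuracy target halts training."""
    for epoch, accuracy in enumerate(epoch_accuracies, start=1):
        if accuracy >= target:
            return epoch              # stop as soon as the target is reached
    return len(epoch_accuracies)      # target never reached: all epochs run


# Hypothetical per-epoch accuracies, for illustration only.
history = [0.71, 0.88, 0.95, 0.992, 0.995]
print(epochs_until_stop(history))            # 4
print(epochs_until_stop([0.60, 0.70, 0.80])) # 3
```

Stopping on a fixed accuracy threshold saves training time, although it does not monitor validation loss the way patience-based early stopping does.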
3.2.1 VGG16
VGG16 is a convolutional neural network developed by Karen Simonyan and Andrew
Zisserman from Oxford University in 2014 [27]. The model contains 16 weight layers
and achieves 92.7% top-5 test accuracy on the ImageNet dataset, which contains
14 million images belonging to 1000 classes. Figure 5 shows the architecture of
VGG16.
3.2.2 InceptionV3
3.3 Dataset
There are many types of Markisa/passion fruits. In our dataset, we include 4 different
types of this fruit which are Markisa Besar (Giant Passion Fruit), Markisa Kuning
(Yellow Passion Fruit), Markisa Manis (Sweet Passion Fruit), and Markisa Ungu
(Purple Passion Fruit). We divided the dataset into 80% training, 10% validation and
10% for testing. Figure 7 shows the examples of images in our dataset.
3.4 Augmentation
We also apply image augmentation by rotating the images by fixed angles.
Table 1 shows the rotations for one sample image of the Markisa Besar class.
Rotations shown in Table 1: 180°, 90° anticlockwise, 275°
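Rotations by multiples of 90° can be written with plain list operations, as sketched below on a tiny stand-in image. The function names are illustrative; a rotation by an arbitrary angle such as 275° needs interpolation from an image library and is not covered here.

```python
def rotate_90_anticlockwise(image):
    """Rotate a 2-D list of pixels by 90 degrees anticlockwise."""
    return [list(row) for row in zip(*image)][::-1]


def rotate_180(image):
    """Rotate a 2-D list of pixels by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


sample = [[1, 2],
          [3, 4]]
print(rotate_90_anticlockwise(sample))  # [[2, 4], [1, 3]]
print(rotate_180(sample))               # [[4, 3], [2, 1]]
```

Applying the 90° rotation twice gives the same result as the 180° rotation, which is a quick way to check the helpers.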
4 Performance Result
The dataset is balanced, consisting of 250 images each of Markisa Besar, Markisa
Kuning, Markisa Manis and Markisa Ungu, for a total of 1000 images. The size of the
images was standardized to 224 × 224 × 3. The programming language used in this
study is Python, with the TensorFlow and Keras libraries. To run the code, we used
Google Colaboratory with a GPU. However, GPU runtime is limited and we could not use
it extensively. We therefore reduced the parameter tuning options from 4 values per
parameter to 2 values per parameter for the transfer learning section only. Tables 2
and 3 show the parameter tuning options for the proposed CNN model and for transfer
learning, respectively.
4.2.1 Model 1
The best passion fruit classification model, shown in Figs. 2 and 3, reaches 97%
accuracy; its CNN architecture is summarized in Fig. 8. To arrive at it, we first
experiment with different architectural designs and hyperparameter settings.
The first proposed model, shown in Fig. 9, is used to experiment with different
optimizers, numbers of dense layers, learning rates, numbers of epochs, numbers
of filters and, lastly, training batch sizes. The best model is the CNN
architecture in Fig. 8, with 97% accuracy on the testing data.
Fig. 8 The model summary for the best proposed CNN model with accuracy of 100%
The initial proposed CNN architecture in Fig. 9 consists of 4 convolutional
layers and 3 dense layers, just like the best model in Fig. 8. The first
convolutional layer consists of 64 filters of size (3, 3) with stride (1, 1) and
zero-padding, so its output has the same size as its input, as in the best model in
Figs. 2, 3 and 8. However, the second and third convolutional layers have only
16 filters each, instead of 64 in the best model. The rest of the convolutional
architecture and the dense layer architecture are the same as in the best
model. The initial learning rate was set to 0.0001, and the Adam optimizer is
used with categorical accuracy as the metric. In addition, the number of epochs was
set to 30 with a batch size of 10 for the training dataset. As a result, the best model has
17,725,876 parameters in total, more than the 17,422,804 of the initial model,
because the initial model uses fewer filters in its convolutional layers, as seen in Figs. 8 and 9.
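To see why the filter count changes the parameter total, the following sketch applies the standard parameter formula for a 2-D convolutional layer. The function name is hypothetical, and the (3, 3) kernel for the second layer is an assumption, since the text states the kernel size only for the first layer. The sketch does not attempt to reproduce the two totals quoted above, which depend on the full architecture.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of one 2-D convolutional layer with biases."""
    # Each filter holds kernel_h * kernel_w * in_channels weights plus 1 bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


# Second convolutional layer: its input has 64 channels because the first
# layer uses 64 filters. The (3, 3) kernel size is assumed, not stated.
second_layer_initial = conv2d_params(3, 3, 64, 16)  # initial model, 16 filters
second_layer_best = conv2d_params(3, 3, 64, 64)     # best model, 64 filters

print(second_layer_initial, second_layer_best)
```

Raising the filter count of one layer also raises the input channel count of the next layer, so the effect compounds through the network.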
Fig. 10 The effect of the optimizer on the training and validation accuracy against epoch
activation after the second dense layer as seen in Fig. 12. As a result, the experiment
covers up to 6 dense layers after new dense layers are added to the
classifier 3 times. The result shows that when more dense layers are added, the model
learns more slowly, as it requires more epochs of training before it can predict
well. The validation accuracy becomes higher than the training accuracy once the model
has 4 or more dense layers, as the complexity of the model increases. The
testing evaluation accuracies illustrated in Fig. 13 show that, with 30 epochs, the model
with 3 dense layers has a higher testing evaluation accuracy (0.96) than the models with
more dense layers. Therefore, we keep 3 dense layers, the Adagrad
optimizer, a learning rate of 0.0001 and 30 training epochs with a batch size of
10 as our current model.
(III) Effect of Learning Rate
Next, the model is tested with learning rates of 0.1, 0.01, 0.001 and
0.0001, as seen in Fig. 14. The result shows an obvious trend: when the learning
rate is high, the model converges faster, as shown by the validation accuracy approaching
the training accuracy in fewer than 10 epochs. However, this is not stable, because a high
learning rate makes the model update its weights and biases in larger steps, and it might not
learn well after 10 epochs, as the fluctuating validation accuracies show. With a smaller
learning rate, the model learns more slowly and reaches better accuracies, as illustrated
in Fig. 14. The testing evaluation accuracy increases as the learning rate becomes
smaller, but only down to 0.001: a learning rate of 0.001 gives the
highest testing evaluation accuracy, 0.99, compared with 0.96 for a learning rate of 0.0001.
A possible explanation is that only 30 epochs are tested, and the smaller learning rate
might need more epochs to train well. Because it gives high accuracy with few epochs
and therefore fast computation, we select the learning rate of 0.001 instead of the
initial learning rate of 0.0001. Although the validation accuracy for a learning
rate of 0.0001 is slightly higher than for 0.001, as seen in Fig. 15, we prefer the
model that converges faster. The current model uses the Adagrad optimizer, a learning rate of
0.001, 3 dense layers and 30 epochs with a batch size of 10.
Fig. 12 The effect of the number of epochs on the training and validation accuracy against epoch
Furthermore, the model is tested with 10, 30, 50 and 70 epochs, as seen in Fig. 16.
With only 10 epochs, the validation accuracy is low and the model is overfitting,
with a poor testing evaluation accuracy of only 0.39.
As the number of epochs grows, the validation accuracy starts to converge
and stays consistent even with further increases, as seen for 70 epochs.
The validation accuracy becomes consistent after about epoch 20.
The testing evaluation accuracies for the different numbers of epochs can be seen in Fig. 17.
Training for 30 and 50 epochs yields the highest testing evaluation
accuracy, 0.99, compared with 0.95 for 70 epochs. Consequently,
we pick 30 epochs for training the model, as it requires fewer computational
resources and still gives a good validation accuracy. The current model uses the
Adagrad optimizer, a learning rate changed from 0.0001 to 0.001, 3 dense layers and
30 epochs with a batch size of 10.
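The selection rule used here (take the highest testing evaluation accuracy, and among ties prefer the cheaper setting) can be written down in a few lines. This is a minimal sketch using the accuracies reported in the text; the variable and function names are hypothetical.

```python
# Testing evaluation accuracy reported in the text for each epoch setting.
accuracy_by_epochs = {10: 0.39, 30: 0.99, 50: 0.99, 70: 0.95}


def pick_epochs(results):
    """Return the smallest epoch count that reaches the best accuracy."""
    best_accuracy = max(results.values())
    # Among the settings tied for best accuracy, fewer epochs cost less compute.
    return min(epochs for epochs, acc in results.items() if acc == best_accuracy)


chosen_epochs = pick_epochs(accuracy_by_epochs)
print(chosen_epochs)
```

The same tie-breaking idea applies to any hyperparameter where one value is cheaper to train than another.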
Fig. 14 The effect of learning rate on the training and validation accuracy against epoch
Fig. 16 The effect of learning rate on the training and validation accuracy against epoch
In the initial model, only the first convolutional layer has 64 filters; the
second and third convolutional layers have 16 filters each.
The results of changing the second and third convolutional layers
to 64 filters are shown in Fig. 18. A total of 48 filters is
added if only the second convolutional layer is changed; otherwise, the total number of
added filters is 90 when both the second and third convolutional layers are changed to 64 filters.
With more filters, the model is able to capture the image features better, as seen from the validation
accuracies being close to the training accuracies in the 48 and 90 added-filter settings.
The testing evaluation accuracies also improve after adding the 48
and 90 filters, both reaching 100%, up from 99%, as seen in Fig. 19.
Therefore, the first 3 convolutional layers are all set to 64
filters; this is the best model’s architecture, as illustrated in Figs. 2 and 3.
(VI) Effect of Batch Size
Having determined the best model’s architecture, we now examine
how the training batch size affects the performance of the model, as seen in
Fig. 20. We observe that when the batch size increases, the model updates
less often and its testing evaluation accuracy becomes lower, as seen in
Fig. 21. The accuracies drop to 0.97 and 0.98 after the batch size is increased.
This can be explained by the fact that a larger batch size reduces the number of
parameter updates per epoch. As a result, we retain a batch size of 10 for training
on the input dataset. The best model now uses 10 epochs with a batch size of 10, a learning
rate of 0.001, 3 dense layers, 64 filters in each of the first 3 convolutional
layers and the Adagrad optimizer.
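The link between batch size and the number of parameter updates can be checked with simple arithmetic. In the sketch below, the training set size of 800 and the larger batch sizes are hypothetical values chosen for illustration; the chapter does not state them.

```python
import math


def updates_per_epoch(num_training_images, batch_size):
    """Number of gradient updates in one pass over the training data."""
    # The last batch may be smaller, so round up.
    return math.ceil(num_training_images / batch_size)


num_training_images = 800  # hypothetical; the split size is not given
for batch_size in (10, 20, 40):  # 10 is from the text, the rest are assumed
    print(batch_size, updates_per_epoch(num_training_images, batch_size))
```

With a fixed number of epochs, doubling the batch size halves the number of updates, which is consistent with the slower learning described above.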
Fig. 18 The effect of number of filters on the training and validation accuracy against epoch
Fig. 20 The effect of number of filters on the training and validation accuracy against epoch
Figure 22 depicts the best model’s predictions on the testing dataset, which
show 97% accuracy in the passion fruit classification. 3 misclassifications occur
on Markisa Besar, and the model predicts all the other classes correctly.
The best model shows 100% testing evaluation accuracy, but the
actual prediction accuracy is 97%, with 3 of the 100 images in the testing dataset
being misclassified. Therefore, it is believed that the testing
accuracy can be increased by feeding the model more, and more varied,
Markisa Besar images.
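The 97% figure can be reproduced from a confusion matrix. The sketch below assumes 25 test images per class and places the 3 Markisa Besar errors in the Markisa Kuning column purely for illustration; the text does not say which class they were assigned to, and all names are hypothetical.

```python
labels = ["Markisa Besar", "Markisa Kuning", "Markisa Manis", "Markisa Ungu"]

# Rows are true classes, columns are predicted classes.
# Assumption: 25 images per class; the target class of the 3 errors is unknown.
confusion = [
    [22, 3, 0, 0],
    [0, 25, 0, 0],
    [0, 0, 25, 0],
    [0, 0, 0, 25],
]


def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_class(matrix):
    """Share of each true class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


print(overall_accuracy(confusion))
print(dict(zip(labels, recall_per_class(confusion))))
```

Per-class recall makes the weak class visible even when the overall accuracy is high.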
4.2.2 Model 2
Figure 23 shows the summary of the second CNN model. Based on the summary,
38,838,816 parameters need to be trained. We first train the model with the training data, then
perform hyperparameter tuning using the validation data, and finally test the accuracy of
the model using the testing data. To obtain the best parameters for this model, we
perform the hyperparameter tuning according to the setup in Table 2.
After the hyperparameter tuning, the model has a test accuracy of 65%
with the following parameters:
• Optimizer = Adam
• Learning rate = 0.001
• Last filter numbers = 12
• Number of nodes in each dense layer = 1000
• Epochs = 50
• Batch size = 20.
The sections below show the effect of each hyperparameter on the model
performance.
To test the effect of optimizers on the model performance, we keep the other
parameters constant:
• Last filter size = 12
• Number of nodes = 1000
• Epochs = 30
• Batch size = 40.
The effect of optimizers is shown in Fig. 26. Based on the result, the
Adam optimizer has the highest validation accuracy, 0.80. We can also see
that the model overfits the training data: it classifies the training
images with 0.99 accuracy, but its accuracy on the validation set is only 0.80.
In the next hyperparameter tuning steps, we keep Adam as the optimizer.
model is overfitted with the training data because it can classify the training images
perfectly but the performance on the validation set is only 0.65 accuracy.
4.3.1 VGG16
In this experiment, we add only 1 dense layer after the flatten layer and 1 dropout
layer before the output layer, while the base VGG16 convolutional layers are
frozen. Figure 29 illustrates the model architecture.
As stated in the previous section, we define different parameter tuning options
for model training. From these options, we have trained 64 models
with different combinations of the parameters (refer to the Appendix). For comparison,
we select the best model and vary one parameter at a time. The best accuracy achieved
from the training is 0.97. Figure 30 shows the training and validation accuracy and
losses across epochs for the best model in this part, before it stopped learning
when 99% accuracy was achieved.
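The count of 64 models follows from six parameters with two options each. A minimal sketch of how such a grid can be enumerated is given below; the option values are the ones that appear in the Appendix tables, while the dictionary keys and variable names are hypothetical.

```python
from itertools import product

# Two options per parameter, as listed in the Appendix tables.
grid = {
    "num_units": [512, 1024],
    "dropout": [0.1, 0.2],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10, 20],
    "learning_rate": [0.01, 0.001],
    "batch_size": [50, 100],
}

# One dictionary per combination; each would correspond to one training run.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))
print(configs[0])
```

Each added two-valued parameter doubles the number of runs, which is why limited GPU time forced the reduction from 4 options to 2.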
For the optimizer, we compare Adam and SGD with otherwise identical parameters. The
comparison between these 2 optimizers is shown in Table 4.
As we can see, the Adam optimizer performs better than SGD with the same
parameters.
Fig. 30 Training/validation accuracy and loss across different epochs
Table 4 Comparison between Adam and SGD optimizers
Same parameters: number of neurons in dense layer 512, dropout 0.2, epochs 20, learning rate 0.01, batch size 100
Optimizer Accuracy
Adam 0.97
SGD 0.75
Table 5 Comparison between different number of neurons
Same parameters: optimizer Adam, dropout 0.2, epochs 20, learning rate 0.01, batch size 100
No. of neurons in dense layer Accuracy
512 0.97
1024 0.87
Table 6 Comparison between different dropout rate
Same parameters: optimizer Adam, number of neurons in dense layer 512, epochs 20, learning rate 0.01, batch size 100
Dropout Accuracy
0.2 0.97
0.1 0.81
Table 7 Comparison between different learning rate
Same parameters: optimizer Adam, number of neurons in dense layer 512, epochs 20, dropout 0.2, batch size 100
Learning rate Accuracy
0.01 0.97
0.001 0.97
From the result, it is clear that the higher dropout rate performs better than the
lower dropout rate.
(IV) Effect on Learning Rate
Besides the dropout rate, we also test different learning rates, 0.01 and 0.001. The
result is shown in Table 7.
Changing the learning rate has no effect, as both values give the same accuracy.
(V) Effect on Batch Size
We also test the effect of different batch sizes during model training. The
result is shown in Table 8.
The batch size makes only a small difference in accuracy, 0.03. It can
be concluded that the batch size affects the performance only slightly.
(VI) Effect on Epochs
Lastly, we try 2 different numbers of epochs, 10 and 20. The result is shown in Table 9.
As with the batch size, the number of epochs affects the accuracy only a little, with a
difference of 0.06.
In summary, the experiment on VGG16 transfer learning shows that we need to
choose the best optimizer, dropout rate and number of neurons in the dense layer to
get the best model. The learning rate does not affect the performance
of the model, while the batch size and number of epochs affect the
accuracy only slightly.
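The summary above can be checked by ranking the parameters by how much the accuracy drops when each one is changed from its best value. The sketch below uses the accuracies from Tables 4 to 9; the labels and variable names are hypothetical.

```python
baseline_accuracy = 0.97  # best VGG16 configuration

# Accuracy after changing exactly one parameter (Tables 4 to 9).
accuracy_after_change = {
    "optimizer: Adam -> SGD": 0.75,
    "neurons: 512 -> 1024": 0.87,
    "dropout: 0.2 -> 0.1": 0.81,
    "learning rate: 0.01 -> 0.001": 0.97,
    "batch size: 100 -> 50": 0.94,
    "epochs: 20 -> 10": 0.91,
}

# Round to two decimals to avoid floating-point noise in the differences.
drops = {
    change: round(baseline_accuracy - accuracy, 2)
    for change, accuracy in accuracy_after_change.items()
}
ranked = sorted(drops.items(), key=lambda item: item[1], reverse=True)

for change, drop in ranked:
    print(f"{change}: -{drop:.2f}")
```

Note that this is a one-at-a-time comparison around a single configuration, so it says nothing about interactions between parameters.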
Table 8 Comparison between different batch size
Same parameters: optimizer Adam, number of neurons in dense layer 512, epochs 20, dropout 0.2, learning rate 0.01
Batch size Accuracy
100 0.97
50 0.94
Table 9 Comparison between different epochs
Same parameters: optimizer Adam, number of neurons in dense layer 512, batch size 100, dropout 0.2, learning rate 0.01
Epochs Accuracy
20 0.97
10 0.91
After obtaining the best model with the best parameters, we apply it to
classify the testing dataset, which consists of 100 images with 25 images per label.
The results are quite good, as illustrated in Fig. 31.
Only 5 images, all from Markisa Manis, are misclassified as Markisa Kuning,
giving a prediction accuracy of 95%. The rest are predicted correctly.
Markisa Kuning appears to be a dominant label. In future work, we can investigate
why the other labels are always misclassified as Markisa Kuning in this model, although
the misclassified images are a minority.
4.3.2 InceptionV3
In this experiment, we import InceptionV3’s base model and omit its top layer,
which consists of Dense layers and dropout layers. The base model’s weights and
biases are then frozen to preserve the parameters learned in the previous training.
Next, a fully connected part with one Dense layer using the ReLU activation function
and a dropout layer is added. Finally, the Sigmoid activation function is used
for the output layer, with four classes representing the four cultivars of Markisa.
Figure 33 illustrates the model architecture. It is important to mention that Dense
layers and two dropout layers are used in the experimental part. Figure 32 shows the
transfer learning through the Inception-V3 model.
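Because the output layer described above uses a Sigmoid activation for four mutually exclusive classes, it may help to see how sigmoid outputs differ from softmax outputs on the same scores. This is an illustrative sketch only: the logits are hypothetical, the functions are plain Python rather than Keras calls, and nothing here is taken from the chapter's code.

```python
import math


def sigmoid(logits):
    """Apply the logistic function to each score independently."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax(logits):
    """Normalize the scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the four Markisa classes.
logits = [2.0, 1.0, 0.1, -1.0]

sigmoid_out = sigmoid(logits)
softmax_out = softmax(logits)

print(sum(sigmoid_out))  # independent scores, need not sum to 1
print(sum(softmax_out))  # sums to 1 up to rounding
```

Both activations pick the same top class for a given set of scores; they differ in whether the four outputs are treated as independent or as competing probabilities.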
For the number of neurons in the Dense layer, we experimented with two different
values, 512 and 1024.
From Fig. 36, we can conclude that the number of neurons does not clearly affect the
model accuracy with one Dense layer, as both 512 and 1024 neurons
produce lower as well as higher accuracies; other parameters probably have more
effect on the model’s accuracy. However, 512 neurons contribute to the
top-performing model.
Fig. 34 HParams scatter plot matrix view for one dense layer optimizer
Fig. 35 HParams scatter plot matrix view for two dense layers optimizer
From Fig. 37, the results are similar to the one Dense layer experiment; however, 512
neurons contribute to both the top-performing and the lowest-performing model.
(III) Effect on Dropout
For Dropout rates, we experimented with the values 0.1 and 0.2.
From Fig. 38, we can conclude that the dropout value of 0.2 for one Dense layer
has an overall better result; however, it failed to produce the top-performing model
and produced the lowest-performing model.
Fig. 36 HParams scatter plot matrix view for one dense layers neurons
Fig. 37 HParams scatter plot matrix view for two dense layers neurons
From Fig. 39, we can conclude that a dropout value of 0.2 has resulted in a better
performing model for the two Dense layers than the one Dense layer; probably, higher
dropout values correlate with higher accuracy in Dense multilayers.
In addition to the Dropout rate, we also test on different learning rates, 0.01 or 0.001.
From Fig. 40, we can conclude that the learning rate with the value of 0.01 has
resulted in overall higher accuracy than the 0.001 in the one Dense layer experiment; it
has also contributed to the top-performing model when it comes to accuracy (Fig. 41).
Fig. 38 HParams scatter plot matrix view for one dense layer dropout
Fig. 39 HParams scatter plot matrix view for two dense layers dropout
Contrary to the one Dense layer experiment, a lower value of 0.001 has performed
better in the two Dense layers experiment; probably, lower learning rate values
correlate with higher accuracy in Dense multilayers.
(V) Effect on Batch Size
We also tested the effect of different batch sizes during the model training.
From Fig. 42, we can conclude that the batch size does not clearly affect the model
accuracy with one Dense layer. Both batch sizes, 50 and 100,
produce lower as well as higher accuracies; other parameters probably affect the model’s
accuracy more. However, a batch size of 50 contributes to the top-performing model.
Fig. 40 HParams scatter plot matrix view for one Dense layer learning rates
Fig. 41 HParams scatter plot matrix view for two Dense layers learning rates
From Fig. 43, the results are similar to the one Dense layer experiment; however,
a batch size of 100 contributes to both the top-performing and the lowest-performing
model.
For the number of Epochs used to train the models, we experimented with two
different values, 10 and 20.
Fig. 42 HParams scatter plot matrix view for one dense layer batch size
Fig. 43 HParams scatter plot matrix view for two dense layers batch sizes
From Fig. 44, we can conclude that the higher number of epochs results in overall higher
accuracy than the lower one in the one Dense layer experiment; it also contributes
to the top-performing model.
From Fig. 45, we can conclude that the higher number of epochs also results in
higher accuracy than the lower value in the two Dense layers experiment. However,
although more epochs lead to very high training accuracy,
too many epochs cause overfitting, and the validation accuracy decreases
because the model does not generalize well.
Fig. 44 HParams scatter plot matrix view for one dense layer epochs
Fig. 45 HParams scatter plot matrix view for two dense layers epochs
Figure 46 shows that the model with one Dense layer has more consistent performance
than the two Dense layers model. However, the latter has a performance spike when
configured with specific attributes. Therefore, as a summary of the experiments
on Inception-V3 transfer learning, we conclude that the best performing model
is the one with the parameters below (Fig. 47).
Finally, after finding the best performing model with the optimal parameters,
we test the model on a dataset consisting of 100 images with 25 images per
label. The results are illustrated in Fig. 48.
Fig. 46 Accuracy comparison between the different configurations of the dense layers
From Fig. 48, we can conclude that the best performing Inception-V3 transfer
learning model has low testing performance, with an average accuracy of 65.3%. The model
fails to classify Markisa Manis.
For comparison, the exact same testing set is applied to other prevalent deep learning
architectures; the result is shown in Table 10.
5 Conclusion
In this study, 4 different CNN models are created to classify
4 types of Markisa fruit. Two custom CNN models are built, and 2
transfer learning models are used with VGG16 and InceptionV3 as base models.
The classifier part of each transfer learning model is customized
and used to make the predictions. The results show that the first custom CNN model
achieves the highest accuracy, 97%, followed by the VGG16 transfer learning model
with an accuracy of 95%. The second custom CNN model and the InceptionV3 model both
give a testing accuracy of 65%. Consequently, the testing accuracy of a custom CNN
is comparable to that of a transfer learning model such as VGG16.
The architecture design is crucial in determining how well the model is able to capture
the features in the input dataset.
Appendix
(continued)
Num_units Dropout Optimizer Epochs Learning rate Batch_size Accuracy
512 0.1 SGD 10 0.001 100 0.85
1024 0.2 SGD 20 0.001 50 0.85
1024 0.2 SGD 20 0.01 50 0.85
512 0.1 SGD 10 0.01 100 0.84
1024 0.2 SGD 10 0.01 100 0.84
512 0.1 Adam 20 0.01 50 0.83
1024 0.2 SGD 20 0.01 100 0.83
1024 0.2 Adam 20 0.001 50 0.83
512 0.2 SGD 10 0.01 50 0.82
1024 0.1 SGD 20 0.01 50 0.82
1024 0.1 SGD 20 0.001 50 0.82
512 0.1 SGD 20 0.001 50 0.81
512 0.1 Adam 20 0.01 100 0.81
512 0.1 Adam 10 0.01 50 0.81
1024 0.2 SGD 10 0.001 100 0.81
1024 0.1 Adam 20 0.001 100 0.81
512 0.2 SGD 20 0.01 50 0.8
1024 0.1 SGD 10 0.001 50 0.8
1024 0.1 Adam 10 0.001 50 0.8
512 0.2 SGD 10 0.001 100 0.79
512 0.2 SGD 10 0.001 50 0.78
512 0.1 SGD 10 0.01 50 0.78
512 0.1 SGD 20 0.01 50 0.77
512 0.2 Adam 10 0.001 50 0.75
512 0.2 SGD 20 0.01 100 0.75
512 0.2 SGD 10 0.01 100 0.75
1024 0.1 SGD 10 0.01 50 0.75
1024 0.1 SGD 20 0.01 100 0.73
1024 0.2 Adam 10 0.01 50 0.73
1024 0.1 SGD 10 0.01 100 0.72
Table 10 Result of parameter tuning for Inception-V3 with one dense layer
Number of neurons  Dropout rate  Optimizer  Epochs  Learning rate  Batch size  Accuracy (%)
512 0.1 Adam 20 0.01 50 93
512 0.2 Adam 20 0.01 100 91
1024 0.1 Adam 10 0.001 50 90
1024 0.1 Adam 10 0.001 100 89
1024 0.1 Adam 20 0.01 50 89
512 0.1 Adam 20 0.001 50 89
512 0.2 Adam 10 0.001 50 89
512 0.1 Adam 10 0.001 50 89
1024 0.2 Adam 20 0.001 100 89
1024 0.2 Adam 20 0.01 50 89
1024 0.2 Adam 20 0.01 100 89
1024 0.1 Adam 20 0.001 100 89
512 0.2 Adam 20 0.001 100 89
1024 0.2 Adam 20 0.001 50 89
512 0.2 Adam 20 0.001 50 89
512 0.1 Adam 10 0.01 100 88
1024 0.2 Adam 10 0.001 50 88
1024 0.1 Adam 20 0.01 100 88
512 0.2 Adam 10 0.01 50 88
1024 0.1 Adam 10 0.01 100 88
512 0.1 Adam 20 0.01 100 88
512 0.1 Adam 10 0.01 50 88
1024 0.1 Adam 20 0.001 50 88
512 0.1 Adam 10 0.001 100 88
512 0.1 Adam 20 0.001 100 88
1024 0.1 Adam 10 0.01 50 87
1024 0.2 Adam 10 0.01 50 87
1024 0.2 Adam 10 0.001 100 87
512 0.2 Adam 10 0.01 100 86
1024 0.2 Adam 10 0.01 100 86
512 0.2 Adam 20 0.01 50 86
512 0.2 Adam 10 0.001 100 86
1024 0.2 SGD 20 0.001 100 90
1024 0.2 SGD 20 0.01 100 89
512 0.1 SGD 20 0.001 100 89
1024 0.1 SGD 20 0.01 50 89
1024 0.1 SGD 20 0.01 100 89
512 0.2 SGD 20 0.01 100 89
512 0.1 SGD 20 0.01 100 89
1024 0.2 SGD 10 0.01 50 89
1024 0.1 SGD 10 0.01 100 88
1024 0.2 SGD 10 0.01 100 88
1024 0.1 SGD 20 0.001 50 88
512 0.1 SGD 10 0.01 50 88
1024 0.2 SGD 20 0.01 50 88
512 0.2 SGD 20 0.001 100 87
512 0.2 SGD 10 0.01 50 87
1024 0.2 SGD 20 0.001 50 87
1024 0.1 SGD 10 0.01 50 87
512 0.1 SGD 20 0.01 50 87
512 0.1 SGD 10 0.01 100 87
512 0.1 SGD 20 0.001 50 87
512 0.2 SGD 10 0.001 100 87
1024 0.1 SGD 20 0.001 100 87
1024 0.1 SGD 10 0.001 50 86
1024 0.2 SGD 10 0.001 50 86
512 0.2 SGD 10 0.01 100 86
512 0.2 SGD 10 0.001 50 85
512 0.1 SGD 10 0.001 100 85
512 0.2 SGD 20 0.01 50 85
1024 0.1 SGD 10 0.001 100 84
512 0.2 SGD 20 0.001 50 84
512 0.1 SGD 10 0.001 50 83
1024 0.2 SGD 10 0.001 100 77
Table 10 Result of parameter tuning for Inception-V3 with two dense layers
Optimizer  Learning rate  Batch size  Dropout rate  Epochs  Number of neurons  Accuracy (%)
Adam 0.001 100 0.2 20 512 92
Adam 0.001 50 0.1 20 512 90
Adam 0.001 100 0.1 20 512 89
Adam 0.001 100 0.1 10 512 89
Adam 0.001 50 0.1 20 1024 89
Adam 0.001 50 0.2 20 1024 89
Adam 0.01 100 0.2 20 512 89
Adam 0.01 50 0.1 20 512 89
Adam 0.001 100 0.1 10 1024 89
SGD 0.001 50 0.1 20 1024 89
SGD 0.001 50 0.2 20 512 89
SGD 0.01 100 0.1 20 512 89
SGD 0.01 100 0.2 20 512 89
SGD 0.01 50 0.1 20 512 89
SGD 0.01 50 0.1 20 1024 89
SGD 0.01 50 0.1 10 1024 89
SGD 0.01 100 0.2 20 1024 89
Adam 0.001 100 0.2 10 1024 88
Adam 0.001 100 0.2 10 512 88
Adam 0.001 50 0.2 20 512 88
Adam 0.001 50 0.1 10 1024 88
Adam 0.001 100 0.1 20 1024 88
Adam 0.01 100 0.2 20 1024 88
Adam 0.001 100 0.2 20 1024 88
Adam 0.01 50 0.2 20 512 88
Adam 0.001 50 0.2 10 512 88
Adam 0.001 50 0.2 10 1024 88
Adam 0.01 100 0.1 10 512 88
SGD 0.01 50 0.2 20 1024 88
SGD 0.01 50 0.1 10 512 88
SGD 0.01 50 0.2 10 1024 88
SGD 0.01 50 0.2 20 512 88
SGD 0.01 100 0.1 20 1024 88
SGD 0.01 50 0.2 10 512 88
Adam 0.01 100 0.1 20 512 87
Adam 0.01 100 0.1 10 1024 87
Adam 0.01 50 0.2 20 1024 87
Adam 0.001 50 0.1 10 512 87
Adam 0.01 100 0.2 10 1024 87
SGD 0.01 100 0.2 10 512 87
SGD 0.001 50 0.1 20 512 87
SGD 0.01 100 0.2 10 1024 87
SGD 0.01 100 0.1 10 512 87
SGD 0.01 100 0.1 10 1024 87
Adam 0.01 100 0.2 10 512 86
SGD 0.001 50 0.1 10 1024 86
SGD 0.001 100 0.1 20 512 86
SGD 0.001 50 0.2 10 512 86
Adam 0.01 50 0.1 10 512 85
Adam 0.01 50 0.2 10 1024 85
Adam 0.01 100 0.1 20 1024 85
Adam 0.01 50 0.1 20 1024 85
SGD 0.001 100 0.1 20 1024 85
SGD 0.001 100 0.2 10 512 85
SGD 0.001 50 0.2 10 1024 85
SGD 0.001 100 0.2 20 512 85
SGD 0.001 100 0.2 20 1024 84
SGD 0.001 100 0.2 10 1024 84
SGD 0.001 100 0.1 10 1024 83
SGD 0.001 50 0.2 20 1024 83
Adam 0.01 50 0.1 10 1024 82
SGD 0.001 100 0.1 10 512 82
Adam 0.01 50 0.2 10 512 80
SGD 0.001 50 0.1 10 512 78
References
25. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
26. Fan, H., Du, W., Dahou, A., Ewees, A. A., Yousri, D., Elaziz, M. A., Elsheikh, A. H., Abualigah,
L., & Al-qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
world application UK Brexit. Electronics, 10(11), 1332.
27. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
Enhanced MapReduce Performance
for the Distributed Parallel Computing:
Application of the Big Data
N. Milhem et al.
Abstract In recent years, the growth in the volume of data has
accelerated, which requires more storage space. Big data and cloud computing
have a huge number of users, and these users need to access
data securely and privately from any device at any time. Therefore, it is important to
provide a safe flow of data in the Internet of Things (IoT records file) and to reduce its
size in a way that does not affect its purpose. One of the most important tasks
in data mining is the search for repeated items and data inside storage locations.
The Apriori algorithm is the most common algorithm for finding sets of repeated
elements in data. This requires deleting the groups of data that are repeated more than
once and creating a number of new groups after deleting the repeated ones, which
frees storage space and increases performance speed. In
this paper, we implemented the MapReduce Apriori (MRA) algorithm on an Apache
Hadoop cluster; it includes two functions (Map and Reduce) to find the repeated
sets of k elements.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_8
Keywords Internet of Things (IoT) · Big Data · Hadoop · Map Reduce · Apriori
algorithms · Data mining
1 Introduction
Modern technology has become more complex, especially with the development of
Internet of Things devices. This leads to data that grows dramatically over time,
reaching a size and complexity so large that it is difficult to store, while tools
to manage or process it with high efficiency are lacking [1]. Internet of Things
devices connected to distributed and cloud infrastructure provide and transmit data
and other resources for uploading to the cloud. Therefore, it is important to ensure
that data and resources are ready to be accessed, that users are able to access them
securely in any IoT environment, and that they are distributed in an orderly manner
with reduced volume [2].
Distributed and parallel computing systems are the best way to process data on a
large scale, and existing algorithms have been transformed into ‘large algorithms’
to work with big data. MapReduce, a programming model for parallel and
distributed execution over big data, contributes to data analysis and is one
of the best approaches in this field [3]. The Apriori algorithm is the most popular and
widely used algorithm in data mining; it mines sets of repeated elements using
candidate generation.
Apriori is the core algorithm of Association Rule Mining (ARM), and its genesis
has fueled research in data mining. Apriori is one of the top 10 data mining algorithms
identified by the IEEE International Conference on Data Mining (ICDM) in 2006
as the most impactful in data mining [4]. It not only works to shrink large data,
but is also concerned with characteristics such as speed and the movement
of varied data in many forms, that is, with data mainly characterized by large size,
high speed and high variety. Traditional data mining techniques and tools are effective
in analyzing and extracting data, but they are not scalable and efficient in managing big data. Big
data architectures and technologies have been adopted to analyze this data.
This study proposes an application of distributed parallel
computing to big data and examines how big data collected from
the Internet of Things can be taken as input data and simplified after processing
using Hadoop. It also examines how effectively the repetition in big data can be reduced,
and its quality ensured, in operations that include data collection and processing,
through algorithms that analyze the data and the results.
2 Background
The term “Big Data” covers the large volume, different forms, speed of processing,
technology, methods and impact of digital data acquired from companies and
individuals [5–12]. Big Data is the information asset characterized by such high
Volume, Velocity and Variety as to require specific technology and analytical methods
for its transformation into Value [13, 14].
Volume: This feature represents the large amount of data that is generated or obtained
from various sources such as social media, banks, and the government and private
sectors. It keeps increasing, to more than 44 trillion GB by the year 2021.
Value: This feature concerns obtaining value from data collected from different sources
by analyzing it and verifying its values. The analysis yields
values of interest to companies and businesses for growth and progress,
on which future decisions and ideas can be based.
Veracity: This feature concerns the contradictions and doubts that exist in the data during
processing. Some data packets have to be removed.
Velocity: This property measures the rate at which data is accumulated, that is, the data
generation rate as the number of users accessing via IoT increases.
Variety: This feature deals with the different formats of data, including data coming from
IoT (images, video, JSON files, and social media). There are three formats of data:
structured, unstructured, and semi-structured. Fig. 1 explains the formats of data.
2.2 Hadoop
Hadoop is an open source, Java-based programming system that processes
big data sets in a distributed computing environment. The Hadoop Ecosystem
is a platform that provides various services for solving big data
problems. It contains Apache projects and a set of tools and special solutions. It
includes the four major components of Hadoop, namely HDFS, MapReduce, YARN,
and Hadoop Common. The other tools are used to support these key
components. Together, these tools provide services such as data ingestion, analysis,
storage, maintenance, etc.
2.2.1 HDFS
The Hadoop Distributed File System (HDFS) is designed to hold very large amounts of
data (terabytes or even petabytes), such as the data obtained from the Internet of
Things, and to provide access to that information. It stores redundant copies of
files across storage devices to tolerate failures and provide high availability.
MapReduce is a programming model that was created to process big data by dividing
the work into a group of independent tasks that run in parallel, using two
functions [1]. The first is Map and the second is Reduce. Each does its own job:
Map emits its results as key-value pairs, and Reduce takes the output of Map and
processes it to produce a collection of values (Fig. 2).
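
The division of labor between Map and Reduce can be illustrated with a minimal, single-process sketch. It is not Hadoop code: the function names (map_words, shuffle, reduce_counts, run_job) and the word-count task are assumptions chosen for illustration, and the grouping step that a real framework performs between the two phases is simulated with a dictionary.

```python
from collections import defaultdict

def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key, as a framework does between the two steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: collapse the list of values for one key into a single total."""
    return key, sum(values)

def run_job(lines):
    # Each line is an independent task, so a real cluster could map them in parallel.
    mapped = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())

counts = run_job(["big data on Hadoop", "Hadoop stores big data"])
```

Because every call to map_words depends only on its own line, the map work can be split across machines without coordination; only the grouping step needs to bring pairs with the same key together.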
The Apriori algorithm works on sets of frequently recurring items in order to
establish the associations between them. It is designed for big data whose items are
related to one another: the frequent item sets of size K are used to build the
candidate sets of size K + 1, which helps determine how strong or weak the
relationship between objects is. The algorithm is widely used to compute frequent
item sets efficiently. The goal of this iterative process is to find the frequent
item sets in a huge data set. Some other optimization methods can be used to
optimize the problems as given in [15–20] (Fig. 3).
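
The level-wise idea described above can be sketched in a few lines of sequential Python. This is a textbook-style illustration under stated assumptions, not the implementation used in any of the surveyed studies: the function name apriori, the min_support parameter (an absolute transaction count), and the small basket data are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        k += 1
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = apriori(baskets, min_support=2)
```

Each pass over the data depends on the frequent sets found in the previous pass, which is the kind of dependent iteration that makes a direct port of the sequential algorithm to a parallel setting difficult.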
3 Related Work
on analyzing big data. The study focused on finding a solution to the problem of
scaling a common association mining algorithm to large data. The results of this
study confirm that an effective MapReduce implementation should avoid dependent
iterations, such as those of the original sequential Apriori algorithm.
In “Utility Frequent Patterns Mining on Large Scale Data based on Apriori MapReduce
Algorithm”, the main objective was to enhance pattern mining algorithms to work on
big data by proposing a set of algorithms based on the MapReduce architecture and
the Hadoop environment. The algorithm merged Apriori with MapReduce, and the results
indicated good performance [21].
For an effective implementation of the Apriori algorithm based on MapReduce on
Hadoop, a set of problems was posed, such as load balancing, the mechanism for
dividing and distributing data, monitoring, and passing parameters between nodes
[22]. Parallel and distributed computing has become a widespread and diversified
field, and what distinguishes Hadoop within it is its scalability, simplicity, and
high reliability in solving most challenges and problems easily and effectively.
To characterize the distributed parallel computing environment for big data in
MapReduce-based Apriori algorithms, the researchers present the surveyed literature
(Table 1) as a case study that highlights the challenges envisaged in effectively
implementing the MapReduce Apriori (MRA) algorithm on an Apache Hadoop cluster to
stream and process big data (BD).
The architecture of a Hadoop cluster, shown in Fig. 4, consists of a master and
slaves: the master is the Name Node and the slaves are Data Nodes. The Name Node on
the master manages HDFS, and each slave runs a Data Node daemon. The job submission
node on the master runs the Job Tracker, which is the only point of contact for a
client wishing to execute a MapReduce job, and each slave runs a Task Tracker. The
Job Tracker monitors the progress of running MapReduce tasks and is responsible for
coordinating the execution of the map and reduce phases [14].
These services run on two different machines, although in a small cluster they are
often collocated. The bulk of a Hadoop cluster consists of slave nodes that run both a
Task Tracker, which is responsible for running the user code, and a Data Node daemon,
which serves HDFS data [13].
Table 1 Comparison of existing approaches used to handle the frequent elements to apply efficient, validation, scalability and reliability
Author | Year | Objective | Pros | Cons
[23] | 2010 | Apache Mapreduce framework used to calculate and achieve parallelism and find frequently element | A set of 9 machines (1 master and 8 slave), used data from IBM Corporation and the number of nodes was compared with the speed up through hadoop cluster | Ability of mapreduce with Apriori to give more advantage. It can applied is easily to many machines to deal with big data without synchronization problem
[24] | 2011 | Data mining is used new strategy of rules and focused in cloud computing environment and propose a method of big data set distribution | Used data set from Google, and the input data was divided into two groups: a first group consisting of a 16-MB and a second group consisting of a 64-MB and Experimental Between N of Node and Executions Times | The algorithm works in the cloud computing environment effectively and can extract the redundant set of data from the group data, through the mechanism of data segmentation and distributing data. The efficiency of the algorithm has been improved
[25] | 2012 | Propose new framework for work on big data on certain problems types of distributable using a huge N. of nodes to find scale well and efficiently | The data set experiments for an AllElectronics branch and framework used three stages: scaleup, sizeup, speedup | The experimental and result between three stage is actually more efficient can works a huge data
[26] | 2013 | A new model for mining dataset of frequent elements to Apply validation, scalability and reliability | Use 256 MB datasets and single machine to experiments Apriori and FP-Growth algorithm through running time with Data Size | Model has proven that the results in this method are feasible, valid and capable of improving the overall performance of the data mining operation on a large scale
[27] | 2014 | The algorithm works on big data processing and efficient data mining when it changes at the same time between threshold value and the original database at the level | Created program in Java and application on Intel computer processor 3.10 GHz i3-200 dual-core with 4 GB main memory and used Apriori and FP-Growth coupler algorithms analyzed by comparison (Dataset Size, Dataset Transactions) | The results proved that this algorithm is led to a higher acceleration and effective in reducing the frequent time of work
[28] | 2015 | Discusses the use and implementation mechanism through mining big data for e-commerce companies and improving sales processes | A hadoop cluster is setup with 4 system nodes (3-slave and 1-master name node) on Ubuntu 14.04 | The inventory of the product of e-commerce companies can be updated based on the set of recurring items at regular intervals
[29] | 2016 | It focuses on taking the timestamp at each stage and considers it as a symbol in its transaction, and this is considered appropriate for the process of indexing data with its timestamp | DESIGN is enhance of apriori implement on MR and HBASE on hadoop cluster, and compare between apriori original and MH apriori Linux with Hadoop 0.20.0. consist of 5 nodes, (1 master-4 slave) dataset size is 1.8 GB from IBM | Map-Hbase-Apriori can only once scan to finish the database matching of the frequent element
(continued)
198 N. Milhem et al.
Table 1 (continued)
Author | Year | Objective | Pros | Cons
[30] | 2017 | Enhance Apriori based on Hadoop cluster on big data applied of axle faults of EMUs | 350,000 records from 2007 to 2014 were obtained after data preprocessing; and applied of Apriori based Hadoop cluster | The results showed that the algorithm achieved high accuracy in the error prediction process and speed in the operation process
[31] | 2018 | Developed for MR approach base with Apriori algorithm for recursive data mining and works on any type of database | A new algorithm, Apriori Core MapReduce, is proposed to work on big data that takes less time and memory than the original algorithm | This algorithm works on any type of database
[32] | 2019 | Improving performance of iterative element set parallel mining using Hadoop with FP-Growth and Apriori comparison | The algorithm is implemented on data set and market basket analyses | If the proposed is not work with mapreduce, the time for exploration forces will decrease
[33] | 2020 | Create an algorithm based on mining effectively on the real data set that works in parallel and To split the original data set through three stage consist first stage Calculating the partition number N and second stage Determining partition boundaries and third stage includes Partitioning the data set | Hadoop v1.2.1 used data size 400 GB by AIS Global for two months (4–5) was used for the year 2012 and experimental data | Number of frequent cases decreases rapidly, but They considered the size of the data to be small compared to the experiment, and we need more data for comparison
The MapReduce computing model (Fig. 5) includes two functions, Map() and Reduce().
Both functions are defined over (key, value) pairs. The Map function is applied to
each (key1, value1) pair in the input dataset, and each call produces a list of
(key2, value2) pairs. All (key, value) pairs in the output lists that have the same
key are passed to the Reduce() function, which generates one value (value3) or an
empty return.
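
To connect this model with frequent item set mining, the sketch below expresses one support-counting pass in terms of the (key1, value1), (key2, value2), and value3 roles named above. It is a single-process illustration under assumptions, not an implementation from the surveyed studies: the function names, the min_support parameter, and the sample transactions are hypothetical, and for brevity the map step enumerates every k-item subset of a transaction instead of checking a pruned candidate list.

```python
from collections import defaultdict
from itertools import combinations

def map_transaction(key1, value1, k):
    """Map: (transaction id, items) -> list of (k-itemset, 1) pairs."""
    items = sorted(set(value1))
    return [(itemset, 1) for itemset in combinations(items, k)]

def reduce_support(key2, values, min_support):
    """Reduce: (itemset, [1, 1, ...]) -> support count, or None when it is too low."""
    total = sum(values)
    return total if total >= min_support else None

def count_pass(transactions, k, min_support):
    grouped = defaultdict(list)
    for tid, items in transactions.items():
        for key2, value2 in map_transaction(tid, items, k):
            grouped[key2].append(value2)  # pairs with the same key meet here
    supports = {}
    for key2, values in grouped.items():
        value3 = reduce_support(key2, values, min_support)
        if value3 is not None:  # an empty return drops the itemset
            supports[key2] = value3
    return supports

transactions = {
    1: ["bread", "milk"],
    2: ["bread", "butter", "milk"],
    3: ["butter", "milk"],
    4: ["bread", "butter"],
}
pairs = count_pass(transactions, k=2, min_support=2)
triples = count_pass(transactions, k=3, min_support=2)
```

The transaction id plays the role of key1 and is ignored by the map step, which is common when the input key only identifies a record.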
6 Conclusion
In this paper, we have proposed a new framework for efficiently mining frequent
patterns in big data, and for applying algorithms that effectively and validly
reduce the repetition of big data and ensure its quality in operations, through
MapReduce-based Apriori algorithms on a Hadoop cluster. All the practical studies
related to this field were compared with each other, and the results show that the
approach is widely effective in the field of data mining. After comparing all
studies and verifying the effectiveness of the algorithms in giving reliable results
in this field, we will apply them to neural-network-based deep learning, especially
since it has been run on MapReduce in different studies.
References
1. Altaf, M. A. B., Barapatre, H. K., & Sangvi, A. Mining condensed representations of frequent
patterns on big data using max Apriori map reducing technique.
2. Apache Hadoop. http://hadoop.apache.org/
3. Kijsanayothin, P., Chalumporn, G., & Hewett, R. (2019). On using MapReduce to scale
algorithms for big data analytics: A case study. Journal of Big Data, 6, 105.
https://doi.org/10.1186/s40537-019-0269-1
4. Singh, S., Garg, R., & Mishra, P. K. (2018). Performance optimization of MapReduce-based
Apriori algorithm on Hadoop cluster. Computers and Electrical Engineering, 67, 348–364.
ISSN 0045-7906.
5. Gharaibeh, M., Alzu’bi, D., Abdullah, M., Hmeidi, I., Al Nasar, M. R., Abualigah, L., &
Gandomi, A. H. (2022). Radiology imaging scans for early diagnosis of kidney tumors: a
review of data analytics-based machine learning and deep learning approaches. Big Data and
Cognitive Computing, 6(1), 29.
6. Gandomi, A. H., Chen, F., & Abualigah, L. (2022). Machine learning technologies for big data
analytics. Electronics, 11(3), 421.
7. Bashabsheh, M. Q., Abualigah, L., & Alshinwan, M. (2022). Big data analysis using
hybrid meta-heuristic optimization algorithm and MapReduce framework. In Integrating
meta-heuristics and machine learning for real-world optimization problems (pp. 181–223).
Springer.
8. Gharaibeh, M., Almahmoud, M., Ali, M. Z., Al-Badarneh, A., El-Heis, M., Abualigah, L.,
Altalhi, M., Alaiad, A., & Gandomi, A. H. (2021). Early diagnosis of alzheimer’s disease using
cerebral catheter angiogram neuroimaging: A novel model based on deep learning approaches.
Big Data and Cognitive Computing, 6(1), 2.
9. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for Big Data
applications in IoT cloud computing environments. Cluster Computing, 24(4), 2957–2976.
10. Abualigah, L., Gandomi, A. H., Elaziz, M. A., Hamad, H. A., Omari, M., Alshinwan, M., &
Khasawneh, A. M. (2021). Advances in meta-heuristic optimization algorithms in big data text
clustering. Electronics, 10(2), 101.
11. Abualigah, L., & Masri, B. A. (2021). Advances in MapReduce big data processing: platform,
tools, and algorithms. In Artificial intelligence and IoT (pp. 105–128).
12. Al-Sai, Z. A., & Abualigah, L. M. (2017, May). Big data and e-government: A review. In 2017
8th International conference on information technology (ICIT) (pp. 580–587). IEEE.
13. Kumar, A., Kiran, M., Mukherjee, S., & Ravi Prakash, G. (2013). Verification and validation of
MapReduce program model for parallel K-means algorithm on Hadoop cluster. International
Journal of Computer Applications, 72(8). ISSN 0975-8887.
14. Qayyum, R. (2020). A roadmap towards big data opportunities, emerging issues and Hadoop
as a solution. International Journal of Education and Management Engineering, 10, 8–17.
https://doi.org/10.5815/ijeme.2020.04.02
15. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
16. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
17. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
18. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
19. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
20. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
21. Nandini, G. V. S., & Rao, N. K. K. (2019). Utility frequent patterns mining on large scale
data based on Apriori MapReduce algorithm. International Journal of Research in Informative
Science Application and Techniques (IJRISAT), 3(8), 19381–19387.
22. Yahya, A. A., & Osman, A. (2019). Using data mining techniques to guide academic programs
design and assessment. Procedia Computer Science, 163, 472–481. ISSN 1877-0509.
23. Yang, X. Y., Liu, Z., & Fu, Y. (2010). MapReduce as a programming model for association
rules algorithm on Hadoop. In The 3rd international conference on information sciences and
interaction sciences (pp. 99–102). https://doi.org/10.1109/ICICIS.2010.5534718
24. Li, L., & Zhang, M. (2011). The strategy of mining association rule based on cloud computing.
In 2011 International conference on business computing and global informatization
(pp. 475–478). https://doi.org/10.1109/BCGIn.2011.125
25. Li, N., Zeng, L., He, Q., & Shi, Z. (2012). Parallel implementation of Apriori algorithm based on
MapReduce. In 2012 13th ACIS international conference on software engineering, artificial
intelligence, networking and parallel/distributed computing (pp. 236–241).
https://doi.org/10.1109/SNPD.2012.31
26. Rong, Z., Xia, D., & Zhang, Z. (2013). Complex statistical analysis of big data: Implementation
and application of Apriori and FP-Growth algorithm based on MapReduce. In 2013 IEEE 4th
international conference on software engineering and service science (pp. 968–972).
https://doi.org/10.1109/ICSESS.2013.6615467
27. Wei, X., Ma, Y., Zhang, F., Liu, M., & Shen, W. (2014). Incremental FP-Growth mining
strategy for dynamic threshold value and database based on MapReduce. In Proceedings of the
2014 IEEE 18th international conference on computer supported cooperative work in design
(CSCWD) (pp. 271–276). https://doi.org/10.1109/CSCWD.2014.6846854
28. Chaudhary, H., Yadav, D. K., Bhatnagar, R., & Chandrasekhar, U. (2015). MapReduce
based frequent itemset mining algorithm on stream data. In 2015 Global conference on
communication technologies (GCCT) (pp. 598–603).
https://doi.org/10.1109/GCCT.2015.7342732
29. Feng, D., Zhu, L., & Zhang, L. (2016). Research on improved Apriori algorithm based on
MapReduce and HBase. In 2016 IEEE advanced information management, communicates,
electronic and automation control conference (IMCEC) (pp. 887–891).
https://doi.org/10.1109/IMCEC.2016.7867338
30. Li, L., Shi, T., & Zhang, W. (2017). Axle fault prognostics of electric multiple units based on
improved Apriori algorithm. In 2017 29th Chinese control and decision conference (CCDC)
(pp. 4229–4233). https://doi.org/10.1109/CCDC.2017.7979241
31. Pandey, K. K., & Shukla, D. (2018). Mining on relationships in big data era using improve apriori
algorithm with MapReduce approach. In 2018 International conference on advanced
computation and telecommunication (ICACAT) (pp. 1–5).
https://doi.org/10.1109/ICACAT.2018.8933674
32. Deshmukh, R. A., Bharathi, H. N., & Tripathy, A. K. (2019). Parallel processing of frequent
itemset based on MapReduce programming model. In 2019 5th International conference on
computing, communication, control and automation (ICCUBEA) (pp. 1–6).
https://doi.org/10.1109/ICCUBEA47591.2019.9128369
33. Lei, B. (2020). Apriori-based spatial pattern mining algorithm for big data. In 2020
International conference on urban engineering and management science (ICUEMS)
(pp. 310–313). https://doi.org/10.1109/ICUEMS50872.2020.00074
A Novel Big Data Classification
Technique for Healthcare Application
Using Support Vector Machine, Random
Forest and J48
Hitham Al-Manaseer, Laith Abualigah, Anas Ratib Alsoud, Raed Abu Zitar,
Absalom E. Ezugwu, and Heming Jia
Abstract This study examined the possibility of applying the capabilities of
artificial intelligence (AI) and machine learning (ML) to increase the effectiveness
of the Internet of Things (IoT) and big data in developing a system that supports
decision makers in the medical fields. This was done by studying the performance of
three well-known classification algorithms, Random Forest Classifier (RFC), Support
Vector Machine (SVM), and Decision Tree-J48 (J48), in predicting the probability of
a heart attack. The accuracy of the algorithms was evaluated using the Healthcare
(heart attack possibility) dataset, freely available on Kaggle. The data was divided
into three categories consisting of 303, 909, and 1818 instances, which were
analyzed on the WEKA platform. The results showed that the RFC was the best
performer.
Keywords Big data · Internet of Things · Random forest classifier · J48 · Support
vector machine · Weka · E-Health
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 205
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_9
1 Introduction
In the current era, communication through the Internet has become widespread between
many things, such as computers, large web servers, and smart devices. This form of
contact is known as the Internet of Things (IoT) [1]. The IoT is characterized by its
massive structure and complexity, represents the second generation of the Internet,
and may have trillions of interconnected points. The use of the IoT will bring high
economic benefit to various sectors, because it enhances the potential for
production and innovation [2–5]. It has brought about tremendous and unprecedented
changes that helped reduce costs, improve efficiencies, and increase revenues, which
led to the generation of a huge volume of data. Figure 1 describes the concept.
The current technological revolution has resulted in the generation of large
amounts of data [6–9], and the massive development of the IoT has added to it. This
data is called “big data”: a wide range of data that needs new structures and
technologies to capture, manage, and process it so that value can be extracted to
enhance insight and decision making [10]. Big data has many characteristics, such as
large size, high speed, high diversity, and high accuracy [11, 12]. Advances in
healthcare dataset management systems have generated large amounts of medical data,
and the machine learning applied to such data in this study is classified as
supervised learning. Analysis and classification methods from big data science and
data mining can be used to enhance the effectiveness of the IoT and to meet the
challenges it faces, such as the storage, transportation, and processing of large
volumes of data.
One of the problems facing big data science is classification. If the dataset
contains many dimensions, processing slows down. However, care must be taken in
choosing a method for extracting the desired features from the dataset’s feature
set, as this leads to the loss of part of the dataset’s data [13, 14]. The main
benefit of selecting specific features and ignoring unnecessary ones is to reduce
data volumes and improve classification and prediction accuracy [15].
Classification is one of the most widely applied methods in data mining: it uses a
set of previously classified examples to build a model that can be used in
several applications, such as IoT E-Health systems.
Data mining is defined as the mechanism of extracting data from the data set,
discovering useful information from it, and then analyzing the data collected in order
to enhance the decision-making mechanism. Data mining uses different algorithms
and seeks to reveal specific features of data [16].
This study aims to apply data mining techniques in the E-health systems of the
IoT, in particular to the Health care (heart attack possibility) dataset, and to
assess the real feasibility of these techniques in IoT E-health. There are various
ways to use the principles of data mining to create smart E-Health systems for the
IoT. As a case study, a technologically scalable study of a healthcare dataset was
developed using free, open source software, namely WEKA (Waikato Environment for
Knowledge Analysis). The study also aims to compare the accuracy of the Random
Forest Classifier (RFC), Support Vector Machine (SVM), and Decision Tree-J48 (J48)
algorithms in classifying and analyzing medical data.
The main benefits of using healthcare data mining are:
• Predicting the patient’s likelihood of having a heart attack.
• Helping decision makers (i.e., health care workers) make decisions related to
disease cases.
• Reducing the rate of medical errors, as the data mining techniques in this
study predict the possibility of a heart attack in advance.
The rest of this paper is organized as follows. Section 2 reviews the literature.
Section 3 describes the methodology. Section 4 covers process development. Section 5
presents the experiment and results. Finally, Sect. 6 gives conclusions and future
work.
2 Literature Review
technique are computationally slow due to the large dataset. Some other optimization
methods can be used to optimize the problems as given in [18–23].
Cervantes et al. [24] conducted a comprehensive survey of the taxonomy of SVMs,
including applications, challenges, and trends. The survey gives a brief
introduction to SVMs, describes their many applications, and summarizes their
challenges and trends. It examines and defines the limitations of SVMs and discusses
the future of SVMs in conjunction with more applications. It also describes in
detail the major flaws of SVMs and the various algorithms implemented to address
them, based on the work of researchers who encountered these flaws.
Jain et al. [25] linked Apache Hadoop to Weka. The big data was stored on the Hadoop
distributed file system (HDFS) and processed with Weka using Weka’s Knowledge
Flow. Knowledge Flow provides a good way to build topologies using HDFS components
that can supply data to the machine learning algorithms available in Weka [25]. For
big data mining, supervised machine learning methods were used, including Naïve
Bayes, SVM, and J48. The accuracy of these methods was compared on raw data and
normalized data with the same structure. A new approach to big data mining was
proposed that gives better results than the reference approach. The accuracy of
classifying raw data sets increased. Normalization was also applied to the raw
dataset, and the accuracy was found to improve after supervised estimation of the
dataset.
Siou-Wei et al. [26] used the SVM to classify and process data into three
categories: healthy, unhealthy, and very unhealthy. The physiological parameters of
the test subject and the classification results were uploaded to cloud storage and
rendered on a web page in order to provide a basis for big data analysis in future
research. All biomedical units equipped with wireless sensor network chips can
collect and process the measured data and then transmit it to the cloud server via
the wireless network for storage and analysis.
Li et al. [27] presented a comprehensive survey of the use of big data science and
data mining methods in the IoT, which aims to identify the topics that current or
future research should focus on more. The survey followed conference articles and
published journals on IoT big data and IoT data mining between 2010 and 2017.
Articles were screened using the literature review set and methodological maps of
44 articles. These articles fall into three classes: architecture, platform, framework,
and application.
3 Methodology
This section presents the methodology used to analyze big data in IoT E-Health
systems, using some of the modeling procedures. The analysis uses the Health care
(heart attack possibility) dataset for training and testing purposes.
IoT data is used to assess the performance of systems, infrastructure, and IoT
objects. IoT objects contain data produced by interactions between people and
people, people and systems, and systems and systems. This data can be used to
improve the services provided by the IoT. Using big data science, all health
centers, regardless of where testing is conducted, have access to each patient’s
information, and tests are stored at the time they are made, allowing appropriate
decisions to be made from the moment the patient is tested.
Extracting specific data from big data, as well as extracting any data from smart
data, are thorny problems that can be solved through data mining techniques.
Therefore, different models can be used to extract data. Figure 2 illustrates a
model of big data in the IoT [17]. The dataset on IoT objects and infrastructure
includes detailed information about healthcare data, such as patient age and sex.
Health data were classified using the RFC, SVM, and J48.
have become one of the most widely used classification methods due to good
theoretical foundations and generalization [24].
This section describes the approach chosen to develop data mining techniques that
focus on analyzing data and discovering the principles by which health information
can be provided to the patient and heart disease can be predicted.
C. Case Study
Archived historical data was used. The data set consists of 76 attributes, but all
published experiments use a subset of 14 attributes. In particular, machine
learning (ML) researchers have so far used only the Cleveland dataset. The
“target” field expresses the extent to which the patient has heart disease. The
integer value 0 indicates no or less chance of having a heart attack, and the
integer value 1 indicates a chance of having a heart attack. This data set is
freely available on the Kaggle website. Table 1 shows the full list of
attributes [31].
Depending on the composition of the data set, a mechanism for preparing the data
and extracting knowledge from it was hypothesized. After validation through the
case study, the approach is applicable and feasible in many analyses of patients’
E-health data [17]. The objective of the approach is to build an analytical model
that produces a set of decisions for use as a decision support system for E-Health.
Figure 3 shows a flowchart illustrating the proposed approach.
The WEKA data mining software was used to implement the proposed system. WEKA is
free, open source software: a set of ML methods for solving real-world data mining
problems, developed in Java and able to run on almost any platform. It is an
analytical tool that applies data mining approaches to any dataset. Although there
are several supported, professional data mining software packages, WEKA provides
many advantages: it is open source, downloadable, fast, easy to use and access, easy
to implement, and free of charge [32, 33].
In this study, data stored in comma-separated values (CSV) files were used. The
target attribute was chosen as the class attribute of the experiment. The resulting
set of rules is then used by decision makers in health centers as a decision support
system that provides them with information for predicting the possibility of a heart
attack.
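
As a rough illustration of treating the target attribute as the class, the sketch below reads CSV rows and separates the label from the remaining attributes. It uses only the Python standard library and is not the WEKA workflow used in the study. The three sample rows are hypothetical; only the column names age, sex, and target are taken from the text, and the helper name load_instances is an assumption.

```python
import csv
import io

# Hypothetical sample rows; only the column names come from the text.
SAMPLE_CSV = """age,sex,target
63,1,1
45,0,0
58,1,1
"""

def load_instances(handle, class_attribute="target"):
    """Split each CSV row into (features, label), using class_attribute as the class."""
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(int(row.pop(class_attribute)))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels

X, y = load_instances(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be called on an open file handle in place of the in-memory string.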
As a result of increasing computing power and the massive amount of data currently
available, machine learning algorithms are becoming increasingly complex and more
powerful [34]. In this study, three types of classification algorithms are tested: SVM,
RFC, and J48.
Determining the optimal size of the dataset is essential, as too many or too few
cases can lead to imprecise models [32]. For this reason, the Health care (Heart
attack possibility) dataset was divided into three categories, the first consisting
of 303 instances, the second of 909 instances, and the third of 1818 instances. The
SVM, RFC, and J48 algorithms were run and evaluated with tenfold cross-validation.
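
For readers unfamiliar with tenfold cross-validation, the following sketch shows how the instance indices can be partitioned so that every instance is used for testing exactly once. It is a plain-Python illustration rather than WEKA's own procedure (which, for example, can also stratify folds by class); the function name k_fold_indices and the seed parameter are assumptions.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, test) index lists; every instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 303 is the size of the smallest of the three categories.
splits = list(k_fold_indices(303, k=10))
```

A classifier would be trained on each train list and scored on the matching test list, and the ten scores would then be averaged.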
Cross-validation can suffer from an overfitting problem when the data being tested
is the same as the data used in training, which means the model often learns and
retains patterns within this dataset [34]. Another evaluation mechanism was
therefore used: an isolated test set consisting of 25% of the total dataset was
created for each of the previous three categories and used to evaluate these
algorithms.
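
The isolated test set can be sketched as a single shuffled split, with accuracy measured as the percentage of correctly classified instances. This is an illustrative sketch under assumptions and not the WEKA procedure used in the study: the function names, the fixed seed, and the toy prediction lists are hypothetical.

```python
import random

def holdout_split(instances, test_fraction=0.25, seed=0):
    """Shuffle once, then set aside test_fraction of the instances as a test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Percentage of correctly classified instances."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

train, test = holdout_split(range(303))
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

The test instances are never shown to the classifier during training, so the score reflects performance on unseen cases, provided the same record does not appear in both parts.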
Figure 3 shows the percentage of correctly classified instances when the
algorithms are applied to the previous three categories. The graph shows that the
algorithms converged in classification accuracy when the dataset size exceeded
909 cases, while SVM failed to classify at 303 cases. Table 2 shows a summary of the
results.
Figure 4 shows the percentage of correctly classified instances when the
algorithms are applied to the three previous categories. The graph shows that the
RFC outperformed the other algorithms, and that the three converged in
classification accuracy when the size of the dataset exceeded 1818 instances. Again,
the SVM failed to classify at 303 cases. Table 3 shows a summary of the results.
6 Conclusion
The IoT works hand in hand with big data when huge volumes of information must
be processed and analyzed. In this study, E-health data were analyzed using
classification algorithms; in particular, the Health care (Heart attack possibility)
dataset was used. The optimal feature of the medical database was identified, which
helps in building an effective model for predicting heart disease. The results showed
the superiority of RFC over the other algorithms.
References
1. Firouzi, F., Farahani, B., Weinberger, M., DePace, G., & Aliee, F. S. (2020). IoT fundamentals:
Definitions, architectures, challenges, and promises. In Intelligent Internet of Things (pp. 3–50).
Springer.
2. Gharaibeh, M., Alzu’bi, D., Abdullah, M., Hmeidi, I., Al Nasar, M. R., Abualigah, L., &
Gandomi, A. H. (2022). Radiology imaging scans for early diagnosis of kidney tumors: a
review of data analytics-based machine learning and deep learning approaches. Big Data and
Cognitive Computing, 6(1), 29.
3. Gandomi, A. H., Chen, F., & Abualigah, L. (2022). Machine learning technologies for big data
analytics. Electronics, 11(3), 421.
4. Bashabsheh, M. Q., Abualigah, L., & Alshinwan, M. (2022). Big data analysis using
hybrid meta-heuristic optimization algorithm and MapReduce framework. In Integrating
meta-heuristics and machine learning for real-world optimization problems (pp. 181–223).
Springer.
5. Gharaibeh, M., Almahmoud, M., Ali, M. Z., Al-Badarneh, A., El-Heis, M., Abualigah, L.,
Altalhi, M., Alaiad, A., & Gandomi, A. H. (2021). Early diagnosis of alzheimer’s disease using
cerebral catheter angiogram neuroimaging: A novel model based on deep learning approaches.
Big Data and Cognitive Computing, 6(1), 2.
6. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data
applications in IoT cloud computing environments. Cluster Computing, 24(4), 2957–2976.
7. Abualigah, L., Gandomi, A. H., Elaziz, M. A., Hamad, H. A., Omari, M., Alshinwan, M., &
Khasawneh, A. M. (2021). Advances in meta-heuristic optimization algorithms in big data text
clustering. Electronics, 10(2), 101.
8. Abualigah, L., & Masri, B. A. (2021). Advances in MapReduce big data processing: platform,
tools, and algorithms. In Artificial intelligence and IoT (pp. 105–128).
9. Al-Sai, Z. A., & Abualigah, L. M. (2017, May). Big data and e-government: A review. In 2017
8th international conference on information technology (ICIT) (pp. 580–587). IEEE.
10. Katal, A., Wazid, M., & Goudar, R. H. (2013). Big data: Issues, challenges, tools and good
practices. In 2013 Sixth international conference on contemporary computing (IC3) (pp. 404–
409). IEEE.
11. Chebbi, I., Boulila, W., & Farah, I. R. (2015). Big data: Concepts, challenges and applications.
In Computational collective intelligence (pp. 638–647). Springer.
12. Alam, F., Mehmood, R., Katib, I., Albogami, N. N., & Albeshri, A. (2017). Data fusion and
IoT for smart ubiquitous environments: A survey. IEEE Access, 5, 9533–9554.
13. Revathi, L., & Appandiraj, A. (2015). Hadoop based parallel framework for feature subset
selection in big data. International Journal of Innovative Research in Science, Engineering
and Technology, 4(5), 3530–3534.
14. Shankar, K. (2017). Prediction of most risk factors in hepatitis disease using apriori algorithm.
Research Journal of Pharmaceutical Biological and Chemical Sciences, 8(5), 477–484.
15. Manogaran, G., Lopez, D., & Chilamkurti, N. (2018). In-mapper combiner based MapReduce
algorithm for processing of big climate data. Future Generation Computer Systems, 86, 433–
445.
16. Injadat, M., Moubayed, A., Nassif, A. B., & Shami, A. (2020). Multi-split optimized bagging
ensemble model selection for multi-class educational data mining. Applied Intelligence, 50(12),
4506–4528.
17. Lakshmanaprabu, S. K., et al. (2019). Random forest for big data classification in the internet
of things using optimal features. International Journal of Machine Learning and Cybernetics,
10(10), 2609–2618.
18. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
19. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
20. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
21. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
22. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
23. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
24. Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A compre-
hensive survey on support vector machine classification: Applications, challenges and trends.
Neurocomputing, 408, 189–215.
25. Jain, A., Sharma, V., & Sharma, V. (2017). Big data mining using supervised machine learning
approaches for Hadoop with Weka distribution. International Journal of Computational
Intelligence Research, 13(8), 2095–2111.
26. Su, M. Y., Wei, H. S., Chen, X. Y., Lin, P. W., & Qiu, D. Y. (2018). Using ad-related network
behavior to distinguish ad libraries. Applied Sciences, 8(10), 1852.
27. Li, W., Chai, Y., Khan, F., Jan, S. R. U., Verma, S., Menon, V. G., & Li, X. (2021). A compre-
hensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare
system. Mobile Networks and Applications, 26(1), 234–252.
28. Chin, J., Callaghan, V., & Lam, I. (2017). Understanding and personalising smart city services
using machine learning, the internet-of-things and big data. In 2017 IEEE 26th International
Symposium on Industrial Electronics (ISIE) (pp. 2050–2055). IEEE.
29. Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business
Media.
30. Liang, X., Zhu, L., & Huang, D. (2017). Multi-task ranking SVM for image cosegmentation.
Neurocomputing, 247, 126–136.
31. Naresh, B. (2021). Health care: Heart attack possibility [Online]. Kaggle, July 4, 2021.
https://www.kaggle.com/nareshbhat/health-care-data-set-on-heart-attack-possibility
32. Oliff, H., & Liu, Y. (2017). Towards industry 4.0 utilizing data-mining techniques: A case study
on quality improvement. Procedia CIRP, 63, 167–172.
33. WEKA. (2021). The workbench for machine learning [Online]. WEKA.
https://www.cs.waikato.ac.nz/ml/weka/index.html. Last accessed June 4, 2021.
34. Géron, A. (2019). Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, tools, and techniques to build intelligent systems. O’Reilly Media.
Comparative Study on Arabic Text
Classification: Challenges
and Opportunities
Abstract Web technology has improved greatly over the past years, loading the
Internet with digital content from many different fields. This has made it difficult
for researchers to find text classification algorithms that fit a specific language
or set of languages. Text classification, or categorization, is the practice of
allocating a given text document to one or more predefined labels or categories; it
aims to obtain valuable information from unstructured text documents. This paper
presents a comparative study of selected published papers that focus on improving
Arabic text classification. It highlights the proposed models and the classifiers
used, discusses the challenges faced in this type of research, and proposes research
opportunities in the field of text classification. Based on the reviewed studies,
SVM and Naive Bayes were the most widely used classifiers for Arabic text
classification, while more effort is needed to develop and implement flexible Arabic
text classification methods and classifiers.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 217
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_10
218 M. K. B. Melhem et al.
1 Introduction
2 Literature Review
Alshaer et al. [13] studied the impact of using improved CHI square (ImpCHI) as a
feature selection method on several text classifiers (Random Forest, Naïve Bayes
Multinomial, Decision Tree, Bayes Net, Artificial Neural Networks, and Naïve Bayes),
evaluating the results according to precision, recall, F-measure, and the time taken
to build the model. They also described the importance of data pre-processing steps
in the text classification process for obtaining supporting results and improving
efficiency.
Chantar et al. [14] studied the impact of an enhanced binary Grey Wolf Optimizer
(GWO) used within a wrapper feature selection (FS) method on the Arabic text
classification problem. Using the news datasets Akhbar-Alkhaleej, Al-jazeera, and
Alwatan, the authors compared the performance of the proposed method with SVM,
Decision Tree, NB, and KNN classifiers.
Bahassine et al. [15] proposed an improved Chi-square feature selection method
(referred to, hereafter, as ImpCHI) to enhance Arabic text classification
performance, and compared it with three other metrics (mutual information,
information gain, and Chi-square).
Marie-Sainte et al. [16] studied a newly proposed feature selection method based on
the firefly algorithm, an algorithm applied to different combinatorial problems.
The technique was validated using the Support Vector Machine classifier and
three evaluation measures (precision, recall, and F-measure).
Elnagar et al. [17] introduced new, freely available, rich, and unbiased datasets
for both Arabic text categorization tasks: single-label (SANAD) and multi-label
(NADiA). They also presented a comprehensive comparison of various deep learning
models used to classify Arabic text, evaluating the effectiveness of these models on
the NADiA and SANAD datasets. Other optimization methods that can be used for such
problems are given in [18–23].
3 Background
1. Text Classification
input or output, except for hidden layers. A hidden layer is an additional layer
added to the network to perform more calculations when the task is too complicated
for a small network. The number of hidden layers can reach a hundred or more. DNNs
achieve excellent precision and are considered revolutionary. There are many types of
DNNs, such as convolutional neural networks (CNN) and recurrent neural networks
(RNN); the difference between the various DNN models is how their layers are
connected [28].
3. Feature Selection
Feature selection is one of the most important elements that can improve the
performance of the classification process. It eliminates redundant and irrelevant
data and selects the important data in order to reduce the complexity of the
classification process [15].
4. CHI Square
CHI Square is a statistical approach that measures the dependence between two
variables, here a term and a category. In the data mining process, it is a method
for selecting features. The CHI square method is used in the preprocessing
step of the text classification system [13].
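As an illustration of how CHI Square scores a feature, the sketch below uses the common 2×2 contingency form of the statistic for a term and a category. This is the classical statistic, not ImpCHI, and the formula variant, the function names, and the toy documents (English placeholder tokens) are all assumptions made for the example rather than details from the reviewed papers.

```python
def chi_square(a, b, c, d):
    """Chi-square of a term/category pair from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or category is constant, so no dependence is measurable
    return n * (a * d - c * b) ** 2 / denom

def score_terms(docs, category):
    """Score every vocabulary term against one category."""
    vocab = set().union(*(tokens for tokens, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(1 for t, lab in docs if term in t and lab == category)
        b = sum(1 for t, lab in docs if term in t and lab != category)
        c = sum(1 for t, lab in docs if term not in t and lab == category)
        d = sum(1 for t, lab in docs if term not in t and lab != category)
        scores[term] = chi_square(a, b, c, d)
    return scores

# Toy corpus: (set of tokens, label). Purely illustrative.
docs = [
    ({"goal", "match", "team"}, "sport"),
    ({"match", "win"}, "sport"),
    ({"market", "bank"}, "economy"),
    ({"bank", "team"}, "economy"),
]
scores = score_terms(docs, "sport")
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_terms)
```

Terms that occur evenly across categories, such as "team" in the toy corpus, score zero and would be dropped first when only the highest-scoring features are kept.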
5. Improved CHI Square
The improved CHI method (ImpCHI) is an enhancement of the classical CHI method.
The ImpCHI method has been used with Chinese text. The research results showed
that the method is effective for selecting features from Arabic text data.
Additionally, ImpCHI has been used with Arabic text and decision trees together with
light stemming. The reported results showed that, in terms of recall, ImpCHI
performs better than conventional CHI.
6. Grey Wolf Optimizer (GWO)
This algorithm was proposed in [29]. It is one of the most recent swarm intelligence
(SI) algorithms and has attracted the attention of many researchers in different
fields of optimization.
7. Firefly Algorithm
The Firefly Algorithm (FA) is a well-known and efficient bio-inspired
algorithm [30]. It was successfully applied to feature selection in Arabic
speaker recognition systems, but it had not been implemented for Arabic text
classification [31].
Alshaer et al. [13] used different types of Arabic text classifiers (Bayes Net
(BN), Naïve Bayes (NB), Naïve Bayes Multinomial (NBM), Random Forest (RF),
Decision Tree (DT), and Artificial Neural Networks (ANNs)) with the improved CHI
(ImpCHI) Square algorithm and compared them with each other according to average
precision, average recall, average F-measure, and average time. Six tests were
conducted for each classifier: without pre-processing, with pre-processing, without
pre-processing and with CHI, with pre-processing and CHI, without pre-processing and
with ImpCHI, and with pre-processing and ImpCHI. The results of this study show that
using ImpCHI square as the feature selection method gave better precision, recall,
and F-measure, but a worse model build time. Moreover, the results were superior to
those of the classical CHI Square without pre-processing for average precision,
average recall, average F-measure, and average time. Overall, the Naïve Bayes
classifiers achieved the best average precision, recall, and F-measure, making Naïve
Bayes the best of the algorithms compared. The dataset was collected from different
Arabic resources and contains 9055 Arabic documents.
In another study, Bahassine et al. [15] used a feature selection method based on
improved Chi-square together with an SVM classifier to enhance the Arabic text
classification process, and compared the results, using the common evaluation
criteria of precision, recall, and F-measure, with earlier feature selection
methods: Mutual Information (MI), Chi-square, Information Gain (IG), and Term
Frequency-Inverse Document Frequency (TFIDF). The results showed that ImpCHI
performs better than the other feature selection methods for most feature-set sizes,
the exception being when the number of features equals 20. Across feature-set sizes,
precision, recall, and F-measure are better with the SVM classifier than with DT for
all feature selection methods. However, the study notes that the decision tree
produces results that are easily interpretable by non-experts, which helps to
identify the important and pertinent terms for every class, whereas SVM results are
difficult to interpret.
Chantar et al. [14] proposed an enhanced binary grey wolf optimizer (GWO) within a
wrapper FS approach, using different learning models with decision tree, K-nearest
neighbour, Naive Bayes, and SVM classifiers and three public Arabic datasets
(Alwatan, Akhbar-Alkhaleej, and Al-jazeera-News) to study and evaluate the efficacy
of different BGWO-based wrapper methods. Two methods, BGWO1 and BGWO2, are proposed
to convert the continuous GWO (CGWO) to a binary version (BGWO). The common
evaluation criteria of precision, recall, and F-measure were also used. The results
of this research show that the SVM-based feature selection technique, the proposed
binary GWO optimizer, and the elite-based crossover scheme yield strong performance
in the Arabic document classification process.
Marie-Sainte et al. [16] took a different approach to enhancing Arabic text
classification, using Firefly Algorithm based feature selection, an algorithm
applied to different combinatorial problems. A Support Vector Machine classifier and
three evaluation measures (precision, recall, and F-measure) were used to validate
this method. The dataset used in this study, named OSAC, was collected from the BBC
and CNN Arabic websites and contains 5843 text documents. It is divided into two
subsets to construct the training and test data of the classification system. The
preprocessing stage was skipped in this study because the dataset had already been
preprocessed. The results showed that the proposed feature selection method is very
efficient in improving Arabic text classification accuracy; the precision of this
method reaches 0.994, which is strong evidence of its efficiency.
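The three evaluation measures used throughout the reviewed studies can be computed from predicted and true labels as in the following minimal sketch. The function name and the example labels are hypothetical, and the measures are computed for a single class treated as positive; the reviewed papers may average over classes differently.

```python
def precision_recall_f_measure(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Hypothetical labels for four documents.
y_true = ["sport", "sport", "economy", "economy"]
y_pred = ["sport", "economy", "economy", "economy"]
print(precision_recall_f_measure(y_true, y_pred, positive="sport"))
```

A precision close to 1, as reported above, means that almost every document assigned to a class truly belongs to it; recall measures how many of the documents of that class were found.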
In an extensive study of the impact of deep learning models on Arabic text
classification, Elnagar et al. [28] introduced rich and unbiased datasets, freely
available to the research community, for both Arabic text classification tasks:
SANAD for single-label and NADiA for multi-label classification. The final size of
NADiA is approximately 485,000 articles, covering a subset of 30 categories. In this
research, nine deep learning models (BIGRU, BILSTM, CGRU, CLSTM, CNN, GRU, HANGRU,
HANLSTM, and LSTM) were developed for Arabic text classification tasks with no
pre-processing requirements. The study shows that all models work well on the SANAD
corpus: the lowest accuracy, achieved by the convolutional GRU, is 91.18%, and the
highest, achieved by the attention-GRU, is 96.94%. On NADiA, attention-GRU
achieved the highest overall accuracy of 88.68% on the largest subset of 10
categories in the "Masrawy" dataset.
Five publications were reviewed in this study: one implemented the firefly
algorithm, one implemented the binary grey wolf optimizer, two used improved CHI
Square, and one implemented deep learning models. The selected publications were
published in 2020. Two publications introduced new datasets, one of which is
extensive and large. All of the publications used ready datasets, some of which had
already been preprocessed. Overall, every reviewed publication reported an
improvement in the Arabic text classification process from its proposed method.
Challenges and research opportunities identified by this study:
• The scarcity of freely available Arabic datasets is still an important challenge
for researchers.
• Classifiers verified to perform well on document classification, such as Naive
Bayes and SVM, can be used with methods proposed in other studies.
• The proposed methods can be used with other classifiers, even if they give worse
results with a specific method.
• ImpCHI, Firefly, and GWO are effective methods that offer good research
opportunities.
• Deep learning models are an important technique that may be adapted to work with
other methods or algorithms, with highly effective results.
In recent years, the classification of Arabic texts has been regarded as one of the
most important topics in the field of knowledge discovery. Large amounts of data
are submitted online every day, from social media posts and comments to product
reviews. With Arabic text classification tools, these data sources can be used to
obtain useful information. Our research explored and analyzed five recent articles
that applied different techniques to explore and improve the classification of Arabic
texts. Our findings are summarized as follows:
• Arabic is still considered a low-resource language in terms of datasets available
to researchers.
• Using verified classifiers with different algorithms may enhance Arabic text
classification.
• There are many research opportunities in the hot topics of deep learning.
In future work, we will expand the selection to all publications published in 2020
and identify the most effective classifiers and methods that may admit enhancement,
as well as the worst-performing classifiers and methods used in Arabic text
classification.
References
1. Jackson, P., & Moulinier, I. (2007). Natural language processing for online applications: text
retrieval, extraction and categorization (vol. 5). John Benjamins Publishing.
2. Sanasam, R., Murthy, H., & Gonsalves, T. (2010). Feature selection for text classification based
on Gini coefficient of inequality. FSDM, 10, 76–85.
3. Feldman, R. (2007). The text mining handbook: Advanced approaches in analyzing unstruc-
tured data. Cambridge University Press.
4. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval.
5. Gharaibeh, M., Alzu’bi, D., Abdullah, M., Hmeidi, I., Al Nasar, M. R., Abualigah, L., &
Gandomi, A. H. (2022). Radiology imaging scans for early diagnosis of kidney tumors: a
review of data analytics-based machine learning and deep learning approaches. Big Data and
Cognitive Computing, 6(1), 29.
6. Gandomi, A. H., Chen, F., & Abualigah, L. (2022). Machine learning technologies for big data
analytics. Electronics, 11(3), 421.
7. Bashabsheh, M. Q., Abualigah, L., & Alshinwan, M. (2022). Big data analysis using
hybrid meta-heuristic optimization algorithm and MapReduce framework. In Integrating
meta-heuristics and machine learning for real-world optimization problems (pp. 181–223).
Springer.
8. Gharaibeh, M., Almahmoud, M., Ali, M. Z., Al-Badarneh, A., El-Heis, M., Abualigah, L.,
Altalhi, M., Alaiad, A., & Gandomi, A. H. (2021). Early diagnosis of alzheimer’s disease using
cerebral catheter angiogram neuroimaging: A novel model based on deep learning approaches.
Big Data and Cognitive Computing, 6(1), 2.
9. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data
applications in IoT cloud computing environments. Cluster Computing, 24(4), 2957–2976.
10. Abualigah, L., Gandomi, A. H., Elaziz, M. A., Hamad, H. A., Omari, M., Alshinwan, M., &
Khasawneh, A. M. (2021). Advances in meta-heuristic optimization algorithms in big data text
clustering. Electronics, 10(2), 101.
11. Abualigah, L., & Masri, B. A. (2021). Advances in MapReduce big data processing: platform,
tools, and algorithms. In Artificial intelligence and IoT (pp. 105–128).
12. Al-Sai, Z. A., & Abualigah, L. M. (2017, May). Big data and e-government: A review. In 2017
8th international conference on information technology (ICIT) (pp. 580–587). IEEE.
13. Alshaer, H., Otair, M., Abualigah, L., Alshinwan, M., & Khasawneh, A. (2020). Feature
selection method using improved CHI Square on Arabic text classifiers.
14. Chantar, H., Mafarja, M., Alsawalqah, H., Heidari, A. A., Aljarah, I., & Faris, H. (2020).
Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text
classification.
15. Bahassine, S., Madani, A., Al-Sarem, M., & Kissi, M. (2020). Feature selection using an
improved Chi-square for Arabic text.
16. Marie-Sainte, S. L., & Alalyani, N. (2020). Firefly algorithm based feature selection for Arabic
text classification.
17. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models.
18. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
19. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
20. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
21. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
22. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
23. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
24. Khreisat, L. (2009). A machine learning approach for Arabic text classification using N-gram
frequency statistics. Journal of Informetrics, 72–77.
25. Sebastiani, F. (2005). Text categorization. In J. H. Doorn, L. C. Rivero, & V. E. Ferraggine
(Eds.), Encyclopedia of database technologies and applications (pp. 683–687). IGI Global.
26. Dharmadhikari, S., Ingle, M., & Kulkarni, P. (2011). Empirical studies on machine learning
based text classification algorithms. Advanced Computing: An International Journal, 161–169.
27. El Kourdi, M., Bensaid, A., & Rachidi, T. (2004). Automatic Arabic document categoriza-
tion based on the Naïve Bayes algorithm. In Proceedings of the workshop on computational
approaches to Arabic script-based languages (pp. 51–58).
28. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models. Information Processing and Management.
29. Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey Wolf optimizer. Advances in
Engineering Software.
30. Sayadi, M. K., Ramezanian, R., & Ghaffarinasab, N. (2010). A discrete firefly meta-heuristic
with local search for makespan minimization in permutation flow shop scheduling problems.
International Journal of Industrial Engineering Computations.
31. Harrag, A., & Nassir, H. (2014). Firefly feature subset selection application to Arabic speaker
recognition system. International Journal of Engineering Intelligent Systems for Electrical
Engineering and Communications.
Pedestrian Speed Prediction Using Feed
Forward Neural Network
A. Dayyabu · H. M. Alhassan
Department of Civil Engineering, Bayero University Kano, Gwarzo Road New Campus,
Kano 700241, Nigeria
e-mail: hmalhassan.civ@b.u.k.edu.ng
A. Dayyabu
Department of Civil Engineering, Nile University of Nigeria, Abuja, Nigeria
L. Abualigah (B)
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman 19328,
Jordan
e-mail: Aligah.2020@gmail.com
Faculty of Information Technology, Middle East University, Amman 11831, Jordan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 225
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_11
1 Introduction
Walking is the oldest, most natural, and most used mode of transportation, relied on
by man in search of materials for shelter, water to drink, and food to eat for his
survival. Pedestrian facilities can therefore be traced as far back as the origin of
man. After securing shelter, the first man created footpaths to fetch water and
food, and footpaths remained the only means of transportation until animals were
domesticated [1]. Many people walk for recreation or exercise; some walk for the
health benefits, some for its simplicity, and some because it is cheap or because
they have no personal vehicle [2, 3]. Despite these advantages, its widespread use,
and its historical origin, little attention is given to walking facilities regarding
design standards, regulation, and safety. This results in a higher number of
pedestrian-involved accidents.
According to the World Health Organization (WHO, 2010, 2013, [4]), 22% of
those killed in road traffic accidents worldwide are pedestrians. The African region
accounts for the highest share, with thirty-eight percent (38%), even though it has
the fewest motorized vehicles among the six world regions. Nigeria and South Africa
have the highest fatality rates in the region (33.7 and 31.9 deaths per 100,000
population per year, respectively). A study conducted in Ghana found that 68% of the
pedestrians killed were knocked down by a vehicle while in the middle of the
roadway, crossing the road [5]. In another study, Ogendi et al. [6] reported that
out of 176 persons involved in road traffic accidents in Kenya, 59.1% were
pedestrians. The study also revealed that 72.6% of the pedestrians involved were
injured while crossing the road, 11% while standing by the road, 8.2% while walking
along the road, and another 8.2% while engaging in other activities, including
hawking. The trend is similar in Nigeria; for instance, Aladelusi et al. [7] found
pedestrians to be among the most frequent victims of road traffic accidents. Also,
Solagberu et al. [8], investigating pedestrian injuries in Lagos, Nigeria, found
that 67% of the 702 pedestrians involved in road accidents were injured while
crossing the road. Odeleye [9] mentioned poor planning, reckless behavior of
motorized drivers toward pedestrians, and the unsafe state of the road traffic
environment as the leading causes of pedestrian accidents in Nigeria.
Given the rising trend in pedestrian fatalities globally and locally, understanding
pedestrians’ behavior is the focus of this research. This study aims
to develop a model for predicting pedestrians’ speed using an artificial neural
network (ANN) approach based on field data, considering the effect of gender,
clothing type, and shoe type worn by individual pedestrians in Kano, Nigeria. The
microscopic pedestrian model has been studied extensively by many researchers,
including [10], who used the concept of magnetic theory to describe movement,
representing the movement of each pedestrian by the motion of a magnetized object in
a magnetic field and assuming each pedestrian and obstacle to be a positive magnetic
pole and the pedestrian’s destination to be a negative magnetic pole. Gipps and
Marksjö [11] used a CA-like concept to model pedestrian traffic flow. The authors
used reverse gravity-based rules to move pedestrians over a grid of hexagonal cells.
Blue and Adler ([12],
The data for the research were collected at a pedestrian overhead bridge located at
Sa’adatu Rimi College of Education, Kano, Nigeria. The bridge was constructed in
2014 by the Kano State Government, through the state Ministry of Works, Housing, and
Transport, to improve pedestrian safety and reduce the delays that crossing
pedestrians cause to motorists. The majority of the people using the pedestrian
bridge were from Sa’adatu Rimi College of Education, Kano. The college is among the
largest teacher training institutions in Nigeria, with a student population above
45,000 in 2012. The location of the data collection is presented in Fig. 1. The road
under the bridge is a four-lane divided arterial road with high traffic flow.
The input data on pedestrians’ gender, clothing type, shoe type, and speed, obtained
from playback of the field observation video, were normalized to a standard
scale of 0–1 prior to model building and analysis. The normalization was carried out
using Eq. (1):
$$X_s = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
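A minimal sketch of the normalization in Eq. (1) follows. The function name and the sample values are hypothetical; the handling of a constant column (all values equal) is an assumption added to avoid division by zero and is not described in the study.

```python
def min_max_normalize(values):
    """Scale values to the 0-1 range: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw values for one input variable.
raw = [2, 4, 6, 10]
print(min_max_normalize(raw))
```

The smallest observation maps to 0 and the largest to 1, so variables measured on different scales, such as coded gender and walking speed, become comparable as network inputs.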
Sensitivity analysis was carried out to find the relationship between the input
variables and the output variable and to establish the significance of each input
variable in model building. The Pearson product-moment coefficient of correlation was
used for the sensitivity analysis. The Pearson correlation is given in Eq. (2):
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \tag{2}$$
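Eq. (2) can be sketched as below. The text does not define the S terms, so the sketch assumes the usual convention: S_xy is the sum of products of deviations from the means, and S_xx and S_yy are the sums of squared deviations. The function name and sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical normalized input variable and observed speeds.
feature = [0.0, 0.25, 0.5, 1.0]
speed = [0.9, 1.0, 1.2, 1.5]
print(round(pearson_r(feature, speed), 3))
```

Values of r near +1 or −1 indicate a strong linear association between an input and the output, while values near 0 suggest the input contributes little linear information.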
on the association of input and target output to represent the ANN architecture, due
to the absence of known general rules (Burns and Whitesides, 1993).
The research proposes an ANN model based on a feed-forward network trained with the backpropagation
algorithm. The chosen feed-forward ANN comprises an input layer, a hidden layer,
and an output layer. The required number of neurons in the hidden layer is selected
by trial and error based on the best performance value. The input layer comprises
2, 3, 4, or 5 neurons, depending on the model, while the output layer has a
single neuron for the field-observed speed. The strength of each connection between neurons
is referred to as its weight. The weighted sum of the inputs to a node is given in Eq. (3):
\[ NET_j = \sum_{i=1}^{n} W_{ij} X_{ij} \tag{3} \]

where W_ij is the established weight, X_ij is the input value, and NET_j is the input to a node in layer j.
In the backpropagation technique, the output of a neuron is quantified by the sigmoid
function given in Eq. (4):

\[ f(NET_j) = \frac{1}{1 + \exp(-NET_j)} \tag{4} \]
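The weighted sum and sigmoid activation in Eqs. (3) and (4) can be combined into a forward pass through a network with one hidden layer. The sketch below is only illustrative: the weights are hypothetical, bias terms are omitted because Eq. (3) does not include them, and the training step (backpropagation) is not shown.

```python
import math


def net_input(weights, inputs):
    """Weighted sum of the inputs to one node, Eq. (3)."""
    return sum(w * x for w, x in zip(weights, inputs))


def sigmoid(net):
    """Sigmoid activation, Eq. (4)."""
    return 1.0 / (1.0 + math.exp(-net))


def forward(inputs, hidden_weights, output_weights):
    """Forward pass: input layer -> one hidden layer -> single output neuron."""
    hidden = [sigmoid(net_input(w, inputs)) for w in hidden_weights]
    return sigmoid(net_input(output_weights, hidden))


# Hypothetical normalized inputs, e.g. gender, clothing type, shoe type.
inputs = [1.0, 0.0, 1.0]
# Hypothetical weights: two hidden neurons, each with three input weights.
hidden_weights = [[0.4, -0.2, 0.7], [-0.5, 0.3, 0.1]]
output_weights = [0.8, -0.6]

predicted_speed = forward(inputs, hidden_weights, output_weights)
print(predicted_speed)  # normalized speed between 0 and 1
```

Because the output passes through the sigmoid, the prediction lies between 0 and 1 and has to be mapped back to m/min by inverting the normalization.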
determination, MSE, and RMSE are used for model validation. RMSE represents the
sample standard deviation of the differences between predicted values and observed
values. The values of R², R, MSE, and RMSE are estimated using Eqs. (5)–(8).
Table 3b, d present the validation results for pedestrians in the ascending and descending
directions.
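\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{5} \]

\[ R = \sqrt{R^2} \tag{6} \]

\[ MSE = \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{N} \tag{7} \]

\[ RMSE = \sqrt{MSE} \tag{8} \]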
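The four validation measures in Eqs. (5)–(8) can be computed together, as in the sketch below. It assumes that O_i denotes an observed value and P_i the corresponding predicted value, and that n and N both denote the number of samples; the function name and the sample values are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Return R2, R, MSE, and RMSE following Eqs. (5)-(8)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - sse / sst
    mse = sse / n
    return {
        "R2": r2,
        # Eq. (6) is only defined for non-negative R2.
        "R": math.sqrt(r2) if r2 >= 0 else float("nan"),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical normalized speeds.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
metrics = validation_metrics(observed, predicted)
print(metrics)
```

Since the data are normalized to 0–1, MSE and RMSE computed this way are on the normalized scale rather than in m/min.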
The data collected were classified as discrete or continuous. The discrete data are
presented in Fig. 3a–e: Fig. 3a shows pedestrian classification based on gender; Fig. 3b,
based on the direction of movement; Fig. 3c, based on age group; Fig. 3d, based on clothing
type; and Fig. 3e, based on shoe type.
The observations show different types of pedestrians. A total of 5672 male pedestrians
were observed, 4443 in the ascending direction and 1229 in the descending
direction, together with 1138 female pedestrians, 983 in the ascending direction and 155 in the descending
direction, as presented in Fig. 3a. By group size, single pedestrians
accounted for 4219 (3254 ascending, 965 descending), groups of two
for 1939 (1716 ascending, 223 descending), groups of three
for 631 (456 ascending, 175 descending), and groups of four for 271
(250 ascending, 21 descending), as presented in Fig. 3b.
The pedestrians covered all ages: 5233 were aged 18–40
(4182 ascending, 1051 descending),
402 were younger than 18 (242 ascending,
160 descending), and 1175 were older than 40
(1002 ascending, 173 descending), as
presented in Fig. 3c. The pedestrians were observed wearing different types of clothes,
ranging from English wear, with a total of 1601, 1342 in the ascending direction and 259
Fig. 3 a Pedestrian classification based on gender type. b Pedestrian classification based on the
direction of movement. c Pedestrian classification based on age group. d Pedestrian classification
based on clothing types. e Pedestrian classification based on shoe types
in the descending direction; short African wear, with a total of 583, 344 in the ascending
direction and 239 in the descending direction; long African wear, with a total of 4131,
3407 in the ascending direction and 724 in the descending direction; and
gown/hijab, with a total of 495, 333 in the ascending direction and
162 in the descending direction, as presented in Fig. 3d. Two types of shoes were
observed: 1756 pedestrians wore cover shoes (1376 ascending,
380 descending), while 5054 pedestrians wore slippers
(4050 ascending, 1004 descending), as presented in Fig. 3e.
The maximum, minimum, and mean speeds for all the
pedestrian combinations mentioned in the methodology are presented in Table 1: Table 1a
presents the speed characteristics of male pedestrians wearing cover shoes, Table
1b those of male pedestrians wearing slippers, and
Table 1c those of female pedestrians wearing
slippers.
The statistical analyses presented in Table 1a–c indicate that the mean speed in the ascending
direction is higher than in the descending direction, with values of
67.74 m/min and 52.19 m/min, respectively. The speed distribution also indicates
that male pedestrians wearing English/short African clothes and cover shoes have the
highest mean speed, 84.21 m/min and 60.10 m/min in the ascending and descending directions,
followed by male pedestrians wearing English/short African clothes and slippers,
with mean speeds of 72.7 and 57.7 m/min in the ascending and descending directions,
followed by male pedestrians wearing long/gown clothes and cover shoes
Table 1 a Male pedestrian speed characteristics based on cover shoe type. b Male pedestrian speed
characteristics based on slippers shoe type. c Female pedestrian speed characteristics based on slippers
shoe type

(a)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 1167 | 300 | 240 | 60 | 141 | 39 |
| Max (m/min) | 102 | 85 | 102 | 63.75 | 78 | 54.35 |
| Min (m/min) | 34 | 34 | 51 | 46.36 | 51.57 | 42.35 |
| Mean (m/min) | 67.74 | 52.19 | 84.21 | 60.1 | 70.14 | 58.92 |

(b)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 1167 | 300 | 203 | 56 | 583 | 145 |
| Max (m/min) | 102 | 85 | 102 | 72.86 | 70 | 48.35 |
| Min (m/min) | 34 | 34 | 56.67 | 46.36 | 39.35 | 34.23 |
| Mean (m/min) | 67.74 | 52.19 | 72.7 | 57.70 | 68.3 | 56.07 |

(c)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 330 | 123 | 37 | 18 | 293 | 105 |
| Max (m/min) | 85 | 85 | 85 | 72.86 | 85 | 85 |
| Min (m/min) | 26.84 | 28.33 | 28.33 | 34 | 26.84 | 28.33 |
| Mean (m/min) | 50.42 | 47.09 | 55.25 | 50.5 | 49.1 | 48.90 |
Table 2 a Pearson correlation coefficient matrix for ascending direction pedestrians. b Pearson
correlation coefficient matrix for descending direction pedestrians

(a)
| | Male | Female | C-Type I | C-Type II | S-Type I | S-Type II | Speed |
|---|---|---|---|---|---|---|---|
| Male | 1 | | | | | | |
| Female | −1 | 1 | | | | | |
| C-Type I | 0.002295 | −0.00229 | 1 | | | | |
| C-Type II | 0.010094 | −0.01009 | −0.9053 | 1 | | | |
| S-Type I | −0.07758 | 0.077585 | 0.319891 | −0.39542 | 1 | | |
| S-Type II | 0.06734 | −0.06734 | −0.32385 | 0.389791 | −0.99453 | 1 | |
| Speed | 0.205272 | −0.20527 | 0.522692 | −0.49069 | 0.611517 | −0.6234 | 1 |

(b)
| | Male | Female | C-Type I | C-Type II | S-Type I | S-Type II | Speed |
|---|---|---|---|---|---|---|---|
| Male | 1 | | | | | | |
| Female | −1 | 1 | | | | | |
| C-Type I | 0.267199 | −0.2672 | 1 | | | | |
| C-Type II | −0.20397 | 0.203973 | −0.95477 | 1 | | | |
| S-Type I | 0.147209 | −0.14721 | 0.153344 | −0.15923 | 1 | | |
| S-Type II | −0.14413 | 0.144127 | −0.1508 | 0.157253 | −0.98757 | 1 | |
| Speed | 0.487616 | −0.487622 | 0.822385 | −0.78832 | 0.433206 | −0.42792 | 1 |
with mean speeds of 70.14 m/min and 58.92 m/min in the ascending and descending
directions, followed by male pedestrians wearing long/gown clothes and slippers,
with mean speeds of 68.30 m/min and 56.07 m/min,
followed by female pedestrians wearing English/short African clothes and slippers,
with mean speeds of 55.25 m/min and 50.5 m/min, and lastly
female pedestrians wearing long/gown African clothes and slippers, with
mean speeds of 49.1 m/min and 48.90 m/min in the ascending and descending directions (Table 1c).
The speed distribution of the observed pedestrian data is presented for
the combinations specified in the methodology: Fig. 4a for all single pedestrians
in the ascending direction; Fig. 4b for all single pedestrians in the descending direction;
Fig. 4c–f for male pedestrians wearing cover shoes; Fig. 4g–j for male pedestrians
wearing slippers; and Fig. 4k–n for female pedestrians wearing slippers.
The research uses the Pearson correlation method to determine the order of importance
of each variable in model building. The sensitivity analysis presented in Table 2a, b
provides the relationship between the independent variables and the dependent variable and
Table 3 a ANN model training, ascending direction. b ANN model validation, ascending direction.
c ANN model training, descending direction. d ANN model validation, descending direction

(a) Training phase
| Model | R² | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.4125 | 0.6423 | 0.03880 | 0.1971 |
| ANN-M2 | 0.4559 | 0.6752 | 0.0357 | 0.1890 |
| ANN-M3 | 0.4953 | 0.7038 | 0.0326 | 0.1806 |
| ANN-M4 | 0.4165 | 0.6454 | 0.0386 | 0.1964 |
| ANN-M5 | 0.5020 | 0.7085 | 0.0320 | 0.1790 |

(b) Validation phase
| Model | R² | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.4272 | 0.6536 | 0.0364 | 0.1908 |
| ANN-M2 | 0.4499 | 0.6708 | 0.0348 | 0.1866 |
| ANN-M3 | 0.4946 | 0.7046 | 0.0312 | 0.1767 |
| ANN-M4 | 0.4311 | 0.6566 | 0.0362 | 0.1901 |
| ANN-M5 | 0.4948 | 0.7034 | 0.0314 | 0.1771 |

(c) Training phase
| Model | R² | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.3997 | 0.6366 | 0.0285 | 0.1687 |
| ANN-M2 | 0.5908 | 0.6322 | 0.0287 | 0.1695 |
| ANN-M3 | 0.5482 | 0.7687 | 0.0196 | 0.1400 |
| ANN-M4 | 0.6193 | 0.7404 | 0.0216 | 0.1471 |
| ANN-M5 | 0.6193 | 0.7870 | 0.0182 | 0.1350 |

(d) Validation phase
| Model | R² | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.3974 | 0.6304 | 0.0297 | 0.1723 |
| ANN-M2 | 0.3975 | 0.6305 | 0.0297 | 0.1723 |
| ANN-M3 | 0.5803 | 0.7618 | 0.0207 | 0.1438 |
| ANN-M4 | 0.5405 | 0.7352 | 0.0226 | 0.1505 |
| ANN-M5 | 0.6077 | 0.7795 | 0.0193 | 0.1390 |
Fig. 4 Pedestrian speed distribution. a All pedestrians, ascending direction. b All pedestrians,
descending direction. c Pedestrian comb. I, ascending direction, cover shoe. d Pedestrian comb. I,
descending direction, cover shoe. e Pedestrian comb. II, ascending direction, cover shoe.
f Pedestrian comb. II, descending direction, cover shoe. g Pedestrian comb. I, ascending direction,
slipper shoe. h Pedestrian comb. I, descending direction, slipper shoe. i Pedestrian comb. II,
ascending direction, slipper shoe. j Pedestrian comb. II, descending direction, slipper shoe.
k Pedestrian comb. I, ascending direction, slipper shoe. l Pedestrian comb. I, descending direction,
slipper shoe. m Pedestrian comb. II, ascending direction, slipper shoe. n Pedestrian comb. II,
descending direction, slipper shoe
each variable’s significance in model building. Table 2a presents the relationship for
the ascending direction, and Table 2b presents the relationship for the descending direction.
The sensitivity analysis indicates that shoe type is the most significant variable
in the ascending direction, with slippers being the most significant, followed by
cover shoes, clothing type I, clothing type II, and lastly
gender, as presented in Table 2a. In the descending direction, clothing type I is the
most significant, followed by clothing type II, female gender,
male gender, cover shoes (shoe type I), and slippers (shoe type II), as presented in Table
2b.
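The ordering described above can be reproduced by ranking the input variables by the absolute value of their correlation with speed. The sketch below copies the correlations with speed from the last row of Table 2a and Table 2b; the function name is hypothetical, and reading S-Type I as cover shoe and S-Type II as slippers is inferred from the surrounding text.

```python
# Correlations with speed, copied from the last row of Table 2a and Table 2b.
ascending = {
    "Male": 0.205272, "Female": -0.20527,
    "C-Type I": 0.522692, "C-Type II": -0.49069,
    "S-Type I": 0.611517, "S-Type II": -0.6234,
}
descending = {
    "Male": 0.487616, "Female": -0.487622,
    "C-Type I": 0.822385, "C-Type II": -0.78832,
    "S-Type I": 0.433206, "S-Type II": -0.42792,
}


def rank_by_strength(correlations):
    """Order variable names from strongest to weakest |r|."""
    return sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)


ascending_rank = rank_by_strength(ascending)
descending_rank = rank_by_strength(descending)
print(ascending_rank)
print(descending_rank)
```

Ranking by magnitude ignores the sign, so a strongly negative coefficient counts as much as a strongly positive one.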
4 Conclusion
Artificial intelligence modeling based on ANN can be used to predict pedestrian speed
while considering the effects of gender, clothing type, and shoe type, as shown
by the ANN performance analysis conducted in this research. All the ANN models
built from the observed data have a performance (correlation coefficient R) greater than 0.5, indicating the
acceptability of ANN for pedestrian speed prediction on a stairway.
The research also concludes that pedestrian dress, in terms of clothing and shoe
type, and gender affect pedestrian speed. Male pedestrians wearing English/short
African clothes with cover shoes have the highest speed of any
combination, while female pedestrians wearing long African clothes with
slippers have the lowest speed of any pedestrian combination.
Fig. 5 a Pedestrian speed relationship between predicted and observed data (TRAINING).
b Pedestrian speed relationship between predicted and observed data (TESTING).
c Pedestrian speed relationship between predicted and observed data (TRAINING).
d Pedestrian speed relationship between predicted and observed data (TESTING)
References
1. Jacobson, H. R. (1940). A history of roads from ancient times to the motor age (Georgia
Institute of Technology). https://smartech.gatech.edu/bitstream/handle/1853/36216/jacobson_
herbert_r_194005_ms_95034.pdf
2. Olojede, O., Yoade, A., & Olufemi, B. (2017). Determinants of walking as an active travel
mode in a Nigerian city. Journal of Transport and Health, 6, 327–334. https://doi.org/10.1016/
j.jth.2017.06.008
3. Litman, T. (2011). Evaluating public transportation health benefits. (April). http://site.ebrary.
com/lib/sfu/docDetail.action?docID=10534560
4. WHO. (2015). Global status report on road safety 2013. WHO. http://www.who.int/violence_
injury_prevention/road_safety_status/2013/en/
5. Damsere-Derry, J., et al. (2010). Pedestrians’ injury patterns in Ghana. Accident Analysis and
Prevention, 42(4), 1080–1088.
6. Ogendi, J., Odero, W., Mitullah, W., & Khayesi, M. (2013). Pattern of pedestrian injuries in
the city of Nairobi: Implications for urban safety planning. Journal of Urban Health, 90(5),
849–856.
7. Aladelusi, T. O., et al. (2014). Evaluation of pedestrian road traffic maxillofacial injuries in a
Nigerian tertiary hospital. African Journal of Medicine and Medical Sciences, 43(4), 353–359.
8. Solagberu, B. A., et al. (2014). Child pedestrian injury and fatality in a developing country.
Pediatric Surgery International, 30(6), 625–632.
9. Odeleye, A. J. (2001). Improved road traffic environment for better child safety in Nigeria. In
Road user characteristics with emphasis on life-styles, quality of life and safety—proceedings
of 14th ICTCT workshop held Caserta, Italy, October, 2001, pp. 72–82. http://trid.trb.org/view/
745284
10. Okazaki, S., & Matsushita, S. (1979). A study of simulation model for pedestrian movement. In
Architectural space, part 3: along the shortest path, taking fire, congestion and unrecognized
space into account, transactions of architectural institute of Japan, 285. https://citeseerx.ist.
psu.edu/viewdoc/summary?doi=10.1.1.626.596
11. Gipps, P. G., & Marksjö, B. (1985). A micro-simulation model for pedestrian flows.
Mathematics and Computers in Simulation, 27(2–3), 95–105.
https://doi.org/10.1016/0378-4754(85)90027-8
12. Blue, V. J., & Adler, J. L. (1998). Emergent fundamental pedestrian flows from cellular automata
microsimulation. Transportation Research Record: Journal of the Transportation Research
Board, 1644(1), 29–36. https://doi.org/10.3141/1644-04
13. Dijkstra, J., & Jessurun, J. (2001). Theory and practical issues on cellular automata. Theory
and practical issues on cellular automata, (January 2000). https://doi.org/10.1007/978-1-4471-
0709-5
14. Wang, J., Zhang, L., Shi, Q., Yang, P., & Hu, X. (2015). Modeling and simulating for congestion
pedestrian evacuation with panic. Physica A: Statistical Mechanics and Its Applications, 428,
396–409. https://doi.org/10.1016/j.physa.2015.01.057
15. Chen, Y., Chen, N., Wang, Y., Wang, Z., & Feng, G. (2015). Modeling pedestrian behaviors
under attracting incidents using cellular automata. Physica A: Statistical Mechanics and Its
Applications, 432, 287–300. https://doi.org/10.1016/j.physa.2015.03.017
16. Hu, J., You, L., Zhang, H., Wei, J., & Guo, Y. (2018). Study on queueing behavior in pedestrian
evacuation by extended cellular automata model. Physica A: Statistical Mechanics and Its
Applications, 489, 112–127. https://doi.org/10.1016/j.physa.2017.07.004
17. Alghadi, M. Y., Mazlan, A. R., & Azhari, A. (2019). The impact of board gender and multiple
directorship on cash holdings: Evidence from Jordan. International Journal of Finance and
Banking Research, 5(4), 71–75.
18. Lu, L., Guo, X., & Zhao, J. (2017). A unified nonlocal strain gradient model for nanobeams
and the importance of higher order terms. International Journal of Engineering Science, 119,
265–277.
19. Helbing, D., & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review
E, 51(5), 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282
20. Lewin, K. (1951). Field theory in social science. Amazon.co.uk: Lewin, Kurt: Books. Retrieved
September 24, 2020, from https://www.amazon.co.uk/Field-Theory-Social-Science-Lewin/dp/
B0007DDXKY
21. Teknomo, K. (2006). Application of microscopic pedestrian simulation model. Transportation
Research Part F: Traffic Psychology and Behaviour, 9(1), 15–27. https://doi.org/10.1016/j.trf.
2005.08.006
22. Helbing, D., Buzna, L., Johansson, A., & Werner, T. (2005). Self-organized pedestrian crowd
dynamics: Experiments, simulations, and design solutions. Transportation Science, 39(1), 1–
24.
23. Lakoba, T. I., Kaup, D. J., & Finkelstein, N. M. (2005). Modifications of the Helbing-Molnár-
Farkas-Vicsek social force model for pedestrian evolution. Simulation, 81(5), 339–352. https://
doi.org/10.1177/0037549705052772
24. Zanlungo, F., Brščić, D., & Kanda, T. (2014). Pedestrian group behaviour analysis under
different density conditions. Transportation Research Procedia, 2, 149–158. https://doi.org/
10.1016/j.trpro.2014.09.020
25. Moussaïd, M., Perozo, N., Garnier, S., Helbing, D., & Theraulaz, G. (2010). The walking
behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE, 5(4),
e10047. https://doi.org/10.1371/journal.pone.0010047
26. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
27. Gruden, C., Otković, I. I., & Šraml, M. (2020). Neural networks applied to microsimulation:
A prediction model for pedestrian crossing time. Sustainability (Switzerland), 12(13).
28. Das, P., Parida, M., & Katiyar, V. K. (2015). Analysis of interrelationship between pedestrian
flow parameters using artificial neural network. Journal of Medical and Biological Engineering,
35(6), 298–309.
29. Zampieri, F. L., Rigatti, D., & Ugalde, C. (2009). Evaluated model of pedestrian movement
based on space syntax, performance measures and artificial neural nets. In 7th International
space syntax symposium, pp 1–8.
30. Govindaraju, R. S. (2000). Artificial neural networks in hydrology. II: Hydrologic applications.
Journal of Hydrologic Engineering, 5(2), 124–137.
31. Solgi, M., Najib, T., Ahmadnejad, S., & Nasernejad, B. (2017). Synthesis and characterization
of novel activated carbon from Medlar seed for chromium removal: Experimental analysis
and modeling with artificial neural network and support vector regression. Resource-Efficient
Technologies, 3(3), 236–248.
32. Elkiran, G., Nourani, V., & Abba, S. I. (2019). Multi-step ahead modelling of river water quality
parameters using ensemble artificial intelligence-based approach. Journal of Hydrology, 577,
123962.
33. Price, J. L., McKeel Jr, D. W., Buckles, V. D., Roe, C. M., Xiong, C., Grundman, M., ... & Morris,
J. C. (2009). Neuropathology of nondemented aging: Presumptive evidence for preclinical
Alzheimer disease. Neurobiology of Aging, 30(7), 1026–1036.
34. Zare, M., & Koch, M. (2016, July). Using ANN and ANFIS models for simulating and
predicting groundwater level fluctuations in the Miandarband Plain, Iran. In Proceedings of the
4th IAHR Europe congress. Sustainable hydraulics in the era of global change (p. 416), Liege,
Belgium.
35. Schuchhardt, J., Schneider, G., Reichelt, J., Schomburg, D., & Wrede, P. (1995). Classification
of local protein structural motifs by kohonen networks. Bioinformatics: From Nucleic Acids
and Proteins to Cell Metabolism, 85–92.
36. Blue, V. J., & Adler, J. L. (2001). Cellular automata microsimulation for modeling bi-directional
pedestrian walkways. Transportation Research Part B: Methodological, 35(3), 293–312.
37. Zheng, X., Li, H. Y., Meng, L. Y., Xu, X. Y., & Chen, X. (2015). Improved social force model
based on exit selection for microscopic pedestrian simulation in subway station. Journal of
Central South University, 22(11), 4490–4497.
Arabic Text Classification Using
Modified Artificial Bee Colony
Algorithm for Sentiment Analysis:
The Case of Jordanian Dialect
Abstract Arab customers share comments and opinions daily, and their volume is increasing
dramatically through online reviews of companies’ products and services, written in
both Arabic and its dialects. Such text describes the user’s needs and
satisfaction or dissatisfaction, and the evaluation has either negative or positive
polarity. This motivates work on the Arabic text sentiment analysis problem, here for the
case of the Jordanian dialect. The main purpose of this paper is to classify text into
two classes, negative or positive, which may help a business to maintain a report
A. Habeeb . M. A. Otair
Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953,
Jordan
L. Abualigah (&) . A. R. Alsoud
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman,
Jordan
e-mail: Aligah.2020@gmail.com
L. Abualigah
Faculty of Information Technology, Middle East University, Amman 11831, Jordan
School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Gelugor,
Malaysia
D. S. A. Elminaam
Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Faculty of Computer Science, Misr International University, Obour, Egypt
R. A. Zitar
Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, Abu Dhabi,
United Arab Emirates
A. E. Ezugwu
School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal,
King Edward Road, Pietermaritzburg, KwaZulu-Natal 3201, South Africa
H. Jia
Department of Information Engineering, Sanming University, Fujian 365004, China
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 243
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_12
244 A. Habeeb et al.
about a service or product. The first phase uses natural language processing tools
(stemming, stop word removal, and tokenization) to filter the text.
The second phase modifies the Artificial Bee Colony (ABC) algorithm with the
Upper Confidence Bound (UCB) algorithm to promote its exploitation ability and
obtain the minimum number of optimal features, and then
applies a forward feature selection strategy with four machine learning
classifiers: K-Nearest Neighbors (KNN), Support Vector Machines (SVM),
Naïve Bayes (NB), and Polynomial Neural Networks (PNN). The proposed model
has been applied to a Jordanian dialect database, which contains comments from
customers of a Jordanian telecom company. Based on the results of the sentiment analysis,
suggestions can be made on whether to discontinue
or upgrade products or services. Moreover, the proposed model is applied to a database of the
Algerian dialect, which contains long Arabic texts, in order to assess the efficiency of
the proposed model on short and long texts. Four performance evaluation criteria
were used: precision, recall, F1-score, and accuracy. The experimental results
show that the proposed model achieves high accuracy, up to 99% on the
Jordanian dialect and 82% on the Algerian dialect, and it can be built on in future
work on the classification of Arabic dialects.
Keywords Natural language processing · Text classification · Sentiment analysis ·
Feature selection · Inspired algorithms · ABC · UCB · KNN · SVM · PNN ·
Naïve Bayes
1 Introduction
classification task to report the relevant multiple and single closed, format the
unstructured text to be compatible with ML algorithms, mine the interesting
knowledge, and understand customer needs. Sentiment analysis, one of the most important tasks in natural
language processing (NLP), is used to determine whether a
text is positive or negative. NLP is used to analyze text automatically and
to represent data in a format suitable for machine learning [3].
One such optimization algorithm is the artificial bee colony (ABC) algorithm,
which has been used successfully in many studies. Because of its stochastic nature,
the search equation of this algorithm exploits poorly when refining the best
solutions [4]. To address this weakness, the ABC algorithm with an
elite opposition-based learning strategy (EOABC) has been used to overcome the poor exploitation of the
original ABC [5].
Customer feedback is important for a business. To fully understand
customers’ requirements and their level of satisfaction, it is necessary
to collect customers’ comments and evaluate their responses. This can help with
innovation, product development, and service improvement, which build a loyal customer
base. However, a huge volume of data needs to be processed. The problem addressed in this paper
is the classification of Arabic text in the Jordanian dialect, in which
classifier algorithms are trained on a labeled dataset and used to predict the label of each text.
Typical ABC algorithms generate solutions with a search equation that is
good at exploration but often demonstrates insufficient exploitation, where
exploitation is the act of confining the search to a small area of the search space to
refine the solutions. In the artificial bee colony algorithm, a food source is
chosen according to a probability value based on the roulette
wheel method, and a greedy selection is applied between the current food source and the new
food source. As a first contribution, we modify the Artificial Bee Colony algorithm to enhance
exploitation by applying the UCB algorithm instead of greedy selection, so that
a food source is chosen according to its probability value and the optimal solution is found in
a small area of the search space. As a second contribution, supervised machine learning
classifiers are used to
determine the polarity of the text, which can be negative, expressing dissatisfaction,
or positive, expressing satisfaction, in order to describe a person’s
feeling towards a product, service, or current state.
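To make the contrast between the two selection rules concrete, the sketch below places standard roulette-wheel selection next to a UCB-style selection of food sources. It is only an illustration under stated assumptions: the passage above does not give the exact UCB formula used in the proposed model, so the widely used UCB1 score (mean reward plus an exploration bonus) is substituted here, and all function names, parameters, and sample values are hypothetical.

```python
import math
import random


def roulette_probabilities(fitness):
    """Selection probability of each food source, proportional to its fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(fitness, rng):
    """Pick a food source index with probability proportional to fitness."""
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]


def ucb_select(mean_reward, visits, total_visits, c=math.sqrt(2)):
    """Pick the food source with the highest UCB1 score (assumed variant)."""
    # Any source that has never been visited is tried first.
    for index, count in enumerate(visits):
        if count == 0:
            return index
    scores = [
        mean + c * math.sqrt(math.log(total_visits) / count)
        for mean, count in zip(mean_reward, visits)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical fitness values and visit counts for three food sources.
fitness = [0.9, 0.5, 0.1]
rng = random.Random(42)
print(roulette_probabilities(fitness))
print(roulette_select(fitness, rng))
print(ucb_select(fitness, [10, 10, 10], 30))
```

With equal visit counts, the UCB rule always picks the source with the best mean reward, whereas the roulette wheel can still pick a weak source by chance; this deterministic preference is one way to read the claim that UCB strengthens exploitation.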
2 Related Works
2.1 Introduction
The research applies to customers’ opinions in social media content in order to solve the
Arabic sentiment analysis (SA) problem. Their written text is analyzed with the aim of
improving customer services and product quality. SA deals with massive data.
Feature selection is needed to reduce the high-dimensional space for machine
learning. An enhancement of a bio-inspired optimizer, the salp swarm
algorithm (SSA), was proposed for feature selection (FS) to solve the problem of Arabic
sentiment analysis. The approach has two phases. The first phase reduces the number of features by
applying a filtering technique based on the information gain metric [6].
The second phase applies a wrapper FS technique that combines the SSA
optimizer with four variants of the S-shaped transfer function and uses KNN for classification.
Experimental results show that the classification accuracy of SSA combined with the
S-shaped transfer functions outperformed the particle swarm optimizer and the
grey wolf optimizer [6].
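An S-shaped transfer function is a common way to turn the continuous positions produced by a swarm optimizer into a binary feature mask. The sketch below shows only the basic sigmoid form; the four variants used in [6] are not specified in the text above, so this is an assumed illustration with hypothetical function names and values.

```python
import math
import random


def s_shaped(x):
    """Basic S-shaped (sigmoid) transfer function mapping a position to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position, rng):
    """Convert a continuous position vector into a 0/1 feature mask.

    Feature i is kept (1) with probability s_shaped(position[i]).
    """
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


# Hypothetical continuous position of one search agent over five features.
position = [2.5, -1.0, 0.0, 4.0, -3.5]
mask = binarize(position, random.Random(7))
print(mask)
```

Large positive coordinates make a feature very likely to be selected and large negative ones make it very unlikely, so the optimizer can keep searching in continuous space while the classifier only sees a subset of features.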
A semi-supervised approach to sentiment analysis was proposed for
Arabic and its dialects. The method uses deep learning algorithms to
classify Arabic text by detecting its polarity (positive or negative) on a sentiment
corpus. The approach is applied to Facebook (FB) text messages written in Modern
Standard Arabic (MSA) and in the Algerian dialect (DALG), in two scripts, Arabic and
Arabizi. There are two options for handling Arabizi: translation and transliteration. The
experiments were done on many test corpora dedicated to DALG/MSA, with
classifiers such as Logistic Regression (LR), Random Forest (RF), Long
Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). The classifiers are
combined with fastText and Word2vec. The experimental results show an F1 score of up to 95%, and
up to 89% for extrinsic experiments [7].
Optimization algorithms are an important means of feature
selection, which matters in the classification of high-dimensional
text because it selects a set of optimal features that reduces computation and
cost and improves the accuracy of text classification. A feature selection method based
on natural difference measurement and the binary Jaya optimization algorithm
(NDM-BJO) was evaluated using Support Vector Machine and Naive Bayes classifiers
to find the error rate. The results show that the NDM-BJO model gives improvements.
Various categories of feature selection methods were also evaluated [8].
Text classification is a difficult mathematical task in machine learning because of the
large increase in natural language text documents. Feature selection is the
basis of the process, because thousands of feature sets are possible when classifying
text. The proposed model is an enhanced binary grey wolf optimizer (GWO) used
within a wrapper FS approach to address Arabic text classification problems.
The wrapper-based feature selection was evaluated with several learning models (Naive Bayes,
K-nearest neighbor, and SVM classifiers), using training data from three public Arabic
datasets: Gulf News, Al Watan, and Al Jazeera News.
Results and analysis show that the SVM-based feature selection technique
with the proposed binary GWO optimizer and an elite-based crossover scheme is
more effective in dealing with Arabic text classification problems than
its peers [9].
Choosing efficient features from datasets is important in artificial intelligence,
pattern recognition, text classification, and data mining. Feature selection (FS) can
exclude features that are not relevant to the classification task and reduce the
dimensions of datasets, which helps us understand the data better. By choosing feature
selection, the performance of machine learning techniques is optimized and computational
requirements are reduced. So far, a large number of feature selection methods have been suggested, but
no single most practical method has been found.
Although different classes of feature selection methods
follow various criteria to evaluate the variables, few studies have focused on
evaluating the different classes of feature selection methods. Thirteen leading feature selection
methods from five different categories were assessed, focusing on
comparing the generality and effectiveness of these methods.
The thirteen feature selection methods were ranked using the rank aggregation method.
Then, the five best FS methods were chosen to perform multi-class classification,
with SVM as the classifier. Different numbers of selected features, different languages,
and different performance measures were used to
validate the generality of these methods. The results indicate that the Mahalanobis
distance is the best approach overall [10].
Many different techniques are used to identify offensive speech in media and on
Twitter. This research uses neural network (NN) classifiers. To participate in
Task 12 (OffensEval) of the SemEval 2020 workshop, a C-BiGRU model, composed of
a CNN and a bidirectional RNN, was used to identify offensive speech.
A multidimensional numerical representation of each word is obtained using
fastText, and the model is trained on a dataset of labeled tweets to detect words
with an offensive meaning. The model was used for English, Turkish, and Danish,
achieving F1-scores of 90.88%, 76.76%, and 76.70%, respectively [11].
The emotional state of clients needs to be understood through sentiment analysis,
a technique in natural language processing. To analyze Chinese text, the authors
propose a Chinese text sentiment analysis model based on LSTM, Bi-GRU, and an
attention mechanism. This model works on deep features of the text and merges
context to learn text features with greater precision. A Multi-Head Self-Attention
model is then used to determine word weights and to distinguish the important
parts of the text. The experiment achieves 87.1% accuracy [12].
Cyberbullying is a problem with real victims, and as the use of the Internet
increases, cyberbullying increases with it. Classification studies on bullying in
Arabic and English have been carried out. This paper suggests using RNN algorithms
with pre-trained word embeddings in a connected set of experiments on an Arabic
channel news comments dataset, reaching an F1-score of 0.84 [13].
The exploitation problem appears predominantly in the artificial bee colony (ABC)
algorithm. This algorithm is inspired by swarms of honeybees and has been applied
to many problems. To improve exploitation in ABC algorithms, this paper proposes a
chaotic ABC with an elite opposition-based learning strategy. The aim is to improve
exploitation ability. Furthermore, elite opposition is used to better exploit the
potential of the available solutions. The results were compared with several
artificial bee colony algorithms [14].
Sentiment analysis is a natural language processing task concerned with classifying
the polarity of text, driven by the urgent need to understand the opinions,
feelings, emotions, and evaluations expressed in data. This work aims to implement a
sentiment analysis system that identifies and understands semantics without
linguistic resources. The proposed model is examined on its ability to detect
positive or negative polarity [15].
248 A. Habeeb et al.
Feature selection is very important for classification: it enhances classification
performance, removes redundant features, and reduces computation time. This work
proposes a new error-based artificial bee colony algorithm for the feature
selection problem, developed by incorporating new standard error-based solution
search mechanisms. Thirteen machine learning datasets are used, with SVM and KNN
as the classification algorithms [16].
The multi-objective artificial bee colony-based feature weighting technique for
naïve Bayes (MOABC-FWNB) considers the feature-feature relationship (redundancy)
and the feature-class relationship (relevancy) independently in order to determine
the feature weights used by Naïve Bayes (NB). An experimental study was conducted
on 20 benchmark UCI datasets [17] (Table 1).
The literature review concerns extracting value from text so that it can be used in
diverse ways, such as classification, sentiment analysis, or extraction of a
specific value. The research problem addressed here is sentiment analysis of Arabic
text. To cover the gaps identified above, this paper identifies a subset of optimal
features by modifying the artificial bee colony algorithm, and then employs this
subset of features in the classification process within supervised machine learning
to build an integrated application that predicts human sentiment from the content
of a text.
3.1 Introduction
This chapter presents the procedures and implementation of the experiments and
how the results of the proposed models were obtained. This paper aims to obtain the
minimum number of optimal features that affect the value of a text using the
enhanced ABC-UBC designed for feature selection, described in detail in Section 3.4,
and then to apply it with the wrapper technique needed by the machine learning
classifiers in order to improve text classification accuracy and solve the problem
of Arabic sentiment analysis. The proposed model was examined on two datasets:
(1) the Jordanian dialect sentiment corpus and (2) the Algerian dialect sentiment
corpus. The datasets are divided into 80% training and 20% test.
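The 80/20 division described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `split_80_20`, the shuffling step, and the fixed seed are assumptions made for the example and are not taken from the chapter.

```python
import random


def split_80_20(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and split them into (train, test) parts.

    `seed` is a hypothetical choice that only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(rows)          # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Example with 3000 placeholder rows, the size reported for the Jordanian corpus
train_rows, test_rows = split_80_20(range(3000))
print(len(train_rows), len(test_rows))
```

With 3000 rows this yields 2400 training rows and 600 test rows.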
The learning phase depends on the optimal set of features produced by the previous
phase, which is used by the classifier algorithms, trained on the training
dataset, to predict labels. The proposed model is evaluated against widely used
classification techniques. The pre-processing steps of the dataset are also
discussed in this chapter.
Table 1 (continued)
Author | Method | Dataset | Research title | Summary
[12] | Neural network models | Chinese text | An intelligent CNN-BiLSTM approach for Chinese sentiment analysis on Spark | Model (CNN-BiLSTM). The experiment gets 87.1% accuracy
[13] | Neural network models | Arabic channel news comments dataset | Classification of cyberbullying text in Arabic | The result of F1 scores up to 84%
[14] | ABC algorithm with elite opposition-based learning strategy (EOABC) | Benchmark test functions | A survey on the studies employing machine learning (ML) for enhancing artificial bee colony (ABC) optimization algorithm | Presented a survey on studies of improving the ABC using ML
[15] | Neural network models | Corpus of Arabic texts | Deep attention-based review level sentiment analysis for Arabic reviews | Using deep learning ANN
[16] | New standard error-based artificial bee colony (SEABC) algorithm | Thirteen datasets are used from UCI machine learning datasets | A new standard error based artificial bee colony algorithm and its applications in feature selection | Using artificial bee colony algorithm
[17] | Multi-objective artificial bee colony-based feature weighting technique for naïve Bayes | Twenty benchmark UCI datasets | Feature weighting for naïve Bayes using multi objective artificial bee colony algorithm | Using multi objective artificial bee colony algorithm for feature weighting
The entire experiment was designed and implemented in Python. Python 3.8,
Spyder 3, and Jupyter Notebook server 6.1.4 were used to import the datasets and to
evaluate and compare the results. CountVectorizer breaks a sentence, paragraph, or
any other text into words and then converts the words into a multidimensional
matrix, which serves as training data for the classifiers in forward feature
selection with the machine learning algorithms. The operating system was Windows 10
20H2, with an Intel(R) Core(TM) i7-3520M processor and 12 GB of RAM.
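To illustrate what the word-count matrix looks like, the following sketch builds one in plain Python. It is not the scikit-learn `CountVectorizer` itself: whitespace tokenization and the helper names `fit_vocabulary` and `to_count_matrix` are assumptions made for this example, and the two sample sentences are invented.

```python
from collections import Counter


def fit_vocabulary(documents):
    """Map every distinct whitespace-separated token to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_count_matrix(documents, vocabulary):
    """Return one row per document; each cell counts one vocabulary token."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.split())
        row = [0] * len(vocabulary)
        for tok, count in counts.items():
            if tok in vocabulary:          # tokens unseen at fit time are skipped
                row[vocabulary[tok]] = count
        matrix.append(row)
    return matrix


# Invented sample notes, for illustration only
docs = ["الخدمة ممتازة", "الخدمة سيئة جدا الخدمة"]
vocab = fit_vocabulary(docs)
matrix = to_count_matrix(docs, vocab)
print(len(vocab), matrix)
```

Each row of the matrix is the feature vector of one text, and each column is one candidate feature (word) for the later feature selection step.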
This paper uses two datasets, as shown in Fig. 1. The first, the Jordanian dialect
sentiment corpus, contains 3000 notes written specifically in the Jordanian Arabic
dialect and collected from different telecommunication companies. The notes were
written by call center employees of Jordanian telecom companies during customers'
calls: the employees summarize the calls they receive as notes. The characteristics
of the dataset are given in Table 2.
The second dataset, the Algerian dialect sentiment corpus, consists of articles on
politics, news, sports, religion, and society selected from Algerian Arabic
newspaper websites. The characteristics of the dataset are given in Table 3.
The dataset is divided into two categories, positive and negative, and was
annotated by a group of experts. To facilitate the classification process, each
category is linked with a number: 1 for positive and 2 for negative.
Table 4 shows a sample of the Jordanian dialect sentiment corpus, and Table 5 shows
a sample of the Algerian dialect sentiment corpus.
The Algerian dataset contains articles that describe human feeling in its
positive or negative state. It was included because this paper needs long
paragraphs to train the proposed model.
3.4 Preprocessing
3.4.1 Tokenization
Tokenization is the process of converting text into tokens before transforming it
into vectors. It also makes it easier to filter out unnecessary tokens. For example,
a document can be split into paragraphs, or sentences into words. In this case,
tokenization splits sentences into words, as shown in the pre-processing phase of
Fig. 1. The words are tokenized using CAMeL Tools, a Python toolkit for Arabic
natural language processing (ANLP) [18].
The main task is to remove non-meaningful content, which is important for text
classification because it can reduce error and increase accuracy. Each file of the
corpus was subject to the following procedure, as shown in the pre-processing phase
of Fig. 1:
• Delete digits, punctuation marks, and numbers.
• Delete all non-Arabic characters.
• Delete stop-words and non-useful words such as pronouns, articles, and prepositions.
• Change the letter "ى" to "ي".
• Change the letter "ة" to "ه".
• Change the letters "آ", "إ", "ؤ", "ئ", "أ" to "ا".
• Delete characters that confuse the classification process [19].
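The cleaning procedure listed above can be sketched with the Python standard library as follows. This is only an illustration: the stop-word set is a tiny hypothetical sample, the removal of diacritics and tatweel is an assumed reading of "characters that confuse the classification process", and the chapter's own implementation (which relies on CAMeL Tools) may differ.

```python
import re

# Hypothetical stop-word sample; a real list would be much longer
STOP_WORDS = {"في", "من", "على", "عن", "هو", "هي"}

# Assumption: diacritics (U+064B..U+0652) and tatweel (U+0640) are dropped
DIACRITICS_AND_TATWEEL = re.compile(r"[\u064B-\u0652\u0640]")
# Everything that is not a basic Arabic letter or whitespace becomes a space,
# which removes digits, punctuation marks, and non-Arabic characters
NON_ARABIC = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")
# Letter normalization from the list above
LETTER_MAP = str.maketrans(
    {"ى": "ي", "ة": "ه", "آ": "ا", "إ": "ا", "ؤ": "ا", "ئ": "ا", "أ": "ا"}
)


def preprocess(text, stop_words=STOP_WORDS):
    """Clean one text and return its normalized tokens."""
    text = DIACRITICS_AND_TATWEEL.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    # Stop-words are removed before letter normalization, following the list order
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return [tok.translate(LETTER_MAP) for tok in tokens]


print(preprocess("الخدمة ممتازة 100% في إربد!"))
```

Removing stop-words before normalizing letters keeps the stop-word list in its ordinary spelling. If the order were reversed, the list itself would have to be normalized first.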
3.4.3 Stemming
Stemming was implemented with CAMeL Tools for Arabic natural language processing
(ANLP) in Python, as shown in the pre-processing phase of Fig. 1. CAMeL Tools is a
collection of open-source utilities for dialect identification, pre-processing,
morphological modeling, sentiment analysis, and named entity recognition, and it
provides the functionality used here for stemming Arabic words [18].
Stemming is the process of reducing inflected words to a single root or stem by
removing suffixes, prefixes, and infixes. Types of stemming include statistical
stemming [20].
Table 7 (continued)
Features | Words | Score
Feature: 779 | مش | 0.00286
Feature: 780 | مشاكل | 0.00458
Feature: 782 | مشجعة | 0.00552
Feature: 802 | ممتاز | 0.01359
Feature: 804 | اعطوني | 0.00505
Feature: 815 | فزت | 0.00443
Feature: 812 | منيحة | 0.00310
Feature: 828 | نزلت | 0.00472
Feature: 887 | يطلعلي | 0.00542
Feature: 888 | يعمل | 0.00117
Karaboga [22] defined swarm intelligence as “any attempt to design
algorithms or distributed problem-solving devices inspired by the collective
behavior of social insect colonies and other animal societies”. A honey bee swarm
shows a particular intelligent behavior when foraging, and the ABC algorithm was
established on the basis of this foraging behavior, simulating the real world. The
ABC algorithm can be used efficiently for solving multimodal and multidimensional
optimization problems.
The ABC has three groups of bees: employed bees, onlookers, and scouts. The first
half of the colony consists of employed artificial bees and the second half consists
of onlookers. There is one employed bee for each food source. Onlooker bees wait in
the hive and decide on a food source to exploit based on the information shared by
the employed bees. An employed bee becomes a scout after its food source is
depleted [22].
In the employed bee phase, a candidate solution $v_i$ is produced from the old
position $x_i$ according to Eq. 2, where $j \in \{1, 2, \ldots, D\}$,
$k \in \{1, 2, \ldots, SN\}$, and $\varphi_i^j$ is a random number in [−1, 1]. A
candidate food source $v_i$ is produced for every food source $x_i$. Once $v_i$ is
obtained, it is evaluated and compared with $x_i$. A greedy selection is applied
between $x_i$ and $v_i$: the better one is kept according to its fitness value,
which represents the food amount at that position.
$$v_i^j = x_i^j + \varphi_i^j \left( x_i^j - x_k^j \right) \qquad (2)$$
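Eq. 2 and the greedy selection that follows it can be sketched as below. The function name `employed_bee_step`, the choice of perturbing a single randomly chosen dimension $j$, and the requirement $k \neq i$ follow the usual ABC formulation and are assumptions here rather than details stated in the text. Fitness is assumed to be maximized.

```python
import random


def employed_bee_step(foods, i, fitness, rng=random):
    """Apply Eq. 2 to food source i, then keep the fitter of x_i and v_i.

    Returns True when the candidate v_i replaced x_i.
    """
    x_i = foods[i]
    j = rng.randrange(len(x_i))                                # dimension to perturb
    k = rng.choice([s for s in range(len(foods)) if s != i])   # partner, k != i
    phi = rng.uniform(-1.0, 1.0)                               # random number in [-1, 1]
    v_i = list(x_i)
    v_i[j] = x_i[j] + phi * (x_i[j] - foods[k][j])             # Eq. 2
    if fitness(v_i) > fitness(x_i):                            # greedy selection
        foods[i] = v_i
        return True
    return False


# Toy usage: maximize -(x^2 + y^2), so the best food source is the origin
toy_fitness = lambda x: -sum(t * t for t in x)
toy_foods = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
employed_bee_step(toy_foods, 0, toy_fitness, random.Random(0))
```

Because of the greedy rule, the fitness stored for a food source can never decrease from one step to the next.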
In the onlooker phase, each onlooker bee selects a food source depending on the
fitness values obtained from the employed bees, where $\mathrm{fit}(x_i)$ is the
fitness value of solution $i$. The onlooker then produces a new candidate position
for the selected food source. The selection probability $p_i$ of each solution is
calculated by:
$$p_i = \frac{\mathrm{fit}(x_i)}{\sum_{m=1}^{SN} \mathrm{fit}(x_m)} \qquad (3)$$
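A short sketch of Eq. 3 together with a roulette wheel draw is given below. The helper names are hypothetical, and the sketch assumes that all fitness values are non-negative with a positive sum.

```python
import random


def selection_probabilities(fitness_values):
    """Eq. 3: each probability is the fitness divided by the total fitness."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]


def roulette_wheel(probabilities, rng=random):
    """Draw one index with chance proportional to its probability."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1   # guard against floating-point round-off


probs = selection_probabilities([1.0, 3.0])
chosen = roulette_wheel(probs, random.Random(0))
print(probs, chosen)
```

Food sources with higher fitness therefore attract more onlooker bees, while poorer sources can still be chosen occasionally.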
After the employed and onlooker bees complete their searches, the ABC algorithm
checks whether there is any exhausted source to be abandoned. The scouts can then
discover rich, entirely unknown food sources.
The original Artificial Bee Colony algorithm has three control parameters: the
number of food sources, the limit value after which a food source that has not
improved is abandoned, and MCN, the maximum cycle number [23].
The Upper Confidence Bound algorithm changes its balance between exploration and
exploitation as it gathers more information about the environment, so that it can
exploit the environment better [24].
Exploration and exploitation are essential for population-based optimization
algorithms such as PSO, GA, and DE. Exploration refers to the ability to discover
unknown areas of the search space in order to find the optimum. Exploitation is the
ability to apply prior knowledge to obtain a better solution from what exploration
has already found [25].
The ABC algorithm searches for a maximum or minimum solution within the possible
search space of a problem. The scout bees control the exploration ability, while
the employed and onlooker bees provide the exploitation ability. The artificial bee
colony is efficient for constrained and multidimensional basic functions. However,
its local search ability is limited, and the convergence rate is poor on complex
multimodal functions.
The artificial bee colony algorithm chooses a food source according to the
probability value in Eq. 3, based on the roulette wheel method. A greedy selection
is then applied between $x_i$ and $v_i$. In the original ABC (Pseudocode 1), the
greedy selection is applied in steps (5)(c) and (6)(d). To improve exploitation,
these steps are modified using ideas from the Upper Confidence Bound algorithm
(UCB, written UBC in the name ABC-UBC). This modification affects four results:
mode, mean, median, and standard deviation.
The UCB algorithm adjusts its levels of exploration and exploitation according to
the information it has about the available actions. When confidence in the best
actions is low, it favors exploration, and as confidence grows, it favors
exploitation of good actions. By adjusting this balance as time progresses, UCB
achieves a better average reward than greedy selection.
UBC algorithm:
$$A(t) = \underset{a}{\arg\max} \left[ Q_t(a) + c \sqrt{\frac{\log t}{N_t(a)}} \right] \qquad (4)$$
where $N_t(a)$ is the number of times action (arm) $a$ has been selected up to time
$t$, $Q_t(a)$ is the estimated value of action $a$ at time step $t$, $c$ is a
constant that controls the degree of exploration, and argmax selects the action $a$
that maximizes the bracketed expression.
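The selection rule in Eq. 4 can be sketched as follows. The function name `ucb_select`, the default `c = 2.0`, and the convention of trying every untried action first (which avoids division by zero when $N_t(a) = 0$) are assumptions for this illustration.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Return the index of the action chosen by the UCB rule in Eq. 4.

    q_values[a] is the estimated value Q_t(a), counts[a] is N_t(a),
    and t is the current time step (t >= 1).
    """
    # Assumed convention: an action that was never tried is selected first
    for action, n in enumerate(counts):
        if n == 0:
            return action
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A rarely tried action gets a large exploration bonus
print(ucb_select([0.5, 0.45], [1000, 1], t=1001))
# With equal counts the bonus is identical, so the higher estimate wins
print(ucb_select([0.5, 0.4], [100, 100], t=200))
```

The square-root term shrinks as an action is selected more often, so the rule shifts from exploration to exploitation over time.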
Table 8 shows how the parameters in the equation are mapped from greedy selection
to the UBC selection process.
Because the upper confidence bound has high potential for being optimal, it
inspired an approach to multi-armed bandit (MAB) problems called the upper
confidence bound approach [26].
The modified ABC with UBC is summarized in Pseudocode 2, steps (5)(c) and (7)(d),
which show how the modified selection of a new food source, based on reinforcement
learning within the artificial bee colony, affects the behavior of ABC-UBC.
(1) Generate the initial population xi (i = 1, 2,..., SN)
(2) Evaluate the fitness (fit(xi)) of the population
(3) Set cycle to 1
(4) Repeat
(5) For each employed bee {
(a) Produce new solution Vi by using (2)
(b) Calculate its fitness value fit(Vi)
(c) Apply UBC selection process}
(6) Calculate the probability values Pi for the solution (xi) by (3)
(7) For each onlooker bee {
(a) Select a solution xi depending on Pi
(b) Produce new solution Vj
(c) Calculate its fitness value fit(Vj)
(d) Apply UBC selection process}
(8) If there is an abandoned solution for the scout,
then replace it with a new solution which will be randomly produced by (4)
(9) Memorize the best solution so far
(10) Cycle = cycle + 1
(11) Until cycle = maximum cycle number
Pseudocode 2: modified ABC with UBC
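The control flow of Pseudocode 2 can be sketched in Python as below. This is an illustrative skeleton under several assumptions, not the chapter's implementation: the acceptance rule is passed in as a `select` callback and defaults to the greedy rule of the original ABC, because the exact UBC mapping is given in Table 8 and is not reproduced here. The names `abc_optimize`, `limit`, and `seed`, the clamping to `bounds`, and the uniform random scout are likewise hypothetical choices. Fitness values are assumed to be positive and maximized.

```python
import random


def abc_optimize(fitness, dim, bounds, sn=10, limit=20, max_cycles=100,
                 select=None, seed=0):
    """Skeleton of the ABC loop; `select(old_fit, new_fit, cycle)` decides
    whether a candidate replaces the current food source."""
    rng = random.Random(seed)
    lo, hi = bounds
    if select is None:
        select = lambda old_fit, new_fit, cycle: new_fit > old_fit  # greedy rule

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    foods = [random_source() for _ in range(sn)]            # step (1)
    fits = [fitness(x) for x in foods]                      # step (2)
    trials = [0] * sn                                       # cycles without change
    best_fit = max(fits)
    best = list(foods[fits.index(best_fit)])

    def try_neighbour(i, cycle):
        j = rng.randrange(dim)
        k = rng.choice([s for s in range(sn) if s != i])
        v = list(foods[i])
        step = rng.uniform(-1.0, 1.0) * (v[j] - foods[k][j])  # Eq. 2
        v[j] = min(hi, max(lo, v[j] + step))
        fv = fitness(v)
        if select(fits[i], fv, cycle):                      # selection process
            foods[i], fits[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for cycle in range(1, max_cycles + 1):                  # steps (3), (4), (10), (11)
        for i in range(sn):                                 # step (5) employed bees
            try_neighbour(i, cycle)
        total = sum(fits)                                   # step (6), Eq. 3
        probs = [f / total for f in fits]
        for _ in range(sn):                                 # step (7) onlooker bees
            i = rng.choices(range(sn), weights=probs)[0]
            try_neighbour(i, cycle)
        stale = max(range(sn), key=trials.__getitem__)      # step (8) scout
        if trials[stale] > limit:
            foods[stale] = random_source()
            fits[stale] = fitness(foods[stale])
            trials[stale] = 0
        cycle_best = max(fits)                              # step (9)
        if cycle_best > best_fit:
            best_fit = cycle_best
            best = list(foods[fits.index(cycle_best)])
    return best, best_fit


# Toy usage: fitness 1 / (1 + x^2 + y^2) is largest at the origin
toy = lambda x: 1.0 / (1.0 + sum(t * t for t in x))
solution, value = abc_optimize(toy, dim=2, bounds=(-5.0, 5.0))
```

Keeping the acceptance rule behind a callback makes it possible to swap the greedy rule for a UCB-inspired one without touching the rest of the loop.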
The ROC analysis evaluates models using the false positive rate (FPR) and the true
positive rate (TPR). These are calculated as $FPR = FP / N$ and $TPR = TP / P$,
where $N$ is the number of negatives, $P$ is the number of positives, $FP$ is the
number of false positives, and $TP$ is the number of true positives. Researchers
use forward feature selection, which starts with no features and adds one at a
time by evaluating all remaining features individually and then selecting the
feature that results in the best performance [6].
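Forward feature selection as described above can be sketched as follows. The `score` callback stands for any evaluation of a feature subset, for example the accuracy of a classifier trained on those columns. The function name, the `max_features` parameter, and the stopping rule (stop when no remaining feature improves the score) are assumptions made for this illustration.

```python
def forward_feature_selection(n_features, score, max_features=None):
    """Greedily add the feature that gives the best score at each round.

    `score(subset)` takes a list of feature indices and returns a number
    where larger is better.
    """
    selected = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    cap = n_features if max_features is None else max_features
    while remaining and len(selected) < cap:
        # Evaluate every remaining feature added to the current subset
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break                      # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Toy score: features 1 and 3 are useful, every extra feature costs 0.01
useful = {1: 0.5, 3: 0.3}
toy_score = lambda subset: sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)
print(forward_feature_selection(5, toy_score))
```

In the toy example the procedure picks feature 1, then feature 3, and stops because any further feature lowers the score.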
Text categorization, or tagging, is the process of assigning text to labeled groups.
Text classifiers can analyze text and assign labels or tags based on its content [30].
4 Results
This paper aims to extract the polarity of Arabic text for the introduced datasets
and to classify these texts using the proposed model (Fig. 1). The Results section
summarizes the experiments and presents the results in tables. Four performance
evaluation criteria were used: precision, recall, F1-score, and accuracy.
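The four evaluation criteria can be computed from true and predicted labels as in the sketch below. The function name `binary_metrics` and the toy labels are illustrative. Label 1 is treated as the positive class, following the label coding described for the datasets.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score, and accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels: 1 = positive, 2 = negative
truth = [1, 1, 1, 2, 2, 2, 2, 2]
predicted = [1, 1, 2, 2, 2, 2, 1, 2]
print(binary_metrics(truth, predicted))
```

Calling the function once with `positive=1` and once with `positive=2` gives the per-label values, and the unweighted mean of the two is the macro average.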
Table 17 Result of KNN using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: KNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.89 | 0.94 | 0.92 | 0.92
Recall | 0.94 | 0.89 | 0.92 | 0.92
F1-score | 0.92 | 0.91 | 0.92 | 0.92
Accuracy: 0.92
Table 18 Result of SVM using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: SVM.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.86 | 0.99 | 0.92 | 0.98
Recall | 0.80 | 1.00 | 0.90 | 0.98
F1-score | 0.83 | 0.99 | 0.91 | 0.98
Accuracy: 0.98
Table 19 Result of NB using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: Naïve Bayes.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 1.00 | 0.99 | 0.99 | 0.99
Recall | 0.80 | 1.00 | 0.90 | 0.99
F1-score | 0.89 | 0.99 | 0.94 | 0.99
Accuracy: 0.99
Table 20 Result of PNN using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 8. Classifier: PNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.92 | 0.99 | 0.95 | 0.98
Recall | 0.80 | 1.00 | 0.90 | 0.98
F1-score | 0.86 | 0.99 | 0.92 | 0.98
Accuracy: 0.98
Table 21 Result of KNN using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: KNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.89 | 0.94 | 0.92 | 0.92
Recall | 0.94 | 0.89 | 0.92 | 0.92
F1-score | 0.92 | 0.91 | 0.92 | 0.92
Accuracy: 0.92
Table 22 Result of SVM using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: SVM.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.86 | 0.99 | 0.92 | 0.98
Recall | 0.80 | 0.99 | 0.90 | 0.98
F1-score | 0.83 | 0.99 | 0.91 | 0.98
Accuracy: 0.98
Table 23 Result of NB using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: Naïve Bayes.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.92 | 0.99 | 0.95 | 0.98
Recall | 0.80 | 1.00 | 0.90 | 0.98
F1-score | 0.86 | 0.99 | 0.92 | 0.98
Accuracy: 0.98
Table 24 Result of PNN using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 8. Classifier: PNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.92 | 0.99 | 0.95 | 0.98
Recall | 0.80 | 1.00 | 0.90 | 0.98
F1-score | 0.86 | 0.99 | 0.98 | 0.92
Accuracy: 0.98 (training 0.99, test 0.97)
Table 33 Result of KNN using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: KNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.94 | 0.67 | 0.81 | 0.82
Recall | 0.62 | 0.95 | 0.79 | 0.77
F1-score | 0.75 | 0.79 | 0.77 | 0.77
Accuracy: 0.77
Table 34 Result of SVM using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: SVM.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.83 | 0.76 | 0.79 | 0.79
Recall | 0.79 | 0.79 | 0.79 | 0.79
F1-score | 0.81 | 0.77 | 0.79 | 0.79
Accuracy: 0.79
Table 35 Result of NB using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: Naïve Bayes.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.90 | 0.74 | 0.82 | 0.83
Recall | 0.75 | 0.90 | 0.82 | 0.82
F1-score | 0.82 | 0.81 | 0.82 | 0.82
Accuracy: 0.82
Table 36 Result of PNN using forward feature selection with ABC-UBC and pre-processing phase
Model: feature selection with ABC-UBC, results with pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: PNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.86 | 0.65 | 0.76 | 0.77
Recall | 0.62 | 0.87 | 0.75 | 0.74
F1-score | 0.72 | 0.75 | 0.74 | 0.73
Accuracy: 0.74 (training 0.74, test 0.70)
Table 37 Result of KNN using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: KNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.94 | 0.70 | 0.82 | 0.70
Recall | 0.67 | 0.95 | 0.81 | 0.95
F1-score | 0.78 | 0.80 | 0.79 | 0.80
Accuracy: 0.79 (training 0.79, test 0.793)
Table 38 Result of SVM using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: SVM.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.81 | 0.96 | 0.75 | 0.75
Recall | 0.71 | 0.79 | 0.75 | 0.75
F1-score | 0.76 | 0.74 | 0.75 | 0.75
Accuracy: 0.75
Table 39 Result of NB using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 10. Classifier: Naïve Bayes.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.90 | 0.74 | 0.82 | 0.83
Recall | 0.75 | 0.90 | 0.82 | 0.82
F1-score | 0.82 | 0.81 | 0.82 | 0.82
Accuracy: 0.82 (training 0.82, test 0.81)
Table 40 Result of PNN using forward feature selection with ABC-UBC without pre-processing phase
Model: feature selection with ABC-UBC, results without pre-processing. Dataset: Jordan dialect. ABC-UBC Fno: 6. Classifier: PNN.
Metric | Label 1 | Label 2 | Macro avg | Weighted avg
Precision | 0.91 | 0.67 | 0.79 | 0.80
Recall | 0.62 | 0.92 | 0.77 | 0.76
F1-score | 0.74 | 0.77 | 0.76 | 0.76
Accuracy: 0.76
Fig. 20 Compared prediction accuracy for the four tests using Jordanian dialect dataset. Panels: Arabic text classifiers with pre-processing; Arabic text classifiers without pre-processing; F.F.S with ABC-UBC with pre-processing phase; F.F.S with ABC-UBC without pre-processing phase. Each panel plots precision, recall, F1-score, and accuracy (scale 0 to 1) for SVM, NB, PNN, and KNN.
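Table 42 (continued)
Model | Optimization algorithm | Classifier | Precision | Recall | F1-score | Accuracy
Arabic text using forward feature selection with ABC-UBC and pre-processing phase | Modified ABC-UBC | SVM | 0.79 | 0.79 | 0.79 | 0.79
Arabic text using forward feature selection with ABC-UBC and pre-processing phase | Modified ABC-UBC | NB | 0.82 | 0.82 | 0.82 | 0.82
Arabic text using forward feature selection with ABC-UBC and pre-processing phase | Modified ABC-UBC | PNN | 0.76 | 0.75 | 0.74 | 0.74
Arabic text using forward feature selection with ABC-UBC without pre-processing phase | Modified ABC-UBC | KNN | 0.82 | 0.81 | 0.79 | 0.79
Arabic text using forward feature selection with ABC-UBC without pre-processing phase | Modified ABC-UBC | SVM | 0.75 | 0.75 | 0.75 | 0.75
Arabic text using forward feature selection with ABC-UBC without pre-processing phase | Modified ABC-UBC | NB | 0.82 | 0.82 | 0.82 | 0.82
Arabic text using forward feature selection with ABC-UBC without pre-processing phase | Modified ABC-UBC | PNN | 0.74 | 0.72 | 0.72 | 0.76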
Fig. 21 Compared prediction accuracy for the four tests using Algerian dialect dataset. Panels: Arabic text classifiers with pre-processing; Arabic text classifiers without pre-processing; F.F.S with ABC-UBC with pre-processing phase; F.F.S with ABC-UBC without pre-processing phase. Each panel plots precision, recall, F1-score, and accuracy (scale 0 to 1) for SVM, NB, PNN, and KNN.
5 Conclusion
This paper examined the extent to which the modified algorithm influences the
selection of optimal features in Jordanian text classifiers, and the effect of
those features. The proposed modified ABC-UBC selects a minimum number of features,
picking out the optimal features from the words for the classification task. The
first test was carried out using the Jordanian dialect dataset. The comparison of
performance measures is shown in Table 40 for four tests on Jordanian text
classifiers: with the pre-processing phase, without the pre-processing phase, using
forward feature selection with ABC-UBC with the pre-processing phase, and using
forward feature selection with ABC-UBC without the pre-processing phase. When the
optimized features are given to the classification task, accuracy reaches up to
99%, and the precision, recall, and F1-score range from 95% to 99%. After testing
the classification algorithms, we compared prediction accuracy across the four
tests for Support Vector Machine (SVM), K-Nearest Neighbors (KNeighborsClassifier,
KNN), Naive Bayes (NB), and Probabilistic Neural Network (PNN), as shown in Fig. 5.
The best results of KNN, NB, and PNN reach an accuracy of up to 99.9%. The second
test was carried out using the Algerian dialect dataset. The comparison of
performance measures is shown in Table 41 for the same four tests on Algerian text
classifiers. Across the four tests, this model gives accuracy up to 82% (for F1
score).
A comparison between the contents of the Jordanian dialect dataset and the
Algerian dialect dataset shows the following.
The text size in the Jordanian dialect dataset does not exceed twenty words per row
in the database, while each text in the Algerian dialect dataset is a long
paragraph of more than 100 words per row. Through experience, the following was
observed: the accuracy of classification is affected by the number of words. As the
number of words decreases, the accuracy of classification increases.
In the future, the objective is to apply the proposed supervised approach to Arabic
and its dialects so that it can be compared with other methods after testing on
more Arabic datasets. The method will be extended to different tasks, such as spam
detection, to achieve excellent results for the Arabic text classification system.
References
1. Proudfoot, D. (2020). Rethinking Turing’s test and the philosophical implications. Minds and
Machines, 1–26.
2. Janani, R., & Vijayarani, S. (2020). Automatic text classification using machine learning and
optimization algorithms. Soft Computing, 1–17.
3. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models. Information Processing & Management, 57(1), 102121.
4. Karaboga, D., Gorkemli, B., Ozturk, C., & Karaboga, N. (2014). A comprehensive survey:
Artificial bee colony (ABC) algorithm and applications. Artificial Intelligence Review, 42(1),
21–57.
5. Jiang, D., Yue, X., Li, K., Wang, S., & Guo, Z. (2015). Elite opposition-based artificial bee
colony algorithm for global optimization. International Journal of Engineering, 28(9), 1268–
1275.
6. Alzaqebah, A., Smadi, B., & Hammo, B. H. (2020, April). Arabic sentiment analysis based on
salp swarm algorithm with S-shaped transfer functions. In 2020 11th International
Conference on Information and Communication Systems (ICICS) (pp. 179–184). IEEE.
7. Guellil, I., Adeel, A., Azouaou, F., Benali, F., Hachani, A. E., Dashtipour, K., ... & Hussain,
A. (2021). A semi-supervised approach for sentiment analysis of arab (ic+ izi) messages:
Application to the algerian dialect. SN Computer Science, 2(2), 1–18.
8. Thirumoorthy, K., & Muneeswaran, K. (2020). Optimal feature subset selection using hybrid
binary Jaya optimization algorithm for text classification. Sādhanā, 45(1), 1–13.
9. Chantar, H., Mafarja, M., Alsawalqah, H., Heidari, A. A., Aljarah, I., & Faris, H. (2020).
Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text
classification. Neural Computing and Applications, 32(16), 12201–12220.
10. Zheng, W., & Jin, M. (2020). Comparing multiple categories of feature selection methods for
text classification. Digital Scholarship in the Humanities, 35(1), 208–224.
11. Hussein, O., Sfar, H., Mitrović, J., & Granitzer, M. (2020, December). NLP_Passau at
SemEval-2020 Task 12: Multilingual neural network for offensive language detection in
English, Danish and Turkish. In Proceedings of the Fourteenth Workshop on Semantic
Evaluation (pp. 2090–2097).
12. Pan, Y., & Liang, M. (2020, June). Chinese text sentiment analysis based on BI-GRU and
self-attention. In 2020 IEEE 4th Information Technology, Networking, Electronic and
Automation Control Conference (ITNEC) (vol. 1, pp. 1983–1988). IEEE.
13. Rachid, B. A., Azza, H., & Ghezala, H. H. B. (2020, July). Classification of cyberbullying
text in Arabic. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1–
7). IEEE.
14. Guo, Z., Shi, J., Xiong, X., Xia, X., & Liu, X. (2019). Chaotic artificial bee colony with elite
opposition-based learning. International Journal of Computational Science and Engineering,
18(4), 383–390.
15. Almani, N., & Tang, L. H. (2020, March). Deep attention-based review level sentiment
analysis for Arabic reviews. In 2020 6th Conference on Data Science and Machine Learning
Applications (CDMA) (pp. 47–53). IEEE.
16. Hanbay, K. (2021). A new standard error based artificial bee colony algorithm and its
applications in feature selection. Journal of King Saud University-Computer and Information
Sciences.
17. Chaudhuri, A., & Sahu, T. P. (2021). Feature weighting for naïve Bayes using multi objective
artificial bee colony algorithm. International Journal of Computational Science and
Engineering, 24(1), 74–88.
18. Obeid, O., Zalmout, N., Khalifa, S., Taji, D., Oudah, M., Alhafni, B., ... & Habash, N. (2020,
May). CAMeL tools: An open source python toolkit for Arabic natural language processing.
In Proceedings of the 12th language resources and evaluation conference (pp. 7022–7032).
19. Ayedh, A., Tan, G., Alwesabi, K., & Rajeh, H. (2016). The effect of preprocessing on arabic
document categorization. Algorithms, 9(2), 27.
20. Chen, P. H. (2020). Essential elements of natural language processing: What the radiologist
should know. Academic radiology, 27(1), 6–12.
21. Vijayaraghavan, S., & Basu, D. (2020). Sentiment analysis in drug reviews using supervised
machine learning algorithms. arXiv preprint arXiv:2003.11643.
22. Karaboga, D. (2005). An idea based on honey bee swarm for numerical optimization (vol.
200, pp. 1–10). Technical report-tr06, Erciyes university, engineering faculty, computer
engineering department.
23. Ghambari, S., & Rahati, A. (2018). An improved artificial bee colony algorithm and its
application to reliability optimization problems. Applied Soft Computing, 62, 736–767.
24. Xiang, Z., Xiang, C., Li, T., & Guo, Y. (2020). A self-adapting hierarchical actions and
structures joint optimization framework for automatic design of robotic and animation
skeletons. Soft Computing, 1–14.
25. Sharma, A., Sharma, A., Choudhary, S., Pachauri, R. K., Shrivastava, A., & Kumar, D. A.
(2020). Review on artificial bee colony and it’s engineering applications. Journal of Critical
Reviews.
26. Li, Y. (2020). Comparison of various multi-armed bandit algorithms (E-greedy, Thompson
sampling and UCB-) to standard A/B testing.
27. Hijazi, M., Zeki, A., & Ismail, A. (2021). Arabic text classification using hybrid feature
selection method using chi-square binary artificial bee colony algorithm. Computer Science,
16(1), 213–228.
28. Zhang, X., Fan, M., Wang, D., Zhou, P., & Tao, D. (2020). Top-k feature selection
framework using robust 0–1 integer programming. IEEE Transactions on Neural Networks
and Learning Systems.
29. Janani, R., & Vijayarani, S. (2020). Automatic text classification using machine learning and
optimization algorithms. Soft Computing, 1–17.
30. Dhar, A., Mukherjee, H., Dash, N. S., & Roy, K. (2021). Text categorization: Past and
present. Artificial Intelligence Review, 54(4), 3007–3054.
31. Sheykhmousa, M., Mahdianpari, M., Ghanbari, H., Mohammadimanesh, F., Ghamisi, P., &
Homayouni, S. (2020). Support vector machine vs. random forest for remote sensing image
classification: A meta-analysis and systematic review. IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing.
32. Saadatfar, H., Khosravi, S., Joloudari, J. H., Mosavi, A., & Shamshirband, S. (2020). A new
K-nearest neighbors classifier for big data based on efficient data pruning. Mathematics, 8(2),
286.
33. Ruan, S., Li, H., Li, C., & Song, K. (2020). Class-specific deep feature weighting for Naïve
Bayes text classifiers. IEEE Access, 8, 20151–20159.
34. Oh, S. K., Pedrycz, W., & Park, B. J. (2003). Polynomial neural networks architecture:
Analysis and design. Computers & Electrical Engineering, 29(6), 703–725.