Classification Applications With Deep Learning and Machine Learning Technologies

Studies in Computational Intelligence 1071
Laith Abualigah Editor
Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes
new developments and advances in the various areas of computational
intelligence—quickly and with a high quality. The intent is to cover the theory,
applications, and design methods of computational intelligence, as embedded in
the fields of engineering, computer science, physics and life sciences, as well as
the methodologies behind them. The series contains monographs, lecture notes and
edited volumes in computational intelligence spanning the areas of neural networks,
connectionist systems, genetic algorithms, evolutionary computation, artificial
intelligence, cellular automata, self-organizing systems, soft computing, fuzzy
systems, and hybrid intelligent systems. Of particular value to both the contributors
and the readership are the short publication timeframe and the world-wide
distribution, which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
Editor
Laith Abualigah
Hourani Center for Applied Scientific
Research
Al-Ahliyya Amman University
Amman, Jordan
Faculty of Information Technology
Middle East University
Amman, Jordan
School of Computer Sciences
Universiti Sains Malaysia
George Town, Pulau Pinang, Malaysia
ISSN 1860-949X ISSN 1860-9503 (electronic)

Studies in Computational Intelligence
ISBN 978-3-031-17575-6 ISBN 978-3-031-17576-3 (eBook)
https://doi.org/10.1007/978-3-031-17576-3
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Deep learning and machine learning classification approaches have grown considerably and now
address many real-world problems, such as Artocarpus classification, rambutan classification,
mango variety classification, salak classification, image-processing identification of sapodilla
with transfer learning techniques, classification of the jackfruit species Artocarpus integer and
Artocarpus heterophyllus, markisa/passion fruit classification, big data classification, and
Arabic text classification. Deep learning and machine learning have become indispensable
technologies in this era of artificial intelligence. These techniques are applied in data
analysis, text mining, classification problems, computer vision, image analysis, pattern
recognition, medicine, etc. Data flows continuously, so it is impossible to manage and analyze
it manually. The outcome depends on the processing of high-dimensional data, most of which is
irregular and unordered and comes in various forms such as text, images, video, audio, and
graphics. Fruit image recognition systems are used to classify different types of fruit and to
differentiate variants of a single fruit type. Rambutan is an exotic fruit found mainly in the
Southeast Asian region and is prevalent in Malaysia. It comes in different varieties, or
cultivars, which look alike to the naked eye. Hence, an image recognition system powered by deep
learning methods can be applied to classify rambutan cultivars accurately. Currently, mango
cultivars are sorted and classified manually by observing features such as size, skin color,
shape, sweetness, and flesh color. Experienced taxonomy experts can generally identify the
different cultivars, but most people find it difficult to distinguish these mangoes. Advances in
science and technology now offer tools that make it easier for people to distinguish the
cultivars. The solution proposed here is computer vision, in which artificial intelligence trains
computers to interpret and understand visual content such as images and video. Deep learning,
also known as deep neural networks or deep neural learning, processes data and creates patterns
for decision making by imitating the human brain. It uses neuron nodes that are linked together
within a hierarchical neural network to analyze the incoming data. Image recognition is one of
the most popular deep learning applications that help many
fields, especially fruit agriculture, to identify the class of a fruit. This book intends to
bring together researchers and developers from academia and industry worldwide who work in the
broad areas of deep learning and machine learning, for a community-wide discussion of ideas that
will influence and foster continued research in this field for the betterment of humanity. The
book highlights technology-based solutions that make the classification process more efficient,
and its chapters provide deep insight into classification techniques.
Amman, Jordan Laith Abualigah

Contents
Artocarpus Classification Technique Using Deep Learning Based Convolutional Neural Network
Lee Zhi Pen, Kong Xian Xian, Ching Fum Yew, Ong Swee Hau,
Putra Sumari, Laith Abualigah, Absalom E. Ezugwu,
Mohammad Al Shinwan, Faiza Gul, and Ala Mughaid

Rambutan Image Classification Using Various Deep Learning Approaches
Nur Alia Anuar, Loganathan Muniandy, Khairul Adli Bin Jaafar,
Yi Lim, Al Lami Lamyaa Sabeeh, Putra Sumari,
Laith Abualigah, Mohamed Abd Elaziz, Anas Ratib Alsoud,
and Ahmad MohdAziz Hussein

Mango Varieties Classification-Based Optimization with Transfer Learning and Deep Learning Approaches
Chen Ke, Ng Tee Weng, Yifan Yang, Zhang Ming Yang,
Putra Sumari, Laith Abualigah, Salah Kamel, Mohsen Ahmadi,
Mohammed A. A. Al-Qaness, Agostino Forestiero,
and Anas Ratib Alsoud

Salak Image Classification Method Based Deep Learning Technique Using Two Transfer Learning Models
Lau Wei Theng, Moo Mei San, Ong Zhi Cheng, Wong Wei Shen,
Putra Sumari, Laith Abualigah, Raed Abu Zitar, Davut Izci,
Mehdi Jamei, and Shadi Al-Zu’bi

Image Processing Identification for Sapodilla Using Convolution Neural Network (CNN) and Transfer Learning Techniques
Ali Khazalah, Boppana Prasanthi, Dheniesh Thomas,
Nishathinee Vello, Suhanya Jayaprakasam, Putra Sumari,
Laith Abualigah, Absalom E. Ezugwu, Essam Said Hanandeh,
and Nima Khodadadi

Comparison of Pre-trained and Convolutional Neural Networks for Classification of Jackfruit Artocarpus integer and Artocarpus heterophyllus
Song-Quan Ong, Gomesh Nair, Ragheed Duraid Al Dabbagh,
Nur Farihah Aminuddin, Putra Sumari, Laith Abualigah,
Heming Jia, Shubham Mahajan, Abdelazim G. Hussien,
and Diaa Salama Abd Elminaam

Markisa/Passion Fruit Image Classification Based Improved Deep Learning Approach Using Transfer Learning
Ahmed Abdo, Chin Jun Hong, Lee Meng Kuan,
Maisarah Mohamed Pauzi, Putra Sumari, Laith Abualigah,
Raed Abu Zitar, and Diego Oliva

Enhanced MapReduce Performance for the Distributed Parallel Computing: Application of the Big Data
Nathier Milhem, Laith Abualigah, Mohammad H. Nadimi-Shahraki,
Heming Jia, Absalom E. Ezugwu, and Abdelazim G. Hussien

A Novel Big Data Classification Technique for Healthcare Application Using Support Vector Machine, Random Forest and J48
Hitham Al-Manaseer, Laith Abualigah, Anas Ratib Alsoud,
Raed Abu Zitar, Absalom E. Ezugwu, and Heming Jia

Comparative Study on Arabic Text Classification: Challenges and Opportunities
Mohammed K. Bani Melhem, Laith Abualigah, Raed Abu Zitar,
Abdelazim G. Hussien, and Diego Oliva

Pedestrian Speed Prediction Using Feed Forward Neural Network
Abubakar Dayyabu, Hashim Mohammed Alhassan, and Laith Abualigah

Arabic Text Classification Using Modified Artificial Bee Colony Algorithm for Sentiment Analysis: The Case of Jordanian Dialect
Abdallah Habeeb, Mohammed A. Otair, Laith Abualigah,
Anas Ratib Alsoud, Diaa Salama Abd Elminaam, Raed Abu Zitar,
Absalom E. Ezugwu, and Heming Jia

Artocarpus Classification Technique
Using Deep Learning Based
Convolutional Neural Network
Lee Zhi Pen, Kong Xian Xian, Ching Fum Yew, Ong Swee Hau,
Putra Sumari, Laith Abualigah, Absalom E. Ezugwu,
Mohammad Al Shinwan, Faiza Gul, and Ala Mughaid
Abstract There are many species of Artocarpus fruits in Malaysia, which have
different market potentials. This study classifies four species of Artocarpus fruits using
a deep learning approach, namely a Convolutional Neural Network (CNN). A newly
proposed CNN model is compared with pre-trained models, i.e., VGG-16, ResNet50,
and Xception. The effects of several variables, i.e., hidden layers, perceptrons, filter
number, optimizers, and learning rate, on the proposed model are also investigated.
The best performing model in this study is the newly proposed model with 2
CNN layers (12, 96 filters) and 6 dense layers with 147 perceptrons, achieving an
accuracy of 87%.
Keywords Deep learning · Transfer learning · Convolutional neural network · Fruit classification · Artocarpus
L. Z. Pen · K. Xian Xian · C. F. Yew · O. S. Hau · P. Sumari · L. Abualigah (corresponding author)
School of Computer Sciences, Universiti Sains Malaysia, 11800 George Town, Pulau Pinang,
Malaysia
e-mail: Aligah.2020@gmail.com
L. Abualigah
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman 11328,
Jordan
Faculty of Information Technology, Middle East University, Amman 11831, Jordan
A. E. Ezugwu
School of Computer Science, University of KwaZulu-Natal, Pietermaritzburg Campus,
Pietermaritzburg 3201, South Africa
M. A. Shinwan
Faculty of Information Technology, Applied Science Private University, Amman 11931, Jordan
F. Gul
Department of Electrical Engineering, Air University, Aerospace and Aviation Campus, Kamra,
Attock 43600, Pakistan
A. Mughaid
Department of Information Technology, Faculty of Prince Al-Hussien Bin Abdullah for IT, The
Hashemite University, PO Box 330127, Zarqa 13133, Jordan
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
Technologies, Studies in Computational Intelligence 1071,
https://doi.org/10.1007/978-3-031-17576-3_1
1 Introduction
Agriculture faces the challenge of labour costs, and automated agricultural systems
are in demand to overcome this challenge [1]. Computer vision technology has
contributed to automation, for example in weed removal robots that use real-time
weed recognition to remove weeds from the crop field, reducing both labour
and chemical costs [2, 3]. Fruit harvesting can harness this technology to enhance
the industry’s profitability, and fruit recognition is a crucial part of the solution [4].
Many studies have addressed fruit recognition with machine learning approaches.
However, only a few have addressed Malaysian fruits.
Previous work on fruit recognition or classification has used both conventional
machine learning approaches and deep learning approaches. By extracting fruit color
and fruit shape as features via specialized computing modules, a fruit recognition
system using KNN achieved accuracies ranging from 30 to 90%, even though
the fruit types were highly distinct from each other [5]. This wide range of accuracies
(30–90%) raises doubts about the system's capabilities, and optimizing the feature
extraction modules for various fruit types would be time-consuming and costly.
Another study using conventional machine learning approaches was done on the
Supermarket Produce data set, which is very well documented and has minimal
noise. Although it achieved high accuracy with a Support Vector Machine model,
the generalization of such a model to a complicated, real harvesting environment
remains questionable. A few studies using deep learning approaches have also
obtained high accuracy (>90%) on well-documented datasets, while researchers are
still investigating the effects of noise on the generalization of neural networks.
This study uses a deep learning approach to recognize four species of Artocarpus
fruit in Malaysia: breadfruit (Artocarpus altilis), Keledang (Artocarpus lanceifolius),
Nangka (Artocarpus heterophyllus), and Tarap (Artocarpus odoratissimus).
2 Proposed Deep Learning Approach
2.1 Proposed Convolutional Neural Network (CNN) Architecture
Figure 1 shows our proposed CNN architecture. It consists of two convolution
layers, two max pooling layers, one flatten layer, six dense layers, and one output
layer [6, 7]. The hyperparameters are shown in Fig. 2. The first convolution layer has
12 filters, a kernel size of 3, and the ReLU activation function, and is followed by a
max pooling layer of size 2. Its output is fed into the second convolution layer, which
has 96 filters, a kernel size of 3, and the ReLU activation function, followed by a
second max pooling layer of size 2. The convolution layers summarize the presence
of detected features in the input image, and the max pooling layers reduce the
dimensions of the feature maps so that fewer parameters need to be trained. The
output is then flattened by a flatten layer and passed through six dense layers with
147 perceptrons each. These dense layers combine the extracted features and help the
output layer generate a correct output. Before the output layer, a dropout layer with
a rate of 0.3 is applied. Lastly, the output layer uses the softmax activation function
to produce the four class labels: breadfruit (Artocarpus altilis), Keledang (Artocarpus
lanceifolius), Nangka (Artocarpus heterophyllus), and Tarap (Artocarpus odoratissimus).
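The layer stack described above can be traced by hand to see where the trainable parameters sit. The following is a minimal sketch and not the authors' code. It assumes a 224 × 224 × 3 input (Sect. 2.3), "same" padding in both convolutions (the chapter does not state the padding), 2 × 2 max pooling, and 147 perceptrons in each of the six dense layers. The helper names are hypothetical.

```python
# Hypothetical helpers for tracing the proposed CNN; not the authors' code.
# Assumptions: 224x224x3 input, 3x3 kernels with "same" padding, 2x2 max pooling.

def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a 2D convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


def trace_proposed_cnn(side=224, channels=3, filters=(12, 96),
                       dense_units=147, dense_layers=6, classes=4):
    """Return a list of (layer description, trainable parameter count)."""
    layers = []
    for f in filters:
        layers.append((f"conv3x3_{f}", conv_params(3, channels, f)))
        channels = f
        side //= 2  # "same" padding keeps the size; 2x2 pooling halves it
        layers.append((f"maxpool -> {side}x{side}x{channels}", 0))
    units = side * side * channels
    layers.append((f"flatten -> {units}", 0))
    for i in range(dense_layers):
        layers.append((f"dense_{i + 1}", dense_params(units, dense_units)))
        units = dense_units
    layers.append(("dropout_0.3", 0))  # dropout has no trainable parameters
    layers.append(("softmax_output", dense_params(units, classes)))
    return layers


summary = trace_proposed_cnn()
for name, params in summary:
    print(f"{name:28s} {params:>12,d}")
print("total parameters:", sum(p for _, p in summary))
```

Under these assumptions, almost all parameters belong to the first dense layer, which connects the flattened feature maps to 147 perceptrons. This is one reason the flatten size and the perceptron count have a large effect on training cost.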
Fig. 1 The proposed CNN architecture
Fig. 2 The hyperparameters of the proposed CNN architecture

2.2 Transfer Learning Model for Artocarpus Classification
Transfer learning is a method of transferring what has been learnt in a
previous application to a new application, which in our case is Artocarpus
classification [8–12]. Models that have already been trained for a different application
are called pre-trained models. For our study, we selected three pre-trained models:
VGG16, ResNet50, and Xception. Other optimization methods, such as those given
in [13–18], can also be used to optimize such problems.
VGG16
VGG16 was proposed by Karen Simonyan and Andrew Zisserman in a paper
published at the International Conference on Learning Representations in 2015
[19]. This model achieved 90.1% top-5 accuracy on the ImageNet validation dataset;
ImageNet as a whole consists of over 14 million images. The architecture of VGG16
is shown in Fig. 3.
A number of different configurations and fine-tuning settings were tested to identify
the best performing model for our Artocarpus image classification. For VGG16, the
highest accuracy achieved was 81.50%, obtained by freezing the whole model except
the top layer and adding a new classifier with 2 dense layers and 4096 perceptrons,
as shown in Fig. 4. Figure 5 shows the VGG16 transfer model with all layers frozen
except the top layer and a new classifier with 2 dense layers and 4096 perceptrons.
Fig. 3 VGG16 architecture

Fig. 4 Performance of VGG16 transfer model on Artocarpus image classification
ResNet50
ResNet50 is a variant of the residual network that consists of 48 convolution layers,
1 max pooling layer, and 1 average pooling layer. The residual architecture makes it
possible to train very deep networks (hundreds to thousands of layers) while
maintaining high performance. Prior to ResNet50, no model had achieved the same
feat, especially at such depths. ResNet50 achieved 92.1% top-5 accuracy on the
ImageNet validation dataset. Figure 6 shows the ResNet50 architecture.
For our Artocarpus image classification using ResNet50, the highest accuracy
was achieved by freezing all layers except the top layer and adding a new classifier
with 2 dense layers. The first dense layer uses 1024 perceptrons and the second uses
4096 perceptrons. This configuration achieved 86% accuracy on our Artocarpus
image classification. Figure 7 shows the performance of the ResNet50 transfer model
on Artocarpus image classification. Figure 8 shows the ResNet50 transfer model with
all layers frozen except the top layer and a new classifier with 2 dense layers of 1024
and then 4096 perceptrons.
Xception
Xception is a deep convolutional neural network developed by François
Chollet of Google Inc. Figure 9 shows the Xception architecture. The name stands
for Extreme Inception: the model is based on the Inception architecture, but its
Inception modules are replaced with depthwise separable convolutions. Xception
achieved 94.5% top-5 accuracy on the ImageNet validation dataset.
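To illustrate why depthwise separable convolutions are attractive, the sketch below compares the weight count (biases ignored) of a standard convolution with that of a depthwise convolution followed by a 1 × 1 pointwise convolution. The layer sizes are illustrative assumptions and are not taken from the Xception architecture; the function names are hypothetical.

```python
# Illustrative weight counts only; sizes are assumptions, not Xception's layers.

def standard_conv_weights(kernel, in_ch, out_ch):
    """Every output channel has its own k x k filter over all input channels."""
    return kernel * kernel * in_ch * out_ch


def separable_conv_weights(kernel, in_ch, out_ch):
    """Depthwise k x k filtering per channel, then 1 x 1 channel mixing."""
    depthwise = kernel * kernel * in_ch  # one k x k filter per input channel
    pointwise = in_ch * out_ch           # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


k, c_in, c_out = 3, 128, 256  # illustrative sizes
std = standard_conv_weights(k, c_in, c_out)
sep = separable_conv_weights(k, c_in, c_out)
print(f"standard: {std:,d}  separable: {sep:,d}  ratio: {std / sep:.1f}x")
```

For these sizes the separable form needs roughly one ninth of the weights, which is the kind of saving that lets a model use more channels or layers for the same parameter budget.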
Figure 10 shows the performance of the Xception transfer model on Artocarpus
image classification. Figure 11 shows the Xception transfer model with all layers
frozen except the top layer and a new classifier with 3 dense layers of 4096
perceptrons each. For Artocarpus image classification using Xception, the best
performing model achieved only 66.50% accuracy. It was obtained with this freeze-all
configuration and the new classifier with 3 dense layers, each with 4096 perceptrons.
Fig. 5 VGG16 transfer model with freeze all except top layer, new classifier with 2 dense layers
and 4096 perceptron
Fig. 6 ResNet50 architecture
Fig. 7 Performance of ResNet50 transfer model on Artocarpus image classification
Fig. 8 ResNet50 transfer model with freeze all except top layer, new classifier with 2 dense layers
with 1024 perceptron followed by 4096 perceptron
Fig. 9 Xception architecture
Fig. 10 Performance of Xception transfer model on Artocarpus image classification
Summary on Transfer Learning Models

Figure 12 shows the performance of VGG16, ResNet50, and Xception on Artocarpus
image classification. First, all three models perform poorly when the original
pre-trained model and original classifier are used with all layers frozen. This is
probably due to the lack of Artocarpus images in the ImageNet dataset on which they
were pre-trained. Secondly, the VGG16 and Xception models tend to perform better
with a higher perceptron count. This effect is less pronounced for ResNet50, since all
of its configurations perform reasonably well. Another characteristic seen in all three
models is that at a lower perceptron count (150), increasing the number of dense
layers reduces the accuracy, whereas at a higher perceptron count (4096), increasing
the number of dense layers increases the accuracy. In addition, ResNet50 is the most
suitable and best performing transfer model for Artocarpus image classification,
because it maintains reasonably good accuracy (>70%) across all configurations
tested. VGG16 comes second, with around half of its configurations performing
reasonably well, whereas Xception does not exceed 70% in any configuration tested.
We therefore conclude that the Xception model is not suitable for Artocarpus image
classification. However, it should be noted that all three models might still achieve
higher accuracy if the number of epochs were increased. To conclude, ResNet50
is the best transfer model for Artocarpus image classification compared with
VGG16 and Xception.
Fig. 11 Xception transfer model with freeze all except top layer, new classifier with 3 dense layers
each with 4096 perceptrons
Fig. 12 Performance of VGG16, ResNet50 and Xception on Artocarpus image classification

Fig. 13 Sample images of Artocarpus dataset
2.3 Dataset
The Artocarpus genus consists of approximately 50 species of trees, which are mainly
restricted to Southeast Asia [20]. For our study, we focused on 4 species with edible
fruits, namely (1) Artocarpus altilis, (2) Artocarpus lanceifolius, (3) Artocarpus
heterophyllus, and (4) Artocarpus odoratissimus. The dataset consists of a total of 1000
images, 250 for each species. The images are resized to 224 × 224 pixels. The dataset
was then split into an 80% training set and a 20% test set. Sample images can be
seen in Fig. 13.
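An 80/20 split over four balanced classes can be sketched as follows. This is an illustration only: it assumes a stratified split with a fixed random seed (the chapter does not say how the split was drawn), and the file names are hypothetical placeholders.

```python
import random

# Species labels from the chapter; file names below are hypothetical.
SPECIES = ["altilis", "lanceifolius", "heterophyllus", "odoratissimus"]


def stratified_split(items_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class separately and cut it at train_fraction."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test


# 250 placeholder images per species, 1000 in total.
dataset = {s: [f"{s}_{i:03d}.jpg" for i in range(250)] for s in SPECIES}
train_set, test_set = stratified_split(dataset)
print(len(train_set), "training images,", len(test_set), "test images")
```

Splitting per class keeps the 250-image balance in both subsets, so each species contributes 200 training and 50 test images.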
2.4 Augmentation
90° image rotation was used to augment the images to increase accuracy and train
the model better. The code and sample images can be seen in Fig. 14.
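The rotation step can be illustrated without any image library. The sketch below is not the code from Fig. 14; it assumes a clockwise rotation (the chapter does not state the direction) and represents an image as a list of pixel rows. The function names are hypothetical.

```python
# Illustration of 90-degree rotation augmentation; not the code from Fig. 14.
# Assumption: clockwise rotation; an image is a list of rows of pixel values.

def rotate90_clockwise(image):
    """Rotate an image stored as a list of pixel rows by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(images):
    """Return the original images followed by their rotated copies."""
    return images + [rotate90_clockwise(img) for img in images]


tiny = [[1, 2, 3],
        [4, 5, 6]]
print(rotate90_clockwise(tiny))

originals = [tiny] * 1000
print(len(augment(originals)))
```

Adding one rotated copy per image doubles the dataset, which matches the 2000 images used in the experimental setup.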
3 Performance Result
3.1 Experimental Setup
Fig. 14 Augmented images using 90° rotation
The original dataset consists of 1000 images in 4 classes: breadfruit (Artocarpus
altilis), Keledang (Artocarpus lanceifolius), Nangka (Artocarpus heterophyllus),
and Tarap (Artocarpus odoratissimus). Each class has 250 images, which have
already been preprocessed to 224 pixels × 224 pixels × 3 channels. We use the Python
programming language with the Keras and TensorFlow libraries in a Jupyter notebook
to build our program. First, we load all the images and perform data augmentation
by rotating every image by 90°. Then, we feed all 2000 images into the Keras
library function “image_dataset_from_directory()” to convert the data to the format
supported by the TensorFlow library. The dataset is further split into an 80% training
set and a 20% test set. Next, we perform hyperparameter optimization on the number
of hidden layers (dense layers and CNN layers), number of perceptrons, number of
filters, optimizer, number of epochs, and learning rate, in that order. To reduce the
time needed to try different combinations of hyperparameters, we tune each
hyperparameter separately, keeping all other hyperparameters fixed while one is
tuned. Once a hyperparameter reaches its optimum, we proceed to the next one. The
hyperparameter optimization workflow and the hyperparameter values used are
given in Fig. 15 and Table 1.
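The one-at-a-time tuning procedure described in this section can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' workflow from Fig. 15: `toy_evaluate` is a made-up placeholder score that stands in for training the model and measuring accuracy, and the search space shown covers only two of the six hyperparameters.

```python
# Sketch of tuning one hyperparameter at a time while the others stay fixed.
# toy_evaluate is a placeholder score, NOT real model training.

def tune_one_at_a_time(search_space, defaults, evaluate):
    """Tune each hyperparameter in order, keeping the best value found so far."""
    best = dict(defaults)
    for name, candidates in search_space.items():
        scores = {}
        for value in candidates:
            trial = {**best, name: value}  # only `name` differs from `best`
            scores[value] = evaluate(trial)
        best[name] = max(scores, key=scores.get)
    return best


def toy_evaluate(config):
    """Hypothetical score that peaks at 147 units and a learning rate of 0.001."""
    return -abs(config["units"] - 147) / 147 - abs(config["lr"] - 0.001)


space = {
    "units": [9408, 4704, 2352, 1176, 588, 294, 147, 73],
    "lr": [0.1, 0.01, 0.001, 0.0001],
}
best = tune_one_at_a_time(space, {"units": 9408, "lr": 0.01}, toy_evaluate)
print(best)
```

With this scheme the number of training runs is the sum of the candidate counts (8 + 4 = 12 here) rather than their product (32), at the cost of ignoring interactions between hyperparameters.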
3.2 Performance of Proposed CNN Model
In this section, we discuss the effect of hidden layers, perceptrons, filter number,
optimizers, number of epochs, and learning rate on the performance of our model.
We then identify the best hyperparameters for our proposed CNN model and
compare its accuracy with the transfer learning performance of the VGG16,
ResNet50, and Xception models.
Fig. 15 The hyperparameter optimization workflow
3.2.1 Effect of Hidden Layers (Convolutional Layers and Dense Layers)
Hyperparameter tuning is first done on the hidden layers, which are the convolutional
layers and dense layers. The performance of the convolutional neural network was
found to be greatly affected by the number of hidden layers. Figure 16 shows the
accuracy of the model when different combinations of convolutional layers and dense
layers are used to build it. The convolutional layers are tested with 2, 3, 4, and 5
layers, while the dense layers are tested with 1, 2, 3, 4, 5, 6, and 7 layers. The
combinations range from 2 convolutional layers with 1 dense layer and 2 convolutional
layers with 2 dense layers up to 5 convolutional layers with 6 dense layers and 5
convolutional layers with 7 dense layers. The best result was obtained with 2
convolutional layers and 5 dense layers, giving an accuracy of 76%.
Table 1 The hyperparameters used in optimization and their values
No. | Hyperparameter | Hyperparameter explanation | Hidden layer | Number of perceptrons | Number of filters | Optimizer | Number of epochs | Learning rate
1 | Hyperparameter 1 | Effect of hidden layer (CNN layer and dense layer) | Tuning | Same as the number of perceptrons after flattening | 3 | loss = 'sparse_categorical_crossentropy', optimizer = 'adam' | 15 | 0.01 (default)
2 | Hyperparameter 2 | Effect of number of perceptrons | Optimum | Tuning | 3 | loss = 'sparse_categorical_crossentropy', optimizer = 'adam' | 15 | 0.01 (default)
3 | Hyperparameter 3 | Effect of filter number | Optimum | Optimum | Tuning | loss = 'sparse_categorical_crossentropy', optimizer = 'adam' | 15 | 0.01 (default)
4 | Hyperparameter 4 | Effect of optimizers | Optimum | Optimum | Optimum | Tuning | 15 | 0.01 (default)
5 | Hyperparameter 5 | Effect of number of epochs | Optimum | Optimum | Optimum | Optimum | Tuning | 0.01 (default)
6 | Hyperparameter 6 | Effect of learning rate | Optimum | Optimum | Optimum | Optimum | Optimum | Tuning
Fig. 16 Accuracy results from different combinations of convolutional layers and dense layers
3.2.2 Effect of Perceptrons
During the development of the CNN model, one of the hyperparameters tested is
the number of perceptrons. Using the best model obtained in Sect. 3.2.1, the number
of perceptrons in the dense layers is decreased to observe its effect on the model’s
accuracy. Initially, the number of perceptrons is 9408, matching the number of
values output by the flatten layer. This number is then divided by 2, 4, 8, 16, 32, 64,
and 128 in turn. Figure 17 shows that the accuracy of the model varies with the
number of perceptrons. The model achieved its highest accuracy, 81%, when the
number of perceptrons in the dense layers was reduced by a factor of 64, to 147
perceptrons.
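The numbers in this subsection can be checked with a few lines of arithmetic. The sketch below is an assumption-laden reconstruction, not the authors' code: it assumes "same" padding, two 2 × 2 pooling layers, and the 3 filters listed for this tuning stage in Table 1, which together reproduce the 9408 flattened values. Floor division is also an assumption, since 9408 / 128 is not an integer and the chapter does not say how it was rounded.

```python
# Reconstructing the perceptron counts of this subsection under stated assumptions.

def flatten_size(side=224, filters=3, pooling_layers=2):
    """Flattened length after the pooling layers, assuming "same" padding."""
    for _ in range(pooling_layers):
        side //= 2  # each 2x2 max pooling halves the width and height
    return side * side * filters


base = flatten_size()
# Floor division is an assumption; 9408 / 128 = 73.5 is not an integer.
candidates = {divisor: base // divisor for divisor in (1, 2, 4, 8, 16, 32, 64, 128)}
print(base)
print(candidates)
```

Under these assumptions the divisor 64 gives exactly the 147 perceptrons that the chapter reports as the best setting.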
3.2.3 Effect of Filter Number
The number of filters in the convolutional layers was tested with 3, 6, 12, 24, 48, 96,
and 192 filters. According to Fig. 18, increasing the number of filters from 3 to 192,
with the same number used in every convolutional layer, decreases the accuracy from
81 to 54%.
Different combinations of filter numbers in the convolutional layers are also tested,
for example, a first convolutional layer with 3 filters and a second convolutional layer
with 24 filters. The results are shown in Fig. 19. The highest accuracy, 85%, is obtained
when the first convolutional layer uses 12 filters and the second convolutional layer
uses 96 filters. Based on these results, using different filter numbers in the
convolutional layers achieves higher accuracy than using the same filter number in
all of them.
Fig. 17 Accuracy of model with different number of perceptrons
Fig. 18 Accuracy of the model when convolutional layers use same filter numbers
3.2.4 Effect of Optimizers
Different optimizers, namely Adam, Adagrad, RMSprop, SGD, Adadelta, Adamax,
Nadam, and Ftrl, are used to train the model, and their effect on performance is
compared. These optimizers minimize the loss of the neural network by updating the
weight parameters. According to Fig. 20, the best optimizer for the model is Adagrad,
which achieved an accuracy of 86.3%. The other optimizers achieved accuracies of
77% (Adam), 79% (RMSprop), 28% (SGD), 65% (Adadelta), 86% (Adamax),
82% (Nadam), and 37% (Ftrl).
Fig. 19 Accuracy of the model when convolutional layers use different filter numbers
Fig. 20 Accuracy of model using different types of optimizers
Based on Fig. 21, Adam reaches its own highest accuracy faster than the other
optimizers, achieving 78% at 7 epochs. The other optimizers reached their own
highest accuracy only after 13 epochs; Adagrad reached its highest accuracy at 14
epochs. Adamax and Adagrad had fairly consistent accuracy after 4 epochs. RMSprop
reached 71% accuracy after 1 epoch, but its accuracy was not consistent: over the
epochs run, its highest accuracy was 85% and its lowest was 60%, with a final
accuracy of 79%.
Fig. 21 Accuracy of different optimizers in each epoch
3.2.5 Effect of Learning Rate
Learning rate is one of the important hyperparameters in training the CNN
model. The learning rates tested in this project are 0.1, 0.01, 0.001, 0.0001,
0.00001, and 0.000001. The model reached its highest accuracy of 87% when
the learning rate is 0.001. Figure 22 shows that the accuracy improves from 23 to
87% as the learning rate decreases from 0.1 to 0.001. However, the accuracy drops to
42% when the learning rate is 0.0001, rises to 63% when the learning rate is 0.00001,
and decreases again when the learning rate is 0.000001. Therefore, it can be
concluded that 0.001 is the optimum learning rate for the CNN model. Based
on Fig. 23, the number of epochs may need to be adjusted for the other learning rates
to reach higher accuracy.
Fig. 22 Accuracy of the model with different learning rates

Fig. 23 Accuracy of the model with different learning rates in each epoch
3.3 Accuracy Comparison
The accuracies of the pre-trained and proposed models are shown in Table 2. The
model with the best performance is our proposed model, which has an accuracy of
87.00%. It is followed by ResNet50 (freeze all with new classifier, 1024 then 4096
perceptrons, 2 dense layers) and VGG-16 (freeze all with new classifier, 4096
perceptrons, 2 dense layers), with accuracies of 86.00% and 81.50% respectively.
These models have similar accuracy and did not improve even when we tried other
combinations of hyperparameters. This may be due to Bayes error in our dataset, that
is, images with almost the same features but different targets. This is plausible
because almost all of our images contain a large number of green pixels but carry
different labels. Such images are difficult to learn from, and the resulting Bayes error
is irreducible. Thus, our model may have achieved the optimum performance. The
pre-trained models run with the freeze-all configuration alone do not achieve high
accuracy, with accuracies ranging from 22.00 to 30.00%. This is because the
pre-trained models are complex and require more epochs to converge to their
optimum accuracy.
3.4 Model Performance Comparison
The accuracy of the pretrained models and the proposed model over 15 consecutive
epochs is shown in Fig. 24. In this figure, the proposed model has the highest accuracy
in the first epoch. Its accuracy then increases sharply and reaches its maximum at the
tenth epoch. After the tenth epoch, it stabilizes at the level of 80–87%. For ResNet50
(freeze all with new classifier, 1024 then 4096 perceptrons, 2 dense layers) and VGG-16
(freeze all with new classifier, 4096 perceptrons, 2 dense layers), the accuracy
increases gradually from the first epoch to the fifteenth epoch. However, the increase
is smaller than that of the proposed model. This means that our proposed model needs
fewer training epochs to reach its optimum accuracy, which is also higher than that of
these models. The other pretrained models do not improve significantly from the first
to the fifteenth epoch, although they still show an upward trend.
Table 2 Accuracy of the pre-trained and proposed models and their hyperparameters

Model | Hyperparameter | Accuracy (%)
VGG-16 | Freeze all | 30.00
VGG-16 | Freeze all with new classifier, 4096 perceptrons, 2 dense layers | 81.50
ResNet50 | Freeze all | 23.00
ResNet50 | Freeze all with new classifier, 1024 then 4096 perceptrons, 2 dense layers | 86.00
Xception | Freeze all | 22.00
Xception | Freeze all with new classifier, 4096 perceptrons, 3 dense layers | 66.50
Proposed model | 2 CNN layers (12, 96 filters) and 6 dense layers with 147 perceptrons | 87.00
Fig. 24 Accuracy of the pretrained model and proposed model in each epoch (x-axis: epoch, 1 to 15; y-axis: accuracy, 0 to 1; series: VGG-16 (freeze all), VGG-16 (freeze all with new classifier), ResNet50 (freeze all), ResNet50 (freeze all with new classifier), Xception (freeze all), Xception (freeze all with new classifier), Proposed model)
4 Conclusion
In conclusion, the best performing model is our proposed model, which achieves a
prediction accuracy of 87% with an architecture of 2 CNN layers (12, 96 filters) and
6 dense layers with 147 perceptrons. It also needs fewer training epochs than the
pretrained models to reach its optimum accuracy.
References
1. Araújo, S. O., Peres, R. S., Barata, J., Lidon, F., & Ramalho, J. C. (2021). Characterising the
agriculture 4.0 landscape—Emerging trends, challenges and opportunities. Agronomy, 11(4),
667.
2. Fennimore, S. A., Slaughter, D. C., Siemens, M. C., Leon, R. G., & Saber, M. N. (2016).
Technology for automation of weed control in specialty crops. Weed Technology, 30(4), 823–
837.
3. Jamei, M., Karbasi, M., Malik, A., Abualigah, L., Islam, A. R. M. T., & Yaseen, Z. M. (2022).
Computational assessment of groundwater salinity distribution within coastal multi-aquifers
of Bangladesh. Scientific Reports, 12(1), 1–28.
4. Sarig, Y. (1993). Robotics of fruit harvesting: A state-of-the-art review. Journal of Agricultural
Engineering Research, 54(4), 265–280.
5. Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016). Deepfruits: A fruit
detection system using deep neural networks. Sensors, 16(8), 1222.
6. Daradkeh, M., Abualigah, L., Atalla, S., & Mansoor, W. (2022). Scientometric analysis and
classification of research using convolutional neural networks: A case study in data science
and analytics. Electronics, 11(13), 2066.
7. AlShourbaji, I., Kachare, P., Zogaan, W., Muhammad, L. J., & Abualigah, L. (2022). Learning
features using an optimized artificial neural network for breast cancer diagnosis. SN Computer
Science, 3(3), 1–8.
8. ud Din, A. F., Mir, I., Gul, F., Mir, S., Saeed, N., Althobaiti, T., Abbas, S. M., & Abualigah, L.
(2022). Deep reinforcement learning for integrated non-linear control of autonomous UAVs.
Processes, 10(7), 1307.
9. Alkhatib, K., Khazaleh, H., Alkhazaleh, H. A., Alsoud, A. R., & Abualigah, L. (2022). A
new stock price forecasting method using active deep learning approach. Journal of Open
Innovation: Technology, Market, and Complexity, 8(2), 96.
10. Shehab, M., Abualigah, L., Shambour, Q., Abu-Hashem, M. A., Shambour, M. K. Y., Alsalibi,
A. I., & Gandomi, A. H. (2022). Machine learning in medical applications: A review of state-
of-the-art methods. Computers in Biology and Medicine, 145, 105458.
11. Ezugwu, A. E., Ikotun, A. M., Oyelade, O. O., Abualigah, L., Agushaka, J. O., Eke, C.
I., & Akinyelu, A. A. (2022). A comprehensive survey of clustering algorithms: State-of-
the-art machine learning applications, taxonomy, challenges, and future research prospects.
Engineering Applications of Artificial Intelligence, 110, 104743.
12. Wu, D., Wang, S., Liu, Q., Abualigah, L., & Jia, H. (2022). An improved teaching-learning-
based optimization algorithm with reinforcement learning strategy for solving optimization
problems. Computational Intelligence and Neuroscience.
13. Abualigah, L., Diabat, A., Mirjalili, S., Abd Elaziz, M., & Gandomi, A. H. (2021). The arith-
metic optimization algorithm. Computer Methods in Applied Mechanics and Engineering, 376,
113609.
14. Abualigah, L., Yousri, D., Abd Elaziz, M., Ewees, A. A., Al-Qaness, M. A., & Gandomi, A.
H. (2021). Aquila optimizer: A novel meta-heuristic optimization algorithm. Computers and
Industrial Engineering, 157, 107250.
15. Abualigah, L., Abd Elaziz, M., Sumari, P., Geem, Z. W., & Gandomi, A. H. (2022). Reptile
search algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
Applications, 191, 116158.
16. Agushaka, J. O., Ezugwu, A. E., & Abualigah, L. (2022). Dwarf mongoose optimization
algorithm. Computer Methods in Applied Mechanics and Engineering, 391, 114570.
17. Oyelade, O. N., Ezugwu, A. E. S., Mohamed, T. I., & Abualigah, L. (2022). Ebola optimization
search algorithm: A new nature-inspired metaheuristic optimization algorithm. IEEE Access,
10, 16150–16177.
18. Ezugwu, A. E., Agushaka, J. O., Abualigah, L., Mirjalili, S., & Gandomi, A. H. (2022). Prairie
dog optimization algorithm. Neural Computing and Applications, 1–49.
19. Hong, S., Noh, H., & Han, B. (2015). Decoupled deep neural network for semi-supervised
semantic segmentation. Advances in Neural Information Processing Systems, 28.
20. Jagtap, U. B., & Bapat, V. A. (2010). Artocarpus: A review of its traditional uses, phytochem-
istry and pharmacology. Journal of Ethnopharmacology, 129(2), 142–166.
Rambutan Image Classification Using
Various Deep Learning Approaches
Nur Alia Anuar, Loganathan Muniandy, Khairul Adli Bin Jaafar, Yi Lim,
Al Lami Lamyaa Sabeeh, Putra Sumari, Laith Abualigah,
Mohamed Abd Elaziz, Anas Ratib Alsoud, and Ahmad MohdAziz Hussein
Abstract Rambutan (Nephelium lappaceum L.) is a widely grown and favored fruit
in tropical countries such as Malaysia, Indonesia, Thailand, and the Philippines.
This fruit is classified into tens of different cultivars based on fruit, flesh, and tree
features. In this project, classification models for five different rambutan cultivars
were developed using deep learning techniques, based on a dataset of 1000 rambutan
images. Two common deep learning approaches to image classification, a Convolutional
Neural Network (CNN) and transfer learning, were applied to recognize each rambutan
variant. The results show that the VGG16 pre-trained model performed best, achieving
96% accuracy on the test dataset. This indicates that the model is reliable for the
rambutan classification task.
Keywords Deep learning · Convolutional neural networks · Fruit classification · Rambutan · ResNet · VGG
N. A. Anuar · L. Muniandy · K. A. B. Jaafar · Y. Lim · A. L. L. Sabeeh · P. Sumari · L. Abualigah (B)
Malaysia
L. Abualigah · A. R. Alsoud
Jordan
L. Abualigah
M. A. Elaziz
Faculty of Computer Science and Engineering, Galala University, Al Galala City, Egypt
Artificial Intelligence Research Center (AIRC), Ajman University, 346 Ajman, United Arab
Emirates
Department of Mathematics, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
School of Computer Science and Robotics, Tomsk Polytechnic University, Tomsk, Russia
A. M. Hussein
Deanship of E-Learning and Distance Education, Umm Al-Qura University, Makkah 21955,
Saudi Arabia

https://doi.org/10.1007/978-3-031-17576-3_2
1 Introduction
Computer vision is a subfield of Artificial Intelligence (AI) responsible for “teaching”
the machine to understand and interpret the visual world, such as digital images or
videos. The rise of big data, faster and cheaper computing resources, and new
algorithms have contributed to the widespread adoption of this domain. Image
classification is one of the computer vision approaches applied in various fields,
including technology, medicine, manufacturing, and agriculture. In agriculture,
automated fruit image recognition can assist in quality control and the development
of robotic harvesting systems for orchards [1].
Fruit image recognition systems are used to classify different types of fruits and
to differentiate variants of a single fruit type [2, 3]. Rambutan is an exotic fruit
found mainly in the Southeast Asian region and is particularly popular in Malaysia.
It comes in different varieties or cultivars such as Binjai, Gading, Gula Batu, Jarum
Mas, and Rongrien [4]. These cultivars look alike to the naked eye. Hence, an image
recognition system powered by deep learning methods can be applied to classify
rambutan cultivars accurately [5–11].
The Convolutional Neural Network (CNN) algorithm consistently shows remarkable
performance on image classification tasks on image databases including the MNIST
database, the NORB database, and the CIFAR10 dataset [12]. Besides CNN, transfer
learning is among the popular methods used by researchers for image classification.
Transfer learning makes use of a pre-trained model, which is a network that has been
trained on a huge dataset and has achieved state-of-the-art performance. In this paper,
we study the classification of rambutan cultivars using deep learning models, namely
CNN and transfer learning.
2 Literature Review
Deep learning enables a computer model to learn and perform classification tasks
directly from various types of data such as images, text, or audio [13–16]. It achieves
a high accuracy rate when models are trained using a large amount of labeled data and
neural network architectures that contain many layers [17]. The relevant features are
learned while the network trains on a collection of data. This feature extraction
during training makes deep learning models highly accurate for computer vision tasks
such as object classification. It has become one of the core technologies for
machine-critical artificial intelligence applications, including medical diagnosis such
as screening for various types of cancer [18]. Most recently, image classification
techniques were used for Covid-19 screening tests based on chest X-ray and CT images
of patients [19].
Deep learning achieves tremendous performance in many applications, including fruit
classification. There are research works on fruit classification with different goals
and applications [20]. One of these applications is agriculture. However, deep learning
has the drawback of requiring exceptionally high processing power due to its massive
number of parameters, which can easily run into the millions. Hence, there is a need
for a lightweight deep learning architecture that speeds up classification without
sacrificing accuracy.
In this section, we review several previous attempts to use neural networks and
deep learning for fruit recognition. On the topic of detecting fruits from images
using deep neural networks, paper [21] presents a network trained to recognize fruits.
The authors adapt a Faster Region-based Convolutional Network. The objective is to
create a neural network that can be used by autonomous robots that harvest fruits.
The network is trained using RGB and NIR (near-infrared) images. The RGB and NIR
models are combined in two separate ways, named early and late fusion. The result is
a multi-modal network that obtains much better performance than the existing networks.
Another paper [22] uses two backpropagation neural networks trained on images of
“Gala” apple trees to predict the yield for the upcoming season. For this task, four
features were extracted from the images: the total cross-sectional area of fruits, the
fruit number, the total cross-sectional area of small fruits, and the cross-sectional
area of foliage. It was found that deep learning methods were highly useful for
classifying the fruits effectively. Other optimization methods that can be used to
optimize such problems are given in [23–28].
3 Proposed Deep Learning Method
In this paper, we use several deep learning models, namely a convolutional neural
network (CNN), residual networks (ResNet), and VGG16.
3.1 CNN
A convolutional neural network (CNN) is a particular type of feed-forward neural
network that is widely used for image recognition. A CNN processes the input image
one portion at a time, where each portion is known as a receptive field, and assigns
weights to each neuron according to how much its receptive field contributes, so that
the importance of neurons can be distinguished from one another. The architecture of
a CNN consists of three types of layers: (1) convolution, (2) pooling, and (3) fully
connected, as shown in Fig. 1. The convolution operation applies multiple filters to
extract features from the image; the output of each filter is known as a feature map.
In this way, the corresponding spatial information from the dataset is preserved. The
pooling operation, also called subsampling, is used to reduce the dimensionality of
the feature maps produced by the convolution operation. A pooling layer is added after
the convolutional layer, specifically after a nonlinearity has been applied to the
feature maps output by the convolutional layer. Max pooling and average pooling are
the most common pooling operations, while ReLU is the common choice of activation
function for propagating gradients during training by backpropagation.
Fig. 1 Basic architecture of CNN
In our work, we propose a CNN model for classifying five rambutan types: Binjai,
Gading, Gula Batu, Jarum Mas, and Rongrien. The model consists of four convolutional
layers. The first convolution layer uses 32 convolution filters with a filter size of
3 × 3 and a kernel regularizer of 0.001. The regularizer adds penalties on the layer
during optimization; these penalties are included in the loss function that the
network optimizes. Padding is used to ensure that the input and output tensors keep
the same shape. The input image size is 224 × 224 × 3. Batch normalization is applied
to each convolution output before the activation. ReLU, the rectified linear
activation function, is used as the activation function at every convolution. This
activation function ensures that the output is either positive or zero. The output of
each convolutional layer is given as input to a max-pooling layer with a pool size of
2 × 2. This layer reduces the number of parameters by down-sampling, which reduces
the amount of memory and time required for computation, so that only the features
required for the classification are aggregated. Dropout rates of 0.3, 0.2, and 0.1 are
applied respectively, starting from the second convolutional layer. This aims to
reduce the model complexity in order to prevent overfitting and to reduce the
computation power and time at each convolution. The second convolution layer uses 64
convolution filters with a 2 × 2 kernel size, the third uses 128 filters with a 2 × 2
kernel size, and the fourth uses 256 filters with a 2 × 2 kernel size. Finally, we use
a fully connected part with 4 dense layers and a dropout of 0.5, ending with a SoftMax
classifier. Before the dense layers, the feature map of the fourth convolution is
flattened. In our model, the loss function is categorical cross-entropy and the
optimizer is Adam with a learning rate of 0.001. The architecture of the proposed CNN
model is shown in Figs. 2, 3, and 4. Figure 5 shows the expected classification output
from the model.
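To make the layer arithmetic above concrete, the following sketch traces the feature-map shape and the convolution parameter count through the four convolution and max-pooling stages. It is a plain-Python illustration rather than the authors' code. It assumes "same" padding with stride 1 for every convolution and non-overlapping 2 × 2 pooling; batch-normalization and dense-layer parameters are not counted, and the helper names are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square 2-D convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def trace_cnn(input_size=224, in_channels=3,
              stages=((3, 32), (2, 64), (2, 128), (2, 256)), pool=2):
    """Return (height, width, channels, conv parameters) after each stage.

    Assumes 'same' padding (the convolution keeps height and width) and
    non-overlapping pooling that floors the division by `pool`.
    """
    size, channels, rows = input_size, in_channels, []
    for kernel, filters in stages:
        params = conv_params(kernel, channels, filters)
        size //= pool            # max pooling halves height and width
        channels = filters       # the next stage sees `filters` channels
        rows.append((size, size, channels, params))
    return rows


shapes = trace_cnn()
for height, width, channels, params in shapes:
    print(f"{height} x {width} x {channels}  conv parameters: {params}")

# Length of the vector produced by flattening the fourth feature map
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print("flattened length:", flattened)
```

Under these assumptions the spatial size halves at every stage (224, 112, 56, 28, 14), so the flattened vector that enters the dense layers has 14 × 14 × 256 entries.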
Fig. 2 Proposed CNN architecture
Fig. 3 Building the proposed model
3.2 Transfer Learning
3.2.1 ResNet
Residual networks (ResNet) were developed by the Microsoft Research team for image
recognition tasks using deep residual learning. The algorithm secured 1st place in the
ILSVRC 2015 classification task. The deep residual learning architecture was developed
to address the degradation problem, which occurs as the number of stacked layers
(depth) increases. Despite being considerably deeper than VGG nets, the networks have
lower complexity [29]. The models were trained on over 1.28 million images and
evaluated on 50,000 validation images. ResNet is built from five convolutional blocks
and comes in 18-, 34-, 50-, 101-, and 152-layer variants.
We apply the ResNet-50 and ResNet-101 pre-trained models to our rambutan type
classification task using the Keras library. Either some or all of the ResNet
convolutional blocks are frozen to study the effect of a fully frozen versus a
partially frozen pre-trained model. A new classifier consists of two dense layers with
256 neuron units per layer and a SoftMax activation function. The Adaptive Moment
Estimation (Adam) optimizer is used to compute the optimum weights of the classifier
layers with different learning rates. The categorical cross-entropy loss function is
applied to the output of the fully connected layers to calculate the loss between the
predictions and the actual labels. Figure 6 shows the architecture of the ResNet model.
Fig. 4 Model’s summary
Feature Extraction and Model Training for ResNet and VGG:
1. Load the pre-trained model by specifying “include_top = False” and the shape
of the image data.
2. Extract convolved visual features by passing the image data through the
pre-trained layers.
3. The resulting feature stack is three-dimensional and needs to be flattened
before it can be used for prediction by the classifier.
4. A fully connected layer is created and used in conjunction with the pre-trained
layers. This fully connected layer is initialized with random weights, which are
updated during training (Figs. 7 and 8).
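Step 3 above can be illustrated with a little arithmetic. The sketch below estimates how long the flattened feature vector is, and how many weights the first dense layer of the new classifier then needs. The backbone properties used here are assumptions that do not come from the chapter: an overall down-sampling factor of 32 for both networks, with 2048 output channels for ResNet-50 and 512 for VGG16. The function names are illustrative.

```python
def flattened_length(input_size, total_stride, channels):
    """Entries in the flattened feature stack of a headless backbone."""
    side = input_size // total_stride      # spatial size of the last feature map
    return side * side * channels


def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


# Assumed backbone properties (not stated in the chapter)
resnet50_features = flattened_length(224, 32, 2048)
vgg16_features = flattened_length(224, 32, 512)

# First dense layer of a new classifier with 256 units, as used for ResNet
first_dense = dense_params(resnet50_features, 256)

print("ResNet-50 flattened features:", resnet50_features)
print("VGG16 flattened features:", vgg16_features)
print("parameters in first 256-unit dense layer:", first_dense)
```

Under these assumptions most of the trainable weights of the new classifier sit in its first dense layer, because the flattened vector is so long.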
Fig. 5 Snapshot of a part of rambutan classifications output from the CNN model
Fig. 6 A 34-layer ResNet architecture
Fig. 7 Setting up the ResNet model

Fig. 8 ResNet model summary
3.2.2 VGG
Transfer learning is the reuse of a pre-trained model on a new problem. Its popularity
in deep learning is given by its advantage of training deep neural networks with
comparatively little data. This is very useful since most real-world problems typically
do not have millions of labeled data points to train such complex models [30].
To reiterate, in transfer learning, the knowledge of an already trained machine
learning model is applied to a different but related problem. With transfer learning,
we basically try to exploit what has been learned in one task to improve generalization
in another. We transfer the weights that a network has learned at “task A” to a new
“task B” [31].
VGG16 is one of the pre-trained models used for transfer learning. The model achieves
92.7% top-5 test accuracy on ImageNet, which is a dataset of over 14 million images
belonging to 1000 classes [32]. It was one of the famous models submitted to
ILSVRC-2014. Instead of using large kernels (11 × 11 and 5 × 5 in the first and second
layers), VGG16 improved upon AlexNet by opting for a smaller kernel size of 3 × 3.
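The benefit of small kernels can be checked with a short calculation. The sketch below compares a stack of 3 × 3 convolutions with a single larger kernel that covers the same receptive field. It assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the helper names and the choice of 64 channels are illustrative.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def stacked_weights(kernel, layers, channels):
    """Weights (biases ignored) when every layer maps `channels` to `channels`."""
    return layers * kernel * kernel * channels * channels


channels = 64  # illustrative channel count
for layers, large_kernel in ((2, 5), (5, 11)):
    field = stacked_receptive_field(3, layers)
    small = stacked_weights(3, layers, channels)
    large = stacked_weights(large_kernel, 1, channels)
    print(f"{layers} stacked 3x3 layers cover {field}x{field} "
          f"with {small} weights; one {large_kernel}x{large_kernel} "
          f"layer needs {large}")
```

Two stacked 3 × 3 layers see the same 5 × 5 region as one 5 × 5 layer with fewer weights, and they apply a nonlinearity twice instead of once.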
The VGG16 architecture accepts RGB images with a fixed input size of 224 × 224 and
has a total of 138 million parameters. The architecture comprises 5 blocks of
convolution layers, each followed by a max-pool layer, and ends with three fully
connected layers with 4096, 4096, and 1000 neurons respectively. The last fully
connected layer is the SoftMax layer for classification. VGG16 uses a very small
kernel size of 3 × 3, and after every convolutional layer a non-linear operation is
performed by a ReLU activation function. Every block contains at least two and at most
three convolution layers, and the number of convolution filters increases by powers of
two from 64 to 512 [33]. Figure 9 shows the architecture of VGG16.
Fig. 9 VGG16 architecture
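The parameter total quoted above can be reproduced from the layer description. The sketch below counts weights and biases for five blocks of 3 × 3 convolutions followed by the three fully connected layers. The split of convolution layers per block (2, 2, 3, 3, 3) is the standard VGG16 configuration and is an assumption here, since the text only states "at least two and at most three" per block; the constant and function names are illustrative.

```python
# (number of 3x3 convolutions, filters) per block; assumed standard VGG16 layout
VGG16_BLOCKS = ((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))
FC_UNITS = (4096, 4096, 1000)


def vgg16_parameter_count(input_size=224, in_channels=3):
    """Count weights and biases of the convolutional and dense layers."""
    size, channels, total = input_size, in_channels, 0
    for convs, filters in VGG16_BLOCKS:
        for _ in range(convs):
            total += (3 * 3 * channels + 1) * filters   # 3x3 kernel + bias
            channels = filters
        size //= 2                                      # 2x2 max pooling
    features = size * size * channels                   # flattened length
    for units in FC_UNITS:
        total += (features + 1) * units                 # dense weights + bias
        features = units
    return total


total = vgg16_parameter_count()
print(f"{total:,} parameters")
```

With this layout the count comes to 138,357,544, consistent with the figure of 138 million, and the first fully connected layer alone accounts for most of it.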
After loading the VGG16 pre-trained model, all layers were frozen except for the last
5, because the last few layers represent higher-level combinations of lower-level
features and we want to train these layers to suit our problem (Fig. 10). A sequential
model was then created by adding the VGG convolutional base and some fully connected
layers, which comprise a Flatten layer and 3 Dense layers with 1024, 1024, and 5 units
respectively. The first 2 Dense layers use the ReLU activation function. After the
second Dense layer, a Dropout layer with a rate of 0.5 is added to minimize
overfitting. The final Dense layer is the classification layer with a SoftMax
activation function. The model is trained using the Adam optimizer with a learning
rate of 0.0001. The model summary is shown in Fig. 11.
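The freezing step can be pictured with a small stand-in for the convolutional base. The sketch below is not Keras code: `Layer` is a hypothetical dataclass, the layer names are made up for readability, and the per-block layout (2, 2, 3, 3, 3 convolutions, each block ending in a pooling layer) is the standard VGG16 configuration, assumed here. It freezes everything except the last five layers and reports which layers remain trainable.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer (not a Keras class)."""
    name: str
    params: int
    trainable: bool = True


def build_vgg16_base():
    """13 convolution layers and 5 pooling layers of an assumed VGG16 base."""
    layers, channels = [], 3
    blocks = ((2, 64), (2, 128), (3, 256), (3, 512), (3, 512))
    for b, (convs, filters) in enumerate(blocks, start=1):
        for c in range(1, convs + 1):
            layers.append(Layer(f"block{b}_conv{c}", (9 * channels + 1) * filters))
            channels = filters
        layers.append(Layer(f"block{b}_pool", 0))   # pooling has no weights
    return layers


def freeze_all_but_last(layers, n):
    """Mark every layer except the last `n` as not trainable."""
    for layer in layers[:-n]:
        layer.trainable = False
    return layers


base = freeze_all_but_last(build_vgg16_base(), 5)
trainable = [layer.name for layer in base if layer.trainable]
trainable_params = sum(layer.params for layer in base if layer.trainable)
print(trainable)
print("trainable base parameters:", trainable_params)
```

In this stand-in, keeping the last five layers trainable amounts to fine-tuning the three convolutions of the final block, since the two pooling layers among them carry no weights.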
3.3 Dataset
There are various types of rambutan. However, we collected images of only five types
(Gading, Binjai, Gula Batu, Jarum Mas, and Rongrien), with 200 images per label, so
the total size of the dataset used in this work is 1000 images. All images are resized
to 224 × 224 pixels and split into training (80%), validation (10%), and testing (10%)
sets. Figure 12 shows the types of rambutan available.
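One way to produce such an 80/10/10 split is sketched below. It is an illustration under assumptions: the chapter does not say whether the split was stratified, so splitting each class separately is a choice made here, and the file names, seed, and function name are hypothetical.

```python
import random

CLASSES = ["Binjai", "Gading", "Gula Batu", "Jarum Mas", "Rongrien"]


def split_class(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle one class and cut it into train, validation, and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)          # reproducible shuffle
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


train, val, test = [], [], []
for label in CLASSES:
    # Hypothetical file names standing in for the 200 images per class
    files = [f"{label}/{i:03d}.jpg" for i in range(200)]
    tr, va, te = split_class(files)
    train += [(path, label) for path in tr]
    val += [(path, label) for path in va]
    test += [(path, label) for path in te]

print(len(train), len(val), len(test))
```

Splitting per class keeps the three subsets balanced, with 160, 20, and 20 images of each rambutan type.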
Fig. 10 Building the VGG16 model
Fig. 11 VGG16 model summary

Fig. 12 Types of rambutan (R-156Y is a variety code for Gading cultivar)
4 Performance Results and Recommendation
4.1 Convolutional Neural Network (CNN)
A basic Convolutional Neural Network (CNN) was established as a baseline model for
comparison with the more complex transfer learning models, ResNet and VGG16. The basic
CNN model is built upon 4 convolution layers in which the number of filters doubles
after each convolution. Max pooling was used in the model. The architecture of the
convolution layers is shown in Fig. 2.
Three dense layers were used, with the number of neurons halved at each layer (256
neurons > 128 neurons > 64 neurons), before the output is passed to an output layer
with 5 neurons, one for each of the 5 classes of rambutan to be predicted. The batch
size for training was set at 128 samples per batch for 100 epochs. Nevertheless, an
early stopping mechanism was used to ensure that the training stops at the lowest loss
possible. The ReLU activation function was used for all layers except the final output
layer for class prediction, which used the SoftMax activation function.
Table 1 Segmentation of F1-score for CNN

Rambutan class | F1-score (%)
Binjai | 77
Gading | 99
Gula Batu | 82
Jarum Mas | 74
Rongrien | 68
With all the parameters set, the overall accuracy of the model was 79% on the test
set. The model was trained until the 40th epoch, at which it achieved its lowest loss.
Table 1 shows the breakdown of the F1-score for each class of rambutan for this basic
model.
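The early stopping mechanism mentioned above can be sketched in a few lines. The chapter does not state the stopping criterion, so the patience-based rule below is an assumption, and the loss values are made up purely for illustration; the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both counted from 1.

    Training stops once the loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch     # new lowest loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch                # patience exhausted
    return best_epoch, len(val_losses)              # ran all epochs


# Illustrative loss curve (not measured values)
losses = [1.20, 0.90, 0.70, 0.65, 0.66, 0.70, 0.68, 0.69]
best, stopped = early_stopping_epoch(losses, patience=3)
print(f"lowest loss at epoch {best}, training stopped at epoch {stopped}")
```

A rule of this kind explains why a run configured for 100 epochs can end much earlier, with the weights from the lowest-loss epoch being the ones that are kept.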
Gading rambutan has the highest classification score at 99% and Rongrien has the
lowest score at 59%. Of the 5 classes of rambutan, Gading has a distinctive yellow
color while the others are red. This feature is well extracted by the model as the
defining feature of Gading. On the other hand, the features of the other 4 rambutan
types may overlap, hence giving lower classification performance. Looking further into
the performance of Rongrien (the lowest performance), its recall score is also
significantly lower than that of the other classes, at only 47%. This means that the
false-negative rate for Rongrien is high, i.e., Rongrien is commonly mistaken for other
classes of rambutan.
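The relation between recall, the false-negative rate, and the F1-score can be made explicit with a short sketch. The counts used below are hypothetical and are not taken from the experiments; they only show how the three quantities are computed for one class.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, F1-score and false-negative rate for one class.

    tp: images of the class predicted correctly
    fp: images of other classes predicted as this class
    fn: images of this class predicted as another class
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    false_negative_rate = fn / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1, false_negative_rate


# Hypothetical counts for a class with 20 test images
precision, recall, f1, fnr = class_metrics(tp=10, fp=5, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"F1={f1:.2f} false-negative rate={fnr:.2f}")
```

Because the false-negative rate is one minus the recall, the reported Rongrien recall of 47% corresponds to a false-negative rate of 53%.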
Starting from the baseline model, we then varied the training parameters, namely batch
size, number of epochs, and number of convolution layers, to observe the model’s
performance. The parameters were changed one at a time, with the rest of the parameters
fixed as in the baseline model. The observations are shown in Tables 2, 3 and 4. For
the convolution layers, the maximum is 6 layers, beyond which the max-pooling causes a
negative dimension to occur; hence the number of layers was varied to be slightly lower
and higher than in the baseline model (2 to 6 layers).
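Table 2 Performance result for different batch sizes used

Batch size | Epochs, convolution layers | Overall accuracy, %
32 | Early stopping epochs, 4 convolution layers | 79
64 | Early stopping epochs, 4 convolution layers | 77
100 | Early stopping epochs, 4 convolution layers | 74
256 | Early stopping epochs, 4 convolution layers | 77

Table 3 Performance results for different epochs number

Epochs | Batch size, convolution layers | Overall accuracy, %
30 | 128 samples/batch, 4 convolution layers | 79
60 | 128 samples/batch, 4 convolution layers | 81
90 | 128 samples/batch, 4 convolution layers | 80
120 | 128 samples/batch, 4 convolution layers | 79

Table 4 Performance results for different number of convolution layers

Convolution layers | Batch size, epochs | Overall accuracy, %
3 | 128 samples/batch, early stopping epoch | 77
6 | 128 samples/batch, early stopping epoch | 65
2 | 128 samples/batch, early stopping epoch | 80
5 | 128 samples/batch, early stopping epoch | 80

Table 5 Performance results for combining best parameters from each test

Convolution layers | Batch size | Epochs | Overall accuracy, %
5 | 32 | 60 | 20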
Combining the best-performing value of each parameter gives the performance shown in
Table 5. The overall performance is much worse than that of the baseline model when all
the best parameters are combined. Inspection of the per-class results shows that
predictions were made for only a single type of rambutan. The main contributor to this
is the small batch size: changing the batch size back to its baseline value of 128
gives an accuracy of 77%, which is still lower than that of the baseline model. Hence,
the baseline model with 4 convolutional layers, a batch size of 128, and early stopping
gave the highest accuracy, at 79%.
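The 20% accuracy of the combined configuration is what a model that always predicts one class would score on a balanced five-class test set. The sketch below checks this; it assumes 20 test images per class, which follows from a balanced 10% test split but is not stated explicitly, and the predicted class is chosen arbitrarily.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


classes = ["Binjai", "Gading", "Gula Batu", "Jarum Mas", "Rongrien"]

# Assumed balanced test set: 20 images of each class
y_true = [name for name in classes for _ in range(20)]

# A collapsed model that predicts the same (arbitrary) class every time
y_pred = ["Gading"] * len(y_true)

collapsed_accuracy = accuracy(y_true, y_pred)
print(f"accuracy of a single-class predictor: {collapsed_accuracy:.0%}")
```

An accuracy of exactly one in five therefore signals that the network has stopped discriminating between classes rather than merely performing poorly.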
4.2 Transfer Learning Model
We used two different transfer learning models, as discussed in the previous section:
VGG16 and ResNet. For both transfer learning models, we unfroze some of the layers for
training.
4.2.1 ResNet
Two parameters were tested for the ResNet models, namely batch size and learning rate.
Three batch sizes were tested for ResNet: 32, 64, and 128. Table 6 shows the model
accuracy for each batch size tested. One interesting observation is that unfreezing
some layers improved the model’s performance, and this effect is more noticeable than
that of the batch size. Within each model, changing the batch size does not
significantly improve the accuracy, except for ResNet101, whose accuracy jumped from 20
to 77% when the batch size increased from 32 to 64. Doubling the batch size to 128,
however, does not bring any further significant improvement. In the partially frozen
models, the unfrozen layers can extract and learn the distinctive features of our
dataset, which improves their performance.
Table 6 Performance results for different batch sizes using different models

Model | Batch size | Accuracy, %
ResNet50 | 32 | 38.9
ResNet50 | 64 | 40
ResNet50 | 128 | 40
ResNet50 (partially frozen) | 32 | 66
ResNet50 (partially frozen) | 64 | 67
ResNet50 (partially frozen) | 128 | 71
ResNet101 | 32 | 20
ResNet101 | 64 | 77
ResNet101 | 128 | 78
ResNet101 (partially frozen) | 32 | 76
ResNet101 (partially frozen) | 64 | 80
ResNet101 (partially frozen) | 128 | 80
* Learning rate set at 0.001

For the learning rate, we used two lower learning rates (0.01, 0.05) and two higher
learning rates (0.1, 0.5). Table 7 shows the summary of the model performance. For
three models, ResNet50, ResNet50 partially frozen, and ResNet101, the observed trend is
that the accuracy increases with the learning rate before it plateaus. All models were
trained for 50 epochs with early stopping, so with a lower learning rate the model may
still be far from the lowest-loss solution when training stops, whereas with a higher
learning rate it may be closer to the optimized solution when training ends, either by
reaching the final epoch or because early stopping is triggered. On the other hand,
increasing the learning rate for the ResNet101 partially frozen model caused the
performance to drop slightly, for the opposite reason of overshooting the optimized
solution (Fig. 13).

Table 7 Performance results for different learning rate used

Model | Batch size | Learning rate | Accuracy, %
ResNet50 | 64 | 0.01 | 81
 | | 0.05 | 85
 | | 0.1 | 85
 | | 0.5 | 85
ResNet50 (partially frozen) | | 0.01 | 58
 | | 0.05 | 71
 | | 0.1 | 75
 | | 0.5 | 75
ResNet101 | | 0.01 | 67
 | | 0.05 | 83
 | | 0.1 | 82
 | | 0.5 | 83
ResNet101 (partially frozen) | | 0.01 | 84
 | | 0.05 | 82
 | | 0.1 | 82
 | | 0.5 | 82
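The two learning-rate effects described for ResNet, stopping short of the minimum and overshooting it, can be reproduced on a toy problem. The sketch below runs plain gradient descent on the one-dimensional loss f(w) = w², whose minimum is at w = 0. It is only an analogy: the learning rates are chosen to suit this toy loss and are not comparable to the values used in the experiments, and the function name is illustrative.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimise f(w) = w**2 with a fixed learning rate and step budget."""
    w = start
    for _ in range(steps):
        gradient = 2 * w                 # derivative of w**2
        w -= learning_rate * gradient    # standard gradient step
    return w


for rate in (0.01, 0.1, 0.9, 1.1):
    final = gradient_descent(rate)
    print(f"learning rate {rate}: |w| after 50 steps = {abs(final):.3g}")
```

With the smallest rate the iterate is still far from zero after 50 steps, the middle rates get close to it, and the largest rate overshoots on every step and moves away from the minimum.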
Fig. 13 Accuracy and loss for best ResNet model

4.2.2 VGG16
The VGG16 model was tested with different architectures, batch sizes, numbers of
epochs, and optimizers. The batch sizes used for VGG16 training are 100, 128, and 256.
As mentioned previously, all the layers are frozen except for the last few layers.
The model’s performance differs across batch sizes and architectures. Table 8 shows
the VGG16 performance summary, where bold font refers to the best result. Model 2
performed exceptionally well compared to the other models, with 96% accuracy. Model 2
was trained for 125 epochs with a batch size of 128 and the Adam optimizer. In
comparison, the models with the same architecture and the Adam optimizer achieved a
validation accuracy of 89% with a batch size of 256 (Model 1) and 91% with a batch size
of 100 (Model 3). Model 4, with the SGD optimizer, achieved a validation accuracy of 87%
for a batch size of 256. Model 5, with the RMSprop optimizer, also performed well, with
a validation accuracy of 95%. Model 6 and Model 7 used the same architecture and the
Adam optimizer but different batch sizes. Model 6, with a batch size of 256, achieved a
validation accuracy of 87%, whereas Model 7, with a batch size of 128, achieved a
validation accuracy of 94%. The performance of the model improves when the batch size
decreases from 256 to 128.
Within each model, changing the batch size does not significantly improve the
accuracy. With the Adam optimizer, changing the batch size from 256 to 128 improves the
validation accuracy from 89 to 96%. Compared to the other optimizers, RMSprop achieved
a good validation accuracy of 95% for a batch size of 256. Increasing the number of
layers in the architecture does not bring any significant improvement in model
performance. The training history and the performance metrics of the best model are
shown in Fig. 14 (Fig. 15; Table 9).
The overall accuracy of the best model was 96% on the validation set. The model was
trained for 125 epochs with a batch size of 128 and the Adam optimizer. The breakdown
of the F1-score for each class of rambutan for this model is given in Table 10.
Gading rambutan has the highest F1-score at 100% and Binjai has the lowest at 92%.
As discussed previously, Gading has a distinctive yellow color while the others are
red, which is well extracted by the model. On the other hand, the features of the other
4 rambutan types may overlap, hence giving lower classification performance compared
to Gading. Nevertheless, the model is still able to classify each type of rambutan
with high accuracy compared to the other models discussed previously. Based on its
having the highest accuracy, we recommend VGG16 as the classifier for the listed
rambutan types.
5 Concluding Remarks
The use of a convolution neural network to classify rambutan shows immense poten-
tial to correctly identify the type of rambutan. The initial hypothesis that all types
of transfer learning models would outperform the conventional, built-from-scratch
Table 8 VGG16 performance summary
Model VGG16 | Optimizer | Batch size | Epochs | Fully connected layer | Training accuracy % | Testing accuracy %
Model 1 | Adam | 256 | 150 | Flatten + 2 dense layers with filter sizes (1024, 1024) + dropout (0.5) + output layer | 94.5 | 89
Model 2 | Adam | 128 | 125 | Flatten + 2 dense layers with filter sizes (1024, 1024) + dropout (0.5) + output layer | 97 | 96
 |  |  |  | dense layers with filter sizes (1024, 1024) + dropout (0.5) + output layer |  |
Model 4 | SGD | 256 | 125 | Flatten + 3 dense layers with filter sizes (4098, 1024, 512) + output layer | 93.6 | 87
Model 5 | RMSprop | 256 | 100 | Flatten + 2 dense layers with filter sizes (1024, 1024) + dropout (0.5) + output layer | 97.75 | 95
 |  |  |  | dense layers with filter sizes (4098, 1024, 512) + dropout (0.5) + output layer |  |
 |  |  |  | dense layers with filter sizes (4098, 1024, 512) + dropout (0.5) + output layer |  |
Fig. 14 VGG16 best model accuracy and loss
Fig. 15 Best model confusion matrix
CNN model is supported: both the ResNet and VGG models yield
higher accuracy than the conventional CNN model. Between the
two transfer learning models, VGG16 has the better accuracy in classifying all the
types of rambutan, achieving 96% overall accuracy compared with 85% for
ResNet50. VGG16 also identifies each type of rambutan well, with every type
scoring above 90%. The CNN model built from scratch has
Table 9 VGG16 best model parameters
Model | Model 2
Optimizer | Adam
Batch size | 128
Epochs | 125
Learning rate | 0.0001
Training accuracy | 0.9737
Training loss | 0.0790
Validation accuracy | 0.9600
Validation loss | 0.1914
Table 10 F1-score segmentation for VGG16 best model
Rambutan class | F1-score (%)
Binjai | 92
Gading | 100
Gula Batu | 95
Jarum Mas | 97
Rongrien | 98
the lowest accuracy, with its best model achieving 79%. Rambutan Gading
has the highest accuracy among the rambutan types, which is believed to be due to its
distinct color being extracted well by the model. For the next training
iteration, we suggest removing Rambutan Gading so that the model can fully extract the
defining features of the other four types of rambutan. This expert system is a basis for
future work. Future research is recommended to expand the dataset in order to classify
more varieties of rambutan, so that the approach can be applied in the agriculture field.
References
1. Risdin, F., Mondal, P. K., & Hassan, K. M. (2020). Convolutional neural networks (CNN)
for detecting fruit information using machine learning techniques. IOSR Journal of Computer
Engineering (IOSR-JCE), 22(2), 1–13.
2. Morton, J. F. (1987). Fruits of warm climates. Morton.
3. Rojas-Aranda, J. L., Nunez-Varela, J. I., Cuevas-Tello, J. C., & Rangel-Ramirez, G. (2020).
Fruit classification for retail stores using deep learning. Lecture Notes in Computer Science,
12088, 3–13.
4. Goenaga, R., & Jenkins, D. (2011). Yield and fruit quality traits of rambutan cultivars grafted
onto a common rootstock and grown at two locations in Puerto Rico. HortTechnology, 21(1),
136–140.
5. Abualigah, L., Al-Okbi, N. K., Elaziz, M. A., & Houssein, E. H. (2022). Boosting marine
predators algorithm by salp swarm algorithm for multilevel thresholding image segmentation.
Multimedia Tools and Applications, 81(12), 16707–16742.
6. Mehbodniya, A., Douraki, B. K., Webber, J. L., Alkhazaleh, H. A., Elbasi, E., Dameshghi,
M., Abu Zitar, R., & Abualigah, L. (2022). Multilayer reversible data hiding based on the
difference expansion method using multilevel thresholding of host images based on the slime
mould algorithm. Processes, 10(5), 858.
7. Otair, M., Abualigah, L., & Qawaqzeh, M. K. (2022). Improved near-lossless technique using
the Huffman coding for enhancing the quality of image compression. Multimedia Tools and
Applications, 1–21.
8. Liu, Q., Li, N., Jia, H., Qi, Q., & Abualigah, L. (2022). Modified remora optimization algorithm
for global optimization and multilevel thresholding image segmentation. Mathematics, 10(7),
1014.
9. Lin, S., Jia, H., Abualigah, L., & Altalhi, M. (2021). Enhanced slime mould algorithm for
multilevel thresholding image segmentation using entropy measures. Entropy, 23(12), 1700.
10. Ewees, A. A., Abualigah, L., Yousri, D., Sahlol, A. T., Al-qaness, M. A., Alshathri, S., & Elaziz,
M. A. (2021). Modified artificial ecosystem-based optimization for multilevel thresholding
image segmentation. Mathematics, 9(19), 2363.
11. Abualigah, L., Diabat, A., Sumari, P., & Gandomi, A. H. (2021). A novel evolutionary arith-
metic optimization algorithm for multilevel thresholding segmentation of Covid-19 CT images.
Processes, 9(7), 1155.
12. Rawat, W., & Wang, Z. (2017). Deep convolutional neural networks for image classification:
A comprehensive review. Neural Computation, 29(9), 2352–2449.
13. Sumari, P., Syed, S. J., & Abualigah, L. (2021). A novel deep learning pipeline architecture
based on CNN to detect Covid-19 in chest X-ray images. Turkish Journal of Computer and
Mathematics Education (TURCOMAT), 12(6), 2001–2011.
14. Kadyan, V., Singh, A., Mittal, M., & Abualigah, L. (2021). Deep learning approaches for
spoken and natural language processing.
15. Abuowaida, S. F. A., Chan, H. Y., Alshdaifat, N. F. F., & Abualigah, L. (2021). A novel instance
segmentation algorithm based on improved deep learning algorithm for multi-object images.
Jordanian Journal of Computer and Information Technology (JJCIT), 7(01), 10–5455.
16. Danandeh Mehr, A., Rikhtehgar Ghiasi, A., Yaseen, Z. M., Sorman, A. U., & Abualigah,
L. (2022). A novel intelligent deep learning predictive model for meteorological drought
forecasting. Journal of Ambient Intelligence and Humanized Computing, 1–15.
17. MathWorks. (2021). What is deep learning? How it works, techniques & applications. Math-
Works. [Online]. https://www.mathworks.com/discovery/deep-learning.html. Accessed July
01, 2021.
18. Ardila, D., Kiraly, A. P., Bharadwaj, S., Choi, B., Reicher, J. J., Peng, L., Tse, D., Etemadi, M.,
Ye, W., Corrado, G., Naidich, D. P., & Shetty, S. (2019). End-to-end lung cancer screening with
three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine,
25(6), 954–961.
19. Wang, S., Kang, B., Ma, J., Zeng, X., Xiao, M., Guo, J., Cai, M., Yang, J., Li, Y., Meng, X., &
Xu, B. (2021) A deep learning algorithm using CT images to screen for Corona virus disease
(COVID-19). European Radiology, 31(8), 6096–6104.
20. Hameed, K., Chai, D., & Rassau, A. (2018). A comprehensive review of fruit and vegetable
classification techniques. Image and Vision Computing, 80, 24–44.
21. Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T., & McCool, C. (2016). DeepFruits: A fruit
detection system using deep neural networks. Sensors, 16(8), 1222.
22. Cheng, H., Damerow, L., Sun, Y., & Blanke, M. (2017). Early yield prediction using image
analysis of apple fruit and tree canopy features with neural networks. Journal of Imaging, 3(1),
6.
113609.
10, 16150–16177.
29. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In
2016 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 770–778).
30. Qassim, H., Verma, A., & Feinzimer, D. (2018). Compressed residual-VGG16 CNN model for
big data places image recognition. In 2018 IEEE 8th annual computing and communication
workshop and conference (CCWC).
31. Ferguson, M., Ak, R., Lee, Y.-T. T., & Law, K. H. (2017) Automatic localization of casting
defects with convolutional neural networks. In 2017 IEEE international conference on big data
(big data) (pp. 1726–1735).
32. Naranjo-Torres, J., Mora, M., Hernández-García, R., Barrientos, R. J., Fredes, C., & Valenzuela,
A. (2020). A review of convolutional neural network applied to fruit image processing. Applied
Sciences, 10(10), 3443.
33. ul Hassan, M. (2021). VGG16—Convolutional network for classification and detection.
Neurohive, November 20, 2018. [Online]. https://neurohive.io/en/popular-networks/vgg16/.
Accessed July 31, 2021.
Mango Varieties Classification-Based
Optimization with Transfer Learning
and Deep Learning Approaches
Chen Ke, Ng Tee Weng, Yifan Yang, Zhang Ming Yang, Putra Sumari,
Laith Abualigah, Salah Kamel, Mohsen Ahmadi,
Mohammed A. A. Al-Qaness, Agostino Forestiero, and Anas Ratib Alsoud
Abstract Mango is one of the best-known tropical fruits. It is native to South Asia,
and over 500 varieties are currently known. Depending on the variety,
mango fruit can vary in size, skin color, shape, sweetness, and flesh color, which
may be pale yellow, gold, or orange. However, it is sometimes difficult to
tell which variety a given mango belongs to. This paper therefore presents a
classification approach for four types of mango. We use a convolutional neural
network (CNN) algorithm and transfer learning methods (VGG16 and Xception) to
train on the 1000 mango images collected and obtain a deep learning model that is
able to classify four types of mango (Alampur Baneshan, Alphonso, Harum Manis
C. Ke · N. T. Weng · Y. Yang · Z. M. Yang · P. Sumari · L. Abualigah (B)

Malaysia
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, Jordan
L. Abualigah
S. Kamel
Department of Electrical Engineering, Faculty of Engineering, Aswan University, Aswan 81542,
Egypt
M. Ahmadi
Department of Industrial Engineering, Urmia University of Technology, Urmia, Iran
M. A. A. Al-Qaness
State Key Laboratory for Information Engineering in Surveying, Mapping and Remote Sensing,
Wuhan University, Wuhan 430079, China
Faculty of Engineering, Sana’a University, 12544 Sana’a, Yemen
College of Physics and Electronic Information Engineering, Zhejiang Normal University,
Jinhua 321004, China
A. Forestiero
Institute for High Performance Computing and Networking, National Research Council of Italy,
Rende, Cosenza, Italy

https://doi.org/10.1007/978-3-031-17576-3_3
and Keitt) automatically. In summary, the objective of this paper is to develop a deep
learning algorithm that automatically classifies four mango cultivars.
Keywords Mango · Convolutional neural network (CNN) · Transfer learning ·
Deep learning · VGG16 · Xception
1 Introduction
Currently, mango cultivars are sorted and classified manually by observing
features or attributes of the fruit such as size, skin color, shape, sweetness, and flesh
color [1–3]. Generally, experienced taxonomy experts can identify different species.
However, most people find it difficult to distinguish these mangoes. With current
advances in science and technology, many techniques are available that could
make it easier for people to distinguish the cultivars. The solution we propose is
computer vision, a field of artificial intelligence that trains computers to interpret
and understand visual data such as images and video [4–8].
Computer vision is now one of the most popular technologies for fruit
recognition. Compared with other machine learning algorithms, convolutional
neural networks (CNNs) provide promising results for identifying fruits in images
[9]. Deep learning has been applied to problems such as
seed classification and retrieval [10], fruit detection for farmers [11], and cultivar
discrimination of litchi fruit [12]. Image classification involves three major
steps: feature extraction, model training, and testing. Feature
extraction captures the characteristic properties of the images. During
training, an algorithm uses these features to build a model that forms a unique
description of each class. Testing then classifies the test images
into the various classes with the trained model [13]. In addition, the convolutional
layers can be modified to obtain more accurate and faster detection; the test results
in [11] show that the proposed algorithm achieved higher detection accuracy and lower
processing time than traditional detectors. Other optimization methods that can be
applied to such problems are given in [14–19]. Table 1 summarizes the
literature review.
By using digital images from cameras and videos together with deep learning models,
machines can accurately identify and classify the four types of mango. Therefore,
in this paper we develop a deep learning model trained on the 1000 images
we collected. Three algorithms are tested: a
convolutional neural network (CNN) and two transfer learning methods,
VGG16 and Xception. The trained model could then be deployed
in a phone system or application so that people can classify a mango cultivar
simply by taking a picture with their phone camera.
Table 1 Summary of literature review
Author | Topic | Objective | Data | Algorithms | Performance (%)
Jaswal et al. [13] | Image classification using convolutional neural networks | Image classification | The images are converted to gray scale | CNN | 95
Chung and Van Tai [9] | A fruits recognition system based on a modern deep learning technique | Fruits recognition | Fruit 360 dataset | CNN/DL | 95
Shaohua and Goudos [11] | Faster R-CNN for multi-class fruit detection using a robotic vision system | Multi-class fruit detection using a robotic vision system | Fruit images | Faster CNN | 86.41
Osako et al. [12] | Cultivar discrimination of litchi fruit images using deep learning | Cultivar discrimination of litchi fruit images | Litchi fruit images | DL | 98.33
Andrea et al. [10] | A novel deep learning based approach for seed image classification and retrieval | Seed image classification and retrieval | Seeds pictures | CNNs with different structure | 95.65
2 Methodology
2.1 Dataset
The dataset for this study consists of 1000 mango photographs
divided into four categories (Alampur Baneshan, Alphonso, Harum Manis, and Keitt), with
250 images per category, all collected from Google Images. Figure 1
shows some examples of each type of mango.
All images have three color channels and are resized to
224 × 224 pixels. Data augmentation is used to increase
the robustness of the model. With this data, we train models
using three different deep learning algorithms: one convolutional neural network
and two transfer learning methods. The literature related to the topic is
summarized in Table 1; the following sections present
the deep learning models we designed and discuss the performance of the trained
models.
Fig. 1 The used image dataset: Alampur Baneshan, Alphonso, Harum Manis, and Keitt
2.2 Data Preparation
2.2.1 Augmentation
Data augmentation is an important step in data processing. It increases the effective
size of the dataset by transforming images, for example by rotating them, zooming, or
changing the color intensity. This helps prevent overfitting and, at the same time,
improves the generalization ability of the model. In all experiments, we use the
Keras ImageDataGenerator class to augment the input image data. Figure 2 shows the
augmentation code used in the experiments.
In the first row, the RGB values are rescaled from the range 0–255 to 0–1.
In the second row, the image is randomly rotated by an angle between 0 and 180 degrees.
In the third and fourth rows, the image is randomly shifted in the horizontal or vertical
direction. In the fifth row, a random shear transform is applied to the image.
In the sixth row, the zoom function randomly scales the image to different
sizes. In addition, horizontal_flip flips the image horizontally with a probability
of 50%. Lastly, the nearest fill mode is the strategy used to fill in
newly created pixels after transformations such as rotation or translation.
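The steps described above can be summarized as a set of keyword arguments. The following is a minimal sketch, not the code of Fig. 2: the parameter names are those of Keras' `ImageDataGenerator`, but the shift, shear, and zoom values are assumed for illustration because the text does not state them. The two small helper functions (`rescale`, `horizontal_flip`) are hypothetical pure-Python stand-ins that show what the first and last transformations do to a toy image.

```python
# Keyword arguments mirroring the augmentation steps described in the text.
# Values marked "assumed" are illustrative and not taken from the chapter.
AUGMENTATION_CONFIG = {
    "rescale": 1.0 / 255,        # map RGB values from 0-255 to 0-1
    "rotation_range": 180,       # random rotation, up to 180 degrees
    "width_shift_range": 0.2,    # assumed fraction of the image width
    "height_shift_range": 0.2,   # assumed fraction of the image height
    "shear_range": 0.2,          # assumed shear intensity
    "zoom_range": 0.2,           # assumed zoom range
    "horizontal_flip": True,     # flip left-right at random
    "fill_mode": "nearest",      # fill new pixels with the nearest value
}

# With TensorFlow installed, the config could be passed on as:
#   from tensorflow.keras.preprocessing.image import ImageDataGenerator
#   datagen = ImageDataGenerator(**AUGMENTATION_CONFIG)


def rescale(image, factor):
    """Multiply every pixel value by `factor` (image: rows of pixel values)."""
    return [[value * factor for value in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


tiny = [[0, 255], [51, 102]]     # 2 x 2 single-channel toy image
scaled = rescale(tiny, AUGMENTATION_CONFIG["rescale"])
flipped = horizontal_flip(tiny)
```

Keeping the settings in one dictionary makes it easy to reuse exactly the same augmentation for the CNN, VGG16, and Xception experiments.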
2.3 Proposed CNN Architecture
A convolutional neural network (CNN) is a kind of feedforward neural network that
performs very well in large-scale image processing. A convolutional neural
network consists of one or more convolution layers with their associated weights and
pooling layers, topped by fully connected layers. Compared with other deep
learning structures, convolutional neural networks can give better results in image
and speech recognition.
Fig. 2 Augmentation code
First, the training set is augmented, because deep learning generally requires a
sufficient number of samples. The more samples there are,
the better the trained model performs and the stronger its generalization
ability. The input images are therefore subjected to simple transformations such as
translation, scaling, and color change.
As shown in Fig. 3, the CNN architecture consists of five convolution
layers, each followed by a max pooling layer, and two fully connected layers.
The network input is a 224 × 224 × 3 RGB image. Every convolution layer
uses 3 × 3 kernels with ReLU as the activation function and is followed by a
2 × 2 max pooling layer. Convolution layers 1 to 5 contain 32, 64, 128, 256, and 512
convolution kernels, respectively.
Flatten layer: converts the multi-dimensional output of the last pooling layer into a
one-dimensional vector that is fed to the fully connected layer.
Fully connected layer: Dense(256, activation = ‘relu’). ReLU keeps the computation
fast, and dropout with a rate of 0.5 is then applied. Finally, the classification layer
Dense(4, activation = ‘softmax’) predicts the output of the model,
representing the four different kinds of mango. SGD: the SGD optimizer is configured
with lr = 0.001, decay = 1e−6, momentum = 0.9, and nesterov = True.
Fig. 3 CNN model
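To make the layer description concrete, the following sketch traces the feature-map size and the number of trainable parameters through the five convolution and pooling stages and the two dense layers. It assumes stride-1 convolutions without padding (the Keras `Conv2D` defaults) and non-overlapping 2 × 2 pooling; the chapter does not state the padding, so the numbers in the model summary of Fig. 6 may differ. The function names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def trace_cnn(input_size=224, channels=3, filters=(32, 64, 128, 256, 512)):
    """Return (flattened feature size, total trainable parameters)."""
    size, in_ch, params = input_size, channels, 0
    for out_ch in filters:
        params += (3 * 3 * in_ch + 1) * out_ch   # weights + one bias per kernel
        size = pool_output(conv_output(size))    # conv 3x3, then pool 2x2
        in_ch = out_ch
    flat = size * size * in_ch                   # output of the Flatten layer
    params += (flat + 1) * 256                   # Dense(256)
    params += (256 + 1) * 4                      # Dense(4) softmax output
    return flat, params


flat_features, total_params = trace_cnn()
```

Under these assumptions the spatial size shrinks from 224 to 5 over the five stages, so the Flatten layer hands 5 × 5 × 512 = 12,800 features to Dense(256), and most of the parameters sit in that first dense layer.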
2.4.1 VGG16
VGG16 is a convolutional neural network (CNN) architecture proposed by K.
Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep
Convolutional Networks for Large-Scale Image Recognition”. The model
achieves a top-1 accuracy of 0.713 and a top-5 accuracy of 0.901 on ImageNet, which
contains 14 million images with 1000 class labels. The model contains five
convolutional blocks (13 convolutional layers in total), a flatten layer, and a fully
connected part consisting of two layers with 4096 neurons each. The original output
layer has 1000 units. However, our dataset has only four class labels (Alampur Baneshan,
Alphonso, Harum Manis, and Keitt), so in our experiments we change the number of
output units to 4. The activation function is ReLU in the hidden layers and softmax
in the output layer. Figure 4 shows the summary of the VGG16 model.
2.4.2 Xception
Xception is a convolutional neural network (CNN) architecture based on Inception,
as presented in Fig. 5. The Xception architecture has 36 convolution layers, which
form the feature extraction base of the network. In our experiment we focus
on mango image classification, so the convolutional base is followed by a logistic
regression layer. A fully connected layer is inserted before the logistic
regression layer, as discussed in the section on the effect of the dense layer. The
36 convolution layers are organized into 14 modules, all of which have linear residual
connections around them except the first and last modules. In short, the Xception
architecture is a linear stack of depthwise separable convolution layers with residual
connections. This makes the architecture very easy to define and modify based on
requirements; with high-level libraries such as Keras or TensorFlow-Slim, it requires
very little code.
Fig. 4 VGG16 model
Fig. 5 Xception
3 Experiment Result
3.1 CNN
3.1.1 Experimental Setup
The dataset contains 1000 mango images, all 224 × 224 pixels in
size, in four classes: Alampur Baneshan, Alphonso, Harum Manis,
and Keitt. The data is split into 60% training, 20% validation, and 20% test sets. The deep
learning experiments are carried out in a local Jupyter notebook. Figure 6 shows the
model summary and the model architecture, including the input and output of each
layer.
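The 60/20/20 split can be sketched as follows. This is an illustration under assumptions: the chapter does not say whether the split was stratified or which random seed was used, so the per-class (stratified) splitting, the seed, and the function name `stratified_split` are hypothetical choices.

```python
import random


def stratified_split(labels, percents=(60, 20, 20), seed=0):
    """Split sample indices into train/validation/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


classes = ["Alampur Baneshan", "Alphonso", "Harum Manis", "Keitt"]
labels = [name for name in classes for _ in range(250)]   # 1000 labels
train_idx, val_idx, test_idx = stratified_split(labels)
```

With 250 images per class, this yields 150 training, 50 validation, and 50 test images for each cultivar, that is, 600, 200, and 200 images overall.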
Fig. 6 a Model summary, b model architecture

Fig. 7 Dense layer
3.1.2 Dense Layer
The Flatten layer is used to “flatten” the input, that is, to turn a multi-dimensional
input into a one-dimensional one. It is commonly used in the transition from the
convolution layers to the fully connected (Dense) layers, as shown in Fig. 7.
In other words, a Dense fully connected layer cannot be connected directly after a
convolution layer: the output of the convolution layer must first be
flattened (Flatten), and then the Dense layer can be added. After
Dense(256, activation = ‘relu’), training uses standard dropout with a drop rate
of 0.5, so each neuron in that layer has a 50% probability
of being dropped during training. The last fully connected layer uses softmax to
output the 4 categories.
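The effect of a drop rate of 0.5 can be illustrated with a few lines of plain Python. The sketch below implements inverted dropout, the variant used by the Keras `Dropout` layer, in which surviving activations are scaled by 1 / (1 − rate) during training and nothing is changed at inference time. The function name and the all-ones input vector are hypothetical.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` during
    training and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * keep_scale
            for value in activations]


hidden = [1.0] * 256                      # stand-in for the Dense(256) output
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
kept_at_inference = dropout(hidden, training=False)
```

Because the survivors are scaled up, the expected sum of the activations is the same with and without dropout, which is why no rescaling is needed when the model is evaluated on the test set.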
3.1.3 Model Optimizer
The model uses the SGD optimizer with lr = 0.001, decay = 1e−6, momentum = 0.9, and
nesterov = True; gradient descent drives the loss down. After training the model,
the accuracy is calculated on the test set.
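For readers unfamiliar with these settings, the sketch below shows one common form of the update they describe: time-based learning-rate decay combined with Nesterov momentum. It follows the rule used by the classic Keras SGD optimizer as far as we are aware; treat the exact formula, the function name, and the toy objective as assumptions rather than the chapter's implementation.

```python
def sgd_nesterov_step(weight, grad, velocity, iteration,
                      lr=0.001, decay=1e-6, momentum=0.9):
    """One Nesterov-momentum SGD update with time-based learning-rate decay."""
    lr_t = lr / (1.0 + decay * iteration)                   # decayed learning rate
    velocity = momentum * velocity - lr_t * grad            # update the velocity
    weight = weight + momentum * velocity - lr_t * grad     # Nesterov look-ahead
    return weight, velocity


# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_nesterov_step(w, 2.0 * w, v, step)
```

With decay = 1e−6 the learning rate changes very little over a few thousand iterations, so in these experiments the momentum term is likely to matter more than the decay.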
3.1.4 Number of Epochs
We trained the model for 10, 50, and 100 epochs; Fig. 8 shows the
accuracy and loss for these runs. With 10 epochs, the test accuracy is
0.65 and the loss is 0.82. With 50 epochs, the test accuracy is 0.78 and the loss
is 0.67. With 100 epochs, the test accuracy is 0.75 and the loss is 1.07.
3.1.5 Learning Rate
Table 2 shows the effect of different learning rates (LR) on accuracy. Figure 9 shows the
impact of different LR values on the accuracy of the training set, the accuracy of the
validation set, and the loss.
Fig. 8 Epochs 10, epochs 50, and epochs 100
Table 2 The effect of different LR on accuracy
Epochs | LR | Test set accuracy | Loss
10 | 0.01 | 0.72 | 0.82
10 | 0.001 | 0.65 | 0.82
50 | 0.001 | 0.78 | 0.67
100 | 0.001 | 0.75 | 1.07
In this section we conduct our transfer learning experiments, following the guideline
proposed in [20]. Based on the guideline figure, it is
more suitable for us to follow the third and fourth quadrants, since our image data
has only 1000 samples, which can be considered a small quantity. In the first
experiment, we train the original model without freezing any layer, with results shown
in Figs. 10 and 11. In the second experiment, we fine-tune the lower
layers of the pretrained model, and in the last experiment we fine-tune the
output dense layers of the pretrained model.
3.2.1 VGG16
Experiment 1: Train the entire model with the original design (no layers frozen)
The hyperparameters for this experiment are a batch size of 2, a learning rate
of 0.0001, and 18 and 100 epochs. The output layer is changed
from 1000 to 4 units.
Fig. 9 The impact of different LR on the accuracy of the training set and the accuracy of the
validation set, and the loss: A (epochs 10, LR = 0.01), B (epochs 10, LR = 0.001),
C (epochs 50, LR = 0.001), D (epochs 100, LR = 0.001)
Fig. 10 The result obtained from 100 epochs
Fig. 11 The result obtained from 18 epochs
In this first experiment, the model does not perform well on our data.
As the figures and Table 3 show, the model overfits when trained for 100 epochs,
and performance is poor: the accuracy obtained for both models is lower than 0.4.
We therefore proceed to Experiment 2 to test different methods and hyperparameters.
Experiment 2: Train the model by freezing the convolutional layers
In this experiment we froze all the convolutional layers and trained the
model with the original fully connected layers, and compared it with the fully
connected layer used in our CNN model. The result obtained with the original
VGG16 dense layers is shown in Fig. 12. Both
models are trained for 100 epochs.
Table 3 The obtained results for both models
Epochs | Accuracy | Loss
18 | 0.3 | 0.665
100 | 0.25 | 0.5627
Fig. 12 The result for original design
With this model we obtain an accuracy of 61.5% with a loss of 3.9735.
The result is not ideal, and the model also suffers from overfitting: as
Fig. 12 shows, the validation accuracy is far from the training accuracy.
Next, we train again after modifying the fully connected layer according to
the method in [21]; the results are shown in Fig. 13.
For this model the best accuracy is 0.61 and the loss is 1.1217. Figure 13
indicates that the model still suffers from
overfitting, and its accuracy is not much different from that of the original design;
in terms of loss, however, this model is better. We therefore use
this new design in the next experiment.
Furthermore, we also tried reducing the number of neurons from 4096 to 128
units. Surprisingly, the result was better than in the previous experiment, with
an accuracy of 66.5% and a loss of 0.5039. Figure 14 shows the result obtained
in this experiment.
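One way to see why the smaller head may behave better on only 1000 images is to count its parameters. The sketch below compares a head with two 4096-unit dense layers against one with two 128-unit dense layers, both ending in a 4-way output. It assumes the standard VGG16 convolutional output of 7 × 7 × 512 for a 224 × 224 input and that both hidden layers are reduced to 128 units, which the text does not state explicitly; the function names are hypothetical.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def head_params(flat_features, hidden_units, num_classes, hidden_layers=2):
    """Parameter count of a head with `hidden_layers` equal-width dense layers."""
    total, inputs = 0, flat_features
    for _ in range(hidden_layers):
        total += dense_params(inputs, hidden_units)
        inputs = hidden_units
    return total + dense_params(inputs, num_classes)


FLAT = 7 * 7 * 512          # VGG16 convolutional output for a 224 x 224 input
original_head = head_params(FLAT, 4096, 4)    # two 4096-unit layers + output
reduced_head = head_params(FLAT, 128, 4)      # two 128-unit layers + output
```

Under these assumptions the head shrinks from roughly 119.6 million to roughly 3.2 million trainable parameters, which is a plausible reason for the reduced loss, although the chapter does not analyze the cause.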
Experiment 3: Train some layers and leave others frozen
In this experiment we freeze the first few layers and keep the remaining layers
trainable. We use 100 epochs, and the other settings are the same
as in Experiment 1.
Fig. 13 The result after changing the fully connected layer

Fig. 14 Result obtained with 128 neurons
Fig. 15 The result for freezing the first 10 layers
In this experiment, we froze the first 10 and the first 15 layers, as shown in
Figs. 15 and 16. The best result is an accuracy of 72.5% with a loss
of 0.4586, obtained by the model with the first 15 layers frozen. This is a clear
improvement over the original model and the previous experiments: compared with the
66.5% obtained in the previous experiment, accuracy improved by 6 percentage points
(about 9% in relative terms). However, the overfitting issue is still
not solved; according to the literature, this problem may be
caused by the data we collected. We therefore conclude that
the model trained with 128 neurons and with the first fifteen layers frozen
is the best model obtained in this experiment.
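Freezing the first layers of a pretrained network usually amounts to a short loop over its layers. The following sketch uses a hypothetical `Layer` stand-in with a `trainable` flag so that it runs without a deep learning library; with Keras, the same loop would run over `model.layers` of the pretrained VGG16. The 19-layer stack is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Minimal stand-in for a Keras layer: a name and a `trainable` flag."""
    name: str
    trainable: bool = True


def freeze_first(layers, n):
    """Freeze the first `n` layers and leave the remaining ones trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n
    return layers


# Hypothetical 19-layer stack standing in for the pretrained base.
stack = [Layer(f"layer_{i}") for i in range(19)]
freeze_first(stack, 15)
trainable_names = [layer.name for layer in stack if layer.trainable]
```

Freezing 15 instead of 10 layers leaves fewer pretrained weights to be updated on the small mango dataset, which is consistent with the better result reported for that setting.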
3.3 Xception
3.3.1 Experimental Setup
Fig. 16 The result for freezing the first 15 layers
First, we create a baseline model, and then modify one parameter at a time
and compare the result with the baseline model to determine its impact. To
this end, a total of five experiments were designed. Experiment 1:
create a baseline model, mainly varying the number of frozen layers. Experiment 2:
modify the optimizer and compare the performance with the baseline model. Experiment
3: modify the dense layers and compare the performance with the baseline model.
Experiment 4: modify the number of epochs and compare the performance with the baseline
model. Experiment 5: modify the learning rate and compare the performance with the
baseline model. We divide the dataset into three parts: training dataset, validation
dataset, and testing dataset. The performance on the test dataset is taken as the
evaluation standard of the model.
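The one-parameter-at-a-time design can be written down as a baseline configuration plus one override per experiment. The sketch below is an illustration only: the dictionary keys, the short labels for the dense heads, and the helper names are hypothetical, while the values (RMSprop, Adagrad, 50 and 100 epochs, learning rate scheduler, ReduceLROnPlateau) follow the experiment settings reported in the tables.

```python
BASELINE = {
    "optimizer": "RMSprop",
    "dense_head": "baseline",
    "epochs": 50,
    "learning_rate": "scheduler",
}

# Each later experiment overrides exactly one baseline setting.
OVERRIDES = {
    "experiment_2": {"optimizer": "Adagrad"},
    "experiment_3": {"dense_head": "batch-norm variant"},
    "experiment_4": {"epochs": 100},
    "experiment_5": {"learning_rate": "ReduceLROnPlateau"},
}


def build_config(name):
    """Return the baseline settings with the named experiment's override."""
    config = dict(BASELINE)          # copy so the baseline stays untouched
    config.update(OVERRIDES.get(name, {}))
    return config


def changed_keys(config):
    """List the settings that differ from the baseline."""
    return [key for key in BASELINE if config[key] != BASELINE[key]]


configs = {name: build_config(name) for name in OVERRIDES}
```

Checking that every configuration differs from the baseline in exactly one key guards against accidentally changing two factors at once, which would make the comparison with the baseline hard to interpret.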
3.3.2 Experiment 1: Create a Baseline Model
When creating the baseline model, we try freezing all, part, and none of the layers in
the original model. Table 4 shows the baseline model settings. Table 5 shows the
performance of the model with different frozen layers.
Table 4 Baseline model setting
 | Optimizer | Dense layer | Number of epochs | Learning rate
Baseline model | RMSprop | x = GlobalAveragePooling2D()(x); x = Dropout(0.5)(x); x = Dense(1024)(x); x = Activation(‘relu’)(x); x = Dropout(0.5)(x); x = Dense(512)(x); Predictions = Dense(4, activation = ‘sigmoid’)(x) | 50 | Learning rate scheduler
Table 5 Performance of model with different freezing layer
 | Accuracy (test) | Loss (test)
Freeze all | 0.185 | 1.3884
Freeze part | 0.42 | 1.4410
Freeze no | 0.78 | 1.5862
Since the model performs best with no layers frozen, we choose the
fully unfrozen model as the baseline, and all the following experiments also leave
every layer unfrozen.
3.3.3 Experiment 2: Effect of Optimizers
To form a controlled comparison with Experiment 1, only the optimizer is modified
here. Table 6 shows the Experiment 2 model settings and Table 7 the Experiment 2 model
comparison.
The experimental results in Fig. 17 show that the accuracy of the
model decreased considerably. For our dataset, RMSprop is the better choice. A possible
reason is that the learning rate under Adagrad decreases more slowly than under RMSprop,
which leads to slow convergence of the model.
Table 6 Experiment 2 model setting
 | Optimizer | Dense layer | Number of epochs | Learning rate
Experiment 2 | Adagrad() | x =; x = Dropout(0.5)(x); x = Dense(1024)(x); x = Dropout(0.5)(x); x = Dense(512)(x) | 50 | Learning rate
Table 7 Experiment 2 model comparison
 | Accuracy (test) | Loss (test)
Baseline model | 0.78 | 1.5862
Experiment 2 model | 0.315 | 1.3310
Fig. 17 Comparison of learning rate between experiment 1 and experiment 2
3.3.4 Experiment 3: Effect of Dense Layer
Compared with Experiment 1, we changed the settings of the dense layers. Table 8
shows the Experiment 3 settings. Table 9 shows the Experiment 3 model comparison.
The dense layers of the baseline model clearly perform better, which
suggests that they distinguish image features better.
Table 8 Experiment 3 setting
 | Optimizer | Dense layer | Number of epochs | Learning rate
Experiment 3 | RMSprop | x =; x = Dense(1024)(x); x = BatchNormalization()(x); x = Dropout(0.2)(x); x = Dense(256)(x); x = BatchNormalization()(x); x = Dropout(0.2)(x) | 50 | Learning rate
Table 9 Experiment 3 model comparison
Table 10 Experiment 4 setting
 | Optimizer | Dense layer | Number of epochs | Learning rate
Experiment 4 | RMSprop | x =; x = Dropout(0.5)(x); x = Dense(1024)(x); x = Dropout(0.5)(x); x = Dense(512)(x) | 100 | Learning rate
Table 11 Experiment 4 model comparison
 | Accuracy (test) | Loss (test)
Experiment 4 model | 0.61 | 3.2966
3.3.5 Experiment 4: Effect of Number of Epochs
Table 10 shows the Experiment 4 settings, and Table 11 shows the Experiment 4 model
comparison. In this experiment, the number of epochs was increased from 50 to 100.
Table 11 shows that the accuracy of the model declined.
However, the training log shows that the accuracy on the training
dataset reaches 0.96, which indicates that the larger number of epochs causes the model
to overfit.
3.3.6 Experiment 5: Effect of Learning Rate
In Experiment 2, we tested the influence of different learning rates on the accuracy
of the model by changing the optimizer. In this experiment, we test the influence
of the learning rate on accuracy through ReduceLROnPlateau. Table 12 shows the
Experiment 5 settings. Table 13 shows the Experiment 5 model comparison. According to
the training log, the ReduceLROnPlateau callback keeps the learning rate at 0.0001.
However, the result is not as good as that of the baseline RMSprop model.
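The idea behind reducing the learning rate on a plateau can be sketched in a few lines. The function below is a simplified illustration, not the Keras implementation: it omits the cooldown and minimum-improvement options of the Keras `ReduceLROnPlateau` callback, and the validation losses, the initial learning rate, and the patience value are hypothetical.

```python
def reduce_lr_on_plateau(val_losses, initial_lr, factor=0.1, patience=10,
                         min_lr=0.0):
    """Return the learning rate in effect after each epoch.

    The rate is multiplied by `factor` whenever the validation loss has not
    improved for `patience` consecutive epochs.
    """
    lr, best, wait, history = initial_lr, float("inf"), 0, []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0              # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:              # plateau detected: lower the rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation losses: two improvements, then a plateau.
losses = [1.0, 0.8, 0.9, 0.9, 0.9, 0.7]
schedule = reduce_lr_on_plateau(losses, initial_lr=0.001, patience=3)
```

In this toy run the rate stays at 0.001 for four epochs and drops to 0.0001 once the loss has failed to improve for three epochs in a row.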
In this paper we have trained models with three deep learning algorithms:
a convolutional neural network, transfer learning with Xception, and transfer learning
with VGG16. Table 14 shows the best result obtained for each model.

Table 12 Experiment 5 setting
Model: Experiment 5 model
Optimizer: RMSprop
Number of epochs: 50
Learning rate: ReduceLROnPlateau
Dense layer:
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
x = Dense(1024)(x)
x = Dropout(0.5)(x)
x = Dense(512)(x)
Table 13 Experiment 5 model comparison
Table 14 Accuracy comparison
Model Accuracy Loss
CNN 0.78 0.67
VGG16 0.725 0.4586
Xception 0.78 1.59
Two issues occurred in these experiments. First, although the results are good, overfitting could not be eliminated. Second, the parameters of the pre-trained models do not fit our dataset accurately.
To address the first issue, we could increase the number of samples in the dataset and diversify the image collection, or improve the augmentation function, which can reduce overfitting effectively. To address the second issue, we would need to retrain all the parameters on our training data, which is quite time consuming. To shorten the training time, we could use a cloud virtual machine that processes the data and returns results quickly, which would allow us to run more experiments in a limited amount of time.
4 Conclusion
In this study, three CNN-based models were proposed: a custom CNN model and two transfer learning models, Xception and VGG16. Comparing the accuracy of the three models, we conclude that the CNN model shown in Sect. 3.3 is our best model. Xception achieves the same accuracy, but the loss of the CNN model is lower. Compared with VGG16, however, the loss of the CNN model is less favorable. Overall, we choose the CNN model as the best of the three because it offers the most balanced performance across accuracy and loss.
References
1. Alhaj, Y. A., Dahou, A., Al-Qaness, M. A., Abualigah, L., Abbasi, A. A., Almaweri, N. A. O.,
Elaziz, M. A., & Damaševičius, R. (2022). A novel text classification technique using improved
particle swarm optimization: A case study of Arabic language. Future Internet, 14(7), 194.
3. Wu, D., Jia, H., Abualigah, L., Xing, Z., Zheng, R., Wang, H., & Altalhi, M. (2022). Enhance
teaching-learning-based optimization for tsallis-entropy-based feature selection classification
approach. Processes, 10(2), 360.
4. Ali, M. A., Balasubramanian, K., Krishnamoorthy, G. D., Muthusamy, S., Pandiyan, S.,
Panchal, H., Mann, S., Thangaraj, K., El-Attar, N. E., Abualigah, L., & Elminaam, A. (2022).
Classification of glaucoma based on elephant-herding optimization algorithm and deep belief
network. Electronics, 11(11), 1763.
5. Abualigah, L., Kareem, N. K., Omari, M., Elaziz, M. A., & Gandomi, A. H. (2021). Survey
on Twitter sentiment analysis: Architecture, classifications, and challenges. In Deep learning
approaches for spoken and natural language processing (pp. 1–18). Springer.
6. Fan, H., Du, W., Dahou, A., Ewees, A. A., Yousri, D., Elaziz, M. A., Elsheikh, A. H., Abualigah,
L., & Al-Qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
world application UK Brexit. Electronics, 10(11), 1332.
7. Alomari, O. A., Khader, A. T., Al-Betar, M. A., & Abualigah, L. M. (2017). MRMR BA: A
hybrid gene selection algorithm for cancer classification. Journal of Theoretical and Applied
Information Technology, 95(12), 2610–2618.
8. Alomari, O. A., Khader, A. T., Al-Betar, M. A., & Abualigah, L. M. (2017). Gene selection for
cancer classification by combining minimum redundancy maximum relevancy and bat-inspired
algorithm. International Journal of Data Mining and Bioinformatics, 19(1), 32–51.
9. Chung, D. T. P., & Van Tai, D. (2019). A fruit recognition system based on a modern deep
learning technique. Journal of Physics: Conference Series, 1327.
10. Andrea, L., Mauro, L., & Di Ruberto, C. (2021). A novel deep learning based approach for
seed image classification and retrieval. Computers and Electronics in Agriculture, 187.
11. Shaohua, W., & Guodos, S. (2019). Faster R-CNN for multi-class fruit detection using a robotic
vision system. School of Information and Safety Engineering.
12. Osako, Y., et al. (2020). Cultivar discrimination of litchi fruit images using deep learning.
Scientia Horticulturae, 269.
13. Jaswal, D., Vishvanathan, S., & Soman, K. P. (2014). Image classification using convolutional
neural networks. International Journal of Scientific and Engineering Research, 5(6), 1661–
1668.
20. Diahashree, G. (2017, June 1). Transfer learning and the art of using pre-trained models in deep
learning. https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tun
ing-a-pre-trained-model/
21. Transfer learning in Keras using VGG16, 2020. https://thebinarynotes.com/transfer-learning-
keras-vgg16/
Salak Image Classification Method Based
Deep Learning Technique Using Two
Transfer Learning Models
Lau Wei Theng, Moo Mei San, Ong Zhi Cheng, Wong Wei Shen,
Putra Sumari, Laith Abualigah, Raed Abu Zitar, Davut Izci, Mehdi Jamei,
and Shadi Al-Zu’bi
Abstract Salak is one of the fruit plants of Southeast Asia, and there are at least 30 cultivars of salak. The size, shape, skin color, sweetness and even flesh color differ depending on the cultivar. Thus, classifying salak by cultivar has become a daily task for fruit farmers. Many computer vision techniques can be used for fruit classification, and deep learning is the most promising of them compared with other Machine Learning (ML) algorithms. This paper presents an image classification method for 4 types of salak (salak pondoh, salak gading, salak sideempuan and salak affinis) using a Convolutional Neural Network (CNN), VGG16 and ResNet50. The dataset consists of 1000 images, with 250 images for each type of salak. The dataset is pre-processed to standardize it by resizing the images to 224 × 224 pixels, converting them to JPG format and applying augmentation. Based on the accuracy results, the best model for salak classification is ResNet50, which gave an accuracy of 84%, followed by VGG16 with 77% and CNN with 31%.
L. W. Theng · M. M. San · O. Z. Cheng · W. W. Shen · P. Sumari · L. Abualigah (B)

Malaysia
L. Abualigah
Jordan
R. A. Zitar
Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, 38044 Abu Dhabi,
United Arab Emirates
D. Izci
Department of Electronics and Automation, Batman University, Batman 72060, Turkey
M. Jamei
Faculty of Engineering, Shohadaye Hoveizeh Campus of Technology, Shahid Chamran University
of Ahvaz, Dashte Azadegan, Iran
S. Al-Zu’bi
Faculty of Science and IT, Al-Zaytoonah University of Jordan, Amman, Jordan

https://doi.org/10.1007/978-3-031-17576-3_4
Keywords Salak classification · Deep learning · CNN · ResNet50 · VGG16
1 Introduction
Snake fruit, also known as salak or Salacca zalacca, is a species of palm tree that is native to Indonesia and is now grown throughout Southeast Asia [1]. It is called snake fruit because of its reddish-brown scaly skin [2]. The inside of the fruit consists of 3 lobes that resemble large, white, peeled garlic cloves. The taste is commonly sweet and acidic, with an apple-like texture [2]. There are many types of salak, such as salak pondoh, salak sidempuan, salak gading and salak affinis. They look very similar, and it is hard to differentiate among them. This is where deep learning comes into the picture.
Deep learning, also known as deep neural networks or deep neural learning, processes data and creates patterns for decision making by imitating the human brain [3]. It uses neuron nodes that are linked together in a hierarchical neural network to analyze the incoming data [3]. Image recognition is one of the most popular deep learning applications and helps many fields, especially fruit agriculture, where it is used to classify fruit.
In the past few decades, CNNs and deep learning have proven to be powerful tools for handling large amounts of data, especially in the classification of fruits, characters and animals [4–8]. However, image classification is easier said than done, and it comes with challenges. Image classification is mainly the process of labelling an image according to its patterns (classes) [9, 10]. For example, an apple can appear in at least three colors, such as red, green and yellowish. Common problems in fruit detection are variations in size, color and viewpoint; for instance, an input image of a red cherry or a tomato can look similar to a red apple [10]. A study on fruit classification systems using computer vision applies image classification and processing to fruit quality grading, sorting and disease detection before the fruit is sold to the market [11]. This approach benefits the fruit industry by saving time, reducing human error, increasing speed and efficiency, and maintaining good consumer relations [11]. Fruit disease detection uses techniques that involve clustering, color-based segmentation and other disease categorization classifiers [11].
Convolutional Neural Network (CNN) is one of the popular algorithms used to identify patterns in an image [12]. An image is a picture that shows the appearance of an object such as a durian, strawberry or mango. It is easy for the human eye to detect the object in the image, but a computer reads it only as pixel values stored in binary format. A CNN is a kind of deep neural network that is efficient and reliable for image processing. A CNN combines a few convolution layers, pooling layers and a fully connected neural network [13]. First, the input image is passed to the convolution layer one section at a time. The convolution layer consists of a number of filters that extract features from each section of the input image with a kernel (K) of size 3 × 3 × 1 [13]. Next, the result passes through a pooling layer that applies non-linear down-sampling, which halves the size of the feature maps, also referred to as activation maps [13]. There are two kinds of pooling layer, max pooling and average pooling [13]. Max pooling takes the largest value in each section of the feature map, while average pooling takes the mean value (total sum/number of values in the pool). The sequence of convolution and pooling layers is then repeated to extract more information from the image. Finally, a fully connected layer connects all neurons to the output classes. It assigns a probability to each possible class, such as 0.97 for apple, 0.02 for banana and 0.01 for durian, and the class with the highest probability is selected as the result.
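The pooling and class selection steps described above can be illustrated with a small plain-Python sketch. The helper `pool2d` and the 4 × 4 feature map are hypothetical and only serve the illustration; real frameworks operate on multi-channel tensors.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Down-sample a 2D list using non-overlapping size x size windows."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j] for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(window))                 # largest value in the window
            else:
                pooled_row.append(sum(window) / len(window))   # total sum / number of values
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map; pooling with size 2 halves each dimension.
feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]
max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")

# The final layer yields one probability per class; the largest one wins.
class_probabilities = {"apple": 0.97, "banana": 0.02, "durian": 0.01}
predicted = max(class_probabilities, key=class_probabilities.get)
print(max_pooled, avg_pooled, predicted)
```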
A good way to solve an image classification problem quickly is to use transfer learning models. One of the most significant advantages of transfer learning is that it reduces developer effort: a pre-trained model can be applied directly to the image classification problem at hand, without spending time building a new model from scratch [14]. Rather than only applying a transfer learning model directly, the developer should understand the problem definition of the image classification task and fine-tune certain convolution layers, freezing some layers and training others to fit the objective. Various transfer learning models can be used, such as VGG, AlexNet, MobileNet and ResNet [14].
Beyond a plain CNN for image classification, Karen Simonyan and Andrew Zisserman from the University of Oxford published a paper titled “Very Deep Convolutional Networks for Large-Scale Image Recognition”, which introduced the VGG16 model [15]. Like AlexNet, VGG16 has a large number of parameters, and it consists of 16 weight layers. The first convolution block takes a fixed-size (224 × 224) RGB input and is followed by a (2 × 2) max pooling layer. The second convolution block operates on (112 × 112) feature maps and is again followed by a (2 × 2) max pooling layer. The third to fifth blocks each consist of three convolution layers and one max pooling layer, and the network ends with three fully connected (dense) layers. Each max pooling layer halves the spatial size of the feature maps. The model achieved up to 92.7% top-5 accuracy on ImageNet [15]. Although the results are good, the model also has disadvantages: it takes a long time to train and the architecture is large [16].
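The halving effect of the five max pooling stages, and the count of 16 weight layers, can be checked with a short calculation. The per-block convolution counts below follow the standard VGG16 configuration (two convolution layers in each of the first two blocks, three in each of the remaining three); the variable names are illustrative only.

```python
# Convolution layers per block in the standard VGG16 configuration.
CONV_LAYERS_PER_BLOCK = [2, 2, 3, 3, 3]
DENSE_LAYERS = 3

# Each block ends with a max pooling layer that halves the spatial size.
sizes = [224]
for _ in CONV_LAYERS_PER_BLOCK:
    sizes.append(sizes[-1] // 2)

total_weight_layers = sum(CONV_LAYERS_PER_BLOCK) + DENSE_LAYERS
print(sizes)                # spatial size before and after each pooling stage
print(total_weight_layers)  # convolution layers plus dense layers
```

Under these assumptions, a 224 × 224 input reaches the dense layers as a 7 × 7 feature map.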
Another popular transfer learning model is ResNet50, a residual neural network [17]. ResNet50 uses fewer parameters than VGG16, which makes the model faster to run because it carries fewer weights. For feature extraction and weight learning, ResNet50 uses convolution layers followed by a softmax layer, in the same way as a CNN [18]. In pre-processing, all images are resized to (224 × 224) pixels to fit the model input size [18]. Convolution filtering is then applied for feature extraction, with the result depending on the filter mask applied in the (3 × 3) kernel [18]. Next, each section of the input image goes through feature extraction with a 2D convolution filter [18]. The more valuable features are extracted depending on the weights applied to the image. Each layer passes its output through an activation layer so that complex features can be learned. Lastly, the fully connected layer is trained by repeating the backpropagation process for the given number of iterations [18]. According to the Keras Applications results, this model achieves 92.1% top-5 accuracy with 25,636,712 parameters [19]. Other optimization methods can also be used to optimize such problems, as given in [20–25].
Fig. 1 Sample of salak dataset
The main goal of this paper is to develop a CNN model and 2 transfer learning models, VGG16 and ResNet50, for image classification. The developed models should be able to classify salak images into 4 classes: salak pondoh, salak gading, salak sideempuan and salak affinis.
2 Dataset
2.1 Dataset Description
The dataset is a collection of images from Google, Facebook, Instagram, YouTube, etc. All collected images are real-life color photos with no more than 30% noise. There are a total of 1000 color images in the salak dataset, with 250 images for each of the classes (salak pondoh, salak affinis, salak gading and salak sideempuan). Figure 1 shows a sample of the salak dataset that has been collected.
2.2 Dataset Preparation
Dataset preparation processes or transforms the collected images into a form that can be used in designing the model. In this study, the images are resized, augmented and converted into a standard format.
• Resizing: Each image is resized to 224 × 224 pixels with 3 color channels.
• Image format: Each image is converted into the standard JPEG format.
• Augmentation: Images are transformed by rotating, flipping, etc. to expand the size of the dataset. This is applied only to classes with insufficient images, namely salak affinis, salak gading and salak sideempuan. Figure 2 shows a sample of augmented images.
All 1000 images are split into 70% training, 20% validation and 10% testing datasets. The training and validation datasets are used to build the model, while the testing dataset is held out from training and used to test the overall accuracy of the model. Figure 3 shows an example of the directory structure, which has a main directory called Salak. The main directory contains 3 folders (training, validation and testing), and each folder has 4 subfolders called Pondoh, Gading, Sideempuan and Affinis, as shown in Figs. 4, 5 and 6.
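A per-class 70/20/10 split of this kind can be sketched as follows. The file names, the helper `split_class` and the fixed random seed are assumptions made for the illustration; the chapter does not describe how the images were actually assigned to the folders.

```python
import random

CLASSES = ["Pondoh", "Gading", "Sideempuan", "Affinis"]
PARTS = ("training", "validation", "testing")


def split_class(files, train=0.7, val=0.2, seed=0):
    """Shuffle one class and cut it into training/validation/testing lists."""
    files = sorted(files)
    random.Random(seed).shuffle(files)   # reproducible shuffle
    n_train = round(len(files) * train)
    n_val = round(len(files) * val)
    return {
        "training": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "testing": files[n_train + n_val:],   # remaining files
    }


# Hypothetical file names: 250 images per class, 1000 in total.
dataset = {name: [f"{name}_{i:03d}.jpg" for i in range(250)] for name in CLASSES}
splits = {name: split_class(files) for name, files in dataset.items()}
counts = {part: sum(len(splits[name][part]) for name in CLASSES) for part in PARTS}
print(counts)
```

Splitting each class separately keeps the 4 classes balanced in all three folders.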
Fig. 2 Sample of augmented images
Fig. 3 Salak directory

Fig. 4 Test directory
Fig. 5 Train directory
Fig. 6 Validation directory
3 Proposed Deep Learning
In this study, a Convolutional Neural Network (CNN) and two transfer learning models, VGG16 and ResNet50, are developed. All the models are trained and tested using the salak dataset to select the one with the best accuracy.
3.1 CNN
In our proposed CNN model, we use 2 convolutional layers, 2 pooling layers, 1 flatten operator and 2 dense layers to generate the desired output. We first take the input images with a size of (224 × 224 × 3) and feed them into the 2 sets of convolutional and pooling layers. The outputs are then flattened into a single dimension and fed into 2 hidden layers before the final layer. The activation function used for the dense layers is relu, and the final layer of the classifier uses softmax as its activation function. Since there are 4 classes in the salak dataset, the final output has 4 nodes (Fig. 7).
Fig. 7 CNN model diagram
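To see how the feature map shrinks before flattening, the spatial sizes can be traced with a short calculation. This is a sketch under stated assumptions: a 3 × 3 kernel, a 2 × 2 pool, no padding, stride 1 for convolution, and hypothetical filter counts of 32 and 64, none of which are specified for the diagram in the chapter.

```python
def conv_output(size, kernel, stride=1):
    """Output width/height of a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_output(size, pool):
    """Output width/height of non-overlapping pooling without padding."""
    return size // pool


FILTERS = (32, 64)   # hypothetical filter counts for the 2 convolutional layers

size = 224
shapes = []
for filters in FILTERS:
    size = conv_output(size, kernel=3)
    shapes.append(("conv", size, size, filters))
    size = pool_output(size, pool=2)
    shapes.append(("pool", size, size, filters))

# Length of the single-dimension vector passed to the dense layers.
flattened = size * size * FILTERS[-1]
print(shapes, flattened)
```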
3.2 VGG16
In VGG16, the convolutional base model is frozen, and we unfreeze the top layer. Two dense layers with 2048 and 1048 units, respectively, are added, followed by an output layer with 4 units, one for each class. The VGG16 model diagram is shown in Fig. 8.
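The number of trainable parameters in such a classifier head can be estimated with a short calculation. This sketch assumes that the head is attached to the flattened 7 × 7 × 512 output of the VGG16 convolutional base for a 224 × 224 input; the chapter does not state whether flattening or pooling is used, so the totals are illustrative only.

```python
def dense_params(inputs, units):
    """Weights plus biases of one fully connected layer."""
    return inputs * units + units


HEAD_UNITS = [2048, 1048, 4]   # two added dense layers and the output layer
inputs = 7 * 7 * 512           # assumed flattened size of the frozen base output

params = []
for units in HEAD_UNITS:
    params.append(dense_params(inputs, units))
    inputs = units             # the next layer receives this layer's outputs

total = sum(params)
print(params, total)
```

Under this assumption, almost all of the new parameters sit in the first added dense layer.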
3.3 ResNet50
In ResNet50, the convolutional base model is frozen, and we unfreeze the top layer. Two dense layers with 2048 and 1048 units, respectively, are added, followed by an output layer with 4 units, one for each class. The ResNet50 model is shown in Figs. 9, 10 and 11.
Fig. 8 VGG16 model diagram
Fig. 9 Overall ResNet50

Fig. 10 Conv block on ResNet50 A
Fig. 11 Conv block on ResNet50 B
There is a total of 1000 color images in the salak dataset, with 250 images for each of the classes (salak pondoh, salak affinis, salak gading and salak sideempuan). All the images are resized to 224 × 224 pixels. The dataset is split into 70% train, 20% validation and 10% test. The train dataset is used to train the model, while the validation dataset is used to evaluate model performance while tuning the hyperparameters. The test dataset acts as new data to evaluate the final model performance. Python is used in these experiments because it has an extensive set of libraries for artificial intelligence and machine learning, such as TensorFlow, Keras and Scikit-learn. We used the Keras API to build, train and validate our models. The Google Colaboratory (Colab) platform is used to perform all the experiments because it requires no setup, is easy to use and allows code to be shared with others. The dataset is uploaded to Google Drive and the path is shared with the team members, who are required to add a shortcut to the shared path in their own drive. Colab allows us to access Google Drive by using the drive module from google.colab. Figure 12 shows the code for mounting the drive. After the link is clicked and the authorization code is keyed in, the drive is mounted, and we can access the same dataset without downloading it.
Fig. 12 Mounting to Google drive
The ImageDataGenerator API is used to return batches of images from the subdirectories Sideempuan, Pondoh, Gading and Affinis. The model summaries for VGG16 and ResNet50 are shown in Figs. 13, 14, 15, 16, 17, 18 and 19.
For the transfer learning models (VGG16 and ResNet50) and CNN, we tune several parameters: the number of epochs, the optimizer, the learning rate and the number of dense layers. For CNN, we additionally tune the filter size, while for the transfer learning models we tune the unfrozen percentage of the model. In all the experiments, the dense layers use the relu activation function, except for the output layer, which uses softmax. Validation and test accuracy are used to evaluate the performance of the models.
4.2 Effect of Kernel Size: CNN
Kernel size refers to the size of the filter that convolves around the feature map. In this experiment, 3 kernel sizes (2, 3 and 4) are used for CNN only, while the VGG16 and ResNet50 models keep their default values.
Figures 20 and 21 show the validation and test accuracy obtained. The validation accuracy is best at 68% when the kernel size is 3, but this setting gives the worst test accuracy of only 20%. The best test accuracy of 31% is obtained when the kernel size is 4.
4.3 Effect of Pool Size: CNN
Pool size refers to the size of the window used to reduce the dimensions of the feature maps. Pooling reduces the number of parameters to learn and the amount of computation performed in the network. In this experiment, 3 pool sizes (2, 3 and 4) are used for the CNN model only, while the other models use their default values.
Figures 22 and 23 show the validation and test accuracy. A pattern similar to that of the kernel size can be seen: the best validation accuracy of 36% is obtained when the pool size is 3, and the best test accuracy of 31% when the pool size is 2.
Fig. 13 VGG16 model summary
Fig. 14 ResNet50 model wrapper summary part 1
4.4 Effect of Epoch
The number of epochs is a hyperparameter of neural network training by gradient descent that controls the number of complete passes through the training dataset. In this experiment, 3 different epoch values are used: 10, 20 and 50.
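The relationship between epochs and the number of weight updates can be made concrete with a short calculation. The training set size of 700 images follows from the 70% split of the 1000 salak images; the batch size of 32 is an assumption, since the chapter does not state it.

```python
import math


def weight_updates(num_samples, batch_size, epochs):
    """Total gradient descent steps: batches per pass times number of passes."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs


TRAIN_IMAGES = 700   # 70% of the 1000 salak images
BATCH_SIZE = 32      # assumed; not stated in the chapter

updates = {epochs: weight_updates(TRAIN_IMAGES, BATCH_SIZE, epochs) for epochs in (10, 20, 50)}
print(updates)
```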
4.4.1 Effect of Epoch: CNN
Based on Fig. 24, the validation accuracy is highest at 35% when the epoch value is 10 and 50, and lowest at 20% when the epoch value is 20. As shown in Fig. 25, the test accuracy is 31% when the epoch value is 10, followed by a steady 27% when the epoch value is 20 and 50.
4.4.2 Effect of Epoch: VGG16
Figures 26 and 27 show the accuracy obtained on the validation and test datasets. The validation accuracy is highest at 75.5% when the epoch value is 10, followed by 71% when the epoch value is 50 and 69.5% when the epoch value is 20. The test accuracy is 75% when the epoch value is 20, 73% when it is 50 and 68% when it is 10.
4.4.3 Effect of Epoch: ResNet50
The validation and test accuracy are shown in Figs. 28 and 29. The epoch value of 10 gives the highest validation accuracy of 84%, and the accuracy decreases as the epoch value increases. The test accuracy peaks at 82% when the epoch value is 20.
Fig. 19 ResNet50 overall model summary
Fig. 20 CNN—effect of kernel size on validation accuracy
4.5 Effect of Optimizer
An optimizer is an algorithm that changes the attributes of the neural network, such as the weight parameters and the learning rate. The objective of the optimizer is to reduce the loss of the neural network by adjusting its parameters. In this experiment, 4 types of optimizer are used: Adam, SGD, Adadelta and Adagrad.
Fig. 21 CNN—effect of kernel size on test accuracy
Fig. 22 CNN—effect of pool size on validation accuracy
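How two of these optimizers change a weight can be sketched on a toy problem. The example below applies the plain SGD update and the Adagrad update to a single weight on a made-up quadratic loss; the loss function, learning rates and step count are assumptions for illustration and are unrelated to the salak models.

```python
import math


def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss."""
    return 2.0 * (w - 3.0)


def sgd_step(w, g, lr):
    # SGD: move against the gradient with a fixed step size.
    return w - lr * g


def adagrad_step(w, g, lr, accum, eps=1e-7):
    # Adagrad: divide the step by the root of the accumulated squared
    # gradients, so the effective learning rate shrinks over time.
    accum += g ** 2
    return w - lr * g / (math.sqrt(accum) + eps), accum


w_sgd, w_ada, accum = 0.0, 0.0, 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad(w_sgd), lr=0.1)
    w_ada, accum = adagrad_step(w_ada, grad(w_ada), lr=0.5, accum=accum)

print(loss(0.0), loss(w_sgd), loss(w_ada))
```

Both update rules reduce the loss here; they differ in how the step size is chosen, which is one reason the optimizers behave differently on the same model.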
4.5.1 Effect of Optimizer: CNN
Figures 30 and 31 show the accuracy on the validation and test datasets for the different optimizers. Adagrad gives the best validation accuracy of 67%, followed by Adadelta with 41.5%, Adam with 35% and SGD with 25%. For the test accuracy, Adam gives the highest value of 31%, compared with 25% for SGD, 19% for Adagrad and 17% for Adadelta.
Fig. 23 CNN—effect of Pool size on test accuracy
Fig. 24 CNN—effect of epoch on validation accuracy
Fig. 25 CNN—effect of epoch on test accuracy
Fig. 26 VGG16—effect of epoch on validation accuracy
4.5.2 Effect of Optimizer: VGG16
Figures 32 and 33 show the accuracy of the VGG16 model on the validation and test datasets. SGD is the best optimizer on the validation dataset with 71%, followed by Adam and Adagrad with 69.5% each and Adadelta with 44%. On the test dataset, Adam gives the best accuracy of 76%, followed by SGD with 69%, Adagrad with 66% and Adadelta with 50%.
Fig. 27 VGG16—effect of epoch on test accuracy
Fig. 28 ResNet50—effect of epoch on validation accuracy
4.5.3 Effect of Optimizer: ResNet50
The effect of the optimizer on ResNet50 is shown in Figs. 34 and 35. For the validation accuracy, Adadelta gives the highest accuracy of 86.5%, while Adagrad gives 78%. Adam and SGD give the lowest accuracy of 25%. For the test accuracy, Adagrad gives the best result of 82%. Adadelta also gives a fairly high accuracy of 79%, while Adam and SGD are the lowest with 25%.
Fig. 29 ResNet50—effect of epoch on test accuracy
Fig. 30 CNN—effect of optimizer on validation accuracy
4.6 Effect of Learning Rate
Learning rate is a hyperparameter of a neural network that controls how much the model changes in response to the estimated error each time the weights are updated. Selecting the learning rate is a challenge, as too small a value results in a long training process, while too large a value makes the training process unstable. 4 different learning rate values are used in this experiment: 0.1, 0.01, 0.001 and 0.0001.
Fig. 31 CNN—effect of optimizer on test accuracy
Fig. 32 VGG16—effect of optimizer on validation accuracy
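The trade-off between slow and unstable training can be reproduced on a toy problem. The sketch below runs gradient descent on a made-up one-dimensional quadratic loss with the four learning rates used in this experiment; the `curvature` value and step count are assumptions picked so that the largest rate diverges, and the numbers say nothing about the salak models themselves.

```python
def final_loss_after_training(lr, steps=50, w=1.0, curvature=25.0):
    """Gradient descent on loss(w) = 0.5 * curvature * w**2."""
    for _ in range(steps):
        w -= lr * curvature * w        # gradient of the loss is curvature * w
    return 0.5 * curvature * w ** 2


INITIAL_LOSS = 0.5 * 25.0 * 1.0 ** 2
final_loss = {lr: final_loss_after_training(lr) for lr in (0.1, 0.01, 0.001, 0.0001)}
for lr, value in final_loss.items():
    print(lr, value)
```

In this toy setting the largest rate overshoots and the loss grows, the smallest rates barely move within 50 steps, and an intermediate rate converges fastest.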
Fig. 33 VGG16—effect of optimizer on test accuracy
Fig. 34 ResNet50—effect of optimizer on validation accuracy
4.6.1 Effect of Learning Rate: CNN
Figures 36 and 37 show the validation and test accuracy. The validation accuracy on CNN peaks at 82.14% when the learning rate is 0.01. A learning rate of 0.1 gives an accuracy of 26.7%, followed by 25% with learning rates of 0.001 and 0.0001. The test accuracy shows a similar pattern: a learning rate of 0.01 gives the highest test accuracy of 35%, followed by 25% when the learning rate is 0.1, 0.001 and 0.0001.
Fig. 35 ResNet50—effect of optimizer on test accuracy
Fig. 36 CNN—effect of learning rate on validation accuracy
4.6.2 Effect of Learning Rate: VGG16
Figures 38 and 39 show the accuracy on the validation and test datasets. The highest accuracy is 76%, obtained when the learning rate is 0.0001 for validation accuracy and 0.001 for test accuracy. Overall, the results show that the higher the learning rate, the lower the accuracy.
Fig. 37 CNN—effect of learning rate on test accuracy
Fig. 38 VGG16—effect of learning rate on validation accuracy

Fig. 39 VGG16—effect of learning rate on test accuracy
4.6.3 Effect of Learning Rate: ResNet50
Figures 40 and 41 show the accuracy obtained for ResNet50 at each learning rate. The results show a pattern similar to that of VGG16: the higher the learning rate, the lower the accuracy. The highest accuracy is 83% for both validation and test, obtained when the learning rate is 0.0001 and 0.001, respectively.
Fig. 40 ResNet50—effect of learning rate on validation accuracy

Fig. 41 ResNet50—effect of learning rate on test accuracy
4.7 Effect of Dense Layer
A dense layer is a fully connected neural network layer, meaning that every neuron in the dense layer receives input from all neurons of the previous layer. In this experiment, 4 different numbers of dense layers are used: 1, 2, 3 and 4.
4.7.1 Effect of Dense Layer: CNN
Figures 42 and 43 show the effect of the number of dense layers on validation and test accuracy for CNN. The results show that the accuracy generally decreases as the number of dense layers increases. The validation accuracy is 51% for 1 dense layer, followed by 33.5%, 28% and 31.5%, respectively. The test accuracy is highest at 25%, followed by 16% and 23%.
4.7.2 Effect of Dense Layer: VGG16
Figures 44 and 45 show the validation and test accuracy. With 3 dense layers, the validation accuracy is highest at 76% and the test accuracy is highest at 77%. Both are lowest at 25% with 2 dense layers.
Fig. 42 CNN—effect of dense layer on validation accuracy
Fig. 43 CNN—effect of dense layer on test accuracy
4.7.3 Effect of Dense Layer: ResNet50
Figures 46 and 47 show the validation and test accuracy. The validation accuracy rises to its highest value of 86.5% as the number of dense layers increases. The test accuracy is highest at 82% and lowest at 72%.
Fig. 44 Effect of dense layer on validation accuracy
Fig. 45 Effect of dense layer on test accuracy
4.8 Effect of Fine-Tuning for Pre-trained Models (VGG16 and ResNet50)
Fine-tuning is a process that modifies the feature representation of the pretrained model to make the model more suitable for a specific task, in this case, classifying the salak dataset. The fine-tuning steps are unfreezing the top layers of a frozen pre-trained model base and attaching a few newly added classifier layers. The modified models must then be retrained to obtain the new weights and biases.
Fig. 46 ResNet50—effect of dense layer on validation accuracy
Fig. 47 ResNet50—effect of dense layer on test accuracy
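Unfreezing a given percentage of the top layers can be sketched with a small helper. The `Layer` class below is a hypothetical stand-in for a framework layer object that exposes a `trainable` flag, and `unfreeze_top` is an illustrative helper, not a library function; the chapter does not show how the unfrozen percentage was implemented.

```python
class Layer:
    """Hypothetical stand-in for a framework layer with a `trainable` flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def unfreeze_top(layers, unfrozen_percent):
    """Freeze the bottom layers and leave the top `unfrozen_percent` trainable."""
    n_unfrozen = round(len(layers) * unfrozen_percent / 100)
    n_frozen = len(layers) - n_unfrozen
    for index, layer in enumerate(layers):
        # Layers near the input stay frozen; layers near the output are retrained.
        layer.trainable = index >= n_frozen
    return n_frozen


# A made-up base model with 10 layers, 20% of which are unfrozen.
base_layers = [Layer(f"block_{i}") for i in range(10)]
frozen = unfreeze_top(base_layers, unfrozen_percent=20)
flags = [layer.trainable for layer in base_layers]
print(frozen, flags)
```

With an unfrozen percentage of 0, every layer of the base stays frozen and only the newly added classifier layers are trained.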
As shown in Tables 1 and 2, the pretrained models perform best when 100% of their layers are frozen. ResNet-50 in general performs better than VGG-16 for this dataset, as it obtains over 80% accuracy when 100% of the layers are frozen. Bold font refers to the best result. The performance of both models declines after the layers are unfrozen. VGG-16 obtains 25% accuracy for every non-zero unfrozen percentage, whereas ResNet-50 obtains high validation accuracy but very low test accuracy. This suggests that the ResNet-50 model overfits after the layers are unfrozen. In a nutshell, a pre-trained model performs better on the salak dataset when all the layers are frozen.
Table 1 Validation accuracy
Unfrozen percentage of pre-trained models (%) 0 20 40 60 80 100
VGG-16 0.70 0.25 0.25 0.25 0.25 0.25
ResNet-50 0.83 0.80 0.81 0.68 0.71 0.87
Table 2 Test accuracy
Unfrozen percentage of pre-trained models (%) 0 20 40 60 80 100
VGG-16 0.76 0.25 0.25 0.25 0.25 0.25
ResNet-50 0.81 0.21 0.23 0.18 0.26 0.23
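The overfitting pattern can be quantified as the gap between validation and test accuracy. The sketch below uses the values from Tables 1 and 2; the helper name `generalization_gap` is illustrative.

```python
UNFROZEN_PERCENT = [0, 20, 40, 60, 80, 100]

# Values copied from Table 1 (validation) and Table 2 (test).
VALIDATION = {
    "VGG-16": [0.70, 0.25, 0.25, 0.25, 0.25, 0.25],
    "ResNet-50": [0.83, 0.80, 0.81, 0.68, 0.71, 0.87],
}
TEST = {
    "VGG-16": [0.76, 0.25, 0.25, 0.25, 0.25, 0.25],
    "ResNet-50": [0.81, 0.21, 0.23, 0.18, 0.26, 0.23],
}


def generalization_gap(model):
    """Validation accuracy minus test accuracy at each unfrozen percentage."""
    return [round(v - t, 2) for v, t in zip(VALIDATION[model], TEST[model])]


gaps = {model: generalization_gap(model) for model in VALIDATION}
for model, values in gaps.items():
    print(model, dict(zip(UNFROZEN_PERCENT, values)))
```

For ResNet-50 the gap is small only when no layers are unfrozen and becomes large for every other setting, which matches the overfitting interpretation given in the text.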
All three models have the highest validation accuracy when epoch = 10 within the range of 10, 20 and 50. The validation accuracy then drops at epoch = 20 and increases again at epoch = 50. As for the test accuracy, the graphs show that the pretrained models achieve their highest test accuracy when epoch = 20, whereas CNN has its highest test accuracy when epoch = 10. This shows that CNN reaches its highest test accuracy in fewer epochs than the pre-trained models (Figs. 48 and 49).
From the two bar charts (Figs. 50 and 51), we can infer that using Adadelta and Adagrad can yield better validation accuracy for all the models. For test accuracy, pretrained models that use Adadelta and Adagrad can give a higher test accuracy, whereas the CNN model gives a higher test accuracy with the Adam optimizer than with the other optimizers.
The learning rate charts show quite similar trends for validation accuracy and test accuracy (Figs. 52 and 53). The pretrained models show a decreasing trend in validation accuracy, while the CNN model has its optimal learning rate at 0.001 for both validation accuracy and test accuracy. The pretrained models also have their highest test accuracy at a learning rate of 0.001. Therefore, we can deduce that a learning rate of 0.001 works best for the salak dataset in this study.
Based on the two graphs (Figs. 54 and 55), increasing the number of dense layers for the ResNet-50 pre-trained model increases its validation accuracy but decreases its test accuracy. VGG-16 shows a similar pattern for validation accuracy and test accuracy, with the highest score when dense layer = 3 and the lowest when dense layer = 2. For the CNN model, both validation accuracy and test accuracy decrease when the number of dense layers is increased. Therefore, CNN performs best when dense layer = 1.
Fig. 48 VGG16, ResNet50 and CNN—effect of epoch on validation accuracy
Fig. 49 VGG16, ResNet50 and CNN—effect of epoch on test accuracy
The best performing model is ResNet-50 with a test accuracy of 84%, followed
by the VGG-16 model with a test accuracy of 77% (Fig. 56). CNN has the lowest
test accuracy, 31%, for the salak dataset. The best combinations of parameters
and hyperparameters for the three models are presented in Table 3.
Fig. 50 VGG16, ResNet50 and CNN—effect of optimizer on validation accuracy
Fig. 51 VGG16, ResNet50 and CNN—effect of optimizer on test accuracy

Fig. 52 VGG16, ResNet50 and CNN—effect of learning rate on validation accuracy
Fig. 53 VGG16, ResNet50 and CNN—effect of learning rate on test accuracy

Fig. 54 VGG16, ResNet50 and CNN—effect of dense layer on validation accuracy
Fig. 55 VGG16, ResNet50 and CNN—effect of dense layer on test accuracy

Fig. 56 VGG16, ResNet50 and CNN—comparison of best validation and test accuracy
Table 3 Best combination of parameters/hyperparameters for models

Models Best combination of parameters/hyperparameters
VGG-16 • base_model: VGG-16 (100% frozen weight)
• 20 epochs
• Adam optimizer
• Learning rate of 0.001
• 3 dense layers
ResNet-50 • base_model: ResNet-50 (100% frozen weight)
• 20 epochs
• Adagrad optimizer
• 2 dense layers
CNN • 10 epochs
• Adam optimizer
• 2 Convolutional layers
• 2 Pooling layers
• 1 hidden layer/dense layer
• Kernel size: (4, 4)
• Pool size: (2, 2)
5 Conclusion
In conclusion, the experiments were performed using a CNN model and two transfer
learning models, VGG16 and ResNet50. The results were compared using test
accuracy and validation accuracy to evaluate the performance of each model for
each of the fine-tuning parameters. Each model reaches its highest validation
accuracy at 10 epochs. The transfer learning models (VGG16 and ResNet50) reach
their highest test accuracy at 20 epochs, while the CNN reaches its highest at 10
epochs. ResNet50 has the highest test accuracy, 84%, compared with VGG16 and
CNN. The transfer learning models performed better than the CNN model. On this
dataset there is overfitting, as the models perform well on the training data but
poorly on the validation data, which is not used during training. Future work can
be done to increase accuracy; for example, a sampling method can be used to split
the dataset into training, validation, and test sets.
References
1. Snake fruit—Delicious taste, terrifying nightmare. Migrationology. [Online]. https://migration

ology.com/snake-fruit-salak/
2. Salak fruit facts and health benefits. HealthBenefits. [Online]. https://www.healthbenefitstimes.
com/health-benefits-of-salak-fruit/
3. Deep learning. Investopedia. [Online]. https://www.investopedia.com/terms/d/deep-learni
ng.asp
4. ud Din, A. F., Mir, I., Gul, F., Mir, S., Saeed, N., Althobaiti, T., Abbas, S. M., & Abualigah, L.
(2022). Deep reinforcement learning for integrated non-linear control of autonomous UAVs.
Processes, 10(7), 1307.
5. Gharaibeh, M., Alzu’bi, D., Abdullah, M., Hmeidi, I., Al Nasar, M. R., Abualigah, L., &
Gandomi, A. H. (2022). Radiology imaging scans for early diagnosis of kidney tumors: a
review of data analytics-based machine learning and deep learning approaches. Big Data and
Cognitive Computing, 6(1), 29.
6. Danandeh Mehr, A., Rikhtehgar Ghiasi, A., Yaseen, Z. M., Sorman, A. U., & Abualigah,
L. (2022). A novel intelligent deep learning predictive model for meteorological drought
forecasting. Journal of Ambient Intelligence and Humanized Computing, 1–15.
7. Abualigah, L., Zitar, R. A., Almotairi, K. H., Hussein, A. M., Abd Elaziz, M., Nikoo, M.
R., & Gandomi, A. H. (2022). Wind, solar, and photovoltaic renewable energy systems with
and without energy storage optimization: A survey of advanced machine learning and deep
learning techniques. Energies, 15(2), 578.
9. Convolutional neural networks (CNN). Analytics Vidhya, May 1, 2021. [Online]. https://
www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/. Accessed June
6, 2021.
10. Gilani, R. (2021). Main challenges in image classification. towards data science, June 13,
2020. [Online]. https://towardsdatascience.com/main-challenges-in-image-classification-ba2
4dc78b558. Accessed June 20, 2021.
11. Naik, S., & Patel, B. (2017). Machine vision based fruit classification and grading-a review.
International Journal of Computer Applications, 170(9), 22–34.
12. What is a convolutional neural network? [Online]. https://poloclub.github.io/cnn-explainer/.
Accessed June 22, 2021.
13. Das, A. (2020). Convolution neural network for image processing—Using keras. towards
data science, August 21, 2020. [Online]. https://towardsdatascience.com/convolution-neural-
network-for-image-processing-using-keras-dc3429056306. Accessed June 22, 2021.
14. Marcelino, P. (2018). Solve any image classification problem quickly and easily. KDnuggets,
December 2018. [Online]. https://www.kdnuggets.com/2018/12/solve-image-classification-
problem-quickly-easily.html. Accessed June 24, 2021.
15. Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale
image recognition, Cornell University, April 10, 2015. [Online]. https://arxiv.org/abs/1409.
1556. Accessed June 24, 2021.
16. VGG16—Convolutional network for classification and detection. Neurohive, November 20,
2018. [Online]. https://neurohive.io/en/popular-networks/vgg16/. Accessed June 24, 2021.
17. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition,
Cornell University, December 10, 2015. [Online]. https://arxiv.org/abs/1512.03385. Accessed
June 25, 2021.
18. Zahisham, Z., Lee, C. P., & Lim, K. M. (2020). Food recognition with ResNet-50. In IEEE 2nd
international conference on artificial intelligence in engineering and technology (IICAIET)
19. “Keras” [Online]. https://keras.io/api/applications/. Accessed June 6, 2021.
Image Processing Identification
for Sapodilla Using Convolution Neural
Network (CNN) and Transfer Learning
Techniques
Ali Khazalah, Boppana Prasanthi, Dheniesh Thomas, Nishathinee Vello,

Suhanya Jayaprakasam, Putra Sumari, Laith Abualigah,
Absalom E. Ezugwu, Essam Said Hanandeh, and Nima Khodadadi
Abstract Image identification is a useful tool for classifying and organizing fruits
in agribusiness. This study aims to use deep learning to construct a design for
Sapodilla identification and classification. Sapodilla comes in many varieties
from throughout the world, and can differ in size, shape, and taste depending on
species and kind. The goal is to create a system that uses convolutional neural
networks and transfer learning to extract features and determine the type of
Sapodilla, so that the system can sort Sapodilla by type. This research uses a
dataset of over 1000 pictures covering four different types of Sapodilla to
demonstrate the classification approaches. The task was completed using
Convolutional Neural Network (CNN) algorithms, a deep learning technology widely
utilised in image classification. Deep learning-based classifiers have recently made
it possible to distinguish Sapodilla in various images. Furthermore, we used
different numbers of hidden layers and epochs to improve predictive
performance. We also investigated transfer learning approaches for the
classification of Sapodilla. The proposed CNN model outperforms the transfer
learning techniques and state-of-the-art approaches in terms of results.
A. Khazalah · B. Prasanthi · D. Thomas · N. Vello · S. Jayaprakasam · P. Sumari ·

L. Abualigah (B)
Malaysia
L. Abualigah
A. E. Ezugwu
School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King
Edward Road, Pietermaritzburg 3201, KwaZulu-Natal, South Africa
E. S. Hanandeh
Department of Computer Information System, Zarqa University, Zarqa, Jordan
N. Khodadadi
Department of Civil and Environmental Engineering, Florida International University, Miami, FL,
USA
https://doi.org/10.1007/978-3-031-17576-3_5
Keywords Sapodilla · Deep learning · Convolution neural network · Transfer learning
1 Introduction
Sourcing skilled farm labour in the agricultural industry (particularly horticulture)
is one of the most cost-demanding factors in the industry [1]. This is due to
rising supply prices for things like power, water irrigation, and genetically
modified crops, among other things. As a result, farm businesses and the
agricultural sector are being squeezed by low profit margins. Under these
conditions, agricultural production must still meet the rising demands of an
ever-increasing worldwide population, which poses a serious concern for the future.
Sapodilla is a tropical fruit that can be found in South America as well as South
Asia. In Malaysia, this fruit is better known as Ciku. One of the greatest challenges
on farms is detecting Sapodilla and classifying its many types. Furthermore, this
difficulty leads to higher prices [2]. As a result, we require an automated system
that reduces manual labour, improves productivity, and decreases maintenance
cost and effort. Figure 1 shows different types of Sapodilla.
Robotic harvesting has the potential to overcome this challenge by lowering
labour costs (owing to greater endurance and repeatability) while also
improving crop productivity. For these reasons, over the last three decades
there has been significant interest in using agricultural robots to harvest
fruits [3]. Developing such systems involves a variety of difficult tasks,
such as manipulation and picking. Nevertheless, developing an accurate
fruit recognition system is a critical step towards a fully automated harvesting
robot, since it is the front-end perception system that precedes the subsequent
manipulation and grasping systems; if a fruit is not detected or seen, it cannot be
harvested [1].
Fig. 1 Different types of Sapodilla
This step is difficult due to a variety of factors, including lighting variations,
occlusions, and cases in which the fruit has a similar visual appearance to the
background. With the rapid advancement of human civilization, more emphasis
has been placed on quality of life, especially the foods we consume. Computer
vision has become increasingly popular in personalized recommender technologies
in recent years. Deep Neural Networks (DNNs) are often used to recognise fruits
from photos in the areas of image recognition and classification [4], where they
often outperform other machine learning algorithms. Convolutional Neural
Networks (CNNs) are a type of neural network categorized as a deep learning
algorithm, and they are now the most widely used kind of network in deep learning.
They are employed in a variety of image processing analyses. The accuracy rates in
certain areas, such as fruit classification using CNNs, have surpassed human abilities [5].
The structure of a CNN is remarkably similar to that of an ordinary artificial neural
network (ANN). Each layer of an ANN has many neurons, and the weighted sum of a
layer's neurons becomes the input of a neuron in the following layer, which adds a
bias term. A layer in a CNN contains three elements [6]. Its neurons are not fully
connected; instead, each is connected to a local region of the preceding layer
through a convolution. To train the classifier, a cost function is defined, which
compares the network output with the expected outcome [7]. Deep learning
approaches have made good progress in meeting these objectives in recent times.
Fruit detection is a challenge that may be expressed as a feature extraction problem.
In the presented system, Convolutional Neural Networks (CNNs) were employed to
detect fruit from photographs [5]. In comparison with other studies, the proposed
technique attempts to address the limitations of comparable fruit detection systems
and achieve a high level of accuracy. The system delivers functionality that is both
simple and efficient [3]. Other optimization methods can also be used to optimize
such problems, as given in [8–13]. To accomplish fruit identification by machine, we
propose training a CNN so that the computer outputs the type of fruit presented to
the network, regardless of its color, number, texture, or other characteristics [14].
2 Literature Survey
Although several researchers have addressed the subject of fruit recognition,
such as the works in [2, 15–17], the survey concluded that developing a fast and
reliable fruit detector remains difficult. This is owing to the wide variety of
colors, shapes, sizes, and textures of fruits, and to the constantly shifting
lighting and shadow conditions in most of these scenarios.
Fruit recognition has been addressed in many works in the literature as a
segmentation problem (i.e., fruit vs. background). The problem of apple
recognition for yield estimation was investigated by Wang et al. [2].
They developed a system that could recognise apples primarily from their color and
the pattern of their specular reflection. Additional information, including the size
distribution of apples, was used either to eliminate false detections or to split
regions that might contain several apples. Another strategy was to accept only
detections from regions that were predominantly circular. Bac et al. [15]
presented a segmentation method for sweet peppers. They employed a six-band
multi-spectral camera and a variety of features, comprising raw multi-spectral
data, normalized difference indices, and entropy-based texture features.
Experiments in a highly controlled glasshouse environment showed that this method
yielded fairly accurate segmentation results. The authors noted, however, that it
was not accurate enough to build a reliable obstacle map.
For almond segmentation, Hung et al. [16] proposed using conditional random
fields (CRFs). They suggested a five-class segmentation method whose features
were learnt by a Sparse Autoencoder (SAE). These features were then applied within
a CRF framework, which outperformed earlier work. They achieved good segmentation
performance but did not perform object detection. They also mentioned that
occlusion was a significant difficulty. Intuitively, such an approach can only
handle low levels of occlusion.
Yamamoto et al. [15], for example, used color-based segmentation to perform tomato
detection. Then, using color and shape features, a Classification and Regression
Trees (CART) classifier was trained. This produced a segmentation map in which
connected pixels were grouped into regions. Each region was assigned a detection
and, to limit the number of false alarms, they trained a non-fruit classifier using
a random forest in controlled glasshouse conditions.
A pixel-level segmentation approach to object detection has been used in all of the
aforementioned studies, and the majority of these efforts have focused on fruit
detection primarily for yield estimation. In the few studies that have performed
fruit detection, it has only been done in controlled glasshouse environments.
All things considered, the problem of fruit detection in highly challenging
conditions remains unsolved. This is because of the high variability in the
appearance of the target objects in agricultural settings, which means that
classic sliding-window approaches, despite showing good performance when tested
on datasets of selected images, cannot handle the variability in scale and
appearance of the target objects when deployed in real farm settings [2, 15].
Deep learning models have already made significant advances in object
classification and detection. On PASCAL-VOC, the state-of-the-art detection
architecture consists of two stages. The first stage of the pipeline uses a region
proposal method, such as edge boxes, to select regions of interest from an image,
which are then passed to a deep neural network for classification. Despite its
high recall, this pipeline is computationally intensive, which prevents it from
being used in real time for a robotic application [16, 17]. Region proposal
networks (RPNs) solve this problem by combining a classification convolutional
network with an object proposal network, allowing the system to predict object
bounds and classify them at each location simultaneously. The parameters of the
two networks are shared, resulting in significantly higher throughput and making
the approach well suited to robotic applications.
In real-world outdoor farm settings, a single sensor modality is seldom enough
to detect the target fruits under a wide range of lighting conditions, partial
occlusions, and varying appearances. This creates a strong argument for
multi-modal fruit detection methods, since different types of sensors may offer
complementary information on specific aspects of the fruit. Deep learning models
have previously demonstrated considerable potential when employed for multi-modal
systems in domains other than agricultural automation, for example where audio and
video have been combined very effectively, and where combining image modalities
has outperformed each modality separately. As shown in the following sections,
this study takes the same approach and shows how a multi-modal region-based fruit
detection system outperforms pixel-level segmentation techniques [15–17].
A notable method for recognising fruits from photographs using deep convolutional
neural networks has also been proposed. The researchers use a Faster Region-based
CNN (Faster R-CNN) model. The objective is to develop a neural network that can be
used by autonomous robots to harvest fruits. RGB and NIR (near-infrared) images are
used to train the network. The RGB and NIR models are combined in two different
ways: early fusion and late fusion. For early fusion, the input layer has four
channels: three for the RGB image and one for the NIR image. Late fusion uses two
independently trained models that are combined by averaging the outputs of both
models. The result is a multi-modal network that performs considerably better than
previous systems [26].
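The difference between the two fusion strategies can be illustrated with a minimal, framework-free Python sketch. It is not the implementation from the cited work: the function names `early_fusion` and `late_fusion`, the toy image, and the class scores are all hypothetical and serve only to show where the RGB and NIR information is combined.

```python
from typing import List

Pixel = List[float]


def early_fusion(rgb: List[List[Pixel]], nir: List[List[float]]) -> List[List[Pixel]]:
    """Stack a 3-channel RGB image and a 1-channel NIR image into 4 channels."""
    return [
        [rgb_px + [nir_px] for rgb_px, nir_px in zip(rgb_row, nir_row)]
        for rgb_row, nir_row in zip(rgb, nir)
    ]


def late_fusion(scores_rgb: List[float], scores_nir: List[float]) -> List[float]:
    """Average the class scores predicted by two independently trained models."""
    return [(a + b) / 2.0 for a, b in zip(scores_rgb, scores_nir)]


# Toy 1 x 2 image: two pixels with (R, G, B) values, plus one NIR value per pixel.
rgb_image = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]]
nir_image = [[0.9, 0.8]]
fused_input = early_fusion(rgb_image, nir_image)  # each pixel now has 4 channels

# Hypothetical class scores from an RGB-only model and a NIR-only model.
fused_scores = late_fusion([0.6, 0.4], [0.8, 0.2])
```

With early fusion a single network sees all four channels at once, whereas with late fusion each modality has its own network and only the predictions are merged.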
3 Proposed Deep Learning for Sapodilla Recognition
Artificial neural networks [18, 19] have produced the most successful results in
the domain of image recognition and classification. The majority of deep learning
methods are built on top of these networks.
Deep neural networks [18] are a type of machine learning technique that employs
multiple layers of nonlinear processing units. Each layer learns to transform its
input into a more abstract and composite representation [19]. Deep learning models
have outperformed other machine learning techniques.
In some domains, they have also achieved the first superhuman image recognition
results [18]. This is reinforced by the fact that deep learning is seen as a vital
step towards achieving strong AI. Second, deep learning models, particularly
convolutional neural networks (CNNs), have been shown to provide excellent
performance in image classification and recognition.
Fig. 2 CNN architecture
3.1 The Proposed CNN Architecture
A deep learning framework is used for the proposed model. There are three
convolutional layers in the framework. A group of pixels in an image might indicate
an edge, a shadow, or some other pattern. Convolution is one method for detecting
these patterns. During computation, the image is represented as a matrix of pixel
values. The architecture of the CNN model is shown in Fig. 2. It comprises feature
extraction and classification. Cropping removes unnecessary data from the input
photos, and all pictures are resized. Convolution and pooling layers are applied
repeatedly to extract features. Each of the first two blocks contains one
convolution layer and one max pooling layer. To detect patterns, we use a “filter”
matrix that is multiplied with the image pixel matrix. Filter sizes may vary, and
the multiplication depends on the filter size: a subset of the image pixel matrix
matching the filter size is taken for convolution, starting from the first pixel
of the image. The convolution then moves on to the next pixel, and this process is
repeated until all the pixels in the image matrix have been covered. The pooling
layer is the next kind of layer in the CNN. This layer reduces the size of the
output, i.e. the feature map, and hence helps to avoid overfitting. A fully
connected layer is used as the output layer. This layer flattens the output from
the preceding layers into a vector that can be used as the input for the following
stage. Figure 3 shows the trained images of CNN model.
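The sliding-filter and pooling operations described above can be sketched in plain Python. This is only an illustration under simplifying assumptions (a single-channel image, no padding, stride 1 for the filter, non-overlapping pooling windows); the function names `convolve2d` and `max_pool2d` and the toy values are hypothetical and are not taken from the authors' Keras model.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the filter over the image (no padding, stride 1).

    Like most deep learning libraries, this computes a cross-correlation:
    the filter is not flipped before it is multiplied with the image patch.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Multiply the filter with the patch under it and sum the products.
            total = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    out_rows = len(feature_map) // size
    out_cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Toy 4 x 4 "image" and a 2 x 2 filter that responds to diagonal patterns.
image = [[1, 2, 3, 0], [0, 1, 2, 3], [3, 0, 1, 2], [2, 3, 0, 1]]
kernel = [[1, 0], [0, 1]]
feature_map = convolve2d(image, kernel)  # 3 x 3 feature map
pooled = max_pool2d(feature_map, size=2)  # 1 x 1 map after pooling
```

The pooled map is smaller than the feature map, which is the size reduction that the pooling layer contributes.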
Transfer learning is a machine learning technique in which a model developed for
one task is used as the starting point for a model on another task. This approach
performs well when the available data are limited, and the model converges
quickly. Transfer learning may be used to classify images in a variety of ways.
We began by loading the pre-trained models and discarding the final layer. After
removing it, we set the remaining layers to non-trainable. Then, at the end of the
network, we added a new dense layer, this time with as many units as the number of
sapodilla types we wish to predict. Figure 4 shows the transfer learning.
Fig. 3 Trained images of CNN model
Fig. 4 Transfer learning
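The three steps described for transfer learning (drop the final layer, freeze the remaining layers, add a new dense layer sized to the number of classes) can be summarised in a small framework-free sketch. The `Layer` class, the layer names, and `adapt_pretrained` are hypothetical stand-ins for illustration only; in Keras the corresponding operations would be performed on a real pre-trained model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer (not a Keras class)."""
    name: str
    units: int
    trainable: bool = True


def adapt_pretrained(layers: List[Layer], num_classes: int) -> List[Layer]:
    """Drop the final layer, freeze the rest, and append a new dense classifier."""
    # Step 1 and 2: keep everything except the last layer and mark it non-trainable.
    base = [Layer(layer.name, layer.units, trainable=False) for layer in layers[:-1]]
    # Step 3: add a fresh, trainable dense layer with one unit per target class.
    head = Layer("new_dense", num_classes, trainable=True)
    return base + [head]


# Toy "pre-trained" network ending in a 1000-class prediction layer.
pretrained = [
    Layer("conv_block_1", 64),
    Layer("conv_block_2", 128),
    Layer("predictions", 1000),
]
model = adapt_pretrained(pretrained, num_classes=4)  # four sapodilla types
```

Only the new dense layer is updated during training, which is why this approach needs comparatively little data.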
3.2.1 VGG16
The 16-layer network is made up of convolutional and fully connected layers. For
simplicity, only 3 × 3 convolution layers are stacked on top of each other. The
first and second convolution layers each consist of 64 feature kernel filters of
size 3 × 3. As the input image passes through the first and second convolution
layers, its dimensions change to 224 × 224 × 64. The output is then passed to a
max pooling layer with a stride of 2. The third and fourth convolutional layers
have 128 feature kernel filters of size 3 × 3. These two layers are followed by a
2 × 2 max pooling layer, and the output is reduced to 56 × 56 × 128. The fifth,
sixth, and seventh layers are convolutional layers with a kernel size of 3 × 3;
all three use 256 feature maps. These layers are followed by a max pooling layer
with a stride of 2. The eighth to thirteenth layers are two sets of convolutional
layers with a kernel size of 3 × 3, and each set has 512 kernel filters. Each of
these sets is followed by a max pooling layer with a stride of 2. Figure 5 shows
the architecture of VGG16.
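The feature-map sizes quoted above follow from simple arithmetic, sketched below. The sketch assumes the standard VGG16 configuration, which is not spelled out in full in the text: five blocks of 3 × 3 convolutions with "same" padding, each block followed by 2 × 2 max pooling with stride 2. The names `VGG16_BLOCKS` and `vgg16_feature_shapes` are hypothetical.

```python
from typing import List, Tuple

# Assumed standard VGG16 layout: (number of 3 x 3 conv layers, filters) per block.
VGG16_BLOCKS: List[Tuple[int, int]] = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def vgg16_feature_shapes(height: int = 224, width: int = 224) -> List[Tuple[int, int, int]]:
    """Return the feature-map shape after the pooling layer of each block."""
    shapes = []
    for _num_convs, filters in VGG16_BLOCKS:
        # 3 x 3 convolutions with 'same' padding keep height and width unchanged;
        # 2 x 2 max pooling with stride 2 halves both.
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


shapes = vgg16_feature_shapes()
num_conv_layers = sum(num_convs for num_convs, _ in VGG16_BLOCKS)
```

Under these assumptions the second entry of `shapes` is the 56 × 56 × 128 output mentioned in the text, and the 13 convolutional layers plus three fully connected layers give the 16 weight layers that VGG16 is named after.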
Fig. 5 Architecture of VGG16
3.2.2 VGG19
VGG19 is a later VGG architecture, and it looks very similar to VGG16.
When we compare its structure with that of VGG16, we notice that both are built on
five convolutional blocks. However, by adding a convolution layer to each of the
last three blocks, the depth of the network is increased further. The input is an
RGB image with the shape (224, 224, 3), and the outcome is a features vector with
the same structure (224, 224, 3). VGG19 has its own preprocessing function in
Keras; however, if we examine the source, we notice that it is exactly the same as
that of VGG16. As a result, we do not have to redefine anything.
3.2.3 MobileNet
MobileNet is a neural network that is used for classification, detection, and
other common tasks. MobileNets are quite small, about 17 MB in size, which allows
us to use them in mobile apps. They are based on a streamlined architecture that
uses depthwise separable convolutions to build lightweight deep neural network
models. These depthwise separable convolution layers reduce the number of
parameters, shrinking the model's size and speeding up execution. MobileNets may
be used for a variety of tasks, including object detection, fine-grained
classification, and face attribute classification. MobileNet is a powerful neural
network that may be used in image recognition. In this project, we replaced only
the last layer of the MobileNet architecture with one matching the number of
classes in our task.
3.3 Dataset
The dataset contains pictures of four types of sapodilla: ciku Subang, ciku Mega,
ciku Jantung, and ciku Betawi. The pictures in each class include ciku of various
sizes. The photos do not have a uniform background. The same type of ciku appears
in a variety of poses and viewpoints, including side view, back view, different
backgrounds, partially cut, sliced on a plate, cut into pieces, and with the seeds
showing, which gives the dataset a high degree of variability. The ciku may be
fresh, rotten, or packed in bunches. Many photos have poor or unusual lighting, or
show fruit that is covered with a net, decorated, or still on the tree among
leaves. The dataset consists of more than 1000 images. Figure 6 shows sample
dataset images. Table 1 shows the dataset description.
Fig. 6 Sample dataset images
Table 1 Dataset description

Class          Number of images
Ciku Subang    250
Ciku Mega      250
Ciku Betawi    250
Ciku Jantung   250
3.4 Augmentation
The availability of more data frequently improves the performance of deep learning
neural network models. Data augmentation is a method of artificially creating new
training data from existing training data. This is accomplished by applying
domain-specific techniques to examples from the training data to create new and
different training examples. The best-known type of data augmentation is image data
augmentation, which involves creating transformed versions of images in the
training dataset that belong to the same class as the original image. The
transformations include shifts, rotations, zooms, and other operations from the
field of image manipulation. The goal is to add new, plausible examples to the
training set.
This means variations of the training set images that the model is likely to see.
A horizontal flip of a sapodilla photo, for instance, makes sense because the
picture could have been taken from either the left or the right. A vertical flip
of a sapodilla image makes little sense and is probably not appropriate,
considering that the model is unlikely to see an upside-down sapodilla photo.
As a result, it is clear that the specific data augmentation techniques used for a
training set must be chosen carefully, taking into account the training set as
well as knowledge of the problem domain. Furthermore, it can be useful to
experiment with data augmentation approaches alone and in combination to determine
whether they lead to a measurable gain in model performance. Modern deep learning
methods, such as the convolutional neural network (CNN) [17], can learn features
that are invariant to where they appear in the image.
Nonetheless, augmentation can further help this transform-invariant approach to
learning by helping the model learn features that are also invariant to
transformations such as left-to-right and top-to-bottom ordering and light levels
in photos. Image data augmentation is usually applied only to the training set,
not to the validation or test sets. This differs from pre-processing, such as
image cropping and pixel scaling, which must be applied consistently to all
datasets that interact with the model. Figure 7 shows the augmented images.
Fig. 7 Images that are augmented
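As an illustration of the point that augmentation is applied to the training images only, the following minimal sketch adds randomly chosen horizontally flipped copies to a training set. It is a simplified stand-in and not the augmentation pipeline used in the experiments: images are represented as nested lists, and `horizontal_flip`, `augment_training_set`, `flip_probability`, and `seed` are hypothetical names.

```python
import random
from typing import List

Image = List[List[int]]


def horizontal_flip(image: Image) -> Image:
    """Mirror an image left to right by reversing every row of pixels."""
    return [list(reversed(row)) for row in image]


def augment_training_set(
    images: List[Image], flip_probability: float = 0.5, seed: int = 0
) -> List[Image]:
    """Return the original training images plus randomly flipped copies."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    augmented = list(images)
    for image in images:
        if rng.random() < flip_probability:
            augmented.append(horizontal_flip(image))
    return augmented


# Toy 2 x 3 grayscale "image"; validation and test images would be left untouched.
train_images = [[[1, 2, 3], [4, 5, 6]]]
flipped = horizontal_flip(train_images[0])
bigger_train_set = augment_training_set(train_images, flip_probability=1.0)
```

A vertical flip would reverse the order of the rows instead, which the text argues is less plausible for this dataset.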
To eliminate any unnecessary information, the pictures in the dataset are
normalised, resized, and cropped. The data are then split into two parts: 80% for
training and 20% for validation.
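An 80/20 split of this kind can be sketched as follows. The sketch makes assumptions that the text does not state: the split is a simple random shuffle (not stratified by class), and the function name `train_validation_split`, the `seed`, and the use of integer image identifiers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(
    items: Sequence, validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[List, List]:
    """Shuffle the items and hold out a fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# 1000 hypothetical image identifiers, matching the size of the balanced sample.
train_ids, validation_ids = train_validation_split(range(1000))
```

With 1000 images this yields 800 training and 200 validation items, and no image appears in both parts.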
We conducted comprehensive tests to examine the performance of the classifier
based on color, texture, and shape features, using numerous independently selected
image samples in order to determine the best subset of features and classification
techniques. The sample used was well balanced, with 1000 pictures in total: 250
Ciku Mega, 250 Ciku Jantung, 250 Ciku Subang, and 250 Ciku Betawi. All the pictures
in this dataset have been pre-processed. To develop the proposed approach, we use
the Python programming language (in particular, the Keras library). We employed
several models: our own model built from scratch and a few existing image
classification models used for transfer learning, as detailed below. Figure 8
shows the proposed model. Figure 9 shows the VGG16 model. Figure 10 shows the VGG19
model.
Fig. 8 Own model
This model uses a 20% validation split, a batch size of 500, and 20 epochs,
together with the chosen learning rate. After the model was trained on the
sapodilla training dataset, the results were evaluated on the test sample. The
model's accuracy is 0.54. Figure 11 shows the plot of training accuracy and
validation accuracy.
4.2.1 Effect of Optimizers
Fig. 9 VGG16
Fig. 10 VGG19
Fig. 11 Plot of training accuracy and validation accuracy
A neural network is trained by continuously updating the parameters of all nodes
within the network, and the optimizer plays a key role in this process. Gradient
descent is a popular choice for CNN optimization. Furthermore, the classification
results of the Adam optimizer are somewhat better than those produced by many other
adaptive optimization methods. Adam stores an exponentially decaying average of
past gradients ($a_t$), which estimates the first moment (mean), and of past
squared gradients ($u_t$), which estimates the second moment (variance). With
$d_t$ denoting the gradient at step $t$, $a_t$ and $u_t$ are calculated as follows:
\[
a_t = \beta_1 a_{t-1} + (1 - \beta_1)\, d_t
\]
\[
u_t = \beta_2 u_{t-1} + (1 - \beta_2)\, d_t^2
\]
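The two moment estimates can be turned into a complete update step as sketched below for a single scalar parameter. The moment updates follow the equations above; the bias correction and the final parameter update are the standard Adam steps, which the text does not spell out, and the default values for `lr`, `beta1`, `beta2`, and `eps` are the commonly used ones rather than values reported in this study. The function name `adam_step` and the toy objective are hypothetical.

```python
import math
from typing import Tuple


def adam_step(
    theta: float,
    grad: float,
    a_prev: float,
    u_prev: float,
    t: int,
    lr: float = 0.001,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the step number, starting at 1."""
    # Moment estimates, as in the two equations above (grad plays the role of d_t).
    a = beta1 * a_prev + (1.0 - beta1) * grad
    u = beta2 * u_prev + (1.0 - beta2) * grad ** 2
    # Standard bias correction: both averages start at zero and are biased early on.
    a_hat = a / (1.0 - beta1 ** t)
    u_hat = u / (1.0 - beta2 ** t)
    # Parameter update scaled by the estimated gradient magnitude.
    theta = theta - lr * a_hat / (math.sqrt(u_hat) + eps)
    return theta, a, u


# Toy objective f(theta) = theta^2 with gradient 2 * theta, minimised at theta = 0.
theta, a, u = 5.0, 0.0, 0.0
for step in range(1, 2001):
    theta, a, u = adam_step(theta, 2.0 * theta, a, u, step, lr=0.05)
```

Because the step is divided by the square root of the second-moment estimate, the size of the first update is close to `lr` regardless of the gradient's scale.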
The performance of the deep convolutional network with several optimization
methods was compared quantitatively. After switching to the Adam optimizer, the
accuracy rose to 0.99 at 30 epochs. Figure 12 shows the trained CNN model.
Figure 13 shows the accuracy results after the training.
Fig. 12 Trained CNN model

Fig. 13 After training
Table 2 Performance of all the models using dense layers

Model                 Dense layer   Accuracy
CNN model             3             0.54
CNN model (Adam)      4             0.99
Transfer learning 1   3             0.45
VGG16                 3             0.57
VGG19                 3             0.70
MobileNet             3             0.65
4.2.2 Effect of Dense Layer
We added dense layers and used a compact convolutional neural network to speed up
the feed-forward operation. Table 2 shows the performance of all the models using
dense layers.
The number of dense layers has a strong effect: with 4 dense layers the accuracy
score is 0.99, whereas with 3 dense layers the accuracy score is lower.
4.2.3 Effect of Filter Number
In the first experiment, three convolutional layers with 32 filters of size 3 × 3
pixels were used; in experiment 2, the number of filters was increased from 32 to 64
for the three convolutional layers, with the same filter size of 3 × 3 pixels; and in
experiment 3, 128 filters of size 3 × 3 pixels were tried. Table 3 lists the number
of filters used by each model. The runtime was also affected by the filter size and
the number of filters. In Table 3, the highest accuracy (0.99) is obtained by the
model with 64 filters.
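The link between the number of filters and the cost of a layer can be made concrete by counting trainable parameters. The sketch below uses the standard count for a 2-D convolution (kernel height × kernel width × input channels × filters, plus one bias per filter). The 3-channel RGB input is an assumption for the first layer only, and the function name is illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Number of trainable parameters in one 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if bias else 0)


# First layer on an RGB image (3 input channels is an assumption).
for n_filters in (32, 64, 128):
    print(n_filters, conv2d_params(3, 3, 3, n_filters))
```

Doubling the number of filters doubles the parameters of that layer, and it also doubles the input channels seen by the next layer, which is one reason runtime grows with the filter count.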
Table 3 Filter size

Model                 Dense layer   Filter size   Accuracy
CNN model             3             32            0.54
CNN model (Adam)      4             64            0.99
Transfer learning 1   3             128           0.45
VGG16                 3             32            0.57
VGG19                 3             32            0.70
MobileNet             3             32            0.65
Inception             1             32            0.27
Xception              1             32            0.24
4.2.4 Effect of Number of Epochs
Each unit in the hidden layers of the neural network has adjustable weights. Because
they are adjustable, we try to make them fit the characteristics of the data. The
decision boundary of the network is determined by these weights, so changing the
weights of the hidden units changes the shape of the boundary. Figure 14 and Table 4
show the accuracy for different numbers of epochs. The number of epochs determines
how many times the network's parameters are updated. As the number of epochs grows,
so does the number of times the parameters are updated, and the fitted boundary moves
from underfitting to optimal to overfitting. In this experiment, the accuracy of the
model increased to 0.99 when the number of epochs was 30.
Fig. 14 Number of epochs

Table 4 Number of epochs

Model                 Dense layer   Epochs   Accuracy
CNN model             3             20       0.54
CNN model (Adam)      4             30       0.99
Transfer learning 1   3             20       0.45
VGG16                 3             25       0.57
VGG19                 3             25       0.70
MobileNet             3             20       0.65
Inception             1             100      0.27
Xception              1             100      0.24
4.2.5 Effect of Learning Rate
The step size, often known as the learning rate, is the amount by which the
parameters are changed during training. Figure 15 shows the learning rates. The
learning rate is a configurable hyperparameter with a small positive value, usually
between 0.0 and 1.0, used in the training of neural networks. It determines how
quickly the model adapts to the problem. Because the parameters change only slightly
at each iteration, smaller learning rates require more training epochs, whereas
larger learning rates require fewer training epochs. A high learning rate lets the
network converge more quickly, but at the price of a sub-optimal final set of
weights. A lower learning rate may allow the model to reach a more optimal, or even
globally optimal, set of weights, but training will take considerably longer.
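The trade-off can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w², which is an illustrative stand-in and not the chapter's network, and counts the updates needed to get close to the minimum. The starting point, tolerance and learning rates are arbitrary assumptions.

```python
def steps_to_converge(learning_rate, start=5.0, tol=1e-3, max_steps=1000):
    """Gradient descent on f(w) = w**2; returns the number of steps until |w| < tol."""
    w = start
    for step in range(1, max_steps + 1):
        w -= learning_rate * 2.0 * w  # the gradient of w**2 is 2w
        if abs(w) < tol:
            return step
    return None  # did not converge within max_steps


for lr in (0.01, 0.1, 1.1):
    print(lr, steps_to_converge(lr))
```

On this toy function the smaller rate needs many more updates than the moderate one, while a rate that is too large overshoots the minimum on every step and never settles.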
Fig. 15 Learning rates

Table 5 Comparison of accuracy

Model                 Dense layer   Filter size   Epochs   Accuracy
CNN model             3             32            20       0.54
Proposed model        4             64            30       0.99
Transfer learning 1   3             128           20       0.45
VGG16                 3             32            25       0.57
VGG19                 3             32            24       0.70
MobileNet             3             32            20       0.65
Inception             1             32            100      0.27
Xception              1             32            100      0.24
Fig. 16 Bar chart representing accuracy scores
Compared with deeper networks, CNNs with fewer layers have lower hardware
requirements and shorter training times. Table 5 shows the comparison of accuracy.
Short training times allow more hyperparameters to be tested and make the whole
development cycle easier. Reduced computational needs also allow higher-resolution
images to be used. The best model is the one trained with the Adam optimizer, which
achieved an accuracy of 0.99. Figure 16 shows the bar chart representing accuracy
scores.
5 Conclusion
This study develops a deep convolutional neural network for sapodilla recognition
and classification, and describes a method for automated detection of sapodilla
species. On our data, the CNN approach performs very well. The technique could be
trained on a wider range of sapodilla varieties in future applications. It can also
be used to examine the effects of other variables such as optimizers, epochs, dense
layers, learning rates and pooling functions. We additionally ran several
quantitative tests using the Keras library to classify the photos based on their
content. The proposed convolutional neural network simplifies the classification of
the kind of sapodilla and minimizes manual mistakes in sapodilla classification. The
proposed CNN achieves an accuracy of 99%.
References
1. ABARE. (2015). Australian vegetable growing farms: An economic survey, 2013–14 and 2014–
15. Australian Bureau of Agricultural and Resource Economics (ABARE), Canberra, Australia.
Research report.
2. Abualigah, L., Al-Okbi, N. K., Elaziz, M. A., & Houssein, E. H. (2022). Boosting marine
predators algorithm by salp swarm algorithm for multilevel thresholding image segmentation.
Multimedia Tools and Applications, 81(12), 16707–16742.
3. Palakodati, S. S. S., Chirra, V. R., Dasari, Y., & Bulla, S. (2020). Fresh and rotten fruits
classification using CNN and transfer learning. Revue d’Intelligence Artificielle, 34(5), 617–
622. https://doi.org/10.18280/ria.340512
4. Sakib, S., Ashrafi, Z., & Siddique, M. A. (2019). Implementation of fruits recognition classifier
using convolutional neural network algorithm for observation of accuracies for various hidden
layers. ArXiv, abs/1904.00783.
5. Mettleq, A. S. A., Dheir, I. M., Elsharif, A. A., & Abu-Naser, S. S. (2020). Mango classification
using deep learning. International Journal of Academic Engineering Research (IJAER), 3(12),
22–29.
6. Rojas-Arandra, J. L., Nunez-Varela, J. I., Cuevas-Tello, J. C., & Rangel-Ramirez, G. (2020).
Fruit classification for retail stores using deep learning. In Proceedings of pattern recognition
12th mexican conference, Morelia, Mexico (pp. 3–13).
7. Risdin, F., Mondal, P., & Hassan, K. M. (2020). Convolutional neural networks (CNN) for
detecting fruit information using machine learning techniques.
113609.
10, 16150–16177.
14. Álvarez-Canchila, O. I., Arroyo-Pérez, D. E., Patino-Saucedo, A., González, H. R., & Patiño-
Vanegas, A. (2020). Colombian fruit and vegetables recognition using convolutional neural
networks and transfer learning.
15. Otair, M., Abualigah, L., & Qawaqzeh, M. K. (2022). Improved near-lossless technique using
the Huffman coding for enhancing the quality of image compression. Multimedia Tools and
Applications, 1–21.
16. Liu, Q., Li, N., Jia, H., Qi, Q., & Abualigah, L. (2022). Modified remora optimization algorithm
for global optimization and multilevel thresholding image segmentation. Mathematics, 10(7),
1014.
17. Lin, S., Jia, H., Abualigah, L., & Altalhi, M. (2021). Enhanced slime mould algorithm for
multilevel thresholding image segmentation using entropy measures. Entropy, 23(12), 1700.
18. Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). Flexible,
high performance convolutional neural networks for image classification. In Proceedings of the
twenty-second international joint conference on artificial intelligence—Volume Two, IJCAI’11
(pp. 1237–1242). AAAI Press.
19. Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. CoRR
abs/1507.06228.
Comparison of Pre-trained
and Convolutional Neural Networks
for Classification of Jackfruit Artocarpus
integer and Artocarpus heterophyllus
Song-Quan Ong, Gomesh Nair, Ragheed Duraid Al Dabbagh,

Nur Farihah Aminuddin, Putra Sumari, Laith Abualigah, Heming Jia,
Shubham Mahajan, Abdelazim G. Hussien,
and Diaa Salama Abd Elminaam
Abstract Cempedak (Artocarpus integer) and nangka (Artocarpus heterophyllus) are
highly similar in external appearance and are difficult for a human to distinguish
visually. It is also common to call both jackfruit. Computer vision and deep
convolutional neural networks (DCNN) can provide an excellent solution for
recognizing the fruits. Although several studies have demonstrated the application
of DCNN and transfer learning in fruit recognition systems, previous studies did not
address two crucial problems: classification of fruit down to species level, and
comparison of pre-trained CNNs in transfer learning. In this study, we aim to
construct a recognition system for cempedak and nangka, and to compare the
performance of the proposed DCNN architecture with transfer learning using five
pre-trained CNNs. We also compared the effect of the optimizer and of three levels
of epochs on the performance of the model. In general, transfer learning with a
pre-trained VGG16 neural network provides higher performance for the dataset; the
dataset performed better with the SGD optimizer than with Adam.
Keywords Cempedak · Nangka · Deep learning · Computer vision · Optimization
S.-Q. Ong · G. Nair · R. D. A. Dabbagh · N. F. Aminuddin · P. Sumari · L. Abualigah (B)
Malaysia
L. Abualigah
H. Jia
School of Information Engineering, Sanming University, Sanming 365004, China
S. Mahajan
School of Electronics and Communication, Shri Mata Vaishno Devi University, Katra, Jammu and
Kashmir 182320, India
A. G. Hussien
Department of Computer and Information Science, Linköping University, 581 83 Linköping,
Sweden
Faculty of Science, Fayoum University, Faiyum 63514, Egypt
D. S. A. Elminaam
Information Systems Department, Faculty of Computers and Artificial Intelligence, Benha
University, Benha 12311, Egypt
Computer Science Department, Faculty of Computer Science, Misr International University,
Cairo 12585, Egypt
https://doi.org/10.1007/978-3-031-17576-3_6
130 S.-Q. Ong et al.
1 Introduction
The “Nangka” (Artocarpus heterophyllus) and “Cempedak” (Artocarpus integer), shown
in Fig. 1, are tropical fruits common in Southeast Asia. It is also common for
people to unwittingly call both jackfruit. Both fruits belong to the genus
Artocarpus, which is characterized by an irregular oval, slightly curved shape and
a large size. The skin is distinguished by its sharp thorns, although sometimes the
thorns are not distinctly pointed. The skin turns yellow when the fruit is ripe or
old [1]. From a distance the two fruits are very difficult to tell apart, and it is
easier to approach them and examine them closely. Even so, the outward appearance
of the fruit makes the distinction a challenge [2].
In many aspects, cempedak is similar to nangka; however, cempedak is smaller
and has a thinner peduncle. Nangka may range in size from 8 inches to 3 ft. (20–
90 cm) long and 6 to 20 in. (15–50 cm) broad, with weights ranging from 10 to 60
pounds or even more (4.5–20 or 50 kg). When mature, the ‘skin,’ or exterior of the
compound or bundled fruit, is green or yellow, with many hard, conical points linked
to a thick, rubbery light yellow or white wall [2]. Cempedak range in size from 10 to
15 cm wide and 20 to 35 cm long, and can be cylindrical or oval. The thin, leathery
skin is greenish, yellowish, or brown in hue, and has pentagons with elevated bumps
or flattened eye sides [3, 4]. Odour and the texture of the fruit bundles are the
most common cues for distinguishing between cempedak and nangka: cempedak usually
has a stronger smell and a softer texture.
Due to the similarities between classes and inconsistent features within a
cultivar, fruit and vegetable classification presents significant problems [3, 5].
It is common to mistake cempedak for nangka, and vice versa, based on size and
sometimes fragrance, and the two fruits are often hard to tell apart with the naked
eye. Though this may not be a major issue, this report proposes distinguishing
between the two using DCNN and transfer learning algorithms.
Methods for quality assessment and automated harvesting of fruits and vegetables
have been developed, but the latest technologies have been built for limited classes
and small data sets. Applying a DCNN often requires trying different algorithms to
train the best-fitting model, and so far no accuracy results have been reported for
distinguishing between cempedak and nangka.
Fig. 1 Sample of images of “Nangka” (Artocarpus heterophyllus) and “Cempedak” (Artocarpus integer)
This research aims to utilize multimodal information retrieval to identify cempedak
and nangka fruit accurately. The objectives of the research are:
(a) To construct a DCNN classification system for cempedak (Artocarpus integer)
and nangka (Artocarpus heterophyllus).
(b) To compare the performance of the customized DCNN with transfer learning using
the pre-trained CNNs Xception, VGG16, VGG19, ResNet50 and InceptionV3.
2 Literature Review
Due to the similarities between classes and inconsistent features within a
cultivar, fruit and vegetable classification presents significant problems [6–8].
Because of the wide diversity of each type, the selection of appropriate data
collection sensors and feature representation methods is particularly critical
[9–12]. Methods for quality assessment and automated harvesting of fruits and
vegetables have been developed, but the latest technologies have been built for
limited classes and small data sets. The problem is multidimensional, with many
hyper-dimensional properties, which is one of the fundamental problems in current
machine learning techniques [13]. The authors of that review concluded that machine
vision methods are ineffective when dealing with multi-characteristic,
hyperdimensional data for classification. Fruits and vegetables are divided into
several groups, each of which has its own set of characteristics. Due to the paucity
of basic data sets, specific classification methods are limited. The majority of
trials are restricted either in the number of categories or in data set size.
Current work on building pre-trained CNNs is a step toward supplying turnkey
computer vision components. These pre-trained CNNs, however, are data-driven, and
there is a scarcity of large datasets of fruits and vegetables [13].
Rahnemoonfar and Sheppard [14] applied a deep neural network (DNN) to robotic
agriculture. Their study focuses on tomato pictures found on the internet. They used
a modified Inception-ResNet architecture. The model was trained on varied training
data (fruit under shade, surrounded by leaves, surrounded by branches, and
overlapping other fruits). Their results showed an average test accuracy of 93% on
synthetic pictures and 91% on actual photos. In another study [15], researchers used
a CNN to create a model that can alert the driver of a car when he or she is
drowsy. A deep convolutional network was built to extract features and use them in
the learning phase. The CNN classifier uses a SoftMax layer to determine whether a
driver is sleepy or not. The Viola-Jones face detection method was adapted for that
research: once the face has been detected, the eye region is extracted from it. The
proposed Stacked Deep CNN overcomes the drawbacks of a standard CNN, such as
location accuracy in regression, and achieves an accuracy of 96.42%. The researchers
suggest that transfer learning can be used in the future to improve the performance
of the model [15]. A further article provides a method for recognising the kind of
fruit among four varieties (litchi, apple, grape and lemon) [16]. Smartphones were
used to capture the photos, which were then processed using a contemporary detection
framework. A CNN was trained on a new data set of 2403 samples from the four fruit
classes. The model's overall performance was outstanding, with a precision of
99.89%, and the CNN was successful in identifying the type of fruit. The researchers
plan to use the algorithm to detect a wider variety of fruits in the future. Other
optimization methods that can be used for such problems are given in [17–22].
3 Methodology
3.1 Dataset
The fruit dataset was shot with a digital single-lens reflex (DSLR) camera (Canon
7D, 22.3 × 14.9 mm CMOS sensor, RGB Color Filter Array, 18 million effective
pixels). The data comprise two classes, cempedak (Artocarpus integer) and nangka
(Artocarpus heterophyllus), with a total of 1000 images (500 images per class) at a
resolution of 4608 × 3456 pixels. For training the network, the entire data set was
sub-sampled by a factor of 72, producing images of 48 × 64 pixels. The images were
collected under green, red and blue light (obtained by introducing an external gel
filter on the flashlight) as well as white light. The aim was to obtain a dataset
with high variability in the position and number of fruits, reflecting a real
scenario.
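The sub-sampling arithmetic can be checked directly. This is a small sketch with an illustrative helper name; it only divides each image dimension by the stated factor and does not reproduce the resampling method, which the text does not specify.

```python
def subsample_shape(width, height, factor):
    """Image size after keeping one pixel out of every `factor` along each axis."""
    if width % factor or height % factor:
        raise ValueError("dimensions must be divisible by the sub-sampling factor")
    return width // factor, height // factor


# 4608 x 3456 source images, sub-sampled by a factor of 72 as stated above.
small_width, small_height = subsample_shape(4608, 3456, 72)
print(small_width, small_height)
```

The result is 64 pixels wide and 48 pixels high, which matches the 48 × 64 size quoted in the text when read as height × width.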
3.2 Data Preprocessing and Partition
All images in the dataset are resized to 224 × 224 × 3 and converted into a NumPy
array for faster convolution when building the CNN model. The converted images are
labelled according to the two classes. Training is conducted with random image
augmentation applied, validation is done in parallel with training, and the model is
then tested on the test set. Data partitioning was performed by splitting the data
into training and test sets, as illustrated in Fig. 2.
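A 70/30 partition of the 1000 images, as drawn in Fig. 2, could be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the file names are hypothetical, the seed is arbitrary, and the split is a simple shuffle that does not guarantee equal class counts in each part.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names; 500 images per class as described in the dataset section.
labels = ["cempedak"] * 500 + ["nangka"] * 500
samples = [(f"img_{i:04d}.jpg", label) for i, label in enumerate(labels)]

train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
```

A stratified split, which shuffles and cuts each class separately, would be one way to keep both classes balanced in the train and test sets.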
3.3 Convolutional Neural Networks
The DCNN model used in this research to classify cempedak (Artocarpus integer) and
nangka (Artocarpus heterophyllus) is shown in Fig. 3. It consists of 15
convolutional layers/blocks. The first convolution layer uses 16 convolution filters
with a filter size of 3 × 3, and a kernel regularizer and bias regularizer of 0.05.
It also uses random_uniform as the kernel initializer, which initializes the neural
network with some weights that are then updated to better values at every iteration.
Random_uniform is an initializer that generates tensors with a uniform distribution.
Fig. 2 Data splitting and process to be used for training and testing (diagram boxes: Dataset (Jackfruits); Train set (70%, 700 images); Test set (30%, 300 images); Trained model; Model adjustment; Tested model; Model deployment)
Its minimum value is -0.05 and its maximum value is 0.05. The regularizers add
penalties on the layer during optimization; these penalties are included in the loss
function that the network optimizes. No padding is used so the input and output
tensors are of the same shape. The input image size is 224 × 224 × 3. Before the
output tensor is passed to the max-pooling layer, batch normalization is applied at
each convolution layer, which keeps the mean activation close to zero and the
activation standard deviation close to 1. After normalization, the rectified linear
unit (ReLU) activation function is applied at every convolution. ReLU is a piecewise
linear function: it outputs its input when the input is positive, and zero
otherwise. The output of each convolutional layer is given as input to a max-pooling
layer with a pool size of 2 × 2. This layer reduces the number of parameters by
down-sampling, and thus the amount of memory and time required for computation, so
that only the features needed for classification are passed on. Finally, a dropout
rate of 0.5 is applied as regularization at each convolution. The second convolution
layer uses 16 convolution filters with a 5 × 5 kernel size, and the third
convolution layer uses 16 convolution filters with a 7 × 7 kernel size. The network
ends with a fully connected (dense) layer; before it, the feature map of the last
convolution is flattened. In our model, the loss function is categorical
cross-entropy, and we compare the performance of the Adam and SGD optimizers at
three levels of epochs (25, 50 and 75) with a learning rate of 0.001.
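The ReLU and 2 × 2 max-pooling steps described above can be illustrated on a tiny feature map. This is a plain-Python sketch of the two operations only, not the Keras layers used in the study, and the 4 × 4 values are made up for the example.

```python
def relu(x):
    """Rectified linear unit: pass positive inputs through, clamp the rest to zero."""
    return x if x > 0 else 0.0


def max_pool_2x2(feature_map):
    """Non-overlapping 2 x 2 max pooling over a 2-D list with even dimensions."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


# Hypothetical 4 x 4 output of a convolution, before activation.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.5, 0.0],
    [0.0, 1.0, -1.0, -2.0],
    [-5.0, 3.0, -0.5, -4.0],
]
activated = [[relu(v) for v in row] for row in feature_map]
pooled = max_pool_2x2(activated)
print(pooled)
```

Pooling halves each spatial dimension, so the 4 × 4 map becomes 2 × 2 and every later layer has a quarter as many positions to process.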
3.4 Transfer Learning
Training a customized deep convolutional neural network from scratch may take a long
time. Transfer learning consists of taking features that have been learned on one
problem and leveraging them on a new, similar problem. In this study, the workflow
for building the models was as follows. First, layers are taken from a previously
trained model (VGG16, VGG19, Xception, ResNet50, InceptionV3) and frozen, to avoid
destroying any of the information they contain during future training rounds. Next,
new, trainable layers are added on top of the frozen layers. These new layers then
learn to turn the old features into predictions on the new dataset. We compare five
transfer learning models (VGG16, VGG19, Xception, ResNet50 and InceptionV3) with the
proposed CNN model.
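The freeze-then-train workflow can be illustrated with a toy model in plain Python. This sketch is only an analogy for the idea: the "pre-trained" weights and the four data points are hypothetical, the frozen part is a single fixed linear map with ReLU, and the trainable head is a logistic unit fitted by gradient descent. In Keras, freezing is typically done by setting `trainable = False` on the pre-trained layers.

```python
import math

# Stand-in for pre-trained layers: hypothetical weights that are never updated.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen feature extractor: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def predict(head, x):
    """Trainable head: logistic unit on top of the frozen features."""
    z = sum(w * f for w, f in zip(head["w"], extract_features(x))) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=500, learning_rate=0.1):
    """Fit only the head parameters; the frozen weights are left untouched."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            features = extract_features(x)
            error = predict(head, x) - y  # gradient of binary cross-entropy w.r.t. z
            head["w"] = [w - learning_rate * error * f
                         for w, f in zip(head["w"], features)]
            head["b"] -= learning_rate * error
    return head


# Hypothetical two-class toy data (label 1 or 0).
data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
frozen_before = [row[:] for row in FROZEN_WEIGHTS]
head = train_head(data)
print([round(predict(head, x), 3) for x, _ in data])
```

Only the head changes during training, which is why transfer learning needs far fewer updates than training every layer from scratch.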
3.4.1 VGG16
VGG16 was developed by Simonyan and Zisserman for the ILSVRC 2014 competition. It
consists of 16 weight layers (13 convolutional and 3 fully connected), with all
convolutions using 3 × 3 kernels. The design chosen by the authors is similar to
AlexNet, i.e., the number of feature maps increases as the depth of the network
increases. The network comprises about 138 million parameters. In our model, this
architecture is modified at the last FC layer, which originally has 1000 classes. We
replaced the 1000 classes with our number of classes, i.e., 6. The Adam optimizer is
used and the accuracy is obtained. Similarly, by pushing the depth to 19 layers, the
VGG19 architecture is defined. As stated above, we changed the number of output
classes to 6 in the last layer.
Fig. 3 Proposed DCNN architecture
3.4.2 VGG19
VGG19 is an upgrade of the VGG16 model. It extends the VGG16 architecture,
addressing AlexNet's flaws and increasing accuracy [3]. It is a 19-layer
convolutional neural network constructed by stacking convolutions; however, the
depth of such a model is limited by a phenomenon known as the vanishing gradient,
which makes deep convolutional networks difficult to train.
3.4.3 ResNet50
ResNet is short for Residual Network. ResNet-50 is a pre-trained deep learning model
for image classification based on a convolutional neural network. Many other image
recognition tasks have also benefited substantially from very deep models [5].
ResNet-50 is a 50-layer neural network that was trained on about a million images
from the ImageNet database in 1000 categories. The model comprises approximately 23
million trainable parameters, indicating a deep architecture that improves image
recognition. Compared with building a model from scratch, where a large amount of
data usually has to be collected and trained on, using a pre-trained model is a
highly effective option. ResNet-50 is a useful tool because of its high
generalisation performance and low error rates on recognition tasks.
3.4.4 Inception V3
Inception-v3 is a 48-layer deep pre-trained convolutional neural network model. It
is a version of the network that has already been trained on over a million images
from ImageNet. It is the third version of Google's Inception CNN model, which was
first proposed during the ImageNet Recognition Challenge. Inception V3 can classify
images into 1000 different object categories. As a result, the network has learned
rich feature representations for a wide variety of images. The network's image
input size is 299 by 299 pixels. In the first stage, the model extracts generic
features from the input images, and in the second stage it classifies the images
using those features. On the ImageNet dataset, Inception v3 has been shown to
achieve better than 78.1% top-1 accuracy and roughly 93.9% top-5 accuracy.
3.4.5 Xception
Xception stands for “Extreme Inception”. This architecture was proposed by Google.
It has about the same number of parameters as Inception V3. The performance gain of
Xception comes from a more efficient use of the model parameters rather than from
increased capacity. The output maps in the Inception architecture combine
cross-channel and spatial correlation mappings. These two types of mappings are
completely decoupled in the Xception architecture [23]. The feature extraction part
of the network consists of 36 convolutional layers, which are divided into 14
modules. All modules except the first and the last have linear residual connections
around them. In the last FC layer, the number of classes is replaced with 6.
4 Result and Discussion
The dataset was processed and analysed using several methods. Because the custom
build of the proposed DCNN model has more trainable weights, its training takes
longer. Table 1 shows that the proposed DCNN architecture achieves an accuracy of
0.89 to 0.9367. The comparison between the proposed method (highlighted in yellow)
and the other models is shown in Figs. 4 and 5 for the Adam and SGD optimizers,
respectively. Of all configurations, VGG16 with the SGD optimizer achieves the
highest accuracy (0.9633 at 75 epochs). While SGD gives the highest accuracy, VGG16
provides more stable and consistent performance across epochs, as shown in Fig. 6.
Overall, accuracy generally increases with the number of epochs.
Table 1 Accuracy of the proposed DCNN and transfer learning models with optimizers of Adam or SGD at three levels of epochs

Optimizer        Adam                       SGD
Epochs           25       50       75       25       50       75
Proposed model   0.8933   0.9267   0.9100   0.9233   0.9267   0.9367
Xception         0.8200   0.8800   0.9000   0.9000   0.9167   0.9000
VGG16            0.4733   0.8667   0.8700   0.6000   0.9567   0.9633
VGG19            0.7967   0.8567   0.8800   0.8800   0.8800   0.8800
ResNet50         0.6800   0.7200   0.7500   0.7933   0.6900   0.8000
InceptionV3      0.8800   0.8900   0.9167   0.9133   0.9000   0.9167
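The figures in Table 1 can be queried with a short script. The sketch below copies the table values as reported, finds the best configuration, converts its accuracy into a count of correctly classified test images (assuming the 300-image test set of Fig. 2), and counts how many model and optimizer pairs never lose accuracy as the epochs increase. The variable names are illustrative.

```python
EPOCHS = (25, 50, 75)
TEST_IMAGES = 300  # assumed from the 30% test split of 1000 images in Fig. 2

# Accuracy values copied from Table 1, ordered by EPOCHS.
accuracy = {
    "Proposed model": {"Adam": (0.8933, 0.9267, 0.9100), "SGD": (0.9233, 0.9267, 0.9367)},
    "Xception": {"Adam": (0.8200, 0.8800, 0.9000), "SGD": (0.9000, 0.9167, 0.9000)},
    "VGG16": {"Adam": (0.4733, 0.8667, 0.8700), "SGD": (0.6000, 0.9567, 0.9633)},
    "VGG19": {"Adam": (0.7967, 0.8567, 0.8800), "SGD": (0.8800, 0.8800, 0.8800)},
    "ResNet50": {"Adam": (0.6800, 0.7200, 0.7500), "SGD": (0.7933, 0.6900, 0.8000)},
    "InceptionV3": {"Adam": (0.8800, 0.8900, 0.9167), "SGD": (0.9133, 0.9000, 0.9167)},
}

# Best single configuration across models, optimizers and epoch levels.
best = max(
    (acc, model, optimizer, epochs)
    for model, by_optimizer in accuracy.items()
    for optimizer, values in by_optimizer.items()
    for epochs, acc in zip(EPOCHS, values)
)
best_correct = round(best[0] * TEST_IMAGES)

# Pairs whose accuracy never drops from one epoch level to the next.
non_decreasing = sum(
    all(a <= b for a, b in zip(values, values[1:]))
    for by_optimizer in accuracy.values()
    for values in by_optimizer.values()
)

print(best, best_correct, non_decreasing)
```

Counting the non-decreasing pairs gives a quick check on how consistently accuracy improves with more epochs across the twelve model and optimizer combinations.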
Fig. 4 Accuracy of the model for Adam optimizer at three levels of epochs (bar chart; y-axis: accuracy; x-axis: Proposed model, Xception, VGG16, VGG19, ResNet50, InceptionV3; series: 25, 50 and 75 epochs; values as in Table 1)
Fig. 5 Accuracy of the model for SGD optimizer at three levels of epochs (bar chart; y-axis: accuracy; x-axis: Proposed model, Xception, VGG16, VGG19, ResNet50, InceptionV3; series: 25, 50 and 75 epochs; values as in Table 1)
Fig. 6 Performance of model on train and test set by using Adam or SGD optimizer at three levels
of epochs
5 Conclusion
Cempedak (Artocarpus integer) and nangka (Artocarpus heterophyllus) are highly
similar in external appearance and are difficult for a human to distinguish
visually. In addition, similarities between classes and inconsistent features
within a cultivar make fruit and vegetable classification a significant problem. To
address this, a dataset of cempedak and nangka images was generated, with 500 images
per class at a resolution of 4608 × 3456 pixels; for training the network, the
entire data set was sub-sampled by a factor of 72, producing images of 48 × 64
pixels. The dataset was processed and analysed using various CNN methods. The
results show that the proposed DCNN architecture achieves an accuracy of 89–93.67%.
While SGD gives the highest accuracy, VGG16 provides more stable and consistent
performance across epochs, as shown in Fig. 6. Overall, accuracy generally increases
with the number of epochs.
References
1. Grimm, J. E., & Steinhaus, M. (2020). Characterization of the major odorants in Cempedak—
Differences to jackfruit. Journal of Agricultural and Food Chemistry, 68(1), 258–266.
2. Balamaze, J., Muyonga, J. H., & Byaruhanga, Y. B. (2019). Physico-chemical characteristics of
selected jackfruit (Artocarpus Heterophyllus Lam) varieties. Journal of Food Research, 8(4),
11.
3. Shaha, M., & Pawar, M. (2018). Transfer learning for image classification. In 2018 Second
international conference on electronics, communication and aerospace technology (ICECA)
(pp. 656–660). https://doi.org/10.1109/ICECA.2018.8474802
4. Wang, M. M. H., Gardner, E. M., Chung, R. C. K., Chew, M. Y., Milan, A. R., Pereira, J.
T., & Zerega, N. J. C. (2018). Origin and diversity of an underutilized fruit tree crop, cempedak
(Artocarpus integer, Moraceae). American Journal of Botany, 105(5), 898–914.
5. Sharma, N., Jain, V., & Mishra, A. (2018). An analysis of convolutional neural networks
for image classification. In International conference on computational intelligence and data
science (ICCIDS 2018); Procedia Computer Science, 132, 377–384. ISSN 1877-0509. https://
doi.org/10.1016/j.procs.2018.05.198
6. Alhaj, Y. A., Dahou, A., Al-qaness, M. A., Abualigah, L., Abbasi, A. A., Almaweri, N. A. O.,
on Twitter sentiment analysis: Architecture, classifications, and challenges. In Deep learning
L., & Al-qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
12. Abualigah, L. M. Q. (2019). Feature selection and enhanced krill herd algorithm for text
document clustering (pp. 1–165). Springer.
13. Hameed, K., Chai, D., & Rassau, A. (2018). A comprehensive review of fruit and vegetable
classification techniques. Image and Vision Computing, 80(September), 24–44.
14. Rahnemoonfar, M., & Sheppard, C. (2017). Deep count: Fruit counting based on deep simulated
learning. Sensors (Switzerland), 17(4), 1–12.
15. Reddy Chirra, V. R., Uyyala, S. R., & Kishore Kolli, V. K. (2019). Deep CNN: A machine
learning approach for driver drowsiness detection based on eye state. Revue d’Intelligence
Artificielle, 33(6), 461–466.
Engineering, 22(2), 1–13.
113609.
10, 16150–16177.
23. Chollet, F. (2021). Xception: Deep learning with depthwise separable convolutions. [online]
arXiv.org. https://arxiv.org/abs/1610.02357v3. Accessed May 30, 2021.
Markisa/Passion Fruit Image
Classification Based Improved Deep
Learning Approach Using Transfer
Learning
Ahmed Abdo, Chin Jun Hong, Lee Meng Kuan, Maisarah Mohamed Pauzi,
Putra Sumari, Laith Abualigah, Raed Abu Zitar, and Diego Oliva
Abstract Fruit recognition is becoming more and more important in the agricultural
industry. Traditionally, all the fruits on the production line must be identified
and labelled manually, which is labor-intensive, error-prone, and ineffective. Many
fruit recognition systems have therefore been created to automate the process, but
fruit recognition systems for local Malaysian fruits are limited. This project
therefore focuses on classifying one local Malaysian fruit, markisa (passion
fruit). We propose two CNN models for markisa classification. The proposed models
are evaluated on our own dataset and achieve accuracies of 97% and 65%,
respectively. The results indicate that the architecture of a CNN model is very
important, because different architectures can produce different results. The first
CNN model is therefore selected, because it can classify 4 types of markisa with
higher accuracy. We also examined two transfer learning methods for markisa
classification, VGG-16 and InceptionV3. The results show that the first proposed CNN
model outperforms VGG-16 (95% accuracy) and InceptionV3 (65% accuracy).
Keywords Markisa · Passion fruit · Convolutional neural network · Deep learning · Transfer learning · VGG-16 · InceptionV3
A. Abdo · C. J. Hong · L. M. Kuan · M. M. Pauzi · P. Sumari · L. Abualigah (B)

Malaysia
L. Abualigah
R. A. Zitar
Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, Abu Dhabi, United
Arab Emirates
D. Oliva
IN3—Computer Science Department, Universitat Oberta de Catalunya, Castelldefels, Spain
Depto. de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Guadalajara, Jalisco,
Mexico
https://doi.org/10.1007/978-3-031-17576-3_7
144 A. Abdo et al.
1 Introduction
Recent developments in computer vision, together with advances in Machine Learning
(ML), Neural Networks (NN), and Convolutional Neural Networks (CNN), have improved
the efficiency of image classification tasks. Detecting and classifying the distinct
varieties of Passiflora edulis [1, 2], commonly known as passion fruit, or Markisa
in the Malay language, is one of the significant challenges in the fruit packaging
and processing industry [3, 4]. The different colors, sizes, shapes, and
orientations of the cultivars of this fruit, as shown in Fig. 1, result in
misclassification, which affects productivity and packaging quality.
Customarily, the cultivars of Markisa are classified manually on production lines:
a laborer sorts the fruit into different processing lines, which makes the entire
process labor-intensive, time-consuming, error-prone, and ineffective. It also
increases the cost of production, since hiring employees for manual work adds wage
costs and operating overheads, and it lowers production output. Hence, we need an
automated system that can reduce human effort, increase production, and reduce the
cost and time of production [5, 6].
Fig. 1 Samples of several cultivars of Markisa
In this study, we propose a CNN architecture to classify several cultivars of
Markisa, namely Sweet Passion Fruit (Markisa Manis), Yellow Passion Fruit (Markisa
Kuning), Purple Passion Fruit (Markisa Ungu), and Big Passion Fruit (Markisa Besar).
The novelty of this paper is the end-to-end deep learning pipeline architecture
for Markisa classification. The paper is organized as follows: Sect. 2 provides
the literature survey for the proposed CNN architecture. Sect. 3 presents
the details of the proposed CNN pipeline architecture for Markisa classification.
Sect. 4 gives an overview of the experiments, listing the tools, parameters, and
criteria of the experiments and the results. Finally, Sect. 5 presents our conclusion.
2 Literature Survey
Two approaches are available for image object detection or classification: deep
learning with Convolutional Neural Networks (CNN) and the traditional Computer Vision
(CV) approach [7, 8]. Traditional CV algorithms for feature extraction include edge
detection, corner detection, and threshold segmentation [9–12]. The deep learning
approach achieves better accuracy in image classification than the traditional
CV techniques [13]. Deep learning also demands less expert effort for fine-tuning
and feature extraction: the CNN learns the features itself with high flexibility
and can be re-trained to reach the optimum result. CNNs are therefore applied in
many image classification tasks, and fruit classification is one of them, where it
supports robotic harvesting systems or checks the quality of the fruit [14]. Risdin
et al. [14] developed a CNN for fruit classification that achieves 98.99% accuracy,
better than traditional machine learning techniques such as SVM with only 87%
accuracy [5]. Moreover, Palakodati et al. [15] developed a CNN model for classifying
fresh and rotten fruit that achieves accuracies up to 97.82%.
VGG16 is among the best transfer learning models that have been tried on fruit
and vegetable datasets. Kishore et al. [16] showed that VGG16 achieves about 97%
accuracy on a dataset of 4 classes (Banana, Tomato, Carrot, and Potato) with 600
images per class. Interestingly, the model was tested with different image sizes to
show that VGG16 works well with smaller and noisier images. Given this accuracy,
VGG16 is a good option for fruit or vegetable datasets.
Pardede et al. [17] also applied the VGG16 model to a fruit dataset. The aim of
that study was to build a deep learning model that detects fruit ripeness, which
differs slightly from the previous study but uses a dataset of the same nature. The
dataset has 8 classes (Ripe Mango, Unripe Mango, Ripe Tomato, Unripe Tomato, Ripe
Orange, Unripe Orange, Ripe Apple, Unripe Apple). The authors achieved about 90%
accuracy with a dropout rate of 0.5 and concluded that dropout is the best technique
to reduce overfitting in transfer learning.
Inception-v3 is a convolutional neural network architecture whose name refers to
the film Inception, directed by Christopher Nolan. The model is mainly used for image
analysis and object detection, and the Inception family was introduced by Google
researchers for the ImageNet Large Scale Visual Recognition Challenge [18].
Szegedy et al. [19] proposed the Inception-v3 architecture and reported that such
designs yield “high-performance vision networks that have a relatively modest
computation cost compared to simpler, more monolithic architectures”. In addition,
the highest-quality trained version of Inception-v3 “reached 21.2% top-1 and 5.6%
top-5 error for single crop evaluation”, which compared favorably with other CNN
architectures at the time.
Lin et al. [20] explored the application of the Inception-v3 model through transfer
learning. Their transfer learning-based model “is retrained for 5000 epochs at
different learning rates. The accuracy test results indicate that the transfer
learning-based method is robust for traffic sign recognition, with the best
recognition accuracy of 99.18 % at the learning rate of 0.05”. Other optimization
methods that can be applied to such problems are given in [8, 21–25].
3 Proposed CNN Architecture for Passion Fruit Recognition
3.1 The Proposed CNN Architectures
3.1.1 Model 1
The proposed architecture of the CNN model for passion fruit classification is
shown in Figs. 2 and 3. The model consists of 4 convolutional layers, illustrated
in Fig. 2, and 3 dense layers for the classifier of the neural network, excluding the
input and output layers, as shown in Fig. 3 [12, 26]. It reaches a testing accuracy
of 97% in classifying the 4 types of passion fruit. As seen in Fig. 2, the training
data are fed in as RGB color images of size (224, 224) and passed into the first
convolutional layer. This layer has 64 convolutional filters of size (3, 3). The
stride is (1, 1), so the filters move over the input image one step at a time, and
the padding is set to “same”, so the output keeps the spatial size of the input.
Batch normalization is applied after the ReLU activation function to bring the mean
activation close to zero and the standard deviation close to 1 [3]. The result then
passes to a max-pooling layer of size (2, 2), which reduces the output size and
simplifies the model. Convolutional layer 2 uses the same number of filters, but the
filter size is increased to (5, 5) with no padding; the same batch normalization
follows the ReLU activation before the max-pooling layer of size (2, 2). Convolutional
layer 3 repeats the design of layer 2 with the filter size increased to (7, 7); the
same ReLU activation and batch normalization are applied to help prevent overfitting
before the max-pooling layer of size (2, 2). Convolutional layer 4 has only 16
filters of size (7, 7), and batch normalization is applied to its output without the
ReLU function or a max-pooling layer. The features extracted from the input images
by this convolutional base become the input to the neural network classifier.
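The effect of stride, padding, and pooling on the feature-map size can be checked with a few lines of arithmetic. The sketch below is framework-free and only illustrative: the helper name `conv_output_size` is hypothetical, and it assumes that each (2, 2) max-pooling layer also uses a stride of 2, which the text does not state explicitly.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for a convolution or pooling window."""
    if padding == "same":
        # "same" padding keeps ceil(size / stride) positions
        return math.ceil(size / stride)
    # "valid" (no padding): the window must fit entirely inside the input
    return (size - kernel) // stride + 1


size = 224                                                          # input images are 224 x 224
after_conv1 = conv_output_size(size, kernel=3, stride=1, padding="same")  # conv layer 1
after_pool1 = conv_output_size(after_conv1, kernel=2, stride=2)           # max-pooling (2, 2)
after_conv2 = conv_output_size(after_pool1, kernel=5, stride=1)           # conv layer 2, no padding

print(after_conv1, after_pool1, after_conv2)  # 224 112 108
```

Under these assumptions the first block keeps the 224 × 224 resolution, pooling halves it to 112 × 112, and the unpadded (5, 5) convolution trims it to 108 × 108.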
After the convolutional layers have extracted the features of the fruit images, the
feature maps are flattened before they enter the neural network, as seen in Fig. 3.
The neural network has 3 dense layers, not counting the input and output
layers. The first dense layer has 512 nodes and uses L2 (ridge) regularization, with
both the lambda and the bias penalty set to 0.01, to penalize large weights, create
a simpler model, and prevent overfitting [4]. A dropout rate of 0.25 ignores 25% of
the neurons during training, which speeds up computation and also helps to avoid
overfitting. Two regularization techniques, L2 regularization and dropout of 0.25,
are therefore used in the first dense layer because of its large number of neurons.
The first dense layer uses the ReLU activation function and feeds the second dense
layer, which has only 64 neurons and the same dropout rate of 0.25 but no L2
regularization. Its output passes through the ReLU activation function and becomes
the input of the last dense layer before the output layer. This third dense layer
has only 32 nodes, with no dropout and no regularization, and also uses the ReLU
activation function. The output layer has 4 neurons with the Softmax activation
function for the multiclass classification. The loss function is categorical
cross-entropy, because the labels are one-hot encoded, and the weights and biases
are updated by the Adagrad optimizer with a learning rate of 0.001. The number of
epochs is set to 30 and the batch size to 10. The metric is categorical accuracy
for the multiclass classification.
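To make the role of the Softmax output and the categorical cross-entropy loss concrete, the following minimal sketch computes both for a single image. It is plain Python rather than the Keras code used in the study, and the logit values are hypothetical numbers chosen only for illustration.

```python
import math

CLASSES = ["Markisa Besar", "Markisa Kuning", "Markisa Manis", "Markisa Ungu"]


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample with a one-hot label; eps avoids log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))


logits = [2.0, 0.5, 0.1, -1.0]   # hypothetical scores from the 4 output neurons
target = [1, 0, 0, 0]            # one-hot label for Markisa Besar

probs = softmax(logits)
loss = categorical_cross_entropy(target, probs)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, round(loss, 3))
```

The predicted class is the one with the largest probability, and the loss shrinks towards zero as the probability assigned to the true class approaches 1. A network that is completely undecided between the 4 classes has a loss of ln 4.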
3.1.2 Model 2
Figure 4 shows the architecture of the second proposed CNN model. The model has 6
convolution blocks, 2 pooling blocks, 2 fully connected layers, and a SoftMax
classifier. All input images are color images of 224 × 224 pixels with 3 channels.
All convolution blocks use the same filter size (3 × 3), and padding is applied so
that the output of each block has the same size as its input. The number of filters
differs, however: 128 filters for convolution block 1, 96 for block 2, 64 for block 3,
32 for blocks 4 and 5, and 12 for block 6. All convolution blocks use the ReLU
activation function. Max pooling of size 2 × 2 is applied after convolution blocks 1
and 2, which halves the image size twice (from 224 × 224 to 56 × 56). After the
convolution base, the feature maps therefore have dimension 56 × 56 × 12. They are
then flattened to vectors of size 37,632 before entering the fully connected layers.
Both fully connected layers have 1000 nodes and use ReLU as the activation function;
the only difference is that a dropout rate of 0.3 is applied to the first layer.
Finally, the SoftMax classifier outputs the result: markisa besar, markisa kuning,
markisa manis, or markisa ungu.
Fig. 2 CNN architecture for model 1
Fig. 3 Classifier for the proposed CNN architecture for model 1
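The flattened size and the overall parameter count of Model 2 follow directly from the layer description above. The sketch below reproduces that arithmetic in plain Python. It assumes that every convolution and dense layer has a bias term and that the model contains no batch-normalization layers (none are mentioned for Model 2); the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


filters_per_block = [128, 96, 64, 32, 32, 12]   # convolution blocks 1 to 6

conv_total = 0
in_channels = 3                                  # RGB input
for filters in filters_per_block:
    conv_total += conv2d_params(3, in_channels, filters)
    in_channels = filters

side = 224 // 2 // 2                             # two 2 x 2 poolings: 224 -> 112 -> 56
flat = side * side * filters_per_block[-1]       # 56 * 56 * 12

dense_total = (
    dense_params(flat, 1000)      # first fully connected layer
    + dense_params(1000, 1000)    # second fully connected layer
    + dense_params(1000, 4)       # SoftMax output layer, 4 classes
)

total = conv_total + dense_total
print(flat, total)  # 37632 38838816
```

Under these assumptions the total agrees with the 38,838,816 parameters reported for the second model in Sect. 4.2.2, and almost all of them sit in the first fully connected layer.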
3.2 Transfer Learning Models
As part of this study, we also include transfer learning models. Because of the
limitations of our device and resources, we are only able to compare the VGG16
and InceptionV3 models. In both models, we freeze the base convolutional layers and
remove the flatten layer and its classifier, while keeping the pre-trained weights
through the ‘imagenet’ option. We then replace the removed classifier with one that
suits our dataset of 4 classes. We use ReLU as the activation function of the dense
layer and softmax as the activation function of the output layer, and we implement
early stopping to reduce the training time: whenever the model reaches 99% accuracy,
we stop the training. This also reduces the possibility of overfitting.
Fig. 4 Second CNN model
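The stopping rule described above can be sketched independently of any framework. In Keras, such logic would typically live in a custom callback that sets `model.stop_training = True` at the end of an epoch; the version below only simulates the rule on a list of per-epoch accuracies. The function name and the accuracy values are hypothetical, and the text does not say whether training or validation accuracy is monitored.

```python
def train_with_accuracy_stop(epoch_accuracies, threshold=0.99):
    """Consume per-epoch accuracies and stop once the threshold is reached.

    Returns the number of epochs actually run and the recorded history.
    """
    history = []
    for accuracy in epoch_accuracies:
        history.append(accuracy)
        if accuracy >= threshold:
            break  # early stopping: skip the remaining epochs
    return len(history), history


# Hypothetical accuracy curve for a run scheduled for 7 epochs
simulated = [0.62, 0.81, 0.93, 0.97, 0.992, 0.995, 0.996]
epochs_run, history = train_with_accuracy_stop(simulated)
print(epochs_run)  # 5
```

In this simulated run, training stops after the fifth epoch, the first one at or above 99%, so the last two scheduled epochs are never executed.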
3.2.1 VGG16
VGG16 is a convolutional neural network developed by Karen Simonyan and
Andrew Zisserman at the University of Oxford in 2014 [27]. The model contains 16
weight layers and achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over
14 million images, in its 1000-class classification task. Figure 5 shows the
architecture of VGG16.
Fig. 5 VGG-16 architecture
3.2.2 InceptionV3
The second transfer learning model is built by sequential concatenation on top of
the Inception-v3 model. The model consists of low-level feature mappings learned by
basic convolutional operations with (1 × 1) and (3 × 3) kernels. In addition,
multi-scale feature representations are concatenated and fed into auxiliary
classifiers with diverse convolution kernels (i.e., 1 × 1, 1 × 3, 3 × 1, 3 × 3,
5 × 5, 1 × 7, and 7 × 7 filters), which is used to produce better convergence
performance. In the experiment section, we configure the fully connected part with
one Dense layer and a dropout layer, followed by another experiment with two Dense
layers and two dropout layers. Finally, a Sigmoid classifier produces a vector
consistent with the four-class probabilities, and the classification result is the
class with the maximum probability. Figure 6 shows the architecture of InceptionV3.
3.3 Dataset
There are many types of Markisa (passion fruit). Our dataset includes 4 types of
this fruit: Markisa Besar (Giant Passion Fruit), Markisa Kuning (Yellow Passion
Fruit), Markisa Manis (Sweet Passion Fruit), and Markisa Ungu (Purple Passion Fruit).
We divided the dataset into 80% for training, 10% for validation, and 10% for
testing. Figure 7 shows examples of images in our dataset.
Fig. 6 InceptionV3 architecture
Fig. 7 Examples of images
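One way to realize an 80/10/10 split that keeps the classes balanced is to split each class separately. The sketch below assumes 250 images per class, as stated for this dataset in Sect. 4; the file names, the fixed random seed, and the per-class (stratified) splitting itself are assumptions made for illustration, since the chapter does not describe how the split was drawn.

```python
import random

CLASSES = ["Markisa Besar", "Markisa Kuning", "Markisa Manis", "Markisa Ungu"]


def stratified_split(items_by_class, train=0.8, val=0.1, seed=0):
    """Split every class separately so each subset keeps the class balance."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train)
        n_val = round(len(shuffled) * val)
        splits["train"] += [(name, label) for name in shuffled[:n_train]]
        splits["val"] += [(name, label) for name in shuffled[n_train:n_train + n_val]]
        splits["test"] += [(name, label) for name in shuffled[n_train + n_val:]]
    return splits


# Hypothetical file names: 250 images for each of the 4 classes
dataset = {
    label: [f"{label.lower().replace(' ', '_')}_{i:03d}.jpg" for i in range(250)]
    for label in CLASSES
}
splits = stratified_split(dataset)
print({name: len(rows) for name, rows in splits.items()})
# {'train': 800, 'val': 100, 'test': 100}
```

With this scheme the test subset contains 25 images per class, which is consistent with the 100-image test set with 25 images per label used in the evaluation.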
3.4 Augmentation
We also apply image augmentation by rotating the images by several angles.
Table 1 shows the rotations for one sample image of the Markisa Besar class.
Table 1 Image augmentation of Markisa Besar
Degree of rotation      Image
0° (original image)
180°
90° anticlockwise
275°
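Rotation by multiples of 90° only rearranges pixels, which the following sketch shows on a tiny grid standing in for an image. It is an illustration in plain Python, not the augmentation code of the study; the function names are hypothetical, and rotations by other angles, such as the 275° entry in Table 1, would need an image library with interpolation.

```python
def rotate90_anticlockwise(image):
    """Rotate a 2D grid (list of rows) by 90 degrees anticlockwise."""
    # Transposing turns columns into rows; reversing the row order completes the turn.
    return [list(row) for row in zip(*image)][::-1]


def rotate(image, quarter_turns):
    """Apply the 90-degree anticlockwise rotation the given number of times."""
    for _ in range(quarter_turns % 4):
        image = rotate90_anticlockwise(image)
    return image


image = [[1, 2],
         [3, 4]]

print(rotate(image, 1))  # [[2, 4], [1, 3]]  (90 degrees anticlockwise)
print(rotate(image, 2))  # [[4, 3], [2, 1]]  (180 degrees)
```

Each rotated copy keeps the original class label, so a single photograph yields several training samples.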
Table 2 Parameters tuning options for proposed CNN model
Parameter        Options
Optimizer        Adam, SGD, Adagrad, RMSprop
Dense layer      –
Learning rate    0.1, 0.01, 0.001, 0.0001
Epoch            10, 30, 50, 70
Filter           –
Batch size       10, 20, 30, 40
Table 3 Parameters tuning options for transfer learning
Parameter                              Options
No of neurons in single dense layer    512, 1024
Type of optimizer                      Adam, SGD
Epochs                                 10, 20
Dropout                                0.1, 0.2
Learning rate                          0.01, 0.001
Batch size                             50, 100
The dataset is balanced: it consists of 250 Markisa Besar, 250 Markisa Kuning,
250 Markisa Manis, and 250 Markisa Ungu images, 1000 images in total.
The image size has been standardized to 224 × 224 × 3. The programming
language used in this study is Python with the TensorFlow and Keras libraries. To
run the code, we use Google Colaboratory with a GPU. However, the GPU runtime is
limited, and we are unable to use it extensively. For the transfer learning section,
we therefore reduce the tuning options from 4 values per parameter to 2 values.
Tables 2 and 3 show the parameter tuning options for the proposed
CNN model and for transfer learning, respectively.
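The cost of an exhaustive search grows quickly with the number of options per parameter, which is why the reduction to 2 values matters. The sketch below enumerates the transfer learning grid of Table 3 with `itertools.product`; the dictionary keys and the helper name `expand` are hypothetical, and an exhaustive grid is an assumption that happens to match the 64 trained models mentioned in Sect. 4.3.1.

```python
from itertools import product

# Options from Table 3 (transfer learning)
transfer_grid = {
    "neurons": [512, 1024],
    "optimizer": ["Adam", "SGD"],
    "epochs": [10, 20],
    "dropout": [0.1, 0.2],
    "learning_rate": [0.01, 0.001],
    "batch_size": [50, 100],
}


def expand(grid):
    """Return one dict per combination of the listed options."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


configs = expand(transfer_grid)
print(len(configs))               # 64 combinations with 2 options per parameter
print(4 ** len(transfer_grid))    # 4096 combinations if each parameter had 4 options
```

Six parameters with 2 options each give 2^6 = 64 runs, whereas 4 options each would require 4^6 = 4096 runs, far beyond a limited GPU runtime.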
4.2.1 Model 1
The passion fruit classification model with 97% accuracy is the one shown in Figs. 2
and 3, and Fig. 8 gives the model summary of its CNN architecture. To arrive at it,
we first experimented with different architectural designs and hyperparameter
settings. The first proposed model, shown in Fig. 9, is used to experiment with
different optimizers, numbers of dense layers, learning rates, numbers of epochs,
numbers of filters, and finally training batch sizes. The best model is the CNN
architecture in Fig. 8, with 97% accuracy on the testing data.
The initial CNN architecture in Fig. 9 consists of 4 convolutional layers and
3 dense layers, just like the best model in Fig. 8. The first convolutional layer
has 64 filters of size (3, 3), stride (1, 1), and zero-padding that keeps the output
size equal to the input size, just as in the best model in Figs. 2, 3 and 8. However,
the second and third convolutional layers have only 16 filters instead of the 64 in
the best model. The rest of the convolutional architecture and the dense layers of
the neural network are the same as in the best model. The initial learning rate was
set to 0.0001, and the Adam optimizer was used with categorical accuracy as the
metric. In addition, the number of epochs was set to 30 and the batch size to 10 for
the training dataset. As a result, the total number of parameters is 17,725,876 for
the best model and 17,422,804 for the initial model, which is smaller because it
uses fewer filters in the CNN architecture, as seen in Figs. 8 and 9.
Fig. 8 The model summary for the best proposed CNN model with accuracy of 100%
Fig. 9 The first experimented model
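The gap between the two parameter totals can be traced to the convolutional base alone, since both models share the same dense layers. The sketch below counts convolution and batch-normalization parameters for the two filter configurations. It assumes a bias term in every convolution and 4 values per channel for batch normalization (scale, offset, moving mean, and moving variance, as counted in a Keras model summary); the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return kernel * kernel * in_channels * filters + filters


def batchnorm_params(channels):
    """Scale, offset, moving mean and moving variance for each channel."""
    return 4 * channels


def conv_base_params(filters, kernels=(3, 5, 7, 7), in_channels=3):
    """Total parameters of the 4 convolution + batch-normalization blocks."""
    total = 0
    for kernel, n_filters in zip(kernels, filters):
        total += conv2d_params(kernel, in_channels, n_filters)
        total += batchnorm_params(n_filters)
        in_channels = n_filters
    return total


initial = conv_base_params([64, 16, 16, 16])   # initial model (Fig. 9)
best = conv_base_params([64, 64, 64, 16])      # best model (Fig. 8)

print(best - initial)            # 303072
print(17_725_876 - 17_422_804)   # 303072, difference of the reported totals
```

Under these assumptions the extra filters in layers 2 and 3 account for exactly the difference of 303,072 parameters between the two reported totals.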
(I) Effect of Optimizers

The initial model was trained with four optimizers: Adam, SGD, Adagrad, and
RMSprop. The training and validation accuracies are shown in Fig. 10. Of the 4
optimizers, Adagrad gives a stable validation accuracy of close to 90% on the
multiclass classification. RMSprop also shows a good validation accuracy near
epoch 30, but its values fluctuate more than those of Adagrad. The training and
validation accuracies for Adagrad are more stable than for the other 3 optimizers,
with no fluctuation in the validation accuracy, and they stay close to each other,
which indicates less overfitting. Figure 11 compares the testing evaluation
accuracies of the different optimizers in a horizontal bar graph. Based on the
validation accuracies, Adagrad performs better and overfits less, and its high
validation accuracy is matched by a high testing evaluation accuracy in Fig. 11.
We therefore switch from the Adam optimizer to the Adagrad optimizer and continue
to tune the hyperparameters.
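One possible reason for the smooth Adagrad curves is that Adagrad divides each step by the square root of the accumulated squared gradients, so the effective step size can only shrink during training. The sketch below applies this update rule to a one-dimensional toy problem; it is not the training code of the study, and the function name, the quadratic objective, and the learning rate of 0.5 are chosen only for illustration.

```python
import math


def adagrad_minimize(grad, w0, lr, steps, eps=1e-7):
    """Minimize a 1-D function with the Adagrad update rule.

    Returns the final parameter and the effective step size used at each step.
    """
    w = w0
    accumulated = 0.0
    effective_rates = []
    for _ in range(steps):
        g = grad(w)
        accumulated += g * g                       # running sum of squared gradients
        rate = lr / (math.sqrt(accumulated) + eps)  # effective step size
        effective_rates.append(rate)
        w -= rate * g
    return w, effective_rates


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3)
w_final, rates = adagrad_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0, lr=0.5, steps=500)
print(round(w_final, 4), rates[0] > rates[-1])
```

The parameter approaches the minimum at 3 while the effective step size never increases, which illustrates why Adagrad updates tend to become more cautious over time.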
(II) Effect of Dense Layer

The initial model has 3 dense layers in the neural network classifier. In this
experiment, one dense layer at a time, with 64 nodes, a dropout rate of 0.25, and
ReLU activation, is added after the second dense layer, as seen in Fig. 12. After
three such additions the classifier has up to 6 dense layers. The results show that
the model learns more slowly as more dense layers are added, since it requires more
epochs of training before it predicts well. Once the model has 4 or more dense
layers, the validation accuracy becomes higher than the training accuracy as the
complexity of the model increases. The testing evaluation accuracies in Fig. 13 show
that, at 30 epochs, the model with 3 dense layers has a higher testing evaluation
accuracy (0.96) than the models with more dense layers. We therefore keep 3 dense
layers, the Adagrad optimizer, a learning rate of 0.0001, and 30 training epochs
with a batch size of 10 as our current model.
Fig. 10 The effect of the optimizer on the training and validation accuracy against epoch
Fig. 11 The testing evaluation accuracies comparison with different optimizers
(III) Effect of Learning Rate
Next, the model is tested with learning rates of 0.1, 0.01, 0.001, and 0.0001, as
seen in Fig. 14. The results show a clear trend: with a high learning rate the model
converges faster, as shown by the validation accuracy approaching the training
accuracy in fewer than 10 epochs. However, this is not stable, because a high
learning rate makes the model update its weights and biases in larger steps, and it
may not learn well after 10 epochs, with fluctuations in the validation accuracy.
With a smaller learning rate the model learns more slowly and reaches better
accuracies, as illustrated in Fig. 14. The testing evaluation accuracy increases as
the learning rate becomes smaller, but only down to 0.001: the learning rate of
0.001 gives the highest value, 0.99, compared with 0.96 for a learning rate of
0.0001. A likely explanation is that only 30 epochs were tested, and the smaller
learning rate may need more epochs to train well. Because it combines few epochs,
and thus fast computation, with high accuracy, we select the learning rate of 0.001
instead of the 0.0001 set in the initial model. Although the validation accuracy for
the learning rate of 0.0001 is slightly higher than for 0.001, as seen in Fig. 15,
we prefer the model that converges faster. The current model uses the Adagrad
optimizer, a learning rate of 0.001, 3 dense layers, and 30 epochs with a batch size
of 10.
Fig. 12 The effect of the number of dense layers on the training and validation accuracy against epoch
Fig. 13 The testing evaluation accuracies comparison with different number of dense layers
(IV) Effect of Number of Epochs
The model is then trained with 10, 30, 50, and 70 epochs, as seen in Fig. 16. With
only 10 epochs, the validation accuracy is low and the model performs poorly on
unseen data, with a testing evaluation accuracy of only 0.39. As the number of
epochs grows, the validation accuracy starts to converge and stays consistent even
with further increases, as seen for 70 epochs. The validation accuracy becomes
consistent after about epoch 20. The testing evaluation accuracies for the different
numbers of epochs are shown in Fig. 17. Training for 30 and 50 epochs yields the
highest testing evaluation accuracy, 0.99, compared with 0.95 for 70 epochs.
Consequently, we pick 30 epochs for training the model, as it requires fewer
computational resources and still gives a good validation accuracy. The current
model uses the Adagrad optimizer, a learning rate changed from 0.0001 to 0.001,
3 dense layers, and 30 epochs with a batch size of 10.
Fig. 14 The effect of learning rate on the training and validation accuracy against epoch
Fig. 15 The testing evaluation accuracies comparison with different learning rate
Fig. 16 The effect of the number of epochs on the training and validation accuracy against epoch
Fig. 17 The testing evaluation accuracies comparison with different number of epochs
(V) Effect of Filter Number
In the initial model, only the first convolutional layer has 64 filters; the second
and third convolutional layers have 16 filters each. Figure 18 shows the results
when the second and third convolutional layers are also changed to 64 filters.
Changing only the second layer adds 48 filters, and changing both the second and
third layers adds 96 filters in total. With more filters, the model captures the
image features better, as seen from the validation accuracies lying close to the
training accuracies in both cases. The testing evaluation accuracy also improves,
from 99% to 100%, in both cases, as seen in Fig. 19. Therefore, the first 3
convolutional layers are set to 64 filters each, which gives the best model
architecture illustrated in Figs. 2 and 3.
(VI) Effect of Batch Size
Having determined the best model architecture, we now examine how the training
batch size affects the performance of the model, as seen in Fig. 20. As the batch
size increases, the model updates more slowly and its testing evaluation accuracy
decreases, as seen in Fig. 21: the accuracy drops to 0.97 and 0.98 for the larger
batch sizes. The reason is that a larger batch size reduces the number of parameter
updates per epoch. We therefore retain the batch size of 10 for training. The best
model now uses 30 epochs with a batch size of 10, a learning rate of 0.001, 3 dense
layers, 64 filters in each of the first 3 convolutional layers, and the Adagrad
optimizer.
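The link between batch size and the number of parameter updates is simple arithmetic. The sketch below assumes 800 training images (80% of the 1000-image dataset) and 30 epochs, and it counts a final partial batch as one update; the function name is hypothetical.

```python
import math


def updates_per_epoch(n_train, batch_size):
    """Number of weight updates in one pass over the training set."""
    return math.ceil(n_train / batch_size)


n_train = 800   # assumed: 80% of the 1000 images
epochs = 30

total_updates = {
    batch_size: updates_per_epoch(n_train, batch_size) * epochs
    for batch_size in (10, 20, 30, 40)   # batch sizes from Table 2
}
print(total_updates)  # {10: 2400, 20: 1200, 30: 810, 40: 600}
```

With a batch size of 10 the weights are updated 2400 times in 30 epochs, four times as often as with a batch size of 40, which gives the smaller batch size more opportunities to improve the model within the same number of epochs.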
Fig. 18 The effect of number of filters on the training and validation accuracy against epoch
Fig. 19 The testing evaluation accuracies comparison with different number of filters
Fig. 20 The effect of batch size on the training and validation accuracy against epoch
Fig. 21 The testing evaluation accuracies comparison with different batch size
Figure 22 depicts the predictions of the best model on the testing dataset, with
97% accuracy in the passion fruit classification. 3 misclassifications occur in the
Markisa Besar class, and the model predicts all the other classes correctly.
The best model shows 100% accuracy in the testing evaluation, but the actual
predicted accuracy on the testing dataset is 97%, with 3 of its 100 images
misclassified. We therefore believe that the testing accuracy can be increased by
feeding the model more, and more varied, Markisa Besar images.
Fig. 22 The best model predicted accuracy for Model 1 is 97%
4.2.2 Model 2
Figure 23 shows the summary of the second CNN model. According to the summary, we
need to train 38,838,816 parameters. We first trained the model on the training
data, then tuned the hyperparameters using the validation data, and finally tested
the accuracy of the model on the testing data. The hyperparameter tuning followed
the setup in Table 2. After tuning, the model has a test accuracy of 65% with the
following parameters:
• Optimizer = Adam
• Learning rate = 0.001
• Last filter number = 12
• Number of nodes in each dense layer = 1000
• Epochs = 50
• Batch size = 20.
The sections below show the effect of each hyperparameter on the model
performance.
Fig. 23 Summary of second CNN models
(I) Effect of Last Filter Size

To test the effect of the number of filters in the last block of the convolution
base on the model performance, we keep the other parameters constant:
• Number of nodes = 2000
• Optimizer = Adam
• Epochs = 30
• Batch size = 40.
The effect of the last filter number is shown in Fig. 24. A filter number of 12
gives the highest validation accuracy, 0.71. The model is also overfitted to the
training data: it classifies the training images perfectly, but its accuracy on the
validation set is only 0.71. In the following tuning steps, we keep the filter
number at 12.
(II) Effect of Number of Nodes in Each Dense Layer

To test the effect of the number of nodes in each dense layer on the model
performance, we keep the other parameters constant:
• Last filter number = 12
• Optimizer = Adam
• Epochs = 30
• Batch size = 40.
The effect of the number of nodes in each dense layer is shown in Fig. 25. 1000
nodes give the highest validation accuracy, 0.70. The model is again overfitted to
the training data: it classifies the training images perfectly, but its accuracy on
the validation set is only 0.70. In the following tuning steps, we keep the number
of nodes at 1000.
Fig. 24 Effect of the last filter size on model performance
(III) Effect of Optimizers
To test the effect of the optimizer on the model performance, we keep the other
parameters constant:
• Epochs = 30
• Batch size = 40.
The effect of the optimizers is shown in Fig. 26. The Adam optimizer gives the
highest validation accuracy, 0.80. The model is again overfitted to the training
data: it classifies the training images with 0.99 accuracy, but its accuracy on the
validation set is only 0.80. In the following tuning steps, we keep Adam as the
optimizer.
Fig. 25 Effect of the number of nodes on model performance
Fig. 26 Effect of optimizers on model performance
(IV) Effect of Batch Size

To test the effect of the batch size on the model performance, we keep the other
parameters constant:
• Epochs = 30
• Optimizer = Adam.
The effect of the batch size is shown in Fig. 27. A batch size of 20 gives the
highest validation accuracy, 0.58. The model is again overfitted to the training
data: it classifies the training images perfectly, but its accuracy on the
validation set is only 0.58. In the following tuning step, we keep the batch size
at 20.
(V) Effect of Epochs
To test the effect of the number of epochs on the model performance, we keep the
other parameters constant:
• Batch size = 20
• Optimizer = Adam.
The effect of the number of epochs is shown in Fig. 28. Training for 50 epochs
gives the highest validation accuracy, 0.65. The model is again overfitted to the
training data: it classifies the training images perfectly, but its accuracy on the
validation set is only 0.65.
Fig. 27 Effect of batch size on model performance
Fig. 28 Effect of epochs on model performance
4.3 Performance of Proposed Transfer Learning Model
4.3.1 VGG16
In this experiment, we add only 1 dense layer after the flatten layer and 1 dropout
layer before the output layer, while the base VGG16 convolutional layers are
frozen. Figure 29 illustrates the model architecture.
As stated in the previous section, we tune several parameters during model
training. From the parameter options, we trained 64 models with different
combinations of the parameters (see the Appendix). For the comparisons below, we
take the best model and vary one parameter at a time. The best accuracy achieved in
training is 0.97. Figure 30 shows the training and validation accuracy and loss
across epochs for the best model in this part, until training stopped when 99%
accuracy was reached.
(I) Effect on Optimizers
For the optimizer, we compare Adam and SGD with all other parameters equal. The
comparison between these 2 optimizers is shown in Table 4. The Adam optimizer
performs better than SGD with the same parameters.
Fig. 29 VGG-16 model architecture
Fig. 30 Training/validation accuracy and loss across different epochs
Table 4 Comparison between Adam and SGD optimizers
Same parameters                          Optimizer    Accuracy
Number of neurons in dense layer: 512    Adam         0.97
Dropout: 0.2                             SGD          0.75
Epochs: 20
Learning rate: 0.01
Batch size: 100
(II) Effect on Number of Neurons in Dense Layer

For the number of neurons in the dense layer, we try 2 different values, 512 and
1024. The result is shown in Table 5.
Table 5 Comparison between different number of neurons
Same parameters        No. of neurons in dense layer    Accuracy
Optimizer: Adam        512                              0.97
Dropout: 0.2           1024                             0.87
Epochs: 20
Learning rate: 0.01
Batch size: 100
The lower number of neurons performs better than the higher number, with an
accuracy difference of 0.10.
(III) Effect on Dropout
The dropout rates used in this experiment are 0.1 and 0.2. The result is shown
in Table 6.
Table 6 Comparison between different dropout rate
Same parameters                      Dropout    Accuracy
Optimizer: Adam                      0.2        0.97
No of neurons in dense layer: 512    0.1        0.81
Epochs: 20
Learning rate: 0.01
Batch size: 100
The result shows that the higher dropout rate performs better than the lower
dropout rate.
(IV) Effect on Learning Rate
Besides the dropout rate, we also test 2 different learning rates, 0.01 and 0.001.
The result is shown in Table 7.
Table 7 Comparison between different learning rate
Same parameters                      Learning rate    Accuracy
Optimizer: Adam                      0.01             0.97
No of neurons in Dense layer: 512    0.001            0.97
Epochs: 20
Dropout: 0.2
Batch size: 100
Changing the learning rate has no effect, as both values give the same accuracy.
(V) Effect on Batch Size
We also tested the effect of different batch sizes during model training. The
result is shown in Table 8.
The batch size makes only a small difference in accuracy, 0.03. It can
be concluded that batch size affects the performance only slightly.
(VI) Effect on Epochs
Lastly, we try two different numbers of epochs, 10 and 20. The result is shown in Table 9.
As with batch size, the number of epochs affects the accuracy only a little, with a
difference of 0.06.
In summary, the experiment on VGG16 transfer learning shows that the optimizer,
the dropout rate and the number of neurons in the dense layer must be chosen
carefully to get the best model. The learning rate, however, does not affect the
performance of the model, while the batch size and the number of epochs affect the
accuracy only a little.
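The one-at-a-time comparison in Tables 4 to 9 can be summarized with a short script. The sketch below is only an illustration: the accuracies are copied from the tables, while the names `single_changes` and `accuracy_drops` are our own and not part of the original experiment code.

```python
# Accuracies copied from Tables 4-9; the baseline configuration scored 0.97.
BASELINE_ACCURACY = 0.97

# hyperparameter -> (changed value, accuracy after changing only that value)
single_changes = {
    "optimizer": ("SGD", 0.75),
    "dense neurons": (1024, 0.87),
    "dropout": (0.1, 0.81),
    "learning rate": (0.001, 0.97),
    "batch size": (50, 0.94),
    "epochs": (10, 0.91),
}


def accuracy_drops(baseline, changes):
    """Return (name, drop) pairs sorted from the largest to the smallest drop."""
    drops = [(name, round(baseline - acc, 2)) for name, (_, acc) in changes.items()]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


ranking = accuracy_drops(BASELINE_ACCURACY, single_changes)
for name, drop in ranking:
    print(f"{name:14s} accuracy drop = {drop:.2f}")
```

The ranking puts the optimizer first and the learning rate last, which matches the summary above. Because each value comes from a single run, the differences should be read as indicative rather than statistically established.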
Table 8 Comparison between different batch size
Same parameters: Optimizer: Adam; No of neurons in dense layer: 512; Epochs: 20; Dropout: 0.2; Learning rate: 0.01
Batch size Accuracy
100 0.97
50 0.94
Table 9 Comparison between different epochs
Same parameters: Optimizer: Adam; No of neurons in dense layer: 512; Batch size: 100; Dropout: 0.2; Learning rate: 0.01
Epochs Accuracy
20 0.97
10 0.91
Fig. 31 Confusion matrix for VGG-16
After we obtain the best model with the best parameters, we apply it to classify
the testing dataset, which consists of 100 images with 25 images per label. The
results turned out quite good, as illustrated in Fig. 31.
Only 5 images of Markisa Manis are misclassified as Markisa Kuning, and the rest
are predicted correctly, which gives a testing accuracy of 95%. Markisa Kuning
therefore seems to be a dominant label. In future work, we can investigate why
images from other labels are misclassified as Markisa Kuning in this model, although
the misclassified images are the minority.
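The 95% figure follows directly from the confusion matrix. The sketch below rebuilds the matrix from the description above (25 images per label, 5 Markisa Manis images predicted as Markisa Kuning). The last two label names are placeholders of ours, since only two cultivars are named in this passage, and the helper functions are illustrative.

```python
# "Cultivar C" and "Cultivar D" are placeholder names for the two cultivars
# that are not named in this passage.
LABELS = ["Markisa Manis", "Markisa Kuning", "Cultivar C", "Cultivar D"]

# Rows are true labels, columns are predicted labels (25 test images per row).
confusion = [
    [20, 5, 0, 0],   # 5 Markisa Manis images predicted as Markisa Kuning
    [0, 25, 0, 0],
    [0, 0, 25, 0],
    [0, 0, 0, 25],
]


def overall_accuracy(matrix):
    """Fraction of all images that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Correct predictions of each class divided by its number of true images."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


accuracy = overall_accuracy(confusion)
recall = per_class_recall(confusion)
for label, value in zip(LABELS, recall):
    print(f"{label}: recall = {value:.2f}")
print(f"overall accuracy = {accuracy:.2f}")
```

Per-class recall makes the weakness visible: only Markisa Manis falls below 1.0, even though the overall accuracy is high.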
4.3.2 InceptionV3
In this experiment, we imported the InceptionV3 base model and omitted its top layers,
which consist of Dense layers and dropout layers. Then, the base model's weights and
biases were frozen to preserve the parameters learned in the previous training.
Next, a fully connected head was added, with one Dense layer using the ReLU
activation function and a dropout layer. Finally, the Sigmoid activation function is used
for the output layer with four classes representing the four cultivars of Markisa.
Figure 33 illustrates the model architecture. It is important to mention that two Dense
layers and two dropout layers are also used in the experiments. Figure 32 shows the
transfer learning through the Inception-V3 model.
Fig. 32 Transfer learning through Inception-V3 model
Fig. 33 Best performing models
As stated in the previous section, we specified different parameter tuning options
for our transfer learning model. To shorten the time needed to experiment with different
combinations of parameters and to automate the selection of the model with the best
accuracy, we conducted hyperparameter tuning with the HParams Dashboard [7]. Running
it resulted in 64 models (trials) for the connected layer with one Dense layer
and an additional 64 models (trials) for the connected layer with two Dense layers.
Comparing the accuracy results for all 128 models, we can conclude that the best
performing model has 93% accuracy (refer to Appendix).
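The count of 64 trials per head follows from six hyperparameters with two values each (2^6 = 64). The sketch below enumerates the grid in plain Python; it does not use the HParams Dashboard API, and the dictionary keys are our own names chosen to mirror the column headers in the Appendix.

```python
from itertools import product

# Two candidate values per hyperparameter, as listed in the experiments.
SEARCH_SPACE = {
    "num_units": [512, 1024],
    "dropout": [0.1, 0.2],
    "optimizer": ["adam", "sgd"],
    "epochs": [10, 20],
    "learning_rate": [0.01, 0.001],
    "batch_size": [50, 100],
}


def enumerate_trials(space):
    """Return one dict per combination of hyperparameter values."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


trials = enumerate_trials(SEARCH_SPACE)
total_models = len(trials) * 2  # one-Dense-layer head and two-Dense-layer head

print(len(trials), "trials per head,", total_models, "models in total")
```

Each dictionary in `trials` corresponds to one row of the Appendix tables, so a training function could be called once per entry and its accuracy logged for comparison.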
(I) Effect on Optimizers
For the optimizer, we limited our selection to Adam and SGD,
as shown in Fig. 34.
From Fig. 34, we can conclude that the Adam optimizer has resulted in overall
higher accuracy than the SGD optimizer in the one Dense layer experiment; it has
also contributed to the top-performing model.
From Fig. 35, we can conclude that the Adam optimizer has also resulted in
overall higher accuracy than the SGD optimizer in the two Dense layers experiments.
The result is not as good as in the experiment with one Dense layer; however, Adam
still contributed to the top-performing model.
(II) Effect on Number of Neurons in the Dense Layer
For the number of neurons in the Dense layer, we experimented with two different
values, 512 and 1024.
From Fig. 36, we can conclude that the number of neurons has not affected the
model accuracy with one Dense layer, as both 512 and 1024 neurons have produced
models with lower and with higher accuracy; other parameters probably have more
effect on the model's accuracy. However, 512 neurons have contributed to the
top-performing model.
Fig. 34 HParams scatter plot matrix view for one dense layer optimizer
Fig. 35 HParams scatter plot matrix view for two dense layers optimizer
From Fig. 37, the results are similar to the one Dense layer experiment; however, 512
neurons have contributed to both the top-performing and the lowest-performing model.
(III) Effect on Dropout
For Dropout rates, we experimented with the values 0.1 and 0.2.
From Fig. 38, we can conclude that the dropout value of 0.2 for one Dense layer
gives a better overall result; however, it did not produce the top-performing model,
and it produced the lowest-performing model.
Fig. 36 HParams scatter plot matrix view for one dense layers neurons
Fig. 37 HParams scatter plot matrix view for two dense layers neurons
From Fig. 39, we can conclude that a dropout value of 0.2 has resulted in a better
performing model for the two Dense layers than for the one Dense layer; higher
dropout values probably correlate with higher accuracy when multiple Dense layers are used.
(IV) Effect on Learning Rate
In addition to the dropout rate, we also tested two learning rates, 0.01 and 0.001.
From Fig. 40, we can conclude that the learning rate of 0.01 has
resulted in overall higher accuracy than 0.001 in the one Dense layer experiment; it
has also contributed to the top-performing model (Fig. 41).
Fig. 38 HParams scatter plot matrix view for one dense layer dropout
Fig. 39 HParams scatter plot matrix view for two dense layers dropout
Contrary to the one Dense layer experiment, the lower value of 0.001 has performed
better in the two Dense layers experiment; lower learning rates probably
correlate with higher accuracy when multiple Dense layers are used.
(V) Effect on Batch Size
We also tested the effect of different batch sizes during the model training.
From Fig. 42, we can conclude that the batch size has not affected the model accuracy
with one Dense layer. Both batch sizes, 50 and 100, have produced models with lower
and with higher accuracy; other parameters probably affect the model's
accuracy more. However, a batch size of 50 has contributed to the top-performing model.
Fig. 40 HParams scatter plot matrix view for one Dense layer learning rates
Fig. 41 HParams scatter plot matrix view for two Dense layers learning rates
From Fig. 43, the results are similar to the one Dense layer experiment; however,
a batch size of 100 has contributed to both the top-performing and the
lowest-performing model.
(VI) Effect on Epochs
For the number of Epochs used to train the models, we experimented with two
different values, 10 and 20.
Fig. 42 HParams scatter plot matrix view for one dense layer batch size
Fig. 43 HParams scatter plot matrix view for two dense layers batch sizes
From Fig. 44, we can conclude that the higher number of epochs has resulted in overall
higher accuracy than the lower number in the one Dense layer experiment; it has also
contributed to the top-performing model.
From Fig. 45, we can conclude that the higher number of epochs has also resulted in
higher accuracy than the lower value in the two Dense layers experiments. However,
although more epochs lead to very high training accuracy, too many epochs
will cause overfitting, and the validation accuracy will decrease
because the model will not generalize well.
Fig. 44 HParams scatter plot matrix view for one dense layer epochs
Fig. 45 HParams scatter plot matrix view for two dense layers epochs
(VII) Effect of the Dense Layer
Figure 46 shows that the model with one Dense layer has more consistent performance
than the two Dense layers model. However, the latter has a performance spike when
configured with specific attributes. In summary, from the experiments
on Inception-V3 transfer learning, we can conclude that the best performing model
is the model with the parameters shown in Fig. 47.
Finally, after finding the best performing model with the optimal parameters,
we test the model on a dataset consisting of 100 images with 25 images per
label. The results are illustrated in Fig. 48.
Fig. 46 Accuracy comparison between the different configurations of the dense layers
Fig. 47 Top model parameters
From Fig. 48, we can conclude that the best performing Inception-V3 transfer
learning model has low testing performance, with an average accuracy of 65.3%. The model
has failed to classify Markisa Manis.
For comparison, the exact same testing set is applied to the other deep learning
architectures; the results are shown in Table 10.
Fig. 48 Confusion matrix for Inception-V3
Table 10 Testing accuracy of all models
Model Accuracy (%)
Custom CNN 1 97
Custom CNN 2 65
VGG16 95
Inceptionv3 65
5 Conclusion
In this study, four different CNN models are created to classify the four
different types of Markisa fruit. Two custom CNN models are created, and two
transfer learning models are used with VGG16 and Inceptionv3 as the base models.
The classifier of each of the two transfer learning models is customized
and used to make the prediction. The results show that the first custom CNN model
achieves the highest accuracy, 97%, followed by the VGG16 transfer learning model
with an accuracy of 95%. The second custom CNN model and Inceptionv3 both
give the same testing accuracy of 65%. Consequently, the testing accuracy of a
custom CNN is comparable to that of a transfer learning model such as VGG16.
The architecture design is crucial in determining how well the model is able to capture
the features in the input dataset.
Appendix
Table 10 Result of parameter tuning in VGG16
Num_units Dropout Optimizer Epochs Learning rate Batch_size Accuracy

512 0.2 Adam 20 0.01 100 0.97
512 0.2 Adam 20 0.001 100 0.97
512 0.1 Adam 20 0.001 50 0.96
512 0.2 Adam 20 0.001 50 0.96
1024 0.2 Adam 10 0.001 50 0.96
512 0.1 SGD 20 0.001 100 0.95
1024 0.1 Adam 10 0.001 100 0.95
512 0.1 Adam 10 0.001 50 0.94
512 0.2 Adam 20 0.01 50 0.94
512 0.1 SGD 10 0.001 50 0.94
512 0.2 Adam 10 0.01 50 0.94
512 0.1 Adam 20 0.001 100 0.94
512 0.2 Adam 10 0.001 100 0.94
1024 0.2 Adam 10 0.01 100 0.94
1024 0.1 SGD 10 0.001 100 0.94
1024 0.1 SGD 20 0.001 100 0.94
1024 0.2 Adam 10 0.001 100 0.94
512 0.1 Adam 10 0.01 100 0.93
1024 0.2 Adam 20 0.01 50 0.93
512 0.2 SGD 20 0.001 50 0.92
1024 0.1 Adam 10 0.01 50 0.92
1024 0.1 Adam 20 0.001 50 0.92
512 0.2 Adam 10 0.01 100 0.91
1024 0.2 SGD 10 0.01 50 0.91
1024 0.2 Adam 20 0.001 100 0.9
1024 0.2 SGD 10 0.001 50 0.89
512 0.1 SGD 20 0.01 100 0.87
512 0.1 Adam 10 0.001 100 0.87
1024 0.1 Adam 20 0.01 50 0.87
1024 0.2 Adam 20 0.01 100 0.87
1024 0.1 Adam 10 0.01 100 0.87
1024 0.2 SGD 20 0.001 100 0.87
512 0.2 SGD 20 0.001 100 0.86
1024 0.1 Adam 20 0.01 100 0.86
512 0.1 SGD 10 0.001 100 0.85
1024 0.2 SGD 20 0.001 50 0.85
1024 0.2 SGD 20 0.01 50 0.85
512 0.1 SGD 10 0.01 100 0.84
1024 0.2 SGD 10 0.01 100 0.84
512 0.1 Adam 20 0.01 50 0.83
1024 0.2 SGD 20 0.01 100 0.83
1024 0.2 Adam 20 0.001 50 0.83
512 0.2 SGD 10 0.01 50 0.82
1024 0.1 SGD 20 0.01 50 0.82
1024 0.1 SGD 20 0.001 50 0.82
512 0.1 SGD 20 0.001 50 0.81
512 0.1 Adam 20 0.01 100 0.81
512 0.1 Adam 10 0.01 50 0.81
1024 0.2 SGD 10 0.001 100 0.81
1024 0.1 Adam 20 0.001 100 0.81
512 0.2 SGD 20 0.01 50 0.8
1024 0.1 SGD 10 0.001 50 0.8
1024 0.1 Adam 10 0.001 50 0.8
512 0.2 SGD 10 0.001 100 0.79
512 0.2 SGD 10 0.001 50 0.78
512 0.1 SGD 10 0.01 50 0.78
512 0.1 SGD 20 0.01 50 0.77
512 0.2 Adam 10 0.001 50 0.75
512 0.2 SGD 20 0.01 100 0.75
512 0.2 SGD 10 0.01 100 0.75
1024 0.1 SGD 10 0.01 50 0.75
1024 0.1 SGD 20 0.01 100 0.73
1024 0.2 Adam 10 0.01 50 0.73
1024 0.1 SGD 10 0.01 100 0.72
Table 10 Result of parameter tuning for Inception-V3 with one dense layer
Number of neurons Dropout rate Optimizer Epochs Learning rate Batch size Accuracy (%)
512 0.1 Adam 20 0.01 50 93
512 0.2 Adam 20 0.01 100 91
1024 0.1 Adam 10 0.001 50 90
1024 0.1 Adam 10 0.001 100 89
1024 0.1 Adam 20 0.01 50 89
512 0.1 Adam 20 0.001 50 89
512 0.2 Adam 10 0.001 50 89
512 0.1 Adam 10 0.001 50 89
1024 0.2 Adam 20 0.001 100 89
1024 0.2 Adam 20 0.01 50 89
1024 0.2 Adam 20 0.01 100 89
1024 0.1 Adam 20 0.001 100 89
512 0.2 Adam 20 0.001 100 89
1024 0.2 Adam 20 0.001 50 89
512 0.2 Adam 20 0.001 50 89
512 0.1 Adam 10 0.01 100 88
1024 0.2 Adam 10 0.001 50 88
1024 0.1 Adam 20 0.01 100 88
512 0.2 Adam 10 0.01 50 88
1024 0.1 Adam 10 0.01 100 88
512 0.1 Adam 20 0.01 100 88
512 0.1 Adam 10 0.01 50 88
1024 0.1 Adam 20 0.001 50 88
512 0.1 Adam 10 0.001 100 88
512 0.1 Adam 20 0.001 100 88
1024 0.1 Adam 10 0.01 50 87
1024 0.2 Adam 10 0.01 50 87
1024 0.2 Adam 10 0.001 100 87
512 0.2 Adam 10 0.01 100 86
1024 0.2 Adam 10 0.01 100 86
512 0.2 Adam 20 0.01 50 86
512 0.2 Adam 10 0.001 100 86
1024 0.2 SGD 20 0.001 100 90
1024 0.2 SGD 20 0.01 100 89
512 0.1 SGD 20 0.001 100 89
1024 0.1 SGD 20 0.01 50 89
1024 0.1 SGD 20 0.01 100 89
512 0.2 SGD 20 0.01 100 89
512 0.1 SGD 20 0.01 100 89
1024 0.2 SGD 10 0.01 50 89
1024 0.1 SGD 10 0.01 100 88
1024 0.2 SGD 10 0.01 100 88
1024 0.1 SGD 20 0.001 50 88
512 0.1 SGD 10 0.01 50 88
1024 0.2 SGD 20 0.01 50 88
512 0.2 SGD 20 0.001 100 87
512 0.2 SGD 10 0.01 50 87
1024 0.2 SGD 20 0.001 50 87
1024 0.1 SGD 10 0.01 50 87
512 0.1 SGD 20 0.01 50 87
512 0.1 SGD 10 0.01 100 87
512 0.1 SGD 20 0.001 50 87
512 0.2 SGD 10 0.001 100 87
1024 0.1 SGD 20 0.001 100 87
1024 0.1 SGD 10 0.001 50 86
1024 0.2 SGD 10 0.001 50 86
512 0.2 SGD 10 0.01 100 86
512 0.2 SGD 10 0.001 50 85
512 0.1 SGD 10 0.001 100 85
512 0.2 SGD 20 0.01 50 85
1024 0.1 SGD 10 0.001 100 84
512 0.2 SGD 20 0.001 50 84
512 0.1 SGD 10 0.001 50 83
1024 0.2 SGD 10 0.001 100 77
Table 10 Result of parameter tuning for Inception-V3 with two dense layers
Optimizer Learning rate Batch size Dropout rate Epochs Number of neurons Accuracy (%)
Adam 0.001 100 0.2 20 512 92
Adam 0.001 50 0.1 20 512 90
Adam 0.001 100 0.1 20 512 89
Adam 0.001 100 0.1 10 512 89
Adam 0.001 50 0.1 20 1024 89
Adam 0.001 50 0.2 20 1024 89
Adam 0.01 100 0.2 20 512 89
Adam 0.01 50 0.1 20 512 89
Adam 0.001 100 0.1 10 1024 89
SGD 0.001 50 0.1 20 1024 89
SGD 0.001 50 0.2 20 512 89
SGD 0.01 100 0.1 20 512 89
SGD 0.01 100 0.2 20 512 89
SGD 0.01 50 0.1 20 512 89
SGD 0.01 50 0.1 20 1024 89
SGD 0.01 50 0.1 10 1024 89
SGD 0.01 100 0.2 20 1024 89
Adam 0.001 100 0.2 10 1024 88
Adam 0.001 100 0.2 10 512 88
Adam 0.001 50 0.2 20 512 88
Adam 0.001 50 0.1 10 1024 88
Adam 0.001 100 0.1 20 1024 88
Adam 0.01 100 0.2 20 1024 88
Adam 0.001 100 0.2 20 1024 88
Adam 0.01 50 0.2 20 512 88
Adam 0.001 50 0.2 10 512 88
Adam 0.001 50 0.2 10 1024 88
Adam 0.01 100 0.1 10 512 88
SGD 0.01 50 0.2 20 1024 88
SGD 0.01 50 0.1 10 512 88
SGD 0.01 50 0.2 10 1024 88
SGD 0.01 50 0.2 20 512 88
SGD 0.01 100 0.1 20 1024 88
SGD 0.01 50 0.2 10 512 88
Adam 0.01 100 0.1 20 512 87
Adam 0.01 100 0.1 10 1024 87
Adam 0.01 50 0.2 20 1024 87
Adam 0.001 50 0.1 10 512 87
Adam 0.01 100 0.2 10 1024 87
SGD 0.01 100 0.2 10 512 87
SGD 0.001 50 0.1 20 512 87
SGD 0.01 100 0.2 10 1024 87
SGD 0.01 100 0.1 10 512 87
SGD 0.01 100 0.1 10 1024 87
Adam 0.01 100 0.2 10 512 86
SGD 0.001 50 0.1 10 1024 86
SGD 0.001 100 0.1 20 512 86
SGD 0.001 50 0.2 10 512 86
Adam 0.01 50 0.1 10 512 85
Adam 0.01 50 0.2 10 1024 85
Adam 0.01 100 0.1 20 1024 85
Adam 0.01 50 0.1 20 1024 85
SGD 0.001 100 0.1 20 1024 85
SGD 0.001 100 0.2 10 512 85
SGD 0.001 50 0.2 10 1024 85
SGD 0.001 100 0.2 20 512 85
SGD 0.001 100 0.2 20 1024 84
SGD 0.001 100 0.2 10 1024 84
SGD 0.001 100 0.1 10 1024 83
SGD 0.001 50 0.2 20 1024 83
Adam 0.01 50 0.1 10 1024 82
SGD 0.001 100 0.1 10 512 82
Adam 0.01 50 0.2 10 512 80
SGD 0.001 50 0.1 10 512 78
References
1. Passiflora edulis. July 1, 2021. [Online]. https://en.wikipedia.org/wiki/Passiflora_edulis

2. Abualigah, L. M. Q. (2019). Feature selection and enhanced krill herd algorithm for text
document clustering (pp. 1–165). Springer.
3. Dwivedi, R. (2020, December 4). Everything you should know about dropouts and batch
normalization in CNN. Analytics India Magazine.
https://analyticsindiamag.com/everything-you-should-know-about-dropouts-and-batchnormalization-in-cnn/
4. Khandelwal, R. (2019, January 10). L1 and L2 regularization—DataDrivenInvestor. Medium.
https://medium.datadriveninvestor.com/l1-l2-regularization-7f1b4fe948f2?gi=bccf46d4504a
5. Kumari, N., Bhatt, A. K., Dwivedi, R. K., & Belwal, R. (2019). Performance analysis of support
vector machine in defective and non defective mangoes classification.
6. Alhaj, Y. A., Dahou, A., Al-qaness, M. A., Abualigah, L., Abbasi, A. A., Almaweri, N. A. O.,
7. Hyperparameter tuning with the HParams dashboard. TensorFlow, April 8, 2021.
[Online]. Available: https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams.
Accessed June 5, 2021.
113609.
on Twitter sentiment analysis: architecture, classifications, and challenges. In Deep learning
13. O’Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Hernandez, G. V., Krpalkova,
L., Riordan, D., & Walsh, J. (2019, April). Deep learning vs. traditional computer vision. In
Science and information conference (pp. 128–144). Springer.
Engineering, 22, 1–13.
15. Palakodati, S. S. S., Chirra, V. R. R., Yakobu, D., & Bulla, S. (2020). Fresh and rotten fruits
classification using CNN and transfer learning. Revue d’Intelligence Artificielle, 34(5), 617–
622.
16. Kishore, M., Kulkarni, S., & Senthil Babu, K. (n.d.). Fruits and vegetables classification using
progressive resizing and transfer learning. Journal of University of Shanghai for Science and
Technology. Retrieved July 5, 2021, from
https://jusst.org/wp-content/uploads/2021/02/Fruits-and-Vegetables-Classification-using-Progressive-Resizing-and-Transfer-Learning-1.pdf
17. Pardede, J., Sitohang, B., Akbar, S., & Khodra, M. (2021). Implementation of transfer learning
using VGG16 on fruit ripeness detection. International Journal of Intelligent Systems and
Applications, 13(2), 52–61. https://doi.org/10.5815/ijisa.2021.02.04
18. Inceptionv3. Wikimedia Foundation, June 29, 2021. [Online].
https://en.wikipedia.org/wiki/Inceptionv3. Accessed July 5, 2021.
19. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception
architecture for computer vision. In Proceedings of the IEEE conference on computer vision
and pattern recognition, pp. 2818–2826.
20. Lin, C., Li, L., Luo, W., Wang, K. C., & Guo, J. (2019) Transfer learning based traffic sign
recognition using inception-v3 model. Periodica Polytechnica Transportation Engineering,
242–250.
Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Systems with
10, 16150–16177.
L., & Al-qaness, M. A. (2021). Social media toxicity classification using deep learning: Real-
27. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
Enhanced MapReduce Performance for the Distributed Parallel Computing: Application of the Big Data
Nathier Milhem, Laith Abualigah, Mohammad H. Nadimi-Shahraki,
Heming Jia, Absalom E. Ezugwu, and Abdelazim G. Hussien
Abstract In recent years, the growth in the volume of data has accelerated, and
this growth requires more storage. Big data and cloud computing have a huge number
of users, and these users need to access
data securely and privately from any device at any time. Therefore, it is important to
provide a safe flow of data in the Internet of Things (IoT record files) and to reduce its
size in a way that does not affect its purpose. One of the most important tasks
in data mining is the search for frequent (repeated) items inside storage locations.
The Apriori algorithm is the most common algorithm for finding sets of repeated
elements in data. This requires a step to delete groups of data that are repeated more than
N. Milhem · L. Abualigah (B)

School of Computer Sciences, Universiti Sains Malaysia, 11800, George Town, Pulau Pinang,
Malaysia
L. Abualigah
M. H. Nadimi-Shahraki
Faculty of Computer Engineering, Islamic Azad University, Najafabad Branch, 8514143131
Najafabad, Iran
Big Data Research Center, Islamic Azad University, Najafabad Branch, 8514143131 Najafabad,
Iran
Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia,
Brisbane 4006, Australia
H. Jia
Department of Information Engineering, Sanming University, Fujian 365004, China
A. E. Ezugwu
School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal,
KwaZulu-Natal, King Edward Road, Pietermaritzburg 3201, South Africa
A. G. Hussien
Department of Computer and Information Science, Linköping University, Linköping, Sweden
Faculty of Science, Fayoum University, Faiyum, Egypt
https://doi.org/10.1007/978-3-031-17576-3_8
192 N. Milhem et al.
once and create a number of new groups after deleting the repeated ones, which
frees storage space and increases processing speed. In
this paper, we implement the MapReduce Apriori (MRA) algorithm on an Apache
Hadoop cluster; it includes two functions (Map and Reduce) to find the repeated
sets of k elements.
Keywords Internet of Things (IoT) · Big Data · Hadoop · Map Reduce · Apriori
algorithms · Data mining
1 Introduction
Modern technology has become more complex, especially with the development of
Internet of Things devices. As a result, the volume of data has grown dramatically
over time, reaching a size and complexity so large that the data is difficult to
store and that tools to manage or process it with
high efficiency are lacking [1]. Internet of Things devices connected to distributed and cloud
infrastructure provide and transmit data and other resources for uploading to the
cloud. Therefore, it is important to ensure that data and resources are ready to be
accessed, that users are able to access them securely in any IoT environment, and
that they are distributed in an orderly manner with reduced volume [2].
Distributed and parallel computing systems are the best way to process data on a
large scale, and existing algorithms have been adapted into 'large algorithms'
to work with big data. MapReduce is a programming model for the parallel and
distributed processing of big data, and it is one of the best approaches to data
analysis in this field [3]. The Apriori algorithm is the most popular and
widely used algorithm in data mining; it mines sets of frequent elements using
candidate generation.
Apriori is the core algorithm of Association Rule Mining (ARM), and its genesis
has fueled research in data mining. Apriori is one of the top 10 data mining algorithms
identified by the IEEE International Conference on Data Mining (ICDM) in 2006
as the most influential in data mining [4]. It not only works to reduce large data
but is also concerned with characteristics such as speed and the movement
of varied data in many forms; big data is mainly characterized by large size, high
speed, and high variety. Traditional data mining techniques and tools are effective
in analyzing and extracting data but are not scalable or efficient in managing big data.
Big data architectures and technologies have therefore been adopted to analyze this data.
This study proposes an application of distributed parallel
computing to big data. It examines how big data collected from
the Internet of Things can be taken as input and simplified after processing
with Hadoop, and how effectively the repetition in big data can be reduced
while ensuring its quality in operations that include data collection and processing,
through algorithms that analyze the data and its results.
Enhanced MapReduce Performance for the Distributed … 193
2 Background
2.1 Big Data (BD)
The term “Big Data” covers the large volume, different forms, speed of processing,
technology, methods, and impact of digital data that is gathered from companies and
individuals [5–12]. Big Data is the information asset characterized by such a high
volume, velocity, and variety as to require specific technology and analytical methods
for its transformation into value [13, 14].
Volume: This feature represents the large amount of data that is generated or obtained
from various sources such as social media, banks, and the government and private
sectors. It keeps increasing, reaching more than 44 trillion GB by the year 2021.
Value: This feature concerns deriving value from data collected from different sources
by analyzing it and verifying it. The analysis yields insights
of interest to companies and businesses for growth and progress,
on the basis of which decisions and ideas can be taken in the future.
Veracity: This feature concerns the inconsistencies and uncertainties that exist in the data during
processing. Some data packets have to be removed.
Velocity: The rate at which data is accumulated. This property measures the data
generation rate, which grows with the increasing number of users and with access via IoT.
Variety: This feature deals with the different formats of data, including data coming from
IoT (images, video, JSON files, and social media). The three formats of data are
structured, unstructured, and semi-structured; Fig. 1 explains the formats of data.
Fig. 1 Big data classification

2.2 Hadoop
Hadoop is an open-source, Java-based software framework that processes large
data sets in a distributed computing environment. The Hadoop ecosystem
provides various services for solving big data
problems. It contains Apache projects and a set of tools and special solutions.
Hadoop has four major components, namely HDFS, MapReduce, YARN,
and Hadoop Common. The other tools support these key
components and are linked together to provide services such as data ingestion, analysis,
storage, and maintenance.
2.2.1 HDFS
The Hadoop Distributed File System (HDFS) is designed to hold very large amounts of
data (terabytes or even petabytes), such as data obtained from the Internet of Things,
and to provide access to this information. It stores files redundantly across storage
devices to tolerate failures and provide high availability.
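The idea of redundant storage can be illustrated with a toy model. The sketch below is not HDFS code and does not reproduce its actual placement policy; the round-robin placement, the node names, the 128 MB block size and the replication factor of 3 are all assumptions chosen for illustration.

```python
def place_blocks(file_size_mb, block_size_mb, nodes, replication):
    """Split a file into blocks and give each block `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Simple round-robin placement (an assumption, not the HDFS policy).
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a healthy node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


layout = place_blocks(300, 128, ["node1", "node2", "node3", "node4"], 3)
for block, replicas in layout.items():
    print(f"block {block}: {replicas}")
print("readable if node3 fails:", readable_after_failure(layout, "node3"))
```

With three replicas per block, the loss of any single node leaves every block readable, which is the property the paragraph above describes.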
2.2.2 Map Reduce
MapReduce is a programming model created to process big data by dividing
the work into a group of independent tasks that run in parallel, through two functions
[1]. The first is Map and the second is Reduce. Each does its own job:
the Map function emits intermediate results as key-value pairs, and the Reduce
function takes the output of Map and processes it to produce a collection of values
(Fig. 2).
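The two phases can be sketched in a few lines of single-process Python. This is only a simulation of the programming model, not Hadoop code; the function names and the sample records are our own. It counts how many records contain each item, which is also the first pass of frequent item mining.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit an (item, 1) pair for every distinct item in one record."""
    return [(item, 1) for item in sorted(set(record))]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single count."""
    return key, sum(values)


def run_job(records):
    emitted = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(emitted).items())


records = [["bread", "milk"], ["bread", "butter"], ["milk", "bread", "butter"]]
counts = run_job(records)
print(counts)
```

Because each call to `map_phase` depends on one record only and each call to `reduce_phase` on one key only, both phases can be distributed across machines without coordination between tasks.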
2.3 Apriori Algorithm
The Apriori algorithm works on sets of frequently occurring (repeated) elements in
order to establish the associations between them, and it is designed to work on big
data whose items are related to each other. It is iterative: the frequent sets of K
elements are used to generate the candidate sets of K + 1 elements, which helps
to determine the strength or weakness of the association between items. This
algorithm is widely used to compute frequent item sets efficiently.
The goal of this iterative process is to find the repeated data sets within a huge data
set. Some other optimization methods can be used to optimize the problems as given
in [15–20] (Fig. 3).
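The level-wise iteration can be sketched as follows. This is a minimal sequential version for illustration, assuming an absolute support count as the threshold; it is not the MapReduce implementation studied in this chapter, and the sample transactions are invented.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {frozenset([item]) for t in transactions for item in t}
    current = {c: n for c, n in count(items).items() if n >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        # Join step: unite frequent k-itemsets into candidates of size k + 1.
        keys = list(current)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k))
        }
        current = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(transactions, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Each pass of the loop rescans all transactions, which is the dependency between iterations that makes a naive distributed version expensive.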
Fig. 2 MapReduce programming model
Fig. 3 The APRIORI-algorithm
3 Related Work
Researchers and research communities have experimented with many
approaches to BD analytics. This part of the research presents the work conducted
and the results of each study.
MapReduce is used to scale algorithms for Big Data analytics. In the study in [3],
the focus was on reducing the volume of data through the development
of MapReduce and its integration with Apriori. These algorithms analyze
big data. The study focused on finding a solution to the problem
of scaling the “large algorithms” of the common correlation mining algorithm. The
results of this study confirm that an effective MapReduce implementation should
avoid dependent iterations, such as those of the original sequential Apriori algorithm.
In “Utility Frequent Patterns Mining on Large Scale Data based on Apriori MapReduce
Algorithm”, the main objective was to enhance pattern mining algorithms to
work on big data by proposing a set of algorithms based on the MapReduce architecture
and the Hadoop environment. The approach merged Apriori with MapReduce. The
results indicated a good performance [21].
In an effective implementation of the Apriori algorithm based on MapReduce on
Hadoop, a set of problems was addressed, such as load balancing, the mechanism for
dividing and distributing the data, monitoring, and passing
parameters between nodes [22]. Parallel and distributed computing has become one of the
most widespread and diversified fields, and what
distinguishes Hadoop is its scalability, simplicity,
and high reliability in solving most challenges and problems easily and effectively.
To determine the approach to a distributed parallel computing environment for big data
with MapReduce-based Apriori algorithms, the literature surveyed is presented in
Table 1 as a case study to highlight the challenges envisaged in effectively
implementing the MapReduce Apriori (MRA) algorithm on an Apache Hadoop cluster to
stream and process BD.
4 Methodology (Prescriptive Study)
4.1 Hadoop Architecture
The architecture of a Hadoop cluster, shown in Fig. 4, consists of a master and slaves: the
master is the Name Node and the slaves are Data Nodes. The Name Node daemon of
HDFS runs on the master and coordinates the Data Node daemons on the slaves. The job
submission node on the master runs the Job Tracker, which is the only point of contact
for a client wishing to execute a MapReduce job and which coordinates the Task
Trackers on the slaves. The Job Tracker monitors the progress
of running MapReduce tasks and is responsible for coordinating the execution of the
map and reduce tasks [14].
These services typically run on two different machines, although in a small cluster they are
often co-located. The bulk of a Hadoop cluster consists of slave nodes that run both a
Task Tracker, which is responsible for running the user code, and a Data Node daemon,
for serving HDFS data [13].
Table 1 Comparison of existing approaches used to handle the frequent elements to Apply efficient, validation, scalability and reliability
Author Year Objective Pros Cons
[23] 2010
Objective: Apache Mapreduce framework used to calculate achieve parallelism and find frequently element
Pros: A set of 9 machines (1 master and 8 slave), used data from IBM Corporation and the number of nodes was compared with the speed up through hadoop cluster
Cons: Ability of mapreduce with Apriori to give more advantage. It can applied is easily to many machines to deal with big data without synchronization problem
[24] 2011
Objective: Data mining is used new strategy of rules and focused in cloud computing environment and propose a method of big data set distribution
Pros: Used data set from Google, and the input data was divided into two groups: a first group consisting of a 16-MB and a second group consisting of a 64-MB and Experimental Between N of Node and Executions Times
Cons: The algorithm works in the cloud computing environment effectively and can extract the redundant set of data from the group data, through the mechanism of data segmentation and distributing data. The efficiency of the algorithm has been improved
[25] 2012
Objective: Propose new framework for work on big data on certain problems types of distributable using a huge N. of nodes to find scale well and efficiently
Pros: The data set experiments for an AllElectronics branch and framework used three stages: scaleup-sizeup-speedup
Cons: The experimental and result between three stage is actually more efficient can works a huge data
[26] 2013
Objective: A new model for mining dataset of frequent elements to Apply validation, scalability and reliability
Pros: Use 256 MB datasets and single machine to experiments Apriori and FP-Growth algorithm through running time with Data Size
Cons: Model has proven that the results in this method are feasible, valid and capable of improving the overall performance of the data mining operation on a large scale
[27] 2014
Objective: The algorithm works on big data processing and efficient data mining when it changes at the same time between threshold value and the original database at the level
Pros: Created program in Java and application on Intel computer processor 3.10 GHz i3-200 dual-core with 4 GB main memory and used Apriori and FP-Growth coupler algorithms analyzed by comparison (Dataset Size, Dataset Transactions)
Cons: The results proved that this algorithm is led to a higher acceleration and effective in reducing the frequent time of work,
[28] 2015
Objective: Discusses the use and implementation mechanism through mining big data for e-commerce companies and improving sales processes
Pros: A hadoop cluster is setup with 4 system nods (3-slave and 1-master name node) on Ubuntu 14.04
Cons: The inventory of the product of e-commerce companies can be updated based on the set of recurring items at regular intervals
[29] 2016 It focuses on taking the DESIGN is enhance of apriori Map-Hbase-Apriori can only
timestamp at each stage and implement on MR and HBASE once scan to finish the
considers it as a symbol in its on hadoop cluster, and compare database matching of the
transaction, and this is between apriori orginal and MH frequent element
considered appropriate for the apriori Linux with Hadoop
process of indexing data with 0.20.0. consist of 5 nodes, (1
its timestamp master-4 slave) dataset size is
1.8 GB form IBM
(continued)
Table 1 (continued)
Author Year Objective Pros Cons
[30] 2017 Enhance Apriori based on 350,000 records from 2007 to The results showed that the
Hadoop cluster on big data 2014 were obtained after data algorithm achieved high
applied of axle faults of EMUs preprocessing; and applied of accuracy in the error
Apriori based Hadoop cluster prediction process and speed
in the operation process
[31] 2018 Developed for MR approach A new algorithm, Apriori Core This algorithm works on any
base with Apriori algorithm MapReduce, is proposed to type of database
for recursive data mining and work on big data that takes less
works on any type of database time and memory than the
original algorithm
[32] 2019 Improving performance of The algorithm is implemented If the proposed is not work
iterative element set parallel on data set and market basket with mapreduce, the time for
mining using Hadoop with analyses exploration forces will
FP-Growth and Apriori decrease
comparison
[33] 2020 Create an algorithm based on Hadoop v1.2.1 used data size Number of frequent cases
mining effectively on the real 400 GB by AIS Global for two decreases rapidly, but They
data set that works in parallel months (4–5) was used for the considered the size of the data
and To split the original data year 2012 and experimental data to be small compared to the
set through three stage consist first experiment, and we need more
stage Calculating the partition data for comparison
number N and second stage
Determining partition
boundaries and third stage
includes Partitioning the data set
Fig. 4 Hadoop: master/slave architecture

Fig. 5 MapReduce in Hadoop
4.2 MR Programming Model
The MapReduce computing model (Fig. 5) includes two functions, Map() and Reduce().
Both functions are defined over (key; value) pairs. The Map function is applied to each
(key1; value1) pair in the input dataset and produces a list of (key2; value2) pairs. All
(key2; value2) pairs in the output lists that share the same key are passed to the Reduce()
function, which generates one value (value3) or an empty return.
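The data flow described above can be sketched on a single machine as follows. This is only a minimal illustration, not Hadoop code: the names run_mapreduce, count_items_map and sum_reduce, and the small transaction list, are assumptions made for this example.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Apply map_fn to every (key1, value1) pair, group the emitted
    (key2, value2) pairs by key2, then call reduce_fn once per key2."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in map_fn(key1, value1):
            groups[key2].append(value2)
    results = {}
    for key2, values in groups.items():
        value3 = reduce_fn(key2, values)
        if value3 is not None:  # None models the "empty return"
            results[key2] = value3
    return results

def count_items_map(transaction_id, items):
    """Map: emit (item, 1) for every item in one transaction."""
    for item in items:
        yield item, 1

def sum_reduce(item, counts):
    """Reduce: add up all the counts that share the same key."""
    return sum(counts)

# Hypothetical input: (transaction id, list of items)
transactions = [(1, ["bread", "milk"]), (2, ["bread", "butter"]), (3, ["milk"])]
item_counts = run_mapreduce(transactions, count_items_map, sum_reduce)
print(item_counts)  # {'bread': 2, 'milk': 2, 'butter': 1}
```

In a real cluster the grouping step is the shuffle performed by the framework between the map and reduce tasks; here it is simulated with an in-memory dictionary.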
4.3 Apriori Algorithm
The Apriori algorithm uses an iterative method called layer-by-layer search, whereby
the set of frequent k-itemsets is used to explore the (k + 1)-itemsets. First, the database
is scanned, the occurrences of each item are counted, and the items that meet the
minimum support are collected to form the set of frequent 1-itemsets, referred to as L1.
Then L1 is used to find L2, the set of frequent 2-itemsets, L2 is used to find L3, and so on,
until no more frequent k-itemsets can be found. Finding each Lk requires a full database
scan. The Apriori algorithm uses the Apriori property of frequent itemsets (every subset
of a frequent itemset must itself be frequent) to compress the search space.
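A minimal single-machine sketch of this layer-by-layer search is given below. It assumes that min_support is an absolute count and that transactions fit in memory; the function name apriori and the sample baskets are illustrative only.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # First scan: count single items and keep those meeting min_support (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 1
    while current:
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # One full database scan to count the surviving candidates (gives Lk).
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
    return frequent

# Hypothetical market-basket data
baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"}, {"milk"}]
result = apriori(baskets, min_support=2)
```

With these baskets the search stops at k = 3, because the only 3-item candidate contains the infrequent pair {milk, butter} and is pruned before any counting.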
5 Result and Discussion (Proposed Framework)
This section presents the proposed framework for enhancing the performance of
distributed parallel computing for big data in a cloud computing environment.
Figure 6 shows how the frame processes the data.
This section explains the frame architecture and its basic concepts in the context of the
MapReduce-based Apriori algorithm. We present a high-level abstract architecture
for finding the frequent itemsets of a data set. The frame consists of the following parts:
Logs file: The colored group is data collected from the Internet of Things and stored
in HDFS in the Hadoop cluster, where this data is unordered.
MAP—Apriori: The data is fetched from HDFS in the form of (K, V) pairs, which are
arranged according to the type of data. This process can be repeated more than once
because it handles frequent items.
Reduce—Apriori: The inputs of this process are groups of (K, V) pairs of the same
kind coming from MAP—Apriori in the multiple layers; the main task is to collect
the similar values (K, Vn).
Output: At this stage, the data is verified before it is stored in HDFS Output; all
frequent values (K1, V1) are fed back to MAP—Apriori until they are completely
collected.
Fig. 6 MP-A frame
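The MAP—Apriori and Reduce—Apriori stages of the frame can be illustrated with the following single-machine sketch, in which each pass over the log data counts the k-itemsets and the pass is repeated for increasing k. It is a simplified assumption of how such a frame might work, not the implementation behind Fig. 6: the function names and the sample logs are hypothetical, and no candidate pruning is performed between passes.

```python
from collections import defaultdict
from itertools import combinations

def map_apriori(transaction, k):
    """Map step: emit (itemset, 1) for every k-item subset of one record."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1

def reduce_apriori(itemset, counts, min_support):
    """Reduce step: sum the counts of one key; None if below min_support."""
    total = sum(counts)
    return total if total >= min_support else None

def mapreduce_pass(transactions, k, min_support):
    """One map, shuffle and reduce pass that finds the frequent k-itemsets."""
    grouped = defaultdict(list)  # simulates the shuffle between map and reduce
    for transaction in transactions:
        for key, value in map_apriori(transaction, k):
            grouped[key].append(value)
    output = {}
    for key, values in grouped.items():
        support = reduce_apriori(key, values, min_support)
        if support is not None:
            output[key] = support
    return output

def frequent_itemsets(transactions, min_support):
    """Repeat the pass for k = 1, 2, ... until a pass finds nothing frequent."""
    all_frequent = {}
    k = 1
    while True:
        level = mapreduce_pass(transactions, k, min_support)
        if not level:
            break
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical unordered log records
logs = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
found = frequent_itemsets(logs, min_support=2)
```

Enumerating every k-subset in the mapper is only practical for short records; a production job would normally broadcast the candidates derived from the previous level to the mappers instead.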

6 Conclusion
In this paper, we have proposed a new frame for efficiently mining frequent patterns
in big data, applying algorithms that are effective and valid for reducing repetition in
big data and ensuring its quality in operations, through the MapReduce-based Apriori
algorithm on a Hadoop cluster. The practical studies related to this field were compared
with each other, and the results show that the approach is widely effective in the field of
data mining. Having compared these studies and verified the effectiveness of the
algorithms in giving reliable results in this field, we will apply them to neural network
based deep learning, especially since it has been run on MapReduce in different
studies.
References
1. Altaf, M. A. B., Barapatre, H. K., & Sangvi, A. Mining condensed representations of frequent
patterns on big data using max Apriori map reducing technique.
2. Apache Hadoop. http://hadoop.apache.org/
3. Kijsanayothin, P., Chalumporn, G., & Hewett, R. (2019). On using MapReduce to scale
algorithms for big data analytics: A case study. Journal of Big Data, 6, 105.
https://doi.org/10.1186/s40537-019-0269-1
4. Singh, S., Garg, R., & Mishra, P. K. (2018). Performance optimization of MapReduce-based
Apriori algorithm on Hadoop cluster. Computers and Electrical Engineering, 67, 348–364.
ISSN 0045-7906.
6. Gandomi, A. H., Chen, F., & Abualigah, L. (2022). Machine learning technologies for big data
analytics. Electronics, 11(3), 421.
7. Bashabsheh, M. Q., Abualigah, L., & Alshinwan, M. (2022). Big data analysis using
hybrid meta-heuristic optimization algorithm and MapReduce framework. In Integrating
meta-heuristics and machine learning for real-world optimization problems (pp. 181–223).
Springer.
8. Gharaibeh, M., Almahmoud, M., Ali, M. Z., Al-Badarneh, A., El-Heis, M., Abualigah, L.,
Altalhi, M., Alaiad, A., & Gandomi, A. H. (2021). Early diagnosis of Alzheimer’s disease using
cerebral catheter angiogram neuroimaging: A novel model based on deep learning approaches.
Big Data and Cognitive Computing, 6(1), 2.
9. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for Big Data
applications in IoT cloud computing environments. Cluster Computing, 24(4), 2957–2976.
10. Abualigah, L., Gandomi, A. H., Elaziz, M. A., Hamad, H. A., Omari, M., Alshinwan, M., &
Khasawneh, A. M. (2021). Advances in meta-heuristic optimization algorithms in big data text
clustering. Electronics, 10(2), 101.
11. Abualigah, L., & Masri, B. A. (2021). Advances in MapReduce big data processing: platform,
tools, and algorithms. In Artificial intelligence and IoT (pp. 105–128).
12. Al-Sai, Z. A., & Abualigah, L. M. (2017, May). Big data and e-government: A review. In 2017
8th International conference on information technology (ICIT) (pp. 580–587). IEEE.
13. Kumar, A., Kiran, M., Mukherjee, S., & Ravi Prakash G. (2013). Verification and validation of
MapReduce program model for parallel K-means algorithm on Hadoop cluster. International
Journal of Computer Applications 72(8). (0975-8887).
14. Qayyum, R. (2020). A roadmap towards big data opportunities, emerging issues and Hadoop
as a solution. International Journal of Education and Management Engineering, 10, 8–17.
https://doi.org/10.5815/ijeme.2020.04.02
21. Nandini, G. V. S., & Rao, N. K. K. (2019). Utility frequent patterns mining on large scale
data based on Apriori MapReduce algorithm. International Journal of Research in Informative
Science Application and Techniques (IJRISAT), 3(8), 19381–19387.
22. Yahya, A. A., & Osman, A. (2019). Using data mining techniques to guide academic programs
design and assessment. Procedia Computer Science, 163, 472–481. ISSN 1877-0509.
23. Yang, X. Y., Liu, Z., & Fu, Y. (2010). MapReduce as a programming model for association
rules algorithm on Hadoop. In The 3rd international conference on information sciences and
interaction sciences (pp. 99–102). https://doi.org/10.1109/ICICIS.2010.5534718
24. Li, L., & Zhang, M. (2011). The strategy of mining association rule based on cloud computing.
In 2011 International conference on business computing and global informatization
(pp. 475–478). https://doi.org/10.1109/BCGIn.2011.125
25. Li, N., Zeng, L., He, Q., & Shi, Z. (2012). Parallel implementation of Apriori algorithm based on
MapReduce. In 2012 13th ACIS international conference on software engineering, artificial
intelligence, networking and parallel/distributed computing (pp. 236–241).
https://doi.org/10.1109/SNPD.2012.31
26. Rong, Z., Xia, D., & Zhang, Z. (2013). Complex statistical analysis of big data: Implementation
and application of Apriori and FP-Growth algorithm based on MapReduce. In 2013 IEEE 4th
international conference on software engineering and service science (pp. 968–972).
https://doi.org/10.1109/ICSESS.2013.6615467
27. Wei, X., Ma, Y., Zhang, F., Liu, M., & Shen, W. (2014). Incremental FP-Growth mining
strategy for dynamic threshold value and database based on MapReduce. In Proceedings of the
2014 IEEE 18th international conference on computer supported cooperative work in design
(CSCWD) (pp. 271–276). https://doi.org/10.1109/CSCWD.2014.6846854
28. Chaudhary, H., Yadav, D. K., Bhatnagar, R., & Chandrasekhar, U. (2015). MapReduce
based frequent itemset mining algorithm on stream data. In 2015 Global conference on
communication technologies (GCCT) (pp. 598–603).
https://doi.org/10.1109/GCCT.2015.7342732
29. Feng, D., Zhu, L., & Zhang, L. (2016). Research on improved Apriori algorithm based on
MapReduce and HBase. In 2016 IEEE advanced information management, communicates,
electronic and automation control conference (IMCEC) (pp. 887–891).
https://doi.org/10.1109/IMCEC.2016.7867338
30. Li, L., Shi, T., & Zhang, W. (2017). Axle fault prognostics of electric multiple units based on
improved Apriori algorithm. In 2017 29th Chinese control and decision conference (CCDC)
(pp. 4229–4233). https://doi.org/10.1109/CCDC.2017.7979241
31. Pandey, K. K., & Shukla, D. (2018). Mining on relationships in big data era using improve apriori
algorithm with MapReduce approach. In 2018 International conference on advanced
computation and telecommunication (ICACAT) (pp. 1–5).
https://doi.org/10.1109/ICACAT.2018.8933674
32. Deshmukh, R. A., Bharathi, H. N., & Tripathy, A. K. (2019). Parallel processing of frequent
itemset based on MapReduce programming model. In 2019 5th International conference on
computing, communication, control and automation (ICCUBEA) (pp. 1–6).
https://doi.org/10.1109/ICCUBEA47591.2019.9128369
33. Lei, B. (2020). Apriori-based spatial pattern mining algorithm for big data. In 2020
International conference on urban engineering and management science (ICUEMS)
(pp. 310–313). https://doi.org/10.1109/ICUEMS50872.2020.00074
A Novel Big Data Classification
Technique for Healthcare Application
Using Support Vector Machine, Random
Forest and J48
Hitham Al-Manaseer, Laith Abualigah, Anas Ratib Alsoud, Raed Abu Zitar,
Abstract In this study, the possibility of using and applying the capabilities of
artificial intelligence (AI) and machine learning (ML) to increase the effective-
ness of Internet of Things (IoT) and big data in developing a system that supports
decision makers in the medical fields was studied. This was done by studying the
performance of three well-known classification algorithms Random Forest Classi-
fier (RFC), Support Vector Machine (SVM), and Decision Tree-J48 (J48), to predict
the probability of heart attack. The accuracy of the algorithms was
evaluated using the Healthcare (heart attack possibility) dataset, freely available on
Kaggle. The data was divided into three categories consisting of (303, 909, 1818)
instances, which were analyzed on the WEKA platform. The results showed that the
RFC was the best performer.
Keywords Big data · Internet of Things · Random forest classifier · J48 · Support
vector machine · Weka · E-Health
H. Al-Manaseer · L. Abualigah (B)

Malaysia
Jordan
L. Abualigah
R. A. Zitar
Arab Emirates
A. E. Ezugwu
KwaZulu-Natal, King Edward Road, Pietermaritzburg 3201, South Africa
H. Jia
https://doi.org/10.1007/978-3-031-17576-3_9
1 Introduction
In the current era, communication over the Internet has become widespread between many
things, such as computers, large web servers, and smart devices. This form of connectivity
is known as the Internet of Things (IoT) [1]. The IoT is characterized by its massive
structure and complexity, and represents a second generation of the Internet, possibly
with trillions of interconnected points. The use of the IoT is expected to bring high
economic benefit to various sectors, because it enhances the potential for production
and innovation [2–5]. It has brought about tremendous and unprecedented changes that
have helped reduce costs, improve efficiency, and increase revenues, and that have led
to the generation of a huge volume of data. Figure 1 illustrates the IoT concept.
The current technological revolution has resulted in the generation of large
amounts of data [6–9], and the massive development of the IoT has added to it.
This data is called “big data”: a wide range of data that needs new structures and
technologies to capture, manage, and process it in order to extract value that enhances
insight and decision making [10]. Big data has many characteristics, such as large size
(volume), high speed (velocity), high diversity (variety), and high accuracy (veracity)
[11, 12]. Advances in healthcare data management systems have generated large
amounts of medical data, and learning from such labelled data is classified as
supervised learning. Analysis and classification methods can be used in big data science
and data mining to enhance the effectiveness of the IoT and meet the challenges it faces,
such as the storage, transport, and processing of large volumes of data.
One of the problems facing big data science is the classification issue. If the dataset
contains many dimensions, the classification process slows down. However, care must
be taken in choosing a method for extracting the desired features from the dataset’s
feature set, as feature extraction leads to the loss of part of the dataset’s data [13, 14].
The main benefit of selecting specific features and ignoring unnecessary ones is to
reduce data volumes and improve classification/prediction accuracy [15]. Classification
is one of the most widely applied methods in data mining, as it uses a set of previously
classified examples to build a model that can be used in several applications, such as
IoT E-Health systems.
Fig. 1 IoT concept
Data mining is defined as the process of extracting data from a data set,
discovering useful information in it, and then analyzing the collected data in order
to enhance decision making. Data mining uses different algorithms
and seeks to reveal specific features of data [16].
This study aims to apply data mining techniques in IoT E-health systems, in particular
to study the Health care (heart attack possibility) dataset and the practical feasibility
of these techniques in IoT E-health. There are various ways to use the principles of data
mining to create smart E-Health systems for the IoT. As a case study, a technologically
scalable study of the healthcare dataset was developed using free, open source software,
namely WEKA (Waikato Environment for Knowledge Analysis). The study also aims to
compare the accuracy of the Random Forest Classifier (RFC), Support Vector Machine
(SVM), and Decision Tree-J48 (J48) algorithms in classifying and analyzing medical data.
The main benefits of using healthcare data mining are:
• Predicting the patient’s likelihood of having a heart attack.
• Helping decision makers (i.e. health care workers) to make decisions related to
disease cases.
• Reducing the rate of medical errors, as the data mining techniques used in this
study predict the possibility of a heart attack in advance.
The rest of this paper is organized as follows. Section 2 reviews the literature.
Section 3 describes the methodology. Section 4 presents the proposed method.
Section 5 reports the experiments and results. Finally, Sect. 6 gives the conclusions.
2 Literature Review
There is currently a wide body of literature covering a range of techniques that can
be used as an integral part of big data and the IoT. This section surveys the main
methods used in this field.
Lakshmanaprabu et al. [17] used a Random Forest Classifier (RFC) and the
MapReduce process to develop a technique based on big data analysis in an IoT-based
healthcare system. E-Health data collected from patients with various diseases was
used for the analysis. To obtain the best classification, the optimal features were
selected from the dataset using the Improved Dragonfly Algorithm (IDA). The RFC
was then used to classify the E-Health data using the selected features. The
proposed technique outperformed other classification methods such as the Gaussian
mixture model and logistic regression, with a reported maximum of 94.2% precision
and 89.99% recall. Various performance measures were analyzed and compared with
existing methods in order to verify the efficiency of the proposed method. The
limitation of the proposed technique is that it is computationally slow on large
datasets. Other optimization methods, such as those given in [18–23], can be used to
address such problems.
Cervantes et al. [24] conducted a comprehensive survey of SVM classification
covering applications, challenges, and trends. It includes a brief introduction to
SVMs, a description of their many applications, and a summary of their challenges and
trends. The authors examined and defined the limitations of SVMs and discussed the
future of SVM in conjunction with more applications. They also described in detail the
major flaws of SVM and the various algorithms implemented to address them, based on
the work of researchers who encountered these flaws.
Jain et al. [25] linked Apache Hadoop to Weka. The big data is stored on the Hadoop
distributed file system (HDFS) and processed with Weka using Weka’s Knowledge
Flow. Knowledge Flow provides a good way to build topologies using HDFS components
that supply data to the machine learning algorithms available in Weka. The supervised
machine learning methods used for big data mining were Naïve Bayes, SVM and J48.
The accuracy of these methods was compared on raw data and normalized data for the
same structure. A new approach to big data mining was proposed that gives better
results than the reference approach: the accuracy of classifying the raw data sets was
increased, and applying normalization to the raw dataset was found to improve the
accuracy after supervised estimation of the dataset.
Siou-Wei et al. [26] used an SVM to classify and process data into three classes:
healthy, unhealthy, and very unhealthy. The physiological parameters of the test
subject and the classification results were uploaded to cloud storage and rendered on a
web page in order to provide the basis for big data analysis in future research. All
biomedical units equipped with wireless sensor network chips can collect and process
the measured data and then transmit it to the cloud server via the wireless network for
storage and analysis.
Li et al. [27] presented a comprehensive survey of the use of big data science and
data mining methods in the IoT, aiming to identify the topics that should receive more
attention in current and future research. They followed conference articles and journal
publications on IoT big data and IoT data mining between 2010 and 2017. Articles were
screened using the literature review set, and methodological maps of 44 articles were
produced. These articles fall into three classes: architecture and platform, framework,
and application.
3 Methodology
This part describes the methodology used to analyze big data in IoT E-Health systems,
using some modeling procedures. The analysis uses the Health care (heart attack
possibility) dataset for training and testing purposes.
IoT data is used to assess the performance of systems, infrastructure, and IoT objects.
IoT objects contain data produced by interactions between people and people, people
and systems, and systems and systems. This data can be used to improve the services
provided by the IoT. Using big data science, all health centers, regardless of where
testing is conducted, have access to each patient’s information, and test results are
stored at the time the test is made, allowing appropriate decisions to be made from the
moment the patient is tested.
Fig. 2 Big data model in the IoT [17]
Extracting specific data from big data, as well as extracting any data from smart
data, are thorny problems that can be solved through data mining techniques.
Therefore, different models can be used to extract data. Figure 2 illustrates a model of
big data in the IoT [17]. The dataset on IoT objects and infrastructure includes detailed
information about healthcare data such as patient age, sex, etc. Health
data were classified using the RFC, SVM and J48.
A. Random Forest Classifier (RFC)
The RFC is an ensemble of classification trees. It performs well on many
practical problems because it is not sensitive to noise in the data set and is
resistant to overfitting. It combines the predictions of many trees, each trained
independently of the rest [28]. The RFC draws a random sample of the data and
selects a key subset of attributes in order to grow the decision trees.
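The idea of training many trees independently on random samples and combining their predictions by voting can be sketched as follows. This is a deliberately simplified illustration and not the WEKA implementation used in the study: each "tree" is a one-split decision stump, and the function names, parameters and toy data are all assumptions made for this example.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature_indices):
    """Fit a one-split tree: pick the (feature, threshold) with fewest errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda row: left_label if row[f] <= threshold else right_label

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # with replacement
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], features))
    return forest

def predict(forest, row):
    """Combine the trees' predictions by majority vote."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy two-feature data (invented values), classes 0 and 1
rows = [(1, 10), (2, 20), (3, 30), (7, 70), (8, 80), (9, 90)]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels)
```

A full random forest grows deep trees and re-samples the candidate features at every split; the stump version keeps only the two ingredients named in the text, random sampling and vote aggregation.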
B. Support Vector Machine (SVM)
SVM is regarded as one of the best techniques for prediction [29]. It was
introduced by Vapnik as a machine learning model that performs classification
and regression. Owing to its generalization, optimization, and discriminative
power, SVM has been used in data mining, machine learning, and pattern
recognition in recent years. SVM is used extensively to solve binary
classification problems, and SVMs have been reported to outperform other
supervised machine learning methods [30]. In recent years SVMs have become
one of the most widely used classification methods owing to their sound
theoretical foundations and good generalization [24].
4 The Proposed Method
This section describes the approach chosen to develop data mining techniques,
focusing on analyzing data and discovering the principles by which health information
can be provided to the patient and heart disease can be predicted.
C. Case Study
Archived historical data was used. The data set consists of 76 attributes, but all
published experiments use a subset of 14 of them. In particular, the Cleveland
dataset is the only one that machine learning (ML) researchers have used so far.
The “target” field expresses whether the patient has heart disease: the integer
value 0 indicates no or a lower chance of having a heart attack, and the value 1
indicates a higher chance. The data set is freely available on the Kaggle
website. Table 1 shows the full list of attributes [31].
Depending on the composition of the data set, a mechanism for preparing the data
and extracting knowledge from it was hypothesized. After validation through the case
study, the approach is applicable and feasible in many analyses of patients’ E-health
data [17]. The objective of the approach is to build an analytical model that produces
a set of decisions for use as a decision support system for E-Health. Figure 3 shows a
flowchart illustrating the proposed approach.
Table 1 Descriptions of the dataset attributes for heart attack [31]
No. Attribute Description
1 age Age
2 sex Sex
3 cp Chest pain type (4 values)
4 trestbps Resting blood pressure
5 chol Serum cholesterol in mg/dl
6 fbs Fasting blood sugar > 120 mg/dl
7 restecg Resting electrocardiographic results (values 0, 1, 2)
8 thalach Maximum heart rate achieved
9 exang Exercise-induced angina
10 oldpeak ST depression induced by exercise relative to rest
11 slope The slope of the peak exercise ST segment
12 ca Number of major vessels (0–3) colored by fluoroscopy
13 thal 0 = normal; 1 = fixed defect; 2 = reversible defect
14 target 0 = low probability of having a heart attack; 1 = high probability of having a heart attack
Fig. 3 The correctly classified instances using cross validation
The WEKA data mining software was used to implement the proposed system.
WEKA is free, open source software: a collection of ML methods for solving
real-world data mining problems, developed in Java and running on almost any
platform. It is an analytical tool that applies data mining approaches to datasets.
Although there are several supported, professional data mining software packages,
WEKA provides many advantages: it is open source, downloadable, fast, easy to use
and access, easy to implement, and free of charge [32, 33].
In this study, data stored in comma-separated values (CSV) form were used.
The target attribute was chosen as the class attribute of the experiment. The resulting
set of rules is then used by decision makers in health centers as a decision support
system, which provides them with information to predict the possibility of a heart attack.
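As an illustration of this data layout, the sketch below reads CSV rows with the 14 column names of Table 1 and separates the class attribute from the remaining attributes. The study itself loaded the file into WEKA; the function name load_instances is hypothetical, and the two sample rows are made-up placeholder values, not records from the real dataset.

```python
import csv
import io

# Two invented rows using the 14 column names of Table 1 (not real patient data).
SAMPLE_CSV = """age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
50,1,2,130,240,0,1,160,0,1.2,2,0,2,1
60,0,0,140,260,1,0,120,1,2.5,1,2,1,0
"""

def load_instances(csv_file, class_attribute="target"):
    """Split each CSV row into numeric features and the class label."""
    features, classes = [], []
    for row in csv.DictReader(csv_file):
        classes.append(int(row.pop(class_attribute)))
        features.append({name: float(value) for name, value in row.items()})
    return features, classes

features, classes = load_instances(io.StringIO(SAMPLE_CSV))
```

To read an actual file, the io.StringIO object would be replaced by an opened file handle, for example open(path, newline="").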
5 Experiments and Results
As a result of increasing computing power and the massive amount of data currently
available, machine learning algorithms are becoming increasingly complex and more
powerful [34]. In this study, three types of classification algorithms are tested: SVM,
RFC, and J48.
Table 2 Correctly classified instances (%) using cross validation
Algorithms T303 T909 T1818
SVM 47.4 96 100
RF 84.2 98.7 100
J48 82.9 92.5 99.1
Determining the optimal size of the dataset is essential, as too many or too few
cases can lead to imprecise models [32]. For this reason, the Health care (Heart attack
possibility) dataset was divided into three categories, the first consisting of 303
instances, the second of 909 instances, and the third of 1818 instances. The SVM,
RFC, and J48 algorithms were run and evaluated with tenfold cross-validation.
Cross-validation can suffer from overfitting when the data being tested is the same
as the data used in training, which means the model often learns and retains patterns
within this dataset [34]. Therefore, a second evaluation mechanism was used, based
on an isolated test set consisting of 25% of the total dataset for each of the three
categories, which was used to evaluate the algorithms.
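The two evaluation schemes can be sketched as index-splitting routines. This is a generic illustration rather than WEKA's own procedure: the function names, the shuffling seed and the rounding of the 25% test size are assumptions made for this example.

```python
import random

def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each instance is tested exactly once."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def holdout_split(n_instances, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with an isolated test set."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_test = round(n_instances * test_fraction)
    return indices[n_test:], indices[:n_test]

# Sizes for the smallest category of 303 instances
folds = list(k_fold_indices(303, k=10))
train_idx, test_idx = holdout_split(303, test_fraction=0.25)
print(len(folds), len(train_idx), len(test_idx))  # 10 227 76
```

In both schemes the training and test indices of a single run are disjoint; the difference is that tenfold cross-validation averages over ten such runs, while the percentage split uses one fixed test set.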
Figure 3 shows the percentage of correctly classified instances when the algorithms
are applied to the three categories using cross-validation. The graph shows that
the algorithms converged in classification accuracy once the dataset size exceeded
909 cases, while SVM failed to classify well at 303 cases. Table 2 summarizes the
results.
Figure 4 shows the percentage of correctly classified instances when the algorithms
are applied to the three categories using the percentage split. The graph shows that
the RFC outperformed the other algorithms and that the three converged in classification
accuracy when the dataset size reached 1818 instances. Again, SVM failed to classify
well at 303 cases. Table 3 summarizes the results.
Fig. 4 Correctly classified instances percentage split (25%)

Table 3 Correctly classified instances (%) using percentage split (25%)
Algorithms T303 T909 T1818
SVM 47.4 96 100
RF 84.2 98.7 100
J48 82.9 92.5 99.1
6 Conclusion
The IoT works hand in hand with big data when huge volumes of information must
be processed and analyzed. In this study, E-health data were analyzed using
classification algorithms, in particular on the Health care (Heart attack possibility)
dataset. The optimal features of the medical database were identified, which helps
in building an effective model for predicting heart disease. The results showed the
superiority of the RFC over the other algorithms.
References
1. Firouzi, F., Farahani, B., Weinberger, M., DePace, G., & Aliee, F. S. (2020). IoT fundamentals:
Definitions, architectures, challenges, and promises. In Intelligent Internet of Things (pp. 3–50).
Springer.
6. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data
8th international conference on information technology (ICIT) (pp. 580–587). IEEE.
10. Katal, A., Wazid, M., & Goudar, R. H. (2013). Big data: Issues, challenges, tools and good
practices. In 2013 Sixth international conference on contemporary computing (IC3) (pp. 404–
409). IEEE.
11. Chebbi, I., Boulila, W., & Farah, I. R. (2015). Big data: Concepts, challenges and applications.
In Computational collective intelligence (pp. 638–647). Springer.
12. Alam, F., Mehmood, R., Katib, I., Albogami, N. N., & Albeshri, A. (2017). Data fusion and
IoT for smart ubiquitous environments: A survey. IEEE Access, 5, 9533–9554.
13. Revathi, L., & Appandiraj, A. (2015). Hadoop based parallel framework for feature subset
selection in big data. International Journal of Innovative Research in Science, Engineering
and Technology, 4(5), 3530–3534.
14. Shankar, K. (2017). Prediction of most risk factors in hepatitis disease using apriori algorithm.
Research Journal of Pharmaceutical Biological and Chemical Sciences, 8(5), 477–484.
15. Manogaran, G., Lopez, D., & Chilamkurti, N. (2018). In-mapper combiner based MapReduce
algorithm for processing of big climate data. Future Generation Computer Systems, 86, 433–
445.
16. Injadat, M., Moubayed, A., Nassif, A. B., & Shami, A. (2020). Multi-split optimized bagging
ensemble model selection for multi-class educational data mining. Applied Intelligence, 50(12),
4506–4528.
17. Lakshmanaprabu, S. K., et al. (2019). Random forest for big data classification in the internet
of things using optimal features. International Journal of Machine Learning and Cybernetics,
10(10), 2609–2618.
113609.
10, 16150–16177.
24. Cervantes, J., Garcia-Lamont, F., Rodríguez-Mazahua, L., & Lopez, A. (2020). A compre-
hensive survey on support vector machine classification: Applications, challenges and trends.
Neurocomputing, 408, 189–215.
25. Jain, A., Sharma, V., & Sharma, V. (2017). Big data mining using supervised machine learning
approaches for Hadoop with Weka distribution. International Journal of Computational
Intelligence Research, 13(8), 2095–2111.
26. Su, M. Y., Wei, H. S., Chen, X. Y., Lin, P. W., & Qiu, D. Y. (2018). Using ad-related network
behavior to distinguish ad libraries. Applied Sciences, 8(10), 1852.
27. Li, W., Chai, Y., Khan, F., Jan, S. R. U., Verma, S., Menon, V. G., & Li, X. (2021). A compre-
hensive survey on machine learning-based big data analytics for IoT-enabled smart healthcare
system. Mobile Networks and Applications, 26(1), 234–252.
28. Chin, J., Callaghan, V., & Lam, I. (2017). Understanding and personalising smart city services
using machine learning, the internet-of-things and big data. In 2017 IEEE 26th International
Symposium on Industrial Electronics (ISIE) (pp. 2050–2055). IEEE.
29. Vapnik, V. (2013). The nature of statistical learning theory. Springer Science & Business
Media.
30. Liang, X., Zhu, L., & Huang, D. (2017). S Multi-task ranking SVM for image cosegmentation.
Neurocomputing, 247, 126–136.
31. Naresh, B. (2021) Health care: Heart attack possibility [Online]. Kaggle, July 4, 2021. https://
www.kaggle.com/nareshbhat/health-care-data-set-on-heart-attack-possibility
32. Oliff, H., & Liu, Y. (2017). Towards industry 4.0 utilizing data-mining techniques: A case study
on quality improvement. Procedia CIRP, 63, 167–172.
33. WEKA. (2021). The workbench for machine learning [Online]. WEKA. https://www.cs.wai
kato.ac.nz/ml/weka/index.html. Last accessed June 4, 2021.
34. Géron, A. (2019)/ Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, tools, and techniques to build intelligent systems. O’Reilly Media.
Comparative Study on Arabic Text
Classification: Challenges
and Opportunities
Mohammed K. Bani Melhem, Laith Abualigah, Raed Abu Zitar,
Abdelazim G. Hussien, and Diego Oliva
Abstract Web technology has improved greatly in recent years, loading the Internet
with digital content from many different fields. This has made it difficult for
researchers to find text classification algorithms that fit a specific language or a set
of languages. Text classification, or categorization, is the practice of assigning a
given text document to one or more predefined labels or categories; it aims to obtain
valuable information from unstructured text documents. This paper presents a
comparative study of selected published papers that focus on improving Arabic text
classification. It highlights the proposed models and the classifiers used, discusses
the challenges faced in this type of research, and proposes research opportunities in
the field of text classification. In the reviewed studies, SVM and Naive Bayes were
the most widely used classifiers for Arabic text classification, while more effort is
needed to develop and implement flexible Arabic text classification methods and
classifiers.
M. K. B. Melhem · L. Abualigah (B)

Malaysia
L. Abualigah
R. A. Zitar
Arab Emirates
A. G. Hussien
Department of Computer and Information Science, Linköping University, Linköping, Sweden
Faculty of Science, Fayoum University, Faiyum, Egypt
D. Oliva
IN3—Computer Science Department, Universitat Oberta de Catalunya, Castelldefels, Spain
Depto. de Ciencias Computacionales, Universidad de Guadalajara, CUCEI, Guadalajara, Jalisco,
Mexico
https://doi.org/10.1007/978-3-031-17576-3_10
Keywords Arabic text classification · Deep learning · CHI square · Single-label
text categorization · Multi-label text categorization · Naïve Bayes · Arabic natural
language processing · Feature selection
1 Introduction
Nowadays, enormous amounts of information and hidden knowledge are available
on the internet within globally distributed digital content. This knowledge can be
extracted if suitable and innovative tools are applied to that content [1, 2]. Several
sciences and methods support the automatic extraction of information from digital
content; text classification is one such method, and it contributes significantly to
the speed and accuracy of obtaining information and knowledge. Previously,
professionals and domain experts classified documents manually [3, 4]. However,
with the tremendous and continuing growth in the quantity and quality of Arabic
digital content, manual classification has become ineffective and infeasible. This
motivated researchers to develop and improve automatic methods for text
classification, which in turn created many new challenges for researchers [5–12].
Researchers have presented and implemented many solutions, but most of these
were restricted to classical machine learning classifiers with small datasets that
were neither freely available nor, in most cases, sufficient. To overcome this
challenge, many researchers turned to deep learning techniques, improved the
existing algorithms, and provided more free datasets, which clearly improved the
text classification process.
The purpose of this study is to explore the publications available on the topic
of Arabic text classification, summarize their results, and list the resulting
research challenges and opportunities for researchers interested in this field. The
main objective of this paper is therefore to select some of the latest studies on
Arabic text classifiers and examine them to highlight the most prominent
improvements they brought and the research areas that emerged from these
improvements, helping users, researchers and the community take advantage of the
information that exists in Arabic digital content. To achieve this goal, five Arabic
text classification papers published in 2020 were selected, each concerned with
using different techniques to improve the text classification process.
2 Literature Review
Alshaer et al. [13] studied the impact of improved CHI Square (ImpCHI), used as a
feature selection method, on several text classifiers (Random Forest, Naïve Bayes
Multinomial, Decision Tree, Bayes Net, Artificial Neural Networks and Naïve Bayes),
evaluating the results according to precision, recall, F-measure and model build
time. They also described the importance of data pre-processing steps in the text
classification process for deriving supporting results and improving efficiency.
Chantar et al. [14] studied the impact of an enhanced binary Grey Wolf Optimizer
(GWO) within a wrapper feature selection (FS) method on the Arabic text
classification problem. Using the news datasets Akhbar-Alkhaleej, Al-jazeera and
Alwatan, the authors compared the performance of the proposed method with SVM,
Decision Tree, NB and KNN classifiers.
Bahassine et al. [15] proposed an improved Chi-square feature selection method
(referred to hereafter as ImpCHI) to enhance Arabic text classification performance,
and compared it with three metrics (mutual information, information gain and
Chi-square).
Marie-Sainte et al. [16] studied a newly proposed firefly algorithm based feature
selection method and applied it to different combinatorial problems. The technique
was validated using the Support Vector Machine classifier and three evaluation
measures (precision, recall and F-measure).
Elnagar et al. [17] introduced new, rich, unbiased and freely available datasets for
both Arabic text categorization tasks: single-label (SANAD) and multi-label (NADiA).
They also presented a comprehensive comparison of various deep learning models
for Arabic text classification to evaluate the effectiveness of such models on the
NADiA and SANAD datasets. Other optimization methods that can be used to
optimize such problems are given in [18–23].
3 Background
1. Text Classification
Text classification, or categorization, is the practice of assigning a given text
document to one or more predefined labels or categories [24, 25]. It aims to obtain
valuable information from unstructured text documents for use in many applications,
such as detecting and categorizing spam emails, news monitoring, and automated
indexing of scientific articles [26, 27].
Generally, there are two types of classification: single-label classification
(assigning each document to a single related class or category) and multi-label
classification (assigning each document or instance to several categories) [1, 2].
In most cases, the text document is represented as a word-frequency vector [3, 4].
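The word-frequency representation can be illustrated with a minimal sketch. It assumes plain whitespace tokenization and a toy corpus of transliterated placeholder words; both are illustrative choices and are not taken from the reviewed papers, which apply their own Arabic preprocessing.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of the distinct tokens found in all documents."""
    return sorted({token for doc in documents for token in doc.split()})


def to_frequency_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(document.split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy corpus (placeholder tokens, not real data).
docs = ["kitab jadid kitab", "jadid khabar"]
vocab = build_vocabulary(docs)
vectors = [to_frequency_vector(d, vocab) for d in docs]
```

Each document becomes one vector whose length equals the vocabulary size, which is the input form that classifiers such as Naïve Bayes or SVM typically consume.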
2. Deep Learning Models
A deep neural network (DNN) is a neural network with many hidden layers. The
three main parts of the network are the input layer, the hidden layers, and the
output layer. As the names suggest, the input and output layers receive the input
and produce the output. Hidden layers are additional layers that add more
computation to the network when the task is too complicated for a small network.
The number of hidden layers can reach a hundred or more. DNNs can achieve high
precision and are widely considered revolutionary. There are many types of DNNs,
such as convolutional neural networks (CNN) and recurrent neural networks (RNN);
the various DNN models differ in how their units are connected [28].
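The layer structure described above can be sketched as a forward pass through fully connected layers. This is a simplified illustration in plain Python: the sigmoid activation, the layer sizes and the hand-picked weights are all assumptions made for the example and do not correspond to any model in the reviewed papers.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b), one row of W per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input through each (weights, biases) pair in order."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
output = forward([1.0, 1.0], layers)
```

Adding more `(weights, biases)` pairs to `layers` deepens the network; CNNs and RNNs differ from this sketch in how units are connected rather than in the basic layer-by-layer flow.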
3. Feature Selection
Feature selection is one of the most important elements that can increase the
performance of the classification process. It eliminates redundant and irrelevant
data and selects important data to reduce the complexity of the classification
process [15].
4. CHI Square
CHI Square is a statistical test of independence between two variables; in text
classification, these are typically a term and a category. In the data mining
process, it serves as a method for selecting features. The CHI Square method is
used in the preprocessing step of the text classification system [13].
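As a hedged illustration of CHI Square as a feature selection score, the sketch below uses the standard 2×2 contingency-table form of the statistic for one term and one category. The document counts are hypothetical, and this is the classical statistic, not the ImpCHI variant discussed in the reviewed papers.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # an empty row or column carries no evidence
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts (a, b, c, d) for three terms and one category.
tables = {
    "goal": (40, 10, 10, 40),
    "the": (25, 25, 25, 25),
    "match": (30, 20, 20, 30),
}
scores = {term: chi_square(*counts) for term, counts in tables.items()}

# Keep the k highest-scoring terms as the selected features.
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

A term that is distributed independently of the category (here "the") scores zero and is discarded first, which is the intuition behind using the statistic to rank features.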
5. Improved CHI Square
The improved CHI method (ImpCHI) is an enhancement of the classical CHI method.
ImpCHI has been used with Chinese text, and research results showed that it is also
effective for selecting features from Arabic text data. Additionally, ImpCHI has
been used for Arabic text with decision trees. The reported results showed that, in
terms of recall, ImpCHI performs better than conventional CHI.
6. Grey Wolf Optimizer (GWO)
This algorithm was proposed in [29]. It is one of the more recent swarm intelligence
(SI) algorithms and has attracted the attention of many researchers in different
fields of optimization.
7. Firefly Algorithm
The Firefly Algorithm (FA) is a well-known and efficient bio-inspired algorithm
[30]. It was successfully applied as an FS method for Arabic speaker recognition
systems, but it had not been implemented for Arabic text classification [31].
4 Literature Review Results and Discussion
Alshaer et al. [13] used different types of Arabic text classifiers (Bayes Net
(BN), Naïve Bayes (NB), Naïve Bayes Multinomial (NBM), Random Forest (RF),
Decision Tree (DT) and Artificial Neural Networks (ANNs)) with the improved CHI
Square (ImpCHI) algorithm and compared them with each other according to average
precision, average recall, average F-measure, and average time. Six tests were
conducted for each classifier: without pre-processing, with pre-processing, without
pre-processing and with CHI, with pre-processing and CHI, without pre-processing
and with ImpCHI, and with pre-processing and ImpCHI. The results of this study show
that using ImpCHI Square as the feature selection method gave better precision,
recall and F-measure, but a worse model build time. Moreover, the results were
superior to those of classical CHI Square without pre-processing in average
precision, recall, F-measure and time. Overall, the Naïve Bayes classifiers achieved
the best average precision, recall and F-measure, which makes Naïve Bayes the best
of the compared algorithms. The dataset was collected from different Arabic
resources and contains 9055 Arabic documents.
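The evaluation measures named above can be computed as in the following sketch. It assumes macro-averaging (an unweighted mean of the per-class scores); the reviewed papers do not state here how their averages were formed, so this choice and the toy labels are assumptions.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F-measure) over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Hypothetical gold labels and predictions for four documents.
y_true = ["sport", "sport", "news", "news"]
y_pred = ["sport", "news", "news", "news"]
avg_precision, avg_recall, avg_f = macro_scores(y_true, y_pred)
```

Precision penalizes documents wrongly assigned to a class, recall penalizes documents of a class that were missed, and the F-measure is their harmonic mean.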
In another study, Bahassine et al. [15] used a feature selection method based on
improved Chi-square with an SVM classifier to enhance the Arabic text classification
process. They compared the results, via the common evaluation criteria of precision,
recall and F-measure, with previous feature selection methods: Mutual Information
(MI), Chi-square, Information Gain (IG) and Term Frequency-Inverse Document
Frequency (TFIDF). The results showed that ImpCHI performs better than the other
feature selection methods for most feature-set sizes. When the number of features is
not equal to 20, precision, recall and F-measure at the different feature-set sizes
are better with the SVM classifier than with DT for all feature selection methods.
However, the study notes that the decision tree produces results that are easily
interpretable by non-experts, which helps to identify the important and pertinent
terms for every class, whereas the results of SVM are difficult to interpret.
Chantar et al. [14] proposed an enhanced binary grey wolf optimizer (GWO) within a
wrapper FS approach. To study and evaluate the efficacy of different BGWO-based
wrapper methods, they used different learning models (decision trees, K-nearest
neighbour, Naive Bayes, and SVM classifiers) and three public Arabic datasets:
Alwatan, Akhbar-Alkhaleej, and Al-jazeera-News. Two different methods, BGWO1 and
BGWO2, are proposed to convert the continuous GWO (CGWO) to a binary version
(BGWO). The common evaluation criteria of precision, recall and F-measure were
also used. The results of this research show that the SVM-based feature selection
technique, the proposed binary GWO optimizer and the elite-based crossover scheme
greatly improve performance in the Arabic document classification process.
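To illustrate what converting a continuous optimizer to a binary one can look like, the sketch below maps a continuous position vector to a 0/1 feature mask with a sigmoid transfer function. This is a generic binarization technique and an assumption made for illustration; the exact BGWO1 and BGWO2 conversions of the reviewed paper are not described in this chapter, and all names here are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Transfer function mapping a continuous coordinate into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize_position(position, rng):
    """Map a continuous position vector to a 0/1 feature mask.

    Each feature is kept (bit = 1) with probability sigmoid(coordinate).
    """
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def select_features(features, mask):
    """Keep only the features whose mask bit is 1."""
    return [f for f, bit in zip(features, mask) if bit == 1]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mask = binarize_position([50.0, -50.0, 0.3], rng)
chosen = select_features(["term_a", "term_b", "term_c"], mask)
```

In a wrapper approach, each candidate mask would then be scored by training a classifier on the selected features, and that score would guide the optimizer's next update.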
Marie-Sainte et al. [16] took a different approach, enhancing Arabic text
classification with Firefly Algorithm based feature selection. The Support Vector
Machine classifier and three evaluation measures (precision, recall and F-measure)
were used to validate this method.
The data set used in this study, named OSAC, was collected from the BBC and
CNN Arabic websites and contains 5843 text documents. It is divided into two
subsets that form the training and test data of the classification system. The
preprocessing stage was skipped in this study because the dataset had already been
preprocessed. The results of this paper showed that the proposed feature selection
method is very efficient in improving Arabic text classification accuracy; its
precision reaches 0.994, which is strong evidence of its efficiency.
In an extensive study on the impact of deep learning models in Arabic text
classification, Elnagar et al. [28] introduced rich and unbiased datasets that are
freely available to the research community for both Arabic text classification
tasks: SANAD for single-label and NADiA for multi-label classification. The final
size of NADiA is approximately 485,000 articles, covering a subset of 30
categories. In this research, nine deep learning models (BIGRU, BILSTM, CGRU,
CLSTM, CNN, GRU, HANGRU, HANLSTM and LSTM) were developed for Arabic text
classification tasks with no pre-processing requirements.
The study shows that all models work well on the SANAD corpus: the lowest
performance, achieved by the convolutional GRU, is 91.18%, and the highest,
achieved by the attention GRU, is 96.94%. On NADiA, the attention GRU achieved the
highest overall accuracy of 88.68% on the largest subset of 10 categories in the
"Masrawy" dataset.
5 Results and Discussion
Five publications were reviewed in this study: one implements the firefly
algorithm, one implements the binary grey wolf optimizer, two use improved CHI
Square, and one implements deep learning models. All selected publications were
published in 2020. Two publications introduced new datasets, one of them an
extensive and large dataset; in addition, all of the publications used ready-made
datasets, some of which had already been preprocessed.
Overall, each of the reviewed publications reported an improvement in the Arabic
text classification process using its proposed method.
The challenges and research opportunities identified by this study are:
• The scarcity of freely available Arabic datasets is still an important challenge
for researchers.
• Classifiers verified to perform well on document classification, such as Naive
Bayes and SVM, can be used with methods proposed in other studies.
• The proposed methods can be used with other classifiers, even if they give worse
results with a specific method.
• ImpCHI, Firefly and GWO are effective methods that offer good research
opportunities.
• Deep learning models are an important technique that may be adapted to any
method or algorithm with highly effective results.
6 Conclusions and Future Work
In recent years, the classification of Arabic texts has been regarded as one of the
most important topics in the field of knowledge discovery. Large amounts of data
are submitted online every day, from social media posts and comments to product
reviews. By using Arabic text classification tools, these data sources can be used to
obtain useful information. Our research explored and analyzed five recent articles
that applied different techniques to explore and improve the classification of Arabic
texts. Our findings are summarized as follows:
• Arabic datasets are still considered a low resource for researchers.
• Using verified classifiers with different algorithms may enhance Arabic text
classification.
• There are many research opportunities in the hot topics of deep learning.
In future work, we will expand the selection to all publications published in 2020
and identify the most effective classifiers and methods that may accept
enhancement, as well as the worst classifiers and methods used in Arabic text
classification.
References
1. Jackson, P., & Moulinier, I. (2007). Natural language processing for online applications: text
retrieval, extraction and categorization (vol. 5). John Benjamins Publishing.
2. Sanasam, R., Murthy, H., & Gonsalves, T. (2010). Feature selection for text classification based
on Gini coefficient of inequality. FSDM, 10, 76–85.
3. Feldman, R. (2007). The text mining handbook: Advanced approaches in analyzing unstruc-
tured data. Cambridge University Press.
4. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval.
Springer.
9. Abualigah, L., Diabat, A., & Elaziz, M. A. (2021). Intelligent workflow scheduling for big data
8th international conference on information technology (ICIT) (pp. 580–587). IEEE.
13. Alshaer, H., Otair, M., Abualigah, L., Alshinwan, M., & Khasawneh, A. (2020). Feature
selection method using improved CHI Square on Arabic text classifiers.
14. Chantar, H., Mafarja, M., Alsawalqah, H., Heidari, A. A., Aljarah, I., & Faris, H. (2020).
Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text
classification.
15. Bahassine, S., Madani, A., Al-Sarem, M., & Kissi, M. (2020). Feature selection using an
improved Chi-square for Arabic text.
16. Marie-Sainte, S. L., & Alalyani, N. (2020). Firefly algorithm based feature selection for Arabic
text classification.
17. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models.
113609.
10, 16150–16177.
24. Khreisat, L. (2009). A machine learning approach for Arabic text classification using N-gram
frequency statistics. Journal of Informetrics, 72–77.
25. Sebastiani, F. (2005). Text categorization. In J. H. Doorn, L. C. Rivero, & V. E. Ferraggine
(Eds.), Encyclopedia of database technologies and applications (pp. 683–687). IGI Global.
26. Dharmadhikari, S., Ingle, M., & Kulkarni, P. (2011). Empirical studies on machine learning
based text classification algorithms. Advanced Computing: An International Journal, 161–169.
27. El Kourdi, M., Bensaid, A., & Rachidi, T. (2004). Automatic Arabic document categoriza-
tion based on the Naïve Bayes algorithm. In Proceedings of the workshop on computational
approaches to Arabic script-based languages (pp. 51–58).
28. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models. Information Processing and Management.
29. Mirjalili, S., Mirjalili, S. M., & Lewis, A. (2014). Grey wolf optimizer. Advances in
Engineering Software.
30. Sayadi, M. K., Ramezanian, R., & Ghaffarinasab, N. (2010). A discrete firefly meta-heuristic
with local search for makespan minimization in permutation flow shop scheduling problems.
International Journal of Industrial Engineering Computations.
31. Harrag, A., & Nassir, H. (2014). Firefly feature subset selection application to Arabic speaker
recognition system. International Journal of Engineering Intelligent Systems for Electrical
Engineering and Communications.
Pedestrian Speed Prediction Using Feed
Forward Neural Network
Abubakar Dayyabu, Hashim Mohammed Alhassan, and Laith Abualigah
Abstract Pedestrian speed behavior is governed by the pedestrian characteristics of
gender, age, and group size and by facility type, as investigated by many
researchers in dynamic pedestrian studies. However, little attention has been given
to the effect of pedestrians' clothing on their speed behavior. This research
investigates the effect of clothing types on pedestrian speed behavior, using
non-linear feed-forward neural networks to model pedestrian speed on an overhead
pedestrian crossing bridge. The research uses video for data collection, manual
extraction of data from the video, Excel and Minitab for statistical analysis, and
an artificial neural network (ANN) for model building, training, validation, and
prediction. The statistical analyses indicate that pedestrian speed in the ascending
direction is higher than in the descending direction, with values of 67.72 m/min
and 52.19 m/min, respectively. The speed distribution also indicates that male
pedestrians wearing English/short African clothes and cover shoes have a higher
mean speed, 84.21 m/min and 60.10 m/min in the ascending and descending directions,
respectively. The artificial neural network was satisfactory in building, training
and validation, as indicated by the R and RMSE values presented in Table 3a–d.
Keywords Pedestrian microscopic modeling · ANN · R and RMSE
A. Dayyabu · H. M. Alhassan
Department of Civil Engineering, Bayero University Kano, Gwarzo Road New Campus,
Kano 700241, Nigeria
e-mail: hmalhassan.civ@b.u.k.edu.ng
A. Dayyabu
Department of Civil Engineering, Nile University of Nigeria, Abuja, Nigeria
L. Abualigah (B)
Jordan
https://doi.org/10.1007/978-3-031-17576-3_11
1 Introduction
Walking is the oldest, most natural, and most used mode of transportation, used by
humans in search of materials for shelter, water to drink, and food to eat in order
to survive. Pedestrian facilities can therefore be traced back to the origin of
humankind. After shelter, the first humans created footpaths to find water and
food, and the footpath remained the only means of transportation until animals were
domesticated [1]. Many people walk for recreation or exercise; some walk for its
health benefits, some for its simplicity, and some because it is cheap or because
they have no personal vehicle [2, 3]. Despite these advantages, its widespread use,
and its historical origin, little attention is given to walking facilities with
regard to design standards, regulation, and safety. This results in more accidents
involving pedestrians.
According to the World Health Organization (WHO, 2010, 2013, [4]), 22% of those
killed in road traffic accidents worldwide are pedestrians. The African region
accounts for the highest share, at thirty-eight percent (38%), even though it has
the fewest motorized vehicles among the six world regions. Nigeria and South Africa
have the highest fatality rates in the region (33.7 and 31.9 deaths per 100,000
population per year, respectively). A study conducted in Ghana found that 68% of
the pedestrians killed were knocked down by a vehicle while in the middle of the
roadway, crossing the road [5]. In another study, Ogendi et al. [6] reported that
out of 176 persons involved in road traffic accidents in Kenya, 59.1% were
pedestrians. The study also revealed that 72.6% of the pedestrians involved were
injured while crossing the road, 11% while standing by the road, 8.2% while walking
along the road, and another 8.2% while engaging in other activities, including
hawking. The trend is similar in Nigeria; for instance, Aladelusi et al. [7] found
pedestrians to be among the most frequent victims of road traffic accidents. Also,
Solagberu et al. [8], investigating pedestrian injuries in Lagos, Nigeria, found
that 67% of the 702 pedestrians involved in road accidents were injured while
crossing the road. Odeleye [9] mentioned poor planning, reckless driver behavior
toward pedestrians, and the unsafe state of the road traffic environment as the
leading causes of pedestrian accidents in Nigeria.
Given the rising trend in pedestrian fatalities globally and locally, understanding
pedestrian behavior is the focus of this research. This study aims to develop a
model for predicting pedestrian speed using an artificial neural network (ANN)
approach based on field data, considering the effect of gender, clothing type, and
shoe type worn by individual pedestrians in Kano, Nigeria. Microscopic pedestrian
models have been studied extensively by many researchers, including [10], who used
the concept of magnetic theory to describe movement, representing the movement of
each pedestrian by the motion of a magnetized object in a magnetic field and
assuming each pedestrian and obstacle to be a positive magnetic pole and the
pedestrian's destination to be a negative magnetic pole. Gipps and Marksjö [11]
used a CA-like concept to model pedestrian traffic flow; the authors used reverse
gravity-based rules to move pedestrians over a grid of hexagonal cells. Blue and
Adler ([12], 2001) used cellular automata principles to model pedestrian behavior
in unidirectional and bidirectional movement. Dijkstra and Jessurun [13] and Wang
et al. [14]
both extended the cellular automata model to simulate pedestrian behavior in public
places. Chen et al. [15] extended the cellular automata model in modeling pedes-
trian behaviors under attracting incidents. Hu et al. [16] extended cellular automata
(CA) to enhance evacuation efficiency and analyze the model concerning queuing
time. Alghadi et al. [17] allowed more pedestrians to be in the same cell. Lu et al.
[18] extended the floor field cellular automata (CA) model to capture and evaluate
the influence of group behavior on crowd evacuation, since individuals are often
present in the crowd with family and friends, resulting in a mixture of groups
rather than a pure collection of individuals. Helbing and Molnár [19] suggested
that pedestrians’ behavior along their movement path could be modeled as social
forces. However, some steps toward this idea had been taken previously by Lewin
[20], who suggested that human behavioral change is guided by a social field or
social force. Teknomo [21] extended the social force model with two repulsive
forces, one taking effect when there is a pedestrian in front and the other when
the radii of two or more pedestrians overlap. Helbing et al. [22], Lakoba et al.
[23], and Parisi et al. (2009) introduced a “self-stopping” mechanism to prevent a
pedestrian from pushing over other pedestrians in the simulation process. Zanlungo
et al. [24] introduced collision prediction and avoidance mechanisms during the
simulation process. Moussaid et al. [25] developed an individual-based model that
describes how a pedestrian interacts with members of the same group and with
members of other groups. Xun et al. (2015) investigated the effects of spatial
distance, occupant density, and exit width on exit selection in a subway station.
Abualigah et al. [26] proposed a new optimization technique that can be used to
solve this problem.
Gruden et al. [27] used an ANN to model microscopic pedestrian crossing behavior.
Das et al. [28] used an ANN to model pedestrian macroscopic traffic flow
relationships. The authors compared the ANN with other deterministic models on
different pedestrian facilities and found the ANN to have outstanding performance.
Zampieri et al. [29] compared space syntax and ANN for modeling pedestrian movement
behavior and found that the ANN performed better, with a correlation coefficient of
more than 90% and an average error smaller than 0.02.
2 Material and Method
2.1 Data Collection Location
The data for the research were collected at an overhead bridge located at Sa’adatu
Rimi College of Education, Kano, Nigeria. The bridge was constructed in 2014 by the
Kano state government, through the state's Ministry of Works, Housing, and
Transport, to improve pedestrian safety and reduce the delays that crossing
pedestrians cause to motorists. The majority of the people using the pedestrian
overhead bridge were from Sa’adatu Rimi College of Education, Kano. The college is
among the largest teacher training institutions in Nigeria, with a student
population above 45,000 in 2012. The location of the data collection is presented
in Fig. 1. The road under the bridge is a four-lane divided arterial road with high
traffic flow.
Fig. 1 Location of pedestrian bridge 001 used for the data collection
2.2 Data Capturing and Extraction
A Hikvision cube IP security camera (DS-2CD2442FWD-IW, 4 MP, WDR) was used for
data capturing. The camera was mounted 7 m above ground level to capture
pedestrians’ features of gender, clothing type, and shoe type. Data were collected
for 12 h, from 7:00 a.m. to 7:00 p.m., Monday through Thursday. The features and
speeds of individuals were extracted manually from the playback of the recorded
video with AVS video editing. The research considers speed data for single
pedestrians in the age range of 18–40. The speed data were re-grouped into three
pedestrian combinations: All pedestrians, made up of all single pedestrians;
Combination I, made up of single pedestrians wearing English/African short
clothing; and Combination II, made up of single pedestrians wearing African
long clothing/gowns.
The input data of pedestrians’ gender, clothing type, shoe type, and speed, obtained
from the playback of the field observation video, were normalized to a standard
scale of 0–1 prior to model building and analysis. The normalization was carried out
using the normalization equation presented in Eq. (1):
$$X_s = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
where $X_s$ is the standardized value, $X_i$ is the original value, $X_{\min}$ is the
minimum value of X, and $X_{\max}$ is the maximum value of X.
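The min-max scaling of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name and the sample speeds are assumptions, not the study data, and the handling of a constant column is a convention chosen here because Eq. (1) is undefined in that case.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1] following Eq. (1)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Eq. (1) is undefined when all values are equal; returning zeros
        # is an assumption made for this sketch.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical speeds in m/min (illustrative values only).
speeds = [34.0, 51.0, 68.0, 102.0]
normalized = min_max_normalize(speeds)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Binary inputs such as gender or shoe type, when coded as 0/1, are left unchanged by this scaling.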
2.4 Sensitivity Analysis
Sensitivity analysis was carried out to find the relationship between the input
variables and the output variable and to establish the significance of each input
variable in model building. The Pearson product-moment coefficient of correlation was
used for the sensitivity analysis. The Pearson correlation equation is presented in
Eq. (2):
$$r = \frac{S_{xy}}{S_{xx}\,S_{yy}} \tag{2}$$
where r is the Pearson product-moment coefficient of correlation, $S_{xx}$ is the
standard deviation of variable X, $S_{yy}$ is the standard deviation of variable Y,
and $S_{xy}$ is the covariance of variables X and Y.
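As a minimal sketch of the Pearson coefficient, the Python function below computes r as the covariance of X and Y divided by the product of their standard deviations. The function name and the toy indicator vectors are assumptions for illustration; they are not taken from the field data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation: cov(X, Y) / (sd(X) * sd(Y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)


# Hypothetical 0/1 gender indicators for five pedestrians.
male = [1, 0, 1, 1, 0]
female = [1 - m for m in male]  # complementary coding
r_gender = pearson_r(male, female)
print(round(r_gender, 6))
```

Two complementary indicator columns are perfectly negatively correlated, which is why a male/female pair of dummy variables yields r = −1 in a correlation matrix.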
2.5 ANN Model Formulation
An artificial neural network (ANN) is an AI-based mathematical model that aims to
handle the non-linear relationship of an input–output dataset. Historically, ANNs are
information processing tools derived by analogy with the brain’s biological nervous
system, with the fundamental component called a neuron (node) (Sirhan and Koch, 2013).
ANN has proved practical for complex functions in various fields, including
prediction, pattern recognition, classification, forecasting, control systems, and
simulation [30, 31]. Among the different classes of ANN algorithms, the Feed-Forward
Neural Network (FFNN) with Backpropagation (BP) is the most common and widely applied
[32]. ANN is a fast-emerging technique in non-linear modeling owing to its predictive
capability and its ability to learn system behavior quickly. An ANN has a parallel
operating architecture consisting of input, hidden, and output layers interconnected
by neurons, as presented in Fig. 2. The network is trained on the association between
input and target output values through the activation function of the hidden neurons,
and its predictive capability can be improved by adjusting the connection weights of
each neuron until the required performance value is reached (maximum correlation
coefficient or minimum mean square error between the target and output values). The
critical problem in designing a complex ANN architecture is obtaining the required
performance value and the numbers of hidden layers and neurons. Because there are no
known general rules, several alternative architectures are tried based on the
association of input and target output (Bums and Whitesides, 1993).
Fig. 2 Artificial neural network (ANN) structure
The research proposes an ANN model based on the feed-forward architecture with the
backpropagation algorithm. The chosen feed-forward ANN comprises an input layer, a
hidden layer, and an output layer. The required number of neurons in the hidden layer
is selected by trial and error based on the best performance value. The input layer
comprises 2, 3, 4, or 5 neurons, while the target output layer has a single neuron of
field-observed speed. The strength of each connection between neurons is referred to
as its weight. The weighted sum of the inputs to a node is given in Eq. (3):
$$NET_j = \sum_{i=1}^{n} W_{ij} X_{ij} \tag{3}$$
where $W_{ij}$ is the established weight, $X_{ij}$ is the input value, and $NET_j$ is
the input to a node in layer j.
In the backpropagation technique, the output of a neuron is quantified by a sigmoid
function, given by Eq. (4):
$$f(NET_j) = \frac{1}{1 + \exp(-NET_j)} \tag{4}$$
The backpropagation algorithm is a form of supervised training that minimizes the sum
of squared errors by modifying the connection weights.
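The weighted sum and the sigmoid activation can be combined into a forward pass through a network with one hidden layer. The sketch below is illustrative only: the weights are hypothetical, bias terms are omitted because the weighted sum above has none, and the weight-update step of backpropagation is not shown.

```python
import math


def sigmoid(net):
    """Logistic activation: 1 / (1 + exp(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


def weighted_sum(weights, inputs):
    """Net input to one node: sum of weight * input over all connections."""
    return sum(w * x for w, x in zip(weights, inputs))


def forward(inputs, hidden_weights, output_weights):
    """One hidden layer followed by a single output neuron."""
    hidden = [sigmoid(weighted_sum(ws, inputs)) for ws in hidden_weights]
    return sigmoid(weighted_sum(output_weights, hidden))


# Hypothetical normalized inputs: gender, clothing type, shoe type.
x = [1.0, 0.0, 1.0]
# Hypothetical weights for two hidden neurons and one output neuron.
w_hidden = [[0.4, -0.2, 0.7], [-0.5, 0.3, 0.1]]
w_output = [0.8, -0.6]
predicted_speed = forward(x, w_hidden, w_output)
print(predicted_speed)
```

Because the output passes through a sigmoid, the prediction lies strictly between 0 and 1, which matches a target speed that has been normalized to the 0–1 scale.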
2.6 ANN Model Validations
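Validation is an essential part of modeling, as it demonstrates how reasonably the
model represents the actual system. The coefficient of correlation (R), coefficient
of determination ($R^2$), mean square error (MSE), and root mean square error (RMSE)
are used for model validation. RMSE represents the sample standard deviation of the
differences between predicted values and observed values. The values of $R^2$, R,
MSE, and RMSE are estimated using Eqs. (5)–(8). Table 3b, d present the validation
results for pedestrians in the ascending and descending directions.
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{5}$$
$$R = \sqrt{R^2} \tag{6}$$
$$MSE = \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{N} \tag{7}$$
$$RMSE = \sqrt{MSE} \tag{8}$$
where $O_i$ is the observed value, $P_i$ is the predicted value, $\bar{O}$ is the mean
of the observed values, and N = n is the number of observations.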
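The four validation measures can be computed together, as in the short Python sketch below. The function name and the observed/predicted values are hypothetical; R is taken as the square root of R², so the sketch returns NaN for R when R² is negative.

```python
import math


def validation_metrics(observed, predicted):
    """Return R2, R, MSE and RMSE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / n
    return {
        "R2": r2,
        # R is only defined here for non-negative R2.
        "R": math.sqrt(r2) if r2 >= 0 else float("nan"),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }


# Hypothetical normalized speeds (illustrative values only).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.4, 0.5, 0.8]
metrics = validation_metrics(observed, predicted)
print(metrics)
```

For these sample values the residual sum of squares is about 0.02 and the total sum of squares about 0.2, giving R² of roughly 0.9 and an MSE of roughly 0.005.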
3 Results Analysis and Discussion
3.1 Description of Observed Pedestrian Data
The data collected were classified into discrete and continuous data. The discrete
data are presented in Fig. 3a–e: Fig. 3a shows pedestrian classification based on
gender type; Fig. 3b, based on the direction of movement; Fig. 3c, based on age group;
Fig. 3d, based on clothing type; and Fig. 3e, based on shoe type.
The observations show the presence of different types of pedestrians. A total of 5672
male pedestrians were observed (4443 in the ascending direction and 1229 in the
descending direction), together with 1138 female pedestrians (983 ascending and 155
descending), as presented in Fig. 3a. The pedestrian group sizes observed were single
pedestrians, with a total of 4219 (3254 ascending, 965 descending); groups of two,
with a total of 1939 (1716 ascending, 223 descending); groups of three, with a total
of 631 (456 ascending, 175 descending); and groups of four, with a total of 271 (250
ascending, 21 descending), as presented in Fig. 3b. The pedestrians comprise all
ages: the 18–40 age range had a total of 5233 pedestrians (4182 ascending, 1051
descending), the under-18 age range had a total of 402 (242 ascending, 160
descending), and the over-40 age range had a total of 1175 (1002 ascending, 173
descending), as presented in Fig. 3c. The pedestrians were observed wearing different
types of clothes: English wear, with a total of 1601 (1342 ascending, 259
descending); short African wear, with a total of 583 (344 ascending, 239 descending);
long African wear, with a total of 4131 (3407 ascending, 724 descending); and
gown/hijab, with a total of 495 (333 ascending, 162 descending), as presented in
Fig. 3d. The pedestrians were also observed wearing different types of shoes: 1756
were wearing cover shoes (1376 ascending, 380 descending), while 5054 were wearing
slippers (4050 ascending, 1004 descending), as presented in Fig. 3e.
Fig. 3 a Pedestrian classification based on gender type. b Pedestrian classification
based on the direction of movement. c Pedestrian classification based on age group.
d Pedestrian classification based on clothing types. e Pedestrian classification
based on shoe types
3.2 Speed Characteristic and Distribution Results
The speed characteristics (maximum, minimum, and mean speed) for all the pedestrian
combinations described in the methodology are presented in Table 1: Table 1a presents
male pedestrian speed characteristics for the cover shoe type, Table 1b presents male
pedestrian speed characteristics for the slipper shoe type, and Table 1c presents
female pedestrian speed characteristics for the slipper shoe type.
The statistical analyses presented in Table 1a–c indicate that pedestrian speed in
the ascending direction is higher than in the descending direction, with mean values
of 67.74 m/min and 52.19 m/min, respectively. The speed distribution also indicates
that male pedestrians wearing English/short African clothes and cover shoes have the
highest mean speed, 84.21 m/min and 60.10 m/min in the ascending and descending
directions, followed by male pedestrians wearing English/short African clothes and
slippers (72.7 and 57.70 m/min), male pedestrians wearing long/gown clothes and cover
shoes (70.14 and 58.92 m/min), male pedestrians wearing long/gown clothes and
slippers (68.30 and 56.07 m/min), female pedestrians wearing English/short African
clothes and slippers (55.25 and 50.5 m/min), and lastly female pedestrians wearing
long/gown African clothes and slippers (49.10 and 48.90 m/min).

Table 1 a Male pedestrian speed characteristics based on cover shoe type. b Male
pedestrian speed characteristics based on slippers shoe type. c Female pedestrian
speed characteristics based on slippers shoe type

(a)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 1167 | 300 | 240 | 60 | 141 | 39 |
| Max (m/min) | 102 | 85 | 102 | 63.75 | 78 | 54.35 |
| Min (m/min) | 34 | 34 | 51 | 46.36 | 51.57 | 42.35 |
| Mean (m/min) | 67.74 | 52.19 | 84.21 | 60.1 | 70.14 | 58.92 |

(b)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 1167 | 300 | 203 | 56 | 583 | 145 |
| Max (m/min) | 102 | 85 | 102 | 72.86 | 70 | 48.35 |
| Min (m/min) | 34 | 34 | 56.67 | 46.36 | 39.35 | 34.23 |
| Mean (m/min) | 67.74 | 52.19 | 72.7 | 57.70 | 68.3 | 56.07 |

(c)
| | All pedestrians, ascending | All pedestrians, descending | Comb. I, ascending | Comb. I, descending | Comb. II, ascending | Comb. II, descending |
|---|---|---|---|---|---|---|
| No. of pedestrians | 330 | 123 | 37 | 18 | 293 | 105 |
| Max (m/min) | 85 | 85 | 85 | 72.86 | 85 | 85 |
| Min (m/min) | 26.84 | 28.33 | 28.33 | 34 | 26.84 | 28.33 |
| Mean (m/min) | 50.42 | 47.09 | 55.25 | 50.5 | 49.1 | 48.90 |

Table 2 a Pearson correlation coefficient matrix for ascending direction pedestrians.
b Pearson correlation coefficient matrix for descending direction pedestrians

(a)
| | Male | Female | C-Type I | C-Type II | S-Type I | S-Type II | Speed |
|---|---|---|---|---|---|---|---|
| Male | 1 | | | | | | |
| Female | −1 | 1 | | | | | |
| C-Type I | 0.002295 | −0.00229 | 1 | | | | |
| C-Type II | 0.010094 | −0.01009 | −0.9053 | 1 | | | |
| S-Type I | −0.07758 | 0.077585 | 0.319891 | −0.39542 | 1 | | |
| S-Type II | 0.06734 | −0.06734 | −0.32385 | 0.389791 | −0.99453 | 1 | |
| Speed | 0.205272 | −0.20527 | 0.522692 | −0.49069 | 0.611517 | −0.6234 | 1 |

(b)
| | Male | Female | C-Type I | C-Type II | S-Type I | S-Type II | Speed |
|---|---|---|---|---|---|---|---|
| Male | 1 | | | | | | |
| Female | −1 | 1 | | | | | |
| C-Type I | 0.267199 | −0.2672 | 1 | | | | |
| C-Type II | −0.20397 | 0.203973 | −0.95477 | 1 | | | |
| S-Type I | 0.147209 | −0.14721 | 0.153344 | −0.15923 | 1 | | |
| S-Type II | −0.14413 | 0.144127 | −0.1508 | 0.157253 | −0.98757 | 1 | |
| Speed | 0.487616 | −0.487622 | 0.822385 | −0.78832 | 0.433206 | −0.42792 | 1 |
The speed distribution of the observed pedestrian data is presented based on the
combinations specified in the methodology: Fig. 4a for all single pedestrians in the
ascending direction; Fig. 4b for all single pedestrians in the descending direction;
Fig. 4c–f for male pedestrians wearing cover shoes; Fig. 4g–j for male pedestrians
wearing slippers; and Fig. 4k–n for female pedestrians wearing slippers.
3.3 Result of Sensitivity Analysis
The research uses the Pearson correlation method to determine the order of importance
of each variable in model building. Table 2a, b present the sensitivity analysis,
which provides the relationship between the independent variables and the dependent
variable and the significance of each variable in model building. Table 2a presents
the relationship for the ascending direction, and Table 2b presents the relationship
for the descending direction. The sensitivity analysis indicates that shoe type is
the most significant variable in the ascending direction, with slippers being the
most significant, followed by cover shoe, clothing type I, clothing type II, and
lastly gender, as presented in Table 2a. In the descending direction, clothing type I
is the most significant, followed by clothing type II, female gender, male gender,
cover shoe type, and shoe type II, as presented in Table 2b.

Table 3 a ANN model training ascending direction. b ANN model validation ascending
direction. c ANN model training descending direction. d ANN model validation
descending direction

(a) Training-phase
| Model | R2 | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.4125 | 0.6423 | 0.03880 | 0.1971 |
| ANN-M2 | 0.4559 | 0.6752 | 0.0357 | 0.1890 |
| ANN-M3 | 0.4953 | 0.7038 | 0.0326 | 0.1806 |
| ANN-M4 | 0.4165 | 0.6454 | 0.0386 | 0.1964 |
| ANN-M5 | 0.5020 | 0.7085 | 0.0320 | 0.1790 |

(b) Validation-phase
| Model | R2 | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.4272 | 0.6536 | 0.0364 | 0.1908 |
| ANN-M2 | 0.4499 | 0.6708 | 0.0348 | 0.1866 |
| ANN-M3 | 0.4946 | 0.7046 | 0.0312 | 0.1767 |
| ANN-M4 | 0.4311 | 0.6566 | 0.0362 | 0.1901 |
| ANN-M5 | 0.4948 | 0.7034 | 0.0314 | 0.1771 |

(c) Training-phase
| Model | R2 | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.3997 | 0.6366 | 0.0285 | 0.1687 |
| ANN-M2 | 0.5908 | 0.6322 | 0.0287 | 0.1695 |
| ANN-M3 | 0.5482 | 0.7687 | 0.0196 | 0.1400 |
| ANN-M4 | 0.6193 | 0.7404 | 0.0216 | 0.1471 |
| ANN-M5 | 0.6193 | 0.7870 | 0.0182 | 0.1350 |

(d) Validation-phase
| Model | R2 | R | MSE | RMSE |
|---|---|---|---|---|
| ANN-M1 | 0.3974 | 0.6304 | 0.0297 | 0.1723 |
| ANN-M2 | 0.3975 | 0.6305 | 0.0297 | 0.1723 |
| ANN-M3 | 0.5803 | 0.7618 | 0.0207 | 0.1438 |
| ANN-M4 | 0.5405 | 0.7352 | 0.0226 | 0.1505 |
| ANN-M5 | 0.6077 | 0.7795 | 0.0193 | 0.1390 |

Fig. 4 a Pedestrian speed distribution (all pedestrians, ascending direction).
b Pedestrian speed distribution (all pedestrians, descending direction).
c Pedestrian speed distribution (pedestrian comb. I, ascending direction, based on cover shoe).
d Pedestrian speed distribution (pedestrian comb. I, descending direction, based on cover shoe).
e Pedestrian speed distribution (pedestrian comb. II, ascending direction, based on cover shoe).
f Pedestrian speed distribution (pedestrian comb. II, descending direction, based on cover shoe).
g Pedestrian speed distribution (pedestrian comb. I, ascending direction, based on slipper shoe).
h Pedestrian speed distribution (pedestrian comb. I, descending direction, based on slipper shoe).
i Pedestrian speed distribution (pedestrian comb. II, ascending direction, based on slipper shoe).
j Pedestrian speed distribution (pedestrian comb. II, descending direction, based on slipper shoe).
k Pedestrian speed distribution (pedestrian comb. I, ascending direction, based on slipper shoe).
l Pedestrian speed distribution (pedestrian comb. I, descending direction, based on slipper shoe).
m Pedestrian speed distribution (pedestrian comb. II, ascending direction, based on slipper shoe).
n Pedestrian speed distribution (pedestrian comb. II, descending direction, based on slipper shoe)
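The ordering of variables reported for the ascending direction can be reproduced by sorting the correlations with speed by absolute value. The sketch below uses the coefficients from the speed row of Table 2a; the variable names and the choice of absolute value as the ranking criterion are assumptions made for illustration.

```python
# Correlation of each input with speed, ascending direction (Table 2a).
speed_correlations = {
    "Male": 0.205272,
    "Female": -0.20527,
    "C-Type I": 0.522692,
    "C-Type II": -0.49069,
    "S-Type I": 0.611517,
    "S-Type II": -0.6234,
}

# Rank by strength of association, ignoring the sign.
ranked = sorted(speed_correlations.items(),
                key=lambda item: abs(item[1]),
                reverse=True)

for name, r in ranked:
    print(f"{name:10s} r = {r:+.4f}")
```

The resulting order places the two shoe-type variables first, then the two clothing-type variables, and gender last, which agrees with the ranking described for the ascending direction.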
3.4 Model Estimation Analysis Results
In this research, a two-layer feed-forward network trained with the
Levenberg–Marquardt algorithm is used to analyze the ANN models. Feed-forward
networks consist of a series of layers, and each subsequent layer has a connection
from the previous one. The models were built using MATLAB 2019a; five ANN models were
developed based on the Pearson correlation coefficients. During the process, 75% of
the data were used for training and 25% for validation. Network performance was
measured according to the mean squared error (MSE). Table 3a–d present the
performance measures for all five ANN models in training and validation in the
ascending and descending directions.
From the ANN performance analysis presented in Table 3a–d, the values of R and RMSE
provide evidence that ANN can be used to model pedestrian speed on stairs, as all the
values of R from model 1 to model 5 are greater than 0.5. Model 5 has the best
performance, with R-values of 0.7085 and 0.7034 in training and validation for the
ascending direction and 0.7870 and 0.7795 in training and validation for the
descending direction (Fig. 5).
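The 75%/25% division of the data can be illustrated with a simple shuffled split. This Python sketch is not the MATLAB procedure used in the study: the random shuffle, the fixed seed, and the function name are all assumptions, since the chapter does not state how the records were assigned to each subset.

```python
import random


def train_validation_split(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into two subsets."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical record identifiers standing in for pedestrian observations.
records = list(range(100))
train, validation = train_validation_split(records)
print(len(train), len(validation))
```

Keeping the validation subset out of training is what allows the validation-phase R and RMSE to indicate how the model behaves on unseen observations.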
4 Conclusion
Artificial intelligence modeling based on ANN can be used for pedestrian speed
prediction considering the effects of gender, clothing type, and shoe type, as shown
by the ANN performance analysis conducted in this research. All the ANN models built
from the observed data have R-values greater than 0.5, indicating the acceptability
of ANN for pedestrian speed prediction on a stairway.
The research also concludes that pedestrian dressing, in terms of clothing and shoe
type, together with gender, affects pedestrian speed. Male pedestrians wearing
English/short African clothes with cover shoes have the highest speed of any
pedestrian combination, while female pedestrians wearing long African clothes with
slippers have the lowest speed of any pedestrian combination.
Fig. 5 a Pedestrian speed relationship between predicted and observed data (TRAINING). b Pedes-
trian speed relationship between predicted and observed data (TESTING). c Pedestrian speed
relationship between predicted and observed data (TRAINING). d Pedestrian speed relationship
between predicted and observed data (TESTING)
References
1. Jacobson, H. R. (1940). A history of roads from ancient times to the motor age (Georgia
Institute of Technology). https://smartech.gatech.edu/bitstream/handle/1853/36216/jacobson_herbert_r_194005_ms_95034.pdf
2. Olojede, O., Yoade, A., & Olufemi, B. (2017). Determinants of walking as an active travel
mode in a Nigerian city. Journal of Transport and Health, 6, 327–334. https://doi.org/10.1016/j.jth.2017.06.008
3. Litman, T. (2011). Evaluating public transportation health benefits. (April).
http://site.ebrary.com/lib/sfu/docDetail.action?docID=10534560
4. WHO. (2015). Global status report on road safety 2013. WHO.
http://www.who.int/violence_injury_prevention/road_safety_status/2013/en/
5. Damsere-Derry, J., et al. (2010). Pedestrians’ injury patterns in Ghana. Accident Analysis and
Prevention, 42(4), 1080–1088.
6. Ogendi, J., Odero, W., Mitullah, W., & Khayesi, M. (2013). Pattern of pedestrian injuries in
the city of Nairobi: Implications for urban safety planning. Journal of Urban Health, 90(5),
849–856.
7. Aladelusi, T. O., et al. (2014). Evaluation of pedestrian road traffic maxillofacial injuries in a
Nigerian tertiary hospital. African Journal of Medicine and Medical Sciences, 43(4), 353–359.
8. Solagberu, B. A., et al. (2014). Child pedestrian injury and fatality in a developing country.
Pediatric Surgery International, 30(6), 625–632.
9. Odeleye, A. J. (2001). Improved road traffic environment for better child safety in Nigeria. In
Road user characteristics with emphasis on life-styles, quality of life and safety: Proceedings
of 14th ICTCT workshop held Caserta, Italy, October, 2001, pp. 72–82. http://trid.trb.org/view/745284
10. Okazaki, S., & Matsushita, S. (1979). A study of simulation model for pedestrian movement. In
Architectural space, part 3: along the shortest path, taking fire, congestion and unrecognized
space into account, transactions of architectural institute of Japan, 285.
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.626.596
11. Gipps, P. G., & Marksjö, B. (1985). A micro-simulation model for pedestrian flows.
Mathematics and Computers in Simulation, 27(2–3), 95–105. https://doi.org/10.1016/0378-4754(85)90027-8
12. Blue, V. J., & Adler, J. L. (1998). Emergent fundamental pedestrian flows from cellular automata
microsimulation. Transportation Research Record: Journal of the Transportation Research
Board, 1644(1), 29–36. https://doi.org/10.3141/1644-04
13. Dijkstra, J., & Jessurun, J. (2001). Theory and practical issues on cellular automata. Theory
and practical issues on cellular automata, (January 2000). https://doi.org/10.1007/978-1-4471-0709-5
14. Wang, J., Zhang, L., Shi, Q., Yang, P., & Hu, X. (2015). Modeling and simulating for congestion
pedestrian evacuation with panic. Physica A: Statistical Mechanics and Its Applications, 428,
396–409. https://doi.org/10.1016/j.physa.2015.01.057
15. Chen, Y., Chen, N., Wang, Y., Wang, Z., & Feng, G. (2015). Modeling pedestrian behaviors
under attracting incidents using cellular automata. Physica A: Statistical Mechanics and Its
Applications, 432, 287–300. https://doi.org/10.1016/j.physa.2015.03.017
16. Hu, J., You, L., Zhang, H., Wei, J., & Guo, Y. (2018). Study on queueing behavior in pedestrian
evacuation by extended cellular automata model. Physica A: Statistical Mechanics and Its
Applications, 489, 112–127. https://doi.org/10.1016/j.physa.2017.07.004
17. Alghadi, M. Y., Mazlan, A. R., & Azhari, A. (2019). The impact of board gender and multiple
directorship on cash holdings: Evidence from Jordan. International Journal of Finance and
Banking Research, 5(4), 71–75.
18. Lu, L., Guo, X., & Zhao, J. (2017). A unified nonlocal strain gradient model for nanobeams
and the importance of higher order terms. International Journal of Engineering Science, 119,
265–277.
19. Helbing, D., & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review
E, 51(5), 4282–4286. https://doi.org/10.1103/PhysRevE.51.4282
20. Lewin, K. (1951). Field theory in social science. Retrieved September 24, 2020, from
https://www.amazon.co.uk/Field-Theory-Social-Science-Lewin/dp/B0007DDXKY
21. Teknomo, K. (2006). Application of microscopic pedestrian simulation model. Transportation
Research Part F: Traffic Psychology and Behaviour, 9(1), 15–27. https://doi.org/10.1016/j.trf.2005.08.006
22. Helbing, D., Buzna, L., Johansson, A., & Werner, T. (2005). Self-organized pedestrian crowd
dynamics: Experiments, simulations, and design solutions. Transportation Science, 39(1), 1–24.
23. Lakoba, T. I., Kaup, D. J., & Finkelstein, N. M. (2005). Modifications of the Helbing-Molnár-
Farkas-Vicsek social force model for pedestrian evolution. Simulation, 81(5), 339–352.
https://doi.org/10.1177/0037549705052772
24. Zanlungo, F., Brščić, D., & Kanda, T. (2014). Pedestrian group behaviour analysis under
different density conditions. Transportation Research Procedia, 2, 149–158.
https://doi.org/10.1016/j.trpro.2014.09.020
25. Moussaïd, M., Perozo, N., Garnier, S., Helbing, D., & Theraulaz, G. (2010). The walking
behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE, 5(4),
e10047. https://doi.org/10.1371/journal.pone.0010047
113609.
27. Gruden, C., Otković, I. I., & Šraml, M. (2020). Neural networks applied to microsimulation:
A prediction model for pedestrian crossing time. Sustainability (Switzerland), 12(13).
28. Das, P., Parida, M., & Katiyar, V. K. (2015). Analysis of interrelationship between pedestrian
flow parameters using artificial neural network. Journal of Medical and Biological Engineering,
35(6), 298–309.
29. Zampieri, F. L., Rigatti, D., & Ugalde, C. (2009). Evaluated model of pedestrian movement
based on space syntax, performance measures and artificial neural nets. In 7th International
space syntax symposium, pp 1–8.
30. Govindaraju, R. S. (2000). Artificial neural networks in hydrology. II: Hydrologic applications.
Journal of Hydrologic Engineering, 5(2), 124–137.
31. Solgi, M., Najib, T., Ahmadnejad, S., & Nasernejad, B. (2017). Synthesis and characterization
of novel activated carbon from Medlar seed for chromium removal: Experimental analysis
and modeling with artificial neural network and support vector regression. Resource-Efficient
Technologies, 3(3), 236–248.
32. Elkiran, G., Nourani, V., & Abba, S. I. (2019). Multi-step ahead modelling of river water quality
parameters using ensemble artificial intelligence-based approach. Journal of Hydrology, 577,
123962.
33. Price, J. L., McKeel Jr, D. W., Buckles, V. D., Roe, C. M., Xiong, C., Grundman, M., ... & Morris,
J. C. (2009). Neuropathology of nondemented aging: Presumptive evidence for preclinical
Alzheimer disease. Neurobiology of Aging, 30(7), 1026–1036.
34. Zare, M., & Koch, M. (2016, July). Using ANN and ANFIS models for simulating and
predicting groundwater level fluctuations in the Miandarband Plain, Iran. In Proceedings of the
4th IAHR Europe congress. Sustainable hydraulics in the era of global change (p. 416), Liege,
Belgium.
35. Schuchhardt, J., Schneider, G., Reichelt, J., Schomburg, D., & Wrede, P. (1995). Classification
of local protein structural motifs by kohonen networks. Bioinformatics: From Nucleic Acids
and Proteins to Cell Metabolism, 85–92.
36. Blue, V. J., & Adler, J. L. (2001). Cellular automata microsimulation for modeling bi-directional
pedestrian walkways. Transportation Research Part B: Methodological, 35(3), 293–312.
37. Zheng, X., Li, H. Y., Meng, L. Y., Xu, X. Y., & Chen, X. (2015). Improved social force model
based on exit selection for microscopic pedestrian simulation in subway station. Journal of
Central South University, 22(11), 4490–4497.
Arabic Text Classification Using
Modified Artificial Bee Colony
Algorithm for Sentiment Analysis:
The Case of Jordanian Dialect
Abdallah Habeeb, Mohammed A. Otair, Laith Abualigah,

Anas Ratib Alsoud, Diaa Salama Abd Elminaam, Raed Abu Zitar,
Abstract Arab customers post comments and opinions daily, and their volume is
increasing dramatically, through online reviews of products or services from
companies, in both Arabic and its dialects. This text describes the user’s condition
or needs, expressing satisfaction or dissatisfaction, and the evaluation has either
negative or positive polarity. This work addresses the Arabic text sentiment analysis
problem for the case of the Jordanian dialect. The main purpose of this paper is to
classify text into two classes, negative or positive, which may help the business to
maintain a report
A. Habeeb . M. A. Otair
Faculty of Computer Sciences and Informatics, Amman Arab University, Amman 11953,
Jordan
L. Abualigah (&) . A. R. Alsoud
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman,
Jordan
L. Abualigah
School of Computer Sciences, Universiti Sains Malaysia, 11800 Pulau Pinang, Gelugor,
Malaysia
D. S. A. Elminaam
Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Faculty of Computer Science, Misr International University, Obour, Egypt
R. A. Zitar
Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, Abu Dhabi,
United Arab Emirates
A. E. Ezugwu
King Edward Road, Pietermaritzburg, KwaZulu-Natal 3201, South Africa
H. Jia
L. Abualigah (ed.), Classification Applications with Deep Learning and Machine Learning
https://doi.org/10.1007/978-3-031-17576-3_12
about service or product. The first phase uses natural language processing tools
(stemming, stop word removal, and tokenization) to filter the text. The second phase
modifies the Artificial Bee Colony (ABC) algorithm with the Upper Confidence Bound
(UCB) algorithm to promote the exploitation ability for the minimum dimension and
obtain the minimum number of optimal features, and then uses a forward feature
selection strategy with four machine learning classifiers: K-Nearest Neighbors (KNN),
Support Vector Machines (SVM), Naïve Bayes (NB), and Polynomial Neural Networks
(PNN). The proposed model has been applied to a Jordanian dialect database, which
contains comments from Jordanian telecom company customers. Based on the results of
the sentiment analysis, suggestions can be made to discontinue, drop, or upgrade
products or services. Moreover, the proposed model is applied to a database of the
Algerian dialect, which contains long Arabic texts, in order to assess the efficiency
of the proposed model for short and long texts. Four performance evaluation criteria
were used: precision, recall, F1-score, and accuracy. The experimental results show
that the proposed model achieves high accuracy, up to 99% on the Jordanian dialect
and 82% on the Algerian dialect, and can serve as a basis for future work on the
classification of Arabic dialects.
Keywords Natural language processing · Text classification · Sentiment analysis ·
Feature selection · Inspired algorithms · ABC · UCB · KNN · SVM · PNN · Naïve Bayes
1 Introduction
A part of human intelligence is the use of language in communication, including the
ability to speak, read, and analyze images to understand content. Artificial
intelligence uses machine learning to attain a part of this intelligence: the ability
to read and understand context [1].
Machine learning algorithms deal with automatic text classification. The learned
features are used to build text classifiers in various fields such as email routing,
spam filtering, web page classification, sentiment analysis, and topic tracking. To
perform the text classification job, we use the preprocessing tools of stemming,
tokenization, and stop word removal, together with the proposed feature selection
based on optimization algorithms, to handle the high dimensionality of the features.
Feature selection is an approach to choosing the most valuable features from a
dataset of high dimensionality, which are then used to improve the performance of
classification [2].
The amount of data on dynamic web pages and the Internet is increasing every second,
produced by social media, companies that care about customer opinion, and many other
sources; hence the need to classify textual documents of unstructured data. The
existence of unstructured data creates a need to extract knowledge that can be used
in many domains. Text classification, or categorization, refers to the task of
assigning predefined domains or categories to a given written text. The automated
classification task reports the relevant categories, whether multiple or single, and
formats the unstructured text to be compatible with ML algorithms, in order to mine
interesting knowledge and understand customer needs. One of the most important tasks
in natural language processing (NLP) is sentiment analysis, which is used to
determine whether a text is positive or negative. NLP is used to automate the
analysis of text and to represent the data in a format suitable for machine learning
[3].
One of the optimization algorithms is the artificial bee colony (ABC) algorithm,
which has been used successfully in many studies. This algorithm suffers in part from
its stochastic nature: its search equation is poor at exploitation when refining
toward the best solutions [4]. Because of this weakness, the ABC algorithm with an
elite opposition-based learning strategy (EOABC) has been used to address the poor
exploitation of the original ABC [5].
Customer feedback is important for a business: to fully understand customers’
requirements and to know their level of satisfaction, it is necessary to collect
customers’ comments and evaluate their responses. This can help with innovation,
product development, and service improvement, which build a loyal customer base.
However, a huge volume of data needs to be processed. In this paper, the problem is
the classification of Arabic text in the Jordanian dialect, in which classifier
algorithms are trained on a training dataset and tested on the labels they predict.
Typical ABC algorithms generate solutions with a search equation that is good at
exploration but often demonstrates insufficient exploitation, where exploitation is
the act of confining the search to a small area of the search space to refine the
solutions. In the artificial bee colony algorithm, a food source is chosen according
to a probability value based on the roulette wheel method, and a greedy selection is
applied between the current food source and the new food source. As a first
contribution, we modify the Artificial Bee Colony algorithm to enhance exploitation
by applying the UCB algorithm instead of greedy selection when choosing a food source
according to the probability value, so as to obtain the optimal solution within a
small area of the search space. As a second contribution, the classifiers demonstrate
the ability of supervised machine learning algorithms to determine the polarity of
the text, which can be a negative value expressing dissatisfaction or a positive
value expressing satisfaction, in order to describe a person’s feeling towards a
product, service, or current state.
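To make the two selection rules concrete, the Python sketch below contrasts roulette wheel probabilities with a generic UCB1 rule for choosing among food sources. It is an assumption-laden illustration, not the modified ABC proposed in this chapter: the function names, the reward bookkeeping, and the exploration constant are hypothetical, and this section does not specify how UCB is wired into the bee phases.

```python
import math


def roulette_probabilities(fitness):
    """Selection probability of each food source, proportional to fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]


def ucb1_select(mean_reward, counts, total_trials, c=math.sqrt(2)):
    """Generic UCB1: pick the source with the highest upper confidence bound.

    mean_reward[i] is the average reward seen so far for source i and
    counts[i] is how many times it has been tried (both hypothetical
    bookkeeping introduced for this sketch).
    """
    # Try every source once before comparing bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + c * math.sqrt(math.log(total_trials) / n)
              for m, n in zip(mean_reward, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical fitness values of three food sources.
probs = roulette_probabilities([1.0, 3.0, 4.0])
print(probs)

# With equal trial counts, UCB1 favors the source with the best mean reward.
chosen = ucb1_select([0.9, 0.5, 0.7], [5, 5, 5], total_trials=15)
print(chosen)
```

The bonus term shrinks as a source is tried more often, so the rule concentrates on sources that have repeatedly paid off while still revisiting rarely tried ones occasionally.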
2 Related Works
2.1 Introduction
The research in [6] uses customer opinions in social media content to address the
Arabic sentiment analysis (SA) problem, analyzing the written text in order to
improve customer services and product quality. SA deals with massive data, and
feature selection is needed to reduce the high-dimensional space for machine
learning. The authors proposed an enhancement of a bio-inspired optimizer, the salp
swarm algorithm (SSA), designed for feature selection (FS) to solve the problem of
Arabic sentiment analysis. The method has two phases. The first phase reduces the
number of features by applying a filtering technique based on the information gain
metric [6].
The second phase applies a wrapper FS technique that combines the SSA optimizer
with four variants of S-shaped transfer functions and uses KNN for
classification. Experimental results show that the classification accuracy of
SSA combined with the S-shaped transfer functions outperformed the particle
swarm optimizer and the grey wolf optimizer [6].
A semi-supervised approach to sentiment analysis for Arabic and its dialects is
proposed in [7]. The method is made up of deep learning algorithms and classifies
Arabic text by detecting its polarity (positive, negative) on a sentiment
corpus. The approach is applied to Facebook (FB) text messages written in
Modern Standard Arabic (MSA) and in the Algerian dialect (DALG), for two
scripts, Arabic and Arabizi. There are two options for handling Arabizi,
translation and transliteration. The experiments were done on many test corpora
dedicated to DALG/MSA, with classifiers such as Logistic Regression (LR),
Random Forest (RF), Long Short-Term Memory (LSTM), and Convolutional Neural
Network (CNN). The classifiers are combined with fastText and Word2vec.
Experimental results show an F1 score of up to 95%, and up to 89% for the
extrinsic experiments [7].
An optimization algorithm is an important way to perform feature selection,
which matters in the classification of high-dimensional text because it selects
a set of optimal features that reduces computation and cost and improves the
accuracy of text classification. A feature selection method based on normalized
difference measure and binary Jaya optimization algorithm (NDM-BJO) is evaluated
using Support Vector Machine and Naive Bayes classifiers to find the error rate.
The results show that the NDM-BJO model gives improvements [8].
Text classification is a difficult mathematical task in machine learning because
of the large increase in natural language text documents. Feature selection is
the basis of the process, because thousands of feature sets are possible for
classifying a text. The model proposed in [9] is an enhanced binary grey wolf
optimizer (GWO) used within a wrapper FS approach to address Arabic text
classification problems. The wrapper-based feature selection is combined with
various learning models, namely Naive Bayes, K-nearest neighbor, and SVM
classifiers, and trained on data from three public Arabic datasets: Gulf News,
Al Watan, and Al Jazeera News. Results and analysis show that the SVM-based
feature selection technique with the proposed binary GWO optimizer and an
elite-based crossover scheme has enhanced efficacy in dealing with Arabic text
classification problems compared to other peers [9].
Choosing efficient features from datasets is important in artificial
intelligence, pattern recognition, text classification, and data mining. Feature
selection (FS) can exclude features that are not relevant to the classification
task and reduce the dimensions of datasets, which helps us understand the data
better. With feature selection, the performance of machine learning techniques
is optimized and computational requirements are reduced. So far, a large number
of feature selection methods have been suggested, but no single most practical
method has been identified.
Although it is conceivable that different classes of feature selection methods
follow various criteria to evaluate the variables, few studies have focused on
evaluating the different classes of feature selection methods. In [10], thirteen
feature selection methods under five different categories are assessed, with a
focus on comparing the general versatility and effectiveness of these methods.
The thirteen feature selection methods are ranked using the rank aggregation
method. The five best FS methods are then chosen to perform multi-class
classification with an SVM classifier. Different numbers of selected features,
different languages, and different performance measures are used to measure the
general versatility and validity of these methods combined. The analysis results
indicate that the Mahalanobis distance is the best approach [10].
Many different techniques are used to identify offensive speech in the media and
in the Twitter community. The research in [11] uses neural networks (NN) for
classification. To participate in OffensEval, Task 12 of the SemEval 2020
workshop, a C-BiGRU model composed of a CNN and a bidirectional RNN is used to
identify offensive speech. A multidimensional numerical representation of each
word is obtained using fastText, and the model is trained on a dataset of
labeled tweets to detect words that have an offensive meaning. The model is used
for English, Turkish, and Danish, achieving F1-scores of 90.88%, 76.76%, and
76.70%, respectively [11].
The emotional state of clients needs to be understood through sentiment analysis
techniques in natural language processing. To analyze the Chinese language, the
authors of [12] propose an LSTM-based Chinese text sentiment analysis model with
Bi-GRU and an attention mechanism. This model works on deep properties of the
text and merges context to learn text properties with greater precision. A
multi-head self-attention model is then used to reduce external transactions,
determine word weights, and distinguish the distinct text. The experiment
achieves 87.1% accuracy [12].
Cyberbullying is a problem with real victims, and as the use of the Internet
increases, more cyberbullying results. Classification studies on bullying in
Arabic and English have been done. The paper in [13] suggests using RNN
algorithms with pre-trained word embeddings in a set of experiments on a channel
news comments dataset, reaching an F1 score of 0.84 [13].
The exploitation problem appears predominantly in the ABC algorithm, which was
inspired by swarms of honeybees and has been applied to many problems. To
improve exploitation in ABC algorithms, the paper in [14] proposes a chaotic ABC
with an elite opposition-based learning strategy. The outcome is an improved
exploitation ability. Furthermore, the elite opposition is utilized to best
exploit the potential of the available solutions. The results are compared with
several artificial bee colony algorithms [14].
The work in [15] contributes to sentiment analysis in natural language
processing, which is concerned with classifying the polarity of text and is
motivated by the urgent need to understand data about opinions, feelings,
emotions, and evaluations. This work aims to implement a sentiment analysis
system that identifies and understands semantics without linguistic resources.
The proposed model is examined on detecting polarity, positive or negative [15].
Feature selection is very important for classification: it enhances
classification performance, removes redundant features, and reduces
computational time. A new standard error-based artificial bee colony algorithm
is proposed for the feature selection problem, developed by incorporating new
standard error-based solution search mechanisms. Thirteen machine learning
datasets are used, together with the SVM and KNN classification algorithms [16].
The multi-objective artificial bee colony-based feature weighting technique for
naïve Bayes (MOABC-FWNB) considers the feature-feature relationship (redundancy)
and the feature-class relationship (relevancy) independently and uses them to
determine the weights of the features for the Naïve Bayes (NB) classifier. An
experimental study was conducted on 20 benchmark UCI datasets [17] (Table 1).
The literature reviewed is related to extracting value from text, where the text
value is used in diverse ways: for classification, for analyzing a feeling, or
for extracting a certain value. This research addresses the problem of sentiment
analysis in Arabic text. To cover the gaps, this paper identifies a subset of
optimal features by modifying the artificial bee colony algorithm and then
employs this subset of features in classification with supervised machine
learning, in order to build an integrated application that supports prediction
and analyzes human feeling from the value of a text.
3 The Proposed Method
3.1 Introduction
This chapter presents the procedures and implementation of the experiments and
how the results of the proposed models are obtained. This paper aims to find the
minimum number of optimal features that affect the value of a text, using the
enhanced ABC-UBC designed for feature selection, which is described in detail in
Sect. 3.5. The selected features are then applied with a wrapper technique in
the classification stage to improve the accuracy of the text classifiers and so
address the problem of Arabic sentiment analysis. The proposed model is examined
on two datasets: (1) the Jordanian dialect sentiment corpus and (2) the Algerian
dialect sentiment corpus. The datasets are divided into 80% training and 20%
test.
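The 80/20 division can be illustrated with a short sketch. This is only an illustration under stated assumptions: the chapter does not say how the split was drawn, so the shuffling, the fixed seed, and the function name `split_80_20` are hypothetical choices.

```python
import random

def split_80_20(samples, labels, test_ratio=0.2, seed=0):
    """Shuffle the indices once, then cut them into test and training parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)          # assumed: random, seeded split
    n_test = int(round(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Example with the size of the Jordanian corpus (3000 notes).
notes = [f"note {i}" for i in range(3000)]
polarity = [1 if i % 2 == 0 else 2 for i in range(3000)]
x_train, x_test, y_train, y_test = split_80_20(notes, polarity)
```

With 3000 notes this yields 2400 training and 600 test instances.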
The learning phase depends on the optimal set of features obtained in the
preceding phase. This set is used by the classifier algorithms, which are
trained on the training dataset and then predict the labels. The proposed model
is evaluated in comparison with widely used classification techniques. The
pre-processing steps applied to the dataset are also discussed in this chapter.
The entire experiment was designed and implemented in Python. Python 3.8,
Spyder 3, and Jupyter Notebook server 6.1.4 were used to import the dataset and
Table 1 Related studies summary

Author | Method | Dataset | Research title | Summary
[6] | Salp swarm algorithm | Benchmark dataset of Arabic tweets | Arabic sentiment analysis based on salp swarm algorithm with S-shaped transfer functions | Average classification accuracy rate of 80.08%; PSO came next with an average classification accuracy rate of 80.06%
[7] | Word2vec | Algerian dialect corpus | A semi-supervised approach for sentiment analysis of Arab(ic+izi) messages: application to the Algerian dialect | The best results obtained are up to 80.58% (F1 score)
[8] | Hybrid feature selection method based on normalized difference measure and binary Jaya optimization algorithm (NDM-BJO) | 10 News group text corpus | Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification | Accuracy of 92.5% (5648 features) and 97.8% (300 features); SVM, NB
[9] | Enhanced binary grey wolf optimizer | Corpus of Arabic texts | Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text classification | The best results obtained with SVM are up to 96% for F-measure
[10] | Machine learning techniques | Corpus of English novels | Comparing multiple categories of feature selection methods for text classification | The macro-averaged F-measures are 0.93, 0.94, 0.89, and 0.90; the kappa coefficients are 0.93, 0.94, 0.88, with the increase in the number of selected features
[11] | Neural network model, representation embedding | Tweet dataset | NLP_Passau at SemEval-2020 task 12: multilingual neural network for offensive language detection in English, Danish and Turkish | CNN, NN; F1 scores up to 90.88%, 76.76%
[12] | Neural network models | Chinese text | An intelligent CNN-BiLSTM approach for Chinese sentiment analysis on Spark | Model (CNN-BiLSTM); the experiment gets 87.1% accuracy
[13] | Neural network models | Arabic channel news comments dataset | Classification of cyberbullying text in Arabic | F1 scores up to 84%
[14] | ABC algorithm with elite opposition-based learning strategy (EOABC) | Benchmark test functions | A survey on the studies employing machine learning (ML) for enhancing artificial bee colony (ABC) optimization algorithm | Presented a survey on studies of improving the ABC using ML
[15] | Neural network models | Corpus of Arabic texts | Deep attention-based review level sentiment analysis for Arabic reviews | Using deep learning ANN
[16] | New standard error-based artificial bee colony (SEABC) algorithm | Thirteen datasets from the UCI machine learning datasets | A new standard error based artificial bee colony algorithm and its applications in feature selection | Using artificial bee colony algorithm
[17] | Multi-objective artificial bee colony-based feature weighting technique for naïve Bayes | Twenty benchmark UCI datasets | Feature weighting for naïve Bayes using multi objective artificial bee colony algorithm | Using multi objective artificial bee colony algorithm for feature weighting
evaluate and compare the results. CountVectorizer breaks a sentence, paragraph,
or any other text into words and then converts the words into a
multi-dimensional matrix, which serves as training data for the classifiers and
for forward feature selection with the machine learning algorithms. The
experiments ran on Windows 10 20H2 with an Intel(R) Core(TM) i7-3520M processor
and 12 GB of RAM.
This paper uses two datasets, as shown in Fig. 1. The first is the Jordanian
dialect sentiment corpus, 3000 notes written specifically in the Jordanian
Arabic dialect and collected from different telecommunication companies. The
dataset was collected from
Fig. 1 The proposed model

Table 2 The characteristics of the Jordanian dialect sentiment corpus

Number of instances 3000
Number of positive notes 1116
Number of negative 1884
Topics Reviews and feedback from customer’s notes
Language Jordanian dialects (AD)
Annotation Manual (by expert native speakers)
Predicted attribute Class of opinion polarity (positive, negative)
Count of words 1631
Count of stem words 847
Jordanian telecom company notes written by call center employees during
customers' calls with the call center. Call center employees summarize the calls
that they receive as notes. The characteristics of the dataset are given in
Table 2.
The second dataset, the Algerian dialect sentiment corpus, consists of articles
on politics, news, sports, religion, and society selected from Algerian Arabic
newspaper websites. The characteristics of the dataset are given in Table 3.
3.3 Data Annotation
Each dataset has been divided into two categories, positive and negative, and
has been annotated by a group of experts. Each category is linked with a number
to facilitate the classification process: 1 for positive and 2 for negative.
Table 4 shows a sample of the Jordanian dialect sentiment corpus, and Table 5
shows a sample of the Algerian dialect sentiment corpus.
Table 3 The characteristics of the Algerian dialect sentiment corpus

Number of instances 5630
Number of positive notes 3046
Number of negative 2584
Topics Articles extracted from news, political, religion,
sports, and society
Language Algerian, Arabic Dialects (AD)
Modern Standard Arabic (MSA)
Annotation Manual (by expert native speakers)
Predicted attribute Class of opinion polarity (positive, negative)
Count of words 9468
Count of stem words 3848
‫‪Table 4 Dataset example‬‬

‫‪Note‬‬ ‫‪Polarity‬‬
‫ﺍﻟﺘﻐﻄﻴﺔ ﺻﺎﺭﺕ ﻣﻤﺘﺎﺯﺓ ﺑﻀﺎﺣﻴﺔ ﺍﻟﺮﺷﻴﺪ‬ ‫‪1‬‬
‫ﺳﺮﻋﺔ ﺍﻻﻧﺘﺮﻧﺖ ﺻﺎﺭﺕ ﻣﻨﻴﺤﺔ ﺑﺘﻼﻉ ﺍﻟﻌﻠﻲ‬ ‫‪1‬‬
‫ﺍﻟﻌﺮﻭﺽ ﺍﻟﺠﺪﻳﺪﺓ ﻣﺸﺠﻌﺔ‬ ‫‪1‬‬
‫ﺍﻟﺘﻄﺒﻴﻖ ﺳﻬﻞ ﻋﻠﻴﻨﺎ ﻛﺜﻴﺮ‬ ‫‪1‬‬
‫ﻋﻢ ﺑﺤﺎﻭﻝ ﺍﺣﻮﻝ ﺗﻌﺮﻓﺖ ﺧﻄﻲ ﻟﺨﻂ ﺍﻟﻜﻞ ﺑﻠﻜﻞ ﻭ ﺑﻌﻄﻴﻨﻲ ﻳﺮﺟﻰ ﺍﻟﻤﺤﺎﻭﻟﺔ ﻻﺣﻘﺎ‬ ‫‪2‬‬
‫ﺍﻟﻨﺖ ﺑﻀﻞ ﻳﻔﺼﻞ ﻣﻊ ﺍﻧﻮ ﻣﻌﻲ ﺣﺰﻡ ﻛﺜﻴﺮ‬ ‫‪2‬‬
‫ﺑﺮﻥ ﻋﻠﻲ ﺭﻗﻢ ﺧﺎﺹ ﻭ ﺑﻀﻞ ﻳﺰﻋﺠﻨﻲ‬ ‫‪2‬‬
‫ﺣﻜﻴﺖ ﻣﻌﻜﻢ ﻛﺘﻴﺮ ﻭ ﻫﺎﻱ ﺍﻟﻤﻜﺎﻟﻤﺔ ﺍﻟﻌﺸﺮﻳﻦ ﻭ ﻣﺎ ﺣﺪﺍ ﺣﻠﻠﻲ ﻣﺸﻜﻠﺘﻲ‬ ‫‪2‬‬
‫‪Table 5 Dataset example‬‬

‫‪Articles‬‬ ‫‪Polarity‬‬
‫ﺷﻲء ﻋﺠﻴﺐ ﻭ ﺍﻟﻠﻪ ﺍﻥ ﻳﻜﻮﻥ ﻣﻨﺎﻇﻞ ﻛﺒﻴﺮ ﻛﻤﺎ ﻳﻘﺎﻝ ﻭ ﺭﺋﻴﺲ ﺣﻜﻮﻣﺔ ﻳﺠﻬﻞ ﻣﻜﺎﻧﺔ ﺍﻟﺸﻴﺦ ﺍﻻﺑﺮﺍﻫﻤﻲ ﻓﻲ ﺍﻟﻌﺎﻟﻢ‬ ‫‪2‬‬
‫ﺍﻻﺳﻼﻣﻲ ﻭ ﻣﺎ ﻗﺪﻣﻪ ﻟﻠﺜﻮﺭﺓ ﺍﻟﺠﺰﺍﺋﺮﻳﺔ ﻭ ﻛﻼﻣﻪ ﻭ ﺑﻴﺎﻧﺎﺗﻪ ﻣﻌﺮﻭﻓﺔ ﻭ ﻣﻨﺸﻮﺭﺓ ﺑﺎﻣﻜﺎﻥ ﺍﻱ ﺍﻧﺴﺎﻥ ﺍﻻﻃﻼﻉ ﻋﻠﻴﻬﺎ ﻭ‬
‫ﻋﻠﻰ ﺗﺎﺭﻳﺦ ﺍﺻﺪﺍﺭﻫﺎ ﻭ ﻗﺪ ﺍﺳﺘﻐﻞ ﺍﺣﺪ ﺍﻟﺼﺤﺎﻓﻴﻴﻦ ﺍﻟﺘﺎﻓﻬﻴﻦ ﻛﻼﻡ ﺍﻟﺴﻴﺪ ﺑﻠﻌﻴﺪ ﻭ ﺑﺪﺃ ﻳﻠﻮﻙ ﻛﻼﻡ ﺍﻟﺘﺸﻔﻲ ﻭ ﺍﻻﻧﺘﻘﺎﺹ‬
‫ﻟﻼﻣﺎﻡ ﺍﻟﺒﺸﻴﺮ ﺍﻻﺑﺮﺍﻫﻤﻲ ﻫﺬﺍ ﺍﻻﻣﺎﻡ ﺍﻟﺬﻱ ﻛﺎﻧﺖ ﺗﺘﺠﻨﺒﻪ ﻋﻴﻮﻥ ﺍﻟﻌﻠﻤﺎﺀ ﺍﻣﺜﺎﻝ ﺍﻟﻌﻘﺎﺩ ﻭ ﻃﻪ ﺣﺴﻴﻦ ﻓﻲ ﺍﺭﻭﻗﺔ ﻣﺠﻤﻊ‬
‫ﺍﻟﻠﻐﺔ ﺍﻟﻌﺮﺑﻴﺔ ﺑﺎﻟﻘﺎﻫﺮﺓ ﺗﻘﺪﻳﺮﺍ ﻭ ﺍﺣﺘﺮﺍﻣﺎ ﻟﻌﻠﻤﻪ ﺍﻟﻜﺒﻴﺮ ﻭ ﺗﺒﺤﺮﻩ ﻻ ﻣﺘﻨﺎﻫﻲ ﻓﻲ ﺍﻟﻠﻐﺔ ﻭ ﺍﻻﺩﺏ ﻳﻬﺎﻥ ﻫﺬﻩ ﺍﻟﻤﺮﺓ ﻋﻠﻰ‬
‫ﺍﻳﺪﻱ ﺍﻃﻔﺎﻝ ﻓﻲ ﺍﻟﻌﻠﻢ ﻭ ﺍﻟﻔﻜﺮ ﻛﻢ ﺗﻤﻨﻴﺖ ﻟﻮ ﻛﺎﻥ ﺍﻻﺑﺮﺍﻫﻤﻲ ﻣﺼﺮﻳﺎ ﻟﺮﺃﻳﻨﺎ ﺍﻟﻌﺠﺐ ﺍﻟﻌﺠﺎﺏ ﻓﻲ ﺗﻘﺪﻳﺮﻩ ﻭ ﺍﺣﺘﺮﺍﻣﻪ ﻭ‬
‫ﺭﺑﻤﺎ ﻟﻘﺐ ﻣﻦ ﻃﺮﻑ ﺍﻟﻤﺼﺮﻳﻴﻦ ﺑﻤﻠﻚ ﺍﻟﺒﻴﺎﻥ ﺍﻟﻌﺮﺑﻲ ﻭ ﺟﻌﻠﻮﺍ ﻟﻪ ﺗﻤﺜﺎﻻ ﻳﻨﺎﻓﺲ ﺗﻤﺜﺎﻝ ﻃﻪ ﺣﺴﻴﻦ ﻓﻲ ﺍﻟﺠﺎﻣﻌﺔ‬
‫ﺍﻟﻤﺼﺮﻳﺔ ﻟﻜﻦ ﻋﻨﺪﻧﺎ ﺣﻴﺚ ﺍﻟﺠﻬﻞ ﻭ ﺍﻟﺮﻛﺎﻛﺔ ﺍﻟﻠﻐﻮﻳﺔ ﻻ ﺑﺪ ﻣﻦ ﺍﻧﺘﻘﺎﺹ ﻣﻦ ﻗﻴﻤﺘﻪ ﻭ ﺟﻬﺎﺩﻩ ﻭ ﻣﻦ ﺧﻼﻝ ﺗﺘﺒﻌﻲ‬
‫ﻟﻤﺴﻴﺮﺓ ﺗﺎﺭﻳﺨﻨﺎ ﺍﻟﻤﻌﺎﺻﺮ ﻻﺣﻈﺖ ﺍﻥ ﻣﻌﻈﻢ ﻣﺴﺆﻭﻟﻴﻨﺎ ﺭﻣﺖ ﺑﻬﻢ ﺍﻟﺼﺪﻓﺔ ﺍﻟﻰ ﻭﺍﺟﻬﺔ ﺍﻟﺤﻜﻢ ﻭ ﻟﻴﺲ ﻛﻤﺎ ﻫﻮ ﻋﻨﺪ‬
‫ﻏﻴﺮﻧﺎ ﺣﻴﺚ ﺍﻟﻜﻔﺎءﺔ ﻭ ﺍﻟﻌﻠﻢ ﻭ ﺍﻟﻨﺰﺍﻫﺔ ﻫﻲ ﻣﻴﺰﺍﻥ ﺍﻻﺧﺘﻴﺎﺭ ﻭ ﻟﻠﺤﻘﻴﻘﺔ ﺍﻥ ﻣﻌﻈﻢ ﻭﺯﺭﺍﺀ ﺍﻟﺴﺒﻌﻴﻨﺎﺕ ﻛﺎﻧﻮﺍ ﻇﻞ ﻟﻼﺥ‬
‫ﺑﻮﻣﺪﻳﻦ ﻓﻬﻮ ﻣﻦ ﻗﺎﻡ ﺑﻜﻞ ﺷﻲء ﻓﻲ ﺍﻟﻤﺠﺎﻝ ﺍﻟﺴﻴﺎﺳﻲ ﻭ ﺍﻻﻗﺘﺼﺎﺩﻱ ﻭ ﺍﻻﺟﺘﻤﺎﻋﻲ ﻭ ﻫﻢ ﻣﺠﺮﺩ ﺩﻣﻰ ﻣﺘﺤﺮﻛﺔ‬
‫ﻻﻭﺍﻣﺮﻩ ﺍﻟﻨﺎﻓﺬﺓ ﻭ ﺍﻟﺪﻟﻴﻞ ﺍﻥ ﻫﺆﻻﺀ ﻋﻨﺪﻣﺎ ﺭﺟﻌﻮﺍ ﻟﻠﺤﻜﻢ ﻣﺮﺓ ﻟﻢ ﻳﻘﺪﻣﻮﺍ ﺷﻲء ﻳﺬﻛﺮ ﻭ ﺍﻻﺥ ﺑﻠﻌﻴﺪ ﺍﻟﺬﻱ ﻟﻘﺐ ﺑﺄﺏ‬
‫ﺍﻟﺼﻨﺎﻋﺔ ﺍﻟﺜﻘﻴﻠﺔ ﻛﺎﻥ ﻣﻦ ﺍﻟﻤﻔﺮﻭﺽ ﺍﻥ ﻳﻨﺴﺐ ﻟﺒﻮﻣﺪﻳﻦ ﻓﻬﻮ ﺻﺎﺣﺐ ﺍﻟﻔﻜﺮﺓ ﻭ ﺍﻟﻔﻀﻞ ﻟﻢ ﻳﺴﺘﻄﻊ ﺍﻧﻘﺎﺫ ﻣﺼﻨﻊ ﻭﺍﺣﺪ‬
‫ﺻﻐﻴﺮ ﻭ ﺍﺿﻄﺮ ﻟﺒﻴﻌﻪ ﻟﻠﺨﻮﺍﺹ ﻻ ﺑﺪ ﻣﻦ ﺍﻋﺎﺩﺓ ﻗﺮﺍءﺔ ﺗﺎﺭﻳﺨﻨﺎ ﺑﻌﻴﻮﻥ ﻧﺎﻗﺪﺓ ﻭ ﻭﺍﻋﻴﺔ ﺗﻌﺘﻤﺪ ﻓﻘﻂ ﻋﻠﻰ ﺍﻟﺤﻘﺎﺋﻖ‬
‫ﺍﻟﺪﺍﻣﻐﺔ ﺍﻟﺘﻲ ﺗﺴﻨﺪﻫﺎ ﺍﻟﻮﺛﺎﺋﻖ ﻭ ﻟﻴﺲ ﻋﻠﻰ ﺍﻻﺑﺎﻃﻴﻞ‬
‫ﻛﻠﻤﺎ ﺃﻃﺎﻝ ﺍﻟﻠﻪ ﻓﻲ ﻋﻤﺮﻱ ﺃﺗﺄﻛﺪ ﺑﻤﺎ ﻻ ﻳﺪﻉ ﻣﺠﺎﻻ ﻟﻠﺸﻚ ﺃﻥ ﺟﻞ ﻣﻦ ﻗﺎﺩﻭﺍ ﺍﻟﺠﺰﺍﺋﺮ ﺑﻌﺪ ﺍﻻﺳﺘﻘﻼﻝ ﺍﻟﻰ ﻳﻮﻣﻨﺎ ﻫﺬﺍ ﻫﻢ‬ ‫‪2‬‬
‫ﺃﻗﺮﺏ ﺍﻟﻰ ﺍﻟﺠﻬﻞ ﺍﻟﻤﺮﻛﺐ ﺃﻭ ﺍﻟﻌﻤﺎﻟﺔ ﺑﻞ ﺍﻻﻧﺒﻄﺎﺡ ﺍﻟﻰ ﻓﺮﻧﺴﺎ ﺍﻻﺳﺘﻌﻤﺎﺭﻳﺔ ﻓﻼ ﺍﺗﺼﻮﺭ ﻛﻴﻒ ﻟﻤﺠﺎﻫﺪ ‪ -‬ﻛﻤﺎ ﻳﻘﻮﻟﻮﻥ‬
‫‪ -‬ﻭﺭﺋﻴﺲ ﺣﻜﻮﻣﺔ ﻟﻠﺪﻭﻟﺔ ﺍﻟﺠﺰﺍﺋﺮﻳﺔ ﺍﻟﻤﺴﺘﻘﻠﺔ ﻳﺘﺴﻢ ﺑﻬﺬﺍ ﺍﻟﺠﻬﻞ ﺍﻟﻤﺮﻛﺐ ﻭﻧﻠﻮﻡ ﺍﻟﺤﺮﺍﻗﻴﻦ ﻭﻧﻜﻔﺮ ﺍﻟﻤﻨﺘﺤﺮﻳﻦ ﻭﻧﺴﺠﻦ‬
‫ﺃﺻﺤﺎﺏ ﺍﻟﺮﺃﻱ ﺍﻵﺧﺮ ﺃﻟﻴﺲ ﻣﺎ ﻧﺤﻦ ﻋﻠﻴﻪ ﺍﻵﻥ ﻫﻮ ﺛﻤﺮﺓ ﻣﺎ ﺯﺭﻋﻪ ﺃﻣﺜﺎﻝ ﻫﺆﻻﺀ ﺍﻟﻤﺘﺨﻠﻔﻮﻥ ﻋﻘﻠﻴﺎ ﻭﺍﻟﻤﻨﺒﻄﺤﻮﻥ ﻣﻨﺬ‬
‫ﺯﻣﻦ ﻭﺍﻟﺸﻴﺎﺗﻮﻥ ﺣﺎﻟﻴﺎ ﻟﻚ ﺍﻟﻠﻪ ﻳﺎﺟﺰﺍﺋﺮ‬
‫ﻣﻨﺎﻓﻘﻮﻥ ﺑﻼ ﻋﻨﻮﺍﻥ ﻫﺆﻻﺀ ﻻ ﻳﺴﺘﺤﻮﻥ ﻣﺎﺯﺍﻟﻮﺍ ﻳﺴﺘﺒﻐﻠﻮﻥ ﺍﻟﺸﻌﺐ ﻭﻳﻜﺬﺑﻮﻥ ﻋﻠﻴﻪ ﺍﻟﺮﺟﻞ ﻓﻲ ﺣﺎﻟﺔ ﻭﺍﻟﻨﺎﺯﻋﺎﺕ ﻏﺮﻗﺎ‬ ‫‪2‬‬
‫ﻳﺮﻛﺰﻭﻥ ﻋﻠﻰ ﺇﻇﻬﺎﺭ ﺻﻮﺭﺗﻪ ﻟﻠﺸﻌﺐ ﻭﻛﺄﻧﻪ ﻫﻮ ﻣﻦ ﻳﺤﻜﻢ ﻭﻳﺪﻳﺮ ﺷﺆﻭﻥ ﺍﻟﺒﻠﺪ ﻭﺍﻟﻠﻪ ﻻ ﺗﺴﺘﺤﻮﻥ ﻋﻠﻰ ﺍﺭﻭﺍﺣﻜﻢ‬
‫ﻭﺍﻟﻠﻪ ﻟﻮ ﻛﻨﺎ ﻓﻲ ﺩﻭﻟﺔ ﺍﻟﻌﺪﺍﻟﺔ ﺍﻟﻤﺴﺘﻘﻠﺔ ﻭﺩﻭﻟﺔ ﺍﻟﺤﻖ ﻭﺍﻟﻘﺎﻧﻮﻥ ﻟﺤﻮﻛﻢ ﻫﺆﻻﺀ ﻋﻠﻰ ﻣﺒﻠﻎ ‪ 800‬ﻣﻠﻴﺎﺭ ﺩﻭﻻﺭ ﺃﻳﻦ ﺫﻫﺒﺖ‬
‫ﻭﺃﻳﻦ ﺻﺮﻓﺖ ﻣﺎﺩﻣﻨﺎ ﻧﺨﺎﻑ ﻣﻦ ﻇﻠﻨﺎ ﺗﺠﻤﻌﻨﺎ ﺍﻟﺰﺭﻧﺔ ﻋﻨﺪ ﺃﻫﻞ ﺍﻟﺸﺮﻕ ﻭﺗﻔﺮﻗﻨﺎ ﻫﺮﺍﻭﺓ ﺍﻟﺪﺭﻛﻲ ﻓﻠﻦ ﻳﺘﻐﻴﺮ ﺣﺎﻟﻨﺎ‬
‫ﻗﺪ ﺗﻜﻮﻥ ﻣﻌﻲ ﻭﻗﺪ ﺗﻜﻮﻥ ﺿﺪﻱ ﻓﻴﻤﺎ ﺃﻗﻮﻟﻪ ﻳﺎ ﺳﻴﺪ ﺳﻌﺪ ﺑﻮﻋﻘﺒﺔ ﻫﺆﻻﺀ ﺍﻟﻤﻨﺎﺿﻠﻴﻦ ﻳﺘﺮﺑﺼﻮﻥ ﺍﻟﻔﺮﺻﺔ ﻟﻠﺬﻫﺎﺏ ﺑﻌﻴﺪ‬ ‫‪1‬‬
‫ﻭﻫﻢ ﻋﻠﻰ ﻋﻠﻢ ﺑﺄﻧﻬﻢ ﻻﻳﺴﺘﻄﻴﻌﻮﻥ ﺗﺄﺩﻳﺔ ﺭﺑﻊ ﺍﻟﻤﻬﺎﻡ ﺍﻟﺘﻲ ﺗﺴﻨﺪ ﺍﻟﻴﻬﻢ ﻣﺜﻞ ﻣﻦ ﻛﺎﻧﻮﺍ ﻳﺼﻔﻘﻮﻥ ﻟﻪ ﻭﻫﻮ ﻳﻘﺬﻑ‬
‫ﺍﻟﻤﻨﺎﺿﻠﻴﻦ ﺍﻷﻗﺤﺎﺡ ﻭﻳﺰﻏﺮﺩﻭﻥ ﻭﻳﺼﻔﻘﻮﻥ ﻟﻮ ﻛﺎﻧﻮﺍ ﻓﻌﻼ ﻣﻨﺎﺿﻠﻴﻦ ﻣﻦ ﺃﺟﻞ ﺍﻟﺒﻼﺩ ﻭﺍﻟﻌﺒﺎﺩ ﻛﺎﻥ ﺍﻷﺟﺪﺭ ﺑﻬﻢ ﺃﻥ‬
‫ﻳﺴﺘﻘﻴﻠﻮﺍ ﻣﻨﺎ ﻣﻨﺎﺻﺒﻬﻢ ﻷﻧﻬﻢ ﻏﻴﺮ ﻣﻘﺘﻨﻌﻴﻦ ﺑﻤﺎ ﺣﺪﺙ ﺍﻟﺬﻳﻦ ﺻﻔﻘﻮﺍ ﻟﺴﻴﺪﻫﻢ ﺳﻴﺼﻔﻘﻮﻥ ﻟﺴﻴﺪ ﺃﺧﺮ ﻓﻘﻂ ﺃﻏﺎﺿﻬﻢ‬
‫ﺳﻴﺪﻫﻢ ﺍﻷﻭﻝ ﻷﻧﻬﻢ ﻛﺎﻧﻮﺍ ﻳﺘﻤﻨﻮﻥ ﺃﻥ ﻳﻜﻮﻧﻮﺍ ﻣﺸﺮﻋﻴﻦ ﻓﻲ ﺍﻟﺒﺮﻟﻤﺎﻥ ﺷﻜﺮ ﺍﻷﺥ ﺳﻌﺪ ﺃﻋﺎﻧﻚ ﺍﻟﻠﻪ ﻟﻠﺮﺩ ﻋﻠﻰ ﻣﻨﺎﺿﻠﻴﻦ‬
‫ﻳﻔﻜﺮﻭﻥ ﻓﻘﻂ ﻓﻲ ﺃﻧﻔﺴﻬﻢ‬
‫ﺍﺭﻳﺪ ﺍﻥ ﺍﻧﺒﻪ ﺍﻟﺴﻴﺪﺓ ﻧﺠﻼﻭﻯ ﺍﻥ ﺍﻟﻤﻘﺎﻝ ﺍﻟﺬﻯ ﺗﻨﺒﺎ ﻓﻴﻪ ﺍﻟﺴﻴﺪ ﺑﻮﻋﻘﺒﺔ ﺑﺮﺣﻴﻞ ﺍﻟﺴﻴﺪ ﺳﻌﺪﺍﻧﻰ ﻛﺎﻥ ‪ 05‬ﺍﻳﺎﻡ ﻗﺒﻞ ﺣﺪﻭﺙ‬ ‫‪1‬‬
‫ﺍﻟﺤﺪﺙ ‪ -‬ﻭﻫﺬﺍ ﻳﺪﻝ ﻋﻠﻰ ﺍﻥ ﺍﻟﺴﻴﺪ ﺑﻮﻋﻘﺒﺔ ﻫﻮ ﺍﻛﺒﺮ ﻣﻦ ﺍﻥ ﻳﺘﻨﺎﻭﻟﻪ ﺍﺣﺪ ﺑﺴﻮﺀ ﻫﻮ ﻓﻰ ﺧﺪﻣﺔ ﺍﻟﻘﺮﺍﺀ ﻭﻟﻮ ﻛﺎﻥ ﻳﺮﻳﺪ‬
‫ﺍﻟﺠﺎﻩ ﻭﺍﻟﻤﺎﻝ ﻟﻜﺎﻥ ﻟﻪ ﺫﻟﻚ ﻣﻨﺬ ﺯﻣﻦ ﻃﻮﻳﻞ ﻭﻫﻮ ﻋﻨﺪﻣﺎ ﺗﺤﺪﺙ ﻋﻦ ﺍﻟﺴﻴﺪ ﺳﻌﺪﺍﻧﻰ ﻓﺎﻧﻤﺎ ﻋﺒﺮ ﻋﻤﺎ ﻳﺪﻭﺭ ﻓﻰ ﻣﺨﻴﻠﺔ‬
‫ﺍﻟﻘﺮﺍﺀ ﻭﺍﻗﻮﻝ ﻟﻠﺴﻴﺪﺓ ﺍﻟﻔﺎﺿﻠﺔ ﺍﻥ ﺣﺰﺏ ﺟﺒﻬﺔ ﺍﻟﺘﺤﺮﻳﺮ ﻳﻌﺞ ﺑﺎﻟﻤﻨﺎﺿﻠﻴﻦ ﺍﻻﻛﻔﺎﺀ ﻭﻋﻠﻴﻜﻢ ﻣﻦ ﺍﻻﻥ ﺗﺼﺤﻴﺢ‬
‫ﺍﻻﻭﺿﺎﻉ ﻭﺍﻥ ﺑﺪﺍ ﻟﻜﻢ ﺍﻻﻣﺮ ﻻ ﻳﺴﺘﺤﻖ ﺫﻟﻚ ﻓﺎﻧﺰﻟﻮﺍ ﺍﻟﻰ ﺍﻟﺸﺎﺭﻉ ﻭﺍﺳﻤﻌﻮﺍ ﺣﺪﻳﺚ ﺍﻟﺸﻌﺐ‬
The Algerian dataset contains articles that describe the human feeling in its
positive or negative state, as this paper needs long paragraphs to train the proposed
model.
3.4 Preprocessing
3.4.1 Tokenization
Tokenization is the process of converting text into tokens before transforming
it into vectors; it also makes it easier to filter out unnecessary tokens. For
example, a document can be split into paragraphs, or sentences into words. In
this case, tokenization splits sentences into words, as shown in the
pre-processing phase of Fig. 1. CAMeL Tools, a Python toolkit for Arabic natural
language processing (ANLP), is used to tokenize the words [18].
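The idea of splitting a sentence into word tokens can be sketched with the Python standard library alone. This is a simplified stand-in, not the CAMeL Tools tokenizer used in the chapter; the function name `tokenize` is hypothetical, and the regular expression simply keeps runs of Unicode word characters, which includes Arabic letters.

```python
import re

def tokenize(sentence):
    """Split a sentence into word tokens (runs of Unicode word characters)."""
    return re.findall(r"\w+", sentence)

# "\u0633\u0631\u0639\u0629 \u0627\u0644\u0646\u062A" is a two-word Arabic phrase
# written with escapes so that the example does not depend on text direction.
tokens = tokenize("\u0633\u0631\u0639\u0629 \u0627\u0644\u0646\u062A, ok!")
```

Punctuation is dropped as a side effect of keeping only word characters; the filtering described in the next subsection removes the remaining unwanted tokens.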
3.4.2 Text Pre-processing
The main task is to remove non-meaningful content, which is important for text
classification because it can reduce the error and raise the accuracy. Each file
of the corpus was subject to the following procedure, as shown in the
pre-processing phase of Fig. 1:
• Delete digits and punctuation marks.
• Delete all non-Arabic characters.
• Delete stop-words and non-useful words such as pronouns, articles, and
prepositions.
• Change the letter ى to ي.
• Change the letter ة to ه.
• Change the letters آ, إ, ؤ, ئ, and أ to ا.
• Delete characters that confuse the classification process [19].
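The cleaning steps listed above can be sketched as follows. This is a minimal illustration, not the chapter's code: the function name `preprocess` and the three-word stop list are hypothetical, and letters are written as Unicode escapes to avoid any ambiguity about the code points.

```python
import re

# Letter normalization taken from the list above.
NORMALIZE = str.maketrans({
    "\u0649": "\u064A",  # alef maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta   -> ha
    "\u0622": "\u0627",  # alef madda   -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0624": "\u0627",  # waw with hamza -> alef
    "\u0626": "\u0627",  # ya with hamza  -> alef
    "\u0623": "\u0627",  # alef with hamza above -> alef
})

# Anything that is not an Arabic letter (U+0621..U+064A) or whitespace:
# digits, punctuation, Latin characters, and so on.
NOT_ARABIC_LETTER = re.compile(r"[^\u0621-\u064A\s]")

# Hypothetical, tiny stop list (fi, min, ala), stored in normalized form.
STOP_WORDS = {w.translate(NORMALIZE)
              for w in ("\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649")}

def preprocess(text):
    """Return the cleaned, normalized tokens of one note or article."""
    text = NOT_ARABIC_LETTER.sub(" ", text)   # drop digits, punctuation, non-Arabic
    text = text.translate(NORMALIZE)          # unify letter variants
    return [w for w in text.split() if w not in STOP_WORDS]
```

Normalizing before the stop-word check means the stop list only needs one spelling of each word.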
3.4.3 Stemming
Stemming is implemented with CAMeL Tools for Arabic natural language processing
(ANLP) in Python, as shown in the pre-processing phase of Fig. 1. CAMeL Tools is
a collection of open-source utilities for dialect identification,
pre-processing, morphological modeling, sentiment analysis, and named entity
recognition, and it provides the functionality for stemming Arabic words [18].
Stemming is the process of reducing inflected words to one root or stem by
removing suffixes, prefixes, and infixes. One type of stemming is statistical
stemming [20].
Table 6 Count vectorizer example

Words Vector No
‫ﻳﻔﺼﻞ‬ 898
‫ﻳﻮﺟﺪ‬ 901
‫ﻳﻨﺨﺼﻢ‬ 899
3.4.4 Text to Numeric Data Representation
The CountVectorizer algorithm is applied to encode the presence of words, as in
the example in Table 6, and to calculate the matrix of numeric values for each
word within each review text, as shown in Fig. 2 [21].
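What this encoding produces can be shown with a small pure-Python sketch. It is a simplified stand-in for the CountVectorizer used in the chapter, written under the assumption that texts are already tokenized by whitespace; the function name `count_vectorize` is hypothetical. Each word receives a column index in sorted order, and each text becomes a row of word counts.

```python
def count_vectorize(documents):
    """Map each word to a column index and each document to a row of counts."""
    vocabulary = sorted({word for doc in documents for word in doc.split()})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for word in doc.split():
            row[index[word]] += 1     # count every occurrence of the word
        matrix.append(row)
    return index, matrix

index, matrix = count_vectorize(["good service", "bad service service"])
```

The column indices play the role of the "Vector No" values in Table 6, and the rows of `matrix` correspond to the numeric matrix of Fig. 2.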
3.4.5 Most Affective Jordanian Words
See Table 7.

Figure 3 shows how the pre-processing phase leads to the numeric data
representation produced by CountVectorizer.
3.5 Modified Artificial Bee Colony Algorithm with Upper Confidence Bound Algorithm
3.5.1 The Original Artificial Bee Colony Algorithm
Karaboga [22] defined swarm intelligence as “any attempt to design
algorithms or distributed problem-solving devices inspired by the collective
behavior of social insect colonies and other animal societies”. A honey bee
swarm shows a special intelligent behavior when foraging, and this foraging
behavior was used to
Fig. 2 Example of matrix of numeric value for words

Table 7 Most affective Jordanian dialect words in classifiers:

Features Words Score
Feature: 750 ‫ﻣﺎ‬ Score: 0.01007
Feature: 728 ‫ﻻ‬ Score: 0.28105
Feature: 626 ‫ﻋﺎﻟﺘﻄﺒﻴﻖ‬ Score: 0.01048
Feature: 605 ‫ﺻﺎﺭﺕ‬ Score: 0.01314
Feature: 598 ‫ﺷﻜﺮ‬ Score: 0.21708
Feature: 597 ‫ﺍﺷﺘﻜﻲ‬ Score: 0.05050
Feature: 306 ‫ﻏﺎﻟﻲ‬ Score: 0.01008
Feature: 153 ‫ﺧﻄﺄ‬ Score: 0.01989
Feature: 211 ‫ﻏﻴ ﺮ‬ Score: 0.00226
Feature: 223 ‫ﻣﺶ‬ Score: 0.00115
Feature: 246 ‫ﺯﺍﺑﻂ‬ Score: 0.00152
Feature: 261 ‫ﺟﻴﺪﺓ‬ Score: 0.00186
Feature: 272 ‫ﻣﺸﻜﻠﺔ‬ Score: 0.00426
Feature: 294 ‫ﺍﺭﺧﺺ‬ Score: 0.00265
Feature: 306 ‫ﺭﺻ ﻴ ﺪﻱ‬ Score: 0.01008
Feature: 315 ‫ﺻﻼﺣﻴﺔ‬ Score: 0.00160
Feature: 318 ‫ﺍﻟﻐﻴﻬﺎ‬ Score: 0.00114
Feature: 321 ‫ﻋﺎﻟﻤﺴﺆﻭﻝ‬ Score: 0.00152
Feature: 337 ‫ﺍﻭﻓﺮ‬ Score: 0.00488
Feature: 348 ‫ﺑﺎﻷﻗﺴﺎﻡ‬ Score: 0.00379
Feature: 401 ‫ﻋﺎﺳﺎﺱ‬ Score: 0.00371
Feature: 402 ‫ﺑﻀﻞ‬ Score: 0.00613
Feature: 403 ‫ﺑﻄﺊ‬ Score: 0.00006
Feature: 455 ‫ﺗﺘﻌﻠﻖ‬ Score: 0.00267
Feature: 467 ‫ﻣﻀﻄﺮ‬ Score: 0.00335
Feature: 492 ‫ﻣﺤﺘﺮﻣﻴﻦ‬ Score: 0.00155
Feature: 498 ‫ﺟﻮﺍﺋﺰ‬ Score: 0.00666
Feature: 575 ‫ﺳﺎﻋﺪﻧﻲ‬ Score: 0.00446
Feature: 576 ‫ﺳﺮﻋﺔ‬ Score: 0.00342
Feature: 587 ‫ﻟﻘ ﻴ ﺖ‬ Score: 0.00471
Feature: 581 ‫ﺳﻬﻞ‬ Score: 0.00362
Feature: 626 ‫ﺍﻟﺤﻞ‬ Score: 0.01048
Feature: 637 ‫ﺍﺣ ﺴﻦ‬ Score: 0.00479
Feature: 641 ‫ﺳﻬﻞ‬ Score: 0.00646
Feature: 644 ‫ﻋﻠﻴﻨﺎ‬ Score: 0.00423
Feature: 656 ‫ﻋ ﻨ ﺪﻱ‬ Score: 0.00317
Feature: 733 ‫ﻓﺎﺻﻞ‬ Score: 0.00111
Feature: 740 ‫ﻟﻤﺎ‬ Score: 0.00125
Feature: 751 ‫ﻣﺎﻛﺲ‬ Score: 0.00179
Feature: 760 ‫ﺭﻳﺤﻨﺎ‬ Score: 0.00446
Feature: 779 ‫ﻣﺶ‬ Score: 0.00286
Feature: 780 ‫ﻣﺸﺎﻛﻞ‬ Score: 0.00458
Feature: 782 ‫ﻣﺸﺠﻌﺔ‬ Score: 0.00552
Feature: 802 ‫ﻣﻤﺘﺎﺯ‬ Score: 0.01359
Feature: 804 ‫ﺍﻋﻄﻮﻧﻲ‬ Score: 0.00505
Feature: 815 ‫ﻓ ﺰﺕ‬ Score: 0.00443
Feature: 812 ‫ﻣﻨﻴﺤﺔ‬ Score: 0.00310
Feature: 828 ‫ﻧﺰﻟﺖ‬ Score: 0.00472
Feature: 887 ‫ﻳﻄﻠﻌﻠﻲ‬ Score: 0.00542
Feature: 888 ‫ﻳﻌﻤﻞ‬ Score: 0.00117
establish the ABC algorithm, which simulates the real-world behavior of bees.
ABC algorithms can be used efficiently for solving multimodal and
multidimensional optimization problems.
The ABC has three groups of bees: employed bees, onlookers, and scouts. The
first half of the colony consists of employed artificial bees and the second
half consists of onlookers. There is one employed bee for each food source.
Onlooker bees wait in the hive and decide on a food source to exploit based on
the information shared by the employed bees. An employed bee becomes a scout
after its food source has been depleted [22].
The original ABC algorithm:

(1) Generate the initial food sources (solutions) randomly
(2) Evaluate the fitness (fit(xi)) of the population
(3) Set cycle to 1
(4) Repeat
(5) For each employed bee {
(a) Produce new solution Vi by using (2)
(b) Calculate its fitness value fit(Vi)
(c) Apply greedy selection process}
(6) Calculate the probability values Pi for the solution (xi) by (3)
(7) For each onlooker bee {
(a) Select a solution xi depending on Pi
(b) Produce new solution Vj
(c) Calculate its fitness value fit(Vj)
(d) Apply greedy selection process}
(8) If there is an abandoned solution for the scout,
then replace it with a new solution which will be randomly produced by (1)
(9) Memorize the best solution so far
(10) Cycle = cycle + 1
(11) Until cycle = maximum cycle number
Pseudocode 1: ABC algorithm
Fig. 3 Second phase preprocessing
The ABC algorithm, as a swarm intelligence method, is an iterative process. Each
solution $x_i$, $i = 1, 2, \ldots, SN$, where SN represents the number of
solutions, is a D-dimensional vector with components $x_i^j$,
$j = 1, 2, \ldots, D$. The food sources are randomly assigned to the SN employed
bees according to Eq. (1) and their fitness is evaluated. Then the cycle of
search processes for the employed, onlooker, and scout bees is repeated.

$$x_i^j = x_{\min}^j + \mathrm{rand}(0,1)\,\bigl(x_{\max}^j - x_{\min}^j\bigr) \tag{1}$$
In the employed bee phase, a candidate position $v_i^j$ is produced from the old
one according to Eq. (2), where $j \in \{1, 2, \ldots, D\}$,
$k \in \{1, 2, \ldots, SN\}$ with $k \neq i$, and $\phi_i^j$ is a random number
in [−1, 1]. A candidate food source $v_i$ is produced for every food source
$x_i$. Once $v_i$ is obtained, it is evaluated and compared with $x_i$. A greedy
selection is applied between $x_i$ and $v_i$: the better one is kept, depending
on the fitness values, which represent the amount of food at each source.

$$v_i^j = x_i^j + \phi_i^j\,\bigl(x_i^j - x_k^j\bigr) \tag{2}$$
In the onlooker phase, each onlooker bee selects a food source depending on the
fitness values obtained from the employed bees, where $\mathrm{fit}(x_i)$ is the
fitness value of solution i. The onlooker then produces a new candidate position
in the neighborhood of the selected food source. The selection probability $p_i$
of each solution is calculated by:

$$p_i = \frac{\mathrm{fit}(x_i)}{\sum_{m=1}^{SN} \mathrm{fit}(x_m)} \tag{3}$$
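Equation (3) together with the roulette wheel method can be sketched in a few lines. The function names `selection_probabilities` and `roulette_wheel` are illustrative, and the sketch assumes that all fitness values are non-negative with a positive sum.

```python
import random

def selection_probabilities(fitness):
    """Eq. (3): each solution's share of the total fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_wheel(probabilities, rng):
    """Pick an index with probability proportional to its slice of the wheel."""
    r = rng.random()                  # uniform in [0, 1)
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return i
    return len(probabilities) - 1     # guard against floating-point rounding

probs = selection_probabilities([1.0, 1.0, 2.0])
chosen = roulette_wheel(probs, random.Random(0))
```

A food source with twice the fitness of another is selected twice as often on average, which is how onlookers concentrate on the richer sources.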
After the employed and onlooker bees complete their search, the ABC algorithm
checks whether there is any exhausted source to be abandoned. The scouts can
then discover rich, entirely unknown food sources.
The original Artificial Bee Colony algorithm has three control parameters: the
number of food sources, the limit value after which a food source that has not
improved is abandoned, and MCN, the maximum cycle number [23].
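The three phases and the three control parameters can be put together in a compact sketch of the original ABC for minimizing a function. This is an illustrative implementation under stated assumptions, not the code used in the chapter: the function name `abc_minimize`, the default parameter values, the clipping of candidates to the bounds, and the common fitness transform 1/(1 + f) are all assumptions.

```python
import random

def abc_minimize(f, bounds, sn=10, limit=20, max_cycles=100, seed=0):
    """Minimal original ABC: sn food sources, abandonment limit, cycle budget."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():                       # Eq. (1)
        return [lo + rng.random() * (hi - lo) for lo, hi in bounds]

    def fitness(x):                            # assumed transform, higher is better
        fx = f(x)
        return 1.0 / (1.0 + fx) if fx >= 0 else 1.0 + abs(fx)

    def neighbour(i):                          # Eq. (2), one random dimension j
        k = rng.choice([m for m in range(sn) if m != i])
        j = rng.randrange(dim)
        v = list(foods[i])
        v[j] += rng.uniform(-1, 1) * (foods[i][j] - foods[k][j])
        v[j] = min(max(v[j], bounds[j][0]), bounds[j][1])   # keep inside bounds
        return v

    def greedy(i, v):                          # keep the better of x_i and v_i
        if fitness(v) > fitness(foods[i]):
            foods[i], trials[i] = v, 0
        else:
            trials[i] += 1

    foods = [random_source() for _ in range(sn)]
    trials = [0] * sn
    best = min(foods, key=f)
    for _ in range(max_cycles):
        for i in range(sn):                    # employed bee phase
            greedy(i, neighbour(i))
        fits = [fitness(x) for x in foods]     # onlooker phase, Eq. (3)
        for _ in range(sn):
            i = rng.choices(range(sn), weights=fits)[0]
            greedy(i, neighbour(i))
        for i in range(sn):                    # scout phase
            if trials[i] > limit:
                foods[i], trials[i] = random_source(), 0
        best = min(foods + [best], key=f)      # memorize the best solution so far
    return best

def sphere(x):
    return sum(v * v for v in x)

best = abc_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

The `greedy` helper is the step that the modified algorithm in Sect. 3.5.2 replaces with a UBC-based selection.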
3.5.2 Enhancing Artificial Bee Algorithm with Upper Confidence Bound
The Upper Confidence Bound algorithm changes its balance between exploration and
exploitation as it gathers more information about the environment, so that it
can exploit the environment better [24].
Exploration and exploitation are essential for population-based optimization
algorithms such as PSO, GA, and DE. Exploration refers to the ability to
discover unknown areas in search of the optimum, whereas exploitation is the
ability to apply prior knowledge to obtain a better solution [25].
The ABC algorithm searches for a maximum or minimum solution within the possible
search space of a problem. The scout bees control the exploration ability, while
the employed and onlooker bees provide the exploitation ability. The artificial
bee colony is efficient for constrained and multidimensional basic functions.
Its local search ability is weaker, however, and the convergence rate is poor on
complex multimodal functions.
The artificial bee colony algorithm chooses a food source according to the
probability value in Eq. (3), based on the roulette wheel method, and a greedy
selection is applied between xi and vi. The greedy selection appears in steps
(5)(c) and (7)(d) of the original ABC in Pseudocode 1. To improve exploitation,
these steps are modified with a selection inspired by the Upper Confidence Bound
algorithm (UBC). This modification affects four result statistics: mode, mean,
median, and standard deviation.
The UCB algorithm adjusts its levels of exploration and exploitation as it
gains information about the available actions. When confidence in the estimated
best actions is low, it explores more; as confidence grows, it favors
exploitation of good actions. By adjusting this balance as time progresses, the
UBC achieves a better average reward than greedy selection.
$$A(t) = \underset{a}{\arg\max}\left[\, Q_t(a) + c\,\sqrt{\frac{\log t}{N_t(a)}} \,\right] \quad \text{(UBC algorithm)} \tag{4}$$
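where $N_t(a)$ is the number of times action a has been selected up to time t
and c is the confidence value that controls the level of exploration. The greedy
algorithm is given by Eq. (5):

$$A(t) = \underset{a}{\arg\max}\; Q_t(a) \quad \text{(greedy algorithm)} \tag{5}$$

where argmax specifies choosing the action a for which $Q_t(a)$, the estimated
value of action a at time step t, is maximized.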
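The difference between Eqs. (4) and (5) can be made concrete with a short sketch. The function names and the default value of c are illustrative assumptions, and trying each untried action once before applying Eq. (4) is a common convention rather than something stated in the chapter.

```python
import math

def greedy_action(q):
    """Eq. (5): pick the action with the highest estimated value."""
    return max(range(len(q)), key=lambda a: q[a])

def ucb_action(q, n, t, c=2.0):
    """Eq. (4): estimated value plus an exploration bonus that shrinks with n[a]."""
    for a, count in enumerate(n):
        if count == 0:                # assumed convention: try untried actions first
            return a
    return max(range(len(q)),
               key=lambda a: q[a] + c * math.sqrt(math.log(t) / n[a]))

q = [0.5, 0.4]        # estimated values Q_t(a)
n = [100, 1]          # selection counts N_t(a)
g = greedy_action(q)            # action 0 has the higher estimate
u = ucb_action(q, n, t=101)     # action 1 gets a large bonus: it was tried once
```

Greedy keeps choosing action 0, while the upper confidence bound still gives the rarely tried action 1 a chance, since its estimate is based on a single observation.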
Table 8 Mapping parameter equation between (greedy and UBC) algorithms

UBC parameter | Estimated value | Greedy parameter
Qt(a) | Action ‘a’ at time step ‘t’ | Qt(a)
argmax | Specifies choosing the action ‘a’ for which Qt(a) is maximized | argmax
Nt(a) | Number of times that action ‘a’ has been selected, prior to time ‘t’ |
C | Confidence value that controls the level of exploration | Constant
Qt(a) | Represents the exploitation part of the equation | Qt(a)
Table 8 shows how to map parameters in the equation from greedy selection to
the UBC selection process.
Because it favors actions with a high potential for being optimal, this idea
inspired an approach to multi-armed bandit (MAB) problems called the upper
confidence bound approach [26].
The modified ABC with UBC is shown in Pseudocode 2, where steps (5)(c) and
(7)(d) replace the greedy selection with the UBC selection process. This changes
how a new food source is selected, and thereby the behavior of ABC-UBC, by
applying a reinforcement learning technique within the artificial bee colony.
(1) Generate the initial population xi (i = 1, 2,..., SN)
(2) Evaluate the fitness (fit(xi)) of the population
(3) Set cycle to 1
(4) Repeat
(5) For each employed bee {
(a) Produce new solution Vi by using (2)
(b) Calculate its fitness value fit(Vi)
(c) Apply UBC selection process}
(6) Calculate the probability values Pi for the solution (xi) by (3)
(7) For each onlooker bee {
(a) Select a solution xi depending on Pi
(b) Produce new solution Vj
(c) Calculate its fitness value fit(Vj)
(d) Apply UBC selection process}
(8) If there is an abandoned solution for the scout,
then replace it with a new solution which will be randomly produced by (1)
(9) Memorize the best solution so far
(10) Cycle = cycle + 1
(11) Until cycle = maximum cycle number
Pseudocode 2: modified ABC with UBC
3.5.3 Obtain the Number of Feature Selection Using the Modified ABC-UBC
The modified ABC-UBC searches for the smallest subset of features (words) that
yields a higher classification accuracy.
Initial food sources: the number of features defines the search space. This step
finds the best accuracy for the wrapper method, the forward feature selection in
the proposed model, in order to find the optimal features based on the minimum
number of features.
The resulting number of features is then applied in the forward feature
selection. A food source is represented as a bit vector in which a feature is
considered if its value is 1 and not considered if its value is 0. For each
position in each food source, a random number Ri between 0 and 1 is generated.
The position is set to 1, making the feature part of the subset, if Ri is less
than the MR value; otherwise it is set to 0 and the feature is not considered.
The number of features n is a random number that controls the feature subset.
The subset of features is evaluated by the classification accuracy of the
classifier, which is used as the fitness value of the food source. The neighbors
of a feature subset (food source) are determined by the employed bees, and the
new food source is selected with the UBC algorithm, as indicated in
Eq. (4) [27].
3.6 Feature Selection
Feature selection (FS) has three objectives: improving the prediction
performance of the predictors, providing faster and more cost-effective
predictors, and providing a better understanding of the underlying process that
generated the data [28].
In the first phase, selecting the optimal features is very important, which means
choosing distinctive features from a feature set while excluding extraneous
features [29].
In practice, any combination of a machine learning algorithm and a search
strategy can be used as a wrapper that trains models to find the combination of
features giving the best performance. Here, the wrapper looks for a feature
subset of size n, where n is the number of features obtained from the modified
ABC-UBC, in order to optimize performance in the next step, the machine learning
classifiers. The performance of each newly trained model is then evaluated with
the performance metrics.
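The wrapper search for a subset of n features can be sketched as a greedy forward selection. This is an illustrative sketch under stated assumptions, not the chapter's implementation: the scoring function (leave-one-out accuracy of a 1-nearest-neighbour classifier), the toy word-count matrix, and all function names are hypothetical, and n = 2 stands in for the value that the modified ABC-UBC would supply.

```python
def forward_select(num_features, n, score):
    """Greedy forward selection: grow the subset one feature at a time up to n."""
    selected = []
    remaining = list(range(num_features))
    while remaining and len(selected) < n:
        # Try each remaining feature together with the current subset
        # and keep the one that gives the best score.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the chosen columns."""
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, None
        for j, other in enumerate(X):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)

# Toy word-count matrix (hypothetical): column 0 tracks the label, the rest is noise.
X = [[3, 1, 0, 2], [4, 0, 1, 2], [3, 1, 1, 0],
     [0, 1, 0, 2], [0, 0, 1, 2], [1, 1, 1, 0]]
y = [1, 1, 1, 0, 0, 0]

subset = forward_select(num_features=4, n=2,
                        score=lambda fs: loo_1nn_accuracy(X, y, fs))
print(subset)
```

Any classifier and any metric could replace `loo_1nn_accuracy`; the search strategy only needs a function that maps a feature subset to a score.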
Moreover, the ending criterion in the proposed model is the number of features
predefined by the ABC-UBC. In addition, the Receiver Operating Characteristic
(ROC) is used to measure the performance of the classifiers. ROC graphs are used
to visualize, organize, and select classifiers based on their performance. The
difference between the ROC and accuracy is that the ROC is helpful when the
classes are unbalanced, whereas accuracy is a single number that sums up the
performance.
The ROC analysis evaluates models using the false positive rate (FPR) and the
true positive rate (TPR). These are calculated as FPR = FP/N and TPR = TP/P,
where N is the number of negatives, P is the number of positives, FP is the
number of false positives, and TP is the number of true positives. Researchers
use forward feature selection, which starts with no features and adds one at a
time: it evaluates all remaining features individually and then selects the
feature that results in the best performance [6].
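The FPR and TPR definitions can be made concrete with a short sketch. The function names, labels, and scores below are hypothetical illustration values; the code assumes that both classes occur at least once in the true labels, since otherwise N or P would be zero.

```python
def roc_point(y_true, y_pred, positive=1):
    """Return (FPR, TPR) with FPR = FP / N and TPR = TP / P."""
    tp = fp = p = n = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            p += 1
            tp += int(pred == positive)
        else:
            n += 1
            fp += int(pred == positive)
    return fp / n, tp / p

def roc_curve_points(y_true, scores, positive=1):
    """One (FPR, TPR) point per distinct score threshold, from strict to lenient."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [positive if s >= threshold else 0 for s in scores]
        points.append(roc_point(y_true, y_pred, positive))
    return points

# Hypothetical labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
for fpr, tpr in roc_curve_points(y_true, scores):
    print(round(fpr, 2), round(tpr, 2))
```

Plotting TPR against FPR for these points gives the ROC graph; a curve that rises quickly toward TPR = 1 while FPR stays low indicates a better classifier.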
3.7 The Text Classification
Text categorization, or tagging, is the process of assigning text to labeled
groups. Text classifiers analyze text and assign labels or tags based on its
content [30].
3.7.1 Support Vector Machines Classifier (SVM)
SVM belongs to the nonparametric supervised techniques. It is a binary
classifier that separates two classes with a single decision boundary. The most
important kernels for SVM text classification are the linear and radial basis
functions. Linear classification trains on the dataset and then builds a model
that assigns classes or categories [31]. In this model, SVM serves as a text
classifier in the forward feature selection. In the simplest case, it uses the
training data to find an optimal line that separates the data into classes
according to the training labels (0 and 1).
In the learning phase, SVM repeatedly adjusts the constrained classifier toward
an optimal decision boundary [31].
3.7.2 K-Nearest Neighbors Classifier (KNN)
KNN belongs to the nonparametric supervised techniques and is used for
classification problems. It assumes that samples of a similar class exist in
close proximity to each other. In this model, KNN serves as a text classifier in
the forward feature selection to solve the problem of Arabic sentiment analysis.
KNN determines the label of a new sample based on the labels of its nearest
neighbors [32].
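The nearest-neighbor rule can be illustrated with a small sketch. It assumes squared Euclidean distance over word-count vectors and k = 3; the distance measure, the value of k, the function name, and the toy data are assumptions made for the example, since the chapter does not state them here.

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training samples."""
    # Rank training samples by squared Euclidean distance to the query.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical word-count vectors with sentiment labels 1 and 0.
train_X = [[2, 0, 1], [3, 0, 0], [2, 1, 0], [0, 2, 1], [0, 3, 0], [1, 2, 2]]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, [2, 0, 0]))
print(knn_predict(train_X, train_y, [0, 2, 0]))
```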
3.7.3 Naïve Bayes Classifier
Naïve Bayes is a probabilistic learning method, introduced here with the
multinomial model. It often relies on a bag-of-words view of a document,
combining the most frequently used words while ignoring the rare ones. The bag
of words serves as the feature extraction method that provides the data for
classification. In this model, Naïve Bayes serves as a text classifier in the
forward feature selection [33].
3.7.4 Polynomial Neural Networks Classifier
PNN is a flexible neural architecture whose classifier algorithm is based on the
GMDH method. It utilizes a class of polynomials, such as linear, modified
quadratic, and cubic, and the number of layers can be set during training. For
classification, it has the capability to capture relationships between the words
in a sentence. In this model, PNN serves as a text classifier in the forward
feature selection to solve the problem of Arabic sentiment analysis [34].
4 Results
4.1 Results Information
This paper aims to extract the polarity of Arabic text in the introduced datasets
and to classify these texts using the proposed model (Fig. 1). This section
summarizes the experiments and presents the results in tables. Four performance
evaluation criteria were used: precision, recall, f1-score, and accuracy.
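The four criteria, together with the macro and weighted averages reported in the result tables, can be computed as in the following sketch. The function name and the example labels are hypothetical; the macro average is the unweighted mean over the labels, while the weighted average weights each label by its number of true samples.

```python
def report(y_true, y_pred, labels):
    """Per-label precision, recall, f1, plus macro avg, weighted avg, and accuracy."""
    per_label = {}
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        fp = sum(t != lab and p == lab for t, p in zip(y_true, y_pred))
        fn = sum(t == lab and p != lab for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_label[lab] = (precision, recall, f1, tp + fn)  # last entry: support
    total = len(y_true)
    macro = tuple(sum(v[i] for v in per_label.values()) / len(labels)
                  for i in range(3))
    weighted = tuple(sum(v[i] * v[3] for v in per_label.values()) / total
                     for i in range(3))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return per_label, macro, weighted, accuracy

# Hypothetical predictions for labels 1 and 2.
y_true = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2, 2, 2, 2, 1]
per_label, macro, weighted, accuracy = report(y_true, y_pred, labels=[1, 2])
print(per_label)
print(macro, weighted, accuracy)
```

Note that the weighted-average recall always equals the accuracy, which is why those two values coincide in each result table.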
4.2 The Jordanian Dialect Dataset Experiments
4.2.1 The Result of Arabic Text Classifiers with Pre-processing Phase
The results of the KNN, SVM, NB, and PNN classifiers with the pre-processing
phase are presented in Tables 9, 10, 11, and 12, respectively.

Table 9 Result of KNN with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.82        0.93     0.87
Label 2        1.00        0.99     0.99
Macro avg      0.91        0.96     0.93
Weighted avg   0.98        0.97     0.97
Accuracy       0.98

Table 10 Result of SVM with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: SVM
               Precision   Recall   F1-score
Label 1        1.00        0.80     0.89
Label 2        0.99        1.00     0.99
Macro avg
Weighted avg   0.99        0.99     0.99
Accuracy       0.99

Table 11 Result of NB with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        0.67        0.80     0.73
Label 2        0.99        0.97     0.98
Macro avg      0.83        0.89     0.85
Weighted avg   0.97        0.96     0.96
Accuracy       0.96

Table 12 Result of PNN with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.80        0.80     0.80
Label 2        0.99        0.99     0.99
Macro avg
Weighted avg   0.97        0.97     0.97
Accuracy       0.97
4.2.2 The Result of Arabic Text Classifiers Without Pre-Processing Phase
The results of the KNN, SVM, NB, and PNN classifiers without the pre-processing
phase are presented in Tables 13, 14, 15, and 16, respectively.

Table 13 Result of KNN without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.56        0.93     0.70
Label 2        1.00        0.95     0.97
Macro avg      0.78        0.94     0.84
Weighted avg   0.97        0.95
Accuracy       0.95

Table 14 Result of SVM without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: SVM
               Precision   Recall   F1-score
Label 1        1.00        0.80     0.89
Label 2        0.99        1.00     0.99
Macro avg
Weighted avg   0.99        0.99     0.99
Accuracy       0.99

Table 15 Result of NB without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        0.71        0.80     0.75
Label 2        0.99        0.98     0.98
Macro avg      0.85        0.89     0.87
Weighted avg   0.97        0.97     0.97
Accuracy       0.97

Table 16 Result of PNN without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.87        0.87     0.87
Label 2        0.99        0.99     0.99
Macro avg      0.93        0.93     0.93
Weighted avg   0.98        0.98     0.98
Accuracy       0.98
4.2.3 The Result of Arabic Text Using Forward Feature Selection with ABC-UBC and Pre-Processing Phase
The results of the KNN, SVM, NB, and PNN classifiers are presented in Tables 17,
18, 19, and 20 and Figs. 4, 5, 6, and 7, respectively.

Table 17 Result of KNN using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.89        0.94     0.92
Label 2        0.94        0.89     0.91
Macro avg      0.92        0.92     0.92
Weighted avg   0.92        0.92     0.92
Accuracy       0.92

Fig. 4 Performance of features with the selected number of features

Table 18 Result of SVM using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.86        0.80     0.83
Label 2        0.99        1.00     0.99
Macro avg      0.92        0.90     0.91
Weighted avg   0.98        0.98     0.98
Accuracy       0.98

Fig. 5 Performance of features

Table 19 Result of NB using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        1.00        0.80     0.89
Label 2        0.99        1.00     0.99
Macro avg      0.99        0.90     0.94
Weighted avg   0.99        0.99     0.99
Accuracy       0.99

Table 20 Result of PNN using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 8; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.92        0.80     0.86
Label 2        0.99        1.00     0.99
Macro avg
Weighted avg   0.98        0.98     0.98
Accuracy       0.98

4.2.4 The Result of Arabic Text Using Forward Feature Selection with ABC-UBC Without Pre-Processing Phase

Table 21 Result of KNN using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Classifier: KNN
               Precision   Recall   F1-score
Label 1
Label 2
Macro avg
Weighted avg   0.92        0.92     0.92
Accuracy       0.92

Table 22 Result of SVM using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.86        0.80     0.83
Label 2        0.99        0.99     0.99
Macro avg
Weighted avg   0.98        0.98     0.98
Accuracy       0.98

Table 23 Result of NB using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1
Label 2
Macro avg      0.95        0.90     0.92
Weighted avg   0.98        0.98     0.98
Accuracy       0.98

Table 24 Result of PNN using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 8; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.92        0.80     0.86
Label 2        0.99        1.00     0.99
Macro avg
Weighted avg   0.98        0.98     0.92
Accuracy       0.98
Accuracy (training, test): 0.99, 0.97
4.3 The Algerian Dialect Dataset Experiments
4.3.1 The Result of Arabic Text Classifiers with Pre-processing Phase
The results of the KNN, SVM, NB, and PNN classifiers with the pre-processing
phase are presented in Tables 25, 26, 27, and 28, respectively.

Table 25 Result of KNN with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.59        0.88     0.71
Label 2        0.62        0.26     0.36
Macro avg      0.61        0.57     0.53
Weighted avg   0.61        0.60     0.55
Accuracy       0.60

Table 26 Result of SVM with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.72        0.75     0.73
Label 2        0.68        0.64     0.66
Macro avg
Weighted avg   0.70        0.70     0.70
Accuracy       0.70
Accuracy (training, test): 0.70, 0.72

Table 27 Result of NB with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        0.63        0.79     0.70
Label 2        0.63        0.44     0.52
Macro avg      0.63        0.61     0.61
Weighted avg   0.63        0.63     0.62
Accuracy       0.63

Table 28 Result of PNN with pre-processing
Model: Pre-processing (Stemming, stop word), CountVectorizer
Dataset: Jordan Dialect; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.61        0.73     0.67
Label 2        0.57        0.44     0.49
Macro avg
Weighted avg   0.59        0.60     0.59
Accuracy       0.60
Accuracy (training, test): 0.60, 0.74
4.3.2 The Result of Arabic Text Classifiers Without Pre-processing Phase
The results of the KNN, SVM, NB, and PNN classifiers without the pre-processing
phase are presented in Tables 29, 30, 31, and 32, respectively.

Table 29 Result of KNN without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.66        0.94     0.78
Label 2        0.84        0.91     0.75
Macro avg
Weighted avg   0.74        0.70     0.68
Accuracy       0.70

Table 30 Result of SVM without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.79        0.77     0.78
Label 2        0.72        0.74     0.73
Macro avg
Weighted avg   0.76        0.76     0.76
Accuracy       0.76
Accuracy (training, test): 0.76, 0.76

Table 31 Result of NB without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        0.60        0.79     0.68
Label 2        0.58        0.36     0.44
Macro avg      0.59        0.58     0.56
Weighted avg   0.59        0.60     0.58
Accuracy       0.60

Table 32 Result of PNN without pre-processing
Model: Without pre-processing, CountVectorizer
Dataset: Jordan Dialect; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.70        0.67     0.68
Label 2        0.61        0.64     0.62
Macro avg      0.65        0.65     0.65
Weighted avg   0.66        0.66     0.66
Accuracy       0.66

4.3.3 The Result of Arabic Text Using Forward Feature Selection with ABC-UBC and Pre-processing Phase

Table 33 Result of KNN using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Classifier: KNN
               Precision   Recall   F1-score
Label 1
Label 2
Macro avg
Weighted avg   0.82        0.77     0.77
Accuracy       0.77

Table 34 Result of SVM using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.83        0.79     0.81
Label 2        0.76        0.79     0.77
Macro avg
Weighted avg   0.79        0.79     0.79
Accuracy       0.79

Table 35 Result of NB using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1
Label 2
Macro avg      0.82        0.82     0.82
Weighted avg   0.83        0.82     0.82
Accuracy       0.82

Table 36 Result of PNN using forward feature selection with ABC-UBC and pre-processing phase
Model: Feature selection with ABC-UBC results with pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.86        0.62     0.72
Label 2        0.65        0.87     0.75
Macro avg      0.76        0.75     0.74
Weighted avg   0.77        0.74     0.73
Accuracy       0.74
Accuracy (training, test): 0.74, 0.70
4.3.4 The Result of Arabic Text Using Forward Feature Selection with ABC-UBC Without Pre-processing Phase

Table 37 Result of KNN using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: KNN
               Precision   Recall   F1-score
Label 1        0.94        0.67     0.78
Label 2        0.70        0.95     0.80
Macro avg
Weighted avg   0.70        0.95     0.80
Accuracy       0.79
Accuracy (training, test): 0.79, 0.793

Table 38 Result of SVM using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: SVM
               Precision   Recall   F1-score
Label 1        0.81        0.71     0.76
Label 2        0.96        0.79     0.74
Macro avg      0.75        0.75     0.75
Weighted avg   0.75        0.75     0.75
Accuracy       0.75

Table 39 Result of NB using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 10; Classifier: Naïve Bayes
               Precision   Recall   F1-score
Label 1        0.90        0.75     0.82
Label 2        0.74        0.90     0.81
Macro avg      0.82        0.82     0.82
Weighted avg   0.83        0.82     0.82
Accuracy       0.82
Accuracy (training, test): 0.82, 0.81

Table 40 Result of PNN using forward feature selection with ABC-UBC without pre-processing phase
Model: Feature selection with ABC-UBC results without pre-processing
Dataset: Jordan Dialect; ABC-UBC Fno: 6; Classifier: PNN
               Precision   Recall   F1-score
Label 1        0.91        0.62     0.74
Label 2        0.67        0.92     0.77
Macro avg      0.79        0.77     0.76
Weighted avg   0.80        0.76     0.76
Accuracy       0.76
4.4 Experimental Results and Discussion
Table 41 and Fig. 20 summarize the Jordanian dialect dataset experiments.

Table 41 Comparison of performance values
Classifier   Precision   Recall   F1-score   Accuracy
Model: Arabic text classifiers with pre-processing phase
KNN          0.91        0.96     0.93       0.98
SVM          0.99        0.90     0.94       0.99
NB           0.83        0.89     0.85       0.96
PNN          0.89        0.89     0.89       0.97
Model: Arabic text classifiers without pre-processing phase
KNN          0.78        0.94     0.84       0.95
SVM          0.99        0.90     0.94       0.99
NB           0.85        0.89     0.87       0.97
PNN          0.98        0.98     0.98       0.98
Model: Arabic text using forward feature selection with ABC-UBC and pre-processing phase
KNN          0.92        0.92     0.92       0.92
SVM          0.92        0.90     0.91       0.98
NB           0.99        0.90     0.94       0.99
PNN          0.95        0.90     0.92       0.98
Model: Arabic text using forward feature selection with ABC-UBC without pre-processing phase
KNN          0.92        0.92     0.92       0.99
SVM          0.92        0.90     0.91       0.98
NB           0.95        0.90     0.92       0.98
PNN          0.95        0.90     0.92       0.98
Fig. 20 Compared prediction accuracy for the four tests using Jordanian dialect dataset
(Four panels: Arabic text classifiers with pre-processing; Arabic text classifiers
without pre-processing; F.F.S with ABC-UBC with pre-processing phase; F.F.S with
ABC-UBC without pre-processing phase. Each panel plots precision, recall, F1-score,
and accuracy on a 0 to 1 scale for SVM, NB, PNN, and KNN.)
4.5 Experimental Results and Discussion
Table 42 and Fig. 21 summarize the Algerian dialect dataset experiments.

Table 42 Comparison of performance values
Classifier   Precision   Recall   F1-score   Accuracy
Model: Arabic text classifiers with pre-processing phase
KNN          0.61        0.57     0.53       0.60
SVM          0.70        0.70     0.70       0.70
NB           0.63        0.61     0.61       0.63
PNN          0.59        0.58     0.58       0.60
Model: Arabic text classifiers without pre-processing phase
KNN          0.75        0.76     0.66       0.70
SVM          0.76        0.76     0.76       0.76
NB           0.59        0.58     0.56       0.60
PNN          0.63        0.65     0.65       0.66
Model: Arabic text using forward feature selection with ABC-UBC and pre-processing phase (optimization algorithm: Modified ABC-UBC)
KNN          0.81        0.79     0.77       0.77
SVM          0.79        0.79     0.79       0.79
NB           0.82        0.82     0.82       0.82
PNN          0.76        0.75     0.74       0.74
Model: Arabic text using forward feature selection with ABC-UBC without pre-processing phase (optimization algorithm: Modified ABC-UBC)
KNN          0.82        0.81     0.79       0.79
SVM          0.75        0.75     0.75       0.75
NB           0.82        0.82     0.82       0.82
PNN          0.74        0.72     0.72       0.76
Fig. 21 Compared prediction accuracy for the four tests using Algerian dialect dataset
(Four panels: Arabic text classifiers with pre-processing; Arabic text classifiers
without pre-processing; F.F.S with ABC-UBC with pre-processing phase; F.F.S with
ABC-UBC without pre-processing phase. Each panel plots precision, recall, F1-score,
and accuracy on a 0 to 1 scale for SVM, NB, PNN, and KNN.)
5 Conclusion
This paper examined the extent to which the modified algorithm influences the
selection of optimal features within Jordanian text classifiers, and its effect
on them. The proposed modified ABC-UBC finds the minimum number of features and
picks out the optimal features from the words for the classification task. The
test was carried out using the Jordanian dialect dataset. Table 41 compares the
performance measures for four tests on the Jordanian text classifiers: with the
pre-processing phase, without the pre-processing phase, using forward feature
selection with ABC-UBC with the pre-processing phase, and using forward feature
selection with ABC-UBC without the pre-processing phase. We inferred that feeding
the optimized features into the classification task yields accuracy of up to 99%.
Moreover, the precision, recall, and f1-score range from 95% to 99%. After
testing the classification algorithms, we compared the prediction accuracy across
the four tests for the Support Vector Machine (SVM), K-Nearest Neighbors (KNN),
Naive Bayes (NB), and Polynomial Neural Network (PNN) classifiers. As shown in
Fig. 20, the best results of KNN, NB, and PNN reach accuracy of up to 99.9%. The
test was also carried out using the Algerian dialect dataset. Table 42 compares
the performance measures for the same four tests on the Algerian text
classifiers: with the pre-processing phase, without the pre-processing phase,
using forward feature selection with ABC-UBC with the pre-processing phase, and
using forward feature selection with ABC-UBC without the pre-processing phase.
Across the four tests, this model gives accuracy, and F1 score, of up to 82%.
The contents of the Jordanian dialect dataset and the Algerian dialect dataset
differ. The text size in the Jordanian dialect dataset does not exceed twenty
words per row in the database, while each row in the Algerian dialect dataset is
a long paragraph of more than 100 words. Through experience, the following was
observed: the accuracy of classification is affected by the number of words. As
the number of words decreases, the accuracy of classification increases.
In the future, the objective is to apply the proposed supervised approach to
Arabic and its dialects, and to compare it with other methods after testing on
more Arabic datasets. The method will also be extended to other functions, such
as spam detection, to achieve excellent results in the Arabic text
classification system.
References
1. Proudfoot, D. (2020). Rethinking Turing’s test and the philosophical implications. Minds and
Machines, 1–26.
2. Janani, R., & Vijayarani, S. (2020). Automatic text classification using machine learning and
optimization algorithms. Soft Computing, 1–17.
3. Elnagar, A., Al-Debsi, R., & Einea, O. (2020). Arabic text classification using deep learning
models. Information Processing & Management, 57(1), 102121.
4. Karaboga, D., Gorkemli, B., Ozturk, C., & Karaboga, N. (2014). A comprehensive survey:
Artificial bee colony (ABC) algorithm and applications. Artificial Intelligence Review, 42(1),
21–57.
5. Jiang, D., Yue, X., Li, K., Wang, S., & Guo, Z. (2015). Elite opposition-based artificial bee
colony algorithm for global optimization. International Journal of Engineering, 28(9), 1268–
1275.
6. Alzaqebah, A., Smadi, B., & Hammo, B. H. (2020, April). Arabic sentiment analysis based on
salp swarm algorithm with S-shaped transfer functions. In 2020 11th International
Conference on Information and Communication Systems (ICICS) (pp. 179–184). IEEE.
7. Guellil, I., Adeel, A., Azouaou, F., Benali, F., Hachani, A. E., Dashtipour, K., ... & Hussain,
A. (2021). A semi-supervised approach for sentiment analysis of Arab(ic+izi) messages:
Application to the Algerian dialect. SN Computer Science, 2(2), 1–18.
8. Thirumoorthy, K., & Muneeswaran, K. (2020). Optimal feature subset selection using hybrid
binary Jaya optimization algorithm for text classification. Sādhanā, 45(1), 1–13.
9. Chantar, H., Mafarja, M., Alsawalqah, H., Heidari, A. A., Aljarah, I., & Faris, H. (2020).
Feature selection using binary grey wolf optimizer with elite-based crossover for Arabic text
classification. Neural Computing and Applications, 32(16), 12201–12220.
10. Zheng, W., & Jin, M. (2020). Comparing multiple categories of feature selection methods for
text classification. Digital Scholarship in the Humanities, 35(1), 208–224.
11. Hussein, O., Sfar, H., Mitrović, J., & Granitzer, M. (2020, December). NLP_Passau at
SemEval-2020 Task 12: Multilingual neural network for offensive language detection in
English, Danish and Turkish. In Proceedings of the Fourteenth Workshop on Semantic
Evaluation (pp. 2090–2097).
12. Pan, Y., & Liang, M. (2020, June). Chinese text sentiment analysis based on BI-GRU and
self-attention. In 2020 IEEE 4th Information Technology, Networking, Electronic and
Automation Control Conference (ITNEC) (vol. 1, pp. 1983–1988). IEEE.
13. Rachid, B. A., Azza, H., & Ghezala, H. H. B. (2020, July). Classification of cyberbullying
text in Arabic. In 2020 International Joint Conference on Neural Networks (IJCNN) (pp. 1–
7). IEEE.
14. Guo, Z., Shi, J., Xiong, X., Xia, X., & Liu, X. (2019). Chaotic artificial bee colony with elite
opposition-based learning. International Journal of Computational Science and Engineering,
18(4), 383–390.
15. Almani, N., & Tang, L. H. (2020, March). Deep attention-based review level sentiment
analysis for Arabic reviews. In 2020 6th Conference on Data Science and Machine Learning
Applications (CDMA) (pp. 47–53). IEEE.
16. Hanbay, K. (2021). A new standard error based artificial bee colony algorithm and its
applications in feature selection. Journal of King Saud University-Computer and Information
Sciences.
17. Chaudhuri, A., & Sahu, T. P. (2021). Feature weighting for naïve Bayes using multi objective
artificial bee colony algorithm. International Journal of Computational Science and
Engineering, 24(1), 74–88.
18. Obeid, O., Zalmout, N., Khalifa, S., Taji, D., Oudah, M., Alhafni, B., ... & Habash, N. (2020,
May). CAMeL tools: An open source python toolkit for Arabic natural language processing.
In Proceedings of the 12th language resources and evaluation conference (pp. 7022–7032).
19. Ayedh, A., Tan, G., Alwesabi, K., & Rajeh, H. (2016). The effect of preprocessing on arabic
document categorization. Algorithms, 9(2), 27.
20. Chen, P. H. (2020). Essential elements of natural language processing: What the radiologist
should know. Academic radiology, 27(1), 6–12.
21. Vijayaraghavan, S., & Basu, D. (2020). Sentiment analysis in drug reviews using supervised
machine learning algorithms. arXiv preprint arXiv:2003.11643.
22. Karaboga, D. (2005). An idea based on honey bee swarm for numerical optimization (vol.
200, pp. 1–10). Technical report-tr06, Erciyes university, engineering faculty, computer
engineering department.
23. Ghambari, S., & Rahati, A. (2018). An improved artificial bee colony algorithm and its
application to reliability optimization problems. Applied Soft Computing, 62, 736–767.
24. Xiang, Z., Xiang, C., Li, T., & Guo, Y. (2020). A self-adapting hierarchical actions and
structures joint optimization framework for automatic design of robotic and animation
skeletons. Soft Computing, 1–14.
25. Sharma, A., Sharma, A., Choudhary, S., Pachauri, R. K., Shrivastava, A., & Kumar, D. A.
(2020). Review on artificial bee colony and it’s engineering applications. Journal of Critical
Reviews.
26. Li, Y. (2020). Comparison of various multi-armed bandit algorithms (E-greedy, Thompson
sampling and UCB-) to standard A/B testing.
27. Hijazi, M., Zeki, A., & Ismail, A. (2021). Arabic text classification using hybrid feature
selection method using chi-square binary artificial bee colony algorithm. Computer Science,
16(1), 213–228.
28. Zhang, X., Fan, M., Wang, D., Zhou, P., & Tao, D. (2020). Top-k feature selection
framework using robust 0–1 integer programming. IEEE Transactions on Neural Networks
and Learning Systems.
29. Janani, R., & Vijayarani, S. (2020). Automatic text classification using machine learning and
optimization algorithms. Soft Computing, 1–17.
30. Dhar, A., Mukherjee, H., Dash, N. S., & Roy, K. (2021). Text categorization: Past and
present. Artificial Intelligence Review, 54(4), 3007–3054.
31. Sheykhmousa, M., Mahdianpari, M., Ghanbari, H., Mohammadimanesh, F., Ghamisi, P., &
Homayouni, S. (2020). Support vector machine vs. random forest for remote sensing image
classification: A meta-analysis and systematic review. IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing.
32. Saadatfar, H., Khosravi, S., Joloudari, J. H., Mosavi, A., & Shamshirband, S. (2020). A new
K-nearest neighbors classifier for big data based on efficient data pruning. Mathematics, 8(2),
286.
33. Ruan, S., Li, H., Li, C., & Song, K. (2020). Class-specific deep feature weighting for Naïve
Bayes text classifiers. IEEE Access, 8, 20151–20159.
34. Oh, S. K., Pedrycz, W., & Park, B. J. (2003). Polynomial neural networks architecture:
Analysis and design. Computers & Electrical Engineering, 29(6), 703–725.
