
Chapter 1

Introduction

Image recognition is a long-standing and difficult problem in machine vision. The
challenge involves substantial intra-class variation in colour, scale, illumination, and
shape. Even merely for training, a large body of labelled images is required, and
preparing such data is time-consuming and expensive. Image classification was
developed to bridge the gap between computer vision and human vision by training the
computer with data: an image is assigned to one of a set of designated categories
according to its visual content. Image recognition is a classical challenge in the domains
of image processing, computer vision, and machine learning, and it remains one of the
most difficult tasks in computer vision. In this work, we examine image recognition
using deep learning. Recognition and feature extraction can be implemented in the
Python programming language together with standard machine learning techniques and
Python packages. Convolutional neural networks (CNNs) are extensively used in pattern
and image recognition problems because they provide a number of advantages over
other techniques. This document covers the fundamentals of CNNs, including a
discussion of the various layers that can be used. In this work, we develop a CNN-based
model for cat and dog image recognition. The cat and dog dataset is pre-processed
before being fed to the CNN model.
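
As a minimal sketch of this pre-processing step (the directory layout, image size, and
batch size below are illustrative assumptions, not details taken from the original
experiments), the dataset could be loaded and normalised in Python as follows:

    # Sketch: load and pre-process a cat/dog image dataset (assumed layout).
    import tensorflow as tf

    IMG_SIZE = (128, 128)  # assumed input resolution

    # Expects hypothetical folders data/train/cats and data/train/dogs.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "data/train",
        labels="inferred",    # class labels come from sub-directory names
        label_mode="binary",  # cat vs. dog is a two-class problem
        image_size=IMG_SIZE,  # resize every image to a common shape
        batch_size=32,
    )

    # Scale pixel values from [0, 255] to [0, 1] before feeding the CNN.
    normalize = tf.keras.layers.Rescaling(1.0 / 255)
    train_ds = train_ds.map(lambda x, y: (normalize(x), y))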

The world holds an immense amount of image data, and its rate of growth is itself
increasing. Many of these images are processed or distributed on the Internet through
cloud services. Automatic processing of image content is useful for a wide range of
image-related tasks. For computer systems, this means crossing the so-called semantic
gap between the pixel-level information contained in an image and a human's
understanding of the same image. Object detection over an image library is a difficult
task in the field of computer vision. CNNs carry a high computational cost, in both
memory and speed, during the learning stage, but can achieve a degree of invariance to
shift and deformation.

Lillesand and Kiefer described image processing as the manipulation of digital images
using computers. It is a broad topic that includes mathematically complex processes as
well as simpler operations such as image restoration/rectification, image enhancement,
image classification, and image fusion. Classification forms an essential part of image
processing; its purpose is the automatic allocation of images to thematic groups.
Classification may be supervised or unsupervised. The image classification process
involves two stages: training the system, followed by testing it. In the training stage, the
characteristic features of the images forming a class are extracted to build a distinctive
definition of that class; depending on the form of the classification problem (binary or
multi-class), this is done for every class. In the testing stage, test images are assigned to
the groups for which the system has been trained, on the basis of the partitioning of the
training features between the classes. A sketch of these two stages follows.
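
As an illustrative sketch of the training and testing stages (the synthetic features and the
SVM below are stand-ins, not the classifier developed later in this work):

    # Sketch of the training and testing stages of a supervised classifier.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))    # 200 images, 64 features each (synthetic)
    y = rng.integers(0, 2, size=200)  # binary labels: class 0 or class 1

    # Training stage: learn a class definition from labelled examples.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    clf = SVC(kernel="rbf").fit(X_train, y_train)

    # Testing stage: assign unseen images to the classes learned in training.
    print("test accuracy:", clf.score(X_test, y_test))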

Convolutional Neural Network

Convolutional Neural Networks (CNNs), in which multiple layers are trained robustly,
are one of the most common approaches to deep learning. They have been found highly
effective and are the most widely used architecture in computer vision applications. In
general, a CNN is made up of three central types of neural layers: convolutional layers,
pooling layers, and fully connected layers, each performing a different function. The
network is trained in two steps: a forward step and a backward step. First, the forward
step represents the input image with the current parameters (weights and biases) of each
layer; the predicted output is then used to compute the loss against the ground-truth
labels. Second, the backward step uses the chain rule to calculate the gradient of each
parameter from that loss, and all parameters are updated from their gradients in
preparation for the next forward computation [4]. After sufficient iterations of the
forward and backward steps, network learning can be stopped. A convolutional neural
network consists of one or more blocks of convolutional and sub-sampling layers, often
followed by fully connected layers and finally an output block.

Fig. 1: Block diagram of CNN
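
A minimal sketch of one forward/backward iteration, written here with TensorFlow's
GradientTape (the tiny model and the synthetic batch are placeholders, not the network
trained in this work):

    # Sketch of one forward and one backward training step.
    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # stand-in network
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

    images = tf.random.normal((8, 10))  # synthetic input batch
    labels = tf.constant([0, 1] * 4)    # synthetic ground-truth labels

    with tf.GradientTape() as tape:
        logits = model(images)          # forward step: current weights and bias
        loss = loss_fn(labels, logits)  # loss against the ground-truth labels

    # Backward step: the chain rule gives the gradient of each parameter.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))  # update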

Convolutional Layer: The most significant layer in a CNN is the convolutional layer,
which convolves filters with the input and passes the output to the next layer. This is
close to the concept of human vision: each neuron in this layer connects only to a small
(local) set of neurons in the previous layer. Since a machine represents an image as
pixels, each pixel carries a value. The layer has three mandatory characteristics: the
convolutional kernels, the number of input channels, and the depth of the filter.

A smaller image (the feature, or filter) is considered and its pattern is compared against
every position in the given input (the larger image). Every pixel is filtered by the
following steps: (i) the filter is applied at each pixel position of the image; (ii) each filter
value is multiplied with the corresponding image pixel; (iii) all of the resulting values
are added; (iv) the sum is divided by the number of pixels in the feature. Hence, this
layer requires a smaller number of free parameters, allowing the network to be deeper
for the same number of free parameters. These steps are sketched in code below.
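
A minimal NumPy sketch of steps (i)-(iv) (a single filter, no padding or stride, purely
illustrative):

    # NumPy sketch of the convolution/matching steps described above.
    import numpy as np

    def convolve2d(image, feature):
        # Slide the feature over the image; at each position multiply
        # element-wise, sum, and divide by the feature size (steps ii-iv).
        fh, fw = feature.shape
        oh, ow = image.shape[0] - fh + 1, image.shape[1] - fw + 1
        out = np.zeros((oh, ow))
        for r in range(oh):          # step (i): visit every position
            for c in range(ow):
                window = image[r:r + fh, c:c + fw]
                out[r, c] = np.sum(window * feature) / feature.size
        return out

    image = np.random.rand(6, 6)   # the larger input image (synthetic)
    feature = np.ones((3, 3))      # a simple 3x3 filter
    print(convolve2d(image, feature).shape)  # -> (4, 4) feature map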
Pooling layer: The input to this layer is taken from a small portion of the previous layer.
The input is sub-sampled into smaller proportions, producing a single output per
window. Although several non-linear sub-sampling functions exist, such as max pooling,
mean pooling, and average pooling, max pooling is the most widely used technique. It
takes the maximum pixel value within the window under consideration, which
emphasises the approximate recognition of the relevant pattern rather than an exact
match. Continued pooling reduces the spatial size of the representation and the amount
of computation and memory required.
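
A minimal NumPy sketch of 2x2 max pooling with stride 2 (illustrative only):

    # NumPy sketch of max pooling: keep the maximum value in each window.
    import numpy as np

    def max_pool2d(x, size=2):
        h, w = x.shape
        h, w = h - h % size, w - w % size            # drop any ragged border
        blocks = x[:h, :w].reshape(h // size, size, w // size, size)
        return blocks.max(axis=(1, 3))               # max over each window

    fmap = np.arange(16, dtype=float).reshape(4, 4)  # synthetic feature map
    print(max_pool2d(fmap))  # halves each spatial dimension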

Fully connected layer: The final part of the CNN is the fully connected layer, which
comes after several convolutional and pooling layers. High-level reasoning is performed
in this layer. As in classical artificial neural networks, its neurons are fully connected to
all activations in the previous layer.
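
Putting the three layer types together, a minimal Keras sketch of the block structure in
Fig. 1 might look as follows (the layer sizes and input shape are illustrative assumptions,
not the exact architecture evaluated in this work):

    # Illustrative CNN: convolution + pooling blocks, then fully connected layers.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 3)),        # assumed input size
        tf.keras.layers.Conv2D(32, 3, activation="relu"),  # convolutional layer
        tf.keras.layers.MaxPooling2D(2),                   # pooling layer
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),     # fully connected layer
        tf.keras.layers.Dense(1, activation="sigmoid"),    # output: cat vs. dog
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.summary()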

Chapter 2

Literature Review
Rocha et al. [2010] presented an automated system for classifying fruits and vegetables
based on statistical features extracted from the data. The study used supermarket data,
and the background of the photographs was removed using the k-means algorithm.
Although they tested the algorithm on a supermarket dataset of 2633 photos, they did
not report the precision of the classification results.

Zhang and Wu [2012] proposed a multi-class kernel support vector machine (kSVM)
classification method and classified 18 different fruit categories. A split-and-merge
segmentation algorithm was applied to the fruit images for feature extraction.
Zawbaa et al. [2014] developed an automated fruit identification technique that extracts
two kinds of features, based on shape and colour characteristics, together with the
scale-invariant feature transform (SIFT). They employed the support vector machine
(SVM) and k-nearest neighbour (k-NN) algorithms to classify the images. Their
proposed model is based on a dataset of 178 fruit pictures, with the classifiers achieving
a range of accuracies for different fruit morphologies.

Sang et al. [2014] devised a two-stage technique for grading and counting fruits, using a
framework for detecting and counting fruits in images of congested greenhouses.
Despite using a big dataset of 28,000 colour photos of fruits and plants, with SVM as the
classifier, the model obtains a score of only 74.2 per cent on this robust dataset.

Shivaji et al. [2016] created an embedded system that classifies fruit photos by scale,
consistency, colour, and health. Two measures, colour and edge detection, were used for
classification, although the proposed process specifies neither a classification algorithm
nor a success rate.

Naik and Patel [2017] investigated the classification and grading of fruits using a
limited dataset. Only k-NN and SVM are compared in the study, which uses the random
forest classifier.

Jana et al. [2019] developed a technique for categorising distinct fruit varieties using
texture and colour features, employing statistical colour characteristics and the
grey-level co-occurrence matrix (GLCM). The method classifies fruit photos with
SVMs. Their work uses two sorts of features, texture and colour, but neither the dataset
size nor an accuracy result for the classifier is provided.

Hossain et al. [2018] used deep learning algorithms to classify fruits for industrial
applications. The suggested approach is based on two deep learning models: a
convolutional neural network and a pre-trained VGG-16 model (also called OxfordNet).
Two datasets are used, providing crisp photos of the fruits and general images of the
fruits. Although the accuracy of this work is high, it was not compared with other
classifiers.

Astuti et al. [2018] proposed using SVM in conjunction with a comparative artificial
neural network technique to differentiate fruits based on their morphologies. In this
work, Fast Fourier Transform (FFT) features are extracted and provided as input to the
SVM-based identifier. No segmentation method is mentioned, and the ANN achieves a
classification rate of only 66.7 per cent on training frames, meaning that it does not
accurately classify all fruits.

Sekar et al. [2018] discussed computer vision approaches to fruit identification. Their
investigation surveyed several fruit disease detection technologies, along with their
benefits and drawbacks. They describe a few classifier models and functions, but do not
recommend a computer vision solution for identifying or detecting diseases in, for
example, fruit or seeds.
Nikola Vukovic and Andrej Jokie [2018] focused on feature-based extraction. Their
main goal is to lower the computational resources that the ALPR system requires, as
well as the amount of data used for training. They also proposed a compressive-sensing-
based dimensionality reduction method that includes segmentation pre-processing.

Usama S. Mohammed, Mohamed Yousef, and Khaled F. Hussain [2018] focused on
preventing accuracy loss. They trained their model with the CTC loss function and
employed highway networks for image classification. They also applied the approach
successfully to captcha recognition and street photos, using a variety of datasets for the
various implementations.

Rayson Laroca, Evair Severo, Luiz A. Zanlorensi, Luiz S. Oliveira, Gabriel Resende
Goncalves, William Robson Schwartz, and David Menotti [2018] proposed a model that
performs several pre-processing tasks, such as segmentation, and uses Convolutional
Neural Networks to classify the images. They developed two data augmentation
strategies to improve the approach: inverted licence plates and reversed characters.
They used the UFPR-ALPR dataset, which contains around 150 videos with over 4500
frames of vehicles.

Rayson Laroca, Luiz A. Zanlorensi, Gabriel R. Goncalves, Eduardo Todt, William
Robson Schwartz, and David Menotti [2019] present an ALPR approach with a single
architecture based on the state-of-the-art YOLO object detector. In post-processing,
they adopt a unified methodology for recognising and classifying plate layouts. They
trained the models on photos from a variety of datasets, together with various
augmentation techniques.

Satadal Saha [2019] focuses on feature extraction from images. Binarization,
localization, and segmentation were employed as pre-processing techniques. A
multilayer perceptron (MLP) is used as the classifier, and the Quad Tree based Longest
Run (QTLR) algorithm is used to verify the network and obtain the characters it
predicts.

Huang et al. [2015] introduced a deep learning framework based on a convolutional
neural network (CNN) and a Naive Bayes data fusion strategy (dubbed NB-CNN) for
detecting cracks in individual video frames. At the same time, a new data fusion scheme
is provided to aggregate the information extracted from each video frame in order to
improve the overall performance and robustness of the system. The proposed CNN
detects fissures in each video frame, the proposed data fusion scheme maintains the
temporal and spatial coherence of the cracks across the video, and the Naive Bayes
decision effectively discards false positives.

Yan et al. [2016] proposed a semi-supervised HSI classification network with space-
and class-structure regularised sparse representation (SCSSR). In particular, spatial
information was incorporated into the SR model via graph Laplacian regularisation,
which posits that spatial neighbours should have comparable representation coefficients,
allowing the resulting coefficient matrix to reflect sample similarity more precisely.
They also incorporate the probabilistic class structure (the probabilistic relationship
between each sample and each class) into the SR model to increase the discriminability
of the graph.

Zhang et al. [2017] showed that the specificity of uniform samples and invariance to
rotation are critical for object identification and classification applications. Current
research focuses on feature invariance, such as rotation invariance. This paper
introduces a new multichannel convolutional neural network (mCNN) to extract
invariant properties for object classification. Multi-channel convolution with shared
weights is used to reduce the feature variance between sample pairs with different
rotations within the same class; as a result, invariance to the uniform object and
invariance to rotation are learned at the same time, improving overall invariance.

Chaib et al. [2018] outlined the basic model structure, convolutional feature extraction,
and pooling operations of convolutional neural networks, as well as the rise and
development of deep learning and of convolutional neural networks. The research status
and development trends of deep-learning-based convolutional neural network models
for image classification are then discussed, covering typical network structures, training
methods, and performance. Finally, several current research issues are briefly described
and explored, and future development directions are forecast.

R. Gupta and Bhavsar [2016] presented an extension to any convolutional neural
network (CNN) architecture that fine-tunes classic 2D saliency prediction to
omnidirectional images (ODIs). They demonstrate that each stage in their pipeline is
aimed at making the generated saliency map more accurate with respect to the
ground-truth data, in an end-to-end manner. The convolutional neural network (CNN)
is a type of deep machine learning technology built from the artificial neural network
(ANN), and it has seen great success in the field of image recognition in recent years.

Chapter 3

Research Methodology
Proposed Work

Classification is the hierarchical grouping of things into classes and divisions based on
their qualities. Image classification was built to bridge the gap between machine vision
and human vision by teaching the computer with data: an image is assigned to the
proper category based on its visual content. In this research work, we study picture
recognition using deep learning. Convolutional neural networks (CNNs) are commonly
utilised in automatic image identification systems. The top layer of CNN features is
typically used for classification; however, such features alone do not provide enough
useful information to predict the image accurately. In some circumstances, the features
of lower layers carry a greater discriminative effect than those of the top layer. As a
result, using the features of a single layer for classification does not fully exploit the
CNN's inherent discriminative power. This intrinsic property emphasises the importance
of merging features from different stages. To address this problem, we provide a
framework for embedding multi-layer features into the CNN model.
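
As a minimal sketch of this idea (the backbone, the layers chosen for fusion, and all
dimensions are illustrative assumptions, not the final architecture of this work),
intermediate feature maps can be pooled to fixed-length vectors and concatenated before
classification:

    # Sketch: merge features from several CNN stages before the classifier.
    import tensorflow as tf
    from tensorflow.keras import layers

    inputs = tf.keras.Input(shape=(128, 128, 3))
    x1 = layers.Conv2D(32, 3, activation="relu")(inputs)  # lower-layer features
    x1 = layers.MaxPooling2D(2)(x1)
    x2 = layers.Conv2D(64, 3, activation="relu")(x1)      # mid-layer features
    x2 = layers.MaxPooling2D(2)(x2)
    x3 = layers.Conv2D(128, 3, activation="relu")(x2)     # top-layer features

    # Pool each stage to a fixed-length vector, then merge the stages.
    fused = layers.Concatenate()([
        layers.GlobalAveragePooling2D()(x1),
        layers.GlobalAveragePooling2D()(x2),
        layers.GlobalAveragePooling2D()(x3),
    ])
    outputs = layers.Dense(1, activation="sigmoid")(fused)
    model = tf.keras.Model(inputs, outputs)
    model.summary()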

Research Objectives

(1) Create an image database.

(2) Pre-process and clean the image dataset.

(3) Examine several picture recognition algorithms based on deep learning.

(4) Estimate performance using the Convolutional Neural Network approach.

Research Methodology

Image classification is one of the most active research areas in computer vision, and it is
also the foundation of image classification systems in other fields. It is usually broken
down into three parts: image pre-processing, image feature extraction, and the classifier.
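
These three parts can be sketched as a single pipeline (the scaler, PCA feature extractor,
and SVM below are illustrative stand-ins for the corresponding stages):

    # Illustrative pipeline: pre-processing -> feature extraction -> classifier.
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 256))     # 100 flattened images (synthetic)
    y = rng.integers(0, 2, size=100)    # synthetic labels

    pipe = Pipeline([
        ("preprocess", StandardScaler()),    # image pre-processing stand-in
        ("features", PCA(n_components=32)),  # feature extraction stand-in
        ("classifier", SVC()),               # the classifier
    ])
    pipe.fit(X, y)
    print("training accuracy:", pipe.score(X, y))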
References
[1] A. Rocha, D. C. Hauagge, J. Wainer, and S. Goldenstein, “Automatic fruit and vegetable
classification from images,” Computers and Electronics in Agriculture, vol. 70, no. 1, pp. 96–
104, Jan. 2010, doi: 10.1016/j.compag.2009.09.002.

[2] Y. Zhang and L. Wu, “Classification of Fruits Using Computer Vision and a Multiclass
Support Vector Machine,” Sensors, vol. 12, no. 9, pp. 12489–12505, Sep. 2012, doi:
10.3390/s120912489.

[3] C. V. Narasimhulu, “An automatic feature selection and classification framework for
analyzing ultrasound kidney images using dragonfly algorithm and random forest
classifier,” IET Image Processing, vol. 15, no. 9, pp. 2080–2096, Mar. 2021, doi:
10.1049/ipr2.12179.

[4] E. Mohebi and A. Bagirov, "A convolutional recursive modified Self Organizing Map for
handwritten digits recognition," Neural Networks, vol. 60, pp. 104–118, Dec. 2014, doi:
10.1016/j.neunet.2014.08.001.

[5] J. J. D. Orquia and E. J. Bibangco, "Automated Fruit Classification Using Deep
Convolutional Neural Network," Philippine Social Science Journal, vol. 3, no. 2, pp. 177–
178, Nov. 2020, doi: 10.52006/main.v3i2.188.

[6] S. Naik and B. Patel, “Machine Vision based Fruit Classification and Grading - A
Review,” International Journal of Computer Applications, vol. 170, no. 9, pp. 22–34, Jul.
2017, doi: 10.5120/ijca2017914937.

[7] A. R. Jiménez, A. K. Jain, R. Ceres, and J. L. Pons, “Automatic fruit recognition: a survey
and new results using Range/Attenuation images,” Pattern Recognition, vol. 32, no. 10, pp.
1719–1736, Oct. 1999, doi: 10.1016/s0031-3203(98)00170-8.

[8] M. S. Hossain, M. Al-Hammadi, and G. Muhammad, "Automatic Fruit Classification
Using Deep Learning for Industrial Applications," IEEE Transactions on Industrial
Informatics, vol. 15, no. 2, pp. 1027–1034, Feb. 2019, doi: 10.1109/tii.2018.2875149.

[9] W. Astuti, S. Dewanto, K. E. N. Soebandrija, and S. Tan, "Automatic fruit classification
using support vector machines: a comparison with artificial neural network," IOP Conference
Series: Earth and Environmental Science, vol. 195, p. 012047, Dec. 2018, doi: 10.1088/1755-
1315/195/1/012047.

[10] Y. Zhang, S. Wang, G. Ji, and P. Phillips, “Fruit classification using computer vision and
feedforward neural network,” Journal of Food Engineering, vol. 143, pp. 167–177, Dec.
2014, doi: 10.1016/j.jfoodeng.2014.07.001.

[11] L. Ju-xia, “License Plate Recognition Model Research Based on the Multi-Feature
Technology,” TELKOMNIKA Indonesian Journal of Electrical Engineering, vol. 12, no. 4,
Apr. 2014, doi: 10.11591/telkomnika.v12i4.4307.
[12] R. Laroca, L. A. Zanlorensi, G. R. Gonçalves, E. Todt, W. R. Schwartz, and D. Menotti,
"An efficient and layout-independent automatic license plate recognition system based on the
YOLO detector," IET Intelligent Transport Systems, vol. 15, no. 4, pp. 483–503, Feb. 2021,
doi: 10.1049/itr2.12030.

[14] B. Li, Z. Y. Zeng, H. L. Dong, and X. M. Zeng, “Automatic License Plate Recognition
System,” Applied Mechanics and Materials, vol. 20–23, pp. 438–444, Jan. 2010, doi:
10.4028/www.scientific.net/amm.20-23.438.

[15] H. M. Li, “Deep Learning for Image Denoising,” International Journal of Signal
Processing, Image Processing and Pattern Recognition, vol. 7, no. 3, pp. 171–180, Jun.
2014, doi: 10.14257/ijsip.2014.7.3.14.

[16] M. Tummala, "Image Classification Using Convolutional Neural
Networks," International Journal of Scientific and Research Publications (IJSRP), vol. 9, no.
8, p. p9261, Aug. 2019, doi: 10.29322/ijsrp.9.08.2019.p9261.

[20] X. Wan, X. Zhang, and L. Liu, “An Improved VGG19 Transfer Learning Strip Steel
Surface Defect Recognition Deep Neural Network Based on Few Samples and Imbalanced
Datasets,” Applied Sciences, vol. 11, no. 6, p. 2606, Mar. 2021, doi: 10.3390/app11062606.

[21] P. Ganakwar, "Convolutional Neural Network-VGG16 for Road Extraction from
Remotely Sensed Images," International Journal for Research in Applied Science and
Engineering Technology, vol. 8, no. 8, pp. 916–922, Aug. 2020, doi:
10.22214/ijraset.2020.30796.

[22] Y. Tian, "Artificial Intelligence Image Recognition Method Based on Convolutional
Neural Network Algorithm," IEEE Access, vol. 8, pp. 125731–125744, 2020, doi:
10.1109/access.2020.3006097.

[23] A. Sharma and G. Phonsa, “Image Classification Using CNN,” SSRN Electronic
Journal, 2021, doi: 10.2139/ssrn.3833453.

[24] X.-C. Yin, Q. Liu, H.-W. Hao, Z.-B. Wang, and K. Huang, “FMI image based rock
structure classification using classifier combination,” Neural Computing and Applications,
vol. 20, no. 7, pp. 955–963, Jun. 2010, doi: 10.1007/s00521-010-0395-3.

[25] S. Sharma, A. Soni, and V. Malviya, “Face Recognition Based on Convolution Neural
Network (CNN) Applications in Image Processing: A Survey,” SSRN Electronic Journal,
2019, doi: 10.2139/ssrn.3372193.
[26] N. Nour, M. Elhebir, and S. Viriri, “Face Expression Recognition using Convolution
Neural Network (CNN) Models,” International Journal of Grid Computing & Applications,
vol. 11, no. 4, pp. 1–11, Dec. 2020, doi: 10.5121/ijgca.2020.11401.

[27] A. V. Savchenko, "Deep neural networks and maximum likelihood search for
approximate nearest neighbor in video-based image recognition," Optical Memory and
Neural Networks, vol. 26, no. 2, pp. 129–136, Apr. 2017, doi: 10.3103/s1060992x17020102.

[28] L. Jing, "Research on Image Recognition Method Based on Deep Learning," Journal of
Physics: Conference Series, vol. 1395, p. 012008, Nov. 2019, doi: 10.1088/1742-
6596/1395/1/012008.

[29] A. Qayyum et al., “Image classification based on sparse-coded features using sparse
coding technique for aerial imagery: a hybrid dictionary approach,” Neural Computing and
Applications, vol. 31, no. 8, pp. 3587–3607, Dec. 2017, doi: 10.1007/s00521-017-3300-5.

[30] M. Amrani and F. Jiang, “Deep feature extraction and combination for synthetic aperture
radar target classification,” Journal of Applied Remote Sensing, vol. 11, no. 04, p. 1, Oct.
2017, doi: 10.1117/1.jrs.11.042616.

[31] D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, "Multi-column deep neural network
for traffic sign classification," Neural Networks, vol. 32, pp. 333–338, Aug. 2012, doi:
10.1016/j.neunet.2012.02.023.

[32] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, "Face recognition: a
convolutional neural-network approach," IEEE Transactions on Neural Networks, vol. 8, no.
1, pp. 98–113, 1997, doi: 10.1109/72.554195.
