Image Classification With Artificial Intelligence Cats Vs Dogs
Abstract: With the rise of techniques such as big data and artificial intelligence, more capable methods are now available for image classification, and classification accuracy has improved remarkably. To compare and analyze the classification performance of machine learning and deep learning approaches, this paper implements a support vector machine (SVM) and a convolutional neural network (CNN) to solve the classical Cats vs Dogs problem, and compares how different parameters affect the CNN. The lessons learned are summarized and presented in this paper.
Keywords: Support Vector Machine, Convolutional Neural Network, Artificial Intelligence, Image Classification
I. INTRODUCTION
Image classification problems are ubiquitous in daily life. For instance, in garbage classification, if we do not know which category a piece of trash belongs to, we only need to take a picture of it with a smartphone. The relevant app on the phone will automatically identify the information in the picture and classify its type. To perform image classification, computer programs can use handcrafted features of pictures, such as edge information and color distribution, to separate different kinds of objects.
Before the rise of smartphones and artificial intelligence, people already had the idea of classifying images and had built multiple tools for it. OpenCV [1], born in 1998, for instance, is the most popular open source tool in the computer vision area. It provides interfaces for programming languages such as Python, MATLAB and Java, and has been widely applied in both academia and industry. OpenCV not only provides basic manipulations such as drawing, saving and converting pictures, but also provides basic functions for edge extraction and image classification. With the rise of smartphones, these basic picture manipulation functions have gradually moved from computers to smartphones, enabling people to use them anytime and anywhere. Although the image classification tools represented by OpenCV provide classification functions, their implementations are all based on empirical parameter settings. As a result, they are difficult to adapt to complex situations and the increasing demands of big data.
After the rise of A.I. technology, researchers collected large numbers of images and contributed many large-scale open datasets [2-5]. With these datasets, researchers proposed many machine learning models and effective training algorithms. Machine learning models can learn suitable parameters from data automatically, which requires fewer manual operations [6, 7]. After the rise of deep learning, a wide variety of neural network models, especially convolutional neural networks, have shown impressive performance on image classification problems [8-10].
In this paper, we take the classical Cats vs Dogs problem as an example, and implement and compare the SVM and different structures of CNN models on a real-world dataset.
The contributions of this paper are summarized as follows:
(1) We validate that the deep learning models represented by CNN outperform the shallow machine learning models represented by SVM in the problem of image classification;
(2) We give a comprehensive evaluation of the influence of different parameters.
The findings of this paper are as follows:
(1) The data augmentation technique is helpful in improving the deep learning model's performance;
(2) In the choice of optimizers, Adam is better than both SGD and RMSprop;
(3) In the choice of the convolution kernel, the size of 3 is better than the size of 5;
(4) In the choice of the activation function, LeakyReLU is better than ReLU.
II. RELATED WORK
A. Digital Image Processing [11]
Since computers came into being, images have been stored and processed by computers. A single-channel (grayscale) image is represented as pixels with values ranging from 0 to 255. Image processing software tools including OpenCV can perform basic operations such as drawing, copying, displaying and saving. They also support affine transformations such as translation and rotation. In addition, blurred images can be recovered by noise filters, and edges can be extracted by detecting changes between nearby pixels.
These are traditional pre-processing techniques for pictures. After this processing, researchers apply machine learning models such as SVM to classify the images. The emergence of deep learning techniques greatly simplifies this process. After [...]
Authorized licensed use limited to: ULAKBIM UASL - Sutcu Imam Universitesi. Downloaded on October 26,2022 at 12:34:06 UTC from IEEE Xplore. Restrictions apply.
B. Deep Learning [12]
IV. MODEL DESCRIPTION
This section introduces the various models we use, including machine learning models represented by SVM and deep learning models represented by CNN.
Figure 1. The CNN model used in this study.
b. Data augmentation
Data augmentation generates similar but different training samples by applying a series of random changes to the training images, thereby expanding the size of the training set. Randomly changing the training samples can reduce the model's dependence on certain attributes, thereby improving its generalization ability. For example, we can crop an image in different ways so that the object of interest appears in different positions, reducing the model's dependence on the position of the object. We can also adjust factors such as brightness and color to reduce the model's sensitivity to color. Therefore, we added data augmentation without changing the model to see whether the accuracy of the model improves.
V. RESULTS AND ANALYSIS
A. Experiment Setting
We conducted a series of experiments in this paper. The experiments were run on a desktop computer with the Windows 10 OS. Python is used as the main programming language. OpenCV is used to manipulate the images. The SVM model is implemented with scikit-learn and the deep learning models are implemented with TensorFlow.
The parameters for training the convolutional neural networks are set as follows: the number of training epochs is 30, the batch size is 10, the loss function is cross entropy and the final evaluation metric is accuracy.
Specifically, we conducted the following experiments. First, we compared the performance of the SVM and CNN models. Both models used the same original dataset. Then, for the CNN model, we compared the cases with and without data augmentation. Next, we kept using data augmentation and instead changed different parameters. The first change is the optimizer, replacing Adam with SGD and RMSprop. The second change is the kernel size, replacing 3 with 5. The third and last change is the activation function, replacing ReLU with LeakyReLU. We show these two functions in Figure 2, in which the parameter a is set to 0.1 empirically.
B. Result Analysis
We show the test accuracy for different models in Figure 3.
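The loss function and evaluation metric named in the experiment setting (cross entropy and accuracy) can be made concrete with a small worked example. The following is a minimal sketch rather than the code used in this study: it assumes a binary formulation with one predicted probability per image, and the label encoding (0 = cat, 1 = dog), the sample values and the function names are hypothetical.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean cross entropy between 0/1 labels and predicted P(label = 1)."""
    total = 0.0
    for t, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p >= threshold) == t for t, p in zip(labels, probs))
    return correct / len(labels)


# Hypothetical labels (assumed encoding: 0 = cat, 1 = dog) and model outputs.
labels = [0, 1, 1, 0]
probs = [0.1, 0.8, 0.4, 0.3]
loss = binary_cross_entropy(labels, probs)
acc = accuracy(labels, probs)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```

In this toy batch the third image is misclassified, so the accuracy drops below 1 while the loss also reflects how confident each prediction was.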
[Figure 2: plots of the two activation functions. ReLU: f(y) = y for y > 0 and f(y) = 0 otherwise. LeakyReLU: f(y) = y for y > 0 and f(y) = ay otherwise.]
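To make the difference between the two activation functions concrete, the sketch below evaluates both on a few sample inputs, using a = 0.1 as stated in the experiment setting. It is an illustrative plain-Python version, not the TensorFlow layers used in the experiments, and the function names are hypothetical.

```python
def relu(y: float) -> float:
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return y if y > 0 else 0.0


def leaky_relu(y: float, a: float = 0.1) -> float:
    """LeakyReLU: like ReLU, but negative inputs are scaled by a."""
    return y if y > 0 else a * y


samples = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(y) for y in samples]
leaky_out = [leaky_relu(y) for y in samples]
for y, r, l in zip(samples, relu_out, leaky_out):
    print(f"y={y:+.1f}  relu={r:.2f}  leaky_relu={l:.2f}")
```

The two functions agree for positive inputs. For negative inputs ReLU outputs zero, while LeakyReLU keeps a small slope, so its output still varies with the input.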
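The data augmentation compared in these experiments relies on random changes such as cropping and brightness adjustment. The sketch below illustrates those two operations on a toy grayscale image stored as a list of rows. It is a simplified stand-in under stated assumptions, not the pipeline used in this study: the crop size, brightness range, seed and function names are all hypothetical, and in practice one would use the utilities of OpenCV or TensorFlow.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def random_brightness(image, max_delta, rng):
    """Shift all pixels by one random offset, clipped to 0..255."""
    delta = rng.randint(-max_delta, max_delta)
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augment(image, rng):
    """Apply a random crop followed by a random brightness shift."""
    image = random_crop(image, crop_h=3, crop_w=3, rng=rng)
    return random_brightness(image, max_delta=30, rng=rng)


# Toy 4x4 grayscale image (list of rows), pixel values in 0..255.
image = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
augmented = [augment(image, rng) for _ in range(5)]
for sample in augmented:
    print(sample)
```

Each call yields a slightly different 3x3 sample from the same source image, which is how augmentation enlarges the effective training set without collecting new pictures.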
Figure 4. The change of accuracy when data augmentation is not used.
Figure 5. The change of loss when data augmentation is not used.
VI. CONCLUSION
In this paper, we take the cat and dog classification problem as an example, evaluate the machine learning model represented by SVM and the deep learning model represented by CNN, and verify that CNN is superior to SVM in image classification. We also analyze the influence of different parameters on the CNN model. Our results can provide a reference for selecting the corresponding parameters in similar problems.
REFERENCES
[1] Bradski G, Kaehler A. Learning OpenCV: Computer vision with the OpenCV library[M]. O'Reilly Media, Inc., 2008.
[2] Deng L. The MNIST database of handwritten digit images for machine learning research [best of the web][J]. IEEE Signal Processing Magazine, 2012, 29(6): 141-142.
[3] Cohen G, Afshar S, Tapson J, et al. EMNIST: Extending MNIST to handwritten letters[C]//2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017: 2921-2926.
[4] Jiang W. MNIST-MIX: A Multi-language Handwritten Digit Recognition Dataset[J]. IOP SciNotes, 2020, 1(025002).
[5] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009: 248-255.
[6] Kouropteva O, Okun O, Pietikäinen M. Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine[C]//ESANN. 2003: 229-234.
[7] Greeshma K V, Sreekumar K. Fashion-MNIST classification based on HOG feature descriptor using SVM[J]. International Journal of Innovative Technology and Exploring Engineering, 2019, 8: 960-962.
[8] Sharma N, Jain V, Mishra A. An analysis of convolutional neural networks for image classification[J]. Procedia Computer Science, 2018, 132: 377-384.
[9] Jiang W, Zhang L. Edge-SiamNet and Edge-TripleNet: New deep learning models for handwritten numeral recognition[J]. IEICE Transactions on Information and Systems, 2020, 103(3): 720-723.
[10] Jiang W. Evaluation of deep learning models for Urdu handwritten characters recognition[C]//Journal of Physics: Conference Series. IOP Publishing, 2020, 1544(1): 012016.
[11] Gonzalez R C, Woods R E, Eddins S L. Digital image processing using MATLAB[M]. Pearson Education India, 2004.
[12] Goodfellow I, Bengio Y, Courville A, et al. Deep learning[M]. Cambridge: MIT Press, 2016.