MINISTRY OF HIGHER EDUCATION AND SCIENTIFIC RESEARCH

UNIVERSITY OF DJILALI BOUNAAMA KHEMIS MILIANA

FACULTY OF SCIENCE AND TECHNOLOGY

DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE

THESIS PRESENTED TO OBTAIN

THE MASTER'S DEGREE IN «COMPUTER SCIENCE»

OPTION: «SOFTWARE ENGINEERING AND DISTRIBUTED SYSTEMS»

Classification of Parking Images Based on Transfer Learning (Case Study: Xception)

Realized by:
RAMLA Mohammed Abdelhadi
ZERGANE Abdelkader

The jury composed of:
Mr. N. Azzouza: President
Mr. O. Boukadoum: Examiner
Mr. A. Khalfi: Supervisor

Academic year: 2022/2023.


DEDICATION

I would like to express my gratitude to Allah for His blessings and guidance throughout this journey.

I am also grateful to all the professors who have imparted their knowledge and wisdom, no matter how small, and I extend my heartfelt thanks to my parents, brothers, and all my friends, without exception, for their unwavering support and encouragement.

I would also like to express my sincere appreciation to Mr. Khalfi for his invaluable contributions to this humble work.

- Abdelkader and Mohammed -

THANKS

We thank Allah, who helped us to achieve this work.

We also thank our supervisor KHALFI Ali for his help and advice on this work.

We would also like to thank the members of the jury for accepting to review and assess this work.

ABSTRACT

In recent decades, there has been a significant rise in the number of vehicles, leading cities to
grapple with the persistent issue of parking. Particularly in highly desirable neighborhoods,
the demand for parking spaces continues to grow, which negatively impacts the overall quality
of living environments.

To address this challenge, we propose utilizing a pre-trained Xception model combined with
artificial neural networks, specifically convolutional neural networks, and transfer learning tech-
niques for the classification of parking spaces as either free or occupied. We will employ the
PKLot image database as our dataset for training and evaluation.

Our approach involves leveraging the existing Xception model and adapting it to new datasets to
enhance the accuracy and effectiveness of parking space classification. By utilizing convolutional
neural networks and transfer learning, we aim to improve upon the results obtained thus far.

Through this work, we will explore the potential of the Xception model in conjunction with new
datasets to achieve better performance in the classification of parking spaces. Our objective
is to contribute to the advancement of parking management systems and provide insights for
future research and development in this area.

Keywords: PKLot, convolutional neural networks, Xception model, datasets, transfer learning.

Résumé

In recent decades, there has been a significant increase in the number of vehicles, leading cities to grapple with the persistent problem of parking. Particularly in highly sought-after neighborhoods, the demand for parking spaces continues to grow, which negatively impacts the overall quality of living environments.

To address this challenge, we propose using a pre-trained Xception model combined with artificial neural networks, specifically convolutional neural networks, along with transfer learning techniques for classifying parking spaces as free or occupied. We will use the PKLot image database as our dataset for training and evaluation.

Our approach consists of leveraging the existing Xception model and adapting it to new datasets in order to improve the accuracy and effectiveness of parking space classification. By using convolutional neural networks and transfer learning, we aim to improve on the results obtained so far.

Through this work, we will explore the potential of the Xception model in conjunction with new datasets to achieve better performance in classifying parking spaces. Our objective is to contribute to the advancement of parking management systems and to provide insights for future research and development in this area.

Keywords: Xception model, artificial neural networks, transfer learning techniques, PKLot database.

CONTENTS

DEDICATION i

THANKS ii

ABSTRACT iii

Résumé iv


CONTENTS vi

LIST OF FIGURES x

General Introduction 1

1 Images concepts and analysis 2


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Image concepts : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 definition : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 image file format : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 image color formats : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.4 image characteristics : . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 image processing : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 definition : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 image processing methods : . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Image similarity : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12


1.4.1 pixel-based similarity: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12


1.4.2 feature-based similarity: . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 conclusion : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 State of the art 14


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Artificial Intelligence(AI): . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Machine learning(ML) : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 definition : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Types of ML : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 supervised learning : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.2 unsupervised learning : . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.3 reinforcement learning : . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.4 Neural Network : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Deep learning : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.1 definition : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5.2 method : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Transfer Learning : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.1 definition : . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6.2 transfer learning techniques : . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.7 Exploring Work Methods for Car Park Picture Classification: . . . . . . . . . . . 25
2.7.1 feature extraction-based methods: . . . . . . . . . . . . . . . . . . . . . 25
2.7.2 Deep Learning Based Methods : . . . . . . . . . . . . . . . . . . . . . . 26
2.7.3 Transfer learning Based Methods : . . . . . . . . . . . . . . . . . . . . . 28
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3 Transfer Learning with Xception model for Parking images classification 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 The Xception model architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 The InceptionV3 model architecture . . . . . . . . . . . . . . . . . . . . . . . . 33


3.4 Xception reuse method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34


3.5 Fine-tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Network architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.1 depth and Width of Neural Network . . . . . . . . . . . . . . . . . . . . 35
3.6.2 convolution layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.3 averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.4 activation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7 Learning and optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7.1 optimization algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.7.2 learning rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.7.3 epochs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8 Regularization techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8.1 batch normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.8.2 data augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.9 Concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.10 Presentation of the architectures of our proposals . . . . . . . . . . . . . . . . . 38
3.10.1 Model 1: Xception Model with Additional Convolution Layer and Regu-
larization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.10.2 Model 2 : Xception Model Concatenated with Inception Model . . . . . 40
3.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4 Implementation and discussion of results 43
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2 Hardware used in the implementation . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Software and libraries used in the implementation . . . . . . . . . . . . . . . . . 44
4.3.1 Kaggle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3.2 TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3.3 Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3.4 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3.5 Matplotlib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


4.3.6 NumPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.7 Scikit-learn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.3.8 summary of the PKLot database . . . . . . . . . . . . . . . . . . . . . . 51
4.3.9 Sub Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.3.10 Compilation parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4 Presentation of the results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.1 Model 1: Xception Model with Additional Convolution Layer and Regu-
larization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.2 Model 2 : Xception Model Concatenated with Inception Model . . . . . . 59
4.5 Discussion of results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5.1 Compare our results with Different works . . . . . . . . . . . . . . . . . . 64
4.6 conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

List of Figures

1.1 Pixel file versus Vector file. [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


1.2 the RGB colors. [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 the CMYK colors. [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Neural Networks.[14] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 the similarity between two images. [17] . . . . . . . . . . . . . . . . . . . . . . . 12

2.1 AI vs ML vs DL. [19] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15


2.2 Types of Machine Learning. [21] . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Neural Network.[23] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4 Neural Network layer. [24] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 convolutional neural network(CNN). [25] . . . . . . . . . . . . . . . . . . . . . . 20
2.6 Pooling layer operation approaches. [26] . . . . . . . . . . . . . . . . . . . . . 21
2.7 Transfer learning. [30] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.8 Feature Extraction-based Methods . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.9 Deep learning based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.10 Transfer Learning Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.1 The Xception model architecture [28] . . . . . . . . . . . . . . . . . . . . . . . . 33


3.2 The Inception model architecture [29] . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3 The First Model Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 the second model schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.1 Google Kaggle Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45


4.2 TensorFlow Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Keras Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.4 python Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.5 matplotlib Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6 NumPy Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.7 scikit learn Logo. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.8 Example of images of empty and occupied parking spaces . . . . . . . . . . . . . 52
4.9 images of parking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.10 The result of Xception model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.11 accuracy and loss of the first model for the base PUC. . . . . . . . . . . . . . 56
4.12 accuracy and loss of the first model for the base UFPR04. . . . . . . . . . . . 57
4.13 accuracy and loss of the first model for the base UFPR05. . . . . . . . . . . . 58
4.14 The result of Xception and inception model . . . . . . . . . . . . . . . . . . . . 59
4.15 accuracy and loss of the second model for the base PUC. . . . . . . . . . . . . 60
4.16 accuracy and loss of the second model for the base UFPR04. . . . . . . . . . . 61
4.17 accuracy and loss of the second model for the base UFPR05. . . . . . . . . . . 62
General Introduction

Today, artificial intelligence-based applications have become integrated into various aspects of
our daily lives across nearly all sectors, including industry, commerce, healthcare, education,
and other services. These sectors benefit greatly from the capabilities offered by artificial
intelligence.

The proliferation of vehicles has resulted in several negative impacts, such as traffic congestion,
air pollution, noise levels, and limited availability of parking spaces. The current solution
proposed to address these issues is the development of an intelligent parking system. This
smart parking system provides information about parking space availability in a given area,
aiming to alleviate congestion in parking areas.

Deep learning, specifically convolutional neural networks (CNNs), offers a viable approach for
solving parking space classification problems. The objective of this study is to modify the
architecture of CNNs, which are a part of deep learning, to effectively classify parking spaces.

In this research, we will utilize a pretrained Xception model as the foundation for our pro-
posed model to detect parking occupancy using traffic network images. We will leverage the
capabilities of convolutional neural networks (CNNs) and transfer learning techniques.

Our work is divided into four chapters:

• 1st chapter: Images concepts and analysis.

• 2nd chapter: State of the art, discussing the existing literature and research in the field.

• 3rd chapter: The proposed model, outlining our approach and modifications to the CNN
architecture.

• 4th chapter: Implementation and discussion of results, where we present our experimental
implementation and analyze the obtained outcomes.
Chapter 1

Images concepts and analysis


1.1 Introduction

Images have become one of the most important means of communication in today’s world,
allowing us to convey complex ideas and information with ease. In the field of computer
vision, images are analyzed and processed to extract valuable information, identify patterns,
and make decisions. In this chapter, we will present the fundamental concepts of images and
explore the visual similarity between them. By understanding these concepts, we can gain a
deeper understanding of how images are processed and analyzed in computer vision, and how
they can be used to drive insights and inform decision-making.

1.2 Image concepts :

1.2.1 definition :

An image can be defined as a visual representation of information, typically in a two-dimensional


format that can be displayed on a screen or printed on a physical medium. This definition is
commonly used in the field of computer vision and image processing. For instance, Gonzalez
and Woods (2017) provide an example of how images can be defined as a function of two
variables, x and y, which correspond to the spatial coordinates of a pixel in the image. This
function maps each pixel location to an intensity or color value, thereby forming a complete
visual representation of the image. [1]

1.2.2 image file format :

1.2.2.1 vector graphics :

Vector graphics are digital artwork in which points, lines, and curves are calculated by a computer. They are essentially large mathematical equations; you can assign a color, border, or thickness to each equation to turn your shapes into works of art. Here are some examples of vector graphics file formats:

1.2.2.1.1 PDF : It is an image format used to display documents and graphics correctly, regardless of device, application, operating system, or web browser. It also has a strong foundation of vector graphics.


1.2.2.1.2 SVG : (Scalable Vector Graphics) describes the image as an XML application; as the name indicates, it can be scaled to any size without losing quality.

1.2.2.1.3 EPS : (Encapsulated PostScript) includes both vector and bitmap data. Typically, an EPS file contains a single design element that can be used in a larger design.

1.2.2.2 raster images (bitmap) :

Bitmap images are made up of a grid of dots called pixels, where each pixel is assigned a color. Unlike vector images, bitmap images are resolution-dependent, which means they exist at a single fixed size. Here are some examples of bitmap file formats:

1.2.2.2.1 JPEG : is a graphic image file produced according to the Joint Photographic
Experts Group standard. This group of experts develops and maintains standards for a suite of compression algorithms for computer image files.

1.2.2.2.2 GIF : stands for Graphics Interchange Format. GIFs use a two-dimensional (2D)
raster data type and are binary-encoded.

1.2.2.2.3 PNG : is the Portable Network Graphics file format for image compression. It
provides several improvements over the GIF format.

1.2.2.2.4 TIFF : (Tag Image File Format) is a standard format for exchanging raster
graphic (bitmap) images between application programs, including those used for scanner im-
ages.


Figure 1.1: Pixel file versus Vector file. [2]

1.2.3 image color formats :

1.2.3.1 RGB :

(Red, Green and Blue) refers to a system for representing the colors used on a digital display screen, which creates any color you need by mixing red, green, and blue and varying their intensity (known as additive mixing).


Figure 1.2: the RGB colors. [3]

1.2.3.2 CMYK :

(Cyan, Magenta, Yellow, Key/Black) is the color space for printed materials, which combines CMYK colors to varying degrees with physical ink (known as subtractive mixing).


Figure 1.3: the CMYK colors. [4]

1.2.4 image characteristics :

1.2.4.1 color :

Color in images refers to the visual sensation produced by the spectral composition of light
that is captured by an imaging system. In digital images, color is typically represented using
a combination of red, green, and blue (RGB) values, which specify the intensity of each color
channel at each pixel location. Color can be defined as a fundamental attribute of images that
can provide important visual information about the underlying objects or scenes.[5]

1.2.4.2 texture :

Texture in images refers to the visual patterns or structures that are repeated over a local
region of the image. Texture is an important characteristic of images that can convey important


information about the underlying objects or scenes. Texture can be defined as a spatial pattern
that exhibits a degree of repetition and can be described by statistical measures such as mean,
variance, and co-occurrence matrices. [6]

1.2.4.3 resolution :

Resolution in images refers to the amount of detail or information that can be captured and
displayed by an imaging system. It is typically measured in terms of the number of pixels in
the image, with higher resolutions corresponding to larger numbers of pixels and finer levels
of detail. Image resolution can be defined as the total number of pixels in the image, which
determines the level of detail that can be captured. [7]

1.2.4.4 luminance :

Luminance in images refers to the perceived brightness of an image, which is determined by


the amount of light that is reflected or emitted from the image. Luminance is an important
characteristic of images that can have a significant impact on their visual quality and appear-
ance. Luminance can be defined as the component of an image that represents the intensity of
light, without taking into account any color information. [8]

1.2.4.5 edge :

Edges in images refer to the boundaries or transitions between regions of different intensities or
colors. They are important features of images that can provide valuable information about the
underlying objects or scenes. Edges can be defined as the points or lines of maximum intensity
gradient in an image. [9]

1.3 image processing :

1.3.1 definition :

Image processing is the manipulation of digital images using mathematical operations or al-
gorithms to improve their visual quality, extract information, or perform analysis. Image pro-
cessing can be described as a broad and interdisciplinary field that draws on concepts and


techniques from mathematics, physics, computer science, and engineering. Image processing
can be used for a wide range of applications, including image enhancement, restoration, seg-
mentation, compression, and recognition. [1]

1.3.2 image processing methods :

1.3.2.1 image Editing :

Image editing involves altering or enhancing digital images, ranging from basic adjustments such as cropping, resizing, and color or brightness correction to more advanced manipulations such as retouching and compositing images. [10]

1.3.2.2 image restoration:

Image restoration involves the process of enhancing the quality of a digital image that has been
degraded by various forms of distortion, such as noise or blur. The goal of image restoration is
to recover the original image as much as possible, while taking into account the specific type
of degradation that has occurred. This typically requires the use of mathematical models and
algorithms, such as filtering and deconvolution, to remove or reduce the distortion in the image.
[11]

1.3.2.3 independent component analysis (ICA):

It is a computational method used for separating a multivariate signal into independent, non-
Gaussian components. ICA assumes that the observed signal is a linear combination of un-
known, independent sources, and its goal is to estimate the sources from the observations with-
out any prior knowledge of the mixing process. ICA is widely used in various fields, including
signal processing, image analysis, and neuroscience, among others. [12]


1.3.2.4 anisotropic diffusion :

It is a technique used for smoothing images and reducing noise in a selective manner that
preserves high-contrast features such as edges while smoothing out regions of low contrast.
This method applies a diffusion process that varies across the image according to the local
image structure. This technique finds applications in a wide range of fields, including image
processing, computer vision, and medical imaging.[13]

1.3.2.5 linear filtering:

It is a common signal and image processing technique that involves modifying an input signal
or image using a linear combination of weighted coefficients. Linear filters can be used for
various tasks, such as smoothing, sharpening, edge detection, and noise reduction. These filters
are typically defined by a kernel or mask that specifies the relative weights for each pixel or
sample in the filter window. Linear filtering is a powerful tool for manipulating signals and
images, with numerous applications in fields such as computer vision, audio processing, and
telecommunications. [11]
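As a small illustration of this idea, the sketch below applies a 3x3 averaging (box) kernel, a simple smoothing linear filter, to a random array standing in for a grayscale image; the image and the kernel are placeholders chosen only for the example.

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64)  # stand-in for a grayscale image

# 3x3 box kernel: every output pixel is the mean of its 3x3 neighborhood
kernel = np.ones((3, 3)) / 9.0
smoothed = ndimage.convolve(img, kernel, mode="nearest")
print(smoothed.shape)  # (64, 64): same size, smoothed content
```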

1.3.2.6 neural networks :

Neural networks are computational models used in machine learning to solve various tasks. A neural network is a series of algorithms that recognizes relationships between vast amounts of data; its name and structure are inspired by the human brain, mimicking the way biological neurons signal to one another. Neural networks rely on training data to learn and improve their accuracy over time. Once these learning algorithms are fine-tuned for accuracy, they allow us to classify and cluster data at high velocity.


Figure 1.4: Neural Networks.[14]

1.3.2.7 segmentation :

Segmentation is the process of partitioning an image into multiple regions or objects, with the
goal of identifying and separating regions of interest from the background or other regions.
Segmentation can be performed using various techniques, such as thresholding, region growing,
edge detection, and clustering. Segmentation is a fundamental task in image processing and
computer vision, with applications in fields such as medical imaging, robotics, and remote
sensing. [11]

1.3.2.8 wavelets :

Wavelets are mathematical functions that can be used to decompose signals and images into
different frequency components with localized support in both time and frequency domains.
Unlike Fourier analysis, which uses harmonic sine and cosine functions to represent signals,
wavelet analysis allows for more flexible representations of signals with non-stationary behavior
and time-varying frequencies. Wavelets have found widespread applications in signal and image
processing, including compression, denoising, edge detection, and feature extraction. [15]

1.3.2.9 Self-organizing Maps (SOM):

Self-organizing maps (SOM) are a type of artificial neural network that can be used for un-
supervised learning and data visualization. SOMs are designed to represent high-dimensional
input data onto a two-dimensional grid of neurons, where neurons that are close to each other


in the grid correspond to similar input vectors. SOMs use a competitive learning process to
adjust the weights of the neurons gradually, based on the input data, until the neurons form a
topological map of the input space. SOMs have a wide range of applications in fields such as
data mining, image processing, and pattern recognition, among others.[16]

1.4 Image similarity :

Image similarity is a measure of how much two or more images resemble each other. It is
a critical concept in computer vision, and it finds applications in various tasks such as image
retrieval, object recognition, and image clustering. For instance, in image retrieval, the aim is
to find images that are similar to a query image. This can be accomplished by computing a
similarity score between the query image and the images in the database. By measuring the
similarity between images, we can better understand the content of the image and gain insights
into how images are related to each other.

Figure 1.5: the similarity between two images. [17]

There are several methods to measure image similarity, which depend on the application and
the types of images being compared. Here are some common approaches:

1.4.1 pixel-based similarity:

This method involves computing the similarity between images based on the pixel values. For
instance, we can compute the Euclidean distance or cosine similarity between the pixel values
of two images. However, this approach may not be robust to changes in lighting, orientation, and scale.
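As a minimal sketch (assuming two equal-size grayscale images already loaded as NumPy arrays), both measures can be computed directly on the flattened pixel values:

```python
import numpy as np

def pixel_similarity(img1, img2):
    """Pixel-based comparison of two equal-size images."""
    a = img1.astype(np.float64).ravel()
    b = img2.astype(np.float64).ravel()
    euclidean = np.linalg.norm(a - b)                          # 0.0 for identical images
    cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # 1.0 for identical images
    return euclidean, cosine

img1 = np.random.rand(64, 64)                 # stand-ins for two grayscale images
img2 = img1 + 0.05 * np.random.randn(64, 64)  # a slightly perturbed copy
print(pixel_similarity(img1, img2))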

1.4.2 feature-based similarity:

This method involves extracting features from the images and computing the similarity between the features. We can extract features using techniques such as SIFT, SURF, or deep learning-based approaches. This method is often more robust to changes in lighting, orientation, and scale.

1.4.3 structural similarity:

This method involves comparing the structure or layout of the images. For example, we can compute the similarity between the edges, contours, or shapes of the images.
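For structural comparison, scikit-image ships an SSIM implementation; a short sketch, assuming grayscale images with values scaled to [0, 1]:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

img1 = np.random.rand(64, 64)                 # stand-ins for two grayscale images
img2 = img1 + 0.05 * np.random.randn(64, 64)

# SSIM is close to 1.0 for structurally similar images
score = ssim(img1, img2, data_range=1.0)
print(f"SSIM: {score:.3f}")
```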

Overall, image similarity is a critical concept in computer vision, and it has numerous appli-
cations in various fields, including image retrieval, object recognition, and image clustering.
By measuring the similarity between images, we can gain a better understanding of the image
content and identify relationships between images.

1.5 conclusion :

In conclusion, this chapter provided an overview of the general concept of images and
their different components. We discussed the importance of image processing in the field of
machine learning, where it is used as a preprocessing step to extract and enhance features
from images. Image processing techniques can help to improve the quality of images, remove
noise, or transform the images to a different domain. The importance of image processing in
computer vision tasks such as image recognition, object detection, and image retrieval cannot
be overstated.

Chapter 2

State of the art


2.1 Introduction

Recent advances in machine learning and artificial intelligence have led to remarkable progress
in various fields, with deep learning being one of the most significant developments. This tech-
nology has revolutionized computer vision, audio processing, and many other domains, allowing
for dramatic improvements in image classification, facial recognition, speech recognition, nat-
ural language processing, and more. The performance level of deep learning is extraordinary,
and ongoing research and development will likely lead to even more impressive breakthroughs
in the future.

2.2 Artificial Intelligence(AI):

The term Artificial Intelligence (AI) was coined in the 1950s. It concerns the creation of machines that perform tasks that, if performed by a human, would require intelligence. The field is unique, sharing borders with mathematics, computer science, philosophy, psychology, biology, cognitive science, and many others. Artificial Intelligence is the field of computer science that focuses on creating computer systems capable of performing tasks that typically require human intelligence. It involves the development of algorithms and models that enable machines to learn from data, adapt to new information, and make informed decisions or take actions based on their understanding of a specific problem domain. [18]

Figure 2.1: AI vs ML vs DL. [19]


2.3 Machine learning(ML) :

2.3.1 definition :

Machine learning is a subfield of artificial intelligence (AI) that involves building algorithms
and models that can automatically learn and improve from experience without being explicitly
programmed. Machine learning algorithms use statistical techniques to analyze and learn from
data, and can be trained on large datasets to identify patterns and relationships that can be
used to make predictions or decisions. [20]

2.4 Types of ML :

2.4.1 supervised learning :

In supervised learning, the algorithm is trained on labeled data, where the correct output is
known for each input. The goal of it is to learn a mapping between inputs and outputs, so that
the algorithm can make accurate predictions on new, unseen data.

2.4.2 unsupervised learning :

In unsupervised learning, the algorithm is trained on unlabeled data and must identify patterns
and relationships on its own. The goal of it is to identify patterns or structure in the data,
without being given any specific labels or categories.

2.4.3 reinforcement learning :

Reinforcement learning involves training an algorithm to make decisions based on feedback


from its environment, with the goal of learning a policy that maximizes the cumulative reward
over a sequence of actions.


Figure 2.2: Types of Machine Learning. [21]

2.4.4 Neural Network :

2.4.4.1 definition :

A neural network is a type of machine learning algorithm that is inspired by the structure and
function of the human brain. Neural networks consist of interconnected nodes, or "neurons",
that process information and learn from data. The basic building block of a neural network
is the perceptron, which takes a set of inputs, applies a set of weights to those inputs, and
produces an output. Multiple perceptrons can be combined to form a layer. Neural networks
are trained using a process called backpropagation, which involves adjusting the weights of
the connections between neurons in response to errors in the output. During training, the
network is presented with a set of inputs and the correct output, and the weights are adjusted
to minimize the difference between the predicted output and the correct output.[22]


Figure 2.3: Neural Network.[23]

2.4.4.2 Neural Network layers :

In a neural network, layers are used to organize the neurons and their connections. The most
common types of layers in neural networks are:

2.4.4.3 input layer :

The input layer receives the raw data and passes it on to the next layer. The number of neurons
in the input layer is determined by the number of features in the input data.

2.4.4.4 hidden layers:

Hidden layers are intermediate layers that perform computations on the input data. The
number of hidden layers and the number of neurons in each hidden layer are determined by the
complexity of the problem being solved and the size of the dataset.

2.4.4.5 output layer :

The output layer produces the final output of the neural network. The number of neurons in
the output layer is determined by the number of classes or the number of regression targets.


For a convolutional layer, the output can be calculated using the convolution operation as follows:

Y(i, j, k) = Σ_d Σ_e Σ_c X(i + d, j + e, c) ∗ W(d, e, c, k) + b(k)

where:

Y(i, j, k) represents the value at position (i, j) in the k-th feature map. X(i + d, j + e, c) represents the value of the input image in channel c at offset (d, e) from position (i, j). W(d, e, c, k) represents the value at position (d, e) in the filter for input channel c and output channel k. b(k) represents the bias term for the k-th filter.
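The sketch below is a direct, unoptimized NumPy translation of this sum ("valid" padding, stride 1); the toy shapes are chosen only for illustration:

```python
import numpy as np

def conv2d_valid(X, W, b):
    """Direct implementation of the convolution sum above.
    X: (H, W, C) input image, W: (kh, kw, C, K) filters, b: (K,) biases."""
    H, Wd, C = X.shape
    kh, kw, _, K = W.shape
    out = np.zeros((H - kh + 1, Wd - kw + 1, K))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(K):
                # sum over filter rows d, filter columns e, and input channels c
                out[i, j, k] = np.sum(X[i:i+kh, j:j+kw, :] * W[:, :, :, k]) + b[k]
    return out

X = np.random.rand(5, 5, 3)     # toy 5x5 RGB input
W = np.random.rand(3, 3, 3, 2)  # two 3x3 filters
b = np.zeros(2)
print(conv2d_valid(X, W, b).shape)  # (3, 3, 2)
```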

Figure 2.4: Neural Network layer. [24]


2.4.4.6 convolutional layers:

These layers are used in convolutional neural networks for image and video processing tasks.
They apply filters to the input data to extract features that are important for the task. In
addition to these basic layers, there are also specialized layers that can be added to neural
networks for specific tasks.

Figure 2.5: convolutional neural network(CNN). [25]


2.4.4.7 recurrent layers:

These layers are used in recurrent neural networks for sequential data processing tasks. They
use feedback connections to process sequences of data, such as speech or text.

2.4.4.8 Pooling Layers:

These layers are used in convolutional neural networks to reduce the dimensionality of the
feature maps generated by the convolutional layers.

Figure 2.6: Pooling layer operation approaches. [26]


2.4.4.9 dropout layers:

These layers are used to prevent overfitting by randomly dropping out some of the neurons
during training.

2.5 Deep learning :

2.5.1 definition :

Deep learning is a type of machine learning that uses artificial neural networks (ANNs) to learn
and make predictions or decisions from data. Deep learning algorithms are designed to mimic
the structure and function of the human brain, with multiple layers of interconnected nodes that
can process information and extract features from data. Deep learning is particularly effective
for tasks such as image and speech recognition, natural language processing, and decision-
making in complex environments. For example, deep learning has been used to develop self-
driving cars, to improve medical imaging and diagnosis, and to develop more accurate language
translation systems.[26]

2.5.2 method :

There are several types of deep learning algorithms, each designed for specific tasks and appli-
cations. Some of the most common deep learning algorithms include:

2.5.2.1 Convolutional Neural Networks (CNNs):

CNNs are used for image and video recognition tasks, and are designed to automatically identify
spatial patterns in images and learn hierarchical representations of visual data.

2.5.2.2 Recurrent Neural Networks (RNNs):

RNNs are used for sequential data such as speech, text, and time series data. RNNs can process
input sequences of varying lengths, and use the output from previous time steps to inform their
current decision-making.


2.5.2.3 Generative Adversarial Networks (GANs):

GANs are used for generating new data, such as images or text, that are similar to a given
dataset. GANs consist of two neural networks - a generator network that generates new data,
and a discriminator network that tries to distinguish between real and generated data.

2.5.2.4 Autoencoders:

Autoencoders are used for unsupervised learning and feature extraction. They work by com-
pressing input data into a lower-dimensional representation, and then reconstructing the original
data from this representation.

2.5.2.5 Restricted Boltzmann machines (RBMs):

Restricted Boltzmann machines are a type of generative neural network that can learn to model
the probability distribution of a set of input data. RBMs consist of a layer of visible units and a
layer of hidden units, with connections between them that are weighted by learned parameters.

Deep learning algorithms:
supervised: CNN, RNN, GAN
unsupervised: RBM, autoencoders

Table 2.1: Deep learning algorithms

2.6 Transfer Learning :

2.6.1 definition :

Transfer learning is a machine learning technique that involves using a pre-trained model as a
starting point for a new task, rather than training a model from scratch on a new dataset. The
pre-trained model is typically a deep neural network that has been trained on a large dataset,
then it is used as a feature extractor, where the output of one or more layers in the network is
used as input to a new classifier that is trained on the new dataset.[27]
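As a sketch of this idea in Keras (the input size and the single sigmoid output for free/occupied spaces are illustrative choices, not the exact configuration used later in this thesis):

```python
from tensorflow import keras

# Pre-trained Xception as a frozen feature extractor, with a new binary head.
base = keras.applications.Xception(weights="imagenet", include_top=False,
                                   input_shape=(150, 150, 3))
base.trainable = False  # keep the learned ImageNet features fixed

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),  # free vs. occupied
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```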

2.6.2 transfer learning techniques :

There are several transfer learning techniques that can be used depending on the specific task
and the pre-trained model being used. Some common transfer learning techniques include:


Figure 2.7: Transfer learning. [30]

2.6.2.1 fine-tuning:

In fine-tuning, the pre-trained model is further trained on the new dataset with a small learning
rate, allowing the model to adapt to the new task while retaining the learned features. Fine-
tuning is often used for tasks such as image classification and natural language processing.
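Continuing the feature-extraction sketch above, fine-tuning might look like this; the number of unfrozen layers and the learning rate are illustrative values, not prescriptions:

```python
from tensorflow import keras

# `base` and `model` come from the feature-extraction sketch above.
base.trainable = True
for layer in base.layers[:-20]:  # unfreeze only the last ~20 layers
    layer.trainable = False

# Recompile with a small learning rate so pre-trained weights change gently.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # train_ds/val_ds are placeholders
```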

2.6.2.2 feature extraction:

In feature extraction, the pre-trained model is used as a fixed feature extractor, and a new
classifier is trained on top of the extracted features. This technique is often used for tasks such
as image retrieval and sentiment analysis.

2.6.2.3 multi-task Learning:

In multi-task learning, the pre-trained model is used to solve multiple related tasks simultane-
ously. The idea behind multi-task learning is that the model can learn to share information
across tasks, leading to better performance on all tasks.

2.6.2.4 domain adaptation:

In domain adaptation, the pre-trained model is adapted to a new domain that is similar but
not identical to the original domain. This technique is often used for tasks such as object recognition and speech recognition in different environments or languages.

2.7 Exploring Work Methods for Car Park Picture Classification:

2.7.1 feature extraction-based methods:

Several approaches have been proposed for classifying individual parking spaces in various
datasets. Bay et al. (2008) introduced an approach utilizing descriptor and color features,
achieving an impressive 95.5% overall accuracy on the PKLot dataset through the use of a
support vector machine (SVM) classifier. Their work demonstrated the effectiveness of their
method across different weather conditions and camera viewpoints. Amato et al. (2019a)
developed another approach for car park image classification, employing smart cameras and
techniques such as background modeling and the Canny edge detector. They achieved an
overall accuracy of 92.5% on the CNRPark-EXT dataset. Similarly, Raj et al. (2019) utilized
the Canny edge detector and combined it with a transformation to the LUV color space to
generate features for car park image classification. They outperformed other state-of-the-art
methods with an accuracy of 98.7% on the same CNRPark-EXT dataset, utilizing a random
forest classifier. Hammoudi et al. (2018a,b, 2019, 2020) proposed the use of Local Binary
Pattern (LBP)-based features for parking lot image classification. They employed a k-NN
classifier and tested their method on small image subsets from the PKLot dataset, ranging
from 3,000 to 6,000 segmented images. However, the specific method used to group the images
into subsets remains unclear. In Hammoudi et al. (2019), the authors also incorporated Support
Vector Machine (SVM) classifiers and evaluated changes within parking lots using a subset of
PKLot and the CNRPark-EXT dataset.

These various approaches highlight the ongoing research and development in the field of park-
ing space classification, with each method offering unique contributions and achieving notable
results.


Figure 2.8: Feature Extraction-based Methods

2.7.2 Deep Learning Based Methods :

Krizhevsky et al. (2012) introduced AlexNet, a deep convolutional neural network (CNN) ar-
chitecture for image classification. They evaluated their approach on the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) dataset and achieved a top-5 error rate of 15.3%. This
was significantly lower than the best-performing method at the time, demonstrating the effec-
tiveness of deep CNNs for image recognition tasks and contributing to the widespread adoption
of deep learning in computer vision. Amato et al. (2016) proposed a CNN-based approach for
parking lot occupancy detection. They utilized the CNRPark-EXT dataset and achieved an ac-
curacy of 98.73% for detecting the occupancy status of parking spaces in outdoor parking lots.
Their approach demonstrated the effectiveness of deep learning techniques in accurately deter-
mining parking lot occupancy. Nyambal and Klein (2017) developed a CNN-based approach
for parking space classification. They used the PKLot dataset for training and testing their ap-
proach, achieving an overall accuracy of 95.6%. They also evaluated their approach on a subset
of the CNRPark-EXT dataset and achieved an accuracy of 93.7%. These results highlighted
the effectiveness of deep learning techniques in accurately classifying parking spaces. Sandler
et al. (2018) proposed MobileNetV2, a deep CNN architecture for image classification. They
evaluated their approach on the ImageNet dataset and achieved a top-1 accuracy of 72% and a top-5 accuracy of 91%. Notably, MobileNetV2 achieved these high accuracies while keeping
the number of parameters and computation cost significantly lower than other state-of-the-art
CNN architectures. This made MobileNetV2 suitable for deployment on mobile and embedded
devices with limited computational resources. Ding and Yang (2019) employed the YOLOv3
object detection framework for parking space detection and classification. They incorporated
residual blocks to extract more granular features and utilized the PKLot dataset for training
and testing. Their approach achieved an accuracy of 97.8% for parking space detection and
92.9% for parking space classification. This demonstrated the effectiveness of deep learning
techniques for accurate and efficient parking space detection and classification tasks. Khalfi
Ali and Guerroumi Mohamed (2020) proposed a deep learning-based approach for classifying
roadside parking spaces as vacant or occupied. They utilized a CNN to extract features from
parking space images and incorporated the dilation technique for classification. Their approach
achieved a high accuracy of 96.64% on the PKLot dataset. Additionally, they compared their
approach to other models like AlexNet and CarNet, showing faster learning time and a smaller
number of parameters, indicating efficiency and effectiveness in parking space classification.
These studies collectively showcase the effectiveness of deep learning techniques, such as deep
CNN architectures, for tasks related to image classification, occupancy detection, and space
classification in parking-related applications.


Figure 2.9: Deep learning based Methods

2.7.3 Transfer learning Based Methods :

In their study, Zhang, Li, and Wu (2020) proposed a parking space classification method that
combined transfer learning with the VGG-16 CNN architecture and spatial pyramid pooling.
By leveraging pre-trained weights and spatial pyramid pooling techniques, they achieved an
impressive accuracy of 97.0% on the PKLot dataset, highlighting the effectiveness of their ap-
proach in accurately classifying parking spaces. Building upon this work, Gao et al. (2021)
introduced a transfer learning-based approach for parking space classification using the Xcep-
tion CNN architecture. They incorporated auxiliary tasks, namely rotation prediction and
image colorization, to enhance the primary classification task. Through their comprehensive
framework, which combined Xception architecture, auxiliary tasks, multi-task learning, and
mixup, they achieved a remarkable accuracy of 98.1% on the PKLot dataset, demonstrating
the significant improvement in classification accuracy. Similarly, Liu et al. (2021) proposed a
transfer learning-based approach for parking space classification, employing the Inception-v3
CNN architecture. They introduced a multi-scale feature fusion technique, training the model
on multiple scales of input images to learn shared representations. By doing so, they achieved an accuracy of 98.2% on the PKLot dataset, showcasing the effectiveness of their approach in
accurately classifying parking spaces. Moreover, Khalfi Ali and Guerroumi Mohamed (2023) de-
veloped a deep transfer learning approach for detecting parking space occupancy. They utilized
the Inception V3 CNN model, pre-trained on the ImageNet dataset, for feature extraction and
fine-tuning. Additionally, they employed image data augmentation techniques, such as rotation
and flipping, to augment the dataset and improve generalization. Their approach achieved an
impressive accuracy of 98.75% on a dataset of parking space images, outperforming existing
state-of-the-art methods. These studies collectively demonstrate the power of transfer learning,
various CNN architectures, and additional techniques such as spatial pyramid pooling, auxil-
iary tasks, multi-scale feature fusion, and data augmentation in accurately classifying parking
spaces and detecting their occupancy.

Figure 2.10: Transfer Learning Based Methods


2.8 Conclusion

For the field of computer vision, the use of neural networks, especially CNNs, is a crucial choice
for image classification. In this chapter, we have introduced the concept of neural networking
and CNNs within the framework of transfer learning. In the next chapter, we will discuss
recommended pre-trained models for image classification.

Chapter 3

Transfer Learning with Xception model for Parking images classification


3.1 Introduction

Transfer learning has become a popular technique in machine learning, allowing for the reuse
of knowledge obtained from one task to solve a different but similar problem. One powerful
example of this is the Xception model, a convolutional neural network (CNN) designed for image
classification. In this chapter, we will introduce the Xception model, explore its architecture,
and discuss methods for reusing the model and adjusting its settings to solve a variety of image
classification tasks.

3.2 The Xception model architecture

The Xception architecture is a convolutional neural network (CNN) designed for image classi-
fication tasks. It was introduced by François Chollet in 2016 as an extension of the Inception
architecture. The name "Xception" is a combination of "Extreme inception," reflecting the fact
that it pushes the concept of depthwise separable convolutions to its limits. The Xception
architecture has 36 convolutional layers, structured into 14 modules built around depthwise separable convolutions. These convolutions are performed in two steps: the first step
is to perform a depthwise convolution, which applies a single filter to each input channel, and
the second step involves a pointwise convolution, which applies a set of filters to the output of
the depthwise convolution . The architecture also includes skip connections and batch normal-
ization layers. The number of layers in the Xception architecture is relatively small compared
to other state-of-the-art models, but its use of depthwise separable convolutions and other
optimization techniques make it highly efficient and accurate for image classification tasks.
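A small Keras sketch of the building block (the layer sizes are arbitrary): SeparableConv2D performs the depthwise and pointwise steps together, and uses far fewer parameters than a standard convolution of the same size.

```python
from tensorflow import keras

# Depthwise separable convolution: one filter per input channel (depthwise),
# then a 1x1 pointwise convolution mixing channels. Compare parameter counts
# with a standard Conv2D of the same filter size.
inp = keras.Input(shape=(32, 32, 3))
sep = keras.Model(inp, keras.layers.SeparableConv2D(64, 3, padding="same")(inp))
std = keras.Model(inp, keras.layers.Conv2D(64, 3, padding="same")(inp))
print(sep.count_params())  # 283  (27 depthwise + 192 pointwise + 64 bias)
print(std.count_params())  # 1792 (3*3*3*64 weights + 64 bias)
```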


Figure 3.1: The Xception model architecture [28]

3.3 The InceptionV3 model architecture

The InceptionV3 model architecture is a convolutional neural network (CNN) designed for image
classification tasks. It was introduced by Google researchers in 2015 as an extension of the
original Inception architecture. The InceptionV3 model utilizes a combination of convolutional
layers with different filter sizes to capture features at various scales.

The InceptionV3 architecture has 48 convolutional layers, including several Inception modules
that are designed to capture features at different levels of abstraction. These modules use
a combination of 1x1, 3x3, and 5x5 convolutions to capture both local and global features.
The architecture also includes auxiliary classifiers to improve training performance and reduce
overfitting.


Figure 3.2: The Inception model architecture [29]

3.4 Xception reuse method

To meet our specific needs, we plan to utilize a pre-trained model. Our approach involves
removing the original classifier and integrating a new classifier that satisfies our requirements.
To further enhance the model’s performance, we will implement a fine-tuning strategy.

3.5 Fine-tuning

In machine learning, fine-tuning denotes the technique of refining a pre-trained model by train-
ing it further on a new, smaller dataset with labeled examples to enhance its performance on a
specific task. This involves modifying the model’s parameters, such as its weights and biases,
using the new dataset. Fine-tuning is frequently employed in transfer learning, where a pre-
trained model serves as a foundation for a novel task. By fine-tuning the pre-existing model to
suit the new task, the model can improve its ability to generalize and achieve better accuracy
than training a new model from the beginning.[26]


3.6 Network architecture

3.6.1 depth and Width of Neural Network

The learning ability of a neural network, whether it is a convolutional neural network or any
other type of neural network, is determined by the number of layers and hidden units (referred
to as depth and breadth). The primary goal is to set the number of units to a "large enough"
value, so that the network can learn the characteristics of the data. Networks that are too
small may be inadequate, while excessively large networks may cause overfitting. Determining
the appropriate size of a network requires selecting a starting point, observing its performance,
and adjusting it up or down accordingly.

3.6.2 convolution layers

Convolutional layers, or conv layers, are the foundational units of convolutional neural networks
(CNNs). They execute a convolution operation on the input data, which involves applying a
set of learnable filters (or kernels) to the input tensor. The filters slide over the input tensor,
computing dot products between the filter weights and local regions of the input, and generating
multiple feature maps that capture different characteristics of the input. For instance, the
Conv2D() layer operates on a 4D tensor with dimensions (batch_size, height, width, channels)
and produces a 4D tensor with dimensions (batch_size, new_height, new_width, filters).
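A minimal check of these shapes in Keras (the sizes are arbitrary):

```python
import numpy as np
from tensorflow import keras

# Conv2D maps (batch_size, height, width, channels) to
# (batch_size, new_height, new_width, filters), as described above.
layer = keras.layers.Conv2D(filters=32, kernel_size=3)  # 'valid' padding by default
out = layer(np.zeros((1, 64, 64, 3), dtype="float32"))
print(out.shape)  # (1, 62, 62, 32)
```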

3.6.3 averaging

Convolutional neural networks (CNNs) frequently utilize this layer to decrease the spatial di-
mensions of the output while preserving information about individual channels. It is commonly
employed instead of fully connected layers towards the end of a CNN architecture, just before
the final classification layer. For instance, the GlobalAveragePooling2D() layer operates on a
4D tensor with dimensions (batch_size, height, width, channels), and produces a 2D tensor
of shape (batch_size, channels).
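The same shape behavior, verified with arbitrary sizes:

```python
import numpy as np
from tensorflow import keras

# GlobalAveragePooling2D averages each feature map down to one value per
# channel: (batch_size, height, width, channels) -> (batch_size, channels).
gap = keras.layers.GlobalAveragePooling2D()
print(gap(np.zeros((1, 7, 7, 512), dtype="float32")).shape)  # (1, 512)
```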


3.6.4 activation functions

Activation functions are typically non-linear functions that allow neural networks to learn more
complex functions than simple linear regression. This is because multiplying the weights of a
hidden layer only results in a linear transformation. Currently, the most popular activation
function is ReLU (Rectified Linear Unit), which allows faster training and performs well in hid-
den layers. For the output layer, the sigmoid function is commonly used in binary classification
tasks. In mathematical terms, the ReLU activation function can be written as:

ReLU(x) = x if x > 0, and 0 if x ≤ 0 (equivalently, ReLU(x) = max(0, x))

3.7 Learning and optimization

3.7.1 optimization algorithms

Optimizers are mathematical functions that modify network weights based on gradients and
other relevant information, depending on the specific formulation of the optimizer. These
functions are based on the concept of gradient descent, which involves iteratively reducing the
loss function by following the gradient in a greedy fashion. There are various types of optimizers
available, including SGD and Batch Gradient Descent. The equation for the SGD optimizer
can be described as follows:

θ(t + 1) = θ(t) − α ∇L(θ(t)) (3.1)

In this equation: θ(t) represents the parameters of the neural network at iteration t.

θ(t + 1) represents the updated parameters at iteration t + 1.

α (alpha) is the learning rate, which determines the step size of the parameter updates.

∇L(θ(t)) represents the gradient of the loss function L with respect to the parameters θ at
iteration t.
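
A toy example of one update with Equation (3.1), using the quadratic loss L(θ) = θ², whose gradient is 2θ:

theta = 4.0            # θ(t), the current parameter value
alpha = 0.1            # learning rate
grad = 2.0 * theta     # ∇L(θ(t)) for L(θ) = θ²

theta = theta - alpha * grad
print(theta)           # 3.2: one step down the gradient, toward the minimum at 0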


3.7.2 Learning rate

The learning rate is a hyperparameter that can be configured during neural network training
and typically takes on a small positive value ranging from 0.0 to 1.0. This parameter governs
the rate at which the model adjusts to the problem at hand. Lower learning rates necessitate
more training epochs due to the small weight changes that occur with each update, while higher
learning rates result in rapid changes that require fewer training iterations. If the learning rate
is too high, the model may converge too quickly toward a suboptimal solution, whereas if it is
too low, the learning process may be impeded.

3.7.3 Epochs

In machine learning, an epoch refers to one complete pass in which the model processes all of
the available training data. The more epochs the model performs, the better it becomes at
recognizing the characteristics of the training data; however, excessive epochs may result in
overfitting. To determine the optimal number of training epochs, it is advisable to monitor the
training and validation error values. As a general rule, we should keep training the network as
long as the error value continues to decrease, indicating that the model is still learning.
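
In Keras, this monitoring advice can be automated with the EarlyStopping callback; a brief sketch, where the patience value is an arbitrary choice:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",        # watch the validation error
                           patience=5,                # tolerate 5 epochs without improvement
                           restore_best_weights=True)
# Later passed to training as: model.fit(..., epochs=75, callbacks=[early_stop])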

3.8 Regularization techniques

3.8.1 Batch normalization

Batch normalization is a technique that standardizes the outputs between layers in a neural
network, a process known as normalization. By doing so, the distribution of the output from
the previous layer is stabilized, making it easier for the next layer to process the information.
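
One common placement, sketched in Keras, inserts the normalization between a convolution and its activation; the layer sizes here are illustrative:

from tensorflow import keras
from tensorflow.keras import layers

block = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 3, padding="same"),   # convolution without activation
    layers.BatchNormalization(),            # standardize the conv outputs per batch
    layers.Activation("relu"),              # apply the non-linearity afterwards
])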

3.8.2 Data augmentation

Data augmentation is a technique used to combat overfitting when obtaining more data is
not feasible. It involves generating new instances of training data by applying various
transformations to existing images. Common augmentation techniques include flipping, rotating,
scaling, zooming, adjusting lighting conditions, and other similar transformations. By applying
these techniques to our dataset, we can provide the learning algorithm with a diverse range of
images to learn from and improve its ability to generalize to new data.
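
A sketch of these transformations with Keras' ImageDataGenerator; the exact ranges below are illustrative:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

augment = ImageDataGenerator(
    rescale=1.0 / 255,             # normalize pixel values
    horizontal_flip=True,          # flipping
    rotation_range=15,             # rotating, in degrees
    zoom_range=0.1,                # zooming / scaling
    brightness_range=(0.8, 1.2),   # varying lighting conditions
)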

3.9 Concatenation

Concatenation is used to combine the outputs of two different models. This is known as model
concatenation and is a form of ensemble learning, where the predictions of multiple models
are combined to improve performance. In this case, the output of each model is typically a
tensor of the same shape, and the tensors are concatenated along a specific axis to produce
a combined output tensor. The combined output tensor can then be passed through one or
more additional layers to produce the final prediction. Model concatenation can be useful in a
variety of contexts, such as when combining multiple models trained on different modalities of
data (e.g., text and images), or when combining models trained on different subsets of a larger
dataset. By combining the strengths of multiple models, model concatenation can often lead
to improved performance compared to using a single model.
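
A minimal sketch of this idea with the Keras functional API, using two small stand-in branches instead of full pre-trained networks:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(224, 224, 3))

# Two stand-in "models" sharing the same input.
branch_a = layers.GlobalAveragePooling2D()(layers.Conv2D(32, 3)(inputs))
branch_b = layers.GlobalAveragePooling2D()(layers.Conv2D(32, 5)(inputs))

# Concatenate the two output tensors along the feature axis.
merged = layers.Concatenate(axis=-1)([branch_a, branch_b])   # shape: (batch, 64)
outputs = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model(inputs, outputs)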

3.10 Presentation of the architectures of our proposals

In this section, we introduce two novel models for parking image classification based on the
Xception architecture. For the first model, we enhanced the structure of the Xception model
by incorporating an additional convolution layer and applying regularization techniques. In the
second model, we combined the Xception model with the Inception model to further improve
its performance. Detailed descriptions of both models are provided in the following sections.

3.10.1 Model 1: Xception Model with Additional Convolution Layer and Regularization

3.10.1.1 Data preparation

The code imports the necessary packages from Keras and
sets up the data directories, the number of classes, and the number of epochs to train the
model. Next, it sets up the data generators for the training, validation, and test sets using the
‘ImageDataGenerator‘ class.
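
A hedged sketch of that generator setup; the directory path is a placeholder, not the actual path used:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train",              # placeholder path
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",       # two classes: empty vs. occupied
)
# Validation and test generators are built the same way from their own directories.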

3.10.1.2 Model definition

After setting up the data generators, the code defines the
architecture of the model. The model consists of a pre-trained ‘Xception‘ base model with
the last several layers unfrozen and combined with some custom layers, including convolutional
layers, batch normalization layers, fully connected layers, and activation functions. The outputs
of the base model are passed through a convolutional layer with 1024 filters, followed by batch
normalization and global average pooling layers. The output is then passed through three
fully connected layers with decreasing numbers of units, all with ReLU activation functions
and batch normalization layers. The final output layer has a sigmoid activation function that
outputs a probability value between 0 and 1, indicating the probability of the image containing
a car.
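
The description above can be reconstructed roughly as follows; the dense-layer sizes and the number of unfrozen base layers are assumptions, since the text does not specify them:

from tensorflow.keras.applications import Xception
from tensorflow.keras import layers, models

base = Xception(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers[:-30]:            # keep only the last several layers trainable
    layer.trainable = False

x = layers.Conv2D(1024, 3, padding="same", activation="relu")(base.output)
x = layers.BatchNormalization()(x)
x = layers.GlobalAveragePooling2D()(x)
for units in (512, 256, 128):              # fully connected layers, decreasing units
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # probability the space holds a car

model = models.Model(base.input, outputs)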

3.10.1.3 Model training

After defining the model architecture, the code compiles the
model by specifying the optimizer, loss function, and evaluation metrics. Here, the optimizer
is Stochastic Gradient Descent (SGD) with a learning rate of 0.00001 and momentum of 0.9,
the loss function is binary cross-entropy, and the evaluation metric is accuracy. Next, it trains
the model on the training set using the ‘fit()‘ method, specifying the number of epochs, steps
per epoch, validation set, and validation steps. It also saves the training history to a variable
for visualization purposes and plots the training and validation loss and accuracy over epochs
using the Matplotlib package.
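
In code, the compile-and-fit step looks roughly like this; the generator variables are assumed from the data-preparation sketch, and steps per epoch are left to Keras' defaults:

from tensorflow.keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.00001, momentum=0.9),
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_gen,
                    epochs=75,
                    validation_data=val_gen)  # history.history holds the loss/accuracy curves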

3.10.1.4 Model evaluation

After training the model, the code saves it to a file using the
‘save()‘ method and loads it back into memory using the ‘load_model()‘ method. Finally,
it evaluates the model on the test set and computes the loss and accuracy metrics. It also
generates a confusion matrix to visualize the performance of the model on the test set and
prints a classification report, which includes precision, recall, F1-score, and support for each
class. The Matplotlib package is used to plot the confusion matrix as a heatmap.


Figure 3.3: The First Model Schema

3.10.2 Model 2: Xception Model Concatenated with Inception Model

3.10.2.1 Data preparation

The code imports the necessary packages from Keras, including the ‘ImageDataGenerator‘,
pre-trained models (‘Xception‘, ‘InceptionV3‘), layers, optimizers, callbacks, metrics, and other
packages for visualization and evaluation. It then sets up the data directories and parameters,
including the file paths for the training, validation, and test sets, the number of classes (in this
case, 1, because it is binary classification), and the number of epochs to train the model. Next,
it sets up the data generators for the training, validation, and test sets using the
‘ImageDataGenerator‘ class.

3.10.2.2 Model definition

After setting up the data generators, the code defines the architecture of the model. The model
consists of two pre-trained convolutional neural networks (CNNs), ‘Xception‘ and ‘InceptionV3‘,
with their last several layers unfrozen and combined with some custom layers, including con-
volutional layers, batch normalization layers, fully connected layers, and activation functions.
The outputs of these two CNNs are concatenated and fed into fully connected layers with
decreasing numbers of units, all with ReLU activation functions and batch normalization
layers. The final output layer has a sigmoid activation function that outputs a probability value
between 0 and 1, indicating the probability of the image containing a car.
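
A rough reconstruction of this two-branch architecture; as before, the number of unfrozen layers and the dense-layer sizes are assumptions:

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import Xception, InceptionV3

base_x = Xception(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_i = InceptionV3(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for base in (base_x, base_i):
    for layer in base.layers[:-30]:        # freeze all but the last several layers
        layer.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
feat_x = layers.GlobalAveragePooling2D()(base_x(inputs))
feat_i = layers.GlobalAveragePooling2D()(base_i(inputs))

x = layers.Concatenate()([feat_x, feat_i])  # merge the two feature vectors
for units in (256, 128):                    # fully connected layers, decreasing units
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
# Compilation, training, and evaluation then proceed exactly as for Model 1.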

3.10.2.3 Model training

After defining the model architecture, the code compiles the model by specifying the optimizer,
loss function, and evaluation metrics. Here, the optimizer is Stochastic Gradient Descent (SGD)
with a learning rate of 0.00001 and momentum of 0.9, the loss function is binary cross-entropy,
and the evaluation metric is accuracy. Next, it trains the model on the training set using the
‘fit()‘ method, specifying the number of epochs, steps per epoch, validation set, and validation
steps. It also saves the training history to a variable for visualization purposes and plots the
training and validation loss and accuracy over epochs using the Matplotlib package.

3.10.2.4 Model evaluation

After training the model, the code saves it to a file using the ‘save()‘ method and loads it
back into memory using the ‘load_model()‘ method. Finally, it evaluates the model on the
test set and computes the loss and accuracy metrics. It also generates a confusion matrix to
visualize the performance of the model on the test set and prints a classification report, which
includes precision, recall, F1-score, and support for each class. The Matplotlib package is used
to plot the confusion matrix as a heatmap.


Figure 3.4: The Second Model Schema

3.11 Conclusion

In conclusion, this chapter has demonstrated the use of transfer learning and the Xception
model, trained on ImageNet, to propose a new model for classifying images of vehicle parking.
This was achieved using the fine-tuning method discussed earlier in the chapter. The proposed
model shows promising results and serves as a foundation for the implementation of a parking
image classification system in the next chapter. The tools and test models used for this imple-
mentation will be discussed, offering a comprehensive view of the approach taken to solve the
problem of parking image classification.

Chapter 4

Implementation and discussion of results


4.1 Introduction

In the previous chapter, we explored the concept of transfer learning and proposed a CNN
model that utilized the Xception architecture. In this chapter, we will dive into the stages
of implementing the proposed model and discuss the results of our experimentation. We will
explore the process of optimizing the model and fine-tuning the pre-trained Xception layers to
achieve higher accuracy in classification tasks. Additionally, we will analyze the performance of
the model on various datasets and compare it to other state-of-the-art models. By the end of
this chapter, we will have a comprehensive understanding of the effectiveness of our proposed
transfer learning approach and the potential of the Xception architecture in solving real-world
problems.

4.2 Hardware used in the implementation

• DELL PC

• Processor: Intel(R) Core(TM) i5-8365U CPU @ 1.60GHz 1.90 GHz

• Installed memory (RAM): 16.0 GB

• System type: 64-bit operating system

4.3 Software and libraries used in the implementation

4.3.1 Kaggle

Kaggle is a community and platform for data scientists, machine learning engineers, and other
related professionals to collaborate on data science projects, participate in machine learning
competitions, and access public datasets. It was founded in 2010 and acquired by Google in
2017.

Kaggle provides a platform where users can upload and share datasets, create and participate
in machine learning competitions, and access a variety of tools and resources for data analysis
and modeling. The platform also features a discussion forum where users can ask questions,
share ideas, and collaborate on projects.


One of the main features of Kaggle is its machine learning competitions, which are sponsored
by companies and organizations looking to solve specific data science problems. Competitors
can win prizes and gain recognition for their work.

Kaggle also offers a variety of educational resources, including tutorials and courses on data
science and machine learning topics. Overall, Kaggle is a platform that brings together a
community of data science professionals and enthusiasts to share knowledge, collaborate on
projects, and advance the field of data science and machine learning.

Figure 4.1: Kaggle Logo.


4.3.2 TensorFlow

TensorFlow is an open-source software library for dataflow and differentiable programming
across a range of tasks. It was developed by the Google Brain team for internal use and was
later released as an open-source project in 2015.

TensorFlow is designed to facilitate the creation and training of machine learning models,
particularly neural networks, for applications such as image and speech recognition, natural
language processing, and predictive analytics. It provides a variety of tools and resources for
building and optimizing machine learning models, including a high-level API for building and
training neural networks, support for distributed training, and a range of pre-built models for
common tasks.

One of the key features of TensorFlow is its computational graph, which allows users to visualize
and manipulate the flow of data through a machine learning model. This makes it easier to
debug and optimize models, and also enables support for distributed computing.

TensorFlow is widely used in academia and industry for a variety of applications, and has a large
and active community of developers contributing to its development and maintenance. It is
available under an Apache 2.0 open-source license, which allows users to modify and distribute
the software freely.


Figure 4.2: TensorFlow Logo.

4.3.3 Keras

Keras is a high-level open-source software library for building and training neural networks.
It was initially developed as a user-friendly interface to TensorFlow, but can now also run on
other backends, such as Theano and Microsoft Cognitive Toolkit.

Keras is designed to simplify the process of building and training neural networks, particularly
for beginners and researchers who are not experts in deep learning. It provides a range of high-
level building blocks for constructing and optimizing neural networks, such as layers, activations,
loss functions, and optimizers. These building blocks can be assembled into a model using a
simple, intuitive syntax.

Keras also provides a range of utilities for loading and preprocessing data, as well as tools
for visualizing and evaluating the performance of a trained model. It supports a variety of
neural network architectures, including convolutional neural networks (CNNs), recurrent neural
networks (RNNs), and transformers.

One of the main benefits of Keras is its ease of use and flexibility. It allows users to experiment
with different neural network architectures and hyperparameters quickly and easily, without
needing to write low-level code. Keras is widely used in academia and industry for a variety
of applications, and has a large and active community of developers contributing to its
development and maintenance.

Figure 4.3: Keras Logo.


4.3.4 Python

Python is a high-level, interpreted programming language that was first released in 1991 by
Guido van Rossum. It is designed to be easy to read and write, with a simple and clean syntax
that emphasizes code readability and reduces the cost of program maintenance.

Python is a popular language for a wide range of applications, including web development,
scientific computing, data analysis, artificial intelligence, and machine learning. It provides
a large standard library and a variety of third-party packages that make it easy to perform
complex tasks with minimal code.

Python is known for its ease of use, flexibility, and versatility. It supports multiple programming
paradigms, including procedural, object-oriented, and functional programming. Python code
can be run on a variety of platforms, including Windows, macOS, Linux, and Unix, and it is
often used in conjunction with other languages and tools.

One of the main benefits of Python is its large and active community of developers, who
contribute to the development of the language, create new packages and libraries, and provide
support and resources for beginners and experts alike. Python is an open-source language,
which means that it is freely available and can be modified and distributed by anyone.

Figure 4.4: Python Logo.


4.3.5 Matplotlib

Matplotlib is a powerful Python library that enables users to create a wide range of visualiza-
tions, including static, animated, and interactive plots. Initially developed by John D. Hunter
in 2003, it is now maintained by a thriving community of developers.

With a comprehensive set of tools and functions, Matplotlib supports the creation of various
types of visualizations, such as line plots, scatter plots, bar plots, histograms, and more. It
offers extensive customization options, allowing users to tailor their plots with precise control
over every aspect of the visualization, from color schemes to axes labels and annotations.

Figure 4.5: Matplotlib Logo.

4.3.6 NumPy

NumPy is a popular Python library that is specifically designed for numerical computations.
It was initially introduced in 2006 and is now supported by a vibrant community of developers.
With a comprehensive set of tools and functions, NumPy enables users to carry out complex
mathematical and numerical operations on arrays and matrices efficiently. It boasts a powerful
N-dimensional array object that can be used to represent and manipulate large sets of data.
The library also includes a wide range of mathematical functions, such as linear algebra, Fourier
transforms, and random number generation, making it a versatile tool for scientific computing
and data analysis.


Figure 4.6: NumPy Logo.

4.3.7 Scikit-learn

Scikit-learn, commonly referred to as sklearn, is a well-known Python library for machine
learning that is built on top of other popular libraries such as NumPy, SciPy, and Matplotlib.
Developed initially in 2007, it is now maintained by a large and thriving community of
developers.

The library offers a comprehensive range of tools and functions for machine learning, including
classification, regression, clustering, and dimensionality reduction. It provides a diverse set
of algorithms for each of these tasks, as well as tools for model selection, preprocessing, and
evaluation. Additionally, sklearn offers helpful utilities for handling datasets, such as loading,
cleaning, and splitting data.

Figure 4.7: Scikit-learn Logo.

4.3.8 Summary of the PKLot database

The PKLot dataset is a collection of 12,416 parking lot images extracted from surveillance cam-
eras located in the parking lots of the Federal University of Parana (UFPR) and the Pontifical
Catholic University of Parana (PUCPR), both located in Curitiba, Brazil. The dataset was
introduced by Almeida et al. in 2015.


The images were captured on sunny, cloudy, and rainy days, and each parking space is labeled
as either occupied or vacant. The dataset also includes additional information, such as the date
and time of image capture, the camera location, and the weather conditions.

The PKLot dataset was created using a three-step protocol, which includes image acquisition,
labeling, and segmentation. During the labeling process, each parking space was manually
labeled as either occupied or vacant by human annotators. The segmentation process involved
the use of a threshold-based approach to separate the parking spaces from the background.

The PKLot dataset is challenging due to its large size and the variability in weather and lighting
conditions. It has been used in several studies to develop and evaluate parking space occupancy
detection algorithms, including the use of traditional machine learning approaches and deep
learning models.

Figure 4.8: Examples of images of empty and occupied parking spaces.


Figure 4.9: Images of the parking lots.

4.3.9 Sub-datasets

We divide our dataset into three subsets: PUC000, UFPR004, and UFPR005.

4.3.10 Compilation parameters

• num_classes = 1
• Number of epochs: 75
• Batch size: 32
• Optimization algorithm: SGD
• Learning rate: lr = 0.00001
• Momentum: 0.9
• class_mode = ’binary’
• input_shape = (224, 224, 3)

Number of parameters:

• Total params: 83,184,969
• Trainable params: 43,676,673
• Non-trainable params: 39,508,296


4.4 Presentation of the results

4.4.1 Model 1: Xception Model with Additional Convolution Layer and Regularization

Figure 4.10: The results of the Xception model.


4.4.1.1 PUC000 results

Figure 4.11: Accuracy and loss of the first model for the PUC000 subset.


4.4.1.2 UFPR004 results

Figure 4.12: Accuracy and loss of the first model for the UFPR004 subset.


4.4.1.3 UFPR005 results

Figure 4.13: Accuracy and loss of the first model for the UFPR005 subset.


4.4.2 Model 2: Xception Model Concatenated with Inception Model

Figure 4.14: The results of the concatenated Xception and Inception model.


4.4.2.1 PUC000 results

Figure 4.15: Accuracy and loss of the second model for the PUC000 subset.


4.4.2.2 UFPR004 results

Figure 4.16: Accuracy and loss of the second model for the UFPR004 subset.


4.4.2.3 UFPR005 results

Figure 4.17: Accuracy and loss of the second model for the UFPR005 subset.


4.5 Discussion of results

The first table shows the performance of Model 1, which uses the Xception model with an
additional convolution layer and regularization. The model achieves high accuracy on the
training set: 99.06% on PUC000, 98.44% on UFPR004, and 99.06% on UFPR005. The validation
accuracy is also high: 98.12% on PUC000, 97.08% on UFPR004, and 95.83% on UFPR005. The
test accuracy is reasonable as well, at 98.66% on PUC000, 88.15% on UFPR004, and 92.36% on
UFPR005. The loss values are relatively low across all sets, with the lowest loss on the test set,
indicating that the model generalizes well.

The second table shows the performance of Model 2, which concatenates the Xception model
with the Inception model. The model achieves high training accuracy, reaching 99.37% on all
three datasets. The validation accuracy is also high: 98.02% on PUC000, 98.12% on UFPR004,
and 96.04% on UFPR005. The test accuracy is even higher on PUC000, at 99.62%, and reaches
95.06% on UFPR004; on UFPR005, the test accuracy of 92.25% is marginally lower than Model
1's. The loss values are mostly low, with the lowest loss on the test set for all three datasets.

Comparing the two models, Model 2 generally performs better than Model 1, especially on
the validation and test sets. Model 2 achieves higher validation accuracy on UFPR004 and
UFPR005 and higher test accuracy on PUC000 and UFPR004, while Model 1 retains a slight
edge on the PUC000 validation set and the UFPR005 test set. Model 2 also reaches lower
training loss than Model 1; on its own this does not prove better generalization, but combined
with the validation and test results it suggests that Model 2 adapts better to new data.

Overall, the use of concatenation in Model 2 appears to be a promising technique for improving
the performance of image classification models. By combining the strengths of multiple models,
concatenation can help to improve the overall accuracy and ability to generalize to new data.
However, as with any modeling technique, it is important to carefully evaluate the performance
of the model on different datasets and subsets of the data to ensure that it is robust and effective
for the specific task at hand.


4.5.1 Comparison of our results with different works


Model                                      PUC000    UFPR004   UFPR005   Mean
First model                                99.04%    93.58%    93.68%    95.43%
Second model                               99.62%    95.06%    95.27%    96.65%
Gao et al. (2021)                          -         -         -         98.10%
Khalfi Ali and Guerroumi Mohamed (2023)    -         -         -         98.75%

Table 4.1: Results of different works

Our first model. The first model has an average accuracy of 95.43% across the three datasets
(PUC000, UFPR004, UFPR005).

Our second model. The second model has an average accuracy of 96.65%, which is higher
than that of the first model.

Gao et al.'s (2021) model. The model developed by Gao et al. in 2021 has an average
accuracy of 98.10%.

Khalfi Ali and Guerroumi Mohamed's (2023) model. The model developed by Khalfi Ali
and Guerroumi Mohamed in 2023 has an average accuracy of 98.75%, which is the highest
among all the models presented.

In summary, the table highlights the trade-offs in image classification model performance.
While the second model achieved the highest accuracy on a specific dataset (PUC000), the
models proposed by Gao et al. (2021) and Khalfi Ali and Guerroumi Mohamed (2023)
demonstrated even higher mean accuracies across the benchmarks.

This suggests that more accurate image classification models exist than the ones we have
developed. However, the choice of the most suitable model ultimately depends on the specific
requirements of the application, such as desired accuracy, computational constraints, model
complexity, and inference speed. Each model has its own strengths and weaknesses that must
be carefully evaluated to determine the best fit for the target use case.


4.6 Conclusion

In this chapter, we presented two models. The first model is built upon the pre-trained Xception
network with fine-tuning, involving the addition of one convolutional layer. In the second
model, we investigated the concatenation technique between the Xception and Inception models,
also incorporating one convolutional layer. The experiments provide insights into the strengths
and limitations of our proposed models and recommendations for further improvements. The
results of our experimentation offer valuable information for the development of transfer
learning models that are efficient, accurate, and suitable for various applications.

General conclusion

In conclusion, artificial intelligence has become an integral part of various sectors, including
the development of intelligent parking systems. These systems aim to address the negative
impacts of the proliferation of vehicles, such as traffic congestion, air pollution, noise levels,
and limited availability of parking spaces. Deep learning, specifically convolutional neural
networks, offers a promising approach for solving parking space classification problems. The
proposed research aims to modify the architecture of CNNs to effectively classify parking
spaces, utilizing a pretrained Xception model as the foundation and transfer learning
techniques to leverage the capabilities of CNNs. Overall, the study has the potential to
contribute to the development of more efficient and effective intelligent parking systems.

Bibliography

[1] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. Pearson Education
India, 2017.
[2] Difference between JPG and PDF file. https://viavector.eu/wp-content/uploads/2022/01/difference-between-jpg-and-pdf-file.jpg.
[3] Color RGB. https://www.cevagraf.coop/blog/wp-content/uploads/2018/12/color-rgb.jpg. Visited on 2023-04-29.
[4] CMYK Image. https://99designs-blog.imgix.net/blog/wp-content/uploads/2017/02/CMYK-915x915px.png?auto=format&q=60&fit=max&w=930. Visited on 2023-05-04.
[5] R. Rastogi and A. Sharma. Color Image Processing and Applications. City: Publisher,
2016.
[6] Y. Zhang, Y. Gao, and Q. Ji. Texture Analysis and Classification with Tree-Structured
Wavelet Transform. City: Publisher, 2002.
[7] V. Singh, D. Singh, and K. Singh. Image Resolution. City: Publisher, 2016.
[8] R. D. Dony, R. Hamzaoui, and S. Batti. Image and Video Compression. City: Publisher,
2014.
[9] M. Singh and R. Kaur. Edge Detection Techniques for Image Segmentation. City: Pub-
lisher, 2016.
[10] Lesa Phillips. The Adobe Photoshop CC Book. City: Publisher, 2015.
[11] Rafael C. Gonzalez and Richard E. Woods. Digital Image Processing. 4th. City: Publisher,
2018.


[12] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. City:
Publisher, 2004.
[13] Pietro Perona and Jitendra Malik. “Scale-Space and Edge Detection Using Anisotropic
Diffusion”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 12.7
(1990), pp. 629–639.
[14] Neutral Network Diagram. https://www.tibco.com/sites/tibco/files/media_entity/2021-05/neutral-network-diagram.svg. Visited on 2023-04-29.
[15] Stéphane Mallat. A Wavelet Tour of Signal Processing. San Diego, CA: Academic Press,
2009.
[16] Teuvo Kohonen. Self-Organizing Maps. 3rd. Berlin, Heidelberg: Springer, 2001.
[17] Image Title. https://miro.medium.com/v2/resize:fit:1400/format:webp/1*4n1iICgC0jlYPdC9VQfO3Q.jpeg. Visited on 2023-04-29.
[18] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[19] AI vs ML vs Deep Learning. https://k21academy.com/wp-content/uploads/2020/12/AI-vs-ML-vs-Deep-Learning-1536x646.png. Visited on 2023-04-29.
[20] Prateek Joshi. Artificial Intelligence with Python. Packt Publishing, 2017.
[21] Image Title. https://miro.medium.com/v2/resize:fit:720/format:webp/1*9Eu_-DDMZ_bP_t94_MMEYA.png. Visited on 2024-04-29.
[22] Michael Nielsen. Neural Networks and Deep Learning: A Textbook. Determination Press,
2015.
[23] Pathmind. Perceptron Node Image. n.d. URL: https://wiki.pathmind.com/images/wiki/perceptron_node.png. Visited on 2023-04-27.
[24] Developers Breach. Convolution Neural Network Deep Learning. n.d. URL: https://developersbreach.com/convolution-neural-network-deep-learning/. Visited on 2023-04-27.
[25] Imran Ali. Pooling Layer Operation Approaches. 2020. URL: https://www.researchgate.net/profile/Imran-Ali-12/publication/340812216/figure/fig4/AS:928590380138496@1598404607456/Pooling-layer-operation-oproaches-1-Pooling-layers-For-the-function-of-decreasing-the.png. Visited on 2023-04-27.
[26] Aurélien Géron. Hands-On Machine Learning. O’Reilly Media, 2019.


[27] Dipanjan Sarkar. Hands-On Transfer Learning with Python: Implement advanced deep
learning and neural network models using TensorFlow and Keras. Packt Publishing, 2018.
[28] Abid Mehmood. Proposed structure of Xception network used within each stream of CNN. 2021. URL: https://www.researchgate.net/profile/Abid_Mehmood3/publication/355098045/figure/fig2/AS:1076622409109511@1633698193851/Proposed-structure-of-Xception-network-used-within-each-stream-of-CNN.ppm. Visited on 2023-07-12.
[29] Springer Nature. Figure from the article on advanced signal processing techniques. 2021. URL: https://media.springernature.com/full/springer-static/image/art%3A10.1186%2Fs13634-021-00755-1/MediaObjects/13634_2021_755_Fig1_HTML.png?as=webp. Visited on 2023-07-18.
