Classification of Endangered Bird Species of Nepal Using Deep Learning

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 43

TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING

PASHCHIMANCHAL CAMPUS, POKHARA

Lamachaur, Pokhara-16

[Subject Code: EX 755]

A MAJOR PROJECT REPORT

ON

“Classification of Endangered Bird Species of Nepal Using


Deep Learning”

Submitted By:

Abhishek Aryal PAS076BEI003

Ozan Wagle PAS076BEI020

Sandip Shrestha PAS076BEI032

Sumit Pant PAS076BEI046

Submitted To:

Department of Computer and Electronics Engineering

MARCH, 2024
Classification of Endangered Bird Species of Nepal Using Deep
Learning

Submitted By:

Abhishek Aryal PAS076BEI003

Ozan Wagle PAS076BEI020

Sandip Shrestha PAS076BEI032

Sumit Pant PAS076BEI046

Supervised By:
Nabin Lamichhane

A PROJECT REPORT SUBMITTED TO DEPARTMENT OF ELECTRONICS


AND COMPUTER ENGINEERING IN PARTIAL FULFILLMENT OF THE
REQUIREMENT FOR THE BACHELOR'S DEGREE OF ELECTRONICS,
COMMUNICATION AND INFORMATION ENGINEERING

Submitted To:

Department of Computer and Electronics Engineering

Pashchimanchal Campus

Lamachaur, Pokhara-16

MARCH, 2024
DECLARATION
We hereby declare that the report of the project entitled “Classification of Endangered
Bird Species of Nepal Using Deep Learning” which is being submitted to the
Department of Electronics and Computer Engineering, IOE, Pashchimanchal
Campus, in the partial fulfilment of the requirements for the award of the Degree of
Bachelor of Engineering in Electronics, Communication and Information
Engineering, is a bonafide report of the work carried out by us. The materials contained
in this report have not been submitted to any University or Institution for the award of
any degree and we are the only author of this complete work and no sources other than
the listed here have been used in this work.

Abhishek Aryal (PAS076BEI003) ________________________

Ozan Wagle (PAS076BEI020) ________________________

Sandip Shrestha (PAS076BEI032) ________________________

Sumit Pant (PAS076BEI046) ________________________

Date: March, 2024


COPYRIGHT
The authors have agreed that the library, Department of Electronics and Computer
Engineering, Pashchimanchal Campus, Institute of Engineering, may take this report
freely available for inspection. Moreover, the authors have agreed that the permission
of extensive copying of this project report for scholar purpose may be granted by the
professor who supervised the project or, in absence by the Head of Department. It is
understood that the recognition will be given to the authors of this report or to the
Department of Electronics and Computer Engineering, Pashchimanchal Campus in any
use of the material of this project report. Copying or publication or the other use of this
report for financial gain without approval of the Department of Electronics and
Computer Engineering, Pashchimanchal Campus, Institute of Engineering and authors
written permission is prohibited.

Request for permission to copy or to make any use of the material in this report in whole
or in part should be addressed to:

Head
Department of Electronics and Computer Engineering
Pashchimanchal Campus, Institute of Engineering
Lamachaur-16, Pokhara
Nepal
ACKNOWLEDGEMENT
We must express our deep and sincere gratitude to the Institute of Engineering,
Pashchimanchal Campus and the Department of Electronics and Computer Engineering
for giving us this wonderful platform and an opportunity to utilize and build our skills.
We are deeply indebted to all our teachers who have provided us with valuable
knowledge that has really helped us to reach here and explore the horizon further.

We would also like to acknowledge our family and friends and would like to thank
them for being helpful and supportive which has worked as a motivation and created a
secure space for us.
ABSTRACT
TABLE OF CONTENTS

DECLARATION ......................................................................................................... iii

COPYRIGHT ................................................................................................................ iv

ACKNOWLEDGEMENT ............................................................................................. v

ABSTRACT .................................................................................................................. vi

LIST OF FIGURES ....................................................................................................... 4

LIST OF ABBREVIATIONS ........................................................................................ 5

CHAPTER 1: INTRODUCTION .................................................................................. 6

1.1 Background ...................................................................................................... 6

1.2 Motivation ........................................................................................................ 7

1.3 Problem Statement ........................................................................................... 7

1.4 Objective .......................................................................................................... 8

1.5 Project Scope .................................................................................................... 8

1.6 Feasibility Analysis .......................................................................................... 8

1.6.1 Economic Feasibility ............................................................................... 8


1.6.2 Technical Feasibility ................................................................................ 9
1.6.3 Operational Feasibility ............................................................................. 9
CHAPTER 2: LITERATURE REVIEW ..................................................................... 10

CHAPTER 3: METHODOLOGY ............................................................................... 12

3.1 System Block Diagram ................................................................................... 12

3.2 Interface and Working .................................................................................... 12

3.3 Software Component and technologies .......................................................... 13

3.3.1 Python 13
3.3.2 Visual Studio Code ................................................................................ 13
3.3.3 TensorFlow ............................................................................................ 13
3.3.4 Keras 14
3.3.5 Scikit-learn ............................................................................................. 14
3.3.6 Numpy.................................................................................................... 14
3.3.7 Pandas 14
3.3.8 Matplotlib............................................................................................... 14
3.4 Image collection and dataset preparation ....................................................... 15

3.5 Data preprocessing ......................................................................................... 16

3.6 Data augmentation .......................................................................................... 16

3.7 CNN (Convolution Neural Network) Architecture ........................................ 17

3.7.1 Convolutional layer Architecture ........................................................... 18


3.7.2 Convolution layer................................................................................... 21
3.7.3 Kernel 21
3.7.4 Filter 22
3.7.5 Stride 22
3.7.6 Padding .................................................................................................. 23
3.7.7 Max Pooling ........................................................................................... 23
3.7.8 Batch Normalization Layer .................................................................... 23
3.7.9 Fully Connected layer ............................................................................ 24
3.7.10 Flatten Layer .............................................................................. 25
3.7.11 Dropout....................................................................................... 25
3.8 Activation Function ........................................................................................ 25

3.8.1 Re-Lu Activation ................................................................................... 26


3.8.2 SoftMax.................................................................................................. 26
3.9 Optimization Algorithm ................................................................................. 27

3.9.1 Adam Optimization................................................................................ 27


3.10 Loss Function ................................................................................................. 28

3.11 Early Stopping ................................................................................................ 28

3.12 Web Application Development ...................................................................... 29

CHAPTER 4: RESULT AND DISCUSSIONS ........................................................... 31

4.1 Experiment ..................................................................................................... 31

4.2 Result analysis ................................................................................................ 31

4.2.1 Training Accuracy ................................................................................. 31


4.2.2 Training Loss ......................................................................................... 32
4.3 Model Evaluation on Unseen Images ............................................................. 33

4.4 Future Enhancements ..................................................................................... 34

CONCLUSION ............................................................................................................ 35

References .................................................................................................................... 36

Appendix ...................................................................................................................... 37
LIST OF FIGURES
FIGURE PAGE

Figure 3-1: Block diagram of the system ..................................................................... 12

Figure 3-2 : ebird.org user interface ............................................................................ 15

Figure 3-3 : Dataset distribution of bird species .......................................................... 16

Figure 3-4: Data Augmentation ................................................................................... 17

Figure 3-5: A basic CNN architecture ......................................................................... 18

Figure 3-6: Working of Kernel .................................................................................... 22

Figure 3-7: filter and kernels........................................................................................ 22

Figure 3-8: Max pooling .............................................................................................. 23

Figure 3-9: Flatten layer............................................................................................... 25

Figure 3-10: ReLU activation function ........................................................................ 26

Figure 3-11: Home page ............................................................................................. 29

Figure 3-12: Prediction Page ....................................................................................... 29

Figure 3-13: About Us page ......................................................................................... 30

Figure 3-14: Predicted Bird Species details page ........................................................ 30


LIST OF ABBREVIATIONS
ADAM Adaptive Moment Estimation

ANN Artificial Neural Network

API Application Programming Interface

CNN Convolutional Neural Network

CSV Comma Separated Values

JS JavaScript

ML Machine Learning

Re-LU Rectified Linear Unit

SGD Stochastic gradient descent

UI User Interface

Wi-Fi Wireless Fidelity


CHAPTER 1: INTRODUCTION
1.1 Background

Birds represent a diverse class of creatures found across our planet. They possess
unique characteristics such as feathers, toothless beaked jaws, and the ability to lay
eggs. With approximately 11,162 known species worldwide, the majority of bird
populations are concentrated in tropical regions. Indonesia claims the title for hosting
the largest population of endemic bird species, with around 500 species. Australia
follows closely behind with nearly 360 bird species, while the Philippines and Brazil
each have approximately 260 species. However, higher latitude countries such as Spain,
Portugal, and other European nations typically have fewer endemic bird species.

As per reports 159 bird species have become extinct in the past 500 years. As of the
2021 IUCN Red List, out of the total global bird species, approximately 1,445 species
(13% of the total) are classified as threatened or endangered. This means that roughly
1 in 7 bird species on Earth is at risk. Brazil harbors 172 of these endangered species,
while Indonesia is home to 106 threatened endemic species.

In the context of Nepal, the country boasts a remarkable diversity of avian life, with
875 recorded bird species. This places Nepal 27th on the list of countries with the
highest endemic bird populations. Of these species, approximately 38 species are
considered as endangered. Notable examples of endangered bird species in Nepal
include Cheer Pheasant, Sultan tit, White-throated bulbul, Abbott’s Babbler, Great
Hornbill, etc. There are many birds which have gone extinct and many which are on
the verge of extinction. This is because of many factors like unplanned urbanization,
loss of habitat due to rapid deforestation, rampage killing of birds, climate change,
global warming etc. As humans, it becomes our responsibilities to find ways to protect
all the different aspects of nature.

Artificial Intelligence (AI) is an attempt at mimicking human intelligence by machines.


In this modern world, which is flowing towards the idea of automation in every field
possible AI has become a big name. AI has found its presence in various places like
search engines, voice assistants, different recognition software, automated vehicles, etc.
Machine learning is a type of AI that learns by data through certain algorithms. It feeds
on data to predict future values. A huge amount of data is available these days and
proper utilization of it can empower the machine learning algorithm. So, by feeding the
data corresponding to any real-life problem to a machine learning model better results
can be expected. The applications of this are literally limitless and one of them can be
in the field of endangered species conservation. AI or Machine learning algorithms are
able to take images and audio signals as input and produce predictions based on those
inputs. Image processing has been applied for conservation application all over the
world.

1.2 Motivation

Our project, "Classification of Endangered Bird Species of Nepal Using Deep


Learning," is motivated by the urgent need to protect Nepal's endangered bird species.
With a significant number of these species facing extinction, it is crucial to enhance our
monitoring and conservation efforts. By harnessing the capabilities of AI and deep
learning, we aim to develop an automated system that facilitates the identification and
preservation of endangered birds. Through this initiative, we strive to contribute to the
protection of Nepal's avian biodiversity and the formulation of effective conservation
policies, ensuring the continued existence of these invaluable species for generations to
come.

1.3 Problem Statement

The current scenario in Nepal underscores a pressing conservation challenge: the


existence of 38 endangered bird species, each representing a critical component of the
country's biodiversity. These species face imminent threats, accentuating the urgency
for effective conservation measures. However, the efficacy of current strategies is
hindered by the absence of efficient methods for accurately identifying these
endangered birds. Manual identification processes are often time-consuming and
resource-intensive, leading to gaps in monitoring efforts. As a result, there is a
compelling need for a more precise classification approach that harnesses advanced
technologies to enhance our ability to identify and protect these vulnerable species.
1.4 Objective

The major objective of this project is:

➢ To develop a deep learning system to classify Endangered Bird Species of


Nepal.

➢ To curate high quality dataset for Nepal’s 38 Endangered Bird Species.

1.5 Project Scope

Our project focuses on improving the accuracy of identifying endangered bird species
in Nepal, addressing the current limitations of manual methods which often require
specialized knowledge and expertise, leading to delays in recognizing these species.
We aim to develop a deep learning-based classification system tailored for the specific
context of Nepal's avian biodiversity. By utilizing advanced CNN models, we intend to
create a robust classification framework capable of accurately identifying endangered
bird species in real-time. This system will not only streamline the identification process
but also provide valuable insights into the distribution and behavior of these species,
thereby facilitating the formulation of targeted conservation strategies. Ultimately, our
project seeks to contribute significantly to the conservation efforts aimed at
safeguarding Nepal's endangered bird specie

1.6 Feasibility Analysis

1.6.1 Economic Feasibility

The economic feasibility of a Bird Species identification system depends on the specific
requirements of the system. Generally, the cost of implementing a Bird Species
identification system will depend on the number of sensors, cameras, and other devices
needed, as well as the cost of the software and hardware required to process and analyze
the data. Conducting a comprehensive cost-benefit analysis and market research is
crucial to assess the economic viability of the Bird Species surveillance system.
1.6.2 Technical Feasibility

The technical feasibility of a bird species identification system involves evaluating


factors such as feature selection, data availability, computational requirements, real-
time performance, integration and scalability, robustness to variations, model training
and updates, and computational efficiency. Considering these aspects helps determine
the feasibility of implementing a bird species identification system and identifies
potential challenges that need to be addressed.

1.6.3 Operational Feasibility

The operational feasibility of a bird species identification system depends on the


specific requirements of the system. Generally, the system should be able to integrate
with existing conservation and research practices. Additionally, the system should be
able to provide insights into the bird species surveillance, such as identifying areas
where system can be improved. Moreover, the system's usability and user-friendliness
should be considered to ensure smooth adoption and operation.
CHAPTER 2: LITERATURE REVIEW
In the paper “Bird Species Identification using Convolutional Neural Networks” [3],
John Martinsson et al. discuss the need of system for monitoring animal populations to
better understand their behavior, biodiversity, and population dynamics. The CNN
method and deep residual neural networks were presented to detect a picture in two
ways, based on feature extraction and signal classification. They conducted an
experimental investigation for datasets made up of various image types. They neglected
to consider the background species, though. Larger volumes of training data are
necessary to identify the background species; however, these may not be available.

Juha Niemi, Juha T Tanttu et al (2018) [4], proposed a Convolutional neural network
trained with deep learning algorithms for image classification. It also proposed a data
augmentation method in which images are converted and rotated in accordance with
the desired color. The final identification is based on a fusion of parameters provided
by the radar and predictions of the image classifier. They demonstrated that data
augmentation method is suitable for image classification problem and it significantly
increases the performance of the classifier.

Madhuri A. Tayal, Atharva Magrulkar et al (2018) [5], developed a software application


that is used to simplify the bird identification process. This bird identification software
takes an image as an input and gives the identity of the bird as an output. They used the
concept of transfer learning and pre-trained algorithm made feature extraction possible
from an image. The result obtained were of high efficiency as the software could easily
identify a bird species from an image whose dataset was present in the database.

Andreia Marini, Jacques Facon and Alessandro L. Koerich et al (2013) [6], presents a
novel approach for bird species classification based on color features extracted from
unconstrained images. The approach addresses challenges such as variations in
scenarios, bird poses, sizes, and viewing angles, as well as occlusions and lighting
variations in the images. A color segmentation algorithm is applied to eliminate
background elements and identify candidate regions where the bird may be present.
Experimental results on the CUB-200 dataset show a 75% correct segmentation rate
and varying bird species classification rates (ranging from 90% to 8%) depending on
the number of classes considered.

In this paper, “Genetic Algorithm based hyper-parameters optimization for transfer


Convolutional Neural Network” [7], X. Xiao et al. proposed a Genetic Algorithm to
optimize the hyper-parameters in CNNs. In their work, they did not restrain the depth
of the model. Experimental results show that they can find satisfactory hyper- parameter
combinations efficiently with accuracy about 88.92% and within 24.55 hours which is
relatively better than random search algorithm.

In this paper, “Samrakshyan: An Endangered Birds Recognition Portal” [8], B.


Khatiwada, B. Subedi, N. Duwal and R. et al. proposed an Endangered Birds
Recognition portal for the preservation of bird species and monitoring the status of birds
in the ecosystem which can assist researchers of Nepal’s biodiversity in planning
different strategies for their preservation. They developed a system to identify bird calls
from the audio data set collected from Xeno-canto.org. They extract spectral
characteristics of the audio signal through Mel-Spectrogram and MFCC (Mel-
Frequency Cepstral Coefficients) which generated the spectrogram. It was fed into the
deep learning model architecture like efficientNet which is based on a convolutional
neural network. Experimental results show that they can find satisfactory hyper-
parameter combinations efficiently with F1-Score of 79% for 41 species of birds.
CHAPTER 3: METHODOLOGY
3.1 System Block Diagram

Figure 3-1: Block diagram of the system

3.2 Interface and Working


The block diagram of the system illustrated in Figure gives a high-level overview of
the system’s workings. The User sends data to the server through web application. The
server has a CNN model loaded that has been trained to predict bird species accurately.

In CNN model, the first step is to split the image dataset into training and testing. This
division help to evaluate the performance of the trained model on unseen data.
Preprocessing is crucial to prepare the image data for training. It involves several steps
such as resizing image to consistent size, normalizing pixel values and applying
techniques like data augmentation to increase diversity in training set. The training data
is further splitted for validation purpose. This allows monitoring the model’s
performance during training and prevent overfitting.
The training process involves feeding the training data to the CNN model. The model
learns to optimize its internal parameter by minimizing loss function. The testing data
is fed to the CNN model to evaluate metrics like accuracy, weighted precision, weighted
F1-score. The server processes the image using the CNN model and predicts bird
species. The predicted output, which includes the identified bird species and their
details, is then transmitted to a web application.

3.3 Software Component and technologies

3.3.1 Python

Python is a versatile and high-level programming language known for its simplicity and
readability. It offers a wide range of libraries and frameworks that make it suitable for
various applications, including web development, data analysis, machine learning, and
automation. Python's clean syntax and extensive standard library allow developers to
write code more efficiently. It supports multiple programming paradigms, including
object-oriented, functional, and procedural programming.

3.3.2 Visual Studio Code

Visual Studio Code (VS Code) is a lightweight and highly versatile source code editor
developed by Microsoft. Known for its simplicity, speed, and cross-platform
compatibility, VS Code has gained widespread popularity among developers for a
variety of programming languages. It offers a rich set of features, including built-in Git
integration, intelligent code completion, debugging support, and a vast extension
ecosystem that allows users to customize and enhance their development environment.
With its emphasis on productivity and a user-friendly interface, Visual Studio Code has
become a go-to choose for developers seeking a powerful yet lightweight coding tool.

3.3.3 TensorFlow

TensorFlow is a popular deep learning framework that provides a high-level API for
building and training neural networks. It offers support for 3D CNNs and has pre-built
implementations of ResNet and ensemble techniques. TensorFlow also provides tools
for data preprocessing, model training, and evaluation
3.3.4 Keras

Keras is a high-level neural network API that can run on top of TensorFlow or other
backend engines, including PyTorch. Keras simplifies the process of building and
training neural networks, making it an accessible choice for lip reading tasks. It
provides pre-trained models, including ResNet, and supports ensemble techniques for
model combination.

3.3.5 Scikit-learn

Scikit-learn is a popular Python library for machine learning tasks. It provides a wide
range of tools and algorithms for tasks such as classification, regression, clustering, and
dimensionality reduction. With a user-friendly interface and extensive documentation,
scikit-learn is widely used for data analysis and building predictive models. It also
offers utilities for data preprocessing, model evaluation, and model selection, making
it a comprehensive and versatile tool for machine learning practitioners.

3.3.6 Numpy

NumPy is a powerful Python library for numerical computing. It provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays efficiently. NumPy is widely used for tasks such as
numerical analysis, scientific computing, and data manipulation.

3.3.7 Pandas

Pandas is a popular Python library built on top of NumPy that specializes in data
manipulation and analysis. It offers data structures like Data Frames, which provide a
flexible and efficient way to work with structured data. Pandas provides functions for
data cleaning, filtering, grouping, merging, and many other operations, making it a
valuable tool for data preprocessing and exploratory data analysis.

3.3.8 Matplotlib

Matplotlib is a widely-used Python library for creating visualizations and plots. It


provides a flexible and intuitive interface for generating a variety of static, animated,
and interactive plots. Matplotlib supports a wide range of plot types, including line
plots, scatter plots, bar plots, histograms, and more. It offers extensive customization
options for controlling the appearance of plots, such as labels, colors, axes, and
annotations. Matplotlib is commonly used in data analysis, scientific research, and
visualization tasks to effectively communicate and explore data in a visually appealing
manner.

3.4 Image collection and dataset preparation

On the websites ebird.org bird images from throughout the world are shared. The
images are uploaded on the website by the contributors who travel around.

Figure 3-2 : ebird.org user interface

We collected 8475 images from this website of 38 bird species. The images were
splitted into training sets, validation sets and testing sets in the ratio of 0.9:0.5:0.5.
The training sets consists of 7671 images of 38 bird species for training, validation set
consists of 402 images for validation during training and 402 images for testing the
model.
Figure 3-3 : Dataset distribution of bird species

3.5 Data preprocessing

To ensure uniformity in the dataset, we standardized the images which have significant
variations in the size and quality. Given their original high-quality nature, the images
occupied considerable storage space. We resized the images to a consistent dimension
of 224*224 pixels. This process helps us to maintain the original aspect ratio while
ensuring all the images having consistent same dimension. The images in the dataset
are in RGB color. The pixels value of the images is normalized between 0 to 255 to
prevent potential numerical issues during model training. By performing normalization,
we ensured that the pixel values were represented in a standardized range suitable for
training the model effectively.

These processing steps helps to achieve consistent image sizes, reduces storage
requirements through compression, and normalize the pixels values for improving
model training outcomes.

3.6 Data augmentation

Data augmentation is a technique of artificially increasing the training set by creating


modified copies of a dataset using existing data. When training deep learning model,
having a large amount of diverse data is crucial for obtaining good performance.
However, if the dataset is relatively small, it can limit the model’s ability to learn and
generalize pattern effectively.

We used Keras deep learning library to create an image generator. Instead of using the
original images directly, this generator produces randomly altered images. These
altered images include minor changes to the dataset to generate new data points. Data
augmentation helps to reduce overfitting and improve accuracy.

Figure 3-4: Data Augmentation

3.7 CNN (Convolution Neural Network) Architecture

A convolutional neural network (CNN) is a type of artificial neural network comprised


of neurons that self-optimize through learning. The neurons receive input and perform
scalar product and non-linear function [7]. Convolutional neural networks outperform
other neural networks when given input datasets are images, voice, or audio. They have
layers like the Convolutional layer, Pooling layer, and Fully-connected layer. A
convolutional network’s initial layer is the convolutional layer whereas the fully
connected layer is the last layer. Many convolutional layers or pooling layers can exist
in between. The CNN becomes more complicated with each layer, detecting larger
areas of the picture. Early layers emphasize basic elements like colors and borders.

Figure 3-5: A basic CNN architecture

3.7.1 Convolutional layer Architecture

Our classifier is a convolutional neural network (CNN) consisting of

➢ 4 Convolution layers, each followed by a batch normalization layer and max


pooling layer.
➢ 3 fully connected layer, each followed by dropout of 30%.
➢ An output layer with 38 nodes.

Convolution layer applies 64 filters of kernel size 3*3 to the input images with a same
padding. The activation function used is ReLU (Rectified Linear Unit). The input shape
of the image is (224, 244, 3), indicating the input images are of size 224*224 pixel with
3 color channels (RGB).

Batch Normalization is applied to normalize the activations of the previous layer,


helping with training stability and accelerating convergence. Max pooling layer
performs max pooling with a pool size of 2*2 with stride 2 and has valid padding. Max
pooling reduces the spatial dimensions of the input by taking the maximum value within
each pooling window.
Three more convolution layers with filters 64, 32, 32 respectively with same
configurations as previous convolution layer followed by batch normalization and max
pooling.

Flatten layers flatten the multidimensional output from the previous layer into a 1-
Dimension vector, preparing it for the fully connected layers. These layers are fully
connected layers. The first dense layers have 512 units with ReLU activation function
followed by a dropout layer that randomly set 30% of the inputs to 0 during training to
prevent overfitting. The second dense layer have 256 units with ReLU activation
function followed by a dropout layer of 30%. The third dense layer have 128 units with
ReLU activation function followed by a dropout layer of 30%. The fourth dense layer
have 64 units with ReLU activation function followed by a dropout layer of 30%. The
final dense layer has 38 units with SoftMax activation function. All these layers
contribute for the construction of CNN model for bird species prediction, progressively
extracting features from the input images and making predictions based on those
features.

The architecture of our model is visualized in the figure below.


Figure 3-6: CNN Architecture
3.7.2 Convolution layer

An operation that combines two sequences of data to produce a third one is known as
convolution. In image processing, convolutions are used to apply filter or kernels to
an image that also allows the network to detect the certain patterns or features in the
image.

During the convolution operation, the filter is slid over the data, performing a
mathematical operation at each position. The result of this operation is a new set of
data that has been transformed or filtered in some way. For example, if the filter is
designed to detect edges, then the resulting data will highlight the edges in the
original data.

Let an image of dimension (W1 * H1) is convoluted by using a filter (F*F), then the
image will have new shape (W2 * H2). The formula to calculate the output shape after
convolution is given below:

𝑊1 + 2𝑃 − 𝐹
𝑊2 = +1 3-1
𝑆

𝐻1 + 2𝑃 − 𝐹
𝐻2 = +1 3-2
𝑆

Where,
𝑊2 ,𝐻2 = Width and height of output
𝑊1 ,𝐻1 = Width and height of input
S = Stride
P = Amount of zero padding
F = Filter size

3.7.3 Kernel

Kernel is a 2D matrix that uses convolution technique to blur, sharpen, and carry out
other image processing operations. The convolution method is shown in figure below,
where an image matrix and a kernel are multiplied element by element and added
together at last to get an output.
Figure 3-7: Working of Kernel

3.7.4 Filter

Filter is a small matrix of numbers that is used in a convolution operation. CNN uses
multiple kernels stacked together. Thus, filter is referred to this 3D structure of stacked

kernels. When operating in 2D, filter size is same as the kernel size. It is used to detect
patterns or features in data, such as edges or lines in an image.

Figure 3-8: filter and kernels

3.7.5 Stride

Stride is a vital hyperparameter that determines the size of output produced by the
convolutional layer and plays a key role in determining the spatial resolution of the
output. The stride is the number of pixels that the convolutional kernel moves when it
slides over the input image.
3.7.6 Padding

Padding refers to adding extra elements to a list or array so that it has a fixed size. This
is often done when working with neural networks, as they typically expect input data
to have a specific size. Padding is the process adding extra pixels around the edges of
an image which typically helps to prevent shrinking by adding layers of zeros to input
images. This is done to achieve more accurate results.

3.7.7 Max Pooling

Pooling layers are the layers added after convolution layer in CNN that perform down
sampling of an image. They are the building blocks of CNN, and they are used to reduce
the dimension of feature maps. As they reduce the dimension of feature maps, it reduces
the number of parameters to be learned and number of computations to be performed.
Thus, pooling layers come very handy in CNN as they reduce overfitting of the model.
Max Pooling is one of the way pooling operation which selects the maximum element
from the region of the feature map covered by the filter. Thus, the output after max
pooling layer would be a feature map containing the most prominent features of the
previous feature map.

Figure 3-9: Max pooling

3.7.8 Batch Normalization Layer

Each Convolution layer is followed by a batch Normalization layer. Batch


Normalization is a technique commonly used in deep neural networks, including
Convolution Neural Networks, to improve training stability and speed up convergence.
It helps to address the issue of internal covariate shift, which can impede the training of
deep neural networks. Internal covariate shift refers to the change in the distribution of
layer inputs during training, making it challenging for the network to learn effectively.
Batch Normalization mitigates this problem by normalizing the inputs of each layer in
a mini-batch.

This Normalization step brings the activations closer to zero mean and unit variance,
which helps to stabilize and speed up training.

3.7.9 Fully Connected layer

A dense layer is a fully connected layer in a convolutional neural network (CNN), in


which all neurons in the previous layer are connected to all neurons in the dense layer.
This allows the CNN to learn more complex patterns in the data and make more precise
predictions. Convolutional layers apply a convoluted operation over the input, where
each neuron is only connected to a subset of the input and the weights are shared across
all neurons. The dense layer takes in the flattened output of the convolutional layers
and processes it through fully connected layers. One of the main benefits of using dense
layers in a CNN is that they allow for end-to-end learning, where the model can learn
to map the input directly to the output without the need for manual feature engineering.
This can greatly simplify the process of building a CNN and make it easier to apply to
a wide range of tasks. However, dense layers can also add a significant number of
parameters to the model, which can lead to overfitting if not properly regularized. It is
important to balance the use of dense layers in a CNN with other regularization
techniques such as dropout and weight decay to ensure the model generalizes well to
new data. Overall, dense layers play an important role in CNNs by allowing the model
to learn more complex patterns in the data and make more precise predictions. They
should be used in combination with other regularization techniques to prevent over-
fitting and improve the generalization of the model.
3.7.10 Flatten Layer

When transitioning from the convolution layer to the fully connected layer, the flatten
layer is often used to reduce the multidimensional input to one dimension. The pooled
feature map is transformed into a single column and delivered to the fully connected
layer using the flatten function. Flatten layer is an important and useful component of
a CNN that helps to simplify the process of connecting the convolutional layers to the
fully-connected layers, improve computational efficiency, and improve network
performance.

Figure 3-10: Flatten layer

3.7.11 Dropout

A technique that helps in preventing overfitting is Dropout which works by randomly


dropping out or ignoring a certain number of output units in a layer during training by
setting them to zero. This has the effect of” averaging” the representations learned by
the different units, which makes the model more robust and less sensitive to the specific
weights of each unit. Dropout is typically applied to fully-connected layers, but it can
also be applied to convolutional layers. It is usually applied with a rate between 0.1 and
0.5. In this project, dropout of 0.3 is used on each fully connected layer.

3.8 Activation Function

An activation function is a crucial component in artificial neural networks, serving as


the mathematical operation applied to the output of each neuron or node in a neural
network layer. The purpose of an activation function is to introduce non-linearity into
the network, enabling it to learn complex relationships and patterns in data. Two types
of activation function have been used in our CNN model:

3.8.1 Re-Lu Activation

ReLU (Rectified Linear Unit) is one of the most popular activation functions. It replaces
all negative values with zero, allowing only positive values to pass through. ReLU helps
with the vanishing gradient problem and has contributed to the success of deep learning
models.
It has been used in convolution layers to introduce non-linearity to the network,
allowing CNNs to learn and model complex relationships in the data. Non-linear
activation functions are crucial for capturing intricate patterns and enabling the network
to approximate any arbitrary functions.

Mathematically, it can be represented as:

𝑓(𝑥) = 𝑚𝑎𝑥(0, 𝑥) 3-3

Figure 3-11: ReLU activation function

3.8.2 SoftMax

The SoftMax function is a popular activation function used in the output layer of a
neural network for multi-class classification problems. It transforms the raw output
scores of a network into a probability distribution over multiple class. The SoftMax
function takes a vector of real numbers as input and produces another vector of the same
dimension, where each element represents the probability of the corresponding class.

Mathematically,

𝑒 𝑥𝑖
𝑠𝑜𝑓𝑡𝑚𝑎𝑥(𝑧) = 𝑥
∑𝑗 𝑒 𝑗
3-4

3.9 Optimization Algorithm

In the context of Convolutional Neural Networks (CNNs), optimization algorithms play


a crucial role in training these deep learning models for tasks such as image
classification, object detection, and image segmentation. CNNs are specifically
designed to handle spatial hierarchies and patterns in data, making them highly effective
for tasks involving grid-like data structures, such as images.

We choose Adam optimizer as on optimization algorithm.

3.9.1 Adam Optimization

In machine learning, (Adaptive Moment Estimation) Adam is one of the popular and
powerful optimization algorithms that combines the benefits of stochastic gradient
descent (SGD) with the momentum and adaptivity, and requires fewer hyperparameters
to be tuned. It is the extension of the SGD and is well-suited for training deep learning
models.
Adam combines the benefits of SGD with momentum and the ability to adapt the
learning rates of each parameter. It maintains an exponentially decaying average of past
gradients and an exponentially decaying average of past squared gradients. These
moving averages are used to estimate the first and second moments of the gradients,
respectively. The learning rate of each parameter is then adjusted based on these
estimates, which helps the optimization process converge more quickly.

One of the advantages of Adam is that it requires fewer hyperparameters to be tuned


compared to SGD. It has a default value for the learning rate and the decay rates for the
moving averages, which means that it can work well without extensive hyperparameter
tuning.
3.10 Loss Function

The loss function used in multi-class classification is called Categorical cross entropy.
Categorical cross entropy is a loss function which is used in classification tasks,
specifically when the classes are exclusive. It is also called SoftMax loss as it is the
fusion of SoftMax activation and Cross-Entropy loss. It is used to measure the
difference between the true class label distribution and the predicted class label
distribution. Identification of target is done by using one-hot vector having 1 on single
position and 0’s everywhere else. The values predicted by the model are then passed to
SoftMax layer that coverts the values to probability distribution. Then the actual values
and predicted probability is passed to cross entropy loss which gives the overall loss of
the model.

Mathematically,

1
𝐶𝐶𝐸 𝐿𝑜𝑠𝑠 = − 𝑁 ∑𝑁 𝐾
𝑖=1 ∑𝑘=1 1(𝑦𝑖 = 𝑘)*log𝑦
̂𝑖 3-5

Where,
N = Total number of data points
k = Total number of classes
𝑦𝑖 = Actual class the data points belong to
𝑦̂=
𝑖 Predicted Probability by the model

3.11 Early Stopping

Early stopping is a regularization technique commonly used in machine learning to


prevent overfitting during training. Instead of running the training process for a fixed
number of iterations, early stopping monitors the model's performance on a validation
set. If the performance stops improving or starts deteriorating, the training process is
halted before completion. This helps prevent the model from learning noise in the
training data and enhances its generalization to unseen examples, ultimately improving
efficiency and avoiding potential overfitting issues. In this project early stopping is used
to monitor validation loss with patience 15.
3.12 Web Application Development

The Web application is build using Django and Tailwind CSS. It mainly consists of 3
pages. The Home page serves as the welcoming gateway, introducing the application’s
purpose. The prediction page serves as the page where the user provides input image of
the bird to predict bird species. The about page provide the details about the project
about how the project is made and how many bird species can this project classify.

Figure 3-12: Home page

Figure 3-13: Prediction Page


Figure 3-14: About Us page

Figure 3-15: Predicted Bird Species details page


CHAPTER 4: RESULT AND DISCUSSIONS
4.1 Experiment

The model underwent experiments with various configurations in the CNN


architecture. Different models produced different outcomes, and the best one was
chosen from these trials. The dataset was distributed in a 90:5:5 ratio for training,
testing, and validation. The chosen model includes four convolution layers with 64, 54,
32, and 32 filters, respectively, followed by batch normalization and maximum pooling
layers. In addition, there are four dense layers of 512, 256, 128, and 64 units, each using
ReLU activation. The final output layer contains 38 units for predicting bird species,
which uses a SoftMax layer.

The experiment with various configuration yield different result as:

Dataset Convolution layers Fully connected layers Output no of Accuracy


Trials Dropout
Distribution layer epochs (validation)
1st 2nd 3rd 4th 1st 2nd 3rd 4th
1 90:05:05 64 64 32 - 512 256 - - 38 0.1 200 0.893

2 90:05:05 64 64 32 32 512 256 128 64 38 0.3 116 0.908

3 80:10:10 64 64 32 16 512 256 - - 38 0.3 200 0.8812

4 80:10:10 64 64 32 16 256 128 64 - 38 0.3 129 0.8329

Figure 4-1: Model experiment with different configurations

4.2 Result analysis

4.2.1 Training Accuracy


Figure 4-2: Training and validation accuracy on training set

The accuracy is the measure of the model to correctly predict the classes of the input
data. From the graph, at the end of the 116 epoch the accuracy is 0.8823 and the
validation accuracy is 0.8955. The almost 1.32% difference between accuracy and
validation accuracy shows very minor level of overfitting. So, the model is generalizing
well to the unseen data.

4.2.2 Training Loss

Figure 4-3: Training and validation loss on training set

Loss represents the measure of how well the model is performing on the given set of
the data. The loss is the numerical value that indicates the difference between the
predicted output and the actual target values. The main objective of training the model
is to minimize the loss as low as possible. During the training process, the trend of the
training and validation loss over time must be monitored to see if it is consistently
increasing or decreasing. It is important to ensure that the model is not overfitting to
the training data. Inconsistencies in the validation loss and training loss can indicate
overfitting. Therefore, it is important to monitor both the training and validation loss
throughout the training process and use techniques such as regularization or early
stopping to prevent overfitting.
Both training and validation loss can be seen decreasing as the model trains for enough
number of epochs. The training loss at the 116 epochs is 0.4789 whereas the validation
loss is 0.4406

4.3 Model Evaluation on Unseen Images

The model’s performance is evaluated on the testing dataset which is splitted from the
total dataset. The testing dataset contains the 5% of each species of the total dataset that
the model is not trained with before. Evaluation of test metrics is more important than
training metrics, as the test metrics indicate the model's ability to generalize on unseen
data from the real world. The model performed very well on the test dataset, achieving
an accuracy of 0.9080. The weighted F1 score, which considers precision and recall for
all classes, was 0.9084. This indicates that the model has a good balance between
precision and recall for all classes and is effective at classifying bird species.

Figure 4-4: Confusion matrix of the test dataset


4.4 Future Enhancements

Some limitations and future enhancements that could be applied to alleviate them are
discussed below:

➢ Limited dataset: The dataset contains only 8475 images of total 38 bird species.
Some of the bird species contains very small number of images for training the
model. Due to this it is possible that the model might not perform well with the
diverse data from the real wold. Although the model managed to get good
accuracy on test dataset, the set was not the complete representation of the real-
world scenarios. So, expansion of dataset to make it as diverse as possible is
necessary to make the system capable of classifying every bird species
accurately.
➢ Addition of real-time classification of the bird species: This project is capable
of classifying the bird species after the user provide the image of the bird species
from the web application. It can be integrating with the IoT device for real-time
classification of bird species.
➢ Limited class classification: This project is capable of classifying only 38
endangered bird species of Nepal. It can be trained with large number of bird
species to increase the scope of the project.
➢ Ensemble models: This project is based on single model trained to predict bird
species. The use of ensemble models, combining predictions from the multiple
CNN architecture often leads to improved accuracy and robustness
CONCLUSION
Despite the unaccounted limitations that exist in our project work, this project can be
remarked as an achievement and opportunity to explore application of a CNN model in
classification of Endangered Bird Species of Nepal. The main objective of the project
is to develop a classification system that can classify endangered bird species accurately
which we have rightfully succeeded in doing so.
References

[1] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel and B. Thirion , "Scikit-


learn: Machine Learning in Python," Journal of Machine Learning Research,
vol. 12, pp. 2825-2830, 2011.

[2] T. Amaratunga, "Github," 2019. [Online]. Available:


https://github.com/Thimira/bird_watch.

[3] J. Martinsson, "Bird Species Identification using Convolution Neural


Networks," Chalmers University of Technology and University of Gothenburg,
Sweden, 2017.

[4] J. Niemi and J. T. Tanttu, "Deep Learning Case Study for Automatic Bird
Identification," Applied Sciences, vol. 8, no. 11, 2017.

[5] M. A. Tayal, A. Mangrulkar, P. Waldey and C. Dangra, "Bird Identification by


Image Recognition," Helix, vol. 8, no. 6, pp. 4349-4352, 2018.

[6] A. Marini, J. Facon and A. L. Koerich, "Bird Species Classification Based on


Color Features," 2013 IEEE International Conference on Systems, Man, and
Cybernetics, pp. 4336-4341, 2013.

[7] X. Xiao, M. Yan, C. Ji, S. Basodi and Y. Pan, "Efficient Hyperparameter


Optimization in Deep Learning Using a Variable Length Genetic Algorithm,"
arXiv, vol. 1, 2020.

[8] B. Khatiwada, B. P. Subedi, N. Duwal and R. Gautam, "Samrakshyan: An


Endangered Birds Recognition Portal," March 2023. [Online]. Available:
https://www.linkedin.com/posts/rewangautam_research-birds-conservation-
activity-7031145284067807232-HQ8o/.

[9] S. Albawi, T. A. Mohammed and S. Al-Zawi, "Understanding of a


convolutional neural network," 2017 International Conference on Engineering
and Technology (ICET), vol. 1, no. 6, 2017.

[10] B. S. Poudel, "Wildlife Research and Monitoring in Nepal: An Overview," The


Initiation, vol. 2, no. 1, pp. 22-32, 2006.

[11] C. Inskipp, H. S. Baral, T. Inskipp, A. P. Khatiwada, M. P. Khatiwada, P. L.


Poudyal and R. Amin, "Nepal’s National Red List of Bird," Journal of
Threatened Taxa, vol. 9, no. 1, pp. 9700-9722, 2017.
Appendix

You might also like