Classification of Endangered Bird Species of Nepal Using Deep Learning
INSTITUTE OF ENGINEERING
Pashchimanchal Campus
Lamachaur, Pokhara-16
Supervised By:
Nabin Lamichhane
MARCH, 2024
DECLARATION
We hereby declare that the report of the project entitled “Classification of Endangered
Bird Species of Nepal Using Deep Learning”, which is being submitted to the
Department of Electronics and Computer Engineering, IOE, Pashchimanchal
Campus, in partial fulfilment of the requirements for the award of the Degree of
Bachelor of Engineering in Electronics, Communication and Information
Engineering, is a bona fide report of the work carried out by us. The materials contained
in this report have not been submitted to any university or institution for the award of
any degree, we are the sole authors of this complete work, and no sources other than
those listed here have been used in this work.
Request for permission to copy or to make any use of the material in this report in whole
or in part should be addressed to:
Head
Department of Electronics and Computer Engineering
Pashchimanchal Campus, Institute of Engineering
Lamachaur-16, Pokhara
Nepal
ACKNOWLEDGEMENT
We must express our deep and sincere gratitude to the Institute of Engineering,
Pashchimanchal Campus and the Department of Electronics and Computer Engineering
for giving us this wonderful platform and an opportunity to utilize and build our skills.
We are deeply indebted to all our teachers who have provided us with valuable
knowledge that has really helped us to reach here and explore the horizon further.
We would also like to acknowledge our family and friends and thank them for being
helpful and supportive, which has motivated us and created a secure space for us.
ABSTRACT
TABLE OF CONTENTS
COPYRIGHT
ACKNOWLEDGEMENT
ABSTRACT
3.3.1 Python
3.3.2 Visual Studio Code
3.3.3 TensorFlow
3.3.4 Keras
3.3.5 Scikit-learn
3.3.6 Numpy
3.3.7 Pandas
3.3.8 Matplotlib
3.4 Image collection and dataset preparation
CONCLUSION
References
Appendix
LIST OF FIGURES
LIST OF ABBREVIATIONS
JS  JavaScript
ML  Machine Learning
UI  User Interface
Birds represent a diverse class of creatures found across our planet. They possess
unique characteristics such as feathers, toothless beaked jaws, and the ability to lay
eggs. With approximately 11,162 known species worldwide, the majority of bird
populations are concentrated in tropical regions. Indonesia claims the title for hosting
the largest population of endemic bird species, with around 500 species. Australia
follows closely behind with nearly 360 bird species, while the Philippines and Brazil
each have approximately 260 species. However, higher latitude countries such as Spain,
Portugal, and other European nations typically have fewer endemic bird species.
As per reports, 159 bird species have become extinct in the past 500 years. As of the
2021 IUCN Red List, out of the total global bird species, approximately 1,445 species
(13% of the total) are classified as threatened or endangered. This means that roughly
1 in 7 bird species on Earth is at risk. Brazil harbors 172 of these endangered species,
while Indonesia is home to 106 threatened endemic species.
In the context of Nepal, the country boasts a remarkable diversity of avian life, with
875 recorded bird species. This places Nepal 27th on the list of countries with the
highest endemic bird populations. Of these species, approximately 38 species are
considered as endangered. Notable examples of endangered bird species in Nepal
include Cheer Pheasant, Sultan tit, White-throated bulbul, Abbott’s Babbler, Great
Hornbill, etc. There are many birds which have gone extinct and many which are on
the verge of extinction. This is because of many factors like unplanned urbanization,
loss of habitat due to rapid deforestation, rampant killing of birds, climate change,
global warming, etc. As humans, it is our responsibility to find ways to protect
all the different aspects of nature.
1.2 Motivation
Our project focuses on improving the accuracy of identifying endangered bird species
in Nepal, addressing the current limitations of manual methods which often require
specialized knowledge and expertise, leading to delays in recognizing these species.
We aim to develop a deep learning-based classification system tailored for the specific
context of Nepal's avian biodiversity. By utilizing advanced CNN models, we intend to
create a robust classification framework capable of accurately identifying endangered
bird species in real-time. This system will not only streamline the identification process
but also provide valuable insights into the distribution and behavior of these species,
thereby facilitating the formulation of targeted conservation strategies. Ultimately, our
project seeks to contribute significantly to the conservation efforts aimed at
safeguarding Nepal's endangered bird species.
1.6.1 Economic Feasibility
The economic feasibility of a Bird Species identification system depends on the specific
requirements of the system. Generally, the cost of implementing a Bird Species
identification system will depend on the number of sensors, cameras, and other devices
needed, as well as the cost of the software and hardware required to process and analyze
the data. Conducting a comprehensive cost-benefit analysis and market research is
crucial to assess the economic viability of the Bird Species surveillance system.
1.6.2 Technical Feasibility
Juha Niemi and Juha T. Tanttu (2018) [4] proposed a convolutional neural network
trained with deep learning algorithms for image classification. They also proposed a data
augmentation method in which images are converted and rotated in accordance with
the desired color. The final identification is based on a fusion of parameters provided
by the radar and predictions of the image classifier. They demonstrated that the data
augmentation method is suitable for the image classification problem and that it
significantly increases the performance of the classifier.
Andreia Marini, Jacques Facon and Alessandro L. Koerich (2013) [6] presented a
novel approach for bird species classification based on color features extracted from
unconstrained images. The approach addresses challenges such as variations in
scenarios, bird poses, sizes, and viewing angles, as well as occlusions and lighting
variations in the images. A color segmentation algorithm is applied to eliminate
background elements and identify candidate regions where the bird may be present.
Experimental results on the CUB-200 dataset show a 75% correct segmentation rate
and varying bird species classification rates (ranging from 90% to 8%) depending on
the number of classes considered.
In a CNN model, the first step is to split the image dataset into training and testing sets.
This division helps to evaluate the performance of the trained model on unseen data.
Preprocessing is crucial to prepare the image data for training. It involves several steps
such as resizing images to a consistent size, normalizing pixel values, and applying
techniques like data augmentation to increase diversity in the training set. The training
data is further split for validation purposes. This allows monitoring the model's
performance during training and prevents overfitting.
The training process involves feeding the training data to the CNN model. The model
learns to optimize its internal parameters by minimizing a loss function. The testing data
is fed to the CNN model to evaluate metrics like accuracy, weighted precision, and
weighted F1-score. The server processes the image using the CNN model and predicts the bird
species. The predicted output, which includes the identified bird species and their
details, is then transmitted to a web application.
3.3.1 Python
Python is a versatile and high-level programming language known for its simplicity and
readability. It offers a wide range of libraries and frameworks that make it suitable for
various applications, including web development, data analysis, machine learning, and
automation. Python's clean syntax and extensive standard library allow developers to
write code more efficiently. It supports multiple programming paradigms, including
object-oriented, functional, and procedural programming.
3.3.2 Visual Studio Code
Visual Studio Code (VS Code) is a lightweight and highly versatile source code editor
developed by Microsoft. Known for its simplicity, speed, and cross-platform
compatibility, VS Code has gained widespread popularity among developers for a
variety of programming languages. It offers a rich set of features, including built-in Git
integration, intelligent code completion, debugging support, and a vast extension
ecosystem that allows users to customize and enhance their development environment.
With its emphasis on productivity and a user-friendly interface, Visual Studio Code has
become a go-to choice for developers seeking a powerful yet lightweight coding tool.
3.3.3 TensorFlow
TensorFlow is a popular deep learning framework that provides a high-level API for
building and training neural networks. It offers support for convolutional neural
networks and has pre-built implementations of architectures such as ResNet. TensorFlow
also provides tools for data preprocessing, model training, and evaluation.
3.3.4 Keras
Keras is a high-level neural network API that can run on top of TensorFlow or other
backend engines, including PyTorch. Keras simplifies the process of building and
training neural networks, making it an accessible choice for image classification tasks. It
provides pre-trained models, including ResNet, and supports ensemble techniques for
model combination.
3.3.5 Scikit-learn
Scikit-learn is a popular Python library for machine learning tasks. It provides a wide
range of tools and algorithms for tasks such as classification, regression, clustering, and
dimensionality reduction. With a user-friendly interface and extensive documentation,
scikit-learn is widely used for data analysis and building predictive models. It also
offers utilities for data preprocessing, model evaluation, and model selection, making
it a comprehensive and versatile tool for machine learning practitioners.
3.3.6 Numpy
NumPy is a powerful Python library for numerical computing. It provides support for
large, multi-dimensional arrays and matrices, along with a collection of mathematical
functions to operate on these arrays efficiently. NumPy is widely used for tasks such as
numerical analysis, scientific computing, and data manipulation.
3.3.7 Pandas
Pandas is a popular Python library built on top of NumPy that specializes in data
manipulation and analysis. It offers data structures like DataFrames, which provide a
flexible and efficient way to work with structured data. Pandas provides functions for
data cleaning, filtering, grouping, merging, and many other operations, making it a
valuable tool for data preprocessing and exploratory data analysis.
3.3.8 Matplotlib
Matplotlib is a widely used Python library for creating static, animated, and interactive
visualizations, such as the accuracy and loss plots used in this project.
3.4 Image collection and dataset preparation
On the website ebird.org, bird images from throughout the world are shared. The
images are uploaded to the website by contributors who travel around.
We collected 8,475 images of 38 bird species from this website. The images were
split into training, validation, and testing sets in the ratio of 0.90:0.05:0.05.
The training set consists of 7,671 images of the 38 bird species, the validation set
consists of 402 images used for validation during training, and the testing set consists
of 402 images for testing the model.
Figure 3-3 : Dataset distribution of bird species
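As a rough sketch, a split like the one above can be produced by shuffling image indices and slicing (a minimal numpy illustration; the report's exact counts of 7,671/402/402 come from splitting each species separately, so the totals below differ slightly):

```python
import numpy as np

def split_dataset(n_images, ratios=(0.90, 0.05, 0.05), seed=42):
    """Shuffle image indices and slice them into train/val/test index sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    n_train = int(n_images * ratios[0])
    n_val = int(n_images * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_dataset(8475)
print(len(train_idx), len(val_idx), len(test_idx))  # 7627 423 425
```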
To ensure uniformity in the dataset, we standardized the images, which had significant
variations in size and quality. Given their original high-quality nature, the images
occupied considerable storage space. We resized the images to a consistent dimension
of 224×224 pixels, ensuring all the images have the same dimensions. The images in the
dataset are in RGB color. The pixel values of the images, originally in the range 0 to
255, are normalized to the range 0 to 1 to prevent potential numerical issues during
model training. By performing normalization, we ensured that the pixel values were
represented in a standardized range suitable for training the model effectively.
These processing steps help to achieve consistent image sizes, reduce storage
requirements through compression, and normalize the pixel values, improving
model training outcomes.
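A minimal sketch of the normalization step, assuming the image has already been resized to 224×224 (resizing itself would be done with a library such as Pillow or OpenCV):

```python
import numpy as np

def normalize(image):
    """Scale uint8 pixel values from [0, 255] down to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in RGB image
x = normalize(img)
print(x.shape, float(x.min()) >= 0.0, float(x.max()) <= 1.0)
```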
We used Keras deep learning library to create an image generator. Instead of using the
original images directly, this generator produces randomly altered images. These
altered images include minor changes to the dataset to generate new data points. Data
augmentation helps to reduce overfitting and improve accuracy.
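The project uses a Keras image generator for this; the numpy sketch below only illustrates the kind of random alterations (flips and rotations) such a generator applies to produce new data points:

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate an image to create a new training sample."""
    if rng.random() < 0.5:
        image = np.fliplr(image)       # horizontal flip half the time
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k)         # rotate by 0, 90, 180, or 270 degrees
    return image

rng = np.random.default_rng(0)
sample = np.zeros((224, 224, 3), dtype=np.uint8)
augmented = augment(sample, rng)
print(augmented.shape)  # (224, 224, 3)
```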
The convolution layer applies 64 filters of kernel size 3×3 to the input images with 'same'
padding. The activation function used is ReLU (Rectified Linear Unit). The input shape
of the image is (224, 224, 3), indicating the input images are 224×224 pixels with
3 color channels (RGB).
The flatten layer flattens the multidimensional output from the previous layer into a 1-
dimensional vector, preparing it for the fully connected layers. The first dense layer has
512 units with a ReLU activation function, followed by a dropout layer that randomly
sets 30% of the inputs to 0 during training to prevent overfitting. The second dense
layer has 256 units with a ReLU activation function, followed by a dropout layer of
30%. The third dense layer has 128 units with a ReLU activation function, followed by
a dropout layer of 30%. The fourth dense layer has 64 units with a ReLU activation
function, followed by a dropout layer of 30%. The final dense layer has 38 units with a
SoftMax activation function. All these layers contribute to the construction of the CNN
model for bird species prediction, progressively extracting features from the input
images and making predictions based on those features.
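The layer stack described above can be sketched with Keras roughly as follows. This is a minimal sketch, not the exact model: the report specifies the 64-filter 3×3 'same'-padded convolution and the 512/256/128/64 dense stack with 30% dropout, while the number of convolution/pooling blocks here is an assumption.

```python
from tensorflow.keras import layers, models

def build_model(num_classes=38):
    """Sketch of the CNN described in the text (block count is assumed)."""
    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),                          # 224x224 RGB input
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),                                           # to 1-D vector
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),            # 38-way output
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
print(model.output_shape)  # (None, 38)
```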
An operation that combines two sequences of data to produce a third one is known as
convolution. In image processing, convolutions are used to apply filters or kernels to
an image, which allows the network to detect certain patterns or features in the
image.
During the convolution operation, the filter is slid over the data, performing a
mathematical operation at each position. The result of this operation is a new set of
data that has been transformed or filtered in some way. For example, if the filter is
designed to detect edges, then the resulting data will highlight the edges in the
original data.
Let an image of dimension (W1 × H1) be convolved using a filter (F × F); then the
image will have a new shape (W2 × H2). The formula to calculate the output shape after
convolution is given below:

W2 = (W1 + 2P - F) / S + 1        (3-1)

H2 = (H1 + 2P - F) / S + 1        (3-2)
Where,
𝑊2 ,𝐻2 = Width and height of output
𝑊1 ,𝐻1 = Width and height of input
S = Stride
P = Amount of zero padding
F = Filter size
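As a small worked example, formulas 3-1 and 3-2 can be computed directly:

```python
def conv_output_size(w_in, f, p, s):
    """Output width/height after convolution: (W1 + 2P - F) / S + 1."""
    return (w_in + 2 * p - f) // s + 1

# A 224-pixel-wide input with a 3x3 filter, 'same' padding (P = 1), and
# stride 1 keeps its size; with no padding and stride 2 it shrinks.
print(conv_output_size(224, f=3, p=1, s=1))  # 224
print(conv_output_size(224, f=3, p=0, s=2))  # 111
```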
3.7.3 Kernel
A kernel is a 2D matrix that uses the convolution technique to blur, sharpen, and carry
out other image processing operations. The convolution method is shown in the figure
below, where an image matrix and a kernel are multiplied element by element and
summed at the end to get an output.
Figure 3-7: Working of Kernel
3.7.4 Filter
A filter is a small matrix of numbers that is used in a convolution operation. A CNN
uses multiple kernels stacked together; thus, a filter refers to this 3D structure of stacked
kernels. When operating in 2D, the filter size is the same as the kernel size. It is used to
detect patterns or features in data, such as edges or lines in an image.
3.7.5 Stride
Stride is a vital hyperparameter that determines the size of output produced by the
convolutional layer and plays a key role in determining the spatial resolution of the
output. The stride is the number of pixels that the convolutional kernel moves when it
slides over the input image.
3.7.6 Padding
Padding refers to adding extra elements to a list or array so that it has a fixed size. This
is often done when working with neural networks, as they typically expect input data
to have a specific size. Padding is the process of adding extra pixels around the edges of
an image, which typically helps to prevent shrinking by adding layers of zeros to input
images. This is done to achieve more accurate results.
Pooling layers are layers added after the convolution layer in a CNN that perform
downsampling of an image. They are building blocks of a CNN, and they are used to
reduce the dimensions of feature maps. As they reduce the dimensions of feature maps,
they reduce the number of parameters to be learned and the number of computations to
be performed. Thus, pooling layers come in very handy in a CNN, as they reduce
overfitting of the model.
Max pooling is one type of pooling operation, which selects the maximum element
from the region of the feature map covered by the filter. Thus, the output after a max
pooling layer is a feature map containing the most prominent features of the
previous feature map.
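A minimal numpy sketch of 2×2 max pooling on a small feature map:

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h, w = x.shape
    out = np.zeros((h // stride, w // stride))
    for i in range(0, h - size + 1, stride):
        for j in range(0, w - size + 1, stride):
            out[i // stride, j // stride] = x[i:i + size, j:j + size].max()
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [0, 1, 3, 4]])
print(max_pool2d(fmap))
# [[6. 4.]
#  [7. 9.]]
```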
This Normalization step brings the activations closer to zero mean and unit variance,
which helps to stabilize and speed up training.
When transitioning from the convolution layer to the fully connected layer, the flatten
layer is often used to reduce the multidimensional input to one dimension. The pooled
feature map is transformed into a single column and delivered to the fully connected
layer using the flatten function. Flatten layer is an important and useful component of
a CNN that helps to simplify the process of connecting the convolutional layers to the
fully-connected layers, improve computational efficiency, and improve network
performance.
3.7.11 Dropout
3.8.1 ReLU
ReLU (Rectified Linear Unit) is one of the most popular activation functions. It replaces
all negative values with zero, allowing only positive values to pass through. ReLU helps
with the vanishing gradient problem and has contributed to the success of deep learning
models.
It has been used in convolution layers to introduce non-linearity into the network,
allowing CNNs to learn and model complex relationships in the data. Non-linear
activation functions are crucial for capturing intricate patterns and enabling the network
to approximate arbitrary functions.
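ReLU can be written in one line with numpy:

```python
import numpy as np

def relu(x):
    """Replace all negative values with zero; positive values pass through."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```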
3.8.2 SoftMax
The SoftMax function is a popular activation function used in the output layer of a
neural network for multi-class classification problems. It transforms the raw output
scores of a network into a probability distribution over multiple classes. The SoftMax
function takes a vector of real numbers as input and produces another vector of the same
dimension, where each element represents the probability of the corresponding class.
Mathematically,

softmax(z)_i = e^(z_i) / Σ_j e^(z_j)        (3-4)
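A numpy sketch of formula 3-4 (subtracting the maximum score before exponentiating is a standard trick for numerical stability):

```python
import numpy as np

def softmax(z):
    """Turn raw scores into a probability distribution over classes."""
    e = np.exp(z - np.max(z))   # shift by max(z) for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(int(probs.argmax()), round(float(probs.sum()), 6))  # 0 1.0
```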
In machine learning, Adam (Adaptive Moment Estimation) is one of the most popular
and powerful optimization algorithms; it combines the benefits of stochastic gradient
descent (SGD) with momentum and adaptivity, and requires fewer hyperparameters
to be tuned. It is an extension of SGD and is well-suited for training deep learning
models.
Adam combines the benefits of SGD with momentum and the ability to adapt the
learning rates of each parameter. It maintains an exponentially decaying average of past
gradients and an exponentially decaying average of past squared gradients. These
moving averages are used to estimate the first and second moments of the gradients,
respectively. The learning rate of each parameter is then adjusted based on these
estimates, which helps the optimization process converge more quickly.
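The update described above can be sketched as a single numpy step. The defaults shown are the commonly used hyperparameters (learning rate 0.001, β1 = 0.9, β2 = 0.999); the toy minimization below uses a larger rate purely for illustration.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: decaying averages of the gradient (m) and its
    square (v) are bias-corrected and used to scale the step size."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
print(abs(x) < 0.1)  # True
```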
The loss function used in multi-class classification is called categorical cross entropy.
Categorical cross entropy is a loss function used in classification tasks, specifically
when the classes are mutually exclusive. It is also called SoftMax loss, as it is the
fusion of the SoftMax activation and cross-entropy loss. It is used to measure the
difference between the true class label distribution and the predicted class label
distribution. The target is identified using a one-hot vector having 1 in a single
position and 0s everywhere else. The values predicted by the model are passed to a
SoftMax layer that converts them into a probability distribution. Then the actual values
and the predicted probabilities are passed to the cross-entropy loss, which gives the
overall loss of the model.
Mathematically,

CCE Loss = -(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} 1(y_i = k) * log(ŷ_{i,k})        (3-5)

Where,
N = Total number of data points
K = Total number of classes
y_i = Actual class the data point belongs to
ŷ_{i,k} = Predicted probability of class k for data point i
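Formula 3-5 in numpy, evaluated on a hypothetical batch of two data points with three classes:

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    """Average over data points of -log(probability assigned to the true class)."""
    y_pred_probs = np.clip(y_pred_probs, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred_probs), axis=1))

y_true = np.array([[1, 0, 0],        # one-hot targets
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],  # predicted probability distributions
                   [0.1, 0.8, 0.1]])
loss = categorical_cross_entropy(y_true, y_pred)
print(round(float(loss), 4))  # 0.2899
```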
The web application is built using Django and Tailwind CSS. It mainly consists of 3
pages. The home page serves as the welcoming gateway, introducing the application's
purpose. The prediction page is where the user provides an input image of a bird to
predict its species. The about page provides details about how the project was made
and how many bird species the project can classify.
Accuracy is the measure of the model's ability to correctly predict the classes of the
input data. From the graph, at the end of the 116th epoch the training accuracy is 0.8823
and the validation accuracy is 0.8955. The roughly 1.32-percentage-point difference
between training accuracy and validation accuracy shows a very minor level of
overfitting. So, the model is generalizing well to unseen data.
Loss represents the measure of how well the model is performing on the given set of
the data. The loss is the numerical value that indicates the difference between the
predicted output and the actual target values. The main objective of training the model
is to make the loss as low as possible. During the training process, the trend of the
training and validation loss over time must be monitored to see if it is consistently
increasing or decreasing. It is important to ensure that the model is not overfitting to
the training data. Inconsistencies in the validation loss and training loss can indicate
overfitting. Therefore, it is important to monitor both the training and validation loss
throughout the training process and use techniques such as regularization or early
stopping to prevent overfitting.
Both training and validation loss can be seen decreasing as the model trains for a
sufficient number of epochs. The training loss at epoch 116 is 0.4789, whereas the
validation loss is 0.4406.
The model's performance is evaluated on the testing dataset, which is split from the
total dataset. The testing dataset contains 5% of each species from the total dataset,
which the model has not been trained on before. Evaluation of test metrics is more important than
training metrics, as the test metrics indicate the model's ability to generalize on unseen
data from the real world. The model performed very well on the test dataset, achieving
an accuracy of 0.9080. The weighted F1 score, which considers precision and recall for
all classes, was 0.9084. This indicates that the model has a good balance between
precision and recall for all classes and is effective at classifying bird species.
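These metrics can be computed with scikit-learn (listed in section 3.3.5); the true and predicted class indices below are hypothetical toy values, not the project's actual test results:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted class indices for four test images.
y_true = [0, 1, 1, 2]
y_pred = [0, 1, 2, 2]

acc = accuracy_score(y_true, y_pred)
w_f1 = f1_score(y_true, y_pred, average="weighted")  # support-weighted F1
print(acc, round(float(w_f1), 4))  # 0.75 0.75
```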
Some limitations and future enhancements that could be applied to alleviate them are
discussed below:
➢ Limited dataset: The dataset contains only 8,475 images of 38 bird species in total.
Some of the bird species contain a very small number of images for training the
model. Due to this, it is possible that the model might not perform well with
diverse data from the real world. Although the model managed to achieve good
accuracy on the test dataset, the set was not a complete representation of real-
world scenarios. So, expansion of the dataset to make it as diverse as possible is
necessary to make the system capable of classifying every bird species
accurately.
➢ Addition of real-time classification of bird species: This project is capable of
classifying a bird species after the user provides an image of the bird through
the web application. It could be integrated with an IoT device for real-time
classification of bird species.
➢ Limited class classification: This project is capable of classifying only 38
endangered bird species of Nepal. It could be trained with a larger number of
bird species to increase the scope of the project.
➢ Ensemble models: This project is based on a single model trained to predict bird
species. The use of ensemble models, combining predictions from multiple
CNN architectures, often leads to improved accuracy and robustness.
CONCLUSION
Despite the limitations that exist in our project work, this project can be regarded as
an achievement and an opportunity to explore the application of a CNN model in the
classification of endangered bird species of Nepal. The main objective of the project
was to develop a classification system that can classify endangered bird species
accurately, which we have succeeded in doing.
References
[4] J. Niemi and J. T. Tanttu, "Deep Learning Case Study for Automatic Bird
Identification," Applied Sciences, vol. 8, no. 11, 2018.