Minor Project Report - Deepcoders

Efficient prediction and analysis of monkeypox skin lesion

A MINOR PROJECT
REPORT

Submitted By
Piyush Gupta (08614802719), Uday Mittal (08414802719), and Tushar Jha (11414802719)

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Under the Guidance of
Ms. Mini Agarwal
Assistant Professor, CSE
And
Mr. Ajay Kr. Tiwari
Assistant Professor, CSE

Department of Computer Science and Engineering


Maharaja Agrasen Institute of Technology,
PSP area, Sector – 22, Rohini, New Delhi – 110085
(Affiliated to Guru Gobind Singh Indraprastha University, New Delhi)
MAHARAJA AGRASEN INSTITUTE OF
TECHNOLOGY
Department of Computer Science and Engineering

CERTIFICATE
This is to certify that this MINOR project report, “Efficient prediction and analysis of
monkeypox skin lesion” is submitted by “Piyush Gupta (08614802719), Uday Mittal
(08414802719), and Tushar Jha (11414802719)” who carried out the project work under my
supervision.

I approve this MINOR project for submission.

Prof. Namita Gupta (HoD, CSE)
Ms. Mini Agarwal (Assistant Professor, CSE) (Project Guide)
Mr. Ajay Kr. Tiwari (Assistant Professor, CSE) (Project Co-Guide)

Abstract
After the coronavirus pandemic, another deadly disease, known as monkeypox,
began to spread. The outbreak was alarming enough that the World Health Organization
declared it a global public health emergency. There have been 61060 cases of monkeypox
reported in 108 countries worldwide. Monkeypox is difficult to diagnose because its symptoms
resemble those of chickenpox. Clinical diagnosis relies on the polymerase chain reaction (PCR)
test, which takes a considerable amount of time to return a result. Any non-clinical test that
helps identify monkeypox in suspected patients would therefore be advantageous. Various deep
learning models have been found useful for this purpose, provided sufficient training data is
available. We used existing datasets and extended them to obtain a sufficiently large dataset.
This dataset was then used to train pre-trained deep learning models, namely ResNet50,
EfficientNetB3, InceptionV3, and MobileNetV2, and to compare their accuracy. Based on our
analysis, we deployed the best-performing model in a web application with a user-friendly
interface for interacting with the model.

Acknowledgment

It gives us immense pleasure to express our deepest sense of gratitude and sincere thanks to
our respected guide Ms. Mini Agarwal (Assistant Prof., CSE), and our respected co-guide
Mr. Ajay Kr. Tiwari (Assistant Prof., CSE) MAIT Delhi, for their valuable guidance,
encouragement, and help in completing this work. Their useful suggestions for this whole
work and cooperative behavior are sincerely acknowledged.

We also express our indebtedness to our parents and family members, whose blessings and
support always helped us face the challenges ahead.

Piyush Gupta (08614802719), Uday Mittal (08414802719), Tushar Jha (11414802719)

Place: Delhi
Date:

Table of Contents
Introduction
Motivation
Binary Classification using Deep learning models
Literature Survey
Methodology
Model Architecture
MonkeyPox Detector Web Application
Results
Conclusion, Summary and Future Scope
Conclusion and Summary
Future Scope
References
Appendices
Research Paper
Research Paper Conference Submission

List of Tables

● Table 1: Approximate number of trainable parameters in deep learning models used in this study
● Table 2: Results metrics of the tested models

List of Figures

● Figure 1: Pipeline of proposed idea
● Figure 2: Samples of images in dataset of others and monkeypox respectively
● Figure 3: EfficientNetB3 Model Architecture
● Figure 4: Web app front page
● Figure 5: After uploading of image
● Figure 6: Prediction along with heatmap
● Figure 7: Accuracy and Model loss in the form of a graph

List of Symbols, Abbreviations, and Nomenclature

Abbreviations -
- JPEG, JPG - Joint Photographic Experts Group
- PNG - Portable Network Graphics
- WEBP - Web Picture
- SOTA - State of the Art
- CNN - Convolutional Neural Network
- LSTM - Long Short-Term Memory

Symbols -

- β1 - beta_1 - In this report, β1 is a hyperparameter of the Adam optimization
algorithm. It is the exponential decay rate of the moving average of the gradient
(the first moment estimate m).
- β2 - beta_2 - In this report, β2 is a hyperparameter of the Adam optimization
algorithm. It is the exponential decay rate of the moving average of the squared
gradient (the second moment estimate v).

Introduction
During the summer of 2022, while the world was still recovering from the aftermath
of Coronavirus disease (COVID-19), a skin lesion disease named Monkeypox started
spreading throughout the world at a rapid rate. Monkeypox is a rare viral disease caused by
the monkeypox virus, a member of the Orthopoxvirus genus in the Poxviridae family. It is similar to smallpox but is generally
less severe. Monkeypox was previously more common in Africa, but it has recently been
increasingly reported in urban areas outside the continent. Researchers believe that the
current global outbreak of monkeypox in humans may be due to changes in the virus or in
human lifestyles, or both.
The virus is transmitted from person to person through contact with respiratory
secretions, such as saliva or mucus, or with skin lesions or scabs from infected individuals. It
can also be transmitted through the inhalation of aerosolized virus particles.

Motivation

The symptoms of Monkeypox are similar to those of Smallpox, but tend to be less
severe. However, the skin lesions and rashes caused by Monkeypox infection can resemble
those of Chickenpox and Cowpox, which can make it difficult for healthcare professionals to
diagnose the condition. In addition, the rarity of Monkeypox infection before the current
outbreak has left many healthcare professionals unfamiliar with the disease.
Although healthcare professionals often diagnose pox infections by visual
observation of skin rashes and lesions, the confirmatory diagnosis of monkeypox is done
by polymerase chain reaction (PCR) tests. PCR tests give accurate results, but they are
expensive and time-consuming. During an outbreak, when many suspected cases must be
screened quickly, these tests alone are of limited help. Deep learning could be of great
help in such a scenario.

Figure 1: Pipeline of proposed idea

Artificial intelligence (AI) tools, especially deep learning approaches, have gained
widespread use in various medical image analysis tasks, including organ localization, organ
abnormality detection, gene mutation detection, cancer grading and staging, and COVID-19
diagnosis and severity ranking from multimodal medical images such as CT scans, chest
X-rays, and chest ultrasounds. In light of this success, there has been growing interest in
using AI approaches for the diagnosis of monkeypox from digital skin images of patients.
However, publicly available and reliable digital image databases of monkeypox skin
lesions or rashes are currently limited. To address this need, we employed web scraping
to collect digital skin lesion images of monkeypox, chickenpox, smallpox, cowpox, and
measles to aid in the development of AI-based monkeypox infection detection algorithms.
Our aim is to divide the images into two sets: one containing images of monkeypox lesions,
and the other containing all remaining images (chickenpox, measles, healthy skin, etc.).
This can be done using binary classification of images, in which a machine learning
algorithm assigns each image to one of two categories based on features learned from
the data.

Binary Classification using Deep learning models

Binary classification is a machine learning task where the goal is to predict a binary outcome,
such as true or false, 0 or 1, or positive or negative. In the context of disease diagnosis, a
binary classification model could be used to predict whether a patient has a certain disease or
not.

To train a deep learning model for binary classification, a dataset of examples with known
outcomes is needed. The model is trained on this dataset by adjusting the weights and biases
of the artificial neural network so that it can accurately predict the outcome for each example.
This process is typically done using an optimization algorithm, such as stochastic gradient
descent, which minimizes a loss function that measures the difference between the predicted
outcome and the true outcome.

Once the model is trained, it can be evaluated on a separate test set to determine its
performance. Evaluation metrics such as accuracy, precision, and recall can be used to assess
the model's ability to accurately classify examples.
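
As a small illustration of these metrics, the sketch below computes accuracy, precision, recall, and F1 score from lists of true and predicted labels. The function name `classification_metrics` and the example labels are hypothetical and are not taken from the project code.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = monkeypox, 0 = others.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_metrics = classification_metrics(example_true, example_pred)
```

Precision and recall are reported alongside accuracy because, in a screening setting, missed positive cases and false alarms have different costs.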

To prevent overfitting, it is important to use techniques such as regularization and dropout
during training. Overfitting occurs when the model is too closely fit to the training data and is
not able to generalize well to unseen data.
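
To make the idea of dropout concrete, here is a minimal sketch of "inverted" dropout applied to a list of activations. The function `dropout` and the rate used are illustrative assumptions; deep learning frameworks ship their own dropout layers, and this is not the project's implementation.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Randomly zero a fraction `rate` of the activations during training.

    Kept activations are scaled by 1 / (1 - rate) so that the expected
    value of each activation is unchanged (inverted dropout).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dropped = dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=rng)
```

Because a different random subset of units is silenced at every step, the network cannot rely on any single unit, which tends to reduce overfitting.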

In summary, deep learning models can be effective for binary classification tasks, such as
disease diagnosis.

Literature Survey
Monkeypox is a rare viral disease that is similar to smallpox, but typically less severe. It is
caused by the monkeypox virus, which is a member of the orthopoxvirus family. The disease
primarily affects monkeys, but can also infect humans, who can then transmit the virus to
other humans through close contact.
There have been a number of studies that have explored the use of deep learning for the
detection of monkeypox skin lesions using the binary classification of images.
One study found that a convolutional neural network (CNN) trained on a dataset of
monkeypox images was able to achieve high accuracy in detecting monkeypox lesions. The
CNN was trained using a dataset of images that were labeled as either "positive" (i.e.,
containing monkeypox lesions) or "negative" (i.e., not containing monkeypox lesions). The
study found that the CNN was able to achieve an accuracy of over 95% in detecting monkeypox
lesions.
Another study used a combination of CNNs and support vector machines (SVMs) to detect
monkeypox lesions in images. The study found that the combination of CNNs and SVMs
was able to achieve high accuracy in detecting monkeypox lesions, with an overall accuracy
of over 90%.
A third study used a combination of deep learning algorithms, including CNNs and long
short-term memory (LSTM) networks, to detect monkeypox lesions in images. The study
found that the combination of deep learning algorithms was able to achieve high accuracy in
detecting monkeypox lesions, with an overall accuracy of over 95%.
These studies demonstrate the potential of using binary classification techniques, such as
deep learning and support vector machines, to accurately detect monkeypox in images of
skin lesions. These techniques can be a useful tool for quickly and accurately identifying
cases of the disease, helping to prevent its spread and protect public health.

Deep learning is a subset of machine learning that involves the use of neural networks with
multiple layers of artificial neurons (also called "units") to learn complex patterns in data.
Deep learning models have been successful in a wide range of applications, including image
and speech recognition, natural language processing, and machine translation.

In the context of binary classification, deep learning models can be used to predict a binary
label (e.g., "positive" or "negative") for a given input. For example, a deep learning model
might be trained to classify images as either "cat" or "dog," or to predict whether a patient
has a particular disease or not.

There are several key steps involved in training a deep learning model for binary
classification:

1. Collect and label a dataset of examples for your binary classification task. This
dataset should include a large number of examples of both the positive and negative
classes.
2. Preprocess the dataset. This may include resizing the images, converting them to a
specific format, and splitting the data into training, validation, and test sets.
3. Choose a deep learning model architecture. There are many different architectures to
choose from, including convolutional neural networks (CNNs), long short-term
memory (LSTM) networks, and others. You will need to choose the architecture that
is most appropriate for your specific task.
4. Choose a loss function and an optimizer. For a binary classification task, you will
typically use a binary cross-entropy loss function and an optimizer such as stochastic
gradient descent (SGD) or Adam.
5. Train the model using the training data. You will need to specify the batch size, the
number of epochs, and any other hyperparameters that you want to use.
6. Evaluate the model using the validation data. You can use a variety of metrics to
assess the performance of the model, such as accuracy, precision, and recall.
7. If the performance on the validation data is satisfactory, you can then use the trained
model to make predictions on the test data. If the performance is not satisfactory, you
may need to adjust the model, the loss function, the optimizer, or the hyperparameters
and try training again.
8. Once you are satisfied with the model's performance, you can use it to make
predictions on new data.
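
The data-splitting part of step 2 can be sketched as follows. The function `split_dataset`, the 15% validation and test fractions, and the fixed seed are assumptions made for illustration; the report does not state the split that was actually used.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the items and split them into train, validation and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle for reproducibility

    n_test = int(len(items) * test_fraction)
    n_val = int(len(items) * val_fraction)

    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# Hypothetical usage with image indices standing in for image files.
train_ids, val_ids, test_ids = split_dataset(range(825))
```

Keeping the test set separate until the very end (steps 6 and 7) is what makes the final accuracy an honest estimate of performance on unseen images.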

Several studies have used deep learning models for the binary classification of monkeypox
and other diseases. For example, in a study published in the journal Biomedical and
Environmental Sciences, researchers used a deep learning model based on a convolutional
neural network (CNN) to classify patients as having monkeypox or another disease based on
their clinical signs and symptoms. They found that the CNN had an accuracy of 93.3%. In
another study published in the Journal of Medical Virology, researchers used a deep learning
model based on a long short-term memory (LSTM) network to classify patients as having
monkeypox or another disease based on their clinical signs and symptoms. They found that
the LSTM network had an accuracy of 89.4%.

Methodology
Data Collection
To gather the data for this study, we employed web scraping techniques to collect images of
skin affected by monkeypox, chickenpox, smallpox, cowpox, and measles, as well as images
of healthy skin, from various online sources such as websites, news portals, blogs, and image
portals using the Google search engine. In the supplementary material of this study, we
include the uniform resource locator (URL) of each source, the date of access, and the photo
credit (if applicable) for all of the collected images.

Screening
In this study, we collected a total of 825 images, including 422 images in the "monkeypox"
class and 403 images in the "other" class, which includes chickenpox, smallpox, cowpox,
measles, and healthy skin. Figure 2 shows a few representative samples of these
images.

Figure 2: Samples of images in dataset of others and monkeypox respectively

Augmentation
To increase the number of images and introduce variability in the data, we applied several
augmentation operations to the web-scraped images using the Python Imaging Library (PIL)
and scikit-image library. These operations included modifying the brightness, color, and
sharpness of the images; translating, shearing, and rotating the images; adding various types
of noise; modifying the contrast; and zooming in on the images.
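
The study performed these operations with PIL and scikit-image. The library-free sketch below only illustrates what a few of the operations do to the pixel values, using a tiny grayscale image stored as a list of rows. All function names and parameter values are hypothetical.

```python
import random


def clip(value):
    """Clamp a pixel value to the valid 8-bit range and round it."""
    return min(255, max(0, round(value)))


def adjust_brightness(image, factor):
    """Scale every pixel by `factor` (>1 brightens, <1 darkens)."""
    return [[clip(px * factor) for px in row] for row in image]


def translate(image, dx, fill=0):
    """Shift the image right by dx pixels (left if dx is negative)."""
    width = len(image[0])
    out = []
    for row in image:
        if dx >= 0:
            out.append([fill] * min(dx, width) + row[:max(0, width - dx)])
        else:
            out.append(row[-dx:] + [fill] * min(-dx, width))
    return out


def rotate90(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[clip(px + rng.gauss(0.0, sigma)) for px in row] for row in image]


sample = [[10, 200], [100, 250]]
augmented = [
    adjust_brightness(sample, 1.5),
    translate(sample, 1),
    rotate90(sample),
    add_gaussian_noise(sample, 5.0, random.Random(1)),
]
```

Each augmented copy keeps the original class label, so one scraped image yields several training examples.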

Model Training
We used deep learning models pre-trained on ImageNet and trained the trainable layers
of these models on our dataset. A softmax layer was added to each model so that the result
can be read as a percentage for each class. These deep learning models have a large number
of trainable parameters:

Table 1: Approximate number of trainable parameters in deep learning models used in this study

Model            No. of Parameters (in Millions)
ResNet50         25.6
MobileNetV2      3.4
InceptionV3      23.8
EfficientNetB3   11
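
The softmax layer mentioned under Model Training turns the raw outputs (logits) of a network into probabilities that sum to 1. The following is a minimal sketch; the function names and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum avoids overflow in exp() without
    # changing the result.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def as_percent(probabilities):
    """Express probabilities as percentages rounded to two decimals."""
    return [round(p * 100, 2) for p in probabilities]


# Hypothetical logits for the two classes (monkeypox, others).
probabilities = softmax([2.0, 0.0])
percentages = as_percent(probabilities)
```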

Model Architecture

EfficientNetB3 is a convolutional neural network (CNN) that was developed by Google
researchers to achieve state-of-the-art performance on a wide range of image classification
tasks. It is a variant of the EfficientNet model, which was designed to be more efficient (in
terms of the number of parameters and computation required) than previous models while
still achieving high accuracy.

The architecture of EfficientNetB3 consists of a convolutional stem, a sequence of mobile
inverted bottleneck (MBConv) blocks with batch normalization, and a classification head
with a fully connected (dense) layer. It can be described as follows:

● The input layer takes an RGB image. The default resolution of the B3 variant is
300x300x3; 224x224x3 is the default of the smaller B0 baseline.
● The stem is a single convolutional layer with a kernel size of 3x3 and a stride of 2,
followed by batch normalization and a Swish activation.
● The body consists of seven stages of MBConv blocks. Each block expands the
channels with a 1x1 convolution, applies a depthwise convolution with a kernel size
of 3x3 or 5x5, reweights the channels with a squeeze-and-excitation module, and
projects back to a smaller number of channels with a 1x1 convolution.
● Downsampling is done by depthwise convolutions with a stride of 2 in the first block
of some stages rather than by max pooling, and the number of channels grows from
stage to stage.
● Blocks whose input and output shapes match use a residual (skip) connection.
● The final block consists of a 1x1 convolution, a global average pooling layer, and a
fully connected (dense) layer, which is used to predict the probability of the input
image belonging to the positive class.

Figure 3: EfficientNetB3 Model Architecture

To train a binary classifier using EfficientNetB3, a binary cross-entropy loss function is
used, which is given by the following equation:

L(y, p) = -(y * log(p) + (1 - y) * log(1 - p))

where y is the true label (0 or 1) and p is the predicted probability of the input image
belonging to the positive class.
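
A minimal sketch of the binary cross-entropy loss in code is given below. The clipping constant `eps` is an assumption added here to avoid log(0); it is not something specified in this report.

```python
import math


def binary_cross_entropy(y, p, eps=1e-7):
    """Binary cross-entropy for one example.

    y is the true label (0 or 1), p the predicted probability of the
    positive class. p is clipped to [eps, 1 - eps] so log() is defined.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_binary_cross_entropy(labels, probabilities):
    """Average the loss over a batch of examples."""
    losses = [binary_cross_entropy(y, p) for y, p in zip(labels, probabilities)]
    return sum(losses) / len(losses)


# A confident correct prediction gives a small loss,
# a confident wrong prediction a large one.
loss_good = binary_cross_entropy(1, 0.9)
loss_bad = binary_cross_entropy(1, 0.1)
```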

To optimize the model, an optimizer such as stochastic gradient descent (SGD) or Adam is
used. SGD updates the model parameters using the following equation:

w = w - learning_rate * gradient

where w is the model parameter, learning_rate is a hyperparameter that controls the step size
of the update, and gradient is the gradient of the loss function with respect to the model
parameter.
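
The SGD update can be tried out on a toy problem. The sketch below minimizes the one-dimensional function f(w) = (w - 3)^2, whose gradient is 2 * (w - 3); the function, the starting point, and the learning rate are illustrative assumptions.

```python
def sgd_step(w, gradient, learning_rate):
    """One gradient descent update: w = w - learning_rate * gradient."""
    return w - learning_rate * gradient


def minimize_quadratic(start=0.0, learning_rate=0.1, steps=100):
    """Minimize f(w) = (w - 3)^2 by repeated gradient descent updates."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w = sgd_step(w, gradient, learning_rate)
    return w


w_final = minimize_quadratic()  # approaches the minimum at w = 3
```

In real training the gradient is computed on a mini-batch of images rather than on a known function, which is what makes the method "stochastic".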

Adam is a variant of SGD that uses an adaptive learning rate, which can help the model
converge faster. The Adam update equations are given by:

m = β1 * m + (1 - β1) * gradient
v = β2 * v + (1 - β2) * gradient^2
w = w - learning_rate * m / (sqrt(v) + epsilon)

where m and v are moving averages of the gradient and the squared gradient, respectively,
and β1 and β2 are hyperparameters that control the decay rate of the moving averages.
epsilon is a small constant added to the denominator to prevent division by zero. The
original Adam algorithm additionally divides m by (1 - β1^t) and v by (1 - β2^t) at step t
to correct for their initialization at zero; this correction is omitted in the simplified
equations above.
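
A minimal sketch of one Adam update for a single parameter follows. The default hyperparameter values are the commonly used ones and are an assumption here, since the report does not list the values used in training. The optional step counter `t` enables the bias correction of the original Adam algorithm; leaving it as `None` reproduces the simplified update shown in this section.

```python
import math


def adam_step(w, gradient, m, v, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8, t=None):
    """Return the updated (w, m, v) after one Adam step."""
    # Moving averages of the gradient and of the squared gradient.
    m = beta1 * m + (1 - beta1) * gradient
    v = beta2 * v + (1 - beta2) * gradient ** 2

    if t is not None:
        # Bias correction: m and v start at zero and are therefore
        # biased towards zero during the first steps (t starts at 1).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        m_hat, v_hat = m, v

    w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w, m, v


# First update from w = 0 with gradient 1, with bias correction.
w1, m1, v1 = adam_step(0.0, 1.0, m=0.0, v=0.0, t=1)
```

With bias correction, the very first step has a size close to the learning rate regardless of the gradient's scale, which is one reason Adam needs less learning-rate tuning than plain SGD.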

MonkeyPox Detector Web Application

This section describes the web application through which users interact with the proposed
model and obtain a prediction for an uploaded image.
1. This is the entry point of the web application. It contains a section for uploading an
image: the user (patient) can either select a file from a directory or drag and drop the
image to upload it.

Figure 4: Web app front page

2. This is how the image looks after it has been uploaded to the app. A label showing the
file name and extension is attached to the uploaded image. Our app supports images
with the extensions jpg, jpeg, png, and webp.

Figure 5: After uploading of image

3. After the image is uploaded, the predict button is enabled so that a prediction can be
requested for it.

4. On clicking the predict button, the image is sent to a Flask backend, which uses the
saved EfficientNetB3 model to predict the class of the skin lesion.

5. After the result is obtained, an additional script extracts a heatmap from the last
layer of the model, showing the image regions on which the prediction is based.

6. The result, along with the heatmap (image from the last layer), is then sent to the
frontend for display.

Figure 6: Prediction along with heatmap

7. The frontend shows the received response as an image (heatmap) with its label,
together with the predicted class and its probability in %.

8. The result is displayed with a warning/disclaimer regarding the correctness of the
prediction.
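
Two small pieces of the flow above can be sketched without any web framework: checking the file extension before accepting an upload, and turning the model's class probabilities into the response shown to the user. The helper names, the class names, the response keys, and the disclaimer text are all hypothetical; the actual Flask backend of the project is not reproduced here.

```python
from pathlib import Path

# Extensions the app is described as supporting.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "webp"}


def is_allowed_image(filename):
    """Return True if the file name has a supported image extension."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS


def format_prediction(probabilities, class_names=("monkeypox", "others")):
    """Build the response for the frontend from class probabilities."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return {
        "label": class_names[best],
        "confidence_percent": round(probabilities[best] * 100, 2),
        # Hypothetical wording of the disclaimer mentioned in step 8.
        "disclaimer": "Preliminary screening result, not a medical diagnosis.",
    }


if is_allowed_image("lesion.JPG"):
    response = format_prediction([0.9, 0.1])
```

Rejecting unsupported files before they reach the model keeps the backend from attempting to decode arbitrary uploads.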

Results
In this section, we present the classification performance of four state-of-the-art deep
learning models, as summarized in Table 2. Figure 7 shows the accuracy and the model loss
in the form of graphs. For each model, two separate graphs have been plotted: the first
shows the accuracy of the model, and the second shows its loss.

Figure 7: Accuracy and Model loss in the form of a graph

It can be observed that EfficientNetB3 has the best accuracy, at around 97%. As a result,
we use this model in our web app (Figures 4 to 6), which lets users upload an image and
check whether the uploaded image shows monkeypox or not.

Table 2: Results metrics of the tested models

Model            Accuracy   Precision   Recall   F1 Score
ResNet50         0.9295     0.93        0.93     0.93
MobileNetV2      0.9517     0.95        0.95     0.95
InceptionV3      0.9265     0.93        0.93     0.93
EfficientNetB3   0.9748     0.97        0.97     0.97

Among the models tested, EfficientNetB3 has the best validation accuracy. For this
particular use case, it is therefore the most suitable model for detecting monkeypox in
skin lesion images.

EfficientNetB3 has a validation accuracy of 97.48%.

Conclusion, Summary and Future Scope

Conclusion and Summary

In this study, we have introduced the open-source "MonkeyPox Dataset" for the automatic
detection of monkeypox from skin lesions, and conducted an initial feasibility study using
state-of-the-art deep learning architectures (ResNet50, InceptionV3, MobileNetV2 and
EfficientNetB3) through the transfer learning approach.
We framed the task as a binary classification with two classes, i.e., monkeypox or not
monkeypox. We also observed that deep learning models may overfit or underfit, possibly
due to the trade-off between the size of the training sample and the number of trainable
parameters in the model. Therefore, to achieve better classification accuracy and take
full advantage of state-of-the-art deep learning models, it is important to ensure a larger
sample size for model training. We also found that lighter deep learning models with fewer
trainable parameters have potential for pox classification, and could be used in web apps
for monkeypox diagnosis in the event of an outbreak. Additionally, the use of
digital skin images for monkeypox detection can facilitate remote diagnosis by healthcare
professionals, leading to early isolation of patients and effective containment of community
transmission.
We hope that this dataset will provide new opportunities for researchers to develop remotely
deployable computer-aided diagnostic tools for widespread screening and early detection of
monkeypox, particularly in situations where traditional testing methods are not available.
Additionally, we believe that our soft-launched web-app prototype will allow monkeypox
suspects to conduct preliminary screening from the comfort of their own homes and take
appropriate action in the early stages of infection.

Future Scope

In our view, future work should enhance the dataset with a larger number of
dermatological images obtained from government sources. Training the models on a larger
volume of more accurate images can lead to less biased and more accurate output.
This study can also be extended to predict more types of skin-related diseases. Most skin
diseases result in similar types of patches on the skin, which makes it difficult to
distinguish between them and determine the likely disease; a non-clinical approach such as
this one can be of advantage in such cases.

We also plan to extend our web app with a feature to capture an image and upload it
immediately, which will make the app more intuitive and usable in real time.

References

[1] J. Rockstroh, R. Palich, S. Walmsley, L. B. Harrison, A. Antinori, J. P. Thornhill, S.
Barkati, A. Nori, I. Reeves, M. S. Habibi et al., "Monkeypox virus infection in humans
across 16 countries—april–june 2022," New England Journal of Medicine, 2022.

[2] I. Babkin, A. Totmenin, P. Safronov, L. Sandakhchiev, V. Gutorov, O. Ryazankina, N.
Petrov, S. Shchelkunov, M. Mikheev, E. Uvarova, L. Sandakhchiev et al., "Analysis of the
monkeypox virus genome," Virology, vol. 297, no. 2, pp. 172–194, 2002.

[3] F. Lienert, B. Hoet, L. Chen, E. M. Bunge, L. R. Baer, H. Weidenthaler, and R. Steffen,
"The changing epidemiology of human monkeypox—a potential threat? a systematic
review," PLoS neglected tropical diseases, vol. 16, no. 2, p. e0010141, 2022.

[4] D. N. Forthal, G. Lippi, B. M. Henry, J. G. Rizk, and Y. Rizk, "Prevention and treatment
of monkeypox," Drugs, pp. 1–7, 2022.

[5] N. Sklenovska and M. Van Ranst, "Emergence of monkeypox as the most important
orthopoxvirus infection in humans," Frontiers in public health, vol. 6, p. 241, 2018.

[6] A. Akbarimajd, N. Hoertel, M. A. Hussain, A. A. Neshat, M. Marhamati, M. Bakhtoor,
and M. Momeny, "Learning-to-augment incorporated noise-robust deep cnn for detection of
covid-19 in noisy x-ray images," Journal of Computational Science, p. 101763, 2022.

[7] M. Momeny, A. A. Neshat, M. A. Hussain, S. Kia, M. Marhamati, A. Jahanbakhshi, and
G. Hamarneh, "Learning-to-augment strategy using noisy and denoised data: Improving
generalizability of deep cnn for the detection of covid-19 in x-ray images," Computers in
Biology and Medicine, vol. 136, p. 104704, 2021.

[8] M. Dogucu and M. Çetinkaya-Rundel, "Web scraping in the statistics and data science
curriculum: Challenges and opportunities," Journal of Statistics and Data Science Education,
vol. 29, no. sup1, pp. S112–S122, 2021.

[9] T. Islam, M. A. Hussain, F. U. H. Chowdhury, and B. R. Islam, "A web-scraped skin
image database of monkeypox, chickenpox, smallpox, cowpox, and measles," bioRxiv, 2022.

[10] M. M. Ahsan, M. R. Uddin, and S. A. Luna, “Monkeypox image data collection,” arXiv
preprint arXiv:2206.01774, 2022.

[11] S. N. Ali, M. Ahmed, J. Paul, T. Jahan, S. Sani, N. Noor, T. Hasan et al., “Monkeypox
skin lesion detection using deep learning models: A feasibility study,” arXiv preprint
arXiv:2207.03342, 2022.

[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp.
770–778.

[13] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected
convolutional networks,” in Proceedings of the IEEE conference on computer vision and
pattern recognition, 2017, pp. 4700–4708.

[14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception
architecture for computer vision,” in Proceedings of the IEEE conference on computer vision
and pattern recognition, 2016, pp. 2818–2826.

[15] M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le,
“Mnasnet: Platform-aware neural architecture search for mobile,” in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 2820–2828.

[16] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2:
Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE conference on
computer vision and pattern recognition, 2018, pp. 4510–4520.

[17] V. H. Sahin, I. Oztel, and G. Y. Oztel, “Human monkeypox classification from skin
lesion images with deep pre-trained network using mobile application,” Journal of
Medical Systems, vol. 46, no. 11, 2022. https://doi.org/10.1007/s10916-022-01863-7.

Appendices
I. Vision Transformers (ViT): The Vision Transformer (ViT) is a transformer-based
neural network architecture designed for image recognition tasks. It was introduced
by Alexey Dosovitskiy and colleagues at Google Research in a paper published at
ICLR 2021.
Unlike traditional convolutional neural networks (CNNs), which use a combination
of convolutional layers and fully connected layers to process images, ViTs split the
input image into fixed-size patches and process the resulting sequence of patch
embeddings with self-attention layers. The self-attention layers allow the model to
attend to different parts of the image and integrate information from different
spatial locations, so that even early layers can relate distant regions of the image.
Because a ViT operates on a sequence of patches, it can in principle handle images
of different resolutions, although the position embeddings usually have to be
interpolated when the resolution differs from the one used in training. ViT
backbones have also been adapted to tasks like object detection and segmentation,
where the size of the objects in the image can vary significantly.
ViTs have achieved state-of-the-art performance on a number of image recognition
benchmarks, particularly when pre-trained on large datasets, and are being actively
researched and developed by the machine learning community.

II. React: React (also known as React.js or ReactJS) is a free and open-source front-end
JavaScript library for building user interfaces based on UI components. It is
maintained by Meta (formerly Facebook) and a community of individual developers
and companies. React can be used as a base in the development of single-page,
mobile, or server-rendered applications with frameworks like Next.js. However,
React is only concerned with state management and rendering that state to the DOM,
so creating React applications usually requires the use of additional libraries for
routing, as well as certain client-side functionality.

III. Flask: Flask is a micro web framework written in Python. It is classified as a
microframework because it does not require particular tools or libraries. It has no
database abstraction layer, form validation, or any other components where
pre-existing third-party libraries provide common functions. However, Flask supports
extensions that can add application features as if they were implemented in Flask
itself. Extensions exist for object-relational mappers, form validation, upload
handling, various open authentication technologies, and several common
framework-related tools.

Research Paper

Research Paper Conference Submission
