ICMR - Reproducible AI in Medicine and Health

Annexure I
FORMAT FOR SUBMISSION OF FULL PROPOSAL
1. Title: Detecting Covid from CT Images using Autoencoders

2. Contact details of Coordinator/PI/Co-PIs
Dr. Om Kumar C.U (AP/CSE, Easwari Engineering College, Ramapuram, Ch-89,
TN, India; omkumar.cu@eec.srmrmp.edu.in; 9952035395).
Mr. KPK Devan (ASP/CSE, Easwari Engineering College, Ramapuram, Ch-89, TN,
India; parimalakanagadevan.k@eec.srmrmp.edu.in; 9841726794).
3. Summary (up to 250 words): Several tests are now available for detecting
COVID-19, but each has its own disadvantages. One of the efficient ways to
detect COVID-19 is from computed tomography (CT) images, which are much more
efficient than X-ray images for this purpose. However, a result cannot be
obtained simply by looking at a scan. It is therefore important to use machine
learning (ML) and deep learning (DL) for fine classification and better
results. The primary algorithm used for classification so far is the
convolutional neural network (CNN), which has given outstanding results but at
the cost of a drastic increase in model complexity. In this work, we use
autoencoders for the data augmentation process so that only the important
features are extracted from the images. A 2+2 layer autoencoder was used to
determine the last layer used for image augmentation. We then designed and
developed 3 autoencoders, which yielded a satisfactory result. Finally, the
encoded images were fed into various models designed to classify whether an
image belongs to the COVID class or not. To avoid overfitting, the classifier
uses only 2 convolutional layers, followed by a flattening layer and two dense
layers. Overfitting is further reduced by regularisers such as dropout and
kernel regularisation. For testing, we converted the (256,256,1) images to
(64,64,192) using the three encoders we trained. The proposed model is
expected to achieve an average accuracy of 83%.
4. Background (up to 500 words):
Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2). Infected patients typically develop mild
to severe respiratory symptoms. COVID-19 spread quickly across the world and
was declared a global pandemic by the World Health Organization. One of the
causes of the disease's rapid spread is the inefficiency of detection
worldwide. Since the isolation and genome sequencing of the virus, the nucleic
acid detection kit method (RT-PCR) and nucleic acid sequencing have been used
to detect it. The RT-PCR procedure takes at least 4 hours to produce test
results, and nucleic acid sequencing takes substantially longer. The virus is
transmitted from person to person via the respiratory tract, resulting in fast
transmission. The reagents and equipment needed for these diagnostic tests are
scarce in some countries and regions, which delays the early detection of
infected people and contributed to the rapid spread of COVID-19 across the
world.
In this work we focus on the global problem of identifying coronavirus illness
from chest radiography images. Early investment in characterising this viral
infection can significantly improve the epidemic response in terms of swift
and ongoing surveillance and patient evaluation. Patients are scanned using
computed tomography (CT), X-ray, and ultrasound techniques to identify the
presence and severity of COVID-19.
The screening of patients in primary health hospitals is the initial step in
the treatment of COVID-19. Although reverse transcription polymerase chain
reaction (RT-PCR) tests are still used for final diagnosis, for persons with
severe respiratory symptoms hospitals now rely on medical imaging since it is
easy and quick, allowing doctors to identify the disease and its effects more
quickly. This problem can be addressed using deep learning methods. The goal
of deep learning is to develop a machine learning model with multiple hidden
layers, trained on a large quantity of sample data, to improve classification
and prediction accuracy.
When resources are scarce, medical centres can consider patients' needs and
make the best use of what they have. Deep neural networks have been shown to
be particularly effective in the early identification of COVID-19.
Reconstruction techniques using autoencoders on CT images have been applied to
COVID-19 detection. U-Net-based architectures have been used to segment
infected regions in chest CT images.
Autoencoders combined with a convolutional neural network (CNN) have given
remarkably good results. Although many studies have explored this concept, a
major issue is that the model was either trained on a small data set or
involved large time and space complexity.
In this work we aim to achieve better results in terms of time and space
complexity with a minimal number of autoencoders. We use a convolutional
autoencoder for data augmentation and feature extraction. Once the necessary
features have been extracted, the data is fed into the CNN, which after
processing states whether the input image is COVID positive or COVID negative.
5. Literature review (up to 1000 words):

• This section presents the literature survey carried out for this work. One
of the early deep learning models was given in [9], which used a modified
Inception transfer-learning model. The model used 1065 CT images, of which
325 were COVID-19 images and 740 corresponded to viral pneumonia. It
achieved an accuracy of 79.3% and a specificity of 0.83. The model in [10]
was a 2D deep convolutional neural network that used 970 CT images, of
which 496 were from patients with confirmed cases. This model achieved an
accuracy of 94.98%.
• In [11], the authors described a 3-dimensional deep learning model that
used a total of 618 CT samples and achieved an accuracy of 86.7%. In
[12, 13], a COVID-19 detection neural network (COVNet) was used with
different architectures, achieving accuracies of 86.7% and 95%
respectively. A novel CNN-AE approach was used in [14]. This model was
efficient and produced good results compared with its parent model (CNN
only), reaching an accuracy of 96.05%.
• A random forest algorithm [15] was applied to clinically available blood
test results and yielded an accuracy of 97.95%. Note, however, that it was
evaluated on blood test results rather than on images. Another deep
transfer learning model combined with Conditional Generative Adversarial
Nets (CGAN) [16] was applied to CT images with 5 different deep CNN models
(AlexNet, VGGNet16, VGGNet19, GoogleNet, and ResNet50) and gave an accuracy
of 82.91%.
• An unsupervised deep learning model based on variational autoencoders
[17], which also involves the use of Adaptive Wiener filtering (AWF), gave
a higher accuracy of 98.7%. In this model, Inception v4 with the Adagrad
technique is used for feature extraction. Another model was proposed with
anatomical structure on a deep neural network, which relies on an abundance
of labelled data for proper training. The paper [18] used stacked
autoencoders for an area object detection model based on Mask-RCNN. The
paper [1] placed more emphasis on stacked autoencoders and achieved an
average accuracy, precision, recall, and F1-score of 94.7%, 96.54%, 94.1%,
and 94.8% respectively.
• Instead of lung CT images, chest X-rays have also been used for detection.
Deep CNNs [19] were used for this, and the accuracy achieved was 98.04%.
For detecting early cases, a deep convolutional autoencoder [20] was used
on chest X-rays, which gave an accuracy of 76.52%.
• An efficient framework was designed in [21] to exploit powerful features;
combined with a random forest algorithm, it produced an accuracy of 97.78%.
Another model uses a bifurcated autoencoder for segmentation of COVID-19
infected regions [22]. This model uses a shared encoder and a bifurcated
connection to two separate decoders. One decoder segments the healthy
region of the lungs, while the other segments the infected regions.
• These COVID-19 chest CT aided diagnostic models based on deep learning
rely on a small number of COVID-19 CT datasets. Their performance measures,
such as accuracy and recall rate, do not meet the standards for genuine
COVID-19 detection. Furthermore, the use of deep learning approaches to
identify and detect new COVID-19 cases in chest CT is still in its early
stages. The goal of this study is therefore to present a novel architecture
of deep learning classifiers to help radiologists identify COVID-19 in
chest CT scans automatically.
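The performance measures quoted in this survey (accuracy, precision, recall, specificity, F1-score) all derive from the four counts of a binary confusion matrix. The following is a minimal sketch of those definitions, assuming COVID is treated as the positive class; the class name, function name and the example counts are hypothetical and are not taken from any of the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with COVID as the positive class."""
    tp: int  # COVID scans predicted as COVID
    fp: int  # non-COVID scans predicted as COVID
    tn: int  # non-COVID scans predicted as non-COVID
    fn: int  # COVID scans predicted as non-COVID


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard binary metrics; a zero denominator yields 0.0."""

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(c.tp, c.tp + c.fp)
    recall = ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(c.tn, c.tn + c.fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting specificity and recall alongside accuracy matters for screening tasks, since a model can reach a high accuracy while still missing a large share of positive cases when the classes are imbalanced.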
REFERENCES
1. Li D, Fu Z, Xu J. Stacked-autoencoder-based model for COVID-19 diagnosis on CT
images. Applied Intelligence. 2021 May;51(5):2805-17.
2. Mansour RF, Escorcia-Gutierrez J, Gamarra M, Gupta D, Castillo O, Kumar S.
Unsupervised deep learning based variational autoencoder model for COVID-19
diagnosis and classification. Pattern Recognition Letters. 2021 Nov 1;151:267-74.
3. Yang D, Martinez C, Visuña L, Khandhar H, Bhatt C, Carretero J. Detection and analysis
of COVID-19 in medical images using deep learning techniques. Scientific Reports. 2021
Oct 4;11(1):1-3.
4. Bejnordi BE, Veta M, Van Diest PJ, Van Ginneken B, Karssemeijer N, Litjens G, Van Der
Laak JA, Hermsen M, Manson QF, Balkenhol M, Geessink O. Diagnostic assessment of
deep learning algorithms for detection of lymph node metastases in women with breast
cancer. Jama. 2017 Dec 12;318(22):2199-210.
5. Amin J, Sharif M, Yasmin M, Fernandes SL. Big data analysis for brain tumor detection:
Deep convolutional neural networks. Future Generation Computer Systems. 2018 Oct
1;87:290-7.
6. Pastur-Romay LA, Cedrón F, Pazos A, Porto-Pazos AB. Deep artificial neural networks
and neuromorphic chips for big data analysis: pharmaceutical and bioinformatics
applications. International journal of molecular sciences. 2016 Aug;17(8):1313.
7. Joloudari JH, Haderbadi M, Mashmool A, GhasemiGol M, Band SS, Mosavi A. Early
detection of the advanced persistent threat attack using performance analysis of deep
learning. IEEE Access. 2020 Oct 6;8:186125-37.
8. Chen X, Yao L, Zhang Y. Residual attention u-net for automated multi-class segmentation
of covid-19 chest ct images. arXiv preprint arXiv:2004.05645. 2020 Apr 12.
9. Wang S, Kang B, Ma J, Zeng X, Xiao M, Guo J, Cai M, Yang J, Li Y, Meng X, Xu B. A deep
learning algorithm using CT images to screen for Corona Virus Disease (COVID-19).
European radiology. 2021 Aug;31(8):6096-104.
10. Jin C, Chen W, Cao Y, Xu Z, Tan Z, Zhang X, Deng L, Zheng C, Zhou J, Shi H, Feng J.
Development and evaluation of an AI system for COVID-19.
11. Xu X, Jiang X, Ma C, Du P, Li X, Lv S, Yu L, Ni Q, Chen Y, Su J, Lang G. A deep learning
system to screen novel coronavirus disease 2019 pneumonia. Engineering. 2020 Oct
1;6(10):1122-9.
12. Li L, Qin L, Xu Z, Yin Y, Wang X, Kong B, Bai J, Lu Y, Fang Z, Song Q, Cao K. Artificial
intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
Radiology. 2020 Mar 19.
13. Wang L, Lin ZQ, Wong A. Covid-net: A tailored deep convolutional neural network
design for detection of covid-19 cases from chest x-ray images. Scientific Reports. 2020
Nov 11;10(1):1-2.
14. Khozeimeh F, Sharifrazi D, Izadi NH, Joloudari JH, Shoeibi A, Alizadehsani R, Gorriz JM,
Hussain S, Sani ZA, Moosaei H, Khosravi A. Combining a convolutional neural network
with autoencoders to predict the survival chance of COVID-19 patients. Scientific Reports.
2021 Jul 28;11(1):1-8.
15. Wu J, Zhang P, Zhang L, Meng W, Li J, Tong C, Li Y, Cai J, Yang Z, Zhu J, Zhao M. Rapid
and accurate identification of COVID-19 infection through machine learning based on
clinical available blood test results. MedRxiv. 2020 Jan 1.
16. Loey M, Manogaran G, Khalifa NE. A deep transfer learning model with classical data
augmentation and CGAN to detect COVID-19 from chest CT radiography digital images.
Neural Computing and Applications. 2020 Oct 26:1-3.
17. Mansour RF, Escorcia-Gutierrez J, Gamarra M, Gupta D, Castillo O, Kumar S.
Unsupervised deep learning based variational autoencoder model for COVID-19
diagnosis and classification. Pattern Recognition Letters. 2021 Nov 1;151:267-74.
18. Okolo GI, Katsigiannis S, Althobaiti T, Ramzan N. On the Use of Deep Learning for
Imaging-Based COVID-19 Detection Using Chest X-rays. Sensors. 2021 Jan;21(17):5702.
19. Khan SS, Khoshbakhtian F, Ashraf AB. Anomaly detection approach to identify early
cases in a pandemic using chest X-rays. arXiv preprint arXiv:2010.02814. 2020 Oct 6.
20. Goel C, Kumar A, Dubey SK, Srivastava V. Efficient deep network architecture for covid-
19 detection using computed tomography images. medRxiv. 2020 Jan 1.
21. Yazdekhasty P, Zindari A, Nabizadeh-ShahreBabak Z, Roshandel R, Khadivi P, Karimi
N, Samavi S. Bifurcated autoencoder for segmentation of COVID-19 infected regions in
CT images. In International Conference on Pattern Recognition 2021 Jan 10 (pp. 597-607).
Springer, Cham.
22. Sokolova M, Lapalme G. A systematic analysis of performance measures for
classification tasks. Information Processing & Management. 2009 Jul 1;45(4):427-37.
6. Novelty/Innovation :
The novelty of this project is a new variant of the convolutional autoencoder
for detecting the presence of COVID-19 in CT scan images. The model currently
uses 3 autoencoders (AE), and this number can be increased depending on
performance and requirements. Each of the 3 autoencoders is designed
differently, with the aim of outperforming other models. The output of the
autoencoders is then passed to a neural network for classification, and the
result is displayed as ‘Covid’ or ‘No covid’.
7. Study Objectives:
• To detect COVID-19 from computed tomography (CT) images using
autoencoders and convolutional neural networks.
• To classify a given input image as either covid or no covid with the help
of a convolutional neural network.
8. Relevant AI expertise available: mention Name, Position and Expertise.

9. Infrastructure available (Specific major equipment and facilities available with
reference to this proposal):
• OS: Windows/Linux/macOS
• Development Tools:
  Data Augmentation: Anaconda platform/Google Colab, with TensorFlow,
  NumPy and Keras installed
  Data Labelling: Labelling tool/any suitable labelling toolkit
  Algorithm Build: Google Colab/Google Colab Pro
• Language: Python 3.6 and above
• Test Bed: Google Colab, Colab TPU, Paperspace Gradient
• Storage: Google Drive, Google Cloud or any cloud storage
• Hardware: A system with a high-performance GPU (graphics card)
10. Brief description of the previous work in this area:
One of the causes of the disease's rapid spread is the inefficiency of
detection worldwide. Since the isolation and genome sequencing of the COVID-19
virus, the nucleic acid detection kit method (RT-PCR) and nucleic acid
sequencing have been used to detect the virus. The RT-PCR procedure, however,
takes at least 4 hours to produce test results, and nucleic acid sequencing
takes substantially longer.
Patients are scanned using computed tomography (CT), X-ray, and ultrasound
(US) techniques to identify the presence and severity of COVID-19.
Comparatively, CT images have been found to be more efficient than the other
techniques at identifying the presence of COVID-19. Reconstruction techniques
using autoencoders on CT images have been applied to COVID-19 detection.
Autoencoders combined with a convolutional neural network (CNN) have given
remarkably good results.
Hence, we use a convolutional autoencoder for data augmentation and feature
extraction. Once the necessary features have been extracted, the data is fed
into the CNN, which after processing states whether the input image is COVID
positive or COVID negative.
11. Methodology and Technical Work Plan.

Our proposed system consists of the following:
• Training data was used to build multiple autoencoder architectures; a 2+2
layer autoencoder was finally selected, based on the training loss (MSE)
when reconstructing the image from the encoder output.
• The input images have dimensions (256,256,1). After passing through all
three autoencoders, the image dimensions become (64,64,192).
• In addition, the training and validation data increased, as the 523 and 77
images were each multiplied by 3.
• Images of dimension (64,64,192), encoded using the three autoencoders,
were fed into various models designed to classify whether the image
belongs to the covid class or not.
• To avoid overfitting, the classifier uses only 2 convolutional layers,
followed by a flattening layer and two dense layers.
• As a further step against overfitting, regularisers such as dropout and
kernel regularisation are used.
• Max pooling is also performed to reduce the computation time.
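The change in dimensions from (256,256,1) to (64,64,192) can be traced with simple shape arithmetic. The sketch below shows one configuration that is consistent with the stated shapes; it is an illustration, not the project's actual architecture. It assumes that each encoder has two stride-1, 'same'-padded convolution layers, each followed by 2x2 max pooling, that the last convolution has 64 filters, and that the three encoder outputs are concatenated along the channel axis. The filter counts and the concatenation step are assumptions, since the proposal does not specify them.

```python
def conv_same(shape, filters):
    """Shape after a stride-1 convolution with 'same' padding."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, pool=2):
    """Shape after non-overlapping pool x pool max pooling."""
    height, width, channels = shape
    return (height // pool, width // pool, channels)


def encoder_output_shape(input_shape, filters_per_layer):
    """Trace a shape through conv + pool blocks, one block per filter count."""
    shape = input_shape
    for filters in filters_per_layer:
        shape = max_pool(conv_same(shape, filters))
    return shape


def concat_channels(shapes):
    """Shape after concatenating feature maps along the channel axis."""
    heights = {s[0] for s in shapes}
    widths = {s[1] for s in shapes}
    if len(heights) != 1 or len(widths) != 1:
        raise ValueError("spatial dimensions must match to concatenate")
    return (heights.pop(), widths.pop(), sum(s[2] for s in shapes))


INPUT_SHAPE = (256, 256, 1)  # from the text
ASSUMED_FILTERS = (32, 64)   # assumption: two encoder layers, 64 filters last
NUM_ENCODERS = 3             # from the text

per_encoder = encoder_output_shape(INPUT_SHAPE, ASSUMED_FILTERS)
combined = concat_channels([per_encoder] * NUM_ENCODERS)
print("per encoder:", per_encoder)
print("combined:", combined)
```

Under these assumptions each pooling step halves the spatial size (256 to 128 to 64), which is also why max pooling reduces the computation in the layers that follow.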
12. Data:
The dataset used for this work is obtained from
https://github.com/UCSD-AI4H/COVID-CT.
The usefulness of this dataset has been confirmed by a senior radiologist at
Tongji Hospital in Wuhan, China, who diagnosed and treated a significant
number of COVID-19 patients between January and April during the outbreak of
the disease.
To summarise the available data,

            TRAIN   VALIDATION   TEST   TOTAL
COVID         245           36     71     352
NO-COVID      278           41     81     400
TOTAL         523           77    152     752

○ We split this data approximately 70:10:20
■ 70% - Training Data
■ 10% - Validation Data
■ 20% - Testing Data
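The split proportions can be checked directly from the per-split counts in the table. The following sketch uses only the train, validation and test counts listed above, sums them per class and per split, and reports each split's share of the whole dataset; the variable names are illustrative.

```python
# Per-split image counts as listed in the table above.
counts = {
    "COVID": {"train": 245, "validation": 36, "test": 71},
    "NO-COVID": {"train": 278, "validation": 41, "test": 81},
}

# Row totals: images per class across all three splits.
class_totals = {label: sum(splits.values()) for label, splits in counts.items()}

# Column totals: images per split across both classes.
split_totals = {
    split: sum(counts[label][split] for label in counts)
    for split in ("train", "validation", "test")
}

grand_total = sum(split_totals.values())

# Share of the whole dataset that falls in each split, as a percentage.
split_percent = {
    split: round(100 * n / grand_total, 1) for split, n in split_totals.items()
}

print(class_totals)
print(split_totals, grand_total)
print(split_percent)
```

The resulting percentages are close to, but not exactly, 70:10:20, because the split is made on whole images.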
13. Ethics Review:
The data used is a publicly available dataset on GitHub. The data collected by
the dataset's authors has been clinically approved, and no laws were broken in
its collection.
14. Project Implementation Plan:
The project is proposed to have the following milestones.
• Milestone 1 - find the last layer of the autoencoder used for feature
extraction
• Milestone 2 - configure and run the 3 autoencoders
• Milestone 3 - run the encoded images through the CNN and obtain the results
Milestone 1 is expected to take around 15 days.
Milestone 2 is expected to take 25 days.
Milestone 3 is expected to take around 20 days.
Overall, the project is expected to take 60 days from the day of commencement.
15. Budget:
Non-Recurring
S.No  Description                                         Amount for 1 unit in Rs.  Quantity  Amount in Rs.
1.    High End Servers                                    2,00,000                  2         4,00,000
      PowerEdge XR11 Rack Server
      One 3rd Generation Intel® Xeon® Scalable
      processor with up to 36 cores
      Intel® C620 series chipset
      DIMM Speed Up to 3200 MT/s
      LRDIMM
      Intel® Optane™ PMem 200 Series
      PERC H755, PERC H345, HBA355i, PERC H840, HBA355e
      RAID S150
      Boot Optimized Storage Subsystem
2.    Software tools                                      13,28,000                 1         13,28,000
      Neural Designer
      Keras
      Tableau
3.    Computer systems (nodes)                            1,95,000                  10        19,50,000
      Lenovo Thinkpad
      10th Generation Intel® Core™ i7-10750H Processor
      (6 Cores / 12 Threads, 2.60 GHz, up to 5.00 GHz
      with Turbo Boost, 12 MB Cache)
Total                                                                                         Rs. 36,78,000
Recurring
S.No  Description                  1st Year in Rs.  2nd Year in Rs.
1.    Travel                       1,00,000         1,00,000
2.    Salary of Research Fellow    4,00,000         4,00,000
3.    Stationery                   30,000           30,000
4.    Miscellaneous                25,000           25,000
Total                              5,55,000         5,55,000

Non Recurring Rs. 36,78,000
Recurring Rs. 11,10,000
Grand Total Rs. 47,88,000
16. Expected Outcomes: The model is projected to achieve an accuracy of 83%.
This accuracy may improve further if the number of autoencoders is increased.
