
First Progress Report on

Chest X-Ray Disease Classification Using Deep Learning


Submitted in partial fulfillment of the degree of
Bachelor of Technology
(Computer Science and Engineering)
Submitted by:
Gaurav Chaudhary (03815002718)
Keshav Bansal (05115002718)
Kumar Tatsat (05415002718)

Under the supervision of


Mr. Sushil Kumar

Department of Computer Science and Engineering


Maharaja Surajmal Institute of Technology
Janakpuri, New Delhi
2018-22
INDEX

1. Title page
2. Abstract (also contains problem statement)
3. Introduction
4. Literature survey
5. Research gaps
6. Objectives
7. Methodology of study
8. Work left to do
9. Limitations
10. References
Abstract
In this project, we will implement a deep learning pipeline for chest X-ray disease classification. We also aim to present a comparison of various deep learning models for this application in conjunction with our data preprocessing pipeline. In particular, we want to examine whether, given suitable preprocessing, smaller models can perform as well as some deep CNNs. For this we are using OpenCV and PyTorch.
Introduction

Every year, millions of adults are diagnosed with pulmonary pathologies such as pulmonary atelectasis, cardiomegaly, pulmonary effusion, infiltration, pulmonary mass, nodule, pneumonia, and pneumothorax [2, 3]. According to the Indian Council of Medical Research, in 2019 alone about 1.6 million people died due to COPD (chronic obstructive pulmonary disease) [24]. The COVID-19 pandemic also brought our healthcare infrastructure to its knees, with millions of people dying of breathing difficulty caused by pulmonary fibrosis (a complication of COVID-19).

CXRs (chest X-rays) are currently one of the best available methods for diagnosing pulmonary pathologies [3], playing a crucial role in clinical healthcare and epidemiological studies. However, assigning the correct pathology to an X-ray image is an involved task that relies heavily on the availability of expert radiologists.

In this project, we propose a novel deep-learning-based pipeline for classifying the aforementioned pulmonary pathologies from CXRs. While other researchers have used deep learning for this application [7], we are using a combination of preprocessing techniques, such as bone suppression [13] and HE (histogram equalization) [25], to enhance our model's capability.

Our project aims to help healthcare workers provide quality diagnosis quickly and cheaply, even in the most remote places.
Literature Survey

The application of deep convolutional neural networks to disease classification from CXRs became widely popular with the RSNA Pneumonia Detection Challenge [1] on Kaggle, after the Stanford team of Andrew Ng et al. presented their CheXNet model [7], which was based on a densely connected CNN. The CheXNet team used the ChestX-ray14 dataset, released by Wang et al. in 2017, which contains about 112,000 X-rays annotated with up to 14 different thoracic pathologies [5, 6]. Alongside the model, they published a research paper on CXR disease classification reporting its performance: CheXNet achieved an F1 score of about 0.435, significantly higher than the 0.387 achieved by Stanford radiologists. Their paper demonstrated the viability of deep learning techniques for chest X-ray disease classification.

Later, in 2020, a group from the University of Saskatchewan, along with researchers from the National Institute of Technology, Trichy, India and International Road Dynamics, Canada, published their research on COVID CXRs with a CheXNet-based deep CNN called COVID-CXNet [8]. The team also reported their base model's performance, which was comparable to CheXNet even though it had significantly fewer parameters. To achieve this, they applied various data preprocessing techniques to the CXRs, such as lung segmentation [11], CLAHE, and BEASF [16, 25]. They were also able to improve on some weaknesses of CheXNet: CheXNet was often distracted by the textual data printed on the images, a problem the team tackled by lung segmentation using U-Nets, forcing the model to focus on the required ROI. Their final model, Hierarchical Multi-class COVID-CXNet, achieved an F1 score of 0.94 for binary classification and 0.85 for three-class classification. Although the model performed very well, they pointed out a major problem: the lack of training data. Only about 7,700 images [9] were available for training a COVID detector.

Furthermore, image enhancement based on histogram equalization is especially necessary for medical images, as it increases image contrast and makes subtle non-linearities more distinguishable. There are various histogram-equalization-based enhancement methods, such as DHE, BEASF, and CLAHE. Radiologists also use manual contrast improvement to better diagnose masses and nodules.

Apart from the aforementioned techniques, researchers have also tried bone suppression [12, 13] to increase model performance. Bone suppression separates the soft tissues from the bones, removing a source of distraction for deep learning models. The authors of [12] demonstrated the efficacy of this process for detecting tuberculosis from CXRs.
Research Gaps

1. CheXNet was often found to be distracted by the textual data on X-ray images, a problem that can be solved by lung segmentation.
2. The University of Saskatchewan team used lung segmentation, BEASF, and CLAHE in their data preprocessing pipeline, but left out bone suppression.
3. Although the model presented by the University of Saskatchewan researchers performed well at detecting COVID with fewer parameters than the other models cited in their paper, they stated the lack of data as a limitation and a source of bias in their model.
4. Another group of independent researchers tried bone shadow suppression, but left out histogram equalization and lung segmentation from their data preprocessing pipeline, both of which have proved effective in increasing model performance.
Objectives

The objectives of our project are as follows:

1. Building upon the existing research on the effectiveness of data preprocessing for medical images, we want to find and implement an optimized data preprocessing pipeline that achieves lung segmentation [11, 14] using neural networks like U-Net, bone suppression using an autoencoder network, and HE using techniques like BEASF, CLAHE, and DHE.
2. To increase the model's explainability, we want to implement the Grad-CAM [21] algorithm, which would help determine whether the model is looking at the correct places.
3. To measure model performance, we use the F1 score for multi-class classification (a minimal sketch of this metric follows this list).
4. We propose an Inception-based [19] deep CNN trained from scratch for this application.
5. Along with our proposed model, we would compare the results with SqueezeNet-based and ResNet-34-based deep learning models.
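As a point of reference for objective 3, below is a minimal sketch of how the macro-averaged F1 score could be computed from model outputs; it assumes scikit-learn is available alongside PyTorch, and the function name is ours, not part of any library.

import torch
from sklearn.metrics import f1_score

def multiclass_f1(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # logits:  (N, num_classes) raw scores from the classification head
    # targets: (N,) integer ground-truth labels
    preds = logits.argmax(dim=1).cpu().numpy()
    # Macro averaging weighs every pathology class equally, which matters
    # here because the class counts are highly imbalanced.
    return f1_score(targets.cpu().numpy(), preds, average="macro")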
Research Methodology

Figure 1: Proposed DL Pipeline

Our research methodology can be described in the following steps:

1. Dataset preparation:
a. We are using the NIH Chest X-ray dataset [6] for the final disease classification.
b. We are combining it with the open-source COVID-19 dataset, "COVID-19 Image Data Collection" [9], compiled by a team from IEEE.
c. For the segmentation model, we are using the Montgomery County chest X-ray set (MC) and the Shenzhen chest X-ray set from the U.S. National Library of Medicine [10].
d. For bone suppression, we are using the BSE-JSRT dataset [13].
2. Bone suppression model training: we are using an autoencoder-based model for bone suppression [22].
3. HE: for this step, we use OpenCV [18] and Python's NumPy library.
4. Lung segmentation: for this we are using a U-Net model, which has proven very effective in medical segmentation tasks [26], along with edge dilation to better preserve the lung structure.
5. In the final stage, we have chosen a deep learning model that accepts the inputs from the preprocessing pipeline together with the reshaped X-ray image for context preservation [19]. For the outputs, we have a primary output path for pulmonary pathology classification using a softmax function and a secondary output for binary classification on the COVID-19 dataset using a sigmoid function (see the sketch after this list). We primarily use an Inception architecture to keep the number of parameters low while giving attention to different levels of detail in the X-ray images.
6. Along with the Inception-based model, we will use a SqueezeNet and a ResNet-34 model to compare performance.
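To illustrate step 5, here is a rough sketch of how a shared backbone could feed the two output paths. DualHeadClassifier, backbone, and feat_dim are placeholder names for illustration, not our final implementation.

import torch.nn as nn

class DualHeadClassifier(nn.Module):
    # `backbone` and `feat_dim` are placeholders; in our pipeline the
    # backbone would be the Inception-based CNN described below.
    def __init__(self, backbone: nn.Module, feat_dim: int, num_pathologies: int = 9):
        super().__init__()
        self.backbone = backbone
        # Primary path: one score per pulmonary pathology (softmax at inference)
        self.pathology_head = nn.Linear(feat_dim, num_pathologies)
        # Secondary path: a single logit for COVID vs. non-COVID (sigmoid)
        self.covid_head = nn.Linear(feat_dim, 1)

    def forward(self, x):
        feats = self.backbone(x)  # expected shape: (N, feat_dim)
        return self.pathology_head(feats), self.covid_head(feats)

Training would pair nn.CrossEntropyLoss (softmax) on the primary head with nn.BCEWithLogitsLoss (sigmoid) on the secondary head.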
Methodology of Study

Objective-1:

1) Datasets (50% complete): We have collected the datasets required for the project; their details are given in the table below:

Name of dataset                         Size of dataset        # Data classes

CXR8                                    119,631                9
Shenzhen Lung Segmentation Dataset      662                    2
Montgomery Lung Segmentation Dataset    138                    2
JSRT Bone Suppression Dataset           39 (4,000 augmented)   2
IEEE Dataset for Covid Chest X-Rays     -                      2

Table: Dataset collection

For the CXR8 dataset, here are the details:

Name of pathology     # of images

Atelectasis           5,789
Cardiomegaly          1,010
Effusion              6,331
Infiltration          10,317
Mass                  6,046
Nodule                1,971
Pneumonia             1,062
Pneumothorax          2,793
Normal                84,312
Total                 119,631

Table: CXR classification dataset


For lung segmentation, the following datasets are used:

Dataset name                   No. of X-rays (w/ masks)

Shenzhen Hospital CXR Set      326 normal, 336 abnormal
Montgomery County CXR Set      80 normal, 58 abnormal

Table: Lung segmentation datasets

For the bone suppression dataset: 154 conventional chest radiographs with a lung nodule and 93 radiographs without a nodule were selected from 14 medical centers and digitized by a laser digitizer with a 2048 x 2048 matrix size (0.175-mm pixels) and a 12-bit gray scale. The dataset was developed by the JSRT (Japanese Society of Radiological Technology).

2) Histogram Equalization: For this we tested three algorithms (CLAHE, BEASF, and DHE). From the comparison results (figure omitted), CLAHE performs the best among the tested algorithms.
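For reference, applying CLAHE to a grayscale CXR with OpenCV takes only a few lines. The file paths, clip limit, and tile grid size below are illustrative values, not our tuned settings.

import cv2

# Load a CXR as a single-channel image; the path is a placeholder.
img = cv2.imread("cxr_sample.png", cv2.IMREAD_GRAYSCALE)

# CLAHE: contrast-limited adaptive histogram equalization on local tiles,
# which avoids the noise over-amplification of plain global HE.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)

cv2.imwrite("cxr_sample_clahe.png", enhanced)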
Objective-4: (50% complete)

1) Lung Segmentation: We have created all but one of the scripts; final testing remains. Below is the model architecture.

Lung Segmentation (U-Net) model

Layer                         Output Shape           # of Params

ModuleList: 1                 -                      -
- DoubleConv: 2-1             -                      -
-- Sequential: 3-1            [-1, 64, 512, 512]     37,696
MaxPool2d: 1-1                [-1, 64, 256, 256]     -
ModuleList: 1                 -                      -
- DoubleConv: 2-2             -                      -
-- Sequential: 3-2            [-1, 128, 256, 256]    221,696
MaxPool2d: 1-2                [-1, 128, 128, 128]    -
ModuleList: 1                 -                      -
- DoubleConv: 2-3             -                      -
-- Sequential: 3-3            [-1, 256, 128, 128]    885,760
MaxPool2d: 1-3                [-1, 256, 64, 64]      -
ModuleList: 1                 -                      -
- DoubleConv: 2-4             -                      -
-- Sequential: 3-4            [-1, 512, 64, 64]      3,540,992
MaxPool2d: 1-4                [-1, 512, 32, 32]      -
DoubleConv: 1-5               -                      -
- Sequential: 2-5             -                      -
-- Conv2d: 3-5                -                      4,718,592
-- BatchNorm2d: 3-6           -                      2,048
-- ReLU: 3-7                  -                      -
-- Conv2d: 3-8                -                      9,437,184
-- BatchNorm2d: 3-9           -                      2,048
-- ReLU: 3-10                 [-1, 1024, 32, 32]     -
ModuleList: 1                 -                      -
- ConvTranspose2d: 2-6        -                      2,097,664
- DoubleConv: 2-7             -                      -
-- Sequential: 3-11           [-1, 512, 64, 64]      7,079,936
- ConvTranspose2d: 2-8        -                      524,544
- DoubleConv: 2-9             -                      -
-- Sequential: 3-12           [-1, 256, 128, 128]    1,770,496
- ConvTranspose2d: 2-10       -                      131,200
- DoubleConv: 2-11            -                      -
-- Sequential: 3-13           [-1, 128, 256, 256]    442,880
- ConvTranspose2d: 2-12       -                      32,832
- DoubleConv: 2-13            -                      -
-- Sequential: 3-14           [-1, 64, 512, 512]     110,848
Conv2d: 1-6                   [-1, 1, 512, 512]      65

Total trainable parameters                           31,036,481
Total parameters                                     31,036,481

While testing the model with just one image pair, we got these results (output figure omitted).
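The DoubleConv unit that recurs in the table above is the standard U-Net building block. The sketch below is consistent with the listed parameter counts (two bias-free 3x3 convolutions, each followed by batch normalization and ReLU), though it is an illustration rather than our exact code.

import torch.nn as nn

class DoubleConv(nn.Module):
    # Two (Conv2d -> BatchNorm2d -> ReLU) stages, as listed in the table.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

For example, DoubleConv(1, 64) reproduces the 37,696 parameters of the first Sequential row in the table.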
2) Bone Suppression: We have created all but one of the scripts for model testing. Below is the
model architecture:

Bone Suppression (AE) model

Layer                         Output Shape           Param #

Sequential: 1-1 (encoder)     [-1, 64, 64, 64]       -
- Conv2d: 2-1                 [-1, 16, 512, 512]     144
- LeakyReLU: 2-2              -                      -
- MaxPool2d: 2-3              -                      -
- Conv2d: 2-4                 -                      4,608
- BatchNorm2d: 2-5            -                      64
- LeakyReLU: 2-6              -                      -
- MaxPool2d: 2-7              -                      -
- Conv2d: 2-8                 -                      18,432
- BatchNorm2d: 2-9            -                      128
- LeakyReLU: 2-10             -                      -
- MaxPool2d: 2-11             [-1, 64, 64, 64]       -

Sequential: 1-2 (decoder)     [-1, 1, 513, 513]      -
- ConvTranspose2d: 2-12       [-1, 32, 129, 129]     18,464
- LeakyReLU: 2-13             -                      -
- ConvTranspose2d: 2-14       [-1, 16, 257, 257]     4,624
- LeakyReLU: 2-15             -                      -
- ConvTranspose2d: 2-16       [-1, 1, 513, 513]      145
- Tanh: 2-17                  -                      -

Total trainable parameters                           46,609
Total parameters                                     46,609
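For illustration, a compact encoder-decoder consistent with the channel progression in the table (1 -> 16 -> 32 -> 64 down, mirrored back up) could look like the sketch below. The exact strides, paddings, and activation slopes are assumptions, so the intermediate spatial sizes may differ slightly from the table.

import torch.nn as nn

class BoneSuppressionAE(nn.Module):
    # Encoder-decoder sketch; channel widths follow the table above.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1, bias=False), nn.LeakyReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32),
            nn.LeakyReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64),
            nn.LeakyReLU(), nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 3, stride=2), nn.LeakyReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2), nn.LeakyReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2), nn.Tanh(),  # soft-tissue image
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))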


3) Classification: We have created all but two of the scripts: one for training the models and one for saving the training checkpoints. Below are the model architectures:

a) Inception-based model: The model shown here is not final, as one of the fully connected layers currently accounts for about 96% of the model's parameters. We are working on updating this architecture.

Inception Based model

Layer                         Output Shape           Param #

conv_block: 1-1               [-1, 32, 256, 256]     -
- Conv2d: 2-1                 [-1, 32, 256, 256]     1,600
- BatchNorm2d: 2-2            [-1, 32, 256, 256]     64
- LeakyReLU: 2-3              [-1, 32, 256, 256]     -
MaxPool2d: 1-2                [-1, 32, 128, 128]     -
conv_block: 1-3               [-1, 64, 128, 128]     -
- Conv2d: 2-4                 [-1, 64, 128, 128]     18,496
- BatchNorm2d: 2-5            [-1, 64, 128, 128]     128
- LeakyReLU: 2-6              [-1, 64, 128, 128]     -
MaxPool2d: 1-4                [-1, 64, 64, 64]       -
Inception_block: 1-5          [-1, 112, 64, 64]      -
- conv_block: 2-7             [-1, 26, 64, 64]       -
-- Conv2d: 3-1                [-1, 26, 64, 64]       1,690
-- BatchNorm2d: 3-2           [-1, 26, 64, 64]       52
-- LeakyReLU: 3-3             [-1, 26, 64, 64]       -
- Sequential: 2-8             [-1, 46, 64, 64]       -
-- conv_block: 3-4            [-1, 32, 64, 64]       2,144
-- conv_block: 3-5            [-1, 46, 64, 64]       13,386
- Sequential: 2-9             [-1, 20, 64, 64]       -
-- conv_block: 3-6            [-1, 16, 64, 64]       1,072
-- conv_block: 3-7            [-1, 20, 64, 64]       8,060
- Sequential: 2-10            [-1, 20, 64, 64]       -
-- MaxPool2d: 3-8             [-1, 64, 64, 64]       -
-- conv_block: 3-9            [-1, 20, 64, 64]       1,340
MaxPool2d: 1-6                [-1, 112, 32, 32]      -
Inception_block: 1-7          [-1, 192, 32, 32]      -
- conv_block: 2-11            [-1, 51, 32, 32]       -
-- Conv2d: 3-10               [-1, 51, 32, 32]       5,763
-- BatchNorm2d: 3-11          [-1, 51, 32, 32]       102
-- LeakyReLU: 3-12            [-1, 51, 32, 32]       -
- Sequential: 2-12            [-1, 77, 32, 32]       -
-- conv_block: 3-13           [-1, 51, 32, 32]       5,865
-- conv_block: 3-14           [-1, 77, 32, 32]       35,574
- Sequential: 2-13            [-1, 38, 32, 32]       -
-- conv_block: 3-15           [-1, 13, 32, 32]       1,495
-- conv_block: 3-16           [-1, 38, 32, 32]       12,464
- Sequential: 2-14            [-1, 26, 32, 32]       -
-- MaxPool2d: 3-17            [-1, 112, 32, 32]      -
-- conv_block: 3-18           [-1, 26, 32, 32]       2,990
Inception_block: 1-8          [-1, 268, 32, 32]      -
- conv_block: 2-15            [-1, 102, 32, 32]      -
-- Conv2d: 3-19               [-1, 102, 32, 32]      19,686
-- BatchNorm2d: 3-20          [-1, 102, 32, 32]      204
-- LeakyReLU: 3-21            [-1, 102, 32, 32]      -
- Sequential: 2-16            [-1, 110, 32, 32]      -
-- conv_block: 3-22           [-1, 44, 32, 32]       8,580
-- conv_block: 3-23           [-1, 110, 32, 32]      43,890
- Sequential: 2-17            [-1, 24, 32, 32]       -
-- conv_block: 3-24           [-1, 8, 32, 32]        1,560
-- conv_block: 3-25           [-1, 24, 32, 32]       4,872
- Sequential: 2-18            [-1, 32, 32, 32]       -
-- MaxPool2d: 3-26            [-1, 192, 32, 32]      -
-- conv_block: 3-27           [-1, 32, 32, 32]       6,240
Inception_block: 1-9          [-1, 370, 32, 32]      -
- conv_block: 2-19            [-1, 110, 32, 32]      -
-- Conv2d: 3-28               [-1, 110, 32, 32]      29,590
-- BatchNorm2d: 3-29          [-1, 110, 32, 32]      220
-- LeakyReLU: 3-30            [-1, 110, 32, 32]      -
- Sequential: 2-20            [-1, 180, 32, 32]      -
-- conv_block: 3-31           [-1, 90, 32, 32]       24,390
-- conv_block: 3-32           [-1, 180, 32, 32]      146,340
- Sequential: 2-21            [-1, 40, 32, 32]       -
-- conv_block: 3-33           [-1, 20, 32, 32]       5,420
-- conv_block: 3-34           [-1, 40, 32, 32]       20,120
- Sequential: 2-22            [-1, 40, 32, 32]       -
-- MaxPool2d: 3-35            [-1, 268, 32, 32]      -
-- conv_block: 3-36           [-1, 40, 32, 32]       10,840
MaxPool2d: 1-10               [-1, 370, 16, 16]      -
Inception_block: 1-11         [-1, 832, 16, 16]      -
- conv_block: 2-23            [-1, 256, 16, 16]      -
-- Conv2d: 3-37               [-1, 256, 16, 16]      94,976
-- BatchNorm2d: 3-38          [-1, 256, 16, 16]      512
-- LeakyReLU: 3-39            [-1, 256, 16, 16]      -
- Sequential: 2-24            [-1, 320, 16, 16]      -
-- conv_block: 3-40           [-1, 160, 16, 16]      59,680
-- conv_block: 3-41           [-1, 320, 16, 16]      461,760
- Sequential: 2-25            [-1, 128, 16, 16]      -
-- conv_block: 3-42           [-1, 32, 16, 16]       11,936
-- conv_block: 3-43           [-1, 128, 16, 16]      102,784
- Sequential: 2-26            [-1, 128, 16, 16]      -
-- MaxPool2d: 3-44            [-1, 370, 16, 16]      -
-- conv_block: 3-45           [-1, 128, 16, 16]      47,744
AvgPool2d: 1-12               [-1, 832, 10, 10]      -
Dropout: 1-13                 [-1, 83200]            -
Linear: 1-14                  [-1, 384]              31,949,184
Dropout: 1-15                 [-1, 384]              -
Linear: 1-16                  [-1, 128]              49,280
Dropout: 1-17                 [-1, 128]              -
Linear: 1-18                  [-1, 9]                1,161

Total trainable parameters                           33,213,254
Total parameters                                     33,213,254
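The conv_block and Inception_block units above follow the GoogLeNet pattern [19] of four parallel branches concatenated along the channel axis. The sketch below illustrates that structure with the branch widths passed in as arguments; it is an illustration consistent with the table, not our exact code.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # The `conv_block` in the tables: Conv2d -> BatchNorm2d -> LeakyReLU.
    def __init__(self, in_ch, out_ch, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, **kwargs)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU()  # negative slope is an assumption

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class InceptionBlock(nn.Module):
    # Four parallel branches (1x1, 1x1->3x3, 1x1->5x5, pool->1x1) whose
    # outputs are concatenated along the channel axis.
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = ConvBlock(in_ch, c1, kernel_size=1)
        self.b2 = nn.Sequential(
            ConvBlock(in_ch, c3_red, kernel_size=1),
            ConvBlock(c3_red, c3, kernel_size=3, padding=1))
        self.b3 = nn.Sequential(
            ConvBlock(in_ch, c5_red, kernel_size=1),
            ConvBlock(c5_red, c5, kernel_size=5, padding=2))
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            ConvBlock(in_ch, pool_proj, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Example: InceptionBlock(64, 26, 32, 46, 16, 20, 20) reproduces the
# 112-channel output and the parameter counts of Inception_block 1-5 above.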


b) ResNet-34-based model: Below is the model's architecture:

ResNet-34 Based model

Layer                          Output Shape           Param #

Conv2d: 1-1                    [-1, 32, 256, 256]     1,568
BatchNorm2d: 1-2               [-1, 32, 256, 256]     64
LeakyReLU: 1-3                 [-1, 32, 256, 256]     -
MaxPool2d: 1-4                 [-1, 32, 128, 128]     -

Each residual block below is the sequence Conv2d (1x1 reduce), BatchNorm2d, LeakyReLU, Conv2d (3x3), BatchNorm2d, LeakyReLU, Conv2d (1x1 expand), BatchNorm2d, LeakyReLU; the first block of each stage adds a projection-shortcut Sequential (1x1 Conv2d + BatchNorm2d). Parameter-free LeakyReLU rows are omitted for brevity.

Sequential: 1-5 (stage 1)      [-1, 64, 128, 128]     -
- block: 2-1 (32 -> 64, with shortcut)
-- Conv2d: 3-1                 [-1, 32, 128, 128]     1,024
-- BatchNorm2d: 3-2            [-1, 32, 128, 128]     64
-- Conv2d: 3-4                 [-1, 32, 128, 128]     9,216
-- BatchNorm2d: 3-5            [-1, 32, 128, 128]     64
-- Conv2d: 3-7                 [-1, 64, 128, 128]     2,048
-- BatchNorm2d: 3-8            [-1, 64, 128, 128]     128
-- Sequential: 3-9 (shortcut)  [-1, 64, 128, 128]     2,176
- block: 2-2 (64 -> 64)
-- Conv2d: 3-11                [-1, 32, 128, 128]     2,048
-- BatchNorm2d: 3-12           [-1, 32, 128, 128]     64
-- Conv2d: 3-14                [-1, 32, 128, 128]     9,216
-- BatchNorm2d: 3-15           [-1, 32, 128, 128]     64
-- Conv2d: 3-17                [-1, 64, 128, 128]     2,048
-- BatchNorm2d: 3-18           [-1, 64, 128, 128]     128

Sequential: 1-6 (stage 2)      [-1, 128, 64, 64]      -
- block: 2-3 (64 -> 128, stride 2, with shortcut)
-- Conv2d: 3-20                [-1, 64, 128, 128]     4,096
-- BatchNorm2d: 3-21           [-1, 64, 128, 128]     128
-- Conv2d: 3-23                [-1, 64, 64, 64]       36,864
-- BatchNorm2d: 3-24           [-1, 64, 64, 64]       128
-- Conv2d: 3-26                [-1, 128, 64, 64]      8,192
-- BatchNorm2d: 3-27           [-1, 128, 64, 64]      256
-- Sequential: 3-28 (shortcut) [-1, 128, 64, 64]      8,448
- block: 2-4 (128 -> 128)
-- Conv2d: 3-30                [-1, 64, 64, 64]       8,192
-- BatchNorm2d: 3-31           [-1, 64, 64, 64]       128
-- Conv2d: 3-33                [-1, 64, 64, 64]       36,864
-- BatchNorm2d: 3-34           [-1, 64, 64, 64]       128
-- Conv2d: 3-36                [-1, 128, 64, 64]      8,192
-- BatchNorm2d: 3-37           [-1, 128, 64, 64]      256
- block: 2-5 (128 -> 128): same layers and parameter counts as block 2-4 (layers 3-39 to 3-47)

Sequential: 1-7 (stage 3)      [-1, 256, 32, 32]      -
- block: 2-6 (128 -> 256, stride 2, with shortcut)
-- Conv2d: 3-48                [-1, 128, 64, 64]      16,384
-- BatchNorm2d: 3-49           [-1, 128, 64, 64]      256
-- Conv2d: 3-51                [-1, 128, 32, 32]      147,456
-- BatchNorm2d: 3-52           [-1, 128, 32, 32]      256
-- Conv2d: 3-54                [-1, 256, 32, 32]      32,768
-- BatchNorm2d: 3-55           [-1, 256, 32, 32]      512
-- Sequential: 3-56 (shortcut) [-1, 256, 32, 32]      33,280
- block: 2-7 (256 -> 256)
-- Conv2d: 3-58                [-1, 128, 32, 32]      32,768
-- BatchNorm2d: 3-59           [-1, 128, 32, 32]      256
-- Conv2d: 3-61                [-1, 128, 32, 32]      147,456
-- BatchNorm2d: 3-62           [-1, 128, 32, 32]      256
-- Conv2d: 3-64                [-1, 256, 32, 32]      32,768
-- BatchNorm2d: 3-65           [-1, 256, 32, 32]      512
- blocks: 2-8 and 2-9 (256 -> 256): same layers and parameter counts as block 2-7 (layers 3-67 to 3-84)

Sequential: 1-8 (stage 4)      [-1, 512, 16, 16]      -
- block: 2-10 (256 -> 512, stride 2, with shortcut)
-- Conv2d: 3-85                [-1, 256, 32, 32]      65,536
-- BatchNorm2d: 3-86           [-1, 256, 32, 32]      512
-- Conv2d: 3-88                [-1, 256, 16, 16]      589,824
-- BatchNorm2d: 3-89           [-1, 256, 16, 16]      512
-- Conv2d: 3-91                [-1, 512, 16, 16]      131,072
-- BatchNorm2d: 3-92           [-1, 512, 16, 16]      1,024
-- Sequential: 3-93 (shortcut) [-1, 512, 16, 16]      132,096
- block: 2-11 (512 -> 512)
-- Conv2d: 3-95                [-1, 256, 16, 16]      131,072
-- BatchNorm2d: 3-96           [-1, 256, 16, 16]      512
-- Conv2d: 3-98                [-1, 256, 16, 16]      589,824
-- BatchNorm2d: 3-99           [-1, 256, 16, 16]      512
-- Conv2d: 3-101               [-1, 512, 16, 16]      131,072
-- BatchNorm2d: 3-102          [-1, 512, 16, 16]      1,024

AdaptiveAvgPool2d: 1-9         [-1, 512, 1, 1]        -
Linear: 1-10                   [-1, 128]              65,664
Linear: 1-11                   [-1, 64]               8,256
Linear: 1-12                   [-1, 9]                585

Total trainable parameters                            2,917,609
Total parameters                                      2,917,609
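The block units above follow a bottleneck residual pattern: 1x1 reduce, 3x3, 1x1 expand, plus a projection shortcut when the shape changes. As a consistency check, the illustrative sketch below instantiated as ResidualBlock(64, 64, 128, stride=2) reproduces the parameter counts of block 2-3; it is a sketch, not our exact code.

import torch.nn as nn

class ResidualBlock(nn.Module):
    # Bottleneck residual block: 1x1 -> 3x3 -> 1x1, each with BatchNorm,
    # added to an identity or 1x1-projection shortcut.
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.LeakyReLU(),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.LeakyReLU(),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:
            # Projection shortcut (the shortcut `Sequential` rows in the table)
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.shortcut(x))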

c) SqueezeNet-based model: Below is the model's architecture:

SqueezeNet Based model

Layer                          Output Shape           Param #

Conv2d: 1-1                    [-1, 32, 512, 512]     288
BatchNorm2d: 1-2               [-1, 32, 512, 512]     64
LeakyReLU: 1-3                 [-1, 32, 512, 512]     -
MaxPool2d: 1-4                 [-1, 32, 256, 256]     -

Fire: 1-5                      [-1, 64, 256, 256]     -
- Conv2d: 2-1 (squeeze)        [-1, 16, 256, 256]     512
- BatchNorm2d: 2-2             [-1, 16, 256, 256]     32
- LeakyReLU: 2-3               [-1, 16, 256, 256]     -
- Conv2d: 2-4 (expand 1x1)     [-1, 32, 256, 256]     512
- BatchNorm2d: 2-5             [-1, 32, 256, 256]     64
- Conv2d: 2-6 (expand 3x3)     [-1, 32, 256, 256]     4,608
- BatchNorm2d: 2-7             [-1, 32, 256, 256]     64
- LeakyReLU: 2-8               [-1, 64, 256, 256]     -

Fire: 1-6                      [-1, 128, 256, 256]    -
- Conv2d: 2-9 (squeeze)        [-1, 16, 256, 256]     1,024
- BatchNorm2d: 2-10            [-1, 16, 256, 256]     32
- LeakyReLU: 2-11              [-1, 16, 256, 256]     -
- Conv2d: 2-12 (expand 1x1)    [-1, 64, 256, 256]     1,024
- BatchNorm2d: 2-13            [-1, 64, 256, 256]     128
- Conv2d: 2-14 (expand 3x3)    [-1, 64, 256, 256]     9,216
- BatchNorm2d: 2-15            [-1, 64, 256, 256]     128
- LeakyReLU: 2-16              [-1, 128, 256, 256]    -

Fire: 1-7                      [-1, 256, 256, 256]    -
- Conv2d: 2-17 (squeeze)       [-1, 32, 256, 256]     4,096
- BatchNorm2d: 2-18            [-1, 32, 256, 256]     64
- LeakyReLU: 2-19              [-1, 32, 256, 256]     -
- Conv2d: 2-20 (expand 1x1)    [-1, 128, 256, 256]    4,096
- BatchNorm2d: 2-21            [-1, 128, 256, 256]    256
- Conv2d: 2-22 (expand 3x3)    [-1, 128, 256, 256]    36,864
- BatchNorm2d: 2-23            [-1, 128, 256, 256]    256
- LeakyReLU: 2-24              [-1, 256, 256, 256]    -

MaxPool2d: 1-8                 [-1, 256, 128, 128]    -

Fire: 1-9                      [-1, 256, 128, 128]    -
- Conv2d: 2-25 (squeeze)       [-1, 32, 128, 128]     8,192
- BatchNorm2d: 2-26            [-1, 32, 128, 128]     64
- LeakyReLU: 2-27              [-1, 32, 128, 128]     -
- Conv2d: 2-28 (expand 1x1)    [-1, 128, 128, 128]    4,096
- BatchNorm2d: 2-29            [-1, 128, 128, 128]    256
- Conv2d: 2-30 (expand 3x3)    [-1, 128, 128, 128]    36,864
- BatchNorm2d: 2-31            [-1, 128, 128, 128]    256
- LeakyReLU: 2-32              [-1, 256, 128, 128]    -

Fire: 1-10                     [-1, 384, 128, 128]    -
- Conv2d: 2-33 (squeeze)       [-1, 48, 128, 128]     12,288
- BatchNorm2d: 2-34            [-1, 48, 128, 128]     96
- LeakyReLU: 2-35              [-1, 48, 128, 128]     -
- Conv2d: 2-36 (expand 1x1)    [-1, 192, 128, 128]    9,216
- BatchNorm2d: 2-37            [-1, 192, 128, 128]    384
- Conv2d: 2-38 (expand 3x3)    [-1, 192, 128, 128]    82,944
- BatchNorm2d: 2-39            [-1, 192, 128, 128]    384
- LeakyReLU: 2-40              [-1, 384, 128, 128]    -

Fire: 1-11                     [-1, 384, 128, 128]    -
- Conv2d: 2-41 (squeeze)       [-1, 48, 128, 128]     18,432
- BatchNorm2d: 2-42            [-1, 48, 128, 128]     96
- LeakyReLU: 2-43              [-1, 48, 128, 128]     -
- Conv2d: 2-44 (expand 1x1)    [-1, 192, 128, 128]    9,216
- BatchNorm2d: 2-45            [-1, 192, 128, 128]    384
- Conv2d: 2-46 (expand 3x3)    [-1, 192, 128, 128]    82,944
- BatchNorm2d: 2-47            [-1, 192, 128, 128]    384
- LeakyReLU: 2-48              [-1, 384, 128, 128]    -

Fire: 1-12                     [-1, 512, 128, 128]    -
- Conv2d: 2-49 (squeeze)       [-1, 64, 128, 128]     24,576
- BatchNorm2d: 2-50            [-1, 64, 128, 128]     128
- LeakyReLU: 2-51              [-1, 64, 128, 128]     -
- Conv2d: 2-52 (expand 1x1)    [-1, 256, 128, 128]    16,384
- BatchNorm2d: 2-53            [-1, 256, 128, 128]    512
- Conv2d: 2-54 (expand 3x3)    [-1, 256, 128, 128]    147,456
- BatchNorm2d: 2-55            [-1, 256, 128, 128]    512
- LeakyReLU: 2-56              [-1, 512, 128, 128]    -

MaxPool2d: 1-13                [-1, 512, 64, 64]      -

Fire: 1-14                     [-1, 512, 64, 64]      -
- Conv2d: 2-57 (squeeze)       [-1, 64, 64, 64]       32,768
- BatchNorm2d: 2-58            [-1, 64, 64, 64]       128
- LeakyReLU: 2-59              [-1, 64, 64, 64]       -
- Conv2d: 2-60 (expand 1x1)    [-1, 256, 64, 64]      16,384
- BatchNorm2d: 2-61            [-1, 256, 64, 64]      512
- Conv2d: 2-62 (expand 3x3)    [-1, 256, 64, 64]      147,456
- BatchNorm2d: 2-63            [-1, 256, 64, 64]      512
- LeakyReLU: 2-64              [-1, 512, 64, 64]      -

Conv2d: 1-15                   [-1, 10, 64, 64]       5,120
AvgPool2d: 1-16                [-1, 10, 16, 16]       -
Dropout: 1-17                  [-1, 2560]             -
Linear: 1-18                   [-1, 384]              983,424
Dropout: 1-19                  [-1, 384]              -
Linear: 1-20                   [-1, 128]              49,280
Dropout: 1-21                  [-1, 128]              -
Linear: 1-22                   [-1, 9]                1,161

Total trainable parameters                            1,756,137
Total parameters                                      1,756,137
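The Fire modules above follow the SqueezeNet design: a 1x1 squeeze convolution feeding parallel 1x1 and 3x3 expand convolutions whose outputs are concatenated. A sketch consistent with the table's layer ordering is shown below; the channel arguments are examples, and activation slopes are assumptions.

import torch
import torch.nn as nn

class Fire(nn.Module):
    # 1x1 squeeze -> parallel (1x1, 3x3) expands -> channel concatenation.
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_ch, squeeze_ch, 1, bias=False),
            nn.BatchNorm2d(squeeze_ch),
            nn.LeakyReLU(),
        )
        self.expand1x1 = nn.Sequential(
            nn.Conv2d(squeeze_ch, expand_ch, 1, bias=False),
            nn.BatchNorm2d(expand_ch),
        )
        self.expand3x3 = nn.Sequential(
            nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(expand_ch),
        )
        self.act = nn.LeakyReLU()  # applied once after concatenation

    def forward(self, x):
        s = self.squeeze(x)
        return self.act(torch.cat([self.expand1x1(s), self.expand3x3(s)], dim=1))

# Example: Fire(32, 16, 32) reproduces the parameter counts and the
# 64-channel output of Fire 1-5 above.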


Work left to do:

1) Implement the Grad-CAM algorithm (a sketch of the hook-based approach appears after this list).
2) Train the individual modules.
3) Get the scores for their individual performance.
4) Orchestrate all the modules together to build the pipeline.
5) Fine-tune the final pipeline.
6) Check the performance of the three presented deep CNNs for pathology detection.
7) Check the performance of the above-mentioned models for COVID detection.
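For item 1, below is an outline of the hook-based Grad-CAM computation [21] we intend to implement. It assumes a model that returns a single logits tensor; all names are placeholders, and this is a sketch rather than our final implementation.

import torch

def grad_cam(model, x, target_layer, class_idx):
    # `target_layer` is a convolutional nn.Module inside `model`.
    feats, grads = {}, {}
    fh = target_layer.register_forward_hook(
        lambda module, inputs, output: feats.update(a=output))
    bh = target_layer.register_full_backward_hook(
        lambda module, grad_in, grad_out: grads.update(g=grad_out[0]))

    logits = model(x)                # forward pass records the activations
    logits[0, class_idx].backward()  # backward pass records the gradients
    fh.remove()
    bh.remove()

    # Channel weights: global-average-pool the gradients over H and W
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = torch.relu((weights * feats["a"]).sum(dim=1))  # weighted sum + ReLU
    return cam / cam.max().clamp(min=1e-8)               # normalized heatmap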

Limitations:

1) Our model's performance would mainly be limited by the lack of quality data for the bone suppression module.
2) Another limitation is the computational power required to train the model efficiently.
3) We are currently fixing the input size of the modules at 512x512; with more computational capacity, we would be able to increase the input image size and hence the amount of information the model has to work with.
Gantt chart
References
[1] “RSNA Pneumonia Detection Challenge,” Kaggle.com. [Online]. Available: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge. [Accessed: 04-Sep-2021].

[2] Cdc.gov. [Online]. Available: https://www.cdc.gov/nchs/data/nhamcs/web_tables/2015_ed_web_tables.pdf#%5B%7B%22num%22%3A80%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-89%2C777%2C0.930573%5D. [Accessed: 04-Sep-2021].

[3] Cdc.gov. [Online]. Available: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_06_tables.pdf#%5B%7B%22num%22%3A109%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22FitH%22%7D%2C554%5D. [Accessed: 04-Sep-2021].

[4] T. Franquet, “Imaging of community-acquired pneumonia,” J. Thorac. Imaging, vol. 33, no. 5, pp. 282–294, 2018.

[5] B. Kelly, “The chest radiograph,” Ulster Med. J., vol. 81, no. 3, pp. 143–148, 2012.

[6] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “ChestX-Ray8: Hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

[7] P. Rajpurkar et al., “CheXNet: Radiologist-level pneumonia detection on chest X-rays with deep learning,” arXiv [cs.CV], 2017.

[8] A. Haghanifar, M. M. Majdabadi, Y. Choi, S. Deivalakshmi, and S. Ko, “COVID-CXNet: Detecting COVID-19 in frontal chest X-ray images using deep learning,” arXiv [eess.IV], 2020.

[9] J. P. Cohen, P. Morrison, L. Dao, K. Roth, T. Q. Duong, and M. Ghassemi, “COVID-19 Image Data Collection: Prospective predictions are the future,” arXiv [q-bio.QM], 2020.

[10] S. Jaeger, S. Candemir, S. Antani, Y.-X. J. Wáng, P.-X. Lu, and G. Thoma, “Two public chest X-ray datasets for computer-aided screening of pulmonary diseases,” Quant. Imaging Med. Surg., vol. 4, no. 6, pp. 475–477, 2014.

[11] S. Rajaraman, L. Folio, J. Dimperio, P. Alderson, and S. Antani, “Improved semantic segmentation of tuberculosis-consistent findings in chest X-rays using augmented training of modality-specific U-Net models with weak localizations,” arXiv [cs.CV], 2021.

[12] S. Rajaraman, G. Zamzmi, L. Folio, P. Alderson, and S. Antani, “Chest X-ray bone suppression for improving classification of tuberculosis-consistent findings,” Diagnostics (Basel), vol. 11, no. 5, p. 840, 2021.

[13] R. Tanaka, S. Sanada, M. Oda, M. Suzuki, K. Sakuta, and H. Kawashima, “Improved accuracy of image guided radiation therapy (IMRT) based on bone suppression technique,” in 2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC), 2013, pp. 1–2.

[14] Y. Gordienko et al., “Deep learning with lung segmentation and bone shadow exclusion techniques for chest X-ray analysis of lung cancer,” arXiv [cs.LG], 2017.

[15] M. Gusarev, R. Kuleev, A. Khan, A. Ramirez Rivera, and A. M. Khattak, “Deep learning models for bone suppression in chest radiographs,” in 2017 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2017, pp. 1–7.

[16] J. Ma, X. Fan, S. X. Yang, X. Zhang, and X. Zhu, “Contrast limited adaptive histogram equalization based fusion for underwater image enhancement,” Preprints, 2017.

[17] A. M. Reza, “Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement,” J. VLSI Sign. Process. Syst. Sign. Image Video Technol., vol. 38, no. 1, pp. 35–44, 2004.

[18] I. Culjak, D. Abram, T. Pribanic, H. Dzapo, and M. Cifrek, “A brief introduction to OpenCV,” in 2012 Proceedings of the 35th International Convention MIPRO, 2012, pp. 1725–1730.

[19] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[20] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[21] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” arXiv [cs.CV], 2016.

[22] D. Y. Oh and I. D. Yun, “Learning bone suppression from dual energy chest X-rays using adversarial networks,” arXiv [cs.CV], 2018.

[23] K. K. Bressem, L. C. Adams, C. Erxleben, B. Hamm, S. M. Niehues, and J. L. Vahldiek, “Comparing different deep learning architectures for classification of chest radiographs,” Sci. Rep., vol. 10, no. 1, p. 13590, 2020.

[24] “India: air pollution deaths by type 2019,” Statista.com. [Online]. Available: https://www.statista.com/statistics/1194824/india-air-pollution-deaths-by-type/. [Accessed: 04-Sep-2021].

[25] Nithyananda C R, Ramachandra A C, and Preethi, “Survey on Histogram Equalization method based Image Enhancement techniques,” in 2016 International Conference on Data Mining and Advanced Computing (SAPIENCE), 2016, pp. 150–158.

[26] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv [cs.CV], 2015.
