Professional Documents
Culture Documents
CapstoneProjectreport Milestone1
CapstoneProjectreport Milestone1
Contents
1. Abstract ........................................................................................................................................... 2
2. INTRODUCTION ............................................................................................................................... 2
2.1 About Pneumonia ................................................................................................................... 2
2.2 Pneumonia Diagnosis and detection ...................................................................................... 2
2.3 Chest Radiographs Basics ........................................................................................................ 3
3. Summary of Problem Statement, Data and Findings...................................................................... 3
3.1 Problem Statement ................................................................................................................. 3
3.2 Given Data............................................................................................................................... 4
3.3 Findings ................................................................................................................................... 5
4. Overview of the final process-data pre-processing steps and the algorithms used ...................... 6
4.1 Visualizations – Exploratory Data Analysis ............................................................................. 6
4.1.1 Distribution of lung opacity in patients .......................................................................... 6
4.1.2 Age-wise value counts for all records ............................................................................. 7
4.1.3 Bar chart view of the ratio of Sex and the different lung opacities .............................. 10
4.1.4 Lung opacity scatter plot............................................................................................... 11
4.1.5 DICOM image Metadata ............................................................................................... 13
4.1.6 Visualization of Sample X-Ray images for each category ............................................. 15
4.1.7 Visualization of sample X-Ray images with bounding boxes depicting lung opacity .... 16
5. Deciding Models and Model Building ........................................................................................... 17
5.1 Suitable algorithms for the given problem ........................................................................... 17
5.2 Different algorithm and their performances ........................................................................ 19
6. Improve model performance ........................................................................................................ 19
6.1 Approaches to improve model performance ........................................................................... 19
2
Pneumonia Detection challenge – Group A, 2022
1. Abstract
This is the report of the capstone project, which is executed as part of the academic
requirement, and to complete the PGP programme on AIML from Great learning, Great
Lakes university.
The objective of the project is to build a CNN model to detect the presence of pneumonia
given a set of DICOM images.
2. INTRODUCTION
2.1 About Pneumonia
Pneumonia is an infection of one or both lungs caused by bacteria, viruses, or fungi.
It is a serious infection in which the air sacs fill with pus and other liquid
(hopkinsmedicine.org). The infection causes inflammation in the air sacs in your
lungs, which are called alveoli.
Going by the statistics, Pneumonia accounts for over 15% of all deaths of children
under 5 years old globally. In 2017 alone, 920,000 children under age of 5 died of
pneumonia.
Fig 2.2.1
In the process of taking the image, an X-ray passes through the body and reaches
a detector on the other side. Tissues with sparse material, such as lungs which
are full of air, do not absorb the X-rays and appear black in the image. Dense
tissues such as bones absorb the X- rays and appear white in the image.
In short –
o Black = Air
o White = Bone
o Grey = Tissue or fluid
The left side of the subject is on the right side of the screen by convention. It can
also be observed that there is a small L at the top of the right corner. In a normal
image we see the lungs as black, but they have different projections on them -
mainly the rib cage bones, main airways, blood vessels and the heart.
The testing images doesn’t contain any class and bounding box information. So,
the training images should split into training data and validation data in the ratio
of 70% and 30%
3.3 Findings
Patient distribution shows most pneumonia patients or patients are found between
age 40-60. More records for male are observed in the data.
4.1.3 Bar chart view of the ratio of Sex and the different lung opacities
We can clearly see that ‘MALE’ have higher number of observations as compared to
‘FEMALE’ in all the classes
11
Pneumonia Detection challenge – Group A, 2022
“No Lung Opacity / Not Normal” and “Normal” class falls under Target 0 which represent
no Pneumonia and “Lung Opacity” class represent Pneumonia patient
From the below scatter plots, it can be observed that lung opacity is seen maximum
in age group 50-65 followed by group between 20-50
Also we observe that lower age group ( less than 20) patients have less Pneumonia
compared to older ages. Above 65 age group Pneumonia patients are less as the
number of patients decreases.
12
Pneumonia Detection challenge – Group A, 2022
13
Pneumonia Detection challenge – Group A, 2022
• Pixel Data (7fe0 0010)(last entry) — This is where the raw pixel data is stored. The
order of pixels encoded for each image plane is left to right, top to bottom, i.e., the
upper left pixel (labeled 1,1) is encoded first
• Photometric Interpretation (0028, 0004) — aka color space. In this case it is
MONOCHROME2 where pixel data is represented as a single monochrome image
plane where the minimum sample value is intended to be displayed as black.
• Bits Stored (0028 0101) — Number of bits stored for each pixel sample. Here it is 8
• Pixel Representation (0028 0103) — can either be unsigned(0) or signed(1). Here it is
0 which means unsigned.
• Lossy Image Compression (0028 2110) — 00 image has not been subjected to lossy
compression. 01 image has been subjected to lossy compression. Here it is 01.
14
Pneumonia Detection challenge – Group A, 2022
• The Rows and the Columns attribute yield us the image size. Here Rows:1024 and
cols 1024, Image size is 1024 x 1024
• Lossy Image Compression Method (0028 2114) — states the type of lossy
compression used (in this case JPEG Lossy Compression, as denotated
by CS:ISO_10918_1)
4.1.7 Visualization of sample X-Ray images with bounding boxes depicting lung
opacity
17
Pneumonia Detection challenge – Group A, 2022
The model involves building the input layer, feature extraction layer, using
activation functions, applying appropriate weights and classifiers
Fig 5.1.1 shows the architecture of the basic model that we have used
18
Pneumonia Detection challenge – Group A, 2022