Rabjot Capstone Report
Submitted by:
(101611041) RABJOT SINGH
(101611040) PRATEEK GARG
(101603107) GYANDEEP DIGRA
(101783032) PARSHANT JINDAL
ABSTRACT
This project aims to develop a fully automated system for diagnosing lung diseases from chest X-rays. The
platform will enable individual users and professional diagnostic centres to upload chest radiographs (X-rays)
and receive accurate predictions based on them. Chest radiography has important clinical value in the diagnosis
of disease, so the automatic detection of chest disease from chest radiographs has become an active topic
in medical imaging research.
This project has two parts:
1. Backend Server - The server will cater to requests from medical diagnosis labs and from individual users
who do not have access to professional consultation for medical diagnosis. The server should be capable of
generating a report whenever a chest radiograph is uploaded and of providing accurate results.
2. Mobile App - This mobile application will serve as a client to the backend server. It will be the primary
point of interaction with users, who can install the application on their devices and use it to view
reports.
Together, these components will act as a report generation tool for chest radiographs that aims to deliver
accurate results.
Apart from the above components, a machine learning algorithm will learn from newly uploaded X-rays,
thus continuously improving its accuracy.
DECLARATION
We hereby declare that the design principles and working prototype model of the project entitled ‘LUNG
DISEASE DETECTION USING X RAYS’ is an authentic record of our own work carried out in the
Computer Science and Engineering Department, TIET, Patiala, under the guidance of Dr Ashutosh Aggarwal
during 6th semester (2019).
ACKNOWLEDGEMENT
We would like to express our thanks to our mentor Dr. Ashutosh Aggarwal. He has been of great help in our
venture, and an indispensable resource of technical knowledge. He is truly an amazing mentor to have.
We are also thankful to Dr Maninder Singh, Head, Computer Science and Engineering Department, entire
faculty and staff of Computer Science and Engineering Department, and also our friends who devoted their
valuable time and helped us in all possible ways towards successful completion of this project. We thank all
those who have contributed either directly or indirectly towards this project.
Lastly, we would also like to thank our families for their unyielding love and encouragement. They always
wanted the best for us and we admire their determination and sacrifice.
TABLE OF CONTENTS
ABSTRACT
DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1- INTRODUCTION
1.1.3 GOAL
1.1.4 SOLUTION
2.2 STANDARDS
2.3.1 Introduction
2.3.1.1 Purpose
5.2.1 DATA
5.3.1.2 Test Strategy
APPENDIX A: REFERENCES
LIST OF TABLES
LIST OF FIGURES
Figure 10 Screenshot
Figure 11 Screenshot
Figure 12 Screenshot
LIST OF ABBREVIATIONS
ML Machine Learning
CNN Convolutional Neural Network
CHAPTER 1 - INTRODUCTION
A large number of diseases that affect the worldwide population are lung-related. Therefore, research in the
field of Pulmonology has great importance in public health studies and focuses mainly on Infiltration,
Atelectasis, Cardiomegaly, Effusion, Mass, Nodule, Pneumonia, Pneumothorax.
The World Health Organisation (WHO) estimates that 300 million people suffer from asthma,
and that this disease causes around 250 thousand deaths per year worldwide (Campos and Lemos, 2009). In
addition, WHO estimates that 210 million people have chronic obstructive pulmonary disease (COPD). The disease
caused the death of over 300 thousand people in 2005 (GOLD COPD, 2008). Recent studies reveal that COPD is
present in the 20 to 45 year-old age bracket, although it is characterised as a disease of people over 50.
Accordingly, WHO estimated that the number of deaths due to COPD would increase 30% by
2015, and that by 2030 COPD would be the third leading cause of death worldwide (World…, 2014).
For the public health system, the early and correct diagnosis of any pulmonary disease is essential for
timely treatment and for preventing further deaths. From a clinical standpoint, diagnosis aid tools and systems
are of great importance for the specialist and hence for people's health.
X-ray images of the lungs represent a slice of the ribcage, where a large number of structures are located, such
as blood vessels, arteries, respiratory vessels, pulmonary pleura and parenchyma, each with its own specific
information. Thus, for pulmonary disease analysis and diagnosis, it is necessary to segment lung structures.
It is worth noting that segmentation is an essential step in imaging systems for accurate lung disease
diagnosis, since it delimits lung structures in X-ray images. Indeed, image processing techniques can help
computer diagnosis if the lung region is accurately obtained.
Following the segmentation process, an automatic procedure is applied to detect possible diseases in lung
X-ray images in order to guide the radiologist's diagnosis. Some studies have yielded promising disease
detection results, as reported by Trindade (2009), who uses texture descriptors extracted from the gray-level
co-occurrence matrix (GLCM) (Haralick et al., 1973) to describe three disease patterns (nodule, emphysema
and ground glass) and a normal one. Shimo et al. (2010) also employ GLCM texture descriptors to determine
whether the lungs are healthy or not. Furthermore, some papers address the detection of specific diseases,
such as nodules (Ayres et al., 2010; Silva and Oliveira, 2010), and emphysema (Felix et al., 2007, 2011).
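The GLCM texture descriptors mentioned above can be illustrated with a minimal sketch in pure Python. It assumes an image already quantised to a small number of gray levels and a single horizontal offset. The function names (`glcm`, `contrast`, `energy`) and the toy patch are hypothetical and are not taken from the cited works.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                # Count the pair (gray level here, gray level at the offset).
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image is too small for the chosen offset")
    return [[value / total for value in row] for row in counts]


def contrast(p):
    """Haralick contrast: large when neighbouring gray levels differ strongly."""
    n = len(p)
    return sum(((i - j) ** 2) * p[i][j] for i in range(n) for j in range(n))


def energy(p):
    """Angular second moment: large when few gray-level pairs dominate."""
    return sum(value * value for row in p for value in row)


# Toy 4x4 patch with four gray levels (illustrative data only).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(patch, levels=4)
print(round(contrast(p), 4), round(energy(p), 4))
```

Descriptors such as these would then form the feature vector passed to a classifier. Which descriptors and offsets the cited studies actually used is not stated here.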
1.1 PROJECT OVERVIEW
1. Backend Server - The server will cater to requests from medical diagnosis labs and from individual
users who do not have access to professional consultation for medical diagnosis. The server should be
capable of generating a report whenever a chest radiograph is uploaded and of providing accurate results.
2. Mobile App - This mobile application will serve as a client to the backend server. It will be the
primary point of interaction with users, who can install the application on their devices and
use it to view reports.
Together, these components will act as a report generation tool for chest radiographs that
aims to deliver accurate results.
Apart from the above components, a machine learning algorithm will learn from
newly uploaded X-rays, thus continuously improving its accuracy.
A lack of trained radiologists makes it very difficult to provide accurate interpretations of, and
predictions from, chest X-rays.
1.1.3 GOAL
Accurate detection of disease in the lungs.
1.1.4 SOLUTION
Predict the disease early so that proper treatment can take place. To that end, we design a fully automated
system for diagnosis of lung diseases using chest X-rays. The platform will enable its users and
professionals to get accurate predictions based on chest X-rays.
1.2 NEED ANALYSIS
The need for this project can be explained under the following points:
1. Advantages of chest X-Rays include their low cost and easy operation. Even in underdeveloped areas,
machines are very affordable. Chest radiographs are widely used in the detection and diagnosis of
lung diseases and contain a large amount of information about a patient’s health. However, correctly
interpreting the information is always a major challenge.
2. Overlapping tissue structures and a lack of well-trained radiologists make it very difficult to
provide accurate interpretations of chest X-rays.
3. Advancing the NIH Director’s global health initiative by making a significant impact in the
development and application of low cost disease detection technologies in resource-challenged
regions.
4. Developing screening technology for lung diseases, a major global health challenge identified by the
WHO as the second leading cause of mortality from infectious disease. HIV and TB co-infections
result in treatment complications and spread of the disease.
5. Advancing the science in image analysis for automatically detecting pulmonary diseases from digital
CXR images.
6. Instead of visiting a medical professional to have the report interpreted, users can get
results even in areas where no professional help is available and, based on those results,
seek the required medical help.
Until now, research on this topic has mostly been limited to one specific disease. Here we detect
more than one disease in a single pass with one model, which yields multiple detections
with improved accuracy owing to the availability of a large dataset.
A major problem is that even expert doctors may be unable to identify a patient's condition from the X-ray
alone, because the relevant structures are not clearly visible, so an emerging problem may be overlooked.
The current scope of the project is therefore to improve that visibility by applying image processing
techniques, compare the result with the available dataset, and generate results that can then be verified by
experts. This also saves time and reduces the chance of human error.
S. No. | Assumption
1 | Availability: Clear chest radiographs should be available in digital form (e.g. JPEG).
3 | Correct Labelling: The chest X-rays are correctly labelled.
This project exploits the convergence of imaging research and system development at the NLM and NIH
policy objectives in global health. The following are project objectives :
1. Advance the state-of-the-art in automated CXR image analysis. Automatically detect presence of
pulmonary diseases including TB and other relevant disease in digital CXRs, leading to suitable
discrimination for screening, as well as compute a measure of confidence in its determination.
2. Develop deployable screening software such that it can aid field clinical officers in decision making at
the point of care, and for radiologists to organise their workload.
3. Recognising the severity of lung diseases and the shortage of radiological services in western Kenya,
deploy developed software on to a self-powered mobile X-ray truck that AMPATH uses in rural areas.
Their staff take chest x-rays of the population and employ the NLM-developed software to screen for the
presence of lung diseases and other diseases.
1.7 METHODOLOGY USED
The system architecture is designed as a set of cascaded modules, with the flexibility to implement alternate
image analysis pathways followed by late stage decision fusion. As currently developed, every image is
analysed for automatic lung region localisation. Image features are extracted from within the localised lung
boundary, leading to 2-class normal/abnormal decision for the input CXR image. We are also studying
alternate techniques for detecting abnormalities in the CXR without localising the exact lung boundary. The
method also uses edge detection to find spurious contours that could be indicative of disease. Initial results
suggest that the method is fast and quite powerful in detecting certain kinds of pathologies.
Figure 1. An overview of our lung segmentation algorithm: (Stage-I) finding the similar lung CXRs from an atlas; (Stage-II) warping selected
images to patient CXR; and, (Stage-III) lung boundary detection using a graph-cuts optimization approach.
http://ceb.nlm.nih.gov/repos/chestImages.php
Stage 1:
First we use a content-based image retrieval (CBIR) method to identify a small set of similar appearing
CXR images from an expert-annotated set, that we shall call the “atlas set” hereafter. Horizontal and
vertical projection profiles are computed for all CXR images in the atlas set. Then, we measure the
similarity of each projection profile between the atlas set and the patient chest X-ray using the average
Bhattacharyya coefficient.
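The retrieval step in Stage 1 can be sketched as follows. This is a minimal illustration, assuming images are equally sized 2-D lists of non-negative intensities and that the "average" is taken over the horizontal and vertical profile coefficients. The helper names (`projection_profiles`, `profile_similarity`, `rank_atlas`) are hypothetical, not the authors' code.

```python
import math


def projection_profiles(image):
    """Return (horizontal, vertical) intensity profiles of a 2-D image."""
    horizontal = [sum(row) for row in image]          # one value per row
    vertical = [sum(col) for col in zip(*image)]      # one value per column
    return horizontal, vertical


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two profiles after normalising each to sum 1."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        raise ValueError("profiles must contain some non-zero intensity")
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


def profile_similarity(image_a, image_b):
    """Average of the coefficients for the horizontal and vertical profiles."""
    ha, va = projection_profiles(image_a)
    hb, vb = projection_profiles(image_b)
    return (bhattacharyya(ha, hb) + bhattacharyya(va, vb)) / 2


def rank_atlas(patient, atlas, k):
    """Indices of the k atlas images most similar to the patient image."""
    order = sorted(range(len(atlas)),
                   key=lambda i: profile_similarity(patient, atlas[i]),
                   reverse=True)
    return order[:k]


# Toy data: the first atlas image equals the patient image, the second is flipped.
patient = [[1, 2], [3, 4]]
atlas = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
print(rank_atlas(patient, atlas, k=2))
```

A coefficient of 1 means the two normalised profiles coincide; lower values indicate less similar intensity layouts.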
Stage 2:
In order to create the lung model, we register the selected set of CXR images that have similar appearance
but may have different lung outlines. The transformation mapping is done using the SIFT-flow algorithm.
The algorithm first models the local gradient information of the observed image using scale invariant
feature transforms (SIFT). Next, a minimization algorithm calculates the SIFT-flow, the transformation
mapping between each selected atlas image and the patient image. The mapping is used to register and
warp the selected atlas CXRs, making them geometrically aligned to the patient image. The lung model
for the patient X-ray is then composed as the mean of the warped lung masks from the registered atlas
images. The model is a probabilistic shape prior in which each pixel value is the probability of the pixel
being part of the lung field in the patient image.
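The final step of Stage 2, averaging the warped lung masks into a probabilistic shape prior, can be sketched as below. The sketch assumes the SIFT-flow registration has already been applied and that each warped mask is a binary 2-D list of identical size. The function name `lung_shape_prior` and the toy masks are hypothetical.

```python
def lung_shape_prior(warped_masks):
    """Pixel-wise mean of binary lung masks: each value is a lung probability."""
    if not warped_masks:
        raise ValueError("at least one warped mask is required")
    count = len(warped_masks)
    rows, cols = len(warped_masks[0]), len(warped_masks[0][0])
    return [
        [sum(mask[r][c] for mask in warped_masks) / count for c in range(cols)]
        for r in range(rows)
    ]


# Four toy 2x2 masks standing in for registered atlas lung masks.
masks = [
    [[1, 1], [0, 0]],
    [[1, 1], [1, 0]],
    [[1, 0], [1, 0]],
    [[1, 1], [0, 0]],
]
prior = lung_shape_prior(masks)
print(prior)
```

Pixels on which all atlas masks agree receive a probability of 0 or 1, while pixels near disputed lung boundaries receive intermediate values.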
Stage 3:
As a refining stage, we perform image segmentation using the graph cut algorithm and model the
segmentation process with an objective function. The max-flow/min-cut algorithm minimizes the
objective function to find a global minimum that corresponds to foreground (within-lung) and background
(outside-lung) labeling of the pixels.
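To make the graph-cut idea in Stage 3 concrete, the following sketch labels pixels by solving a small max-flow/min-cut problem. It is an illustration under stated assumptions, not the authors' objective function: the data term is taken directly from a lung-probability prior, the smoothness term is a constant penalty between 4-connected neighbours, and costs are scaled to integers. All names (`source_side_of_min_cut`, `segment_lungs`) are hypothetical.

```python
from collections import deque


def source_side_of_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns nodes reachable from the source at the end."""
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)  # add reverse edges
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue:  # breadth-first search over edges with spare capacity
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # no augmenting path left: this is the source side
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push the bottleneck flow along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def segment_lungs(prior, smoothness=0.3):
    """Label each pixel 1 (within-lung) or 0 (outside-lung) via a minimum cut."""
    rows, cols = len(prior), len(prior[0])
    pair_cost = int(round(100 * smoothness))
    capacity = {"source": {}, "sink": {}}
    for r in range(rows):
        for c in range(cols):
            capacity[(r, c)] = {}
    for r in range(rows):
        for c in range(cols):
            lung_score = int(round(100 * prior[r][c]))
            capacity["source"][(r, c)] = lung_score      # paid if labelled outside-lung
            capacity[(r, c)]["sink"] = 100 - lung_score  # paid if labelled within-lung
            for r2, c2 in ((r + 1, c), (r, c + 1)):
                if r2 < rows and c2 < cols:
                    capacity[(r, c)][(r2, c2)] = pair_cost  # paid if neighbours disagree
                    capacity[(r2, c2)][(r, c)] = pair_cost
    inside = source_side_of_min_cut(capacity, "source", "sink")
    return [[1 if (r, c) in inside else 0 for c in range(cols)] for r in range(rows)]


toy_prior = [[0.9, 0.8, 0.1],
             [0.9, 0.4, 0.1]]
print(segment_lungs(toy_prior, smoothness=0.3))
print(segment_lungs(toy_prior, smoothness=0.0))
```

With the smoothness penalty set to zero each pixel simply follows its own prior. With a positive penalty the ambiguous pixel (prior 0.4) is pulled into the lung region by its two lung-labelled neighbours, which is the refining effect the stage relies on.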
Evaluation:
A radiologist manually generated gold standard segmentations for the atlas chest X-ray images. The process
was aided by an interactive boundary marking tool [39], developed in prior NLM research and reported to
the Board. The radiologist then corrected these outlines using FireFly [40], a web-based labeling tool
developed at the University of Missouri. The method was evaluated on three datasets (JSRT, Montgomery,
and India) described previously in Section 3. We also compared the system performance with the
systems in the literature. The Jaccard index (which measures overlap agreement) averaged
95.4% on the public JSRT database, which bests all prior published results. Similar
accuracies of 94.1% and 91.7% on the Montgomery and India datasets, respectively, demonstrate its robustness
to image variety.
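For reference, the Jaccard index used in the evaluation above is the size of the intersection of two masks divided by the size of their union. A minimal sketch, assuming both masks are binary 2-D lists of the same shape (the function name and toy masks are illustrative only):

```python
def jaccard_index(predicted, reference):
    """Overlap agreement |A and B| / |A or B| between two binary masks."""
    intersection = 0
    union = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


predicted_mask = [[1, 1, 0],
                  [1, 0, 0]]
gold_standard = [[1, 1, 0],
                 [1, 1, 0]]
print(jaccard_index(predicted_mask, gold_standard))
```

A value of 1.0 means the automatic segmentation matches the gold standard exactly.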
● A well-trained, accurate model for prediction of lung diseases using chest X-rays.
● Aid the professionals in early and speedy classification of X-rays.
● Providing accurate results where trained medical professionals are not available.
Mobile App - This mobile application will be the primary point of interaction with users. The user can
install this application on their devices and use it to view reports.
The model we are designing will generate results for Infiltration, Atelectasis, Cardiomegaly, Effusion,
Mass, Nodule, Pneumonia and Pneumothorax with good accuracy, whereas existing models focus
on only one of these diseases. The app will generate a report that is directly
available to the user, so there are no long waiting queues.
CHAPTER 2: REQUIREMENT ANALYSIS
Chest X-rays produce images of your heart, lungs, blood vessels, airways, and the bones of your chest
and spine. Chest X-rays can also reveal fluid in or around your lungs or air surrounding a lung. As the
most common examination tool in medical practice, chest radiography has important clinical value in
the diagnosis of disease. Thus, the automatic detection of chest disease based on chest radiography has
become one of the hot topics in medical imaging research. Our project focuses on computer-aided
detection (CAD) systems technology applied in chest radiography. The paper presents several common
chest X-ray datasets and briefly introduces general image preprocessing procedures, such as contrast
enhancement and segmentation, and bone suppression techniques that are applied to chest radiography.
If you go to your doctor or the emergency room with chest pain, a chest injury or shortness of breath,
you will typically get a chest X-ray. The image helps your doctor determine whether you have heart
problems, a collapsed lung, pneumonia, broken ribs, emphysema, cancer or any of several other
conditions.
The chest X-ray is a common way to diagnose disease. But it can also be used to tell whether a certain
treatment is working. Some people have a series of chest X-rays done over time, to track whether a
health problem is getting better or worse.
The Dubai Health Authority (DHA) on April 17, 2018 announced the preliminary results of a chest X-ray
artificial intelligence (AI) algorithm deployed across DHA medical fitness centers (MFCs). The
collaboration is the first validation of Agfa Healthcare’s Augmented Intelligence (AI) in the United
Arab Emirates (UAE). The partners began reviewing the use of artificial intelligence-enabled
workflows in radiology with Agfa more than two years ago. Upon completion of phase one of onsite
validation in early January 2018, and on analysis of preliminary data, the algorithm was able to correctly
identify lung diseases in chest X-rays approximately 90 percent of the time. Phase two results in
March 2018 showed sensitivity further improved to 95 percent.
The paper presents several common chest X-ray datasets and briefly introduces general image
preprocessing procedures, such as contrast enhancement and segmentation, and bone suppression
techniques that are applied to chest radiography. Then, the CAD system in the detection of specific
disease (pulmonary nodules, tuberculosis, and interstitial lung diseases) and multiple diseases is
described, focusing on the basic principles of the algorithm, the data used in the study, the evaluation
measures, and the results. Finally, the paper summarizes the CAD system in chest radiography based
on artificial intelligence and discusses the existing problems and trends.
The authors experiment with a set of deep learning methods for the multi-label classification of the
ChestX-ray14 dataset and report results comparable to the state of the art. They compare
cross-entropy and pairwise error loss for the task of multi-label classification of the dataset. Further,
they implement a cascade network that improves upon the performance of deep learning models while
also modeling label dependencies. In summary, the work provides optimistic results for the
automatic diagnosis of thoracic diseases. However, future work related to disease localization and
improvement of classification performance is in progress.
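The two loss functions compared above can be sketched for a single X-ray with eight labels. This is an illustrative formulation, not the surveyed authors' exact definition: cross-entropy is computed per label on sigmoid outputs, and the pairwise error is assumed to take the common exponential form that penalises any negative label scoring close to or above a positive one. The label order and function names are hypothetical.

```python
import math

LABELS = ["Infiltration", "Atelectasis", "Cardiomegaly", "Effusion",
          "Mass", "Nodule", "Pneumonia", "Pneumothorax"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean per-label cross-entropy; each label is an independent yes/no task."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def pairwise_error(logits, targets):
    """Mean of exp(-(positive score - negative score)) over all label pairs."""
    positives = [z for z, y in zip(logits, targets) if y == 1]
    negatives = [z for z, y in zip(logits, targets) if y == 0]
    if not positives or not negatives:
        return 0.0  # no pairs to rank
    pair_sum = sum(math.exp(-(p - n)) for p in positives for n in negatives)
    return pair_sum / (len(positives) * len(negatives))


# Toy example: the image shows Atelectasis and Effusion only.
targets = [0, 1, 0, 1, 0, 0, 0, 0]
uninformed = [0.0] * len(LABELS)
confident = [4.0 if y == 1 else -4.0 for y in targets]
print(binary_cross_entropy(uninformed, targets), pairwise_error(uninformed, targets))
print(binary_cross_entropy(confident, targets), pairwise_error(confident, targets))
```

Both losses fall as the scores of present diseases rise above those of absent ones. The pairwise form depends only on score differences between labels, whereas cross-entropy treats every label separately.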
They discussed several state-of-the-art models and novel approaches for detecting, classifying, and
analysing various abnormalities involving the chest. The biggest impediment to achieving superhuman
level performance seems to come from the lack of large, high-quality datasets. However, the future
looks bright — with larger, better-annotated datasets and innovative models targeted towards working
with medical images, it is plausible that deep learning will bring phenomenal improvement to the
efficiency of radiologists’ workflow and quality of radiological diagnoses worldwide.
2.1.3 Literature Survey
S. No. | Roll No. | Name | Paper Title | Findings | Citation
5. | 101611041 | Rabjot Singh | Real-time database systems: present and future | Design and implementation of real-time database systems | Dept. of Comput. Sci., Virginia Univ., Charlottesville, VA, USA
6. | 101611041 | Rabjot Singh | A meta-analysis of research in random forests for classification | Limitations regarding the choice of performance measures, the way in which these measures are estimated, and the methodology for comparisons of multiple algorithms over multiple data sets. | Pretorius, Arnu & Bierman, Surette & J. Steel, Sarel. (2016). A meta-analysis of research in random forests for classification. 1-6. 10.1109/RoboMech.2016.7813171.
7. | 101611041 | Rabjot Singh | Very deep convolutional networks for large-scale image recognition | Effect of the convolutional network depth on its accuracy in the large-scale image recognition setting | Simonyan K, Zisserman A
8. | 101611041 | Rabjot Singh | Going deeper with convolutions | Neural network architecture codenamed Inception that achieves the new state of the art for classification | Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D
11. | 101603107 | Gyandeep Digra | Fully automatic lung segmentation and rib suppression methods to improve nodule detection in chest radiograph | Steps for initial detection of lung cancer in Posterior-Anterior chest radiographs. | Soleymanpour E, Pourreza HR, Ansaripour E, Yazdi MS
12. | 101783032 | Prashant Jindal | Automatic detection of the lung regions | Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration | Candemir S, Jaeger S, Palaniappan K, Musco JP, Singh RK, Xue Z
13. | 101783032 | Prashant Jindal | Automatic heart localization and radiographic index computation in chest x-rays | Heart Identification in X-rays | Candemir S, Jaeger S, Lin W, Xue Z, Antani S, Thoma G.
14. | 101783032 | Prashant Jindal | Automatic detection of rib borders in chest radiographs | An algorithm for detection of posterior rib borders in chest radiographs | Zhanjun Y, Goshtasby A, Ackerman LV.
Usually there are four steps in a CAD system: algorithm preprocessing, extracting ROI regions,
extracting ROI features, and classifying disease according to the features. In the algorithm
preprocessing and extraction of ROI, the techniques of enhancement and segmentation are very
important. Usually, there are many ways to highlight lesions and suppress noise. In the segmentation,
the deformable model and the deep learning method perform best, while the rule-based methods
perform poorly and are often used together with other methods to improve the segmentation
performance. The techniques of bone suppression are used less frequently in the literature, but
removing the ribs and clavicles that block lung abnormalities can improve the system performance. In
terms of feature extraction, the features extracted by traditional machine learning algorithms include
geometric features, texture features, and shape features, which are usually processed to reduce the
dimensionality due to feature redundancy. However, hand-crafted features could have errors that affect
the classification performance and are gradually replaced by deep learning methods. In terms of
classifier selection, the performance of support vector machine and random forest in traditional
algorithms may be better, but with the excellent performance of deep learning in image classification,
the deep learning methods have gradually become the mainstream.
Following are the tools that have been surveyed for both software and hardware components:
Hardware:
1. Servers to host the application.
Software:
1. Python
2. Tensorflow/Keras
3. Google Colab
4. Kaggle
5. Flask
2.3.1 Introduction
2.3.1.1 Purpose:
“One hospital in Boston has 126 radiologists. Liberia has two.”
Frankly, even if these two radiologists have the speed of the Flash, the mental faculties of
Einstein, and no need for “amenities” like sleep and a social life, the burden of chest diseases
would prove too much to bear. Around 18 people die from lung cancer per hour in the United
States alone, and that number would be significantly higher were it not for the routine screening
of patients and early detection of nodules. Deep learning may help automatically discover chest
diseases at the level of experts, providing the two Liberian radiologists with some respite and
potentially saving countless lives worldwide.
The intended audience for this product is doctors, as it will assist them in
classifying, sorting and pre-processing X-rays so that they can focus on the
cases that require their attention while the normal ones are screened out. Once we have
a stable and accurate model that can correctly predict results independently,
without requiring any human intervention, it can be used to screen X-rays
autonomously and generate automated templates, giving much faster results and
reports.
2.3.1.3 Project Scope:
● Workflow automation for radiologists - Ability to focus on suspected chest X-rays faster
instead of manual searching.
● Improved turn-around times - Expand the scope of the chest X-ray screening program to add
more volume and capacity.
The product has wide applications in detecting various diseases, and such systems play a vital role
as a second opinion for medical experts. In addition, CAD algorithms reduce the workload
of medical experts by reviewing many CXRs quickly.
● Improved results through faster screening and the ability to handle large volumes.
This section describes the way users interact with the system.
Can run on any desktop with web browser and internet connection.
This section will deal with some non-functional requirements of our project.
Availability
Response Time
The product should be able to reply to a query within a given amount of time.
Processing speed
The processing speed of the app should be fast, as it needs to process large volumes of data for optimal
functionality.
Users will have a unique ID and password, which they can use to log in to the app.
Users' personal data will not be shared with any other company or person.
2.4 COST ANALYSIS
Item | Price
1. Hosting the WebApp on Cloud | Rs. 3000
Total Cost | Rs. 3000
Project Risk
If any member gets sick or is not able to do his part of work for some
will increase.
Product Risk
If the model does not give accurate results, it may diagnose incorrectly which can have fatal results.
CHAPTER 3 – METHODOLOGY ADOPTED
3.4 TOOLS AND TECHNOLOGIES USED
CHAPTER 4 - DESIGN SPECIFICATION
4.2 DESIGN LEVEL DIAGRAMS
USE CASE DIAGRAM
Actor: User, Medical Diagnostic Lab
Preconditions: User wants a report based on their Chest X-ray
Postconditions: ● Success end Condition
1. Report will be generated based on the file uploaded.
2. User will be alerted in case of abnormalities.
● Failure end Condition
1. Report Not Generated
2. Inaccurate Report Generated.
Normal Scenario: 1. User will log into the application.
2. User will upload the file.
3. Pre-processed Image will be fed to the neural network.
4. The neural network makes predictions and a report is generated
based on the predictions
Alternative Flow: ● First
1. System determines user is logged on.
2. Return to normal scenario step 2.
● Second
1. User logs out.
2. Return to normal scenario step 1.
● Third
1. User does not have an account.
2. User creates an account.
3. System confirms account creation.
4. Return to normal scenario step 1.
Extensions: If there is abnormality in the report then the user will get a
recommendation of doctors.
Special Requirements: ● Performance
1. The device shall display report within 5 minutes.
● User Interface
1. The application shall display all outputs in the English language.
2. User-friendly interface.
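The report-generation step of the use case above (predictions in, report and abnormality alert out) can be sketched as follows. This is a hypothetical illustration only: the 0.5 decision threshold, the report fields and the function name `build_report` are assumptions, and the probabilities stand in for the output of the neural network.

```python
LABELS = ["Infiltration", "Atelectasis", "Cardiomegaly", "Effusion",
          "Mass", "Nodule", "Pneumonia", "Pneumothorax"]


def build_report(probabilities, threshold=0.5):
    """Turn per-disease probabilities into a simple report dictionary."""
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per disease label")
    findings = [
        (label, probability)
        for label, probability in zip(LABELS, probabilities)
        if probability >= threshold
    ]
    findings.sort(key=lambda item: item[1], reverse=True)  # most likely first
    abnormal = bool(findings)
    return {
        "findings": [{"label": label, "probability": round(probability, 2)}
                     for label, probability in findings],
        "abnormal": abnormal,           # drives the user alert
        "recommend_doctor": abnormal,   # extension: recommend doctors if abnormal
    }


# Example probabilities for one uploaded X-ray (made-up values).
report = build_report([0.10, 0.70, 0.20, 0.90, 0.05, 0.30, 0.55, 0.10])
print(report)
```

A report with an empty findings list corresponds to the case where no alert is raised. In practice the threshold would be chosen per disease from validation data.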
ACTIVITY DIAGRAM
CLASS DIAGRAM
4.3 USER INTERFACE DIAGRAMS
Figure 10
Figure 10
Figure 11
4.4 SNAPSHOTS OF WORKING PROTOTYPE MODEL
Figure 12
CHAPTER 5: CONCLUSIONS AND FUTURE DIRECTIONS
5.2 CONCLUSION
Chest x-ray CAD offers a helping hand in the detection of lung nodules that may have been missed by a
radiologist. In addition to boosting detection rates, lung CAD may facilitate the earlier detection of cancers,
which can benefit the treatment process and potentially reduce healthcare costs. With false-positive findings
declining and the technology advancing, chest x-ray CAD can be a valuable addition to high-risk screening
workflow.
5.4 Future Work Plan
APPENDIX A: REFERENCES
[1] Armato S.G., McLennan G., Bidaut L., et al., 2011. The Lung Image Database Consortium
(LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database
of Lung Nodules on CT Scans. Medical Physics. 2011;38(2):915–931. doi:
10.1118/1.3528204.
[2] Christodoulidis, S., Anthimopoulos, M., Ebner, L., Christe, A., Mougiakakou, S., 2017.
Multi-source transfer learning with convolutional neural networks for lung pattern analysis.
IEEE J Biomed Health Inform 21, 76–84.
[3] Cicero, M., Bilbily, A., Colak, E., Dowdell, T., Gray, B., Perampaladas, K., Barfett, J.,
2016. Training and validating a deep convolutional neural network for computer-aided
detection and classification of abnormalities on frontal chest radiographs. Invest Radiol, in
press.
[4] Ciompi, F., de Hoop, B., van Riel, S. J., Chung, K., Scholten, E. T., Oudkerk, M., de Jong,
P. A., Prokop, M., van Ginneken, B., 2015. Automatic classification of pulmonary
peri-fissural nodules in computed tomography using an ensemble of 2D views and a
convolutional neural network out-of-the-box. Med Image Anal 26, 195–202.
[5] Ciompi, F., Chung, K., van Riel, S., Setio, A. A. A., Gerke, P., Jacobs, C., Scholten, E.,
Schaefer-Prokop, C., Wille, M. W., Marchiano, A., Pastorino, U., Prokop, M., van
Ginneken, B., 2016. Towards automatic pulmonary nodule management in lung cancer
screening with deep learning. doi:10.1038/srep46479
[6] Dou, Q., Chen, H., Yu, L., Qin, J., Heng, P. A., 2016b. Multi-level contextual 3D CNNs for
false positive reduction in pulmonary nodule detection, in press.