Captioning Chest X-Rays with Deep Learning


Krishna Poddar (2K21/CO/243)
Kumar Su Prashant (2K21/CO/247)
Kunal Singhal (2K21/CO/251)
Table of contents

01 Introduction
02 Proposed Methodology
03 Experimental Details
04 Code
05 Output
06 Bibliography
01
Introduction
Introduction
1. Transformation in Deep Learning: Deep learning has significantly advanced, especially in understanding and interpreting
visual information.

2. Synergy of Computer Vision and NLP: The combination of computer vision and natural language processing (NLP) is
groundbreaking, enabling machines to recognize and describe objects and scenes in natural language.

3. Potential in Medical Radiology: This synergy holds immense potential in medical radiology, where thousands of images are
generated daily, aiding in disease diagnosis, treatment monitoring, and understanding patients' health conditions.

4. Challenges in Image Interpretation: Despite the importance of radiological images, interpreting them is challenging due to
the complexity and vast amount of visual information they contain.

5. Objective of the Project: The project aims to address this challenge by creating an automated image captioning system
specifically tailored for medical radiology reports.

6. Utilizing Deep Learning: State-of-the-art neural network architectures will be employed to generate coherent and contextually
accurate natural language descriptions of radiological images.

7. Benefits: This innovative approach is expected to save time for healthcare professionals while enhancing accessibility and
interpretability of medical images for various stakeholders, including physicians, radiologists, and patients.
Challenges in Medical Radiology Reports
1. Complexity of Medical Radiology Reports: These reports contain intricate and multifaceted images, requiring a deep
understanding of anatomy, pathology, and disease-specific patterns.

2. Use of Medical Jargon: Radiology reports often contain complex medical terminology, making them challenging for non-
specialists to comprehend.

3. Challenge for Automated Interpretation: The combination of visual complexity and linguistic specificity poses a significant
challenge for automated interpretation of radiology reports.

4. Conventional Methods: Traditional methods involve manual interpretation and report writing by radiologists, which are time-
consuming, prone to human error, and may result in reporting backlogs in busy healthcare settings.

5. Advantages of Deep Learning-Based Image Captioning: Deep learning-based systems can automatically generate detailed
and coherent descriptions of radiological images, reducing the burden on healthcare professionals and providing rapid, consistent,
and understandable reports.
02
Proposed
Methodology
Introduction to Image Understanding
1) Image Understanding:
a) Essential for generating coherent captions.
b) Encompasses object recognition, scene
recognition, and understanding
interrelationships.

2) Key Aspects:
a) Object Recognition: Identifying anatomical
structures or abnormalities.
b) Scene Recognition: Understanding the broader
context, crucial in medical images.
c) Interrelationships: Understanding how objects and
scenes relate.
Language Used: Python

a) Key Language: Python is the main language for AI/ML due to its simplicity.
b) Rich Libraries: Python's libraries, such as TensorFlow, Keras, and NumPy, cover the full AI/ML toolchain.
c) Flexibility: Python's adaptability streamlines the AI/ML workflow.
d) Community Support: Python's vast community offers abundant resources for AI/ML practitioners.
Libraries Used:
a) TensorFlow: For building, training, and deploying ML
models.
b) Keras: Simplifies building and training deep learning
models.
c) NumPy: For numerical computations and data
manipulation.
d) OpenCV: For image processing tasks.
e) NLTK: For text-related tasks in natural language
processing.
f) Scikit-Learn: For machine learning tasks like dataset
splitting and evaluation.
g) Matplotlib: For data visualization.
h) Pandas: For data manipulation and analysis.
i) Pre-trained CNN Models:
Leveraged for feature extraction from images.
Feature Extraction and Dataset Preparation
Feature Extraction:
a) Utilizes CNN architectures like ResNet50, EfficientNet, etc.
b) Transforms raw pixel data into numerical feature vectors.
c) Final vector dimension typically 8x8x2048.

Dataset Preparation:
a) Importance of high-quality datasets.
b) Specific datasets: NIH Chest X-Ray Dataset and the Chest X-ray dataset from Indiana University.
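The feature-extraction step above can be sketched as follows. This is an illustrative snippet rather than the project's exact code: the ResNet50 backbone, the 256x256 input size, and the frozen weights are assumptions, chosen so that the final feature map matches the 8x8x2048 dimension quoted above.

```python
import tensorflow as tf

def build_feature_extractor(input_size=256, weights=None):
    """Return a CNN backbone that maps raw pixels to a feature map.

    With a 256x256 input, ResNet50's last convolutional output has
    shape (8, 8, 2048), matching the dimension quoted above.
    Pass weights="imagenet" to use pre-trained weights instead.
    """
    backbone = tf.keras.applications.ResNet50(
        include_top=False,      # drop the classification head
        weights=weights,
        input_shape=(input_size, input_size, 3),
    )
    backbone.trainable = False  # use as a frozen feature extractor
    return backbone

# Example: one dummy image -> one 8x8x2048 feature tensor
extractor = build_feature_extractor()
dummy = tf.random.uniform((1, 256, 256, 3))
features = extractor(dummy)     # shape (1, 8, 8, 2048)
```

Any backbone whose stride totals 32 (ResNet50, InceptionV3 at 299x299, EfficientNet variants) yields a similar small spatial grid of high-dimensional feature vectors.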
Preprocessing
Data Preprocessing:
a) Organizing datasets into TensorFlow's image_dataset_from_directory format.
b) Ensures data is in the required format for model training.

Benefits:
a) Efficient utilization of data during training.
b) Improves the effectiveness of the training process.
03
Experimental
Details
Workflow Overview

● Dataset preprocessing and visualization
● Feature extraction
● Text vectorization and data splitting
● Network for captioning: CNN Encoder, RNN Decoder, Bahdanau Attention Mechanism
● Training
Data Preprocessing and Visualization
● Indiana University and NIH Chest X-Ray datasets provide
valuable resources for medical image analysis.
● Merged image path and caption files to facilitate easier data
handling.
● Created training and testing datasets using TensorFlow's
image_dataset_from_directory method.
● Utilized parameters such as image size, label mode, color
mode, and batch size for dataset creation.
● Visualizing the dataset is crucial for understanding its
contents.
● Demonstrated dataset reading and display using the Pandas
library.
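The dataset-creation step above might be sketched as below, assuming a class-per-folder layout. The synthetic PNG directory stands in for the real NIH/Indiana data, and the split ratio, seed, and batch size are illustrative values, not the project's actual parameters.

```python
import os
import tempfile

import tensorflow as tf

def make_datasets(data_dir, image_size=(256, 256), batch_size=32):
    """Create training/testing splits with image_dataset_from_directory.

    Assumes one sub-folder per class: data_dir/<class_name>/*.png.
    """
    common = dict(
        validation_split=0.2,   # hold out 20% for testing (assumed ratio)
        seed=42,                # same seed so the two splits are disjoint
        image_size=image_size,
        label_mode="int",       # integer class labels
        color_mode="rgb",
        batch_size=batch_size,
    )
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="training", **common)
    test_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir, subset="validation", **common)
    return train_ds, test_ds

# Demo on a tiny synthetic directory standing in for the real datasets
root = tempfile.mkdtemp()
for cls in ("normal", "opacity"):
    os.makedirs(os.path.join(root, cls))
    for i in range(5):
        img = tf.cast(tf.random.uniform((64, 64, 3), maxval=255), tf.uint8)
        tf.io.write_file(os.path.join(root, cls, f"{i}.png"),
                         tf.io.encode_png(img))
train_ds, test_ds = make_datasets(root, image_size=(64, 64), batch_size=2)
images, labels = next(iter(train_ds))
```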
Feature Extraction

NIH Dataset:
The code loads the dataset into a pandas DataFrame and preprocesses it to create training and testing datasets using TensorFlow's image_dataset_from_directory method. Then it uses the InceptionV3 model with pre-trained weights to train a classification model on the 12 classes with the highest value counts, aiming to encode features of chest X-rays into tensors.

Indiana University Dataset:
The last convolutional layer of the InceptionV3 model is used to extract features from chest X-rays and encode them into a feature vector. Additionally, a TensorFlow TextVectorization layer is set up to numerically encode the caption data for creating an embedding. Finally, datasets are split into training and testing sets for further processing.
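The caption-encoding step described above might look like this sketch. The vocabulary cap, sequence length, and toy captions are assumptions for illustration, not the project's actual values; the startseq/endseq markers mirror those visible in the sample outputs later in the deck.

```python
import tensorflow as tf

# Toy captions standing in for the real report text
captions = [
    "startseq no acute cardiopulmonary abnormality endseq",
    "startseq stable cardiomediastinal silhouette endseq",
]

# Numerically encode captions so an Embedding layer can consume them
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000,            # vocabulary cap (assumed value)
    output_sequence_length=10,  # pad/truncate captions to a fixed length
)
vectorizer.adapt(captions)      # learn the vocabulary from the corpus

encoded = vectorizer(captions)  # shape (2, 10): one token id per word
vocab = vectorizer.get_vocabulary()
```

By default the layer lowercases text and strips punctuation, so the special markers survive as ordinary vocabulary tokens.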
Network for Captioning
● Preparation of caption data is essential for image captioning,
ensuring that the model can effectively learn from textual
information associated with the images. The process involves two
components: a CNN encoder and an RNN decoder.
● The CNN encoder extracts features from input images, encoding
them into a suitable format for further processing.
● The RNN decoder interprets these encoded features and generates
captions sequentially, leveraging sequential processing capabilities
to capture context and nuances.
● The integration of the CNN encoder and RNN decoder bridges the
semantic gap between visual input and textual output, facilitating
the generation of meaningful captions for chest X-ray images.
CNN Encoder and RNN Decoder
● The model takes a single raw image and generates a caption y, encoded as a sequence of words drawn from a vocabulary of size K; C is the length of the caption.
● The model also uses a Convolutional Neural Network (InceptionV3 in our case) to extract a set of feature vectors, which the authors call annotation vectors. The CNN outputs L vectors, each of D dimensions; in our case, the InceptionV3 feature extractor outputs a tensor of shape 8x8x2048.
● The RNN Decoder uses LSTM (Long Short-Term Memory) cells to produce captions step by step, but still faces the vanishing gradient problem.
● Context vectors, obtained from the attention mechanism, influence the caption generation.
● The GRU has a similar vanishing gradient problem, so we ultimately used a combination of a GRU and an attention model to solve the sequence-to-sequence problem.
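A minimal sketch of the encoder/decoder pair described above, assuming the 8x8x2048 InceptionV3 output has been flattened to L=64 annotation vectors of D=2048. The embedding size (256), GRU units (512), and vocabulary size (5000) are illustrative assumptions, and the attention-derived context vector is replaced here by a mean-pooled placeholder.

```python
import tensorflow as tf

class CNNEncoder(tf.keras.Model):
    """Projects pre-extracted image features into an embedding space."""
    def __init__(self, embedding_dim):
        super().__init__()
        self.fc = tf.keras.layers.Dense(embedding_dim, activation="relu")

    def call(self, features):            # (batch, 64, 2048)
        return self.fc(features)         # (batch, 64, embedding_dim)

class RNNDecoder(tf.keras.Model):
    """One decoding step: previous word + context -> next-word logits."""
    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, word_ids, context, state=None):
        x = self.embedding(word_ids)                    # (batch, 1, emb)
        # Prepend the context vector to the word embedding
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=state)
        return self.fc(output), state                   # logits, new state

# Shape check with dummy tensors
encoder = CNNEncoder(embedding_dim=256)
decoder = RNNDecoder(embedding_dim=256, units=512, vocab_size=5000)
feats = encoder(tf.random.uniform((4, 64, 2048)))       # (4, 64, 256)
context = tf.reduce_mean(feats, axis=1)                 # placeholder context
logits, state = decoder(tf.zeros((4, 1), tf.int32), context)
```

At inference time the step is run in a loop, feeding each predicted word and the updated GRU state back in until endseq is produced.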
Bahdanau Attention Mechanism

• The Bahdanau Attention mechanism is a key component of the RNN Decoder.
• It computes attention weights that determine the importance of different image locations when generating words in the captions.
• The attention mechanism enhances the model's ability to focus on relevant parts of the image.
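The attention computation described above can be sketched as an additive (Bahdanau-style) layer; the unit sizes and dummy shapes below are illustrative assumptions.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention over L image locations."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects image features
        self.W2 = tf.keras.layers.Dense(units)   # projects decoder state
        self.V = tf.keras.layers.Dense(1)        # scores each location

    def call(self, features, hidden):
        # features: (batch, L, D); hidden: (batch, decoder_units)
        hidden_t = tf.expand_dims(hidden, 1)     # (batch, 1, decoder_units)
        score = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_t)))
        weights = tf.nn.softmax(score, axis=1)   # importance of each location
        context = tf.reduce_sum(weights * features, axis=1)  # (batch, D)
        return context, weights

# Shape check: 64 locations of 256-dim features, 512-dim decoder state
attn = BahdanauAttention(units=512)
context, weights = attn(tf.random.uniform((4, 64, 256)),
                        tf.random.uniform((4, 512)))
```

The softmax over the location axis makes the weights sum to 1 per image, so the context vector is a weighted average of the annotation vectors.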
Training

• Model training is a crucial phase in which the CNN Encoder and RNN Decoder are trained to work together.
• The Adam optimizer is used, and the loss is calculated using Sparse Categorical Cross-Entropy.
• Training is performed for 40 epochs on a Google Colab Pro Tesla P100 GPU to allow the model to learn and improve.
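The loss setup described above might be sketched as follows. Treating token id 0 as padding and masking it out of the loss is an assumption commonly paired with this setup, not something the slides specify.

```python
import tensorflow as tf

# Adam optimizer + Sparse Categorical Cross-Entropy, as stated above;
# reduction="none" keeps per-token losses so padding can be masked.
optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction="none")

def masked_loss(real, pred):
    """Average cross-entropy over non-padding tokens only."""
    mask = tf.cast(tf.not_equal(real, 0), pred.dtype)  # 0 = padding id
    loss = loss_object(real, pred) * mask
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)

# Sanity check on dummy targets/logits (vocab of 10, batch of 2)
real = tf.constant([[3, 5, 0], [2, 0, 0]])
pred = tf.random.uniform((2, 3, 10))
loss_value = masked_loss(real, pred)
```

In the full training loop this loss would be computed inside a tf.GradientTape, with gradients applied to both the encoder and decoder variables.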
04
Code
Link to Code of the program

https://colab.research.google.com/drive/1yT-WhVclXBw80-pN_Igg8wWYgrnGrzMi?usp=sharing
05
Output
Sample test 1

Actual Caption:
• Indications: xxxx with xxxx endseq startseq
• Findings: stable cardiomediastinal silhouette no xxxx no focal alveolar consolidation suspicious pulmonary opacity pneumothorax or pleural effusion changes of right mastectomy sequelae of prior granulomatous disease mild thoracic spine degenerative change.
• Impressions: no acute cardiopulmonary abnormality

Generated caption:
• Indications: xxxxyearold female followup endseq startseq
• Findings: normal heart size no focal consolidation is identified there is minimal airspace disease in the left ventricle no definite focal airspace consolidation pleural effusion or pneumothoraces cardiomediastinal silhouette is normal for size and contour degenerative changes in the inferior xxxx cardiomegaly and small to previouschronic pulmonary arthritis
• Impressions: 1 pulmonary clinical correlation xxxx no xxxx old fractures the previously seen left upper quadrant seen no xxxx soft tissue since comparison examination there is some left base airspace disease the visualized bony structures are intact endseq startseq impressions no
Sample test 2

Actual Caption:
• Indications: start startseq indications dyspnea endseq startseq
• Findings: stable the heart is top normal in size the mediastinum is stable the aorta is atherosclerotic xxxx opacities are noted in the lung bases compatible with scarring or atelectasis there is no acute infiltrate or pleural effusion
• Impressions: chronic changes without acute disease

Generated caption:
• Indications: shortness of breath hypertension
• Findings: impressions ltthe heart size within normal limits no focal consolidation pneumothorax or large pleural effusion visualized bony structures are otherwise unremarkable in appearance of focal airspace disease no pleural effusion or pneumothorax the bony elements from elsewhere are no displaced rib fractures the lungs are clear no pleural effusion
• Impressions: chest three total images to be grossly unremarkable no suspicious pulmonary opacities mild degenerative changes of right apex otherwise unremarkable exam negative for acute pulmonary infiltrate endseq end
06
Bibliography
a) Link to NIH X-ray dataset:
https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community
b) Link to Indiana University Dataset:
https://www.kaggle.com/datasets/raddar/chestxrays-Indiana-university?select=indiana_reports.csv
Thanks!!
