Unveiling The Precision of CheXNeXt Algorithm Against Radiologist Expertise in Chest Radiograph Pathology Detection



Selva Vignesh M¹*

¹IIPC Coordinator, Dr.N.G.P Arts and Science College, Coimbatore, Tamilnadu, India

*Author to whom correspondence should be addressed


E-mail: selvavigneshmds@gmail.com

ABSTRACT

Chest radiograph interpretation plays a pivotal role in diagnosing thoracic diseases, yet the reliance on expert
radiologists poses challenges such as fatigue-based errors and limited diagnostic accessibility in certain regions
of the world. Leveraging deep learning, we introduce a novel algorithm, CheXNeXt, designed to detect 14
different pathologies in frontal-view chest radiographs. In a comprehensive study comparing CheXNeXt with
nine radiologists, comprising board-certified radiologists and senior residents, our algorithm demonstrated
expert-level performance on 11 pathologies. Notably, CheXNeXt outperformed radiologists in detecting atelectasis. While
radiologists excelled in specific cases like cardiomegaly, emphysema, and hiatal hernia, CheXNeXt showcased
efficiency and consistency, completing image interpretation substantially faster. The study highlights the
potential of deep learning algorithms, like CheXNeXt, to augment diagnostic capabilities, addressing challenges
associated with radiologist shortages and enhancing patient access to chest radiograph diagnostics.

Keywords – Chest radiograph, CheXNeXt, Radiologists, Thoracic diseases, Digital Diagnostics

I. INTRODUCTION

Chest radiograph interpretation is a critical component of diagnosing thoracic diseases, providing insights into
conditions such as tuberculosis and lung cancer that affect millions worldwide annually. The traditional reliance
on expert radiologists for image analysis, however, presents formidable challenges that extend beyond the
nuances of medical diagnosis. This dependence introduces a susceptibility to fatigue-based errors, where the
demanding nature of the task can compromise the accuracy and precision of diagnostic assessments. Moreover,
the shortage of radiologists in specific geographic regions exacerbates the limited diagnostic accessibility for
individuals seeking timely medical evaluations.
In recent years, the emergence of deep learning approaches has marked a transformative shift in the landscape of
medical image interpretation. Fueled by large-scale neural network architectures and propelled by extensive
labeled datasets, these algorithms have showcased expert-level performance in various diagnostic tasks. Our
study aims to contribute to this paradigm shift by introducing CheXNeXt, a convolutional neural network
meticulously designed to concurrently detect 14 distinct pathologies in frontal-view chest radiographs.

The motivation behind the development of CheXNeXt lies in addressing the dual challenges of
diagnostic accuracy and accessibility. By harnessing the power of deep learning, we aspire to alleviate the
burden on expert radiologists, reduce fatigue-induced errors, and extend diagnostic expertise to regions where
the scarcity of radiologists limits accessibility. The comprehensive training and internal validation of CheXNeXt
on the ChestX-ray14 dataset, coupled with the reference standard provided by a panel of three board-certified
cardiothoracic specialist radiologists, form the foundation of our investigation into the algorithm's
discriminative performance.

Our comparative analysis involves not only algorithmic performance but also an assessment of CheXNeXt
against the proficiency of nine radiologists, comprising six board-certified radiologists and three senior
radiology residents. The 14 pathologies under consideration include pneumonia, pleural effusion, pulmonary
masses, nodules, and others of clinical significance. In this pursuit, we seek to evaluate whether CheXNeXt
achieves radiologist-level performance across these pathologies, identifying areas where the algorithm excels
and those where it may require refinement.

Furthermore, we delve into the efficiency aspect, measuring the time taken for image interpretation by both
CheXNeXt and the radiologists. The stark contrast in time efficiency underscores the potential of deep learning
algorithms to expedite diagnostic processes, addressing concerns related to prolonged waiting times for medical
evaluations. Through this research, we envision a future where advanced deep learning algorithms, exemplified
by CheXNeXt, play a pivotal role in complementing radiological expertise, reducing diagnostic errors, and
expanding patient access to chest radiograph diagnostics on a global scale.

II. SYSTEM DESIGN

2.1 DATASET

In this research, the dataset used in the study is called ChestX-ray14. It is currently the largest public repository
of radiographs, containing 112,120 frontal-view chest radiographs of 30,805 unique patients. Each image in the
dataset is annotated with up to 14 different thoracic pathology labels. The labels were chosen based on the
frequency of observation and diagnosis in clinical practice. The dataset was partitioned into training, tuning, and
validation sets for the purpose of the study. It's worth noting that the evaluation of CheXNeXt's performance
relies on the reference standard provided by a panel of three board-certified cardiothoracic specialist
radiologists. These radiologists, as part of the validation process, likely contributed to the annotation and
labeling of the dataset, ensuring a robust benchmark for the algorithm's discriminative capabilities. The training
set was used to optimize network parameters, the tuning set was used to compare and choose networks, and the
validation set was used to evaluate the performance of the algorithm and radiologists. The dataset is publicly
hosted by the National Institutes of Health Clinical Centre. The test set annotations are not made publicly
available to preserve the integrity of the test results.

Figure 1: Original Slices of X-rays from ChestX-ray14

The chest radiograph images utilized in this study were sourced from the ChestX-ray14 repository and were
available in JPEG format as shown in Figure 1. The use of this standardized image format allowed seamless
integration into the study's workflow, ensuring accessibility and ease of manipulation for subsequent stages,
such as training and validation of the deep learning algorithm, CheXNeXt.

2.2 EXPLORATORY DATA ANALYSIS AND PREPROCESSING

The dataset employed in this research, derived from the publicly available ChestX-ray14 dataset, serves as a
fundamental resource for the diagnosis of thoracic diseases through chest radiographs. This section delineates
the critical steps involved in the Exploratory Data Analysis (EDA) and pre-processing procedures, vital for
enhancing both interpretability and the subsequent efficacy of the machine learning models.
To obviate potential data leakage, a meticulous approach to patient-level splitting is adopted, thereby
preventing inadvertent occurrences of the same patient's radiographic images in both the training and test
datasets. This meticulous partitioning strategy ensures the integrity of the model evaluation process.
The dataset encompasses 14 pathology classes, including Atelectasis, Cardiomegaly, Consolidation,
Edema, Effusion, Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule, Pleural Thickening, Pneumonia, and
Pneumothorax. Each pathology class exhibits varying prevalence within the dataset, necessitating nuanced
considerations during both EDA and subsequent model training.

Figure 2: Pixel Value Distribution

The subsequent detailed EDA encompasses the scrutiny of individual X-ray images. Pixel value distributions are
analysed to gain insights into the intensity variations across images, and statistical characteristics, including
dimensions, pixel intensity, and overall image quality, are examined to inform subsequent pre-processing steps.
The images have dimensions of 1024 × 1024 pixels with a single colour channel. The maximum pixel value is
0.9804, the minimum is 0.0000, the mean pixel value is 0.4796, and the standard deviation is 0.2757, as shown
in Figure 2.
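Per-image statistics of this kind can be reproduced with a few lines of NumPy. The image below is a synthetic stand-in for a radiograph scaled to [0, 1], so the numbers differ from those reported above:

```python
import numpy as np

def pixel_stats(image):
    """Summary statistics used in the EDA (image assumed scaled to [0, 1])."""
    return {
        "min": float(image.min()),
        "max": float(image.max()),
        "mean": float(image.mean()),
        "std": float(image.std()),
    }

# Synthetic stand-in for one 1024 x 1024 single-channel radiograph.
rng = np.random.default_rng(0)
img = rng.random((1024, 1024)).astype(np.float32)
stats = pixel_stats(img)
assert 0.0 <= stats["min"] <= stats["max"] <= 1.0
```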

Figure 3: Single Image Investigation

The pre-processing phase is paramount for effective model training. Standardization, facilitated
through the Keras ImageDataGenerator, ensures a mean pixel value of zero and a standard deviation of one.
This critical step enhances model convergence during training. Addressing class imbalance, a ubiquitous
challenge in medical datasets, involves the introduction of weighted loss functions. These functions assign
distinct weights to positive and negative cases based on their frequency, mitigating the impact of class
imbalances and fostering equal contribution from each pathology during model training.

Figure 4: Frequency of Each Class

Visualizing the frequency distribution of each class underscores the significant imbalances inherent in
certain pathologies. Computation of class-wise positive and negative frequencies, coupled with the
determination of class-specific weights, contributes to the establishment of a balanced and nuanced model
training paradigm. The integration of a weighted loss function into the model training pipeline augments model
performance, ensuring equitable consideration of each pathology.
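The weighting scheme described above can be sketched as follows. The toy label matrix is illustrative, and the choice w_pos = freq_neg, w_neg = freq_pos is one common way to make the positive and negative terms of each class contribute equally regardless of prevalence:

```python
import numpy as np

def class_weights(labels):
    """Per-class weights: w_pos = freq_neg and w_neg = freq_pos, so the
    expected positive and negative loss contributions balance out."""
    freq_pos = labels.mean(axis=0)   # fraction of positive cases per class
    freq_neg = 1.0 - freq_pos
    return freq_neg, freq_pos        # (w_pos, w_neg)

def weighted_bce(y_true, y_pred, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy averaged over samples and classes."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    loss = -(w_pos * y_true * np.log(y_pred)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(loss.mean())

# Toy multi-label matrix: four images, two pathologies of unequal prevalence.
y = np.array([[1, 0], [0, 0], [1, 0], [0, 0]], dtype=float)
w_pos, w_neg = class_weights(y)
assert np.allclose(w_pos, [0.5, 1.0]) and np.allclose(w_neg, [0.5, 0.0])
```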
In conclusion, the rigorous EDA and pre-processing methodologies outlined herein form the bedrock
for the subsequent development of a robust and effective diagnostic model for thoracic diseases. The nuanced
understanding of dataset intricacies, coupled with the mitigation of class imbalances and standardization of
image data, collectively contribute to the creation of a reliable and accurate diagnostic framework.

2.3 NORMALIZATION

Normalization of input data stands as a pivotal step in optimizing the performance of the DenseNet-121 model
in our research endeavour. The adoption of standardization, where pixel values undergo transformation to attain
a mean of zero and a standard deviation of one, is instrumental in ensuring uniformity across input features. The
image is resized to a standardized 512 × 512 pixel dimension, and pixel values are adjusted based on the mean
and standard deviation derived from the ImageNet training set. This is particularly crucial for DenseNet-121,
characterized by densely connected blocks. The process of normalization serves to stabilize the training
procedure, mitigating the impact of varied scales in images and preventing the dominance of individual features.
Aligned with the architecture of DenseNet-121, this normalization strategy enhances convergence, regulates
activation function behaviour, and contributes to the overall efficiency of the model. The meticulous
normalization process facilitates the successful fine-tuning of DenseNet-121, thereby augmenting its predictive
accuracy and robustness.
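A minimal sketch of this preprocessing step, assuming the standard ImageNet channel statistics and a dependency-free nearest-neighbour resize (a production pipeline would use a proper image library and the Keras utilities mentioned above):

```python
import numpy as np

# Standard ImageNet channel statistics used for DenseNet-121 pretraining.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image, size=512):
    """Resize a [0, 1] grayscale radiograph to size x size and normalize with
    ImageNet statistics. The single channel is replicated to three channels,
    since the pretrained backbone expects RGB input."""
    h, w = image.shape
    # Nearest-neighbour resize, kept dependency-free for this sketch.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image[rows][:, cols]
    rgb = np.repeat(resized[..., None], 3, axis=-1)
    return (rgb - IMAGENET_MEAN) / IMAGENET_STD

img = np.random.default_rng(0).random((1024, 1024)).astype(np.float32)
x = preprocess(img)
assert x.shape == (512, 512, 3)
```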

2.4 CHEXNEXT

DenseNet-121, a robust convolutional neural network architecture, is at the core of our chest X-ray image
analysis. DenseNet-121 is chosen for its unique dense connectivity pattern, promoting feature reuse and
enhancing model capacity. This architecture is particularly beneficial in medical image analysis, where the
extraction of intricate features is crucial for accurate diagnosis. We leverage the pre-trained DenseNet-121 on a
vast dataset, such as ImageNet, to harness its learned features from a diverse range of visual patterns.
The fine-tuning process is a pivotal step in adapting DenseNet-121 to the nuances of chest X-ray
images. Fine-tuning involves adjusting model parameters to specialize in discerning disease-related features. In
our case, the fine-tuning occurs on a comprehensive chest X-ray dataset that includes various pathologies. This
ensures that the model becomes adept at capturing the subtle visual cues indicative of thoracic diseases. The
augmentation of the dataset with more data is a strategic move, enhancing the model's ability to generalize
across a broader spectrum of cases, including rare pathologies and diverse patient populations.
The hierarchical feature extraction capabilities of DenseNet-121, illustrated in Figure 5, are integral to our
model's success. Its densely connected blocks enable efficient information
flow, capturing intricate details within chest X-ray images. We exploit these features to empower CheXNeXt in
accurately identifying and classifying a diverse array of thoracic pathologies. This versatility is particularly
valuable in the medical domain, where conditions may manifest in various ways.

Figure 5: DenseNet121 Architecture

In terms of data, our chest X-ray dataset, enriched with a plethora of annotated images, contributes to
the model's robustness. The fine-tuning process involves training the model on this expansive dataset, ensuring
that it adapts to the intricacies of medical imaging. The augmentation of data involves feeding the model with
additional instances, exposing it to a more extensive variety of cases. This not only improves the model's ability
to generalize but also enhances its sensitivity to rare conditions and subtle abnormalities.
CheXNeXt, therefore, stands as a testament to the synergy between DenseNet-121 and an enriched
dataset. Through meticulous fine-tuning and data augmentation, we optimize the model's performance, aiming
for heightened accuracy in chest X-ray pathology detection. This approach represents a significant stride in
leveraging deep learning for medical diagnostics, showcasing the potential of DenseNet-121 in enhancing the
capabilities of chest X-ray analysis systems.
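A sketch of such a model in Keras follows. Note that weights=None is used here so the snippet runs offline; fine-tuning as described above would instead start from weights="imagenet" so the backbone begins with pretrained features:

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet121

NUM_PATHOLOGIES = 14  # the 14 ChestX-ray14 labels

def build_chexnext(input_shape=(512, 512, 3), weights=None):
    """DenseNet-121 backbone with a 14-way sigmoid head for multi-label
    classification. Pass weights="imagenet" for pretrained initialization."""
    backbone = DenseNet121(include_top=False, weights=weights,
                           input_shape=input_shape, pooling="avg")
    outputs = layers.Dense(NUM_PATHOLOGIES, activation="sigmoid",
                           name="pathologies")(backbone.output)
    return Model(backbone.input, outputs)

model = build_chexnext()
# Sigmoid outputs with binary cross-entropy suit the multi-label setting,
# where several pathologies can be present in one radiograph.
model.compile(optimizer="adam", loss="binary_crossentropy")
```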

2.5 GRADCAM

GradCAM, or Gradient-weighted Class Activation Mapping, constitutes a pivotal interpretative tool in our
research, particularly when applied to the predictions generated by the CheXNeXt algorithm. It functions by
generating heat maps that spotlight regions within chest radiographs that significantly influence the algorithm's
classification of specific pathologies. Through these heat maps, the algorithm's decision-making process
becomes transparent, elucidating the critical features in the image that contribute to its predictions.
Within our research framework, the integration of GradCAM is seamlessly realized alongside the
DenseNet121 architecture, which serves as the neural network model for the CheXNeXt algorithm.
DenseNet121, a convolutional neural network tailored for image data, exhibits a distinctive characteristic of
direct connections between every layer within a block. This architectural choice facilitates an efficient flow of
information, enabling the model to glean insights from various layers simultaneously. Trained on an extensive
dataset, the DenseNet121 model autonomously learns parameters, empowering the CheXNeXt algorithm to
make accurate predictions on previously unseen chest radiographs. This integration exemplifies a robust synergy
between GradCAM's interpretative capabilities and DenseNet121's proficiency in handling intricate image-based
diagnostic tasks, as shown in Figure 6.

Figure 6: GradCAM on Single Label

Visualizing multiple labels using GradCAM offers a comprehensive understanding of how the CheXNeXt
algorithm processes and interprets chest radiographs for various pathologies. The GradCAM technique, which
highlights regions contributing to the model's predictions, becomes particularly insightful when applied to
multiple labels simultaneously as shown in Figure 7.

Figure 7: GradCAM on Multiple Labels
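The Grad-CAM procedure described in this section can be sketched as follows. The layer name passed in would be the final convolutional layer of the trained DenseNet-121; the function itself is model-agnostic:

```python
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name):
    """Grad-CAM heat map for one class: weight the chosen conv layer's
    feature maps by the spatially pooled gradients of that class score,
    then apply ReLU and scale to [0, 1]."""
    conv_layer = model.get_layer(conv_layer_name)
    grad_model = tf.keras.Model(model.input, [conv_layer.output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        score = preds[:, class_index]
    grads = tape.gradient(score, conv_out)
    weights = tf.reduce_mean(grads, axis=(1, 2))               # pooled gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                                   # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```

In practice the returned map is upsampled to the radiograph's resolution and overlaid as the heat maps shown in Figures 6 and 7.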

III. RESULTS AND DISCUSSION

The study employed the CheXNeXt algorithm to assess the effectiveness of deep learning in chest radiograph
diagnosis, revealing promising results. The algorithm exhibited a specificity of 0.927 and a sensitivity of 0.594
across 14 pathologies, surpassing radiologists in certain areas, such as atelectasis, but lagging behind in others,
including cardiomegaly, emphysema, and hernia. The mean proportion correct for all pathologies stood at 0.828,
showcasing the algorithm's overall competence. However, notable variations in performance were observed,
particularly for pathologies with low prevalence during training. The relabelling process, while improving
performance in some instances, posed challenges in others, highlighting the need for ongoing research and
refinement to enhance the accuracy and reliability of deep learning algorithms in clinical practice. Figure 8
depicts how the model has performed well in detecting multiple pathologies.

Figure 8: Multi-Pathological Detection by CheXNeXt

In evaluating accuracy through the area under the receiver operating characteristic curve (AUC-ROC),
CheXNeXt demonstrated significant prowess. Specifically, it achieved a statistically higher AUC of 0.862 (95%
CI 0.825–0.895) for atelectasis compared to radiologists, whose AUC was 0.808, as shown in Figure 9.

Figure 9: AUC Curve

Across various pathologies, the algorithm matched or outperformed radiologists in 11 instances, but the
radiologists excelled in AUC performance for cardiomegaly, emphysema, and hiatal hernia. These findings
underscore the potential of deep learning algorithms like CheXNeXt in enhancing diagnostic capabilities, while
also emphasizing the nuanced challenges and areas for improvement, particularly in handling pathologies with
lower prevalence and refining algorithmic performance across diverse conditions.

Pathology            Radiologists   Algorithm   Algorithm - Radiologists   Advantage
Atelectasis          0.808          0.862        0.053                     Algorithm
Cardiomegaly         0.888          0.831       -0.057                     Radiologists
Consolidation        0.841          0.893        0.052                     No Difference
Edema                0.910          0.924        0.015                     No Difference
Effusion             0.900          0.901        0.000                     No Difference
Emphysema            0.911          0.704       -0.208                     Radiologists
Fibrosis             0.897          0.806       -0.091                     No Difference
Hernia               0.985          0.851       -0.133                     Radiologists
Infiltration         0.734          0.721       -0.013                     No Difference
Mass                 0.886          0.909        0.024                     No Difference
Nodule               0.899          0.894       -0.005                     No Difference
Pleural thickening   0.779          0.798        0.019                     No Difference
Pneumonia            0.823          0.851        0.028                     No Difference
Pneumothorax         0.940          0.944        0.004                     No Difference

Table 1: Evaluation of Model and Radiologists

The evaluation of the CheXNeXt algorithm for chest radiograph diagnosis demonstrated its considerable
potential in augmenting diagnostic capabilities compared to expert radiologists. With a specificity of 0.927 and a
sensitivity of 0.594 across 14 pathologies, CheXNeXt outperformed radiologists in specific instances, notably
excelling in the detection of atelectasis as shown in Table 1. However, variations in performance were observed,
and radiologists demonstrated superiority in pathologies like cardiomegaly, emphysema, and hernia. The
algorithm showcased overall competence, with a mean proportion correct of 0.828. In AUC-ROC analysis,
CheXNeXt exhibited significant prowess, achieving a higher AUC of 0.862 for atelectasis compared to
radiologists (AUC 0.808). While the algorithm matched or outperformed radiologists in 11 instances, challenges
associated with pathologies of low prevalence and the impact of the relabelling process were acknowledged.
The study emphasizes the efficiency of CheXNeXt, completing image interpretation substantially faster than
radiologists, addressing concerns related to prolonged waiting times for medical evaluations. Despite challenges,
the research underscores the potential of deep learning algorithms, such as CheXNeXt, to revolutionize chest
radiograph diagnostics, complement radiological expertise, and enhance patient access to timely and accurate
interpretations globally.
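Per-pathology AUC-ROC values like those reported in Table 1 can be computed with scikit-learn. The labels and scores below are toy values for illustration, not study data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_pathology_auc(y_true, y_score, names):
    """AUC-ROC for each pathology column in a multi-label setting."""
    return {name: roc_auc_score(y_true[:, i], y_score[:, i])
            for i, name in enumerate(names)}

# Toy example: six studies, two pathologies.
names = ["Atelectasis", "Cardiomegaly"]
y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 0], [1, 1], [0, 0]])
y_score = np.array([[0.9, 0.2], [0.3, 0.8], [0.8, 0.1],
                    [0.2, 0.85], [0.7, 0.9], [0.1, 0.4]])
aucs = per_pathology_auc(y_true, y_score, names)
assert aucs["Atelectasis"] == 1.0  # scores perfectly separate the classes
```

Bootstrap resampling over studies would yield confidence intervals of the kind quoted for atelectasis above.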

IV. CONCLUSIONS

In conclusion, this project underscores the transformative potential of the CheXNeXt algorithm in reshaping
chest radiograph diagnostics. The algorithm, designed to detect 14 different pathologies, demonstrated expert-
level performance on numerous fronts, surpassing radiologists in specific pathologies and showcasing efficiency
in image interpretation. The study illuminates the profound impact of deep learning algorithms, like CheXNeXt,
in mitigating challenges associated with radiologist shortages, reducing diagnostic errors, and expediting patient
access to timely medical evaluations. Notably, the integration of GradCAM further elucidates the interpretative
capabilities of CheXNeXt, providing transparency into the algorithm's decision-making process. While
challenges exist, particularly in handling pathologies of low prevalence, the research emphasizes the need for
ongoing refinement and research to enhance the accuracy and reliability of deep learning algorithms in clinical
practice. Ultimately, the findings position CheXNeXt as a promising advancement in the realm of chest
radiograph interpretation, paving the way for a future where advanced algorithms contribute significantly to
global healthcare accessibility and diagnostic precision.
