


Information Sciences 654 (2024) 119854

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

CheXMed: A multimodal learning algorithm for pneumonia detection in the elderly

Hao Ren a,b,1, Fengshi Jing a,b,1, Zhurong Chen c, Shan He d, Jiandong Zhou e, Le Liu f, Ran Jing a, Wanmin Lian g, Junzhang Tian a, Qingpeng Zhang h,i,*, Zhongzhi Xu j,*, Weibin Cheng a,k,*
a Institute for Healthcare Artificial Intelligence Application, Guangdong Second Provincial General Hospital, Guangzhou, China
b Faculty of Data Science, City University of Macau, Macau SAR, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
d Department of Health and Physical Education, The Education University of Hong Kong, Hong Kong SAR, China
e Warwick Medical School, The University of Warwick, United Kingdom
f College of Foreign Languages, Jishou University, Zhangjiajie, China
g Information Department, Guangdong Second Provincial General Hospital, Guangzhou, China
h Department of Pharmacology and Pharmacy, The University of Hong Kong, Hong Kong SAR, China
i Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China
j School of Public Health, Sun Yat-sen University, Guangzhou, China
k School of Data Science, City University of Hong Kong, Hong Kong SAR, China

ARTICLE INFO

Keywords:
Multimodal learning
Pneumonia detection
AI-assisted precision medicine
Deep neural networks
Medical image processing

ABSTRACT

Pneumonia can be a deadly illness for particular populations, one of which is older adults. While studies have successfully trained artificial-intelligence-assisted diagnostic tools to detect pneumonia using chest X-ray images, they targeted the general population without stratification by age group. This study (a) investigated the performance disparities between geriatric and younger patients when using chest X-ray images to detect pneumonia, and (b) developed and tested a multimodal model called CheXMed that incorporates clinical notes together with image data to improve pneumonia detection performance for older people. Accuracy, precision, recall, and F1-score were used for model performance evaluation. CheXMed outperforms baseline models on all evaluation metrics. The accuracy, precision, recall, and F1-score are 0.746, 0.746, 0.740, and 0.743 for CheXMed; 0.645, 0.680, 0.535, and 0.599 for CheXNet; 0.623, 0.655, 0.521, and 0.580 for DenseNet121; and 0.610, 0.617, 0.543, and 0.577 for ResNet18.

1. Introduction

Pneumonia, caused by viruses, bacteria, or fungi, is one of the leading causes of morbidity and mortality from infection in the elderly [1–3]. People at any stage of life can be affected by pneumonia, but the risk is not evenly distributed across ages: older adults are more vulnerable to pneumonia than younger people. Almost one million adults aged 65 years or over are hospitalized with pneumonia yearly in the United States, imposing a heavy disease burden [4]. Since symptoms in older adults are often atypical and can

* Corresponding authors.
E-mail addresses: qpzhang@hku.hk (Q. Zhang), xuzhzh26@mail.sysu.edu.cn (Z. Xu), chwb817@gmail.com (W. Cheng).
1 Hao Ren and Fengshi Jing contributed equally to this work.

https://doi.org/10.1016/j.ins.2023.119854
Received 8 May 2023; Received in revised form 27 September 2023; Accepted 31 October 2023
Available online 4 November 2023
0020-0255/© 2023 Elsevier Inc. All rights reserved.

differ from classic pneumonia symptoms, detecting pneumonia in this population can be more challenging, potentially resulting in a
delay in diagnosis and treatment.
Chest X-ray is the most widely used imaging modality for the diagnosis of pneumonia [5–7]. Nevertheless, detecting pneumonia through chest X-rays in the elderly can be difficult and time-consuming, even for skilled radiologists [8–11]. Consequently, extensive studies have been conducted to develop and test AI-endowed image processing tools to assist the diagnosis of pneumonia [12–15]. While these studies illustrate the promise of AI techniques for identifying pneumonia, they have two main limitations. First, these studies applied the trained models to the general population without stratifying by age group. For example, Minaee et al. adopted ResNet18 to detect pneumonia from chest X-rays and validated it in people aged 14 years or older [16]; Rajpurkar et al. proposed CheXNet to detect pneumonia from chest X-rays and reported validation results without age stratification [17]. It remains unexplored whether these models demonstrate different performance across age groups: compared with young adults, detecting pneumonia in geriatric patients using chest X-rays faces additional obstacles due to comorbidity, lack of cooperation, and difficulty in maintaining posture [18–20]. It is therefore reasonable to hypothesize that AI-based computational tools might perform less effectively in the elderly, but evidence for this hypothesis is lacking.
Second, although chest X-ray has been the most used material for patients with respiratory signs and symptoms, the appearance of pneumonia in X-ray images is often vague, can overlap with other diagnoses, and can mimic many other benign abnormalities [21]. This is especially true for elderly patients. Evidence has shown that electronic medical record (EMR) data contain salient information about pneumonia. For example, given a pulmonary abnormality with symptoms of fever and cough, pneumonia would be a more appropriate diagnosis than less specific terms such as infiltration or consolidation [22]. However, as far as the authors know, AI diagnostic models that leverage both image data and EMR to detect pneumonia remain unexplored, and models that can jointly harness the two types of data are underdeveloped.
To fill these gaps, we first tested the three most widely used pneumonia detection models on geriatric and younger patients separately, and compared the diagnostic performance of the two age groups. Second, we collected two types of data, EMR and chest X-ray images, from a local hospital and developed and tested a novel multi-input neural network model that aims to improve pneumonia detection accuracy in the elderly. This AI-assisted diagnostic tool is especially advantageous for regional hospitals with limited access to experienced radiologists.
The main contributions of this study are summarized as follows:
• We investigated the performance disparities of existing state-of-the-art models between geriatric and younger patients when using chest X-ray images to detect pneumonia. Results showed that models designed for the general population without age stratification performed poorly in the elderly. This result adds new knowledge to the existing literature.
• A multimodal learning algorithm, CheXMed, is developed and tested using real-world data. CheXMed incorporates clinical notes together with image data, outperforming benchmark models on all evaluation metrics for pneumonia detection in the elderly.
• A dataset containing images and clinical notes is released. To the best of the authors' knowledge, this open-source release is the first of its kind in terms of lung disease multimodal data. The data can be reached at https://github.com/r1293497424h/Pneumonia-dataset.git.

2. Material and methods

In this section, we first introduce the three most widely used pneumonia detection models, which represent the existing state of the art. We then present our proposed novel multimodal algorithm.

Fig. 1. Flowchart of inclusion criteria and data selection.


2.1. Data collection

We hypothesized that a pneumonia detection model might perform differently for the elderly and for younger people. To test this hypothesis, we examined the three most widely used pneumonia detection models, CheXNet [17], DenseNet121 [23], and ResNet18 [12,16,24], on geriatric and younger patients, and compared the diagnostic performance of the two age groups. The data used to train and test the three models are ChestX-ray14 [25], CheXpert [26], and RSNA [27], respectively. The ChestX-ray14 (/CheXpert, /RSNA) test data contain 21,813 (/5,522, /2,684) frontal-view X-ray images. Of those, 5,262 (/3,119, /639) belong to older people (aged 60 years or over) and the remaining 16,551 (/2,403, /2,045) are from younger people (aged 15 to 59 years).
Apart from implementing existing models using open-access data sets, we also collected data from a local hospital. Between January 1, 2017 and October 31, 2021, 8,203 inpatients with a discharge record of J18.x, C34.x, J43.x, J44.x, J96.x, or J47.x (the ICD-10 codes of pneumonia, lung cancer, pulmonary emphysema, chronic obstructive pulmonary disease (COPD), respiratory failure, and bronchiectasis, respectively) were included in the analysis. Patients were further excluded if they had a comorbidity of pneumonia with any of the other five types of diseases (i.e., the discharge record contains a mixture of J18.x and C34.x, J43.x, J44.x, J96.x, or J47.x). The resulting research sample consisted of 2,572 patients. Of those, 1,275 are pneumonia patients (positive samples) and 1,297 have other types of pulmonary diseases (negative counterparts). See Fig. 1 for an illustration of the sample inclusion and exclusion process.
The collected database contains two data types: clinical notes and frontal-view chest X-ray images. Clinical entities were structured using automatic extraction methods (Named Entity Recognition, NER [28,29]). The database contains demographic data (e.g., age, gender), clinical data (e.g., symptoms, historical diagnoses), and laboratory data (e.g., expectoration, triglyceride level, gastric distention). See SI Part A for the complete list of medical record variables.

2.2. CheXMed

To harness the EMR and image data, we propose CheXMed, a multi-input neural network framework that identifies pneumonia automatically, with a particular focus on the elderly. The general architecture of CheXMed is shown in Fig. 2. It takes clinical notes and frontal-view chest X-ray images as input and outputs the probability that a person has developed pneumonia.

Clinical notes are inherently unstructured, often presented as free-text narratives that do not readily lend themselves to direct analysis. To begin transforming these data into a more structured format, we first applied named entity recognition (NER) techniques. This approach allowed us to identify and extract specific entities, such as variable names or medical terminologies, embedded within the free-text notes.
Once these entities were identified, we employed regular expressions to pinpoint and extract their corresponding values, translating sections of narrative text into structured data pairs. For instance, if a clinical note mentioned "Blood Pressure: 120/80 mmHg," our system would first recognize "Blood Pressure" as an entity and subsequently use regular expressions to extract the value "120/80 mmHg," thereby transforming this unstructured text into a structured format ready for further analysis.
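The entity-then-value extraction step described above can be sketched as follows. The entity list and the regular expressions here are illustrative assumptions for demonstration, not the authors' actual pipeline:

```python
import re

# Hypothetical entity patterns: each maps an entity name to a regex that
# captures its value inside a free-text clinical note.
ENTITY_PATTERNS = {
    "Blood Pressure": re.compile(r"Blood Pressure:\s*(\d{2,3}/\d{2,3}\s*mmHg)"),
    "Cough": re.compile(r"\b(cough)\b", re.IGNORECASE),
}

def extract_entities(note: str) -> dict:
    """Turn a free-text clinical note into structured entity/value pairs."""
    record = {}
    for entity, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(note)
        if match:
            record[entity] = match.group(1)
    return record

note = "Patient reports cough for 3 days. Blood Pressure: 120/80 mmHg."
print(extract_entities(note))
# → {'Blood Pressure': '120/80 mmHg', 'Cough': 'cough'}
```

In a production pipeline the entity recognition would come from a trained NER model rather than a fixed dictionary; the regex step then only has to locate the value adjacent to each recognized entity span.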
A CatNN [30] module was adopted to handle the sparse and high-dimensional features. CatNN first converts a sparse vector into a condensed, low-dimensional representation:

Fig. 2. The overall framework of CheXMed. CheXMed takes clinical notes and frontal-view chest X-ray images as input and outputs the probability that a person has developed pneumonia. A CatNN module handles the sparse and high-dimensional medical record features and outputs the representation vectors of medical records (VM). A convolutional neural network (CNN) represents images as vectors (vectors of images, VI). Finally, the two vectors, VM and VI, are concatenated into one vector representing the patient. We feed the patient vector into a binary Support Vector Machine (SVM) classifier to distinguish pneumonia patients from others.


E_{S_i}(x_i) = \mathrm{embedding\_lookup}(S_i, x_i) \quad (1)

where x_i represents the i-th feature of the medical records. Through back-propagation, S_i learns to store all embeddings related to the i-th feature; E_{S_i}(x_i) then represents the embedding vector that corresponds to x_i.
E_{S_i}(x_i) is then fed into two components, the FM component and the Deep component. A 2-way FM (degree d = 2) captures all single and pairwise interactions between variables. Formally, a scalar \mu is used to weigh order-1 importance, and E_{S_i}(x_i) is used to model order-2 feature interactions:

g_{FM}(x) = \mu_0 + \langle \mu, x \rangle + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle E_{S_i}(x_i), E_{S_j}(x_j) \rangle \, x_i x_j, \quad (2)

where the model parameters to be estimated are \mu_0 \in \mathbb{R}, \mu \in \mathbb{R}^n, and E_S(x) \in \mathbb{R}^{n \times k}. \langle \cdot, \cdot \rangle is the dot product of two vectors of size k. A row E_{S_i}(x_i) within E_S(x) describes the i-th variable with k factors. k \in \mathbb{N}_0^+ is a hyperparameter that defines the dimensionality of the factorization.

• \mu_0 is the global bias.
• \mu_i models the strength of the i-th variable.
• \hat{\mu}_{i,j} = \langle E_{S_i}(x_i), E_{S_j}(x_j) \rangle models the interaction between the i-th and j-th variables. Instead of using a separate model parameter \mu_{i,j} \in \mathbb{R} for each interaction, the FM models the interaction by factorizing it. This is the key point that allows high-quality parameter estimates of higher-order interactions (d \ge 2) under sparsity.

The Deep component, a feed-forward neural network, is employed to learn high-order feature interactions:

g_{Deep}(x) = \mathcal{N}\{ [E_{S_1}(x_1)^T, E_{S_2}(x_2)^T, \ldots, E_{S_n}(x_n)^T]^T \mid \beta \}, \quad (3)

where \mathcal{N}\{x \mid \beta\} represents the multi-layered neural network model, x is the input variable, and \beta is the parameter set.
Combining the two components, we obtain the final expression of the CatNN output:

g_{CatNN}(x) = g_{Deep}(x) + g_{FM}(x) \quad (4)

In the rest of the paper, we call the output of CatNN “vectors of medical records (VM)”.
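A minimal numerical sketch of the CatNN scoring path in Eqs. (1)–(4) — embedding lookup, the 2-way FM, a small feed-forward Deep part, and their sum — may help make the notation concrete. All dimensions and random parameters below are illustrative assumptions, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, vocab = 5, 3, 10          # n features, k embedding factors, vocab size

S = rng.normal(size=(n, vocab, k))   # per-feature embedding tables S_i
mu0 = 0.1                            # global bias mu_0
mu = rng.normal(size=n)              # order-1 weights mu

def fm_score(x_ids, x_vals):
    """Eq. (2): mu0 + <mu, x> + sum_{i<j} <E_i, E_j> x_i x_j."""
    E = np.stack([S[i, x_ids[i]] for i in range(n)])   # Eq. (1) lookups
    score = mu0 + mu @ x_vals
    for i in range(n):
        for j in range(i + 1, n):
            score += (E[i] @ E[j]) * x_vals[i] * x_vals[j]
    return score, E

def deep_score(E, W1, W2):
    """Eq. (3): feed the concatenated embeddings through a small MLP."""
    h = np.maximum(E.reshape(-1) @ W1, 0.0)            # ReLU hidden layer
    return h @ W2

x_ids = rng.integers(0, vocab, size=n)   # category index of each feature
x_vals = np.ones(n)                      # feature values (1 for one-hot)
W1 = rng.normal(size=(n * k, 8))         # illustrative Deep parameters beta
W2 = rng.normal(size=8)

g_fm, E = fm_score(x_ids, x_vals)
g_cat = deep_score(E, W1, W2) + g_fm     # Eq. (4): Deep + FM
print(float(g_cat))
```

The double loop makes the O(n^2) pairwise term of Eq. (2) explicit; in practice FM libraries exploit the factorized form to compute it more efficiently.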
For image processing, we employed CheXNet to convert images into vector representations (termed "vectors of images", or VI). CheXNet (i.e., the Ablated CheXMed VI model detailed later) uses the same network structure as DenseNet121 [23]; a key distinction is CheXNet's additional pre-training process on ImageNet.
Finally, the two vectors, VM and VI, were concatenated into one vector representing the patient. We fed the patient vector into a binary Support Vector Machine (SVM) [31,32] classifier to distinguish pneumonia patients from others.
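The fusion step can be sketched as follows. Random vectors stand in for the real CatNN and CheXNet outputs, and a simple hinge-loss linear SVM trained by sub-gradient descent substitutes for the MATLAB classifier the authors used:

```python
import numpy as np

rng = np.random.default_rng(42)
n_patients, dim_vm, dim_vi = 200, 16, 32

VM = rng.normal(size=(n_patients, dim_vm))   # stand-in medical-record vectors
VI = rng.normal(size=(n_patients, dim_vi))   # stand-in image vectors
X = np.concatenate([VM, VI], axis=1)         # one fused vector per patient
y = rng.choice([-1, 1], size=n_patients)     # +1 pneumonia, -1 other disease

# Linear SVM: minimize lam/2 ||w||^2 + mean hinge loss, by sub-gradient steps.
w, b, lam, lr = np.zeros(X.shape[1]), 0.0, 1e-3, 0.1
for epoch in range(200):
    margins = y * (X @ w + b)
    mask = margins < 1                       # points violating the margin
    if not mask.any():
        break
    w -= lr * (lam * w - (y[mask][:, None] * X[mask]).mean(axis=0))
    b += lr * y[mask].mean()

pred = np.sign(X @ w + b)
print(X.shape, float((pred == y).mean()))
```

The paper uses a (possibly kernelized) SVM via the MATLAB Classification Learner; the linear variant here is only meant to show how the concatenated VM/VI vector feeds a margin classifier.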
We also conducted experiments using image data alone to explore the putative added value of EMR data. It is worth noting that, instead of discriminating pneumonia from "no findings", we formulated our goal as a more challenging task: distinguishing pneumonia from other pulmonary diseases (lung cancer, COPD, emphysema, bronchitis, and respiratory failure). This formulation has clear clinical implications: it is usually other types of lung disease that are an obstacle to identifying pneumonia, even for skilled radiologists.

2.3. Computational complexity

The CatNN embedding, which processes high-dimensional sparse vectors into dense formats, primarily depends on the dimensionality of the input and the embedding size, giving it a complexity of O(n). For the feature interactions of the FM component, linear interactions have a complexity of O(n), while pairwise interactions have a complexity of O(n^2), since interactions between all pairs of features are considered. Hence, the total complexity of the FM is O(n + n^2), which simplifies to O(n^2).
The Deep component's complexity is determined by the architecture of the neural network, which is typically O(l × n^2) for l fully connected layers with n neurons per layer. The CNN employed for image processing, DenseNet121, has its own inherent complexity, mainly determined by the depth of the network and the size of the input image, typically O(d × w × h), where d is the depth and w and h are the width and height of the image, respectively. Lastly, the SVM classifier's complexity is O(n^2) for training. The overall computational complexity of the proposed model is dominated by O(d × w × h), since it is far greater than that of the other components.

2.4. Baselines

We compare CheXMed with the following state-of-the-art baselines:

• DenseNet121 [23], part of the Densely Connected Convolutional Networks family, emphasizes dense connections between layers to reuse features.
• CheXNet [17] is a model specifically tailored for chest X-ray diagnostics and employs the DenseNet121 architecture.
• ResNet18 [33], a variant of the Residual Networks, introduces skip (residual) connections to avoid vanishing-gradient problems.
• VGG16 [34], designed by the Visual Geometry Group, is characterized by its repeated use of 3×3 convolutions and increasing depth.
• GoogLeNet [35], also known as Inception, incorporates "inception modules" that allow for multi-scale feature learning.
• AlexNet [36], one of the pioneering deep architectures, brought attention to deep learning in computer vision with its win in the ImageNet competition in 2012.

The above are classic CNN-based models tailored for chest X-ray diagnostics, differing in the number of layers, the density of connections, and the use of residual connections.

• CNNMD [37] uses dropout in the convolutional part of the network to improve model accuracy.
• MulNet [11] trains neural networks and Bayesian networks concurrently and can deal with multi-modal data.
• Ablated CheXMed VM and Ablated CheXMed VI are two ablated models that aim at exploring the effectiveness of the VM and VI components. Ablated CheXMed VM is CheXMed with the VI component ablated; Ablated CheXMed VI is CheXMed with the VM component ablated.

2.5. Evaluation metrics

For all the experiments in this study, 80 % of the samples were used as the training set, 10 % as the validation set, and the remaining 10 % as the test set. For task 1, we report ROC AUC to make the results comparable with the CheXNet, DenseNet121, and ResNet18 literature. For task 2, we use accuracy, precision, recall, AUC, and F1-score as evaluation metrics (see SI Part B for their calculation methods). All experiments were conducted fifty times, and the mean values and 95 % confidence intervals are reported.
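The evaluation protocol — per-run metrics followed by a mean and 95 % confidence interval over repeated runs — can be sketched as follows, with a normal-approximation CI and synthetic predictions standing in for study data:

```python
import numpy as np

def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = np.mean(y_true == y_pred)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f1

def mean_ci95(values):
    """Mean with a 95% normal-approximation CI over repeated runs."""
    values = np.asarray(values, dtype=float)
    half = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    return values.mean(), values.mean() - half, values.mean() + half

rng = np.random.default_rng(1)
accs = []
for run in range(50):                       # the paper repeats experiments 50x
    y_true = rng.integers(0, 2, size=100)
    # Synthetic classifier that is correct ~80% of the time.
    y_pred = np.where(rng.random(100) < 0.8, y_true, 1 - y_true)
    accs.append(metrics(y_true, y_pred)[0])
print(mean_ci95(accs))
```

This mirrors the reporting format of Tables 1 and 3: a point estimate with a 95 % CI computed across the fifty repetitions.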

2.6. Experiments setup

Keras [38] (version 2.3), a Python [39] interface for artificial neural networks, was used to develop CheXMed. CheXNet, DenseNet121, and ResNet18 were implemented using PyTorch [40] (version 1.11.0), an open-source machine learning framework. The SVM classifier was implemented using the MATLAB Classification Learner. The experimental environment was an Ubuntu Linux server with a Kaby Lake GT2 GPU.

2.7. Data availability and Ethics statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. We have
obtained the ethical approval of this study from the Ethics Committee of Guangdong Second Provincial General Hospital.

3. Results

3.1. The performance divergence in different age groups

We first report the performance of the three popular pneumonia detection models CheXNet, DenseNet121, and ResNet18 in
different age groups in Table 1. CheXNet achieved an AUC of 0.768 in the general population, 0.712 in the elderly, and 0.780 in the
younger people. The performance in the elderly age group is 9.5 % lower than in the younger age group. DenseNet121 and ResNet18
demonstrated patterns similar to CheXNet: DenseNet121 achieved an AUC of 0.757 in the general population, 0.734 in the elderly, and
0.778 in the younger people. The performance in the elderly is 6.0 % lower than in the younger age group. ResNet18 achieved an AUC
of 0.888 in the general population, 0.842 in the elderly, and 0.900 in the younger people. The performance in the elderly is 6.9 % lower
than in the younger age group. The divergences were consistently significant (p < 0.001). Note that in the original literature about the
three models, such age-specific results were not shown.

Table 1
AUC of CheXNet, DenseNet121, and ResNet18 for different age groups in different test data sets.

Model         Test data                           Number of cases   AUC (95 % CI)         t-test
CheXNet       ChestX-ray14 (general population)   21,813            0.768 (0.765–0.770)   –
              ChestX-ray14 (older people)          5,262            0.712 (0.709–0.714)   <0.001
              ChestX-ray14 (younger people)       16,551            0.780 (0.777–0.783)   <0.001
DenseNet121   CheXpert (general population)        5,522            0.757 (0.751–0.763)   –
              CheXpert (older people)              3,119            0.734 (0.730–0.738)   <0.001
              CheXpert (younger people)            2,403            0.778 (0.774–0.783)   <0.001
ResNet18      RSNA (general population)            2,684            0.888 (0.885–0.893)   –
              RSNA (older people)                    639            0.842 (0.840–0.845)   <0.001
              RSNA (younger people)                2,045            0.900 (0.899–0.901)   <0.001

3.2. Study sample characteristics

Regarding the proposed local hospital analyses, 1,275 pneumonia patients and 1,297 counterparts (i.e., patients with lung cancer, COPD, emphysema, bronchitis, or respiratory failure) were included. Among them, 679 (53.3 %) of the pneumonia patients and 676 (52.1 %) of the counterparts were men. Of the 1,275 pneumonia patients, 879 were aged between 60 and 79, and 397 (31.1 %) were aged 80 or older. The age distribution is similar for the counterparts. Study sample characteristics are shown in Table 2.

3.3. Detecting pneumonia in the elderly

Results of CheXMed and baseline models in detecting pneumonia in the elderly are shown in Table 3. CheXMed outperformed the baseline models on all metrics. Taking CheXNet as an example, CheXMed outperforms CheXNet by 10.1 % in accuracy, 6.6 % in precision, 20.5 % in recall, and 14.4 % in F1-score. This observation resonates with our earlier finding that CheXNet cannot achieve a fair performance for older people. It also indicates that, apart from X-ray images, clinical notes provide important auxiliary information for identifying pneumonia. The ROC curves of CheXNet and CheXMed are shown in Fig. 3 to demonstrate the changes in sensitivity with specificity.
We conducted two ablation experiments to further explore the effectiveness of the VM and VI components. The two ablated models are called Ablated CheXMed VM (CheXMed with the VI component ablated) and Ablated CheXMed VI (CheXMed with the VM component ablated). The accuracy, precision, recall, and F1-score for Ablated CheXMed VM are 0.622 (95 % CI: 0.620–0.624), 0.654 (95 % CI: 0.650–0.669), 0.520 (95 % CI: 0.517–0.522), and 0.579 (95 % CI: 0.578–0.581), respectively. For Ablated CheXMed VI, they are 0.644 (95 % CI: 0.642–0.648), 0.679 (95 % CI: 0.675–0.685), 0.535 (95 % CI: 0.533–0.538), and 0.598 (95 % CI: 0.597–0.602), respectively. Both ablated models performed worse than CheXMed, demonstrating the effectiveness of concatenating the two representations.

3.4. Interpretability

To explore which features in the medical records are important for pneumonia detection, we trained a Random Forest (RF) model using only medical records (that is, without images) and output the top 10 important variables. Since CheXMed is a CNN-endowed black box that is hardly interpretable, we took a step back and applied the classic, explainable machine learning model RF to the same task to gain some explainability. We anticipate that RF can provide insights into which features in the medical records are most important for detecting pneumonia, although we recognize that this analysis cannot mimic CheXMed exactly. Results of this analysis are shown in Table 4.
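The interpretability analysis can be sketched as follows. The feature names and synthetic data are illustrative assumptions; the label is wired to depend mostly on two features so the impurity-based ranking has something to recover:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# A few record-level features borrowed from Table 4, purely for illustration.
features = ["cough", "expectoration", "d_dimer", "fatigue", "vomit"]
n = 500
X = rng.random((n, len(features)))
# Synthetic label depending mostly on "cough" and partly on "d_dimer".
y = ((0.7 * X[:, 0] + 0.3 * X[:, 2]) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(features, rf.feature_importances_),
                 key=lambda t: -t[1])
for name, score in ranking:
    print(f"{name:14s}{score:.3f}")
```

On real records one would report the top-10 variables as in Table 4; permutation importance is a common alternative when features are correlated.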
Table 4 presents important clinical features for distinguishing pneumonia from its counterparts, which include lung cancer, COPD, emphysema, bronchitis, and respiratory failure. Notably, coughing was more pronounced in the counterparts (50.7 %) than in those diagnosed with pneumonia (28.8 %). This may be attributable to the chronic irritation or damage to the respiratory pathways linked to these conditions, such as tumor obstruction in lung cancer or persistent inflammation in COPD and bronchitis. Interestingly, the pressure of end-tidal carbon dioxide shows a marked deviation, with pneumonia patients predominantly registering between 35 and 45 mmHg (92.7 %), whereas a larger proportion of counterparts exhibited elevated levels exceeding 45 mmHg (16.0 %). This could signify efficient carbon dioxide clearance in pneumonia, whereas conditions like COPD or respiratory failure might hamper effective gas exchange. Expectoration was significantly more common in the counterparts (38.2 %) than in pneumonia (18.6 %), perhaps due to the persistent mucus production seen in chronic conditions like bronchitis. D-Dimer levels were comparatively higher in the pneumonia cohort (48.4 % vs 33.7 %), indicating a potential underlying coagulation dysfunction or a response to inflammation. Fatigue was also more frequently reported among pneumonia cases (24.3 %), possibly reflecting the systemic impact of the infection. The incidence of gastric distention in pneumonia (3.8 % vs 9.4 % in counterparts) may point to non-specific symptomatology or concurrent gastrointestinal complications. Noteworthy is the slightly higher prevalence of elevated N-terminal B-type pro natriuretic peptide in pneumonia patients (34.4 %), hinting at possible cardiac involvement or strain. Vomiting was more prevalent in pneumonia (9.8 %), suggesting an association with systemic effects or certain causative organisms. Lastly, triglyceride concentrations mostly resided between 0.45 and 1.69 mmol/L for both cohorts; although subtle deviations exist, this parameter has limited power to differentiate between the conditions.

Table 2
Characteristics of the study samples.

Characteristic    Pneumonia      Counterparts
                                 Lung cancer    COPD*         Emphysema      Bronchitis    Respiratory failure
Number of cases   1,275          208            113           584            150           242
Age 60–69         503 (39.5 %)   95 (33.3 %)    73 (64.6 %)   130 (22.3 %)   31 (20.7 %)   104 (43.0 %)
Age 70–79         375 (29.4 %)   63 (35.5 %)    32 (28.3 %)   199 (34.1 %)   79 (52.7 %)   89 (36.8 %)
Age ≥80           397 (31.1 %)   50 (30.1 %)    8 (7.1 %)     255 (43.7 %)   40 (26.7 %)   49 (20.2 %)
Male sex          679 (53.3 %)   109 (52.4 %)   60 (53.1 %)   312 (53.4 %)   80 (53.3 %)   129 (53.3 %)

* COPD stands for chronic obstructive pulmonary disease.


Table 3
The performance of CheXMed and the baseline models.

Model                 Accuracy (95 % CI)    Precision (95 % CI)   Recall (95 % CI)      F1-score (95 % CI)
CheXNet               0.645 (0.642–0.648)   0.680 (0.675–0.685)   0.535 (0.533–0.538)   0.599 (0.597–0.602)
DenseNet121           0.623 (0.620–0.626)   0.655 (0.651–0.670)   0.521 (0.518–0.524)   0.580 (0.578–0.583)
ResNet18              0.610 (0.605–0.616)   0.617 (0.612–0.622)   0.543 (0.539–0.546)   0.577 (0.573–0.581)
VGG16                 0.563 (0.556–0.569)   0.551 (0.545–0.558)   0.509 (0.505–0.513)   0.529 (0.525–0.532)
GoogLeNet             0.609 (0.605–0.613)   0.580 (0.576–0.584)   0.532 (0.530–0.535)   0.555 (0.551–0.560)
AlexNet               0.618 (0.616–0.620)   0.593 (0.590–0.596)   0.538 (0.535–0.540)   0.564 (0.562–0.566)
CNNMD                 0.641 (0.638–0.644)   0.677 (0.675–0.679)   0.533 (0.532–0.534)   0.596 (0.595–0.607)
MulNet                0.611 (0.608–0.613)   0.642 (0.577–0.586)   0.524 (0.530–0.535)   0.578 (0.551–0.560)
Ablated CheXMed VM    0.622 (0.620–0.624)   0.654 (0.650–0.669)   0.520 (0.517–0.522)   0.579 (0.578–0.581)
Ablated CheXMed VI    0.644 (0.642–0.648)   0.679 (0.675–0.685)   0.535 (0.533–0.538)   0.598 (0.597–0.602)
CheXMed               0.746 (0.739–0.749)   0.746 (0.741–0.750)   0.740 (0.738–0.743)   0.743 (0.737–0.748)

Fig. 3. The ROC curves for CheXMed and CheXNet.

Table 4
Features that are important for pneumonia detection in medical records.

Variable                                      Pneumonia (n = 1,275)   Counterparts (n = 1,297)
Cough                                         367 (28.8 %)            658 (50.7 %)
Pressure end-tidal carbon dioxide
  1 (>45 mmHg)                                45 (3.5 %)              208 (16.0 %)
  0 (35–45 mmHg)                              1,182 (92.7 %)          1,066 (82.2 %)
  −1 (<35 mmHg)                               48 (3.8 %)              23 (1.8 %)
Expectoration                                 237 (18.6 %)            495 (38.2 %)
D-Dimer                                       617 (48.4 %)            437 (33.7 %)
Feel tired                                    310 (24.3 %)            188 (14.5 %)
Gastric distention                            48 (3.8 %)              28 (9.4 %)
N-terminal b-type pro natriuretic peptide     439 (34.4 %)            367 (28.3 %)
Vomit                                         125 (9.8 %)             65 (5.0 %)
Triglyceride
  1 (>1.69 mmol/L)                            108 (8.5 %)             101 (7.8 %)
  0 (0.45–1.69 mmol/L)                        1,158 (90.8 %)          1,191 (91.8 %)
  −1 (<0.45 mmol/L)                           9 (0.7 %)               5 (0.4 %)

4. Discussion

The disease burden of pneumonia in the elderly is much heavier than that in young people. Existing AI-endowed tools were typically trained and validated in the general population. How effective they are for the elderly specifically, and how we can harness the salient information contained in medical records to improve pneumonia detection, remained unexplored. In this study,


we have two main findings against this backdrop: (a) CheXNet, the most used pneumonia detection model trained on large-scale X-ray image data, performed unequally for the elderly and the younger people; the AUC is 9.9 % higher for people aged 14–59 than for people aged 60 years and above. Analyses using the other two pneumonia diagnostic tools, DenseNet121 and ResNet18, further resonated with this finding. (b) Apart from chest X-ray images, the information contained in patients' medical records is a powerful supplement for pneumonia detection. In a model developed and trained using local hospital data, performance is raised by up to 6.6 % in precision and 20.5 % in recall, compared to the model trained using chest images alone.
The contribution of this paper is threefold. First, this study showed for the first time that there is an age effect on the performance of X-ray-image-based pneumonia detection models. Such knowledge is important because it indicates that age stratification is essential for studies in this line of research: models achieving good performance for young adults or the general population are not guaranteed to be useful for the elderly. It also indicates that health care personnel must be very careful when applying AI-endowed pneumonia detection tools to different age groups, even though such tools may have been tested in the general population. Second, this study provided solid evidence that, apart from X-ray images, patients' medical records are also crucial for identifying pneumonia. Third, a dataset containing images and clinical notes is released. To the best of the authors' knowledge, this open-source release is the first of its kind in terms of lung disease multimodal data. The data can be reached at https://github.com/r1293497424h/Pneumonia-dataset.git.
One possible explanation for the age-related discrepancy is that the quality of chest X-ray images in geriatric patients presents obstacles for pneumonia detection due to comorbidity, lack of cooperation, and difficulty in maintaining posture, whereas chest X-ray images of younger people are clearer, with few areas of comorbidity opacification. In a study that used chest X-ray images to detect pneumonia in pediatric patients [41], the authors even achieved an AUC of 0.97, much higher than comparable research in adults. This is because pediatric pneumonia typically exhibits a clear consolidation in X-ray images, almost without any disturbances.
Several limitations of this study should be noted. First, our study did not encompass all potential risk factors associated with pneumonia. Missing from our medical records data are crucial behavioral factors such as excessive alcohol consumption, as well as socio-economic variables and genetic predispositions. Incorporating these features may further enhance the performance of our approach. Furthermore, the model we devised was anchored in data sourced from a single tertiary hospital. While we have confidence in its efficacy, applying it universally warrants caution; to assert broader applicability, additional validation in diverse medical settings is necessary. Lastly, it is worth highlighting the burgeoning advancements in Large Language Models (LLMs). Their proven track record in robust generalization and feature discernment makes them prime candidates for objectives akin to ours. Future research could benefit from tapping into LLMs, potentially improving the precision of pneumonia detection.

5. Conclusions

Pneumonia detection models that achieve good performance for young adults or the general population are not guaranteed to be useful for the elderly. Even when such tools have been tested in the general population, medical personnel must be careful about applying them to older people. Clinical records contain salient information that can help improve pneumonia detection. The novel diagnostic tool proposed in this paper improves pneumonia detection accuracy and is especially useful for the elderly in parts of the world where access to experienced radiologists is limited.

CRediT authorship contribution statement

Hao Ren: Methodology, Data curation, Formal analysis, Writing – original draft. Fengshi Jing: Visualization, Data curation,
Funding acquisition, Methodology, Writing – review & editing. Zhurong Chen: Data curation. Shan He: Data curation, Resources.
Jiandong Zhou: Formal analysis, Methodology, Writing – review & editing. Le Liu: Software, Writing – review & editing. Ran Jing:
Project administration, Data curation. Junzhang Tian: Conceptualization, Funding acquisition, Supervision. Qingpeng Zhang:
Methodology, Conceptualization, Investigation, Validation, Writing – review & editing. Zhongzhi Xu: Conceptualization, Formal
analysis, Methodology, Writing – review & editing. Weibin Cheng: Software, Supervision, Conceptualization, Funding acquisition,
Methodology, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgment

This study is supported by the National Key R&D Program of China (2021YFC2009400), the National Doctoral Workstation Foundation of Guangdong Second Provincial General Hospital (2023BSGZ014), and the Guangzhou Science and Technology Project (SL2022A04J01130).


Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ins.2023.119854.

References

[1] O. Ruuskanen, E. Lahti, L.C. Jennings, D.R. Murdoch, Viral pneumonia, The Lancet. 377 (2011) 1264–1275, https://doi.org/10.1016/S0140-6736(10)61459-6.
[2] I. Koivula, M. Sten, P.H. Makela, Risk factors for pneumonia in the elderly, Am J Med. 96 (1994) 313–320, https://doi.org/10.1016/0002-9343(94)90060-4.
[3] C. Troeger, B. Blacker, I.A. Khalil, P.C. Rao, J. Cao, S.R.M. Zimsen, Estimates of the global, regional, and national morbidity, mortality, and aetiologies of lower
respiratory infections in 195 countries, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016, Lancet Infect Dis. 18 (2018) 1191–1210,
https://doi.org/10.1016/S1473-3099(18)30310-4.
[4] F.W. Arnold, A.M. Reyes Vega, V. Salunkhe, S. Furmanek, C. Furman, L. Morton, A. Faul, P. Yankeelov, J.A. Ramirez, Older Adults Hospitalized for Pneumonia
in the United States: Incidence, Epidemiology, and Outcomes, J Am Geriatr Soc. 68 (2020) 1007–1014, https://doi.org/10.1111/jgs.16327.
[5] Ş. Öztürk, A. Alhudhaif, K. Polat, Attention-based end-to-end CNN framework for content-based X-ray image retrieval, Turkish Journal of Electrical Engineering and Computer Sciences. 29 (2021) 2680–2693, https://doi.org/10.3906/elk-2105-242.
[6] Ş. Öztürk, E. Yiğit, U. Özkaya, Fused deep features based classification framework for COVID-19 classification with optimized MLP, Konya Journal of Engineering Sciences. 8 (2020) 15–27, https://doi.org/10.36306/konjes.821782.
[7] Ş. Öztürk, E. Çelik, T. Çukur, Content-based medical image retrieval with opponent class adaptive margin loss, Inf Sci. 637 (2023) 118938, https://doi.org/10.1016/j.ins.2023.118938.
[8] A.I. Khan, J.L. Shah, M.M. Bhat, CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images, Comput Methods Programs
Biomed. 196 (2020) 105581, https://doi.org/10.1016/J.CMPB.2020.105581.
[9] E. Ayan, H.M. Unver, Diagnosis of Pneumonia from Chest X-Ray Images Using Deep Learning, in: 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), IEEE, 2019, pp. 1–5, https://doi.org/10.1109/EBBT.2019.8741582.
[10] A.K. Jaiswal, P. Tiwari, S. Kumar, D. Gupta, A. Khanna, J.J.P.C. Rodrigues, Identifying pneumonia in chest X-rays: A deep learning approach, Measurement. 145
(2019) 511–518, https://doi.org/10.1016/J.MEASUREMENT.2019.05.076.
[11] H. Ren, A.B. Wong, W. Lian, W. Cheng, Y. Zhang, J. He, Q. Liu, J. Yang, C.J. Zhang, K. Wu, H. Zhang, Interpretable Pneumonia Detection by Combining Deep
Learning and Explainable Models With Multisource Data, IEEE Access. 9 (2021) 95872–95883, https://doi.org/10.1109/ACCESS.2021.3090215.
[12] A.M. Ismael, A. Şengür, Deep learning approaches for COVID-19 detection based on chest X-ray images, Expert Syst Appl. 164 (2021) 114054, https://doi.org/
10.1016/j.eswa.2020.114054.
[13] V. Chouhan, S.K. Singh, A. Khamparia, D. Gupta, P. Tiwari, C. Moreira, R. Damaševičius, V.H.C. de Albuquerque, A Novel Transfer Learning Based Approach for
Pneumonia Detection in Chest X-ray Images, Applied Sciences. 10 (2020) 559, https://doi.org/10.3390/app10020559.
[14] A. Narin, C. Kaya, Z. Pamuk, Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks, Pattern Analysis and Applications. 24 (2021) 1207–1220, https://doi.org/10.1007/s10044-021-00984-y.
[15] T. Ozturk, M. Talo, E.A. Yildirim, U.B. Baloglu, O. Yildirim, U. Rajendra Acharya, Automated detection of COVID-19 cases using deep neural networks with X-
ray images, Comput Biol Med. 121 (2020) 103792, https://doi.org/10.1016/j.compbiomed.2020.103792.
[16] S. Minaee, R. Kafieh, M. Sonka, S. Yazdani, G. Jamalipour Soufi, Deep-COVID: Predicting COVID-19 from chest X-ray images using deep transfer learning, Med
Image Anal. 65 (2020) 101794, https://doi.org/10.1016/j.media.2020.101794.
[17] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, M.P. Lungren, A.Y. Ng, CheXNet: Radiologist-Level
Pneumonia Detection on Chest X-Rays with Deep Learning, (2017). http://arxiv.org/abs/1711.05225.
[18] A. Torres, M. El-Ebiary, R. Riquelme, M. Ruiz, R. Celis, Community-acquired pneumonia in the elderly, Semin Respir Infect. 14 (1999) 173–183.
[19] B.A. Cunha, Pneumonia in the elderly, Clinical Microbiology and Infection. 7 (2001) 581–588, https://doi.org/10.1046/j.1198-743x.2001.00328.x.
[20] V. Kaplan, D.C. Angus, M.F. Griffin, et al., Hospitalized Community-acquired Pneumonia in the Elderly, Am J Respir Crit Care Med. 165 (2002) 766–772,
https://doi.org/10.1164/ajrccm.165.6.2103038.
[21] J.E. Wipf, B.A. Lipsky, J.V. Hirschmann, E.J. Boyko, J. Takasugi, R.L. Peugeot, C.L. Davis, Diagnosing Pneumonia by Physical Examination, Arch Intern Med.
159 (1999) 1082, https://doi.org/10.1001/archinte.159.10.1082.
[22] C.J. Babcook, G.R. Norman, C.L. Coblentz, Effect of Clinical History on the Interpretation of Chest Radiographs in Childhood Bronchiolitis, Invest Radiol. 28
(1993) 214–217, https://doi.org/10.1097/00004424-199303000-00005.
[23] G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely Connected Convolutional Networks, in: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, 2017: pp. 2261–2269. https://doi.org/10.1109/CVPR.2017.243.
[24] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
IEEE, 2016: pp. 770–778. https://doi.org/10.1109/CVPR.2016.90.
[25] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R.M. Summers, ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised
Classification and Localization of Common Thorax Diseases, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017: pp.
3462–3471. https://doi.org/10.1109/CVPR.2017.369.
[26] J. Irvin, P. Rajpurkar, M. Ko, Y. Yu, S. Ciurea-Ilcus, C. Chute, H. Marklund, B. Haghgoo, R. Ball, K. Shpanskaya, J. Seekins, D.A. Mong, S.S. Halabi, J.K. Sandberg, R. Jones, D.B. Larson, C.P. Langlotz, B.N. Patel, M.P. Lungren, A.Y. Ng, CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison, in: Proceedings of the AAAI Conference on Artificial Intelligence. 33 (2019) 590–597, https://doi.org/10.1609/aaai.v33i01.3301590.
[27] G. Shih, C.C. Wu, S.S. Halabi, M.D. Kohli, L.M. Prevedello, T.S. Cook, Augmenting the National Institutes of Health Chest Radiograph Dataset with Expert
Annotations of Possible Pneumonia, Radiol Artif Intell. 1 (2019) e180041.
[28] B. Mohit, Named Entity Recognition, in: Natural Language Processing of Semitic Languages, Springer, 2014, pp. 221–245, https://doi.org/10.1007/978-3-642-45358-8_7.
[29] D. Sui, Y. Chen, K. Liu, J. Zhao, S. Liu, Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network, in: Proceedings of
the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing
(EMNLP-IJCNLP), Association for Computational Linguistics, Stroudsburg, PA, USA, 2019: pp. 3828–3838. https://doi.org/10.18653/v1/D19-1396.
[30] G. Ke, Z. Xu, J. Zhang, J. Bian, T.Y. Liu, DeepGBM: A deep learning framework distilled by GBDT for online prediction tasks, in: Proceedings of the ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2019: pp. 384–394. https://doi.org/10.1145/
3292500.3330858.
[31] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational
Learning Theory - COLT ’92, ACM Press, New York, New York, USA, 1992: pp. 144–152. https://doi.org/10.1145/130385.130401.
[32] V. Vapnik, S. Golowich, A. Smola, Support vector method for function approximation, regression estimation and signal processing, Adv Neural Inf Process Syst. 9
(1996).
[33] M. Odusami, R. Maskeliūnas, R. Damaševičius, T. Krilavičius, Analysis of Features of Alzheimer’s Disease: Detection of Early Stage from Functional Brain
Changes in Magnetic Resonance Images Using a Finetuned ResNet18 Network, Diagnostics. 11 (2021) 1071, https://doi.org/10.3390/diagnostics11061071.
[34] D. Theckedath, R.R. Sedamkar, Detecting Affect States Using VGG16, ResNet50 and SE-ResNet50 Networks, SN Comput Sci. 1 (2020) 79, https://doi.org/
10.1007/s42979-020-0114-9.


[35] P. Ballester, R. Araujo, On the Performance of GoogLeNet and AlexNet Applied to Sketches, Proceedings of the AAAI Conference on Artificial Intelligence. 30
(2016). https://doi.org/10.1609/aaai.v30i1.10171.
[36] H. Ismail Fawaz, B. Lucas, G. Forestier, C. Pelletier, D.F. Schmidt, J. Weber, G.I. Webb, L. Idoumghar, P.-A. Muller, F. Petitjean, InceptionTime: Finding AlexNet
for time series classification, Data Min Knowl Discov. 34 (2020) 1936–1962, https://doi.org/10.1007/s10618-020-00710-y.
[37] P. Szepesi, L. Szilágyi, Detection of pneumonia using convolutional neural networks and deep learning, Biocybernetics and Biomedical Engineering. 42 (2022) 1012–1022, https://doi.org/10.1016/j.bbe.2022.08.001.
[38] J. Moolayil, An Introduction to Deep Learning and Keras, in: Learn Keras for Deep Neural Networks, Apress, Berkeley, CA, 2019: pp. 1–16. https://doi.org/
10.1007/978-1-4842-4240-7_1.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research. 12 (2011) 2825–2830.
[40] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An Imperative Style, High-Performance Deep Learning Library, Adv Neural Inf Process Syst. 32 (2019).
[41] D.S. Kermany, M. Goldbaum, W. Cai, C.C.S. Valentim, H. Liang, S.L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, J. Dong, M.K. Prasadha, J. Pei, M.Y.L. Ting,
J. Zhu, C. Li, S. Hewett, J. Dong, I. Ziyar, A. Shi, R. Zhang, L. Zheng, R. Hou, W. Shi, X. Fu, Y. Duan, V.A.N. Huu, C. Wen, E.D. Zhang, C.L. Zhang, O. Li, X. Wang,
M.A. Singer, X. Sun, J. Xu, A. Tafreshi, M.A. Lewis, H. Xia, K. Zhang, Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning, Cell.
172 (2018) 1122–1131.e9, https://doi.org/10.1016/j.cell.2018.02.010.
