Bae 2021
https://doi.org/10.1007/s10278-021-00499-2
Abstract
This study aimed to develop a method for detection of femoral neck fracture (FNF), including displaced and non-displaced fractures, using a convolutional neural network (CNN) with plain X-rays, and to validate its use across hospitals through internal and external validation sets. This is a retrospective study using hip and pelvic anteroposterior films for training and detecting femoral neck fracture through an 18-layer residual neural network (ResNet18) with a convolutional block attention module (CBAM++). The study was performed at two tertiary hospitals between February and May 2020 and used data from January 2005 to December 2018. Our primary outcome was favorable performance for diagnosis of femoral neck fracture from negative studies in our dataset. We described the outcomes as area under the receiver operating characteristic curve (AUC), accuracy, Youden index, sensitivity, and specificity. A total of 4,189 images, containing 1,109 positive images (332 non-displaced and 777 displaced) and 3,080 negative images, were collected from two hospitals. The test values after training with one hospital dataset were 0.999 AUC, 0.986 accuracy, 0.960 Youden index, 0.966 sensitivity, and 0.993 specificity. Values of external validation with the other hospital dataset were 0.977, 0.971, 0.920, 0.939, and 0.982, respectively. Values of the merged hospital datasets were 0.987, 0.983, 0.960, 0.973, and 0.987, respectively. A CNN algorithm for FNF detection in both displaced and non-displaced fractures using plain X-rays could be used in other hospitals to screen for FNF after training with images from the hospital of interest.
* Jaehoon Oh
  ojjai@hanmail.net; ohjae7712@gmail.com
* Tae Hyun Kim
  taehyunkim@hanyang.ac.kr

1 Department of Emergency Medicine, College of Medicine, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 04763, Republic of Korea
2 Department of Computer Science, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 04763, Republic of Korea
3 Machine Learning Research Center for Medical Data, Hanyang University, Seoul, Republic of Korea
4 Department of Otolaryngology – Head and Neck Surgery, College of Medicine, Hanyang University, Seoul, Republic of Korea
5 Department of HY-KIST Bio-Convergence, College of Medicine, Hanyang University, Seoul, Republic of Korea
6 Department of Emergency Medicine, College of Medicine, Chung-Ang University, Seoul, Republic of Korea
7 Department of Emergency Medicine, Seoul National University Bundang Hospital, Gyeonggi-do, Republic of Korea
Journal of Digital Imaging
that was not indicative of FNF. Their reports were stated by radiologists as "unremarkable study," "non-specific finding," or "no definite acute lesion." We also excluded images with severe deformation of anatomical structures caused by an acute fracture in another area or by past damage, but did not exclude patients with implants from past surgery in the absence of anatomical deformation. A relatively larger number of non-fracture images than fracture images was obtained at random, with a ratio of fracture to non-fracture images of 1:3. We collected these images from the same hospitals and within the same time period.

X-ray images were extracted using the picture archiving and communication system (PACS, Piview, INFINITT Healthcare, Seoul, South Korea) and stored in .jpeg format. No personal information was included when saving images for data collection, and data were obtained without personal identifying data. In addition, arbitrary numbers were assigned to images, which were then coded and managed.

Data Pre-processing and Augmentation

In general, medical images acquired from different sources have different sizes (i.e., resolutions) and aspect ratios. Moreover, as it is difficult to obtain large training datasets with ground-truth labels, proper pre-processing and data-augmentation steps are needed. First, we added zeros to the given input images to create square shapes and resized the images to a fixed 500 × 500 pixels. Next, to increase the amount of training data and secure robustness to geometric transformations such as scale, translation, and rotation, we augmented the training images by applying random transformations (e.g., flip, flop, or rotation) and randomly cropping 450 × 450 patches during the training process. Although the resulting cropped image includes both hips within the processed input image, our data pre-processing and augmentation do not aim to select the regions of interest (i.e., hip regions).

Fracture Detection by Image Classification

The overall network architecture of the proposed method for FNF detection was based on image classifiers, since our fracture diagnosis task can be categorized as a typical binary classification problem. Our architecture was based on CBAM, which includes residual neural network (ResNet) and spatial attention modules [31, 32]. Specifically, ResNet uses residual mapping to successfully address the issue of vanishing gradients in a CNN. This residual mapping, achieved using skip connections between layers, allows the model to contain many layers. CBAM integrates an attention module with ResNet and improves classification performance with a few additional parameters in feed-forward CNNs.
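The padding, resizing, and cropping pipeline described above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' code: nearest-neighbour resizing and 90-degree rotations stand in for the unspecified resampling and rotation details, and `pad_to_square`, `resize_nearest`, and `augment` are hypothetical helper names.

```python
import numpy as np

def pad_to_square(img):
    """Zero-pad a 2-D image so that height == width (the paper's first step)."""
    h, w = img.shape
    size = max(h, w)
    out = np.zeros((size, size), dtype=img.dtype)
    top, left = (size - h) // 2, (size - w) // 2
    out[top:top + h, left:left + w] = img
    return out

def resize_nearest(img, size=500):
    """Nearest-neighbour resize to size x size (a stand-in for the resampling used)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def augment(img, rng, crop=450):
    """Random flip/flop/rotation plus a random 450 x 450 crop, as in the paper."""
    if rng.random() < 0.5:
        img = np.flipud(img)                  # vertical flip ("flop")
    if rng.random() < 0.5:
        img = np.fliplr(img)                  # horizontal flip
    img = np.rot90(img, k=rng.integers(4))    # 90-degree rotation as a simple proxy
    h, w = img.shape
    top = rng.integers(h - crop + 1)
    left = rng.integers(w - crop + 1)
    return img[top:top + crop, left:left + crop]

rng = np.random.default_rng(0)
x = rng.random((480, 600))             # a fake radiograph
x = resize_nearest(pad_to_square(x))   # 600 x 600 -> 500 x 500
patch = augment(x, rng)                # 450 x 450 training patch
```

Because the crop is smaller than the padded input, each epoch sees slightly shifted views of the same pelvis, which is what provides the robustness to translation noted above.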
CBAM sequentially generates attention maps along the two dimensions of channel and space in the given input feature map, and these attention maps refine the input feature map.

Fig. 2 Architecture and network comparison of deep learning neural networks for detection of FNF. a Architecture of ResNet18 with attention module. ResNet18 is composed of 8 residual blocks; every two blocks belong to the same stage and have the same number of output channels. b Diagram of CBAM++. The two outputs pooled along the channel axis are resized by additional pooling operations according to stage number and forwarded to a convolution layer. c AUC of internal validation with the Hospital A dataset and the number of network parameters. The parameter count of ResNet18 with CBAM++ is the lowest, and its AUC value is the highest.

Effective Attention via Coarse-to-Fine Approaches

The CBAM module functions well in an image-classification task. However, since low-level activation maps are acquired in the early stages of the CNN, it is difficult to accurately generate spatial attention maps using the spatial attention module in CBAM. Therefore, we slightly modified the spatial attention module in the early stages of the CNN so that a coarse attention map could be calculated. As shown in Fig. 2 (a) and (b), we applied average-pooling and max-pooling operations along the channel axis and concatenated them. Subsequently, additional average-pooling operations were applied before the convolution layer; these average-pooling operations produce the intermediate feature map. By decreasing the number of average-pooling operations in each stage, we obtain activation maps of sizes (H/8, W/8), (H/4, W/4), (H/2, W/2), and (H, W) after average pooling in each stage of the network. On this intermediate feature map, we applied a convolution layer with filter size 5 to encode areas to emphasize and suppress. We increased the resolution to H × W using bilinear interpolation so that the attention map has the same size as the input feature map. Finally, we multiplied the attention map with the input feature map. Using our modified attention module, we generated more accurate attention maps with a smaller number of convolutional operations than the original CBAM, even though the number of parameters is equivalent.

Experiments

To verify the performance of the proposed method, we performed the following two steps. (1) Phase 1: the Hospital A dataset was separated into training data (80%), internal validation data (10%), and test data (10%). We determined the optimal cut-off value using results from internal validation, then performed the test with the Hospital A test dataset and external validation with the Hospital B dataset. (2) Phase 2: training, internal validation, and testing were conducted with all data from Hospitals A and B, using 80%, 10%, and 10% of each dataset, respectively.

Effects of Data Augmentation

To demonstrate the performance of our data augmentation, we provide AUC values with and without the augmentation technique. The AUC value on the Hospital A dataset was 0.880 when data augmentation was not applied during training. In contrast, with our data augmentation, this AUC value was 0.991. Data augmentation thus has a significant impact when the training set of medical images is insufficient.

Network Comparison for Detection of FNF

We conducted experiments with four types of attention modules: CBAM (with spatial attention and with channel attention), CBAM− (with spatial attention and without channel attention), CBAM+ (with the proposed spatial attention and with channel attention), and CBAM++ (with the proposed spatial attention and without channel attention), using ResNet18 and ResNet50 as baselines. We trained the networks under the same conditions and evaluated the performance of each module. Figure 2c shows the number of parameters in each network and the AUC value on the internal validation set from Hospital A. Although our training set was small, we achieved AUC values similar to those of a larger network. Among the modules, ResNet18 with CBAM++ was the most efficient, with the smallest number of parameters and the highest AUC value. Therefore, we used ResNet18 with CBAM++ as our final model.

Visualization and Verification of the Medical Diagnosis

The proposed method is a computer-aided diagnostic system aimed at helping radiologists and emergency doctors in medical imaging analysis. Therefore, we not only classify whether the input X-ray image includes fractured parts but also visualize suspicious parts. For visualization, we employed Grad-CAM to highlight regions of the image that are important for diagnosis, since the proposed network is composed of CNNs with fully connected layers [33]. Accordingly, Grad-CAM was applied to the last convolutional layer, placed before the fully connected layer, to verify the medical diagnosis.

Primary Outcomes and Validation

Our primary outcome was favorable performance of detection of FNF from negative studies in our dataset. For validation, we evaluated the model with accuracy, Youden index, and AUC. Accuracy is the fraction of correct predictions over total predictions [34]. The Youden index is calculated as [(sensitivity + specificity) − 1] and is the vertical distance between the 45-degree line and the point on the ROC curve. Sensitivity, also known as recall, is the fraction of correct predictions over all FNF cases, and specificity is the fraction of correct normal predictions over all normal cases. The AUC is the area under the ROC curve, which plots the true positive rate (sensitivity) against the false-positive rate (1 − specificity).
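The outcome measures defined above can be computed directly from a confusion matrix and a threshold sweep. The following is a minimal NumPy sketch with hypothetical helper names, not the study's code; in practice a rank-based AUC or a library routine would normally be used.

```python
import numpy as np

def confusion_counts(y_true, score, cutoff):
    """Counts at a given cut-off: a prediction is positive (fracture) if score >= cutoff."""
    pred = score >= cutoff
    tp = int(np.sum(pred & (y_true == 1)))
    tn = int(np.sum(~pred & (y_true == 0)))
    fp = int(np.sum(pred & (y_true == 0)))
    fn = int(np.sum(~pred & (y_true == 1)))
    return tp, tn, fp, fn

def outcome_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and Youden index as defined in the text."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # correct predictions over all FNF cases
    specificity = tn / (tn + fp)          # correct normals over all normal cases
    youden = sensitivity + specificity - 1
    return accuracy, sensitivity, specificity, youden

def auc_trapezoid(y_true, score):
    """AUC: sweep cut-offs, trace the ROC (FPR, TPR) curve, integrate by trapezoids."""
    fpr, tpr = [0.0], [0.0]
    for t in np.sort(np.unique(score))[::-1]:
        tp, tn, fp, fn = confusion_counts(y_true, score, t)
        tpr.append(tp / (tp + fn))
        fpr.append(fp / (fp + tn))
    return sum((f1 - f0) * (t0 + t1) / 2
               for f0, f1, t0, t1 in zip(fpr[:-1], fpr[1:], tpr[:-1], tpr[1:]))
```

Unlike accuracy, sensitivity, specificity, and the Youden index, the AUC integrates over all cut-offs, which is why it serves as the cut-off-independent primary metric here.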
Since the metrics other than AUC change according to the cut-off values that determine fracture or negative predictions, we used the AUC as the primary evaluation metric. For the test and external validation, we selected the optimal cut-off value where the Youden index was highest in internal validation, because plain radiography of the pelvis is commonly used as a first-line screening test to diagnose hip fractures. We applied the same cut-off value in the external validation set.

Statistical Analysis

Data were compiled using a standard spreadsheet application (Excel 2016; Microsoft, Redmond, WA, USA) and analyzed using NCSS 12 (Statistical Software 2018, NCSS, LLC, Kaysville, UT, USA, ncss.com/software/ncss). Kolmogorov–Smirnov tests were performed to assess the normal distribution of all datasets. We generated descriptive statistics and present them as frequency and percentage for categorical data, and as either median and interquartile range (IQR) (non-normal distribution), mean and standard deviation (SD) (normal distribution), or 95% confidence interval (95% CI) for continuous data. We used one ROC curve with cut-off analysis for internal validation, and two ROC curves with independent groups designed to compare the ROC curve of external validation with that of the test after training and internal validation. Two-tailed p < 0.05 was considered significantly different.

Results

A total of 4,189 images containing 1,109 positive images (332 difficult cases and 777 easy cases) and 3,080 negative images were collected from 4,189 patients (Table 1).

Table 1 Baseline characteristics of participants who provided images in datasets

                        All datasets    Hospital A set    Hospital B set
                        (n = 4,189)     (n = 2,090)       (n = 2,099)
  Total                 4,189           2,090             2,099
  Positive images, n    1,109           589               520
    Non-displaced, n    332             139               193
    Displaced, n        777             450               327
    Age, years          75.7 [12.9]     76.8 [13.8]       74.3 [11.7]
    Sex, male           28.1%           28.4%             27.8%
  Negative images       3,080           1,501             1,579
    Age, years          46.4 [16.5]     50.9 [19.1]       42.3 [12.2]
    Sex, male           48.5%           52.0%             45.2%

  Continuous variables are presented as mean [standard deviation] and categorical variables as n (%)
Table 2 Diagnostic performance matrix (A) and outcomes (B) on the internal validation, test, and external validation with optimal cut-off values (phase 1). The optimal cut-off value was estimated at the highest Youden index in the internal validation

(A) Diagnostic performance matrix: positive/negative prediction counts for (1) internal validation, (2) test after (1), and (3) external validation (individual counts are not legible in this extraction)

(B)
                  (2) Test after (1)        (3) External validation
  AUC (95% CI)    0.999 (0.996, 1.000)      0.977 (0.965, 0.984)
  Accuracy        0.986                     0.971
  Youden index    0.960                     0.920
  Sensitivity     0.966                     0.939
    E             1.000                     0.985
    D             0.933                     0.861
  Specificity     0.993                     0.982

(1) Internal validation with Hospital A set; (2) test after training and internal validation with Hospital A set; (3) external validation with Hospital B set. Pos, positive (fracture of femoral neck); Neg, negative (no fracture). AUC, area under the receiver operating characteristic (ROC) curve. Accuracy, the fraction of correct predictions over total predictions. Youden index, sensitivity + specificity − 1, the vertical distance between the 45° line and the point on the ROC curve. Sen, sensitivity; Spe, specificity. CI, confidence interval. E, easy cases subclassified as Garden III or IV; D, difficult cases subclassified as Garden I or II. In the test and external validation, we selected the optimal cut-off value where the Youden index was highest in the internal validation with the Hospital A set. *p < 0.05 is statistically significant
From Hospital A, 2,090 images consisted of 589 positive images (male, 28.4%; mean [SD] age, 76.8 [12.8] years) of FNF and 1,501 negative images with no fracture. From Hospital B, 2,099 images comprising 520 positive images (male, 27.8%; age, 74.3 [11.7] years) and 1,579 negative images were analyzed.

Phase 1: comparison of external validation results with those of testing after internal validation in one hospital.

The diagnostic performance matrix and outcomes are shown in Table 2. The optimal cut-off value was 0.72, at the highest Youden index (0.939) on the ROC in internal validation, for estimating values in the test and external validation. Test values after training and internal validation with the Hospital A dataset were 0.999 (0.996, 1.000) AUC, 0.986 accuracy, and 0.960 Youden index, with a 0.966 sensitivity (1.000 in easy cases and 0.933 in difficult cases) and a 0.993 specificity. Values of external validation with the Hospital B dataset were lower, at 0.977 (0.965, 0.984) AUC, 0.971 accuracy, and 0.920 Youden index, with a 0.939 sensitivity (0.985 in easy cases and 0.861 in difficult cases) and a 0.982 specificity (p < 0.001). These results are shown in Table 2 and Fig. 3.
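The cut-off selection just described, maximizing the Youden index on the internal validation split and then fixing that threshold for the test and external validation, can be sketched as follows. `best_cutoff_by_youden` is a hypothetical name, not from the paper.

```python
import numpy as np

def best_cutoff_by_youden(y_true, score):
    """Return the cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in np.unique(score):              # candidate thresholds from observed scores
        pred = score >= t
        sens = np.sum(pred & (y_true == 1)) / max(int(np.sum(y_true == 1)), 1)
        spec = np.sum(~pred & (y_true == 0)) / max(int(np.sum(y_true == 0)), 1)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j

# The threshold chosen on the internal validation split is then applied,
# unchanged, when scoring the held-out test set and the external hospital.
```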
Table 3 Diagnostic performance matrix (A) and outcomes (B) on the internal validation and test after training with all datasets (phase 2). The optimal cut-off value was estimated at the highest Youden index in the internal validation

(A) Diagnostic performance matrix: positive/negative prediction counts for (1) internal validation and (2) test after (1) (individual counts are not legible in this extraction)

(B)
                  (1) Internal validation     (2) Test after (1)
  AUC (95% CI)    0.991 (0.971, 0.997)        0.987 (0.962, 0.995)
  Accuracy        0.981                       0.983
  Youden index    0.963                       0.960
  Sensitivity     0.982                       0.973
    E             0.982                       1.000
    D             0.982                       0.946
  Specificity     0.981                       0.987

(1) Internal validation with Hospital A and B sets; (2) test after training and internal validation with Hospital A and B sets. Pos, positive (fracture of femoral neck); Neg, negative (no fracture). AUC, area under the receiver operating characteristic (ROC) curve. Accuracy, the fraction of correct predictions over total predictions. Youden index, sensitivity + specificity − 1, the vertical distance between the 45° line and the point on the ROC curve. Sen, sensitivity; Spe, specificity. CI, confidence interval. E, easy cases subclassified as Garden III or IV; D, difficult cases subclassified as Garden I or II. The optimal cut-off value was selected where the Youden index was highest in the internal validation. *p < 0.05 is statistically significant
Phase 2: evaluation of the internal test with combined image datasets.

We set the cut-off value (58.61) at the highest Youden index (0.963) on the ROC of internal validation after training with the merged image datasets. Test values using the merged dataset were 0.987 (0.962, 0.995) AUC, 0.983 accuracy, and 0.960 Youden index, with a 0.973 sensitivity (1.000 in displaced images and 0.946 in non-displaced images) and a 0.987 specificity, as shown in Table 3 and Fig. 3.

Visualization with Grad-CAM

Figure 4 visualizes the feature maps of our network. In testing for FNF after training, ResNet18 with the spatial attention module concentrated on the bilateral hip joints, even when images were negative.

Fig. 4 Visualization with gradient-weighted class activation mapping (Grad-CAM) results in the external validation test. A Correct detection images. The images in the 1st and 2nd rows (true positive images) are the original plain X-ray images and CAM-applied images with FNF, whereas the images in the 3rd and 4th rows (true negative images) are the original plain X-ray images and CAM-applied images without fracture. B False detection images. The images in the 1st row are false-positive images, whereas the images in the 2nd, 3rd, and 4th rows are false-negative images with unidentified areas highlighted by CAM

Discussion

Traditional deep neural networks require many training datasets, but medical images with annotations are not easy to acquire and are insufficient in number to train large networks. To avoid the overfitting problem with insufficient datasets, we employed an efficient network that shows high performance with a small dataset. Thus, we developed CBAM++ as an efficient version of CBAM. In the experiments for detection of FNF with X-ray, we evaluated the performance of ResNets with different types of attention modules. We ultimately selected ResNet18 with the proposed CBAM++ as it showed the best performance with the smallest number of parameters. Moreover, we demonstrated the performance of the proposed network by providing visualization results through Grad-CAM.

The X-ray image detection function for FNF through deep learning performed with images of Hospital A shows results that are equivalent to or higher than the sensitivity of medical staff, especially emergency medicine doctors. In a previous study, the sensitivity and specificity of X-ray readings by emergency medicine doctors, excluding radiologists and orthopedic surgeons, were 0.956 and 0.822 [21]. In other recent studies of deep learning for detection of femoral fracture conducted at a single institution, ranges of outcomes were 0.906–0.955 for accuracy, 0.939–0.980 for sensitivity, and 0.890–0.980 for AUC [20–22]. Our study showed excellent respective outcomes of 0.986, 0.966, and 0.999. Our deep learning algorithm increased the capability for FNF detection with X-ray as the first screening tool and can be applied to the clinical practice of any hospital that provides training and test datasets. Krogue et al. showed that the sensitivity using deep learning was 0.875 in displaced fracture and 0.462 in non-displaced fracture, and Mutasa et al. showed that the sensitivity using a generative adversarial network with digitally reconstructed radiographs was 0.910 in displaced fracture and 0.540 in non-displaced fracture [24, 25]. In our study, we did not classify normal, displaced fracture, and non-displaced fracture separately but detected displaced and non-displaced fracture images together. Nevertheless, the sensitivity for easy (displaced) fractures was 1.000, and that for difficult (non-displaced) fractures was 0.933. Our algorithm can aid in detection of not only Garden type III or IV but also Garden type I or II FNF from hip or pelvic AP X-ray images.

We conducted external validation with the dataset of a second hospital after deep learning with a single-hospital dataset for detection of FNF. The external validation test results were 0.971 accuracy, 0.939 sensitivity, and 0.977 AUC. Comparing the external validation with the test after training and internal validation, the difference in AUC was 0.023 (p < 0.001), likely due to the resolution difference and degradation of data with decreasing image intensity level and contrast. However, with merged images of disparate hospitals using the same protocols, the AUC between a single institution and multiple institutions was not statistically different (difference in AUC = 0.013, p = 0.076). This indicates that a completed model trained using one hospital dataset can be transferred to other hospitals and used as a screening tool for FNF. Hospitals that use the model would need to train it using their own positive and negative images to optimize performance in their specific environments.

Limitations

There were several limitations in this study. First, the age and sex of patients with and without fractures were not completely matched, because FNF has a particularly high incidence at certain ages and in one sex. To apply the results of this study to clinical situations, the method must be verified in adult patients of various ages and sexes who visit the emergency room; we considered that limiting the range of the control group for statistical matching would degrade clinical applicability. Second, we did not compare the performance of our model to that of physicians with respect to key factors such as clinical outcomes, the time required to reach a diagnosis, and the equipment required to use the model as a screening tool. Third, there was a difference in resolution between the medical images and the input images due to resizing. Therefore, information loss may occur when attempting to detect FNF, since the medical images were downsized [35]. Finally, the external validation in phase 1, applying the network trained with a single hospital directly to another hospital, showed lower accuracy than the results using a single hospital. Thus, a phase 2 study in which images from the two hospitals are learned together should be conducted, because we do not yet know how many images are needed to apply our network equally to external data.
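The Grad-CAM step used throughout the visualization sections reduces, at its core, to weighting the last convolutional layer's activation maps by their globally averaged gradients. The following is a framework-agnostic NumPy sketch of that core step; the activations and gradients would come from the trained network, and the names are illustrative, not from the study's code.

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM heat map from last-conv activations (C, H, W) and the gradients
    of the fracture-class score with respect to them (C, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # global-average-pool the gradients
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam
```

The normalized map is then upsampled to the input resolution and overlaid on the radiograph, producing highlighted hip regions like those in Fig. 4.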
27. Garden RS: Low-Angle Fixation in Fractures of the Femoral Neck. Journal of Bone and Joint Surgery-British Volume 43(4):647-663, 1961
28. Frandsen PA, Andersen E, Madsen F, Skjødt T: Garden's classification of femoral neck fractures. An assessment of interobserver variation. J Bone Joint Surg Br 70(4):588-590, 1988
29. Thorngren KG, Hommel A, Norrman PO, Thorngren J, Wingstrand H: Epidemiology of femoral neck fractures. Injury 33:1-7, 2002
30. Van Embden D, Rhemrev SJ, Genelin F, Meylaerts SA, Roukema GR: The reliability of a simplified Garden classification for intracapsular hip fractures. Orthop Traumatol Surg Res 98(4):405-408, 2012
31. He K, Zhang X, Ren S, Sun J: Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770-778, 2016
32. Woo S, Park J, Lee J, Kweon IS: CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV) 3-19, 2018
33. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D: Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization. Proceedings of the IEEE International Conference on Computer Vision (ICCV) 618-626, 2017
34. Youden WJ: Index for rating diagnostic tests. Cancer 3(1):32-35, 1950
35. Kwon G, Ryu J, Oh J, Lim J, Kang BK, Ahn C, et al: Deep learning algorithms for detecting and visualising intussusception on plain abdominal radiography in children: a retrospective multicenter study. Sci Rep 10(1):17582, 2020

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.