
A Deep Learning Approach for Spine Cervical Injury Severity Determination through Axial and Sagittal
Magnetic Resonance Imaging Segmentation and Classification
I Gusti Lanang Ngurah Agung Artha Wiguna
Rumah Sakit Umum Pusat Sanglah Denpasar
Yosi Kristian
Institut Sains dan Teknologi Terpadu Surabaya
Maria Florencia Deslivia
Rumah Sakit Umum Pusat Sanglah Denpasar
Rudi Limantara
Institut Sains dan Teknologi Terpadu Surabaya
David Cahyadi
Institut Sains dan Teknologi Terpadu Surabaya
Ivan Alexander Liando
Rumah Sakit Umum Pusat Sanglah Denpasar
Hendra Aryudi Hamzah
Rumah Sakit Umum Pusat Sanglah Denpasar
Kevin Kusuman
Rumah Sakit Umum Pusat Sanglah Denpasar
Dominicus Dimitri
Rumah Sakit Umum Pusat Sanglah Denpasar
Maria Anastasia (mrnstasia05@gmail.com)
Rumah Sakit Umum Pusat Sanglah Denpasar
I Ketut Suyasa
Rumah Sakit Umum Pusat Sanglah Denpasar

Research Article

Keywords: Spinal cord injury, Cervical spinal cord injury, MRI classification, Deep learning, Machine
learning

Posted Date: November 25th, 2023

DOI: https://doi.org/10.21203/rs.3.rs-3644109/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: No competing interests reported.

Abstract
Objectives: Predicting outcomes in patients with spinal cord injury (SCI) requires extensive effort. While
the ASIA Impairment Scale is the gold standard for assessing patients with SCI, it is limited by its
subjectivity and by its impracticality in certain cases. Recent advances in machine learning (ML) and image
recognition have prompted research into using these tools to predict outcomes. The aim of this study is
to present a comprehensive analysis using deep learning techniques to evaluate and predict cervical
spine injuries from MRI scans.

Materials & Method: This is a cross-sectional database study of patients admitted with traumatic and
nontraumatic cervical SCI from 2019 to 2022. MRI images were labelled by four senior resident
physicians. We trained a deep convolutional neural network using axial and sagittal cervical MRI images
from our dataset and assessed model performance.

Result: In the axial spinal cord segmentation, we achieved a Dice score of 0.94 and an IoU score of 0.89.
In the sagittal spinal cord segmentation, we obtained Dice scores up to 0.9201 and IoU scores up to
0.8541. The model for axial image score classification gave a satisfactory result, with an F1 score of 0.72
and an AUC of 0.79.

Conclusion: Deep learning has been used in automated diagnostic tools and shows promise for significant
future advancement. Our models identified cervical spinal cord injury on T2-weighted MR images with
satisfactory performance. Further research is necessary to create a more advanced model for predicting
patient outcomes in spinal cord injury cases.

Introduction
Spinal cord injury (SCI) is considered one of the most devastating injuries. SCI is associated with
serious complications during the acute and long-term phases of care; therefore, extensive efforts should
be made to predict the outcome of patients with SCI.[1, 2] Spinal cord damage can result from either a
traumatic event or a non-traumatic injury. Traumatic SCI arises from immediate mechanical damage
followed by secondary damage due to subsequent inflammation. Non-traumatic types include degenerative
cervical myelopathy, in which age-related degeneration of the discs, ligaments, and vertebrae in the
cervical spine leads to spinal cord compression. This compression causes varying levels of neurological
dysfunction.[3] The ASIA Impairment Scale, which is the gold standard in evaluating spinal cord injuries,
relies on subjective assessments of the patients, leading to inconsistencies and potential bias in the
classification. Additionally, it is not always possible to perform a complete ASIA examination properly
because of the patient's condition, such as the presence of concomitant injuries.[4, 5]

In recent years, there has been major development in machine learning (ML) technology. These advances
have led to a dramatic increase in research into the use of ML, through image recognition, as a tool for
determining patient outcomes. Image recognition offers potential benefits in spinal cord injury
management by automatically detecting and segmenting spinal cord structures,[6, 7] identifying injury
patterns,[8, 9] and quantifying the extent of damage from Magnetic Resonance Imaging (MRI).[10–12]
Recent efforts have been made to perform automated injury segmentation and extract MR imaging
biomarkers in patients with cervical SCI, especially in the axial view.[7, 8] Image recognition technology
is also being developed to reveal the microstructure of the spinal cord in order to investigate spinal
cord pathology in patients with cervical myelopathy.[13] Although most studies have evaluated SCI based
only on axial MRI, sagittal MRI also plays a significant role in determining the extent of spinal cord
damage.[14]

This study presents a comprehensive analysis that uses deep learning techniques to predict cervical
spine injuries from MRI scans, with segmentation and classification of axial and sagittal MRI views as its
main components. Overall, this paper presents a robust framework that employs deep learning techniques
to analyse and evaluate cervical SCI, with the potential to improve clinical practice and enhance patient
prognosis.

Methods
Data Acquisition
This study involved a retrospective analysis of Magnetic Resonance Imaging (MRI) from patients
admitted with traumatic and nontraumatic cervical SCI in two centres (Prof I.G.N.G Ngoerah Hospital
Denpasar and Siloam Hospital Denpasar) from 2018 to 2022. All patients underwent an MRI examination,
which included axial T2 images, within 48 hours of admission. Patients were enrolled consecutively, and
institutional review board (IRB) approval was waived owing to the nature of the study. Informed consent
was also waived for this study.

Data Pre-Processing
A data pre-processing protocol was developed and applied to the training/validation as well as holdout
datasets. The axial and sagittal T2-weighted non-fat saturated (aT2w) sequence from each MRI study
was uploaded to the SCI webtool for scoring. The MRI scanner acquisition parameters, including
voxel size and slice thickness for the T2-weighted axial sequence, were collected, and the aT2w sequence
for each patient was converted into a series of JPEG images (Cervical SCI Dataset). All images from the
Spine Generic Dataset[15] were resized to 320x320 pixels to standardize the input size and improve
computational efficiency. We standardized the intensity range across MRI images with contrast stretching.
The primary purpose of contrast stretching is to enhance the visibility and differentiation of features in
an image by stretching the range of pixel intensities, which can be beneficial during model training.

$$
g(x, y) = \frac{f(x, y) - f_{\min}}{f_{\max} - f_{\min}} \times (g_{\max} - g_{\min}) + g_{\min}
$$

Where:

g(x, y) is the updated pixel value in the new image.
f(x, y) is the original pixel value in the image to be transformed.
fmin and fmax are the minimum and maximum values of the pixel intensity range in the original
image.
gmin and gmax are the desired minimum and maximum values within the pixel intensity range in the
new image.
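
The linear mapping above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the authors' implementation; the function name `contrast_stretch` and the default target range of 0 to 255 are assumptions made for the example.

```python
import numpy as np


def contrast_stretch(image, g_min=0.0, g_max=255.0):
    """Linearly map pixel intensities from [f_min, f_max] to [g_min, g_max]."""
    f = image.astype(np.float64)
    f_min, f_max = f.min(), f.max()
    if f_max == f_min:
        # Constant image: there is no range to stretch, so return the lower bound.
        return np.full_like(f, g_min)
    return (f - f_min) / (f_max - f_min) * (g_max - g_min) + g_min


# Hypothetical 2x2 image whose intensities span only 50..200.
example = np.array([[50, 100], [150, 200]], dtype=np.uint8)
stretched = contrast_stretch(example)
print(stretched)
```

In this toy case the darkest pixel is mapped to the lower bound and the brightest to the upper bound, with the values in between spaced proportionally.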

General contrast stretching was applied to all our data prior to training. We implemented the 99th
percentile contrast stretching technique, as it is more appropriate for MRI images with black backgrounds
and a high white range, where standard contrast stretching might not adequately reveal significant
differences. The 99th percentile contrast stretching is an image processing technique that enhances
contrast by expanding the range of pixel intensities used.[16] It works by excluding the 1% of pixels with
the lowest intensities and the 1% with the highest intensities, which can effectively sharpen and improve
images with low brightness or sharpness.

$$
f(x) = \frac{(x - P_{1}) \times 255}{P_{99} - P_{1}}
$$

Where:

f(x) represents the updated pixel value in the new image.
x is the original pixel value.
P1 and P99 are the 1st and 99th percentiles of the pixel intensity range in the original image.
The result is multiplied by 255 to map the values onto the 8-bit range.
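
A minimal NumPy sketch of the percentile-based variant follows. The formula does not say what happens to pixels below P1 or above P99, so clipping them to 0 and 255 is an assumption of this example, as are the function name `percentile_stretch` and the synthetic test image.

```python
import numpy as np


def percentile_stretch(image, low=1.0, high=99.0):
    """Stretch intensities between the low and high percentiles onto 0..255."""
    x = image.astype(np.float64)
    p_low, p_high = np.percentile(x, [low, high])
    if p_high == p_low:
        # Degenerate case: no spread between the two percentiles.
        return np.zeros(x.shape, dtype=np.uint8)
    scaled = (x - p_low) * 255.0 / (p_high - p_low)
    # Assumption: values outside [P1, P99] are clipped to the 8-bit limits.
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Synthetic image: dim tissue-like values plus one very bright outlier pixel.
rng = np.random.default_rng(0)
demo = rng.integers(0, 120, size=(64, 64))
demo[0, 0] = 4000
out = percentile_stretch(demo)
print(out.dtype, out.min(), out.max())
```

With plain min-max stretching, the single outlier would compress every other pixel into a narrow dark band; using the percentiles as the reference points avoids that.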

Following the contrast stretching, we cropped the data to eliminate irrelevant portions and reduce
the image size for more efficient processing. Two cropping approaches were used in this study: standard
cropping using the Albumentations library's CenterCrop method, and circular masking and cropping with
the OpenCV library.
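
The two cropping styles can be illustrated with plain NumPy. This sketch deliberately does not call Albumentations or OpenCV, so it is a stand-in for the idea rather than the authors' code; the helper names, the crop size, the circle centre, and the radius are all hypothetical, and only 2D grayscale arrays are handled.

```python
import numpy as np


def center_crop(image, crop_h, crop_w):
    """Return the central crop_h x crop_w window of a 2D image."""
    h, w = image.shape[:2]
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed image size")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return image[top:top + crop_h, left:left + crop_w]


def circular_mask_crop(image, center, radius):
    """Zero out pixels outside a circle, then crop to the circle's bounding box."""
    h, w = image.shape[:2]
    cy, cx = center
    yy, xx = np.ogrid[:h, :w]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    masked = np.where(inside, image, 0)
    top, bottom = max(cy - radius, 0), min(cy + radius + 1, h)
    left, right = max(cx - radius, 0), min(cx + radius + 1, w)
    return masked[top:bottom, left:right]


# Hypothetical uniform 320x320 slice.
slice_2d = np.full((320, 320), 200, dtype=np.uint8)
cropped = center_crop(slice_2d, 256, 256)
circle = circular_mask_crop(slice_2d, center=(160, 160), radius=50)
print(cropped.shape, circle.shape)
```

The circular variant keeps a margin of tissue around the chosen centre while discarding the corners of the bounding box.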

[Insert Fig. 1.]


Data Labelling & Data Distribution
Four orthopaedic surgery residents, each with > 2 years of experience interpreting spine MRI scans,
independently examined each axial T2-weighted image from the training/validation and testing datasets.
The labellers were given the full-resolution JPEG images to review with no time limit per image. In the
axial plane, different patterns of T2 signal abnormality within the spinal cord were identified according
to the signal intensity, and each image was labelled as 0, 1, or 2. If there was disagreement between more
than two labellers, they collaboratively reviewed those images to reach a decision on the final label. For
the sagittal plane, the spinal cord injury patterns were evaluated by measuring the rostrocaudal length of
the lesion visible on a T2-weighted mid-sagittal image. The length of the lesion was classified into
two groups: < 30 mm and > 30 mm.

A. Axial Spinal Cord

The Spine Generic dataset had more slices (85,440) than the cervical SCI dataset (11,374), with a similar
number of subjects (294 vs. 267 patients). There were more labelled slices in the Spine Generic dataset
since all available normal spinal cord slices were retrieved, whereas the slices in our cervical SCI dataset
were labelled only around the epicentre of injury. The image sizes in Spine Generic are smaller (64x320
pixels), as the images focus on the spinal cord area and data outside this area have been discarded. To
make these images compatible with the square test images and real-world data, padding was added to reach
a size of 320x320 pixels without altering the aspect ratio of the original image content.
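
Padding a narrow strip to a square canvas can be sketched as follows. The text does not state the fill value or where the original strip sits on the canvas, so zero fill and centred placement are assumptions here, as is the helper name `pad_to_square`.

```python
import numpy as np


def pad_to_square(image, size=320, fill=0):
    """Centre a 2D image on a size x size canvas without resampling it."""
    h, w = image.shape[:2]
    if h > size or w > size:
        raise ValueError("image is larger than the target canvas")
    pad_top = (size - h) // 2
    pad_bottom = size - h - pad_top
    pad_left = (size - w) // 2
    pad_right = size - w - pad_left
    return np.pad(
        image,
        ((pad_top, pad_bottom), (pad_left, pad_right)),
        mode="constant",
        constant_values=fill,
    )


# Hypothetical strip: 320 rows by 64 columns of ones.
strip = np.ones((320, 64), dtype=np.uint8)
padded = pad_to_square(strip)
print(padded.shape)
```

Because the pixels are copied rather than resized, the anatomy keeps its original proportions inside the larger frame.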

B. Sagittal Spinal Cord

Spine Generic had more slices (28,987 vs. 5,305) and more labelled slices (5,749 vs. 873) than the cervical
SCI dataset. Spine Generic also had more slices without labels (23,238 vs. 4,372). Overall, the Spine
Generic dataset was more extensive in intramedullary sagittal segmentation than our dataset. It
contained more labelled slices, indicating better coverage of the spinal cord area in sagittal images.

System Architecture
[Insert Fig. 2.]

[Insert Fig. 3.]

The system architecture was further developed to create a program which can automatically predict the
severity of injury in axial and sagittal T2W cervical MRI, as illustrated in Figs. 2 and 3.

1. SCI severity prediction from MRI axial view

The system had two main parts. The first part focused on predicting a score from 0 to 2 based on the
signal intensity in the axial MRI image, to assess the severity of the spinal cord injury, as shown in
Fig. 4. We developed a classification model that takes MRI images as input, including features such as
intensity values, texture, and shape attributes of the spinal cord region. To make the classification more
efficient, we used an axial segmentation model to filter for slices containing the spinal cord. We then
created an application to input new MRI images into the system. The model processes these images and
provides a predicted score, enabling quick and accurate assessment of spinal cord injury severity.

[Insert Fig. 4.]

2. Lesion length calculation from MRI sagittal view

The second part of the system predicts the length of the lesion (Fig. 12). A regression model was trained
on MRI images with corresponding lesion masks. The system utilized two different segmentation models:
one for filtering the spinal cord and another for creating the segmentation area for the lesion. As with
the axial MRI prediction, an application was developed to input new MRI images. The regression model then
analyses the images and estimates the lesion length, as illustrated in Fig. 5. These machine learning
models are intended to support the diagnostic process and treatment planning for spinal cord injuries.
AI-driven prediction was used for efficient assessment of injury severity and lesion length, while
continuous monitoring and validation of the model's performance remain essential to ensure accuracy and
reliability.

[Insert Fig. 5.]


Model Training
This study used various Convolutional Neural Network (CNN) architectures. For the segmentation task, we
used U-Net, Residual U-Net, and Attention U-Net, while for the classification task, we used ResNet and
EfficientNetV2. U-Net, in particular, is considered a state-of-the-art technology for lesion segmentation
in determining spinal cord injury level. It is widely used and effective for segmenting objects in images.
U-Net was proposed in 2015 by researchers at the University of Freiburg, Germany, with an architecture
specifically designed for accurate and efficient medical image segmentation. Its architecture includes
an encoder and a decoder, with skip connections that preserve important information and help the model
retain early features in larger and deeper models.[17] The Residual U-Net utilizes residual blocks, which
help address the problem of vanishing gradients commonly encountered in deep neural networks. In a
residual block, the input is added to the output of several convolutional layers. This allows gradients
to flow through more layers, thereby improving training effectiveness and reducing overfitting.[18]

Each convolution block in the Residual U-Net consisted of Batch Normalization, ReLU activation,
and a 3x3 convolution. These three layers were repeated twice within a single residual block. ResNet
(Residual Network) is a deep neural network architecture specifically designed to address the
challenges of training very deep networks. It extends the Convolutional Neural Network (CNN)
architecture commonly used for tasks such as object recognition, object detection, and
segmentation.[He et al.] In this project, we used the following ResNet models: ResNet 18, ResNet 34,
ResNet 50, and ResNet 101. These models were evaluated to assess their performance and suitability for
the task at hand. EfficientNetV2 has been tested on various datasets and reported to outperform similar
models, achieving acceptable accuracy on several visual recognition datasets.

Model Testing
The model’s performance was further assessed. Dice score and IoU (Intersection over Union) score were
used to evaluate segmentation accuracy. The Dice score measures the similarity between the predicted
and actual segmentations by quantifying the overlap between them. IoU measures the ratio of the area of
overlap between the predicted and actual segmentations to the area of their union. Both the Dice score
and IoU range from 0 to 1, where 0 indicates no overlap and 1 indicates perfect overlap. The model's
predicted categories were matched against the human-generated ground-truth labels across the complete
training dataset. This comparison produced the following summary: receiver operating characteristic
curve (ROC), area under the curve (AUC), sensitivity, specificity, and F1 score.
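
As an illustration of the two segmentation metrics, the sketch below computes Dice and IoU for a pair of binary masks. The function name `dice_and_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions of this example, not details taken from the study.

```python
import numpy as np


def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2.0 * intersection / total
    iou = intersection / union
    return float(dice), float(iou)


# Toy masks: two 2x3 rectangles that share a 2x2 region.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[0:2, 0:3] = 1
target[0:2, 1:4] = 1
dice, iou = dice_and_iou(pred, target)
print(round(dice, 4), round(iou, 4))
```

For a single pair of masks the two metrics are tied by Dice = 2 * IoU / (1 + IoU), so Dice is never lower than IoU on the same prediction.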
Post Processing
Lesion Length Calculation
[Insert Fig. 6.]

To calculate the length of the lesion, the output of the lesion segmentation process was transformed into
a set of coordinate points. The segmentation process identified and marked the pixels within the lesion,
resulting in a binary mask that highlighted the lesion region in the image. From this binary mask, the
contour of the lesion was extracted, creating a sequence of coordinate points that represented the border
of the lesion. Once the coordinate points were obtained, the system calculated the distance between every
pair of points on the contour. By comparing all pairs, the system identified the two points that were
farthest from each other. These two points represented the endpoints of the lesion region, and the
distance between them was taken as the length of the lesion.
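
The farthest-pair idea can be sketched with a brute-force search. For brevity this example uses every lesion pixel rather than an extracted contour; the farthest pair is the same either way, since the two extreme points always lie on the boundary. The function name `lesion_length` and the `pixel_spacing_mm` parameter are hypothetical, and the conversion from pixels to millimetres is an assumption because the text does not describe it.

```python
import numpy as np


def lesion_length(mask, pixel_spacing_mm=(1.0, 1.0)):
    """Largest distance between any two lesion pixel centres, in millimetres."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return 0.0
    # Scale pixel indices by the assumed (row, column) spacing.
    points = np.stack(
        [rows * pixel_spacing_mm[0], cols * pixel_spacing_mm[1]], axis=1
    ).astype(np.float64)
    # Brute-force pairwise distances: fine for small lesions, O(n^2) memory.
    diffs = points[:, None, :] - points[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return float(distances.max())


# Toy mask: a vertical lesion six pixels tall.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 4] = 1
print(lesion_length(mask))
print(lesion_length(mask, pixel_spacing_mm=(0.5, 0.5)))
```

Distances are measured between pixel centres, so a run of six pixels spans five pixel spacings.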
Single Labeller Comparison
Apart from assessing the model's abilities against human labellers, we also performed trials to evaluate
its performance against a single senior orthopaedic resident with more than three years of experience
interpreting spine MRI images, who had no knowledge of the clinical conditions or other MRI slices in the
patient’s series. We randomly chose 50 MRI slices that were unfamiliar to the model. Subsequently, we
tasked both the model and the senior orthopaedic resident with diagnosing these slices. The
assessment encompassed two aspects: first, determining the axial classification score, and second,
quantifying the length of any identified lesions. The length measurement was intended to assess the
model's accuracy in determining the extent of lesions in the images.

Results
Model Interpretation and Data Visualization

The machine learning model consisted of feature extractor layers, each of which produced an output.
Figure 7 depicts feature maps for randomly selected examples of correct and wrong predictions by the
model. The results highlight the portions of the image that the model regarded as crucial. The feature
maps are rendered as a color spectrum to show activation over clinically relevant areas of the image, such
as the spinal cord and surrounding structures. The color spectrum varies from blue to red, with red areas
signifying higher importance to the model's output.

Axial Spinal Cord Segmentation

The segmentation outputs were manually sorted into "correct" and "wrong" groups according to whether the
machine learning model had segmented the image accurately. The majority of input images were effectively
segmented, with only a small number of non-fatal errors evident in the "wrong" folder. The axial
visualization represents the output of the final layer, in other words the heatmap generated from the
machine learning model's output, as depicted in figure 7. This visual representation was selected for its
clarity and comprehensibility.

[Insert Figure 7.]

Several experiments were conducted, varying the pre-processing technique, data composition, and model
type and size, all aimed at spinal cord segmentation in MRI images. The original approach used the U-Net
model with a subset of the Spine Generic Dataset combined with our dataset in a 1:2 ratio. This
configuration gave better results than the other data compositions tested. Using 32-bit floating point
precision gave better results, although it affected the prediction speed of the model. For contrast
pre-processing, the contrast stretching method was used, which was superior to the other contrast
pre-processing techniques. The choice of model architecture also significantly affected the results. Five
U-Net-based models were tested, and the best results were obtained with the Residual U-Net model. Overall,
all the results represented excellent performance, as shown in table 3, with Dice scores above 0.93 and
IoU scores above 0.87. By combining the findings from these tests, we obtained a model with a Dice score
of 0.9442 and an IoU score of 0.8945.

Sagittal Spinal Cord Segmentation

The sagittal visualization was divided into two layers, designated layers 7 and 14. Layer 7 serves as
the feature extractor, while layer 14 functions as the output layer. In layer 7, the machine learning
model allocated attention to the region surrounding the patient's neck while treating the background as
less significant. Layer 14 displays the outcome of the segmentation process. The intensity of the
coloration, with increasing shades of red, signifies the model's greater confidence in identifying the
spinal cord in those areas, as shown in figure 8.

[Insert Figure 8.]

In the task of sagittal spinal cord segmentation, the overall scores were good and exceeded our
expectations, as shown in table 4. Using the U-Net model alone, we obtained Dice scores up to 0.9201 and
IoU scores up to 0.8541. Among the various models we experimented with, the best results were achieved by
employing the Attention U-Net model and making changes to the data. Notably, when we included data without
segmentation labels, we achieved even better Dice and IoU scores. This observation showed that these two
types of tasks require distinct approaches and different models to achieve optimal results.

Axial Image Score Classification

The extracted visualization pertains to the feature extraction phase, allowing observation of the model's
focus on specific regions, as displayed in table 5.

Using a 3-class grouping, in which scores of 0 and 1 were combined into label 0, scores of 2 became
label 1, and scores of 3 and 4 were assigned label 2, resulted in a significant improvement in the model's
performance for both micro-averaged F1 and weighted F1 (table 6). This grouping substantially enhanced
the model's ability to distinguish the classes, allowing for more reliable and meaningful predictions in
practical scenarios.
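
The grouping described above amounts to a simple lookup from the five original scores to three labels. The names `SCORE_TO_CLASS` and `group_scores` are hypothetical and used only for this sketch.

```python
# Mapping stated in the text: scores 0 and 1 -> label 0, score 2 -> label 1,
# scores 3 and 4 -> label 2.
SCORE_TO_CLASS = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}


def group_scores(scores):
    """Convert a sequence of original scores (0..4) into 3-class labels."""
    return [SCORE_TO_CLASS[score] for score in scores]


grouped = group_scores([0, 1, 2, 3, 4, 4, 1])
print(grouped)
```

Any score outside 0 to 4 raises a `KeyError`, which makes unexpected label values visible instead of silently remapping them.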

AUC stands for Area under Curve, which measures the entire two-dimensional area underneath the entire
ROC curve from (0,0) to (1,1). AUC provides an aggregate measure of performance across all possible
classification thresholds. The concepts of precision, recall, and F1-score are built on four main
components: True Positive, True Negative, False Positive, and False Negative. Using these components,
the metrics such as precision (the accuracy of positive predictions), recall (the ability to find all positive
instances), and F1-score (a balance between precision and recall) can be calculated to gauge how well
the classifier is performing in terms of identifying positive and negative instances. The model achieved
an AUC of 0.79, as shown in figure 9.
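
To make the relationship between these quantities concrete, the sketch below derives precision, recall, and F1 from confusion-matrix counts. The counts in the example are invented for illustration and are not results from this study; the function name is also hypothetical.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented counts: 6 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(6, 2, 4)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

True negatives do not appear in any of the three formulas, which is why F1 is often reported alongside AUC when the classes are imbalanced.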

[Insert Figure 9.]

Based on our observations, where we applied various types of cropping, ranging from normal crop to
cropping around the spinal cord area only, we found that circular cropping yielded better results
compared to cropping around the spinal cord area alone. This indicated the presence of crucial regions
that become the focus of the model for classifying the different classes. Hence, providing a margin
around the spinal cord area resulted in improved outcomes. We also experimented with different batch
sizes, as the model's performance can be influenced by the amount of data fed during training. Through
these experiments, we found that a batch size of 96 yielded the best performance. By understanding the
key factors that affect the model's performance, we were able to enhance its accuracy and generalization
capabilities, which contributed to more reliable and effective medical image analysis applications.

Matrix evaluation for length measurement accuracy

Illustration for segmentation result

In the final stage of this research, we explored lesion segmentation. The relatively small dataset size
inevitably affected the performance of the resulting model. As a consequence, the performance did not
reach the level achieved in the spinal cord segmentation task. The U-Net model yielded the highest scores
among the models tested for lesion segmentation. As the complexity of the model increased, there was a
corresponding decrease in performance. This occurred because complex models require a large amount of
training data to achieve satisfactory performance. As a result, the U-Net model, being the simplest model
employed, outperformed the others. Given the limitations of the dataset size, our lesion segmentation
results might have been improved with a more extensive dataset. Despite these challenges, the U-Net model
still demonstrated its effectiveness in this task, showing its ability to perform well with limited data.
As an area for future improvement, acquiring a larger dataset or investigating data augmentation
techniques could potentially boost the performance of more complex models for lesion segmentation.
Nonetheless, this research highlights the significance of model selection and dataset size in achieving
successful medical image segmentation outcomes.

To improve the experimental scores, we employed a technique called transfer learning. Due to the limited
amount of data available, we found a solution by leveraging knowledge from a pre-trained model that
was trained on a different dataset with a larger quantity of data. One of the most promising candidates
we discovered was using data obtained from sagittal spinal cord segmentation. The data used for
training showed similarities with our target data but differed only in terms of labelling. By adopting
transfer learning, we observed a significant increase in the model's accuracy, along with a notable
reduction in both the training time and the number of training samples required. This approach allowed
our model to benefit from the valuable insights and feature representations learned during the pre-training
on the similar dataset. Consequently, the model became more adept at handling the specificity of our
target task, leading to improved performances. This method not only expedited the training process but
also showcased the potential of knowledge transfer in enhancing the performance of the spinal cord
segmentation model.

Development of Program to Predict Severity of SCI

[Insert Figure 10.]

The process began with receiving MRI DICOM data as input. Before the data could be used for prediction,
a pre-processing step, which involved converting DICOM images to a standardized format and normalizing
intensity, was needed to ensure the data were in a suitable format and to extract relevant features for
analysis. Following the pre-processing, the system performed axial spinal cord segmentation using a
dedicated model. This segmentation model was designed to accurately identify and isolate the spinal cord
region within the DICOM images. The resulting segmented data provided a refined and focused input that
specifically included the spinal cord region for subsequent analysis. The segmented data, which was the
output of the spinal cord segmentation model, was then passed on to the classification model for axial
MRI image score prediction. The classification model had previously been trained on labelled data, which
included DICOM images with their corresponding axial MRI image scores. The model had learned to
recognize patterns and relationships between the image features and the associated axial image scores
during the training phase. For the sagittal images, the process followed a different approach. The
selected DICOM data underwent pre-processing and was then fed as input to the Sagittal Spinal Cord
Segmentation Model, which was specifically designed to accurately segment the spinal cord region
in sagittal MRI images. The segmented spinal cord region served as a filter for the data that would be the
input to the lesion segmentation model, as shown in figure. By using the segmented spinal cord region,
the subsequent lesion segmentation model could focus solely on the relevant area, which helped
improve the accuracy and efficiency of lesion detection. After segmenting the lesion area, further
processing was conducted to determine the length of the lesion. The process involved identifying the two
farthest points within the segmented lesion area. By computing the distance between these two points, the
system could determine the length of the lesion. An example of the developed program is shown in
figure 10.

The application processed DICOM MRI images and delivered predictions within 18.7 to 22.1 seconds across
seven distinct prediction scenarios, each involving different MRI files. The reported time covers only
the duration in which the machine learning model is engaged in the processing pipeline; it excludes the
time spent on form completion and MRI file upload. Prediction times of this order are valuable in
time-sensitive clinical settings, where rapid and accurate predictions are important for informed
decision-making and patient care planning. However, it should be noted that these results, particularly
the processing speed, can be influenced by various factors when applied in the real world, such as the
number of slices, the volume of requests made to the machine, and the use of different machine
specifications.

Single Labeller Comparison

From the 50 distinct datasets, the machine-driven procedure for measuring lesion length produced an
average difference of 21.72 mm from the human-measured result. Meanwhile, the classification models
for the MRI axial view achieved an Area Under Curve (AUC) of 0.79, as illustrated in figure 9,
surpassing the single labeller (AUC = 0.65).

Discussion
To our knowledge, this is the first study of segmentation and classification in cervical SCI MRI axial and
sagittal views using a deep learning algorithm that achieved satisfactory results. The model
demonstrated consistent and acceptable results when tested on various patient subgroups. The
segmentation models achieved an average Dice score of 0.9201, and the classification models achieved an
Area Under Curve (AUC) of 0.79, as illustrated in Fig. 9, surpassing the single labeller (AUC = 0.65). The
ground truth in this study was assigned from the consensus of four labellers.

The currently accepted method for evaluating spinal cord injuries (SCI) using MRI scans primarily relies
on subjective and qualitative descriptions of the imaging findings, such as the presence or absence of
spinal cord edema and haemorrhage. As a result, there are only a few established MRI biomarkers that
have been validated for stratifying and predicting the prognosis of SCI, despite MRI being used clinically
for SCI for over four decades.[19–22] To address this limitation, it is essential to not only improve
quantitative methods for MRI assessment but also explore advanced, unbiased, and automated image
analysis techniques. These techniques can potentially expedite the identification of reliable MRI
biomarkers for SCI. Machine learning systems have been developed for spinal degenerative disease and
spinal deformity, yet their use for automated interpretation of SCI is still less advanced.[9, 23, 24]

The process of MRI image segmentation for an injured spinal cord differs significantly from that of a
normal spinal cord. Injured spinal cords tend to shrink due to tissue atrophy, resulting in irregular
shapes. This is where a contour-based method is needed: a technique that outlines the boundaries of the
spinal cord, or of specific regions of interest within it, based on intensity or shape characteristics
in the MRI images. However, severely injured cords may contain voids or haemorrhages, which contradicts
the fundamental assumption of contour-based methods, as they struggle to find a single continuous
boundary separating the tissue of interest from the rest of the image.[25, 26] Tay et al.[8] used
diffusion tensor imaging for spinal cord segmentation in patients with cervical spinal cord injury.
They employed a machine learning method to evaluate lesions and structural damage and to predict
patient outcomes, without a segmentation step during pre-processing. The study showed satisfactory
results, yet the number of samples was limited. Merali et al.[9] used a similar approach to develop a
deep learning model for detecting spinal cord compression in MRI scans, also without performing
segmentation first. Another study by McCoy et al.[7], which evaluated a larger dataset of axial MRI
images, showed that deep learning enhances algorithm performance and provides clinically relevant
metrics of spinal cord injury.

In this study, we evaluated and identified the outcome of patients with SCI using a deep learning
method on T2-weighted MRI images. The proposed framework is based on novel segmentation and
classification of axial and sagittal MRI images. In the first part, we applied an innovative
classification approach to recognize the injured area of the spinal cord on the axial MRI images based
on signal intensity, which we classified into three groups. This approach is supported by a recent
study. Ren et al.[27] evaluated surgical outcomes in cervical spondylotic myelopathy using a combined
classification of increased signal intensity on MRI. Their results showed that a combined
classification of increased signal intensity could serve as a relevant indicator that enables
efficient and reliable injury severity assessments. Axial MRI classification for spinal cord injury
based on signal intensity is commonly referred to as the BASIC method. Originally, the BASIC score
classification involves assessing and categorizing spinal cord injuries according to the observed signal intensity
patterns into five classifications (Grade 0, 1, 2, 3, and 4).[28] However, patients who fall under the
BASIC score grade 3 and grade 4 categories are generally considered to have a deteriorating
neurological outcome.[20, 29, 30] Thus, we categorize the image intensity pattern into only three
classes, as shown in Fig. 9. Furthermore, our deep learning model performed better with three classes
than with the original five-class BASIC score (macro F1 of 0.5576 versus 0.3638; Table 5).[28]

The second part of the study applies a similar segmentation method to the sagittal MRI images to
recognize the extent of the injury by measuring the lesion length in the intramedullary area. Most
previous literature evaluated only the axial view of MRI images. In this study we also demonstrate a
method for segmenting lesion areas within the spinal cord from sagittal views, providing the lesion
length and a more detailed visualization of the injury. Previous studies using a similar measure, the
intramedullary lesion length (IMLL), which captures the extent of the lesion along the spinal cord on
sagittal MRI, have shown that IMLL is a relevant and valid predictor of neurological outcomes in
patients with cervical spinal cord injury.[31–34] A key point of our study is the introduction of a
predictive model for determining the lesion length from sagittal MRI images. The conventional manual
approach relies on direct visual inspection, while the machine-driven approach uses a series of
computational steps to measure lesion length. This difference in methodology affects the accuracy of
the measurements.
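
One plausible way to turn a sagittal lesion mask into a length is sketched below in Python. It is only an illustration under stated assumptions: image rows are assumed to run cranio-caudally, the cord is assumed to be approximately vertical in the image, and the row spacing in millimetres is assumed to be known from the image header. The function `lesion_length_mm` and its parameters are hypothetical, and the computational steps actually used in this study may differ.

```python
import numpy as np


def lesion_length_mm(lesion_mask, row_spacing_mm):
    """Cranio-caudal extent of a binary lesion mask on one sagittal slice.

    Assumes rows run from head to foot and that the cord is roughly
    vertical, so the lesion length is approximated by the number of rows
    spanned by the lesion multiplied by the row spacing.
    """
    lesion_mask = np.asarray(lesion_mask, dtype=bool)

    # Indices of rows that contain at least one lesion pixel.
    rows = np.flatnonzero(lesion_mask.any(axis=1))
    if rows.size == 0:
        return 0.0  # no lesion segmented on this slice

    n_rows_spanned = rows[-1] - rows[0] + 1
    return float(n_rows_spanned * row_spacing_mm)


# Toy example: a lesion covering rows 1 to 3 with 0.5 mm row spacing.
mask = np.zeros((6, 4), dtype=bool)
mask[1:4, 2] = True

length = lesion_length_mm(mask, row_spacing_mm=0.5)
print(f"Lesion length: {length} mm")
```

A curved or tilted cord would need a measurement along the cord centreline rather than along image rows, which this sketch does not attempt.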

Throughout the model training process, we experimented with different model architectures and
pre-processing techniques. A previous study by Okimatsu et al.[35], which included a similar number of
patients (n = 215), achieved an F1 score of 0.567. Another study by Yi et al.[36], with a total of 804
patients, attained an F1 score of 0.9. In our classification task, we employed the ResNet architecture
with 18 layers, which achieved an F1 score of 0.6562. The lower score in our study compared with Yi et
al. is likely due to the smaller sample size. A significant highlight of our pre-processing experiment
is the precise identification of the spinal cord area. A deep learning segmentation model localizes
the cord, allowing the classifier to concentrate on the region of interest and disregard other areas
in the image. With this focused approach, classification performance improved (macro F1 of 0.6659
with a circular crop versus 0.5576 without cropping; Table 6). In our segmentation task, we
investigated UNET-based architectures: the classic UNET, Residual UNET, and Attention UNET. These
models demonstrated satisfactory segmentation capabilities. For axial spinal cord segmentation, we
chose the Residual UNET, which generated precise cropping areas for classification; its Dice score of
0.9442 confirms its ability to accurately delineate the spinal cord region. In our lesion
segmentation task, we utilized a pretrained UNET-based model, which achieved a Dice score of 0.6807
(Table 7). Moreover, the machine learning model showed greater accuracy than a single medical doctor.
The conventional manual approach relies on direct visual interpretation of the MRI images, while the
machine-driven approach uses a series of computational steps to measure. The manual approach is also
more subjective and can be susceptible to human error, while the machine-driven approach was trained
on the consensus of four medical doctors. The assessment of axial score classification is a
subjective process that requires expert discussion and consideration of the patient's clinical
status.
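
The cropping step described above can be sketched in Python as follows. This is a hedged illustration, not the study's implementation: it assumes the crop is centred on the centroid of the predicted cord mask and that pixels outside the circle are set to zero, and the function name `circular_crop` is hypothetical. Table 6 reports a radius of 128 px; the toy example uses a much smaller radius.

```python
import numpy as np


def circular_crop(image, cord_mask, radius=128):
    """Zero out everything outside a circle centred on the cord mask.

    Assumptions: the circle centre is the centroid of the predicted cord
    mask, and pixels outside the circle are replaced by 0.
    """
    image = np.asarray(image, dtype=float)
    cord_mask = np.asarray(cord_mask, dtype=bool)

    if not cord_mask.any():
        return image.copy()  # no cord found: leave the slice unchanged

    # Centroid of the segmented spinal cord (row, column).
    rows, cols = np.nonzero(cord_mask)
    centre_row, centre_col = rows.mean(), cols.mean()

    # Boolean disc of the requested radius around the centroid.
    yy, xx = np.ogrid[:image.shape[0], :image.shape[1]]
    inside = (yy - centre_row) ** 2 + (xx - centre_col) ** 2 <= radius ** 2

    return np.where(inside, image, 0.0)


# Toy example: 9 x 9 image, cord detected at the centre pixel, radius 2 px.
image = np.ones((9, 9))
cord_mask = np.zeros((9, 9), dtype=bool)
cord_mask[4, 4] = True

cropped = circular_crop(image, cord_mask, radius=2)
print(cropped.astype(int))
```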

Despite being one of the few studies to evaluate deep learning methods for cervical spinal imaging,
our study has several limitations. First, it is a retrospective study with a limited number of
patients, which could introduce bias into the statistical analysis. To enhance the accuracy of the
machine learning models, it is essential to gather data from a larger number of patients across
various hospitals. Additionally, we used a step-by-step approach to processing the images.
Whole-slice processing can be more efficient, as it requires less computation to process a single
image. However, with whole-slice processing it can be more difficult to segment the spinal cord from
other structures in the image, and it can be more sensitive to image noise, which can degrade the
performance of the machine learning model. Second, our study focuses only on T2-weighted MRI images,
which are the gold standard for the diagnosis of spinal cord injury. Further research is needed to
develop more accurate and robust deep learning models for predicting the outcome of patients with
spinal cord injury, as current models still leave room for improvement. Future deep learning models
could also assess the severity of spinal cord injuries and identify the factors most likely to impact
the patient's recovery, which could lead to treatment plans specifically tailored to the patient's
needs. Despite these limitations, we consider this study to offer valuable insights for practical
applications and future research directions. In summary, this paper presents a robust framework
employing deep learning techniques in the analysis and evaluation of cervical spine injuries, with
the potential to enhance clinical practice and improve patient prognosis.

Conclusion
In recent years, deep learning methods have been applied to automated diagnostic tools, with
potential for substantial development in the future. In this study, we trained and tested a CNN model
to detect spinal cord injury in cervical spine T2 MRI scans. Our deep learning model identified
cervical spinal cord injury on T2-weighted MR images with satisfactory performance and appears
feasible to apply in clinical settings. Future work is needed to develop a more advanced model
capable of predicting the outcome of patients with spinal cord injury.

Declarations

Conflict of Interest:
The authors declare that they have no conflict of interest.

Funding
This work was supported by AO Spine National Research Grant with Project No. AOSEAR202205.

Author Contribution
IGLNAAW: Conception, design of the study, acquisition of data, interpretation of data, critically revising for important intellectual content, final approval of the version to be submitted
YK: Design of the study, interpretation of data, critically revising for important intellectual content
MFD: Conception, design of the study, acquisition of data, analysis, drafting the article, critically revising for important intellectual content, final approval of the version to be submitted
RL: Interpretation of data, analysis, critically revising for important intellectual content
DC: Interpretation of data, analysis, critically revising for important intellectual content
IAL: Acquisition of data, analysis, interpretation of data, revising it critically for important intellectual content
HAH: Acquisition of data, analysis, interpretation of data, revising it critically for important intellectual content
KK: Acquisition of data, analysis, interpretation of data, revising it critically for important intellectual content
DD: Acquisition of data, analysis, interpretation of data, revising it critically for important intellectual content
MA: Acquisition of data, drafting the article, revising it critically for important intellectual content
IKS: Conception, design of the study, final approval of the version to be submitted

References
1. Alizadeh A, Dyck SM, Karimi-Abdolrezaee S (2019) Traumatic Spinal Cord Injury: An Overview of
Pathophysiology, Models and Acute Injury Mechanisms. Front Neurol. doi:
10.3389/fneur.2019.00282
2. Sezer N (2015) Chronic complications of spinal cord injury. World J Orthop 6:24.
3. Soufi K, Nouri A, Martin AR (2022) Degenerative Cervical Myelopathy and Spinal Cord Injury:
Introduction to the Special Issue. J Clin Med 11:4253.
4. Gündoğdu İ, Akyüz M, Öztürk EA, Çakcı FA (2014) Can spinal cord injury patients show a worsening
in ASIA impairment scale classification despite actually having neurological improvement? The
limitation of ASIA Impairment Scale Classification. Spinal Cord 52:667–670.
5. Roberts TT, Leonard GR, Cepela DJ (2017) Classifications In Brief: American Spinal Injury
Association (ASIA) Impairment Scale. Clin Orthop Relat Res 475:1499–1504.
6. Zhang Q, Du Y, Wei Z, Liu H, Yang X, Zhao D (2021) Spine Medical Image Segmentation Based on
Deep Learning. J Healthc Eng 2021:1–6.
7. McCoy DB, Dupont SM, Gros C, Cohen-Adad J, Huie RJ, Ferguson A, Duong-Fernandez X, Thomas LH,
Singh V, Narvid J, Pascual L, Kyritsis N, Beattie MS, Bresnahan JC, Dhall S, Whetstone W, Talbott JF
(2019) Convolutional Neural Network–Based Automated Segmentation of the Spinal Cord and
Contusion Injury: Deep Learning Biomarker Correlates of Motor Impairment in Acute Spinal Cord
Injury. American Journal of Neuroradiology. doi: 10.3174/ajnr.A6020
8. Tay B, Hyun JK, Oh S (2014) A Machine Learning Approach for Specification of Spinal Cord Injuries
Using Fractional Anisotropy Values Obtained from Diffusion Tensor Images. Comput Math Methods
Med 2014:1–8.
9. Merali ZA, Colak E, Wilson JR (2021) Applications of Machine Learning to Imaging of Spinal
Disorders: Current Status and Future Directions. Global Spine J 11:23S-29S.
10. Ren G, Yu K, Xie Z, Wang P, Zhang W, Huang Y, Wang Y, Wu X (2022) Current Applications of Machine
Learning in Spine: From Clinical View. Global Spine J 12:1827–1840.
11. Gassenmaier S, Küstner T, Nickel D, Herrmann J, Hoffmann R, Almansour H, Afat S, Nikolaou K,
Othman AE (2021) Deep Learning Applications in Magnetic Resonance Imaging: Has the Future
Become Present? Diagnostics 11:2181.
12. Kapoor D, Xu C (2023) Spinal Cord Injury AIS Predictions Using Machine Learning. eNeuro
10:ENEURO.0149-22.2022.
13. Jin R, Luk KD, Cheung JPY, Hu Y (2019) Prognosis of cervical myelopathy based on diffusion tensor
imaging with artificial intelligence methods. NMR Biomed e4114.
14. Kumar Y, Hayashi D (2016) Role of magnetic resonance imaging in acute spinal trauma: a pictorial
review. BMC Musculoskelet Disord 17:310.
15. Cohen-Adad J, Alonso-Ortiz E, Abramovic M, Arneitz C, Atcheson N, Barlow L, Barry RL, Barth M,
Battiston M, Büchel C, Budde M, Callot V, Combes AJE, De Leener B, Descoteaux M, de Sousa PL,
Dostál M, Doyon J, Dvorak A, Eippert F, Epperson KR, Epperson KS, Freund P, Finsterbusch J, Foias A,
Fratini M, Fukunaga I, Gandini Wheeler-Kingshott CAM, Germani G, Gilbert G, Giove F, Gros C, Grussu
F, Hagiwara A, Henry P-G, Horák T, Hori M, Joers J, Kamiya K, Karbasforoushan H, Keřkovský M,
Khatibi A, Kim J-W, Kinany N, Kitzler HH, Kolind S, Kong Y, Kudlička P, Kuntke P, Kurniawan ND,
Kusmia S, Labounek R, Laganà MM, Laule C, Law CS, Lenglet C, Leutritz T, Liu Y, Llufriu S, Mackey S,
Martinez-Heras E, Mattera L, Nestrasil I, O’Grady KP, Papinutto N, Papp D, Pareto D, Parrish TB,
Pichiecchio A, Prados F, Rovira À, Ruitenberg MJ, Samson RS, Savini G, Seif M, Seifert AC, Smith AK,
Smith SA, Smith ZA, Solana E, Suzuki Y, Tackley G, Tinnermann A, Valošek J, Van De Ville D,
Yiannakas MC, Weber II KA, Weiskopf N, Wise RG, Wyss PO, Xu J (2021) Open-access quantitative
MRI data of the spinal cord and reproducibility across participants, sites and manufacturers. Sci
Data 8:219.
16. Jain AK (1989) Fundamentals of digital image processing. Prentice-Hall Inc.
17. Ronneberger O, Fischer P, Brox T (2015) U-Net: Convolutional Networks for Biomedical Image
Segmentation.
18. He K, Zhang X, Ren S, Sun J (2015) Deep Residual Learning for Image Recognition.
19. Chandra J, Sheerin F, Lopez de Heredia L, Meagher T, King D, Belci M, Hughes RJ (2012) MRI in acute
and subacute post-traumatic spinal cord injury: pictorial review. Spinal Cord 50:2–7.
20. Haefeli J, Mabray MC, Whetstone WD, Dhall SS, Pan JZ, Upadhyayula P, Manley GT, Bresnahan JC,
Beattie MS, Ferguson AR, Talbott JF (2017) Multivariate Analysis of MRI Biomarkers for Predicting
Neurologic Impairment in Cervical Spinal Cord Injury. American Journal of Neuroradiology 38:648–
655.
21. Tetreault LA, Skelly AC, Dettori JR, Wilson JR, Martin AR, Fehlings MG (2017) Guidelines for the
Management of Degenerative Cervical Myelopathy and Acute Spinal Cord Injury: Development
Process and Methodology. Global Spine J 7:8S-20S.
22. Fehlings MG, Tetreault LA, Wilson JR, Kwon BK, Burns AS, Martin AR, Hawryluk G, Harrop JS (2017) A
Clinical Practice Guideline for the Management of Acute Spinal Cord Injury: Introduction, Rationale,
and Scope. Global Spine J 7:84S-94S.
23. Cui Y, Zhu J, Duan Z, Liao Z, Wang S, Liu W (2022) Artificial Intelligence in Spinal Imaging: Current
Status and Future Directions. Int J Environ Res Public Health 19:11708.
24. S P, S A, A D, N G (2022) Detection of Spinal Cord Injury using Deep Learning Algorithm. 2022
International Conference on Sustainable Computing and Data Communication Systems (ICSCDS).
IEEE, pp 270–275
25. Tidwell VK, Kim JH, Song S-K, Nehorai A (2010) Automatic segmentation of rodent spinal cord
diffusion MR images. Magn Reson Med 64:893–901.
26. Mukherjee DP, Cheng I, Ray N, Mushahwar V, Lebel M, Basu A (2010) Automatic Segmentation of
Spinal Cord MRI Using Symmetric Boundary Tracing. IEEE Transactions on Information Technology
in Biomedicine 14:1275–1278.
27. Ren H, Feng T, Wang L, Liu J, Zhang P, Yao G, Shen Y (2020) Using a Combined Classification of
Increased Signal Intensity on Magnetic Resonance Imaging (MRI) to Predict Surgical Outcome in
Cervical Spondylotic Myelopathy. Medical Science Monitor. doi: 10.12659/MSM.929417
28. Talbott JF, Whetstone WD, Readdy WJ, Ferguson AR, Bresnahan JC, Saigal R, Hawryluk GWJ, Beattie
MS, Mabray MC, Pan JZ, Manley GT, Dhall SS (2015) The Brain and Spinal Injury Center score: a
novel, simple, and reproducible method for assessing the severity of acute cervical spinal cord injury
with axial T2-weighted MRI findings. J Neurosurg Spine 23:495–504.
29. Talbott JF, Whetstone WD, Readdy WJ, Ferguson AR, Bresnahan JC, Saigal R, Hawryluk GWJ, Beattie
MS, Mabray MC, Pan JZ, Manley GT, Dhall SS (2015) The Brain and Spinal Injury Center score: a
novel, simple, and reproducible method for assessing the severity of acute cervical spinal cord injury
with axial T2-weighted MRI findings. J Neurosurg Spine 23:495–504.
30. Parthiban J, Zileli M, Sharif SY (2020) Outcomes of Spinal Cord Injury: WFNS Spine Committee
Recommendations. Neurospine 17:809–819.
31. Dobran M, Aiudi D, Liverotti V, Fasinella MR, Lattanzi S, Melchiorri C, Iacoangeli A, Campa S, Polonara
G (2023) Prognostic MRI parameters in acute traumatic cervical spinal cord injury. European Spine
Journal 32:1584–1590.
32. Kamal R, Verma H, Narasimhaiah S, Chopra S (2023) Predicting the Role of Preoperative
Intramedullary Lesion Length and Early Decompressive Surgery in ASIA Impairment Scale Grade
Improvement Following Subaxial Traumatic Cervical Spinal Cord Injury. J Neurol Surg A Cent Eur
Neurosurg 84:144–156.
33. Mohajeri Moghaddam S, Bhatt AA (2018) Location, length, and enhancement: systematic approach
to differentiating intramedullary spinal cord lesions. Insights Imaging 9:511–526.
34. Aarabi B, Sansur CA, Ibrahimi DM, Simard JM, Hersh DS, Le E, Diaz C, Massetti J, Akhtar-Danesh N
(2017) Intramedullary Lesion Length on Postoperative Magnetic Resonance Imaging is a Strong
Predictor of ASIA Impairment Scale Grade Conversion Following Decompressive Surgery in Cervical
Spinal Cord Injury. Neurosurgery 80:610–620.
35. Okimatsu S, Maki S, Furuya T, Fujiyoshi T, Kitamura M, Inada T, Aramomi M, Yamauchi T, Miyamoto
T, Inoue T, Yunde A, Miura M, Shiga Y, Inage K, Orita S, Eguchi Y, Ohtori S (2022) Determining the
short-term neurological prognosis for acute cervical spinal cord injury using machine learning.
Journal of Clinical Neuroscience 96:74–79.
36. Yi W, Zhao J, Tang W, Yin H, Yu L, Wang Y, Tian W (2023) Deep learning-based high-accuracy
detection for lumbar and cervical degenerative disease on T2-weighted MR images. European Spine
Journal. doi: 10.1007/s00586-023-07641-4

Tables
Table 1. The Distribution of Intramedullary Axial Segmentation Data

Category                 Spine Generic    Cervical SCI Dataset
Number of Slices         85,440           11,374
Number of Subjects       267              294
Slices with Labels       62,202           4,784
Slices without Labels    23,238           6,590
Preprocess Size          320 x 320 px     320 x 320 px

Table 2. Segmentation Data Distribution for Intramedullary Sagittal

Category                 Spine Generic    Cervical SCI Dataset
Number of Slices         28,987           5,305
Number of Subjects       267              294
Slices with Labels       5,749            873
Slices without Labels    23,238           4,372

Table 3. Axial Spinal Cord Segmentation Result

Experiment                                                                                            Dice Score    IoU Score
UNET                                                                                                  0.9369        0.8816
+FP32                                                                                                 0.9311        0.8714
+Augmentation Contrast Stretching                                                                     0.9318        0.8726
Change Model ResidualUNET                                                                             0.9422        0.8909
ResidualUNET (Depth 4 Feature 32), Data Composition 1:2, FP32, Augmentation + Contrast Stretching     0.9442        0.8945

Table 4. Sagittal Spinal Cord Segmentation Result

Experiment                                                              Dice Score    IoU Score
UNET + Contrast Stretching                                              0.92101       0.8541
AttentionUNET + Contrast Stretching                                     0.9278        0.8659
AttentionUNET + Contrast Stretching + (Non Spinal Cord for Training)    0.9327        0.8747

Table 5. Grouping Experiment

Total Class    F1 Macro    F1 Micro    F1 Weighted
5 Class        0.3638      0.3692      0.3355
3 Class        0.5576      0.6974      0.7248
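
For readers less familiar with the three averaging schemes in Table 5, the following self-contained Python sketch shows how macro, micro, and weighted F1 are computed for a single-label multi-class task. The function `f1_scores` and the toy labels are illustrative assumptions and do not reproduce the study's data.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def class_f1(label):
        denom = 2 * tp[label] + fp[label] + fn[label]
        return 2 * tp[label] / denom if denom else 0.0

    per_class = {label: class_f1(label) for label in labels}
    support = Counter(y_true)

    # Macro: unweighted mean of per-class F1.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class F1 weighted by the number of true samples.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    # Micro: pooled counts; equals accuracy when each sample has one label.
    micro = sum(tp.values()) / len(y_true)
    return macro, micro, weighted


# Toy example with three classes of unequal size.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

macro, micro, weighted = f1_scores(y_true, y_pred)
print(f"macro={macro:.4f} micro={micro:.4f} weighted={weighted:.4f}")
```

Macro F1 treats every class equally, so it is the most sensitive of the three to poor performance on rare classes.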

Table 6. Preprocessing and Model Experiment

Experiment (for 3 classes)                       F1 Macro    F1 Micro    F1 Weighted
EfficientNetV2 S - Without Cropping              0.5576      0.6974      0.7248
EfficientNetV2 S - Circular Crop Radius 128px    0.6659      0.7282      0.7202
ResNet 18 - Circular Crop Radius 128px           0.6228      0.7128      0.7197

Table 7. Experiment Result of Pre-trained Model

Scenario           Dice Score    IoU Score    Epochs
UNET               0.5330509     0.363516     101
UNET Pretrained    0.6806988     0.51612      25

Table 8. Application Prediction Time

Case    Time (s)
1       22.1
2       21.2
3       20.3
4       21.1
5       20.7
6       18.7
7       19.3

Figures

Figure 1

A. Axial MRI without 99th Percentile Contrast Stretching

B. Axial MRI with 99th Percentile Contrast Stretching
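
The effect shown in Figure 1 can be approximated with the short Python sketch below. It assumes one common variant of percentile contrast stretching: intensities are rescaled linearly from the image minimum to the 99th percentile and then clipped to the range [0, 1]. The function name `contrast_stretch` and the choice of the minimum as the lower bound are assumptions, since the exact variant used in the study is not specified here.

```python
import numpy as np


def contrast_stretch(image, upper_percentile=99.0):
    """Linearly rescale intensities to [0, 1], clipping above a percentile.

    Assumption: the lower bound is the image minimum and the upper bound
    is the given percentile; brighter outliers saturate at 1.
    """
    image = np.asarray(image, dtype=np.float64)
    low = image.min()
    high = np.percentile(image, upper_percentile)

    if high <= low:
        return np.zeros_like(image)  # flat image: nothing to stretch

    return np.clip((image - low) / (high - low), 0.0, 1.0)


# Toy example: a smooth ramp with one very bright outlier pixel.
img = np.arange(100, dtype=float).reshape(10, 10)
img[9, 9] = 10000.0

stretched = contrast_stretch(img)
print(stretched.min(), stretched.max())
```

Clipping at a high percentile rather than at the maximum keeps a few very bright pixels from compressing the contrast of the remaining tissue.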

Figure 2

The diagram shows the workflow of Spinal Cord Injury Classification from MRI Axial View

Figure 3

The diagram shows the workflow of Spinal Cord Injury Classification from MRI Sagittal View

Figure 4

SCI Severity Prediction Score from MRI Axial View

Figure 5

Lesion length Calculation from MRI sagittal view

Figure 6

Lesion length Calculation

Figure 7

Correct and Wrong Segmentation of the Axial Spinal Cord

Figure 8

Layer 7 and Layer 14 Sagittal Spinal Cord Segmentation

Figure 9

An area under the receiver operating characteristic curve plot

Figure 10

Example of Development Program
