Predict Motor Recovery AI
Korea; bDepartment of Game Design, Faculty of Arts, Uppsala University, Uppsala, Sweden; cDepartment of Physical
Medicine and Rehabilitation, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Republic
of Korea; dDepartment of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu, Republic of
Korea
Fig. 1. The overall process of analysis of data with ML. MBC, modified Brunnstrom classification; FAC, functional ambulation category; ML, machine learning.
residual disability even after acute stroke management [2, 3]. Appropriate rehabilitative treatment can improve functional recovery and reduce post-stroke disability [4, 5]. Knowledge of motor function prognosis is useful for clinicians in determining the most appropriate and effective rehabilitation strategy [4].

To date, several prognostic scoring systems, such as the Acute ASTRAL, DRAGON, and SEDAN scores, have been developed for this purpose [6, 7]. In addition, transcranial magnetic stimulation and diffusion tensor tractography studies have been used for predicting motor outcome after stroke [8, 9]. However, in clinical practice, the aforementioned methods frequently fail to predict the actual results [10]. Furthermore, because each hospital performs a different set of tests and uses different evaluation tools, it is often difficult to apply these prediction systems or tools. Therefore, efforts are required to improve prediction accuracy and to rely on commonly used data.

Machine learning (ML) is an artificial intelligence technique in which a system learns patterns and rules from given data [11–13]. It has the advantage of detecting potential interactions between many attributes/variables. The application of this technology in the medical field has produced promising results. In particular, the complex and unpredictable nature of human physiology has proven to be better explained by ML algorithms in many circumstances [13]. Previous studies in the field of stroke rehabilitation demonstrated the potential of ML for motor function prediction [13–15]. However, because the input data used in previous ML studies were diverse and the data forms used by hospitals in actual clinical practice differed, it is difficult to apply these models uniformly across hospitals. In this study, we created a practical ML model using, as input, the most common data measured in almost all rehabilitation hospitals.

Materials and Methods

This study was approved by the Institutional Review Board of Yeungnam University Hospital, and informed consent was waived because of the retrospective nature of the study and because the analysis involved anonymous clinical data. This study included stroke patients who were admitted to the physical medicine and rehabilitation department of a single university hospital for stroke rehabilitation between January 2009 and March 2021. The steps of the modeling process applied in this study are shown in Figure 1.
| | MBC prediction (upper extremity) | FAC prediction (lower extremity) |
|---|---|---|
| Sample size (patients) | 429 for training, 129 for validation, 56 for test, total 614 | 583 for training, 175 for validation, 75 for test, total 833 |
| Sample zero ratio | Train 50.12%, validation 50.45%, test 50.00% | Train 45.97%, validation 46.0%, test 46.0% |
| DNN | 3 layers with 256-256-256 neurons, learning rate 4e−06 | 3 layers with 256-512-1024 neurons, learning rate 5e−06 |
| | SGD optimizer, ELU activation, batch size 16, dropout rate 0.2 | SGD optimizer, ReLU activation, batch size 32, dropout rate 0.6 |
| | Training accuracy: 78.09% | Training accuracy: 90.22% |
| | Validation accuracy: 83.72% | Validation accuracy: 83.43% |
| | Test accuracy: 80.36% | Test accuracy: 78.67% |
| | Validation AUC 0.836 with CI (0.774–0.898) | Validation AUC 0.836 with CI (0.782–0.891) |
| | Test AUC 0.804 with CI (0.703–0.904) | Test AUC 0.782 with CI (0.687–0.877) |

ML, machine learning; MBC, modified Brunnstrom classification; FAC, functional ambulation category; DNN, deep neural network; SGD, stochastic gradient descent; AUC, area under the curve; CI, confidence interval.
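The sample sizes and class ratios listed above are consistent with a partition of roughly 70%/21%/9% in which each subset keeps a similar poor-to-good ratio. The text does not say how the partition was drawn, so the following is only a minimal sketch that assumes a stratified random split; the function name `stratified_split`, the seed, and the rounding rule are hypothetical choices for illustration.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.21, 0.09), seed=0):
    """Split sample indices into train/validation/test, class by class.

    `labels` is a list of class labels (for example 0 = poor, 1 = good).
    Each class is shuffled and cut separately so that every subset keeps
    roughly the same class ratio as the full data set.
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility
    splits = {"train": [], "validation": [], "test": []}
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        splits["train"] += idx[:n_train]
        splits["validation"] += idx[n_train:n_train + n_val]
        # Whatever remains after the first two cuts goes to the test set.
        splits["test"] += idx[n_train + n_val:]
    return splits


if __name__ == "__main__":
    # Synthetic, perfectly balanced labels with the same total as the
    # upper extremity data set (614); these are not the study data.
    demo_labels = [0] * 307 + [1] * 307
    parts = stratified_split(demo_labels)
    print({name: len(ids) for name, ids in parts.items()})
```

Because each class is rounded separately, the subset sizes from this sketch can differ by a patient or two from the counts reported in the table.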
Data Collection

The inclusion criteria were as follows: (1) first-ever stroke, (2) age over 20 years, (3) hemiplegia or hemiparesis following stroke, (4) clinical data collected within 7–30 days (early stage, day of transfer, or day of admission to the rehabilitation department) after onset, and (5) absence of serious medical complications, such as pneumonia or cardiac problems (acute coronary syndromes [4 patients] and cardiomyopathy [2 patients]) from onset to final evaluation. The exclusion criteria were as follows: (1) other preexisting brain or spinal cord lesions and (2) presence of other peripheral neuropathies that could affect ankle dorsiflexion strength, such as peripheral polyneuropathy.

The following demographic and clinical data were collected when patients were transferred to the rehabilitation unit (8–30 days after stroke onset): age, sex, type of stroke (ischemic/hemorrhagic), modified Brunnstrom classification (MBC), functional ambulation category (FAC), and Medical Research Council (MRC) score for muscle strength of shoulder abductor, elbow flexor, finger flexor, finger extensor, hip flexor, knee extensor, and ankle dorsiflexor of the affected side [16, 17]. We selected these input variables because they are most commonly and easily collected when stroke patients are admitted or visit the hospital for rehabilitation.

We used three ML algorithms: deep neural network (DNN), random forest, and logistic regression [18]. The DNN algorithm consists of layers of interconnected artificial neurons [19]. An artificial neuron is modeled on the biological neuron: it receives multiple inputs, multiplies each by a weight, and outputs the sum [20]. The random forest algorithm comprises several decision trees that consist of multiple true or false conditions using input variables [21]. The final classification is based on the sum of the decisions made by the decision trees [22]. Logistic regression is a statistical technique for estimating the relationship between a dependent variable with only two values and the independent variables using a logistic function [23].

Two DNN models were trained for FAC and MBC predictions with all variables as inputs to classify the patients’ motor outcomes. For the DNN, three layers with 256-256-256 neurons (MBC prediction) and 256-512-1024 neurons (FAC prediction) were used. For the random forest model, 500 decision trees were used (Table 1).

To understand the effects of the two demographic variables, sex and age, we evaluated the change in performance of each model when sex or age was excluded from the input data. The DNN model with all variables matched or outperformed the two reduced DNN models (without sex or without age) on both the validation and test datasets. The logistic regression and random forest models showed only minor or no differences when sex or age was excluded. The details are shown in Tables 2 and 3.

We categorized the output variables as “good” and “poor” in the upper and lower extremities. In the lower extremity, patients with an FAC of <4 at 6 months after stroke onset were considered to have a “poor” outcome, while those with scores ≥4 were considered to have a “good” outcome. In the upper extremity, patients with an MBC score of <5 at 6 months after stroke onset were considered to have a “poor” outcome, while those with scores ≥5 were considered to have a “good” outcome.
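The dichotomization of the 6-month outcomes can be written as a simple threshold rule. The sketch below only restates the cut-offs given in the text (FAC ≥4 and MBC ≥5 count as “good”); the function and constant names are hypothetical and are not taken from the authors' code.

```python
# Cut-offs as stated in the text; the names are illustrative only.
FAC_GOOD_THRESHOLD = 4  # lower extremity: FAC >= 4 is "good"
MBC_GOOD_THRESHOLD = 5  # upper extremity: MBC >= 5 is "good"


def label_outcome(score, threshold):
    """Return "good" if the 6-month score reaches the threshold, else "poor"."""
    return "good" if score >= threshold else "poor"


def label_fac(fac_score):
    """Lower extremity outcome from the functional ambulation category."""
    return label_outcome(fac_score, FAC_GOOD_THRESHOLD)


def label_mbc(mbc_score):
    """Upper extremity outcome from the modified Brunnstrom classification."""
    return label_outcome(mbc_score, MBC_GOOD_THRESHOLD)


if __name__ == "__main__":
    for fac in (3, 4):
        print("FAC", fac, "->", label_fac(fac))
    for mbc in (4, 5):
        print("MBC", mbc, "->", label_mbc(mbc))
```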
| ML model | Metric | Base model with all variables | Model without sex | Diff | Model without age | Diff |
|---|---|---|---|---|---|---|
| DNN | Training accuracy | 78.09% | 80.89% | | 85.08% | |
| | Validation accuracy | 83.72% | 83.72% | 0% | 80.62% | −3.1% |
| | Test accuracy | 80.36% | 78.57% | −1.79% | 78.57% | −1.79% |
| Logistic regression | Training accuracy | 75.99% | 76.22% | | 75.52% | |
| | Validation accuracy | 79.07% | 77.52% | −1.55% | 78.79% | −0.28% |
| | Test accuracy | 76.79% | 75.00% | −1.79% | 78.57% | +1.78% |
| Random forest | Out-of-bag score estimate | 72.73% | 70.16% | | 72.49% | |
| | Mean validation accuracy score | 73.64% | 73.64% | 0% | 75.19% | +1.55% |
| | Mean test accuracy score | 75.00% | 75.00% | 0% | 76.79% | +1.79% |
| ML model | Metric | Base model with all variables | Model without sex | Diff | Model without age | Diff |
|---|---|---|---|---|---|---|
| DNN | Training accuracy | 90.22% | 79.93% | | 79.07% | |
| | Validation accuracy | 83.43% | 81.17% | −2.26% | 80.00% | −3.43% |
| | Test accuracy | 78.67% | 69.33% | −9.33% | 72.00% | −6.67% |
| Logistic regression | Training accuracy | 71.87% | 71.87% | | 69.98% | |
| | Validation accuracy | 78.86% | 78.86% | 0% | 77.71% | −1.15% |
| | Test accuracy | 69.33% | 69.33% | 0% | 70.67% | +1.34% |
| Random forest | Out-of-bag score estimate | 67.75% | 66.90% | | 67.41% | |
| | Mean validation accuracy score | 74.29% | 73.14% | −1.15% | 77.14% | +2.85% |
| | Mean test accuracy score | 69.33% | 66.67% | −2.66% | 68.00% | +1.33% |
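The “Diff” columns in the two comparison tables above appear to be the reduced model's accuracy minus the base model's accuracy, in percentage points. A minimal sketch of that arithmetic follows; the helper name `accuracy_diff` is hypothetical.

```python
def accuracy_diff(base_pct, reduced_pct):
    """Percentage-point change when a variable is removed from the inputs.

    A negative value means the reduced model (without sex or without age)
    scored lower than the base model that uses all variables.
    """
    return round(reduced_pct - base_pct, 2)


if __name__ == "__main__":
    # Validation accuracy of the first DNN row: base 83.72%, without age 80.62%.
    print(accuracy_diff(83.72, 80.62))
```

Recomputing from the rounded accuracies can differ from the printed value by 0.01 (for example, 69.33 − 78.67 gives −9.34, while the table lists −9.33), which would be expected if the published differences were computed before rounding.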
For this study, data from 842 patients (mean age: 60.8 ± 13.2 years; 480 males, 353 females; stroke onset: 16.3 ± 6.6 days) were analyzed. To develop a model for predicting motor outcome of the upper extremity, 614 patients’ data (mean age: 61.0 ± 13.0 years; 349 males, 265 females; stroke onset: 16.4 ± 6.5 days) were used. To prevent overfitting of the models, 70% (n = 429, poor:good = 50.1%:49.9%), 21% (n = 129, poor:good = 50.5%:49.7%), and 9% (n = 56, poor:good = 50.0%:50.0%) of these data were included in the training, validation, and test sets, respectively. To develop a model for predicting motor outcome of the lower extremity, 833 patients’ data (mean age: 60.8 ± 13.2 years; 480 males, 353 females; stroke onset: 16.3 ± 6.6 days) were used: 70% (n = 583, poor:good = 46.0%:54.0%) for the training set, 21% (n = 175, poor:good = 46.3%:53.7%) for the validation set, and 9% (n = 75, poor:good = 45.3%:54.7%) for the test set. TensorFlow version 2.4.1 (Google, Mountain View, CA, USA) and the scikit-learn toolkit version 0.24.1 were used to train the ML models.

Statistical Analysis

Statistical analyses were performed using Python 3.8.8 and scikit-learn version 0.24.1. Receiver operating characteristic curve analysis was employed, and area under the curve (AUC) was calculated. The confidence interval (CI) for AUC was calculated using the approach used by DeLong et al. [24].

Results

In the prediction of the motor outcome of the upper extremity, the AUC of the validation dataset for the DNN model was 0.836 (95% CI, 0.774–0.898). For the random forest and logistic regression models, the AUC of the validation dataset was 0.736 (95% CI, 0.660–0.813) and 0.790 (95% CI, 0.722–0.858), respectively (Table 1; Fig. 2).

As for the motor outcome of the lower extremity, the AUC of the validation dataset for the DNN model was 0.836 (95% CI, 0.782–0.891). For the random forest and logistic regression models, the AUC was 0.741 (95% CI, 0.675–0.806) and 0.795 (95% CI, 0.736–0.853), respectively (Table 1; Fig. 2).
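The AUC confidence intervals are attributed to the approach of DeLong et al. [24]. As a rough illustration only, the following pure-Python sketch computes an AUC together with a DeLong-style variance and a normal-approximation 95% interval for a single model. The function name `delong_auc_ci`, the choice of “good” as the positive class, and the clipping of the interval to [0, 1] are assumptions made for this example; the authors' actual code is not given in the text.

```python
import math


def delong_auc_ci(pos_scores, neg_scores, z=1.96):
    """AUC with a DeLong-style variance and normal-approximation CI.

    pos_scores: predicted scores of the positive cases (assumed "good").
    neg_scores: predicted scores of the negative cases (assumed "poor").
    Returns (auc, variance, (lower, upper)).
    """
    m, n = len(pos_scores), len(neg_scores)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases in each class")

    def psi(x, y):
        # 1 if the positive case is ranked higher, 0.5 for a tie, else 0.
        if x > y:
            return 1.0
        if x == y:
            return 0.5
        return 0.0

    # Structural components: one value per positive and per negative case.
    v10 = [sum(psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(psi(x, y) for x in pos_scores) / m for y in neg_scores]

    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    variance = s10 / m + s01 / n

    half_width = z * math.sqrt(variance)
    lower = max(0.0, auc - half_width)
    upper = min(1.0, auc + half_width)
    return auc, variance, (lower, upper)


if __name__ == "__main__":
    # Toy scores, not study data.
    auc, var, ci = delong_auc_ci([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
    print(round(auc, 3), round(var, 4), tuple(round(c, 3) for c in ci))
```

The pairwise comparison is quadratic in the number of cases, which is acceptable for validation sets of the size used here (129 and 175 patients).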
Fig. 2. Receiver operating characteristic curves for validation data (x-axis: false positive rate; y-axis: true positive rate). Left panel: deep neural network, AUC = 0.836; logistic regression, AUC = 0.790; random forest, AUC = 0.736. Right panel: deep neural network, AUC = 0.836; logistic regression, AUC = 0.795; random forest, AUC = 0.741. AUC, area under the curve.
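A receiver operating characteristic curve of this kind can be traced by sweeping a decision threshold over the predicted scores and recording the false positive rate and true positive rate at each step. The sketch below is a generic illustration, not the authors' plotting code; `roc_points` and `trapezoid_auc` are hypothetical names, and labels are assumed to be coded 1 for the positive class and 0 for the negative class.

```python
def roc_points(labels, scores):
    """Return ROC points as (false positive rate, true positive rate).

    Cases are visited from the highest score to the lowest; tied scores
    are processed together so that each distinct threshold adds one point.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Toy data, not study data.
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
    pts = roc_points(y, s)
    print(pts)
    print(round(trapezoid_auc(pts), 3))
```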