
Biomedical Signal Processing and Control 74 (2022) 103522

Contents lists available at ScienceDirect

Biomedical Signal Processing and Control


journal homepage: www.elsevier.com/locate/bspc

Subject-transfer framework with unlabeled data based on multiple distance measures for surface electromyogram pattern recognition

Suguru Kanoga a,*, Takayuki Hoshino a,b, Hideki Asoh a

a National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
b Keio University, 5322 Endo, Fujisawa-shi, Kanagawa 252-0882, Japan

ARTICLE INFO

Keywords: Transfer learning; Surface electromyogram (sEMG); Wearable sensor; Multiple distance measures; Pattern recognition

ABSTRACT

To improve the initial accuracy of wearable sensor-driven human interfaces, inter-subject variabilities must be reduced through transfer learning. If subject transfer can be performed without labeling the target user’s calibration data, an interface that provides stable accuracy can be easily achieved without a cumbersome calibration protocol. Herein, we propose a subject-transfer framework based on multiple distance measures that enables subject transfer using only unlabeled calibration data by minimizing the distance between the data distributions of the target and the source. To assess the performance of this framework, we used two surface electromyogram databases (one private database and one public database called the NinaPro database 5) acquired from the same wearable sensor, the Myo Gesture Control Armband. The proposed framework improved the pattern recognition accuracy compared with well-established classifiers constructed from randomly selected source subject data. In the future, we will apply this framework to online human interfaces that are not based on a specific calibration protocol. The scripts used in this study can be downloaded from https://github.com/aistairc/Unlabeled_STM.

* Corresponding author.
E-mail address: s.kanouga@aist.go.jp (S. Kanoga).

https://doi.org/10.1016/j.bspc.2022.103522
Received 23 June 2021; Received in revised form 13 November 2021; Accepted 21 January 2022
Available online 1 February 2022
1746-8094/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Pattern-recognition–based interfaces using surface electromyograms (sEMGs) can utilize high signal-to-noise ratio signals, enabling the stable observation of data of individuals even when using noisy, inexpensive sensors mounted on wearable devices. These interfaces have been deployed in various applications, such as health monitoring [1] and security verification [2]. Linear discriminant analysis (LDA) and support vector machine (SVM) are the main pattern recognition modules used for such devices [3,4]. The performance of these modules is obtained by preparing training data, usually 6 to 120 trials, for each target pattern such as hand gestures [5]. Each trial requires 6–20 s of gesture and idle duration; thus, if a health monitoring system is to recognize 10 classes of body states with high performance, 60 to 1200 trials of training data (approximately 6 min to 6 h) will be required for each user. Measurement methods are becoming easier owing to the development of wearable devices, but this only makes it easier to prepare the measurements; the number of samples required to train a high-performance classifier remains the same. Therefore, it is still difficult to secure the required amount of labeled training data from users (targets). To shorten the measurement time of a target, it is necessary to supplement the target's data with data from other subjects that have been measured in advance, or to use a module that has been trained on the data of other (source) subjects. However, when considering cross-subject classification, interface designers are plagued by high inter-subject variability in sEMG owing to differences in muscle activity levels and the movement sequences performed [6,7].

Recently, numerous researchers in various fields have attempted to alleviate the abovementioned problem through transfer learning by measuring a small amount of labeled calibration data from the target. In the sEMG-based pattern recognition field, researchers have proposed transfer learning methods for LDA and SVM. Vidovic et al. proposed the covariate shift adaptation (CSA) for LDA, which linearly tunes the balance of both mean vectors and covariance matrices calculated using labeled calibration data and labeled source data [8]. Kanoga et al. proposed a semi-supervised style transfer mapping (STM) method for SVM, which learns an affine transformation matrix using labeled calibration data and destination points defined by the labeled source data to convert new data from the target into the distribution of the source dataset [9]. STM can also adjust the learned affine transformation matrix using information from unlabeled test data through semi-supervised learning. SVM with semi-supervised STM effectively improved the cross-subject pattern recognition performances compared with LDA with CSA on both 8-class 1-degree of freedom (DoF) and 14-class 2-DoF sEMG datasets that were acquired from the same 25 subjects using an eight-channel wearable device [10].

In most cases, transfer learning requires label information regarding calibration data to bridge domain knowledge and optimize the parameters using the recognition accuracy of a cross-subject classifier [9–12]. However, requiring label information is not ideal because accurate label information must be obtained before the interface can be used, in addition to a cumbersome calibration protocol. In this study, we focused on the fact that calibration labels can be predicted by a cross-subject classifier (e.g., labels of subject A’s calibration data predicted by subject B’s classifier) and used as pseudo labels. In addition, when projecting calibration data to resemble source data via STM, the destination point for the calibration data in the source data can be identified using the pseudo labels; moreover, the distance-based similarity between the projected calibration data and the source dataset can be calculated while training the transfer learning model. A method for calculating distance-based statistics, selecting source subjects that are similar to the calibration data, and transferring information successfully in the absence of labels in the calibration data could improve the convenience of wearable sensor-based human interfaces.

Thus, we propose a multiple distance measure (MDM)-based source selection and transfer learning method that can be embedded in the existing subject-transfer framework for sEMG pattern recognition without labeling calibration data. To evaluate the performance of the proposed method, we used two sEMG databases (DBs): the private DB recorded in our previous study [9] and a public DB called NinaPro DB5 [4].

2. Materials

We employed two types of DBs: private and public. The public DB, NinaPro DB5 [4], has three types of exercises, which are described as separate DBs in this study: basic motions of the fingers (DB5-A); isometric, isotonic hand configurations and basic wrist motions (DB5-B); and grasping and functional motions (DB5-C).

2.1. Private DB

Detailed descriptions of this DB are provided in our previous reports [9,10,13]. Twenty-five healthy subjects aged 20 to 31 years (including eight females and five left-handed subjects) participated in the experiments. All subjects provided written informed consent. The experimental design was approved by the Institutional Review Board of AIST (approval number: HF2017-784).

Eight-channel sEMG data with a sampling rate of 200 Hz were recorded from the right forearm using a banded device: the Myo Gesture Control Armband (Thalmic Labs, Kitchener, Canada), with evenly arranged sensors referenced to the fourth channel (which has a Bluetooth communication port), was placed 1 cm distal to the belly of the brachioradialis muscle. The signals were high-pass filtered at 15 Hz through a fifth-order Butterworth filter.

The subjects were asked to stand in front of a laptop PC with a relaxed posture. The palms of their hands remained facing each other. The subjects enacted eight-class 1-DoF motions: (1) hand opening (rest), (2) wrist flexion, (3) wrist extension, (4) radial deviation, (5) ulnar deviation, (6) forearm pronation, (7) forearm supination, and (8) hand closing. Each motion was repeated five times, resulting in 40 trials (eight motions, five times each). Each trial lasted 6 s. We extracted 1.5 s segments from the 6 s data by applying a sample entropy-based thresholding segmentation approach. Sample entropy is a measure for quantifying the regularity and complexity of data [14]. In this study, motion onsets were detected by multiscale sample entropy, which can emphasize the timing of sudden signal fluctuations generated by muscle contraction [15]. To set a universal threshold in the dataset, we applied max normalization to the originally calculated sample entropies. Thereafter, a universal threshold of 0.4 was experimentally determined [13]. Active 1.5 s segments were extracted starting from the first time point at which the eight-channel data exceeded the threshold (see Fig. 1). We did not restrict the analysis to the stationary phase because the myoelectric pattern at the beginning of contraction (the transient phase) is known to provide useful information [16].

Fig. 1. Computerized motion instruction for wrist flexion, motion onset detection, and gesture execution sequence in one trial [13]. The dashed vertical line indicates the first time point exceeding the threshold of normalized sample entropy in the trial.

2.2. NinaPro DB5 (DB5-A, DB5-B, and DB5-C)

Detailed descriptions of this DB are provided by Pizzolato et al. [4]. Ten healthy subjects aged 22 to 34 years (including two females and no left-handed subjects) participated in the experiments. All subjects provided written informed consent. This experimental design was approved by the Ethics Commission of the Canton of Valais, Switzerland.

Sixteen-channel sEMG data with a sampling rate of 200 Hz were recorded from the right forearm using two banded devices. Two Myo Gesture Control Armbands were worn next to each other (see Fig. 2). The upper armband was placed closer to the elbow with the first channel on the radio–humeral joint. The lower armband was placed just below the first, closer to the hand, and tilted by 22.5° to fill the gaps left by the sensors of the other armband. The signals were high-pass filtered at 15 Hz through a fifth-order Butterworth filter.

The subjects were asked to perform three types of exercises (exercises A, B, and C in the reference [17]) represented by movies shown on the screen of a laptop PC. We considered each exercise as a separate DB and defined them as DB5-A, DB5-B, and DB5-C (see Fig. 3). DB5-A had 12-class motions related to basic motions of the fingers. DB5-B had 17-class motions related to eight isometric and isotonic hand configurations and nine basic wrist motions. DB5-C had 23-class grasping and functional motions where objects were presented to the subject for grasping to mimic daily life motions. Each motion was repeated six times, resulting in 72, 102, and 138 trials for DB5-A, DB5-B, and DB5-C, respectively. Each trial lasted 5 s. We extracted active motion segments from the 5 s data based on the restimulus variable given by the provider of the DBs.

Fig. 2. Acquisition setups for NinaPro DB5 (Double Myo) [4].

3. Methods

3.1. Feature extraction

Regardless of the DB, each segment was further divided into analysis windows of 250 ms with 4/5 overlaps (50 ms shifts) because the analysis window should be shorter than 300 ms to keep the time lag of the human interface within the permissible range [18]. For each analysis window, 11-dimensional features were determined per channel: (1) mean absolute value, (2) zero crossings, (3) slope sign changes, (4) waveform length, (5) root mean square, and (6–11) sixth-order autoregressive coefficients [19]. This feature set is the gold standard in sEMG-based pattern recognition and kinematics prediction [20,21]. We collected eight- or sixteen-channel data; therefore, a 250 ms analysis window was translated into 88- or 176-dimensional features (D = 88 or 176).

3.2. Subject-transfer frameworks with labeled data

3.2.1. CSA-LDA

The basic idea of CSA-LDA is to transfer information from the calibration dataset $X_{\mathrm{cal}} \in \mathbb{R}^{D \times N_t}$, where $D$ and $N_t$ are the feature dimensionality and size of the calibration dataset, respectively, to the parameters of the LDA model learned from a source dataset of a pooled subject $X_{\mathrm{src}} \in \mathbb{R}^{D \times N_s}$, where $N_s$ is the size of the source dataset. Each sample has a class label $c \in \{1, \ldots, C\}$. LDA has two types of parameters: the mean vector $\mu_c \in \mathbb{R}^{D}$ and covariance matrix $\Sigma_c \in \mathbb{R}^{D \times D}$, which are calculated from the data of the same class. Vidovic et al. proposed an adaptation by shrinking parameters $\{\tau, \lambda\}$ toward the mean vector and covariance matrix obtained from the calibration dataset and source
dataset to tune their weights [8]:

$$\mu_c = (1-\tau)\,\mu_{\mathrm{src}_c} + \tau\,\mu_{\mathrm{cal}_c}, \tag{1}$$

$$\Sigma_c = (1-\lambda)\,\Sigma_{\mathrm{src}_c} + \lambda\,\Sigma_{\mathrm{cal}_c}, \tag{2}$$

where $\tau \in [0, 1]$ and $\lambda \in [0, 1]$ are regularization parameters. The hyperparameters were estimated via a grid search over the range 0 to 1 with a step size of 0.1, using a validation dataset additionally obtained from the target. Note that because the method interferes directly with the LDA model, the classifier’s performance (recognition accuracy) must be evaluated and maximized to optimize the hyperparameters $\{\tau, \lambda\}$; thus, label information is always required in both the calibration and validation datasets.
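The shrinkage in Eqs. (1) and (2) and the grid search over $\{\tau, \lambda\}$ can be sketched as follows. This is a minimal illustration in plain Python, not the authors' script: the function names (`shrink`, `solve`, `lda_predict`, `csa_grid_search`) and the toy two-class data are hypothetical, the class means and covariances are assumed to be already estimated, and candidates are scored with a standard shared-covariance linear discriminant whose class-independent constant is dropped.

```python
def shrink(src, cal, w):
    """Convex combination (1 - w) * src + w * cal, as in Eqs. (1)-(2).
    Works on a vector (list of floats) or a matrix (list of lists)."""
    if isinstance(src[0], list):
        return [shrink(s, c, w) for s, c in zip(src, cal)]
    return [(1 - w) * s + w * c for s, c in zip(src, cal)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lda_predict(x, means, sigma):
    """Index of the class with the largest shared-covariance discriminant.
    The class-independent constant is omitted; it does not change the argmax."""
    scores = []
    for mu in means:
        w = solve(sigma, mu)  # Sigma^{-1} mu
        scores.append(sum(xi * wi for xi, wi in zip(x, w))
                      - 0.5 * sum(mi * wi for mi, wi in zip(mu, w)))
    return scores.index(max(scores))


def csa_grid_search(src_means, src_covs, cal_means, cal_covs,
                    x_val, y_val, n_steps=10):
    """Try tau, lam in {0, 0.1, ..., 1}; keep the first pair that reaches the
    best validation accuracy. Returns (accuracy, tau, lam)."""
    n_classes, dim = len(src_means), len(src_means[0])
    best = (-1.0, None, None)
    for i in range(n_steps + 1):
        tau = i / n_steps
        means = [shrink(s, c, tau) for s, c in zip(src_means, cal_means)]
        for j in range(n_steps + 1):
            lam = j / n_steps
            covs = [shrink(s, c, lam) for s, c in zip(src_covs, cal_covs)]
            # Shared covariance: average of the adapted class covariances.
            sigma = [[sum(cv[r][k] for cv in covs) / n_classes
                      for k in range(dim)] for r in range(dim)]
            hits = sum(lda_predict(x, means, sigma) == y
                       for x, y in zip(x_val, y_val))
            acc = hits / len(y_val)
            if acc > best[0]:
                best = (acc, tau, lam)
    return best


# Toy example (hypothetical data): the target classes are shifted by (1, 1).
eye = [[1.0, 0.0], [0.0, 1.0]]
src_means = [[0.0, 0.0], [2.0, 0.0]]
cal_means = [[1.0, 1.0], [3.0, 1.0]]
x_val = [[1.0, 1.0], [3.0, 1.0], [1.85, 1.0], [2.15, 1.0]]
y_val = [0, 1, 0, 1]
best_acc, best_tau, best_lam = csa_grid_search(
    src_means, [eye, eye], cal_means, [eye, eye], x_val, y_val)
print(best_acc, best_tau, best_lam)
```

Because the score is a validation accuracy, the sketch makes the point of the paragraph above concrete: labels are needed for both the calibration statistics and the validation samples.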


Fig. 3. Motions divided by exercise, modified from [17]. (a) Exercise A: 12 basic finger motions. (b) Exercise B: eight isometric and isotonic hand configurations and nine basic wrist motions. (c) Exercise C: 23 grasping and functional motions.

When all classes within the calibration and source datasets share the same covariance matrix $\Sigma = \frac{1}{C}\sum_{c=1}^{C}\Sigma_c$, we can obtain the transfer learning-applied linear discriminant $\delta_c$ for a new sample of the target $x_k \in \mathbb{R}^{D}$:

$$\delta_c(x_k) := x_k^{T}\Sigma^{-1}\mu_c - \frac{1}{2}\mu_c^{T}\Sigma^{-1}\mu_c - \log 2. \tag{3}$$

The class with the highest probability is the estimated class:

$$\hat{y}_k = \arg\max_{c}\,\delta_c(x_k), \tag{4}$$

where $\hat{y}_k \in \{1, \ldots, C\}$.

3.2.2. STM-SVM

STM-SVM is based on the concept that a new sample from the target, obtained after linear projection, must be similar to the source sample and must be identified by a previously trained cross-subject SVM classifier [11]. In this study, transfer learning was performed using unlabeled calibration data. Because adding a semi-supervised learning capability makes it difficult to determine its impact on the results when verifying the performance, we applied STM-SVM instead of semi-supervised STM-SVM to the sEMG-based pattern recognition problems.

Assuming that the distributions of the new sample and calibration data are the same, Kanoga et al. proposed a mapping to destination points via an affine transformation [9,10]. The parameters $A \in \mathbb{R}^{D \times D}$ and $b \in \mathbb{R}^{D}$ were learned by minimizing the weighted squared error:

$$\min_{A,b}\ \sum_{i=1}^{N_t} f_i\,\lVert A x_{\mathrm{cal}_i} + b - d_i \rVert_2^2 + \beta\,\lVert A - I \rVert_F^2 + \gamma\,\lVert b \rVert_2^2, \tag{5}$$

where $f_i$, $x_{\mathrm{cal}_i}$, $d_i$, $\lVert\cdot\rVert_F^2$, $\lVert\cdot\rVert_2^2$, and $I$ are the data confidence, a sample of the calibration dataset, the destination point, the squared Frobenius norm of a matrix, the squared $L_2$-norm of a vector, and the identity matrix, respectively. The second and third terms prevent the transformed samples from moving too far from their original positions in space. In addition, $\beta$ and $\gamma$ control the trade-off between non-transfer and over-transfer:

$$\beta = \tilde{\beta}\,\frac{1}{D}\sum_{i=1}^{N_t}\operatorname{tr}\!\left(f_i\,x_{\mathrm{cal}_i}x_{\mathrm{cal}_i}^{T}\right), \qquad \gamma = \tilde{\gamma}\sum_{i=1}^{N_t} f_i, \tag{6}$$

where $\operatorname{tr}(\cdot)$ is the trace of a matrix. Furthermore, $\tilde{\beta}$ and $\tilde{\gamma}$ were selected from the range 0 to 3 [11] in steps of 0.2. The hyperparameters were optimized by maximizing the recognition accuracy of the validation dataset obtained from the target. In addition, the computations of $A$ and $b$ can be performed according to Eqs. (7)–(10) in [9].


In linear projection, the definition of the destination point $d_i$ is important. Following a previous study [11], we clustered all samples of the source dataset via K-means clustering within each class to derive cluster centers (destination candidates). In this study, we set K to 15. The cluster center nearest to a calibration sample $x_{\mathrm{cal}_i}$, among the K cluster centers of class c, was defined as the destination point $d_i$ [22]. Note that this search requires label information for the calibration data; however, this information can be supplied by the pseudo labels estimated by the cross-subject classifier.

In the semi-STM algorithm [23], the confidence values of the labeled data were replaced with 0.8 to favor the new unlabeled data. However, in ordinary STM, the parameters $\{A, b\}$ are computed with full confidence ($f_i = 1$). A test sample $x_k$ was transferred to the source domain (i.e., $A x_k + b$). Furthermore, to estimate the label $\hat{y}_k$, the transferred sample was classified using a radial-basis-function-kernel SVM classifier constructed using the selected sources. The hyperparameters $\{C, \sigma\} \in \{10^{-3}, 10^{-2}, 10^{-1}, \ldots, 10^{3}\}$ were optimized by a grid search and 5-fold cross-validation.

3.2.3. Source selection

To avoid negative transfer, selecting a few source subjects similar to the target from all source subjects was more accurate than using all source subject data when there are no post-processing methods [9,11]. Therefore, source selection is a key technique in subject-transfer frameworks, and we should determine the source subjects from which knowledge should be borrowed. If the distributions of the source and target domains are similar, the cross-subject classifier can estimate the correct label for the target data.

Previous studies have claimed that using source data from seven subjects achieves better transfer learning performances compared with using all source subject data containing over 14 source subjects’ data [11,24]. For our private DB containing 25 subjects, we identified the seven source subjects who exhibited the highest recognition accuracies among all source subjects by cross-subject classification based on the calibration data of the target and used the individual cross-subject classifiers for the ensemble strategy described in Section 3.4. Note that because the number of subjects in NinaPro DB5 is relatively small (i.e., 10 subjects), we decided to select three subjects to keep the ratio of selected subjects about the same as in the case of the private DB.

3.3. Proposed subject-transfer framework with unlabeled data

In this study, we propose an MDM-based subject-transfer framework that contains a source-selection module and a linear data-projection module. The measures include five commonly used distance measures for quantifying inter-subject similarity [24–26]: (1) Euclidean distance, (2) correlation distance, (3) Chebyshev distance, (4) cosine distance, and (5) Kullback–Leibler divergence. The five distance measures can be used in concert to assess the (dis)similarity between spectral distributions [24]; in the MDM-based framework, the transferability (XP) of the source model is learned by a regressor using these five measures as input. Thus, even if only one of the distances is strongly related to XP, the regressor can assign weight to that distance alone; we use five measures instead of one because we expect the accuracy to be higher than that obtained with a single measure. To calculate the measures between the source and calibration datasets, we prepared the median vectors of both datasets. In addition, according to a previous study [24], the vectors were linearly scaled into 1 × D vectors (bins) using the hist function of MATLAB. The 1 × D vectors were then normalized to add up to 1, as a probability density function $F$.

The Euclidean distance is the straight-line distance between two vectors in Euclidean space:

$$D_{\mathrm{Euclidean}}(a, b) = \sqrt{\sum_{i=1}^{D}\left(F_{a_i} - F_{b_i}\right)^2}, \tag{7}$$

where $F_{a_i}$ is the $i$th element of the probability density function $F_a$ based on the vector $a$.

The correlation distance is defined as one minus the Pearson’s correlation coefficient between the two vectors:

$$D_{\mathrm{correlation}}(a, b) = 1 - \frac{\sum_{i=1}^{D}\left(F_{a_i} - \bar{F}_a\right)\left(F_{b_i} - \bar{F}_b\right)}{\sqrt{\sum_{i=1}^{D}\left(F_{a_i} - \bar{F}_a\right)^2}\,\sqrt{\sum_{i=1}^{D}\left(F_{b_i} - \bar{F}_b\right)^2}}, \tag{8}$$

where $\bar{F}_a$ indicates the mean value of the probability density function $F_a$.

The Chebyshev distance is the maximal elemental distance between two vectors:

$$D_{\mathrm{Chebyshev}}(a, b) = \max_{i}\left(\lvert F_{a_i} - F_{b_i} \rvert\right). \tag{9}$$

The cosine distance is defined as one minus the cosine similarity between two vectors:

$$D_{\mathrm{cosine}}(a, b) = 1 - \frac{\sum_{i=1}^{D} F_{a_i} F_{b_i}}{\sqrt{\sum_{i=1}^{D} F_{a_i}^2}\,\sqrt{\sum_{i=1}^{D} F_{b_i}^2}}. \tag{10}$$

The Kullback–Leibler divergence is a nonsymmetric measure of the difference between two probability distributions; we used its symmetrized form:

$$D_{\mathrm{KL}}(a, b) = \frac{\mathrm{KL}(F_a, F_b) + \mathrm{KL}(F_b, F_a)}{2}, \tag{11}$$

$$\mathrm{KL}(p, q) = \sum_{i=1}^{D} p_i \log\!\left(\frac{p_i}{q_i}\right), \tag{12}$$

where $F_a$ and $F_b$ are assigned to $p$ and $q$, or $q$ and $p$, respectively.

Based on the aforementioned five measures, we constructed an XP estimator using support vector regression (SVR) with a linear kernel within the source-subject-model pool, which selects an optimal combination of these measures and predicts the cross-subject classification performances of the source classifiers for the target (see Fig. 4). The hyperparameter C was empirically set to 10. Using the estimated XPs for the target, the source subjects were ranked, and the SVM classifiers of the seven/three most similar source subjects were used for the ensemble strategy.

Fig. 4. Flowchart of the proposed MDM-based source selection for the subject-transfer framework modified from the reference [24].

The labels of the target calibration dataset were estimated using the selected source SVM classifiers through cross-subject classifications and the ensemble strategy. The original STM can then be trained by assigning to each calibration sample, as a pseudo label, the class with the highest sum of class likelihoods from the source classifiers. Furthermore, the parameters are optimized by minimizing the five measures of the validation dataset when it is mapped into the space of the source data (i.e., maximizing the similarity).

3.4. Ensemble strategy

The classifiers from seven/three source subjects were independently used for the private/NinaPro DB. In other words, for one target, there were seven/three transfer-learning-trained classifiers, resulting in seven/three sets of class likelihoods for the test data. The sum of the class likelihoods over all candidates (source subjects) was taken, and the class with the largest likelihood was used as the estimated class for the test data. Here, weighted voting can be used if the pattern recognition accuracy can be obtained by ranking the source subjects against the
labeled calibration data in advance [11]. However, in the absence of labels, weighting by pattern recognition accuracy for the calibration data is not possible; therefore, we decided to use a simple approach of summing the class likelihoods over all candidates.

3.5. Performance evaluation

To evaluate the subject-transfer framework with unlabeled data and to use each subject as a target, we prepared four datasets: (i) the training (source) dataset $X_{\mathrm{src}}$, (ii) the calibration dataset $X_{\mathrm{cal}}$, (iii) the validation dataset $X_{\mathrm{val}}$, and (iv) the testing dataset $X_{\mathrm{tes}}$. The first matrix, $X_{\mathrm{src}}$, contains the labeled data of all trials of a source subject and is used to train a classifier. The second, third, and fourth matrices, $X_{\mathrm{cal}}$, $X_{\mathrm{val}}$, and $X_{\mathrm{tes}}$, respectively, represent the data from the target user. The calibration dataset is used to train an affine transformation matrix in STM. The hyperparameters in the STM algorithm are optimized using the validation dataset. The recognition accuracy of each classifier is evaluated on the testing dataset. In the private DB, the first, second, and third to fifth trial data of the target were assigned to the calibration, validation, and testing datasets, respectively. In DB5-A, DB5-B, and DB5-C, the first to second, third to fourth, and fifth to sixth trial data of the target were assigned to the calibration, validation, and testing datasets, respectively. The dataset combinations were redefined 25 or 10 times. The training dataset for each source included five or six trials of data for each motion. Calibration datasets are not used in ordinary machine learning algorithms (i.e., LDA and SVM) because the distributions between the source and target datasets cannot be bridged in these algorithms.

4. Results and Discussions

4.1. Pattern recognition accuracies

The averaged pattern recognition accuracies over 25/10 subjects and the best parameters using data and classifiers from seven/three source subjects are presented in Tables 1–4. In addition, Table 5 summarizes the characteristics of each classifier. Fig. 5 shows a boxplot of the overall recognition accuracies. Neither CSA nor STM is applicable when using randomly selected source subjects with uniform distribution because the hyperparameters of these methods cannot be optimized. By contrast, the MDM-based approach can optimize the hyperparameters by minimizing the distance between the projected target data and the source data, and then, STM-SVM can be applied by using the pseudo labels estimated through source classifiers.

Table 1
Averaged pattern recognition accuracies over 25 subjects and the best parameters for the private DB using data and classifiers from seven subjects. The three types of sources (Random sources, MDMs-based sources, and Accuracy-based sources) indicate seven or three source subjects selected randomly, by minimizing MDMs, and by maximizing accuracy on the calibration dataset, respectively. The method marked with an asterisk is the proposed method.

Source                  Classifier  Accuracy (%)  Best parameters
Random sources          LDA         54.7          –
                        SVM         63.1          C: 10, σ: 10^-3
MDMs-based sources      LDA         56.7          –
                        SVM         67.3          C: 10, σ: 10^-3
                        STM-SVM*    87.5          C: 10, σ: 10^-3, β: 0.8, γ: 1.2
Accuracy-based sources  LDA         73.7          –
                        SVM         78.8          C: 10, σ: 10^-3
                        CSA-LDA     90.5          τ: 0.7, λ: 0.3
                        STM-SVM     89.8          C: 10, σ: 10^-3, β: 0.4, γ: 0.2

Table 2
Averaged pattern recognition accuracies over 10 subjects and the best parameters for DB5-A using data and classifiers from three subjects.

Source                  Classifier  Accuracy (%)  Best parameters
Random sources          LDA         22.1          –
                        SVM         21.0          C: 10, σ: 10^-3
MDMs-based sources      LDA         26.6          –
                        SVM         24.2          C: 10, σ: 10^-3
                        STM-SVM     39.2          C: 10, σ: 10^-3, β: 2.8, γ: 2.4
Accuracy-based sources  LDA         30.0          –
                        SVM         29.0          C: 10, σ: 10^-3
                        CSA-LDA     51.5          τ: 0.7, λ: 0.2
                        STM-SVM     52.0          C: 10, σ: 10^-3, β: 0, γ: 0

Table 3
Averaged pattern recognition accuracies over 10 subjects and the best parameters for DB5-B using data and classifiers from three subjects.

Source                  Classifier  Accuracy (%)  Best parameters
Random sources          LDA         20.0          –
                        SVM         22.8          C: 10, σ: 10^-3
MDMs-based sources      LDA         22.7          –
                        SVM         26.5          C: 10, σ: 10^-3
                        STM-SVM     33.9          C: 10, σ: 10^-3, β: 2.8, γ: 2.4
Accuracy-based sources  LDA         25.2          –
                        SVM         29.4          C: 10, σ: 10^-3
                        CSA-LDA     45.2          τ: 0.7, λ: 0.4
                        STM-SVM     43.5          C: 10, σ: 10^-3, β: 0, γ: 0

Table 4
Averaged pattern recognition accuracies over 10 subjects and the best parameters for DB5-C using data and classifiers from three subjects.

Source                  Classifier  Accuracy (%)  Best parameters
Random sources          LDA         13.6          –
                        SVM         12.0          C: 10, σ: 10^-3
MDMs-based sources      LDA         14.7          –
                        SVM         15.1          C: 10, σ: 10^-3
                        STM-SVM     20.3          C: 10, σ: 10^-3, β: 3, γ: 0.2
Accuracy-based sources  LDA         17.8          –
                        SVM         17.3          C: 10, σ: 10^-3
                        CSA-LDA     32.6          τ: 0.6, λ: 0.2
                        STM-SVM     28.1          C: 10, σ: 10^-3, β: 0, γ: 0

Table 5
Characteristics of each classifier.

Source                  Classifier  Uses X_cal?  Needs labels for X_cal?
Random sources          LDA         No           No
                        SVM         No           No
MDMs-based sources      LDA         Yes          No
                        SVM         Yes          No
                        STM-SVM     Yes          No
Accuracy-based sources  LDA         Yes          Yes
                        SVM         Yes          Yes
                        CSA-LDA     Yes          Yes
                        STM-SVM     Yes          Yes

In myoelectric signal processing, a general model must be learned that is effective for all subjects; however, sEMG data have very large individual differences owing to biometrics such as weight and genetic muscular characteristics and the optimal method of motion expression that has been learned to date [27]. Therefore, the results from cross-

previous study, enhanced TD was the same as TDAR features; thus, the subject-specific classification results of enhanced TD + LDA and enhanced TD + SVM were comparable to the results of LDA and SVM with accuracy-based sources. The comparison indicates that the results of cross-subject classification (see Tables 2–4) are much lower than those of the subject-specific classification for all exercises (see Fig. 11 and Table 4 in the reference [29]). It is consistent that the exercises become more difficult to identify from A to C, with correspondingly lower identification accuracy. However, we can see that without the target data, the performance is reduced to approximately half (in the case of using SVM in our and previous studies, DB5-A: 29.0% vs. 58.32%, DB5-B: 29.4% vs. 51.17%, DB5-C: 17.3% vs. 47.18%). Ordinary classifiers (i.e., LDA and SVM) do not use any target data for the training phase. Therefore, they are likely to present lower accuracy than the pattern recognition performances with individual (subject-specific) models, even if cross-subject classifiers with high performance on calibration data are selected during the ensemble process. These results indicate that if transfer learning is not used, it is better to use the target’s own data than to select individual classifiers of similar subjects and process them in an ensemble.

When there are labeled calibration data, CSA or STM can reflect the target information in the model or map new data to the source domain, which significantly improves the performance of cross-subject classification on all DBs. For NinaPro DB5, the results were much closer to the subject-specific classification results (in LDAs, DB5-A: 51.5% vs. 64.36%, DB5-B: 45.2% vs. 57.80%, and DB5-C: 32.6% vs. 48.26%; in SVMs, DB5-A: 52.0% vs. 58.32%, DB5-B: 43.5% vs. 51.17%, and DB5-C: 28.1% vs. 47.18%). The fact that the performance of the subject-specific model can be approached using one or two trials of target labeled data is a very valuable result in demonstrating the usefulness of transfer learning. By contrast, the more complex the exercise, the more difficult it is to reduce the inter-subject variability between the target and source, and the lower the improvement in performance is.

In all DBs, the proposed method performed worse than STM-SVM and CSA-LDA with accuracy-based sources, which require labels. By contrast, the proposed method performed better than any of the cases of ordinary LDA and SVM classifiers (see Fig. 5). In addition, compared to LDAs/SVMs with random sources, the pattern recognition performance of LDAs/SVMs with MDM-based sources was improved (by 1.1–4.5 percentage points). Therefore, MDMs can identify similar source subjects from unlabeled target data. Interestingly, the performance of the MDM-based STM-SVM was higher than that of the ordinary classifiers with accuracy-based source selection in all DBs. This means that simply selecting source subjects that are similar to the unlabeled data does not considerably reduce the inter-subject variability. Using source classifiers to assign pseudo labels to the target data and projecting the target data into the individual feature space containing the classification boundaries learned from the source data effectively reduces inter-subject variability and facilitates cross-subject classification.

Three issues were investigated in this study: (1) the feasibility of selecting similar source subjects to the target; (2) the feasibility of
subject classification can be significantly lower than the results from estimating the calibration data with the correct labels; and (3) the
subject-specific classification, which has often been reported [28]. In feasibility of correctly bridging the pseudo labeled calibration data with
our private database (see Table 1), the cross-subject classification results the source data. Considering the upper limit of the results obtained when
of LDA, SVM, CSA-LDA, and STM-SVM with the selected 10 subjects true labels are attached to the calibration data, estimating the trans­
based on accuracies using labeled calibration data (73.7, 78.8, 90.5, and ferability with MDMs and selecting seven/three source subjects results
89.8%) were similar to those obtained in our previous study (76.6, 80.2, in a higher accuracy than that achieved when the source is selected at
89.3, and 90.9%), which used all source subject data and classifiers for random. The accuracy of the cross-subject classifier was improved to
the weighted ensemble strategy [10]. To date, no other study has some extent even without transfer learning; however, we confirmed that
described the results of cross-subject classification when using NinaPro the accuracy of the cross-subject classifier was further improved when
DB5, and it was found that the performance was significantly lower than the proposed MDM-based subject-transfer framework was applied.
that of a previous study that performed subject-specific classification These results confirmed our ability to select sources with close data
separately for each exercise [29]. Comparing our results with the study distributions in terms of classifier performance. However, it is consid­
is relatively straightforward because the studies used similar analysis erably less accurate than the labeled state and must be further improved.
settings (however, they did not apply the ensemble strategy). In the


Fig. 5. Boxplots of the overall recognition accuracies of (a) private DB, (b) DB5-A, (c) DB5-B, and (d) DB5-C. The method in bold is the proposed method.
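The source-selection idea discussed in Section 4.1 (ranking source subjects by how close their data lie to the unlabeled calibration data) can be illustrated with a minimal sketch. It is not the MDM computation used in this study: a single Euclidean distance between feature means stands in for the multiple distance measures, and the names `mean_distance`, `select_sources`, and the synthetic data are hypothetical.

```python
import numpy as np

def mean_distance(x_cal, x_src):
    """Euclidean distance between feature means (stand-in for one distance measure)."""
    return float(np.linalg.norm(x_cal.mean(axis=0) - x_src.mean(axis=0)))

def select_sources(x_cal, sources, k):
    """Return the ids of the k source subjects closest to the unlabeled x_cal."""
    dists = {sid: mean_distance(x_cal, x) for sid, x in sources.items()}
    return sorted(dists, key=dists.get)[:k]

# Synthetic example: five source subjects whose feature means differ by subject.
rng = np.random.default_rng(0)
sources = {f"s{i}": rng.normal(loc=i, scale=1.0, size=(200, 8)) for i in range(5)}
x_cal = rng.normal(loc=1.2, scale=1.0, size=(100, 8))  # unlabeled target data

selected = select_sources(x_cal, sources, k=3)
print(selected)
```

No labels of the calibration data are needed at any point, which is the property that distinguishes this kind of selection from accuracy-based selection.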

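The pseudo-labeling and projection step discussed in Section 4.1 can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study: a nearest-prototype rule stands in for the source classifiers, and the affine map is fitted with the commonly used STM-style objective, assuming that β and γ regularize the matrix A toward the identity and the offset b toward zero. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def pseudo_label(x, prototypes):
    """Assign each row of x the index of its nearest source class prototype."""
    d = np.linalg.norm(x[:, None, :] - prototypes[None, :, :], axis=2)
    return d.argmin(axis=1)

def fit_affine(x, targets, beta=0.0, gamma=0.0):
    """Minimize sum ||A x_i + b - t_i||^2 + beta ||A - I||_F^2 + gamma ||b||^2."""
    n, dim = x.shape
    eye = np.eye(dim)
    s_hat, t_hat, f_hat = x.sum(axis=0), targets.sum(axis=0), n + gamma
    P = x.T @ x + beta * eye - np.outer(s_hat, s_hat) / f_hat
    Q = targets.T @ x + beta * eye - np.outer(t_hat, s_hat) / f_hat
    A = Q @ np.linalg.pinv(P)          # closed-form solution for A
    b = (t_hat - A @ s_hat) / f_hat    # closed-form solution for b given A
    return A, b

rng = np.random.default_rng(0)
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])  # source class means
labels = rng.integers(0, 3, size=150)
x_src_like = prototypes[labels] + rng.normal(scale=0.5, size=(150, 2))
x_cal = 1.2 * x_src_like + np.array([0.5, -0.3])             # simulated subject shift

y_pseudo = pseudo_label(x_cal, prototypes)                   # no true labels used
A, b = fit_affine(x_cal, prototypes[y_pseudo], beta=0.1, gamma=0.1)
before = np.mean(np.sum((x_cal - prototypes[y_pseudo]) ** 2, axis=1))
after = np.mean(np.sum((x_cal @ A.T + b - prototypes[y_pseudo]) ** 2, axis=1))
print(before, after)
```

Because the identity map is always a feasible choice, the fitted map cannot increase the mean squared distance to the pseudo-labeled prototypes. Whether it helps classification still depends on how many pseudo labels are correct.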
4.2. Relation to prior work

We considered a case of calibration data without labels. Conventional methods [8,9] implicitly assume that the calibration data have labels; however, this assumption is difficult to satisfy in practice because it requires a labeled measurement session lasting a few minutes each time before the interface is used. The proposed method can learn the parameters of an affine transform using sEMG signals recorded during a sequence of motions performed by the user in random order and with random timing. Considering an interface that readily provides some degree of accuracy to any user from the outset, it is worthwhile to assume that the calibration data are unlabeled.
In this study, supervised transfer learning was performed with pseudo labels; however, our aim is to achieve unsupervised transfer learning. Although they were not covered in this study owing to the small amount of data that was used, some recent studies have used deep learning models, such as adversarial domain adaptation, to perform unsupervised transfer learning [30,31]. When combined with methods such as data augmentation [32], it may be possible to extend the current shallow approach to a deep approach, even in cases where large amounts of data are difficult to acquire, such as biomedical signal-based interfaces.

4.3. Limitations

In this study, we assumed that a large amount of unlabeled data will be measured by wearable sensing, and we used only datasets measured by a wearable sensor (i.e., the Myo Gesture Control Armband). Therefore, our method was tested only on signals with a relatively low sampling frequency. How the model behaves when higher-frequency components are included remains to be examined.
The unlabeled data were evaluated after balancing the amount of data for each class. In the case of protocol-free calibration, the unlabeled data are always unbalanced across classes. STM computes a projection matrix from the labels or pseudo labels of the calibration data to the destination (representative) points of the nearest source data with the same label. Thus, if the source dataset has eight classes and the calibration data have only six classes, the STM projects the new target data to one of the representative points of the six classes. In other words, the remaining two classes will most likely not be classified correctly, even if they appear in the testing phase. Although the proposed method is protocol-free, the user needs to perform all target motion classes during the calibration phase. A method is needed that finds a projection matrix that effectively transforms the data even when the calibration data contain fewer classes than the source dataset and are unbalanced; this is a challenging task on the way to simpler calibration.

4.4. Future work

In the future, to compare the performance of our method with deep learning-based transfer learning methods, we aim to conduct a comprehensive validation by incorporating datasets with larger amounts of data, as well as data augmentation methods such as deep convolutional generative adversarial networks (DCGANs) [32], into the subject-transfer framework.
In addition, we would like to verify the usefulness of this method for data obtained from a calibration protocol using random permutation. Specifically, we have been testing the performance of online human interfaces using labeled/unlabeled data obtained by a calibration protocol under controlled conditions with a fixed number and order of motion tasks and unlabeled data obtained by a calibration protocol with a free number and order of the tasks.

5. Conclusions

We proposed an MDM-based subject-transfer framework for sEMG pattern recognition. The performance of this framework was confirmed using two sEMG databases (i.e., a private DB and a public DB called


NinaPro DB5). The results showed that the recognition accuracy of the proposed framework was higher than that of models based on data from randomly selected subjects. However, compared to the case in which labels are included in the calibration data, there is still a need for improvement. In future studies, the effective transfer of information from an unbalanced amount of calibration data should be examined, the confidence of the pseudo labels should be quantified, and this framework should be expanded to a deep unsupervised transfer learning process. This framework can also be made more useful for the daily use of sEMG-based human interfaces, such as supplementary robotic/prosthetic hands, if information can be extracted from a sequence recorded while the user performs several motions at random, using a cross-subject classifier to assign pseudo labels to that sequence.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This work was partially supported by a project of the New Energy and Industrial Technology Development Organization (NEDO), Project No. JPNP20006, and JSPS KAKENHI Grant No. JP20K19854.

References

[1] C.I. Christodoulou, C.S. Pattichis, Unsupervised pattern recognition for the classification of EMG signals, IEEE Trans. Biomed. Eng. 46 (1999) 169–178.
[2] M.B.I. Reaz, M.S. Hussain, F. Mohd-Yasin, Techniques of EMG signal analysis: Detection, processing, classification and applications, Biol. Proced. Online 8 (2006) 11–35.
[3] A.J. Young, L.H. Smith, E.J. Rouse, L.J. Hargrove, Classification of simultaneous movements using surface EMG pattern recognition, IEEE Trans. Biomed. Eng. 60 (2012) 1250–1258.
[4] S. Pizzolato, L. Tagliapietra, M. Cognolato, M. Reggiani, H. Müller, M. Atzori, Comparison of six electromyography acquisition setups on hand movement classification tasks, PLoS ONE 12 (2017).
[5] P. Kaczmarek, T. Mańkowski, J. Tomczyński, putEMG-a surface electromyography hand gesture recognition dataset, Sensors 19 (2019) 3548.
[6] E.J. Rechy-Ramirez, H. Hu, Bio-signal based control in assistive robots: A survey, Digit. Commun. Netw. 1 (2015) 85–101.
[7] E. Campbell, J. Chang, A. Phinyomark, E. Scheme, A comparison of amputee and able-bodied inter-subject variability in myoelectric control, arXiv preprint arXiv:2003.03481 (2020).
[8] M.M.-C. Vidovic, H.-J. Hwang, S. Amsüss, J.M. Hahne, D. Farina, K.-R. Müller, Improving the robustness of myoelectric pattern recognition for upper limb prostheses by covariate shift adaptation, IEEE Trans. Neural Syst. Rehabil. Eng. 24 (2015) 961–970.
[9] S. Kanoga, T. Hoshino, H. Asoh, Subject transfer framework based on source selection and semi-supervised style transfer mapping for sEMG pattern recognition, in: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 1349–1353.
[10] S. Kanoga, T. Hoshino, H. Asoh, Semi-supervised style transfer mapping-based framework for sEMG-based pattern recognition with 1- or 2-DoF forearm motions, Biomed. Signal Process. Control 68 (2021), 102817.
[11] J. Li, S. Qiu, Y.-Y. Shen, C.-L. Liu, H. He, Multisource transfer learning for cross-subject EEG emotion recognition, IEEE Trans. Cybern. (2019).
[12] C.-S. Wei, Y.-P. Lin, Y.-T. Wang, C.-T. Lin, T.-P. Jung, Transfer learning with large-scale data in brain-computer interfaces, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, pp. 4666–4669.
[13] S. Kanoga, A. Kanemura, H. Asoh, Are armband sEMG devices dense enough for long-term use?-Sensor placement shifts cause significant reduction in recognition accuracy, Biomed. Signal Process. Control 60 (2020).
[14] D.E. Lake, J.S. Richman, M.P. Griffin, J.R. Moorman, Sample entropy analysis of neonatal heart rate variability, Am. J. Physiol. Regul. Integr. Comp. Physiol. 283 (2002) R789–R797.
[15] X. Zhang, P. Zhou, Sample entropy analysis of surface EMG for improved muscle activity onset detection against spurious background spikes, J. Electromyogr. Kinesiol. 22 (2012) 901–907.
[16] I.J.R. Martinez, A. Mannini, F. Clemente, A.M. Sabatini, C. Cipriani, Grasp force estimation from the transient EMG using high-density surface recordings, J. Neural Eng. 17 (2020), 016052.
[17] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A.-G.M. Hager, S. Elsig, G. Giatsidis, F. Bassetto, H. Müller, Electromyography data for non-invasive naturally-controlled robotic hand prostheses, Scientific Data 1 (2014) 1–13.
[18] K. Englehart, B. Hudgins, A robust, real-time control scheme for multifunction myoelectric control, IEEE Trans. Biomed. Eng. 50 (2003) 848–854.
[19] Y. Huang, K.B. Englehart, B. Hudgins, A.D.C. Chan, A Gaussian mixture model based classification scheme for myoelectric control of powered upper limb prostheses, IEEE Trans. Biomed. Eng. 52 (2005) 1801–1811.
[20] N. Jiang, S. Muceli, B. Graimann, D. Farina, Effect of arm position on the prediction of kinematics from EMG in amputees, Med. Biol. Eng. Comput. 51 (2013) 143–151.
[21] K.-T. Kim, C. Guan, S.-W. Lee, A subject-transfer framework based on single-trial EMG analysis using convolutional neural networks, IEEE Trans. Neural Syst. Rehabil. Eng. 28 (2019) 94–103.
[22] J.C. Bezdek, L.I. Kuncheva, Nearest prototype classifier designs: An experimental study, Int. J. Intell. Syst. 16 (2001) 1445–1473.
[23] M. Rohrbach, S. Ebert, B. Schiele, Transfer learning in a transductive setting, in: Advances in Neural Information Processing Systems, pp. 46–54.
[24] C.-S. Wei, Y.-P. Lin, Y.-T. Wang, C.-T. Lin, T.-P. Jung, A subject-transfer framework for obviating inter- and intra-subject variability in EEG-based drowsiness detection, NeuroImage 174 (2018) 407–419.
[25] A. Gupta, S. Parameswaran, C.-H. Lee, Classification of electroencephalography (EEG) signals for different mental activities using Kullback Leibler (KL) divergence, in: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2009, pp. 1697–1700.
[26] R.K. Chaurasiya, N.D. Londhe, S. Ghosh, A novel weighted edit distance-based spelling correction approach for improving the reliability of devanagari script-based P300 speller system, IEEE Access 4 (2016) 8184–8198.
[27] H. Han, S. Jo, Supervised hierarchical Bayesian model-based electomyographic control and analysis, IEEE J. Biomed. Health Inform. 18 (2013) 1214–1224.
[28] X. Jiang, B. Bardizbanian, C. Dai, W. Chen, E.A. Clancy, Data management for transfer learning approaches to elbow EMG-torque modeling, IEEE Trans. Biomed. Eng. (2021).
[29] L. Chen, J. Fu, Y. Wu, H. Li, B. Zheng, Hand gesture recognition using compact CNN via surface electromyography signals, Sensors 20 (2020) 672.
[30] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, V. Lempitsky, Domain-adversarial training of neural networks, The Journal of Machine Learning Research 17 (2016) 2030–2096.
[31] U. Côté-Allard, G. Gagnon-Turcotte, A. Phinyomark, K. Glette, E. Scheme, F. Laviolette, B. Gosselin, Unsupervised domain adversarial self-calibration for electromyographic-based gesture recognition, arXiv preprint arXiv:1912.11037 (2019).
[32] R. Anicet Z., E. Luna C., Parkinson’s disease EMG data augmentation and simulation with DCGANs and style transfer, Sensors 20 (2020) 2605.
