
Human Activity Recognition With Accelerometer and Gyroscope: A Data Fusion Approach

Mitchell Webber, Raul Fernandez Rojas

Abstract— This paper compares the three levels of data fusion with the goal of determining the optimal level of data fusion for multi-sensor human activity data. Using the data processing pipeline, gyroscope and accelerometer data were fused at the sensor level, feature level, and decision level. At each level of data fusion, four different techniques were used with varying levels of success. This analysis was performed on four publicly available human activity datasets along with four well-known machine learning classifiers to validate the results. Decision-level fusion (Acc = 0.7443 ± 0.0850) outperformed the other two levels of fusion in terms of accuracy, sensor level (Acc = 0.5934 ± 0.1110) and feature level (Acc = 0.6742 ± 0.0053), but the processing time and computational power required for training and classification were far greater than practical for a HAR system. However, the Kalman filter appears to be the most efficient method, since it exhibited both good accuracy (Acc = 0.7536 ± 0.1566) and short processing time (time = 61.71 ms ± 63.85), properties that play a large role in real-time applications using wearable devices. The results of this study also serve as baseline information in the HAR literature against which future methods of data fusion can be compared.

Index Terms— Data fusion, HAR, human activity recognition, sensor-level fusion, feature-level fusion, decision-level fusion, voting, bagging, Kalman filter, complementary filter, factor analysis, gyroscope, accelerometer, principal component analysis (PCA), singular value decomposition (SVD), multidimensional scaling (MDS)

Paper submitted: January 2021.
Mitchell Webber and Raul Fernandez Rojas are with the Human-Centred Technology Research Centre, Faculty of Science and Technology, University of Canberra, Canberra 2617, Australia (e-mail: raul.fernandezrojas@canberra.edu.au).

I. INTRODUCTION

HUMAN activity recognition (HAR) has become a very active research topic in the field of pervasive computing and ubiquitous sensing, with many applications in real-world scenarios such as healthcare, smart environments, security and surveillance, sports performance, or human-computer interaction. HAR can be defined as the identification of actions of one or more individuals using a series of observations on their individual actions and environmental conditions [33]. In the research literature, there are two main methods for HAR analysis: camera-based and wearable-based methods. Camera-based applications are generally more expensive, since the installation of cameras and other infrastructure is required. On the other hand, wearable-based methods are cheaper and more efficient, using wearable sensors (e.g., wristband, waistband, smart glasses, etc.) or smartphones to capture human movement data [6]. Currently, the integration of these wearable devices in our daily life has become more accessible due to their low cost, ease of use, small size, low power consumption, and multi-tasking capabilities.

Wearable sensors and smartphones are ubiquitous devices that include a variety of built-in sensors such as accelerometers, gyroscopes, magnetometers, GPS, etc. The inclusion of these sensors assists these devices in the continuous monitoring and tracking of human activity in real time. One type of sensor that has been widely used for HAR is the accelerometer; this device can measure static (e.g., gravity) and dynamic (vibration or movement) forces of acceleration acting on the sensor, providing useful data in the detection of movement patterns [44]. Another popular sensor for HAR is the gyroscope; this sensor measures the angular velocity, i.e., the rate of change of the sensor's orientation [35], providing information for the detection of patterns in activities that involve rotation around a particular axis. Often, these two sensors are combined, due to their complementary strengths, to improve activity monitoring. The combination of accelerometer and gyroscope sensors in a single device is referred to as an inertial measurement unit (IMU) or inertial sensors.

The combination of accelerometer and gyroscope generally achieves a more reliable and accurate measure for HAR, often with the use of data fusion techniques. Data fusion is the combination of multiple sources of data to obtain improved information that could not be achieved using a single source alone [11], [14]. In addition, data fusion is a technique that allows the incorporation of data from multiple sources (of the same or different type) to infer specific information. For instance, gyroscopes and accelerometers measure multiple perspectives
(linear acceleration and angular velocity, respectively) of the same event (movement), which alone might not provide enough information to obtain accurate orientation, position, or velocity of an object or person. Therefore, the combination of accelerometer and gyroscope data improves human activity monitoring due to the fusion of different information to measure human movement.

In terms of the data processing pipeline, data fusion techniques can be classified into three main levels: sensor-level fusion, feature-level fusion, and decision-level fusion. Sensor-level fusion combines the raw signals from multiple sensors; data fusion at this level is carried out immediately after data is collected from the sensors and before feature extraction [57]. Feature-level fusion refers to the process of combining multiple features (e.g., in the time, frequency, or wavelet domain) obtained from the raw sensors before classification/regression is applied [3]. Decision-level fusion is where multiple weak classifier results are combined to output a more accurate classification [20]. Although many studies have addressed the fusion of accelerometer and gyroscope sensors to obtain improved accuracy [3], [13], [34], [53], [54], there is no clear evidence of which level of fusion is more appropriate for HAR studies.

Therefore, the objective of this paper is to identify the best level of data fusion using accelerometer and gyroscope sensors, which can lead to better classification performance in HAR systems. With that in mind, a research study on four public HAR datasets was carried out. The experimental results indicate that the classification accuracy can be further improved using data fusion and that decision-level fusion achieved the highest accuracy results across all four datasets. The main contributions of this study can be summarised as follows: 1) provide an analytical comparison between common fusion techniques at each level, 2) compare the performance of different classifiers for HAR analysis, 3) identify the best level of data fusion for the improvement of HAR systems, and 4) present baseline information in the HAR literature that other studies can use to compare future fusion methods.

The rest of the paper is organized as follows. In Section II, we review the related works on accelerometer and gyroscope sensor fusion methods for HAR. Section III details the methodology used for data fusion, describes the public HAR datasets, and provides the details of the experimental design. In Section IV, we provide the empirical comparison showing different metrics for that purpose. In Section V, we discuss the results. Section VI concludes the paper and summarises the findings of this study.

II. RELATED WORK

In this section, we introduce data fusion techniques used in HAR systems. Many fusion techniques have been presented in the literature; however, our aim here is not to present a comprehensive review. Instead, we present relevant studies that fuse accelerometer and gyroscope sensor data to improve HAR analysis, organised by the data processing pipeline (refer to Figure 1). For in-depth reviews covering more aspects of sensor data fusion, the interested reader is referred to [11], [32], [43].

A large number of studies have explored the use of accelerometer and gyroscope data to model and recognise human activities. A triaxial accelerometer offers the linear acceleration of three orthogonal axes: forward acceleration in the y-axis, horizontal acceleration in the x-axis, and vertical acceleration in the z-axis [61]. However, the use of accelerometers as a single modality for HAR has been found to be ineffective in discriminating activities with similar patterns (e.g., ascending or descending stairs) [43]. On the other hand, a triaxial gyroscope measures the angular acceleration from three different angles (pitch, roll, and yaw), which helps estimate the orientation and rotation of the movement being captured [61]. The combination of accelerometer and gyroscope sensors provides mechanisms to distinguish numerous human activities with similar motion data and therefore obtain a higher recognition rate than using the individual sensors themselves.

In the data processing pipeline, sensor-level fusion is the most basic data fusion method. Algorithms employed at this level are generally based on signal processing. The Kalman filter is the most popular fusion technique to determine the state of a target under movement [11]. This technique has been used to fuse accelerometer and gyroscope sensor data in different applications; for instance, to design an inertial sensor-based gait analysis [58], to analyse postural information in patients with Parkinson's disease [1], or to design an inertial sensor-based ambulatory movement analysis [49]. Another popular fusion technique is the Complementary filter. This type of filtering technique is based on a high-pass filter and a low-pass filter to remove accelerometer spikes and gyroscopic drift [28]. This filter has been applied to fuse accelerometer and gyroscope data in different real-time applications due to its low complexity and lower computational requirements [19], [25], [64]. Two less commonly used fusion techniques are the Signal Magnitude Vector (SMV) and the Absolute Vertical Acceleration (AVA) of the signal. The advantage of using SMV or AVA is that they remove the noise from the acceleration and eliminate the drift in angular velocity. These two fusion techniques have been used for fall detection [26], [57], movement monitoring and accident detection [17], and identification of activities of daily living [2].

Fusion at the feature level involves the extraction of different signal properties from each sensor to be combined before using machine learning algorithms for the identification of human activities. There are obvious reasons to perform feature-level fusion, such as removing highly correlated (redundant) features, identifying irrelevant (low variance) features, and reducing computation time and complexity, especially for mobile and wearable systems [43]. In the machine learning literature, there are two main methods that can be used to perform feature-level fusion: dimensionality reduction and feature selection. Feature selection can be considered a special case of dimensionality reduction; the main difference between the two is that feature selection methods find a subset of the original set of features, while dimensionality reduction methods fuse the original features to make new synthetic features. In this study, we focused on the latter. Common methods for dimensionality reduction of time series data are Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Multidimensional Scaling (MDS), and Factor Analysis (FA) [43].
Fig. 1. Data pipeline process used in this study. Firstly, the human activity is captured using both accelerometer and gyroscope sensors; data is recorded as time series. Secondly, the collected data is fused at three different levels separately: sensor-level fusion (in blue), feature-level fusion (in orange), and decision-level fusion (in yellow). Finally, the human activity is recognised in the last stage.

These methods have been employed in previous studies to fuse accelerometer and gyroscope data in diverse applications such as motion compensation in fMRI studies [29], sport training [8], motion analysis [23], and HAR systems based on smartphones [53].

Fusion at the decision level generally combines several intermediate classification results into a single decision. The aim of this type of fusion is to arrive at a consensus to improve overall accuracy, robustness, and generalisation; results that might be unlikely when classifiers are used in isolation [43]. In addition, fusing the output of individual classifiers helps reduce their uncertainty and ambiguity in problems with high-dimensional data or insufficient training data. Fusion rules often used in the activity recognition literature are majority voting, boosting, bagging, or stacking [43]. In ensemble bagging, each classifier is trained with different subsets randomly selected from the training set without replacement; bagging has been applied in different HAR studies [39], [40]. Boosting also uses data partitioning to construct multiple weaker classifiers to then construct a strong model; this method has been used in physical health monitoring [4], [5] and HAR [38]. Voting is another method to fuse decisions of different classifiers. In this case, the predictions made by the single classifiers are used as votes, and the final prediction is obtained following a majority vote rule; applications of this method in HAR are [18], [63]. Stacking is a meta-learning approach that involves two steps. First, different classifiers are built on the same training data for model diversification; then, the final decision is obtained by training a meta-classifier based on the outputs of the individual classifiers in the previous step. HAR studies that have used this method reported that the combination of different classifiers offers complementary information to improve accuracy and reduce uncertainty for complex activities [16], [36].

III. METHODS

All three levels of data fusion were analysed using four techniques at each level. The sensor-level fusion techniques were chosen to see the impact of one sensor on the other. The gyroscope and accelerometer both have their downsides: the gyroscope is extremely vulnerable to drifting over the long term, while the accelerometer is sensitive to sharp jerks but does not drift over time [45]. Sensor-level fusion allows us to merge the best parts of both sensors. Feature-level fusion serves the purpose of identifying correlation between features, allowing classifiers to draw more information from a smaller set of features [24]; a large benefit of feature-level fusion is that it provides a smaller, more separable feature vector. Decision-level fusion allows weaker performing classifiers to merge their results to produce a higher performing classifier.

Throughout this study, four machine learning classifiers from the scikit-learn Python library are used: a Support Vector Machine (SVM), K-Nearest Neighbours (KNN), Linear Discriminant Analysis (LDA), and a Decision Tree classifier (DT). These classifiers were chosen based on their simplicity to implement, their variety in mathematical models of achieving classifications, and their extended use in the HAR literature [9], [43], [47]. Having this variety in our classifiers allows us to eliminate any potential outliers in our results.

The pre-processing procedures, implementation of fusion techniques, and classification tasks were all carried out in Python. All processing and computation was performed on a desktop PC with 16 GB RAM, an Intel i7 CPU, and no use of a GPU. Figure 1 presents an overview of the methodology used in this study.
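For concreteness, the four classifiers can be instantiated directly from scikit-learn; this is a minimal sketch in which the hyperparameters are assumptions (the paper does not report them), using library defaults except for an RBF kernel to match the Gaussian SVM mentioned in Section IV.

```python
# Sketch of the four classifiers, assuming scikit-learn defaults
# where the paper does not specify hyperparameters.
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

def make_classifiers(random_state=42):
    """Return the four classifiers compared throughout this study."""
    return {
        "SVM": SVC(kernel="rbf"),                  # Gaussian (RBF) SVM
        "KNN": KNeighborsClassifier(),             # default k = 5
        "LDA": LinearDiscriminantAnalysis(),
        "DT": DecisionTreeClassifier(random_state=random_state),
    }
```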
A. Datasets

The data used in this study was collected from various IMU sensors. These sensors are 6-axis motion tracking devices that combine a triaxial gyroscope and a triaxial accelerometer. A gyroscope measures the angular motion (rotation rate) about one or several axes and offers the roll, pitch, and yaw motions along the x, y, z axes. The accelerometer measures the linear acceleration along the x, y, z axes [46].

TABLE I
INFORMATION ABOUT THE FOUR DATASETS USED IN THIS STUDY.

| Dataset | Sensor | Subjects | Classes | Sample Rate (Hz) |
|---|---|---|---|---|
| MobiAct [60] | Smartphone | 66 | 12 | 94 |
| UMAFall [10] | Smartphone | 19 | 13 | 20 |
| IM-WSHA [30] | MPU-9250 | 10 | 11 | 100 |
| UCI HAR [31] | Smartphone | 30 | 6 | 50 |

To validate the results found in this study, all data fusion methods were applied to four external human activity datasets. These datasets were chosen because they include both accelerometer and gyroscope data. All four external datasets were reduced down to four basic activities to keep the variance among the datasets similar. The MobiAct dataset was the largest, with 66 subjects participating in the study utilizing a smartphone as the IMU sensor [60]. The next dataset analysed was the UMAFall dataset, in which 19 people participated in the data collection, also using the built-in IMU of smartphones [10]. This dataset mainly focuses on fall detection; however, the collected data also contains human activity data, which was used for our analysis. The third smartphone-based dataset used in this study was the UCI Human Activity Recognition dataset [31]. This dataset consists of 30 subjects with waist-mounted smartphones, with data that had previously been pre-processed for noise reduction using a 0.3 Hz low-pass filter. The final dataset we used to validate our findings was the IM-Wearable Smart Home Activities (IM-WSHA) dataset [30]. The data in this study was collected using 3 MPU-9250 IMU sensors from five female and five male subjects. The procedure used in this research study was applied in the same way across all four datasets, excluding [31], as it had been through a cleaning procedure prior to our use in this study.

B. Pre-processing

Each subject's data was scaled linearly between zero and one. For some datasets, the process of cleaning the data did not improve the results; therefore, all cleaning procedures were removed from the pre-processing stage to reduce the overall processing time. The data is then partitioned into windows with 50% overlap [34]. Five window sizes were compared to identify the optimal window size for HAR analysis: 16, 32, 64, 128, and 256 data points. For smaller datasets with a lower number of samples, the accuracy for each window size tends to drop off after 64 points, as the larger windows ultimately end up with a lower number of overall windows, negatively affecting training results. Each dataset had varying results, but to leverage the potential of the smaller datasets, a window size of 64 was chosen. The following Hamming window function was applied to each window to reduce the loss in data between windows and smooth the signal from the sensors [13]:

w(n) = 0.54 − 0.46 cos( 2πn / (M − 1) ),  0 ≤ n ≤ M − 1    (1)

where M is the number of samples in each window and n is the sample index. For each axis of the windows, we extract seven statistical measures to use as a feature vector, as described in the following section.
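As a minimal sketch of this step, assuming one pre-scaled axis as input, the Hamming coefficients follow equation (1) and the per-subject min-max scaling described above is taken as already applied.

```python
import numpy as np

def window_signal(signal, size=64, overlap=0.5):
    """Partition a 1-D signal into windows with 50% overlap and apply
    the Hamming window of equation (1) to each window."""
    step = int(size * (1 - overlap))
    n = np.arange(size)
    hamming = 0.54 - 0.46 * np.cos(2 * np.pi * n / (size - 1))  # eq. (1)
    return np.array([signal[s:s + size] * hamming
                     for s in range(0, len(signal) - size + 1, step)])

# Example: 10 s of one accelerometer axis sampled at 50 Hz
x = np.random.randn(500)           # placeholder data
print(window_signal(x).shape)      # (14, 64): 14 windows of 64 samples
```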
C. Feature Extraction

The features used in this study were extracted from the time domain using statistical measures. These features have been used extensively in time series analysis using machine learning [50], [51]. The features extracted are listed in Table II. A feature vector of seven statistical measures was extracted from both the accelerometer and the gyroscope sensors. The features are extracted at different stages of the pipeline, depending on which level of fusion is being applied. For sensor-level fusion, features are extracted after the fusion process is applied to the raw data and the fused signal has been partitioned into windows. Feature-level and decision-level features are extracted from the raw windowed data; the feature-level fusion is then applied to the features, whereas the decision-level fusion is applied after the features have been fed through machine learning classifiers.

The features extracted in this study were selected as a simple baseline feature set for the comparison of different methods of data fusion. With the aim of this study being to compare the effects of data fusion techniques, a computationally expensive feature set was not necessary. This feature vector could be significantly improved for performance with the use of wavelet or frequency domain features, or other complex mathematical functions; however, for this study the feature vector was kept small and simple to decrease the processing time throughout many stages in the pipeline. The correlation between the levels of fusion is clear without the need for a complex feature set.

TABLE II
LIST OF FEATURES EXTRACTED WITH DEFINITIONS

| Feature | Symbol | Definition |
|---|---|---|
| Mean | µ | µ = (1/N) Σ_{n=1}^{N} x_n |
| Standard Deviation | σ | σ = sqrt( Σ |x − x̄|² / N ) |
| Median | M | middle value of the sorted window (mean of the two middle values for even N) |
| Min | Min | lowest value of x |
| Max | Max | highest value of x |
| Range | R | R = Max − Min |
| Inter-Quartile Range | IQR | IQR = Q3 − Q1 |
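A sketch of this extraction, assuming NumPy and the Table II definitions: stacking the seven statistics over all six axes yields the 42-dimensional vector used later as the baseline.

```python
import numpy as np

def window_features(window):
    """The seven Table II statistics for one windowed axis."""
    q1, q3 = np.percentile(window, [25, 75])
    return np.array([
        np.mean(window),                      # mean
        np.std(window),                       # standard deviation
        np.median(window),                    # median
        np.min(window),                       # min
        np.max(window),                       # max
        np.max(window) - np.min(window),      # range
        q3 - q1,                              # inter-quartile range
    ])

def feature_vector(window_6axis):
    """window_6axis has shape (64, 6); returns 7 x 6 = 42 features."""
    return np.concatenate([window_features(window_6axis[:, i])
                           for i in range(window_6axis.shape[1])])
```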
D. Fusion Algorithms

Table III presents a summary of each fusion method used in this study. The number of axes given is the dimension returned by the respective fusion method. The decision-level fusion was applied after classification, hence the axes and features are consistent with those of the baseline data. As the feature-level fusion was applied to the features with the purpose of dimensionality reduction, 7 features proved to be the optimal size of feature vector; in the case of feature-level fusion, the axes and features in this table are one and the same.

TABLE III
SUMMARY OF ALL FUSION METHODS WITH THE NUMBER OF AXES AND FEATURES GIVEN.

| Method | Axes | Features | Level |
|---|---|---|---|
| AVA | 1 | 7 | Sensor |
| Magnitude | 2 | 14 | Sensor |
| Complementary | 1 | 7 | Sensor |
| Kalman | 6 | 42 | Sensor |
| FA | 7 | 7 | Feature |
| MDS | 7 | 7 | Feature |
| PCA | 7 | 7 | Feature |
| SVD | 7 | 7 | Feature |
| Bagging | 6 | 42 | Decision |
| Boosting | 6 | 42 | Decision |
| Voting | 6 | 42 | Decision |
| Stacking | 6 | 42 | Decision |

1) Sensor-level Fusion: Multiple sensors often perform differently in certain situations, providing different levels of information about the situation being observed. The purpose of sensor fusion is to merge the signals from multiple sensors to reduce the less important and emphasise the more useful data collected from the numerous sensors. Each level of fusion was analysed using four methods. We analysed the effects of using each sensor's absolute vertical acceleration (AVA), the magnitude of the sensor signals [57], and both Kalman and complementary filters, as they have been gathering popularity in recent years [45]. The sensor fusion process is applied to the raw signal data before the windowing and feature extraction.

The AVA is calculated based on the acceleration from the accelerometer, combining it with the angle from the gyroscope to produce a fused output from the two sensors [57]:

AVA = |a_X sin θ_Z + a_Y sin θ_Y − a_Z cos θ_Y cos θ_Z|    (2)

where a stands for the accelerometer axis values and θ stands for the gyroscope axis values.

The magnitude of the signal is applied to one sensor at a time, fusing the three axes into one. The magnitude of the accelerometer sensor is calculated by:

Magnitude = ( |a_X|² + |a_Y|² + |a_Z|² )^(1/2)    (3)

where once again a denotes the accelerometer axis values and can be replaced with θ to represent the gyroscope's magnitude value.
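These two vector-based fusions can be sketched as follows; the gyroscope angles are assumed to be in radians, and equation (2) is implemented as reconstructed above.

```python
import numpy as np

def magnitude(x, y, z):
    """Equation (3): fuse the three axes of one sensor into a single
    channel (applied separately to the accelerometer and gyroscope)."""
    return np.sqrt(x**2 + y**2 + z**2)

def ava(ax, ay, az, theta_y, theta_z):
    """Equation (2): absolute vertical acceleration, combining the
    accelerometer axes with gyroscope-derived angles (radians)."""
    return np.abs(ax * np.sin(theta_z) + ay * np.sin(theta_y)
                  - az * np.cos(theta_y) * np.cos(theta_z))
```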
The complementary filter is designed to merge the optimal aspects of each sensor to counteract their downsides. The gyroscope is said to be accurate in the short term but tends to drift in the long term; the accelerometer is extremely sensitive to noise and sudden accelerations but does not drift [45]. The complementary filter in its basic form can be described as:

θ(t) = 0.98 (θ(t−1) + δ) + 0.02 arctan2(a_X, a_Y)    (4)

where θ(t) represents the angle calculated by the filter at a particular time step t and δ = 1/(sampling frequency). The complementary filter essentially applies a high-pass filter to the gyroscope signals and a low-pass filter to the accelerometer signals. In equation (4), the high-pass filtering of the gyroscope is given by the term 0.98(θ(t−1) + δ) and the low-pass filtering of the accelerometer is achieved by the term 0.02 arctan2(a_X, a_Y). For the first time step of each sample there is no θ(t−1), so it is replaced with 0 in equation (4).
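As a sketch, the filter can be implemented as below. Note that equation (4) writes the gyroscope contribution simply as δ; the usual reading, assumed here, is that the gyroscope rate measured at time t is integrated over one step of length δ = 1/fs.

```python
import numpy as np

def complementary_filter(gyro_rate, acc_x, acc_y, fs, alpha=0.98):
    """Equation (4): high-pass the integrated gyroscope rate and
    low-pass the accelerometer angle. gyro_rate is the angular rate
    (rad/s) and fs the sampling frequency (Hz)."""
    delta = 1.0 / fs
    theta = np.zeros(len(gyro_rate))
    prev = 0.0                      # theta(t-1) = 0 at the first step
    for t in range(len(gyro_rate)):
        acc_angle = np.arctan2(acc_x[t], acc_y[t])
        theta[t] = alpha * (prev + gyro_rate[t] * delta) \
                   + (1 - alpha) * acc_angle
        prev = theta[t]
    return theta
```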
The Kalman filter is slightly more complicated than the complementary filter. Adu and Bran-Melendez [45] described a Kalman filter as a predictor-corrector algorithm, where the filter first estimates the angle of the sensor based on prior information and then corrects itself, providing more information for the estimation at the next time step. The Kalman filter has two states, predict and update [11]. During the predict state, the Kalman filter makes an estimate of the position of the sensor based on the previous time step using a Gaussian distribution. In the update state, the filter then takes a new measurement and calculates the Kalman gain to update the estimate based on the Gaussian distributions. The Kalman gain is given by [45]:

K_k = P_k⁻ Hᵀ (H P_k⁻ Hᵀ + R)⁻¹    (5)

where K_k is the Kalman gain, P_k⁻ is the a priori (predicted) error covariance matrix, H is the relationship between the current state and the measurement, R is the measurement noise covariance, and k is the sample iteration (k = 1, ..., n).
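A scalar predict-update loop illustrates the mechanics of equation (5); with H = 1 the gain reduces to P⁻/(P⁻ + R). This is an illustrative one-dimensional sketch, not the full six-axis filter used in the study, and the noise covariances q and r are assumed values.

```python
import numpy as np

def kalman_1d(z, q=1e-3, r=1e-1):
    """Minimal 1-D Kalman filter over a measurement stream z."""
    x_est, p = 0.0, 1.0
    filtered = np.empty(len(z))
    for k, z_k in enumerate(z):
        # Predict: prior estimate and prior error covariance P_k^-
        x_prior, p_prior = x_est, p + q
        # Update: Kalman gain of equation (5) with H = 1
        gain = p_prior / (p_prior + r)
        x_est = x_prior + gain * (z_k - x_prior)
        p = (1.0 - gain) * p_prior
        filtered[k] = x_est
    return filtered
```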
2) Feature-level Fusion: The goal of feature-level data fusion is to reduce the dimensionality of the feature vector while still retaining as much information as possible. Feature-level fusion is applied to the extracted feature vector at the end of the pre-processing stage. The four methods chosen for the feature-level analysis were Principal Component Analysis (PCA), Multidimensional Scaling (MDS), a truncated Singular Value Decomposition (SVD), and Factor Analysis (FA) [55]. All feature-level methods used in this study were applied using the scikit-learn Python library, reducing the size of each feature vector to 7 components.

PCA is aimed at finding a subspace of a feature set from some sample x, with dimension M, for any given desired dimension m, while maintaining or increasing the level of separability [55]. Simply put, the goal of PCA is to find a new feature vector that provides similar or better separability between classes while reducing the number of features. PCA also has the benefit of potentially converting correlated features into an uncorrelated feature vector. PCA applies the dimensionality reduction by removing the components that are associated with the lowest eigenvalues extracted from the feature vector [55].

The process of MDS results in a representation of the dissimilarity or similarity of the points of the feature vectors. Just like the other feature-level techniques chosen, the goal of MDS is to reduce the feature vector to a smaller dimension without losing the vital information of the features [52]. MDS scales the feature vector into a lower-dimensional space, in which dissimilar values are represented as further apart and similar values as close together. The classical MDS algorithm utilises Euclidean distance to measure the dissimilarity between features [52].

SVD is a similar approach to PCA; however, SVD decomposes the input feature vector into three matrices (W, D and U), where W contains the eigenvectors of the input vector, D contains the square roots of the eigenvalues, and U contains the original feature vector [55]. The reduction of dimensionality is achieved by removing the components with the lowest eigenvalues found in W.

Factor Analysis is a similar approach to MDS; however, factor analysis requires features to be scored based on a given list of attributes, whereas with MDS the dissimilarities between features can be measured directly. This is the reason MDS is generally preferred over factor analysis [27]. Factor analysis consists of two steps: factor extraction and factor rotation. Factor extraction involves evaluating which model will be used and the number of components to extract. Once the factor extraction has taken place, the factor rotation step is applied, in which the extracted factors are rotated to produce a lower-dimension vector [59].
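All four reducers are available in scikit-learn, as the text notes; a sketch of their use on an assumed (n_windows, 42) feature matrix follows. The matrix X here is synthetic, for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD, FactorAnalysis
from sklearn.manifold import MDS

X = np.random.randn(200, 42)     # placeholder: 200 windows x 42 features

# Each reducer maps the 42 features down to the 7 fused components.
reducers = {
    "PCA": PCA(n_components=7),
    "SVD": TruncatedSVD(n_components=7),
    "FA":  FactorAnalysis(n_components=7),
    "MDS": MDS(n_components=7),  # fit_transform only; MDS cannot
                                 # project unseen windows
}
X_fused = reducers["PCA"].fit_transform(X)
print(X_fused.shape)             # (200, 7) fused feature vectors
```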
3) Decision-level Fusion: The decision level of data fusion involves using multiple classifiers to predict a class for the input data and then fusing the results to return a more accurate classification [12]. The four decision fusion methods were four ensemble classifiers from the scikit-learn Python library: Voting, Bagging, Boosting, and Stacking.

The simplest form of decision fusion is a Voting classifier. A Voting classifier applies a different classifier to each axis of the sensor signal and applies a majority-rules algorithm, where the class with the most votes is the predicted class [20].

The boosting classifier used in this analysis is known as AdaBoost (adaptive boosting) and is an ensemble technique that uses multiple weaker performing classifiers with the goal of generating a single powerful classifier [21]. The difference between standard gradient boosting algorithms and AdaBoost is that gradient boosting algorithms train the weak classifiers on the errors of the more accurate classifier, whereas the AdaBoost algorithm changes the weights of the weaker classifiers in order to help them learn more difficult occurrences in the data, and then merges these weaker classifiers with the stronger classifier. The boosting classifier utilizes 10 base classifiers to train the main model.

The stacking classifier also uses multiple weak classifiers to assist the capability of a stronger classifier. The stacking classifier trains numerous classifiers on the regular data and then uses the outputs of each classifier as input for the main, stronger classifier [15]. The stacking weak classifiers used were a Random Forest classifier, a Support Vector Machine (SVM), and a K-Nearest Neighbours (KNN) classifier, utilizing a Logistic Regression model as the stronger final classifier.

Similarly to the other decision fusion methods, bagging trains the base weak classifiers on random subsets of the training data and then fuses the results together, generally by using averaging or voting. A benefit of the bagging classifier is that it introduces a level of randomisation into the base classifiers prior to merging the results [22]. Similar to boosting, the bagging model had 10 base classifiers trained to obtain the final trained model.
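The four ensembles can be sketched with scikit-learn as below. Note that scikit-learn's VotingClassifier votes across different estimators on the same input rather than across per-axis classifiers as described above, so this is an approximation; base estimators and counts follow the text where stated and are otherwise assumptions.

```python
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

base = [("rf", RandomForestClassifier()),
        ("svm", SVC(probability=True)),
        ("knn", KNeighborsClassifier())]

ensembles = {
    "Bagging": BaggingClassifier(n_estimators=10),    # 10 base classifiers
    "Boosting": AdaBoostClassifier(n_estimators=10),  # AdaBoost, as above
    "Voting": VotingClassifier(estimators=base, voting="hard"),
    "Stacking": StackingClassifier(estimators=base,
                                   final_estimator=LogisticRegression()),
}
# Usage: ensembles["Stacking"].fit(X_train, y_train).score(X_test, y_test)
```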
E. Evaluation Metrics

The machine learning models were evaluated by randomly splitting the data into training (70%) and test (30%) sets. 10-fold cross-validation was used on all classifiers. Classification accuracy was used as the performance metric to compare the different algorithms and levels of fusion. This metric was implemented by defining a multi-class problem setting [56]. The accuracy is defined as follows:

Accuracy = (1/k) Σ_{i=1}^{k} (TP_i + TN_i) / (TP_i + FN_i + FP_i + TN_i)    (6)

where k, i, P and N indicate the number of classes (activities in each separate dataset), a single class, the number of positive samples, and the number of negative samples, respectively. A true positive (TP) refers to the number of correctly predicted samples belonging to the positive class, a true negative (TN) refers to the number of correctly predicted samples belonging to the negative class, a false positive (FP) refers to the number of incorrectly predicted samples belonging to the positive class (Type I error), and a false negative (FN) refers to the number of incorrectly predicted samples belonging to the negative class (Type II error).
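A sketch of this protocol and of equation (6) is given below. Note that equation (6) is the macro-average of the per-class one-vs-rest accuracies, which differs slightly from scikit-learn's plain accuracy score.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split

def macro_accuracy(y_true, y_pred):
    """Equation (6): mean of the per-class one-vs-rest accuracies."""
    accs = []
    for c in np.unique(y_true):
        t, p = (y_true == c), (y_pred == c)
        tp = np.sum(t & p); tn = np.sum(~t & ~p)
        fp = np.sum(~t & p); fn = np.sum(t & ~p)
        accs.append((tp + tn) / (tp + tn + fp + fn))
    return np.mean(accs)

def evaluate(clf, X, y):
    """70/30 random split plus 10-fold cross-validation, as in the text."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    cv_acc = cross_val_score(clf, X_tr, y_tr, cv=10).mean()
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    return cv_acc, macro_accuracy(y_te, y_pred)
```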
IV. RESULTS

In this section, the results of the empirical comparison between the different levels (sensor-level, feature-level, decision-level) of fusion are presented. First, the results of the baseline model without any fusion are obtained. Second, the results of four different fusion techniques at each level are presented. Finally, an overall comparison among the three levels of fusion is offered. In all experiments, the four publicly available HAR datasets (please refer to Section III-A) are used.

Figure 2 presents an example of different activities captured by the accelerometer and gyroscope tri-axial sensor data. It is evident that activities (e.g., walking) with large body movement exhibit large signal fluctuations. In addition, the difference between both types of sensors is evident in the third activity (standing), since in this activity the subjects were instructed to move their head to look around while standing. In this case, the accelerometer captures very little difference in linear acceleration, since the subject is standing in the same spot; on the other hand, the gyroscope captures large differences in angular velocity, since the subjects are moving their upper body to look around. This is a clear example of how the captured data of both sensors can be used to overcome the limitations of each other and provide better information.

Fig. 2. An example of the tri-axial data captured by the accelerometer and gyroscope sensors during four different activities.

A. Baseline Classification

The classification methods were first applied to the extracted features without performing any data fusion to obtain baseline data. This allows us to obtain reference values for each dataset and for each classifier. The classification was done using the seven features (mean, std, median, min, max, range, IQR) extracted from each axis of both sensors; in total, 42 features were obtained (7 features × 6 axes). Table IV displays the accuracy results of the defined features. The results showed that KNN was the most successful classifier for the MobiAct (Acc = 0.6592), UMAFall (Acc = 0.6922), and IM-WSHA (Acc = 0.9667) datasets. The SVM classifier obtained better results than the KNN using the UCI-HAR (Acc = 0.7908) dataset. Overall, the KNN classifier obtained the highest average accuracy (Acc = 0.7677) across all the datasets.

TABLE IV
BASELINE RESULTS WITH DIFFERENT CLASSIFIERS. ACCURACY CLASSIFICATION VALUES FROM ALL DATASETS WITHOUT DATA FUSION.

| Dataset | SVM | LDA | DT | KNN |
|---|---|---|---|---|
| MobiAct | 0.5992 | 0.6258 | 0.5858 | 0.6592 |
| UMAFall | 0.6330 | 0.6388 | 0.5602 | 0.6922 |
| IM-WSHA | 0.9375 | 0.9358 | 0.9317 | 0.9667 |
| UCI-HAR | 0.7908 | 0.6383 | 0.7750 | 0.7683 |
| Average | 0.7401 | 0.7096 | 0.7131 | 0.7716 |

If the average training time (in milliseconds) taken by each classifier with all four datasets is considered, it can be argued that the KNN (time = 26.11 ms) is more efficient than the rest, with the SVM classifier (time = 219.61 ms) exhibiting the longest average training time, and the LDA (time = 52.19 ms) and DT (time = 57.85 ms) showing similar results. Therefore, the KNN classifier can be considered the best classifier in the baseline classification among all datasets.

B. Classification of Data Fusion Levels

In this section, the results of applying the three levels of data fusion for each dataset are presented. Each level of data fusion was applied separately for each dataset (please refer to Figure 1). The expectation was that the classification accuracy would improve if the fusion method is able to obtain more complete global information from the two different sensors.

1) Sensor-level Fusion Results: The first level of fusion in the data processing pipeline is sensor-level fusion. At this level, the time series data from each sensor are fused according to each individual method. The data from each axis (x, y, z) is pre-processed individually before being fused. Table V presents the results for each individual dataset for each method.

TABLE V
ACCURACY CLASSIFICATION RESULTS USING SENSOR-LEVEL FUSION WITH ALL THE DATASETS.

| Dataset | Fusion | SVM | LDA | DT | KNN | Average |
|---|---|---|---|---|---|---|
| MobiAct | AVA | 0.4100 | 0.4092 | 0.3817 | 0.4000 | 0.4003 |
| | Magn. | 0.4858 | 0.4783 | 0.4592 | 0.5058 | 0.4748 |
| | Comp. | 0.4183 | 0.3833 | 0.3283 | 0.3242 | 0.3613 |
| | Kalman | 0.8727 | 0.8909 | 0.8182 | 0.9455 | 0.8764 |
| UMAFall | AVA | 0.4454 | 0.4476 | 0.3592 | 0.4039 | 0.4365 |
| | Magn. | 0.5146 | 0.5350 | 0.4379 | 0.4621 | 0.4878 |
| | Comp. | 0.4107 | 0.4243 | 0.3427 | 0.3447 | 0.3790 |
| | Kalman | 0.7273 | 0.5091 | 0.6545 | 0.8000 | 0.6655 |
| IM-WSHA | AVA | 0.6625 | 0.6592 | 0.7392 | 0.7883 | 0.6740 |
| | Magn. | 0.8042 | 0.7717 | 0.8008 | 0.8275 | 0.7760 |
| | Comp. | 0.8608 | 0.8433 | 0.8642 | 0.9192 | 0.8620 |
| | Kalman | 0.8611 | 0.9375 | 0.8660 | 0.9493 | 0.8950 |
| UCI-HAR | AVA | 0.5217 | 0.4658 | 0.4808 | 0.5033 | 0.4846 |
| | Magn. | 0.5683 | 0.5150 | 0.5417 | 0.5483 | 0.5430 |
| | Comp. | 0.6117 | 0.5783 | 0.5917 | 0.6333 | 0.6003 |
| | Kalman | 0.6325 | 0.5575 | 0.5450 | 0.5825 | 0.5778 |

The results showed that sensor fusion by Kalman filter obtained the best results with three datasets, MobiAct (Acc = 0.8764), UMAFall (Acc = 0.6655), and IM-WSHA (Acc = 0.8950), while the Complementary filter obtained the best results with the UCI-HAR dataset (Acc = 0.6003). In addition, KNN exhibited the highest accuracy with most of the fusion methods. Overall, it is evident that the Kalman filter obtained the best results among all tested sensor-level fusion methods. At this level of fusion, one method evidently performed better than the others based on the accuracy values: classifiers using the Kalman filter as the method of fusion obtained the best results, with an overall Acc = 0.7537, while classifiers relying on the other methods of fusion had much lower results, i.e., Magnitude (Acc = 0.5704), Complementary filter (Acc = 0.5506), and AVA (Acc = 0.4989). In terms of the computational time taken to train the classifiers, classifiers using the complementary filter as the method of fusion exhibited, on average, the fastest training time (time = 43.81 ms). Classifiers relying
on the other three fusion methods, on average, showed longer training times, with the Kalman filter (time = 61.71 ms), AVA (time = 70.39 ms), and Magnitude (time = 75.93 ms).

In terms of the performance of the individual classifiers, the KNN exhibited good results. Overall, the KNN obtained the highest results (Acc = 0.6212) among all the methods of fusion, while the other classifiers had comparable results: the Gaussian SVM (Acc = 0.6129), DT (Acc = 0.5757), and LDA (Acc = 0.5878). However, based on the average training time taken by these classifiers with all fusion methods at the sensor level, the KNN exhibited the fastest computational time (time = 11.79 ms), while the LDA (time = 14.25 ms) and the DT (time = 23.45 ms) required slightly longer times to be trained. Finally, the SVM showed the longest computational time (time = 202.33 ms) at this level of fusion.

2) Feature-level Fusion Results: The second level of fusion in the data processing pipeline is feature-level fusion. Table VI presents the accuracy results for each fusion method at the feature level.

TABLE VI
ACCURACY CLASSIFICATION RESULTS USING FEATURE-LEVEL FUSION WITH ALL THE DATASETS.

| Dataset | Fusion | SVM | LDA | DT | KNN | Average |
|---|---|---|---|---|---|---|
| MobiAct | FA | 0.6042 | 0.4683 | 0.4692 | 0.6067 | 0.5285 |
| | MDS | 0.6258 | 0.4592 | 0.4883 | 0.6250 | 0.5340 |
| | PCA | 0.6208 | 0.4617 | 0.4992 | 0.6233 | 0.5382 |
| | SVD | 0.6133 | 0.4742 | 0.5200 | 0.6175 | 0.5420 |
| UMAFall | FA | 0.6262 | 0.5155 | 0.5398 | 0.6631 | 0.5716 |
| | MDS | 0.6301 | 0.5447 | 0.4874 | 0.6146 | 0.5637 |
| | PCA | 0.6194 | 0.5466 | 0.4903 | 0.5961 | 0.5588 |
| | SVD | 0.6252 | 0.5417 | 0.4631 | 0.6126 | 0.5557 |
| IM-WSHA | FA | 0.9525 | 0.8600 | 0.8450 | 0.9308 | 0.8920 |
| | MDS | 0.9375 | 0.8417 | 0.8808 | 0.9533 | 0.8930 |
| | PCA | 0.9325 | 0.8492 | 0.8933 | 0.9517 | 0.8950 |
| | SVD | 0.9400 | 0.8442 | 0.8775 | 0.9492 | 0.8940 |
| UCI-HAR | FA | 0.7858 | 0.6367 | 0.7100 | 0.7667 | 0.7290 |
| | MDS | 0.7483 | 0.5825 | 0.6700 | 0.7383 | 0.6778 |
| | PCA | 0.7633 | 0.6092 | 0.7117 | 0.7433 | 0.7058 |
| | SVD | 0.7542 | 0.6467 | 0.6917 | 0.7358 | 0.7087 |

Factor analysis (FA) showed the best results in two datasets, UMAFall (Acc = 0.5716) and UCI-HAR (Acc = 0.7290), while singular value decomposition (SVD) and principal component analysis (PCA) presented the best results using the MobiAct (Acc = 0.5420) and IM-WSHA (Acc = 0.8950) datasets, respectively; in both cases, FA had results comparable to the former methods. At this particular level of fusion, there is no clear indication that a single method performs better than the others.

At feature-level fusion, all fusion methods showed similar accuracy and computational time results. For instance, classifiers using these methods obtained very similar results: PCA (Acc = 0.6745), SVD (Acc = 0.6751), FA (Acc = 0.6803), and MDS (Acc = 0.6671). In terms of computational time, classifiers using SVD as the method of feature fusion appeared to take less time to train than those using other methods of fusion, with SVD (time = 49.49 ms); while PCA (time = 50.52 ms) and MDS (time = 53.46 ms) had similar results, FA (time = 64.39 ms) exhibited the worst time among fusion methods at this level.

At the classifier level using feature-level fusion, again the KNN exhibited good results. The KNN algorithm and SVM exhibited the highest performance overall, with very similar results (Acc = 0.733) and (Acc = 0.736), respectively. LDA and DT showed slightly worse results, with (Acc = 0.6176) and (Acc = 0.6398), respectively. Based on the average training time taken by these classifiers with all fusion methods at the feature level, the KNN exhibited the fastest computational time (time = 7.43 ms), while the LDA (time = 7.89 ms) and the DT (time = 14.48 ms) required slightly longer times to be trained. Again, the SVM showed the longest computational time (time = 188.06 ms) at this level of fusion.

3) Decision-level Fusion Results: The third level of fusion in the data processing pipeline is decision-level fusion. At this stage, four different classification methods were used to fuse the information. Table VII exhibits the accuracy results obtained by each fusion method. Stacking obtained the highest accuracy among all explored methods of decision-level fusion: MobiAct (Acc = 0.7600), UMAFall (Acc = 0.7427), IM-WSHA (Acc = 0.9800), and UCI-HAR (Acc = 0.8583). In this case, it is evident that a single method, stacking, outperforms all the other fusion methods.

TABLE VII
ACCURACY CLASSIFICATION RESULTS USING DECISION-LEVEL FUSION WITH ALL THE DATASETS.

| Dataset | Bagging | Boosting | Voting | Stacking |
|---|---|---|---|---|
| MobiAct | 0.7033 | 0.6333 | 0.6133 | 0.7600 |
| UMAFall | 0.6709 | 0.6282 | 0.5699 | 0.7427 |
| IM-WSHA | 0.9683 | 0.7967 | 0.9392 | 0.9800 |
| UCI-HAR | 0.8408 | 0.5642 | 0.6400 | 0.8583 |
| Average | 0.8185 | 0.6674 | 0.7016 | 0.8588 |

The average accuracy values from two methods of fusion at the decision level stand out from the rest. Stacking showed the highest performance (Acc = 0.8588), and Bagging had slightly lower results (Acc = 0.8185). On the other hand, Boosting (Acc = 0.6674) and Voting (Acc = 0.7016) exhibited the lesser results overall. On average, Voting (time = 41.73 ms) and Boosting (time = 59.19 ms) exhibited the lowest computational times to be trained at this level of fusion, while Bagging (time = 159.70 ms) and Stacking (time = 6464.93 ms) took the longest computational time across the datasets.

C. Comparison Between Levels of Fusion

The different levels of fusion have shown contrasting results. The results presented in this section represent the average results across all datasets, obtained for each individual method of fusion. The accuracy, the computational time, and finally a comparison between accuracy and computational time are presented below.

Figure 3 presents the accuracy results for each individual fusion method averaged across the four datasets. The baseline method achieved an average Acc = 0.7250 ± 0.1517 without the use of any data fusion. Methods based on sensor-level fusion produced different results (average Acc = 0.5934 ± 0.1110): the Kalman filter exhibited higher accuracy (Acc = 0.7536 ± 0.1566) than the baseline model, the third highest accuracy among all the fusion methods; the other three methods presented lower results than the baseline model,
AVA (Acc = 0.4987 ± 0.1215), Magnitude (Acc = 0.5704 ± 0.1403), and Complementary filter (Acc = 0.5507 ± 0.2343). All the feature-level fusion techniques exhibited lower results than the baseline method (average Acc = 0.6742 ± 0.0053), with Factor Analysis (Acc = 0.6803 ± 0.1654), MDS (Acc = 0.6672 ± 0.1629), PCA (Acc = 0.6744 ± 0.1647), and SVD (Acc = 0.6750 ± 0.1641). Finally, the decision-level techniques also showed different results (average Acc = 0.7443 ± 0.0850): Stacking (Acc = 0.8353 ± 0.1091) and Bagging (Acc = 0.7958 ± 0.1365) showed the two highest accuracy values among all fusion methods, while Boosting (Acc = 0.6556 ± 0.0991) and Voting (Acc = 0.6906 ± 0.1682) showed lower accuracy results than the baseline model.

Fig. 3. Comparison of the accuracy between the methods of fusion and the baseline data as a combined average result for all the datasets.

Regarding the processing time, most fusion methods were less computationally intensive than the baseline model (time = 88.94 ms ± 27.98). Figure 4 presents the average CPU time for all levels of fusion explored in this study. Sensor-level fusion presented lower computational times than the baseline model: the Complementary filter (time = 43.81 ms ± 51.06) exhibited the second lowest CPU time in this study; the Kalman filter (time = 61.71 ms ± 63.85), AVA (time = 70.39 ms ± 27.98), and Magnitude (time = 34.28 ms ± 34.28) followed. Overall, feature-level fusion appeared to be the fastest group of methods to compute among all three levels of fusion, with SVD (time = 49.49 ms ± 20.61), FA (time = 64.39 ms ± 30.29), MDS (time = 53.46 ms ± 22.44), and PCA (time = 50.52 ms ± 21.26). For the decision-level techniques, Voting exhibited the lowest average CPU time (time = 41.73 ms ± 2.88) among all fusion methods, and Boosting also exhibited a lower CPU time (time = 59.19 ms ± 4.72) than the baseline model, while Bagging (time = 159.70 ms ± 23.75) and Stacking (time = 6464.93 ms ± 3135.98) were the only methods to exhibit longer CPU times than the baseline model.

Fig. 4. Comparison of the average processing time between all methods of fusion and the baseline data.

Finally, a comparison between the accuracy and the computational time for each method is presented in Figure 5. This plot presents the baseline model in red, and the graph is divided into four quadrants based on it. In addition, the three different levels of fusion are presented in blue (sensor-level), green (feature-level), and purple (decision-level) for easy identification. It is clear that the great majority of fusion methods fall in the bottom left quadrant; these methods were less accurate (< 0.7250) and less computationally intensive (< 88.94 ms) than the baseline model. It is also clear that two decision-level methods (Bagging and Stacking) were more accurate than the baseline model; however, these two methods were much more computationally intensive. Based on this graph, it is also evident that in the top left quadrant a single method (Kalman filter) presented both better accuracy and lower computational cost than the baseline model, which represents a good trade-off between accuracy and computational power.

Fig. 5. Comparison between the accuracy and the processing time of all the levels of fusion considered in this study. The scatter plot presents the three different levels of fusion in different colours, the baseline classification without data fusion (in red), sensor-level fusion (in blue), feature-level fusion (in green), and decision-level fusion (in purple).

V. DISCUSSION

This paper tries to identify the most appropriate level of fusion to improve the accuracy of human activity recognition
systems. Four datasets were used to investigate the accuracy obtained in the three levels of fusion in the data pipeline process (please refer to Figure 1). The results showed that three methods of fusion presented better accuracy than the baseline models with all datasets. However, a single method presented both better accuracy and less processing time than the baseline models.

Each level of fusion showed different computational complexity. On average, the feature-level fusion requires less computational time (time = 54.47 ms) and power compared to the decision level (time = 1681.39 ms) and sensor level (time = 62.96 ms) when it comes to the training of classifiers. The fusion process at the sensor level takes slightly longer than that of the feature level, as the process is applied to the raw data, whereas the feature-level fusion is applied to the 7 features extracted from each axis (42 features per window). The decision level of fusion is by far the most computationally expensive on average; however, Boosting (time = 59.19 ms) and Voting (time = 41.73 ms) are on a similar level to the sensor- and feature-level methods. This is extremely important since, in most applications of real-time HAR, low computational power consumption is needed while using wearable devices [7], [41].

With many modern-day applications being used on mobile devices, computational time and power both play a large role in dictating what can and cannot be done on these devices. If these two variables were not an issue, it would be clear that the Stacking classifier at the decision level is by far the most optimal method, with an average (Acc = 0.8583). Although decision-level methods such as Bagging and Stacking obtained much better accuracy than all the other fusion methods in this study (please refer to Figure 5), these two methods are not suitable for applications where retraining a model for a specific user might be needed and computational power resources are limited. Voting, on the other hand, showed the lowest computational time (time = 41.73 ms) and acceptable accuracy (Acc = 0.6906). In real-time applications, Longstaff et al. [37] used voting to improve activity classification on mobile devices for real-time analysis. Another study, Riboni et al. [48], used voting to improve the classification for real-time HAR using an Android-based handheld device.

Finding a trade-off between accuracy performance and computational time is fundamental for HAR applications. Among all the fusion methods explored in this study, the Kalman filter obtained a good trade-off between accuracy and computational power. These results are in line with previous studies using the Kalman filter for real-time HAR. For instance, Wu et al. [62] implemented a real-time physical activity monitoring system based on Kalman filter fusion; their results indicated that body activity was identified with high accuracy and short latency. In another study [42], sensor fusion based on the Kalman filter was implemented for real-time HAR using a smartphone; the experimental results of this study showed that data fusion using the Kalman filter and an SVM classifier obtained higher accuracy than using raw data or the Complementary filter.

There are a few limitations discovered throughout the progression of this study. Each dataset has a different list of activities sampled; each dataset was reduced to four similar activities, but they are not all the same, so in some datasets certain activities may be slightly more separable from other activities compared to those of other datasets. Numerous sensors were used throughout the four datasets used in this study, with the IM-WSHA dataset using 3 MPU-9250 sensors and the other three datasets using smartphones to collect the data; with the smartphone models not being disclosed, the variance amongst the sensors could cause variation in the data collection process. All four datasets have a different number of subjects participating in the data collection; datasets with more participants have more variation between the data samples, which has the potential to negatively affect the results of datasets with more variety in participant data. An aspect not explored in this study was the use of multiple levels of fusion to improve results. This would significantly increase the pre-processing time but has the potential to increase the accuracy results by utilizing the best aspects of each level of fusion; this is something that we could explore in our future work.

VI. CONCLUSION

The method of fusion that improves on the baseline results with respect to both accuracy and processing time is the Kalman filter. Two methods at the decision level (Bagging and Stacking) achieve higher accuracy results compared to the baseline data; however, the time required to train and make classifications with these models is far higher than that of the baseline. If processing time is not a factor for a specific application, the Bagging and Stacking classifiers at the decision level have proven to be the most optimal. However, most HAR systems are generally utilized on a mobile device, where processing time plays a large role in a system's usability. Ultimately, only the Kalman filter performed better than the baseline data in both processing time and accuracy, and it can be deemed the most optimal individual fusion method for HAR systems. Accuracy-wise, the decision level was on average the best performing level of fusion; however, the computational power and processing time needed in training and making classifications for the two higher performing classifiers reduce the effectiveness of the decision level for use in HAR systems on mobile devices.

REFERENCES

[1] Ahmed Al-Jawad, Anton Barlit, Michailas Romanovas, Martin Traechtler, and Yiannos Manoli. The use of an orientation Kalman filter for the static postural sway analysis. APCBEE Procedia, 7:93–102, 2013.
[2] Bruno Andò, Salvatore Baglio, Cristian Orazio Lombardo, and Vincenzo Marletta. A multisensor data-fusion approach for ADL and fall classification. IEEE Transactions on Instrumentation and Measurement, 65(9):1960–1967, 2016.
[3] Paranyu Arnon. Classification model for multi-sensor data fusion apply for human activity recognition. In 2014 International Conference on Computer, Communications, and Control Technology (I4CT), pages 415–419. IEEE, 2014.
[4] Oresti Banos, Miguel Damas, Hector Pomares, and Ignacio Rojas. On the use of sensor fusion to reduce the impact of rotational and additive noise in human activity recognition. Sensors, 12(6):8039–8054, 2012.
[5] Oresti Banos, Miguel Damas, Héctor Pomares, and Ignacio Rojas. Activity recognition based on a multi-sensor meta-classifier. In International Work-Conference on Artificial Neural Networks, pages 208–215. Springer, 2013.
AUTHOR et al.: PREPARATION OF PAPERS FOR IEEE TRANSACTIONS AND JOURNALS (FEBRUARY 2021) 11

[6] Akram Bayat, Marc Pomplun, and Duc A Tran. A study on human activity recognition using accelerometer data from smartphones. Procedia Computer Science, 34:450–457, 2014.
[7] Ganapati Bhat, Ranadeep Deb, Vatika Vardhan Chaurasia, Holly Shill, and Umit Y Ogras. Online human activity recognition using low-power wearable devices. In 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 1–8. IEEE, 2018.
[8] Krzysztof Brzostowski and Piotr Szwach. Data fusion in ubiquitous sports training: Methodology and application. Wireless Communications and Mobile Computing, 2018, 2018.
[9] Andreas Bulling, Ulf Blanke, and Bernt Schiele. A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys (CSUR), 46(3):1–33, 2014.
[10] Eduardo Casilari, Jose A Santoyo-Ramón, and Jose M Cano-García. UMAFall: A multisensor dataset for the research on automatic fall detection. Procedia Computer Science, 110:32–39, 2017.
[11] Federico Castanedo. A review of data fusion techniques. The Scientific World Journal, 2013, 2013.
[12] Frank Cremer, Klamer Schutte, John GM Schavemaker, and Eric den Breejen. A comparison of decision-level sensor-fusion methods for anti-personnel landmine detection. Information Fusion, 2(3):187–208, 2001.
[13] Waltenegus Dargie. Analysis of time and frequency domain features of accelerometer measurements. In 2009 Proceedings of 18th International Conference on Computer Communications and Networks, pages 1–6. IEEE, 2009.
[14] Essam Debie, Raul Fernandez Rojas, Justin Fidock, Michael Barlow, Kathryn Kasmarik, Sreenatha Anavatti, Matthew Garratt, and Hussein A Abbass. Multimodal fusion for objective assessment of cognitive workload: a review. IEEE Transactions on Cybernetics, 2019.
[15] Federico Divina, Aude Gilson, Francisco Gómez-Vela, Miguel García Torres, and José F Torres. Stacking ensemble learning for short-term electricity consumption forecasting. Energies, 11(4):949, 2018.
[16] Iram Fatima, Muhammad Fahim, Young-Koo Lee, and Sungyoung Lee. A genetic algorithm-based classifier ensemble optimization for activity recognition in smart homes. KSII Transactions on Internet and Information Systems (TIIS), 7(11):2853–2873, 2013.
[17] Filipe Felisberto, António Pereira, et al. A ubiquitous and low-cost solution for movement monitoring and accident detection based on sensor fusion. Sensors, 14(5):8961–8983, 2014.
[18] Zengtao Feng, Lingfei Mo, and Meng Li. A random forest-based ensemble method for activity recognition. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5074–5077. IEEE, 2015.
[19] Hassen Fourati. Heterogeneous data fusion algorithm for pedestrian navigation via foot-mounted inertial measurement unit and complementary filter. IEEE Transactions on Instrumentation and Measurement, 64(1):221–229, 2014.
[20] David D Freedman. Overview of decision level fusion techniques for identification and their application. In Proceedings of 1994 American Control Conference-ACC'94, volume 2, pages 1299–1303. IEEE, 1994.
[21] Yoav Freund, Robert Schapire, and Naoki Abe. A short introduction to boosting. Journal-Japanese Society For Artificial Intelligence, 14(771-780):1612, 1999.
[22] DP Gaikwad and Ravindra C Thool. Intrusion detection system using bagging ensemble method of machine learning. In 2015 International Conference on Computing Communication Control and Automation, pages 291–295. IEEE, 2015.
[23] Hassan Ghassemzadeh, Eric Guenterberg, Sarah Ostadabbas, and Roozbeh Jafari. A motion sequence fusion technique based on PCA for activity analysis in body sensor networks. In 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 3146–3149. IEEE, 2009.
[24] Rohin K Govindarajan. Feature-level fusion in multimodal biometrics. 2004.
[25] Yu Guan and Xinmin Song. Sensor fusion of gyroscope and accelerometer for low-cost attitude determination system. In 2018 Chinese Automation Congress (CAC), pages 1068–1072. IEEE, 2018.
[26] Han Wen Guo, Yi Ta Hsieh, Yu Shun Huang, Jen Chien Chien, Koichi Haraikawa, and Jiann Shing Shieh. A threshold-based algorithm of fall detection using a wearable device with tri-axial accelerometer and gyroscope. In 2015 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), pages 54–57. IEEE, 2015.
[27] TIBCO Software Inc. About data science textbook. 1995.
[28] Tariqul Islam, Md Saiful Islam, Md Shajid-Ul-Mahmud, and Md Hossam-E-Haider. Comparison of complementary and Kalman filter based data fusion for attitude heading reference system. In AIP Conference Proceedings, volume 1919, page 020002. AIP Publishing LLC, 2017.
[29] Mojtaba Jafari Tadi, Eero Lehtonen, Jarmo Teuho, Juho Koskinen, Jussi Schultz, Reetta Siekkinen, Tero Koivisto, Mikko Pänkäälä, Mika Teräs, and Riku Klén. A computational framework for data fusion in MEMS-based cardiac and respiratory gating. Sensors, 19(19):4137, 2019.
[30] Ahmad Jalal, Kibum Kim, et al. Wearable inertial sensors for daily activity analysis based on Adam optimization and the maximum entropy Markov model. Entropy, 22(5):579, 2020.
[31] Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto, and Xavier Parra. Human activity recognition using smartphones data set. 2012.
[32] Bahador Khaleghi, Alaa Khamis, Fakhreddine O Karray, and Saiedeh N Razavi. Multisensor data fusion: A review of the state-of-the-art. Information Fusion, 14(1):28–44, 2013.
[33] Adil Mehmood Khan, Ali Tufail, Asad Masood Khattak, and Teemu H Laine. Activity recognition on smartphones via sensor-fusion and KDA-based SVMs. International Journal of Distributed Sensor Networks, 10(5):503291, 2014.
[34] Rachel C King, Emma Villeneuve, Ruth J White, R Simon Sherratt, William Holderbaum, and William S Harwin. Application of data fusion techniques and technologies for wearable health monitoring. Medical Engineering & Physics, 42:1–12, 2017.
[35] Manon Kok, Jeroen D Hol, and Thomas B Schön. Using inertial sensors for position and orientation estimation. arXiv preprint arXiv:1704.06053, 2017.
[36] Li Liu, Shu Wang, Yuxin Peng, Zigang Huang, Ming Liu, and Bin Hu. Mining intricate temporal rules for recognizing complex activities of daily living under uncertainty. Pattern Recognition, 60:1015–1028, 2016.
[37] Brent Longstaff, Sasank Reddy, and Deborah Estrin. Improving activity classification for health applications on mobile devices using active and semi-supervised learning. In 2010 4th International Conference on Pervasive Computing Technologies for Healthcare, pages 1–7. IEEE, 2010.
[38] Hussein Mazaar, Eid Emary, and Hoda Onsi. Ensemble based-feature selection on human activity recognition. In Proceedings of the 10th International Conference on Informatics and Systems, pages 81–87, 2016.
[39] Jun-Ki Min and Sung-Bae Cho. Activity recognition based on wearable sensors using selection/fusion hybrid ensemble. In 2011 IEEE International Conference on Systems, Man, and Cybernetics, pages 1319–1324. IEEE, 2011.
[40] Lingfei Mo, Shaopeng Liu, Robert X Gao, and Patty S Freedson. Multi-sensor ensemble classifier for activity recognition. Journal of Software Engineering and Applications, 5:113, 2012.
[41] Subhas Chandra Mukhopadhyay. Wearable sensors for human activity monitoring: A review. IEEE Sensors Journal, 15(3):1321–1330, 2014.
[42] D Natarajasivan and M Govindarajan. Filter based sensor fusion for activity recognition using smartphone. International Journal of Computer Science and Telecommunications, 7(5):26–31, 2016.
[43] Henry Friday Nweke, Ying Wah Teh, Ghulam Mujtaba, and Mohammed Ali Al-Garadi. Data fusion and multiple classifier systems for human activity detection and health monitoring: Review and open research directions. Information Fusion, 46:147–170, 2019.
[44] Henry Friday Nweke, Ying Wah Teh, Ghulam Mujtaba, Uzoma Rita Alo, and Mohammed Ali Al-garadi. Multi-sensor fusion based on multiple classifier systems for human activity identification. Human-centric Computing and Information Sciences, 9(1):34, 2019.
[45] Paa Adu and Stefan Bran-Melendez. Optimizing IMU based gesture classification. 2018.
[46] SU Park, JH Park, MA Al-Masni, MA Al-Antari, Md Z Uddin, and T-S Kim. A depth camera-based human activity recognition via deep learning recurrent neural network for health and social care services. Procedia Computer Science, 100:78–84, 2016.
[47] Ivan Miguel Pires, Gonçalo Marques, Nuno M Garcia, Francisco Flórez-Revuelta, Maria Canavarro Teixeira, Eftim Zdravevski, Susanna Spinsante, and Miguel Coimbra. Pattern recognition techniques for the identification of activities of daily living using a mobile device accelerometer. Electronics, 9(3):509, 2020.
[48] Daniele Riboni and Claudio Bettini. COSAR: hybrid reasoning for context-aware activity recognition. Personal and Ubiquitous Computing, 15(3):271–289, 2011.
[49] Daniel Roetenberg, Per J Slycke, and Peter H Veltink. Ambulatory position and orientation tracking fusing magnetic and inertial sensing. IEEE Transactions on Biomedical Engineering, 54(5):883–890, 2007.
[50] Raul Fernandez Rojas, Xu Huang, and Keng-Liang Ou. Toward a functional near-infrared spectroscopy-based monitoring of pain assessment for nonverbal patients. Journal of Biomedical Optics, 22(10):106013, 2017.
[51] Raul Fernandez Rojas, Xu Huang, and Keng-Liang Ou. A machine learning approach for the identification of a biomarker of human pain using fNIRS. Scientific Reports, 9(1):1–12, 2019.
[52] Nasir Saeed, Haewoon Nam, Mian Imtiaz Ul Haq, and Dost Bhatti Muhammad Saqib. A survey on multidimensional scaling. ACM Computing Surveys (CSUR), 51(3):1–25, 2018.
[53] Charlene V San Buenaventura and Nestor Michael C Tiglao. Basic human activity recognition based on sensor fusion in smartphones. In 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pages 1182–1185. IEEE, 2017.
[54] Dominik Schuldhaus, Heike Leutheuser, and Bjoern M Eskofier. Classification of daily life activities by decision level fusion of inertial sensor data. In Proceedings of the 8th International Conference on Body Area Networks, pages 77–82, 2013.
[55] Carlos Oscar Sánchez Sorzano, Javier Vargas, and A Pascual Montano. A survey of dimensionality reduction techniques. arXiv preprint arXiv:1403.2877, 2014.
[56] Alaa Tharwat. Classification assessment methods. Applied Computing and Informatics, 2020.
[57] Panagiotis Tsinganos and Athanassios Skodras. On the comparison of wearable sensor data fusion to a single sensor machine learning technique in fall detection. Sensors, 18(2):592, 2018.
[58] Can Tunca, Nezihe Pehlivan, Nağme Ak, Bert Arnrich, Gülüstü Salur, and Cem Ersoy. Inertial sensor-based robust gait analysis in non-hospital settings for neurological disorders. Sensors, 17(4):825, 2017.
[59] UCLA. A practical introduction to factor analysis.
[60] George Vavoulas, Charikleia Chatzaki, Thodoris Malliotakis, Matthew Pediaditis, and Manolis Tsiknakis. The MobiAct dataset: Recognition of activities of daily living using smartphones. In ICT4AgeingWell, pages 143–151, 2016.
[61] Aiguo Wang, Guilin Chen, Jing Yang, Shenghui Zhao, and Chih-Yung Chang. A comparative study on human activity recognition using inertial sensors in a smartphone. IEEE Sensors Journal, 16(11):4566–4578, 2016.
[62] Jian Kang Wu, Liang Dong, and Wendong Xiao. Real-time physical activity classification and tracking using wearable sensors. In 2007 6th International Conference on Information, Communications & Signal Processing, pages 1–6. IEEE, 2007.
[63] Piero Zappi, Thomas Stiefmeier, Elisabetta Farella, Daniel Roggen, Luca Benini, and Gerhard Tröster. Activity recognition from on-body sensors by classifier fusion: sensor scalability and robustness. In 2007 3rd International Conference on Intelligent Sensors, Sensor Networks and Information, pages 281–286. IEEE, 2007.
[64] Makia Zmitri, Hassen Fourati, and Nicolas Vuillerme. Human activities and postures recognition: From inertial measurements to quaternion-based approaches. Sensors, 19(19):4058, 2019.