MultiStream Deep CNN

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2021.DOI
Multi-stream deep convolution neural network with blending ensemble classification for Human Posture Recognition
Amer Hamza Aamir 1, Syed Farooq Ali 1, Afifa Hameed 2, Aaima Parvez 1, Ahmed Hasnain Mirza 1, Muhammad Bilal 3 and Muhammad Shehzad Hanif 3
1 Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan (e-mail: {f2019279026, farooq.ali}@umt.edu.pk)
2 Department of Software Engineering, Faculty of Information and Technology, University of Central Punjab, Lahore, Pakistan
3 Center of Excellence in Intelligent Engineering Systems, Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Corresponding author: Syed Farooq Ali (e-mail: farooq.ali@umt.edu.pk).
This research work was funded by Institutional Fund Projects under grant no. (IFPIP: 941-135-1443). The authors gratefully acknowledge
technical and financial support provided by the Ministry of Education and King Abdulaziz University, DSR, Jeddah, Saudi Arabia.
ABSTRACT
Human Posture Recognition has been a focal point of research for over a decade. It is primarily used for remote detection of human postures, such as sitting, standing, lying, and walking, particularly in the context of elderly care. While sensor-based, vision-based, and feature-based solutions exist, the advent of affordable processing power has shifted researchers’ interest towards deep learning-based approaches, which have shown promising results in posture recognition. This study introduces a multi-stream deep convolutional neural network for this purpose. We use the pre-trained deep learning architectures ResNet-50, ResNet-101, and VGG-16 to extract features. Principal Component Analysis (PCA) is applied for dimensionality reduction, thereby reducing computational cost. Ensemble machine learning classifiers are then applied to the resulting deep features to achieve high accuracy. Among the ensemble methods considered, Blending proved to be the most effective. The proposed approach was evaluated against existing state-of-the-art methods on five publicly available datasets: KARD, MCF, NUCLA, URFD, and UP Fall. The results indicate that our approach surpasses existing methods in terms of accuracy, precision, and recall, demonstrating its effectiveness.
INDEX TERMS
Human Posture Recognition, Ensemble Classification, Convolution Neural Network, Deep Learning
I. INTRODUCTION
The importance of computer vision as a research domain has risen sharply in recent years, and extensive research has been conducted on Human Posture Recognition (HPR). HPR is a complicated field because of the colossal number of postures to deal with. The arrangement of the body’s skeleton in a posture is brought about either naturally or involuntarily by a person’s motion. Due to advancements in surveillance systems, a large number of cameras had been placed in public places by the year 2023 to deal with security issues and to capture any kind of abnormal activity. The statistics in Figure 1 show that the surveillance technology market size is expected to increase drastically from 2020 to 2026 [1]. The cameras installed in public places record videos that help in deciding whether any unusual activity is happening, such as crimes like kidnapping and assault [2]. Surveillance systems capture human posture, which assists in identifying correct or incorrect posture and thus aids different types of applications, as shown in Table 1, such as home environments, care centers for
VOLUME 4, 2016 1
TABLE 1: Applications exploiting human posture recognition
Real-Time Applications References

1 Supportive Home Environments [20], [21]
2 Care Centers for the disabled [22]
3 Physiotherapy [23]–[25]
4 Surveillance Systems [26], [27]
5 Yoga [28]–[30]
6 Gym [6], [31], [32]
elders and patients, yoga centers, and gyms [3]. Other uses include identifying imminent danger from posture in domestic surroundings, assisting older persons who live alone and have difficulty carrying out specific tasks, the gaming industry [4], and IoT applications for smart cities [5]. Recently, researchers have also been exploiting automatic human posture detection to guide athletes in enhancing their sporting prowess and correcting their postures [6]. Likewise, all kinds of anomalies and abnormal events or activities can be identified automatically from these recorded videos using Deep Learning (DL) techniques. Numerous researchers have worked on human posture recognition using DL; however, despite the research done in Computer Vision (CV) along with DL, issues persist due to lack of training data, depth ambiguities, and occlusion.
FIGURE 1: Surveillance technology market size worldwide from 2020 to 2026
Several methods for recognizing human postures have been proposed as a result of real-world applications; these methods can be broadly divided into sensor-based and vision-based approaches. Various wearable gadgets with sensors or emergency buttons have been designed and developed to monitor human posture [7]. However, these have a limitation: it is not practical to wear such devices all the time to enable monitoring. This limitation was later overcome by the use of video cameras [8]. For the 2D approach, two methods are used. In the first, moving-pixel projections are made based on the axis. The second divides the human features into blocks and uses Principal Component Analysis to learn the 2D postural appearances. Then, to make the earlier techniques independent of the camera position, a 3D posture model was used. Finally, some preliminary findings about the effectiveness of this strategy were presented and discussed. This is still a challenging task due to the large number of postures a person can adopt [8]. Other drawbacks include varied backgrounds, carried objects, shadows, reflections, crowded settings, etc. This problem was later addressed in [9], which provided a mechanism known as multi-view human bounding volume estimation to recognize human postures.
Among the numerous approaches proposed in the past to improve the performance of HPR systems, researchers began to incorporate more recent techniques known as convolutional neural networks (CNN) [10]. Over the past ten years, scholars have become increasingly interested in ensemble learning approaches along with Convolutional Neural Networks [11], [12] and have been rigorously applying these techniques in different real-time applications to enhance their performance measures. Some of the common domains where ensemble learning techniques have been widely exploited are healthcare systems [13], finance companies [14], insurance companies [15], cyber security [16], and many others. Hence, many models and classifiers have been developed and integrated via ensemble learning to improve the classification accuracy of such real-time systems working in different domains. Researchers have practiced different types of ensemble techniques, such as Bagging, Boosting, Stacking, Bayesian Model Combination, and Bucket of Models [17]. In [18], stacking ensemble learning [19] is proposed, which requires the creation of linear combinations of several predictors to get generally better outcomes.
Due to its widespread use, a novel approach involving ensemble learning in the domain of Human Posture Recognition is proposed in this study. Multiple individual learners, namely J48, Random Forest (RF), Decision Table, SVM, and Naive Bayes, were initially stacked, generating a new model. Later, Blending was done using AdaBoost. To improve the predictions of our suggested model without sacrificing time efficiency, the meta-learner aggregates the outcomes of separate base learners.
Our contributions in this paper include:
• A new approach using a multi-stream deep convolutional neural network for the recognition of human posture has been proposed.
• The proposed architecture outperformed the existing architectures with respect to both accuracy and time efficiency on the KARD, MCF, URFD, NUCLA, and UP Fall datasets.
• To achieve better performance, improve robustness, and make better predictions, the ensembling technique known as blending has been used.
The remainder of the paper is organized as follows: Section II reviews the current methods for Human Posture Recognition while Section III gives the details of various datasets used in this research work. Section IV describes the
proposed framework for Human Posture Recognition. Section V provides an overview of the experimental results along with a detailed discussion, and Section VI concludes the paper and outlines future implications.
II. RELATED WORK
Researchers have been actively working on Human Posture Recognition for several years. According to the literature, the research can be categorized into thresholding-based methods [33]–[36], where the extracted features are compared with a defined threshold to identify whether a fall has occurred, and machine learning and deep learning algorithms [37]–[42], which researchers also often use to detect human posture.
Pramerdorfer et al. [43] presented a model to detect the occurrence of a fall consisting of three stages: first the state of the person was predicted in each frame, then events were detected using temporal analysis, and finally the fall was verified using different tests to reduce the false alarm rate. Similarly, Hung et al. [44], [45] presented a model to detect fall incidents using two orthogonal cameras by considering a person’s height along with the occupied area. Fall incidents were detected based on a defined threshold using time-series analysis of human postures. The experiment was conducted on the Multiple Camera Fall (MCF) Dataset [46], yielding a sensitivity of 95.8%.
Matilainen et al. [47] also presented a model known as BPS (Body Part Segmentation) to classify usual and unusual activities. BPS simply compares the current human posture to the postures available in the training data. Walking and sitting were categorized as normal, whereas all other postures, including falls, were classified as unusual or abnormal activity. However, this work is limited because the set of activities chosen was minimal; moreover, choosing an optimal threshold value is difficult and inefficient. Kang et al. [48] also presented a method for fall detection (SW-FDM) based on the occurrence of event patterns using particular human body postures, in which postures such as lying or falling are classified using a predefined threshold value. The results of the experiment showed that the processing cost and time were low compared to other existing front-line techniques, and the memory required was also minimal.
All the methods described above fail to adapt to changes in camera position and environment. Due to these limitations of threshold-based techniques, researchers turned towards methods based on machine learning or deep learning in order to achieve better performance in fall detection systems. Abedi et al. [49] proposed a multi-stage technique to recognize human body postures by constructing a 3D-RCNN with spatiotemporal processes using two datasets, namely the Fall Detection Dataset and the Human 3.6M dataset. Another study, conducted by Liaqat et al. in 2021 [50], presented a hybrid approach based on ML (Machine Learning) classifiers such as KNN, SVM, LDA, RF (Random Forest), and QDA (Quadratic Discriminant Analysis) and DL (Deep Learning) classifiers such as 1D-CNN, 2D-CNN, LSTM (Long Short Term Memory), and Bidirectional LSTM to detect postures. Similarly, a study presented in 2020 by Ren et al. [51] proposed a hybrid approach using a Kinect sensor, Fuzzy Logic, and a Machine Learning algorithm (SVM) to detect 20 human in-bed lying postures. In 2021, Noreen et al. [52] proposed a 2D Convolutional Neural Network to detect different hand poses such as Fist, Test1, Pinch, Thumb, Tiger Grasp, and Test2 using datasets that are publicly available on Kaggle, First Person, and Dexter.
Iazzi et al. presented a study in 2018 that considered the horizontal and vertical variations of the human silhouette area and deployed a Support Vector Machine on the MCF dataset, achieving an accuracy of 93.7% [53]. In 2020 and 2021, Iazzi et al. improved their work by presenting a combination of shape and motion features; in the latter year, a framework was proposed consisting of three phases: in the first phase the human silhouette was extracted, in the second phase the local as well as global features of the human silhouette were extracted, and finally the combined features were fed to different classifiers including RF, NN, DT, SVM, and KNN. Accuracies of 96% and 97.29% were achieved using the SVM classifier in 2020 and 2021, respectively [54], [55].
In 2017, Ge et al. [56] presented an approach based on sparse dictionaries along with CNN features to detect human falls. The experimental results showed an accuracy of 83% on the MCF dataset. The same dataset was used by Ali et al. [57] in 2018, who presented a model for the detection of human falls using novel geometric and motion features and obtained an accuracy of 97.5%. In the same manner, Mousse et al. applied their proposed approach to the MCF dataset to classify human postures by calculating the human bounding volume using the person’s height and the area of the surface in contact with the ground. The outcome was a sensitivity of 95.8% [58].
Since promising results were obtained on fall detection using machine learning classifiers and deep learning algorithms, another researcher proposed a recurrent model based on ConvLSTM by analyzing temporal pooling on the same MCF dataset and demonstrated a specificity of 95.8% [59]. Fan et al. [60] proposed a method based on slow feature sequences using shape-based features and a Directed Acyclic Graph SVM. Accuracies of 94% and 96.57% were achieved on the MCF and SDU Fall datasets, respectively. Due to the popularity of CNNs and transfer learning, Liu et al. [61] presented an invariant feature extractor using a trained CNN model and transfer learning from RGB-D frames for the recognition of human action.
Lahiri et al. [62], in 2017, proposed an approach known as Average Energy Image (AEI) that incorporates spatio-temporal features using Histogram of Oriented Gradients (HOG) and Principal Component Analysis (PCA). The experiment was conducted on the URFD and AbHa datasets, obtaining accuracies of 92.5% and 98.9%, respectively. The URFD dataset was also used by another researcher [63] in the same year for the detection of falls on furniture; the proposed approach used a Faster R-CNN model and yielded an accuracy of 95.5%. Feng et al. [64], in 2020, presented a complex-scene fall dataset and used an LSTM to detect different variants of fall such as side fall, backward fall, and forward fall. The proposed approach was evaluated on two datasets, namely MCF and URFD, yielding a specificity of 93.5% and an F-measure of 93% on the two datasets, respectively. Youssfi et al. [65], in the following year, demonstrated an approach based on the V2V-PoseNet model that detects 2D images of the body skeleton. The approach was analysed using the same dataset, where a specificity of 93% was obtained.
In 2022, Gomes et al. [66] proposed an approach using the YOLO object detection algorithm in combination with temporal classification methods, incorporating a filter tracking algorithm to track the falls of each person present in the scene. Two versions of the proposed approach were created, namely YOLOK+3DCNN and YOLOK+2DCNN+LSTM. Salimi et al. [67], in the same year, proposed a solution based on Fast Pose Estimation that uses a Time-Distributed CNN LSTM along with a 1D-CNN to categorize the extracted frames. The technique was evaluated on the URFD dataset and achieved an accuracy of 97%. Another work by Wu et al. [68] deployed the Gated Recurrent Unit (GRU), a deep learning model, for fall detection on publicly available datasets and achieved good accuracy compared with other state-of-the-art machine learning algorithms.
Thus, inspired by the encouraging results of the methods discussed above, we propose an ensemble classifier based on multiple streams of deep convolutional neural networks to detect human postures. The robustness and strength of our proposed approach are evaluated on five different datasets covering different scenarios, and a comparison is shown with existing cutting-edge techniques. Hence, by exploiting the results of posture recognition, an efficient and effective algorithm is proposed to detect falls, improving the accuracy of fall detection while minimizing false alarms.
III. DATASETS
The experiments were conducted on 5 datasets, namely KARD [69], UP Fall [70], MCF [71], URFD [71], and NUCLA [72]. Samples from each dataset are shown in Figure 2 through Figure 6. The focus of these datasets is on human posture recognition and human fall detection. The details of the five datasets used in the research paper are given below.
FIGURE 2: Sample of NUCLA Dataset in Sitting Position
FIGURE 3: Sample of KARD Dataset in Raising hand position
FIGURE 4: Sample of MCF Dataset in Laying Position
FIGURE 5: Sample of UP Fall Dataset in Falling Position
FIGURE 6: Sample of URFD Dataset in No Fall Position
A. KARD
The Kinect Activity Recognition Dataset (KARD) contains 18 activities [69], including hand raise (as shown in Figure 3), horizontal arm wave, high arm wave, side kick, walk, and bend, with each activity performed three times by 10 different subjects. The total number of files is 2160. The dataset consists of 540 sequences, for a total of about 1 hour of video captured at a resolution of 640x480 pixels at 30 fps. In this paper, KARD’s 5 categories were used with a total of 7698 frames.
B. MCF
The MCF dataset has frequent categorization of fall occurrences [73]. It consists of 192 videos, 96 of which represent falls and the remaining 96 normal events. This dataset includes 24 different circumstances in which nine different activities, such as crouching, walking, falling, and laying, are present, as shown in Table 4. The video sequences have a frame rate of 30 fps and a 720 x 480 frame size. In this paper, MCF’s 2 categories, i.e. laying and walking, have been used with a total of 690 frames.
C. NUCLA
The Northwestern-UCLA Multiview Action 3D Dataset (NUCLA) contains RGB [72], depth, and human skeleton data captured simultaneously by three Kinect cameras. The ten action categories in this dataset are donning, doffing, tossing, carrying, picking up with one/two hands, dropping garbage, moving around, sitting down, standing up, and walking. Ten performers participated in each action. The samples were collected from various viewpoints. In this paper, NUCLA’s 5 categories have been used with a total of 531 frames.
D. URFD
The URFD dataset consists of 70 sequences (40 everyday life activities + 30 falls) [74], as shown in Figure 6. Two Kinect cameras along with the associated accelerometric data were used to capture the fall incidents. However, the events are captured using only camera 0 and the accelerometer. Devices such as PS Move (60Hz) and x-IMU (256Hz) were used to gather the sensor data. The dataset is organized as follows. For cameras 0 and 1, which are placed parallel to the ceiling and the floor, each sequence includes depth and RGB images, synchronization information, and raw accelerometer data. Each video stream is stored in a separate zip archive as a sequence of png images. In this paper, URFD’s 2 categories have been used with a total of 917 frames.
E. UP FALL
The dataset consists of raw data and feature sets collected from 17 young and healthy people who performed 11 activities and fell three times each [70]. Figure 5 shows a falling position. More than 850 GB of data was collected from several systems such as vision systems and environmental and wearable sensors. Two test use scenarios were displayed, and in this paper UP Fall’s 2 categories have been used with a total of 1165 frames.
IV. PROPOSED METHODOLOGY
In this study, we propose an ensemble learning approach using a multi-stream deep convolutional neural network for human posture recognition that encodes precise posture changes through feature enrichment. The datasets were initially run through ResNet-50, and features from the pooling layer were retrieved. This procedure was then repeated using ResNet-101 and VGG-16. After extracting the features, we reduced their dimensions using PCA. Next, we used a blending meta-learner and three base learners (Naive Bayes, Random Forest, and SVM).
A. DEEP LEARNING ARCHITECTURES
This section discusses the features of the three architectures, namely ResNet-50, ResNet-101, and VGG-16.
1) Residual Network ResNet-50
ResNet-50 has 48 convolutional layers and is a variant of the ResNet model. It requires 3.8 × 10^9 floating point operations and has 1 MaxPool and 1 Average Pool layer [75]. The architecture provided by ResNets makes it feasible to train large, complex deep neural networks, which means that a network is able to function efficiently with hundreds or thousands of layers. ResNet offers two different types of mapping: identity mapping and residual mapping. Identity mapping relates to the shortcut connection [76] with an addition, i.e. the output is y = F(x) + x, while the residual mapping refers to the difference, that is, y - x. To complete classification tasks, ResNet-50 first performs a convolution operation on the input, then applies 4 residual blocks, and finally performs a fully connected operation. ResNet-50 has the following elements:
• A 7×7 kernel convolution
• A layer of max pooling
• 9 layers: a 3×3 convolution with 64 kernels, a second convolution with 64 kernels, and a 1×1 convolution with 256 kernels, with this group of 3 layers repeated 3 times.
• 12 layers: 1×1 with 128 kernels, 3×3 with 128 kernels, and 1×1 with 512 kernels, with this group repeated 4 times.
• 18 layers: 1×1 with 256 kernels, 3×3 with 256 kernels, and 1×1 with 1024 kernels, with this group repeated 6 times.
2) Residual Network ResNet-101
A convolutional neural network with 101 layers is known as ResNet-101 [77]. The pre-trained network can classify images into 1000 different object categories. Therefore, the network can capture rich feature representations suitable for a variety of images. The input image resolution of this network is 224x224.
3) Visual Geometry Group-16
The runner-up of the 2014 ILSVRC competition was the Visual Geometry Group (VGG). The major contribution of the work is that it demonstrates the importance of network depth for improving the accuracy of classification and recognition in CNNs [78]. The input size is 224 x 224 x 3. The first two layers of the network use 64 channels with a filter size of 3×3 and same padding. The next two layers are convolutional layers with 128 filters, along with max-pooling layers. A (2,2) pooling layer similar to the preceding one follows, after which come two convolutional layers with 256 filters of size 3×3. Finally, there are two sets of 3 convolutional layers, each with same padding and 512 filters of size 3×3. The image is then passed through the stack of these convolutional layers to produce the final output.
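The identity shortcut described for ResNet above, y = F(x) + x, can be illustrated with a minimal NumPy sketch. This is only an assumption-laden toy: the residual branch F is modelled as two dense layers with a ReLU between them (real ResNet blocks use convolutions and batch normalization), and the names `residual_branch`, `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_branch(x, w1, w2):
    """Residual mapping F(x): two linear layers with a ReLU in between.

    Dense layers stand in for the convolutions of a real ResNet block.
    """
    return relu(x @ w1) @ w2


def residual_block(x, w1, w2):
    """Identity shortcut: the block output is y = F(x) + x."""
    return residual_branch(x, w1, w2) + x


# Toy input with 4 features; the weights are random placeholders.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))
y = residual_block(x, w1, w2)
print(y.shape)
```

When the weights of the residual branch are all zero, the block reduces to the identity, which is one common way to explain why very deep residual networks remain trainable.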
FIGURE 7: Proposed Architecture with features extracted from VGG-16, ResNet-50 and ResNet-101 followed by dimensionality reduction
by PCA followed by ensemble classification (Blending).
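The pipeline in Figure 7 can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the paper reports using Weka for the ensemble stage, whereas scikit-learn is swapped in here. The deep features are replaced by synthetic arrays, the per-stream feature sizes and hold-out fraction are hypothetical, and treating AdaBoost as the meta-learner over Naive Bayes, Random Forest, and SVM base learners is one reading of the text.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def fit_pca(train, n_components=3):
    """Return the training mean and the leading principal directions."""
    mean = train.mean(axis=0)
    # Rows of vt are principal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def fuse_streams(train_streams, test_streams, n_components=3):
    """PCA-reduce each stream (fitted on training data) and concatenate."""
    fused_train, fused_test = [], []
    for train, test in zip(train_streams, test_streams):
        mean, comps = fit_pca(train, n_components)
        fused_train.append((train - mean) @ comps.T)
        fused_test.append((test - mean) @ comps.T)
    return np.hstack(fused_train), np.hstack(fused_test)


def blend_fit_predict(x_train, y_train, x_test, holdout=0.25, seed=0):
    """Blending: base learners fit on one part, meta-learner on a hold-out."""
    order = np.random.default_rng(seed).permutation(len(x_train))
    n_hold = int(len(order) * holdout)
    hold, fit = order[:n_hold], order[n_hold:]
    bases = [
        GaussianNB(),
        RandomForestClassifier(n_estimators=25, random_state=seed),
        SVC(probability=True, random_state=seed),
    ]
    for model in bases:
        model.fit(x_train[fit], y_train[fit])

    def meta_features(x):
        # Level-0 class probabilities become the level-1 inputs.
        return np.hstack([m.predict_proba(x) for m in bases])

    meta = AdaBoostClassifier(random_state=seed)
    meta.fit(meta_features(x_train[hold]), y_train[hold])
    return meta.predict(meta_features(x_test))


# Synthetic stand-ins for the three deep-feature streams (not real data).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 60)
dims = (64, 128, 32)  # hypothetical feature sizes per backbone
streams = [rng.normal(size=(120, d)) + 3.0 * labels[:, None] for d in dims]
idx = rng.permutation(120)
tr, te = idx[:96], idx[96:]  # 80:20 split
x_tr, x_te = fuse_streams([s[tr] for s in streams], [s[te] for s in streams])
pred = blend_fit_predict(x_tr, labels[tr], x_te)
print(x_tr.shape, float((pred == labels[te]).mean()))
```

Fitting PCA on the training portion only and fitting the meta-learner on a separate hold-out set are design choices made here to avoid leaking test information; the paper does not state how these splits were handled.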
TABLE 2: Optimized hyperparameters for the backbone deep networks of the proposed approach. AF* = Activation Function
Model       Hyperparameter      MCF      UP Fall  URFD     KARD     NUCLA
ResNet-101  Epochs              50       10       100      30       100
            Batch Size          32       32       32       32       32
            Optimizer           adam     adam     SGD      adam     adam
            Hidden Layer (AF*)  relu     relu     relu     relu     relu
            Dense Layer (AF*)   softmax  softmax  sigmoid  softmax  softmax
ResNet-50   Epochs              50       10       32       30       100
            Batch Size          32       32       100      32       32
            Optimizer           adam     adam     SGD      adam     adam
            Hidden Layer (AF*)
            Dense Layer (AF*)   softmax  softmax  softmax  softmax  softmax
VGG-16      Epochs              50       10       100      30       100
            Batch Size          32       32       32       32       32
            Optimizer           adam     adam     SGD      adam     adam
            Hidden Layer (AF*)
            Dense Layer (AF*)   softmax  softmax  softmax  softmax  softmax
B. HYPERPARAMETER TUNING
Hyperparameter tuning is a critical step in machine learning model development. It involves adjusting parameters such as the learning rate, batch size, or number of layers to optimize model performance. For instance, the optimal parameters for VGG16, ResNet50, and ResNet101 are crucial for achieving the best results. These parameters, such as the number of filters or the learning rate, can significantly impact the model’s accuracy and efficiency. The best parameters for these models, as ultimately employed by the proposed framework, have been meticulously tuned and are given in Table 2.
C. FEATURES EXTRACTION
In the context of image classification tasks, feature extraction is a crucial step that involves transforming raw image data into a suitable form for modeling by a machine learning algorithm. When using Convolutional Neural Networks
(CNNs), the process begins with the input of raw images into the network. These images are then processed through a series of convolutional layers, each of which applies a set of filters and generates a feature map. These feature maps capture local dependencies in the original image (such as edges, corners, and other texture details) through the application of various filters. Following the convolutional layers, pooling layers are used to reduce the spatial dimensions of the data, thereby controlling overfitting and reducing computational complexity. The output of these layers is a high-level representation of the input image which captures the essential features needed for classification. This output is then flattened into a one-dimensional vector and fed into a dense, fully connected layer, which performs the final classification. The fully connected layer uses the extracted features to classify the input images into several different classes based on the training phase. Thus, CNNs automate the process of feature extraction and classification, making them highly effective for image classification tasks. In this work, we extract deep features from the pooling layers of ResNet-101, VGG-16, and ResNet-50 for UP Fall, URFD, MCF, NUCLA, and KARD. The low values of the misclassification rate demonstrate the efficiency with which our suggested deep features were able to categorize postures, as seen later in the results section. Similarly, each category’s percentage error in the aforementioned datasets is quite low.
1) Dimensionality Reduction
Due to their extremely large dimensions and high computational cost, the combined deep features employed in the suggested methodology are exceedingly expensive to process. To reduce dimensionality, we therefore used principal component analysis (PCA). By reducing a large set of variables while retaining the majority of the information in the larger set, PCA is often used to minimize the dimensionality of large, complex data sets. Reducing the number of variables in a dataset affects accuracy, but the trade-off in dimensionality reduction is to give up a little accuracy to achieve a simpler model. Machine learning algorithms can handle smaller data sets more quickly since there are fewer additional factors to consider. PCA generates a projection of the original data into the same number of dimensions or fewer using straightforward matrix operations from linear algebra and statistics. Following the application of PCA to each dataset, we took the first three components. We then aggregated every element from each dataset to get deep features with smaller dimensions.
D. BLENDING
Blending, a widely known ensemble classification technique, is used to combine the predictions generated by multiple machine learning models, called ensemble models. The stacking model’s architecture includes more than one base model, referred to as level-0 models, while the rest are referred to as level-1 models. These meta-models are trained on the predictions generated by the base models on the given sample dataset.
• Level-0 Models (Base-Models): These base models are trained on the data, and their predictions are collected to establish the final output.
• Level-1 Model (Meta-Model): These are the meta-models that follow the base models and are used to gather the predictions generated by the base models.
However, the concept of blending carries particular implications regarding the construction of a stacking ensemble model. Blending implies the creation of a stacking ensemble wherein the base models consist of machine learning models of various kinds, while the meta-model is a linear model that combines or "blends" the predictions generated by the base models.
V. EXPERIMENTS & RESULTS
The performance of the proposed approach (PA) was analyzed and the results were compared with cutting-edge techniques. The publicly available datasets used for evaluation were UP Fall with two categories, URFD with two categories, KARD with five categories, NUCLA with five categories, and MCF (with two and three categories). A split ratio of 80:20 was used to compare the PA with state-of-the-art deep learning architectures like VGG-16, ResNet-101, and ResNet-50 in terms of precision, recall, and accuracy.
A. ENVIRONMENT
The ensemble classification and blending were implemented using Weka version 3.9.6. The extraction of the deep features was done in the Google Colab environment. Because of the large datasets, the WEKA execution was done using hardware with 8GB RAM, a 128GB SSD, and a 2.3 GHz Intel Core i5 processor.
B. EXPERIMENT 1: COMPARING ACCURACY WITH COMPETING METHODS
PA is compared with existing state-of-the-art approaches using the KARD, MCF, UP Fall, URFD, and NUCLA datasets, as shown in Figure 8. It can be observed that PA outperformed the other approaches and achieved the highest accuracy on all datasets. The proposed approach demonstrated superior accuracy on the UP Fall dataset compared to other datasets, most likely due to its high-resolution images.
While certain datasets like URFD contained images with low resolution, the deep features, when combined with the ensemble classification (blending) of PA, resulted in PA’s second-highest accuracy (98.51%). The highest previously reported accuracy (97.95%) was achieved by Mohd et al. in 2017, and PA surpassed the performance of that approach. Similarly, the PA achieved the highest accuracy using UP fall
VOLUME 4, 2016 7
(a) (b)
(c) (d)
FIGURE 8: Performance comparison of PA with other existing approaches using a)NUCLA b)URFD c)UP Fall and d)KARD Dataset
datasets exceeding all other approaches. Using NUCLA, the accuracy. This was observed across multiple datasets, namely
accuracy of PA was 19.6, 11.2, 6.6, 9.1 times higher than URFD, NUCLA, KARD, UP Fall, and MCF, as illustrated in
Liu, Dhiman, Aftab and Fang, respectively. While the PA was Figure 9. Notably, ResNet-101 showed the least performance
4.42, 0.48, 0.85 times better than Aftab, Dhiman and Ahad on the URFD dataset. A possible explanation for this could
for the KARD dataset, respectively. be the lower resolution of the URFD dataset, which is at 30
frames per second. This lower resolution might not provide
enough detail for the ResNet-101 architecture to accurately
classify the data.
FIGURE 9: Accuracy Comparison with state-of-the-art deep

architectures
C. EXPERIMENT 2-5: COMPARING THE ACCURACY OF FIGURE 10: Performance comparison using URFD Dataset
PA WITH CUTTING-EDGE DEEP ARCHITECTURES
PA outperformed other advanced deep learning architectures,
including ResNet-101, ResNet-50, and VGG-16, in terms of
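The dimensionality-reduction step described in the PCA subsection (keeping the first three principal components and then aggregating the reduced features) can be illustrated with a minimal, dependency-free Python sketch. It is an illustration under assumptions rather than the authors' implementation: the function names `pca_reduce` and `fuse_streams` are hypothetical, PCA is assumed to be applied separately to each stream's feature matrix, the reduced features are assumed to be concatenated per sample, and power iteration with deflation stands in for whatever PCA routine was actually used.

```python
import math
import random


def _matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def pca_reduce(rows, k=3, iters=500):
    """Project rows onto their first k principal components (requires k <= d)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the mean-centered features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Power iteration converges to the leading eigenvector of cov.
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left after deflation
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        components.append(v)
        # Deflation: remove the component just found from the covariance.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(a * b for a, b in zip(r, c)) for c in components]
            for r in centered]


def fuse_streams(streams, k=3):
    """Reduce every stream to k components and concatenate them per sample."""
    reduced = [pca_reduce(stream, k) for stream in streams]
    n = len(streams[0])
    return [[x for r in reduced for x in r[i]] for i in range(n)]
```

With three streams (for example, features from ResNet-50, ResNet-101, and VGG-16), each sample would end up with 3 x 3 = 9 fused features under these assumptions.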
D. EXPERIMENT 6-9: PERFORMANCE COMPARISON OF PA WITH ITS VARIANTS
After replacing the classifier of PA with AdaBoost, J48, Decision Table, Random Forest, SVM, and Naive Bayes, we compared the performance of PA in terms of accuracy, precision, and recall. Figures 10, 11, and 12 show that the proposed approach outperforms existing state-of-the-art techniques on the URFD, NUCLA, and KARD datasets. Moreover, Figure 13 demonstrates that the proposed approach obtained good results on the UP Fall Front dataset. Similarly, Figure 14 shows that the proposed approach achieved better accuracy, precision, and recall on the MCF dataset with 2 and 3 categories.

FIGURE 11: Performance comparison using NUCLA Dataset
FIGURE 12: Performance comparison using KARD Dataset

PA exhibited its best accuracy on the UP Fall and URFD datasets. Both are multi-modal datasets. The UP Fall dataset has high-resolution images and covers around 11 types of activities, such as falling in different directions, walking, standing, sitting, picking up an object, and lying down. Similarly, PA achieved high accuracy on URFD because the dataset contains frontal and overhead video sequences. The most significant decline in PA's performance was noted on the MCF dataset. This dataset is uni-modal, meaning it only considers a single visual modality. Furthermore, the dataset's small size and other constraints, such as the relatively low frame rate of 30 fps used to generate the samples, could have contributed to this performance degradation. Table 3 summarizes the performance impact when the proposed approach (blending) is replaced with other classifiers.

FIGURE 13: Performance comparison using UP Fall Front Dataset

TABLE 3: Comparison of PA in terms of percentage accuracy by replacing various classifiers using URFD, KARD, UP Fall, NUCLA and MCF datasets.
Classifier  URFD   KARD   UP Front  NUCLA  MCF
J48         93.3   90.52  83.52     92.3   71.42
SVM         90.37  78.56  97.64     98.71  84.03
NB          56.29  56.15  35.00     57.69  56.30
RF          97.7   91.58  97.64     96.15  79.83
PA          98.51  94.24  95.29     97.43  85.71

FIGURE 14: Performance comparison using (a) MCF with 2 categories, (b) MCF with 3 categories

E. EXPERIMENT 10-12: PERFORMANCE COMPARISON OF BLENDING IN PA WITH BOOSTING AND BAGGING
We compared the results of our proposed approach, which incorporates blending, with bagging and boosting in terms of accuracy, precision, and recall, as shown in Figures 15, 16, and 17, respectively. The comparison demonstrates that the proposed approach (blending) outperforms the other methods. One prominent reason is that blending is a heterogeneous ensemble method that handles data leakage by using a hold-out approach. Blending therefore leverages the strengths of multiple base methods to improve the performance of predictive models.

FIGURE 15: Accuracy performance of the Proposed Approach compared with boosting and bagging
FIGURE 16: Precision performance of the Proposed Approach compared with boosting and bagging
FIGURE 17: Recall performance of the Proposed Approach compared with boosting and bagging

VI. CONCLUSION & FUTURE WORK
In the proposed framework, we coupled the ensemble technique called blending with deep features for the recognition of human posture. To ensure thorough experimentation, we compared this approach (referred to as PA) with existing state-of-the-art methods. The experimental findings illustrate that PA outperformed all existing methods, achieving accuracies of 98.51%, 97.40%, 85.71%, 100%, and 97.12% on the URFD, NUCLA, MCF, UP Front, and KARD datasets, respectively. Deep features for our proposed architecture were extracted from the ResNet-101, VGG-16, and ResNet-50 architectures. To mitigate overfitting, PCA was applied to reduce dimensionality. Moreover, PA demonstrated superior performance compared to state-of-the-art deep networks, including ResNet-101, ResNet-50, and VGG-16.
In an effort to boost the efficacy of our proposed method, we integrated a blending technique into the algorithm. This integration led to a noticeable improvement in the model's accuracy. Furthermore, we conducted a comparative analysis of our results with various other ensemble techniques and classifiers. These included boosting, Random Forest, bagging, Decision Table, J48, SVM, Naive Bayes, and AdaBoost. The experimental outcomes demonstrated that our proposed method outperformed these other techniques and classifiers. It achieved superior results not only in terms of accuracy but also in precision and recall. This indicates the robustness and reliability of our approach in diverse scenarios.
In future work, we aim to extend the proposed study to more challenging aspects of human posture recognition, particularly those that have received less attention so far. These include crowded areas, occlusion and camouflage, and poorly lit environments. Additionally, we intend to explore various deep learning algorithms, or a fusion of deep learning and machine learning algorithms, within a hybrid framework. We believe that such an approach will enhance the results and provide valuable insights for other researchers working in the same domain.
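The hold-out idea behind blending, discussed in Experiment 10-12, can be sketched in plain Python. This is a hedged illustration, not the Weka configuration used in the paper: the base learners (`NearestCentroid`, `OneNearestNeighbour`), the table-based level-1 learner, the helper names `fit_blend` and `predict_blend`, and the toy data are all hypothetical choices made for the example. Level-0 models are fitted on the training split only, and their predictions on a separate hold-out split become the inputs of the level-1 model; that separation is what limits leakage.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Level-0 learner: assigns the class whose mean vector is closest."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                          for label, rows in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(row, self.centroids[c]))
                for row in X]


class OneNearestNeighbour:
    """Level-0 learner: copies the label of the closest training sample."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: sq_dist(row, self.X[i]))]
                for row in X]


class TableMetaLearner:
    """Level-1 learner: maps each tuple of base predictions to its most common label."""
    def fit(self, meta_X, y):
        votes = defaultdict(Counter)
        for preds, label in zip(meta_X, y):
            votes[tuple(preds)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        return self

    def predict(self, meta_X):
        # Unseen combinations fall back to a majority vote of the base models.
        return [self.table.get(tuple(p), Counter(p).most_common(1)[0][0])
                for p in meta_X]


def fit_blend(base_models, meta_model, X_train, y_train, X_hold, y_hold):
    """Fit level-0 models on the training split, level-1 on hold-out predictions."""
    for model in base_models:
        model.fit(X_train, y_train)
    meta_X = list(zip(*(m.predict(X_hold) for m in base_models)))
    meta_model.fit(meta_X, y_hold)


def predict_blend(base_models, meta_model, X):
    """Stack the level-0 predictions and let the level-1 model decide."""
    meta_X = list(zip(*(m.predict(X) for m in base_models)))
    return meta_model.predict(meta_X)


# Toy two-class data (hypothetical 2-D features, not taken from the paper).
X_train, y_train = [(0, 0), (0, 1), (5, 5), (5, 6)], ["sit", "sit", "stand", "stand"]
X_hold, y_hold = [(1, 0), (6, 5)], ["sit", "stand"]
bases, meta = [NearestCentroid(), OneNearestNeighbour()], TableMetaLearner()
fit_blend(bases, meta, X_train, y_train, X_hold, y_hold)
final = predict_blend(bases, meta, [(0.5, 0.5), (5.5, 5.5)])  # ["sit", "stand"]
```

Because the level-1 learner never sees predictions made on the samples that trained the level-0 learners, its view of how reliable each base model is comes from unseen data only. Any classifier could take the place of the table-based learner used here.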
REFERENCES [22] F. Hajjej, M. Javeed, A. Ksibi, M. Alarfaj, K. Alnowaiser, A. Jalal,
[1] H. J. C. Friedrich Schwandt (CEO), “Statista,” https://www.statista.com/ N. Alsufyani, M. Shorfuzzaman, and J. Park, “Deep human motion
statistics/1251839/surveillance-technology-market-global/, 2023. detection and multi-features analysis for smart healthcare learning tools,”
[2] J. Park, K. Song, and Y.-S. Kim, “A kidnapping detection using human IEEE Access, vol. 10, pp. 116 527–116 539, 2022.
pose estimation in intelligent video surveillance systems,” Journal of the [23] S. P. Godse, S. Singh, S. Khule, V. Yadav, and S. Wakhare,
Korea Society of Computer and Information, vol. 23, pp. 9–16, 2018. “Musculoskeletal physiotherapy using artificial intelligence and machine
[3] M. H. J. Fanchamps, H. L. D. Horemans, G. M. Ribbers, H. J. learning,” International Journal of Innovative Science and Research
Stam, and J. B. J. Bussmann, “The accuracy of the detection of body Technology, vol. 4, no. 11, pp. 592–598, 2019.
postures and movements using a physical activity monitor in people [24] A. Tannoury, E. Choueiri, and R. Darazi, “Human pose estimation
after a stroke,” Sensors, vol. 18, no. 7, 2018. [Online]. Available: for physiotherapy following a car accident using depth-wise separable
https://www.mdpi.com/1424-8220/18/7/2167 convolutional neural networks.” Advances in transportation studies,
[4] B. Qiang, S. Zhang, Y. Zhan, W. Xie, and T. Zhao, “Improved vol. 59, 2023.
convolutional pose machines for human pose estimation using image [25] T. Hellsten, J. Karlsson, M. Shamsuzzaman, and G. Pulkkis, “The
sensor data,” Sensors, vol. 19, no. 3, 2019. [Online]. Available: potential of computer vision-based marker-less human motion analysis
https://www.mdpi.com/1424-8220/19/3/718 for rehabilitation,” Rehabilitation Process and Outcome, vol. 10, p.
[5] J. Han, W. Song, A. Gozho, Y. Sung, S. Ji, L. Song, L. Wen, and Q. Zhang, 11795727211022330, 2021.
“Lora-based smart iot application for smart city: an example of human [26] A. R. Shahzad and A. Jalal, “A smart surveillance system for pedestrian
posture detection,” Wireless Communications and Mobile Computing, vol. tracking and counting using template matching,” in 2021 International
2020, 2020. Conference on Robotics and Automation in Industry (ICRAI). IEEE,
[6] A. Nadeem, A. Jalal, and K. Kim, “Automatic human posture estimation 2021, pp. 1–6.
for sport activity recognition with robust body parts detection and entropy [27] O. F. Arowolo, E. O. Arogunjo, D. G. Owolabi, and E. D. Markus,
markov model,” Multimedia Tools and Applications, vol. 80, pp. 21 465– “Development of a human posture recognition system for surveillance
21 498, 2021. application,” International Journal of Computing and Digital Systems,
[7] A. Lmberis and A. Dittmar, “Advanced wearable health systems and vol. 10, 2021.
applications - research and development efforts in the european union,”
[28] D. P. P. Nagalakshmi Vallabhaneni, “The analysis of the impact of yoga
IEEE Engineering in Medicine and Biology Magazine, vol. 26, no. 3, pp.
on healthcare and conventional strategies for human pose recognition,”
29–33, 2007.
Turkish Journal of Computer and Mathematics Education (TURCOMAT),
[8] B. Boulay, F. Brémond, and M. Thonnat, “Human posture recognition in
vol. 12, no. 6, pp. 1772–1783, 2021.
video sequence,” 2003.
[9] M. A. Mousse, C. Motamed, and E. C. Ezin, “A multi-view human [29] S. Jain, A. Rustagi, S. Saurav, R. Saini, and S. Singh, “Three-dimensional
bounding volume estimation for posture recognition in elderly monitoring cnn-inspired deep learning architecture for yoga pose recognition in the
system,” in ICPR 2016, 2016. real-world environment,” Neural Computing and Applications, vol. 33, pp.
[10] Y.-H. Byeon, J.-Y. Lee, D.-H. Kim, and K.-C. Kwak, “Posture recognition 6427–6441, 2021.
using ensemble deep models under various home environments,” [30] S. Kothari, “Yoga pose classification using deep learning,” 2020.
Applied Sciences, vol. 10, no. 4, 2020. [Online]. Available: https: [31] N. Faujdar, S. Saraswat, and S. Sharma, “Human pose estimation using
//www.mdpi.com/2076-3417/10/4/1287 artificial intelligence with virtual gym tracker,” in 2023 6th International
[11] M. Graczyk, T. Lasota, B. Trawinski, and K. Trawiński, “Comparison of Conference on Information Systems and Computer Networks (ISCON).
bagging, boosting and stacking ensembles applied to real estate appraisal,” IEEE, 2023, pp. 1–5.
in Asian Conference on Intelligent Information and Database Systems, [32] H. Pardeshi, A. Ghaiwat, A. Thongire, K. Gawande, and M. Naik,
2010. “Fitness freaks: A system for detecting definite body posture using
[12] T. G. Dietterich, “An experimental comparison of three methods openpose estimation,” in Futuristic Trends in Networks and Computing
for constructing ensembles of decision trees: Bagging, boosting, and Technologies: Select Proceedings of Fourth International Conference on
randomization,” Mach. Learn., vol. 40, no. 2, p. 139–157, aug 2000. FTNCT 2021. Springer, 2022, pp. 1061–1072.
[Online]. Available: https://doi.org/10.1023/A:1007607513941 [33] A. Iazzi, M. Rziza, R. Oulad Haj Thami, and D. Aboutajdine, “A
[13] J. Abdollahi, B. Nouri-Moghaddam, and M. Ghazanfari, “Deep neural new method for fall detection of elderly based on human shape and
network based ensemble learning algorithms for the healthcare system motion variation,” in Advances in Visual Computing, G. Bebis, R. Boyle,
(diagnosis of chronic diseases),” arXiv preprint arXiv:2103.08182, 2021. B. Parvin, D. Koracin, F. Porikli, S. Skaff, A. Entezari, J. Min, D. Iwai,
[14] H. Faris, R. Abukhurma, W. Almanaseer, M. Saadeh, A. M. Mora, A. Sadagic, C. Scheidegger, and T. Isenberg, Eds. Cham: Springer
P. A. Castillo, and I. Aljarah, “Improving financial bankruptcy prediction International Publishing, 2016, pp. 156–167.
in a highly imbalanced class distribution using oversampling and [34] C. Pramerdorfer, R. Planinc, M. V. Loock, D. Fankhauser, M. Kampel,
ensemble learning: a case from the spanish market,” Progress in Artificial and M. Brandstötter, “Fall detection based on depth-data in practice,” in
Intelligence, vol. 9, pp. 31–53, 2020. European Conference on Computer Vision. Springer, 2016, pp. 195–208.
[15] A. A. Khalil, Z. Liu, A. Salah, A. Fathalla, and A. Ali, “Predicting [35] H.-G. Kang, M. Kang, and J.-G. Lee, “Efficient fall detection based on
insolvency of insurance companies in egyptian market using bagging event pattern matching in image streams,” in 2017 IEEE International
and boosting ensemble techniques,” IEEE Access, vol. 10, pp. 117 304– Conference on Big Data and Smart Computing (BigComp). IEEE, 2017,
117 314, 2022. pp. 51–58.
[16] N. Lower and F. Zhan, “A study of ensemble methods for cyber security,”
[36] V. A. Nguyen, T. H. Le, and T. T. Nguyen, “Single camera based fall
in 2020 10th Annual Computing and Communication Workshop and
detection using motion and human shape features,” in Proceedings of
Conference (CCWC). IEEE, 2020, pp. 1001–1009.
the Seventh Symposium on Information and Communication Technology,
[17] L. Rokach, “Ensemble methods for classifiers,” Data mining and
2016, pp. 339–344.
knowledge discovery handbook, pp. 957–980, 2005.
[18] D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, pp. 241– [37] N. Zerrouki, F. Harrou, A. Houacine, and Y. Sun, “Fall detection using
259, 1992. supervised machine learning algorithms: A comparative study,” in 2016
[19] S. Deroski and B. Ženko, “Is combining classifiers with stacking better 8th international conference on modelling, identification and control
than selecting the best one?” Machine Learning, vol. 54, pp. 255–273, (ICMIC). IEEE, 2016, pp. 665–670.
2004. [38] K. Fan, P. Wang, Y. Hu, and B. Dou, “Fall detection via human posture
[20] N.-S. Pai, P.-X. Chen, P.-Y. Chen, and Z.-W. Wang, “Home fitness and representation and support vector machine,” International journal of
rehabilitation support system implemented by combining deep images and distributed sensor networks, vol. 13, no. 5, p. 1550147717707418, 2017.
machine learning using unity game engine,” Sens. Mater, vol. 34, pp. [39] A. Manzi, F. Cavallo, and P. Dario, “A 3d human posture approach for
1971–1990, 2022. activity recognition based on depth camera,” in European Conference on
[21] V. Muralidharan and V. Vijayalakshmi, “A real-time approach of fall Computer Vision. Springer, 2016, pp. 432–447.
detection and rehabilitation in elders using kinect xbox 360 and supervised [40] H. F. T. Ahmed, H. Ahmad, and C. Aravind, “Device free human
machine learning algorithm,” in Inventive Computation and Information gesture recognition using wi-fi csi: A survey,” Engineering Applications
Technologies: Proceedings of ICICIT 2021. Springer, 2022, pp. 119–138. of Artificial Intelligence, vol. 87, p. 103281, 2020.
12 VOLUME 4, 2016
[41] Y. M. Galvão, J. Ferreira, V. A. Albuquerque, P. Barros, and B. J. [63] W. Min, H. Cui, H. Rao, Z. Li, and L. Yao, “Detection of human falls
Fernandes, “A multimodal approach using deep learning for fall detection,” on furniture using scene analysis based on deep learning and activity
Expert Systems with Applications, vol. 168, p. 114226, 2021. characteristics,” IEEE Access, vol. 6, pp. 9324–9335, 2018.
[42] M. M. Islam, O. Tayan, M. R. Islam, M. S. Islam, S. Nooruddin, M. N. [64] “Spatio-temporal fall event detection in complex scenes using attention
Kabir, and M. R. Islam, “Deep learning based systems developed for fall guided lstm,” Pattern Recognition Letters, vol. 130, pp. 242–249, 2020,
detection: a review,” IEEE Access, vol. 8, pp. 166 117–166 137, 2020. image/Video Understanding and Analysis (IUVA). [Online]. Available:
[43] C. Pramerdorfer, R. Planinc, M. Van Loock, D. Fankhauser, M. Kampel, https://www.sciencedirect.com/science/article/pii/S016786551830504X
and M. Brandstötter, “Fall detection based on depth-data in practice,” in [65] A. Youssfi Alaoui, Y. Tabii, R. Oulad Haj Thami, M. Daoudi, S. Berretti,
Computer Vision – ECCV 2016 Workshops, G. Hua and H. Jégou, Eds. and P. Pala, “Fall detection of elderly people using the manifold of
Cham: Springer International Publishing, 2016, pp. 195–208. positive semidefinite matrices,” Journal of Imaging, vol. 7, no. 7, 2021.
[44] D. H. Hung and H. Saito, “Fall detection with two cameras based on [Online]. Available: https://www.mdpi.com/2313-433X/7/7/109
occupied area,” in Proc. of 18th Japan-Korea Joint Workshop on Frontier [66] M. E. N. Gomes, D. Macêdo, C. Zanchettin, P. S. G. de Mattos-Neto,
in Computer Vision, 2012, pp. 33–39. and A. Oliveira, “Multi-human fall detection and localization in videos,”
Computer Vision and Image Understanding, vol. 220, p. 103442, 2022.
[45] ——, “The estimation of heights and occupied areas of humans from two
[67] M. Salimi, J. J. Machado, and J. M. R. Tavares, “Using deep neural
orthogonal views for fall detection,” IEEJ Transactions on Electronics,
networks for human fall detection based on pose estimation,” Sensors,
Information and Systems, vol. 133, no. 1, pp. 117–127, 2013.
vol. 22, no. 12, p. 4544, 2022.
[46] E. Auvinet, C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau,
[68] X. Wu, Y. Zheng, C.-H. Chu, L. Cheng, and J. Kim, “Applying deep
“Multiple cameras fall dataset,” DIRO-Université de Montréal, Tech. Rep,
learning technology for automatic fall detection using mobile sensors,”
vol. 1350, p. 24, 2010.
Biomedical Signal Processing and Control, vol. 72, p. 103355, 2022.
[47] M. Matilainen, M. Barnard, and O. Silvén, “Unusual activity recognition in [69] M. Morana, G. L. Re, and S. Gaglio, “Kard - kinect activity recognition
noisy environments,” in International Conference on Advanced Concepts dataset,” 2017.
for Intelligent Vision Systems. Springer, 2009, pp. 389–399. [70] L. Martínez-Villaseñor, H. Ponce, J. Brieva, E. Moya-Albor, J. Núñez-
[48] H.-G. Kang, M. Kang, and J.-G. Lee, “Efficient fall detection based on Martínez, and C. Peñafort-Asturiano, “Up-fall detection dataset: A
event pattern matching in image streams,” in 2017 IEEE International multimodal approach,” Sensors, vol. 19, no. 9, 2019. [Online]. Available:
Conference on Big Data and Smart Computing (BigComp), 2017, pp. 51– https://www.mdpi.com/1424-8220/19/9/1988
58. [71] S. Ali, R. Khan, A. Mahmood, M. Hassan, and a. Jeon, “Using temporal
[49] W. M. S. Abedi, D. Ibraheem Nadher, and A. T. Sadiq, “Modified deep covariance of motion and geometric features via boosting for human fall
learning method for body postures recognition,” International Journal of detection,” Sensors, vol. 18, p. 1918, 06 2018.
Advanced Science and Technology, vol. 29, pp. 3830–3841, 2020. [72] G. Goyal, N. Noceti, and F. Odone, “Cross-view action recognition with
[50] S. Liaqat, K. Dashtipour, K. Arshad, K. Assaleh, and N. Ramzan, “A small-scale datasets,” Image and Vision Computing, vol. 120, p. 104403,
hybrid posture detection framework: Integrating machine learning and 2022. [Online]. Available: https://www.sciencedirect.com/science/article/
deep neural networks,” IEEE Sensors Journal, vol. 21, no. 7, pp. 9515– pii/S0262885622000324
9522, 2021. [73] E. Alam, A. Sufian, P. Dutta, and M. Leo, “Vision-based human
[51] W. Ren, O. Ma, H. Ji, and X. Liu, “Human posture recognition using a fall detection systems using deep learning: A review,” Computers in
hybrid of fuzzy logic and machine learning approaches,” IEEE Access, Biology and Medicine, vol. 146, p. 105626, 2022. [Online]. Available:
vol. 8, pp. 135 628–135 639, 2020. https://www.sciencedirect.com/science/article/pii/S0010482522004188
[52] I. Noreen, M. Hamid, U. Akram, S. Malik, and M. Saleem, “Hand pose [74] S. Aftab, S. F. Ali, A. Mahmood, and U. Suleman, “A boosting framework
recognition using parallel multi stream cnn,” Sensors, vol. 21, no. 24, for human posture recognition using spatio-temporal features along with
2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/24/8469 radon transform,” Multimedia Tools and Applications, pp. 1–27, 2022.
[53] A. Iazzi, M. Rziza, and R. O. H. Thami, “Fall detection based on [75] A. S. B. Reddy and D. S. Juliet, “Transfer learning with resnet-50 for
posture analysis and support vector machine,” in 2018 4th International malaria cell-image classification,” in 2019 International Conference on
Conference on Advanced Technologies for Signal and Image Processing Communication and Signal Processing (ICCSP), 2019, pp. 0945–0949.
(ATSIP), 2018, pp. 1–6. [76] B. Li and D. Lima, “Facial expression recognition via resnet-50,”
International Journal of Cognitive Computing in Engineering, vol. 2, pp.
[54] ——, “Efficient fall activity recognition by combining shape and motion
57–64, 2021. [Online]. Available: https://www.sciencedirect.com/science/
features,” Computational Visual Media, vol. 6, no. 3, pp. 247–263, 2020.
article/pii/S2666307421000073
[55] A. Iazzi, M. Rziza, and R. Oulad Haj Thami, “Fall detection system-based
[77] P. Ghosal, L. Nandanwar, S. Kanchan, A. Bhadra, J. Chakraborty, and
posture-recognition for indoor environments,” Journal of Imaging, vol. 7,
D. Nandi, “Brain tumor classification using resnet-101 based squeeze and
no. 3, 2021. [Online]. Available: https://www.mdpi.com/2313-433X/7/3/
excitation deep neural network,” in 2019 Second International Conference
42
on Advanced Computational and Communication Paradigms (ICACCP),
[56] C. Ge, I. Y.-H. Gu, and J. Yang, “Human fall detection using segment- 2019, pp. 1–6.
level cnn features and sparse dictionary learning,” in 2017 IEEE 27th [78] K. Simonyan and A. Zisserman, “Very deep convolutional networks
International Workshop on Machine Learning for Signal Processing for large-scale image recognition,” 2014. [Online]. Available: https:
(MLSP), 2017, pp. 1–6. //arxiv.org/abs/1409.1556
[57] S. F. Ali, R. Khan, A. Mahmood, M. T. Hassan, and M. Jeon, “Using
temporal covariance of motion and geometric features via boosting for
human fall detection,” Sensors, vol. 18, no. 6, 2018. [Online]. Available:
https://www.mdpi.com/1424-8220/18/6/1918
[58] M. Mousse, IET Conference Proceedings, pp. 2 (6 .)–2 (6 .)(1),
January 2016. [Online]. Available: https://digital-library.theiet.org/
content/conferences/10.1049/ic.2016.0026
[59] K. Zhou, Y. Zhu, and Y. Zhao, “A spatio-temporal deep architecture for
surveillance event detection based on convlstm,” in 2017 IEEE Visual
Communications and Image Processing (VCIP), 2017, pp. 1–4.
[60] K. Fan, P. Wang, and S. Zhuang, “Human fall detection using slow feature
analysis,” Multimedia Tools Appl., vol. 78, no. 7, p. 9101–9128, apr 2019.
[Online]. Available: https://doi.org/10.1007/s11042-018-5638-9
[61] J. Liu, N. Akhtar, and A. Mian, “Learning human pose models from
synthesized data for robust rgb-d action recognition,” 2017. [Online].
Available: https://arxiv.org/abs/1707.00823
[62] D. Lahiri, C. Dhiman, and D. K. Vishwakarma, “Abnormal human
action recognition using average energy images,” in 2017 Conference on
Information and Communication Technology (CICT), 2017, pp. 1–5.
AMER HAMZA AAMIR BUTT is a graduate of software engineering. He completed his bachelor's in software engineering at the University of Management and Technology, Lahore, Pakistan, where he had the honor of being rewarded with the Rector's Merit Award. Amer is currently working as a Mendix specialist with a mission to bridge the gap between the low-code and AI fields.

SYED FAROOQ ALI did his PhD (CS) from UMT, Pakistan. He did his Ph.D. course work, Ph.D. comprehensive exam, and MS (CS) at Ohio State University, Columbus, USA. He also completed his MS (CS) from LUMS, Lahore, Pakistan, with Dean's Honor List. During his MS, he was on a LUMS fellowship. He is currently working as an Assistant Professor at UMT. His research interests include computer vision, digital image processing, and medical imaging. He is a reviewer for various IEEE conferences and journals.

AFIFA HAMEED is a faculty member in the Software Engineering Department at the University of Central Punjab (UCP). She has been involved in teaching and research for the last six years. She won the most prestigious award, the Rector Award, during her studies. She received her BS from Government College University, Lahore, and later completed her MS from Kinnaird College University, Lahore. Her research interests include computer vision, image processing, and artificial intelligence.

AAIMA is a graduate student at the University of Management and Technology. She won the Rector's Merit Award during her studies, which is the most prestigious award at the university. Apart from that, she is currently working as a manager in the digital marketing sector in Pakistan. Her areas of interest include artificial intelligence, knowledge management, and project management.

AHMED HASNAIN MIRZA is a software engineer and a researcher, focusing on Artificial Intelligence and Machine Learning. He earned his degree in Software Engineering from the University of Management and Technology, where he was honored with the Rector's Merit Award. Currently, Hasnain is employed as a Software Engineer and is preparing to embark on a Master's degree program in Applied Computing at the University of Windsor, Canada.

MUHAMMAD BILAL is an educator, researcher, and maker. His research interests include Digital Image/Signal Processing, Machine Learning/AI, Digital/Analog circuit design, Embedded systems, and Robotics. His research work has been extensively published. He is an Associate Professor in the Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia. Prior to joining KAU in 2014, he worked as a post-doctoral researcher at KAIST, South Korea.

MUHAMMAD SHEHZAD HANIF received the B.Sc. degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 2001, and the M.S. degree in engineering sciences and the Ph.D. degree in computer engineering from Sorbonne University, Paris, France, in 2006 and 2009, respectively. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, King Abdulaziz University, Jeddah, Saudi Arabia. His research interests include machine learning, image analysis, and information fusion.