MultiStream Deep CNN
ABSTRACT
Human posture recognition has been a focal point of research for over a decade. It is primarily used
for the remote detection of human postures, such as sitting, standing, lying, and walking, particularly in the
context of elderly care. While sensor-based, vision-based, and feature-based solutions exist, the advent
of affordable processing power has shifted researchers' interest towards deep learning-based approaches,
which have shown promising results in posture recognition. This study introduces a multi-stream
deep convolutional neural network for this purpose. We utilize pre-trained deep learning architectures,
namely ResNet-50, ResNet-101, and VGG-16, to extract features. Principal Component Analysis (PCA) is
applied for dimensionality reduction, thereby lowering computational cost. Furthermore, ensemble
machine learning classifiers are used to achieve high accuracy on the deep features obtained. Among the
ensemble methods evaluated, blending proved the most effective. We compared our proposed
approach with existing state-of-the-art methods on five publicly available datasets:
KARD, MCF, NUCLA, URFD, and UP Fall. The results indicate that our approach surpasses existing
methods in terms of accuracy, precision, and recall, demonstrating its effectiveness.
INDEX TERMS
Human Posture Recognition, Ensemble Classification, Convolutional Neural Network, Deep Learning
VOLUME 4, 2016 1
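The pipeline summarized in the abstract, multi-stream deep feature extraction followed by PCA, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: random arrays stand in for the pooled ResNet-50, ResNet-101, and VGG-16 features, and the stream dimensions (2048, 2048, and 4096) as well as the 32 retained components are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-ins for per-sample deep features from the three backbones;
# in the real system these would come from the pre-trained networks.
rng = np.random.default_rng(0)
n_samples = 40
resnet50_feats = rng.normal(size=(n_samples, 2048))   # assumed ResNet-50 pooled features
resnet101_feats = rng.normal(size=(n_samples, 2048))  # assumed ResNet-101 pooled features
vgg16_feats = rng.normal(size=(n_samples, 4096))      # assumed VGG-16 fc-layer features

# Multi-stream fusion: concatenate the three feature streams per sample.
fused = np.concatenate([resnet50_feats, resnet101_feats, vgg16_feats], axis=1)

# PCA reduces the fused dimensionality, cutting the cost of classifier training.
pca = PCA(n_components=32)  # 32 components is an illustrative choice
reduced = pca.fit_transform(fused)

print(fused.shape)    # (40, 8192)
print(reduced.shape)  # (40, 32)
```

The reduced feature matrix would then be handed to the ensemble classifiers described later in the paper.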
TABLE 1: Applications exploiting human posture recognition
TABLE 2: Optimized hyperparameters for the backbone deep networks of the proposed approach. AF* = Activation Function

Model      Hyperparameter       Datasets
---------  -------------------  -----------------------------------------
ResNet-50  Epochs               50      10      32      30      100
           Batch Size           32      32      100     32      32
           Optimizer            adam    adam    SGD     adam    adam
           Hidden Layer (AF*)   relu    relu    relu    relu    relu
           Dense Layer (AF*)    softmax softmax softmax softmax softmax
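The hidden and dense rows of Table 2 describe a classification head with a ReLU hidden layer followed by a softmax output layer. As a hedged illustration of what those activation choices compute, the numpy sketch below applies both to a batch of feature vectors; the layer sizes and the five-class output are invented for the example, not taken from the paper.

```python
import numpy as np

# Illustrative head: ReLU hidden layer, then dense softmax layer (per Table 2).
rng = np.random.default_rng(1)
features = rng.normal(size=(4, 32))              # 4 samples of 32-dim deep features (assumed sizes)
W1, b1 = rng.normal(size=(32, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 5)), np.zeros(5)   # 5 posture classes, illustrative

hidden = np.maximum(features @ W1 + b1, 0.0)     # ReLU hidden layer
logits = hidden @ W2 + b2
# Numerically stable softmax over the class axis.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)        # each row is a class distribution

print(probs.shape)        # (4, 5)
print(probs.sum(axis=1))  # each row sums to 1
```

In the actual networks these layers sit on top of the pre-trained convolutional backbones and are trained with the optimizers and epoch counts listed in the table.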
FIGURE 8: Performance comparison of PA with other existing approaches using a) NUCLA, b) URFD, c) UP Fall, and d) KARD datasets
datasets exceeding all other approaches. On NUCLA, the accuracy of PA was 19.6, 11.2, 6.6, and 9.1 times higher than that of Liu, Dhiman, Aftab, and Fang, respectively, while PA was 4.42, 0.48, and 0.85 times better than Aftab, Dhiman, and Ahad, respectively, on the KARD dataset.

C. EXPERIMENT 2-5: COMPARING THE ACCURACY OF PA WITH CUTTING-EDGE DEEP ARCHITECTURES
PA outperformed other advanced deep learning architectures, including ResNet-101, ResNet-50, and VGG-16, in terms of accuracy. This was observed across multiple datasets, namely URFD, NUCLA, KARD, UP Fall, and MCF, as illustrated in Figure 9. Notably, ResNet-101 showed the weakest performance on the URFD dataset. A possible explanation is the comparatively low quality of the URFD samples, recorded at 30 frames per second, which may not provide enough detail for the ResNet-101 architecture to classify the data accurately.

FIGURE 10: Performance comparison using URFD Dataset
D. EXPERIMENT 6-9: PERFORMANCE COMPARISON OF PA WITH ITS VARIANTS
After replacing the classifier of PA with AdaBoost, J48, Decision Table, Random Forest, SVM, and Naive Bayes, we compared the performance of PA in terms of accuracy, precision, and recall. As shown in Figures 10, 11, and 12, the proposed approach outperforms existing state-of-the-art techniques on the URFD, NUCLA, and KARD datasets. Moreover, Figure 13 demonstrates that the proposed approach obtained good results on the UP Fall Front dataset. Similarly, Figure 14 shows that the proposed approach achieved better accuracy, precision, and recall on the MCF dataset with both 2 and 3 categories. The most significant decline in PA's performance was noted on the MCF dataset. This dataset is uni-modal, meaning it considers only a single visual modality. Furthermore, the dataset's small size and other constraints, such as the relatively low frame rate of 30 fps used to generate the samples, could have contributed to this degradation. Table 3 summarizes the performance impact when the proposed blending classifier is replaced with other classifiers.
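Blending, the classifier that Table 3 compares against AdaBoost, J48, Decision Table, Random Forest, SVM, and Naive Bayes, fits base learners on one data split and then fits a meta-learner on their predictions for a held-out split. The sketch below shows the general technique on synthetic data; the base and meta model choices are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic 3-class problem standing in for the deep posture features.
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

# Base learners are trained only on the training split (illustrative choices).
base_models = [RandomForestClassifier(random_state=0),
               SVC(probability=True, random_state=0),
               GaussianNB()]
for m in base_models:
    m.fit(X_train, y_train)

# Meta-features: concatenated class probabilities from each base model
# on the held-out split, which the meta-learner (blender) is fit on.
meta_X = np.hstack([m.predict_proba(X_hold) for m in base_models])
meta = LogisticRegression(max_iter=1000).fit(meta_X, y_hold)

acc = meta.score(meta_X, y_hold)
print(meta_X.shape)  # (150, 9): 3 base models x 3 class probabilities
```

In a fuller evaluation the blend would of course be scored on a third, untouched test split rather than on the hold-out data it was fit on.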
FIGURE 15: Accuracy performance of the Proposed Approach compared with boosting and bagging

FIGURE 17: Recall performance of the Proposed Approach compared with boosting and bagging
REFERENCES
[1] H. J. C. Friedrich Schwandt (CEO), “Statista,” https://www.statista.com/statistics/1251839/surveillance-technology-market-global/, 2023.
[2] J. Park, K. Song, and Y.-S. Kim, “A kidnapping detection using human pose estimation in intelligent video surveillance systems,” Journal of the Korea Society of Computer and Information, vol. 23, pp. 9–16, 2018.
[3] M. H. J. Fanchamps, H. L. D. Horemans, G. M. Ribbers, H. J. Stam, and J. B. J. Bussmann, “The accuracy of the detection of body postures and movements using a physical activity monitor in people after a stroke,” Sensors, vol. 18, no. 7, 2018. [Online]. Available: https://www.mdpi.com/1424-8220/18/7/2167
[4] B. Qiang, S. Zhang, Y. Zhan, W. Xie, and T. Zhao, “Improved convolutional pose machines for human pose estimation using image sensor data,” Sensors, vol. 19, no. 3, 2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/3/718
[5] J. Han, W. Song, A. Gozho, Y. Sung, S. Ji, L. Song, L. Wen, and Q. Zhang, “Lora-based smart iot application for smart city: an example of human posture detection,” Wireless Communications and Mobile Computing, vol. 2020, 2020.
[6] A. Nadeem, A. Jalal, and K. Kim, “Automatic human posture estimation for sport activity recognition with robust body parts detection and entropy markov model,” Multimedia Tools and Applications, vol. 80, pp. 21465–21498, 2021.
[7] A. Lymberis and A. Dittmar, “Advanced wearable health systems and applications - research and development efforts in the european union,” IEEE Engineering in Medicine and Biology Magazine, vol. 26, no. 3, pp. 29–33, 2007.
[8] B. Boulay, F. Brémond, and M. Thonnat, “Human posture recognition in video sequence,” 2003.
[9] M. A. Mousse, C. Motamed, and E. C. Ezin, “A multi-view human bounding volume estimation for posture recognition in elderly monitoring system,” in ICPR 2016, 2016.
[10] Y.-H. Byeon, J.-Y. Lee, D.-H. Kim, and K.-C. Kwak, “Posture recognition using ensemble deep models under various home environments,” Applied Sciences, vol. 10, no. 4, 2020. [Online]. Available: https://www.mdpi.com/2076-3417/10/4/1287
[11] M. Graczyk, T. Lasota, B. Trawinski, and K. Trawiński, “Comparison of bagging, boosting and stacking ensembles applied to real estate appraisal,” in Asian Conference on Intelligent Information and Database Systems, 2010.
[12] T. G. Dietterich, “An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization,” Mach. Learn., vol. 40, no. 2, pp. 139–157, Aug. 2000. [Online]. Available: https://doi.org/10.1023/A:1007607513941
[13] J. Abdollahi, B. Nouri-Moghaddam, and M. Ghazanfari, “Deep neural network based ensemble learning algorithms for the healthcare system (diagnosis of chronic diseases),” arXiv preprint arXiv:2103.08182, 2021.
[14] H. Faris, R. Abukhurma, W. Almanaseer, M. Saadeh, A. M. Mora, P. A. Castillo, and I. Aljarah, “Improving financial bankruptcy prediction in a highly imbalanced class distribution using oversampling and ensemble learning: a case from the spanish market,” Progress in Artificial Intelligence, vol. 9, pp. 31–53, 2020.
[15] A. A. Khalil, Z. Liu, A. Salah, A. Fathalla, and A. Ali, “Predicting insolvency of insurance companies in egyptian market using bagging and boosting ensemble techniques,” IEEE Access, vol. 10, pp. 117304–117314, 2022.
[16] N. Lower and F. Zhan, “A study of ensemble methods for cyber security,” in 2020 10th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, 2020, pp. 1001–1009.
[17] L. Rokach, “Ensemble methods for classifiers,” Data Mining and Knowledge Discovery Handbook, pp. 957–980, 2005.
[18] D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, pp. 241–259, 1992.
[19] S. Džeroski and B. Ženko, “Is combining classifiers with stacking better than selecting the best one?” Machine Learning, vol. 54, pp. 255–273, 2004.
[20] N.-S. Pai, P.-X. Chen, P.-Y. Chen, and Z.-W. Wang, “Home fitness and rehabilitation support system implemented by combining deep images and machine learning using unity game engine,” Sens. Mater., vol. 34, pp. 1971–1990, 2022.
[21] V. Muralidharan and V. Vijayalakshmi, “A real-time approach of fall detection and rehabilitation in elders using kinect xbox 360 and supervised machine learning algorithm,” in Inventive Computation and Information Technologies: Proceedings of ICICIT 2021. Springer, 2022, pp. 119–138.
[22] F. Hajjej, M. Javeed, A. Ksibi, M. Alarfaj, K. Alnowaiser, A. Jalal, N. Alsufyani, M. Shorfuzzaman, and J. Park, “Deep human motion detection and multi-features analysis for smart healthcare learning tools,” IEEE Access, vol. 10, pp. 116527–116539, 2022.
[23] S. P. Godse, S. Singh, S. Khule, V. Yadav, and S. Wakhare, “Musculoskeletal physiotherapy using artificial intelligence and machine learning,” International Journal of Innovative Science and Research Technology, vol. 4, no. 11, pp. 592–598, 2019.
[24] A. Tannoury, E. Choueiri, and R. Darazi, “Human pose estimation for physiotherapy following a car accident using depth-wise separable convolutional neural networks,” Advances in Transportation Studies, vol. 59, 2023.
[25] T. Hellsten, J. Karlsson, M. Shamsuzzaman, and G. Pulkkis, “The potential of computer vision-based marker-less human motion analysis for rehabilitation,” Rehabilitation Process and Outcome, vol. 10, p. 11795727211022330, 2021.
[26] A. R. Shahzad and A. Jalal, “A smart surveillance system for pedestrian tracking and counting using template matching,” in 2021 International Conference on Robotics and Automation in Industry (ICRAI). IEEE, 2021, pp. 1–6.
[27] O. F. Arowolo, E. O. Arogunjo, D. G. Owolabi, and E. D. Markus, “Development of a human posture recognition system for surveillance application,” International Journal of Computing and Digital Systems, vol. 10, 2021.
[28] D. P. P. Nagalakshmi Vallabhaneni, “The analysis of the impact of yoga on healthcare and conventional strategies for human pose recognition,” Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 12, no. 6, pp. 1772–1783, 2021.
[29] S. Jain, A. Rustagi, S. Saurav, R. Saini, and S. Singh, “Three-dimensional cnn-inspired deep learning architecture for yoga pose recognition in the real-world environment,” Neural Computing and Applications, vol. 33, pp. 6427–6441, 2021.
[30] S. Kothari, “Yoga pose classification using deep learning,” 2020.
[31] N. Faujdar, S. Saraswat, and S. Sharma, “Human pose estimation using artificial intelligence with virtual gym tracker,” in 2023 6th International Conference on Information Systems and Computer Networks (ISCON). IEEE, 2023, pp. 1–5.
[32] H. Pardeshi, A. Ghaiwat, A. Thongire, K. Gawande, and M. Naik, “Fitness freaks: A system for detecting definite body posture using openpose estimation,” in Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021. Springer, 2022, pp. 1061–1072.
[33] A. Iazzi, M. Rziza, R. Oulad Haj Thami, and D. Aboutajdine, “A new method for fall detection of elderly based on human shape and motion variation,” in Advances in Visual Computing, G. Bebis, R. Boyle, B. Parvin, D. Koracin, F. Porikli, S. Skaff, A. Entezari, J. Min, D. Iwai, A. Sadagic, C. Scheidegger, and T. Isenberg, Eds. Cham: Springer International Publishing, 2016, pp. 156–167.
[34] C. Pramerdorfer, R. Planinc, M. V. Loock, D. Fankhauser, M. Kampel, and M. Brandstötter, “Fall detection based on depth-data in practice,” in European Conference on Computer Vision. Springer, 2016, pp. 195–208.
[35] H.-G. Kang, M. Kang, and J.-G. Lee, “Efficient fall detection based on event pattern matching in image streams,” in 2017 IEEE International Conference on Big Data and Smart Computing (BigComp). IEEE, 2017, pp. 51–58.
[36] V. A. Nguyen, T. H. Le, and T. T. Nguyen, “Single camera based fall detection using motion and human shape features,” in Proceedings of the Seventh Symposium on Information and Communication Technology, 2016, pp. 339–344.
[37] N. Zerrouki, F. Harrou, A. Houacine, and Y. Sun, “Fall detection using supervised machine learning algorithms: A comparative study,” in 2016 8th International Conference on Modelling, Identification and Control (ICMIC). IEEE, 2016, pp. 665–670.
[38] K. Fan, P. Wang, Y. Hu, and B. Dou, “Fall detection via human posture representation and support vector machine,” International Journal of Distributed Sensor Networks, vol. 13, no. 5, p. 1550147717707418, 2017.
[39] A. Manzi, F. Cavallo, and P. Dario, “A 3d human posture approach for activity recognition based on depth camera,” in European Conference on Computer Vision. Springer, 2016, pp. 432–447.
[40] H. F. T. Ahmed, H. Ahmad, and C. Aravind, “Device free human gesture recognition using wi-fi csi: A survey,” Engineering Applications of Artificial Intelligence, vol. 87, p. 103281, 2020.
[41] Y. M. Galvão, J. Ferreira, V. A. Albuquerque, P. Barros, and B. J. Fernandes, “A multimodal approach using deep learning for fall detection,” Expert Systems with Applications, vol. 168, p. 114226, 2021.
[42] M. M. Islam, O. Tayan, M. R. Islam, M. S. Islam, S. Nooruddin, M. N. Kabir, and M. R. Islam, “Deep learning based systems developed for fall detection: a review,” IEEE Access, vol. 8, pp. 166117–166137, 2020.
[43] C. Pramerdorfer, R. Planinc, M. Van Loock, D. Fankhauser, M. Kampel, and M. Brandstötter, “Fall detection based on depth-data in practice,” in Computer Vision – ECCV 2016 Workshops, G. Hua and H. Jégou, Eds. Cham: Springer International Publishing, 2016, pp. 195–208.
[44] D. H. Hung and H. Saito, “Fall detection with two cameras based on occupied area,” in Proc. of 18th Japan-Korea Joint Workshop on Frontier in Computer Vision, 2012, pp. 33–39.
[45] ——, “The estimation of heights and occupied areas of humans from two orthogonal views for fall detection,” IEEJ Transactions on Electronics, Information and Systems, vol. 133, no. 1, pp. 117–127, 2013.
[46] E. Auvinet, C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, “Multiple cameras fall dataset,” DIRO-Université de Montréal, Tech. Rep., vol. 1350, p. 24, 2010.
[47] M. Matilainen, M. Barnard, and O. Silvén, “Unusual activity recognition in noisy environments,” in International Conference on Advanced Concepts for Intelligent Vision Systems. Springer, 2009, pp. 389–399.
[48] H.-G. Kang, M. Kang, and J.-G. Lee, “Efficient fall detection based on event pattern matching in image streams,” in 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), 2017, pp. 51–58.
[49] W. M. S. Abedi, D. Ibraheem Nadher, and A. T. Sadiq, “Modified deep learning method for body postures recognition,” International Journal of Advanced Science and Technology, vol. 29, pp. 3830–3841, 2020.
[50] S. Liaqat, K. Dashtipour, K. Arshad, K. Assaleh, and N. Ramzan, “A hybrid posture detection framework: Integrating machine learning and deep neural networks,” IEEE Sensors Journal, vol. 21, no. 7, pp. 9515–9522, 2021.
[51] W. Ren, O. Ma, H. Ji, and X. Liu, “Human posture recognition using a hybrid of fuzzy logic and machine learning approaches,” IEEE Access, vol. 8, pp. 135628–135639, 2020.
[52] I. Noreen, M. Hamid, U. Akram, S. Malik, and M. Saleem, “Hand pose recognition using parallel multi stream cnn,” Sensors, vol. 21, no. 24, 2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/24/8469
[53] A. Iazzi, M. Rziza, and R. O. H. Thami, “Fall detection based on posture analysis and support vector machine,” in 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2018, pp. 1–6.
[54] ——, “Efficient fall activity recognition by combining shape and motion features,” Computational Visual Media, vol. 6, no. 3, pp. 247–263, 2020.
[55] A. Iazzi, M. Rziza, and R. Oulad Haj Thami, “Fall detection system-based posture-recognition for indoor environments,” Journal of Imaging, vol. 7, no. 3, 2021. [Online]. Available: https://www.mdpi.com/2313-433X/7/3/42
[56] C. Ge, I. Y.-H. Gu, and J. Yang, “Human fall detection using segment-level cnn features and sparse dictionary learning,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, pp. 1–6.
[57] S. F. Ali, R. Khan, A. Mahmood, M. T. Hassan, and M. Jeon, “Using temporal covariance of motion and geometric features via boosting for human fall detection,” Sensors, vol. 18, no. 6, 2018. [Online]. Available: https://www.mdpi.com/1424-8220/18/6/1918
[58] M. Mousse, IET Conference Proceedings, pp. 2 (6 .)–2 (6 .)(1), January 2016. [Online]. Available: https://digital-library.theiet.org/content/conferences/10.1049/ic.2016.0026
[59] K. Zhou, Y. Zhu, and Y. Zhao, “A spatio-temporal deep architecture for surveillance event detection based on convlstm,” in 2017 IEEE Visual Communications and Image Processing (VCIP), 2017, pp. 1–4.
[60] K. Fan, P. Wang, and S. Zhuang, “Human fall detection using slow feature analysis,” Multimedia Tools Appl., vol. 78, no. 7, pp. 9101–9128, Apr. 2019. [Online]. Available: https://doi.org/10.1007/s11042-018-5638-9
[61] J. Liu, N. Akhtar, and A. Mian, “Learning human pose models from synthesized data for robust rgb-d action recognition,” 2017. [Online]. Available: https://arxiv.org/abs/1707.00823
[62] D. Lahiri, C. Dhiman, and D. K. Vishwakarma, “Abnormal human action recognition using average energy images,” in 2017 Conference on Information and Communication Technology (CICT), 2017, pp. 1–5.
[63] W. Min, H. Cui, H. Rao, Z. Li, and L. Yao, “Detection of human falls on furniture using scene analysis based on deep learning and activity characteristics,” IEEE Access, vol. 6, pp. 9324–9335, 2018.
[64] “Spatio-temporal fall event detection in complex scenes using attention guided lstm,” Pattern Recognition Letters, vol. 130, pp. 242–249, 2020, Image/Video Understanding and Analysis (IUVA). [Online]. Available: https://www.sciencedirect.com/science/article/pii/S016786551830504X
[65] A. Youssfi Alaoui, Y. Tabii, R. Oulad Haj Thami, M. Daoudi, S. Berretti, and P. Pala, “Fall detection of elderly people using the manifold of positive semidefinite matrices,” Journal of Imaging, vol. 7, no. 7, 2021. [Online]. Available: https://www.mdpi.com/2313-433X/7/7/109
[66] M. E. N. Gomes, D. Macêdo, C. Zanchettin, P. S. G. de Mattos-Neto, and A. Oliveira, “Multi-human fall detection and localization in videos,” Computer Vision and Image Understanding, vol. 220, p. 103442, 2022.
[67] M. Salimi, J. J. Machado, and J. M. R. Tavares, “Using deep neural networks for human fall detection based on pose estimation,” Sensors, vol. 22, no. 12, p. 4544, 2022.
[68] X. Wu, Y. Zheng, C.-H. Chu, L. Cheng, and J. Kim, “Applying deep learning technology for automatic fall detection using mobile sensors,” Biomedical Signal Processing and Control, vol. 72, p. 103355, 2022.
[69] M. Morana, G. L. Re, and S. Gaglio, “Kard - kinect activity recognition dataset,” 2017.
[70] L. Martínez-Villaseñor, H. Ponce, J. Brieva, E. Moya-Albor, J. Núñez-Martínez, and C. Peñafort-Asturiano, “Up-fall detection dataset: A multimodal approach,” Sensors, vol. 19, no. 9, 2019. [Online]. Available: https://www.mdpi.com/1424-8220/19/9/1988
[71] S. Ali, R. Khan, A. Mahmood, M. Hassan, and M. Jeon, “Using temporal covariance of motion and geometric features via boosting for human fall detection,” Sensors, vol. 18, p. 1918, Jun. 2018.
[72] G. Goyal, N. Noceti, and F. Odone, “Cross-view action recognition with small-scale datasets,” Image and Vision Computing, vol. 120, p. 104403, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0262885622000324
[73] E. Alam, A. Sufian, P. Dutta, and M. Leo, “Vision-based human fall detection systems using deep learning: A review,” Computers in Biology and Medicine, vol. 146, p. 105626, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0010482522004188
[74] S. Aftab, S. F. Ali, A. Mahmood, and U. Suleman, “A boosting framework for human posture recognition using spatio-temporal features along with radon transform,” Multimedia Tools and Applications, pp. 1–27, 2022.
[75] A. S. B. Reddy and D. S. Juliet, “Transfer learning with resnet-50 for malaria cell-image classification,” in 2019 International Conference on Communication and Signal Processing (ICCSP), 2019, pp. 0945–0949.
[76] B. Li and D. Lima, “Facial expression recognition via resnet-50,” International Journal of Cognitive Computing in Engineering, vol. 2, pp. 57–64, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2666307421000073
[77] P. Ghosal, L. Nandanwar, S. Kanchan, A. Bhadra, J. Chakraborty, and D. Nandi, “Brain tumor classification using resnet-101 based squeeze and excitation deep neural network,” in 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019, pp. 1–6.
[78] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014. [Online]. Available: https://arxiv.org/abs/1409.1556
VOLUME 4, 2016 13
AMER HAMZA AAMIR BUTT is a graduate of software engineering. He completed his bachelor's in software engineering at the University of Management and Technology, Lahore, Pakistan, where he had the honor of receiving the Rector's Merit Award. Amer is currently working as a Mendix specialist with a mission to bridge the gap between the low-code and AI fields.

AHMED HASNAIN MIRZA is a software engineer and researcher focusing on Artificial Intelligence and Machine Learning. He earned his degree in Software Engineering from the University of Management and Technology, where he was honored with the Rector's Merit Award. Currently, Hasnain is employed as a Software Engineer and is preparing to embark on a Master's degree program in Applied Computing at the University of Windsor, Canada.