Sultan Abylkairov
School of Sciences and Humanities
Mathematics
sultan.abylkairov@nu.edu.kz
During data representation, each video is converted into a sequence of STIP feature vectors. To improve action recognition, segmentation techniques are used to divide videos into segments or shots, which helps the model identify action sequences more effectively.
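As a rough illustration of this representation step, the sketch below reduces a grayscale video to a short sequence of local descriptors. It is a toy stand-in, not Laptev's actual Harris3D/STIP detector [8], which is considerably more involved: here the spatio-temporal response is simply temporal change times spatial gradient magnitude, and the function name, scoring rule, and patch descriptor are our own simplifications.

```python
import numpy as np

def video_to_stip_sequence(video, top_k=20, patch=2):
    """Toy stand-in for STIP extraction: score each pixel of each frame
    by temporal change times spatial gradient magnitude, then describe
    the top_k strongest locations by their flattened local patches."""
    v = video.astype(np.float64)               # (T, H, W) grayscale frames
    dt = np.abs(np.diff(v, axis=0))            # temporal change, (T-1, H, W)
    gy = np.gradient(v[1:], axis=1)            # spatial gradients on the
    gx = np.gradient(v[1:], axis=2)            # same T-1 frames
    score = dt * np.hypot(gx, gy)              # joint spatio-temporal response
    # Suppress borders so every selected point has a full patch around it.
    score[:, :patch, :] = 0
    score[:, -patch:, :] = 0
    score[:, :, :patch] = 0
    score[:, :, -patch:] = 0
    # Keep the top_k responses; each descriptor is the flattened patch.
    idx = np.argsort(score, axis=None)[-top_k:]
    ts, ys, xs = np.unravel_index(idx, score.shape)
    descs = [v[t + 1, y - patch:y + patch + 1, x - patch:x + patch + 1].ravel()
             for t, y, x in zip(ts, ys, xs)]
    return np.asarray(descs)                   # (top_k, (2*patch+1) ** 2)
```

The output is one fixed-size descriptor per interest point, so each video becomes a variable-length sequence of such vectors, matching the representation described above.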
3.3. Main approach
In this step we choose the Support Vector Machine (SVM) [15] as our classification model to differentiate between the ten actions. SVM is a widely used algorithm in machine learning and computer vision for video-based action recognition, and selecting an appropriate SVM configuration is key to strong performance in action recognition tasks.
We train our model using the SVM classifier on STIP features extracted from video frames. The training process involves refining the SVM model by experimenting with regularization settings to enhance its accuracy and overall performance.
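A minimal sketch of this training step, assuming the per-video STIP descriptors have already been aggregated into one fixed-length vector per video (the random features below are synthetic stand-ins; a real system might use bag-of-words pooling). The regularization strength C and the kernel are tuned by grid search; scikit-learn is our assumption, as the report does not name its tooling.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for per-video feature vectors: ten balanced action
# classes, each class mean shifted so the problem is learnable.
rng = np.random.default_rng(42)
n_classes, per_class, n_features = 10, 10, 32
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(y.size, n_features)) + 0.5 * y[:, None]

# Refine the SVM by searching over regularization (C) and kernel choices.
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1.0, 10.0],
                "svc__kernel": ["linear", "rbf"]},
    cv=3,
)
search.fit(X, y)
best_svm = search.best_estimator_
```

Standardizing features before the SVM matters here because SVM margins are sensitive to feature scale; the pipeline ensures the scaler is fit only on each training fold.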
3.4. Evaluation Metric
To thoroughly assess the model's performance we implement the Leave-One-Out (LOO) validation methodology [14]. This approach guarantees that every data point is utilized for both training and testing, resulting in a reliable assessment of the system. We record evaluation metrics such as accuracy, precision, recall, and F1 score for each action category, providing a comprehensive picture of the model's efficacy.
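The evaluation protocol can be sketched as follows, again with placeholder per-video feature vectors standing in for the real STIP-based inputs; `cross_val_predict` with a `LeaveOneOut` splitter is one scikit-learn way to realize LOO, and the per-class metrics come from `precision_recall_fscore_support`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVC

# Placeholder per-video feature vectors for ten action classes.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 10, 6, 16
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(y.size, n_features)) + y[:, None]

# Leave-one-out: every video is held out exactly once as the test
# sample, so each prediction comes from a model that never saw it.
pred = cross_val_predict(SVC(kernel="linear", C=1.0), X, y,
                         cv=LeaveOneOut())

# Overall accuracy plus per-class precision / recall / F1.
acc = accuracy_score(y, pred)
prec, rec, f1, support = precision_recall_fscore_support(
    y, pred, labels=np.arange(n_classes), zero_division=0)
```

Note that LOO fits one model per sample, which is affordable for a dataset of the UCF Sports scale (around 150 clips) but would be costly on larger datasets.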
4. Preliminary results
In our project we have made progress in preparing and planning our video-based action recognition project. We conducted a review of existing literature that focused on methods for extracting features, designing classifiers, and evaluating the effectiveness of video-based action recognition. This initial step provided us with insights into the practices and techniques used by researchers in this field.
After consideration we decided to use the UCF Sports Dataset for action recognition. We thoroughly examined the dataset to gain an understanding of its contents, which include ten distinct actions.
For feature extraction we chose the Space-Time Interest Point (STIP) method. This method allows us to capture motion information within video frames, making it ideal for our action recognition task.
As for the classification component of our project, we opted to use the Support Vector Machine (SVM) as our classifier. SVMs are known for their effectiveness in handling action recognition tasks and their ability to capture linear relationships within data.
Currently we are actively developing the program code for our action recognition system. We are making progress with the coding process, specifically working on incorporating the chosen STIP feature extraction method and SVM classifier.
As we move forward with developing and testing our action recognition system, we expect to see initial results. These findings will help shape our project and add to our understanding of video-based action recognition using the UCF Sports Dataset.

5. Conclusion
The proposed technical method provides a structure for video-based action recognition using the UCF Sports Dataset. It emphasizes the significance of data preparation, feature extraction, model selection, and evaluation in guaranteeing the recognition system's accuracy and dependability. This project contributes to the fields of computer vision and action recognition by providing insights into designing models, optimizing parameters, and employing evaluation strategies.

References
[1] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2625–2634, 2015.
[2] H. Gammulle, S. Denman, S. Sridharan, and C. Fookes. Two stream lstm: A deep fusion framework for human action recognition. In 2017 IEEE winter conference on applications of computer vision (WACV), pages 177–186. IEEE, 2017.
[3] C. Gao, Y. Du, J. Liu, J. Lv, L. Yang, D. Meng, and A. G. Hauptmann. Infar dataset: Infrared action recognition at different times. Neurocomputing, 212:36–47, 2016.
[4] R. Ghosh, A. Gupta, A. Nakagawa, A. Soares, and N. Thakor. Spatiotemporal filtering for event-based action recognition. arXiv preprint arXiv:1903.07067, 2019.
[5] Z. Jiang, V. Rozgic, and S. Adali. Learning spatiotemporal features for infrared action recognition with 3d convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 115–123, 2017.
[6] R. Kavi, V. Kulathumani, F. Rohit, and V. Kecojevic. Multiview fusion for activity recognition using deep neural networks. Journal of Electronic Imaging, 25(4):043010–043010, 2016.
[7] Y. Kim and T. Moon. Human detection and activity classification based on micro-doppler signatures using deep convolutional neural networks. IEEE geoscience and remote sensing letters, 13(1):8–12, 2015.
[8] I. Laptev. On space-time interest points. International journal of computer vision, 64:107–123, 2005.
[9] W. Lin, M.-T. Sun, R. Poovandran, and Z. Zhang. Human activity recognition for video surveillance. 2008 IEEE International Symposium on Circuits and Systems (ISCAS), pages 2737–2740, 2008.
[10] J. Liu, A. Shahroudy, D. Xu, A. C. Kot, and G. Wang. Skeleton-based action recognition using spatio-temporal lstm network with trust gates. IEEE transactions on pattern analysis and machine intelligence, 40(12):3007–3021, 2017.
[11] R. Poppe. A survey on vision-based human action recognition. Image and Vision Computing, 28(6):976–990, 2010.
[12] H. Rahmani and A. Mian. 3d action recognition from novel viewpoints. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1506–1515, 2016.
[13] I. Rodomagoulakis, N. Kardaris, V. Pitsikalis, E. Mavroudi, A. Katsamanis, A. Tsiami, and P. Maragos. Multimodal human action recognition in assistive human-robot interaction. 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2702–2706, 2016.
[14] M. D. Rodriguez, J. Ahmed, and M. Shah. Action mach a spatio-temporal maximum average correlation height filter for action recognition. In 2008 IEEE conference on computer vision and pattern recognition, pages 1–8. IEEE, 2008.
[15] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. Advances in neural information processing systems, 27, 2014.
[16] K. Soomro and A. R. Zamir. Action recognition in realistic sports videos. In Computer vision in sports, pages 181–208. Springer, 2015.
[17] Y. Wang, Y. Xiao, F. Xiong, W. Jiang, Z. Cao, J. T. Zhou, and J. Yuan. 3dv: 3d dynamic voxel for action recognition in depth video. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 511–520, 2020.
[18] R. Yang and R. Yang. Dmm-pyramid based deep architectures for action recognition with depth cameras. In Asian Conference on Computer Vision, pages 37–49. Springer, 2014.
[19] M. Zeng, L. T. Nguyen, B. Yu, O. J. Mengshoel, J. Zhu, P. Wu, and J. Zhang. Convolutional neural networks for human activity recognition using mobile sensors. In 6th international conference on mobile computing, applications and services, pages 197–205. IEEE, 2014.