
Violence in daily life has been a major social issue for a long time. It can easily destroy the peace and
harmony of any society. Although criminal activity declined considerably from 2014 to 2017, it began to
rise again in 2017, with an increase of 6.79% from 2017 to 2018. The rise of violent behavior in public
areas is due to various factors. Individual greed, frustration, and hatred, as well as social and economic
insecurity, are the leading causes of violence. To address this issue, violence, whether expected or
unexpected, should be detected at an early stage so that it can be stopped as soon as possible.

Computer vision and deep learning have recently been used to investigate human actions and behavior.
Although violence is among the most alarming societal issues, few works automate action detection,
violence detection, or protest detection. This field of study is highly useful for social security and
stability. Crime and violent acts cannot be prevented outright unless brain signals are studied and a
specific pattern derived from criminal thinking is discovered in real time. Because this is not yet
technologically feasible, it has yet to be accomplished.

Using deep learning-based computer vision, we can now easily detect aggressive activity in public areas.
Most public sites and private institutions already have surveillance cameras installed. Effective violence
detection techniques can assist the government or authorities in taking a quick and systematic approach to
identifying violence and preventing the loss of human life and property. As human beings and a part of
society, we all desire to have secure streets, communities, and workplaces. Because it does not require
explicit feature engineering, deep learning often outperforms traditional machine learning. It also has
some disadvantages, including high processing costs and the need for large training datasets. These
technological considerations drive us to create a model that requires less training time and fewer training
examples. Using deep learning methodologies, we propose approaches in our system that can spot violent
threats and activities.

Previously, violent and non-violent activities were recognized using the presence of blood, the degree of
motion, and even characteristics of sound related to violent activities. However, surveillance cameras are
not very effective at recording the sounds related to such activities (Audio-visual content-based violent
scene characterization) [1]. Frame-based video analysis, on the other hand, relies solely on a sequence of
frames (that is, images) and not on audio. Violence can be categorized into many types, including
one-to-one violence, crowd violence, family violence, sports violence, violence with guns, and many
more. One previous work used a C3D convolutional neural network (3D-CNN) to detect violent scenes in
a video stream. The 3D-CNN is a deep supervised learning approach that learns spatiotemporal
discriminant features from videos (sequences of image frames). In contrast to 2D convolutions, this
approach applies 3D kernels to a series of image frames in their context, producing 3D activation maps
that capture both spatial and temporal features that could not be properly identified with 2D convolutions.
Three datasets were combined for this task: Hockey Fight, Movies, and Crowd Violence [2]. The authors
achieved an accuracy of 84.428% at the 36th training epoch [3]. Another contribution combined
convolutional neural networks (CNNs) with the Google Object Detection API, using these two
developments to retrain a pre-trained model to perform weapon detection in real-time surveillance.
The aim of that work was to investigate the effect of training convolutional neural networks with one extra
class, “non-weapon”, in addition to the two original classes, “gun” and “knife”. The Inception model
correctly detected the knife as a knife and the phone as a non-weapon with 99% and 56% accuracy, respectively.
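
To make the difference between 2D and 3D convolutions described above more concrete, the following is a
minimal sketch in plain Python. It is not the C3D network from the cited work. The function name
conv3d_valid, the toy clip, and the temporal-difference kernel are all assumptions chosen for illustration.
The kernel spans two consecutive frames, so its output responds to change over time, which a 2D kernel
applied to a single frame cannot see.

```python
def conv3d_valid(clip, kernel):
    """Slide a 3D kernel over a clip indexed as clip[frame][row][col].

    Uses 'valid' borders and no kernel flip (cross-correlation), which is
    the usual convention in deep learning. Returns a 3D activation map.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The sum runs over time (dt) as well as space (dy, dx).
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical toy clip: three 2x2 grayscale frames; only the last one is bright.
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
# Hypothetical 2x1x1 kernel: next frame minus current frame.
kernel = [[[-1.0]], [[1.0]]]

activation = conv3d_valid(clip, kernel)
print(activation)
```

The result has one plane per pair of consecutive frames. The first plane is all zeros because nothing
changed between frames 0 and 1, and the second is all ones because every pixel brightened between frames
1 and 2. A trained 3D-CNN learns many such kernels instead of using a hand-written one.
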
To predict violence in the sequential flow of frames, we will use a Convolutional Neural Network
Bidirectional LSTM (CNN-BiLSTM) architecture. To begin, we divide a video into a sequence of frames.
We pass each frame through a convolutional neural network to extract the information present in that
frame. Then, to recognize any sequential flow of events, we use a Bidirectional LSTM layer that relates
the information of the current frame once to the preceding frames and once to the upcoming frames.
Finally, the classifier determines whether or not an action is violent.
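
The pipeline described above (per-frame CNN features, a bidirectional LSTM over the frame sequence, then
a classifier) can be sketched as a forward pass in plain Python. This is only an illustration under stated
assumptions and not the model used in this work. The weights are random and untrained, the "CNN" is a
single 3x3 convolution layer with ReLU and global average pooling, the LSTM gates omit bias terms for
brevity, and all sizes and function names (frame_features, bilstm, classify) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frame_features(frame, kernels):
    """Tiny CNN stand-in: 3x3 convolution, ReLU, global average pooling."""
    H, W = len(frame), len(frame[0])
    feats = []
    for k in kernels:
        total = 0.0
        for y in range(H - 2):
            for x in range(W - 2):
                acc = sum(frame[y + dy][x + dx] * k[dy][dx]
                          for dy in range(3) for dx in range(3))
                total += max(acc, 0.0)  # ReLU
        feats.append(total / ((H - 2) * (W - 2)))
    return feats  # one value per kernel


def init_lstm(input_size, hidden_size, rng):
    """Random (untrained) weight matrices for the four LSTM gates."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
                for _ in range(hidden_size)]
    return {gate: matrix() for gate in ("input", "forget", "output", "cell")}


def lstm_step(weights, x, h, c):
    z = h + x  # concatenate previous hidden state and current input
    def gate(name, activation):
        return [activation(sum(w * v for w, v in zip(row, z)))
                for row in weights[name]]
    i, f, o = gate("input", sigmoid), gate("forget", sigmoid), gate("output", sigmoid)
    g = gate("cell", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(weights, sequence, hidden_size):
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(weights, x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd, bwd, sequence, hidden_size):
    """Read the sequence once forwards and once backwards, then concatenate."""
    forward = run_lstm(fwd, sequence, hidden_size)
    backward = run_lstm(bwd, sequence[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def classify(states, weights, bias=0.0):
    """Average the per-frame states over time, then apply a logistic unit."""
    pooled = [sum(col) / len(states) for col in zip(*states)]
    return sigmoid(sum(w * p for w, p in zip(weights, pooled)) + bias)


rng = random.Random(0)
NUM_FRAMES, SIZE, NUM_KERNELS, HIDDEN = 5, 6, 4, 3  # hypothetical sizes
# Random grayscale frames stand in for frames extracted from a real video.
video = [[[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
         for _ in range(NUM_FRAMES)]
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(NUM_KERNELS)]
fwd = init_lstm(NUM_KERNELS, HIDDEN, rng)
bwd = init_lstm(NUM_KERNELS, HIDDEN, rng)
clf_weights = [rng.uniform(-1, 1) for _ in range(2 * HIDDEN)]

features = [frame_features(frame, kernels) for frame in video]
states = bilstm(fwd, bwd, features, HIDDEN)
p_violent = classify(states, clf_weights)
print(f"P(violent) from untrained weights: {p_violent:.3f}")
```

Each entry of states combines a forward summary of the frames up to that point with a backward summary of
the frames from that point onward, which is what lets the classifier use context from both directions. In
practice the convolutional layers, the LSTM weights, and the classifier would be trained jointly on labeled
video clips, typically with a deep learning framework.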

After this introduction, we proceed directly to the methodology, where we discuss the steps used to
implement our system. We then present the results of our work with qualitative and quantitative data.
Finally, in the conclusion, we discuss the contribution of our paper and possible future improvements.
