Smart Video Surveillance System Using Priority Filtering



Subject: RESEARCH METHODOLOGY (FST)


Section: E
Group: 3

Takitazwar, Parthib
September 2020

1 Introduction
Video surveillance involves observing a limited space using a video camera. In a video surveillance system, footage from any period of time can be retrieved by giving the time as an input. Closed-circuit television (CCTV) cameras are used in both public and private areas to capture footage, mainly for protection, safety, and monitoring operations. The recorded videos are normally saved in local storage or on rented cloud services. High-resolution video requires a large amount of storage.
Surveillance cameras typically capture video at 30 frames per second, and each frame consumes about 100 kilobytes on average. So the storage needed for a single second is roughly (30 x 100)/1000 = 3 megabytes. Current technology supports video recording at resolutions up to 8K (7680 x 4320). In general, even after compressing the video with a highly sophisticated codec, a camera can still consume around 30 GB of storage each day. However, most adjacent frames are nearly identical, because the environment changes very little within a second. These duplicate frames waste storage without any benefit.
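
A quick back-of-the-envelope calculation, using only the figures quoted above (30 fps, roughly 100 KB per raw frame), illustrates how quickly uncompressed footage accumulates; the ~30 GB/day compressed figure is taken from the text rather than derived here.

# Rough storage estimate using the illustrative numbers quoted in the text.
FPS = 30                 # frames per second
FRAME_SIZE_KB = 100      # average size of one raw frame

per_second_mb = FPS * FRAME_SIZE_KB / 1000              # 3.0 MB per second, raw
per_day_gb_raw = per_second_mb * 60 * 60 * 24 / 1000    # ~259 GB per day, raw

print(f"Raw footage: {per_second_mb:.1f} MB/s, about {per_day_gb_raw:.0f} GB/day")
# After codec compression the text estimates ~30 GB/day,
# i.e. roughly 1 TB of storage per camera per month.
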
The current storage system is still quite expensive, since roughly 1 TB of hard disk is needed every month to store the video locally. Cheap low-RPM hard disk drives are the usual choice for local storage, costing about $30.00 per 1 TB disk each month. On the other hand, popular cloud services such as Google Drive generally price their plans at $9.99 a month for 2 TB and $99.99 per month for 10 TB. For a long-term storage solution, the available options are relatively expensive. Reducing the required storage decreases this expense, which can make the system affordable for a mass population. A smart surveillance system with decision capability may also be able to fix some existing security issues.
The research questions arising from this research are:
RQ1. What actions cause wastage of storage?
RQ2. What are the limitations of current technology?
RQ3. How can videos be filtered and stored based on priority?
RQ4. How feasible is the system with current technology?
Video surveillance has become essential in many areas in recent years, and various studies have addressed its storage issue. RQ1 and RQ2 let us explore the area, while RQ3 and RQ4 are used to justify our approach.
The rest of this paper is organized as follows. Section 2 summarizes related work on video data storage systems with respect to RQ2. A priority-based surveillance video storage system is presented in Section 3; in particular, we illustrate the architecture of the system and explain how the problem can be solved, which addresses RQ3. In Section 4, a performance evaluation is carried out for RQ4. The paper is concluded in Section 5.

2 Related Work
2.1 Background Subtraction
The concept of frame difference is used in many existing works. Object tracking is used to focus on a particular object while keeping the background static [3]. In other works, background subtraction is applied to raw frame input [4]. A threshold is set that changes dynamically in reaction to changes in lighting [5]. Retrieving video is crucial as well: a key-frame index is used in lookup operations while querying [1]. This method can greatly reduce the amount of video data read and effectively improves query efficiency.

2.2 Tracking
Intelligent Visual Surveillance (IVS) combines the techniques mentioned above with filtering methods such as the Kalman filter and particle filtering. The Kalman filter is a recursive two-stage filter consisting of a prediction step and an update step. The prediction is based on the current location of the moving object or on the location given by the previous observation. It also uses edge-based and color-based features [2]. Tracking helps to focus on the object, so the pixel estimates around the object become more accurate.
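
As a rough illustration of the two-stage structure described above, the sketch below implements a one-dimensional constant-velocity Kalman filter; the state layout, noise values, and function names are our own assumptions rather than anything taken from [2].

import numpy as np

# Minimal 1-D constant-velocity Kalman filter sketch (assumed model).
# State x = [position, velocity]; all noise values are illustrative.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # we only observe position
Q = np.eye(2) * 1e-3                     # process noise (assumed)
R = np.array([[1e-1]])                   # measurement noise (assumed)

def predict(x, P):
    """Prediction step: project the state and covariance forward."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Update step: correct the prediction with a new observation z."""
    y = z - H @ x                         # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
    return x + K @ y, (np.eye(2) - K @ H) @ P

x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.1, 2.9, 4.2]:            # toy position observations
    x, P = predict(x, P)
    x, P = update(x, P, np.array([z]))
print("estimated position and velocity:", x)
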

2.3 Compression Filtering


Compression filtering similar to a bottleneck architecture is used to decrease the workload. The method reduces the large power requirement at the cost of a small drop in accuracy. Compression and decompression are performed based on projection shortcuts. Encoding techniques such as H.265 [11] are used to reduce the video size. These algorithms are quite effective at solving the storage problem.
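
For concreteness, H.265/HEVC re-encoding of recorded footage can be scripted as below. This is only a sketch: it assumes an ffmpeg build with the libx265 encoder is installed, and the file names and CRF quality setting are illustrative.

import subprocess

def compress_h265(src: str, dst: str, crf: int = 28) -> None:
    """Re-encode a clip with H.265/HEVC to shrink its on-disk size.

    Assumes ffmpeg with libx265 is on PATH; a higher CRF means
    stronger compression (and lower quality).
    """
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx265", "-crf", str(crf),
         "-c:a", "copy", dst],
        check=True,
    )

# Example (hypothetical file names):
# compress_h265("raw_footage.mp4", "footage_hevc.mp4")
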

2.4 Frame Filtering
Under dynamic environments, where both the camera and the target can change rapidly, the target information alone is not enough for accurate object detection, and these algorithms do not perform well against dynamic backgrounds [10]. A cheap alternative is to compare frames at the camera side (a chip running the comparison algorithm is integrated into the camera); if the frames are not similar, they are stored on the secondary storage device. This approach optimizes storage while maintaining the information and the quality of the video clip. Some methods also use the absolute difference of gray-scaled images together with a noise variance, and support IP camera technology where the video can be streamed to any Android device [7]. However, these methods are vulnerable when adapting to a new environment: they only compare values against recent changes, so a large change in the background forces them to recompute everything against the new background.
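
A minimal sketch of such camera-side frame filtering is shown below, using the absolute gray-scale difference against a noise threshold; it assumes OpenCV is available, and the threshold value and decision rule are illustrative rather than the exact procedure of [7].

import cv2
import numpy as np

NOISE_THRESHOLD = 12.0   # mean absolute gray difference (assumed value)

def is_worth_storing(prev_frame, curr_frame) -> bool:
    """Return True when two frames differ by more than sensor noise."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)
    return float(np.mean(diff)) > NOISE_THRESHOLD

cap = cv2.VideoCapture(0)          # camera index 0 (illustrative)
ok, prev = cap.read()
while ok:
    ok, curr = cap.read()
    if not ok:
        break
    if is_worth_storing(prev, curr):
        pass                        # write `curr` to secondary storage here
    prev = curr
cap.release()
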

2.5 Cloud Storage


Some research compares various Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) cloud computing offerings and shows that it is possible to architect a VMS (video surveillance management system) using cloud technologies [9]. But it may need legal support, and there are emerging threats and countermeasures associated with using cloud technologies for a video surveillance management system. Also, relying on a single vendor for storage becomes more expensive after crossing a certain amount of storage.
None of these methods mentions priority-based filtering, where only the footage of important events is stored. In short-term surveillance systems, the videos are erased after two to three weeks. Discarding unnecessary events with priority-based filtering can increase this efficiency and extend the lifespan of the system.

3 Overview of Proposed Framework


3.1 Object Detection
The proposed framework is divided into two layers: an object detection layer and a background subtraction layer. The object detection layer mainly focuses on prioritizing the frames. Each frame captured from the camera passes through a pre-trained object detection model. Applying transfer learning lets the system reuse a previously trained model. Mobile neural networks perform much better in this scenario, since they need less computational power to begin with. The pre-trained model uses the weights from a previous batch training session. As shown in equation (1), the output M is a set of detected objects. Frames are skipped after a certain interval of time, since adjacent frames are less likely to be necessary: they tend to contain data very similar to the previous frame.

M = {a, b, c, ...} (1)
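
A sketch of how this layer might be wired together is given below; run_detector stands in for whatever pre-trained model is used (its name and the skip interval are assumptions), and the point is simply that each sampled frame is mapped to a set M of detected object labels.

from typing import Callable, Iterable, Set

FRAME_SKIP = 15   # process every 15th frame (assumed interval)

def detect_objects(frames: Iterable, run_detector: Callable) -> Iterable[Set[str]]:
    """Yield the set M of object labels for every sampled frame.

    `run_detector` is a placeholder for a pre-trained detection model
    (e.g. a mobile network reused via transfer learning) that returns
    a list of label strings for one frame.
    """
    for i, frame in enumerate(frames):
        if i % FRAME_SKIP:              # frame skipping: drop in-between frames
            continue
        yield set(run_detector(frame))  # M = {a, b, c, ...}
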

Figure 1: System Overview

3.2 Sentiment Analysis


Sentiment analysis is used to value the current output. A pre-trained bag-of-words model can be used because it is so light-weight. While training the word model, security concerns must be prioritized, so words such as gun, machete, or explosives weigh more than general words; the more harmless a word is, the less it weighs. The output from the object detection model is evaluated with this word model, and the mean of the feedback from the sentiment analysis model is taken as the frame value. The frame value has to exceed a certain threshold, which can be set based on requirements: a highly secured area may need a low threshold. There is a trade-off between the threshold and the storage, since a low threshold requires more storage and vice versa. If the threshold is reached, the frame goes to the next layer; otherwise it is discarded. In equation (2), the frame value F is computed from the feedback SA(M) and the number of objects n in the output.

F = SA(M)/n    (2)
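
The sketch below illustrates equation (2) with a toy weight table; the specific words, weights, and threshold are invented for the example and are not taken from the trained model.

# Toy word weights standing in for the pre-trained bag-of-words model
# (all numbers here are illustrative assumptions).
WORD_WEIGHTS = {"gun": 0.95, "machete": 0.90, "explosive": 0.98,
                "person": 0.30, "chair": 0.02, "plant": 0.01}
THRESHOLD = 0.4   # tuned per deployment: lower for high-security areas

def frame_value(detected: set) -> float:
    """Equation (2): F = SA(M) / n, the mean threat weight of the detected objects."""
    if not detected:
        return 0.0
    sa_m = sum(WORD_WEIGHTS.get(label, 0.0) for label in detected)
    return sa_m / len(detected)

def keep_frame(detected: set) -> bool:
    return frame_value(detected) > THRESHOLD

print(keep_frame({"person", "gun"}))    # True: a high-threat object is present
print(keep_frame({"chair", "plant"}))   # False: the frame is discarded
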

3.3 Optimization
The background subtraction [2] layer provides optimization for the frames and can be used to stop frame duplication. It comes in two varieties: the first type considers the first frame to be the key frame, while the second type considers the average of a certain number of frames to be the key frame. The working mechanism consists of subtracting every frame from the key frame initialized beforehand. The background subtraction method splits each video frame into foreground and background, where the foreground is obtained by matching the current frame C(a, b) against the background image K(a, b). If we assign the key frame as K(a, b) and the current frame as C(a, b), then the equation used is

N(a, b) = 1 if K(a, b) − C(a, b) > threshold, and 0 otherwise    (3)

where N(a, b) is the foreground pixel mask. As mentioned before, the threshold can be set according to requirements. The key frame is re-selected after a certain time interval. A key frame selected with the averaging method performs far better at the cost of a little extra computational power, and the method is still quite efficient. Finally, encoding techniques are used to compress the video to an even smaller size, which completes the optimization phase.
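
A compact sketch of this layer is given below. The averaging window and threshold are assumptions, and the mask implements equation (3) per pixel, taking the absolute difference so that both brightening and darkening count (a common variant).

import numpy as np

THRESHOLD = 25          # per-pixel gray-level threshold (assumed)
KEY_WINDOW = 30         # number of frames averaged into the key frame (assumed)

def key_frame(frames: list) -> np.ndarray:
    """Second variety: the key frame is the average of several frames."""
    return np.mean(np.stack(frames[:KEY_WINDOW]), axis=0)

def foreground_mask(key: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Equation (3): N(a, b) = 1 where the frames differ beyond the threshold."""
    return (np.abs(key - current) > THRESHOLD).astype(np.uint8)

def is_duplicate(key: np.ndarray, current: np.ndarray, ratio: float = 0.01) -> bool:
    """Frames whose mask is almost empty duplicate the key frame and can be dropped."""
    return foreground_mask(key, current).mean() < ratio
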

4 Discussion
The object detection phase must be implemented with a power-efficient model. Traditional convolutional neural networks require a huge amount of computational power, so light-weight network models such as YOLO (You Only Look Once) and SSD (Single Shot Multi-Box Detector) can be used on low-powered devices. Figure 2 [13] shows the performance of some light-weight network models on real-time object detection, with the PASCAL VOC dataset used for the accuracy test. Comparing the mAP (mean average precision) of the models, Fast R-CNN provides the highest accuracy, while Fast YOLO is the fastest model on record. A balanced trade-off between speed and accuracy can be achieved with YOLO, which reaches 63.4 mAP at 45 fps (on a GPU).

Figure 2: Performance Comparison

Implementing the proposed system requires very light-weight machines. Devices built on ARM-based CPUs, such as the Raspberry Pi 4, can be ideal machines for object detection. A 1080p (1920x1080) Logitech camera was used to capture the video footage. TensorFlow Lite is a light-weight framework that works smoothly on such Linux-based devices. As shown in Figure 3, the setup can run an implementation of YOLO at about 4 FPS on average with good accuracy. The speed (fps) can be drastically increased with the addition of an Edge TPU; a Coral USB Accelerator was used as the TPU (Tensor Processing Unit), and the FPS with the TPU increased almost 8.5 times. Applying a frame-skipping technique may reduce the load further, which helps to increase the accuracy.
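
A minimal TensorFlow Lite inference loop for such a device is sketched below; the model file name is a placeholder, the Edge TPU delegate line is optional, and the exact output tensor layout depends on the detection model actually deployed.

import numpy as np
import tflite_runtime.interpreter as tflite   # pip install tflite-runtime

MODEL_PATH = "detect.tflite"                  # placeholder model file

# Optional: offload to a Coral USB Accelerator (Edge TPU) if present.
# delegates = [tflite.load_delegate("libedgetpu.so.1")]
interpreter = tflite.Interpreter(model_path=MODEL_PATH)  # , experimental_delegates=delegates
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
outs = interpreter.get_output_details()

def detect(frame: np.ndarray):
    """Run one frame through the detector and return its raw output tensors.

    Assumes `frame` has already been resized to the model's expected input shape.
    """
    batch = np.expand_dims(frame, axis=0).astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], batch)
    interpreter.invoke()
    return [interpreter.get_tensor(o["index"]) for o in outs]
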

Figure 3: Realtime Object Detection

In the second phase, an enhanced bag-of-words model (SAOOP) [12] can be used to identify the threat level of an object. The model has to be pre-trained beforehand for the specific requirement; crime datasets such as Crime in Vancouver or Ontario Crime Statistics can be applied to the model to increase precision. As Figure 4 [12] shows, using SAOOP improves all three performance measurements, and the model is light-weight. The labels detected in the object detection phase determine the frame value; by comparing the threat level against a certain threshold, frames are kept or discarded. The threshold is lowered in areas requiring high security, which stores more frames and more footage, while in areas that are already quite secure the threshold is raised. So the system can be tuned for any specific location. The third phase of the research covers optimization, which can be achieved with the matrix operations mentioned in the overview section. Finally, a codec such as H.265 is used to further compress the video, bringing the footage to an optimal state with very low storage requirements.

Figure 4: Performance Comparison

The purpose of the research was to create priority-based filtering with a low-cost system. The device needed to implement this system is a Raspberry Pi 4 (4 GB), which costs only about $59 USD. In addition, a TPU can gain some extra performance; a TPU such as the Coral USB Accelerator also costs only $59 USD. At the initial purchase of any surveillance system, adding an extra $118 USD can lead to a decent storage-saving option. Since the threshold value can be changed at any time, buyers can define any kind of saving plan. The system works well for short-term surveillance systems, which remove a certain amount of data once the storage fills up; with the proposed system, more data can be stored based on priority, so the surveillance system can retain data for much longer. Long-term surveillance systems can also benefit from the proposed system, since the saving feature may reduce the amount of storage bought each year. So, judging from the analysis, the system is very feasible with current technology. It is also cost-effective and, with proper entrepreneurship, it can become available to a mass population.

5 Conclusion
The invention of the surveillance system opened up many possibilities, since a machine can monitor without breaks, unlike humans. But the storage problem has always been there; this is an active research field, and many techniques have been applied to stop this wastage. In this paper, we tried to create a system in which the machine filters footage intelligently. First we discussed how storage is wasted in video surveillance systems due to frame duplication, since the change between adjacent frames is generally marginal. Our objective was to filter out the unnecessary frames, and we removed them with object detection and sentiment analysis, tuning the training set of the sentiment analysis with specific datasets to get the required outcome. Finally, in the optimization process, duplicates among the selected frames were removed, and the video was encoded and then stored on the secondary device.
In summary, we proposed a system that detects objects in front of the camera and then filters frames according to the type of objects. The paper showed that changing the threshold value lets users create their own saving plan, so the system is flexible and can be tuned to the requirement. The cost analysis indicates that this is a very cost-effective solution, and the individual performance of each proposed method was examined. This suggests the system is very feasible and can be made attainable for a mass population.
Future work: Due to time limitations, the complete implementation of the system was not possible. A proper implementation in the future can provide the actual efficiency of the system. Combining it with other related work may provide performance gains as well, and within this framework any neural network can be used for object detection in the future.

6 References
[1] M. Shengcheng et al., "A Retrieval Optimized Surveillance Video Storage System for Campus Application Scenarios", Journal of Electrical and Computer Engineering, vol. 2018, Apr. 2018. Accessed on: Sep. 05, 2020. [Online].
[2] M. R. Sunitha et al., "A Survey on Moving Object Detection and Tracking Techniques", International Journal of Engineering and Computer Science, vol. 5, no. 5, pp. 16376-16382, May 2016. DOI: 10.18535/ijecs/v5i5.11
[3] K. R. Jadav et al., "Vision based moving object detection and tracking", 2011.
[4] S. S. Kshirsagar and P. A. Ghonge, "Movement Detection Using Image Processing", International Journal of Science and Research, vol. 4, no. 5, Feb. 2015.
[5] K. Mahadar et al., "Intelligent system for storage optimization of video surveillance using in-camera processing", International Journal of Advance Research in Science and Engineering, vol. 6, no. 12, pp. 1245-1250, Dec. 2017.
[6] J. Barthélemy et al., "Edge-Computing Video Analytics for Real-Time Traffic Monitoring in a Smart City", Sensors, vol. 19, no. 9, pp. 2048-2071, May 2019. DOI: 10.3390/s19092048
[7] A. N. Ranasinghe and S. R. Liyanage, "Eliminating the storage wastage of CCTV cameras by motion detection", International Postgraduate Research Conference 2015, University of Kelaniya, Dec. 2015.
[8] D. Suganya et al., "Storage Optimization of Video Surveillance from CCTV Camera", International Research Journal of Engineering and Technology, vol. 5, no. 3, pp. 1356-1359, Mar. 2018.
[9] D. J. Neal and S. Rahman, "Video Surveillance in the Cloud", International Journal on Cryptography and Information Security, vol. 2, no. 3, Sep. 2012. DOI: 10.5121/ijcis.2012.2301
[10] K. Kalirajan and M. Sudha, "Moving Object Detection for Video Surveillance", Journal of Electrical and Computer Engineering, vol. 2015, Mar. 2015. Accessed on: Sep. 07, 2020. [Online]. DOI: 10.1155/2015/907469
[11] P. Bordes and G. Clare, "An overview of the emerging HEVC standard", International Symposium on Signal, Image, Video and Communications, Valenciennes, Jul. 2012. DOI: 10.13140/RG.2.1.3482.5762
[12] D. M. El-Din, "Enhancement Bag-of-Words Model for Solving the Challenges of Sentiment Analysis", International Journal of Advanced Computer Science and Applications, vol. 7, no. 1, pp. 244-252, Jan. 2016.
[13] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object Detection", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
