Smart Video Surveillance System Using Priority Filtering
Takitazwar, Parthib
September 2020
1 Introduction
Video surveillance is the act of observing a limited space with a video
camera. In a video surveillance system, footage from any period can be
retrieved by giving the time as input. Closed-circuit television (CCTV) is used
in both public and private areas to capture footage. It is mainly used for
protection, safety, and monitoring operations. The recorded videos are normally
saved in local storage or on rented cloud services. High-resolution videos
require a large amount of storage.
Surveillance cameras capture video at 30 frames per second, and each frame
consumes about 100 kilobytes on average. The storage needed for only 1
second is therefore about (30 x 100)/1000 = 3 megabytes. Current technology
supports video recording at up to 8K (7680 x 4320) resolution. In general, even
after compressing the video with a highly sophisticated codec, the video still
consumes 30 GB of storage each day. However, most adjacent frames are nearly
identical, because the environment changes very little within a second. These
duplicate frames waste storage without any benefit.
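The arithmetic above can be reproduced with a short sketch. It uses only the two figures quoted in the text (30 frames per second, about 100 KB per frame) and assumes decimal units (1 MB = 1000 KB, 1 GB = 1000 MB), as the text's own calculation does. The function names are illustrative.

```python
# Figures taken from the text: 30 frames per second, about 100 KB per frame.
FRAMES_PER_SECOND = 30
KILOBYTES_PER_FRAME = 100
SECONDS_PER_DAY = 24 * 60 * 60


def raw_megabytes_per_second(fps: int, kb_per_frame: int) -> float:
    """Uncompressed data rate in MB/s, using 1 MB = 1000 KB as in the text."""
    return fps * kb_per_frame / 1000


def raw_gigabytes_per_day(fps: int, kb_per_frame: int) -> float:
    """Uncompressed storage per day in GB, using 1 GB = 1000 MB."""
    return raw_megabytes_per_second(fps, kb_per_frame) * SECONDS_PER_DAY / 1000


rate = raw_megabytes_per_second(FRAMES_PER_SECOND, KILOBYTES_PER_FRAME)
daily = raw_gigabytes_per_day(FRAMES_PER_SECOND, KILOBYTES_PER_FRAME)
print(f"{rate} MB/s, {daily} GB/day before compression")
```

Under these assumptions the uncompressed stream comes to 3 MB per second, or about 259 GB per day, so the 30 GB per day quoted for compressed video is already several times smaller than the raw stream.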
Current storage is still quite expensive, since about 1 TB of hard disk space
is needed every month to store the video locally. Cheap, low-rpm hard disk
drives are the main option for local storage, costing $30.00 per month for
1 TB. On the other hand, popular cloud services such as Google Drive generally
charge $9.99 a month for 2 TB and $99.99 a month for 10 TB. For long-term
storage, the available options are relatively expensive. Reducing the required
storage decreases the expense, which can make the system affordable to a
mass population. A smart surveillance system with decision capability may also
be able to fix some existing security issues.
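A minimal sketch of the cost reasoning, using only the prices and the 30 GB per day figure quoted in the text. The 30-day month and the decimal conversion (1 TB = 1000 GB) are assumptions made for illustration, as are the plan labels.

```python
GB_PER_DAY = 30        # compressed footage per day, as quoted in the text
DAYS_PER_MONTH = 30    # assumption: a 30-day month

# (label, monthly price in USD, capacity in TB), prices as quoted in the text
PLANS = [
    ("local HDD, 1 TB", 30.00, 1),
    ("cloud, 2 TB", 9.99, 2),
    ("cloud, 10 TB", 99.99, 10),
]


def monthly_terabytes(gb_per_day: float, days: int = DAYS_PER_MONTH) -> float:
    """Footage produced per month in TB, using 1 TB = 1000 GB."""
    return gb_per_day * days / 1000


def price_per_terabyte(price_usd: float, capacity_tb: float) -> float:
    """Monthly price divided by plan capacity."""
    return price_usd / capacity_tb


print(f"footage per month: {monthly_terabytes(GB_PER_DAY)} TB")
for label, price, capacity in PLANS:
    print(f"{label}: {price_per_terabyte(price, capacity):.2f} USD per TB per month")
```

At 30 GB per day a camera produces about 0.9 TB per month, which is where the figure of roughly 1 TB per month comes from.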
The research questions that arise from this work are:
RQ1. What actions cause wastage of storage?
RQ2. What are the limitations of current technology?
RQ3. How can videos be filtered and stored based on priority?
RQ4. How feasible is the system with current technology?
Video surveillance has become essential in many areas in recent years, and
various studies have addressed its storage problem. RQ1 and RQ2 let
us explore the area. With RQ3 and RQ4, we attempt to justify our
approach.
The rest of this paper is organized as follows. Section 2 summarizes
related work on video data storage systems with respect to RQ2. A
priority-based surveillance video storage system is presented in Section 3. In
particular, we illustrate the architecture of this system and explain how the
problem can be solved, which addresses RQ3. In Section 4, the performance is
evaluated with respect to RQ4. The paper is concluded in Section 5.
2 Related Work
2.1 Background Subtraction
The concept of frame difference is used in many existing works. Object
tracking is used to focus on different objects while keeping the background
static [3]. In other works, background subtraction is applied to the raw
frame input [4]. A threshold is set that changes dynamically in response to
changes in lighting [5]. Retrieving video is crucial as well. A key-frame index
is used in lookup operations while querying [1]. This method can greatly reduce
the amount of video data read and effectively improves query efficiency.
2.2 Tracking
Intelligent Visual Surveillance (IVS) combines all the techniques mentioned
above with filtering such as the Kalman filter and particle filtering. The
Kalman filter is a recursive two-stage filter: it predicts an estimate and then
updates it. The prediction is based on the current location of the moving
object or on the location from the previous observation. It also uses
edge-based and color-based features [2]. Tracking helps to focus on the object,
meaning that the pixels around the object become more accurate.
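The two stages of the Kalman filter can be sketched in one dimension. This is a minimal illustration and not the tracker used in the cited works: it assumes a scalar state (for example the x position of an object), a random-walk motion model, and hypothetical noise variances chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk motion model."""

    estimate: float         # current state estimate, e.g. the object's x position
    variance: float         # uncertainty of the estimate
    process_var: float      # hypothetical: how far the object may drift per step
    measurement_var: float  # hypothetical: noise of the position measurement

    def predict(self) -> None:
        # Stage 1: the estimate is carried over and its uncertainty grows.
        self.variance += self.process_var

    def update(self, measurement: float) -> None:
        # Stage 2: blend prediction and measurement using the Kalman gain.
        gain = self.variance / (self.variance + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= 1.0 - gain


def track(measurements, initial=0.0):
    """Run predict/update over a sequence of measured positions."""
    kf = ScalarKalman(estimate=initial, variance=1.0,
                      process_var=0.01, measurement_var=0.5)
    estimates = []
    for z in measurements:
        kf.predict()
        kf.update(z)
        estimates.append(kf.estimate)
    return estimates


# An object that is consistently observed at position 10.
print(track([10.0] * 5))
```

Each estimate moves toward the measurement by a fraction given by the gain, so a single noisy observation cannot pull the track far. A practical tracker would use a multi-dimensional state with position and velocity.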
2.4 Frame Filtering
In view of rapid variations in both camera and target under dynamic
environments, the target information is not enough for accurate object
detection. These algorithms do not perform well with dynamic backgrounds [10].
A cheap alternative is to compare frames on the camera side (a chip
implementing the frame-comparison algorithm is integrated into the camera) and,
if the frames are not similar, to store them in the secondary storage device.
This approach optimizes storage while maintaining the information as well as
the quality of the video clip. Some methods also use the absolute difference of
gray-scaled images with a noise variance. This approach supports IP camera
technology, where the video can be streamed through any Android device [7]. But
these methods adapt poorly to a new environment. They only compare values
against recent changes, so a large change in the background requires everything
to be recomputed with the new background.
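The camera-side comparison can be sketched as follows. This is an illustrative reading of the idea, not the implementation from [7]: frames are assumed to be rows of (r, g, b) tuples, and the names `noise_tolerance` and `min_changed` as well as their default values are hypothetical. Standard library Python is used so the sketch stays self-contained; a real system would use an array or image library.

```python
def to_gray(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to integer gray levels."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in frame]


def changed_fraction(prev_gray, curr_gray, noise_tolerance):
    """Fraction of pixels whose absolute difference exceeds the noise tolerance."""
    total = changed = 0
    for prev_row, curr_row in zip(prev_gray, curr_gray):
        for p, c in zip(prev_row, curr_row):
            total += 1
            if abs(p - c) > noise_tolerance:
                changed += 1
    return changed / total if total else 0.0


def select_frames(frames, noise_tolerance=10, min_changed=0.01):
    """Keep the first frame and every frame that differs enough from the last kept one."""
    kept = []
    reference = None
    for index, frame in enumerate(frames):
        gray = to_gray(frame)
        if (reference is None
                or changed_fraction(reference, gray, noise_tolerance) >= min_changed):
            kept.append(index)
            reference = gray
    return kept


def solid(value, size=4):
    """Helper for the example: a square frame filled with one gray value."""
    return [[(value, value, value)] * size for _ in range(size)]


# Frame 1 differs from frame 0 only by sensor-level noise, frame 2 changes fully.
print(select_frames([solid(0), solid(3), solid(200)]))
```

Because each frame is compared with the last stored frame, a sudden large change in the background makes every following frame look different until a new reference is stored, which is the weakness noted above.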
Figure 1: System Overview
3.3 Optimization
The background subtraction [2] layer provides optimization for the frames. It
can be used to stop frame duplication. There are two varieties. The first
considers the first frame to be the key frame. The second considers the average
of a certain number of frames to be the key frame. The working mechanism
consists of subtracting every frame from the key frame initialized beforehand.
The background subtraction method splits the video frames into foreground and
background objects, where the foreground object is calculated by matching the
current frame C(a, b) with the background image K(a, b). If we denote the key
frame by K(a, b) and the current frame by C(a, b), then the foreground is given by
\[
N(a, b) =
\begin{cases}
1, & \text{if } K(a, b) - C(a, b) > \text{threshold} \\
0, & \text{otherwise}
\end{cases}
\tag{3}
\]
where N(a, b) is the foreground pixel. As mentioned before, the threshold can be
set as required. A new key frame is selected after a certain time interval.
A key frame selected with the average method performs far better at the cost of
a little computational power. Still, the method is quite efficient. Finally,
encoding techniques are used to compress the video to an even smaller size. This
completes the optimization phase.
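Equation (3) and the averaged key frame can be sketched as below. Frames are assumed to be gray-scale images stored as nested lists, and the function names are illustrative. The sketch follows the signed difference K(a, b) - C(a, b) exactly as written in the equation, which only flags pixels darker than the key frame; a common variant, not taken from the text, compares the absolute difference instead.

```python
def average_key_frame(frames):
    """Second variety: the key frame K is the pixel-wise mean of several frames."""
    count = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[a][b] for f in frames) / count for b in range(cols)]
            for a in range(rows)]


def foreground_mask(key_frame, current, threshold):
    """Equation (3): N(a, b) = 1 where K(a, b) - C(a, b) > threshold, else 0."""
    return [[1 if k - c > threshold else 0 for k, c in zip(k_row, c_row)]
            for k_row, c_row in zip(key_frame, current)]


def is_duplicate(key_frame, current, threshold):
    """A frame without any foreground pixel adds nothing over the key frame."""
    return not any(any(row) for row in foreground_mask(key_frame, current, threshold))


key = average_key_frame([[[100, 100], [100, 100]],
                         [[110, 110], [110, 110]]])
print(foreground_mask(key, [[105, 20], [104, 106]], threshold=30))
```

Frames for which `is_duplicate` returns true can be dropped before encoding, which is how this layer stops frame duplication.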
4 Discussion
The object detection phase must be implemented with a power-efficient
model. Traditional convolutional neural networks require a huge amount of
computational power. Light-weight network models such as YOLO (You Only Look
Once) and SSD (Single Shot MultiBox Detector) can be used on low-powered
devices. Figure 2 [13] shows the performance of some light-weight network
models in real-time object detection. The PASCAL VOC dataset was used for
the accuracy test. Comparing the mAP (mean Average Precision) of the models,
Fast R-CNN provides the highest accuracy, but Fast YOLO is the fastest model
on record. A balanced trade-off between speed and accuracy can be achieved with
YOLO, which reaches 63.4 mAP at 45 fps (on a GPU).
cally increased with the addition of an Edge TPU. A Coral USB Accelerator was
used as the TPU (Tensor Processing Unit). With the TPU, the FPS increased almost
8.5 times. Applying a frame-skipping technique may reduce the load further,
which will help to increase the accuracy.
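Frame skipping can be sketched as a generator that passes only every n-th frame to the detector. The function name and the `stride` parameter are hypothetical; the text does not specify how many frames are skipped.

```python
def skip_frames(frames, stride):
    """Yield (index, frame) for every `stride`-th frame of an iterable of frames."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame


# With stride 3, a 30 fps stream sends 10 frames per second to the detector.
processed = [index for index, _ in skip_frames(range(30), 3)]
print(len(processed), processed[:4])
```

A stride of 1 disables skipping, so the setting can be tuned to the frame rate the device can sustain.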
In the second phase, the enhanced bag-of-words model (SAOOP) [12] can be used
to identify the threat level of an object. The model has to be pre-trained
for the specific requirement. Crime datasets such as Crime in Vancouver and
Ontario Crime Statistics can be applied to the model to increase precision. As
Figure 4 [12] shows, using SAOOP improves all three performance measures.
The model is light-weight. The labels detected in the object detection phase
determine the frame value. By comparing the threat level against a certain
threshold, frames can be kept or discarded. The threshold is lowered in
areas that require high security, so more frames and more footage are stored.
In areas that are already quite secure, the threshold is raised. The
system can thus be tuned for any specific location. The third phase
of the research covers optimization. It can easily be achieved with the matrix
multiplication mentioned in the overview section. Finally, a codec such as H.265
is used to compress the video further. The footage then reaches an optimal
state with very low required storage.
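The keep-or-discard decision can be sketched as a threshold test on a per-frame threat level. Everything specific in this sketch is an assumption made for illustration: the labels, the scores in `THREAT_SCORES`, and the rule that a frame's threat level is the highest score among its labels. In the proposed system the scores would come from the pre-trained model rather than a fixed table.

```python
# Hypothetical threat scores per detected label, on a 0 to 1 scale.
THREAT_SCORES = {"person": 0.4, "car": 0.2, "knife": 0.9, "dog": 0.1}


def frame_threat(labels, scores=THREAT_SCORES):
    """Threat level of a frame: the highest score among its detected labels."""
    return max((scores.get(label, 0.0) for label in labels), default=0.0)


def filter_frames(detections, threshold):
    """Return the indices of frames whose threat level reaches the threshold.

    detections: list of (frame_index, labels) pairs from the detection phase.
    """
    return [index for index, labels in detections
            if frame_threat(labels) >= threshold]


detections = [(0, []), (1, ["dog"]), (2, ["person"]), (3, ["person", "knife"])]
print(filter_frames(detections, threshold=0.3))  # high-security setting
print(filter_frames(detections, threshold=0.8))  # already secure area
```

Lowering the threshold keeps more frames and raising it keeps fewer, which is the tuning behaviour described above.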
The purpose of the research was to create priority-based filtering with a
low-cost system. The device needed to implement this system is a
Raspberry Pi 4 (4 GB), which costs only about $59 USD. In addition, a
TPU can be added to gain some extra performance. A TPU such as the Coral USB
Accelerator also costs only $59 USD. At the initial purchase of any surveillance
system, adding an extra $118 USD can provide a decent storage-saving option.
Since the threshold value can be changed at any time, buyers can choose any type
of saving plan. The system works well for short-term surveillance systems. Such
systems remove a certain amount of data once the storage fills up. With the
proposed system, more data can be stored based on priority, so the
surveillance system can retain data for much longer. Long-term surveillance
systems can also benefit from the proposed system: the savings may reduce
the amount of storage bought each year. Judging from this analysis, the
system is feasible with current technology. It is also cost-effective and, with
proper entrepreneurship, can become available to a mass population.
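The effect of filtering on retention time can be illustrated with a short calculation. The hardware prices and the 30 GB per day rate are taken from the text; the 1 TB disk and the kept fraction of 0.25 are hypothetical values chosen only for the example, since the paper reports no measured filtering ratio.

```python
RASPBERRY_PI_USD = 59   # Raspberry Pi 4 (4 GB), price quoted in the text
CORAL_TPU_USD = 59      # Coral USB Accelerator, price quoted in the text
HARDWARE_COST_USD = RASPBERRY_PI_USD + CORAL_TPU_USD


def retention_days(capacity_gb, gb_per_day, kept_fraction=1.0):
    """Days of footage that fit on a disk when only a fraction of the data is kept."""
    if not 0.0 < kept_fraction <= 1.0:
        raise ValueError("kept_fraction must be in (0, 1]")
    return capacity_gb / (gb_per_day * kept_fraction)


print(f"extra hardware: {HARDWARE_COST_USD} USD")
print(f"unfiltered: {retention_days(1000, 30):.1f} days on 1 TB")
print(f"keeping 25% (hypothetical): {retention_days(1000, 30, 0.25):.1f} days on 1 TB")
```

Retention grows in inverse proportion to the kept fraction, so halving the stored data doubles the time before old footage has to be removed.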
5 Conclusion
The invention of surveillance systems opened up many possibilities, since a
machine, unlike a human, can monitor without a break. But the storage problem
has always been there. This is an active research field, and many techniques
have been applied to stop this wastage. In this paper, we tried to create a
system in which the machine intelligently decides what to store. First, we
discussed how storage is wasted in video surveillance systems due to frame
duplication. Generally, changes between adjacent frames are marginal. Our
objective was to filter out the unnecessary frames. We succeeded in removing
those with object detection and sentiment analysis. We tuned the training set
of the sentiment analysis with specific datasets to get the required outcome.
Lastly, in the optimization process, duplicate frames among the selected frames
were removed. The video was encoded and then stored on the secondary device.
Finally, we discussed how the proposed system detects objects of interest
in front of the camera and then filters frames according to the type of
object. The paper showed that changing the threshold value lets users create
their own saving plan. The system is flexible because it can be tuned to the
requirement. The cost analysis shows that this is a very cost-effective
solution. The individual performance of each proposed method was also examined.
This shows that the system is feasible and can be made attainable for a mass
population.
Future work: Due to time limitations, a complete implementation of the
system was not possible. A proper implementation in the future can measure the
efficiency of the system. Combining it with other related work can provide
performance gains as well. With the system framework, any neural network can
be used in the future for object detection.
6 References
[1] M. Shengcheng et al., "A Retrieval Optimized Surveillance Video Storage
System for Campus Application Scenarios", Journal of Electrical and Computer
Engineering, vol. 2018, Apr. 2018. Accessed on: Sep. 05, 2020. [Online].
[2] M. R. Sunitha et al., "A Survey on Moving Object Detection and Tracking
Techniques", International Journal of Engineering and Computer Science, vol. 5,
no. 5, pp. 16376-16382, May 2016. DOI: 10.18535/ijecs/v5i5.11
[3] K. R. Jadav et al., "Vision based moving object detection and tracking",
2011.
[4] S. S. Kshirsagar and P. A. Ghonge, "Movement Detection Using Image
Processing", International Journal of Science and Research, vol. 4, no. 5,
Feb. 2015.
[5] K. Mahadar et al., "Intelligent system for storage optimization of video
surveillance using in-camera processing", International Journal of Advance
Research in Science and Engineering, vol. 6, no. 12, pp. 1245-1250, Dec. 2017.
[6] J. Barthélemy et al., "Edge-Computing Video Analytics for Real-Time Traffic
Monitoring in a Smart City", Sensors, vol. 19, no. 9, pp. 2048-2071, May 2019.
DOI: 10.3390/s19092048
[7] A. N. Ranasinghe and S. R. Liyanage, "Eliminating the storage wastage of
CCTV cameras by motion detection", International Postgraduate Research
Conference 2015, University of Kelaniya, Dec. 2015.
[8] D. Suganya et al., "Storage Optimization of Video Surveillance from CCTV
Camera", International Research Journal of Engineering and Technology, vol. 5,
no. 3, pp. 1356-1359, Mar. 2018.
[9] D. J. Neal and S. Rahman, "Video Surveillance in the Cloud", International
Journal on Cryptography and Information Security, vol. 2, no. 3, Sep. 2012.
DOI: 10.5121/ijcis.2012.2301
[10] K. Kalirajan and M. Sudha, "Moving Object Detection for Video
Surveillance", Journal of Electrical and Computer Engineering, vol. 2015,
Mar. 2015. Accessed on: Sep. 07, 2020. [Online].
DOI: https://doi.org/10.1155/2015/907469
[11] P. Bordes and G. Clare, "An overview of the emerging HEVC standard",
International Symposium on Signal, Image, Video and Communications,
Valenciennes, Jul. 2012. DOI: 10.13140/RG.2.1.3482.5762
[12] D. M. El-Din, "Enhancement Bag-of-Words Model for Solving the Challenges
of Sentiment Analysis", International Journal of Advanced Computer Science and
Applications, vol. 7, no. 1, pp. 244-252, Jan. 2016.
[13] J. Redmon et al., "You Only Look Once: Unified, Real-Time Object
Detection", 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2016, pp. 779-788.