
Automated Real-Time Video Surveillance Algorithms

for SoC Implementation: A Survey


Ehab Salahat, Advisor: Hani Saleh, Co-Advisor: Baker Mohammad, Co-Advisor: Mahmoud Al-Qutayri, Co-Advisor:
Andrzej Sluzek and Co-Advisor: Mohammad Ismail
Department of Electrical Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi
E-mails: ehab.salahat@kustar.ac.ae, hani.saleh@kustar.ac.ae, baker.mohammad@kustar.ac.ae, Mahmoud.alqutayri@kustar.ac.ae,
andrzej.sluzek@kustar.ac.ae and mohammad.ismail@kustar.ac.ae

Abstract: Numerous techniques and algorithms have been developed and
implemented, primarily in software, for object tracking, detection and
recognition. A few attempts have been made to implement some of the
algorithms in hardware. However, those attempts have not yielded optimal
results in terms of accuracy, power and memory requirements. The aim of this
paper is to explore and investigate a number of possible algorithms for
real-time video surveillance, revealing their various theories,
relationships, shortcomings, advantages and disadvantages, and pointing out
their unsolved problems of practical interest in a principled way. This would
be of tremendous value to engineers and researchers trying to decide which
algorithm among the many in the literature is most suitable for a specific
application and a particular real-time System-on-Chip (SoC) implementation.

Index Terms: Real-Time Video Surveillance; Maximally Stable Extremal Regions;
Scale-Invariant Feature Transform; Speeded Up Robust Features; Background
Subtraction; FPGA.

I. INTRODUCTION

Visual surveillance of unknown and dynamically changing environments is one
of the most challenging applications of machine vision. This active research
area invariably requires high-performance computation resources due to the
volume of data to be processed. Video surveillance has a wide range of
applications in both public and private environments, such as homeland
security, crime prevention, traffic control, accident prediction and
detection, patient monitoring, UAV-based surveillance, airport surveillance
systems, etc. There is an increasing interest in video surveillance due to
the growing availability of cheap sensors and processors, and also a growing
need for safety and security of the public [1]. Thus, in mobile and
autonomous applications, energy efficiency and accuracy of the system are
crucial requirements. For speedy actions, the operations must be performed in
real time. Therefore, advanced vision modules for field systems are often
based on energy-efficient embedded video processors supporting surveillance,
tracking and other processing needs for various applications.

Inevitably, next generations of visual surveillance systems would incorporate
more intelligent algorithms which could automatically detect/identify objects
from dynamically updated visual databases (including detection of unspecified
objects previously seen in the database images). Such systems would be
effectively performing visual data matching and retrieval. Researchers need
to develop intelligent systems to efficiently extract information from
large-scale real-time data. Hardware-based (or hardware-supported)
implementation of key point detection and description has recently been
attracting the interest of the research community (e.g., see [2-4]). In this
work, we investigate possible algorithms for object detection and
recognition. We also compare these algorithms for automated real-time video
surveillance SoC implementation based on many performance metrics. This is
then used to establish the focus of our research in this field.

The remaining part of this paper is structured as follows: in section II, the
investigated performance metrics and challenges are presented. In section
III, the different candidate algorithms are overviewed, highlighting their
merits and demerits. Section IV states our research focus and, finally, the
paper findings are summarized in section V.

II. CHALLENGES AND PERFORMANCE METRICS

Performance is impacted by many variables, making exhaustive testing of all
use cases virtually impossible. A clear understanding of all these variables
and their impact is crucial to designing a successful and meaningful test for
the system. Key factors influencing the performance of a real-time object
detection/recognition system include, but are not limited to, the points
discussed below [5].

A. Camera and Constant Environment Parameters
The first challenge is mainly about the physical installation of the video
source. First, the view of the video streaming source, i.e., the camera, is
finite and limited by scene structures [6]. Other camera parameters include
the quality and properties of the video as generated by the camera (color,
grey-scale, low-light, infrared, resolution, pixel depth, and frame rate).

B. Variable Environment Parameters
These include illumination levels, reflections, weather conditions (sun,
cloud, rain, snow, fog, and wind), seasonal changes, nuisance targets,
lights, and shadows [5].

C. Processing Environment Parameters
Factors such as limited memory, processing power and speed in embedded
processing platforms like Digital Signal Processors (DSPs) and Field
Programmable Gate Arrays (FPGAs) result in algorithmic changes that can
impact the performance of the individual components and of the whole system
[5].

978-1-4799-2452-3/13/$31.00 ©2013 IEEE


D. Communication Parameters
These mainly include bandwidth, synchronization, camera coordination,
compression noise and artifacts, potentially dropped frames, and transmission
errors.

E. Application Parameters
Application parameters include targets of interest (humans, vehicles,
shopping carts, works of art, etc.), tolerable missed-detection and
false-alarm rates and their desired trade-off (e.g., whether a high detection
rate or a low false-alarm rate is more important), type of application (e.g.,
security, business, etc.), maximum allowable latency, learning and
self-adaptation [5].

III. CANDIDATE ALGORITHMS

As indicated in the previous section, many challenges and performance metrics
affect surveillance systems. The choice of the optimal algorithm can enhance
the performance and help in resolving these challenges. Many object detection
algorithms seem to be excellent candidates (e.g., Difference-of-Gaussians
(DoG), Maximally Stable Extremal Regions (MSER), Fully Affine Invariant
Feature Detector (FIAF), Scale Invariant Feature Transform (SIFT), Speeded Up
Robust Features (SURF), Background Subtraction, etc.) [6-8], differing in
their capabilities and requirements. Some potential algorithms for real-time
video surveillance SoC designs are briefly introduced below.

A. Background Subtraction
Background subtraction is widely used for detecting moving objects from
static cameras. The background is first estimated and then subtracted from
the input frame; applying a threshold to the difference yields the
foreground, i.e., the object. Different techniques can be used to estimate
the background. The simplest assumes the background to be the previous frame;
another possibility is to apply a mean/median filter to the last N frames and
take the result as the background. This algorithm is adaptive to dynamic
background changes, easy to implement, fast, and applicable to real-time
implementation, as in [6-7]. However, its drawbacks are its dependency on the
object speed and frame rate, its large memory requirement, and, most
importantly, the fact that the threshold used is neither global nor
time-invariant.

B. Maximally Stable Extremal Regions
The MSER algorithm is an interest region detector originally used in
wide-baseline stereo matching [8]. MSER operates on the input image directly
without any smoothing, which results in detection of both fine and coarse
structures. It is shown in [9] that MSER performs well compared to other
local detectors. The main advantage of MSER detection is that it is the
fastest affine-invariant region detector. To the best of our knowledge, the
only drawback of MSER is that its performance degrades with blurred images,
which can be resolved by smart installation of the camera topology [8-10].

C. Speeded Up Robust Features
SURF is a scale- and rotation-invariant interest point detector and
descriptor [11]. The algorithm extracts salient points from the image and
computes descriptors of their surroundings that are invariant to scale,
rotation and illumination changes. However, detection and extraction are
computationally demanding and therefore cannot be used in systems with
limited computational power [11].

IV. RESEARCH FOCUS

The proposed visual chip is needed because available designs do not target
affine-invariant key point detectors and descriptors on a single chip. We aim
to design a dedicated SoC using ASIC design with advanced technology,
implementing a complete mobile visual data processing system, and employing
state-of-the-art memory-efficient and power-saving designs. The need for such
surveillance systems includes access control in security-sensitive locations
such as military bases and governmental facilities, crowd flux statistics,
congestion analysis, traffic flow management, and anomaly detection and
alarming to identify abnormal behavior. The proposed SoC will replace the
passive human operator that all the preceding applications require, and
promises to achieve high accuracy without human interaction with the system.

V. CONCLUSION

Real-time video surveillance is an active research area with a wide spectrum
of promising applications. In this short paper, we presented a brief overview
of some potential algorithms for object recognition, and compared their pros
and cons. Motivated by the limitations of the current system designs, we aim
to design a mobile, power-efficient, real-time SoC for video surveillance to
resolve these limitations.

REFERENCES
[1] X. Wang, "Intelligent multi-camera video surveillance: A review,"
Pattern Recognition Letters, vol. 34, no. 1, 2013.
[2] E. S. Kim and H.-J. Lee, "A Practical Hardware Design for the Keypoint
Detection in the SIFT Algorithm with a Reduced Memory Requirement," IEEE Int.
Symp. on Circuits and Systems (ISCAS 2012), pp. 770-773, May 2012.
[3] V. Bonato, E. Marques, and G. A. Constantinides, "A Parallel Hardware
Architecture for Scale and Rotation Invariant Feature Detection," IEEE Trans.
Circuits Syst. Video Technol., vol. 18, no. 12, pp. 1703-1712, Dec. 2008.
[4] Qi Zhang, Huron Chen, Yimin Zhang, and Yinlong Xu, "SIFT implementation
and optimization for multi-core systems," IEEE Symp. IPDPS 2008, pp. 1-8,
April 2008.
[5] P. L. Venetianer and H. L. Deng, "Performance evaluation of an
intelligent video surveillance system: A case study," Comput. Vis. Image
Understanding, vol. 114, no. 11, pp. 1292-1302, 2010.
[6] A. McIvor, "Background subtraction techniques," in Proc. of Image and
Vision Computing, New Zealand, November 2000.
[7] M. Piccardi, "Background subtraction techniques: a review," in
Proceedings of the IEEE International Conference on Systems, Man and
Cybernetics, vol. 4, The Hague, The Netherlands, October 2004.
[8] J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust Wide Baseline Stereo
from Maximally Stable Extremal Regions," Proc. 13th British Machine Vision
Conf., pp. 384-393, 2002.
[9] K. Mikolajczyk and C. Schmid, "A performance evaluation of local
descriptors," IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 27, no. 10, pp. 1615-1630, Oct. 2005.
[10] F. Kristensen and W. J. MacLean, "Real-Time Extraction of Maximally
Stable Extremal Regions on an FPGA," 2007 IEEE Int. Symp. on Circuits and
Systems (ISCAS), pp. 165-168, May 2007.
[11] H. Bay, T. Tuytelaars, and L. J. Van Gool, "SURF: Speeded Up Robust
Features," in ECCV, pp. 404-417, 2006.
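
As an illustration of the background subtraction scheme described in Section III-A, the following is a minimal software sketch, not the implementation used in [6-7] and not a hardware design. It assumes grey-scale frames stored as nested lists of integer pixel values; the names frame_difference_mask and MedianBackgroundSubtractor and the parameters history and threshold are hypothetical and chosen only for this example. It shows both background estimates mentioned in the text: the previous frame, and a per-pixel median over the last N frames.

```python
from collections import deque
from statistics import median


def frame_difference_mask(prev_frame, frame, threshold):
    """Simplest variant: treat the previous frame as the background.

    Returns a boolean mask that is True where the absolute difference
    between the two frames exceeds the threshold (foreground).
    """
    return [
        [abs(pixel - bg_pixel) > threshold for pixel, bg_pixel in zip(row, bg_row)]
        for row, bg_row in zip(frame, prev_frame)
    ]


class MedianBackgroundSubtractor:
    """Background = per-pixel median of the last `history` frames (assumed N)."""

    def __init__(self, history=5, threshold=25):
        # The whole frame history is buffered, which is the memory cost
        # the text lists among the drawbacks of this approach.
        self.frames = deque(maxlen=history)
        self.threshold = threshold

    def background(self):
        """Estimate the background as the per-pixel median of stored frames."""
        rows = len(self.frames[0])
        cols = len(self.frames[0][0])
        return [
            [median(f[r][c] for f in self.frames) for c in range(cols)]
            for r in range(rows)
        ]

    def apply(self, frame):
        """Return the foreground mask for `frame`, then add it to the history."""
        if not self.frames:
            # No background estimate yet: report no foreground.
            mask = [[False] * len(row) for row in frame]
        else:
            mask = frame_difference_mask(self.background(), frame, self.threshold)
        self.frames.append([list(row) for row in frame])
        return mask


if __name__ == "__main__":
    static = [[10, 10, 10], [10, 10, 10]]
    moving = [[10, 200, 10], [10, 10, 10]]  # one bright "object" pixel
    subtractor = MedianBackgroundSubtractor(history=3, threshold=25)
    for _ in range(3):
        subtractor.apply(static)
    print(subtractor.apply(moving))
```

In this sketch the threshold is a single fixed value applied to every pixel and every frame, which is a simplification made for brevity; choosing the threshold is the difficulty that Section III-A singles out as the most important drawback.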
