Heterogeneous Information Fusion and Visualization for a Large-Scale Intelligent Video Surveillance System
Other topics in 3GSS, such as video analytics tasks, are not covered in this paper; readers are referred to [7]-[16]. This paper gives more attention to the proposed event-driven visualization and data fusion.

The remainder of this paper is organized as follows. Section II discusses background information and reviews related work on large-scale intelligent surveillance systems. Section III introduces the data fusion and event-driven visualization design. In Section IV, the details of the remaining system components are presented, including sensor tasking and communication management. Section V describes a system testbed operated on a campus and a simulated experiment for the display switching arbitration. Finally, conclusions and suggestions for future work are provided in Section VI.

II. STATE-OF-THE-ART

The study of large-scale systems for wide-area video monitoring is briefly reviewed and discussed here. A substantial number of studies have been devoted to video analytics, cooperative surveillance, and software architecture in the context of multi-camera surveillance systems; however, only a few of these systems have explored the fusion and visualization problem.

Intelligent video surveillance, which applies computer vision algorithms to detect and recognize objects and events for specific prevention and forensic tasks [17], aims at active monitoring. A visualization scheme that effectively displays videos and events can simplify human operation and reduce manual tasks. A matrix arrangement of video displays, showing the objects and events of each single camera, has been widely used as the traditional visualization scheme. However, this one-camera-one-monitor methodology is not feasible for a large-scale system with a high number of cameras [18]. An alternative approach employs a large screen to show all camera views in turn or randomly; however, the lack of semantic information, such as the spatial and temporal relationships among objects and events and the importance of camera views, incurs heavy overload in managing such systems. Displaying tremendous amounts of information within a limited visual space is not a trivial task.

Object- and event-oriented interface designs are efficient means of information visualization for large-scale systems. In [19] and [20], only information about specific objects, such as their locations and tracks, is visualized to sustain attention. They offer a single interface that locates the object information across multiple cameras on a floor-plan map to assist navigation. [21] provides predictive selection of camera views by object tracking to simplify the operator's tasks. Nevertheless, methods [19]-[21] are applicable only to small-scale systems because they do not employ behavior analysis and event detection to extract high-level information. An event-oriented interface provides less, but more important, information for large-scale systems and facilitates a responsive interface for critical threats. IBM S3 [22] integrates diverse event detectors with a one-detector-one-monitor methodology into a large-scale deployable system. The number of user interfaces therefore increases linearly, and the workload of the operator increases dramatically, as the number of detectors increases. Listing all events in an event inbox [23] is beneficial, but text information alone is insufficient for responsive decisions; nonverbal information such as keyframes and video clips is important.

Video summarization, such as keyframes and skims, provides a compact representation by eliminating the redundancy of videos and preserving only crucial frames for better visualization of stored videos. Keyframes are more widely used than video skimming because of the ease of browsing and navigation [24]. A fundamental approach to keyframe extraction considers only the changes of pixels and/or features in a frame [25]-[27]. A more compact keyframe representation achieved by incorporating object information is proposed by Erol and Kossentini [28] and Kim and Hwang [29]. Studies [7] and [30]-[31] further demonstrate that object information combined with high-level features based on human visual perception improves the optimization of keyframe extraction. In addition to these methods for summarizing single videos, [32] and [33] are more appropriate for large-scale systems because they extract keyframes from multiple videos acquired from multiple cameras with different viewpoints. A highly compact representation is achieved by removing more redundant frames. However, pixel and frame changes cannot be applied to multicamera keyframe extraction; instead, object and semantic features become more critical for extracting meaningful keyframes from multiple videos. Both centralized [32], [33] and decentralized [34], [35] networking approaches have been proposed. A global optimization mechanism that not only adapts to the networking approach but also incorporates semantic features is important for video summarization in large-scale systems.

Live video display within a limited visual space is also challenging for a high number of cameras. A hand-off approach for finding the next most meaningful camera is indispensable for focusing visual attention on a specific object. Some studies [23], [36]-[38] extract high-level semantic features for selecting a dominant camera. The dynamic camera switching method proposed in [23] identifies whichever camera exhibits the greatest image difference as the main sensor view. Kim and Kim [37] propose a probabilistic camera hand-off (PCH) method for continuous object tracking. The ratio of foreground blocks and the ratio of angle distance between the camera and the object are calculated for each camera to obtain a proximity probability, and the camera with the highest proximity probability is identified as the dominant camera. The dominant camera can be identified using the tracking result of this hand-off approach, and its object trajectory is represented on a map by using a homography. Goshorn et al. [38] propose a cluster-based multicamera surveillance network that generates a camera selection manager (CSM) for a cluster network. The CSM selects the optimal view depending on weighted importance and four semantic features.
visualization agent, which is described in Section IV. The agent analyzes the synchronized event clip and extracts keyframes in real time to promptly show the keyframe in the event box when an event occurs. The synchronized event clip can be displayed in the video streaming box alongside the event box, either manually or automatically.

The space-time correspondence of multimodal information for a single event is therefore effectively fused in this compact interface. Time stamps of events and GPS coordinates of cameras are the two fundamental data for space-time synchronization. A spatio-temporal hypergraph [40] could be applied to organize and synchronize the multimodal information.
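As a rough illustration of this space-time synchronization, the sketch below pairs an event time stamp with the GPS coordinates of nearby cameras and derives a clip window for the event. The radius, window lengths, and data classes are illustrative assumptions, not values or structures taken from the paper.

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Camera:
    cam_id: str
    lat: float    # GPS latitude of the camera
    lon: float    # GPS longitude of the camera

@dataclass
class Event:
    cam_id: str
    timestamp: float  # UNIX time of the triggering frame

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def related_cameras(event, cameras, radius_m=50.0):
    """Cameras spatially close to the event source (hypothetical 50 m radius)."""
    src = next(c for c in cameras if c.cam_id == event.cam_id)
    return [c for c in cameras
            if haversine_m(src.lat, src.lon, c.lat, c.lon) <= radius_m]

def clip_window(event, pre_s=5.0, post_s=10.0):
    """Time window used to cut a synchronized event clip from each stream."""
    return event.timestamp - pre_s, event.timestamp + post_s
```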
The traditional way of displaying multiple events or all camera videos is infeasible for 3GSS because of the limited space of the interface. A smart display switching method is required to assist a single human operator in operating a multicamera surveillance system at a relatively high level of abstraction. An event-driven visualization strategy with dynamic arbitration of display switching is devised here. The display of our visualization interface can automatically switch to the most meaningful camera views by using a visualizability model.

Display switching arbitration aims at providing a dynamic view of video playing. The display switching arbitration approach consists of a local optimized process and a decentralized optimized process. Local visualizability signifies the most representative frame of an object in the local video. In multiview videos, clustered visualizability represents the most representative frame that provides the optimal view. After assessing the semantic significance of the visualizabilities, we design two criterion functions that apply the visualizabilities to achieve the integration of meaningful representations.

Let us first consider a cluster of cameras $\mathbf{C} = \{C_1, C_2, \ldots, C_N\}$. For visualizability modeling, we express a scoring vector as a linear combination of two individual terms:

$$V = V_e + \gamma \cdot V_{obj}. \quad (1)$$

The event term $V_e$ encodes the contribution of events to visualizability, and the object term $V_{obj}$ encodes the contribution of object features. A penalty $\gamma$, limited to the range $[0,1)$, is added to adjust the importance of the object features with respect to the behaviors in the monitoring area. The dominant camera $C_v$ is obtained by calculating the visualizability of each camera, $V(C_i)$, and then choosing the camera with the highest visualizability score, $C_v = \arg\max_{C_i} V(C_i)$, where $C_i$ is the $i$th camera.

The event term is modeled by a weighted indicator function as

$$V_e(C_i) = P_e \cdot \Delta(C_i) \quad (2)$$

where $\Delta(\cdot)$ is 1 if an event is triggered in camera $C_i$, and $P_e$ is defined as the priority of the event.

The object term represents the score of a weighted sum of $J$ object features and is defined as

$$V_{obj}(C_i) = \sum_{j=1}^{J} w_j \, X_j(C_i). \quad (3)$$

The parameter $X_j(C_i)$ is the $j$th element in the feature vector $X(C_i)$, which is calculated by applying the local optimized process to camera $C_i$. We perform keyframe extraction in a specific period as the local optimized process. The process is expressed as

$$X(C_i) = X_{\epsilon}(C_i), \quad \epsilon = \arg\max_{w,\; t-\delta < w < t+\delta} F_L\big(X_w(C_i), w\big) \quad (5)$$

where the criterion function $F_L(X_t(C_i), w) = \sum_{j=1}^{J} w_j \times X_j(C_i)$ can be considered the local object visualizability. The weighted sum of $X_j(\cdot)$ is the representative score of the local camera.

The proposed method considers three object features, namely the size of the object region, the region containing the object's skin, and the face of the object, to calculate the presentation probability for selecting a camera. We employ a Kalman filter to optimize all the features, reducing the measurement noise in these object features.
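To make the arbitration concrete, the following is a minimal sketch of how the visualizability score and the dominant-camera choice of (1)-(3) and (5) can be computed. The feature values, weights, event priority, and the simple scalar Kalman smoothing are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative weights for the J = 3 object features
# (object-region size, skin region, face region); not the paper's values.
W = np.array([0.5, 0.3, 0.2])
GAMMA = 0.8           # penalty gamma in (1), assumed
EVENT_PRIORITY = 1.0  # P_e in (2), assumed


def kalman_smooth(z, q=1e-3, r=1e-2):
    """Scalar Kalman filter over a 1-D feature sequence to reduce measurement noise."""
    x, p = z[0], 1.0
    out = []
    for meas in z:
        p += q                  # predict
        k = p / (p + r)         # Kalman gain
        x += k * (meas - x)     # update with the new measurement
        p *= (1.0 - k)
        out.append(x)
    return np.array(out)


def local_score(features):
    """F_L: weighted sum of the J object features for each frame in the window.

    `features` has shape (T, J): one row X_w(C_i) per frame w in (t - delta, t + delta).
    """
    smoothed = np.column_stack([kalman_smooth(features[:, j])
                                for j in range(features.shape[1])])
    return smoothed @ W         # one F_L value per frame


def visualizability(features, event_triggered):
    """V(C_i) = V_e + gamma * V_obj, using the keyframe chosen by (5)."""
    scores = local_score(features)
    keyframe = int(np.argmax(scores))          # epsilon in (5)
    v_obj = scores[keyframe]                   # weighted feature sum at the keyframe
    v_e = EVENT_PRIORITY * (1.0 if event_triggered else 0.0)
    return v_e + GAMMA * v_obj, keyframe


def dominant_camera(window_features, events):
    """Pick C_v = argmax_i V(C_i) over a camera cluster.

    `window_features[i]` is the (T, J) feature matrix of camera i;
    `events[i]` is True if an event was triggered in camera i.
    """
    scores = [visualizability(f, e)[0] for f, e in zip(window_features, events)]
    return int(np.argmax(scores))
```

In this sketch the same weights serve both the local criterion and the object term, mirroring how the weighted feature sum is reused above.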
IV. THE PROPOSED LARGE-SCALE INTELLIGENT VIDEO SURVEILLANCE SYSTEM

The proposed large-scale IVS system has a complex software and hardware architecture for use in a real environment; the detailed architecture is shown in Fig. 4. The main constituents of this system are four subsystems: intelligent visualization, sensor tasking, communication, and video streaming and storage.
Each collaborative task in the cooperative surveillance group is a single task involving two or more cameras and two or more video analytics for an event. It is beneficial for a monitored area wider than one camera view. A PTZ camera is always utilized to complement a static camera. The video analytics of a static collaborative task are decentralized and performed on embedded systems in the PTZ cameras. Collaborative tasks are reactive rather than proactive.

Two collaborative tasks are implemented. In a loitering detection task, a moving target in a wide open area is monitored through the automatic control of an embedded PTZ camera [15], combined with an intrusion detection algorithm on a static camera for continuous tracking. An illegal parking task combines a car parking detection algorithm with human face detection [16] for a dynamic scene. A car parking event detected at an illegal position triggers the analytics in a PTZ camera to dynamically track a high-resolution human face, which is useful for forensics.
C. Communication Subsystem

A publisher-subscriber communication pattern was implemented to facilitate message delivery and enhance cooperation among the event agent, the video acquisition and streaming agent, the visualization agent, the attention tasks, and the Web clients. The sender is called a topic publisher; according to the registered publisher-subscriber table, the queue manager sends the information from the appointed publisher to all subscribers who require it. Because the information may be too abundant to transmit, the queue manager assigns multiple queues to the publishers. Three topics and five queues are set for the various publishers, and a topic can map to several queues.

When an event is triggered, the visualization agent publishes the fused information to the visualization topic, which is handled by the alert queue and the short message service (SMS) queue. The alert queue delivers push notifications to Web browsers, and the SMS queue sends an SMS notification. The keyframe extracted during the summarization mission consists of simple text messages and links that can be used to send SMS notifications to mobile clients; such links can be effective for driving users to a Web site or download link. Compared with push notifications, which are limited to smartphones, SMS support is ubiquitous and available on all phones. Furthermore, a video streaming server publishes on the video streaming topic; the video queue obtains the videos and then embeds them into the main interface.
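As a rough illustration of this topic-to-queue routing, the sketch below implements a tiny in-process queue manager with a registered publisher-subscriber table. The topic and queue names are hypothetical placeholders, not the system's actual identifiers.

```python
from collections import defaultdict
from queue import Queue

class QueueManager:
    """Minimal publisher-subscriber router: one topic can map to several queues."""

    def __init__(self):
        self.routing = defaultdict(list)   # registered publisher-subscriber table
        self.queues = {}

    def register(self, topic, queue_name):
        """Bind a topic to a named queue (the queue is created on first use)."""
        q = self.queues.setdefault(queue_name, Queue())
        self.routing[topic].append(q)
        return q

    def publish(self, topic, message):
        """Deliver a publisher's message to every queue subscribed to the topic."""
        for q in self.routing[topic]:
            q.put(message)

# Hypothetical wiring mirroring the description: the visualization topic feeds
# both the alert queue (Web push) and the SMS queue; video streams feed the video queue.
qm = QueueManager()
alert_q = qm.register("visualization", "alert")
sms_q = qm.register("visualization", "sms")
video_q = qm.register("video_streaming", "video")

qm.publish("visualization",
           {"event": "illegal_parking", "camera": "C3", "keyframe": "kf_0137.jpg"})
print(alert_q.get())   # the Web alert consumer would render this in the event box
print(sms_q.get())     # the SMS consumer would format a short text with a link
```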
D. Video Streaming and Storage Subsystem

The proposed system provides its output on a Web-based user interface and receives streaming videos from multimodal cameras. The Web interface facilitates use of the system on portable devices such as smartphones and tablets. The user interface is melded from static and dynamic data, where the HTTP protocol is used to access static data such as the interface layout. The dynamic data achieve bidirectional communication with a server through an AJAX or WebSocket protocol, facilitating live content transmission and the identification of real-time events. For streaming live video from existing IP cameras and streaming servers, a VLC plugin with AJAX is used. With the rapid development of HTML5 technology, HTML5 elements are effective for accessing video clips and map services.
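The live-event side of this interface can be pictured with a small AJAX-style polling endpoint. The sketch below uses Flask purely as an illustrative stand-in for the Web tier described above; the route, payload fields, and in-memory event list are assumptions.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In the real system this would be fed by the visualization agent through the
# alert queue; here it is a plain in-memory list for illustration.
EVENTS = [
    {"id": 1, "type": "loitering", "camera": "C1", "time": 1526371200, "keyframe": "/kf/1.jpg"},
    {"id": 2, "type": "illegal_parking", "camera": "C3", "time": 1526371260, "keyframe": "/kf/2.jpg"},
]

@app.route("/events")
def events():
    """Return events newer than the client's last-seen id (polled via AJAX)."""
    since = int(request.args.get("since", 0))
    fresh = [e for e in EVENTS if e["id"] > since]
    return jsonify(fresh)

if __name__ == "__main__":
    app.run(port=8080)
```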
The cameras in the system are not restricted to networked cameras; analog cameras are also supported, with the analog signal digitized through a video server that provides real-time streaming protocol (RTSP) video to the back end. To balance network traffic and monitoring needs, each stationary camera owns at least two real-time video streams simultaneously, and the two streams may have different settings. One is a high-quality video encoded in the H.264 format at as high a resolution as possible, providing forensic evidence in the storage subsystem, which is mounted on a large-scale storage area network with cloud storage. The other stream has QVGA resolution, which is sufficient for video analysis. In addition to the historical footage, the storage subsystem contains a relational database management subsystem built on SQLite that serves as the event database; video clips of marked events and the video synopses corresponding to events are recorded with the historical footage. If clients query events, the native video clips can be cut from the event database and streamed from the video streaming server.
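A minimal sketch of such an event database follows; the table layout and column names are assumptions for illustration, not the schema used in the deployed system.

```python
import sqlite3

conn = sqlite3.connect("events.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id        INTEGER PRIMARY KEY,
        camera    TEXT NOT NULL,   -- source camera identifier
        type      TEXT NOT NULL,   -- e.g. loitering, illegal_parking
        start_ts  REAL NOT NULL,   -- event start (UNIX time)
        end_ts    REAL NOT NULL,   -- event end (UNIX time)
        keyframe  TEXT,            -- path of the extracted keyframe
        clip_path TEXT             -- marked clip cut from the historical footage
    )
""")
conn.commit()

def clips_for(camera, t0, t1):
    """Look up event clips of one camera overlapping a query interval,
    so the streaming server can serve them to the Web client."""
    cur = conn.execute(
        "SELECT type, clip_path FROM events "
        "WHERE camera = ? AND end_ts >= ? AND start_ts <= ?",
        (camera, t0, t1))
    return cur.fetchall()
```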
V. TEST RUN AND CASE STUDY

The proposed system has been developed and installed on a campus within the Vision Based Intelligent Environment (VBIE) project [41] in Taiwan. More than 20 cameras, classified into two camera clusters, were set up and distributed in two disjoint regions of the campus, as shown in Fig. 5. Each cluster has a cooperative task with one collaborative PTZ camera.

Configuring the camera networks is not trivial, and several approaches were applied in the proposed system for different situations and constraints. First, both overlapping and nonoverlapping fields of view (FoVs) of cameras are utilized in this system. The four cameras for parking space counting [12] are configured with overlapping FoVs, and the cameras for the other video analytics are configured with nonoverlapping FoVs. Camera calibration for parking space counting utilizes the epipolar-plane constraint among cameras and feature points of the scene to obtain the intrinsic and extrinsic parameters of the cameras. Exact object positions in three-dimensional space can then be estimated for accurate counting of vacant spaces. The pair of PTZ and static cameras in each cooperative surveillance task collaborates to track the same object for an event, which also needs to be coordinated and calibrated. The object location is estimated first in the static view, and the PTZ camera is then controlled to capture a high-resolution object image or to track the behavior of the object. A depth assumption is applied in advance, so the static camera sends only the x and y coordinates to the PTZ camera to adjust the pan and tilt parameters. Details of the camera calibration and configuration for parking space counting and the cooperative surveillance tasks can be found in the published papers.
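As a sketch of that static-to-PTZ hand-off under an assumed fixed target depth, the following converts an image detection in the static view into pan and tilt commands. The intrinsics, mounting offset, and depth value are hypothetical; the real system's calibration is described in the cited papers.

```python
import numpy as np

# Hypothetical static-camera intrinsics and an assumed constant target depth (meters).
FX, FY = 800.0, 800.0     # focal lengths in pixels
CX, CY = 320.0, 240.0     # principal point
DEPTH_Z = 12.0            # assumed distance of targets from the static camera

# Hypothetical rigid offset of the PTZ camera relative to the static camera (meters).
PTZ_OFFSET = np.array([0.5, 0.0, 0.0])

def pixel_to_static_xyz(u, v, z=DEPTH_Z):
    """Back-project a pixel (u, v) of the static view using the assumed depth."""
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return np.array([x, y, z])

def pan_tilt_for(u, v):
    """Pan/tilt angles (degrees) that point the PTZ camera at the detection."""
    p = pixel_to_static_xyz(u, v) - PTZ_OFFSET    # target in the PTZ camera frame
    pan = np.degrees(np.arctan2(p[0], p[2]))      # rotation about the vertical axis
    tilt = np.degrees(np.arctan2(-p[1], np.hypot(p[0], p[2])))  # image v grows downward
    return pan, tilt

# Example: a face detected at pixel (500, 180) in the static view.
print(pan_tilt_for(500, 180))
```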
Fig. 5. Sketch of the two camera clusters of the system on a campus map. The two red rectangles in the top map represent the two disjoint clusters, and the red dots are cameras. 3D views of the two corresponding camera clusters are shown at the bottom.

Fig. 7. Six examples of the 3D interface interacting with the 2D map in the center. The darker areas in the six 3D examples are 3D information from Google Earth, and the lighter areas are live videos blended in by homographic transformation.
Fig. 12. Results of camera selection by (a) the PCH method [37], (b) the CSM method [38], and (c) the proposed method; the X-axis represents the frame number and the Y-axis represents the camera number.

CSM are increased to 75.82% and 78.77%. The results show that the proposed semantic features are very effective and that the optimization algorithm with the visualizability criterion is a more effective design than the two compared algorithms. These two components account for the high accuracy of the proposed method.

Fig. 12 shows the stability of camera selection for the test scenario. Stability refers to the ability of an algorithm to stably generate a cluster keyframe without false detection. Figs. 12(a) and (b) show the results obtained using the PCH and CSM algorithms, respectively. The PCH algorithm detected a dominant camera only from Frame 160 and has false negatives for Frames 70 to 150. Both algorithms were highly unstable from Frame 160, where the object appeared in both cameras 3 and 4. Fig. 12(c) shows a relatively smooth result that is good for display switching. The use of Kalman filtering, the semantic features, and the optimization of the visualization model account for the successful and stable results of the proposed method.

VI. CONCLUSION

This paper presents a large-scale, scalable system extended from 3GSSs. The system integrates numerous computer vision tasks, such as object detection, camera anomaly detection, keyframe extraction, and mobile surveillance, with the knowledge acquired from over 10 years of cooperation among academia, industry, and the government. By integrating the algorithmic tasks through the proposed event-driven visualization and systems engineering techniques, an efficient system for wide-area visual surveillance by a single security operator is presented. A multitier scheme with a novel visualization mechanism is proposed for centralizing all heterogeneous surveillance information into a universal Web-based user interface. A new camera selection method that considers not only objects but also events is developed for display switching arbitration. The method enables meaningful FOV selection and smooth handoff among multiple cameras in both normal and event-alerting situations. The fusion of multimodal information is advocated for event-driven visualization. A collaborative scheme involving sensor tasking at static and dynamic cameras is proposed for object-oriented visualization to aid continuous visual tracking.

Although the visualization mechanism and new system concepts that complement 3GSSs have been presented in this paper, future work remains. In addition to accuracy and stability, the evaluation of system performance could apply other measures such as the time to receive events, ergonomics, and comfort. A subjective evaluation of the degree of assistance that the proposed approach offers security personnel can be quantitatively assessed with respect to user interface issues such as ergonomics and comfort. Robust multiobject tracking across multiple cameras [43] can also be helpful for display switching arbitration. For example, if a person is to be monitored across cameras, an efficient human tracking technique can help the system determine the dominant camera for display switching with more accuracy.

ACKNOWLEDGEMENTS

The authors would like to thank the professors in the Vision Based Intelligent Environment project: Yi-Ping Hung, Sheng-Wen Shih, Yong-Sheng Chen, Jun-Wei Hsieh, Chi-Hung Chuang, Chin-Teng Lin, Cheng-Chang Lian, Chin-Chuan Han, Hsi-Jian Lee, Sheng-Jyh Wang, Daw-Tung Lin, Kuo-Chin Fan, Wen-Thong Chang, and Wen-Hsiang Tsai, for their contributions to the success of this system. The authors also thank the graduate students of these professors for their tireless efforts and support.

REFERENCES

[1] X. Li, R. Lu, X. Liang, X. Shen, J. Chen, and X. Lin, "Smart community: An Internet of Things application," IEEE Communications Magazine, vol. 49, no. 11, pp. 68-75, Nov. 2011.
[2] T. D. Raty, "Survey on contemporary remote surveillance systems for public safety," IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 99, pp. 1-23, Mar. 2010.
[3] G. Smith, "Behind the screens: Examining constructions of deviance and informal practices among CCTV control room operators," Surveill. Soc., vol. 2, no. 2-3, 2002.
[4] X. Wang, "Intelligent multi-camera video surveillance: A review," Pattern Recognition Letters, vol. 34, no. 1, pp. 3-19, 2013.
[5] L. Yu and T. E. Boult, "System issues in distributed multi-modal surveillance," IEEE Conference on Computer Vision and Pattern Recognition, Minnesota, USA, Jul. 2007, pp. 1-2.
[6] F. Porikli, F. Bremond, et al., "Video surveillance: Past, present, and now the future," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 190-198, 2013.
[7] Y. K. Wang, L. Y. Wang, Y. C. Huang, and C. T. Fan, "An online object-based key frame extraction method for the abstraction of surveillance videos," in Proc. National Computer Symposium, Taiwan, pp. 241-249, Nov. 2009.
[8] K. W. Chen, C. W. Lin, T. H. Chiu, M. Y. Chen, and Y. P. Hung, "Multi-resolution design for large-scale and high-resolution monitoring," IEEE Transactions on Multimedia, vol. 13, no. 6, pp. 1256-1268, Dec. 2011.
[9] D. T. Lin and L. Y. Liu, "Method of detecting moving object," U.S. Patent 121268,603 (pending).
[10] Y. Pritch, A. Rav-Acha, and S. Peleg, "Nonchronological video synopsis and indexing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1971-1984, Nov. 2008.
[11] Y. K. Wang, C. T. Fan, K. Y. Cheng, and P. S. Deng, "Real-time camera anomaly detection for real-world video surveillance," International Conference on Machine Learning and Cybernetics, vol. 4, China, Jul. 2011, pp. 1520-1525.
[12] C. C. Huang, S. J. Wang, Y. J. Chang, and T. Chen, "A hierarchical Bayesian generation framework for vacant parking space detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 12, pp. 1770-1785, Dec. 2010.
[13] C. C. Lien, Y. T. Tsai, M. H. Tsai, and L. G. Jang, "Vehicle counting without background modeling," International Conference on Advances in Multimedia Modeling, Part I, Taipei, Taiwan, Jan. 2011.
[14] J. W. Hsieh, C. H. Chuang, S. Y. Chen, C. C. Chen, and K. C. Fan, "Segmentation of human body parts using deformable triangulation," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 40, no. 3, pp. 596-610, May 2010.
[15] C. T. Lin, L. Siana, Y. W. Shou, and T. K. Shen, "A conditional entropy-based independent component analysis for applications in human detection and tracking," EURASIP Journal on Advances in Signal Processing, vol. 2010, Apr. 2010.
[16] R. Khemmar, J. Y. Ertaud, and X. Savatier, "Face detection and recognition based on fusion of omnidirectional and PTZ vision sensors and heterogeneous database," International Journal of Computer Applications, vol. 61, no. 21, 2013.
[17] N. Haering, P. L. Venetianer, and A. Lipton, "The evolution of video surveillance: An overview," Machine Vision and Applications, vol. 19, no. 5-6, pp. 279-290, Sep. 2008.
[18] J. Ferenbok and A. Clement, "Hidden changes: From CCTV to 'smart' video surveillance," in A. Doyle, R. Lipert, and D. Lyon (Eds.), Eyes Everywhere: The Global Growth of Camera Surveillance, pp. 218-234, New York: Routledge, 2011.
[19] P. M. Roth, V. Settgast, P. Widhalm, M. Lancelle, J. Birchbauer, N. Brandle, S. Havemann, and H. Bischof, "Next-generation 3D visualization for visual surveillance," IEEE Conference on Advanced Video and Signal Based Surveillance, Santa Fe, USA, Sep. 2011, pp. 343-348.
[20] A. Girgensohn, D. Kimber, J. Vaughan, T. Yang, F. Shipman, T. Turner, E. Rieffel, L. Wilcox, F. Chen, and T. Dunnigan, "DOTS: Support for effective video surveillance," International Conference on Multimedia, Augsburg, Germany, Sep. 2007, pp. 423-432.
[21] N. Martinel, C. Micheloni, C. Piciarelli, and G. L. Foresti, "Camera selection for adaptive human-computer interface," IEEE Trans. on Systems, Man, and Cybernetics: Systems, vol. 44, no. 5, May 2014.
[22] Y. L. Tian, L. Brown, A. Hampapur, M. Lu, A. Senior, and C. F. Shu, "IBM smart surveillance system (S3): Event based video surveillance system with an open and extensible framework," Machine Vision and Applications, vol. 19, no. 5-6, pp. 315-327, Sep. 2008.
[23] D. Kieran, J. Weir, and W. Q. Yan, "A framework for an event driven video surveillance system," Journal of Multimedia, vol. 6, no. 1, Feb. 2011.
[24] G. C. Chao, Y. P. Tsai, and S. K. Jeng, "Augmented keyframe," J. Vis. Commun. Image R., vol. 21, pp. 682-692, 2010.
[25] A. M. Ferman, A. M. Tekalp, and R. Mehrotra, "Robust color histogram descriptors for video segment retrieval and identification," IEEE Trans. on Image Proc., vol. 11, no. 5, pp. 497-508, May 2002.
[26] H. C. Lee and S. D. Kim, "Rate-driven key frame selection using temporal variation of visual content," Electron. Lett., vol. 38, no. 5, pp. 217-218, Feb. 2002.
[27] K. Sze, K. Lam, and G. Qiu, "A new key frame representation for video segment retrieval," IEEE Trans. on Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1148-1155, 2005.
[28] B. Erol and F. Kossentini, "Automatic key video object plane selection using the shape information in the MPEG-4 compressed domain," IEEE Trans. on Multimedia, vol. 2, pp. 129-138, Jun. 2000.
[29] C. Kim and J. N. Hwang, "Object-based video abstraction for video surveillance systems," IEEE Trans. on Circuits and Systems for Video Technology, vol. 12, pp. 1128-1138, 2002.
[30] E. Spyrou and Y. Avrithis, "A region thesaurus approach for high-level concept detection in the natural disaster domain," Lecture Notes in Computer Science, LNCS 4816, pp. 74-77, 2007.
[31] Z. Ji, Y. Su, R. Qian, and J. Ma, "Surveillance video summarization based on moving object detection and trajectory extraction," Int. Conf. on Signal Processing Systems, Yantai, China, pp. 250-253, Jul. 2010.
[32] J. Yoder, H. Medeiros, J. Park, and A. Kak, "Cluster-based distributed face tracking in camera networks," IEEE Trans. on Image Proc., vol. 19, no. 10, pp. 2551-2563, Oct. 2010.
[33] T. Matsuyama and N. Ukita, "Real-time multitarget tracking by a cooperative distributed vision system," Proc. IEEE, vol. 90, no. 7, pp. 1136-1150, Jul. 2002.
[34] H. Medeiros, J. Park, and A. C. Kak, "Distributed object tracking using a cluster-based Kalman filter in wireless camera networks," IEEE Journal of Selected Topics in Signal Proc., vol. 2, no. 4, pp. 448-463, Aug. 2008.
[35] I. F. Akyildiz, T. Melodia, and K. R. Chowdhury, "A survey on wireless multimedia sensor networks," Computer Networks, vol. 51, pp. 921-960, 2007.
[36] N. Martinel, C. Micheloni, C. Piciarelli, and G. L. Foresti, "Camera selection for adaptive human-computer interface," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 5, pp. 653-664, 2014.
[37] J. Kim and D. Kim, "Probabilistic camera hand-off for visual surveillance," in Proc. Int. Conf. on Distributed Smart Cameras, Stanford, USA, pp. 1-8, Sep. 2008.
[38] R. Goshorn, J. Goshorn, D. Goshorn, and H. Aghajan, "Architecture for cluster-based automated surveillance network for detecting and tracking multiple persons," in Proc. ACM/IEEE Int. Conf. on Distributed Smart Cameras, Vienna, Austria, pp. 219-226, Sep. 2007.
[39] A. Bakhtari, M. D. Naish, M. Eskandari, E. A. Croft, and B. Benhabib, "Active-vision-based multisensor surveillance—An implementation," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 36, no. 5, pp. 668-680, 2006.
[40] Y. Fu, Y. Guo, Y. Zhu, F. Liu, C. Song, and Z. H. Zhou, "Multi-view video summarization," IEEE Transactions on Multimedia, vol. 12, no. 7, pp. 717-729, 2010.
[41] Vision-Based Intelligent Environment project (accessed Oct. 2015). [Online]. Available: http://cvrc.nctu.edu.tw/~TT/home.php?&lang=en
[42] ICDSC Challenge - Smart Homes Data Set (accessed 2009). [Online]. Available: http://wsnl2.stanford.edu/icdsc09challenge/
[43] C. M. Huang and L. C. Fu, "Multitarget visual tracking based effective surveillance with cooperation of multiple active cameras," IEEE Trans. Syst., Man, Cybern. B, vol. 41, no. 1, pp. 234-247, 2011.