
2008 International Symposium on Ubiquitous Multimedia Computing

Video Trans-coding in Smart Camera for Ubiquitous Multimedia Environment


Ekta Rai Chutani
Multimedia Laboratory, Indian Institute of Technology, New Delhi, India
ekta.rai@gmail.com

Santanu Chaudhury
Multimedia Laboratory, Indian Institute of Technology, New Delhi, India
schaudhury@gmail.com

Abstract
Smart cameras are expected to be important components for creating ubiquitous multimedia environments. In this paper, we propose a scheme for on-line semantic transcoding of the video captured by a smart camera. The transcoding process selects frames of importance and regions of interest for use by other processing elements in a ubiquitous computing environment. We propose a local associative computation based change detection scheme for identifying frames of interest; the algorithm also segments out the region of change. The computation is structured for easy implementation in a DSP based embedded environment. The transcoding scheme enables the camera to communicate only the regions of change in frames of interest to a server or a peer, reducing communication and processing overhead in a networked application environment. Experimental results have established the effectiveness of the transcoding scheme.

1. Introduction
Digital cameras with embedded CODECs are common commercial products. Smart cameras can not only process and interpret the data that they capture in real time but can also intelligently decide what to store and what to communicate, and they are expected to be important components for creating ubiquitous multimedia environments. In general, a video transcoder converts one compressed video bitstream into another with a different format, size (resolution), bit rate, or frame rate. The goal of such transcoding is to enable interoperability of heterogeneous multimedia networks while reducing complexity and run time by avoiding complete decoding and re-encoding of the video stream [1]. In contrast, in this paper we propose a semantic transcoding scheme which can be used to obtain a filtered output from a smart camera satisfying a semantically meaningful condition. The proposed semantic filtering scheme selects only frames of interest from an input video stream for further processing.

The algorithm is structured to facilitate easy implementation on an ASIC or a dedicated DSP used in a smart camera [8]. Our transcoding scheme is built around a change detection scheme for identifying frames of interest. Change detection is important in any vision based monitoring system, and various approaches have been proposed in the literature based upon properties such as the colour of the scene content, the shape of the object of interest, and the motion parameters of objects. Predictions based on past history have also been used [7]. Change detection has been used for a variety of applications, for example: (a) segmenting a video sequence into logical shots using object based features [4]; (b) segmentation of background and foreground objects [2, 9]; (c) classifying video shots based on change detection features, such as identifying action shots [3]; (d) prediction based change detection [7]; (e) various map conditions based on pixel colours [5]; and so on. Most change detection algorithms are pixel based or block based. In the pixel based approach, the difference between two consecutive frames is calculated for every pixel, and change is identified using a threshold whose value depends on the application. In the block based approach, the frame is divided into blocks of a required size and a block matching algorithm is applied to each block. A number of measures are available for deciding the goodness of a block match, among them the Cross Correlation Function, Pel Difference Classification (PDC), Mean Absolute Difference, Mean Squared Difference and Integral Projection [5]. For semantic transcoding, we have proposed a clustering based change detection scheme motivated by [2]. However, our algorithm uses a computationally simpler clustering scheme for change detection and, unlike [2], propagates cluster label information in the image space for the extraction of areas of interest. The clustering based scheme has the advantage of detecting only those changes which are not consistent with the past history. Our method offers a unified approach which combines the capability of change



detection for identifying frames of interest (FOI), segmentation for identifying regions of change, and consequent transcoding which reduces the number of frames and selects only regions within frames of interest for storage and communication, without any loss of required information. The semantic transcoding algorithm, with its local and associative computational structure, is the novel contribution of this work.

2. On-line Key Frame Detection


The key task is to detect those changes in a video sequence which convey information relevant to the application. For instance, a person or object entering the area of focus is a change. In a video, any moving object gives rise to change between consecutive frames. However, repetitive continuous changes in the background, such as flying birds, moving leaves of trees or constantly moving traffic on a road, are not significant. Figure 1 and Figure 2 show two examples of object-centric changes in the scene.

Figure 3. Leaves of the tree are in continuous motion; the detected change is the person who has entered.

Figure 4. Traffic on the road in the background is in continuous motion; the detected change is the car that has entered.

Figure 1. Changed position of the person in the video sequence.

We hypothesize that only frames which indicate significant deviation from the past are candidates whose information needs to be shared between computing elements in a ubiquitous computing environment. Here, we propose an algorithm for detecting only substantive changes. The algorithm does not assume a background model but learns the past history using an unsupervised incremental clustering algorithm. Further, the algorithm makes use of localized computation and consequently has the ability to locate the changes as well. Some examples of pseudo-stationary backgrounds are shown in Figure 3 and Figure 4. In such cases it is difficult to give a single definition of change, as it varies from application to application. We therefore propose a clustering based change detection scheme which learns the characteristics of the pseudo-stationary background and flags a change only when there is a definitive deviation from the past history.

2.1. Clustering of Pixel Values


We assume that input frames are in the R, G, B colour space with identical resolution for each plane. Each frame is partitioned into 4x4 blocks, and we perform incremental clustering in colour space for each of these blocks. Typically, in an application setting we need to record the pixel history and identify change with reference to the past. The pseudo-stationary colour values at a pixel have a multi-modal distribution, and the incremental clustering process is expected to discover the modes of this distribution. For each block we maintain a set of clusters, and for each cluster we store:
i. the centroid value (in RGB);
ii. the frame number which last updated the cluster;

Figure 2. Cow enters the area of focus.


iii. a counter for the number of frames mapped to the cluster; and
iv. optionally, a flag field associated with the cluster.
The cluster set for each block is initialized with a single cluster whose centroid is set to the average colour value of the corresponding block of the first frame. Each cluster set may contain a maximum of five cluster nodes; a data-structure sketch is given below.
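As a rough illustration (not taken from the paper), the per-block cluster set could be represented as follows. The class and function names are our own; only the four stored fields come from the text, and the averaging over a 4x4 block is an assumption about the initialization step.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CLUSTERS = 5  # the paper caps each block's cluster set at five nodes

@dataclass
class ClusterNode:
    centroid: tuple             # (R, G, B) centroid of the cluster
    last_frame: int             # frame number that last updated the cluster
    count: int = 1              # number of frames mapped to the cluster
    flag: Optional[str] = None  # optional flag field (e.g. object label 'O')

def init_block_clusters(block_pixels, frame_no=0):
    """Initialise a 4x4 block's cluster set with the block's average colour."""
    n = len(block_pixels)
    avg = tuple(sum(p[c] for p in block_pixels) / n for c in range(3))
    return [ClusterNode(centroid=avg, last_frame=frame_no)]
```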

Figure 5. Basic Computational Scheme.

When we receive a new frame, we compute its difference from the previous frame. The difference is calculated by taking the Manhattan distance (sum of absolute differences) between the R, G and B values of each pixel:

    ΔR = |R1 - R2|,  ΔG = |G1 - G2|,  ΔB = |B1 - B2|,  Diff = ΔR + ΔG + ΔB

where the subscripts 1 and 2 denote the current and previous frames. This avoids the overhead of multiplication in the difference computation. We use the average difference value for each block. For low-difference blocks (difference less than a threshold A), we increment the cluster counter and update the frame number associated with the cluster node to which the corresponding block of the previous frame was mapped. For blocks having a high difference (greater than A), we find the nearest cluster centroid in the cluster set, using the same Manhattan distance for the similarity computation. If the difference with the nearest cluster is greater than the threshold A, a new cluster node is created. The centroid is updated at each step; we use the same strategy as in [2] for updating the centroid. The algorithm allows a maximum of five such nodes for each 4x4 block. A new cluster is created when a given pixel assumes a new value, completely different from the past, corresponding to changes happening in the scene. In the case of periodic changes in pixel values, re-occurrences of past sequences are mapped to existing clusters. If the number of nodes would exceed five, we apply the principle of aging to delete old cluster nodes while recording the temporal evolution of pixel values: when a new cluster node is to be created and the number of existing cluster nodes is already five, we eliminate the cluster node which has not been updated for the longest period of time. Hence, we need no a priori assumption about the possible number of clusters. We can now look at the Cluster Update procedure, which performs the necessary processing for the incremental clustering. We assume that the set of clusters for a block is implemented through a vector data structure.
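Continuing the sketch above, the per-block matching and update step might look as follows. The threshold value, the running-mean centroid update (the paper follows [2] for the exact rule), and all names are our assumptions.

```python
THRESH_A = 30  # application-dependent threshold A (value assumed)

def manhattan(c1, c2):
    # L1 distance over R, G, B: only additions and absolute differences,
    # no multiplications, as in the paper's difference measure.
    return sum(abs(a - b) for a, b in zip(c1, c2))

def update_block(clusters, block_avg, frame_no):
    """Map a block's average colour onto its cluster set.
    Returns True when a new cluster had to be created (block changed)."""
    nearest = min(clusters, key=lambda c: manhattan(c.centroid, block_avg))
    if manhattan(nearest.centroid, block_avg) <= THRESH_A:
        # Re-occurrence of a known mode: bump counter, stamp frame number,
        # and nudge the centroid (running mean as an approximation of [2]).
        nearest.count += 1
        nearest.last_frame = frame_no
        nearest.centroid = tuple(
            m + (x - m) / nearest.count
            for m, x in zip(nearest.centroid, block_avg))
        return False
    # Genuinely new colour mode: apply the principle of aging if the
    # cluster set is full, evicting the least recently updated node.
    if len(clusters) >= MAX_CLUSTERS:
        clusters.remove(min(clusters, key=lambda c: c.last_frame))
    clusters.append(ClusterNode(centroid=tuple(block_avg),
                                last_frame=frame_no))
    return True
```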

2.2. Key Frame Detection


The clustering algorithm presented in the previous section is used to generate a set of key frames for further processing. These key frames are expected to capture the essential content of the video stream. For a camera installed at a fixed location, the informative contents of the video stream are (i) the background view and (ii) changes in the foreground. Both can be captured through the clustering process. At any given instant, using the centroids of the clusters updated by the last frame, we can generate a low resolution (one-fourth of the original resolution) view of the scene on demand; this view can be used by a remote visualization terminal. Whenever we introduce a new cluster for a block, the pixel values of that block have evidently changed from the past. Further, blocks mapped to new clusters (with count less than N) are also likely to belong to a foreground object. Such blocks are marked and a low resolution binary image is generated. Next, we find the connected components of the binary image. If the size of any connected component exceeds 5% of the image size, we flag the frame as a key frame. It is clear from this discussion that when a new object, with spectral properties different from those of the background, enters the view, it will be captured in the key frames. A sketch of this test follows.
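A minimal sketch of the key-frame test, assuming a boolean low-resolution mask in which a cell is set when its block created a new cluster or maps to an immature one. The value of N, the 4-connectivity choice, and all names are assumptions, not the authors' code.

```python
from collections import deque

N = 50  # maturity count below which a cluster is treated as foreground (assumed)

def connected_components(mask):
    """Label 4-connected components in a 2-D boolean list of lists.
    Returns the label image and a dict of component sizes."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    sizes, cur = {}, 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                cur += 1
                sizes[cur] = 0
                q = deque([(i, j)])
                labels[i][j] = cur
                while q:  # breadth-first flood fill of one component
                    y, x = q.popleft()
                    sizes[cur] += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = cur
                            q.append((ny, nx))
    return labels, sizes

def is_key_frame(mask):
    """Flag the frame when any component exceeds 5% of the mask area."""
    _, sizes = connected_components(mask)
    total = len(mask) * len(mask[0])
    return any(s > 0.05 * total for s in sizes.values())
```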


We shall continue to generate key frames for an object in motion for at least N frames. Subsequently, if the object becomes static, no new clusters will be created for any block in the image and we shall stop generating key frames. In Figure 6 and Figure 7 we show the objects detected as the scene changes. Note that in Figure 6 only the foreground objects have been identified, despite the pseudo-stationary nature of the background. The continuous stream of video is replaced by these frames which indicate substantive change. The basic computational steps involved in cluster generation and change detection are block-wise independent and can be executed in parallel; each such task can be mapped to an individual PE in a VLIW DSP.

3. Segmentation

The smart camera is expected to segment the object of interest and may communicate the region of interest in high resolution to the other cameras in the network. In this section, we present a simple scheme for segmenting out objects of interest. The connected components found in the low resolution binary images, as described in the previous section, are the segments of interest in a frame. Each connected component is further refined with reference to the original frame for the extraction of segments of different colours.
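As a sketch of how a significant low-resolution component could be mapped back to a full-resolution region of interest for communication: the 4x4 block size comes from the paper, but the bounding-box mapping and the function name are our own illustration.

```python
BLOCK = 4  # each low-resolution cell corresponds to a 4x4 pixel block

def roi_from_component(component_blocks):
    """Return the full-resolution (top, left, bottom, right) bounding box
    for a set of (row, col) block coordinates; the crop of the original
    frame inside this box is what the camera would transmit."""
    rows = [r for r, _ in component_blocks]
    cols = [c for _, c in component_blocks]
    return (min(rows) * BLOCK, min(cols) * BLOCK,
            (max(rows) + 1) * BLOCK, (max(cols) + 1) * BLOCK)
```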

Figure 6. Binary images showing the change in the videos of Figure 1 and Figure 2.

However, we shall fail to segment out an object following the above scheme if it becomes static. In other words, an object will be missed if a large number of blocks belonging to it have accumulated counts greater than N, because in subsequent frames these blocks will not be marked as changed blocks. We therefore need a technique to overcome this problem, and propose to make use of information from the past frame to perform segmentation even when no change is detected.

We consider the significant connected components obtained from the previous frame to be indicative of a substantive change or the existence of a foreground object. We mark the cluster centroids corresponding to the blocks belonging to these connected components with an object label, say O. In the absence of object motion, blocks belonging to the object will be mapped to clusters already marked O. We then extract connected components treating the changed/new flag and the O flag as identical. This enables extraction of the region of interest even in the absence of motion. In the figure above, we show an example of object extraction, with connected components of different colours shown in bounding boxes.
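A hedged sketch of the object-label trick, reusing the ClusterNode flag field from Section 2.1's sketch: after a key frame, the matched clusters of blocks inside significant components are tagged 'O', and in later frames a block counts as foreground if it either changed or maps to an 'O'-tagged cluster. Function names and array layouts are our assumptions.

```python
def tag_object_clusters(block_clusters, component_blocks):
    """Mark every cluster of the blocks inside significant components
    with the object label 'O'."""
    for (i, j) in component_blocks:
        for c in block_clusters[i][j]:
            c.flag = 'O'

def foreground_mask(changed, matched):
    """changed[i][j]: block created a new cluster this frame.
    matched[i][j]: the ClusterNode the block mapped to (or None).
    A block is foreground when it changed or maps to an 'O' cluster,
    so a stopped object is still segmented."""
    h, w = len(changed), len(changed[0])
    return [[changed[i][j] or (matched[i][j] is not None
                               and matched[i][j].flag == 'O')
             for j in range(w)] for i in range(h)]
```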

Figure 7. Changes in the videos shown in Figure 3 and Figure 4.

4. Results and Discussions


Case 1: Moving object detection with a static background. Most change detection algorithms for moving objects make use of motion vectors or optical flow methods. Here, instead, clustering is used for change detection. As a result, Figure 6 shows only those blocks whose values change due to the movement of the person and of the cow respectively. The clustering approach also reduces the computational cost by about 30% compared with the other techniques, which makes it well suited to VLIW architecture based systems.

Case 2: Pseudo-stationary background. Commonly in such situations, background-foreground separation is performed explicitly. The novelty of our approach is that continuous movement in the background is eliminated without any separate mechanism.



In Figure 7, the change is shown in the area where a person or a car enters the region of focus. The results show that the movement of the leaves and the background traffic are ignored without invoking any special technique. Blocks which exhibit changes over a long stretch of the video are merged into the background, being treated as effectively constant throughout. Segmentation is done on the basis of colour, extracting objects of different colours by matching neighbouring blocks. The algorithms have been implemented on a standard Linux based platform. We experimented with about 20 video sequences of average length 500 frames (ranging from 300 to 1000 frames). Our algorithm detected about 90% of the true changes and reduced the number of frames in a sequence on average by a factor of approximately 9.


5. Conclusions
In this paper we have presented algorithms for semantic transcoding of video which can be useful for distributed monitoring applications. The transcoding is done on the basis of regions of interest: only those frames in which a relevant change is detected are retained. The algorithms have been designed so that they can be decomposed into a set of local, pixel based associative computations amenable to SIMD processing or to mapping onto a VLIW DSP. Semantic transcoding schemes amenable to implementation in an embedded environment are the key contribution of this paper. We have also shown that these algorithms yield good results.

References
[1] M. A. Bonuccelli, F. Lonetti, and F. Martelli. Video transcoding architectures for multimedia real time services. ERCIM News, (62), July 2005. http://www.ercim.org/publication/ercim_news/enw62/bonucelli.html.
[2] D. E. Butler, V. M. Bove Jr., and S. Sridharan. Real-time adaptive foreground/background segmentation. EURASIP Journal on Applied Signal Processing, 2005(1):2292-2304, January 2005.
[3] H.-W. Chen, J.-H. Kuo, W.-T. Chu, and J.-L. Wu. Action movies segmentation and summarization based on tempo analysis. In Proceedings of the 6th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 251-258, 2004.
[4] J. Feng, K.-T. Lo, and H. Mehrpour. Scene change detection algorithm for MPEG video sequence. In Proceedings of the IEEE International Conference on Image Processing, volume 2, pages 821-824, September 1996.
[5] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam. Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, 14(3):294-307, March 2005.
[6] G. School. Change detection tutorial. www.globe.unh.edu/MultiSpec/Change/Change.pdf.
[7] M. Steyvers and S. Brown. Prediction and change detection. Advances in Neural Information Processing Systems, 18:1281-1288, 2006. http://psiexp.ss.uci.edu/research/papers/.
[8] A. Vetro, C. Christopoulos, and H. Sun. Video transcoding architectures and techniques: An overview. IEEE Signal Processing Magazine, 20(2):18-29, March 2003.
[9] H. Wang and D. Suter. A consensus-based method for tracking: Modelling background scenario and foreground appearance. Pattern Recognition, 40(3):1091-1105, March 2007.

