Lucio Marcenaro, Gianni Vernazza and Carlo S. Regazzoni
University of Genoa, Department of Biophysical and Electronic Engineering (DIBE)
Via all'Opera Pia 11/A, I-16145 Genova (Italy)
Phone: +39-010-3532792 Fax: +39-010-3532134 e-mail: carlo@dibe.unige.it
ABSTRACT
In this paper, an image-stabilization algorithm is presented that is specifically oriented toward video-surveillance applications. The proposed approach is based on a novel motion-compensation method that adapts a well-known image-stabilization algorithm, originally developed for visualization purposes, to video-surveillance applications. In particular, the illustrated methods take into account the specificity of typical video-surveillance applications, where objects moving in a scene often cover a large part of an image, thus causing the failure of classic image-stabilization techniques. In the second part of the paper, evaluation methods for image-stabilization algorithms are discussed.
The paper is organized as follows: in Section 1, the principles of image-registration and motion-compensation techniques are outlined. In Section 2, the need for image-stabilization algorithms in video-surveillance applications is highlighted. In Sections 3 and 4, the method proposed for image stabilization in video-surveillance systems is detailed and analyzed. Sections 5 and 6 deal with possible evaluation methods and with the results of the described algorithms, respectively. Conclusions are drawn in Section 7.
1. INTRODUCTION
In the past few years, the market of video-surveillance systems has grown considerably. Video-surveillance sensors are usually cameras that acquire video sequences to be transmitted to a remote control center. In first-generation video-surveillance systems, acquired images are presented to a human operator, who has to search for potentially dangerous situations. This paper deals with second-generation surveillance systems, where the images acquired by the sensors are processed by an automatic system that can detect and locate objects moving within a scene and, possibly, recognize and classify their typologies and behaviors. Static video-surveillance cameras are often mounted on poles, so they may be affected by vibrations and unwanted movements, for example, due to atmospheric disturbances. Such interferences are extremely harmful to automatic video-surveillance systems, as they cause a considerable degradation of automatic event recognition. Image-processing methods adopted by this kind of system typically use an image of an empty scene as a reference image for object detection and location. An unwanted movement in the camera shot often causes an incorrect superposition of the current and reference images, with destructive consequences for typical change-detection algorithms. In the present paper, a novel image-stabilization algorithm is described together with methods for evaluating the obtained performances, with special attention to automatic video-surveillance systems.
This work was partially supported by the Ministry of Universities and Scientific Research (MURST) of the Italian Government and by the British Council.
minimizing the cost functional associated, for instance, with the differences between analogous regions in the frames considered. A method based on feature tracking is proposed by Morimoto and Chellappa in [2]; Hansen et al. [5] propose an algorithm based on global optical-flow estimation. A method widely used for motion estimation is the Block Matching Algorithm (BMA), which is based on the subdivision of the images of a sequence into blocks: each block is then tracked in the sequence, and the results of the tracking phase are used for motion compensation. However, this method is very time-consuming, so it is not useful when a real-time requirement must be met. Many algorithms have been proposed to improve its performances [6, 7]. The second step of image stabilization lies in motion compensation: the motion parameters that have been estimated through motion-estimation techniques are applied to the sequence in order to stabilize the images by placing them in a common reference system. In the general case, motion compensation should compensate for unwanted movements of the camera, while preserving the ones due to moving objects or to global camera-shot movements. A block scheme of a general image-stabilization system is depicted in Figure 1.
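The Block Matching Algorithm mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper; the function name, block size, search range, and the use of a sum-of-absolute-differences criterion are illustrative assumptions:

```python
import numpy as np

def block_matching(ref, cur, block=16, search=8):
    """Estimate one motion vector per block via exhaustive SAD search.

    Each block of the current frame is compared against shifted candidate
    blocks in the reference frame; the displacement with the lowest sum of
    absolute differences (SAD) is kept as that block's motion vector.
    """
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            patch = cur[by:by + block, bx:bx + block].astype(np.int32)
            best, best_v = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue  # candidate block falls outside the frame
                    cand = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(patch - cand).sum()
                    if best is None or sad < best:
                        best, best_v = sad, (dy, dx)
            vectors[(by, bx)] = best_v
    return vectors
```

The nested exhaustive search over all displacements is what makes the plain BMA expensive; the fast-search variants cited in [6, 7] reduce the number of candidates examined per block.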
stable features from the images and to track them in the sequence. Unfortunately, the contents of video-surveillance images are considerably different because, typically, moving objects constitute a considerable part of an image, often covering a large area of the background and making the features that lie beneath an object in a given frame unreliable. Besides, the features detected within the bounding box of an object must be rejected because of their instability.
where p^t_{j,k} denotes the pixel with coordinates (j,k) in image t, where t = 0 is the reference image and t = i is the current image; m and n represent the numbers of points on the grid along the horizontal and vertical axes, respectively. Equation (1) defines a correlation index between the considered images when a translation vector (a,b) is applied to them. The index is computed on the discrete set of points represented by the grid; in order to optimize this term, an exhaustive search is performed over a certain range of translations, and the vector corresponding to the maximum correlation is selected for motion compensation. Figure 2 shows a typical video-surveillance image (the image sequence presented in [9] has been used for the present paper) on which a reference grid (black squares) and the translated one (white squares) have been superimposed.
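The grid-based exhaustive search described above can be sketched as follows. The exact correlation index of equation (1) is not reproduced here; as an assumption, a sum-of-squared-differences score over the grid samples is used in its place, with the best translation being the one that minimizes that score (equivalently, maximizes its negation):

```python
import numpy as np

def grid_motion_estimate(ref, cur, step=8, search=5):
    """Estimate a global translation (a, b) using only a sparse grid of
    points, in the spirit of the grid method described above.

    Assumption: a negated SSD over the grid samples stands in for the
    paper's correlation index of equation (1).
    """
    h, w = ref.shape
    # grid points kept `search` pixels away from the borders
    ys = np.arange(search, h - search, step)
    xs = np.arange(search, w - search, step)
    gy, gx = np.meshgrid(ys, xs, indexing="ij")
    ref_vals = ref[gy, gx].astype(np.float64)
    best_score, best_ab = -np.inf, (0, 0)
    for a in range(-search, search + 1):
        for b in range(-search, search + 1):
            cur_vals = cur[gy + a, gx + b].astype(np.float64)
            score = -np.sum((ref_vals - cur_vals) ** 2)
            if score > best_score:
                best_score, best_ab = score, (a, b)
    return best_ab
```

Because only the grid samples are read, the cost per candidate translation is proportional to the number of grid points rather than to the full image area, which is what makes the method fast.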
The Motion Estimation module evaluates inter-frame motion, and the Motion Compensation module calculates the global transformation that is needed to stabilize the current image. Finally, the Image Composition module modifies the considered image according to the results of the Motion Compensation module, thus generating the stabilized sequence or, if required, the mosaic image. In the following, only static-camera surveillance systems will be considered, hence only the motion due to movements of objects in a scene will be preserved.
Figure 2 Schematic representation of the grid method.

A validation algorithm is then used to discard the grid points that correspond to moving objects in the image: each point is associated with a confidence coefficient that is incremented when the point can be successfully tracked, and decremented when no corresponding point is found in the currently processed image. The confidence coefficient is computed as the percentage of correct tracking; it is initialized to 1 and updated as follows:

C = (No. of times that the point was tracked) / (No. of processed frames)    (2)
A point is considered correctly tracked when a pixel with the same features is detected within a certain search area with respect to the reference frame. The point is actually tracked only if the above coefficient is above a certain threshold, in which case the point is defined as trackable. If the method considers only the information about the points lying on the grid, it turns out to be very fast but still reliable for video-surveillance applications, provided that the grid is dense enough to cover a large part of the image (10% is considered to be enough).
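The confidence update of equation (2) can be sketched as follows; the class and attribute names are illustrative, and the trackability threshold is an arbitrary example value:

```python
class GridPoint:
    """Confidence bookkeeping for one grid point, following equation (2):
    C = (times the point was tracked) / (frames processed)."""

    def __init__(self):
        self.tracked = 0
        self.frames = 0

    def update(self, found_match: bool):
        """Record one processed frame and whether the point was tracked."""
        self.frames += 1
        if found_match:
            self.tracked += 1

    @property
    def confidence(self):
        # initialized to 1 before any frame has been processed
        return self.tracked / self.frames if self.frames else 1.0

    def trackable(self, threshold=0.7):
        """A point is used for compensation only above the threshold."""
        return self.confidence >= threshold
```
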
5. EVALUATION METHODS
Evaluation methods for motion-compensation algorithms are typically based on the following consideration: when the motion has been exactly compensated for, the difference between the stabilized image and the reference image is minimized and, theoretically, is non-zero only for pixels that correspond to an object in the scene. The peak signal-to-noise ratio (PSNR) [12] can be used as a measure for evaluating the superposition of two images: the maximum PSNR is reached when a perfect stabilization is achieved. The PSNR can be defined as:

PSNR(I_C, I_R) = 10 log10 [255^2 / MSE(I_C, I_R)]    (3)

where I_C and I_R are the current image and the reference image, respectively, and MSE(I_C, I_R) denotes the Mean Square Error calculated for the considered images. For the evaluation of an image-stabilization system, two different measures based on the PSNR are computed:
- ITF (Interframe Transformation Fidelity): the PSNR calculated between two successive frames: PSNR(I_k, I_{k-1});
- GTF (Global Transformation Fidelity): the PSNR calculated with respect to the reference frame (background): PSNR(I_k, I_0).
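The PSNR of equation (3) and the per-frame ITF and GTF measures can be sketched as follows; the function names are illustrative:

```python
import numpy as np

def psnr(a, b):
    """Equation (3): PSNR(I_C, I_R) = 10*log10(255^2 / MSE(I_C, I_R))."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(255.0 ** 2 / mse))

def itf(frames, k):
    """ITF at frame k: PSNR between two successive frames."""
    return psnr(frames[k], frames[k - 1])

def gtf(frames, k):
    """GTF at frame k: PSNR with respect to the reference frame I_0."""
    return psnr(frames[k], frames[0])
```

Identical images give an infinite PSNR (zero MSE), while a maximally different 8-bit pair gives 0 dB, which matches the intuition that a higher value indicates a better superposition.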
However, the higher-level modules of a typical video-surveillance system do not operate on the image itself, but on the binary image obtained by thresholding the difference between the current and reference images. A measure of the similarity between images for video-surveillance systems can therefore be calculated by applying the ITF and GTF measures to binary images and evaluating the two quantities for a threshold varying between 0 and 255. For high thresholds, the corresponding PSNR values are very large because the system is almost blind, as it detects as changed only those areas that differ most from the fixed background. The new equation for calculating the PSNR is:
If one of these functions is above a certain threshold, the corresponding point is marked as a feature to be tracked of that particular typology. In general, a feature-selection criterion is adopted in order to keep only the most significant features and discard all the others; this can be done by using two different strategies: 1) the whole image is divided into adjacent columns, and from each column only the most important feature is selected. In this way, the selected features are distributed over the image, but the selection criterion can discard even good features as soon as a stronger corner is detected in the same column; 2) only the features that have no similar ones in their neighborhood are selected: as a result, possible mismatches between similar nearby features can be avoided. The classification of the feature typology leads to a more robust feature-tracking algorithm: tracking is performed by searching for a similar feature (in terms of corner intensity and typology) in a proper neighborhood in the current image. This method, like the previous one, computes a measure to take moving objects in a scene into account: a relative occurrence index is associated with each feature. In this way, a simplified probability for the correct tracking of a feature is obtained and used for the validation of the tracked feature: every time a feature is correctly tracked, the occurrence index is incremented, and vice versa. The feature is actually used for motion estimation only if the occurrence index is above a certain threshold.
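The two feature-selection strategies can be sketched as follows, assuming features are represented as (x, y, strength) tuples; the names, the column partitioning, and the Euclidean distance test are illustrative assumptions:

```python
def select_per_column(features, num_columns, width):
    """Strategy 1: keep only the strongest feature in each vertical column.

    Guarantees a spatial spread of features, at the price of discarding
    good features that share a column with a stronger corner.
    """
    col_w = width / num_columns
    best = {}
    for x, y, s in features:
        c = min(int(x / col_w), num_columns - 1)
        if c not in best or s > best[c][2]:
            best[c] = (x, y, s)
    return list(best.values())

def select_isolated(features, min_dist):
    """Strategy 2: keep only features with no other feature nearby,
    avoiding mismatches between similar, closely spaced corners."""
    kept = []
    for i, (x, y, s) in enumerate(features):
        near = any(
            j != i and (x - x2) ** 2 + (y - y2) ** 2 < min_dist ** 2
            for j, (x2, y2, _) in enumerate(features)
        )
        if not near:
            kept.append((x, y, s))
    return kept
```
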
PSNR_bin(I_C, I_R) = 10 log10 [1 / MSE(I_C, I_R)]    (4)

where:

MSE(I_C, I_R) = (1 / (w·h)) Σ_{x,y} diff(x, y)    (5)

w and h are the width and height of the images, respectively, and

diff(x, y) = 1 if |I_C(x, y) − I_R(x, y)| ≥ threshold, and 0 otherwise    (6)
is the background-differencing rule used for the MSE. In this case, the maximum MSE is equal to 1: this explains the unit numerator in equation (4). The proposed stabilization algorithms have been evaluated for a video-surveillance system by using receiver operating characteristic (ROC) curves [13], which plot the probability of false alarm vs. that of correct detection as the complexity (i.e., the motion range of the sensor) changes. The probabilities related to the system have been evaluated through the analysis of the superposition of real and estimated bounding boxes, as proposed in [14].
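The binary-image PSNR of equations (4)–(6) can be sketched together as follows; the function name is illustrative:

```python
import numpy as np

def binary_psnr(cur, ref, threshold):
    """PSNR of equation (4) computed on the binary change mask.

    diff(x, y) = 1 where |I_C - I_R| >= threshold, else 0 (equation (6));
    the MSE of equation (5) then lies in [0, 1], hence the unit numerator
    in equation (4).
    """
    diff = (np.abs(cur.astype(np.int32) - ref.astype(np.int32))
            >= threshold).astype(np.float64)
    mse = diff.mean()  # equation (5): sum of diff over w*h pixels
    return float("inf") if mse == 0 else float(10 * np.log10(1.0 / mse))
```

A frame flagged as changed everywhere yields 0 dB, while a frame identical to the background (no changed pixels) yields an infinite value, consistent with the "almost blind" behavior described above for high thresholds.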
6. RESULTS
The proposed methods were evaluated by calculating the GTF and ITF indexes for different change-detection thresholds. Both the grid method and the feature-tracking method were compared with the results obtained by using uncompensated sequences. The GTF index was used to evaluate motion compensation with respect to an initial reference image; through the ITF index, one can estimate the correlation between temporally adjacent images. Figures 4a and 4b show that, in both cases, the curve that represents the uncompensated sequence is always below the other lines: this means that, in both cases, the proposed methods are able to compensate for unwanted motion; moreover, the grid method achieves higher performances. Considering that the graphs are on a dB-like scale, it can be concluded that the illustrated methods provide good performances. Figure 5 displays the ROC curves calculated for the considered system. Each point of the curves was calculated by varying the motion range of the images to be stabilized. The figure shows that the proposed methods ensure higher correct-detection-to-false-alarm probability ratios than the system based on uncompensated images. The graph also points out that, in this case, the grid method works better than the feature-tracking one. The working area highlighted in the figure refers to movements in the range of 5 to 15 pixels: in this area, the stabilization methods reach a maximum gain.
Figure 5 ROC curves for the considered methods.
8. REFERENCES
[1] S. Peleg, B. Rousso, A. Rav-Acha, and A. Zomet. Mosaicing on adaptive manifolds. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, pp. 1144-1154, October 2000.
[2] C.H. Morimoto and R. Chellappa. Automatic digital image stabilization. IEEE Int. Conf. on Pattern Recognition, August 1996.
[3] C. Kuglin and D. Hines. The phase correlation image alignment method. IEEE Conference on Cybernetics and Society, September 1975.
[4] R. Szeliski. Image mosaicing for tele-reality applications. Technical Report CRL 94/2, Digital Equipment Corporation, Cambridge Research Lab, May 1994.
[5] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt. Real-time scene stabilization and mosaic construction. DARPA Image Understanding Workshop, November 1994.
[6] M.J. Chen et al. A new block-matching criterion for motion estimation and its implementation. IEEE Trans. Circuits and Systems for Video Technology, vol. 5, pp. 231-236, June 1995.
[7] L.M. Po and W.C. Ma. A novel four-step search algorithm for fast block motion estimation. IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, June 1996.
[8] L. Marcenaro, F. Oberti, and C.S. Regazzoni. Short-memory shape models for ground-plane predictive object tracking. Proc. First IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, Grenoble, France, 2000.
Figure 4 Evaluations of the modified (a) GTF and (b) ITF for the considered methods.
[9] First IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, PETS2000, Grenoble, France, 2000.
[10] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, Pittsburgh, PA, April 1991.
[11] A. Censi, A. Fusiello, and V. Roberto. Image stabilization by feature tracking. Image Analysis and Processing ICIAP, pp. 665-667, September 1999.
[12] C. Morimoto and R. Chellappa. Fast electronic digital image stabilization. Proc. Int. Conf. on Pattern Recognition, Vienna, Austria, August 1996.
[13] H. Van Trees. Classical detection and estimation theory. In Detection, Estimation and Modulation Theory, John Wiley & Sons, Inc., 1968, pp. 19-46.
[14] F. Oberti, E. Stringa, and G. Vernazza. Performance evaluation criterion for characterizing video-surveillance systems. Real-Time Imaging Journal, 2001 (in press).
7. CONCLUSIONS
In conclusion, this paper has shown a possible evolution of well-known motion-compensation and image-stabilization methods. The proposed methods are able to filter out unwanted motion, while preserving the movements of the objects in a scene. Evaluation methods have been developed for the proposed image-stabilization video-surveillance algorithms. The validity of the adopted approach is demonstrated by the measures on the output of the considered system as well as by the probabilities calculated on a complete video-surveillance system.