
IMAGE STABILIZATION ALGORITHMS FOR VIDEO-SURVEILLANCE APPLICATIONS

Lucio Marcenaro, Gianni Vernazza and Carlo S. Regazzoni
University of Genoa, Department of Biophysical and Electronic Engineering (DIBE)
Via all'Opera Pia 11/A, I-16145 Genova (Italy)
Phone: +39-010-3532792  Fax: +39-010-3532134  e-mail: carlo@dibe.unige.it
ABSTRACT
In this paper, an image-stabilization algorithm is presented that is specifically oriented toward video-surveillance applications. The proposed approach is based on a novel motion-compensation method that adapts a well-known image-stabilization algorithm, originally developed for visualization purposes, to video-surveillance applications. In particular, the illustrated methods take into account the specificity of typical video-surveillance applications, where objects moving in a scene often cover a large part of an image, thus causing the failure of classic image-stabilization techniques. In the second part of the paper, evaluation methods for image-stabilization algorithms are discussed.
The paper is organized as follows: in Section 2, the principles of image-registration and motion-compensation techniques are outlined. In Section 3, the need for image-stabilization algorithms in video-surveillance applications is highlighted. In Section 4, the methods proposed for image stabilization in video-surveillance systems are detailed and analyzed. Sections 5 and 6 deal with possible evaluation methods and with the results of the described algorithms, respectively. Conclusions are drawn in Section 7.

1. INTRODUCTION

In the past few years, the market of video-surveillance systems has grown considerably. Video-surveillance sensors are usually cameras that acquire video sequences to be transmitted to a remote control center. In first-generation video-surveillance systems, the acquired images are presented to a human operator, who has to search for potentially dangerous situations. This paper deals with second-generation surveillance systems, where the images acquired by the sensors are processed by an automatic system that can detect and locate objects moving within a scene and, possibly, recognize and classify their typologies and behaviors. Static video-surveillance cameras are often mounted on poles, so they may be affected by vibrations and unwanted movements due, for example, to atmospheric disturbances. Such interferences are extremely harmful to automatic video-surveillance systems, as they cause a considerable degradation of automatic event recognition. The image-processing methods adopted by this kind of system typically use an image of the empty scene as a reference image for object detection and location. An unwanted movement of the camera shot often causes an incorrect superposition of the current and reference images, with destructive consequences for typical change-detection algorithms. In the present paper, a novel image-stabilization algorithm is described, together with methods for evaluating the obtained performances, with special attention to automatic video-surveillance systems.

This work was partially supported by the Ministry of Universities and Scientific Research (MURST) of the Italian Government and by the British Council.

2. IMAGE-REGISTRATION AND MOTION-COMPENSATION TECHNIQUES

Many well-known motion-estimation techniques can be found in the literature. These algorithms are sometimes referred to by the term image registration, which denotes the evaluation of the movements between successive images in a sequence. Image-registration methods constitute the basis for image-stabilization and mosaicing techniques. Such algorithms are needed in several applications, such as:
- Wide-area surveillance: image-mosaicing algorithms allow the surveillance of very large areas (industrial plants, etc.) by using a small number of cameras;
- Cartography: sequences of aerial images can be combined to generate maps;
- Outdoor surveillance: image stabilization is needed to compensate for unwanted sensor movements (e.g., highway surveillance) [1];
- Automatic vehicle driving: image stabilization is needed to attenuate vibrations due to mechanical vehicle movements.
Image registration can be defined as the estimation of the correspondences between an input image and a reference frame that provides the reference system for the movement to be estimated. Image-registration algorithms are used to determine the displacement between two images; the geometrical transformation is represented by the model of the camera movement. Image-registration algorithms can be divided into two categories:
- feature-based techniques [2]: the movement is estimated by tracking a set of features in the images; in this way, it is possible to find the translations and rotations that occurred between the considered frames. The feature-selection stage can be very complex, and the robustness of the method largely depends on this step;

- dense-matching techniques [3, 4]: these methods estimate the parameters of the motion model by minimizing a cost functional associated, for instance, with the differences between analogous regions in the considered frames.
A method based on feature tracking is proposed by Morimoto and Chellappa in [2], while Hansen et al. [5] propose an algorithm based on global optical-flow estimation. A method widely used for motion estimation is the Block Matching Algorithm (BMA), which is based on the subdivision of the images of a sequence into blocks: each block is then tracked in the sequence, and the results of the tracking phase are used for motion compensation. However, this method is very time-consuming, so it is not useful when a real-time requirement must be met. Many algorithms have been proposed to improve its performance [6, 7]. The second step of image stabilization is motion compensation: the motion parameters estimated by the image-registration techniques are applied to the sequence in order to stabilize the images by placing them in a common reference system. In the general case, motion compensation should compensate for unwanted movements of the camera, while preserving those due to moving objects or to global camera-shot movements. A block scheme of a general image-stabilization system is depicted in Figure 1.

Figure 1. Block diagram of a generic image-stabilization system.

The Motion Estimation module evaluates the inter-frame motion, and the Motion Compensation module calculates the global transformation that is needed to stabilize the current image. Finally, the Image Composition module modifies the considered image according to the results of the Motion Compensation module, thus generating the stabilized sequence or, if required, the mosaic image. In the following, only static-camera surveillance systems will be considered; hence, only the motion due to movements of objects in the scene will be preserved.
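To make the motion-estimation stage concrete, the following Python sketch estimates a single global translation between two grayscale frames with a simple block-matching search. It is only a minimal illustration of the BMA idea mentioned above, not the authors' implementation; the block size, sampling stride, search range, and the majority vote used to turn per-block displacements into one global translation are all assumptions.

```python
import numpy as np

def block_matching_translation(reference, current, block=16, search=8):
    """Estimate a global translation (dy, dx) by exhaustive block matching.

    A few blocks are sampled from the reference frame; each block is compared
    with shifted positions in the current frame using the sum of absolute
    differences (SAD), and the most frequent displacement is returned.
    """
    h, w = reference.shape
    votes = {}
    # Sample blocks on a coarse grid, away from the borders.
    for y in range(search, h - block - search, 4 * block):
        for x in range(search, w - block - search, 4 * block):
            ref_block = reference[y:y + block, x:x + block].astype(np.int32)
            best, best_disp = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    cand = current[y + dy:y + dy + block,
                                   x + dx:x + dx + block].astype(np.int32)
                    sad = np.abs(ref_block - cand).sum()
                    if best is None or sad < best:
                        best, best_disp = sad, (dy, dx)
            votes[best_disp] = votes.get(best_disp, 0) + 1
    if not votes:          # image too small for the chosen block/search sizes
        return (0, 0)
    # The displacement supported by most blocks is taken as the global motion.
    return max(votes, key=votes.get)
```

A compensated frame can then be obtained by shifting the current image by the opposite of the estimated displacement, which is the role of the Motion Compensation and Image Composition modules.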

3. IMAGE STABILIZATION FOR SURVEILLANCE APPLICATIONS

A standard video-surveillance application will be considered in the following. Automatic video-surveillance systems aim at processing an input image sequence in order to detect and locate the objects present in a scene, classify them, and interpret their behavior. Subsequently, they can send an alarm signal to inform the human operator that something dangerous is happening. A typical video-surveillance system [8] uses a reference image (background) that depicts the scene without any objects. By subtracting the background from the currently acquired image and thresholding the difference, the automatic system is able to detect the objects in the scene. Because the low-level image-processing algorithms operate on the corresponding pixels of the considered pair of images, it is very important for the images to be in the same reference system; this is not guaranteed, in particular, if the installation of the sensor suffers from vibrations. The algorithms developed up to now work on wide panoramic images where moving objects do not cover a large percentage of the scene. In this case, a feature-based stabilization algorithm works well, being able to select highly stable features from the images and to track them in the sequence. Unfortunately, the contents of video-surveillance images are considerably different because, typically, moving objects constitute a considerable part of an image, often covering a large area of the background and making the features that lie beneath an object in a certain frame unreliable. Moreover, the features detected within the bounding box of an object must be rejected because of their instability.
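The background-differencing step described above can be summarized by a short sketch. The following Python fragment is only an illustrative reading of that step, with an arbitrary threshold value; it is not the detection module of the system described in [8].

```python
import numpy as np

def detect_changes(current, background, threshold=30):
    """Binary change mask obtained by background subtraction and thresholding.

    Pixels whose absolute difference from the empty-scene reference exceeds
    the threshold are marked as changed (potential moving objects).
    """
    diff = np.abs(current.astype(np.int32) - background.astype(np.int32))
    return (diff > threshold).astype(np.uint8)
```

Because the comparison is pixel-by-pixel, even a small unwanted shift of the camera misaligns the two images and turns large parts of the static background into spurious "changed" pixels, which is precisely the failure mode that image stabilization is meant to prevent.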

4. PROPOSED METHODS

4.1. The grid method

This method uses a fixed set of points on a grid that is superimposed on the image. The method evaluates the motion transformation to be applied to the image by minimizing the mean square error between the corresponding pixels in the images of a sequence. First, a grid of points is placed on both the reference and the current image; sets of translations are applied to the grid and, for each set, the correlation index defined in equation (1) is calculated, where p^t_{j,k} denotes the pixel with coordinates (j, k) in image t (t = 0 for the reference image and t = i for the current image), and m and n represent the numbers of grid points along the horizontal and vertical axes, respectively.

Equation (1) defines a correlation index between the considered images when a translation vector (a, b) is applied to the grid. The index is computed on the discrete set of points represented by the grid; an exhaustive search is performed over a certain range of translations, and the vector corresponding to the maximum correlation is selected for motion compensation.
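The exhaustive search over grid points can be sketched as follows. Since equation (1) is not reproduced in the text, the sketch below uses a normalized cross-correlation over the grid samples as a stand-in for the index; the grid spacing and the search range are arbitrary choices.

```python
import numpy as np

def grid_translation(reference, current, step=10, search=15):
    """Exhaustive search for the translation (a, b) that best aligns two frames.

    A sparse grid of points is sampled from the reference image; for every
    candidate translation, the grid values are compared with the shifted
    positions in the current image, and the translation with the highest
    correlation index is returned. The index used here is a simple normalized
    cross-correlation over the grid samples, standing in for equation (1).
    """
    h, w = reference.shape
    ys = np.arange(search, h - search, step)
    xs = np.arange(search, w - search, step)
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    ref_vals = reference[grid_y, grid_x].astype(np.float64)

    best_score, best_shift = -np.inf, (0, 0)
    for a in range(-search, search + 1):          # vertical shift
        for b in range(-search, search + 1):      # horizontal shift
            cur_vals = current[grid_y + a, grid_x + b].astype(np.float64)
            num = ((ref_vals - ref_vals.mean()) * (cur_vals - cur_vals.mean())).sum()
            den = ref_vals.std() * cur_vals.std() * ref_vals.size
            score = num / den if den > 0 else -np.inf
            if score > best_score:
                best_score, best_shift = score, (a, b)
    return best_shift
```

The returned shift can then be applied (with opposite sign) to bring the current frame back into the reference system before change detection.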

Figure 2 shows a typical video-surveillance image (the image sequence presented in [9] has been used for the present paper) on which a reference grid (black squares) and the translated one (white squares) have been superimposed.

Figure 2. Schematic representation of the grid method.

A validation algorithm is then used to discard the grid points that correspond to moving objects in the image: each point is associated with a confidence coefficient that is incremented when the point can be successfully tracked, and decremented when no corresponding point is found in the currently processed image. The confidence coefficient is computed as the percentage of correct tracking; it is initialized to 1 and updated as follows:

C = (number of times the point was tracked) / (number of processed frames)    (2)

A point is considered correctly tracked when a pixel with the same features is detected within a certain search area with respect to the reference frame. Finally, the point is actually used only if the above coefficient is above a certain threshold; in this case, the point is defined as trackable. Since the method considers only the information carried by the points lying on the grid, it turns out to be very fast, yet it remains reliable for video-surveillance applications as long as the grid is dense enough to cover a large part of the image (a coverage of 10% is considered sufficient).
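The per-point validation can be sketched as follows; the class below is only a minimal illustration of the confidence coefficient of equation (2), and the trackability threshold value is an arbitrary choice.

```python
class GridPoint:
    """Tracks the confidence coefficient C of equation (2) for one grid point."""

    def __init__(self):
        self.tracked = 0        # number of frames in which the point was tracked
        self.frames = 0         # number of frames processed so far
        self.confidence = 1.0   # initialized to 1, as in the paper

    def update(self, was_tracked: bool) -> None:
        """Update C after processing one more frame."""
        self.frames += 1
        if was_tracked:
            self.tracked += 1
        self.confidence = self.tracked / self.frames

    def trackable(self, threshold: float = 0.6) -> bool:
        """Use the point for motion estimation only if C exceeds a threshold
        (the threshold value here is an arbitrary placeholder)."""
        return self.confidence >= threshold
```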

4.2. Feature-based method

The second method considered is based on the feature tracker proposed in [10] and used in [11] for the stabilization of aerial images. A set of points is selected from the reference image by applying a criterion able to detect corners in the image. For each considered area, a two-dimensional gradient is evaluated and, if a corner point is detected, its typology is classified by evaluating the following functions for each pixel of the reference image (Fig. 3):

Fval_tl(j, k) = |p_{j,k} - p_{j-1,k-1}| + |p_{j,k} - p_{j-1,k}| + |p_{j,k} - p_{j,k-1}|   for top-left features;

Fval_tr(j, k) = |p_{j,k} - p_{j+1,k-1}| + |p_{j,k} - p_{j+1,k}| + |p_{j,k} - p_{j,k-1}|   for top-right features;

Fval_bl(j, k) = |p_{j,k} - p_{j-1,k+1}| + |p_{j,k} - p_{j-1,k}| + |p_{j,k} - p_{j,k+1}|   for bottom-left features;

Fval_br(j, k) = |p_{j,k} - p_{j+1,k+1}| + |p_{j,k} - p_{j+1,k}| + |p_{j,k} - p_{j,k+1}|   for bottom-right features;

where p_{j,k} denotes the pixel with coordinates (j, k).

Figure 3. Feature-selection algorithm.

If one of these functions is above a certain threshold, the corresponding point is marked as a feature to be tracked of that particular typology. In general, a feature-selection criterion is adopted in order to keep only the most significant features and discard all the others; this can be done by using two different strategies: 1) the whole image is divided into adjacent columns and, from each column, only the most important feature is selected; in this way, the selected features are distributed over the image, but the selection criterion can discard even good features as soon as a stronger corner is detected in the same column; 2) only the features that have no similar ones in their neighborhood are selected; as a result, possible mismatches between similar nearby features are avoided. The classification of the feature typology leads to a more robust feature-tracking algorithm: tracking is performed by searching for a similar feature (in terms of corner intensity and typology) in a proper neighborhood in the current image. This method, like the previous one, computes a measure that takes moving objects in the scene into account: a relative occurrence index is associated with each feature. In this way, a simplified probability of correct tracking of a feature is obtained and used for the validation of the tracked feature: every time a feature is correctly tracked, the occurrence index is incremented, and vice versa. The feature is actually used for motion estimation only if the occurrence index is above a certain threshold.
5. EVALUATION METHODS

Evaluation methods for motion-compensation algorithms are typically based on the following consideration: when the motion has been exactly compensated for, the difference between the stabilized image and the reference image is minimized and, theoretically, is non-zero only for the pixels that correspond to an object in the scene. The peak signal-to-noise ratio (PSNR) [12] can be used as a measure for evaluating the superposition of two images: the maximum PSNR is reached when a perfect stabilization is achieved. The PSNR can be defined as:

PSNR(I_C, I_R) = 10 log10( 255^2 / MSE(I_C, I_R) )    (3)

where I_C and I_R are the current image and the reference image, respectively, and MSE(I_C, I_R) denotes the mean square error calculated for the considered images. For the evaluation of an image-stabilization system, two different measures based on the PSNR are computed:
- ITF (Interframe Transformation Fidelity): the PSNR calculated between two successive frames, PSNR(I_k, I_{k-1});
- GTF (Global Transformation Fidelity): the PSNR calculated with respect to the reference frame (background), PSNR(I_k, I_0).
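The three measures can be computed directly from their definitions. The following Python sketch assumes 8-bit grayscale frames stored as NumPy arrays and follows equation (3) as reconstructed above; averaging the per-frame values over the whole sequence, as done here, is a convenience of the sketch, since the paper defines ITF and GTF per frame pair.

```python
import numpy as np

def psnr(img_a, img_b):
    """PSNR of equation (3) between two 8-bit grayscale images."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def itf(frames):
    """Interframe Transformation Fidelity: PSNR between successive frames."""
    return float(np.mean([psnr(frames[k], frames[k - 1]) for k in range(1, len(frames))]))

def gtf(frames, background):
    """Global Transformation Fidelity: PSNR with respect to the reference frame."""
    return float(np.mean([psnr(frame, background) for frame in frames]))
```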
However, the higher-level modules of a typical video-surveillance system do not operate on the image itself, but on the binary image obtained by thresholding the difference between the current and reference images. A measure of the similarity between images that is meaningful for video-surveillance systems can therefore be obtained by applying the ITF and GTF measures to binary images and evaluating the two quantities for a threshold varying between 0 and 255. For high thresholds, the corresponding PSNR values are very large because the system is almost blind, as it detects as changed only those areas that differ most from the fixed background. The new equation for calculating the PSNR is:


PSNR_bin(I_C, I_R) = 10 log10( 1 / MSE(I_C, I_R) )    (4)

where:

MSE(I_C, I_R) = ( Σ_{x,y} diff(x, y) ) / (w × h)    (5)

w and h are the width and the height of the images, respectively, and

diff(x, y) = 1 if |I_C(x, y) - I_R(x, y)| ≥ threshold, and 0 otherwise    (6)

is the background-differencing rule used for the MSE. In this case, the maximum MSE is equal to 1: this explains the numerator coefficient in equation (4). The proposed stabilization algorithms have been evaluated for a video-surveillance system by using receiver operating characteristic (ROC) curves [13], which plot the probability of false alarm vs. the probability of correct detection as the complexity (i.e., the motion range of the sensor) changes. The probabilities related to the system have been evaluated through the analysis of the superposition of real and estimated bounding boxes, as proposed in [14].
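The binarized measure of equations (4)-(6) can be sketched as follows; the sweep over all change-detection thresholds mirrors the evaluation described above, and the function names are illustrative only.

```python
import numpy as np

def psnr_bin(current, reference, threshold):
    """Binarized PSNR of equation (4): the MSE is the fraction of pixels whose
    absolute difference from the reference exceeds the threshold (equations (5)-(6))."""
    changed = np.abs(current.astype(np.int32) - reference.astype(np.int32)) >= threshold
    mse = changed.mean()                      # maximum value is 1, hence the unit numerator
    return np.inf if mse == 0 else 10.0 * np.log10(1.0 / mse)

def threshold_sweep(current, reference):
    """Evaluate the binarized PSNR for every change-detection threshold in [0, 255]."""
    return [psnr_bin(current, reference, t) for t in range(256)]
```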

6. RESULTS

The proposed methods were evaluated by calculating the GTF and ITF indexes for different change-detection thresholds. Both the grid method and the feature-tracking method were compared with the results obtained by using uncompensated sequences. The GTF index was used to evaluate motion compensation with respect to an initial reference image; through the ITF index, one can estimate the correlation between temporally adjacent images. Figures 4a and 4b show that, in both cases, the curve that represents the uncompensated sequence is always below the other lines: this means that, in both cases, the proposed methods are able to compensate for unwanted motion; moreover, the grid method achieves higher performances. Considering that the graphs are on a dB-like scale, it can be concluded that the illustrated methods provide good performances. Figure 5 displays the ROC curves calculated for the considered system. Each point of the curves was calculated by varying the motion range of the images to be stabilized. The figure shows that the proposed methods ensure higher correct-detection-to-false-alarm probability ratios than the system based on uncompensated images. The graph also points out that, in this case, the grid method works better than the feature-tracking one. The working area highlighted in the figure refers to movements in the range of 5 to 15 pixels: in this area, the stabilization methods reach their maximum gain.

Figure 4. Evaluation of the modified (a) GTF and (b) ITF for the considered methods.

Figure 5. ROC curves for a video-surveillance system using the considered methods.

7. CONCLUSIONS

In conclusion, this paper has shown a possible evolution of well-known motion-compensation and image-stabilization methods. The proposed methods are able to filter out unwanted motion, while preserving the movements of the objects in a scene. Evaluation methods have been developed for the proposed image-stabilization video-surveillance algorithms. The validity of the adopted approach is demonstrated by the measures on the output of the considered system, as well as by the probabilities calculated on a complete video-surveillance system.
8. REFERENCES

[1] S. Peleg, B. Rousso, A. Rav-Acha, and A. Zomet. Mosaicing on adaptive manifolds. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, pp. 1144-1154, October 2000.
[2] C.H. Morimoto and R. Chellappa. Automatic digital image stabilization. IEEE Int. Conf. on Pattern Recognition, August 1996.
[3] C. Kuglin and D. Hines. The phase correlation image alignment method. IEEE Conference on Cybernetics and Society, September 1975.
[4] R. Szeliski. Image mosaicing for tele-reality applications. Technical Report CRL 94/2, Digital Equipment Corporation, Cambridge Research Lab, May 1994.
[5] M. Hansen, P. Anandan, K. Dana, G. van der Wal, and P. Burt. Real-time scene stabilization and mosaic construction. DARPA Image Understanding Workshop, November 1994.
[6] M.J. Chen et al. A new block-matching criterion for motion estimation and its implementation. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 5, pp. 231-236, June 1995.
[7] L.M. Po and W.C. Ma. A novel four-step search algorithm for fast block motion estimation. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, June 1996.
[8] L. Marcenaro, F. Oberti, and C.S. Regazzoni. Short-memory shape models for ground-plane predictive object tracking. Proc. First IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS2000), Grenoble, France, pp. 50-56, 2000.
[9] First IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance (PETS2000), Grenoble, France, 2000.
[10] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, Pittsburgh, PA, April 1991.
[11] A. Censi, A. Fusiello, and V. Roberto. Image stabilization by feature tracking. Int. Conf. on Image Analysis and Processing (ICIAP), pp. 665-667, September 1999.
[12] C. Morimoto and R. Chellappa. Fast electronic digital image stabilization. Proc. Int. Conf. on Pattern Recognition, Vienna, Austria, August 1996.
[13] H. Van Trees. Classical detection and estimation theory. In: Detection, Estimation and Modulation Theory, John Wiley & Sons, Inc., 1968, pp. 19-46.
[14] F. Oberti, E. Stringa, and G. Vernazza. Performance evaluation criterion for characterizing video-surveillance systems. Real-Time Imaging Journal, 2001 (in press).

