A Statistical Adaptive Block-Matching
Abstract—In this paper, we address the problem of motion estimation (ME) in digital video sequences and propose a new fast, adaptive, and efficient block-matching algorithm. Higher quality and efficiency are achieved using a statistical model for the motion vectors. This model introduces adaptation in the search window, drastically reducing the number of positions where correlation-type computation is performed. The efficiency is further improved by progressively undersampling the macroblock. Patterns for undersampling are proposed to obtain the maximum benefit from single instruction multiple data (SIMD) instructions. In contrast with existing motion-estimation techniques, search strategy and subsampled patterns are closely linked. This shows that a good search strategy is much more important than blindly reducing the number of pixels considered for the matching pattern. We describe an implementation of the proposed matching strategy that exploits the very long instruction word (VLIW) and SIMD technology available in the new Itanium Processor Family. Results show that the proposed algorithm adapts easily to the evolution of the scene, avoiding annoying quality drops that can be observed with other deterministic algorithms. The total number of operations required by the proposed method is lower than that required by traditional approaches.

Index Terms—Adaptive motion estimation, block matching, digital video, MPEG, SIMD, VLIW.

I. INTRODUCTION
Research on digital video communication has received continuously increasing interest over the last decade. Some of this work converged into standards set on the market (MPEG-1 [1] and MPEG-2 [2]; H.261 [3] and H.263 [4]), including the new multimedia standard (MPEG-4) [5] that is mainly based on the idea of second generation video coding [6], and the newly arrived H.264 [7]. However, the availability of higher bandwidth in communication channels, web-tv, video e-mail, video mobile peer to peer communications, and wireless video servers broadcasting has continuously set new challenges and opened new horizons for digital video. Real-time MPEG playback is nowadays a common practice on every personal computer. The situation worsens, though, when we consider video editing and encoding.
It is well known that the main computational bottleneck in a block-based video encoder is the motion estimation (ME). Most often, ME is based on block-matching techniques. The use of an exhaustive search algorithm is computationally so demanding that every software implementation of a digital video encoder uses a suboptimal search. Otherwise, it becomes impossible to work in real time for CCIR 601 formats, with a decent quality, without specific added hardware.
For this reason, several techniques have been proposed to reduce the computational burden of full search. However, as the hypotheses of previously introduced techniques based on a reduced number of search positions turn out to be untrue in most of the real scenes, the resulting coding quality is often questionable.
To overcome the deficiencies of previous approaches, the proposed method relies on the statistical behavior of the motion vector data, thus improving adaptation. Furthermore, combining the search strategy with the subsampling of the macroblock (MB) leads to a dramatic increase of efficiency. Finally, by selecting adequate subsampling patterns, a huge benefit is gained from single instruction multiple data (SIMD) instructions in the new generation of the Itanium Processor Family (IPF).
This paper is organized as follows. In Section II, a short overview of block matching is given, along with general considerations about its complexity; major algorithms that diminish the number of operations are briefly described. In Section III, an adaptive algorithm based on the statistical properties of the motion field is introduced. In Section IV, we study the impact of the adoption of particular subsampling patterns for a MB; these patterns are particularly suited for processors that exploit the single instruction multiple data capabilities. In Section V, an implementation for Itanium, the first general purpose processor of IPF that exploits the VLIW and SIMD technology, is proposed. Results and Conclusions are the last two sections of the paper.
Manuscript received February 17, 2003; revised March 4, 2003. This paper was recommended by Associate Editor H. Sun.
F. Moschetti is with NTT DoCoMo, Inc. Multimedia Laboratories, Kanagawa 239-8536, Japan (e-mail: fulvio@spg.yrp.nttdocomo.co.jp).
M. Kunt is with LTS-EPFL, Signal Processing Laboratory, Swiss Federal Institute of Technology, CH 1015 Lausanne, Switzerland (e-mail: Murat.Kunt@epfl.ch).
E. Debes is with Microprocessor Research, Intel Labs, Santa Clara, CA 95052 USA (e-mail: Eric.Debes@intel.com).
Digital Object Identifier 10.1109/TCSVT.2003.811363
1051-8215/03$17.00 © 2003 IEEE

II. OVERVIEW OF BLOCK-MATCHING MOTION ESTIMATION
A. Block Matching
In the representation used for block matching, an image is divided into blocks of a general rectangular shape. In practical applications, these are squares of dimensions $N \times N$. For each block in the current frame, the block of pixels in a reference frame that is the most similar, according to a particular criterion, is searched. The position difference between the current
418 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 4, APRIL 2003
block and the block that turns out to be the most similar in the reference frame represents the motion vector $\mathbf{v}$, given by

$$\mathbf{v} = \arg\min_{(d_x, d_y)} \sum_{x} \sum_{y} \left\| I(x, y, t) - I(x + d_x, y + d_y, t - \Delta t) \right\| \tag{1}$$

where the double sum runs over the pixels $(x, y)$ of the current block, $I$ represents the luminance component of the video signal, and $\Delta t$ represents the time interval between the current and the reference frame. In the case of MPEG encoding, $\Delta t$ could also take on a negative value because of the backward prediction of the B frames (MPEG-2) or B-VOPs (MPEG-4). The values $d_x$ and $d_y$ in (1) represent the horizontal and vertical displacement of the block in the reference frame and must be valid values within a certain area in which the best match is sought. This area is called the search window. The argument of the sum in (1) is a norm. This can be an $L_2$ or an $L_1$ norm called, respectively, the mean squared error or mean absolute difference (MAD). In terms of the accuracy of the estimation, they give similar results, whilst from a computational point of view, the $L_1$ norm is more efficient, as it does not require an extra multiplication. In this work, since we are interested in fast solutions, only the MAD will be considered.

B. Computational Complexity of the Block-Matching Algorithm
If a block is $N \times N$ pixels, the cost of searching the minimum in a search window on a sequential machine is proportional to $N^2$ times the number of positions in the window. Since the match has to be computed for all the block patterns of the frame, the asymptotic complexity of the whole process is this cost multiplied by the number of blocks in the frame. For example, in the case of CIF images (352 × 288 pixels), with a search window corresponding to a maximum horizontal and vertical motion vector of value 16, 2.8 giga-matches of pixels per second are needed, if this task has to be performed in real time (25 frames per second).
Using the MAD, every position investigated inside the search window requires one difference, one absolute value, and one accumulation per pixel of the block. The algorithm that examines every possible position in the search window, called the full-search algorithm (FSA), is therefore a very intensive computational task in the encoding process.
Fig. 1 shows the computational effort required in an MPEG encoder by the different modules of the encoding process. The module that clearly overwhelms the others is the ME, with more than 90% of the total CPU time spent just for this task. The measurement has been taken running the MPEG Software Simulation Group MPEG-2 [8] encoder on a Pentium III processor, using a profiling tool called Vtune [9]. MPEG 1, 2, and 4 encoders all show the same behavior.

Fig. 1. Percentage of CPU resources dedicated to different modules for an MPEG encoder. 90% of resources are taken by ME.

C. Existing Methods
Block matching can be seen as a three-level nested loop (Fig. 2). The outermost loop represents the search that covers the entire frame. The next inner loop considers all the points in the search window. Finally, the innermost loop represents the content of a block. The computational load of block matching can be decreased by reducing the number of operations in any of these loops.
Some algorithms decrease the complexity at the most external loop by exploiting the spatial and temporal redundancy of the motion information among the MBs, such as genetic algorithms [10], [11] and hierarchical motion-estimation algorithms [12], [13]. The hypothesis behind the genetic algorithms is that the motion field does not change much from one image to the next. In the case of changes, the population of motion vectors adapts to the new situation. The hierarchical ME relies on the hypothesis that ME computed on subsampled images gives quite a fair motion field for the entire image. When this is done, a refinement is still necessary to improve the precision of motion
MOSCHETTI et al.: A STATISTICAL ADAPTIVE BLOCK-MATCHING ME 419
vectors. Usually, hierarchical search uses coarse-to-fine transition to the next resolution level. However, as shown in [12], it is also possible to implement the inverse transition in order to recover from local minima and to widen the search range.
The second internal loop seeks the best match inside the search window. At this level, there are also several contributions that have studied how to reduce the complexity of the block-matching operation without degrading the quality as much [14]–[19]. The two-dimensional (2-D) logarithmic search [14] is one of the first contributions in this domain. Three-step search (tss) [15] and its improvements, such as the new three-step (ntss) [16] and the four-step search (4ss) [17], also represent important contributions in terms of reduction of computational complexity. In the tss, the center position, as well as eight coarsely spaced positions around the center, are tested first. Then, eight positions are again checked around the previous point of minimum, but the spacing is finer. This process is continued for another step to give the final vector. Some other contributions at this level are worth mentioning, such as the cross-search algorithm [18] and the unrestricted center-biased diamond search (ucbds) [19].
Work has also been done at the most internal loop in order to reduce the number of operations used to determine the sum of the absolute difference at a MB level. In [20], Liu and Zaccarin propose quite an efficient method based on pixel subsampling at the block level. The subsampling patterns are alternated over the different positions examined so that all the pixels of the block contribute to the determination of the motion vector. In [21], the authors obtain better results by proposing a set of representative pixel patterns, even though they propose this approach for hardware implementation.

III. STATISTICAL ADAPTIVE BLOCK MATCHING-ALGORITHM
A. Probability Distributions of Motion Vectors
References [14]–[19] can be considered as the most important algorithms that exploit the concept of reducing the points checked in the search window. These algorithms are based on two main hypotheses. First, it is assumed that the matching function grows in a monotonic way with the distance from the position of the absolute minimum in all the domain represented by the search window. However, this is hardly true in the general case, as was shown in [22], and it represents a possible source of error for the correct identification of the minimum of the MAD. The second hypothesis is that “the motion field is gentle, smooth and varies slowly” [16]. Most of the improvements introduced to the tss strategy, like [16]–[19], are based on this assumption. This is quite true for video-conferencing-like sequences, but the scenario changes significantly in the general case.
For example, let us consider the general structure of a group of pictures (GOP) or a group of visual objects (GOV) in an MPEG encoding, that is IBBPBBPBBPBB. The standard does not require encoding with this GOP format, but this is the suggested and most commonly used one. The time distance between the two reference frames, in the case of the P prediction, is $3T$, where $T$ is the time interval between two successive frames. In these cases, the magnitudes of the motion vectors are no longer heavily packed around zero, except for very static sequences. It is also worth adding that the goodness of the prediction of the P frames is very important, since they then represent the references for the following frames to be predicted.
In Fig. 3, the motion vector distributions are shown for the sequences “Basket” and “Flower Garden” in CCIR 601 format, and for “Stefan” and “Akiyo” in CIF. Measurements are shown for each sequence in an interval of 20 pixels and have been taken using a FSA block matching. Frames are considered at three temporal distances that are multiples of $T$, with $T = 40$ ms. The first two time intervals correspond, respectively, to a rate of 25 frames per second and to the temporal interval between two P-frames or two video object planes (VOPs) in the case of traditional MPEG GOP or GOV. The last time interval considered ($5T$) is an example of scenarios coming from the extensive exploitation of the temporal redundancy by strongly reducing the frame rate to get higher compression ratios. We notice that, apart from the very static “Akiyo” sequence, the distribution of motion vectors along the $x$ axis is much more widely spread, while for the $y$ axis it is concentrated around zero, regardless of the temporal distance between the current frame and its reference.
Fig. 3. Motion vector distributions for the sequences: (a) Basket, (b) Flower Garden, (c) Stefan, and (d) Akiyo. For each sequence, measurements are shown, respectively, from left to right with a temporal distance between the current and the reference frame of T (1), 2T (2), and 5T (3).

In view of these results, a new algorithm is proposed, accurately searching along the more likely directions.
In MPEG-4 encoding, an object is delimited by its bounding box. In case segmented objects are not available at the encoder, object-oriented coding becomes a traditional coding and the whole scene is viewed as an object. So in the general case, we can start considering a rigid object moving through the scene. Let $M$ be the number of MBs covering this object and $\mathbf{v}_i$ be the motion vector of the MB $i$. The mean motion vector of the object is given by

$$\bar{\mathbf{v}} = \frac{1}{M} \sum_{i=1}^{M} \mathbf{v}_i \tag{2}$$

Any motion vector of the MBs of this object can be expressed as

$$\mathbf{v}_i = \bar{\mathbf{v}} + \mathbf{n}_i \tag{3}$$

where $\mathbf{n}_i$ can be viewed as a noise vector representing the inaccuracies of the block matching. When the whole frame is considered as an object, then obviously $\bar{\mathbf{v}}$ becomes the global motion vector.

C. Noise Distribution
Let $p(\mathbf{n})$ be the probability density function (pdf) of the vectors $\mathbf{n}_i$. The statistical characteristics, the shape, and the region of support of this function will indicate where to search and in which priority in order to reach the minimum of the MAD as fast as possible. Experimental analysis shows that, for each component $n$ of the noise vector, the pdf can be modeled as a generalized Gaussian distribution (ggd) as follows:

$$p(n) = \frac{c\,\eta(\sigma, c)}{2\,\Gamma(1/c)} \exp\left\{ -\left[ \eta(\sigma, c)\,|n - \mu| \right]^{c} \right\}, \qquad \eta(\sigma, c) = \frac{1}{\sigma} \left[ \frac{\Gamma(3/c)}{\Gamma(1/c)} \right]^{1/2} \tag{4}$$

where $\mu$ is the mean of the noise vector distribution, $\sigma$ is the standard deviation, and $\Gamma(\cdot)$ is the complete gamma function. The main property of the ggd is that it spans from the Laplacian to the Gaussian distribution with the parameter $c$ taking on the values 1 or 2, respectively. The ggd can also tend toward a uniform distribution when the parameter $c$ tends to infinity.
Since the ggd is a parametric distribution, it is important to understand the behavior of the parameter $c$ that optimizes the model. Experimental results (see Fig. 4) have shown that the distribution model varies with values of $c$ comprised in an interval [0.6, 2], according to the complexity of the scene. This means that in most of the cases, the distribution of motion vectors is comprised in a family of curves between a Laplacian and a Gaussian. Actually, in some cases, the distribution can be even more concentrated around zero than a Laplacian, as generally happens for static sequences. In Fig. 4, we can see the cumulative density functions of the empirical and estimated

Fig. 4. Theoretical and measured cumulative density functions of the noise vector distributions along the x and y directions for different sequences. The frames for predictions are taken at various temporal distance 3T, with T = 40 ms.
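The limiting cases of the ggd can be checked numerically. The sketch below assumes the standard parameterization of the generalized Gaussian density in terms of its mean, standard deviation, and a shape parameter (called `c` here as an assumed name), and compares it with the Gaussian and Laplacian densities written for the same standard deviation. It uses only `math.gamma` from the Python standard library; the function names are illustrative.

```python
import math


def ggd_pdf(x, mu, sigma, c):
    """Generalized Gaussian density with mean mu, standard deviation sigma,
    and shape parameter c (c = 1: Laplacian, c = 2: Gaussian)."""
    eta = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma
    norm = c * eta / (2.0 * math.gamma(1.0 / c))
    return norm * math.exp(-((eta * abs(x - mu)) ** c))


def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplacian_pdf(x, mu, sigma):
    """Laplacian density expressed through its standard deviation sigma
    (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x - mu) / b) / (2.0 * b)
```

With a shape value below 1, such as the 0.6 lower bound reported in the text, `ggd_pdf` is more sharply peaked at the mean than the Laplacian with the same standard deviation, which matches the remark about static sequences.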
Hypothesis H0 for
Hypothesis H1 for
Let be the number of successes of the event in a set of trials.
In our case, the partition is the set of possible noise vectors, determined with one pixel precision. The dimension of this partition, also called the number of degrees of freedom, is the number of admissible noise vectors. Given various parameters of our system, we can set to be comfortable. The number of noise vectors measured for each event is . The measure of the test is then formulated as follows:
(5)

checked in the frame . We use a traditional full search for initialization at the first predicted frame of a sequence, even though other fast algorithms, such as tss for example, might also have been employed for the initialization of the motion field.
Exactly defining the points to be checked according to the noise pdf may be cumbersome in practical implementations. As an approximation, one can define a dense search area (SA) and a coarse search region outside. The former is defined as , around the position indicated by . Let and be the pdf of the noise vector distributions along the horizontal and vertical directions. According to the target probability with which we want to find the minimum in the SA area, the two values , are determined solving the following integral:

Fig. 5. (a) Pattern checked at the first step for a search window of ±14 pixels horizontal and ±7 vertical, for a case of = 4, = 2. (b) Direct neighbors of the temporary minimum found at a 2 pixel resolution.

Concerning complexity, it is worth noting that updating the correct value of the ggd shape parameter is a complex operation. In a practical case, though, using the values obtained from experimental results, we fixed its value to 2. Identifying an exact value of this parameter has an important meaning for a theoretical understanding of the motion field behavior, since an accurate and adaptive value can better approximate the distribution in the generalized Gaussian functions family. From a practical point of view, though, its impact on the algorithm implementation is mainly reflected in the size of the search area SA, an area that also depends heavily on the standard deviation. Given that in practical cases the shape parameter varies between 0.6 and 2, we have fixed it to the maximum of its range; as a consequence, it does not shrink the SA area unnecessarily. Therefore, the sole role of shaping the dimension of the SA area is attributed to the standard deviation, which is updated at every predicted frame. Updating it requires a number of operations proportional to the number of motion vectors: a number of subtractions, additions, and multiplications proportional to the number of MVs, plus two divisions, are needed to update the mean and the standard deviation at each frame, which is absolutely negligible in the total accounting of operations of the ME.

Another essential issue in speeding up the ME concerns the computation of the matching metric used. When a block matching must be performed between two frames, the matching criterion is usually evaluated using every pixel of the matching block. Since block matching is based on the idea that all pixels in a block move by the same amount, a good estimation of the motion could be obtained, in principle, by employing only a fraction of all these pixels. It is evident that the fewer the pixels considered in computing the match, the higher the inaccuracies in determining the real motion vectors. Liu and Zaccarin [20] proposed a technique based on this idea, which considered just one fourth of the total number of pixels forming the MB. They use four subsampling patterns that are alternated with the searched locations. It is a very effective algorithm, but all the positions in the search window, even if subsampled, have to be considered, so it cannot be applied to every search strategy. Chan et al. [21] instead proposed an adaptive pixel decimation in order to vary the number of selected pixels to be used for the computation of the matching criterion. They consider 4 × 4 block-patterns and select the subsampling patterns according
to the value of the gradient of the luminance. Even if it exploits the subsampling within the MB, a huge number of tests is required to check which is the best pattern to be dynamically chosen. The final amount of operations and branches issued for an implementation on general-purpose processors would be too high.

A. MAD and Partial MAD (PMAD)
Let PMAD be the $L_1$ norm computed on a subset of pixels of the MB. In order to better understand the impact of the usage of the PMAD instead of the MAD, it is particularly useful to rely on the model introduced by Lengwehasatit and Ortega in [24]. The authors state and show that:
a) the partial MAD is a good estimate of the final MAD;
b) there exists a reliable model for the MAD-PMAD error, and this model is about the same for any values of PMAD, i.e., the distribution of the difference between MAD and PMAD does not depend appreciably on the value of PMAD.
From the experimental data, it is possible to assume a Laplacian model for the random variable corresponding to the difference between MAD and PMAD, namely $p(\mathrm{MAD} - \mathrm{PMAD}) = \frac{\lambda}{2} e^{-\lambda |\mathrm{MAD} - \mathrm{PMAD}|}$, where $\lambda$ is the Laplacian parameter found for the particular PMAD subsampling set adopted. This model is verified also for the four subsampling patterns P1–P4 shown in Fig. 7.
The introduction of a model for the difference between MAD and PMAD has indeed a double advantage. First, it allows the development of fast adaptive techniques. Second, for a fixed threshold of the error in the prediction of the MAD once given the PMAD, it allows measuring the probability of error in fast matching for a fixed pattern.
Let $P_e$ be the probability of error of the prediction of the MAD, given the PMAD. Using this probability, we can find a threshold parameter such that $P_e$ is less than some threshold probability of error
(8)
Conversely, once the threshold is fixed, which can be done according to its impact on the overall distortion, it is possible to measure the probability of error in estimating the MAD using PMAD. Readers may refer to [25] for a complete description of the MAD versus PMAD model. We are interested here in adopting (8) to measure the probability of correct MAD prediction using fixed PMAD patterns.

B. SIMD-Oriented Subsampling Patterns and Their Impact on ME
The basic idea in choosing the different subsampling patterns used in our algorithm is the following. Since block matching is an operation well suited to a parallel implementation, and SIMD instruction sets are now present in almost every general-purpose processor, the subsampling pattern should exploit the SIMD concept as much as possible. In Fig. 7, the different patterns considered are shown. The patterns have been chosen considering a 16 × 16 pixel block that represents a MB in a block-based encoder. The different patterns considered make it possible to understand the progressive impact of subsampling the macroblock in the estimation process. Pixels are grouped together into groups of eight in order to fully exploit the SIMD possibilities in 64-bit SIMD registers. In Fig. 8, the loss in terms

Fig. 8. PSNR loss versus relative execution time gain. Results obtained applying the subsampling patterns P4, P3, P2, P1 to the FSA for the sequences (a) Akiyo and (b) Stefan.
Fig. 11. (a) MAD in the sequence Stefan for the SSA algorithm with the different subsampling patterns applied and for tss and 4ss. (b) PSNR in the sequence Stefan for the SSA algorithm with the different subsampling patterns applied and for tss and 4ss.

TABLE II
PSNR AVERAGED OVER 100 FRAMES FOR THE DIFFERENT SEQUENCES CONSIDERED. SSAP2 REFERS TO THE SSA ALGORITHM WITH THE PATTERN P2 OF FIG. 6

Fig. 14. “psad” instruction for Itanium.

Fig. 15. (a) PSNR and (b) MPQM for the sequence Stefan encoded at 1.1 Mbits/s.

TABLE III
MPQM SUBJECTIVE DISTORTION AVERAGED OVER 100 FRAMES FOR THE DIFFERENT SEQUENCES CONSIDERED. SSAP2 REFERS TO THE SSA ALGORITHM WITH THE PATTERN P2 OF FIG. 6

TABLE V
AVERAGE NUMBER OF DIFFERENCES AND EXTRACTIONS OF ABSOLUTE VALUES NEEDED TO FIND THE MINIMUM OF THE MAD FOR A MACROBLOCK, WITHIN THE RELEVANT SEARCHING WINDOW. THIS REPRESENTS ALSO THE NUMBER OF TIMES THE ARGUMENT OF THE DOUBLE SUMMATION IN (1) IS EXECUTED FOR EACH MB TO BE MATCHED

Fig. 17. Performance of various ME algorithms when a scene change occurs at frame 40 (from Akiyo to Stefan).

the new situation avoiding very annoying quality drops. Compared to full search, these may reach 3 or 4 dB, as is the case for the other deterministic algorithms. Table II gives the PSNR of the SSA algorithm for the subpatterns presented in Section IV. The SSAP2 considers just one fourth of the number of pels of the SSA, and these are purposely spread over the MB to profit from the SIMD registers and instructions. Table IV shows the complexity of the different algorithms for the different sequences. This complexity is expressed in terms of the number of differences and extractions of absolute value necessary to find the minimum of the MAD, for each MB, within the relevant search window; namely, it describes how many times the argument of the double summation in (1) is executed for each MB to be matched. The sequences “Barca,” “Basket,” and “Flower Garden” are in CCIR 601 format, while “Akiyo,” “Stefan,” and “Coastguard” are in CIF format. From the computational point of view, the SSA algorithm with the P2 pattern generally requires half of the operations needed by the other algorithms [15]–[17]. Concerning coding quality, one of the main problems that might actually arise is related to scene changes, where temporal coherency of the motion field is lost. However, when this algorithm is inserted in a standard video codec, this issue can be easily solved. In the MPEG-2 standard, whenever a scene change arises, relevant MBs are coded as Intra. This is an unequivocal signal of scene change in a predicted frame. Therefore, when at least one fourth of the MBs are coded as Intra in a predicted frame, we reinitialize the motion field properties using a tss search strategy. The case of a scene change occurring exactly at the temporal position of an Intra frame still remains to be explained. Even in this case, the first predicted frame would not be so badly predicted: in fact, the positions checked out of the SA area make the algorithm behave as a repetitive 2 two-step search looking for convergence. At the next frame, though, the variance of the motion field would immediately increase. This would enlarge the search area and, consequently, the search algorithm would rapidly adapt to the new scene, as the variance of motion vectors
population at the previous frame would have been higher. At the [7] Joint model number 1, ITU-T SG16 Q.6 and ISO/IEC SC29/WG11,
third predicted frame, the situation is completely stabilized. A 2001.
[8] MPEG Software Simulation Group.. [Online]. Available:
similar situation appears in sequences with sudden change of http://www.mpeg.org/MPEG/MSSG/
directions as in Stefan. [9] Intel.. [Online]. Available: http://developer.intel.com/vtune/analyzer
In Fig. 17, we show the behavior of various search algorithms [10] K. H.-K. Chow and M. L. Liou, “Genetic motion search algorithm for
video compression,” IEEE Trans. Circuits Syst. Video Technol., vol. 3,
when a scene change occurs: at frame 40 of the sequence Akiyo, pp. 440–445, Dec. 1993.
we pass to frame 40 of the sequence Stefan, still at the same bit [11] M. Mattavelli, Motion Analysis and Estimation: From Ill-Posed Discrete
rate of 1.1 Mbits/s. Inverse Linear Problems to MPEG-2 Coding. Lausanne, Switzerland:
EPFL-LTS, 1997.
[12] F. Dufaux and M. Kunt, “Multigrid block matching motion estimation
VII. CONCLUSIONS with an adaptive local mesh refinement,” Proc. SPIE Visual Communi-
cation and Image Processing, Nov. 1992.
In this paper, a new algorithm to realize the block-matching [13] J. Chalidabhongse and C.-C. J. Kuo, “Fast motion vector estimation
ME has been presented. Compared to exhaustive and loga- using multiresolution-spatio-temporal correlations,” IEEE Trans. Cir-
cuits Syst. Video Technol., vol. 7, pp. 477–488, June 1997.
rithmic search, it requires fewer operations. This is obtained [14] J. R. Jain and A. K. Jain, “Displacement measurement and its application
in two different steps. The first step consists of reducing the in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp.
number of checking points in the search window by considering 1799–1808, 1981.
[15] T. Koga et al., “Motion compensated intraframe coding for video con-
a pattern that dynamically exploits the statistical properties ferencing,” in Proc. NTC 81, New Orleans, LA, 1981.
of the motion field. The second step consists of subsampling [16] R. Li, B. Zeng, and M. L. Liou, “A new three step search algorithm for
the MB for the computation of the MAD. The subsampling of block motion estimation,” IEEE Trans. Circuits Syst. Video Technol.,
vol. 4, pp. 438–442, Aug. 1994.
the MB is realized in such a way that the advantage of using
the SIMD technologies currently available on general-purpose
processors is considerable. Simulations have been run using
PSNR and MPQM. Overall, they show a better quality for
the proposed algorithm when compared to the other classical
searches, including when the subsampling pattern is applied.

According to the local statistical properties of the sequence,
the possibility of having, at times, a higher number of points to
check in the search window is counterbalanced by the reduced
number of pixels needed to compute the difference between
two MBs. This brings an overall computational improvement
through statistical adaptation. The results also validate
the hypothesis that all the pixels of the MB are moving with
the same motion vector, as subsampling within the MB does
not affect the quality in a significant way. Accordingly, to
obtain a better performance for rapidly varying sequences, it
is preferable to check more points in the search window than
to use all the pixels within the MB. All in all, the proposed
algorithm provides better quality with a higher increase in speed.

The performance of this algorithm may improve further
when semantically segmented objects are available,
as in MPEG-4 object-oriented coding, thanks to the increased
temporal coherency in the motion field.
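The trade-off between search points and pixels per MB can be illustrated with a minimal Python sketch. It is not the implementation described in this paper: the regular 2:1 decimation in each direction, the 16x16 MB size, and the names `sad`, `pixel_ops`, and `best_vector` are assumptions chosen for illustration, and the SIMD-oriented patterns proposed here are not reproduced.

```python
def sad(cur, ref, bx, by, dx, dy, size=16, step=1):
    """Sum of absolute differences between the MB at (bx, by) in `cur`
    and the MB displaced by (dx, dy) in `ref`.
    Only every `step`-th pixel per direction is used (step=1: all pixels).
    The regular decimation is an assumption, not the paper's pattern."""
    total = 0
    for y in range(0, size, step):
        for x in range(0, size, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def pixel_ops(num_candidates, size=16, step=1):
    """Pixel differences needed to test `num_candidates` search positions."""
    samples_per_axis = len(range(0, size, step))
    return num_candidates * samples_per_axis ** 2


def best_vector(cur, ref, bx, by, candidates, size=16, step=1):
    """Candidate (dx, dy) with the lowest (possibly subsampled) SAD."""
    return min(candidates,
               key=lambda v: sad(cur, ref, bx, by, v[0], v[1], size, step))


# Same pixel budget, four times as many search positions:
print(pixel_ops(25, step=1))    # 25 positions x 256 pixels -> 6400
print(pixel_ops(100, step=2))   # 100 positions x 64 pixels -> 6400
```

Under these assumptions, subsampling the MB by 4:1 leaves room to test four times as many positions for the same number of pixel differences, which is the sense in which a wider search is preferred over a full-pixel match.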

ACKNOWLEDGMENT

Special thanks go to M. Sorci for constructive suggestions
and valuable insights.

Fulvio Moschetti (S’99–M’01) received the M.S.
degree (cum laude) in electrical engineering from
the University of Pavia, Pavia, Italy, in 1997, and
the Ph.D. degree from the Swiss Federal Institute of
Technology, Lausanne, Switzerland, in 2001.
He was previously with Siemens Microelectronics
Austria and the Swiss Federal Institute of
Technology. He is currently with NTT DoCoMo
Multimedia Laboratories, Yokosuka, Japan. His
research interests include digital image and video
processing and image and video coding for mobile
applications.

Murat Kunt (SM’70–M’74–SM’80–F’86) was born
in Ankara, Turkey, in 1945. He received the M.S. degree
in physics and the Ph.D. degree in electrical engineering,
both from the Swiss Federal Institute of
Technology (EPFL), Lausanne, Switzerland, in 1969
and 1974, respectively.
From 1974 to 1976, he was a Visiting Scientist at
the Research Laboratory of Electronics, Massachusetts
Institute of Technology, Cambridge, where he
developed compression techniques for X-ray images
and electronic image files. In 1976, he returned to
EPFL, where he is currently Professor of Electrical Engineering and Director
of the Signal Processing Institute. He conducts teaching and research in digital
signal and image processing, with applications to modeling, coding, pattern
recognition, scene analysis, industrial developments, and biomedical engineering.
His Laboratory participates in a large number of European projects
under various programmes such as Esprit, Eureka, Race, HCM, Commett, and
Cost. He is the author or co-author of more than 200 research papers and 15
books, and holds seven patents. He consults for governmental offices, including
the French General Assembly.
Dr. Kunt is the Editor-in-Chief of the Signal Processing Journal and
a founding member of the European Association for Signal Processing
(EURASIP). He serves as a Chairman and/or member of the scientific
committees of several international conferences and on the Editorial Boards
of the PROCEEDINGS OF THE IEEE, Pattern Recognition Letters, and Traitement
du Signal. He was the Co-Chairman of the first European Signal Processing
Conference in 1980 and the General Chairman of the International Image
Processing Conference (ICIP’96) in 1996, both held in Lausanne, Switzerland.
He was the President of the Swiss Association for Pattern Recognition from its
creation until 1997. He received the Gold Medal of EURASIP for meritorious
services, the IEEE ASSP Technical Achievement Award, the IEEE Third
Millennium Medal, an Honorary Doctorate from the Catholic University of
Louvain, the Technical Achievement Award of EURASIP, and the Imaging
Scientist of the Year Award of the IS&T and SPIE in 1983, 1997, 2000, 2001,
2003, and 2003, respectively.

Eric Debes (M’03) received the M.S. degree in
electrical and computer engineering from Supélec,
France, in 1996, the M.S. degree in electrical engineering
from the Technical University Darmstadt,
Darmstadt, Germany, in 1997, and the Ph.D. degree
in signal processing from the Swiss Federal Institute
of Technology, Lausanne, Switzerland.
Since 2001, he has been a Researcher in the Media
Systems Group of the Microprocessor Research
Labs, Intel Corporation, Santa Clara, CA. His
research interests include image and video coding
and processing algorithms, as well as computer architecture and parallelism. In
particular, he is working on new hardware features in the CPU and chipset to
provide better media application performance and end-user quality of service
with a given system and processor power envelope and/or energy budget.