An Improved Inter-Frame Prediction Algorithm For Video Coding Based On Fractal and H.264
Digital Object Identifier 10.1109/ACCESS.2017.2745538
ABSTRACT Video compression has become increasingly important with the growing use of video sequences and their rapidly rising resolutions. H.264 is a video coding standard widely applied for academic and commercial purposes, and fractal theory is one of the most active branches of modern mathematics, with great potential in compression. In this paper, we propose an improved inter-frame prediction algorithm for video coding based on fractal theory and H.264. We take the same approach to intra prediction as H.264 and adopt fractal theory for inter prediction. Several improvements are introduced in this algorithm. First, luminance and chrominance components are coded separately, and their partitions are no longer coupled as in H.264. Second, the partition mode for chrominance components has been changed so that the block size now ranges from 16 × 16 to 4 × 4, the same as for luminance components. Third, we introduce an adaptive quantization parameter offset, changing the offset for every frame in the quantization process to obtain a better reconstructed image. Comparison between the improved algorithm, the original fractal compression algorithm, and JM19.0 (the latest H.264/AVC reference software) confirms a slight increase in Peak Signal-to-Noise Ratio and a significant decrease in bitrate, while the time consumed for compression remains less than 60% of that using JM19.0.
2169-3536 © 2017 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
VOLUME 5, 2017
S. Zhu et al.: Improved Inter-Frame Prediction Algorithm for Video Coding Based on Fractal and H.264
the optimal mode. The Lagrange strategy for rate-distortion optimization (RDO) is used in H.264 to choose the optimal coding mode:

If all Cauchy sequences in the metric space (X, d) are convergent, i.e., each Cauchy sequence converges to a point of X, then (X, d) is called a complete metric space or Cauchy space.
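The Lagrangian mode decision mentioned above can be sketched in a few lines. The candidate modes and their distortion/rate numbers below are hypothetical illustrations of the J = D + λ·R rule, not values or code from JM:

```python
# Minimal sketch of Lagrangian rate-distortion optimization (RDO) for
# mode decision as used conceptually in H.264: pick the coding mode m
# that minimizes J(m) = D(m) + lambda * R(m).

def rdo_select(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_bits) tuples."""
    best_mode, best_cost = None, float("inf")
    for mode, distortion, rate in candidates:
        cost = distortion + lam * rate   # Lagrangian cost J = D + lambda*R
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Hypothetical per-mode (distortion, rate) measurements for one macroblock.
modes = [("INTRA16x16", 1200.0, 96),
         ("INTER16x16", 800.0, 160),
         ("INTER8x8", 500.0, 300)]

print(rdo_select(modes, lam=2.0))   # a small lambda favors low distortion
print(rdo_select(modes, lam=10.0))  # a large lambda favors low rate
```

A larger λ penalizes rate more heavily, so the selected mode shifts from the fine-partitioned, low-distortion candidate toward the cheaper one.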
and the current frame is called the residuals frame. The transformed and quantized residuals are written into the bit stream. The quantized data is then subjected to inverse quantization and inverse transform, and added to the prediction frame to form the reconstruction frame, which will be stored for prediction of subsequent frames.

IV. IMPROVEMENT TO THE ORIGINAL ALGORITHM
A. CHROMINANCE MACROBLOCK RESIZING
In H.264, the 8×8-2×2 block partitioning mode is adopted for both intra-frame and inter-frame coding of chrominance components. Because the number of pixels in the chrominance component is one fourth that of the luminance component, each 8×8 chrominance block corresponds to a 16×16 luminance block, sharing the same motion vector and macroblock partitioning. In normal video sequences, the contents of the chrominance components are often closely related to those of the luminance component, so they share the same motion vector and partition mode with the corresponding luminance component instead of being predicted separately. This is a highly effective method to reduce the bitrate and coding time in H.264.

While making predictions, the fractal coding scheme should not merely consider the motion vector by finding the exactly identical or most similar blocks. Instead, fractal matching between blocks, which probably involves fractal affine transforms, should also be taken into account. Hence, it is very possible that the luminance block and the chrominance block do not share the same motion vector and fractal partition mode, underscoring the need to code them separately.

The chrominance and luminance components have their own fractal parameters, motion vectors and fractal modes when they are separately coded. If the 8×8-2×2 partition mode is applied, the number of motion vectors will be at least twice that of H.264. At the same time, the bitrate will continue to increase due to the fractal parameters. Therefore, the partitioning mode of the chrominance component is switched to 16×16-4×4, the same as for the luminance component. The chrominance component can then share the same coding process as the luminance component because they have exactly the same size.

Adopting the new partition mode reduces the number of macroblocks of the chrominance component by 3/4. Our experimental results show that the coding time is reduced by over 40% while maintaining almost the same Peak Signal-to-Noise Ratio (PSNR) for the chrominance components. Details of the experimental results are given in Section V.

B. SELF-ADAPTIVE QP OFFSET
Quantization provides an extremely effective tool to reduce the bitrate of a video. It takes many bits to represent a numerical value with a large dynamic range, but after quantization each value can be represented with fewer bits, because all data in a certain range is mapped to the same word based on a rule. The quantization process of a uniform scalar quantizer can be written as:

Z = floor(|W| / Δ) · sgn(W)   (18)

where Δ denotes the quantization step length, the function floor() denotes the operation of rounding a number down to the nearest integer, and sgn() denotes the sign function that extracts the sign of a real number.

Accordingly, the inverse quantization equation is:

W′ = Δ · Z   (19)

It can be seen that the quantization step length is an important parameter determining the accuracy and bitrate of an image. A larger step length means that fewer bits are needed to code a numerical value and that more information about the video is lost.

Values in the range 0–Δ will be mapped to 0 after quantization, causing inevitable quantization errors. Hence, H.264 defines a quantization offset f. After adding the offset to Equation (18), the quantization equation becomes:

Z = floor((|W| + f) / Δ) · sgn(W)   (20)

The inverse quantization equation remains the same. After improving the quantizer, the deadband of quantization can be controlled by changing the value of f. If W is in the range (−Δ + f, Δ − f), the quantized value is zero, so the deadband of quantization decreases as f grows. Note that the size of the deadband influences the subjective quality of the image. If the deadband is too large, many small-value pixels will be quantized to 0, causing the loss of many minutiae of the image. Based on statistical results, H.264 defines f = Δ/3 for intra-frame prediction and f = Δ/6 for inter-frame prediction.

However, setting the value of f in this way relies on statistical results, and it is not the optimal quantization offset for the coding of each frame. We learned from our experiments that the quantization offset has a great influence on the PSNR of an image. Taking into consideration the fact that the offset is not written into the bit stream, the quality of the image can be optimized by adjusting the quantization offset for each frame without influencing the size of the bit stream. Motivated by this idea, this study proposes a self-adaptive QP offset algorithm. The details are as follows.

Step 1: After fractal prediction, compute the residuals and then proceed to choose the QP offset.
Step 2: Empirically pre-set the range of the QP offset, and ensure that this range can be narrowed according to statistical results. The QP offset is not set directly. Instead, it is related to the value of QP, and it can be computed as in (21) and (22):

Qstep = Q(QP % 6) · 2^floor(QP/6)   (21)

f = Qstep / denom   (22)

where Q(QP % 6) refers to the Qstep that corresponds to QP = 0–5 and can be retrieved quickly from a preset data array; denom denotes the factor used to set the quantization offset f, and it is also an important parameter to
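The quantization relations of Equations (18)–(22) are easy to prototype. The sketch below uses H.264's base Qstep table for QP % 6 = 0…5 and a denom of 3 chosen only for illustration; it demonstrates the equations, not the actual JM quantizer:

```python
import math

# Sketch of the uniform scalar quantizer of Eqs. (18)-(20) and the
# QP-to-Qstep relation of Eqs. (21)-(22).

Q_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]  # Qstep for QP = 0..5

def qstep(qp):
    """Eq. (21): Qstep doubles every 6 QP steps."""
    return Q_BASE[qp % 6] * (2 ** (qp // 6))

def quantize(w, delta, f=0.0):
    """Eq. (20); with f = 0 this reduces to Eq. (18)."""
    if w == 0:
        return 0
    return int(math.floor((abs(w) + f) / delta)) * (1 if w > 0 else -1)

def dequantize(z, delta):
    """Eq. (19): inverse quantization."""
    return delta * z

qp = 28
delta = qstep(qp)        # 16.0 for QP = 28
f = delta / 3            # Eq. (22) with an illustrative denom = 3
# A residual inside the deadband (-delta + f, delta - f) maps to 0:
print(quantize(8.0, delta, f))                         # deadband -> 0
print(quantize(12.0, delta, f), dequantize(1, delta))  # 1, reconstructed 16.0
```

Raising f shrinks the deadband, so more small residuals survive quantization, which is exactly the lever the self-adaptive offset algorithm tunes per frame.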
TABLE 2. Configuration.

The settings of the configuration file are given in Table 2.

B. SELF-ADAPTIVE QP OFFSET
After resizing the chrominance block, this study introduces the self-adaptive QP offset algorithm. Experimental results indicate that searching for the optimal QP offset increases the coding time, but the PSNR of the coded image is enhanced. In order to reduce the influence of optimal QP searching on time consumption, the proposed self-adaptive algorithm is only applied to the coding process of the luminance component. Another reason for this is that humans are more sensitive to luminance change than to chrominance change [26], so increasing the PSNR of the luminance component promises more performance gains.

The PSNR of the luminance component increases by the largest margin, about 4 dB, when QP = 0. It increases by about 1.4 dB when QP = 4 and by 0.9–1 dB when QP = 8, 20, 28 and 36. Table 5 presents the variation of coding time and PSNR of the Y component for each video given QP = 28.

C. PERFORMANCE COMPARISON WITH JM19.0
JM19.0 is the latest reference software of H.264/AVC. A comparison is made between the proposed algorithm and JM19.0 to demonstrate the performance of our algorithm. We evaluate the subjective quality first, and then a comparison is made under the following metrics: PSNRs of the Y, U and V components, coding time, and bitrate.

We choose the 19th frame of 'mobile', 'paris', and 'bus' when QP = 14. Note that the 19th frame is the last P frame before the next I frame appears. The original frame, the reconstructed frame from JM19.0, and the reconstructed frame from the proposed algorithm are shown in Fig. 6.

FIGURE 6. Reconstructed frames from: (a) Proposed algorithm; (b) JM19.0; (c) Original video.

It can be found that the reconstructed frames from the proposed algorithm have almost the same quality as those from the original video and JM19.0.

When encoding with a low QP, the PSNR of the Y component in the proposed algorithm is smaller than that in JM19.0 by at most 2 dB. Due to the use of larger blocks, the PSNRs of the U and V components decrease by a larger margin than that of the Y component, and are smaller than those of JM19.0 by about 8 dB at most. But when encoding with a high QP, the value of PSNR in the proposed algorithm is higher than that in JM19.0. Humans are more sensitive to luminance change than to chrominance change, so the PSNR decrease in the chrominance components is acceptable. The bitrate–PSNR curves of the test video 'football' for the Y, U and V components are shown in Fig. 7.

From those figures, it can be observed that the proposed algorithm makes better use of the bitrate, because its PSNR is higher than that of JM19.0 at a high bitrate. But the proposed algorithm is inferior to JM19.0 when the bitrate is low. A possible reason is that the proposed algorithm does not skip the macroblocks whose residuals are all zeros. Meanwhile, the transmitted fractal parameters occupy some bits and thus limit the scope for the bitrate to decrease.

The proposed algorithm achieves an enormous advantage in terms of coding time. Performance results of the algorithms are given in Table 6.

TABLE 6. Coding time comparison of proposed algorithm and JM19.0.

It can be seen that the proposed algorithm shortens the coding time by about 50% compared with JM19.0 because of its low computational complexity. The coding time is much shorter than that of JM19.0 even after applying the self-adaptive QP offset algorithm, which may increase the coding time by 10%–50%.
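The PSNR figures compared above follow the standard definition for 8-bit video, PSNR = 10·log10(255²/MSE). A minimal sketch of the metric (not the authors' measurement code), on made-up flat blocks:

```python
import math

# PSNR for 8-bit frames: PSNR = 10 * log10(255^2 / MSE).
# Frames are given as flat equal-length sequences of pixel values.

def psnr(frame_a, frame_b):
    n = len(frame_a)
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / n
    if mse == 0:
        return float("inf")   # identical frames: infinite PSNR
    return 10.0 * math.log10(255.0 ** 2 / mse)

# Toy example: a flat 8x8 block vs. the same block offset by 5 levels.
orig = [128] * 64
recon = [133] * 64
print(round(psnr(orig, recon), 2))
```

A uniform error of 5 gives MSE = 25, i.e., roughly 34 dB; halving the error raises PSNR by about 6 dB, which is the scale on which the dB differences quoted above should be read.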
APPENDIX
The process to derive the fractal coefficients s and o is as follows. The difference between the current frame and the reference frame is calculated through (29):

SSD = Σ_{i=1..N} |r_i − (s·d_i + o)|²   (29)

where N denotes the number of pixels in the macroblock, r_i denotes the values of pixels in the range block, and d_i denotes the values of pixels in the domain block. To minimize SSD, set the partial derivatives of SSD with respect to s and o to zero:

∂SSD(s, o)/∂s = Σ_{i=1..N} 2(s·d_i + o − r_i)·d_i = 0   (30)

∂SSD(s, o)/∂o = Σ_{i=1..N} 2(s·d_i + o − r_i) = 0   (31)

Then we have:

o = (Σ_{i=1..N} r_i·d_i − s·Σ_{i=1..N} d_i²) / (Σ_{i=1..N} d_i)   (32)

s·Σ_{i=1..N} d_i = Σ_{i=1..N} r_i − N·o   (33)

Furthermore, from Equations (32) and (33) we can finally obtain the equations to calculate s and o:

s = (N·Σ_{i=1..N} r_i·d_i − (Σ_{i=1..N} r_i)·(Σ_{i=1..N} d_i)) / (N·Σ_{i=1..N} d_i² − (Σ_{i=1..N} d_i)²)   (34)

o = (1/N)·(Σ_{i=1..N} r_i − s·Σ_{i=1..N} d_i)
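The closed-form solution (34) can be checked numerically. The sketch below recovers s and o from made-up range- and domain-block values; the pixel values are purely illustrative:

```python
# Least-squares fractal coefficients from Eq. (34):
# s is the contrast scaling, o the brightness offset, fitted so that
# r_i ≈ s * d_i + o minimizes SSD over the block.

def fractal_coeffs(r, d):
    n = len(r)
    sum_r, sum_d = sum(r), sum(d)
    sum_rd = sum(ri * di for ri, di in zip(r, d))
    sum_d2 = sum(di * di for di in d)
    denom = n * sum_d2 - sum_d ** 2
    if denom == 0:                      # all d_i equal: s is undefined
        return 0.0, sum_r / n           # fall back to the mean as offset
    s = (n * sum_rd - sum_r * sum_d) / denom   # Eq. (34), scaling
    o = (sum_r - s * sum_d) / n                # Eq. (34), offset
    return s, o

# If r_i = 0.5*d_i + 10 exactly, the solver should recover s=0.5, o=10.
d = [10, 20, 30, 40]
r = [0.5 * di + 10 for di in d]
print(fractal_coeffs(r, d))   # -> (0.5, 10.0)
```

This is the same formula as ordinary least-squares linear regression of r on d, which is why the degenerate case (all d_i equal) needs a fallback.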
FIGURE 7. Bitrate–PSNR curves: (a) football (Y); (b) football (U); (c) football (V).

VI. CONCLUSION
In this paper, we propose an improved fractal video compression algorithm, which combines the intra-frame coding algorithm of H.264 and the fractal inter-frame coding algorithm.

REFERENCES
[1] E. G. I. Richardson. (Mar. 2003). H.264/MPEG-4 Part 10 White Paper. [Online]. Available: http://www.vcodex.com
[2] A. Puri, X. Chen, and A. Luthra, ''Video coding using the H.264/MPEG-4 AVC compression standard,'' Signal Process., Image Commun., vol. 19, no. 9, pp. 793–849, 2004.
[3] G. Cote, B. Erol, M. Gallant, and F. Kossentini, ''H.263+: Video coding at low bit rates,'' IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 7, pp. 849–866, Nov. 1998.
[4] S. Acharjee, D. Biswas, N. Dey, P. Maji, and S. S. Chaudhuri, ''An efficient motion estimation algorithm using division mechanism of low and high motion zone,'' in Proc. IEEE Int. Multi-Conf. Autom., Comput., Commun., Control Compressed Sens. (iMacs), Mar. 2013, pp. 169–172.
[5] M. Wang, R. Liu, and C.-H. Lai, ''Adaptive partition and hybrid method in fractal video compression,'' Comput. Math. Appl., vol. 51, no. 11, pp. 1715–1726, 2006.
[6] M. S. Lazar and L. T. Bruton, ''Fractal block coding of digital video,'' IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 3, pp. 297–308, Jun. 1994.
[7] C.-M. Lai, K.-M. Lam, and W.-C. Siu, ''A fast fractal image coding based on kick-out and zero contrast conditions,'' IEEE Trans. Image Process., vol. 12, no. 11, pp. 1398–1403, Nov. 2003.
[8] M. F. Barnsley, ''Fractals everywhere,'' J. Roy. Statist. Soc.-Ser. A Statist. Soc., vol. 158, no. 2, p. 339, 1995.
[9] A. E. Jacquin, ''Image coding based on a fractal theory of iterated contractive image transformations,'' IEEE Trans. Image Process., vol. 1, no. 1, pp. 18–30, Jan. 1992.
[10] S. P. Zhu, Y. S. Hou, Z. K. Wang, and E. B. Kam, ''Fractal video sequences coding with region-based functionality,'' Appl. Math. Model., vol. 36, no. 11, pp. 5633–5641, 2012.
[11] S. G. Farkade and S. D. Kamble, ''A hybrid block matching motion estimation approach for fractal video compression,'' in Proc. Int. Conf. Circuits Power Comput. Technol. (ICCPCT), Mar. 2013, pp. 1151–1155.
[12] R. E. Chaudhari and S. B. Dhok, ''Acceleration of fractal video compression using FFT,'' in Proc. 15th Int. Conf. Adv. Comput. Technol. (ICACT), Sep. 2013, pp. 1–4.
[13] S. D. Kamble, N. V. Thakur, L. G. Malik, and P. R. Bajaj, ''Fractal video coding using modified three-step search algorithm for block-matching motion estimation,'' in Proc. Int. Conf. Comput. Vis. Robot., Adv. Intell. Syst. Comput. (ICCVR), vol. 332, 2015, pp. 151–162.
[14] S. D. Kamble, N. V. Thakur, L. G. Malik, and P. R. Bajaj, ''Quadtree partitioning and extended weighted finite automata-based fractal colour video coding,'' Int. J. Image Mining, vol. 2, no. 1, pp. 31–56, 2016.
[15] S. P. Zhu, L. Li, J. Chen, and K. Belloulata, ''An automatic region-based video sequence codec based on fractal compression,'' AEU-Int. J. Electron. Commun., vol. 68, no. 8, pp. 795–805, 2014.
[16] S. D. Kamble, N. V. Thakur, and P. R. Bajaj, ''A review on block matching motion estimation and automata theory based approaches for fractal coding,'' Int. J. Interactive Multimedia Artif. Intell., vol. 4, no. 2, pp. 91–104, 2016.
[17] S. Acharjee, G. Pal, T. Redha, S. Chakraborty, S. S. Chaudhuri, and N. Dey, ''Motion vector estimation using parallel processing,'' in Proc. IEEE Int. Conf. Circuits, Commun., Control Comput. (IC), Nov. 2014, pp. 231–236.
[18] Z. Huang, ''Frame-groups based fractal video compression and its parallel implementation in Hadoop cloud computing environment,'' Multidimensional Syst. Signal Process., pp. 1–18, Mar. 2017, doi: 10.1007/s11045-017-0480-1.
[19] A. M. H. Y. Saad and M. Z. Abdullah, ''High-speed implementation of fractal image compression in low cost FPGA,'' Microprocess. Microsyst., vol. 47, pp. 429–440, Nov. 2016.
[20] A. M. H. Saad and M. Z. Abdullah, ''Real-time implementation of fractal image compression in low cost FPGA,'' in Proc. IEEE Int. Conf. Imag. Syst. Techn. (IST), Oct. 2016, pp. 13–18.
[21] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, ''Overview of the H.264/AVC video coding standard,'' IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, Jul. 2003.
[22] H.264/AVC Reference Software JM 19.0. Accessed: Jun. 30, 2017. [Online]. Available: http://iphome.hhi.de/suehring/tml/
[23] J. E. Hutchinson, ''Fractals and self-similarity,'' Indiana Univ. Math. J., vol. 30, no. 5, pp. 713–747, 1981.
[24] Y. Fisher, Fractal Image Compression: Theory and Application. New York, NY, USA: Springer, 2012.
[25] B. Sankaragomathi, L. Ganesan, and S. Arumugam, ''Encoding video sequences in fractal-based compression,'' Fractals, vol. 15, no. 4, pp. 365–378, 2007.
[26] W. Wang, J. Dong, and T. Tan, ''Effective image splicing detection based on image chroma,'' in Proc. 16th IEEE Int. Conf. Image Process. (ICIP), Nov. 2009, pp. 1257–1260.

SHIPING ZHU (M'05) was born in Baoji, China, in 1970. He received the B.Sc. and M.Sc. degrees in measuring and testing technologies and instruments from the Xi'an University of Technology, Xi'an, China, in 1991 and 1994, respectively, and the Ph.D. degree in precision instrument and machinery from the Harbin Institute of Technology, Harbin, China, in 1997.
From 1997 to 1999, he was a Post-Doctoral Fellow with Beihang University, Beijing, China. From 2000 to 2002, he was a Post-Doctoral Fellow with the Brain and Cognition Research Center, Université Paul Sabatier, Toulouse, France. From 2002 to 2004, he was a Post-Doctoral Fellow with the Department of Computer Science and the Department of Electrical and Computer Engineering, Université de Sherbrooke, Sherbrooke, QC, Canada. Since 2005, he has been an Associate Professor with the Department of Measurement Control and Information Technology, School of Instrumentation Science and Optoelectronics Engineering, Beihang University. He has authored or coauthored over 80 journal and conference papers. He holds 50 China invention patents. His current research interests include image processing and video coding, computer vision, and machine vision for 3-D measurement.

SHUPEI ZHANG was born in Panzhihua, China, in 1994. He received the B.Sc. degree in instrumentation science and opto-electronics engineering from Beihang University, Beijing, China, in 2016, where he is currently pursuing the master's degree. His current research interests include video compression.

CHENHAO RAN was born in Yuncheng, China, in 1995. He received the B.Sc. degree in instrumentation science and opto-electronics engineering from Beihang University, Beijing, China, in 2016, where he is currently pursuing the master's degree. His current research interests include TDLAS and image reconstruction from histogram.