
IEEE TRANSACTIONS ON BROADCASTING, VOL. 62, NO. 1, MARCH 2016

Fast Coding Quad-Tree Decisions Using Prediction Residuals Statistics for High Efficiency Video Coding (HEVC)

Hui Li Tan, Member, IEEE, Chi Chung Ko, Senior Member, IEEE, and Susanto Rahardja, Fellow, IEEE

Abstract—High Efficiency Video Coding (HEVC) is the latest video coding standard, developed to meet market demands for real-time high quality video codecs. As compared to its predecessor H.264/AVC, HEVC can achieve significant compression gains, but with higher encoding complexity. Therefore, for real-time applications, significant encoding time reduction is still necessary. In the HEVC test model, the large number of coding quad-tree decisions to be tested during rate-distortion optimization would result in high encoding time. Hence, we propose a method to reduce the high encoding time by pruning the coding quad-trees using prediction residuals statistics. Experimental results from HM16.3-based implementations show that the proposed residual-based pruning method can reduce encoding time by an average of about 44% with an average of about 1.0% coding loss.

Index Terms—High Efficiency Video Coding (HEVC), coding quad-tree.

Manuscript received June 19, 2015; revised October 11, 2015; accepted October 21, 2015. Date of publication January 12, 2016; date of current version March 2, 2016. (Corresponding author: Susanto Rahardja.) H. L. Tan is with the Institute for Infocomm Research, A*STAR, Singapore 138632, and also with the National University of Singapore, Singapore 117583 (e-mail: hltan@i2r.a-star.edu.sg). C. C. Ko is with the National University of Singapore, Singapore 117583 (e-mail: elekocc@nus.edu.sg). S. Rahardja is with Northwestern Polytechnical University, Xi'an, Shaanxi 710072, P.R. China, and also with the National University of Singapore, Singapore 117583 (e-mail: susanto@nwpu.edu.cn). Digital Object Identifier 10.1109/TBC.2015.2505406

I. INTRODUCTION

Recent advancements in integrated circuits and multimedia technology have made consumer mobile devices ubiquitous, allowing video contents to be broadcast/transmitted anytime and anywhere. This directly increases market demands for real-time high quality video codecs. For instance, mobile video calling and conferencing applications require consumer mobile devices to encode and decode video contents in real-time. Furthermore, online social networking and broadcasting platforms have been extending their capabilities to allow consumers to live stream events using mobile devices. Even commercial broadcasting companies have been relying on small consumer grade devices for live news updates from places with accessibility constraints. As these devices are being equipped with increasingly high resolution content capture systems, consumers will have the capability to produce higher quality video contents. In order for the contents to be readily available for consumption, there is a need for high performance video codecs that are able to stream high quality video contents within similar bandwidth and time constraints. Nonetheless, the video codecs must remain low in complexity to run effectively on these low power devices with smaller integrated circuits.

High Efficiency Video Coding (HEVC) [1], [2] is the latest video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), which comprises the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). As compared to the existing H.264/AVC video coding standard [3], [4], the HEVC standard approximately doubles the data compression rate for equal perceptual video quality [5]. Designed to address increased video resolutions, the attractive compression benefits offered by HEVC will ease the bandwidth bottleneck for the deployment of HD and beyond-HD contents.

Nonetheless, HEVC is reported to be several times more complex than H.264/AVC [2] and particularly suffers from high encoding time. The high encoding time of HEVC is attributed to various new coding tools and features that are introduced to provide increased coding flexibility. For instance, HEVC employs a flexible hierarchical picture partitioning structure that enables the use of large and multiple sizes of processing units. HEVC also employs more comprehensive intra prediction, inter prediction, transform and quantization, a new loop filter, and an enhanced version of entropy coding. Such sophisticated tools come with performance improvements but at the expense of higher coding complexity, and this complexity would considerably hamper its usage, particularly in applications where real-time operation is required.

In this paper, we propose a method to reduce the high encoding time of HEVC by pruning its coding quad-trees using prediction residuals statistics. The proposed residual-based pruning method introduces few modifications and insignificant computational complexity overheads to the existing encoder, but can effectively reduce redundant mode and partition checks.

This paper is organized as follows. First, an overview of the HEVC picture partitioning structure is provided in Section II. Then, prior works on fast encoding methods for coding quad-tree decisions are discussed in Section III. Subsequently, our proposed method is presented in Section IV, and the experimental results are shown in Section V. Finally, the conclusion is given in Section VI.

II. OVERVIEW OF HEVC PICTURE PARTITIONING STRUCTURE

As shown in Fig. 1, like its predecessor standards such as MPEG-4 and H.264/AVC, HEVC adopts the conventional approach of using temporal and spatial prediction, followed by transform coding of the prediction residuals and entropy coding of the quantized transform coefficients with other coding parameters. Nonetheless, HEVC uses a flexible hierarchical picture partitioning structure to enable large and multiple sizes of coding units (CUs), prediction units (PUs), and transform units (TUs) [2].

A CU defines a region sharing the same prediction mode (i.e., SKIP, INTER, or INTRA), a PU defines a region sharing the same prediction information (i.e., intra prediction modes, motion parameters, etc.), and a TU defines a region sharing the same transformation and quantization. As shown in Fig. 2, in a coding quad-tree representation, a picture is first divided into non-overlapping coding tree units (CTUs), which can then be recursively divided into smaller CUs. A leaf CU, which is a CU that is not divided into sub-CUs, can either be SKIP, INTER, or INTRA coded. The CTU size is 64 × 64 (depth 0), and the minimum CU size is 8 × 8 (depth 3) [6]. An INTER or INTRA coded CU can then be further divided into PUs of various partitions. The possible PU partitions for a CU of size 2N × 2N are 2N × 2N, N × 2N, 2N × N, 2N × N/2, 2N × 3N/2, N/2 × 2N, and 3N/2 × 2N for INTER coded CUs, and 2N × 2N and N × N for INTRA coded CUs [6].
0018-9316 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
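As a concrete illustration of the depth-size relation described above, the CU side length at each coding quad-tree depth follows directly from halving the 64 × 64 CTU. The sketch below is ours, not code from the paper or from the HM software:

```python
# Illustrative sketch (not from the paper): CU sizes at each coding
# quad-tree depth under the common test conditions (CTU 64x64, min CU 8x8).

def cu_size(depth, ctu_size=64):
    """Side length of a square CU at the given coding quad-tree depth."""
    return ctu_size >> depth  # each quad-tree split halves the CU side

sizes = [cu_size(d) for d in range(4)]  # [64, 32, 16, 8]
```

Depth 0 thus corresponds to a whole CTU, and depth 3 to the minimum 8 × 8 CU.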
Finally, in a transform quad-tree representation nested in a CU, a TU can be recursively divided into smaller TUs. The TU size can range from 32 × 32 down to 4 × 4, with a maximum of three levels allowed for the transform quad-tree depth within a CU [6].

Fig. 1. A block-based hybrid video encoder. An approach to encoding time reduction is by judiciously pruning the rate-distortion search space in the circled region.

Fig. 2. An illustration of the nested coding and transform quad-trees in the HM16.3 reference software encoder. Blue bubbles: coding quad-tree representation, where a largest coding tree unit (CTU) can be recursively divided into smaller coding units (CUs). Green bubbles: a leaf CU can be further divided into prediction units (PUs). Yellow bubbles: transform quad-tree representation, where a transform unit (TU) can be recursively divided into smaller TUs.

The coding and transform quad-trees recursively decompose an image into variable unit sizes so that regions of different characteristics can be better encoded. For instance, regions of high stationarity and homogeneity can be encoded with larger unit sizes and smaller side-information overheads. However, according to the rate-distortion optimization (RDO) framework [7], [8], at a unit level, an operating point on the convex hull of all rate-distortion (RD) pairs is found by minimizing the RD cost function over all sets of decisions made for the unit, i.e.,

    min_{Θi} J = Di(Θi) + λ Ri(Θi),    (1)

where Θi is the set of decisions made for unit i, and Di(Θi) and Ri(Θi) are the distortion and rate, respectively, achieved by using Θi.

Each choice in the nested quad-trees amounts to a point on the operational RD plane. Hence, the flexibility provided by the nested quad-trees greatly increases the operational RD search space and the computational complexity of RDO at the encoder. Fast decision algorithms, which judiciously prune the RD search space, are necessary for real-time broadcasting applications [9]–[11].

One possible way to reduce the RD search space is by avoiding full branching of the coding quad-trees. Referring to Fig. 4, in the reference software description [12], the best prediction mode and partition are first determined by assuming that the CU has not been divided, before the same process is recursively repeated for each of its sub-CUs. A final decision as to whether the CU is to be divided or not is then made only after all the sub-CUs have been analyzed. This corresponds to traversing the coding quad-tree in a depth-first manner; the coding quad-tree decisions are made bottom up. Hence, if the initial branching of the coding quad-trees can be avoided, the computational complexity of RDO at the encoder can be greatly reduced. In this paper, we explore the use of prediction residuals statistics for pruning the coding quad-trees.

III. PRIOR WORKS ON FAST ENCODING METHODS

The "Early SKIP Decision (ESD)" method [13], [14], the "Coding-Block-Flag Fast Condition (CFM)" method [15], and the "Early CU Condition (ECU)" method [16], [17] are already present in the reference software and can be enabled with the encoder parameters ESD, CFM, and ECU, respectively [2]. The ESD method checks the motion vector difference and the coding-block-flag (cbf) when a unit is encoded with CU mode INTER with partition 2N × 2N. A zero cbf indicates that there are no non-zero quantized transform residuals to be coded. If the motion vector difference and the cbf are both zero, the best interim CU mode is set as SKIP, and all remaining CU mode checks at the current depth are skipped. The CFM method checks the cbf when a unit is encoded with CU mode INTER with any partition. If the cbf is zero, the best interim CU mode is set as INTER with the current partition, and all remaining CU mode checks at the current depth are skipped. The ECU method checks the best interim CU mode at the current depth before the CU mode checks at the next depth. If the best interim CU mode at the current depth is SKIP, the best CU mode is set as SKIP, and all remaining CU mode checks are skipped, i.e., the coding quad-tree is terminated.

There are also a few prior works on coding quad-tree mode and partition fast decision methods which are not present in the reference software. Some of these methods attempt to prune the search space by terminating the coding quad-trees early. The "Fast Encoder (FEN)" method [18] checks the RD cost when a unit is encoded with CU mode SKIP. If the SKIP RD cost is less than a threshold, the coding quad-tree is terminated. The threshold is set as the average RD cost of a number of previously SKIP coded CUs of the same CU size. An attempt to improve the FEN method adjusts the weights applied to the average RD cost of a number of previously SKIP coded CUs of the same CU size [19]. The default weight is 1.0, and is increased by 0.5 if either one of the neighboring CUs is SKIP coded or its CU of higher depth is SKIP coded. The proposed method in [19] obtained an average encoding time reduction of 46% with 1.4% average coding loss, when benchmarked against HM3.0 for the Random Access setting with High Efficiency configuration. Furthermore, the "Early Partition Decision (EPD)" method [20] checks the interim RD cost when a unit is encoded with CU mode INTER with any partition. If the interim RD cost is less than a threshold, the coding quad-tree is terminated. Similar to the FEN method, the threshold is set to the average RD cost of a number of previously SKIP coded CUs of the same CU size. The "CU Depth Pruning (CDP)" method [20] terminates the coding quad-trees at the higher branches by traversing the coding quad-trees in a pseudo breadth-first manner. When benchmarked against HM4.0, their proposed EPD method obtained an average encoding time reduction of 36% with 0.8% average coding loss, while their proposed CDP method obtained an average encoding time reduction of 9% with 0.1% average coding loss.
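The FEN-style early termination described above can be sketched as follows. This is our illustrative pseudocode, not the HM implementation; the class name, the `window` parameter, and the method names are our own assumptions:

```python
from collections import defaultdict, deque

# Illustrative sketch of FEN-style early termination (names and window size
# are ours, not from the HM source): terminate the coding quad-tree when the
# SKIP RD cost of a CU falls below the average SKIP RD cost of recently
# SKIP coded CUs of the same size.

class FenEarlyTermination:
    def __init__(self, window=32):
        # per-CU-size history of SKIP RD costs (the window size is assumed)
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record_skip(self, cu_size, rd_cost):
        """Record the RD cost of a CU that was SKIP coded."""
        self.history[cu_size].append(rd_cost)

    def terminate(self, cu_size, skip_rd_cost, weight=1.0):
        # [19] starts the weight at 1.0 and raises it by 0.5 when a
        # neighbouring CU or the parent CU is SKIP coded
        hist = self.history[cu_size]
        if not hist:
            return False  # no statistics collected yet: do not terminate
        threshold = weight * sum(hist) / len(hist)
        return skip_rd_cost < threshold
```

The same running-average threshold also captures the EPD variant, with the interim INTER RD cost tested in place of the SKIP RD cost.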
Some other coding quad-tree mode and partition fast decision methods attempt to prune the search space by skipping certain depths of the coding quad-trees. Frame-level and CU-level depth statistics are used in skipping rarely used CU depths [21]. The CU depth statistics of the CUs in a frame are used at the frame level, while the CU depth statistics of the left, top-left, top, top-right, and co-located CUs are used at the CU level. When benchmarked against TMuC0.9 on some 720p (1280 × 720) and 1080p (1920 × 1080) test sequences for only quantization parameter QP = 24, the proposed method in [21] obtained an average encoding time reduction of 42%. Similarly, the "Adaptive Depth Range Determination (ADRD)" method [22] uses the CU depth statistics of the left, top, top-left, and co-located CUs for skipping rarely used CU depths. The ADRD method is coupled with three coding quad-tree early termination methods based on (i) motion homogeneity checking of the current, left, top, top-left, and co-located CUs; (ii) RD cost checking of the current CU, with the threshold set as a weighted RD cost of the left, top, top-left, and co-located CUs; and (iii) SKIP mode checking, by skipping further branching if the current and parent CUs are SKIP coded. Their proposed ADRD method obtained an average encoding time reduction of 42% with 1.3% average coding loss, when benchmarked against HM2.0. In another work, the coding quad-tree mode decision problem is cast as a Bayesian decision problem [23]. Their proposed method obtained an average encoding time reduction of 41% with 1.9% average coding loss, when benchmarked against HM4.0 using four HEVC test sequences. In this paper, we use prediction residuals statistics for pruning the coding quad-trees. The prediction residuals statistics are obtained from the current CU and can be computed independently of its spatially and temporally neighbouring CUs. We then apply a QP-dependent threshold for coding quad-tree decisions using the prediction residuals statistics.

IV. PROPOSED METHOD

Referring again to Fig. 2, each frame is divided into non-overlapping CTUs, where X_{i0} denotes the i0-th CTU within the frame. Let M denote the maximum depth of the coding quad-tree, and let m0 denote a parameter for deciding the minimum CU size. Then, a CU at depth m, where 0 ≤ m < M, is of size 2^(m0+(M−m)) × 2^(m0+(M−m)). Therefore, the CTU size is 2^(m0+M) × 2^(m0+M), while the minimum CU size is 2^(m0+1) × 2^(m0+1). According to the HEVC common test conditions [6], M = 4 and m0 = 2, i.e., the CTU size is 64 × 64, and the minimum CU size is 8 × 8.

Each CU at depth m is denoted by X_{i0,i1,...,im}, where, as mentioned above, i0 indexes the location of the root CTU within the frame, while each subsequent index i1, ..., im (0 ≤ i1, ..., im ≤ 3) specifies the index of the CU within its parent. Therefore, X_{i0,i1,...,im−1,im} is the im-th CU of X_{i0,i1,...,im−1}. For example, if the parent CU is X_0, its four sub-CUs are X_{0,0}, X_{0,1}, X_{0,2}, and X_{0,3}. Similarly, if the parent CU is X_{0,1}, its four sub-CUs are X_{0,1,0}, X_{0,1,1}, X_{0,1,2}, and X_{0,1,3}. This notation allows us to uniquely identify each CU within a frame. For convenience, we also use X_{īm} to denote X_{i0,i1,...,im}, i.e., īm denotes the list of indices i0, i1, ..., im.

Let the two N × 2N PUs of CU X_{īm} be denoted by X_{īm,L} and X_{īm,R}, respectively. Similarly, let the two 2N × N PUs of CU X_{īm} be denoted by X_{īm,U} and X_{īm,D}, respectively. The proposed method terminates a coding quad-tree early by using prediction residuals statistics. The prediction residuals statistics are computed as the absolute difference between the prediction residuals variances of the two N × 2N PUs,

    u1 = |V(X_{īm,L}) − V(X_{īm,R})|,    (2)

and the absolute difference between the prediction residuals variances of the two 2N × N PUs,

    u2 = |V(X_{īm,U}) − V(X_{īm,D})|,    (3)

where V(X) is the variance of the prediction residuals in partition X.

The proposed method terminates a coding quad-tree early when the prediction residuals statistics are both less than a threshold, i.e.,

    u1 < n·c·f(QP),    (4)

and

    u2 < n·c·f(QP),    (5)

where n is the number of pixels in a CU, f(QP) is a quantization parameter (QP)-dependent threshold, and c is a constant experimentally set to trade off complexity and coding performance. The prediction residuals statistics are computed over different CU sizes and hence are normalized by the number of pixels in a CU. Furthermore, the QP-dependent threshold allows a higher threshold, and hence fewer splits into smaller CUs, for larger QPs. As in [24], we model our QP-dependent threshold as

    f(QP) = Δ²/12,    (6)

where Δ is the quantization step size [4]. Eq. (6) is essentially the mean-squared quantization error [25].

An analysis of the empirical QP-dependent threshold for the prediction residuals statistics is performed using the HEVC test sequences [6] on HM16.3 [12]. Two test sequences are used from each resolution class. For each video sequence, the prediction residuals statistics and ground truth split decisions (split decisions by the reference software with exhaustive CU search) are collected for eight video frames over QP = {17, 18, ..., 41, 42}. For each QP, we find g_QP to minimize the mean-squared error between y and ỹ, where y and ỹ denote the ground truth split decisions and predicted split decisions, respectively:

    ỹ = 0 (no split)  if max{u1, u2}/n < g_QP,
        1 (split)     otherwise.

Fig. 3. Plot of log thresholds vs QP.

Fig. 3 shows a plot of log(g_QP) against QP. log(g_QP) is approximately a linear function of QP, i.e., g_QP is approximately a logarithmic linear function of QP. Since Δ ≈ 2^((QP−4)/6) [4], Δ is also approximately a logarithmic linear function of QP.
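Eqs. (2)–(6) can be put together in a short sketch. This is our NumPy illustration of the pruning test, with the HM integration point omitted; the function names are ours, not from the paper:

```python
import numpy as np

# Sketch of the proposed pruning test, Eqs. (2)-(6). `residual` is the 2D
# prediction-residual block of a 2N x 2N CU; function names are assumptions.

def f_qp(qp):
    delta = 2.0 ** ((qp - 4) / 6.0)  # approximate quantization step size [4]
    return delta ** 2 / 12.0         # mean-squared quantization error, Eq. (6)

def prune_split(residual, qp, c=0.001):
    n = residual.size                     # number of pixels in the CU
    left, right = np.hsplit(residual, 2)  # the two N x 2N PUs
    up, down = np.vsplit(residual, 2)     # the two 2N x N PUs
    u1 = abs(left.var() - right.var())    # Eq. (2)
    u2 = abs(up.var() - down.var())       # Eq. (3)
    threshold = n * c * f_qp(qp)          # Eqs. (4) and (5)
    return u1 < threshold and u2 < threshold  # True: terminate quad-tree early
```

A flat residual block, whose PU halves have near-identical variances, satisfies both conditions and stops the quad-tree, while a block whose halves differ strongly in variance is allowed to split further.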
Fig. 3 shows a plot of log(c·f(QP)) against QP. High correlation of g_QP and f(QP) can be observed (breaking down only at very high QP, above 41), supporting the use of f(QP) for coding quad-tree decisions using the prediction residuals statistics.

The proposed method is illustrated in Fig. 4. As can be seen, the proposed method involves few modifications to the existing encoder, since the prediction residuals statistics are obtained directly from the PUs within a CU, independently of the spatially and temporally neighbouring CUs. The computational complexity overhead is in computing the prediction residuals statistics for the pruning condition. Nonetheless, these computations of simple second order prediction residuals statistics are operations with very low complexity compared to the encoding processes that can be pruned, e.g., mode decision and motion estimation.

Fig. 4. An illustration of the proposed method. The mode and partition decision for a CU in the HM16.3 reference software encoder is as shown. The proposed method terminates a coding quad-tree early when the prediction residuals statistics are both less than a QP-dependent threshold.

V. EXPERIMENTAL RESULTS

The proposed method is implemented in HM16.3 [12] and tested using the common test conditions [6]. The HEVC test sequences cover a range of resolutions, including Class A (2560 × 1600 pixels), Class B (1920 × 1080 pixels), Class C (832 × 480 pixels), Class D (416 × 240 pixels), and Class E (1280 × 720 pixels). Using HM16.3 as the anchor, the performance of our algorithm in Random Access (RA) and Low Delay (LD) settings, for both Main and Main10 configurations, is assessed. The RA setting corresponds to storage or entertainment applications, while the LD setting corresponds to video conferencing applications. The Main and Main10 configurations correspond to 8 and 10 bit-per-sample storage and output, respectively. For each method and configuration, a video sequence is coded at four quantization parameters, QP = {22, 27, 32, 37}. The coding performances of the proposed method are measured by the Bjontegaard Delta PSNR (BD-PSNR) [26], and the encoding times are presented as a percentage of the anchor encoding run-times. We experimented with c = 10^i, where i = {−5, ..., 0}, and found 0.001 to give the best trade-off between complexity and coding performance. Therefore, c is set as 0.001 for all combinations of RA and LD settings with Main and Main10 configurations.

TABLE I. Coding losses (BD-Rate) and encoding times (Enc Time) for Random Access (RA) setting with Main and Main10 configurations.

TABLE II. Coding losses (BD-Rate) and encoding times (Enc Time) for Low Delay (LD) setting with Main and Main10 configurations.

The coding losses and encoding times for the RA setting with Main and Main10 configurations are shown in Table I, while the coding losses and encoding times for the LD setting with Main and Main10 configurations are shown in Table II. The proposed residual-based pruning method reduces encoding time by an average of 44% with an average of about 1.0% coding loss.

The simulation results of the "Early SKIP Decision (ESD)" method [13], [14], the "Coding-Block-Flag Fast Condition (CFM)" method [15], the "Early CU Condition (ECU)" method [16], [17], and the proposed residual-based pruning method are summarized in Table III. For a clearer illustration of the performance-complexity trade-offs, a plot of coding losses against encoding times for the RA and LD settings with Main configuration is also shown in Fig. 5. For both RA and LD settings, it can be seen that the operating points of the proposed and ECU methods lie on the convex hull of the operating points. This indicates that, as compared to the CFM and ESD methods, the proposed and ECU methods are able to achieve more competitive complexity-performance trade-offs. This could be attributed to the multi-depth pruning approach of the proposed and ECU methods, as compared to the single-depth pruning approach of the CFM and ESD methods. As compared to ECU, while the proposed method suffered additional coding losses, it further reduced encoding time by almost 10%. The proposed method is useful when encoding time reduction is more critical than bandwidth reduction, such as in low-delay video conferencing applications.

It should be noted that the encoding time reductions shown in Fig. 5 result from both the encoding time overheads to compute the analysis algorithms and the encoding time savings due to the pruning algorithms. The complexity of the analysis algorithms of ESD, ECU, and CFM is low, since these analysis algorithms only involve checking of coding-block-flags, SKIP mode, and motion vector differences. Similarly, the complexity of our analysis algorithm is low, as only simple second order prediction residuals statistics are used. Therefore, the encoding time reductions of all these methods largely indicate the efficiencies of the pruning algorithms.
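For reference, the Bjontegaard delta [26] behind the coding-loss columns of Tables I–III can be computed as follows. This is a generic sketch of the BD-rate calculation, not code from the paper: third-order polynomials are fitted to the (PSNR, log-rate) curves of the anchor and test encoders, and the rate gap is averaged over the common PSNR range.

```python
import numpy as np

# Generic sketch of the Bjontegaard delta-rate (BD-rate) calculation [26].
# Inputs are the four (rate, PSNR) operating points per encoder, e.g., one
# per QP in {22, 27, 32, 37}.

def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    # fit third-order polynomials to log-rate as a function of PSNR
    p_anchor = np.polyfit(anchor_psnr, np.log(anchor_rates), 3)
    p_test = np.polyfit(test_psnr, np.log(test_rates), 3)
    # integrate both fits over the overlapping PSNR interval
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_anchor) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # average rate difference in %
```

A positive BD-rate is a coding loss: the test encoder needs that much more rate, on average, for the same PSNR as the anchor.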
TABLE III. Coding losses (luma BD-Rate) and encoding times (Enc Time) for all combinations of Random Access (RA) and Low Delay (LD) settings with Main and Main10 configurations.

Fig. 5. Plot of coding loss vs encoding time for various fast encoding methods under Random Access (RA) Main and Low Delay (LD) Main configurations.

To better understand the encoding time overheads of our method, we turn on the prediction residuals statistics analysis algorithm but turn off the pruning algorithm at every CU. The encoding time increase is close to 0%, suggesting that the prediction residuals statistics analysis algorithm incurs insignificant encoding time overhead.

We now discuss other prior fast encoding methods which are not present in the reference software. As these prior methods are based on earlier HM versions and are evaluated using different test conditions, their performance-complexity operating points may not be directly comparable to those of our proposed method. Nonetheless, as the source codes of these prior methods are not publicly available, evaluation of these prior methods on HM16.3 is not viable. In general, these prior methods reported about 40% encoding time reduction, with the Bayesian method [23] and the ADRD method [22] reporting encoding time reductions of over 40%. The reported encoding time reductions of the Bayesian method and the ADRD method are 41% and 42%, respectively. Nonetheless, the reported coding loss of the Bayesian method is close to a high 2%. As for the ADRD method, the reported coding loss is 1.3%, which is similar to or slightly higher than ours. When making the pruning decision for a CU, different from the ADRD method, which involves statistics from the current CU and its spatially and temporally neighbouring CUs, our proposed method only involves prediction residuals statistics from the PUs directly within the current CU. Therefore, our proposed method could be more parallel-processing friendly due to its independence of neighbouring CUs.

VI. CONCLUSION

HEVC is important to the economical adoption of the emerging high quality digital visual content formats. Nonetheless, further encoding time reduction is necessary for its use in real-time applications. In this paper, we proposed a method for reducing the high encoding time of HEVC by pruning its coding quad-trees using prediction residuals statistics. Our experimental results show that the proposed residual-based pruning method effectively reduces redundant mode and partition checks; encoding time is reduced by an average of about 44% with an average of about 1.0% coding loss. By using only simple second order prediction residuals statistics from the current CU, independently of its spatially and temporally neighbouring CUs, the proposed method introduces few modifications and insignificant computational complexity overheads to the existing encoder.

In the future, we will investigate ways of reducing unnecessary symmetric motion partition (SMP) and asymmetric motion partition (AMP) checks.

ACKNOWLEDGMENT

The main author would like to thank Dr. Chuohao Yeo and Dr. Yih Han Tan for their suggestions.

REFERENCES

[1] High Efficiency Video Coding, document ITU-T Rec. H.265, Oct. 2014.
[2] F. Bossen, B. Bross, K. Suhring, and D. Flynn, "HEVC complexity and implementation analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1685–1696, Dec. 2012.
[3] Advanced Video Coding for Generic Audio-Visual Services, document ITU-T Rec. H.264 and ISO/IEC 14496-10 (AVC), ITU-T and ISO/IEC JTC 1, May 2003.
[4] I. E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Chichester, U.K.: Wiley, 2004.
[5] J. Vanne, M. Viitanen, T. D. Hamalainen, and A. Hallapuro, "Comparative rate-distortion-complexity analysis of HEVC and AVC video codecs," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1885–1898, Dec. 2012.
[6] F. Bossen, Common HM Test Conditions and Software Reference Configurations, document JCTVC-L1100, JCT-VC, Geneva, Switzerland, Jan. 2013.
[7] A. Ortega and K. Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 23–50, Nov. 1998.
[8] G. J. Sullivan and T. Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Process. Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
[9] C. Grecos and M. Y. Yang, "Fast inter mode prediction for P slices in the H264 video coding standard," IEEE Trans. Broadcast., vol. 51, no. 2, pp. 256–263, Jun. 2005.
[10] J. Hou, S. Wan, Z. Ma, and L.-P. Chau, "Consistent video quality control in scalable video coding using dependent distortion quantization model," IEEE Trans. Broadcast., vol. 59, no. 4, pp. 717–724, Dec. 2013.
[11] Y. Wang, L.-P. Chau, and K.-H. Yap, "Bit-rate allocation for broadcasting of scalable video over wireless networks," IEEE Trans. Broadcast., vol. 56, no. 3, pp. 288–295, Sep. 2010.
[12] K. McCann et al., High Efficiency Video Coding (HEVC) Test Model 16 (HM16) Encoder Description, document JCTVC-R1002, JCT-VC, Sapporo, Japan, Jul. 2014.
[13] J. Kim, J. Yang, K. Won, and B. Jeon, "Early determination of mode decision for HEVC," in Proc. Pict. Coding Symp. (PCS), Kraków, Poland, May 2012, pp. 449–452.
[14] J. Yang, J. Kim, K. Won, H. Lee, and B. Jeon, Early SKIP Detection for HEVC, document JCTVC-G543, JCT-VC, Geneva, Switzerland, Nov. 2011.
[15] R. H. Gweon, Y.-L. Lee, and J. Lim, Early Termination of CU Encoding to Reduce HEVC Complexity, document JCTVC-F045, JCT-VC, Turin, Italy, Jul. 2011.
[16] K. Choi, S. H. Park, and E. S. Jang, Coding Tree Pruning Based CU Early Termination, document JCTVC-F092, JCT-VC, Turin, Italy, Jul. 2011.
[17] K. Choi and E. S. Jang, "Fast coding unit decision method based on coding tree pruning for high efficiency video coding," Opt. Eng., vol. 51, no. 3, 2012, Art. ID 030502. [Online]. Available: http://dx.doi.org/10.1117/1.OE.51.3.030502
[18] K. McCann, B. Bross, S.-I. Sekiguchi, and W.-J. Han, High Efficiency Video Coding (HEVC) Test Model 2 (HM 2) Encoder Description, document JCTVC-D502, JCT-VC, Daegu, Korea, Oct. 2010.
[19] J. Kim, S. Jeong, S. Cho, and J. S. Choi, "Adaptive coding unit early termination algorithm for HEVC," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Las Vegas, NV, USA, Jan. 2012, pp. 261–262.
[20] H. L. Tan, F. Liu, Y. H. Tan, and C. Yeo, "On fast coding tree block and mode decision for high-efficiency video coding (HEVC)," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), Kyoto, Japan, Mar. 2012, pp. 825–828.
[21] J. Leng, L. Sun, T. Ikenaga, and S. Sakaida, "Content based hierarchical fast coding unit decision algorithm for HEVC," in Proc. Int. Conf. Multimedia Signal Process. (CMSP), vol. 1. Guilin, China, May 2011, pp. 56–59.
[22] L. Shen, Z. Liu, X. Zhang, W. Zhao, and Z. Zhang, "An effective CU size decision method for HEVC encoders," IEEE Trans. Multimedia, vol. 15, no. 2, pp. 465–470, Feb. 2013.
[23] X. Shen, L. Yu, and J. Chen, "Fast coding unit size selection for HEVC based on Bayesian decision rule," in Proc. Pict. Coding Symp. (PCS), Kraków, Poland, May 2012, pp. 453–456.
[24] Y. H. Tan, C. Yeo, H. L. Tan, and Z. Li, "On residual quad-tree coding in HEVC," in Proc. IEEE 13th Int. Workshop Multimedia Signal Process. (MMSP), Hangzhou, China, Oct. 2011, pp. 1–4.
[25] H. R. Wu and K. R. Rao, Digital Video Image Quality and Perceptual Coding (Signal Processing and Communications). Boca Raton, FL, USA: CRC Press, 2005.
[26] G. Bjontegaard, Calculation of Average PSNR Differences Between RD Curves, document VCEG-M33, ITU-T Q6/SG16, Austin, TX, USA, Apr. 2001.
