Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 8

`

ASIC Implementation of an Optimized Integer DCT for


HEVC
Guddety Nageshwara Reddy1, P Reena Monica2
1School of Electronics Engineering , Vellore Institute of Technology University, Chennai,600127 India
2School of Electronics Engineering , Vellore Institute of Technology University, Chennai,600127 India
1
nageshwara.reddy2013@vit.ac.in

Abstract

Area and power efficient architectures for the implementation of integer discrete cosine transform (DCT) of different
lengths to be utilized in High Efficiency Video Coding (HEVC) are presented. The efficient constant matrix multiplication
(CMM) schemes can be used to derive parallel architectures for one dimensional integer DCT of different lengths.
Reusable architectures which could be used for implementing the discrete cosine transform of any of the prescribed lengths
of 4, 8, 16, and 32 each of them having particular advantage in terms of power and area is proposed. Also, the power
efficient architectures for full parallel and folded two dimensional discrete cosine transform is proposed. Simulation results
using CADENCE illustrate substantial improvement in power and area.

Keywords: DCT, High Efficiency Video Coding, integer discrete cosine transform.

1. Introduction However, with the recent advances in display technologies


Digital video has a customary presence in our day and video capturing the presence of High Definition (HD)
today lives for many years. It has been used for digital and Ultra High Definition (UHD) video contents in
television, hand held devices, in personal computers and numerous multimedia applications is quickly increasing
other multimedia applications. So it has grown [2]. This type of video resolutions needed a higher
tremendously in the last years and it seems that this information measure for its transmission and bigger
magnification is not decelerating [1]. storage capacities. During this approach the compression
ratios achieved by the present progressive video coding
The High Efficiency Video Coding (HEVC) standard standard for HD and UHD content do not seem enough
is the most imperative critical feature task of the ITU-T taking in account the available transmission and storage
Video Coding Experts Group (VCEG) furthermore the supports. With this in mind, the ITU-T VCEG and
ISO/IEC Moving Picture Experts Group (MPEG) ISO/IEC MPEG bodies created the Joint Collaborative
institutionalization associations, working along in an Team on Video Coding (JCT-VC) which is currently
exceedingly association known as the Joint developing a new video coding standard, the High
Collaborative Team on Video Coding (JCT-VC) [2]. Efficiency Video Coding (HEVC) standard, with the
The essential version of the HEVC standard has been objective of increasing the highest available compression
settled in January 2013, prompting partner adjusted ratios, particularly for very high resolution video
content that has been uncovered by every ITU-T and contents[3]. To do this, new coding techniques have to be
ISO/IEC. Further work is wanted to expand the standard developed that can guarantee better compression over the
to bolster numerous further application situations, and in current ones even if it is at the price of some additional
addition expanded reach utilizes with improved complexity. High end video applications have become
accuracy and shading arrangement support, versatile important in our day today life activity such as video
feature coding, and 3-D/stereo/multi view feature conference, watching movies and saving the videos using
coding. In ISO/IEC, the HEVC standard has ended up highly standard video camera etc.
MPEG-H Part 2 (ISO/IEC 23008-2) and in ITU-T its
known as ITU-T Recommendation H.265.
`
The joint collaborative team on video coding of moving Similarly inverse 2D DCT transformation is defined as
expert group and video coding experts group is in the [11]
processing of new video coding standard called High
Efficiency Video Coding (HEVC). (, ) =
HEVC is capable to supply double higher compression (2+1) (2+1)
1 1
=0 =0 ()()(, )cos( )cos( )
than H.264/AVC and it gives five hundredth of 2 2
reduction in bit rate than H.264/AVC.
2. Review of Integer DCT for High
The discrete cosine change (DCT) assumes an Efficiency Video Coding
imperative part in feature pressure because of its close
ideal decorrelation proficiency [4]. Numerous varieties
of number discrete cosine change have been proposed in Let U=[ u(0),u(1),,u(N-1) ] be the N-point input
the most recent a quarter century request to lessen the vector and V=[ v(0),v(1),.v(N-1) ] be N-point output
computational unpredictability [6].The Discrete Cosine vector of integer DCT, given by following equation
Transform (DCT) has been widely applied within the V=CNu, where CN denotes the N-point integer DCT
space of video and compression such as MPEG-2/4, kernel. CN for N=4, 8, 16 and 32 are defined by Joint
JPEG, H.263 and H.264/AVC [8]. One of the imperative Collaborative Team-Video Coding. The integer four-point
alternatives of HEVC is that it supports DCT of DCT, U=[ u(0), u(1), u(2), u(3) ] be the four-point input
different changes length, for example, 4, 8, 16, and 32. vector and V=[ v(0, v(1), v(2), v(3) ] be the four-point
Thusly the equipment outline is sufficiently adaptable output vector is given by
for processing of DCT of any of these change length.
The conventional DCT bolstered CMM and MCM can (0) 16 16 16 16 (0)
give answer for reckoning of any these lengths, (1) 8 24 24 8 (1)
[ ]= [ ][ ]
nonetheless they are not reused to any of these change (2) 16 16 16 16 (2)
length, for example, 16 and 32 [9]. In order to facilitate (3) 24 8 8 24 (3)
reusability, according to resource requirement for the
implementation of integer DCT for HEVC, an algorithm The 4-Point DCT of (1) can be separated into even and
is proposed. The 1-D and 2-D number DCT for HEVC index components and simplified as [12]
can be reused for any of the recommended change
length, for example, 4, 8, 16 and 32. Even part:

The Discrete Cosine Transform of a one dimensional (0) 16 16 (0)


sequence of length N is defined as in [10] [ ]=[ ][ ]
(2) 16 16 (1)
(2+1)
C(v) = (v) 1
=0 ()cos( ) Odd part:
2
For v = 0, 1, , N-1.
(1) 24 8 (0)
[ ]=[ ][ ]
Similarly, the inverse DCT transformation is defined as (3) 8 24 (1)
in [10]
Where a(0) =U(0)+U(3), a(1)=U(1)+U(2), b(0)=U(0)-
(2+1) U(3) and b(1)=U(1)-U(2).
() = 1
=0 ()()cos( 2
)
For k= 0, 1, .....,N-1 The N-point integer DCT for HEVC is [14]-[15]
computed by using a partial butterfly approach.
Considering above equation (u) is defined as in [10]
sqrt
1
for v = 0 (0) (0)
N
(v) = { 2
(2) (1)
sqrt N
for v 0 . .
= n/2
. .
If the input sequence has more than N sample points ( 4) (/2 2)
then it will be divided into sub-sequences of length N. [( 2)] [(/2 1)]
and
The Discrete Cosine Transform of a two dimensional (1) (0)
sequence of length N is defined as in [11] (3) (1)
. .
= n/2
(, ) = . .
(2+1) (2+1) ( 3) (/2 2)
()() 1 1
=0 =0 (, )cos( 2
)cos( 2 .
[( 1)] [(/2 1)]
`

where A(i)=U(i)+U(n-i-1)
B(i)= U(i)-U(n-i-1)

for i=0,1, , n/2-1.


Where Cn/2 is a( n/2)-point integer DCT kernel matrix
and Mn/2 is also a matrix size (n/2) x (n/2).

3. Proposed Architecture for


Integer DCT for HEVC
Fig. 2. 8-, 16- and 32-point integer DCT for HEVC.
In this section, the proposed architecture for integer
DCT of higher length for HEVC is discussed. The For example for 8-point, the input adder unit uses a
proposed Integer DCT consists of input adder unit butterfly approach and output of first four (a(0), a(1), a(2)
(IAU), output adder unit (OAU) and shift add unit and a(3)) input adder unit is added to (n/2)-point integer
(SAU). DCT unit and it gives the even part output(v(0), v(2), v(4)
and v(6)) and each of the remaining four (b(0), b(1), b(2)
3.1 Proposed architecture for 4-point and b(3)) input adder unit output is added to shift-add-unit
Integer DCT for HEVC for 8-point which has one input b(i) where i=0,1, ,n/2-1
and four output(Ti89, Ti75, Ti50, Ti18) and each of SAU
The proposed 4-pont integer DCT is shown in Fig. 1. output is added to the OAU which generates the output of
This architecture consists of three stages viz., the first the DCT from a binary adder tree . Structure for 16 and
stage is an input adder unit (IAU), the second stage is a 32-point integer DCT for HEVC can also be obtained
shift-add-unit (SAU) and the final stage is an output similarly.
adder unit (OAU).
3.3 Proposed Reusable Architecture for
In this architecture for input adder unit, the butterfly
approach is used which computes A(0), A(1), B(0) and
Integer DCT for HEVC
B(1). The shift add unit computes Ti36 and Ti83 in
second stage. Output of SAU is added to the OAU The proposed reusable integer DCT for 16 and 32-
which is the final stage. point is shown in Fig. 3. It consists of control unit, two
(n/2)-point integer DCT unit, input adder unit, shift add
unit and output adder unit.

Fig. 1. Four point integer DCT for HEVC.

3.2 Proposed architecture for 8, 16, and 32-


Point integer DCT for HEVC
The proposed 8-, 16- and 32-pont integer DCT for HEVC
is shown in Fig. 2. This architecture consists of four
stages. The stages are input adder unit, shift add unit,
(n/2)-point integer DCT unit and output adder unit. The
input adder unit computes A(i) and B(i) for i=0,1, n/2-1
in the first stage.
Fig. 16 and 32-point Reusable integer DCT architecture
for HEVC.
`

The input to one (n/2)-point integer DCT unit is fed with


2:1 multiplexers. The number of multiplexers are (n/2)
which select either [V(0), ., V(n/2-1)] or [A(0), ..,
A(n/2-1) and it is used for computation of the other n/2-
point integer DCT unit. This n/2- point DCT takes input
[V(n/2), , V(n-1)] which is used for the computation
of n-point DCT or for the DCT of a lower size. To reset
the input array of n/2, AND gates are used to disable
(n/2)-point integer DCT unit. The output of both (n/2)-
point integer DCT unit is multiplexed with the output
adder unit (OAU),which is the precedence of SAU and
IAU structure. The N AND gates are used before IAU to
disable IAU, SAU and OAU when architecture is used
for computation of (n/2)-point integer DCT or lower
size. The input of control unit is used to decide the size
of DCT.
Fig. 4. (8x8)-point 2D integer DCT for folded structure.

4. Proposed architecture of 2D-


Integer DCT

Using the separable property, (nxn)-point 2D-integer


DCT is computed by the row and column decomposition
matrix techniques in two stages.

1. STAGE-1: First n-point 1D integer DCT is


computed for each of the column matrix and in first
Fig. 5. (NxM) Transposition Buffer.
stage it gives a output matrix of size (nxn).
2. STAGE-2: In second stage n-point integer DCT is
computed for each of the row matrix and gives 4.2 Proposed structure for full parallel (8x8)
desired 2D-DCT of size (nxn). 2D- Integer DCT
In this section, the proposed architecture for folded and The computation of (8x8)-point 2D integer DCT for full-
full-parallel structure for 2D-DCT is presented. In this parallel structure is shown in Fig. 6. This structure consists
section text compression is done for the full parallel 2D- of two 8-point integer DCT unit and one transposition
DCT (8 x 8). buffer.

4.1 Proposed structure for folded (8x8) 2D-


Integer DCT
The computation of (8x8)-point 2D integer DCT for
folded structure is shown in Fig. 4. The proposed folded
structure consists of 8-point integer DCT unit and (N x M)
transposition buffer. The structure of proposed N x M Fig. 6. (8x8)-point 2D integer DCT for full-parallel
transposition buffer is shown in Fig. 5. It consists of n structure.
registers which is arranged in n columns and n rows. The During the first N cycles the intermediate result is stored
(NxM) transposition buffer can store n values in any one column-wise and next N cycles are stored in successive
column of the registers and one can select one of the rows of transposition buffer selected by MUXes and
column of registers through the MUXes. During the first n added to input of 1D-DCT module.
cycles, the DCT receives the successive columns from During this stage the output of 1D-DCT of first stage is
block of input and stores results in the registers of stored row-wise. In the next cycles, the results are read
successive columns in transposition buffer and during next and written column-wise. Alternatively row-wise and
n cycles, successive rows of transposition buffers are column-wise read and write operation in the transposition
selected by muxes and fed as the input to 8-point integer buffer continues. After synthesize in Xilinx tool, the text
DCT unit. compression for the full parallel DCT is performed and the
result is shown in Fig. 15.
`
5. Simulation results of proposed
Integer DCT for HEVC and Proposed
structure for folded 2D-DCT (8 x 8)

In this section, performance analysis and comparisons


of proposed integer DCT for HEVC is provided. All
proposed architectures is simulated in Xilinx ISE
Design Suite 14.3 and synthesized using 180 nm
technology of the cadence EDA tool.

Fig. 8. 32-point Integer DCT for HEVC.

Fig. 7. 16-point integer DCT for HEVC.

Fig. 9. Layout view of 32-point Integer DCT for HEVC.

Fig. 8. Layout view of 16-point integer DCT for HEVC.

Fig. 10. Reusable 16-point Integer DCT for HEVC.


`

Fig. 14. Folded structure for 2D-DCT of N x N.

Fig. 11. Layout view of Reusable 16-point Integer DCT


for HEVC

Fig. 15. Full-parallel structure for 2D-DCT of N x N.

Table. 1. Area, Gates and Power of Proposed Integer


DCT for HEVC Architectures
Fig. 12. Reusable 32-point integer DCT for HEVC.
ARCHITECTURE GATE AREA POWER
(m2) (nW)
4-point Integer DCT 234 6081 1121181.048

8-point Integer DCT 1119 29518 6829949.270

16-point Integer 4048 107659 30146315.53


DCT
32-point Integer 13496 376372 119492190.5
DCT
Reusable 16-point 5491 148255 25968048.36
Integer DCT
Reusable 32-point 17931 502676 102492511.6
Integer DCT
Fig. 13. Layout view of Reusable 32-point integer DCT
for HEVC
`
Table. 1 shows the comparison of integer DCT with Table. 5. Addition, Subtraction, Register, Maximum
reusable integer DCT. There is 13.8% reduction of Frequency for Proposed Integer DCT
power and 27% increase in area for reusable 16-point
integer DCT and 14.2% reduction of power and ARCHITECTURE ADD SUB REG FREQUENCY
25.12% increase in area for reusable 32-point integer
4-point Integer DCT 12 2 8 336.700
DCT for HEVC.
(MHz)
8-point Integer DCT 43 7 36 295.421
(MHz)
Table. 2. Area, Gates, Power of Proposed 2D Integer 16-point Integer 155 37 124 199.421
DCT Architectures DCT (MHz)
32-point Integer 494 194 428 63.698
ARCHITECTURE GATES AREA POWER DCT (MHz)
(m2) (nW) Reusable 16-point 198 44 176 199.421
Folded 2D-DCT 10953 296145 65543589.65 Integer DCT (MHz)
(8 x 8) 5
Reusable 32-point 649 231 584 63.698
Full-parallel 2D- 8952 236148 55325643.95 Integer DCT (MHz)
DCT (8 x 8) 3

Table. 2 shows the comparison of 2D integer DCT Table. 6. Addition, Subtraction, Registers and Maximum
architectures. There is 15.5% reduction of power and frequency of proposed 2D Integer DCT Architectures
20.2% reduction of area for proposed full-parallel 2D
integer DCT. ARCHITECTURE ADD SUB REG FREQUENCY
Folded 2D-DCT (8 x 220 110 4194 10.197(MHz)
Table. 3. Power optimization report on proposed Integer 8)
DCT for HEVC architectures using clock gating Full-parallel 2D-DCT 360 90 3180 11.680(MHz)
technique. (8 x 8)

ARCHITECTURE FLIPFLOPS POWER(mW) Table.5 and 6 show the proposed integer DCT and
4-point Integer DCT 44 0.201 proposed 2D integer DCTs frequency of operation . From
for HEVC these tables it is inferred that Integer DCT has the same
8-point Integer DCT 212 0.784 Maximum frequency for both n- point integer DCT and
for HEVC reusable integer DCT. It is also inferred that Full-parallel
16-point Integer 697 2.628 2D-DCT (8 x 8) has maximum frequency compared to
DCT for HEVC folded 2D-DCT (8 x 8).
32-point Integer 1089 8.671
DCT for HEVC
Reusable 16-point 672 1.802
Integer DCT for 5. Conclusion
HEVC
Reusable 32-point 1089 4.523
Integer DCT for
Area and power efficient architectures for integer DCT for
HEVC HEVC and reusable integer DCT for HEVC are proposed.
From the simulation results it is evident that the 16, 32-
Table. 3 shows the power optimization report for Integer point reusable integer DCT architecture for HEVC gives
DCT. There is 30.74% reduction of power for reusable less power than integer DCT for HEVC. Also, the
16-point integer DCT and 47.8% reduction of power for proposed architectures for full-parallel and folded 2D
reusable 32-point integer DCT. integer DCT for 8 x 8 is proven to be power efficient.

Table. 4. Power optimization report on proposed 2D References


Integer DCT architectures using clock gating
technique
1. N. Ahmed, T. Natarajan, and K. Rao, Discrete cosine transform,
IEEE Trans. Comput., vol. 100, no. 1, pp. 9093, Jan. 1974.
ARCHITECTURE FLIPFLOPS POWER(mW) 2. W. Cham and Y. Chan, An order-16 integer cosine transform,
Folded 2D-DCT (8 x 8) 1789 7.893 IEEE Trans. Signal Process., vol. 39, no. 5, pp. 12051208, May
1991.
Full-parallel 2D-DCT 1456 6.233 3. Y. Chen, S. Oraintara, and T. Nguyen, Video compression using
(8 x 8) Integer DCT, in Proc. IEEE Int. Conf. Image Process., Sep. 2000,
pp.844845.
Table.4 shows the power optimization report for 2D 4. J. Wu and Y. Li, A new type of integer DCT transform radix and it
rapid algorithm, in Proc. Int. Conf. Electric Inform. Control Eng.,
integer DCT architectures using clock gating technique. Apr.2011, pp. 10631066.
There is 21% reduction of power for Full-parallel 2D
integer DCT (8 x 8).
`
5. A. M. Ahmed and D. D. Day, Comparison between the cosine
And Hartley based naturalness preserving transforms for image
Watermarking and data hiding, in Proc. First Canad. Conf.
Comput. Robot Vision, May 2004, pp. 247251.
6. M. N. Haggag, M. El-Sharkawy, and G. Fahmy, Efficient fast
multiplication-free integer transformation for the 2-D DCT H.265
standard, in Proc. Int. Conf. Image Process., Sep. 2010, pp. 3769
3772.
7. B. Bross, W.-J. Han, J.-R. Ohm, G. J. Sullivan, Y.-K. Wang,
and T. Wiegand, High Efficiency Video Coding (HEVC) Text
Specification Draft 10 (for FDIS and Consent), JCT-VC L1003,
Geneva, Switzerland, Jan. 2013.
8. T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra,
Overview of the H.264/AVC video coding standard, IEEE
Trans . Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560576,
Jul. 2003.
9. A. Ahmed, M. U. Shahid, and A. Rehman, N Point DCT VLSI
Architecture for Emerging HEVC Standard, in Proc. VLSI
Design, vol.2012, Article 752024, pp. 113, 2012.
10. S. Shen, W. Shen, Y. Fan, and X. Zeng, A unified 4/8/16/32-
Point integer IDCT architecture for multiple video coding
standards, in Proc.IEEE Int. Conf. Multimedia Expo, Jul. 2012,
pp. 788793.
11. J.-S. Park, W.-J. Nam, S.-M. Han, and S. Lee, 2-D large
inverse transform (16x16, 32x32) for HEVC (high efficiency
video coding),J. Semicond. Technol. Sci., vol. 12, no. 2, pp.
203211,Jun. 2012.
12. M. Budagavi and V. Sze, Unified forward+inverse transform
Architecture for HEVC, in Proc. Int. Conf. Image Process., Sep.
2012, pp. 209212.
14. A. Fuldseth, G. Bjntegaard, M. Budagavi, and V. Sze. (2011
, Nov.). JCTVC-G495, CE10: Core Transform Design for HEVC:
Proposal for Current HEVC Transform [Online]. Available:
http://phenix.int-evry.fr/jct/doc end user/documents/7
Geneva/wg11/JCTV%C-G495- v2.zip.
15. M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan,
Multiple constant multiplications: Efficient and versatile
framework and algorithms for exploring common
subexpression elimination, IEEE Trans. Comput.-Aided Des .
Integr. Circuits Syst., vol. 15, no. 2, pp . 151165, Feb. 1996

You might also like