

DESIGN AND HARDWARE IMPLEMENTATION OF A MEMORY EFFICIENT HUFFMAN DECODING


Reza Hashemian, Department of Electrical Engineering, Northern Illinois University

Abstract- Hardware design of a high speed and memory efficient Huffman decoder, introduced in [13], is presented. The algorithm developed is based on a specific Huffman tree structure using a code-bit clustering scheme. The method is shown to be extremely efficient in its memory requirement, and fast in searching for the desired symbols. For an experimental video data set with code-words extending up to 13 bits, the entire memory space needed is shown to be 122 words in size, compared with the normally required 2^13 = 8192 words of memory. The design of the decoder is carried out using a silicon-gate CMOS process.

I. INTRODUCTION
The combination of Huffman coding and run-length coding has been shown to perform efficiently in high speed data compression [1]-[12]. In fact, with some variations, this combined technique has been widely used as a near optimal entropy coding technique. For maximum compression, the coded data is normally sent through a continuous stream of bits with no specific guard-bit(s) assigned to separate two consecutive symbols. As a result, the decoding procedure in this case must recognize the code length as well as the symbol itself. In its simplest form Huffman coding may structurally be represented by a binary tree. Due to variable-length coding, however, the Huffman tree gets progressively sparse as it grows from the root. This sparsity in the Huffman tree may cause a tremendous waste of memory space, unless a properly structured technique is adopted to allocate the symbols in the memory. In addition, this sparsity may also result in a lengthy search procedure for locating a symbol. More specifically, if k bits is the longest Huffman code assigned to a set of symbols, the memory size for the symbols may easily reach 2^k words in an unstructured memory environment. This evidently becomes prohibitively large for typical video data, where k is 13 or higher. Ideally, it is desirable to reduce the memory size from the typical value of 2^k to a size proportional to the number of actual symbols. This is not only a substantial reduction in memory requirement, but it may even allow us to replace a sizable external RAM by a much smaller memory, possibly inside the processing chip, for quicker access.

A new technique has recently been developed [13] which deals with efficient Huffman coding. It clearly addresses both issues of memory management and quicker access to the source code (symbol). It is shown that by clustering the bits in a codeword one is able to speed up the search by taking steps over groups of code-bits rather than doing one bit at a time. In addition, it is shown that implementing this technique drastically reduces the memory (LUT) size, and limits the number of LUTs to only one. The proposed technique is based on an ordering and clustering scheme that groups the codewords within the specified codeword lengths, and such ordering and clustering serves the following purposes: i) the searching time for more frequent symbols (shorter codes) is substantially reduced compared to less frequent symbols, resulting in an overall faster response, and ii) for long code-words the search for a symbol is also sped up. This is achieved through a specific partitioning technique that groups the code bits in a codeword, and the search for a symbol is conducted by jumping over the groups of bits rather than going through the bits individually. In a typical example of digital video data with codeword lengths extending up to 13 bits, it is shown in [13] that the entire memory space needed for the Huffman look-up table is reduced to about 122 words, compared to a normal size of 8192 words. In Section II we overview the techniques developed in Reference [13]. The Huffman tree partitioning and node clustering procedures are discussed, and structuring the Huffman LUT and the search procedure to obtain a source code (symbol) is explained. In Section III we discuss the details of the design and hardware structure of a Huffman decoder. The design is partitioned into three major operational parts, and each part is fully developed for hardware construction using CMOS technology. In this section some new design techniques are employed to produce variable bit shifting from fixed bit shifting. The two barrel shifters, K and L, along with the three registers B, C, and D, are the essential components used to transfer the data. Section IV is the conclusion section.

Manuscript received June 27, 1994





II. MEMORY STRUCTURE FOR A HUFFMAN LOOK-UP TABLE
Table 1 is a Huffman table constructed for a typical digital video signal with codeword lengths extending up to 13 bits, and Fig. 1 illustrates the corresponding Huffman tree. As shown, the Huffman tree is partitioned by the cut lines x-x, y-y, and z-z, each separated by a distance Lm, where Lm is the path length between any two consecutive cut lines. Such tree partitioning produces clusters, where each cluster consists of a connected subtree with a single root node. A cluster length L, on the other hand, is defined as the length of the longest path in the cluster, starting from the root. With the separation distance Lm between the cut lines, it can easily be proven that L can never exceed Lm. The clustering operation may also be performed on the Huffman table. This is done by simply partitioning the code-bits in each row into groups of four (for Lm = 4) or fewer bits, as shown in Table 1.

Figure 2 shows the Huffman tree after being cut by the cut lines and the clusters being separated. As illustrated, to each cluster a table is assigned pointing at the terminal nodes in that cluster. Each entry in the table provides two pieces of information: i) a single (sign) bit specifies whether a terminal node in the cluster represents an actual tree terminal node, or the node is just being cut by a cut line; in the latter case the node is a connecting node between two clusters. For an actual terminal node this bit (the MSB) is 0, and for a connecting node it is assigned 1. ii) The second part of the data, following the first (sign) bit, provides the information about the location of the node in the memory assigned to the Huffman look-up table. We shall discuss this shortly.

Figure 3 is the block representation of the combined Huffman look-up table (LUT), and Fig. 4 represents the memory space being allocated to the Huffman LUT. As illustrated, the LUT is a collection of all the Huffman tables (memories) associated with the clusters, combined into one piece. Each entry in the table consists of three pieces of information: a single (sign) bit designated as the symbol/no-symbol code; a two-bit code specifying the length of the pertinent codeword or the length of the next associated cluster; and, as the last piece, either the source code (symbol) or the address of the next cluster. It is interesting to note that the LUT is the only reference table (memory) needed to allocate a symbol from its codeword, embedded in a stream of code-bits. The LUT is also arranged in such a way that it identifies the corresponding codeword within a stream of code-bits immediately after the codeword is received. As discussed earlier, the memory blocks -a-, -b-, ..., -g- in Fig. 3 are associated with different clusters in the tree (or subtrees). It is this clustering scheme that helps to speed up the search process and to jump over the blocks in order to allocate a symbol. Note also that the size of the memory is substantially reduced due to this clustering effect.
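To make the entry layout concrete, here is a minimal C sketch of one LUT entry, assuming the 11-bit word described later in Section III (a 1-bit symbol/no-symbol flag, a 2-bit length code, and an 8-bit symbol or offset field). The struct, the helper names, and the exact bit positions are ours, not taken from the paper.

#include <stdint.h>

/* One 11-bit Huffman LUT entry, as described in the text:
 *   s   - 1 bit : 0 = entry holds a symbol, 1 = entry points to another cluster
 *   len - 2 bits: (len + 1) = codeword tail length, or length of the next cluster
 *   u   - 8 bits: the symbol itself, or the offset (start) address of the next cluster
 */
typedef struct {
    uint8_t s;    /* symbol / no-symbol flag        */
    uint8_t len;  /* 2-bit length code (value 0..3) */
    uint8_t u;    /* symbol or next-cluster offset  */
} lut_entry;

/* Pack/unpack helpers for the 11-bit word stored in the LUT RAM (bit layout assumed). */
static inline uint16_t lut_pack(lut_entry e)   { return (uint16_t)((e.s << 10) | (e.len << 8) | e.u); }
static inline lut_entry lut_unpack(uint16_t w) { return (lut_entry){ (w >> 10) & 1, (w >> 8) & 3, w & 0xff }; }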

Table 1 A Huffman table partitioned into clusters of four bits or less (32 symbols, 00H through 1fH, with codeword lengths from 2 to 13 bits).

Fig.1 A Huffman tree partitioned by the cut lines x-x, y-y, and z-z.

Fig.2 Clusters being produced from the Huffman tree after being cut by the cut lines x-x, y-y, and z-z.

Fig.3 Block-wise Huffman Look-Up Table.

Fig.4 Huffman Look-Up Table.

To better understand the searching process for a symbol we proceed through some examples.

1) We first take a sample from the input code-bit stream, 110110011...; the higher value bits in the stream are received first. We select the first Lm bits (Lm = 4 in our example) from the stream, which gives 1101 = dH. This is the address into the LUT, and we find the three-piece data 0/11/07 at location dH, as shown in Fig. 5(a). The first bit 0 in 0/11/07 indicates that the symbol is found, and it is declared as 07H. The second piece of information is the 2-bit code 11, which stands for a codeword length of 11b + 1 = 100b, or 4. This simply says that the codeword for the symbol 07 is 1101, and the codeword length is four.

2) Next, we take another sample from the input code-bit stream, 111010010.... Again, we select the first four bits, 1110 = eH, and refer to the LUT (Fig. 5(b)) at the given address. Here the three-piece data is identified as 1/01/10. The first bit 1 indicates that the symbol is not found at this location and more search must be conducted; the cluster (memory block) to be searched has the starting address 10H. This is the b cluster (see Fig. 3). The second piece of information, i.e. 01, gives the length of the next cluster as 01b + 1 = 10b, or 2. Our next move, naturally, is to slice two more bits out of the input stream. These two bits are 10, giving the relative address 10b = 2. With the offset address 10H, found earlier, we calculate the absolute address 10 + 2 = 12H and refer to this location in the LUT in order to locate the symbol. The three-piece data at this location is 0/01/09. With the first bit 0 observed, we identify the symbol 09 with a relative codeword length of 01b + 1 = 10b, or 2. Therefore, the complete codeword for the symbol 09 is 111010, and the codeword length is six.

3) Finally, we consider another input bit stream sample, 111111111111001.... The first four bits 1111 refer to the address fH in the LUT, where we find the entry 1/11/14. The first bit 1 is the no-symbol code. The second, 2-bit code identifies the length of the next cluster, and the third piece of code, 14H, is the (offset) address of the next cluster, which is the c cluster. Since the length of the c cluster is four, we take another 4-bit slice out of the input code-bit stream. This slice is also 1111, which refers to the relative address fH in the LUT. To find the absolute address we add the relative address and the offset address to get 14 + f = 23H. The content of the LUT at 23H is 1/11/2a (see Fig. 5(c)). This leads us to the f cluster, with cluster length 4 and offset address 2a. The next move is to get the third slice of code-bits from the input stream, i.e., 1111, and proceed to the relative address 1111 in the f cluster. We again calculate the absolute address as 2a + f = 39H and refer to the LUT. The data at this location is 1/00/3a. This refers to the next cluster, g, at the address 3a, and the cluster length is 00 + 1 = 1. This time we take a slice of one bit from the input stream, which happens to be a single 0 bit. Similar to the previous cases we refer to location 3a in the LUT and find the entry 0/00/1e. This finalizes the search for the symbol, because 0 appears as the first bit of the entry. The desired symbol is eventually identified as 1e. The codeword for 1e is evidently 1111,1111,1111,0, and the codeword length is 13, as expected.
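The three walks above follow a single table-driven loop. The C sketch below is our functional model of that loop, not the paper's hardware: it populates only the LUT locations quoted in the examples, peeks a slice of the stream, and, when a symbol entry is found, consumes only the (L + 1) bits indicated by the entry's length code (our reading of how that field is used when a codeword tail is shorter than the slice). Names and data layout are ours.

#include <stdint.h>
#include <stdio.h>

/* Sketch of the clustered-LUT search described in the three examples above.
 * Entry fields follow the earlier sketch: s (0 = symbol found, 1 = jump to
 * the next cluster), len (2-bit length code), u (symbol or next-cluster offset).
 * Only the handful of LUT locations quoted in the examples are filled in;
 * a real table for Table 1 would hold about 122 such words.
 */
typedef struct { uint8_t s, len, u; } lut_entry;

#define LUT_SIZE 128
static lut_entry lut[LUT_SIZE];

static void build_example_lut(void)
{
    lut[0x0d] = (lut_entry){0, 3, 0x07};  /* 1101 -> symbol 07H, codeword length 4   */
    lut[0x0e] = (lut_entry){1, 1, 0x10};  /* 1110 -> cluster b at 10H, length 2      */
    lut[0x12] = (lut_entry){0, 1, 0x09};  /* 1110 10 -> symbol 09H                   */
    lut[0x0f] = (lut_entry){1, 3, 0x14};  /* 1111 -> cluster c at 14H, length 4      */
    lut[0x23] = (lut_entry){1, 3, 0x2a};  /* 1111 1111 -> cluster f at 2aH, length 4 */
    lut[0x39] = (lut_entry){1, 0, 0x3a};  /* 1111 1111 1111 -> cluster g, length 1   */
    lut[0x3a] = (lut_entry){0, 0, 0x1e};  /* ... 0 -> symbol 1eH, codeword length 13 */
}

/* Peek the next n bits of the stream (MSB first) without consuming them. */
static unsigned peek_bits(const char *bits, size_t pos, unsigned n)
{
    unsigned v = 0;
    for (unsigned i = 0; i < n; i++)
        v = (v << 1) | (unsigned)(bits[pos + i] - '0');
    return v;
}

/* Decode one symbol starting at *pos; returns the symbol and advances *pos
 * by the codeword length.  Lm = 4 is the slice width of the root cluster.
 */
static uint8_t decode_symbol(const char *bits, size_t *pos)
{
    unsigned offset = 0, width = 4;           /* root cluster, Lm = 4 bits            */
    for (;;) {
        unsigned rel = peek_bits(bits, *pos, width);
        lut_entry e = lut[offset + rel];      /* absolute = offset + relative address */
        if (e.s) {                            /* connecting node: jump to next cluster */
            *pos += width;                    /* the whole slice is part of the codeword */
            offset = e.u;                     /* offset address of the next cluster   */
            width = (unsigned)e.len + 1;      /* length of the next cluster           */
        } else {                              /* terminal node: symbol found          */
            *pos += (unsigned)e.len + 1;      /* only the codeword tail is consumed   */
            return e.u;
        }
    }
}

int main(void)
{
    const char *streams[] = { "110110011", "111010010", "111111111111001" };
    build_example_lut();
    for (int i = 0; i < 3; i++) {
        size_t pos = 0;
        uint8_t sym = decode_symbol(streams[i], &pos);
        printf("stream %d: symbol %02xH, codeword length %zu\n", i + 1, (unsigned)sym, pos);
    }
    return 0;
}

Compiled and run, the three streams decode to 07H, 09H, and 1eH with codeword lengths 4, 6, and 13, matching the walks above.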

Fig.5(a) The Huffman Look-Up Table. Running a decoding example.


Fig.5(b) The Huffman Look-Up Table. Running a decoding example.


Fig.5(c) The Huffman Look-Up Table. Running a decoding example.


III. OPERATIONAL BLOCKS AND HARDWARE IMPLEMENTATION


Here we propose a hardware implementation of a Huffman decoder using the algorithm developed in [13] and described earlier. A block diagram of the entire design is given in Fig. 6. As shown, there are basically three major operational blocks in this implementation, namely the FIFO block, the LUT Address Generator, and the Source Code Generator. The FIFO block is designed to receive the serial data from the input code stream, pack it into 16-bit words, and then send them into a FIFO RAM. The LUT Address Generator is the second and the major operational block in the decoder. This block receives the code stream in blocks of 16 bits, as stored in the FIFO RAM, slices them into partial or full codewords, and generates the address for the source code memory (Huffman symbol LUT). The third operational block, i.e., the Source Code Generator, simply consists of a RAM containing the source codes and their offset (cluster) addresses in a format addressable by the codewords. Finally, a selector block is included in the design to separate the source codes (symbols) S from the offset cluster addresses Y, as illustrated in Fig. 6. In addition, there are other design considerations, such as the main control block (FSM) generating the control signals and the clock distribution network, that are integral parts of the design. However, for simplicity the issues related to the control signals and the clocking scheme are omitted from this design procedure. Also, only the decoding mode of operation is considered here; other system activities, such as writing into the LUT, frame sync, and field signal modes of operation, are not discussed.

Fig.6 Huffman decoder using the bit clustering technique.

Fig.7 FIFO: conversion from the bit stream to 16-bit codes, and the FIFO RAM.

The FIFO Block

As pointed out earlier, the FIFO block consists of a serial-to-parallel 16-bit register that receives the serial compressed code from the input bit stream and delivers the data in 16-bit format, called codeword slices, to the FIFO RAM. The address to this RAM is prepared by an input counter, ICNT, which is incremented each time a word is written into the memory. This is to keep track of the data being stored. The content of the ICNT is also compared with the content of another counter, OCNT, each time a new data word is written into the FIFO. This comparison is needed to check for any possible overflow within the FIFO. A schematic diagram representing the FIFO block is shown in Fig. 7. The FIFO RAM is used as a memory buffer to regulate the data flow and to provide codes for processing without any interruption. This is necessary because, while the input data is received at a constant rate, in a video transmission environment for example, it may not get processed at the same rate. This is due to the fact that the variable length coding scheme used in Huffman coding may consume the code bits at a different rate. The size of the FIFO RAM, however, varies from application to application. For example, in cases of on-line video signal transmission the FIFO may take a considerable size, equivalent to one or more video frames, in order to avoid any possible data interruption or overflow. In our design example, however, the FIFO is constructed of four parallel 256 x 4 RAM blocks, providing locations for a total of 256 16-bit codeword slices.

To access the FIFO RAM we need to first read-enable the RAM and then read one 16-bit codeword slice at a time. The address for the output data is provided by an output counter, OCNT. The output from the FIFO is latched into the code register A, and the OCNT is incremented each time a read operation is completed, pointing to the next memory location to be read. After each read operation the content of the OCNT is compared with that of the ICNT. This is necessary to watch for the underflow situation, when the OCNT becomes higher than the ICNT. In general, while overflow is an indication that the FIFO RAM is filled beyond its capacity, underflow happens when there is no valid data left in the FIFO to be read.
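As a behavioral sketch of this bookkeeping (not the RTL), the C fragment below models the 256-slice FIFO with the ICNT/OCNT comparison on every write and read; the wrap arithmetic and the function names are our own.

#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of the FIFO described above: 256 slices of 16 bits, an
 * input counter ICNT and an output counter OCNT, and the overflow/underflow
 * checks made by comparing the two counters.  Zero-initialize the struct
 * before use; the modulo wrap handling is our addition.
 */
#define FIFO_DEPTH 256u

typedef struct {
    uint16_t ram[FIFO_DEPTH];   /* four parallel 256 x 4 RAMs, modeled as one array */
    uint16_t icnt;              /* write (input) counter                            */
    uint16_t ocnt;              /* read (output) counter                            */
} fifo;

bool fifo_full(const fifo *f)  { return (uint16_t)(f->icnt - f->ocnt) >= FIFO_DEPTH; } /* overflow check  */
bool fifo_empty(const fifo *f) { return f->icnt == f->ocnt; }                          /* underflow check */

/* Write one 16-bit codeword slice; returns false on overflow. */
bool fifo_write(fifo *f, uint16_t slice)
{
    if (fifo_full(f))
        return false;
    f->ram[f->icnt % FIFO_DEPTH] = slice;
    f->icnt++;                           /* ICNT is incremented after every write */
    return true;
}

/* Read one 16-bit codeword slice into the code register A; returns false on underflow. */
bool fifo_read(fifo *f, uint16_t *reg_a)
{
    if (fifo_empty(f))
        return false;
    *reg_a = f->ram[f->ocnt % FIFO_DEPTH];
    f->ocnt++;                           /* OCNT is incremented after every read  */
    return true;
}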

The LUT Address Generator Block
The LUT Address Generator Block is the central block of the entire design. This block, shown in Fig. 8, is intended to generate the address for the symbols in the LUT, and also to identify the corresponding codewords within the input bit stream. The input data to the LUT Address Generator Block is a 16-bit codeword slice received from the FIFO RAM (through the A code register). This slice is broken down and grouped into one or more combinations of codewords and/or partial codewords, where each partial codeword corresponds to a cluster in the Huffman tree. A codeword so obtained is then used as the relative address into the LUT to locate the pertinent symbol. A partial codeword, on the other hand, is used as the relative address pointing to the next cluster within the Huffman tree. The process of address generation starts by partitioning the 16-bit codeword slice into equal sized groups, with group size Lm, where Lm is the length of the longest cluster in the tree. For a typical case where the largest cluster length is Lm = 4, the input data (in the A register) is partitioned into four groups of 4 bits each, and the entire input data is simultaneously transferred into another register B, consisting of four 4-bit registers B0, B1, B2, and B3, as shown in Fig. 8.

Fig.8 The Look-Up Table Address Generator Block.

In the next step four parallel 4-to-1 multiplexers are used to select one of the Bi registers, for i = 0, 1, 2, or 3, and to transfer the data from one Bi into a 7-bit register C. This is done through a barrel shifter. To take turns, a 2-bit down counter provides the control signals v0 and v1 for the multiplexers, such that the B3 to B0 registers are selected in descending order to load the C register as required. With this sequence of processing we will evidently be able to consume the code-bits (and hence the corresponding codewords) in the same order they are received from the input bit stream. In order to continue processing the input data without any interruption, certain steps must be taken each time the counter (v0 v1) reaches zero. These steps are as follows: i) the counter is set to 3, ii) the 16-bit codeword slice in the A register is loaded into the B register, and iii) a new 16-bit codeword slice is read from the FIFO RAM and loaded into the A register, as discussed in the previous section.

A second stage of code partitioning starts at the C register. The C register is a 7-bit register, and there are four different slots into which one of the 4-bit Bi registers can be loaded in C. Depending on how far the code-bits have been consumed, a barrel shifter is used to transfer the data from a Bi into C, filling a 4-bit slot within the C register. The filling is done immediately to the right of the last valid bit in C. Figure 9 shows a symbolic diagram of the barrel shifter loading the data from a Bi register into a 4-bit slot in the C register. The slot position is controlled by a shift parameter K, where K takes the values 0, 1, 2, or 3 for the slots c3 to c6, c2 to c5, c1 to c4, or c0 to c3, respectively. A specific counter keeps track of the code-bits consumed by the address generator and assigns the appropriate value to the K parameter accordingly.
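To make the slot arithmetic concrete, the fragment below (our notation, assuming bit c6 is the MSB of the 7-bit C register) places a 4-bit group from a Bi register into slot K, immediately to the right of the K valid bits already held in C.

#include <stdint.h>

/* Place a 4-bit group 'b' (from one of the Bi registers) into the 7-bit
 * C register at slot K, where K is the number of valid bits already in C:
 *   K = 0 -> bits c6..c3,  K = 1 -> c5..c2,  K = 2 -> c4..c1,  K = 3 -> c3..c0.
 * Bit c6 is taken as the MSB; this numbering is our assumption for the sketch.
 */
uint8_t c_fill_slot(uint8_t c, uint8_t b, unsigned K)
{
    uint8_t group  = (uint8_t)(b & 0x0f);         /* the 4 incoming code bits    */
    unsigned shift = 3u - K;                      /* slot K starts at bit (6 - K) */
    uint8_t mask   = (uint8_t)(0x0f << shift);    /* the 4-bit slot being filled */
    return (uint8_t)((c & ~mask) | (group << shift));
}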


Fig.9 The barrel shifter K: loading the C register from the B register.

Fig.10 The barrel shifter L: loading the D register from the C register.

Now we are ready to generate the LUT relative address. First we recognize that the word-length of the LUT relative address, i.e. L, is either given an initial value for the start, or it is obtained through an earlier process, fed back from the Source Code Generator Block (see Fig. 6). With L specified, a barrel shifter is used to transfer the desired code bits from the C register into the D register. This is shown in Fig. 8, and the transfer of code through the barrel shifter is illustrated in Fig. 10. As shown in Fig. 10, for a given L, or equivalently L' = L + 1, the barrel shifter shifts the code into both registers C and D simultaneously. It fulfills the following operations: i) it shifts L' code bits from C into the least significant portion of the D register and fills up the most significant (the sign and sign extension) portion of D with zeros, and ii) it shifts the valid code bits within the C register L' positions to the left, where the valid code bits are the original code bits received from the input data stream. Apparently, after each such shift the number of valid code bits in the C register is reduced, and they need to be replaced. This replacement is done through the other barrel shifter, with the shift parameter K, as discussed earlier (Fig. 9). Evidently, for a continuous and uninterrupted transfer of codes from a Bi to C and then from C to D, there must be a relationship between the two shift parameters K and L (or L'). In searching for such a relationship we notice that both barrel shifters deal with the C register: one fills it up 4 bits at a time, and the other removes L' bits at a time, where L' = 1, 2, 3, or 4. And to start the operation we need one loading into the C register before any bit removal begins. It is easy to prove the following relationship between the shift parameters K and L':

K = 4n - ΣL'                                                              (1)

where n is the total number of times C has been loaded (4 bits at a time), and ΣL' represents the total number of bits removed from C, also accumulated from the beginning of the operation. We shall verify Eq. (1) in certain simple cases. 1) Notice that with n = 0, ΣL' must also remain zero; otherwise K becomes negative. The interpretation is that without loading C we cannot load the D register. 2) At the initial state, n = 0 and ΣL' = 0, we get K = 0. This means that first we have to load the highest slot, c3 to c6, in C, as shown in Fig. 9. 3) For n = 1 and ΣL' = 0 the shift parameter K becomes 4, which is impossible. Thus, after any slot fill into C there must be one or more shifts of bits into the D register.

Equation (1) is certainly not suitable for hardware implementation: no counter can keep track of the entire code consumption. Instead we can use the bit status of the C register as an indicator, and associate with it a 2-bit counter X = x1x0 with an output carry cout. When C is empty (no valid bit) we have X = 00 and cout is cleared. This is the case when the operation begins. The first 4-bit fill into C sets cout, with no change in X. On the other hand, any extraction of L' bits from C causes a subtraction of L' from X (with the cout carry included). Immediately after cout is cleared, a 4-bit fill into C will occur, and this causes cout to be set again. This, of course, happens without any change in X. Still, we have to find the value of K (the slot number for the fill) each time a fill operation occurs. In a simple calculation it becomes evident that K is also a two-bit data item, and always K = X. For example, at the initial state we have X = 00 with cout cleared, and thus K = 00. If after the first fill (cout = 1) we shift L' bits to D, then K becomes equal to X - L', which is in agreement with Eq. (1). As a further simplification, we may replace X - L' by X + ~L, where ~L is the bitwise inverse of L. We now propose the following algorithm for generating K:


Algorithm 1:
1. The system is initialized by resetting the counter X to X = 00, with cout cleared and K = 00.
2. A loading is performed, transferring four bits from one of the Bi registers into the C register. Upon the completion of the data transfer, cout is set.
3. For each shift of L' bits we calculate the new counter value X_new = X_old + ~L, with the carry cout included.
4. If cout = 1 go to step 3; otherwise let K = X, and go to step 2.
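Algorithm 1 can be sanity-checked with a short software model. The sketch below (ours, not the paper's) treats the pair (cout, X) as a 3-bit count of the valid bits left in C, applies shifts of L' = 1 to 4 bits, refills whenever cout clears, and asserts that the slot number K = X produced at each fill equals 4n - ΣL' from Eq. (1).

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Software check of Algorithm 1 (our model, not the paper's hardware):
 * the 3-bit value (cout:X) tracks the number of valid bits left in the
 * 7-bit C register.  A fill adds 4 bits (sets cout, X unchanged); a shift
 * removes L' = 1..4 bits.  When cout clears, the next fill uses slot K = X,
 * which must equal 4n - sum(L') from Eq. (1).
 */
int main(void)
{
    unsigned count = 0;      /* (cout:X): number of valid bits in C, 0..7 */
    unsigned n = 0;          /* total number of 4-bit fills performed     */
    unsigned sum_lp = 0;     /* running total of removed bits, sum(L')    */

    srand(1);
    for (int step = 0; step < 1000; step++) {
        if (count < 4) {                      /* cout cleared: fill a 4-bit slot     */
            unsigned K = count & 3;           /* K = X, the slot number              */
            assert(K == 4 * n - sum_lp);      /* Eq. (1): K = 4n - sum(L')           */
            count += 4;                       /* fill sets cout, X unchanged         */
            n++;
        } else {                              /* cout set: shift L' bits from C to D */
            unsigned lp = 1 + (unsigned)(rand() % 4);   /* L' = L + 1, in 1..4       */
            count -= lp;                      /* X <- X + ~L (i.e. X - L'), borrowing from cout */
            sum_lp += lp;
        }
    }
    printf("Eq. (1) held for every fill: K = 4n - sum(L')\n");
    return 0;
}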

Fig.11 The Source Code (Symbol) Generator Block.

Finally, to generate the actual (absolute) address for the LUT, a 4-bit adder is assigned to receive the LUT relative address from the D register and the LUT offset address Y from the Source Code Generator Block (discussed below). The output W resulting from the adder is stored into the LUT Address Register and is subsequently used to access the LUT RAM, located in the Source Code Generator Block, as shown in Fig. 8.

The Source Code Generator Block

The Source Code Generator Block is the third operational block, and it produces either the pertinent source code (symbol) or the address of a particular cluster (memory block) for further search. The processing in this block starts when the address W is received from the LUT Address Generator Block. The address code W is a 5-bit word, and when the LUT RAM is read-enabled this address causes the LUT RAM to deliver an 11-bit data word to the data bus. This data is the concatenation of three pieces of information, namely: i) the select bit s, ii) the 2-bit address code-length L, and iii) the symbol/address U, as shown in Fig. 11. The data format in the LUT was also discussed previously, and the entries consisting of three pieces of code are clearly shown in Fig. 4 as well. The select bit s separates a source code (symbol) from the address of another block of memory representing the next cluster in the process. A selector block is used to distinguish between the symbol and the cluster address, as illustrated in Fig. 6. The address code-length L, on the other hand, is a two-bit code, and its value specifies the number of code bits to be removed from the input bit stream. It is the value of L that specifies the cluster length or the length of the entire codeword, and it is the value of L that determines the number of bits to be skipped over in a cluster jump. The third piece of code contained in the LUT RAM data entry is either the desired symbol, or it represents an offset address to the next cluster in the search process. The selection between the two is made by the select bit s, as described earlier.

The design of the Huffman decoder is carried out using a silicon-gate CMOS process. In our first design both the FIFO and the LUT are excluded from the core, bringing the hardware to a relatively smaller size in order to make it part of a video decoder chip. However, with the LUT so much reduced in size (about 128 to 256 bytes), it can simply be included inside the chip for higher speed and performance.

IV. CONCLUSION
Hardware design of a high speed and memory efficient Huffman decoder, introduced in [13], is presented. The algorithm developed is based on a specific Huffman tree structure using a code-bit clustering scheme. The method is shown to be extremely efficient in its memory requirement and fast in searching for symbols. For an experimental video data set with code-words extending up to 13 bits, the entire memory space needed is shown to be 122 words in size, compared with the 2^13 = 8192 words of memory normally required. The design of the decoder is carried out using a silicon-gate CMOS process.


V. REFERENCES
[1]. C.P. Sandbank and I. Childs, "The Evolution Towards High-Definition Television," Proc. IEEE, vol. 73, no. 4, pp. 638-645, April 1985.
[2]. T. Fujio, "High-Definition Television Systems," Proc. IEEE, vol. 73, no. 4, pp. 646-655, April 1985.
[3]. A.K. Jain, "Image Data Compression: A Review," Proc. IEEE, vol. 69, no. 3, pp. 349-389, Mar. 1981.
[4]. P.G. Neumann, "Efficient Error-Limiting Variable-Length Codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 292-304, July 1962.
[5]. R. Hunter and A.H. Robinson, "International digital facsimile coding standards," Proc. IEEE, vol. 68, no. 7, pp. 854-867, July 1980.
[6]. D.A. Huffman, "A method for the construction of minimum redundancy codes," Proc. IRE, vol. 40, no. 10, pp. 1098-1101, Sept. 1952.


[7]. T.J. Ferguson and J.H. Rabinowitz, "Self-Synchronizing Huffman Codes," IEEE Trans. Inform. Theory, vol. IT-30, no. 4, pp. 687-693, July 1984.
[8]. S.M. Lei and M.T. Sun, "An Entropy Coding System for Digital HDTV Applications," IEEE Trans. Circuits Syst. Video Technol., vol. 1, no. 1, pp. 147-155, March 1991.
[9]. K.H. Tzou, "High-Order Entropy Coding for Images," IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 1, pp. 87-89, March 1992.
[10]. M.E. Lukacs, "Variable word length coding for a high data rate DPCM video coder," Proc. Picture Coding Symp., pp. 54-56, 1986.
[11]. S.M. Lei, M.T. Sun, and K.H. Tzou, "Design and Hardware Architecture of High-Order Conditional Entropy Coding for Image," IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 2, pp. 176-186, June 1992.
[12]. S. Roman, Coding and Information Theory, Springer-Verlag, New York, 1992.
[13]. R. Hashemian, "High Speed Search and Memory Efficient Huffman Coding," Proc. 1993 IEEE Int. Symp. Circuits and Systems, May 3-6, 1993.


REZA HASHEMIAN received the B.S.E.E. degree from Tehran University, Tehran, Iran, in 1960, and the M.S. and Ph.D. degrees from the University of Wisconsin, Madison, both in electrical engineering, in 1965 and 1968, respectively. From 1968 to 1984 he was with Sharif University of Technology as an Assistant, Associate, and Full Professor. This includes eight years (1972-1980) of research and development on circuit simulation, MOS modeling and characterization, and IC design at the Material and Energy Research Center, where he was also one of the founders and served as the deputy director and director for nine years. He joined Signetics (Philips) Corporation in 1984, where he worked on the design of semi-custom ICs. Currently he is with the Department of Electrical Engineering, Northern Illinois University, DeKalb, IL, where he teaches and does research in IC design, computer arithmetic, active noise cancellation, and code compression algorithms. Dr. Hashemian is a senior member of the IEEE, and served as the co-chairman of the IEEE Iran section in 1970-72.
