
Product Overview

The DesignWare JPEG CODEC With Header Processing is part of an SoC-based multimedia solution that enables fast and simple image compression and decompression.

1.1

Features

Compatibility: 100 percent baseline ISO/IEC 10918-1 JPEG compliant
8-bit/channel pixel depths
Single clock per pixel encoding and decoding
Support for JPEG header generation and parsing
Up to four programmable quantization tables
Fully programmable Huffman tables (two AC and two DC)
Fully programmable minimum coded unit (MCU)
Encode/decode support (non-simultaneous)
Single clock Huffman coding and decoding
Four-channel interface: Pixel In, Compressed Out, Compressed In, Pixel Out
Simple external interface
Stallable design
Hardware support for restart marker insertion
Support for single, grayscale component
Functionality to enable/disable header processing
Internal registers interface
Fully synchronous design

1.2

System Integration
Figure 1-1 shows how the JPEG CODEC can be used in an SoC.

FIGURE 1-1 Typical Use of the JPEG CODEC in an SoC
[Block diagram. Labeled elements: JPEG CODEC With Header Processing, CPU, Register Interface, Pixel In, Pixel Out, JPEG/ECS In, JPEG/ECS Out, Acquisition and/or Storage, Display and/or Storage, Memory Interface, Internal Memories.]

Signal Descriptions

This chapter describes the JPEG CODEC interfacing signals.

3.1

JPEG CODEC Interface Block Diagram


Figure 3-1 shows the interface block diagram for the JPEG CODEC.

FIGURE 3-1 JPEG CODEC Interface Block Diagram
[Block diagram. The JPEG CODEC With Header Processing block has five interface groups: Pixel I/O, Register Interface, Control Signals, JPEG/ECS I/O, and Memory Interface.]

3.2

Top-Level Pin Diagram


Figure 3-2 shows the top-level diagram for the JPEG CODEC With Header Processing.

FIGURE 3-2 JPEG CODEC With Header Processing Top-Level Pin Diagram
[Pin diagram of JPEG_CODEC_HDR.
Control Signals: clk, reset, sw_res, en, over
CPU Interface: cpu_addr, cpu_din, cpu_we, cpu_dout
Pixel I/O: pixin, pixin_req, pixout, pixout_vld
JPEG/ECS I/O: enc_data, enc_data_vld, dec_data, dec_data_req
Memory interfaces: DCTRam, ZigRams, QMem, HuffEnc, DHTMem, HuffMin, HuffBase, HuffSymb]

3.3

Control Signals
Table 3-1 describes the control signals.
TABLE 3-1 Control Signals
Name      I/O      Description
clk       Input    Core clock signal.
reset     Input    Asynchronous core reset, active high.
sw_res    Input    Synchronous core reset, active high.
en        Input    Synchronous core enable signal. When this signal is low, the core ignores all the input signals with the exception of reset and sw_res. All the core outputs must also be ignored when en is low.
over      Output   When high, this signal indicates the end of encoding or decoding.

3.4

Core Register Interface Signals


Table 3-2 describes the Core Register interface signals.
TABLE 3-2 Core Register Interface Signals
Name             I/O      Description
cpu_addr[2:0]    Input    Internal registers address bus.
cpu_din[31:0]    Input    Internal registers input data bus.
cpu_we           Input    Internal registers write enable. When this signal is high, the data on the cpu_din bus is written to the internal register pointed to by cpu_addr.
cpu_dout[31:0]   Output   Internal registers output data bus.

3.5

Pixel I/O Signals


Table 3-3 describes the pixel I/O signals.
TABLE 3-3 Pixel I/O Signals
Name          I/O      Description
pixin[7:0]    Input    Input pixel data.
pixin_req     Output   When this signal is high, pixin input must be valid.
pixout[7:0]   Output   Pixel data output.
pixout_vld    Output   When this signal is high, pixout is valid.

3.6

JPEG/ECS I/O Signals


Table 3-4 describes the JPEG/ECS I/O signals.
TABLE 3-4 JPEG/ECS I/O Signals
Name            I/O      Description
enc_data[7:0]   Output   JPEG/ECS data output.
enc_data_vld    Output   When this signal is high, data on enc_data bus is valid.
dec_data[7:0]   Input    JPEG/ECS data input.
dec_data_req    Output   When this signal is high, the dec_data input must be valid.

3.7

Memory Interface Signals


Table 3-5 describes the Memory interface signals.
TABLE 3-5 Memory Interface Signals
Name                 I/O      Description

DCTRam Memory Interface
yh[14:0]             Output   DCTRam write data.
memh[5:0]            Output   DCTRam write address.
xv[14:0]             Input    DCTRam read data.
memv[5:0]            Output   DCTRam read address.

ZigRams Memory Interface
zd[10:0]             Output   ZigRams input data.
za0[5:0]             Output   ZigRam0 address bus.
zwe0                 Output   ZigRam0 read/write signal. When this signal is high, the data on the zd bus is written.
zd0[10:0]            Input    ZigRam0 read data bus.
za1[5:0]             Output   ZigRam1 address bus.
zwe1                 Output   ZigRam1 read/write signal. When this signal is high, the data on the zd bus is written.
zd1[10:0]            Input    ZigRam1 read data bus.

QMem Memory Interface
dqtmem_data[7:0]     Output   QMem table write data bus.
dqtmem_addr[7:0]     Output   QMem table address.
qtcoeff[7:0]         Input    QMem table read data bus.

HuffEnc Memory Interface
he_addr[8:0]         Output   HuffEnc table read address.
hed[11:0]            Input    HuffEnc table read data bus.

DHTMem Memory Interface
dht_addr[8:0]        Output   DHTMem table read address.
dhtd[7:0]            Input    DHTMem table read data bus.

HuffMin Memory Interface
minmem_addr[1:0]     Output   HuffMin table read address.
mind[99:0]           Input    HuffMin table read data bus.

HuffBase Memory Interface
basemem_data[8:0]    Output   HuffBase table write data bus.
basemem_addr[5:0]    Output   HuffBase table read/write address.
base[8:0]            Input    HuffBase table read data bus.

HuffSymb Memory Interface
symbmem_data[7:0]    Output   HuffSymb table write data bus.
symbmem_addr[8:0]    Output   HuffSymb table read/write address.
symbol[7:0]          Input    HuffSymb table read data bus.

Memory Enable Signals
dqtmem_en            Output   Synchronous enable signal generated by the core for the DCTRam, ZigRam, and QMem memories. The memories are enabled when dqtmem_en is high.
he_en                Output   Synchronous enable signal generated by the core for the HuffEnc memory. The memory is enabled when he_en is high.
dht_en               Output   Synchronous enable signal generated by the core for the DHTMem memory. The memory is enabled when dht_en is high.
mem_we               Output   Asynchronous memory write enable signal for HuffBase and HuffSymb. QMem write enable signal. The memories are written to when mem_we is high.

CODEC Operation

This chapter describes JPEG CODEC architecture, functionality and operations.

4.1

Hardware Overview
The JPEG CODEC With Header Processing is built around Synopsys' existing JPEG ECS CODEC and extends its functionality by providing additional support for JPEG header parsing and generation. The core implements all the steps necessary to encode and decode image data according to the JPEG baseline algorithm as specified in ISO/IEC 10918-1. The design's simplicity enables high operational speed and makes it ideal for multimedia and color printing applications. The core is specifically designed to accelerate entropy-coded segment (ECS) encoding and decoding, because this forms the most computing-intensive part of the baseline JPEG algorithm.

During encoding, at the start of every image frame, a JPEG header is first appended and then the scans are coded. Table specifications and other marker segments generally follow the start of frame (SOF) marker. The core starts by encoding the JPEG header bit stream with the parameters and table specifications programmed during initialization. When the core is ready to insert ECS data into the JPEG stream, the core begins accepting blocks of 8 x 8 pixels (data units) and encodes them into valid ECSs. An end of image (EOI) marker is appended for each frame at the end of the encoded JPEG data stream. The resulting encoded bit stream is 100 percent compliant with the interchange format syntax specified in Annex B of ISO/IEC 10918-1.

The JPEG Encoder core consists of the JPEG Header Generator block, the ECS Encoder block, and a main controller that controls the two processing blocks. The core can enable or disable header processing. If header processing is disabled, only the ECS CODEC module is active and the core generates or decodes only ECS data. Support for restart markers is also provided. Because restart markers are the only markers that can be contained in an ECS, the ECS CODEC module inserts and recognizes them in the encoded stream. Restart marker insertion can be optionally enabled.
During decoding, the JPEG CODEC starts parsing header information from the incoming JPEG encoded data stream. Table specifications and parameters required to correctly decode the ECS data that follows the header data are stripped out and stored in the core internal CPU registers and external memories. Marker segments that do not contain information relevant to the core are parsed for their length field so that the data bytes associated with these segments can be ignored. The ECS CODEC is activated when ECS data is encountered in the input stream. The ECS CODEC accepts valid ECSs and decodes them into 8 x 8 pixel blocks (data units). Decoded results depend on the quantization and Huffman table data, as well as the image parameters extracted from the header data. If a define restart interval (DRI) marker segment is present, restart markers are recognized and acted on.

JPEG encoded data streams decoded by the core must be compliant with the interchange format syntax specified in the ISO/IEC 10918-1 specification. The core also supports JFIF images, the de facto standard used to encode JPEG images. However, all application-specific marker segments found in these data streams are ignored.

The JPEG Decoder core consists of the JPEG Header Parser block, the ECS Decoder block, the HuffMin RTL internal memory module, and a main controller that controls the two processing blocks. The core supports up to four color components, four quantization tables, and two sets of DC and AC Huffman tables.

For encoding, the ECS Encoder requires 176 x 12 bits of storage for each set of AC and DC Huffman tables. Normally, two DC and AC tables are required, resulting in 384 x 12 bits of RAM or ROM. Throughout the rest of this databook, this memory is referred to as HuffEnc.

For header generation, 206 x 8 bits of storage is required for each set of AC and DC Huffman tables, resulting in a total of 412 x 8 bits of RAM or ROM. Throughout the rest of this databook, this memory is referred to as DHTMem.

For decoding, for a single set of AC and DC tables, two tables are required: 32 x 9 bits and 174 x 8 bits. If two sets of AC and DC tables are required, the table sizes are 64 x 9 bits and 336 x 8 bits. Throughout the rest of this databook, these memories are referred to as HuffBase and HuffSymb, respectively. If header processing is disabled, HuffMin, an additional external memory, is required to decode the ECS data. This table needs 2 x 100 bits of storage for a single set of AC and DC tables. These memories can be RAM or ROM, depending on the application.
Because the CODEC does not support encoding and decoding simultaneously, the storage for Huffman encoding can be shared with the storage for decoding. The logic required for sharing this storage can slightly reduce performance; you must decide if resource sharing is suitable for your application.

Up to four quantization tables (one per color component) are stored in a user-provided RAM or ROM of appropriate size (each table requires 64 x 8 bits of storage). Throughout the rest of this databook, this memory is referred to as QMem.

The MCU is the minimum number of blocks that can be encoded or decoded. The MCU's composition is fully programmable. For a single color component, the MCU coincides with an 8 x 8 pixel block. For multiple color components, the MCU comprises at least one block per component. However, in some cases, for example in color spaces such as YUV in video applications, the MCU can be formed by four Y pixel blocks, two U blocks, and two V blocks. Full MCU programmability enables encoding and decoding color components with different sampling factors such as 4Y + 2U + 2V, 4Y + U + V, R + G + B, C + M + Y + K, and several other possible combinations. For each color component, you can specify which quantization and Huffman tables to use. You can also define how many blocks in each MCU belong to a particular color component. This provides flexibility in the interleaving of color components with different subsampling rates as required by different applications, such as color printing and video editing.

To operate correctly, the JPEG CODEC also requires three RAMs: one 64 x 15 dual-port and two 64 x 11 single-port. Throughout the rest of this databook, these memories are referred to as DCTRam and ZigRam, respectively.
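
As a rough illustration of MCU composition, the following Python sketch counts the 8 x 8 blocks in one MCU and the number of MCUs needed to cover an image. The function names, the dictionary layout, and the MCU pixel footprint passed in as a parameter are illustrative assumptions; they do not correspond to core registers described in this databook.

```python
import math

def blocks_per_mcu(blocks):
    """Total number of 8 x 8 blocks in one MCU.

    `blocks` maps a component name to the number of blocks that
    component contributes to each MCU (hypothetical layout).
    """
    return sum(blocks.values())

def mcus_in_image(width, height, mcu_width, mcu_height):
    """Number of MCUs needed to cover the image; partial MCUs round up.

    The MCU footprint in pixels is supplied by the caller, because it
    depends on how the blocks of the largest component are arranged.
    """
    return math.ceil(width / mcu_width) * math.ceil(height / mcu_height)

# Combinations named in the text above.
mcu_4y_2u_2v = {"Y": 4, "U": 2, "V": 2}
mcu_4y_u_v = {"Y": 4, "U": 1, "V": 1}
mcu_cmyk = {"C": 1, "M": 1, "Y": 1, "K": 1}
mcu_gray = {"Y": 1}
```

For a single grayscale component the MCU footprint is one 8 x 8 block, so `mcus_in_image(width, height, 8, 8)` gives the block count directly.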

4.2

Source Files
Table 4-1 lists all the JPEG CODEC source files and indicates whether a file is used in encoding or decoding. Each file name corresponds to the module name.
TABLE 4-1 JPEG_CODEC_HDR Source Files
File                 Encoding (E)/Decoding (D)
jpeg_codec_hdr.v     E/D
jpeg_main_ctrl.v     E/D
jpeg_header.v        E
jpeg_hparser.v       D
jcodec.v             E/D
dct.v                E/D
acch.v               E/D
accv.v               E/D
addr_gen.v           E/D
clamp.v              D
crossh.v             E/D
crossv.v             E/D
mulh.v               E/D
mulv.v               E/D
zigzag.v             E/D
regctrl.v            E/D
quant.v              E/D
mul.v                E/D
rlen.v               E
prio.v               E
code.v               E
cd.v                 E
hbit.v               E
stuff.v              E
unstuff.v            D
decode.v             D
dhuff.v              D
rlex.v               D
store.v              E
qlook.v              E
jpeg_mem_ctrl.v      E/D
huffmin_mem_rtl.v    D

4.3

Top-Level Module Description


As shown in Figure 4-1, the jpeg_codec_hdr module implements the encoding and decoding of the JPEG bit stream.

FIGURE 4-1 JPEG_CODEC_HDR Top-Level Block Diagram
[Block diagram of JPEG_CODEC_HDR.
Internal blocks: Main Controller, JPEG Header Generation, JPEG ECS CODEC, JPEG Header Parser, Qlook [Q], Memory Controller, HuffMin RTL Memory
Signals shown: cpu_din[0], cpu_addr, cpu_we, en, over, pixin, pixout, enc_data, dec_data, qtcoeff
External memories: DHTMem, QMem, HuffMin (ECS only), HuffBase, HuffEnc, HuffSymb]

At the top level, the following modules form the JPEG CODEC With Header Processing core:

jpeg_main_ctrl (Main Controller) module (file jpeg_main_ctrl.v)
jpeg_header (JPEG Header Generator) module (file jpeg_header.v)
jpeg_hparser (JPEG Header Parser) module (file jpeg_hparser.v)
jcodec (JPEG ECS CODEC) module (file jcodec.v)
jpeg_mem_ctrl (Memory Controller) module (file jpeg_mem_ctrl.v)
huffmin_mem_rtl (HuffMin RTL Memory) module (file huffmin_mem_rtl.v)
qlook (Q Lookup Table) module (file qlook.v)

The encoding and decoding processes share the jcodec and jpeg_mem_ctrl modules.

4.4

Submodule Description

4.4.1

JPEG Main Controller
The JPEG CODEC With Header Processing uses a centralized control architecture, where a main controller controls the flow of events during the encoding or decoding process. The main controller enables and disables the blocks, based on control signals and status indication flags generated by each block. The controller is an event-based state machine controlling the operation of the JPEG CODEC's three main blocks:

JPEG Header Generator
JPEG Header Parser
JPEG ECS CODEC

Figure 4-2 shows a simplified diagram of the controller state machine.

FIGURE 4-2 JPEG Main Controller State Machine
[State diagram. The machine enters IDLE on reset. State outputs:
IDLE: headgen=1'b0, hp_en=1'b0, ecs_enb=1'b0
HDR GENERATE/PARSE: headgen=~dec, hp_en=dec, ecs_enb=1'b0
ECS ENCODE/DECODE: headgen=1'b0, hp_en=1'b0, ecs_enb=en
EOI MARKER: headgen=1'b1, hp_en=1'b0, ecs_enb=1'b0
Transition conditions shown: en == 1'b1 && Hdr == 1'b1; en == 1'b1 && Hdr == 1'b0; headgen_done == 1'b1 (twice); (ecs_over && Hdr && dec); (ecs_over && Hdr && !dec). Each state otherwise holds (else).]

As indicated in Figure 4-2, the controller waits in the Idle state after a reset until the core's enable signal, en, becomes active high. If encoding and header processing are enabled, the jpeg_header module is activated and the header stream is generated. The ECS CODEC is enabled when the headgen_done signal becomes active high. Pixel data on the pixin input port is sampled, encoded into valid ECSs, and appended to the output stream. When ECS encoding is complete, the jpeg_header block is enabled again to generate the EOI marker segment. The main controller FSM returns to its Idle state when it has completed encoding a single frame.

Similarly, while decoding with header processing enabled, the jpeg_hparser module is first enabled so that information can be parsed from the incoming JPEG header data. When all header data has been stripped out and the table specification data has been stored in the memories, the ECS CODEC block is enabled and the ECS data is decoded into valid pixel data. If the Hdr bit in the CPU registers is not set, the main controller activates only the ECS CODEC block and only ECS data is encoded or decoded.

The main controller also generates a power-on signal for the header blocks. The logic value of this signal depends on the start/stop bit in CPU register 0. When the start signal is high, the flow of operation proceeds in the FSM as shown in Figure 4-2. If this signal goes low because the start/stop bit is reset, all the blocks transition to the Idle state one clock cycle later.
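
The control flow described above can be modeled in a few lines of Python. This is a behavioral sketch only, not the RTL: the state outputs follow the labels in Figure 4-2, but the exact arcs are reconstructed from the prose, and the return to Idle after ECS processing when Hdr is clear is an assumption. The function and constant names are hypothetical.

```python
IDLE = "IDLE"
HDR = "HDR GENERATE/PARSE"
ECS = "ECS ENCODE/DECODE"
EOI = "EOI MARKER"

def outputs(state, en, dec):
    """Return (headgen, hp_en, ecs_enb) for a state, per the Figure 4-2 labels."""
    if state == HDR:
        return (int(not dec), int(bool(dec)), 0)
    if state == ECS:
        return (0, 0, int(bool(en)))
    if state == EOI:
        return (1, 0, 0)
    return (0, 0, 0)  # IDLE

def next_state(state, en, hdr, dec, headgen_done, ecs_over):
    """One clock step of the controller model."""
    if state == IDLE:
        if en and hdr:
            return HDR
        if en and not hdr:
            return ECS
        return IDLE
    if state == HDR:
        return ECS if headgen_done else HDR
    if state == ECS:
        if ecs_over and hdr and not dec:
            return EOI  # encoding with headers: append the EOI marker
        if ecs_over:
            # Decoding, or ECS-only operation. The ECS-only arc is an
            # assumption; the figure labels only show conditions with Hdr set.
            return IDLE
        return ECS
    if state == EOI:
        return IDLE if headgen_done else EOI
    raise ValueError("unknown state: %r" % (state,))
```

Stepping the model through an encode with header processing enabled visits Idle, header generation, ECS encoding, EOI marker, and Idle again, matching the prose description.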

4.4.2

JPEG Header Processing


The interchange format dictated by the ISO/IEC standard is a compressed image data representation that includes all the table specifications used in the encoding process. The interchange format has been specified for exchange of compressed information between different application environments, as shown in Figure 4-3. A source image compressed with a specified encoding process within application environment A can be passed to application environment B for decoding and image reconstruction using this interchange format.

Application Environment A

Compressed image data, including table specifications

Application Environment B

FIGURE 4-3

Interchange Format for Compressed Image Data

Structurally, the encoded data format consists of an ordered collection of parameters, markers, and entropy-coded segments. Parameters and markers are organized into marker segments. Because all these constituent parts are represented by byte-aligned codes, the compressed data format consists of an ordered sequence of 8-bit words. Parameters are defined as values specific to the source image characteristics and the encoding process. The parameters encode critical information that the decoding process requires to properly reconstruct the image. Markers identify the various structural parts of the compressed data format.

The high-level constituent parts of the JPEG interchange format are shown in Figure 4-4. At the highest level, the interchange format begins with a start of image (SOI) marker, contains only one frame, and ends with an EOI marker. At the second level, the syntax specifies that one or more table specifications can precede the frame header and that the frame must contain one or more scans. The scan header follows the frame header. Table specifications and other miscellaneous marker segments can precede the scan header. If restart marker processing is not enabled, the scan data contains only one ECS.
FIGURE 4-4 JPEG Interchange Format Syntax
Compressed image data: SOI, Frame, EOI
Frame: [Tables/misc.], Frame Header, Scan 1, [DNL Segment], [Scan 2], ..., [Scan last]
Scan: [Tables/misc.], Scan Header, [ECS0, RST0, ..., ECSlast-1, RSTlast-1], ECSlast
Entropy-coded segment 0: <MCU1>, <MCU2>, ..., <MCUNRST>
Entropy-coded segment last: <MCUn>, <MCUn + 1>, ..., <MCUlast>

The core header processing blocks support the interchange format syntax that applies to the sequential discrete cosine transform (DCT)-based mode of operation of the JPEG baseline algorithm.

4.4.2.1

Header Generation

The jpeg_header module implements the JPEG header generation. This module comprises a finite state machine and some dedicated logic to assist in the generation of header data. The flow of the jpeg_header module state machine is easy to follow in the RTL code. If the Hdr bit is clear, the JPEG CODEC's header generation capability is disabled and only ECS data is produced. All image-specific parameters to be inserted in the header fields must be programmed via the CPU Register interface.

The JPEG header output stream comprises a sequential flow of markers, marker segments, table specification data, and parameters. Table 4-2 lists the supported markers, their code assignments, and a brief description. Figure 4-4 shows the order of the output bit stream, which the jpeg_header module FSM applies. Unless the byte stream is stalled by setting the en signal low, the encoded header byte stream is transferred to the output port enc_data at a rate of 1 byte per clock cycle.

The FSM waits in the Idle state until the headgen signal becomes active high. The SOI marker is generated first. The DQT marker segment is generated next, where quantization tables are read from the QMem memory and inserted into the header output stream. The number of tables inserted depends on the value of the colspctype parameter. The frame header follows, and the image-specific parameters programmed in the core internal registers are inserted in the assigned fields. The DHT table segment is encoded after the frame header, and the data associated with it is read from the DHTMem. If the Re bit is set, the restart marker segment is generated as well. The scan header is the last marker segment to be output before the core is ready for ECS data. When the scan header segment is complete, the jpeg_header module sets the headgen_done signal high and returns to the Idle state.

When the JPEG ECS CODEC has finished encoding the requested MCUs, the ecs_done signal is set and the jpeg_header module is again enabled to generate the EOI marker. This process generates a valid JPEG bit stream that is compatible with the interchange format specified in Annex B of ISO/IEC 10918-1.
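
The order of header segments described above can be summarized in a short Python sketch. The marker codes are those listed in Table 4-2. The function names are hypothetical, and the sketch assumes that the frame header is the SOF0 segment and that the restart marker segment is the DRI segment; it produces only the marker order, not the segment payloads.

```python
# Marker codes as listed in Table 4-2.
MARKERS = {
    "SOI": 0xFFD8,
    "DQT": 0xFFDB,
    "SOF0": 0xFFC0,
    "DHT": 0xFFC4,
    "DRI": 0xFFDD,
    "SOS": 0xFFDA,
    "EOI": 0xFFD9,
}

def header_segment_order(restart_enabled):
    """Segments emitted before ECS data, in the order the text describes."""
    order = ["SOI", "DQT", "SOF0", "DHT"]
    if restart_enabled:  # corresponds to the Re bit being set
        order.append("DRI")
    order.append("SOS")
    return order

def marker_bytes(name):
    """Two-byte big-endian encoding of a marker code."""
    return MARKERS[name].to_bytes(2, "big")
```

After the ECS data, the only remaining marker is EOI, so a complete frame is the list above, the entropy-coded data, and `marker_bytes("EOI")`.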

4.4.2.2

Header Parsing

The jpeg_hparser module implements the JPEG header parsing function. This module consists of a large finite state machine and dedicated logic to assist in the parsing of header data and the programming of external memories and core internal registers. Access to the external memories and the internal registers is shared with the JPEG ECS CODEC block. Read and write operations to all storage devices by these two blocks are non-simultaneous. The flow of the module state machine is relatively simple to follow in the RTL code.

During decoding, the jpeg_hparser module accepts a valid JPEG stream and parses it for supported markers. The jpeg_hparser module supports all markers relevant to the JPEG baseline algorithm indicated in Annex B of ISO/IEC 10918-1.

Table 4-2 lists the supported markers and their code assignments, and provides a brief description of each.
TABLE 4-2 Markers Supported for JPEG Header Parsing
Code Assignment   Symbol   Description                             H/W Marker Parsing
0xFFC0            SOF0     Start of frame, Baseline DCT mode       Supported
0xFFC4            DHT      Huffman table(s) segment                Supported
0xFFC8            JPG      Reserved for JPEG extensions            Supported, but data ignored
0xFFD0-0xFFD7     RSTm     Restart markers with modulo 8 count m   Supported by ECS CODEC
0xFFD8            SOI      Start of image                          Supported
0xFFD9            EOI      End of image                            Supported by ECS CODEC
0xFFDA            SOS      Start of scan                           Supported
0xFFDB            DQT      Quantization table(s) segment           Supported
0xFFDC            DNL      Defines number of lines                 Supported
0xFFDD            DRI      Restart marker interval segment         Supported
0xFFE0-0xFFEF     APPn     Reserved for application segments       Supported, but data ignored
0xFFF0-0xFFFD     JPGn     Reserved for JPEG extensions            Supported, but data ignored
0xFFFE            COM      Comment                                 Supported, but data ignored

To activate the jpeg_hparser module, the Hdr and De bits in the core registers must be set. The main state machine waits in the Idle state until the hp_en input signal becomes active high. The dec_data_req output signal becomes active high three clock cycles after the en signal is set high and remains high unless the JPEG CODEC is stalled by setting the en signal low. The module decodes the JPEG header stream at a rate of 1 byte per clock cycle. Header data on the module input data bus is sampled only when en and dec_data_req are both high. Data transfer can be suspended at any time by setting en low, causing the data request signal to go low one clock cycle later.

When parsing a supported marker, the jpeg_hparser block extracts the required parameters from the incoming data stream and stores them in registers. A set of registers identical to the core's CPU registers is set up within this module. At the end of parsing, the CPU registers are programmed before this module is disabled.

If a DQT marker segment is located, quantization data associated with it is written to the QMem memory. The Tqi field determines which quantization table (0 to 3) is stored in memory.

Because the Huffman table data format of the DHT marker segment is not compatible with the internal representation of the ECS Decoder Huffman tables, this module performs the conversion. The DHT segment data is read in 1 byte per clock cycle, converted into three different table formats (HuffMin, HuffBase, and HuffSymb), and the data is stored in their respective memories. The DNL and DRI segments are also recognized, and the appropriate fields in the registers are loaded. A marker segment consists of a marker followed by a 2-byte length parameter. This parameter enables the JPEG CODEC to identify the marker segments that do not contain any useful information, to read their 2-byte length field, and to discard the remaining data in that segment. Marker segments for JPEG extensions, application-specific data and comments are ignored. Finally, the scan header is decoded, and the register contents are written to the ECS CODEC internal registers. The headgen_done signal is set, and the main state machine returns to its Idle state.
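
The use of the length parameter to skip irrelevant marker segments can be illustrated in software. The following Python sketch walks the marker segments of a header byte string from SOI up to the scan header, discarding the segments that Table 4-2 marks as ignored. It relies on the ISO/IEC 10918-1 rule that the 2-byte length counts itself but not the marker. The function names are hypothetical, and the sketch does not model the hardware's byte-per-clock behavior or its table conversion.

```python
SOI = 0xFFD8
SOS = 0xFFDA

def is_ignored(marker):
    """Segments whose data the parser skips: APPn, JPG, JPGn, and COM."""
    return (0xFFE0 <= marker <= 0xFFEF      # APPn
            or marker == 0xFFC8             # JPG
            or 0xFFF0 <= marker <= 0xFFFD   # JPGn
            or marker == 0xFFFE)            # COM

def parse_segments(data):
    """Return [(marker, payload), ...] for the relevant segments up to SOS."""
    if int.from_bytes(data[0:2], "big") != SOI:
        raise ValueError("stream does not start with SOI")
    pos = 2
    kept = []
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = 0xFF00 | data[pos + 1]
        # The length field includes its own two bytes but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        payload = data[pos + 4:pos + 2 + length]
        pos += 2 + length
        if is_ignored(marker):
            continue  # only the length field was needed
        kept.append((marker, payload))
        if marker == SOS:
            break  # ECS data follows the scan header
    return kept
```

For example, a stream containing an APP0 segment, a DQT segment, a comment, and a scan header yields only the DQT and SOS entries.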

4.4.3

JPEG ECS CODEC


The jcodec module implements ECS coding and decoding. At top level, the following modules form the jcodec module:

dct (dct/idct) module (file dct.v)
zigzag module (file zigzag.v)
store (FIFO) module (file store.v)
quant (Quantizer/Dequantizer) module (file quant.v)
code module (file code.v)
unstuff module (file unstuff.v)
decode module (file decode.v)
regctrl module (file regctrl.v)

The encoding and decoding processes share some modules such as the dct (dct/idct), the quant (quantizer/dequantizer), the zigzag, and the store (FIFO) modules. Figure 4-5 shows a simplified block diagram of the jcodec module.

FIGURE 4-5 jcodec Module Block Diagram
[Block diagram of jcodec.
Internal blocks: dct, zigzag, quant, code, unstuff, store, decode, regctrl
Ports shown: pixin, pixout, enc, dec, addr, din, dout
External memories: DCTRam, ZigRam0, ZigRam1, QMem, HuffEnc, HuffMin, HuffBase, HuffSymb]

4.4.3.1

dct (DCT/IDCT) Module

The dct module performs the DCT and inverse discrete cosine transform (IDCT) on 8 x 8 blocks of image data. The following submodules form the dct module:

addr_gen
mulh
crossh
acch
mulv
crossv
accv
clamp

Both the DCT and IDCT are performed in the same module, where common logic is shared. When computing the DCT, the dct module accepts image data belonging to an 8 x 8 block along the rows of the block. The resulting coefficients are output along the columns. This is because the implementation exploits the fact that the DCT is a separable transform. The image data is transformed first along the rows of an 8 x 8 block and then again along the columns.
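
The separability property can be checked numerically. The following Python sketch is a floating-point reference, not a model of the core's fixed-point mulh/crossh/acch datapath: it computes the 8 x 8 DCT once as a row pass followed by a column pass, and once directly from the two-dimensional definition, so the two results can be compared. The function names are hypothetical.

```python
import math

N = 8

def _scale(k):
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def _basis(n, k):
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct_1d(v):
    """One-dimensional 8-point DCT-II."""
    return [_scale(k) * sum(v[n] * _basis(n, k) for n in range(N))
            for k in range(N)]

def dct_2d_separable(block):
    """Row pass, then column pass, as described in the text."""
    rows = [dct_1d(row) for row in block]
    # cols[u][v] is the coefficient for horizontal frequency u, vertical v.
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]
    return [[cols[u][v] for u in range(N)] for v in range(N)]

def dct_2d_direct(block):
    """Direct evaluation of the two-dimensional definition."""
    return [[_scale(u) * _scale(v) * sum(
                block[y][x] * _basis(x, u) * _basis(y, v)
                for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]
```

The separable form needs two passes of 8-point transforms per block instead of a 64-term sum per coefficient, which is why the intermediate row results are worth buffering in a memory such as DCTRam.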

Processing of rows and columns is overlapped, and intermediate results are stored in the DCTRam memory. In the mulh module, each pixel's data is first multiplied by a constant vector. The resulting vector is passed to the crossh module, which selectively rearranges and duplicates the vector's components. Each vector is accumulated by the acch module and constitutes a partial result of the final DCT or IDCT result. After eight clock cycles, eight pixels have been processed and a row of eight pixels has been transformed. This is stored in the DCTRam. At the same time, row-transformed pixels are fetched and similarly processed by the mulv, crossv, and accv modules, whose activities are controlled by the addr_gen module.

The IDCT is performed in a similar fashion. Additionally, the clamp module performs truncation on the inverse transformed IDCT data to restrict it to an 8-bit value.

4.4.3.2

zigzag Module

The zigzag module is a simple address generator that drives the ZigRams. During encoding, DCT transformed data being output along the columns of an 8 x 8 block is reordered in zigzag order. This reordering is achieved by writing the incoming data in one ZigRam while the other memory is read out of order. Similarly, during decoding, data coming from the decoder in zigzag order is rearranged before being passed to the dct module.

4.4.3.3

quant (Quantizer/Dequantizer) Module

This module performs quantization and dequantization of DCT transformed coefficients. Coefficients are processed in zigzag scan order.

During compression, quantization coefficients are looked up from the QMem memory. These coefficients are stored as reciprocals of the intended quantization values. The reciprocals are represented as 14-bit, non-standard floating-point numbers. All 256 possible values of the reciprocals are listed in the QLook table, as explained in Chapter 5, Programmer's Guide. Each quantization coefficient is multiplied with its corresponding DCT coefficient for every 8 x 8 block. The resulting quantized values are then ready for entropy coding. The 14-bit precision of the reciprocal representation is sufficient to meet the ISO specification requirements as detailed on page 28 of ISO/IEC 10918-1.

During decompression, quantization coefficients are looked up from the QMem memory. Because inverse quantization is now performed, these are stored as 8-bit integers. The quantized value is multiplied by the appropriate quantization coefficient, resulting in a dequantized DCT transformed coefficient. The same multiplier defined in the mul module is used for dequantization.
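
The idea of quantizing with a stored reciprocal, so that one multiplier serves both directions, can be sketched in Python. The core's 14-bit non-standard floating-point format is not described in this section, so the sketch substitutes a plain 16-bit fixed-point reciprocal as a stand-in; the constant SHIFT and the function names are assumptions made for illustration only.

```python
SHIFT = 16  # fixed-point fraction bits; a stand-in, not the core's format

def reciprocal(q):
    """Rounded fixed-point reciprocal of a quantization value (1..255)."""
    return ((1 << SHIFT) + q // 2) // q

def quantize(coef, q):
    """Quantize by multiplying with the reciprocal instead of dividing."""
    magnitude = (abs(coef) * reciprocal(q) + (1 << (SHIFT - 1))) >> SHIFT
    return -magnitude if coef < 0 else magnitude

def dequantize(level, q):
    """Dequantize by multiplying with the integer quantization value."""
    return level * q
```

Both directions reduce to a single multiplication, which is consistent with the text's statement that the multiplier in the mul module is reused for dequantization.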

4.4.3.4

store (FIFO) Module

This module contains a 4-bit deep, 36-bit wide FIFO that is shared by both the encoding and decoding processes. Each word in the FIFO consists of 4 bytes and 4 flag bits (one for each byte in the word). The meaning of the flag bits differs for encoding and decoding and is explained in the appropriate section. The FIFO's purpose is to smooth bandwidth bursts. During both encoding and decoding, up to 32 bits might be produced or required at once. This causes the encoding/decoding pipeline to stall, because the jcodec module I/O interface is only 8 bits wide.

4.4.3.5

code Module

The code module consists of the following modules:

rlen module
cd module
hbit module
stuff module

This module implements the lossless part of the ECS encoding algorithm.

4.4.3.5.1

rlen Module

The rlen module applies the run length encoding algorithm to incoming quantized DCT transformed coefficients. According to the input, this module produces run/length pairs. This module also performs differential DC encoding, taking into account restart marker insertion. Symbols and amplitudes that are produced are then passed to the cd module.

4.4.3.5.2

cd Module

The cd module calculates the address in the HuffEnc memory of symbols and amplitudes passed from the rlen module. The cd module also generates ZRL and EOB codes. ZRL codes are produced each time 16 zero coefficients are encountered. However, to increase compression efficiency, all ZRL codes are dropped and substituted with a single EOB code, provided this leads to greater compression. To achieve this and maintain one symbol per clock encoding, ZRL codes are pushed into a shallow FIFO but flushed out and substituted with an EOB code if a block terminates with a zero value.

In accordance with Sections F.1.2.1.1 and F.1.2.2.1 of ISO/IEC 10918-1, this module processes amplitudes associated with a run length code. Processed amplitudes and Huffman codes retrieved from the HuffEnc table are passed to the hbit module for bit packing. Additional logic in the cd module generates packing instructions for the hbit module whenever code padding with 1s or marker insertion is required.
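
The ZRL and EOB handling described for the cd module can be illustrated with a small Python sketch. It holds back ZRL symbols until a nonzero coefficient confirms they are needed and replaces any trailing ones with a single EOB. For simplicity the sketch pairs each run with the coefficient value itself rather than with a size category and amplitude bits, and it operates on a whole block at once instead of one symbol per clock; the names are hypothetical.

```python
ZRL = (15, 0)  # stands for a run of 16 zero coefficients
EOB = (0, 0)   # end of block: all remaining coefficients are zero

def run_length_ac(ac):
    """Convert 63 zigzag-ordered AC coefficients into (run, value) symbols."""
    symbols = []
    pending_zrl = 0  # ZRL codes held back, like the shallow FIFO in the text
    run = 0
    for coef in ac:
        if coef == 0:
            run += 1
            if run == 16:
                pending_zrl += 1
                run = 0
            continue
        # A nonzero coefficient confirms the held-back ZRL codes.
        symbols.extend([ZRL] * pending_zrl)
        pending_zrl = 0
        symbols.append((run, coef))
        run = 0
    if run > 0 or pending_zrl > 0:
        # The block ends in zeros: pending ZRL codes are dropped for one EOB.
        symbols.append(EOB)
    return symbols
```

A block whose last coefficient is nonzero produces no EOB, while an all-zero AC block produces only an EOB.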

4.4.3.5.3

hbit Module

This module packs incoming amplitude data and Huffman codes into a 32-bit word to be stored by the store module. The hbit module contains a 32-bit buffer that is written to the store module when the buffer is full. Bytes in the 32-bit buffer that have a hex value of $FF but do not belong to a restart marker are flagged before being written to the store module using the extra flag bits previously mentioned.

4.4.3.5.4

stuff Module The stuff module reads the 4 bytes and its associated flags from the store module and outputs 1 byte at a time. Also, by decoding the associated flags, the stuff module determines whether a byte with the hex value of $FF needs to be expanded to $FF00. This occurs when an $FF byte does not belong to a marker. unstuff Module The module is part of the decoding pipeline. Incoming ECS data is stripped of any markers before being packed in a 32-bit word. The resulting 32-bit word is written to the store module. Each incoming marker is discarded and only ECS data is allowed in the FIFO. However, the position of non-restart markers is recorded in the FIFO using the associated 4-bit flags. This is required as stop condition by the decoding process. In fact, a nonrestart marker indicates the end of ECS data.

4.4.3.6
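The byte stuffing performed on the encode side and its removal on the decode side can be sketched in a few lines of Python. This is a software model under stated assumptions, not the core's implementation: the function names are hypothetical, the per-byte flags and the 32-bit packing are not modelled, and the decode helper simply returns the first non-restart marker it meets instead of recording its position in a FIFO. The restart marker range $FFD0 to $FFD7 comes from ISO/IEC 10918-1.

```python
def stuff_bytes(ecs_bytes):
    """Expand every $FF data byte to $FF00 (hypothetical helper)."""
    out = bytearray()
    for byte in ecs_bytes:
        out.append(byte)
        if byte == 0xFF:
            out.append(0x00)
    return bytes(out)


def unstuff_bytes(stream):
    """Strip stuffing and restart markers; stop at any other marker.

    Returns (ecs_data, stop_marker), where stop_marker is the second byte
    of the non-restart marker that ended the data, or None.
    """
    data = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != 0xFF:
            data.append(byte)
            i += 1
        elif i + 1 == len(stream):
            break                      # truncated input, nothing more to decide
        elif stream[i + 1] == 0x00:
            data.append(0xFF)          # stuffed byte: keep $FF, drop $00
            i += 2
        elif stream[i + 1] == 0xFF:
            i += 1                     # fill byte before a marker
        elif 0xD0 <= stream[i + 1] <= 0xD7:
            i += 2                     # restart marker: discarded
        else:
            return bytes(data), stream[i + 1]
    return bytes(data), None
```

Stuffing followed by unstuffing returns the original ECS bytes, which is a convenient sanity check when preparing test vectors.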

4.4.3.7 decode Module

The decode module consists of the following modules:

- dhuff module
- rlex module

The decode module implements the lossless portion of the JPEG baseline decoding algorithm.

4.4.3.7.1 dhuff Module

The dhuff module performs Huffman decoding on incoming ECS data. Each Huffman decoding operation takes one clock cycle. The dhuff module contains buff, a 64-bit buffer that is loaded with ECS data at the beginning of decoding. A bit pointer indicates the position within this buffer and is reset at the start of decoding.

During decoding, the code length is first determined by comparing the word pointed to in the buff buffer with 16 codes coming from the HuffMin memory. HuffMin contains four groups (two for AC and two for DC) of 16 codes. Of these 16 codes, the first is 1 bit long, the second 2 bits, and so on. This, together with the fact that for codes longer than 8 bits all the extra bits are 1, explains the width of the HuffMin memory (1 + 2 + 3 + ... + 7 + 8 + 8 x 8 = 100). Each of the 16 codes in the HuffMin output represents the smallest Huffman code for a given length, which allows the length of the Huffman code to be computed by a parallel comparison.

When the code length is determined, it can be used to determine the position of the symbol sought in the HuffSymb table. Also, after the length is determined, the relevant portion of the Huffman code can be extracted from the buff buffer. The extracted Huffman code is added to a base address retrieved from the HuffBase table to obtain the address into the HuffSymb table. Accessing the HuffSymb table with this address provides either the required run length symbol or the ZRL or EOB code. The length of the Huffman code also enables extracting the amplitude value associated with the run length symbol. This information is passed to the rlex module for run length expansion.

The length of the Huffman code and the length of the associated amplitude are used to update the pointer to the buff buffer. In practice, updating the pointer to the buff buffer is far more complicated, because restart markers must also be considered. Even though the unstuff module strips restart markers off the incoming ECS data, the padding bits added before marker insertion are still attached to the data. This must be taken into account when updating the pointer to the buff buffer so that they can be skipped in the next decoding cycle. When more than 32 bits have been used in the buff buffer, a new 32-bit word is read from the store module so that the buff buffer always has sufficient data to sustain single-cycle decoding.

4.4.3.7.2 rlex Module

The rlex module receives a run length code and the amplitude value from the dhuff module, expands the run length code, and produces quantized DCT transformed coefficients in zigzag order. When an expansion is completed, a new run length code is requested from the dhuff module. The rlex module takes into account the effect of restart markers on differential DC decoding.

4.4.3.8 regctrl Module

The regctrl module contains the private registers of the jcodec module and provides the main control for the ECS data encoding and decoding processes. The regctrl module also generates top bit addresses for accessing the correct Huffman and quantization tables based on the MCU composition.
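The length detection described for the dhuff module in Section 4.4.3.7.1 can be illustrated with a small software model. The Python sketch below is an assumption-laden illustration, not the core's RTL: the function names are hypothetical, the tables are built from the 16 code-length counts of a standard DHT segment, the 16 comparisons are done in a loop rather than in parallel, and the 8-bit truncation of long codes is only reflected in the width calculation.

```python
def huffmin_width():
    """Bits needed for 16 minimum codes when lengths above 8 store only 8 bits."""
    return sum(min(length, 8) for length in range(1, 17))


def build_tables(counts):
    """Build per-length minimum codes and base offsets (hypothetical helper).

    counts[i] is the number of Huffman codes of length i + 1.
    """
    min_code, base = [], []
    code, index = 0, 0
    for n in counts:
        min_code.append(code)       # smallest code of this length
        base.append(index - code)   # code + base gives the symbol table address
        code = (code + n) << 1
        index += n
    return min_code, base


def decode_symbol(bits, min_code, base, symbols):
    """Decode one symbol from a string of '0'/'1' characters.

    The code length is the largest length whose prefix is not below the
    minimum code for that length. Hardware would compare all 16 at once.
    """
    length = 0
    for l in range(1, 17):
        if len(bits) >= l and int(bits[:l], 2) >= min_code[l - 1]:
            length = l
    code = int(bits[:length], 2)
    return symbols[code + base[length - 1]], length
```

With the code-length counts of the typical luminance DC table from ISO/IEC 10918-1 (Annex K), the bit pattern 011 decodes to symbol 2 with length 3, and `huffmin_width()` reproduces the 100-bit word width quoted above.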

4.4.3.8.1 Internal Registers

There are eight private registers in the regctrl module. Registers are accessed via the cpu_din and cpu_dout buses and addressed via the cpu_addr bus. Register 0 is the Start/Stop register.

TABLE 4-3 regctrl Module: Register 0

Register Bit | Field | Description
0 | start/stop | Writing any non-zero value to this register starts the CODEC. Writing 0 aborts operations.

Register 1 contains seven different parameters.

TABLE 4-4 regctrl Module: Register 1

Register Bit | Field | Description
1:0 | Nf | Number of color components in the source image minus 1. For example, in a grayscale image Nf = 0; for an RGB or YUV image Nf = 2.
2 | Re | Enables restart marker processing when set. The ECS Encoder inserts restart markers every NRST + 1 MCU.
3 | De | If set, indicates that the core will decode JPEG or ECS data. If cleared, the core will encode JPEG or ECS data.
5:4 | colspctype | Defines the number of quantization tables to insert in the output stream. 1: Grayscale, 2: YUV, 3: RGB.
7:6 | Ns | Number of components for scan header marker segment.
8 | Hdr | Header generation enabled when set.
15:9 | - | Not used.
31:16 | Ysiz | Number of lines in source image.
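To make the Register 1 layout concrete, the following Python sketch packs the seven fields into a 32-bit word. The bit positions are those read from Table 4-4 (Nf in 1:0, Re in 2, De in 3, colspctype in 5:4, Ns in 7:6, Hdr in 8, Ysiz in 31:16); the function name is hypothetical, and the example value chosen for Ns is an assumption, because the table does not say whether Ns is stored as a count or as a count minus 1.

```python
def pack_register1(nf, re, de, colspctype, ns, hdr, ysiz):
    """Pack Register 1 fields into one word (hypothetical helper).

    Each entry is (value, bit offset, field width), following Table 4-4.
    """
    fields = [
        (nf, 0, 2),
        (re, 2, 1),
        (de, 3, 1),
        (colspctype, 4, 2),
        (ns, 6, 2),
        (hdr, 8, 1),
        (ysiz, 16, 16),
    ]
    word = 0
    for value, offset, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its width")
        word |= value << offset
    return word


# Encode a 480-line YUV image with header generation and no restart markers.
# The ns value here is an illustrative assumption.
reg1 = pack_register1(nf=2, re=0, de=0, colspctype=2, ns=2, hdr=1, ysiz=480)
```

Range checking each field before shifting avoids silently corrupting a neighbouring field, which is an easy mistake when registers are assembled by hand.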

Register 2 contains the NMCU value.

TABLE 4-5 regctrl Module: Register 2

Register Bit | Field | Description
26:0 | NMCU | The number of MCUs minus 1 that the core will encode. This 26-bit value is used for encoding only.

Register 3 contains the NRST value.

TABLE 4-6 regctrl Module: Register 3

Register Bit | Field | Description
15:0 | NRST | The number of MCUs between two restart markers minus 1. The content of this field is ignored if the Re bit in register 1 is not set. During encoding, a restart marker is inserted in the produced ECS data every NRST + 1 MCU. During decoding of ECS data, a restart marker is expected every NRST + 1 MCU.
31:16 | Xsiz | Number of pixels per line.
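The "minus 1" conventions of Registers 2 and 3 are easy to get wrong, so a short worked example may help. The Python sketch below assumes an MCU of four luminance data units plus U and V, which covers 16 x 16 pixels, and assumes that partial MCUs at the image edges are counted as whole MCUs; neither assumption is stated in the tables above. The function names are hypothetical.

```python
def mcu_count(xsiz, ysiz, mcu_width=16, mcu_height=16):
    """Number of MCUs in an image, rounding partial MCUs up (assumption)."""
    cols = -(-xsiz // mcu_width)   # ceiling division
    rows = -(-ysiz // mcu_height)
    return cols * rows


def register2_value(num_mcus):
    """NMCU field: number of MCUs minus 1."""
    return num_mcus - 1


def register3_value(restart_interval, xsiz):
    """NRST (MCUs between restart markers minus 1) in 15:0, Xsiz in 31:16."""
    if not 1 <= restart_interval <= 0x10000 or not 0 <= xsiz <= 0xFFFF:
        raise ValueError("value out of range for its field")
    return (xsiz << 16) | (restart_interval - 1)


# 640 x 480 image, one restart marker per MCU row (40 MCUs).
total = mcu_count(640, 480)
reg2 = register2_value(total)
reg3 = register3_value(40, 640)
```

Here the image holds 1200 MCUs, so NMCU is 1199, and a restart interval of 40 MCUs is programmed as NRST = 39.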

Registers 4-7 are identical. Each register contains the description of one of the color components forming the MCU. Each register contains 8 bits. Registers 4-7 describe the structure of color components 0-3.

TABLE 4-7 regctrl Module: Registers 4-7

Register Bit | Field | Description
0 | HDi | Selects the Huffman table for encoding the DC coefficient.
1 | HAi | Selects the Huffman table for encoding the AC coefficients.
3:2 | Tqi | The index of the quantization table associated with the color component.
7:4 | Nblocki | The number of data units (minus 1) that belong to the color component in the MCU.
11:8 | Hi | Horizontal sampling factor for component i.
15:12 | Vi | Vertical sampling factor for component i.

Chapter 5, "Programmer's Guide," provides more information about how to correctly program the core's internal registers.
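As an illustration of the per-component registers, the Python sketch below packs the fields of Table 4-7 (HDi in bit 0, HAi in bit 1, Tqi in 3:2, Nblocki in 7:4, Hi in 11:8, Vi in 15:12) for an MCU made of four luminance data units plus one U and one V data unit. The function name and the table assignments in the example (table 0 for luminance, table 1 for chrominance) are assumptions made for this example only.

```python
def pack_component(hd, ha, tq, data_units, h, v):
    """Pack one component description word (hypothetical helper).

    data_units is the real number of data units in the MCU; the register
    stores that number minus 1 in the Nblock field.
    """
    fields = [
        (hd, 0, 1),
        (ha, 1, 1),
        (tq, 2, 2),
        (data_units - 1, 4, 4),
        (h, 8, 4),
        (v, 12, 4),
    ]
    word = 0
    for value, offset, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its width")
        word |= value << offset
    return word


# MCU = 4Y + U + V: luminance sampled 2 x 2, chrominance 1 x 1.
reg4_y = pack_component(hd=0, ha=0, tq=0, data_units=4, h=2, v=2)
reg5_u = pack_component(hd=1, ha=1, tq=1, data_units=1, h=1, v=1)
reg6_v = pack_component(hd=1, ha=1, tq=1, data_units=1, h=1, v=1)
```

The U and V components end up with identical words because they share tables and sampling factors in this example.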

4.4.4 Memory Controller

The jpeg_mem_ctrl module implements the memory controller. Because more than one core module can access the core memories, the data, address, and control signals for the memories go through the memory controller. The memory controller is essentially a multiplexer that enables sharing of the memories' inputs and outputs. Specifically, the JPEG Header Generator and JPEG ECS CODEC both read the QMem memory. Similarly, the JPEG Header Parser and JPEG ECS CODEC both access the QMem, HuffBase, and HuffSymb memories.

The memory controller also controls which process programs the core's internal registers. When the jpeg_hparser module is enabled, it has full access to the ECS CODEC's CPU Register interface. All data written to these registers by an external process is ignored, except for the start/stop bit in register 0.

4.4.5 HuffMin RTL Memory

The huffmin_mem_rtl module implements this memory. The Huffman decoder needs 400 bits of storage, one 100-bit word for each table (DC0, AC0, DC1, and AC1). All the bits are needed at the same time by a battery of comparators to detect the incoming code. To achieve single-cycle header decoding, it was necessary to architect this memory internally with flip-flops and combinational logic. The external asynchronous HuffMin memory is required only if header parsing is disabled. In this case, the memory needs to be programmed during core initialization. The core read access interface for HuffMin has been provided.

4.4.6 Q Lookup Table

The qlook module implements this architecture. When encoding an 8 x 8 block of pixel data into ECS, each qtcoeff quantization coefficient that is read from QMem needs to be preprocessed internally within the core before being used by the ECS Encoder module. The reason is that the ECS Encoder uses a 14-bit floating-point representation of the 8-bit qtcoeff reciprocal. This processing is achieved with a simple table look-up.

4.6 CODEC Operation

This section outlines the JPEG CODEC operation.

4.6.1 Register Interface

The core is fully controlled and programmed through the Register Programming interface, whose timing is shown in Figure 4-21.

FIGURE 4-21 Programming Interface Timing Diagram (signals shown: clk, addr, we, din, dout)

Data specified on the din bus is synchronously written to the register indicated by the addr address, provided the we write enable signal is set high. The content of the register indicated by the addr bus is read asynchronously from the dout port.
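The difference between the synchronous write and the asynchronous read can be captured in a very small behavioural model. The Python class below is a sketch for testbench-style reasoning only; the class and method names are hypothetical, and the register count of eight is taken from the regctrl description.

```python
class RegisterFileModel:
    """Behavioural model of the programming interface (hypothetical)."""

    def __init__(self, count=8):
        self.regs = [0] * count

    def clock(self, addr, we, din):
        """Model one rising clock edge: write only when we is high."""
        if we:
            self.regs[addr] = din

    def dout(self, addr):
        """Asynchronous read: reflects the addressed register immediately."""
        return self.regs[addr]


rf = RegisterFileModel()
rf.clock(addr=1, we=True, din=0xAB)    # written on this clock edge
rf.clock(addr=1, we=False, din=0xCD)   # ignored, we is low
value = rf.dout(1)
```

In this model a read never needs a clock edge, which mirrors the statement that dout follows addr asynchronously.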

4.6.2 Encoding Process

This section describes the encoding process and provides related timing diagrams. Based on whether the header processing functionality of the core is enabled, the encoding process compresses 8 x 8 pixel blocks (data units) into either a complete JPEG encoded output stream or only ECS data. Figure 4-22 shows a block diagram of the encoding process.

FIGURE 4-22 Encoding Process Block Diagram (Pixel Data, DCT, Coefficient Quantization, Zig-Zag Run Length Encoding, Huffman Encoding, ECS Data, optional JPEG Header Generator, JPEG Bit Stream)

Before starting the encoding process, both quantization and Huffman encoding tables must be initialized. Furthermore, a description of the MCU must be programmed in the core's internal registers, as mentioned previously. The core uses this information to correctly encode incoming pixel blocks. If header generation is enabled, the DHTMem must be initialized and source image related parameters must be programmed in the core's registers. For detailed information on register and table programming, see Chapter 5, "Programmer's Guide."

The core can be stalled at any time by setting the en signal low, causing the core to suspend the encoding process. Operations resume when the en signal is set high. Any signal output by the core while the en signal is low must be ignored.

Writing any value with the least significant bit set into register 0 starts the encoding process. The encoding process can be stopped at any time by writing any value with the least significant bit reset into register 0. The encoding process starts on the cycle immediately after writing a valid starting value into register 0. However, the core can be started when the en signal is low, enabling the core to start and wait for the first valid pixel (indicated by en going high) to actually begin the encoding process. Timing is shown in Figure 4-23.

FIGURE 4-23 Encoding Process Timing Diagram (signals shown: clk, en, pixin; pixin carries X00, X01, X02, X03, X04, X05 on consecutive clocks)

The core samples the pixin input only when the pixin_req and en signals are high. Pixel transfer to the pixin input can be suspended by setting en low. This action also sets the pixin_req output signal low. However, when setting en low, the entire core is stalled and its outputs are not valid.

Encoding can also be stopped regardless of the en signal's value by writing to register 0, as mentioned previously. This action enables the encoding process to be aborted at any time. Incoming pixels belonging to a data unit (8 x 8 block, see Figure 4-24) are expected in rows.

FIGURE 4-24 8 x 8 Block Format for Input Data (8 x 8 block with corner samples X00, X07, X70, and X77)
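The row-by-row order in which a data unit is presented to the core can be sketched as a small generator. This Python example is only an illustration of the ordering; the function name and the representation of the image as a list of rows are assumptions, and it says nothing about how an external process actually fetches pixels.

```python
def data_unit_pixels(image, top, left):
    """Yield the 64 samples of one 8 x 8 block in row order.

    image is a list of rows; (top, left) is the block's upper-left corner.
    Hypothetical helper for illustration only.
    """
    for row in range(8):
        for col in range(8):
            yield image[top + row][left + col]


# A 16-pixel-wide, 8-line test image whose sample value encodes its position.
image = [[r * 16 + c for c in range(16)] for r in range(8)]
second_block = list(data_unit_pixels(image, top=0, left=8))
```

The second block of the test image starts with the samples at columns 8 to 15 of line 0, followed by the same columns of line 1, and so on.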

This means that input samples must be provided in the order: X00, X01, ..., X07, X10, ..., X70, ..., X77. No pause is necessary between two different blocks, and encoding can proceed at the rate of one pixel per clock. The encoder expects data units in the order specified by the MCU composition. For example, for JFIF images, with MCU = 4Y + U + V, the core expects four luminance data units followed by the U and V data units.

When the core is enabled after a reset, the main controller activates the header generation module, and a valid JPEG header bit stream is output on the enc_data port. Each valid byte is indicated by the enc_data_vld signal. The ECS CODEC module is then activated when the header generation is complete for the current frame being encoded. This completion is indicated by the pixin_req output signal going active high.

Each incoming pixel is level shifted before being transformed with the DCT. The coefficients of the transformed 8 x 8 block are then quantized using the appropriate quantization table. The quantization table is selected based on the color component to which the data unit (8 x 8 block) belongs. The quantized block is then zigzag run length encoded. Finally, the resulting run lengths are encoded using the appropriate Huffman table, producing valid ECSs. A valid ECS byte from the enc_data port is indicated by the enc_data_vld signal. Figure 4-25 shows the timing for encoded output data.

FIGURE 4-25 Encoder Output Timing Diagram (signals shown: clk, enc_data_vld, enc_data carrying JPEG/ECS data)

The core's output JPEG data stream can be stalled only by setting the en signal low and stalling the entire core. In this case, incoming pixels must be stopped as well. The core stops when it has compressed all the MCUs that it has been programmed to process. The end of the encoding process is indicated by the over signal going high.

4.6.3 Decoding Process

This section describes the decoding process and provides related timing diagrams. Depending on the mode of operation, the decoding process can decode either a complete JPEG encoded data stream or an input data stream with only ECSs. In either case, the core decodes the ECS data into valid 8 x 8 pixel blocks (data units). Figure 4-26 shows a block diagram of the decoding process.
FIGURE 4-26 Decoding Process Block Diagram (JPEG Data, JPEG Header Parser, ECS Data, Huffman Decoding, Zig-Zag Run Length Expansion, Coefficient Dequantization, IDCT, Pixel Data)

If decoding with header processing is enabled, the core can take a JPEG encoded bit stream and extract all the table specifications and image parameters to correctly decode the incoming ECS data into 8 x 8 pixel blocks. Because the jpeg_hparser module parses the input JPEG data stream for table specifications, the quantization and Huffman decoder table memories do not need to be programmed. Furthermore, the core's internal registers do not need to be programmed, because the header parser programs all the fields.

If decoding only ECS data, both quantization and Huffman decoding tables must be initialized before starting the decoding process. Furthermore, a description of the MCU must be programmed in the core's internal registers, as mentioned previously. To correctly decode incoming ECS data bytes, the core uses the MCU composition. For detailed information on register and table programming, see Chapter 5, "Programmer's Guide."

The core can be stalled at any time by setting the en signal low, causing the core to suspend the decoding process. Operations resume when the en signal is set high. Any signal output by the core while en is low must be ignored.

Writing any value with the least significant bit set into register 0 starts the decoding process. The decoding process can be stopped at any time by writing any value with the least significant bit reset into register 0. The decoding process starts on the cycle immediately after writing a valid starting value into register 0. The dec_data_req output becomes active high one clock cycle after the en signal is set high. When active high, the core's dec_data_req output indicates that the core requires 8 bits of JPEG/ECS data at the dec_data input. The core captures the requested data at the next clock edge. A request for JPEG/ECS data from the core cannot be ignored, unless the entire core is stalled by setting the en input low.

It is also possible to start the core with the en signal low. This enables the core to start and wait for the external process to be ready (indicated by en going high) to actually begin the decoding process. Timing is shown in Figure 4-27. Because the core can assert the dec_data_req signal at any time, the external process providing the core with JPEG/ECS data should generate the next JPEG/ECS data byte as soon as the requested one is clocked into the core. Having the next data byte ready avoids unnecessary stalling of the core.

FIGURE 4-27 Decoder Input Timing Diagram (signals shown: clk, en, dec_data_req, dec_data carrying JPEG/ECS data)
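The input handshake of the decoder, where a byte is taken only on clock edges at which both en and dec_data_req are high, can be modelled cycle by cycle. The Python sketch below is a simplified behavioural model with hypothetical names; the en and dec_data_req waveforms are supplied as lists by the caller, so it does not predict when the core actually raises dec_data_req.

```python
def feed_decoder(en, dec_data_req, source):
    """Return the (cycle, byte) pairs the core would sample.

    en and dec_data_req hold one 0/1 value per clock cycle. source yields
    the JPEG/ECS bytes in stream order. A byte is consumed only on cycles
    where both signals are high, so the external process must hold the
    next byte ready until then.
    """
    sampled = []
    stream = iter(source)
    for cycle, (enabled, requested) in enumerate(zip(en, dec_data_req)):
        if enabled and requested:
            sampled.append((cycle, next(stream)))
    return sampled


# en is low on cycle 4, so the request on that cycle is not served.
en_wave = [0, 1, 1, 1, 0, 1]
req_wave = [0, 0, 1, 1, 1, 1]
taken = feed_decoder(en_wave, req_wave, b"\xff\xd8\xff")
```

In this example the three bytes are sampled on cycles 2, 3, and 5.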

JPEG/ECS data on the dec_data input is sampled only when the en and dec_data_req signals are both high. Data transfer to the JPEG/ECS input can be suspended by setting the en signal low; however, the entire core is stalled and its outputs are not valid.

ECS data that is input to the core is passed to the Huffman decoder, which can produce a symbol on each clock cycle if necessary. The Huffman decoder uses the appropriate AC or DC table according to the MCU description contained in the core register. The zigzag run length expander then processes generated symbols and produces the quantized DCT transformed coefficients. Each coefficient is dequantized according to the appropriate quantization table. The IDCT is then applied to the resulting block of data. Finally, the IDCT output is level-shifted before being output on the pixout output. Each valid pixel is indicated by an active high pixout_vld signal. Figure 4-28 shows the pixel output timing.
FIGURE 4-28 Pixel Output Timing Diagram (signals shown: clk, en, pixout_vld, pixout; pixout carries X00, X01, X02, X03, X04, X05 on consecutive clocks)

Pixels are output in rows in the same order as expected at the input for encoding (as described previously). Provided that the core is not stalled by the en signal, the core produces a valid pixel per clock without any interruptions or gaps between data units. Occasionally at very low compression ratios (very low quantization coefficients), the pixel output might stall. In this case, pixout_vld goes low.

The core's output can be stalled only by using the en signal, which stalls all of the core's operations. In this case, incoming ECS data must be stopped as well. The core stops decoding ECS data whenever it encounters any non-restart marker. When the core has completed decoding, the pixout_vld signal goes low and the over signal pulses for one cycle.

Frequently Asked Questions

This chapter contains some frequently asked questions and their answers.

1. What are the main features of the JPEG CODEC core?

- Compatibility: 100 percent baseline ISO/IEC 10918-1 JPEG compliant
- 8-bit/channel pixel depths
- Single clock per pixel encoding and decoding
- Support for JPEG header generation and parsing
- Up to four programmable quantization tables
- Fully programmable Huffman tables (two AC and two DC)
- Fully programmable minimum coded unit (MCU)
- Encode/decode support (non-simultaneous)
- Single clock Huffman coding and decoding
- Four-channel interface: Pixel In, Compressed Out, Compressed In, Pixel Out
- Simple external interface
- Stallable design
- Hardware support for restart marker insertion
- Support for single, grayscale component
- Functionality to enable/disable header processing
- Internal Registers interface
- Fully synchronous design
- Available as fully functional and synthesizable Verilog core including testbench

2. What is the throughput of the JPEG core?

- 100 MB/s min. (100 MHz @ byte/clock), 0.13 µm CMOS process
- 20 MB/s min. (20 MHz @ byte/clock), Xilinx Virtex-E

3. Are there quantization tables for luminance and chrominance?

Yes. You can use up to four tables (per the ISO specification). Two tables are normally sufficient, but four tables enable color spaces typically used in color laser printing, such as CMYK.

4. Is the Huffman decoder programmable?

Yes. The Huffman decoder is fully programmable and guarantees single-cycle decoding. The tables for encoding and decoding are not compatible. They could be shared, but the extra logic might be slower depending on the application. An external device must program both encoding and decoding tables.

5. Does 66 MB/s mean 22 Mpixels/sec (24-bit pixels)?

Yes, for 24-bit pixels. In some applications, a pixel can be as small as 1.5 bytes (JFIF images, the standard JPEG images), in which case 66 MB/s means 44 Mpixels/s.
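The arithmetic behind this answer is simple enough to check in a few lines of Python. The sketch below uses hypothetical function names; the 1.5 bytes per pixel figure is derived here for an MCU of four luminance data units plus one U and one V data unit, each of 64 bytes, covering a 16 x 16 pixel area.

```python
def bytes_per_pixel(data_units_per_mcu, mcu_width, mcu_height):
    """Average uncompressed bytes per pixel for a given MCU composition."""
    return data_units_per_mcu * 64 / (mcu_width * mcu_height)


def pixel_rate_mpixels(byte_rate_mb_s, pixel_bytes):
    """Pixel throughput in Mpixels/s for a given uncompressed byte rate."""
    return byte_rate_mb_s / pixel_bytes


rgb_rate = pixel_rate_mpixels(66, 3)                       # 24-bit pixels
jfif_bpp = bytes_per_pixel(6, 16, 16)                      # 4Y + U + V
jfif_rate = pixel_rate_mpixels(66, jfif_bpp)
```

This gives 22 Mpixels/s for 24-bit pixels and 44 Mpixels/s for 1.5 bytes per pixel, matching the figures above.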

6. What's the encoder's output?

The encoder outputs only ECSs as defined in the JPEG specification. This output includes byte stuffing ($00 after an $FF byte) and restart marker insertion, if required. The encoder's output does not include other markers, tables, and so on. These items are quite easy to generate with an external device (that is, a micro-controller might already be part of the design).

7. How big is the JPEG Encoder?

The size of the encoder is approximately 30K gates (without memory, on 0.13 µm).

8. How big is the JPEG CODEC?

The size of the CODEC is approximately 56K gates (without memory, on 0.13 µm).

9. Is your core able to stall at any clock during the output of the coefficient? After the STALL condition is removed, does the external logic need to send the 8 x 8 block from the very beginning, or can it continue from where it left off?

There is an enable pin. When the pin is high, the core is activated and expects pixels at every clock cycle with no gaps; when the pin is low, the core freezes, but when it goes high again the core continues from the point at which it was stopped. You can always stop encoding or decoding and start again. You can also restart the process from the beginning at any time.

10. What are the XV and YV buses?

XV is the input bus, and YH and XV are used for intermediate results to and from the dual-port RAM (now 64 x 15), while YV is the actual output.

11. To which standard does your JPEG CODEC conform?

JPEG ISO/IEC 10918-1.

12. Does the core support 12 bits per pixel instead of 8 bits per pixel?

The core supports only 8 bits, per the baseline specification.

13. Has the core been tested in silicon?

Yes. The core has been tested in silicon in a 0.25 µm ASIC CMOS process and in a Virtex FPGA.

14. How does one test your core? Scan? Do you have functional test vectors?

Synopsys has a testbench with image files and functional test vectors.

15. Do you provide services to integrate your core into my system?

Synopsys does not provide services to integrate the core into your system. However, Synopsys can direct you to a Synopsys-approved design center that might be able to help you.

16. What are the pixel input and output formats for the encoder and decoder?

Both the encoder and decoder accept and output pixels belonging to an 8 x 8 block in rows. Pixels are expected and output in the order: x00, ..., x07, x10, ..., x17, ..., x70, ..., x77.

17. Is it possible to compress and decompress data simultaneously?

No. The reason is to avoid hardware duplication. The full CODEC shares the DCT/IDCT, zigzag, and quantizer/dequantizer. To achieve simultaneous encoding and decoding, separate encoder and decoder blocks are needed, which are larger than a full CODEC but smaller than two CODECs.

18. How can we evaluate your CODEC?

Synopsys has a C program that you can plug into the system for testing, as well as a demo.

19. What does the C emulator do?

The C emulation is only meant to run quickly. It does not hook into a simulator. The emulator reads a set of files (such as register content, tables, and input data stream) and writes an output file (pixels or ECS). The emulator provides a reference point and a way to test large files quickly, which can be valuable for evaluation. It is important to know that the emulator emulates the core, essentially working only on ECS compression and decompression. The emulator does not compress, decompress, or visualize images (these processes would require marker parsing and so on). Some other programs are supplied to that end, but the main purpose remains core emulation.

20. Is MIND input 100 bits?

Yes. The Huffman decoder needs 400 bits of storage, one 100-bit word for each table (DC0, AC0, DC1, and AC1). To detect the incoming code, a battery of comparators requires all the bits at the same time. Synopsys's initial plan was to make this memory internal using flip-flops. Because users have different needs (one user might want only one table, another might want the tables fixed in ROM or combinational logic, or might have access to RAM cells, which are smaller than flip-flops), the memory was made external. Because this required storage is quite small and is local to the CODEC, layout and routing problems are minimal.

21. I am interested in the internal accuracy of the JPEG.

Assuming the internal accuracy of the JPEG refers to the precision of the DCT/IDCT module, Synopsys applied the DCT to an 8 x 8 block and then applied the IDCT to the result. This was done on 1,000,000 blocks. The worst PSNR was 52 dB. A PSNR of 40 dB is sufficient for most applications.
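For readers who want to reproduce this kind of accuracy measurement on their own data, the usual PSNR definition for 8-bit samples can be sketched as follows. The answer above does not state the exact formula that was used, so the peak value of 255 and the function name are assumptions of this example.

```python
import math


def psnr_db(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    Uses the common definition 10 * log10(peak^2 / MSE); this is an
    assumption, not necessarily the formula behind the figures quoted above.
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical blocks
    return 10 * math.log10(peak ** 2 / mse)


# An 8 x 8 block in which every sample is off by one after DCT and IDCT.
example = psnr_db([128] * 64, [129] * 64)
```

An error of exactly one level in every sample corresponds to a PSNR of about 48.1 dB, which gives a feel for how small the error behind a 52 dB worst case is.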
