
DEVELOPMENT OF A RECONFIGURABLE IP STANDARDIZING ENVIRONMENT FOR AXI DECODER

Jeevan Jyoth Kumar Buddhi1, Padmanaban K2, Anand Mahalingam3 1-M.Sc. (Engg.) Student, 2-Professor, Department of EEE, M.S.Ramaiah School of Advanced Studies, Bangalore 560 058 3-Technical Lead, KPIT Cummins Infosystems Ltd, Bangalore-560 012

Abstract
The AMBA (Advanced Microcontroller Bus Architecture) bus model is one of the prominent bus architectures used in System-on-Chip designs. Each bus protocol supports a bounded address region, so blocks that support the protocol must keep their address space within the boundary the protocol supports. To identify the exact region of the address space with which the master needs to communicate, a module is placed between the bus interface and the slave interface. This module is called the ADDRESS DECODER. Address decoders are specific to the bus protocol. In this paper, the design algorithms followed to support the AXI protocol are examined, and an address decoding technique that decodes the valid and exact address of a slave is designed. The master communicates with the slave by generating valid strobes only if neither a slave error nor a decode error exists. The decoded address represents the registers for valid strobes present in the CSR/memory map, and data transactions occur based on the generated strobes. The tools used are Cadence NC-Sim for simulation and RTL Encounter for synthesis. The AXI address decoder has been designed using two parallel FSMs, and the simulations are observed in Cadence NC-Sim. The design supports an interleaving depth of 1, a burst length of 16 and a maximum data width of 1024 bits. The maximum address space handled by the address decoder is 4 KB, as supported by the protocol. The burst address types supported by the design are FIXED, INCREMENT and WRAP. The top-level AXI Address Decoder operates at a maximum frequency of 156.674 MHz, meeting the design specification of the protocol. From the simulation we conclude that the address decoding technique can successfully identify the slave address in the AXI protocol under the design specifications.
Keywords: AXI protocol, AMBA, SoC, IP, FPGA

Nomenclature
MHz Mega Hertz

Abbreviations
AHB Advanced High-performance Bus
AMBA Advanced Microcontroller Bus Architecture
APB Advanced Peripheral Bus
ARM Advanced RISC Machines
AXI Advanced Extensible Interface
FIFO First In First Out
PERL Practical Extraction and Report Language
RISC Reduced Instruction Set Computer
SoC System on Chip
XML Extensible Markup Language

1. INTRODUCTION
The AMBA system is needed because of its stability and its higher probability of first-silicon success. The features and merits of the protocol have made communication between the core and the major blocks of SoCs easy. However, the design of blocks that support the protocol is quite complex, since they have to support the features of the AXI protocol [1]. For a slave to support a transaction initiated by a master that supports the protocol, the slave interface must also support the AXI protocol. Each bus protocol supports a bounded address region, so blocks that support the protocol must keep their address space/region within the boundary the protocol supports.

To identify the exact region of the address space with which the master needs to communicate, a module is placed between the bus interface and the slave interface. This module is called the ADDRESS DECODER. Address decoders are specific to the bus protocol. The address decoder decodes the valid and exact address at which the master and slave want to communicate and, for valid reads/writes, generates the valid strobes for the transfers [2].

Figure 1 Typical SoC Model [1]


Figure 1 represents a complex SoC architecture that uses ARM's AMBA protocols AHB/APB. With the present technology, the blocks in Figure 1 communicate with each other using a number of available bus architectures, of which the most popular are the ARM-based AMBA (Advanced Microcontroller Bus Architecture) bus protocols [2]. Almost all SoC applications use the AHB protocol to maintain performance. For better system throughput and latency, the SoC has to apply the AXI protocol.

SAS TECH Journal, Volume 10, Issue 1, May 2011

2. PROBLEM DEFINITION
The design and verification of a reconfigurable IP standardizing environment for an address decoder supporting the Advanced Extensible Interface targets the AMBA 3.0 specification from ARM. Verilog is used to develop the RTL for the AMBA AXI Address Decoder. The Address Decoder consists of buffers for the Write Address, Read Address, Read Data and Write Data channels, and FSMs for read and write. The read and write FSMs are developed individually, as reads and writes are independent in AXI. The module is intended for high-bandwidth, low-latency designs. The test bench to verify the top level is developed in Verilog after the individual blocks of the architecture have been verified. The verified design is made reconfigurable by parameterising the specifications of the protocol. The standardizing environment for the developed IP is built using a PERL script, which generates the same logic for different IP frames provided as input. The tools used for these steps are Cadence NC-Sim for functional verification and Cadence RTL Encounter for synthesis of the module.

3. METHODOLOGY
Sangkwon Na, Sung Yang and Chong-Min Kyung [1] describe a bus power model of the AMBA AXI bus architecture. The power model consists of switching activity and load capacitance. The synthesis method adopted for the architecture deals only with the slave components rather than the whole bus. Of all the bus architectures, the authors selected AMBA AXI, which makes the architecture flexible and suitable for implementing a low-power algorithm. The merits of the approach are a 20% reduction in bus power consumption and distributed communication traffic. The drawback is that the area is larger than that of a fully connected bus matrix. The approach is targeted at applications where power is of major concern, such as automotive applications.

Jong-Eun Lee et al. [3] describe a system-level evaluation technique that extends transaction-level modelling to the realm of system-level performance evaluation. Interleaving and FIFO merging are also presented for the memory controller to enhance system performance. The proposed technique develops the concept of worst-case scenarios. Since the memory controller is often an important component that critically affects system performance and thus needs optimization, the paper further addresses how to evaluate and optimize memory controllers, focusing on the test environment and the methodology. The merit of the approach lies in using the TLM technique at the BCA modelling level for the highest accuracy with the interconnect architectures. To reduce the high modelling effort of BCA modelling, the concept of worst-case scenarios for modelling system-level architectures is used. The drawback of the approach is that the modelling effort remains high even though BCA modelling is chosen for the system-level architecture evaluation. The approach is used in SoC applications where the modelling is purely TLM based or ESL based.

The AMBA AXI 3.0 specification [4] defines technology-independent standard bus protocol methodologies used for easy integration of IPs within any complex system or System-on-Chip application. AMBA stands for Advanced Microcontroller Bus Architecture. The AMBA AXI protocol is targeted at high-performance, high-frequency system designs and includes a number of features that make it suitable for high-speed submicron interconnects. The AMBA 3.0 specification covers a high-performance system bus supporting a high-bandwidth interface between elements, with technology independence and with the associated electrical characteristics and timing specifications. The merits of the protocol are that it supports multiple outstanding transactions, out-of-order transaction completion, and high-performance bus transactions with data widths of 8 to 1024 bits and a maximum of 16 transfers. The main drawback of the protocol is that it requires bridges to communicate with low-level peripherals, which adds extra hardware. A further drawback is inefficient usage of the high bandwidth, as it supports independent channels. It is targeted at SoC applications where the on-chip data transfer rate is high.

Chung Lai et al. [5] describe interleaving in a NoC (Network on Chip) employing the AXI protocol. Data is transmitted by selecting at least one buffer based on the interleaving capability of the slave, giving a NoC system and an interleaving method capable of smoothly transmitting data according to the interleaving acceptance capability of an IP when the AXI protocol is applied to the NoC.

FIFO merging and data interleaving have been adopted to improve the memory controller and thereby enhance system performance. The interleaving concept is implemented by adding buffers whose depth is based purely on the capability of the AXI slave and master, and the data is sent to the address based on the interleaving depth. The ordering of the transaction requests is stored in the buffers, but the data is sent directly. The address decoder generates addresses specific to the IP's register set and does not handle error signals.

4. AMBA AXI MODEL AND STANDARDIZING MODEL


AMBA defines a multilevel busing system, with a high-performance system bus and a lower-level peripheral bus. AMBA has five variants of buses, as listed below:
1. AMBA AXI4: AMBA Advanced Extensible Interface
2. AMBA AXI: AMBA Advanced Extensible Interface
3. AMBA AHB: AMBA Advanced High Performance Bus
4. AMBA ASB: AMBA Advanced System Bus
5. AMBA APB: AMBA Advanced Peripheral Bus

Of these five variants, AXI4 is the most recent release by ARM Ltd, offered as an IP core in association with Xilinx. SoCs implementing the AXI4 protocol are still under development.

4.1 Migrating from AHB to AXI
Modern systems on chip, including multicore clusters, graphics controllers and other sophisticated peripherals, make the system fabric a critical performance bottleneck. The AHB (Advanced High Performance Bus), even in its multilayer configuration, cannot meet the demands of today's SoCs. The reasons to migrate from AHB to AXI include [3]:
- AHB is a transfer-based protocol. For each data item written to or read from a selected slave, an address is given for each transfer.
- All transfers are initiated by the master. The master stalls when the slave does not respond immediately.
- Each master can have only one outstanding transaction.
- Sequential accesses/bursts consist of consecutive transfers that specify their relationship by asserting HTRANS/HBURST accordingly.
- Although AHB systems are multiplexed and have independent read and write data buses, they cannot operate in full-duplex mode.
- In AHB-Lite the point-to-point concept is an afterthought, but in AXI it is the central focus.

AXI is largely a burst/transaction-based protocol. For every transaction, the address and control information on the address channel describes the nature of the data to be transferred. The data between master and slave is transferred by the read and write data channels, as shown in Figures 2 and 3 respectively. For write transactions, the AXI protocol has an additional write response channel that allows the slave to signal completion of the write transaction to the master. The protocol supports:
- Issuing address information before the actual data transfer
- Multiple outstanding transactions
- Out-of-order completion of transactions

4.2 AMBA AXI Architecture
The AMBA AXI protocol is targeted at high-performance, high-frequency system designs and includes a number of features that make it suitable for high-speed submicron interconnects [3]. The objectives of the latest generation AMBA interface are:
- To be suitable for high-bandwidth and low-latency designs
- To enable high-frequency operation without using complex bridges
- To meet the interface requirements of a wide range of components
- To provide flexibility in the implementation of interconnect architectures
- To be compatible with existing AHB and APB interfaces

The key features of the AXI protocol are:
- Separate address/control and data phases
- Support for unaligned data transfers using byte strobes
- Burst-based transactions with only the start address issued
- Separate channels for read and write data to enable low-cost Direct Memory Access (DMA)
- Support for out-of-order transaction completion
- Easy addition of register stages to provide timing closure

Figure 2 Channel for Read Transactions [3]

Figure 3 Channel for Write Transactions [3]
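The channel behaviour described above rests on the VALID/READY handshake that the paper later names as the trigger for its state transitions. The following is a minimal Python sketch of that rule, not the authors' RTL; the function name and the sample waveforms are illustrative assumptions.

```python
def transfers(valid, ready):
    """Return the cycle indices in which a channel transfer occurs.

    valid and ready are per-cycle 0/1 samples of a channel's VALID and
    READY signals; a transfer happens only when both are high.
    """
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


# Illustrative write-address channel: the master holds AWVALID high from
# cycle 1 and the slave raises AWREADY in cycle 3, so the address is
# accepted in cycle 3 only.
awvalid = [0, 1, 1, 1, 0]
awready = [0, 0, 0, 1, 1]
print(transfers(awvalid, awready))  # [3]
```

Because each of the address, data and response channels applies this rule independently, reads and writes can proceed in parallel.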


4.3 IP Standardizing Environment
The exact meaning of IP standardizing differs with context. It might be:
1. Validating the developed IP to meet some of the standard requirements
2. Developing an environment to validate the modelled IP for its functionality and its configurability for plug and play
3. Certifying, through a standards organization, that the IP has met the requirements

The first is followed by almost all industries, where the development and the validation against the standard requirements are carried out. The third is also followed by most industries, where the developed IP is sent to a standards organization to obtain the mark of standards [4]. The second is followed by the few companies that have their own tool environment for standardizing, i.e. for validating the modelled IP. A team from Synopsys has built a tool environment for IP standardizing and automation, i.e. to implement the re-use concept. The team formed a group called the SPIRIT Consortium and developed the environment named IP-XACT. IP-XACT is a standard that defines a standard way of describing and handling multi-sourced IP components. The standardization effort in the Consortium is driven by a coalition of leading semiconductor and EDA companies such as TI, Synopsys, LSI, SI, NXP and Cisco [4]. The environment is a powerful tool whose features are:
- Ensures the delivery of compatible component descriptions from multiple component vendors
- Enables the interchange of complex component libraries between EDA tools for SoC design
- Enables the provision of EDA vendor-neutral scripts for component creation and configuration
- Enables automated design integration and configuration within multi-vendor design flows

Figure 4 Automated End to End Flow in IPXACT [6]

Figure 4 represents the automated end-to-end flow in the IP-XACT environment. The legacy metadata in the figure represents the formats, metadata and content that IP-XACT supports. The formats include:
- Document (Word, FrameMaker etc.)
- Spreadsheet (Excel)
- PDF, DocBook XML
- Text (CSV)
- XML etc.

The metadata comprises three kinds of information:
- Informal (natural language)
- Formal (XML)
- Semi-formal (Excel)

The content of this metadata, or of the documents in the various formats, consists of:
- Ports
- Registers
- IOs

The legacy metadata is migrated to the formal, design-specified IP-XACT format. The migration strategy is analysed with respect to factors such as:
- Level of standardization of the documents
- Their alignment to the current standardization
- Manual vs. automated strategy
- Technologies/tools to use etc.

The output generated from the IP-XACT format varies across different formats and different levels of abstraction.

Figure 5 Sample Migration from Metadata to IP-XACT Output [7]

Figure 5 represents a sample migration of the metadata to automated formats. The input to the IP-XACT environment is the migrated format information, e.g. Word (.doc), Excel or CSV, where the information is saved in the formal format of IP-XACT, i.e. XML. This XML data is extracted by the RUBY script of the environment and stored as textual output in a format supported by IP-XACT [8]. This format is then imported into the Bitwise tool of the environment, which possesses smart templates, performs various coherency checks and produces outputs in various forms including Specman, System Verilog, Verilog and documentation (RTF).
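The step from formal XML metadata to generated HDL can be pictured with a small sketch. It is written in Python rather than the RUBY or PERL scripts the paper mentions, and the XML layout, the element and attribute names, and the parameter values are hypothetical simplifications, not the IP-XACT schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata; real IP-XACT documents use a much
# richer, namespaced schema.
SPEC = """
<ip name="axi_decoder">
  <parameter name="DATA_WIDTH" value="64"/>
  <parameter name="ADDR_WIDTH" value="12"/>
</ip>
"""


def verilog_parameters(xml_text):
    """Turn each <parameter> element into a Verilog parameter declaration."""
    root = ET.fromstring(xml_text.strip())
    return [
        f"parameter {p.get('name')} = {p.get('value')};"
        for p in root.iter("parameter")
    ]


for line in verilog_parameters(SPEC):
    print(line)
```

Keeping the widths in the metadata rather than in the RTL is what lets one script emit the same logic for different IP descriptions.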

5. DESIGN OF AXI ADDRESS DECODER


The design specifications of the ARM AMBA 3.0 specification are used to model the AMBA AXI bus architecture. Table 1 defines the specifications used for modelling the AMBA AXI address decoder in Verilog.


Table 1 lists the design specifications associated with the AMBA 3.0 specification. The Address Region is the maximum region/space within an IP in which the AXI address decoder can decode addresses, i.e. the region supported by the AXI protocol for writes/reads to be processed correctly.
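A minimal Python sketch of the region check implied by this specification follows. The response encodings OKAY (0b00), SLVERR (0b10) and DECERR (0b11) are those of the AXI specification; the function name, the base address and the choice to signal an out-of-region access with DECERR are assumptions made for illustration, not the authors' RTL.

```python
OKAY, SLVERR, DECERR = 0b00, 0b10, 0b11  # AXI response encodings

REGION_LIMIT = 4 * 1024  # largest address region an IP may occupy, in bytes


def decode(addr, base, size):
    """Return (response, offset) for an access to an IP region.

    The offset into the IP's register map is returned only when the
    address falls inside [base, base + size); otherwise a decode error
    is signalled and no strobes should be generated.
    """
    if not 0 < size <= REGION_LIMIT:
        raise ValueError("region size must be between 1 byte and 4 KB")
    if base <= addr < base + size:
        return OKAY, addr - base
    return DECERR, None


# Hypothetical IP mapped at 0x4000_0000 with a full 4 KB region.
print(decode(0x4000_0010, 0x4000_0000, 0x1000))  # (0, 16)
print(decode(0x4000_1000, 0x4000_0000, 0x1000))  # (3, None)
```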

The signals from the master on the write and read address channels are given to the write and read address buffers, respectively, in the decoder. The signals from the write response channel are taken directly without a buffer, as the write response is controllable for the writes that have occurred. The data from these buffers is given to the parallel write and read FSM block, as shown in Figure 6. This FSM block contains two parallel FSMs, for reads and writes, along with the decoding logic. The buffers and the FSM block control each other in generating the valid read/write strobes for the decoded address.

Table 1 Design Specifications for AXI Address Decoder [9]


Sl. No | Specification | Description
1 | Address Region | Supports a maximum address space of [<4 kB] in an IP. Supports an address boundary of the AXI protocol of [<=4 kB].
2 | Features of AXI Protocol | Supports features such as data interleaving, unaligned transfers and independent channels for read and write.
3 | Reconfigurable Depth | Supports a reconfigurable depth of data transfers for the buffers, depending upon the length of the transaction [Max 16].
4 | Reconfigurable Width | Supports a reconfigurable width of the data bus, depending on the maximum bus width supported by the SoC [8-1024 bits].
5 | Clock Frequency | Supports the maximum clock frequency supported by the protocol, i.e. 160 MHz.
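The reconfigurable limits of Table 1 can be captured as a checked parameter set. The sketch below is one possible arrangement in Python, not the authors' Verilog parameters; the class and field names are hypothetical, the 64-bit default follows the width reported in the summary, and the power-of-two restriction on the data width comes from the AXI specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    data_width: int = 64      # data bus width in bits (8 to 1024)
    burst_depth: int = 16     # data buffer depth in transfers (max 16)
    region_bytes: int = 4096  # address region of the IP (at most 4 KB)

    def __post_init__(self):
        w = self.data_width
        if not (8 <= w <= 1024 and w & (w - 1) == 0):
            raise ValueError("data_width must be a power of two in 8..1024")
        if not 1 <= self.burst_depth <= 16:
            raise ValueError("burst_depth must be in 1..16")
        if not 0 < self.region_bytes <= 4096:
            raise ValueError("region_bytes must be at most 4 KB")

    @property
    def strobe_width(self):
        """One strobe bit per byte lane of the data bus."""
        return self.data_width // 8


print(DecoderConfig().strobe_width)                 # 8
print(DecoderConfig(data_width=1024).strobe_width)  # 128
```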

Figure 7 Write State Machine


Figure 7 represents the Write State Machine. The Write State Machine has four states, of which IDLE_STATE initializes all the signals. In W_STATE_I the ID fields are checked. In W_STATE_II the data is assigned and the strobes are generated, based on the address supported by the protocol. In W_STATE_III the response of the write transaction is assigned, based on the check of the ID fields. The state transitions are driven by the handshake signals READY and VALID.
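The four-state sequence can be sketched as a small Python model. The state names are taken from the text; the transition rule (advance one state per completed VALID/READY handshake, returning to IDLE_STATE after W_STATE_III) is an assumption, since the exact conditions are given only in Figure 7.

```python
from enum import Enum


class WState(Enum):
    IDLE_STATE = 0   # initialise all signals
    W_STATE_I = 1    # check the ID fields
    W_STATE_II = 2   # assign data and generate strobes
    W_STATE_III = 3  # assign the write response


ORDER = [WState.IDLE_STATE, WState.W_STATE_I, WState.W_STATE_II, WState.W_STATE_III]


def next_write_state(state, valid, ready):
    """Advance to the next state only when the handshake completes (assumed rule)."""
    if not (valid and ready):
        return state
    return ORDER[(ORDER.index(state) + 1) % len(ORDER)]


state = WState.IDLE_STATE
for valid, ready in [(1, 0), (1, 1), (1, 1), (1, 1), (1, 1)]:
    state = next_write_state(state, valid, ready)
    print(state.name)
```

The first sample stalls in IDLE_STATE because READY is low; the remaining four handshakes walk through the three working states and back to IDLE_STATE.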

5.1 Top-Level Design of AXI Address Decoder
Figure 6 represents the top-level block diagram of the AXI Address Decoder. The Address Decoder contains five blocks:
- Write Address Buffer
- Write Data Buffer
- Read Address Buffer
- Read Data Buffer
- FSM

Figure 6 Top-level Block Diagram of AXI Address Decoder

Figure 8 Read State Machine


Figure 8 represents the Read State Machine. The Read State Machine has three states, of which IDLE_STATE initializes all the signals. In R_STATE_I the ID fields are checked. In R_STATE_II the data is assigned and the strobes are generated, based on the address supported by the protocol, and the response of the read transaction is assigned, based on the check of the ID fields. The state transitions are driven by the handshake signals READY and VALID.

The simulation results of the top level are discussed in the results section. The inputs of the AXI address decoder design are declared as registers in the test bench, and the outputs of the design are declared as wires. Stimulus is applied to the registers of the test bench to verify the functionality of the design. Prior to verification of the top-level Address Decoder, the building blocks of the design are verified for functionality using Verilog test bench based verification.

6. VERIFICATION OF AXI DECODER


Figure 9 represents the HDL verification flow using test benches. The design is instantiated in the test bench, stimulus is applied to the inputs, and the outputs are monitored for the desired results. Each feature of the design is tested to ensure that unexpected bugs have not been introduced into the design. This means testing the specific features designed into the DUT, one at a time, and confirming that none of them malfunctions. A test bench is the HDL code that verifies a module. It must:
- Apply input vectors to the module inputs
- Check the module outputs
- Report errors to the user
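The three test bench duties listed above can be sketched in a few lines of Python. This is an illustration of the flow only, not the authors' Verilog test bench; the `handshake` function is a toy stand-in for a design under test, and all names are hypothetical.

```python
def run_testbench(dut, vectors):
    """Apply each input vector to dut, compare with the expected output,
    and return a list of error messages (empty when every check passes)."""
    errors = []
    for inputs, expected in vectors:
        actual = dut(*inputs)
        if actual != expected:
            errors.append(f"inputs={inputs}: expected {expected}, got {actual}")
    return errors


def handshake(valid, ready):
    """Toy stand-in for a design under test: transfer when both are high."""
    return int(valid and ready)


vectors = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0), ((1, 1), 1)]
print(run_testbench(handshake, vectors))  # []
```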

7. RESULTS AND DISCUSSIONS


The verification of the top level is carried out after verifying all the basic building blocks, which include the address and data buffers for write and read and the parallel FSM block. The simulation results of the basic building blocks have been described in Section 6. The test cases used to verify the top-level AXI address decoder are:
- Reset Condition Check
- Ready Generation
- Valid Generation of Read Data Channel
- Write Data Transaction
- Read Data Transaction
- Strobe Generation
- Address Calculation
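The Address Calculation test case depends on the burst address equations of the AXI specification for the FIXED, INCR and WRAP types. The sketch below restates those equations in Python as a reference model; it is not the authors' RTL, and the function and argument names are illustrative. Here `size_bytes` is the number of bytes per transfer and `length` is the number of transfers in the burst.

```python
def burst_addresses(start, size_bytes, length, burst):
    """Return the address of every transfer in an AXI burst."""
    if not 1 <= length <= 16:
        raise ValueError("burst length must be in 1..16")
    aligned = (start // size_bytes) * size_bytes
    if burst == "FIXED":
        # Every transfer targets the same address.
        return [start] * length
    if burst == "INCR":
        # First transfer may be unaligned; later ones are aligned.
        return [start] + [aligned + n * size_bytes for n in range(1, length)]
    if burst == "WRAP":
        if start != aligned or length not in (2, 4, 8, 16):
            raise ValueError("WRAP needs an aligned start and length 2, 4, 8 or 16")
        total = size_bytes * length
        lower = (start // total) * total  # wrap boundary
        addresses, addr = [], start
        for _ in range(length):
            addresses.append(addr)
            addr += size_bytes
            if addr == lower + total:
                addr = lower  # wrap back to the boundary
        return addresses
    raise ValueError(f"unknown burst type: {burst}")


print([hex(a) for a in burst_addresses(0x1008, 4, 4, "WRAP")])
# ['0x1008', '0x100c', '0x1000', '0x1004']
```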

Figure 9 HDL Verification flow using Test Benches [10]


Figure 10 represents the block diagram of the test bench for verifying the AXI Address Decoder. The DUT is the design under test, i.e. the AXI Address Decoder architecture. The test bench applies stimulus to the design, in the form of test cases, to verify the functionality of the design.

These test cases verify the top level for individual channel functionality as well as for parallel operation. Depending on the address given by the master, the write and read transactions occupy the location identified once the address decoder has decoded it. The strobe generation is carried out in the FSM block by address calculations. The address calculations are based on the control signals of the address channels.

7.1 Reset Condition Check
This test case checks the state of all the output signals when the reset signal is asserted. The reset is active at the negative edge and is applied at the outset with respect to the positive edge of the clock. As the reset signal is used to bring the whole system to a known or initial state, this case checks the outputs against their initial values.

Figure 10 Block Diagram of Test Bench for AXI Address Decoder


Verification of the individual blocks has been carried out using test bench based verification set up.

Figure 11 Simulations for the Reset condition for the AXI Address Decoder


Figure 11 represents the simulations for the reset condition, which sets all the outputs to their initial values. It is clearly observed from Figure 11 that the top-level outputs RDATA, BVALID, RRESP, BRESP, RVALID and W/RSTRB are 0. The ready signals of all the channels except the read channel are high.

7.2 Write Data Transaction
In this case we test write transactions to the slave, based on the address given by the master, with the same ID for both address and data. The strobes given by the master indicate the byte lanes occupied by the data to be transferred to the slave. The write data is given along with the strobes, so this case also covers strobe generation.
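Each WSTRB bit flags one byte lane of the write data bus, so a strobe value can be read as a list of lanes. The Python sketch below decodes a strobe and derives the first-transfer strobe from the start address using the byte-lane equations of the AXI specification. It is an illustrative model under the assumption of a 64-bit bus (the width reported in the summary), and the function names are hypothetical.

```python
def active_byte_lanes(wstrb, data_width_bits=64):
    """List the byte lanes whose strobe bit is set."""
    lanes = data_width_bits // 8
    if not 0 <= wstrb < (1 << lanes):
        raise ValueError("strobe does not fit the data bus")
    return [lane for lane in range(lanes) if (wstrb >> lane) & 1]


def first_transfer_strobe(start, size_bytes, data_width_bits=64):
    """Strobe for the first transfer of a burst starting at start."""
    bus_bytes = data_width_bits // 8
    aligned = (start // size_bytes) * size_bytes
    lower = start % bus_bytes
    upper = aligned + (size_bytes - 1) - (start // bus_bytes) * bus_bytes
    return sum(1 << lane for lane in range(lower, upper + 1))


print(active_byte_lanes(3))  # [0, 1]
print(active_byte_lanes(5))  # [0, 2]
print(bin(first_transfer_strobe(0x1002, 4)))  # 0b1100
```

The values 3 and 5 are the WSTRB values applied in the write simulation of Figure 12; the last line shows an unaligned 4-byte transfer that uses only lanes 2 and 3.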

Table 2 reports the utilization of slices, slice FFs, LUTs, IOBs and GCLKs on the Xilinx Virtex-4 FPGA.

8. SUMMARY
The AMBA AXI Address Decoder has been modelled in Verilog to meet the specifications of AMBA 3.0. The top-level architecture is modelled using five building blocks: buffers for the address channel signals of both read and write, buffers for the data channel signals of both read and write, and an FSM block that contains parallel write and read FSMs along with control logic. The address buffers are designed with a depth of one, supporting an interleaving depth of one, and the data buffers are designed with a depth of 16, supporting the maximum number of transfers in a transaction. The write FSM has four states controlling strobe generation, data transfer to the output and response signalling for writes. The read FSM has three states controlling strobe generation, data transfer to the output and response signalling for reads. The verification aspects and the functionality of each basic building block of the top-level AXI Address Decoder architecture are described with their simulation results. Each building block is verified against the test cases defined for its functionality. The functional verification methodology is test bench based. The test cases include the reset condition along with other test cases that verify the functionality of the blocks. Both linear and random test benches are used. The functional verification of the top-level AXI Address Decoder has been carried out. The verification results show that the design supports the protocol's feature of independent reads and writes. The maximum burst length supported by the design is 16. The burst types supported are FIXED, WRAP and INCR. The supported data bus width ranges from 8 to 1024 bits, which is why the width is made a parameter. The design supports an interleaving depth of one. The generic strobe generation is calculated from the address of the first transfer and the control signals of the address channel.
The read and write address buffers are designed with a depth of one because the capability of the slave is taken as one for implementing the interleaving concept. The data bus width of the design is taken as 64 bits, within the 8 to 1024 bits supported by the protocol. The functionality of the top level has been simulated and verified, and the observed operating frequency is 156.674 MHz. The control logic for the read and write operations is developed in the form of two parallel FSMs. The reconfigurability of the design is obtained by parameterising the widths. The standardizing environment has been developed in PERL to obtain the Verilog code of the design from an applied XML script. From the simulation we conclude that the address decoding technique can successfully identify the slave address in the AXI protocol under the design specifications.

Figure 12 Simulations for Write Data Transaction along with Strobe Generation
Figure 12 represents the simulations for a write data transaction along with the strobe generation and the write response OKAY, indicated by 0. The master gives the data while asserting WVALID high, with WSTRB values of 3 and 5 indicating the valid byte lanes. The outputs are the strobes that are valid based on the address decoding and the output data for the corresponding address given by the master, with WID and AWID identified as the same. The response from the slave is 0, indicating OKAY for the data written.

7.3 Resource Utilization on FPGA
Table 2 represents the resource utilization summary of the AXI address decoder design implemented on the Xilinx Virtex-4 FPGA.

Table 2 Design summary of the AXI Decoder synthesized on Virtex-4 FPGA

Logic Utilization | Used | Available | Utilization
Number of Slices | 2244 | 89088 | 2%
Number of Slice FFs | 2624 | 178176 | 1%
Number of 4 input LUTs | 3342 | 178176 | 1%
Number of bonded IOBs | 438 | 960 | 45%
Number of GCLKs | 2 | 32 | 6%
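The Utilization column of Table 2 follows from the Used and Available columns. The short Python check below reproduces it; the reported percentages match integer truncation rather than rounding (for example, 2244 of 89088 slices is about 2.5%, listed as 2%). The dictionary and function names are illustrative.

```python
# (used, available) pairs copied from Table 2.
TABLE_2 = {
    "Number of Slices": (2244, 89088),
    "Number of Slice FFs": (2624, 178176),
    "Number of 4 input LUTs": (3342, 178176),
    "Number of bonded IOBs": (438, 960),
    "Number of GCLKs": (2, 32),
}


def utilization_percent(used, available):
    """Percentage of a resource in use, truncated to a whole number."""
    return 100 * used // available


for name, (used, available) in TABLE_2.items():
    print(f"{name}: {utilization_percent(used, available)}%")
```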


9. REFERENCES
[1] Sangkwon Na, Sung Yang and Chong Min Kyung, "Low Power Bus Architecture Composition for AMBA AXI", Journal of Semiconductor Technology and Science, Vol. 9, No. 2, pp. 75-79, June 2009
[2] Maxime Pelcat et al., "An Open Framework for Rapid Prototyping of Signal Processing Applications", EURASIP Journal on Embedded Systems, Vol. 4, June 2009
[3] Jong-Eun Lee et al., "System Level Architecture Evaluation and Optimization: an Industrial Case Study with AMBA3 AXI", Journal of Semiconductor Technology and Science, Vol. 5, No. 4, pp. 229-236, December 2005
[4] ARM Limited, "AMBA AXI Protocol v 1.0 Specification", IHI 0022B, 2003, 2004
[5] Chung-Hung Lai et al., "An Embedded AXI Bus Tracer with Dynamic Multi-Resolution and Real Time Compression", Diagnostics Services in Network-on-Chips, DAC Workshop, 4th Edition, June 13, 2010
[6] Liang-Bi Chen et al., "AXI Checker: An AMBA AXI On-Chip Bus Protocol Checker with an Efficient Verification Mechanism", Diagnostics Services in Network-on-Chips, DAC Workshop, 4th Edition, June 13, 2010
[7] Andy D. Pimentel et al., "Tool Integration and Interoperability Challenges of a System-Level Design Flow: Case Study", SAMOS, LNCS 5114, pp. 167-176, Sept 2008
[8] Judy Gehman, Ft. Collins, LSI Corporation, "Method for Request Transaction Ordering in OCP Bus to AXI Bus Bridge Design", Nov 25, 2008
[9] The SPIRIT Consortium, "Verilog to IP-XACT Conversion Quick User Guide", QUG-102 v1.1, May 2008
[10] Yu-Jung Huang, Ching-Mai Ko and Hsien-Chiao Teng, "Design and Performance Analysis of A Reconfigurable Arbiter", WSEAS Transactions on Electronics, Issue 4, Vol. 5, April 2008
