
Extending the Transaction Level Modeling Approach for Fast Communication Architecture Exploration

Sudeep Pasricha, Nikil Dutt
{sudeep,dutt}@cecs.uci.edu
Center for Embedded Computer Systems
University of California, Irvine

Mohamed Ben-Romdhane
m.benromdhane@conexant.com
Conexant Systems Inc
Newport Beach, CA
Outline
► Motivation

► Related Work
► CCATB Modeling Abstraction
► Exploration Studies
► Conclusion
SoC Communication
Communication between IPs in complex SoC designs significantly affects system performance!
Communication Architectures
► Several bus-based communication architectures are commonly used in SoC designs
  • AMBA (2.0, 3.0)
  • IBM CoreConnect
  • Wishbone
► Key features
  • High-performance system bus
    ► processors, memory, DMA, etc.
  • Low-bandwidth peripheral bus
    ► timer, interrupt controller, UART, etc.
AMBA 2.0
► AHB
  • Pipelined
  • Burst modes
  • Split transactions
  • Multiple masters
► APB
  • Low power
  • Simple interface
  • Single master
AMBA 3.0
► Introduces the AXI high-performance protocol
  • Out-of-order completion
  • Fixed-mode bursts
  • Advanced system cache support
    ► Specify whether a transaction is cacheable/bufferable
    ► Specify attributes such as write-back/write-through
  • Enhanced protection support
    ► Secure/non-secure transaction specification
  • Exclusive access (for semaphore operations)
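AXI's out-of-order completion lets a response with one transaction ID return before an earlier-issued response with a different ID, while responses that share an ID stay in issue order. The C++ sketch below models only that ordering rule. Each request is reduced to a hypothetical (id, ready_cycle) pair, and response-channel contention is ignored, so two responses may be delivered in the same cycle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical outstanding request; its issue order is its index in the vector.
struct Request {
    int id;           // transaction ID tag (cf. AXI ARID/AWID)
    int ready_cycle;  // cycle at which the slave has this response ready
};

// Cycle at which each response is delivered to the master.
//   in_order == true : no response may overtake an earlier-issued one.
//   in_order == false: responses may overtake earlier ones with a different ID,
//                      but responses sharing an ID keep their issue order.
// Response-channel contention is ignored (simplifying assumption).
std::vector<int> completion_cycles(const std::vector<Request>& reqs, bool in_order) {
    std::vector<int> done(reqs.size());
    int last_any = 0;                // latest delivery so far (in-order mode)
    std::map<int, int> last_per_id;  // latest delivery per ID (out-of-order mode)
    for (std::size_t i = 0; i < reqs.size(); ++i) {
        int t = reqs[i].ready_cycle;
        if (in_order) {
            t = std::max(t, last_any);
            last_any = t;
        } else {
            auto it = last_per_id.find(reqs[i].id);
            if (it != last_per_id.end()) t = std::max(t, it->second);
            last_per_id[reqs[i].id] = t;
        }
        done[i] = t;
    }
    return done;
}
```

Take three requests: ID 0 ready at cycle 20, then two ID-1 requests ready at cycles 5 and 8. In-order delivery holds all three responses until cycle 20. Out-of-order delivery returns the two ID-1 responses at cycles 5 and 8.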
Issues
► Selecting and configuring these architectures for optimal performance is a critical activity in SoC design
  • bus architecture (e.g. AMBA 2.0, AMBA 3.0, CoreConnect)
  • architecture parameters (e.g. bus width, burst size)
  • bus topologies (e.g. shared, hierarchical)
  • protocol choices (e.g. arbitration strategies)
[Figure: processing elements (PEs), each attached through an interface, connected by a communication architecture that is still to be chosen ("?")]
SoC Simulation Speed
Cycle rate (relative to silicon)   Technology
1                                  Silicon reference design
10^-2                              HW emulator
10^-3                              Transaction model
10^-4                              Cycle-accurate model
10^-6                              RTL model
10^-7                              Gate-level model

► Capturing a complete SoC design at the RTL level and then simulating it for exploration is
  • too slow (~10–100 cycles/s)
  • cumbersome, since all the detail must be captured
  • too late in the design flow for exploration!
Problem Definition
► To enable exploration of the System-on-Chip communication design space
  • early in the design flow
  • good accuracy
  • fast simulation speed (>> 100K cycles/s)
  • rapid system prototyping
  • IP reuse (plug-and-play IP library)
  • support for early development of
    ► embedded software
    ► an executable (golden) specification of the SoC
    ► system testbenches
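The gap between RTL speed and the target speed can be made concrete by converting a simulated cycle count into wall-clock time. The sketch below assumes a workload of 10^8 cycles, about one second of a hypothetical 100 MHz SoC. That workload is an illustration, not a figure from these slides.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Wall-clock seconds needed to simulate `cycles` target cycles when the
// simulator sustains `cycles_per_sec` simulated cycles per second.
double wall_clock_seconds(std::uint64_t cycles, double cycles_per_sec) {
    return static_cast<double>(cycles) / cycles_per_sec;
}

void print_comparison() {
    const std::uint64_t cycles = 100000000ULL;  // hypothetical: 1 s of a 100 MHz SoC
    const double rtl_rate = 100.0;              // upper end of the ~10-100 cycles/s RTL range
    const double target_rate = 100000.0;        // floor of the ">> 100K cycles/s" target
    const double rtl_s = wall_clock_seconds(cycles, rtl_rate);
    const double target_s = wall_clock_seconds(cycles, target_rate);
    std::printf("RTL:    %.0f s (%.1f days)\n", rtl_s, rtl_s / 86400.0);
    std::printf("Target: %.0f s (%.1f min)\n", target_s, target_s / 60.0);
}
```

Under these assumptions, even the fastest RTL rate needs about 11.6 days. At 100K cycles/s the same workload takes roughly 17 minutes.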
Communication Modeling Approaches
► Cycle Accurate (CA) Models
► Bus Cycle Accurate (BCA) Models
► Transaction Level Modeling (TLM)
► Hybrid Modeling Approaches
Cycle Accurate Models
[Figure: master and slave algorithm code (master: var1 = a + b; REG = d << var1; HREQ.set(1); slave: case CTR_WR: CTR_WR = in; CTR_WR |= 0xf; ST_RG = in|0x1;) with a clocked wait() after nearly every statement, connected through a bus and arbiter over a pin interface, shown beside an abstraction ladder (TLM, BCA, CA, Register Transfer Level)]
• Detailed system debug and analysis
• Time consuming to model
  - 1/1 to 1/3 of RTL modeling effort
• Too slow for exploring SoC designs
  - ~100x RTL simulation speed
Bus Cycle Accurate Models
[Figure: the same master and slave code, now annotated with timed waits such as wait(3, SC_NS) instead of per-statement clocked waits, still connected through a bus and arbiter over a pin interface, shown beside an abstraction ladder (TLM, BCA, CA, Register Transfer Level)]
• High-level system exploration
• Still time consuming to model
  - 1/5 to 1/10 of RTL modeling effort
• Still slow for exploring SoC designs
  - 100x to 500x RTL simulation speed
Transaction Level Models
[Figure: master and slave code communicating through a bus channel and arbiter over a generic channel interface (e.g. request(port1); in the master) with untimed wait() calls, shown beside an abstraction ladder (TLM, BCA, CA, Register Transfer Level)]
• High-level system validation and embedded software development
• Fast to model
  - 1/10 to 1/50 of RTL modeling effort
• Fast simulation speed, but the model is not detailed enough for exploring SoC designs
  - >>1000x RTL simulation speed
Hybrid Approaches
[Figure: master and slave code mixing transaction calls (request(port1);), timed waits (wait(3, SC_NS);) and pin-level signal writes (HSEL.set(1);) over a combined pin and transaction interface, shown beside an abstraction ladder (TLM, BCA, CA, Register Transfer Level)]
• Use Transaction Level Modeling (TLM) techniques to speed up Bus Cycle Accurate (BCA) model simulation
• Time to model varies (sometimes more than BCA)
• Simulation speed generally slightly faster than BCA
Hybrid Approaches
► Xinping et al. (ICCAD 2002) use function calls instead of slower signal semantics to model AMBA2 and CoreConnect
  • the resulting models are not detailed enough for accurate communication exploration
► Caldari et al. (DATE 2003) similarly attempt to model AMBA2 using function calls for reads/writes
  • bus signals are also modeled, which slows simulation
  • clocked threads are used extensively, which slows simulation
Hybrid Approaches
► Ogawa et al. (DATE 2003) also model data transfers in AMBA2 using read/write transactions
  • but use low-level handshaking semantics
► In mid-2003, ARM released the AHB Cycle-Level Interface Specification
  • for modeling AMBA AHB at the CA level in SystemC
  • function calls emulate bus signals at the interface
  • scope for improving speed by reducing the number of calls
CCATB Modeling Abstraction
► Variant of the hybrid modeling approach
  • No pins at the interface
  • read(), write() transaction interface
► Cycle Count Accurate at Transaction Boundaries
  • maintains overall cycle accuracy, which is essential for system exploration
► Trades off intra-transaction visibility for simulation speed
  • more than 1.5x faster than the fastest BCA models
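A toy clock can show the difference between ticking every cycle and accounting for time only at transaction boundaries. This is plain C++, not SystemC. SimClock is a hypothetical stand-in for the simulation kernel, and each wait_cycles() call counts as one kernel interaction. In SystemC, the CCATB-style step would correspond to a single wait() with the accumulated delay.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simulated clock: current cycle plus how many times the model
// yielded to the (imagined) simulation kernel.
struct SimClock {
    std::uint64_t cycle = 0;
    std::uint64_t kernel_events = 0;
    void wait_cycles(std::uint64_t n) { cycle += n; ++kernel_events; }
};

// Cycle-by-cycle style (CA/BCA-like): yield every cycle, so intermediate
// bus state is observable at each step of the transaction.
void transaction_per_cycle(SimClock& clk, std::uint64_t latency) {
    for (std::uint64_t i = 0; i < latency; ++i) clk.wait_cycles(1);
}

// CCATB style: the transaction's total latency is known up front, so time is
// advanced once at the transaction boundary.
void transaction_ccatb(SimClock& clk, std::uint64_t latency) {
    clk.wait_cycles(latency);
}
```

Both styles end at the same cycle count, so cycle-count accuracy at the boundary is preserved. Only the CCATB style interacts with the kernel once per transaction, which is the intra-transaction visibility that CCATB gives up.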
Timing Analysis
CCATB
► Model Abstraction
  • IPs modeled at the behavioral level
  • Bus model extends a generic TLM channel, adding
    ► timing
    ► bus protocol details
► Communication Interface
  • extension of the read(), write() transactions from TLM
  • protocol details (e.g. burst size, cache hints) need to be passed through the interface
► Modeling Language: SystemC
  • fast (native C/C++ execution)
  • provides constructs (concurrency, timing) for hardware modeling
  • extensive commercial tool support (debugging, waveform viewing)
Exploration with CCATB Models
► Bus Architecture
  • e.g. AMBA 2.0, AMBA 3.0, or CoreConnect
► Bus Widths
  • e.g. 16/32/64 bits
► Burst Sizes
  • for DMA and other bus masters
► Bus Hierarchy/Topology
  • e.g. single or multi-layer
► Arbitration Strategy
  • e.g. static priority, TDMA, round-robin (RR)
► Buffer Sizes
  • e.g. for queued out-of-order request completion
► Advanced Modes
  • e.g. out-of-order (OO) completion, CACHE/BUFFER hints
► IP Cores
  • processors/peripherals
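The arbitration strategies listed above can be sketched as plain C++ grant functions. This is an illustrative model, not COMMEX code. Three parts of it are assumptions: the request-vector representation, lower index meaning higher priority, and the TDMA fallback to round-robin when a slot owner is idle.

```cpp
#include <cassert>
#include <vector>

// requests[i] is true if master i requests the bus this cycle.
// Each function returns the granted master's index, or -1 if nobody requests.

// Static priority (assumed convention: lower index = higher priority).
int arbitrate_static(const std::vector<bool>& requests) {
    for (int i = 0; i < static_cast<int>(requests.size()); ++i)
        if (requests[i]) return i;
    return -1;
}

// Round-robin: the search starts just after the last granted master
// (pass last_grant = -1 before the first grant).
int arbitrate_round_robin(const std::vector<bool>& requests, int last_grant) {
    const int n = static_cast<int>(requests.size());
    for (int k = 1; k <= n; ++k) {
        const int i = (last_grant + k) % n;
        if (requests[i]) return i;
    }
    return -1;
}

// TDMA: slot_owner[s] owns slot s, and the slot wheel advances every cycle
// (cycle >= 0). If the owner is idle, the slot is handed out round-robin
// (one common variant, assumed here).
int arbitrate_tdma(const std::vector<bool>& requests, const std::vector<int>& slot_owner,
                   long cycle, int last_grant) {
    const int owner = slot_owner[cycle % static_cast<long>(slot_owner.size())];
    if (requests[owner]) return owner;
    return arbitrate_round_robin(requests, last_grant);
}
```

Static priority always favours the lowest-index requester. Round-robin rotates the starting point so that every requesting master is eventually served. TDMA reserves bandwidth per master regardless of what the other masters are doing.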
[Figure: CCATB timing analysis, with master, bus, and slave code]
Master:
    msg.length = 1;
    addr = TIMER_REG2;
    write(bus->port1, addr, msg);
    wait();
    …
Bus:
    get_requests(r);
    sl_req = arbitrate(r);
    a = decode(sl_req);
    if (a.read)
        st = read(a, sl_req);
    else
        st = write(a, sl_req);
Slave:
    status read(a, msg)
    {
        switch (addr)
        {
        case TIMER_REG2:
            msg.data = t_reg2;
            x.stat = SLV_OK;
            return x;
        }
    }
Timeline along the simulation-time axis: the master issues read/write(addr, data_control_token). The bus charges the request + arbitration + decode cycle delay, then the slave delay, then the burst + pipeline + busy + interface + slave + additional arbitration delay. The slave response then returns the transaction status to the master.
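The delay components in the figure can be summed once per transaction. Summing them is what lets a CCATB model stay cycle-count accurate without simulating each intermediate cycle. The sketch below is a simplified illustration. The field names, default values, and the one-cycle-per-beat assumption are hypothetical, and a real AHB or AXI model would derive each term from the protocol state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical delay components for one transaction, in bus cycles, named after
// the labels in the timing-analysis figure. Defaults are illustrative only.
struct TransactionDelays {
    std::uint32_t request = 1;           // request phase
    std::uint32_t arbitration = 1;       // arbitration for the bus
    std::uint32_t decode = 1;            // address decode
    std::uint32_t slave = 0;             // slave wait states before the first beat
    std::uint32_t burst_beats = 1;       // data beats in the burst
    std::uint32_t busy = 0;              // BUSY cycles inserted by the master
    std::uint32_t interface_delay = 0;   // extra cycles crossing a bridge/interface
    std::uint32_t extra_arbitration = 0; // re-arbitration, e.g. after losing the grant
};

// Total cycles charged at the transaction boundary. Assumes address and data
// phases are pipelined so each beat costs exactly one cycle once the slave is ready.
std::uint32_t total_cycles(const TransactionDelays& d) {
    return d.request + d.arbitration + d.decode  // request + arbitration + decode
         + d.slave                                 // slave delay
         + d.burst_beats                           // one cycle per data beat
         + d.busy + d.interface_delay + d.extra_arbitration;
}
```

In a SystemC model, the resulting cycle count would be multiplied by the clock period and passed to a single wait() call.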
CCATB Transaction Token Fields

Request field     Description
m_data            pointer to an array of data
m_burst_length    length of the transaction burst
m_burst_type      type of burst (incr, fixed, wrapping, etc.)
m_byte_enable     byte-enable strobe for unaligned transfers
m_read            indicates whether the transaction is a read or a write
m_lock            lock the bus during the transaction
m_cache           cache/buffer hints
m_prot            protection modes
m_transID         transaction ID (needed for out-of-order access)
m_busy_idle       schedule of busy/idle cycles from the master
m_ID              ID identifying the master
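A C++ rendering of the token can make the field list concrete. In the sketch below the field names come from the table, but every type is an assumption. The helper burst_addresses is hypothetical and not part of CCATB. It shows how m_burst_type could be interpreted using the AXI FIXED, INCR, and WRAP address rules, assuming the start address is aligned to the beat size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class BurstType { Fixed, Incr, Wrap };

// Sketch of a CCATB-style token. Field names follow the table; types are assumed.
struct CcatbToken {
    std::uint32_t* m_data = nullptr;           // pointer to an array of data
    std::uint32_t m_burst_length = 1;          // beats in the burst
    BurstType m_burst_type = BurstType::Incr;  // fixed, incrementing, or wrapping
    std::uint32_t m_byte_enable = 0xF;         // byte-lane strobe for unaligned transfers
    bool m_read = true;                        // true = read, false = write
    bool m_lock = false;                       // hold the bus for the whole transaction
    std::uint8_t m_cache = 0;                  // cache/buffer hints
    std::uint8_t m_prot = 0;                   // protection mode
    std::uint32_t m_transID = 0;               // transaction ID for out-of-order access
    std::vector<bool> m_busy_idle;             // assumed: busy (true) / idle per beat
    std::uint32_t m_ID = 0;                    // issuing master
};

// Beat addresses following AXI FIXED/INCR/WRAP rules. beat_bytes is the transfer
// size; start is assumed aligned to it, and for Wrap the burst length is assumed
// to be a power of two (2, 4, 8 or 16 in AXI).
std::vector<std::uint32_t> burst_addresses(std::uint32_t start, std::uint32_t beat_bytes,
                                           const CcatbToken& t) {
    std::vector<std::uint32_t> addrs;
    const std::uint32_t span = beat_bytes * t.m_burst_length;  // bytes covered by the burst
    const std::uint32_t lower = (start / span) * span;         // wrap boundary
    for (std::uint32_t i = 0; i < t.m_burst_length; ++i) {
        switch (t.m_burst_type) {
            case BurstType::Fixed: addrs.push_back(start); break;
            case BurstType::Incr:  addrs.push_back(start + i * beat_bytes); break;
            case BurstType::Wrap:
                addrs.push_back(lower + (start - lower + i * beat_bytes) % span);
                break;
        }
    }
    return addrs;
}
```

For a 4-beat, 4-byte burst starting at 0x38, WRAP visits 0x38, 0x3C, 0x30, 0x34. INCR continues to 0x40 and 0x44, and FIXED repeats 0x38.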
COMMEX Design Framework
Exploration Study
[Figure: Broadband Communication SoC Platform]
Exploration Study
► Used the platform to explore AMBA 2.0 and AMBA 3.0 configurations
  • Communication protocol comparison
  • Arbitration strategies
  • Topology configurations
  • Optimal buffer size
  • Simulation speedup
Software Applications
► We use three application benchmarks running on an ARM926EJ-S instruction-set simulator (ISS)
  • COMPLY
    ► Configures USB, SWITCH, and DMA
  • USBDRV
    ► Configures USB and DMA but restricts SWITCH activity
  • SWITRN
    ► Configures SWITCH and DMA but restricts USB activity
Bus Protocol Comparison
[Chart: transactions (read/write)/sec, 0 to 2000, for COMPLY, USBDRV, and SWITRN on AMBA3 (AXI) vs. AMBA2 (AHB)]
Arbitration Strategies
[Chart: transactions (read/write)/sec, 0 to 2000, for COMPLY, USBDRV, and SWITRN under different arbitration strategies]
Topology Configuration
Original config = {ARM926, DMA, USB, SWITCH} on 1 bus
Config A = {ARM926, DMA} {USB, SWITCH} on 2 buses
Config B = {ARM926, DMA, SWITCH} {USB} on 2 buses
[Chart: conflicts (%), 0 to 45, for COMPLY, USBDRV, and SWITRN under the original config, config A, and config B]
Effect of Buffer Size on Performance
[Chart: transactions (read/write)/sec, 1000 to 1800, for COMPLY, USBDRV, and SWITRN vs. SDRAM out-of-order buffer size 1 to 7]
Comparing performance with different SDRAM out-of-order buffer sizes


Simulation Performance
[Chart: transactions (read/write)/sec, 0 to 1800, for CCATB and BCA models across orig_c, orig_u, orig_s, A_c, A_u, A_s, B_c, B_u, B_s]
Comparing the speed of transaction-based BCA and CCATB platform models
Conclusion
► CCATB models
  • 1.55x to 2.20x faster than the fastest BCA models
  • Less modeling effort than BCA models
    ► since intra-transaction visibility is not a concern
  • Accurate exploration of the communication space
    ► performance figures comparable in accuracy to detailed pin-accurate BCA models
  • Fit conveniently into the SoC design flow
    ► easy to extend TLM-level models to get CCATB models
    ► easy to refine down to the pin-accurate BCA level
Thank You!

sudeep@cecs.uci.edu
CCATB
► Plug-and-play IP models from a library
  • Masters (DMAs, processor ISS, etc.)
  • Slaves (timers, interrupt controllers, memory, etc.)
  • Buses (AMBA 2.0 AHB, AMBA 3.0 AXI, etc.)
► Performance statistics include
  • Arbitration conflicts
  • IP throughput
  • Bandwidth utilization
  • Cycles spent waiting for the bus (for all master IPs)
  • Instructions/transactions executed
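Statistics like these could be collected from a per-cycle bus trace. The sketch below is hypothetical, and the trace format is invented for illustration. It counts a cycle as a conflict when two or more masters request the bus. That is one plausible definition and not necessarily the one COMMEX uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One record per bus cycle (hypothetical trace format).
struct CycleSample {
    std::vector<bool> requesting;  // requesting[m]: master m wants the bus
    bool data_transfer;            // a data beat moved on the bus this cycle
};

struct BusStats {
    double conflict_pct = 0;                // % of cycles with two or more requesters
    double utilization_pct = 0;             // % of cycles carrying data
    std::vector<std::size_t> wait_cycles;   // per master: cycles requesting but not granted
};

// grants[c] is the master granted in cycle c, or -1 if none.
BusStats collect_stats(const std::vector<CycleSample>& trace, const std::vector<int>& grants,
                       std::size_t num_masters) {
    BusStats s;
    s.wait_cycles.assign(num_masters, 0);
    if (trace.empty()) return s;
    std::size_t conflicts = 0, busy = 0;
    for (std::size_t c = 0; c < trace.size(); ++c) {
        std::size_t requesters = 0;
        for (std::size_t m = 0; m < num_masters; ++m) {
            if (!trace[c].requesting[m]) continue;
            ++requesters;
            if (grants[c] != static_cast<int>(m)) ++s.wait_cycles[m];  // waiting for the bus
        }
        if (requesters > 1) ++conflicts;
        if (trace[c].data_transfer) ++busy;
    }
    s.conflict_pct = 100.0 * conflicts / trace.size();
    s.utilization_pct = 100.0 * busy / trace.size();
    return s;
}
```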
Transaction Level Models (TLM)
► Transaction: defined as the exchange of data or an event between two components
  • the data can be a single word, a series of words (a burst), or a complex data structure transferred over a bus
► TLM captures reads/writes of register values and interrupts between system components
  • not concerned with microarchitecture (pin details, cycle accuracy, clock, protocols such as handshaking)
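A purely untimed TLM interface, by contrast with CCATB, can be as small as a read/write pair with no notion of clocks or pins. The class names TransportIf and SimpleMemory below are hypothetical and are not the OSCI/Accellera TLM API. In this sketch a burst is simply a multi-word vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Untimed, TLM-style interface: one call per transaction, no pins, clock, or handshaking.
class TransportIf {
public:
    virtual ~TransportIf() = default;
    virtual bool read(std::uint32_t addr, std::vector<std::uint32_t>& data, std::size_t words) = 0;
    virtual bool write(std::uint32_t addr, const std::vector<std::uint32_t>& data) = 0;
};

// Byte-addressed memory holding 32-bit words; a burst touches consecutive words.
class SimpleMemory : public TransportIf {
public:
    bool read(std::uint32_t addr, std::vector<std::uint32_t>& data, std::size_t words) override {
        data.clear();
        for (std::size_t i = 0; i < words; ++i) {
            auto it = mem_.find(addr + 4 * static_cast<std::uint32_t>(i));
            data.push_back(it == mem_.end() ? 0u : it->second);  // unwritten words read as 0
        }
        return true;
    }
    bool write(std::uint32_t addr, const std::vector<std::uint32_t>& data) override {
        for (std::size_t i = 0; i < data.size(); ++i)
            mem_[addr + 4 * static_cast<std::uint32_t>(i)] = data[i];
        return true;
    }
private:
    std::map<std::uint32_t, std::uint32_t> mem_;  // sparse storage keyed by byte address
};
```

Nothing here says how long a transfer takes. The CCATB model adds exactly that information, a cycle count charged at each transaction boundary, while keeping this call-per-transaction shape.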
COMMEX Features
► Fast communication space exploration at the CCATB level
► Seamless interface refinement
  • from the TLM level down to the CCATB level
  • from CCATB down to the BCA level
► Plug-and-play different IPs effortlessly
  • communication architectures (e.g. AMBA2, AMBA3, CoreConnect)
  • masters (e.g. ARM926EJ-S, ARM920, ARM940)
  • slaves (e.g. simple ITC, vectored ITC)
► Integrate pre-existing IPs using SystemC wrapper code
  • e.g. ARM CCM models
IBM CoreConnect
► PLB
  • Pipelined
    ► 4-deep read
    ► 2-deep write
  • Burst modes
  • Split transactions
  • Multiple masters
► OPB
  • Low bandwidth
  • Burst mode
  • Multiple masters
► DCR
  • Low throughput
    ► 1 read/write = 2 cycles
  • Ring-type data bus
