Extending The Transaction Level Modeling Approach For Fast Communication Architecture Exploration
► Related Work
► CCATB Modeling Abstraction
► Exploration Studies
► Conclusion
SoC Communication
► AHB
Pipelined
Burst modes
Split transactions
Multiple masters
► APB
Low Power
Simple Interface
Single Master
AMBA 3.0
► Introduces AXI high performance protocol
Out of order completion
Fixed mode bursts
Advanced system cache support
►Specify if transaction is cacheable/bufferable
►Specify attributes such as write-back/write-through
Enhanced protection support
►Secure/non-secure transaction specification
Exclusive access (for semaphore operations)
Issues
► Selecting and configuring these architectures for optimal performance is a critical activity in a SoC design
bus architecture (e.g. AMBA 2.0, AMBA 3.0, CoreConnect)
architecture parameters (e.g. bus width, burst size)
bus topologies (e.g. shared, hierarchical)
protocol choices (e.g. arbitration strategies)
[Figure: PEs attached through interfaces to a communication architecture that is still to be chosen]
SoC Simulation Speed
Cycle Rate    Technology
1             Silicon Reference Design
10^-2         HW Emulator
10^-3         Transaction Model
10^-4         Cycle Accurate Model
10^-6         RTL Model
10^-7         Gate Level Model
Communication Modeling Approaches
► Cycle Accurate (CA) Models
► Bus Cycle Accurate (BCA) Models
► Transaction Level Modeling (TLM)
► Hybrid Modeling Approaches
Cycle Accurate Models
[Figure: master and slave algorithms connected through pin interfaces to a bus with an arbiter (arb). The code fragments call wait() after each step (e.g. var1 = a + b; wait(); REG = d << var1; wait(); HREQ.set(1); wait(); and case CTR_WR: CTR_WR = in; wait(); CTR_WR |= 0xf;). A side diagram stacks the abstraction levels TLM, BCA, CA and Register Transfer Level.]
Annotations on the abstraction levels:
• Detailed system debug and analysis
• High level system exploration
• Fast to model
- /10 to /50 RTL
• Fast simulation speed, but model not too detailed for exploring SoC designs
- >>1000x RTL
Hybrid Approaches
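[Figure: master and slave algorithms connected to a bus with an arbiter (arb). The algorithm code no longer calls wait() after each step (e.g. var1 = a + b; d = d << var1; request(port1); and case CTR_WR: CTR_WR = in; CTR_WR |= 0xf; e = REG4 | 0xff; ST_RG = in|0x1). Timing appears as timed waits such as wait(3, SC_NS), and signals such as HSEL.set(1) are still driven. The TLM level is marked alongside.]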
CCATB Modeling Abstraction
► Variant of Hybrid Modeling Approach
No pins at interface
read(), write() transaction interface
► Cycle Count Accurate at Transaction Boundaries
maintains overall cycle accuracy, essential for system
exploration
► Trades off intra transaction visibility for
simulation speed
more than 1.5x faster than fastest BCA models
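The idea of being cycle count accurate only at transaction boundaries can be sketched as a bus model that moves a whole burst in one call and advances its cycle counter once, at the end of the transaction. This is a minimal illustration in plain C++ rather than SystemC, so it builds without the SystemC library; the names CcatbBus, BusTiming and MemorySlave and all delay values are assumptions made for the example, not taken from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical timing parameters; values are placeholders, not AMBA figures.
struct BusTiming {
    std::uint64_t arb_decode_cycles;  // request + arbitration + decode delay
    std::uint64_t cycles_per_beat;    // cycles needed to move one data word
};

// Toy memory slave with a fixed response delay per transaction (assumption).
struct MemorySlave {
    std::vector<std::uint32_t> words;
    std::uint64_t response_cycles;
};

// Bus model without pins: read()/write() move the data immediately and the
// cycle count is updated once per transaction, not once per clock edge.
class CcatbBus {
public:
    CcatbBus(BusTiming timing, MemorySlave& slave)
        : timing_(timing), slave_(slave) {}

    void write(std::size_t addr, const std::vector<std::uint32_t>& burst) {
        assert(addr + burst.size() <= slave_.words.size());
        for (std::size_t i = 0; i < burst.size(); ++i) {
            slave_.words[addr + i] = burst[i];
        }
        advance(burst.size());
    }

    std::vector<std::uint32_t> read(std::size_t addr, std::size_t beats) {
        assert(addr + beats <= slave_.words.size());
        const auto first = slave_.words.begin() + static_cast<std::ptrdiff_t>(addr);
        std::vector<std::uint32_t> out(first, first + static_cast<std::ptrdiff_t>(beats));
        advance(beats);
        return out;
    }

    // Cycle count as seen at the last transaction boundary.
    std::uint64_t now() const { return cycle_; }

private:
    // No intra-transaction events are simulated; only the total is kept.
    void advance(std::size_t beats) {
        cycle_ += timing_.arb_decode_cycles + slave_.response_cycles +
                  timing_.cycles_per_beat * beats;
    }

    BusTiming timing_;
    MemorySlave& slave_;
    std::uint64_t cycle_ = 0;
};
```

With an arbitration/decode delay of 3, a slave delay of 2 and one cycle per beat, a 4-beat write costs 3 + 2 + 4 = 9 cycles in this sketch, accounted for in a single step. What happens inside those 9 cycles is not observable, which is the visibility being traded for speed.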
Timing Analysis
CCATB
► Model Abstraction
IPs modeled at behavioral level
Bus model extends generic TLM channel, adding
►Timing
►Bus protocol details
► Communication Interface
extension of read(), write() transactions from TLM
Protocol details (e.g. burst size, cache hints) need to be passed
► Modeling Language - SystemC
fast (C/C++ native execution)
provides constructs (concurrency, timing) for hardware modeling
extensive commercial tool support (debugging, waveform
viewing)
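The communication interface described above, read() and write() calls that carry protocol details along with the data, might look roughly like the following. This is a sketch in plain C++ (no SystemC dependency); the Token layout, the 4-byte register stride and the RegisterSlave class are assumptions for illustration, and the slides' actual token fields are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

enum class Status { SLV_OK, SLV_ERROR };

// Hypothetical data/control token passed with every transaction.
struct Token {
    std::vector<std::uint32_t> data;  // payload; data.size() is the burst length
    bool cacheable = false;           // cache hint (assumed field)
    bool bufferable = false;          // buffer hint (assumed field)
    Status stat = Status::SLV_OK;     // slave response
};

// Behavioral slave: a register file addressed in 4-byte steps (assumption).
class RegisterSlave {
public:
    Status write(std::uint32_t addr, const Token& msg) {
        if (msg.data.empty()) return Status::SLV_ERROR;
        for (std::size_t i = 0; i < msg.data.size(); ++i) {
            regs_[addr + static_cast<std::uint32_t>(4 * i)] = msg.data[i];
        }
        return Status::SLV_OK;
    }

    Token read(std::uint32_t addr, std::size_t length) const {
        assert(length > 0);
        Token msg;
        for (std::size_t i = 0; i < length; ++i) {
            const auto it = regs_.find(addr + static_cast<std::uint32_t>(4 * i));
            if (it == regs_.end()) {   // register never written: report an error
                msg.data.clear();
                msg.stat = Status::SLV_ERROR;
                return msg;
            }
            msg.data.push_back(it->second);
        }
        return msg;
    }

private:
    std::map<std::uint32_t, std::uint32_t> regs_;
};
```

Keeping the hints inside the token means a bus model can inspect them when computing delays, while masters and slaves keep the same two-function interface they had at the TLM level.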
Exploration with CCATB Models
► Bus Architecture
e.g. AMBA 2.0, AMBA 3.0 or CoreConnect
► Bus widths
e.g. 16/32/64 bits
► Burst Sizes
for DMA and other bus masters
► Bus Hierarchy/Topology
e.g. Single or Multi layer
► Arbitration Strategy
e.g. static priority, TDMA, RR
► Buffer Sizes
e.g. for queued out of order request completion
► Advanced Modes
e.g. OO completion, CACHE/BUFFER hints
► IP Cores
processor/peripherals
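Of the parameters listed above, the arbitration strategy is the easiest to show in a few lines. The functions below sketch the three strategies named on the slide (static priority, round-robin and TDMA) in plain C++. The function names, the "lowest index wins" priority convention and the rule that an unused TDMA slot stays idle are assumptions for this example; real buses often reassign idle slots.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoGrant = -1;  // returned when no master is granted the bus

// requests[i] is true when master i is requesting the bus.

// Static priority: the lowest-numbered requesting master wins (assumed order).
int arbitrate_static_priority(const std::vector<bool>& requests) {
    for (std::size_t i = 0; i < requests.size(); ++i) {
        if (requests[i]) return static_cast<int>(i);
    }
    return kNoGrant;
}

// Round-robin: the search starts just after the last granted master.
// Pass last_grant = -1 when nothing has been granted yet.
int arbitrate_round_robin(const std::vector<bool>& requests, int last_grant) {
    assert(last_grant >= -1);
    const std::size_t n = requests.size();
    const std::size_t start = static_cast<std::size_t>(last_grant + 1);
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = (start + k) % n;
        if (requests[i]) return static_cast<int>(i);
    }
    return kNoGrant;
}

// TDMA: slot_owner[cycle % slots] owns the current slot. In this simplest
// variant the slot goes unused if its owner is not requesting.
int arbitrate_tdma(const std::vector<bool>& requests,
                   const std::vector<int>& slot_owner,
                   unsigned long cycle) {
    assert(!slot_owner.empty());
    const int owner = slot_owner[cycle % slot_owner.size()];
    assert(owner >= 0 && static_cast<std::size_t>(owner) < requests.size());
    return requests[static_cast<std::size_t>(owner)] ? owner : kNoGrant;
}
```

Because each strategy is a pure function of the pending requests, a bus model can swap one for another without touching the masters or slaves, which is what makes this parameter cheap to explore.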
Master
msg.length = 1;
addr = TIMER_REG2;
write(bus->port1, addr, msg);
wait();
…
Bus
get_requests(r);
sl_req = arbitrate(r);
a = decode(sl_req);
if (a.read)
st = read(a, sl_req);
else
st = write(a, sl_req);
Slave
status read(a, msg)
{ switch (addr)
{
case TIMER_REG2:
msg.data = t_reg2;
x.stat = SLV_OK;
return x;
Interface between master and bus: read/write (addr, data_control_token)
[Figure annotations along simulation time: request + arbitration + decode cycle delay, followed by the slave delay (slave response time)]
CCATB Transaction Token Fields
Exploration Study
[Chart: results for the COMPLY, USBDRV and SWITRN benchmarks; vertical axis 0 to 2000]
Topology Configuration
[Chart: COMPLY, USBDRV and SWITRN for the Original configuration, config A and config B; vertical axis 0 to 45]
Effect of Buffer Size on Performance
[Chart: Transactions (read/write) / sec, 1000 to 1800, for COMPLY, USBDRV and SWITRN; horizontal axis 1 to 7]
[Chart: CCATB versus BCA for orig_c, orig_u, orig_s, A_c, A_u, A_s, B_c, B_u, B_s; vertical axis 0 to 1800]
Conclusion
► CCATB models
1.55x to 2.20x faster than fastest BCA models
Less modeling effort than BCA models
►Since intra-transaction visibility is not a concern
Accurate exploration of communication space
►Performance figures comparable in accuracy to detailed
pin accurate BCA models
Conveniently fit into SoC Design Flow
►Easy to extend TLM level models to get CCATB models
►Easy to refine down to pin accurate BCA level
Thank You!
sudeep@cecs.uci.edu
CCATB
► Plug and play IP models from library
Master (DMAs, processor ISS etc)
Slave (Timers, Interrupt Controllers, Memory etc)
Bus (AMBA 2.0 AHB, AMBA 3.0 AHB etc)
► Performance statistics include
Arbitration Conflicts
IP Throughput
Bandwidth Utilization
Cycles spent waiting for bus (for all master IPs)
Instructions/transactions executed
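The statistics listed above can be gathered with a small recorder that the bus model updates once per completed transaction. The following plain C++ sketch is illustrative only: BusStats, MasterStats and their fields are hypothetical names, and counting an arbitration conflict whenever a master had to wait at least one cycle is an assumption, not the definition used by the authors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-master counters (hypothetical layout).
struct MasterStats {
    std::uint64_t transactions = 0;
    std::uint64_t bytes = 0;
    std::uint64_t wait_cycles = 0;  // cycles spent waiting for the bus
    std::uint64_t conflicts = 0;    // transactions that had to wait (assumption)
};

class BusStats {
public:
    explicit BusStats(std::size_t masters) : per_master_(masters) {}

    // Called once at the end of each transaction.
    void record(std::size_t id, std::uint64_t bytes,
                std::uint64_t wait_cycles, std::uint64_t busy_cycles) {
        assert(id < per_master_.size());
        MasterStats& m = per_master_[id];
        ++m.transactions;
        m.bytes += bytes;
        m.wait_cycles += wait_cycles;
        if (wait_cycles > 0) ++m.conflicts;
        busy_cycles_ += busy_cycles;
    }

    // Fraction of total_cycles during which the bus was transferring data.
    double utilization(std::uint64_t total_cycles) const {
        if (total_cycles == 0) return 0.0;
        return static_cast<double>(busy_cycles_) / static_cast<double>(total_cycles);
    }

    // Bytes per cycle moved by one master over total_cycles.
    double throughput(std::size_t id, std::uint64_t total_cycles) const {
        assert(id < per_master_.size());
        if (total_cycles == 0) return 0.0;
        return static_cast<double>(per_master_[id].bytes) /
               static_cast<double>(total_cycles);
    }

    const MasterStats& master(std::size_t id) const { return per_master_.at(id); }

private:
    std::vector<MasterStats> per_master_;
    std::uint64_t busy_cycles_ = 0;
};
```

Since every figure here is updated at transaction boundaries, none of them requires cycle-by-cycle simulation, so collecting them fits the CCATB abstraction.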
Transaction Level Models (TLM)
► Transaction defined as an exchange of data or an event between two components
data can be a single word, a series of words (burst) or a complex data structure that is transferred over a bus
► TLM captures reads/writes of register values and
interrupts between various system components
not concerned with micro architecture (pin details,
cycle accuracy, clock, protocols like handshaking)
COMMEX Features
► Fast communication space exploration at CCATB level
► Seamless interface refinement
from TLM level down to CCATB level
from CCATB down to BCA level
► Plug-and-play different IPs effortlessly
communication architectures (e.g. AMBA2, AMBA3,
CoreConnect)
masters (e.g. ARM926EJ-S, ARM920, ARM940)
slaves (e.g. simple ITC, vectored ITC)
► Integrate preexisting IPs using SystemC wrapper code
e.g. ARM CCM models
IBM CoreConnect