SoC - Midterm Lecture Materials
Electronic module from a Hamilton Pulsar digital watch
https://www.computerhistory.org/siliconengine/digital-watch-is-first-system-on-chip-integrated-circuit/
What does a system on chip contain?
[Figure: codesign flow. Boxes shown: System Definition; Function Design; HW & SW Partitioning; HW & SW Codesign; HW Design; HW Fabrication; SW Design; SW Code; HW & SW Integration & Test.]
HW & SW
Partitioning
& Codesign
Module Goals
• Introduce the fundamentals of HW/SW codesign
and partitioning concepts in designing embedded
systems
– Discuss the current trends in the codesign of
embedded systems
– Provide information on the goals of and methodology
for partitioning hardware/software in systems
• Show benefits of the codesign approach over
current design process
– Provide information on how to incorporate these
techniques into a general digital design methodology
for embedded systems
• Illustrate how codesign concepts are being
introduced into design methodologies
– Several example codesign systems are discussed
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Integrated HW/SW Modeling
Methodologies
• HW and SW Synthesis Methodologies
• Industry Approaches to HW/SW Codesign
• Hardware/Software Codesign Research
• Summary
Codesign Definition
and Key Concepts
• Codesign
– The meeting of system-level objectives by
exploiting the trade-offs between hardware
and software in a system through their
concurrent design
• Key concepts
– Concurrent: hardware and software
developed at the same time on parallel paths
– Integrated: interaction between hardware
and software development to produce design
meeting performance criteria and functional
specs
Motivations for Codesign
• Factors driving codesign (hardware/software
systems):
– Instruction Set Processors (ISPs) available as cores
in many design kits (386s, DSPs,
microcontrollers, etc.)
– Systems on Silicon - many transistors available in
typical processes (> 10 million transistors available
in IBM ASIC process, etc.)
– Increasing capacity of field programmable devices
- some devices even able to be reprogrammed on-
the-fly (FPGAs, CPLDs, etc.)
– Efficient C compilers for embedded processors
– Hardware synthesis capabilities
Motivations for Codesign
(cont.)
• The importance of codesign in designing
hardware/software systems:
– Improves design quality, design cycle time, and cost
• Reduces integration and test time
– Supports growing complexity of embedded systems
– Takes advantage of advances in tools and
technologies
• Processor cores
• High-level hardware synthesis capabilities
• ASIC development
Categorizing
Hardware/Software Systems
• Application Domain
– Embedded systems
• Manufacturing control
• Consumer electronics
• Vehicles
• Telecommunications
• Defense Systems
– Instruction Set Architectures
– Reconfigurable Systems
• Degree of programmability
– Access to programming
– Levels of programming
• Implementation Features
– Discrete vs. integrated components
– Fabrication technologies
Categories of Codesign Problems
• Codesign of embedded systems
– Usually consist of sensors, controller, and actuators
– Are reactive systems
– Usually have real-time constraints
– Usually have dependability constraints
• Codesign of ISAs
– Application-specific instruction set processors (ASIPs)
– Compiler and hardware optimization and trade-offs
• Codesign of Reconfigurable Systems
– Systems that can be personalized after manufacture for a
specific application
– Reconfiguration can be accomplished before execution or
concurrent with execution (called evolvable systems)
Components of the Codesign Problem
• Specification of the system
• Hardware/Software Partitioning
– Architectural assumptions - type of processor, interface style between
hardware and software, etc.
– Partitioning objectives - maximize speedup, latency requirements,
minimize size, cost, etc.
– Partitioning strategies - high level partitioning by hand, automated
partitioning using various techniques, etc.
• Scheduling
– Operation scheduling in hardware
– Instruction scheduling in compilers
– Process scheduling in operating systems
• Modeling the hardware/software system during the design process
Embedded Systems
Embedded Systems
Application-specific systems which contain hardware
and software tailored for a particular task and are
generally part of a larger system (e.g., industrial
controllers)
• Characteristics
– Are dedicated to a particular application
– Include processors dedicated to specific functions
– Represent a subset of reactive (responsive to external
inputs) systems
– Contain real-time constraints
– Include requirements that span:
• Performance
• Reliability
• Form factor
Embedded Systems:
Specific Trends
• Use of microprocessors only one or two
generations behind state-of-the-art for
desktops
– E.g. N/2 bit width where N is the bit width of
current desktop systems
• Contain limited amount of memory
• Must satisfy strict real-time and/or
performance constraints
• Must optimize additional design objectives:
– Cost
– Reliability
– Design time
• Increased use of hardware/software codesign
principles to meet constraints
Embedded Systems:
Examples
• Banking and transaction processing
applications
• Automobile engine control units
• Signal processing applications
• Home appliances (microwave ovens)
• Industrial controllers in factories
• Cellular communications
Embedded Systems:
Complexity Issues
• Complexity of embedded systems is
continually increasing
• Number of states in these systems (especially
in the software) is very large
• Description of a system can be complex,
making system analysis extremely hard
• Complexity management techniques are
necessary to model and analyze these systems
• Systems becoming too complex to achieve
accurate “first pass” design using conventional
techniques
• New issues rapidly emerging from new
implementation technologies
Techniques to Support
Complexity Management
• Delayed HW/SW partitioning
– Postpone as many decisions as possible that place
constraints on the design
• Abstractions and decomposition techniques
• Incremental development
– “Growing” software
– Requiring top-down design
• Description languages
• Simulation
• Standards
• Design methodology management framework
A Model of the Current
Hardware/Software Design
Process
[Figure: DOD-STD-2167A development model. Stages shown: System Concepts; System Requirements Analysis (Sys/HW and Sys/SW); a HW Development track (Hardware Requirements Analysis, Preliminary Design, Detailed Design, Fabrication, HWCI Testing); a SW Development track (Software Requirements Analysis, Preliminary Design, Detailed Design, Coding/Unit Testing/Integration Testing, CSCI Testing); System Integration and Test; Operational Testing and Evaluation.]
[Franke91] © IEEE 1991
Current Hardware/Software
Design Process
• Basic features of current process:
– System immediately partitioned into hardware and
software components
– Hardware and software developed separately
– “Hardware first” approach often adopted
• Implications of these features:
– HW/SW trade-offs restricted
• Impact of HW and SW on each other cannot be assessed
easily
– Late system integration
• Consequences of these features:
– Poor quality designs
– Costly modifications
– Schedule slippages
Incorrect Assumptions in
Current Hardware/Software
Design Process
• Hardware and software can be acquired
separately and independently, with
successful and easy integration of the two
later
• Hardware problems can be fixed with
simple software modifications
• Once operational, software rarely needs
modification or maintenance
• Valid and complete software requirements
are easy to state and implement in code
Directions of the HW/SW
Design Process
[Figure: the same development model with an Integrated Modeling Substrate spanning both tracks. Stages shown: System Concepts; System Requirements Analysis (Sys/HW and Sys/SW); a HW Development track (Hardware Requirements Analysis, Preliminary Design, Detailed Design, Fabrication, HWCI Testing); a SW Development track (Software Requirements Analysis, Preliminary Design, Detailed Design, Coding/Unit Testing/Integration Testing, CSCI Testing); System Integration and Test; Operational Testing and Evaluation.]
[Franke91] © IEEE 1991
Requirements for the Ideal
Codesign Environment
• Unified, unbiased hardware/software
representation
– Supports uniform design and analysis techniques for
hardware and software
– Permits system evaluation in an integrated design
environment
– Allows easy migration of system tasks to either
hardware or software
• Iterative partitioning techniques
– Allow several different designs (HW/SW partitions) to
be evaluated
– Aid in determining best implementation for a system
– Partitioning applied to modules to best meet design
criteria (functionality and performance goals)
Requirements for the Ideal
Codesign Environment
(cont.)
• Integrated modeling substrate
– Supports evaluation at several stages of the design
process
– Supports step-wise development and integration of
hardware and software
• Validation Methodology
– Ensures that the implemented system meets the initial
system requirements
Cross-fertilization Between
Hardware and Software
Design
• Fast growth in both VLSI design and
software engineering has raised awareness
of similarities between the two
– Hardware synthesis
– Programmable logic
– Description languages
– Graphics-driven design
[Figure: codesign flow. Labels shown: System Description (Functional), expressed as FSM-directed graphs, concurrent processes, or programming languages; HW/SW Partitioning into SW and HW, with a path labeled "Another HW/SW partition"; Software Synthesis, Interface Synthesis, Hardware Synthesis; System Specs.; HW/SW Integration and Cosimulation; Integrated System; System Evaluation; Design Verification.]
[Rozenblit94] © IEEE 1994
Codesign Features
[Figure: plot of relative programming cost per instruction (axis marked 1 to 3) against % utilization of speed and memory capacity (25 to 100), with a curve labeled "Folklore".]
Module Outline
• Introduction
• Summary
Unified HW/SW
Representation
• Unified Representation --
– A representation of a system that can be used to
describe its functionality independent of its
implementation in hardware or software
– Allows hardware/software partitioning to be
delayed until trade-offs can be made
– Typically used at a high-level in the design process
• Provides a simulation environment after
partitioning is done, for both hardware and
software designers to use to communicate
• Supports cross-fertilization between hardware
and software domains
Current Abstraction
Mechanisms in
Hardware Systems
Abstraction
The level of detail contained within the system
model
• Application Programs
• Utility Programs
• Operating System
• Monitor
• Machine Language
• Microcode
• Logic Devices
Abstract Hardware-Software
Model
Uses a unified representation of system to allow
early performance analysis
[Figure: the abstract HW/SW model supports general performance evaluation, identification of bottlenecks, evaluation of design alternatives, and evaluation of HW/SW trade-offs.]
Examples of Unified HW/SW
Representations
Systems can be modeled at a high level as:
• Data/control flow diagrams
• Concurrent processes
• Finite state machines
• Object-oriented representations
• Petri nets
Unified Representations
(Cont.)
• Data/control flow graphs
– Graphs contain nodes corresponding to
operations in either hardware or software
– Often used in high-level hardware synthesis
– Can easily model data flow, control steps, and
concurrent operations because of its graphical
nature
Example: [Figure: data flow graph with inputs 5, X, 4, and Y; two additions in Control Step 1, one addition in Control Step 2, and one addition in Control Step 3.]
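To make the idea of control steps concrete, here is a minimal sketch that assigns each operation of a small data flow graph to the earliest control step in which its operands are available (as-soon-as-possible scheduling). The graph below is hypothetical: the operation names `add1` to `add4` and their wiring are assumptions chosen to resemble the slide's example, not a reconstruction of it.

```python
from collections import defaultdict

# Hypothetical data flow graph: each operation maps to the operands it reads.
# Operands that are not keys of the dictionary are primary inputs.
dfg = {
    "add1": ["5", "X"],
    "add2": ["4", "Y"],
    "add3": ["add1", "add2"],
    "add4": ["add3", "Y"],
}

def asap_schedule(graph):
    """Assign each operation the earliest control step in which all operands are ready."""
    step = {}

    def visit(op):
        if op not in graph:
            return 0  # primary inputs are available before control step 1
        if op not in step:
            step[op] = 1 + max(visit(src) for src in graph[op])
        return step[op]

    for op in graph:
        visit(op)
    return step

schedule = asap_schedule(dfg)

# Group operations by control step for display.
by_step = defaultdict(list)
for op, s in sorted(schedule.items()):
    by_step[s].append(op)
for s in sorted(by_step):
    print(f"Control Step {s}: {by_step[s]}")
```

Operations that land in the same control step have no data dependence on each other, which is how the graph exposes concurrency to either a hardware scheduler or a software compiler.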
Unified Representations
(Cont.)
• Concurrent processes
– Interactive processes executing concurrently with
other processes in the system-level specification
– Enable hardware and software modeling
• Finite state machines
– Provide a mathematical foundation for verifying
system correctness, simulation,
hardware/software partitioning, and synthesis
– Multiple FSMs that communicate can be used to
model reactive real-time systems
Unified Representations
(Cont.)
• Object-oriented representations:
– Use techniques previously applied to software to
manage complexity and change in hardware
modeling
– Use C++ to describe hardware and display OO
characteristics
– Use OO concepts such as
• Data abstraction
• Information hiding
• Inheritance
– Use building block approach to gain OO benefits
• Higher component reuse
• Lower design cost
• Faster system design process
• Increased reliability
Unified Representations
(Cont.)
Object-oriented representation
Example:
3 Levels of abstraction:
[Figure residue: Petri net example; the surviving labels mention outputs, places, and contents within places of a Petri net that represent the state of the net.]
Module Outline
• Introduction
• Unified HW/SW Representations
• Summary
Hardware/Software
Partitioning
• Definition
– The process of deciding, for each subsystem,
whether the required functionality is more
advantageously implemented in hardware or
software
• Goal
– To achieve a partition that will give us the
required performance within the overall system
requirements (in size, weight, power, cost, etc.)
• This is a multivariate optimization problem;
solving it automatically is an NP-hard problem
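A minimal sketch of partitioning as an optimization problem follows. It enumerates every HW/SW assignment for four tasks and keeps the feasible one with the least hardware area. Everything specific here is an assumption for illustration: the task names, the time and area estimates, the `DEADLINE` value, and the simplification that tasks run one after another so their times add up. Exhaustive search is used only because the example is tiny; the number of assignments doubles with each added task.

```python
from itertools import product

# Hypothetical per-task estimates: (software time, hardware time, hardware area).
tasks = {
    "filter":  (40, 5, 30),
    "fft":     (60, 8, 50),
    "control": (10, 6, 20),
    "io":      (15, 4, 25),
}
DEADLINE = 70  # assumed limit on total execution time

def evaluate(assignment):
    """Return (total time, total hardware area) for a mapping task -> 'HW' or 'SW'."""
    time = sum(tasks[t][1] if side == "HW" else tasks[t][0]
               for t, side in assignment.items())
    area = sum(tasks[t][2] for t, side in assignment.items() if side == "HW")
    return time, area

def best_partition():
    """Try all 2**n assignments; keep the smallest-area one that meets the deadline."""
    best = None
    for sides in product(("SW", "HW"), repeat=len(tasks)):
        assignment = dict(zip(tasks, sides))
        time, area = evaluate(assignment)
        if time <= DEADLINE and (best is None or area < best[1]):
            best = (assignment, area, time)
    return best

assignment, area, time = best_partition()
print(assignment, "area:", area, "time:", time)
```

With these made-up numbers no single task moved to hardware meets the deadline, so the search has to weigh combinations, which is the trade-off the definition above describes.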
HW/SW Partitioning Issues
• Software implementation
– May run on high-performance processors at low
cost (due to high-volume production)
– Incurs high cost of developing and maintaining
(complex) software
Partitioning Approaches
[Figure: partitioning approaches; surviving labels: Component allocation, Output.]
Specification Abstraction
Levels
• Task-level dataflow graph
– A Dataflow graph where each operation
represents a task
• Task
– Each task is described as a sequential program
• Arithmetic-level dataflow graph
– A Dataflow graph of arithmetic operations along
with some control operations
– The most common model used in the partitioning
techniques
• Finite state machine (FSM) with datapath
– A finite state machine, with possibly complex
expressions being computed in a state or during a
transition
Specification Abstraction
Levels (Cont.)
• Register transfers
– The transfers between registers for each
machine state are described
• Structure
– A structural interconnection of physical
components
– Often called a netlist
Granularity Issues in
Partitioning
• The granularity of the decomposition is a
measure of the size of the specification in
each object
• The specification is first decomposed into
functional objects, which are then partitioned
among system components
– Coarse granularity means that each object
contains a large amount of the specification.
– Fine granularity means that each object contains
only a small amount of the specification
• Many more objects
• More possible partitions
– Better optimizations can be achieved
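As a rough counting argument for why finer granularity enlarges the search space, assume each of $n$ functional objects can be mapped independently to any of $k$ system components ($n$ and $k$ are symbols introduced here, not taken from the slides):

```latex
\[
  N_{\text{partitions}} = k^{n},
  \qquad \text{e.g. } k = 2:\quad
  n = 10 \;\Rightarrow\; 2^{10} = 1024,
  \qquad
  n = 40 \;\Rightarrow\; 2^{40} \approx 1.1 \times 10^{12}.
\]
```

Splitting the same specification into more, smaller objects raises $n$, so the number of candidate partitions grows exponentially even though the specification itself has not changed.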
System Component
Allocation
• The process of choosing system component
types from among those allowed, and
selecting a number of each to use in a given
design
• The set of selected components is called an
allocation
– Various allocations can be used to implement a
specification, each differing primarily in monetary
cost and performance
– Allocation is typically done manually or in
conjunction with a partitioning algorithm
• A partitioning technique must designate the
types of system components to which
functional objects can be mapped
– ASICs, memories, etc.
Metrics and Estimations
Issues
• A technique must define the attributes of a
partition that determine its quality
– Such attributes are called metrics
• Examples include monetary cost, execution time,
communication bit-rates, power consumption, area,
pins, testability, reliability, program size, data size, and
memory size
• Closeness metrics are used to predict the benefit of
grouping any two objects
• Need to compute a metric’s value
– Because all metrics are defined in terms of the
structure (or software) that implements the
functional objects, it is difficult to compute costs
as no such implementation exists during
partitioning
Metrics in HW/SW
Partitioning
• Two key metrics are used in
hardware/software partitioning
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Summary
Cosimulation
[Figure: a Verilog HW simulator or VHDL simulator connected through the VHDL Foreign Language Interface to software processes (SW proc 1, SW proc 2).]
Allows hardware simulation models to “cosimulate” with software processes.
VHDL-C Based HW/SW Cosimulation for
DSP Multicomputer Application
• Algorithm - C
• Architecture - VHDL
• Scheduler - C
• Mapping function (e.g.): Round Robin, Computational Requirements Based
[Figure: CPU 1 through CPU 4 connected by a communications network.]
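The two mapping functions named on this slide can be sketched as follows. The round-robin version follows directly from its name. The "computational requirements based" version is only one plausible reading (assign the largest remaining task to the least-loaded CPU) and is an assumption, as are the task names and costs.

```python
from itertools import cycle

def round_robin_map(task_names, cpu_names):
    """Assign tasks to CPUs in turn, ignoring how expensive each task is."""
    mapping = {}
    cpus = cycle(cpu_names)
    for task in task_names:
        mapping[task] = next(cpus)
    return mapping

def requirements_based_map(task_costs, cpu_names):
    """Assumed policy: place each task, largest first, on the least-loaded CPU."""
    load = {cpu: 0 for cpu in cpu_names}
    mapping = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        cpu = min(cpu_names, key=lambda c: load[c])
        mapping[task] = cpu
        load[cpu] += cost
    return mapping, load

cpus = ["CPU 1", "CPU 2", "CPU 3", "CPU 4"]
costs = {"t0": 8, "t1": 3, "t2": 5, "t3": 2, "t4": 7, "t5": 1}  # hypothetical

rr = round_robin_map(list(costs), cpus)
rb, rb_load = requirements_based_map(costs, cpus)
print(rr)
print(rb, rb_load)
```

In a cosimulation of this kind the scheduler written in C would call such a function to decide which simulated CPU receives each task, while the VHDL architecture model accounts for the resulting computation and communication time.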
VHDL-C Based HW/SW Cosimulation for
DSP Multicomputer Application
[Figure: a Unix C program coupled to a VHDL simulator that runs the architecture model and an instrument package.]
System state (e.g.):
• CPU: time to instruction completion
• Comm Agent: messages in send queue, messages in receive queue
• Network: communications channels busy
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Integrated HW/SW Modeling Methodologies
• Summary
Hardware Design
Methodology
Hardware Design Process:
Waterfall Model
Hardware Requirements → Preliminary Hardware Design → Detailed Hardware Design → Fabrication → Testing
Hardware Design
Methodology (Cont.)
• Use of HDLs for modeling and simulation
• Use of lower-level synthesis tools to derive register
transfer and lower-level designs
• Use of high-level hardware synthesis tools
– Behavioral descriptions
– System design constraints
• Introduction of synthesis for testability at all levels
Hardware Synthesis
• Definition
– The automatic design and implementation of
hardware from a specification written in a
hardware description language
• Goals/benefits
– To quickly create and modify designs
– To support a methodology that allows for multiple
design alternative consideration
– To remove from the designer the handling of the
tedious details of VLSI design
– To support the development of correct designs
Hardware Synthesis
Categories
• Algorithm synthesis
– Synthesis from design requirements to control-
flow behavior or abstract behavior
– Largely a manual process
• Register-transfer synthesis
– Also referred to as “high-level” or “behavioral”
synthesis
– Synthesis from abstract behavior, control-flow
behavior, or register-transfer behavior (on one
hand) to register-transfer structure (on the other)
• Logic synthesis
– Synthesis from register-transfer structures or
Boolean equations to gate-level logic (or physical
implementations using a predefined cell or IC
library)
Hardware Synthesis
Process Overview
[Figure: hardware synthesis flow from specification to implementation, with verification. Surviving labels: Gate-level Simulation, Gate-level Analysis, Gate, Silicon Vendor, Place and Route, Layout, Silicon.]
Software Design
Methodology
Software Design Process:
Waterfall Model
Software Requirements → Software Design → Coding → Testing → Maintenance
Software Design
Methodology (Cont.)
• Software requirements includes both
– Analysis
– Specification
• Design: 2 levels:
– System level - module specs.
– Detailed level - process design language (PDL) used
• Coding - in high-level language
– C/C++
• Testing - several levels
– Unit testing
– Integration testing
– System testing
– Regression testing
– Acceptance testing
Software Synthesis
– Visual Basic
• Domain-specific synthesis
[Figure: domain-specific synthesis example. Labels shown: Behavioral Specification, Graph Analysis, Behavior, Code Fragments, Autocoder, MCCI Domain Primitive Database, Ada Source Code File.]
Interface Synthesis
[Figure: performance versus cost plot locating pure HW, pure SW, and mixed implementations relative to the constraints.]
[Gupta93] © IEEE 1993
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Integrated HW/SW Modeling Methodologies
• HW and SW Synthesis Methodologies
• Summary
Sanders Codesign
Methodology
[Figure: Sanders codesign flow. Global influences: design rules, tool selection, virtual environment, cost models, libraries. Main path: Requirements Analysis; Algorithm Development; HW/SW Tradeoff Analysis; Integrated HW/SW Simulation; Integrate & Test; System Checkout. SW path: SW Requirements Partitioning, Design, Code, Test. HW path: HW Requirements Partitioning, Logical & Physical Design, Analysis & Simulation, Fab & Test, Hardware Modules. Feedback to user at all steps.]
[Hood94]
Sanders Codesign
Methodology
[Figure: Integrated Modeling Substrate. Labels shown: System Requirements; Architecture-Independent Processing Model; Hardware Performance Model; Software Performance Model; Architecture-Dependent Processing Model; Behavior Level Model; ISA Model; RTL Model; Gate Level Model; Source Code (HOL); Assembly; Simulation Library; Load; Prototype Hardware Module.]
[RASSP94]
Sanders Codesign
Methodology
• Subsystems process
– Processing requirements are modeled in an
architecture-independent manner
– Codesign not an issue
• Architecture process
– HW/SW allocation analyzed via modeling of SW
performance on candidate architectures
– Hierarchical verification is performed using finer grain
modeling (ISA and below)
• Detailed design
– Downloadable executable application and test code is
verified to maximum extent possible
• Library support
– SW models validated on test data
– HW models validated using existing SW models
– HW & SW models jointly iterated throughout the design
Lockheed Martin ATL
Codesign Methodology
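[Figure: Lockheed Martin ATL codesign flow. Front end: User Requirements & Specification; Algorithm Development & Simulation; Top-level Architecture; HW/SW Tradeoff; Partition. SW path: SW Requirements Specification, SW Design, SW Code, SW Debug, SW Prototype Test. HW path: HW Specification, HW Design, HW Development, HW Test, HW Simulation, HW Analysis & Fab. Joint steps: Interface; HW/SW Cosimulation; HW/SW Integration; System Checkout.]
[RASSP94]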
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Integrated HW/SW Modeling Methodologies
• HW and SW Synthesis Methodologies
• Industry Approaches to HW/SW Codesign
[Figure: Chinook tool flow. Labels shown: Verilog Specification; parser; scheduler; communication synthesizer; driver synthesizer; interface synthesizer; code generator; Processor & Device Libraries; program; netlist; Behavioral Simulation, Mixed Simulation, Structural Simulation.]
System Specification in
Chinook
(Unified Representation)
• The system specification is written in a dialect of Verilog and includes
the system’s behavior and the structure of the system architecture
• The behavior is specified as a set of tasks in a style similar to
communicating finite state machines - control states of the system
are organized as modes which are behavioral regimes similar to
hierarchical states
• In a given mode, the system’s responses are defined by a set of
handlers which are essentially event-triggered routines
• The designer must tag tasks or modules with the processor that is
preferred for their implementation - untagged tasks are
implemented in software
• The designer can specify response times and rate constraints for
tasks in the input description
Scheduling in Chinook
• Chinook provides an automated scheduling algorithm
• Low-level I/O routines and high level routines grouped in modes are
scheduled statically
• A static, nonpreemptive scheduling algorithm is used to meet
min/max timing constraints on low-level operations
– Determines serial ordering for operations
– Inserts delays as necessary to meet minimum constraints
– Includes heuristics in the scheduling algorithm to help exact algorithm
generate valid solution to NP-hard scheduling problem
• A customized dynamic scheduler may be generated for the top-level
modes
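The serial-ordering and delay-insertion steps above can be illustrated with a small sketch. This is not Chinook's algorithm: it takes a fixed serial order as given, inserts idle time only where a minimum separation requires it, and then merely reports any maximum separation that is violated. The operation names, durations, and constraints are hypothetical.

```python
def schedule_serial(order, duration, constraints):
    """Place operations back to back in the given order, inserting idle time
    where a minimum separation requires it, then check maximum separations.

    constraints: list of (earlier_op, later_op, min_sep, max_sep) on start
    times; max_sep may be None. earlier_op must precede later_op in `order`.
    """
    start = {}
    clock = 0
    for op in order:
        earliest = clock
        for a, b, min_sep, _ in constraints:
            if b == op:
                earliest = max(earliest, start[a] + min_sep)  # insert delay
        start[op] = earliest
        clock = earliest + duration[op]
    violations = [
        (a, b) for a, b, _, max_sep in constraints
        if max_sep is not None and start[b] - start[a] > max_sep
    ]
    return start, violations

order = ["select", "write", "strobe", "read"]
duration = {"select": 1, "write": 2, "strobe": 1, "read": 2}
constraints = [
    ("select", "write", 3, None),  # write starts at least 3 cycles after select
    ("write", "read", 0, 6),       # read starts within 6 cycles of write
]

start, violations = schedule_serial(order, duration, constraints)
print(start, violations)
```

A real static scheduler would also search over orderings when a maximum constraint fails; this sketch stops at detection to keep the delay-insertion step visible.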
Interface Synthesis in
Chinook
• Realization of communication between system components is an
area of emphasis in the Chinook system
• Chinook synthesizes device drivers from timing diagrams
• Custom code for the processor being used is generated
– For processors with I/O ports, an efficient heuristic is used to connect
devices with minimal interface hardware
– For processors w/o I/O ports, a memory mapped I/O interface is
generated including allocating address spaces, and generating the
required bus logic and instructions
• Portions of the interface that cannot be implemented in software are
synthesized into external hardware
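For the memory-mapped case, address-space allocation and the resulting decode logic can be sketched as below. The device names, register-window sizes, base address `0x8000`, and the largest-first alignment policy are all assumptions for illustration; the slides do not describe how Chinook performs this allocation.

```python
def allocate_address_map(devices, base=0x8000):
    """Give each device a register window aligned to its power-of-two size."""
    address_map = {}
    next_free = base
    # Largest windows first so that alignment wastes as little space as possible.
    for name, size in sorted(devices.items(), key=lambda kv: -kv[1]):
        if size & (size - 1):
            raise ValueError(f"{name}: size must be a power of two")
        start = (next_free + size - 1) & ~(size - 1)  # round up to alignment
        address_map[name] = (start, start + size - 1)
        next_free = start + size
    return address_map

def decode(address_map, address):
    """Software model of the bus decode logic: which device is selected, if any."""
    for name, (lo, hi) in address_map.items():
        if lo <= address <= hi:
            return name
    return None

devices = {"uart": 8, "timer": 4, "lcd": 16}  # hypothetical window sizes in bytes
address_map = allocate_address_map(devices)
for name, (lo, hi) in address_map.items():
    print(f"{name}: {lo:#06x}-{hi:#06x}")
```

Power-of-two, aligned windows keep the decode logic to a comparison on the upper address bits, which is why the sketch enforces that restriction.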
Communications
Synthesis and System
Simulation in Chinook
• Chinook provides methods for synthesizing communications systems
between multiple processors if a multicomputer implementation is chosen
– Bus-based, point-to-point, and hybrid communications schemes are supported
– Communications library that includes FIFOs, arbiters, and interconnect
templates is provided
[Figure: COSYMA flow. Labels shown: ES Flowgraph; ES to C; ES to HW C; C Compiler; Cost Estimation; Olympus; Object Code; Run time Analysis.]
COSYMA - Aims and
Strategies
• Major aim is automating HW/SW
partitioning process, for which very few
tools currently exist
• COSYMA partitions at the basic block and
function level (including hierarchical
function calls)
– Simulated annealing algorithm is used
because of its flexibility in the cost function
and the possibility to trade-off computation
time vs result quality
– Starts with an infeasible all-software solution
COSYMA - Cost Function
and Metrics
• The cost function is defined to force the
annealing to reach a feasible solution
before other optimization goals (e.g., area)
• The metrics used in cost computation are:
– Expected hardware execution times
– Software execution times
– Communication
– Hardware costs
• The cost function is updated in each step
of the simulated annealing algorithm
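The following is a minimal sketch of simulated-annealing partitioning in the spirit described above, not COSYMA's implementation. It starts from an all-software solution that misses the deadline, moves one block at a time across the HW/SW boundary, and uses a cost function in which any timing violation outweighs hardware area so that feasibility is reached first. The block names, estimates, `DEADLINE`, `PENALTY`, and the cooling schedule are assumptions, and communication cost is left out for brevity.

```python
import math
import random

# Hypothetical estimates per block: (software time, hardware time, hardware area).
blocks = {
    "filter":  (40, 5, 30),
    "fft":     (60, 8, 50),
    "control": (10, 6, 20),
    "io":      (15, 4, 25),
}
DEADLINE = 70    # assumed timing constraint
PENALTY = 1000   # weight that makes any timing violation dominate hardware area

def metrics(in_hw):
    """Return (total time, hardware area) when the blocks in in_hw are in hardware."""
    time = sum(hw if name in in_hw else sw for name, (sw, hw, _) in blocks.items())
    area = sum(blocks[name][2] for name in in_hw)
    return time, area

def cost(in_hw):
    time, area = metrics(in_hw)
    return area + PENALTY * max(0, time - DEADLINE)

def anneal(steps=2000, temperature=100.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = frozenset()  # all-software start, infeasible with these numbers
    best = current
    for _ in range(steps):
        name = rng.choice(sorted(blocks))
        candidate = current ^ {name}  # move one block across the boundary
        delta = cost(candidate) - cost(current)  # cost re-evaluated every step
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temperature *= cooling
    return best

best = anneal()
print(sorted(best), metrics(best))
```

Raising `steps` or slowing the cooling trades computation time for result quality, which is the flexibility the slides cite as the reason for choosing simulated annealing.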
COSYMA - Cost Function
and Metrics (Cont.)
• After partitioning, the parts selected to be
realized in software are translated to a C
program, thereby inserting code for
communicating with the coprocessor
• The rest of the system is translated to the
input description of the high-level
synthesis system, and an application-
specific coprocessor is synthesized
• Lastly, a fast-timing analysis of the whole
HW/SW system is performed to test
whether all constraints are satisfied
Ptolemy
• Attributes
– Facilitates mixed-mode system simulation,
specification, and design
– Supports generation of DSP assembly code
from a block diagram description of algorithm
– Uses object-oriented representations to
model subsystems efficiently
– Supports different design styles called
Codesign Methodology
Using Ptolemy
• Ptolemy supports a framework for
hardware/software codesign, called the
Design Assistant
[Figure: Design Assistant flow in Ptolemy. Labels shown: Netlist Generation; Ptolemy; VHDL/Synopsys Simulation; Plasma; separate models of computation (e.g. discrete event, e.g. data flow).]
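[Figure: seat belt alarm state machine. Surviving labels: states Off and Alarm; transition on (*Key == Off) or (*Belt == On); transition on (*End == 5) with action *Alarm = On; transition on (*End == 10) or (*Belt == On) or (*Key == Off) with action *Alarm = Off.]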
S-graph Software Specification
[Figure: S-graph for the seat belt alarm controller, running from Begin to End. Test nodes: S==Off, S==Wait, *Key==On, *Key==Off, *Belt==On, *END==5, *END==10. Action nodes: *Start, S=Wait, S=Alarm, S=Off, *Alarm=On, *Alarm=Off.]
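The seat belt controller that these figures describe can be written out as a small reactive state machine. The sketch below is one reading of the surviving labels, so several points are assumptions: the event names, the emission of a timer start on leaving Off, and the priority given to Key/Belt over the timer in the Wait state.

```python
def step(state, events):
    """Return (next_state, outputs) for one reaction of the seat belt controller.

    events is the set of input signals present in this reaction.
    """
    if state == "Off":
        if "key_on" in events:
            return "Wait", ["start_timer"]
    elif state == "Wait":
        if "key_off" in events or "belt_on" in events:
            return "Off", []
        if "end_5" in events:
            return "Alarm", ["alarm_on"]
    elif state == "Alarm":
        if events & {"end_10", "belt_on", "key_off"}:
            return "Off", ["alarm_off"]
    return state, []  # no enabled transition: stay in the same state

def run(trace, state="Off"):
    """Feed a sequence of event sets to the controller and log each reaction."""
    log = []
    for events in trace:
        state, outputs = step(state, set(events))
        log.append((state, outputs))
    return log

for reaction in run([{"key_on"}, {"end_5"}, {"belt_on"}]):
    print(reaction)
```

The chain of `if` tests in `step` mirrors the structure of an S-graph: a decision tree over the current state and inputs whose leaves assign outputs and the next state, which is why such a graph translates readily into C.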
[Figure: example HW to SW interface. A sender reaches a receiver through stages A, B, and C; the HW side passes x to the SW side as y with an ack handshake. The state-transition labels are not recoverable.]
The POLIS Co-design Environment
[Figure: POLIS flow. Inputs: Graphical EFSM, ESTEREL, (other). Steps shown: Compilers; CFSMs; Formal Verification; Partitioning; SW Synthesis, HW Synthesis, Interface Synthesis; Prototype.]
Module Outline
• Introduction
• Unified HW/SW Representations
• HW/SW Partitioning Techniques
• Integrated HW/SW Modeling Methodologies
• HW and SW Synthesis Methodologies
• Industry Approaches to HW/SW Codesign
• Summary
Module Summary
• The synergistic design of hardware and software in a digital system, called
Hardware/Software Codesign, has been explored
• Elements of a HW/SW Codesign methodology have been outlined
• Industrial design flows that contain aspects of codesign have been
presented
• Present-day research into automating portions of the codesign problem
has been explored
• As digital systems become more complex and performance criteria become
more stringent, codesign will become a necessity
• Better design tools and unified design environments will allow codesign
techniques to become standard practice
References
[Boehm73] Boehm, B.W. “Software and its Impact: A Quantitative Assessment,” Datamation, May 1973, p. 48-59.
[Buchenrieder93] Buchenrieder, K., “Codesign and Concurrent Engineering”, Hot Topics, IEEE Computer, R. D. Williams,
ed., January, 1993, pp. 85-86
[Buck94] Buck, J., et al., “Ptolemy: a Framework for Simulating and Prototyping Heterogeneous Systems,” International
Journal of Computer Simulation, Vol. 4, April 1994, pp. 155-182.
[Chiodo92] Chiodo, M., A. Sangiovanni-Vincentelli, “Design Methods for Reactive Real-time Systems Codesign,”
International Workshop on Hardware/Software Codesign, Estes Park, Colorado, September 1992.
[Chiodo94] Chiodo, M., P. Giusto, A. Jurecska, M. Marelli, H. C. Hsieh, A. Sangiovanni-Vincentelli, L. Lavagno, “Hardware-
Software Codesign of Embedded Systems,” IEEE Micro, August, 1994, pp. 26-36; © IEEE 1994.
[Chou95] P. Chou, R. Ortega, G. Borriello, “The Chinook hardware/software Co-design System,” Proceedings ISSS, Cannes,
France, 1995, pp. 22-27.
[DeMicheli93] De Micheli, G., “Extending CAD Tools and Techniques”, Hot Topics, IEEE Computer, R. D. Williams, ed.,
January, 1993, pp. 84
[DeMicheli94] De Micheli, G., “Computer-Aided Hardware-Software Codesign”, IEEE Micro, August, 1994, pp. 10-16
[DeMicheli97] De Micheli, G., R. K. Gupta, “Hardware/Software Co-Design,” Proceedings of the IEEE, Vol. 85, No. 3,
March 1997, pp. 349-365.
[Ernst93] Ernst, R., J. Henkel, T. Benner, “Hardware-Software Cosynthesis for Micro-controllers”, IEEE Design and Test,
December, 1993, pp. 64-75
[Franke91] Franke, D.W., M.K. Purvis. “Hardware/Software Codesign: A Perspective,” Proceedings of the 13th
International Conference on Software Engineering, May 13-16, 1991, p. 344-352; © IEEE 1991
References (Cont.)
[Gajski94] Gajski, D. D., F. Vahid, S. Narayan, J. Gong, Specification and Design of Embedded Systems, Prentice Hall,
Englewood Cliffs, N J, 07632, 1994
[Gupta92] Gupta, R.K., C.N. Coelho, Jr., G. De Micheli. “Synthesis and Simulation of Digital Systems Containing Interactive
Hardware and Software Components,” 29th Design Automation Conference, June 1992, p.225-230.
[Gupta93] Gupta, R.K., G. De Micheli, “Hardware-Software Cosynthesis for Digital Systems,” IEEE Design and Test,
September 1993, p.29-40; © IEEE 1993.
[Hermann94] Hermann, D., J. Henkel, R. Ernst, “An approach to the estimation of adapted Cost Parameters in the
COSYMA System”, 3rd International Conference on Hardware/Software Codesign, Grenoble, France, September 22-
24, 1994, pp. 100-107
[Hood94] Hood, W., C. Myers, "RASSP: Viewpoint from a Prime Developer," Proceedings 1st Annual RASSP Conference,
Aug. 1994.
[IEEE] All referenced IEEE material is used with permission.
[Ismail95] T. Ismail, A. Jerraya, “Synthesis Steps and Design Models for Codesign,” IEEE Computer, no. 2, pp. 44-52, Feb
1995.
[Kalavade93] A. Kalavade, E. Lee, “A Hardware-Software Co-design Methodology for DSP Applications,” IEEE Design and
Test, vol. 10, no. 3, pp. 16-28, Sept. 1993.
[Klenke96] Klenke, R. H., J. H. Aylor, R. Hillson, D. J. Kaplan, “VHDL-Based Performance Modeling for the Processing Graph
Method Tool (PGMT) Environment,” Proceedings of the VHDL International Users Forum, Spring 1996, pp. 69-73.
[Kumar95] Kumar, S., “A Unified Representation for Hardware/Software Codesign”, Doctoral Dissertation, Department of
Electrical Engineering, University of Virginia, May, 1995
[Jalote91] Jalote, P., An Integrated Approach to Software Engineering, Springer-Verlag, New York, 1991.
[McFarland90] McFarland, M.C., A.C. Parker, R. Camposano. “The High-Level Synthesis of Digital Systems,” Proceedings of
the IEEE, Vol. 78, No. 2, February 1990, p.301-318, © IEEE 1990.
References (Cont.)
[Parker84] Parker, A.C., “Automated Synthesis of Digital Systems,” IEEE Design and Test, November 1984, p. 75-81.
[RASSP94] Proceedings of the 1st RASSP Conference, Aug. 15-18, 1994.
[Rozenblit94] Rozenblit, J. and K. Buchenrieder (editors). Codesign Computer -Aided Software/Hardware Engineering,
IEEE Press, Piscataway, NJ, 1994; © IEEE 1994.
[Smith86] Smith, C.U., R.R. Gross. “Technology Transfer between VLSI Design and Software Engineering: CAD Tools and
Design Methodologies,” Proceedings of the IEEE, Vol. 74, No. 6, June 1986, p.875-885.
[Srivastava91] M. B. Srivastava, R. W. Brodersen, “Rapid prototyping of Hardware and Software in a Unified Framework,”
Proceedings ICCAD, 1991, pp. 152-155.
[Subrahmanyam93] Subrahmanyam, P. A., “Hardware-Software Codesign -- Cautious optimism for the future”, Hot Topics,
IEEE Computer, R. D. Williams, ed., January, 1993, pp. 84
[Tanenbaum87] Tanenbaum, A.S., Operating Systems: Design and Implementation, Prentice-Hall, Inc., Englewood Cliffs,
N.J., 1987.
[Terry90] Terry, C. “Concurrent Hardware and Software Design Benefits Embedded Systems,” EDN, July 1990, p. 148-154.
[Thimbleby88] Thimbleby, H. “Delaying Commitment,” IEEE Software, Vol. 5, No. 3, May 1988, p. 78-86.
[Thomas93] Thomas, D.E., J.K. Adams, H. Schmit, “A Model and Methodology for Hardware-Software Codesign,” IEEE
Design and Test, September 1993, p.6-15; © IEEE 1993.
[Turn78] Turn, R., “Hardware-Software Tradeoffs in Reliable Software Development,” 11th Annual Asilomar Conference
on Circuits, Systems, and Computers, 1978, p.282-288.
[Vahid94] Vahid, F., J. Gong, D. D. Gajski, “A Binary Constraint Search Algorithm for Minimizing Hardware During
Hardware/Software Partitioning”, 3rd International Conference on Hardware/Software Codesign, Grenoble, France,
September 22-24, 1994, pp. 214-219
[Wolf94] Wolf, W.H. “Hardware-Software Codesign of Embedded Systems,” Proceedings of the IEEE, Vol. 82, No.7, July
1994, p.965-989.
References (Cont.)
Additional Reading:
Aylor, J.H. et al., "The Integration of Performance and Functional Modeling in VHDL” in Performance and Fault Modeling
with VHDL, J. Schoen, ed., Prentice-Hall, Englewood Cliffs, N.J., 1992.
D’Ambrosio, J. G., X. Hu, “Configuration-level Hardware-Software Partitioning for Real-time Embedded Systems”, 3rd
International Conference on Hardware/Software codesign, Grenoble, France, September 22-24, 1994, pp. 34-41
Eles, P., Z. Peng, A. Doboli, “VHDL System-Level Specification and Partitioning in a Hardware-Software Cosynthesis
Environment”, 3rd International Conference on Hardware/Software codesign, Grenoble, France, September 22-24,
1994, pp. 49-55
Gupta, R.K., G. DeMicheli, “Hardware-Software Cosynthesis for Digital Systems,” IEEE Design and Test, September 1993,
p.29-40.
Richards, M., Gadient, A., Frank, G., eds. Rapid Prototyping of Application Specific Signal Processors, Kluwer Academic
Publishers, Norwell, MA, 1997
Schultz, S.E., “An Overview of System Design,” ASIC and EDA, January 1993, p.12-21.
Thomas, D. E, J. K. Adams, H. Schmit, “A Model and Methodology for Hardware-Software Codesign”, IEEE Design and Test,
September, 1993, pp. 6-15
Zurcher, F.W., B. Randell, “Iterative Multi-level Modeling - A Methodology for Computer System Design,” Proceedings IFIP
Congress ‘68, Edinburgh, Scotland, August 1968, p.867-871.
• https://www.design-reuse.com/articles/4339/enforcing-design-rules-to-develop-reusable-ip.html
• https://www.design-reuse.com/articles/6978/hardware-software-partitioning-methodology-for-systems-on-chip-socs-with-risc-host-and-configurable-microprocessors.html
Design Reuse - 1
• Motivation
– High cost of design and verification
– Shorter design cycles
– Higher quality demands
– Emerging System-on-a-Chip (SoC) designs
• Very short design cycles
• Large numbers of distinct designs
• Analogous to board design today
Design Reuse - 2
• Requirements
– Correct and robust
• Well-written, well-documented, thoroughly-commented
code
• Well-designed verification suites and robust scripts
– Solves a general problem
• Easily configurable; parameterized
– Supports multiple technologies
• Soft macros: synthesis scripts span a variety of libraries
• Hard macros: porting strategies to new technologies
Design Reuse - 3
• Requirements (continued)
– Simulates with multiple simulators
• Both VHDL and Verilog versions of models and test-benches
• Work with all major commercial simulators
– Accompanied by full verification environment
• Test benches and verification suites that provide high levels of verification
coverage
– Verified rigorously before release
• Includes construction of actual prototype tested in actual system with
real software
Design Reuse - 4
• Requirements (continued)
– Documented in terms of applications and restrictions
• Valid configurations and parameter values
• Interfacing requirements and restrictions
Example SoC
[Figure: example SoC with a processor and a digital signal processor connected to a system bus]
Design Paradigms - 5
• Correct by construction
– Focus on one pass design with goal of completely
correct during this pass
• Construction by correction
– Begin with the realization that multiple complete
iterations will be required
– First pass is quick to see the problems at various levels
caused by the decisions at prior levels
– Perform design refinement several times
The Role of Reuse
• Redesign of cores such as processors, bus
interfaces, DSP processors, DRAM controllers,
RAMs, etc. is not cost-effective
• Redesign of common blocks such as ALUs, barrel
shifters, adders, and multipliers is likewise not
cost-effective
• Availability of well-designed macros particularly
parameterizable versions can greatly reduce cost
Macros, Cores and Blocks - 1
• Verification
– Rule - strategy must be developed and
documented before macro selection or
design begins
• Guideline - selection of verification tools can
affect the coding style of macros and the
design - testbench design must be started early
in the design process
System Design Rules and Guidelines - 7
04/26/2003 242
System Design Rules and Guidelines - 8
RTL Coding Guidelines - 1
• Fundamental Principles
• Basic Coding Practices
• Coding for Portability
• Guidelines for Clocks and Resets
• Coding for Synthesis
• Partitioning for Synthesis
• Designing with Memories
RTL Coding Guidelines - 2
• Fundamental Principles
– Use simple constructs, basic types (for VHDL), and simple
clocking schemes
– Be consistent in coding style, naming, and style for
processes and state machines
– Use a regular partitioning method with module outputs
registered and modules of about the same size
– Use comments, meaningful names and constants and
parameters instead of numbers
RTL Coding Guidelines - 3
RTL Coding Guidelines – 4
• Basic Coding Practices (continued)
• Use clk for the clock signal. If multiple clocks, use clk as the prefix
for all clock signals.
• Use the same name for all clk signals driven by same source
• For active low signal, end with underscore _n for standardization
• Use rst for reset signals - if active low, use rst_n
• For buses, use consistent bit ordering; recommend (y downto x)
(VHDL) or (x:0) (Verilog)
• Use same name for connected ports and signals
• Use *_r for register output, *_a for asynchronous signals, *_pn for
signals in phase n, *_nxt for data in to register *_r, and *_z for
internal, 3-state signal.
– Many more!
RTL Coding Guidelines - 5
• Coding for Portability
– Rule (VHDL) - Use only IEEE standard types
• Use std_logic instead of std_ulogic
• Be conservative re number of created types
• Do not use bit or bit_vector (no built in arithmetic)
– Do not use hard-coded values
– (VHDL) Collect all parameter values and function definitions
into a package DesignName_package.vhd
– (Verilog) Keep `define statements in a separate file
DesignName_params.v
RTL Coding Guidelines - 6
– Avoid embedding dc_shell scripts to avoid unintended
execution with negative impact and obsolescence
– Use technology-independent libraries to maintain technology
independence (e.g., DW in Synopsys)
– Avoid instantiating gates in designs
– If technology-specific gates must be instantiated, isolate in
separate module
– If gate instantiated, use technology-independent library (e.g.,
GTECH in Synopsys)
RTL Coding Guidelines - 7
– Code for translation between VHDL and
Verilog
• Do not use reserved keywords from Verilog as
identifiers in a description in VHDL and vice-versa.
• In VHDL, do not use:
– generate
– block
– Code to modify constant declarations
RTL Coding Guidelines - 8
• Guidelines for Clocks and Resets
– Avoid mixed clock edges
• Duty cycle of clock becomes critical in timing analysis
• Separate serial scan handling required
• If required, worst case duty cycle(s) must be accurately modeled,
duty cycle documented, and + and - edge flip-flops in separate
modules
– Avoid clock buffers (done in physical design)
– Avoid gated clocks (technology specific, timing dependent,
and non-scannable)
RTL Coding Guidelines – 9
– Avoid internally-generated clocks (logic they clock cannot
be scanned; synthesis constraints difficult to write)
– Avoid internally generated resets
– If gated clocks, or internally-generated clocks or resets
required, do in separate module at the top level of the
design and partition into modules using single clock and
reset.
– Model gated clocks as if registers enabled.
– Model complex reset by generating reset signal in
separate module
RTL Coding Guidelines - 10
• Coding for Synthesis
– Infer registers
– Avoid latches
– Avoid combinational feedback
– Specify complete sensitivity lists
– In Verilog, always use non-blocking assignments in
always@(*edge clk)
– In VHDL, signals preferred to variables, but
variables can be used with caution
RTL Coding Guidelines - 11
• Coding for Synthesis (continued)
– Use case over if-then-else whenever priority
structure not required.
– Use separate processes for sequential state
register and combinational logic
– In VHDL, create an enumerated type for state
vector. In Verilog, use `define. Why `define rather
than parameters?
– Keep FSM logic separate
RTL Coding Guidelines - 12
RTL Coding Guidelines - 13
Macro Synthesis Guidelines - 1
Macro Synthesis Guidelines - 2
• Subblock Synthesis
– Typically compile-characterize-write script- reoptimize
approach
• Macro Synthesis
– Compile individual subblocks
– Characterize-compile overall
– Perform incremental compile
• Use of RAM Compiler and Module Compiler
Developing Hard Macros
Macro Deployment - 1
• Deliverables
– Soft Macros
• RTL code
• Support files
• Documentation
• See Table 9-1
– Hard Macros
• Broad set of integration models
• Documentation for integration into final chip
• Design archive of files
• See Table 9-2
Macro Deployment - 2
• Deliverables
– Soft Macros
• RTL code
• Support files
• Documentation
• See Table 9-1
– Hard Macros
• Broad set of integration models
• Documentation for integration into final chip
• Design archive of files
• See Tables 9-2, 9-3
System Integration
• The integration process
– Selection of macros
– Design of macros
– Verification of macros
– The design flow - See Fig. 10-1 RMM
– Verification of design
• Overall: Verification important throughout the
macro design and system design
Partitioning in
Hardware/Software
Co-Design
Overview Of a Partitioner
Closer Look At Partitioner
Issues Involved during Partitioning
Process
– Nature of Application
– Target Architectures
– Interplay Of Granularity and Estimation
– Closeness Metrics
– Cost Function
Nature Of Application
• Computation-Oriented Systems
– Workstations, PCs, or scientific parallel computers
• Control-Dominated Systems
– React to external events
• Data-Dominated Systems
– Complex transformation or transportation of data
– E.g., DSP or router
• Mixed Systems
– E.g., mobile phone or motor control
Architecture for control dominated
systems
• Each FSM mapped to a process
• Small Variable set – FSM state
• Short Program segments – FSM transitions
• Explosion of states and transitions – Issue of Code
Size
• Shared Memory architecture
• Optimizations – bit manipulations, few operation per
state transition .
• E.g., 8051, Motorola MC68332, Siemens 80C166
Architecture for Data Oriented Systems
• Emphasis on high throughput rather than short latency deadlines
• Large data variables – Memory optimization
• Periodic behaviour of system parts
– Static schedule
• Transformations for high concurrency such as loop unrolling
• Specialize control, data path, and interconnect function units
• Address sequences and operations known a priori – Memory
and address unit specialization
• E.g., DSP applications – ADSP21060, TMS320C80
Mixed Systems
• Interconnected data and control dominated
functions
• Approaches
– Heterogeneous systems – Independently controlled
communicating specialized components
– Computation application without specific specialization
potential.
• E.g. Printer or Scanner controller
– Tailoring of less specialized systems to an application
domain – Eg. Minimize power consumption or cost for
a required level of performance
• E.g., ARM family, Motorola ColdFire family
Modern Embedded Architectures
• Highly multiplexed data path processors.
• ASIPs.
– Optimized for speed, performance, power
characteristics of the application and can be reused
and provide cost.
• VLIW processors.
– Network of horizontally programmable execution units.
• Commercial programmable DSPs (Harvard architecture).
– Separate program and data memories.
– Instruction set is tuned to multiply-accumulate operations.
Granularity Level
• Coarse Grain Partitioning
– Task / Process or Function level
• Fine Grain Partitioning
– Operator ,Statement or Basic Block Level
• Even lower level of Assembly Language not
useful – Based upon processor details
Fine Grain Granularity
• Becomes important as processor performance and
system software increases.
• Less obvious , more difficult and time consuming
and can have high overheads.
– Communication time overhead.
– Communication area overhead – May require buffers or
memories.
– Interlocks.
– Change in efficiency of compiler optimizations , pipelines
and concurrent units utilizations.
Coarse Grain Granularity
• Limits parallelism
• Reduces time and error during estimations
• Better suited for manual partitioning
Closeness Metrics
• Measures the likelihood that two pieces of
specification are mapped on to the same system
component.
• Metrics.
– Connectivity.
• Measures no. of wires shared between two behaviours.
– Communication.
• Measures amount of data transferred between two
behaviours.
– Constrained Communication.
• Measures communication metric between those behaviours
with given performance constraints.
– Common accessors.
• Grouping of behaviours(or variables) accessed via subroutine
calls and variable read/write by many of same behaviours
reduces inter component communication.
– Sequential Execution.
• If two behaviours are defined sequentially in specification ,
mapping on to same processor does not affect performance.
– Hardware Sharing.
• Measures the amount of hardware that two behaviours can
share.
– Balanced Size.
• Achieves a final partition of groups that are roughly balanced
in hardware size. Otherwise the metrics above tend to
lead to a single group.
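As a rough illustration of how a closeness metric can drive clustering, the sketch below computes a communication closeness for every pair of behaviours and reports the pair that a clustering step would merge first. The matrix layout, the normalisation by total traffic, and the names CommMatrix, communicationCloseness, and closestPair are assumptions made for this example; they are not taken from the slides or from any particular codesign tool.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// comm[i][j] = amount of data exchanged between behaviours i and j
// (symmetric, zero diagonal). Hypothetical layout for illustration.
using CommMatrix = std::vector<std::vector<double>>;

// Communication closeness of behaviours a and b: their traffic divided by
// the total traffic in the specification (normalisation is an assumption).
double communicationCloseness(const CommMatrix& comm, std::size_t a, std::size_t b) {
    assert(a < comm.size() && b < comm.size());
    double total = 0.0;
    for (std::size_t i = 0; i < comm.size(); ++i)
        for (std::size_t j = i + 1; j < comm.size(); ++j)
            total += comm[i][j];
    return total > 0.0 ? comm[a][b] / total : 0.0;
}

// Returns the pair of behaviours with the highest closeness, i.e. the pair
// most likely to be mapped onto the same system component.
std::pair<std::size_t, std::size_t> closestPair(const CommMatrix& comm) {
    std::pair<std::size_t, std::size_t> best{0, 0};
    double bestScore = -1.0;
    for (std::size_t i = 0; i < comm.size(); ++i) {
        for (std::size_t j = i + 1; j < comm.size(); ++j) {
            const double score = communicationCloseness(comm, i, j);
            if (score > bestScore) {
                bestScore = score;
                best = {i, j};
            }
        }
    }
    return best;
}
```

A practical partitioner would typically combine several of the metrics listed above in a weighted sum and include a balanced-size term so that the clustering does not collapse into a single group.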
Structural/Functional Partitioning
• Functional Partitioning.
– Partitions a functional specification into smaller sub-
specifications and synthesizes structure for each.
– Isolates a function to one part.
• Reduces I/O.
• Prevents critical path from crossing parts thus reducing clock
period.
• Yields simpler hardware , reducing clock period.
• Complete control over I/O allowing tradeoff with performance.
– Reduces synthesis tool times and memory usage.
• Structural Partitioning.
– A structure is synthesized for the entire
specification and then partitioned.
– Size and Delay can be estimated quickly and
accurately.
– It cannot satisfy both size and I/O constraints.
– Placement and Routing can be done more efficiently.
– Not suitable for large systems.
Partitioning Algorithms
• Random Mapping
• Multistage Clustering
• Hierarchical Clustering
• Group Migration
• Ratio Cut
• Simulated Annealing
• Genetic Evolution
• ILP Formulation
Cosyma
• Target Architecture
– standard RISC processor core
– a fast RAM for program and data with single clock cycle
access time
– an automatically generated application specific
coprocessor.
– Peripheral units must be inserted by the designer.
– Processor and coprocessor communicate via shared
memory in mutual exclusion
• Granularity
– Partitioning works at the basic block level.
• Since communication between basic blocks of a
process is implicit , partitioning requires
communication analysis.
• Simulate on an RT-level model of the target
processor to obtain profiling and software timing
information
Hardware/Software Partitioning
• Input to partitioning are the ESG with profiling (or control
flow analysis) information, the CDR-file and synthesis
directives which include channel mapping directives,
partitioning directives, and component selection.
• Starts with an all software solution and tries to extract
hardware components iteratively until all timing
constraints are met.
• The partitioning goals are
– meet real-time constraints
– minimize hardware costs
– minimize the CAD system response time
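The iterative "start in software, extract hardware until the timing constraint holds" idea can be sketched as a simple greedy loop. This is a deliberately simplified stand-in and not Cosyma's actual algorithm or cost function: the Block fields, the time-saved-per-area heuristic, and the single global deadline are all assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One basic block with estimated software time, hardware time, and hardware
// area (hypothetical units and field names).
struct Block {
    double swTime;
    double hwTime;
    double hwArea;
    bool inHardware;
};

// Total execution time of the current partition.
double totalTime(const std::vector<Block>& blocks) {
    double t = 0.0;
    for (const Block& b : blocks) t += b.inHardware ? b.hwTime : b.swTime;
    return t;
}

// Starts from whatever mapping `blocks` holds (all software initially) and
// moves one block at a time to hardware, choosing the best time saved per
// unit of area, until the deadline is met. Returns false if no remaining
// move helps and the deadline is still violated.
bool extractHardware(std::vector<Block>& blocks, double deadline) {
    assert(deadline > 0.0);
    while (totalTime(blocks) > deadline) {
        std::size_t best = blocks.size();
        double bestGain = 0.0;
        for (std::size_t i = 0; i < blocks.size(); ++i) {
            const Block& b = blocks[i];
            if (b.inHardware || b.hwArea <= 0.0) continue;
            const double gain = (b.swTime - b.hwTime) / b.hwArea;
            if (gain > bestGain) {
                bestGain = gain;
                best = i;
            }
        }
        if (best == blocks.size()) return false;  // nothing left to extract
        blocks[best].inHardware = true;
    }
    return true;
}
```

Stopping as soon as the deadline is met reflects the goal of minimizing hardware cost; a greedy pass like this can get stuck in poor solutions, which is one reason the iterative improvement methods listed under Partitioning Algorithms are used in practice.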
Algorithm & Cost function
http://www.scholarpedia.org/article/Ant_colony_optimization
https://en.wikipedia.org/wiki/Integer_programming
The Particle Swarm
Optimization Algorithm
Summary
• Introduction to Particle Swarm Optimization
(PSO)
– Origins
– Concept
– PSO Algorithm
Introduction to the PSO: Algorithm - Neighborhood
• geographical
• social
• global
Introduction to the PSO: Algorithm - Parameters
Algorithm parameters
– A : Population of agents
– f : Objective function
– vi : Velocity of agent ai
In PSO:
Introduction to the PSO: Algorithm -
Example
Introduction to the PSO: Algorithm
Characteristics
• Advantages
– Insensitive to scaling of design variables
– Simple implementation
– Easily parallelized for concurrent processing
– Derivative free
– Very few algorithm parameters
– Very efficient global search algorithm
• Disadvantages
– Tendency to a fast and premature convergence in mid optimum points
– Slow convergence in refined search stage (weak local search ability)
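To make the algorithm slides concrete, here is a minimal global-best PSO sketch that minimizes an objective function f over a box. It follows the usual textbook velocity update (inertia plus attraction to each agent's own best position and to the swarm's best position). The inertia weight 0.7, the acceleration coefficients 1.5, the fixed seed, and the names minimizePso and PsoResult are assumptions for this example rather than values given in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

using Vec = std::vector<double>;

struct PsoResult {
    Vec position;  // best position found by any agent
    double value;  // objective value at that position
};

// Global-neighbourhood PSO minimising f over [lo, hi]^dim.
PsoResult minimizePso(const std::function<double(const Vec&)>& f,
                      std::size_t dim, double lo, double hi,
                      std::size_t agents = 20, std::size_t iterations = 200,
                      unsigned seed = 1) {
    assert(dim > 0 && agents > 0 && lo < hi);
    const double w = 0.7, c1 = 1.5, c2 = 1.5;  // assumed parameter values
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(lo, hi), unit(0.0, 1.0);

    std::vector<Vec> x(agents, Vec(dim)), v(agents, Vec(dim, 0.0));
    std::vector<Vec> pbest(agents);
    std::vector<double> pbestVal(agents);
    PsoResult g{Vec(dim), 0.0};

    // Random initial positions; each agent's best is its starting point.
    for (std::size_t i = 0; i < agents; ++i) {
        for (double& xi : x[i]) xi = init(rng);
        pbest[i] = x[i];
        pbestVal[i] = f(x[i]);
        if (i == 0 || pbestVal[i] < g.value) {
            g.position = x[i];
            g.value = pbestVal[i];
        }
    }

    for (std::size_t it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < agents; ++i) {
            for (std::size_t d = 0; d < dim; ++d) {
                // Inertia + cognitive pull (own best) + social pull (swarm best).
                v[i][d] = w * v[i][d]
                        + c1 * unit(rng) * (pbest[i][d] - x[i][d])
                        + c2 * unit(rng) * (g.position[d] - x[i][d]);
                x[i][d] += v[i][d];
            }
            const double val = f(x[i]);
            if (val < pbestVal[i]) { pbestVal[i] = val; pbest[i] = x[i]; }
            if (val < g.value) { g.value = val; g.position = x[i]; }
        }
    }
    return g;
}
```

The sketch leaves positions unclamped after initialization and uses the global neighbourhood only; the geographical and social neighbourhoods would replace g with a per-agent neighbourhood best.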
Introduction to the PSO: Different
Approaches
• Several approaches
– 2-D Otsu PSO
– Active Target PSO
– Adaptive PSO
– Adaptive Mutation PSO
– Adaptive PSO Guided by Acceleration Information
– Attractive Repulsive Particle Swarm Optimization
– Binary PSO
– Cooperative Multiple PSO
– Dynamic and Adjustable PSO
– Extended Particle Swarms
– …
Davoud Sedighizadeh and Ellips Masehian, “Particle Swarm Optimization Methods, Taxonomy and Applications”.
International Journal of Computer Theory and Engineering, Vol. 1, No. 5, December 2009
On solving Multiobjective Bin Packing Problem
Using Particle Swarm Optimization
• Objectives
– Minimize the number of bins used K
– Minimize the average deviation between the overall
centre of gravity and the desired one
PSO for the BPP:
Initialization
OR
PSO for the BPP:
Algorithm
1st Stage:
• Partial Swap between 2 bins
• Merge 2 bins
• Split 1 bin
2nd Stage:
• Random rotation
3rd Stage:
• Random shuffle
HMOPSO: Hybrid Multi-Objective Particle Swarm Optimization
• Definition of parameters:
[1] Wang, K. P., Huang, L., Zhou C. G. and Pang, W., “Particle Swarm Optimization for Traveling Salesman Problem,”
International Conference on Machine Learning and Cybernetics, vol. 3, pp. 1583-1585, 2003.
[2] Tan, K. C., Lee, T. H., Chew, Y. H., and Lee, L. H., “A hybrid multiobjective evolutionary algorithm for solving truck
and trailer vehicle routing problems,” IEEE Congress on Evolutionary Computation, vol. 3, pp. 2134-2141, 2003.
PSO for the BPP:
Simulation Results
• Comparison on the performance of metaheuristic
algorithms against the branch and bound method (BB) on
single objective BPP
• Outperforms MOPSO and MOEA in most of the test cases used in this
paper
Design Methodology for Embedded Memories
• https://hal.archives-ouvertes.fr/hal-00181196/document
• https://www.ics.uci.edu/~dutt/pubs/bc12-hipc02-panda.pdf
• http://www.interradesign.com/pdf/MC2_Datasheet.pdf
SystemC
Design Challenges
Reference: http://www.doulos.com
Silicon complexity vs. software complexity
• Silicon complexity is growing 10x every 6 years
• Software in systems is growing faster than 10x every 6 years
Reference: http://www.doulos.com
Increasing Complexity in SoC Designs
– Based on functionality
– Hardware and software
Traditional System Design Flow (1/2)
• Integration problems
• High cost and long iteration
Traditional System Design Flow (2/2)
[Figure: traditional system design flow]
• System Level Design
– Hardware and Software
– Algorithm Development
– Processor Selection
– Done mainly in C/C++ (C/C++ environment)
• IC Development
– Hardware
– Implementation Decisions
– Done mainly in HDL (EDA environment)
• Software Design
– Code Development
– RTOS details
– Done mainly in C/C++ (C/C++ environment)
• Verification Process ($$$)
Reference: Synopsys
Typical Project Schedule
[Figure: project timeline with the following phases]
• System Design
• Hardware Design
• Prototype Build
• Hardware Debug
• Software Design
• Software Coding
• Software Debug
Former Front-End Design Flow
[Figure: a C/C++ system level model is converted by hand into Verilog with a Verilog testbench; analysis, simulation, and synthesis follow, with results fed back to refine the design]
Reference: DAC 2002 SystemC Tutorial
Problems with the Design Flow
[Figure: front-end flow in which a C/C++ system level model is converted by hand into Verilog with a Verilog testbench, followed by analysis, simulation, synthesis, results, and refinement, annotated with the problems below]
• Not reusable
• Not done by designers
• The netlist is not preserved
Project Schedule with HW/SW Co-design
[Figure: project timeline with the following phases]
• System Design
• Hardware Design
• Prototype Build
• Hardware Debug
• Software Design
• Software Coding
• Software Debug
Modern System Design Flow
• Specification of the System
• Hardware and Software Partitioning
• Architectural Exploration
Outline
• Introduction
• System Modeling Languages
• SystemC Overview
• Data-Types
• Processes
• Interfaces
• Simulation Supports
• System Design Environments
• HW/SW Co-Verification
• Conclusion
Motivation to Use a Modeling Language
• The increasing system design complexity
• The demand for higher-level abstraction and modeling
• Traditional HDLs (Verilog, VHDL, etc.) are not suitable for system level design
– Lack of software support
• To enable an efficient system design flow
Requirements of a System Design Language
• Support system models at various levels of abstraction
• Incorporation of embedded software portions of a complex system
– Both models and implementation-level code
• Creation of executable specifications of design intent
• Creation of executable platform models
Requirements of a System Design Language
• Fast simulation speed to enable design-space exploration
– Both functional specification and architectural implementation alternatives
• Constructs allowing the separation of system function from system communications
– In order to allow flexible adaptation and reuse of both models and implementation
• Based on a well-established programming language
– In order to capitalize on the extensive infrastructure of capture, compilation, and debugging tools already available
Model Accuracy Requirements
• Structural accuracy
• Timing accuracy
• Functional accuracy
System Level Language
[Figure: taxonomy of system-level modeling languages]
• C/C++ Based: SystemC, Cynlib, SoC++, Handel-C, A/RT (Library)
• VHDL/Verilog Replacements: VHDL+, SystemVerilog
• Higher-level Languages: SDL, SLDL
• Entirely New Languages: SUPERLOG
• Java-Based: Java
Language Use
                     C/C++      SystemC 2.0  TestBuilder, OpenVera, e  Verilog, VHDL  SUPERLOG
Embedded SW          Very Good  Good         NO                        NO             NO
System Level Design  OK         Excellent    NO                        Very Poor      Good
Verification         OK         Good         Excellent                 OK             Very Good
RTL Design           NO         Good         NO                        Excellent      Excellent
[Figure: Verilog evolution from Verilog-1995 (earliest Verilog) through Verilog-2001 to SystemVerilog 3.0/3.1]
Why C/C++ Based Language for System Modeling
• Specification between architects and implementers is executable
• High simulation speed due to the higher level of abstraction
• Refinement, no translation into HDL (no “semantic gap”)
• Testbench re-use
Advantages of Executable Specifications
• Ensure the completeness of the specification
– Even components (e.g. peripherals) are so complex
– Create a program that behaves the same way as the system
• Avoid ambiguous interpretation of the specification
– Avoids unspecified parts and inconsistencies
– IP customer can evaluate the functionality up-front
• Validate system functionality before implementation
– Create early model and validate system performance
• Refine and test the implementation of the specification
– Test automation improves Time-to-Market
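Can Traditional C++ Standard Be Used?
[Figure: SystemC build flow. Source files for the system and testbenches, together with the SystemC header files and SystemC libraries, go through a standard compiler and linker (with make and a debugger) to produce an executable, which is the simulator]
Reference: DAC 2002 SystemC Tutorial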
SystemC Language Architecture
[Figure: SystemC language layering architecture; recoverable labels: Not-standard, Event-Driven Simulation Kernel, C++ Language Standard]
System Abstraction Level (1/3)
• Untimed Functional Level (UTF)
– Refers to both the interface and functionality
– Abstract communication channels
– Processes executed in zero time but in order
– Transport of data executed in zero time
• Timed Functional Level (TF)
– Refers to both the interface and functionality
– Processes are assigned an execution time
– Transport of data is assigned a time
– Latency modeled
– “Timed” but not “clocked”
System Abstraction Level (2/3)
• Bus Cycle Accurate (BCA)
– Transaction Level Model (TLM)
– Model the communications between system modules using shared resources such as busses
– Bus cycle accurate or transaction accurate
• No pin-level details
• Pin Cycle Accurate (PCA)
– Fully described by HW signals and the communications protocol
– Pin-level details
– Clocks used for timing
System Abstraction Level (3/3)
• Register Transfer Accurate
– Fully timed
– Clocks used for synchronization
– Complete functional details
• Every register for every cycle
• Every bus for every cycle
• Every bit described for every cycle
– Ready to RTL HDL
Core Language
• Time Model
– To define time unit and its resolution
• Event-Driven Simulation Kernel
– To operate on events and switch between processes, without knowing what the events actually represent or what the processes do
• Modules and Ports
– To represent structural information
• Interfaces and Channels
– To describe the abstraction of communication between the design blocks
Time Model
Time Model (cont’)
• Time resolution
– Must be specified before any time objects (e.g. sc_time) are created
– Default value is one picosecond (10^-12 s)
• Time units
– SC_FS femtosecond
– SC_PS picosecond
– SC_NS nanosecond
– SC_US microsecond
– SC_MS millisecond
– SC_SEC second
• Example for 42 picoseconds
sc_time(42, SC_PS)
• Example for resolution
sc_set_time_resolution(10, SC_PS);
sc_time T2(3.1416, SC_NS);
T2 would be rounded to 3140 ps
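The arithmetic behind the T2 example can be checked with a few lines of plain C++ that do not depend on the SystemC library. The helper name quantizeToResolutionPs is hypothetical, and rounding to the nearest multiple of the resolution is an assumption about how the value is snapped; for this particular value, truncation would give the same answer.

```cpp
#include <cassert>
#include <cstdint>

// Snaps a time given in picoseconds to a multiple of the time resolution
// (also in picoseconds), rounding to the nearest multiple.
std::int64_t quantizeToResolutionPs(double valuePs, std::int64_t resolutionPs) {
    assert(resolutionPs > 0 && valuePs >= 0.0);
    const std::int64_t ticks =
        static_cast<std::int64_t>(valuePs / static_cast<double>(resolutionPs) + 0.5);
    return ticks * resolutionPs;
}
```

With a 10 ps resolution, 3.1416 ns is 3141.6 ps, which is 314.16 resolution steps; that becomes 314 steps, or 3140 ps, matching the slide.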
Modules
• The basic building blocks for partitioning a design
• Declared with the SystemC keyword SC_MODULE
• Typically contain
– Ports that communicate with the environment
– Processes that describe the functionality of the module
[Figure label: RTL process]
Modules - Example
[Figure: FIFO block with ports Load, Read, Data, Full, Empty]
SC_MODULE(FIFO) {
    // ports, process, internal data, etc.
    sc_in<bool> load;
    sc_in<bool> read;
    sc_inout<int> data;
    sc_out<bool> full;
    sc_out<bool> empty;
    SC_CTOR(FIFO) {
        // body of constructor;
        // process declaration, sensitivities, etc.
    }
};
Module Instantiation
[Figure: top module instantiating submodules by positional association and by named association]
Similar Control Flow Descriptions
• IF
• CASE
• FOR
Data-Types
• SystemC allows users to use any C++ data types as well as unique SystemC data types
– sc_bit – 2 value single bit type
– sc_logic – 4 value single bit type
– sc_int – 1 to 64 bit signed integer type
– sc_uint – 1 to 64 bit unsigned integer type
– sc_bigint – arbitrary sized signed integer type
– sc_biguint – arbitrary sized unsigned integer type
– sc_bv – arbitrary sized 2 value vector type
– sc_lv – arbitrary sized 4 value vector type
– sc_fixed – templated signed fixed point type
– sc_ufixed – templated unsigned fixed point type
– sc_fix – untemplated signed fixed point type
– sc_ufix – untemplated unsigned fixed point type
Type sc_bit
[Table: sc_bit operators]
For example:
sc_bit a, b; // Declaration
a = a & b;
a = a | b;
Type sc_logic
• The sc_logic has 4 values, ’0’ (false), ’1’ (true), ’X’ (unknown), and ’Z’ (high impedance or floating)
• This type can be used to model designs with multi-driver busses, X propagation, startup values, and floating busses
• The most common type in RTL simulation
[Table: sc_logic operators]
• The C++ int type is machine dependent, but usually 32 bits
• SystemC integer type provides integers from 1 to 64 bits in signed and unsigned forms
• sc_int<n>
– A Fixed Precision Signed Integer
– 2’s complement notation
• sc_uint<n>
– A Fixed Precision Unsigned Integer
The Operators of sc_int<n> and sc_uint<n>
Bit Select and Part Select
The Examples of sc_int<n> and sc_uint<n>
sc_int<64> x;   // declaration example
sc_uint<48> y;  // declaration example
sc_int<16> x, y, z;
z = x & y;      // perform and operation on x and y bit by bit
z = x >> 4;     // assign x shifted right by 4 bits to z
mybit = myint[7];     // bit select
sc_uint<4> inta;
sc_uint<4> intb;
sc_uint<8> intc;
intc = (inta, intb);  // concatenation operation
Arbitrary Precision Signed and Unsigned Integer Types
• For the cases where some operands have to be larger than 64 bits, sc_int and sc_uint will not work
• The sc_biguint (arbitrary size unsigned integer) or sc_bigint (arbitrary sized signed integer) can solve this problem
• These types allow the designer to work on integers of any size, limited only by underlying system limitations
• Arithmetic and other operators also use arbitrary precision when performing operations
• These types execute more slowly than their fixed precision counterparts and therefore should only be used when necessary
The Operators of the sc_bigint<n> and sc_biguint<n>
• Type sc_bigint is a 2’s complement signed integer of any size
• Type sc_biguint is an unsigned integer of any size
• The precision used for the calculations depends on the sizes of the operands used
Arbitrary Length Bit Vector – sc_bv<n>
The New Operators for sc_bv<n>
• The new operators perform bit reduction
– and_reduce()
– or_reduce()
– xor_reduce()
sc_bv<64> databus;
sc_logic result;
result = databus.or_reduce();
If databus contains 1 or more 1 values the result of the reduction will be 1.
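For readers without a SystemC installation, the three reductions can be mimicked on a standard std::bitset. This is only an analogy: the free functions andReduce, orReduce, and xorReduce are hypothetical helpers written for this sketch and are not part of SystemC, where the reductions are member functions of the vector types.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// AND of all bits: true only if every bit is 1.
template <std::size_t N>
bool andReduce(const std::bitset<N>& bits) { return bits.all(); }

// OR of all bits: true if at least one bit is 1.
template <std::size_t N>
bool orReduce(const std::bitset<N>& bits) { return bits.any(); }

// XOR of all bits: true if the number of 1 bits is odd (parity).
template <std::size_t N>
bool xorReduce(const std::bitset<N>& bits) { return bits.count() % 2 == 1; }

// Mirrors the slide's databus example: one set bit makes the OR reduction 1.
inline void checkDatabusExample() {
    std::bitset<64> databus;
    databus.set(7);
    assert(orReduce(databus));
}
```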
Arbitrary Length Logic Vector – sc_lv<n>
• The sc_lv types cannot be used in arithmetic operations directly
The Operators of the sc_lv<n>
Bitwise &(and) |(or) ^(xor) ~(not)
Assignment = &= |= ^=
Equality == !=
sc_uint<16> uint16;
sc_int<16> int16;
sc_lv<16> lv16;
lv16 = uint16;  // convert uint to lv
int16 = lv16;   // convert lv to int
Fixed Point Types
• For a high level model, floating point numbers are useful to model arithmetic operations
• Floating point numbers can handle a very large range of values and are easily scaled
• Floating point data types are typically converted or built as fixed point data types to minimize the amount of hardware cost
• To model the behavior of fixed point hardware, designers need bit accurate fixed point data types
• Fixed point types are also used to develop DSP software
Fixed Point Types (cont’)
• There are 4 basic types used to model fixed point types in SystemC
– sc_fixed
– sc_ufixed
– sc_fix
– sc_ufix
• Types sc_fixed and sc_fix specify a signed fixed point data type
• Types sc_ufixed and sc_ufix specify an unsigned fixed point data type
Fixed Point Types (cont’)
• Types sc_fixed and sc_ufixed use static arguments to specify the functionality of the type
– Static arguments must be known at compile time
• Types sc_fix and sc_ufix can use argument types that are non-static
– Non-static arguments can be variables
– Types sc_fix and sc_ufix can use variables to determine word length, integer word length, etc.
Syntax of the Fixed Point Types
sc_fixed<wl, iwl, q_mode, o_mode, n_bits> x;
sc_ufixed<wl, iwl, q_mode, o_mode, n_bits> y;
sc_fix x(list of options);
sc_ufix y(list of options);
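To make the roles of wl (word length) and iwl (integer word length) concrete, the following sketch quantizes a real value to a signed fixed point format in plain C++. It is not the SystemC implementation: the function name quantize_signed is hypothetical, and the behavior is fixed to truncation toward minus infinity with saturation on overflow, which is only one of the possible q_mode/o_mode combinations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize `value` to a signed fixed point format with `wl` total bits and
// `iwl` integer bits (sign bit included), then convert back to double.
// Assumed behavior: truncate toward minus infinity, saturate on overflow.
double quantize_signed(double value, int wl, int iwl) {
    assert(wl > 0 && wl <= 32 && iwl >= 0 && iwl <= wl);
    const int frac_bits = wl - iwl;
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    const std::int64_t max_raw = (std::int64_t{1} << (wl - 1)) - 1;
    const std::int64_t min_raw = -(std::int64_t{1} << (wl - 1));

    double raw = std::floor(value * scale);            // quantization step
    if (raw > static_cast<double>(max_raw)) raw = static_cast<double>(max_raw);
    if (raw < static_cast<double>(min_raw)) raw = static_cast<double>(min_raw);
    return raw / scale;
}
```

For wl = 8 and iwl = 4 the resolution is 1/16, so 1.3 becomes 1.25, and values outside the range [-8, 7.9375] are clamped to the nearest end of that range.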
Overflow Modes
The Operators of Fixed Point
Outline
• Introduction
• System Modeling Languages
• SystemC Overview
• Data-Types
• Processes
• Interfaces
• Simulation Supports
• System Design Environments
• HW/SW Co-Verification
• Conclusion
Processes
• Processes are the basic unit of execution within SystemC
• Processes are called to simulate the behavior of the target device or system
• Processes provide the mechanism of concurrent behavior to model electronic systems
• A process must be contained in a module
• Processes cannot be hierarchical
– No process will call another process directly
• Processes can call methods and functions that are not processes
Processes (cont’)
• Processes have sensitivity lists
– A list of signals that cause the process to be invoked whenever the value of a signal in this list changes
Process – SC_METHOD
• A method that does not have its own thread of execution
– Cannot call code that uses wait()
• Executed when events (value changes) occur on the sensitivity list
• When a method process is invoked, it executes until it finishes and then returns control to the simulation kernel
• Users are strongly recommended not to write infinite loops within a method process
– Control would never be returned to the simulator
SC_METHOD (Example)
// rcv.h
#include "systemc.h"
#include "frame.h"

SC_MODULE(rcv) {
  sc_in<frame_type> xin;
  sc_out<int> id;
  void extract_id();
  SC_CTOR(rcv) {
    SC_METHOD(extract_id);
    sensitive << xin;      // run extract_id whenever xin changes
  }
};

// rcv.cc
#include "rcv.h"
#include "frame.h"

void rcv::extract_id() {
  frame_type frame;
  frame = xin;             // read the input port
  if (frame.type == 1) {
    id = frame.ida;
  } else {
    id = frame.idb;
  }
}
Process – SC_THREAD
• A thread process can be suspended and reactivated
• A thread process can contain wait() functions that suspend process execution until an event occurs on the sensitivity list
• An event will reactivate the thread process from the statement at which it was last suspended
• The process will continue to execute until the next wait()
SC_THREAD (Example of a Traffic Light)
// traff.h
#include "systemc.h"

SC_MODULE(traff) {
  // input ports
  sc_in<bool> roadsensor;
  sc_in<bool> clock;
  // output ports
  sc_out<bool> NSred;
  sc_out<bool> NSyellow;
  sc_out<bool> NSgreen;
  sc_out<bool> EWred;
  sc_out<bool> EWyellow;
  sc_out<bool> EWgreen;

  void control_lights();
  int i;

  // Constructor
  SC_CTOR(traff) {
    SC_THREAD(control_lights);   // Thread Process
    sensitive << roadsensor;
    sensitive_pos << clock;
  }
};

// traff.cc
#include "traff.h"

void traff::control_lights() {
  NSred = false;
  NSyellow = false;
  NSgreen = true;
  EWred = true;
  EWyellow = false;
  EWgreen = false;
  while (true) {
    while (roadsensor == false)
      wait();
    NSgreen = false;    // road sensor triggered
    NSyellow = true;    // set NS to yellow
    NSred = false;
    for (i = 0; i < 5; i++)
      wait();
    NSgreen = false;    // yellow interval over
    NSyellow = false;   // set NS to red
    NSred = true;       // set EW to green
    EWgreen = true;
    EWyellow = false;
    EWred = false;
    for (i = 0; i < 50; i++)
      wait();
    ...
Process – SC_CTHREAD
• Clocked thread process is a special case of the thread processes
• A clocked thread process is only triggered on one edge of one clock
– Matches the way that hardware is typically implemented with synthesis tools
• Clocked threads can be used to create implicit state machines within design descriptions
• Implicit state machine
– The states of the system are not explicitly defined
– The states are described by sets of statements with wait() function calls between them
• Explicit state machine
– The state machine states are defined in a declaration
– A case statement is used to move from state to state
SC_CTHREAD (Example of a BUS function)
[Figure: a process or channel (owner of the event) notifies an event, which triggers other processes]
Events in Classical Hardware Modeling
• A hardware signal is responsible for notifying the event whenever its value changes
– A signal of Boolean type has two additional events
• One associated with the positive edge
• One associated with the negative edge
– A more complex channel, such as a FIFO buffer
• An event associated with the change from being empty to having a word written to it
• An event associated with the change from being full to having a word read from it
Relationship Between the Events
• An event object may also be used directly by one process P1 to control another process P2
– If P1 has access to event object E and P2 is sensitive to or waiting on E, then P1 may trigger the execution of P2 by notifying E
– In this case, event E is not associated with the change in a channel, but rather with the execution of some path in P1
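The P1/P2 relationship described above can be sketched in plain C++. This is a toy model under stated assumptions: the Event class, P2 struct and p1_step function are hypothetical names, callbacks run immediately on notify(), and nothing here reproduces the scheduling rules of the SystemC kernel.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy event: processes register a callback, notify() runs all of them.
// Not the sc_event API, only an illustration of "P1 notifies E, P2 runs".
class Event {
public:
    void add_waiter(std::function<void()> process) {
        waiters_.push_back(std::move(process));
    }
    void notify() {
        for (auto& process : waiters_) process();
    }
private:
    std::vector<std::function<void()>> waiters_;
};

// P2 is sensitive to E; it only counts how often it has been triggered.
struct P2 {
    int runs = 0;
    void run() { ++runs; }
};

// P1 notifies E on one particular path of its execution. No channel value
// changes here; the event is tied to the control flow of P1.
inline void p1_step(Event& e, bool take_notifying_path) {
    if (take_notifying_path) e.notify();
}
```

Registering P2 with `e.add_waiter([&p2] { p2.run(); })` and then calling `p1_step(e, true)` runs P2 once, while `p1_step(e, false)` leaves it untouched.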
Sensitivity
• The sensitivity of a process defines when this process will be resumed or activated
• A process can be sensitive to a set of events
Static Sensitivity
• Static sensitivity list
– In a module, the sensitivity lists of events are determined before simulation begins
– The list remains the same throughout simulation
Dynamic Sensitivity
• It is possible for a process to temporarily override its static sensitivity list
– During simulation, a thread process may suspend itself and designate a specific event E as the current event on which it wishes to wait
– Then, only the notification of E will cause the thread process to be resumed
– The static sensitivity list is ignored
Dynamic Sensitivity – wait()
wait(E)
next_trigger(200, SC_NS, E)
Otherwise, when the timeout expires, the method process will be triggered and its static sensitivity list will be back in effect
Special Dynamic Sensitivity for SC_CTHREAD – wait_until()
Special Dynamic Sensitivity for SC_CTHREAD – watching()
Special Dynamic Sensitivity for SC_CTHREAD – watching() (Cont’)
// datagen.h
#include "systemc.h"

SC_MODULE(data_gen) {
  sc_in_clk clk;
  sc_inout<int> data;
  sc_in<bool> reset;
  void gen_data();
  SC_CTOR(data_gen) {
    SC_CTHREAD(gen_data, clk.pos());
    watching(reset.delayed() == true);   // restart gen_data when reset is true
  }
};

// datagen.cc
#include "datagen.h"

void data_gen::gen_data() {   // member function, so it needs the data_gen:: scope
  if (reset == true) {
    data = 0;
  }
  while (true) {
    data = data + 1;
    wait();
    data = data + 2;
    wait();
    data = data + 4;
    wait();
  }
}
Example of Modules, Ports, Interfaces, and Channels
[Figure: Module 1 and Module 2 connected through a hierarchical channel (HC); labels: module with a port, interface, primitive channel, hierarchical channel with a port, port-channel binding]
Interfaces
• The "windows" into channels that describe the set of operations
• Define sets of methods that channels must implement
• Specify only the signature of each operation, namely, the operation’s name, parameters, and return value
• Neither specify how the operations are implemented nor define data fields
Interfaces (cont’)
• All interfaces must be derived, directly or indirectly, from the abstract base class sc_interface
• The concept of interface is useful to model layered design
– Connection between modules which are at different levels of abstraction
• Relationship with ports
– Ports are connected to channels through interfaces
– A port that is connected to a channel through an interface sees only those channel methods that are defined by the interface
Interface Examples
• All interface methods are pure virtual methods without any implementation
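The interface, channel and port roles can be sketched in plain C++ as follows. All names (read_if, write_if, register_channel, port) are hypothetical and chosen for illustration; in SystemC the interfaces would additionally derive from sc_interface and the port would be an sc_port.

```cpp
#include <cassert>

// Interfaces: only method signatures, all pure virtual, no data fields.
class read_if {
public:
    virtual ~read_if() = default;
    virtual int read() = 0;
};

class write_if {
public:
    virtual ~write_if() = default;
    virtual void write(int value) = 0;
};

// Channel: implements both interfaces and holds the data.
class register_channel : public read_if, public write_if {
public:
    int read() override { return value_; }
    void write(int value) override { value_ = value; }
private:
    int value_ = 0;
};

// Toy port: bound to a channel through one interface, so only the methods
// of that interface are reachable through operator->.
template <typename IF>
class port {
public:
    void bind(IF& channel) { channel_ = &channel; }
    IF* operator->() const {
        assert(channel_ != nullptr);  // the port must be bound before use
        return channel_;
    }
private:
    IF* channel_ = nullptr;
};
```

A `port<read_if>` bound to a register_channel can call `read()` but not `write()`, which mirrors the statement that a port sees only the channel methods defined by its interface.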
Ports
Ports (cont’)
• A port can have three different modes of operation
– Input (sc_in<T>)
– Output (sc_out<T>)
– Inout (sc_inout<T>)
Ports (cont’)
• A port of a module can be connected to
– Zero or more channels at the same level of hierarchy
– Zero or more ports of its parent module
– At least one interface or port
• sc_port allows accessing a channel’s interface methods by using operator-> or operator[]
• In the following example:
– "input" is an input port of a process
– read() is an interface method of the attached channel
Access Ports
Read/Write through methods
Hardware types cannot be accessed directly
Specialized Ports
• Specialized ports can be created by refining port base class sc_port or one of the predefined port types
– Addresses are used in addition to data
• Bus interface
– Additional information on the channel’s status
• The number of samples available in a FIFO/LIFO
– Higher forms of sensitivity
• Wait_for_request()
Port-less Channel Access
• In order to facilitate IP reuse and to enable tool support, SystemC 2.0 defines the following mandatory design style
– Design style for inter-module level communication
– Design style for intra-module level communication
Port-less Channel Access (cont’)
• For inter-module level communication, ports must be used to connect modules to channels
– Ports are handles for communicating with the "outside world" (channels outside the module)
– The handles allow for checking design rules and attaching communication attributes, such as priorities
– From a software point of view they can be seen as a kind of smart pointer
• For intra-module level communication, direct access to channels is allowed
– Without using the ports
– Access a channel’s interface in a "port-less" way by directly calling the interface methods
Channels
• A channel implements one or more interfaces, and serves as a container for communication functionality
• A channel is the workhorse for holding and transmitting data
• A channel is not necessarily a point-to-point connection
• A channel may be connected to more than two modules
• A channel may vary widely in complexity, from a hardware signal to complex protocols with embedded processes
• SystemC 2.0 allows users to create their own channel types
Channels (cont’)
• Primitive channels
– Do not exhibit any visible structure
– Do not contain processes
– Cannot (directly) access other primitive channels
• Hierarchical channels
– Basically are modules
– Can have structure
– Can contain other modules and processes
– Can (directly) access other channels
Primitive Channels
• The hardware signal
– sc_signal<T>
• The FIFO channel
– sc_fifo<T>
• The mutual-exclusion lock (mutex)
– sc_mutex
The Hardware Signal – sc_signal<T>
• The semantics are similar to the VHDL signal
• sc_signal<T> implements the interface sc_signal_inout_if<T>
// controller.h
#include "statemach.h"

SC_MODULE(controller) {
  sc_in<sc_logic> clk;
  sc_out<sc_logic> count;
  sc_in<sc_logic> status;
  sc_out<sc_logic> load;
  sc_out<sc_logic> clear;
  sc_signal<sc_logic> lstat;
  sc_signal<sc_logic> down;
  state_machine *s1;   // declaration not shown on the slide, needed by the constructor

  SC_CTOR(controller) {
    // .... other module statements
    s1 = new state_machine("s1");
    s1->clock(clk);     // special case port to port binding
    s1->en(lstat);      // port en bound to signal lstat
    s1->dir(down);      // port dir bound to signal down
    s1->st(status);     // special case port to port binding
  }
};
The FIFO Channel – sc_fifo<T>
• Provides both blocking and nonblocking versions of access
• sc_fifo<T> implements the interfaces sc_fifo_in_if<T> and sc_fifo_out_if<T>
Blocking version
– If the FIFO is empty (on a read): suspend until more data is available
– If the FIFO is full (on a write): suspend until more space is available
Nonblocking version
– If the FIFO is empty (on a read): do nothing
– If the FIFO is full (on a write): do nothing
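The nonblocking behavior can be sketched with a small bounded FIFO in plain C++. The class bounded_fifo is a hypothetical stand-in, not sc_fifo<T>; its method names are chosen to resemble the nonblocking access style, and the blocking variants are omitted because suspending a process needs a simulation kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Bounded FIFO with nonblocking access only (illustrative stand-in).
template <typename T>
class bounded_fifo {
public:
    explicit bounded_fifo(std::size_t capacity) : capacity_(capacity) {}

    // Nonblocking write: if the FIFO is full, do nothing and return false.
    bool nb_write(const T& value) {
        if (items_.size() >= capacity_) return false;
        items_.push_back(value);
        return true;
    }

    // Nonblocking read: if the FIFO is empty, leave `out` untouched and
    // return false.
    bool nb_read(T& out) {
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

    std::size_t num_available() const { return items_.size(); }
    std::size_t num_free() const { return capacity_ - items_.size(); }

private:
    std::size_t capacity_;
    std::deque<T> items_;
};
```

The boolean return value lets the caller decide what to do when the access did not happen, for example retry on the next clock edge.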
The Mutual-Exclusion Lock (Mutex) – sc_mutex
• Models critical sections for accessing shared variables
• A process attempts to lock the mutex before entering a critical section
• If the mutex has already been locked by another process, the current process is suspended
Channel Design Rules
Channel Attributes
• Channel attributes can be used for a per-port configuration of the communication
• Channel attributes are especially helpful when modules are connected to a bus
• Attributes that can be used
Channel Attributes (Example)
• Let mod be an instance of a module and let port be a port of this module
// create a local channel
message_queue mq;
...
// connect the module port to the channel
mod.port( mq );
...
Hierarchical Channels
• To model the new generation of SoC communication infrastructures efficiently
• For instance, OCB (On Chip Bus)
– The standard backbone from VSIA
– The OCB consists of several intelligent units
• Arbiter unit
• A Control Programming unit
• Decode unit
• For modeling complex channels such as the OCB backbone, primitive channels are not very suitable
– Due to the lack of processes and structures
• For modeling this type of channel, hierarchical channels should be used
Primitive Channels vs. Hierarchical Channels
• Use primitive channels
– When you need to use the request-update scheme
– When channels are atomic and cannot reasonably be chopped into smaller pieces
– When speed is absolutely crucial (using primitive channels we can often reduce the number of delta cycles)
– When it doesn’t make any sense trying to build up a channel (such as a mutex) out of processes and other channels
Primitive Channels vs. Hierarchical Channels (Cont’)
• Use hierarchical channels
– When channels are truly hierarchical and users would want to be able to explore the underlying structure
– When channels contain processes
– When channels contain other channels
Clock Objects
• Clock objects are special objects which generate timing signals to synchronize events in the simulation
• Clocks order events in time so that parallel events in hardware are properly modeled by a simulator on a sequential computer
• Typically clocks are created at the top level of the design in the testbench and passed down through the module hierarchy to the rest of the design
Clock Objects (Example)
due to the use of wait
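Design Example: 4-bit LFSR
[Figure: 4-bit LFSR built from D flip-flops R1 to R4; label X0]
LFSR_env.cpp
#include "LFSR.h"

// Testbench generator: drives the reset_n signal
void LFSR_Gen::Gen_proc()
{
  while (1) {
    reset_n = 1;
    wait(2, SC_NS);
    reset_n = 0;        // hold reset_n low for 4 ns
    wait(4, SC_NS);
    reset_n = 1;
    wait(100, SC_NS);
  }
}

// Monitor: prints the current LFSR output value
void LFSR_Mon::Mon_proc()
{
  cout << "random= ";
  cout << random << "\n";
}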
Main.cpp
#include "LFSR.h"

int sc_main(int argc, char* argv[])
{
  sc_signal<bool> reset_n;
  sc_signal<unsigned int> random;
  sc_time t1(2, SC_NS);             // 1 cycle = 2 ns
  sc_clock clk("clk", t1, 0.5);     // 50% duty cycle

  LFSR_Gen M1("LFSR_Gen");
  M1(reset_n);                      // positional port binding
  LFSR M2("LFSR");
  M2(clk, reset_n, random);
  LFSR_Mon M3("LFSR_Mon");
  M3(random);

  sc_start(100, SC_NS);
  return 0;
}
Running Results
random= 0
random= 1
random= 0
random= 1
random= 3
random= 7
random= 14
random= 13
random= 11
random= 6
random= 12
random= 9
random= 2
random= 5
random= 10
random= 4
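The LFSR module itself (LFSR.h) is not reproduced in these slides. The printed sequence after reset (1, 3, 7, 14, 13, 11, 6, 12, 9, 2, 5, 10, 4) is consistent with a 4-bit register that shifts left and feeds XNOR(bit 3, bit 2) into bit 0. The sketch below uses that inferred feedback; the tap choice and the function names lfsr4_next and lfsr4_sequence are assumptions, not the original source.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One shift of a 4-bit LFSR: shift left, feed XNOR(bit3, bit2) into bit 0.
// Assumption: the taps are inferred from the printed results, not from LFSR.h.
inline std::uint8_t lfsr4_next(std::uint8_t state) {
    const unsigned b3 = (state >> 3) & 1u;
    const unsigned b2 = (state >> 2) & 1u;
    const unsigned feedback = (b3 ^ b2) ^ 1u;  // XNOR of the two taps
    return static_cast<std::uint8_t>(((state << 1) | feedback) & 0xFu);
}

// Collect the values produced by `steps` shifts starting from `start`.
inline std::vector<int> lfsr4_sequence(std::uint8_t start, int steps) {
    std::vector<int> out;
    std::uint8_t s = start;
    for (int i = 0; i < steps; ++i) {
        s = lfsr4_next(s);
        out.push_back(s);
    }
    return out;
}
```

With XNOR feedback the all-zero reset state is part of the sequence, and the all-ones state 15 is the lock-up state that maps to itself.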
The Supported Tools for SystemC
• Platform and Compiler
• System design environments
Platform and Compiler
• Typically, a standard C++ compiler can compile SystemC source code well
– SystemC is just an extended template (class) library
• GNU gcc for many platforms
• Sun with Solaris
– Forte C++
• HP
– HP aC++
• Intel with Microsoft OS
– MS Visual C++
System Design Environments
• Synopsys
– CoCentric System Studio (CCSS)
• Cadence
CoCentric System Level Design Platform
[Figure: CoCentric System Studio; labels: C/SystemC, Reference Design Kit, Performance Exploration, Processor Model, HW/SW Co-design]
• Algorithm libraries and Reference Design Kits
– Broadband Access: ADSL, DOCSIS cable modem
– Wireless: CDMA, Bluetooth, GSM/GPRS, PDC, DECT, EDGE
– Digital Video: MPEG-2, MPEG-4
– Broadcast standard: DAB, DVB
– Error Correcting Coding: RS coding, Hamming coding
– Speech Coding: ITU G.72X, GSM speech, AMR speech
• Simulation, Debugging and Analysis
– Mixing of architectural and algorithmic models in the same simulation
– Works with VCS, Verilog-XL, ModelSim; imports Matlab models for co-simulation
– Macro-debugging at the block level
– Micro-debugging at the source code level
– Davis
– VirSim
• Path to implementation
– Synthesizable SystemC code generated automatically
[Figure: implementation flow; labels: CoCentric System Studio, executable specification, C/C++/SystemC, CoCentric SystemC Compiler, C/C++ software implementation flow, Design Compiler, Physical Compiler]
Advanced Design System
[Figure: ADS; labels: DSP Designer, C/C++ models, HDL models, MATLAB models, measurement instrumentation]
Traditional HW/SW Verification Flow
Pre-Silicon Prototype
• Virtual prototype
– Simulation environment
• Emulator
– Hundreds of kHz
• Rapid prototype
– Combination of FPGAs and dedicated chips that can be interconnected to instantiate a design
– Tens of MHz
• Roll-Your-Own (RYO) prototype
– FPGA and board
– Tens of MHz
Virtual Prototypes
• Definition
– A simulation model of a product, component, or system
• Features
– Higher abstraction level
– Easy to set up and modify
– Cost-effective
– Great observability
– Shorter design cycles
Verification Speed
[Figure: verification speed relative to RTL (1X, with handshake and reset detail); labels: 10X, transaction level 100X, algorithm level 1000X]
HW/SW Co-Simulation
• Couple a software execution environment with a hardware simulator
• Provides complete visibility and debugger interface into each environment
• Software normally executed on an Instruction Set Simulator (ISS)
• A Bus Interface Model (BIM) converts abstract software operations into detailed pin operations
Advantages of HW/SW Co-Simulation (1/2)
• Simulate in minutes instead of days
• Early architecture closure reduces risk by 80%
Advantages of HW/SW Co-Simulation (2/2)
• Software Engineers
– Simulation model replaces stub code
– More time to develop & debug code
– Validate code against hardware as you develop
– Maintain software design integrity
• Hardware Engineers
– Embedded software replaces test bench
– Reduce the chance of an ASIC or board spin
– Resolve gray areas before tape out
Synopsys’s Solution
• System Studio
– SystemC simulation
• SystemC Compiler
– SystemC synthesis
• DesignWare
– AMBA/ARM models
[Figure: Synopsys flow; labels: SystemC, System Studio, DesignWare, SystemC Compiler, C/C++ compiler, Design Compiler/Physical Compiler, SoC. Reference: Synopsys]
[Figure: Synopsys System Studio; labels: architecture, algorithm, ARM9/AHB, SystemC simulation, hardware, software, debugger, memory, bus]
[Figure: unified single-kernel architecture; labels: Verilog, SystemC, VHDL, PSL/Sugar, AMS, algorithm, Acceleration-on-Demand. Reference: Cadence]
Conclusions
• System level design is a new design challenge
– Both hardware and software issues have to be considered
• High level abstraction and modeling are essential for future system design
– SystemC is one of the more mature languages, but not the only one
• Co-design methodology can reduce the design cycle
– Allows earlier HW/SW integration
• A virtual co-simulation environment is required
– Reduces the cost and design cycle of hardware prototypes
– Simulates 100x to 1000x faster than RTL by using models at a higher level of abstraction
• A hot and hard area for designers and EDA vendors
SystemC: Co-specification and
Embedded System Modeling
Module 4: SoC and NoC Interconnection Structures (7 hours, CO4)
SoC Buses Overview
AMBA bus
ASB (Advanced System Bus)
AHB (Advanced High-performance Bus)
APB (Advanced Peripheral Bus)
Avalon
CoreConnect
PLB (Processor Local Bus)
OPB (On-chip Peripheral Bus)
ST Bus
Type I (Peripheral protocol)
Type II (Basic Protocol)
Type III (Advanced protocol)
Wishbone
CoreFrame
Marble
PI bus
OCP
VCI (Virtual Component Interface)
SiliconBackplane Network
Introduction
SOC designs involve the integration of intellectual property (IP) cores, each separately designed and verified.
Overview: Interconnect Architectures
All of the above SOC modules need to communicate with each other for the system to operate properly. Interconnects carry the communication between them.
System-level issues and specifications to consider when choosing a suitable interconnect architecture:
1. Communication Bandwidth:
2. Communication Latency:
4. Concurrency Requirement:
2. Communication Latency:
For example:
Watching a movie that is a couple of seconds later than when it is
actually broadcast is of no consequence.
In contrast, even small, unanticipated latencies in a two - way mobile
communication protocol can make it almost impossible to carry out a
conversation.
Hence, Latency may or may not be important in terms of overall system
performance.
For a bus, this consists of an address with control bits (read/write, etc.)
and data.
For example:
A video camera captures pixel data at a rate governed by the video
standard used,
while a processor’s clock rate is usually determined by the
technology and architectural design.
As a result, IP blocks inside an SOC often need to operate at
different clock frequencies, creating separate timing regions known
as clock domains.
Crossing between clock domains can cause deadlock and
synchronization problems.
Bus: Basic Architecture
The performance of a computer system is heavily dependent on the characteristics of its interconnect architecture.
The speed at which the bus can operate is often limited by:
Bus: Basic Architecture
Arbitration and Protocols:
Some logic must be present to use the bus; otherwise, two units
may send signals at the same time, causing conflicts.
The bus master controls the bus paths using specific slave
addresses and control signals.
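As an illustration of the arbitration logic mentioned above, the following sketch grants the bus to exactly one requesting unit. The slides do not say which policy is used, so fixed priority (lowest index wins) is an assumption, and the function name arbitrate_fixed_priority is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the master that is granted the bus, or -1 if no
// master is requesting. Assumed policy: fixed priority, lowest index wins.
inline int arbitrate_fixed_priority(const std::vector<bool>& requests) {
    for (std::size_t i = 0; i < requests.size(); ++i) {
        if (requests[i]) return static_cast<int>(i);
    }
    return -1;
}
```

Because only one index is ever returned, two units can never drive the bus in the same cycle, which is the conflict the arbitration logic is there to prevent. A fixed priority scheme can starve low-priority masters, which is one reason other policies are also used in practice.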
The split bus has separate buses for each of these functions.
AMBA bus
AMBA stands for Advanced Microcontroller Bus Architecture.
The three distinct buses specified within the AMBA bus are:
ASB - First generation of AMBA system bus used for simple cost-
effective designs.
ASB supports:
burst transfer,
pipelined transfer operation and
multiple bus masters.
The bridge is the peripheral bus master, while all bus devices (timer, UART, PIA, etc.) are slaves.
AMBA AHB
AMBA AHB implements the features required for high-performance systems, including:
Burst transfers
Split transactions
Non-tristate implementation
AMBA AHB
A typical AMBA AHB system design contains the following components:
AHB master
AHB slave
AHB arbiter
The bus arbiter ensures that only one bus master at a time is
allowed to initiate data transfers.
AHB decoder
31
AMBA AHB
The APB bridge is the only bus master on the AMBA APB.
Drives the APB data onto the system bus for a read transfer.
35
AMBA APB
Bridge
Block diagram
of bridge module
37
CoreConnect Bus
39
CoreConnect Bus
PLB Bus : Processor Local Bus
Example of PLB
Interconnection
Single cycle transfer of data between OPB bus master and OPB
slaves, etc.
Instead of tristate drivers OPB uses distributed multiplexer.
42
CoreConnect Bus
OPB Bridge:
PLB masters gain access to the peripherals on the OPB bus through
The OPB bridge acts as a slave device on the PLB and a master on
the OPB.
It supports word (32-bit), half-word (16-bit) and byte read and write
The OPB bridge performs dynamic bus sizing, allowing devices with
43
CoreConnect Bus
DCR Bus: Device Control Register Bus
44
IBM CoreConnect Vs ARM AMBA Architectures
45
Bus Sockets and Bus Wrappers
Using a standard SOC bus for the integration of different reusable IP blocks has one major drawback.
OCP - Open Core Protocol.
The OCP defines a point-to-point interface between two communicating entities, such as two IP cores, using a core-centric protocol.
A system consisting of three IP core modules using the OCP and bus wrappers is shown in Figure.
One module is a system initiator, one is a system target, and another is both initiator and target.
Analytic Bus Models
Contention and Shared Bus:
When contention occurs, the requesting processor either (1) delays its request and stays idle until the resource is available, or (2) queues its request in a buffer and continues processing until the resource is available.
49
NOC: Networks on Chip
SoC Interconnection Structures
Overview
• Introduction to Networks on a Chip
• Bus and Point-to-point NoC Systems
• Routing Algorithms and Switching Techniques
• Flow Control
• NOC Topology Generation and Analysis
Chapter 5: Computer System Design – System on Chip by M.J. Flynn and W. Luk
Chapter 12: On-Chip Communication Architectures – SoC Interconnect by S. Pasricha & N. Dutt
System-on-Chip and NoC
System-on-Chip --to-- Network-on-Chip
[Figure: bus-based SoC and NoC; labels: CPU, MPEG core, DSP, VGA core, analog component, ADC/DAC, system bus, communication link, message (MSG), decoded message]
NOC and SOC Design 12
What is an NoC?
• Network-on-chip (NoC) is a packet switched on-chip communication network designed using a layered methodology
– "routes packets, not wires"
• NoCs use packets to route data from the source to the destination PE via a network fabric that consists of
– switches (routers)
– interconnection links (wires)
Network-on-Chip vs. Bus Interconnection
Network-on-Chip
• Total bandwidth grows
• Link speed unaffected
• Concurrent spatial reuse
• Pipelining is built-in
• Distributed arbitration
• Separate abstraction layers
However
• No performance guarantee
• Extra delay in routers
• Area and power overhead?
• Modules need NI
• Unfamiliar methodology
Bus
BUS inter-connection is fairly simple and familiar
However
• Bandwidth is limited, shared
• Speed goes down as N grows
• No concurrency
• Pipelining is tough
• Central arbitration
• No layers of abstraction (communication and computation are coupled)
Advanced Bus
• More General/Versatile bus architecture
• Pipelining capability
• Burst transfer
• Split transactions
• Overlapped arbitration
• Transaction preemption, resumption & reordering
Segmented Bus
[Figure: Shared Bus to Segmented Bus]
• NoC links:
Regular
Point-to-point -- no fan-out tree (problem)
Can use transmission-line layout
Well-defined current return path
• Can be optimized for noise / speed / power
Low swing, current mode, ….
Direct Topologies
each node has direct point-to-point link to a subset of
other nodes in the system called neighboring nodes
nodes consist of computational blocks and/or
memories, as well as a NI block that acts as a router e.g.
Nostrum, SOCBUS, Proteo, Octagon
as the number of nodes in the system increases, the total
available communication bandwidth also increases
fundamental trade-off is between connectivity and cost
NoC Topology
• Most direct network topologies have an orthogonal
implementation, where nodes can be arranged in an
n-dimensional orthogonal space
Routing for such networks is fairly simple
e.g. n-dimensional mesh, torus, folded torus, hypercube, and octagon
• 2D mesh is most popular topology
All links have the same length
• eases physical design
Chip area grows linearly with the number
of nodes
Must be designed in such a way as to
avoid traffic accumulating in the
center of the mesh
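To illustrate why routing in an orthogonal topology is considered simple, the sketch below implements dimension-order (XY) routing on a 2D mesh. XY routing is a common textbook choice and is used here as an assumption; the slides do not name a specific algorithm. Node, xy_route and hop_count are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// A mesh node identified by its column (x) and row (y).
struct Node {
    int x;
    int y;
};

inline bool operator==(const Node& a, const Node& b) {
    return a.x == b.x && a.y == b.y;
}

// Dimension-order (XY) routing: travel along X until the destination
// column is reached, then along Y. Returns every node visited, source first.
inline std::vector<Node> xy_route(Node src, Node dst) {
    std::vector<Node> path{src};
    Node cur = src;
    while (cur.x != dst.x) {
        cur.x += (dst.x > cur.x) ? 1 : -1;
        path.push_back(cur);
    }
    while (cur.y != dst.y) {
        cur.y += (dst.y > cur.y) ? 1 : -1;
        path.push_back(cur);
    }
    return path;
}

// Number of links traversed between two nodes of the mesh.
inline int hop_count(Node src, Node dst) {
    return std::abs(dst.x - src.x) + std::abs(dst.y - src.y);
}
```

Each router only compares its own coordinates with the destination, so no routing table is needed. The cost is that paths between opposite corners tend to cross the middle of the mesh, which relates to the traffic accumulation noted above.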
[Figure: 4x4 direct topology examples with nodes numbered 1 to 16]
[Figure: message, packet (header and payload), and flit formats; flit fields include Type and VC]
STALL/GO
Low overhead scheme
Requires only two control wires
• one going forward and signaling data availability
• the other going backward and signaling either a condition of buffers
filled (STALL) or of buffers free (GO)
Implement with distributed buffering (pipelining) along link
good performance – fast recovery from congestion
does not have any provision for fault handling
• higher level protocols responsible for handling flit interruption
Flow Control Schemes
T-Error
More aggressive scheme that can detect faults
• by making use of a second delayed clock at every buffer stage
Delayed clock re-samples input data to detect any inconsistencies
• then emits a VALID control signal
Re-synchronization stage added between end of link and receiving switch
• to handle offset between original and delayed clocks
Timing budget can be used to provide greater reliability by configuring links with appropriate spacing and frequency
Does not provide a thorough fault handling mechanism
Flow Control Schemes
ACK/NACK
When flits are sent on a link, a local copy is kept in a buffer by the sender
When an ACK is received by the sender, it deletes its copy of the flit from its buffer
When a NACK is received, the sender rewinds its output queue and starts resending flits, starting from the corrupted one
Implemented either end-to-end or switch-to-switch
Sender needs to have a buffer of size 2N + k
• N is number of buffers encountered between source and destination
• k depends on latency of logic at the sender and receiver
Overall a minimum of 3N + k buffers are required
Fault handling support comes at cost of greater power, area overhead
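The buffer requirements stated above can be turned into a small worked calculation. The struct and function names are hypothetical; the formulas 2N + k and 3N + k are taken directly from the slide.

```cpp
#include <cassert>

// Buffer requirements of ACK/NACK flow control, as stated on the slide:
// the sender needs 2N + k buffers and the scheme needs at least 3N + k
// buffers overall.
struct AckNackBuffers {
    int sender;         // 2N + k
    int total_minimum;  // 3N + k
};

// n: number of buffers encountered between source and destination.
// k: extra entries that depend on sender and receiver logic latency.
inline AckNackBuffers ack_nack_buffers(int n, int k) {
    assert(n >= 0 && k >= 0);
    return AckNackBuffers{2 * n + k, 3 * n + k};
}
```

For example, with N = 4 and k = 2 the sender needs 10 buffer entries and the scheme needs at least 14 in total, which shows how the cost grows with the distance covered by one ACK/NACK loop.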
[Figure: flits of a packet (header flit HF, body flits F2 to F4, tail flit TF) advancing hop by hop toward the destination (Dest)]
[Figure: router connected to a module or another router, with N, E, S, W ports; stages from FLIT IN to FLIT OUT: routing and buffers, VC allocation, arbitration, switch traversal]
• Targeted Network:
Best-effort, wormhole switched.
Lookup table based source routing.
No virtual channel support.
Round Robin switch output arbitration.
One NI per component master or slave interface.
All transactions converted to packets of the same length (flit
count).
Burst beats converted to separate packets.
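The round robin switch output arbitration listed above can be sketched as follows. The class name round_robin_arbiter and its interface are hypothetical; the sketch models one output port choosing among requesting inputs and is not taken from any specific router implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round robin arbiter for one switch output: the search for the next
// winner starts at the input after the most recent winner, so every
// requesting input is served within one full rotation.
class round_robin_arbiter {
public:
    explicit round_robin_arbiter(std::size_t inputs)
        : inputs_(inputs), last_(inputs > 0 ? inputs - 1 : 0) {}

    // Returns the granted input index, or -1 if no input is requesting.
    int grant(const std::vector<bool>& requests) {
        assert(requests.size() == inputs_);
        for (std::size_t offset = 1; offset <= inputs_; ++offset) {
            const std::size_t candidate = (last_ + offset) % inputs_;
            if (requests[candidate]) {
                last_ = candidate;
                return static_cast<int>(candidate);
            }
        }
        return -1;
    }

private:
    std::size_t inputs_;
    std::size_t last_;  // index of the most recent winner
};
```

With inputs 0 and 2 requesting continuously, successive calls grant 0, 2, 0, 2, so neither input can starve the other.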
System Input and Output
• Input:
Core Graph
Network Parameters
• Output:
Topology Graph
Route tables
Recommended Operating Clock Frequency
• Partitioning process.
[Figure: partitioning process, panels A and B]
Clock Frequency: 3.43 GHz
Therefore ρ = ρa = bus_trans_time / (compute_time + bus_trans_time)
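The occupancy formula can be checked with a short calculation. The function name bus_occupancy is hypothetical and the time unit is arbitrary, as long as both arguments use the same one.

```cpp
#include <cassert>

// Bus occupancy: fraction of time spent in bus transactions,
// rho = bus_trans_time / (compute_time + bus_trans_time).
inline double bus_occupancy(double compute_time, double bus_trans_time) {
    assert(compute_time >= 0.0 && bus_trans_time > 0.0);
    return bus_trans_time / (compute_time + bus_trans_time);
}
```

For example, 30 cycles of computation followed by a 10-cycle bus transaction give an occupancy of 10 / 40 = 0.25. The value approaches 1 as the compute time between transactions shrinks.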
Module-to-Module Communication
Point-to-point
Single bus
Multiple buses
Transfer              S1  S0  L2  L1  L0
R0 <= R2              1   0   0   0   1
R0 <= R1, R2 <= R1    0   1   1   0   1
R0 <= R1, R1 <= R0    Impossible
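The table above can be checked with a small model of a single shared bus. This is a sketch under stated assumptions: the select value S1 S0 is read as the index of the register that drives the bus (10 selects R2 and 01 selects R1, as in the table; 00 selecting R0 is assumed), and the function names are hypothetical.

```python
from itertools import product


def bus_cycle(regs, select, loads):
    """One clock cycle on a single shared bus.

    regs:   [R0, R1, R2] current values.
    select: S1 S0 as an integer; index of the register driving the bus.
    loads:  (L2, L1, L0) load enables.
    Returns the register values after the clock edge.
    """
    bus = regs[select]          # only one source can drive the bus
    l2, l1, l0 = loads
    result = list(regs)
    if l0:
        result[0] = bus
    if l1:
        result[1] = bus
    if l2:
        result[2] = bus
    return result


def reachable_in_one_cycle(regs, target):
    """True if some select/load setting produces target in one cycle."""
    return any(
        bus_cycle(regs, select, loads) == list(target)
        for select in range(len(regs))
        for loads in product((0, 1), repeat=3)
    )


regs = [10, 20, 30]
print(bus_cycle(regs, 0b10, (0, 0, 1)))           # R0 <= R2: [30, 20, 30]
print(bus_cycle(regs, 0b01, (1, 0, 1)))           # R0 <= R1, R2 <= R1: [20, 20, 20]
print(reachable_in_one_cycle(regs, [20, 10, 30])) # swap R0 and R1: False
```

The swap fails because it needs two different values on the bus in the same cycle, which a single bus cannot carry.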
COE838: SoC Design ©G. Khan 6
Single Bus Interconnection
[Figure: components attached to a shared bus made up of control lines, address lines, and data lines; a synchronous source component and a synchronous destination component connected by bus wires 2 … n]
Asynchronous buses:
Low level techniques: add repeaters
Summary
On-chip communication architectures are
critical components in SoC designs
◦ Power, performance, cost, reliability constraints
◦ Rapidly increasing in complexity with the number of cores
Review of basic concepts of (widely used)
bus-based communication architectures
Open Problems
◦ Designing communication architectures to satisfy
diverse and complex application constraints
• AMBA was first introduced by ARM in 1996. The first buses used in AMBA were
the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). The design
was an immediate success, and it was followed in 1999 by AMBA 2. This version
added the Advanced High-performance Bus (AHB), which uses a single clock-edge
protocol.
• In 2003, AMBA 3 introduced the Advanced eXtensible Interface (AXI), which
raised the performance of the interconnect further. It also brought the
Advanced Trace Bus (ATB), which is used in the CoreSight on-chip debug and
trace solution. This generation lasted until 2010, when it was superseded by
AMBA 4. That version extended AXI considerably and laid the foundation for
newer versions.
• In 2013, AMBA 5 introduced the Coherent Hub Interface (CHI), along with a
newly designed high-speed transport layer that helps reduce congestion. The
AMBA protocols are today widely regarded as a de facto industry standard for
embedded processors.
• As more functional blocks (IP) were integrated
into SoC designs, the shared bus protocols
(AHB/ASB) started hitting limitations.
• AMBA 3 therefore introduced a point-to-point
connectivity protocol: AXI (Advanced eXtensible
Interface). In 2010, an enhanced version,
AXI4, was introduced.
• The following diagram illustrates this evolution of
protocols along with the SoC design trends in
industry.
Advanced eXtensible Interface (AXI)
How AMBA Bus Works
• The AMBA bus was designed to provide the interconnect for SoC
applications and to let peripherals interface with each other
more efficiently. The purpose of the AMBA bus is to do the
following:
• ASB
• ASB supports features for high-performance systems such as burst transfers,
pipelined transfer operation, and multiple bus masters. It supports the
connection of many processors and memories. An ASB bus consists of masters,
slaves, an arbiter, and a decoder. The arbiter ensures that only one master
can access the bus at any time. The master initiates read and write
operations, and the slave responds to the read and write requests. The
decoder decodes the address and selects the appropriate slave.
• AHB
• AHB is specifically designed for high-performance designs. It supports
multiple bus masters and high-bandwidth operation. A typical
AMBA system design contains an AHB master, AHB slave, AHB arbiter, and
AHB decoder. It is used to connect components such as DMA, DSP, and
memory that require high bandwidth on a shared bus.
• AMBA AHB supports features required by high
bandwidth and high frequency designs:
• Burst transfers
• Split Transactions
• Wider data bus configurations (64/128 bits)
• Single-clock edge operation
• Single-cycle bus master handover
AXI
Advanced eXtensible Interface (AXI)
• AXI is a point-to-point interconnect protocol that
overcomes the limitations of shared bus protocols. It
targets high-performance and high-frequency systems,
with key features such as:
• Multiple outstanding transactions
• Out-of-order data completion
• Burst-based transactions with only start address issued
• Support for unaligned data transfers using strobes
• Simultaneous read and write transactions
• Pipelined interconnects for high speed operation
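"Burst-based transactions with only start address issued" means the receiving side works out the address of each later beat itself. A minimal sketch for the simplest case, an incrementing burst with an aligned start address, is shown below. The function name `incr_burst_addresses` is hypothetical, and wrapping bursts, fixed bursts, and unaligned starts are deliberately left out.

```python
def incr_burst_addresses(start_addr, beats, bytes_per_beat):
    """Addresses of each beat in an incrementing burst (illustrative sketch).

    start_addr:     the only address issued by the master.
    beats:          number of data transfers in the burst.
    bytes_per_beat: size of each transfer in bytes.
    Assumes start_addr is aligned to bytes_per_beat.
    """
    if beats < 1 or bytes_per_beat < 1:
        raise ValueError("beats and bytes_per_beat must be positive")
    if start_addr % bytes_per_beat != 0:
        raise ValueError("this sketch only handles aligned start addresses")
    # Each beat advances the address by the transfer size.
    return [start_addr + i * bytes_per_beat for i in range(beats)]


# Four 4-byte beats starting at 0x1000.
print([hex(a) for a in incr_burst_addresses(0x1000, 4, 4)])
# prints ['0x1000', '0x1004', '0x1008', '0x100c']
```

Only `start_addr` crosses the address channel; the remaining addresses are derived locally, which frees the address channel for other transactions.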
• ACE
The ACE protocol extends the AXI4 protocol with support for
hardware-coherent caches. The ACE coherency protocol
ensures that all masters see correct data for any address
location. This avoids software cache maintenance to maintain
coherency between caches. ACE also provides barrier
transactions that guarantee ordering of multiple
transactions within a system, and Distributed Virtual
Memory (DVM) functionality to manage virtual memory.
• CHI
The CHI protocol defines interfaces for the connection of fully
coherent processors. It is a packet-based, layered
communication protocol with Protocol, Link, and Network
layers. It is topology independent and provides a Quality of
Service (QoS) based mechanism to control resources in the
system. It supports high-frequency, non-blocking
coherent data transfers between processors, which provides
performance and scalability for applications such as data centers.
On-Chip Busses
• AMBA 2.0, 3.0 (ARM)
• CoreConnect (IBM)
• STBus (STMicroelectronics)
• Sonics Smart Interconnect (Sonics)
• Wishbone (OpenCores)
• Avalon (Altera)
• PI Bus (OMI)
• MARBLE (Univ. of Manchester)
• CoreFrame (PalmChip)
[Figure: transfer timing diagrams showing the Setup phase and Access phase, including a transfer with a wait state]
• one unidirectional address bus (HADDR)
• two unidirectional data buses (HWDATA, HRDATA)
• At any time only one active data bus
Simple AHB Transfer
no wait state
HTRANS: indicates current transfer type (e.g. idle, busy, nonseq, seq)
HMASTLOCK: indicates a locked (atomic) transfer sequence
Slave out/master in
HRDATA[31:0]: the slave read data bus
HREADY: indicates previous transfer is complete
HRESP: the transfer response (OKAY=0, ERROR=1)
Basic Read and Write – No Wait States
Pipelined Address & Data Transfer
[Figure annotation: Transaction B Starts]
Multi-master operation
• Must isolate masters
• Each master assigned to layer
• Interconnect arbitrates slave accesses
Full crossbar switch often not needed
• Slaves 1, 2, 3 are shared
• Slaves 4, 5 are local to Master 2
Arbiter
HBREQ_M1
HBREQ_M2
HBREQ_M3
[Figure: channel diagram with callouts "The transaction proceeds", "Here it is", and "I received that data correctly."]
• Independent channels synchronized with ID # or "tags"
• Information moves
only when:
Source is Valid, and
Destination is Ready
• Very flexible
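The rule that information moves only when the source is valid and the destination is ready can be sketched cycle by cycle. This is a simplified model, not a signal-accurate one: the function name `run_handshake` is hypothetical, and the source is assumed to assert valid whenever it still has data queued.

```python
def run_handshake(items, ready_pattern):
    """Cycle-by-cycle valid/ready handshake (illustrative sketch).

    items:         data the source wants to send, in order.
    ready_pattern: destination's ready signal, one bool per clock cycle.
    Returns a list of (cycle, item) for every completed transfer.
    """
    queue = list(items)
    transfers = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(queue)  # source is valid while it has data (assumption)
        if valid and ready:
            # Both sides agree in this cycle: one item moves.
            transfers.append((cycle, queue.pop(0)))
        # Otherwise the source holds its data and waits.
    return transfers


print(run_handshake(["D0", "D1"], [False, True, False, True, True]))
# prints [(1, 'D0'), (3, 'D1')]
```

Either side can stall the other simply by withholding its signal, which is what makes the scheme flexible.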
Write Data
Channel
PLB                    OPB                   DCR
• Pipelined            • Low bandwidth       • Low throughput
• Burst modes          • Burst mode          • 1 r/w = 2 cycles
• Split transactions   • Multiple masters    • Ring type data bus
• Multiple masters
Request and arbitrator logic
Simultaneous multi-master system that permits bus transfers
between two masters and two slaves.