Understanding Design Capacity in Hardware Emulators


January 9, 2015


Lauro Rizzatti, Verification Consultant

Verification Consultant Lauro Rizzatti explains that the three different types of hardware
emulator offer different design capacities, thereby giving users more options.

Unlike software simulators, whose specifications do not mention limits in design capacity, a primary specification of hardware emulators is the maximum size of the designs they can handle. More to the point, different emulators have different limits on design capacity. Why is this?
Let's first address the concept of design capacity in a software simulation environment. There is a good reason why the vendors of simulators, whether logic simulators at the register transfer level (RTL) and electronic system level (ESL), gate-level simulators, or even analog simulators, don't specify the maximum capacities of their tools.
A simulator is essentially a software
algorithm running on a computer. The
algorithm processes data representing a
design model described in a design language
at one of multiple hierarchical levels as
illustrated in Table 1.
Design Description Level    Description Language
Electronic System Level     SystemVerilog, SystemC, C++
Register Transfer Level     SystemVerilog, Verilog, VHDL
Gate Level                  Verilog Netlist, EDIF Netlist
Transistor Level            SPICE Netlist

Table 1. Design hierarchical levels and corresponding description languages.
The data representing the design model resides on the hard drive of the computer. When the simulator is invoked, that data is moved into the host computer's memory. If the entire design fits into the physical memory, the user can achieve the maximum speed of execution for that design with the given stimulus and the specific simulator. If the design is too large to fit in the memory, then only portions of the design will be loaded. When the processing of a particular portion is completed, the algorithm swaps out the processed design portion and swaps in the next design portion.
Clearly, the larger the memory, the larger
the size of the design portions that can be
moved in and out of memory. Two problems
that may arise are excessive memory
swapping and/or cache misses, both of
which can have deleterious effects on the
speed of execution.
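To make the swapping penalty concrete, here is a toy model, entirely my own illustration rather than any vendor's formula, of how effective simulation speed collapses once part of the design model must be served from disk:

    # Toy model of simulation slowdown due to memory swapping.
    # The swap_penalty factor is an assumed, illustrative number.

    def effective_speed(base_cps: float, fit_fraction: float, swap_penalty: float) -> float:
        """Effective cycles/second when only `fit_fraction` of the design
        model is memory-resident; the rest pays a `swap_penalty` slowdown."""
        return base_cps / (fit_fraction + (1.0 - fit_fraction) * swap_penalty)

    print(effective_speed(1000, 1.0, 50))  # 1000.0 cps: design fits in RAM
    print(effective_speed(1000, 0.5, 50))  # ~39.2 cps: half the model swaps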
The bottom line is that there is no hard limit to the design size any given software simulator can handle. Rather, when the design size reaches several tens of millions of gates, the speed of execution may drop to such an extent that it becomes impractical to simulate. Often, designers of high-end processors report simulation performance of less than one cycle per second, which is pathetically slow if the user needs to execute many millions of cycles.
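The arithmetic behind that complaint is easy to check; the workload size and the 1 MHz emulation figure below are illustrative assumptions, not vendor data:

    # Wall-clock time for a simulation workload at a given speed.
    # Workload size and speeds are illustrative assumptions.

    SECONDS_PER_DAY = 86_400

    def wall_clock_days(cycles: float, cycles_per_second: float) -> float:
        """Days of wall-clock time to execute `cycles` at `cycles_per_second`."""
        return cycles / cycles_per_second / SECONDS_PER_DAY

    print(wall_clock_days(10e6, 1.0))  # ~115.7 days at one cycle per second
    print(wall_clock_days(10e6, 1e6) * SECONDS_PER_DAY)  # 10.0 seconds at 1 MHz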
Hardware emulators are a completely different matter, and, to further complicate the story, not all hardware emulators are created equal. For this analysis, we can divide them into three main classes: processor-based emulators, custom FPGA-based emulators (also called emulators-on-chip), and standard FPGA-based emulators. The first two are based on custom chips; the third is built using arrays of commercial FPGAs.

Regardless of the type, all three emulator classes have limits in terms of the maximum design sizes they can handle, though there are differences.
A processor-based emulator vaguely
resembles a software simulator in that
the design database, stored in memory, is
processed by a computing engine made
up of a vast array of Boolean solvers or
processors; hence the name of this emulator
type. As in a simulator, the larger the
memory, the larger the design the user can
process. In a processor-based emulator,
unlike a simulator based on a single
algorithm (simulators can run in parallel on
computer farms, but each computer executes
one simulation algorithm), there are ultra-large arrays of Boolean solvers running in
parallel. This computing scheme allows for
a somewhat soft limit in design capacity.
A user can slightly exceed the maximum
capacity specified by the vendor, maybe by
as much as 10 percent, at the expense of a
drop in performance that may be significant.
By contrast, both the custom FPGA-based and the standard FPGA-based emulators map the design into the reprogrammable resources of the FPGAs (gate-for-gate, roughly speaking). The difference between the two arises from their internal architecture (see "What's The Difference Between FPGA And Custom Silicon Emulators?", Electronic Design, April 14, 2014).
The architecture of the custom FPGA used
in the emulator-on-chip assures a high
level of utilization of the reprogrammable resources, approaching 100 percent before running into routing congestion. Also, this
architecture is designed to ease the process
of partitioning a large design into a large
array of such custom FPGAs. Furthermore,
the architecture allows for extremely fast
place-and-route (P&R), in the ballpark of
five minutes per custom FPGA. However,
such a custom FPGA would make a poor
choice as a general-purpose FPGA, because
the capacity of custom FPGAs is a fraction
of that offered by the largest commercial
FPGAs. The end result is that the
architecture of the custom FPGA perfectly
suits the buildup of large arrays of FPGAs as
required in a modern emulation platform.
The architecture of a high-end commercial
FPGA offers the largest number of
reprogrammable resources, but at the
tradeoff of P&R processing time. At 90
percent utilization of the reprogrammable
resources, a successful compilation without
routing congestion may take more than
20 hours. When a commercial FPGA is used as a building block in an emulation platform, the emulator vendor may have to deal with hundreds of these FPGAs.
Partitioning and P&R become issues that
have to be addressed by the compilation
software. Looking at emulator specification
sheets, it appears that the utilization of the
commercial FPGA is limited to about 50
percent. This helps to dramatically reduce
compilation time.
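A back-of-the-envelope count makes the "hundreds of FPGAs" figure concrete; the per-device capacity below is a hypothetical number of my choosing, while the 50 percent utilization comes from the paragraph above:

    import math

    # Rough count of commercial FPGAs needed for a large design.
    # FPGA_CAPACITY_MG is a hypothetical figure, not a datasheet value.

    DESIGN_SIZE_MG = 1000   # a 1-billion-gate design, in millions of gates
    FPGA_CAPACITY_MG = 10   # assumed raw capacity of one commercial FPGA
    UTILIZATION = 0.5       # ~50 percent, per typical emulator spec sheets

    fpgas_needed = math.ceil(DESIGN_SIZE_MG / (FPGA_CAPACITY_MG * UTILIZATION))
    print(fpgas_needed)  # 200 devices under these assumptions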
Table 2 compares capacities and compilation
times among the three main types of
emulation platforms.
Specification                     Processor-based     Custom FPGA-based   Standard FPGA-based
Max design capacity per box       72MG                256MG               300MG
Max design capacity per system    2.3BG w/ 32 boxes   2BG w/ 8 boxes      3BG w/ 10 boxes
Compilation time                  70MG/hour           35MG/hour           5MG/hour

Table 2. Comparison of capacities and compilation times in processor-based, emulator-on-chip, and commercial FPGA-based emulators, based on vendor datasheets.

In summary, simulation and emulation are two very different verification tools, each with its own advantages and limitations. Simulator vendors offer no hard data on design size limits, whereas design teams largely judge emulators on design capacity and size. Simulation runs out of steam when it is required to run a large number of gates, while the three different types of hardware emulators offer different design capacities, thereby giving users more options.

Dr. Lauro Rizzatti is a verification consultant. He was formerly general manager of EVE-USA and its vice president of marketing before Synopsys' acquisition of EVE. Previously, Lauro held positions in management, product marketing, technical marketing, and engineering. He can be reached at lauro@rizzatti.com.
