FPGA Design Methods For Fast Turnaround
Angela Sutton, Staff Product Marketing Manager, FPGA Implementation, Synopsys

Today’s FPGAs are doubling in capacity every 2 years and have already surpassed the 5 million equivalent
ASIC gate mark. With designs of this magnitude, the need for fast flows has never been greater. At the
same time, designers are seeking rapid feedback on their ASIC or FPGA designs by implementing
quick prototypes or initial designs on FPGA-based boards. These prototypes or designs allow designers
to start development, verification and debug of the design, in the context of system software and
hardware, and also to fine-tune algorithms in the design architecture. Quick and intuitive debug iterations
to incorporate fixes are of great value. The ability to perform design updates that don’t completely
uproot all parts of the design that have already been verified is also a bonus! Whether the goal is
aggressive performance or to get a working initial design or prototype on the board as quickly as possible,
this paper provides information on traditional and new techniques that accelerate design and debug
iterations.
Introduction
FPGAs have been used for years to create functioning prototypes of ASIC and System on a Chip (SoC)
designs, and the popularity of this verification technique continues to increase. Indeed, well over 90% of
designs are prototyped using FPGAs. Such verification typically involves squeezing the evolving design,
eventually destined for an ASIC, into the largest, most capable FPGAs available on the market and then
debugging the design along with system software and drivers on the board.
Shorter iteration times for system debug ... results stability from one run to the next ... team and/or
parallel development flows ... ways to quickly make small changes ... the holy grail
In addition to prototyping, there is increasing demand for FPGAs in production systems. Growth in
capacity, functionality, and performance, accompanied by a decrease in price per gate of FPGAs fuels
this trend. Low-cost FPGA families such as Cyclone-IV from Altera and Spartan-6 from Xilinx offer
million+ ASIC gate equivalent capacity and come equipped with embedded RAM, microprocessors,
dedicated DSP blocks and Gigabit serial transceivers. Part prices in volume have become very attractive,
ranging from sub-$10 to the low hundreds of dollars. These production designs are typically both large
and challenging, driving a demand for ASIC-style iterative flows and fast design debug cycles. The cost
of a hardware design mistake or update may be lower in an FPGA than in an ASIC (simply repartition
and reprogram the chip with the corrected design), but time is still money when it comes to completing a
design project.
Different solutions for different design objectives?
Designers using FPGAs for production may have aggressive quality of results (QoR) goals, whereas designers
using FPGAs for prototypes may not. This paper details approaches available to FPGA production users with
tight timing constraints (QoR-focused designs) as well as an expanded set available to ASIC prototypers who
have lower QoR expectations and slack timing constraints to meet.
[Chart: Xilinx FPGA families (labels include Virtex-II and Virtex-II Pro, with a 100K LUT marker) plotted
against equivalent ASIC gates (1M, 2M, 5M).]
Figure 1: FPGA sizes are starting to double with each new generation of silicon, meaning
a design iteration can now take days. The example shown is for Xilinx FPGA families.
The shorter the design iteration time the better; the more stable the results from one run to the next the
better, since this simplifies the re-verification and debug process.
In this paper we look at a variety of techniques to bring design schedules under control for large FPGAs.
When using your server farm, you might apply slight variations in constraints, RTL or settings such as
Xilinx-route; vary the Place and Route effort level or seeds; and run several variations of design settings
in parallel on multiple machines. You would then compare and contrast the results and either choose the best
one or simply learn from what you see. This trial-and-error process helps you hunt for the best performance
and area results, but it may not always yield the results that you need. See Table 2.
Table 2: Server Farms deliver more horsepower to let you try different scenarios in parallel
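The parallel trial-and-error exploration described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a vendor flow: `run_variant` is a hypothetical stand-in that fakes a result from a seed and an effort level, where a real script would launch the P&R tool on a farm machine and parse its timing report. The seed and effort values are also assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import random


def run_variant(seed, effort):
    """Hypothetical stand-in for one P&R job.

    A real script would launch the vendor tool here and parse its report;
    this placeholder derives a repeatable, fake fmax (MHz) from its inputs.
    """
    rng = random.Random(seed * 10 + effort)
    return {"seed": seed, "effort": effort,
            "fmax_mhz": round(rng.uniform(180.0, 220.0), 1)}


# Assumed sweep: three placer seeds at two effort levels (six jobs in total).
variants = list(product([1, 2, 3], [1, 2]))

# Run the jobs side by side; each worker stands in for one farm machine.
with ThreadPoolExecutor(max_workers=len(variants)) as pool:
    results = list(pool.map(lambda v: run_variant(*v), variants))

# Compare the runs and keep the one with the highest achieved clock rate.
best = max(results, key=lambda r: r["fmax_mhz"])
print("best variant:", best)
```

Keeping every result, rather than only the best one, matches the point that you may simply want to learn from the spread of outcomes.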
Synplify Premier itself includes a Design Planner feature which can optionally be used to partition RTL,
working in conjunction with physical synthesis or logic synthesis runs.
Table 3: Floorplanning /block based flows preserve design results but can cost QoR
For multi-million gate designs, however, place and route can take a whole work day to complete. This is
problematic if all you want is a quick iteration to test a small design change or when you just want a quick
initial implementation of your prototype on the board (see Figure 2). To chip away at the P&R runtime, some
place and route tools may be run in fast or lower effort modes (Table 4), sacrificing some QoR. For example,
Altera users may use fast P&R modes in the Quartus backend tools that run the fitter extremely fast and Xilinx
users may choose to apply lower effort levels to shorten P&R runtime.
[Flow diagram: tune/fix RTL and constraints; Synplify Premier (analyze, define CP’s (optional), fast
synthesis); netlist; ISE.]
Table 4: Fast (or low effort) Placement and Routing modes speed up the P&R design step
but may cost some QoR
Additionally, users of Synplify Premier and Xilinx ISE P&R tools can perform fast incremental P&R (see Table
5) using the Xilinx “Guided Flow”. This flow, which emphasizes results stability, is useful when you make minor
changes to the design that are not on the critical path. How does it work? The Xilinx ISE P&R tools determine
“what’s changed” by comparing the netlist from the current run with the netlist from the prior run. The key to
the success of this flow is the ability of the Synplify Premier synthesis tool to synthesize reproducible,
deterministic netlists and instance names from one run to the next, for every iteration. In 2007, Synplify
Premier introduced “path group” technology that localizes changes in a synthesized netlist to only those parts
of the design where the RTL or constraints actually changed. Similar RTL and constraints produce similar
results; in other words, a reproducible netlist.
Table 5: Incremental P&R is useful for minor changes not on the critical path
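To see why stable instance names matter for a netlist comparison, consider the following sketch. It is not the Xilinx algorithm, whose details the paper does not describe. It is a simplified model in which each netlist is a dictionary keyed by instance name, and all instance names, cell types and net names are invented for illustration.

```python
def diff_netlists(previous, current):
    """Classify instances by comparing two netlists keyed by instance name.

    Each netlist maps an instance name to a (cell_type, connected_nets) tuple.
    """
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "modified": sorted(n for n in shared if previous[n] != current[n]),
        "unchanged": sorted(n for n in shared if previous[n] == current[n]),
    }


# Hypothetical netlist from the prior run.
prior_run = {
    "u_core/add0": ("LUT4", ("a", "b", "sum")),
    "u_core/reg0": ("FDRE", ("sum", "clk", "q")),
    "u_io/buf0": ("IBUF", ("pad", "a")),
}

# Hypothetical netlist after a small RTL change: one cell altered, one added.
current_run = {
    "u_core/add0": ("LUT5", ("a", "b", "c", "sum")),
    "u_core/reg0": ("FDRE", ("sum", "clk", "q")),
    "u_core/reg1": ("FDRE", ("c_in", "clk", "c")),
    "u_io/buf0": ("IBUF", ("pad", "a")),
}

changes = diff_netlists(prior_run, current_run)
print(changes)
```

In this model, only the "unchanged" instances are candidates for reusing their previous placement and routing. If synthesis renamed instances from run to run, every instance would appear as both removed and added, and nothing could be reused.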
Like Synplify Premier Synthesis, Xilinx and Altera P&R tools have multiprocessing capabilities to reduce
runtime at the cost of some QoR (see Table 6).
Table 6: Multiprocessing during P&R helps reduce runtime, but at the cost of QoR
Faster Synthesis
We’ve discussed fast Place and Route (netlist to bitfile). Now let’s look at ways to speed up design synthesis
(RTL to netlist). A faster synthesis iteration that incorporates an RTL or constraint change and gives you
feedback on it in 1 hour instead of 3 hours is very valuable. Synthesis time can indeed be cut using Synplify
Premier’s new FAST synthesis mode (see Figure 3), which improves runtimes by 2x to 3x for a small reduction in
overall Quality of Results (area and fmax).
[Figure 3 (flow diagram): tuning RTL and constraints with tight timing constraints. RTL and constraints;
Synplify Premier (analyze, synthesize); netlist. 62% typical synthesis runtime savings. Virtex-5, C2009.03,
out-of-box geomean results, FAST mode ON vs. OFF, same tight timing constraints.]
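The figures quoted for FAST mode are stated in two ways: as a runtime saving (62%) and as a speedup factor (2x to 3x). The short sketch below shows how the two relate and how a geometric mean over several designs is formed. The per-design ratios are hypothetical placeholders, not measured data; only the 62% figure comes from the paper.

```python
import math


def speedup_from_saving(saving):
    """Convert a fractional runtime saving (0.62 = 62%) to a speedup factor."""
    return 1.0 / (1.0 - saving)


def geomean(values):
    """Geometric mean, the usual way to average ratios across designs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# The 62% typical saving corresponds to a speedup of about 2.63x,
# which falls inside the quoted 2x to 3x range.
print(f"{speedup_from_saving(0.62):.2f}x")

# Hypothetical runtime ratios (FAST mode ON / FAST mode OFF) for three designs.
# These are illustrative numbers only, NOT results from the paper.
ratios = [0.30, 0.40, 0.45]
typical_saving = 1.0 - geomean(ratios)
print(f"typical saving across designs: {typical_saving:.0%}")
```

A geometric mean is preferred to an arithmetic mean here because it treats a design that runs twice as fast and one that runs half as fast as cancelling out.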
When using fast synthesis mode, consider your intent. If you want to tune your RTL and constraints, use this
capability for synthesis-only iterations with your normal tight constraints. If you want a fast iteration from
RTL to bitfile, it is recommended that you use loose timing constraints (lower QoR).
If your intent is fast synthesis-only iterations, use normal constraints with Synthesis FAST mode (see Table 7).
Since the Fast Synthesis flow does sacrifice some QoR, it is specifically recommended that you NOT run P&R
on the synthesized netlist; that netlist reflects sub-optimal area and timing results, after all. If you ran
P&R on the synthesized netlist, the runtime benefit could be lost to increased P&R runtime, because P&R would
have to work harder to make up for the QoR lost during synthesis. If your intent is faster iterations from RTL
to bitfile, use loose constraints with Synthesis FAST mode (see Figure 4).
[Figure 4 (flow diagram): debugging the design on the board with easy-to-meet timing constraints. RTL and
constraints; Synplify Premier (synthesize); netlist; FPGA P&R; bitfile. 24% typical runtime saving (RTL to
bitfile); 44% typical runtime saving (RTL to netlist). Virtex-5, C2009.03, ISE 10.1sp3, out-of-box geomean
results, FAST mode ON vs. OFF, 1 MHz global clk timing constraints for synthesis and P&R.]
In the Synplify Premier tool, you can use your machine’s multiprocessing capability to synthesize designated
design blocks in parallel on separate processors, speeding up your runtimes by up to 30% (see Table 9). You can
specify the maximum number of processors to be used.
Disadvantage: Generally reduces QoR. May frequently be used with block-based flows, which can further limit
QoR.
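The effect of the processor limit can be estimated with a simple scheduling model. The sketch below is an idealized estimate, not a description of how Synplify Premier schedules its work: it assumes every block can be synthesized independently, ignores any serial steps before or after the blocks, and uses invented block runtimes. Real gains, such as the "up to 30%" quoted in the text, will be smaller than this model suggests.

```python
import heapq


def estimated_wall_time(block_runtimes, max_processors):
    """Estimate wall-clock time when blocks are spread over processors.

    Uses a greedy longest-block-first assignment: each block goes to the
    processor that currently has the least work queued.
    """
    loads = [0.0] * max_processors
    heapq.heapify(loads)
    for runtime in sorted(block_runtimes, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + runtime)
    return max(loads)


# Hypothetical per-block synthesis runtimes in minutes.
blocks = [40, 25, 20, 10, 5]

for processors in (1, 2, 4):
    minutes = estimated_wall_time(blocks, processors)
    print(f"{processors} processor(s): {minutes:.0f} minutes")
```

With these numbers, going from two to four processors helps only a little, because the 40-minute block alone sets a floor on the total time. Balanced partitions benefit most from extra processors.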
Traditionally, if even a small number of errors are encountered during synthesis, the synthesis tool will
promptly abort the run. This can result in huge design delays if there are a lot of errors, because the errors
have to be detected and fixed piecemeal. Suppose that your design synthesis run encounters 100 errors of just 5
different types, and that your flow aborts whenever a cumulative total of 3 errors has been encountered. You
fix the first 3 errors and restart your synthesis run; then the next 3 errors surface, and they are similar to
the previous 3. You fix them and have to start synthesis again. Wouldn’t it be better to know about all 100
errors after 1 synthesis run rather than having to flush the errors 3 at a time? This is possible with the
Synplify Premier Synthesis product thanks to a new “continue synthesis on error” feature (see Table 10). When
possible, the synthesis tool will black box the erroneous portion of the design and continue to synthesize the
remainder of the design. Under the hood, Synplify Premier automatically partitions the design for parallel
synthesis. Good, error-free partitions complete, while those with errors are black boxed.
Table 10: Synplify Premier Synthesis finds all errors in a design during a single synthesis run
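The arithmetic behind the 100-error example, together with the black-boxing idea, can be sketched as follows. This is an illustrative model only: the function names and module names are hypothetical, and the real tool's partitioning and error handling are not described in enough detail here to reproduce.

```python
def runs_to_surface_all(total_errors, abort_after):
    """Count synthesis runs needed before every error has been reported,
    assuming each run aborts once `abort_after` errors have been seen."""
    runs = 0
    remaining = total_errors
    while remaining > 0:
        runs += 1
        remaining -= min(abort_after, remaining)
    return runs


def synthesize_continue_on_error(modules):
    """Model of continuing past errors: modules with errors are black boxed,
    error-free modules complete, and all errors are collected in one pass."""
    completed, black_boxed, errors = [], [], []
    for name, module_errors in modules.items():
        if module_errors:
            black_boxed.append(name)
            errors.extend(module_errors)
        else:
            completed.append(name)
    return completed, black_boxed, errors


# The example from the text: 100 errors, abort after every 3.
abort_flow_runs = runs_to_surface_all(100, 3)
continue_flow_runs = runs_to_surface_all(100, 100)
print(abort_flow_runs, "runs versus", continue_flow_runs)

# Hypothetical design with one faulty module.
design = {"uart": [], "dma": ["undeclared signal", "width mismatch"], "cpu": []}
completed, black_boxed, errors = synthesize_continue_on_error(design)
print(completed, black_boxed, errors)
```

In the abort-after-3 flow, 34 runs are needed just to see every error, compared with a single run when synthesis continues past errors.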
When errors do occur in your project files, figuring out how to fix them can also be time-consuming. Synplify
Premier hyperlinks your error/warning report to useful documentation that helps you to identify a fix.
You can filter these errors/warnings by type so that you can work only on those errors or warnings that are of
interest (see Table 11).
Table 11: An error and warning report is generated allowing you to quickly identify and fix errors in
the design in aggregate
Table 12: Incremental Static Timing Analysis allows you to change exception constraints and see the results
reflected immediately in the timing report, without the need to run synthesis
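The idea of re-reporting timing after an exception change, without re-running synthesis, can be illustrated with a deliberately simplified setup-slack model. Everything in this sketch is an assumption made for illustration: the path names, the delays, and the treatment of a multicycle exception as a multiple of the clock period. It does not represent the tool's timing engine.

```python
def timing_report(path_delays_ns, clock_period_ns, exceptions):
    """Recompute slack from cached path delays and the current exceptions.

    `exceptions` maps a path name to "false_path" or to an integer number
    of clock cycles allowed for that path (a multicycle exception).
    """
    report = {}
    for path, delay in path_delays_ns.items():
        exception = exceptions.get(path)
        if exception == "false_path":
            continue  # false paths are not timed at all
        cycles = exception if isinstance(exception, int) else 1
        report[path] = round(cycles * clock_period_ns - delay, 3)
    return report


# Hypothetical path delays, as if cached from the last synthesis run.
delays = {
    "alu_to_regfile": 4.5,
    "cfg_to_coeff": 7.2,
    "reset_to_core": 9.0,
}

before = timing_report(delays, 5.0, {})
after = timing_report(delays, 5.0, {"cfg_to_coeff": 2,
                                    "reset_to_core": "false_path"})
print(before)
print(after)
```

Because the path delays are reused rather than recomputed, only the cheap slack calculation runs again when an exception changes, which is what makes the update feel immediate.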
Getting the design through your FPGA flow can be a challenge, especially when various pieces of the design
are still evolving. If you are creating ASIC prototypes, the team generating the ASIC source RTL could well be
changing that source underneath you every week. Your challenge then would be to respin the ASIC prototype
and provide feedback to the ASIC team on the source files faster than they are changing the source! In
addition, pieces of the design may be unavailable or incomplete. Chances are that you will need:
• Fast turnaround time using some of the flows previously described in this paper
• Reproducible and stable results from one run to the next, e.g. if the ASIC team makes a small RTL change
  to a file, it will only trigger a small change in the resulting FPGA netlist. As previously described,
  Synplify Premier applies “path group” technology to localize small changes in the RTL to small changes in
  the resulting netlist
• The ability to synthesize in the absence of some source files: modules of your design may be
  incomplete or unavailable since the source files are still being worked on, but you may want to get a head
  start and synthesize the modules that are already complete. You can do this using a compile-point
  block-based flow by designating the part of the design that is incomplete as a black box
• The flexibility to swap out changed files and manage hundreds of design files. Some ways to do this are
  described below
• If prototyping, fast ASIC design import: an FPGA flow that accepts your (ASIC) files easily without manual
  modification. Considerations and solutions are outlined below
Figure 6: Synopsys coreConsultant configures DesignWare cores and creates a Synplify Premier or
Synplify Pro-ready project file (scripts and source file)
ASIC design contains: Gated clocks (used in ASICs to reduce ASIC power consumption)
The FPGA compatibility issue is: FPGAs have no true equivalent of a gated clock. Also, when gated clocks are
used in a partitioned block-based flow, clock management, allocation and correct implementation across block
boundaries is a challenge, so many FPGA tools don’t support gated clocks in block-based flows.
The FPGA synthesis tool must: Convert gated clocks to the logical equivalent without changing the intended
functionality; automatically convert the gated clock to the FPGA equivalent (a register with a clock enable).
Support gated clocks even when a clock exists in multiple blocks of a partitioned chip.

ASIC design contains: DesignWare IP
The FPGA compatibility issue is: Must understand the meaning of and implement any DesignWare Building Block
instantiation or functions when it encounters one in the RTL. Read any configuration of digital IP core, even
if it is encrypted.
The FPGA synthesis tool must: Accept RTL that includes instantiations of DesignWare IP building blocks (and
synthesize them with reasonable performance results). Accept designs generated and configured by Synopsys
coreTools.

ASIC design contains: Your own or 3rd party IP
The FPGA compatibility issue is: May be encrypted in a way that the synthesis tool cannot read. May have been
highly optimized for ASIC and thus lack performance in an FPGA.
The FPGA synthesis tool must: Preserve boundaries for the IP if requested. Time through IP. In some cases,
internally unencrypt but protect the IP.

ASIC design contains: Embedded memory functions
The FPGA compatibility issue is: FPGA tool may not recognize something in your RTL as a memory. The user has
to write specific memory models that work for FPGAs. Some FPGA-vendor specific memory cores are encrypted,
making it impossible to simulate the synthesized design (because the memories are black boxed).
The FPGA synthesis tool must: Implement behavioral synthesis capabilities (e.g. Synplify Premier’s SynCore)
that generate RTL for memory in a way that the FPGA tool recognizes and can implement optimally.

ASIC design contains: Extensive language support (VHDL, Verilog, SystemVerilog, VHDL 2008)
The FPGA compatibility issue is: Language support may lag the ASIC tool’s support, in particular with respect
to SystemVerilog support.
The FPGA synthesis tool must: Ensure compatibility with the most commonly used synthesizable ASIC RTL.
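The gated-clock conversion named in the first table row, replacing a gated clock with a register that has a clock enable, can be checked with a small behavioral simulation. This is a sketch under stated assumptions rather than a model of any tool: it assumes the enable and data inputs change only while the clock is low (glitch-free gating), and the function name and stimulus are invented for illustration.

```python
def simulate(stimulus):
    """Simulate a register clocked two ways and record both outputs.

    `stimulus` is a list of (enable, data) pairs, one per clock cycle,
    applied while the clock is low. Returns (q_gated, q_enable) per cycle.
    """
    q_gated = q_enable = 0
    prev_gated_clk = prev_clk = 0
    trace = []
    for enable, data in stimulus:
        for clk in (0, 1):  # low phase, then high phase of one cycle
            gated_clk = clk & enable
            # ASIC style: the register sees a rising edge only on the gated clock.
            if gated_clk and not prev_gated_clk:
                q_gated = data
            # FPGA style: free-running clock, capture only when enable is high.
            if clk and not prev_clk and enable:
                q_enable = data
            prev_gated_clk, prev_clk = gated_clk, clk
        trace.append((q_gated, q_enable))
    return trace


# Hypothetical stimulus: (enable, data) for five clock cycles.
cycles = [(1, 1), (0, 0), (1, 0), (0, 1), (1, 1)]
trace = simulate(cycles)
print(trace)

# Both implementations hold and update their value on the same cycles.
print(all(gated == enabled for gated, enabled in trace))
```

The clock-enable form is preferable on an FPGA because the clock stays on the dedicated low-skew clock network, while the gating logic moves to the data path where ordinary timing analysis covers it.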
[Flow diagram labels: Import; Debug RTL/netlist/constraints; Debug]
When turnaround time from RTL to bitfile implementation on the board is the priority, you may continue to
use fast synthesis and may use fast, incremental or normal Place and Route modes for subsequent
iterations. Block-based flows and multiprocessing are also useful tools in your tool chest. See Figure 8.
Figure 8: Example flow (Quick ASIC Prototype): Priority = fast board implementation and fast respins
These techniques for QoR and Quick Prototype design were summarized in Table 1.
Conclusion
Users of large FPGAs can get their products out the door much faster when design turnaround time is
reduced by using some or all of the methods described in this paper. Additionally, it is very valuable to have
results stability from one design run to the next when incorporating changes and to have the ability to quickly
integrate these changes and see the results. As FPGAs get larger, the engineering teams developing them are
also growing, requiring that new parallel design methodologies be adopted.
At the same time, users generally don’t desire disruptive changes to the design methodology and, when
prototyping, hope that the methodology will not require significant changes to the ASIC project files for
them to be accepted by the FPGA flow. Synplify Premier delivers a menu of technologies including “fast
synthesis” and “continue upon synthesis error” technology, block-based flows, incremental flows, and ASIC
compatibility for prototypers. These capabilities help ensure that large designs can be delivered on schedule.
Synopsys, Inc. 700 East Middlefield Road Mountain View, CA 94043 www.synopsys.com
©2010 Synopsys, Inc. All rights reserved. Synopsys is a trademark of Synopsys, Inc. in the United States and other countries. A list of Synopsys trademarks is
available at http://www.synopsys.com/copyright.html. All other names mentioned herein are trademarks or registered trademarks of their respective owners.
03/10.MH.10-18253.