Technology Timeline
[Figure: timeline from 1945 to 2000 showing the introduction of transistors, ICs (general), SRAMs & DRAMs, microprocessors, SPLDs, CPLDs, ASICs, and FPGAs]
The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)
4/4/2011
Feature | SRAM | Antifuse | E2PROM / FLASH
Technology node | State-of-the-art | One or more generations behind | One or more generations behind
Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
Requires external configuration file | | | No
Good for prototyping | | | Yes (reasonable)
Instant-on | | | Yes
IP security | | | Very good
Size of configuration cell | | | Medium-small (two transistors)
Power consumption | | | Medium
Rad hard | | | Not really
[Chart comparing Xilinx, Altera, and all others]
Source: Company reports Latest information available; computed on a 4-quarter rolling basis
FPGA Families
Xilinx
Low-cost: Spartan 3, Spartan 3E, Spartan 3L
High-performance: Virtex 4 LX / SX / FX, Virtex 5 LX
Altera
Low-cost: Cyclone II
High-performance: Stratix II, Stratix II GX
Xilinx
Primary products: FPGAs and the associated CAD software
Main headquarters in San Jose, CA
Fabless* semiconductor and software company
UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
Seiko Epson (Japan)
TSMC (Taiwan)
Source: [Xilinx Inc.]
Xilinx FPGA
[Figure: Xilinx FPGA floorplan showing configurable logic blocks, block RAMs, and I/O pads]
Xilinx CLB
[Figure: a configurable logic block (CLB) made up of slices, each slice containing two logic cells]
[Figure: slices S0 to S2 of a CLB with local routing, SHIFT and CIN connections; each logic cell contains a 4-input LUT, a mux, and a flip-flop]
Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
Also called Function Generators (FGs)
Capacity is limited by the number of inputs, not by the complexity

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
LUT
x1 x2 x3 x4
x1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
x2 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
x3 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
x4 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
y 0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0
Look-up tables are the primary elements for logic implementation
Each LUT can implement any function of 4 inputs
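As a rough illustration of the truth table on this slide, a 4-input LUT can be modeled as a 16-entry memory addressed by its inputs. The sketch below is in Python rather than an HDL. The names `Y_COLUMN`, `lut4`, and `init_value`, and the choice of x1 as the most significant address bit, are assumptions made for the example and are not taken from any vendor tool.

```python
# Output column y of the slide's truth table, for x1x2x3x4 = 0000 ... 1111.
Y_COLUMN = [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]


def lut4(table, x1, x2, x3, x4):
    """Return the stored bit addressed by the four inputs (x1 assumed MSB)."""
    address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
    return table[address]


def init_value(table):
    """Pack the 16 stored bits into one integer, bit i holding entry i."""
    return sum(bit << i for i, bit in enumerate(table))


row_0101 = lut4(Y_COLUMN, 0, 1, 0, 1)   # sixth row of the table
packed = init_value(Y_COLUMN)           # 16-bit contents of the LUT
```

Each of the 2^16 possible tables gives a different 4-input function, which is why LUT capacity depends on the number of inputs and not on how complex the function looks.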
[Figure: slice detail showing two LUTs with inputs F1 to F4 (A1 to A4), WS and DI pins, and the F5 multiplexer controlled by BX]
MUXF5 combines the LUTs in each slice
  Any 5-input function (LUT5), or selected functions of up to 9 inputs, or a 4x1 multiplexer
MUXF6 combines slices S0 and S1; a second MUXF6 combines slices S2 and S3
  Any 6-input function (LUT6), or selected functions of up to 19 inputs, or an 8x1 multiplexer
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
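The claim that two 4-input LUTs plus MUXF5 give any 5-input function can be checked with a small model. This is a sketch under assumptions: the fifth input is taken to be the mux select, and the helper names (`split_function`, `lut5`, `parity5`) are hypothetical.

```python
from itertools import product


def lut4(table, a, b, c, d):
    """16-entry lookup; a is the most significant address bit."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def split_function(f):
    """Build two LUT4 tables from a 5-input function f(s, a, b, c, d)."""
    low = [f(0, *bits) for bits in product((0, 1), repeat=4)]
    high = [f(1, *bits) for bits in product((0, 1), repeat=4)]
    return low, high


def lut5(low, high, s, a, b, c, d):
    """MUXF5-style 2:1 mux: s selects which LUT4 output is passed on."""
    return lut4(high, a, b, c, d) if s else lut4(low, a, b, c, d)


def parity5(s, a, b, c, d):
    """Example 5-input function: odd parity."""
    return s ^ a ^ b ^ c ^ d


LOW, HIGH = split_function(parity5)
```

The same step repeated with MUXF6, MUXF7, and MUXF8 doubles the table size each time, at the cost of one more select input per level.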
Early devices were based on the concept of a programmable logic block, which comprised a 3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along with a few other elements.
FPGA vendors and researchers studied the relative merits of 3-, 4-, 5-, and even 6-input LUTs.
The current consensus is that 4-input LUTs offer the optimal balance of pros and cons.
In the past, some devices were created using a mixture of different LUT sizes because this offered the promise of optimal device utilization. However, current logic synthesis tools prefer uniformity and regularity.
LUT Example: Implement the function using:
2-input LUTs
3-input LUTs
4-input LUTs
F = ABD + BC D + A B C
[Figure: three implementations of F, built from 2-input, 3-input, and 4-input LUTs, with inputs A, B, C, D and output F]
Each CLB contains separate logic and routing for the fast generation of sum & carry signals
Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
[Figure: carry chain running from LSB to MSB through the slices of a CLB via CIN and COUT, continuing to CIN of S2 of the next CLB]
Addition (SUM <= A + B)
Subtraction (DIFF <= A - B)
Comparators (if A < B then)
Counters (count <= count + 1)
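The operations listed above all reduce to a chain of one-bit adders linked by CIN and COUT. The following is a minimal behavioural sketch of that chain, not the vendor's circuit; the function names are illustrative.

```python
def full_adder(a, b, cin):
    """One bit of the chain: returns (sum, carry-out)."""
    total = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return total, cout


def ripple_add(a, b, width, cin=0):
    """Add two unsigned integers bit by bit, LSB first, passing the carry along."""
    result = 0
    carry = cin
    for i in range(width):
        bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= bit << i
    return result, carry  # carry is the COUT of the most significant bit


def ripple_sub(a, b, width):
    """A - B computed as A + (not B) + 1, the usual two's complement trick."""
    mask = (1 << width) - 1
    diff, carry = ripple_add(a, ~b & mask, width, cin=1)
    return diff, carry  # carry == 1 means no borrow, i.e. A >= B
```

A comparator (A < B) can read the final carry of `ripple_sub`, and a counter is `ripple_add(count, 1, width)`, so one dedicated carry path serves all four cases.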
Shift Register
[Figure: a LUT configured as a chain of D flip-flops with clock enable; inputs IN, CE, and CLK, output OUT selected by DEPTH[3:0]]
Dynamically addressable delay of up to 16 cycles
For programmable pipelines
Cascade for greater cycle delays
Use CLB flip-flops to add depth
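The behaviour described above can be sketched as a 16-stage shift register whose output tap is chosen by a 4-bit address. The class name `SRL16` follows the primitive mentioned later in the deck; the method names and the convention that tap 0 is the newest bit are assumptions for the example.

```python
class SRL16:
    """Behavioural model of a 16-stage shift register with an addressable tap."""

    def __init__(self):
        self.stages = [0] * 16  # stages[0] holds the most recent input bit

    def clock(self, data_in, ce=1):
        """Rising clock edge: shift one position when clock enable is high."""
        if ce:
            self.stages = [data_in] + self.stages[:-1]

    def out(self, depth):
        """Read the tap selected by the 4-bit DEPTH value (0 to 15)."""
        return self.stages[depth & 0xF]


srl = SRL16()
for bit in [1, 0, 1, 1, 0]:
    srl.clock(bit)
```

With this convention a DEPTH value of d returns the bit written d + 1 clocks earlier, so DEPTH = 15 gives the maximum 16-cycle delay.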
Shift Register
[Figure: a 64-bit datapath with two parallel paths. One path takes 12 cycles (Operation A, 4 cycles, followed by Operation B, 8 cycles); the other takes 3 cycles (Operation C), a 9-cycle imbalance. Adding a 9-cycle delay (Operation D - NOP) after Operation C brings both paths to 12 cycles.]
Paths are statically balanced
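The balancing arithmetic in the figure can be written out as follows. This is a sketch: the grouping of Operations A and B into one 12-cycle path is read from the figure labels, and `srl16_count` assumes one 16-stage shift-register LUT per bit per 16 cycles of delay.

```python
def balancing_delays(path_latencies):
    """Return the extra delay (in cycles) each path needs to match the slowest."""
    longest = max(path_latencies.values())
    return {name: longest - cycles for name, cycles in path_latencies.items()}


def srl16_count(delay_cycles, data_width):
    """Shift-register LUTs needed to delay a data_width-bit bus by delay_cycles."""
    if delay_cycles == 0:
        return 0
    per_bit = -(-delay_cycles // 16)  # ceiling division
    return per_bit * data_width


paths = {"A then B": 4 + 8, "C": 3}  # cycle counts as read from the figure
padding = balancing_delays(paths)
luts_for_nop = srl16_count(padding["C"], 64)
```

Under these assumptions the 9-cycle NOP on a 64-bit bus costs 64 LUTs, one per bit, instead of 9 x 64 flip-flops.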
Distributed RAM
CLB LUT configurable as Distributed RAM
A LUT equals a 16x1 RAM
Cascade LUTs to increase RAM size
One LUT = RAM16X1S (ports D, WE, WCLK, A0 to A3, O)
Two LUTs = RAM32X1S (ports D, WE, WCLK, A0 to A4, O)
  or RAM16X2S (ports D0, D1, WE, WCLK, A0 to A3, O0, O1)
  or RAM16X1D (ports D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)
Block RAM
These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, FIFOs, etc.
[Figure: Spartan-3 dual-port block RAM with Port A and Port B]
Builds both single and true dual-port RAMs
Synchronous write and read (different from distributed RAM)
[Figure: block RAM aspect ratios]
16k x 1 (addresses 0 to 16,383)
8k x 2 (addresses 0 to 8,191)
4k x 4 (addresses 0 to 4,095)
2k x (8+1) (addresses 0 to 2,047)
1024 x (16+2) (addresses 0 to 1,023)
Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic
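Width conversion falls out of both ports addressing the same bit array with different word sizes. The sketch below models only the data bits (parity bits are left out) and assumes the low-order part of a wide word sits at the lower narrow address; that ordering and all names are assumptions for illustration.

```python
BLOCK_BITS = 16 * 1024  # data bits in one block RAM, parity bits excluded


def depth_for_width(width):
    """Number of addressable words when each word is `width` data bits wide."""
    return BLOCK_BITS // width


class DualWidthRam:
    """One shared bit array seen through two ports of different widths."""

    def __init__(self):
        self.bits = [0] * BLOCK_BITS

    def write(self, width, address, value):
        """Store the low `width` bits of value at word `address`, LSB first."""
        base = address * width
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, address):
        """Return the word of `width` bits stored at word `address`."""
        base = address * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualWidthRam()
ram.write(16, 3, 0xBEEF)          # port A: 16-bit write
low_byte = ram.read(8, 6)         # port B: 8-bit reads of the same storage
high_byte = ram.read(8, 7)
```

`depth_for_width` reproduces the depths of the aspect ratios listed for the block RAM (16k, 8k, 4k, 2k, and 1024 words for widths 1, 2, 4, 8, and 16).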
Embedded Multipliers
Some functions, like multipliers, are inherently slow if they are implemented by connecting a large number of programmable logic blocks together.
Current FPGAs incorporate special hard-wired multiplier blocks, which are typically located in close proximity to the embedded RAM blocks (arithmetic-based applications).
18 x 18 Embedded Multiplier
Fast arithmetic functions
Optimized to implement multiply / accumulate modules
18 x 18 signed multiplier
Fully combinational
Optional registers with CE & RST (pipeline)
Independent from adjacent block RAM
Embedded 18-bit x 18-bit multiplier
2's complement signed operation
[Figure: 18 x 18 multiplier block; Data_B (18 bits) is one of the labeled inputs]
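What "2's complement signed operation" means for the bit patterns can be sketched in a few lines. The 36-bit result width is simple arithmetic (18 + 18 bits) and is not stated on the slide; the function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult18x18(a_bits, b_bits):
    """Multiply two 18-bit two's complement operands; return the 36-bit pattern."""
    product = to_signed(a_bits, 18) * to_signed(b_bits, 18)
    return product & ((1 << 36) - 1)


minus_two = to_signed(mult18x18(0x3FFFF, 2), 36)  # 0x3FFFF is -1 in 18 bits
```

The largest magnitude product, (-2^17) x (-2^17) = 2^34, still fits in 36 bits, so a result of that width cannot overflow.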
Positions of Multipliers
[Figure: clock distribution fed from a special clock pin and pad by a clock signal from the outside world]
Output Path
IOB Functionality
IOB provides the interface between the package pins and the CLBs
Each IOB can work as a uni- or bi-directional I/O
Outputs can be forced into high impedance
Inputs and outputs can be registered (advised for high-performance I/O)
In the past, resistors were applied as discrete components (outside the FPGA). Today's FPGAs allow the use of internal terminating resistors whose value can be configured by the user.
FPGA Nomenclature
Virtex-II Pro
44,000 logic slices
444 18-Kbit BRAMs
444 18x18 multipliers
2 PowerPC processors
20 Gbit I/O
1164 max user I/O
1 to 4 PowerPCs
4 to 16 multi-gigabit transceivers
12 to 216 multipliers
3,000 to 50,000 logic cells
200k to 4M bits RAM
204 to 852 I/Os
The main advantage of this scheme is the inherent speed advantage gained from having the processor core in intimate proximity to the FPGA fabric.
Soft Core
As opposed to embedding a microprocessor physically into the fabric of the chip, it is possible to configure a group of programmable logic blocks to act as a microprocessor.
Soft cores are simpler (more primitive) and slower than their hard-core counterparts.
ADVANTAGE?
1. The user need only implement a core if he/she needs it.
2. The user can instantiate as many cores as they require until they run out of resources!
Virtex Architectures
Built for high-performance applications
Other families include Virtex-II Pro, Virtex-4, and Virtex-5
The latest family is Virtex-6
High-performance true dual-port RAM: 8 Mb
SelectIO-Ultra technology: 1164 I/O
RocketIO and RocketIO X high-speed serial transceivers: 622 Mbps to 3.125 Gbps
PowerPC processors (400+ MHz clock rate): 2
XCITE digitally controlled impedance: any I/O
DCM digital clock management: 12
Virtex-4 Family
Advanced Silicon Modular Block (ASMBL) architecture
Optimized for logic, embedded, and signal processing

Resource | LX | SX | FX
Logic | 14K to 200K LCs | 23K to 55K LCs | 12K to 140K LCs
Memory | 0.9 to 6 Mb | 2.3 to 5.7 Mb | 0.6 to 10 Mb
DCMs | 4 to 12 | 4 to 8 | 4 to 20
DSP Slices | 32 to 96 | 128 to 512 | 32 to 192
SelectIO | 240 to 960 | 320 to 640 | 240 to 896
RocketIO | N/A | N/A | 0 to 24 channels
PowerPC | N/A | N/A | 1 or 2 cores
Ethernet MAC | N/A | N/A | 2 or 4 cores
Virtex-4 Architecture
RocketIO multi-gigabit transceivers: 622 Mbps to 10.3 Gbps
Smart RAM: new block RAM/FIFO
Advanced CLBs: 200K logic cells
1 Gbps SelectIO: ChipSync source synchronous, XCITE active termination
Virtex-5 Family
Optimized for logic, Embedded, Signal Processing, and High-Speed Connectivity
Virtex-5 Platforms
LX: Logic
LXT: Logic/Serial
SXT: DSP/Serial
FXT: Emb./Serial
Resources compared across platforms: logic, on-chip RAM, DSP capabilities, parallel I/Os, serial I/Os, PowerPC processors
Virtex-5 Architecture
Enhanced
36Kbit dual-port block RAM / FIFO with integrated ECC
550 MHz clock management tile with DCM and PLL
SelectIO with ChipSync technology and XCITE DCI
Advanced configuration options
25x18 DSP slice with integrated ALU
Tri-mode 10/100/1000 Mbps Ethernet MACs
RocketIO transceiver options
  Low-power GTP: up to 3.75 Gbps
  High-performance GTX: up to 6.5 Gbps
New
Most advanced high-performance real 6LUT logic fabric
PCI Express endpoint block
System Monitor function with built-in ADC
Next-generation PowerPC embedded processor
Spartan-3
4 I/O banks (Bank 0 to Bank 3)
Support for all I/O standards, including PCI, DDR333, RSDS, and mini-LVDS
Spartan-3 Family
Based upon Virtex-II architecture
Optimized for lower cost
Logic resources
  Only one-half of the slices support RAM or SRL16s (SLICEM)
  Fewer block RAMs and multiplier blocks
Clock resources
  Fewer global clock multiplexers and DCM blocks
I/O resources
  Fewer pins per package
  No internal 3-state buffers
  Support for different standards
    New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
    Default is LVCMOS, versus LVTTL
[Figure: CLB with slices X0Y0, X0Y1, X1Y0, and X1Y1, fast connects, and SHIFTOUT and CIN connections]
Spartan-3E Features
More gates per I/O than Spartan-3
Removed some I/O standards
  Higher-drive LVCMOS
  GTL, GTLP
  SSTL2_II
  HSTL_II_18, HSTL_I, HSTL_III
  LVDS_EXT, ULVDS
DDR Cascade
  Internal data is presented on a single clock edge
Spartan-3A DSP
Tuning DSP Performance
Integrated XtremeDSP Slice
  Application-optimized capacity
  Integrated pre-adder optimized for filters
  250 MHz operation, standard speed grade
  Compatible with Virtex DSP
[Figure: XtremeDSP DSP48A slice]
DSP48 Comparison

Function | DSP48E | DSP48A
Multiplier | 25 x 18 | 18 x 18
Pre-Adder | No | Yes
Cascade Inputs | Two | One
Cascade Output | Yes | Yes
Dedicated C input | Yes | Yes
Adder | 3 input 48 bit | 2 input 48 bit
Dynamic Opmodes | Yes | Yes
ALU Logic Functions | Yes | No
Pattern Detect | Yes | No
SIMD ALU Support | Yes | No
Carry Signals | Carry In & Out |

Benefit
Multiplier: Reduces FPGA resource needs for DSP algorithms.
Pre-adder: Reduces the critical path timing in FIR filter applications, giving better performance. Important in FIR filter construction.
Cascade inputs and cascade output: Enable fast data path chaining of DSP48 blocks for larger filters.
Dedicated C input: The C input supports many 3-input mathematical functions, such as 3-input addition and 2-input multiplication with a single addition, and the very valuable rounding of multiplication away from zero.
Adder: Supports simple add and accumulate functions.
Dynamic opmodes: One DSP48 can provide more than one function (multiply, multiply-add, multiply-accumulate, etc.). Similar to the ALU of a microprocessor. Enables the selection of the ALU function on a clock cycle basis.
ALU logic functions: Enables multiple functions to be selected (add, subtract, or compare).
Pattern detect: Supports convergent rounding, underflow/overflow detection for saturation arithmetic, and auto-resetting counters/accumulators.
SIMD ALU support: Enables parallel ALU operations on multiple data sets.
Carry signals: Supports fast carry functions between DSP blocks. Often a speed-limiting path.
[Table: device resource summary; surviving values are 126 DSP48As, 126, 2,268, 373, 47,744, 53,712, 8, 213, 309, 469 (column headings not recoverable)]
Latest Families
Architecture Alignment
Virtex-6 FPGAs Spartan-6 FPGAs
Common Resources
Virtex-6 LXT: High logic density + serial connectivity
Virtex-6 HXT: Ultra high-speed serial connectivity + logic
Virtex-6 SXT: DSP + logic + serial connectivity
[Figure: platforms positioned by market size; other labels in the figure: "+ 100s More", "Designers", "Eccentrics"]
Higher System Performance
  More design margin to simplify designs
  Higher integrated functionality
Lower Power
  Help meet power budgets
  Eliminate heat sinks & fans
  Prevent thermal runaway
Virtex-6 Family
[Figure: Virtex generations by process node, labeled 1st Generation through 6th Generation]
Virtex: 220-nm
Virtex-E: 180-nm
Virtex-II: 150-nm
Virtex-II Pro: 130-nm
Virtex-4: 90-nm
Virtex-5: 65-nm
Dynamic Power Reduction
Reduced Core Voltage Devices
Lower Overall Power
I/O Power Improvements
System Monitor
Slice
Slices implement logic functions (slice_l)
Slices for memories and shift registers (slice_m)
LUT6 implements
  All functions of up to 6 variables
  Two functions of up to 5 variables each
  Shift registers up to 32 stages long
  Memories of 64 bits
Power consumption benefits
  Shift register mode greatly reduces power consumption over FF implementation
Performance benefits
  Increased ratio of slice_m memories available closer to the source or target logic
Cost benefits
  Multiple configurations within a slice
[Figure: CLB slice containing LUTs]
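How one 6-input LUT can hold either one 6-variable function or two 5-variable functions can be sketched with a 64-entry table read through two outputs. This is a simplified model: it assumes both 5-variable functions share the same five inputs and that the sixth input is tied high to select the upper half. The class and method names are hypothetical.

```python
class Lut6:
    """64-entry lookup table with a full output and a lower-half output."""

    def __init__(self, table):
        assert len(table) == 64
        self.table = list(table)

    def o6(self, address):
        """Output of the full 6-input function; address is 0 to 63."""
        return self.table[address & 0x3F]

    def o5(self, address):
        """Output of the 5-input function held in the lower 32 entries."""
        return self.table[address & 0x1F]


def pack_two_lut5(f, g):
    """Store f in entries 0..31 and g in entries 32..63 (each a list of 32 bits)."""
    return Lut6(list(f) + list(g))


parity = [bin(i).count("1") & 1 for i in range(32)]   # 5-input XOR
all_ones = [1 if i == 31 else 0 for i in range(32)]   # 5-input AND
lut = pack_two_lut5(parity, all_ones)
```

Reading `lut.o5(i)` gives the first function and `lut.o6(32 | i)` gives the second, both for the same 5-bit input `i`.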
Pattern detector
Spartan-6 Family
Spartan-6
Next Generation 45nm Spartan Family
Increased performance & density
Evolutionary feature enhancements
Dramatic cost & power reductions
Spartan-6
LUT / dual FF pair
6LUT
Slice resources: LUT6, 8 registers, carry logic, wide function muxes, distributed RAM / SRL logic
Slice mix chosen for the optimal balance of cost, power & performance
Lower I/O power
Low-power option -1L reduces power even further
Fewer supply rails reduce power
Memory Controller
Only low-cost FPGA with a hard memory controller
Guaranteed memory interface performance, providing
  Reduced engineering & board design time
  DDR, DDR2, DDR3 & LP DDR support
  Up to 12.8 Gbps bandwidth for each memory controller
[Figure: memory types: DRAM, SRAM, FLASH, EEPROM]
OR
9K BRAM
Compare to Spartan-3A
Twice the Capabilities, Half the Power, Hard Blocks!
Feature | Extended Spartan-3A (90nm) | Spartan-6 (45nm)
Logic Cells | Up to 55K | Up to 150K
LUT Design | 4-input LUT + FF | 6-input LUT + 2FF
Block RAM | Up to 2 Mbit | Up to 5 Mbit
Transceiver Count / Speed | no | Up to 8 / Up to 3.125 Gbps
Voltage Scaling | No (1.2V only) | Yes (1.2V, 1.0V)
Static Power (typ mW) | 11 mW (smallest density) | Up to 60% less!
Memory Interface | 400 Mbps | DDR3 800 Mbps
Max Differential IO | 640 Mbps | 1050 Mbps
Multipliers/DSP | Up to 126 Multipliers / DSP | Up to 184 DSP48 Blocks
Memory Controllers | no | Up to 4 Hard Blocks
Clock Management | DCM Only | DCM & PLL
PCI Express Endpoint | no | Yes, Gen 1
Security | Device DNA Only | Device DNA & AES
** All memory controllers support a x16 interface, except in the CS225 package, where only x8 is supported
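The per-controller bandwidth figure quoted for the memory controller appears to follow from the DDR3 rate and the interface width. The sketch below assumes the 800 Mbps figure is a per-data-pin rate and ignores protocol overhead; the function name is hypothetical.

```python
def peak_bandwidth_gbps(rate_mbps_per_pin, data_width_bits):
    """Peak transfer rate of a memory interface in Gbit/s (no overhead modeled)."""
    return rate_mbps_per_pin * data_width_bits / 1000


x16 = peak_bandwidth_gbps(800, 16)  # x16 interface
x8 = peak_bandwidth_gbps(800, 8)    # x8 interface, as in the CS225 package
```

Under these assumptions a x16 interface reaches 12.8 Gbit/s and a x8 interface half of that.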
Functional simulation
Synthesis
Post-synthesis simulation
Logic Synthesis
VHDL description
architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;
  signal B1 : STD_LOGIC;
  signal Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
begin
  -- optional inversion of the inputs and of the output
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;
  -- the four candidate logic functions
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;
  -- 4-to-1 selection controlled by L1 and L0
  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
Circuit netlist
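A software reference model is one common way to check a synthesized netlist against the intended behaviour. The sketch below restates the MLU dataflow description in Python, bit for bit. It is an illustration only and not part of any synthesis flow; the function names and the argument order are assumptions.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of the MLU dataflow description (all arguments 0 or 1)."""
    a1 = a if neg_a == 0 else 1 - a
    b1 = b if neg_b == 0 else 1 - b
    select = (l1 << 1) | l0
    if select == 0:
        y1 = a1 & b1
    elif select == 1:
        y1 = a1 | b1
    elif select == 2:
        y1 = a1 ^ b1
    else:
        y1 = 1 - (a1 ^ b1)  # xnor
    return y1 if neg_y == 0 else 1 - y1


def truth_table():
    """All 128 input combinations with the resulting output."""
    rows = []
    for n in range(128):
        bits = [(n >> k) & 1 for k in range(6, -1, -1)]
        rows.append((tuple(bits), mlu(*bits)))
    return rows
```

The rows from `truth_table()` could serve as expected values in a functional or post-synthesis simulation test bench.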
Synthesis Tools
XST
and others
Interpret RTL code
Synplify Pro: produces a synthesized circuit netlist in a standard EDIF (.edf) format
  Can optionally produce a .VHM (VHDL code merged into one) file for post-synthesis simulation
XST: produces a synthesized circuit netlist in NGC format
Netlist is composed of gates in the particular Xilinx implementation library
  http://toolbox.xilinx.com/docsan/xilinx9/books/manuals.pdf has information on libraries
Give preliminary performance estimates
Some can display circuit schematics corresponding to the EDIF netlist
Implementation
After synthesis the entire implementation process is performed by FPGA vendor tools
Mapping
[Figure: netlist elements LUT0 to LUT5, FF1, and FF2 being mapped onto device resources]
Placing
[Figure: CLB slices placed within the FPGA]
Routing
[Figure: programmable connections routed within the FPGA]
Map report
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of Slice Flip Flops:       144 out of 4,704    3%
  Number of 4 input LUTs:           173 out of 4,704    3%
Logic Distribution:
  Number of occupied Slices:        145 out of 2,352    6%
  Number of Slices containing only related logic:  145 out of 145  100%
  Number of Slices containing unrelated logic:       0 out of 145    0%
    *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:          210 out of 4,704    4%
  Number used as logic:             173
  Number used as a route-thru:        5
  Number used as 16x1 RAMs:          32
  Number of bonded IOBs:             74 out of   176   42%
  Number of GCLKs:                    1 out of     4   25%
  Number of GCLKIOBs:                 1 out of     4   25%
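The percentages in the report can be reproduced from the raw counts. The sketch below assumes the tool truncates to a whole percent, which is consistent with every line of this particular report but is not documented here; the dictionary keys are shortened labels.

```python
def utilization(used, available):
    """Whole-number percentage, truncated as the report's figures appear to be."""
    return used * 100 // available


report = {
    "Slice Flip Flops": (144, 4704),
    "4 input LUTs (logic)": (173, 4704),
    "occupied Slices": (145, 2352),
    "Total 4 input LUTs": (210, 4704),
    "bonded IOBs": (74, 176),
    "GCLKs": (1, 4),
}
percentages = {name: utilization(*counts) for name, counts in report.items()}

# The LUT total is the sum of the three usage categories in the report.
lut_total = 173 + 5 + 32
```

The total of 210 LUTs is the 173 used as logic plus 5 route-thrus plus 32 used as 16x1 RAMs, which is why it is higher than the "4 input LUTs" line under Logic Utilization.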
Post-place-and-route simulation
After place-and-route is performed, post-place-and-route simulation can be done
Now have real timing information!
Static timing analysis can also be done: it shows the worst-case critical path in the circuit
Configuration
Once a design is implemented, you must create a file that the FPGA can understand
This file is called a bitstream: a BIT file (.bit extension)
The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file, which stores the programming information
A 2-input NAND function is taken to represent one equivalent gate. An equivalent gate consists of an arbitrary number of transistors.
Different vendors provide different functions in their cell libraries, and each implementation of each function requires a different number of transistors, which makes capacity and complexity difficult to compare.
Solution: assign each function an equivalent gate value and sum all these values.
How can we establish a basis for comparison between FPGAs and ASICs? Can an ASIC of 500,000 equivalent gates that needs to be migrated into an FPGA fit into a particular FPGA?
Divide the system gates value by three, so three million FPGA system gates would equate to one million ASIC equivalent gates!
However, to compare two different implementations on an FPGA (e.g., a floating-point adder vs. a fixed-point adder), designers should use the resources available in an FPGA:
Number of 4-input LUTs used
Number of embedded multipliers
Number of embedded RAM blocks
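The divide-by-three rule and the 500,000-gate migration question can be put into a few lines of arithmetic. This is a first-pass estimate only, using the rule exactly as stated on the slide; the function names are hypothetical.

```python
def asic_equivalent_gates(system_gates):
    """Rough ASIC equivalent-gate figure for a device quoted in system gates."""
    return system_gates // 3


def system_gates_needed(asic_gates):
    """System-gate rating an FPGA would need under the same divide-by-three rule."""
    return asic_gates * 3


def fits(asic_gates, fpga_system_gates):
    """First-pass check only; a real fit depends on LUTs, multipliers, and RAM blocks."""
    return asic_equivalent_gates(fpga_system_gates) >= asic_gates
```

By this estimate a 500,000-gate ASIC calls for a device rated at about 1.5 million system gates, which is only a starting point before counting LUTs, multipliers, and RAM blocks.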
State-of-the-Art FPGAs
65-90 nm process on 300 mm wafers
Lower cost per function (LUT + register)
Smaller and faster transistors: higher speed
Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
Integrated transceivers running at 10 Gigabits/sec
>100,000 LUTs & flip-flops
>200 embedded RAMs, and the same number of 18 x 18 multipliers
1156 pins (balls) with >800 GP I/O
50 I/O standards, incl. LVDS with internal termination
16 low-skew global clock lines
Multiple clock management circuits
On-chip microprocessor(s) and multi-Gbps transceivers
Altera Stratix-II
90nm process
Up to 1170 I/Os
179000 logic elements
9.6Mb embedded RAM
96 DSP blocks: 380 18x18 multipliers
12 PLLs
[Chart, 1/91 to 1/99: capacity 21x bigger, speed 5.5x faster, price 50x less expensive. Source: Xilinx]
FPGA Shortcomings
Circuit delay: delay increases due to programmable switches in the FPGA routing architecture
Area: configuration cells and programmable resources incur a substantial area penalty
Power: typically not suited for low-power applications
[Figure: ASIC versus FPGA compared on performance, cost, and time to market; annotated "Need to improve FPGA"]
Conclusion
FPGAs are the main enabler of Reconfigurable Computing Systems (RCS)
FPGAs fill the gap between instruction set processors (GPs) and ASICs
  Advantages: flexible, programmable
  Disadvantages: power dissipation, performance w.r.t. ASIC
Applicability of FPGAs relies on CAD tools provided by different vendors such as Xilinx and Altera
RCS can be realized with several technologies:
  FPGAs: fine/medium grain
  Coarse Grain Reconfigurable Architectures: CGRAs