
4/4/2011

EE 811 Advanced Digital System Design


Dr. Arshad Aziz

Basic FPGA Architecture

Technology Timeline
Figure: timeline from 1945 to 2000 (in 5-year steps) for the following technologies: transistors, ICs (general), SRAMs & DRAMs, microprocessors, SPLDs, CPLDs, ASICs, FPGAs.

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


Major FPGA vendors


SRAM-based FPGAs
  Xilinx Inc. www.xilinx.com
  Altera Corp. www.altera.com
  Atmel Corp. www.atmel.com
  Lattice Semiconductor Corp. www.latticesemi.com

Antifuse and flash-based FPGAs
  Actel Corp. www.actel.com
  QuickLogic Corp. www.quicklogic.com

Feature | SRAM | Antifuse | E2PROM / FLASH
Technology node | State-of-the-art | One or more generations behind | One or more generations behind
Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
Requires external configuration file | Yes | No | No
Good for prototyping | Yes (very good) | No | Yes (reasonable)
Instant-on | No | Yes | Yes
IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good
Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors)
Power consumption | Medium | Low | Medium
Rad Hard | No | Yes | Not really


The Programmable Marketplace


Q1 Calendar Year 2005
PLD Segment: Xilinx 51%, Altera 33%, Lattice 7%, Actel 5%, QuickLogic 2%, Other 2%
FPGA Sub-Segment: Xilinx 58%, Altera 31%, All Others 11%

Source: Company reports Latest information available; computed on a 4-quarter rolling basis

FPGA Families
Vendor | Low-cost | High-performance
Xilinx | Spartan 3, Spartan 3E, Spartan 3L | Virtex 4 LX / SX / FX, Virtex 5 LX
Altera | Cyclone II | Stratix II, Stratix II GX


Xilinx

Primary products: FPGAs and the associated CAD software

Programmable Logic Devices

ISE Alliance and Foundation Series Design Software

Main headquarters in San Jose, CA
Fabless* Semiconductor and Software Company
  UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  Seiko Epson (Japan)
  TSMC (Taiwan)
Source: [Xilinx Inc.]


Xilinx FPGA Families


Old families
  XC3000, XC4000, XC5200
  Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
Low-cost family
  Spartan/XL, derived from XC4000
  Spartan-II, derived from Virtex
  Spartan-IIE, derived from Virtex-E
  Spartan-3 (90 nm)
  Spartan-3E (90 nm)
  Spartan-3A (90 nm)
High-performance families
  Virtex (220 nm)
  Virtex-E, Virtex-EM (180 nm)
  Virtex-II, Virtex-II PRO (130 nm)
  Virtex-4 (90 nm)
  Virtex-5 (65 nm)
Source: [Xilinx Inc.]

General structure of an FPGA

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


Xilinx FPGA
Figure labels: Configurable Logic Blocks, Block RAMs, I/O Blocks.

Generic FPGA architecture:
Figure labels: Configurable Logic Block (CLB), Connection Block, Wire segments, Switch Block, Routing Channels, I/O pad.


Xilinx CLB
Figure: an array of configurable logic blocks (CLBs); each CLB contains slices, and each slice contains logic cells.

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

Xilinx Point of Reference


A Xilinx CLB has FOUR slices
  Each slice has TWO logic cells
  Each logic cell has ONE 4-input LUT plus other logic (carry and control) plus a flip-flop/latch
For SLICEL slices, these LUTs can be configured as:
  1. LUT
For SLICEM slices, these LUTs can be configured as:
  1. LUT
  2. 16 x 1 Distributed RAM (16 words x 1 bit/word)
  3. 16-bit Shift Register


CLB Structure of Spartan 3


Each Virtex-II CLB contains four slices
  Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
  A switch matrix provides access to general routing resources
Figure labels: Switch Matrix, two BUFTs, slices S0 to S3, Local Routing, SHIFT, and a CIN and COUT for each carry chain.

Simplified view of a Xilinx Logic Cell


Figure labels: 4-input LUT (also usable as 16x1 RAM or 16-bit SR) with inputs a, b, c, d and output y; input e; mux; flip-flop with output q; clock, clock enable, and set/reset.

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


Simplified Slice Structure


Each slice has four outputs
  Two registered outputs, two non-registered outputs
  Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
Carry logic runs vertically, up only
  Two independent carry chains per CLB
Figure: Slice 0 with two LUT and Carry stages, each feeding a flip-flop (D, Q, CE, PRE, CLR).

Detailed Slice Structure


The next few slides discuss the slice features
  LUTs
  MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  Carry Logic
  MULT_ANDs
  Sequential Elements


SRAM Cell (Pass Transistor)



An SRAM cell can drive the gate (G) terminal of an NMOS transistor. If the SRAM cell (M) = 1, the signal passes from S to D.
An SRAM cell can be attached to the select line of a MUX to control it.

Look-Up Tables
Combinatorial logic is stored in Look-Up Tables (LUTs)
  Also called Function Generators (FGs)
  Capacity is limited by the number of inputs, not by the complexity
Delay through the LUT is constant

Figure: a block of combinatorial logic with inputs A, B, C, D and output Z, and the equivalent LUT contents:

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1


Look Up Table (LUT)



A LUT can realize any Boolean function of its inputs. Assume the function to be realized is y = (a&b) | !c. This is achieved by loading the LUT with the appropriate output value for every input combination.
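To make the idea of "loading the LUT" concrete, the following is a minimal Python sketch, not vendor code. The helper names `lut_init` and `lut_read` are hypothetical, and the bit ordering (input a as the least significant address bit) is an assumption made only for this illustration.

```python
def lut_init(func, n_inputs):
    """Build the LUT contents: one stored output bit per input combination."""
    table = []
    for index in range(2 ** n_inputs):
        # Assumption: bit i of the address is the value of input i (a is bit 0).
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, *inputs):
    """Evaluate the function by using the inputs as an address into the table."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


# y = (a & b) | !c, the function from the slide
y_table = lut_init(lambda a, b, c: (a & b) | (1 - c), 3)
```

Once the table is filled, evaluating the function is a single memory read, which is why the delay through a LUT does not depend on how complex the function is.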

LUT (Look-Up Table) Functionality


Two example LUT configurations for the same inputs x1 to x4:

x1 x2 x3 x4 | y (example 1) | y (example 2)
0 0 0 0 | 1 | 0
0 0 0 1 | 1 | 1
0 0 1 0 | 1 | 0
0 0 1 1 | 1 | 0
0 1 0 0 | 1 | 0
0 1 0 1 | 1 | 1
0 1 1 0 | 1 | 0
0 1 1 1 | 1 | 1
1 0 0 0 | 1 | 0
1 0 0 1 | 1 | 1
1 0 1 0 | 1 | 0
1 0 1 1 | 1 | 0
1 1 0 0 | 0 | 1
1 1 0 1 | 0 | 1
1 1 1 0 | 0 | 0
1 1 1 1 | 0 | 0

Example 1 depends only on x1 and x2: y is 0 only when x1 = x2 = 1, so the LUT behaves as a 2-input NAND gate.

Look-Up Tables are the primary elements for logic implementation
Each LUT can implement any function of 4 inputs


5-Input Functions implemented using two LUTs



One CLB slice can implement any function of 5 inputs
The logic function is partitioned between two LUTs
The F5 multiplexer selects which LUT drives the output
Figure labels: two LUT / ROM / RAM blocks with address inputs A4 to A1, WS, and DI; slice inputs F4 to F1 and BX; F5 multiplexer controlled by BX; outputs F5 and GXOR.

5-Input Functions implemented using two LUTs


Truth table of the 5-input example, split by X5 (each Y column is the content of one 4-input LUT):

X4 X3 X2 X1 | Y (X5 = 0) | Y (X5 = 1)
0 0 0 0 | 0 | 0
0 0 0 1 | 1 | 0
0 0 1 0 | 0 | 0
0 0 1 1 | 0 | 0
0 1 0 0 | 1 | 0
0 1 0 1 | 1 | 0
0 1 1 0 | 0 | 0
0 1 1 1 | 0 | 1
1 0 0 0 | 1 | 0
1 0 0 1 | 0 | 1
1 0 1 0 | 0 | 0
1 0 1 1 | 1 | 1
1 1 0 0 | 1 | 0
1 1 0 1 | 1 | 1
1 1 1 0 | 1 | 0
1 1 1 1 | 1 | 0

LUT

OUT

LUT
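The partitioning of a 5-input function between two 4-input LUTs can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, the Y values are copied from the truth table on this slide, and the choice of which LUT is selected when X5 = 0 is an assumption.

```python
# Y column of the 5-input truth table, in address order X5 X4 X3 X2 X1 = 00000 .. 11111
Y = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]


def split_for_f5(truth_table):
    """Split a 32-entry table into the contents of two 16-entry (4-input) LUTs."""
    half = len(truth_table) // 2
    return truth_table[:half], truth_table[half:]   # (X5 = 0 half, X5 = 1 half)


def f5_output(lut_x5_0, lut_x5_1, x5, x4, x3, x2, x1):
    """Both LUTs see X4..X1; the F5 multiplexer uses X5 to pick one result."""
    address = (x4 << 3) | (x3 << 2) | (x2 << 1) | x1
    from_lut0 = lut_x5_0[address]
    from_lut1 = lut_x5_1[address]
    return from_lut1 if x5 else from_lut0


lut_low, lut_high = split_for_f5(Y)
```

The same split applies one level up: MUXF6 chooses between two such 5-input results, which is how two slices form a 6-input function.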


Dedicated Expansion Multiplexers



MUXF5 combines 2 LUTs to create
  Any 5-input function (LUT5)
  Or selected functions up to 9 inputs
  Or 4x1 multiplexer
MUXF6 combines 2 slices to form
  Any 6-input function (LUT6)
  Or selected functions up to 19 inputs
  Or 8x1 multiplexer
Dedicated muxes are faster and more space efficient
Figure: a CLB with two slices, each containing two LUTs and a MUXF5, joined by a MUXF6.

Connecting Look-Up Tables


MUXF5 combines the LUTs in each slice
MUXF6 combines slices S0 and S1
MUXF6 combines slices S2 and S3
MUXF7 combines the two MUXF6 outputs
MUXF8 combines the two MUXF7 outputs (from the CLB above or below)
Figure labels: CLB with slices S0 to S3 and their F5, F6, F7, and F8 multiplexers.


Programmable Logic Block

Early devices were based on the concept of a programmable logic block, which comprised a 3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along with a few other elements.

3-, 4-, 5-, or 6-input LUTs?



The key feature of an n-input LUT is that it can implement any possible n-input combinational logic function. Adding more inputs allows you to represent more complex functions, but every time you add an input, you double the number of SRAM cells!

The first FPGAs were based on 3-input LUTs.

FPGA vendors and researchers studied the relative merits of 3-, 4-, 5- and even 6-input LUTs.
The current consensus is that 4-input LUTs offer the optimal balance of pros and cons.

In the past, some devices were created using a mixture of different LUT sizes because this offered the promise of optimal device utilization. However, current logic synthesis tools prefer uniformity and regularity.
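The doubling of SRAM cells per added input follows from the fact that an n-input LUT stores one bit for each of its 2^n input combinations. A short Python sketch (the function name `lut_cost` is hypothetical) tabulates this together with the number of distinct functions such a LUT can hold:

```python
def lut_cost(n_inputs):
    """Return (SRAM cells, number of distinct functions) for an n-input LUT."""
    sram_cells = 2 ** n_inputs      # one stored bit per input combination
    functions = 2 ** sram_cells     # every possible fill pattern is a function
    return sram_cells, functions


for n in (3, 4, 5, 6):
    cells, _ = lut_cost(n)
    print(f"{n}-input LUT: {cells} SRAM cells")
```

Going from a 4-input to a 6-input LUT therefore quadruples the configuration memory per LUT, which is one side of the trade-off discussed above.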


FPGA Function generators


LUT Example: Implement the function using:
  2-input LUTs
  3-input LUTs
  4-input LUTs

F = ABD + BC D + A B C

A B D B C D A B C

A B D F B C D A B C F A B C D F

Fast Carry Logic

Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters

Carry logic is independent of normal logic and routing resources

Each CLB contains separate logic and routing for the fast generation of sum & carry signals

Figure: carry logic routing, running from the LSB to the MSB.


Fast Carry Logic


Simple, fast, and complete arithmetic logic
  Dedicated XOR gate for single-level sum completion
  Uses dedicated routing resources
  All synthesis tools can infer carry logic
Figure: a CLB with two carry chains, each running from CIN to COUT. One chain passes through slices S0 and S1, and its COUT goes to S0 of the next CLB; the other passes through slices S2 and S3, and its COUT goes to CIN of S2 of the next CLB.

Accessing Carry Logic

All major synthesis tools can infer carry logic for arithmetic functions
  Addition (SUM <= A + B)
  Subtraction (DIFF <= A - B)
  Comparators (if A < B then)
  Counters (count <= count + 1)
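What a synthesis tool maps onto the carry chain for an addition can be modeled bit by bit. The Python sketch below is an assumption-laden illustration rather than a description of any specific device: it assumes the common arrangement in which the LUT computes the propagate signal (a XOR b), a dedicated carry multiplexer forwards or regenerates the carry, and the dedicated XOR gate completes the sum. The function name is hypothetical.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two unsigned numbers one bit per stage, as a carry chain would."""
    carry = carry_in
    total = 0
    for i in range(width):                 # stage 0 is the LSB
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i              # computed in the LUT (assumed)
        sum_i = propagate ^ carry          # dedicated XOR gate completes the sum
        # Carry mux (assumed): pass the incoming carry when propagating,
        # otherwise a_i == b_i and that shared value is the new carry.
        carry = carry if propagate else a_i
        total |= sum_i << i
    return total, carry                    # carry is the chain's COUT


result, carry_out = carry_chain_add(5, 3, 4)
```

Only the carry signal travels between stages, which is why routing it on dedicated vertical wires rather than through general interconnect speeds up adders, subtractors, comparators, and counters.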


Flexible Sequential Elements


Either flip-flops or latches
Two in each slice; eight in each CLB
Inputs come from LUTs or from an independent CLB input
Separate set and reset controls
  Can be synchronous or asynchronous
All controls are shared within a slice
  Control signals can be inverted locally within a slice
Figure: example primitives FDRSE_1 (D, CE, R, S, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q).

Shift Register
Each LUT can be configured as a shift register
  Serial in, serial out
Dynamically addressable delay up to 16 cycles
  For programmable pipeline
Cascade for greater cycle delays
Use CLB flip-flops to add depth
Figure: a LUT drawn as a chain of clock-enabled flip-flops (D, CE) with inputs IN, CE, and CLK; DEPTH[3:0] selects which stage drives OUT.
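The behavior of a LUT in shift-register mode can be approximated with a small Python model. This is a behavioral sketch only: the class and method names are hypothetical, and the convention that depth address 0 returns the most recently shifted-in bit (a one-cycle delay) is an assumption of this example.

```python
class ShiftRegisterLUT:
    """Behavioral model of a 16-stage, dynamically addressable shift register."""

    def __init__(self, length=16):
        self.cells = [0] * length          # cells[0] holds the newest bit

    def clock(self, data_in, clock_enable=1):
        """One rising clock edge: shift only when the clock enable is high."""
        if clock_enable:
            self.cells = [data_in & 1] + self.cells[:-1]

    def read(self, depth):
        """Asynchronous tap: the depth address selects which stage is output."""
        return self.cells[depth]


srl = ShiftRegisterLUT()
for bit in (1, 0, 1, 1):
    srl.clock(bit)
```

Changing `depth` between reads changes the delay without touching the stored data, which is what makes the delay "dynamically addressable" and useful for balancing pipelines.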


Shift Register
Register-rich FPGA
  Allows for addition of pipeline stages to increase throughput
Data paths must be balanced to keep desired functionality
Figure: two 64-bit data paths. One passes through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total; the other passes through Operation C (3 cycles) only, leaving a 9-cycle imbalance.

Shift Register LUT Example

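Figure: the same two 64-bit paths after balancing. Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles; Operation C (3 cycles) is followed by Operation D, a 9-cycle NOP delay, so that path also takes 12 cycles.
Paths are statically balanced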


Distributed RAM
CLB LUT configurable as Distributed RAM
  A LUT equals a 16x1 RAM
  Cascade LUTs to increase RAM size
Synchronous write
Asynchronous read
  Can create a synchronous read by using extra flip-flops
  Naturally, distributed RAM read is asynchronous
Two LUTs can make
  32 x 1 single-port RAM
  16 x 2 single-port RAM
  16 x 1 dual-port RAM
Figure: primitive symbols and their ports
  RAM16X1S: D, WE, WCLK, A0 to A3, O
  RAM32X1S: D, WE, WCLK, A0 to A4, O
  RAM16X2S: D0, D1, WE, WCLK, A0 to A3, O0, O1
  RAM16X1D: D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO
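How two 16 x 1 LUT RAMs cascade into a 32 x 1 single-port RAM can be sketched behaviorally in Python. The class names below are hypothetical and only borrow the flavor of the primitive names; the use of the fifth address bit (A4) to choose between the two LUTs is an assumption of this model.

```python
class Ram16x1:
    """One LUT used as RAM: synchronous write, asynchronous read."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, data, write_enable):
        """Models the write that happens on a WCLK edge when WE is high."""
        if write_enable:
            self.bits[addr & 0xF] = data & 1

    def read(self, addr):
        """The read is purely combinational: address in, bit out."""
        return self.bits[addr & 0xF]


class Ram32x1:
    """Two 16x1 RAMs; address bit A4 (assumed) steers writes and selects reads."""

    def __init__(self):
        self.low = Ram16x1()
        self.high = Ram16x1()

    def write(self, addr, data, write_enable=1):
        a4 = (addr >> 4) & 1
        self.low.write(addr, data, bool(write_enable) and a4 == 0)
        self.high.write(addr, data, bool(write_enable) and a4 == 1)

    def read(self, addr):
        a4 = (addr >> 4) & 1
        return self.high.read(addr) if a4 else self.low.read(addr)


ram = Ram32x1()
ram.write(17, 1)
```

A synchronous read would be modeled by registering the value returned from `read`, matching the slide's note about adding extra flip-flops.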

Xilinx Multipurpose LUT

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


RAM Blocks and Multipliers in Xilinx FPGAs

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


Embedded Ram Blocks



A lot of applications require the use of memory, so FPGAs now include relatively large chunks of embedded RAM called e-RAM or Block RAM (BRAM). Depending on the architecture of the component, these blocks might be positioned around the periphery of the device or organized as columns.

These blocks can be used for a variety of purposes, such as implementing standard single- or dual-port RAMs, FIFOs, etc.

Block RAM
Spartan-3 Dual-Port Block RAM (figure: a block RAM with Port A and Port B)

Most efficient memory implementation
  Dedicated blocks of memory
Ideal for most memory requirements
  4 to 104 memory blocks
    18 kbits = 18,432 bits per block (16 k without parity bits)
  Use multiple blocks for larger memories
Builds both single and true dual-port RAMs
Synchronous write and read (different from distributed RAM)


Spartan-3 Block RAM Amounts

Block RAM can have various configurations (port aspect ratios):
  16k x 1 (addresses 0 to 16,383)
  8k x 2 (addresses 0 to 8,191)
  4k x 4 (addresses 0 to 4,095)
  2k x (8+1) (addresses 0 to 2,047)
  1024 x (16+2) (addresses 0 to 1,023)
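The aspect ratios listed on this slide all follow from dividing the same 16 Kbit data array by the chosen data width. A small Python sketch (the names `DATA_BITS` and `bram_aspect` are hypothetical) reproduces them, assuming one parity bit per data byte for widths of 8 and above, as the "8+1" and "16+2" notation suggests:

```python
DATA_BITS = 16 * 1024   # data bits per block, excluding parity


def bram_aspect(data_width):
    """Return (depth, total port width including parity) for one block RAM."""
    depth = DATA_BITS // data_width
    parity_bits = data_width // 8          # assumed: one parity bit per byte
    return depth, data_width + parity_bits


for width in (1, 2, 4, 8, 16):
    depth, port_width = bram_aspect(width)
    print(f"{depth} x {port_width}")
```

For the byte-wide and wider configurations, depth times total port width gives the full 18,432 bits per block quoted earlier.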


Block RAM Port Aspect Ratios

Single-Port Block RAM


Dual-Port Block RAM

Dual-Port Bus Flexibility


RAMB4_S16_S8
  Port A in: 1K depth; WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[17:0]
  Port A out: 18-bit width; DOA[17:0]
  Port B in: 2K depth; WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[8:0]
  Port B out: 9-bit width; DOB[8:0]
Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic


Two Independent Single-Port RAMs


RAMB4_S1_S1
  Port A in: 8K depth, address = {0, ADDR[12:0]}; WEA, ENA, RSTA, CLKA, ADDRA[12:0], DIA[0]
  Port A out: 1-bit width; DOA[0]
  Port B in: 8K depth, address = {1, ADDR[12:0]}; WEB, ENB, RSTB, CLKB, ADDRB[12:0], DIB[0]
  Port B out: 1-bit width; DOB[0]
Added advantage of True Dual-Port
  No wasted RAM bits
Can split a Dual-Port 16K RAM into two Single-Port 8K RAMs
  Simultaneous independent access to each RAM
To access the lower RAM
  Tie the MSB address bit to Logic Low
To access the upper RAM
  Tie the MSB address bit to Logic High
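The address trick for splitting one dual-port RAM into two independent single-port RAMs can be modeled in a few lines of Python. The class names are hypothetical, and the read-after-write behavior in `access` (a read returns the value just written) is a simplifying assumption of this sketch.

```python
class DualPortRam16Kx1:
    """Shared 16K x 1 storage array reachable from both ports."""

    def __init__(self):
        self.bits = [0] * (16 * 1024)

    def access(self, addr14, data_in=0, write_enable=0):
        if write_enable:
            self.bits[addr14] = data_in & 1
        return self.bits[addr14]


class TwoSinglePort8Kx1:
    """Port A sees only the lower half, port B only the upper half."""

    def __init__(self):
        self.ram = DualPortRam16Kx1()

    def port_a(self, addr13, data_in=0, write_enable=0):
        # MSB address bit tied to logic low: addresses 0 .. 8191
        return self.ram.access((0 << 13) | (addr13 & 0x1FFF), data_in, write_enable)

    def port_b(self, addr13, data_in=0, write_enable=0):
        # MSB address bit tied to logic high: addresses 8192 .. 16383
        return self.ram.access((1 << 13) | (addr13 & 0x1FFF), data_in, write_enable)


rams = TwoSinglePort8Kx1()
rams.port_a(5, data_in=1, write_enable=1)
```

Because the two ports can never address the same cell, each behaves as its own 8K x 1 memory, and no bits of the block are left unused.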

Embedded Multipliers

Some functions, like multipliers, are inherently slow if they are implemented by connecting a large number of programmable logic blocks together. Current FPGAs incorporate special hard-wired multiplier blocks, which are typically located in close proximity to the embedded RAM blocks (arithmetic-based applications).


18 x 18 Embedded Multiplier
Fast arithmetic functions
  Optimized to implement multiply / accumulate modules
18 x 18 signed multiplier
  Fully combinational
  Optional registers with CE & RST (pipeline)
  Independent from adjacent block RAM

18 x 18 Multiplier
Embedded 18-bit x 18-bit multiplier
  2's complement signed operation
Multipliers are organized in columns
Figure: Data_A (18 bits) and Data_B (18 bits) enter the 18 x 18 multiplier, which produces Output (36 bits).
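The relationship between the 18-bit two's complement operands and the 36-bit product can be checked with a short Python model. The helper names `to_signed` and `mult18x18` are hypothetical; the sketch only illustrates the arithmetic, not the hardware structure.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(a_raw, b_raw):
    """Multiply two 18-bit two's complement words; return the 36-bit result word."""
    product = to_signed(a_raw, 18) * to_signed(b_raw, 18)
    return product & ((1 << 36) - 1)


# 3 x (-1): 0x3FFFF is -1 as an 18-bit two's complement word
p = mult18x18(3, 0x3FFFF)
```

Even the extreme case of the most negative operand multiplied by itself fits in 36 bits, so the output never overflows.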


Positions of Multipliers

Asynchronous 18-bit Multiplier


18-bit Multiplier with Register

A simple clock tree


Clock tree Flip-flops

Special clock pin and pad Clock signal from outside world
The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)


Digital Clock Manager (DCM)

Clock signal from outside world

Clock Manager etc.


Special clock pin and pad

Daughter clocks used to drive internal clock trees or output pins

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

Digital Clock Managers (DCM)



The clock pin is usually connected to a special hard-wired function called a clock manager that generates daughter clocks. The daughter clocks may be used to drive internal clock trees or external output pins that can be used to provide clocking services to other devices on the host circuit board. There might be multiple clock managers, some supporting only a subset of the features (jitter removal, frequency synthesis, etc.).


DCM: Jitter Removal



In the real world, clock edges may arrive a little early or a little late, so the clock appears fuzzy (jitter). The FPGA clock manager can be used to detect and correct for this jitter and provide a clean daughter clock signal for use inside the device.

DCM: Frequency Synthesis



The frequency of the clock signal being presented to the FPGA from the outside world might not be exactly what the design engineer wishes for. The clock manager can be used to generate daughter clocks with frequencies that are derived by multiplying or dividing the original signal.
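As a rough illustration of frequency synthesis, a daughter clock frequency can be computed as the input frequency times an integer multiplier divided by an integer divider. The Python sketch below uses a hypothetical function name and example values; the allowed ranges of the multiplier and divider are device-specific and are not modeled here.

```python
from fractions import Fraction


def synthesized_frequency(f_in_hz, multiply, divide):
    """Daughter clock frequency = input frequency * multiply / divide (exact)."""
    return Fraction(f_in_hz) * multiply / divide


# Example values (assumed): a 50 MHz board clock turned into other rates
f_fast = synthesized_frequency(50_000_000, 7, 2)
f_slow = synthesized_frequency(50_000_000, 2, 3)
```

Using exact fractions avoids rounding when the ratio does not divide evenly, as in the second example.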


DCM: Phase Shifting



Certain designs require the use of clocks that are phase shifted (delayed) with respect to each other. Some clock managers allow you to select from fixed phase shifts of common values such as 120° and 240° (for a three-phase clocking scheme).
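A phase shift is simply a delay expressed as a fraction of the clock period. The Python sketch below (hypothetical function name, example frequency assumed) converts a phase in degrees to a delay in nanoseconds:

```python
def phase_delay_ns(frequency_hz, phase_degrees):
    """Delay corresponding to a phase shift, as a fraction of one clock period."""
    period_ns = 1e9 / frequency_hz
    return period_ns * (phase_degrees % 360) / 360


# Three-phase scheme on an assumed 100 MHz clock (10 ns period)
delays = [phase_delay_ns(100e6, phase) for phase in (0, 120, 240)]
```

For the 100 MHz example, the three phases are spaced one third of a period apart.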

Basic I/O Block Structure


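Figure: the I/O block contains three flip-flops (D, Q, EC, SR) that share Clock and Set/Reset: one on the three-state control path (Three-State, Three-State FF Enable), one on the output path (Output, Output FF Enable), and one on the input path (Direct Input, Registered Input, Input FF Enable).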


IOB Functionality
IOB provides interface between the package pins and CLBs
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
  advised for high-performance I/O
Inputs can be delayed

Configurable I/O Impedances



The signals used to connect devices on today's circuit boards often have fast edge rates. To prevent signals from reflecting back, it is necessary to apply appropriate terminating resistors to the FPGA input and output pins.

In the past, resistors were applied as discrete components (outside the FPGA). Today's FPGAs allow the use of internal terminating resistors whose value can be configured by the user.


Spartan 3 Family Attributes

FPGA Nomenclature


Spartan-3 FPGA Family Members

2001 Virtex-II FPGA Family


Virtex-II FPGA introduced, followed by Virtex-II Pro in 2003
18x18 Multipliers & 18kbit block RAMs introduced
Gbit Serial I/O Communications & PowerPC Processors introduced
Complex Floating Point Algorithm Implementation now possible

Virtex-II / Pro
  44,000 Logic Slices
  444 18Kbit BRAMs
  444 18x18 Multipliers
  2 PowerPC Processors
  20 Gbit I/O
  1164 Max User I/O


Virtex II Pro Floorplan


Up to 16 serial transceivers
  622 Mbps to 3.125 Gbps
1 to 4 PowerPCs
4 to 16 multi-gigabit transceivers
12 to 216 multipliers
3,000 to 50,000 logic cells
200k to 4M bits RAM
204 to 852 I/Os
Figure labels: PowerPCs, Logic cells

Virtex-II Pro (Selection)


Embedded Processor Cores (Hard and Soft)



The majority of designs make use of microprocessors. Traditionally, these appeared as discrete devices on the circuit board.
Lately, high-end FPGAs have become available that contain one or more embedded microprocessors (referred to as microprocessor cores).
There are two types of cores:
  A hard microprocessor core is implemented as a dedicated predefined block (two approaches)
  A soft microprocessor core is implemented by configuring a group of programmable logic blocks to act as a microprocessor.

Embedded Core (Inside)



Xilinx and Altera tend to embed one or more microprocessor cores directly into the main FPGA fabric (PowerPC) In this case the design tools have to be able to take account of the presence of these blocks in the fabric (any memory used by the core is formed from the embedded RAM blocks).

The main advantage of this scheme is the inherent speed advantage gained from having the processor core in intimate proximity to the FPGA fabric.


Soft Core

As opposed to embedding a microprocessor physically into the fabric of the chip, it is possible to configure a group of programmable logic blocks to act as a microprocessor. Soft cores are simpler (more primitive) and slower than their hard-core counterparts.
ADVANTAGE?
  1. The user need only implement a core if he/she needs it.
  2. The user can instantiate as many cores as they require until they run out of resources!

Virtex Architectures
Built for high-performance applications

Other families include
  Virtex-II Pro
  Virtex-4
  Virtex-5
Latest family: Virtex-6

Basic Architecture 74


Virtex-II Pro Architecture


Contains embedded Processors and Multi-Gigabit Transceivers
  High performance True Dual-port RAM: 8 Mb
  SelectIO-Ultra Technology: 1164 I/O
  Advanced FPGA Logic: 99k logic cells
  XtremeDSP Functionality: Embedded multipliers
  RocketIO and RocketIO X High-speed Serial Transceivers: 622 Mbps to 3.125 Gbps
  PowerPC Processors, 400+ MHz Clock Rate: 2
  XCITE Digitally Controlled Impedance: Any I/O
  DCM Digital Clock Management: 12
130 nm, 9 layer copper in 300 mm wafer technology

Virtex-4 Family
Advanced Silicon Modular Block (ASMBL) Architecture
Optimized for logic, Embedded, and Signal Processing

Resource | LX | FX | SX
Logic | 14K to 200K LCs | 12K to 140K LCs | 23K to 55K LCs
Memory | 0.9 to 6 Mb | 0.6 to 10 Mb | 2.3 to 5.7 Mb
DCMs | 4 to 12 | 4 to 20 | 4 to 8
DSP Slices | 32 to 96 | 32 to 192 | 128 to 512
SelectIO | 240 to 960 | 240 to 896 | 320 to 640
RocketIO | N/A | 0 to 24 Channels | N/A
PowerPC | N/A | 1 or 2 Cores | N/A
Ethernet MAC | N/A | 2 or 4 Cores | N/A


Virtex-4 Architecture
RocketIO Multi-Gigabit Transceivers
  622 Mbps to 10.3 Gbps
Smart RAM
  New block RAM/FIFO
Advanced CLBs
  200K Logic Cells
Xesium Clocking Technology
  500 MHz
Tri-Mode Ethernet MAC
  10/100/1000 Mbps
XtremeDSP Technology Slices
  256 18x18 GMACs
PowerPC 405 with APU Interface
  450 MHz, 680 DMIPS
1 Gbps SelectIO
  ChipSync source synchronous, XCITE active termination

Virtex-5 Family
Optimized for logic, Embedded, Signal Processing, and High-Speed Connectivity
Virtex-5 Platforms
  LX: Logic
  LXT: Logic/Serial
  SXT: DSP/Serial
  FXT: Emb./Serial
Resources compared across the platforms: Logic, On-chip RAM, DSP Capabilities, Parallel I/Os, Serial I/Os, PowerPC Processors


Virtex-5 Architecture
Enhanced
  36Kbit Dual-Port Block RAM / FIFO with Integrated ECC
  550 MHz Clock Management Tile with DCM and PLL
  SelectIO with ChipSync Technology and XCITE DCI
  Advanced Configuration Options
  25x18 DSP Slice with Integrated ALU
  Tri-Mode 10/100/1000 Mbps Ethernet MACs
  RocketIO Transceiver Options
    Low-Power GTP: Up to 3.75 Gbps
    High-Performance GTX: Up to 6.5 Gbps
New
  Most Advanced High-Performance Real 6LUT Logic Fabric
  PCI Express Endpoint Block
  System Monitor Function with Built-in ADC
  Next Generation PowerPC Embedded Processor

The Spartan-3 Family
Built for high volume, low-cost applications

18x18 bit Embedded Pipelined Multipliers for efficient DSP
Configurable 18K Block RAMs + Distributed RAM
Up to eight on-chip Digital Clock Managers to support multiple system clocks
4 I/O Banks, Support for all I/O Standards including PCI, DDR333, RSDS, mini-LVDS
Figure: Spartan-3 device with I/O Bank 0 to Bank 3.


Spartan-3 Family
Based upon Virtex-II Architecture, Optimized for Lower Cost

Smaller process = lower core voltage
  .09 micron versus .15 micron
  Vccint = 1.2V versus 1.5V
Logic resources
  Only one-half of the slices support RAM or SRL16s (SLICEM)
  Fewer block RAMs and multiplier blocks
Clock Resources
  Fewer global clock multiplexers and DCM blocks
I/O Resources
  Fewer pins per package
  No internal 3-state buffers
  Support for different standards
    New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
    Default is LVCMOS, versus LVTTL

SLICEM and SLICEL


Each Spartan-3 CLB contains four slices
  Similar to the Virtex-II
Slices are grouped in pairs
  Left-hand SLICEM (Memory)
    LUTs can be configured as memory or SRL16
  Right-hand SLICEL (Logic)
    LUT can be used as logic only
Figure: a switch matrix beside the CLB; left-hand SLICEM slices X0Y0 and X0Y1 (with SHIFTIN and SHIFTOUT) and right-hand SLICEL slices X1Y0 and X1Y1; each column has CIN and COUT; fast connects link to neighbors.


Multiple Domain-optimized Platforms


Spartan-3E Features
More gates per I/O than Spartan-3
Removed some I/O standards
  Higher-drive LVCMOS
  GTL, GTLP
  SSTL2_II
  HSTL_II_18, HSTL_I, HSTL_III
  LVDS_EXT, ULVDS
16 BUFGMUXes on left and right sides
  Drive half the chip only
  In addition to eight global clocks
Pipelined multipliers
Additional configuration modes
  SPI, BPI
  Multi-Boot mode
DDR Cascade
  Internal data is presented on a single clock edge


Spartan-3A DSP Features


Increased amount of block memory (BRAM)
  1512K of S3A1800 vs 648K of S3E1600
More XtremeDSP DSP48A slices
  Replaces Embedded multiplier of Spartan-3E
  3400A: 126 DSP48As
  1800A: 84 DSP48As

Spartan-3A DSP
Tuning DSP Performance
Integrated XtremeDSP Slice (XtremeDSP DSP48A Slice)
  Application optimized capacity
  Integrated pre-adder optimized for filters
  250 MHz operation, standard speed grade
  Compatible with Virtex-DSP
Increased memory capacity and performance
  Also important for embedded processing, complex IP, etc.


DSP48 Comparison

Function | DSP48 | DSP48E | DSP48A
Multiplier | 18 x 18 | 25 x 18 | 18 x 18
Pre-Adder | No | No | Yes
Cascade Inputs | One | Two | One
Cascade Output | Yes | Yes | Yes
Dedicated C input | No | Yes | Yes
Adder | 3 input 48 bit | 3 input 48 bit | 2 input 48 bit
Dynamic Opmodes | Yes | Yes | Yes
ALU Logic Functions | No | Yes | No
Pattern Detect | No | Yes | No
SIMD ALU Support | No | Yes | No
Carry Signals | Carry In | Carry In & Out | Carry In & Out

Benefit
  Multiplier: Reduces FPGA resource needs for DSP algorithms.
  Pre-Adder: Reduces the critical path timing in FIR filter applications for better performance. Important in FIR filter construction.
  Cascade Inputs: Enables fast data path chaining of DSP48 blocks for larger filters.
  Cascade Output: Enables fast data path chaining of DSP48 blocks for larger filters.
  Dedicated C input: The C input supports many 3-input mathematical functions, such as 3-input addition and 2-input multiplication with a single addition, and the very valuable rounding of multiplication away from zero.
  Adder: Supports simple add and accumulate functions.
  Dynamic Opmodes: One DSP48 can provide more than one function (multiply, multiply-add, multiply-accumulate, etc.).
  ALU Logic Functions: Similar to the ALU of a microprocessor. Enables the selection of ALU function on a clock cycle basis. Enables multiple functions to be selected (Add, Subtract, or Compare).
  Pattern Detect: This feature supports convergent rounding, underflow/overflow detection for saturation arithmetic, and auto-resetting counters/accumulators.
  SIMD ALU Support: Enables parallel ALU operations on multiple data sets.
  Carry Signals: Supports fast carry functions between DSP blocks. Often a speed limiting path.

Spartan-3A Device Table


Feature | Spartan-3A XC3S1400A | Spartan-3A DSP XC3SD1800A | Spartan-3A DSP XC3SD3400A
XtremeDSP DSP48A Slices / Dedicated Multipliers | 32 | 84 DSP48As | 126 DSP48As
Block RAM Blocks | 32 | 84 | 126
Block RAM (Kb) | 576 | 1,512 | 2,268
Distributed RAM (Kb) | 176 | 260 | 373
FFs/LUTs | 22,528 | 33,280 | 47,744
Logic Cells | 25,344 | 37,440 | 53,712
DCMs | 8 | 8 | 8
Max Diff I/O Pairs | 227 | 227 | 213
CS484 19x19mm (0.8mm pitch) | - | 309 | 309
*FG676 27x27mm (1.0mm pitch) | 502 | 519 | 469


Latest Families


Architecture Alignment
Virtex-6 FPGAs (760K Logic Cell Device)
  FIFO Logic
  Tri-mode EMAC
  System Monitor
Spartan-6 FPGAs (150K Logic Cell Device)
  Hardened Memory Controllers
  3.3 Volt compatible I/O
Common Resources
  LUT-6 CLB
  BlockRAM
  DSP Slices
  High-performance Clocking
  Parallel I/O
  HSS Transceivers*
  PCIe Interface
*Optimized for target application in each family

Enables IP Portability, Protects Design Investments


Addressing the Broad Range of Technical Requirements


Spartan-6 LX: Lowest cost logic + DSP
Spartan-6 LXT: Lowest logic + high-speed serial
Virtex-6 LXT: High logic density + serial connectivity
Virtex-6 SXT: DSP + logic + serial connectivity
Virtex-6 HXT: Ultra high-speed serial connectivity + logic
Figure axes: Market Size versus Application Market Segments (+ 100s More)

Designers Eccentrics
Higher System Performance
  More design margin to simplify designs
  Higher integrated functionality
Lower System Cost
  Reduce BOM
  Implement design in a smaller device & lower speed grade
Lower Power
  Help meet power budgets
  Eliminate heat sinks & fans
  Prevent thermal runaway


Virtex-6 Family


Virtex Product & Process Evolution


Virtex: 220-nm
Virtex-E: 180-nm
Virtex-II: 150-nm
Virtex-II Pro: 130-nm
Virtex-4: 90-nm
Virtex-5: 65-nm
Virtex-6: 40-nm
Figure: the families are plotted against product generations, from 1st Generation to 6th Generation.

Delivering Balanced Performance, Power, and Cost


Strong Focus on Power Reduction


Lower Overall Power
  Static Power Reduction: higher distribution of low leakage transistors
  Dynamic Power Reduction: reduced capacitance through device shrink
  Reduced Core Voltage Devices: VCCINT = 0.9V option allows power / performance tradeoff
  I/O Power Improvements: dynamic termination
  System Monitor: allows sophisticated monitoring of temperature and voltage

Up to 50% Power Reduction vs. Previous Generation

Virtex-6 Logic Fabric


Virtex-6 Configurable Logic Block (CLB)
  Each CLB contains two slices
  Each slice contains four 6-input lookup tables (6LUT)
  Slices implement logic functions (slice_l)
  Slices for memories and shift registers (slice_m)

LUT6 implements
  All functions of up to 6 variables
  Two functions of up to 5 variables each
  Shift registers up to 32 stages long
  Memories of 64 bits
  Multiple configurations within a slice

[Figure: one CLB containing two slices, each with four LUTs]

Power Consumption Benefits
  Shift register mode greatly reduces power consumption over FF implementation

Performance Benefits
  Increased ratio of slice_m memories available closer to the source or target logic

Cost Benefits
  Can pack logic and memory functions more efficiently
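The claim that a 6-input LUT implements any function of up to 6 variables follows from the LUT being a 64-entry truth table addressed by its inputs. A minimal Python sketch of that idea is given here; the function names `make_lut6` and `lut6_read` are illustrative only and do not correspond to any vendor primitive or tool API.

```python
def make_lut6(func):
    """Build the 64-bit truth table (one bit per input combination)
    for an arbitrary Boolean function of six 0/1 inputs."""
    init = 0
    for addr in range(64):
        # Input i is bit i of the address.
        bits = [(addr >> i) & 1 for i in range(6)]
        if func(*bits):
            init |= 1 << addr
    return init


def lut6_read(init, *inputs):
    """Evaluate the LUT: the six inputs form an address into the table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return (init >> addr) & 1


# Two example configurations: 6-input parity and 6-input AND.
parity = make_lut6(lambda a, b, c, d, e, f: a ^ b ^ c ^ d ^ e ^ f)
and6 = make_lut6(lambda *x: all(x))

print(lut6_read(parity, 1, 0, 0, 0, 0, 0))  # one input high
print(lut6_read(and6, 1, 1, 1, 1, 1, 1))    # all inputs high
```

The same 64 storage bits are what a slice_m can reuse as a 64-bit memory, which is why the slide lists logic and memory as alternative uses of one LUT.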


Higher DSP Performance


Most advanced DSP architecture
  New optional pre-adder for symmetric filters
  25x18 multiplier
    High-resolution filters
    Efficient floating-point support
  ALU-like second stage enables mapping of advanced operations
    Programmable op-code
    SIMD support
    Addition / Subtraction / Logic functions
  Pattern detector

Lowest power consumption

Highest DSP slice capacity
  Up to 2K DSP Slices
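Why a pre-adder helps symmetric filters can be shown with a small numeric sketch. In a symmetric FIR filter the coefficients mirror each other, so the two samples that share a coefficient can be added first and multiplied once. The Python below is only an arithmetic illustration with made-up coefficients and samples; it does not model the DSP slice itself.

```python
def fir_direct(coeffs, window):
    """Direct form: one multiplication per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_preadd(half_coeffs, window):
    """Symmetric filter with an even tap count: add mirrored samples
    first, then multiply once per coefficient pair."""
    n = len(window)
    assert n == 2 * len(half_coeffs)
    return sum(c * (window[i] + window[n - 1 - i])
               for i, c in enumerate(half_coeffs))


coeffs = [1, 3, 3, 1]   # symmetric coefficients (hypothetical values)
window = [2, 5, 7, 4]   # current sample window (hypothetical values)

print(fir_direct(coeffs, window))       # 4 multiplications
print(fir_preadd(coeffs[:2], window))   # 2 multiplications, same result
```

Both calls return the same value, while the pre-added form needs half as many multipliers.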

Virtex-6 LXT / SXT FPGAs


Spartan-6 Family


Spartan-6
Next Generation 45nm Spartan Family
  Increased performance & density
  Evolutionary feature enhancements
  Dramatic cost & power reductions

Two Silicon Platforms
  LX: Cost-optimized logic, memory
  LXT: LX features plus high-speed serial connectivity

More unified & integrated with Virtex

Delivering the Optimal Balance of Cost, Power & Performance



Spartan-6 Logic Evolution


Higher Performance, Increased Utilization
  Modified Virtex 6-input LUT
  4 additional flip-flops per slice
  Higher utilization for register-intensive designs

Spartan-3A Series & Earlier: LUT / FF Pair (4LUT)
  Great general-purpose logic

Spartan-6: LUT / Dual FF Pair (6LUT)
  NEW efficient design
  6-input LUT & 2nd flip-flop for higher utilization

Efficient & Capable
  Logic
  Arithmetic functions
  Distributed RAM & shift registers
  Interconnect

Up to 25% Higher Performance

Spartan-6 CLB Logic Slices


SliceM (25%)
  LUT6
  8 Registers
  Carry Logic
  Wide Function Muxes
  Distributed RAM / SRL logic

SliceL (25%)
  LUT6
  8 Registers
  Carry Logic
  Wide Function Muxes

SliceX (50%)
  LUT6
  Optimized for Logic
  8 Registers

Slice mix chosen for the optimal balance of Cost, Power & Performance

Spartan-6 Lowest Total Power


Static power reductions
  Process & architectural innovations

Dynamic power reduction
  Lower node capacitance & architectural innovations

More hard IP functionality
  Integrated transceivers & other logic reduce power
  Hard IP uses less current & power than soft IP

Lower IO power
Low-power option -1L reduces power even further
Fewer supply rails reduce power

Spartan-6 Hard Memory Controller


New Hard Block Memory Controller
  Up to 4 controllers per device

Why a Hard Memory Block?
  Very common design component
  Multiple customer benefits

Customer Requests
  Higher performance
  Lower cost
  Lower power
  Easier designs

Spartan-6 Hard Block Memory Controller Benefits
  Up to 800 Mbps
  Saves soft logic, smaller die
  Dedicated logic
  Timing closure no longer an issue
  Configurable multiport user interface
  CoreGen/MIG wizard & EDK support


Memory Controller
Only low-cost FPGA with a hard memory controller
Guaranteed memory interface performance providing:
  Reduced engineering & board design time
  DDR, DDR2, DDR3 & LP DDR support
  Up to 12.8 Gbps bandwidth for each memory controller
Automatic calibration features
Multiport structure for user interface
  Six 32-bit programmable ports from fabric
  Controller interface to 4-, 8- or 16-bit memory devices

[Figure: Spartan-6 memory controller connected to DRAM (DDR, DDR2, DDR3, LP DDR), with SRAM, FLASH and EEPROM shown as other memory types]
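The per-controller bandwidth figure can be checked with simple arithmetic, assuming the quoted 800 Mbps is the data rate per pin and the interface is 16 bits wide (both taken from the slides; the per-pin interpretation is an assumption). The helper name below is illustrative.

```python
def peak_bandwidth_gbps(rate_mbps_per_pin, width_bits):
    """Peak interface bandwidth in Gbps: per-pin data rate times bus width."""
    return rate_mbps_per_pin * width_bits / 1000


# Assumed: 800 Mbps per data pin, for each supported memory width.
for width in (4, 8, 16):
    print(width, "bits:", peak_bandwidth_gbps(800, width), "Gbps")
```

With a 16-bit device this gives 12.8 Gbps, which is a peak figure and says nothing about sustained throughput.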


Integrated DSP Slice


250 MHz implementation
  Fast multiplier & 48-bit adder
  ASIC-like performance

Input and output registers for higher speed

Optimizes FIR filter applications

[Figure: XtremeDSP DSP48A1 Slice]


Better, More BRAM


More Block RAMs
  2x higher BRAM to Logic Cell ratio than Spartan-3A platform

More port flexibility
  18K can be split into two 9K BRAM blocks that can be independently addressed
  [Figure: one 18K BRAM OR two 9K BRAMs]

Improves buffering, caching & data storage
  Excellent for embedded processing, communication protocols
  Enables DSP blocks to provide more efficient video and surveillance algorithms

Lower Static Power



Compare to Spartan-3A
Twice the Capabilities, Half the Power, Hard Blocks!
Feature | Extended Spartan-3A (90nm) | Spartan-6 (45nm)
Logic Cells | Up to 55K | Up to 150K
LUT Design | 4-input LUT + FF | 6-input LUT + 2FF
Block RAM (Mbit) | Up to 2 Mbit | Up to 5 Mbit
Transceiver Count / Speed | No | Up to 8 / Up to 3.125 Gbps
Voltage Scaling | No (1.2V only) | Yes (1.2V, 1.0V)
Static Power (typ mW) | 11 mW (smallest density) | Up to 60% less
Memory Interface | 400 Mbps | DDR3 800 Mbps
Max Differential IO | 640 Mbps | 1050 Mbps
Multipliers/DSP | Up to 126 Multipliers / DSP | Up to 184 DSP48 Blocks
Memory Controllers | No | Up to 4 Hard Blocks
Clock Management | DCM Only | DCM & PLL
PCI Express Endpoint | No | Yes, Gen 1
Security | Device DNA Only | Device DNA & AES


Spartan-6 LX / LXT FPGAs

** All memory controller support x16 interface, except in CS225 package where x8 only is supported


FPGA Design Flow


Design process (1)


Specification
Design and implement a simple unit that speeds up encryption with an RC5-like cipher and a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit must be able to perform the encryption algorithm by itself, executing 32 rounds.

VHDL description (your VHDL source files)


library IEEE;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr: in std_logic;
    data_input: in std_logic_vector(31 downto 0);
    data_output: out std_logic_vector(31 downto 0);
    out_full: in std_logic;
    key_input: in std_logic_vector(31 downto 0);
    key_read: out std_logic
  );
end RC5_core;

Functional simulation

Synthesis

Post-synthesis simulation

Design process (2)


Implementation (Mapping, Placing & Routing)

Timing simulation

Configuration

On-chip testing


Design Process control from Active-HDL

Logic Synthesis
VHDL description
architecture MLU_DATAFLOW of MLU is
  signal A1: STD_LOGIC;
  signal B1: STD_LOGIC;
  signal Y1: STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
begin
  A1 <= A when (NEG_A='0') else not A;
  B1 <= B when (NEG_B='0') else not B;
  Y  <= Y1 when (NEG_Y='0') else not Y1;

  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  with (L1 & L0) select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

Circuit netlist
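Before synthesis, the intended behaviour of the MLU dataflow description can be cross-checked against a small software reference model. The Python sketch below mirrors the signal assignments of the MLU architecture using 0/1 integers; it is an illustrative golden model written for this purpose, not part of the course material or of any tool flow.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Reference model of the MLU: optional input inversion,
    one of four logic operations, optional output inversion."""
    a1 = a ^ neg_a                 # A1 <= A when NEG_A='0' else not A
    b1 = b ^ neg_b                 # B1 <= B when NEG_B='0' else not B
    mux = [a1 & b1,                # "00": AND
           a1 | b1,                # "01": OR
           a1 ^ b1,                # "10": XOR
           1 - (a1 ^ b1)]          # others: XNOR
    y1 = mux[(l1 << 1) | l0]       # select on L1 & L0
    return y1 ^ neg_y              # Y <= Y1 when NEG_Y='0' else not Y1


# Print the full truth table for the AND operation without inversions.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlu(a, b, 0, 0, 0, 0, 0))
```

Comparing such a model against functional-simulation output is one simple way to catch description errors before synthesis.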


Synthesis Tools

XST

and others

Features of synthesis tools

Interpret RTL code
Synplify Pro: Produces synthesized circuit netlist in a standard EDIF (.edf) format
  Can optionally produce .VHM (VHDL code merged into one) file for post-synthesis simulation
XST: Produces synthesized circuit netlist in NGC format
Netlist is composed of gates in the particular Xilinx implementation library
  http://toolbox.xilinx.com/docsan/xilinx9/books/manuals.pdf has information on libraries
Give preliminary performance estimates
Some can display circuit schematics corresponding to EDIF netlist


Timing report after synthesis


Performance Summary
*******************
Worst slack in design: -0.924

Starting     Requested   Estimated   Requested   Estimated             Clock      Clock
Clock        Frequency   Frequency   Period      Period      Slack     Type       Group
------------------------------------------------------------------------------------------------
exam1|clk    85.0 MHz    78.8 MHz    11.765      12.688      -0.924    inferred   Inferred_clkgroup_0
System       85.0 MHz    86.4 MHz    11.765      11.572      0.193     system     default_clkgroup
================================================================================================
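The relationship between the columns of this report is simple arithmetic: the period in nanoseconds is 1000 divided by the frequency in MHz, and slack is the requested period minus the estimated period. A short Python sketch using the numbers from the report follows; the function names are illustrative.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(requested_period_ns, estimated_period_ns):
    """Positive slack: the constraint is met. Negative: it is missed."""
    return requested_period_ns - estimated_period_ns


print(round(period_ns(85.0), 3))            # requested period
print(round(slack_ns(11.765, 12.688), 3))   # exam1|clk
print(round(slack_ns(11.765, 11.572), 3))   # System
```

Computed from the printed periods, the exam1|clk slack comes out as -0.923 ns rather than the reported -0.924 ns; the small difference is presumably because the tool works with unrounded internal values. Either way the negative slack means the 85 MHz request is not met at this stage.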

Implementation

After synthesis the entire implementation process is performed by FPGA vendor tools


Mapping
[Figure: netlist elements LUT0 to LUT5, FF1 and FF2 being mapped onto FPGA resources]


Placing

[Figure: mapped logic placed into CLB slices on the FPGA]

Routing
[Figure: programmable connections routed between the placed blocks on the FPGA]


Map report header


Release 7.1.03i Map H.41
Xilinx Mapping Report File for Design 'exam1'

Design Information
------------------
Command Line   : c:\Xilinx\bin\nt\map.exe -p 2S200FG256-6 -o map.ncd -pr b -k 4 -cm area -c 100 -tx off exam1.ngd exam1.pcf
Target Device  : xc2s200
Target Package : fg256
Target Speed   : -6
Mapper Version : spartan2 -- $Revision: 1.26.6.4 $
Mapped Date    : Wed Nov 02 11:15:15 2005

Map report
Design Summary
--------------
Number of errors:      0
Number of warnings:    0
Logic Utilization:
  Number of Slice Flip Flops:                       144 out of 4,704    3%
  Number of 4 input LUTs:                           173 out of 4,704    3%
Logic Distribution:
  Number of occupied Slices:                        145 out of 2,352    6%
  Number of Slices containing only related logic:   145 out of   145  100%
  Number of Slices containing unrelated logic:        0 out of   145    0%
    *See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs:                          210 out of 4,704    4%
  Number used as logic:                             173
  Number used as a route-thru:                        5
  Number used as 16x1 RAMs:                          32
Number of bonded IOBs:                               74 out of   176   42%
Number of GCLKs:                                      1 out of     4   25%
Number of GCLKIOBs:                                   1 out of     4   25%
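The percentages and the LUT total in this summary can be reproduced by hand. The sketch below recomputes them in Python, assuming the report truncates percentages to whole numbers (an inference from the figures shown, since 173 of 4,704 is about 3.7% but is reported as 3%); the helper name is illustrative.

```python
def utilization_pct(used, available):
    """Whole-number utilization, truncated as the report appears to do."""
    return 100 * used // available


# LUTs used as logic, as route-thrus and as 16x1 RAMs add up to the total.
total_luts = 173 + 5 + 32

print(total_luts)
print(utilization_pct(144, 4704))         # slice flip-flops
print(utilization_pct(173, 4704))         # LUTs used as logic
print(utilization_pct(145, 2352))         # occupied slices
print(utilization_pct(total_luts, 4704))  # total 4-input LUTs
print(utilization_pct(74, 176))           # bonded IOBs
```

The occupied-slice figure is the one to watch when judging how full the device is, since a slice counts as occupied even if only part of it is used.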


Place & route report


Timing Score: 0

Asterisk (*) preceding a constraint indicates it was not met.
This may be due to a setup or hold violation.

--------------------------------------------------------------------------------
  Constraint                                | Requested  | Actual     | Logic
                                            |            |            | Levels
--------------------------------------------------------------------------------
  TS_clk = PERIOD TIMEGRP "clk" 11.765 ns   | 11.765ns   | 11.622ns   | 13
  HIGH 50%                                  |            |            |
--------------------------------------------------------------------------------
  OFFSET = OUT 11.765 ns AFTER COMP "clk"   | 11.765ns   | 11.491ns   | 1
--------------------------------------------------------------------------------
  OFFSET = IN 11.765 ns BEFORE COMP "clk"   | 11.765ns   | 11.442ns   | 2
--------------------------------------------------------------------------------

Post layout timing report


Timing summary:
---------------
Timing errors: 0  Score: 0
Constraints cover 42912 paths, 0 nets, and 1038 connections

Design statistics:
  Minimum period: 11.622ns (Maximum frequency: 86.044MHz)
  Minimum input required time before clock: 11.442ns
  Minimum output required time after clock: 11.491ns


Post-place-and-route simulation
After place-and-route is performed, you can do post-place-and-route simulation
  Now you have real timing information!
  You can also do static timing analysis, which shows the worst-case critical path in the circuit

Configuration
Once a design is implemented, you must create a file that the FPGA can understand
  This file is called a bitstream: a BIT file (.bit extension)

The BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file which stores the programming information


Configuration of SRAM based FPGAs

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright 2004 Mentor Graphics Corp. (www.mentor.com)

System Gates vs. Real Gates



One common metric used to measure the size of a device in the ASIC world is that of equivalent gates (e-gates).

Convention used:
  A 2-input NAND function represents one equivalent gate.
  An equivalent gate consists of an arbitrary number of transistors.

Different vendors provide different functions in their cell libraries, where each implementation of each function requires a different number of transistors (which makes capacity and complexity difficult to compare).
  Solution: Assign each function an equivalent gate value and sum all these values.

How can we establish a basis for comparison between FPGAs and ASICs?
  Can an ASIC of 500,000 equivalent gates that needs to be migrated into an FPGA fit into a particular FPGA?


FPGAs: System Gates



System Gates: A 4-input LUT can be used to represent anywhere between one and more than twenty 2-input primitive logic gates.

Rule of thumb?
  Divide the system gates value by three, so an FPGA of three million system gates would equate to one million ASIC equivalent gates.

However, to compare two different implementations on an FPGA (e.g. floating-point adder vs. fixed-point adder), designers should use the resources available in the FPGA:
  Number of 4-input LUTs used
  Number of embedded multipliers
  Number of embedded RAM blocks
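The divide-by-three rule of thumb can be written as a two-line conversion. The Python sketch below applies it in both directions, including to the 500,000-gate ASIC from the earlier question. The factor of three is only the rough guide quoted on the slide, and the function names are illustrative.

```python
def system_to_asic_gates(system_gates, factor=3):
    """Rule of thumb: FPGA system gates divided by three."""
    return system_gates // factor


def asic_to_system_gates(asic_gates, factor=3):
    """Inverse estimate: system gates needed to hold an ASIC design."""
    return asic_gates * factor


print(system_to_asic_gates(3_000_000))  # three million system gates
print(asic_to_system_gates(500_000))    # 500,000 equivalent-gate ASIC
```

By this estimate a 500,000 equivalent-gate ASIC would need an FPGA rated at roughly 1.5 million system gates, although the actual fit depends on the LUT, multiplier and RAM resources the design uses.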

State-of-the-Art FPGAs

65-90 nm process on 300 mm wafers
  Lower cost per function (LUT + register)
  Smaller and faster transistors: higher speed

System speed up to 500 MHz
  Mainly through smart interconnects, clock management, dedicated circuits, flexible I/O
  Integrated transceivers running at 10 Gigabits/sec

More Logic and Better Features:
  >100,000 LUTs & flip-flops
  >200 embedded RAMs, and the same number of 18 x 18 multipliers
  1156 pins (balls) with >800 GP I/O
  50 I/O standards, incl. LVDS with internal termination
  16 low-skew global clock lines
  Multiple clock management circuits
  On-chip microprocessor(s) and multi-Gbps transceivers


Latest Devices: Capacity & Features


Xilinx Virtex-5
  65nm process
  Up to 960 I/Os
  >200000 logic cells
  Up to 552 18kb block RAMs (~10Mb RAM)
  450 DSP slices (18x18 multiplier-accumulator)
  20 digital clock managers (DCM)
  24 high-speed serial transceivers (622Mb/s to 11.1Gb/s)
  Up to four PowerPC 405 cores

Altera Stratix-II
  90nm process
  Up to 1170 I/Os
  179000 logic elements
  9.6Mb embedded RAM
  96 DSP blocks: 380 18x18 multipliers
  12 PLLs
  Serial I/O up to 1Gb/s
  No hard processor cores

FPGAs Becoming More Attractive

Between 1/91 and 1/99:
  Capacity: 21x bigger
  Speed: 5.5x faster
  Price: 50x less expensive

[Chart: capacity, speed and price versus year, 1/91 to 1/99]

Source: Xilinx


FPGA Shortcomings

Circuit Delay
  Delay increases due to programmable switches in the FPGA routing architecture
Area
  Configuration cells and programmable resources incur substantial area penalty
Power
  Typically not suited for low-power applications

[Figure: FPGA vs. ASIC compared on performance, cost and time to market, with a "Need to improve" annotation]

Conclusion
FPGAs are the main enabler of Reconfigurable Computing Systems (RCS)
FPGAs fill the gap between Instruction Set Processors (GPs) and ASICs
  Advantages: flexible, programmable
  Disadvantages: power dissipation, performance w.r.t. ASIC

Applicability of FPGAs relies on CAD tools provided by different vendors such as Xilinx and Altera
RCS can be realized with several technologies:
  FPGAs: Fine/Medium Grain
  Coarse Grain Reconfigurable Architectures: CGRAs
