VIT_wksp


Introduction to FPGA

Sandeepani George Mason University


World of Integrated Circuits

Integrated Circuits
• Full-Custom ASICs
• Semi-Custom ASICs
• User Programmable
  • PLD: PAL, PLA, PML
  • FPGA: LUT (Look-Up Table), MUX, Gates

Two competing implementation approaches

ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout

FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device

What are FPGAs?

Field Programmable Gate Arrays (FPGAs) are digital integrated circuits (ICs) that contain configurable (programmable) blocks of logic along with configurable interconnects between these blocks. Design engineers can configure (program) such devices to perform a tremendous variety of tasks.

What is an FPGA?

[Figure: FPGA floorplan showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]

Which Way to Go?

ASICs
• High performance
• Low power
• Low cost in high volumes

FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability

Other FPGA Advantages

• The manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
• Mistakes not detected at design time have a large impact on development time and cost
• FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  • Reconfigurable computing

Major FPGA Vendors

SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor

Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.

Introduction to
Xilinx Products

© 2006 Xilinx, Inc. All Rights Reserved


Outline

• Introduction
• CPLDs and FPGAs
• Summary
• Appendix

Xilinx Worldwide Presence

[Figure: world map of Xilinx sites. Legend: Headquarters; R&D; Sales, Marketing, Support; Manufacturing (Fab, Assy, Test); RocketLabs]

Redefining the PLD Market Landscape

Inventor of the FPGA
• First to immerse the 300-MHz embedded PowerPC™
• First FPGA to 90-nm process technology
• Largest fabless semiconductor company
• Leader in multi-gigabit I/O interfaces
• Fastest integrated design software
• First to introduce an all-digital CPLD family

Xilinx Products for 2005/6

[Figure: product families plotted against increasing density (system gates), with axis marks at 10K, 300K, 8M, and 10M]
• CPLDs: Low Cost, Low Power, High Performance
• Spartan FPGAs: SRAM Based, Feature Rich, Low Cost
• Virtex-4 FPGAs: Platform Design, Memory Rich, High Performance, Embedded PowerPC™, 11.1 GHz I/O

Other Current Xilinx Products

• Other CPLDs
• Software
• Other FPGAs

Xilinx Services

• Web and phone support
  • www.xilinx.com/support
  • 24/7 Web support
• Training products
  • Instructor-led classes
  • Live e-learning
• Design services
  • Also have Xperts 3rd-party design services

Which Xilinx Product Should You Use?

• CPLD versus FPGA?
• Which CPLD family?
• What device, package, speed, etc.

CPLD Basics Review: What is a CPLD?

• Definition: Complex Programmable Logic Device – A hybrid of PLD blocks and interconnect for mid-size logic designs

[Figure: two IO/Registers/Logic blocks joined by an Interconnect]

CPLD Architecture

[Figure: Function Blocks, each a Logic Block with Macrocells MC0 to MCn and an I/O block, connected through a central Interconnect Array]

• I/O – Input Output block. The gateway between the CPLD fabric and the outside world
• MC – Macrocell. The part of the CPLD architecture containing the register
• Logic Block – Where Product Terms are built
• Product Term – Single logical function made out of AND and OR terms
• Interconnect Array – Connection between the I/O, MC, and Logic Blocks
• Function Block – The name for a Logic Block and its associated Macrocells
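
As a rough illustration of the definitions above, the following Python sketch evaluates a sum-of-products function the way a Logic Block feeding a Macrocell might: each product term ANDs a set of input literals, and the terms are then ORed together. The function names and the example equation are hypothetical; they are not taken from any Xilinx tool or data sheet.

```python
def product_term(inputs, literals):
    """AND of the selected inputs; literals maps input name -> required value."""
    return all(inputs[name] == want for name, want in literals.items())


def sum_of_products(inputs, terms):
    """OR of the product terms, as would feed one macrocell."""
    return any(product_term(inputs, term) for term in terms)


# Hypothetical equation: F = (A AND NOT B) OR (C AND D)
TERMS = [{"A": 1, "B": 0}, {"C": 1, "D": 1}]

print(sum_of_products({"A": 1, "B": 0, "C": 0, "D": 0}, TERMS))
print(sum_of_products({"A": 1, "B": 1, "C": 1, "D": 0}, TERMS))
```

A wide product term (many literals) costs no more to evaluate here than a narrow one, which loosely mirrors the "wide fan-in" property usually attributed to CPLDs.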

CPLD Definitions: High Performance

• Pin-to-pin combinatorial delay, Tpd (ns)
  • Time from input, through interconnect, to output
• Maximum registered frequency, Fmax (MHz)
  • Fastest operation of flip-flops

CPLDs versus FPGAs

• CPLD architecture
– Product term array
– Interconnect array
– Wide fan-in
– Deterministic timing
– Low standby power

• FPGA architecture
– Look-up table based
– X/Y routing matrix
– Higher density
– Additional features
  • DLL
  • Multipliers

[Figure: a CPLD drawn as Logic Blocks with Macrocells MC0 to MCn and I/O around an Interconnect Array, next to an FPGA drawn as a grid of Logic Cells]

CPLD or FPGA?

• CPLD
– Nonvolatile
– Consistent pin-to-pin timing
– Simple timing model
– Very low power consumption
– Lowest cost point
– Fast internal performance

• FPGA
– Volatile
– Need memory device to load design at power up
– Complex timing model
– Larger, more complex designs
– Memory resources
– Applications – PCI, high-speed serial communication, embedded processors

Basic FPGA Architecture

This material exempt per Department of Commerce license exception TSU
George Mason University


Objectives
After completing this module, you will be able to:
• Identify the basic architectural resources of the Virtex™-II FPGA
• List the differences between the Virtex-II, Virtex-II Pro, Spartan™-3, and Spartan-3E devices
• List the new and enhanced features of the new Virtex-4 device family

Outline

• Overview
• Slice Resources
• I/O Resources
• Memory and Clocking
• Spartan-3, Spartan-3E,
and Virtex-II Pro
Features
• Virtex-4 Features
• Summary
• Appendix

Overview
• All Xilinx FPGAs contain the same basic
resources
• Slices (grouped into CLBs)
• Contain combinatorial logic and register resources
• IOBs
• Interface between the FPGA and the outside world
• Programmable interconnect
• Other resources
• Memory
• Multipliers
• Global clock buffers
• Boundary scan logic

Virtex-II Architecture

[Figure: Virtex-II device floorplan with these labeled resources]
• Block SelectRAM™ resource
• I/O Blocks (IOBs)
• Programmable interconnect
• Dedicated multipliers
• Configurable Logic Blocks (CLBs)
• Clock Management (DCMs, BUFGMUXes)

• The Virtex™-II architecture’s core voltage operates at 1.5V

Slices and CLBs

• Each Virtex-II CLB contains four slices
• Local routing provides feedback between slices in the same CLB, and it provides routing to neighboring CLBs
• A switch matrix provides access to general routing resources

[Figure: one CLB containing Slices S0 to S3 beside a Switch Matrix, with Local Routing, SHIFT, two BUFTs, and CIN/COUT carry connections]

Simplified Slice Structure

• Each slice has four outputs
  • Two registered outputs, two non-registered outputs
• Two BUFTs associated with each CLB, accessible by all 16 CLB outputs
• Carry logic runs vertically, up only
  • Two independent carry chains per CLB

[Figure: Slice 0 drawn as two stages, each a LUT followed by Carry logic and a register with D, CE, PRE, CLR, and Q pins]

Detailed Slice Structure
• The next few slides discuss the slice features
  • LUTs
  • MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
  • Carry Logic
  • MULT_ANDs
  • Sequential Elements

Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
  • Also called Function Generators (FGs)
  • Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant

[Figure: a block of Combinatorial Logic with inputs A, B, C, D and output Z, and its truth table]

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1
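
The point that LUT capacity depends on the number of inputs rather than on logic complexity can be sketched in Python. This is a minimal model, assuming a 16-bit table with input A as the most significant address bit; that bit ordering and the class name `Lut4` are illustrative choices, not the Xilinx INIT convention. The two example functions are hypothetical.

```python
class Lut4:
    """A 4-input look-up table: 16 stored bits addressed by A, B, C, D."""

    def __init__(self, init):
        self.init = init & 0xFFFF  # the 16-entry truth table

    @classmethod
    def from_function(cls, fn):
        """Fill the table by evaluating fn for every input combination."""
        init = 0
        for addr in range(16):
            a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
            if fn(a, b, c, d):
                init |= 1 << addr
        return cls(init)

    def read(self, a, b, c, d):
        """Look up Z: one table read, whatever function is stored."""
        addr = (a << 3) | (b << 2) | (c << 1) | d
        return (self.init >> addr) & 1


def simple(a, b, c, d):
    return a & b


def tangled(a, b, c, d):
    return ((a ^ b) & (c | d)) ^ (a & c & (1 - d))


lut_simple = Lut4.from_function(simple)
lut_tangled = Lut4.from_function(tangled)
print(hex(lut_simple.init))  # 0xf000
```

Both functions occupy exactly one table of 16 bits, and `read` does the same single lookup for each, which is why the delay through a LUT does not depend on what it implements.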
Connecting Look-Up Tables

• MUXF5 combines LUTs in each slice
• MUXF6 combines slices S0 and S1
• MUXF6 combines slices S2 and S3
• MUXF7 combines the two MUXF6 outputs
• MUXF8 combines the two MUXF7 outputs (from the CLB above or below)

[Figure: one CLB with Slices S0 to S3, an F5 MUX in each slice, and the F6, F7, and F8 MUXes stacked between slices]
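
One way to see why a MUXF5 lets two 4-input LUTs act as a single 5-input function is Shannon expansion: store the function for E = 0 in one LUT, the function for E = 1 in the other, and let E drive the multiplexer select. The sketch below is a behavioral model under that assumption; the helper names and the bit ordering of the tables are hypothetical.

```python
def lut4(init, a, b, c, d):
    """Read one bit of a 16-bit truth table (A is the top address bit)."""
    return (init >> ((a << 3) | (b << 2) | (c << 1) | d)) & 1


def muxf5(lut_e0, lut_e1, sel):
    """Two-to-one multiplexer selecting between two LUT outputs."""
    return lut_e1 if sel else lut_e0


def build_inits(fn5):
    """Split a 5-input function into two 16-bit tables, for E=0 and E=1."""
    inits = []
    for e in (0, 1):
        init = 0
        for addr in range(16):
            a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
            if fn5(a, b, c, d, e):
                init |= 1 << addr
        inits.append(init)
    return inits


def parity5(a, b, c, d, e):
    return a ^ b ^ c ^ d ^ e


INIT_E0, INIT_E1 = build_inits(parity5)


def eval5(a, b, c, d, e):
    return muxf5(lut4(INIT_E0, a, b, c, d), lut4(INIT_E1, a, b, c, d), e)


print(eval5(1, 0, 1, 1, 0), eval5(1, 0, 1, 1, 1))
```

Applying the same idea again, with a further multiplexer per added input, is how the F6, F7, and F8 stages extend the function width.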

Fast Carry Logic

• Simple, fast, and complete arithmetic logic
  • Dedicated XOR gate for single-level sum completion
  • Uses dedicated routing resources
  • All synthesis tools can infer carry logic

[Figure: one CLB with the First Carry Chain and Second Carry Chain running from CIN to COUT through Slices S0 to S3 and continuing into the next CLB]
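
A minimal Python sketch of how a carry chain of this kind computes an addition is given below. It assumes the conventional arrangement in which the LUT produces a propagate signal, a carry multiplexer chooses between the incoming carry and one operand bit, and the dedicated XOR gate forms the sum bit; the function name is hypothetical and the model ignores timing.

```python
def carry_chain_add(x, y, width, cin=0):
    """Add two unsigned values bit by bit, one chain stage per bit."""
    carry, total = cin, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        propagate = xi ^ yi                  # computed in the LUT
        total |= (propagate ^ carry) << i    # dedicated XOR gate forms the sum bit
        carry = carry if propagate else xi   # carry multiplexer
    return total, carry


print(carry_chain_add(5, 3, 4))   # (8, 0)
print(carry_chain_add(15, 1, 4))  # (0, 1)
```

Each stage needs only the propagate bit from its LUT, so the carry itself never passes through general routing, which is where the speed advantage comes from.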

MULT_AND Gate
• Highly efficient multiply and add implementation
  • Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
  • The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit

[Figure: one bit of A x B built from a LUT with inputs A and B and output S, the MULT_AND gate, CY_MUX (inputs DI and CI, output CO), and CY_XOR]
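
The following Python sketch models one plausible reading of the multiply-and-add cell: the LUT computes (A AND B) XOR P for a running partial sum bit P, the MULT_AND gate supplies A AND B to the carry multiplexer, and the carry XOR produces the sum bit. The exact wiring is an assumption based on the conventional carry structure, and `mult_and_row` and `multiply` are hypothetical names.

```python
def mult_and_row(a, b_bit, p, width):
    """Return p + (a if b_bit else 0), using one LUT per bit."""
    carry, result = 0, 0
    for i in range(width):
        a_i, p_i = (a >> i) & 1, (p >> i) & 1
        di = a_i & b_bit               # MULT_AND gate
        s = (a_i & b_bit) ^ p_i        # LUT: multiply (AND) and add (XOR)
        result |= (s ^ carry) << i     # CY_XOR
        carry = carry if s else di     # CY_MUX
    return result | (carry << width)   # final carry becomes the top bit


def multiply(a, b, width):
    """Shift-and-add multiplier built from rows of mult_and_row."""
    acc = 0
    for j in range(width):
        low = acc & ((1 << j) - 1)     # low bits are already final
        row = mult_and_row(a, (b >> j) & 1, acc >> j, width)
        acc = (row << j) | low
    return acc


print(multiply(13, 11, 4))  # 143
```

Without the MULT_AND gate, the AND needed on the carry multiplexer's data input would have to come from a second LUT, which is the two-LUTs-per-bit cost mentioned above.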

Flexible Sequential Elements
• Either flip-flops or latches
• Two in each slice; eight in each CLB
• Inputs come from LUTs or from an independent CLB input
• Separate set and reset controls
  • Can be synchronous or asynchronous
• All controls are shared within a slice
  • Control signals can be inverted locally within a slice

[Figure: register primitives FDRSE_1 (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q), and LDCPE (D, PRE, CE, G, CLR, Q)]

Shift Register LUT (SRL16CE)

• Dynamically addressable serial shift registers
  • Maximum delay of 16 clock cycles per LUT (128 per CLB)
• Cascadable to other LUTs or CLBs for longer shift registers
  • Dedicated connection from Q15 to D input of the next SRL16CE
• Shift register length can be changed asynchronously by toggling address A

[Figure: SRL16CE drawn as a chain of D/CE/Q stages inside a LUT, with inputs D, CE, CLK, and A[3:0] and outputs Q and Q15 (cascade out)]
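
A behavioral Python sketch of an addressable 16-stage shift register is shown below. It is a simplified model, assuming that address value `a` selects the sample taken `a + 1` enabled clocks ago; the class and method names are hypothetical and do not correspond to a vendor simulation library.

```python
class Srl16:
    """Behavioral model of a 16-deep shift register with an address tap."""

    def __init__(self):
        self.bits = [0] * 16  # bits[0] holds the newest sample

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Addressable output: read-only, so addr can change at any time."""
        return self.bits[addr & 0xF]

    @property
    def q15(self):
        """Cascade output, always the last stage."""
        return self.bits[15]


srl = Srl16()
srl.clock(1)             # shift in a single 1 ...
for _ in range(3):
    srl.clock(0)         # ... followed by zeros
print(srl.q(3), srl.q(2))  # 1 0
```

Changing the address only changes which stored bit is read, which is why the effective length can be altered without a clock edge.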
Shift Register LUT Example
• The SRL can be used to create a No Operation (NOP)
  • This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and associated routing and delays

[Figure: two 64-bit paths, each 12 cycles long. Operation A (4 cycles) feeds Operation B (8 cycles); Operation C (3 cycles) feeds Operation D - NOP (9 cycles). Paths are statically balanced.]
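
The resource figures in this example can be checked with a few lines of arithmetic. The sketch below assumes a 64-bit datapath and a 9-cycle NOP as drawn in the figure, eight flip-flops per CLB, and eight LUTs per CLB (consistent with the 128-cycle-per-CLB figure for the SRL16); the constant names are illustrative.

```python
BUS_WIDTH = 64      # bits in the datapath
NOP_CYCLES = 9      # delay of Operation D
SRL_DEPTH = 16      # maximum delay per LUT used as a shift register
LUTS_PER_CLB = 8    # assumption: 128 cycles per CLB / 16 cycles per LUT
FFS_PER_CLB = 8     # two flip-flops per slice, four slices per CLB

assert NOP_CYCLES <= SRL_DEPTH  # one SRL16 per bit is deep enough

flip_flops = BUS_WIDTH * NOP_CYCLES       # flip-flop implementation
ff_clbs = flip_flops // FFS_PER_CLB
srl_luts = BUS_WIDTH                      # SRL implementation
srl_clbs = srl_luts // LUTS_PER_CLB

print(flip_flops, ff_clbs, srl_luts, srl_clbs)  # 576 72 64 8
```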

IOB Element
• Input path
  • Two DDR registers
• Output path
  • Two DDR registers
  • Two 3-state enable DDR registers
• Separate clocks and clock enables for I and O
• Set and reset signals are shared

[Figure: IOB with input registers clocked by ICK1 and ICK2, output and 3-state registers clocked by OCK1 and OCK2, DDR MUXes, and the PAD]

SelectIO Standard
• Allows direct connections to external signals of varied
voltages and thresholds
• Optimizes the speed/noise tradeoff
• Saves having to place interface components onto your board
• Differential signaling standards
• LVDS, BLVDS, ULVDS
• LDT
• LVPECL
• Single-ended I/O standards
• LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
• PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
• GTL, GTLP
• and more!

Digitally Controlled Impedance (DCI)
• DCI provides
• Output drivers that match the impedance of the traces
• On-chip termination for receivers and transmitters
• DCI advantages
• Improves signal integrity by eliminating stub reflections
• Reduces board routing complexity and component count by
eliminating external resistors
• Eliminates the effects of temperature, voltage, and process
variations by using an internal feedback circuit

Other Virtex-II Features
• Distributed RAM and block RAM
• Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
• Block RAM is a dedicated resource on the device (18-kb blocks)
• Dedicated 18 x 18 multipliers next to block RAMs
• Clock management resources
• Sixteen dedicated global clock multiplexers
• Digital Clock Managers (DCMs)

Distributed SelectRAM Resources

• Uses a LUT in a slice as memory
• Synchronous write
• Asynchronous read
  • Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
  • Data can be written to RAM after configuration
• Emulated dual-port RAM
  • One read/write port
  • One read-only port

[Figure: distributed RAM primitives built from slice LUTs: RAM16X1S (D, WE, WCLK, A0 to A3, O), RAM32X1S (D, WE, WCLK, A0 to A4, O), and RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)]
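
The write and read behavior listed above can be sketched as a small Python model of a 16 x 1 dual-port distributed RAM. It is a behavioral approximation only: one method stands in for the rising write-clock edge, and reads are plain lookups to reflect that they are asynchronous. The class and method names are hypothetical, loosely following the port names in the figure.

```python
class Ram16x1d:
    """16 x 1 distributed RAM: one read/write port, one read-only port."""

    def __init__(self, init=0):
        # Contents are set at configuration time from a 16-bit value.
        self.mem = [(init >> i) & 1 for i in range(16)]

    def clock(self, we, a, d):
        """Rising WCLK edge: synchronous write of d at address a."""
        if we:
            self.mem[a & 0xF] = d & 1

    def spo(self, a):
        """Asynchronous read at the read/write port address."""
        return self.mem[a & 0xF]

    def dpo(self, dpra):
        """Asynchronous read at the independent read-only port address."""
        return self.mem[dpra & 0xF]


ram = Ram16x1d(init=0x0001)   # bit 0 preloaded, as a ROM-style initial value
ram.clock(we=1, a=5, d=1)     # write after configuration
print(ram.dpo(0), ram.dpo(5), ram.spo(5))  # 1 1 1
```

Registering the value returned by `spo` or `dpo` on the next clock would give the synchronous read mentioned in the list.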
Block SelectRAM Resources
• Up to 3.5 Mb of RAM in 18-kb blocks
• Synchronous read and write
• True dual-port memory
  • Each port has synchronous read and write capability
  • Different clocks for each port
• Supports initial values
• Synchronous reset on output latches
• Supports parity bits
  • One parity bit per eight data bits

[Figure: 18-kb block SelectRAM memory with port A signals DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA and port B signals DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB]
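
As a quick worked example of the parity rule, the following Python lines split one 18-kb block into data and parity bits. The sketch assumes that "18 kb" means 18 x 1024 bits and that the figure includes the parity bits; both are assumptions rather than statements from the slide.

```python
BLOCK_BITS = 18 * 1024        # one 18-kb block (assumed to include parity)
DATA_BITS_PER_PARITY = 8      # one parity bit per eight data bits

# Each group of nine stored bits holds eight data bits and one parity bit.
groups = BLOCK_BITS // (DATA_BITS_PER_PARITY + 1)
data_bits = groups * DATA_BITS_PER_PARITY
parity_bits = groups

print(data_bits, parity_bits)  # 16384 2048
```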
Dedicated Multiplier Blocks
• 18-bit twos complement signed operation
• Optimized to implement Multiply and Accumulate functions
• Multipliers are physically located next to block SelectRAM™ memory

[Figure: 18 x 18 Multiplier with inputs Data_A (18 bits) and Data_B (18 bits) and Output (36 bits); operand sizes shown: 4 x 4 signed, 8 x 8 signed, 12 x 12 signed, 18 x 18 signed]
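
To illustrate what an 18-bit twos complement signed multiply with a 36-bit result means, here is a small Python model. It treats the inputs as raw 18-bit patterns and returns a raw 36-bit pattern; the function names are hypothetical and the model says nothing about pipelining or timing.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18x18(data_a, data_b):
    """Multiply two 18-bit signed patterns; return the 36-bit result pattern."""
    product = to_signed(data_a, 18) * to_signed(data_b, 18)
    return product & ((1 << 36) - 1)


minus_one = 0x3FFFF  # all 18 bits set
print(to_signed(mult18x18(minus_one, 2), 36))  # -2
```

The 36-bit output is wide enough for every case: the largest magnitude product, from multiplying the most negative 18-bit value by itself, is 2^34.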

Global Clock Routing Resources
• Sixteen dedicated global clock multiplexers
• Eight on the top-center of the die, eight on the bottom-center
• Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
• Traditional clock buffer (BUFG) function
• Global clock enable capability (BUFGCE)
• Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region of
the device
• Each device contains four or more clock regions

Digital Clock Manager (DCM)
• Up to twelve DCMs per device
• Located on the top and bottom edges of the die
• Driven by clock input pads
• DCMs provide the following:
• Delay-Locked Loop (DLL)
• Digital Frequency Synthesizer (DFS)
• Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global
clock buffers
• All DCM outputs can drive general routing

Spartan-3 versus Virtex-II
• Lower cost
• Smaller process = lower core voltage
  • .09 micron versus .15 micron
  • Vccint = 1.2V versus 1.5V
• Different I/O standard support
  • New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
  • Default is LVCMOS, versus LVTTL
• More I/O pins per package
• Only one-half of the slices support RAM or SRL16s (SLICEM)
• Fewer block RAMs and multiplier blocks
  • Same size and functionality
• Eight global clock multiplexers
• Two or four DCM blocks
• No internal 3-state buffers
  • 3-state buffers are in the I/O

SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
  • Similar to the Virtex™-II
• Slices are grouped in pairs
  • Left-hand SLICEM (Memory)
    • LUTs can be configured as memory or SRL16
  • Right-hand SLICEL (Logic)
    • LUT can be used as logic only

[Figure: Spartan-3 CLB with left-hand SLICEM slices X0Y0 and X0Y1 and right-hand SLICEL slices X1Y0 and X1Y1, a Switch Matrix, Fast Connects, SHIFTIN/SHIFTOUT, and CIN/COUT]

Spartan-3E Features
• More gates per I/O than Spartan-3
• Removed some I/O standards
  • Higher-drive LVCMOS
  • GTL, GTLP
  • SSTL2_II
  • HSTL_II_18, HSTL_I, HSTL_III
  • LVDS_EXT, ULVDS
• DDR Cascade
  • Internal data is presented on a single clock edge
• 16 BUFGMUXes on left and right sides
  • Drive half the chip only
  • In addition to eight global clocks
• Pipelined multipliers
• Additional configuration modes
  • SPI, BPI
  • Multi-Boot mode

Virtex-II Pro Features
• 0.13 micron process
• Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT)
blocks
• Serializer and deserializer (SERDES)
• Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant
transceivers, and others
• 8-, 16-, and 32-bit selectable FPGA interface
• 8B/10B encoder and decoder
• PowerPC™ RISC processor blocks
• Thirty-two 32-bit General Purpose Registers (GPRs)
• Low power consumption: 0.9mW/MHz
• IBM CoreConnect bus architecture support

Virtex-4 Architecture Has the Most Advanced Feature Set
• RocketIO™ Multi-Gigabit Transceivers: 622 Mbps–10.3 Gbps
• Smart RAM: New block RAM/FIFO
• Xesium Clocking Technology: 500 MHz
• Advanced CLBs: 200K Logic Cells
• Tri-Mode Ethernet MAC: 10/100/1000 Mbps
• XtremeDSP™ Technology Slices: 256 18x18 GMACs
• 1 Gbps SelectIO™: ChipSync™ Source synch, XCITE Active Termination
• PowerPC™ 405 with APU Interface: 450 MHz, 680 DMIPS

Choose the Platform that Best Fits the Application

Resource       LX              FX               SX
Logic          14K–200K LCs    12K–140K LCs     23K–55K LCs
Memory         0.9–6 Mb        0.6–10 Mb        2.3–5.7 Mb
DCMs           4–12            4–20             4–8
DSP Slices     32–96           32–192           128–512
SelectIO       240–960         240–896          320–640
RocketIO       N/A             0–24 Channels    N/A
PowerPC        N/A             1 or 2 Cores     N/A
Ethernet MAC   N/A             2 or 4 Cores     N/A

Xilinx Tool Flow

This material exempt per Department of Commerce license exception TSU
George Mason University

Objectives
After completing this module, you will be able to:
• List the steps of the Xilinx design process
• Implement and simulate an FPGA design by using default software options

Outline

• Overview
• ISE
• Summary
• Lab 1: Xilinx Tool Flow
Demo

Xilinx Design Flow

[Figure: design flow diagram with these stages]
• Plan & Budget
• Create Code/Schematic
• HDL RTL Simulation
• Synthesize to create netlist
• Functional Simulation
• Implement
  • Translate
  • Map
  • Place & Route
• Attain Timing Closure
• Timing Simulation
• Create BIT File

See Development System Reference Guide for Flow Diagrams

Design Entry Methods: HDL or Schematic
• Plan and budget
• Whichever method you use, you will need a tool to generate an EDIF or NGC netlist to bring into the Xilinx implementation tools
  • Popular synthesis tools include: Synplify, Precision, FPGA Compiler II, and XST
• Tools available to assist in design entry
  • Architecture Wizard, CORE Generator™ system, and StateCAD tools
• Simulate the design to ensure that it works as expected!

[Figure: front end of the design flow: Plan & Budget, Create Code/Schematic, HDL RTL Simulation, Synthesize to create netlist, Functional Simulation]

Xilinx Implementation
• Once you generate a netlist, you can implement the design
• There are several outputs of implementation
  • Reports
  • Timing simulation netlists
  • Floorplan files
  • FPGA Editor files
  • and more!

[Figure: the Implement stage of the design flow: Translate, Map, Place & Route]

What is Implementation?
• More than just Place & Route
• Implementation includes many phases
• Translate: Merge multiple design files into a single netlist
• Map: Group logical symbols from the netlist (gates) into physical
components (slices and IOBs)
• Place & Route: Place components onto the chip, connect the
components, and extract timing data into reports
• Each phase generates files that allow you to use other
Xilinx tools
• Floorplanner, FPGA Editor, XPower
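
The phase ordering above can be sketched as a small Python data structure in which each phase consumes the file produced by the previous one. The file suffixes are the ones commonly associated with the ISE flow and are an assumption here, as are the function name and the example file name; the sketch only plans the steps and does not invoke any tool.

```python
# Phase name, purpose (paraphrased from the list above), output file suffix.
# The suffixes are assumed typical ISE outputs, not taken from the slides.
PHASES = [
    ("Translate", "merge design files into a single netlist", ".ngd"),
    ("Map", "group gates into slices and IOBs", "_map.ncd"),
    ("Place & Route", "place and connect the components", ".ncd"),
]


def plan_implementation(netlist):
    """Return (phase, input_file, output_file) tuples in run order."""
    stem = netlist.rsplit(".", 1)[0]
    steps, current = [], netlist
    for name, _purpose, suffix in PHASES:
        output = stem + suffix
        steps.append((name, current, output))
        current = output  # the next phase reads what this one wrote
    return steps


for step in plan_implementation("top.ngc"):
    print(step)
```

Because every phase leaves a file behind, tools such as the Floorplanner or FPGA Editor can be opened on an intermediate result without rerunning the whole flow.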

Timing Closure

Download
• Once a design is implemented, you must create a file that
the FPGA can understand
• This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly into the FPGA, or
the BIT file can be converted into a PROM file, which
stores the programming information

ISE Project Navigator

• Built around the Xilinx design flow
  • Access to synthesis and schematic tools
    • Including third-party synthesis tools
  • Implement your design with a simple double-click
    • Fine-tune with easy-to-access software options

Implementing a Design
• Implement a design:
  • Select the top-level source file in the Sources in Project window
    • HDL, schematic, or EDIF, depending on your design flow
  • Double-click Implement Design in the Processes for Source window

Checking the Implementation Status
• The ISE™ software will run all of the necessary steps to implement the design
  • Synthesize HDL code
  • Translate
  • Map
  • Place & Route
• Status icons
  • ✓ = process was completed successfully
  • ! = warnings
  • ? = a file that is out of date
  • X = errors

Simulating a Design
• Simulate a design:
  • Select Sources for: Behavioral Simulation
  • Expand Xilinx ISE Simulator in the Processes for Source window
  • Double-click Simulate Behavioral Model or Simulate Post-Place & Route Model
    • You can also simulate after Translate or after Map

Viewing Subprocesses

• Expand each process to view subtools and subprocesses
  • Translate
    • Floorplan
    • Assign package pins
  • Map
    • Analyze timing
  • Place & Route
    • Analyze timing
    • Floorplan
    • FPGA Editor
    • Analyze power
    • Create simulation model

The Design Summary Displays Design Data

• Quick View of Reports, Constraints
• Project Status
• Device Utilization
• Design Summary Options
• Performance and Constraints
• Reports

Programming the FPGA
• There are two ways to program an FPGA
  • Through a PROM device
    • You must generate a file that the PROM programmer can understand
  • Directly from the computer
    • Use the iMPACT configuration tool
