VIT_wksp
FPGA
PLD FPGA
Sandeepani 2
Two competing implementation approaches
ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout
FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device
What are FPGAs?
What is an FPGA?
[Figure: FPGA floorplan showing Configurable Logic Blocks, I/O Blocks, and Block RAMs]
Which Way to Go?
ASICs
• High performance
• Low power
• Low cost in high volumes
FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
Other FPGA Advantages
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor
Xilinx Worldwide Presence
Headquarters
R&D
Sales, Marketing, Support
Manufacturing (Fab, Assy, Test)
RocketLabs
Redefining the PLD Market Landscape
• Inventor of the FPGA
• First to immerse the 300-MHz embedded PowerPC™
• First FPGA to 90-nm process technology
• Largest fabless semiconductor company
• Leader in multi-gigabit I/O interfaces
• Fastest integrated FPGA design software
• First to introduce an all-digital CPLD family
Xilinx Products for 2005/6
Virtex-4 FPGAs
• Platform Design
• Memory Rich
• High Performance
• Embedded PowerPC™
• 11.1 GHz I/O
Spartan FPGAs
• SRAM Based
• Feature Rich
• Low Cost
CPLDs
• Low Cost
• Low Power
• High Performance
Other Current Xilinx Products
Xilinx Services
• Training products
• Instructor-led classes
• Live e-learning
• Design services
• Xperts third-party design services are also available
Outline
• Introduction
• CPLDs & FPGAs
• Summary
• Appendix
Which Xilinx Product Should You Use?
• CPLD versus FPGA?
• Which CPLD family?
• What device, package, speed, etc.?
CPLD Basics Review:
What is a CPLD?
CPLD Architecture
[Figure: CPLD block diagram. Function Blocks, each a Logic Block with macrocells MC0 to MCn, sit between the I/O Blocks and a central Interconnect Array]
• I/O – Input Output block. The gateway between the CPLD fabric and the outside world
• MC – Macrocell. The part of the CPLD architecture containing the register
• Logic Block – Where Product Terms are built
• Product Term – A single AND of input signals (or their complements); product terms are ORed together to form a logic function
• Interconnect Array – Connection between the I/O, MC, and Logic Blocks
• Function Block – The name for a Logic Block and its associated Macrocells
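One way to read the definitions above is as a sum of products: each product term ANDs a set of literals, and several product terms are ORed before reaching the macrocell. The following is a minimal sketch of that idea only. The function `Z = (A AND B) OR (NOT A AND C)` and all names are hypothetical and chosen for illustration; they do not come from any Xilinx tool.

```python
from itertools import product

# A product term is modeled as a dict mapping an input name to the value
# it must have (True = uncomplemented literal, False = complemented).
def product_term(term, inputs):
    """AND of the literals listed in `term`."""
    return all(inputs[name] == wanted for name, wanted in term.items())

def sum_of_products(terms, inputs):
    """OR of several product terms, as formed ahead of a macrocell."""
    return any(product_term(t, inputs) for t in terms)

# Hypothetical function: Z = (A AND B) OR (NOT A AND C)
TERMS = [{"A": True, "B": True}, {"A": False, "C": True}]

def reference(a, b, c):
    """Direct Boolean expression used to check the product-term form."""
    return (a and b) or ((not a) and c)

# Evaluate the product-term form for every input combination.
results = {
    (a, b, c): sum_of_products(TERMS, {"A": a, "B": b, "C": c})
    for a, b, c in product([False, True], repeat=3)
}
```

Adding inputs to a product term does not add logic levels in this model, which is one way to picture the wide fan-in of a product term array.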
CPLD Definitions:
High Performance
CPLDs versus FPGAs
• CPLD architecture
– Product term array
– Interconnect array
– Wide fanin
– Deterministic timing
– Low standby power
[Figure: CPLD block diagram with I/O Blocks, Logic Blocks with macrocells MC0 to MCn, and the Interconnect Array]
CPLD or FPGA?
• CPLD
– Nonvolatile
– Consistent pin-to-pin timing
– Simple timing model
– Very low power consumption
– Lowest cost point
– Fast internal performance
• FPGA
– Volatile
– Need memory device to load design at power up
– Complex timing model
– Larger, more complex designs
– Memory resources
– Applications: PCI, high-speed serial communication, embedded processors
Basic FPGA Architecture
Outline
• Overview
• Slice Resources
• I/O Resources
• Memory and Clocking
• Spartan-3, Spartan-3E, and Virtex-II Pro Features
• Virtex-4 Features
• Summary
• Appendix
Overview
• All Xilinx FPGAs contain the same basic resources
• Slices (grouped into CLBs)
• Contain combinatorial logic and register resources
• IOBs
• Interface between the FPGA and the outside world
• Programmable interconnect
• Other resources
• Memory
• Multipliers
• Global clock buffers
• Boundary scan logic
Virtex-II Architecture
[Figure: Virtex-II floorplan. Labeled resources: programmable interconnect, dedicated multipliers, Configurable Logic Blocks (CLBs), and clock management (DCMs, BUFGMUXes)]
• Virtex™-II architecture’s core voltage operates at 1.5V
Slices and CLBs
• … neighboring CLBs
• A switch matrix provides access to general routing resources
[Figure: CLB showing Slice S0 and Slice S1, local routing, and carry inputs (CIN)]
Simplified Slice Structure
• Two independent carry chains per CLB
Detailed Slice Structure
• The next few slides discuss the slice features
– LUTs
– MUXF5, MUXF6, MUXF7, MUXF8 (only the F5 and F6 MUX are shown in this diagram)
– Carry Logic
– MULT_ANDs
– Sequential Elements
Look-Up Tables
• Combinatorial logic is stored in Look-Up Tables (LUTs)
• Also called Function Generators (FGs)
• Capacity is limited by the number of inputs, not by the complexity
• Delay through the LUT is constant

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 1
0 1 0 0 | 1
0 1 0 1 | 1
. . .
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | 1

[Figure: combinatorial logic block with inputs A, B, C, D and output Z]
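The points above (capacity set by input count, constant delay) follow from the LUT being a small memory indexed by its inputs. A minimal sketch of a 4-input LUT is given below. The class name `LUT4` and the function `example_func` are hypothetical; the function is not the one in the slide's truth table, whose middle rows are elided.

```python
class LUT4:
    """4-input look-up table: 16 stored bits, one per input combination."""

    def __init__(self, func):
        # Precompute the truth table once, like loading configuration bits.
        self.bits = [
            int(bool(func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)))
            for i in range(16)
        ]

    def read(self, a, b, c, d):
        # Evaluation is one indexed read, whatever the stored function is.
        return self.bits[(a << 3) | (b << 2) | (c << 1) | d]


# Hypothetical function, chosen only for illustration.
def example_func(a, b, c, d):
    return (a & b) ^ (c | d)


lut = LUT4(example_func)
```

Any function of four inputs fits in the same 16 bits, so a complicated expression costs no more than a simple one as long as it uses at most four inputs.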
Connecting Look-Up Tables
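[Figure: dedicated multiplexers in a CLB with slices S0 to S3. Each slice has an F5 MUX; MUXF6 combines slices S2 and S3; the remaining labels refer to "MUXF6 outputs" and "MUXF7 outputs (from the CLB above or below)" feeding the wider multiplexers up to F8]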
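The multiplexers on this slide let several LUTs act as one wider function. A minimal sketch of the first step, assuming an F5 multiplexer that selects between two 4-input LUTs using a fifth input (a Shannon expansion on that input). The helper names and the 5-input function `f5` are hypothetical.

```python
def make_lut4(func):
    """Return the 16 stored bits of a 4-input LUT implementing `func`."""
    return [
        int(bool(func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1)))
        for i in range(16)
    ]


def read_lut4(bits, a, b, c, d):
    """Look up one stored bit using the four inputs as an address."""
    return bits[(a << 3) | (b << 2) | (c << 1) | d]


def f5(a, b, c, d, e):
    """Hypothetical 5-input function to be split across two LUTs."""
    return (a & b & c) | (d ^ e)


# One LUT holds the function with e fixed to 0, the other with e fixed to 1.
lut_e0 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 0))
lut_e1 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 1))


def muxf5(sel, in0, in1):
    """2:1 multiplexer standing in for the F5 MUX."""
    return in1 if sel else in0


def f5_from_luts(a, b, c, d, e):
    """Rebuild the 5-input function from two LUT reads and one MUX."""
    return muxf5(e, read_lut4(lut_e0, a, b, c, d), read_lut4(lut_e1, a, b, c, d))
```

Repeating the same step with wider multiplexers doubles the number of LUTs combined at each level.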
Fast Carry Logic
MULT_AND Gate
• Highly efficient multiply and add implementation
• Earlier FPGA architectures require two LUTs per bit to perform the multiplication and addition
• The MULT_AND gate enables an area reduction by performing the multiply and the add in one LUT per bit
[Figure: slice carry logic with a LUT (inputs A and B), the MULT_AND gate forming A x B on the DI input, CY_MUX and CY_XOR, carry input CI, and outputs S and CO]
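A minimal behavioral sketch of one bit of this structure, under the assumption that the LUT output acts as the carry "propagate" signal, MULT_AND supplies the one-bit product on DI, CY_MUX picks the carry out, and CY_XOR forms the sum. The function names and the shift-and-add loop around the cell are hypothetical and only show that one such cell per bit is enough to multiply and accumulate.

```python
def mult_and_cell(a_bit, b_bit, s_bit, carry_in):
    """One bit: add the product a_bit*b_bit to s_bit with carry."""
    product = a_bit & b_bit              # MULT_AND output, drives DI
    propagate = product ^ s_bit          # LUT output
    carry_out = carry_in if propagate else product   # CY_MUX
    sum_bit = propagate ^ carry_in       # CY_XOR
    return sum_bit, carry_out


def add_partial_product(a, b_bit, acc, width):
    """Return acc + (a AND b_bit) using one cell per bit of `a`."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = mult_and_cell((a >> i) & 1, b_bit, (acc >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)


def multiply(a, b, width):
    """Shift-and-add multiply of two unsigned `width`-bit numbers."""
    acc = 0
    for j in range(width):
        # Row j is added at bit offset j; the lower j bits are already final.
        upper = add_partial_product(a, (b >> j) & 1, acc >> j, width)
        acc = (upper << j) | (acc & ((1 << j) - 1))
    return acc
```

Without the dedicated AND gate, the product and the addition would each need their own logic, which is the two-LUTs-per-bit cost the slide mentions.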
Flexible Sequential Elements
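• Either flip-flops or latches
• Two in each slice; eight in each CLB
• … input
• … asynchronous
• All controls are shared within a slice
• Control signals can be inverted locally within a slice
[Figure: sequential element symbols, including FDRSE_1 (pins D, S, CE, Q) and elements with D, PRE, CE, CLR, and Q pins, one of them with a gate input G]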
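A minimal behavioral sketch of a flip-flop with synchronous reset, set, and clock enable, in the style of the FDRSE symbol named on this slide. The priority used here (reset over set over clock enable) and the choice to ignore which clock edge is active are assumptions of this sketch; the primitive's documentation should be checked for the exact behavior.

```python
class FDRSE:
    """D flip-flop with synchronous reset (R), set (S), and clock enable (CE)."""

    def __init__(self, init=0):
        self.q = init

    def clock(self, d, ce=1, r=0, s=0):
        """Apply one active clock edge and return the new Q."""
        if r:            # assumed highest priority
            self.q = 0
        elif s:          # assumed next priority
            self.q = 1
        elif ce:         # data is loaded only when enabled
            self.q = d
        return self.q    # otherwise Q holds its value
```

A short usage example: `ff = FDRSE(); ff.clock(d=1)` loads a 1, and a following `ff.clock(d=0, ce=0)` leaves Q unchanged because the clock enable is low.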
Shift Register LUT (SRL16CE)
• Dynamically addressable serial shift registers
• Maximum delay of 16 clock cycles per LUT (128 per CLB)
• Shift register length can be changed asynchronously by toggling address A
[Figure: SRL16CE built from a LUT, with inputs D, CE, CLK, and address A[3:0], output Q, and Q15 (cascade out)]
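A minimal behavioral sketch of an addressable 16-bit shift register, assuming that address A selects the bit that was shifted in A + 1 clock cycles earlier and that Q15 always exposes the last stage. The class and helper names are hypothetical.

```python
class SRL16:
    """16-stage shift register whose output tap is chosen by an address."""

    def __init__(self):
        self.bits = [0] * 16             # bits[0] is the newest sample

    def clock(self, d, ce=1):
        """Shift in one bit on a clock edge when the clock enable is high."""
        if ce:
            self.bits = [d] + self.bits[:15]

    def read(self, addr):
        """Combinational read of the tap selected by the 4-bit address."""
        return self.bits[addr]

    @property
    def q15(self):
        """Last stage, usable as a cascade output."""
        return self.bits[15]


def delay_line(stream, addr):
    """Return the tap output seen in each cycle while `stream` is shifted in."""
    srl = SRL16()
    out = []
    for bit in stream:
        out.append(srl.read(addr))       # value visible before this edge
        srl.clock(bit)
    return out
```

With `addr = 8`, `delay_line` returns the input stream delayed by nine cycles; changing `addr` changes the delay without touching the stored data.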
Shift Register LUT Example
• The SRL can be used to create a No Operation (NOP)
• This example uses 64 LUTs (8 CLBs) to replace 576 flip-flops (72 CLBs) and associated routing and delays
[Figure: two 64-bit paths, each 12 cycles long and statically balanced. One path runs Operation A (4 cycles) then Operation B (8 cycles); the other runs Operation C (3 cycles) then Operation D, a NOP (9 cycles)]
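The resource counts in this example can be checked with a few lines of arithmetic. The sketch below assumes a 64-bit data path and a 9-cycle NOP as in the figure, eight flip-flops per CLB, and eight LUTs per CLB (128 cycles per CLB divided by 16 cycles per LUT). The variable names are hypothetical.

```python
BUS_WIDTH = 64        # bits on each path in the figure
NOP_CYCLES = 9        # delay of Operation D (NOP)
SRL_DEPTH = 16        # maximum delay per LUT
FFS_PER_CLB = 8       # "eight in each CLB"
LUTS_PER_CLB = 8      # 128 cycles per CLB / 16 cycles per LUT

# Flip-flop implementation: one register per bit per cycle of delay.
ff_count = BUS_WIDTH * NOP_CYCLES
ff_clbs = ff_count // FFS_PER_CLB

# SRL implementation: one LUT per bit covers any delay up to 16 cycles.
luts_per_bit = -(-NOP_CYCLES // SRL_DEPTH)      # ceiling division
lut_count = BUS_WIDTH * luts_per_bit
lut_clbs = lut_count // LUTS_PER_CLB

# Both paths in the figure must have the same total latency.
path_ab = 4 + 8
path_cd = 3 + NOP_CYCLES

print(ff_count, ff_clbs, lut_count, lut_clbs, path_ab == path_cd)
```

The result matches the slide: 576 flip-flops in 72 CLBs versus 64 LUTs in 8 CLBs, with both paths balanced at 12 cycles.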
IOB Element
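• Input path
– Two DDR registers
• Output path
– Two DDR registers
– Two 3-state enable DDR registers
• Separate clocks and …
[Figure: IOB with input registers clocked by ICK1 and ICK2, output registers clocked by OCK1 and OCK2 feeding a DDR MUX, and 3-state registers feeding a second DDR MUX]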
SelectIO Standard
• Allows direct connections to external signals of varied voltages and thresholds
• Optimizes the speed/noise tradeoff
• Saves having to place interface components onto your board
• Differential signaling standards
• LVDS, BLVDS, ULVDS
• LDT
• LVPECL
• Single-ended I/O standards
• LVTTL, LVCMOS (3.3V, 2.5V, 1.8V, and 1.5V)
• PCI-X at 133 MHz, PCI (3.3V at 33 MHz and 66 MHz)
• GTL, GTLP
• and more!
Digitally Controlled Impedance (DCI)
• DCI provides
• Output drivers that match the impedance of the traces
• On-chip termination for receivers and transmitters
• DCI advantages
• Improves signal integrity by eliminating stub reflections
• Reduces board routing complexity and component count by eliminating external resistors
• Eliminates the effects of temperature, voltage, and process variations by using an internal feedback circuit
Other Virtex-II Features
• Distributed RAM and block RAM
• Distributed RAM uses the CLB resources (1 LUT = 16 RAM bits)
• Block RAM is a dedicated resource on the device (18-kb blocks)
• Dedicated 18 x 18 multipliers next to block RAMs
• Clock management resources
• Sixteen dedicated global clock multiplexers
• Digital Clock Managers (DCMs)
Distributed SelectRAM Resources
• Asynchronous read
– Accompanying flip-flops can be used to create synchronous read
• RAM and ROM are initialized during configuration
– Data can be written to RAM after configuration
• Emulated dual-port RAM
– One read/write port
– One read-only port
[Figure: RAM32X1S and RAM16X1D symbols built from LUTs in a slice, with pins D, WE, WCLK, address inputs (A0 to A3 shown), dual-port read address inputs (DPRA1 to DPRA3 shown), and outputs O and SPO]
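A minimal behavioral sketch of the emulated dual-port arrangement, modeled on the RAM16X1D symbol in the figure: one read/write port addressed by A and one read-only port addressed by DPRA, both reading asynchronously. The write is assumed to be synchronous to WCLK, and the output name `dpo` for the read-only port is an assumption, since only SPO is legible in the figure.

```python
class RAM16X1D:
    """16 x 1 distributed RAM with one read/write and one read-only port."""

    def __init__(self, init=None):
        # Contents can be preset, mirroring initialization at configuration.
        self.mem = list(init) if init is not None else [0] * 16

    def write_clock(self, we, a, d):
        """One WCLK edge: store bit `d` at address `a` when WE is high."""
        if we:
            self.mem[a] = d

    def spo(self, a):
        """Asynchronous read on the read/write port (address A)."""
        return self.mem[a]

    def dpo(self, dpra):
        """Asynchronous read on the read-only port (address DPRA)."""
        return self.mem[dpra]
```

Registering `spo` or `dpo` in an accompanying flip-flop would give the synchronous read mentioned above.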
Block SelectRAM Resources
• Up to 3.5 Mb of RAM in 18-kb blocks
• Synchronous read and write
• True dual-port memory
– Each port has synchronous read and write capability
[Figure: 18-kb block SelectRAM memory, port A pins DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, and DOPA]
Global Clock Routing Resources
• Sixteen dedicated global clock multiplexers
• Eight on the top-center of the die, eight on the bottom-center
• Driven by a clock input pad, a DCM, or local routing
• Global clock multiplexers provide the following:
• Traditional clock buffer (BUFG) function
• Global clock enable capability (BUFGCE)
• Glitch-free switching between clock signals (BUFGMUX)
• Up to eight clock nets can be used in each clock region of the device
• Each device contains four or more clock regions
Digital Clock Manager (DCM)
• Up to twelve DCMs per device
• Located on the top and bottom edges of the die
• Driven by clock input pads
• DCMs provide the following:
• Delay-Locked Loop (DLL)
• Digital Frequency Synthesizer (DFS)
• Digital Phase Shifter (DPS)
• Up to four outputs of each DCM can drive onto global clock buffers
• All DCM outputs can drive general routing
Spartan-3 versus Virtex-II
• Lower cost
• Smaller process = lower core voltage
– .09 micron versus .15 micron
– Vccint = 1.2V versus 1.5V
• Different I/O standard support
– New standards: 1.2V LVCMOS, 1.8V HSTL, and SSTL
– Default is LVCMOS, versus LVTTL
• More I/O pins per package
• Only one-half of the slices support RAM or SRL16s (SLICEM)
• Fewer block RAMs and multiplier blocks
– Same size and functionality
• Eight global clock multiplexers
• Two or four DCM blocks
• No internal 3-state buffers
– 3-state buffers are in the I/O
SLICEM and SLICEL
• Each Spartan™-3 CLB contains four slices
– Similar to the Virtex™-II
• Slices are grouped in pairs
– Left-hand SLICEM (Memory): LUTs can be configured as memory or SRL16
– Right-hand SLICEL (Logic): LUT can be used as logic only
[Figure: Spartan-3 CLB with a switch matrix and four slices (X0Y0, X0Y1, X1Y0, X1Y1) arranged as a left-hand SLICEM pair and a right-hand SLICEL pair, with CIN/COUT carry connections, SHIFTIN/SHIFTOUT, and fast connects]
Spartan-3E Features
• More gates per I/O than Spartan-3
• Removed some I/O standards
– Higher-drive LVCMOS
– GTL, GTLP
– SSTL2_II
– HSTL_II_18, HSTL_I, HSTL_III
– LVDS_EXT, ULVDS
• DDR Cascade
– Internal data is presented on a single clock edge
• 16 BUFGMUXes on left and right sides
– Drive half the chip only
– In addition to eight global clocks
• Pipelined multipliers
• Additional configuration modes
– SPI, BPI
– Multi-Boot mode
Virtex-II Pro Features
• 0.13 micron process
• Up to 24 RocketIO™ Multi-Gigabit Transceiver (MGT) blocks
• Serializer and deserializer (SERDES)
• Fibre Channel, Gigabit Ethernet, XAUI, Infiniband compliant transceivers, and others
• 8-, 16-, and 32-bit selectable FPGA interface
• 8B/10B encoder and decoder
• PowerPC™ RISC processor blocks
• Thirty-two 32-bit General Purpose Registers (GPRs)
• Low power consumption: 0.9mW/MHz
• IBM CoreConnect bus architecture support
Virtex-4 Architecture Has the Most Advanced Feature Set
• RocketIO™ Multi-Gigabit Transceivers: 622 Mbps–10.3 Gbps
• Smart RAM: new block RAM/FIFO
• Xesium Clocking Technology: 500 MHz
• Advanced CLBs: 200K logic cells
• Tri-Mode Ethernet MAC: 10/100/1000 Mbps
• XtremeDSP™ Technology Slices: 256 18x18 GMACs
• 1 Gbps SelectIO™: ChipSync™ source synchronous, XCITE active termination
• PowerPC™ 405 with APU interface: 450 MHz, 680 DMIPS
Choose the Platform that Best Fits the Application

Resource        LX              FX               SX
Logic           14K–200K LCs    12K–140K LCs     23K–55K LCs
Memory          0.9–6 Mb        0.6–10 Mb        2.3–5.7 Mb
SelectIO        240–960         240–896          320–640
RocketIO        N/A             0–24 Channels    N/A
PowerPC         N/A             1 or 2 Cores     N/A
Ethernet MAC    N/A             2 or 4 Cores     N/A
Xilinx Tool Flow
Outline
• Overview
• ISE
• Summary
• Lab 1: Xilinx Tool Flow Demo
Xilinx Design Flow
See Development System Reference Guide for Flow Diagrams
Design Entry Methods: HDL or Schematic
• Plan and budget
• Whichever method you use, you will need a tool to generate an EDIF or NGC netlist to bring into the Xilinx implementation tools
– Popular synthesis tools include: Synplify, Precision, FPGA Compiler II, and XST
• Tools available to assist in design entry
– Architecture Wizard, CORE Generator™ system, and StateCAD tools
• Simulate the design to ensure that it works as expected!
[Figure: flow from Plan & Budget to Create Code/Schematic, then HDL RTL Simulation, Synthesize to create netlist, and Functional Simulation]
Xilinx Implementation
• Once you generate a netlist, you can implement the design
• There are several outputs of implementation
– Reports
– Timing simulation netlists
– Floorplan files
– FPGA Editor files
– and more!
[Figure: Implement flow with the steps Translate, Map, and Place & Route]
What is Implementation?
• More than just Place & Route
• Implementation includes many phases
• Translate: Merge multiple design files into a single netlist
• Map: Group logical symbols from the netlist (gates) into physical components (slices and IOBs)
• Place & Route: Place components onto the chip, connect the components, and extract timing data into reports
• Each phase generates files that allow you to use other Xilinx tools
• Floorplanner, FPGA Editor, XPower
Timing Closure
Download
• Once a design is implemented, you must create a file that the FPGA can understand
• This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly into the FPGA, or the BIT file can be converted into a PROM file, which stores the programming information
ISE Project Navigator
Implementing a Design
• Implement a design:
– Select the top-level source file in the Sources in Project window (HDL, schematic, or EDIF, depending on your design flow)
– Double-click Implement Design in the Processes for Source window
Checking the Implementation Status
• The ISE™ software will run all of the necessary steps to implement the design
– Synthesize HDL code
– Translate
– Map
– Place & Route
✓ = process was completed successfully
! = warnings
? = a file that is out of date
X = errors
Simulating a Design
• Simulate a design:
– Select Sources for: Behavioral Simulation
– Expand Xilinx ISE Simulator in the Processes for Source window
– Double-click Simulate Behavioral Model or Simulate Post-Place & Route Model
– You can also simulate after Translate or after Map
Viewing Subprocesses
• Quick View of Reports, Constraints
• Project Status
• Device Utilization
• Design Summary Options
• Performance and Constraints
• Reports
Programming the FPGA
• There are two ways to program an FPGA
– Through a PROM device: you must generate a file that the PROM programmer can understand
– Directly from the computer: use the iMPACT configuration tool