
A Generic FPGA

[Figure labels: switchbox, Fs = 3; connection box, Fc = 3]

IDEA Lab
integrated design, engineering, algorithmics
FPGA Components

• Problem: how to handle sequential logic
– Truth tables alone don’t work: a LUT holds no state
• Possible solution:
– Add a flip-flop to the output of the LUT
• BLE: the basic logic element (a LUT plus a flip-flop)
– The circuit can now use the output from the LUT or from the FF
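A minimal sketch of the BLE described above, assuming the LUT is stored as a flat truth table and the storage element is a single D flip-flop. The class and method names are illustrative and do not come from any vendor library.

```python
class BLE:
    """Hypothetical model: K-input LUT, one D flip-flop, and an output mux."""

    def __init__(self, truth_table, registered=False):
        self.truth_table = list(truth_table)   # 2**K entries, each 0 or 1
        self.registered = registered           # output mux select: FF or LUT
        self.ff = 0                            # flip-flop state

    def lut(self, inputs):
        # inputs[0] is the least significant address bit of the truth table.
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.truth_table[index]

    def output(self, inputs):
        # The circuit sees either the stored value or the combinational value.
        return self.ff if self.registered else self.lut(inputs)

    def clock(self, inputs):
        # On a clock edge the flip-flop captures the LUT output.
        self.ff = self.lut(inputs)


# Combinational use: a 2-input AND.
and2 = BLE([0, 0, 0, 1])

# Sequential use: a 1-input inverter LUT fed by its own flip-flop toggles.
toggle = BLE([1, 0], registered=True)
sequence = []
for _ in range(4):
    toggle.clock([toggle.ff])
    sequence.append(toggle.output([toggle.ff]))
```

The toggle case is something a bare truth table cannot express, which is the motivation for pairing each LUT with a flip-flop.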

FPGA Architecture

[Figure labels: switchbox, IO connections, logic cluster]

– Some logic clusters are large (e.g. an Altera cluster contains 8 LUT-FF pairs)
– Three important issues:
• Logic elements per cluster
• Cluster connectivity to interconnect wires (Fc): connection flexibility
• Switchbox flexibility (Fs)
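Fs counts how many other wire segments a wire entering a switchbox can connect to. As an illustration only, the sketch below assumes a disjoint (same-track) switchbox, a topology the slides do not specify, and enumerates the connections for a channel width `w`. The function name and side labels are hypothetical.

```python
SIDES = ("left", "top", "right", "bottom")


def disjoint_switchbox(w):
    """Map each (side, track) endpoint to the endpoints it can reach.

    Assumption: track t on one side connects only to track t on the
    other three sides, which gives Fs = 3 for every wire.
    """
    return {
        (side, t): [(other, t) for other in SIDES if other != side]
        for side in SIDES
        for t in range(w)
    }


connections = disjoint_switchbox(4)
fs_values = {len(targets) for targets in connections.values()}
```

With this topology every entry of `connections` has exactly three targets, matching the Fs = 3 value used in these slides.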

FPGA Architecture and CAD

Synthesis

Technology Mapping

Placement/Floorplanning

Routing

Generate Programming Data

Programming the Device

Logic Block Area (LUT)

$\mathrm{LBArea} = \alpha + (\beta \times 2^{K})$

J. Rose, R. Francis, D. Lewis, P. Chow, “Architecture of Field Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency”, IEEE JSSC, Oct. 1990, pp. 1217-1225.
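To see how the logic block area grows with LUT size, here is a small sketch that evaluates the model for a few values of K, read here as the number of LUT inputs. The values of `alpha` and `beta` are placeholders chosen only for illustration; the slides give no numbers for them.

```python
def lut_block_area(k, alpha, beta):
    """Logic block area model: fixed overhead plus a term that doubles per input."""
    return alpha + beta * 2 ** k


# Hypothetical constants, in arbitrary area units.
ALPHA, BETA = 10, 1

areas = {k: lut_block_area(k, ALPHA, BETA) for k in range(2, 8)}
for k, area in areas.items():
    print(f"K={k}: area={area}")
```

The exponential term dominates quickly: each added input doubles the number of configuration bits, so the fixed overhead matters mainly for small K.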

Logic Block Architecture
Routing Architecture Efficiency

$\mathrm{RArea} = 2(\mathrm{LBside} \times W \times \lambda) + (W \times \lambda)^{2}$

[Figure labels: LBside, λ]
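One way to read the routing area expression, offered as an interpretation rather than something the slide states: if a tile is a square logic block of side LBside with a routing channel of width W × λ along two of its edges, the routing area is the tile area minus the logic block area. The sketch below checks that algebraic identity with placeholder numbers.

```python
def routing_area(lb_side, w, lam):
    """Routing area per tile: two channel strips plus the corner square."""
    return 2 * (lb_side * w * lam) + (w * lam) ** 2


def tile_minus_block(lb_side, w, lam):
    """Same quantity, written as (tile area) - (logic block area)."""
    return (lb_side + w * lam) ** 2 - lb_side ** 2


# Hypothetical values: block side 10, 4 tracks per channel, unit track pitch.
example = routing_area(10, 4, 1)
```

The quadratic (W × λ)² term is why the channel width has a strong effect on total area once W grows.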

Area

Area v/s Inputs

FPGA Architecture

Virtex Slice

The Logic Cluster

– Question: How many BLEs should there be per cluster?

Some Interesting Questions

• For cluster-based logic with N LUTs of size K and I inputs to the cluster,
– What value of I allows 98% of the LUTs in a cluster to be utilized?
• What is the effect of K and N on FPGA area?
• What is the effect of K and N on FPGA delay?
• For what values of K and N do we obtain the best area-delay product?

Cluster Based Logic

Architectural Issues

– Differences from modern commercial FPGAs
• Channel wires driven by muxes
• Limited intra-cluster mux population
– Still provides interesting analysis
Logic Utilization v/s Input Cluster Pins

Number of Inputs per Cluster (98% utilization)

– Lots of opportunities for input sharing in large clusters (Betz, CICC’99)
– Reducing the number of cluster inputs reduces the size of the device and makes it faster.
– I = (K/2) × (N + 1)
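As a worked example of the input-count rule, the sketch below compares I = (K/2) × (N + 1) with the K × N pins a cluster would need if no inputs were shared. The function name is illustrative, and the (K, N) pairs are arbitrary sample points, not results from the slides.

```python
def cluster_inputs(k, n):
    """Cluster inputs suggested by the rule I = (K/2) * (N + 1)."""
    return k * (n + 1) / 2


samples = [(4, 4), (4, 8), (6, 10)]
for k, n in samples:
    shared = cluster_inputs(k, n)
    unshared = k * n  # every LUT input brought in on its own pin
    print(f"K={k} N={n}: I={shared} versus {unshared} without sharing")
```

For large N the rule approaches half of K × N, which is the saving that input sharing inside a cluster makes possible.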
Effect of N and K on Area

Looks like cluster size N = 6-8 is good, K = 4-5


Effect of N and K on Area
Intra-cluster area

Effect of N and K on Area
Inter-cluster area

Effect of N and K on Performance

Inconclusive: large K and N > 3 look good


Effect of N and K on Area-delay product

K = 4-6, N = 4-10 looks OK


Virtex CLB

Virtex Slice

CLB Slice Structure

• Each slice contains two sets of the following:
◦ Four-input LUT
– Any 4-input logic function,
– or 16-bit x 1 sync RAM (SLICEM only)
– or 16-bit shift register (SLICEM only)
◦ Carry & Control
– Fast arithmetic logic
– Multiplier logic
– Multiplexer logic
◦ Storage element
– Latch or flip-flop
– Set and reset
– True or inverted inputs
– Sync. or async. control

Implement Two 4-input Functions

[Figure labels: 4-input function; 3-input function, registered]

Implement Some Larger Functions

e.g. 9-input parity
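To illustrate how a function wider than one LUT can be built, the sketch below decomposes 9-input parity into three 4-input LUTs. This is a generic decomposition under the assumption that only plain LUTs are used; the slide's figure may instead rely on the slice's dedicated multiplexer or carry logic. All names are illustrative.

```python
from itertools import product

# Truth table of a 4-input XOR, indexed with input 0 as the least significant bit.
XOR4 = [bin(i).count("1") % 2 for i in range(16)]


def lut4(table, inputs):
    """Evaluate a 4-input LUT given as a 16-entry truth table."""
    index = sum(bit << i for i, bit in enumerate(inputs))
    return table[index]


def parity9(bits):
    """9-input parity from three 4-input LUTs."""
    first = lut4(XOR4, bits[0:4])    # parity of inputs 0..3
    second = lut4(XOR4, bits[4:8])   # parity of inputs 4..7
    # Third LUT combines both partial results with input 8; fourth pin tied to 0.
    return lut4(XOR4, [first, second, bits[8], 0])


def check_all():
    """Exhaustively compare against the definition of parity."""
    return all(
        parity9(list(bits)) == sum(bits) % 2
        for bits in product((0, 1), repeat=9)
    )
```

Calling `check_all()` walks all 512 input patterns, which is a cheap way to confirm that a decomposition like this is functionally correct.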

FPGA CAD Flow

[Flow diagram; two columns were interleaved in extraction]
Flow: Logic Optimization → Technology Mapping → Packing → Placement/Floorplanning → Routing → Generate Programming Data → Programming the Device
Other labels: Behavioral Synthesis, RTL Structural Synthesis, Original Networks, Logic Synthesis

Clustering

Clustering
• Need to group BLEs into clusters
• Goals:
– Minimize number of clusters
– Minimize inter-cluster wiring
– Minimize critical path (timing-driven)
• How do we do this?
– Take advantage of cluster architecture

VPACK

Basic Clustering

• Flow
– Iterate until all BLEs are consumed
– Start a new cluster by selecting a seed BLE
• Select the currently unclustered BLE with the most used inputs
– Add the BLE that shares the most inputs with the current cluster
• This minimizes the number of inputs that must be routed to each cluster
– Keep adding until either the cluster is full or its input pins are used up
– Hill climbing: if some BLE slots in the cluster are unused
• Add another BLE even if the cluster input count temporarily overflows
• If the input count is not eventually reduced, select the best choice from before hill climbing
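The greedy part of this flow can be sketched as follows. This is a simplified illustration, not the VPACK implementation: it omits the hill-climbing phase, assumes every BLE fits within the input limit on its own, and uses hypothetical names (`BLE`, `greedy_pack`, `cluster_inputs`, `shared_nets`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BLE:
    name: str
    inputs: frozenset  # nets read by this BLE
    output: str        # net driven by this BLE


def cluster_inputs(members):
    """Nets that must be routed into the cluster from outside."""
    produced = {b.output for b in members}
    used = set().union(*(b.inputs for b in members))
    return used - produced


def shared_nets(ble, members):
    """Number of nets the candidate has in common with the cluster."""
    nets = set()
    for m in members:
        nets |= m.inputs | {m.output}
    return len((ble.inputs | {ble.output}) & nets)


def greedy_pack(bles, n, i):
    """Pack BLEs into clusters of at most n BLEs and i external inputs."""
    unclustered = list(bles)
    clusters = []
    while unclustered:
        # Seed: the unclustered BLE with the most used inputs.
        seed = max(unclustered, key=lambda b: len(b.inputs))
        unclustered.remove(seed)
        cluster = [seed]
        while len(cluster) < n and unclustered:
            # Only candidates that keep the cluster within its pin budget.
            legal = [b for b in unclustered
                     if len(cluster_inputs(cluster + [b])) <= i]
            if not legal:
                break
            best = max(legal, key=lambda b: shared_nets(b, cluster))
            unclustered.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


netlist = [
    BLE("a", frozenset({"x1", "x2", "x3", "x4"}), "n1"),
    BLE("b", frozenset({"x1", "x2", "n1"}), "n2"),
    BLE("c", frozenset({"y1", "y2"}), "n3"),
    BLE("d", frozenset({"y1", "n3"}), "n4"),
]
packed = [[b.name for b in c] for c in greedy_pack(netlist, n=2, i=5)]
```

In this toy netlist, `b` joins `a` because the pair shares nets and needs only four external inputs, while `c` and `d` would each push the first cluster past its five-input budget.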

Simple Example

Slack and Criticality Calculation

[Figure, built up over successive slides: timing graph from primary inputs PI1-PI3 to primary outputs PO1-PO3, each point annotated with arrival time/required time]

Node delays along each row:
PI1 row: 1, 4, 6, 5 (to PO1)
PI2 row: 3, 6, 6, 7 (to PO2)
PI3 row: 1, 4, 5, 4 (to PO3)

Arrival time/required time (at the input, then after each of the four nodes):
PI1 row: 0/4, 1/5, 7/9, 13/15, 18/22
PI2 row: 0/0, 3/3, 9/9, 15/15, 22/22
PI3 row: 0/8, 1/9, 7/13, 14/18, 18/22
One further label, 7/15, appears between the PI2 and PI3 rows.

Slack = required time - arrival time
PI1 row: 4, 4, 2, 2, 4
PI2 row: 0, 0, 0, 0, 0
PI3 row: 8, 8, 6, 4, 4

Critical Path: PI2 to PO2 (slack 0 at every point)
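The arrival, required, and slack values in this example can be reproduced with a forward and a backward pass over the timing graph. The figure's edges did not survive extraction, so the edge list below is an assumption: one topology that is consistent with every number on the slides. Node names (`a1`..`c4`) and the `analyze` function are hypothetical. The sketch uses `graphlib` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Node delays from the slides; primary inputs and outputs have zero delay.
DELAY = {
    "PI1": 0, "a1": 1, "a2": 4, "a3": 6, "a4": 5, "PO1": 0,
    "PI2": 0, "b1": 3, "b2": 6, "b3": 6, "b4": 7, "PO2": 0,
    "PI3": 0, "c1": 1, "c2": 4, "c3": 5, "c4": 4, "PO3": 0,
}

# ASSUMED edges: chosen to match the slide's numbers, not read from the figure.
EDGES = [
    ("PI1", "a1"), ("PI2", "b1"), ("PI3", "c1"),
    ("a1", "a2"), ("b1", "a2"), ("b1", "b2"), ("b1", "c2"), ("c1", "c2"),
    ("a2", "a3"), ("b2", "b3"), ("b2", "c3"), ("c2", "c3"),
    ("a3", "a4"), ("a3", "b4"), ("b3", "b4"), ("c3", "c4"),
    ("a4", "PO1"), ("b4", "PO2"), ("c4", "PO3"),
]


def analyze(delay, edges):
    """Return (arrival, required, slack), measured at each node's output."""
    preds = {n: [] for n in delay}
    succs = {n: [] for n in delay}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: latest arrival among fan-ins plus this node's delay.
    arrival = {}
    for n in order:
        arrival[n] = max((arrival[p] for p in preds[n]), default=0) + delay[n]

    # Backward pass: every output is required at the longest path delay.
    t_max = max(arrival.values())
    required = {}
    for n in reversed(order):
        required[n] = min(
            (required[s] - delay[s] for s in succs[n]), default=t_max
        )

    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack


arrival, required, slack = analyze(DELAY, EDGES)
critical = [n for n in DELAY if slack[n] == 0]
```

Under the assumed edges, the zero-slack nodes form the PI2 to PO2 path, and the longest path delay is 22, matching the slides.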
Example with interconnect delay

[Figure: timing graph with interconnect delays and flip-flops (F); the delay labels are not recoverable from the extraction]

Logic Cluster Structure

Timing-Driven Clustering – T-VPACK

• Cost metric now considers both connectivity and timing criticality

• Perform a criticality analysis at the beginning, considering all wires to be inter-cluster
• Determine “Base” BLE criticality
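The slides do not give a formula for criticality. One commonly used definition, assumed here for illustration, normalizes slack so that zero-slack connections get criticality 1 and the connection with the largest slack gets 0. The function name is hypothetical, and the slack values are the primary-input slacks from the earlier example.

```python
def criticality(slack):
    """Assumed definition: criticality = 1 - slack / max_slack."""
    max_slack = max(slack.values())
    if max_slack == 0:
        # Every connection is on a critical path.
        return {name: 1.0 for name in slack}
    return {name: 1 - s / max_slack for name, s in slack.items()}


crit = criticality({"PI1": 4, "PI2": 0, "PI3": 8})
```

A packer can then weigh this value against the shared-net count when choosing the next BLE, so that BLEs on near-critical paths tend to land in the same cluster.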

Base Criticality

How to break ties?

• Initially, many paths may have the same number of BLEs
