Lecture Slide

EE-307

FPGA BASED DESIGN


Spring 2015

Understanding Speed of a Digital Circuit

Latency, Bandwidth & Timing


&&
Optimization for Non-Recursive DFGs
Lecture # 20 Engr. Rehan Ahmad <rehan.ahmad@iiu.edu.pk>
Speed

 Throughput
   Amount of data that is processed per clock cycle
   Metric: bits/sec
 Latency
   Time between data input and processed data output
   Metric: number of cycles, or time
 Timing
   Logic delays between sequential elements
   Metric: clock period or frequency

High Throughput Design

 A high-throughput design
   More concerned with the steady-state data rate
   Less concerned about the time any specific piece of data requires to propagate through the design (latency)
 Techniques
   Pipelining

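Throughput

[Figure: top-level entity. 8-bit input -> D flip-flop -> combinational logic -> D flip-flop -> combinational logic -> D flip-flop -> 8-bit output; all flip-flops are clocked by clk at 100 MHz. Timing diagram: input(0), input(1), input(2) arrive on successive cycles; the output is unknown at first, then output(0), output(1) follow, with 1 cycle between output samples.]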
 Throughput = (bits per output sample) / (time between consecutive output samples)
 Bits per output sample:
   In this example, 8 bits per output sample
 Time between consecutive output samples: clock cycles between output(n) and output(n+1)
   Can be measured in clock cycles, then translated to time
   In this example, time between consecutive output samples = 1 clock cycle = 10 ns
 Throughput = (8 bits per output sample) / (10 ns) = 0.8 bits/ns = 800 Mbits/s
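The throughput arithmetic above can be checked with a few lines of Python. This is only a numeric sketch, not part of the lecture material; the function name and its parameters are chosen here for illustration.

```python
def throughput_bits_per_second(bits_per_sample, cycles_between_samples, clock_hz):
    """Throughput = bits per output sample / time between consecutive output samples."""
    # time between samples = cycles_between_samples / clock_hz, so dividing by it
    # is the same as multiplying by clock_hz / cycles_between_samples.
    return bits_per_sample * clock_hz / cycles_between_samples


# Slide example: 8-bit samples, one output sample per cycle, 100 MHz clock.
example_throughput = throughput_bits_per_second(8, 1, 100e6)
print(f"{example_throughput / 1e6:.0f} Mbit/s")  # prints "800 Mbit/s"
```

Passing 3 for cycles_between_samples gives the rate of a design that produces one 8-bit result every three cycles at the same clock.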

An Example...

 Software Code
    XPower = 1;
    for (i=0;i < 3; i++)
        XPower = X * XPower;
 Digital Implementation
   Same register and computational resources are reused
   No new computations can begin until the previous computation has completed

Throughput: 8/3 = 2.7 bits/cycle (ideally expressed per second)
Latency: 3 clk cycles
Timing: 1 multiplier delay
Coding an iterative algorithm <with dependency>

    XPower = 1;
    for (i=0;i < 3; i++)
        XPower = X * XPower;

    module power3(
        output reg [7:0] XPower,
        output finished,
        input [7:0] X,
        input clk, start);

        // Counts the multiplications that are still pending
        reg [7:0] ncount;

        assign finished = (ncount == 0);

        always @(posedge clk)
            if (start) begin
                // Load the first term and schedule two multiplications
                XPower <= X;
                ncount <= 2;
            end
            else if (!finished) begin
                ncount <= ncount - 1;
                XPower <= XPower * X;
            end

    endmodule

Can you make an FSM based design for this?
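To see where the 3-cycle latency of the iterative module comes from, the register updates can be stepped through one clock edge at a time. The following Python model is a rough behavioural sketch, not HDL; it assumes start is asserted for exactly one clock edge and that the registers are 8 bits wide, and the function name is made up for this illustration.

```python
def power3_iterative(x):
    """Step the iterative power3 registers edge by edge; return (XPower, clock edges used)."""
    xpower, ncount, cycles = 0, 0, 0
    start = True  # assumption: start is high for the first clock edge only
    while True:
        finished = (ncount == 0)
        if start:
            # if (start): load X and schedule two multiplications
            xpower, ncount = x & 0xFF, 2
        elif not finished:
            # else if (!finished): one multiplication per clock edge
            xpower, ncount = (xpower * x) & 0xFF, ncount - 1
        else:
            break  # finished and no new start: registers hold their values
        start = False
        cycles += 1
    return xpower, cycles


print(power3_iterative(3))  # prints (27, 3)
```

The multiplier is busy for all three edges, which is why a new X cannot be accepted until the previous result is complete.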
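Loop Unrolling & Pipelining

    XPower = 1;
    for (i=0;i < 3; i++)
        XPower = X * XPower;

[Figure: the loop unrolled into a three-stage pipeline. X is delayed through registers X1 and X2; the partial results are held in registers XPower1, XPower2 and XPower3.]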
Coding

    module power3(
        output reg [7:0] XPower,
        input clk,
        input [7:0] X
    );
        reg [7:0] XPower1, XPower2;
        reg [7:0] X1, X2;

        always @(posedge clk) begin
            // Pipeline stage 1
            X1 <= X;
            XPower1 <= X;
            // Pipeline stage 2
            X2 <= X1;
            XPower2 <= XPower1 * X1;
            // Pipeline stage 3
            XPower <= XPower2 * X2;
        end
    endmodule
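The effect of the non-blocking assignments in the pipelined module can be illustrated with a small Python model. This is a behavioural sketch rather than HDL; None stands for a register whose value is still unknown, and the function name is an assumption made for this example.

```python
def power3_pipelined(samples):
    """Return the XPower register value after each clock edge (None = unknown)."""
    x1 = x2 = xp1 = xp2 = xp = None  # registers are unknown before the first edge
    outputs = []
    for x in samples:  # one new input X is presented before every clock edge
        # Non-blocking assignments: every right-hand side uses the OLD register values.
        new_xp = None if xp2 is None else (xp2 * x2) & 0xFF    # stage 3
        new_xp2 = None if xp1 is None else (xp1 * x1) & 0xFF   # stage 2
        new_x2 = x1                                            # stage 2
        new_x1 = new_xp1 = x & 0xFF                            # stage 1
        x1, x2, xp1, xp2, xp = new_x1, new_x2, new_xp1, new_xp2, new_xp
        outputs.append(xp)
    return outputs


# Four real inputs followed by two dummy zeros that flush the pipeline.
print(power3_pipelined([2, 3, 4, 5, 0, 0]))  # prints [None, None, 8, 27, 64, 125]
```

The first result appears after the third clock edge, and from then on a new result appears on every edge, even though each individual result still takes three cycles.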

Iterative implementation (before unrolling)
Throughput: 8/3 = 2.7 bits/cycle
Latency: 3 clk cycles
Timing: 1 multiplier delay

Unrolled and pipelined implementation (registers X1, X2, XPower1, XPower2, XPower3)
Throughput: 8/1 = 8 bits/cycle
Latency: 3 clk cycles
Timing: 1 multiplier delay


 In general, if an algorithm requiring n loop iterations is "unrolled," the pipelined implementation will exhibit a throughput increase of a factor of n.
 The penalty for unrolling an iterative loop is a proportional increase in area.
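The factor-of-n claim can be made concrete by counting clock cycles for a stream of samples. The sketch below is plain Python with hypothetical function names; it assumes the iterative design needs n cycles per sample and the pipelined design accepts one sample per cycle after an n-cycle fill.

```python
def cycles_iterative(num_samples, n):
    """Each sample occupies the shared hardware for n cycles."""
    return num_samples * n


def cycles_pipelined(num_samples, n):
    """n cycles until the first result, then one result per cycle."""
    return n + (num_samples - 1)


samples, n = 1000, 3
speedup = cycles_iterative(samples, n) / cycles_pipelined(samples, n)
print(cycles_iterative(samples, n), cycles_pipelined(samples, n), round(speedup, 2))
# prints 3000 1002 2.99
```

For long streams the ratio approaches n; for a single sample there is no gain, since both designs need n cycles.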
Decreasing Latency

 A low-latency design is one that passes the data from the input to the output as quickly as possible by minimizing the intermediate processing delays.
 Techniques
   Removal of pipelining, and logical shortcuts that may reduce the throughput or the maximum clock speed of a design
   Parallelism

Latency
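
[Figure: top-level entity. 8-bit input -> D flip-flop -> combinational logic -> D flip-flop -> combinational logic -> D flip-flop -> 8-bit output; all flip-flops are clocked by clk at 100 MHz. Timing diagram: input(0), input(1), input(2) arrive on successive cycles; the output is unknown at first, then output(0), output(1) follow.]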

 Latency is the time between input(n) and output(n)
   i.e. the time it takes from first input to first output, second input to second output, etc.
   Also called input-to-output latency
 Count the number of rising edges after the input
   In this example, 3 rising edges, so the latency is 3 cycles
 Latency is measured in clock cycles (then translated to seconds)
   In this example, say the clock period is 10 ns; then the latency is 30 ns
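The translation from cycles to time is a single multiplication; a short Python sketch (the helper name is hypothetical) reproduces the 30 ns figure.

```python
def latency_ns(latency_cycles, clock_hz):
    """Convert a latency counted in clock cycles into nanoseconds."""
    return latency_cycles * 1e9 / clock_hz


example_latency = latency_ns(3, 100e6)  # 3 rising edges at 100 MHz (10 ns period)
print(example_latency)  # prints 30.0
```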

Removal of pipelining

Throughput: 8/1 = 8 bits/cycle
Latency: 1 cycle
Timing: 2 multiplier delays

Penalty

 Penalty in timing
 Previous implementations could theoretically run the system clock period close to the delay of a single multiplier
 For the low-latency implementation, the clock period must be at least two multiplier delays

    module power3(
        output [7:0] XPower,
        input [7:0] X
    );
        reg [7:0] XPower1, XPower2;
        reg [7:0] X1, X2;

        assign XPower = XPower2 * X2;

        always @* begin
            X1 = X;
            XPower1 = X;
        end

        always @* begin
            X2 = X1;
            XPower2 = XPower1*X1;
        end
    endmodule
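The trade-off between the two styles can be put into numbers. The Python sketch below uses a made-up multiplier delay of 4000 ps together with tCLK2Q = 400 ps and tS = 200 ps, and ignores routing delay; all of these values and names are assumptions for illustration only.

```python
def min_clock_period_ps(t_clk2q_ps, t_s_ps, t_mult_ps, multipliers_in_path):
    """Smallest clock period for a register-to-register path with back-to-back multipliers."""
    return t_clk2q_ps + multipliers_in_path * t_mult_ps + t_s_ps


T_MULT_PS = 4000   # hypothetical multiplier delay
T_CLK2Q_PS = 400   # assumed flip-flop clock-to-Q delay
T_S_PS = 200       # assumed flip-flop setup time

pipelined_period = min_clock_period_ps(T_CLK2Q_PS, T_S_PS, T_MULT_PS, 1)
low_latency_period = min_clock_period_ps(T_CLK2Q_PS, T_S_PS, T_MULT_PS, 2)

pipelined_latency = 3 * pipelined_period       # 3 cycles at the faster clock
low_latency_latency = 1 * low_latency_period   # 1 cycle at the slower clock

print(pipelined_period, low_latency_period)    # prints 4600 8600
print(pipelined_latency, low_latency_latency)  # prints 13800 8600
```

Under these assumptions the low-latency version delivers a result sooner, but its clock must run at roughly half the frequency.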

Understanding Timing
Timings

 Combinational circuits
   Propagation delay: logic and routing delay
 Sequential synchronous circuits: flip-flops
   Propagation delay: tCLK2Q
   Some constraints
     Setup time
     Hold time
Timing: Combinational Logic

Combinational delay = tLOGIC + tROUTING

 Classification
   tLOGIC: propagation delay through logic components (e.g. LUTs)
   tROUTING: propagation delay through routing (wires)

The output remains unchanged for a time period equal to the contamination delay, tcd.

The new output value is guaranteed to be valid after a time equal to the propagation delay, tLOGIC.
Timing: Flip Flops (Sequential Logic)

[Figure: D flip-flop timing diagram. Input D must remain stable during the interval tS + tH around the rising clock edge and can change freely outside this interval; Q changes tCLK2Q after the rising edge.]

Setup time tS – minimum time the input has to be stable before the rising edge of the clock
Hold time tH – minimum time the input has to be stable after the rising edge of the clock
Propagation delay tCLK2Q – time to propagate input to output after the rising edge of the clock
Timing: Path Delay

A path is defined as a path from the output of one flip-flop to the input of another flip-flop.

[Figure: launch flip-flop -> combinational logic -> capture flip-flop, both clocked by clk. Within one clock period T, the delays tCLK2Q, tLOGIC, tROUTING and tS follow one another.]

 tCLK2Q + tLOGIC + tROUTING < (T - tS) to avoid a setup time violation
 Rewriting the equation: tCLK2Q + tLOGIC + tROUTING + tS < T, where the left-hand side is the path delay tpath
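The setup constraint can be read as a slack calculation: the time left in the clock period after the path delay has been subtracted. The Python sketch below uses made-up delay values in picoseconds and a hypothetical function name; the constraint holds when the slack is positive.

```python
def setup_slack_ps(period_ps, t_clk2q_ps, t_logic_ps, t_routing_ps, t_s_ps):
    """Slack = T - (tCLK2Q + tLOGIC + tROUTING + tS); must be > 0 to avoid a setup violation."""
    t_path_ps = t_clk2q_ps + t_logic_ps + t_routing_ps + t_s_ps
    return period_ps - t_path_ps


# Hypothetical path: 400 ps clock-to-Q, 2000 ps logic, 600 ps routing, 200 ps setup.
slack_at_100mhz = setup_slack_ps(10_000, 400, 2_000, 600, 200)
slack_at_3ns = setup_slack_ps(3_000, 400, 2_000, 600, 200)

print(slack_at_100mhz)  # prints 6800  (constraint met)
print(slack_at_3ns)     # prints -200  (setup time violation)
```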
Critical Path Delay

 Path delay tpath = tCLK2Q + tLOGIC + tROUTING + tS


 The largest of all the path delays in a circuit is
called the critical path delay (tcritical_path)
 The associated path is called the critical path
 There can be millions of paths in a circuit; timing
analysis CAD tools help to locate the critical path
Critical Path

[Figure: a network of D flip-flops with tCLK2Q = 0.4 ns and tS = 0.2 ns, connected through combinational blocks with delays of 1.1 ns, 0.5 ns and 0.8 ns; four register-to-register paths are labelled PATH 1 to PATH 4.]

 Path delays: tpath1 = 2.2 ns, tpath2 = 1.1 ns, tpath3 = 3.0 ns, tpath4 = 1.4 ns
 The critical path is PATH 3, the path with the largest delay; the critical path delay is tcritical_path = tpath3 = 3.0 ns
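Finding the critical path is a maximum over all path delays, which is what a timing analysis tool does at a much larger scale. The Python sketch below simply takes the four path delays quoted above (converted to picoseconds) and picks the largest; the variable names are chosen here for illustration.

```python
# Path delays as quoted for this example, in picoseconds.
path_delays_ps = {"PATH 1": 2200, "PATH 2": 1100, "PATH 3": 3000, "PATH 4": 1400}

# The critical path is the one with the largest delay.
critical_path = max(path_delays_ps, key=path_delays_ps.get)
critical_delay_ps = path_delays_ps[critical_path]

print(critical_path, critical_delay_ps / 1000, "ns")  # prints PATH 3 3.0 ns
```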
Critical Path - Example 2

[Figure: launch flip-flop -> wire 1 -> combinational Gate A -> wire 2 -> combinational Gate B -> wire 3 -> capture flip-flop, all within one clock period T.]

tCLK2Q = 0.4 ns, twire1 = 0.4 ns, tgateA = 2.0 ns, twire2 = 0.2 ns, tgateB = 1.2 ns, twire3 = 0.8 ns, tS = 0.2 ns

 Critical path delay = tcritical_path = 0.4 + 0.4 + 2.0 + 0.2 + 1.2 + 0.8 + 0.2 = 5.2 ns
 The minimum period for this circuit to work is Tmin = 5.2 ns
 Maximum clock frequency = 1/Tmin = 192 MHz
 If the clock period is smaller than Tmin, you will get a timing violation and the circuit will not operate correctly!
 This kind of timing violation is called a "setup time" violation (also known as a critical path violation)
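The numbers in this example can be reproduced by summing the individual delays and inverting the result. The Python sketch below works in picoseconds to avoid floating-point rounding in the sum; the variable names are assumptions made for this illustration.

```python
# Delays along the single register-to-register path, in picoseconds.
delays_ps = {
    "tCLK2Q": 400,
    "twire1": 400,
    "tgateA": 2000,
    "twire2": 200,
    "tgateB": 1200,
    "twire3": 800,
    "tS": 200,
}

t_min_ps = sum(delays_ps.values())   # minimum clock period
f_max_mhz = 1e6 / t_min_ps           # 1 / (t_min_ps * 1e-12 s), expressed in MHz

print(t_min_ps / 1000, "ns")         # prints 5.2 ns
print(f"{f_max_mhz:.1f} MHz")        # prints 192.3 MHz
```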
Review – From Last Lecture
 Throughput
   Amount of data that is processed per clock cycle, OR the aggregate/average data processing rate
   Ideally, the average data rate IN to your system should be equal to the average data rate OUT of your system, OR you will miss data!
   Improved by: pipelining and loop unrolling
   Streaming applications are more concerned with throughput
   Metric: bits/sec (do not use bits/cycle, as in the example described in class)
 Latency
   Time between data input and processed data output
   Improved by parallelising the system
   Response time: important for time-critical signals, e.g. an interrupt-triggered operation processing an external signal of an avionics system
   Metric: time in seconds
 Timing
   Logic delays between sequential elements
   Metric: clock period or frequency
   tCLK2Q + tLOGIC + tROUTING + tS < T
 Normally a compromise!
Questions….