Lecture Slide
Throughput
Amount of data that is processed per clock cycle
Metric: bits/sec
Latency
Time between data input and processed data output
Metric: No. of cycles or time
Timing
Logic delays between sequential elements
Metric: Clock period or frequency
A high-throughput design
More concerned with the steady-state data rate
Less concerned about the time any specific piece of data
requires to propagate through the design (latency)
Techniques
Pipelining
[Figure: 8-bit input -> D flip-flop -> combinational logic -> D flip-flop -> combinational logic -> D flip-flop -> 8-bit output; all flip-flops share clk; 1 cycle between output samples]
Throughput = (bits per output sample) / (time between consecutive output samples)
Bits per output sample:
In this example, 8 bits per output sample
Time between consecutive output samples: clock cycles between output(n) to output(n+1)
Can be measured in clock cycles, then translated to time
In this example, time between consecutive output samples = 1 clock cycle = 10 ns
Throughput = (8 bits per output sample) / (10 ns) = 0.8 bits / ns = 800 Mbits/s
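The throughput calculation above can be sketched in a few lines of Python. The function name `throughput_bits_per_sec` and its parameter names are illustrative choices, not part of the lecture; the numbers are the ones from the example (8 bits per sample, 1 cycle between samples, 10 ns clock period).

```python
def throughput_bits_per_sec(bits_per_sample, cycles_between_samples, clock_period_s):
    """Throughput = bits per output sample / time between consecutive output samples."""
    # Measure the spacing in clock cycles first, then translate it to time.
    time_between_samples_s = cycles_between_samples * clock_period_s
    return bits_per_sample / time_between_samples_s


# Values from the example: 8-bit samples, one sample per cycle, 10 ns clock.
rate = throughput_bits_per_sec(bits_per_sample=8,
                               cycles_between_samples=1,
                               clock_period_s=10e-9)
print(f"{rate / 1e6:.0f} Mbit/s")  # 800 Mbit/s
```

Changing `cycles_between_samples` to 3 models a design that produces one result every third cycle, which cuts the rate to a third at the same clock.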
[KIL]
An Example...
Software Code
    XPower = 1;
    for (i=0;i < 3; i++)
        XPower = X * XPower;
Digital Implementation
[Figure: Verilog listing of the digital implementation]
Loop Unrolling & Pipelining
[Figure: the loop unrolled into three pipeline stages; registers X1 and X2 delay X while XPower1, XPower2 and XPower3 carry the partial products]
Coding
    module power3(
        output reg [7:0] XPower,
        input clk,
        input [7:0] X
    );
        reg [7:0] XPower1, XPower2;
        reg [7:0] X1, X2;

        always @(posedge clk) begin
            // Pipeline stage 1
            X1 <= X;
            XPower1 <= X;
            // Pipeline stage 2
            X2 <= X1;
            XPower2 <= XPower1 * X1;
            // Pipeline stage 3
            XPower <= XPower2 * X2;
        end
    endmodule
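To see the latency and the one-result-per-clock behaviour of this module, here is a minimal cycle-level model in Python. It is a sketch, not a substitute for simulating the Verilog: the function name `simulate_power3` is hypothetical, `None` stands in for a register whose value is still unknown, and the 8-bit mask assumes the products are truncated to the 8-bit register width.

```python
def simulate_power3(inputs):
    """Cycle-level model of the three-stage pipeline; returns XPower after each clock edge."""
    MASK = 0xFF  # registers are 8 bits wide, so products wrap modulo 256
    x1 = x2 = xpower1 = xpower2 = xpower = None  # unknown before the first edge
    outputs = []
    for x in inputs:
        # Non-blocking assignments: every right-hand side uses the values held
        # before this clock edge, so compute all next values first.
        next_x1 = x
        next_xpower1 = x
        next_x2 = x1
        next_xpower2 = None if xpower1 is None else (xpower1 * x1) & MASK
        next_xpower = None if xpower2 is None else (xpower2 * x2) & MASK
        # Commit all registers together, as the clock edge would.
        x1, x2 = next_x1, next_x2
        xpower1, xpower2, xpower = next_xpower1, next_xpower2, next_xpower
        outputs.append(xpower)
    return outputs


print(simulate_power3([1, 2, 3, 4, 5]))  # [None, None, 1, 8, 27]
```

The cube of the first input appears after the third clock edge, and from then on a new cube appears on every edge.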
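[Figure: three-stage pipeline; registers X1 and X2 delay X while XPower1, XPower2 and XPower3 carry the partial products]
Throughput: 8/1 = 8 bits/cycle once the pipeline is full (a non-pipelined, iterative version that needs 3 cycles per result gives 8/3 ≈ 2.7 bits/cycle)
Latency: 3 clk cycles
Timing: 1 multiplier delay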
Decreasing Latency
Latency
[Figure: top-level entity; 8-bit input -> D flip-flop -> combinational logic -> D flip-flop -> combinational logic -> D flip-flop -> 8-bit output; clk at 100 MHz]
Removal of pipelining
Latency: 1 cycle
Timing: 2 multiplier delays (the penalty)
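The trade-off between the pipelined design and the design with the pipeline registers removed can be put into numbers with a rough sketch. The multiplier delay and the register overhead below are hypothetical values chosen only for illustration, and `design_metrics` is an illustrative helper, not something from the lecture.

```python
def design_metrics(mults_in_critical_path, latency_cycles, t_mult_ns, t_overhead_ns, bits=8):
    """Return (minimum clock period in ns, latency in ns, throughput in bits/ns)."""
    period_ns = mults_in_critical_path * t_mult_ns + t_overhead_ns
    latency_ns = latency_cycles * period_ns
    throughput = bits / period_ns  # both designs deliver one result per clock
    return period_ns, latency_ns, throughput


T_MULT_NS = 4.0      # assumed delay of one multiplier (hypothetical)
T_OVERHEAD_NS = 0.6  # assumed tCLK2Q + tS of the registers (hypothetical)

# Pipelined: 1 multiplier between registers, 3 cycles of latency.
pipelined = design_metrics(1, 3, T_MULT_NS, T_OVERHEAD_NS)
# Pipelining removed: 2 multipliers between registers, 1 cycle of latency.
unpipelined = design_metrics(2, 1, T_MULT_NS, T_OVERHEAD_NS)

for name, (period, latency, tput) in (("pipelined", pipelined), ("unpipelined", unpipelined)):
    print(f"{name}: period {period:.1f} ns, latency {latency:.1f} ns, {tput:.2f} bits/ns")
```

Under these assumptions, removing the pipeline registers shortens the latency in nanoseconds but forces a longer clock period, so the achievable throughput drops.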
Understanding Timing
Timings
Combinational Circuits
Propagation delay : Logic & Routing Delay
Sequential Synchronous Circuits : Flip Flops
Propagation delay: tCLK2Q
Some Constraints
Setup time
Hold time
Timing: Combinational Logic
Propagation delay = tLOGIC + tROUTING
Classification
tLOGIC: propagation delay through logic components (e.g. LUTs)
tROUTING: propagation delay through routing (wires)
The new output value is guaranteed to be valid after a time equal to the propagation delay, tLOGIC
Timing: Flip Flops (Sequential Logic)
[Figure: D flip-flop timing diagram around the rising clock edge, marking tS, tH and tCLK2Q]
Input D must remain stable during the interval tS + tH around the rising clock edge
Input D can freely change outside this interval
Setup time tS – minimum time the input has to be stable before the rising edge of the clock
Hold time tH – minimum time the input has to be stable after the rising edge of the clock
Propagation delay tCLK2Q – time to propagate input to output after the rising edge of the clock
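The setup and hold window can be expressed as a simple check on when D is allowed to change. This is a minimal sketch: `d_input_is_safe` is a hypothetical helper, and the times used in the example (a clock edge at 10 ns, tS = 0.2 ns, tH = 0.1 ns) are illustrative values only.

```python
def d_input_is_safe(d_change_times_ns, clk_edge_ns, t_setup_ns, t_hold_ns):
    """True if D never changes strictly inside the window (edge - tS, edge + tH)."""
    window_start = clk_edge_ns - t_setup_ns
    window_end = clk_edge_ns + t_hold_ns
    return all(not (window_start < t < window_end) for t in d_change_times_ns)


# Rising clock edge at 10 ns with example values tS = 0.2 ns, tH = 0.1 ns.
print(d_input_is_safe([9.5], 10.0, 0.2, 0.1))    # True: D settles well before the window
print(d_input_is_safe([9.9], 10.0, 0.2, 0.1))    # False: D changes inside the setup time
print(d_input_is_safe([10.05], 10.0, 0.2, 0.1))  # False: D changes inside the hold time
```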
Timing: Path Delay
A path runs from the output of one flip-flop to the input of another flip-flop
[Figure: launch flip-flop -> combinational logic -> capture flip-flop, both clocked by clk with CLOCK PERIOD T]
To avoid a setup time violation: tCLK2Q + tLOGIC + tROUTING < (T - tS)
Rewriting the equation: tpath = tCLK2Q + tLOGIC + tROUTING + tS < T
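The setup constraint translates directly into a check of the path delay against the clock period. The sketch below uses hypothetical helper names and made-up delay values for tLOGIC and tROUTING; only the form of the inequality comes from the slide.

```python
def path_delay_ns(t_clk2q, t_logic, t_routing, t_setup):
    """tpath = tCLK2Q + tLOGIC + tROUTING + tS, all in ns."""
    return t_clk2q + t_logic + t_routing + t_setup


def meets_setup(t_clk2q, t_logic, t_routing, t_setup, period_ns):
    """Setup timing is met when tpath < T."""
    return path_delay_ns(t_clk2q, t_logic, t_routing, t_setup) < period_ns


# Illustrative delays (assumed): tCLK2Q = 0.4, tLOGIC = 2.5, tROUTING = 0.9, tS = 0.2 ns.
print(meets_setup(0.4, 2.5, 0.9, 0.2, period_ns=10.0))  # True: tpath of about 4 ns fits in 10 ns
print(meets_setup(0.4, 2.5, 0.9, 0.2, period_ns=3.5))   # False: tpath exceeds the period
```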
Critical Path Delay
[Figure: four register-to-register paths (PATH 1 to PATH 4) through combinational blocks with delays 1.1 ns, 0.5 ns and 0.8 ns; each launch flip-flop has tCLK2Q = 0.4 ns and each capture flip-flop has tS = 0.2 ns]
Path delays: tpath1 = 2.2 ns, tpath2 = 1.1 ns, tpath3 = 3.0 ns, tpath4 = 1.4 ns
The critical path is PATH 3, the path with the largest delay; the critical path delay is
tcritical_path = tpath3 = 3.0 ns
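Finding the critical path amounts to taking the maximum over the path delays. A small sketch using the four delays listed above (the dictionary and variable names are illustrative):

```python
# Path delays from the example, in ns.
path_delays_ns = {"PATH 1": 2.2, "PATH 2": 1.1, "PATH 3": 3.0, "PATH 4": 1.4}

# The critical path is the one with the largest delay.
critical_path = max(path_delays_ns, key=path_delays_ns.get)
critical_delay_ns = path_delays_ns[critical_path]

print(critical_path, critical_delay_ns)  # PATH 3 3.0
```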
Critical Path- Example-2
[Figure: launch flip-flop -> wire 1 -> Gate A -> wire 2 -> Gate B -> wire 3 -> capture flip-flop, all within one CLOCK PERIOD T]
tCLK2Q = 0.4 ns, twire1 = 0.4 ns, tgateA = 2.0 ns, twire2 = 0.2 ns, tgateB = 1.2 ns, twire3 = 0.8 ns, tS = 0.2 ns
Critical path delay = tcritical_path = tCLK2Q + twire1 + tgateA + twire2 + tgateB + twire3 + tS = 5.2 ns
The minimum period for this circuit to work is Tmin = 5.2 ns
Maximum clock frequency = 1/Tmin = 192 MHz
If the clock period is smaller than Tmin, you will get a timing violation and circuit will not
operate correctly!!
This kind of timing violation is called a "setup time" violation (also known as critical path violation)
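The numbers in this example can be reproduced with a short calculation. The sketch below sums the delays given on the slide and derives the maximum clock frequency; the names `delays_ns` and `has_setup_violation` are illustrative.

```python
# Delays along the path from the example, in ns.
delays_ns = {
    "tCLK2Q": 0.4, "twire1": 0.4, "tgateA": 2.0, "twire2": 0.2,
    "tgateB": 1.2, "twire3": 0.8, "tS": 0.2,
}

t_min_ns = sum(delays_ns.values())  # critical path delay = minimum clock period
f_max_mhz = 1e3 / t_min_ns          # 1 / ns = GHz, so scale by 1000 to get MHz


def has_setup_violation(period_ns):
    """A clock period shorter than Tmin causes a setup time violation."""
    return period_ns < t_min_ns


print(f"Tmin = {t_min_ns:.1f} ns, fmax = {f_max_mhz:.1f} MHz")  # Tmin = 5.2 ns, fmax = 192.3 MHz
print(has_setup_violation(5.0))  # True: 5.0 ns is shorter than Tmin
print(has_setup_violation(6.0))  # False
```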
Review – From Last Lecture
Throughput
Amount of data that is processed per clock cycle, or the aggregate/average data processing rate
Ideally, the average data rate INTO your system should equal the average data rate OUT of your system; otherwise you will miss data!
Improved by: pipelining and loop unrolling
Streaming applications are more concerned with throughput
Metric: bits/sec (do not use bits/cycle as in the example described in class)
Latency
Time between data input and processed data output
Improved by Parallelising the system
Response Time --- Important for Time Critical Signals, e.g. some interrupt triggered
operation processing an external signal of an avionics system !
Metric: Time in terms of seconds
Timing
Logic delays between sequential elements
Metric: Clock period or frequency
tCLK2Q + tLOGIC + tROUTING + tS < T
Normally a compromise!
Questions….