09 Accelerators
Digital Design: An Embedded Systems Approach Using Verilog
Chapter 9
Accelerators
Portions of this work are from the book, Digital Design: An Embedded
Systems Approach Using Verilog, by Peter J. Ashenden, published by Morgan
Kaufmann Publishers, Copyright 2007 Elsevier Inc. All rights reserved.
Verilog
Accelerating performance
Instruction-level parallelism
Accelerators
Achievable Parallelism
Algorithm Kernels
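Amdahl's Law
Total time $t$ with kernel fraction $f$: $t = ft + (1 - f)t$
Accelerator speeds up kernel by a factor $s$: $t' = \frac{ft}{s} + (1 - f)t$
Overall speedup factor $s'$: $s' = \frac{t}{t'} = \frac{1}{\frac{f}{s} + (1 - f)}$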
For kernel 1: $s' = \frac{1}{\frac{0.8}{10} + (1 - 0.8)} = \frac{1}{0.08 + 0.2} \approx 3.57$
For kernel 2: $s' = \frac{1}{\frac{0.15}{100} + (1 - 0.15)} = \frac{1}{0.0015 + 0.85} \approx 1.17$
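The two results can be checked against the limiting case of the speedup formula. As the kernel speedup $s$ grows without bound, the $f/s$ term vanishes, so the overall speedup is capped by the fraction of time spent outside the kernel. This is a short worked derivation, not part of the original slides:

```latex
\lim_{s \to \infty} s'
  = \lim_{s \to \infty} \frac{1}{\frac{f}{s} + (1 - f)}
  = \frac{1}{1 - f}
% Kernel 1: f = 0.8  gives a ceiling of 1 / 0.2  = 5
% Kernel 2: f = 0.15 gives a ceiling of 1 / 0.85 \approx 1.18
```

Kernel 2 already sits at about 1.17 with a 100-fold kernel speedup, so it is essentially at its ceiling, while kernel 1 at 3.57 still has headroom up to 5. This suggests that the fraction $f$ matters more than the raw factor $s$ when choosing which kernel to accelerate.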
Parallel Architectures
Processing blocks
Data flow between them
[Figure: pipeline of processing blocks: data in, step 1, step 2, step 3, data out]
Bus Arbitration
Arbitration policies
Priority, round-robin,
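To make the round-robin policy concrete, here is a minimal sketch of a two-requester arbiter. It is not taken from the slides: the module name `rr_arbiter2`, its ports, and the `last` register are all hypothetical, and it assumes single-cycle transfers, so the grant is re-arbitrated every clock. When both requesters ask at once, the grant goes to whichever was not granted most recently.

```verilog
// Hypothetical two-requester round-robin arbiter (illustrative sketch only).
// Module, port, and signal names are assumptions, not from the source slides.
// Assumes single-cycle transfers: arbitration is repeated every clock cycle.
module rr_arbiter2 (
  input  wire       clk,
  input  wire       rst,
  input  wire [1:0] req,   // req[0], req[1]: bus requests
  output reg  [1:0] gnt    // one-hot grant, or 2'b00 when no one requests
);
  reg last;  // index of the requester that was granted most recently

  // Grant logic: a lone requester always wins; on a tie, the requester
  // that was NOT granted most recently wins.
  always @* begin
    case (req)
      2'b01:   gnt = 2'b01;
      2'b10:   gnt = 2'b10;
      2'b11:   gnt = last ? 2'b01 : 2'b10;
      default: gnt = 2'b00;
    endcase
  end

  // Remember who was granted, so the next tie goes the other way.
  always @(posedge clk)
    if (rst)         last <= 1'b1;  // after reset, requester 0 wins the first tie
    else if (gnt[0]) last <= 1'b0;
    else if (gnt[1]) last <= 1'b1;
endmodule
```

A fixed-priority arbiter would replace the tie case with a constant choice. A bus with multi-cycle transfers would also need to hold the grant until the current owner drops its request, which this sketch deliberately leaves out.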
[Figure: bus arbitration; labels: arbiter, processor, accelerator, controller, request and grant signals, memory bus, memory]
Block-Processing Accelerator
Datapath comprises
Stream-Processing Accelerator
Processor/Accelerator Interface
Application areas
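$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}$, $D_x(i, j) = O(i, j) * G_x$
$G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}$, $D_y(i, j) = O(i, j) * G_y$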
Data Dependencies
System Architecture
Memory Bandwidth
Accelerator Sequence
Steady state
Start of row
End of row
Steady state
Pixel Datapath
// Computation datapath signals
reg        [31:0] prev_row, curr_row, next_row;
reg         [7:0] O [-1:+1][-1:+1];
reg signed [10:0] Dx, Dy, D;
reg         [7:0] abs_D;
reg        [31:0] result_row;
...
// Computational datapath
always @(posedge clk_i) // Previous row register
  if (prev_row_load) prev_row <= dat_i;
  else if (shift_en) prev_row[31:8] <= prev_row[23:0];
... // Current row register
... // Next row register
function [10:0] abs (input signed [10:0] x);
  abs = x >= 0 ? x : -x;
endfunction
...
always @(posedge clk_i) // Computation pipeline
  if (shift_en) begin
    D = abs(Dx) + abs(Dy);
    abs_D <= D[10:3];
    Dx <= - $signed({3'b000, O[-1][-1]})
          + $signed({3'b000, O[-1][+1]})
          - ($signed({3'b000, O[ 0][-1]}) << 1)
          + ($signed({3'b000, O[ 0][+1]}) << 1)
          - $signed({3'b000, O[+1][-1]})
          + $signed({3'b000, O[+1][+1]});
    Dy <=   $signed({3'b000, O[-1][-1]})
          + ($signed({3'b000, O[-1][ 0]}) << 1)
          + $signed({3'b000, O[-1][+1]})
          - $signed({3'b000, O[+1][-1]})
          - ($signed({3'b000, O[+1][ 0]}) << 1)
          - $signed({3'b000, O[+1][+1]});
    ...
    O[-1][-1] <= O[-1][ 0];
    O[-1][ 0] <= O[-1][+1];
    O[-1][+1] <= prev_row[31:24];
    O[ 0][-1] <= O[ 0][ 0];
    O[ 0][ 0] <= O[ 0][+1];
    O[ 0][+1] <= curr_row[31:24];
    O[+1][-1] <= O[+1][ 0];
    O[+1][ 0] <= O[+1][+1];
    O[+1][+1] <= next_row[31:24];
  end
always @(posedge clk_i) // Result row register
  if (shift_en) result_row <= {result_row[23:0], abs_D};
Address Generation
always @(posedge clk_i) // O base address register
  if (O_base_ce) O_base <= dat_i[21:2];
always @(posedge clk_i) // O address offset counter
  if (offset_reset)         O_offset <= 0;
  else if (O_offset_cnt_en) O_offset <= O_offset + 1;
always @(posedge clk_i) // D base address register
  if (D_base_ce) D_base <= dat_i[21:2];
always @(posedge clk_i) // D address offset counter
  if (offset_reset)         D_offset <= 0;
  else if (D_offset_cnt_en) D_offset <= D_offset + 1;
...
assign
assign
assign
assign
assign
Control/Status Registers
Register   Offset   Read/Write   Purpose
Int_en              Write-only
Start               Write-only
O_base              Write-only
D_base     12       Write-only
Status              Read-only
Control Sequencing
Accelerator Verification
[Figure: block diagram showing the arbiter, Sobel accelerator, and memory model]
Bus Arbiter
Mealy FSM
always @(posedge clk) // Arbiter FSM register
  if (rst) arbiter_current_state <= sobel;
  else     arbiter_current_state <= arbiter_next_state;
always @* // Arbiter logic (combinational, so blocking assignments are used)
  case (arbiter_current_state)
    sobel:
      if (sobel_cyc_o) begin
        sobel_gnt = 1'b1; cpu_gnt = 1'b0; arbiter_next_state = sobel;
      end
      else if (!sobel_cyc_o && cpu_cyc_o) begin
        sobel_gnt = 1'b0; cpu_gnt = 1'b1; arbiter_next_state = cpu;
      end
      else begin
        sobel_gnt = 1'b0; cpu_gnt = 1'b0; arbiter_next_state = sobel;
      end
    cpu:
      if (cpu_cyc_o) begin
        sobel_gnt = 1'b0; cpu_gnt = 1'b1; arbiter_next_state = cpu;
      end
      else if (sobel_cyc_o && !cpu_cyc_o) begin
        sobel_gnt = 1'b1; cpu_gnt = 1'b0; arbiter_next_state = sobel;
      end
      else begin
        sobel_gnt = 1'b0; cpu_gnt = 1'b0; arbiter_next_state = sobel;
      end
  endcase
Simulation Results
Summary
Amdahl's Law
Replication, pipelining,