High Level Synthesis - 03 - Transformations
Transformations
20/03/20
Transformation Techniques and High-level Synthesis
❑ Techniques like
– loop pipelining,
– dynamic renaming,
– copy propagation,
– common subexpression elimination,
– speculative code motion,
– dynamic loop unrolling,
❑ form the base of dependency removal
– This in turn provides more opportunities for parallelism
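As a small illustration of how one of these techniques removes a dependency, the sketch below applies renaming to a hypothetical four-statement fragment. The function names, variables, and constants are assumptions made for illustration and do not come from the slides, and the renaming is done by hand in the source rather than dynamically by a synthesis tool. Reusing x creates a write-after-read dependency between the second and third statements; giving the second value its own name removes it.

```c
/* Hypothetical fragment: x is reused, so "x = c + d" must wait until
   "y = x * 2" has read the old x (write-after-read dependency). */
int before_renaming(int a, int b, int c, int d) {
    int x, y, z;
    x = a + b;
    y = x * 2;   /* reads the first value of x */
    x = c + d;   /* overwrites x: cannot move above the previous line */
    z = x * 3;
    return y + z;
}

/* After renaming, the second value lives in x2. The chains
   (x1 -> y) and (x2 -> z) share no variable, so a scheduler is free
   to place them in the same control steps. */
int after_renaming(int a, int b, int c, int d) {
    int x1 = a + b;
    int x2 = c + d;
    int y = x1 * 2;
    int z = x2 * 3;
    return y + z;
}
```

Both functions return the same value for any inputs; only the set of dependencies the scheduler has to respect changes.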
Transformation techniques
Specification:
while (k < 10)
    sum += ++k;
Loop Pipelining:
[Figure: adder for the increment of k; its result is held in a temporary, t = k + 1]
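Transformation techniques
Specification:
while (k < 10)
    sum += ++k;
Copy Propagation:
[Figure: control/data-flow graph of the loop with branches k < 10 and ¬ (k < 10)]
– Before the loop: t = k + 1
– Original loop body: k = k + 1; sum = sum + k (Read After Write dependency on k)
– With k = t in the loop body, k is replaced by t:
  sum = sum + k becomes sum = sum + t
  t = k + 1 becomes t = t + 1
Resource Allocation: + + + < LD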
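The sketch below spells out the rewrite from this slide as three C functions: the specification, an intermediate pipelined form, and the copy-propagated form. The function names are hypothetical, and the intermediate form is an assumption inferred from the slide fragments (t = k + 1, k = t), not a listing taken from the deck.

```c
/* Original specification from the slide. */
int sum_original(int k, int sum) {
    while (k < 10)
        sum += ++k;
    return sum;
}

/* Assumed pipelined form: the next value of k is precomputed into t. */
int sum_pipelined(int k, int sum) {
    int t = k + 1;
    while (k < 10) {
        k = t;
        sum = sum + k;   /* read-after-write dependency on k */
        t = k + 1;       /* also has to wait for the new k */
    }
    return sum;
}

/* After copy propagation: right after "k = t" the two variables are
   equal, so later reads of k are replaced by t. All three statements
   now read only the old value of t. */
int sum_copy_propagated(int k, int sum) {
    int t = k + 1;
    while (k < 10) {
        k = t;
        sum = sum + t;
        t = t + 1;
    }
    return sum;
}
```

All three functions return the same result for the same inputs; what changes is that the last version no longer forces the additions to wait for the assignment to k.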
Transformation techniques
Specification:
while (k < 10)
    sum += ++k;
Scheduling:
[Figure: control/data-flow graph of the loop with branches k < 10 and ¬ (k < 10)]
– Before the loop: t = k + 1
– Original loop body: k = k + 1; sum = sum + k (Read After Write dependency on k)
– Transformed loop body: k = t; sum = sum + t; t = t + 1
Resource Allocation: + + + < LD
Transformations vs scheduling
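Specification:
while (k < 10)
    sum += ++k;
Scheduling:
[Figure: control/data-flow graph of the loop with branches k < 10 and ¬ (k < 10)]
– Before the loop: t = k + 1
– Original loop body: k = k + 1; sum = sum + k (Read After Write dependency on k)
– Scheduled loop body: sum = sum + t; k = t; t = t + 1
Resource Allocation: + + < LD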
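To see why the transformed loop body suits a schedule with two adders and one comparator, the sketch below models each iteration as a single control step. This is an assumption about how the three operations are scheduled, and the function name and the steps counter are hypothetical. All right-hand sides are evaluated from the old register values and the registers are then updated together, as in a clocked datapath.

```c
/* One control step per iteration: sum + t and t + 1 use one adder
   each, k < 10 uses the comparator, and k = t is a register copy. */
int sum_one_step_per_iteration(int k, int sum, int *steps) {
    int t = k + 1;                 /* computed once before the loop */
    *steps = 0;
    while (k < 10) {               /* comparator */
        int next_sum = sum + t;    /* adder 1, reads old sum and old t */
        int next_k   = t;          /* register copy, reads old t */
        int next_t   = t + 1;      /* adder 2, reads old t */
        sum = next_sum;            /* all registers update together */
        k   = next_k;
        t   = next_t;
        (*steps)++;
    }
    return sum;
}
```

Because no operation reads a value written in the same step, the order of the three assignments inside the step does not matter, which is what the removal of the read-after-write dependency buys.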
Code Motion Techniques
❑ Speculative movement of blocks of code beyond, through, or into basic blocks
❑ Helps in maximizing
– Parallelism extraction
– Resource utilization
❑ Four types
– Speculation
– Reverse Speculation
– Conditional Speculation
– Across hierarchical blocks
Conditional Speculation Example
•Scenario: Three scheduling steps to complete the operations
•Resources available: Two adders, one multiplier
•Timing Constraints: One cycle each for adder and multiplier
•Goal: Reduce the scheduling steps within the given constraints
Conditional Speculation Example (contd.)
•Technique:
  •Balance the branch block BB1 with respect to BB2.
  •Move the operation D into the If node, duplicating the operation into both branches as D1 and D2.
•Result: One scheduling step is reduced!
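The slide's figure is not reproduced here, so the sketch below uses a hypothetical stand-in to show the shape of the rewrite. The operations, variable names, and function names are all assumptions; only the idea of duplicating D into both branches as D1 and D2 comes from the slide.

```c
/* Before: operation D sits after the if node, so it is scheduled
   only once both branches have finished. */
int before_speculation(int cond, int a, int b, int c) {
    int r;
    if (cond) {
        int t1 = a + b;   /* then-branch: an add followed by a multiply */
        r = t1 * c;
    } else {
        r = a * c;        /* else-branch: a single multiply */
    }
    int d = b + c;        /* operation D */
    return r + d;
}

/* After: D is duplicated into both branches as D1 and D2. Each copy
   can share a control step with the branch's own operations whenever
   an adder is idle in that step. */
int after_speculation(int cond, int a, int b, int c) {
    int r, d;
    if (cond) {
        int t1 = a + b;
        d = b + c;        /* D1 */
        r = t1 * c;
    } else {
        d = b + c;        /* D2 */
        r = a * c;
    }
    return r + d;
}
```

The two functions compute the same value on every path. How many scheduling steps are saved depends on the actual operations and resource limits, which this stand-in does not try to reproduce.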
Loop Unrolling
Original Program:
for (i=0;i<32;i++){
    sum = sum + a[i]*b[i];
}
32 iterations x 4 cycles = 128 cycles
Unrolled once:
for (i=0;i<32;i+=2){
    sum = sum + a[i]*b[i];
    sum = sum + a[i+1]*b[i+1];
}
16 iterations x 6 cycles = 96 cycles
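A runnable version of the two loops, with the cycle arithmetic from the slide as a helper, might look like the sketch below. The function names, the N macro, and the total_cycles helper are assumptions added for illustration; the per-iteration cycle counts (4 and 6) are the slide's figures, not something the code measures.

```c
#define N 32

/* Original loop: 32 iterations. */
int dot_original(const int a[N], const int b[N]) {
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum = sum + a[i] * b[i];
    return sum;
}

/* Unrolled once: 16 iterations, two multiply-accumulates each. */
int dot_unrolled_once(const int a[N], const int b[N]) {
    int sum = 0;
    for (int i = 0; i < N; i += 2) {
        sum = sum + a[i] * b[i];
        sum = sum + a[i + 1] * b[i + 1];
    }
    return sum;
}

/* Total latency = number of iterations * cycles per iteration. */
int total_cycles(int iterations, int cycles_per_iteration) {
    return iterations * cycles_per_iteration;
}
```

With the slide's numbers, total_cycles(32, 4) gives 128 and total_cycles(16, 6) gives 96, so the unrolled loop needs 75% of the original cycle count while computing the same sum.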
Complexities in Loop Unrolling
Latency and FSM Delay variations associated with loop unrolling:
•Observation:
•Performance is not monotonically increasing
•FSM Delay is increasing almost linearly with unroll factor
•Conclusion: Choose the unroll factor taking the above factors into consideration