Professional Documents
Culture Documents
VLIW Architectures For DSP: A Two-Part Lecture
VLIW Architectures For DSP: A Two-Part Lecture
www.BDTI.com
Outline
l Why VLIW?
l The TMS320C62xx
l ADI TigerSHARC
l Infineon Carmel
1
VLIW Architectures for DSP
Why VLIW?
VLIW vs Superscalar
2
VLIW Architectures for DSP
L1 S1 M1 D1 L2 S2 M2 D2
L: ALU
32 32
S: Shifter, ALU
M: Multiplier
On-Chip Data Memory
D: Address gen.
© 1999 Berkeley Design Technology, Inc. 6
3
VLIW Architectures for DSP
LOOP:
ADD .L1 A0,A3,A0
Can execute ||ADD .L2 B1,B7,B1
up to eight ||MPYHL .M1X A2,B2,A3
32-bit ||MPYLH .M2X A2,B2,B7
instructions ||LDW .D2 *B4++,B2
in parallel ||LDW .D1 *A7--,A2
||[B0] ADD .S2 -1,B0,B0
||[B0] B .S1 LOOP
u Increased performance
u Potentially scalable
l Can add more execution units, allow more
instructions to be packed into the VLIW instruction
4
VLIW Architectures for DSP
Benchmark Results
Execution Time on Complex Block FIR
45 Microseconds
(lower is faster)
microseconds
30
15
0
ADSP-2189 DSP1620 DSP56311 '320C549 TMS320C6202
75 MHz 120 MHz 150 MHz 120 MHz 250 MHz
5
VLIW Architectures for DSP
Benchmark Results
Memory Usage on FSM Benchmark
Bytes
200
(lower is better)
160
120
80
40
0
ADSP-218x DSP16xx DSP563xx '320C54x TMS320C62xx
www.bdti.com
l DSP Processors Hit the Mainstream
covers DSP architectural basics and new
developments. Originally printed in
IEEE Computer Magazine.
6
VLIW Architectures for DSP
Outline
l Why VLIW?
l The TMS320C62xx
l ADI TigerSHARC
l Infineon Carmel
StarCore SC140
7
VLIW Architectures for DSP
StarCore SC140
ADI TigerSHARC
8
VLIW Architectures for DSP
ADI TigerSHARC
Infineon Carmel
9
VLIW Architectures for DSP
Infineon Carmel
CLIW (Configurable Long Instruction Word)
General format:
cliw name (operand1, ... , operand 4) {
ALU1 || MAC1 || ALU2 || MAC2 || MOV1 || MOV2
}
Example CLIW:
cliw fft4(r0+=rn0, r1, r4, r5) {
a2 = a1l * a0h
|| *ma1 = ff1 + ff2
|| *ma2 = a2 - a1h * a0h
|| a1h = *ma3 - *ma4
|| ff1 = *ma3
|| ff2 = *ma4;
}
© 1999 Berkeley Design Technology, Inc. 19
Comparison
Data memory
Processor Issue bandwidth Instruction Clock Pipeline Notable
width (16-bit words) size (MHz) depth characteristics
1st VLIW-based
TMS320C62xx 8 4 words/cycle 32 bits 250 11
DSP processor
SIMD + VLIW,
TigerSHARC 4 16 words/cycle 32 bits 250* 8
data type agility
CLIW instructions,
Carmel 2, 6 4 words/cycle 24/48 bits 120 8
4 AGUs
*Projected
10
VLIW Architectures for DSP
Benchmark Results
Execution Time on Complex Block FIR
12
10
ted
8
jec
6
pro
4
http://www.BDTI.com
l DSP Processors Hit the Mainstream
covers DSP architectural basics and new
developments. Originally printed in
IEEE Computer Magazine.
11