02 TargetArchitectures DSP Extra


Typical DSP architectures and features

extra materials

SS 2010
HW/SW Codesign

Christian Plessl
Classic DSP characteristics
•  explicit parallelism
–  Harvard architecture for concurrent data access
–  concurrent operations on data and addresses
•  optimized control flow and background processing
–  zero-overhead loops
–  DMA controllers
•  special addressing modes
–  distinction of address, data and modifier registers
–  versatile address computation for indirect addressing
•  specialized instructions
–  single-cycle hardware multiplier
–  multiply-accumulate instruction (MAC)

HW/SW Codesign 2010 Ch 2 - DSP extra 2


Harvard architecture

General purpose processor
[Figure: processor core connected to a single memory over a shared program/data bus]
•  unified external memory for program and data
•  all operands in registers

DSP
[Figure: DSP processor core connected to a program memory over a program bus and to a data memory over data memory buses]
•  separate program and data memories
•  operands also in memory
•  concurrent access to
–  instruction word
–  one or several data words
•  example:
   MPYF3 *(AR0)++, *(AR1)++, R0
–  MPYF3: instruction from program memory
–  *(AR0)++: data from memory (address in address register AR0)
–  *(AR1)++: data from memory (address in address register AR1)
–  R0: store result in data register R0



Specialized addressing modes
•  many DSPs distinguish address registers from data registers
•  additional ALUs for address computations
–  useful for indirect addressing (register points to operand in memory)
   ADDF3 *(AR0)++, R1, R1
–  operations on address registers in parallel with operations on data registers, no extra cycles
–  behavior depends on instruction and contents of special purpose registers (modifier registers)
•  typical address update functions
–  increment/decrement by 1 (AR0++, AR0--)
–  increment/decrement by constant specified in modifier register (AR0 += MR0, AR0 -= MR5)
–  circular addressing (AR0 += 1 if AR0 < upper limit, else AR0 = base address), see example
–  bit-reverse addressing, see example
–  …
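
The address update functions listed above can be sketched in a few lines of Python. This is only an illustrative model, not TMS320C3x code: the function names, the list used as memory, and the reading of "upper limit" as the last address of the buffer are assumptions made for the example.

```python
def post_increment(ar, step=1):
    """Return (address used for this access, updated address register).

    step=1 models AR0++; step=MR0 models AR0 += MR0 (modifier register).
    """
    return ar, ar + step


def circular_increment(ar, base, length):
    """AR += 1 while below the upper limit, otherwise wrap to the base address.

    Assumption: the upper limit is the last address of the buffer.
    """
    upper_limit = base + length - 1
    return ar, (ar + 1 if ar < upper_limit else base)


# Model of repeating: ADDF3 *(AR0)++, R1, R1
memory = [10.0, 20.0, 30.0, 40.0]
ar0, r1 = 0, 0.0
for _ in range(4):
    addr, ar0 = post_increment(ar0)   # address update costs no extra step
    r1 += memory[addr]

# Stride access with a modifier register value of 2
stride_addresses = []
ar = 0
for _ in range(3):
    addr, ar = post_increment(ar, step=2)
    stride_addresses.append(addr)

# Circular walk over a 4-word buffer starting at address 8
circular_addresses = []
ar = 8
for _ in range(6):
    addr, ar = circular_increment(ar, base=8, length=4)
    circular_addresses.append(addr)
```

On a DSP the update is performed by the address ALU in parallel with the data operation; the model only shows which addresses are produced.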



Circular addressing
•  goal: implementation of ring buffers in linear address space
–  implementation variants
   ▪ copy data with data access, or
   ▪ use circular addressing (don’t copy data, wrap pointers)
–  supported by addressing modes
   ▪ data access and move operations
   ▪ increment operators that wrap around at buffer boundaries

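
[Figure: ring buffer of length 4 with entries x[0] … x[3], shown at iteration i and at iteration i+1. An address register points to the current sample, and the position of the latest input is marked in both iterations. A second panel shows the buffer entries x[0] … x[M-3], x[M-2], x[M-1] laid out in linear address space.]
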
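
The two implementation variants can be compared with a minimal Python sketch. The class name `RingBuffer`, the helper `push_by_copy`, and the convention that index k = 0 denotes the latest input are assumptions for illustration; the modulo operation stands in for the wrap-around that the addressing hardware performs.

```python
class RingBuffer:
    """Ring buffer in a linear list; only the pointer moves, data stays put."""

    def __init__(self, length):
        self.data = [0.0] * length
        self.head = 0  # plays the role of the address register (latest input)

    def push(self, sample):
        # Advance the pointer with wrap-around, then overwrite the oldest sample.
        self.head = (self.head + 1) % len(self.data)
        self.data[self.head] = sample

    def recent(self, k):
        """Return the k-th most recent sample (k = 0 is the latest input)."""
        return self.data[(self.head - k) % len(self.data)]


def push_by_copy(buf, sample):
    """Copy variant: shift every entry by one slot, newest sample at index 0."""
    return [sample] + buf[:-1]


ring = RingBuffer(4)
shifted = [0.0] * 4
for s in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    ring.push(s)                        # one write per sample
    shifted = push_by_copy(shifted, s)  # length-1 moves per sample

window = [ring.recent(k) for k in range(4)]
```

Both variants expose the same window of the last four samples; the circular version writes one word per input, while the copy version moves the whole buffer.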
Bit-reverse addressing
•  goal: accelerate FFT operation
•  very important DSP operation
•  transforms signals between time and frequency representations
•  compute intensive:
–  N-point DFT needs O( N^2 ) complex multiplications
–  FFT needs O( N*log2(N) ) complex multiplications


[Figure: 8-point Fast Fourier Transform (FFT), a basic operation in many DSP algorithms. Inputs x0 … x7 in natural order yield outputs in bit-reversed order.]

input   output   input index   mirrored bits (bit reverse)
x0      X0       000           000 = 0
x1      X4       001           100 = 4
x2      X2       010           010 = 2
x3      X6       011           110 = 6
x4      X1       100           001 = 1
x5      X5       101           101 = 5
x6      X3       110           011 = 3
x7      X7       111           111 = 7

other method to compute addresses: add N/2 with reverse carry arithmetic
        000 = 0
+ 100   100 = 4
+ 100   010 = 2
+ 100   110 = 6
+ 100   001 = 1
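
Both ways of computing bit-reversed addresses can be sketched in Python. The function names are hypothetical, and the loop in `next_bit_reversed` is just one way to express reverse-carry addition of N/2 in software; a DSP address unit does this in hardware.

```python
def bit_reverse(index, bits):
    """Mirror the lowest `bits` bits of index (e.g. 001 -> 100 for bits=3)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def next_bit_reversed(addr, n):
    """Add N/2 with reverse carry: the carry moves from the MSB toward the LSB."""
    bit = n >> 1
    while bit and (addr & bit):
        addr ^= bit   # this bit overflows to 0, carry goes to the next lower bit
        bit >>= 1
    return addr | bit  # set the first zero bit found (or wrap to 0)


N = 8
mirrored = [bit_reverse(i, 3) for i in range(N)]

reverse_carry = [0]
for _ in range(N - 1):
    reverse_carry.append(next_bit_reversed(reverse_carry[-1], N))
```

For N = 8 both lists contain the output order of the figure: 0, 4, 2, 6, 1, 5, 3, 7.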



Zero-overhead loops
•  goal
–  reduce overhead for executing loops
–  general purpose processors
   ▪ initialize loop counter
   ▪ execute loop body
   ▪ check loop exit condition
   ▪ branch to loop start or exit loop
–  digital signal processors
   ▪ initialize loop counter
   ▪ execute loop body
   ▪ check loop exit condition (done by the loop hardware, no instruction needed)
   ▪ branch to loop start or exit loop (done by the loop hardware, no instruction needed)

example: add first 100 values in array a and store result in R1

TMS320C3x-like assembler
        LDI   @a, AR0
        LDF   0.0, R1
        RPTS  99
        ADDF3 *(AR0)++, R1, R1
        …

RPTS N executes the next instruction N+1 times
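
The effect of a hardware repeat can be illustrated with a rough Python model. It assumes a hypothetical machine in which the loop check and the branch each cost one instruction in the software loop; it is not cycle-accurate for any real processor. The helper `rpts` models only the repeat count (the next instruction runs count + 1 times), and the array contents are made up for the example.

```python
def software_loop_instructions(n, body=1):
    """1 init + n * (body + exit check + branch), under the stated assumption."""
    return 1 + n * (body + 2)


def hardware_loop_instructions(n, body=1):
    """1 repeat setup + n * body; check and branch are done by hardware."""
    return 1 + n * body


# Model of the slide example: sum the first 100 values of array a into R1.
a = [float(v) for v in range(1, 201)]
state = {"AR0": 0, "R1": 0.0}


def addf3_postinc():
    """ADDF3 *(AR0)++, R1, R1"""
    state["R1"] += a[state["AR0"]]
    state["AR0"] += 1


def rpts(count, instruction):
    """Run `instruction` count + 1 times, as RPTS count does."""
    for _ in range(count + 1):
        instruction()


rpts(99, addf3_postinc)

sw = software_loop_instructions(100)
hw = hardware_loop_instructions(100)
```

Under these assumptions the software loop executes 301 instructions and the repeat version 101 for the same 100 additions.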



Putting it together: scalar product

    sum = 0.0;
    for (i=0; i<N; i++)
        sum = sum + a[i]*b[i];

TMS320C3x assembler

        LDI   @a, AR0
        LDI   @b, AR1
        LDF   0, R0
        LDF   0, R1
        RPTS  N-1
        MPYF3 *(AR0)++, *(AR1)++, R0
     || ADDF3 R0, R1, R1
        ADDF3 R0, R1, R1

•  AR0, AR1: address registers
•  R0, R1: data registers
•  RPTS N-1: zero-overhead loop
•  MPYF3 with two memory operands: exploit Harvard architecture, read two data operands in one cycle
•  *(AR0)++, *(AR1)++: address arithmetic (auto increment)
•  MPYF3 || ADDF3: MAC instruction
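
The data flow of the assembler version can be traced with a small Python model. It assumes that the two halves of the parallel MPYF3 || ADDF3 pair both read their source operands before either result is written, so the add always sees the product of the previous iteration; this would explain the extra ADDF3 after the loop. The function name and the list-based memory are hypothetical.

```python
def scalar_product_mac(a, b):
    """Trace of the RPTS / MPYF3 || ADDF3 sequence for two equal-length lists."""
    ar0 = ar1 = 0          # LDI @a, AR0 / LDI @b, AR1 (indices instead of addresses)
    r0 = r1 = 0.0          # LDF 0, R0 / LDF 0, R1
    for _ in range(len(a)):            # RPTS N-1: body runs N times
        product = a[ar0] * b[ar1]      # MPYF3 *(AR0)++, *(AR1)++, R0
        total = r0 + r1                # || ADDF3 R0, R1, R1 (uses the old R0)
        r0, r1 = product, total        # both results written together
        ar0 += 1                       # auto increment, no extra instruction
        ar1 += 1
    r1 = r0 + r1                       # final ADDF3 adds the last product
    return r1


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
result = scalar_product_mac(a, b)
reference = sum(x * y for x, y in zip(a, b))
```

In this model R1 lags one product behind inside the loop, and the trailing add brings it in line with the C reference.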



Further reading
•  Jennifer Eyre and Jeff Bier, “The Evolution of DSP Processors”,
BDTI Whitepaper
•  Phil Lapsley et al., “DSP Processor Fundamentals”, IEEE Press
•  Berkeley Design Technologies Website, http://www.bdti.com/
