DSP Processors

DSP Architectures
Comparison
• Since a fixed-point DSP operates on integer formats, the range of
numbers is limited, which leads to overflow problems. More coding
effort is needed to deal with them.
– Choice for ASIC DSP (performance & small slice area)
• Floating point offers a wide range of data but requires complex
circuitry, hence it is more expensive and slower than fixed point.
– Choice for prototyping or proof-of-concept development
• Most floating-point processors perform automatic normalization so
that numbers are properly shifted & aligned. The programmer only
needs to take care of overflow.
• But due to the enormous dynamic range, scaling is rarely needed.
• Floating-point processors are easier to use than fixed-point
processors but are more expensive.
Comparison between fixed & floating point processors
Fixed point                          Floating point
• 16 or 24 bit devices               • 32 bit devices
• Limited dynamic range              • Large dynamic range
• Overflow & quantization errors     • Easier to program since no
  must be resolved                     scaling is required
• Poorer C compiler efficiency.      • Better C compiler efficiency.
  Normally programmed in assembly      Can be developed in C
• Long product development time      • Quick time to market
• Faster clock rate                  • Slower clock rate
• Less silicon area is required      • More silicon area is required as
                                       functional units are complex
• Cheaper                            • More expensive
• Low power consumption              • High power consumption
• Bursty in nature
• High speed
Applications of fixed & floating point Processors
Fixed point                          Floating point
• Disc drive and motor control       • Radar, sonar & seismic
                                       applications
• Consumer audio applications        • High-end audio applications
  such as MP3 players, multimedia
  gaming and digital cameras
• Speech coding/decoding and         • Sound synthesis in professional
  channel coding                       audio, video coding/decoding
• Communication devices such as
  modems & cellular phones
Sources of Error in DSPs
• DSP system: ADC, DSP device, DAC
• Accuracy depends on a number of factors contributed by the ADC &
DAC and on how the calculations are done in the DSP device
• Errors in ADC & DAC: quantization errors (limited by the number of
bits)
• Errors in DSP calculations: finite word length (can be reduced by
using a larger word length & by rounding instead of truncation)
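The effect of word length and of rounding versus truncation can be sketched in a few lines of Python. This is only an illustration: the function name quantize and its parameters are hypothetical, and the model is a plain fixed-point grid rather than the behaviour of any particular ADC or DSP.

```python
import math

def quantize(x, frac_bits, mode="round"):
    """Map x onto a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    if mode == "round":
        q = math.floor(x * scale + 0.5)   # round to the nearest grid point
    else:
        q = math.floor(x * scale)         # truncate: simply drop the lower bits
    return q / scale

x = 0.7
err_round = abs(quantize(x, 3, "round") - x)
err_trunc = abs(quantize(x, 3, "truncate") - x)
err_long = abs(quantize(x, 15, "round") - x)   # longer word length
```

With 3 fractional bits the rounded value lands closer to 0.7 than the truncated one, and the 15-bit version is closer still.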
Comparison between DSP and GPP
DSP                                  GPP
• Used for embedded applications     • Desktop computing / servers
• Low power requirement              • High power consumption
• Has features required for DSP
  applications (FFT, convolution,
  correlation etc.)
• Real-time I/O
• High-speed on-chip memories
• Deals with an infinite             • Bursty in nature
  continuous stream of data
• Slow                               • High speed
• Has a typical MAC unit
Digital Signal Processors
• Application Specific
– Designed to perform one function more accurately, faster and more
cost-effectively
– Ex.: Digital filters, FFT chips
• Programmable
– Can be programmed for different applications
– More cost-effective than GPP
– Architecture is designed for the repetitive nature of signal
processing by pipelining & parallelism
– Performs certain operations like MAC faster than GPP
• DSP Architectures
– General architectures
• Architectural aspects
– H/W and S/W aspects
– RISC, CISC
– Endianness
Architecture models
• Von Neumann
– Single memory space
– Inefficient for memory-intensive operations
• Harvard
– Split memory space, separate program & data buses
– Faster
• Modified Harvard
– Split memory space, separate buses
– Parallel memory access (using DARAM/DPRAM)
– Used in TMS320C54x: 1 program & 3 data buses
VLIW Architecture
• Very Long Instruction Word architecture
• Parallel / serial execution
• Has multiple functional units
• For signal & IP applications / for media processors
• Ex. TMS320C6x
[Figure: VLIW block diagram with program control unit, instruction cache, multiported register file, read/write cross bar and functional units 1 … n]
Hardware Aspects
• CPU
– MAC
– ALU
– Shifter
– Pipelining and parallelism
– Buses
– Data address generator
• Memory
– DARAM/DPRAM/SARAM
• Multiport memories are costlier than multiple-access memories
because they need more pins and a larger chip area, but they permit
parallel access to memory locations
– Cache
– ROM
Hardware Aspects…
• Peripherals and Input Output
– Serial port
• Standard serial port
• Buffered serial port
• TDM serial port
• Multi channel buffered serial port( auto-buffering unit supports high
speed transfers & reduces overhead of servicing interrupts)
– Host port interface
– DMA controller
– Parallel port
– Hardware timer
– Power management
• Clock frequency control
• Power-down mode
• Disabling of unused peripherals
Software Aspects
• Instruction set
– CISC: Complex Instruction Set Computing
– RISC: Reduced Instruction Set Computing
• Programming languages
– Assembly programs
– C programs
• Software development tools
– C compiler
– Assembler
– Linker
– Simulator
– Code Composer Studio (CCS)
RISC Vs CISC
RISC                                    CISC
• Instruction set is simple             • Instruction set is complex
  (typically <100 instructions)           (>1000 instructions)
• Simple opcodes (ADD, SUB)             • Instructions are tailored to
                                          DSP (FIR, CONV, MACD)
• Compilers for HLL are shorter &       • Compilers for HLL are costly.
  simpler. Control unit is small          Control unit is large
  (hard wired).                           (microprogrammed).
• More registers                        • Fewer registers
• Programs are large                    • Programs are compact
• Less time to execute                  • More time to execute
• Easy to pipeline                      • Difficult to pipeline
• Programming is complex                • Programming is simple
• Ex. TMS320C6X                         • Ex. ADSP 2100X, TMS320C54X,
                                          M563XX
• TMS320C8X (RISC/CISC)
Endians
• Depending on the way bytes are ordered
within a larger object, a Processor can be:
– Big Endian
– Little Endian
• Depends on the way multi-byte data is stored.
E.g. 12345678h can be stored in four 8-bit locations as follows:
Address   Big Endian   Little Endian
1000      12           78
1001      34           56
1002      56           34
1003      78           12
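The table above can be reproduced with Python's built-in int.to_bytes. The helper name store_word and the base address 1000h are used only for illustration.

```python
def store_word(value, base, byteorder):
    """Return {address: byte} for a 32-bit value stored at base."""
    data = value.to_bytes(4, byteorder)   # byteorder is "big" or "little"
    return {base + i: b for i, b in enumerate(data)}

big = store_word(0x12345678, 0x1000, "big")
little = store_word(0x12345678, 0x1000, "little")

for addr in sorted(big):
    print(f"{addr:04X}  {big[addr]:02X}  {little[addr]:02X}")
```

Reading the bytes back with int.from_bytes and the matching byte order recovers the original value, which is why both conventions work as long as writer and reader agree.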
Endians in DSPs
• Little Endian: LS byte first
– TI DSP processors
• Big Endian: MS byte first
– Motorola DSP 56k
Programmable DSP
• A programmable DSP device should provide instructions similar to
those of microprocessors
• The computational capabilities provided by these instructions
should include:
– Arithmetic operations like add, subtract & multiply
– Logic operations like AND, OR, XOR & NOT
– MAC operations
– Signal scaling operations before & after DSP
Programmable DSP Cont..
• Support architecture should include:
– On-chip registers for storage of intermediate results
– On-chip memories for signal samples (RAM)
– On-chip program memory for programs & fixed data such as filter
coefficients
Computational Building Blocks of DSP
• Key issue: Speed and accuracy
• DSP computational building blocks
– Multiplier
– Shifter
– MAC Unit
– ALU
Shifter
• Required to scale down or scale up operands & results to avoid
errors resulting from overflows and underflows during computations
(a) When N numbers of n bits each are added, the number of bits
required for the sum increases to (n + log2 N) bits
– Loss due to overflow can be avoided by scaling down each number
by log2 N bits
– This decreases accuracy, but the loss due to overflow is avoided
– The actual sum can be obtained by scaling up the result by
log2 N bits
© Dr. Shikha Tripathi, Amrita School of Engineering (ASE), Bangalore, 11/02/14

Shifter Cont..
(b) When 2 numbers represented by n bits are multiplied, the product
can have a maximum of 2n bits. Discarding the lower n bits results in
a loss of accuracy. When multiplying 2 signed numbers, accuracy is
slightly increased by shifting the product 1 bit to the left before
storing the higher-order bits.
(c) While carrying out floating-point additions, the operands should
be normalized to have the same exponent. Shifting is required.
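Cases (a) and (b) can be illustrated with a short Python sketch. The function names scaled_sum and q15_mul are hypothetical, and 16-bit signed operands are assumed; this models the arithmetic only, not a specific device.

```python
def scaled_sum(samples):
    """Case (a): pre-scale each sample by log2(N) bits so the sum cannot overflow."""
    shift = (len(samples) - 1).bit_length()     # ceil(log2 N)
    acc = sum(s >> shift for s in samples)      # each term loses `shift` LSBs
    return acc, shift

def q15_mul(a, b):
    """Case (b): signed 16-bit fractional multiply, keeping the high-order 16 bits."""
    product = a * b                 # up to 32 bits, with a redundant sign bit
    return (product << 1) >> 16     # shift left by 1, then store the upper half

samples = [30000] * 8               # exact sum 240000 does not fit in 16 bits
acc, shift = scaled_sum(samples)    # scaled sum fits in 16 bits
restored = acc << shift             # scale back up to recover the actual sum
```

Here every sample is a multiple of 8, so no accuracy is lost; in general the discarded low-order bits are the price paid for avoiding overflow.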

Barrel Shifter
• In DSPs multibit shifts are common
• A shift by several bits is possible in one cycle using a
combinational circuit called a barrel shifter
• It connects the input lines representing a word to a group of
output lines, with the required shift determined by its control
inputs. The control inputs also determine the direction of the shift
• For an input word of n bits, shifts of 0 to (n-1) bits require
log2 n control lines
• In a left shift, the vacated LSB positions are filled with zeros;
in a right shift, the new MSB positions are filled with the sign bit
to retain the sign
Barrel Shifter Cont..
• In practical DSPs shifting is combined with data transfer. Both
operations are executed in a single clock cycle.
• Since the circuit is a combinational logic circuit, the time taken
to implement a shift is the total combinational delay involved in
decoding the control lines & setting up the path from the input to
the output lines. The delay is a fraction of a clock cycle.
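One common way to picture a barrel shifter is as log2 n multiplexer stages, where control line k selects a shift by 2^k. The sketch below is a software model under that assumption (function names are hypothetical); real hardware evaluates all stages combinationally rather than in a loop.

```python
def barrel_shift_left(word, shift, n=16):
    """Left shift of an n-bit word, zero-filling the LSBs."""
    mask = (1 << n) - 1
    stages = n.bit_length() - 1              # log2(n) control lines, n a power of two
    for k in range(stages):
        if (shift >> k) & 1:                 # control line k: shift by 2**k
            word = (word << (1 << k)) & mask
    return word

def barrel_shift_right_arith(word, shift, n=16):
    """Right shift of an n-bit word, filling the MSBs with the sign bit."""
    mask = (1 << n) - 1
    sign = (word >> (n - 1)) & 1
    stages = n.bit_length() - 1
    for k in range(stages):
        if (shift >> k) & 1:
            amount = 1 << k
            word >>= amount
            if sign:                         # replicate the sign into vacated bits
                word |= (mask << (n - amount)) & mask
    return word
```

For example, barrel_shift_left(0x0001, 5) gives 0x0020, and barrel_shift_right_arith(0x8000, 3) gives 0xF000 because the sign bit is replicated.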

MAC
• Requires an add/subtract unit & an additional register called the
accumulator at the output
• Consists of a multiplier that multiplies two n-bit numbers and
gives a product 2n bits wide. This is added to or subtracted from the
contents of the accumulator and the result is saved in the
accumulator (A + BC)
• If multiply & accumulate work in parallel, a MAC can be done in one
cycle.
– While the multiplier computes a product, the accumulator
accumulates the product of the previous multiplication.
– If N products are to be accumulated, N-1 multiplies can overlap
with accumulations. During the first multiply the accumulator is
idle, and during the last accumulation the multiplier is idle
(N+1 cycles in total)
MAC Cont..
• Saturation logic:
– Overflow occurs when the accumulator result becomes larger than
the largest representable value; a result smaller than the smallest
(most negative) value is an underflow
– Accumulator contents are limited to the most +ve or most –ve
value to avoid the error known as wrap-around error.
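The difference between saturation and wrap-around can be seen in a small model of the A + BC operation. The function mac and its parameters are hypothetical; a 16-bit accumulator is used in the usage lines only to keep the numbers small.

```python
def mac(acc, b, c, acc_bits=40, saturate=True):
    """Return acc + b*c for a signed accumulator of acc_bits bits."""
    lo = -(1 << (acc_bits - 1))
    hi = (1 << (acc_bits - 1)) - 1
    result = acc + b * c
    if saturate:
        return max(lo, min(hi, result))      # clamp to most -ve / most +ve value
    result &= (1 << acc_bits) - 1            # no saturation: keep low bits only
    return result - (1 << acc_bits) if result > hi else result

saturated = mac(32000, 100, 10, acc_bits=16)                  # clamps at 32767
wrapped = mac(32000, 100, 10, acc_bits=16, saturate=False)    # wraps to a negative value
```

Without saturation a large positive result turns into a large negative one, which is usually far more damaging to a signal than clipping.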

ALU
• In addition to shift, multiply and MAC, a DSP is required to carry
out several arithmetic & logic operations like add, subtract,
increment, decrement, negate, AND, OR, NOT, EXOR and compare, as in a
microprocessor
• The ALU is similar to that of a microprocessor, with some
additional features
DSP Manufacturers
• Texas instruments
– TMS 320XXX
• Analog Devices
– ADSP 2100x, 218xx, 219xx
– SHARC, Tiger SHARC
• Motorola
– M56xxx
• Others
– Hitachi, Fujitsu…
TI DSP IC
TMS 320 C 5X
TMX : Experimental device
TMP : Prototype
TMS : Qualified device
C : CMOS technology with on-chip non-volatile memory as ROM
E : CMOS technology with on-chip non-volatile memory as EPROM
nothing : NMOS technology with on-chip non-volatile memory as ROM
5 : Generation
X : Version number (0, 1, 2, 3, 4x, 5, 6, 7)
TI DSP Types
• Fixed Point DSPs (16 bit DSPs)
– TMS320C2000(C24X)
– TMS320C5000(C54X & C55X)
– TMS320C6000(C62X & C64X)
• Floating Point DSPs (32 bit DSPs)
– TMS320C3x
– TMS320C4x
– TMS320C67x
Applications
• C5x: MP3 players, modems, cellular phones, digital cameras
• C3x: Filters, hi-fi systems, imaging, 3D graphics
• C4x: Virtual reality, image recognition, telecommunication
• C6x: Wireless applications, modems, remote-access servers, DSL
systems, cable modems, multi-channel telephone systems
• C8x: Video telephony, VR, multimedia applications, cellular base
stations
Architecture of TMS320C54x
Digital Signal Processing
• Digital representation of signals & processing of these signals
• Digital representation: conversion of natural analog signals to
digital by sampling & quantization.
• Signal types: analog & digital
Ex. Speech, image & video, biomedical, music, radar, seismic signals
(low frequency) etc.
• Processing: analyze, modify or extract information from signals.
Key operations: convolution, correlation, filtering, transformation &
modulation …
All of these involve the MAC operation
• Applications:
Image processing (enhancement, edge detection, denoising, animation
etc.), data compression, speech recognition & analysis,
communication, music, home appliances …
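To see why these key operations reduce to MAC operations, consider FIR filtering (convolution) written as a plain loop. This is a minimal sketch with a hypothetical function name fir; each pass through the inner loop body is one multiply-accumulate.

```python
def fir(x, h):
    """Convolve input samples x with filter coefficients h (output truncated to len(x))."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:
                acc += h[k] * x[n - k]   # one MAC: multiply, then accumulate
        y.append(acc)
    return y

y = fir([1, 2, 3, 4], [1, 1])   # two-tap filter that sums adjacent samples
```

A filter with M taps therefore needs M MACs per output sample, which is why single-cycle MAC hardware dominates DSP performance.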
Advantages of DSP
• Flexible ( Programmable)
• Less sensitive to tolerance of components
– The memory & processor are fairly independent of
temperature & aging
• Cheaper with better performance and compact
• Cascaded easily
• Easy Storage
• Low power consumption
• Single chip processors possible
• Some signal processing operations impossible
to implement using analog technology
– Low frequency signal processing possible
• ……..
Disadvantages of DSP
• Increased system complexity due to additional pre and
post processing devices like ADC, DAC and complex
digital circuitry
• A limited range of frequencies is available for processing.
Large-bandwidth designs are too expensive. Bandwidths in the range of
100 MHz are still processed by analog methods.
• Design time is longer
• Finite word length problems exist
Advantages outweigh disadvantages in most applications, so DSP
applications are increasing rapidly
Reference:
TMS320C54x manual available with the CCS Simulator
Architecture of C54x
• Fixed Point processor
• Advanced Harvard Architecture, CISC Processor
– Separate memory bus structures for program & data.
• High degree of parallelism
– Multiply, load/store, add/sub to/from ACC and new address
generation can be done simultaneously.
• Powerful Instruction set & most of the operations are
of single cycle
• Targeted for portable devices (cellular phones, MP3
players, digital cameras …)
Bus structure
• Has several address/data buses:
1. Program Bus (PB): carries instruction codes &
immediate operands from program memory to CPU.
2. Program Address Bus (PAB): provides addresses to
program memory for both read/write operations.
3. Data Bus (DB): carries data between data memory
space and CPU.
4. Data Address Bus (DAB): provides addresses to
access data memory.
Buses in C54x
• 8 major 16-bit buses
– 4 program / data buses
• 1 Program bus, PB
• 3 Data buses
• CB & DB for READ
• EB for Write
– 4 address buses
• PAB, CAB, DAB & EAB
• All CPU registers, peripheral registers and I/O ports occupy data
memory space
Buses…
• Can generate up to 2 data-memory addresses per cycle
– ARAU0 & ARAU1
• PB can carry data operands stored in program space to the MAC.
• One coefficient from program memory and two data values from data
memory (using ARAU0 & ARAU1) can be fetched together
Ex: [x(i) + x(N-1-i)] * h(i), one symmetric FIR tap pair in a single
cycle.
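The symmetric FIR expression above halves the number of multiplies because h(i) = h(N-1-i). A minimal Python sketch of the arithmetic (the function name and argument layout are assumptions, and N is taken to be even):

```python
def symmetric_fir_output(x, h_half):
    """x: the N most recent samples; h_half: the first N/2 coefficients."""
    n = len(x)
    acc = 0
    for i in range(n // 2):
        acc += (x[i] + x[n - 1 - i]) * h_half[i]   # two data values, one coefficient
    return acc

x = [1, 2, 3, 4]
h = [5, 6, 6, 5]                                   # symmetric coefficient set
direct = sum(xi * hi for xi, hi in zip(x, h))      # N multiplies
folded = symmetric_fir_output(x, h[:2])            # N/2 multiplies
```

Both forms give the same output; the folded form matches the one-coefficient, two-data-value fetch pattern described on the slide.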
[Figure: C54x CPU block diagram showing the barrel shifter, ALU, MAC unit and CSSU]
Memory organization
• Minimum address range of 192K words
– 64K words for program space
– 64K words for data space
– 64K words for I/O space
• ROM, DARAM, SARAM, two-way shared RAM
• On-chip memory security option
• MMR: 26 CPU regs, peripheral regs and scratch-pad RAM block located
on data page 0 (DP0)
Memory Mapped Registers (MMR)
• 96 registers mapped into page 0 of the data memory space.
– 26 CPU regs
– 16 I/O port regs
– Peripheral & reserved regs
• Register operation == memory operation
Ex: AR0 maps to memory address 16h in 5x and to 10h in 54x.
Central Processing Unit
• CPU Registers
• 40-bit ALU
• Two 40-bit Acc Regs (AccA & AccB)
• Barrel Shifter Supporting 0-31 bit left shift
& 0-16 bit right shift range
• MAC Block
• 16-bit Temp Reg (T)
• 16-bit Transition Reg (TRN)
• Compare, Select and Store Unit (CSSU)
• Exponent Encoder
Accumulators A & B
Bits     39-32        31-16             15-0
AccA     AG           AH                AL
AccB     BG           BH                BL
         Guard bits   High-order bits   Low-order bits
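The field layout can be expressed as simple shifts and masks. The helper split_acc below is hypothetical and only mirrors the bit positions in the table.

```python
def split_acc(acc40):
    """Split a 40-bit accumulator value into guard, high and low fields."""
    acc40 &= (1 << 40) - 1                 # keep 40 bits (two's complement view)
    return {
        "G": (acc40 >> 32) & 0xFF,         # bits 39-32, guard bits
        "H": (acc40 >> 16) & 0xFFFF,       # bits 31-16, high-order bits
        "L": acc40 & 0xFFFF,               # bits 15-0, low-order bits
    }

fields = split_acc(0x12_3456_789A)
```

The 8 guard bits give headroom so that many 32-bit products can be accumulated before the result overflows.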
CPU registers
• IMR, IFR
• ST0 & ST1
• PMST
• AR0 – AR7(GPRs)
• SP reg
• Circular-Buffer size Register (BK)
• Block-Rep Regs (BRC, RSA and REA)
• PC Extension Reg (XPC)

Memory Organization
• Minimum address range of 192K words
– 64K words for program space (Page 0): on-chip ROM (4K)
• Extended by 127 pages (Page 1-127), each of 64K words
– 64K words for data space
• On-chip DARAM (5K)
– 64K words for I/O space
• ROM, DARAM, SARAM, two-way shared RAM
• On-chip memory security option
ST0 register
Bits    15-13  12  11  10   9    8-0
Field   ARP    TC  C   OVA  OVB  DP
• DP: Data memory page pointer. Concatenated with the 7 LSBs of an
instruction word to form a 16-bit direct memory address if CPL = 0.
• OVB: Overflow flag for AccB.
• OVA: Overflow flag for AccA.
• C: Carry. Set to 1 if an addition generates a carry and cleared to
0 if a subtraction generates a borrow; otherwise it is 0 after an
addition and 1 after a subtraction.
• TC: Test/control flag. Stores the result of ALU test bit
operations.
• ARP: Auxiliary register pointer. Selects AR0 – AR7 for indirect
single-operand addressing. Set to 0 if CMPT = 0.
Status Register (ST1)
Bits    15    14   13  12  11    10  9    8    7    6     5     4-0
Field   BRAF  CPL  XF  HM  INTM  0   OVM  SXM  C16  FRCT  CMPT  ASM
• BRAF: Block-repeat active flag
– BRAF = 0 when BRC decrements below zero; BRAF = 1 when RPTB is
executed
• CPL: Compiler mode.
– CPL = 0, DP is selected; CPL = 1, SP is selected
• XF: External flag, a general-purpose output pin for multiprocessor
configurations.
– Set: SSBX; Reset: RSBX
• HM: Hold mode. Determines whether the CPU stops or continues
execution when acknowledging an active HOLD signal.
• INTM: Interrupt mode.
– 0: all unmasked interrupts are enabled; 1: all maskable interrupts
are disabled
• OVM: Overflow mode. Enables (1) / disables (0) saturation of the
accumulator on overflow.
• SXM: Sign extension mode. Enables / disables sign extension in an
arithmetic operation.
• C16: Dual 16-bit / double-precision arithmetic mode.
– C16 = 0, ALU operates in double-precision mode
– C16 = 1, ALU operates in dual 16-bit arithmetic mode
• FRCT: Fractional mode (multiplication)
– If 1, the multiplier output is left-shifted by 1 bit to compensate
for the extra sign bit
• CMPT: Compatibility mode for ARP (0: ARP not updated, 1: ARP
updated)
• ASM: Accumulator shift mode.
– Specifies a shift value in the range -16 to +15, coded as a 2's
complement value
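The ASM field is a 5-bit two's complement number in bits 4-0 of ST1, so decoding it is a matter of sign-extending five bits. The following sketch uses hypothetical helper names, and the convention that a positive value means a left shift is an assumption made for illustration.

```python
def decode_asm(st1):
    """Extract the ASM field (bits 4-0 of ST1) as a signed value, -16..+15."""
    field = st1 & 0x1F
    return field - 32 if field & 0x10 else field   # sign-extend bit 4

def apply_asm(acc, st1):
    """Shift acc by the ASM amount (assumed: positive = left, negative = right)."""
    s = decode_asm(st1)
    return acc << s if s >= 0 else acc >> -s
```

For instance, an ASM field of 0Fh decodes to +15, 10h to -16 and 1Fh to -1.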
Processor Mode Status Register
Bits    15-7  6      5     4     3     2       1      0
Field   IPTR  MP/MC  OVLY  AVIS  DROM  CLKOFF  SMUL*  SST*
• IPTR: Interrupt vector pointer. The 9-bit field points to the
128-word program page where the interrupt vectors reside.
• MP/MC: Enables/disables the on-chip ROM (4K) to be addressable in
program memory space.
– 0 = enabled & addressable; 1 = not available.
• OVLY: RAM overlay. Enables the on-chip DARAM (5K) to be mapped into
program space.
• AVIS: Address visibility mode. Enables/disables the internal
program address to be visible at the address pins.
• DROM: Data ROM. Enables the on-chip ROM to be mapped into data
space.
• CLKOFF: Clock out off. Disables the CLKOUT output.
• SMUL: Saturation on multiplication
• SST: Saturation on store
Multiply & Add (MAC) Unit
• 17-bit x 17-bit hardware multiplier
• 40-bit adder
• MAC in one pipeline phase cycle
• Signed / unsigned multiplication
• Contains a zero detector, a rounder and overflow / saturation logic
[Figure: MAC unit block diagram]
ALU
[Figure: ALU block diagram]
Barrel Shifter
• Used for scaling operations:
– Prescaling an input data memory operand or the Acc value before an
ALU operation
– Performing a logical / arithmetic shift of the Acc
– Normalizing the Acc
– Post-scaling the Acc before storing the Acc value into data memory
Addressing Modes of TMS320c54x
Data Addressing Modes
• Provide various ways to access operands to
execute instructions and place results in
the memory or the registers.
• C54x has 7 addressing modes
– Immediate Addressing
– Absolute Addressing
– Accumulator Addressing
– Direct Addressing
– Indirect Addressing
– Memory-Mapped Register Addressing
– Stack Addressing
Immediate addressing
• Value encoded in the instruction.
• Two types of values:
– Short immediate (3/5/8/9 bits)
– Long immediate (16 bits)
• # indicates immediate.
Example
• LD #5, ARP ; 3-bit constant
• LD #143h, DP ; 9-bit constant
• LD #80h, A ; 8-bit constant
• LD #1000h, A ; 16-bit constant
Absolute Addressing
• Complete address is specified
• Address is always of 16-bits
• So, instruction is of 2 words
• 4 types:
– dmad addressing
– pmad addressing
– PA addressing
– *(lk) addressing
Example
• MVKD SAMPLE, *AR5 ; dmad addr
• MVDK *AR3, DATA1 ; dmad addr
• MVPD COEFF, *AR7 ; pmad addr
• PORTR FIFO, *AR5 ; PA addr
• LD *(BUFFER), A ; *(lk) addr
Accumulator Addressing
• Use Acc (A/B) contents as address.
• Used to address program memory as data.
• Two instructions:
– READA Smem
– WRITA Smem
Direct Addressing
• The lower 7 bits of the instruction (dma) are an address offset
• CPL in ST1 selects the base (DP or SP).
• Types:
– DP-referenced direct addressing
• Can access up to 128 locations (7 bits) in each of 512 pages
(9 bits of DP) of data memory
– SP-referenced direct addressing
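Address formation in direct addressing can be sketched as follows. The DP case follows the ST0 description (9-bit DP concatenated with the 7-bit offset); for the SP case the sketch assumes the offset is added to SP, which the slides do not spell out. The function name and arguments are hypothetical.

```python
def direct_address(dma7, cpl, dp=0, sp=0):
    """Form a 16-bit data-memory address from the 7-bit offset in the instruction."""
    offset = dma7 & 0x7F
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset   # DP-referenced: page number + offset
    return (sp + offset) & 0xFFFF             # SP-referenced (assumed: SP + offset)

addr_dp = direct_address(0x25, cpl=0, dp=0x143)
addr_sp = direct_address(0x05, cpl=1, sp=0x0FF0)
```

With DP = 143h and offset 25h the address is A1A5h: page 143h starts at A180h and the offset selects one of its 128 words.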
Indirect Addressing
• Uses 8 ARs; AR0-AR7
• Used to step-through sequential locations in
mem in fixed-size steps
• AR modified by:
• Increment / Decrement
• Offset
• Index
• Special modes:
• Circular addressing
• Bit-reversed addressing
Offset Address Modification
• 16-bit offset added to AR
• Two types:
– AR not updated
• Useful in accessing an element in array / structure
– AR updated to new address
• Useful in stepping through an array in fixed step-size.
See Table: MOD 12 & 13.
Circular addressing
• Circular buffer: sliding window containing the most recent data
• Uses decrement/increment by 1 or by index
• BK: Circular buffer size register
• A circular buffer of size R must start at an N-bit boundary, where
2^N > R (the N LSBs of the base address must be zero, giving the
Effective Base address (EFB))
• End of Buffer (EOB) is obtained by replacing the N LSBs of ARx with
the N LSBs of BK
• Index of the circular buffer: the N LSBs of ARx; the step is added
to or subtracted from AR
Circular Addressing Block Diagram
[Figure: circular addressing block diagram; surviving label: N LSBs of ARx]
Example
• Let AR3 = 1020h and BK = 40h. Determine the start & end address of
the buffer. What will be the content of AR3 after LD *AR3+0%, A if
AR0 = 0025h?
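One way to work through this example is to model the address update in Python. The base-address and index rules come from the bullets on circular addressing; the wrap rule used here (subtract BK when the index reaches BK, add BK when it goes negative) is the usual modulo behaviour and is an assumption, as is the function name.

```python
def circular_step(ar, bk, step):
    """Return (new AR, first buffer address, last buffer address)."""
    n = bk.bit_length()              # smallest N with 2**N > BK
    low_mask = (1 << n) - 1
    base = ar & ~low_mask            # effective base: N LSBs forced to zero
    index = (ar & low_mask) + step   # index is held in the N LSBs of ARx
    if index >= bk:
        index -= bk                  # wrap past the end of the buffer
    elif index < 0:
        index += bk                  # wrap past the start of the buffer
    return base | index, base, base + bk - 1

new_ar3, start, last = circular_step(0x1020, 0x40, 0x25)
```

Under these assumptions BK = 40h gives N = 7, the buffer occupies 1000h to 103Fh, and the index 20h + 25h = 45h wraps to 05h, so AR3 becomes 1005h.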
Circular addressing
• Rules to be followed:
[Figure: circular addressing rules]
Bit reversed addressing
• Enhances the execution speed of the FFT algorithm
• AR0 specifies one-half the size of the FFT (2^(N-1)), where N is an
integer and 2^N is the size of the FFT
• To generate the address, add AR0 to any AR which is pointing to a
data value, in bit-reversed fashion.
– Carry propagates from left to right
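Adding with the carry propagating from left to right is equivalent to reversing the bits, adding normally and reversing back. The sketch below models that for an FFT of size 2^N; the function names are hypothetical, and the buffer base is assumed to be aligned so that its N LSBs are zero.

```python
def reverse_carry_add(ar, ar0, width):
    """Add ar0 to the low `width` bits of ar with the carry moving from MSB to LSB."""
    mask = (1 << width) - 1

    def rev(v):
        return int(format(v & mask, f"0{width}b")[::-1], 2)

    low = rev((rev(ar) + rev(ar0)) & mask)
    return (ar & ~mask) | low

def bit_reversed_sequence(base, fft_size):
    """Addresses visited when stepping through an fft_size-point buffer."""
    width = fft_size.bit_length() - 1     # N, where fft_size = 2**N
    ar0 = fft_size // 2                   # one-half the size of the FFT
    ar, out = base, []
    for _ in range(fft_size):
        out.append(ar)
        ar = reverse_carry_add(ar, ar0, width)
    return out

order = bit_reversed_sequence(0, 8)
```

For an 8-point FFT with AR0 = 4 this yields the familiar bit-reversed order 0, 4, 2, 6, 1, 5, 3, 7.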
Example
[Figure: bit-reversed addressing example]
MMR Addressing Mode
CPU MMRs (Table 10.3)
Memory Mapped Register addressing
• Modifies MMRs without affecting DP and SP
• In addition to registers, any scratch-pad RAM on Data Page 0 can be
modified
• 2 modes
– Direct: forces the 9 MSBs of the data-memory address to 0 (DP0)
– Indirect: uses the 7 LSBs of the current AR
If AR1 is used to point to an MMR and contains FF25h, then AR1 points
to the timer period register (PRD), whose address is 25h (7 LSBs of
FF25h). After execution AR1 = 25h.
• Example
– LDM MMR, dst (direct)
– STM #lk, *ARx (indirect)
Example 1: LDM AR4, A ; Direct
        Before execution    After execution
A       00 0000 1111        00 0000 FFFF
AR4     FFFF                FFFF
Example 2: LDM 060h, B ; Direct
                      Before execution    After execution
B                     00 0000 0000        00 0000 1234
Data memory 0060h     1234                1234
Example 3: STM #FFFF, IMR ; Direct
        Before execution    After execution
IMR     FF01                FFFF
Example 4: STM #8765, *AR7+ ; Indirect
        Before execution    After execution
AR0     0000                8765
AR7     8010                0011

Some MMR addr mode Instructions
[Table: instructions that use MMR addressing]
Stack Addressing
• System stack: stores the PC on an interrupt / subroutine call
• Also used to pass data values
• SP points to the last element stored onto the stack
• A push pre-decrements & a pop post-increments the address held in SP
• Ex. PSHD X2
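The pre-decrement / post-increment behaviour can be modelled with a small class. This is an illustrative sketch only; the class name, the dictionary used as memory and the 16-bit wrap are assumptions.

```python
class Stack:
    """Stack that grows toward lower addresses; SP points at the last item pushed."""

    def __init__(self, sp, memory=None):
        self.sp = sp
        self.mem = {} if memory is None else memory

    def push(self, value):
        self.sp = (self.sp - 1) & 0xFFFF      # pre-decrement SP
        self.mem[self.sp] = value & 0xFFFF    # then store at the new top

    def pop(self):
        value = self.mem[self.sp]             # read the top of the stack
        self.sp = (self.sp + 1) & 0xFFFF      # post-increment SP
        return value

s = Stack(sp=0x0100)
s.push(0x1234)
sp_after_push = s.sp
top = s.pop()
```

After one push SP moves from 0100h to 00FFh, and the matching pop returns the value and restores SP.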
BUS Operations
• 16-bit read: D bus

• 16-bit write: E bus
• 32-bit read: C:D Bus (MSword : LSword)
• 32-bit write: ?????
• For 32-bit data, present address is MSword
• First word accessed is Even / Odd address??
• Even: Second word is at the next (higher) address
• Odd: Second word is at the previous (lower) address

Instruction Set of c54x
Instruction set
• Arithmetic operations
• Logical operations
• Load and store operations
• Program control operations
• Special operations
Arithmetic operations
• Absolute value
• Addition
• Subtraction
• Multiplication

Assembler Directives
.mmregs ; Permits the memory-mapped registers to be referred to by
names such as AR0, SP, DP…
.text ; Start assembling into program memory area
.data ; Start assembling into data memory area
.end ; End program
.equ ; Equate value with a symbol. Also .set
Assembler Directives Cont…
.word ; Initialise one or more 16-bit integers. Also .half, .int,
.short
.bes size_in_bits ; Reserve size_in_bits bits in the current section;
the label points to the last addressable word in the reserved space
.space n ; Reserve n bits in the current section; the label points to
the first addressable word in the reserved space
.copy "filename" ; Include source statements from another file.
Also .include "filename"
Program structure
• Label
• Assembler directive
• Mnemonic field
• Operand list
• Comment (after ;)

Assembly Conventions
label: mnemonic operand,operand ;comment
– The colon after the label is optional
– The mnemonic is an instruction or directive
– Fields are separated by tabs or spaces
• Any printable ASCII text is allowed (it is not case sensitive)
• Use the .asm extension for the file
• Instructions and directives cannot be in the first column
• Comments can start in any column after a semicolon
Count .equ 4
Loop: macp *ar3-,coeff,a ;coeff in pma
A simple addition program
• Write a program to add 2 numbers stored in locations 1000h and
1100h. Store the end result in location 1250h.
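Program
        .mmregs
        .text
        stm #1000h,ar2 ;AR2 points to the first operand
        stm #1100h,ar3 ;AR3 points to the second operand
        stm #1250h,ar4 ;AR4 points to the result location
        ld #0h,a ;Clear accumulator A
        add *ar2,*ar3,a ;Add both operands; result stored in AH
        sth a,*ar4 ;Store AH at the result location
        .end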