
DSP Architectures

Comparison
• Since a fixed point DSP operates on integer-format data, the range of representable numbers is limited, which leads to overflow problems. More coding effort is needed to deal with such problems.
– Choice for ASIC DSPs (performance and small silicon area)
• Floating point offers a wide range of data but requires complex circuitry; hence it is more expensive and slower than fixed point.
– Choice for prototyping or proof-of-concept development
• Most floating point processors perform automatic normalization, so that numbers are properly shifted and aligned. The programmer just needs to take care of overflow.
• Because of the enormous dynamic range, scaling is rarely needed.
• Floating point processors are easier to use than fixed point processors but are more expensive.

Comparison between fixed & floating point processors
Fixed point | Floating point
16 or 24 bit devices | 32 bit devices
Limited dynamic range | Large dynamic range
Overflow & quantization errors must be resolved | Easier to program since no scaling is required
Poorer C compiler efficiency; normally programmed in assembly | Better C compiler efficiency; can be developed in C
Long product development time | Quick time to market
Faster clock rate | Slower clock rate
Less silicon area is required | More silicon area is required as functional units are complex
Cheaper | More expensive
Low power consumption | High power consumption

Applications of fixed & floating point processors
Fixed point | Floating point
Disk drive and motor control | Radar, sonar & seismic applications
Consumer audio applications such as MP3 players, multimedia gaming and digital cameras | High-end audio applications
Speech coding/decoding and channel coding | Sound synthesis in professional audio
Communication devices such as modems & cellular phones | Video coding/decoding

Sources of Error in DSPs
• DSP system: ADC, DSP device, DAC
• Accuracy depends on a number of factors contributed by the ADC and DAC, and on how the calculations are done in the DSP device
• Errors in ADC & DAC: quantization errors (limited by the number of bits)
• Errors in DSP calculations: finite word length (can be reduced by using a larger word length and by rounding instead of truncation)
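A minimal C sketch of the rounding-versus-truncation point above (an illustration, not from the slides). It assumes a 32-bit intermediate result with 30 fractional bits (Q30, e.g. a Q15 × Q15 product) that has to be stored in a 16-bit Q15 word; the Q formats and function names are illustrative choices.

```c
#include <stdint.h>

/* Reduce a Q30 value to a Q15 word by truncation: drop the 15 low-order
   bits. Assumes >> on a negative int32_t is an arithmetic shift (true on
   common compilers, implementation-defined in ISO C). */
static int32_t q30_to_q15_truncate(int32_t x)
{
    return x >> 15;
}

/* Reduce a Q30 value to Q15 by rounding to nearest: add half of the Q15
   LSB (2^14 in Q30 units) before dropping the low-order bits.
   Assumes x + 2^14 does not overflow int32_t. */
static int32_t q30_to_q15_round(int32_t x)
{
    return (x + (1 << 14)) >> 15;
}

/* Absolute error, in Q30 units, of representing x by the Q15 value q. */
static int32_t q15_error_in_q30(int32_t x, int32_t q)
{
    int32_t e = x - q * 32768;   /* q * 2^15 rescales Q15 back to Q30 */
    return e < 0 ? -e : e;
}
```

For x = 118304 (about 3.61 Q15 LSBs), truncation gives 3 with an error of 20000 Q30 units, while rounding gives 4 with an error of 12768. Truncation error can approach one full LSB (32768 Q30 units); rounding keeps it within half an LSB.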

Comparison between DSP and GPP
DSP | GPP
Used for embedded applications | Desktop computing/servers
Low power requirement | High power consumption
Has features required for DSP applications (FFT, convolution, correlation etc.) | -
Real time I/O | -
High speed on-chip memories | -
Deals with an infinite, continuous stream of data | Bursty in nature
Slow | High speed
Has a typical MAC unit | -

Digital Signal Processors
• Application specific
– Designed to perform one function more accurately, faster and more cost-effectively
– Ex.: digital filter and FFT chips
• Programmable
– Can be programmed for different applications
– More cost-effective than a GPP
– Architecture is designed for the repetitive nature of signal processing, using pipelining & parallelism
– Performs certain operations, such as MAC, faster than a GPP

• DSP Architectures
– General architectures
• Architectural aspects
– H/W and S/W aspects
– RISC, CISC
– Endianness

Architecture models
• Von Neumann
– Single memory space (one bus shared by instructions and data)
– Inefficient for memory-intensive operations

Architecture models
• Harvard
– Split memory space, separate program & data buses
– Faster (an instruction fetch and a data access can take place simultaneously)

Architecture models
• Modified Harvard
– Split memory space, separate buses
– Parallel memory access (using DARAM/DPRAM)
– Used in TMS320C54x: 1 program & 3 data buses
[Figure: modified Harvard architecture showing the data buses and the program bus]

VLIW Architecture
• Very Long Instruction Word architecture
• Parallel / serial execution
• Has multiple functional units
• For signal & image processing (IP) applications / for media processors
• Ex.: TMS320C6x
[Figure: VLIW architecture: program control unit, instruction cache, multiported register file, read/write crossbar, functional units 1 … n]

Hardware Aspects
• CPU
– MAC
– ALU
– Shifter
– Pipelining and parallelism
– Buses
– Data address generator
• Memory
– DARAM/DPRAM/SARAM
• Multiport memories are costlier than multiple-access memories because of the larger number of pins and larger chip area, but they permit parallel access to memory locations
– Cache
– ROM

Hardware Aspects…
• Peripherals and Input Output
– Serial port
• Standard serial port
• Buffered serial port
• TDM serial port
• Multichannel buffered serial port (auto-buffering unit supports high-speed transfers & reduces the overhead of servicing interrupts)
– Host port interface
– DMA controller
– Parallel port
– Hardware timer
– Power management
• Clock frequency control
• Power-down mode
• Disabling of unused peripherals

Software Aspects
• Instruction set
– CISC: Complex Instruction Set Computing
– RISC: Reduced Instruction Set Computing
• Programming languages
– Assembly programs
– C programs
• Software development tools
– C compiler
– Assembler
– Linker
– Simulator
– Code Composer Studio (CCS)

RISC Vs CISC
RISC | CISC
Instruction set is simple (typically < 100 instructions) | Instruction set is complex (> 1000 instructions)
Simple opcodes (ADD, SUB) | Instructions are tailored to DSP (FIR, CONV, MACD)
HLL compilers are short & simple; control unit is small (hardwired) | HLL compilers are costly; control unit is large (microprogrammed)
More registers | Fewer registers
Programs are large | Programs are compact
Less time to execute | More time to execute
Easy to pipeline | Difficult to pipeline
Programming is complex | Programming is simple
Ex.: TMS320C6x | Ex.: ADSP 2100x, TMS320C54x, M563xx
TMS320C8x: RISC/CISC

Endianness
• Depending on the way bytes are ordered within a larger object, i.e. the way multibyte data is stored, a processor can be:
– Big endian
– Little endian

Eg: 12345678h can be stored in four 8-bit locations as follows:
Address | Big Endian | Little Endian
1000 | 12 | 78
1001 | 34 | 56
1002 | 56 | 34
1003 | 78 | 12
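The byte layouts in the table can be reproduced with a short C sketch (an illustration, not part of the original slides); mem[0] plays the role of address 1000, and the function names are hypothetical.

```c
#include <stdint.h>

/* Big endian: most significant byte at the lowest address. */
static void store_big_endian(uint32_t value, uint8_t mem[4])
{
    mem[0] = (uint8_t)(value >> 24);
    mem[1] = (uint8_t)(value >> 16);
    mem[2] = (uint8_t)(value >> 8);
    mem[3] = (uint8_t)value;
}

/* Little endian: least significant byte at the lowest address. */
static void store_little_endian(uint32_t value, uint8_t mem[4])
{
    mem[0] = (uint8_t)value;
    mem[1] = (uint8_t)(value >> 8);
    mem[2] = (uint8_t)(value >> 16);
    mem[3] = (uint8_t)(value >> 24);
}

/* Rebuild a 32-bit value from four little-endian bytes. */
static uint32_t load_little_endian(const uint8_t mem[4])
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```

Storing 12345678h with each function gives 12, 34, 56, 78 and 78, 56, 34, 12 respectively, matching the two columns of the table. Because the stores are written with shifts, they behave the same on any host, whatever its own byte order.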

Endians in DSPs
• Little endian: LS byte first
– TI DSP processors
• Big endian: MS byte first
– Motorola DSP 56k

Programmable DSP
• A programmable DSP device should provide instructions similar to those of microprocessors
• The computational capabilities provided by these instructions should include:
– Arithmetic operations such as add, subtract & multiply
– Logic operations such as AND, OR, XOR & NOT
– MAC operations
– Signal scaling operations before & after DSP

Programmable DSP Cont..
• The supporting architecture should include:
– On-chip registers for storage of intermediate results
– On-chip memories for signal samples (RAM)
– On-chip program memory for programs & fixed data such as filter coefficients

Computational building blocks
• Key issue: speed and accuracy
• DSP computational building blocks
– Multiplier
– Shifter
– MAC Unit
– ALU

Shifter
• Required to scale operands and results down or up, to avoid errors resulting from overflows and underflows during computations
(a) When N numbers of n bits each are added, the number of bits required for the sum increases to (n + log2N) bits
– Loss due to overflow can be avoided by scaling down each number by log2N bits
– This decreases accuracy, but loss due to overflow is avoided
– The actual sum can be obtained by scaling up the result by log2N bits
Amrita School of Engineering, Bangalore
11/02/14 © Dr. Shikha Tripathi, ASE, Bangalore
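A minimal C sketch of rule (a), assuming 16-bit samples, a 16-bit accumulator and N a power of two (function names are illustrative): each sample is shifted right by log2N bits before it is added, so the running sum cannot overflow, and the result is scaled back up in a wider word.

```c
#include <stdint.h>

/* Add n 16-bit samples in a 16-bit accumulator without overflow by first
   scaling each sample down by log2n bits (n == 2^log2n is assumed).
   Assumes >> on a negative value is an arithmetic shift. */
static int16_t scaled_sum16(const int16_t *x, unsigned n, unsigned log2n)
{
    int16_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc = (int16_t)(acc + (x[i] >> log2n)); /* each scaled term uses at most 16 - log2n bits */
    return acc;
}

/* Scale the result back up by log2n bits to approximate the true sum
   (this needs a wider word, here int32_t). */
static int32_t rescale_sum(int16_t scaled, unsigned log2n)
{
    return (int32_t)scaled * (1 << log2n);
}
```

With four samples of 30000, the scaled sum is 30000 and rescaling gives 120000, which would not fit in 16 bits. With samples 30001, 30003, 30002 and 30001 the rescaled result is still 120000 instead of 120007: the low-order bits discarded by the pre-scaling are the accuracy loss mentioned above.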


Shifter Cont..
(b) When 2 numbers represented by n bits each are multiplied, the product can have at most 2n bits. Discarding the lower n bits results in a loss of accuracy. When multiplying 2 signed numbers, accuracy is slightly increased by shifting the product left by 1 bit before storing the higher-order bits, because the product of two signed numbers contains an extra sign bit.
(c) While carrying out floating point additions, the operands should be normalized to have the same exponent; this requires shifting.
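Rule (b) for signed fractional (Q15) numbers can be sketched in C as follows. The Q15 format, the saturation of (-1) × (-1) and the function names are assumptions of this illustration; the 1-bit left shift is the same shift that the FRCT bit of ST1 enables on the C54x.

```c
#include <stdint.h>

/* Q15 x Q15 multiply keeping the high-order 16 bits of the product.
   The 32-bit product of two signed 16-bit numbers has two sign bits, so it
   is shifted left by 1 before the high word is taken.
   Assumes >> on a negative int32_t is an arithmetic shift. */
static int16_t q15_mul_shifted(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;          /* Q30 product */
    if (p == 0x40000000)                 /* (-1) x (-1) = +1 is not representable */
        return INT16_MAX;                /* saturate */
    p *= 2;                              /* Q31: redundant sign bit removed */
    return (int16_t)(p >> 16);           /* high-order 16 bits, Q15 */
}

/* Same product without the 1-bit left shift: the high word is then a Q14
   value, so one bit of precision is discarded. */
static int16_t q15_mul_unshifted_q14(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    return (int16_t)(p >> 16);
}
```

For example, 0.5 × 0.5 (4000h × 4000h) gives 2000h (0.25 in Q15) with the shift. For 3 × 4000h the shifted version returns 1 while the unshifted one returns 0: without the shift, the extra sign bit occupies a position that could have held one more bit of the result.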


Barrel Shifter
• In DSPs, multibit shifts are common
• A shift by multiple bits is possible in one cycle using a combinational circuit called a barrel shifter
• It connects the input lines representing a word to a group of output lines, with the required shift determined by its control inputs. The control unit also determines the direction of shift
• For an input word of n bits, shifts of 0 to (n-1) bits require log2n control lines
• In a left shift, the vacated LSB positions are filled with zeros; in a right shift, the vacated MSB positions are filled with copies of the MSB to retain the sign
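The log2n-control-line structure can be modeled in C for n = 16 (a behavioral sketch, not a gate-level design; the function names are hypothetical). Stage k shifts by 2^k when control bit k is set, so every shift from 0 to 15 passes through the same four stages, which is why the delay does not depend on the shift amount.

```c
#include <stdint.h>

/* Behavioral model of a 16-bit barrel shifter with log2(16) = 4 control
   lines. Stage k shifts by 2^k when bit k of the shift amount is set, so
   only amounts 0..15 are meaningful (higher bits are ignored). */
static uint16_t barrel_shift_left(uint16_t word, unsigned amount)
{
    for (unsigned k = 0; k < 4; k++)
        if (amount & (1u << k))
            word = (uint16_t)(word << (1u << k));  /* vacated LSBs become 0 */
    return word;
}

/* Arithmetic right shift through the same four stages: vacated MSBs are
   filled with copies of the sign bit (bit 15). */
static uint16_t barrel_shift_right_arith(uint16_t word, unsigned amount)
{
    for (unsigned k = 0; k < 4; k++) {
        if (amount & (1u << k)) {
            unsigned s = 1u << k;
            uint16_t fill = (word & 0x8000u)
                                ? (uint16_t)(0xFFFFu << (16u - s)) /* s copies of the sign */
                                : 0u;
            word = (uint16_t)((word >> s) | fill);
        }
    }
    return word;
}
```

For example, barrel_shift_left(0x00FF, 12) returns 0xF000 and barrel_shift_right_arith(0xF000, 4) returns 0xFF00, the copies of the MSB preserving the sign.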

Barrel Shifter Cont..
• In practical DSPs, shifting is combined with data transfer; both operations are executed in a single clock cycle.
• Since the circuit is a combinational logic circuit, the time taken to implement a shift is the total combinational delay involved in decoding the control lines and setting up the path from the input to the output lines. This delay is a fraction of a clock cycle.


MAC
• Requires an add/subtract unit and an additional register, called the accumulator, at the output
• Consists of a multiplier that multiplies 2 n-bit numbers and gives a 2n-bit product. This is added to or subtracted from the contents of the accumulator, and the result is saved in the accumulator (A ± B·C)
• If multiply and accumulate work in parallel, a MAC can be done in one cycle.
– While the multiplier computes a product, the accumulator accumulates the product of the previous multiplication.
– If N products are to be accumulated, N-1 multiplies can overlap with accumulations. During the first multiply the accumulator is idle, and during the last accumulation the multiplier is idle (N+1 cycles in total).
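The overlap of multiply and accumulate can be sketched in C as a two-stage pipeline (an illustration, not a model of any particular device). The variable product plays the role of the register between the multiplier and the adder, and int64_t stands in for an accumulator wider than the 2n-bit product; the names are hypothetical.

```c
#include <stdint.h>

/* Two-stage MAC pipeline: in each cycle the multiplier forms the next
   product while the adder accumulates the product from the previous cycle.
   Returns the accumulated sum and reports the number of cycles used. */
static int64_t mac_pipelined(const int16_t *b, const int16_t *c, unsigned n,
                             unsigned *cycles)
{
    int64_t acc = 0;        /* accumulator, wider than the 32-bit product */
    int32_t product = 0;    /* pipeline register: multiplier -> adder */
    unsigned t = 0;

    if (n == 0) {
        *cycles = 0;
        return 0;
    }
    for (unsigned i = 0; i <= n; i++, t++) {
        if (i > 0)                          /* adder idle in the first cycle */
            acc += product;                 /* A <- A + previous product */
        if (i < n)                          /* multiplier idle in the last cycle */
            product = (int32_t)b[i] * c[i]; /* next 2n-bit product */
    }
    *cycles = t;                            /* n + 1 */
    return acc;
}
```

For b = {1, 2, 3} and c = {4, 5, 6} it returns 32 after 4 cycles, i.e. N + 1 for N = 3, instead of the 2N = 6 cycles a strictly sequential multiply-then-add would take.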

MAC Cont..
• Saturation logic:
– Overflow occurs when the accumulator result becomes larger than the largest representable value (a result smaller than the most negative value is an underflow)
– The accumulator contents are limited to the most positive or most negative value to avoid the error known as wrap-around error.
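The difference between wrap-around and saturation can be shown with a 32-bit accumulator in C. The 32-bit limits are an assumption of this sketch (accumulator widths vary between devices), and the function names are illustrative.

```c
#include <stdint.h>

/* Accumulate with wrap-around: the sum is reduced modulo 2^32, so a large
   positive result wraps to a large negative one. Unsigned arithmetic avoids
   undefined signed overflow; converting back to int32_t is
   implementation-defined in ISO C but wraps on common compilers. */
static int32_t acc_add_wrap(int32_t acc, int32_t x)
{
    return (int32_t)((uint32_t)acc + (uint32_t)x);
}

/* Accumulate with saturation: on overflow the result is clamped to the
   most positive or most negative 32-bit value. */
static int32_t acc_add_saturate(int32_t acc, int32_t x)
{
    int64_t sum = (int64_t)acc + x;
    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}
```

Adding 1 to the largest positive value wraps to INT32_MIN without saturation but stays at INT32_MAX with it, which is a much smaller error.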


ALU
• In addition to shift, multiply and MAC, a DSP is required to carry out several arithmetic & logic operations, such as add, subtract, increment, decrement, negate, AND, OR, NOT, EXOR and compare, as in a microprocessor
• The ALU is similar to that of a microprocessor, with some additional features


DSP Manufacturers
• Texas Instruments
– TMS 320XXX
• Analog Devices
– ADSP 2100x, 218xx, 219xx
– SHARC, TigerSHARC
• Motorola
– M56xxx
• Others
– Hitachi, Fujitsu…

TI DSP IC
TMS 320 C 5X
TMX : Experimental device
TMP : Prototype
TMS : Qualified device
C : CMOS technology with on-chip non-volatile memory as ROM
E : CMOS technology with on-chip non-volatile memory as EPROM
(nothing) : NMOS technology with on-chip non-volatile memory as ROM
5 : Generation
X : Version number: 0, 1, 2, 3, 4x, 5, 6, 7

TI DSP Types
• Fixed Point DSPs (16 bit DSPs)
– TMS320C2000 (C24x)
– TMS320C5000 (C54x & C55x)
– TMS320C6000 (C62x & C64x)
• Floating Point DSPs (32 bit DSPs)
– TMS320C3x
– TMS320C4x
– TMS320C67x

Applications
• C5x: MP3 players, modems, cellular phones, digital cameras
• C3x: filters, hi-fi systems, imaging, 3D graphics
• C4x: virtual reality, image recognition, telecommunication
• C6x: wireless applications, modems, remote-access servers, DSL systems, cable modems, multi-channel telephone systems
• C8x: video telephony, VR, multimedia applications, cellular base stations

Architecture of TMS320C54x
Digital Signal Processing
• Digital representation of signals & processing of these signals
• Digital representation: conversion of natural analog signals to digital by sampling & quantization
• Signal types: analog & digital
Ex.: speech, image & video, biomedical, music, radar, seismic (low frequency) signals etc.

• Processing: analyze, modify or extract information from signals.
Key operations: convolution, correlation, filtering, transformation & modulation…
All these involve the MAC operation
• Applications:
Image processing (enhancement, edge detection, denoising, animation etc.), data compression, speech recognition & analysis, communication, music, home appliances…

Advantages of DSP
• Flexible (programmable)
• Less sensitive to tolerance of components
– The memory & processor are fairly independent of temperature & aging
• Cheaper, compact and with better performance
• Easily cascaded
• Easy storage
• Low power consumption
• Single chip processors possible
• Some signal processing operations are impossible to implement using analog technology
– Low frequency signal processing possible
• …

Disadvantages of DSP
• Increased system complexity due to additional pre- and post-processing devices such as ADC, DAC and complex digital circuitry
• Only a limited range of frequencies is available for processing. Large-bandwidth designs are too expensive; bandwidths in the range of 100 MHz are still processed by analog methods.
• Design time is longer
• Finite word length problems exist

Advantages outweigh disadvantages in most applications, so DSP applications are increasing rapidly

Reference:
TMS320C54x manual available with CCS Simulator

Architecture of C54x
• Fixed point processor
• Advanced Harvard architecture, CISC processor
– Separate memory bus structures for program & data
• High degree of parallelism
– Multiply, load/store, add/sub to/from ACC and new address generation can be done simultaneously
• Powerful instruction set; most operations execute in a single cycle
• Targeted at portable devices (cellular phones, MP3 players, digital cameras …)

Bus structure
• Has several address/data buses:
1. Program Bus (PB): carries instruction codes & immediate operands from program memory to the CPU.
2. Program Address Bus (PAB): provides addresses to program memory for both read and write operations.
3. Data Bus (DB): carries data between data memory space and the CPU.
4. Data Address Bus (DAB): provides addresses to access data memory.

Buses in C54x
• 8 major 16-bit buses
– 4 program / data buses
• 1 program bus: PB
• 3 data buses: CB & DB for read, EB for write
– 4 address buses
• PAB, CAB, DAB & EAB

• All CPU registers, peripheral registers and I/O ports occupy data memory space

Buses…
• Can generate up to 2 data-memory addresses per cycle
– ARAU0 & ARAU1
• PB can carry data operands stored in program space to the MAC unit.
• One coefficient from program memory and two data values from data memory (addressed via ARAU0 & ARAU1) can be fetched in the same cycle
Ex: [x(i) + x(N-1-i)] * h(i), a symmetric FIR tap, in a single cycle.
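The symmetric FIR expression above can be sketched in C. It assumes a hypothetical layout where x[i] is the sample that multiplies h[i], and shows that adding the two samples sharing a coefficient first gives the same result as the direct form with about half the multiplications; the function names are illustrative.

```c
#include <stdint.h>

/* Direct-form FIR output for one sample: sum of x[i] * h[i] (n MACs). */
static int64_t fir_direct(const int16_t *x, const int16_t *h, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc += (int32_t)x[i] * h[i];
    return acc;
}

/* Symmetric FIR (h[i] == h[n-1-i]): add the two samples that share a
   coefficient first, then multiply once: [x(i) + x(N-1-i)] * h(i). */
static int64_t fir_symmetric(const int16_t *x, const int16_t *h, unsigned n)
{
    int64_t acc = 0;
    for (unsigned i = 0; i < n / 2; i++)
        acc += ((int32_t)x[i] + x[n - 1 - i]) * (int64_t)h[i]; /* 17-bit sum x 16-bit coeff */
    if (n % 2 != 0)                       /* odd length: middle tap has no partner */
        acc += (int32_t)x[n / 2] * h[n / 2];
    return acc;
}
```

With h = {1, 2, 2, 1} and x = {3, 4, 5, 6}, both functions return 27, the symmetric version using 2 multiplications instead of 4.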

[Figure: C54x CPU block diagram: barrel shifter, ALU, MAC unit, CSSU]

Memory organization
• Minimum address range of 192K words
– 64K words for program space
– 64K words for data space
– 64K words for I/O space
• ROM, DARAM, SARAM, two-way shared RAM
• On-chip memory security option
• MMR: 26 CPU regs, peripheral regs and scratch-pad RAM block located on data page 0 (DP0)

Memory Mapped Registers (MMR)
• 96 registers mapped into page 0 of the data memory space
– 26 CPU regs
– 16 I/O port regs
– Peripheral & reserved regs
• A register operation is a memory operation
Ex.: AR0 maps to memory address 16h in the C5x and to 10h in the C54x.

Central Processing Unit
• CPU registers
• 40-bit ALU
• Two 40-bit accumulator registers (AccA & AccB)
• Barrel shifter supporting 0–31 bit left shift & 0–16 bit right shift range
• MAC block
• 16-bit temporary register (T)
• 16-bit transition register (TRN)
• Compare, Select and Store Unit (CSSU)
• Exponent encoder

Accumulators A & B
Bits: 39–32 | 31–16 | 15–0
A: AG (guard bits) | AH (high-order bits) | AL (low-order bits)
B: BG (guard bits) | BH (high-order bits) | BL (low-order bits)
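A C sketch of the guard/high/low split (illustrative helper names; the 40-bit accumulator value is assumed to be held sign-extended in an int64_t):

```c
#include <stdint.h>

/* Guard bits 39-32 (AG/BG). */
static uint8_t acc_guard(int64_t acc)
{
    return (uint8_t)(((uint64_t)acc >> 32) & 0xFFu);
}

/* High-order bits 31-16 (AH/BH). */
static uint16_t acc_high(int64_t acc)
{
    return (uint16_t)(((uint64_t)acc >> 16) & 0xFFFFu);
}

/* Low-order bits 15-0 (AL/BL). */
static uint16_t acc_low(int64_t acc)
{
    return (uint16_t)((uint64_t)acc & 0xFFFFu);
}

/* Rebuild the signed 40-bit value from its three fields. */
static int64_t acc_join(uint8_t ag, uint16_t ah, uint16_t al)
{
    int64_t raw = ((int64_t)ag << 32) | ((int64_t)ah << 16) | al; /* 0 .. 2^40-1 */
    if (raw & ((int64_t)1 << 39))        /* bit 39 is the sign bit */
        raw -= (int64_t)1 << 40;         /* sign-extend from 40 bits */
    return raw;
}
```

For example, 2 × INT32_MAX = FFFFFFFEh overflows a 32-bit register but fits in the 40-bit accumulator with AG = 00h, AH = FFFFh and AL = FFFEh; the guard bits are what give the accumulator headroom during long MAC sequences.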

CPU registers
• IMR, IFR (interrupt mask and interrupt flag registers)
• ST0 & ST1 (status registers)
• PMST (processor mode status register)
• AR0 – AR7 (auxiliary registers, GPRs)
• SP (stack pointer)
• Circular-buffer size register (BK)
• Block-repeat registers (BRC, RSA and REA)
• PC extension register (XPC)

Memory Organization
• Minimum address range of 192K words
– 64K words for program space (page 0): on-chip ROM (4K)
• Extended by 127 pages (pages 1–127) of 64K words each
– 64K words for data space
• On-chip DARAM (5K)
– 64K words for I/O space
• ROM, DARAM, SARAM, two-way shared RAM
• On-chip memory security option

ST0 register
Bits: 15–13 | 12 | 11 | 10 | 9 | 8–0
Field: ARP | TC | C | OVA | OVB | DP
• DP: data memory page pointer. When CPL = 0, it is concatenated with the 7 LSBs of an instruction word to form a 16-bit direct memory address.
• OVB: overflow flag for AccB.
• OVA: overflow flag for AccA.

ST0 register Cont..
• C: carry
– Set to 1 when an addition generates a carry; cleared to 0 when a subtraction generates a borrow.
– Otherwise, it is 0 after an addition and 1 after a subtraction.

ST0 register cont..
• TC: test/control flag; stores the result of ALU test-bit operations.
• ARP: auxiliary register pointer; selects one of AR0–AR7 for indirect single-operand addressing. Set to 0 if CMPT = 0.

Status Register (ST1)
Bits: 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4–0
Field: BRAF | CPL | XF | HM | INTM | 0 | OVM | SXM | C16 | FRCT | CMPT | ASM
• BRAF: block-repeat active flag
– BRAF = 0 when BRC is decremented below zero; BRAF = 1 when RPTB is executed
• CPL: compiler mode
– CPL = 0: DP is selected (DP-referenced direct addressing); CPL = 1: SP is selected
• XF: external flag, a general-purpose output pin, e.g. for multiprocessor configurations
– Set with SSBX; reset with RSBX
• HM: hold mode; determines whether the CPU stops or continues execution when acknowledging an active HOLD signal.

Status register 1 Cont..
• INTM: interrupt mode
– 0: all unmasked interrupts are enabled; 1: all maskable interrupts are disabled
• OVM: overflow mode; enables (1) / disables (0) saturation of the accumulator on overflow.
• SXM: sign extension mode; enables / disables sign extension in arithmetic operations.

Status register 1…..
• C16: dual 16-bit / double precision arithmetic mode
– C16 = 0: ALU operates in double precision mode; C16 = 1: ALU operates in dual 16-bit arithmetic mode
• FRCT: fractional mode (multiplication)
– If 1, the multiplier output is left-shifted by 1 bit to compensate for the extra sign bit
• CMPT: compatibility mode for ARP (ARP not updated (0), ARP updated (1))
• ASM: accumulator shift mode
– Specifies a shift value in the range -16 to +15, coded as a 5-bit 2's complement value
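Decoding the 5-bit ASM field can be sketched in C (hypothetical helper names; the ST1 argument is an arbitrary 16-bit image of the register, not a read of real hardware):

```c
#include <stdint.h>

/* ASM occupies bits 4-0 of ST1. Interpret those 5 bits as a 2's
   complement number, giving a shift count from -16 to +15. */
static int st1_asm_shift(uint16_t st1)
{
    int field = st1 & 0x1F;                  /* bits 4-0 */
    return (field & 0x10) ? field - 32 : field;
}

/* Place a shift count (-16..+15) into the ASM field of an ST1 image,
   leaving bits 15-5 unchanged. */
static uint16_t st1_set_asm(uint16_t st1, int shift)
{
    return (uint16_t)((st1 & ~0x1Fu) | ((unsigned)shift & 0x1Fu));
}
```

Field values 00h–0Fh decode to shifts 0 to +15, and 10h–1Fh decode to -16 to -1; for example, 1Fh means a right shift by 1.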

Processor Mode Status Register (PMST)
Bits: 15–7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Field: IPTR | MP/MC | OVLY | AVIS | DROM | CLKOFF | SMUL* | SST*
• IPTR: interrupt vector pointer; 9 bits pointing to the 128-word program memory page that holds the interrupt vectors.
• MP/MC: enables/disables addressing of the on-chip ROM (4K) in program memory space.
– 0 = enabled & addressable; 1 = not available.
• OVLY: RAM overlay; enables the on-chip DARAM (5K) to be mapped into program space.

Processor Mode Status Register Cont..
• AVIS: address visibility mode; enables/disables making the internal program address visible at the address pins.
• DROM: data ROM; enables the on-chip ROM to be mapped into data space.
• CLKOFF: CLKOUT off; disables the CLKOUT output
• SMUL: saturation on multiplication
• SST: saturation on store

Multiply & Add (MAC) Unit
• 17-bit × 17-bit hardware multiplier
• 40-bit adder
• MAC in one pipeline phase (cycle)
• Signed / unsigned multiplication
• Contains a zero detector, a rounder and overflow / saturation logic

[Figure: C54x MAC unit block diagram]

[Figure: C54x ALU block diagram]

Barrel Shifter
• Used for scaling operations such as:
– Prescaling an input data memory operand or the accumulator value before an ALU operation
– Performing a logical / arithmetic shift of the accumulator
– Normalizing the accumulator
– Post-scaling the accumulator before storing its value into data memory

Addressing Modes of TMS320C54x
Data Addressing Modes
• Provide various ways to access operands when executing instructions, and to place results in memory or registers.
• The C54x has 7 addressing modes
– Immediate addressing
– Absolute addressing
– Accumulator addressing
– Direct addressing
– Indirect addressing
– Memory-mapped register addressing
– Stack addressing

Immediate addressing
• The value is encoded in the instruction.
• Two types of values:
– Short immediate (3/5/8/9 bits)
– Long immediate (16 bits)
• # indicates an immediate value.

Example
• LD #5, ARP ; 3-bit constant
• LD #143h, DP ; 9-bit constant
• LD #80h, A ; 8-bit constant
• LD #1000h, A ; 16-bit constant

Absolute Addressing
• The complete address is specified in the instruction
• The address is always 16 bits
• So the instruction is 2 words long
• 4 types:
– dmad addressing
– pmad addressing
– PA addressing
– *(lk) addressing

Example
• MVKD SAMPLE, *AR5 ; dmad addressing
• MVDK *AR3, DATA1 ; dmad addressing
• MVPD COEFF, *AR7 ; pmad addressing
• PORTR FIFO, *AR5 ; PA addressing
• LD *(BUFFER), A ; *(lk) addressing

Accumulator Addressing
• Uses the accumulator (A/B) contents as an address.
• Used to address program memory as data.
• Two instructions:
– READA Smem
– WRITA Smem

Direct Addressing
• The lower 7 bits of the instruction word (dma) form an address offset
• CPL in ST1 selects the reference (DP or SP)
• Types:
– DP-referenced direct addressing
• Can access up to 128 locations (7 bits) in each of 512 pages (9 bits of DP) of data memory
– SP-referenced direct addressing
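A C sketch of how the 16-bit data-memory address is formed in the two direct addressing variants. The DP case follows the ST0 description (9-bit DP concatenated with the 7-bit dma); the SP case assumes the 7-bit offset is added to SP, which is how TI documents SP-referenced addressing but is not spelled out on these slides. Function names are hypothetical.

```c
#include <stdint.h>

/* CPL = 0: the 9-bit DP page number supplies the upper 9 address bits and
   the 7-bit dma from the instruction word supplies the lower 7 bits. */
static uint16_t direct_address_dp(uint16_t dp, uint16_t instruction)
{
    return (uint16_t)(((dp & 0x1FFu) << 7) | (instruction & 0x7Fu));
}

/* CPL = 1 (assumption, see lead-in): the 7-bit dma is added to SP. */
static uint16_t direct_address_sp(uint16_t sp, uint16_t instruction)
{
    return (uint16_t)(sp + (instruction & 0x7Fu));
}
```

With DP = 143h (as loaded by LD #143h, DP in the immediate addressing example) and an offset of 25h, the DP-referenced address is A1A5h; 512 pages × 128 words cover the full 64K data space.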

Indirect Addressing
• Uses the 8 auxiliary registers AR0–AR7
• Used to step through sequential memory locations in fixed-size steps
• AR modified by:
– Increment / decrement
– Offset
– Index
• Special modes:
– Circular addressing
– Bit-reversed addressing

Offset Address Modification
• A 16-bit offset is added to the AR
• Two types:
– AR not updated
• Useful in accessing an element of an array / structure
– AR updated to the new address
• Useful in stepping through an array with a fixed step size.
See Table: MOD 12 & 13.

Circular addressing
• Circular buffer: a sliding window containing the most recent data
• Uses decrement/increment by 1 or by index
• BK: circular buffer size register
• A circular buffer of size R must start on an N-bit boundary, where 2^N > R (the N LSBs of the base address must be zero); this gives the effective base address (EFB)
• End of buffer (EOB) is obtained by replacing the N LSBs of ARx with the N LSBs of BK
• Index into the circular buffer: the N LSBs of ARx; the step is added to or subtracted from it

Circular Addressing Block Diagram
[Figure: circular addressing block diagram (N LSBs of ARx)]

Example
• Let AR3 = 1020h and BK = 40h. Determine the start & end address of the buffer. What will be the content of AR3 after LD *AR3+0%, A if AR0 = 0025h?
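The circular addressing rules can be sketched in C (hypothetical function names; BK > 0 and a step no larger than the buffer size are assumed). Applied to the example, BK = 40h gives N = 7 (2^7 = 128 > 64), so the buffer starts at 1000h and occupies 1000h to 103Fh (the EOB value from the rule above, 1040h, is the first address past the buffer). The load itself uses 1020h; the post-modification by AR0 = 25h moves the index from 20h to 45h, which wraps to 05h, leaving AR3 = 1005h.

```c
#include <stdint.h>

/* Smallest N with 2^N > R, where R = BK is the buffer size. */
static unsigned boundary_bits(uint16_t bk)
{
    unsigned n = 0;
    while ((1u << n) <= bk)
        n++;
    return n;
}

/* Effective base address (EFB): ARx with its N LSBs cleared. */
static uint16_t circular_base(uint16_t ar, uint16_t bk)
{
    unsigned mask = (1u << boundary_bits(bk)) - 1u;
    return (uint16_t)(ar & ~mask);
}

/* Post-modify ARx by +/- step inside a circular buffer of size BK.
   The index is the N LSBs of ARx; it wraps so that 0 <= index < BK.
   Assumes BK > 0 and |step| <= BK. */
static uint16_t circular_update(uint16_t ar, uint16_t bk, int step)
{
    unsigned mask = (1u << boundary_bits(bk)) - 1u;
    int index = (int)(ar & mask) + step;
    if (index >= (int)bk)
        index -= bk;                 /* stepped past the end: wrap to the start */
    else if (index < 0)
        index += bk;                 /* stepped before the start: wrap to the end */
    return (uint16_t)((ar & ~mask) | (unsigned)index);
}
```

A negative step wraps the other way: from 1002h with a step of -5 the index becomes -3 + 40h = 3Dh, giving 103Dh.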

Circular addressing
• Rules to be followed:
[Figure: rules for circular addressing]

Bit reversed addressing
• Enhances the execution speed of the FFT algorithm
• AR0 specifies one-half the size of the FFT, 2^(N-1), where N is an integer (2^N: size of FFT)
• To generate the next address, AR0 is added to the AR pointing to the data, using bit-reversed (reverse-carry) addition
– The carry propagates from left to right
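Reverse-carry addition can be sketched in C by reversing the bit order of both operands, adding normally and reversing back, so that the carry effectively moves from left to right. This is an illustration with hypothetical names, assuming 16-bit ARs and a buffer whose low log2(FFT size) address bits start at zero.

```c
#include <stdint.h>

/* Reverse the order of the 16 bits of v (bit 0 <-> bit 15, ...). */
static uint16_t reverse16(uint16_t v)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (v & 1u));
        v >>= 1;
    }
    return r;
}

/* Reverse-carry addition ARx + AR0: carries propagate from the MSB side
   toward the LSB side. Implemented by reversing both operands, adding
   normally, and reversing the (16-bit truncated) sum back. */
static uint16_t bitrev_add(uint16_t ar, uint16_t ar0)
{
    return reverse16((uint16_t)(reverse16(ar) + reverse16(ar0)));
}
```

For an 8-point FFT (AR0 = 4) starting at 0, repeated calls give 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order of 0 to 7. With a buffer at 2000h the upper address bits are left untouched, e.g. 2004h + 4 gives 2002h.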

Example
[Figure: bit-reversed addressing example]

MMR Addressing Mode
CPU MMRs (Table 10.3)

Memory Mapped Register addressing
• Modifies MMRs without affecting DP and SP
• In addition to registers, any scratch-pad RAM on data page 0 can be modified
• 2 modes
– Direct: forces the 9 MSBs of the data memory address to 0 (DP0)
– Indirect: uses the 7 LSBs of the current AR
If AR1 points to an MMR and contains FF25h, then AR1 points to the timer period register (PRD), whose address is 25h (the 7 LSBs of FF25h). After execution, AR1 = 25h.
• Example
– LDM MMR, dst (direct)
– STM #lk, *ARx (indirect)

Example 1: LDM AR4, A ; Direct
Before execution: A = 00 0000 1111, AR4 = FFFF
After execution: A = 00 0000 FFFF, AR4 = FFFF

Example 2: LDM 060h, B ; Direct
Before execution: B = 00 0000 0000, data memory 0060h = 1234
After execution: B = 00 0000 1234, data memory 0060h = 1234

Example 3: STM #0FFFFh, IMR ; Direct
Before execution: IMR = FF01
After execution: IMR = FFFF

Example 4: STM #8765h, *AR7+ ; Indirect
Before execution: AR0 = 0000, AR7 = 8010
After execution: AR0 = 8765, AR7 = 0011

Some MMR Addressing Mode Instructions
[Table: MMR addressing mode instructions]

Stack Addressing
• System stack: stores the PC during an interrupt / subroutine call
• Used to pass data values
• SP points to the last element stored onto the stack
• A push pre-decrements and a pop post-increments the stack address (SP)

[Figure: stack addressing example, PSHD X2]
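The push/pop convention can be sketched in C with a hypothetical 64-word array standing in for the stack area of data memory (the names and size are illustrative, and there is no overflow or underflow checking):

```c
#include <stdint.h>

#define STACK_WORDS 64   /* hypothetical size of the stack area */

/* Hypothetical model: mem[] stands in for data memory, sp for SP. */
typedef struct {
    uint16_t mem[STACK_WORDS];
    uint16_t sp;             /* points to the last element pushed */
} stack_model;

/* Push: pre-decrement SP, then write. No overflow checking. */
static void stack_push(stack_model *s, uint16_t value)
{
    s->sp--;
    s->mem[s->sp] = value;
}

/* Pop: read at SP, then post-increment SP. No underflow checking. */
static uint16_t stack_pop(stack_model *s)
{
    uint16_t value = s->mem[s->sp];
    s->sp++;
    return value;
}
```

Starting with SP = 64, two pushes leave SP = 62 with the most recent value at mem[62], and two pops return the values in reverse order and restore SP to 64, so the stack grows toward lower addresses.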

BUS Operations
• 16-bit read: D bus
• 16-bit write: E bus
• 32-bit read: C:D buses (MS word : LS word)
• 32-bit write: ?????
• For 32-bit data, the given address is that of the MS word
• Is the first word accessed at an even or an odd address??
– Even: the second word is at the next (higher) address
– Odd: the second word is at the previous (lower) address
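A one-line C sketch of the even/odd rule (hypothetical name), giving the address of the second word of a 32-bit operand:

```c
#include <stdint.h>

/* Second word of a 32-bit operand: even address -> next higher address,
   odd address -> previous lower address. Both cases amount to flipping
   the LSB of the address (addr ^ 1). */
static uint16_t second_word_address(uint16_t addr)
{
    return (uint16_t)((addr % 2 == 0) ? addr + 1 : addr - 1);
}
```

The two words of a 32-bit operand therefore always form an aligned even/odd pair, whichever of the two addresses is given.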

Instruction Set of C54x
• Arithmetic operations
• Logical operations
• Load and store operations
• Program control operations
• Special operations

Arithmetic operations
• Absolute value
• Addition
• Subtraction
• Multiplication

Assembler Directives
.mmregs ; Permits the memory-mapped registers to be referred to by name, such as AR0, SP, DP…
.text ; Start assembling into the program memory area
.data ; Start assembling into the data memory area
.end ; End of program
.equ ; Equate a value with a symbol (also .set)

Assembler Directives Cont…
.word ; Initialize one or more 16-bit integers (also .half, .int, .short)
.bes size_in_bits ; Reserve size bits in the current section; the label points to the last addressable word in the reserved space
.space n ; Reserve n bits in the current section; the label points to the first addressable word in the reserved space
.copy "filename" ; Include source statements from another file (also .include "filename")

Program structure
• Label
• Assembler directive
• Mnemonic field
• Operand list
• Comment (after ;)

Assembly Conventions
label: mnemonic operand,operand ;comment
– The colon after the label is optional
– Fields are separated by tabs or spaces
– The mnemonic field holds an instruction or a directive
• Any printable ASCII text is allowed (it is not case sensitive)
• Use the .asm extension for the file
• Instructions and directives cannot be in the first column
• Comments can start in any column after a semicolon
Example:
Count   .equ    4
Loop:   macp    *ar3-,coeff,a   ;coeff in pma
A simple addition program
• Write a program to add 2 numbers stored in locations 1000h and 1100h. Store the result in location 1250h.

[Figure: Addition]

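Program
        .mmregs                 ; refer to MMRs (AR2, AR3, ...) by name
        .text                   ; assemble into program memory

        stm     #1000h,ar2      ; AR2 points to the first operand
        stm     #1100h,ar3      ; AR3 points to the second operand
        stm     #1250h,ar4      ; AR4 points to the result location

        ld      #0h,a           ; clear accumulator A
        add     *ar2,*ar3,a     ; both operands are added shifted left by 16: result stored in AH
        sth     a,*ar4          ; store AH (high word of A) at 1250h

        .end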
