
Advanced Processors

Overview of DSP
Unit-5
Unit-6
DSP Processors

DSP processors are specialized microprocessors with an architecture
optimized for the fast operational needs of digital signal processing.
Need for DSP Architecture

• Filtering, correlation, FFT
• Heavy data flow through CPU
• Real time operations

Parallelism
•Harvard Architecture
•Pipelining
•Fast dedicated hardware MAC
•Special Instruction
•Replication
•On-chip memory and cache
•Extended Parallelism: SIMD, VLIW, Superscalar
Simplified Architecture of Standard Microprocessor

Von Neumann Architecture


Independency between the operations
Limitations on the increase in speed
Hardware Architecture for Signal Processing

Multiple Bus Structure


•Separate data and program memory
•Data memory
•Coefficients, input data, output samples, intermediate data
Non-Pipelining Architecture
Pipeline Architecture
Pipelining Concept
Pipeline MAC Operation
MAC Configuration
Special Instructions

Special Instruction: MAC


Repeat: RPT
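
The pairing of MAC with a repeat instruction can be pictured as a sum of products in which each repeat iteration performs one multiply-accumulate. The sketch below only illustrates the idea; the function name `mac_repeat` is hypothetical and the loop stands in for the hardware repeat counter.

```python
def mac_repeat(coeffs, samples):
    """Sum of products: one multiply-accumulate per repeat iteration."""
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    acc = 0  # accumulator register
    for c, x in zip(coeffs, samples):  # plays the role of RPT
        acc += c * x                   # plays the role of MAC
    return acc


if __name__ == "__main__":
    # Three-tap example: 1*4 + 2*5 + 3*6
    print(mac_repeat([1, 2, 3], [4, 5, 6]))
```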
Single Instruction Multiple Data (SIMD) Processing

Data bus-A

Data bus-B

ALU MAC Shifter ALU MAC Shifter

Execution Unit A Execution Unit B


SIMD Processing

16 bit 16 bit 16 bit 16 bit 16 bit 16 bit 16 bit 16 bit

16x16 16x16 16x16 16x16


MAC MAC MAC MAC

32- bit result 32- bit result 32- bit result 32- bit result
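
The figure's arrangement (eight 16-bit operands, four 16x16 multipliers, four 32-bit results) can be sketched as follows. This is an illustration only: packing four signed 16-bit lanes into one 64-bit word, with lane 0 in the lowest bits, is an assumption, and the helper names are hypothetical.

```python
def unpack_lanes(word, lanes=4):
    """Split a packed word into signed 16-bit lanes (lane 0 = lowest bits)."""
    out = []
    for i in range(lanes):
        raw = (word >> (16 * i)) & 0xFFFF
        out.append(raw - 0x10000 if raw & 0x8000 else raw)  # two's complement
    return out


def simd_mul16(word_a, word_b):
    """One 'instruction': four independent 16x16 multiplies, 32-bit results."""
    return [a * b for a, b in zip(unpack_lanes(word_a), unpack_lanes(word_b))]


if __name__ == "__main__":
    a = 0x0004000300020001  # lanes 1, 2, 3, 4
    b = 0xFFFF000500050005  # lanes 5, 5, 5, -1
    print(simd_mul16(a, b))
```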
Very Long Instruction Word (VLIW)

Instruction fetch packet
•Eight 32 bit instructions
•Always 256 bits wide

Execution packet
•Dispatches instructions into appropriate execution units
•Varies from one to eight instructions (32 bits to 256 bits)

Block diagram (top to bottom):
Internal Program Memory (8x32-bits to fetch)
Instruction fetch, decode and dispatch (nx32-bits to functional units)
L1 S1 M1 D1 (Register file A) and L2 S2 M2 D2 (Register file B)
Two data paths, 32-bits each
Internal data RAM

Superscalar Processors

• Uses instruction level parallelism
• Developed to execute multiple instructions in one cycle
• Achieved through multiple execution units
• Extensive use of pipelining
• Instruction width is not fixed
• An instruction can be issued to execute in parallel, as in SIMD
• Uses a load/store architecture suited to taking two inputs and computing an output
Fixed point and Floating point representation

16-bit signed fractional point, often indicated as Q1.15

IEEE 754 normalized representation of a single precision floating point number.
General purpose DSP architecture

DSP Processors

Fixed point processors
•Represent each number with a minimum of 16 bits
•2^16 = 65,536 possible bit patterns can represent a number
•Unsigned integer: 0 to 65,535
•Signed integer: -32,768 to 32,767
•Unsigned fraction: spread uniformly between 0 and 1
•Signed fraction: spread uniformly between -1 and 1

Floating point processors
•Represent each number with a minimum of 32 bits
•2^32 = 4,294,967,296 possible bit patterns can represent a number
•Represented numbers are not uniformly spaced
•ANSI/IEEE Std. 754-1985: the largest and smallest numbers are ±3.4×10^38 and ±1.2×10^-38, respectively
•The represented values are unequally spaced between these two extremes, such that the gap between any two adjacent numbers is about ten-million times smaller than the value of the numbers
•This is important because it places large gaps between large numbers, but small gaps between small numbers
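
The signed fraction format (Q1.15) maps the 16-bit integer range -32,768 to 32,767 onto fractions from -1 up to just under 1. A minimal sketch of the conversion, assuming rounding to nearest and saturation at the range limits (both are illustrative choices, and the function names are hypothetical):

```python
Q15_SCALE = 32768  # 2**15 steps per unit in Q1.15


def float_to_q15(x):
    """Convert a real value to a Q1.15 integer, saturating at the limits."""
    scaled = int(round(x * Q15_SCALE))
    return max(-32768, min(32767, scaled))


def q15_to_float(q):
    """Convert a Q1.15 integer back to a real value."""
    return q / Q15_SCALE


if __name__ == "__main__":
    for value in (0.5, -0.25, 1.0, -1.0):
        q = float_to_q15(value)
        print(value, q, q15_to_float(q))
```

Values are uniformly spaced by 1/32768, which contrasts with the value-dependent spacing of floating point numbers.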
Fixed point digital signal processors

First Generation
•TMS320C1X by TI in 1982
•Dedicated AU with multiplier and accumulator
•Harvard architecture with separate program and data memory
•On-chip memory and special instructions for execution of basic DSP algorithms

Second Generation
•TMS320C5X from TI, DSP5600X from Motorola, ADSP21XX from Analog Devices, DSP16XX from Lucent Technologies
•Enhanced features compared with the first generation
•Larger on-chip memory and more special instructions to execute DSP algorithms
•MAC with Repeat

Third Generation
•TMS320C54xx, DSP563X and DSP16000
•Aimed at digital communication and digital audio
•Special instructions for adaptive filtering, which included echo cancellation and adaptive equalization, and for Viterbi decoding
•Low power, with a power management facility

Fourth Generation
•TMS320C62XX
•VLIW
•Included extensive parallelism while maintaining the features of earlier versions
•Wider instructions, wider data paths and more registers, larger instruction cache and multiple AU
Floating point DSP processors

First Generation
•TMS320C3X TI
•Larger memory and many on-chip peripheral facilities
•Program cache and on-chip dual access memories
•Graphics and image processing
•Supported three floating point formats

Second Generation
•TMS320C4X, ADSP-2106x SHARC
•Emphasis on multiprocessing and multiprocessor support

Third Generation
•TMS320C67xx, ADSP-TS001
•VLIW
Special purpose Digital Signal Processor
Hardware digital filters : FIR

FIR structure

Hardware architecture for FIR filter
Special Purpose Digital Signal Processor
Hardware digital filters : IIR

IIR Structure

Hardware architecture for IIR filter


Special purpose Digital Signal Processor
Hardware FFT Processors

Concept of hardware butterfly processor

Simplified architecture of hardware FFT processor
Special purpose Digital Signal Processor
Hardware FFT Processors

Double buffering in real-time FFT

FFT performed on N point data in buffer A while buffer B is being filled
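
The ping-pong scheme can be sketched as below. This is a sequential simulation of what the hardware does concurrently; `process` stands in for the N-point FFT, and all names are hypothetical.

```python
def double_buffer(samples, n, process):
    """Fill one buffer while the other, already full, is handed to `process`.

    Returns a list of (buffer_name, result) pairs, one per completed block.
    A trailing partial block is left unprocessed.
    """
    buffers = {"A": [], "B": []}
    filling = "A"
    results = []
    for s in samples:
        buffers[filling].append(s)
        if len(buffers[filling]) == n:
            full = filling
            filling = "B" if full == "A" else "A"  # swap roles
            buffers[filling] = []                  # start collecting next block
            results.append((full, process(buffers[full])))
    return results


if __name__ == "__main__":
    # `sum` is only a placeholder for the per-block FFT.
    print(double_buffer(range(1, 9), 4, sum))
```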


Architecture of TMS320C67XX
Valid Register Pairs
Type of operation | .L unit | .S unit | .M unit | .D unit
Arithmetic operations | 32/40 bit operations | 32 bit operations | - | 32 bit add and subtract operations only
Logical operations | 32-bit operations | 32-bit operations | - | 32-bit logical operations*
Multiply operations | - | - | 16x16 multiply operations | -
Shift operations | - | 32/40 bit shift operations | - | -
Compare operations | 32/40 bit operations | - | - | -
Branch operations | - | Yes | - | -
Load and store operations | - | - | - | Loads and stores with 5-bit constant offset (15 bit constant offset in .D2 only)
Linear and circular address calculation | - | - | - | Yes
Constant generation | - | Yes | - | -
Count operations | 32/40 bit count operations | - | - | -
Move operations | Register to register only | 16 bit move operations | - | Register to register only
TMS320C67XX CPU data paths

Data lines:
•src1 and src2
•32bits (All)
•40bits (.L, .S)

Register File Cross Paths:
•Functional units can read and write operands in their own register files
•.L1, .S1, .M1, .L2, .S2, .M2 have access to opposite side registers through cross paths

Memory Load and Store Paths:
•LD1 and LD2 (LDDW)
•ST1 and ST2

Data Address Paths:
•DA1 and DA2 allow a data address generated by either path to access data to or from any register
Control Registers (accessed by .S2 alone using MVC)
Register Name | Abbreviation | Description
Addressing Mode Register | AMR | Specifies linear or circular addressing of A4-A7 and B4-B7
Control Status Register | CSR | Contains important control and status bits of the processor
Program Counter E1 Phase Register | PCE1 | Contains the address of the fetch packet that is in the E1 phase of the pipeline
Interrupt Flag Register | IFR | Contains the status of the INT4-INT15 and NMI interrupts
Interrupt Set Register | ISR | Used to manually set maskable pending interrupts
Interrupt Clear Register | ICR | Used to manually clear maskable pending interrupts
Interrupt Enable Register | IER | Used to enable/disable the individual maskable interrupts
Interrupt Service Table Pointer | ISTP | Points to beginning of interrupt service table
Interrupt Return Pointer | IRP | Contains the address to be used to return from a maskable interrupt
Non-maskable Interrupt Return Pointer | NRP | Contains the address to be used to return from a non-maskable interrupt
Address Mode Register AMR

Bits 31-26 | Bits 25-21 | Bits 20-16
Reserved | BK1 | BK0

Bits 15-0 (two mode bits per register, from B7 in bits 15-14 down to A4 in bits 1-0):
B7 mode | B6 mode | B5 mode | B4 mode | A7 mode | A6 mode | A5 mode | A4 mode

Mode Select | Description of mode
0 0 | Linear modification of address
0 1 | Circular addressing using BK0
1 0 | Circular addressing using BK1
1 1 | Reserved
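
As a sketch of how these fields are read, the function below decodes a 32-bit AMR value. It assumes two mode bits per register with A4 in bits 1-0 up to B7 in bits 15-14, and the two block-size fields in bits 20-16 (BK0) and 25-21 (BK1); the function and constant names are hypothetical.

```python
MODE_NAMES = {
    0b00: "linear",
    0b01: "circular using BK0",
    0b10: "circular using BK1",
    0b11: "reserved",
}
# Register order from the least significant mode field upward.
REGISTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]


def decode_amr(amr):
    """Return (modes, bk0, bk1) for a 32-bit AMR value."""
    modes = {
        reg: MODE_NAMES[(amr >> (2 * i)) & 0b11]
        for i, reg in enumerate(REGISTERS)
    }
    bk0 = (amr >> 16) & 0x1F  # 5-bit block size field
    bk1 = (amr >> 21) & 0x1F  # 5-bit block size field
    return modes, bk0, bk1


if __name__ == "__main__":
    # A4 circular with BK0, BK0 field = 3, everything else linear.
    print(decode_amr(0x00030001))
```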
Unit 5
Introduction to Computer Architecture: R5-12.1, R5-12.2
General purpose Digital Signal Processors: R5-12.3
Selecting digital signal processors: R5-12.4
Special purpose DSP Hardware: R5-12.6
Architecture of TMS320C67X: Reference Guide TMS320C67XX / T2-13.2
Features of C67X processors: Reference Guide TMS320C67XX / T2-13.2
CPU: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.4
General purpose register files: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.5
Functional units and operation: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.6
Data paths: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.7
Control register file: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-13.8
Functional Units

Name of unit | Type of operations
.L | Arithmetic, Logical, Compare, Other
.S | Arithmetic, Logical, Shift, Branch, Move, Other
.M | Multiply
.D | Arithmetic, Load store, Other
Addressing Modes
• Register Addressing mode:
– mnemonic .unit src1, src2, dst
– Mnemonic used could be ADD, SUB, MPY etc.
• ADD .L1 A1, A2, A3
• ADD .S2 B1, B2, B2
• ADD .L1X A1, B2, A2

• Linear Addressing mode: Uses .D1 and .D2

• Circular Addressing mode
Addressing Modes
• Linear Addressing mode: Uses .D1 and .D2
– mnemonic .unit mode field, dst

Mode field (load, store) | Description
*+baseR[offsetR/ucst5] | Positive offset from baseR specified by offsetR/ucst5
*-baseR[offsetR/ucst5] | Negative offset from baseR specified by offsetR/ucst5
*++baseR[offsetR/ucst5] | Pre-increment of baseR by the amount specified by offsetR/ucst5
*--baseR[offsetR/ucst5] | Pre-decrement of baseR by the amount specified by offsetR/ucst5
*baseR++[offsetR/ucst5] | Post-increment of baseR by the amount specified by offsetR/ucst5
*baseR--[offsetR/ucst5] | Post-decrement of baseR by the amount specified by offsetR/ucst5
Addressing Modes
• Linear Addressing mode: Uses .D1 and .D2
– mnemonic .unit mode field, dst

LDW .D1 *A0[1], A1


Loads into register A1 the contents of the memory location addressed by
A0 plus the offset (1 shifted left twice, i.e. multiplied by 4).
(The offset is shifted left by 3, 2, 1 or 0 for double word, word, half word and byte accesses respectively.)

LDW .D1 *++A0[A4], A1

LDW .D1 *A0++[2], A1
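
The offset scaling and the pre/post update rules can be checked with a small model. This is a sketch under the scaling stated above (shift by 3, 2, 1, 0 for double word, word, half word, byte); the function name and the mode strings are hypothetical labels, not assembler syntax.

```python
# Left shift applied to the offset, by access size.
SHIFT = {"LDB": 0, "LDH": 1, "LDW": 2, "LDDW": 3}


def effective_address(mnemonic, base, offset, mode):
    """Return (address used for the access, value left in the base register)."""
    step = offset << SHIFT[mnemonic]
    if mode == "*+":    # positive offset, base unchanged
        return base + step, base
    if mode == "*-":    # negative offset, base unchanged
        return base - step, base
    if mode == "*++":   # pre-increment: update base, then access
        return base + step, base + step
    if mode == "*--":   # pre-decrement
        return base - step, base - step
    if mode == "*R++":  # post-increment: access, then update base
        return base, base + step
    if mode == "*R--":  # post-decrement
        return base, base - step
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    # LDW with offset 1 from base 0x100: word access, so 1 << 2 = 4 bytes.
    print([hex(v) for v in effective_address("LDW", 0x100, 1, "*+")])
    # Post-increment by 2 words.
    print([hex(v) for v in effective_address("LDW", 0x100, 2, "*R++")])
```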


Addressing Modes

• Circular Addressing mode:
– Uses .D1 and .D2
– A4-A7 and B4-B7 are used
– Address mode register is used to select modes for A4/B4 to A7/B7
– mnemonic .unit mode field, dst
Circular Buffering
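
A minimal model of a circular address update follows. It assumes the block size is 2^(N+1) bytes, where N is the value of the selected BK field, and that only the low N+1 address bits wrap while the upper bits stay fixed; treat both as assumptions to confirm against the TI reference guide. The function name is hypothetical.

```python
def circular_add(addr, step, bk):
    """Advance addr by step bytes inside an aligned block of 2**(bk+1) bytes."""
    size = 1 << (bk + 1)
    mask = size - 1
    # Upper bits are preserved; only the low bits wrap around.
    return (addr & ~mask) | ((addr + step) & mask)


if __name__ == "__main__":
    # 16-byte buffer at 0x1000 (bk = 3): stepping past the end wraps to the start.
    addr = 0x1000
    for _ in range(5):
        print(hex(addr))
        addr = circular_add(addr, 4, 3)
```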
Fixed Point Instructions
Instruction | Functional Unit | Description
MV | .L1 or .L2, .S1 or .S2, .D1 or .D2 | Move value from one register to another
MVC | .S2 only | Move value between control register and register file
MVK | .S1 or .S2 | Move 16-bit constant into lower 16 bits of a register and sign-extend it
MVKLH | .S1 or .S2 | Move 16-bit constant into upper 16 bits of a register
MVKH | .S1 or .S2 | Move upper 16 bits of a 32-bit constant into upper 16 bits of a register
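
A common use of these instructions is loading a full 32-bit constant in two steps, MVK followed by MVKH. The model below sketches the register contents after each step; it assumes MVKH leaves the lower 16 bits of the register unchanged, and the function names are hypothetical.

```python
def mvk(const):
    """Lower 16 bits of const, sign-extended to a 32-bit register value."""
    low = const & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


def mvkh(reg, const32):
    """Replace the upper 16 bits of reg with the upper 16 bits of const32."""
    return (const32 & 0xFFFF0000) | (reg & 0x0000FFFF)


if __name__ == "__main__":
    value = 0x12348765
    reg = mvk(value)        # sign extension fills the upper half
    print(hex(reg))
    reg = mvkh(reg, value)  # upper half corrected
    print(hex(reg))
```

The example constant has bit 15 set, so MVK alone leaves the upper half as all ones; MVKH is what restores the intended upper 16 bits.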
Flow of Execution
•Conditional Operations:
•All instructions can be conditional
•A1, A2, B0, B1, B2 are tested for conditional operation (the value can be tested for zero or non-zero)
•The condition specified in the register is tested at the beginning of the E1 execution phase
•Parallel Operation:
•8 instructions are fetched to form a fetch packet
•Execution of these instructions is controlled by scanning the p-bits from left to right
•p = 1 for the ith instruction: the (i+1)th instruction is executed in parallel with the ith instruction
•p = 0 for the ith instruction: the (i+1)th instruction is executed in the machine cycle after the ith instruction
Flow of Execution
– Fully serial: all p-bits are 0; needs 8 machine cycles to execute
– Fully parallel: all p-bits are 1; needs 1 machine cycle
– Partially serial:
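
The p-bit rule can be sketched as a grouping pass over the fetch packet: an instruction whose p-bit is 1 is chained to the next one, and a p-bit of 0 closes the current group. The function name is hypothetical, and instructions are identified only by their position in the packet.

```python
def execute_packets(p_bits):
    """Group instruction indices into execute packets from their p-bits."""
    packets, current = [], []
    for i, p in enumerate(p_bits):
        current.append(i)
        if p == 0:            # next instruction starts a new cycle
            packets.append(current)
            current = []
    if current:               # trailing group if the last p-bit was 1
        packets.append(current)
    return packets


if __name__ == "__main__":
    print(len(execute_packets([0] * 8)))              # fully serial
    print(execute_packets([1] * 7 + [0]))             # fully parallel
    print(execute_packets([1, 1, 0, 0, 1, 1, 1, 0]))  # partially serial
```

The number of groups returned is the number of machine cycles needed to issue the fetch packet.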
Flow of Execution

In summary
Pipelining

Fetch Operation

Phase | Symbol | Description
Program address generate | PG | Memory address of the 8 instructions of the fetch packet is generated
Program address send | PS | Address is sent to memory
Program access ready wait | PW | Memory read operation
Program fetch packet receive | PR | 8 instructions are received in the CPU

Execution will depend on fully serial, fully parallel or partially serial type
Pipelining
Decode Operation
•DP - Instruction dispatch
•Fetch packets are split into execute packets
•An execute packet consists of one instruction or two to eight parallel instructions
•Instructions are assigned to appropriate functional units

•DC - Instruction decode
•Source registers, destination registers and associated paths are decoded
Pipelining
Execute Operation

E1 E2 E3 E4 E5

Fixed point processor

E1 E2 E3 E4 E5 E6 E7 E8 E9 E10

Floating point processor


Internal Memory
•Cache-based internal memory architecture
•2-level memory architecture
•L1P, L1D
– 4K size
– Not included in memory map
– Always enabled
•L2: 64K size, shared for both program and data memory
•L1P and L1D are accessed first; if a miss occurs then L2 is accessed
•L2 controller facilitates
– CPU access to EMIF
– CPU access to peripherals
External Memory

• If an L2 miss occurs then external memory is accessed
• Memory Attribute Register is used to enable the external memory
On-chip Peripherals
Multichannel Buffered Serial Port
Features:
• Provides full-duplex communication
• Data size selection of 8, 12, 16, 20, 24 and 32 bits
• Independent framing and clocking for receive and transmit
• External shift clock or internal programmable clock for data transfer
• 8-bit data transfer with an option of LSB or MSB first
• Programmable polarity for both frame synchronization and data clocks
• Double buffered registers, which allow continuous data transmission
• Auto buffering capability through 5-channel DMA controller
• µ-law and A-law companding
• Direct interface to industry standard codecs, A/D and D/A converters etc.
Multichannel Buffered Serial Port
Features of TMS320C6X Processor
• Advanced VLIW CPU with eight functional units, including two multipliers and six ALUs
• Executes up to eight instructions per cycle and allows RISC-like code to be developed
• Instruction packing reduces code size, program fetches and power consumption
• Efficient code execution on independent functional units
• Supports 8/16/32-bit data formats
• Bit-field manipulation: extract, set, clear and bit counting instructions
• Supports single precision (32-bit) and double precision (64-bit) IEEE floating point operations, and 32x32 bit integer multiplication with 32 or 64-bit results
Unit 6
TMS320C67X Functional units: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.1
Internal memory: T2-15.4
External memory: T2-15.5
On-chip peripherals: T2-15.6
Interrupts: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.8
Instruction set and addressing modes: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.1, T2-14.2
Fixed point instructions: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.3
Floating point instructions: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.6
Conditional operations: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.4
Parallel operations: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.5
Pipeline operations: TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide / T2-14.7
Thank you!
Functional Units and Operations Performed
On-chip Peripherals
