Advanced Processors: Overview of DSP Unit-5 Unit-6
Overview of DSP
Unit-5
Unit-6
DSP Processors
DSP workload:
• Filtering, correlation, FFT
• Heavy data flow through CPU
• Real time operations
Processor features:
• Harvard Architecture
• Pipelining
• Parallelism
• Fast dedicated hardware MAC
• Special instructions
• Replication
• On-chip memory and cache
• Extended parallelism: SIMD, VLIW, Superscalar
Simplified Architecture of Standard Microprocessor
[Figure: block diagram with Data bus-A and Data bus-B and four 32-bit results]
Very Long Instruction Word (VLIW)
[Figure: VLIW datapath with one set of L, S, M and D functional units on Register file A and one set on Register file B]
DSP Processors
Fixed point:
• Represent each number with a minimum of 16 bits
• 2^16 = 65,536 possible bit patterns can represent a number
• Unsigned integer: 0 to 65,535
• Signed integer: -32,768 to 32,767
• Unsigned fraction: spread uniformly between 0 and 1
• Signed fraction: spread uniformly between -1 and 1
Floating point:
• Represent each number with a minimum of 32 bits
• 2^32 = 4,294,967,296 possible bit patterns can represent a number
• Represented numbers are not uniformly spaced
• ANSI/IEEE Std. 754-1985: the largest and smallest numbers are ±3.4×10^38 and ±1.2×10^-38, respectively
• The represented values are unequally spaced between these two extremes, such that the gap between any two adjacent numbers is about ten million times smaller than the value of the numbers
• This is important because it places large gaps between large numbers, but small gaps between small numbers
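The spacing claims above can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `to_q15`, `from_q15` and `float32_gap` are made up for illustration, and the signed 16-bit fraction is assumed to use the common Q15 convention (15 fractional bits, saturating at the ends).

```python
import struct

def to_q15(x):
    """Quantize x to a signed 16-bit fraction (assumed Q15), saturating at the ends."""
    n = int(round(x * (1 << 15)))
    return max(-(1 << 15), min((1 << 15) - 1, n))

def from_q15(n):
    """Convert a Q15 integer back to a real value; the step is always 2**-15."""
    return n / (1 << 15)

def float32_gap(x):
    """Distance from the positive float32 value x to the next larger float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    cur = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - cur

if __name__ == "__main__":
    # Fixed point: uniform spacing everywhere in the range.
    print(from_q15(to_q15(0.5)), from_q15(1) - from_q15(0))
    # Floating point: the gap grows with the magnitude of the number.
    for value in (1.0, 1.0e6):
        print(value, float32_gap(value), value / float32_gap(value))
```

For 32-bit floats the ratio of value to gap stays between roughly 8 million and 17 million, which is where the "about ten million times smaller" figure comes from.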
Fixed point digital signal processors
First Generation
• TMS320C1X by TI in 1982
• Dedicated AU with multiplier and accumulator
• Harvard architecture with separate program and data memory
• On-chip memory and special instructions for execution of basic DSP algorithms
Second Generation
• TMS320C5X from TI, DSP5600X from Motorola, ADSP21XX from Analog Devices, DSP16XX from Lucent Technologies
• Enhanced features compared with the first generation
• Larger on-chip memory and more special instructions to execute DSP algorithms
• MAC with Repeat
Third Generation
• TMS320C54xx, DSP563X and DSP16000
• Aimed at digital communication and digital audio
• Special instructions for adaptive filtering, which included echo cancellation, adaptive equalization and Viterbi decoding
• Low power, with a power management facility
Fourth Generation
• TMS320C62XX
• VLIW
• Included extensive parallelism while maintaining the features of earlier versions
• Wider instructions, wider data paths and more registers, larger instruction cache and multiple AUs
Floating point DSP processors
FIR structure
Hardware Architecture for FIR filter
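As a rough illustration of why a fast multiply-accumulate (MAC) unit matters for FIR filtering, here is a minimal sketch of the direct-form sum y[n] = Σ h[k]·x[n-k]. The function name `fir_filter` is hypothetical, samples before the start of the input are assumed to be zero, and this is plain software, not a model of any particular hardware architecture.

```python
def fir_filter(x, h):
    """Direct-form FIR: each output sample is a sum of products h[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0  # plays the role of the accumulator
        for k in range(len(h)):
            if n - k >= 0:  # assume zero input before the first sample
                acc += h[k] * x[n - k]  # one multiply-accumulate per tap
        y.append(acc)
    return y

if __name__ == "__main__":
    # An impulse input reproduces the coefficients.
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```

Each output costs one MAC per tap, which is why a single-cycle hardware MAC with a repeat facility speeds this loop up so much.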
Special Purpose Digital Signal Processor
Hardware digital filters : IIR
IIR Structure
Type of operation
• Multiply operations: 16x16 multiply operations
• Shift operations: 32/40 bit shift operations
Data lines:
• src1 and src2
• 32 bits (all units)
• 40 bits (.L, .S)
Interrupt Clear Reg. (ICR): Used to manually clear pending maskable interrupts
Interrupt Enable Reg. (IER): Used to enable/disable the individual maskable interrupts
Interrupt Service Table Reg. (ISTP): Points to the beginning of the interrupt service table
Interrupt Return Pointer (IRP): Contains the address to be used to return from a maskable interrupt
Non-maskable Interrupt Return Pointer (NRP): Contains the address to be used to return from a non-maskable interrupt
Address Mode Register AMR
Bit fields: 31-26, 25-21, 20-16, 15-0
Bits 15-0 (high to low): B7 mode, B6 mode, B5 mode, B4 mode, A7 mode, A6 mode, A5 mode, A4 mode
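The AMR layout can be decoded with a few shifts and masks. This is a minimal sketch, and several things in it do not appear on the slide. The function name `decode_amr` is made up, each mode field is assumed to be 2 bits wide (eight fields in bits 15-0), and the names BK0 (bits 20-16), BK1 (bits 25-21) and the meaning of the mode values follow my understanding of TI's C6000 documentation.

```python
# Registers in order of increasing bit position: A4 occupies bits 1-0, B7 bits 15-14.
REGISTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]

# Assumed meaning of each 2-bit mode value (not stated on the slide).
MODE_NAMES = {0: "linear", 1: "circular using BK0", 2: "circular using BK1", 3: "reserved"}

def decode_amr(amr):
    """Split a 32-bit AMR value into per-register modes and the two block-size fields."""
    modes = {reg: MODE_NAMES[(amr >> (2 * i)) & 0x3] for i, reg in enumerate(REGISTERS)}
    bk0 = (amr >> 16) & 0x1F  # bits 20-16
    bk1 = (amr >> 21) & 0x1F  # bits 25-21
    return {"modes": modes, "bk0": bk0, "bk1": bk1}

if __name__ == "__main__":
    # A4 circular on BK0, B4 circular on BK1, BK0 = 5, BK1 = 7.
    value = (0b01 << 0) | (0b10 << 8) | (5 << 16) | (7 << 21)
    print(decode_amr(value))
```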
Load, store
MVKH (.S1 or .S2): Move the upper 16 bits of a 32-bit constant into the upper 16 bits of a register
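The effect of MVKH on a register can be sketched with bit masks. This is an illustrative model only: the function name `mvkh` is hypothetical, and it assumes that the lower 16 bits of the destination register are left unchanged, which the slide does not state explicitly.

```python
def mvkh(reg, const32):
    """Model of MVKH: take the upper 16 bits of const32 and place them in the
    upper 16 bits of reg. The lower 16 bits of reg are assumed to be preserved."""
    upper = (const32 >> 16) & 0xFFFF
    return (upper << 16) | (reg & 0xFFFF)

if __name__ == "__main__":
    # A register that already holds the lower half of the constant.
    print(hex(mvkh(0x00005678, 0x12345678)))
```

Under that assumption, a full 32-bit constant is built in two steps: the lower half is loaded first, then MVKH fills in the upper half.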
Flow of Execution
• Conditional operations:
– All instructions can be conditional
– Registers A1, A2, B0, B1 and B2 can be tested for conditional operation (the test is whether the value is zero or non-zero)
– The condition specified in the register is tested at the beginning of the E1 execute phase
• Parallel operation:
– 8 instructions are fetched to form a fetch packet
– Execution of these instructions is controlled by scanning the p-bits from left to right
– p = 1 for the i-th instruction: the (i+1)-th instruction is executed in parallel with the i-th instruction
– p = 0 for the i-th instruction: the (i+1)-th instruction is executed in the machine cycle after the i-th instruction
Flow of Execution
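– Fully serial: all p-bits are 0; the 8 instructions need 8 machine cycles to execute
– Fully parallel: p-bits are 1; the 8 instructions need 1 machine cycle
– Partially serial: some p-bits are 1 and some are 0; the 8 instructions need more than 1 but fewer than 8 machine cycles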
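The p-bit rule can be sketched as a small function that splits a fetch packet into groups of instructions that run in the same cycle. This is a minimal sketch: the name `split_fetch_packet` and the instruction labels A to H are made up for illustration, and the p-bit of the last instruction is assumed to be ignored, because there is no following instruction in the same fetch packet.

```python
def split_fetch_packet(instructions, p_bits):
    """Group instructions into execute packets by scanning p-bits left to right.

    p_bits[i] == 1 means instruction i+1 runs in parallel with instruction i;
    p_bits[i] == 0 means instruction i+1 starts in the next machine cycle.
    """
    packets = [[]]
    for i, (instruction, p) in enumerate(zip(instructions, p_bits)):
        packets[-1].append(instruction)
        is_last = i == len(instructions) - 1
        if p == 0 and not is_last:  # assumption: the last p-bit is ignored
            packets.append([])
    return packets

if __name__ == "__main__":
    names = list("ABCDEFGH")
    print(len(split_fetch_packet(names, [0] * 8)))  # fully serial
    print(len(split_fetch_packet(names, [1] * 8)))  # fully parallel
    print(split_fetch_packet(names, [0, 0, 1, 1, 0, 1, 1, 0]))  # partially serial
```

The number of groups returned is the number of machine cycles needed to issue the fetch packet.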
Flow of Execution
In summary
Pipelining
Fetch Operation
Execution depends on whether the fetch packet is fully serial, fully parallel or partially serial
Decode Operation
• DP: Instruction dispatch
– Fetch packets are split into execute packets
– An execute packet consists of one instruction, or two to eight parallel instructions
– Instructions are assigned to the appropriate functional units
• DC: Instruction decode
– Source registers, destination registers and associated paths are decoded
Execute Operation
[Figure: execute stage phases E1 to E5, and E1 to E10]
• L2 controller facilitates:
– CPU access to EMIF
– CPU access to peripherals
[Figure label: External Memory]