
Overview

Why multiprocessors? The structure of multiprocessors. Elements of multiprocessors:


Processing elements. Memory. Interconnect.

© 2004 Wayne Wolf, Overheads for Computers as Components, 2nd edition.

Why multiprocessing?
True parallelism:
Task level. Data level.

May be necessary to meet real-time requirements.


Multiprocessing and real time


Faster rate processes are isolated on processors.
Specialized memory system as well.

Slower rate processes are shared on a processor (or processor pool).

[Figure: one CPU and memory are dedicated to the print engine; another CPU and memory handle file reads, rendering, etc.]

Heterogeneous multiprocessors
Embedded multiprocessors often have a heterogeneous structure:
Different types of PEs. Specialized memory structure. Specialized interconnect.


Multiprocessor system-on-chip
Multiple processors.
CPUs, DSPs, etc. Hardwired blocks. Mixed-signal.

Custom memory system. Lots of software.


System-on-chip applications
Sophisticated markets:
High volume. Demanding performance and power requirements. Strict price restrictions.

Often standards-driven. Examples:


Communications. Multimedia. Networking.

Terminology
PE: processing element.
Interconnection network: may require more than one clock cycle to transfer data.
Message: address + data packet.


Generic multiprocessor
Shared memory: the PEs access shared memory blocks through an interconnect network.

Message passing: each PE has its own local memory, and the PEs exchange messages over an interconnect network.


Shared memory vs. message passing


Shared memory and message passing are functionally equivalent. Different programming models:
Shared memory more like uniprocessor. Message passing good for streaming.

May have different implementation costs:


Interconnection network.

Shared memory implementation


Memory blocks are mapped into the address space. The memory interface sends messages through the network to the addressed memory block (see the sketch below).

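A minimal sketch of this idea in C, assuming a shared block mapped at a hypothetical address SHARED_BASE (real platforms define the mapping in the memory map or linker script):

#include <stdint.h>

#define SHARED_BASE 0x20000000u                  /* hypothetical mapped address */
#define SHARED ((volatile uint32_t *)SHARED_BASE)

void store_to_shared(uint32_t value)
{
    /* An ordinary store; the memory interface turns it into a message
       routed through the interconnect to the addressed memory block. */
    SHARED[0] = value;
}

uint32_t load_from_shared(void)
{
    /* An ordinary load travels the same path in the other direction. */
    return SHARED[0];
}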

Message passing implementation


The program provides the destination processor address and the data/parameters, usually through an API (see the sketch below).

The packet interface appears to the program as an I/O device.


Packet routed through network to interface.

Recipient must decode parameters to determine how to handle the message.


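A minimal sketch of such an API in C. The names mp_packet_t, mp_send, and mp_recv are hypothetical, and the network is stood in for by a small in-memory queue so the sketch is self-contained:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint16_t dest_pe;                        /* destination processor address */
    uint16_t tag;                            /* lets the recipient decode the message */
    uint32_t len;
    uint8_t  payload[32];
} mp_packet_t;

#define QLEN 8
static mp_packet_t queue[QLEN];              /* stand-in for the network/interface */
static int head, tail;

static int mp_send(const mp_packet_t *pkt)   /* would write to the packet interface */
{
    if ((tail + 1) % QLEN == head) return -1;  /* interface queue full */
    queue[tail] = *pkt;
    tail = (tail + 1) % QLEN;
    return 0;
}

static int mp_recv(mp_packet_t *pkt)         /* would read from the packet interface */
{
    if (head == tail) return -1;             /* nothing pending */
    *pkt = queue[head];
    head = (head + 1) % QLEN;
    return 0;
}

int main(void)
{
    mp_packet_t out = { .dest_pe = 1, .tag = 2, .len = 5 };
    memcpy(out.payload, "frame", 5);
    mp_send(&out);                           /* sender supplies address + data */

    mp_packet_t in;
    while (mp_recv(&in) == 0) {
        switch (in.tag) {                    /* recipient decodes the parameters */
        case 2:
            printf("PE %u: %u-byte data message\n",
                   (unsigned)in.dest_pe, (unsigned)in.len);
            break;
        default:
            printf("unknown message tag %u\n", (unsigned)in.tag);
            break;
        }
    }
    return 0;
}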

Processing element selection


What tasks run on what PEs?
Some tasks may be duplicated (e.g., HDTV motion estimation). Some processors may run different tasks.

How does the load change?


Static vs. dynamic task allocation.


Matching PEs to tasks


Factors:
Word size. Operand types. Performance. Energy/power consumption.

Hardwired function units:


Performance. Interface.

Task allocation
Tasks may be created at:
Design time (video encoder). Run time (user interface).

Tasks may be assigned to processing elements at:


Design time (predictable load). Run time (varying load). (See the sketch below.)
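A minimal sketch contrasting the two binding times; the allocation table and the least-loaded rule are illustrative, not from the slides:

#include <stdio.h>

enum { NUM_PES = 2, NUM_TASKS = 3 };

/* Design-time allocation: a fixed table chosen for a predictable load. */
static const int static_alloc[NUM_TASKS] = { 0, 0, 1 };

/* Run-time allocation: pick the least-loaded PE when a task arrives. */
static int pe_load[NUM_PES];

static int dynamic_alloc(void)
{
    int best = 0;
    for (int pe = 1; pe < NUM_PES; pe++)
        if (pe_load[pe] < pe_load[best])
            best = pe;
    return best;
}

int main(void)
{
    for (int t = 0; t < NUM_TASKS; t++) {
        int pe = dynamic_alloc();
        pe_load[pe]++;                       /* crude running load estimate */
        printf("task %d: design-time PE %d, run-time PE %d\n",
               t, static_alloc[t], pe);
    }
    return 0;
}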

Memory system design


Uniform vs. heterogeneous memory system.
Power consumption. Cost. Programming difficulty.

Caches:
Memory consistency.

Parallel memory systems


True concurrency: several memory blocks can operate simultaneously.

[Figure: several PEs access multiple memory blocks through an interconnect network.]


Cache consistency
Problem: caches hide memory updates from other PEs. Solution: have caches snoop the interconnect for changes (see the sketch below).
[Figure: two PEs, each with a cache, connected through a network to shared memory blocks.]

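A minimal sketch of the problem and of what snooping buys, simulated in C with two one-line "caches"; the data structures are illustrative, not a real coherence protocol:

#include <stdio.h>

static int shared_mem  = 0;                  /* the shared memory location */
static int cache[2]    = { 0, 0 };           /* each PE's cached copy */
static int valid[2]    = { 1, 1 };           /* cache-line valid bits */

/* PE 'pe' writes 'v'.  With snooping, the other cache sees the write
   on the interconnect and invalidates its stale copy. */
static void pe_write(int pe, int v, int snooping)
{
    cache[pe]  = v;
    shared_mem = v;                          /* write-through for simplicity */
    if (snooping)
        valid[1 - pe] = 0;
}

/* PE 'pe' reads; an invalid line is refilled from memory. */
static int pe_read(int pe)
{
    if (!valid[pe]) {
        cache[pe] = shared_mem;
        valid[pe] = 1;
    }
    return cache[pe];
}

int main(void)
{
    pe_write(0, 42, 0);
    printf("without snooping, PE1 reads %d (stale)\n", pe_read(1));

    shared_mem = cache[0] = cache[1] = 0;
    valid[0] = valid[1] = 1;
    pe_write(0, 42, 1);
    printf("with snooping,    PE1 reads %d\n", pe_read(1));
    return 0;
}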

Cache consistency and tasks


Traditional scientific computing maps a single task onto multiple PEs. Embedded computing maps different tasks onto multiple PEs.
Tasks may be producer/consumer. Not all of the memory may need to be consistent (see the sketch below).
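A minimal sketch of the producer/consumer case in C, assuming only the shared ring buffer lives in a region kept consistent (for instance, configured uncached at a hypothetical address PC_BUF_BASE); each task's private data can stay in ordinary cached memory. Memory-ordering subtleties are ignored for brevity:

#include <stdint.h>

#define PC_BUF_BASE 0x30000000u              /* hypothetical shared, consistent region */

typedef struct {
    volatile uint32_t head;                  /* written only by the producer */
    volatile uint32_t tail;                  /* written only by the consumer */
    volatile uint32_t data[64];
} pc_buffer_t;

#define PC_BUF ((pc_buffer_t *)PC_BUF_BASE)

/* Producer task on one PE. */
void produce(uint32_t sample)
{
    uint32_t next = (PC_BUF->head + 1) % 64;
    if (next != PC_BUF->tail) {              /* room in the buffer */
        PC_BUF->data[PC_BUF->head] = sample;
        PC_BUF->head = next;                 /* publish after writing the data */
    }
}

/* Consumer task on another PE. */
int consume(uint32_t *out)
{
    if (PC_BUF->tail == PC_BUF->head)
        return 0;                            /* buffer empty */
    *out = PC_BUF->data[PC_BUF->tail];
    PC_BUF->tail = (PC_BUF->tail + 1) % 64;
    return 1;
}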

Network topologies
Major choices:
Bus. Crossbar. Buffered crossbar. Mesh. Application-specific.


Bus network
Advantages:
Well-understood. Easy to program. Many standards.

Disadvantages:
Contention. Significant capacitive load.

Crossbar
Advantages:
No contention. Simple design.

Disadvantages:
Not feasible for large numbers of ports.

Buffered crossbar
Advantages:
Smaller than crossbar. Can achieve high utilization.

Disadvantages:
Requires scheduling.

Mesh
Advantages:
Well-understood. Regular architecture.

Disadvantages:
Poor utilization.


Application-specific network
Advantages:
Higher utilization. Lower power.

Disadvantages:
Must be designed. Must carefully allocate data.

TI OMAP
Targets communications, multimedia. Multiprocessor with DSP, RISC.
OMAP 5910:

[Block diagram: C55x DSP and ARM9 MPU with MMU, memory controller, MPU interface, system DMA controller, and a bridge to the I/O devices.]


RTOS for multiprocessors


Issues:
Multiprocessor communication primitives. Scheduling policies.

Task scheduling is considerably harder with true concurrency.


Distributed system performance


Longest-path algorithms don't work under preemption. Several algorithms unroll the schedule to the length of the least common multiple of the periods:
This produces a very long schedule and doesn't work for non-fixed periods (see the example below).

Schedules based on upper bounds may give inaccurate results.


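As a rough illustration of how long the unrolled schedule gets, take the periods used in the period-shifting example later in these slides: 150, 70, and 110. Their least common multiple is 11,550 time units, so the unrolled schedule covers 77 periods of t1, 165 periods of t2, and 105 periods of t3.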

Data dependencies help

[Figure: task graph showing data dependencies among P1, P2, and P3.]

P3 cannot preempt both P1 and P2. P1 cannot preempt P2.



Preemptive execution hurts


Worst combination of events for P5's response time: P2, of higher priority and initiated before P4, causes P5 to wait for P2 and P3.

[Figure: processes P1 to P5 and messages M1, M2, M3.]

Independent tasks can interfere, so longest-path algorithms can't be used.


Period shifting example


task    period
t1      150
t2      70
t3      110

process    CPU time
P1         30
P2         10
P3         30
P4         20

[Figure: task graphs for t1, t2, and t3 built from processes P1 to P4, and a two-CPU schedule: CPU 1 runs P1, P2, P2; CPU 2 runs P3, P4, P3, P4.]
P2 is delayed on CPU 1; a data dependency delays P3; priority delays P4. The worst-case t3 delay is 80, not 50.
