Amdahl's law: Speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time that can be improved, s is the speedup of that fraction, and the sequential part = 1 - f.
Hazard: Situations that prevent starting the next instruction in the next cycle
• Structural hazard: a required resource is busy.
• Data hazard (RAW, WAR, WAW): Need to wait for previous instruction to complete its data read/write.
• Control hazard: Deciding on control action depends on previous instruction.
Instruction-level parallelism (ILP): a measure of how many of the instructions in a computer program
can be executed at the same time; exploiting it increases performance.
More issue slots increase the complexity of the control logic, and that limits the performance gain.
Pipelining: overlapping the execution of multiple instructions to reduce cycle time and exploit
instruction-level parallelism (ILP).
Deeper pipeline: less work per stage (more stages) ⇒ shorter clock cycle.
Multiple issue: Replicate pipeline stages (multiple pipelines)
Start multiple instructions per clock cycle.
CPI < 1, so use Instructions Per Cycle (IPC).
But dependencies reduce this in practice.
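A small sketch of why IPC becomes the natural metric once multiple issue pushes CPI below 1 (the numbers are made up for illustration):

```python
# With multiple issue, more than one instruction can retire per cycle,
# so CPI < 1 and its reciprocal IPC is easier to reason about.

def ipc(instructions: int, cycles: int) -> float:
    return instructions / cycles

def cpi(instructions: int, cycles: int) -> float:
    return cycles / instructions

# Example: a 2-issue pipeline retiring 1600 instructions in 1000 cycles.
print(ipc(1600, 1000))  # 1.6
print(cpi(1600, 1000))  # 0.625
```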
1. Static multiple issue (software approach - VLIW) “the tasks performed by the compiler in loop unrolling”:
1- Compiler groups instructions to be issued together.
2- Packages them into “issue slots”.
3- Compiler detects and avoids hazards (scheduling).
2. Dynamic multiple issue (Hardware approach - Superscalar)
-CPU examines instruction stream and chooses instructions to issue each cycle.
-No need for the compiler to reorder instructions (though it may still help).
-CPU resolves hazards using advanced techniques at runtime.
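The loop unrolling a VLIW compiler performs for static scheduling can be sketched in Python; the four independent accumulators stand in for operations that could be packed into the same issue packet (function names are illustrative):

```python
# Loop unrolling: replicate the loop body to expose independent
# operations that can fill a VLIW machine's issue slots.

def sum_rolled(a):
    s = 0
    for x in a:          # one dependent add per "iteration"
        s += x
    return s

def sum_unrolled_by_4(a):
    # Assumes len(a) is a multiple of 4 for simplicity.
    s0 = s1 = s2 = s3 = 0
    for i in range(0, len(a), 4):
        s0 += a[i]       # these four adds are independent of each
        s1 += a[i + 1]   # other, so a compiler could schedule them
        s2 += a[i + 2]   # into the same issue packet
        s3 += a[i + 3]
    return s0 + s1 + s2 + s3

data = list(range(16))
print(sum_rolled(data) == sum_unrolled_by_4(data))  # True
```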
Speculation: guess the outcome of an instruction whose result we don’t have yet.
1-Start operation as soon as possible. 2-Check whether guess was right.
If so, complete the operation.
If not, roll-back and do the right thing.
We need to buffer the result of the guess until we make sure it is correct.
Advantage: saves time if the guess is correct. Disadvantage: if the guess is wrong, we consume CPU power for nothing.
Scheduling static multiple issue: the compiler must remove some/all hazards by:
1- Speculation: reorder instructions into issue packets (e.g. move a load before the branch; can include
fix-up instructions to recover from an incorrect guess).
2- Pad with nops if necessary.
3- No dependencies within a packet (possibly some dependencies between packets).
Hardware can look ahead for instructions to execute: - Buffer the results until it determines they are
actually needed. - Flush buffers on incorrect speculation
Advantages: -Reduce hardware complexity -Less design time. -Shorter cycle time.
-Better performance. -Reduced power consumption.
Re-order buffer: a circular queue with head and tail pointers, used in a superscalar processor.
It stores the results of instructions, reorders them, and writes them back in program order even if they
were executed out of order. It uses register renaming to remove some dependences and sends missing operands to the reservation stations (RS).
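A minimal sketch of the re-order buffer as a circular queue, assuming a simplified model where instructions are identified by name (the class and method names are hypothetical):

```python
# Re-order buffer sketch: instructions enter at the tail in program
# order, results arrive out of order, and entries commit from the
# head only when complete -- so writes happen in program order.
from collections import deque

class ROB:
    def __init__(self):
        self.entries = deque()   # head = left end, tail = right end
        self.committed = []

    def issue(self, name):       # allocate a tail entry in program order
        self.entries.append({"name": name, "done": False})

    def complete(self, name):    # result may arrive out of order
        for e in self.entries:
            if e["name"] == name:
                e["done"] = True

    def commit(self):            # retire in order from the head only
        while self.entries and self.entries[0]["done"]:
            self.committed.append(self.entries.popleft()["name"])

rob = ROB()
for i in ("i1", "i2", "i3"):
    rob.issue(i)
rob.complete("i3")   # i3 finishes first...
rob.commit()         # ...but cannot commit before i1 and i2
rob.complete("i1")
rob.complete("i2")
rob.commit()
print(rob.committed)  # ['i1', 'i2', 'i3'] -- program order preserved
```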
Why do dynamic scheduling? Why not just let the compiler schedule code?
- Not all stalls are predictable (e.g. cache misses).
- Can’t always schedule around branches (the branch outcome is determined dynamically).
- Different implementations of an ISA have different latencies and hazards.
Nop: inserted by the compiler before execution time; it is executed as a normal instruction and moves
through all pipeline stages.
Stall: created by the hardware at runtime; it moves through only some of the stages, not all of them.
Parallel programming:
Difficulties? Partitioning - Coordination - Communications overhead
Why? To save time and money as many resources working together.
App? Data bases and Data mining - Real time simulation of systems - Science and Engineering – graphics.
Chaining: Allows a vector operation to start as soon as the missing operand appears by forwarding the
results from the first functional unit to the second unit.
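Chaining can be loosely illustrated with Python generators, where each element is forwarded to the next "functional unit" as soon as it is produced (illustrative only; real chaining happens between hardware functional units):

```python
# Vector chaining analogy: the add unit consumes each product as soon
# as the multiply unit yields it, instead of waiting for the whole
# result vector to be complete.

def vmul(a, b):
    for x, y in zip(a, b):
        yield x * y          # multiply unit produces one element...

def vadd(products, c):
    for p, z in zip(products, c):
        yield p + z          # ...which the add unit consumes immediately

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
result = list(vadd(vmul(a, b), c))   # chained a*b + c
print(result)  # [11, 18, 27]
```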
Adding more lanes allows trading off clock rate and energy without reducing performance!
Advantages of multimedia (SIMD) extensions:
- Cost little to add to the standard ALU and easy to implement
- Require little extra state -> easy for context-switch
- Require little extra memory bandwidth
- No virtual memory problem of cross-page access and page-fault
Thread-level parallelism (TLP): executing independent programs/threads at the same time using different
execution resources, to increase performance.
Multithreading: multiple threads to share the functional units of one processor via overlapping.
Thread: a process with its own PC, instructions, and data. It may be part of a parallel program of
multiple processes, or it may be an independent program.
Advantages of using multiple instruction streams: improves 1. the throughput of computers that run many
programs and 2. the execution time of multi-threaded programs. Increases TLP. Every active thread has its
own PC, instructions, and data.
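A minimal threading sketch of multiple independent instruction streams (note that CPython's interpreter lock means this demonstrates concurrency, not a true speedup; names are illustrative):

```python
# Thread-level parallelism: each thread is an independent stream of
# execution with its own flow of control, all sharing one process.
import threading

results = {}

def worker(tid, data):
    results[tid] = sum(data)     # each thread does its own work

threads = [
    threading.Thread(target=worker, args=(i, range(i * 10, i * 10 + 10)))
    for i in range(4)
]
for t in threads:
    t.start()                    # all four streams are now active
for t in threads:
    t.join()                     # wait for every thread to finish
print(sorted(results.items()))   # [(0, 45), (1, 145), (2, 245), (3, 345)]
```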
MIMD Multiprocessors:
SMP: Shared-Memory Multiprocessors = Symmetric MultiProcessors:
- Processors communicate with each other through the shared global memory.
- Access to memory from all processors is symmetric.
- Not easy to scale, as there is a single address space for all processors.
- Uniform Memory Access (UMA)
Distributed-memory multiprocessor:
- Scales better with an increased number of processors compared to SMPs
- Communicate by messages through the network.
-Each processor has a private physical address space.
- Non-Uniform Memory Access (NUMA)
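The message-passing style of a distributed-memory multiprocessor can be sketched with separate processes and a queue standing in for the network (names are illustrative):

```python
# Distributed-memory style: each process has a private address space
# and communicates only by messages (a Queue plays the network here),
# unlike SMP threads that share one global memory.
from multiprocessing import Process, Queue

def node(rank, q):
    q.put((rank, rank * rank))   # send a message over the "network"

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=node, args=(r, q)) for r in range(4)]
    for p in procs:
        p.start()
    msgs = [q.get() for _ in range(4)]   # receive one message per node
    for p in procs:
        p.join()
    print(sorted(msgs))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```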