Avoiding Pipeline Stalls in Hyperthreaded Processors: IIT Bombay M.Tech1


Contents

• Hyperthreaded processing
• Pipeline stalls
• NetBurst Architecture
• Spin-wait
• References
Hyperthreaded Processing

• Hyperthreading (or simultaneous multithreading, SMT) is a hardware
technique used to squeeze more performance out of modern
processors.
• A single physical processor appears to software as a set of two
independent logical processors.
• The logical processors share the same execution hardware; each keeps
its own registers and architectural state.
• Because the execution resources are shared, one logical processor can
keep doing useful work while the other is stalled, for example while
waiting for a memory access.
• This enables greater utilization of the hardware, with a resulting
performance gain of roughly 5-30%.
Hyper-Threading Technology Architecture

Fig: A processor without Hyper-Threading Technology (one architectural state on top of the processor execution resources) next to a processor with Hyper-Threading Technology (two architectural states sharing one set of processor execution resources).
Pipeline stalls

• Because modern processors attempt to pre-execute blocks of code
that are known to be independent, multi-threaded and parallel
programming can be tricky.
• Developers need to be aware of certain processor-level issues that
can adversely affect the performance of a multi-threaded
application. One common source of performance degradation is
pipeline stalls.
Pipeline Stalls

Fig: A stall in a 4-stage pipeline, showing the resulting "bubbles".
Why avoid pipeline stalls?

• Pipeline stalls decrease the average instruction throughput of
the pipeline.
• The more bubbles there are (clock cycles stalled), the larger the
decrease in average instruction throughput.
• Although pipeline stalls cannot be eliminated completely,
developers should make an effort to reduce their frequency.
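
To make the effect of bubbles on throughput concrete, here is a minimal sketch for an idealized pipeline. The function name `pipeline_throughput` and the cost model (one instruction issued per cycle when nothing stalls, each bubble costing exactly one clock cycle) are assumptions made for illustration and do not come from the slides.

```c
/* Hypothetical helper: average instructions per cycle (IPC) for an
 * idealized in-order pipeline. Assumes one instruction enters the pipeline
 * per cycle when there is no stall, and each bubble costs one clock cycle. */
double pipeline_throughput(unsigned stages, unsigned instructions,
                           unsigned bubbles)
{
    if (stages == 0 || instructions == 0)
        return 0.0;

    /* (stages - 1) cycles to fill the pipeline, then one cycle per
     * instruction, plus one extra cycle for every bubble. */
    unsigned long total_cycles =
        (unsigned long)(stages - 1) + instructions + bubbles;

    return (double)instructions / (double)total_cycles;
}
```

Under this model, calling `pipeline_throughput(4, 1000, 0)` and `pipeline_throughput(4, 1000, 200)` shows the throughput dropping as the bubble count grows, which is the relationship the bullets above describe.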
Pipeline stalls in NetBurst Architecture

• The NetBurst architecture is particularly adept at spotting
sequences of instructions that it can execute out of the original
program order, that is, ahead of time. These sequences are
characterized by:
   - Having no dependency on other instructions.
   - Not causing side effects that affect the execution of other
     instructions (such as modifying global state).
• When the processor spots these sequences, it executes the
instructions and stores the results. The processor cannot fully
retire these instructions until it has verified that the assumptions
made during their speculative execution are correct.
Pipeline stalls in NetBurst Architecture (2)

• If the speculation was indeed correct, the instructions are
retired (in program order). However, if the assumptions were
wrong, then in a particularly bad case, called a full stall, all
instructions in flight are terminated and retired in careful
sequence, all the pre-executed code is thrown out, and the
pipeline is cleared and restarted at the point of incorrect
speculation, this time with the correct path.
Spin wait

• One particular sequence that the processor identifies as a
candidate for out-of-order execution is the spin-wait loop. It
consists of the following assembly-level steps:

top_of_loop:
    load x into a register
    compare to 0
    if not equal,
        goto top_of_loop
    else
        ...
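
For readers who think in C rather than assembly, the loop above can be sketched roughly as follows. The name `spin_until_zero`, the use of C11 atomics, and the returned iteration count are assumptions added for illustration; the slides only give the pseudo-assembly.

```c
#include <stdatomic.h>

/* Hypothetical sketch of the spin-wait loop: keep reloading x and
 * comparing it to 0 until it becomes 0. Returns how many iterations
 * observed a nonzero value. Another thread is expected to store 0 to *x. */
unsigned long spin_until_zero(atomic_int *x)
{
    unsigned long spins = 0;

    /* top_of_loop: load x, compare to 0, branch back if not equal */
    while (atomic_load_explicit(x, memory_order_acquire) != 0)
        ++spins;

    return spins;
}
```

The atomic load keeps the compiler from hoisting the read of `x` out of the loop, so each iteration really does perform the load and compare shown in the pseudo-assembly.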
Spin wait (2)

• The processor sees that the loop does not depend on any variables
being calculated by other instructions, so the sequence can be
executed without fear of disturbing other instructions.
• In addition, it knows that if x changes value while the loop is
running, the change will be caught before the instructions are
retired. As a result, it grabs this sequence and executes it
many times, very quickly.
• In the process, it floods the processor's store of instructions
waiting to be retired with repeated iterations of the loop.
• With no reason to slow down, speculative execution continues
to crank out iterations at full tilt. Finally, the variable being
waited on changes value.
Spin wait (3)

• The instruction-retirement logic recognizes this change and
triggers a full pipeline stall: it discards all the pre-executed
iterations of the loop that are waiting to be retired, retires all
other instructions in flight, determines where the pipeline
should resume, and sets the pipeline to that instruction.
• On a processor with Hyper-Threading Technology, this extra
work has a serious, detrimental impact on the performance of the
other thread: the spinning loop starves the second logical processor
of resources, so that both threads are effectively incapable of doing
any useful work simply because the loop is spinning so fast.
Pause instruction

• The loop variable cannot change faster than the memory bus can
update it. Hence, there is no benefit in pre-executing the loop
faster than the time needed for a memory access.
• By inserting a pause instruction into the loop, the programmer
tells the processor to wait for roughly the amount of time
equivalent to this memory access.
• On processors with Hyper-Threading Technology, this respite
enables the other thread to use all of the resources on the
physical processor and continue processing.
__asm
{
    pause    ; spin-wait hint, placed inside the body of the wait loop
}

Embedded assembly language code for pause instruction.
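
Inline assembly blocks of this form are specific to one compiler. As a more portable sketch, the same hint can be issued through the `_mm_pause()` intrinsic on x86 compilers. The names `CPU_RELAX` and `spin_until_zero_paused`, the `max_spins` bound, and the no-op fallback on other architectures are assumptions added here for illustration, not part of the original slides.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause()  /* emits the x86 pause instruction */
#else
#define CPU_RELAX() ((void)0)    /* assumption: plain no-op elsewhere */
#endif

/* Hypothetical sketch: spin until *x becomes 0, pausing between checks.
 * Gives up after max_spins iterations so the caller can fall back to a
 * blocking wait. Returns true if *x was observed to be 0. */
bool spin_until_zero_paused(atomic_int *x, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; ++i) {
        if (atomic_load_explicit(x, memory_order_acquire) == 0)
            return true;
        CPU_RELAX();  /* let the sibling logical processor use the core */
    }
    return atomic_load_explicit(x, memory_order_acquire) == 0;
}
```

Bounding the spin is a design choice of this sketch: it keeps a waiting thread from occupying the core indefinitely if the variable is slow to change.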


Spin-wait Instruction

• The SSE3 extension, introduced with later Pentium 4 processors
(Prescott), added two new instructions: monitor and mwait.
• The first instruction, monitor, watches a specified area of
memory for write activity.
• Its companion instruction, mwait, suspends the logical processor
until a write to the monitored memory block wakes it up.
• Since updating a variable is a write, a processor can specify the
variable's memory address with monitor and then use mwait to
remain suspended until the variable is updated. Effectively, this
enables a wait on a variable without spinning in a loop.
References

• http://www.drdobbs.com/high-performance-computing
• http://en.wikipedia.org/wiki/NetBurst_%28microarchitecture%29
• http://en.wikipedia.org/wiki/Hyper-threading
• http://en.wikipedia.org/wiki/Pipeline_stall
Thank You
N.Kedhar Nath
10305058
