Module 4 Note MP
SYLLABUS
Features of Pentium- pipelining- Stages of pipelining- Speedup due to pipelining- Pipeline Hazards
The 80386 microprocessor is a 32-bit processor that can carry out 32-bit operations in one cycle.
It has a 32-bit data bus and a 32-bit address bus, and can therefore address 4 GB (2^32 bytes) of
physical memory.
Features of 80386
In real mode, the 80386 can address 1 MB of physical memory using address lines A0-A19.
In this mode, the 80386 appears to programmers as a fast 8086 with some new instructions.
The paging unit is disabled, and hence the real addresses are the same as the physical addresses.
To form a physical memory address, the contents of the appropriate segment register (16 bits) are shifted left
by four positions and then added to the 16-bit offset address.
A segment in 80386 real mode can be read, written, or executed.
PENTIUM PROCESSOR
The Pentium microprocessor is one of the powerful members of Intel's x86 family. It is
an advanced superscalar 32-bit microprocessor.
Pipelining
A program consists of a number of instructions.
These instructions may be executed in the following two ways:
Non-Pipelined Execution
All the instructions of a program are executed sequentially, one after the other.
A new instruction begins only after the previous instruction has completed.
This style of executing instructions is highly inefficient.
Pipelined Execution
Stage 1 (Instruction Fetch): In this stage the CPU reads instructions from the address in the
memory whose value is present in the program counter.
Stage 2 (Instruction Decode): In this stage, instruction is decoded and the register file is accessed
to get the values from the registers used in the instruction.
Stage 3 (Instruction Execute): In this stage, ALU operations are performed.
Stage 4 (Memory Access): In this stage, memory operands are read from or written to the memory
address specified in the instruction.
Stage 5 (Write Back): In this stage, the computed/fetched value is written back to the register
specified in the instruction.
Each stage is executed by its own dedicated CPU functional unit and each takes one clock cycle to
execute.
In the figure, we can see how instructions are overlapped using the pipelining technique.
In the first clock cycle, the first instruction is fetched. In the following clock cycle, that same
instruction is decoded while at the same time, the second instruction is being fetched.
During the third clock cycle, the first instruction is executed, the second instruction is decoded and the
third instruction is being fetched.
On the fifth clock cycle, we have the first instruction completed. From then on, we have an instruction
completing on every clock cycle.
The time taken for the first instruction to pass through the pipeline to completion is called the "time
to fill". It depends on the number of stages; in this example, it takes 5 cycles to fill the pipeline.
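The overlap described above can be sketched as a small simulation (an illustration, not from the notes; IF/ID/EX/MEM/WB abbreviate the five stages listed earlier):

```python
# Sketch: cycle-by-cycle view of an ideal 5-stage pipeline (no stalls).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_timeline(num_instructions):
    """Return, for each clock cycle, the list of (instruction, stage) pairs."""
    total_cycles = len(STAGES) + num_instructions - 1
    timeline = []
    for cycle in range(total_cycles):
        active = []
        for instr in range(num_instructions):
            stage = cycle - instr          # instruction i enters IF at cycle i
            if 0 <= stage < len(STAGES):
                active.append((instr + 1, STAGES[stage]))
        timeline.append(active)
    return timeline

timeline = pipeline_timeline(3)
# Cycle 1 fetches instruction 1; by cycle 3 three instructions overlap;
# instruction 1 completes WB in cycle 5 (the "time to fill").
for cycle, active in enumerate(timeline, start=1):
    print(f"cycle {cycle}: " + ", ".join(f"I{i}:{s}" for i, s in active))
```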
As an analogy, consider doing laundry, which consists of three operations:
1. Washing
2. Drying
3. Folding
The washing is performed using a washing machine, the drying using a dryer machine and the folding is
performed manually by the person doing the laundry.
For the purpose of the example let’s say the washing takes 60 minutes, the drying takes 30 minutes and the
folding of the clothes takes 30 minutes.
As each operation starts only after the previous one has finished, one load of laundry is completed in 2
hours.
Fig. 1 Laundry process without pipeline technique
If we look closely at the process, one thing becomes obvious – once an operation is finished for
the current load of clothes, the hardware used stays idle and waits for the next load.
For example, the washer machine is idle while drying and folding operations are performed.
This is certainly not the most efficient way of doing things and one way to improve it is shown in
Fig. 2.
There we can see the same process of doing laundry, but this time using a pipeline technique.
As soon as the washing is completed, we can put in the clothes of the 2nd load, so the washing
machine keeps working while the drying and the folding of the 1st load are being performed.
Based on the examples above, we can conclude that using the pipeline technique has no impact
on the time needed to complete a single load of laundry (it still takes 2 hours).
The improvement is visible when doing multiple loads. Without the pipeline, we can do two loads in a
total of 4 hours.
Using the pipeline as shown in Fig. 2, we can do 3 loads in the same time frame. The speedup would
be even greater if all of the operations took the same time to complete, allowing better overlapping
in the pipeline. This is what microprocessor pipeline implementations exploit.
SPEEDUP DUE TO PIPELINING
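A commonly used idealized derivation: for a k-stage pipeline executing n instructions, with one clock per stage and no stalls, the non-pipelined time is n x k cycles, while the pipelined time is k cycles to fill plus one completion per cycle thereafter, i.e. k + (n - 1) cycles. The speedup is therefore nk / (k + n - 1), which approaches k as n grows large. A minimal sketch of this calculation:

```python
# Sketch of the standard pipeline speedup formula (ideal case, no stalls).

def speedup(k, n):
    """Speedup of a k-stage pipeline over serial execution of n instructions."""
    non_pipelined = n * k            # each instruction takes k cycles, serially
    pipelined = k + (n - 1)          # k cycles to fill, then 1 result per cycle
    return non_pipelined / pipelined

print(speedup(5, 5))     # with few instructions the fill time dominates
print(speedup(5, 1000))  # approaches the stage count k = 5 as n grows
```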
PIPELINE HAZARD
Any condition that causes a stall in pipeline operation can be called a hazard.
Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing
during its designated clock cycle.
There are primarily three types of hazards:
i. Data Hazards
ii. Structural Hazards
iii. Control Hazards
Data Hazards
A data hazard is any condition in which either the source or the destination operands of an instruction
are not available at the time expected in the pipeline.
As a result, some operation has to be delayed and the pipeline stalls.
A data hazard arises whenever there are two instructions, one of which depends on the data obtained from the other. For example:
A = 3 + A
B = A * 4
For the above sequence, the second instruction needs the value of A computed in the first
instruction. Thus the second instruction is said to depend on the first.
If the execution is done in a pipelined processor, it is highly likely that the interleaving of these two
instructions can lead to incorrect results due to data dependency between the instructions. Thus the
pipeline needs to be stalled as and when necessary to avoid errors.
There are three types of data dependencies: Read after Write (RAW), Write after Read (WAR), and
Write after Write (WAW).
RAW: also known as a true dependency or flow dependency. It occurs when a value produced by an
instruction is required by a subsequent instruction.
WAR: also known as an anti-dependency. It occurs when an instruction writes to a register that a
previous instruction still has to read; the write must not happen before that read.
WAW: also known as an output dependency. It occurs when an instruction writes to a register that a
previous instruction also writes; the writes must complete in program order.
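The three dependency types can be illustrated with a small classifier (a hypothetical sketch; the tuple encoding of an instruction as destination register plus source registers is my own):

```python
# Sketch: classify the dependency of `second` on `first`.
# Each instruction is encoded as (destination_register, set_of_source_registers).

def classify(first, second):
    """Return the dependency types (RAW/WAR/WAW) of `second` on `first`."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    deps = []
    if dest1 in srcs2:
        deps.append("RAW")   # second reads what first writes (true dependency)
    if dest2 in srcs1:
        deps.append("WAR")   # second writes what first reads (anti-dependency)
    if dest1 == dest2:
        deps.append("WAW")   # both write the same register (output dependency)
    return deps

# A = 3 + A ; B = A * 4  -> B needs the new value of A: a RAW dependency.
print(classify(("A", {"A"}), ("B", {"A"})))
```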
Structural Hazards
This situation arises mainly when two instructions require a given hardware resource at the same time
and hence for one of the instructions the pipeline needs to be stalled.
The most common case is when memory is accessed at the same time by two instructions.
One instruction may need to access memory as part of its Execute or Write Back phase while another
instruction is being fetched.
If both the instructions and the data reside in the same memory, the two accesses cannot proceed
together, and one of the instructions must be stalled until the other is done with its memory access.
Thus in general sufficient hardware resources are needed for avoiding structural hazards.
Control hazards
The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the
execution unit.
The instructions fetched by the fetch unit are from consecutive memory locations and are executed in order.
However, a problem arises when one of the instructions is a branch to some other
memory location. All the instructions fetched into the pipeline from the consecutive memory locations
are now invalid and need to be removed (also called flushing the pipeline).
This induces a stall until new instructions are fetched from the memory address specified in the
branch instruction.
Often, dedicated hardware is incorporated in the fetch unit to identify branch instructions and compute
branch addresses as soon as possible, reducing the resulting delay.
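The cost of flushing can be sketched with assumed numbers (the stage counts and branch counts below are illustrative, not from the notes):

```python
# Sketch: cycles lost to flushing when a taken branch resolves late.
# If the branch outcome is known only in stage `resolve_stage`, the
# instructions fetched behind it in earlier stages must be discarded.

def flush_penalty(resolve_stage):
    """Cycles of work discarded per taken branch (stage numbers are 1-based)."""
    return resolve_stage - 1   # one wrongly fetched instruction per earlier stage

def total_cycles(n, stages, taken_branches, resolve_stage):
    """Ideal pipelined cycle count plus the stall cycles added by taken branches."""
    return stages + (n - 1) + taken_branches * flush_penalty(resolve_stage)

# 100 instructions on a 5-stage pipeline, 10 taken branches resolved in stage 3:
print(total_cycles(100, 5, 10, 3))  # 104 ideal cycles + 20 flush cycles
```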
Superscalar Processors
Superscalar processors issue more than one instruction per clock cycle.
Such a processor has multiple processing units, so several instructions can be handled in parallel in each
processing stage.
With this arrangement, several instructions start execution in the same clock cycle, and the process
is said to use multiple issue.
Such processors can achieve an instruction execution throughput of more than one
instruction per cycle; they are known as 'superscalar processors'.
Superscalar Processors execute multiple instructions within the implemented Executing Units in the
Execution Stage, providing true parallelism within the CPU.
In a superscalar design, the processor must read the instructions from memory and decide which ones
can be run in parallel, dispatching them to the available Executing Units.
The Superscalar Processor can be envisioned as having multiple parallel pipelines.
In a superscalar processor, the entire processor pipeline, or parts of it, are replicated.
Superscalar execution is implemented in the Pentium: it has two integer execution units,
so two instructions can be executed at any time.
That is, two integer instructions are also fetched and decoded in parallel.
Multicore processing
A multicore processor contains several processing units, called "cores", on one chip, and every core
is capable of performing its own task.
For example, if you are doing multiple tasks at the same time, such as using WhatsApp while
watching a movie, one core can handle the WhatsApp activity while another core manages the
movie playback.
The architecture of a multicore processor allows communication between all the available cores,
and the processing tasks are split among them and assigned appropriately.
When all the processing tasks are done, the processed data from every core is sent back to the
motherboard over a single shared gateway.
This technique improves overall performance compared to a single-core processor.
Power and temperature issues
Power consumption is a function of the number of transistors on a chip. When more cores are added,
the transistor density increases, which contributes to the power consumption.
To avoid unnecessary power consumption, a multicore design has to include a separate
power management unit that can control wastage of power.
Level of parallelism
Performance is directly related to the amount of parallelism: the more processes that can be
executed simultaneously, the greater the parallelism.
The success of multicore technology strongly depends on the way the algorithms are written.
If the algorithms written are not compatible with the multicore design, the process will run on one of
the cores, while other cores will sit idle.
Interconnect issues
Since there are many components on the chip in a multicore processor, such as cores, caches and
network controllers, the interaction between them can affect performance if the interconnection
issues are not resolved properly.
Cache coherence
In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same
level of the memory hierarchy. For example, the cache and the main memory may have inconsistent
copies of the same object.
As multiple processors operate in parallel, and independently multiple caches may possess different
copies of the same memory block, this creates cache coherence problem. Cache coherence
schemes help to avoid this problem by maintaining a uniform state for each cached block of data.
Let X be an element of shared data which has been referenced by two processors, P1 and P2.
In the beginning, the three copies of X (one in main memory and one in each cache) are consistent.
If processor P1 writes a new value X1 into its cache using a write-through policy, the same copy is
written immediately into the shared memory; main memory is then up to date, but P2's cached copy
becomes stale, so inconsistency occurs between the caches.
When a write-back policy is used, the main memory is updated only when the modified data in the
cache is replaced or invalidated, so in the meantime even the main memory holds a stale copy.
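The difference between the two policies can be illustrated with a toy model (dictionaries standing in for the caches and memory; this is a sketch of the staleness problem, not a real coherence protocol):

```python
# Toy model: two private caches over one shared memory location "X".
memory = {"X": 0}
cache_p1 = {"X": 0}
cache_p2 = {"X": 0}

def write(cache, addr, value, policy):
    """Write into one processor's cache under the given write policy."""
    cache[addr] = value
    if policy == "write-through":
        memory[addr] = value        # memory is updated immediately

write(cache_p1, "X", 1, policy="write-through")
# Memory is up to date, but P2's cached copy is now stale:
print(memory["X"], cache_p2["X"])

memory["X"] = cache_p1["X"] = cache_p2["X"] = 0   # reset all copies

write(cache_p1, "X", 1, policy="write-back")
# With write-back, even main memory holds the stale value until the
# modified block is replaced or invalidated:
print(memory["X"], cache_p2["X"])
```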
MMX Technology
MMX is a Pentium microprocessor from Intel that is designed to run faster when playing multimedia
applications. According to Intel, a PC with an MMX microprocessor runs a multimedia application up
to 60% faster than one with a microprocessor having the same clock speed but without MMX.
In addition, an MMX microprocessor runs other applications about 10% faster, probably because of
increased cache.
All of these enhancements are made while preserving compatibility with software and operating
systems developed for the Intel Architecture.
The MMX technology consists of several improvements over the non-MMX Pentium microprocessor:
1. 57 new microprocessor instructions have been added that are designed to handle video, audio, and
graphical data more efficiently. Programs can use MMX instructions without changing to a new
mode or operating-system visible state.
2. New 64-bit integer data type (Quadword).
3. A new execution model, Single Instruction Multiple Data (SIMD), makes it possible for one instruction to
perform the same operation on multiple data items.
4. The memory cache on the microprocessor has increased to 32 KB, meaning fewer accesses to
memory that is off the microprocessor.
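The SIMD idea behind MMX can be illustrated by emulating a packed byte addition on a 64-bit quadword (a sketch of the concept with wraparound arithmetic; it does not reproduce the exact semantics of Intel's MMX instructions, e.g. saturating variants):

```python
# Sketch: one "instruction" applying the same operation to 8 packed bytes.

def packed_add_bytes(a, b):
    """Add two 64-bit quadwords byte-by-byte, each byte wrapping modulo 256."""
    result = 0
    for i in range(8):                       # 8 independent byte lanes
        byte_a = (a >> (8 * i)) & 0xFF
        byte_b = (b >> (8 * i)) & 0xFF
        result |= ((byte_a + byte_b) & 0xFF) << (8 * i)
    return result

# One call adds all eight byte pairs at once: every byte becomes 0x03.
print(hex(packed_add_bytes(0x0101010101010101, 0x0202020202020202)))
```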
Hyper-threading
Hyper Threading is a technology designed to increase the performance of the CPU. It allows multiple
threads to run on each core to make the CPU run efficiently.
It increases the amount of work performed by the CPU within a unit time.
A core is the execution unit of the CPU.
Initially, there was only one core in the CPU. Later, manufacturers added more cores to the CPU to
increase the number of instructions executed by the CPU at a time.
Hyper threading is a mechanism to increase the performance of the CPU further.
It makes the operating system recognize each physical core as two virtual or logical cores.
In other words, it virtually increases the number of cores in a CPU, so a single physical core runs two
threads. Hyper-threading does not really increase the number of cores; it only increases them virtually or
logically. Each virtual core can work independently.
PREVIOUS YEAR QUESTIONS FROM FOURTH MODULE
PART A (2 marks)
1. What is hyperthreading?
2. State the term pipelining.
3. Write any two features of Pentium.
4. Write any four features of Pentium processor.
5. List pipeline hazards.
PART B (6 marks)