
MODULE 4 : ADVANCED PROCESSOR TECHNOLOGIES

SYLLABUS

Features of 80386- Real Mode - Protected Virtual Addressing Mode

Features of Pentium- pipelining- Stages of pipelining- Speedup due to pipelining- Pipeline Hazards

Superscalar Processors- Multiple Execution units

Multicore processing – Major issues in Multicore Processing (interconnects- cache coherence- snooping
protocol- Directory based protocol) - MMX- SSE- Hyperthreading.

SPECIFIC OUTCOMES: MODULE 4

4.0 To understand the operating Modes of 80386

4.0.1 To list the features of 80386

4.0.2 To describe Real Mode & Protected Virtual Addressing Mode

4.1 To explain the Pentium Processor

4.1.1 To list the features of Pentium

4.1.2 To describe pipelining

4.2 To know the advanced technologies of modern Intel processors

4.2.1 To define Superscalar Architecture

4.2.2 To define Multicore processing

4.2.3 To define MMX Technology

4.2.4 To define Hyperthreading


80386 Microprocessor

 80386 is a 32-bit microprocessor that can carry out 32-bit operations in one cycle.
 It has a 32-bit data bus and a 32-bit address bus, so it can address 4 GB (2^32 bytes) of
physical memory.

Features of 80386

 It is a 32-bit microprocessor and therefore has a 32-bit ALU.


 80386 has a 32-bit data bus.
 It has a 32-bit address bus.
 It supports physical memory addressability of 4 GB and virtual memory addressability of 64 TB.
 80386 supports a variety of operating clock frequencies: 16 MHz, 20 MHz, 25 MHz,
and 33 MHz.
 It has a 3-stage pipeline (fetch, decode and execute), so fetching, decoding and execution
can proceed simultaneously inside the processor.
 Operating modes of 80386 are
 Real mode
 Protected virtual address mode

Operating modes of 80386

 80386 supports 3 operating modes:


1. Real address mode
2. Protected virtual address mode
3. Virtual 8086 mode

1. REAL ADDRESS MODE

 In real mode 80386 can address 1 MB of physical memory using address lines A0-A19.
 80386 appears to programmers as a fast 8086 with some new instructions.
 The paging unit is disabled, and hence the real addresses are the same as the physical addresses.
 To form a physical memory address, the 16-bit contents of the appropriate segment register are
shifted left by four bit positions and then added to the 16-bit offset address, as sketched below.
 A segment in 80386 real mode can be read, written or executed.
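A minimal sketch (not from the notes; the segment and offset values are illustrative) of forming a real-mode physical address:

```c
/* Minimal sketch: real-mode physical address = (segment << 4) + offset.
 * The segment and offset values below are illustrative only. */
#include <stdio.h>
#include <stdint.h>

static uint32_t real_mode_address(uint16_t segment, uint16_t offset) {
    /* 16-bit segment shifted left by 4 bit positions, then the 16-bit offset is added */
    return ((uint32_t)segment << 4) + offset;
}

int main(void) {
    uint16_t segment = 0x1234, offset = 0x0010;
    printf("physical address = %05X\n", (unsigned)real_mode_address(segment, offset));  /* 12350 */
    return 0;
}
```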

2. PROTECTED VIRTUAL ADDRESS MODE

 In this mode the processor has a very large address space.


 It has 4 GB of physical memory address space and 64 TB of virtual memory address space.
 The address lines A2-A31, along with the four byte-enable (bank select) signals BE0#-BE3#, are used
to address 4 GB of physical memory.
 The processor switches from real address mode to PVAM by setting the PE bit in control register CR0.
 To form a physical memory address, the segment selector and offset are first converted into a linear
address by the segmentation unit, and the linear address is then converted into a physical address by
the paging unit, as sketched below.
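A simplified sketch of the 80386 paging step is shown below; it assumes 4 KB pages, uses illustrative toy table contents, and omits segmentation, protection checks and the TLB:

```c
/* Minimal sketch of 80386 paging: a 32-bit linear address is split into
 * a 10-bit page-directory index, a 10-bit page-table index and a 12-bit
 * offset (4 KB pages). The table contents below are illustrative toy data. */
#include <stdio.h>
#include <stdint.h>

#define ENTRIES 1024

static uint32_t page_directory[ENTRIES];   /* each entry: base of a page table */
static uint32_t page_table[ENTRIES];       /* each entry: base of a 4 KB page  */

static uint32_t translate(uint32_t linear) {
    uint32_t dir    = (linear >> 22) & 0x3FF;   /* bits 31..22: directory index */
    uint32_t table  = (linear >> 12) & 0x3FF;   /* bits 21..12: page-table index */
    uint32_t offset =  linear        & 0xFFF;   /* bits 11..0 : offset in page   */
    (void)page_directory[dir];                  /* a real MMU would follow this pointer */
    return page_table[table] + offset;          /* physical page base + offset */
}

int main(void) {
    page_table[1] = 0x00200000;                 /* toy mapping: linear page 1 -> 2 MB */
    uint32_t linear = 0x00001234;               /* dir 0, table 1, offset 0x234 */
    printf("linear %08X -> physical %08X\n", (unsigned)linear, (unsigned)translate(linear));
    return 0;
}
```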

PENTIUM PROCESSOR

 The Pentium microprocessor is one of the powerful members of Intel's x86 family. It is
an advanced superscalar 32-bit microprocessor.

Features of Pentium processors

 Superscalar architecture - parallel execution of several instructions.


 64-bit data bus
 8 bytes of data information can be transferred to and from memory in a single bus cycle.
 Supports burst read and burst write back cycles.
 Supports pipelining-dual instruction pipeline.
 Two parallel integer execution units, one floating point execution unit.
 8 KB of dedicated instruction cache.
 8 KB dedicated data cache gives data to execution units.
 Inbuilt advanced programmable interrupt controller.
 Roughly five times faster than the 80486.
 Data integrity and error detection.

Pipelining
 A program consists of a number of instructions.
 These instructions may be executed in the following two ways
Non-Pipelined Execution
 All the instructions of a program are executed sequentially one after the other.
 A new instruction executes only after the previous instruction has executed completely.
 This style of executing the instructions is highly inefficient.
Example-

 Consider a program consisting of three instructions.


 In a non-pipelined architecture, these instructions execute strictly one after the other.

Pipelined Execution

 Following are the 5 stages of pipeline

 Stage 1 (Instruction Fetch): In this stage the CPU reads instructions from the address in the
memory whose value is present in the program counter.
 Stage 2 (Instruction Decode): In this stage, instruction is decoded and the register file is accessed
to get the values from the registers used in the instruction.
 Stage 3 (Instruction Execute):In this stage, ALU operations are performed.
 Stage 4 (Memory Access): In this stage, memory operands referenced by the instruction are read
from or written to memory.
 Stage 5 (Write Back): In this stage, the computed/fetched value is written back to the register
specified in the instruction.
 Each stage is executed by its own dedicated CPU functional unit and each takes one clock cycle to
execute.
 In the figure we can see how instructions are overlapped using the pipelining technique.
 In the first clock cycle, the first instruction is fetched. In the following clock cycle, that same
instruction is decoded while at the same time, the second instruction is being fetched.
 During the third clock cycle, the first instruction is executed, the second instruction is decoded and the
third instruction is being fetched.
 On the fifth clock cycle, we have the first instruction completed. From then on, we have an instruction
completing on every clock cycle.
 The time taken for the first instruction to pass completely through the pipeline is called the “time
to fill“. It depends on the number of stages; in this example it takes 5 cycles to fill the pipeline. A
small simulation of this overlap is sketched below.
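A minimal sketch (not from the notes) that simulates this overlap for three instructions, assuming every stage takes exactly one clock cycle:

```c
/* Minimal sketch: which stage each instruction occupies in every clock cycle
 * of an ideal 5-stage pipeline. Instruction i enters stage s in cycle i + s. */
#include <stdio.h>

int main(void) {
    const char *stages[] = {"IF", "ID", "EX", "MEM", "WB"};
    const int num_stages = 5;
    const int num_instr  = 3;                    /* three instructions, as in the notes */

    for (int cycle = 0; cycle < num_instr + num_stages - 1; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int i = 0; i < num_instr; i++) {
            int s = cycle - i;                   /* stage occupied by instruction i */
            if (s >= 0 && s < num_stages)
                printf("  I%d in %-3s", i + 1, stages[s]);
        }
        printf("\n");
    }
    return 0;
}
```

The printout shows the pipeline filling for the first 5 cycles and then one instruction completing per cycle.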

EXAMPLE FOR PIPELINING


 One of the most popular ways of explaining how a pipeline works is the process of doing laundry. The
laundry process consists of three standalone operations performed on the clothes:

1. Washing
2. Drying
3. Folding

 The washing is performed using a washing machine, the drying using a dryer machine and the folding is
performed manually by the person doing the laundry.

 For the purpose of the example let’s say the washing takes 60 minutes, the drying takes 30 minutes and the
folding of the clothes takes 30 minutes.

 As each operation starts only after the previous one has finished, one load of laundry is completed in 2
hours.
Fig. 1 Laundry process without pipeline technique

 In Fig. 1 we can see the timeline of these operations.

 If we look closely at the process, one thing becomes obvious – once an operation is finished for
the current load of clothes, the hardware used stays idle and waits for the next load.
 For example, the washer machine is idle while drying and folding operations are performed.

 This is certainly not the most efficient way of doing things and one way to improve it is shown in
Fig. 2.

 There we can see the same process of doing laundry, but this time using a pipeline technique.
As soon as the washing is completed, we can put the clothes of the 2nd load into the washing
machine, so it keeps working while the drying and the folding of the 1st load are being executed.

Fig. 2 Laundry process with pipeline technique

 Based on the examples above, we can conclude that using the pipeline technique does not have an impact
on the time needed for completing a single load of laundry (it takes 2 hours).

 The improvement is visible when doing multiple loads. Without the pipeline, we can do two loads for a
total of 4 hours.

 Using the pipeline as shown in Fig. 2 we can do 3 loads in the same time frame. The speedup would
be even greater if all of the operations took the same time to complete, thus allowing better overlapping
in the pipeline. This is applied in microprocessor pipeline implementations.
SPEEDUP DUE TO PIPELINING

Sk = (n * k) / (k + (n - 1))

where n = number of instructions processed

and k = number of stages in the instruction pipeline.

Here n * k is the number of cycles needed without pipelining and k + (n - 1) is the number of cycles
needed with pipelining; as n becomes large, the speedup Sk approaches k.
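As a quick check on the formula, a small sketch (not from the notes; the instruction counts are illustrative) that evaluates the speedup for a five-stage pipeline:

```c
/* Minimal sketch (assumes an ideal pipeline with equal-length stages):
 * evaluating Sk = n*k / (k + (n-1)) for several instruction counts. */
#include <stdio.h>

static double pipeline_speedup(int n, int k) {
    /* non-pipelined time = n*k cycles, pipelined time = k + (n-1) cycles */
    return (double)(n * k) / (k + (n - 1));
}

int main(void) {
    int k = 5;                                   /* five pipeline stages */
    int counts[] = {3, 10, 100, 1000};           /* illustrative values of n */
    for (int i = 0; i < 4; i++)
        printf("n = %4d instructions: speedup = %.2f\n",
               counts[i], pipeline_speedup(counts[i], k));
    return 0;                                    /* speedup approaches k as n grows */
}
```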

PIPELINE HAZARD

 Any condition that causes a stall in the pipeline operations can be called a hazard.
 Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing
during its designated clock cycles.
 There are primarily three types of hazards:

i. Data Hazards

ii. Control Hazards or instruction Hazards

iii. Structural Hazards.

Data Hazards

 A data hazard is any condition in which either the source or the destination operands of an instruction
are not available at the time expected in the pipeline.

 As a result of which some operation has to be delayed and the pipeline stalls.

 A data hazard arises whenever there are two instructions, one of which depends on data produced by the
other. For example:

 A=3+A

 B=A*4

 For the above sequence, the second instruction needs the value of ‘A’ computed in the first
instruction.Thus the second instruction is said to depend on the first.

 If the execution is done in a pipelined processor, it is highly likely that the interleaving of these two
instructions can lead to incorrect results due to data dependency between the instructions. Thus the
pipeline needs to be stalled as and when necessary to avoid errors.
 There are three types of data dependencies: Read after Write (RAW), Write after Read (WAR),
Write after Write (WAW).

1) READ AFTER WRITE(RAW)

 It is also known as True dependency or Flow dependency. It occurs when the value produced
by an instruction is required by a subsequent instruction.

2) WRITE AFTER READ(WAR)

 It is also known as anti-dependency. This hazard occurs when an instruction writes to a register
that an earlier instruction has yet to read; the write must not happen before that read.

3) WRITE AFTER WRITE(WAW)

 It is also known as output dependency. This hazard occurs when an instruction writes to a register
that an earlier instruction also writes; the writes must complete in program order. The three cases
are illustrated below.
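A short sketch (illustrative variable names, not from the notes) showing the three dependency types on register-like C variables:

```c
/* Minimal sketch: RAW, WAR and WAW dependencies on variables r1, r2, r3
 * standing in for registers. */
#include <stdio.h>

int main(void) {
    int r1 = 1, r2 = 2, r3 = 3;

    /* RAW (true / flow dependency): the second statement reads r1,
     * which the first statement writes. */
    r1 = r2 + r3;       /* write r1 */
    r2 = r1 * 4;        /* read r1 - must see the new value */

    /* WAR (anti-dependency): the second statement writes r3,
     * which the first statement still needs to read. */
    r1 = r2 + r3;       /* read r3 */
    r3 = 10;            /* write r3 - must not happen before the read above */

    /* WAW (output dependency): both statements write r2; the final
     * value of r2 must come from the later write. */
    r2 = r1 + 1;        /* first write to r2 */
    r2 = r3 + 5;        /* second write to r2 - must finish last */

    printf("r1=%d r2=%d r3=%d\n", r1, r2, r3);   /* r1=23 r2=15 r3=10 */
    return 0;
}
```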

Structural Hazards

 This situation arises mainly when two instructions require a given hardware resource at the same time
and hence for one of the instructions the pipeline needs to be stalled.

 The most common case is when memory is accessed at the same time by two instructions.

 One instruction may need to access the memory as part of the Execute or Write back phase while other
instruction is being fetched.

 In this case, if both the instructions and data reside in the same memory, both instructions cannot
proceed together and one of them needs to be stalled till the other is done with the memory access.

 Thus in general sufficient hardware resources are needed for avoiding structural hazards.

Control hazards

 The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the
execution unit.

 The instructions fetched by the fetch unit are in consecutive memory locations and they are executed.

 However, the problem arises when one of the instructions is a branch to some other memory
location. All the instructions already fetched into the pipeline from the consecutive memory locations
are then invalid and need to be removed (this is called flushing the pipeline).
 This induces a stall till new instructions are again fetched from the memory address specified in the
branch instruction.

 Thus the time lost as a result of this is called a branch penalty.

 Often dedicated hardware is incorporated in the fetch unit to identify branch instructions and compute
branch target addresses as early as possible, thereby reducing the resulting delay.

Superscalar Processors

 Superscalar processors issue more than one instruction per clock cycle.
 A superscalar processor has multiple processing units, so several instructions can be handled in parallel
in each processing stage.
 With this arrangement, several instructions start execution in the same clock cycle and the process
is said to use multiple issue.
 Such processors are capable of achieving an instruction execution throughput of more than one
instruction per cycle. They are known as ‘Superscalar Processors’.
 Superscalar Processors execute multiple instructions within the implemented Executing Units in the
Execution Stage, providing true parallelism within the CPU.
 In a superscalar design, the processor must read the instructions from memory and decide which ones
can be run in parallel, dispatching them to the available Executing Units.
 The Superscalar Processor can be envisioned as having multiple parallel pipelines.

 In a superscalar processor the entire processor pipeline, or parts of it, is replicated.
 Superscalar execution is implemented in the Pentium.
 There are two integer execution units.
 Two instructions can be executed at a time.
 That is, two integer instructions are fetched and decoded in parallel.

Multicore processing

 A multicore processor contains several processing units, called “cores”, on one chip, and each core
can perform a different task.
 For example, if you are doing multiple tasks at the same time, such as using WhatsApp and watching a
movie, one core can handle the WhatsApp activity while another core handles the movie playback.
 The multicore architecture allows communication between all the available cores; the processing tasks
are split among them and assigned appropriately.
 When all the processing tasks are done, the processed data from every core is sent back to the rest of
the system through a single shared gateway.
 This technique improves overall performance compared to a single-core processor. A minimal sketch of
splitting one task across cores is shown below.
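A minimal sketch (assumes a POSIX system with pthreads; the function names and workload are illustrative) of splitting a single task across two threads so that a multicore CPU can run them on separate cores:

```c
/* Minimal sketch: summing an array with two threads, one per half,
 * so the halves can run on different cores of a multicore CPU. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000

static long data[N];

struct slice { int lo, hi; long sum; };

static void *partial_sum(void *arg) {
    struct slice *s = arg;
    s->sum = 0;
    for (int i = s->lo; i < s->hi; i++)   /* each thread sums only its own slice */
        s->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1;

    struct slice halves[2] = { {0, N / 2, 0}, {N / 2, N, 0} };
    pthread_t t[2];

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, partial_sum, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total = %ld\n", halves[0].sum + halves[1].sum);
    return 0;
}
```

Compile with `cc -pthread`; the operating system is free to schedule the two threads on different cores.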

Advantages of Multi-Core Processor

 It can perform more tasks than a single-core processor.


 It can perform multiple tasks simultaneously while running at a lower clock frequency.
 It can process larger amounts of data than a single-core processor.
 It consumes less energy while completing multiple tasks at the same time.

 It exploits thread-level parallelism across cores (and each core can still use instruction-level
parallelism internally).

Major issues in Multicore Processing

 Power and temperature issues


 Level of parallelism
 Interconnect issues
 Cache coherence

Power and temperature issues

 Power consumption is a function of the number of transistors on a chip. When more cores are added,
the transistor count increases, which increases the power consumption and the heat produced.
 To reduce unnecessary power consumption, the multicore design also has to include a separate
power management unit that can control unnecessary wastage of power.

Level of parallelism
 Performance is directly related to the amount of parallelism: the greater the number of processes
that can be executed simultaneously, the greater the parallelism.
 The success of multicore technology strongly depends on the way the algorithms are written.
 If the algorithms written are not compatible with the multicore design, the process will run on one of
the cores, while other cores will sit idle.
Interconnect issues

 Since there are so many components on chip in a multicore processor, such as cores, caches and network
controllers, the interaction between them can affect performance if the interconnection issues are
not resolved properly.

Cache coherence

 In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same
level of the memory hierarchy. For example, the cache and the main memory may have inconsistent
copies of the same object.
 As multiple processors operate in parallel and independently, multiple caches may possess different
copies of the same memory block; this creates the cache coherence problem. Cache coherence
schemes help to avoid this problem by maintaining a uniform state for each cached block of data.

 Let X be an element of shared data which has been referenced by two processors, P1 and P2.
 In the beginning, the three copies of X (in main memory and in the two caches) are consistent. If
processor P1 writes a new value X1 into its cache using a write-through policy, the same value is
written immediately into the shared memory, but the copy in P2's cache becomes stale.
 When a write-back policy is used, even the main memory is not updated until the modified data in the
cache is replaced or invalidated, so both the main memory and P2's copy are inconsistent. A toy
model of this situation is sketched below.
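Below is a toy model (an illustrative sketch, not a real coherence protocol) of the situation above: two private caches hold copies of X, a write-through write by P1 leaves P2's copy stale, and adding a snooping-style invalidation restores coherence:

```c
/* Toy model of the coherence problem: two private caches holding copies
 * of one shared variable X. Names and values are illustrative only. */
#include <stdio.h>
#include <stdbool.h>

struct cache_line { int value; bool valid; };

static int memory_X = 5;                       /* main-memory copy of X */
static struct cache_line p1 = {5, true};       /* P1's cached copy */
static struct cache_line p2 = {5, true};       /* P2's cached copy */

/* P1 writes X with a write-through policy: its cache and main memory are
 * updated, but P2's copy is untouched and becomes stale. */
static void p1_write_through(int new_value) {
    p1.value = new_value;
    memory_X = new_value;
}

/* Snooping-style fix: the write also invalidates the other cache's copy,
 * forcing P2 to re-read X from memory on its next access. */
static void p1_write_with_invalidate(int new_value) {
    p1_write_through(new_value);
    p2.valid = false;
}

static int p2_read(void) {
    if (!p2.valid) { p2.value = memory_X; p2.valid = true; }   /* miss: refill */
    return p2.value;
}

int main(void) {
    p1_write_through(7);
    printf("after plain write-through, P2 reads %d (stale)\n", p2_read());

    p1_write_with_invalidate(9);
    printf("after write + invalidate,  P2 reads %d (coherent)\n", p2_read());
    return 0;
}
```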
MMX Technology

 MMX is a Pentium microprocessor from Intel that is designed to run faster when playing multimedia
applications. According to Intel, a PC with an MMX microprocessor runs a multimedia application up
to 60% faster than one with a microprocessor having the same clock speed but without MMX.
 In addition, an MMX microprocessor runs other applications about 10% faster, largely because of
the increased on-chip cache.
 All of these enhancements are made while preserving compatibility with software and operating
systems developed for the Intel Architecture.
 The MMX technology consists of several improvements over the non-MMX Pentium microprocessor:

1. 57 new microprocessor instructions have been added that are designed to handle video, audio, and
graphical data more efficiently. Programs can use MMX instructions without changing to a new
mode or operating-system visible state.
2. New 64-bit integer data type (Quadword).
3. A new process, Single Instruction Multiple Data (SIMD), makes it possible for one instruction to
perform the same operation on multiple data items (a sketch of the SIMD idea follows this list).
4. The memory cache on the microprocessor has been increased to 32 KB, meaning fewer accesses to
memory that is off the microprocessor.
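The SIMD idea can be illustrated with a short example. The sketch below uses the later SSE intrinsics rather than the original MMX instructions (an assumption made for convenience, since modern compilers handle SSE more readily); one intrinsic performs four additions with a single packed instruction:

```c
/* Minimal SIMD sketch using SSE intrinsics: one _mm_add_ps call adds
 * four pairs of floats at once (the data values are illustrative). */
#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats into one 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* single instruction, 4 additions */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}
```

On x86-64 compilers SSE is enabled by default, so this compiles without extra flags.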

Hyper-threading

 Hyper Threading is a technology designed to increase the performance of the CPU. It allows multiple
threads to run on each core to make the CPU run efficiently.
 It increases the amount of work performed by the CPU within a unit time.
 A core is the execution unit of the CPU.
 Initially, there was only one core in the CPU. Later, manufacturers added more cores to the CPU to
increase the number of instructions executed by the CPU at a time.
 Hyper threading is a mechanism to increase the performance of the CPU further.
 It makes the operating system recognize each physical core as two virtual or logical cores.
 In other words, it virtually increases the number of cores in a CPU, so a single physical core runs two
threads. Hyper-threading does not really increase the number of cores; it only increases them virtually or
logically. Each virtual core can work independently. The number of logical processors visible to the
operating system can be queried as sketched below.
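A minimal sketch (assumes a POSIX system such as Linux) of querying the logical processor count that hyper-threading exposes to the operating system:

```c
/* Minimal sketch: on a hyper-threaded CPU the count of online logical
 * processors is typically twice the number of physical cores. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    long logical = sysconf(_SC_NPROCESSORS_ONLN);   /* online logical CPUs */
    printf("logical processors visible to the OS: %ld\n", logical);
    return 0;
}
```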
PREVIOUS YEAR QUESTIONS FROM FOURTH MODULE

PART A (2 marks)

1. What is hyperthreading?
2. State the term pipelining.
3. Write any two features of the Pentium.
4. Write any four features of Pentium processor.
5. List pipeline hazards.

PART B (6 marks)

1. Compare execution of instruction in unpipelined and pipelined processor with diagram.


2. Describe any two major issues in multicore processing.
3. Compare real mode and protected mode operation of 8086.
4. Explain three types of pipeline hazards.
5. Explain super scalar processors with suitable diagram.
6. How does MMX make computation faster for media data?

PART C (15 marks)

1. Draw the execution unit of Pentium processor and explain. (8 marks)


2. Explain the features of Pentium processor. (7 marks)
3. Distinguish real mode and protected modes of 80386. (8 marks)
4. Describe MMX technology. (7 marks)
5. Explain the superscalar architecture with suitable diagram. (8 marks)
6. Give notes on hyper threading. (7 marks)
7. Give issues in multicore processing. (7 marks)
8. Draw the diagram of a multicore processor and explain multicore processing concept. (8 marks)
9. Draw and explain five stage pipeline. (7 marks)
10. Explain the features of 80386. (8 marks)
11. What are pipeline hazards? (8 marks)
12. Write the enhanced features of 80386 compared to 8086. (9 marks)
