
Pentium Processor

Features of Pentium
• Introduced in 1993 with clock frequency
ranging from 60 to 66 MHz
• The primary changes in Pentium Processor were:
– Superscalar Architecture
– Dynamic Branch Prediction
– Pipelined Floating-Point Unit
– Separate 8K Code and Data Caches
– Writeback MESI Protocol in the Data Cache
– 64-Bit Data Bus
– Bus Cycle Pipelining
Pentium Architecture
• It has a 64-bit data bus and a 32-bit address bus
• There are two separate 8kB caches – one for
code and one for data.
• Each cache has a separate address
translation TLB which translates linear
addresses to physical.
• Code Cache:
– 2 way set associative cache
– 256 lines between the code cache and the prefetch
buffers, permitting prefetching of 32 bytes
(256/8) of instructions at a time
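The cache figures above can be turned into an address breakdown. The sketch below is illustrative only: it assumes the cache line is the same 32 bytes as the prefetch width, and the names (split_address, SETS and so on) are made up for this example.

```python
# Sketch: geometry of the 8 KB, 2-way set associative code cache.
# Assumption: one cache line = 32 bytes (the 256-bit prefetch width).
CACHE_BYTES = 8 * 1024      # 8 KB code cache
WAYS = 2                    # 2-way set associative
LINE_BYTES = 256 // 8       # 256 lines -> 32 bytes
ADDRESS_BITS = 32           # 32-bit address bus

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)        # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1        # byte within a line
INDEX_BITS = SETS.bit_length() - 1               # set selector
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS


def split_address(addr):
    """Split a 32-bit physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under these assumptions the cache has 128 sets, so an address splits into a 20-bit tag, a 7-bit set index and a 5-bit byte offset.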
Pentium Architecture
• Prefetch Buffers:
▫ Four prefetch buffers within the processor work as
two independent pairs.
 When instructions are prefetched from cache, they are
placed into one set of prefetch buffers.
 The other set is used when a branch
operation is predicted.
▫ Prefetch buffer sends a pair of instructions to
instruction decoder
• Instruction Decode Unit:
▫ It occurs in two stages – Decode1 (D1) and
Decode2(D2)
▫ D1 checks whether instructions can
be paired
▫ D2 calculates the addresses of
memory-resident operands
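A rough idea of the D1 pairing check can be sketched in code. This is a simplified model, not the full rule set: it assumes two instructions may be paired only when the second neither reads nor writes a register written by the first. The real pairing rules have further conditions that are not modelled here, and can_pair and the dictionary layout are hypothetical.

```python
# Sketch: simplified register-dependency test for pairing in D1.
# Assumption: only register read-after-write and write-after-write
# conflicts are checked; all other pairing conditions are ignored.
def can_pair(u_instr, v_instr):
    """Return True if v_instr has no register conflict with u_instr."""
    read_after_write = u_instr["writes"] & v_instr["reads"]
    write_after_write = u_instr["writes"] & v_instr["writes"]
    return not read_after_write and not write_after_write


# Hypothetical encodings of three simple instructions.
mov_eax_1 = {"reads": set(), "writes": {"EAX"}}           # MOV EAX, 1
mov_ebx_2 = {"reads": set(), "writes": {"EBX"}}           # MOV EBX, 2
add_ebx_eax = {"reads": {"EBX", "EAX"}, "writes": {"EBX"}}  # ADD EBX, EAX
```

Here MOV EAX, 1 followed by MOV EBX, 2 passes the check, while MOV EAX, 1 followed by ADD EBX, EAX does not, because the second instruction needs the EAX value the first one is still producing.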
Pentium Architecture
• Control Unit :
▫ This unit interprets the instruction word and
microcode entry point fed to it by Instruction
Decode Unit
▫ It handles exceptions, breakpoints and
interrupts.
▫ It controls the integer pipelines and floating point
sequences
• Microcode ROM :
▫ Stores microcode sequences
• Arithmetic/Logic Units (ALUs) :
▫ There are two parallel integer instruction pipelines:
u-pipeline and v-pipeline
▫ The u-pipeline has a barrel shifter
▫ The two ALUs perform the arithmetic and logical
operations specified by their instructions in their
respective pipeline
Pentium Registers
• Four 32-bit registers can be used as
∗ Four 32-bit registers (EAX, EBX, ECX, EDX)
∗ Four 16-bit registers (AX, BX, CX, DX)
∗ Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL)
• Some registers have special use
∗ ECX for count in loop instructions
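The 32-, 16- and 8-bit names refer to overlapping parts of the same register. The sketch below shows the overlap for EAX; the helper names are invented for this illustration.

```python
# Sketch: how AX, AH and AL overlay the 32-bit EAX register.
def sub_registers(eax):
    """Return the views of one 32-bit value as EAX, AX, AH and AL."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF            # low 16 bits
    return {"EAX": eax, "AX": ax, "AH": ax >> 8, "AL": ax & 0xFF}


def write_al(eax, value):
    """Writing AL changes only the low byte of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)
```

For example, with EAX = 12345678h, AX is 5678h, AH is 56h and AL is 78h. The same layout applies to EBX, ECX and EDX.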
Pentium Registers (Eflags)

• Flags never change for any data transfer or program control operation.
• Some of the flags are also used to control features found in the
microprocessor.
• Flag bits, with a brief description of function.
• C (carry) holds the carry after addition or
borrow after subtraction.
▫ also indicates error conditions
• P (parity) is the count of ones in a number
expressed as even or odd. Logic 0 for odd parity;
logic 1 for even parity.
▫ if a number contains three binary one bits, it has
odd parity
▫ if a number contains no one bits, it has even
parity
• A (auxiliary carry) holds the carry (half-
carry) after addition or the borrow after
subtraction between bit positions 3 and 4 of the
result.
• Z (zero) shows that the result of an arithmetic
or logic operation is zero.
• S (sign) flag holds the arithmetic sign of the result
after an arithmetic or logic instruction executes.
• T (trap) The trap flag enables trapping through
an on-chip debugging feature.
• I (interrupt) controls operation of the INTR
(interrupt request) input pin.
• D (direction) selects increment or
decrement mode for the DI and/or SI registers.
• O (overflow) occurs when signed numbers
are added or subtracted.
▫ an overflow indicates the result has exceeded
the capacity of the machine
• IOPL used in protected mode operation
to select the privilege level for I/O devices.
• NT (nested task) flag indicates the current
task is nested within another task in protected
mode operation.
• RF (resume) used with debugging to
control resumption of execution after the next
instruction.
• VM (virtual mode) flag bit selects virtual
mode operation in a protected mode system
• AC (alignment check) flag bit activates if a word
or doubleword is addressed on a non-word or
non-doubleword boundary.
• VIF (virtual interrupt) is a copy of the interrupt flag bit available to the
Pentium through Pentium 4.
• VIP (virtual interrupt pending) provides information about a virtual
mode interrupt for the Pentium.
▫ used in multitasking environments to provide virtual
interrupt flags
• ID (identification) flag indicates that
the Pentium microprocessors support the
CPUID instruction.
▫ CPUID instruction provides the system with
information about the Pentium microprocessor
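The arithmetic flags described above can be illustrated with an 8-bit addition. This is a minimal sketch with an invented helper name (add8_flags). Note that on x86 the parity flag reflects only the low 8 bits of the result, which is all an 8-bit add produces.

```python
# Sketch: C, P, A, Z, S and O flags after an 8-bit ADD.
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "C": int(total > 0xFF),                       # carry out of bit 7
        "P": int(bin(result).count("1") % 2 == 0),    # 1 = even parity
        "A": int((a & 0xF) + (b & 0xF) > 0xF),        # carry from bit 3 to 4
        "Z": int(result == 0),                        # result is zero
        "S": result >> 7,                             # copy of the sign bit
        # Overflow: both operands have the same sign and the result differs.
        "O": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags
```

Adding 7Fh and 01h gives 80h with S = 1 and O = 1 (the signed result no longer fits), while adding FFh and 01h gives 00h with C = 1 and Z = 1.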
Control Registers
• CD (cache disable) controls the internal cache. If
CD=1, the cache will not fill with new data. If CD=0,
misses will cause the cache to fill with new data.
• NW (not write-through) selects the mode of operation
for the data cache. If NW=1, the data cache is
inhibited from cache write-through.
• AM (alignment mask) enables alignment checking
when set; checking only occurs in protected mode.
• WP (write protect) protects user-level pages against
supervisor-level write operations. When WP=1, the
supervisor cannot write to read-only user-level pages.
• NE (numeric error) enables standard numeric
coprocessor error detection.
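These bits live in control register CR0. The sketch below decodes them from a raw CR0 value. The bit positions are not given on the slide; they follow the standard Intel CR0 layout, and decode_cr0 is a name made up for this example.

```python
# Sketch: extract the CR0 bits discussed above from a 32-bit value.
# Bit positions follow the standard Intel CR0 layout (not on the slide).
CR0_BITS = {"NE": 5, "WP": 16, "AM": 18, "NW": 29, "CD": 30}


def decode_cr0(value):
    """Return each named CR0 bit as 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}
```

For instance, decode_cr0(0x60000010) reports CD = 1 and NW = 1 with the other three bits clear.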
Pin Diagram
• CLOCK
▫ CLK - Clock (Input)
 Fundamental Timing for the Pentium
 The CPU uses this signal as the internal processor
clock.
▫ BF - Bus Frequency (Input)
 Bus Frequency determines the bus-to-core frequency
ratio.
 When BF is strapped to Vcc, the processor will
operate at a 2/3 bus-to-core frequency ratio.
 When BF is strapped to Vss, the processor will
operate at a 1/2 bus-to-core frequency ratio.
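The effect of the BF strap can be worked out numerically. The sketch below derives the core clock from the bus clock using the two ratios above; the function name and the example bus speeds are illustrative assumptions.

```python
# Sketch: core frequency implied by the BF strap.
from fractions import Fraction

# Bus-to-core ratios from the slide: Vcc -> 2/3, Vss -> 1/2.
BUS_TO_CORE = {"Vcc": Fraction(2, 3), "Vss": Fraction(1, 2)}


def core_mhz(bus_mhz, bf_strap):
    """Core clock = bus clock divided by the bus-to-core ratio."""
    return Fraction(bus_mhz) / BUS_TO_CORE[bf_strap]
```

With a 60 MHz bus, strapping BF to Vcc gives a 90 MHz core; with a 50 MHz bus, strapping BF to Vss gives a 100 MHz core.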
• Initialization
▫ RESET - (Input)
 Forces the CPU to begin execution at a known state.
▫ INIT - Initialization (Input)
 The Pentium processor initialization input pin forces
the Pentium processor to begin execution in a
known state.
 The processor state after INIT is the same as the
state after RESET except that the internal caches,
write buffers, and floating point registers retain the
values they had prior to INIT.
• Address Bus
▫ A31:A3 - ADDRESS bus lines
 Output except for cache snooping
▫ The number of address lines determines the
amount of memory supported by the processor.
▫ Determines where in the 4GB memory space or
64K IO space the processor is accessing.
▫ These are input lines when AHOLD & EADS# are
active for Inquire Cycles (snooping)
• Address Bus (Cont.)
▫ BE7#:BE0#: Byte Enable lines (Outputs)
▫ Byte Enables to enable each of the 8 bytes in the
64-bit data path.
 Helps define the physical area of memory or I/O
accessed.
 The Pentium uses Byte Enables to address locations
within a QWORD.
 In effect a decode of the address lines A2-A0 which
the Pentium does not generate.
 Which lines go active depends on the address, and
whether a byte, word, double word or quad word is
required.
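The relationship between an address, the transfer size and the byte enables can be sketched as follows. The helper name is hypothetical, and the sketch simply rejects accesses that cross a quadword boundary, since those need more than one bus cycle.

```python
# Sketch: derive A31:A3 and BE7#:BE0# for one aligned-enough access.
def byte_enables(address, size):
    """Return (A31:A3 value, BE7#..BE0# pin levels) for the access.

    The pin levels are active low: a 0 bit means that byte lane is enabled.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    lane = address & 0x7                  # the A2-A0 bits that are not driven
    if lane + size > 8:
        raise ValueError("access crosses a quadword boundary")
    active = ((1 << size) - 1) << lane    # 1 bits mark the lanes in use
    pins = ~active & 0xFF                 # invert for active-low pins
    return address >> 3, pins
```

A byte read at 1003h drives only BE3# low (pins = F7h), and a doubleword at 1004h drives BE7#:BE4# low (pins = 0Fh). Both use the same A31:A3 value.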
• Address Mask
▫ A20M#: Address 20 Mask (Input)
 Emulates the address wraparound at 1 MByte which
occurs on the 8086.
 When A20M# is asserted, the Pentium processor
masks physical address bit 20 (A20) before
performing a lookup to the internal caches or driving
a memory cycle on the bus.
 A20M# must be asserted only when the processor is
in real mode.
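The wraparound that A20M# emulates is easy to see with a real-mode address calculation. The following is a minimal sketch; real_mode_physical is an invented name.

```python
# Sketch: effect of A20M# on a real-mode address.
def real_mode_physical(segment, offset, a20m_asserted):
    """Compute segment*16 + offset, masking bit 20 when A20M# is asserted."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if a20m_asserted:
        linear &= ~(1 << 20)      # force A20 to 0, as on the 8086
    return linear
```

FFFF:FFFF computes to 10FFEFh. With A20M# asserted, bit 20 is cleared and the access wraps to 00FFEFh, matching 8086 behaviour.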
• Internal Parity
▫ IERR# - Internal Error (Output)
 Alerts System of Internal Parity Errors
• Address Parity
▫ AP Address Parity (I/O)
 Bi-directional address parity pin for the address lines.
 Address Parity is driven by the Pentium processor with
even parity information on all CPU generated cycles in
the same clock that the address is driven
 Even parity must be driven back to the CPU during
inquire cycles on this pin in the same clock as EADS#.
 Not supported on all systems
▫ APCHK#: Address Parity Check Signal (Output)
 The status of the address parity check is driven on the
APCHK# output.
 Even Parity Checking
• Data Bus
▫ D63:D0 - Data Lines (I/O).
 The bi-directional 64-bit data path to or from the
CPU.
 The signal W/R# distinguishes direction.
 During reads, the CPU samples the data bus when
BRDY# is asserted.
▫ DP7:DP0 - Data Parity (I/O)
 Bi-directional data parity pins for the data bus.
 Even Parity Check. One for each byte of the data
bus
 Output on writes, Input on reads.
 Not supported on all systems.
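Per-byte even parity can be sketched as below. The sketch assumes the usual meaning of even parity: each DP bit is chosen so that its data byte plus the parity bit contain an even number of ones. The function names are illustrative.

```python
# Sketch: even parity bits DP7:DP0 for a 64-bit data value.
def data_parity(qword):
    """Return an 8-bit value whose bit n is the parity bit for byte n."""
    dp = 0
    for lane in range(8):
        byte = (qword >> (8 * lane)) & 0xFF
        ones = bin(byte).count("1")
        dp |= (ones & 1) << lane      # 1 only when the byte has odd ones
    return dp


def parity_ok(qword, dp):
    """Check received data against the received parity bits."""
    return data_parity(qword) == dp
```

On a write the processor would drive data_parity(data) on DP7:DP0; on a read the same calculation checks what the system returned.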
• Bus Control
▫ ADS# - Address Strobe (output)
 Indicates that a new valid bus cycle is currently
being driven by the Pentium processor.
 The following are some of the signals which are valid
when ADS#=0
 Addresses (A31:3)
 Byte Enables (BE7#:0#)
 Bus Cycle definition (M/IO#; D/C#; W/R#, CACHE#)
 From power-on the ADS# signal should be asserted
periodically when bus cycles are running
• Bus Control (Cont.)
▫ BRDY# - Burst Ready (Input)
 Transfer complete indication.
 The burst ready input indicates that the external system
has presented data on the data pins in response to a read
or that the external system has accepted the Pentium
processor data in response to a write request.
 This signal ends the current bus cycle and is used to
extend bus cycles to allow slow devices extra time.
 If LOW (non-burst cycles), this signal ends the
current bus cycle and the next bus cycle can
begin.
 If HIGH the Pentium is prevented from continuing
processing and wait states are added.
• Bus Cycle Definition
▫ M/IO# - Memory or Input/Output (output)
 M/IO# distinguishes between Memory and I/O
cycles.
 The memory/input-output is one of the
primary bus cycle definition pins.
 1 = Memory Cycle
 0 = Input/Output Cycle
 It is driven valid in the same clock as the ADS#
signal is asserted.
• Bus Cycle Definition (Cont.)
▫ D/C# - Data or Code (output)
 D/C# distinguishes between data and code or special
cycles (control)
 The data/code output is one of the primary bus cycle
definition pins.
 1 = Data
 0 = Code / Control
»Control for Interrupt Acknowledge or Special Cycles
 It is driven valid in the same clock as the ADS#
signal is asserted.
• Bus Cycle Definition (Cont.)
▫ W/R# - Write or Read (output)
 W/R# distinguishes between Write and Read cycles.
 Write/read is one of the primary bus cycle
definition pins.
 1 = Write
 0 = Read
 It is driven valid in the same clock as the ADS#
signal is asserted.
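Taken together, M/IO#, D/C# and W/R# identify the type of bus cycle. The lookup below is a sketch of that decode. The table follows the customary Intel encoding and should be checked against the data sheet; bus_cycle is a made-up helper name.

```python
# Sketch: decode the three primary bus cycle definition pins.
# Keys are (M/IO#, D/C#, W/R#) sampled while ADS# is asserted.
BUS_CYCLES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def bus_cycle(m_io, d_c, w_r):
    """Name the bus cycle for the given pin levels (1 = high, 0 = low)."""
    return BUS_CYCLES[(m_io, d_c, w_r)]
```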
• Bus Cycle Definition (Cont.)
▫ CACHE# - Cacheability (output)
 Processor indication of internal cacheability.
 The L1 cache must be enabled using the CD bit in
CR0 for CACHE# to be asserted low.
 The CACHE# signal could also be described as the
BURST instruction signal, because the CACHE# signal
(qualified with KEN#) results in a burst mode
transfer of 32 bytes of code or data.
 CACHE# and KEN# are used together to determine if
a read will be turned into a linefill (burst cycle).
 During write-back cycles, the CPU asserts the
CACHE# signal (KEN# does not have to be asserted)
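The burst decision described above can be sketched as a small function. It is a simplified model that covers only the two cases on the slide: reads qualified by KEN#, and write-backs. The names are hypothetical.

```python
# Sketch: number of 64-bit transfers in a bus cycle.
LINE_BYTES = 32     # burst transfers one 32-byte line
BUS_BYTES = 8       # 64-bit data bus


def transfers(is_write, cache_n, ken_n):
    """Return 4 for a burst (linefill or write-back), otherwise 1.

    cache_n and ken_n are pin levels; 0 means the active-low signal
    is asserted.
    """
    cache_asserted = cache_n == 0
    ken_asserted = ken_n == 0
    if is_write:
        burst = cache_asserted                     # write-back, KEN# ignored
    else:
        burst = cache_asserted and ken_asserted    # cacheable linefill
    return LINE_BYTES // BUS_BYTES if burst else 1
```

A read becomes a four-transfer linefill only when both CACHE# and KEN# are low. If either is high the read stays a single transfer.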
• Bus Cycle Definition (Cont.)
▫ NA# - Next Address (Input)
 Indicates external memory is prepared for a pipeline
cycle.
 An active next address input indicates that the
external memory system is ready to accept a new bus
cycle although all data transfers for the current cycle
have not yet completed.
 When NA# is asserted, the Pentium supplies the
address for the start of the next transfer early, so
that the memory system can latch the new address
before the transfer is ready to start.
 A detailed discussion of Address Pipelining is
beyond the scope of this course.
• Bus Cycle Definition (Cont.)
▫ LOCK# - Bus Lock (Output)
 The bus lock pin indicates that the current bus cycle is
locked, typically for a read-modify-write operation.
 The CPU will not allow a bus hold when LOCK# is
asserted.
 Locked cycles are generated when the programmer
prefixes certain instructions with the LOCK prefix.
 e.g. LOCK INC DWORD PTR [EDI] ; increment a memory location
 Locked cycles are generated automatically for certain bus
transfer operations.
 Interrupt Acknowledge cycles
 The XCHG instructions when 1 operand is memory-based.
 See Pentium manual for more details.
• Cache Control
▫ KEN# - Cache Enable (Input)
 Indicates to the Pentium whether or not the system
can support a cache line fill for the current cycle.
 CACHE# and KEN# are used together to determine if
a read will be turned into a linefill (burst cycle).
▫ WB/WT# - Write-back/Write-through
(Input)
 This pin allows a cache line to be defined as write-back
or write-through on a line-by-line basis.
• Bus Arbitration
▫ HOLD - Bus Hold (Input)
 Allows another bus master complete control of the
CPU bus.
 In response to the bus hold request, the Pentium
processor will float most of its output and
input/output pins and assert HLDA after
completing all outstanding bus cycles.
 The Pentium processor will maintain its bus in this
state until HOLD is de-asserted.
▫ HLDA - Bus Hold Acknowledge (Output)
 External indication that the Pentium™ outputs are
floated.
• Bus Arbitration (Cont.)
▫ BOFF# - Backoff (Input)
 Forces the Pentium to get off the bus in the next
clock.
 After BOFF# is removed, the Pentium restarts the
bus cycle.
▫ BREQ - Bus Request (output)
 Indicates externally when a bus cycle is pending
internally.
 Used to inform the arbitration logic that the Pentium
needs control of the bus to perform a bus cycle.
• Interrupts
▫ INTR - Maskable Interrupt (Input)
 Indicates that an external interrupt has been
generated.
 If the IF (Interrupt Enable Flag) bit in the EFLAGS
register is set, the Pentium processor will generate
two locked interrupt acknowledge bus cycles (to get the
type number) and vector to an interrupt handler
after the current instruction execution is completed.
▫ NMI - Non-Maskable Interrupt (Input)
 Indicates that an external non maskable interrupt
has been generated.
 The Pentium processor will vector to a Type 2
interrupt handler after the current
instruction execution is completed
• Probe Mode
▫ R/S# - Run/Stop (Input)
 The run/stop input is an asynchronous, edge-
sensitive interrupt used to stop the normal execution
of the processor and place it into an idle state.
▫ PRDY - Probe Ready (Output)
 The probe ready output pin indicates that the
processor has stopped normal execution in response
to the R/S# pin going active. The CPU enters
Probe Mode.
What is Superscalar?
• Common instructions (arithmetic, load/store,
conditional branch) can be initiated and
executed independently
• Equally applicable to RISC & CISC
• In practice usually RISC
General Superscalar Organization
Superpipelined
• Many pipeline stages need less than half a clock
cycle
• Double internal clock speed gets two tasks per
external clock cycle
• Superscalar allows parallel fetch and execute
Superscalar v Superpipeline
Limitations
• Instruction level parallelism
• Compiler based optimisation
• Hardware techniques
• Limited by
▫ True data dependency
▫ Procedural dependency
▫ Resource conflicts
▫ Output dependency
▫ Antidependency
True Data Dependency
• ADD r1, r2 (r1 := r1+r2;)
• MOVE r3,r1 (r3 := r1;)
• Can fetch and decode second instruction in
parallel with first
• Cannot execute second instruction until first is
finished
Procedural Dependency
• Can not execute instructions after a branch in
parallel with instructions before a branch
• Also, if instruction length is not fixed,
instructions have to be decoded to find out how
many fetches are needed
• This prevents simultaneous fetches
Resource Conflict
• Two or more instructions requiring access to the
same resource at the same time
▫ e.g. two arithmetic instructions
• Can duplicate resources
▫ e.g. have two arithmetic units
Output Dependency
• Write-write dependency
▫ R3:=R3 + R5; (I1)
▫ R4:=R3 + 1; (I2)
▫ R3:=R5 + 1; (I3)
▫ R7:=R3 + R4; (I4)
In the above instruction sequence, I2 cannot be
executed before I1 because of a true data dependency on R3,
and similarly I4 cannot be executed before I3.
I1 and I3 both write R3. If I3 completes before I1,
R3 is left holding the wrong value when I4 fetches it;
this is referred to as an output dependency.
Antidependency
• Read-write dependency (write after read)
▫ R3:=R3 + R5; (I1)
▫ R4:=R3 + 1; (I2)
▫ R3:=R5 + 1; (I3)
▫ R7:=R3 + R4; (I4)
▫ I3 can not complete before I2 starts as I2 needs a
value in R3 and I3 changes R3
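The three register dependencies can be detected mechanically from each instruction's read and write sets. The sketch below applies this to the I1 to I4 sequence used above. The classify helper and the RAW/WAR/WAW labels (true, anti and output dependency) are naming choices for this example.

```python
# Sketch: classify dependencies between an earlier and a later instruction.
def classify(earlier, later):
    """Return the set of dependency kinds from earlier to later."""
    kinds = set()
    if earlier["writes"] & later["reads"]:
        kinds.add("RAW")    # true data dependency
    if earlier["reads"] & later["writes"]:
        kinds.add("WAR")    # antidependency
    if earlier["writes"] & later["writes"]:
        kinds.add("WAW")    # output dependency
    return kinds


# The four-instruction sequence from the slides.
I1 = {"writes": {"R3"}, "reads": {"R3", "R5"}}   # R3 := R3 + R5
I2 = {"writes": {"R4"}, "reads": {"R3"}}         # R4 := R3 + 1
I3 = {"writes": {"R3"}, "reads": {"R5"}}         # R3 := R5 + 1
I4 = {"writes": {"R7"}, "reads": {"R3", "R4"}}   # R7 := R3 + R4
```

The check reports a true dependency from I1 to I2 and from I3 to I4, an antidependency from I2 to I3, and an output dependency from I1 to I3. I1 also reads R3, so that pair has an antidependency as well.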
Design Issues
• Instruction level parallelism
▫ Instructions in a sequence are independent
▫ Execution can be overlapped
▫ Governed by data and procedural
dependency
• Machine Parallelism
▫ Ability to take advantage of instruction level
parallelism
▫ Governed by number of parallel pipelines
Instruction Issue Policy
• Order in which instructions are fetched
• Order in which instructions are executed
• Order in which instructions change registers and
memory
In-Order Issue In-Order Completion
• Issue instructions in the order they occur
• Not very efficient
• May fetch >1 instruction
• Instructions must stall if necessary
In-Order Issue Out-of-Order Completion

• If an instruction is independent of the current
instruction, it is allowed to
execute before completion of the current
instruction
Out-of-Order Issue Out-of-Order Completion
• Decouple decode pipeline from execution pipeline
• Can continue to fetch and decode until this pipeline is
full
• When a functional unit becomes available an instruction
can be executed
• Since instructions have been decoded, processor can
look ahead
Register Renaming
• Output and antidependencies occur because
register contents may not reflect the correct
ordering from the program
• May result in a pipeline stall
• Registers allocated dynamically
▫ i.e. registers are not specifically named
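Renaming can be sketched by giving every register write a fresh version number. This is a minimal illustration rather than a description of any particular processor's hardware. The rename function and the (register, version) naming are assumptions, and the program is the I1 to I4 sequence from the dependency slides.

```python
# Sketch: register renaming by versioning each destination register.
def rename(program):
    """Rewrite (dest, sources) tuples so every write gets a new name."""
    version = {}

    def current(reg):
        # Version 0 stands for the value the register held on entry.
        return (reg, version.get(reg, 0))

    renamed = []
    for dest, sources in program:
        srcs = tuple(current(r) for r in sources)   # read the newest versions
        version[dest] = version.get(dest, 0) + 1    # allocate a fresh name
        renamed.append((current(dest), srcs))
    return renamed


PROGRAM = [
    ("R3", ("R3", "R5")),   # I1: R3 := R3 + R5
    ("R4", ("R3",)),        # I2: R4 := R3 + 1
    ("R3", ("R5",)),        # I3: R3 := R5 + 1
    ("R7", ("R3", "R4")),   # I4: R7 := R3 + R4
]
```

After renaming, I1 writes (R3, 1) and I3 writes (R3, 2), so the two writes no longer collide and I3 no longer overwrites the value I2 reads. Only the true dependencies remain.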
Superscalar Execution
Superscalar Implementation
• Simultaneously fetch multiple instructions
• Logic to determine true dependencies involving
register values
• Mechanisms to communicate these values
• Mechanisms to initiate multiple instructions in
parallel
• Resources for parallel execution of multiple
instructions
• Mechanisms for committing process state in
correct order
Programmer's Model
Data Transfer Instructions
• Move data between memory and the general
purpose and segment registers.
• Perform operations such as conditional moves,
stack access, and data conversion
