Pentium-4 RNM Final

• Comparative study of 8086, 80386, Pentium 1, Pentium 2, Pentium 3
Pentium-4
Overview of the NetBurst™ Micro-Architecture
[Figure: NetBurst micro-architecture block diagram. Front End (Fetch/Decode, Trace Cache, BTB/Branch Prediction); Out-of-Order Engine (Out-of-Order Execution Logic, Retirement, Branch History Update); Integer & FP Execution Units (Level 1 Data Cache, Execution Units); Memory Subsystem (Level 2 Cache, Bus Unit, System Bus)]
04/19/2024 5
Front End:
Fetches instructions, decodes them and sends them to the out-of-order execution core.
There are three parts to it:
1. Fetch/Decode Unit
2. Execution Trace Cache
3. BTB/Branch Prediction
• 1. Fetch/Decode Unit
• -Fetches instructions from the L2 cache
• -Decodes them into micro-ops (uops)
• -Stores the uops in the Execution Trace Cache, which serves as the L1 instruction cache
• 2. Execution Trace Cache
• The Execution Trace Cache stores decoded instructions (uops).
• When there is a branch misprediction, the instructions do not need to be decoded again, so decode latency is reduced.
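A minimal sketch of why the trace cache saves decode work, assuming the trace cache is modelled as a dictionary keyed by instruction start address and a hypothetical `DECODE_TABLE` stands in for the x86 decoder. Real trace lines hold uop sequences along the predicted path; this toy only counts how often the decoder runs.

```python
# Hypothetical decode table: x86 instruction text -> uops (illustrative only).
DECODE_TABLE = {
    "add eax, [ebx]": ["load t0, [ebx]", "add eax, t0"],
    "jnz loop": ["jnz loop"],
}


class TraceCacheSketch:
    """Toy model: decoded uops cached by instruction start address."""

    def __init__(self, l2_code):
        self.l2_code = l2_code      # address -> instruction text (stands in for L2 contents)
        self.lines = {}             # address -> uops already decoded
        self.decode_count = 0       # how many times the decoder had to run

    def _decode(self, instruction):
        self.decode_count += 1
        return DECODE_TABLE[instruction]

    def fetch(self, addr):
        if addr in self.lines:                       # hit: uops come straight from the trace cache
            return self.lines[addr]
        uops = self._decode(self.l2_code[addr])      # miss: fetch from L2 and decode
        self.lines[addr] = uops
        return uops


tc = TraceCacheSketch({0x100: "add eax, [ebx]", 0x102: "jnz loop"})
tc.fetch(0x100)
tc.fetch(0x102)
tc.fetch(0x100)          # re-fetched, e.g. after a mispredicted branch
print(tc.decode_count)   # 2: the second fetch of 0x100 needed no decode
```

The addresses are hypothetical; the point is that only a trace cache miss pays the decode cost.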
• 3. BTB/Branch Prediction
• Determines the next instruction to be fetched from the L2 cache in case of a Trace Cache miss
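The slides do not say which prediction scheme the BTB uses. As a generic illustration only (a textbook 2-bit saturating counter, not Intel's actual predictor), the sketch below keeps a target address and a counter per branch address. On a BTB miss it simply predicts "not taken"; that fallback and the addresses are assumptions.

```python
class BTBSketch:
    """Toy BTB: branch address -> (target, 2-bit counter). Not Intel's actual design."""

    def __init__(self):
        self.entries = {}

    def predict(self, pc, fallthrough):
        """Return the predicted next fetch address for the branch at `pc`."""
        entry = self.entries.get(pc)
        if entry is None:
            return fallthrough                            # BTB miss: assume not taken
        target, counter = entry
        return target if counter >= 2 else fallthrough    # 2, 3 = taken; 0, 1 = not taken

    def update(self, pc, taken, target):
        """Train with the resolved outcome of the branch at `pc`."""
        _, counter = self.entries.get(pc, (target, 1))    # new entries start weakly not-taken
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.entries[pc] = (target, counter)


btb = BTBSketch()
LOOP_TOP, LOOP_BRANCH, AFTER_LOOP = 0x100, 0x120, 0x122   # hypothetical addresses
cold = btb.predict(LOOP_BRANCH, AFTER_LOOP)               # miss: fall through
for _ in range(3):
    btb.update(LOOP_BRANCH, taken=True, target=LOOP_TOP)
warm = btb.predict(LOOP_BRANCH, AFTER_LOOP)               # now predicted taken
btb.update(LOOP_BRANCH, taken=False, target=LOOP_TOP)     # loop exits once
after_one_miss = btb.predict(LOOP_BRANCH, AFTER_LOOP)     # still taken (hysteresis)
```

The two-bit counter means a single loop exit does not flip the prediction, which suits loop branches.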
• Integer and Floating-Point Units
• This is the unit where the instructions are actually executed.
• It has two parts:
• -L1 data cache
• -Execution unit
• L1 data cache
• Used for both integer and FP loads and stores
• 4-way set-associative, write-through (every write to L1 is also written to L2)
• 8 KB in size and very fast
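To make the geometry concrete: assuming 64-byte lines (the line size is not on the slide), an 8 KB 4-way cache has 8192 / 64 = 128 lines in 128 / 4 = 32 sets, so an address splits into a 6-bit byte offset, a 5-bit set index and the remaining tag bits. The Python sketch models that split plus write-through stores; LRU replacement and no-write-allocate are simplifying assumptions, not documented Pentium 4 details.

```python
from collections import OrderedDict

CACHE_SIZE = 8 * 1024   # 8 KB (from the slide)
WAYS = 4                # 4-way set associative (from the slide)
LINE_SIZE = 64          # bytes per line (assumed)
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)   # 8192 / (64 * 4) = 32 sets


def split_address(addr):
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr % LINE_SIZE
    set_index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, set_index, offset


class WriteThroughL1:
    """Each line holds one value for simplicity; LRU and no-write-allocate are assumptions."""

    def __init__(self, l2):
        self.l2 = l2                                          # line number -> value (stands in for L2)
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]  # per set: tag -> value, oldest first

    def load(self, addr):
        tag, idx, _ = split_address(addr)
        ways = self.sets[idx]
        if tag in ways:
            ways.move_to_end(tag)                  # hit: mark as most recently used
            return ways[tag], True
        value = self.l2.get(addr // LINE_SIZE, 0)  # miss: fill the line from L2
        if len(ways) == WAYS:
            ways.popitem(last=False)               # evict LRU way; never dirty, so no write-back
        ways[tag] = value
        return value, False

    def store(self, addr, value):
        tag, idx, _ = split_address(addr)
        ways = self.sets[idx]
        if tag in ways:                            # update the L1 copy only if present
            ways[tag] = value
            ways.move_to_end(tag)
        self.l2[addr // LINE_SIZE] = value         # write-through: L2 is always updated


l2 = {}
l1 = WriteThroughL1(l2)
l1.store(0x2040, 7)        # L1 miss, but the value still reaches L2
print(split_address(0x2040), l1.load(0x2040), l1.load(0x2040))
# (4, 1, 0) (7, False) (7, True)
```

Because every store already reached L2, an evicted line never needs to be written back, which is the main simplification write-through buys.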
• Execution unit
• Executes micro-ops
• Reads data from the L1 data cache
• Writes results to registers
• -Up to 4 integer arithmetic operations per clock cycle
• -1 floating-point operation per clock cycle
• -1 memory load and 1 memory store operation per clock cycle
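These per-clock limits give a quick lower bound on how many cycles a block of uops needs: each class of operation must fit within its own limit, so the most constrained class decides. The helper below is an illustrative calculation using the rates listed above, and it deliberately ignores dependences, latencies and port conflicts.

```python
import math

# Per-clock limits listed on the slide.
PER_CYCLE = {"int": 4, "fp": 1, "load": 1, "store": 1}


def min_cycles(uop_counts):
    """Lower bound on cycles: each uop class must fit within its own per-clock limit.
    Ignores dependences, latencies and port conflicts."""
    if not uop_counts:
        return 0
    return max(math.ceil(n / PER_CYCLE[kind]) for kind, n in uop_counts.items())


print(min_cycles({"int": 10, "fp": 2, "load": 3, "store": 1}))  # 3: int and load bound
print(min_cycles({"int": 4, "fp": 4}))                          # 4: FP bound
```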
• Out-of-Order Engine:
• This is where instructions are prepared for execution.
• Keeps the execution units busy.
• Dispatches as many instructions as possible whose operands are ready.
• There are two parts to it:
• 1. Out-of-Order Execution Logic
• Allows maximum utilization of the execution units
• Schedules micro-ops
• Based on data dependences and available resources
• May execute speculatively
• Executes independent instructions as soon as they are ready, even ahead of older instructions that are still waiting
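A toy list scheduler shows the idea: each cycle it dispatches any uop whose source operands are ready, regardless of program order, up to an issue width. The width of 3, the one-cycle latency, the assumption that registers are already renamed and the uop names are all hypothetical; real NetBurst scheduling also tracks ports and latencies.

```python
def schedule(uops, width=3):
    """Toy out-of-order scheduler.

    uops: (name, dest, sources) tuples in program order. Registers are assumed to be
    renamed already, so each dest is written by exactly one uop; a source that no uop
    writes is ready from the start. Every uop is assumed to take one cycle.
    Returns one list of uop names per cycle.
    """
    written = {dest for _, dest, _ in uops}
    ready = set()                    # destinations whose values are available
    pending = list(uops)
    cycles = []
    while pending:
        issued = []
        for uop in pending:
            if len(issued) == width:
                break
            _, _, sources = uop
            if all(src in ready or src not in written for src in sources):
                issued.append(uop)   # operands ready: dispatch, even past older stalled uops
        if not issued:
            raise ValueError("no uop can ever become ready")
        for uop in issued:
            pending.remove(uop)
        ready.update(dest for _, dest, _ in issued)   # results visible from the next cycle
        cycles.append([name for name, _, _ in issued])
    return cycles


program = [
    ("load1", "p1", ["ebx"]),         # p1 <- [ebx]
    ("add1",  "p2", ["p1", "eax"]),   # waits for load1
    ("mul1",  "p3", ["ecx", "edx"]),  # independent
    ("sub1",  "p4", ["esi", "edi"]),  # independent
    ("add2",  "p5", ["p2", "p3"]),    # waits for add1 and mul1
]
print(schedule(program))
# [['load1', 'mul1', 'sub1'], ['add1'], ['add2']]
```

Here `mul1` and `sub1` run before the older `add1`, which is stalled on the load: that is the out-of-order part.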
• 2. Retirement Unit
• Ensures that instructions retire in the original program order.
• The retirement unit takes the instructions that were executed out of order and commits their results back in the original program order.
• This logic also reports branch history information to the branch predictors at the front end of the machine, so they can train with the latest known-good branch-history information.
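A reorder buffer (ROB) is the usual structure for this: uops enter in program order, may complete in any order, and only leave from the head once they are complete. The sketch is a generic ROB model; the `on_branch` callback is a hypothetical hook standing in for the branch-history update sent to the front end.

```python
from collections import deque


class ReorderBuffer:
    """Generic ROB sketch: in-order allocate, out-of-order complete, in-order retire."""

    def __init__(self, on_branch=None):
        self.entries = deque()          # program order; uop names assumed unique
        self.on_branch = on_branch      # hypothetical hook for branch-history updates

    def allocate(self, name, is_branch=False):
        self.entries.append({"name": name, "done": False, "branch": is_branch, "taken": None})

    def complete(self, name, taken=None):
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True
                entry["taken"] = taken
                return
        raise KeyError(name)

    def retire(self):
        """Retire completed uops from the head; stop at the first unfinished one."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            if entry["branch"] and self.on_branch:
                self.on_branch(entry["name"], entry["taken"])  # known-good outcome for training
            retired.append(entry["name"])
        return retired


history = []
rob = ReorderBuffer(on_branch=lambda name, taken: history.append((name, taken)))
for name in ("i1", "i2", "i3"):
    rob.allocate(name)
rob.allocate("br", is_branch=True)
rob.complete("i3")
rob.complete("i2")
first = rob.retire()     # []: i1 is still unfinished, so nothing can retire
rob.complete("i1")
rob.complete("br", taken=True)
second = rob.retire()    # ['i1', 'i2', 'i3', 'br']
```

Only retired branches reach the predictor, which is why the slide calls this information "known-good".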
• Memory Subsystem:
• This includes the L2 cache and the system bus.
• The L2 cache stores both instructions and data that cannot fit in the Execution Trace Cache and the L1 data cache.
• The system bus is used to access main memory when there is an L2 cache miss.
• It is also used to access I/O devices.
• Bandwidth – 3.2 GB/s
• Width – 64 Bits
• Clock rate – 400 MHz
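The three bus figures are consistent: a 64-bit bus moves 8 bytes per transfer, and at 400 million transfers per second that gives 8 × 400 × 10^6 = 3.2 × 10^9 bytes/s. On the original Pentium 4, the 400 MHz figure is the effective rate of a 100 MHz bus that transfers data four times per clock. The helper name below is illustrative.

```python
def bus_bandwidth_bytes_per_s(width_bits, transfers_per_s):
    """Peak bandwidth = bytes per transfer x transfers per second."""
    return (width_bits // 8) * transfers_per_s


peak = bus_bandwidth_bytes_per_s(64, 400_000_000)
print(peak / 1e9, "GB/s")   # 3.2 GB/s
```

This is a peak figure; sustained bandwidth is lower because of bus protocol overhead.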
Instruction Translation Lookaside Buffer (ITLB)
• On a Trace Cache miss, instruction bytes must be fetched from the L2 cache.
• The ITLB receives the request from the Trace Cache to deliver new instructions.
• The ITLB translates the linear address of the next instruction pointer (IP) into the physical address needed to access the L2 cache.
• A request is sent to the L2 cache, and the instruction bytes are returned.
• These bytes are placed into streaming buffers, which hold them until they are decoded.
• The ITLB also performs page-level protection checking.
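The translation and protection steps can be sketched as a dictionary lookup. This assumes 4 KB pages (so the low 12 bits of the linear address are the page offset) and a hypothetical entry format holding a physical frame number and a user-accessible flag. A miss would trigger a page walk in hardware; the sketch just raises an exception.

```python
PAGE_SHIFT = 12                       # 4 KB pages (assumed)
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TLBMiss(Exception):
    pass


class ProtectionFault(Exception):
    pass


class ITLBSketch:
    def __init__(self, entries):
        # entries: virtual page number -> (physical frame number, user_accessible)
        self.entries = entries

    def translate(self, linear_addr, user_mode):
        vpn = linear_addr >> PAGE_SHIFT
        offset = linear_addr & PAGE_MASK
        if vpn not in self.entries:
            raise TLBMiss(hex(linear_addr))          # hardware would walk the page tables
        pfn, user_ok = self.entries[vpn]
        if user_mode and not user_ok:
            raise ProtectionFault(hex(linear_addr))  # page-level protection check
        return (pfn << PAGE_SHIFT) | offset


itlb = ITLBSketch({0x00401: (0x1A2B3, True), 0xC0000: (0x00100, False)})
print(hex(itlb.translate(0x00401234, user_mode=True)))   # 0x1a2b3234
try:
    itlb.translate(0xC0000010, user_mode=True)            # supervisor-only page
except ProtectionFault as fault:
    print("protection fault at", fault)                   # protection fault at 0xc0000010
```

The page offset passes through unchanged; only the page number is translated.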
• Each logical processor has its own ITLB and its own instruction pointer.
• When both logical processors request access to the L2 cache,
• the instruction fetch logic selects the logical processor that requested first.
• However, it reserves at least one request slot for each logical processor.
• In this way, both logical processors can fetch instructions from the L2 cache without one locking the other out.
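One way to model this policy: a shared queue of L2 request slots served first come, first served, where a logical processor that already has a request queued may not take the slot reserved for a logical processor that has none. The capacity of 4 is a hypothetical value; the slides do not give the real number of slots, and the class and method names are illustrative.

```python
from collections import deque


class FetchArbiterSketch:
    def __init__(self, capacity=4, num_logical=2):
        self.capacity = capacity        # hypothetical number of L2 request slots
        self.num_logical = num_logical
        self.queue = deque()            # (logical_id, address) in arrival order

    def outstanding(self, lp):
        return sum(1 for owner, _ in self.queue if owner == lp)

    def request(self, lp, addr):
        """Accept a fetch request unless it would use a slot reserved for another LP."""
        free = self.capacity - len(self.queue)
        if free == 0:
            return False
        if self.outstanding(lp) > 0:
            # Keep one slot for every other logical processor with nothing queued.
            starving = sum(1 for other in range(self.num_logical)
                           if other != lp and self.outstanding(other) == 0)
            if free <= starving:
                return False
        self.queue.append((lp, addr))
        return True

    def service(self):
        """Send the oldest request (first come, first served) to the L2 cache."""
        return self.queue.popleft() if self.queue else None


arb = FetchArbiterSketch()
accepted = [arb.request(0, 0x1000 + 64 * i) for i in range(4)]  # [True, True, True, False]
other = arb.request(1, 0x8000)                                   # True: uses the reserved slot
order = [arb.service()[0] for _ in range(4)]                     # [0, 0, 0, 1]
```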
• Before the instructions are decoded,
they are stored in streaming buffers
• Each logical processor has its own set
of two 64-byte streaming buffers.
• Hyper-Threading Technology
• HT Technology enables a single physical processor to execute two or more separate code streams (called threads) concurrently.
• HT Technology allows one physical processor to appear as two or more logical processors to software (the OS and applications).
• HT Technology is one form of hardware multithreading.
• Each logical processor has its own architectural state, with its own set of general-purpose and control registers and some machine state registers.
• Logical processors share a single set of physical resources (caches, execution units, branch predictors, control logic and buses).
• The OS views logical processors as physical processors.
• It schedules threads onto logical processors as in a multiprocessor system.
• Each logical processor has its own interrupt controller (APIC); interrupts sent to a specific logical processor are handled only by that processor.
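The split between duplicated and shared state can be sketched as a data structure. Each logical processor gets its own architectural state and its own interrupt list (standing in for its own interrupt controller), while all logical processors reference one shared-resources object. The register subset and the fields of `SharedResources` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    # Per-logical-processor architectural state (illustrative subset).
    eip: int = 0
    regs: dict = field(default_factory=lambda: dict.fromkeys(
        ["eax", "ebx", "ecx", "edx", "esi", "edi", "esp", "ebp"], 0))


@dataclass
class SharedResources:
    # One copy per physical processor (field names illustrative).
    caches: dict = field(default_factory=dict)
    branch_predictor: dict = field(default_factory=dict)


class LogicalProcessor:
    def __init__(self, lp_id, shared):
        self.lp_id = lp_id
        self.state = ArchState()          # duplicated per logical processor
        self.pending_interrupts = []      # stands in for its own interrupt controller
        self.shared = shared              # the same object for every logical processor


class PhysicalProcessor:
    def __init__(self, num_logical=2):
        shared = SharedResources()
        self.logical = [LogicalProcessor(i, shared) for i in range(num_logical)]

    def send_interrupt(self, lp_id, vector):
        # Delivered only to the targeted logical processor.
        self.logical[lp_id].pending_interrupts.append(vector)


cpu = PhysicalProcessor()
cpu.logical[0].state.regs["eax"] = 42     # does not affect logical processor 1
cpu.send_interrupt(1, 0x20)
print(len(cpu.logical), "logical processors visible to the OS")
```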
