Piii Isa

Instruction Set
General Purpose Instruction X87 FPU Instruction SIMD Instruction
MMX Instruction SSE Instruction
System Instruction
CS854 Pentium III group
General Purpose Instruction
Data transfer Binary integer arithmetic Decimal arithmetic Logic operations Shift and rotate Bit and byte operations Program control String Flag control Segment register operations
CS854 Pentium III group 2
X87 FPU Instruction

Floating-point Integer Binary-coded decimal (BCD) operands
SIMD Instruction
SIMD: Single Instruction Multiple Data

MMX instruction & SSE instruction provides a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements contained in the 64-bit MMX or the 128-bit XMM registers. enables increased performance on a wide variety of multimedia and communications applications.
Whats new in Pentium III

Pentium III=Pentium II + SSE SSE : Internet Streaming SIMD Extensions Seventy New Instruction Three Categories:

SIMD-Floating Point New Media Instruction Streaming Memory Instruction

The implementation of SSE
SSE has 128-bit architectural width
Double-cycling the existing 64-bit data paths. Deliver a realized 1.5 2x speedup Only have 10% die size overhead
SIMD-FP Instruction
SIMD feature introduce a new register file containing eight 128-bit registers
Capable of holding a vector of four IEEE single precision FP data elements Allow four single precision FP operations to be carried out within a single instruction
SIMD-FP Instruction
Resveration Station Port 0 Port 1
uop Dispatch
Port 2 Execution Unit 0 Execution Unit 1 LD/ST AGU 0
Port 3 LD/ST AGU 1
Port 4 Store Data
Multipler SIMD FP MMX x87
SIMD FP Adder
MMX ALU0
Shuffle
Wire
MMX ALU1
Dispatch Mechanism
Reciprocal Estimates
SIMD-FP Instruction
The dispatch mechanism allows half of a SIMD multiply and half of an independent SIMD add to be issued together The peak rate of one 128-bit operation is when the instructions alternate between different execution unit.
SIMD-FP Instruction
The news unit for shuffle instruction and reciprocal estimate

Rearranging elements within a vector Two approximation instructions: RCP RSQRT SIMD-FP and MMX or x87 instruction can be used concurrently A new control/status register MXCSR
Adding a new state
10
SIMD-FP Instruction
Support two modes of FP arithmetic

Full IEEE-754 mode Flush-to-zero(FTZ) mode
More that SIMD-FP arithmetic
Need perform operations on a subset of elements within a vector SIMD logical Instructions (AND,ANDN,OR,XOR) MOVMSKPS instruction
11
SIMD-FP Instruction
With MOVHPS,MOVLPS, SHUFPS instructions,Pentium III can transpose vectors with only a small overhead.
Memory
W3 Z3 Y3 X3 W2 Z2 Y2 X2 W1 Z1 Y1 X1 W0 Z0 Y0 X0
MOVHPS
MOVLPS MOVHPS
Y3 X3 Y2 X2 Y1 X1 Y0 X0
MOVLPS Rs
Registers
Rd
SHUFPS
Register
Rd
X3
X2
X1
X0
Drawback of this implementation
Code-scheduling dilemma
New Media Instruction
New integer instructionextensions to the MMX instruction set Accelerate important multimedia tasks
PMAX PMIN : Viterbi-Search algorithm in speech recognition PAVG: accelerate video decoding PSADBW: Speed motion-search in video encoding
Streaming Memory Instruction
One Downside of SIMD engines
Increase the processing rate above the memory systems ability to supply data
Intel increased the throughput of the memory system and the P6 bus

Prefetch instruction Streaming store Enhanced write combining(WC)

Prefetch instruction
Bring data into the cache before the program actually needs it Overlap processing with long-latency memory read Just hints never cause a program fault so can be hoisted arbitrarily far and retired before the memory access completes
Specify the cache level to which data will be prefetched Store data directly to memory,bypassing the caches Avoid polluting the caches when it knows the data being stored will not be accessed again soon
Streaming store Instruction
Enhanced write combining (WC)

Increase to four WC buffers Improve the buffer-management policies SFENCE instruction

Flush the wc buffer Ensure that all prior stores are globally visible
17
Conclusion

Almost the same core with Pentium II SSE enhance the multimedia capability SSE has some advantages over 3Dnow
18

Piii Isa

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Piii Isa

Uploaded by

Copyright:

Available Formats

Instruction Set

General Purpose Instruction X87 FPU Instruction SIMD Instruction

MMX Instruction SSE Instruction

CS854 Pentium III group

General Purpose Instruction

X87 FPU Instruction

Floating-point Integer Binary-coded decimal (BCD) operands

CS854 Pentium III group

SIMD: Single Instruction Multiple Data

CS854 Pentium III group

Whats new in Pentium III

SIMD-Floating Point New Media Instruction Streaming Memory Instruction

The implementation of SSE

SSE has 128-bit architectural width

CS854 Pentium III group

CS854 Pentium III group

Port 2 Execution Unit 0 Execution Unit 1 LD/ST AGU 0

Port 3 LD/ST AGU 1

Port 4 Store Data

Multipler SIMD FP MMX x87

CS854 Pentium III group

The news unit for shuffle instruction and reciprocal estimate

Adding a new state

CS854 Pentium III group

Support two modes of FP arithmetic

Full IEEE-754 mode Flush-to-zero(FTZ) mode

More that SIMD-FP arithmetic

CS854 Pentium III group

Drawback of this implementation

New Media Instruction

Streaming Memory Instruction

One Downside of SIMD engines

Prefetch instruction Streaming store Enhanced write combining(WC)

Streaming Memory Instruction

Streaming Memory Instruction

Streaming store Instruction

Streaming Memory Instruction

Enhanced write combining (WC)

Increase to four WC buffers Improve the buffer-management policies SFENCE instruction

CS854 Pentium III group

CS854 Pentium III group

You might also like