Professional Documents
Culture Documents
Power ISA
Power ISA
06 Revision B
July 23, 2010
ii
Power ISA
Preface
The roots of the Power ISA (Instruction Set Architecture) extend back over a quarter of a century, to IBM Research. The POWER (Performance Optimization With Enhanced RISC) Architecture was introduced with the RISC System/6000 product family in early 1990. In 1991, Apple, IBM, and Motorola began the collaboration to evolve to the PowerPC Architecture, expanding the architectures applicability. In 1997, Motorola and IBM began another collaboration, focused on optimizing PowerPC for embedded systems, which produced Book E. In 2006, Freescale and IBM collaborated on the creation of the Power ISA Version 2.03, which represented the reunification of the architecture by combining Book E content with the more general purpose PowerPC Version 2.02. A significant benefit of the reunification is the establishment of a single, compatible, 64-bit programming model. The combining also extends explicit architectural endorsement and control to Auxiliary Processing Units (APUs), units of function that were originally developed as implementation- or product family-specific extensions in the context of the Book E allocated opcode space. With the resulting architectural superset comes a framework that clearly establishes requirements and identifies options. To a very large extent, application program compatibility has been maintained throughout the history of the architecture, with the main exception being application exploitation of APUs. The framework identifies the base, pervasive, part of the architecture, and differentiates it from categories of optional function (see Section 1.3.5 of Book I). Because of the substantial differences in the supervisor (privileged) architecture that developed as Book E was optimized for embedded systems, the supervisor architectures for embedded and general purpose implementations are represented as mutually exclusive categories. Future versions of the architecture will seek to converge on a common solution where possible. This document defines the Power ISA Version 2.06. It is comprised of five books and a set of appendices. Book I, Power ISA User Instruction Set Architecture, covers the base instruction set and related facilities available to the application programmer. It includes five chapters derived from APU function, including the vector extension also known as Altivec. Book II, Power ISA Virtual Environment Architecture, defines the storage model and related instructions and facilities available to the application programmer. Book III-S, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for general purpose implementations. Book III-E, Power ISA Operating Environment Architecture, defines the supervisor instructions and related facilities used for embedded implementations. It was derived from Book E and extended to include APU function. Book VLE, Power ISAVariable Length Encoded Instructions Architecture, defines alternative instruction encodings and definitions intended to increase instruction density for very low end implementations. It was derived from an APU description developed by Freescale Semiconductor. As used in this document, the term Power ISA refers to the instructions and facilities described in Books I, II, III-S, III-E, and VLE. Usage of the phrase Book III refers to both Book III-S and Book III-E. An exception to this rule is when, at the beginning of a Section or Book, it is specified that usage of the phrase Book III implies only either Book III-S or Book III-E. Change bars have been included to indicate changes from the Power ISA Version 2.05.
Preface
iii
Version 2.06 of the PowerISA was created by applying the following requests for change (RFCs) to PowerISA version 2.05. Fixed-Point Extended Precision Instructions: Four new divide instructions are added. The divide instructions assist in dividing 64 and 128 bit dividends by 32 and 64-bit divisors, respectively. Among other uses, these instructions can be used to provide better performance for extended-precision fixed-point divide operations. See Sections 3.3.8 and 3.3.8.1 of Book I. Bit Permute Instruction: A new instruction is added to provide high-speed bit permutations. See Section 3.3.13.1 of Book I. Additional Population Count Instructions: 32-bit and 64bit versions of the Population Count instructions are added. See Sections 3.3.12 and 3.3.13.1 of Book I. Software Initiated Stride-N Prefetching: A new stream variant of dcbt and dcbtst enables the specification of stride and first element offset. See Section 4.3.2 of Book II. Load/Store Doubleword Byte-Reverse Instructions: Doubleword versions of the byte-reversed Load and Store instructions are added. See Section 3.3.4.1 of Book I.
iv
Power ISA
Preface
vi
Power ISA
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . iii
Summary of Changes in Power ISA Version 2.06 Revision B . . . . . . . . . . . . . . . . . . . . iv 1.6.11 XX1-FORM. . . . . . . . . . . . . . . . . 1.6.12 XX2-FORM. . . . . . . . . . . . . . . . . 1.6.13 XX3-FORM. . . . . . . . . . . . . . . . . 1.6.14 XX4-FORM. . . . . . . . . . . . . . . . . 1.6.15 XS-FORM. . . . . . . . . . . . . . . . . . 1.6.16 XO-FORM . . . . . . . . . . . . . . . . . 1.6.17 A-FORM . . . . . . . . . . . . . . . . . . . 1.6.18 M-FORM . . . . . . . . . . . . . . . . . . 1.6.19 MD-FORM . . . . . . . . . . . . . . . . . 1.6.20 MDS-FORM . . . . . . . . . . . . . . . . 1.6.21 VA-FORM . . . . . . . . . . . . . . . . . . 1.6.22 VC-FORM . . . . . . . . . . . . . . . . . 1.6.23 VX-FORM. . . . . . . . . . . . . . . . . . 1.6.24 EVX-FORM . . . . . . . . . . . . . . . . 1.6.25 EVS-FORM . . . . . . . . . . . . . . . . 1.6.26 Z22-FORM . . . . . . . . . . . . . . . . . 1.6.27 Z23-FORM . . . . . . . . . . . . . . . . . 1.6.28 Instruction Fields . . . . . . . . . . . . 1.7 Classes of Instructions . . . . . . . . . . 1.7.1 Defined Instruction Class . . . . . . . 1.7.2 Illegal Instruction Class . . . . . . . . 1.7.3 Reserved Instruction Class . . . . . 1.7.4 Preferred Instruction Forms . . . . . 1.7.5 Invalid Instruction Forms . . . . . . . 1.7.6 Reserved-no-op Instructions [Category: Phased-In] . . . . . . . . . . . . . . 1.8 Exceptions. . . . . . . . . . . . . . . . . . . . 1.9 Storage Addressing. . . . . . . . . . . . . 1.9.1 Storage Operands . . . . . . . . . . . . 1.9.2 Instruction Fetches . . . . . . . . . . . . 1.9.3 Effective Address Calculation. . . . 17 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 21 21 22 22 22 22 22 22 23 23 24 26
Table of Contents . . . . . . . . . . . . . . . vii Figures. . . . . . . . . . . . . . . . . . . . . . . xxv Book I: Power ISA User Instruction Set Architecture. . . . . . . . . . . . . . . . . . . . 1 Chapter 1. Introduction . . . . . . . . . . 3
1.1 Overview. . . . . . . . . . . . . . . . . . . . . . 3 1.2 Instruction Mnemonics and Operands . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Document Conventions . . . . . . . . . . 4 1.3.1 Definitions . . . . . . . . . . . . . . . . . . . 4 1.3.2 Notation . . . . . . . . . . . . . . . . . . . . . 5 1.3.3 Reserved Fields and Reserved Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.4 Description of Instruction Operation . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.5 Categories . . . . . . . . . . . . . . . . . . . 9 1.3.5.1 Phased-In/Phased-Out . . . . . . . 10 1.3.5.2 Corequisite Category . . . . . . . . 11 1.3.5.3 Category Notation. . . . . . . . . . . 11 1.3.6 Environments. . . . . . . . . . . . . . . . 11 1.4 Processor Overview . . . . . . . . . . . . 12 1.5 Computation modes . . . . . . . . . . . . 14 1.5.1 Modes [Category: Server] . . . . . . 14 1.5.2 Modes [Category: Embedded]. . . 14 1.6 Instruction formats . . . . . . . . . . . . . 14 1.6.1 I-FORM . . . . . . . . . . . . . . . . . . . . 15 1.6.2 B-FORM . . . . . . . . . . . . . . . . . . . 15 1.6.3 SC-FORM . . . . . . . . . . . . . . . . . . 15 1.6.4 D-FORM . . . . . . . . . . . . . . . . . . . 15 1.6.5 DS-FORM . . . . . . . . . . . . . . . . . . 15 1.6.6 DQ-FORM . . . . . . . . . . . . . . . . . . 15 1.6.7 X-FORM . . . . . . . . . . . . . . . . . . . 16 1.6.8 XL-FORM . . . . . . . . . . . . . . . . . . 16 1.6.9 XFX-FORM . . . . . . . . . . . . . . . . . 16 1.6.10 XFL-FORM . . . . . . . . . . . . . . . . 16
Table of Contents
vii
viii
Table of Contents
ix
Chapter 9. Embedded Floating-Point [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]. . . . . . . . . . . . . . . . . . . . . . . 557
9.1 Overview . . . . . . . . . . . . . . . . . . . . 557 9.2 Programming Model . . . . . . . . . . . 558 9.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR) . . . . . . . . . . . . . . . . . . . . . 558 9.2.2 Floating-Point Data Formats . . . 558 9.2.3 Exception Conditions . . . . . . . . . 559 9.2.3.1 Denormalized Values on Input 559 9.2.3.2 Embedded Floating-Point Overflow and Underflow . . . . . . . . . . . . . . . . . . . 559 9.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors . . . . . . . . . . . . . 559 9.2.3.4 Embedded Floating-Point Round (Inexact). . . . . . . . . . . . . . . . . . . . . . . . 559 9.2.3.5 Embedded Floating-Point Divide by Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 9.2.3.6 Default Results . . . . . . . . . . . . 560 9.2.4 IEEE 754 Compliance . . . . . . . . 560 9.2.4.1 Sticky Bit Handling For Exception Conditions . . . . . . . . . . . . . . . . . . . . . . 560 9.3 Embedded Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . 561 9.3.1 Load/Store Instructions . . . . . . . 561 9.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]. . . . . . . . . . . . . . . . . . . . . 561 9.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single] . . . . . . . . . . . . . . . . . . . . . . . . . 569 9.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double] . . . . . . . . . . . . . . . . . . . . . . . . 576 9.4 Embedded Floating-Point Results Summary . . . . . . . . . . . . . . . . . . . . . . . 584
Chapter 8. Signal Processing Engine (SPE) [Category: Signal Processing Engine]. . . . . . . . . . . . . . . . . . . . . . 503
8.1 Overview. . . . . . . . . . . . . . . . . . . . 503 8.2 Nomenclature and Conventions . . 503 8.3 Programming Model . . . . . . . . . . . 504 8.3.1 General Operation . . . . . . . . . . . 504 8.3.2 GPR Registers. . . . . . . . . . . . . . 504 8.3.3 Accumulator Register . . . . . . . . 504 8.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR). . . . . . . . . . . . . . . . . . . . . 504 8.3.5 Data Formats . . . . . . . . . . . . . . . 507 8.3.5.1 Integer Format . . . . . . . . . . . . 507 8.3.5.2 Fractional Format . . . . . . . . . . 507 8.3.6 Computational Operations . . . . . 508
Table of Contents
xi
Version 2.06 Revision B Chapter 10. Legacy Move Assist Instruction [Category: Legacy Move Assist] . . . . . . . . . . . . . . . . . . . . . . 589 Chapter 11. Legacy Integer MultiplyAccumulate Instructions [Category: Legacy Integer MultiplyAccumulate] . . . . . . . . . . . . . . . . . . 591 Appendix A. Suggested FloatingPoint Models [Category: FloatingPoint] . . . . . . . . . . . . . . . . . . . . . . . 601
A.1 Floating-Point Round to SinglePrecision Model . . . . . . . . . . . . . . . . . .601 A.2 Floating-Point Convert to Integer Model . . . . . . . . . . . . . . . . . . . . . . . . . .605 A.3 Floating-Point Convert from Integer Model . . . . . . . . . . . . . . . . . . . . . . . . . .608 A.4 Floating-Point Round to Integer Model . . . . . . . . . . . . . . . . . . . . . . . . . .610 D.5 Convert to Single-Precision Embedded Floating-Point from Integer Word . . . . . . . . . . . . . . . . . . . . . . . . . . 623 D.6 Convert to Double-Precision Embedded Floating-Point from Integer Word . . . . . . . . . . . . . . . . . . . . . . . . . . 623 D.7 Convert to Double-Precision Embedded Floating-Point from Integer Doubleword . . . . . . . . . . . . . . . . . . . . . 624
Appendix C. Vector RTL Functions [Category: Vector] . . . . . . . . . . . . . 617 Appendix D. Embedded FloatingPoint RTL Functions [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector] . . . . . . . . . . . . . . . . . . . . . . 619
D.1 Common Functions . . . . . . . . . . . .619 D.2 Convert from Single-Precision Embedded Floating-Point to Integer Word with Saturation . . . . . . . . . . . . . . . . . . .620 D.3 Convert from Double-Precision Embedded Floating-Point to Integer Word with Saturation . . . . . . . . . . . . . . . . . . .621 D.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation . . . . . . . . .622
xii
Book II: Power ISA Virtual Environment Architecture. . . . . . . . . . . . . . . . . . 649 Chapter 1. Storage Model . . . . . . 651
1.1 Definitions . . . . . . . . . . . . . . . . . . . 1.2 Introduction . . . . . . . . . . . . . . . . . . 1.3 Virtual Storage . . . . . . . . . . . . . . . 1.4 Single-Copy Atomicity . . . . . . . . . 1.5 Cache Model . . . . . . . . . . . . . . . . 1.6 Storage Control Attributes . . . . . . 1.6.1 Write Through Required . . . . . . 1.6.2 Caching Inhibited . . . . . . . . . . . 1.6.3 Memory Coherence Required [Category: Memory Coherence] . . . . . 1.6.4 Guarded . . . . . . . . . . . . . . . . . . 1.6.5 Endianness [Category: Embedded.Little-Endian]. . . . . . . . . . . 1.6.6 Variable Length Encoded (VLE) Instructions . . . . . . . . . . . . . . . . . . . . . 1.6.7 Strong Access Order [Category: SAO] . . . . . . . . . . . . . . . . . . . . . . . . . . 1.7 Shared Storage . . . . . . . . . . . . . . 1.7.1 Storage Access Ordering . . . . 651 652 653 653 654 654 654 655 655 655 656 656 657 658 658
Table of Contents
xiii
Version 2.06 Revision B Chapter 7. External Control [Category: External Control] . . . . 711
7.1 External Access Instructions . . . . .712
Book III-S: Power ISA Operating Environment Architecture - Server Environment [Category: Server] . . . . . . . . . . . . . 721 Chapter 1. Introduction . . . . . . . . 723
1.1 Overview . . . . . . . . . . . . . . . . . . . .723 1.2 Document Conventions . . . . . . . . .723 1.2.1 Definitions and Notation . . . . . . .723 1.2.2 Reserved Fields . . . . . . . . . . . . .724 1.3 General Systems Overview . . . . . .724 1.4 Exceptions . . . . . . . . . . . . . . . . . . .725 1.5 Synchronization . . . . . . . . . . . . . . .725 1.5.1 Context Synchronization . . . . . . .725 1.5.2 Execution Synchronization . . . . .726
xiv
Table of Contents
xv
Chapter 10. Synchronization Requirements for Context Alterations. . . . . . . . . . . . . . . . . . . 845 Appendix A. Assembler Extended Mnemonics . . . . . . . . . . . . . . . . . . 851
A.1 Move To/From Special Purpose Register Mnemonics . . . . . . . . . . . . . . 851
Appendix C. Example Trace Extensions . . . . . . . . . . . . . . . . . . 861 Appendix D. Interpretation of the DSISR as Set by an Alignment Interrupt. . . . . . . . . . . . . . . . . . . . . 863 Book III-E: Power ISA Operating Environment Architecture - Embedded Environment [Category: Embedded] . . . . . . . . . . . . . . . . . . 867 Chapter 1. Introduction . . . . . . . . 869
1.1 Overview. . . . . . . . . . . . . . . . . . . . 1.2 32-Bit Implementations . . . . . . . . . 1.3 Document Conventions . . . . . . . . 1.3.1 Definitions and Notation . . . . . . 1.3.2 Reserved Fields. . . . . . . . . . . . . 1.4 General Systems Overview . . . . . 1.5 Exceptions . . . . . . . . . . . . . . . . . . 869 869 869 869 871 871 871
xvi
Table of Contents
xvii
xviii
Table of Contents
xix
xx
Version 2.06 Revision B Chapter 12. Synchronization Requirements for Context Alterations . . . . . . . . . . . . . . . . . . 1079 Appendix A. ImplementationDependent Instructions . . . . . . . 1083
A.1 Embedded Cache Initialization [Category: Embedded.Cache Initialization] . . . . . . . . . . . . . . . . . . . 1083 A.2 Embedded Cache Debug Facility [Category: Embedded.Cache Debug] 1084 A.2.1 Embedded Cache Debug Registers . . . . . . . . . . . . . . . . . . . . . . 1084 A.2.1.1 Data Cache Debug Tag Register High. . . . . . . . . . . . . . . . . . . . . . . . . . 1084 A.2.1.2 Data Cache Debug Tag Register Low . . . . . . . . . . . . . . . . . . . . . . . . . . 1084 A.2.1.3 Instruction Cache Debug Data Register. . . . . . . . . . . . . . . . . . . . . . . 1085 A.2.1.4 Instruction Cache Debug Tag Register High . . . . . . . . . . . . . . . . . . 1085 A.2.1.5 Instruction Cache Debug Tag Register Low . . . . . . . . . . . . . . . . . . . 1085 A.2.2 Embedded Cache Debug Instructions . . . . . . . . . . . . . . . . . . . . 1086
Book VLE: Power ISA Operating Environment Architecture Variable Length Encoding (VLE) Environment [Category: Variable Length Encoding] . . . . . . . . . . . . . . . . . . . 1103 Chapter 1. Variable Length Encoding Introduction . . . . . . . . . . . . . . . . . 1105
1.1 Overview . . . . . . . . . . . . . . . . . . . 1105 1.2 Documentation Conventions . . . . 1106 1.2.1 Description of Instruction Operation . . . . . . . . . . . . . . . . . . . . . . 1106 1.3 Instruction Mnemonics and Operands . . . . . . . . . . . . . . . . . . . . . . 1106 1.4 VLE Instruction Formats . . . . . . . 1106 1.4.1 BD8-form (16-bit Branch Instructions) . . . . . . . . . . . . . . . . . . . . 1106 1.4.2 C-form (16-bit Control Instructions) . . . . . . . . . . . . . . . . . . . . 1106 1.4.3 IM5-form (16-bit register + immediate Instructions) . . . . . . . . . . . . . . . . . . . . 1106 1.4.4 OIM5-form (16-bit register + offset immediate Instructions) . . . . . . . . . . . 1106
Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations. . . . . . . . 1093
C.1 Hardware Guidelines . . . . . . . . . 1093 C.1.1 64-bit Specific Instructions . . . 1093 C.1.2 Registers on 32-bit Implementations . . . . . . . . . . . . . . . . 1093 C.1.3 Addressing on 32-bit Implementations . . . . . . . . . . . . . . . . 1093 C.1.4 TLB Fields on 32-bit Implementations . . . . . . . . . . . . . . . . 1093 C.1.5 Thread Control and Status on 32-bit Implementations . . . . . . . . . . . . . . . . 1093 C.2 32-bit Software Guidelines . . . . . 1093 C.2.1 32-bit Instruction Selection . . . 1093
Table of Contents
xxi
xxii
Version 2.06 Revision B Appendix A. VLE Instruction Set Sorted by Mnemonic. . . . . . . . . . 1159 Appendix B. VLE Instruction Set Sorted by Opcode . . . . . . . . . . . . 1175 Appendices: Power ISA Book I-III Appendices . . . . . . . . . . . . . . . . . 1191 Appendix A. Incompatibilities with the POWER Architecture . . . . . . 1193
A.1 New Instructions, Formerly Privileged Instructions . . . . . . . . . . . . . . . . . . . . 1193 A.2 Newly Privileged Instructions . . . . . . . . . . . . . . . . . . . . 1193 A.3 Reserved Fields in Instructions . . . . . . . . . . . . . . . . . . . . 1193 A.4 Reserved Bits in Registers . . . . . 1193 A.5 Alignment Check . . . . . . . . . . . . 1193 A.6 Condition Register . . . . . . . . . . . 1194 A.7 LK and Rc Bits . . . . . . . . . . . . . . 1194 A.8 BO Field . . . . . . . . . . . . . . . . . . . 1194 A.9 BH Field . . . . . . . . . . . . . . . . . . . 1194 A.10 Branch Conditional to Count Register. . . . . . . . . . . . . . . . . . . . . . . 1194 A.11 System Call . . . . . . . . . . . . . . . 1194 A.12 Fixed-Point Exception Register (XER) . . . . . . . . . . . . . . . . . 1195 A.13 Update Forms of Storage Access Instructions . . . . . . . . . . . . . . . . . . . . 1195 A.14 Multiple Register Loads . . . . . . 1195 A.15 Load/Store Multiple Instructions 1195 A.16 Move Assist Instructions. . . . . . 1195 A.17 Move To/From SPR . . . . . . . . . 1195 A.18 Effects of Exceptions on FPSCR Bits FR and FI . . . . . . . . . . . . . . . . . . . . . 1196 A.19 Store Floating-Point Single Instructions . . . . . . . . . . . . . . . . . . . . 1196 A.20 Move From FPSCR . . . . . . . . . 1196 A.21 Zeroing Bytes in the Data Cache . . . . . . . . . . . . . . . . . . . . . . . . 1196 A.22 Synchronization . . . . . . . . . . . . 1196 A.23 Move To Machine State Register Instruction . . . . . . . . . . . . . . . . . . . . . 1196 A.24 Direct-Store Segments . . . . . . . 1196 A.25 Segment Register Manipulation Instructions . . . . . . . . . 1196 A.26 TLB Entry Invalidation . . . . . . . 1197 A.27 Alignment Interrupts . . . . . . . . . 1197 A.28 Floating-Point Interrupts . . . . . . 1197 A.29 Timing Facilities . . . . . . . . . . . . 1197 A.29.1 Real-Time Clock . . . . . . . . . . 1197 A.29.2 Decrementer . . . . . . . . . . . . . 1197 A.30 Deleted Instructions . . . . . . . . . 1198 A.31 Discontinued Opcodes . . . . . . . 1198 A.32 POWER2 Compatibility. . . . . . . 1199 A.32.1 Cross-Reference for Changed POWER2 Mnemonics . . . . . . . . . . . . 1199 A.32.2 Load/Store Floating-Point Double . . . . . . . . . . . . . . . . . . . . . . . . 1199 A.32.3 Floating-Point Conversion to Integer . . . . . . . . . . . . . . . . . . . . . . . . 1199 A.32.4 Floating-Point Interrupts. . . . . 1200 A.32.5 Trace . . . . . . . . . . . . . . . . . . . 1200 A.33 Deleted Instructions . . . . . . . . . 1200 A.33.1 Discontinued Opcodes. . . . . . 1200
Appendix B. Platform Support Requirements . . . . . . . . . . . . . . . . 1201 Appendix C. Complete SPR List. . . . . . . . . . . . . . . . . . . . . . . . . 1205 Appendix D. Illegal Instructions . . . . . . . . . . . . . . . . . 1209 Appendix E. Reserved Instructions . . . . . . . . . . . . . . . . . 1211 Appendix F. Opcode Maps . . . . . 1213
Appendix G. Power ISA Instruction Set Sorted by Category . . . . . . . . 1237 Appendix H. Power ISA Instruction Set Sorted by Opcode . . . . . . . . . 1259 Appendix I. Power ISA Instruction Set Sorted by Mnemonic . . . . . . . 1281 Index . . . . . . . . . . . . . . . . . . . . . . . . 1303 Last Page - End of Document . . . . 1313
Table of Contents
xxiii
xxiv
Figures
Preface ................................................. iii Table of Contents ................................ vii Figures............................................... xxv Book I: Power ISA User Instruction Set Architecture ....................................................... 1
1. Category Listing . . . . . . . . . . . . . . . . . . . . . . . . . 2. Logical processing model . . . . . . . . . . . . . . . . . 3. Registers that are defined in Book I . . . . . . . . . 4. I instruction format. . . . . . . . . . . . . . . . . . . . . . . 5. B instruction format . . . . . . . . . . . . . . . . . . . . . . 6. SC instruction format. . . . . . . . . . . . . . . . . . . . . 7. D instruction format . . . . . . . . . . . . . . . . . . . . . . 8. DS instruction format. . . . . . . . . . . . . . . . . . . . . 9. DQ instruction format . . . . . . . . . . . . . . . . . . . . 10. X Instruction Format . . . . . . . . . . . . . . . . . . . . 11. XL instruction format . . . . . . . . . . . . . . . . . . . . 12. XFX instruction format. . . . . . . . . . . . . . . . . . . 13. XFL instruction format . . . . . . . . . . . . . . . . . . . 14. XX1 Instruction Format . . . . . . . . . . . . . . . . . . 15. XX2 Instruction Format . . . . . . . . . . . . . . . . . . 16. XX3 Instruction Format . . . . . . . . . . . . . . . . . . 17. XX4-Form Instruction Format . . . . . . . . . . . . . 18. XS instruction format . . . . . . . . . . . . . . . . . . . . 19. XO instruction format. . . . . . . . . . . . . . . . . . . . 20. A instruction format . . . . . . . . . . . . . . . . . . . . . 21. M instruction format. . . . . . . . . . . . . . . . . . . . . 22. MD instruction format . . . . . . . . . . . . . . . . . . . 23. MDS instruction format . . . . . . . . . . . . . . . . . . 24. VA instruction format . . . . . . . . . . . . . . . . . . . . 25. VC instruction format. . . . . . . . . . . . . . . . . . . . 26. VX instruction format . . . . . . . . . . . . . . . . . . . . 27. EVX instruction format. . . . . . . . . . . . . . . . . . . 28. EVS instruction format . . . . . . . . . . . . . . . . . . 29. Z22 instruction format . . . . . . . . . . . . . . . . . . . 30. Z23 instruction format . . . . . . . . . . . . . . . . . . . 31. Storage operands and byte ordering. . . . . . . . 32. C structure s, showing values of elements . . 33. Big-Endian mapping of structure s. . . . . . . . . 34. Little-Endian mapping of structure s . . . . . . . 9 12 13 15 15 15 15 15 15 16 16 16 16 17 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 24 24 24 24 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. Instructions and byte ordering . . . . . . . . . . . . . 25 Assembly language program p. . . . . . . . . . . . 25 Big-Endian mapping of program p . . . . . . . . . 25 Little-Endian mapping of program p . . . . . . . . 25 Condition Register . . . . . . . . . . . . . . . . . . . . . . 30 Link Register . . . . . . . . . . . . . . . . . . . . . . . . . . 31 Count Register . . . . . . . . . . . . . . . . . . . . . . . . . 31 BO field encodings . . . . . . . . . . . . . . . . . . . . . . 32 at bit encodings . . . . . . . . . . . . . . . . . . . . . . . 32 BH field encodings . . . . . . . . . . . . . . . . . . . . . . 33 General Purpose Registers . . . . . . . . . . . . . . . 42 Fixed-Point Exception Register . . . . . . . . . . . . 42 Software-use SPRs . . . . . . . . . . . . . . . . . . . . . 43 Floating-Point Registers. . . . . . . . . . . . . . . . . 107 Floating-Point Status and Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Floating-Point Result Flags . . . . . . . . . . . . . . 109 Floating-point single format. . . . . . . . . . . . . . 109 Floating-point double format . . . . . . . . . . . . . 110 IEEE floating-point fields . . . . . . . . . . . . . . . . 110 Approximation to real numbers . . . . . . . . . . . 110 Selection of Z1 and Z2 . . . . . . . . . . . . . . . . . . 114 IEEE 64-bit execution model . . . . . . . . . . . . . 120 Interpretation of G, R, and X bits . . . . . . . . . . 120 Location of the Guard, Round, and Sticky bits in the IEEE execution model . . . 120 Multiply-add 64-bit execution model. . . . . . . . 121 Location of the Guard, Round, and Sticky bits in the multiply-add execution model . . . . . . . . . . . 121 Format for Unsigned Decimal Data . . . . . . . . 157 Format for Signed Decimal Data . . . . . . . . . . 157 Summary of BCD Digit and Sign Codes . . . . 157 DFP Short format . . . . . . . . . . . . . . . . . . . . . . 158 DFP Long format . . . . . . . . . . . . . . . . . . . . . . 158 DFP Extended format. . . . . . . . . . . . . . . . . . . 158 Encoding of the G field for Special Symbols . 158 Encoding of bits 0:4 of the G field for Finite Numbers 158 Summary of DFP Formats . . . . . . . . . . . . . . . 159 Value Ranges for Finite Number Data Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . 160 Encoding of NaN and Infinity Data Classes . . 160 Rounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Encoding of DFP Rounding-Mode Control ( DRN). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 Primary Encoding of Rounding-Mode Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 Secondary Encoding of Rounding-Mode
Figures
xxv
Book III-S: Power ISA Operating Environment Architecture - Server Environment [Category: Server] ............................ 721
1. Logical Partitioning Control Register . . . . . . . . 727 2. Real Mode Offset Register. . . . . . . . . . . . . . . . 729 3. Hypervisor Real Mode Offset Register. . . . . . . 730 4. Logical Partition Identification Register . . . . . . 730 5. Processor Compatibility Register . . . . . . . . . . . 730 6. Machine State Register . . . . . . . . . . . . . . . . . . 735 7. Processor Version Register . . . . . . . . . . . . . . . 745 8. Processor Identification Register . . . . . . . . . . . 746 9. Control Register . . . . . . . . . . . . . . . . . . . . . . . . 746 10. Software-use SPRs . . . . . . . . . . . . . . . . . . . . 747 11. SPRs for use by hypervisor programs . . . . . . 747 12. Priority levels for or Rx,Rx,Rx . . . . . . . . . . . . 752 13. SPR encodings . . . . . . . . . . . . . . . . . . . . . . . 753 14. SLBE for VRMA . . . . . . . . . . . . . . . . . . . . . . . 767 15. Address translation overview . . . . . . . . . . . . . 769 16. Translation of 64-bit effective address to 78 bit virtual address. . . . . . . . . . . . . . . . . . 769 17. SLB Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . 770 18. Page Size Encoding. . . . . . . . . . . . . . . . . . . . 770 19. Translation of 78-bit virtual address to 60-bit real address . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772 20. Page Table Entry . . . . . . . . . . . . . . . . . . . . . . 773 21. Format of PTELP when PTEL=1. . . . . . . . . . . 774 22. SDR1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774 23. Setting the Reference and Change bits . . . . . 779 24. Authority Mask Register (AMR) . . . . . . . . . . . 780 25. Authority Mask Override Register (AMOR) . . 781 26. User Authority Mask Override Register (UAMOR) . . . . . . . . . . . . . . . . . . . . . . . . . . 781 27. PP bit protection states, address
xxvi
Book III-E: Power ISA Operating Environment Architecture - Embedded Environment [Category: Embedded]...................... 867
1. 2. 3. 4. 5. 6. Logical Partition Identification Register . . . . . . Thread Identification Register . . . . . . . . . . . . . Thread Enable Register . . . . . . . . . . . . . . . . . Thread Enable Status Register . . . . . . . . . . . . Initialize Next Instruction Address Register. . . Thread Management Register Numbers . . . . . 874 877 877 878 879 880
Figures
xxvii
999 1000 1001 1001 1002 1002 1003 1003 1011 1013 1034 1041 1042 1045 1047 1048 1049 1052 1052 1070 1084 1084 1085 1085 1085 1096 1097 1098 1098 1099 1100
Book VLE: Power ISA Operating Environment Architecture Variable Length Encoding (VLE) Environment [Category: Variable Length Encoding] ........................................ 1103
1. 2. 3. 4. 5. 6. 7. 8. BD8 instruction format. . . . . . . . . . . . . . . . . . C instruction format . . . . . . . . . . . . . . . . . . . . IM5 instruction format . . . . . . . . . . . . . . . . . . OIM5 instruction format . . . . . . . . . . . . . . . . . IM7 instruction format . . . . . . . . . . . . . . . . . . R instruction format . . . . . . . . . . . . . . . . . . . . RR instruction format. . . . . . . . . . . . . . . . . . . SD4 instruction format. . . . . . . . . . . . . . . . . . 1106 1106 1106 1106 1106 1107 1107 1107
xxviii
Power ISA I
Chapter 1. Introduction
1.1 Overview. . . . . . . . . . . . . . . . . . . . . . 3 1.2 Instruction Mnemonics and Operands . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Document Conventions . . . . . . . . . . 4 1.3.1 Definitions . . . . . . . . . . . . . . . . . . . 4 1.3.2 Notation . . . . . . . . . . . . . . . . . . . . . 5 1.3.3 Reserved Fields and Reserved Values . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.4 Description of Instruction Operation . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3.5 Categories . . . . . . . . . . . . . . . . . . . 9 1.3.5.1 Phased-In/Phased-Out . . . . . . . 10 1.3.5.2 Corequisite Category . . . . . . . . 11 1.3.5.3 Category Notation. . . . . . . . . . . 11 1.3.6 Environments. . . . . . . . . . . . . . . . 11 1.4 Processor Overview . . . . . . . . . . . . 12 1.5 Computation modes . . . . . . . . . . . . 14 1.5.1 Modes [Category: Server] . . . . . . 14 1.5.2 Modes [Category: Embedded]. . . 14 1.6 Instruction formats . . . . . . . . . . . . . 14 1.6.1 I-FORM . . . . . . . . . . . . . . . . . . . . 15 1.6.2 B-FORM . . . . . . . . . . . . . . . . . . . 15 1.6.3 SC-FORM . . . . . . . . . . . . . . . . . . 15 1.6.4 D-FORM . . . . . . . . . . . . . . . . . . . 15 1.6.5 DS-FORM . . . . . . . . . . . . . . . . . . 15 1.6.6 DQ-FORM . . . . . . . . . . . . . . . . . . 15 1.6.7 X-FORM . . . . . . . . . . . . . . . . . . . 16 1.6.8 XL-FORM . . . . . . . . . . . . . . . . . . 16 1.6.9 XFX-FORM . . . . . . . . . . . . . . . . . 16 1.6.10 XFL-FORM . . . . . . . . . . . . . . . . 16 1.6.11 XX1-FORM . . . . . . . . . . . . . . . . 17 1.6.12 XX2-FORM. . . . . . . . . . . . . . . . . 1.6.13 XX3-FORM. . . . . . . . . . . . . . . . . 1.6.14 XX4-FORM. . . . . . . . . . . . . . . . . 1.6.15 XS-FORM. . . . . . . . . . . . . . . . . . 1.6.16 XO-FORM . . . . . . . . . . . . . . . . . 1.6.17 A-FORM . . . . . . . . . . . . . . . . . . . 1.6.18 M-FORM . . . . . . . . . . . . . . . . . . 1.6.19 MD-FORM . . . . . . . . . . . . . . . . . 1.6.20 MDS-FORM . . . . . . . . . . . . . . . . 1.6.21 VA-FORM . . . . . . . . . . . . . . . . . . 1.6.22 VC-FORM . . . . . . . . . . . . . . . . . 1.6.23 VX-FORM. . . . . . . . . . . . . . . . . . 1.6.24 EVX-FORM . . . . . . . . . . . . . . . . 1.6.25 EVS-FORM . . . . . . . . . . . . . . . . 1.6.26 Z22-FORM . . . . . . . . . . . . . . . . . 1.6.27 Z23-FORM . . . . . . . . . . . . . . . . . 1.6.28 Instruction Fields . . . . . . . . . . . . 1.7 Classes of Instructions . . . . . . . . . . 1.7.1 Defined Instruction Class . . . . . . . 1.7.2 Illegal Instruction Class . . . . . . . . 1.7.3 Reserved Instruction Class . . . . . 1.7.4 Preferred Instruction Forms . . . . . 1.7.5 Invalid Instruction Forms . . . . . . . 1.7.6 Reserved-no-op Instructions [Category: Phased-In] . . . . . . . . . . . . . . 1.8 Exceptions. . . . . . . . . . . . . . . . . . . . 1.9 Storage Addressing. . . . . . . . . . . . . 1.9.1 Storage Operands . . . . . . . . . . . . 1.9.2 Instruction Fetches . . . . . . . . . . . . 1.9.3 Effective Address Calculation. . . . 17 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 21 21 22 22 22 22 22 22 23 23 24 26
1.1 Overview
This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction fetching.
Chapter 1. Introduction
Power ISA I
1.3.2 Notation
The following notation is used throughout the Power ISA documents. All numbers are decimal unless specified in some special way.
Xp:q means bits p through q of register/instruction/field/bit_string X. Xp q ... means bits p, q, ... of register/instruction/field/bit_string X.
(RA)
means the ones complement of the contents of register RA. A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution. The symbol || is used to describe the concatenation of two values. For example, 010 || 111 is the same as 010111. xn means x raised to the nth power.
n
0bnnnn means a number expressed in binary format. 0xnnnn means a number expressed in hexadecimal format.
Underscores may be used between digits. RT, RA, R1, ... refer to General Purpose Registers. FRT, FRA, FR1, ... refer to Floating-Point Registers. FRTp, FRAp, FRBp, ... refer to an even-odd pair of Floating-Point Registers. Values must be even, otherwise the instruction form is invalid. VRT, VRA, VR1, ... refer to Vector Registers. (x) means the contents of register x, where x is the name of an instruction field. For example, (RA) means the contents of register RA, and (FRA) means the contents of register FRA, where RA and FRA are instruction fields. Names such as LR and CTR denote registers, not fields, so parentheses are not used with them. Parentheses are also omitted when register x is the register into which the result of an operation is placed. (RA|0) means the contents of register RA if the RA field has the value 1-31, or the value 0 if the RA field is 0. Bytes in instructions, fields, and bit strings are numbered from left to right, starting with byte 0 (most significant). Bits in registers, instructions, fields, and bit strings are specified as follows. In the last three items (definition of Xp etc.), if X is a field that specifies a GPR, FPR, or VR (e.g., the RS field of an instruction), the definitions apply to the register, not to the field.
x means the replication of x, n times (i.e., x concatenated to itself n-1 times). n0 and n1 are special cases:
n0 means a field of n bits with each bit equal to 0. Thus 50 is equivalent to 0b00000. n1 means a field of n bits with each bit equal to 1. Thus 51 is equivalent to 0b11111.
Each bit and field in instructions, and in status and control registers (e.g., XER, FPSCR) and Special Purpose Registers, is either defined or reserved. Some defined fields contain reserved values. In such cases when this document refers to the specific field, it refers only to the defined values, unless otherwise specified. /, //, ///, ... denotes a reserved field, in a register, instruction, field, or bit string. ?, ??, ???, ... denotes an implementation-dependent field in a register, instruction, field or bit string.
Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0 For all registers except the Vector category, bits in registers that are less than 64 bits start with bit number 64-L, where L is the register length; for the Vector category, bits in registers that are less than 128 bits start with bit number 128-L. The leftmost bit of a sequence of bits is the most significant bit of the sequence. Xp means bit p of register/instruction/field/ bit_string X.
Chapter 1. Introduction
Programming Note It is the responsibility of software to preserve bits that are now reserved in System Registers, because they may be assigned a meaning in some future version of the architecture. In order to accomplish this preservation in implementation-independent fashion, software should do the following. Initialize each such register supplying zeros for all reserved bits. Alter (defined) bit(s) in the register by reading the register, altering only the desired bit(s), and then writing the new value back to the register. The XER and FPSCR are partial exceptions to this recommendation. Software can alter the status bits in these registers, preserving the reserved bits, by executing instructions that have the side effect of altering the status bits. Similarly, software can alter any defined bit in the FPSCR by executing a Floating-Point Status and Control Register instruction. Using such instructions is likely to yield better performance than using the method described in the second item above.
Power ISA I
Meaning Assignment Assignment of an instruction effective iea address. In 32-bit mode the high-order 32 bits of the 64-bit target address are set to 0. NOT logical operator + Twos complement addition Twos complement subtraction, unary minus Multiplication si Signed-integer multiplication ui Unsigned-integer multiplication / Division Division, with result truncated to integer Square root =, Equals, Not Equals relations <, , >, Signed comparison relations <u, >u Unsigned comparison relations ? Unordered comparison relation &, | AND, OR logical operators , Exclusive OR, Equivalence logical operators ((ab) = (ab)) ABS(x) Absolute value of x CEIL(x) Least integer x DCR(x) Device Control Register x <E.DC> DOUBLE(x) Result of converting x from floating-point single format to floating-point double format, using the model shown on page 123 EXTS(x) Result of extending x on the left with sign bits FLOOR(x) Greatest integer x GPR(x) General Purpose Register x MASK(x, y) Mask having 1s in positions x through y (wrapping if x > y) and 0s elsewhere MEM(x, y) Contents of a sequence of y bytes of storage. The sequence depends on the byte ordering used for storage access, as follows. Big-Endian byte ordering: The sequence starts with the byte at address x and ends with the byte at address x+y-1. Little-Endian byte ordering: The sequence starts with the byte at address x+y-1 and ends with the byte at address x. MEM_DECORATED(x,y,z) Contents of a sequence of y bytes of storage, where the storage is accessed with decoration z applied. The sequence depends on the byte ordering used for storage access, as follows. Big-Endian byte ordering: The sequence starts with the byte at address x and ends with the byte at address x+y-1.
Notation
Chapter 1. Introduction
Power ISA I
1.3.5 Categories
Each facility (including registers and fields therein) and instruction is in exactly one of the categories listed in Figure 1. A category may be defined as a dependent category. These are categories that are supported only if the category they are dependent on is also supported. DepenCategory Base Server Embedded Alternate Time Base Cache Specification Decimal Floating-Point Decorated Storage Embedded.Cache Debug Embedded.Cache Initialization Embedded.Device Control Embedded.Enhanced Debug Embedded.External PID Embedded.Hypervisor Embedded.Hypervisor.LRAT Embedded.Little-Endian Embedded.Page Table Embedded.TLB Write Conditional Embedded.Performance Monitor Embedded.Processor Control Embedded Cache Locking Abvr. B S E ATB CS DFP DS E.CD E.CI E.DC E.ED E.PD Notes
dent categories are identified by the . in their category name, e.g., if an implementation supports the Floating-Point.Record category, then the Floating-Point category is also supported. An implementation that supports a facility or instruction in a given category, except for the two categories described in Section 1.3.5.1, supports all facilities and instructions in that category.
Required for all implementations Required for Server implementations Required for Embedded implementations An additional Time Base; see Book II Specify a specific cache for some instructions; see Book II Decimal Floating-Point facilities Decorated Storage facilities Provides direct access to cache data and directory content Instructions that invalidate the entire cache Legacy Device Control bus support Embedded Enhanced Debug facility; see Book III-E Embedded External PID facility; see Book III-E
E.HV Embedded Logical Partitioning and hypervisor facilities E.HV.LRAT Embedded Hypervisor Logical to Real Address Translation facility; see Book III-E E.LE E.PT E.TWC E.PM E.PC ECL Embedded Little-Endian page attribute; see Book III-E Embedded Page Table facility; see Book III-E Embedded TLB Write Conditional facility; see Book III-E Embedded Performance Monitor example; see Book III-E Embedded Processor Control facility; see Book III-E Embedded Cache Locking facility; see Book III-E Embedded Multi-Threading; see Book III-E Embedded Multi-Threading Thread Management Facility External Control facility; see Book II External Proxy facility; see Book III-E Floating-Point Facilities Floating-Point instructions with Rc=1 Legacy Integer Multiply-accumulate instructions Determine Left most Zero Byte instruction Load/Store Quadword instructions; see Book III-S Requirement for Memory Coherence; see Book II Move Assist instructions Processor Compatibility Register Server Performance Monitor example; see Book III-S HTAB alignment on 256 KB boundary; see Book III-S
Embedded Multi-Threading EM Embedded Multi-Threading.Thread EM.TM Management External Control External Proxy Floating-Point Floating-Point.Record Legacy Integer Multiply-Accumulate1 Legacy Move Assist Load/Store Quadword Memory Coherence Move Assist Processor Compatibility Server.Performance Monitor EC EXP FP FP.R LMA LMV LSQ MMC MA PCR S.PM
Chapter 1. Introduction
Category Signal Processing Engine1, 2 SPE.Embedded Float Scalar Double SPE.Embedded Float Scalar Single SPE.Embedded Float Vector Store Conditional Page Mobility Stream Strong Access Order Trace Variable Length Encoding Vector-Scalar Extension
Abvr. SP SP.FD SP.FS SP.FV SCPM STM SAO TRC VLE VSX
Notes Facility for signal processing GPR-based Floating-Point double-precision instruction set GPR-based Floating-Point single-precision instruction set GPR-based Floating-Point Vector instruction set Store Conditional accounting for page movement; see Book II Stream variant of dcbt instruction; see Book II Assist for X86 and Sparc emulation; see Book II Trace Facility; see Book III-S Variable Length Encoding facility; see Book VLE Vector-Scalar Extension Requires implementation of Floating-Point and Vector categories Vector facilities Little-Endian support for Vector storage operations. wait instruction; see Book II Required for 64-bit implementations; not defined for 32-bit impls
V V.LE WT 64
Because of overlapping opcode usage, SPE is mutually exclusive with Vector and with Legacy Integer Multiply-Accumulate, and Legacy Integer Multiply-Accumulate is mutually exclusive with Vector. 2 The SPE-dependent Floating-Point categories are collectively referred to as SPE.Embedded Float_* or SP.*. Figure 1. Category Listing (Sheet 2 of 2) An instruction in a category that is not supported by the implementation is treated as an illegal instruction or an unimplemented instruction on that implementation (see Section 1.7.2). For an instruction that is supported by the implementation with field values that are defined by the architecture, the field values defined as part of a category that is not supported by the implementation are treated as reserved values on that implementation (see Section 1.3.3 and Section 1.7.5). Bits in a register that are in a category that is not supported by the implementation are treated as reserved. Phased-In as defined below. Abbreviations, if applicable, are shown in parentheses. Phased-In These are facilities and instructions that, in the next version of the architecture, will be required as part of the category they are dependent on. Servers do not implement a facility or instruction in this category. Servers that comply with earlier versions of this architecture may have optionally implemented facilities or instructions that were category Phased-In. Server, These are facilities and Embedded.Phased-In instructions that are part of (S,E.PI) the Server environment and, in the next version of the architecture, will be required for the Embedded environment. It is implementation-dependent whether Embedded processors implement a facility or instruction in this category.
1.3.5.1 Phased-In/Phased-Out
There are two special categories, Phased-In and Phased-Out, as well as two additional variations of
10
Power ISA I
These are facilities and instructions that are part of the Embedded environment and, in the next version of the architecture, will be required for the Server environment. Servers do not implement a facility or instruction in this category.
1.3.6 Environments
All implementations support one of the two defined environments, Server or Embedded. Environments refer to common subsets of instructions that are shared across many implementations. The Server environment describes implementations that support Category: Base and Server. The Embedded environment describes implementations that support Category: Base and Embedded.
Phased-Out
These are facilities and instructions that, in some future version of the architecture, will be dropped out of the architecture. System developers should develop a migration plan to eliminate use of them in new systems. For Server platforms, Phased-Out facilities and instructions must be implemented if the facility or instruction is part of another category (including the Base category) that is supported by the Server platform.
Programming Note Warning: Instructions and facilities being phased out of the architecture are likely to perform poorly on future implementations. New programs should not use them. Programming Note Facilities are categorized as Phased-In only in cases where there is a difference between the Server and Embedded environments.
Chapter 1. Introduction
11
to application programs are defined in other Books, and are not shown in the figure.)
data
instructions
storage
Figure 2.
12
Power ISA I
CR
32 63 32
FPSCR
63
and
Control
Register
on
Category: Vector:
VR 0 VR 1 ... ... VR 30 VR 31
0 127
VSR 63
127
Category: Embedded:
SPRG4 SPRG5 SPRG6 SPRG7
0 63
Category: SPE:
Accumulator
0 63
Category: Floating-Point:
FPR 0 FPR 1 ... ... FPR 30 FPR 31
0 63
Signal Processing and Embedded Floating-Point Status and Control Register on page 504
Floating-Point Registers on page 107 Figure 3. Registers that are defined in Book I
Chapter 1. Introduction
13
When an effective address is placed in a register other than the Initialize Next Instruction register (see Section 3.7.1 of Book III-E) by an instruction or event, the high-order 32 bits are set to an undefined value (see Section 1.9.3). Except for instructions in the SPE category, instructions that operate on GPRs and SPRs use only the low-order 32 bits of the source GPR or SPR and produce a 32-bit result; the high-order 32 bits of target GPRs are set to an undefined value, and the high-order 32 bits of target SPRs are preserved. The 64-Bit category is not supported. Programming Note The high-order 32 bits of 64-bit SPRs are not modified in 32-bit mode because for some 64-bit SPRs, such as the Thread Enable Register (see Section 3.3 of Book III-E), these bits control facilities that are active in 32-bit mode. Treating all 64-bit SPRs the same way in this regard simplifies architecture and implementation.
Implementations may provide a means for selecting between the two treatments of the high-order 32 bits of GPRs in 32-bit mode (i.e., for selecting between the behavior described in the first sub-bullet and the behavior described in the second sub-bullet). The means, if provided, is implementation-specific (including any software synchronization requirements for changing the selection), but must be hypervisor privileged, and the hypervisor must ensure that the selection is constant for a given partition. 32-bit processors provide only 32-bit mode, and provide it as described by the second sub-bullet of the 32-bit mode bullet above.
If these bits are implemented in 32-bit mode, the processor behaves as described for 32-bit mode in the Server environment. If these bits are not implemented in 32-bit mode, the processor behaves as described for 32-bit mode in the Server Environment except for the following.
14
Power ISA I
1.6.4 D-FORM
0 6 11 16 31
OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD Figure 7.
RT RT RS RS BF / L BF / L TO FRT FRS
RA RA RA RA RA RA RA RA RA
D SI D UI SI UI SI D D
D instruction format
1.6.5 DS-FORM
0 6 11 16 30 31
RA RA RA RA RA
DS DS DS DS DS
XO XO XO XO XO
1.6.1 I-FORM
0 6 30 31
DS instruction format
OPCD Figure 4.
LI I instruction format
AA LK
1.6.6 DQ-FORM
0 6 11 16 28 31
OPCD Figure 9.
RTp
RA
DQ
///
1.6.2 B-FORM
0 6 11 16 30 31
DQ instruction format
OPCD Figure 5.
BO
BI
BD
AA LK
B instruction format
1.6.3 SC-FORM
0 6 11 16 20 27 30 31
/// ///
/// ///
// ///
LEV ///
// //
1 1
/ /
SC instruction format
Chapter 1. Introduction
15
1.6.7 X-FORM
0 6 11 16 21 31
11
16
21
31
OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD
RT RA RT RA RT RA RT RA RT / SR RT /// RT /// RT /// RS RA RT RA RS RA RS RA RS RA RS RA RS RA RS RA RS / SR RS /// RS /// RS /// L TH RA BF / L RA BF // FRA BF // BFA // BF // /// W BF // /// TH RA / CT /// / CT RA /// L RA /// L /// /// L /// TO RA FRT RA FRT FRA FRTp RA FRT /// FRT /// FRT /// /// FRTp /// FRTp FRTp FRA FRTp FRAp BF // FRA BF // FRAp FRT S
/// RB RB NB /// RB RB /// RB RB RB RB NB SH /// /// /// RB /// /// RB RB FRB /// U / /// RB /// RB RB RB /// RB RB FRB RB FRB FRBp /// FRB FRBp FRBp FRBp FRBp FRBp FRB
XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO
/ / EH / / / 1 / Rc Rc 1 / / Rc Rc / / / / / / / / / Rc / / / / / / / / / / / Rc Rc Rc Rc Rc Rc Rc / / Rc
OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD
FRTp S FRS FRSp BT /// /// /// /// // IH /// /// WC /// T VRT VRS MO
XO XO XO XO XO XO XO XO XO XO XO XO XO XO XO
Rc / / Rc / / / / / 1 / / / / /
1.6.8 XL-FORM
0 6 11 16 21 31
XO XO XO XO XO
/ LK / / /
1.6.9 XFX-FORM
0 6 11 21 31
OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD OPCD
RT RT RT RT RT RT DUI RS RS RS RS RS
0 1
0 1
spr tbr /// FXM dcr pmrn DUIS FXM FXM spr dcr pmrn
/ /
XO XO XO XO XO XO XO XO XO XO XO XO
/ / / / / / / / / / / /
1.6.10 XFL-FORM
0 6 7 15 16 21 31
OPCD L
FLM
W FRB
XO
Rc
16
Power ISA I
1.6.11 XX1-FORM
0 6 11 16 21 31
1.6.17 A-FORM
0 6 11 16 21 26 31
OPCD OPCD
0 6
T S
11
RA RA
16
RB RB
21
XO XO
TX SX
31
XO XO XO XO XO
Rc Rc Rc Rc /
1.6.12 XX2-FORM
0 6 9 11 14 16 21 30 31
T T BF
9
B B B
14 16 21
XO XO XO
BX TX BX TX BX /
30 31
1.6.18 M-FORM
0 6 11 16 21 26 31
///
OPCD OPCD
RS RS
RA RA
RB SH
MB MB
ME Rc ME Rc
1.6.13 XX3-FORM
0 6 9 11 16 21 22 24 29 30 31
1.6.19 MD-FORM
0 6 11 16 21 27 30 31
T T BF T T
9 11
A A // A A A
16
B B B B B XO SHW XO DM
21 22
XO Rc XO XO XO XO
24
OPCD OPCD
RS RS
RA RA
sh sh
mb me
XO sh Rc XO sh Rc
1.6.20 MDS-FORM
0 6 11 16 21 27 31
OPCD OPCD
RS RS
RA RA
RB RB
mb me
XO Rc XO Rc
1.6.14 XX4-FORM
0 6 11 16 21 26 28 29 30 31
1.6.21 VA-FORM
0 6 11 16 21 26 31
OPCD
0 6
T
11
A
16
B
21
XO CXAXBX TX
26 28 29 30 31
OPCD OPCD
VRT VRT
VRA VRA
VRB VRB
VRC / SHB
XO XO
1.6.15 XS-FORM
0 6 11 16 21 30 31
OPCD
RS
RA
1.6.22 VC-FORM
0 6 11 16 21 22 31
1.6.16 XO-FORM
0 6 11 16 21 22 31
OPCD
VRT
VRA
VRB
Rc
XO
RT RT RT RT
RA RA RA RA
RB RB RB ///
OE / / OE
XO XO XO XO
Rc Rc / Rc
Chapter 1. Introduction
17
1.6.23 VX-FORM
0 6 11 16 21 31
1.6.27 Z23-FORM
0 6 11 16 21 23 31
VRA /// UIM / UIM // UIM /// UIM SIM /// ///
XO XO XO XO XO XO XO XO XO
FRB
RMC
XO XO XO XO XO XO XO
Rc Rc Rc Rc Rc Rc Rc
R FRB
R FRBp RMC
1.6.24 EVX-FORM
0 6 11 16 21 31
RS RS RT RT RT RT BF // RT RT
RA RA /// RA RA UI RA RA SI
1.6.25 EVS-FORM
0 6 11 16 21 29 31
OPCD
RT
RA
RB
XO
BFA
1.6.26 Z22-FORM
0 6 11 15 16
BF // BF // BF // BF // FRT FRTp
XO XO XO XO XO XO
/ / / / Rc Rc
BA (11:15) Field used to specify a bit in the CR to be used as a source. BB (16:20) Field used to specify a bit in the CR to be used as a source. BC (21:25) Field used to specify a bit in the CR to be used as a source. BD (16:29) Immediate field used to specify a 14-bit signed twos complement branch displacement which is concatenated on the right with 0b00 and sign-extended to 64 bits.
18
Power ISA I
Chapter 1. Introduction
19
20
Power ISA I
Chapter 1. Introduction
21
1.8 Exceptions
There are two kinds of exception, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several components of the system software to be invoked. The exceptions that can be caused directly by the execution of an instruction include the following: an attempt to execute an illegal instruction, or an attempt by an application program to execute a
22
Power ISA I
Chapter 1. Introduction
23
Load
for i=0 to N-1: for i=0 to N-1: RT(R-N)+i MEM(EA+i,1) MEM(EA+i,1) (RS)(R-N)+i Little-Endian Byte Ordering Load Store for i=0 to N-1: for i=0 to N-1: RT(R-1)-i MEM(EA+i,1) MEM(EA+i,1) (RS)(R-1)-i Notes: 1. In this table, subscripts refer to bytes in a register rather than to bits as defined in Section 1.3.2. 2. This table does not apply to the lvebx, lvehx, lvewx, stvebx, stvehx, and stvewx instructions. Figure 31. Storage operands and byte ordering Figure 32 shows an example of a C language structure s containing an assortment of scalars and struct { int double char * char short int } s;
a; b; c; d[7]; e; f;
/* /* /* /* /* /*
*/ */ */ */ */ */
11
03
12
02
13
01
14
00
00 08 10 18 20
00 08 10 18 20
11
00
12
01
13
02
14
03 04 05 06 07
21
0F
22
0E
23
0D
24
0C
25
0B
26
0A
27
09
28
08
21
08
22
09
23
0A
24
0B
25
0C
26
0D
27
0E
28
0F
D C B A 31
17 16 15 14 13
32
12
33
11
34
10
31
10
32
11
33
12
34 A B C D
13 14 15 16 17 1F 1E
51
1D
52
1C 1B
G F E
1A 19 18
E F G
18 19 1A 1B
51
1C
52
1D 1E 1F
61
23
62
22
63
21
64
20
61
20
62
21
63
22
64
23
24
Power ISA I
beq done
07 06 05 04
00 08 10
add r7,r7,r4
0F 0E 0D 0C
lwzux r4,r5,r6
0B 0A 09 08
b loop
17 16 15 14 13
subi r5,r5,4
12 11 10
18
00 08 10
beq done
05 06 07
lwzux r4,r5,r6
08 09 0A 0B 0C
add r7,r7,r4
0D 0E 0F
subi r5,r5,4
10 11 12 13 14
b loop
15 16 17
18
Figure 37. Big-Endian mapping of program p The Little-Endian mapping of program p is shown in Figure 38.
Chapter 1. Introduction
25
In 64-bit mode, the entire 64-bit result comprises the 64-bit effective address. The effective address arithmetic wraps around from the maximum address, 264 - 1, to address 0, except that if the current instruction is at effective address 264 - 4 the effective address of the next sequential instruction is undefined. In 32-bit mode, the low-order 32 bits of the 64-bit result, preceded by 32 0 bits, comprise the 64-bit effective address for the purpose of addressing storage. When an effective address is placed into a register by an instruction or event, the value placed into the high-order 32 bits of the register differs between the Server environment and the Embedded environment. Server environment, and Embedded Environment when the high-order 32 bits of GPRs are implemented: - Load with Update and Store with Update instructions set the high-order 32 bits of register RA to the high-order 32 bits of the 64-bit result.
26
Power ISA I
Chapter 1. Introduction
27
28
Power ISA I
that causes the exception need not complete before the next instruction begins execution, with respect to setting exception bits and (if the exception is enabled) invoking the system error handler. A Store instruction modifies one or more bytes in an area of storage that contains instructions that will subsequently be executed. Before an instruction in that area of storage is executed, software synchronization is required to ensure that the instructions executed are consistent with the results produced by the Store instruction. Programming Note This software synchronization will generally be provided by system library programs (see Section 1.8 of Book II). Application programs should call the appropriate system library program before attempting to execute modified instructions.
29
0 1 2 3
Figure 39. Condition Register The bits in the Condition Register are grouped into eight 4-bit fields, named CR Field 0 (CR0), ..., CR Field 7 (CR7), which are set in one of the following ways. Specified fields of the CR can be set by a move to the CR from a GPR (mtcrf, mtocrf). A specified field of the CR can be set by a move to the CR from another CR field (mcrf), from XER32:35 (mcrxr), or from the FPSCR (mcrfs). CR Field 0 can be set as the implicit result of a fixed-point instruction. CR Field 1 can be set as the implicit result of a floating-point instruction. CR Field 1 can be set as the implicit result of a decimal floating-point instruction. CR Field 6 can be set as the implicit result of a vector instruction. A specified CR field can be set as the result of a Compare instruction. Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits. For all fixed-point instructions in which Rc=1, and for addic., andi., and andis., the first three bits of CR Field 0 (bits 32:34 of the Condition Register) are set by signed comparison of the result to zero, and the fourth bit of CR Field 0 (bit 35 of the Condition Register) is copied from the SO field of the XER. Result here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the 64-bit value placed into the target register in 32-bit mode. if (64-bit mode) 0 then M 32 else M if (target_register)M:63 < 0 then c 0b100 0b010 else if (target_register)M:63 > 0 then c 0b001 else c CR0 c || XERSO If any portion of the result is undefined, then the value placed into the first three bits of CR Field 0 is undefined. The bits of CR Field 0 are interpreted as follows. 2
The stbcx., sthcx., stwcx., and stdcx. instructions (see Section 4.4.2, Load and Reserve and Store Conditional Instructions, in Book II) also set CR Field 0. For all floating-point instructions in which Rc=1, CR Field 1 (bits 36:39 of the Condition Register) is set to the Floating-Point exception status, copied from bits 0:3 of the Floating-Point Status and Control Register. This occurs regardless of whether any exceptions are enabled, and regardless of whether the writing of the result is suppressed (see Section 4.4, Floating-Point Exceptions on page 114). These bits are interpreted as follows. Bit 0 Description Floating-Point Exception Summary (FX) This is a copy of the contents of FPSCRFX at the completion of the instruction. Floating-Point Enabled Exception Summary (FEX) This is a copy of the contents of FPSCRFEX at the completion of the instruction. Floating-Point Invalid Operation Exception Summary (VX) This is a copy of the contents of FPSCRVX at the completion of the instruction. Floating-Point Overflow Exception (OX) This is a copy of the contents of FPSCROX at the completion of the instruction.
For Compare instructions, a specified CR field is set to reflect the result of the comparison. The bits of the specified CR field are interpreted as follows. A complete description of how the bits are set is given in the instruction descriptions in Section 3.3.9, Fixed-Point Compare Instructions on page 74, Section 4.6.8, Floating-Point Compare Instructions on page 148, and Section 8.3.9, SPE Instruction Set on page 510. Bit 0 Description Less Than, Floating-Point Less Than (LT, FL) For fixed-point Compare instructions, (RA) < SI or (RB) (signed comparison) or (RA) <u UI
30
Power ISA I
The Vector Integer Compare instructions (see Section 6.9.2, Vector Integer Compare Instructions) compare two Vector Registers element by element, interpreting the elements as unsigned or signed integers depending on the instruction, and set the corresponding element of the target Vector Register to all 1s if the relation being tested is true and 0s if the relation being tested is false. If Rc=1, CR Field 6 is set to reflect the result of the comparison, as follows Bit 0 1 2 3 Description The relation is true for all element pairs (i.e., VRT is set to all 1s). 0 The relation is false for all element pairs (i.e., VRT is set to all 0s). 0
The Vector Floating-Point Compare instructions compare two Vector Registers word element by word element, interpreting the elements as single-precision floating-point numbers. With the exception of the Vector Compare Bounds Floating-Point instruction, they set the target Vector Register, and CR Field 6 if Rc=1, in the same manner as do the Vector Integer Compare instructions. Bit 0 1 2 3 Description The relation is true for all element pairs (i.e., VRT is set to all 1s). 0 The relation is false for all element pairs (i.e., VRT is set to all 0s). 0
31
Notes: 1. z denotes a bit that is ignored. 2. The a and t bits are used as described below. Figure 42. BO field encodings The a and t bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or is likely not to be taken, as shown in Figure 43. at 00 01 10 11 Hint No hint is given Reserved The branch is very likely not to be taken The branch is very likely to be taken
Figure 43. at bit encodings Programming Note Many implementations have dynamic mechanisms for predicting whether a branch will be taken. Because the dynamic prediction is likely to be very accurate, and is likely to be overridden by any hint provided by the at bits, the at bits should be set to 0b00 unless the static prediction implied by at=0b10 or at=0b11 is highly likely to be correct. For Branch Conditional to Link Register and Branch Conditional to Count Register instructions, the BH field
32
Power ISA I
bcctr[l]: The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken 01 bclr[l]: The instruction is not a subroutine return; the target address is likely to be the same as the target address used the preceding time the branch was taken
bcctr[l]: Reserved 10 11 Reserved bclr[l] and bcctr[l]: The target address is not predictable
Figure 44. BH field encodings Programming Note The hint provided by the BH field is independent of the hint provided by the at bits (e.g., the BH field provides no indication of whether the branch is likely to be taken).
33
A calls Glue: use a bl or bcl instruction (LK=1). Glue calls B: place the address of B into the Count Register, and use a bcctr instruction (LK=0). B returns to A: use a bclr instruction (LK=0) (the return address is in, or can be restored to, the Link Register).
Function call: Here A calls a function, the identity of which may vary from one instance of the call to another, instead of calling a specific program B. This case should be handled using the conventions of the preceding two bullets, depending on whether the call is direct or indirect, with the following differences.
If the call is direct, place the address of the function into the Count Register, and use a bcctrl instruction (LK=1) instead of a bl or bcl instruction. For the bcctr[l] instruction that branches to the function, use BH=0b11 if appropriate.
34
Power ISA I
Compatibility Note The bits corresponding to the current a and t bits, and to the current z bits except in the branch always BO encoding, had different meanings in versions of the architecture that precede Version 2.00. The bit corresponding to the t bit was called the y bit. The y bit indicated whether to use the architected default prediction (y=0) or to use the complement of the default prediction (y=1). The default prediction was defined as follows. If the instruction is bc[l][a] with a negative value in the displacement field, the branch is taken. (This is the only case in which the prediction corresponding to the y bit differs from the prediction corresponding to the t bit.) - In all other cases (bc[l][a] with a nonnegative value in the displacement field, bclr[l], or bcctr[l]), the branch is not taken. The BO encodings that test both the Count Register and the Condition Register had a y bit in place of the current z bit. The meaning of the y bit was as described in the preceding item. The a bit was a z bit. Because these bits have always been defined either to be ignored or to be treated as hints, a given program will produce the same result on any implementation regardless of the values of the bits. Also, because even the y bit is ignored, in practice, by most processors that comply with versions of the architecture that precede Version 2.00, the performance of a given program on those processors will not be affected by the values of the bits.
35
I-form
target_addr target_addr target_addr target_addr LI (AA=0 LK=0) (AA=1 LK=0) (AA=0 LK=1) (AA=1 LK=1) AA LK
30 31
Branch Conditional
bc bca bcl bcla 16
0 6
B-form
(AA=0 LK=0) (AA=1 LK=0) (AA=0 LK=1) (AA=1 LK=1) BD
16
BI
AA LK
30 31
target_addr specifies the branch target address. If AA=0 then the branch target address is the sum of LI || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If AA=1 then the branch target address is the value LI || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: LR (if LK=1)
if (64-bit mode) then M 0 32 else M CTR - 1 if BO2 then CTR ctr_ok BO2 | ((CTRM:63 0) BO3) BO0 | (CRBI+32 BO1) cond_ok if ctr_ok & cond_ok then if AA then NIA iea EXTS(BD || 0b00) else NIA iea CIA + EXTS(BD || 0b00) if LK then LR iea CIA + 4 BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. target_addr specifies the branch target address. If AA=0 then the branch target address is the sum of BD || 0b00 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If AA=1 then the branch target address is the value BD || 0b00 sign-extended, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: CTR LR Extended Mnemonics: Examples of extended mnemonics for Branch Conditional: Extended: blt target bne cr2,target bdnz target Equivalent to: bc 12,0,target bc 4,10,target bc 16,0,target (if BO2=0) (if LK=1)
36
Power ISA I
BO,BI,BH BO,BI,BH BO
11
BO,BI,BH BO,BI,BH BO
11
BI
BH
19 21
16
LK
31
BI
BH
19 21
528
LK
31
if (64-bit mode) 0 then M 32 else M if BO2 then CTR CTR - 1 BO2 | ((CTRM:63 0) BO3 ctr_ok cond_ok BO0 | (CRBI+32 BO1) if ctr_ok & cond_ok then NIA iea LR0:61 || 0b00 if LK then LR iea CIA + 4 BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. The BH field is used as described in Figure 44. The branch target address is LR0:61 || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: CTR LR Extended Mnemonics: Examples of extended mnemonics for Branch Conditional to Link Register: Extended: bclr 4,6 bltlr bnelr cr2 bdnzlr Programming Note bclr, bclrl, bcctr, and bcctrl each serve as both a basic and an extended mnemonic. The Assembler will recognize a bclr, bclrl, bcctr, or bcctrl mnemonic with three operands as the basic form, and a bclr, bclrl, bcctr, or bcctrl mnemonic with two operands as the extended form. In the extended form the BH operand is omitted and assumed to be 0b00. Equivalent to: bclr 4,6,0 bclr 12,0,0 bclr 4,10,0 bclr 16,0,0 (if BO2=0) (if LK=1)
cond_ok BO0 | (CRBI+32 BO1) if cond_ok then NIA iea CTR0:61 || 0b00 if LK then LR iea CIA + 4 BI+32 specifies the Condition Register bit to be tested. The BO field is used to resolve the branch as described in Figure 42. The BH field is used as described in Figure 44. The branch target address is CTR0:61 || 0b00, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. If the decrement and test CTR option is specified (BO2=0), the instruction form is invalid. Special Registers Altered: LR Extended Mnemonics: Examples of extended mnemonics for Branch Conditional to Count Register. Extended: bcctr 4,6 bltctr bnectr cr2 Equivalent to: bcctr 4,6,0 bcctr 12,0,0 bcctr 4,10,0 (if LK=1)
37
XL-form
XL-form
BT,BA,BB BT
11
BA
16
BB
21
257
/
31 0
19
BA
16
BB
21
225
/
31
CRBT+32
CRBT+32
(CRBA+32
& CRBB+32)
The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
Condition Register OR
cror 19
0 6
XL-form
XL-form
BT,BA,BB BT
11
BA
16
BB
21
449
/
31 0
19
BA
16
BB
21
193
/
31
CRBT+32
CRBA+32 | CRBB+32
CRBT+32
CRBA+32 CRBB+32
The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32 Extended Mnemonics: Example of extended mnemonics for Condition Register OR: Extended: crmove Bx,By Equivalent to: cror Bx,By,By
The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32 Extended Mnemonics: Example of extended mnemonics for Condition Register XOR: Extended: crclr Bx Equivalent to: crxor Bx,Bx,Bx
38
Power ISA I
XL-form
XL-form
BT,BA,BB BT
11
BA
16
BB
21
33
/
31 0
19
BA
16
BB
21
289
/
31
CRBT+32
(CRBA+32
| CRBB+32)
CRBT+32
CRBA+32 CRBB+32
The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32 Extended Mnemonics: Example of extended mnemonics for Condition Register NOR: Extended: crnot Bx,By Equivalent to: crnor Bx,By,By
The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32 Extended Mnemonics: Example of extended mnemonics for Condition Register Equivalent: Extended: crset Bx Equivalent to: creqv Bx,Bx,Bx
BT,BA,BB BT
11
BA
16
BB
21
129
/
31 0
19
BA
16
BB
21
417
/
31
CRBT+32
CRBA+32 &
CRBB+32
CRBT+32
CRBA+32 |
CRBB+32
The bit in the Condition Register specified by BA+32 is ANDed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
XL-form
BF,BFA BF //
9
BFA
11
//
14 16
///
21
/
31
CR4BF+32:4BF+35
CR4BFA+32:4BFA+35
The contents of Condition Register field BFA are copied to Condition Register field BF. Special Registers Altered: CR field BF
39
System Call
sc 17
0 6
SC-form
LEV ///
11
///
16
//
20
LEV
// 1 /
27 30 31
This instruction calls the system to perform a service. A complete description of this instruction can be found in Book III. The use of the LEV field is described in Book III. The LEV values greater than 1 are reserved, and bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field. When control is returned to the program that executed the System Call instruction, the contents of the registers will depend on the register conventions used by the program providing the system service. This instruction is context synchronizing (see Book III). Special Registers Altered: Dependent on the system service Programming Note sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0. In application programs the value of the LEV operand for sc should be 0.
40
Power ISA I
41
33
Overflow (OV) The Overflow bit is set to indicate that an overflow has occurred during execution of an instruction. XO-form Add, Subtract From, and Negate instructions having OE=1 set it to 1 if the carry out of bit M is not equal to the carry out of bit M+1, and set it to 0 otherwise. XO-form Multiply Low and Divide instructions having OE=1 set it to 1 if the result cannot be represented in 64 bits (mulld, divd, divde, divdu, divdeu) or in 32 bits (mullw, divw, divwe, divwu, divweu), and set it to 0 otherwise. The OV bit is not altered by Compare instructions, or by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. [Category: Legacy Integer Multiply-Accumulate] XO-form Legacy Integer Multiply-Accumulate instructions set OV when OE=1 to reflect overflow of the 32-bit result. For signed-integer accumulation, overflow occurs when the add produces a carry out of bit 32 that is not equal to the carry out of bit 33. For unsigned-integer accumulation, overflow occurs when the add produces a carry out of bit 32.
34
Figure 46. Fixed-Point Exception Register The bit definitions for the Fixed-Point Exception Register are shown below. Here M=0 in 64-bit mode and M=32 in 32-bit mode. The bits are set based on the operation of an instruction considered as a whole, not on intermediate results (e.g., the Subtract From Carrying instruction, the result of which is specified as the sum of three values, sets bits in the Fixed-Point Exception Register based on the entire operation, not on an intermediate sum). 35:56 Bit(s 0:31 32 Description Reserved Summary Overflow (SO) The Summary Overflow bit is set to 1 whenever an instruction (except mtspr) sets the Overflow bit. Once set, the SO bit remains set until it is cleared by an mtspr instruction (specifying the XER) or an mcrxr instruction. It is not altered by Compare instructions, or by other instructions (except mtspr to the XER, and mcrxr) that cannot overflow. Executing an mtspr instruction to the XER, supplying the values 0 for SO and 1 for OV, causes SO to be set to 0 and OV to be set to 1. 57:63
Carry (CA) The Carry bit is set as follows, during execution of certain instructions. Add Carrying, Subtract From Carrying, Add Extended, and Subtract From Extended types of instructions set it to 1 if there is a carry out of bit M, and set it to 0 otherwise. Shift Right Algebraic instructions set it to 1 if any 1-bits have been shifted out of a negative operand, and set it to 0 otherwise. The CA bit is not altered by Compare instructions, or by other instructions (except Shift Right Algebraic, mtspr to the XER, and mcrxr) that cannot carry. Reserved This field specifies the number of bytes to be transferred by a Load String Indexed or Store String Indexed instruction. [Category: Legacy Move Assist] This field is used as a target by dmlzb to indicate the byte location of the leftmost zero byte found.
42
Power ISA I
access them and does not define the Device Control Registers themselves. Device Control Registers may control the use of on-chip peripherals, such as memory controllers (the definition of specific Device Control Registers is implementation-dependent). The contents of user-mode-accessible Device Control Registers can be read using mfdcrux and written using mtdcrux.
The VR Save Register (VRSAVE) is a 32-bit register that can be used as a software use SPR; see Sections 3.2.4 and 6.3.3.
Architecture Note In versions of the Architecture that precede Version 2.05, the VRSAVE Register was in the Vector category (see Section 6.3.3). It is now included in the Base category to simplify operating system support of and migration between heterogeneous systems containing processors with and without support for the Vector category.
Programming Note USPRG0 was made a 32-bit register and renamed to VRSAVE; see Sections 3.2.3 and 6.3.3.
43
44
Power ISA I
D-form
X-form
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
87
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA 560 || MEM(EA, 1) RT Let the effective address (EA) be the sum (RA|0)+ D. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA 560 || MEM(EA, 1) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. Special Registers Altered: None
D-form
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
119
/
31
EA RT RA
Let the effective address (EA) be the sum (RA)+ D. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
EA RT RA
Let the effective address (EA) be the sum (RA)+ (RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
45
D-form
X-form
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
279
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA 480 || MEM(EA, 2) RT Let the effective address (EA) be the sum (RA|0)+ D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA 480 || MEM(EA, 2) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. Special Registers Altered: None
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
311
/
31
EA RT RA
EA RT RA
Let the effective address (EA) be the sum (RA)+ D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
Let the effective address (EA) be the sum (RA)+ (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
46
Power ISA I
D-form
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
343
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA EXTS(MEM(EA, 2)) RT Let the effective address (EA) be the sum (RA|0)+ D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA EXTS(MEM(EA, 2)) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. Special Registers Altered: None
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
375
/
31
EA RT RA
EA RT RA
Let the effective address (EA) be the sum (RA)+ D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
Let the effective address (EA) be the sum (RA)+ (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
47
D-form
X-form
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
23
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA 320 || MEM(EA, 4) RT Let the effective address (EA) be the sum (RA|0)+ D. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA 320 || MEM(EA, 4) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. Special Registers Altered: None
RT,D(RA) RT
11
RA
16
D
31 0
31
RA
16
RB
21
55
/
31
EA RT RA
Let the effective address (EA) be the sum (RA)+ D. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
EA RT RA
Let the effective address (EA) be the sum (RA)+ (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
48
Power ISA I
Version 2.06 Revision B 3.3.2.1 64-bit Fixed-Point Load Instructions [Category: 64-Bit]
Load Word Algebraic
lwa 58
0 6
DS-form
X-form
RT,DS(RA) RT
11
RA
16
DS
2
30 31 0
31
RA
16
RB
21
341
/
31
if RA = 0 then b 0 (RA) else b EA b + EXTS(DS || 0b00) EXTS(MEM(EA, 4)) RT Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) EXTS(MEM(EA, 4)) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word. Special Registers Altered: None
RT,RA,RB RT
11
RA
16
RB
21
373
/
31
EA RT RA
Let the effective address (EA) be the sum (RA)+ (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are filled with a copy of bit 0 of the loaded word. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
49
DS-form
X-form
RT,DS(RA) RT
11
RA
16
DS
0
30 31
RT
11
RA
16
RB
21
21
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(DS || 0b00) EA MEM(EA, 8) RT Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). The doubleword in storage addressed by EA is loaded into RT. Special Registers Altered: None
0 (RA)
Let the effective address (EA) be the sum (RA|0)+ (RB). The doubleword in storage addressed by EA is loaded into RT. Special Registers Altered: None
DS-form
RT,DS(RA) RT
11
RA
16
DS
1
30 31 0
31
RA
16
RB
21
53
/
31
EA RT RA
Let the effective address (EA) be the sum (RA)+ (DS||0b00). The doubleword in storage addressed by EA is loaded into RT. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
EA RT RA
Let the effective address (EA) be the sum (RA)+ (RB). The doubleword in storage addressed by EA is loaded into RT. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
50
Power ISA I
Store Byte
stb 38
0 6
D-form
X-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
215
/
31
if RA = 0 then b 0 (RA) else b b + EXTS(D) EA MEM(EA, 1) (RS)56:63 Let the effective address (EA) be the sum (RA|0)+ D. (RS)56:63 are stored into the byte in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 1) (RS)56:63 Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into the byte in storage addressed by EA. Special Registers Altered: None
D-form
X-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
247
/
31
EA (RA) + EXTS(D) (RS)56:63 MEM(EA, 1) RA EA Let the effective address (EA) be the sum (RA)+ D. (RS)56:63 are stored into the byte in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) (RS)56:63 MEM(EA, 1) RA EA Let the effective address (EA) be the sum (RA)+ (RB). (RS)56:63 are stored into the byte in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
51
D-form
X-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
407
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA (RS)48:63 MEM(EA, 2) Let the effective address (EA) be the sum (RA|0)+ D. (RS)48:63 are stored into the halfword in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA (RS)48:63 MEM(EA, 2) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)48:63 are stored into the halfword in storage addressed by EA. Special Registers Altered: None
D-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
439
/
31
EA (RA) + EXTS(D) MEM(EA, 2) (RS)48:63 RA EA Let the effective address (EA) be the sum (RA)+ D. (RS)48:63 are stored into the halfword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) (RS)48:63 MEM(EA, 2) RA EA Let the effective address (EA) be the sum (RA)+ (RB). (RS)48:63 are stored into the halfword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
52
Power ISA I
D-form
X-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
151
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA (RS)32:63 MEM(EA, 4) Let the effective address (EA) be the sum (RA|0)+ D. (RS)32:63 are stored into the word in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA (RS)32:63 MEM(EA, 4) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)32:63 are stored into the word in storage addressed by EA. Special Registers Altered: None
D-form
X-form
RS,D(RA) RS
11
RA
16
D
31 0
31
RA
16
RB
21
183
/
31
EA (RA) + EXTS(D) MEM(EA, 4) (RS)32:63 RA EA Let the effective address (EA) be the sum (RA)+ D. (RS)32:63 are stored into the word in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) MEM(EA, 4) (RS)32:63 RA EA Let the effective address (EA) be the sum (RA)+ (RB). (RS)32:63 are stored into the word in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
53
Version 2.06 Revision B 3.3.3.1 64-bit Fixed-Point Store Instructions [Category: 64-Bit]
Store Doubleword
std 62
0 6
DS-form
X-form
RS,DS(RA) RS
11
RA
16
DS
0
30 31 0
31
RA
16
RB
21
149
/
31
if RA = 0 then b 0 (RA) else b EA b + EXTS(DS || 0b00) (RS) MEM(EA, 8) Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) (RS) MEM(EA, 8) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS) is stored into the doubleword in storage addressed by EA. Special Registers Altered: None
DS-form
RS,DS(RA) RS
11
RA
16
DS
1
30 31 0
31
RA
16
RB
21
181
/
31
EA (RA) + EXTS(DS || 0b00) (RS) MEM(EA, 8) RA EA Let the effective address (EA) be the sum (RA)+ (DS||0b00). (RS) is stored into the doubleword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) MEM(EA, 8) (RS) EA RA Let the effective address (EA) be the sum (RA)+ (RB). (RS) is stored into the doubleword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
54
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
790
/
31 0
31
RA
16
RB
21
918
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA MEM(EA, 2) load_data 480 || load_data RT 8:15 || load_data0:7 Let the effective address (EA) be the sum (RA|0)+(RB). Bits 0:7 of the halfword in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the halfword in storage addressed by EA are loaded into RT48:55. RT0:47 are set to 0. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA (RS)56:63 || (RS)48:55 MEM(EA, 2) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the halfword in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the halfword in storage addressed by EA. Special Registers Altered: None
RT,RA,RB RT
11
RA
16
RB
21
534
/
31 0
31
RA
16
RB
21
662
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) MEM(EA, 4) load_data 320 || load_data RT 24:31 || load_data16:23 || load_data8:15 || load_data0:7 Let the effective address (EA) be the sum (RA|0)+ (RB). Bits 0:7 of the word in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the word in storage addressed by EA are loaded into RT48:55. Bits 16:23 of the word in storage addressed by EA are loaded into RT40:47. Bits 24:31 of the word in storage addressed by EA are loaded into RT32:39. RT0:31 are set to 0. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) (RS)56:63 || (RS)48:55 || (RS)40:47 MEM(EA, 4) ||(RS)32:39 Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the word in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the word in storage addressed by EA. (RS)40:47 are stored into bits 16:23 of the word in storage addressed by EA. (RS)32:39 are stored into bits 24:31 of the word in storage addressed by EA. Special Registers Altered: None
55
Version 2.06 Revision B 3.3.4.1 64-Bit Load and Store with Byte Reversal Instructions [Category: 64-bit]
Load Doubleword Byte-Reverse Indexed X-form
ldbrx 31
0 6
RT,RA,RB RT
11
RA
16
RB
21
532
/
31 0
31
RA
16
RB
21
660
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA MEM(EA, 8) load_data RT load_data56:63 || load_data48:55 || load_data40:47 || load_data32:39 || load_data24:31 || load_data16:23 || load_data8:15 || load_data0:7 Let the effective address (EA) be the sum (RA|0)+(RB). Bits 0:7 of the doubleword in storage addressed by EA are loaded into RT56:63. Bits 8:15 of the doubleword in storage addressed by EA are loaded into RT48:55. Bits 16:23 of the doubleword in storage addressed by EA are loaded into RT40:47. Bits 24:31 of the doubleword in storage addressed by EA are loaded into RT32:39. Bits 32:39 of the doubleword in storage addressed by EA are loaded into RT24:31. Bits 40:47 of the doubleword in storage addressed by EA are loaded into RT16:23. Bits 48:55 of the doubleword in storage addressed by EA are loaded into RT8:15. Bits 56:63 of the doubleword in storage addressed by EA are loaded into RT0:7. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA (RS)56:63 || (RS)48:55 MEM(EA, 8) || (RS)40:47 || (RS)32:39 || (RS)24:31 || (RS)16:23 || (RS)8:15 || (RS)0:7 Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into bits 0:7 of the doubleword in storage addressed by EA. (RS)48:55 are stored into bits 8:15 of the doubleword in storage addressed by EA. (RS)40:47 are stored into bits 16:23 of the doubleword in storage addressed by EA. (RS)32:39 are stored into bits 23:31 of the doubleword in storage addressed by EA. (RS)24:31 are stored into bits 32:39 of the doubleword in storage addressed by EA. (RS)16:23 are stored into bits 40:47 of the doubleword in storage addressed by EA. (RS)8:15 are stored into bits 48:55 of the doubleword in storage addressed by EA. (RS)0:7 are stored into bits 56:63 of the doubleword in storage addressed by EA. Special Registers Altered: None
56
Power ISA I
D-form
D-form
RT,D(RA) RT
11
RA
16
D
31 0
47
RA
16
D
31
if RA = 0 then b 0 (RA) else b b + EXTS(D) EA r RT do while r 31 320 || MEM(EA, 4) GPR(r) r r + 1 EA + 4 EA Let n = (32-RT). Let the effective address (EA) be the sum (RA|0)+ D. n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The high-order 32 bits of these GPRs are set to zero. If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + EXTS(D) EA r RS do while r 31 GPR(r)32:63 MEM(EA, 4) r r + 1 EA + 4 EA Let n = (32-RS). Let the effective address (EA) be the sum (RA|0)+ D. n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31. Special Registers Altered: None
57
58
Power ISA I
X-form
X-form
RT,RA,NB RT
11
RA
16
NB
21
597
/
31 0
31
RA
16
RB
21
533
/
31
if RA = 0 then EA 0 else EA (RA) 32 if NB = 0 then n NB else n r RT - 1 32 i do while n > 0 if i = 32 then r + 1 (mod 32) r 0 GPR(r) GPR(r)i:i+7 MEM(EA, 1) i + 8 i 32 if i = 64 then i EA EA + 1 n - 1 n Let the effective address (EA) be (RA|0). Let n = NB if NB0, n = 32 if NB=0; n is the number of bytes to load. Let nr=CEIL(n/4); nr is the number of registers to receive data. n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0. Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0. If RA is in the range of registers to be loaded, including the case in which RA=0, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA XER57:63 n r RT - 1 32 i undefined RT do while n > 0 if i = 32 then r + 1 (mod 32) r GPR(r) 0 MEM(EA, 1) GPR(r)i:i+7 i + 8 i if i = 64 then i 32 EA + 1 EA n - 1 n Let the effective address (EA) be the sum (RA|0)+ (RB). Let n=XER57:63; n is the number of bytes to load. Let nr=CEIL(n/4); nr is the number of registers to receive data. If n>0, n consecutive bytes starting at EA are loaded into GPRs RT through RT+nr-1. Data are loaded into the low-order four bytes of each GPR; the high-order four bytes are set to 0. Bytes are loaded left to right in each register. The sequence of registers wraps around to GPR 0 if required. If the low-order four bytes of register RT+nr-1 are only partially filled, the unfilled low-order byte(s) of that register are set to 0. If n=0, the contents of register RT are undefined. If RA or RB is in the range of registers to be loaded, including the case in which RA=0, the instruction is treated as if the instruction form were invalid. If RT=RA or RT=RB, the instruction form is invalid. Special Registers Altered: None
59
X-form
X-form
RS,RA,NB RS
11
RA
16
NB
21
725
/
31 0
31
RA
16
RB
21
661
/
31
if RA = 0 then EA 0 else EA (RA) 32 if NB = 0 then n NB else n r RS - 1 32 i do while n > 0 if i = 32 then r r + 1 (mod 32) GPR(r)i:i+7 MEM(EA, 1) i i + 8 if i = 64 then i 32 EA + 1 EA n - 1 n Let the effective address (EA) be (RA|0). Let n = NB if NB0, n = 32 if NB=0; n is the number of bytes to store. Let nr =CEIL(n/4); nr is the number of registers to supply data. n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR. Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + (RB) EA XER57:63 n r RS - 1 32 i do while n > 0 if i = 32 then r r + 1 (mod 32) GPR(r)i:i+7 MEM(EA, 1) i i + 8 if i = 64 then i 32 EA + 1 EA n - 1 n Let the effective address (EA) be the sum (RA|0)+ (RB). Let n = XER57:63; n is the number of bytes to store. Let nr = CEIL(n/4); nr is the number of registers to supply data. If n>0, n consecutive bytes starting at EA are stored from GPRs RS through RS+nr-1. Data are stored from the low-order four bytes of each GPR. Bytes are stored left to right from each register. The sequence of registers wraps around to GPR 0 if required. If n=0, no bytes are stored. Special Registers Altered: None
60
Power ISA I
61
Add Immediate
addi 14
0 6
D-form
D-form
RT,RA,SI RT
11
RA
16
SI
31 0
15
RA
16
SI
31
if RA = 0 then RT else RT
if RA = 0 then RT else RT
The sum (RA|0) + SI is placed into register RT. Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Add Immediate: Extended: li Rx,value la Rx,disp(Ry) subi Rx,Ry,value Programming Note addi, addis, add, and subf are the preferred instructions for addition and subtraction, because they set few status bits. Notice that addi and addis use the value 0, not the contents of GPR 0, if RA=0. Equivalent to: addi Rx,0,value addi Rx,Ry,disp addi Rx,Ry,-value
The sum (RA|0) + (SI || 0x0000) is placed into register RT. Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Add Immediate Shifted: Extended: lis Rx,value subis Rx,Ry,value Equivalent to: addis Rx,0,value addis Rx,Ry,-value
62
Power ISA I
XO-form
RT,RA,RB RT,RA,RB RT,RA,RB RT,RA,RB RT
11
Subtract From
subf subf. subfo subfo. 31
0 6
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
RA
OE
21 22
266
Rc
31
RA
OE
21 22
40
Rc
31
RT
(RA) + (RB)
The sum (RA) + (RB) is placed into register RT. Special Registers Altered: CR0 SO OV (if Rc=1) (if OE=1)
(RA) + (RB) + 1 The sum (RA) + (RB) +1 is placed into register RT.
RT Special Registers Altered: CR0 SO OV Extended Mnemonics: Example of extended mnemonics for Subtract From: Extended: sub Rx,Ry,Rz Equivalent to: subf Rx,Rz,Ry (if Rc=1) (if OE=1)
D-form
RT,RA,SI RT
11
RA
16
SI
31 0
13
RA
16
SI
31
RT
(RA) + EXTS(SI) RT (RA) + EXTS(SI) The sum (RA) + SI is placed into register RT. Special Registers Altered: CR0 CA Extended Mnemonics: Example of extended mnemonics for Add Immediate Carrying and Record: Extended: subic. Rx,Ry,value Equivalent to: addic. Rx,Ry,-value
The sum (RA) + SI is placed into register RT. Special Registers Altered: CA Extended Mnemonics: Example of extended mnemonics for Add Immediate Carrying: Extended: subic Rx,Ry,value Equivalent to: addic Rx,Ry,-value
63
RT,RA,SI RT
11
RA
16
SI
31
Add Carrying
addc addc. addco addco. 31
0 6
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RA
OE
21 22
10
Rc
31
RA
16
RB
OE
21 22
Rc
31
RT
(RA) + (RB)
The sum (RA) + (RB) is placed into register RT. Special Registers Altered: CA CR0 SO OV
(RA) + (RB) + 1 The sum (RA) + (RB) + 1 is placed into register RT.
RT Special Registers Altered: CA CR0 SO OV Extended Mnemonics: Example of extended mnemonics for Subtract From Carrying: Extended: subc Rx,Ry,Rz Equivalent to: subfc Rx,Rz,Ry (if Rc=1) (if OE=1)
64
Power ISA I
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RA
OE
21 22
138
Rc
31
RA
16
RB
OE
21 22
136
Rc
31
RT
(RA) + (RB) + CA
The sum (RA) + (RB) + CA is placed into register RT. Special Registers Altered: CA CR0 SO OV
(RA) + (RB) + CA The sum (RA) + (RB) + CA is placed into register RT.
RT Special Registers Altered: CA CR0 SO OV (if Rc=1) (if OE=1)
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RA
16
///
OE
21 22
234
Rc
31
///
OE
21 22
232
Rc
31
RT
(RA) + CA - 1
The sum (RA) + CA + 641 is placed into register RT. Special Registers Altered: CA CR0 SO OV
65
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RA
16
///
OE
21 22
202
Rc
31
RA
16
///
OE
21 22
200
Rc
31
RT
(RA) + CA
The sum (RA) + CA is placed into register RT. Special Registers Altered: CA CR0 SO OV
Negate
neg neg. nego nego. 31
0 6
XO-form
RT,RA RT,RA RT,RA RT,RA RT
11
///
OE
21 22
104
Rc
31
66
Power ISA I
D-form
XO-form
(Rc=0) (Rc=1) RB
16
RT,RA,SI RT
11
RA
16
SI
31 0
31
RA
/
21 22
75
Rc
31
prod0:127 (RA) EXTS(SI) RT prod64:127 The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the SI field. The low-order 64 bits of the 128-bit product of the operands are placed into register RT. Both operands and the product are interpreted as signed integers. Special Registers Altered: None
prod0:63 (RA)32:63 (RB)32:63 RT32:63 prod0:31 RT0:31 undefined The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT32:63. The contents of RT0:31 are undefined. Both operands and the product are interpreted as signed integers. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
XO-form
(Rc=0) (Rc=1)
RT,RA,RB RT,RA,RB RT
11
RA
OE
21 22
235
Rc
31
RA
16
RB
/
21 22
11
Rc
31
RT
(RA)32:63 (RB)32:63
The 32-bit operands are the low-order 32 bits of RA and of RB. The 64-bit product of the operands is placed into register RT. If OE=1 then OV is set to 1 if the product cannot be represented in 32 bits. Both operands and the product are interpreted as signed integers. Special Registers Altered: CR0 SO OV Programming Note For mulli and mullw, the low-order 32 bits of the product are the correct 32-bit product for 32-bit mode. For mulli and mulld, the low-order 64 bits of the product are independent of whether the operands are regarded as signed or unsigned 64-bit integers. For mulli and mullw, the low-order 32 bits of the product are independent of whether the operands are regarded as signed or unsigned 32-bit integers. (if Rc=1) (if OE=1)
prod0:63 (RA)32:63 (RB)32:63 RT32:63 prod0:31 RT0:31 undefined The 32-bit operands are the low-order 32 bits of RA and of RB. The high-order 32 bits of the 64-bit product of the operands are placed into RT32:63. The contents of RT0:31 are undefined. Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
67
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB OE
21 22
RA
OE
21 22
491
Rc
31
RA
16
459
Rc
31
dividend0:31 (RA)32:63 divisor0:31 (RB)32:63 RT32:63 dividend divisor undefined RT0:31 The 32-bit dividend is (RA)32:63. The 32-bit divisor is (RB)32:63. The 32-bit quotient is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result. Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies dividend = (quotient divisor) + r where 0 r < |divisor| if the dividend is nonnegative, and -|divisor| < r 0 if the dividend is negative. If an attempt is made to perform any of the divisions 0x8000_0000 -1 <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) SO OV (if OE=1) Programming Note The 32-bit signed remainder of dividing (RA)32:63 by (RB)32:63 can be computed as follows, except in the case that (RA)32:63 = -231 and (RB)32:63 = -1. divw RT,RA,RB mullw RT,RT,RB subf RT,RT,RA # RT = quotient # RT = quotientdivisor # RT = remainder
dividend0:31 (RA)32:63 divisor0:31 (RB)32:63 RT32:63 dividend divisor undefined RT0:31 The 32 bit dividend is (RA)32:63. The 32-bit divisor is (RB)32:63. The 32-bit quotient is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result. Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies dividend = (quotient divisor) + r where 0 r < divisor. If an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) SO OV (if OE=1) Programming Note The 32-bit unsigned remainder of dividing (RA)32:63 by (RB)32:63 can be computed as follows. divwu RT,RA,RB mullw RT,RT,RB subf RT,RT,RA # RT = quotient # RT = quotientdivisor # RT = remainder
68
Power ISA I
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RT
11
RA
16
RB
OE
21 22
427
Rc
31
RT
11
RA
16
RB
OE
21 22
395
Rc
31
dividend0:63 (RA)32:63 || 320 divisor0:31 (RB)32:63 RT32:63 dividend divisor undefined RT0:31 The 64-bit dividend is (RA)32:63 || 320. The 32-bit divisor is (RB)32:63. If the quotient can be represented in 32 bits, it is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result. Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies dividend = (quotient divisor) + r where 0 r < |divisor| if the dividend is nonnegative, and -|divisor| < r 0 if the dividend is negative. If the quotient cannot be represented in 32 bits, or if an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) SO OV (if OE=1)
dividend0:63 (RA)32:63 || 320 divisor0:31 (RB)32:63 RT32:63 dividend divisor undefined RT0:31 The 64-bit dividend is (RA)32:63 || 320. The 32-bit divisor is (RB)32:63. If the quotient can be represented in 32 bits, it is placed into RT32:63. The contents of RT0:31 are undefined. The remainder is not supplied as a result. Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies dividend = (quotient divisor) + r where 0 r < divisor. If (RA) (RB), or if an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) SO OV (if OE=1)
69
5. q2 6. r2
70
Power ISA I
Version 2.06 Revision B 3.3.8.1 64-bit Fixed-Point Arithmetic Instructions [Category: 64-Bit]
Multiply Low Doubleword
mulld mulld. mulldo mulldo. 31
0 6
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
XO-form
(Rc=0) (Rc=1)
RT,RA,RB RT,RA,RB RT
11
RA
16
RB
/
21 22
73
Rc
31
RA
16
RB
OE
21 22
233
Rc
31
prod0:127 (RA) (RB) prod64:127 RT The 64-bit operands are (RA) and (RB). The low-order 64 bits of the 128-bit product of the operands are placed into register RT. If OE=1 then OV is set to 1 if the product cannot be represented in 64 bits. Both operands and the product are interpreted as signed integers. Special Registers Altered: CR0 SO OV Programming Note The XO-form Multiply instructions may execute faster on some implementations if RB contains the operand having the smaller absolute value. (if Rc=1) (if OE=1)
prod0:127 (RA) (RB) prod0:63 RT The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT. Both operands and the product are interpreted as signed integers. Special Registers Altered: CR0 (if Rc=1)
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16
RA
/
21 22
Rc
31
prod0:127 (RA) (RB) prod0:63 RT The 64-bit operands are (RA) and (RB). The high-order 64 bits of the 128-bit product of the operands are placed into register RT. Both operands and the product are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. Special Registers Altered: CR0 (if Rc=1)
71
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1) RB
16
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RA
OE
21 22
489
Rc
31
RA
16
RB
OE
21 22
457
Rc
31
dividend0:63 (RA) divisor0:63 (RB) dividend divisor RT The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient is placed into register RT. The remainder is not supplied as a result. Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies dividend = (quotient divisor) + r where 0 r < |divisor| if the dividend is nonnegative, and -|divisor| < r 0 if the dividend is negative. If an attempt is made to perform any of the divisions 0x8000_0000_0000_0000 -1 <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 SO OV Programming Note The 64-bit signed remainder of dividing (RA) by (RB) can be computed as follows, except in the case that (RA) = -263 and (RB) = -1. divd RT,RA,RB mulld RT,RT,RB subf RT,RT,RA # RT = quotient # RT = quotientdivisor # RT = remainder (if Rc=1) (if OE=1)
dividend0:63 (RA) divisor0:63 (RB) dividend divisor RT The 64-bit dividend is (RA). The 64-bit divisor is (RB). The 64-bit quotient is placed into register RT. The remainder is not supplied as a result. Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies dividend = (quotient divisor) + r where 0 r < divisor. If an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In this case, if OE=1 then OV is set to 1. Special Registers Altered: CR0 SO OV Programming Note The 64-bit unsigned remainder of dividing (RA) by (RB) can be computed as follows. divdu RT,RA,RB mulld RT,RT,RB subf RT,RT,RA # RT = quotient # RT = quotientdivisor # RT = remainder (if Rc=1) (if OE=1)
72
Power ISA I
XO-form
(OE=0 Rc=0) (OE=0 Rc=1) (OE=1 Rc=0) (OE=1 Rc=1)
RT
11
RA
16
RB
OE
21 22
425
Rc
31
RT
11
RA
16
RB
OE
21 22
393
Rc
31
dividend0:127 (RA) || 640 divisor0:63 (RB) dividend divisor RT The 128-bit dividend is (RA) || 640. The 64-bit divisor is (RB). If the quotient can be represented in 64 bits, it is placed into register RT. The remainder is not supplied as a result. Both operands and the quotient are interpreted as signed integers. The quotient is the unique signed integer that satisfies dividend = (quotient divisor) + r where 0 r < |divisor| if the dividend is nonnegative, and -|divisor| < r 0 if the dividend is negative. If the quotient cannot be represented in 64 bits, or if an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 SO OV (if Rc=1) (if OE=1)
dividend0:127 (RA) || 640 (RB) divisor0:63 dividend divisor RT The 128-bit dividend is (RA) || 640. The 64-bit divisor is (RB). If the quotient can be represented in 64 bits, it is placed into register RT. The remainder is not supplied as a result. Both operands and the quotient are interpreted as unsigned integers, except that if Rc=1 the first three bits of CR Field 0 are set by signed comparison of the result to zero. The quotient is the unique unsigned integer that satisfies dividend = (quotient divisor) + r where 0 r < divisor. If (RA) (RB), or if an attempt is made to perform the division <anything> 0 then the contents of register RT are undefined as are (if Rc=1) the contents of the LT, GT, and EQ bits of CR Field 0. In these cases, if OE=1 then OV is set to 1. Special Registers Altered: CR0 SO OV Programming Note Unsigned long division of a 128-bit dividend contained in two 64-bit registers by a 64-bit divisor can be accomplished using the technique described in the Programming Note with the divweu instruction description: divd[e]u would be used instead of divw[e]u (and cmpld instead of cmplw, etc.). (if Rc=1) (if OE=1)
73
L=1 is part of Category: 64-Bit. When the operands are treated as 32-bit signed quantities, bit 32 of the register (RA or RB) is the sign bit. The Compare instructions set one bit in the leftmost three bits of the designated CR field to 1, and the other
Compare Immediate
cmpi 11
0 6
D-form
Compare
cmp BF,L,RA,RB BF
6
X-form
BF,L,RA,SI BF / L
9 10 11
RA
16
SI
31 0
31
/ L
9 10 11
RA
16
RB
21
/
31
if L = 0 then a EXTS((RA)32:63) (RA) else a if a < EXTS(SI) then c 0b100 0b010 else if a > EXTS(SI) then c 0b001 else c CR4BF+32:4BF+35 c || XERSO The contents of register RA ((RA)32:63 sign-extended to 64 bits if L=0) are compared with the sign-extended value of the SI field, treating the operands as signed integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF Extended Mnemonics: Examples of extended mnemonics for Compare Immediate: Extended: cmpdi Rx,value cmpwi cr3,Rx,value Equivalent to: cmpi 0,1,Rx,value cmpi 3,0,Rx,value
EXTS((RA)32:63) EXTS((RB)32:63) b else a (RA) (RB) b 0b100 if a < b then c else if a > b then c 0b010 0b001 else c c || XERSO CR4BF+32:4BF+35 The contents of register RA ((RA)32:63 if L=0) are compared with the contents of register RB ((RB)32:63 if L=0), treating the operands as signed integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF Extended Mnemonics: Examples of extended mnemonics for Compare: Extended: cmpd Rx,Ry cmpw cr3,Rx,Ry Equivalent to: cmp 0,1,Rx,Ry cmp 3,0,Rx,Ry
if L = 0 then a
74
Power ISA I
D-form
Compare Logical
cmpl BF,L,RA,RB BF
6
X-form
BF,L,RA,UI BF / L
9 10 11
RA
16
UI
31 0
31
/ L
9 10 11
RA
16
RB
21
32
/
31
32 if L = 0 then a 0 || (RA)32:63 else a (RA) 0b100 if a <u (480 || UI) then c 0b010 else if a >u (480 || UI) then c else c 0b001 c || XERSO CR4BF+32:4BF+35
The contents of register RA ((RA)32:63 zero-extended to 64 bits if L=0) are compared with 480 || UI, treating the operands as unsigned integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF Extended Mnemonics: Examples of extended mnemonics for Compare Logical Immediate: Extended: cmpldi Rx,value cmplwi cr3,Rx,value Equivalent to: cmpli 0,1,Rx,value cmpli 3,0,Rx,value
32 if L = 0 then a 0 || (RA)32:63 320 || (RB) b 32:63 else a (RA) (RB) b if a <u b then c 0b100 0b010 else if a >u b then c 0b001 else c CR4BF+32:4BF+35 c || XERSO
The contents of register RA ((RA)32:63 if L=0) are compared with the contents of register RB ((RB)32:63 if L=0), treating the operands as unsigned integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF Extended Mnemonics: Examples of extended mnemonics for Compare Logical: Extended: cmpld Rx,Ry cmplw cr3,Rx,Ry Equivalent to: cmpl 0,1,Rx,Ry cmpl 3,0,Rx,Ry
75
D-form
Trap Word
tw TO,RA,RB 31 TO
6 11
X-form
TO,RA,SI TO
11
RA
16
SI
31 0
RA
16
RB
21
/
31
a if if if if if
EXTS((RA)32:63) (a < EXTS(SI)) & TO0 (a > EXTS(SI)) & TO1 (a = EXTS(SI)) & TO2 (a <u EXTS(SI)) & TO3 (a >u EXTS(SI)) & TO4
The contents of RA32:63 are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context synchronizing (see Book III). Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Trap Word Immediate: Extended: twgti Rx,value twllei Rx,value Equivalent to: twi 8,Rx,value twi 6,Rx,value
a b if if if if if
EXTS((RA)32:63) EXTS((RB)32:63) (a < b) & TO0 then TRAP (a > b) & TO1 then TRAP (a = b) & TO2 then TRAP (a <u b) & TO3 then TRAP (a >u b) & TO4 then TRAP
The contents of RA32:63 are compared with the contents of RB32:63. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context synchronizing (see Book III). Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Trap Word: Extended: tweq Rx,Ry twlge Rx,Ry trap Equivalent to: tw 4,Rx,Ry tw 5,Rx,Ry tw 31,0,0
76
Power ISA I
Version 2.06 Revision B 3.3.10.1 64-bit Fixed-Point Trap Instructions [Category: 64-Bit]
Trap Doubleword Immediate
tdi 2
0 6
TO,RA,SI TO
11
X-form
RA
16
TO,RA,RB 31 TO
6 11
RA
16
RB
21
68
/
31
a b if if if if if
(RA) EXTS(SI) (a < b) & TO0 then TRAP (a > b) & TO1 then TRAP (a = b) & TO2 then TRAP (a <u b) & TO3 then TRAP (a >u b) & TO4 then TRAP
The contents of register RA are compared with the sign-extended value of the SI field. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context synchronizing (see Book III). Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Trap Doubleword Immediate: Extended: tdlti Rx,value tdnei Rx,value Equivalent to: tdi 16,Rx,value tdi 24,Rx,value
a b if if if if if
(RA) (RB) (a < b) & TO0 then TRAP (a > b) & TO1 then TRAP (a = b) & TO2 then TRAP (a <u b) & TO3 then TRAP (a >u b) & TO4 then TRAP
The contents of register RA are compared with the contents of register RB. If any bit in the TO field is set to 1 and its corresponding condition is met by the result of the comparison, the system trap handler is invoked. If the trap conditions are met, this instruction is context synchronizing (see Book III). Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Trap Doubleword: Extended: tdge Rx,Ry tdlnl Rx,Ry Equivalent to: td 12,Rx,Ry td 5,Rx,Ry
A-form
RT,RA,RB,BC RT
11
RA
16
RB
21
BC
26
15
/
31
(RA)
If the contents of bit BC+32 of the Condition Register are equal to 1, then the contents of register RA (or 0) are placed into register RT. Otherwise, the contents of register RB are placed into register RT. Special Registers Altered: None
77
AND Immediate
andi. 28
0 6
D-form
OR Immediate
ori RA,RS,UI 24 RS
6 11
D-form
RA,RS,UI RS
11
RA
16
UI
31 0
RA
16
UI
31
RA
RA
The contents of register RS are ANDed with 480 || UI and the result is placed into register RA. Special Registers Altered: CR0
The contents of register RS are ORed with 480 || UI and the result is placed into register RA. The preferred no-op (an instruction that does nothing) is:
D-form
ori
0,0,0
RA,RS,UI RS
11
RA
Extended Mnemonics: Example of extended mnemonics for OR Immediate: Extended: no-op Equivalent to: ori 0,0,0
RA
The contents of register RS are ANDed with 32 0 || UI || 160 and the result is placed into register RA. Special Registers Altered: CR0
78
Power ISA I
D-form
RA,RS,UI RS
11
RA
16
UI
31
RA
The contents of register RS are ORed with 32 0 || UI || 160 and the result is placed into register RA. Special Registers Altered: None
XOR Immediate
xori 26
0 6
D-form
D-form
RA,RS,UI RS
11
RA
16
UI
31 0
27
RA
16
UI
31
RA
RA
The contents of register RS are XORed with 480 || UI and the result is placed into register RA. The executed form of a no-op (an instruction that does nothing, but consumes execution resources nevertheless) is: xori 0,0,0
The contents of register RS are XORed with 320 || UI || 160 and the result is placed into register RA. Special Registers Altered: None
Special Registers Altered: None Extended Mnemonics: Example of extended mnemonics for XOR Immediate: Extended: xnop Equivalent to: xori 0,0,0
Programming Note The executed form of no-op should be used only when the intent is to alter the timing of a program.
79
X-form
RA,RS,RB RA,RS,RB RS
11
OR
or or. 31
0 6
X-form
RA,RS,RB RA,RS,RB RS
11
(Rc=0) (Rc=1) RB
16 21
(Rc=0) (Rc=1) RB
16 21
RA
28
Rc
31
RA
444
Rc
31
RA
RA
(RS) | (RB)
The contents of register RS are ANDed with the contents of register RB and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
The contents of register RS are ORed with the contents of register RB and the result is placed into register RA. Some forms of or Rx,Rx,Rx provide special functions. See Section 3.2, or Instruction, in Book II. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for OR: (if Rc=1)
XOR
xor xor. 31
0 6
X-form
RA,RS,RB RA,RS,RB RS
11
(Rc=0) (Rc=1) RB
16 21
RA
316
Rc
31
RA
(RS) (RB)
The contents of register RS are XORed with the contents of register RB and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
Warning: Some forms of or Rx,Rx,Rx have undesired side effects. Software should not use undocumented forms of or Rx,Rx,Rx as such values may be assigned meaning in the future. See Section 3.2, or Instruction, in Book II for details. If a no-op is needed, the preferred no-op (ori 0,0,0) should be used.
NAND
nand nand. 31
0 6
X-form
RA,RS,RB RA,RS,RB RS
11
(Rc=0) (Rc=1) RB
16 21
RA
476
Rc
31
RA
((RS)
& (RB))
The contents of register RS are ANDed with the contents of register RB and the complemented result is placed into register RA. Special Registers Altered: CR0 Programming Note nand or nor with RS=RB can be used to obtain the ones complement. (if Rc=1)
80
Power ISA I
X-form
RA,RS,RB RA,RS,RB RS
11
Equivalent
eqv eqv. 31
0 6
X-form
(Rc=0) (Rc=1) RB
16 21
(Rc=0) (Rc=1) RB
16 21
RA,RS,RB RA,RS,RB RS
11
RA
124
Rc
31
RA
284
Rc
31
RA
((RS)
| (RB))
RA
(RS) (RB)
The contents of register RS are ORed with the contents of register RB and the complemented result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for NOR: Extended: not Rx,Ry Equivalent to: nor Rx,Ry,Ry (if Rc=1)
The contents of register RS are XORed with the contents of register RB and the complemented result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
X-form
(Rc=0) (Rc=1) RB 60
21
OR with Complement
orc orc. 31
0 6
X-form
(Rc=0) (Rc=1) RB 412
21
RA,RS,RB RA,RS,RB RS
11
RA,RS,RB RA,RS,RB RS
11
RA
16
Rc
31
RA
16
Rc
31
RA
(RS) &
(RB)
RA
(RS) |
(RB)
The contents of register RS are ANDed with the complement of the contents of register RB and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
The contents of register RS are ORed with the complement of the contents of register RB and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
81
X-form
(Rc=0) (Rc=1)
X-form
(Rc=0) (Rc=1)
RA,RS RA,RS RS
11
RA,RS RA,RS RS
11
RA
16
///
21
954
Rc
31
RA
16
///
21
922
Rc
31
s (RS)56 RA56:63 (RS)56:63 56s RA0:55 (RS)56:63 are placed into RA56:63. RA0:55 are filled with a copy of (RS)56. Special Registers Altered: CR0 (if Rc=1)
s (RS)48 RA48:63 (RS)48:63 48s RA0:47 (RS)48:63 are placed into RA48:63. RA0:47 are filled with a copy of (RS)48. Special Registers Altered: CR0 (if Rc=1)
X-form
(Rc=0) (Rc=1)
Compare Bytes
cmpb 31 RA,RS,RB RS
6 11
X-form
RA,RS RA,RS RS
11
RA
16
RB
21
508
/
31
RA
16
///
21
26
Rc
31
n 32 do while n < 64 if (RS)n = 1 then leave n + 1 n n - 32 RA A count of the number of consecutive zero bits starting at bit 32 of register RS is placed into register RA. This number ranges from 0 to 32, inclusive. If Rc=1, CR Field 0 is set to reflect the result. Special Registers Altered: CR0 Programming Note For both Count Leading Zeros instructions, if Rc=1 then LT is set to 0 in CR Field 0. (if Rc=1)
do n = 0 to 7 if RS8n:8n+7 = (RB)8n:8n+7 then 81 RA8n:8n+7 else 80 RA8n:8n+7 Each byte of the contents of register RS is compared to each corresponding byte of the contents in register RB. If they are equal, the corresponding byte in RA is set to 0xFF. Otherwise the corresponding byte in RA is set to 0x00. Special Registers Altered: None
82
Power ISA I
X-form
X-form
RA, RS RS
11
RA
16
///
21
122
/
31
RA
16
///
21
378
/
31
do i = 0 to 7 n 0 do j = 0 to 7 if (RS)(i8)+j = 1 then n n+1 n RA(i8):(i8)+7 A count of the number of one bits in each byte of register RS is placed into the corresponding byte of register RA. This number ranges from 0 to 8, inclusive. Special Registers Altered: None
do i = 0 to 1 n 0 do j = 0 to 31 if (RS)(i32)+j = 1 then n n+1 n RA(i32):(i32)+31 A count of the number of one bits in each word of register RS is placed into the corresponding word of register RA. This number ranges from 0 to 32, inclusive. Special Registers Altered: None
83
X-form
Parity Word
prtyw RA,RS 31 RS
6 11
X-form
RA
16
///
21
154
/
31
RS
11
RA
16
///
21
186
/
31
s 0 do i = 0 to 7 s / (RS)i%8+7 s 63 RA 0 || s The least significant bit in each byte of the contents of register RS is examined. If there is an odd number of one bits the value 1 is placed into register RA; otherwise the value 0 is placed into register RA. Special Registers Altered: None
s 0 t 0 do i = s do i = t RA0:31 RA32:63
0 to 3 s / (RS)i%8+7 4 to 7 t / (RS)i%8+7 31 0 || s 31 0 || t
The least significant bit in each byte of (RS)0:31 is examined. If there is an odd number of one bits the value 1 is placed into RA0:31; otherwise the value 0 is placed into RA0:31. The least significant bit in each byte of (RS)32:63 is examined. If there is an odd number of one bits the value 1 is placed into RA32:63; otherwise the value 0 is placed into RA32:63. Special Registers Altered: None Programming Note The Parity instructions are designed to be used in conjunction with the Population Count instruction to compute the parity of words or a doubleword. The parity of the upper and lower words in (RS) can be computed as follows. popcntb RA, RS prtyw RA, RA The parity of (RS) can be computed as follows. popcntb RA, RS prtyd RA, RA
84
Power ISA I
Version 2.06 Revision B 3.3.12.1 64-bit Fixed-Point Logical Instructions [Category: 64-Bit]
Extend Sign Word
extsw extsw. 31
0 6
X-form
X-form
(Rc=0) (Rc=1)
RA,RS RA,RS RS
11
RS
11
RA
16
///
21
506
/
31
RA
16
///
21
986
Rc
31
s (RS)32 RA32:63 (RS)32:63 32s RA0:31 (RS)32:63 are placed into RA32:63. RA0:31 are filled with a copy of (RS)32. Special Registers Altered: CR0 (if Rc=1)
n 0 do i = 0 to 63 if (RS)i = 1 then n+1 n RA n A count of the number of one bits in register RS is placed into register RA. This number ranges from 0 to 64, inclusive. Special Registers Altered: None
RA,RS RA,RS RS
11
(Rc=0) (Rc=1) RA
16
///
21
58
Rc
31
n 0 do while n < 64 if (RS)n = 1 then leave n n + 1 n RA A count of the number of consecutive zero bits starting at bit 0 of register RS is placed into register RA. This number ranges from 0 to 64, inclusive. If Rc=1, CR Field 0 is set to reflect the result. Special Registers Altered: CR0 (if Rc=1)
85
X-form
RS
11
RA
16
RB
21
252
/
31
For i = 0 to 7 index (RS)8*i:8*i+7 If index < 64 (RB)index then permi else permi 0 56 0 || perm0:7 RA Eight permuted bits are produced. For each permuted bit i where i ranges from 0 to 7 and for each byte i of RS, do the following. If byte i of RS is less than 64, permuted bit i is set to the bit of RB specified by byte i of RS; otherwise permuted bit i is set to 0. The permuted bits are placed in the least-significant byte of RA, and the remaining bits are filled with 0s. Special Registers Altered: None Programming Note The fact that the permuted bit is 0 if the corresponding index value exceeds 63 permits the permuted bits to be selected from a 128-bit quantity, using a single index register. For example, assume that the 128-bit quantity Q, from which the permuted bits are to be selected, is in registers r2 (high-order 64 bits of Q) and r3 (low-order 64 bits of Q), that the index values are in register r1, with each byte of r1 containing a value in the range 0:127, and that each byte of register r4 contains the value 64. The following code sequence selects eight permuted bits from Q and places them into the low-order byte of r6. bpermd r6,r1,r2 # select from highorder half of Q xor r0,r1,r4 # adjust index values bpermd r5,r0,r3 # select from loworder half of Q or r6,r6,r5 # merge the two selections
86
Power ISA I
RA,RS,SH,MB,ME RA,RS,SH,MB,ME RS
11
(Rc=0) (Rc=1) MB
21 26
RA
16
SH
ME
Rc
31
n r m RA
The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
87
RA,RS,RB,MB,ME RA,RS,RB,MB,ME RS
11
(Rc=0) (Rc=1) MB
21 26
RA
16
RB
ME
Rc
31
n r m RA
The contents of register RS are rotated32 left the number of bits specified by (RB)59:63. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for Rotate Left Word then AND with Mask: Extended: rotlw Rx,Ry,Rz Programming Note Let RSL represent the low-order 32 bits of register RS, with the bits numbered from 0 through 31. rlwnm can be used to extract an n-bit field that starts at variable bit position b in RSL, right-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b+n, MB=32-n, and ME=31. It can be used to extract an n-bit field that starts at variable bit position b in RSL, left-justified into the low-order 32 bits of register RA (clearing the remaining 32-n bits of the low-order 32 bits of RA), by setting RB59:63=b, MB = 0, and ME=n-1. It can be used to rotate the contents of the low-order 32 bits of a register left (right) by variable n bits, by setting RB59:63=n (32-n), MB=0, and ME=31. For all the uses given above, the high-order 32 bits of register RA are cleared. Extended mnemonics are provided for some of these uses; see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rlwnm Rx,Ry,Rz,0,31 (if Rc=1)
88
Power ISA I
RA,RS,SH,MB,ME RA,RS,SH,MB,ME RS
11
(Rc=0) (Rc=1) MB
21 26
RA
16
SH
ME
Rc
31
n r m RA
The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for Rotate Left Word Immediate then Mask Insert: Extended: inslwi Rx,Ry,n,b Equivalent to: rlwimi Rx,Ry,32-b,b,b+n-1 (if Rc=1)
Programming Note Let RAL represent the low-order 32 bits of register RA, with the bits numbered from 0 through 31. rlwimi can be used to insert an n-bit field that is left-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-b, MB=b, and ME=(b+n)-1. It can be used to insert an n-bit field that is right-justified in the low-order 32 bits of register RS, into RAL starting at bit position b, by setting SH=32-(b+n), MB=b, and ME=(b+n)-1. Extended mnemonics are provided for both of these uses; see Appendix E, Assembler Extended Mnemonics on page 625.
89
Version 2.06 Revision B 3.3.13.1.1 64-bit Fixed-Point Rotate Instructions [Category: 64-Bit]
Rotate Left Doubleword Immediate then Clear Left MD-form
rldicl rldicl. 30
0 6
RA,RS,SH,MB RA,RS,SH,MB RS
11
(Rc=0) (Rc=1) sh mb
21
RA,RS,SH,ME RA,RS,SH,ME RS
11
(Rc=0) (Rc=1) sh me
21
RA
16
0 sh Rc
27 30 31
RA
16
1 sh Rc
27 30 31
n r b m RA
n r e m RA
The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Left: Extended: extrdi Rx,Ry,n,b srdi Rx,Ry,n clrldi Rx,Ry,n Programming Note rldicl can be used to extract an n-bit field that starts at bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and MB=0. It can be used to shift the contents of a register right by n bits, by setting SH=64-n and MB=n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n. Extended mnemonics are provided for all of these uses; see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rldicl Rx,Ry,b+n,64-n rldicl Rx,Ry,64-n,n rldicl Rx,Ry,0,n (if Rc=1)
The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Examples of extended mnemonics for Rotate Left Doubleword Immediate then Clear Right: Extended: extldi Rx,Ry,n,b sldi Rx,Ry,n clrrdi Rx,Ry,n Programming Note rldicr can be used to extract an n-bit field that starts at bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting SH=b and ME=n-1. It can be used to rotate the contents of a register left (right) by n bits, by setting SH=n (64-n) and ME=63. It can be used to shift the contents of a register left by n bits, by setting SH=n and ME=63-n. It can be used to clear the low-order n bits of a register, by setting SH=0 and ME=63-n. Extended mnemonics are provided for all of these uses (some devolve to rldicl); see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rldicr Rx,Ry,b,n-1 rldicr Rx,Ry,n,63-n rldicr Rx,Ry,0,63-n (if Rc=1)
90
Power ISA I
RA,RS,SH,MB RA,RS,SH,MB RS
11
(Rc=0) (Rc=1) sh mb
21
RA,RS,RB,MB RA,RS,RB,MB RS
11
(Rc=0) (Rc=1) RB mb
21 27
RA
16
2 sh Rc
27 30 31
RA
16
Rc
31
n r b m RA
n r b m RA
The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for Rotate Left Doubleword Immediate then Clear: Extended: clrlsldi Rx,Ry,b,n Programming Note rldic can be used to clear the high-order b bits of the contents of a register and then shift the result left by n bits, by setting SH=n and MB=b-n. It can be used to clear the high-order n bits of a register, by setting SH=0 and MB=n. Extended mnemonics are provided for both of these uses (the second devolves to rldicl); see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rldic Rx,Ry,n,b-n (if Rc=1)
The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit MB through bit 63 and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for Rotate Left Doubleword then Clear Left: Extended: rotld Rx,Ry,Rz Programming Note rldcl can be used to extract an n-bit field that starts at variable bit position b in register RS, right-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b+n and MB=64-n. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and MB=0. Extended mnemonics are provided for some of these uses; see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rldcl Rx,Ry,Rz,0 (if Rc=1)
91
RA,RS,RB,ME RA,RS,RB,ME RS
11
(Rc=0) (Rc=1) RB me
21 27
RA,RS,SH,MB RA,RS,SH,MB RS
11
(Rc=0) (Rc=1) sh mb
21
RA
16
Rc
31
RA
16
3 sh Rc
27 30 31
n r e m RA
n r b m RA
The contents of register RS are rotated64 left the number of bits specified by (RB)58:63. A mask is generated having 1-bits from bit 0 through bit ME and 0-bits elsewhere. The rotated data are ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: CR0 Programming Note rldcr can be used to extract an n-bit field that starts at variable bit position b in register RS, left-justified into register RA (clearing the remaining 64-n bits of RA), by setting RB58:63=b and ME=n-1. It can be used to rotate the contents of a register left (right) by variable n bits, by setting RB58:63=n (64-n) and ME=63. Extended mnemonics are provided for some of these uses (some devolve to rldcl); see Appendix E, Assembler Extended Mnemonics on page 625. (if Rc=1)
The contents of register RS are rotated64 left SH bits. A mask is generated having 1-bits from bit MB through bit 63-SH and 0-bits elsewhere. The rotated data are inserted into register RA under control of the generated mask. Special Registers Altered: CR0 Extended Mnemonics: Example of extended mnemonics for Rotate Left Doubleword Immediate then Mask Insert: Extended: insrdi Rx,Ry,n,b Programming Note rldimi can be used to insert an n-bit field that is right-justified in register RS, into register RA starting at bit position b, by setting SH=64-(b+n) and MB=b. An extended mnemonic is provided for this use; see Appendix E, Assembler Extended Mnemonics on page 625. Equivalent to: rldimi Rx,Ry,64-(b+n),b (if Rc=1)
92
Power ISA I
X-form
(Rc=0) (Rc=1) RB
16 21
X-form
(Rc=0) (Rc=1) RB
16 21
RA,RS,RB RA,RS,RB RS
11
RA,RS,RB RA,RS,RB RS
11
RA
24
Rc
31
RA
536
Rc
31
n (RB)59:63 r ROTL32((RS)32:63, n) if (RB)58 = 0 then MASK(32, 63-n) m 640 else m RA r & m The contents of the low-order 32 bits of register RS are shifted left the number of bits specified by (RB)58:63. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: CR0 (if Rc=1)
n (RB)59:63 r ROTL32((RS)32:63, 64-n) if (RB)58 = 0 then MASK(n+32, 63) m 640 else m RA r & m The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA32:63. RA0:31 are set to zero. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: CR0 (if Rc=1)
93
X-form
(Rc=0) (Rc=1)
RA,RS,SH RA,RS,SH RS
11
(Rc=0) (Rc=1) SH
16 21
RA
16
RB
21
792
Rc
31
RA
824
Rc
31
n r m s RA CA
The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0.
n (RB)59:63 r ROTL32((RS)32:63, 64-n) if (RB)58 = 0 then MASK(n+32, 63) m 64 else m 0 (RS)32 s RA r&m | (64s)&m CA s & ((r&m)32:630) The contents of the low-order 32 bits of register RS are shifted right the number of bits specified by (RB)58:63. Bits shifted out of position 63 are lost. Bit 32 of RS is replicated to fill the vacated positions on the left. The 32-bit result is placed into RA32:63. Bit 32 of RS is replicated to fill RA0:31. CA is set to 1 if the low-order 32 bits of (RS) contain a negative number and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to receive EXTS((RS)32:63), and CA to be set to 0. Shift amounts from 32 to 63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RS)32:63.
(if Rc=1)
94
Power ISA I
Version 2.06 Revision B 3.3.13.2.1 64-bit Fixed-Point Shift Instructions [Category: 64-Bit]
Shift Left Doubleword
sld sld. 31
0 6
X-form
(Rc=0) (Rc=1) RB 27
21
X-form
(Rc=0) (Rc=1)
RA,RS,RB RA,RS,RB RS
11
RA,RS,RB RA,RS,RB RS
11
RA
16
Rc
31
RA
16
RB
21
539
Rc
31
n (RB)58:63 r ROTL64((RS), n) if (RB)57 = 0 then MASK(0, 63-n) m 640 else m r & m RA The contents of register RS are shifted left the number of bits specified by (RB)57:63. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result. Special Registers Altered: CR0 (if Rc=1)
n (RB)58:63 r ROTL64((RS), 64-n) if (RB)57 = 0 then MASK(n, 63) m 640 else m r & m RA The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The result is placed into register RA. Shift amounts from 64 to 127 give a zero result. Special Registers Altered: CR0 (if Rc=1)
95
RA,RS,SH RA,RS,SH RS
11
(Rc=0) (Rc=1) sh
16 21
(Rc=0) (Rc=1) RB
16 21
RA
794
Rc
31
RA
413
sh Rc
30 31
n r m s RA CA
sh5 || sh0:4 ROTL64((RS), 64-n) MASK(n, 63) (RS)0 r&m | (64s)&m s & ((r&m)0)
The contents of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0. Special Registers Altered: CA CR0
n (RB)58:63 r ROTL64((RS), 64-n) if (RB)57 = 0 then MASK(n, 63) m 64 else m 0 (RS)0 s RA r&m | (64s)&m CA s & ((r&m)0) The contents of register RS are shifted right the number of bits specified by (RB)57:63. Bits shifted out of position 63 are lost. Bit 0 of RS is replicated to fill the vacated positions on the left. The result is placed into register RA. CA is set to 1 if (RS) is negative and any 1-bits are shifted out of position 63; otherwise CA is set to 0. A shift amount of zero causes RA to be set equal to (RS), and CA to be set to 0. Shift amounts from 64 to 127 give a result of 64 sign bits in RA, and cause CA to receive the sign bit of (RS). Special Registers Altered: CA CR0
(if Rc=1)
(if Rc=1)
96
Power ISA I
3.3.14 Binary Coded Decimal (BCD) Assist Instructions [Category: Embedded.Phased-in, Server]
The Binary Coded Decimal Assist instructions operate on Binary Coded Decimal operands (cbcdtd and addg6s) and Decimal Floating-Point operands (cdtbcd) See Chapter 5. for additional information.
XO-form
RA, RS 31 RS
11
RA
16
RB
/
21 22
74
/
31
RA
16
///
21
282
/
31
do i = 0 to 1 i x 32 n 0 RAn+0:n+7 RAn+8:n+19 DEC_TO_BCD( (RS)n+12:n+21 ) DEC_TO_BCD( (RS)n+22:n+31 ) RAn+20:n+31 The low-order 20 bits of each word of register RS contain two declets which are converted to six, 4-bit BCD fields; each set of six, 4-bit BCD fields is placed into the low-order 24 bits of the corresponding word in RA. The high-order 8 bits in each word of RA are set to 0. Special Registers Altered: None
do i = 0 to 15 dci carry_out(RA4xi:63 + RB4xi:63) 4 (dc0) || 4(dc1) || ... || 4(dc15) c RT (c) & 0x6666_6666_6666_6666 The contents of register RA are added to the contents of register RB. Sixteen carry bits are produced, one
RA, RS RS
11
RA
16
///
21
314
/
31
do i = 0 to 1 n i x 32 0 RAn+0:n+11 RAn+12:n+21 BCD_TO_DEC( (RS)n+8:n+19 ) RAn+22:n+31 BCD_TO_DEC( (RS)n+20:n+31 ) The low-order 24 bits of each word of register RS contain six, 4-bit BCD fields which are converted to two declets; each set of two declets is placed into the low-order 20 bits of the corresponding word in RA. The high-order 12 bits in each word of RA are set to 0. If a 4-bit BCD field has a value greater than 9 the results are undefined. Special Registers Altered: None
97
Programming Note addg6s can be used to add or subtract two BCD operands. In these examples it is assumed that r0 contains 0x666...666. (BCD data formats are described in Section 5.3.) Addition of the unsigned BCD operand in register RA to the unsigned BCD operand in register RB can be accomplished as follows. add add addg6s subf r1,RA,r0 r2,r1,RB RT,r1,RB RT,RT,r2# RT = RA +BCD RB
Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.) addi nor add addg6s subf r1,RB,1 r2,RA,RA# one's complement of RA r3,r1,r2 RT,r1,r2 RT,RT,r3# RT = RB -BCD RA
Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD operands that have more than 16 decimal digits).
98
Power ISA I
Extended mnemonics
Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the
99
Extended Mnemonics: Examples of extended mnemonics for Move To Special Purpose Register: Extended: mtxer Rx mtlr Rx mtctr Rx mtppr Rx mtppr32 Rx Programming Note The AMR is part of the context of the program (see Book III-S). Therefore modification of the AMR requires synchronization by software. For this reason, most operating systems provide a system library program that application programs can use to modify the AMR. Compiler and Assembler Note For the mtspr and mfspr instructions, the SPR number coded in Assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15. Equivalent to: mtspr 1,Rx mtspr 8,Rx mtspr 9,Rx mtspr 896,Rx mtspr 898,Rx
SPR,RS RS
11
spr
21
467
/
31
n spr5:9 || spr0:4 if n = 13 then see Book III-S else if length(SPR(n)) = 64 then (RS) SPR(n) else (RS)32:63 SPR(n) The SPR field denotes a Special Purpose Register, encoded as shown in the table below. Unless the SPR field contains 13 (denoting the AMR<S>), the contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR. The AMR (Authority Mask Register) is used for storage protection in the Server environment. This use, and operation of mtspr for the AMR, are described in Book III-S. decimal SPR1 Register Name spr5:9 spr0:4 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 13 00000 01101 AMR5 256 01000 00000 VRSAVE 512 10000 00000 SPEFSCR2 896 11100 00000 PPR3 898 11100 00010 PPR324 Note that the order of the two 5-bit halves of the SPR number is reversed. Category: SPE. Category: Server; see Book III-S. Category: Phased-In. See Section 3.1 of Book II. Category: Server; see Book III-S.
1 2 3 4 5
If execution of this instruction is attempted specifying an SPR number that is not shown above, or an SPR number that is shown above but is in a category that is not supported by the implementation, one of the following occurs. If spr0 = 0, the illegal instruction error handler is invoked. If spr0 = 1, the system privileged instruction error handler is invoked. A complete description of this instruction can be found in Book III. Special Registers Altered: See above
100
Power ISA I
If spr0 = 1, the system privileged instruction error handler is invoked. A complete description of this instruction can be found in Book III.
RT,SPR RT
11
spr
21
339
/
31
n spr5:9 || spr0:4 if length(SPR(n)) = 64 then SPR(n) RT else 32 RT 0 || SPR(n) The SPR field denotes a Special Purpose Register, encoded as shown in the table below. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the high-order 32 bits of RT are set to zero. decimal Register SPR1 spr5:9 spr0:4 Name 1 00000 00001 XER 8 00000 01000 LR 9 00000 01001 CTR 13 00000 01101 AMR8 136 00100 01000 CTRL 256 01000 00000 VRSAVE 259 01000 00011 SPRG3 260 01000 00100 SPRG42 261 01000 00101 SPRG52 262 01000 00110 SPRG62 263 01000 00111 SPRG72 268 01000 01100 TB3 269 01000 01101 TBU3 512 10000 00000 SPEFSCR4 526 10000 01110 ATB3,5 527 10000 01111 ATBU3,5 896 11100 00000 PPR6 898 11100 00010 PPR327 Note that the order of the two 5-bit halves of the SPR number is reversed. Category: Embedded. See Chapter 5 of Book II. Category: SPE. Category: Alternate Time Base. Category: Server; see Book III-S. Category: Phased-In. See Section 3.1 of Book II. Category: Server; see Book III-S.
Examples of extended mnemonics for Move From Special Purpose Register: Extended: mfxer Rx mflr Rx mfctr Rx Note See the Notes that appear with mtspr. Equivalent to: mfspr Rx,1 mfspr Rx,8 mfspr Rx,9
1 2 3 4 5 6 7 8
If execution of this instruction is attempted specifying an SPR number that is not shown above, or an SPR number that is shown above but is in a category that is not supported by the implementation, one of the following occurs. If spr0 = 0, the illegal instruction error handler is invoked.
101
FXM,RS RS 0
11 12
31 FXM /
20 21
0
11 12
///
21
19
/
31
144
/
31
RT
32
0 || CR
4 mask (FXM0) || 4(FXM1) || ... 4(FXM7) ((RS)32:63 & mask) | (CR & mask) CR
The contents of the Condition Register are placed into RT32:63. RT0:31 are set to 0. Special Registers Altered: None
The contents of bits 32:63 of register RS are placed into the Condition Register under control of the field mask specified by FXM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FXMi=1 then CR field i (CR bits 4i+32:4i+35) is set to the contents of the corresponding field of the low-order 32 bits of RS. Special Registers Altered: CR fields selected by mask Extended Mnemonics: Example of extended mnemonics for Move To Condition Register Fields: Extended: mtcr Rx Programming Note In the preferred form of this instruction (mtocrf), only one Condition Register field is updated. Equivalent to: mtcrf 0xFF,Rx
102
Power ISA I
Version 2.06 Revision B 3.3.15.1 Move To/From One Condition Register Field Instructions
Move To One Condition Register Field XFX-form
mtocrf 31
0 6
FXM,RS RS 1 FXM /
20 21
144
/
31 0
31
FXM
/
20 21
19
/
31
11 12
11 12
count 0 do i = 0 to 7 if FXMi = 1 then i n count count + 1 if count = 1 then CR4n+32:4n+35 (RS)4n+32:4n+35 else CR undefined If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 n 7). The contents of bits 4n+32:4n+35 of register RS are placed into CR field n (CR bits 4n+32:4n+35). Otherwise, the contents of the Condition Register are undefined. Special Registers Altered: CR field selected by FXM Programming Note These forms of the mtcrf and mfcr instructions are intended to replace the old forms of the instructions (the forms shown in page 102), which will eventually be phased out of the architecture. The new forms are backward compatible with most processors that comply with versions of the architecture that precede Version 2.00. On those processors, the new forms are treated as the old forms. However, on some processors that comply with versions of the architecture that precede Version 2.00 the new forms may be treated as follows: mtocrf: may cause the system illegal instruction error handler to be invoked mfocrf: may place an undefined value into register RT
RT undefined count 0 do i = 0 to 7 if FXMi = 1 then n i count count + 1 if count = 1 then RT4n+32:4n+35 CR4n+32:4n+35 If exactly one bit of the FXM field is set to 1, let n be the position of that bit in the field (0 n 7). The contents of CR field n (CR bits 4n+32:4n+35) are placed into bits 4n+32:4n+35 of register RT and the contents of the remaining bits of register RT are undefined. Otherwise, the contents of register RT are undefined. Special Registers Altered: None
103
Version 2.06 Revision B 3.3.15.2 Move To/From System Registers [Category: Embedded]
Move to Condition Register from XER X-form
mcrxr 31
0 6
BF BF
9
//
11
///
16
///
21
512
/
31 0
31
6
RT
11
RA
16
///
21
291
/
31
XER32:35 DCRN (RA) RT DCR(DCRN) Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.) The contents of the designated Device Control Register are placed into RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT.
The contents of XER32:35 are copied to Condition Register field BF. XER32:35 are set to zero. Special Registers Altered: CR field BF XER32:35
See Move From Device Control Register Indexed X-form on page 901 in Book III for more information on this instruction. Special Registers Altered: Implementation-dependent
RS
11
RA
16
///
21
419
/
31
DCRN (RA) RS DCR(DCRN) Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers are implementation-dependent.) The contents of RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of RS are placed into the Device Control Register. See Move To Device Control Register Indexed X-form on page 900 in Book III for more information on this instruction. Special Registers Altered: Implementation-dependent
104
Power ISA I
system compliant with the ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic (hereafter referred to as the IEEE standard). That standard defines certain required operations (addition, subtraction, etc.). Herein, the term floating-point operation is used to refer to one of these required operations and to additional operations defined (e.g., those performed by Multiply-Add or Reciprocal Estimate instructions). A Non-IEEE mode is
105
Floating-Point Exceptions
The following floating-point exceptions are detected by the processor: Invalid Operation Exception SNaN Infinity-Infinity InfinityInfinity ZeroZero InfinityZero Invalid Compare Software-Defined Condition Invalid Square Root Invalid Integer Convert Zero Divide Exception Overflow Exception Underflow Exception Inexact Exception (VX) (VXSNAN) (VXISI) (VXIDI) (VXZDZ) (VXIMZ) (VXVC) (VXSOFT) (VXSQRT) (VXCVI) (ZX) (OX) (UX) (XX)
Each floating-point exception, and each category of Invalid Operation Exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 4.2.2, Floating-Point Status and Control Register on page 107 for a description of these exception and enable bits, and Section 4.4, Floating-Point Exceptions on page 114 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.
106
Power ISA I
Bit(s) 0:31 32
Description Reserved Floating-Point Exception Summary (FX) Every floating-point instruction, except mtfsfi and mtfsf, implicitly sets FPSCRFX to 1 if that instruction causes any of the floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter FPSCRFX explicitly. Programming Note FPSCRFX is defined not to be altered implicitly by mtfsfi and mtfsf because permitting these instructions to alter FPSCRFX implicitly could cause a paradox. An example is an mtfsfi or mtfsf instruction that supplies 0 for FPSCRFX and 1 for FPSCROX, and is executed when FPSCROX=0. See also the Programming Notes with the definition of these two instructions.
33
Floating-Point Enabled Exception Summary (FEX) This bit is the OR of all the floating-point exception bits masked by their respective enable bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCRFEX explicitly. Floating-Point Invalid Operation Exception Summary (VX) This bit is the OR of all the Invalid Operation exception bits. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter FPSCRVX explicitly. Floating-Point Overflow Exception (OX) See Section 4.4.3, Overflow Exception on page 117. Floating-Point Underflow Exception (UX) See Section 4.4.4, Underflow Exception on page 118. Floating-Point Zero Divide Exception (ZX) See Section 4.4.2, Zero Divide Exception on page 117. Floating-Point Inexact Exception (XX) See Section 4.4.5, Inexact Exception on page 119. FPSCRXX is a sticky version of FPSCRFI (see below). Thus the following rules completely describe how FPSCRXX is set by a given instruction. If the instruction affects FPSCRFI, the new value of FPSCRXX is obtained by ORing the old value of FPSCRXX with the new value of FPSCRFI.
34
36
37
38
Figure 49. Floating-Point Status and Control Register The bit definitions for the FPSCR are as follows.
107
Programming Note A single-precision operation that produces a denormalized result sets FPRF to indicate a denormalized number. When possible, single-precision denormalized numbers are represented in normalized double format in the target register.
40
47
41
42
Floating-Point Result Class Descriptor (C) Arithmetic, rounding, and Convert From Integer instructions may set this bit with the FPCC bits, to indicate the class of the result as shown in Figure 50 on page 109. Floating-Point Condition Code (FPCC) Floating-point Compare instructions set one of the FPCC bits to 1 and the other three FPCC bits to 0. Arithmetic, rounding, and Convert From Integer instructions may set the FPCC bits with the C bit, to indicate the class of the result as shown in Figure 50 on page 109. Note that in this case the high-order three bits of the FPCC retain their relational significance indicating that the value is less than, greater than, or equal to zero. Floating-Point Less Than or Negative (FL or <) Floating-Point Greater Than or Positive (FG or >) Floating-Point Equal or Zero (FE or =) Floating-Point Unordered or NaN (FU or ?) Reserved Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT) This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 4.4.1. Programming Note FPSCRVXSOFT can be used by software to indicate the occurrence of an arbitrary, software-defined, condition that is to be treated as an Invalid Operation Exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative.
43
44
45
48 49 50 51 52 53
46
54
Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT) See Section 4.4.1. Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI) See Section 4.4.1.
55
108
Power ISA I
Programming Note When the processor is in floating-point non-IEEE mode, the results of floating-point operations may be approximate, and performance for these operations may be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation may return 0 instead of a denormalized number, and may return a large number instead of an infinity. Floating-Point Rounding Control (RN) See Section 4.3.6, Rounding on page 113. 00 01 10 11 Round to Nearest Round toward Zero Round toward +Infinity Round toward -Infinity
57
58
59
60
61
C 1 0 0 1 1 0 1 0 0
Result Value Class ? 1 1 0 0 0 0 0 0 1 Quiet NaN - Infinity - Normalized Number - Denormalized Number - Zero + Zero + Denormalized Number + Normalized Number + Infinity
FRACTION
31
109
S
0 1
EXP
12
FRACTION
63
Figure 52. Floating-point double format Values in floating-point format are composed of three fields: S EXP FRACTION sign bit exponent+bias fraction
Figure 54. Approximation to real numbers The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables. The following is a description of the different floating-point values defined in the architecture: Binary floating-point numbers Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values. Normalized numbers ( NOR) These are values that have a biased exponent value in the range: 1 to 254 in single format 1 to 2046 in double format They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows: NOR = (-1)s x 2E x (1.fraction) where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part. The ranges covered by the magnitude (M) of a normalized floating-point number are approximately equal to: Single Format:
Representation of numeric values in the floating-point formats consists of a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is 1 for normalized numbers and 0 for denormalized numbers and is located in the unit bit position (i.e., the first bit to the left of the binary point). Values representable within the two floating-point formats can be specified by the parameters listed in Figure 53. Format Double +1023 +1023 -1022
Single Exponent Bias Maximum Exponent Minimum Exponent Widths (bits) Format Sign Exponent Fraction Significand +127 +127 -126
32 1 8 23 24
64 1 11 52 53
Figure 53. IEEE floating-point fields The architecture requires that the FPRs of the Floating-Point Facility support the floating-point double format only.
1.2x10-38 M 3.4x1038 Double Format: 2.2x10-308 M 1.8x10308 Zero values ( 0) These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (i.e., comparison regards +0 as equal to -0). Denormalized numbers ( DEN) These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows: DEN = (-1)s x 2Emin x (0.fraction)
110
Power ISA I
111
1. Load Floating-Point Single This form of instruction accesses a single-precision operand in single format in storage, converts it to double format, and loads it into an FPR. No floating-point exceptions are caused by these instructions. 2. Round to Floating-Point Single-Precision The Floating Round to Single-Precision instruction rounds a double-precision operand to single-precision, checking the exponent for single-precision range and handling any exceptions according to respective enable bits, and places that operand into an FPR in double format. For results produced by single-precision arithmetic instructions, single-precision loads, and other instances of the Floating Round to Single-Precision instruction, this operation does not alter the value. 3. Single-Precision Arithmetic Instructions This form of instruction takes operands from the FPRs in double format, performs the operation as if it produced an intermediate result having infinite precision and unbounded exponent range, and then coerces this intermediate result to fit in single format. Status bits, in the FPSCR and optionally in the Condition Register, are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR. The result lies in the range supported by the single format. All input values must be representable in single format; if they are not, the result placed into the target FPR, and the setting of status bits in the FPSCR and in the Condition Register (if Rc=1), are undefined. 4. Store Floating-Point Single This form of instruction converts a double-precision operand to single format and stores that operand into storage. No floating-point exceptions are caused by these instructions. (The value being stored is effectively assumed to be the result of an instruction of one of the preceding three types.)
112
Power ISA I
4.3.6 Rounding
The material in this section applies to operations that have numeric operands (i.e., operands that are not infinities or NaNs). Rounding the intermediate result of such an operation may cause an Overflow Exception, an Underflow Exception, or an Inexact Exception. The remainder of this section assumes that the operation causes no exceptions and that the result is numeric. See Section 4.3.2, Value Representation and Section 4.4, Floating-Point Exceptions for the cases not covered here. The Arithmetic and Rounding and Conversion instructions round their intermediate results. With the exception of the Estimate instructions, these instructions produce an intermediate result that can be regarded as having infinite precision and unbounded exponent range. All but two groups of these instructions normalize or denormalize the intermediate result prior to rounding and then place the final result into the target FPR in double format. The Floating Round to Integer and Floating Convert To Integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 1075. (Intermediate results with biased exponents 1075 or larger are already integers, and with biased exponents 1021 or less round to zero.) After rounding, the final result for Floating Round to Integer is normalized and put in double format, and for Floating Convert To Integer is converted to a signed fixed-point integer. FPSCR bits FR and FI generally indicate the results of rounding. Each of the instructions which rounds its intermediate result sets these bits. If the fraction is incremented during rounding then FR is set to 1, otherwise FR is set to 0. If the result is inexact then FI is set to 1, otherwise FI is set to zero. The Round to Integer instructions are exceptions to this rule, setting FR and FI to 0. The Estimate instructions set FR and FI to undefined values. The remaining floating-point instructions do not alter FR and FI. Four user-selectable rounding modes are provided through the Floating-Point Rounding Control field in the
113
RN 00 01 10 11
Rounding Mode Round to Nearest Round toward Zero Round toward +Infinity Round toward -Infinity
Let Z be the intermediate arithmetic result or the operand of a convert operation. If Z can be represented exactly in the target format, then the result in all rounding modes is Z as represented in the target format. If Z cannot be represented exactly in the target format, let Z1 and Z2 bound Z as the next larger and next smaller numbers representable in the target format. Then Z1 or Z2 can be used to approximate the result in the target format. Figure 55 shows the relation of Z, Z1, and Z2 in this case. The following rules specify the rounding in the four modes. LSB means least significant bit.
By Incrementing LSB of Z Infinitely Precise Value By Truncating after LSB
Z2 Z1 Z Negative values
Z2 Z1 Z Positive values
Figure 55. Selection of Z1 and Z2 Round to Nearest Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is even (least significant bit 0). Round toward Zero Choose the smaller in magnitude (Z1 or Z2). Round toward +Infinity Choose Z1. Round toward -Infinity Choose Z2. See Section 4.5.1, Execution Model for IEEE Operations on page 119 for a detailed explanation of rounding.
114
Power ISA I
In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits. In all cases in which the system floating-point enabled exception error handler is invoked, all instructions
115
4.4.1.2 Action
The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR. When Invalid Operation Exception is enabled (FPSCRVE=1) and an Invalid Operation Exception occurs, the following actions are taken: 1. One or two Invalid Operation Exceptions are set FPSCRVXSNAN (if SNaN) (if - ) FPSCRVXISI FPSCRVXIDI (if ) FPSCRVXZDZ (if 0 0) (if 0) FPSCRVXIMZ FPSCRVXVC (if invalid comp) (if sfw-def cond) FPSCRVXSOFT (if invalid sqrt) FPSCRVXSQRT FPSCRVXCVI (if invalid int cvrt) 2. If the operation is an arithmetic, Floating Round to Single-Precision, Floating Round to Integer, or convert to integer operation, the target FPR is unchanged FPSCRFR FI are set to zero FPSCRFPRF is unchanged 3. If the operation is a compare, FPSCRFR FI C are unchanged FPSCRFPCC is set to reflect unordered 4. If an mtfsfi, mtfsf, or mtfsb1 instruction is executed that sets FPSCRVXSOFT to 1, The FPSCR is set as specified in the instruction description.
116
Power ISA I
4.4.2.2 Action
The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR. When Zero Divide Exception is enabled (FPSCRZE=1) and a Zero Divide Exception occurs, the following actions are taken: 1. Zero Divide Exception is set FPSCRZX 1 2. The target FPR is unchanged 3. FPSCRFR FI are set to zero 4. FPSCRFPRF is unchanged When Zero Divide Exception is disabled (FPSCRZE=0) and a Zero Divide Exception occurs, the following actions are taken: 1. Zero Divide Exception is set FPSCRZX 1 2. The target FPR is set to Infinity, where the sign is determined by the XOR of the signs of the operands 3. FPSCRFR FI are set to zero 4. FPSCRFPRF is set to indicate the class and sign of the result ( Infinity)
4.4.3.2 Action
The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR. When Overflow Exception is enabled (FPSCROE=1) and an Overflow Exception occurs, the following actions are taken: 1. Overflow Exception is set FPSCROX 1 2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by subtracting 1536 3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by subtracting 192 4. The adjusted rounded result is placed into the target FPR 5. FPSCRFPRF is set to indicate the class and sign of the result ( Normal Number) When Overflow Exception is disabled (FPSCROE=0) and an Overflow Exception occurs, the following actions are taken:
117
4.4.4.2 Action
The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR. When Underflow Exception is enabled (FPSCRUE=1) and an Underflow Exception occurs, the following actions are taken: 1. Underflow Exception is set FPSCRUX 1 2. For double-precision arithmetic instructions, the exponent of the normalized intermediate result is adjusted by adding 1536 3. For single-precision arithmetic instructions and the Floating Round to Single-Precision instruction, the exponent of the normalized intermediate result is adjusted by adding 192 4. The adjusted rounded result is placed into the target FPR 5. FPSCRFPRF is set to indicate the class and sign of the result ( Normalized Number)
118
Power ISA I
Programming Note The FR and FI bits are provided to allow the system floating-point enabled exception error handler, when invoked because of an Underflow Exception, to simulate a trap disabled environment. That is, the FR and FI bits allow the system floating-point enabled exception error handler to unround the result, thus allowing the result to be denormalized. When Underflow Exception is disabled (FPSCRUE=0) and an Underflow Exception occurs, the following actions are taken: 1. Underflow Exception is set FPSCRUX 1 2. The rounded result is placed into the target FPR 3. FPSCRFPRF is set to indicate the class and sign of the result ( Normalized Number, Denormalized Number, or Zero)
4.4.5.2 Action
The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR. When an Inexact Exception occurs, the following actions are taken: 1. Inexact Exception is set FPSCRXX 1 2. The rounded or overflowed result is placed into the target FPR 3. FPSCRFPRF is set to indicate the class and sign of the result Programming Note In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.
119
FRACTION
GR X
53 54 55
The significand of the intermediate result is prepared for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the low-order bit position of the fraction. Four user-selectable rounding modes are provided through FPSCRRN as described in Section 4.3.6, Rounding on page 113. Using Z1 and Z2 as defined on page 113, the rules for rounding in each mode are as follows. Round to Nearest Guard bit = 0 The result is truncated. (Result exact (GRX=000) or closest to next lower value in magnitude (GRX=001, 010, or 011)) Guard bit = 1 Depends on Round and Sticky bits: Case a If the Round or Sticky bit is 1 (inclusive), the result is incremented. (Result closest to next higher value in magnitude (GRX=101, 110, or 111)) Case b If the Round and Sticky bits are 0 (result midway between closest representable values), then if the low-order bit of the result is 1 the result is incremented. Otherwise (the low-order bit of the result is 0) the result is truncated (this is the case of a tie rounded to even). Round toward Zero Choose the smaller in magnitude of Z1 or Z2. If the Guard, Round, or Sticky bit is nonzero, the result is inexact. Round toward + Infinity Choose Z1. Round toward - Infinity Choose Z2. If rounding results in a carry into C, the significand is shifted right one position and the exponent is incremented by one. This yields an inexact result, and possibly also exponent overflow. If any of the Guard, Round, or Sticky bits is nonzero, then the result is also inexact. Fraction bits are stored to the target FPR. For Floating Round to Integer, Floating Round to Single-Precision, and single-precision arithmetic instructions, low-order zeros must be appended as appropriate to fill out the double-precision fraction.
Figure 56. IEEE 64-bit execution model The S bit is the sign bit. The C bit is the carry bit, which captures the carry out of the significand. The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand. The FRACTION is a 52-bit field that accepts the fraction of the operand. The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, due either to shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Figure 57 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), and the representable number next higher in magnitude (NH). GRX 000 001 010 011 100 101 110 111 Figure 57. Interpretation of G, R, and X bits Figure 58 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 56. Format Guard Double G bit Single 24 Round R bit 25 Sticky X bit OR of 26:52, G, R, X IR closer to NH IR midway between NL and NH IR closer to NL Interpretation IR is exact
Figure 58. Location of the Guard, Round, and Sticky bits in the IEEE execution model
120
Power ISA I
If the instruction is Floating Negative Multiply-Add or Floating Negative Multiply-Subtract, the final result is negated.
FRACTION
X
106
Figure 59. Multiply-add 64-bit execution model The first part of the operation is a multiplication. The multiplication has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), then the significand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the FRACTION and shifting the C bit (carry out) into the L bit. All 106 bits (L bit, the FRACTION) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned (shifted) to the right by an amount that is added to that exponent to make it equal to the other inputs exponent. Zeros are shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the significand are ORed into the X bit. The add operation also produces a result conforming to the above model with the X bit taking part in the add operation. The result of the addition is then normalized, with all bits of the addition result, except the X bit, participating in the shift. The normalized result serves as the intermediate result that is input to the rounder. For rounding, the conceptual Guard, Round, and Sticky bits are defined in terms of accumulator bits. Figure 60 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers in the multiply-add execution model. Format Guard Double 53 Single 24 Round 54 25 Sticky OR of 55:105, X OR of 26:105, X
Figure 60. Location of the Guard, Round, and Sticky bits in the multiply-add execution model The rules for rounding the intermediate result are the same as those given in Section 4.5.1.
121
122
Power ISA I
123
D-form
FRT,D(RA) FRT
11
RA
16
D
31
FRT
11
RA
16
RB
21
535
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA DOUBLE(MEM(EA, 4)) FRT Let the effective address (EA) be the sum (RA|0)+D. The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA FRT DOUBLE(MEM(EA, 4)) Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT. Special Registers Altered: None
FRT,D(RA) FRT
11
RA
16
D
31 0
31
RA
16
RB
21
567
/
31
EA FRT RA
EA FRT RA
Let the effective address (EA) be the sum (RA)+D. The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
Let the effective address (EA) be the sum (RA)+(RB). The word in storage addressed by EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double format (see page 123) and placed into register FRT. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
124
Power ISA I
D-form
FRT,D(RA) FRT
11
RA
16
D
31
FRT
11
RA
16
RB
21
599
/
31
0 (RA)
Let the effective address (EA) be the sum (RA|0)+D. The doubleword in storage addressed by EA is loaded into register FRT. Special Registers Altered: None
0 (RA)
Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into register FRT. Special Registers Altered: None
FRT,D(RA) FRT
11
RA
16
D
31 0
31
RA
16
RB
21
631
/
31
EA FRT RA
EA FRT RA
Let the effective address (EA) be the sum (RA)+D. The doubleword in storage addressed by EA is loaded into register FRT. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
Let the effective address (EA) be the sum (RA)+(RB). The doubleword in storage addressed by EA is loaded into register FRT. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
125
FRT,RA,RB FRT
11
RA
16
RB
21
855
/
31
if RA = 0 then b 0 (RA) else b b + (RB) EA FRT EXTS(MEM(EA, 4)) Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into FRT32:63. FRT0:31 are filled with a copy of bit 0 of the loaded word. Special Registers Altered: None
FRT
11
RA
16
RB
21
887
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) 320 || MEM(EA, 4) FRT Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into FRT32:63. FRT0:31 are set to 0. Special Registers Altered: None
126
Power ISA I
127
D-form
FRS,D(RA) FRS
11
RA
16
D
31 0
31
RA
16
RB
21
663
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA SINGLE((FRS)) MEM(EA, 4) Let the effective address (EA) be the sum (RA|0)+D. The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 4) SINGLE((FRS)) Let the effective address (EA) be the sum (RA|0)+(RB). The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA. Special Registers Altered: None
FRS,D(RA) FRS
11
RA
16
D
31 0
31
RA
16
RB
21
695
/
31
EA (RA) + EXTS(D) SINGLE((FRS)) MEM(EA, 4) RA EA Let the effective address (EA) be the sum (RA)+D. The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) SINGLE((FRS)) MEM(EA, 4) RA EA Let the effective address (EA) be the sum (RA)+(RB). The contents of register FRS are converted to single format (see page 127) and stored into the word in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
128
Power ISA I
D-form
FRS,D(RA) FRS
11
RA
16
D
31 0
31
RA
16
RB
21
727
/
31
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA (FRS) MEM(EA, 8) Let the effective address (EA) be the sum (RA|0)+D. The contents of register FRS are stored into the doubleword in storage addressed by EA. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 8) (FRS) Let the effective address (EA) be the sum (RA|0)+(RB). The contents of register FRS are stored into the doubleword in storage addressed by EA. Special Registers Altered: None
FRS,D(RA) FRS
11
RA
16
D
31 0
31
RA
16
RB
21
759
/
31
EA (RA) + EXTS(D) (FRS) MEM(EA, 8) RA EA Let the effective address (EA) be the sum (RA)+D. The contents of register FRS are stored into the doubleword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
EA (RA) + (RB) (FRS) MEM(EA, 8) RA EA Let the effective address (EA) be the sum (RA)+(RB). The contents of register FRS are stored into the doubleword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
129
FRS,RA,RB FRS
11
RA
16
RB
21
983
/
31
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 4) (FRS)32:63 Let the effective address (EA) be the sum (RA|0)+(RB). (FRS)32:63 are stored, without conversion, into the word in storage addressed by EA. If the contents of register FRS were produced, either directly or indirectly, by a Load Floating-Point Single instruction, a single-precision Arithmetic instruction, or frsp, then the value stored is undefined. (The contents of register FRS are produced directly by such an instruction if FRS is the target register for the instruction. The contents of register FRS are produced indirectly by such an instruction if FRS is the final target register of a sequence of one or more Floating-Point Move instructions, with the input to the sequence having been produced directly by such an instruction.) Special Registers Altered: None
130
Power ISA I
FRTp,DS(RA) FRTp
11
RA
16
DS
00
30 31 0
61
RA
16
DS
00
30 31
if RA = 0 then b 0 else b (RA) EA b + EXTS(DS||0b00) MEM(EA, 16) FRTp Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp. If FRTp is odd, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + EXTS(DS||0b00) FRSp MEM(EA, 16) Let the effective address (EA) be the sum (RA|0) + (DS||0b00). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA. If FRSp is odd, the instruction form is invalid. Special Registers Altered: None
FRTp,RA,RB FRTp
11
RA
16
RB
21
791
/
31 0
31
RA
16
RB
21
919
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) MEM(EA, 16) FRTp Let the effective address (EA) be the sum (RA|0) + (RB). The doubleword-pair in storage addressed by EA is placed into register-pair FRTp. If FRTp is odd, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) FRSp MEM(EA, 16) Let the effective address (EA) be the sum (RA|0) + (RB). The contents of register-pair FRSp are stored into the doubleword-pair in storage addressed by EA. If FRSp is odd, the instruction form is invalid. Special Registers Altered: None
131
X-form
(Rc=0) (Rc=1)
Floating Negate
fneg fneg. 63
0 6
X-form
(Rc=0) (Rc=1) FRB
16 21
///
FRB
16 21
72
Rc
31
///
40
Rc
31
The contents of register FRB are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
The contents of register FRB with bit 0 inverted are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
X-form
(Rc=0) (Rc=1)
X-form
(Rc=0) (Rc=1) FRB
16 21
///
FRB
16 21
264
Rc
31
Rc
31
The contents of register FRB with bit 0 set to zero are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
The contents of register FRB with bit 0 set to the value of bit 0 of register FRA are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
///
136
Rc
31
The contents of register FRB with bit 0 set to one are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
132
Power ISA I
A-form
(Rc=0) (Rc=1) FRB ///
21 26
A-form
(Rc=0) (Rc=1)
21
Rc
31
FRB
16 21
///
26
20
Rc
31
16
fadds fadds. 59
0 6
fsubs fsubs. 59
0 6
21
Rc
31
20
Rc
31
16
16
The floating-point operand in register FRA is added to the floating-point operand in register FRB. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. If a carry occurs, the sums significand is shifted right one bit position and the exponent is increased by one. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI CR1
The floating-point operand in register FRB is subtracted from the floating-point operand in register FRA. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. The execution of the Floating Subtract instruction is identical to that of Floating Add, except that the contents of FRB participate in the operation with the sign bit (bit 0) inverted. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI CR1
(if Rc=1)
(if Rc=1)
133
A-form
(Rc=0) (Rc=1)
A-form
(Rc=0) (Rc=1)
///
FRC
21 26
25
Rc
31
FRB
16 21
///
26
18
Rc
31
fmuls fmuls. 59
0 6
fdivs fdivs. 59
0 6
25
Rc
31
18
Rc
31
16
The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. Floating-point multiplication is based on exponent addition and multiplication of the significands. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXIMZ CR1
The floating-point operand in register FRA is divided by the floating-point operand in register FRB. The remainder is not supplied as a result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. Floating-point division is based on exponent subtraction and division of the significands. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1 and Zero Divide Exceptions when FPSCRZE=1. Special Registers Altered: FPRF FR FI FX OX UX ZX XX VXSNAN VXIDI VXZDZ CR1
(if Rc=1)
(if Rc=1)
134
Power ISA I
A-form
(Rc=0) (Rc=1)
///
FRB
16 21
///
26
22
Rc
31
///
///
26
24
Rc
31
fsqrts fsqrts. 59
0 6
fres fres. 59
0 6
///
///
26
22
Rc
31
///
///
26
24
Rc
31
The square root of the floating-point operand in register FRB is placed into register FRT. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. Operation with various special values of the operand is summarized below. Operand Result Exception VXSQRT - QNaN1 <0 QNaN1 VXSQRT -0 -0 None + + None VXSNAN SNaN QNaN1 QNaN QNaN None 1 No result if FPSCR VE = 1 FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX XX VXSNAN VXSQRT CR1
An estimate of the reciprocal of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 256 of the reciprocal of (FRB), i.e., 1ABS(estimate 1 x) ---------------------------------------------256 1x where x is the initial value in FRB. Operation with various special values of the operand is summarized below. Operand Result Exception - -0 None ZX -0 -1 ZX +0 +1 + +0 None VXSNAN SNaN QNaN2 QNaN QNaN None 1 No result if FPSCR ZE = 1. 2 No result if FPSCR VE = 1.
(if Rc=1)
FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1 and Zero Divide Exceptions when FPSCRZE=1. The results of executing this instruction may vary between implementations, and between different executions on the same implementation. Special Registers Altered: FPRF FR (undefined) FI (undefined) FX OX UX ZX XX (undefined) VXSNAN CR1
(if Rc=1)
135
Programming Note For the Floating-Point Estimate instructions, some implementations might implement a precision higher than the minimum architected precision. Thus, a program may take advantage of the higher precision instructions to increase performance by decreasing the iterations needed for software emulation of floating-point instructions. However, there is no guarantee given about the precision which may vary (up or down) between implementations. Only programs targeted at a specific implementation (i.e., the program will not be migrated to another implementation) should take advantage of the higher precision of the instructions. All other programs should rely on the minimum architected precision, which will guarantee the program to run properly across different implementations.
///
///
26
26
Rc
31
frsqrtes frsqrtes. 59
0 6
///
///
26
26
Rc
31
A estimate of the reciprocal of the square root of the floating-point operand in register FRB is placed into register FRT. The estimate placed into register FRT is correct to a precision of one part in 32 of the reciprocal of the square root of (FRB), i.e., 1ABS(estimate 1 ( x )) --------------------------------------------------32 1 ( x) where x is the initial value in FRB. Operation with various special values of the operand is summarized below. Operand Result Exception - QNaN2 VXSQRT VXSQRT <0 QNaN2 -0 -1 ZX +0 +1 ZX + +0 None VXSNAN SNaN QNaN2 QNaN QNaN None 1 No result if FPSCR ZE = 1. 2 No result if FPSCR VE = 1.
FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1 and Zero Divide Exceptions when FPSCRZE=1. The results of executing this instruction may vary between implementations, and between different executions on the same implementation. Special Registers Altered: FPRF FR (undefined) FI (undefined) FX ZX XX (undefined) VXSNAN VXSQRT CR1 Note See the Notes that appear with fre[s].
(if Rc=1)
136
Power ISA I
BF
//
9 11
FRA
16
FRB
21
128
/
31 0
63
6
BF
//
9 11
///
16
FRB
21
160
/
31
Let e_a be the unbiased exponent of the double-precision floating-point operand in register FRA. Let e_b be the unbiased exponent of the double-precision floating-point operand in register FRB. fe_flag is set to 1 if any of the following conditions occurs. The double-precision floating-point operand in register FRA is a NaN or an Infinity. The double-precision floating-point operand in register FRB is a Zero, a NaN, or an Infinity. e_b is less than or equal to -1022. e_b is greater than or equal to 1021. The double-precision floating-point operand in register FRA is not a zero and the difference, e_a - e_b, is greater than or equal to 1023. The double-precision floating-point operand in register FRA is not a zero and the difference, e_a - e_b, is less than or equal to -1021. The double-precision floating-point operand in register FRA is not a zero and e_a is less than or equal to -970 Otherwise fe_flag is set to 0. fg_flag is set to 1 if either of the following conditions occurs. The double-precision floating-point operand in register FRA is an Infinity. The double-precision floating-point operand in register FRB is a Zero, an Infinity, or a denormalized value. Otherwise fg_flag is set to 0. If the implementation guarantees a relative error of fre[s][.] of less than or equal to 2-14, then fl_flag is set to 1. Otherwise fl_flag is set to 0. CR field BF is set to the fl_flag || fg_flag || fe_flag || 0b0. Special Registers Altered: CR field BF value
Let e_b be the unbiased exponent of the double-precision floating-point operand in register FRB. fe_flag is set to 1 if either of the following conditions occurs. The double-precision floating-point operand in register FRB is a zero, a NaN, or an infinity, or a negative value. e_b is less than or equal to -970. Otherwise fe_flag is set to 0. fg_flag is set to 1 if the following condition occurs. The double-precision floating-point operand in register FRB is a Zero, an Infinity, or a denormalized value. Otherwise fg_flag is set to 0. If the implementation guarantees a relative error of frsqrte[s][.] of less than or equal to 2-14, then fl_flag is set to 1. Otherwise fl_flag is set to 0. CR field BF is set to the fl_flag || fg_flag || fe_flag || 0b0. Special Registers Altered: CR field BF Programming Note ftdiv and ftsqrt are provided to accelerate software emulation of divide and square root operations, by performing the requisite special case checking. Software needs only a single branch, on FE=1 (in CR[BF]), to a special case handler. FG and FL may provide further acceleration opportunities. value
137
A-form
(Rc=0) (Rc=1)
FRB
16
FRC
21 26
29
Rc
31
FRB
16
28
Rc
31
fmadds fmadds. 59
0 6
fmsubs fmsubs. 59
0 6
FRB
16
29
Rc
31
FRB
16
28
Rc
31
[(FRA)(FRC)] + (FRB)
[(FRA)(FRC)] - (FRB)
The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ CR1
The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR and placed into register FRT. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ CR1
(if Rc=1)
(if Rc=1)
138
Power ISA I
FRB
16
31
Rc
31
FRB
16
30
Rc
31
fnmadds fnmadds. 59
0 6
fnmsubs fnmsubs. 59
0 6
FRB
16
31
Rc
31
FRB
16
30
Rc
31
- ( [(FRA)(FRC)] + (FRB) )
- ( [(FRA)(FRC)] - (FRB) )
The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is added to this intermediate result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT. This instruction produces the same result as would be obtained by using the Floating Multiply-Add instruction and then negating the result, with the following exceptions. QNaNs propagate with no effect on their sign bit. QNaNs that are generated as the result of a disabled Invalid Operation Exception have a sign bit of 0. SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the sign bit of the SNaN. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ CR1
The floating-point operand in register FRA is multiplied by the floating-point operand in register FRC. The floating-point operand in register FRB is subtracted from this intermediate result. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the Floating-Point Rounding Control field RN of the FPSCR, then negated and placed into register FRT. This instruction produces the same result as would be obtained by using the Floating Multiply-Subtract instruction and then negating the result, with the following exceptions. QNaNs propagate with no effect on their sign bit. QNaNs that are generated as the result of a disabled Invalid Operation Exception have a sign bit of 0. SNaNs that are converted to QNaNs as the result of a disabled Invalid Operation Exception retain the sign bit of the SNaN. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ CR1
(if Rc=1)
(if Rc=1)
139
///
12
Rc
31
///
814
Rc
31
The floating-point operand in register FRB is rounded to single-precision, using the rounding mode specified by FPSCRRN, and placed into register FRT. The rounding is described fully in Section A.1, Floating-Point Round to Single-Precision Model on page 601. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE=1. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN CR1
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x8000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCRRN. If the rounded value is greater than 263-1, then the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -263, then the result is 0x8000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit signed-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT. The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
(if Rc=1)
140
Power ISA I
///
815
Rc
31
FRT
11
///
FRB
16 21
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x8000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero. then the If the rounded value is greater than 2 result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -263, then the result is 0x8000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit signed-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT. The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
63-1,
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x0000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCRRN. If the rounded value is greater than 264-1, then the result is 0xFFFF_FFFF_FFFF_FFFF, and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, then the result is 0x0000_0000_0000_0000, and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT. The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
(if Rc=1)
141
///
14
Rc
31
FRT
11
///
FRB
16 21
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x0000_0000_0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero. If the rounded value is greater than 264-1, then the result is 0xFFFF_FFFF_FFFF_FFFF, and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, then the result is 0x0000_0000_0000_0000, and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT. The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x8000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCRRN. If the rounded value is greater than 231-1, then the result is 0x7FFF_FFFF, and VXCVI is set to 1. Otherwise, if the rounded value is less than -231, then the result is 0x8000_0000, and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit signed-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT32:63 and FRT0:31 is undefined, The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
(if Rc=1)
142
Power ISA I
///
FRB
16 21
15
Rc
31
FRT
11
///
FRB
16 21
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x8000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero. then the If the rounded value is greater than 2 result is 0x7FFF_FFFF, and VXCVI is set to 1. Otherwise, if the rounded value is less than -231, then the result is 0x8000_0000, and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit signed-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT32:63 and FRT0:31 is undefined, The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
31-1,
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode specified by FPSCRRN. If the rounded value is greater than 232-1, then the result is 0xFFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0.0, then the result is 0x0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT32:63 and FRT0:31 is undefined, The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
(if Rc=1)
143
FRT
11
///
FRB
16 21
///
846
Rc
31
Let src be the double-precision floating-point value in FRB. If src is a NaN, then the result is 0x0000_0000, VXCVI is set to 1, and, if src is an SNaN, VXSNAN is set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round toward Zero. If the rounded value is greater than 232-1, then the result is 0xFFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0.0, then the result is 0x0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format. If an enabled Invalid Operation Exception does not occur, then the result is placed into FRT32:63 and FRT0:31 is undefined, The conversion is described fully in Section A.2, Floating-Point Convert to Integer Model on page 605. Except for enabled Invalid Operation Exceptions, FPSCRFPRF is undefined. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCRRN, and placed into register FRT. The conversion is described fully in Section A.3, Floating-Point Convert from Integer Model. FPSCRFPRF is set to the class and sign of the result. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF FR FI FX XX CR1 Programming Note Converting a signed integer word to double-precision floating-point can be accomplished by loading the word from storage using Load Float Word Algebraic Indexed and then using fcfid.
(if Rc=1)
(if Rc=1)
144
Power ISA I
FRT
11
///
FRB
16 21
FRT
11
///
FRB
16 21
The 64-bit unsigned fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to double-precision, using the rounding mode specified by FPSCRRN, and placed into register FRT. The conversion is described fully in Section A.3, Floating-Point Convert from Integer Model. FPSCRFPRF is set to the class and sign of the result. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF FR FI FX XX CR1 Programming Note Converting an unsigned integer word to double-precision floating-point can be accomplished by loading the word from storage using Load Float Word and Zero Indexed and then using fcfidu.
The 64-bit signed fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to single-precision, using the rounding mode specified by FPSCRRN, and placed into register FRT. The conversion is described fully in Section A.3, Floating-Point Convert from Integer Model. FPSCRFPRF is set to the class and sign of the result. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF FR FI FX XX CR1 Programming Note Converting a signed integer word to single-precision floating-point can be accomplished by loading the word from storage using Load Float Word Algebraic Indexed and then using fcfids.
(if Rc=1)
(if Rc=1)
145
FRT
11
///
FRB
16 21
974
Rc
31
The 64-bit unsigned fixed-point operand in register FRB is converted to an infinitely precise floating-point integer. The result of the conversion is rounded to single-precision, using the rounding mode specified by FPSCRRN, and placed into register FRT. The conversion is described fully in Section A.3, Floating-Point Convert from Integer Model. FPSCRFPRF is set to the class and sign of the result. FPSCRFR is set if the result is incremented when rounded. FPSCRFI is set if the result is inexact. Special Registers Altered: FPRF FR FI FX XX CR1 Programming Note Converting a unsigned integer word to single-precision floating-point can be accomplished by loading the word from storage using Load Float Word and Zero Indexed and then using fcfidus.
(if Rc=1)
146
Power ISA I
X-form
(Rc=0) (Rc=1)
///
392
Rc
31
///
FRB
16 21
456
Rc
31
The floating-point operand in register FRB is rounded to an integral value as follows, with the result placed into register FRT. If the sign of the operand is positive, (FRB) + 0.5 is truncated to an integral value, otherwise (FRB) - 0.5 is truncated to an integral value. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE = 1. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1
The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward +infinity, and the result is placed into register FRT. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE = 1. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1
(if Rc = 1)
(if Rc = 1)
X-form
(Rc=0) (Rc=1)
///
FRB
16 21
488
Rc
31
///
424
Rc
31
The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward zero, and the result is placed into register FRT. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE = 1. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1
The floating-point operand in register FRB is rounded to an integral value using the rounding mode round toward -infinity, and the result is placed into register FRT. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when FPSCRVE = 1. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1
(if Rc = 1)
(if Rc = 1)
147
X-form
X-form
BF,FRA,FRB BF //
9
FRA
11
FRB
16 21
/
31 0
63
FRA
11
FRB
16 21
32
/
31
if (FRA) is a NaN or (FRB) is a NaN then c else if (FRA) < (FRB) then else if (FRA) > (FRB) then else c FPCC CR4BF:4BF+3 c if (FRA) is an SNaN or (FRB) is an SNaN then 1 VXSNAN
The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC. If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set. Special Registers Altered: CR field BF FPCC FX VXSNAN
if (FRA) is a NaN or 0b0001 (FRB) is a NaN then c 0b1000 else if (FRA) < (FRB) then c else if (FRA) > (FRB) then c 0b0100 0b0010 else c c FPCC CR4BF:4BF+3 c if (FRA) is an SNaN or (FRB) is an SNaN then 1 VXSNAN 1 if VE = 0 then VXVC else if (FRA) is a QNaN or (FRB) is a QNaN then VXVC 1 The floating-point operand in register FRA is compared to the floating-point operand in register FRB. The result of the compare is placed into CR field BF and the FPCC. If either of the operands is a NaN, either quiet or signaling, then CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, then VXSNAN is set and, if Invalid Operation is disabled (VE=0), VXVC is set. If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, then VXVC is set. Special Registers Altered: CR field BF FPCC FX VXSNAN VXVC
148
Power ISA I
A-form
(Rc=0) (Rc=1) FRC
21 26
FRB
16
23
Rc
31
(FRC)
The floating-point operand in register FRA is compared to the value zero. If the operand is greater than or equal to zero, register FRT is set to the contents of register FRC. If the operand is less than zero or is a NaN, register FRT is set to the contents of register FRB. The comparison ignores the sign of zero (i.e., regards +0 as equal to -0). Special Registers Altered: CR1 Programming Note Examples of uses of this instruction can be found in Sections F.2, Floating-Point Conversions [Category: Floating-Point] on page 642 and F.3, Floating-Point Selection [Category: Floating-Point] on page 646. Warning: Care must be taken in using fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities; see Section F.3.4, Notes on page 646. (if Rc=1)
149
X-form
(Rc=0) (Rc=1)
///
16
///
21
583
Rc
31
The contents of the FPSCR are placed into register FRT. Special Registers Altered: CR1 (if Rc=1)
BF,BFA BF //
9 11
BFA
//
14 16
///
21
64
/
31
The contents of FPSCR32:63 field BFA are copied to Condition Register field BF. All exception bits copied are set to 0 in the FPSCR. If the FX bit is copied, it is set to 0 in the FPSCR. Special Registers Altered: CR field BF FX OX UX ZX XX VXSNAN VXISI VXIDI VXZDZ VXIMZ VXVC VXSOFT VXSQRT VXCVI
(if BFA=0) (if BFA=1) (if BFA=2) (if BFA=3) (if BFA=5)
150
Power ISA I
X-form
(Rc=0) (Rc=1)
XFL-form
(Rc=0) (Rc=1)
BF,U,W BF,U,W BF //
9 11
FLM,FRB,L,W FLM,FRB,L,W L
6 7
///
W
15 16
/
20 21
134
Rc
31
FLM
FRB
21
711
Rc
31
15 16
The value of the U field is placed into FPSCR field BF+8%(1-W). FPSCRFX is altered only if BF = 0 and W = 0. Special Registers Altered: FPSCR field BF + 8%(1-W) CR1 Programming Note mtfsfi serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsfi mnemonic with three operands as the basic form, and a mtfsfi mnemonic with two operands as the extended form. In the extended form the W operand is omitted and assumed to be 0. Programming Note When FPSCR32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of U0 and U3 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from U0 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 107, and not from U1:2.
The FPSCR is modified as specified by the FLM, L, and W fields. L=0 The contents of register FRB are placed into the FPSCR under control of the W field and the field mask specified by FLM. W and the field mask identify the 4-bit fields affected. Let i be an integer in the range 0-7. If FLMi=1 then FPSCR field k is set to the contents of the corresponding field of register FRB, where k = i+8%(1-W). L=1 The contents of register FRB are placed into the FPSCR. FPSCRFX is not altered implicitly by this instruction. Special Registers Altered: FPSCR fields selected by mask, L, and W CR1 (if Rc=1) Programming Note mtfsf serves as both a basic and an extended mnemonic. The Assembler will recognize a mtfsf mnemonic with four operands as the basic form, and a mtfsf mnemonic with two operands as the extended form. In the extended form the W and L operands are omitted and both are assumed to be 0. Programming Note Updating fewer than eight fields of the FPSCR may have substantially poorer performance on some implementations than updating eight fields or all of the fields. Programming Note If L=1 or if L=0 and FPSCR32:35 is specified, bits 32 (FX) and 35 (OX) are set to the values of (FRB)32 and (FRB)35 (i.e., even if this instruction causes OX to change from 0 to 1, FX is set from (FRB)32 and not by the usual rule that FX is set to 1 when an exception bit changes from 0 to 1). Bits 33 and 34 (FEX and VX) are set according to the usual rule, given on page 107, and not from (FRB)33:34.
(if Rc=1)
151
X-form
(Rc=0) (Rc=1)
X-form
(Rc=0) (Rc=1)
BT BT BT
11
BT BT BT
11
///
16
///
21
70
Rc
31
///
16
///
21
38
Rc
31
Bit BT+32 of the FPSCR is set to 0. Special Registers Altered: FPSCR bit BT+32 CR1 Programming Note Bits 33 and 34 (FEX and VX) cannot be explicitly reset.
Bit BT+32 of the FPSCR is set to 1. Special Registers Altered: FPSCR bits BT+32 and FX CR1 Programming Note Bits 33 and 34 (FEX and VX) cannot be explicitly set.
(if Rc=1)
(if Rc=1)
152
Power ISA I
The DFP facility also shares the Condition Register (CR) with the fixed-Point facility, the BFP faciltiy, and the vector facility. The DFP facility supports three DFP data formats: DFP Short (single precision), DFP Long (double precision), and DFP Extended (quad precision). Most operations are performed on DFP Long or DFP Extended format directly. Support for DFP Short is limited to conversion to and from DFP Long. Some DFP instructions operate on other data types, including signed or unsigned binary fixed-point data, and signed or unsigned decimal data. DFP instructions are provided to perform arithmetic, compare, test, quantum-adjustment, conversion, and format operations on operands held in FPRs or FPR pairs.
153
Each DFP exception and each category of Invalid Operation Exception has an exception status bit in the FPSCR. In addition, each of the five DFP exceptions has a corresponding enable bit in the FPSCR. These enable bits enable or disable the invocation of the system floating-point enabled exception error handler, and may affect the setting of some exception status bits in the FPSCR. The usage of these bits by the DFP facility differs from the usage by the BFP facility. Section 5.5.10 DFP Exceptions on page 164 provides a detailed discussion of DFP exceptions, including the effects of the enable bits.
154
Power ISA I
37
38
40
41
142
43
44
33
45
34
46
35
155
47
59
60
48:51
61 62:63
48 49 50 51 52 53
Result Value Class ? 1 1 1 0 0 0 0 0 0 1 Signaling NaN (DFP only) Quiet NaN - Infinity - Normal Number - Subnormal Number - Zero + Zero + Subnormal Number + Normal Number + Infinity
54
56
57
156
Power ISA I
Figure 62. Format for Unsigned Decimal Data For signed decimal data, the rightmost nibble contains a sign code (S) and all other nibbles contain a digit code as shown in Figure 63.
157
for denoting the value as either a Not-a-Number or an Infinity. The first 5 bits of the combination field contain the encoding of NaN or infinity, or the two leftmost bits of the biased exponent and the leftmost digit (LMD) of the significand. The following tables show the encoding: G0:4 11111 11110 All others Description NaN Infinity Finite Number (see Figure 69)
Figure 68. Encoding of the G field for Special Symbols LMD 0 1 2 3 4 5 6 7 8 9 Leftmost 2-bits of biased exponent 00 00000 00001 00010 00011 00100 00101 00110 00111 11000 11001 01 01000 01001 01010 01011 01100 01101 01110 01111 11010 11011 10 10000 10001 10010 10011 10100 10101 10110 10111 11100 11101
G
12
T
31
Figure 69. Encoding of bits 0:4 of the G field for Finite Numbers For DFP finite numbers, the rightmost N-5 bits of the N-bit combination field contain the remaining bits of the biased exponent. For NaNs, bit 5 of the combination field is used to distinguish a Quiet NaN from a Signaling NaN; the remaining bits in a source operand are ignored and they are set to zeros in a target operand by most operations. For infinities, the rightmost N-5 bits of the N-bit combination field of a source operand are ignored and they are set to zeros in a target operand by most operations. Trailing Significand field (T) For DFP finite numbers, this field contains the remaining significand digits. For NaNs, this field may be used to contain diagnostic information. For infinities, contents in this field of a source operand are ignored and they are set to zeros in a target operand by most operations. The trailing significand field is a multiple of 10-bit blocks. The multiple depends on the format. Each 10-bit block is called a declet and represents three decimal digits, using the Densely Packed Decimal (DPD) encoding defined in Appendix B.
S
0 1
G
14
T
63
S
0 1
G
18
T
63
T (continued)
64 127
Figure 67. DFP Extended format The fields are defined as follows: Sign bit (S) The sign bit is in bit 0 of each format, and is zero for plus and one for minus. Combination field (G) As the name implies, this field provides a combination of the exponent and the left-most digit (LMD) of the significand, for finite numbers, or provides a special code
158
Power ISA I
159
Sign
* The significand is zero and the exponent is any representable value Figure 71. Value Ranges for Finite Number Data Classes Data Class +Infinity Infinity Quiet NaN Signaling NaN x Dont care Figure 72. Encoding of NaN and Infinity Data Classes Zeros Zeros have a zero significand and any representable value in the exponent. A +0 is distinct from -0, and zeros with different exponents are distinct, except that comparison treats them as equal. Subnormal Numbers Subnormal numbers have values that are smaller than Nmin and greater than zero in magnitude. Normal Numbers Normal numbers are nonzero finite numbers whose magnitude is between Nmin and Nmax inclusively. Infinities Infinities are represented by 0b11110 in the leftmost 5 bits of the combination field. When an operation is defined to generate an infinity as the result, a default infinity is sometimes supplied. A default infinity has all remaining bits in the combination field and trailing significand field set to zeros. When infinities are used as source operands, only the leftmost 5 bits of the combination field are interpreted (i.e., 0b11110 indicates the value is an infinity). The trailing significand field of infinities is usually ignored. For generated infinities, the leftmost 5 bits of the combination field are set to 0b11110 and all remaining combination bits are set to zero. Infinities can participate in most arithmetic operations and give a consistent result. In comparisons, any +Infinity compares greater than any finite number, and any -Infinity compares less than any finite number. All +Infinity are compared equal and all -Infinity are compared equal. Signaling and Quiet NaNs There are two types of Not-a-Numbers (NaNs), Signaling (SNaN) and Quiet (QNaN). S 0 1 x x G 11110xxx . . . xxx 11110xxx . . . xxx 111110xx . . . xxx 111111xx . . . xxx T xxx . . . xxx xxx . . . xxx xxx . . . xxx xxx . . . xxx
5.5.1 Rounding
Rounding takes a number regarded as infinitely precise and, if necessary, modifies it to fit the destinations precision. The destinations precision of an operation defines the set of permissible resultant values. For
160
Power ISA I
Z2
Z1
Z2 Z1 Z Positive Values
Round to Nearest, Ties to Even Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one whose units digit would have been even in the form with the largest common quantum of the two permissible resultant values. However, an infinitely precise result with magnitude at least (Nmax + 0.5Q(Nmax)) is rounded to infinity with no change in sign; where Q(Nmax) is the quantum of Nmax. Round toward 0 Choose the smaller in magnitude (Z1 or Z2). Round toward + Choose Z1. Round toward - Choose Z2. Round to Nearest, Ties away from 0 Choose the value that is closer to Z (Z1 or Z2). In case
For the quantum-adjustment, a 2-bit immediate field, called RMC (Rounding Mode Control), in the instruction specifies the rounding mode used. The RMC field may contain a primary encoding or a secondary encoding. For Quantize, Quantize Immediate, and Reround, the RMC field contains the primary encoding. For Round to FP Integer the field contains either encoding, depending on the setting of a RMC-encoding-selection
161
Notes: E(x) - exponent of the DFP operand in register x. Figure 77. Summary of Ideal Exponents
162
Power ISA I
the sign and significand of operands. Infinities compare equal, and NaNs compare equal. The test result is indicated in the FPSCRFPCC field and CR field BF. The Test Significance instruction compares the number of significant digits of one source operand with the referenced number of significant digits in another source operand. The test result is indicated in the FPSCRFPCC field and CR field BF. Execution of a test instruction does not cause any DFP exception.
163
164
Power ISA I
165
FE0 FE1 Description 1 1 Precise Mode The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.
In all cases, the question of whether a DFP result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits. In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun execution. (Recall that, for the two Imprecise modes, the instruction at which the system floating-point enabled exception error handler is invoked need not be the instruction that caused the exception.) The instruction at which the system floating-point enabled exception error handler is invoked has not been executed unless it is the excepting instruction, in which case it has been executed if the exception is not among those listed on page 164 as suppressed. Programming Note In the ignore and both imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any exceptions, due to instructions initiated before the Floating-Point Status and Control Register instruction, to be recorded in the FPSCR. (This forcing is superfluous for Precise Mode.) In either of the Imprecise modes, a Floating-Point Status and Control Register instruction can be used to force any invocations of the system floating-point enabled exception error handler, due to instructions initiated before the Floating-Point Status and Control Register instruction, to occur. (This forcing has no effect in Ignore Exceptions Mode, and is superfluous for Precise Mode.) In order to obtain the best performance across the widest range of implementations, the programmer should obey the following guidelines. If the IEEE default results are acceptable to the application, Ignore Exceptions Mode should be used with all FPSCR exception enable bits set to zero. If the IEEE default results are not acceptable to the application, Imprecise Nonrecoverable Mode should be used, or Imprecise Recoverable Mode if recoverability is needed, with FPSCR exception
166
Power ISA I
Programming Note In addition, an Invalid Operation Exception occurs if software explicitly requests this by executing an mtfsfi, mtfsf, or mtfsb1 instruction that sets FPSCRVXSOFT to 1 (Software Request). The purpose of FPSCRVXSOFT is to allow software to cause an Invalid Operation Exception for a condition that is not necessarily associated with the execution of a DFP instruction. For example, it might be set by a program that computes a square root, if the source operand is negative.
Action
The action to be taken depends on the setting of the Invalid Operation Exception Enable bit of the FPSCR. When Invalid Operation Exception is enabled (FPSCRVE=1) and Invalid Operation occurs, the following actions are taken: 1. One or two Invalid Operation Exceptions are set: FPSCRVXSNAN (if SNaN) FPSCRVXISI (if - ) FPSCRVXIDI (if ) FPSCRVXZDZ (if 0 0) (if % 0) FPSCRVXIMZ FPSCRVXVC (if invalid comp) FPSCRVXCVI (if invalid conversion) 2. If the operation is an arithmetic, quantum-adjustment, conversion, or format, the target FPR is unchanged, FPSCRFR FI are set to zero, and FPSCRFPRF is unchanged. 3. If the operation is a compare, FPSCRFR FI C are unchanged, and FPSCRFPCC is set to reflect unordered. When Invalid Operation Exception is disabled (FPSCRVE=0) and Invalid Operation occurs, the following actions are taken: 1. One or two Invalid Operation Exceptions are set: FPSCRVXSNAN (if SNaN) (if - ) FPSCRVXISI FPSCRVXIDI (if ) FPSCRVXZDZ (if 0 0) FPSCRVXIMZ (if % 0) FPSCRVXVC (if invalid comp) (if invalid conversion) FPSCRVXCVI 2. If the operation is an arithmetic, quantum-adjustment, Round to DFP Long, Convert to DFP Extended, or format the target FPR is set to a Quiet NaN FPSCRFR FI are set to zero FPSCRFPRF is set to indicate the class of the result (Quiet NaN) 3. If the operation is a Convert To Fixed the target FPR is set as follows: FRT is set to the most positive 64-bit binary integer if the operand in FRB is a positive or
A Zero Divide Exception occurs when a Divide instruction is executed with a zero divisor value and a finite nonzero dividend value.
Action
The action to be taken depends on the setting of the Zero Divide Exception Enable bit of the FPSCR. When Zero Divide Exception is enabled (FPSCRZE=1) and Zero Divide occurs, the following actions are taken: 1. Zero Divide Exception is set FPSCRZX 1 2. The target FPR is unchanged 3. FPSCRFR FI are set to zero 4. FPSCRFPRF is unchanged When Zero Divide Exception is disabled (FPSCRZE=0) and Zero Divide occurs, the following actions are taken: 1. Zero Divide Exception is set FPSCRZX 1 2. The target FPR is set to , where the sign is determined by the XOR of the signs of the operands 3. FPSCRFR FI are set to zero 4. FPSCRFPRF is set to indicate the class and sign of the result ()
Action
Except for Reround, the following describes the handling of the IEEE overflow exception condition. The Reround operation does not recognize an overflow exception condition. The action to be taken depends on the setting of the Overflow Exception Enable bit of the FPSCR.
167
Action
The action to be taken depends on the setting of the Underflow Exception Enable bit of the FPSCR. When Underflow Exception is enabled (FPSCRUE=1) and underflow occurs, the following actions are taken: 1. Underflow Exception is set FPSCRUX 1 2. The infinitely precise result is multiplied by 10. That is, the exponent adjustment is added to the exponent. This is called the wrapped result. The exponent adjustment for all operations, except for Round To DFP Short and Round To DFP Long, is 576 for DFP Long and 9216 for DFP Extended. For Round To DFP Short and Round To DFP Long, the exponent adjustment is 192 for the source format of DFP Long and 3072 for the source format of DFP Extended. 3. The wrapped result is rounded to the target-format precision. This is called the wrapped rounded result. 4. If the wrapped rounded result has only one form, it is the delivered result. If the wrapped rounded result has redundant forms and is exact, the result of the form that has the exponent closest to the
168
Power ISA I
Action
The action to be taken does not depend on the setting of the Inexact Exception Enable bit of the FPSCR. When Inexact Exception occurs, the following actions are taken: 1. Inexact Exception is set FPSCRXX 1 2. The rounded or overflowed result is placed into the target FPR 3. FPSCRFPRF is set to indicate the class and sign of the result Programming Note In some implementations, enabling Inexact Exceptions may degrade performance more than does enabling other types of floating-point exception.
169
Range of v v < -Nmax, q < -Nmax v < -Nmax, q = -Nmax -Nmax v -Nmin -Nmin < v -Dmin -Dmin < v < -Dmin/2 v = -Dmin/2 -Dmin/2 < v < 0 v=0 0 < v < +Dmin/2 v = +Dmin/2 +Dmin/2 < v < +Dmin +Dmin v < +Nmin +Nmin v +Nmax +Nmax < v, q = +Nmax
Case Overflow Normal Normal Tiny Tiny Tiny Tiny EZD Tiny Tiny Tiny Tiny Normal Normal
Result (r) when Rounding Mode Is RNAZ RAFZ RTMI RFSP - -Nmax b b* -Dmin -Dmin -0 +0 +0 +Dmin +Dmin b* b +Nmax
1
+Nmax < v, q > +Nmax Overflow +1 +1 +1 +1 +Nmax +Nmax +1 +Nmax Explanation: This situation cannot occur. 1 The normal result r is considered to have been incremented. * The rounded value, in the extreme case, may be Nmin. In this case, the exception conditions are underflow, inexact, and incremented. b The value derived when the precise result v is rounded to the destinations precision, including both bounded precision and bounded exponent range. q The value derived when the precise result v is rounded to the destinations precision, but assuming an unbounded exponent range. r This is the returned value when neither overflow nor underflow is enabled. v Precise result before rounding, assuming unbounded precision and an unbounded exponent range. For data-format conversion operations, v is the source value. Dmin Smallest (in magnitude) representable subnormal number in the target format. EZD The result r of the exact-zero-difference case applies only to ADD and SUBTRACT with both source operands having opposite signs. (For ADD and SUBTRACT, when both source operands have the same sign, the sign of the zero result is the same sign as the sign of the source operands.) Nmax Largest (in magnitude) representable finite number in the target format. Nmin Smallest (in magnitude) representable normalized number in the target format. RAFZ Round away from 0. RFSP Round to Prepare for Shorter Precision. RNAZ Round to Nearest, Ties away from 0. RNE Round to Nearest, Ties to even. RNTZ Round to Nearest, Ties toward 0. RTPI Round toward +. RTMI Round toward -. RTZ Round toward 0.
170
Power ISA I
Case Overflow Overflow Overflow Overflow Overflow Overflow Overflow Normal Normal Normal Normal Normal Tiny Tiny Tiny Tiny Tiny Tiny
Is q Is q IncreIs r IncreIs r mented inexact mented inexact (|q|>|v|) (qv) (rv) OE=1 UE=1 XE=1 (|r|>|v|) Yes1 Yes1 Yes1 Yes1 Yes1 Yes1 Yes No Yes Yes Yes Yes No
1
Returned Results and Status Setting* T(r), OX 1, FI 1, FR 0, XX T(r), OX 1, FI 1, FR 1, XX T(r), OX 1, FI 1, FR 0, XX T(r), OX 1, FI 1, FR 1, XX 1 1 1, TX 1, TX 1,TO 1,TO
No Yes No No No No
Tiny Yes Yes No No1 Tw(q), UX 1, FI 0, FR 0, TU Tiny Yes Yes Yes No Tw(q), UX 1, FI 1, FR 0, XX 1,TU Tiny Yes Yes Yes Yes Tw(q), UX 1, FI 1, FR 1, XX 1,TU Explanation: The results do not depend on this condition. 1 This condition is true by virtue of the state of some condition to the left of this column. * Rounding sets only the FI and FR status flags. Setting of the OX, XX, or UX flag is part of the exception actions. They are listed here for reference. Wrap adjust, which depends on the type of operation and operand format. For all operations except Round to DFP Short and Round to DFP Long, the wrap adjust depends on the target format: = 10, where is 576 for DFP Long, and 9216 for DFP Extended. For Round to DFP Short and Round to DFP Long, the wrap adjust depends on the source q r v FI FR OX TO TU TX T(x) Tw(x) format: = 10 where is 192 for DFP Long and 3072 for DFP Extended. The value derived when the precise result v is rounded to destinations precision, but assuming an unbounded exponent range. The result as defined in Part 1 of this figure. Precise result before rounding, assuming unbounded precision and unbounded exponent range. Floating-Point-Fraction-Inexact status flag, FPSCRFI. This status flag is non-sticky. Floating-Point-Fraction-Rounded status flag, FPSCRFR. Floating-Point Overflow Exception status flag, FPSCRoX. The system floating-point enabled exception error handler is invoked for the overflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. The system floating-point enabled exception error handler is invoked for the underflow exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. The value x is placed at the target operand location. The wrapped rounded result x is placed at the target operand location. For all operations except data format conversions, the wrapped rounded result is in the same format and length as normal results at the target location. For data format conversions, the wrapped rounded result is in the same format and length as the source, but rounded to the target-format precision. Floating-Point-Underflow-Exception status flag, FPSCRUX Float-Point-Inexact-Exception Status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
UX XX
171
172
Power ISA I
X-form
(Rc=0) (Rc=1) FRB
16 21
X-form
(Rc=0) (Rc=1) FRB 514
21
Rc
31
Rc
31
16
daddq daddq. 63
0 6
(Rc=0) (Rc=1) 2
21
dsubq dsubq. 63
0
FRBp
16
Rc
31
FRAp
11
FRBp
16
Rc
31
The DFP operand in FRA[p] is added to the DFP operand in FRB[p]. The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p]. The ideal exponent is the smaller exponent of the two source operands. Figure 81 summarizes the actions for Add. Figure 81 does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI CR1
The DFP operand in FRB[p] is subtracted from the DFP operand in FRA[p]. The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p]. The ideal exponent is the smaller exponent of the two source operands. The execution of Subtract is identical to that of Add, except that the operand in FRB participates in the operation with its sign bit inverted. See Figure 81. The table does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI CR1
(if Rc=1)
(if Rc=1)
173
Operand a in FRA[p] is - F + QNaN SNaN Explanation: a+b +dINF - dINF dNaN F P(x) S(x)
Actions for Add (a + b) when operand b in FRB[p] is F + QNaN P(b) T(-dINF) VXISI: T(dNaN) S(a + b) T(+dINF) P(b) T(+dINF) T(+dINF) P(b) P(a) P(a) P(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a)
SNaN VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(a)
VXSNAN
The value a added to b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170) Default plus infinity. Default minus infinity. Default quiet NaN. All finite numbers, including zeros. The QNaN of operand x is propagated and placed in FRT[p]. The value x is placed in FRT[p] with the sign set by the rules of algebra. When the source operands have the same sign, the sign of the result is the same as the sign of the operands, including the case when the result is zero. When the operands have opposite signs, the sign of a zero result is positive in all rounding modes, except round toward -, in which case, the sign is minus. The value x is placed in FRT[p]. The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. The Invalid-Operation Exception (VXISI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.) The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.)
174
Power ISA I
X-form
(Rc=0) (Rc=1) FRB
16 21
(if Rc=1)
34
Rc
31
dmulq dmulq. 63
0 6
(Rc=0) (Rc=1) 34
21
FRBp
16
Rc
31
The DFP operand in FRA[p] is multiplied by the DFP operand in FRB[p]. The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p]. The ideal exponent is the sum of the two exponents of the source operands. Figure 82 summarizes the actions for Multiply. Figure 82 does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged.
Operand a in FRA[p] is
SNaN VXSNAN: U(b) 0 Fn VXSNAN: U(b) VXSNAN: U(b) QNaN VXSNAN: U(b) VXSNAN: U(a) SNaN Explanation: a*b The value a multiplied by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170) dINF Default infinity. dNaN Default quiet NaN. Fn Finite nonzero number (includes both normal and subnormal numbers). P(x) The QNaN of operand x is propagated and placed in FRT[p]. S(x) The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs. T(x) The value x is placed in FRT[p]. U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. The Invalid-Operation Exception (VXIMZ) occurs. The result is produced only when the exception is VXIMZ: disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.) The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception VXSNAN: is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.)
Actions for Multiply (a*b) when operand b in FRB[p] is Fn QNaN P(b) S(a * b) VXIMZ: T(dNaN) S(a * b) S(dINF) P(b) S(dINF) S(dINF) P(b) P(a) P(a) P(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a)
175
X-form
(Rc=0) (Rc=1) FRB
16 21
546
Rc
31
Figure 83 summarizes the actions for Divide. Figure 83 does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation and enabled zero-divide exceptions, in which cases the field remains unchanged. Special Registers Altered: FPRF FR FI FX OX UX ZX XX VXSNAN VXIDI VXZDZ CR1
ddivq ddivq. 63
0
FRAp
11
FRBp
16
Rc
31
(if Rc=1)
The DFP operand in FRA[p] is divided by the DFP operand in FRB[p]. The result is rounded to the target-format precision under control of the DRN (bits 29:31) of the FPSCR. An appropriate form of the rounded result is selected based on the ideal exponent and is placed in FRT[p]. The ideal exponent is the difference of subtracting the exponent of the divisor from the exponent of the dividend. Actions for Divide (a b) when operand b in FRB[p] is Fn QNaN S(a b) S(zt) P(b) S(a b) S(zt) P(b) S(dINF) VXIDI: T(dNaN) P(b) P(a) P(a) P(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a)
Operand a in FRA[p] is 0 Fn QNaN SNaN Explanation: a b dINF dNaN Fn P(x) S(x) T(x) U(x) VXIDI:
SNaN VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(a)
VXSNAN:
VXZDZ:
zt Zx
The value a divided by b, rounded to the target-format precision and returned in the appropriate form. (See Section 5.5.11 on page 170.) Default infinity. Default quiet NaN. Finite nonzero number (includes both normal and subnormal numbers). The QNaN of operand x is propagated and placed in FRT[p]. The value x is placed in FRT[p] with the sign set to the exclusive-OR of the source-operand signs. The value x is placed in FRT[p]. The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. The Invalid-Operation Exception (VXIDI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.) The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.) The Invalid-Operation Exception (VXZDZ) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 Invalid Operation Exception on page 166 for the exception actions.) True zero (zero significand and most negative exponent). The Zero-Divide Exception occurs. The result is produced only when the exception is disabled (See Section 5.5.10.2 Zero Divide Exception on page 167 for the exception actions.)
176
Power ISA I
177
X-form
BF,FRA,FRB BF //
6 9
FRA
11
FRB
16 21
642
/
31
dcmpuq 63
0
BF,FRAp,FRBp BF // FRAp
6 9 11
FRBp
16 21
642
/
31
The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCRFPCC. Special Registers Altered: CR field BF FPCC FX VXSNAN
Operand a in FRA[p] is - F + QNaN SNaN Explanation: C(a:b) F AeqB AgtB AltB AuoB VXSNAN
Actions for Compare Unordered (a:b) when operand b in FRB[p] is - F + QNaN SNaN AeqB AltB AltB AuoB Fu, VXSNAN AgtB C(a:b) AltB AuoB Fu, VXSNAN AgtB AgtB AeqB AuoB Fu, VXSNAN AuoB AuoB AuoB AuoB Fu, VXSNAN Fu, VXSNAN Fu, VXSNAN Fu, VXSNAN Fu, VXSNAN Fu, VXSNAN Algebraic comparison. See the table below. All finite numbers, including zeros. CR field BF and FPSCRFPCC are set to 0b0010. CR field BF and FPSCRFPCC are set to 0b0100. CR field BF and FPSCRFPCC are set to 0b1000. CR field BF and FPSCRFPCC are set to 0b0001. The invalid-operation exception (VXSNAN) occurs. See Section 5.5.10.1 for actions. Action for C(a:b) AeqB AltB AgtB
Relation of Value a to Value b a = b a < b a > b Figure 84. Actions: Compare Unordered
178
Power ISA I
X-form
BF,FRA,FRB BF //
6 9
FRA
11
FRB
16 21
130
/
31
dcmpoq 63
0
BF,FRAp,FRBp BF // FRAp
6 9 11
FRBp
16 21
130
/
31
The DFP operand in FRA[p] is compared to the DFP operand in FRB[p]. The result of the compare is placed into CR field BF and the FPSCRFPCC. Special Registers Altered: CR field BF FPCC FX VXSNAN VXVC
Operand a in FRA[p] is - F + QNaN SNaN Explanation: C(a:b) F AeqB AgtB AltB AuoB VXSV VXVC
Actions for Compare ordered (a:b) when operand b in FRB[p] is - F + QNaN SNaN AuoB, VXSV AeqB AltB AltB AuoB, VXVC AgtB C(a:b) AltB AuoB, VXVC AuoB, VXSV AgtB AgtB AeqB AuoB, VXVC AuoB, VXSV AuoB, VXVC AuoB, VXVC AuoB, VXVC AuoB, VXVC AuoB, VXSV AuoB, VXSV AuoB, VXSV AuoB, VXSV AuoB, VXSV AuoB, VXSV Algebraic comparison. See the table below All finite numbers, including zeros CR field BF and FPSCRFPCC are set to 0b0010. CR field BF and FPSCRFPCC are set to 0b0100. CR field BF and FPSCRFPCC are set to 0b1000. CR field BF and FPSCRFPCC are set to 0b0001. The invalid-operation exception (VXSNAN) occurs. Additionally, if the exception is disabled (FPSCRVE=0), then FPSCRVXVC is also set to one. See Section 5.5.10.1 for actions. The invalid-operation exception (VXVC) occurs. See Section 5.5.10.1 for actions. Action for C(a:b) AeqB AltB AgtB
Relation of Value a to Value b a = b a < b a > b Figure 85. Actions: Compare Ordered
179
Z22-form
Z22-form
BF,FRA,DCM BF //
6 9
FRA
11 16
DCM
22
194
/
31 0
59
FRA
11 16
DGM
22
226
/
31
dtstdcq 63
0
BF,FRAp,DCM BF // FRAp
6 9 11 16
dtstdgq DCM
22
BF,FRAp,DGM BF // FRAp
6 9 11 16
194
/
31 0
63
DGM
22
226
/
31
Let the DCM (Data Class Mask) field specify one or more of the 6 possible data classes, where each bit corresponds to a specific data class. DCM Bit 0 1 2 3 4 5 Data Class Zero Subnormal Normal Infinity Quiet NaN Signaling NaN
Let the DGM (Data Group Mask) field specify one or more of the 6 possible data groups, where each bit corresponds to a specific data group. The term extreme exponent means either the maximum exponent, Xmax, or the minimum exponent, Xmin. DGM Bit 0 1 2 3 4 5 Data Group Zero with non-extreme exponent Zero with extreme exponent Subnormal or (Normal with extreme exponent) Normal with non-extreme exponent and leftmost zero digit in significand Normal with non-extreme exponent and leftmost nonzero digit in significand Special symbol (Infinity, QNaN, or SNaN)
CR field BF and FPSCRFPCC are set to indicate the sign of the DFP operand in FRA[p] and whether the data class of the DFP operand in FRA[p] matches any of the data classes specified by DCM. Field 0000 0010 1000 1010 Meaning Operand positive with no match Operand positive with match Operand negative with no match Operand negative with match
CR field BF and FPSCRFPCC are set to indicate the sign of the DFP operand in FRA[p] and whether the data group of the DFP operand in FRA[p] matches any of the data groups specified by DGM. Field 0000 0010 1000 1010 Meaning Operand positive with no match Operand positive with match Operand negative with no match Operand negative with match
180
Power ISA I
X-form
BF,FRA,FRB BF //
6 9
FRA
11
FRB
16 21
162
/
31
dtstexq 63
0
BF,FRAp,FRBp BF // FRAp
6 9 11
FRBp
16 21
162
/
31
The exponent value (Ea) of the DFP operand in FRA[p] is compared to the exponent value (Eb) of the DFP operand in FRB [p]. The result of the compare is placed into CR field BF and the FPSCRFPCC. The codes in the CR field BF and FPSCRFPCC are defined for the DFP Test Exponent operations as follows. Bit 0 1 2 3 Description Ea < Eb Ea > Eb Ea = Eb Ea ? Eb
Special Registers Altered: CR field BF FPCC Operand a in FRA[p] is F QNaN SNaN Explanation: C(Ea:Eb) F AeqB AgtB AltB AuoB Actions for Test Exponent (Ea:Eb) when operand b in FRB[p] is F QNaN SNaN C(Ea:Eb) AuoB AuoB AuoB AuoB AeqB AuoB AuoB AuoB AuoB AeqB AeqB AuoB AuoB AeqB AeqB Algebraic comparison. See the table below. All finite numbers, including zeros CR field BF and FPSCRFPCC are set to 0b0010. CR field BF and FPSCRFPCC are set to 0b0100. CR field BF and FPSCRFPCC are set to 0b1000. CR field BF and FPSCRFPCC are set to 0b0001. Action for C(Ea:Eb) AeqB AltB AgtB
Relation of Value Ea to Value Eb Ea = Eb Ea < Eb Ea > Eb Figure 86. Actions: Test Exponent
181
X-form
Actions for Test Significance when the operand in FRB[p] is QNaN SNaN F C(k: NSDb) AuoB AuoB AuoB Explanation: C(k: NSDb) Algebraic comparison. See the table below. F All finite numbers, including zeros. AeqB CR field BF and FPSCRFPCC are set to 0b0010. AgtB CR field BF and FPSCRFPCC are set to 0b0100. AltB CR field BF and FPSCRFPCC are set to 0b1000. AuoB CR field BF and FPSCRFPCC are set to 0b0001. Relation of Value NSDb to Value k k g 0 and k = NSDb k g 0 and k < NSDb k g 0 and k > NSDb, or k = 0 Action for C(k:NSDb) AeqB AltB AgtB
BF,FRA,FRB BF //
6 9
FRA
11
FRB
16 21
674
/
31
dtstsfq 63
0
BF,FRA,FRBp BF //
6 9
FRA
11
FRBp
16 21
674
/
31
Let k be the contents of bits 58:63 of FRA that specifies the reference significance. The number of significant digits of the DFP operand in FRB[p], NSDb, is compared to the reference significance, k. For this instruction, the number of significant digits of the value 0 is considered to be zero. The result of the compare is placed into CR field BF and the FPSCRFPCC as follows. Bit 0 1 2 3 Description k g 0 and k < NSDb k g 0 and k > NSDb, or k = 0 k g 0 and k = NSDb k ? NSDb
Figure 87. Actions: Test Significance Special Registers Altered: CR field BF FPCC Programming Note The reference significance can be loaded into a FPR using a Load Float as Integer Word Algebraic instruction
182
Power ISA I
Programming Note DFP Quantize Immediate can be used to adjust values to a form having the specified exponent in the range -16 to 15. If the adjustment requires the significand to be shifted left, then: if the result would cause overflow from the most significant digit, the result is a default QNaN.; otherwise the result is the adjusted value (left shifted with matching exponent). If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field. DFP Quantize Immediate can round a value to a specific number of fractional digits. Consider the computation of sales tax. Values expressed in U.S. dollars have 2 fractional digits, and sales tax rates typically have 3 fractional digits. The product of value and rate will yield 5 fractional digits. For example: 39.95 * 0.075 = 2.99625 This result needs to be rounded to the penny to compute the correct tax of $3.00. The following sequence computes the sales tax assuming the pre-tax total is in FRA and the tax rate is in FRB. The DFP Quantize Immediate instruction rounds the product (FRA * FRB) to 2 fractional digits (TE field = -2) using Round to nearest, ties away from 0 (RMC field = 2). The quantized and rounded result is placed in FRT. dmul f0,FRA,FRB dquai -2,FRT,f0,2
(Rc=0) (Rc=1) 67
23
FRB RMC
16 21
Rc
31
dquaiq dquaiq. 63
0
(Rc=0) (Rc=1) 67
23
TE
11
FRBp RMC
16 21
Rc
31
The DFP operand in FRB[p] is converted and rounded to the form with the exponent specified by TE based on the rounding mode specified in the RMC field. TE is a 5-bit signed binary integer. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified by TE. When the value of the operand in FRB[p] is greater than (10p-1) % 10TE, where p is the format precision, an invalid operation exception is recognized. When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p]. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged. Special Registers Altered: FPRF FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
183
Z23-form
(Rc=0) (Rc=1) 3
23
underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p]. Figure 89 and Figure 90 summarize the actions. The tables do not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation exception, in which case the field remains unchanged. Special Register Altered: FPRF FR FI FX XX VXSNAN VXCVI CR1 Programming Note DFP Quantize can be used to adjust one DFP value (FRB[p]) to a form having the same exponent as a second DFP value (FRA[p]). If the adjustment requires the significand to be shifted left, then: if the result would cause overflow from the most significant digit, the result is a default QNaN.; otherwise the result is the adjusted value (left shifted with matching exponent). If the adjustment requires the significand to be shifted right, the result is rounded based on the value of the RMC field. Figure 88 shows examples of these adjustments.
FRB RMC
16 21
Rc
31
dquaq dquaq. 63
0
(Rc=0) (Rc=1) 3 Rc
31
(if Rc=1)
The DFP operand in register FRB[p] is converted and rounded to the form with the same exponent as that of the DFP operand in FRA[p] based on the rounding mode specified in the RMC field. The result of that form is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the exponent specified in FRA[p]. When the value of the operand in FRB[p] is greater than (10p-1) % 10Ea, where p is the format precision and Ea is the exponent of the operand in FRA[p], an invalid operation exception is recognized. When the delivered result differs in value from the operand in FRB[p], an inexact exception is recognized. No
FRA 1 (1 x 100) 1.00 (100 x 10-2) 1 (1 x 100) 1.00 (100 x 10-2) 1 (1 x 100) 1.00 (100 x 10-2) 0.01 (1 x 10-2) 1 (1 x 100) 1.0 (10 x 10-1)
FRB 9. (9 x 100) 9. (9 x 100) 49.1234 (491234 x 10-4) 49.1234 (491234 x 10-4) 49.9876 (499876 x 10-4) 49.9876 (499876 x 10-4) 49.9876 (499876 x 10-4) 9999999999999999 (9999999999999999 x 100) 9999999999999999 (9999999999999999 x 100)
FRT when RMC=1 9 (9 x 100) 9.00 (900 x 10-2) 49 (49 x 100) 49.12 (4912 x 10-2) 49 (49 x 100) 49.98 (4998 x 10-2) 49.98 (4998 x 10-2) 9999999999999999 (9999999999999999 x 100) QNaN
FRT when RMC=2 9 (9 x 100) 9.00 (900 x 10-2) 49 (49 x 100) 49.12 (4912 x 10-2) 50 (50 x 100) 49.99 (4999 x 10-2) 49.99 (4999 x 10-2) 9999999999999999 (9999999999999999 x 100) QNaN
184
Power ISA I
Operand a in FRA[p] is 0 Fn QNaN SNaN Explanation: * dINF dNaN Fn P(x) T(x) U(x) VXCVI VXSNAN
Actions for Quantize when operand b in FRB[p] is Fn QNaN * VXCVI: T(dNaN) P(b) * VXCVI: T(dNaN) P(b) VXCVI: T(dNaN) T(dINF) P(b) P(a) P(a) P(a) VXSNAN: U(a) VXSNAN: U(a) VXSNAN: U(a)
SNaN VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(b) VXSNAN: U(a)
See next table. Default infinity Default quiet NaN Finite nonzero numbers (includes both subnormal and normal numbers) The QNaN of operand x is propagated and placed in FRT[p] The value x is placed in FRT[p] The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions) The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions)
Figure 89. Actions (part 1) Quantize Actions for Quantize when operand b in FRB[p] is 0 Fn E(0) VXCVI: T(dNaN) Vb > (10p - 1) % 10Te E(0) L(b) Vb [ (10p - 1) % 10Te E(0) W(b) E(0) QR(b)
Te < Se
Te = Se Te > Se Explanation: dNaN Default quiet NaN E(0) The value of zero with the exponent value Te is placed in FRT[p]. L(x) The operand x is converted to the form with the exponent value Te. p The precision of the format. QR(x) The operand x is rounded to the result of the form with the exponent value Te based on the specified rounding mode. The result of that form is placed in FRT[p]. Se The exponent of the operand in FRB[p]. Te The target exponent; FRA[p] for dqua[q], or TE, a 5-bit signed binary integer for dquai[q]. T(x) The value x is placed in FRT[p]. The value of the operand in FRB[p]. Vb W(x) The value and the form of operand x is placed in FRT[p]. The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is VXCVI: disabled. (See Section 5.5.10.1 for actions.) Figure 90. Actions (part2) Quantize
185
Z23-form
(Rc=0) (Rc=1 35
23
invalid-operation exception, in which case the field remains unchanged. Special Registers Altered: FPRF FR FI FX XX VXSNAN VXCVI CR1 Programming Note DFP Reround can be used to adjust a DFP value (FRB[p]) to have no more than a specified number (FRA[p]58:63) of significant digits. The result (FRT[p]) is right-justified leaving the specified number of digits and rounded as specified by the RMC field. If rounding increases the number of significant digits, the result is adjusted again (the significand is shifted right 1 digit and the exponent is incremented by 1). Figure 91 has example results from DFP Reround for 1, 2, and 10 significant digits. Programming Note DFP Reround is primarily used to round a DFP value to a specific number of digits before conversion to string format for printing or display. Another use for DFP Reround is to obtain the effective exponent of the most significant digit by specifying a reference significance of 1. The exponent can be extracted and used to compute the number of significant digits or to left-justify a value. For example, the following sequence computes the number of significant digits and returns it as an integer. FRB is the DFP value for which we want the number of significant digits; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for doublewords at offsets -8 and -16. These doublewords are used to transfer the biased exponents from the FPRs to GPRs for integer computation. R3 contains the result of E(reround(1,FRA) ) - E(FRA) + 1, where E(x) represents the biased exponent of x. dxex stfd drrnd dxex stfd lfd lfd subf addi f0,FRB f0,-16(r1) f1,f13,FRB,1 # reround 1 digit toward 0 f1,f1 f1,-8(r1) r11,-16(r1) r3,-8(r1) r3,r11,r3 r3,r3,1
FRB RMC
16 21
Rc
31
(if Rc=1)
drrndq drrndq. 63
0
(Rc=0) (Rc=1) 35 Rc
31
Let k be the contents of bits 58:63 of FRA that specifies the reference significance. When the DFP operand in FRB[p] is a finite number, and if the reference significance is zero, or if the reference significance is nonzero and the number of significant digits of the source operand is less than or equal to the reference significance, then the value and the form of the source operand is placed in FRT[p]. If the reference significance is nonzero and the number of significant digits of the source operand is greater than the reference significance, then the source operand is converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field. The result of the form with the specified number of significant digits is placed in FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. For this instruction, the number of significant digits of the value 0 is considered to be zero. The ideal exponent is the greater value of the exponent of the operand in FRB[p] and the referenced exponent. The referenced exponent is the resultant exponent if the operand in FRB[p] would have been converted and rounded to the number of significant digits specified in the reference significance based on the rounding mode specified in the RMC field. If the exponent of the rounded result of the form that has the specified number of significant digits would be greater than Xmax, an invalid operation exception (VXCVI) occurs. When the invalid-operation exception occurs, and if the exception is disabled, a default QNaN is returned. When an invalid-operation exception occurs, no inexact exception is recognized. In the absence of an invalid-operation exception, if the result differs in value from the operand in FRB[p], an inexact exception is recognized. This operation causes neither an overflow nor an underflow exception. Figure 92 summarizes the actions for Reround. The table does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled
Given the value 412.34 the result is E(4 x 102) E(41234 x 10-2) + 1 = (398+2) - (398-2) + 1 = 400 396 + 1 = 5. Additional code is required to detect and handle special values like Subnormal, Infinity, and NAN.
186
Power ISA I
FRA58:63 (binary) 1 1 1 1 2 2 2 10 10 10
FRB 0.41234 (41234 % 10-5) 4.1234 (41234 % 10-4) 41.234 (41234 % 10-3) 412.34 (41234 % 10-2) 0.491234 (491234 % 10-6) 0.499876 (499876 % 10-6) 0.999876 (999876 % 10-6) 0.491234 (491234 % 10-6) 999.999 (999999 % 10-3) 9999999999999999 (9999999999999999 % 100)
FRT when RMC=1 0.4 (4 % 10-1) 4 (4 % 100) 4 (4 % 101) 4 (4 % 102) 0.49 (49 % 10-2) 0.49 (49 % 10-2) 0.99 (99 % 10-2) 0.491234 (491234 % 10-6) 999.999 (999999 % 10-3) 9.999999999E+14 (9999999999 % 105)
FRT when RMC=2 0.4 (4 % 10-1) 4 (4 % 100) 4 (4 % 101) 4 (4 % 102) 0.49 (49 % 10-2) 0.50 (50 % 10-2) 1.0 (10 % 10-1) 0.491234 (491234 % 10-6) 999.999 (999999 % 10-3) 1.000000000E+15 (1000000000 % 106)
Figure 91. DFP Reround examples Programming Note DFP Reround combined with DFP Quantize can be used to left justify a value (as needed by the frexp function). FRB is the DFP value for which we want to left justify; f13 contains the reference significance value 0x0000000000000001; and r1 is the stack pointer, with free space for a doubleword at offset -8. This doubleword is used to transfer the biased exponents from the FPR to a GPR, for integer computation. The adjusted biased exponent (+ format precision - 1) is transferred back into an FPR so it can be inserted into the rerounded value. The adjusted rerounded value becomes the quantize reference value. The quantize instruction returns the left justified result in FRT. drrnd dxex stfd lfd addi lfd stfd diex dqua f1,f13,FRB,1 # reround 1 digit toward 0 f0,f1 f0,-8(r1) r11,-8(r1) r11,r11,15 # biased exp + precision - 1 r11,-8(r1) f0,-8(r1) f1,f0,f1 # adjust exponent FRT,f1,f0,1 # quantize to adjusted exponent
187
0* W(b)
Actions for Reround when operand b in FRB[p] is Fn QNaN RR(b) or T(dINF) P(b) VXCVI: T(dNaN) W(b) T(dINF) P(b) W(b) T(dINF) P(b)
The number of significant digits of the value 0 is considered to be zero for this instruction. Not applicable. Default infinity. Finite nonzero numbers (includes both subnormal and normal numbers). Reference significance, which specifies the number of significant digits in the target operand. Number of significant digits in the operand in FRB[p]. The QNaN of operand x is propagated and placed in FRT[p]. The value x is rounded to the form that has the specified number of significant digits. If RR(x) [ (10k-1) % 10Xmax, then RR(x) is returned; otherwise an invalid-operation exception is recognized. The value x is placed in FRT[p]. The SNaN of operand x is converted to the corresponding QNaN and placed in FRT[p]. The Invalid-Operation Exception (VXCVI) occurs. The result is produced only when the exception is disabled. (See Section 5.5.10.1 for actions.) The Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. See Section 5.5.10.1 for actions. The value and the form of x is placed in FRT[p].
188
Power ISA I
Programming Note The DFP Round To FP Integer With Inexact and DFP Round To FP Integer With Inexact Quad instructions can be used to implement the decimal equivalent of the C99 rint function by specifying the primary RMC encoding for round according to FPSCRDRN (R=0, RMC=11). The specification for rint requires the inexact exception be raised if detected.
(Rc=0) (Rc=1) 99 Rc
31
drintxq drintxq. 63
0
(Rc=0) (Rc=1) 99 Rc
31
The DFP operand in FRB[p] is rounded to a floating-point integer and placed into FRT[p]. The sign of the result is the same as the sign of the operand in FRB[p]. The ideal exponent is the larger value of zero and the exponent of the operand in FRB[p]. The rounding mode used is specified in the RMC field. When the RMC-encoding-selection (R) bit is zero, the RMC field contains the primary encoding; when the bit is one, the field contains the secondary encoding. In addition to coercion of the converted value to fit the target format, the special rounding used by Round To FP Integer also coerces the target exponent to the ideal exponent. When the operand in FRB[p] is a finite number and the exponent is less than zero, the operand is rounded to the result with an exponent of zero. When the exponent is greater than or equal to zero, the result is set to the numerical value and the form of the operand in FRB[p]. When the result differs in value from the operand in FRB[p], an inexact exception is recognized. No underflow exception is recognized by this operation, regardless of the value of the operand in FRB[p]. Figure 93 summarizes the actions for Round To FP Integer With Inexact. The table does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation, in which case the field remains unchanged. Special Registers Altered: FPRF FR FI FX XX VXSNAN CR1
(if Rc=1)
189
Operand b in FRB is
Actions* - No1 T(-dINF), FI 0, FR 0 F No W(n), FI 0, FR 0 F Yes W(n), FI 1, FR 0, XX 1 F Yes W(n), FI 1, FR 1, XX 1 F Yes W(n), FI 1, FR 0, XX 1, TX F Yes W(n), FI 1, FR 1, XX 1, TX T(+dINF), FI 0, FR 0 + No1 QNaN No1 P(b), FI 0, FR 0 U(b), FI 0, FR 0, VXSNAN 1 SNaN No1 1 VXSNAN 1, TV SNaN No Explanation: * Setting of XX and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR is part of the exception actions.(See the sections, Inexact Exception and Invalid Operation Exception for more details.) The actions do not depend on this condition. 1 This condition is true by virtue of the state of some condition to the left of this column. dINF Default infinity. F All finite numbers, including zeros. FI Floating-Point-Fraction-Inexact status flag, FPSCRFI. FR Floating-Point-Fraction-Rounded status flag, FPSCRFR. n The value derived when the source operand, b, is rounded to an integer using the special rounding for Round To FP Integer. The QNaN of operand x is propagated and placed in FRT[p]. P(x) T(x) The value x is placed in FRT[p]. TV The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. TX The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FPT[p]. W(x) The value x in the form of zero exponent or the source exponent is placed in FRT[p]. XX Floating-Point-Inexact-Exception status flag, FPSCRXX.
Is n not precise (n b)
190
Power ISA I
Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1 Programming Note
(if Rc=1)
R FRB RMC
15 16 21
Rc
31
drintnq drintnq. 63
0
The DFP Round To FP Integer Without Inexact and DFP Round To FP Integer Without Inexact Quad instructions can be used to implement decimal equivalents of several C99 rounding functions by specifying the appropriate R and RMC field values. FunctionR Ceil 1 Floor 1 Nearbyint0 Round 0 Trunc 0 RMC 0b00 0b01 0b11 0b10 0b01
///
11
R FRBp RMC
15 16 21
Rc
31
This operation is the same as the Round To FP Integer With Inexact operation, except that this operation does not recognize an inexact exception. Figure 94 summarizes the actions for Round To FP Integer Without Inexact. The table does not include the setting of the FPSCRFPRF field. The FPSCRFPRF field is always set to the class and sign of the result, except for an enabled invalid-operation, in which case the field remains unchanged.
Note that nearbyint is similar to the rint function but without raising the inexact exception. Similarly ceil, floor, round, and trunc do not require the inexact exception.
Operand b in Inv.-Op. Exception Actions* FRB is Enabled - T(-dINF), FI 0, FR 0 F W(n), FI 0, FR 0 + T(+dINF), FI 0, FR 0 QNaN P(b), FI 0, FR 0 SNaN No U(b), FI 0, FR 0, VXSNAN 1 SNaN Yes VXSNAN 1, TV Explanation: * Setting of VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the sections, Invalid Operation Exception for more details.) The actions do not depend on this condition. dINF Default infinity. F All finite numbers, including zeros. FI Floating-Point-Fraction-Inexact status flag, FPSCRFI. FR Floating-Point-Fraction-Rounded status flag, FPSCRFR. n The value derived when the source operand, b, is rounded to an integer using the special rounding for Round-To-FP-Integer. P(x) The QNaN of operand x is propagated and placed in FRT[p]. T(x) The value x is placed in FRT[p]. TV The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. U(x) The SNaN of operand x is converted to the corresponding QNaN and placed in FPT[p]. W(x) The value x in the form of zero exponent or the source exponent is placed in FRT[p]. Figure 94. Actions: Round to FP Integer Without Inexact
191
Programming Note DFP does not provide operations on short operands, so they must be converted to long format, and then converted back to be stored. Preserving correct signaling NaN semantics requires that signaling NaNs be propagated from the source to the result without recognizing an exception during widening from short to long or narrowing from long to short. Because DFP does not provide equivalents to the FP Load Floating-Point Single and Store Floating-Point Single functions, the widening is performed by loading the DFP short value with a Load Floating as Integer Word Indexed followed by a DFP Convert to DFP Long, and narrowing is performed by a DFP Round to DFP Short followed by a Store Floating-Point as Integer Word Indexed. If the SNaN or infinity in DFP short format uses the preferred DPD encoding, then converting this operand to DFP long format and back to DFP short will result in the original bit pattern.
SNaN Convert To DFP Long P(b)3,4 Convert To DFP Extended VXSNAN: U(b)2,4 Round To DFP Short P(b)3,5 Round To DFP Long VXSNAN: U(b)2,5 Explanation: 1The ideal exponent is the exponent of the source operand. 2Bits 5:N-1 of the N-bit combination field are set to zero. 3Bit 5 of the N-bit combination field is set to one. Bits 6:N-1 of the combination field are set to zero. 4The trailing significand field is padded on the left with zeros. 5Leftmost digits in the trailing significand field are removed. dINFDefault infinity. FAll finite numbers, including zeros. P(x)The special symbol in operand x is propagated into FRT[p]. R(x)The value x is rounded to the target-format precision; see Section 5.5.11 T(x)The value x is placed in FRT[p]. U(x)The SNaN of operand x is converted to the corresponding QNaN. VXSNANThe Invalid-Operation Exception (VXSNAN) occurs. The result is produced only when the exception is disabled. See Section 5.5.10.1 for actions. Figure 95. Actions: Data-Format Conversion Instructions
Instruction
Actions when operand b in FRB[p] is QNaN P(b)2,4 P(b)2,4 T(dINF) P(b)2,4 2,5 P(b) P(b)2,5 T(dINF) P(b)2,5
192
Power ISA I
X-form
(Rc=0) (Rc=1)
///
FRB
16 21
258
Rc
31
///
258
Rc
31
The DFP short operand in bits 32-63 of FRB is converted to DFP long format and the converted result is placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand. If the operand in FRB is an SNaN, it is converted to an SNaN in DFP long format and does not cause an invalid-operation exception. Special Registers Altered: FPRF FR (undefined) CR1 FI (undefined) (if Rc=1)
The DFP long operand in the FRB is converted to DFP extended format and placed into FRTp. The sign of the result is the same as the sign of the operand in FRB. The ideal exponent is the exponent of the operand in FRB. If the operand in FRB is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP extended format. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXSNAN CR1
Programming Note Note that DFP short format is a storage-only format, Therefore, conversion of a short SNaN to long format will not cause an exception and the SNaN is preserved. Subsequent operation on that SNaN in long format will cause an exception.
(if Rc=1)
193
X-form
(Rc=0) (Rc=1)
X-form
(Rc=0) (Rc=1)
///
FRB
16 21
770
Rc
31
///
FRBp
16 21
770
Rc
31
The DFP long operand in FRB is converted and rounded to DFP short format. The DFP short value is extended on the left with zeros to form a 64-bit entity and placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the source operand. If the operand in FRB is an SNaN, it is converted to an SNaN in DFP short format and does not cause an invalid-operation exception. Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and if the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision. Special Registers Altered: FPRF FR FI FX OX UX XX CR1
The DFP extended operand in FRBp is converted and rounded to DFP long format. The result concatenated with 64 0s is placed in FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is the exponent of the operand in FRBp. If the operand in FRBp is an SNaN, an invalid-operation exception is recognized. If the exception is disabled, the SNaN is converted to the corresponding QNaN in DFP long format. Normally, the result is in the format and length of the target. However, when an overflow or underflow exception occurs and if the exception is enabled, the operation is completed by producing a wrapped rounded result in the same format and length as the source but rounded to the target-format precision. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN CR1
(if Rc=1)
(if Rc=1)
Programming Note Note that DFP short format is a storage-only format, Therefore, conversion of a long SNaN to short format will not cause an exception. Converting a long format SNaN to short format is an implied move operation.
Programming Note Note that DFP Round to DFP Long, while producing a result in DFP long format, actually targets a register pair, writing 64 0s in FRTp+1.
194
Power ISA I
X-form
(Rc=0) (Rc=1)
X-form
(Rc=0) (Rc=1)
///
FRB
16 21
802
Rc
31
///
FRB
16 21
290
Rc
31
The 64-bit signed binary integer in FRB is converted and rounded to a DFP Long value and placed into FRT. The sign of the result is the same as the sign of the source operand. The ideal exponent is zero. If the source operand is a zero, then a plus zero with a zero exponent is returned. The FPSCRFPRF field is set to the class and sign of the result. Special Registers Altered: FPRF FR FI FX XX CR1
dctfixq dctfixq. 63
0 6
///
290
Rc
31
The DFP operand in FRB[p] is rounded to an integer value and is placed into FRT in the 64-bit signed binary integer format. The sign of the result is the same as the sign of the source operand, except when the source operand is a NaN or a zero. Figure 96 summarizes the actions for Convert To Fixed. Special Registers Altered: FPRF (undefined) FR FI FX XX VXSNAN VXCVI CR1
(if Rc=1)
X-form
(Rc=0) (Rc=1)
(if Rc=1)
///
FRB
16 21
802
Rc
31
The 64-bit signed binary integer in FRB is converted and rounded to a DFP Extended value and placed into FRTp. The sign of the result is the same as the sign of the source operand. The ideal exponent is zero. If the source operand is a zero, then a plus zero with a zero exponent is returned. The FPSCRFPRF field is set to the class and sign of the result. Special Registers Altered: FPRF FR (undefined) CR1 FI (undefined) (if Rc=1)
Programming Note It is recommended that software pre-round the operand to a floating-point integral using drintx[q] or drintn[q] is a rounding mode other than the current rounding mode specified by FPSCRDRN is needed. Saving, modifying and restoring the FPSCR just to temporarily change the rounding mode is less efficient than just employing drintx[p] or drint[p] which override the current rounding mode using an immediate control field. For example if the desired function rounding is Round to Nearest, Ties away from 0 but the default rounding (from FPSCRDRN) is Round to Nearest, Ties to Even then following is preferred. drintn dctfix 0,f1,f1,2 f1,f1
195
Operand b in FRB[p] is
q is
- b < MN < MN T(MN), FI 0, FR 0, VXCVI 1 - b < MN < MN VXCVI 1, TV - < b < MN = MN T(MN), FI 1, FR 0, XX 1 - < b < MN = MN T(MN), FI 1, FR 0, XX 1,TX MN b < 0 T(n), FI 0, FR 0 MN b < 0 T(n), FI 1, FR 0, XX 1 MN b < 0 T(n), FI 1, FR 1, XX 1 MN b < 0 T(n), FI 1, FR 0, XX 1, TX MN b < 0 T(n), FI 1, FR 1, XX 1, TX 0 T(0), FI 0, FR 0 0 < b MP T(n), FI 0, FR 0 0 < b MP T(n), FI 1, FR 0, XX 1 0 < b MP T(n), FI 1, FR 1, XX 1 0 < b MP T(n), FI 1, FR 0, XX 1, TX 0 < b MP T(n), FI 1, FR 1, XX 1, TX MP < b < + = MP T(MP), FI 1, FR 0, XX 1 MP < b < + = MP T(MP), FI 1, FR 0, XX 1, TX MP < b + > MP T(MP), FI 0, FR 0, VXCVI 1 MP < b + > MP VXCVI 1, TV QNaN T(MN), FI 0, FR 0, VXCVI 1 QNaN VXCVI 1, TV SNaN T(MN),FI 0, FR 0, VXCVI 1,VXSNAN 1 SNaN VXCVI 1,VXSNAN 1, TV Explanation: * Setting of XX, VXCVI, and VXSNAN is part of the corresponding exception actions. Also, when an invalid-operation exception occurs, setting of FI and FR bits is part of the exception actions. (See the sections, Inexact Exception and Invalid Operation Exception for more details.) The actions do not depend on this condition. FI Floating-Point-Fraction-Inexact status flag, FPSCRFI. FR Floating-Point-Fraction-Rounded status flag, FPSCRFR. MN Maximum negative number representable by the 64-bit binary integer format MP Maximum positive number representable by the 64-bit binary integer format. n The value q converted to a fixed-point result. q The value derived when the source value b is rounded to an integer using the specified rounding mode T(x) The value x is placed in FRT[p]. TV The system floating-point enabled exception error handler is invoked for the invalid-operation exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. TX The system floating-point enabled exception error handler is invoked for the inexact exception if the FE0 and FE1 bits in the machine-state register are set to any mode other than the ignore-exception mode. VXCVI The FPSCRVXCVI invalid operation exception status bit. VXSNAN The FPSCRVXSNAN invalid operation exception status bit. XX Floating-Point-Inexact-Exception status flag, FPSCRXX. Figure 96. Actions: Convert To Fixed
Is n not precise (n b) No Yes Yes Yes Yes No No Yes Yes Yes Yes -
Actions *
196
Power ISA I
322
Rc
31
///
834
Rc
31
ddedpdq ddedpdq. 63
0 6
denbcdq denbcdq. 63
0 6
FRBp
16
Rc
31
///
834
Rc
31
A portion of the significand of the DFP operand in FRB[p] is converted to a signed or unsigned BCD number depending on the SP field. For infinity and NaN, the significand is considered to be the contents in the trailing significand field padded on the left by a zero digit. SP0 = 0 (unsigned conversion) The rightmost 16 digits of the significand (32 digits for ddedpdq) is converted to an unsigned BCD number and the result is placed into FRT[p]. SP0 = 1 (signed conversion) The rightmost 15 digits of the significand (31 digits for ddedpdq) is converted to a signed BCD number with the same sign as the DFP operand, and the result is placed into FRT[p]. If the DFP operand is negative, the sign is encoded as 0b1101. If the DFP operand is positive, SP1 indicates which preferred plus sign encoding is used. If SP1 = 0, the plus sign is encoded as 0b1100 (the option-1 preferred sign code), otherwise the plus sign is encoded as 0b1111(the option-2 preferred sign code). Special Registers Altered: CR1 (if Rc=1)
The signed or unsigned BCD operand, depending on the S field, in FRB[p] is converted to a DFP number. The ideal exponent is zero. S = 0 (unsigned BCD operand) The unsigned BCD operand in FRB[p] is converted to a positive DFP number of the same magnitude and the result is placed into FRT[p]. S = 1 (signed BCD operand) The signed BCD operand in FRB[p] is converted to the corresponding DFP number and the result is placed into FRT[p]. If an invalid BCD digit or sign code is detected in the source operand, an invalid-operation exception (VXCVI) occurs. FPSCRFPRF is set to the class and sign of the result, except for Invalid Operation Exception when FPSCRVE=1. Special Registers Altered: FPRF FR (set to 0) FI (set to 0) FX VXCVI CR1
(if Rc=1)
197
///
354
Rc
31
Rc
31
16
dxexq dxexq. 63
0 6
diexq diexq. 63
0 6
///
354
Rc
31
FRBp
16
Rc
31
The biased exponent of the operand in FRB[p] is extracted and placed into FRT in the 64-bit signed binary integer format. When the operand in FRB is an infinity, QNaN, or SNaN, a special code is returned. Operand Finite Number Infinity QNaN SNaN Result biased exponent value -1 -2 -3 (if Rc=1)
Let a be the value of the 64-bit signed binary integer in FRA. a Result QNaN a > MBE1 MBE m a m 0 Finite number with biased exponent a a = -1 Infinity a = -2 QNaN a = -3 SNaN a < -3 QNaN 1 Maximum biased exponent for the target format When 0 [ a [ MBE, a is the biased target exponent that is combined with the sign bit and the significand value of the DFP operand in FRB[p] to form the DFP result in FRT[p]. The ideal exponent is the specified target exponent. When a specifies a special code (a < 0 or a > MBE), an infinity, QNaN, or SNaN is formed in FRT[p] with the trailing significand field containing the value from the trailing significand field of the source operand in FRB[p], and with an N-bit combination field set as follows. For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero. For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-5 bits are set to zero. For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-5 bits are set to zero. Special Registers Altered: CR1 Programming Note The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended. (if Rc=1)
The exponent bias value is 101 for DFP Short, 398 for DFP Long, and 6176 for DFP Extended.
198
Power ISA I
Actions for Insert Biased Exponent when operand b in FRB[p] specifies QNaN SNaN F N, Rb Z, Rb Z, Rb Z, Rb I, Rb I, Rb I, Rb I, Rb Q, Rb S, Rb Q, Rb S, Rb Q, Rb S, Rb Q, Rb S, Rb
All finite numbers, including zeros The combination field in FRT[p] is set to indicate a default Infinity. The combination field in FRT[p] is set to the specified biased exponent in FRA and the leftmost significand digit in FRB[p]. The combination field in FRT[p] is set to indicate a default QNaN. The combination field in FRT[p] is set to indicate a default SNaN. The combination field in FRT[p] is set to indicate the specific biased exponent in FRA and a leftmost coefficient digit of zero. The contents of the trailing significand field in FRB[p] are reencoded using preferred DPD encodings and the reencoded result is placed in the same field in FRT[p]. The sign bit of FRB[p] is copied into the sign bit in FRT[p].
199
(Rc=0) (Rc=1) SH
22
(Rc=0) (Rc=1) SH
22
66
Rc
31
98
Rc
31
dscliq dscliq. 63
0 6
(Rc=0) (Rc=1) SH
22
dscriq dscriq. 63
0
(Rc=0) (Rc=1) SH 98
22
66
Rc
31
FRAp
11 16
Rc
31
The significand of the DFP operand in FRA[p] is shifted left SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the leftmost digit are lost. Zeros are supplied to the vacated positions on the right. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p]. If the source operand in FRA[p] is a finite number, the exponent of the result is the same as the exponent of the source operand. For an Infinity, QNaN or SNaN result, the target formats N-bit combination field is set as follows. For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero. For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-6 bits are set to zero. For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-6 bits are set to zero. Special Registers Altered: CR1 (if Rc=1)
The significand of the DFP operand in FRA[p] is shifted right SH digits. For a NaN or infinity, all significand digits are in the trailing significand field. SH is a 6-bit unsigned binary integer. Digits shifted out of the units digit are lost. Zeros are supplied to the vacated positions on the left. The result is placed into FRT[p]. The sign of the result is the same as the sign of the source operand in FRA[p]. If the source operand in FRA[p] is a finite number, the exponent of the result is the same as the exponent of the source operand. For an Infinity, QNaN or SNaN result, the target formats N-bit combination field is set as follows. For an Infinity result, the leftmost 5 bits are set to 0b11110, and the rightmost N-5 bits are set to zero. For a QNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to zero, and the rightmost N-6 bits are set to zero. For an SNaN result, the leftmost 5 bits are set to 0b11111, bit 5 is set to one, and the rightmost N-6 bits are set to zero. Special Registers Altered: CR1 (if Rc=1)
200
Power ISA I
Full Name DFP Add DFP Add Quad DFP Subtract DFP Subtract Quad DFP Multiply DFP Multiply Quad DFP Divide DFP Divide Quad DFP Compare Ordered DFP Compare Ordered Quad DFP Compare Unordered DFP Compare Unordered Quad DFP Test Data Class DFP Test Data Class Quad DFP Test Data Group DFP Test Data Group Quad DFP Test Exponent DFP Test Exponent Quad DFP Test Significance DFP Test Significance Quad DFP Quantize Immediate DFP Quantize Immediate Quad DFP Quantize DFP Quantize Quad DFP Reround DFP Reround Quad DFP Round To FP Integer With Inexact DFP Round To FP Integer With Inexact Quad DFP Round To FP Integer Without Inexact DFP Round To FP Integer Without Inexact Quad DFP Convert To DFP Long DFP Convert To DFP Extended DFP Round To DFP Short DFP Round To DFP Long DFP Convert From Fixed Quad DFP Convert To Fixed DFP Convert To Fixed Quad DFP Decode DPD To BCD
Operands
SNaN Vs G Y Y Y Y Y Y Y Y Y Y Y Y N N N N N N N N Y Y Y Y Y Y Y Y Y Y N Y N Y Y Y N N N N N N N N N N N N N N N N N N N Y N Y N N N N -
IE Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y -
dadd daddq dsub dsubq dmul dmulq ddiv ddivq dcmpo dcmpoq dcmpu dcmpuq dtstdc dtstdcq dtstdg dtstdgq dtstex dtstexq dtstsf dtstsfq dquai dquaiq dqua dquaq drrnd drrndq drintx drintxq drintn drintnq dctdp dctqpq drsp drdpq dcffixq dctfix dctfixq ddedpd
X FRT, FRA, FRB X FRTp, FRAp, FRBp X FRT, FRA, FRB X FRTp, FRAp, FRBp X FRT, FRA, FRB X FRTp, FRAp, FRBp X FRT, FRA, FRB X FRTp, FRAp, FRBp X BF, FRA, FRB X BF, FRAp, FRBp X BF, FRA, FRB X BF, FRAp, FRBp Z22 BF, FRA, DCM Z22 BF, FRAp, DCM Z22 BF, FRA,DGM Z22 BF, FRAp, DGM X BF, FRA, FRB X BF, FRAp, FRBp X BF, FRA(FIX), FRB X BF, FRA(FIX), FRBp Z23 TE, FRT, FRB, RMC Z23 TE, FRTp, FRBp, RMC Z23 FRT,FRA,FRB,RMC Z23 FRTp,FRAp,FRBp, RMC Z23 FRT,FRA(FIX),FRB,RMC Z23 FRTp, FRA(FIX), FRBp, RMC
RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE RE -
C Y Y Y Y Y Y Y Y N N N N N N N N N N N N Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y U U N
Y Y Y Y Y Y Y Y Y Y Y Y Y
1 1
Y Y Y Y Y Y Y Y -
V Z O U X V Z O U X V V V V
Y1 Y
1
Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y2 Y Y2 Y Y U U N V V X X V V O UX O U X V V V V V V V V V V X X X X X X X X
Y Y Y Y Y Y Y Y Y# Y# U Y# Y Y U Y Y -
Z23 R,FRT, FRB,RMC Z23 R,FRTp,FRBp,RMC Z23 R,FRT, FRB,RMC Z23 R,FRTp, FRBp,RMC X FRT, FRB (DFP Short) X FRTp, FRB X FRT (DFP Short), FRB X FRTp, FRBp X FRTp, FRB (FIX) X FRT (FIX), FRB X FRT (FIX), FRBp X SP, FRT(BCD), FRB
201
Mnemonic
Encoding
FORM
Full Name
Operands
SNaN Vs G N N N N N N N N N N N N N Y Y Y Y Y Y
IE Y Y Y Y -
ddedpdq DFP Decode DPD To BCD Quad denbcd DFP Encode BCD To DPD
X SP, FRTp(BCD), FRBp X S, FRT, FRB (BCD) X S, FRTp, FRBp (BCD) X FRT (FIX), FRB X FRT (FIX), FRBp X FRT, FRA(FIX), FRB X FRTp, FRA(FIX), FRBp Z22 FRT,FRA,SH Z22 FRTp,FRAp,SH
RE RE RE RE RE RE RE RE
N Y Y N N N N N N N N
N Y Y N N N N N N N N V V
Y# Y# -
denbcdq DFP Encode BCD To DPD Quad dxex dxexq diex diexq dscli dscliq dscri dscriq DFP Extract Biased Exponent DFP Extract Biased Exponent Quad DFP Insert Biased Exponent DFP Insert Biased Exponent Quad DFP Shift Significand Left Immediate DFP Shift Significand Left Immediate Quad
DFP Shift Significand Right ImmeZ22 FRT,FRA,SH diate DFP Shift Significand Right ImmeZ22 FRTp,FRAp,SH diate Quad FI and FR are set to zeros for these instructions. Not applicable.
Explanation:
#
1 2
A unique definition of the FPSCRFPCC field is provided for the instruction. These are the only instructions that may generate an SNaN and also set the FPSCFPRF field. Since the BFP FPSCRFPRF field does not include a code for SNaN, these instructions cause the need for redefining the FPSCRFPRF field for DFP. A 6-bit immediate operand specifying the data-class mask. A 6-bit immediate operand specifying the data-group mask. An SNaN can be generated as the target operand. An ideal exponent is defined for the instruction. Setting of the FPSCRFI flag. Setting of the FPSCRFR flag. No. An overflow exception may be recognized. The record bit, Rc, is provided to record FPSCR0:3 in CR field 1. The trailing significand field is reencoded using preferred DPD encodings.The preferred DPD encoding are also used for propagated NaNs, or converted NaNs and infinities. A 2-bit immediate operand specifying the rounding-mode control. An one-bit immediate operand specifying if the operation is signed or unsigned. A two-bit immediate operand: one bit specifies if the operation is signed or unsigned and, for signed operations, another bit specifies which preferred plus sign code is generated. An underflow exception may be recognized. An invalid-operation exception may be recognized. An input operand of SNaN causes an invalid-operation exception. An inexact exception may be recognized. Yes. Undefined A zero-divide exception may be recognized.
202
Power ISA I
203
Clamp(x, y, z) x is interpreted as a signed integer. If the value of x is less than y, then the value y is returned, else if the value of x is greater than z, the value z is returned, else the value x is returned. if (x < y) then y result VSCRSAT 1 else if (x > z) then z result VSCRSAT 1 x else result RoundToSPIntCeil(x) The value x if x is a single-precision floating-point integer; otherwise the smallest single-precision floating-point integer that is greater than x. RoundToSPIntFloor(x) The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x. RoundToSPIntNear(x) The value x if x is a single-precision floating-point integer; otherwise the single-precision floating-point integer that is nearest in value to x (in case of a tie, the even single-precision floating-point integer is used). RoundToSPIntTrunc(x) The value x if x is a single-precision floating-point integer; otherwise the largest single-precision floating-point integer that is less than x if x>0, or the smallest single-precision floating-point integer that is greater than x if x<0. RoundToNearSP(x) The single-precision floating-point number that is nearest in value to the infinitely-precise floating-point intermediate result x (in case of a tie, the single-precision floating-point value with the least-significant bit equal to 0 is used). ReciprocalEstimateSP(x) A single-precision floating-point estimate of the reciprocal of the single-precision floating-point number x. ReciprocalSquareRootEstimateSP(x) A single-precision floating-point estimate of the reciprocal of the square root of the single-precision floating-point number x. LogBase2EstimateSP(x) A single-precision floating-point estimate of the base 2 logarithm of the single-precision floating-point number x. Power2EstimateSP(x) A single-precision floating-point estimate of the 2 raised to the power of the single-precision floating-point number x.
204
Power ISA I
Word 1 Halfword 2 B4
32 40
Word 2 Halfword 4 B8
64 72
Halfword 1 B2
16 24
Halfword 3 B6
48 56
Halfword 5 B10
80
Halfword 7 B14
112
B1
B3
B5
B7
B9
B11
88
B13
104
B15
120 127
ISA scalar floating-point unit. Special instructions (mfvscr and mtvscr) are provided to move the VSCR from and to a vector register. When moved to or from a vector register, the 32-bit VSCR is right justified in the 128-bit vector register. When moved to a vector register, bits 0-95 of the vector register are cleared (set to 0). VSCR
96 127
Figure 101.Vector Status and Control Register VR0 VR1 ... ... VR30 VR31
0 127
The bit definitions for the VSCR are as follows. Bit(s) Description
96:110 Reserved 111 Vector Non-Java Mode (NJ) This bit controls how denormalized values are handled by Vector Floating-Point instructions. 0 Denormalized values are handled as specified by Java and the IEEE standard; see Section 6.6.1. 1 If an element in a source VR contains a denormalized value, the value 0 is used instead. If an instruction causes an Underflow Exception, the corresponding element in the target VR is set to 0. In both cases the 0 has the same sign as the denormalized or underflowing value. 112:126 Reserved 127 Vector Saturation (SAT) Every vector instruction having Saturate in its name implicitly sets this bit to 1 if any result of that instruction saturates; see Section 6.8. mtvscr can alter this bit explicitly. This bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtvscr instruction. After the mfvscr instruction executes, the result in the target vector register will be architecturally precise. That is, it will reflect all updates to the SAT bit that could have been made by vector instructions logically preceding it in the program flow, and further, it will not reflect any SAT updates that may be made to it by vector instructions logically following it in the program flow. To implement this, processors may choose to make the mfvscr instruction execution serializing within the vec-
Figure 100.Vector Registers Depending on the instruction, the contents of a Vector Register are interpreted as a sequence of equal-length elements (bytes, halfwords, or words) or as a quadword. Each of the elements is aligned at its natural boundary within the Vector Register, as shown in Figure 99. Many instructions perform a given operation in parallel on all elements in a Vector Register. Depending on the instruction, a byte, halfword, or word element can be interpreted as a signed-integer, an unsigned-integer, or a logical value; a word element can also be interpreted as a single-precision floating-point value. In the instruction descriptions, phrases like signed-integer word element are used as shorthand for word element, interpreted as a signed-integer. Load and Store instructions are provided that transfer a byte, halfword, word, or quadword between storage and a Vector Register.
205
Programming Note The VRSAVE register can be used to indicate which VRs are currently being used by a program. If this is done, the operating system could save only those VRs when an interrupt occurs (see Book III), and could restore only those VRs when resuming the interrupted program. If this approach is taken it must be applied rigorously; if a program fails to indicate that a given VR is in use, software errors may occur that will be difficult to detect and correct because they are timing-dependent. Some operating systems save and restore VRSAVE only for programs that also use other vector registers.
206
Power ISA I
00 10
00 0
01 1
02 2
03 3
04 4
05 5
06 6
07 7
08 8
09 9
0A A
0B B
0C C
0D D
0E E
0F F
00 10 05 0 06 1 07 2 08 3 09 4 0A 5 0B 6 0C 7 0D 8 0E 9 0F A
00 B
01 C
02 D
03 E
04 F
Figure 104.Unaligned quadword storage operand Vhi Vlo Vt,Vs 05 00 0 Figure 105.Vector Register contents 06 01 07 02 08 03 09 04 0A 05 0B 06 0C 07 0D 08 0E 09 0F 0A 0B 0C 0D 0E 0F 15 00 01 02 03 04
207
Programming Note
The sequence of instructions given below is one approach that can be used to load the unaligned quadword shown in Figure 104 into a Vector Register. In Figure 105 Vhi and Vlo are the Vector Registers that will receive the most significant quadword and least significant quadword respectively. VRT is the target Vector Register. After the two quadwords have been loaded into Vhi and Vlo, using Load Vector Indexed instructions, the alignment is performed by shifting the 32-byte quantity Vhi || Vlo left by an amount determined by the address of the first byte of the desired data. The shifting is done using a Vector Permute instruction for which the permute control vector is generated by a Load Vector for Shift Left instruction. The Load Vector for Shift Left instruction uses the same address specification as the Load Vector Indexed instruction that loads the Vhi register; this is the address of the desired unaligned quadword. The following sequence of instructions copies the unaligned quadword storage operand into register Vt. # Assumptions: # Rb != 0 and contents of Rb = 0xB lvx Vhi,0,Rb # load MSQ lvsl Vp,0,Rb # set permute control vector addi Rb,Rb,16 # address of LSQ lvx Vlo,0,Rb # load LSQ vperm Vt,Vhi,Vlo,Vp # align the data The procedure for storing an unaligned quadword is essentially the reverse of the procedure for loading one. However, a read-modify-write sequence is required that inserts the source quadword into two aligned quadwords in storage. The quadword to be stored is assumed to be in Vs; see Figure 105 The contents of Vs are shifted right and split into two parts, each of which is merged (using a Vector Select instruction) with the current contents of the two aligned quadwords (MSQ and LSQ) that will contain the most significant bytes and least significant bytes, respectively, of the unaligned quadword. The resulting two quadwords are stored using Store Vector Indexed instructions. A Load Vector for Shift Right instruction is used to generate the permute control vector that is used for the shifting. A single register is used for the shifted contents; this is possible because the shifting is done by means of a right rotation. The rotation is accomplished by specifying Vs for both components of the Vector Permute instruction. In addition, the same permute control vector is used on a sequence of 1s and 0s to generate the mask used by the Vector Select instructions that do the merging. The following sequence of instructions copies the contents of Vs into an unaligned quadword in storage. # Assumptions: # Rb != 0 and contents of Rb = 0xB lvx Vhi,0,Rb # load current MSQ lvsr Vp,0,Rb # set permute control vector addi Rb,Rb,16 # address of LSQ lvx Vlo,0,Rb # load current LSQ vspltisb V1s,-1 # generate the select mask bits vspltisb V0s,0 vperm Vmask,V0s,V1s,Vp # generate the select mask vperm Vs,Vs,Vs,Vp # right rotate the data vsel Vlo,Vs,Vlo,Vmask # insert LSQ component vsel Vhi,Vhi,Vs,Vmask # insert MSQ component stvx Vlo,0,Rb # store LSQ addi Rb,Rb,-16 # address of MSQ stvx Vhi,0,Rb # store MSQ
208
Power ISA I
209
210
Power ISA I
211
212
Power ISA I
VRT,RA,RB VRT
11
RA
16
RB
21
/
31 0
31
RA
16
RB
21
39
/
31
0 (RA)
VRT undefined if Big-Endian byte ordering then MEM(EA,1) VRT8eb:8eb+7 else VRT120-(8eb):127-(8eb) MEM(EA,1) Let the effective address (EA) be the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access, the contents of the byte in storage at address EA are placed into byte eb of register VRT. The remaining bytes in register VRT are set to undefined values. If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access, the contents of the byte in storage at address EA are placed into byte 15-eb of register VRT. The remaining bytes in register VRT are set to undefined values. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFE EA EA60:63 eb undefined VRT if Big-Endian byte ordering then MEM(EA,2) VRT8eb:8eb+15 else MEM(EA,2) VRT112-(8eb):127-(8eb) Let the effective address (EA) be the result of ANDing 0xFFFF_FFFF_FFFF_FFFE with the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access,
the contents of the byte in storage at address EA are placed into byte eb of register VRT, the contents of the byte in storage at address EA+1 are placed into byte eb+1 of register VRT, and the remaining bytes in register VRT are set to undefined values.
If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access,
the contents of the byte in storage at address EA are placed into byte 15-eb of register VRT, the contents of the byte in storage at address EA+1 are placed into byte 14-eb of register VRT, and the remaining bytes in register VRT are set to undefined values.
213
X-form
VRT,RA,RB RA
16
RB
21
103
/
31
VRT
11
RA
16
RB
21
71
/
31
if RA = 0 then b 0 (RA) else b (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFC EA EA60:63 eb VRT undefined if Big-Endian byte ordering then VRT8eb:8eb+31 MEM(EA,4) else MEM(EA,4) VRT96-(8eb):127-(8eb) Let the effective address (EA) be the result of ANDing 0xFFFF_FFFF_FFFF_FFFC with the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access,
if RA = 0 then b 0 else b (RA) b + (RB) EA MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) VRT Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT. Special Registers Altered: None
X-form
VRT,RA,RB VRT
11
RA
16
RB
21
359
/
31
the contents of the byte in storage at address EA are placed into byte eb of register VRT, the contents of the byte in storage at address EA+1 are placed into byte eb+1 of register VRT, the contents of the byte in storage at address EA+2 are placed into byte eb+2 of register VRT, the contents of the byte in storage at address EA+3 are placed into byte eb+3 of register VRT, and the remaining bytes in register VRT are set to undefined values.
if RA = 0 then b 0 else b (RA) b + (RB) EA MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) VRT mark_as_not_likely_to_be_needed_again_anytime_soon ( EA ) Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT. lvxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future. Special Registers Altered: None
If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access, - the contents of the byte in storage at address EA are placed into byte 15-eb of register VRT, - the contents of the byte in storage at address EA+1 are placed into byte 14-eb of register VRT, - the contents of the byte in storage at address EA+2 are placed into byte 13-eb of register VRT, - the contents of the byte in storage at address EA+3 are placed into byte 12-eb of register VRT, and - the remaining bytes in register VRT are set to undefined values. Special Registers Altered: None
214
Power ISA I
Programming Note On some implementations, the hint provided by the lvxl instruction and the corresponding hint provided by the stvxl, lvepxl, and stvepxl instructions are applied to the entire cache block containing the specified quadword. On such implementations, the effect of the hint may be to cause that cache block to be considered a likely candidate for replacement when space is needed in the cache for a new block. Thus, on such implementations, the hint should be used with caution if the cache block containing the quadword also contains data that may be needed by the program in the near future. Also, the hint may be used before the last reference in a sequence of references to the quadword if the subsequent references are likely to occur sufficiently soon that the cache block containing the quadword is not likely to be displaced from the cache before the last reference.
215
VRS,RA,RB VRS
11
RA
16
RB
21
135
/
31 0
31
RA
16
RB
21
167
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA EA60:63 eb if Big-Endian byte ordering then VRS8eb:8eb+7 MEM(EA,1) else MEM(EA,1) VRS120-(8eb):127-(8eb) Let the effective address (EA) be the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access, the contents of byte eb of register VRS are placed in the byte in storage at address EA. If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access, the contents of byte 15-eb of register VRS are placed in the byte in storage at address EA. Special Registers Altered: None Programming Note Unless bits 60:63 of the address are known to match the byte offset of the subject byte element in register VRS, software should use Vector Splat to splat the subject byte element before performing the store.
if RA = 0 then b 0 else b (RA) (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFE EA EA60:63 eb if Big-Endian byte ordering then VRS8eb:8eb+15 MEM(EA,2) else MEM(EA,2) VRS112-(8eb):127-(8eb) Let the effective address (EA) be the result of ANDing 0xFFFF_FFFF_FFFF_FFFE with the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access,
the contents of byte eb of register VRS are placed in the byte in storage at address EA, and the contents of byte eb+1 of register VRS are placed in the byte in storage at address EA+1.
If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access,
the contents of byte 15-eb of register VRS are placed in the byte in storage at address EA, and the contents of byte 14-eb of register VRS are placed in the byte in storage at address EA+1.
Special Registers Altered: None Programming Note Unless bits 60:62 of the address are known to match the halfword offset of the subject halfword element in register VRS software should use Vector Splat to splat the subject halfword element before performing the store.
216
Power ISA I
X-form
VRS,RA,RB 31 VRS
11
RA
16
RB
21
231
/
31
RA
16
RB
21
199
/
31
if RA = 0 then b 0 (RA) else b (b + (RB)) & 0xFFFF_FFFF_FFFF_FFFC EA eb EA60:63 if Big-Endian byte ordering then VRS8eb:8eb+31 MEM(EA,4) else VRS96-(8eb):127-(8eb) MEM(EA,4) Let the effective address (EA) be the result of ANDing 0xFFFF_FFFF_FFFF_FFFC with the sum (RA|0)+(RB). Let eb be bits 60:63 of EA. If Big-Endian byte ordering is used for the storage access,
(VRS)
Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0. Special Registers Altered: None
X-form
VRS,RA,RB VRS
11
the contents of byte eb of register VRS are placed in the byte in storage at address EA, the contents of byte eb+1 of register VRS are placed in the byte in storage at address EA+1, the contents of byte eb+2 of register VRS are placed in the byte in storage at address EA+2, and the contents of byte eb+3 of register VRS are placed in the byte in storage at address EA+3.
RA
16
RB
21
487
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA (VRS) MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) mark_as_not_likely_to_be_needed_again_anytime_soon (EA) Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0. stvxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future. Special Registers Altered: None Programming Note See the Programming Note for the lvxl instruction on page 214.
If Category: Vector.Little-Endian is supported, then if Little-Endian byte ordering is used for the storage access,
the contents of byte 15-eb of register VRS are placed in the byte in storage at address EA, the contents of byte 14-eb of register VRS are placed in the byte in storage at address EA+1, the contents of byte 13-eb of register VRS are placed in the byte in storage at address EA+2, and the contents of byte 12-eb of register VRS are placed in the byte in storage at address EA+3.
Special Registers Altered: None Programming Note Unless bits 60:61 of the address are known to match the word offset of the subject word element in register VRS, software should use Vector Splat to splat the subject word element before performing the store.
217
Programming Note Examples of uses of lvsl, lvsr, and vperm to load and store unaligned data are given in Section 6.4.1. These instructions can also be used to rotate or shift the contents of a Vector Register left (lvsl) or right (lvsr) by sh bytes. For rotating, the Vector Register to be rotated should be specified as both register VRA and VRB for vperm. For shifting left, VRB for vperm should be a register containing all zeros and VRA should contain the value to be shifted, and vice versa for shifting right.
VRT,RA,RB VRT
11
RA
16
RB
21
/
31
VRT
11
RA
16
RB
21
38
/
31
if RA = 0 then b 0 (RA) else b (b + (RB))60:63 sh switch(sh) case(0x0): VRT 0x000102030405060708090A0B0C0D0E0F case(0x1): VRT 0x0102030405060708090A0B0C0D0E0F10 case(0x2): VRT 0x02030405060708090A0B0C0D0E0F1011 case(0x3): VRT 0x030405060708090A0B0C0D0E0F101112 case(0x4): VRT 0x0405060708090A0B0C0D0E0F10111213 case(0x5): VRT 0x05060708090A0B0C0D0E0F1011121314 case(0x6): VRT 0x060708090A0B0C0D0E0F101112131415 case(0x7): VRT 0x0708090A0B0C0D0E0F10111213141516 case(0x8): VRT 0x08090A0B0C0D0E0F1011121314151617 case(0x9): VRT 0x090A0B0C0D0E0F101112131415161718 case(0xA): VRT 0x0A0B0C0D0E0F10111213141516171819 case(0xB): VRT 0x0B0C0D0E0F101112131415161718191A case(0xC): VRT 0x0C0D0E0F101112131415161718191A1B case(0xD): VRT 0x0D0E0F101112131415161718191A1B1C case(0xE): VRT 0x0E0F101112131415161718191A1B1C1D case(0xF): VRT 0x0F101112131415161718191A1B1C1D1E Let sh be bits 60:63 of the sum (RA|0)+(RB). Let X be the 32 byte value 0x00 || 0x01 || 0x02 || || 0x1E || 0x1F. Bytes sh to sh+15 of X are placed into VRT. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b sh (b + (RB))60:63 switch(sh) case(0x0): VRT 0x101112131415161718191A1B1C1D1E1F case(0x1): VRT 0x0F101112131415161718191A1B1C1D1E case(0x2): VRT 0x0E0F101112131415161718191A1B1C1D case(0x3): VRT 0x0D0E0F101112131415161718191A1B1C case(0x4): VRT 0x0C0D0E0F101112131415161718191A1B case(0x5): VRT 0x0B0C0D0E0F101112131415161718191A case(0x6): VRT 0x0A0B0C0D0E0F10111213141516171819 case(0x7): VRT 0x090A0B0C0D0E0F101112131415161718 case(0x8): VRT 0x08090A0B0C0D0E0F1011121314151617 case(0x9): VRT 0x0708090A0B0C0D0E0F10111213141516 case(0xA): VRT 0x060708090A0B0C0D0E0F101112131415 case(0xB): VRT 0x05060708090A0B0C0D0E0F1011121314 case(0xC): VRT 0x0405060708090A0B0C0D0E0F10111213 case(0xD): VRT 0x030405060708090A0B0C0D0E0F101112 case(0xE): VRT 0x02030405060708090A0B0C0D0E0F1011 case(0xF): VRT 0x0102030405060708090A0B0C0D0E0F10 Let sh be bits 60:63 of the sum (RA|0)+(RB). Let X be the 32-byte value 0x00 || 0x01 || 0x02 || || 0x1E || 0x1F. Bytes 16-sh to 31-sh of X are placed into VRT. Special Registers Altered: None
218
Power ISA I
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
782
31
Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 7, do the following. Word element i in the source vector is packed to produce a 16-bit value as described below. - bit 7 of the first byte (bit 7 of the word) - bits 0:4 of the second byte (bits 8:12 of the word) - bits 0:4 of the third byte (bits 16:20 of the word) - bits 0:4 of the fourth byte (bits 24:28 of the word) The result is placed into halfword element i of VRT. Special Registers Altered: None Programming Note Each source word can be considered to be a 32-bit "pixel", consisting of four 8-bit "channels". Each target halfword can be considered to be a 16-bit pixel, consisting of one 1-bit channel and three 5-bit channels. A channel can be used to specify the intensity of a particular color, such as red, green, or blue, or to provide other information needed by the application.
219
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
398
31 0
VRA
16
VRB
21
270
31
do i=0 to 63 by 8 VRTi:i+7 Clamp(EXTS((VRA)i2:i2+15 ), -128, 127)24:31 VRTi+64:i+71 Clamp(EXTS((VRB)i2:i2+15 ), -128, 127)24:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 15, do the following. Signed-integer halfword element i in the source vector is converted to an signed-integer byte. - If the value of the element is greater than 127 the result saturates to 127 - If the value of the element is less than -128 the result saturates to -128. The low-order 8 bits of the result is placed into byte element i of VRT. Special Registers Altered: SAT
do i=0 to 63 by 8 VRTi:i+7 Clamp(EXTS((VRA)i2:i2+15), 0, 255)24:31 VRTi+64:i+71 Clamp(EXTS((VRB)i2:i2+15), 0, 255)24:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 15, do the following. Signed-integer halfword element i in the source vector is converted to an unsigned-integer byte. - If the value of the element is greater than 255 the result saturates to 255 - If the value of the element is less than 0 the result saturates to 0. The low-order 8 bits of the result is placed into byte element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
462
31 0
VRA
16
VRB
21
334
31
do i=0 to 63 by 16 VRTi:i+15 Clamp(EXTS((VRA)i2:i2+31, -215, 215-1)16:31 VRTi+64:i+79 Clamp(EXTS((VRB)i2:i2+31, -215, 215-1)16:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 7, do the following. Signed-integer word element i in the source vector is converted to an signed-integer halfword. - If the value of the element is greater than 215-1 the result saturates to 215-1 - If the value of the element is less than -215 the result saturates to -215. The low-order 16 bits of the result is placed into halfword element i of VRT. Special Registers Altered: SAT
do i=0 to 63 by 16 VRTi:i+15 Clamp(EXTS((VRA)i2:i2+31), 0, 216-1)16:31 VRTi+64:i+79 Clamp(EXTS((VRB)i2:i2+31), 0, 216-1 )16:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 7, do the following. Signed-integer word element i in the source vector is converted to an unsigned-integer halfword. - If the value of the element is greater than 216-1 the result saturates to 216-1 - If the value of the element is less than 0 the result saturates to 0. The low-order 16 bits of the result is placed into halfword element i of VRT. Special Registers Altered: SAT
220
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
14
31 0
VRA
16
VRB
21
142
31
do i=0 to 63 by 8 (VRA)i2+8:i2+15 VRTi:i+7 VRTi+64:i+71 (VRB)i2+8:i2+15 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 15, do the following. The contents of bits 8:15 of halfword element i in the source vector is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 63 by 8 VRTi:i+7 Clamp( EXTZ((VRA)i2:i2+15), 0, 255 )24:31 VRTi+64:i+71 Clamp( EXTZ((VRB)i2:i2+15), 0, 255 )24:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 15, do the following. Unsigned-integer halfword element i in the source vector is converted to an unsigned-integer byte. - If the value of the element is greater than 255 the result saturates to 255. The low-order 8 bits of the result is placed into byte element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
78
31 0
VRA
16
VRB
21
206
31
do i=0 to 63 by 16 VRTi:i+15 (VRA)i2+16:i2+31 VRTi+64:i+79 (VRB)i2+16:i2+31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 7, do the following. The contents of bits 16:31 of word element i in the source vector is placed into halfword element i of VRT. Special Registers Altered: None
do i=0 to 63 by 16 VRTi:i+15 Clamp( EXTZ((VRA)i2:i2+31 ), 0, 216-1 )16:31 VRTi+64:i+79 Clamp( EXTZ((VRB)i2:i2+31 ), 0, 216-1 )16:31 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 7, do the following. Unsigned-integer word element i in the source vector is converted to an unsigned-integer halfword. - If the value of the element is greater than 216-1 the result saturates to 216-1. The low-order 16 bits of the result is placed into halfword element i of VRT. Special Registers Altered: SAT
221
VX-form
VRT,VRB VRT
11
///
16
VRB
21
846
31 0
///
16
VRB
21
526
31
do i=0 to 63 by 8 VRTi2:i2+15 EXTS((VRB)i:i+7) For each vector element i from 0 to 7, do the following. Signed-integer byte element i in VRB is sign-extended to produce a signed-integer halfword and placed into halfword element i in VRT. Special Registers Altered: None
For each vector element i from 0 to 3, do the following. Halfword element i in VRB is unpacked as follows. - sign-extend bit 0 of the halfword to 8 bits - zero-extend bits 1:5 of the halfword to 8 bits - zero-extend bits 6:10 of the halfword to 8 bits - zero-extend bits 11:15 of the halfword to 8 bits The result is placed in word element i of VRT. Special Registers Altered: None Programming Note The source and target elements can be considered to be 16-bit and 32-bit pixels respectively, having the formats described in the Programming Note for the Vector Pack Pixel instruction on page 219. Programming Note Notice that the unpacking done by the Vector Unpack Pixel instructions does not reverse the packing done by the Vector Pack Pixel instruction. Specifically, if a 16-bit pixel is unpacked to a 32-bit pixel which is then packed to a 16-bit pixel, the resulting 16-bit pixel will not, in general, be equal to the original 16-bit pixel (because, for each channel except the first, Vector Unpack Pixel inserts high-order bits while Vector Pack Pixel discards low-order bits).
VRT,VRB VRT
11
///
16
VRB
21
590
31
do i=0 to 63 by 16 EXTS((VRB)i:i+15) VRTi2:i2+31 For each vector element i from 0 to 3, do the following. Signed-integer halfword element i in VRB is sign-extended to produce a signed-integer word and placed into word element i in VRT. Special Registers Altered: None
222
Power ISA I
VX-form
VRT,VRB VRT
11
///
16
VRB
21
974
31 0
///
16
VRB
21
654
31
do i=0 to 63 by 8 VRTi2:i2+15 EXTS((VRB)i+64:i+71) For each vector element i from 0 to 7, do the following. Signed-integer byte element i+8 in VRB is sign-extended to produce a signed-integer halfword and placed into halfword element i in VRT. Special Registers Altered: None
For each vector element i from 0 to 3, do the following. Halfword element i+4 in VRB is unpacked as follows. - sign-extend bit 0 of the halfword to 8 bits - zero-extend bits 1:5 of the halfword to 8 bits - zero-extend bits 6:10 of the halfword to 8 bits - zero-extend bits 11:15 of the halfword to 8 bits The result is placed in word element i of VRT. Special Registers Altered: None
VRT,VRB VRT
11
///
16
VRB
21
718
31
do i=0 to 63 by 16 EXTS((VRB)i+64:i+79) VRTi2:i2+31 For each vector element i from 0 to 3, do the following. Signed-integer halfword element i+4 in VRB is sign-extended to produce a signed-integer word and placed into word element i in VRT. Special Registers Altered: None
223
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
12
31 0
VRA
16
VRB
21
76
31
do i=0 to 63 by 8 VRTi2:i2+7 (VRA)i:i+7 VRTi2+8:i2+15 (VRB)i:i+7 For each vector element i from 0 to 7, do the following. Byte element i in VRA is placed into byte element 2i in VRT. Byte element i in VRB is placed into byte element 2i+1 in VRT. Special Registers Altered: None
do i=0 to 63 by 16 VRTi2:i2+15 (VRA)i:i+15 VRTi2+16:i2+31 (VRB)i:i+15 For each vector element i from 0 to 3, do the following. Halfword element i in VRA is placed into halfword element 2i in VRT. Halfword element i in VRB is placed into halfword element 2i+1 in VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
140
31
do i=0 to 63 by 32 VRTi2:i2+31 (VRA)i:i+31 VRTi2+32:i2+63 (VRB)i:i+31 For each vector element i from 0 to 1, do the following. Word element i in VRA is placed into word element 2i in VRT. Word element i in VRB is placed into word element 2i+1 in VRT. The word elements in the high-order half of VRA are placed, in the same order, into the even-numbered word elements of VRT. The word elements in the high-order half of VRB are placed, in the same order, into the odd-numbered word elements of VRT. Special Registers Altered: None
224
Power ISA I
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
268
31 0
VRA
16
VRB
21
332
31
do i=0 to 63 by 8 VRTi2:i2+7 (VRA)i+64:i+71 VRTi2+8:i2+15 (VRB)i+64:i+71 For each vector element i from 0 to 7, do the following. Byte element i+8 in VRA is placed into byte element 2i in VRT. Byte element i+8 in VRB is placed into byte element 2i+1 in VRT. Special Registers Altered: None
do i=0 to 63 by 16 VRTi2:i2+15 (VRA)i+64:i+79 VRTi2+16:i2+31 (VRB)i+64:i+79 For each vector element i from 0 to 3, do the following. Halfword element i+4 in VRA is placed into halfword element 2i in VRT. Halfword element i+4 in VRB is placed into halfword element 2i+1 in VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
396
31
do i=0 to 63 by 32 (VRA)i+64:i+95 VRTi2:i2+31 VRTi2+32:i2+63 (VRB)i+64:i+95 For each vector element i from 0 to 1, do the following. Word element i+2 in VRA is placed into word element 2i in VRT. Word element i+2 in VRB is placed into word element 2i+1 in VRT. Special Registers Altered: None
225
Programming Note The Vector Splat instructions can be used in preparation for performing arithmetic for which one source vector is to consist of elements that all have the same value (e.g., multiplying all elements of a Vector Register by a constant).
VX-form
VRB
21
524
31 0
SIM
16
///
21
780
31
b UIM || 0b000 do i=0 to 127 by 8 VRTi:i+7 (VRB)b:b+7 For each vector element i from 0 to 15, do the following. The contents of byte element UIM in VRB are placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 8 EXTS(SIM, 8) VRTi:i+7 For each vector element i from 0 to 15, do the following. The value of the SIM field, sign-extended to 8 bits, is placed into byte element i of VRT. Special Registers Altered: None
VX-form
VRB
21
588
31 0
SIM
16
///
21
844
31
b UIM || 0b0000 do i=0 to 127 by 16 VRTi:i+15 (VRB)b:b+15 For each vector element i from 0 to 7, do the following. The contents of halfword element UIM in VRB are placed into halfword element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTS(SIM, 16) VRTi:i+15 For each vector element i from 0 to 7, do the following. The value of the SIM field, sign-extended to 16 bits, is placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRB,UIM VRT
11
652
31 0
SIM
16
///
21
908
31
b UIM || 0b00000 do i=0 to 127 by 32 (VRB)b:b+31 VRTi:i+31 For each vector element i from 0 to 3, do the following. The contents of word element UIM in VRB are placed into word element i of VRT. Special Registers Altered: None
do i=0 to 127 by 32 VRTi:i+31 EXTS(SIM, 32) For each vector element i from 0 to 3, do the following. The value of the SIM field, sign-extended to 32 bits, is placed into word element i of VRT. Special Registers Altered: None
226
Power ISA I
VA-form
VRT,VRA,VRB,VRC VRT
11
VRA
16
VRB
21
VRC
26
42
31
Vector Permute
vperm 4
0 6
VA-form
do i=0 to 127 VRTi ((VRC)i=0) ? (VRA)i : (VRB)i For each bit in VRC that contains the value 0, the corresponding bit in VRA is placed into the corresponding bit of VRT. Otherwise, the corresponding bit in VRB is placed into the corresponding bit of VRT.
VRT,VRA,VRB,VRC VRT
11
VRA
16
VRB
21
VRC
26
43
31
temp0:255 (VRA) || (VRB) do i=0 to 127 by 8 b (VRC)i+3:i+7 || 0b000 tempb:b+7 VRTi:i+7 Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. For each vector element i from 0 to 15, do the following. The contents of the byte element in the source vector specified by bits 3:7 of byte element i of VRC are placed into byte element i of VRT. Special Registers Altered: None Programming Note See the Programming Notes with the Load Vector for Shift Left and Load Vector for Shift Right instructions on page 218 for examples of uses of vperm.
227
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
452
31 0
VRA
16
VRB
/ SHB
21 22 26
44
31
sh (VRB)125:127 t 1 do i=0 to 127 by 8 t & ((VRB)i+5:i+7=sh) t if t=1 then VRT (VRA) << sh undefined else VRT The contents of VRA are shifted left by the number of bits specified in (VRB)125:127. - Bits shifted out of bit 0 are lost. - Zeros are supplied to the vacated bits on the right. The result is place into VRT, except if, for any byte element in register VRB, the low-order 3 bits are not equal to the shift amount, then VRT is undefined. Special Registers Altered: None
VRT
Let the source vector be the concatenation of the contents of VRA followed by the contents of VRB. Bytes SHB:SHB+15 of the source vector are placed into VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1036
31
shb VRT
The contents of VRA are shifted left by the number of bytes specified in (VRB)121:124. - Bytes shifted out of byte 0 are lost. - Zeros are supplied to the vacated bytes on the right. The result is placed into VRT. Special Registers Altered: None
228
Power ISA I
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
708
31 0
VRA
16
VRB
21
1100
31
sh (VRB)125:127 t 1 do i=0 to 127 by 8 t & ((VRB)i+5:i+7=sh) t if t=1 then VRT (VRA) >>ui sh undefined else VRT The contents of VRA are shifted right by the number of bits specified in (VRB)125:127. - Bits shifted out of bit 127 are lost. - Zeros are supplied to the vacated bits on the left. The result is place into VRT, except if, for any byte element in register VRB, the low-order 3 bits are not equal to the shift amount, then VRT is undefined. Special Registers Altered: None Programming Note A double-register shift by a dynamically specified number of bits (0-127) can be performed in six instructions. The following example shifts Vw || Vx left by the number of bits specified in Vy and places the high-order 128 bits of the result into Vz. vslo Vt1,Vw,Vy vsl Vt1,Vt1,Vy vsububm Vt3,V0,Vy vsro Vt2,Vx,Vt3 vsr Vt2,Vt2,Vt3 vor Vz,Vt1,Vt2 #shift high-order reg left #adjust shift count ((V0)=0) #shift low-order reg right #merge to get final result
shb VRT
The contents of VRA are shifted right by the number of bytes specified in (VRB)121:124. - Bytes shifted out of byte 15 are lost. - Zeros are supplied to the vacated bytes on the left. The result is placed into VRT. Special Registers Altered: None
229
VRT,VRA,VRB 4 VRT
11
VRA
16
VRB
21
768
31
VRA
16
VRB
21
384
31
do i=0 to 127 by 32 aop EXTZ((VRA)i:i+31) EXTZ((VRB)i:i+31) bop Chop( ( aop +int bop ) >>ui 32,1) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The carry out of the 32-bit sum is zero-extended to 32 bits and placed into word element i of VRT. Special Registers Altered: None
do i=0 to 127 by 8 EXTS(VRAi:i+7) aop bop EXTS(VRBi:i+7) Clamp( aop +int bop, -128, 127 )24:31 VRTi:i+7 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB. - If the sum is greater than 127 the result saturates to 127. - If the sum is less than -128 the result saturates to -128. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
832
31 0
VRA
16
VRB
21
896
31
do i=0 to 127 by 16 aop EXTS((VRA)i:i+15) EXTS((VRB)i:i+15) bop VRTi:i+15 Clamp(aop +int bop, -215, 215-1)16:31 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB. - If the sum is greater than 215-1 the result saturates to 215-1 - If the sum is less than -215 the result saturates to -215. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 32 aop EXTS((VRA)i:i+31) EXTS((VRB)i:i+31) bop Clamp(aop +int bop, -231, 231-1) VRTi:i+31 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRA is added to signed-integer word element i in VRB. - If the sum is greater than 231-1 the result saturates to 231-1. - If the sum is less than -231 the result saturates to -231. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
230
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
0
31 0
VRA
16
VRB
21
64
31
do i=0 to 127 by 8 EXTZ((VRA)i:i+7) aop EXTZ((VRB)i:i+7) bop VRTi:i+7 Chop( aop +int bop, 8 ) For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: None Programming Note vaddubm can be used for unsigned or signed-integers.
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 Chop( aop +int bop, 16 ) For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: None Programming Note vadduhm can be used for unsigned or signed-integers.
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
128
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop EXTZ((VRB)i:i+31) bop temp aop +int bop Chop( aop +int bop, 32 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: None Programming Note vadduwm can be used for unsigned or signed-integers.
231
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
512
31 0
VRA
16
VRB
21
576
31
do i=0 to 127 by 8 EXTZ((VRA)i:i+7) aop EXTZ((VRB)i:i+7) bop VRTi:i+7 Clamp( aop +int bop, 0, 255 )24:31 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. - If the sum is greater than 255 the result saturates to 255. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 Clamp(aop +int bop, 0, 216-1)16:31 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. - If the sum is greater than 216-1 the result saturates to 216-1. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
640
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop bop EXTZ((VRB)i:i+31) Clamp(aop +int bop, 0, 232-1) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. - If the sum is greater than 232-1 the result saturates to 232-1. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
232
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1408
31 0
VRA
16
VRB
21
1792
31
do i=0 to 127 by 32 aop (VRA)i:i+31 bop (VRB)i:i+31 temp (EXTZ(aop) +int EXTZ(bop) +int 1) >> 32 VRTi:i+31 temp & 0x0000_0001 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The complement of the borrow out of bit 0 of the 32-bit difference is zero-extended to 32 bits and placed into word element i of VRT. Special Registers Altered: None
do i=0 to 127 by 8 aop EXTS((VRA)i:i+7) EXTS((VRB)i:i+7) bop VRTi:i+7 Clamp(aop +int bop +int 1, -128, 127)24:31 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRB is subtracted from signed-integer byte element i in VRA. - If the intermediate result is greater than 127 the result saturates to 127. - If the intermediate result is less than -128 the result saturates to -128. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1856
31 0
VRA
16
VRB
21
1920
31
do i=0 to 127 by 16 aop EXTS((VRA)i:i+15) EXTS((VRB)i:i+15) bop VRTi:i+15 Clamp(aop +int bop +int 1, -215, 215-1)16:31 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRB is subtracted from signed-integer halfword element i in VRA. - If the intermediate result is greater than 215-1 the result saturates to 215-1. - If the intermediate result is less than -215 the result saturates to -215. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 32 aop EXTS((VRA)i:i+31) EXTS((VRB)i:i+31) bop Clamp(aop +int bop +int 1,-231,231-1) VRTi:i+31 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRB is subtracted from signed-integer word element i in VRA. - If the intermediate result is greater than 231-1 the result saturates to 231-1. - If the intermediate result is less than -231 the result saturates to -231. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
233
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1024
31 0
VRA
16
VRB
21
1088
31
do i=0 to 127 by 8 EXTZ((VRA)i:i+7) aop EXTZ((VRB)i:i+7) bop VRTi:i+7 Chop( aop +int bop +int 1, 8 ) For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+16 Chop( aop +int bop +int 1, 16 ) For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1152
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop bop EXTZ((VRB)i:i+31) Chop( aop +int bop +int 1, 32 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: None
234
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1536
31 0
VRA
16
VRB
21
1600
31
do i=0 to 127 by 8 EXTZ((VRA)i:i+7) aop EXTZ((VRB)i:i+7) bop VRTi:i+7 Clamp(aop +int bop +int 1, 0, 255)24:31 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRB is subtracted from unsigned-integer byte element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 Clamp(aop +int bop +int 1,0,216-1)16:31 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRB is subtracted from unsigned-integer halfword element i in VRA. If the intermediate result is less than 0 the result saturates to 0. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1664
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop bop EXTZ((VRB)i:i+31) Clamp(aop +int bop +int 1, 0, 232-1) VRTi:i+31 For each vector element i from 0 to 7, do the following. Unsigned-integer word element i in VRB is subtracted from unsigned-integer word element i in VRA. - If the intermediate result is less than 0 the result saturates to 0. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
235
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
776
31 0
VRA
16
VRB
21
840
31
do i=0 to 127 by 16 prod EXTS((VRA)i:i+7) si EXTS((VRB)i:i+7) Chop( prod, 16 ) VRTi:i+15 For each vector element i from 0 to 7, do the following. Signed-integer byte element i2 in VRA is multiplied by signed-integer byte element i2 in VRB. The low-order 16 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
do i=0 to 127 by 32 prod EXTS((VRA)i:i+15) si EXTS((VRB)i:i+15) Chop( prod, 32 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. Signed-integer halfword element i2 in VRA is multiplied by signed-integer halfword element i2 in VRB. The low-order 32 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
520
31 0
VRA
16
VRB
21
584
31
do i=0 to 127 by 16 EXTZ((VRA)i:i+7) ui EXTZ((VRB)i:i+7) prod Chop(prod, 16) VRTi:i+15 For each vector element i from 0 to 7, do the following. Unsigned-integer byte element i2 in VRA is multiplied by unsigned-integer byte element i2 in VRB. The low-order 16 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
do i=0 to 127 by 32 EXTZ((VRA)i:i+15) ui EXTZ((VRB)i:i+15) prod Chop(prod, 32) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer halfword element i2 in VRA is multiplied by unsigned-integer halfword element i2 in VRB. The low-order 32 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
236
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
264
31 0
VRA
16
VRB
21
328
31
do i=0 to 127 by 16 prod EXTS((VRA)i+8:i+15) si EXTS((VRB)i+8:i+15) Chop( prod, 16 ) VRTi:i+15 For each vector element i from 0 to 7, do the following. Signed-integer byte element i2+1 in VRA is multiplied by signed-integer byte element i2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
do i=0 to 127 by 32 EXTS((VRA)i+16:i+31) si EXTS((VRB)i+16:i+31) prod Chop( prod, 32 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. Signed-integer halfword element i2+1 in VRA is multiplied by signed-integer halfword element i2+1 in VRB. The low-order 32 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
8
31 0
VRA
16
VRB
21
72
31
do i=0 to 127 by 16 EXTZ((VRA)i+8:i+15) ui EXTZ((VRB)i+8:i+15) prod Chop( prod, 16 ) VRTi:i+15 For each vector element i from 0 to 7, do the following. Unsigned-integer byte element i2+1 in VRA is multiplied by unsigned-integer byte element i2+1 in VRB. The low-order 16 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
do i=0 to 127 by 32 EXTZ((VRA)i+16:i+31)ui EXTZ((VRB)i+16:i+31) prod Chop( prod, 32 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer halfword element i2+1 in VRA is multiplied by unsigned-integer halfword element i2+1 in VRB. The low-order 32 bits of the product are placed into halfword element i VRT. Special Registers Altered: None
237
VRT
11
VRA
16
VRB
21
VRC
26
32
31 0
4
6
VRT
11
VRA
16
VRB
21
VRC
26
33
31
do i=0 to 127 by 16 prod EXTS((VRA)i:i+15) si EXTS((VRB)i:i+15) (prod >>si 15) +int EXTS((VRC)i:i+15 sum Clamp(sum, -215, 215-1)16:31 VRTi:i+15 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. Bits 0:16 of the product are added to signed-integer halfword element i in VRC. - If the intermediate result is greater than 215-1 the result saturates to 215-1. - If the intermediate result is less than -215 the result saturates to -215. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 16 prod EXTS((VRA)i:i+15) si EXTS((VRB)i:i+15) ((prod +int 0x0000_4000) >>si 15) sum +int EXTS((VRC)i:i+15) VRTi:i+15 Clamp(sum, -215, 215-1)16:31 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is multiplied by signed-integer halfword element i in VRB, producing a 32-bit signed-integer product. The value 0x0000_4000 is added to the product, producing a 32-bit signed-integer sum. Bits 0:16 of the sum are added to signed-integer halfword element i in VRC. - If the intermediate result is greater than 215-1 the result saturates to 215-1. - If the intermediate result is less than -215 the result saturates to -215. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: SAT
238
Power ISA I
VRT
11
VRA
16
VRB
21
VRC
26
34
31 0
4
6
VRT
11
VRA
16
VRB
21
VRC
26
36
31
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) ui EXTZ((VRB)i:i+15) prod Chop( prod, 16 ) +int (VRC)i:i+15 sum VRTi:i+15 Chop( sum, 16 ) For each vector element i from 0 to 3, do the following. Unsigned-integer halfword element i in VRA is multiplied by unsigned-integer halfword element i in VRB, producing a 32-bit unsigned-integer product. The low-order 16 bits of the product are added to unsigned-integer halfword element i in VRC. The low-order 16 bits of the sum are placed into halfword element i of VRT. Special Registers Altered: None
do i=0 to 127 by 32 EXTZ((VRC)i:i+31) temp do j=0 to 31 by 8 prod EXTZ((VRA)i+j:i+j+7) ui EXTZ((VRB)i+j:i+j+7) temp +int prod temp VRTi:i+31 Chop( temp, 32 ) For each word element in VRT the following operations are performed, in the order shown.
Each of the four unsigned-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing an unsigned-integer halfword product. The sum of these four unsigned-integer halfword products is added to the unsigned-integer word element in VRC. The unsigned-integer word result is placed into the corresponding word element of VRT.
239
VRT
11
VRA
16
VRB
21
VRC
26
37
31 0
4
6
VRT
11
VRA
16
VRB
21
VRC
26
40
31
do i=0 to 127 by 32 (VRC)i:i+31 temp do j=0 to 31 by 8 prod0:15 (VRA)i+j:i+j+7 sui (VRB)i+j:i+j+7 temp temp +int EXTS(prod) temp VRTi:i+31 For each word element in VRT the following operations are performed, in the order shown.
do i=0 to 127 by 32 (VRC)i:i+31 temp do j=0 to 31 by 16 prod0:31 (VRA)i+j:i+j+15 si (VRB)i+j:i+j+15 temp temp +int prod temp VRTi:i+31 For each word element in VRT the following operations are performed, in the order shown.
Each of the four signed-integer byte elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer byte element in VRB, producing a signed-integer product. The sum of these four signed-integer halfword products is added to the signed-integer word element in VRC. The signed-integer result is placed into the corresponding word element of VRT.
Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product. The sum of these two signed-integer word products is added to the signed-integer word element in VRC. The signed-integer word result is placed into the corresponding word element of VRT.
240
Power ISA I
VRT
11
VRA
16
VRB
21
VRC
26
41
31 0
4
6
VRT
11
VRA
16
VRB
21
VRC
26
38
31
do i=0 to 127 by 32 EXTS((VRC)i:i+31) temp do j=0 to 31 by 16 prod EXTS((VRA)i+j:i+j+15) si EXTS((VRB)i+j:i+j+15) temp +int prod temp VRTi:i+31 Clamp(temp, -231, 231-1) For each word element in VRT the following operations are performed, in the order shown.
do i=0 to 127 by 32 EXTZ((VRC)i:i+31) temp do j=0 to 31 by 16 prod EXTZ((VRA)i+j:i+j+15) ui EXTZ((VRB)i+j:i+j+15) temp +int prod temp VRTi:i+31 Chop( temp, 32 ) For each word element in VRT the following operations are performed, in the order shown.
Each of the two signed-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding signed-integer halfword element in VRB, producing a signed-integer product. The sum of these two signed-integer word products is added to the signed-integer word element in VRC. If the intermediate result is greater than 231-1 the result saturates to 231-1 and if it is less than -231 it saturates to -231. The result is placed into the corresponding word element of VRT.
Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer word product. The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC. The unsigned-integer result is placed into the corresponding word element of VRT.
241
VRT
11
VRA
16
VRB
21
VRC
26
39
31
do i=0 to 127 by 32 EXTZ((VRC)i:i+31) temp do j=0 to 31 by 16 prod EXTZ((VRA)i+j:i+j+15) ui EXTZ((VRB)i+j:i+j+15) temp +int prod temp VRTi:i+31 Clamp(temp, 0, 232-1) For each word element in VRT the following operations are performed, in the order shown.
Each of the two unsigned-integer halfword elements contained in the corresponding word element of VRA is multiplied by the corresponding unsigned-integer halfword element in VRB, producing an unsigned-integer product. The sum of these two unsigned-integer word products is added to the unsigned-integer word element in VRC. If the intermediate result is greater than 232-1 the result saturates to 232-1. The result is placed into the corresponding word element of VRT.
242
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1928
31 0
VRA
16
VRB
21
1672
31
temp EXTS((VRB)96:127) do i=0 to 127 by 32 temp temp +int EXTS((VRA)i:i+31) 0x0000_0000 VRT0:31 VRT32:63 0x0000_0000 0x0000_0000 VRT64:95 Clamp(temp, -231, 231-1) VRT96:127 The sum of the four signed-integer word elements in VRA is added to signed-integer word element 3 of VRB.
do i=0 to 127 by 64 temp EXTS((VRB)i+32:i+63) do j=0 to 63 by 32 temp +int EXTS((VRA)i+j:i+j+31) temp VRTi:i+63 0x0000_0000 || Clamp(temp, -231, 231-1) Word elements 0 and 2 of VRT are set to 0. The sum of the signed-integer word elements 0 and 1 in VRA is added to the signed-integer word element in bits 32:63 of VRB.
If the intermediate result is greater than 231-1 the result saturates to 231-1. If the intermediate result is less than -231 the result saturates to -231.
If the intermediate result is greater than 231-1 the result saturates to 231-1. If the intermediate result is less than -231 the result saturates to -231.
The low-end 32 bits of the result are placed into word element 3 of VRT. Word elements 0 to 2 of VRT are set to 0. Special Registers Altered: SAT
The low-order 32 bits of the result are placed into word element 1 of VRT. The sum of signed-integer word elements 2 and 3 in VRA is added to the signed-integer word element in bits 96:127 of VRB.
If the intermediate result is greater than 231-1 the result saturates to 231-1. If the intermediate result is less than -231 the result saturates to -231.
The low-order 32 bits of the result are placed into word element 3 of VRT. Special Registers Altered: SAT
243
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1800
31 0
VRA
16
VRB
21
1608
31
do i=0 to 127 by 32 EXTS((VRB)i:i+31) temp do j=0 to 31 by 8 temp temp +int EXTS((VRA)i+j:i+j+7) Clamp(temp, -231, 231-1) VRTi:i+31 For each vector element i from 0 to 3, do the following. The sum of the four signed-integer byte elements contained in word element i of VRA is added to signed-integer word element i in VRB. - If the intermediate result is greater than 231-1 the result saturates to 231-1. - If the intermediate result is less than -231 the result saturates to -231. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
do i=0 to 127 by 32 EXTS((VRB)i:i+31) temp do j=0 to 31 by 16 temp temp +int EXTS((VRA)i+j:i+j+15) Clamp(temp, -231, 231-1) VRTi:i+31 For each vector element i from 0 to 3, do the following. The sum of the two signed-integer halfword elements contained in word element i of VRA is added to signed-integer word element i in VRB. - If the intermediate result is greater than 231-1 the result saturates to 231-1. - If the intermediate result is less than -231 the result saturates to -231. The low-order 32 bits of the result are placed into the corresponding word element of VRT. Special Registers Altered: SAT
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1544
31
do i=0 to 127 by 32 temp EXTZ((VRB)i:i+31) do j=0 to 31 by 8 temp +int EXTZ((VRA)i+j:i+j+7) temp VRTi:i+31 Clamp( temp, 0, 232-1 ) For each vector element i from 0 to 3, do the following. The sum of the four unsigned-integer byte elements contained in word element i of VRA is added to unsigned-integer word element i in VRB. - If the intermediate result is greater than 232-1 it saturates to 232-1. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: SAT
244
Power ISA I
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1282
31 0
VRA
16
VRB
21
1346
31
do i=0 to 127 by 8 EXTS((VRA)i:i+7) aop bop EXTS((VRB)i:i+7) Chop(( aop +int bop +int 1 ) >> 1, 8) VRTi:i+7 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRA is added to signed-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTS((VRA)i:i+15) aop bop EXTS((VRB)i:i+15) Chop(( aop +int bop +int 1 ) >> 1, 16) VRTi:i+15 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is added to signed-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1410
31
do i=0 to 127 by 32 aop EXTS((VRA)i:i+31) EXTS((VRB)i:i+31) bop Chop(( aop +int bop +int 1 ) >> 1, 32) VRTi:i+31 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRA is added to signed-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: None
245
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1026
31 0
VRA
16
VRB
21
1090
31
do i=0 to 127 by 8 aop EXTZ((VRA)i:i+7) EXTZ((VRB)i:i+7 bop Chop((aop +int bop +int 1) >>ui 1, 8) VRTi:i+7 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is added to unsigned-integer byte element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 8 bits of the result are placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 Chop((aop +int bop +int 1) >>ui 1, 16) For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is added to unsigned-integer halfword element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 16 bits of the result are placed into halfword element i of VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1154
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop EXTZ((VRB)i:i+31) bop VRTi:i+31 Chop((aop +int bop +int 1) >>ui 1, 32) For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is added to unsigned-integer word element i in VRB. The sum is incremented by 1 and then shifted right 1 bit. The low-order 32 bits of the result are placed into word element i of VRT. Special Registers Altered: None
246
Power ISA I
Version 2.06 Revision B 6.9.1.7 Vector Integer Maximum and Minimum Instructions
Vector Maximum Signed Byte
vmaxsb 4
0 6
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
258
31 0
VRA
16
VRB
21
322
31
do i=0 to 127 by 8 EXTS((VRA)i:i+7) aop bop EXTS((VRB)i:i+7) ( aop >si bop ) VRTi:i+7 ? (VRA)i:i+7 : (VRB)i:i+7 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 aop EXTS((VRA)i:i+15) EXTS((VRB)i:i+15 bop ( aop >si bop ) VRTi:i+15 ? (VRA)i:i+15 : (VRB)i:i+15 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
386
31
do i=0 to 127 by 32 EXTS((VRA)i:i+31) aop bop EXTS((VRB)i:i+31) ( aop >si bop ) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. The larger of the two values is placed into word element i of VRT. Special Registers Altered: None
247
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
2
31 0
VRA
16
VRB
21
66
31
do i=0 to 127 by 8 aop EXTZ((VRA)i:i+7) EXTZ((VRB)i:i+7) bop (aop >ui bop) ? (VRA)i:i+7 : (VRB)i:i+7 VRTi:i+7 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. The larger of the two values is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 (aop >ui bop) ? (VRA)i:i+15 : (VRB)i:i+15 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. The larger of the two values is placed into halfword element i of VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
130
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop bop EXTZ((VRB)i:i+31) (aop >ui bop) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. The larger of the two values is placed into word element i of VRT. Special Registers Altered: None
248
Power ISA I
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
770
31 0
VRA
16
VRB
21
834
31
do i=0 to 127 by 8 aop EXTS((VRA)i:i+7) EXTS((VRB)i:i+7) bop (aop <si bop) ? (VRA)i:i+7 : (VRB)i:i+7 VRTi:i+7 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. The smaller of the two values is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTS((VRA)i:i+15) aop EXTS((VRB)i:i+15) bop VRTi:i+15 ( aop <si bop ) ? (VRA)i:i+15 : (VRB)i:i+15 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. The smaller of the two values is placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
898
31
do i=0 to 127 by 32 EXTS((VRA)i:i+31) aop bop EXTS((VRB)i:i+31) ( aop <si bop ) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. The smaller of the two values is placed into word element i of VRT. Special Registers Altered: None
249
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
514
31 0
VRA
16
VRB
21
578
31
do i=0 to 127 by 8 aop EXTZ((VRA)i:i+7) EXTZ((VRB)i:i+7 bop ( aop <ui bop ) VRTi:i+7 ? (VRA)i:i+7 : (VRB)i:i+7 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. The smaller of the two values is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 EXTZ((VRA)i:i+15) aop EXTZ((VRB)i:i+15) bop VRTi:i+15 ( aop <ui bop ) ? (VRA)i:i+15 : (VRB)i:i+15 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. The smaller of the two values is placed into halfword element i of VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
642
31
do i=0 to 127 by 32 EXTZ((VRA)i:i+31) aop bop EXTZ((VRB)i:i+31) ( aop <ui bop ) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. The smaller of the two values is placed into word element i of VRT. Special Registers Altered: None
250
Power ISA I
Programming Note vcmpequb[.], vcmpequh[.] and vcmpequw[.] can be used for unsigned or signed-integers.
VRT
11
VRA
16
6
31 0
VRA
16
70
31
do i=0 to 127 by 8 ((VRA)i:i+7 =int (VRB)i:i+7) ? 81 : 80 VRTi:i+7 if Rc=1 then do t (VRT=1281) (VRT=1280) f CR6 t || 0b0 || f || 0b0 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. Byte element i in VRT is set to all 1s if unsigned-integer byte element i in VRA is equal to unsigned-integer byte element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
do i=0 to 127 by 16 VRTi:i+15 ((VRA)i:i+15 =int (VRB)i:i+15) ? 161 : 160 if Rc=1 then do t (VRT=1281) (VRT=1280) f CR6 t || 0b0 || f || 0b0 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element element i in VRB. Halfword element i in VRT is set to all 1s if unsigned-integer halfword element i in VRA is equal to unsigned-integer halfword element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
251
VRT
11
VRA
16
134
31 0
VRA
16
774
31
do i=0 to 127 by 32 VRTi:i+31 ((VRA)i:i+31 =int (VRB)i:i+31) ? 321 : 320 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is equal to unsigned-integer word element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
do i=0 to 127 by 8 ((VRA)i:i+7 >si (VRB)i:i+7) ? 81 : 80 VRTi:i+7 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 15, do the following. Signed-integer byte element i in VRA is compared to signed-integer byte element i in VRB. Byte element i in VRT is set to all 1s if signed-integer byte element i in VRA is greater than to signed-integer byte element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
VRA
16
838
31 0
VRA
16
902
31
do i=0 to 127 by 16 ((VRA)i:i+15 >si (VRB)i:i+15) ? 161 : 160 VRTi:i+15 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 7, do the following. Signed-integer halfword element i in VRA is compared to signed-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if signed-integer halfword element i in VRA is greater than signed-integer halfword element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
do i=0 to 127 by 32 ((VRA)i:i+31 >si (VRB)i:i+31) ? 321 : 320 VRTi:i+31 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 3, do the following. Signed-integer word element i in VRA is compared to signed-integer word element i in VRB. Word element i in VRT is set to all 1s if signed-integer word element i in VRA is greater than signed-integer word element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
252
Power ISA I
VRA
16
518
31 0
VRA
16
582
31
do i=0 to 127 by 8 ((VRA)i:i+7 >ui (VRB)i:i+7) ? 81 : 80 VRTi:i+7 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 15, do the following. Unsigned-integer byte element i in VRA is compared to unsigned-integer byte element i in VRB. Byte element i in VRT is set to all 1s if unsigned-integer byte element i in VRA is greater than to unsigned-integer byte element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
do i=0 to 127 by 16 ((VRA)i:i+15 >ui (VRB)i:i+15) ? 161 : 160 VRTi:i+15 if Rc=1 then do (VRT=1281) t (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 7, do the following. Unsigned-integer halfword element i in VRA is compared to unsigned-integer halfword element i in VRB. Halfword element i in VRT is set to all 1s if unsigned-integer halfword element i in VRA is greater than to unsigned-integer halfword element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
VRA
16
646
31
do i=0 to 127 by 32 ((VRA)i:i+31 >ui (VRB)i:i+31) ? 321 : 320 VRTi:i+31 if Rc=1 then do t (VRT=1281) (VRT=1280) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 3, do the following. Unsigned-integer word element i in VRA is compared to unsigned-integer word element i in VRB. Word element i in VRT is set to all 1s if unsigned-integer word element i in VRA is greater than to unsigned-integer word element i in VRB, and is set to all 0s otherwise. Special Registers Altered: CR6 (if Rc=1)
253
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1092
31
VRT
The contents of VRA are ANDed with the complement of the contents of VRB and the result is placed into VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1284
31
Vector Complement Register The Vector NOR instruction can be coded in a way such that it complements the contents of one Vector Register and places the result into another Vector Register. An extended mnemonic is provided that allows this operation to be coded easily. The following instruction complements the contents of register Vy and places the result into register Vx. vnot Vx,Vy (equivalent to: vnor Vx,Vy,Vy)
VRT
( (VRA) | (VRB) )
The contents of VRA are ORed with the contents of VRB and the complemented result is placed into VRT. Special Registers Altered: None
Vector Logical OR
vor 4
0 6
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1156
31
VRA
16
VRB
21
1028
31
VRT
(VRA) | (VRB)
VRT
The contents of VRA are ORed with the contents of VRB and the result is placed into VRT. Special Registers Altered: None
The contents of VRA are ANDed with the contents of VRB and the result is placed into VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1220
31
VRT
(VRA) (VRB)
The contents of VRA are XORed with the contents of VRB and the result is placed into VRT. Special Registers Altered: None
254
Power ISA I
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
4
31 0
VRA
16
VRB
21
68
31
do i=0 to 127 by 8 sh (VRB)i+5:i+7 VRTi:i+7 (VRA)i:i+7 <<< sh For each vector element i from 0 to 15, do the following. Byte element i in VRA is rotated left by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB. The result is placed into byte element i in VRT. Special Registers Altered: None
do i=0 to 127 by 16 sh (VRB)i+12:i+15 VRTi:i+15 (VRA)i:i+15 <<< sh For each vector element i from 0 to 7, do the following. Halfword element i in VRA is rotated left by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB. The result is placed into halfword element i in VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
132
31
do i=0 to 127 by 32 sh (VRB)i+27:i+31 VRTi:i+31 (VRA)i:i+31 <<< sh For each vector element i from 0 to 3, do the following. Word element i in VRA is rotated left by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. The result is placed into word element i in VRT. Special Registers Altered: None
255
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
260
31 0
VRA
16
VRB
21
324
31
do i=0 to 127 by 8 sh (VRB)i+5:i+7 VRTi:i+7 (VRA)i:i+7 << sh For each vector element i from 0 to 15, do the following. Byte element i in VRA is shifted left by the number of bits specified in the low-order 3 bits of byte element i in VRB. - Bits shifted out of bit 0 are lost. - Zeros are supplied to the vacated bits on the right. The result is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 sh (VRB)i+12:i+15 VRTi:i+15 (VRA)i:i+15 << sh For each vector element i from 0 to 7, do the following. Halfword element i in VRA is shifted left by the number of bits specified in the low-order 4 bits of halfword element i in VRB. - Bits shifted out of bit 0 are lost. - Zeros are supplied to the vacated bits on the right. The result is placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
388
31
do i=0 to 127 by 32 (VRB)i+27:i+31 sh VRTi:i+31 (VRA)i:i+31 << sh For each vector element i from 0 to 3, do the following. Word element i in VRA is shifted left by the number of bits specified in the low-order 5 bits of word element i in VRB. - Bits shifted out of bit 0 are lost. - Zeros are supplied to the vacated bits on the right. The result is placed into word element i of VRT. Special Registers Altered: None
256
Power ISA I
VX-form
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
516
31 0
VRA
16
VRB
21
580
31
do i=0 to 127 by 8 sh (VRB)i+5:i+7 VRTi:i+7 (VRA)i:i+7 >>ui sh For each vector element i from 0 to 15, do the following. Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of byte element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 sh (VRB)i+12:i+15 VRTi:i+15 (VRA)i:i+15 >>ui sh For each vector element i from 0 to 7, do the following. Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of halfword element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into halfword element i of VRT. Special Registers Altered: None
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
644
31
do i=0 to 127 by 32 (VRB)i+27:i+31 sh VRTi:i+31 (VRA)i:i+31 >>ui sh For each vector element i from 0 to 3, do the following. Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of word element i in VRB. Bits shifted out of the least-significant bit are lost. Zeros are supplied to the vacated bits on the left. The result is placed into word element i of VRT. Special Registers Altered: None
257
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
772
31 0
VRA
16
VRB
21
836
31
do i=0 to 127 by 8 (VRB)i+5:i+7 sh VRTi:i+7 (VRA)i:i+7 >>si sh For each vector element i from 0 to 15, do the following. Byte element i in VRA is shifted right by the number of bits specified in the low-order 3 bits of the corresponding byte element i in VRB. Bits shifted out of bit 7 of the byte element are lost. Bit 0 of the byte element is replicated to fill the vacated bits on the left. The result is placed into byte element i of VRT. Special Registers Altered: None
do i=0 to 127 by 16 (VRB)i+12:i+15 sh VRTi:i+15 (VRA)i:i+15 >>si sh For each vector element i from 0 to 7, do the following. Halfword element i in VRA is shifted right by the number of bits specified in the low-order 4 bits of the corresponding halfword element i in VRB. Bits shifted out of bit 15 of the halfword are lost. Bit 0 of the halfword is replicated to fill the vacated bits on the left. The result is placed into halfword element i of VRT. Special Registers Altered: None
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
900
31
do i=0 to 127 by 32 (VRB)i+27:i+31 sh VRTi:i+31 (VRA)i:i+31 >>si sh For each vector element i from 0 to 3, do the following. Word element i in VRA is shifted right by the number of bits specified in the low-order 5 bits of the corresponding word element i in VRB. Bits shifted out of bit 31 of the word are lost. Bit 0 of the word is replicated to fill the vacated bits on the left. The result is placed into word element i of VRT. Special Registers Altered: None
258
Power ISA I
VX-form
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
10
31 0
VRA
16
VRB
21
74
31
do i=0 to 127 by 32 VRTi:i+31 RoundToNearSP((VRA)i:i+31 +fp (VRB)i:i+31) For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is added to single-precision floating-point element i in VRB. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT. Special Registers Altered: None
do i=0 to 127 by 32 VRTi:i+31 RoundToNearSP((VRA)i:i+31 -fp (VRB)i:i+31) For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRB is subtracted from single-precision floating-point element i in VRA. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT. Special Registers Altered: None
259
VRT,VRA,VRC,VRB VRT
11
VRA
16
VRB
21
VRC
26
46
31 0
VRA
16
VRB
21
VRC
26
47
31
do i=0 to 127 by 32 (VRA)i:i+31 fp (VRC)i:i+31 prod VRTi:i+31 RoundToNearSP( prod +fp (VRB)i:i+31 ) For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is added to the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number and placed into word element i of VRT. Special Registers Altered: None Programming Note To use a multiply-add to perform an IEEE or Java compliant multiply, the addend must be -0.0. This is necessary to insure that the sign of a zero result will be correct when the product is -0.0 (+0.0 + -0.0 +0.0, and -0.0 + -0.0 -0.0). When the sign of a resulting 0.0 is not important, then +0.0 can be used as an addend which may, in some cases, avoid the need for a second register to hold a -0.0 in addition to the integer 0/floating-point +0.0 that may already be available.
do i=0 to 127 by 32 (VRA)i:i+31 fp (VRC)i:i+31 prod0:inf VRTi:i+31 -RoundToNearSP(prod0:inf -fp (VRB)i:i+31) For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is multiplied by single-precision floating-point element i in VRC. Single-precision floating-point element i in VRB is subtracted from the infinitely-precise product. The intermediate result is rounded to the nearest single-precision floating-point number, then negated and placed into word element i of VRT. Special Registers Altered: None
260
Power ISA I
VRT,VRA,VRB VRT
11
VRA
16
VRB
21
1034
31 0
VRA
16
VRB
21
1098
31
do i=0 to 127 by 32 ( (VRA)i:i+31 >fp (VRB)i:i+31 ) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. The larger of the two values is placed into word element i of VRT. The maximum of +0 and -0 is +0. The maximum of any value and a NaN is a QNaN. Special Registers Altered: None
do i=0 to 127 by 32 ( (VRA)i:i+31 <fp (VRB)i:i+31 ) VRTi:i+31 ? (VRA)i:i+31 : (VRB)i:i+31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. The smaller of the two values is placed into word element i of VRT. The minimum of +0 and -0 is -0. The minimum of any value and a NaN is a QNaN. Special Registers Altered: None
261
VRT,VRB,UIM VRT
11
UIM
16
VRB
21
970
31 0
UIM
16
VRB
21
906
31
do i=0 to 127 by 32 VRTi:i+31 ConvertSPtoSXWsaturate((VRB)i:i+31, UIM) For each vector element i from 0 to 3, do the following. Single-precision floating-point word element i in VRB is multiplied by 2UIM. The product is converted to a 32-bit signed fixed-point integer using the rounding mode Round toward Zero. - If the intermediate result is greater than 231-1 the result saturates to 231-1. - If the intermediate result is less than -231 the result saturates to -231. The result is placed into word element i of VRT. Special Registers Altered: SAT Extended Mnemonics: Example of an extended mnemonics for Vector Convert to Signed Fixed-Point Word Saturate: Extended: Equivalent to: vcfpsxws VRT,VRB,UIM vctsxs VRT,VRB,UIM
do i=0 to 127 by 32 VRTi:i+31 ConvertSPtoUXWsaturate((VRB)i:i+31, UIM) For each vector element i from 0 to 3, do the following. Single-precision floating-point word element i in VRB is multiplied by 2UIM. The product is converted to a 32-bit unsigned fixed-point integer using the rounding mode Round toward Zero. - If the intermediate result is greater than 232-1 the result saturates to 232-1. The result is placed into word element i of VRT. Special Registers Altered: SAT Extended Mnemonics: Example of an extended mnemonics for Vector Convert to Unsigned Fixed-Point Word Saturate: Extended: Equivalent to:
262
Power ISA I
VX-form
VRT,VRB,UIM VRT
11
UIM
16
VRB
21
842
31 0
UIM
16
VRB
21
778
31
do i=0 to 127 by 32 VRTi:i+31 ConvertSXWtoSP( (VRB)i:i+31 ) fp 2UIM For each vector element i from 0 to 3, do the following. Signed fixed-point word element i in VRB is converted to the nearest single-precision floating-point value. Each result is divided by 2UIM and placed into word element i of VRT. Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Vector Convert from Signed Fixed-Point Word Extended: Equivalent to: vcsxwfp VRT,VRB,UIM vcfsx VRT,VRB,UIM
do i=0 to 127 by 32 VRTi:i+31 ConvertUXWtoSP( (VRB)i:i+31 ) fp 2UIM For each vector element i from 0 to 3, do the following. Unsigned fixed-point word element i in VRB is converted to the nearest single-precision floating-point value. The result is divided by 2UIM and placed into word element i of VRT. Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Vector Convert from Unsigned Fixed-Point Word Extended: Equivalent to: vcuxwfp VRT,VRB,UIM vcfux VRT,VRB,UIM
263
VRT,VRB VRT
11
///
16
VRB
21
714
31 0
///
16
VRB
21
522
31
do i=0 to 127 by 32 RoundToSPIntFloor( (VRB)0:31 ) VRT0:31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRB is rounded to a single-precision floating-point integer using the rounding mode Round toward -Infinity. The result is placed into the corresponding word element i of VRT. Special Registers Altered: None Programming Note The Vector Convert To Fixed-Point Word instructions support only the rounding mode Round toward Zero. A floating-point number can be converted to a fixed-point integer using any of the other three rounding modes by executing the appropriate Vector Round to Floating-Point Integer instruction before the Vector Convert To Fixed-Point Word instruction. Programming Note The fixed-point integers used by the Vector Convert instructions can be interpreted as consisting of 32-UIM integer bits followed by UIM fraction bits.
do i=0 to 127 by 32 RoundToSPIntNear( (VRB)0:31 ) VRT0:31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRB is rounded to a single-precision floating-point integer using the rounding mode Round to Nearest. The result is placed into the corresponding word element i of VRT. Special Registers Altered: None
VRT,VRB VRT
11
///
16
VRB
21
650
31 0
///
16
VRB
21
586
31
do i=0 to 127 by 32 RoundToSPIntCeil( (VRB)0:31 ) VRT0:31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRB is rounded to a single-precision floating-point integer using the rounding mode Round toward +Infinity. The result is placed into the corresponding word element i of VRT. Special Registers Altered: None
do i=0 to 127 by 32 RoundToSPIntTrunc( (VRB)0:31 ) VRT0:31 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRB is rounded to a single-precision floating-point integer using the rounding mode Round toward Zero. The result is placed into the corresponding word element i of VRT. Special Registers Altered: None
264
Power ISA I
Programming Note Each single-precision floating-point word element in VRB should be non-negative; if it is negative, the corresponding element in VRA will necessarily be out of bounds. One exception to this is when the value of an element in VRB is -0.0 and the value of the corresponding element in VRA is either +0.0 or -0.0. +0.0 and -0.0 compare equal to -0.0.
VRA
16
966
31
do i=0 to 127 by 32 ( (VRA)i:i+31 fp (VRB)i:i+31 ) le ( (VRA)i:i+31 fp -(VRB)i:i+31 ) ge VRTi:i+31 le || ge || 300 if Rc=1 then do (VRT=1280) ib CR6 0b00 || ib || 0b0 For each vector element i from 0 to 3, do the following. Single-precision floating-point word element i in VRA is compared to single-precision floating-point word element i in VRB. A 2-bit value is formed that indicates whether the element in VRA is within the bounds specified by the element in VRB, as follows. - Bit 0 of the 2-bit value is set to 0 if the element in VRA is less than or equal to the element in VRB, and is set to 1 otherwise. - Bit 1 of the 2-bit value is set to 0 if the element in VRA is greater than or equal to the negation of the element in VRB, and is set to 1 otherwise. The 2-bit value is placed into the high-order two bits of word element i of VRT and the remaining bits of element i are set to 0. If Rc=1, CR field 6 is set as follows. Bit 0 1 2 Description Set to 0 Set to 0 Set to indicate whether all four elements in VRA are within the bounds specified by the corresponding element in VRB, otherwise set to 0. Set to 0 (if Rc=1)
VC-form
(Rc=0) (Rc=1)
VRA
16
VRB
Rc
21 22
198
31
do i=0 to 127 by 32 ((VRA)i:i+31 =fp (VRB)i:i+31) ? 321 : 320 VRTi:i+31 if Rc=1 then do ( VRT=1281 ) t ( VRT=1280 ) f CR6 t || 0b0 || f || 0b0 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is equal to single-precision floating-point element i in VRB, and is set to all 0s otherwise. If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating not equal to. If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 1s, indicating equal to. Special Registers Altered: CR6 (if Rc=1)
265
VC-form
(Rc=0) (Rc=1)
VRA
16
454
31 0
VRA
16
VRB
Rc
21 22
710
31
do i=0 to 127 by 32 VRTi:i+31 ((VRA)i:i+31 fp (VRB)i:i+31) ? 321 : 320 if Rc=1 then do ( VRT=1281 ) t ( VRT=1280 ) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than or equal to single-precision floating-point element i in VRB, and is set to all 0s otherwise. If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating not greater than or equal to. If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 1s, indicating greater than or equal to. Special Registers Altered: CR6 (if Rc=1)
do i=0 to 127 by 32 ((VRA)i:i+31 >fp (VRB)i:i+31) ? 321 : 320 VRTi:i+31 if Rc=1 then do ( VRT=1281 ) t ( VRT=1280 ) f t || 0b0 || f || 0b0 CR6 For each vector element i from 0 to 3, do the following. Single-precision floating-point element i in VRA is compared to single-precision floating-point element i in VRB. Word element i in VRT is set to all 1s if single-precision floating-point element i in VRA is greater than single-precision floating-point element i in VRB, and is set to all 0s otherwise. If the source element i in VRA or the source element i in VRB is a NaN, VRT is set to all 0s, indicating not greater than. If the source element i in VRA and the source element i in VRB are both infinity with the same sign, VRT is set to all 0s, indicating not greater than. Special Registers Altered: CR6 (if Rc=1)
266
Power ISA I
VX-form
VRT,VRB VRT
11
///
16
VRB
21
394
31 0
///
16
VRB
21
458
31
do i=0 to 127 by 32 Power2EstimateSP( (VRB)i:i+31 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. The single-precision floating-point estimate of 2 raised to the power of single-precision floating-point element i in VRB is placed into word element i of VRT. Let x be any single-precision floating-point input value. Unless x< -146 or the single-precision floating-point result of computing 2 raised to the power x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16. The most significant 12 bits of the estimates significand are monotonic. An integral input value returns an integral value when the result is representable. The result for various special cases of the source value is given below. Value - Infinity -0 +0 +Infinity NaN Special Registers Altered: None Result +0 +1 +1 +Infinity QNaN
do i=0 to 127 by 32 LogBase2EstimateSP((VRB)i:i+31) VRTi:i+31 For each vector element i from 0 to 3, do the following. The single-precision floating-point estimate of the base 2 logarithm of single-precision floating-point element i in VRB is placed into the corresponding word element of VRT. Let x be any single-precision floating-point input value. Unless | x-1 | is less than or equal to 0.125 or the single-precision floating-point result of computing the base 2 logarithm of x would be an infinity or a QNaN, the estimate has an absolute error in precision (absolute value of the difference between the estimate and the infinitely precise value) no greater than 2-5. Under the same conditions, the estimate has a relative error in precision no greater than one part in 8. The most significant 12 bits of the estimates significand are monotonic. The estimate is exact if x=2y, where y is an integer between -149 and +127 inclusive. Otherwise the value placed into the element of register VRT may vary between implementations, and between different executions on the same implementation. The result for various special cases of the source value is given below. Value - Infinity <0 -0 +0 +Infinity NaN Special Registers Altered: None Result QNaN QNaN - Infinity - Infinity +Infinity QNaN
267
VX-form
VRT,VRB VRT
11
///
16
VRB
21
266
31 0
///
16
VRB
21
330
31
do i=0 to 127 by 32 ReciprocalEstimateSP( (VRB)i:i+31 ) VRTi:i+31 For each vector element i from 0 to 3, do the following. The single-precision floating-point estimate of the reciprocal of single-precision floating-point element i in VRB is placed into word element i of VRT. Unless the single-precision floating-point result of computing the reciprocal of a value would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096. Note that results may vary between implementations, and between different executions on the same implementation. The result for various special cases of the source value is given below. Value - Infinity -0 +0 +Infinity NaN Special Registers Altered: None Result -0 - Infinity + Infinity +0 QNaN
do i=0 to 127 by 32 ReciprocalSquareRootEstimateSP( VRTi:i+31 (VRB)i:i+31 ) For each vector element i from 0 to 3, do the following. The single-precision floating-point estimate of the reciprocal of the square root of single-precision floating-point element i in VRB is placed into word element i of VRT. Let x be any single-precision floating-point value. Unless the single-precision floating-point result of computing the reciprocal of the square root of x would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 4096. Note that results may vary between implementations, and between different executions on the same implementation. The result for various special cases of the source value is given below. Value - Infinity <0 -0 +0 +Infinity NaN Special Registers Altered: None Result QNaN QNaN - Infinity + Infinity +0 QNaN
268
Power ISA I
VRB ///
16
VRB
21
1604
31 0
///
21
1540
31
VSCR
(VRB)96:127
VRT
96
0 || (VSCR)
The contents of word element 3 of VRB are placed into the VSCR. Special Registers Altered: None
The contents of the VSCR are placed into word element 3 of VRT. The remaining word elements in VRT are set to 0. Special Registers Altered: None
269
270
Power ISA I
271
272
Power ISA I
7.1 Introduction
7.1.1 Overview of the Vector-Scalar Extension
Category Vector-Scalar Extension (VSX) provides facilities supporting vector and scalar binary floating-point operations. The following VSX features are provided to increase opportunities for vectorization. A unified register file, a set of Vector-Scalar Registers (VSR), supporting both scalar and vector operations is provided, eliminating the overhead of vector-scalar data transfer through storage. Support for word-aligned storage accesses for both scalar and vector operations is provided. Robust support for IEEE-754 for both vector and scalar floating-point operations is provided. Combining the Floating-Point Registers (FPR) defined in Chapter 4. Floating-Point Facility [Category: Floating-Point] and the Vector Registers (VR) defined in Chapter 6. Vector Facility [Category: Vector] provides additional registers to support more aggressive compiler optimizations for both vector and scalar operations. Implementations of VSX must also implement the Floating-Point (Chapter 4) and Vector (Chapter 6) categories.
Programming Note Application binary interfaces extended to support VSX require special care of vector data written to VSRs 0-31 (i.e., VSRs corresponding to FPRs). Legacy scalar function calls employ doubleword-based loads and stores to preserve the contents of any nonvolatile registers, This has the adverse effect of not preserving the contents of doubleword 1 of these VSRs.
Sixty-four 128-bit VSRs are provided. See Figure 106 All VSX floating-point computations and other data manipulation are performed on data residing in Vector-Scalar Registers, and results are placed into a VSR. Depending on the instruction, the contents of a VSR are interpreted as a sequence of equal-length elements (words or doublewords) or as a quadword. Each of the elements is aligned at its natural boundary within the VSR, as shown in Figure 106. Many instructions perform a given operation in parallel on all elements in a VSR. Depending on the instruction, a word element can be interpreted as a signed integer word (SW), an unsigned integer word (UW), a logical mask value (MW), or a single-precision floating-point value (SP); a doubleword element can be interpreted as a doubleword signed integer (SD), a doubleword unsigned integer (UD), a doubleword mask (DM), or a double-precision floating-point value (DP). In the instructions descriptions, phrases like signed integer word element are used as shorthand for word element, interpreted as a signed integer.
7.1.1.1 Compatibility with Category Floating-Point and Category Decimal Floating-Point Operations
The instruction sets defined in Chapter 4. Floating-Point Facility [Category: Floating-Point] and Chapter 5. Decimal Floating-Point [Category: Decimal Floating-Point] retain their definition with one primary difference. The FPRs are mapped to doubleword element 0 of VSRs 0-31. The contents of doubleword 1 of the VSR corresponding to a source FPR specified by an instruction are ignored. The contents of doubleword 1 of a VSR corresponding to the target FPR specified by an instruction are undefined.
273
SD/UD/MD/DP 1 SW/UW/MW/SP 2
64 96
SW/UW/MW/SP 1
SW/UW/MW/SP 3
127
doubleword element 0 of VSR[0], FPR[1] is located in doubleword element 0 of VSR[1], and so forth. All instructions that operate on an FPR are redefined to operate on doubleword element 0 of the corresponding VSR. The contents of doubleword element 1 of the VSR corresponding to a source FPR or FPR pair for these instructions are ignored and the contents of doubleword element 1 of the VSR corresponding to the target FPR or FPR pair for these instructions are undefined.
VSR[0] VSR[1]
FPR[0] FPR[1]
FPR[30] FPR[31]
VSR[62] VSR[63]
0 63 127
274
Power ISA I
All instructions that operate on a VR are redefined to operate on the corresponding VSR.
VR[30] VR[31]
127
275
35
32
276
Power ISA I
46
277
49
Greater
Than
or
50 51 52 53
Floating-Point Equal or Zero (FE) Floating-Point Unordered or NaN (FU) Reserved Floating-Point Invalid Operation Exception (Software-Defined Condition) (VXSOFT) This bit can be altered only by mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1. See Section 7.4.1 , Floating-Point Invalid Operation Exception on page 295. Programming Note VXSOFT can be used by software to indicate the occurrence of an arbitrary, software-defined, condition that is to be treated as an Invalid Operation exception. For example, the bit could be set by a program that computes a base 10 logarithm if the supplied input is negative. Floating-Point Invalid Operation Exception (Invalid Square Root) (VXSQRT) This bit is set to 1 when a VSX Scalar Floating-Point Arithmetic or VSX Vector Floating-Point Arithmetic class instruction causes a Invalid Square Root type Invalid Operation exception. See Section 7.4.1 , Floating-Point Invalid Operation Exception on page 295. This bit can be set to 0 or 1 by a Move To FPSCR class instruction.
48:51
55
Floating-Point Invalid Operation Exception (Invalid Integer Convert) (VXCVI) This bit is set to 1 when a VSX Scalar Convert Double-Precision to Integer, VSX Vector Convert Double-Precision to Integer, or VSX Vector Convert Single-Precision to Integer class instruction causes a Invalid Integer Convert type Invalid Operation exception. See Section 7.4.1 , Floating-Point Invalid Operation Exception on page 295. This bit can be set to 0 or 1 by a Move To FPSCR class instruction.
278
Power ISA I
57
58
59
60
61
279
When the processor is in floating-point non-IEEE mode, the remaining FPSCR bits is permitted to have meanings different from those given in this document, and floating-point operations need not conform to the IEEE standard. The effects of executing a given floating-point instruction with NI=1, and any additional requirements for using non-IEEE mode, are implementation-dependent. The results of executing a given instruction in non-IEEE mode is permitted to vary between implementations, and between different executions on the same implementation. Programming Note When the processor is in floating-point non-IEEE mode, the results of floating-point operations is permitted to be approximate, and performance for these operations might be better, more predictable, or less data-dependent than when the processor is not in non-IEEE mode. For example, in non-IEEE mode an implementation is permitted to return 0 instead of a denormalized number and return a large number instead of an infinity. 62:63 Floating-Point Rounding Control (RN) This field is used by VSX Scalar Floating-Point and VSX Vector Floating-Point class instructions that round their result and the rounding mode is not implied by the opcode. This bit can be explicitly set or reset by a new Move To FPSCR class instruction. See Section 7.3.2.6 , Rounding on page 287. 00 01 10 11 Round to Nearest Even Round toward Zero Round toward +Infinity Round toward -Infinity
280
Power ISA I
Result Flags C 1 0 0 1 1 0 1 0 0 FL FG FE FU 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 1
Quiet NaN - Infinity - Normalized Number - Denormalized Number - Zero + Zero + Denormalized Number + Normalized Number + Infinity
Table 2.
281
listed in Sections 7.6.1, 7.6.2.1, and 7.6.8 through 7.6.9. A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number 2exponent. Encodings are provided in the data format to represent finite numeric values, Infinity, and values that are Not a Number (NaN). Operations involving infinities produce results obeying traditional mathematical conventions. NaNs have no mathematical interpretation. Their encoding permits a variable diagnostic information field. NaNs might be used to indicate such things as uninitialized variables and can be produced by certain invalid operations. There is one class of exceptional events that occur during instruction execution that is unique to categories Vector-Scalar Extension and Floating-Point: the Floating-Point Exception. Floating-point exceptions are signaled with bits set in the FPSCR. They can cause the system floating-point enabled exception error handler to be invoked, precisely or imprecisely, if the proper control bits are set. Floating-Point Exceptions The following floating-point exceptions are detected by the processor: Invalid Operation exception SNaN Infinity-Infinity InfinityInfinity ZeroZero InfinityZero Invalid Compare Software-Defined Condition Invalid Square Root Invalid Integer Convert Zero Divide exception Overflow exception Underflow exception Inexact exception (VX) (VXSNAN) (VXISI) (VXIDI) (VXZDZ) (VXIMZ) (VXVC) (VXSOFT) (VXSQRT) (VXCVI) (ZX) (OX) (UX) (XX)
Each floating-point exception, and each category of Invalid Operation exception, has an exception bit in the FPSCR. In addition, each floating-point exception has a corresponding enable bit in the FPSCR. See Section 7.2.2, Floating-Point Status and Control Register on page 276 for a description of these exception and enable bits, and Section 7.3.3 , VSX Floating-Point Execution Models on page 289 for a detailed discussion of floating-point exceptions, including the effects of the enable bits.
282
Power ISA I
7.3.2
S
0
EXP
9
FRACTION
31
EXP
12
FRACTION
63
Figure 111.Floating-point double-precision format Single-Precision Format Exponent Bias Maximum Exponent (Emax) Minimum Exponent (Emin) Widths (bits): Format Sign Exponent Fraction Significand Nmax Nmin Dmin Dmin Nmax Nmin Table 3. +127 +127 -126 32 1 8 23 24 (1-2-24) x 2128 3.4 x 1038 1.0 x 2-126 1.2 x 10-38 1.0 x 2-149 1.4 x 10-45 Value is approximate Smallest (in magnitude) representable denormalized number. Largest (in magnitude) representable number. Smallest (in magnitude) representable normalized number. IEEE floating-point fields Double-Precision Format +1023 +1023 -1022 64 1 11 52 53 (1-2-53) x 21024 1.8 x 10308 1.0 x 2-1022 2.2 x 10-308 1.0 x 2-1074 4.9 x 10-324
283
Zero values (0) These are values that have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. The sign of zero is ignored by comparison operations (that is, comparison regards +0 as equal to -0). Denormalized numbers (DEN) These are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows: DEN = (-1)s x 2Emin x (0.fraction) where Emin is the minimum representable exponent value (-126 for single-precision, -1022 for double-precision). Infinities (INF) These are values that have the maximum biased exponent value: 255 in single-precision format 2047 in double-precision format and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value. Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities. Infinities and the real numbers can be related by ordering in the affine sense: -Infinity < every finite number < +Infinity Arithmetic on infinities is always exact and does not signal any exception, except when an exception occurs due to the invalid operations as described in Section 7.4.1 , Floating-Point Invalid Operation Exception on page 295. For comparison operations, +Infinity compares equal to +Infinity and -Infinity compares equal to -Infinity. Not a Numbers (NaNs) These are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored (that is, NaNs are neither positive nor negative). If the high-order bit of the fraction field is 0, the NaN is a Signaling NaN; otherwise it is a Quiet NaN. Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.
The NaNs are not related to the numeric values or infinities by order or value but are encodings used to convey diagnostic information such as the representation of uninitialized variables. The following is a description of the different floating-point values defined in the architecture: Binary floating-point numbers Machine representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values. Normalized numbers (NOR) These are values that have a biased exponent value in the range: 1 to 254 in single-precision format 1 to 2046 in double-precision format They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows: NOR = (-1)s x 2E x (1.fraction) where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part.
284
Power ISA I
f(src1,src3,src2) ex: result = (src1 x src3) - src2 f(src1,src2) ex: result = src1 x src2 ex: result = src1 + src2 f(src1) ex: result = f(src1) When a QNaN is the result of a floating-point operation because one of the operands is a NaN or because a QNaN was generated due to a trap-disabled Invalid Operation exception, the following rule is applied to determine the NaN with the high-order fraction bit set to 1 that is to be stored as the result. if src1 is a NaN then result = Quiet(src1) else if src2 is a NaN (if there is a src2) then result = Quiet(src2) else if src3 is a NaN (if there is a src3) then result = Quiet(src3) else if disabled invalid operation exception then result = generated QNaN where Quiet(x) means x if x is a QNaN and x converted to a QNaN if x is an SNaN. Any instruction that generates a QNaN as the result of a disabled Invalid Operation exception generates the value 0x7FF8_0000_0000_0000 for double-precision and 0x7FC0_0000 for single-precision. Note that the M-form multiply-add-type instructions use the B source operand to specify src3 and the T target operand to specify src2, whereas A-form multiply-add-type instructions use the B source operand to specify src2 and the T target operand to specify src3. A double-precision NaN is considered to be representable in single-precision format if and only
285
286
Power ISA I
7.3.2.6 Rounding
The material in this section applies to operations that have numeric operands (that is, operands that are not infinities or NaNs). Rounding the intermediate result of such an operation can cause an Overflow exception, an Underflow exception, or an Inexact exception. The remainder of this section assumes that the operation causes no exceptions and that the result is numeric. See Section 7.3.2.2, Value Representation and Section 7.4, VSX Floating-Point Exceptions for the cases not covered here. The floating-point arithmetic, and rounding and conversion instructions round their intermediate results. With the exception of the estimate instructions, these instructions produce an intermediate result that can be regarded as having unbounded precision and exponent range. All but two groups of these instructions normalize or denormalize the intermediate result prior to rounding and then place the final result into the target element of the target VSR in either double or single-precision format. The scalar round to double-precision integer, vector round to double-precision integer, and convert double-precision to integer instructions with biased exponents ranging from 1022 through 1074 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 1075. (Intermediate results with biased exponents 1075 or larger are already integers, and with biased exponents 1021 or less round to zero.) After rounding, the final result for round to double-precision integer instructions is normalized and put in double-precision format, and, for the convert double-precision to integer instructions, is converted to a signed or unsigned integer. The vector round to single-precision integer and vector convert single-precision to integer instructions with biased exponents ranging from 126 through 178 are prepared for rounding by repetitively shifting the significand right one position and incrementing the biased exponent until it reaches a value of 179. (Intermediate results with biased exponents 179 or larger are already integers, and with biased exponents 125 or less round to zero.) After rounding, the final result for vector round to single-precision integer is normalized and put in double-precision format, and for
A fifth rounding mode is provided in the round to floating-point integer instructions (Section 7.6.7.2 on page 320), Round to Nearest Away. Let Z be the intermediate arithmetic result or the operand of a convert operation. If Z can be represented exactly in the target format, the result in all rounding modes is Z as represented in the target format. If Z cannot be represented exactly in the target format, let Z1 and Z2 bound Z as the next larger and next smaller numbers representable in the target format. Then Z1 or Z2 can be used to approximate the result in the target format. Figure 113 shows the relation of Z, Z1, and Z2 in this case. The following rules specify the rounding in the four modes. See Section 7.3.3.1, VSX Execution Model for IEEE Operations on page 289 for a detailed explanation of rounding. Figure 113 also summarizes the rounding actions for floating-point intermediate result for all supported rounding modes.
287
By Incrementing the least-significant bit of Z Infinitely-Precise Value By Truncating after the least-significant bit
Z2
Z1 Negative values
Z2 Z1 Z Positive values
Round to Nearest Away Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is furthest away from 0. Round to Nearest Even Choose the value that is closer to Z (Z1 or Z2). In case of a tie, choose the one that is even (least significant bit is 0). Round toward Zero Choose the smaller in magnitude (Z1 or Z2). Round toward +Infinity Choose Z1. Round toward -Infinity Choose Z2. Figure 113.Selection of Z1 and Z2
288
Power ISA I
FRACTION
G R X
53 54 55
Figure 114.IEEE floating-point execution model The S bit is the sign bit. The C bit is the carry bit, which captures the carry out of the significand. The L bit is the leading unit bit of the significand, which receives the implicit bit from the operand. The FRACTION is a 52-bit field that accepts the fraction of the operand. The Guard (G), Round (R), and Sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that appear to the low-order side of the R bit, resulting from either shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit. Table 4 shows the significance of the G, R, and X bits with respect to the intermediate result (IR), the representable number next lower in magnitude (NL), and the representable number next higher in magnitude (NH). G 0 R 0 0 1 1 0 0 1 1 X 0 IR is exact 1 0 IR closer to NL 1 0 IR midway between NL and NH 1 0 IR closer to NH 1 Interpretation of G, R, and X bits Interpretation
0 0 0 1 1 1 1
Table 4.
Table 5 shows the positions of the Guard, Round, and Sticky bits for double-precision and single-precision floating-point numbers relative to the accumulator illustrated in Figure 114.
289
Location of the Guard, Round, and Sticky bits in the IEEE execution model
The significand of the intermediate result is prepared for rounding by shifting its contents right, if required, until the least significant bit to be retained is in the low-order bit position of the fraction. Four user-selectable rounding modes are provided through RN as described in Section 7.3.2.6, Rounding on page 287. The rules for rounding in each mode are as follows. Round to Nearest Even Guard bit = 0 The result is truncated. Guard bit = 1 Depends on Round and Sticky bits: Case a If the Round or Sticky bit is 1 (inclusive), the result is incremented. Case b If the Round and Sticky bits are 0 (result midway between closest representable values), if the low-order bit of the result is 1, the result is incremented. Otherwise (the low-order bit of the result is 0), the result is truncated. This is the case of a tie rounded to even. Round toward Zero Choose the smaller in magnitude of Z1 or Z2. If the Guard, Round, or Sticky bit is nonzero, the result is inexact. The result is truncated. Round toward +Infinity If positive, the result is incremented. If negative, the result is truncated. Round toward -Infinity If positive, the result is truncated. If negative, the result is incremented. A fifth rounding mode is provided in the VSX Round to Floating-Point Integer instructions (Section 7.6.7.2 on page 320) with the rules for rounding as follows.
FRACTION
X
106
Figure 115.Multiply-add 64-bit execution model The first part of the operation is a multiplication. The multiplication has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), the significand is shifted right one position, shifting the L bit (leading unit bit) into the most significant bit of the FRACTION and shifting the C bit (carry out) into the L bit. All 106 bits (L bit, the FRACTION) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned (shifted) to the right by an amount that is added to that exponent to make it equal to the other inputs exponent. Zeros are shifted into the left of the significand as it is aligned and bits shifted out of bit 105 of the significand are ORed into the X bit. The add operation also produces a result conforming to the above model with the X bit taking part in the add operation.
290
Power ISA I
Location of the Guard, Round, and Sticky bits in the multiply-add execution model
The rules for rounding the intermediate result are the same as those given in Section 7.3.3.1. If the instruction is a negative multiply-add or negative multiply-subtract type instruction, the final result is negated.
291
On exception type... Enabled Invalid Operation Enabled Zero Divide Enabled Overflow Enabled Underflow Enabled Inexact Disabled Invalid Operation Disabled Zero Divide Disabled Overflow Disabled Underflow Disabled Inexact Table 7.
The subsequent sections define each of the floating-point exceptions and specify the action that is taken when they are detected.
292
Power ISA I
FE0 FE1 Description 0 0 Ignore Exceptions Mode Floating-point exceptions do not cause the system floating-point enabled exception error handler to be invoked. 0 1 Imprecise Nonrecoverable Mode The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. It may not be possible to identify the excepting instruction or the data that caused the exception. Results produced by the excepting instruction might have been used by or might have affected subsequent instructions that are executed before the error handler is invoked. 1 0 Imprecise Recoverable Mode The system floating-point enabled exception error handler is invoked at some point at or beyond the instruction that caused the enabled exception. Sufficient information is provided to the error handler for it to identify the excepting instruction, the operands, and correct the result. No results produced by the excepting instruction have been used by or affected subsequent instructions that are executed before the error handler is invoked. 1 1 Precise Mode The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception. Architecture Note The FE0 and FE1 bits must be defined in Book III in a manner such that they can be changed dynamically and can easily be treated as part of a process state. In all cases, the question of whether a floating-point result is stored, and what value is stored, is governed by the FPSCR exception enable bits, as described in subsequent sections, and is not affected by the value of the FE0 and FE1 bits. In all cases in which the system floating-point enabled exception error handler is invoked, all instructions before the instruction at which the system floating-point enabled exception error handler is invoked have been completed, and no instruction after the instruction at which the system floating-point enabled exception error handler is invoked has begun
293
Engineering Note It is permissible for the implementation to be precise in any of the three modes that permit interrupts, or to be recoverable in Nonrecoverable Mode.
294
Power ISA I
Update of VSR[XT] is suppressed. FR and FI are set to zero. FPRF is unchanged. Scalar Floating-Point Compare
One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXVC (if SNaN) (if Invalid Compare)
2. 3.
For VSX Vector Floating-Point Arithmetic, VSX Vector Floating-Point Compare, VSX Vector DP-SP Conversion, VSX Vector Convert Floating-Point to Integer, and VSX Vector Round to Floating-Point Integer instructions: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXISI VXIDI VXZDZ VXIMZ VXVC VXSQRT VXCVI 2. (if SNaN) (if Infinity Infinity) (if Infinity Infinity) (if Zero Zero) (if Infinity Zero) (if Invalid Compare) (if Invalid Square Root) (if Invalid Integer Convert)
295
3. 4.
296
Power ISA I
3. 4.
For the VSX Vector Single-Precision Arithmetic instructions, VSX Vector Single-Precision Maximum/Minimum instructions, the VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp) instruction, and the VSX Vector Round to Single-Precision Integer instructions: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXISI VXIDI VXZDZ VXIMZ VXSQRT 2. (if SNaN) (if Infinity Infinity) (if Infinity Infinity) (if Zero Zero) (if Infinity Zero) (if Invalid Square Root)
The double-precision representation of a Quiet NaN is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. FR and FI are set to 0. FPRF is set to indicate the class of the result (Quiet NaN).
3. 4.
The single-precision representation of a Quiet NaN is placed into its respective word element of VSR[XT]. FR, FI, and FPRF are not modified.
For the VSX Vector Double-Precision Arithmetic instructions, VSX Vector Double-Precision Maximum/Minimum instructions, the VSX Vector Convert Single-Precision to Double-Precision format (xvcvspdp) instruction, and the VSX Vector Round to Double-Precision Integer instructions: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXISI VXIDI VXZDZ VXIMZ VXSQRT 2. (if SNaN) (if Infinity Infinity) (if Infinity Infinity) (if Zero Zero) (if Infinity Zero) (if Invalid Square Root)
3.
The double-precision representation of a Quiet NaN is placed into its respective doubleword element of VSR[XT]. FR, FI, and FPRF are not modified.
3.
297
0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity. 0x8000_0000_0000_0000 is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN. The contents of doubleword element 1 of VSR[XT] are undefined.
0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity. 0x0000_0000_0000_0000 is placed into doubleword element 0 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN. The contents of doubleword element 1 of VSR[XT] are undefined.
3. 4.
3. 4.
For the VSX Scalar Convert Double-Precision to Signed Integer Word (xscvdpsxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
For the VSX Scalar Convert Double-Precision to Unsigned Integer Word (xscvdpuxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
0x7FFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity. 0x8000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
0xFFFF_FFFF is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a positive number or +Infinity. 0x0000_0000 is placed into word element 1 of VSR[XT] if the double-precision operand in doubleword element 0 of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
3. 4.
3. 4.
298
Power ISA I
0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the double-precision operand in the corresponding doubleword element of VSR[XB] is a positive number or +Infinity. 0x8000_0000_0000_0000 is placed into its respective doubleword element i of VSR[XT] if the double-precision operand in the corresponding doubleword element of VSR[XB] is a negative number, -Infinity, or NaN.
0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity. 0x0000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN.
For the VSX Vector Convert Double-Precision to Signed Integer Word (xvcvdpsxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
For the VSX Vector Convert Double-Precision to Unsigned Integer Word (xvcvdpuxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
0x7FFF_FFFF is placed intoword element i2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity. 0x8000_0000 is placed into word element i2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word element i2+1 of VSR[XT] are undefined.
0xFFFF_FFFF is placed into word element i2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a positive number or +Infinity. 0x0000_0000 is placed into word element i2 of VSR[XT] if the double-precision operand in doubleword element i of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word element i2+1 of VSR[XT] are undefined.
3.
3.
299
0x7FFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i2 of VSR[XB] is a positive number or +Infinity. 0x8000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i2 of VSR[XB] is a negative number, -Infinity, or NaN.
0xFFFF_FFFF_FFFF_FFFF is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i2 of VSR[XB] is a positive number or +Infinity. 0x0000_0000_0000_0000 is placed into doubleword element i of VSR[XT] if the single-precision operand in word element i2 of VSR[XB] is a negative number, -Infinity, or NaN.
3.
3.
For the VSX Vector Convert Single-Precision to Signed Integer Word (xvcvspsxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
For the VSX Vector Convert Single-Precision to Unsigned Integer Word (xvcvspuxw) instruction: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXCVI 2. (if SNaN) (if Invalid Integer Convert)
0x7FFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a positive number or +Infinity. 0x8000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element i of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word element i2+1 of VSR[XT] are undefined.
0xFFFF_FFFF is placed into word element i of VSR[XT] if the single-precision operand in the corresponding word element i2 of VSR[XB] is a positive number or +Infinity. 0x0000_0000 is placed into word element i of VSR[XT] if the single-precision operand in word element i i2 of VSR[XB] is a negative number, -Infinity, or NaN. The contents of word element i2+1 of VSR[XT] are undefined.
3.
3.
300
Power ISA I
For the VSX Vector Compare Single-Precision instructions: 1. One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXVC 2. (if SNaN) (if invalid compare)
0x0000_0000 is placed into its respective word element of VSR[XT]. FR, FI, and FPRF are not modified. double-precision compare
3.
One or two of the following Invalid Operation exceptions are set to 1. VXSNAN VXVC (if SNaN) (if invalid compare)
2.
0x0000_0000_0000_0000 is placed into its respective doubleword element of VSR[XT]. FR, FI, and FPRF are not modified.
3.
301
3. 4.
For the VSX Vector Divide Double-Precision (xvdivdp) instructions: 1. 2. ZX is set to 1. The double-precision representation of Infinity, where the sign is determined by the XOR of the signs of the operands, is placed into its respective doubleword element of VSR[XT]. FR, FI, and FPRF are not modified.
3.
For the VSX Vector Divide Single-Precision (xvdivsp) instructions: 1. 2. ZX is set to 1. The single-precision representation of Infinity, where the sign is determined by the XOR of the signs of the operands, is placed into its respective word element of VSR[XT]. The result is placed into its respective word element of VSR[XT] as a single-precision value. FR, FI, and FPRF are not modified.
For the VSX Vector Divide Double-Precision (xvdivdp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Reciprocal Estimate Double-Precision (xvredp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector Reciprocal Square Root Estimate Double-Precision (xvrsqrtedp), or VSX Vector Reciprocal Square Root Estimate Single-Precision (xvrsqrtesp) instructions: 1. 2. 3. 4. ZX is set to 1. Update of VSR[XT] is suppressed for all vector elements. FR and FI are unchanged. FPRF is unchanged.
2.
3.
302
Power ISA I
3. 4.
For the VSX Vector Reciprocal Estimate Double-Precision (xvredp) and VSX Vector Reciprocal Square Root Estimate Double-Precision (xvrsqrtedp) instructions: 1. 2. ZX is set to 1. The double-precision representation of Infinity, where the sign is the sign of the operand, is placed into its respective doubleword element of VSR[XT]. FR, FI, and FPRF are not modified.
3.
For the VSX Vector Reciprocal Estimate Single-Precision (xvresp) and VSX Vector Reciprocal Square Root Estimate Single-Precision (xvrsqrtesp) instructions: 1. 2. ZX is set to 1. The single-precision representation of Infinity, where the sign is the sign of the operand, is placed into its respective word element of VSR[XT]. The result is placed into its respective word element of VSR[XT] as a single-precision value. FR, FI, and FPRF are not modified.
2.
3.
303
3.
4.
3.
4.
304
Power ISA I
4.
4. 5. 6.
For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions: 3. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined. FR is undefined. FI is set to 1. FPRF is set to indicate the class and sign of the result (Infinity or Normal Number).
4. 5. 6.
For VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Single-Precision (xvsubsp), and VSX Vector Single-Precision Multiply-Add Arithmetic instructions: 3. The result is placed into its respective word element of VSR[XT] as a single-precision value. FR, FI, and FPRF are not modified.
4.
305
3.
3.
4.
3.
306
Power ISA I
3.
3.
For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Reciprocal Estimate Double-Precision (xsredp), VSX Scalar Subtract Double-Precision (xssubdp), and VSX Scalar Double-Precision Multiply-Add Arithmetic instructions: 1. 2. UX is set to 1. The result is placed into doubleword element 0 of VSR[XT] as a double-precision value. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to indicate the class and sign of the result ( Normalized Number, Denormalized Number, or Zero).
3.
For VSX Vector Add Single-Precision (xvaddsp), VSX Vector Divide Single-Precision (xvdivsp), VSX Vector Multiply Single-Precision (xvmulsp), VSX Vector Reciprocal Estimate Single-Precision (xvresp), VSX Vector round and Convert Double-Precision to Single-Precision format (xvcvdpsp), VSX Vector Subtract Single-Precision (xvsubsp), and VSX Vector Single-Precision Multiply-Add Arithmetic instructions: 1. 2. UX is set to 1. The result is placed into its respective word element of VSR[XT] as a single-precision value. FR, FI, and FPRF are not modified.
3.
307
3.
For the VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate (xscvdpsxws) and VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate (xscvdpuxws) instructions: 1. 2. XX is set to 1. The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined. FPRF is set to indicate the class and sign of the result.
2.
The action to be taken depends on the setting of the Inexact Exception Enable bit of the FPSCR.
3.
For VSX vector floating-point operations: 1. 2. XX is set to 1. Update of VSR[XT] is suppressed for all vector elements. FR, FI, and FPRF are not modified.
3.
3.
For VSX Scalar Add Double-Precision (xsadddp), VSX Scalar Subtract Double-Precision (xssubdp), VSX Scalar Multiply Double-Precision (xsmuldp), VSX Scalar Divide Double-Precision (xsdivdp), VSX Scalar Square Root Double-Precision (xssqrtdp), VSX Scalar Double-Precision Multiply-Add Arithmetic, VSX Scalar Round to Double-Precision Integer Exact using Current rounding mode (xsrdpic), and VSX Scalar Convert Integer to Double-Precision instructions:
3.
308
Power ISA I
3.
For the VSX Scalar truncate Floating-Point to integer and Convert to Signed Fixed-Point Word format with Saturate instructions: 1. 2. XX is set to 1. The result is placed into word element 1 of VSR[XT] as a single-precision value. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined. FPRF is set to indicate the class and sign of the result.
3.
For VSX vector single-precision floating-point operations: 1. 2. XX is set to 1. The result is placed into its respective word element of VSR[XT] as a single-precision value. FR, FI, and FPRF are not modified.
3.
For VSX vector double-precision floating-point operations: 1. 2. XX is set to 1. The result is placed into its respective doubleword element of VSR[XT] as a double-precision value. FR, FI, and FPRF are not modified.
3.
309
VSR contents when accessing aligned quadword in big-endian storage from Figure 116 Vt,Vs 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
0 1 2 3 4 5 6 7 8 9 A B C D E F
VSR contents when accessing aligned quadword in little-endian storage from Figure 117 Vt,Vs 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
1 2 3 4 5 6 7 8 9 A B C D E F
Figure 118.Vector-Scalar Register contents for aligned quadword Load or Store VSX Vector
Figure 116 illustrates the big-endian storage image of array AW. 0x0000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 0x0010:
0 1 2 3 4 5 6 7 8 9 A B C D E F
Figure 116.Big-endian storage image of array AW Figure 117 illustrates the little-endian storage image of array AW. 0x0000: 03 02 01 00 07 06 05 04 0B 0A 09 08 0F 0E 0D 0C 0x0010:
0 1 2 3 4 5 6 7 8 9 A B C D E F
Figure 117.Little-endian storage image of array AW Figure 118 shows the result of loading that quadword into a VSR or, equivalently, shows the contents that must be in a VSR if storing that VSR is to produce the storage contents shown in Figure 116 for big-endian. Note that Figure shows the effect of loading the quadword from both big-endian storage and little-endian storage.
310
Power ISA I
Figure 119 illustrates both big-endian and little-endian storage images of array B. Big-endian storage image of array B 0x0000: 01 23 45 67 00 11 22 33 44 55 66 77 88 99 AA BB 0x0010: CC DD EE FF
0 1 2 3
Figure 119.Storage images of array B Though this example shows the array starting at a quadword-aligned address, if the subject data of interest are elements 1 through 4, accessing elements 1 through 4 of array B involves an unaligned quadword storage access that spans two aligned quadwords.
Figure 120.Process to load misaligned quadword from big-endian storage using Load VSX Vector Word*4 Indexed Loading an Unaligned Quadword from Little-Endian Storage Loading elements from elements 1 through 4 of B (see Figure 119) into VR[VT] involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using little-endian byte ordering. Little-endian storage image of array B 0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88 0x0010: FF EE DD CC
0 1 2 3 4 5 6 7 8 9 A B C D E F
Figure 121.Process to load misaligned quadword from little-endian storage Load VSX Vector Word*4 Indexed
311
Storing an Unaligned Quadword to Little-Endian Storage Storing a VSR to elements 1 through 4 of B (see Figure 119) into VR[VT] involves an unaligned quadword storage access. VSX supports word-aligned vector and scalar storage accesses using little-endian byte ordering. Little-endian storage image of array B 0x0000: 67 45 23 01 33 22 11 00 77 66 55 44 BB AA 99 88 0x0010: FF EE DD CC
0 1 2 3 4 5 6 7 8 9 A B C D E F
Xs: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA BB FC FD FE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F
Xs: F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA BB FC FD FE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x0000: 01 23 45 67 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA BB 0x0010: FC FD FE FF
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x0000: 67 45 23 01 F3 F2 F1 F0 F7 F6 F5 F4 FB FA F9 F8 0x0010: FF FE FD FC
0 1 2 3 4 5 6 7 8 9 A B C D E F
Figure 122.Process to store misaligned quadword to big-endian storage using Store VSX Vector Word*4 Indexed
Figure 123.Process to store misaligned quadword to little-endian storage Store VSX Vector Word*4 Indexed
312
Power ISA I
Instruction Name Store VSX Scalar Doubleword Indexed VSX Scalar Store Instructions
Page 340
Instruction Name Load VSX Vector Doubleword and Splat Indexed VSX Vector Load and Splat Instruction
Page 339
Instruction Name Store VSX Vector Doubleword*2 Indexed Store VSX Vector Word*4 Indexed VSX Vector Store Instructions
313
Instruction Name VSX Vector Absolute Value Single-Precision VSX Vector Copy Sign Single-Precision VSX Vector Negative Absolute Value Single-Precision VSX Vector Negate Single-Precision VSX Vector Single-Precision Move Instructions
314
Power ISA I
Mnemonic
Instruction Name
xsmaddadp VSX Scalar Multiply-Add Type-A Double-Precision xsmaddmdp VSX Scalar Multiply-Add Type-M Double-Precision xsmsubadp VSX Scalar Multiply-Subtract Type-A Double-Precision xsmsubmdp VSX Scalar Multiply-Subtract Type-M Double-Precision xsnmaddadp VSX Scalar Negative Multiply-Add Type-A Double-Precision xsnmaddmdp VSX Scalar Negative Multiply-Add Type-M Double-Precision xsnmsubadp VSX Scalar Negative Multiply-Subtract Type-A Double-Precision xsnmsubmdp VSX Scalar Negative Multiply-Subtract Type-M Double-Precision Table 17. VSX Scalar Double-Precision Multiply-Add Arithmetic Instructions
315
Mnemonic xvaddsp xvdivsp xvmulsp xvresp xvrsqrtesp xvsqrtsp xvsubsp xvtdivsp xvtsqrtsp Table 19.
Instruction Name VSX Vector Add Single-Precision VSX Vector Divide Single-Precision VSX Vector Multiply Single-Precision VSX Vector Reciprocal Estimate Single-Precision VSX Vector Reciprocal Square Root Estimate Single-Precision VSX Vector Square Root Single-Precision VSX Vector Subtract Single-Precision VSX Vector Test for software Divide Single-Precision VSX Vector Test for software Square Root Single-Precision VSX Vector Single-Precision Elementary Arithmetic Instructions
Page 402 435 459 481 486 488 491 494 495
Mnemonic
Instruction Name
xvmaddadp VSX Vector Multiply-Add Type-A Double-Precision xvmaddmdp VSX Vector Multiply-Add Type-M Double-Precision xvmsubadp VSX Vector Multiply-Subtract Type-A Double-Precision xvmsubmdp VSX Vector Multiply-Subtract Type-M Double-Precision xvnmaddadp VSX Vector Negative Multiply-Add Type-A Double-Precision xvnmaddmdp VSX Vector Negative Multiply-Add Type-M Double-Precision xvnmsubadp VSX Vector Negative Multiply-Subtract Type-A Double-Precision xvnmsubmdp VSX Vector Negative Multiply-Subtract Type-M Double-Precision Table 20. VSX Vector Double-Precision Multiply-Add Arithmetic Instructions
Mnemonic xvmaddasp xvmaddmsp xvmsubasp xvmsubmsp xvnmaddasp xvnmaddmsp xvnmsubasp xvnmsubmsp Table 21.
Instruction Name VSX Vector Multiply-Add Type-A Single-Precision VSX Vector Multiply-Add Type-M Single-Precision VSX Vector Multiply-Subtract Type-A Single-Precision VSX Vector Multiply-Subtract Type-M Single-Precision VSX Vector Negative Multiply-Add Type-A Single-Precision VSX Vector Negative Multiply-Add Type-M Single-Precision VSX Vector Negative Multiply-Subtract Type-A Single-Precision VSX Vector Negative Multiply-Subtract Type-M Single-Precision
316
Power ISA I
Instruction Name VSX Scalar Maximum Double-Precision VSX Scalar Minimum Double-Precision VSX Scalar Double-Precision Maximum/Minimum Instructions
Instruction Name VSX Vector Compare Equal To Single-Precision VSX Vector Compare Greater Than or Equal To Single-Precision VSX Vector Compare Greater Than Single-Precision
Instruction Name VSX Vector Maximum Double-Precision VSX Vector Minimum Double-Precision VSX Vector Double-Precision Maximum/Minimum Instructions
Instruction Name VSX Vector Maximum Single-Precision VSX Vector Minimum Single-Precision VSX Vector Single-Precision Maximum/Minimum Instructions
317
Instruction Name VSX Scalar round and Convert Signed Fixed-Point Doubleword to Double-Precision format VSX Scalar round and Convert Unsigned Fixed-Point Doubleword to Double-Precision format VSX Scalar Convert Integer to Double-Precision Instructions
318
Power ISA I
Instruction Name VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate
Instruction Name VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Vector Convert Signed Fixed-Point Word to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format VSX Vector Convert Integer to Double-Precision Instructions
Instruction Name VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format VSX Vector Convert Integer to Single-Precision Instructions
319
Instruction Name VSX Vector Round to Single-Precision Integer using round to Nearest Away VSX Vector Round to Single-Precision Integer Exact using Current rounding mode VSX Vector Round to Single-Precision Integer using round towards -Infinity rounding mode VSX Vector Round to Single-Precision Integer using round towards +Infinity rounding mode VSX Vector Round to Single-Precision Integer using round towards Zero rounding mode VSX Vector Round to Single-Precision Integer Instructions
Page 500
320
Power ISA I
Page 501
Page 500
Instruction Name VSX Shift Left Double by Word Immediate VSX Shift Instruction
Page 501
321
322
Power ISA I
the
323
value
in
324
Power ISA I
325
326
Power ISA I
327
328
Power ISA I
329
y is a normalized floating-point value having unbounded range and precision. Return the value y rounded to double-precision under control of the rounding mode specified by x. if if if if IsQNaN(y) then return ConvertFPtoDP(y) IsInf(y) then return ConvertFPtoDP(y) IsZero(y) then return ConvertFPtoDP(y) y<Nmin then do if UE=0 then do if x=0b00 then r RoundToDPNearEven( DenormDP(y) ) if x=0b01 then r RoundToDPTrunc( DenormDP(y) ) if x=0b10 then r RoundToDPCeil( DenormDP(y) ) if x=0b11 then r RoundToDPFloor( DenormDP(y) ) ux_flag xx_flag return(ConvertFPtoDP(r)) end else do y Scalb(y,+1536) ux_flag 1 end end if x=0b00 then r RoundToDPNearEven(y) if x=0b01 then r RoundToDPTrunc(y) if x=0b10 then r RoundToDPCeil(y) if x=0b11 then r RoundToDPFloor(y)) if r>Nmax then do if OE=0 then do if x=0b00 then r sign ? -Inf : +Inf if x=0b01 then r sign ? -Nmax : +Nmax if x=0b10 then r sign ? -Nmax : +Inf if x=0b11 then r sign ? -Inf : +Nmax ox_flag 0b1 xx_flag 0b1 inc_flag 0bU return(ConvertFPtoDP(r)) end else do r Scalb(r,-1536) ox_flag 1 end end return(ConvertFPtoDP(r))
330
Power ISA I
331
332
Power ISA I
333
y is a normalized floating-point value having unbounded range and precision. Return the value y rounded to single-precision under control of the rounding mode specified by x. if if if if IsQNaN(y) then return ConvertFPtoSP(y) IsInf(y) then return ConvertFPtoSP(y) IsZero(y) then return ConvertFPtoSP(y) y<Nmin then do if UE=0 then do if x=0b00 then r RoundToSPNearEven( DenormSP(y) ) if x=0b01 then r RoundToSPTrunc( DenormSP(y) ) if x=0b10 then r RoundToSPCeil( DenormSP(y) ) if x=0b11 then r RoundToSPFloor( DenormSP(y) ) ux_flag xx_flag return(ConvertFPtoSP(r)) end else do y Scalb(y,+192) ux_flag 1 end end if x=0b00 then r RoundToSPNearEven(y) if x=0b01 then r RoundToSPTrunc(y) if x=0b10 then r RoundToSPCeil(y) if x=0b11 then r RoundToSPFloor(y)) if r>Nmax then do if OE=0 then do if x=0b00 then r sign ? -Inf : +Inf if x=0b01 then r sign ? -Nmax : +Nmax if x=0b10 then r sign ? -Nmax : +Inf if x=0b11 then r sign ? -Inf : +Nmax ox_flag 0b1 xx_flag 0b1 inc_flag 0bU return(ConvertFPtoSP(r)) end else do r Scalb(r,-192) ox_flag 1 end end return(ConvertFPtoSP(r))
334
Power ISA I
335
336
Power ISA I
337
XT,RA,RB T
11
(0x7C00_0498) RB 588
21
XT,RA,RB T
11
(0x7C00_0698) RB 844
21
RA
16
TX
31
RA
16
TX
31
Let XT be the value TX concatenated with T. Let XT be the value TX concatenated with T. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of the doubleword in storage at address EA are placed in doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for lxsdx tgt = VSR[XT] MEM(EA,8)
0 64
Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of the doubleword in storage at address EA are placed into doubleword element 0 of VSR[XT]. The contents of the doubleword in storage at address EA+8 are placed into doubleword element 1 of VSR[XT]. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for lxvd2x
undefined
127
MEM(EA+8,8)
127
338
Power ISA I
XT,RA,RB T
11
(0x7C00_0298) RB 332
21
XT,RA,RB T
11
(0x7C00_0618) RB 780
21
RA
16
TX
31
RA
16
TX
31
Let XT be the value TX concatenated with T. Let XT be the value TX concatenated with T. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of the doubleword in storage at address EA are copied into doubleword elements 0 and 1 of VSR[XT]. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for lxvdsx tgt = VSR[XT] MEM(EA,8)
0 64
Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of the word in storage at address EA are placed into word element 0 of VSR[XT]. The contents of the word in storage at address EA+4 are placed into word element 1 of VSR[XT]. The contents of the word in storage at address EA+8 are placed into word element 2 of VSR[XT]. The contents of the word in storage at address EA+12 are placed into word element 3 of VSR[XT]. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for lxvw4x tgt = VSR[XT] MEM(EA,4)
0
MEM(EA,8)
127
MEM(EA+4,4)
32
MEM(EA+8,4) MEM(EA+12,4)
64 96 127
339
XS,RA,RB S
11
(0x7C00_0598) RB 716
21
XS,RA,RB S
11
(0x7C00_0798) RB 972
21
RA
16
SX
31
RA
16
SX
31
Let XS be the value SX concatenated with S. Let XS be the value SX concatenated with S. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of doubleword element 0 of VSR[XS] are placed in the doubleword in storage at address EA. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for stxsdx src = VSR[XS] DP/SD/UD/MD
0 64
Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of doubleword element 0 of VSR[XS] are placed in the doubleword in storage at address EA. The contents of doubleword element 1 of VSR[XS] are placed in the doubleword in storage at address EA+8. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None
unused
127
DP/SD/UD/MD
127
340
Power ISA I
XS,RA,RB S
11
(0x7C00_0718) RB 908
21
XT,XB T
11
(0xF000_0564) B
16 21
RA
16
SX
31
///
345
BX TX
30 31
XT XB result{0:63} VSR[XT]
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. The absolute value of the double-precision floating-point operand in doubleword element 0 of VSR[XB] is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. Special Registers Altered: None VSR Data Layout for xsabsdp src = VSR[XB]
Let XS be the value SX concatenated with S. Let EA be the sum of the contents of GPR[RA], or 0 if RA is equal to 0, and the contents of GPR[RB]. The contents of word element 0 of VSR[XS] are placed in the word in storage at address EA. The contents of word element 1 of VSR[XS] are placed in the word in storage at address EA+4. The contents of word element 2 of VSR[XS] are placed in the word in storage at address EA+8. The contents of word element 3 of VSR[XS] are placed in the word in storage at address EA+12. If EA is not word-aligned (that is, EA{62:63}g0b00), the system alignment error handler is permitted to be invoked instead of performing the storage access. Special Registers Altered: None VSR Data Layout for stxvw4x src = VSR[XS] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW
0 32 64 96 127
DP tgt = VSR[XT] DP
0 64
unused
undefined
127
341
The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VSR Data Layout for xsadddp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0100) B 32
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} v{0:inf} AddDP(src1,src2) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassSP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
unused
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src2 is added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 45, Actions for xsadddp, on page 343. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
undefined
127
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
342
Power ISA I
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v -Infinity v src1 v -Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v -Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd A(x,y) Q(x) v The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 45.
343
Range of v
Case
Rounding Mode RTN rv rv q Rnd(v) r -Infinity q Rnd(v) r -Infinity rv rv q Rnd(v) r -Nmax RTZ rv rv q Rnd(v) r -Nmax RTP rv rv q Rnd(v) r -Infinity q Rnd(v) r -Infinity RTM
v is a QNaN v = -Infinity -Infinity < v [ -(Nmax + 1ulp) -(Nmax + 1ulp) < v [ -(Nmax + ulp)
r -Nmax r -Nmax
q Rnd(v) r -Infinity
-(Nmax + ulp) < v < -Nmax v = -Nmax -Nmax < v < -Nmin v = -Nmin -Nmin < v < -Zero v = -Zero v = Rezd v = +Zero +Zero < v < +Nmin v = +Nmin +Nmin < v < +Nmax v = +Nmax +Nmax < v < +(Nmax + ulp) +(Nmax + ulp) [ v < +(Nmax + 1ulp) +(Nmax + 1ulp) [ v < +Infinity v = +Infinity Explanation:
v Den(x) This situation cannot occur.
Overflow Normal Normal Normal Normal Tiny Special Special Special Tiny Normal Normal Normal Overflow Normal Overflow Normal Overflow Special
q Rnd(v) r -Infinity rv r -Nmax r -Nmax r Rnd(v)
r -Nmax r -Nmax r Rnd(v) r -Nmin q Rnd(v) r Rnd(Den(v)) rv r +Zero rv q Rnd(v) r Rnd(Den(v)) r +Nmin r Rnd(v) r +Nmax r -Nmax r -Nmax r Rnd(v)
r -Nmax r Rnd(v) r -Nmin q Rnd(v) r Rnd(Den(v)) rv r -Zero rv q Rnd(v) r Rnd(Den(v)) r +Nmin r Rnd(v) r +Nmax
r -Nmin q Rnd(v) r Rnd(Den(v)) rv r +Zero rv q Rnd(v) r Rnd(Den(v)) r +Nmin r Rnd(v) r +Nmax q Rnd(v) r +Infinity q Rnd(v) r +Infinity q Rnd(v) r -Infinity rv
r +Nmax
The precise intermediate result defined in the instruction having unbounded range and precision. The value x is denormalized. The significand is shifted left by the amount of the difference between Emin for the target precision (that is, -1022 for double-precision, -126 for single-precision) and the unbiased exponent of x. The unbiased exponent of the denormalized value is Emin. The significand of the denormalized value has unbounded significand precision. Exact-zero-difference result. Applies only to add operations involving source operands having the same magnitude and different signs. The significand of x is rounded to the target precision according to the rounding mode specified in FPSCRRN. Exponent range of the rounded result is unbounded. See Section 7.3.2.6. Largest (in magnitude) representable normalized number in the target precision format. Smallest (in magnitude) representable normalized number in the target precision format. Least significant bit in the target precision formats significand (Unit in the Last Position). Round To Nearest, ties to Even. Round Toward Zero. Round Toward +infinity. Round Toward infinity.
Table 46.
344
Power ISA I
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(r), FPRFClassFP(r), FI0, FR0 T(r), FI0, FR0, fx(ZX) fx(ZX), error() T(r), FPRFClassFP(r), FI0, FR0, fx(VXSQRT) T(r), FPRFClassFP(r), FI0, FR0, fx(VXZDZ) T(r), FPRFClassFP(r), FI0, FR0, fx(VXIDI) T(r), FPRFClassFP(r), FI0, FR0, fx(VXISI) T(r), FPRFClassFP(r), FI0, FR0, fx(VXIMZ) T(r), FPRFClassFP(r), FI0, FR0, fx(VXSNAN) T(r), FPRFClassFP(r), FI0, FR0, fx(VXSNAN), fx(VXIMZ) T(r), FPRFClassFP(r), FI0, FR0, fx(VXSQRT) fx(VXZDZ), error() fx(VXIDI), error() fx(VXISI), error() fx(VXIMZ), error() fx(VXSNAN), error() fx(VXSNAN), fx(VXIMZ), error()
Special
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 1
0 0 1 1 0 1 1
0 1 0 1 1 0 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
Explanation:
ClassFP(x) fx(x) The results do not depend on this condition. Classifies the floating-point value x as defined in Table 2, Floating-Point Result Flags, on page 281. FX is set to 1 if x=0. x is set to 1. Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined. Floating-Point Underflow exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXIDI. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Floating-Point Invalid Operation Exception (Zero Zero) status flag, FPSCRVXZDZ. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI. Floating-Point Zero Divide Exception status flag, FPSCRZX.
Table 47.
345
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(r), FPRFClassFP(r), FI0, FR0 T(r), FPRFClassFP(r), FI1, FR0, fx(XX) T(r), FPRFClassFP(r), FI1, FR1, fx(XX) T(r), FPRFClassFP(r), FI1, FR0, fx(XX), error() T(r), FPRFClassFP(r), FI1, FR1, fx(XX), error() T(r), FPRFClassFP(r), FI1, FR?, fx(OX), fx(XX) T(r), FPRFClassFP(r), FI1, FR?, fx(OX), fx(XX), error() T(q), FPRFClassFP(q), FI0, FR0, fx(OX), error() T(q), FPRFClassFP(q), FI1, FR0, fx(OX), fx(XX), error() T(q), FPRFClassFP(q), FI1, FR1, fx(OX), fx(XX), error() T(r), FPRFClassFP(r), FI0, FR0 T(r), FPRFClassFP(r), FI1, FR0, fx(UX), fx(XX) T(r), FPRFClassFP(r), FI1, FR1, fx(UX), fx(XX) T(r), FPRFClassFP(r), FI1, FR0, fx(UX), fx(XX), error() T(r), FPRFClassFP(r), FI1, FR1, fx(UX), fx(XX), error() T(q), FPRFClassFP(q), FI0, FR0, fx(UX), error() T(q), FPRFClassFP(q), FI1, FR0, fx(UX), fx(XX), error() T(q), FPRFClassFP(q), FI1, FR1, fx(UX), fx(XX), error()
Normal
0 0 1 1 1
0 0 0 0 0 1 1 1
0 0 1 1 0 1 0 0 1 1
no yes yes yes yes no yes yes yes yes yes yes yes
no yes no yes
Overflow
Tiny
Explanation:
ClassFP(x) fx(x) The results do not depend on this condition. Classifies the floating-point value x as defined in Table 2, Floating-Point Result Flags, on page 281. FX is set to 1 if x=0. x is set to 1. Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined. Floating-Point Underflow exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXIDI. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Floating-Point Invalid Operation Exception (Zero Zero) status flag, FPSCRVXZDZ. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI. Floating-Point Zero Divide Exception status flag, FPSCRZX.
Table 47.
346
Power ISA I
BF,XA,XB BF //
9 11
DP src2 = VSR[XB] DP
0 64
unused
A
16
B
21
43
AXBX /
29 30 31
XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} if( IsSNaN(src1) | IsSNaN(src2) ) then do vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 end else if( IsQNaN(src1) | IsQNaN(src2) ) then vxvc_flag = 0b1 FL CompareLTDP(src1,src2) FG CompareGTDP(src1,src2) FE CompareEQDP(src1,src2) FU IsNAN(src1) | IsNAN(src2) CR[BF] FL || FG || FE || FU if(vxsnan_flag) then SetFX(VXSNAN) if(vxvc_flag) then SetFX(VXVC)
undefined
127
Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src1 is compared to src2. Zeros of same or opposite signs compare equal. Infinities of same signs compare equal. See Table 48, Actions for xscmpodp - Part 1: Compare Ordered, on page 348. The result of the compare is placed into CR field BF and the FPCC. If either of the operands is a NaN, either quiet or signaling, CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, VXSNAN is set, and Invalid Operation is disabled (VE=0), VXVC is set. If neither operand is a Signaling NaN but at least one operand is a Quiet NaN, VXVC is set. See Table 49, Actions for xscmpodp - Part 2: Result, on page 348. Special Registers Altered: CR[BF] FPCC FX VXSNAN VXVC
347
src2 Infinity Infinity cc0b0010 NZF cc0b1000 Zero cc0b1000 +Zero cc0b1000 +NZF cc0b1000 +Infinity cc0b1000 QNaN cc0b0001 vxvc_flag1 SNaN cc0b0001 vxsnan_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 vxsnan_flag1 cc0b0100 ccC(src1,src2) cc0b1000 cc0b1000 cc0b1000 cc0b1000 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 vxsnan_flag1 cc0b0100 cc0b0100 cc0b0010 cc0b0010 cc0b1000 cc0b1000 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 vxsnan_flag1 cc0b0100 cc0b0100 cc0b0010 cc0b0010 cc0b1000 cc0b1000 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 vxsnan_flag1 cc0b0100 cc0b0100 cc0b0100 cc0b0100 ccC(src1,src2) cc0b1000 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 vxsnan_flag1 cc0b0100 cc0b0100 cc0b0100 cc0b0100 cc0b0100 cc0b0010 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 vxsnan_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag1 vxvc_flag(VE=0) cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 cc0b0001 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxsnan_flag1 vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0) vxvc_flag(VE=0)
NZF
Zero
Explanation:
src1 src2 NZF C(x,y) The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Nonzero finite number. The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results. 0b1000 0b0100 0b0010 cc when x is greater than y when x is less than y when x is equal to y
Table 48.
vxsnan_flag
vxvc_flag
VE
Returned Results and Status Setting FPCCcc, CR[BF]cc FPCCcc, CR[BF]cc, fx(VXVC) FPCCcc, CR[BF]cc, fx(VXSNAN) FPCCcc, CR[BF]cc, fx(VXSNAN), fx(VXVC) FPCCcc, CR[BF]cc, fx(VXVC), error() FPCCcc, CR[BF]cc, fx(VXSNAN), error()
0 0 0 1 1
0 0 1 1 0 1
0 1 0 1 1
Explanation:
cc fx(x) error() FX VXSNAN VXC The results do not depend on this condition. The 4-bit result as defined in Table 48. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Floating-Point Summary Exception status flag, FPSCRFX. Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. See Section 7.4.1. Floating-Point Invalid Operation Exception (Invalid Compare) status flag, FPSCRVXVC. See Section 7.4.1.
Table 49.
348
Power ISA I
BF,XA,XB BF //
9 11
DP src2 = VSR[XB] DP
0 64
unused
A
16
B
21
35
AXBX /
29 30 31
XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} if( IsSNaN(src1) | IsSNaN(src2) ) then vxsnan_flag 1 FL CompareLTDP(src1,src2) FG CompareGTDP(src1,src2) FE CompareEQDP(src1,src2) FU IsNAN(src1) | IsNAN(src2) CR[BF] FL || FG || FE || FU if(vxsnan_flag) then SetFX(VXSNAN)
undefined
127
Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src1 is compared to src2. Zeros of same or opposite signs compare equal equal. Infinities of same signs compare equal. See Table 50, Actions for xscmpudp - Part 1: Compare Unordered, on page 350. The result of the compare is placed into CR field BF and the FPCC. If either of the operands is a NaN, either quiet or signaling, CR field BF and the FPCC are set to reflect unordered. If either of the operands is a Signaling NaN, VXSNAN is set. See Table 51, Actions for xscmpudp - Part 2: Result, on page 350. Special Registers Altered: CR[BF] FPCC FX VXSNAN
349
src2 Infinity Infinity NZF Zero +Zero src1 +NZF +Infinity QNaN SNaN NZF Zero +Zero +NZF +Infinity QNaN SNaN
cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1 cc = 0b0001 vxsnan_flag = 1
Explanation:
src1 src2 NZF C(x,y) The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Nonzero finite number. The floating-point value x is compared to the floating-point value y, returning one of three 4-bit results. 0b1000 0b0100 0b0010 cc when x is greater than y when x is less than y when x is equal to y
Table 50.
vxsnan_flag
VE
Returned Results and Status Setting FPCCcc, CR[BF]cc FPCCcc, CR[BF]cc, fx(VXSNAN) FPCCcc, CR[BF]cc, fx(VXSNAN), error()
0 1
0 1 1
Explanation:
cc fx(x) error() FX VXSNAN The results do not depend on this condition. The 4-bit result as defined in Table 50. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Floating-Point Summary Exception status flag, FPSCRFX. Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. See Section 7.4.1.
Table 51.
350
Power ISA I
XT,XA,XB T
11
(0xF000_0580) B 176
21
A
16
AXBX TX
29 30 31
XT XA XB result{0:63} VSR[XT]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Bit 0 of VSR[XT] is set to the contents of bit 0 of VSR[XA]. Bits 1:63 of VSR[XT] are set to the contents of bits 1:63 of VSR[XB]. The contents of doubleword element 1 of VSR[XT] are undefined. Special Registers Altered: None VSR Data Layout for xscpsgndp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
unused
unused
undefined
127
351
Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VSR Data Layout for xscvdpsp src = VSR[XB]
XT,XB T
11
(0xF000_0424) B
16 21
///
265
BX TX
30 31
DP tgt = VSR[XT] SP
0 32
unused
XT TX || T XB BX || B reset_xflags() src VSR[XB]{0:63} result{0:31} RoundToSP(RN,src) if(vxsnan_flag) then SetFX(VXSNAN) if(xx_flag) then SetFX(XX) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU_UUUU_UUUU FPRF ClassSP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
undefined
64
undefined
127
Programming Note xscvdpsp can be used to convert the result of a category Floating-Point single-precision operation from 64-bit double-precision format to 32-bit single-precision format for eventual use in VSX vector single-precision operations. The 32-bit single-precision format is not compatible with category Floating-Point single-precision operations.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element 0 of VSR[XT] in single-precision format. The contents of word elements 1, 2, and 3 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345.
352
Power ISA I
FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. See Table 52. Special Registers Altered: FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI VSR Data Layout for xscvdpsxds src = VSR[XB] DP tgt = VSR[XT] SD
0 64
XT,XB T
11
(0xF000_0560) B
16 21
///
344
BX TX
30 31
XT TX || T XB BX || B inc_flag 0b0 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63}) result{0:63} ConvertDPtoSD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxcvi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF 0bUUUUU FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
undefined
127
Programming Note xscvdpsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 263-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -263, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit signed-integer format. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified FR and FI are set to 0. Otherwise, The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to an undefined value.
353
VE
XE
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN
0 1 0 1 0 1 0 1
0 1 0 1 0 1
T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI1, fx(XX) T(Nmin), FR0, FI1, fx(XX), error() T(Nmin), FR0, FI0 T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR0, FI0 T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX) T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0 no Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness. yes T(Nmax), FR0, FI1, fx(XX) yes T(Nmax), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI), fx(VXSNAN) FR0, FI0, fx(VXCVI), fx(VXSNAN), error()
Explanation:
fx(x) error() Nmin Nmax src T(x) FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The smallest signed integer doubleword value, -263 (0x8000_0000_0000_0000). The largest signed integer doubleword value, 263-1 (0x7FFF_FFFF_FFFF_FFFF). The double-precision floating-point value in doubleword element 0 of VSR[XB]. The signed integer doubleword value x is placed in doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
Table 52.
354
Power ISA I
FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. See Table 53. Special Registers Altered: FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI VSR Data Layout for xscvdpsxws src = VSR[XB] DP tgt = VSR[XT] undefined
0 32
XT,XB T
11
(0xF000_0160) B
16 21
///
88
BX TX
30 31
XT TX || T XB BX || B inc_flag 0b0 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63}) result{0:31} ConvertDPtoSW(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxcvi_flag) if( ~vex_flag ) then do VSR[XT] 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU FPRF 0bUUUUU FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
SW
64
undefined
127
Programming Note xscvdpsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 231-1, the result is 0x7FFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -231, the result is 0x8000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit signed-integer format. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified FR and FI are set to 0. Otherwise, The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined. FPRF is set to an undefined value.
355
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN
0 1 0 1 0 1 0 1
Returned Results and Status Setting T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI1, fx(XX) T(Nmin), FR0, FI1, fx(XX), error() T(Nmin), FR0, FI0 T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR0, FI0 T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX) T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0 T(Nmax), FR0, FI1, fx(XX) T(Nmax), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI), fx(VXSNAN) FR0, FI0, fx(VXCVI), fx(VXSNAN), error()
Explanation:
fx(x) error() Nmin Nmax src T(x) FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The smallest signed integer word value, -231(0x8000_0000). The largest signed integer word value, 231-1 (0x7FFF_FFFF). The double-precision floating-point value in doubleword element 0 of VSR[XB]. The signed integer word value x is placed in word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
Table 53.
356
Power ISA I
FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. See Table 54. Special Registers Altered: FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI VSR Data Layout for xscvdpuxds src = VSR[XB] DP tgt = VSR[XT] UD
0 64
XT,XB T
11
(0xF000_0520) B
16 21
///
328
BX TX
30 31
XT TX || T XB BX || B inc_flag 0b0 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63}) result{0:63} ConvertDPtoUD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxcvi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF 0bUUUUU FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
undefined
127
Programming Note xscvdpuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 264-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified FR and FI are set to 0. Otherwise, The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to an undefined value.
357
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN
0 1 0 1 0 1 0 1
T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI1, fx(XX) T(Nmin), FR0, FI1, fx(XX), error() T(Nmin), FR0, FI0 T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR0, FI0 T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX) T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0 no Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness. yes T(Nmax), FR0, FI1, fx(XX) yes T(Nmax), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI), fx(VXSNAN) FR0, FI0, fx(VXCVI), fx(VXSNAN), error()
Explanation:
fx(x) error() Nmin Nmax src T(x) FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000). The largest unsigned integer doubleword value, 264-1 (0xFFFF_FFFF_FFFF_FFFF). The double-precision floating-point value in doubleword element 0 of VSR[XB]. The unsigned integer doubleword value x is placed in doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined.
Table 54.
358
Power ISA I
FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. See Table 55. Special Registers Altered: FPRF=0bUUUUU FR FI FX XX VXSNAN VXCVI VSR Data Layout for xscvdpuxws src = VSR[XB] DP tgt = VSR[XT] undefined
0 32
XT,XB T
11
(0xF000_0120) B
16 21
///
72
BX TX
30 31
XT TX || T XB BX || B inc_flag 0b0 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63}) result{0:31} ConvertDPtoUW(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxcvi_flag) if( ~vex_flag ) then do VSR[XT] 0xUUUU_UUUU || result || 0xUUUU_UUUU_UUUU_UUUU FPRF 0bUUUUU FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
UW
64
undefined
127
Programming Note xscvdpuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xsrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src is a NaN, the result is the value 0x0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 232-1, the result is 0xFFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified FR and FI are set to 0. Otherwise, The result is placed into word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined. FPRF is set to an undefined value.
359
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN
0 1 0 1 0 1 0 1
Returned Results and Status Setting T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI1, fx(XX) T(Nmin), FR0, FI1, fx(XX), error() T(Nmin), FR0, FI0 T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR0, FI0 T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX) T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0 T(Nmax), FR0, FI1, fx(XX) T(Nmax), FR0, FI1, fx(XX), error() T(Nmax), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI) FR0, FI0, fx(VXCVI), error() T(Nmin), FR0, FI0, fx(VXCVI), fx(VXSNAN) FR0, FI0, fx(VXCVI), fx(VXSNAN), error()
Explanation:
fx(x) error() Nmin Nmax src T(x) FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The smallest unsigned integer word value, 0 (0x0000_0000). The largest unsigned integer word value, 232-1 (0xFFFF_FFFF). The double-precision floating-point value in doubleword element 0 of VSR[XB]. The unsigned integer word value x is placed in word element 1 of VSR[XT]. The contents of word elements 0, 2, and 3 of VSR[XT] are undefined.
Table 55.
360
Power ISA I
VSX Scalar Convert and round Signed Integer Doubleword to Double-Precision format XX2-form
xscvsxddp XT,XB T
6 11
XT,XB T
11
(0xF000_0524) (0xF000_05E0) B
16 21
///
16
B
21
329
BX TX
30 31 0
60
///
376
BX TX
30 31
XT TX || T XB BX || B reset_xflags() result{0:31} ConvertSPtoDP(VSR[XB]{0:31}) if(vxsnan_flag) then SetFX(VXSNAN) FR 0b0 FI 0b0 vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) end
XT TX || T XB BX || B reset_xflags() v{0:inf} ConvertSDtoFP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,v) VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU if(xx_flag) then SetFX(XX) FPRF ClassDP(result) FR inc_flag FI xx_flag
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the signed integer value in doubleword element 0 of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. Special Registers Altered: FPRF FR FI FX XX VSR Data Layout for xscvsxddp
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the single-precision floating-point value in word element 0 of VSR[XB]. src is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR=0b0 FI=0b0 FX VXSNAN VSR Data Layout for xscvspdp src = VSR[XB] SP tgt = VSR[XT] DP
0 32 64
unused
unused
undefined
127
tgt = VSR[XT] DP
0 64
undefined
127
Programming Note xscvspdp can be used to convert a single-precision value in single-precision format to double-precision format for use by category Floating-Point scalar single-precision operations.
361
XT,XB T
11
(0xF000_05A0) B
16 21
///
360
BX TX
30 31
XT TX || T XB BX || B reset_xflags() src{0:inf} ConvertUDtoFP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,src) VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU if(xx_flag) then SetFX(XX) FPRF ClassDP(result) FR inc_flag FI xx_flag
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the unsigned integer value in doubleword element 0 of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. Special Registers Altered: FPRF FR FI FX XX VSR Data Layout for xscvuxddp src = VSR[XB] UD tgt = VSR[XT] DP
0 64
unused
undefined
127
362
Power ISA I
The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception or a trap-enabled zero divide exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX ZX XX VXSNAN VXIDI VXZDZ VSR Data Layout for xsdivdp
XT,XA,XB T
11
(0xF000_01C0) B 56
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} v{0:inf} DivideFP(src1,src2) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxidi_flag) then SetFX(VXIDI) if(vxzdz_flag) then SetFX(VXZDZ) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) if(zx_flag) then SetFX(ZX) vex_flag VE & (vxsnan_flag | vxidi_flag | vxzdz_flag) zex_flag ZE & zx_flag if( ~vex_flag & ~zex_flag ) then do VSR[XT] = result || 0xUUUU_UUUU_UUUU_UUUU FPRF = ClassDP(result) FR = inc_flag FI = xx_flag end else do FR = 0b0 FI = 0b0 end
unused
unused
undefined
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src1 is divided1 by src2, producing a quotient having unbounded range and precision. The quotient is normalized2. See Actions for xsdivdp (p. 364). The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point division is based on exponent subtraction and division of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
363
src2 -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN -Infinity v dQNaN vxidi_flag 1 v +Zero v +Zero v Zero v Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v +Infinity v D(src1,src2) v +Zero v Zero v D(src1,src2) v Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v +Infinity v +Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v Infinity zx_flag 1 v Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v +Infinity zx_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v D(src1,src2) v Zero v +Zero v D(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxidi_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF D(x,y) Q(x) v The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 56.
364
Power ISA I
Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 57. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 57. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
XT,XA,XB T
11
(0xF000_0108) B 33
21
A
16
AXBX TX
29 30 31
xsmaddmdp 60
0 6
XT,XA,XB T
11
(0xF000_0148) B 41
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 xsmaddadp ? VSR[XT]{0:63} : VSR[XB]{0:63} src3 xsmaddadp ? VSR[XB]{0:63} : VSR[XT]{0:63} v{0:inf} MultiplyAddFP(src1,src3,src2) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsmaddadp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmaddmdp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
365
unused
undefined
127
366
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 57.
367
XT,XA,XB T
11
(0xF000_0500) B 160
21
unused
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} result{0:63} MaximumDP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU end
unused
undefined
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src1 is greater than src2, src1 is placed into doubleword element 0 of VSR[XT]. Otherwise, src2 is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. The maximum of +0 and 0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN is that SNaN converted to a QNaN. FPRF, FR and FI are not modified. If a trap-enabled invalid operation exception occurs, VSR[XT] is not modified. See Table 58. Special Registers Altered: FX VXSNAN
368
Power ISA I
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XT]. Nonzero finite number. Return a QNaN with the payload of x. Return the greater of floating-point value x and floating-point value y. The value x is placed in doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF, FR and FI are not modified. fx(x) VXSNAN If x is equal to 0, FX is set to 1. x is set to 1. Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. If VE=1, update of VSR[XT] is suppressed.
Table 58.
src1
369
XT,XA,XB T
11
(0xF000_0540) B 168
21
unused
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} result{0:63} MinimumDP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU end
unused
undefined
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. If src1 is less than src2, src1 is placed into doubleword element 0 of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. The minimum of +0 and 0 is 0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN is that SNaN converted to a QNaN. FPRF, FR and FI are not modified. If a trap-enabled invalid operation exception occurs, VSR[XT] is not modified. See Table 59. Special Registers Altered: FX VXSNAN
370
Power ISA I
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XT]. Nonzero finite number. Return a QNaN with the payload of x. Return the lesser of floating-point value x and floating-point value y. The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF, FR and FI are not modified. fx(x) VXSNAN If x is equal to 0, FX is set to 1. x is set to 1. Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. If VE=1, update of VSR[XT] is suppressed.
Table 59.
src1
371
XT,XA,XB T
11
(0xF000_0188) B 49
21
A
16
AXBX TX
29 30 31
For xsmsubmdp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 60. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The result, having unbounded range and precision, is normalized3. See part 2 of Table 60. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
xsmsubmdp 60
0 6
XT,XA,XB T
11
(0xF000_01C8) B 57
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XT]{0:63} src3 VSR[XB]{0:63} src2 xsmsubadp ? VSR[XT]{0:63} : VSR[XB]{0:63} src3 xsmsubadp ? VSR[XB]{0:63} : VSR[XT]{0:63} v{0:inf} MultiplyAddDP(src1,src3,NegateDP(src2)) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsmsubadp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
372
Power ISA I
unused
undefined
127
373
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 60.
374
Power ISA I
The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXIMZ VSR Data Layout for xsmuldp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0180) B 48
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} v{0:inf} MultiplyFP(src1,src2) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vximz_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
unused
undefined
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src1 is multiplied1 by src2, producing a product having unbounded range and precision. The product is normalized2. See Table 61. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
375
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v +Infinity v +Infinity v dQNaN vximz_flag 1 v dQNaN vximz_flag 1 v Infinity v Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v +Infinity -Zero v dQNaN vximz_flag 1 +Zero v dQNaN vximz_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity +Infinity v Infinity QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v Zero v +Infinity v src1 v Q(src1) vxsnan_flag 1 v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF M(x,y) Q(x) v The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 61.
376
Power ISA I
XT,XB T
11
(0xF000_05A4) B
16 21
XT,XB T
11
(0xF000_05E4) B
16 21
///
361
BX TX
30 31
///
377
BX TX
30 31
XT XB result{0:63} VSR[XT]
XT XB result{0:63} VSR[XT]
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. The contents of doubleword element 0 of VSR[XB], with bit 0 set to 1, is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. Special Registers Altered: None VSR Data Layout for xsnabsdp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. The contents of doubleword element 0 of VSR[XB], with bit 0 complemented, is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. Special Registers Altered: None VSR Data Layout for xsnegdp src = VSR[XB]
unused
DP tgt = VSR[XT]
unused
undefined
127 0
DP
64
undefined
127
377
Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 62. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 62. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 63, Scalar Floating-Point Final Result with Negation, on page 381. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
XT,XA,XB T
11
(0xF000_0508) B 161
21
A
16
AXBX TX
29 30 31
xsnmaddmdp 60
0 6
XT,XA,XB T
11
(0xF000_0548) B 169
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 xsnmaddadp ? VSR[XT]{0:63} : VSR[XB]{0:63} src3 xsnmaddadp ? VSR[XB]{0:63} : VSR[XT]{0:63} v{0:inf} MultiplyAddDP(src1,src3,src2) result{0:63} NegateDP(RoundToDP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0 FI 0 end
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsnmaddadp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsnmaddmdp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
378
Power ISA I
unused
undefined
127
379
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsnmaddadp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsnmaddmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 62.
380
Power ISA I
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(N(r)), FPRFClassFP(r), FI0, FR0 T(r), FPRFClassFP(r), FI0, FR0, fx(VXISI) T(r), FPRFClassFP(r), FI0, FR0, fx(VXIMZ) T(r), FPRFClassFP(r), FI0, FR0, fx(VXSNAN) T(r), FPRFClassFP(r), FI0, FR0, fx(VXSNAN), fx(VXIMZ) fx(VXISI), error() fx(VXIMZ), error() fx(VXSNAN), error() fx(VXSNAN), fx(VXIMZ), error() T(N(r)), FPRFClassFP(N(r)), FI0, FR0 T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(XX) T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(XX) T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(XX), error() T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(XX), error() T(N(r)), FPRFClassFP(N(r)), FI1, FR?, fx(OX), fx(XX) T(N(r)), FPRFClassFP(N(r)), FI1, FR?, fx(OX), fx(XX), error() T(N(q)), FPRFClassFP(N(q)), FI0, FR0, fx(OX), error() T(N(q)), FPRFClassFP(N(q)), FI1, FR0, fx(OX), fx(XX), error() T(N(q)), FPRFClassFP(N(q)), FI1, FR1, fx(OX), fx(XX), error()
Special
0 0 0 0 1 1 1 1
0 0 1 1 1
0 0 1 1 0 1
0 0 1 1 0 1 1
0 1 0 1 1 0 1
0 1 1
no yes no yes
Normal
Overflow
Explanation:
ClassFP(x) fx(x) The results do not depend on this condition. Classifies the floating-point value x as defined in Table 2, Floating-Point Result Flags, on page 281. FX is set to 1 if x=0. x is set to 1. Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The value x is is negated by complementing the sign bit of x. The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined. Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Table 63.
381
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(N(r)), FPRFClassFP(N(r)), FI0, FR0 T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(UX), fx(XX) T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(UX), fx(XX) T(N(r)), FPRFClassFP(N(r)), FI1, FR0, fx(UX), fx(XX), error() T(N(r)), FPRFClassFP(N(r)), FI1, FR1, fx(UX), fx(XX), error() T(N(q)), FPRFClassFP(N(q)), FI0, FR0, fx(UX), error() T(N(q)), FPRFClassFP(N(q)), FI1, FR0, fx(UX), fx(XX), error() T(N(q)), FPRFClassFP(N(q)), FI1, FR1, fx(UX), fx(XX), error()
Tiny
0 0 0 0 0 1 1 1
0 0 1 1
no yes no yes
no yes yes
no yes
Explanation:
ClassFP(x) fx(x) The results do not depend on this condition. Classifies the floating-point value x as defined in Table 2, Floating-Point Result Flags, on page 281. FX is set to 1 if x=0. x is set to 1. Wrap adjust, where = 21536 for double-precision and = 2192 for single-precision. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. The value x is is negated by complementing the sign bit of x. The value x is placed in element 0 of VSR[XT] in the target precision format. The contents of the remaining element(s) of VSR[XT] are undefined. Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Table 63.
382
Power ISA I
XT,XA,XB T
11
(0xF000_0588) B 177
21
A
16
AXBX TX
29 30 31
For xsnmsubmdp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 64. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 64. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 63, Scalar Floating-Point Final Result with Negation, on page 381. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
xsnmsubmdp 60
0 6
XT,XA,XB T
11
(0xF000_05C8) B 185
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XT]{0:63} src3 VSR[XB]{0:63} src2 xsnmsubadp ? VSR[XT]{0:63} : VSR[XB]{0:63} src3 xsnmsubadp ? VSR[XB]{0:63} : VSR[XT]{0:63} v{0:inf} MultiplyAddDP(src1,src3,NegateDP(src2)) result{0:63} NegateDP(RoundToDP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vximz_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsnmsubadp, do the following. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XT]. Let src3 be the double-precision floating-point value in doubleword element 0 of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
383
unused
undefined
127
384
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The double-precision floating-point value in doubleword element 0 of VSR[XA]. For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsnmsubadp, the double-precision floating-point value in doubleword element 0 of VSR[XB]. For xsnmsubmdp, the double-precision floating-point value in doubleword element 0 of VSR[XT]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 64.
385
XT,XB T
11
(0xF000_0124) B
16 21
///
73
BX TX
30 31
XT TX || T XB BX || B reset_xflags() result{0:63} RoundToDPIntegerNearAway(VSR[XB]{0:63}) if(vxsnan_flag) then SetFX(VXSNAN) FR 0b0 FI 0b0 vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassFP(result) end
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to an integer using the rounding mode Round to Nearest Away. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR=0b0 FI=0b0 FX VXSNAN VSR Data Layout for xsrdpi src = VSR[XB] DP tgt = VSR[XT] DP
0 64
unused
undefined
127
386
Power ISA I
XT,XB T
11
(0xF000_01AC) B
16 21
DP tgt = VSR[XT] DP
0 64
unused
///
107
BX TX
30 31
XT TX || T XB BX || B reset_xflags() src VSR[XB]{0:63} if(RN=0b00) then result{0:63} RoundToDPIntegerNearEven(src) if(RN=0b01) then result{0:63} RoundToDPIntegerTrunc(src) if(RN=0b10) then result{0:63} RoundToDPIntegerCeil(src) if(RN=0b11) then result{0:63} RoundToDPIntegerFloor(src) if(vxsnan_flag) then SetFX(VXSNAN) if(xx_flag) then SetFX(XX) vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
undefined
127
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to an integer using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR FI FX XX VXSNAN
387
VSX Scalar Round to Double-Precision Integer using round toward +Infinity XX2-form
xsrdpip 60
0 6
XT,XB T
11
(0xF000_01E4) B
16 21
XT,XB T
11
(0xF000_01A4) B
16 21
///
121
BX TX
30 31
///
105
BX TX
30 31
XT TX || T XB BX || B reset_xflags() result{0:63} RoundToDPIntegerFloor(VSR[XB]{0:63}) if(vxsnan_flag) then SetFX(VXSNAN) FR 0b0 FI 0b0 vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) end
XT TX || T XB BX || B reset_xflags() result{0:63} RoundToDPIntegerCeil(VSR[XB]{0:63}) if(vxsnan_flag) then SetFX(VXSNAN) FR 0b0 FI 0b0 vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) end
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to an integer using the rounding mode Round toward -Infinity. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR=0b0 FI=0b0 FX VXSNAN VSR Data Layout for xsrdpim src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to an integer using the rounding mode Round toward +Infinity. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR=0b0 FI=0b0 VSR Data Layout for xsrdpip src = VSR[XB]
FX VXSNAN
unused
DP tgt = VSR[XT]
unused
undefined
127 0
DP
64
undefined
127
388
Power ISA I
XT,XB T
11
(0xF000_0164) B
16 21
///
89
BX TX
30 31
XT TX || T XB BX || B reset_xflags() result{0:63} RoundToDPIntegerTrunc(VSR[XB]{0:63}) if(vxsnan_flag) then SetFX(VXSNAN) FR 0b0 FI 0b0 vex_flag VE & vxsnan_flag if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) end
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src is rounded to an integer using the rounding mode Round toward Zero. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to 0. FI is set to 0. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. Special Registers Altered: FPRF FR=0b0 FI=0b0 FX VXSNAN VSR Data Layout for xsrdpiz src = VSR[XB] DP tgt = VSR[XT] DP
0 64
unused
undefined
127
389
Source Value
Result
Exception
XT,XB T
11
///
16
B
21
90
BX TX
30 31
XT TX || T XB BX || B reset_xflags() v{0:inf} ReciprocalEstimateDP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(zx_flag) then SetFX(ZX) vex_flag VE & vxsnan_flag zex_flag ZE & zx_flag if( ~vex_flag & ~zex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR 0bU FI 0bU end
QNaN
The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to an undefined value. FI is set to an undefined value. If a trap-enabled invalid operation exception or a trap-enabled zero divide exception occurs, VSR[XT] and FPRF are not modified. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FPRF FR=0bU FI=0bU XX=0bU VXSNAN VSR Data Layout for xsredp src = VSR[XB]
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. A double-precision floating-point estimate of the reciprocal of src is placed into doubleword element 0 of VSR[XT] in double-precision format. Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,
1 estimate ---------src --------------------------------------------1 ---------src 1 -----------------
FX OX UX
DP tgt = VSR[XT] DP
0 64
unused
undefined
127
16384
390
Power ISA I
Source Value
Result
Exception
XT,XB T
11
(0xF000_0128) B
16 21
///
74
BX TX
30 31
XT TX || T XB BX || B reset_xflags() v{0:inf} ReciprocalSquareRootEstimateDP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(zx_flag) then SetFX(ZX) vex_flag VE & (vxsnan_flag | vxsqrt_flag) zex_flag ZE & zx_flag if( ~vex_flag & ~zex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR 0bU FI 0bU end
QNaN
The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to an undefined value. FI is set to an undefined value. If a trap-enabled invalid operation exception or a trap-enabled zero divide exception occurs, VSR[XT] and FPRF are not modified. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FPRF FR=0bU FI=0bU FX OX UX XX=0bU VXSNAN VXSQRT VSR Data Layout for xsrsqrtedp src = VSR[XB] DP tgt = VSR[XT] DP undefined
64 127
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. A double-precision floating-point estimate of the reciprocal square root of src is placed into doubleword element 0 of VSR[XT] in double-precision format. Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,
1 estimate -------------src ------------------------------------------------1 --------------src 1 --------------16384
unused
391
XT,XB T
11
(0xF000_012C) B
16 21
The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element 0 of VSR[XT] in double-precision format. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX XX VXSNAN VXSQRT VSR Data Layout for xssqrtdp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
///
75
BX TX
30 31
XT TX || T XB BX || B reset_xflags() v{0:inf} SquareRootFP(VSR[XB]{0:63}) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxsqrt_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. The unbounded-precision square root of src is produced. See Table 65.
unused
undefined
127
src -Infinity v dQNaN vxsqrt_flag 1 -NZF v dQNaN vxsqrt_flag 1 -Zero v +Zero +Zero v +Zero +NZF v SQRT(src) +Infinity v +Infinity QNaN v src SNaN v Q(src) vxsnan_flag 1
Explanation:
src dQNaN NZF SQRT(x) Q(x) v The double-precision floating-point value in doubleword element 0 of VSR[XB]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. The unbounded-precision square root of the floating-point value x. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 65.
392
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element 0 of VSR[XT]. The contents of doubleword element 1 of VSR[XT] are undefined. FPRF is set to the class and sign of the result. FR is set to indicate if the result was incremented when rounded. FI is set to indicate the result is inexact. If a trap-enabled invalid operation exception occurs, VSR[XT] and FPRF are not modified, and FR and FI are set to 0. See Table 47, Scalar Floating-Point Final Result, on page 345. Special Registers Altered: FPRF FR FI FX OX UX XX VXSNAN VXISI VSR Data Layout for xssubdp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0140) B 40
21
A
16
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B reset_xflags() src1 VSR[XA]{0:63} src2 VSR[XB]{0:63} v{0:inf} AddDP(src1,NegateDP(src2)) result{0:63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) vex_flag VE & (vxsnan_flag | vxisi_flag) if( ~vex_flag ) then do VSR[XT] result || 0xUUUU_UUUU_UUUU_UUUU FPRF ClassDP(result) FR inc_flag FI xx_flag end else do FR 0b0 FI 0b0 end
unused
unused
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. src2 is negated and added1 to src1, producing a sum having unbounded range and precision. See Table 66. The sum is normalized2. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
undefined
127
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
393
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v Infinity v src1 v Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd S(x,y) S(x,y) Q(x) v The double-precision floating-point value in doubleword element 0 of VSR[XA]. The double-precision floating-point value in doubleword element 0 of VSR[XB]. Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). The floating-point value y is negated and then added to the floating-point value x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 66.
394
Power ISA I
CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xstdivdp src1 = VSR[XA] DP src2 = VSR[XB] DP
0 64
the
value
BF,XA,XB BF
(0xF000_01E8) B 61
21
//
9 11
A
16
AXBX /
29 30 31
AX || A BX || B VSR[XA]{0:63} VSR[XB]{0:63} VSR[XA]{1:11} - 1023 VSR[XB]{1:11} - 1023 IsNaN(src1) | IsInf(src1) | IsNaN(src2) | IsInf(src2) | IsZero(src2) | ( e_b <= -1022 ) | ( e_b >= 1021 ) | ( !IsZero(src1) & ( (e_a - e_b) >= 1023 ) ) | ( !IsZero(src1) & ( (e_a - e_b) <= -1021 ) ) | ( !IsZero(src1) & ( e_a <= -970 ) ) IsInf(src1) | IsInf(src2) | IsZero(src2) | IsDen(src2) xsredp_error() <= 2-14 0b1 || fg_flag || fe_flag || 0b0
unused
undefined
127
Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let src1 be the double-precision floating-point value in doubleword element 0 of VSR[XA]. Let src2 be the double-precision floating-point value in doubleword element 0 of VSR[XB]. Let e_a be the unbiased exponent of src1. Let e_b be the unbiased exponent of src2. fe_flag is set to 1 for any of the following conditions. src1 is a NaN or an infinity. src2 is a zero, a NaN, or an infinity. e_b is less than or equal to -1022. e_b is greater than or equal to 1021. src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 1023. src1 is not a zero and the difference, e_a - e_b, is less than or equal to -1021. src1 is not a zero and e_a is less than or equal to -970 Otherwise fe_flag is set to 0. fg_flag is set to 1 for any of the following conditions. src1 is an infinity. src2 is a zero, an infinity, or a denormalized value. Otherwise fg_flag is set to 0.
395
BF,XB BF
(0xF000_01A8) B
16 21
//
9 11
///
106
BX /
30 31
BX || B VSR[XB]{0:63} VSR[XB]{1:11} - 1023 IsNaN(src) | IsInf(src) | IsZero(src) | IsNeg(src) | ( e_b <= -970 ) IsInf(src) | IsZero(src) | IsDen(src) xsrsqrtedp_error() <= 2-14 0b1 || fg_flag || fe_flag || 0b0
Let XB be the value BX concatenated with B. Let src be the double-precision floating-point value in doubleword element 0 of VSR[XB]. Let e_b be the unbiased exponent of src. fe_flag is set to 1 for any of the following conditions. src is a zero, a NaN, an infinity, or a negative value. e_b is less than or equal to -970 Otherwise fe_flag is set to 0. fg_flag is set to 1 for any of the following conditions. src is a zero, an infinity, or a denormalized value. Otherwise fg_flag is set to 0. CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xstsqrtdp src = VSR[XB] DP
0 64
the
value
unused
127
396
Power ISA I
XT,XB T
11
(0xF000_0764) B
16 21
XT,XB T
11
(0xF000_0664) B
16 21
///
473
BX TX
30 31
///
409
BX TX
30 31
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. The contents of doubleword element i of VSR[XB], with bit 0 set to 0, is placed into doubleword element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvabsdp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. The contents of word element i of VSR[XB], with bit 0 set to 0, is placed into word element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvabssp src = VSR[XB]
DP
SP tgt = VSR[XT]
SP
SP
SP
DP
127 0
SP
32
SP
64
SP
96
SP
127
397
The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI VSR Data Layout for xvadddp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0300) B 96
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} v{0:inf} AddDP(src1,src2) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
DP
DP
DP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src2 is added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 67. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
398
Power ISA I
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v -Infinity v src1 v -Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v -Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd A(x,y) Q(x) v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 67.
399
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(r) T(r), fx(ZX) fx(ZX), error() T(r), fx(VXSQRT) T(r), fx(VXZDZ) T(r), fx(VXIDI) T(r), fx(VXISI) T(r), fx(VXIMZ) T(r), fx(VXSNAN) T(r), fx(VXSNAN), fx(VXIMZ) T(r), fx(VXSQRT) fx(VXZDZ), error() fx(VXIDI), error() fx(VXISI), error() fx(VXIMZ), error() fx(VXSNAN), error() fx(VXSNAN), fx(VXIMZ), error() T(r) T(r), fx(XX) T(r), fx(XX) T(r), fx(XX), error() T(r), fx(XX), error()
Special
0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 1
0 0 1 1
0 0 1 1 0 1 1
0 1 0 1 1 0 1
0 1 1
0 1 1
0 1 1
0 1 1
0 1 1
no yes no yes
Normal
Explanation:
fx(x) q r v OX error() T(x) UX VXSNAN VXSQRT VXIDI VXIMZ VXISI VXZDZ XX ZX The results do not depend on this condition. FX is set to 1 if x=0. x is set to 1. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
The value x is placed in element i of VSR[XT] in the target precision format (where i
Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXIDI. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Floating-Point Invalid Operation Exception (Zero Zero) status flag, FPSCRVXZDZ. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI. Floating-Point Zero Divide Exception status flag, FPSCRZX.
Table 68.
400
Power ISA I
Case
Is r inexact? (r g v)
Is q inexact? (q g v)
vxsnan_flag
vxsqrt_flag
vximz_flag
vxisi_flag
vxidi_flag
vxzdz_flag
zx_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(r), fx(OX), fx(XX) T(r), fx(OX), fx(XX), error() fx(OX), error() fx(OX), fx(XX), error() fx(OX), fx(XX), error() T(r) T(r), fx(UX), fx(XX) T(r), fx(UX), fx(XX) T(r), fx(UX), fx(XX), error() T(r), fx(UX), fx(XX), error() fx(UX), error() fx(UX), fx(XX), error() fx(UX), fx(XX), error()
Overflow
0 0 1 1 1
0 0 0 0 0 1 1 1
0 1 0 0 1 1
Tiny
Explanation:
fx(x) q r v OX error() T(x) UX VXSNAN VXSQRT VXIDI VXIMZ VXISI VXZDZ XX ZX The results do not depend on this condition. FX is set to 1 if x=0. x is set to 1. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements.
The value x is placed in element i of VSR[XT] in the target precision format (where i
Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Invalid Square Root) status flag, FPSCRVXSQRT. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXIDI. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Floating-Point Invalid Operation Exception (Zero Zero) status flag, FPSCRVXZDZ. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI. Floating-Point Zero Divide Exception status flag, FPSCRZX.
Table 68.
401
The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI VSR Data Layout for xvaddsp src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] SP
0 32
XT,XA,XB T
11
(0xF000_0200) B 64
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} v{0:inf} AddSP(src1,src2) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
SP
SP
SP
SP
SP
SP
SP
64
SP
96
SP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src2 is added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 69. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
402
Power ISA I
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v -Infinity v -Infinity v -Infinity v -Infinity v -Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v -Infinity v src1 v -Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v -Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v -Infinity v A(src1,src2) v src2 v src2 v A(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd A(x,y) Q(x) v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 69.
403
Two zero inputs of same or different signs return true for that element. Two infinity inputs of same signs return true for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VSR Data Layout for xvcmpeqdp[.] src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] MD
0 64
A
16
Rc
21 22
99
AXBX TX
29 30 31
do i0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} vxsnan_flag IsSNaN(src1) | IsSNaN(src2) if( CompareEQDP(src1,src2) ) then result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF all_false 0b0 end else do result{i:i+63} 0x0000_0000_0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
DP
DP
MD
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2. The contents of doubleword element i of VSR[XT] are set to all 1s if src1 is equal to src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element.
404
Power ISA I
Two zero inputs of same or different signs return true for that element. Two infinity inputs of same signs return true for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VSR Data Layout for xvcmpeqsp[.] src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] MW
0 32
A
16
Rc
21 22
67
AXBX TX
29 30 31
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} vxsnan_flag IsSNaN(src1) | IsSNaN(src2) if( CompareEQSP(src1,src2) ) then result{i:i+31} 0xFFFF_FFFF all_false 0b0 end else do result{i:i+31} 0x0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
SP
SP
SP
SP
SP
SP
MW
64
MW
96
MW
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2. The contents of word element i of VSR[XT] are set to all 1s if src1 is equal to src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element.
405
The contents of doubleword element i of VSR[XT] are set to all 1s if src1 is greater than or equal to the double-precision floating-point operand in doubleword element i of VSR[XB]src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. Two zero inputs of same or different signs return true for that element. Two infinity inputs of same signs return true for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC VSR Data Layout for xvcmpgedp[.] src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] MD
0 64
A
16
Rc
21 22
115
AXBX TX
29 30 31
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} if( IsSNaN(src1) | IsSNaN(src2) ) then do vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) if( CompareGEDP(src1,src2) ) then result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF all_false 0b0 end else do result{i:i+63} 0x0000_0000_0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
DP
DP
MD
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2.
406
Power ISA I
The contents of word element i of VSR[XT] are set to all 1s if src1 is greater than or equal to src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. Two zero inputs of same or different signs return true for that element. Two infinity inputs of same signs return true for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC VSR Data Layout for xvcmpgesp[.] src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] MW
0 32
A
16
Rc
21 22
83
AXBX TX
29 30 31
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} if( IsSNaN(src1) | IsSNaN(src2) ) then do vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) if( CompareGESP(src1,src2) ) then result{i:i+31} 0xFFFF_FFFF all_false 0b0 end else do result{i:i+31} 0x0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
SP
SP
SP
SP
SP
SP
MW
64
MW
96
MW
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2.
407
The contents of doubleword element i of VSR[XT] are set to all 1s if src1 is greater than src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. Two zero inputs of same or different signs return false for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC VSR Data Layout for xvcmpgtdp[.] src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] MD
0 64
A
16
Rc
21 22
107
AXBX TX
29 30 31
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} if( IsSNaN(src1) | IsSNaN(src2) ) then do vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) if( CompareGTDP(src1,src2) ) then do result{i:i+63} 0xFFFF_FFFF_FFFF_FFFF all_false 0b0 end else do result{i:i+63} 0x0000_0000_0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
DP
DP
MD
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is compared to src2.
408
Power ISA I
The contents of word element i of VSR[XT] are set to all 1s if src1 is greater than src2, and is set to all 0s otherwise. A NaN input causes the comparison to return false for that element. Two zero inputs of same or different signs return false for that element. If Rc=1, CR Field 6 is set as follows. Bit 0 of CR[6] is set to indicate all vector elements compared true. Bit 1 of CR[6] is set to 0. Bit 2 of CR[6] is set to indicate all vector elements compared false. Bit 3 of CR[6] is set to 0. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT] and the contents of CR[6] are undefined if Rc is equal to 1. Special Registers Altered: CR[6] . . . . . . . . . . . . . . . . . . . . . . . . . . (if Rc=1) FX VXSNAN VXVC VSR Data Layout for xvcmpgtsp[.] src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] MW
0 32
A
16
Rc
21 22
75
AXBX TX
29 30 31
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} if( IsSNaN(src1) | IsSNaN(src2) ) then do vxsnan_flag 0b1 if(VE=0) then vxvc_flag 0b1 end else vxvc_flag IsQNaN(src1) | IsQNaN(src2) if( CompareGTSP(src1,src2) ) then do result{i:i+31} 0xFFFF_FFFF all_false 0b0 end else do result{i:i+31} 0x0000_0000 all_true 0b0 end if(vxsnan_flag) then SetFX(VXSNAN) if(vxvc_flag) then SetFX(VXVC) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxvc_flag) end if( ex_flag = 0 ) then VSR[XT] result if(Rc=1) then do if( !vex_flag ) then CR[6] all_true || 0b0 || all_false || 0b0 else CR[6] 0bUUUU end
SP
SP
SP
SP
SP
SP
MW
64
MW
96
MW
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is compared to src2.
409
XT,XA,XB T
11
(0xF000_0780) B 240
21
XT,XA,XB T
11
(0xF000_0680) B 208
21
A
16
AXBX TX
29 30 31
A
16
AXBX TX
29 30 31
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. The contents of bit 0 of doubleword element i of VSR[XA] are concatenated with the contents of bits 1:63 of doubleword element i of VSR[XB] and placed into doubleword element i of VSR[XT]. Special Registers Altered: None Extended Mnemonic xvmovdp XT,XB Equivalent To xvcpsgndp XT,XB,XB
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. The contents of bit 0 of word element i of VSR[XA] are concatenated with the contents of bits 1:31 of word element i of VSR[XB] and placed into word element i of VSR[XT]. Special Registers Altered: None Extended Mnemonic xvmovsp XT,XB Equivalent To xvcpsgnsp XT,XB,XB
VSR Data Layout for xvcpsgndp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
VSR Data Layout for xvcpsgnsp src1 = VSR[XA] DP SP src2 = VSR[XB] DP SP tgt = VSR[XT] DP
127 0
SP
SP
SP
SP
SP
SP
SP
32
SP
64
SP
96
SP
127
410
Power ISA I
XT,XB T
11
(0xF000_0624) B
16 21
///
393
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src VSR[XB]{i:i+63} result{i:i+31} RoundToSP(RN,src) result{i+32:i+63} 0xUUUU_UUUU if(vxsnan_flag) then SetFX(VXSNAN) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format. The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VSR Data Layout for xvcvdpsp src = VSR[XB] DP tgt = VSR[XT] SP
0 32
DP
undefined
64
SP
96
undefined
127
411
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvdpsxds src = VSR[XB]
XT,XB T
11
(0xF000_0760) B
16 21
///
472
BX TX
30 31
DP tgt = VSR[XT] SD
0 64
DP
XT XB ex_flag
TX || T BX || B 0b0
SD
127
do i=0 to 127 by 64 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{i:i+63}) result{i:i+63} ConvertDPtoSD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvdpsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 263-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -263, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit signed-integer format. The result is placed into doubleword element i of VSR[XT]. See Table 70. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
412
Power ISA I
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
fx(x) error()
0 1 0 1 0 1 0 1
0 1 0 1 0 1
FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest signed integer doubleword value, -263 (0x8000_0000_0000_0000). The largest signed integer doubleword value, 263-1 (0x7FFF_FFFF_FFFF_FFFF). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). The signed integer doubleword value x is placed in doubleword element i of VSR[XT] (where i c {0,1}).
Table 70.
VE
XE
Returned Results and Status Setting T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertDPtoSD(RoundToDPintegerTrunc(src))) T(ConvertDPtoSD(RoundToDPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness. yes T(Nmax), fx(XX) yes fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
413
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvdpsxws src = VSR[XB]
XT,XB T
11
(0xF000_0360) B
16 21
///
216
BX TX
30 31
DP tgt = VSR[XT] SW
0 32
DP
XT XB ex_flag
TX || T BX || B 0b0
undefined
64
SW
96
undefined
127
do i=0 to 127 by 64 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{i:i+63}) result{i:i+31} ConvertDPtoSW(rnd) result{i+32:i+63} 0xUUUU_UUUU if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvdpsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 231-1, the result is 0x7FFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -231, the result is 0x8000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit signed-integer format. The result is placed into bits 0:31 of doubleword element i of VSR[XT]. The contents of bits 32:63 of doubleword element 1 of VSR[XT] are undefined. See Table 71. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
414
Power ISA I
VE
XE 0 1 0 1 0 1
Returned Results and Status Setting T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertDPtoSW(RoundToDPintegerTrunc(src))) T(ConvertDPtoSW(RoundToDPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) T(Nmax), fx(XX) T(Nmax), fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertDPtoSW(x) fx(x) error()
0 1 0 1 0 1 0 1
The double-precision floating-point integer value x converted to signed integer word format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest signed integer word value, -231(0x8000_0000). The largest signed integer word value, 231-1 (0x7FFF_FFFF). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). The signed integer word value x is placed in word element i of VSR[XT] (where i c {0,2}).
RoundToDPintegerTrunc(x) The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 71.
415
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvdpuxds src = VSR[XB]
XT,XB T
11
(0xF000_0720) B
16 21
///
456
BX TX
30 31
DP tgt = VSR[XT] UD
0 64
DP
XT XB ex_flag
TX || T BX || B 0b0
UD
127
do i=0 to 127 by 64 reset_xflags() rnd{0:63} RoundToDPIntegerTrunc(VSR[XB]{i:i+63}) result{i:i+63} ConvertDPtoUD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvdpuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 264-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format. The result is placed into doubleword element i of VSR[XT]. See Table 72. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
416
Power ISA I
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertDPtoUD(x) fx(x) error()
0 1 0 1 0 1 0 1
T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertDPtoUD(RoundToDPintegerTrunc(src))) T(ConvertDPtoUD(RoundToDPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in DP format but is included here for completeness. yes T(Nmax), fx(XX) yes T(Nmax), fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
The double-precision floating-point integer value x converted to unsigned integer doubleword format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000). The largest unsigned integer doubleword value, 264-1 (0xFFFF_FFFF_FFFF_FFFF). The double-precision floating-point value in doubleword element i VSR[XB] (where i c {0,1}). The unsigned integer doubleword value x is placed in doubleword element i of VSR[XT] (where i c {0,1}).
RoundToDPintegerTrunc(x) The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 72.
417
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvdpuxws src = VSR[XB]
XT,XB T
11
(0xF000_0320) B
16 21
///
200
BX TX
30 31
DP tgt = VSR[XT] UW
0 32
DP
XT XB ex_flag
TX || T BX || B 0b0
undefined
64
UW
96
undefined
127
do i=0 to 127 by 64 reset_xflags() rnd{0:63} RoundToDIntegerTrunc(VSR[XB]{i:i+63}) result{i:i+31} ConvertDPtoUW(rnd) result{i+32:i+63} 0xUUUU_UUUU if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvdpuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Double-Precision Integer instruction that corresponds to the desired rounding mode, including xvrdpic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 232-1, the result is 0xFFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format. The result is placed into bits 0:31 of doubleword element i of VSR[XT]. The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined. See Table 73. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
418
Power ISA I
VE
XE 0 1 0 1 0 1
Returned Results and Status Setting T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertDPtoUW(RoundToDPintegerTrunc(src))) T(ConvertDPtoUW(RoundToDPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) T(Nmax), fx(XX) fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertDPtoUW(x) fx(x) error()
0 1 0 1 0 1 0 1
The double-precision floating-point integer value x converted to unsigned integer word format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest unsigned integer word value, 0 (0x0000_0000). The largest unsigned integer word value, 232-1 (0xFFFF_FFFF). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). The unsigned integer word value x is placed in word element i of VSR[XT] (where i c {0,2}).
RoundToDPintegerTrunc(x) The double-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 73.
419
XT,XB T
11
(0xF000_0724) B
16 21
///
457
BX TX
30 31
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() result{i:i+63} ConvertSPtoDP(VSR[XB]{i:i+31}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the single-precision floating-point operand in bits 0:31 of doubleword element i of VSR[XB]. src is placed into doubleword element i of VSR[XT] in double-precison format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvcvspdp src = VSR[XB] SP tgt = VSR[XT] DP
0 32 64
unused
SP
unused
DP
96 127
420
Power ISA I
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvspsxds src = VSR[XB]
XT,XB T
11
(0xF000_0660) B
16 21
///
408
BX TX
30 31
SP tgt = VSR[XT] SD
0 32
unused
SP
unused
XT XB ex_flag
TX || T BX || B 0b0
SD
64 96 127
do i=0 to 127 by 64 reset_xflags() rnd{0:31} RoundToSPIntegerTrunc(VSR[XB]{i:i+31}) result{i:i+63} ConvertSPtoSD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvspsxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the single-precision floating-point operand in word element i2 of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 263-1, the result is 0x7FFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -263, the result is 0x8000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit signed-integer format. The result is placed into doubleword element i of VSR[XT]. See Table 73. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
421
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertSPtoSD(x) fx(x) error()
0 1 0 1 0 1 0 1
T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertSPtoSD(RoundToSPintegerTrunc(src))) T(ConvertSPtoSD(RoundToSPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in SP format but is included here for completeness. yes T(Nmax), fx(XX) yes fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
The single-precision floating-point integer value x converted to signed integer doubleword format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest signed integer doubleword value, -263 (0x8000_0000_0000_0000). The largest signed integer doubleword value, 263-1 (0x7FFF_FFFF_FFFF_FFFF). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,2}). The signed integer doubleword value x is placed in doubleword element i of VSR[XT] (where i c {0,1}).
RoundToSPintegerTrunc(x) The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 74.
422
Power ISA I
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvspsxws src = VSR[XB]
XT,XB T
11
(0xF000_0260) B
16 21
///
152
BX TX
30 31
SP tgt = VSR[XT] SW
0 32
SP
SP
SP
XT XB ex_flag
TX || T BX || B 0b0
SW
64
SW
96
SW
127
do i=0 to 127 by 32 reset_xflags() rnd{0:31} RoundToSPIntegerTrunc(VSR[XB]{i:i+31}) result{i:i+31} ConvertSPtoSW(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvspsxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. If src is a NaN, the result is the value 0x8000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 231-1, the result is 0x7FFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than -231, the result is 0x8000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit signed-integer format. The result is placed into word element i of VSR[XT]. See Table 73. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
423
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertSPtoSW(x) fx(x) error()
0 1 0 1 0 1 0 1
T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertSPtoSW(RoundToSPintegerTrunc(src))) T(ConvertSPtoSW(RoundToSPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in SP format but is included here for completeness. yes T(Nmax), fx(XX) yes fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
The single-precision floating-point integer value x converted to signed integer word format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest signed integer word value, -231 (0x8000_0000). The largest signed integer word value, 231-1 (0x7FFF_FFFF). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). The signed integer word value x is placed in word element i of VSR[XT] (where i c {0,1,2,3}).
RoundToSPintegerTrunc(x) The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 75.
424
Power ISA I
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvspuxds src = VSR[XB]
XT,XB T
11
(0xF000_0620) B
16 21
///
392
BX TX
30 31
SP tgt = VSR[XT] UD
0 32
unused
SP
unused
XT XB ex_flag
TX || T BX || B 0b0
UD
64 96 127
do i=0 to 127 by 64 reset_xflags() rnd{0:inf} RoundToSPIntegerTrunc(src) result{i:i+63} ConvertSPtoUD(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvspuxds rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the single-precision floating-point operand in word element i2 of VSR[XB]. If src is a NaN, the result is the value 0x0000_0000_0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 264-1, the result is 0xFFFF_FFFF_FFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000_0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 64-bit unsigned-integer format. The result is placed into doubleword element i of VSR[XT]. See Table 73. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
425
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertSPtoUD(x) fx(x) error()
0 1 0 1 0 1 0 1
T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertSPtoUD(RoundToSPintegerTrunc(src))) T(ConvertSPtoUD(RoundToSPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in SP format but is included here for completeness. yes T(Nmax), fx(XX) yes fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
The single-precision floating-point integral value x converted to unsigned integer doubleword format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest unsigned integer doubleword value, 0 (0x0000_0000_0000_0000). The largest unsigned integer doubleword value, 264-1 (0xFFFF_FFFF_FFFF_FFFF). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,2}). The unsigned integer doubleword value x is placed in doubleword element i of VSR[XT] (where i c {0,1}).
RoundToSPintegerTrunc(x) The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 76.
426
Power ISA I
Special Registers Altered: FX XX VXSNAN VXCVI VSR Data Layout for xvcvspuxws src = VSR[XB]
XT,XB T
11
(0xF000_0220) B
16 21
///
136
BX TX
30 31
SP tgt = VSR[XT] UW
0 32
SP
SP
SP
XT XB ex_flag
TX || T BX || B 0b0
UW
64
UW
96
UW
127
do i=0 to 127 by 32 reset_xflags() rnd{0:31} RoundToSPIntegerTrunc(src) result{i:i+31} ConvertSPtoUW(rnd) if(vxsnan_flag) then SetFX(VXSNAN) if(vxcvi_flag) then SetFX(VXCVI) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxcvi_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Programming Note xvcvspuxws rounds using Round towards Zero rounding mode. For other rounding modes, software must use a Round to Single-Precision Integer instruction that corresponds to the desired rounding mode, including xvrspic which uses the rounding mode specified by the RN field in the FPSCR.
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. If src is a NaN, the result is the value 0x0000_0000 and VXCVI is set to 1. If src is an SNaN, VXSNAN is also set to 1. Otherwise, src is rounded to a floating-point integer using the rounding mode Round Toward Zero. If the rounded value is greater than 232-1, the result is 0xFFFF_FFFF and VXCVI is set to 1. Otherwise, if the rounded value is less than 0, the result is 0x0000_0000 and VXCVI is set to 1. Otherwise, the result is the rounded value converted to 32-bit unsigned-integer format. The result is placed into word element i of VSR[XT]. See Table 73. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
427
VE
XE 0 1 0 1 0 1
src [ Nmin-1 Nmin-1 < src < Nmin src = Nmin Nmin < src < Nmax src = Nmax Nmax < src < Nmax+1 src m Nmax+1 src is a QNaN src is a SNaN Explanation:
ConvertSPtoUW(x) fx(x) error()
0 1 0 1 0 1 0 1
T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(XX) fx(XX), error() T(Nmin) T(ConvertSPtoUW(RoundToSPintegerTrunc(src))) T(ConvertSPtoUW(RoundToSPintegerTrunc(src))), fx(XX) fx(XX), error() T(Nmax) no Note: This case cannot occur as Nmax is not representable in SP format but is included here for completeness. yes T(Nmax), fx(XX) yes fx(XX), error() T(Nmax), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI) fx(VXCVI), error() T(Nmin), fx(VXCVI), fx(VXSNAN) fx(VXCVI), fx(VXSNAN), error()
The single-precision floating-point integer value x converted to unsigned integer word format. FX is set to 1 if x=0. x is set to 1. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of VSR[XT] is suppressed. The smallest unsigned integer word value, 0 (0x0000_0000). The largest unsigned integer word value, 232-1 (0xFFFF_FFFF). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). The unsigned integer word value x is placed in word element i of VSR[XT] (where i c {0,1,2,3}).
RoundToSPintegerTrunc(x) The single-precision floating-point value x rounded to an integer using the rounding mode Round towards Zero.
Table 77.
428
Power ISA I
VSX Vector Convert and round Signed Integer Doubleword to Single-Precision format XX2-form
xvcvsxdsp 60
0 6
XT,XB T
11
(0xF000_07E0) B
16 21
XT,XB T
11
(0xF000_06E0) B
16 21
///
504
BX TX
30 31
///
440
BX TX
30 31
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertSDtoFP(VSR[XB]{i:i+63}) result{i:i+63} RoundToDP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertSDtoFP(VSR[XB]{i:i+63}) result{i:i+31} RoundToSP(RN,v) result{i+32:i+63} 0xUUUU_UUUU if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the signed integer in doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvsxddp src = VSR[XB] SD tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the signed integer in doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format. The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX
SD tgt = VSR[XT] SP
0 32
SD
undefined
64
SP
96
undefined
127
429
VSX Vector Convert and round Signed Integer Word to Single-Precision format XX2-form
xvcvsxwsp 60
0 6
XT,XB T
11
(0xF000_03E0) B
16 21
XT,XB T
11
(0xF000_02E0) B
16 21
///
248
BX TX
30 31
///
184
BX TX
30 31
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertSWtoFP(VSR[XB]{i:i+31}) result{i:i+63} RoundToDP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 32 reset_xflags() v{0:inf} ConvertSWtoFP(VSR[XB]{i:i+31}) result{i:i+31} RoundToSP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the signed integer in bits 0:31 of doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvsxwdp src = VSR[XB] SW tgt = VSR[XT] DP
0 32 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the signed integer in word element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvsxwsp src = VSR[XB]
unused
SW
unused
SW tgt = VSR[XT]
SW
SW
SW
DP
96 127 0
SP
32
SP
64
SP
96
SP
127
430
Power ISA I
VSX Vector Convert and round Unsigned Integer Doubleword to Single-Precision format XX2-form
xvcvuxdsp 60
0 6
XT,XB T
11
(0xF000_07A0) B
16 21
XT,XB T
11
(0xF000_06A0) B
16 21
///
488
BX TX
30 31
///
424
BX TX
30 31
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertUDtoFP(VSR[XB]{i:i+63}) result{i:i+63} RoundToDP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertUDtoFP(VSR[XB]{i:i+63}) result{i:i+31} RoundToSP(RN,v) result{i+32:i+63} 0xUUUU_UUUU if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the unsigned integer in doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvuxddp src = VSR[XB] UD tgt = VSR[XT] DP
0 32 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the unsigned integer in doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into bits 0:31 of doubleword element i of VSR[XT] in single-precision format. The contents of bits 32:63 of doubleword element i of VSR[XT] are undefined. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX
UD tgt = VSR[XT] SP
0 32
UD
undefined
64
SP
96
undefined
127
431
VSX Vector Convert and round Unsigned Integer Word to Single-Precision format XX2-form
xvcvuxwsp 60
0 6
XT,XB T
11
(0xF000_03A0) B
16 21
XT,XB T
11
(0xF000_02A0) B
16 21
///
232
BX TX
30 31
///
168
BX TX
30 31
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 64 reset_xflags() v{0:inf} ConvertUWtoFP(VSR[XB]{i:i+31}) result{i:i+63} RoundToDP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
XT TX || T XB BX || B ex_flag 0b0 do i=0 to 127 by 32 reset_xflags() v{0:inf} ConvertUWtoFP(VSR[XB]{i:i+31}) result{i:i+31} RoundToSP(RN,v) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the unsigned integer in bits 0:31 of doubleword element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvuxwdp src = VSR[XB] UW tgt = VSR[XT] DP
0 32 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the unsigned integer in word element i of VSR[XB]. src is converted to an unbounded-precision floating-point value and rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VSR Data Layout for xvcvuxwsp src = VSR[XB]
unused
UW
unused
UW tgt = VSR[XT]
UW
UW
UW
DP
96 127 0
SP
32
SP
64
SP
96
SP
127
432
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX ZX XX VXSNAN VXIDI VXZDZ VSR Data Layout for xvdivdp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_03C0) B 120
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} v{0:inf} DivideDP(src1,src2) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxidi_flag) then SetFX(VXIDI) if(vxisi_flag) then SetFX(VXZDZ) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxidi_flag) ex_flag ex_flag | (VE & vxzdz_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (ZE & zx_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
DP
DP
DP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is divided1 by src2, producing a quotient having unbounded range and precision. The quotient is normalized2. See Table 78. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
1. Floating-point division is based on exponent subtraction and division of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
433
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v dQNaN vxidi_flag 1 v +Zero v +Zero v Zero v Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v Infinity v D(src1,src2) v +Zero v Zero v D(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v Infinity v +Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v Infinity zx_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v +Infinity zx_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v D(src1,src2) v Zero v +Zero v D(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxidi_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd D(x,y) Q(x) v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 78.
434
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX ZX XX VXSNAN VXIDI VXZDZ VSR Data Layout for xvdivsp src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] SP
0 32
XT,XA,XB T
11
(0xF000_02C0) B 88
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} v{0:inf} DivideSP(src1,src2) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxidi_flag) then SetFX(VXIDI) if(vxisi_flag) then SetFX(VXZDZ) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxidi_flag) ex_flag ex_flag | (VE & vxzdz_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (ZE & zx_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
SP
SP
SP
SP
SP
SP
SP
64
SP
96
SP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is divided1 by src2, producing a quotient having unbounded range and precision. The quotient is normalized2. See Table 79. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
1. Floating-point division is based on exponent subtraction and division of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
435
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v dQNaN vxidi_flag 1 v +Zero v +Zero v Zero v Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 -NZF v Infinity v D(src1,src2) v +Zero v Zero v D(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v Infinity v +Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v Infinity zx_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v Infinity zx_flag 1 v dQNaN vxzdz_flag 1 v dQNaN vxzdz_flag 1 v +Infinity zx_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v D(src1,src2) v Zero v +Zero v D(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v dQNaN vxidi_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vxidi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd D(x,y) Q(x) v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized quotient of floating-point value x divided by floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 79.
436
Power ISA I
XT,XA,XB T
11
(0xF000_0308) B 97
21
A
16
AXBX TX
29 30 31
For xvmaddmdp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 80. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 80. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
xvmaddmdp 60
0 6
XT,XA,XB T
11
(0xF000_0348) B 105
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 xvmaddadp ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63} src3 xvmaddadp ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63} v{0:inf} MultiplyAddDP(src1,src3,src2) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. For xvmaddadp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
437
DP
DP
127
438
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 80.
439
XT,XA,XB T
11
(0xF000_0208) B 65
21
A
16
AXBX TX
29 30 31
For xvmaddmsp, do the following. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. Let src3 be the single-precision floating-point operand in word element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 81. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 81. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
xvmaddmsp 60
0 6
XT,XA,XB T
11
(0xF000_0248) B 73
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 xvmaddasp ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31} src3 xvmaddasp ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31} v{0:inf} MultiplyAddSP(src1,src3,src2) result{i:i+63} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. For xvmaddasp, do the following. Let src2 be the single-precision floating-point operand in word element i of VSR[XT]. Let src3 be the single-precision floating-point operand in word element i of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
440
Power ISA I
SP
SP
SP
SP
64
SP
96
SP
127
441
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). For xvmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 81.
442
Power ISA I
XT,XA,XB T
11
(0xF000_0700) B 224
21
DP src2 = VSR[XB] DP
DP
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
DP
TX || T AX || A BX || B 0b0
tgt = VSR[XT] DP
0 64
DP
127
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} result{i:i+63} MaximumDP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src1 is greater than src2, src1 is placed into doubleword element i of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element i of VSR[XT] in double-precision format. The maximum of +0 and 0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN. See Table 82. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN
443
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) fx(x) VXSNAN The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Nonzero finite number. Return a QNaN with the payload of x. Return the greater of floating-point value x and floating-point value y. The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format. FPRF, FR and FI are not modified. If x is equal to 0, FX is set to 1. x is set to 1. Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
Table 82.
src1
444
Power ISA I
XT,XA,XB T
11
(0xF000_0600) B 192
21
SP src2 = VSR[XB] SP
SP
SP
SP
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
SP
SP
SP
TX || T AX || A BX || B 0b0
tgt = VSR[XT] SP
0 32
SP
64
SP
96
SP
127
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} result{i:i+63} MaximumSP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. If src1 is greater than src2, src1 is placed into word element i of VSR[XT] in single-precision format. Otherwise, src2 is placed into word element i of VSR[XT] in single-precision format. The maximum of +0 and 0 is +0. The maximum of a QNaN and any value is that value. The maximum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN. See Table 83. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN
445
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src2) T(M(src1,src2)) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src2) T(src2) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src2) T(src2) T(src2) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src2) T(src2) T(src2) T(src2) T(M(src1,src2)) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src2) T(src2) T(src2) T(src2) T(src2) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) fx(x) VXSNAN The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). Nonzero finite number. Return a QNaN with the payload of x. Return the greater of floating-point value x and floating-point value y. The value x is placed in word element i (i{0,1,2,3}) of VSR[XT] in single-precision format. FPRF, FR and FI are not modified. If x is equal to 0, FX is set to 1. x is set to 1. Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
Table 83.
src1
446
Power ISA I
XT,XA,XB T
11
(0xF000_0740) B 232
21
DP src2 = VSR[XB] DP
DP
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
DP
TX || T AX || A BX || B 0b0
tgt = VSR[XT] DP
0 64
DP
127
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} result{i:i+63} MinimumDP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. If src1 is less than src2, src1 is placed into doubleword element i of VSR[XT] in double-precision format. Otherwise, src2 is placed into doubleword element i of VSR[XT] in double-precision format. The minimum of +0 and 0 is 0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN. See Table 84. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN
447
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) fx(x) VXSNAN The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Nonzero finite number. Return a QNaN with the payload of x. Return the lesser of floating-point value x and floating-point value y. The value x is placed in doubleword element i (i{0,1}) of VSR[XT] in double-precision format. FPRF, FR and FI are not modified. If x is equal to 0, FX is set to 1. x is set to 1. Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
Table 84.
src1
448
Power ISA I
XT,XA,XB T
11
(0xF000_0640) B 200
21
SP src2 = VSR[XB] SP
SP
SP
SP
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
SP
SP
SP
TX || T AX || A BX || B 0b0
tgt = VSR[XT] SP
0 32
SP
64
SP
96
SP
127
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} result{i:i+31} MinimumSP(src1,src2) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. If src1 is less than src2, src1 is placed into word element i of VSR[XT] in single-precision format. Otherwise, src2 is placed into word element i of VSR[XT] in single-precision format. The minimum of +0 and 0 is 0. The minimum of a QNaN and any value is that value. The minimum of any value and an SNaN when VE=0 is that SNaN converted to a QNaN. See Table 85. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN
449
src2 Infinity Infinity NZF Zero +Zero +NZF +Infinity QNaN SNaN T(src1) T(src2) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) NZF T(src1) T(M(src1,src2)) T(src2) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) Zero T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Zero T(src1) T(src1) T(src1) T(src1) T(src2) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +NZF T(src1) T(src1) T(src1) T(src1) T(M(src1,src2)) T(src2) T(src2) T(Q(src1)) fx(VXSNAN) +Infinity T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src2) T(Q(src1)) fx(VXSNAN) QNaN T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(src1) T(Q(src1)) fx(VXSNAN) SNaN T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(Q(src2)) fx(VXSNAN) T(src1) fx(VXSNAN) T(Q(src1)) fx(VXSNAN)
Explanation:
src1 src2 NZF Q(x) M(x,y) T(x) fx(x) VXSNAN The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). Nonzero finite number. Return a QNaN with the payload of x. Return the lesser of floating-point value x and floating-point value y. The value x is placed in word element i (i{0,1,2,3}) of VSR[XT] in single-precision format. FPRF, FR and FI are not modified. If x is equal to 0, FX is set to 1. x is set to 1. Floating-point Invalid Operation Exception (SNaN). If VE=1, update of VSR[XT] is suppressed.
Table 85.
src1
450
Power ISA I
XT,XA,XB T
11
(0xF000_0388) B 113
21
A
16
AXBX TX
29 30 31
For xvmsubmdp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 86. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 86. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
xvmsubmdp 60
0 6
XT,XA,XB T
11
(0xF000_03C8) B 121
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 xvmsubadp ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63} src3 xvmsubadp ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63} v{0:inf} MultiplyAddDP(src1,src3,NegateDP(src2)) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. For xvmsubadp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].
VXIMZ
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
451
DP
DP
127
452
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 86.
453
XT,XA,XB T
11
(0xF000_0288) B 81
21
Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. Let src3 be the single-precision floating-point operand in word element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 87. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 87. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
A
16
AXBX TX
29 30 31
xvmsubmsp 60
0 6
XT,XA,XB T
11
(0xF000_02C8) B 89
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 xvmsubasp ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31} src3 xvmsubasp ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31} v{0:inf} MultiplyAddSP(src1,src3,NegateSP(src2)) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. For xvmsubasp, do the following. Let src2 be the single-precision floating-point operand in word element i of VSR[XT]. Let src3 be the single-precision floating-point operand in word element i of VSR[XB]. For xvmsubmsp, do the following.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
454
Power ISA I
SP
SP
SP
SP
64
SP
96
SP
127
455
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p Zero p Zero p +Zero p +Zero p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p Zero p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). For xvmsubasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvmsubasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvmsubmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 87.
456
Power ISA I
The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXIMZ VSR Data Layout for xvmuldp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0380) B 112
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src3 VSR[XB]{i:i+63} v{0:inf} MultiplyDP(src1,src3) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
DP
DP
DP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src1 is multiplied1 by src2, producing a product having unbounded range and precision. The product is normalized2. See Table 88. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
457
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v +Infinity v +Infinity v dQNaN vximz_flag 1 v dQNaN vximz_flag 1 v Infinity v Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v +Infinity -Zero v dQNaN vximz_flag 1 +Zero v dQNaN vximz_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity +Infinity v Infinity QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v Zero v +Infinity v src1 v Q(src1) vxsnan_flag 1 v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF M(x,y) Q(x) v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 88.
458
Power ISA I
The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXIMZ VSR Data Layout for xvmulsp src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] SP
0 32
XT,XA,XB T
11
(0xF000_0280) B 80
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src3 VSR[XB]{i:i+31} v{0:inf} MultiplySP(src1,src3) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
SP
SP
SP
SP
SP
SP
SP
64
SP
96
SP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src1 is multiplied1 by src2, producing a product having unbounded range and precision. The product is normalized2. See Table 89. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
459
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v +Infinity v +Infinity v dQNaN vximz_flag 1 v dQNaN vximz_flag 1 v Infinity v Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v +Infinity -Zero v dQNaN vximz_flag 1 +Zero v dQNaN vximz_flag 1 v Zero v Zero v +Zero v +Zero v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity +Infinity v Infinity QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v Zero v +Infinity v src1 v Q(src1) vxsnan_flag 1 v dQNaN vximz_flag 1 v src1 v Q(src1) vxsnan_flag 1
v M(src1,src2) v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 v +Infinity v src1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF M(x,y) Q(x) v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 89.
460
Power ISA I
XT,XB T
11
(0xF000_07A4) B
16 21
XT,XB T
11
(0xF000_06A4) B
16 21
///
489
BX TX
30 31
///
425
BX TX
30 31
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. The contents of doubleword element i of VSR[XB], with bit 0 set to 1, is placed into doubleword element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvnabsdp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. The contents of word element i of VSR[XB], with bit 0 set to 1, is placed into word element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvnabssp src = VSR[XB]
DP
SP tgt = VSR[XT]
SP
SP
SP
DP
127 0
SP
32
SP
64
SP
96
SP
127
461
(0xF000_06E4) B
16 21
XT,XB T
11
///
16
B
21
505
BX TX
30 31
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. The contents of doubleword element i of VSR[XB], with bit 0 complemented, is placed into doubleword element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvnegdp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. The contents of word element i of VSR[XB], with bit 0 complemented, is placed into word element i of VSR[XT]. Special Registers Altered: None VSR Data Layout for xvnegsp src = VSR[XB] SP SP SP SP
DP
tgt = VSR[XT] SP SP
32 64
SP
96
SP
127
DP
127
462
Power ISA I
XT,XA,XB T
11
(0xF000_0708) B 225
21
A
16
AXBX TX
29 30 31
For xvnmaddmdp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 90. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 90. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into doubleword element i of VSR[XT] in double-precision format. See Table 91, Vector Floating-Point Final Result with Negation, on page 466. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
xvnmaddmdp 60
0 6
XT,XA,XB T
11
(0xF000_0748) B 233
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 xvnmaddadp ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63} src3 xvnmaddadp ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63} v{0:inf} MultiplyAddDP(src1,src3,src2) result{i:i+63} NegateDP(RoundToDP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. For xvnmaddadp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
463
DP
DP
127
464
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmaddadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmaddmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 90.
465
Case
Is q inexact? (q g v)
Is r inexact? (r g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(N(r)) T(r), fx(VXISI) T(r), fx(VXIMZ) T(r), fx(VXSNAN) T(r), fx(VXSNAN), fx(VXIMZ) fx(VXISI), error() fx(VXIMZ), error() fx(VXSNAN), error() fx(VXSNAN), fx(VXIMZ), error() T(N(r)) T(N(r)), fx(XX) T(N(r)), fx(XX) T(N(r)), fx(XX), error() T(N(r)), fx(XX), error() T(N(r)), fx(OX), fx(XX) T(N(r)), fx(OX), fx(XX), error() fx(OX), error() fx(OX), fx(XX), error() fx(OX), fx(XX), error()
Special
0 0 0 0 1 1 1 1
0 0 1 1 1
0 0 1 1 0 1
0 0 1 1 0 1 1
0 1 0 1 1 0 1
0 1 1
no yes no yes
Normal
Overflow
Explanation:
fx(x) q r v FI FR OX error() N(x) T(x) UX VXSNAN VXIMZ VXISI XX The results do not depend on this condition. FX is set to 1 if x=0. x is set to 1. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements. The value x is is negated by complementing the sign bit of x. The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i
Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Table 91.
466
Power ISA I
Case
Is q inexact? (q g v)
Is r inexact? (r g v)
vxsnan_flag
vximz_flag
vxisi_flag
OE
UE
VE
XE
ZE
Returned Results and Status Setting T(N(r)) T(N(r)), fx(UX), fx(XX) T(N(r)), fx(UX), fx(XX) T(N(r)), fx(UX), fx(XX), error() T(N(r)), fx(UX), fx(XX), error() fx(UX), error() fx(UX), fx(XX), error() fx(UX), fx(XX), error()
Tiny
0 0 0 0 0 1 1 1
0 0 1 1
no yes no yes
no yes yes
no yes
Explanation:
fx(x) q r v FI FR OX error() N(x) T(x) UX VXSNAN VXIMZ VXISI XX The results do not depend on this condition. FX is set to 1 if x=0. x is set to 1. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, unbounded exponent range. The value defined in Table 46, Floating-Point Intermediate Result Handling, on page 344, signficand rounded to the target precision, bounded exponent range. The precise intermediate result defined in the instruction having unbounded signficand precision, unbounded exponent range. Floating-Point Fraction Inexact status flag, FPSCRFI. This status flag is nonsticky. Floating-Point Fraction Rounded status flag, FPSCRFR. Floating-Point Overflow Exception status flag, FPSCROX. The system error handler is invoked for the trap-enabled exception if the FE0 and FE1 bits in the Machine State Register are set to any mode other than the ignore-exception mode. Update of the target VSR is suppressed for all vector elements. The value x is is negated by complementing the sign bit of x. The value x is placed in element i of VSR[XT] in the target precision format (where i c {0,1} for results with 64-bit elements, and i
Floating-Point Underflow Exception status flag, FPSCRUX Floating-Point Invalid Operation Exception (SNaN) status flag, FPSCRVXSNAN. Floating-Point Invalid Operation Exception (Infinity Zero) status flag, FPSCRVXIMZ. Floating-Point Invalid Operation Exception (Infinity Infinity) status flag, FPSCRVXISI. Float-Point Inexact Exception status flag, FPSCRXX. The flag is a sticky version of FPSCRFI. When FPSCRFI is set to a new value, the new value of FPSCRXX is set to the result of ORing the old value of FPSCRXX with the new value of FPSCRFI.
Table 91.
467
XT,XA,XB T
11
(0xF000_0608) B 193
21
Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. Let src3 be the single-precision floating-point operand in word element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 92. src2 is added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 92. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into word element i of VSR[XT] in single-precision format. See Table 91, Vector Floating-Point Final Result with Negation, on page 466. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
A
16
AXBX TX
29 30 31
xvnmaddmsp 60
0 6
XT,XA,XB T
11
(0xF000_0648) B 201
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 xvnmaddasp ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31} src3 xvnmaddasp ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31} v{0:inf} MultiplyAddSP(src1,src3,src2) result{i:i+31} NegateSP(RoundToSP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. For xvnmaddasp, do the following. Let src2 be the single-precision floating-point operand in word element i of VSR[XT]. Let src3 be the single-precision floating-point operand in word element i of VSR[XB]. For xvnmaddmsp, do the following.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
468
Power ISA I
SP
SP
SP
SP
64
SP
96
SP
127
469
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Add Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v A(p,src2) v src2 v src2 v A(p,src2) v +Infinity vp vp +Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) A(x,y) M(x,y) p v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvnmaddasp, the single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). For xvnmaddmsp, the single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 92.
470
Power ISA I
XT,XA,XB T
11
(0xF000_0788) B 241
21
A
16
AXBX TX
29 30 31
For xvmsubmdp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 93. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 93. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into doubleword element i of VSR[XT] in double-precision format. See Table 91, Vector Floating-Point Final Result with Negation, on page 466. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
xvnmsubmdp 60
0 6
XT,XA,XB T
11
(0xF000_07C8) B 249
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 xvmsubadp ? VSR[XT]{i:i+63} : VSR[XB]{i:i+63} src3 xvmsubadp ? VSR[XB]{i:i+63} : VSR[XT]{i:i+63} v{0:inf} MultiplyAddDP(src1,src3,NegateDP(src2)) result{i:i+63} NegateDP(RoundToDP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. For xvmsubadp, do the following. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XT]. Let src3 be the double-precision floating-point operand in doubleword element i of VSR[XB].
VXIMZ
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
471
DP
DP
127
472
Power ISA I
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmsubadp, the double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). For xvnmsubmdp, the double-precision floating-point value in doubleword element i of VSR[XT] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 93.
473
XT,XA,XB T
11
(0xF000_0688) B 209
21
Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. Let src3 be the single-precision floating-point operand in word element i of VSR[XT]. src1 is multiplied1 by src3, producing a product having unbounded range and precision. See part 1 of Table 94. src2 is negated and added2 to the product, producing a sum having unbounded range and precision. The sum is normalized3. See part 2 of Table 94. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is negated and placed into word element i of VSR[XT] in single-precision format. See Table 91, Vector Floating-Point Final Result with Negation, on page 466. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI
A
16
AXBX TX
29 30 31
xvnmsubmsp 60
0 6
XT,XA,XB T
11
(0xF000_06C8) B 217
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 xvnmsubasp ? VSR[XT]{i:i+31} : VSR[XB]{i:i+31} src3 xvnmsubasp ? VSR[XB]{i:i+31} : VSR[XT]{i:i+31} v{0:inf} MultiplyAddSP(src1,src3,NegateSP(src2)) result{i:i+31} NegateSP(RoundToSP(RN,v)) if(vxsnan_flag) then SetFX(VXSNAN) if(vximz_flag) then SetFX(VXIMZ) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vximz_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
VXIMZ
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. For xvnmsubasp, do the following. Let src2 be the single-precision floating-point operand in word element i of VSR[XT]. Let src3 be the single-precision floating-point operand in word element i of VSR[XB]. For xvnmsubmsp, do the following.
1. Floating-point multiplication is based on exponent addition and multiplication of the significands. 2. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 3. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
474
Power ISA I
SP
SP
SP
SP
64
SP
96
SP
127
475
Part 1: Multiply Infinity NZF Zero src1 +Zero +NZF +Infinity QNaN SNaN Part 2: Subtract Infinity NZF Zero +Zero p +NZF +Infinity QNaN & src1 is a NaN QNaN & src1 not a NaN
src3 Infinity p +Infinity p +Infinity p dQNaN vximz_flag 1 p dQNaN vximz_flag 1 p Infinity p Infinity p src1 p Q(src1) vxsnan_flag 1 NZF p +Infinity Zero p dQNaN vximz_flag 1 +Zero p dQNaN vximz_flag 1 p src1 p Zero p +Zero p src1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1 +NZF p Infinity +Infinity p Infinity QNaN p src3 p src3 p src3 p src3 p src3 p src3 p src1 p Q(src1) vxsnan_flag 1 SNaN p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p Q(src3) vxsnan_flag 1 p src1 vxsnan_flag 1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p src1 p +Infinity p src1 p Q(src1) vxsnan_flag 1 p dQNaN vximz_flag 1 p src1 p Q(src1) vxsnan_flag 1
p M(src1,src3) p +Infinity p +Infinity p src1 p Q(src1) vxsnan_flag 1 p +Infinity p src1 p Q(src1) vxsnan_flag 1
src2 Infinity v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity vp vp NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp Zero v Infinity vp v Zero v Rezd vp v +Infinity vp vp +Zero v Infinity vp v Rezd v +Zero vp v +Infinity vp vp +NZF v Infinity v S(p,src2) v src2 v src2 v S(p,src2) v +Infinity vp vp +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 vp vp QNaN v src2 v src2 v src2 v src2 v src2 v src2 vp v src2 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 vp vxsnan_flag 1 v Q(src2) vxsnan_flag 1
Explanation:
src1 src2 src3 dQNaN NZF Rezd Q(x) S(x,y) M(x,y) p v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XT] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Can also occur with two nonzero finite number source operands. Return a QNaN with the payload of x. Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return the normalized product of floating-point value x and floating-point value y, having unbounded range and precision. The intermediate product having unbounded range and precision. The intermediate result having unbounded range and precision.
Table 94.
476
Power ISA I
XT,XB T
11
(0xF000_0324) B
16 21
///
201
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 64 reset_xflags() result{i:i+63} RoundToDPIntegerNearAway(VSR[XB]{i:i+63}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to an integer using the rounding mode Round to Nearest Away. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrdpi src = VSR[XB] DP tgt = VSR[XT] DP
0 64
DP
DP
127
477
VSX Vector Round to Double-Precision Integer using round toward -Infinity XX2-form
xvrdpim XT,XB T
6 11
(0xF000_03E4) B
16 21
XT,XB T
11
///
16
B
21
235
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src{0:63} VSR[XB]{i:i+63} if(RN=0b00) then result{i:i+63} RoundToDPIntegerNearEven(src) if(RN=0b01) then result{i:i+63} RoundToDPIntegerTrunc(src) if(RN=0b10) then result{i:i+63} RoundToDPIntegerCeil(src) if(RN=0b11) then result{i:i+63} RoundToDPIntegerFloor(src) if(vxsnan_flag) then SetFX(VXSNAN) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
do i=0 to 127 by 64 reset_xflags() result{i:i+63} RoundToDPIntegerFloor(VSR[XB]{i:i+63}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward -Infinity. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrdpim src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to an integer using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VXSNAN VSR Data Layout for xvrdpic src = VSR[XB] DP tgt = VSR[XT] DP
0 64
DP
DP
127
DP
DP
127
478
Power ISA I
VSX Vector Round to Double-Precision Integer using round toward Zero XX2-form
xvrdpiz 60
0 6
XT,XB T
11
(0xF000_03A4) B
16 21
XT,XB T
11
(0xF000_0364) B
16 21
///
233
BX TX
30 31
///
217
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 64 reset_xflags() result{i:i+63} RoundToDPIntegerCeil(VSR[XB]{i:i+63}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
do i=0 to 127 by 64 reset_xflags() result{i:i+63} RoundToDPIntegerTrunc(VSR[XB]{i:i+63}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward +Infinity. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrdpip src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward Zero. The result is placed into doubleword element i of VSR[XT] in double-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrdpiz src = VSR[XB]
DP
DP tgt = VSR[XT]
DP
DP
127 0
DP
64
DP
127
479
Source Value
Result
Exception
XT,XB T
11
///
16
B
21
218
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
QNaN
do i=0 to 127 by 64 reset_xflags() v{0:inf} ReciprocalEstimateDP(VSR[XB]{i:i+63}) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (ZE & zx_flag) end if( ex_flag = 0 ) then VSR[XT] result
If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FX OX UX ZX VXSNAN VSR Data Layout for xvredp src = VSR[XB]
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. A double-precision floating-point estimate of the reciprocal of src is placed into doubleword element i of VSR[XT] in double-precision format. Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,
1 estimate ---------src --------------------------------------------1 ---------src 1 -----------------
DP tgt = VSR[XT] DP
0 64
DP
DP
127
16384
480
Power ISA I
Source Value
Result
Exception
XT,XB T
11
///
16
B
21
154
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
QNaN
do i=0 to 127 by 32 reset_xflags() v{0:inf} ReciprocalEstimateSP(VSR[XB]{i:i+31}) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (ZE & zx_flag) end if( ex_flag = 0 ) then VSR[XT] result
If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FX OX UX ZX VXSNAN VSR Data Layout for xvresp src = VSR[XB]
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. A single-precision floating-point estimate of the reciprocal of src is placed into word element i of VSR[XT] in single-precision format. Unless the reciprocal of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of src. That is,
1 estimate ---------src --------------------------------------------1 ---------src 1 -----------------
SP tgt = VSR[XT] SP
0 32
SP
SP
SP
SP
64
SP
96
SP
127
16384
481
VSX Vector Round to Single-Precision Integer Exact using Current rounding mode XX2-form
xvrspic 60
0 6
XT,XB T
11
(0xF000_0224) B
16 21
XT,XB T
11
(0xF000_02AC) B
16 21
///
137
BX TX
30 31
///
171
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 32 reset_xflags() result{i:i+31} RoundToSPIntegerNearAway(VSR[XB]{i:i+31}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. src is rounded to an integer using the rounding mode Round to Nearest Away. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrspi src = VSR[XB] SP tgt = VSR[XT] SP
0 32
do i=0 to 127 by 32 reset_xflags() src{0:31} VSR[XB]{i:i+31} if(RN=0b00) then result{i:i+31} RoundToSPIntegerNearEven(src) if(RN=0b01) then result{i:i+31} RoundToSPIntegerTrunc(src) if(RN=0b10) then result{i:i+31} RoundToSPIntegerCeil(src) if(RN=0b11) then result{i:i+31} RoundToSPIntegerFloor(src) if(vxsnan_flag) then SetFX(VXSNAN) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. src is rounded to an integer value using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR. The result is placed into word element i of VSR[XT] in single-precision format.
SP
SP
SP If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT].
SP
64
SP
96
SP
127
Special Registers Altered: FX XX VXSNAN VSR Data Layout for xvrspic src = VSR[XB] SP tgt = VSR[XT] SP
0 32
SP
SP
SP
SP
64
SP
96
SP
127
482
Power ISA I
VSX Vector Round to Single-Precision Integer using round toward +Infinity XX2-form
xvrspip 60
0 6
XT,XB T
11
(0xF000_02E4) B
16 21
XT,XB T
11
(0xF000_02A4) B
16 21
///
185
BX TX
30 31
///
169
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 32 reset_xflags() result{i:i+31} = RoundToSPIntegerFloor(VSR[XB]{i:i+31}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
do i=0 to 127 by 32 reset_xflags() result{i:i+31} = RoundToSPIntegerCeil(VSR[XB]{i:i+31}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward -Infinity. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrspim src = VSR[XB] SP tgt = VSR[XT] SP
0 32
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward +Infinity. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrspip src = VSR[XB]
SP
SP
SP
SP tgt = VSR[XT]
SP
SP
SP
SP
64
SP
96
SP
127 0
SP
32
SP
64
SP
96
SP
127
483
XT,XB T
11
(0xF000_0264) B
16 21
///
153
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 32 reset_xflags() result{i:i+31} = RoundToSPIntegerTrunc(VSR[XB]{i:i+31}) if(vxsnan_flag) then SetFX(VXSNAN) ex_flag ex_flag | (VE & vxsnan_flag) end if( ex_flag = 0 ) then VSR[XT] result
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. src is rounded to an integer using the rounding mode Round toward Zero. The result is placed into word element i of VSR[XT] in single-precision format. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX VXSNAN VSR Data Layout for xvrspiz src = VSR[XB] SP tgt = VSR[XT] SP
0 32
SP
SP
SP
SP
64
SP
96
SP
127
484
Power ISA I
Source Value
Result
Exception
XT,XB T
11
(0xF000_0328) B
16 21
///
202
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i0 to 127 by 64 reset_xflags() v{0:inf} RecipSquareRootEstimateDP(VSR[XB]{i:i+63}) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxsqrt_flag) ex_flag ex_flag | (ZE & zx_flag) end if( ex_flag = 0 ) then VSR[XT] result
QNaN
If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FX ZX VXSNAN VXSQRT VSR Data Layout for xvrsqrtedp src = VSR[XB] DP tgt = VSR[XT] DP
0 64
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. A double-precision floating-point estimate of the reciprocal square root of src is placed into doubleword element i of VSR[XT] in double-precision format. Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,
1 estimate -------------src ------------------------------------------------1 --------------src 1 --------------16384
DP
DP
127
485
Source Value
Result
Exception
XT,XB T
11
(0xF000_0228) B
16 21
///
138
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 32 reset_xflags() v{0:inf} RecipSquareRootEstimateSP(VSR[XB]{i:i+31}) result{i:i+31} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(zx_flag) then SetFX(ZX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxsqrt_flag) ex_flag ex_flag | (ZE & zx_flag) end if( ex_flag = 0 ) then VSR[XT] result
QNaN
If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. The results of executing this instruction is permitted to vary between implementations, and between different executions on the same implementation. Special Registers Altered: FX ZX VXSNAN VXSQRT VSR Data Layout for xvrsqrtesp src = VSR[XB] SP tgt = VSR[XT] SP
0 32
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. A single-precision floating-point estimate of the reciprocal square root of src is placed into word element i of VSR[XT] in single-precision format. Unless the reciprocal of the square root of src would be a zero, an infinity, or a QNaN, the estimate has a relative error in precision no greater than one part in 16384 of the reciprocal of the square root of src. That is,
1 estimate -------------src ------------------------------------------------1 --------------src 1 --------------16384
SP
SP
SP
SP
64
SP
96
SP
127
486
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VXSNAN VXSQRT VSR Data Layout for xvsqrtdp src = VSR[XB] DP tgt = VSR[XT] DP DP
64 127
XT,XB T
11
(0xF000_032C) B
16 21
///
203
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i0 to 127 by 64 reset_xflags() v{0:inf} SquareRootDP(VSR[XB]{i:i+63}) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxsqrt_flag) ex_flag ex_flag | (XE & xx_flag end if( ex_flag ) then VSR[XT] result
DP
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. The unbounded-precision square root of src is produced. See Table 95. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
src -Infinity v dQNaN vxsqrt_flag 1 -NZF v dQNaN vxsqrt_flag 1 -Zero v +Zero +Zero v +Zero +NZF v SQRT(src) +Infinity v +Infinity QNaN v src SNaN v Q(src) vxsnan_flag 1
Explanation:
src dQNaN NZF SQRT(x) Q(x) v The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. The unbounded-precision square root of the floating-point value x. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 95.
487
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX XX VXSNAN VXSQRT VSR Data Layout for xvsqrtsp src = VSR[XB] SP tgt = VSR[XT] SP SP
32 64
XT,XB T
11
(0xF000_022C) B
16 21
///
139
BX TX
30 31
XT XB ex_flag
TX || T BX || B 0b0
do i=0 to 127 by 32 reset_xflags() v{0:inf} SquareRootSP(VSR[XB]{i:i+31}) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxsqrt_flag) then SetFX(VXSQRT) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxsqrt_flag) ex_flag ex_flag | (XE & xx_flag end if( ex_flag ) then VSR[XT] result
SP
SP
SP
SP
96
SP
127
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. The unbounded-precision square root of src is produced. See Table 96. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
src -Infinity v dQNaN vxsqrt_flag 1 -NZF v dQNaN vxsqrt_flag 1 -Zero v +Zero +Zero v +Zero +NZF v SQRT(src) +Infinity v +Infinity QNaN v src SNaN v Q(src) vxsnan_flag 1
Explanation:
src dQNaN NZF SQRT(x) Q(x) v The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. The unbounded-precision square root of the floating-point value x. Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 96.
488
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into doubleword element i of VSR[XT] in double-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI VSR Data Layout for xvsubdp src1 = VSR[XA] DP src2 = VSR[XB] DP tgt = VSR[XT] DP
0 64
XT,XA,XB T
11
(0xF000_0340) B 104
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 64 reset_xflags() src1 VSR[XA]{i:i+63} src2 VSR[XB]{i:i+63} v{0:inf} AddDP(src1,NegateDP(src2)) result{i:i+63} RoundToDP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag ) then VSR[XT] result
DP
DP
DP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. src2 is negated and added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 97. The intermediate result is rounded to double-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
489
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v Infinity v src1 v Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd S(x,y) Q(x) v The double-precision floating-point value in doubleword element i of VSR[XA] (where i c {0,1}). The double-precision floating-point value in doubleword element i of VSR[XB] (where i c {0,1}). Default quiet NaN (0x7FF8_0000_0000_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 97.
490
Power ISA I
See Table 46, Floating-Point Intermediate Result Handling, on page 344. The result is placed into word element i of VSR[XT] in single-precision format. See Table 68, Vector Floating-Point Final Result, on page 400. If a trap-enabled exception occurs in any element of the vector, no results are written to VSR[XT]. Special Registers Altered: FX OX UX XX VXSNAN VXISI VSR Data Layout for xvsubsp src1 = VSR[XA] SP src2 = VSR[XB] SP tgt = VSR[XT] SP
0 32
XT,XA,XB T
11
(0xF000_0240) B 72
21
A
16
AXBX TX
29 30 31
XT XA XB ex_flag
TX || T AX || A BX || B 0b0
do i=0 to 127 by 32 reset_xflags() src1 VSR[XA]{i:i+31} src2 VSR[XB]{i:i+31} v{0:inf} AddSP(src1,NegateSP(src2)) result{i:i+31} RoundToSP(RN,v) if(vxsnan_flag) then SetFX(VXSNAN) if(vxisi_flag) then SetFX(VXISI) if(ox_flag) then SetFX(OX) if(ux_flag) then SetFX(UX) if(xx_flag) then SetFX(XX) ex_flag ex_flag | (VE & vxsnan_flag) ex_flag ex_flag | (VE & vxisi_flag) ex_flag ex_flag | (OE & ox_flag) ex_flag ex_flag | (UE & ux_flag) ex_flag ex_flag | (XE & xx_flag) end if( ex_flag ) then VSR[XT] result
SP
SP
SP
SP
SP
SP
SP
64
SP
96
SP
127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. src2 is negated and added1 to src1, producing a sum having unbounded range and precision. The sum is normalized2. See Table 98. The intermediate result is rounded to single-precision using the rounding mode specified by the Floating-Point Rounding Control field RN of the FPSCR.
1. Floating-point addition is based on exponent comparison and addition of the two significands. The exponents of the two operands are compared, and the significand accompanying the smaller exponent is shifted right, with its exponent increased by one for each bit shifted, until the two exponents are equal. The two significands are then added or subtracted as appropriate, depending on the signs of the operands, to form an intermediate sum. All 53 bits of the significand as well as all three guard bits (G, R, and X) enter into the computation. 2. Floating-point normalization is based on shifting the significand left until the most-significant bit is 1 and decrementing the exponent by the number of bits the significand was shifted.
491
src2 -Infinity -Infinity -NZF -Zero src1 +Zero +NZF +Infinity QNaN SNaN v dQNaN vxisi_flag 1 v +Infinity v +Infinity v +Infinity v +Infinity v +Infinity v src1 v Q(src1) vxsnan_flag 1 -NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 -Zero v Infinity v src1 v Zero v Rezd v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Zero v Infinity v src1 v Rezd v +Zero v src1 v +Infinity v src1 v Q(src1) vxsnan_flag 1 +NZF v Infinity v S(src1,src2) v src2 v src2 v S(src1,src2) v +Infinity v src1 v Q(src1) vxsnan_flag 1 +Infinity v Infinity v Infinity v Infinity v Infinity v Infinity v dQNaN vxisi_flag 1 v src1 v Q(src1) vxsnan_flag 1 QNaN v src2 v src2 v src2 v src2 v src2 v src2 v src1 v Q(src1) vxsnan_flag 1 SNaN v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v Q(src2) vxsnan_flag 1 v src1 vxsnan_flag 1 v Q(src1) vxsnan_flag 1
Explanation:
src1 src2 dQNaN NZF Rezd S(x,y) Q(x) v The single-precision floating-point value in word element i of VSR[XA] (where i c {0,1,2,3}). The single-precision floating-point value in word element i of VSR[XB] (where i c {0,1,2,3}). Default quiet NaN (0x7FC0_0000). Nonzero finite number. Exact-zero-difference result (addition of two finite numbers having same magnitude but different signs). Return the normalized sum of floating-point value x and negated floating-point value y, having unbounded range and precision. Note: If x = -y, v is considered to be an exact-zero-difference result (Rezd). Return a QNaN with the payload of x. The intermediate result having unbounded signficand precision and unbounded exponent range.
Table 98.
492
Power ISA I
fg_flag is set to 1 for any of the following conditions. src1 is an infinity. src2 is a zero, an infinity, or a denormalized value. CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xvtdivdp src1 = VSR[XA] DP src2 = VSR[XB] DP
0 64
BF,XA,XB BF
(0xF000_03E8) B 125
21
//
9 11
A
16
AXBX /
29 30 31
the
value
XA XB eq_flag gt_flag
AX || A BX || B 0b0 0b0 by 64 VSR[XA]{i:i+63} VSR[XB]{i:i+63} src1{1:11} - 1023 src2{1:11} - 1023 fe_flag | IsNaN(src1) | IsInf(src1) | IsNaN(src2) | IsInf(src2) | IsZero(src2) | ( e_b <= -1022 ) | ( e_b >= 1021 ) | ( !IsZero(src1) & ( (e_a - e_b) >= 1023 ) ) | ( !IsZero(src1) & ( (e_a - e_b) <= -1021 ) ) | ( !IsZero(src1) & ( e_a <= -970 ) ) fg_flag | IsInf(src1) | IsInf(src2) | IsZero(src2) | IsDen(src2)
DP
DP
127
fg_flag end
Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. fe_flag is initialized to 0. fg_flag is initialized to 0. For each vector element i from 0 to 1, do the following. Let src1 be the double-precision floating-point operand in doubleword element i of VSR[XA]. Let src2 be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let e_a be the unbiased exponent of src1. Let e_b be the unbiased exponent of src2. fe_flag is set to 1 for any of the following conditions. src1 is a NaN or an infinity. src2 is a zero, a NaN, or an infinity. e_b is less than or equal to -1022. e_b is greater than or equal to 1021. src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 1023. src1 is not a zero and the difference, e_a - e_b, is less than or equal to -1021. src1 is not a zero and e_a is less than or equal to -970
493
fg_flag is set to 1 for any of the following conditions. src1 is an infinity. src2 is a zero, an infinity, or a denormalized value. CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xvtdivsp src1 = VSR[XA] SP src2 = VSR[XB] SP
0 32
BF,XA,XB BF
(0xF000_02E8) B 93
21
//
9 11
A
16
AXBX /
29 30 31
XA XB eq_flag gt_flag
AX || A BX || B 0b0 0b0 by 32 VSR[XA]{i:i+31} VSR[XB]{i:i+31} src1{1:8} - 127 src2{1:8} - 127 fe_flag | IsNaN(src1) | IsInf(src1) | IsNaN(src2) | IsInf(src2) | IsZero(src2) | ( e_b <= -126 ) | ( e_b >= 125 ) | ( !IsZero(src1) & ( (e_a - e_b) >= 127 ) ) | ( !IsZero(src1) & ( (e_a - e_b) <= -125 ) ) | ( !IsZero(src1) & ( e_a <= -103 ) ) fg_flag | IsInf(src1) | IsInf(src2) | IsZero(src2) | IsDen(src2)
the
value
SP
SP
SP
SP
64
SP
96
SP
127
fg_flag end
Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. fe_flag is initialized to 0. fg_flag is initialized to 0. For each vector element i from 0 to 3, do the following. Let src1 be the single-precision floating-point operand in word element i of VSR[XA]. Let src2 be the single-precision floating-point operand in word element i of VSR[XB]. Let e_a be the unbiased exponent of src1. Let e_b be the unbiased exponent of src2. fe_flag is set to 1 for any of the following conditions. src1 is a NaN or an infinity. src2 is a zero, a NaN, or an infinity. e_b is less than or equal to -126. e_b is greater than or equal to 125. src1 is not a zero and the difference, e_a - e_b, is greater than or equal to 127. src1 is not a zero and the difference, e_a - e_b, is less than or equal to -125. src1 is not a zero and e_a is less than or equal to -103.
494
Power ISA I
BF,XB BF //
9 11
(0xF000_03A8) B
16 21
BF,XB BF //
9 11
(0xF000_02A8) B
16 21
///
234
BX /
30 31
///
170
BX /
30 31
XB BX || B fe_flag 0b0 fg_flag 0b0 do i=0 to 127 src e_b fe_flag fg_flag end fl_flag xvrsqrtedp_error() <= 2-14 CR[BF] 0b1 || fg_flag || fe_flag || 0b0 by 64 VSR[XB]{i:i+63} src2{1:11} - 1023 fe_flag | IsNaN(src) | IsInf(src) | IsZero(src) | IsNeg(src) | ( e_a <= -970 ) fg_flag | IsInf(src) | IsZero(src) | IsDen(src)
XB BX || B fe_flag 0b0 fg_flag 0b0 do i=0 to 127 by 32 src VSR[XB]{i:i+31} e_b src2{1:8} - 127 fe_flag fe_flag | IsNaN(src) | IsInf(src) | IsZero(src) | IsNeg(src) | ( e_a <= -103 ) fg_flag fg_flag | IsInf(src) | IsZero(src) | IsDen(src) end fl_flag = xvrsqrtesp_error() <= 2-14 CR[BF] = 0b1 || fg_flag || fe_flag || 0b0
Let XB be the value BX concatenated with B. fe_flag is initialized to 0. fg_flag is initialized to 0. For each vector element i from 0 to 1, do the following. Let src be the double-precision floating-point operand in doubleword element i of VSR[XB]. Let e_b be the unbiased exponent of src. fe_flag is set to 1 for any of the following conditions. src is a zero, a NaN, an infinity, or a negative value. e_b is less than or equal to -970. fg_flag is set to 1 for the following condition. src is a zero, an infinity, or a denormalized value. CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xvtsqrtdp src = VSR[XB] DP
0 64
Let XB be the value BX concatenated with B. fe_flag is initialized to 0. fg_flag is initialized to 0. For each vector element i from 0 to 3, do the following. Let src be the single-precision floating-point operand in word element i of VSR[XB]. Let e_b be the unbiased exponent of src. fe_flag is set to 1 for any of the following conditions. src is a zero, a NaN, an infinity, or a negative value. e_b is less than or equal to -103. fg_flag is set to 1 for the following condition. src is a zero, an infinity, or a denormalized value. CR field BF is set to 0b1 || fg_flag || fe_flag || 0b0. Special Registers Altered: CR[BF] VSR Data Layout for xvtsqrtsp src = VSR[XB] the value
the
value
DP
127 0
SP
32
SP
64
SP
96
SP
127
495
XT,XA,XB T
11
XT,XA,XB T
11
(0xF000_0450) B 138
21
A
16
B
21
130
AXBX TX
29 30 31
A
16
AXBX TX
29 30 31
XT XA XB VSR[XT]
XT XA XB VSR[XT]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of VSR[XA] are ANDed with the contents of VSR[XB] and the result is placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxland src1 = VSR[XA]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of VSR[XA] are ANDed with the complement of the contents of VSR[XB] and the result is placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxland src1 = VSR[XA]
496
Power ISA I
XT,XA,XB T
11
XT,XA,XB T
11
(0xF000_0490) B 146
21
A
16
B
21
162
AXBX TX
29 30 31
A
16
AXBX TX
29 30 31
XT XA XB VSR[XT]
TX AX BX ~(
|| T || A || B VSR[XA] | VSR[XB] )
XT XA XB VSR[XT]
TX || T AX || A BX || B VSR[XA] | VSR[XB]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of VSR[XA] are ORed with the contents of VSR[XB] and the complemented result is placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxlnor src1 = VSR[XA]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of VSR[XA] are ORed with the contents of VSR[XB] and the result is placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxlor src1 = VSR[XA]
497
XT,XA,XB T
11
(0xF000_04D0) B 154
21
A
16
AXBX TX
29 30 31
XT XA XB VSR[XT]
TX || T AX || A BX || B VSR[XA] ^ VSR[XB]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of VSR[XA] are XORed with the contents of VSR[XB] and the result is placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxlxor src1 = VSR[XA]
src2 = VSR[XB]
tgt = VSR[XT]
127
498
Power ISA I
XT,XA,XB T
11
(0xF000_0090) B 18
21
XT,XA,XB T
11
(0xF000_0190) B 50
21
A
16
AXBX TX
29 30 31
A
16
AXBX TX
29 30 31
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of word element 0 of VSR[XA] are placed into word element 0 of VSR[XT]. The contents of word element 0 of VSR[XB] are placed into word element 1 of VSR[XT]. The contents of word element 1 of VSR[XA] are placed into word element 2 of VSR[XT]. The contents of word element 1 of VSR[XB] are placed into word element 3 of VSR[XT]. Special Registers Altered: None VSR Data Layout for xxmrghw src1 = VSR[XA] SP/SW/UW/MW SP/SW/UW/MW src2 = VSR[XB] SP/SW/UW/MW SP/SW/UW/MW tgt = VSR[XT] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW
0 32 64 96 127
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. The contents of word element 2 of VSR[XA] are placed into word element 0 of VSR[XT]. The contents of word element 2 of VSR[XB] are placed into word element 1 of VSR[XT]. The contents of word element 3 of VSR[XA] are placed into word element 2 of VSR[XT]. The contents of word element 3 of VSR[XB] are placed into word element 3 of VSR[XT]. Special Registers Altered: None VSR Data Layout for xxmrglw src1 = VSR[XA]
unused
unused
unused
SP/SW/UW/MW SP/SW/UW/MW
unused
unused
unused
SP/SW/UW/MW SP/SW/UW/MW
499
VSX Select
xxsel XT,XA,XB,XC T
6 11
XX4-form
(0xF000_0030) C
21
XT,XA,XB,DM T
11
(0xF000_0050) 60 A
16
3 CXAXBX TX
26 28 29 30 31
A
16
0 DM
21 22 24
10
AXBX TX
29 30 31
XT TX || T XA AX || A XB BX || B if(DM=0b00) if(DM=0b01) if(DM=0b10) if(DM=0b11) then then then then VSR[XT] VSR[XT] VSR[XT] VSR[XT] VSR[XA]{0:63} VSR[XA]{0:63} VSR[XA]{64:127} VSR[XA]{64:127} || || || || VSR[XB]{0:63} VSR[XB]{64:127} VSR[XB]{0:63} VSR[XB]{64:127}
XT XA XB XC
TX AX BX CX
|| || || ||
T A B C
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. If DM0=0, the contents of doubleword element 0 of VSR[XA] are placed into doubleword element 0 of VSR[XT]. Otherwise the contents of doubleword element 1 of VSR[XA] are placed into doubleword element 0 of VSR[XT]. If DM1=0, the contents of doubleword element 0 of VSR[XB] are placed into doubleword element 1 of VSR[XT]. Otherwise the contents of doubleword element 1 of VSR[XB] are placed into doubleword element 1 of VSR[XT]. Special Registers Altered: None
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let XC be the value CX concatenated with C. For each bit of VSR[XC] that contains the value 0, corresponding bit of VSR[XA] is placed into corresponding bit of VSR[XT]. Otherwise, corresponding bit of VSR[XB] is placed into corresponding bit of VSR[XT]. Special Registers Altered: None VSR Data Layout for xxsel src1 = VSR[XA] the the the the
src2 = VSR[XB] Extended Mnemonic xxspltd xxspltd xxmrghd xxmrgld xxswapd T,A,0 T,A,1 T,A,B T,A,B T,A Equivalent To xxpermdi xxpermdi xxpermdi xxpermdi xxpermdi T,A,A,0b00 T,A,A,0b11 T,A,B,0b00 T,A,B,0b11 T,A,A,0b10
0 127
src3 = VSR[XC]
tgt = VSR[XT]
VSR Data Layout for xxpermdi src1 = VSR[XA] DP/SD/UD/MD src2 = VSR[XB] DP/SD/UD/MD tgt = VSR[XT] DP/SD/UD/MD
0 64
DP/SD/UD/MD
DP/SD/UD/MD
DP/SD/UD/MD
127
500
Power ISA I
(0xF000_0290) B 164
21
XT,XA,XB,SHW T
11
(0xF000_0010) 60 / / / UIM
11 14 16
BX TX
30 31
A
16
0 SHW
21 22 24
AXBX TX
29 30 31
XT XA XB source{0:255} VSR[XT]
Let XT be the value TX concatenated with T. Let XA be the value AX concatenated with A. Let XB be the value BX concatenated with B. Let the source vector be the concatenation of the contents of VSR[XA] followed by the contents of VSR[XB]. Words SHW:SHW+3 of the source vector are placed into VSR[XT]. Special Registers Altered: None VSR Data Layout for xxsldwi src1 = VSR[XA] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW src2 = VSR[XB] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW tgt = VSR[XT] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW
0 32 64 96 127
Let XT be the value TX concatenated with T. Let XB be the value BX concatenated with B. The contents of word element UIM of VSR[XB] are replicated in each word element of VSR[XT]. Special Registers Altered: None VSR Data Layout for xxspltw src = VSR[XB] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW tgt = VSR[XT] SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW SP/SW/UW/MW
0 32 64 96 127
501
502
Power ISA I
8.1 Overview
The Signal Processing Engine (SPE) accelerates signal processing applications normally suited to DSP operation. This is accomplished using short vectors (two element) within 64-bit GPRs and using single instruction multiple data (SIMD) operations to perform the requisite computations. SPE also architects an Accumulator register to allow for back to back operations without loop unrolling.
The RTL conventions in described below are used in addition to those described in Section 1.3:Additional RTL functions are described in Appendix D. Notation
sf
Meaning Signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length n taking the least significant 2n-1 bits of the sign extended product and concatenating a 0 to the least significant bit forming a signed fractional result of 2n bits. Two 16-bit signed fractional quantities, a and b are multiplied, as shown below: ea0:31 = EXTS(a) eb0:31 = EXTS(b) prod0:63 = ea X eb eprod0:63 = EXTS(prod32:63) result0:31 = eprod33:63 || 0b0 Guarded signed fractional multiplication. Result of multiplying 2 signed fractional quantities having bit length 16 taking the least significant 31 bits of the sign extended product and concatenating a 0 to the least significant bit forming a guarded signed fractional result of 64 bits. Since guarded signed fractional multiplication produces a 64-bit result, fractional input quantities of -1 and -1 can produce +1 in the intermediate product. Two 16-bit fractional quantities, a and b are multiplied, as shown below:
503
<< >>
Figure 124.GPR
Figure 125.Accumulator
8.3.4 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
Status and control for SPE uses the SPEFSCR register. This register is also used by the SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, and SPE.Embedded Float Vector categories. Status and control bits are shared with these categories. The SPEFSCR register is implemented as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions. SPE instructions affect both the high element (bits 32:33) and low element status flags (bits 48:49) of the SPEFSCR. SPEFSCR
32 63
Figure 126. Signal Processing and Embedded Floating-Point Status and Control Register The SPEFSCR bits are defined as shown below. Bit 32 Description Summary Integer Overflow High (SOVH) SOVH is set to 1 when an SPE instruction sets OVH. This is a sticky bit.
504
Power ISA I
34
44
45
505
54
47 48
49
56 57
50
The Embedded Floating-Point Round interrupt is taken if the exception is enabled and if FG | FGH | FX | FXH (signifying an inexact result) is set to 1 as a result of an Embedded Floating-Point instruction. If an Embedded Floating-Point instruction results in overflow or underflow and the corresponding Embedded Floating-Point Underflow or Embedded Floating-Point Overflow exception is disabled then the Embedded Floating-Point Round interrupt is taken. 58 Embedded Floating-Point Invalid Operation/Input Error Exception Enable (FINVE) [Categories: SP.FV, SP.FD, SP.FS] 0 1 Exception disabled Exception enabled
51
52
If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FINV or FINVH bit is set to 1 by an Embedded Floating-Point instruction. 59 Embedded Floating-Point Divide By Zero Exception Enable (FDBZE) [Categories: SP.FV, SP.FD, SP.FS] 0 Exception disabled
506
Power ISA I
If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FDBZ or FDBZH bit is set to 1 by an Embedded Floating-Point instruction. 60 Embedded Floating-Point Underflow Exception Enable (FUNFE) [Categories: SP.FV, SP.FD, SP.FS] 0 1 Exception disabled Exception enabled
If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FUNF or FUNFH bit is set to 1 by an Embedded Floating-Point instruction. 61 Embedded Floating-Point Overflow Exception Enable (FOVFE) [Categories: SP.FV, SP.FD, SP.FS] 0 1 Exception disabled Exception enabled
If the exception is enabled, an Embedded Floating-Point Data interrupt is taken if the FOVF or FOVFH bit is set to 1 by an Embedded Floating-Point instruction. 62:63 Embedded Floating-Point Rounding Mode Control (FRMC) [Categories: SP.FV, SP.FD, SP.FS] 00 01 10 11 Round to Nearest Round toward Zero Round toward +Infinity Round toward -Infinity Programming Note Rounding modes 0b10 (+Infinity) and 0b11 (-Infinity) may not be supported by some implementations. If an implementation does not support these, Embedded Floating-Point Round interrupts are generated for every Embedded Floating-Point instruction for which rounding is required when +Infinity or -Infinity modes are set and software is required to produce the correctly rounded result
507
Multiply and Accumulate instructions. These instructions perform multiply operations, optionally add the result to the accumulator, and place the result into the destination register and optionally into the accumulator. These instructions are composed of different multiply forms, data formats and data accumulate options. The mnemonics for these instructions indicate their various characteristics. These are shown in Table 99. Load and Store instructions. These instructions provide load and store capabilities for moving data to and from memory. A variety of forms are provided that position data for efficient computation. Compare and miscellaneous instructions. These instructions perform miscellaneous functions such as field manipulation, bit reversed incrementing, and vector compares.
Table 99:Mnemonic Extensions for Multiply Accumulate Instructions Extension he heg ho hog w wh wl smf smi ssf ssi umi usi a aa aaw an anw halfword even halfword even guarded halfword odd halfword odd guarded word word high word low signed modulo fractional signed modulo integer signed saturate fractional signed saturate integer unsigned modulo integer unsigned saturate integer place in accumulator add to accumulator Meaning Multiply Form 16 X 16 32 16 X 16 32, 64-bit final accumulate result 16 X 16 32 16 X 16 32, 64-bit final accumulate result 32 X 32 64 32 X 32 32 (high-order 32 bits of product) 32 X 32 32 (low-order 32 bits of product) Data Format modulo, no saturation or overflow modulo, no saturation or overflow saturation on product and accumulate saturation on product and accumulate modulo, no saturation or overflow saturation on product and accumulate Accumulate Option result accumulator accumulator + result accumulator Comments
add to accumulator as word elements accumulator0:31 + result0:31 accumulator0:31 accumulator32:63 + result32:63 accumulator32:63 add negated to accumulator add negated to accumulator as word elements accumulator - result accumulator accumulator0:31 - result0:31 accumulator0:31 accumulator32:63 - result32:63 accumulator32:63
508
Power ISA I
8.3.7 SPE Instructions 8.3.8 Saturation, Shift, and Bit Reverse Models
For saturation, left shifts, and bit reversal, the pseudo RTL is provided here to more accurately describe those functions that are referenced in the instruction pseudo RTL.
8.3.8.1 Saturation
SATURATE(ov, carry, sat_ovn, sat_ov, val) if ov then if carry then return sat_ovn else return sat_ov else return val
509
EVX-form
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
527
31 0
RA
16
///
21
520
31
n implementation-dependent number of mask bits mask (RB)64-n:63 (RA)64-n:63 a d BITREVERSE(1 + BITREVERSE(a | ( mask))) RT (RA)0:63-n || (d & mask) brinc computes a bit-reverse index based on the contents of RA and a mask specified in RB. The new index is written to RT. The number of bits in the mask is implementation-dependent but may not exceed 32. Special Registers Altered: None Programming Note brinc provides a way for software to access FFT data in a bit-reversed manner. RA contains the index into a buffer that contains data on which FFT is to be performed. RB contains a mask that allows the index to be updated with bit-reversed addressing. Typically this instruction precedes a load with index instruction; for example, brinc r2, r3, r4 lhax r8, r5, r2 RB contains a bit-mask that is based on the number of points in an FFT. To access a buffer containing n byte sized data that is to be accessed with bit-reversed addressing, the mask has log2n 1s in the least significant bit positions and 0s in the remaining most significant bit positions. If, however, the data size is a multiple of a halfword or a word, the mask is constructed so that the 1s are shifted left by log2 (size of the data) and 0s are placed in the least significant bit positions. Programming Note Architecture Note This instruction only modifies the lower 32 bits of the destination register in 32-bit implementations. For 64-bit implementations in 32-bit mode, the contents of the upper 32-bits of the destination register are undefined. Programming Note Execution of brinc does not cause SPE Unavailable exceptions regardless of MSRSPV.
RT0:31 ABS((RA)0:31) RT32:63 ABS((RA)32:63) The absolute value of each element of RA is placed in the corresponding elements of RT. An absolute value of 0x8000_0000 (most negative number) returns 0x8000_0000. Special Registers Altered: None
EVX-form
RT,RB,UI RT
11
UI
16
RB
21
514
31
RT0:31 (RB)0:31 + EXTZ(UI) RT32:63 (RB)32:63 + EXTZ(UI) UI is zero-extended and added to both the high and low elements of RB and the results are placed in RT. Note that the same value is added to both elements of the register. Special Registers Altered: None
RT
11
RA
16
///
21
1225
31
RT0:31 (ACC)0:31 + (RA)0:31 RT32:63 (ACC)32:63 + (RA)32:63 ACC0:63 (RT)0:63 Each word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and into the accumulator. Special Registers Altered: ACC
510
Power ISA I
RT
11
RA
16
///
21
1217
31 0
4
6
RT
11
RA
16
///
21
1216
31
temp0:63 EXTS((ACC)0:31) + EXTS((RA)0:31) ovh temp31 temp32 RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:63 EXTS((ACC)32:63) + EXTS((RA)32:63) ovl temp31 temp32 RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl Each signed-integer word element in RA is sign-extended and added to the corresponding sign-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:63 EXTZ((ACC)0:31) + EXTZ((RA)0:31) ovh temp31 RT0:31 SATURATE(ovh, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) temp0:63 EXTZ((ACC)32:63) + EXTZ((RA)32:63) ovl temp31 RT32:63 SATURATE(ovl, temp31, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) ACC0:63 (RT)0:63
SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl Each unsigned-integer word element in RA is zero-extended and added to the corresponding zero-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
EVX-form
RA
16
RB
21
512
31
4
0 6
RT
11
RA
16
///
21
1224
31
RT0:31 (RA)0:31 + (RB)0:31 RT32:63 (RA)32:63 + (RB)32:63 RT0:31 (ACC)0:31 + (RA)0:31 RT32:63 (ACC)32:63 + (RA)32:63 ACC0:63 (RT)0:63 Each unsigned-integer word element in RA is added to the corresponding element in the accumulator and the results are placed in RT and the accumulator. Special Registers Altered: ACC The corresponding elements of RA and RB are added and the results are placed in RT. The sum is a modulo sum. Special Registers Altered: None
511
EVX-form
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
529
31 0
RA
16
RB
21
530
31
RT0:31 (RA)0:31 & (RB)0:31 RT32:63 (RA)32:63 & (RB)32:63 The corresponding elements of RA and RB are ANDed bitwise and the results are placed in the corresponding element of RT. Special Registers Altered: None
RT0:31 (RA)0:31 & ((RB)0:31) RT32:63 (RA)32:63 & ((RB)32:63) The word elements of RA are ANDed bitwise with the complement of the corresponding elements of RB. The results are placed in the corresponding element of RT. Special Registers Altered: None
EVX-form
BF,RA,RB BF //
9 11
RA
16
RB
21
564
31 0
RA
16
RB
21
561
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah = bh) then ch 1 else ch 0 if (al = bl) then cl 1 else cl 0 CR4BF+32:4BF+35 ch || cl || (ch | cl) || (ch & cl) The most significant bit in BF is set if the high-order element of RA is equal to the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is equal to the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements. Special Registers Altered: CR field BF
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah > bh) then ch 1 else ch 0 if (al > bl) then cl 1 else cl 0 CR4BF+32:4BF+35 ch || cl || (ch | cl) || (ch & cl) The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements. Special Registers Altered: CR field BF
512
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
560
31 0
RA
16
RB
21
563
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah >u bh) then ch 1 else ch 0 if (al >u bl) then cl 1 else cl 0 CR4BF+32:4BF+35 ch || cl || (ch | cl) || (ch & cl) The most significant bit in BF is set if the high-order element of RA is greater than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is greater than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements. Special Registers Altered: CR field BF
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah < bh) then ch 1 else ch 0 if (al < bl) then cl 1 else cl 0 CR4BF+32:4BF+35 ch || cl || (ch | cl) || (ch & cl) The most significant bit in BF is set if the high-order element of RA is less than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is less than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements. Special Registers Altered: CR field BF
BF,RA,RB BF //
9 11
RA
16
RB
21
562
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah <u bh) then ch 1 else ch 0 if (al <u bl) then cl 1 else cl 0 CR4BF+32:4BF+35 ch || cl || (ch | cl) || (ch & cl) The most significant bit in BF is set if the high-order element of RA is less than the high-order element of RB; it is cleared otherwise. The next bit in BF is set if the low-order element of RA is less than the low-order element of RB and cleared otherwise. The last two bits of BF are set to the OR and AND of the result of the compare of the high and low elements. Special Registers Altered: CR field BF
513
EVX-form
RT,RA 4 RT
11
RA
16
RB
21
1222
31
RA
16
///
21
526
31
n 0 (RA)n s do while n < 32 if (RA)n s then leave n + 1 n n RT0:31 n 0 (RA)n+32 s do while n < 32 if (RA)n+32 s then leave n + 1 n n RT32:63 The leading sign bits in each element of RA are counted, and the respective count is placed into each element of RT. Special Registers Altered: None Programming Note evcntlzw is used for unsigned operands; evcntlsw is used for signed operands.
ddh (RA)0:31 ddl (RA)32:63 dvh (RB)0:31 dvl (RB)32:63 RT0:31 ddh dvh RT32:63 ddl dvl ovh 0 ovl 0 if ((ddh < 0) & (dvh = 0)) then RT0:31 0x8000_0000 ovh 1 else if ((ddh >= 0) & (dvh = 0)) RT0:31 0x7FFFFFFF ovh 1 else if (ddh = 0x8000_0000)&(dvh then RT0:31 0x7FFFFFFF ovh 1 if ((ddl < 0) & (dvl = 0)) then RT32:63 0x8000_0000 ovl 1 else if ((ddl >= 0) & (dvl = 0)) RT32:63 0x7FFFFFFF ovl 1 else if (ddl = 0x8000_0000)&(dvl then RT32:63 0x7FFFFFFF ovl 1 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl
then = 0xFFFF_FFFF)
then = 0xFFFF_FFFF)
RT,RA RT
11
RA
16
///
21
525
31
n 0 do while n < 32 if (RA)n = 1 then leave n + 1 n RT0:31 n 0 n do while n < 32 if (RA)n+32 = 1 then leave n + 1 n n RT32:63 The leading zero bits in each element of RA are counted, and the respective count is placed into each element of RT. Special Registers Altered: None
The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. The resulting two 32-bit quotients on each element are placed into RT. The remainders are not supplied. The operands and quotients are interpreted as signed integers. Special Registers Altered: OV OVH SOV SOVH Programming Note Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.
514
Power ISA I
EVX-form
Vector Equivalent
eveqv RT,RA,RB RT
6 11
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
1223
31 0
RA
16
RB
21
537
31
ddh (RA)0:31 ddl (RA)32:63 dvh (RB)0:31 dvl (RB)32:63 RT0:31 ddh dvh ddl dvl RT32:63 0 ovh ovl 0 if (dvh = 0) then 0xFFFFFFFF RT0:31 ovh 1 if (dvl = 0) then 0xFFFFFFFF RT32:63 ovl 1 ovh SPEFSCROVH ovl SPEFSCROV SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV | ovl SPEFSCRSOV The two dividends are the two elements of the contents of RA. The two divisors are the two elements of the contents of RB. Two 32-bit quotients are formed as a result of the division on each of the high and low elements and the quotients are placed into RT. Remainders are not supplied. Operands and quotients are interpreted as unsigned integers. Special Registers Altered: OV OVH SOV SOVH Programming Note Note that any overflow indication is always set as a side effect of this instruction. No form is defined that disables the setting of the overflow bits. In case of overflow, a saturated value is delivered into the destination register.
RT0:31 RT32:63
The corresponding elements of RA and RB are XORed bitwise, and the complemented results are placed in RT. Special Registers Altered: None
EVX-form
RT,RA RT
11
RA
16
///
21
522
31
RT0:31 RT32:63
EXTS((RA)24:31) EXTS((RA)56:63)
The signs of the low-order byte in each of the elements in RA are extended, and the results are placed in RT. Special Registers Altered: None
EVX-form
RT,RA RT
11
RA
16
///
21
523
31
RT0:31 RT32:63
EXTS((RA)16:31) EXTS((RA)48:63)
The signs of the odd halfwords in each of the elements in RA are extended, and the results are placed in RT. Special Registers Altered: None
515
RT,D(RA) RT
11
RA
16
UI
21
769
31 0
RA
16
RB
21
768
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI8) EA RT MEM(EA, 8) D in the instruction mnemonic is UI 8. The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
RT,D(RA) RT
11
RA
16
UI
21
773
31 0
RA
16
RB
21
772
31
if (RA = 0) then b 0 (RA) else b EA b + EXTZ(UI8) MEM(EA, 2) RT0:15 MEM(EA+2,2) RT16:31 RT32:47 MEM(EA+4,2) MEM(EA+6,2) RT48:63 D in the instruction mnemonic is UI 8. The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b EA b + (RB) MEM(EA, 2) RT0:15 MEM(EA+2,2) RT16:31 RT32:47 MEM(EA+4,2) MEM(EA+6,2) RT48:63 The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
516
Power ISA I
RT,D(RA) RT
11
RA
16
UI
21
771
31 0
RA
16
RB
21
770
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI8) EA RT0:31 MEM(EA, 4) MEM(EA+4, 4) RT32:63 D in the instruction mnemonic is UI 8. The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b b + (RB) EA RT0:31 MEM(EA, 4) MEM(EA+4, 4) RT32:63 The doubleword addressed by EA is loaded from memory and placed in RT. Special Registers Altered: None
Vector Load Halfword into Halfwords Even and Splat Indexed EVX-form
evlhhesplatx RT,RA,RB
RT
11
RA
16
UI
21
777
31 0
4
6
RT
11
RA
16
RB
21
776
31
if (RA = 0) then b 0 else b (RA) b + EXTZ(UI2) EA MEM(EA,2) RT0:15 RT16:31 0x0000 MEM(EA,2) RT32:47 0x0000 RT48:63 D in the instruction mnemonic is UI 2. The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0. Special Registers Altered: None
if (RA = 0) then b else b (RA) b + (RB) EA MEM(EA,2) RT0:15 RT16:31 0x0000 MEM(EA,2) RT32:47 0x0000 RT48:63
The halfword addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0. Special Registers Altered: None
517
Vector Load Halfword into Halfword Odd Signed and Splat Indexed EVX-form
evlhhossplatx RT,RA,RB
RT
11
RA
16
UI
21
783
31 0
4
6
RT
11
RA
16
RB
21
782
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI2) EA RT0:31 EXTS(MEM(EA,2)) EXTS(MEM(EA,2)) RT32:63 D in the instruction mnemonic is UI 2. The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b b + (RB) EA RT0:31 EXTS(MEM(EA,2)) EXTS(MEM(EA,2)) RT32:63 The halfword addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT. Special Registers Altered: None
Vector Load Halfword into Halfword Odd Unsigned and Splat EVX-form
evlhhousplat RT,D(RA) 4
0 6
Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed EVX-form
evlhhousplatx RT,RA,RB
RT
11
RA
16
UI
21
781
31 0
4
6
RT
11
RA
16
RB
21
780
31
if (RA = 0) then b 0 else b (RA) b + EXTZ(UI2) EA EXTZ(MEM(EA,2)) RT0:31 RT32:63 EXTZ(MEM(EA,2)) D in the instruction mnemonic is UI 2. The halfword addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) b + (RB) EA EXTZ(MEM(EA,2)) RT0:31 RT32:63 EXTZ(MEM(EA,2)) The halfword addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT. Special Registers Altered: None
518
Power ISA I
RT,D(RA) RT
11
RA
16
UI
21
785
31 0
RA
16
RB
21
784
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI4) EA RT0:15 MEM(EA,2) 0x0000 RT16:31 MEM(EA+2,2) RT32:47 RT48:63 0x0000 D in the instruction mnemonic is UI 4. The word addressed by EA is loaded from memory and placed in the even halfwords of each element of RT. The odd halfwords of each element of RT are set to 0. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b b + (RB) EA RT0:15 MEM(EA,2) 0x0000 RT16:31 MEM(EA+2,2) RT32:47 RT48:63 0x0000 The word addressed by EA is loaded from memory and placed in the even halfwords in each element of RT. The odd halfwords of each element of RT are set to 0. Special Registers Altered: None
Vector Load Word into Two Halfwords Odd Signed (with sign extension) EVX-form
evlwhos 4
0 6
Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) EVX-form
evlwhosx RT,RA,RB RT
6 11
RT,D(RA) RT
11
RA
16
UI
21
791
31 0
RA
16
RB
21
790
31
if (RA = 0) then b 0 else b (RA) b + EXTZ(UI4) EA EXTS(MEM(EA,2)) RT0:31 RT32:63 EXTS(MEM(EA+2,2)) D in the instruction mnemonic is UI 4. The word addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) b + (RB) EA EXTS(MEM(EA,2)) RT0:31 RT32:63 EXTS(MEM(EA+2,2)) The word addressed by EA is loaded from memory and placed in the odd halfwords sign extended in each element of RT. Special Registers Altered: None
519
Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) EVX-form
evlwhoux RT,RA,RB RT
6 11
RT,D(RA) RT
11
RA
16
UI
21
789
31 0
RA
16
RB
21
788
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI4) EA RT0:31 EXTZ(MEM(EA,2)) EXTZ(MEM(EA+2,2)) RT32:63 D in the instruction mnemonic is UI 4. The word addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b EA b + (RB) EXTZ(MEM(EA,2)) RT0:31 EXTZ(MEM(EA+2,2)) RT32:63 The word addressed by EA is loaded from memory and placed in the odd halfwords zero-extended in each element of RT. Special Registers Altered: None
Vector Load Word into Two Halfwords and Splat Indexed EVX-form
evlwhsplatx RT,RA,RB
RT,D(RA) RT
11
RA
16
UI
21
797
31 0
4
6
RT
11
RA
16
RB
21
796
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI4) EA RT0:15 MEM(EA,2) MEM(EA,2) RT16:31 MEM(EA+2,2) RT32:47 RT48:63 MEM(EA+2,2) D in the instruction mnemonic is UI 4. The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT. Special Registers Altered: None
if (RA = 0) then b 0 (RA) else b EA b + (RB) MEM(EA,2) RT0:15 MEM(EA,2) RT16:31 RT32:47 MEM(EA+2,2) MEM(EA+2,2) RT48:63 The word addressed by EA is loaded from memory and placed in both the even and odd halfwords in each element of RT. Special Registers Altered: None
520
Power ISA I
RT
11
RA
16
UI
21
793
31 0
4
6
RT
11
RA
16
RB
21
792
31
if (RA = 0) then b 0 (RA) else b b + EXTZ(UI4) EA RT0:31 MEM(EA,4) MEM(EA,4) RT32:63 D in the instruction mnemonic is UI 4. The word addressed by EA is loaded from memory and placed in both elements of RT. Special Registers Altered: None
The word addressed by EA is loaded from memory and placed in both elements of RT. Special Registers Altered: None
EVX-form
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
556
31 0
RA
16
RB
21
557
31
RT0:31 RT32:63
(RA)0:31 (RB)0:31
RT0:31 RT32:63
(RA)32:63 (RB)32:63
The high-order elements of RA and RB are merged and placed in RT. Special Registers Altered: None Programming Note A vector splat high can be performed by specifying the same register in RA and RB.
The low-order elements of RA and RB are merged and placed in RT. Special Registers Altered: None Programming Note A vector splat low can be performed by specifying the same register in RA and RB.
521
EVX-form
EVX-form
RT
11
RA
16
RB
21
558
31 0
4
6
RT
11
RA
16
RB
21
559
31
RT0:31 RT32:63
(RA)0:31 (RB)32:63
RT0:31 RT32:63
(RA)32:63 (RB)0:31
The high-order element of RA and the low-order element of RB are merged and placed in RT. Special Registers Altered: None Programming Note With appropriate specification of RA and RB, evmergehi, evmergelo, evmergehilo, and evmergelohi provide a full 32-bit permute of two source operands.
The low-order element of RA and the high-order element of RB are merged and placed in RT. Special Registers Altered: None Programming Note A vector swap can be performed by specifying the same register in RA and RB.
Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate EVX-form
evmhegsmfaa RT,RA,RB 4
0 6
Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form
evmhegsmfan RT,RA,RB
RT
11
RA
16
RB
21
1323
31 0
4
6
RT
11
RA
16
RB
21
1451
31
temp0:63 (RA)32:47 gsf (RB)32:47 RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator Special Registers Altered: ACC Note If the two input operands are both -1.0, the intermediate product is represented as +1.0.
temp0:63 (RA)32:47 gsf (RB)32:47 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC Note If the two input operands are both -1.0, the intermediate product is represented as +1.0.
522
Power ISA I
Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form
evmhegsmian RT,RA,RB
RT
11
RA
16
RB
21
1321
31 0
4
6
RT
11
RA
16
RB
21
1449
31
temp0:31 (RA)32:47 si (RB)32:47 temp0:63 EXTS(temp0:31) RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and added to the contents of the 64-bit accumulator, and the resulting sum is placed in RT and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)32:47 si (RB)32:47 temp0:63 EXTS(temp0:31) RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended and subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator. Special Registers Altered: ACC
Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate EVX-form
evmhegumiaa RT,RA,RB 4
0 6
Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form
evmhegumian RT,RA,RB
RT
11
RA
16
RB
21
1320
31 0
4
6
RT
11
RA
16
RB
21
1448
31
temp0:31 (RA)32:47 ui (RB)32:47 temp0:63 EXTZ(temp0:31) (ACC)0:63 + temp0:63 RT0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and added to the contents of the 64-bit accumulator. The resulting sum is placed in RT and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)32:47 ui (RB)32:47 temp0:63 EXTZ(temp0:31) (ACC)0:63 - temp0:63 RT0:63 ACC0:63 (RT)0:63 The corresponding low even-numbered unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended and subtracted from the contents of the 64-bit accumulator. The result is placed in RT and into the accumulator. Special Registers Altered: ACC
523
RT,RA,RB RT
11
RA
16
RB
21
1035
31 0
4
6
RT
11
RA
16
RB
21
1067
31
RT0:31 (RA)0:15 sf (RB)0:15 RT32:63 (RA)32:47 sf (RB)32:47 The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT. Special Registers Altered: None
RT0:31
(RA)0:15 sf (RB)0:15
RT32:63 (RA)32:47 sf (RB)32:47 ACC0:63 (RT)0:63 The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied then placed into the corresponding words of RT and into the accumulator. Special Registers Altered: ACC
Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words EVX-form
evmhesmfaaw RT,RA,RB 4
0 6
Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form
evmhesmfanw RT,RA,RB
RT
11
RA
16
RB
21
1291
31 0
4
6
RT
11
RA
16
RB
21
1419
31
temp0:31 (RA)0:15 sf (RB)0:15 (ACC)0:31 + temp0:31 RT0:31 (RA)32:47 sf (RB)32:47 temp0:31 RT32:63 (ACC)32:63 + temp0:31 (RT)0:63 ACC0:63 For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)0:15 sf (RB)0:15 (ACC)0:31 - temp0:31 RT0:31 temp0:31 (RA)32:47 sf (RB)32:47 RT32:63 (ACC)32:63 - temp0:31 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32-bit intermediate products are subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
524
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
1033
31 0
4
6
RT
11
RA
16
RB
21
1065
31
RT0:31 (RA)0:15 si (RB)0:15 RT32:63 (RA)32:47 si (RB)32:47 The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT. Special Registers Altered: None
RT0:31 (RA)0:15 si (RB)0:15 RT32:63 (RA)32:47 si (RB)32:47 ACC0:63 (RT)0:63 The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator. Special Registers Altered: ACC
Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words EVX-form
evmhesmiaaw RT,RA,RB 4
0 6
Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form
evmhesmianw RT,RA,RB
RT
11
RA
16
RB
21
1289
31 0
4
6
RT
11
RA
16
RB
21
1417
31
temp0:31 (RA)0:15 si (RB)0:15 (ACC)0:31 + temp0:31 RT0:31 temp0:31 (RA)32:47 si (RB)32:47 (ACC)32:63 + temp0:31 RT32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the accumulator words to form intermediate sums, which are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)0:15 si (RB)0:15 (ACC)0:31 - temp0:31 RT0:31 temp0:31 (RA)32:47 si (RB)32:47 (ACC)32:63 - temp0:31 RT32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the accumulator words to form intermediate differences, which are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
525
RT,RA,RB RT
11
RA
16
RB
21
1027
31 0
4
6
RT
11
RA
16
RB
21
1059
31
temp0:31 (RA)0:15 sf (RB)0:15 if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:31 (RA)32:47 sf (RB)32:47 if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: OV OVH SOV SOVH
temp0:31 (RA)0:15 sf (RB)0:15 if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:31 (RA)32:47 sf (RB)32:47 if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 ACC0:63 (RT)0:63 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: ACC OV OVH SOV SOVH
526
Power ISA I
Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form
evmhessfanw RT,RA,RB
RT
11
RA
16
RB
21
1283
31 0
4
6
RT
11
RA
16
RB
21
1411
31
temp0:31 (RA)0:15 sf (RB)0:15 if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then temp0:31 0x7FFF_FFFF movh 1 else movh 0 temp0:63 EXTS((ACC)0:31) + EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)32:47 sf (RB)32:47 if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then temp0:31 0x7FFF_FFFF movl 1 else movl 0 temp0:63 EXTS((ACC)32:63) + EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh | movh SPEFSCROV ovl| movl SPEFSCRSOVH SPEFSCRSOVH | ovh | movh SPEFSCRSOV SPEFSCRSOV | ovl| movl The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)0:15 sf (RB)0:15 if ((RA)0:15 = 0x8000) & ((RB)0:15 = 0x8000) then temp0:31 0x7FFF_FFFF movh 1 else movh 0 temp0:63 EXTS((ACC)0:31) - EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)32:47 sf (RB)32:47 if ((RA)32:47 = 0x8000) & ((RB)32:47 = 0x8000) then temp0:31 0x7FFF_FFFF movl 1 else movl 0 temp0:63 EXTS((ACC)32:63) - EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh | movh SPEFSCROV ovl| movl SPEFSCRSOVH SPEFSCRSOVH | ovh | movh SPEFSCRSOV SPEFSCRSOV | ovl| movl The corresponding even-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
527
Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form
evmhessianw RT,RA,RB
RT
11
RA
16
RB
21
1281
31 0
4
6
RT
11
RA
16
RB
21
1409
31
temp0:31 (RA)0:15 si (RB)0:15 temp0:63 EXTS((ACC)0:31) + EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)32:47 si (RB)32:47 temp0:63 EXTS((ACC)32:63) + EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)0:15 si (RB)0:15 temp0:63 EXTS((ACC)0:31) - EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)32:47 si (RB)32:47 temp0:63 EXTS((ACC)32:63) - EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 RT0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding even-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
528
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
1032
31 0
4
6
RT
11
RA
16
RB
21
1064
31
RT0:31 (RA)0:15 ui (RB)0:15 RT32:63 (RA)32:47 ui (RB)32:47 The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT. Special Registers Altered: None
RT0:31 (RA)0:15 ui (RB)0:15 RT32:63 (RA)32:47 ui (RB)32:47 ACC0:63 (RT)0:63 The corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator. Special Registers Altered: ACC
Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words EVX-form
evmheumiaaw RT,RA,RB
Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form
evmheumianw RT,RA,RB
4
0 6
RT
11
RA
16
RB
21
1288
31 0
4
6
RT
11
RA
16
RB
21
1416
31
temp0:31 (RA)0:15 ui (RB)0:15 RT0:31 (ACC)0:31 + temp0:31 temp0:31 (RA)32:47 ui (RB)32:47 RT32:63 (ACC)32:63 + temp0:31 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator words and the sums are placed into the corresponding RT and accumulator words. Special Registers Altered: ACC
temp0:31 (RA)0:15 ui (RB)0:15 RT0:31 (ACC)0:31 - temp0:31 temp0:31 (RA)32:47 ui (RB)32:47 RT32:63 (ACC)32:63 - temp0:31 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator words. The differences are placed into the corresponding RT and accumulator words. Special Registers Altered: ACC
529
Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form
RT
11
RA
16
RB
21
1280
31 0
4
6
RT
11
RA
16
RB
21
1408
31
temp0:31 (RA)0:15 ui (RB)0:15 EXTZ((ACC)0:31) + EXTZ(temp0:31) temp0:63 ovh temp31 RT0:31 SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) temp0:31 (RA)32:47 ui (RB)32:47 EXTZ((ACC)32:63) + EXTZ(temp0:31) temp0:63 temp31 ovl RT32:63 SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) (RT)0:63 ACC0:63 SPEFSCROVH ovh ovl SPEFSCROV SPEFSCRSOVH | ovh SPEFSCRSOVH SPEFSCRSOV SPEFSCRSOV | ovl For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)0:15 ui (RB)0:15 temp0:63 EXTZ((ACC)0:31) - EXTZ(temp0:31) temp31 ovh RT0:31 SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63) (RA)32:47 ui (RB)32:47 temp0:31 EXTZ((ACC)32:63) - EXTZ(temp0:31) temp0:63 ovl temp31 RT32:63 SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh ovl SPEFSCROV SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV | ovl SPEFSCRSOV For each word element in the accumulator, corresponding even-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
530
Power ISA I
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative EVX-form
evmhogsmfan RT,RA,RB
RT
11
RA
16
RB
21
1327
31 0
4
6
RT
11
RA
16
RB
21
1455
31
temp0:63 (RA)48:63 gsf (RB)48:63 RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC Note If the two input operands are both -1.0, the intermediate product is represented as +1.0.
temp0:63 (RA)48:63 gsf (RB)48:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered, halfword signed fractional elements in RA and RB are multiplied using guarded signed fractional multiplication producing a sign extended 64-bit fractional product with the decimal between bits 32 and 33. The product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC Note If the two input operands are both -1.0, the intermediate product is represented as +1.0.
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate EVX-form
evmhogsmiaa RT,RA,RB 4
0 6
Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative EVX-form
evmhogsmian RT,RA,RB
RT
11
RA
16
RB
21
1325
31 0
4
6
RT
11
RA
16
RB
21
1453
31
temp0:31 (RA)48:63 si (RB)48:63 temp0:63 EXTS(temp0:31) RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)48:63 si (RB)48:63 temp0:63 EXTS(temp0:31) RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered halfword signed-integer elements in RA and RB are multiplied. The intermediate product is sign-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator. Special Registers Altered: ACC
531
Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative EVX-form
evmhogumian RT,RA,RB
RT
11
RA
16
RB
21
1324
31 0
4
6
RT
11
RA
16
RB
21
1452
31
temp0:31 (RA)48:63 ui (RB)48:63 temp0:63 EXTZ(temp0:31) RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then added to the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator. Special Registers Altered: ACC
temp0:31 (RA)48:63 ui (RB)48:63 temp0:63 EXTZ(temp0:31) RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The intermediate product is zero-extended to 64 bits then subtracted from the contents of the 64-bit accumulator, and the result is placed in RT and into the accumulator. Special Registers Altered: ACC
RT,RA,RB RT
11
RA
16
RB
21
1039
31 0
4
6
RT
11
RA
16
RB
21
1071
31
RT0:31 (RA)16:31 sf (RB)16:31 RT32:63 (RA)48:63 sf (RB)48:63 The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT. Special Registers Altered: None
RT0:31 (RA)16:31 sf (RB)16:31 RT32:63 (RA)48:63 sf (RB)48:63 ACC0:63 (RT)0:63 The corresponding odd-numbered, halfword signed fractional elements in RA and RB are multiplied. Each product is placed into the corresponding words of RT. and into the accumulator. Special Registers Altered: ACC
532
Power ISA I
Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words EVX-form
evmhosmfanw RT,RA,RB
RT
11
RA
16
RB
21
1295
31 0
4
6
RT
11
RA
16
RB
21
1423
31
temp0:31 (RA)16:31 (ACC)0:31 + RT0:31 temp0:31 (RA)48:63 (ACC)32:63 RT32:63 ACC0:63 (RT)0:63
temp0:31 (RA)16:31 (ACC)0:31 RT0:31 temp0:31 (RA)48:63 (ACC)32:63 RT32:63 ACC0:63 (RT)0:63
For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
For each word element in the accumulator, the corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
RT,RA,RB RT
11
RA
16
RB
21
1037
31 0
4
6
RT
11
RA
16
RB
21
1069
31
RT0:31 (RA)16:31 si (RB)16:31 RT32:63 (RA)48:63 si (RB)48:63 The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT. Special Registers Altered: None
RT0:31 (RA)16:31 si (RB)16:31 RT32:63 (RA)48:63 si (RB)48:63 ACC0:63 (RT)0:63 The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT and into the accumulator. Special Registers Altered: ACC
533
Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words EVX-form
evmhosmianw RT,RA,RB
RT
11
RA
16
RB
21
1293
31 0
4
6
RT
11
RA
16
RB
21
1421
31
temp0:31 (RA)16:31 (ACC)0:31 + RT0:31 temp0:31 (RA)48:63 (ACC)32:63 RT32:63 ACC0:63 (RT)0:63
temp0:31 (RA)16:31 si (RB)16:31 (ACC)0:31 - temp0:31 RT0:31 temp0:31 (RA)48:63 si (RB)48:63 (ACC)32:63 - temp0:31 RT32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is subtracted from the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
For each word element in the accumulator, the corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied. Each intermediate 32-bit product is added to the contents of the corresponding accumulator word and the results are placed into the corresponding RT words and into the accumulator. Special Registers Altered: ACC
534
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
1031
31 0
4
6
RT
11
RA
16
RB
21
1063
31
temp0:31 (RA)16:31 sf (RB)16:31 if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:31 (RA)48:63 sf (RB)48:63 if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: OV OVH SOV SOVH
temp0:31 (RA)16:31 sf (RB)16:31 if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:31 (RA)48:63 sf (RB)48:63 if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 ACC0:63 (RT)0:63 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied. The 32 bits of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: ACC OV OVH SOV SOVH
535
Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words EVX-form
evmhossfanw RT,RA,RB
RT
11
RA
16
RB
21
1287
31 0
4
6
RT
11
RA
16
RB
21
1415
31
temp0:31 (RA)16:31 sf (RB)16:31 if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then temp0:31 0x7FFF_FFFF movh 1 else movh 0 temp0:63 EXTS((ACC)0:31) + EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)48:63 sf (RB)48:63 if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then temp0:31 0x7FFF_FFFF movl 1 else movl 0 temp0:63 EXTS((ACC)32:63) + EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh | movh SPEFSCROV ovl| movl SPEFSCRSOVH SPEFSCRSOVH | ovh | movh SPEFSCRSOV SPEFSCRSOV | ovl| movl The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)16:31 sf (RB)16:31 if ((RA)16:31 = 0x8000) & ((RB)16:31 = 0x8000) then temp0:31 0x7FFF_FFFF movh 1 else movh 0 temp0:63 EXTS((ACC)0:31) - EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)48:63 sf (RB)48:63 if ((RA)48:63 = 0x8000) & ((RB)48:63 = 0x8000) then temp0:31 0x7FFF_FFFF movl 1 else movl 0 temp0:63 EXTS((ACC)32:63) - EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh | movh SPEFSCROV ovl| movl SPEFSCRSOVH SPEFSCRSOVH | ovh | movh SPEFSCRSOV SPEFSCRSOV | ovl| movl The corresponding odd-numbered halfword signed fractional elements in RA and RB are multiplied producing a 32-bit product. If both inputs are -1.0, the result saturates to 0x7FFF_FFFF. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
536
Power ISA I
Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words EVX-form
evmhossianw RT,RA,RB
RT
11
RA
16
RB
21
1285
31 0
4
6
RT
11
RA
16
RB
21
1413
31
temp0:31 (RA)16:31 si (RB)16:31 temp0:63 EXTS((ACC)0:31) + EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)48:63 si (RB)48:63 temp0:63 EXTS((ACC)32:63) + EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)16:31 si (RB)16:31 temp0:63 EXTS((ACC)0:31) - EXTS(temp0:31) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:31 (RA)48:63 si (RB)48:63 temp0:63 EXTS((ACC)32:63) - EXTS(temp0:31) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding odd-numbered halfword signed-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
RT,RA,RB RT
11
RA
16
RB
21
1036
31 0
4
6
RT
11
RA
16
RB
21
1068
31
RT0:31 (RA)16:31 ui (RB)16:31 RT32:63 (RA)48:63 ui (RB)48:63 The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into the corresponding words of RT. Special Registers Altered: None
RT0:31 (RA)16:31 ui (RB)16:31 RT32:63 (RA)48:63 ui (RB)48:63 ACC0:63 (RT)0:63 The corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. The two 32-bit products are placed into RT and into the accumulator. Special Registers Altered: ACC
537
Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words EVX-form
RT
11
RA
16
RB
21
1292
31 0
4
6
RT
11
RA
16
RB
21
1420
31
temp0:31 (RA)16:31 RT0:31 (ACC)0:31 + temp0:31 (RA)48:63 RT32:63 (ACC)32:63 ACC0:63 (RT)0:63
For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is added to the contents of the corresponding accumulator word. The sums are placed into the corresponding RT and accumulator words. Special Registers Altered: ACC
temp0:31 (RA)16:31 RT0:31 (ACC)0:31 temp0:31 (RA)48:63 RT32:63 (ACC)32:63 ACC0:63 (RT)0:63
For each word element in the accumulator, the corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied. Each intermediate product is subtracted from the contents of the corresponding accumulator word. The results are placed into the corresponding RT and accumulator words. Special Registers Altered: ACC
Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words EVX-form
evmhousiaaw RT,RA,RB
Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words EVX-form
evmhousianw RT,RA,RB
4
0 6
RT
11
RA
16
RB
21
1284
31 0
4
6
RT
11
RA
16
RB
21
1412
31
temp0:31 (RA)16:31 ui (RB)16:31 temp0:63 EXTZ((ACC)0:31) + EXTZ(temp0:31) temp31 ovh RT0:31 SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) (RA)48:63 ui (RB)48:63 temp0:31 EXTZ((ACC)32:63) + EXTZ(temp0:31) temp0:63 ovl temp31 RT32:63 SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh ovl SPEFSCROV SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV | ovl SPEFSCRSOV For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:31 (RA)16:31 ui (RB)16:31 EXTZ((ACC)0:31) - EXTZ(temp0:31) temp0:63 temp31 ovh RT0:31 SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63) (RA)48:63 ui (RB)48:63 temp0:31 temp0:63 EXTZ((ACC)32:63) - EXTZ(temp0:31) temp31 ovl RT32:63 SATURATE(ovl, 0, 0x0000_0000,0x0000_0000, temp32:63) (RT)0:63 ACC0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH | ovh SPEFSCRSOVH SPEFSCRSOV | ovl SPEFSCRSOV For each word element in the accumulator, corresponding odd-numbered halfword unsigned-integer elements in RA and RB are multiplied producing a 32-bit product. Each 32-bit product is then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
538
Power ISA I
EVX-form
RT,RA RT
11
RA
16
///
21
1220
31
ACC0:63 (RA)0:63 RT0:63 (RA)0:63 The contents of RA are placed into the accumulator and RT. This is the method for initializing the accumulator. Special Registers Altered: ACC
RT,RA,RB RT
11
RA
16
RB
21
1103
31 0
4
6
RT
11
RA
16
RB
21
1135
31
temp0:63 (RA)0:31 sf (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 sf (RB)32:63 RT32:63 temp0:31 The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT. Special Registers Altered: None
temp0:63 (RA)0:31 sf (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 sf (RB)32:63 RT32:63 temp0:31 ACC0:63 (RT)0:63 The corresponding word signed fractional elements in RA and RB are multiplied and bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator. Special Registers Altered: ACC
RT,RA,RB RT
11
RA
16
RB
21
1101
31 0
4
6
RT
11
RA
16
RB
21
1133
31
temp0:63 (RA)0:31 si (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 si (RB)32:63 RT32:63 temp0:31 The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT. Special Registers Altered: None
temp0:63 (RA)0:31 si (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 si (RB)32:63 RT32:63 temp0:31 ACC0:63 (RT)0:63 The corresponding word signed-integer elements in RA and RB are multiplied. Bits 0:31 of the two 64-bit products are placed into the two corresponding words of RT and into the accumulator. Special Registers Altered: ACC
539
RT,RA,RB RT
11
RA
16
RB
21
1095
31 0
4
6
RT
11
RA
16
RB
21
1127
31
temp0:63 (RA)0:31 sf (RB)0:31 if ((RA)0:31 = 0x8000_0000)& ((RB)0:31 = 0x8000_0000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63 = 0x8000_0000 &(RB)32:63 = 0x8000_0000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: OV OVH SOV SOVH
temp0:63 (RA)0:31 sf (RB)0:31 if ((RA)0:31 = 0x8000_0000) & ((RB)0:31 = 0x8000_0000) then RT0:31 0x7FFF_FFFF movh 1 else RT0:31 temp0:31 movh 0 temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63=0x8000_0000)&((RB)32:63=0x8000_0000) then RT32:63 0x7FFF_FFFF movl 1 else RT32:63 temp0:31 movl 0 ACC0:63 (RT)0:63 SPEFSCROVH movh SPEFSCROV movl SPEFSCRSOVH SPEFSCRSOVH | movh SPEFSCRSOV SPEFSCRSOV | movl The corresponding word signed fractional elements in RA and RB are multiplied. Bits 0:31 of each product are placed into the corresponding words of RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: ACC OV OVH SOV SOVH
RT,RA,RB RT
11
RA
16
RB
21
1100
31 0
4
6
RT
11
RA
16
RB
21
1132
31
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 temp0:31 The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT. Special Registers Altered: None
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 temp0:31 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 temp0:31 ACC0:63 (RT)0:63 The corresponding word unsigned-integer elements in RA and RB are multiplied. Bits 0:31 of the two products are placed into the two corresponding words of RT and into the accumulator. Special Registers Altered: ACC
540
Power ISA I
Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words EVX-form
evmwlsmianw RT,RA,RB
RT
11
RA
16
RB
21
1353
31 0
4
6
RT
11
RA
16
RB
21
1481
31
temp0:63 (RA)0:31 si (RB)0:31 RT0:31 (ACC)0:31 + temp32:63 temp0:63 (RA)32:63 si (RB)32:63 RT32:63 (ACC)32:63 + temp32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding word signed-integer elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are added to the contents of the corresponding accumulator words, and the result is placed in RT and the accumulator. Special Registers Altered: ACC
temp0:63 (RA)0:31 si (RB)0:31 RT0:31 (ACC)0:31 - temp32:63 temp0:63 (RA)32:63 si (RB)32:63 RT32:63 (ACC)32:63 - temp32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding word elements in RA and RB are multiplied. The least significant 32 bits of each intermediate product are subtracted from the contents of the corresponding accumulator words and the result is placed in RT and the accumulator. Special Registers Altered: ACC
Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words EVX-form
evmwlssiaaw RT,RA,RB 4
0 6
Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words EVX-form
evmwlssianw RT,RA,RB
RT
11
RA
16
RB
21
1345
31 0
4
6
RT
11
RA
16
RB
21
1473
31
temp0:63 (RA)0:31 si (RB)0:31 temp0:63 EXTS((ACC)0:31) + EXTS(temp32:63) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:63 (RA)32:63 si (RB)32:63 temp0:63 EXTS((ACC)32:63) + EXTS(temp32:63) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:63 (RA)0:31 si (RB)0:31 temp0:63 EXTS((ACC)0:31) - EXTS(temp32:63) ovh (temp31 temp32) RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:63 (RA)32:63 si (RB)32:63 temp0:63 EXTS((ACC)32:63) - EXTS(temp32:63) ovl (temp31 temp32) RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl The corresponding word signed-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
541
RT,RA,RB RT
11
RA
16
RB
21
1096
31 0
4
6
RT
11
RA
16
RB
21
1128
31
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 temp32:63 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 temp32:63 The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT. Special Registers Altered: None Programming Note The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers. Note that evmwlumi can be used for signed or unsigned integers.
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 temp32:63 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 temp32:63 ACC0:63 (RT)0:63 The corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are placed into the two corresponding words of RT and into the accumulator. Special Registers Altered: ACC Programming Note The least significant 32 bits of the product are independent of whether the word elements in RA and RB are treated as signed or unsigned 32-bit integers. Note that evmwlumia can be used for signed or unsigned integers.
Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words EVX-form
evmwlumiaaw RT,RA,RB 4
0 6
Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words EVX-form
evmwlumianw RT,RA,RB
RT
11
RA
16
RB
21
1352
31 0
4
6
RT
11
RA
16
RB
21
1480
31
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 (ACC)0:31 + temp32:63 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 (ACC)32:63 + temp32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are added to the contents of the corresponding accumulator word and the result is placed in RT and the accumulator. Special Registers Altered: ACC
temp0:63 (RA)0:31 ui (RB)0:31 RT0:31 (ACC)0:31 - temp32:63 temp0:63 (RA)32:63 ui (RB)32:63 RT32:63 (ACC)32:63 - temp32:63 ACC0:63 (RT)0:63 For each word element in the accumulator, the corresponding word unsigned-integer elements in RA and RB are multiplied. The least significant 32 bits of each product are subtracted from the contents of the corresponding accumulator word and the result is placed in RT and the accumulator. Special Registers Altered: ACC
542
Power ISA I
Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words EVX-form
evmwlusianw RT,RA,RB
RT
11
RA
16
RB
21
1344
31 0
4
6
RT
11
RA
16
RB
21
1472
31
temp0:63 (RA)0:31 ui (RB)0:31 EXTZ((ACC)0:31) + EXTZ(temp32:63) temp0:63 ovh temp31 RT0:31 SATURATE(ovh, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) temp0:63 (RA)32:63 ui (RB)32:63 EXTZ((ACC)32:63) + EXTZ(temp32:63) temp0:63 temp31 ovl RT32:63 SATURATE(ovl, 0, 0xFFFF_FFFF, 0xFFFF_FFFF, temp32:63) (RT)0:63 ACC0:63 SPEFSCROVH ovh ovl SPEFSCROV SPEFSCRSOVH | ovh SPEFSCRSOVH SPEFSCRSOV SPEFSCRSOV | ovl For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then added to the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
temp0:63 (RA)0:31 ui (RB)0:31 EXTZ((ACC)0:31) - EXTZ(temp32:63) temp0:63 ovh temp31 RT0:31 SATURATE(ovh, 0, 0x0000_0000, 0x0000_0000, temp32:63) temp0:63 (RA)32:63 ui (RB)32:63 EXTZ((ACC)32:63) - EXTZ(temp32:63) temp0:63 temp31 ovl RT32:63 SATURATE(ovl, 0, 0x0000_0000, 0x0000_0000, temp32:63) (RT)0:63 ACC0:63 SPEFSCROVH ovh ovl SPEFSCROV SPEFSCRSOVH | ovh SPEFSCRSOVH SPEFSCRSOV SPEFSCRSOV | ovl For each word element in the accumulator, corresponding word unsigned-integer elements in RA and RB are multiplied producing a 64-bit product. The least significant 32 bits of each product are then subtracted from the corresponding word in the accumulator saturating if overflow occurs, and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
RT,RA,RB RT
11
RA
16
RB
21
1115
31 0
RA
16
RB
21
1147
31
RT0:63
(RA)32:63 sf (RB)32:63
The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT. Special Registers Altered: None
RT0:63 (RA)32:63 sf (RB)32:63 ACC0:63 (RT)0:63 The corresponding low word signed fractional elements in RA and RB are multiplied. The product is placed in RT and into the accumulator. Special Registers Altered: ACC
543
Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative EVX-form
RT
11
RA
16
RB
21
1371
31 0
4
6
RT
11
RA
16
RB
21
1499
31
temp0:63 (RA)32:63 sf (RB)32:63 RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC
temp0:63 (RA)32:63 sf (RB)32:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The corresponding low word signed fractional elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC
RT,RA,RB RT
11
RA
16
RB
21
1113
31 0
RA
16
RB
21
1145
31
RT0:63
(RA)32:63 si (RB)32:63
The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT. Special Registers Altered: None
RT0:63 (RA)32:63 si (RB)32:63 ACC0:63 (RT)0:63 The low word signed-integer elements in RA and RB are multiplied. The product is placed in RT and the accumulator. Special Registers Altered: ACC
Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative EVX-form
evmwsmian RT,RA,RB
4
0 6
RT
11
RA
16
RB
21
1369
31 0
4
6
RT
11
RA
16
RB
21
1497
31
temp0:63 (RA)32:63 si (RB)32:63 RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The low word signed-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC
temp0:63 (RA)32:63 si (RB)32:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The low word signed-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC
544
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
1107
31 0
RA
16
RB
21
1139
31
temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63 = 0x8000_0000) & (RB32:63 = 0x8000_0000) then RT0:63 0x7FFF_FFFF_FFFF_FFFF mov 1 else RT0:63 temp0:63 mov 0 SPEFSCROVH 0 SPEFSCROV mov SPEFSCRSOV SPEFSCRSOV | mov The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: OV OVH SOV
temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63=0x8000_0000)&((RB)32:63=0x8000_0000) then RT0:63 0x7FFF_FFFF_FFFF_FFFF mov 1 else RT0:63 temp0:63 mov 0 ACC0:63 (RT)0:63 SPEFSCROVH 0 SPEFSCROV mov SPEFSCRSOV SPEFSCRSOV | mov The low word signed fractional elements in RA and RB are multiplied. The 64-bit product is placed in RT and into the accumulator. If both inputs are -1.0, the result saturates to the largest positive signed fraction. Special Registers Altered: ACC OV OVH SOV
545
Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative EVX-form
RT
11
RA
16
RB
21
1363
31 0
4
6
RT
11
RA
16
RB
21
1491
31
temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63=0x8000_0000)&((RB)32:63=0x8000_0000) then temp0:63 0x7FFF_FFFF_FFFF_FFFF mov 1 else mov 0 temp0:64 EXTS((ACC)0:63) + EXTS(temp0:63) ov (temp0 temp1) RT0:63 temp1:64 ACC0:63 (RT)0:63 SPEFSCROVH 0 SPEFSCROV ov | mov SPEFSCRSOV SPEFSCRSOV | ov | mov The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then added to the accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV
temp0:63 (RA)32:63 sf (RB)32:63 if ((RA)32:63=0x8000_0000)&((RB)32:63=0x8000_0000) then temp0:63 0x7FFF_FFFF_FFFF_FFFF mov 1 else mov 0 temp0:64 EXTS((ACC)0:63) - EXTS(temp0:63) ov (temp0 temp1) RT0:63 temp1:64 ACC0:63 (RT)0:63 SPEFSCROVH 0 SPEFSCROV ov | mov SPEFSCRSOV SPEFSCRSOV | ov | mov The low word signed fractional elements in RA and RB are multiplied producing a 64-bit product. If both inputs are -1.0, the product saturates to the largest positive signed fraction. The 64-bit product is then subtracted from the accumulator and the result is placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV
RT,RA,RB RT
11
RA
16
RB
21
1112
31 0
RA
16
RB
21
1144
31
RT0:63
(RA)32:63 ui (RB)32:63
The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT. Special Registers Altered: None
RT0:63 (RA)32:63 ui (RB)32:63 ACC0:63 (RT)0:63 The low word unsigned-integer elements in RA and RB are multiplied to form a 64-bit product that is placed in RT and into the accumulator. Special Registers Altered: ACC
546
Power ISA I
Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative EVX-form
RT
11
RA
16
RB
21
1368
31 0
4
6
RT
11
RA
16
RB
21
1496
31
temp0:63 (RA)32:63 ui (RB)32:63 RT0:63 (ACC)0:63 + temp0:63 ACC0:63 (RT)0:63 The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is added to the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT. Special Registers Altered: ACC
temp0:63 (RA)32:63 ui (RB)32:63 RT0:63 (ACC)0:63 - temp0:63 ACC0:63 (RT)0:63 The low word unsigned-integer elements in RA and RB are multiplied. The intermediate product is subtracted from the contents of the 64-bit accumulator, and the resulting value is placed into the accumulator and in RT. Special Registers Altered: ACC
Vector NAND
evnand 4
0 6
EVX-form
Vector Negate
evneg RT,RA RT
6 11
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
542
31 0
RA
16
///
21
521
31
RT0:31 ((RA)0:31 & (RB)0:31) RT32:63 ((RA)32:63 & (RB)32:63) Each element of RA and RB is bitwise NANDed. The result is placed in the corresponding element of RT. Special Registers Altered: None
RT0:31 NEG((RA)0:31) RT32:63 NEG((RA)32:63) The negative of each element of RA is placed in RT. The negative of 0x8000_0000 (most negative number) returns 0x8000_0000. Special Registers Altered: None
547
EVX-form
Vector OR
evor RT,RA,RB 4 RT
6 11
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
536
31 0
RA
16
RB
21
535
31
RT0:31 ((RA)0:31 | (RB)0:31) RT32:63 ((RA)32:63 | (RB)32:63) Each element of RA and RB is bitwise NORed. The result is placed in the corresponding element of RT. Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Vector NOR instruction to produce a vector bitwise complement operation. Extended: evnot RT,RA Equivalent to: evnor RT,RA,RA
RT0:31 (RA)0:31 | (RB)0:31 RT32:63 (RA)32:63 | (RB)32:63 Each element of RA and RB is bitwise ORed. The result is placed in the corresponding element of RT. Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Vector OR instruction to provide a 64-bit vector move instruction. Extended: evmr RT,RA Equivalent to: evor RT,RA,RA
EVX-form
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
539
31 0
RA
16
RB
21
552
31
RT0:31 (RA)0:31 | ((RB)0:31) RT32:63 (RA)32:63 | ((RB)32:63) Each element of RA is bitwise ORed with the complement of RB. The result is placed in the corresponding element of RT. Special Registers Altered: None
nh (RB)27:31 nl (RB)59:63 RT0:31 ROTL((RA)0:31, nh) RT32:63 ROTL((RA)32:63, nl) Each of the high and low elements of RA is rotated left by an amount specified in RB. The result is placed in RT. Rotate values for each element of RA are found in bit positions RB27:31 and RB59:63. Special Registers Altered: None
548
Power ISA I
EVX-form
RT,RA,UI 4 RT
11
RA
16
///
21
524
31
RA
16
UI
21
554
31
n UI RT0:31 ROTL((RA)0:31, n) RT32:63 ROTL((RA)32:63, n) Both the high and low elements of RA are rotated left by an amount specified by UI. Special Registers Altered: None
RT0:31 ((RA)0:31+0x00008000) & 0xFFFF0000 RT32:63 ((RA)32:63+0x00008000) & 0xFFFF0000 The 32-bit elements of RA are rounded into 16 bits. The result is placed in RT. The resulting 16 bits are placed in the most significant 16 bits of each element of RT, zeroing out the low-order 16 bits of each element. Special Registers Altered: None
Vector Select
evsel 4
0 6
EVS-form
RT,RA,RB,BFA RT
11
RA
16
RB
21
79
BFA
29 31
ch CRBFA4 cl CRBFA4+1 if (ch = 1) then RT0:31 (RA)0:31 else RT0:31 (RB)0:31 if (cl = 1) then RT32:63 (RA)32:63 else RT32:63 (RB)32:63 If the most significant bit in the BFA field of CR is set to 1, the high-order element of RA is placed in the high-order element of RT; otherwise, the high-order element of RB is placed into the high-order element of RT. If the next most significant bit in the BFA field of CR is set to 1, the low-order element of RA is placed in the low-order element of RT, otherwise, the low-order element of RB is placed into the low-order element of RT. Special Registers Altered: None
549
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
548
31 0
RA
16
UI
21
550
31
nh (RB)26:31 nl (RB)58:63 RT0:31 SL((RA)0:31, nh) RT32:63 SL((RA)32:63, nl) Each of the high and low elements of RA is shifted left by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: None
n UI RT0:31 SL((RA)0:31, n) RT32:63 SL((RA)32:63, n) Both high and low elements of RA are shifted left by the 5-bit UI value and the results are placed in RT. Special Registers Altered: None
EVX-form
RT,SI 4 RT
11
SI
16
///
21
553
31
SI
16
///
21
555
31
RT0:31 SI || 270 RT32:63 SI || 270 The value specified by SI is padded with trailing zeros and placed in both elements of RT. The SI ends up in bit positions RT0:4 and RT32:36. Special Registers Altered: None
RT0:31 EXTS(SI) RT32:63 EXTS(SI) The value specified by SI is sign extended and placed in both elements of RT. Special Registers Altered: None
RT,RA,UI RT
11
RA
16
UI
21
547
31 0
RA
16
UI
21
546
31
n UI RT0:31 EXTS((RA)0:31-n) RT32:63 EXTS((RA)32:63-n) Both high and low elements of RA are shifted right by the 5-bit UI value. Bits in the most significant positions vacated by the shift are filled with a copy of the sign bit. Special Registers Altered: None
n UI RT0:31 EXTZ((RA)0:31-n) RT32:63 EXTZ((RA)32:63-n) Both high and low elements of RA are shifted right by the 5-bit UI value; zeros are shifted into the most significant position. Special Registers Altered: None
550
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
545
31 0
RA
16
RB
21
544
31
nh (RB)26:31 nl (RB)58:63 RT0:31 EXTS((RA)0:31-nh) RT32:63 EXTS((RA)32:63-nl) Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. The sign bits are shifted into the most significant position. Shift amounts from 32 to 63 give a result of 32 sign bits. Special Registers Altered: None
nh (RB)26:31 nl (RB)58:63 RT0:31 EXTZ((RA)0:31-nh) RT32:63 EXTZ((RA)32:63-nl) Both the high and low elements of RA are shifted right by an amount specified in RB. The result is placed in RT. The separate shift amounts for each element are specified by 6 bits in RB that lie in bit positions 26:31 and 58:63. Zeros are shifted into the most significant position. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: None
RS,D(RA) RS
11
RA
16
UI
21
801
31 0
RA
16
RB
21
800
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI8) MEM(EA,8) (RS)0:63 D in the instruction mnemonic is UI 8. The contents of RS are stored as a doubleword in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,8) (RS)0:63 The contents of RS are stored as a doubleword in storage addressed by EA. Special Registers Altered: None
551
RS,D(RA) RS
11
RA
16
UI
21
805
31 0
RA
16
RB
21
804
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI8) MEM(EA,2) (RS)0:15 MEM(EA+2,2) (RS)16:31 MEM(EA+4,2) (RS)32:47 MEM(EA+6,2) (RS)48:63 D in the instruction mnemonic is UI 8. The contents of RS are stored as four halfwords in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,2) (RS)0:15 MEM(EA+2,2) (RS)16:31 MEM(EA+4,2) (RS)32:47 MEM(EA+6,2) (RS)48:63 The contents of RS are stored as four halfwords in storage addressed by EA. Special Registers Altered: None
RS,D(RA) RS
11
RA
16
UI
21
803
31 0
RA
16
RB
21
802
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI8) MEM(EA,4) (RS)0:31 MEM(EA+4,4) (RS)32:63 D in the instruction mnemonic is UI 8. The contents of RS are stored as two words in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,4) (RS)0:31 MEM(EA+4,4) (RS)32:63 The contents of RS are stored as two words in storage addressed by EA. Special Registers Altered: None
552
Power ISA I
RS,D(RA) RS
11
RA
16
UI
21
817
31 0
RA
16
RB
21
816
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI4) MEM(EA,2) (RS)0:15 MEM(EA+2,2) (RS)32:47 D in the instruction mnemonic is UI 4. The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,2) (RS)0:15 MEM(EA+2,2) (RS)32:47 The even halfwords from each element of RS are stored as two halfwords in storage addressed by EA. Special Registers Altered: None
RS,D(RA) RS
11
RA
16
UI
21
821
31 0
RA
16
RB
21
820
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI4) MEM(EA,2) (RS)16:31 MEM(EA+2,2) (RS)48:63 D in the instruction mnemonic is UI 4. The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,2) (RS)16:31 MEM(EA+2,2) (RS)48:63 The odd halfwords from each element of RS are stored as two halfwords in storage addressed by EA. Special Registers Altered: None
RS,D(RA) RS
11
RA
16
UI
21
825
31 0
RA
16
RB
21
824
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI4) MEM(EA,4) (RS)0:31 D in the instruction mnemonic is UI 4. The even word of RS is stored in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,4) (RS)0:31 The even word of RS is stored in storage addressed by EA. Special Registers Altered: None
553
RS,D(RA) RS
11
RA
16
UI
21
829
31 0
RA
16
RB
21
828
31
if (RA = 0) then b 0 else b (RA) EA b + EXTZ(UI4) MEM(EA,4) (RS)32:63 D in the instruction mnemonic is UI 4. The odd word of RS is stored in storage addressed by EA. Special Registers Altered: None
if (RA = 0) then b 0 else b (RA) EA b + (RB) MEM(EA,4) (RS)32:63 The odd word of RS is stored in storage addressed by EA. Special Registers Altered: None
RT
11
RA
16
///
21
1227
31 0
4
6
RT
11
RA
16
///
21
1219
31
RT0:31 (ACC)0:31 - (RA)0:31 RT32:63 (ACC)32:63 - (RA)32:63 ACC0:63 (RT)0:63 Each word element in RA is subtracted from the corresponding element in the accumulator and the difference is placed into the corresponding RT word and into the accumulator. Special Registers Altered: ACC
temp0:63 EXTS((ACC)0:31) - EXTS((RA)0:31) ovh temp31 temp32 RT0:31 SATURATE(ovh, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) temp0:63 EXTS((ACC)32:63) - EXTS((RA)32:63) ovl temp31 temp32 RT32:63 SATURATE(ovl, temp31, 0x8000_0000, 0x7FFF_FFFF, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl Each signed-integer word element in RA is sign-extended and subtracted from the corresponding sign-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
554
Power ISA I
EVX-form
RA
16
RB
21
516
31
RT
11
RA
16
///
21
1226
31
RT0:31 (ACC)0:31 - (RA)0:31 RT32:63 (ACC)32:63 - (RA)32:63 ACC0:63 (RT)0:63 Each unsigned-integer word element in RA is subtracted from the corresponding element in the accumulator and the results are placed in RT and into the accumulator. Special Registers Altered: ACC
RT0:31 (RB)0:31 - (RA)0:31 RT32:63 (RB)32:63 - (RA)32:63 Each signed-integer element of RA is subtracted from the corresponding element of RB and the results are placed in RT. Special Registers Altered: None
RT
11
RA
16
///
21
1218
31 0
UI
16
RB
21
518
31
temp0:63 EXTZ((ACC)0:31) - EXTZ((RA)0:31) ovh temp31 RT0:31 SATURATE(ovh, temp31, 0x0000_0000, 0x0000_0000, temp32:63) temp0:63 EXTS((ACC)32:63) - EXTS((RA)32:63) ovl temp31 RT32:63 SATURATE(ovl, temp31, 0x0000_0000, 0x0000_0000, temp32:63) ACC0:63 (RT)0:63 SPEFSCROVH ovh SPEFSCROV ovl SPEFSCRSOVH SPEFSCRSOVH | ovh SPEFSCRSOV SPEFSCRSOV | ovl Each unsigned-integer word element in RA is zero-extended and subtracted from the corresponding zero-extended element in the accumulator saturating if overflow occurs, and the results are placed in RT and the accumulator. Special Registers Altered: ACC OV OVH SOV SOVH
RT0:31 (RB)0:31 - EXTZ(UI) RT32:63 (RB)32:63 - EXTZ(UI) UI is zero-extended and subtracted from both the high and low elements of RB. Note that the same value is subtracted from both elements of the register. Special Registers Altered: None
Vector XOR
evxor 4
0 6
EVX-form
RT,RA,RB RT
11
RA
16
RB
21
534
31
RT0:31 (RA)0:31 (RB)0:31 RT32:63 (RA)32:63 (RB)32:63 Each element of RA and RB is exclusive-ORed. The results are placed in RT. Special Registers Altered: None
555
556
Power ISA I
Chapter 9. Embedded Floating-Point [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]
9.1 Overview. . . . . . . . . . . . . . . . . . . . 557 9.2 Programming Model . . . . . . . . . . . 558 9.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR). . . . . . . . . . . . . . . . . . . . . 558 9.2.2 Floating-Point Data Formats . . . 558 9.2.3 Exception Conditions . . . . . . . . . 559 9.2.3.1 Denormalized Values on Input 559 9.2.3.2 Embedded Floating-Point Overflow and Underflow . . . . . . . . . . . 559 9.2.3.3 Embedded Floating-Point Invalid Operation/Input Errors. . . . . . . . . . . . . 559 9.2.3.4 Embedded Floating-Point Round (Inexact) . . . . . . . . . . . . . . . . . . . . . . . 559 9.2.3.5 Embedded Floating-Point Divide by Zero. . . . . . . . . . . . . . . . . . . . . . . . . . . 559 9.2.3.6 Default Results . . . . . . . . . . . . 560 9.2.4 IEEE 754 Compliance . . . . . . . . 560 9.2.4.1 Sticky Bit Handling For Exception Conditions . . . . . . . . . . . . . . . . . . . . . . 560 9.3 Embedded Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . 561 9.3.1 Load/Store Instructions . . . . . . . 561 9.3.2 SPE.Embedded Float Vector Instructions [Category: SPE.Embedded Float Vector]. . . . . . . . . . . . . . . . . . . . . 561 9.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single] . . . . . . . . . . . . . . . . . . . . . . . . . 570 9.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double] . . . . . . . . . . . . . . . . . . . . . . . . 577 9.4 Embedded Floating-Point Results Summary . . . . . . . . . . . . . . . . . . . . . . . 586
9.1 Overview
The Embedded Floating-Point categories require the implementation of the Signal Processing Engine (SPE) category and consist of three distinct categories: Embedded vector single-precision floating-point (SPE.Embedded Float Vector [SP.FV]) Embedded scalar single-precision floating-point (SPE.Embedded Float Scalar Single [SP.FS]) Embedded scalar double-precision floating-point (SPE.Embedded Float Scalar Double [SP.FD]) Although each of these may be implemented independently, they are defined in a single chapter because it is likely that they may be implemented together. References to Embedded Floating-Point categories, Embedded Floating-Point instructions, or Embedded Floating-Point operations apply to all 3 categories.
Single-precision floating-point is handled by the SPE.Embedded Float Vector and SPE.Embedded Float Scalar Single categories; double-precision floating-point is handled by the SPE.Embedded Float Scalar Double category.
557
ing-point data elements are 64 bits wide with 1 sign bit (s), 11 bits of biased exponent (e) and 52 bits of fraction (f). In the IEEE 754 specification, floating-point values are represented in a format consisting of three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit.
hidden bit 0 s 0 s 1 exp 1 exp 11 12 fraction 8 9 fraction 63 Double-precision 31 (or 32:63) Single-precision
s - sign bit; 0 = positive; 1 = negative exp - biased exponent field fraction - fractional portion of number
Figure 127.Floating-Point Data Format For single-precision normalized numbers, the biased exponent value e lies in the range of 1 to 254 corresponding to an actual exponent value E in the range -126 to +127. For double-precision normalized numbers, the biased exponent value e lies in the range of 1 to 2046 corresponding to an actual exponent value E in the range -1022 to +1023. With the hidden bit implied to be 1 (for normalized numbers), the value of the number is interpreted as follows:
( 1 ) 2 E ( 1.fraction )
where E is the unbiased exponent and 1.fraction is the mantissa (or significand) consisting of a leading 1 (the hidden bit) and a fractional part (fraction field). For the single-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7F7FFFFF which is approximately 3.4E+38 (2128), and the minimum positive normalized value (pmin) is represented by the encoding 0x00800000 which is approximately 1.2E-38 (2-126). For the double-precision format, the maximum positive normalized number (pmax) is represented by the encoding 0x7feFFFFF_FFFFFFFF which is approximately 1.8E+307 (21024), and the minimum positive normalized value (pmin) is represented by the encoding 0x00100000_00000000 which is approximately 2.2E-308 (2-1022). Two specific values of the biased exponent are reserved (0 and 255 for single-precision; 0 and 2047 for double-precision) for encoding special values of +0, -0, +infinity, -infinity, and NaNs. Zeros of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f which is 0. Infinities of both positive and negative sign are represented by a maximum exponent field value (255 for single-precision, 2047 for double-precision) and a fraction which is 0.
9.2.1 Signal Processing Embedded Floating-Point Status and Control Register (SPEFSCR)
Status and control for the Embedded Floating-Point categories uses the SPEFSCR. This register is defined by the Signal Processing Engine category in Section 8.3.4. Status and control bits are shared for Embedded Floating-Point and SPE operations. Instructions in the SPE.Embedded Float Vector category affect both the high element (bits 34:39) and low element floating-point status flags (bits 50:55). Instructions in the SPE.Embedded Float Scalar Double and SPE.Embedded Float Scalar Single categories affect only the low element floating-point status flags and leave the high element floating-point status flags undefined.
558
Power ISA I
Programming Note On some implementations, operations that result in overflow or underflow are likely to take significantly longer than operations that do not. For example, these operations may cause a system error handler to be invoked; on such implementations, the system error handler updates the overflow bits appropriately.
559
560
Power ISA I
RT,RA RT
11
RA
16
///
21
644
31 0
RA
16
///
21
645
31
RT0:31 RT32:63
RT0:31 RT32:63
The sign bit of each element in register RA is set to 0 and the results are placed into register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
The sign bit of each element in register RA is set to 1 and the results are placed into register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
RT,RA RT
11
RA
16
///
21
646
31
RT0:31 RT32:63
The sign bit of each element in register RA is complemented and the results are placed into register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
561
RT,RA,RB RT
11
RA
16
RB
21
640
31 0
RA
16
RB
21
641
31
RT0:31 RT32:63
RT0:31 RT32:63
Each single-precision floating-point element of register RA is added to the corresponding element of register RB and the results are stored in register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of register RT. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS FOVF FOVFH FOVFS FUNF FUNFH FUNFS
Each single-precision floating-point element of register RB is subtracted from the corresponding element of register RA and the results are stored in register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of register RT. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS FOVF FOVFH FOVFS FUNF FUNFH FUNFS
RT,RA,RB RT
11
RA
16
RB
21
648
31 0
RA
16
RB
21
649
31
RT0:31 RT32:63
RT0:31 RT32:63
Each single-precision floating-point element of register RA is multiplied with the corresponding element of register RB and the result is stored in register RT. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS FOVF FOVFH FOVFS FUNF FUNFH FUNFS
Each single-precision floating-point element of register RA is divided by the corresponding element of register RB and the result is stored in register RT. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS FDBZ FDBZH FDBZS FOVF FOVFH FOVFS FUNF FUNFH FUNFS
562
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
652
31 0
RA
16
RB
21
653
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah > bh) then ch 1 0 else ch if (al > bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is greater than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms as treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX CR field BF
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah < bh) then ch 1 0 else ch if (al < bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is less than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms as treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX CR field BF
563
BF //
9 11
RA
16
RB
21
654
31 0
RA
16
RB
21
668
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah = bh) then ch 1 0 else ch if (al = bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is equal to RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms as treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX CR field BF
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah > bh) then ch 1 0 else ch if (al > bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared against the corresponding element of register RB.The results of the comparisons are placed into CR field BF. If RA0:31 is greater than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are taken during the execution of evfststgt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of evfststgt is likely to be faster than the execution of evfscmpgt; however, if strict IEEE 754 compliance is required, the program should use evfscmpgt.
564
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
669
31 0
RA
16
RB
21
670
31
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah < bh) then ch 1 0 else ch if (al < bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared with the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is less than RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are taken during the execution of evfststlt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of evfststlt is likely to be faster than the execution of evfscmplt; however, if strict IEEE 754 compliance is required, the program should use evfscmplt.
ah (RA)0:31 al (RA)32:63 bh (RB)0:31 bl (RB)32:63 if (ah = bh) then ch 1 0 else ch if (al = bl) then cl 1 0 else cl ch || cl || (ch | cl) || (ch & cl) CR4BF:4BF+3 Each element of register RA is compared against the corresponding element of register RB. The results of the comparisons are placed into CR field BF. If RA0:31 is equal to RB0:31, bit 0 of CR field BF is set to 1, otherwise it is set to 0. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bit 2 of CR field BF is set to the OR of both result bits and Bit 3 of CR field BF is set to the AND of both result bits. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are taken during the execution of evfststeq. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of evfststeq is likely to be faster than the execution of evfscmpeq; however, if strict IEEE 754 compliance is required, the program should use evfscmpeq.
565
RT,RB RT
11
///
16
RB
21
657
31 0
///
16
RB
21
656
31
RT0:31 CnvtI32ToFP32((RB)0:31, S, HI, I) RT32:63 CnvtI32ToFP32((RB)32:63, S, LO, I) Each signed integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding element of register RT. Special Registers Altered: FGH FXH FG FX FINXS
RT0:31 CnvtI32ToFP32((RB)0:31, U, HI, I) RT32:63 CnvtI32ToFP32((RB)32:63, U, LO, I) Each unsigned integer element of register RB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT. Special Registers Altered: FGH FXH FG FX FINXS
RT,RB RT
11
///
16
RB
21
659
31 0
///
16
RB
21
658
31
RT0:31 CnvtI32ToFP32((RB)0:31, S, HI, F) RT32:63 CnvtI32ToFP32((RB)32:63, S, LO, F) Each signed fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT. Special Registers Altered: FGH FXH FG FX FINXS
RT0:31 CnvtI32ToFP32((RB)0:31, U, HI, F) RT32:63 CnvtI32ToFP32((RB)32:63, U, LO, F) Each unsigned fractional element of register RB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of register RT. Special Registers Altered: FGH FXH FG FX FINXS
566
Power ISA I
Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero EVX-form
evfsctsiz RT,RB RT
6 11
RT,RB RT
11
///
16
RB
21
661
31 0
///
16
RB
21
666
31
RT0:31 RT32:63
RT0:31 RT32:63
Each single-precision floating-point element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
Each single-precision floating-point element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form
evfsctuiz RT,RB RT
6 11
RT,RB RT
11
///
16
RB
21
660
31 0
///
16
RB
21
664
31
RT0:31 RT32:63
RT0:31 RT32:63
Each single-precision floating-point element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
Each single-precision floating-point element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
567
RT,RB RT
11
///
16
RB
21
663
31 0
///
16
RB
21
662
31
RT0:31 RT32:63
RT0:31 RT32:63
Each single-precision floating-point element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit signed fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
Each single-precision floating-point element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVH FINVS FGH FXH FG FX FINXS
568
Power ISA I
9.3.3 SPE.Embedded Float Scalar Single Instructions [Category: SPE.Embedded Float Scalar Single]
Floating-Point Single-Precision Absolute Value EVX-form
efsabs 4
0 6
RT,RA RT
11
RA
16
///
21
708
31 0
RA
16
///
21
709
31
RT32:63
0b0 || (RA)33:63
RT32:63
0b1 || (RA)33:63
The sign bit of the low element of register RA is set to 0 and the result is placed into the low element of register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
The sign bit of the low element of register RA is set to 1 and the result is placed into the low element of register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
RT,RA RT
11
RA
16
///
21
710
31
RT32:63
(RA)32 || (RA)33:63
The sign bit of the low element of register RA is complemented and the result is placed into the low element of register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
569
RT,RA,RB RT
11
RA
16
RB
21
704
31 0
RA
16
RB
21
705
31
RT32:63
RT32:63
The low element of register RA is added to the low element of register RB and the result is stored in the low element of register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
The low element of register RB is subtracted from the low element of register RA and the result is stored in the low element of register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
RT,RA,RB RT
11
RA
16
RB
21
712
31 0
RA
16
RB
21
713
31
RT32:63
(RA)32:63 sp (RB)32:63
RT32:63
(RA)32:63 sp (RB)32:63
The low element of register RA is multiplied by the low element of register RB and the result is stored in the low element of register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
The low element of register RA is divided by the low element of register RB and the result is stored in the low element of register RT. Special Registers Altered: FINV FINVS FG FX FINXS FDBZ FDBZS FOVF FOVFS FUNF FUNFS
570
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
716
31 0
RA
16
RB
21
717
31
al (RA)32:63 bl (RB)32:63 if (al > bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)32:63 bl (RB)32:63 if (al < bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
The low element of register RA is compared against the low element of register RB. The results of the comparisons are placed into CR field BF. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
The low element of register RA is compared against the low element of register RB. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
571
BF,RA,RB BF //
9 11
RA
16
RB
21
718
31 0
RA
16
RB
21
732
31
al (RA)32:63 bl (RB)32:63 if (al = bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)32:63 bl (RB)32:63 if (al > bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
The low element of register RA is compared against the low element of register RB. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an Input Error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
The low element of register RA is compared against the low element of register RB. If RA32:63 is greater than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efststgt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efststgt is likely to be faster than the execution of efscmpgt; however, if strict IEEE 754 compliance is required, the program should use efscmpgt.
572
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
733
31 0
RA
16
RB
21
734
31
al (RA)32:63 bl (RB)32:63 if (al < bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)32:63 bl (RB)32:63 if (al = bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
The low element of register RA is compared against the low element of register RB. If RA32:63 is less than RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efststlt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efststlt is likely to be faster than the execution of efscmplt; however, if strict IEEE 754 compliance is required, the program should use efscmplt.
The low element of register RA is compared against the low element of register RB. If RA32:63 is equal to RB32:63, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efststeq. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efststeq is likely to be faster than the execution of efscmpeq; however, if strict IEEE 754 compliance is required, the program should use efscmpeq.
573
RT,RB RT
11
///
16
RB
21
721
31 0
///
16
RB
21
720
31
RT32:63
CnvtI32ToFP32((RB)32:63, S, LO, I)
RT32:63
CnvtI32ToFP32((RB)32:63, U, LO, I)
The signed integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT. Special Registers Altered: FINXS FG FX
The unsigned integer low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT. Special Registers Altered: FINXS FG FX
RT,RB RT
11
///
16
RB
21
723
31 0
///
16
RB
21
722
31
RT32:63
CnvtI32ToFP32((RB)32:63, S, LO, F)
RT32:63
CnvtI32ToFP32((RB)32:63, U, LO, F)
The signed fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT. Special Registers Altered: FINXS FG FX
The unsigned fractional low element in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT. Special Registers Altered: FINXS FG FX
RT,RB RT
11
///
16
RB
21
725
31 0
///
16
RB
21
724
31
RT32:63
RT32:63
The single-precision floating-point low element in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
The single-precision floating-point low element in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
574
Power ISA I
Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero EVX-form
efsctuiz RT,RB RT
6 11
RT,RB RT
11
///
16
RB
21
730
31 0
///
16
RB
21
728
31
RT32:63
RT32:63
The single-precision floating-point low element in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
The single-precision floating-point low element in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
RT,RB RT
11
///
16
RB
21
727
31 0
///
16
RB
21
726
31
RT32:63
RT32:63
The single-precision floating-point low element in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
The single-precision floating-point low element in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
575
9.3.4 SPE.Embedded Float Scalar Double Instructions [Category: SPE.Embedded Float Scalar Double]
Floating-Point Double-Precision Absolute Value EVX-form
efdabs 4
0 6
RT,RA RT
11
RA
16
///
21
740
31 0
RA
16
///
21
741
31
RT0:63
0b0 || (RA)1:63
RT0:63
0b1 || (RA)1:63
The sign bit of register RA is set to 0 and the result is placed in register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
The sign bit of register RA is set to 1 and the result is placed in register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
RT,RA RT
11
RA
16
///
21
742
31
RT0:63
(RA)0 || (RA)1:63
The sign bit of register RA is complemented and the result is placed in register RT. Regardless of the value of register RA, no exceptions are taken during the execution of this instruction. Special Registers Altered: None
576
Power ISA I
RT,RA,RB RT
11
RA
16
RB
21
736
31 0
RA
16
RB
21
737
31
RT0:63
RT0:63
RA is added to RB and the result is stored in register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
RB is subtracted from RA and the result is stored in register RT. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
RT,RA,RB RT
11
RA
16
RB
21
744
31 0
RA
16
RB
21
745
31
RT0:63
(RA)0:63 dp (RB)0:63
RT0:63
(RA)0:63 dp (RB)0:63
RA is multiplied by RB and the result is stored in register RT. Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
RA is divided by RB and the result is stored in register RT. Special Registers Altered: FINV FINVS FG FX FINXS FDBZ FDBZS FOVF FOVFS FUNF FUNFS
577
BF,RA,RB BF //
9 11
RA
16
RB
21
748
31 0
RA
16
RB
21
749
31
al (RA)0:63 bl (RB)0:63 if (al > bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)0:63 bl (RB)0:63 if (al < bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
BF,RA,RB BF //
9 11
RA
16
RB
21
750
31 0
RA
16
RB
21
764
31
al (RA)0:63 bl (RB)0:63 if (al = bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)0:63 bl (RB)0:63 if (al > bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). If an input error occurs and default results are generated, NaNs, Infinities, and Denorms are treated as normalized numbers, using their values of e and f directly. Special Registers Altered: FINV FINVS FG FX CR field BF
RA is compared against RB. If RA is greater than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efdtstgt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efdtstgt is likely to be faster than the execution of efdcmpgt; however, if strict IEEE 754 compliance is required, the program should use efdcmpgt.
578
Power ISA I
BF,RA,RB BF //
9 11
RA
16
RB
21
765
31 0
RA
16
RB
21
766
31
al (RA)0:63 bl (RB)0:63 if (al < bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
al (RA)0:63 bl (RB)0:63 if (al = bl) then cl 1 else cl 0 CR4BF:4BF+3 undefined || cl || undefined || undefined
RA is compared against RB. If RA is less than RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efdtstlt. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efdtstlt is likely to be faster than the execution of efdcmplt; however, if strict IEEE 754 compliance is required, the program should use efdcmplt.
RA is compared against RB. If RA is equal to RB, bit 1 of CR field BF is set to 1, otherwise it is set to 0. Bits 0, 2, and 3 of CR field BF are undefined. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of e and f directly. No exceptions are generated during the execution of efdtsteq. Special Registers Altered: CR field BF Programming Note In an implementation, the execution of efdtsteq is likely to be faster than the execution of efdcmpeq; however, if strict IEEE 754 compliance is required, the program should use efdcmpeq.
RT,RB RT
11
///
16
RB
21
753
31 0
///
16
RB
21
752
31
RT0:63
CnvtI32ToFP64((RB)32:63, S, I)
RT0:63
CnvtI32ToFP64((RB)32:63, U, I)
The signed integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Special Registers Altered: None
The unsigned integer low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Special Registers Altered: None
579
RT,RB RT
11
///
16
RB
21
739
31 0
///
16
RB
21
738
31
RT0:63
CnvtI64ToFP64((RB)0:63, S)
RT0:63
CnvtI64ToFP64((RB)0:63, U)
The signed integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Corequisite Categories: 64-Bit Special Registers Altered: FINXS FG FX
The unsigned integer doubleword in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Corequisite Categories: 64-Bit Special Registers Altered: FINXS FG FX
RT,RB 4 RT
11
///
16
RB
21
757
31
///
16
RB
21
755
31
RT32:63
CnvtFP64ToI32Sat((RB)0:63, S, RND, I)
RT0:63
CnvtI32ToFP64((RB)32:63, S, F)
The signed fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Special Registers Altered: None
The double-precision floating-point value in register RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
RT,RB RT
11
///
16
RB
21
754
31 0
///
16
RB
21
756
31
RT0:63
CnvtI32ToFP64((RB)32:63, U, F)
RT32:63
The unsigned fractional low element in register RB is converted to a double-precision floating-point value using the current rounding mode and the result is placed in register RT. Special Registers Altered: None
CnvtFP64ToI32Sat((RB)0:63, U, RND, I)
The double-precision floating-point value in register RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
580
Power ISA I
Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero EVX-form
efdctuidz RT,RB RT
6 11
RT,RB RT
11
///
16
RB
21
747
31 0
///
16
RB
21
746
31
RT0:63
CnvtFP64ToI64Sat((RB)0:63, S, ZER)
RT0:63
CnvtFP64ToI64Sat((RB)0:63, U, ZER)
The double-precision floating-point value in register RB is converted to a signed integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero. Corequisite Categories: 64-Bit Special Registers Altered: FINV FINVS FINXS FG FX
The double-precision floating-point value in register RB is converted to an unsigned integer doubleword using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 64-bit integer. NaNs are converted as though they were zero. Corequisite Categories: 64-Bit Special Registers Altered: FINV FINVS FINXS FG FX
581
Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero EVX-form
efdctuiz RT,RB RT
6 11
RT,RB RT
11
///
16
RB
21
762
31 0
///
16
RB
21
760
31
RT32:63
CnvtFP64ToI32Sat((RB)0:63, S, ZER, I)
RT32:63
CnvtFP64ToI32Sat((RB)0:63, U, ZER, I)
The double-precision floating-point value in register RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
The double-precision floating-point value in register RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
RT,RB RT
11
///
16
RB
21
759
31 0
///
16
RB
21
751
31
RT32:63
CnvtFP64ToI32Sat((RB)0:63, S, RND, F)
The double-precision floating-point value in register RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
FP32format f; FP64format result; (RB)32:63 f if (fexp = 0) & (ffrac = 0)) then result fsign || 630 else if Isa32NaNorInfinity(f) | Isa32Denorm(f) then 1 SPEFSCRFINV result fsign || 0b11111111110 || 521 else if Isa32Denorm(f) then 1 SPEFSCRFINV result fsign || 630 else fsign resultsign resultexp fexp - 127 + 1023 ffrac || 290 resultfrac RT0:63 result The single-precision floating-point value in the low element of register RB is converted to a double-precision floating-point value and the result is placed in register RT. Corequisite Categories: SPE.Embedded Float Scalar Single or SPE.Embedded Float Vector Special Registers Altered: FINV FINVS FG FX
RT,RB RT
11
///
16
RB
21
758
31
RT32:63
CnvtFP64ToI32Sat((RB)0:63, U, RND, F)
The double-precision floating-point value in register RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero. Special Registers Altered: FINV FINVS FINXS FG FX
582
Power ISA I
RT,RB RT
11
///
16
RB
21
719
31
FP64format f; FP32format result; (RB)0:63 f if (fexp = 0) & (ffrac = 0)) then fsign || 310 result else if Isa64NaNorInfinity(f) then SPEFSCRFINV 1 fsign || 0b11111110 || 231 result else if Isa64Denorm(f) then 1 SPEFSCRFINV fsign || 310 result else unbias fexp - 1023 if unbias > 127 then fsign || 0b11111110 || 231 result SPEFSCRFOVF 1 else if unbias < -126 then fsign || 310 result SPEFSCRFUNF 1 else fsign resultsign resultexp unbias + 127 ffrac[0:22] resultfrac guard ffrac[23] sticky (ffrac[24:51] 0) Round32(result, LO, guard, result sticky) SPEFSCRFG guard sticky SPEFSCRFX if guard | sticky then SPEFSCRFINXS 1 result RT32:63 The double-precision floating-point value in register RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of register RT. Corequisite Categories: SPE.Embedded Float Scalar Scalar Special Registers Altered: FINV FINVS FOVF FOVFS FUNF FUNFS FG FX FINXS
583
Table 100:Embedded Floating-Point Results SummaryAdd, Sub, Mul, Div Operation Operand A Operand B Add Add Add Add Add Add Add Add Add Add Add Add Add Add Add Add Add Add NaN NaN NaN NaN NaN denorm denorm denorm denorm denorm zero zero zero NaN denorm zero Norm NaN denorm zero norm NaN denorm zero norm NaN denorm Result Add amax amax amax amax amax amax amax amax amax amax bmax bmax zero1 zero1 operand_b4 bmax bmax zero1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 FINV FOVF FUNF FDBZ FINX
584
Power ISA I
Table 100:Embedded Floating-Point Results SummaryAdd, Sub, Mul, Div (Continued) Operation Operand A Operand B Add Add Add Add Add Add Add Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Sub Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul zero zero norm norm norm norm norm NaN NaN NaN NaN NaN denorm denorm denorm denorm denorm zero zero zero zero zero norm norm norm norm norm NaN NaN NaN NaN NaN zero norm NaN denorm zero norm NaN denorm zero Norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero Norm NaN denorm zero norm Result zero
1
operand_b4 bmax bmax operand_a4 operand_a4 _Calc_ Subtract amax amax amax amax amax amax amax amax amax amax -bmax -bmax zero2 zero2 -operand_b4 -bmax -bmax zero2 zero2 -operand_b4 -bmax -bmax operand_a4 operand_a4 _Calc_ Multiply3 max max zero zero max max max zero zero max
585
Table 100:Embedded Floating-Point Results SummaryAdd, Sub, Mul, Div (Continued) Operation Operand A Operand B Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Mul Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div Div denorm denorm denorm denorm denorm zero zero zero zero zero norm norm norm norm norm NaN NaN NaN NaN NaN denorm denorm denorm denorm denorm zero zero zero zero zero norm norm norm norm norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero Norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero norm NaN denorm zero norm Result zero zero zero zero zero zero zero zero zero zero max max zero zero _Calc_ Divide3 zero zero max max max zero zero max max max zero zero max max zero zero zero max max zero zero zero max max _Calc_ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * FINV FOVF FUNF FDBZ FINX 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 *
586
Power ISA I
Table 101:Embedded Floating-Point Results SummarySingle Convert from Double Operand B + - +NaN -NaN +denorm -denorm +zero -zero norm efscfd result pmax nmax pmax nmax +zero -zero +zero -zero _Calc_ FINV 1 1 1 1 1 1 0 0 0 FOVF 0 0 0 0 0 0 0 0 * FUNF 0 0 0 0 0 0 0 0 * FDBZ 0 0 0 0 0 0 0 0 0 FINX 0 0 0 0 0 0 0 0 *
Table 102:Embedded Floating-Point Results SummaryDouble Convert from Single Operand B + - +NaN -NaN +denorm -denorm +zero -zero norm efdcfs result pmax nmax pmax nmax +zero -zero +zero -zero _Calc_ FINV 1 1 1 1 1 1 0 0 0 FOVF 0 0 0 0 0 0 0 0 0 FUNF 0 0 0 0 0 0 0 0 0 FDBZ 0 0 0 0 0 0 0 0 0 FINX 0 0 0 0 0 0 0 0 0
Table 103:Embedded Floating-Point Results SummaryConvert to Unsigned Operand B + - +NaN -NaN denorm zero +norm -norm Integer Result ctui[d][z] 0xFFFF_FFFF 0xFFFF_FFFF_FFFF_FFFF 0 0 0 0 0 _Calc_ _Calc_ Fractional Result FINV FOVF FUNF FDBZ FINX ctuf 0x7FFF_FFFF 0 0 0 0 0 _Calc_ _Calc_ 1 1 1 1 1 0 * * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * *
587
Table 104:Embedded Floating-Point Results SummaryConvert to Signed Operand B + - +NaN -NaN denorm zero +norm -norm Integer Result ctsi[d][z] 0x7FFF_FFFF 0x7FFF_FFFF_FFFF_FFFF 0x8000_0000 0x8000_0000_0000_0000 0 0 0 0 _Calc_ _Calc_ Fractional Result FINV FOVF FUNF FDBZ FINX ctsf 0x7FFF_FFFF 0x8000_0000 0 0 0 0 _Calc_ _Calc_ 1 1 1 1 1 0 * * 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 * *
Table 105:Embedded Floating-Point Results SummaryConvert from Unsigned Operand B zero norm Integer Source Fractional Source FINV FOVF FUNF FDBZ FINX cfui cfuf zero _Calc_ zero _Calc_ 0 0 0 0 0 0 0 0 0 *
Table 106:Embedded Floating-Point Results SummaryConvert from Signed Operand B zero norm Integer Source Fractional Source FINV FOVF FUNF FDBZ FINX cfsi cfsf zero _Calc_ zero _Calc_ 0 0 0 0 0 0 0 0 0 *
Table 107:Embedded Floating-Point Results Summary*abs, *nabs, *neg Operand A + - +NaN -NaN +denorm -denorm +zero -zero +norm -norm *abs pmax | + pmax | + pmax | NaN pmax | NaN +zero | +denorm +zero | +denorm +zero +zero +norm +norm *nabs nmax | - nmax | - nmax | -NaN nmax | -NaN -zero | -denorm -zero | -denorm -zero -zero -norm -norm *neg -amax | - -amax | + -amax | -NaN -amax | +NaN -zero | -denorm +zero | +denorm -zero +zero -norm +norm FINV FOVF FUNF FDBZ FINX 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
588
Power ISA I
Chapter 10. Legacy Move Assist Instruction [Category: Legacy Move Assist]
Determine Leftmost Zero Byte
dlmzb dlmzb. 31
0 6
X-form
(Rc=0) (Rc=1)
RA,RS,RB RA,RS,RB RS
11
(if Rc=1)
RA
16
RB
21
78
Rc
31
d0:63 (RS)32:63 || (RB)32:63 i 0 0 x 0 y do while (x<8) & (y=0) x + 1 x if di+32:i+39 = 0 then y 1 else i + 8 i RA x x XER57:63 if Rc = 1 then do CR35 SO if y = 1 then do 0b010 if x<5 then CR32:34 else CR32:34 0b100 else 0b001 CR32:34 The contents of bits 32:63 of register RS and the contents of bits 32:63 of register RB are concatenated to form an 8-byte operand. The operand is searched for the leftmost byte in which each bit is 0 (i.e., a null byte). Bytes in the operand are numbered from left to right starting with 1. If a null byte is found, its byte number is placed into bits 57:63 of the XER and into register RA. Otherwise, the value 0b000_1000 is placed into both bits 57:63 of the XER and register RA. If Rc is equal to 1, SO is copied into bit 35 of the CR and bits 32:34 of the CR are updated as follows: If no null byte is found, bits 32:34 of the CR are set to 0b001. If the leftmost null byte is in the first 4 bytes (i.e., from register RS), bits 32:34 of the CR are set to 0b010. If the leftmost null byte is in the last 4 bytes (i.e., from register RB), bits 32:34 of the CR are set to 0b100.
Chapter 10. Legacy Move Assist Instruction [Category: Legacy Move As-
589
590
Power ISA I
Chapter 11. Legacy Integer Multiply-Accumulate Instructions [Category: Legacy Integer Multiply-Accumulate]
The Legacy Integer Multiply-Accumulate instructions with Rc=1 set the first three bits of CR Field 0 based on the 32-bit result, as described in Section 3.3.7, Other Fixed-Point Instructions. The XO-form Legacy Integer Multiply-Accumulate instructions set SO and OV when OE=1 to reflect overflow of the 32-bit result. Programming Note Notice that CR Field 0 may not reflect the true (infinitely precise) result if overflow occurs.
RA
OE
21 22
172
Rc
31
RA
OE
21 22
236
Rc
31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)48:63 si (RB)32:47 temp0:32 prod0:31 + RT32:63 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 else RT32:63 RT0:31 undefined
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. If the sum is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the sum is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
591
RA
OE
21 22
140
Rc
31
RT
11
RA
OE
21 22
204
Rc
31
prod0:31 (RA)48:63 ui (RB)32:47 temp0:32 prod0:31 + (RT)32:63 RT temp1:32 The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. If the sum is greater than 232-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
592
Power ISA I
RA
OE
21 22
44
Rc
31
RA
OE
21 22
108
Rc
31
The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)32:47 si (RB)32:47 temp0:32 prod0:31 + (RT)32:63 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 RT32:63 else RT0:31 undefined
The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. If the sum is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the sum is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
593
RA
OE
21 22
12
Rc
31
RT
11
RA
OE
21 22
76
Rc
31
The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. If the sum is greater than 232-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
594
Power ISA I
RA
OE
21 22
428
Rc
31
RA
OE
21 22
492
Rc
31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)48:63 si (RB)48:63 temp0:32 prod0:31 + (RT)32:63 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 RT32:63 else RT0:31 undefined
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB. The 32-bit signed-integer product is added to the signed-integer word in bits 32:63 of register RT. If the sum is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the sum is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
595
RA
OE
21 22
396
Rc
31
RA
OE
21 22
460
Rc
31
The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. The low-order 32 bits of the sum are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB. The 32-bit unsigned-integer product is added to the unsigned-integer word in bits 32:63 of register RT. If the sum is greater than 232-1, then the value 0xFFFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the sum is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
X-form
(Rc=0) (Rc=1)
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16 21
RT,RA,RB RT,RA,RB RT
11
RA
168
Rc
31
RA
16
RB
21
136
Rc
31
RT32:63 RT0:31
RT32:63 RT0:31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
596
Power ISA I
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16 21
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16 21
RA
40
Rc
31
RA
Rc
31
RT32:63 RT0:31
RT32:63 RT0:31
The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB and the signed-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
The unsigned-integer halfword in bits 32:47 of register RA is multiplied by the unsigned-integer halfword in bits 32:47 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16 21
RT,RA,RB RT,RA,RB RT
11
(Rc=0) (Rc=1) RB
16 21
RA
424
Rc
31
RA
392
Rc
31
RT32:63 RT0:31
RT32:63 RT0:31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB and the signed-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
The unsigned-integer halfword in bits 48:63 of register RA is multiplied by the unsigned-integer halfword in bits 48:63 of register RB and the unsigned-integer word result is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: CR0 (if Rc=1)
597
RA
OE
21 22
174
Rc
31
RT
11
RA
OE
21 22
238
Rc
31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the difference are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)48:63 si (RB)32:47 temp0:32 (RT)32:63 -si prod0:31 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 RT32:63 else RT0:31 undefined
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. If the difference is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the difference is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the difference is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
598
Power ISA I
RA
OE
21 22
46
Rc
31
RT
11
RA
OE
21 22
110
Rc
31
The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the difference are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)32:47 si (RB)32:47 temp0:32 (RT)32:63 -si prod0:31 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 RT32:63 else RT0:31 undefined
The signed-integer halfword in bits 32:47 of register RA is multiplied by the signed-integer halfword in bits 32:47 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. If the difference is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the difference is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the difference is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
599
RA
OE
21 22
430
Rc
31
RA
OE
21 22
494
Rc
31
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. The low-order 32 bits of the difference are placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
prod0:31 (RA)48:63 si (RB)48:63 temp0:32 (RT)32:63 -si prod0:31 if temp < -231 then RT32:63 else if temp > 231-1 then RT32:63 RT32:63 else RT0:31 undefined
The signed-integer halfword in bits 48:63 of register RA is multiplied by the signed-integer halfword in bits 48:63 of register RB. The 32-bit signed-integer product is subtracted from the signed-integer word in bits 32:63 of register RT. If the difference is less than -231, then the value 0x8000_0000 is placed into bits 32:63 of register RT. If the difference is greater than 231-1, then the value 0x7FFF_FFFF is placed into bits 32:63 of register RT. Otherwise, the difference is placed into bits 32:63 of register RT. The contents of bits 0:31 of register RT are undefined. Special Registers Altered: SO OV CR0 (if OE=1) (if Rc=1)
600
Power ISA I
601
+ zero - zero
End If frac0:52 > 0 then Do If frac0 = 1 then Do If sign = 0 then FPSCRFPRF If sign = 1 then FPSCRFPRF End If frac0 = 0 then Do If sign = 0 then FPSCRFPRF If sign = 1 then FPSCRFPRF End Normalize operand: Do while frac0 = 0 exp exp-1 frac0:52 frac1:52 || 0b0 End FRT0 sign FRT1:11 exp + 1023 FRT12:63 frac1:52 End Done
Enabled Exponent Underflow: FPSCRUX 1 sign (FRB)0 If (FRB)1:11 = 0 then Do exp -1022 frac0:52 0b0 || (FRB)12:63 End If (FRB)1:11 > 0 then Do exp (FRB)1:11 - 1023 frac0:52 0b1 || (FRB)12:63 End Normalize operand: Do while frac0 = 0 exp exp - 1 frac0:52 frac1:52 || 0b0 End Round Single(sign,exp,frac0:52,0,0,0) FPSCRXX FPSCRXX | FPSCRFI exp exp + 192 FRT0 sign FRT1:11 exp + 1023 FRT12:63 frac1:52 If sign = 0 then FPSCRFPRF + normal number If sign = 1 then FPSCRFPRF - normal number Done Disabled Exponent Overflow: FPSCROX 1 If FPSCRRN = 0b00 then /* Round to Nearest */ Do If (FRB)0 = 0 then FRT 0x7FF0_0000_0000_0000 If (FRB)0 = 1 then FRT 0xFFF0_0000_0000_0000 If (FRB)0 = 0 then FPSCRFPRF + infinity If (FRB)0 = 1 then FPSCRFPRF - infinity End
602
Power ISA I
+ zero - zero
+ infinity - infinity
603
604
Power ISA I
0 then goto Infinity Operand then goto SNaN Operand then goto QNaN Operand Operand
exp (FRB)1:11 - 1023 /* exp - bias */ exp -1022 frac0:64 0b01 || (FRB)12:63 || 110 /* normal */ frac0:64 0b00 || (FRB)12:63 || 110 /* denormal */
0b000 gbit || rbit || xbit do i=1,63-exp /* do the loop 0 times if exp = 63 */ frac0:64 || gbit || rbit || xbit 0b0 || frac0:64 || gbit || (rbit | xbit) end Round Integer( sign, frac0:64, gbit, rbit, xbit, round_mode ) if sign = 1 then frac0:64
frac0:64
+ 1
605
unsigned integer & frac0:64 > 232-1 then unsigned integer & frac0:64 > 264-1 then unsigned integer & frac0:64 < 0 then unsigned integer & frac0:64 < 0 then
Round Integer( sign, frac0:64, gbit, rbit, xbit, round_mode ): 0 inc if round_mode = 0b00 then do /* Round to Nearest */ if sign || frac64 || gbit || rbit || xbit = 0bU11UU then if sign || frac64 || gbit || rbit || xbit = 0bU011U then if sign || frac64 || gbit || rbit || xbit = 0bU01U1 then end if round_mode = 0b10 then do /* Round toward +Infinity */ if sign || frac64 || gbit || rbit || xbit = 0b0U1UU then if sign || frac64 || gbit || rbit || xbit = 0b0UU1U then if sign || frac64 || gbit || rbit || xbit = 0b0UUU1 then end if round_mode = 0b11 then do /* Round toward -Infinity */ if sign || frac64 || gbit || rbit || xbit = 0b1U1UU then if sign || frac64 || gbit || rbit || xbit = 0b1UU1U then if sign || frac64 || gbit || rbit || xbit = 0b1UUU1 then end frac0:64 frac0:64 + inc FPSCRFR inc FPSCRFI gbit | rbit | xbit return
1 1 1
1 1 1
1 1 1
Infinity Operand: 0b0 FPSCRFR FPSCRFI 0b0 FPSCRVXCVI 0b1 if FPSCRVE = 0 then do if tgt_precision = 32-bit signed integer then do if sign=0 then FRT 0xUUUU_UUUU_7FFF_FFFF 0xUUUU_UUUU_8000_0000 if sign=1 then FRT end else if tgt_precision = 32-bit unsigned integer then do if sign=0 then FRT 0xUUUU_UUUU_FFFF_FFFF 0xUUUU_UUUU_0000_0000 if sign=1 then FRT end else if tgt_precision = 64-bit signed integer then do if sign=0 then FRT 0x7FFF_FFFF_FFFF_FFFF if sign=1 then FRT 0x8000_0000_0000_0000
606
Power ISA I
signed integer then FRT 0xUUUU_UUUU_8000_0000 0x8000_0000_0000_0000 signed integer then FRT unsigned integer then FRT 0xUUUU_UUUU_0000_0000 unsigned integer then FRT 0x0000_0000_0000_0000
0xUUUU_UUUU_8000_0000 signed integer then FRT signed integer then FRT 0x8000_0000_0000_0000 unsigned integer then FRT 0xUUUU_UUUU_0000_0000 0x0000_0000_0000_0000 unsigned integer then FRT
Large Operand: FPSCRFR 0b0 FPSCRFI 0b0 FPSCRVXCVI 0b1 if FPSCRVE = 0 then do if tgt_precision = 32-bit signed integer then do if sign = 0 then FRT 0xUUUU_UUUU_7FFF_FFFF 0xUUUU_UUUU_8000_0000 if sign = 1 then FRT end else if tgt_precision = 64-bit signed integer then do if sign = 0 then FRT 0x7FFF_FFFF_FFFF_FFFF if sign = 1 then FRT 0x8000_0000_0000_0000 end else if tgt_precision = 32-bit unsigned integer then do if sign = 0 then FRT 0xUUUU_UUUU_FFFF_FFFF if sign = 1 then FRT 0xUUUU_UUUU_0000_0000 end else if tgt_precision = 64-bit unsigned integer then do if sign = 0 then FRT 0xFFFF_FFFF_FFFF_FFFF 0x0000_0000_0000_0000 if sign = 1 then FRT end FPSCRFPRF 0bUUUUU end done
607
Single then do
Unsigned then do
608
Power ISA I
do /* Round toward + Infinity */ || rbit || xbit = 0b0U1UU then inc 1 || rbit || xbit = 0b0UU1U then inc 1 1 || rbit || xbit = 0b0UUU1 then inc do /* Round toward - Infinity */ || rbit || xbit = 0b1U1UU then inc 1 || rbit || xbit = 0b1UU1U then inc 1 || rbit || xbit = 0b1UUU1 then inc 1
if tgt_precision = single-precision then frac0:23 + inc frac0:23 else /* tgt_precision = double-precision */ frac0:52 frac0:52 + inc if carry_out = 1 then exp FPSCRFR FPSCRFI FPSCRXX return inc gbit | rbit | xbit FPSCRXX | FPSCRFI exp + 1
609
Round Integer (sign, frac0:52, gbit, rbit, xbit) Do i = 2, 52 - exp frac0:52 frac1:52 || 0b0 End If frac0 = 1, then exp exp + 1 Else frac0:52 frac1:52 || 0b0 FRT0 sign FRT1:11 exp + 1023 FRT12:63 frac1:52 If (FRT)0 = 0 then FPSCRFPRF + normal number Else FPSCRFPRF - normal number FPSCRFR FI 0b00 Done Round Integer(sign, frac0:52, gbit, rbit, xbit): inc 0 If inst = Floating Round to Integer Nearest then /* ties away from zero */ Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0buu1uu then inc 1 End If inst = Floating Round to Integer Plus then Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0b0u1uu then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b0uu1u then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b0uuu1 then inc 1 End If inst = Floating Round to Integer Minus then Do /* comparisons ignore u bits */ If sign || frac52 || gbit || rbit || xbit = 0b1u1uu then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b1uu1u then inc 1 If sign || frac52 || gbit || rbit || xbit = 0b1uuu1 then inc 1 End frac0:52 frac0:52 + inc Return
610
Power ISA I
+ infinity - infinity
611
612
Power ISA I
with the DPD entries shown in hexadecimal format. The BCD number is produced by replacing _ in the leftmost column with the corresponding digit along the top row. The table is split into two halves, with the right half being a continuation of the left half.
v = a | e | i w = (e & j & i) | (e & i) | a x = (a & k & i) | (a & i) | e y = m Alternatively, the following table can be used to perform the translation. The most significant bit of the three BCD digits (left column) is used to select a specific 10bit encoding (right column) of the DPD. aei 000 001 010 011 100 101 110 111 pqr stu v wxy bcd fgh 0 jkm bcd fgh 1 00m bcd jkh 1 01m bcd 10h 1 11m jkd fgh 1 10m fgd 01h 1 11m jkd 00h 1 11m 00d 11h 1 11m
e = (v & w & x) | (s & v & w & x) | (t & v & x & w) f = (p & t & v & w & x & s) | (s & x & v) | (s & v) g = (q & t & w & v & x & s) | (t & x & v) | (t & v) h = u i = (t (v j = (p (p k = (q (q m = y & & & & & & v & w & x) | (s & v & w & x) | w & x) s & t & w & v) | (s & v & w & x) | w & x & v) | (w & v) s & t & v & w) | (t & v & w & x) | v & w & x) | (x & v)
Alternatively, the following table can be used to perform the translation. A combination of five bits in the DPD encoding (leftmost column) are used to specify a translation to the 3-digit BCD encoding. Dashes (-) in the table are dont cares, and can be either one or zero.
The full translation of a 3-digit BCD number (000 - 999) to a 10-bit DPD is shown in Table 108 on page 615,
613
DPD Code BCD Value DPD Code 0x06E (0x16E) (0x26E) (0x36E) 0x06F (0x16F) (0x26F) (0x36F) 0x07E (0x17E) (0x27E) (0x37E) 0x07F (0x17F) (0x27F) (0x37F) 899 898 889 888 0x0EE (0x1EE) (0x2EE) (0x3EE) 0x0EF (0x1EF) (0x2EF) (0x3EF) 0x0FE (0x1FE) (0x2FE) (0x3FE) 0x0FF (0x1FF) (0x2FF) (0x3FF)
989
The full translation of the 10-bit DPD to a 3-digit BCD number is shown in Table 109 on page 616. The 10-bit DPD index is produced by concatenating the 6-bit value shown in the left column with the 4-bit index along the top row, both represented in hexadecimal. The values in parentheses are non-preferred translations and are explained further in the following section.
998
999
614
Power ISA I
615
616
Power ISA I
ConvertSPtoUXWsaturate( X, Y ) = X0 sign exp0:7 = X1:8 frac0:30 = X9:31 || 0b0000_0000 if((exp==255)&&(frac!=0)) then return(0x0000_0000) // NaN operand if((exp==255)&&(frac==0)) then do // infinity operand VSCRSAT = 1 return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF ) if((exp+Y-127)>31) then do // large operand VSCRSAT = 1 return( (sign==1) ? 0x0000_0000 : 0xFFFF_FFFF ) if((exp+Y-127)<0) then return(0x0000_0000) // -1.0 < value < 1.0 // value rounds to 0 if( sign==1 ) then do // negative operand VSCRSAT = 1 return(0x0000_0000) significand0:31 = 0b1 || frac do i=1 to 31-(exp+Y-127) significand = significand >>ui 1 return( significand ) ConvertSXWtoSP( X ) = X0 sign exp0:7 = 32 + 127 frac0:32 = X0 || X0:31 if( frac==0 ) return( 0x0000_0000 ) // Zero operand if( sign==1 ) then frac = frac + 1 do while( frac0==0 ) frac = frac << 1 exp = exp - 1 lsb = frac23 gbit = frac24 xbit = frac25:32!=0 inc = ( lsb && gbit ) | ( gbit && xbit ) frac0:23 = frac0:23 + inc if( carry_out==1 ) exp = exp + 1 return( sign || exp || frac1:23 )
617
ConvertUXWtoSP( X ) exp0:7 = 31 + 127 frac0:31 = X0:31 if( frac==0 ) return( 0x0000_0000 ) // Zero Operand do while( frac0==0 ) frac = frac << 1 exp = exp - 1 lsb = frac23 gbit = frac24 xbit = frac25:31!=0 inc = ( lsb && gbit ) | ( gbit && xbit ) frac0:23 = frac0:23 + inc if( carry_out==1 ) exp = exp + 1 return( 0b0 || exp || frac1:23 )
618
Power ISA I
Appendix D. Embedded Floating-Point RTL Functions [Category: SPE.Embedded Float Scalar Double] [Category: SPE.Embedded Float Scalar Single] [Category: SPE.Embedded Float Vector]
D.1 Common Functions
// Check if 32-bit fp value is a NaN or Infinity Isa32NaNorInfinity(fp) return (fpexp = 255) Isa32NaN(fp) return ((fpexp = 255) & (fpfrac 0)) // Check if 32-bit fp value is denormalized Isa32Denorm(fp) return ((fpexp = 0) & (fpfrac 0)) // Check if 64-bit fp value is a NaN or Infinity Isa64NaNorInfinity(fp) return (fpexp = 2047) Isa64NaN(fp) return ((fpexp = 2047) & (fpfrac 0)) // Check if 32-bit fp value is denormalized Isa64Denorm(fp) return ((fpexp = 0) & (fpfrac 0)) // Signal an error in the SPEFSCR SignalFPError(upper_lower, bits) if (upper_lower = HI) then bits << 15 bits SPEFSCR | bits SPEFSCR bits (FG | FX) if (upper_lower = HI) then bits << 15 bits SPEFSCR SPEFSCR & bits // Round a 32-bit fp result Round32(fp, guard, sticky) FP32format fp; if (SPEFSCRFINXE = 0) then if (SPEFSCRFRMC = 0b00) then // nearest if (guard) then if (sticky | fpfrac[22]) then v0:23 fpfrac + 1 if v0 then if (fpexp >= 254) then // overflow fpsign || 0b11111110 || 231 fp else fpexp fpexp + 1 v1:23 fpfrac else fpfrac v1:23 else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent return fp // Round a 64-bit fp result Round64(fp, guard, sticky) FP32format fp; if (SPEFSCRFINXE = 0) then // nearest if (SPEFSCRFRMC = 0b00) then if (guard) then if (sticky | fpfrac[51]) then fpfrac + 1 v0:52 if v0 then if (fpexp >= 2046) then // overflow fpsign || fp 0b11111111110 || 521 else fpexp fpexp + 1 v1:52 fpfrac else v1:52 fpfrac else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent return fp
619
D.2 Convert from Single-Precision Embedded Floating-Point to Integer Word with Saturation
// Convert 32-bit Floating-Point to 32-bit integer // or fractional // signed = S (signed) or U (unsigned) // upper_lower = HI (high word) or LO (low word) // round = RND (round) or ZER (truncate) // fractional = F (fractional) or I (integer) CnvtFP32ToI32Sat(fp, signed, upper_lower, round, fractional) FP32format fp; if (Isa32NaNorInfinity(fp)) then SignalFPError(upper_lower, FINV) if (Isa32NaN(fp)) then return 0x00000000 // all NaNs if (signed = S) then if (fpsign = 1) then return 0x80000000 else return 0x7fffffff else if (fpsign = 1) then return 0x00000000 else return 0xffffffff if (Isa32Denorm(fp)) then SignalFPError(upper_lower, FINV) return 0x00000000 // regardless of sign if ((signed = U) & (fpsign = 1)) then SignalFPError(upper_lower, FOVF) // overflow return 0x00000000 if ((fpexp = 0) & (fpfrac = 0)) then return 0x00000000 // all zero values if (fractional = I) then // convert to integer 158 max_exp 158 - fpexp shift if (signed = S) then if ((fpexp158)|(fpfrac0)|(fpsign1)) then max_exp - 1 max_exp else // fractional conversion max_exp 126 126 - fpexp shift if (signed = S) then shift shift + 1 if (fpexp > max_exp) then SignalFPError(upper_lower, FOVF) // overflow if (signed = S) then if (fpsign = 1) then return 0x80000000 else return 0x7fffffff else return 0xffffffff 0b1 || fpfrac || 0b00000000 // add U bit result 0 guard 0 sticky for (n 0; n < shift; n n + 1) do sticky | guard sticky
result & 0x00000001 guard result > 1 result // Report sticky and guard bits if (upper_lower = HI) then guard SPEFSCRFGH sticky SPEFSCRFXH else SPEFSCRFG guard SPEFSCRFX sticky if (guard | sticky) then SPEFSCRFINXS 1 // Round the integer result if ((round = RND) & (SPEFSCRFINXE = 0)) then if (SPEFSCRFRMC = 0b00) then // nearest if (guard) then if (sticky | (result & 0x00000001)) then result + 1 result else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent if (signed = S) then if (fpsign = 1) then result result + 1 return result
620
Power ISA I
D.3 Convert from Double-Precision Embedded Floating-Point to Integer Word with Saturation
// Convert 64-bit Floating-Point to 32-bit integer // or fractional // signed = S (signed) or U (unsigned) // round = RND (round) or ZER (truncate) // fractional = F (fractional) or I (integer) CnvtFP64ToI32Sat(fp, signed, round, fractional) FP64format fp; if (Isa64NaNorInfinity(fp)) then SignalFPError(LO, FINV) if (Isa64NaN(fp)) then return 0x00000000 // all NaNs if (signed = S) then if (fpsign = 1) then return 0x80000000 else return 0x7fffffff else if (fpsign = 1) then return 0x00000000 else return 0xffffffff if (Isa64Denorm(fp)) then SignalFPError(LO, FINV) return 0x00000000 // regardless of sign if ((signed = U) & (fpsign = 1)) then SignalFPError(LO, FOVF) // overflow return 0x00000000 if ((fpexp = 0) & (fpfrac = 0)) then return 0x00000000 // all zero values if (fractional = I) then // convert to integer 1054 max_exp 1054 - fpexp shift if (signed S) then if ((fpexp1054)|(fpfrac0)|(fpsign1)) then max_exp - 1 max_exp else // fractional conversion max_exp 1022 1022 - fpexp shift if (signed = S) then shift shift + 1 if (fpexp > max_exp) then SignalFPError(LO, FOVF) // overflow if (signed = S) then if (fpsign = 1) then return 0x80000000 else return 0x7fffffff else return 0xffffffff 0b1 || fpfrac[0:30] // add U to frac result fpfrac[31] guard sticky (fpfrac[32:63] 0) 0; n < shift; n n + 1) do for (n sticky | guard sticky
result & 0x00000001 guard result > 1 result // Report sticky and guard bits SPEFSCRFG SPEFSCRFX guard sticky
if (guard | sticky) then SPEFSCRFINXS 1 // Round the result if ((round = RND) & (SPEFSCRFINXE = 0)) then if (SPEFSCRFRMC = 0b00) then // nearest if (guard) then if (sticky | (result & 0x00000001)) then result + 1 result else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent if (signed = S) then if (fpsign = 1) then result result + 1 return result
621
D.4 Convert from Double-Precision Embedded Floating-Point to Integer Doubleword with Saturation
// Convert 64-bit Floating-Point to 64-bit integer // signed = S (signed) or U (unsigned) // round = RND (round) or ZER (truncate) CnvtFP64ToI64Sat(fp, signed, round) FP64format fp; if (Isa64NaNorInfinity(fp)) then SignalFPError(LO, FINV) if (Isa64NaN(fp)) then return 0x00000000_00000000 // all NaNs if (signed = S) then if (fpsign = 1) then return 0x80000000_00000000 else return 0x7fffffff_ffffffff else if (fpsign = 1) then return 0x00000000_00000000 else return 0xffffffff_ffffffff if (Isa64Denorm(fp)) then SignalFPError(LO, FINV) return 0x00000000_00000000 if ((signed = U) & (fpsign = 1)) then SignalFPError(LO, FOVF) // overflow return 0x00000000_00000000 if ((fpexp = 0) & (fpfrac = 0)) then return 0x00000000_00000000 // all zero values 1086 max_exp 1086 - fpexp shift if (signed = S) then if ((fpexp1086)|(fpfrac0)|(fpsign1)) then max_exp - 1 max_exp if (fpexp > max_exp) then SignalFPError(LO, FOVF) // overflow if (signed = S) then if (fpsign = 1) then return 0x80000000_00000000 else return 0x7fffffff_ffffffff else return 0xffffffff_ffffffff result 0b1 || fpfrac || 0b00000000000 //add U bit guard 0 0 sticky 0; n < shift; n n + 1) do for (n sticky sticky | guard result & 0x00000000_00000001 guard result result > 1 // Report sticky and guard bits guard SPEFSCRFG SPEFSCRFX sticky
if (guard | sticky) then SPEFSCRFINXS 1 // Round the result if ((round = RND) & (SPEFSCRFINXE = 0)) then // nearest if (SPEFSCRFRMC = 0b00) then if (guard) then if (sticky | (result&0x00000000_00000001)) then result + 1 result else if ((SPEFSCRFRMC & 0b10) = 0b10) then // infinity modes // implementation dependent if (signed = S) then if (fpsign = 1) then result result + 1 return result
622
Power ISA I
623
624
Power ISA I
E.1 Symbols
The following symbols are defined for use in instructions (basic or extended mnemonics) that specify a Condition Register field or a Condition Register bit. The first five (lt, ..., un) identify a bit number within a CR field. The remainder (cr0, ..., cr7) identify a CR field. An expression in which a CR field symbol is multiplied by 4 and then added to a bitnumber-within-CR-field symbol and 32 can be used to identify a CR bit. Symbol lt gt eq so un cr0 cr1 cr2 cr3 cr4 cr5 cr6 cr7 Value 0 1 2 3 3 0 1 2 3 4 5 6 7 Meaning Less than Greater than Equal Summary overflow Unordered (after floating-point comparison) CR Field 0 CR Field 1 CR Field 2 CR Field 3 CR Field 4 CR Field 5 CR Field 6 CR Field 7
The extended mnemonics in Sections E.2.2 and E.3 require identification of a CR bit: if one of the CR field symbols is used, it must be multiplied by 4 and added to a bit-number-within-CR-field (value in the range 0-3, explicit or symbolic) and 32. The extended mnemonics in Sections E.2.3 and E.5 require identification of a CR field: if one of the CR field symbols is used, it must not be multiplied by 4 or added to 32. (For the extended mnemonics in Section E.2.3, the bit number within the CR field is part of the extended mnemonic. The programmer identifies the CR field, and the Assembler does the multiplication and addition required to produce a CR bit number for the BI field of the underlying basic mnemonic.)
625
Power ISA I
Examples
1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a count loaded into CTR). bdnz target (equivalent to: bc 16,0,target)
2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is equal. bdnzt eq,target (equivalent to: bc 8,2,target)
3. Same as (2), but equal condition is in CR5. bdnzt 4cr5+eq,target (equivalent to: bc 8,22,target)
626
Power ISA I
5. Same as (4), but set the Link Register. This is a form of conditional call. bfl 27,target (equivalent to: bcl 4,27,target)
These codes are reflected in the mnemonics shown in Table 111. Table 111: Branch mnemonics incorporating conditions LR not Set Branch Semantics Branch if less than Branch if less than or equal Branch if equal Branch if greater than or equal Branch if greater than Branch if not less than Branch if not equal Branch if not greater than Branch if summary overflow Branch if not summary overflow Branch if unordered Branch if not unordered bc bca Relative Absolute blt ble beq bge bgt bnl bne bng bso bns bun bnu blta blea beqa bgea bgta bnla bnea bnga bsoa bnsa buna bnua bclr To LR bltlr blelr beqlr bgelr bgtlr bnllr bnelr bnglr bsolr bnslr bunlr bnulr LR Set bcctr bcl bcla To CTR Relative Absolute bltctr blectr beqctr bgectr bgtctr bnlctr bnectr bngctr bsoctr bnsctr bunctr bnuctr bltl blel beql bgel bgtl bnll bnel bngl bsol bnsl bunl bnul bltla blela beqla bgela bgtla bnlla bnela bngla bsola bnsla bunla bnula bclrl To LR bltlrl blelrl beqlrl bgelrl bgtlrl bnllrl bnelrl bnglrl bsolrl bnslrl bunlrl bnulrl bcctrl To CTR bltctrl blectrl beqctrl bgectrl bgtctrl bnlctrl bnectrl bngctrl bsoctrl bnsctrl bunctrl bnuctrl
Examples
1. Branch if CR0 reflects condition not equal. bne target (equivalent to: bc 4,2,target)
627
Power ISA I
3. Branch to an absolute target if CR4 specifies greater than, setting the Link Register. This is a form of conditional call. bgtla cr4,target (equivalent to: bcla 12,17,target)
4. Same as (3), but target address is in the Count Register. bgtctrl cr4 (equivalent to: bcctrl 12,17,0)
Examples
1. Branch if CR0 reflects condition less than, specifying that the branch should be predicted to be taken. blt+ target
2. Same as (1), but target address is in the Link Register and the branch should be predicted not to be taken. bltlr-
628
Power ISA I
The symbols defined in Section E.1 can be used to identify the Condition Register bits.
Examples
1. Set CR bit 57. crset 25 (equivalent to: creqv 25,25,25)
3. Same as (2), but SO bit to be cleared is in CR3. crclr 4cr3+so (equivalent to: crxor 15,15,15)
5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into the EQ bit of CR5. crnot 4cr5+eq,4cr4+eq (equivalent to: crnor 22,18,18)
E.4.2 Subtract
The Subtract From instructions subtract the second operand (RA) from the third (RB). Extended mnemonics are provided that use the more normal order, in which the third operand is subtracted from the second. Both these mnemonics can be coded with a final o and/or . to cause the OE and/or Rc bit to be set in the underlying instruction. sub subc Rx,Ry,Rz Rx,Ry,Rz (equivalent to: (equivalent to: subf subfc Rx,Rz,Ry) Rx,Rz,Ry)
629
Power ISA I
Examples
1. Compare register Rx and immediate value 100 as unsigned 64-bit integers and place result into CR0. cmpldi Rx,100 (equivalent to: cmpli 0,1,Rx,100)
2. Same as (1), but place result into CR4. cmpldi cr4,Rx,100 (equivalent to: cmpli 4,1,Rx,100)
3. Compare registers Rx and Ry as signed 64-bit integers and place result into CR0. cmpd Rx,Ry (equivalent to: cmp 0,1,Rx,Ry)
Examples
1. Compare bits 32:63 of register Rx and immediate value 100 as signed 32-bit integers and place result into CR0. cmpwi Rx,100 (equivalent to: cmpi 0,0,Rx,100)
2. Same as (1), but place result into CR4. cmpwi cr4,Rx,100 (equivalent to: cmpi 4,0,Rx,100)
3. Compare bits 32:63 of registers Rx and Ry as unsigned 32-bit integers and place result into CR0. cmplw Rx,Ry (equivalent to: cmpl 0,0,Rx,Ry)
630
Power ISA I
These codes are reflected in the mnemonics shown in Table 115. Table 115: Trap mnemonics 64-bit Comparison Trap Semantics Trap unconditionally Trap unconditionally with parameters Trap if less than Trap if less than or equal Trap if equal Trap if greater than or equal Trap if greater than Trap if not less than Trap if not equal Trap if not greater than Trap if logically less than Trap if logically less than or equal Trap if logically greater than or equal Trap if logically greater than Trap if logically not less than Trap if logically not greater than tdi Immediate tdui tdlti tdlei tdeqi tdgei tdgti tdnli tdnei tdngi tdllti tdllei tdlgei tdlgti tdlnli tdlngi td Register tdu tdlt tdle tdeq tdge tdgt tdnl tdne tdng tdllt tdlle tdlge tdlgt tdlnl tdlng 32-bit Comparison twi Immediate twui twlti twlei tweqi twgei twgti twnli twnei twngi twllti twllei twlgei twlgti twlnli twlngi tw Register trap twu twlt twle tweq twge twgt twnl twne twng twllt twlle twlge twlgt twlnl twlng
631
Power ISA I
2. Same as (1), but comparison is to register Ry. tdne Rx,Ry (equivalent to: td 24,Rx,Ry)
3. Trap if bits 32:63 of register Rx, considered as a 32-bit quantity, are logically greater than 0x7FF. twlgti Rx,0x7FF (equivalent to: twi 1,Rx,0x7FF)
5. Trap unconditionally with immediate parameters Rx and Ry tdu Rx,Ry (equivalent to: td 31,Rx,Ry)
632
Power ISA I
Clear left and shift left Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can be used to scale a (known nonnegative) array index by the width of an element.
Examples
1. Extract the sign bit (bit 0) of register Ry and place the result right-justified into register Rx. extrdi Rx,Ry,1,0 (equivalent to: rldicl Rx,Ry,1,63)
2. Insert the bit extracted in (1) into the sign bit (bit 0) of register Rz. insrdi Rz,Rx,1,0 (equivalent to: rldimi Rz,Rx,63,0)
3. Shift the contents of register Rx left 8 bits. sldi Rx,Rx,8 (equivalent to: rldicr Rx,Rx,8,55)
4. Clear the high-order 32 bits of register Ry and place the result into register Rx. clrldi Rx,Ry,32 (equivalent to: rldicl Rx,Ry,0,32)
633
Power ISA I
clrlslwi ra,rs,b,n
Examples
1. Extract the sign bit (bit 32) of register Ry and place the result right-justified into register Rx. extrwi Rx,Ry,1,0 (equivalent to: rlwinm Rx,Ry,1,31,31)
2. Insert the bit extracted in (1) into the sign bit (bit 32) of register Rz. insrwi Rz,Rx,1,0 (equivalent to: rlwimi Rz,Rx,31,0,0)
3. Shift the contents of register Rx left 8 bits, clearing the high-order 32 bits. slwi Rx,Rx,8 (equivalent to: rlwinm Rx,Rx,8,0,23)
4. Clear the high-order 16 bits of the low-order 32 bits of register Ry and place the result into register Rx, clearing the high-order 32 bits of register Rx. clrlwi Rx,Ry,16 (equivalent to: rlwinm Rx,Ry,0,16,31)
634
Power ISA I
Move To SPR Extended mtxer Rx mtlr Rx mtctr Rx mtuamr Rx mtppr Rx mtppr32 Rx Equivalent to mtspr 1,Rx mtspr 8,Rx mtspr 9,Rx mtspr 13,Rx mtspr 896,Rx mtspr 898,Rx
Move From SPR Extended mfxer Rx mflr Rx mfctr Rx mfuamr Rx mfppr Rx mfppr32 Rx Equivalent to mfspr Rx,1 mfspr Rx,8 mfspr Rx,9 mfspr Rx,13 mfspr Rx,896 mfspr Rx,898
Category: Phased-In
Examples
1. Copy the contents of register Rx to the XER. mtxer Rx (equivalent to: mtspr 1,Rx)
2. Copy the contents of the LR to register Rx. mflr Rx (equivalent to: mfspr Rx,8)
3. Copy the contents of register Rx to the CTR. mtctr Rx (equivalent to: mtspr 9,Rx)
For some uses of a no-op instruction, optimizations related to no-ops, such as removal from the execution stream, are not desireable. An extended mnemonic is provided for the executed form of no-op. This form of no-op will still consume execution resources. xnop (equivalent to: xori 0,0,0)
Load Immediate
The addi and addis instructions can be used to load an immediate value into a register. Extended mnemonics are provided to convey the idea that no addition is being performed but merely data movement (from the immediate field of the instruction to a register). Load a 16-bit signed immediate value into register Rx. li Rx,value (equivalent to: addi Rx,0,value)
Load a 16-bit signed immediate value, shifted left by 16 bits, into register Rx. lis Rx,value (equivalent to: addis Rx,0,value)
635
Power ISA I
The la mnemonic is useful for obtaining the address of a variable specified by name, allowing the Assembler to supply the base register number and compute the displacement. If the variable v is located at offset Dv bytes from the address in register Rv, and the Assembler has been told to use register Rv as a base for references to the data structure containing v, then the following line causes the address of v to be loaded into register Rx. la Rx,v (equivalent to: addi Rx,Rv,Dv)
Move Register
Several Power ISA instructions can be coded in a way such that they simply copy the contents of one register to another. An extended mnemonic is provided to convey the idea that no computation is being performed but merely data movement (from one register to another). The following instruction copies the contents of register Ry to register Rx. This mnemonic can be coded with a final . to cause the Rc bit to be set in the underlying instruction. mr Rx,Ry (equivalent to: or Rx,Ry,Ry)
Complement Register
Several Power ISA instructions can be coded in a way such that they complement the contents of one register and place the result into another register. An extended mnemonic is provided that allows this operation to be coded easily. The following instruction complements the contents of register Ry and places the result into register Rx. This mnemonic can be coded with a final . to cause the Rc bit to be set in the underlying instruction. not Rx,Ry (equivalent to: nor Rx,Ry,Ry)
The following instructions may generate either the (old) mtcrf or mfcr instructions or the (new) mtocrf or mfocrf instruction, respectively, depending on the target machine type assembler parameter. mtcrf mfcr FXM,Rx Rx
All three extended mnemonics in this subsection are being phased out. In future assemblers the form mtcr Rx may not exist, and the mtcrf and mfcr mnemonics may generate the old form instructions (with bit 11 = 0) regardless of the target machine type assembler parameter, or may cease to exist.
636
Power ISA I
637
638
Power ISA I
639
Shift Right Immediate, N = 3 (shift amnt < 64) rldimi r4,r3,0,64-sh rldicl r4,r4,64-sh,0 rldimi r3,r2,0,64-sh rldicl r3,r3,64-sh,0 rldicl r2,r2,64-sh,sh Shift Right, N = 2 (shift amnt < 128) subfic r31,r6,64 srd r3,r3,r6 sld r0,r2,r31 or r3,r3,r0 addi r31,r6,-64 srd r0,r2,r31 or r3,r3,r0 srd r2,r2,r6 Shift Right, N = 3 (shift amnt < 64) subfic r31,r6,64 srd r4,r4,r6 sld r0,r3,r31 or r4,r4,r0 srd r3,r3,r6 sld r0,r2,r31 or r3,r3,r0 srd r2,r2,r6
Shift Right Immediate, N = 3 (shift amnt < 32) rlwinm r4,r4,32-sh,sh,31 rlwimi r4,r3,32-sh,0,sh-1 rlwinm r3,r3,32-sh,sh,31 rlwimi r3,r2,32-sh,0,sh-1 rlwinm r2,r2,32-sh,sh,31 Shift Right, N = 2 (shift amnt < 64) subfic r31,r6,32 srw r3,r3,r6 slw r0,r2,r31 or r3,r3,r0 addi r31,r6,-32 srw r0,r2,r31 or r3,r3,r0 srw r2,r2,r6 Shift Right, N = 3 (shift amnt < 32) subfic r31,r6,32 srw r4,r4,r6 slw r0,r3,r31 or r4,r4,r0 srw r3,r3,r6 slw r0,r2,r31 or r3,r3,r0 srw r2,r2,r6
640
Power ISA I
Version 2.06 Revision B Multiple-precision shifts in 64-bit mode, continued [Category: 64-Bit]
Shift Right Algebraic Immediate, N = 3 (shift amnt < 64) rldimi r4,r3,0,64-sh rldicl r4,r4,64-sh,0 rldimi r3,r2,0,64-sh rldicl r3,r3,64-sh,0 sradi r2,r2,sh Shift Right Algebraic, N = 2 (shift amnt < 128) subfic r31,r6,64 srd r3,r3,r6 sld r0,r2,r31 or r3,r3,r0 addic. r31,r6,-64 srad r0,r2,r31 ble $+8 ori r3,r0,0 srad r2,r2,r6 Shift Right Algebraic, N = 3 (shift amnt < 64) subfic r31,r6,64 srd r4,r4,r6 sld r0,r3,r31 or r4,r4,r0 srd r3,r3,r6 sld r0,r2,r31 or r3,r3,r0 srad r2,r2,r6
641
642
Power ISA I
An alternative, shorter, sequence can be used if rounding according to FSCPRRN is desired and FPSCRRN specifies Round toward +Infinity or Round toward -Infinity, or if it is acceptable for the rounded answer to be either of the two representable floating-point integers nearest to the given fixed-point integer. In this case the full convert from unsigned fixed-point integer doubleword function can be implemented with the sequence shown below, assuming the value 264 is in FPR 2. std lfd fcfid fadd fsel r3,disp(r1) f1,disp(r1) f1,f1 f4,f1,f2 f1,f1,f1,f4 #store dword #load float #convert to fp int #add 264 # if r3 < 0
The following sequence can be used, assuming a word at the address in GPR 1 + GPR 2 can be used as scratch space. stwx lfiwax fcfid r3,r1,r2 f1,r1,r2 f1,f1 # store word # load float # convert to fp int
643
xori
RA,RA,1
SignedAdd: xor r5,RA,RB andi. r5,r5, 15 # compare sign codes cmpld cr1,RA,RB # compare magnitudes beq cr0,samesign ble cr1,BminusA # set up for RT = RA -BCD RB nor r9,RB,RB # one's complement of RB addi r10,RA,16 # generate the carry in b submag BminusA: # set up for RT = RB -BCD RA nor r9,RA,RA # one's complement of RA addi r10,RB,16 # generate the carry in submag: rldicr add addg6s rldicr subf b r9,r9,0,59 r8,r10,r9 RT,r10,r9 RT,RT,0,59 RT,RT,r8 done r8,RB,0,59 r10,RA,r11 r9,r10,r8 RT,r10,RB RT,RT,r9 # remove the sign code # add 6's # RT = RA +BCD RB # remove the sign code # remove generated 6 from # sign position
# RT = RA +BCD RB
Subtraction of the unsigned BCD operand in register RA from the unsigned BCD operand in register RB can be accomplished as follows. (In this example it is assumed that RB is not register 0.) addi nor add addg6s subf r1,RB,1 r2,RA,RA r3,r1,r2 RT,r1,r2 RT,RT,r3
Additional instructions are needed to handle signed BCD operands, and BCD operands that occupy more than one register (e.g., unsigned BCD operands that have more than 16 decimal digits).
# generate the carry # RT1 = RA1 +BCD RB1 # propagate the carry
644
Power ISA I
# one's complement of RA0 # Generate the carry # RT1 = RB1 -BCD RA1 # propagate the carry # one's complement of RA0 # RT0 = RB0 -BCD RA0
645
F.3.4 Notes
The following Notes apply to the preceding examples and to the corresponding cases using the other three arithmetic relations (<, , and ). They should also be considered when any other use of fsel is contemplated. In these Notes, the optimized program is the Power ISA program shown, and the unoptimized program (not shown) is the corresponding Power ISA program that uses fcmpu and Branch Conditional instructions instead of fsel. 1. The unoptimized program affects the VXSNAN bit of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exception is enabled, while the optimized program does not affect this bit. This property of the optimized program is incompatible with the IEEE standard. 2. The optimized program gives the incorrect result if a is a NaN. 3. The optimized program gives the incorrect result if a and/or b is a NaN (except that it may give the correct result in some cases for the minimum and maximum functions, depending on how those functions are defined to operate on NaNs). 4. The optimized program gives the incorrect result if a and b are infinities of the same sign. (Here it is assumed that Invalid Operation Exceptions are disabled, in which case the result of the subtraction is a NaN. The analysis is more complicated if Invalid Operation Exceptions are enabled, because in that case the target register of the subtraction is unchanged.) 5. The optimized program affects the OX, UX, XX, and VXISI bits of the FPSCR, and therefore may cause the system error handler to be invoked if the corresponding exceptions are enabled, while the unoptimized program does not affect these bits. This property of the optimized program is incompatible with the IEEE standard.
646
Power ISA I
647
648
Power ISA I
649
650
Power ISA II
1.1 Definitions
The following definitions, in addition to those specified in Book I, are used in this Book. In these definitions, Load instruction includes the Cache Management and other instructions that are stated in the instruction descriptions to be treated as a Load, and similarly for Store instruction.
instruction storage The view of storage as seen by the mechanism that fetches instructions. data storage The view of storage as seen by a Load or Store instruction. program order The execution of instructions in the order required by the sequential execution model. (See Section 2.2 of Book I.) A dcbz instruction that modifies storage which contains instructions has the same effect with respect to the sequential execution model as a Store instruction as described there.) storage location A contiguous sequence of one or more bytes in storage. When used in association with a specific instruction or the instruction fetching mechanism, the length of the sequence of one or more bytes is typically implied by the operation. In other uses, it may refer more abstractly to a group of bytes which share common storage attributes. storage access An access to a storage location. There are three (mutually exclusive) kinds of storage access.
system A combination of processors, storage, and associated mechanisms that is capable of executing programs. Sometimes the reference to system includes services provided by the privileged software. main storage The level of storage hierarchy in which all storage state is visible to all processors and mechanisms in the system. primary cache The level of cache closest to the processor. secondary cache After the primary cache, the next closest level of cache to the processor.
651
- instruction fetch
An access for the purpose of fetching an instruction.
- implicit access
An access by the processor for the purpose of address translation or reference and change recording (see Book III-S). caused by, associated with
- caused by
A storage access is said to be caused by an instruction if the instruction is a Load or Store and the access (data access) is to the storage location specified by the instruction.
- associated with
A storage access is said to be associated with an instruction if the access is for the purpose of fetching the instruction (instruction fetch), or is a data access caused by the instruction, or is an implicit access that occurs as a side effect of fetching or executing the instruction. prefetched instructions Instructions for which a copy of the instruction has been fetched from instruction storage, but the instruction has not yet been executed. uniprocessor A system that contains one processor. multiprocessor A system that contains two or more processors. shared storage multiprocessor A multiprocessor that contains some common storage, which all the processors in the system can access. performed A load or instruction fetch by a processor or mechanism (P1) is performed with respect to any processor or mechanism (P2) when the value to be returned by the load or instruction fetch can no longer be changed by a store by P2. A store by P1 is performed with respect to P2 when a load by P2 from the location accessed by the store will return the value stored (or a value stored subsequently). An instruction cache block invalidation by P1 is performed with respect to P2 when an instruction fetch by P2 will not be satisfied from the copy of
1.2 Introduction
The Power ISA User Instruction Set Architecture, discussed in Book I, defines storage as a linear array of bytes indexed from 0 to a maximum of 264-1. Each byte is identified by its index, called its address, and each byte contains a value. This information is sufficient to allow the programming of applications that require no special features of any particular system environment. The Power ISA Virtual Environment Architecture, described herein, expands this simple storage model to include caches, virtual storage, and shared storage multiprocessors. The Power ISA Virtual Environment Architecture, in conjunction with services based on the Power ISA Operating Environment Architecture (see Book III) and provided by the operating system, permits explicit control of this expanded storage model. A simple model for sequential execution allows at most one storage access to be performed at a time and requires that all storage accesses appear to be performed in program order. In contrast to this simple model, the Power ISA specifies a relaxed model of storage consistency. In a multiprocessor system that allows multiple copies of a storage location, aggressive implementations of the architecture can permit intervals of time during which different copies of a storage location have different values. This chapter describes features of the Power ISA that enable programmers to write correct programs for this storage model.
652
Power ISA II
No other accesses are guaranteed to be atomic. For example, the access caused by the following instructions is not guaranteed to be atomic. any Load or Store instruction for which the operand is unaligned lmw, stmw, lswi, lswx, stswi, stswx lfdp, lfdpx, stfdp, stfdpx any Cache Management instruction An access that is not atomic is performed as a set of smaller disjoint atomic accesses. In general, the number and alignment of these accesses are implementation-dependent, as is the relative order in which they are performed. The only exception to the preceding rule is that, for lfdp, lfdpx, stfdp, and stfdpx, if the access is aligned on a doubleword boundary, it is performed as a pair of disjoint atomic doubleword accesses. The results for several combinations of loads and stores to the same or overlapping locations are described below. 1. When two processors execute atomic stores to locations that do not overlap, and no other stores are performed to those locations, the contents of those locations are the same as if the two stores were performed by a single processor. 2. When two processors execute atomic stores to the same storage location, and no other store is performed to that location, the contents of that location are the result stored by one of the processors. 3. When two processors execute stores that have the same target location and are not guaranteed to be atomic, and no other store is performed to that location, the result is some combination of the bytes stored by both processors. 4. When two processors execute stores to overlapping locations, and no other store is performed to those locations, the result is some combination of the bytes stored by the processors to the overlapping bytes. The portions of the locations that do not overlap contain the bytes stored by the processor storing to the location. 5. When a processor executes an atomic store to a location, a second processor executes an atomic load from that location, and no other store is performed to that location, the value returned by the load is the contents of the location before the store or the contents of the location after the store. 6. When a load and a store with the same target location can be executed simultaneously, and no other store is performed to that location, the value returned by the load is some combination of the contents of the location before the store and the contents of the location after the store.
653
654
Power ISA II
1.6.4 Guarded
A data access to a Guarded storage location is performed only if either (a) the access is caused by an instruction that is known to be required by the sequential execution model, or (b) the access is a load and the storage location is already in a cache. If the storage is also Caching Inhibited, only the storage location specified by the instruction is accessed; otherwise any storage location in the cache block containing the specified storage location may be accessed.
655
656
Power ISA II
other processors and mechanisms, and than it is performed in memory. A direct consequence of this consideration is that although programs running on each processor will see the same sequence of accesses from any individual processor to SAO storage, each may in general see a different interleaving of the individual sequences. The memory barrier instructions may be used to establish stronger ordering, as described in Section 1.7.1, Storage Access Ordering, beginning with the third major bullet.
657
value is used to compute the effective address specified by the second Load), the corresponding storage accesses are performed in program order with respect to any processor or mechanism to the extent required by the associated Memory Coherence Required attributes. This applies even if the dependency has no effect on program logic (e.g., the value returned by the first Load is ANDed with zero and then added to the effective address specified by the second Load). When a processor (P1) executes a Synchronize, eieio<S>, or mbar<E> instruction a memory barrier is created, which orders applicable storage accesses pairwise, as follows. Let A be a set of storage accesses that includes all storage accesses associated with instructions preceding the barrier-creating instruction, and let B be a set of storage accesses that includes all storage accesses associated with instructions following the barrier-creating instruction. For each applicable pair ai,bj of storage accesses such that ai is in A and bj is in B, the memory barrier ensures that ai will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before bj is performed with respect to that processor or mechanism. The ordering done by a memory barrier is said to be cumulative if it also orders storage accesses that are performed by processors and mechanisms other than P1, as follows.
A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created. B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B.
No ordering should be assumed among the storage accesses caused by a single instruction (i.e, by an instruction for which the access is not atomic), even if the accesses are to SAO storage, and no means are provided for controlling that order.
658
Power ISA II
659
Programming Note The first example below illustrates cumulative ordering of storage accesses preceding a memory barrier, and the second illustrates cumulative ordering of storage accesses following a memory barrier. Assume that locations X, Y, and Z initially contain the value 0. Example 1: Processor A: stores the value 1 to location X Processor B: loads from location X obtaining the value 1, executes a sync instruction, then stores the value 2 to location Y Processor C: loads from location Y obtaining the value 2, executes a sync instruction, then loads from location X Example 2: Processor A: stores the value 1 to location X, executes a sync instruction, then stores the value 2 to location Y Processor B: loops loading from location Y until the value 2 is obtained, then stores the value 3 to location Z Processor C: loads from location Z obtaining the value 3, executes a sync instruction, then loads from location X In both cases, cumulative ordering dictates that the value loaded from location X by processor C is 1.
660
Power ISA II
Programming Note Before reassigning a virtual address to a different real page, privileged software may need to clear all processors reservations for the original real page in order to avoid a Store Conditional being successful only because the corresponding reservation for the original location is not cleared by a store to the new real page by some other processor or mechanism. This clearing of reservations is unnecessary on processors that support the Store Conditional Page Mobility category. The Store Conditional Page Mobility category does not provide a mechanism for the Store Conditional instruction to detect that a virtual page has been moved to a new real page and back again to the original real page that was accessed by a Load and Reserve instruction. Privileged software that moves a virtual page could clear the reservation on the processor it is running on in order to ensure that a Store Conditional instruction executed by that processor does not succeed in this case. (The stores that occur naturally as part of moving the virtual page will cause any reservations, held by other processors, in the target real page to be lost.)
1.7.3.1
Reservations
The ability to emulate an atomic operation using lwarx and stwcx. is based on the conditional behavior of stwcx., the reservation created by lwarx, and the clearing of that reservation if the target storage location is modified by another processor or mechanism before the stwcx. performs its store. A reservation is held on an aligned unit of real storage called a reservation granule. The size of the reservation granule is 2n bytes, where n is implementation-dependent but is always at least 4 (thus the minimum reservation granule size is a quadword) and, if the Store Conditional Page Mobility category is supported, where 2n is not larger than the smallest real page size supported by the implementation. The reservation granule associated with effective address EA contains the real address to which EA maps. (real_addr(EA) in the RTL for the Load And Reserve and Store Conditional instructions stands for real address to which EA maps.) The reservation also has an associated length, which is equal to the storage operand length, in bytes, of the Load and Reserve instruction that established the reservation. A processor has at most one reservation at any time. A reservation is established by executing a lbarx, lharx, lwarx, or ldarx instruction, as described below, and is lost (or may be lost, in the case of the fourth, fifth, sixth, seventh and ninth<E> item) if any of the following occur. Items 1-8 apply only if the relevant access is performed. (For example, an access that would ordinarily be caused by an instruction might not be performed if
661
Architecture Note An asynchronous interrupt may cause a loss of reservation, including interrupts not visible to or caused by privileged software. Programming Note A reservation may be lost if: Software executes a privileged instruction or utilizes a privileged facility Software accesses storage not intended for general-purpose programming Software performs a Decorated Storage operation Software accesses a Device Control Register <E> Programming Note One use of lwarx and stwcx. is to emulate a Compare and Swap primitive like that provided by the IBM System/370 Compare and Swap instruction; see Section B.1, Atomic Update Primitives on page 715. A System/370-style Compare and Swap checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The combination of lwarx and stwcx. improves on such a Compare and Swap, because the reservation reliably binds the lwarx and stwcx. together. The reservation is always lost if the word is modified by another processor or mechanism between the lwarx and stwcx., so the stwcx. never succeeds unless the word has not been stored into (by another processor or mechanism) since the lwarx. Programming Note In general, programming conventions must ensure that lwarx and stwcx. specify addresses that match; a stwcx. should be paired with a specific lwarx to the same storage location. Situations in which a stwcx. may erroneously be issued after some lwarx other than that with which it is intended to be paired must be scrupulously avoided. For example, there must not be a context switch in which the processor holds a reservation in behalf of the old context, and the new context resumes after a lwarx and before the paired stwcx.. The stwcx. in the new context might succeed, which is not what was intended by the programmer. Such a situation must be prevented by executing a stbcx., sthcx., stwcx., or stdcx. that specifies a dummy writable aligned location as part of the context switch; see Section 6.4.3 of Book III-S and Section 7.5 of Book III-E.
662
Power ISA II
Programming Note Because the reservation is lost if another processor stores anywhere in the reservation granule, lock words (or bytes, halfwords, or doublewords) should be allocated such that few such stores occur, other than perhaps to the lock word itself. (Stores by other processors to the lock word result from contention for the lock, and are an expected consequence of using locks to control access to shared storage; stores to other locations in the reservation granule can cause needless reservation loss.) Such allocation can most easily be accomplished by allocating an entire reservation granule for the lock and wasting all but one word. Because reservation granule size is implementation-dependent, portable code must do such allocation dynamically. Similar considerations apply to other data that are shared directly using lwarx and stwcx. (e.g., pointers in certain linked lists; see Section B.3, List Insertion on page 719).
Programming Note The architecture does not include a fairness guarantee. In competing for a reservation, two processors can indefinitely lock out a third.
1.7.3.2
Forward Progress
Forward progress in loops that use lwarx and stwcx. is achieved by a cooperative effort among hardware, system software, and application software. The architecture guarantees that when a processor executes a lwarx to obtain a reservation for location X and then a stwcx. to store a value to location X, either 1. the stwcx. succeeds and the value is written to location X, or 2. the stwcx. fails because some other processor or mechanism modified location X, or 3. the stwcx. fails because the processors reservation was lost for some other reason. In Cases 1 and 2, the system as a whole makes progress in the sense that some processor successfully modifies location X. Case 3 covers reservation loss required for correct operation of the rest of the system. This includes cancellation caused by some other processor or mechanism writing elsewhere in the reservation granule, cancellation caused by the operating system in managing certain limited resources such as real storage, and cancellation caused by any of the other effects listed in see Section 1.7.3.1. An implementation may make a forward progress guarantee, defining the conditions under which the system as a whole makes progress. Such a guarantee must specify the possible causes of reservation loss in Case 3. While the architecture alone cannot provide such a guarantee, the characteristics listed in Cases 1 and 2 are necessary conditions for any forward progress guarantee. An implementation and operating system can build on them to provide such a guarantee.
663
The following instruction sequence, executed by the waiting program, will prevent the waiting programs from executing the instruction at location X until location X in instruction storage is consistent with data storage, and then will cause any prefetched instructions to be discarded. lwz r0,flag #loop until flag = 1 (when 1 is cmpwi r0,1 # loaded, location X in instn bne $-8 # storage is consistent with # location X in data storage) isync #discard any prefetched instns In the preceding instruction sequence any context synchronizing instruction (e.g., rfid) can be used instead of isync. (For Case 1 only isync can be used.) For both cases, if two or more instructions in separate data cache blocks have been modified, the dcbst instruction in the examples must be replaced by a sequence of dcbst instructions such that each block containing the modified instructions is copied back to main storage. Similarly, for icbi the sequence must invalidate each instruction cache block containing a location of an instruction that was modified. The sync instruction that appears above between dcbst X and icbi X would be placed between the sequence of dcbst instructions and the sequence of icbi instructions.
664
Power ISA II
Programming Note An example of how failure to satisfy the requirements given above can cause inconsistent information to be presented to the system error handler is as follows. If the value X0 (an illegal instruction) is executed, causing the system illegal instruction handler to be invoked, and before the error handler can load X0 into a register, X0 is replaced with X1, an Add Immediate instruction, it will appear that a legal instruction caused an illegal instruction exception. Programming Note It is possible to apply a patch or to instrument a given program without the need to suspend or halt the program. This can be accomplished by modifying the example shown in the Programming Note at the end of Section 1.8 where one program is creating instructions to be executed by one or more other programs. In place of the Store to a flag to indicate to the other programs that the code is ready to be executed, the program that is applying the patch would replace a patch class instruction in the original program with a Branch instruction that would cause any program executing the Branch to branch to the newly created code. The first instruction in the newly created code must be an isync, which will cause any prefetched instructions to be discarded, ensuring that the execution is consistent with the newly created code. The instruction storage location containing the isync instruction in the patch area must be consistent with data storage with respect to the processor that will execute the patched code before the Store which stores the new Branch instruction is performed. Programming Note It is believed that all processors that comply with versions of the architecture that precede Version 2.01 support concurrent modification and execution of instructions as described in this section if the requirements given above are satisfied, and that most such processors yield boundedly undefined results if the requirements given above are not satisfied. However, in general such support has not been verified by processor testing. Also, one such processor is known to yield undefined results in certain cases if the requirements given above are not satisfied.
665
666
Power ISA II
The placement (location and alignment) of operands in storage affects relative performance of storage accesses, and may affect it significantly. The best performance is guaranteed if storage operands are aligned. In order to obtain the best performance across the widest range of implementations, the programmer should assume the performance model described in Figure 1 with respect to the placement of storage operands for the Embedded environment. For the Server environment, Figure 1 applies for Big-Endian byte ordering, and Little-Endian byte ordering. Performance of storage accesses varies depending on the following:
667
4 Byte
any
1
If an instruction causes an access that is not atomic and any portion of the operand is in storage that is Write Through Required or Caching Inhibited, performance is likely to be poor. 2 If the storage operand spans two virtual pages that have different storage control attributes or, in the Server environment, spans two segments, performance is likely to be poor. 3 The storage operands for Vector instructions are all assumed to be aligned (see Section 6.4 of Book I). Figure 1. Performance effects of storage operand placement
668
Power ISA II
Programming Note There are many events that might cause a Load or Store instruction to be restarted. For example, a hardware error may cause execution of the instruction to be aborted after part of the access has been performed, and the recovery operation could then cause the aborted instruction to be re-executed. When an instruction is aborted after being partially executed, the contents of the instruction pointer indicate that the instruction has not been executed, however, the contents of some registers may have been altered and some bytes within the storage operand may have been accessed. The following are examples of an instruction being partially executed and altering the program state even though it appears that the instruction has not been executed. 1. Load Multiple, Load String: Some registers in the range of registers to be loaded may have been altered. 2. Any Store instruction, dcbz: Some bytes of the storage operand may have been altered.
669
670
Power ISA II
The facilities described in this section provide the means to control the use of resources that are shared with other processors.
Programming Note The ability to access the low-order half of the PPR (and thus the use of mfppr and mtppr) might be phased out in a future version of the architecture. Programming Note By setting the PRI field, a programmer may be able to improve system throughput by causing system resources to be used more efficiently. E.g., if a program is waiting on a lock (see Section B.2), it could set low priority, with the result that more processor resources would be diverted to the program that holds the lock. This diversion of resources may enable the lock-holding program to complete the operation under the lock more quickly, and then relinquish the lock to the waiting program. Programming Note or Rx,Rx,Rx can be used to modify the PRI field; see Section 3.2. Programming Note When the system error handler is invoked, the PRI field may be set to an undefined value.
///
0
PRI
11 14
///
???
26 32
///
44
???
63
///
32
PRI
43 46
///
???
58 63
Bit(s) 11:13
Description Program Priority (PRI) (PPR3243:45) 010 low 011 medium low (normal) 100 medium If other values are written to this field, the PRI field is not changed. (See Section 4.3.4 of Book III-S for additional information.)
26:31 44:63
671
3.2 or Instruction
Setting the PPR
The or Rx,Rx,Rx (see Book I) instruction can be used to set PPRPRI as shown in Figure 3. or. Rx,Rx,Rx does not set PPRPRI. Rx 1 6 2 Figure 3. PPRPRI Priority 010 011 100 low medium low (normal) medium
The following forms of or Rx,Rx,Rx provide hints about usage of shared processor resources.
672
Power ISA II
If the caches are combined, the same value should be given for an instruction cache attribute and the corresponding data cache attribute.
673
Programming Note The LSD, SNSE and SSE fields do not affect the initiation of streams specified using the dcbt and dcbtst instructions. Note that even when SNSE is not set, hardware may detect Stride-N streams in intervals when they access elements that map to sequential cache blocks. Programming Note The contents of the DSCR are intended to be managed by application programs. Access to the DSCR is privileged because, when the DSCR was added to the architecture, adding it as non-privileged would have been incompatible with the application binary interface (ABI) of some operating systems. Operating systems will provide a service that allows application programs to manage the contents of the DSCR.
Figure 4. Bits 58
Data Stream Control Register Name LSD Description Load Stream Disable This bit disables hardware detection and initiation of load streams. Stride-N Stream Enable This bit enables the hardware detection and initiation of load and store streams that have a stride greater than a single cache block. Such load streams are detected only when LSD is also zero. Such store streams are detected only when SSE is also one. Store Stream Enable This bit enables hardware detection and initiation of store streams. Default Prefetch Depth This field supplies a prefetch depth for hardware-detected streams and for software-defined streams for which a depth of zero is specified or for which dcbt/dcbtst with TH=1010 is not used in their description. Values and their meanings are as follows. 0 default (LPCRDPFD) 1 none 2 shallowest 3 shallow 4 medium 5 deep 6 deeper 7 deepest
59
SNSE
60
SSE
61:63
DPFD
The contents of the DSCR affect how a processor handles hardware-detected and software-defined data streams. A move to the DSCR causes all active and nascent data streams to cease to exist. Access to this SPR is privileged; see Book III.
674
Power ISA II
CT values not shown above may be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.
675
X-form
RA,RB ///
11
RA
16
RB
21
982
/
31 0
31
/
6 7
CT
11
RA
16
RB
21
22
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any processors, the block is invalidated in those instruction caches. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the instruction cache of this processor, the block is invalidated in that instruction cache. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. This instruction is treated as a Load (see Section 4.3), except that reference and change recording<S> need not be done. Special Registers Altered: None Programming Note Because the instruction is treated as a Load, the effective address is translated using translation resources that are used for data accesses, even though the block being invalidated was copied into the instruction cache based on translation resources used for instruction fetches (see Book III). Programming Note The invalidation of the specified block need not have been performed with respect to the processor executing the icbi instruction until a subsequent isync instruction has been executed by that processor. No other instruction or event has the corresponding effect. Let the effective address (EA) be the sum (RA|0)+(RB). If CT=0, this instruction provides a hint that the program will probably soon execute code from the addressed location. If CT0, the operation performed by this instruction is implementation-dependent, except that the instruction is treated as a no-op for values of CT that are not implemented. The hint is ignored if the block is Caching Inhibited. This instruction is treated as a Load (see Section 4.3), except that the system instruction storage error handler is not invoked. Special Registers Altered: None
676
Power ISA II
D UG /
57
ID
59 60 63
Bit(s) Description 0:56 EATRUNC High-order 57 bits of the effective address of the first element of the data stream. (i.e., the effective address of the first unit of the stream is EATRUNC || 70) 57 Direction (D) 0 1 Subsequent elements increasing addresses. Subsequent elements decreasing addresses. have have
677
59
Reserved
60:63 Stream ID (ID) Stream ID to use for this data stream. 01010 The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream, or indicates that the program will probably soon access data streams that have been described using data stream variants of the dcbt/dcbtst instruction, or will probably no longer access such data streams. The EA is interpreted as follows. If GO=1 and S0b00 the hint provided by the instruction is undefined; the remainder of this instruction description assumes that this combination is not used. /// GO S / DEP
0 32 35 36
36:38 Depth (DEP) The DEP field provides a relative estimate of how many elements ahead of the point of stream use the latencyreducing actions should go. This value reflects a comparison of the rate of consumption of the elements of the data stream and the latency to bring an arbitrary element of the stream into cache. The values are as follows. 0 1 2 3 4 5 6 7 default = DSCRDPFD none shallowest shallow medium deep deeper deepest
//
39
UNITCNT T U /
47 57
ID
59 60 63
Bit(s) Description 0:31 32 Reserved GO 0 1 No information is provided by the GO field. For dcbt, the program will probably soon access all nascent load and store data streams that have been completely described, and will probably no longer access all other nascent load and store data streams. All other fields of the EA are ignored. (Nascent and completely described are defined below.) For dcbtst, this field value holds no meaning and is treated as though it were zero.
39:46 Reserved 47:56 UNITCNT Number of units in data stream. 57 Transient (T) If T=1, the programs need for each element of the data stream is likely to be transient (i.e., the time interval during which the program accesses the element is likely to be short). 58 Unlimited (U) If U=1, the number of units in the data stream is unlimited (and the UNITCNT field is ignored). 59 Reserved
60:63 Stream ID (ID) Stream ID to use for this data stream (GO=0 and S=0b00), or stream ID associated with the data stream which the program will probably no longer access(S=0b10).
33:34 Stop (S) 00 No information is provided by the S field. 01 Reserved 10 The program will probably no longer access the data stream (if any) associated with the specified
678
Power ISA II
Programming Note To maximize the utility of the Depth control mechanism, the architecture provides a hierarchy of three ways to program it. In the server environment, the DPFD field in the LPCR is used by the provisory/ firmware to set a safe or appropriate default depth for unaware operating systems and applications. (The corresponding default in the embedded environment is implementation specific.) The DPFD field in the DSCR may be initialized by the aware OS and overwritten by an application via the OSprovided service when per stream control is unnecessary or unaffordable. The DEP field in the EA specification when TH=0b01010 may be used by the application to specify the depth on a per-stream basis. The number of elements ahead of the point of stream use indicated by a given depth value may differ across implementations, as may the latency to bring a given element into the cache. To achieve optimum performance, some experimentation with different depth values may be necessary. 01011 The dcbt/dcbtst instruction provides a hint that describes certain attributes of a data stream. The EA is interpreted as follows. ///
0
Programming Note A program should use a dcbt/dcbtst instruction with TH=0b01011 only when the stride is larger than 128 bytes. Otherwise, consecutive units will be accessed, so the additional stream information has no benefit.
If the specified stream ID value is greater than m -1, where m is the number of stream IDs provided by the implementation, and either (a) TH=0b01000 or TH=0b01011, or (b) TH=0b01010 with GO=0 and S0b11, no hint is provided by the instruction. The following terminology is used to describe the state of a data stream. Except as described in the paragraph after the next paragraph, the state of a data stream at a given time is determined by the most recently provided hint(s) for the stream. A data stream for which only descriptive hints have been provided (by dcbt/dcbtst instructions with TH=0b01000 and UG=0, TH=0b01010 and GO=0 and S=0b00, and/or with TH=0b01011) is said to be nascent. A nascent data stream for which all relevant descriptive hints have been provided (by the dcbt/dcbtst usages listed in the preceding sentence) is considered to be completely described. The order of descriptive hints with respect to one another is unimportant. A data stream for which a hint has been provided (by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 or dcbt with TH=0b01010 and GO=1) that the program will probably soon access it is said to be active. A data stream that is either nascent or active is considered to exist. A data stream for which a hint has been provided (e.g., by a dcbt instruction with TH=0b01010 and S0b00) that the program will probably no longer access it is considered no longer to exist. The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=1 implicitly includes a hint that the program will probably no longer access the data stream (if any) previously associated with the specified stream ID. The hint provided by a dcbt/dcbtst instruction with TH=0b01000 and UG=0, or with TH=0b01010 and GO=0 and S=0b00, or with TH=0b01011 implicitly includes a hint that the program will probably no longer access the active data stream (if any) previously associated with the specified stream ID. If a data stream is specified without using a dcbt/ dcbtst instruction with TH=0b01010 and GO=0 and S=0b00, then the number of elements in the stream is unlimited, and the programs need for each element of the stream is not likely to be transient. If a data stream is specified without using a dcbt/dcbtst instruction with
STRIDE
32
OFFSET
50 56
//
60
ID
63
32:49 Stride The displacement, in words, between the first byte of successive elements in the stream. The effective address of the Nth element in the stream is (N-1)STRIDE4 greater than or less than the effective address of the first element of the stream, depending on the direction specified for the stream. 50 Reserved
51:55 Offset The word-offset of the first element of the stream in its unit (i.e., the effective address of the first element of the stream is (EATRUNC || OFFSET || 0b00)).56:59Reserved 60:63 Stream ID (ID) Stream ID to use for this data stream.
679
680
Power ISA II
Software-specified data streams take precedence over hardware-detected data streams in use of data stream resources. The processors response to a hint that the program will probably no longer access a given data stream, or to the cessation of existence of a data stream, includes releasing the associated data stream resources, so that they can be used by hardware-detected data streams.
681
Programming Note TH=0b10000 If TH=0b10000, the dcbt instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the programs need for the block will be transient (i.e. the time interval during which the program accesses the block is likely to be short). The processors response to the hint that access to the block will be transient is to prefetch data into the cache hierarchy in a way that minimizes the displacement of data that has not been identified as transient.
682
Power ISA II
X-form
X-form
///
11
RA
16
RB
21
758
/
31 0
31
RA
16
RB
21
278
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). This instruction provides a hint that the program will probably soon store into a portion of the block and the contents of the rest of the block are not meaningful to the program. The contents of the block are undefined when the instruction completes. The hint is ignored if the block is Caching Inhibited. This instruction is treated as a Store (see Section 4.3) except that the instruction is treated as a no-op if execution of the instruction would cause the system data storage error handler to be invoked. Special Registers Altered: None
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbt instruction provides a hint that describes a block or data stream to which the program may perform a Load access. The instruction is also used to indicate imminent access or end of access to described load and store data streams. A hint that the program will probably soon load from a given storage location is ignored if the location is Caching Inhibited or Guarded<S>. The only operation that is caused by the dcbt instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be caused by or associated with the dcbt instruction (e.g., dcbt is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction. The dcbt instruction may complete before the operation it causes has been performed. The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH0b01010 and TH0b01011, this instruction is treated as a Load (see Section 4.3), except that the system data storage error handler is not invoked, and reference and change recording<S> need not be done. Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Data Cache Block Touch instruction so that it can be coded with the TH value as the last operand for all categories, and so that the transient hint can be specified without coding the TH field explicitly. Extended: dcbtct RA,RB,TH Equivalent to: dcbt for TH values of 0b00000 0b00111; other TH values are invalid. dcbtds RA,RB,TH dcbt for TH values of 0b00000 or 0b01000 - 0b01111; other TH values are invalid. dcbtt RA,RB dcbt for TH value of 0b10000
683
Programming Notes New programs should avoid using the dcbt and dcbtst mnemonics; one of the extended mnemonics should be used exclusively. <S> If the dcbt mnemonic is used with only two operands, the TH operand is assumed to be 0b00000. Processors that comply with versions of the architecture that precede Version 2.01 do not necessarily ignore the hint provided by dcbt and dcbtst if the specified block is in storage that is Guarded <S> and not Caching Inhibited. Programming Note See the Programming Notes at the beginning of this section.
RA
16
RB
21
246
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbtst instruction provides a hint that describes a block or data stream to which the program may perform a Store access, or indicates the expected use thereof. A hint that the program will soon store to a given storage location is ignored if the location is Caching Inhibited or Guarded<S>. The only operation that is caused by the dcbtst instruction is the providing of the hint. The actions (if any) taken by the processor in response to the hint are not considered to be caused by or associated with the dcbtst instruction (e.g., dcbtst is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by memory barriers. The dcbtst instruction may complete before the operation it causes has been performed. The nature of the hint depends, in part, on the value of the TH field, as specified at the beginning of this section. If TH0b01010 and TH0b01011, this instruction is treated as a Store (see Section 4.3), except that the system data storage error handler is not invoked, reference recording<S> need not be done, and change recording<S> is not done. Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Data Cache Block Touch for Store instruction so that it can be coded with the TH value as the last operand for all categories, and so that the transient hint can be specified without coding the TH field explicitly. Extended: dcbtstct RA,RB,TH Equivalent to: dcbtst for TH values of 0b00000 or 0b00000 - 0b00111; other TH values are invalid. dcbtst for TH values of 0b00000 or 0b01000 - 0b01111; other TH values are invalid. dcbtst for TH value of 0b10000.
dcbtstds RA,RB,TH
dcbtstt RA,RB
Programming Note See the Programming Notes at the beginning of this section.
684
Power ISA II
X-form
RA,RB ///
11
RA
16
RB
21
1014
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA block size (bytes) n m log2(n) EA0:63-m || m0 ea n 0x00 MEM(ea, n) Let the effective address (EA) be the sum (RA|0)+(RB). All bytes in the block containing the byte addressed by EA are set to zero. This instruction is treated as a Store (see Section 4.3). Special Registers Altered: None Programming Note dcbz does not cause the block to exist in the data cache if the block is in storage that is Caching Inhibited. For storage that is neither Write Through Required nor Caching Inhibited, dcbz provides an efficient means of setting blocks of storage to zero. It can be used to initialize large areas of such storage, in a manner that is likely to consume less memory bandwidth than an equivalent sequence of Store instructions. For storage that is either Write Through Required or Caching Inhibited, dcbz is likely to take significantly longer to execute than an equivalent sequence of Store instructions. For example, on some implementations dcbz for such storage may cause the system alignment error handler to be invoked; on such implementations the system alignment error handler sets the specified block to zero using Store instructions. See Section 5.9.1 of Book III-S and Section 6.11.1 of Book III-E for additional information about dcbz.
685
X-form
X-form
RA,RB ///
11
RA
16
RB
21
54
/
31 0
31
RA
16
RB
21
86
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage, additional locations in the block may be written to main storage, and the block ceases to be considered to be modified in that data cache. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. This instruction is treated as a Load (see Section 4.3), except that reference and change recording<S> need not be done, and it is treated as a Write with respect to debug events. Special Registers Altered: None
Let the effective address (EA) be the sum (RA|0)+(RB). L=0 If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data caches of all processors. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data cache of this processor. L=1 (dcbf local) [Category: Server, Embedded.Phased-In] The L=1 form of the dcbf instruction permits a program to limit the scope of the flush operation to the data cache of this processor. If the block containing the byte addressed by EA is in the data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute. L = 3 (dcbf local primary) [Category: Server, Embedded.Phased-In] The L=3 form of the dcbf instruction permits a program to limit the scope of the flush operation to the primary data cache of this processor. If the block containing the byte addressed by EA is in the primary data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute. For the L operand, the value 2 is reserved. The results of executing a dcbf instruction with L=2 are boundedly undefined. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited.
686
Power ISA II
Except in the dcbf instruction description in this section, references to dcbf in Books I-III imply L=0 unless otherwise stated or obvious from context; dcbfl is used for L=1 and dcbflp is used for L=3. Programming Note dcbf serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbf mnemonic with three operands as the basic form, and a dcbf mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0. Programming Note dcbf with L=1 can be used to provide a hint that a block in this processors data cache will not be reused soon. dcbf with L=3 can be used to flush a block from the processors primary data cache but reduce the latency of a subsequent access. For example, the block may be evicted from the primary data cache but a copy retained in a lower level of the cache hierarchy. Programs which manage coherence in software must use dcbf with L=0.
687
XL-form
///
11
///
16
///
21
150
/
31
Executing an isync instruction ensures that all instructions preceding the isync instruction have completed before the isync instruction completes, and that no subsequent instructions are initiated until after the isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the isync instruction have been performed with respect to the processor executing the isync instruction, and then causes any prefetched instructions to be discarded. Except as described in the preceding sentence, the isync instruction may complete before storage accesses associated with instructions preceding the isync instruction have been performed. This instruction is context synchronizing (see Book III). Special Registers Altered: None
688
Power ISA II
Programming Note Because the Load And Reserve and Store Conditional instructions have implementation dependencies (e.g., the granularity at which reservations are managed), they must be used with care. The operating system should provide system library programs that use these instructions to implement the high-level synchronization functions (Test and Set, Compare and Swap, locking, etc.; see Appendix B) that are needed by application programs. Application programs should use these library programs, rather than use the Load And Reserve and Store Conditional instructions directly. Programming Note EH = 1 should be used when the program is obtaining a lock variable which it will subsequently release before another program attempts to perform a store to it. When contention for a lock is significant, using this hint may reduce the number of times a cache block is transferred between processor caches. EH = 0 should be used when all accesses to a mutex variable are performed using an instruction sequence with Load and Reserve followed by Store Conditional (e.g., emulating atomic update primitives such as Fetch and Add; see Appendix B). The processor may use this hint to optimize the cache to cache transfer of the block containing the mutex variable, thus reducing the latency of performing an operation such as Fetch and Add.
X-form
RT,RA,RB,EH RT
11
RA
16
RB
21
52
EH
31
0 (RA)
689
X-
RT,RA,RB,EH RT
11
RT,RA,RB,EH RT
11
RA
16
RB
21
20
EH
31
RA
16
RB
21
116
EH
31
if RA = 0 then b 0 (RA) else b EA b +(RB) 1 RESERVE 2 RESERVE_LENGTH RESERVE_ADDR real_addr(EA) 48 0 || MEM(EA, 2) RT Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. This instruction creates a reservation for use by a sthcx. instruction. A real address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation. A length of 2 bytes is associated with the reservation, and replaces any length previously associated with the reservation. The value of EH provides a hint as to whether the program will perform a subsequent store to the halfword in storage addressed by EA before some other processor attempts to modify it. 0 Other programs might attempt to modify the halfword in storage addressed by EA regardless of the result of the corresponding sthcx. instruction. 1 Other programs will not attempt to modify the halfword in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock. EA must be a multiple of 2. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. Special Registers Altered: None Programming Note lharx serves as both a basic and an extended mnemonic. The Assembler will recognize a lharx mnemonic with four operands as the basic form, and a lharx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.
if RA = 0 then b 0 else b (RA) b +(RB) EA 1 RESERVE RESERVE_LENGTH 4 real_addr(EA) RESERVE_ADDR 32 0 || MEM(EA, 4) RT Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. This instruction creates a reservation for use by a stwcx. instruction. A real address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation. A length of 4 bytes is associated with the reservation, and replaces any length previously associated with the reservation. The value of EH provides a hint as to whether the program will perform a subsequent store to the word in storage addressed by EA before some other processor attempts to modify it. 0 Other programs might attempt to modify the word in storage addressed by EA regardless of the result of the corresponding stwcx. instruction. 1 Other programs will not attempt to modify the word in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock. EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. Special Registers Altered: None Programming Note lwarx serves as both a basic and an extended mnemonic. The Assembler will recognize a lwarx mnemonic with four operands as the basic form, and a lwarx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.
690
Power ISA II
X-form
RS,RA,RB RS
11
RA
16
RB
21
694
1
31
established the reservation, it is undefined whether (RS)56:63 are stored into the byte in storage addressed by EA. Otherwise, no store is performed. If the Store Conditional Page Mobility category is not supported, it is undefined whether (RS)56:63 are stored into the byte in storage addressed by EA. If a reservation exists and the length associated with the reservation is not 1 byte, it is undefined whether (RS)56:63 are stored into the byte in storage addressed by EA. If a reservation does not exist, no store is performed. CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if, per the preceding description, it is undefined whether the store is performed, the value of n is undefined (and need not reflect whether the store was performed). CR0LT GT EQ SO = 0b00 || n || XERSO The reservation is cleared. Special Registers Altered: CR0
if RA = 0 then b 0 (RA) else b b + (RB) EA if RESERVE then if RESERVE_LENGTH = 1 then if RESERVE_ADDR = real_addr(EA) then (RS)56:63 MEM(EA, 1) undefined_case 0 1 store_performed else if SCPM category supported then smallest real page size supported by z implementation if RESERVE_ADDR z = real_addr(EA) z then 1 undefined_case else 0 undefined_case 0 store_performed else undefined_case 1 else undefined_case 1 else 0 undefined_case store_performed 0 if undefined_case then undefined 1-bit value u1 if u1 then MEM(EA, 1) (RS)56:63 u2 undefined 1-bit value CR0 0b00 || u2 || XERSO else 0b00 || store_performed || XERSO CR0 RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 1 byte, and the real storage location specified by the stbcx. is the same as the real storage location specified by the lbarx instruction that established the reservation, (RS)56:63 are stored into the byte in storage addressed by EA. If a reservation exists, the length associated with the reservation is 1 byte, and the real storage location specified by the stbcx. is not the same as the real storage location specified by the lbarx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location specified by the stbcx. is in the same z as the real storage location specified by the lbarx instruction that
691
X-
RS,RA,RB RS
11
RA
16
RB
21
726
1
31
age location specified by the lharx instruction that established the reservation, it is undefined whether (RS)48:63 are stored into the halfword in storage addressed by EA. Otherwise, no store is performed. If the Store Conditional Page Mobility category is not supported, it is undefined whether (RS)48:63 are stored into the halfword in storage addressed by EA. If a reservation exists and the length associated with the reservation is not 2 bytes, it is undefined whether (RS)48:63 are stored into the halfword in storage addressed by EA. If a reservation does not exist, no store is performed. CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if, per the preceding description, it is undefined whether the store is performed, the value of n is undefined (and need not reflect whether the store was performed). CR0LT GT EQ SO = 0b00 || n || XERSO The reservation is cleared. EA must be a multiple of 2. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. Special Registers Altered: CR0
if RA = 0 then b 0 (RA) else b EA b + (RB) if RESERVE then if RESERVE_LENGTH = 2 then if RESERVE_ADDR = real_addr(EA) then (RS)48:63 MEM(EA, 2) undefined_case 0 store_performed 1 else if SCPM category supported then smallest real page size supported by z implementation if RESERVE_ADDR z = real_addr(EA) z then undefined_case 1 else 0 undefined_case store_performed 0 else 1 undefined_case else 1 undefined_case else undefined_case 0 0 store_performed if undefined_case then u1 undefined 1-bit value if u1 then (RS)48:63 MEM(EA, 2) u2 undefined 1-bit value 0b00 || u2 || XERSO CR0 else CR0 0b00 || store_performed || XERSO RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 2 bytes, and the real storage location specified by the sthcx. is the same as the real storage location specified by the lharx instruction that established the reservation, (RS)48:63 are stored into the halfword in storage addressed by EA. If a reservation exists, the length associated with the reservation is 2 bytes, and the real storage location specified by the sthcx. is not the same as the real storage location specified by the lharx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location specified by the sthcx. is in the same z as the real stor-
692
Power ISA II
X-form
RS,RA,RB RS
11
RA
16
RB
21
150
1
31
(RS)32:63 are stored into the word in storage addressed by EA. Otherwise, no store is performed. If the Store Conditional Page Mobility category is not supported, it is undefined whether (RS)32:63 are stored into the word in storage addressed by EA. If a reservation exists and the length associated with the reservation is not 4 bytes, it is undefined whether (RS)32:63 are stored into the word in storage addressed by EA. If a reservation does not exist, no store is performed. CR Field 0 is set as follows. n is a 1-bit value that indicates whether the store was performed, except that if, per the preceding description, it is undefined whether the store is performed, the value of n is undefined (and need not reflect whether the store was performed). CR0LT GT EQ SO = 0b00 || n || XERSO The reservation is cleared. EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. Special Registers Altered: CR0
if RA = 0 then b 0 else b (RA) b + (RB) EA if RESERVE then if RESERVE_LENGTH = 4 then if RESERVE_ADDR = real_addr(EA) then (RS)32:63 MEM(EA, 4) undefined_case 0 1 store_performed else if SCPM category supported then smallest real page size supported by z implementation if RESERVE_ADDR z = real_addr(EA) z then 1 undefined_case else undefined_case 0 0 store_performed else undefined_case 1 else 1 undefined_case else undefined_case 0 0 store_performed if undefined_case then undefined 1-bit value u1 if u1 then MEM(EA, 4) (RS)32:63 u2 undefined 1-bit value 0b00 || u2 || XERSO CR0 else 0b00 || store_performed || XERSO CR0 RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 4 bytes, and the real storage location specified by the stwcx. is the same as the real storage location specified by the lwarx instruction that established the reservation, (RS)32:63 are stored into the word in storage addressed by EA. If a reservation exists, the length associated with the reservation is 4 bytes, and the real storage location specified by the stwcx. is not the same as the real storage location specified by the lwarx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size is the smallest real page size supported by the implementation. If the real storage location specified by the stwcx. is in the same z as the real storage location specified by the lwarx instruction that established the reservation, it is undefined whether
693
Version 2.06 Revision B 4.4.2.1 64-Bit Load and Reserve and Store Conditional Instructions [Category: 64-Bit]
Load Doubleword And Reserve Indexed X-form
ldarx 31
0 6
RT,RA,RB,EH RT
11
RA
16
RB
21
84
EH
31 0
31
RA
16
RB
21
214
1
31
if RA = 0 then b 0 else b (RA) b +(RB) EA 1 RESERVE RESERVE_LENGTH 8 real_addr(EA) RESERVE_ADDR MEM(EA, 8) RT Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT. This instruction creates a reservation for use by a stdcx. instruction. A real address computed from the EA as described in Section 1.7.3.1 is associated with the reservation, and replaces any address previously associated with the reservation. A length of 8 bytes is associated with the reservation, and replaces any length previously associated with the reservation. The value of EH provides a hint as to whether the program will perform a subsequent store to the doubleword in storage addressed by EA before some other processor attempts to modify it. 0 Other programs might attempt to modify the doubleword in storage addressed by EA regardless of the result of the corresponding stdcx. instruction. 1 Other programs will not attempt to modify the doubleword in storage addressed by EA until the program that has acquired the lock performs a subsequent store releasing the lock. EA must be a multiple of 8. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. Special Registers Altered: None Programming Note ldarx serves as both a basic and an extended mnemonic. The Assembler will recognize a ldarx mnemonic with four operands as the basic form, and a ldarx mnemonic with three operands as the extended form. In the extended form the EH operand is omitted and assumed to be 0.
if RA = 0 then b 0 else b (RA) b + (RB) EA if RESERVE then if RESERVE_LENGTH = 8 then if RESERVE_ADDR = real_addr(EA) then (RS) MEM(EA, 8) undefined_case 0 1 store_performed else if SCPM category supported then smallest real page size supported by z implementation if RESERVE_ADDR z = real_addr(EA) z then 1 undefined_case else undefined_case 0 0 store_performed else undefined_case 1 else 1 undefined_case else undefined_case 0 0 store_performed if undefined_case then undefined 1-bit value u1 if u1 then MEM(EA, 8) (RS) undefined 1-bit value u2 0b00 || u2 || XERSO CR0 else 0b00 || store_performed || XERSO CR0 RESERVE 0 Let the effective address (EA) be the sum (RA|0)+(RB). If a reservation exists, the length associated with the reservation is 8 bytes, and the real storage location specified by the stdcx. is the same as the real storage location specified by the ldarx instruction that established the reservation, (RS) is stored into the doubleword in storage addressed by EA. If a reservation exists, the length associated with the reservation is 8 bytes, and the real storage location specified by the stdcx. is not the same as the real storage location specified by the ldarx instruction that established the reservation, the following applies. If the Store Conditional Page Mobility category is supported, the following applies. Let z denote a naturally aligned block of real storage whose size
694
Power ISA II
695
Synchronize
sync 31
0 6
X-form
If L=0 (or L=2<S>), the sync instruction has the following additional properties. Executing the sync instruction ensures that all instructions preceding the sync instruction have completed before the sync instruction completes, and that no subsequent instructions are initiated until after the sync instruction completes. The sync instruction is execution synchronizing (see Book III). However, address translation and reference and change recording<S> (see Book III) associated with subsequent instructions may be performed before the sync instruction completes. The memory barrier provides the additional ordering function such that if a given instruction that is the result of a store in set B is executed, all applicable storage accesses in set A have been performed with respect to the processor executing the instruction to the extent required by the associated memory coherence properties. The single exception is that any storage access in set A that is caused by an icbi instruction executed by the processor executing the sync instruction (P1) may not have been performed with respect to P1 (see the description of the icbi instruction on page 676). The cumulative properties of the barrier apply to the execution of the given instruction as they would to a load that returned a value that was the result of a store in set B. The sync instruction provides an ordering function for the operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints). The value L=3 is reserved. The sync instruction may complete before storage accesses associated with instructions preceding the sync instruction have been performed. The sync instruction may complete before operations caused by dcbt and dcbtst instructions preceding the sync instruction have been performed.
L /// L
9 11
///
16
///
21
598
/
31
The sync instruction creates a memory barrier (see Section 1.7.1). The set of storage accesses that is ordered by the memory barrier depends on the value of the L field. L=0 (heavyweight sync) The memory barrier provides an ordering function for the storage accesses associated with all instructions that are executed by the processor executing the sync instruction. The applicable pairs are all pairs ai,bj in which bj is a data access, except that if ai is the storage access caused by an icbi instruction then bj may be performed with respect to the processor executing the sync instruction before ai is performed with respect to that processor. L=1 (lightweight sync) The memory barrier provides an ordering function for the storage accesses caused by Load, Store, and dcbz instructions that are executed by the processor executing the sync instruction and for which the specified storage location is in storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited. The applicable pairs are all pairs ai,bj of such accesses except those in which ai is an access caused by a Store or dcbz instruction and bj is an access caused by a Load instruction. L=2<S> The set of storage accesses that is ordered by the memory barrier is described in Section 5.9.2 of Book III-S, as are additional properties of the sync instruction with L=2. The ordering done by the memory barrier is cumulative.
696
Power ISA II
Programming Note The sync instruction can be used to ensure that all stores into a data structure, caused by Store instructions executed in a critical section of a program, will be performed with respect to another processor before the store that releases the lock is performed with respect to that processor; see Section B.2, Lock Acquisition and Release, and Related Techniques on page 717. The memory barrier created by a sync instruction with L=0 or L=1 does not order implicit storage accesses. The memory barrier created by a sync instruction with any L value does not order instruction fetches. (The memory barrier created by a sync instruction with L=0 or L=2<S>; see Book III appears to order instruction fetches for instructions preceding the sync instruction with respect to data accesses caused by instructions following the sync instruction. However, this ordering is a consequence of the first additional property of sync with L=0, not a property of the memory barrier.) In order to obtain the best performance across the widest range of implementations, the programmer should use the sync instruction with L=1, or the eieio<S> or mbar<E> instruction, if any of these is sufficient for his needs; otherwise he should use sync with L=0. sync with L=2<S> should not be used by application programs. Programming Note The functions provided by sync with L=1 are a strict subset of those provided by sync with L=0. (The functions provided by sync with L=2<S> are a strict superset of those provided by sync with L=0; see Book III.)
Except in the sync instruction description in this section, references to sync in Books I-III imply L=0 unless otherwise stated or obvious from context; the appropriate extended mnemonics are used when other L values are intended. Programming Note Section 1.8 contains a detailed description of how to modify instructions such that a well-defined result is obtained. Programming Note sync serves as both a basic and an extended mnemonic. The Assembler will recognize a sync mnemonic with one operand as the basic form, and a sync mnemonic with no operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.
697
Memory Barrier
mbar MO [Category: Embedded]
X-form
///
11
///
16
///
21
854
/
31 0
31
6
MO
11
///
16
///
21
854
/
31
The eieio instruction creates a memory barrier (see Section 1.7.1, Storage Access Ordering), which provides an ordering function for the storage accesses caused by Load, Store, dcbz, eciwx, and ecowx instructions executed by the processor executing the eieio instruction. These storage accesses are divided into the two sets listed below. The storage access caused by an eciwx instruction is ordered as a load, and the storage access caused by a dcbz or ecowx instruction is ordered as a store. 1. Loads and stores to storage that is both Caching Inhibited and Guarded, and stores to main storage caused by stores to storage that is Write Through Required. The applicable pairs are all pairs ai,bj of such accesses. 2. Stores to storage that is Memory Coherence Required and is neither Write Through Required nor Caching Inhibited. The applicable pairs are all pairs ai,bj of such accesses. The operations caused by the stream variants of the dcbt and dcbtst instructions (i.e. the providing of hints) are ordered by eieio as a third set of operations, and the operations caused by tlbie<S> and tlbsync instructions (see Book III-S) are ordered by eieio as a fourth set of operations. Each of the four sets of storage accesses or operations is ordered independently of the other three sets. The ordering done by eieio's memory barrier for the second set is cumulative; the ordering done by eieio's memory barrier for the other three sets is not cumulative. The eieio instruction may complete before storage accesses associated with instructions preceding the eieio instruction have been performed. The eieio instruction may complete before operations caused by dcbt and dcbtst instructions preceding the eieio instruction have been performed Special Registers Altered: None
When MO=0, the mbar instruction creates a cumulative memory barrier (see Section 1.7.1, Storage Access Ordering), which provides an ordering function for the storage accesses executed by the processor executing the mbar instruction. When MO0, an implementation may support the mbar instruction ordering a particular subset of storage accesses. An implementation may also support multiple, non-zero values of MO that each specify a different subset of storage accesses that are ordered by the mbar instruction. Which subsets of storage accesses that are ordered and which values of MO that specify these subsets is implementation-dependent. The mbar instruction may complete before storage accesses associated with instructions preceding the mbar instruction have been performed. The mbar instruction may complete before operations caused by dcbt and dcbtst instructions preceding the sync instruction have been performed. Special Registers Altered: None Programming Note The eieio<S> and mbar<E> instructions are intended for use in doing memory-mapped I/O). Because loads, and separately stores, to storage that is both Caching Inhibited and Guarded are performed in program order (see Section 1.7.1, Storage Access Ordering on page 658), eieio<S> or mbar<E> is needed for such storage only when loads must be ordered with respect to stores. For the eieio<S> instruction, accesses in set 1, ai and bj need not be the same kind of access or be to storage having the same storage control attributes. For example, ai can be a load to Caching Inhibited, Guarded storage, and bj a store to Write Through Required storage. If stronger ordering is desired than that provided by eieio<S> or mbar<E>, the sync instruction must be used, with the appropriate value in the L field.
698
Power ISA II
Programming Note The functions provided by eieio<S> and mbar<E> are a strict subset of those provided by sync with L=0. The functions provided by eieio<S> for its second set are a strict subset of those provided by sync with L=1. Since eieio<S> and mbar<E>share the same opcode, software designed for both server and embedded environments must assume that only the eieio<S> functionality applies since the functions provided by eieio are a subset of those provided by mbar with MO=0.
X-form
/// WC
9 11
///
16
///
21
62
/
31
///
11
///
16
///
21
62
/
31
The wait instruction allows instruction fetching and execution to be suspended under certain conditions, depending on the value of the WC field. A wait instruction without the WC field is treated as a wait instruction with WC=0. The defined values for WC are as follows. 0b00 0b01 Resume instruction fetching and execution when an interrupt occurs. Resume instruction fetching and execution when an interrupt occurs or when a reservation made by the processor does not exist (see Section 1.7.3). It is implementationdependent whether this WC value is supported or wait with this WC value is treated as a no-op. Resume instruction fetching and execution when an interrupt occurs or when an implementation-specific condition exists. It is implementation-dependent whether this WC value is supported or is treated as reserved. This WC value is treated as a no-op.
0b10
0b11
If WC=0, or if WC=1 and a reservation made by the processor exists when the wait instruction is executed, or if WC=2 and the associated implementation-specific condition does not exist when the wait instruction is executed, the following applies. Instruction fetching and execution is suspended. Once the wait instruction has completed, the NIA will point to the next sequential instruction. Instruction fetching and execution resumes when any of the following conditions are met. An interrupt occurs. WC=1 and a reservation made by the processor does not exist. WC=2 and the associated implementationspecific condition exists.
699
Programming Note Engineering Note Execution of a wait instruction indicates that no further instruction fetching will occur until the condition(s) associated with the WC field value for the instruction take place. The main purpose of the wait instruction is to enable power savings. wait frees computational resources which might be allocated to another program or converted into power savings. If an interrupt causes resumption of instruction execution, the interrupt handler will return to the instruction after the wait. Programming Note Engineering Note In previous versions of the architecture the wait instruction was context synchronizing.
Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for Wait: Extended: wait waitrsv waitimpl Programming Note The wait instruction with WC=0b00 can be used in verification test cases to signal the end of a test case. The encoding for the instruction is the same in both Big-Endian and Little-Endian modes. Programming Note The wait instruction may be useful as the primary instruction of an idle process or the completion of processing for a cooperative processor. However, overall system performance may be better served if the wait instruction is used by applications only for idle times that are expected to be short. Equivalent to: wait 0 wait 1 wait 2
700
Power ISA II
Programming Note Engineering Note The wait instruction is not execution synchronizing and does not cause a memory barrier. waitrsv behavior relative to a preceding Load and Reserve instruction or Store Conditional instruction has a data dependency on the reservation. When execution proceeds past waitrsv as the result of another processor storing to the reservation granule, a subsequent load from the same storage location may return stale data. It is also possible that execution could proceed past the waitrsv for other reasons such as the occurrence of an interrupt. There are no architecturally defined means to determine what terminated the wait. Moreover, even if software were to attempt to determine what caused the wait to terminate, by the time the check occurred, both causes (interrupt and storage modification) might be true. Software must be designed to deal with the various causes of wait termination. In general, if the program that performed wait does not see the new value of the storage location for which the reservation was held, it should re-establish the reservation by repeating the Load and Reserve instruction, and then perform another waitrsv. The following code waits for a device to update a memory location and assumes that r3 contains the address of the word to be updated. This assumes that software has already set this word to zero and is waiting for the device to update the word to a non-zero value. loop: lwarx r4,0,r3 # load and reserve cmpwi r4,0 # exit if nonzero bne- exit waitrsv # wait for reservation loss b loop exit: ... The b instruction results in re-execution of the waitrsv if instruction execution had resumed for some reason other than loss of the reservation made by the processor. This branch instruction is also necessary because the reservation might have been lost for reasons other than the device updating the memory location addressed by r3. Also, even if the device updated this memory location, the lwarx and waitrsv instructions may need to be re-executed until the lwarx returns the current data.
Programming Note Engineering Note A wait instruction without the WC field is treated as a wait instruction with WC=0b00 because older processors that comply with Power ISA 2.06 do not support the WC field.
701
702
Power ISA II
updated and other frequencies, such as the CPU clock or bus clock. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following. The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base bits 0:59 changes, and a means to determine what the current update frequency is. The update frequency of the Time Base bits 0:59 is under the control of the system software. Programming Note If the operating system initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries. Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 264-1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values. Successive readings of the Time Base may return identical values.
TBL
63
Description Upper 32 bits of Time Base Lower 32 bits of Time Base Time Base
The Time Base bits 0:59 increment until their value becomes 0xFFF_FFFF_FFFF_FFFF (259 - 1), at the next increment their value becomes 0x000_0000_0000_0000. There is no interrupt or other indication when this occurs. Time base bits 60:63 may increment at a variable rate. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of bit 59 changes, they remain at 0xF until the value of bit 59 changes. 2 64 32 TTB = --------------------- = 5.90 x 1011 seconds 1GHz which is approximately 18,700 years. The Power ISA AS does not specify a relationship between the frequency at which the Time Base is
703
Programming Note The mfspr instruction can be used to read the Time Base on all processors that comply with Version 2.01 of the architecture or with any subsequent version. It is believed that the mfspr instruction can be used to read the Time Base on most processors that comply with versions of the architecture that precede Version 2.01. Processors for which mfspr cannot be used to read the Time Base include the following. 601 POWER3 (601 implements neither the Time Base nor mftb, but depends on software using mftb to read the Time Base, so that the attempt causes the Illegal Instruction error handler to be invoked and thereby permits the operating system to emulate the Time Base.)
XFX-form
RT
11
tbr
21
371
/
31
This instruction behaves as if it were an mfspr instruction; see the mfspr instruction description in Section 3.3.15 of Book I. Special Registers Altered: None Extended Mnemonics: Extended mnemonics for Move From Time Base: Extended: mftb mftbu Rx Rx Equivalent to: mftb Rx,268 mfspr Rx,268 mftb Rx,269 mfspr Rx,269
Programming Note New programs should use mfspr instead of mftb to access the Time Base. Programming Note mftb serves as both a basic and an extended mnemonic. The Assembler will recognize an mftb mnemonic with two operands as the basic form, and an mftb mnemonic with one operand as the extended form. In the extended form the TBR operand is omitted and assumed to be 268 (the value that corresponds to TB).
704
Power ISA II
For the Embedded environment when the processor is in 32-bit mode, it is not possible to read the Time Base using a single instruction. Instead, two instructions must be used, one of which reads TBL and the other of which reads TBU. Because of the possibility of a carry from TBL to TBU occurring between the two reads, a sequence such as the following must be used to read the Time Base. loop: mfspr Rx,TBU # load from TBU mfspr Ry,TB # load from TB mfspr Rz,TBU # load from TBU cmp cr0,0,Rx,Rz# check if old=new bne loop #branch if carry occurred
1. Described in POSIX Draft Standard P1003.4/D12, Draft Standard for Information Technology -- Portable Operating System Interface (POSIX) -Part 1: System Application Program Interface (API) - Amendment 1: Real-time Extension [C Language]. Institute of Electrical and Electronics Engineers, Inc., Feb. 1992.
705
ATBL
63
Figure 6.
The ATBL register is an aliased name for the ATB. The Alternate Time Base increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (264 - 1). At the next increment, its value becomes 0x0000_0000_0000_0000. There is no explicit indication (such as an interrupt; see Book III) that this has occurred. The Alternate Time Base is accessible in both user and supervisor mode. The counter can be read by executing a mfspr instruction specifying the ATB (or ATBL) register, but cannot be written. A second SPR register ATBU, is defined that accesses only the upper 32 bits of the counter. Thus the upper 32 bits of the counter may be read into a register by reading the ATBU register. The effect of entering a power-savings mode or of processor frequency changes on counting in the Alternate Time Base is implementation-dependent.
706
Power ISA II
The Decorated Storage facility provides Load, Store, and Notify operations to storage that have additional semantics other than the reading and writing of data values to the addressed storage locations. A decoration is specified that provides semantic information about how the operation is to be performed. A decorated device is a device that implements an address range of storage, and applies decorations to operations performed on the address range of storage. A Decorated Storage instruction specifies the following attributes: The type of access, which is either a Decorated Load, Decorated Store, or a Decorated Notify. The EA in register RB, to which the operation is to be performed. The decoration in register RA, which further defines what operation should be performed by the decorated device. The data itself, either data provided by the processor to the decorated device (in the case of a Decorated Store), or the data provided by the decorated device to be consumed by the processor (in the case of a Decorated Load). Decorated Notify operations do not contain data. The semantics of any Decorated Storage operation that is Caching Inhibited are defined by the decorated device depending on whether it is a Decorated Load, Decorated Store, or Decorated Notify, and the value supplied as a decoration. Such semantics may differ from decorated device to decorated device similar to how devices other than well-behaved memory may treat Load and Store operations. The semantics of any operation associated with a Decorated Storage operation that is not Caching Inhibited are the same as an analogous Load or Store instruction of the same data size. The results of a Decorated Storage operation that is Caching Inhibited to a device that does not support decorations is boundedly undefined. The results of a Load or Store operation that is Caching Inhibited to a
decorated device that requires a decoration is boundedly undefined. For Decorated Load operations, a Load operation with the specified decoration is performed to the EA and the data provided by the decorated device is placed in the target register. For Decorated Store operations, a Store operation using the data specified in the source register with the specified decoration is performed to the EA. Decorated Load instructions are treated as Load instructions for address translation, access control, debug events, storage attributes, alignment, and memory access ordering. Decorated Store instructions are treated as Store instructions for address translation, access control, debug events, storage attributes, alignment, and memory access ordering. A Decorated Notify instruction is treated as a zero byte Store for address translation, access control, debug events, storage attributes, alignment, and memory access ordering. Programming Note Software should be acutely aware of how transactions to a decorated device that implements Decorated Storage will occur. Not only does this imply knowing the particular decorated devices semantics, but also ensuring that the transactions are appropriately issued by the processor. This includes alignment, speculative accesses, and ordering. In general, Caching Inhibited accesses are required to be Guarded and properly aligned.
707
X-
RT,RA,RB RT
11
RA
16
RB
21
515
/
31 0
31
RA
16
RB
21
611
/
31
EA RT
(RB) 56 0 || MEM_DECORATED(EA,1,(RA))
EA RT
(RB) MEM_DECORATED(EA,8,(RA))
Let the effective address (EA) be the contents of RB. The byte in storage addressed by EA is loaded using the decoration supplied by (RA) into RT56:63. RT0:55 are set to 0. Special Registers Altered: None
Let the effective address (EA) be the contents of RB. The doubleword in storage addressed by EA is loaded using the decoration supplied by (RA) into RT. Special Registers Altered: None
X-form
RT,RA,RB 31 RT
11
RA
803
/
31
RA
16
RB
21
547
/
31
EA RT
EA FRT
(RB) MEM_DECORATED(EA,8,(RA))
Let the effective address (EA) be the contents of RB. The halfword in storage addressed by EA is loaded using the decoration supplied by (RA) into RT48:63. RT0:47 are set to 0. Special Registers Altered: None
Let the effective address (EA) be the contents of RB. The doubleword in storage addressed by EA is loaded using the decoration supplied by (RA) into FRT. Special Registers Altered: None
X-
RT,RA,RB RT
11
RA
16
RB
21
579
/
31
EA RT
(RB) 32 0 || MEM_DECORATED(EA,4,(RA))
Let the effective address (EA) be the contents of RB. The word in storage addressed by EA is loaded using the decoration supplied by (RA) into RT32:63. RT0:31 are set to 0. Special Registers Altered: None
708
Power ISA II
X-
RS,RA,RB RS
11
RA
16
RB
21
643
/
31 0
31
RA
16
RB
21
739
/
31
EA (RB) MEM_DECORATED(EA,1,(RA))
(RS)56:63
EA (RB) MEM_DECORATED(EA,8,(RA))
(RS)
Let the effective address (EA) be the contents of RB. (RS)56:63 are stored to the byte in storage addressed by EA using the decoration supplied by (RA). Special Registers Altered: None
Let the effective address (EA) be the contents of RB. (RS) is stored to the doubleword in storage addressed by EA using the decoration supplied by (RA). Special Registers Altered: None
RS,RA,RB RS
11
X-form
RA
16
RB
21
675
stfddx 31
6
FRS,RA,RB FRS
11
RA
931
/
31
EA (RB) MEM_DECORATED(EA,2,(RA))
(RS)48:63
Let the effective address (EA) be the contents of RB. (RS)48:63 are stored to the halfword in storage addressed by EA using the decoration supplied by (RA). Special Registers Altered: None
EA (RB) MEM_DECORATED(EA,8,(RA))
(FRS)
Let the effective address (EA) be the contents of RB. (FRS) is stored to the doubleword in storage addressed by EA using the decoration supplied by (RA). Special Registers Altered: None
X-
RS,RA,RB RS
11
RA
16
RB
21
707
/
31
EA (RB) MEM_DECORATED(EA,4,(RA))
(RS)32:63
Let the effective address (EA) be the contents of RB. (RS)32:63 are stored to the word in storage addressed by EA using the decoration supplied by (RA). Special Registers Altered: None
709
X-form
RA,RB ///
11
RA
16
RB
21
483
/
31
EA (RB) MEM_NOTIFY(EA,(RA)) Let the effective address (EA) be the contents of RB. The decoration supplied by (RA) is sent to the address in storage specified by EA. Special Registers Altered: None
710
Power ISA II
Computes an effective address (EA) like most X-form instructions Validates the EA as would be done for a load from that address Translates the EA to a real address Transmits the real address to the device Accepts a word of data from the device and places it into a General Purpose Register
External Control Out Word Indexed (ecowx), which does the following:
Computes an effective address (EA) like most X-form instructions Validates the EA as would be done for a store to that address Translates the EA to a real address Transmits the real address and a word of data from a General Purpose Register to the device
Permission to execute these instructions and identification of the target device are controlled by two fields, called the E bit and the RID field respectively. If attempt is made to execute either of these instructions when E=0 the system data storage error handler is invoked. The location of these fields is described in Book III. The storage access caused by eciwx and ecowx is performed as though the specified storage location is Caching Inhibited and Guarded, and is neither Write Through Required nor Memory Coherence Required. Interpretation of the real address transmitted by eciwx and ecowx and of the 32-bit value transmitted by ecowx is up to the target device, and is not specified by the Power ISA. See the System Architecture documentation for a given Power ISA system for details on how the External Control facility can be used with devices on that system.
Example
An example of a device designed to be used with the External Control facility might be a graphics adapter.
711
31
0 6
RT
11
RA
16
RB
21
310
/
31
(RA) else b b + (RB) EA address translation of EA raddr send store word request for raddr to device identified by RID send (RS)32:63 to device Let the effective address (EA) be the sum (RA|0)+(RB). A store word request for the real address corresponding to EA and the contents of RS32:63 are sent to the device identified by RID, bypassing the cache. The E bit must be 1. If it is not, the data storage error handler is invoked. EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. This instruction is treated as a Store, except that its storage access is not performed in program order with respect to accesses to other Caching Inhibited and Guarded storage locations unless software explicitly imposes that order. See Book III-S for additional information about this instruction. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) address translation of EA raddr send load word request for raddr to device identified by RID 320 || word from device RT Let the effective address (EA) be the sum (RA|0)+(RB). A load word request for the real address corresponding to EA is sent to the device identified by RID, bypassing the cache. The word returned by the device is placed into RT32:63. RT0:31 are set to 0. The E bit must be 1. If it is not, the data storage error handler is invoked. EA must be a multiple of 4. If it is not, either the system alignment error handler is invoked or the results are boundedly undefined. This instruction is treated as a Load. See Book III-S for additional information about this instruction. Special Registers Altered: None Programming Note The eieio<S> or mbar<E> instruction can be used to ensure that the storage accesses caused by eciwx and ecowx are performed in program order with respect to other Caching Inhibited and Guarded storage accesses.
31
0 6
RS
11
RA
16
RB
21
438
/
31
if RA = 0 then b
712
Power ISA II
713
714
Power ISA II
An atomic read/modify/write operation reads a storage location and writes its next value, which may be a function of its current value, all as a single atomic operation. The examples shown provide the effect of an atomic read/modify/write operation, but use several instructions rather than a single atomic instruction.
715
#load and reserve #1st 2 operands equal? #skip if not #store new value if still resved #loop if lost reservation #return value from storage
r5,0,r3 #load and reserve r0,r4,r5#AND word r0,0,r3 #store new value if still resved loop #loop if lost reservation
1. The semantics given for Compare and Swap above are based on those of the IBM System/370 Compare and Swap instruction. Other architectures may define a Compare and Swap instruction differently. 2. Compare and Swap is shown primarily for pedagogical reasons. It is useful on machines that lack the better synchronization facilities provided by lwarx and stwcx.. A major weakness of a System/370-style Compare and Swap instruction is that, although the instruction itself is atomic, it checks only that the old and current values of the word being tested are equal, with the result that programs that use such a Compare and Swap to control a shared resource can err if the word has been modified and the old value subsequently restored. The sequence shown above has the same weakness. 3. In some applications the second bne- instruction and/or the mr instruction can be omitted. The bne- is needed only if the application requires that if the EQ bit of CR Field 0 on exit indicates not equal then (r4) and (r6) are in fact not equal. The mr is needed only if the application requires that if the comparands are not equal then the word from storage is loaded into the register with which it was compared (rather than into a third register). If either or both of these instructions is omitted, the resulting Compare and Swap does not obey System/370 semantics.
#load and reserve #done if word not equal to 0 #try to store non-0 #loop if lost reservation
716
Power ISA II
quent isync create an import barrier that prevents the load from data1 from being performed until the branch has been resolved not to be taken. If the shared data structure is in storage that is neither Write Through Required nor Caching Inhibited, an lwsync instruction can be used instead of the isync instruction. If lwsync is used, the load from data1 may be performed before the stwcx.. But if the stwcx. fails, the second branch is taken and the lwarx is reexecuted. If the stwcx. succeeds, the value returned by the load from data1 is valid even if the load is performed before the stwcx., because the lwsync ensures that the load is performed after the instance of the lwarx that created the reservation used by the successful stwcx..
#load lock and reserve #skip ahead if # lock not free #try to set lock #loop if lost reservation #import barrier r7,data1(r9)#load shared data #wait for lock to free
r5,0,r3 #load pointer and reserve r0,r4,r5#increment the pointer r0,0,r3 #try to store new value loop #loop if lost reservation r7,data1(r5) #load shared data
The hint provided with lwarx indicates that after the program acquires the lock variable (i.e. stwcx. is successful), it will release it (i.e. store to it) prior to another program attempting to modify it. The second bne- does not complete until CR0 has been set by the stwcx.. The stwcx. does not set CR0 until it has completed (successfully or unsuccessfully). The lock is acquired when the stwcx. completes successfully. Together, the second bne- and the subse-
The load from data1 cannot be performed until the pointer value has been loaded into GPR 5 by the lwarx. The load from data1 may be performed before the stwcx.. But if the stwcx. fails, the branch is taken and the value returned by the load from data1 is discarded. If the stwcx. succeeds, the value returned by the load from data1 is valid even if the load is performed before the stwcx., because the load uses the pointer value returned by the instance of the lwarx that created the reservation used by the successful stwcx.. An isync instruction could be placed between the bneand the subsequent lwz, but no isync is needed if all accesses to the shared data structure depend on the value returned by the lwarx.
717
The lwsync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the lwsync have been performed with respect to that processor.
An alternative is to use a technique similar to that described in Section B.2.1.2, by causing the stw to depend on the value returned by the lwz and omitting the cmpw and bne-. The dependency could be created by ANDing the value returned by the lwz with zero and then adding the result to the value to be stored by the stw. If both storage operands are in storage that is neither Write Through Required nor Caching Inhibited, another alternative is to replace the cmpw and bnewith an lwsync instruction.
The sync ensures that the store that releases the lock will not be performed with respect to any other processor until all stores caused by instructions preceding the sync have been performed with respect to that processor.
718
Power ISA II
B.4 Notes
1. To increase the likelihood that forward progress is made, it is important that looping on lwarx/stwcx. pairs be minimized. For example, in the Test and Set sequence shown in Section B.1, this is achieved by testing the old value before attempting the store; were the order reversed, more stwcx. instructions might be executed, and reservations might more often be lost between the lwarx and the stwcx. 2. The manner in which lwarx and stwcx. are communicated to other processors and mechanisms, and between levels of the storage hierarchy within a given processor, is implementation-dependent. In some implementations performance may be improved by minimizing looping on a lwarx instruction that fails to return a desired value. For example, in the Test and Set sequence shown in Section B.1, if the programmer wishes to stay in the loop until the word loaded is zero, he could change the bne- exit to bne- loop. However, in some implementations better performance may be obtained by using an ordinary Load instruction to do the initial checking of the value, as follows. loop: lwz r5,0(r3)#load the word cmpwi r5,0 #loop back if word bne- loop # not equal to 0 lwarx r5,0,r3 #try again, reserving cmpwi r5,0 # (likely to succeed) bne- loop stwcx.r4,0,r3 #try to store non-0 bne- loop #loop if lost reservn 3. In a multiprocessor, livelock is possible if there is a Store instruction (or any other instruction that can clear another processors reservation; see Section 1.7.3.1) between the lwarx and the stwcx. of a lwarx/stwcx. loop and any byte of the storage location specified by the Store is in the reservation granule. For example, the first code sequence shown in Section B.3 can cause livelock if two list elements have next element pointers in the same reservation granule.
r2,0,r3 #get next pointer r2,0(r4)#store in new element or sync #order stw before stwcx r4,0,r3 #add new element to list loop #loop if stwcx. failed
In the preceding example, if two list elements have next element pointers in the same reservation granule then, in a multiprocessor, livelock can occur. (Livelock is a state in which processors interact in a way such that no processor makes forward progress.) If it is not possible to allocate list elements such that each elements next element pointer is in a different reservation granule, then livelock can be avoided by using the following, more complicated, sequence. lwz loop1: mr stw sync loop2: lwarx cmpw bnestwcx. bner2,0(r3)#get next pointer r5,r2 #keep a copy r2,0(r4)#store in new element #order stw before stwcx. and before lwarx r2,0,r3 r2,r5 loop1 r4,0,r3 loop2 #get it again #loop if changed (someone # else progressed) #add new element to list #loop if failed
In the preceding example, livelock is avoided by the fact that each processor re-executes the stw only if some other processor has made forward progress.
719
720
Power ISA II
Book III-S: Power ISA Operating Environment Architecture - Server Environment [Category: Server]
721
722
Chapter 1. Introduction
1.1 Overview. . . . . . . . . . . . . . . . . . . . 1.2 Document Conventions . . . . . . . . 1.2.1 Definitions and Notation . . . . . . 1.2.2 Reserved Fields. . . . . . . . . . . . . 1.3 General Systems Overview . . . . . 723 723 723 724 724 1.4 Exceptions. . . . . . . . . . . . . . . . . . . 1.5 Synchronization. . . . . . . . . . . . . . . 1.5.1 Context Synchronization . . . . . . 1.5.2 Execution Synchronization . . . . . 725 725 725 726
1.1 Overview
Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.
real page A unit of real storage that is aligned at a boundary that is a multiple of its size. The real page size is 4KB. context of a program The state (e.g., privilege and relocation) in which the program executes. The context is controlled by the contents of certain System Registers, such as the MSR and SDR1, of certain lookaside buffers, such as the SLB and TLB, and of the Page Table.
Chapter 1. Introduction
723
A System Reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a System Reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.) A context-altering instruction is executed (Chapter 10. Synchronization Requirements for Context Alterations on page 845). The context alteration need not take effect until the required subsequent synchronizing operation has occurred. A Reference and Change bit is updated by the thread. The update need not be performed with respect to that thread until the required subsequent synchronizing operation has occurred. A Branch instruction is executed and the branch is taken. The update of the ComeFrom Address Register<S> (see Section 8.1.1 of Book III-S) need not occur until a subsequent context synchronizing operation has occurred.
must If hypervisor software violates a rule that is stated using the word must (e.g., this field must be set to 0), and the rule pertains to the contents of a hypervisor resource, to executing an instruction that can be executed only in hypervisor state, or to accessing storage in real addressing mode, the results are undefined, and may include altering resources belonging to other partitions, causing the system to hang, etc. hardware Any combination of hard-wired implementation, emulation assist, or interrupt for software assistance. In the last case, the interrupt may be to an
724
1.4 Exceptions
The following augments the exceptions defined in Book I that can be caused directly by the execution of an instruction: the execution of a floating-point instruction when MSRFP=0 (Floating-Point Unavailable interrupt) an attempt to modify a hypervisor resource when the thread is in privileged but non-hypervisor state (see Chapter 2), or an attempt to execute a hypervisor-only instruction (e.g., tlbie) when the thread is in privileged but non-hypervisor state the execution of a traced instruction (Trace interrupt) the execution of a Vector instruction when the vector facility is unavailable (Vector Unavailable interrupt)
1.5 Synchronization
The synchronization described in this section refers to the state of the thread that is performing the synchronization.
Chapter 1. Introduction
725
726
2.1 Overview
The Logical Partitioning (LPAR) facility permits threads and portions of real storage to be assigned to logical collections called partitions, such that a program executing on a thread in one partition cannot interfere with any program executing on a thread in a different partition. This isolation can be provided for both problem state and privileged state programs, by using a layer of trusted software, called a hypervisor program (or simply a hypervisor), and the resources provided by this facility to manage system resources. (A hypervisor is a program that runs in hypervisor state; see below.)
The number of partitions supported is implementationdependent. A thread is assigned to one partition at any given time. A thread can be assigned to any given partition without consideration of the physical configuration of the system (e.g., shared registers, caches, organization of the storage hierarchy), except that threads that share certain hypervisor resources may need to be assigned to the same partition; see Section 2.7. The registers and facilities used to control Logical Partitioning are listed below and described in the following subsections. Except in the following subsections, references to the operating system in this document include the hypervisor unless otherwise stated or obvious from context.
VRMASD
RMLS
PECE
DPFD
MER
ILE
VC
12
17
34
38 39
49
52 5354 55
TC
///
///
///
///
///
LPES
///
60 62 63
Figure 1.
Logical Partitioning Control Register Bit 0:2 Description Virtualization Control (VC)
The contents of the LPCR control a number of aspects of the operation of the thread with respect to a logical partition. Below are shown the bit definitions for the LPCR.
727
HDICE
Ignore SLB Large Page Specification (ISL) Controls whether ISL mode is enabled as specified below. 0 - ISL mode disabled 1 - ISL mode enabled When ISL mode is enabled and address translation is enabled and the thread is not in hypervisor state, address translation is performed as if the contents of SLBL||LP were 0b000. When address translation is disabled, the setting of the ISL bit has no effect. ISL mode has no effect on SLB, TLB, and ERAT entry invalidations caused by slbie, slbia, tlbia, tlbie, and slbie.
50
3:8 9:11
Reserved Default Prefetch Depth (DPFD) The DPFD field is used as the default prefetch depth for data stream prefetching when DSCRDPFD=0; see page 678.
12:16
Virtual Real Mode Area Segment Descriptor (VRMASD) When address translation is disabled and VPM0=1, the contents of this field specify the L and LP fields of the segment descriptor that apply for storage references to the virtualized real mode area (VRMA). See Section 5.7.3.4 for additional information. The definitions and allowed values of the L and LP fields are the same as for the corresponding fields in the segment descriptor. (See Section 5.7.7.) If
51
728
The exception effects of this bit are said to be consistent with the contents of this bit if one of the following statements is true. - LPCRMER = 1 and a Mediated External exception exists. - LPCRMER = 0 and a Mediated External exception does not exist. A context synchronizing instruction or event that is executed or occurs when LPCRMER = 0 ensures that the exception effects of LPCRMER are consistent with the contents of LPCRMER. Otherwise, when an instruction changes the contents of LPCRMER, the exception effects of LPCRMER become consistent with the new contents of LPCRMER reasonably soon after the change. Programming Note LPCRMER provides a means for the hypervisor to direct an external exception to a partition independent of the partition's MSREE setting. (When MSREE=0, it is inappropriate for the hypervisor to deliver the exception.) Using LPCRMER, the partition can be interrupted upon enabling external interrupts. Without using LPCRMER, the hypervisor must check the state of MSREE whenever it gets control, which will result in less timely delivery of the exception to the partition. 53 54 Reserved Translation Control (TC) 0 1 55:59 60:61 The secondary Page Table search is enabled. The secondary Page Table search is disabled.
See Section 5.7.3 on page 765 (including subsections) and Section 5.7.9 on page 780 for a description of how storage accesses are affected by the setting of LPES1, and RMLS. See Section 6.5 on page 814 for a description of how the setting of LPES0:1 affects the processing of interrupts.
RMO
63
Name RMO
Reserved Logical Partitioning Environment Selector (LPES) Three of the four LPES values are supported. The 0b10 value is reserved.
All other fields are reserved. The supported RMO values are the non-negative multiples of 2s, where 2s is the smallest implementationdependent limit value representable by the contents of the Real Mode Limit Selector field of the LPCR.
60
LPES0
729
Programming Note On some implementations, software must prevent the execution of a tlbie instruction with an LPID operand value which matches the contents of another threads LPIDR that is being modified or is the same as the new value being written to the LPIDR. This restriction can be met with less effort if one partition identity is used only on threads on which no tlbie instruction is ever executed. This partition can be thought of as the transfer partition used exclusively to move a thread from one partition to another.
HRMO
63
Name HRMO
All other fields are reserved. The supported HRMO values are the non-negative multiples of 2r, where r is an implementation-dependent value and 12 r 26. The contents of the HRMOR affect how some storage accesses are performed as described in Section 5.7.3 on page 765 and Section 5.7.4 on page 768.
v2.05
VEC
VSX
//
62 63
Figure 5.
Starting from bit 0, high-order PCR bits are assigned to control the availability of certain categories. Starting from bit 63, low-order PCR bits are assigned to control the availability of resources that are new in a specified version of the Architecture. These low-order bits have the possibility of subsetting the function provided by a category for which availability is controlled by the highorder bits. For example, if a new function is added to Vector category in V 2.06, the V 2.05 bit and the VEC bits may be used together to enable a version of the Vector category that was available in V 2.05. Each defined bit in the PCR controls whether certain instructions, SPRs, and other related facilities are available in problem state. Except as specified elsewhere in this section, the PCR has no effect on facilities when the thread is not in problem state. Facilities that are made unavailable by the PCR are treated as follows when the thread is in problem state.
Bits 32:63
Name LPID
Figure 4.
The contents of the LPIDR identify the partition to which the thread is assigned, affecting operations necessary to manage the coherency of some translation lookaside buffers (see Section 5.10.1 and Chapter 10.). The number of LPIDR bits supported is implementation-dependent.
Instructions are treated as illegal instructions, SPRs are treated as if they were not defined for the implementation Fields in instructions are treated as if they were 0s.
A PCR bit may also determine how an instruction field value is interpreted or may define other behavior as specified in the bit definitions below. The PCR has no effect on the setting of the MSR and [H]SRR1 by interrupts, and by the [h]rfid and mtmsr[d] instructions, except as specified elsewhere in this section.
730
AMR access using SPR 13 addg6s bperm cdtbcd, cbcdtd dcffix[.] divde[o][.], divdeu[o][.], divwe[o][.], divweu[o][.] isel lfiwzx [Category: Floating-Point: PhasedIn] fctidu[.], fctiduz[.], fctiwu[.], fctiwuz[.], fcfids[.], fcfidu[.], fcfidus[.], ftdiv, ftsqrt [Category: Floating-Point: Phased-In] ldbrx, stdbrx [Category: 64-bit] popcntw, popcntd The listed instructions, facilities, and behaviors are available in the privilege states specified above. The listed instructions, facilities, and behaviors are unavailable in the privilege states specified above. Programming Note The definition of the VSX bit makes facilities in the VSX category unavailable in problem state when the v2.05 bit is set to 1 since these facilities were not defined in Version 2.05. See the VSX bit definition for additional information.
Reserved
731
Programming Note Because the PCR has no effect on privileged instructions except as specified above, privileged instructions that are available on newer implementations but not available on older implementations will behave differently when the thread is in problem state. On older implementations, either an Illegal Instruction type Program interrupt or a Hypervisor Emulation Assistance interrupt will occur because the instruction is undefined; on newer implementations, a Privileged Instruction type Program interrupt will occur because the instruction is implemented. (On older implementations the interrupt will be an Illegal Instruction type Program interrupt if the implementation complies with a version of the architecture that precedes V. 2.05, or complies with V. 2.05 and does not support the Hypervisor Emulation Assistance category, and will be a Hypervisor Emulation Assistance interrupt otherwise.)
732
733
734
Programming Note The privilege state of the thread is determined by MSRHV and MSRPR, as follows. HV PR 0 0 1 1 0 1 0 1 privileged problem hypervisor problem
Hypervisor state is also a privileged state (MSRPR = 0). All references to privileged state in the Books include hypervisor state unless otherwise stated or if it is obvious from the context. MSRHV can be set to 1 only by the System Call instruction and some interrupts. It can be set to 0 only by rfid and hrfid. 4:37 38 Reserved Vector Available (VEC) [Category: Vector] 0 The thread cannot execute any vector instructions, including vector loads, stores, and moves. The thread can execute vector instructions.
Figure 6.
Below are shown the bit definitions for the Machine State Register. Bit 0 Description Sixty-Four-Bit Mode (SF) 0 1 1:2 3 The thread is in 32-bit mode. The thread is in 64-bit mode. 39
Reserved
Reserved Hypervisor State (HV) 0 1 The thread is not in hypervisor state. If MSRPR=0 the thread is in hypervisor state; otherwise the thread is not in hypervisor state.
735
1 41:46 47 48
Reserved Reserved External Interrupt Enable (EE) 0 1 External and Decrementer interrupts are disabled. External and Decrementer interrupts are enabled.
This bit also affects whether Hypervisor Decrementer and Hypervisor Maintenance interrupts are enabled; see Section 6.5.12 on page 824 and Section 6.2.9 on page 809. 49 Problem State (PR) 0 1 The thread is in privileged state. The thread is in problem state. Programming Note Any instruction that sets MSRPR to 1 also sets MSREE, MSRIR, and MSRDR to 1. 50 Floating-Point Available (FP) [Category: Floating-Point] 0 The thread cannot execute any floatingpoint instructions, including floating-point loads, stores, and moves. The thread can execute floating-point instructions.
Branch tracing need not be supported on all implementations that support the Trace category. If the function is not implemented, this bit is treated as reserved. 55 Floating-Point Exception Mode 1 (FE1) [Category: Floating-Point] See below. 56:57 58 Reserved Instruction Relocate (IR) 0 1 Instruction address translation is disabled. Instruction address translation is enabled. Programming Note See the Programming Note in the definition of MSRPR. 59 Data Relocate (DR) 0 Data address translation is disabled. Effective Address Overflow (EAO) (see Book I) does not occur. Data address translation is enabled. EAO causes a Data Storage interrupt. Programming Note See the Programming Note in the definition of MSRPR. 60 61 Reserved Performance Monitor Mark (PMM) [Category: Server.Performance Monitor] See Appendix B of Book III-S.
1 51
Machine Check Interrupt Enable (ME) 0 1 Machine Check interrupts are disabled. Machine Check interrupts are enabled.
This bit is a hypervisor resource; see Chapter 2., Logical Partitioning (LPAR), on page 727. Programming Note The only instructions that can alter MSRME are rfid and hrfid.
52
62
53
Single-Step Trace Enable (SE) [Category: Trace] 0 1 The thread executes instructions normally. The thread generates a Single-Step type Trace interrupt after successfully completing the execution of the next instruction, unless that instruction is hrfid or rfid,
Additional information about the use of this bit is given in Sections 6.4.3, Interrupt Processing on page 811, 6.5.1, System Reset Interrupt on page 815, and 6.5.2, Machine Check Interrupt on page 816. 63 Little-Endian Mode (LE) 0 The thread is in Big-Endian mode.
736
737
System Call
sc 17
0 6
SC-form
Programming Note sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form the LEV operand is omitted and assumed to be 0.
LEV ///
11
///
16
//
20
LEV
//
27
30 31
SRR0 iea CIA + 4 0 SRR133:36 42:47 MSR0:32 37:41 48:63 SRR10:32 37:41 48:63 MSR new_value (see below) 0x0000_0000_0000_0C00 NIA The effective address of the instruction following the System Call instruction is placed into SRR0. Bits 0:32, 37:41, and 48:63 of the MSR are placed into the corresponding bits of SRR1, and bits 33:36 and 42:47 of SRR1 are set to zero. Then a System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 6.5, Interrupt Definitions on page 814. The setting of the MSR is affected by the contents of the LEV field. LEV values greater than 1 are reserved. Bits 0:5 of the LEV field (instruction bits 20:25) are treated as a reserved field. The interrupt causes the next instruction to be fetched from effective address 0x0000_0000_0000_0C00. This instruction is context synchronizing. Special Registers Altered: SRR0 SRR1 MSR Programming Note If LEV=1 the hypervisor is invoked. If LPES1=1, executing this instruction with LEV=1 is the only way that executing an instruction can cause hypervisor state to be entered. Because this instruction is not privileged, it is possible for application software to invoke the hypervisor. However, such invocation should be considered a programming error.
738
///
11
///
16
///
21
18
/
31 0
19
6
///
11
///
16
///
21
274
/
31
(MSR3 & SRR151) | ((MSR3) & MSR51) MSR51 MSR3 & SRR13 MSR3 MSR48 SRR148 | SRR149 SRR158 | SRR149 MSR58 MSR59 SRR159 | SRR149 MSR0:2 4:32 37:41 49:50 52:57 60:63 SRR10:2 4:32 37:41 49:50 52:57 60:63 NIA iea SRR00:61 || 0b00 If MSR3=1 then bits 3 and 51 of SRR1 are placed into the corresponding bits of the MSR. The result of ORing bits 48 and 49 of SRR1 is placed into MSR48. The result of ORing bits 58 and 49 of SRR1 is placed into MSR58. The result of ORing bits 59 and 49 of SRR1 is placed into MSR59. Bits 0:2, 4:32, 37:41, 49:50, 52:57, and 60:63 of SRR1 are placed into the corresponding bits of the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR00:61 || 0b00 (when SF=1 in the new MSR value) or 320 || SRR032:61 || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred. This instruction is privileged and context synchronizing. Special Registers Altered: MSR Programming Note If this instruction sets MSRPR to 1, it also sets MSREE, MSRIR, and MSRDR to 1.
HSRR148 | HSRR149 MSR48 MSR58 HSRR158 | HSRR149 MSR59 HSRR159 | HSRR149 MSR0:32 37:41 49:57 60:63 HSRR10:32 37:41 49:57 60:63 NIA iea HSRR00:61 || 0b00 The result of ORing bits 48 and 49 of HSRR1 is placed into MSR48. The result of ORing bits 58 and 49 of HSRR1 is placed into MSR58. The result of ORing bits 59 and 49 of HSRR1 is placed into MSR59. Bits 0:32, 37:41, 49:57, and 60:63 of HSRR1 are placed into the corresponding bits of the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address HSRR00:61 || 0b00 (when SF=1 in the new MSR value) or 320 || HSRR032:61 || 0b00 (when SF=0 in the new MSR value). If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or HSRR0 by the interrupt processing mechanism (see Section 6.4.3) is the address of the instruction that would have been executed next had the interrupt not occurred. This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: MSR Programming Note If this instruction sets MSRPR to 1, it also sets MSREE, MSRIR, and MSRDR to 1.
739
740
XL-form
Nap
nap
XL-form
///
11
///
16
///
21
402
31
/
0
19
6
///
11
///
16
///
21
434
31
The thread is placed into doze power-saving level. When the thread is in doze power-saving level, the state of all thread resources is maintained as if the thread was not in power-saving mode. When the interrupt that causes exit from doze powersaving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost. This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: None
The thread is placed into nap power-saving level. When the thread is in nap power-saving level, the state of the Decrementer and all hypervisor resources is maintained as if the thread was not in power-saving mode, and sufficient information is maintained to allow the hypervisor to resume execution. When the interrupt that causes exit from nap powersaving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost. This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: None Programming Note If the state of the Decrementer were not maintained and updated as if the thread was not in power-saving mode, Decrementer exceptions would not reliably cause exit from nap power-saving level even if Decrementer exceptions were enabled to cause exit.
741
XL-form
XL-form
///
11
///
16
///
21
466
31
/
0
19
6
///
11
///
16
///
21
498
31
The thread is placed into sleep power-saving level. When the thread is in sleep power-saving level, the state of all resources may be lost except for the HRMOR. When the interrupt that causes exit from sleep powersaving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost. This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: None Programming Note If the state of the Decrementer is not maintained and updated, in sleep or rvwinkle power-saving level, as if the thread was not in power-saving mode, Decrementer exceptions will not reliably cause exit from power-saving mode even if Decrementer exceptions are enabled to cause exit.
The thread is placed into rvwinkle power-saving level. When the thread is in rvwinkle power-saving level, the state of all resources may be lost except for the HRMOR. When the interrupt that causes exit from rvwinkle power-saving level occurs, resource state is as described in the preceding paragraph, except that if the exception that caused the exit is a System Reset, Machine Check, or Hypervisor Maintenance exception, resource state that would be lost if the exception occurred when the thread was not in power-saving mode may be lost. This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: None Programming Note In the short story by Washington Irving, Rip Van Winkle is a man who fell asleep on a green knoll and awoke twenty years later.
Note Note See the Notes that appear in the rvwinkle instruction description. See the Notes that appear in the sleep instruction description.
742
loop: cmp Rx,Rx /* create dependency */ bne loop nap/doze/sleep/rvwinkle /* enter power- */ /* saving mode */ b $ /* branch to self */ After the thread has entered power-saving mode as specified above, various exceptions may cause exit from power-saving mode. The exceptions include, System Reset, Machine Check, Decrementer, External, Hypevisor Maintenance, and implementation-specific exceptions. Upon exit from power-saving mode, if the exception was a Machine Check exception, then a Machine Check interrupt occurs; otherwise a System Reset interrupt occurs, and the contents of SRR1 indicate the type of exception that caused exit from powersaving mode. See Section 6.5.1 for additional information.
743
744
The PVR distinguishes between implementations that differ in attributes that may affect software. It contains two fields. Version A 16-bit number that identifies the version of the implementation. Different version numbers indicate major differences between implementations, such as which categories are supported. A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between implementations having the same version number, such as clock rate and Engineering Change level.
Revision
Version numbers are assigned by the Power ISA process. Revision numbers are assigned by an implementation-defined process.
Revision
63
Figure 7.
745
Bits 0:31
Name PROCID
Description Thread ID
Figure 8.
RUN
63
The means by which the PIR is initialized are implementation-dependent. The PIR is a hypervisor resource; see Chapter 2. Bit 63 Name RUN
The CTRL RUN can be used by the operating system to indicate when the thread is doing useful work. The contents of the CTRL can be written by the mtspr instruction and read by the mfspr instruction. Write access to the CTRL is privileged. Reads can be performed in privileged or problem state.
746
Figure 10. Software-use SPRs SPRG0, SPRG1, and SPRG2 are privileged registers. SPRG3 is a privileged register except that the contents may be copied to a GPR in Problem state when accessed using the mfspr instruction. Programming Note Neither the contents of the SPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the thread. One or more of the registers is likely to be needed by non-hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per thread save areas). Operating systems must ensure that no sensitive data are left in SPRG3 when a problem state program is dispatched, and operating systems for secure systems must ensure that SPRG3 cannot be used to implement a covert channel between problem state programs. These requirements can be satisfied by clearing SPRG3 before passing control to a program that will run in problem state. HSPRG0 and HSPRG1 are 64-bit registers provided for use by hypervisor programs. HSPRG0 HSPRG1
0 63
Figure 11. SPRs for use by hypervisor programs Programming Note Neither the contents of the HSPRGs, nor accessing them using mtspr or mfspr, has a side effect on the operation of the thread. One or more of the registers is likely to be needed by hypervisor interrupt handler programs (e.g., as scratch registers and/or pointers to per thread save areas).
747
748
RT,RA,RB RT
11
RA
16
RB
21
853
/
31 0
31
RA
16
RB
21
821
/
31
if RA = 0 then b 0 (RA) else b b + (RB) EA 560 || MEM(EA, 1) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA 480 || MEM(EA, 2) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
RT,RA,RB RT
11
RA
16
RB
21
789
/
31 0
31
RA
16
RB
21
885
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) 320 || MEM(EA, 4) RT Let the effective address (EA) be the sum (RA|0)+ (RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
0 (RA)
Let the effective address (EA) be the sum (RA|0)+ (RB). The doubleword in storage addressed by EA is loaded into RT. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
749
RS,RA,RB RS
11
RA
16
RB
21
981
/
31 0
31
RA
16
RB
21
949
/
31
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 1) (RS)56:63 Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)56:63 are stored into the byte in storage addressed by EA. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b b + (RB) EA MEM(EA, 2) (RS)48:63 Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)48:63 are stored into the halfword in storage addressed by EA. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
RS,RA,RB RS
11
RA
16
RB
21
917
/
31 0
31
RA
16
RB
21
1013
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) (RS)32:63 MEM(EA, 4) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS)32:63 are stored into the word in storage addressed by EA. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
if RA = 0 then b 0 (RA) else b EA b + (RB) (RS) MEM(EA, 8) Let the effective address (EA) be the sum (RA|0)+ (RB). (RS) is stored into the doubleword in storage addressed by EA. The storage access caused by this instruction is performed as though the specified storage location is Caching Inhibited and Guarded. This instruction is hypervisor privileged. Special Registers Altered: None
750
4.4.2 Fixed-Point Load and Store Quadword Instructions [Category: Load/ Store Quadword]
Load Quadword
lq 56
0 6
DQ-form
Store Quadword
stq RSp,DS(RA) 62
0 6
DS-form
RTp,DQ(RA) RTp
11
RA
16
DQ
///
28 31
RSp
11
RA
16
DS
2
30 31
if RA = 0 then b 0 else b (RA) b + EXTS(DQ || 0b0000) EA MEM(EA, 8) RTp Let the effective address (EA) be the sum (RA|0)+ (DQ||0b0000). The quadword in storage addressed by EA is loaded into register pair RTp. EA must be a multiple of 16. If it is not, an Alignment interrupt occurs. If RTp is odd or RTp=RA, the instruction form is invalid. If RTp=RA, an attempt to execute this instruction causes an Illegal Instruction type Program interrupt. (The RTp=RA case includes the case of RTp=RA=0.) This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined. This instruction is privileged.
if RA = 0 then b 0 else b (RA) b + EXTS(DS || 0b00) EA RSp MEM(EA, 8) Let the effective address (EA) be the sum (RA|0)+ (DS||0b00). The contents of register pair RSp are stored into the quadword in storage addressed by EA. EA must be a multiple of 16. If it is not, an Alignment interrupt occurs. If RSp is odd, the instruction form is invalid. This instruction is not supported in Little-Endian mode. Execution of this instruction in Little-Endian mode causes either an Alignment interrupt or the results are boundedly undefined. This instruction is privileged.
751
4.4.3 OR Instruction
or Rx,Rx,Rx can be used to set PPRPRI (see Section 4.3.4) as shown in Figure 12. PPRPRI remains unchanged if the privilege state of the thread executing the instruction is lower than the privilege indicated in the figure. (The encodings available to problem-state programs, as well as encodings for additional shared resource hints not shown here, are described in Chapter 3 of Book II.) Rx 31 1 6 2 5 3 7 PPRPRI 001 010 011 100 101 110 111 Priority very low low medium low medium (normal) medium high high very high Privileged yes no no no yes yes hypv
Extended mnemonics
Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand. See Appendix A, Assembler Extended Mnemonics on page 851.
752
753
754
SPR,RS RS
11
When the designated SPR is the Hypervisor Maintenance Exception Register (HMER), the contents of register RS are ANDed with the contents of the HMER and the result is placed into the HMER. For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered. spr0=1 if and only if writing the register is privileged. Execution of this instruction specifying an SPR number with spr0=1 causes a Privileged Instruction type Program interrupt when MSRPR=1 and, if the SPR is a hypervisor resource (see Figure 13), when MSRHV PR=0b00. Execution of this instruction specifying an SPR number that is not defined for the implementation, including SPR numbers that are shown in Figure 13 but are in a category that is not supported by the implementation, causes one of the following. if spr0=0:
spr
21
467
/
31
n spr5:9 || spr0:4 if (n 13) & (n 29) & (n 157) & (n 336) then if length(SPR(n)) = 64 then SPR(n) (RS) else (RS)32:63 SPR(n) else if n = 13 then if MSRHV PR = 0b10 then SPR(13) (RS) else if MSRHV PR = 0b00 then SPR(13) ((RS) & AMOR) | ((SPR(13)) & AMOR) else SPR(13) ((RS) & UAMOR) | ((SPR(13)) & UAMOR) if n = 29 then if MSRHV PR = 0b10 then (RS) SPR(29) else SPR(29) ((RS) & AMOR) | ((SPR(29)) & AMOR) if n = 157 then if MSRHV PR = 0b10 then (RS) SPR(157) else SPR(157) (RS) & AMOR if n = 336 then (SPR(336)) & (RS) SPR(336) The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of register RS are placed into the designated Special Purpose Register, except as described in the next four paragraphs. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR. When the designated SPR is the Authority Mask Register (AMR), using SPR 13 or SPR 29, and MSRHV PR=0b00, the contents of bit positions of register RS corresponding to 1 bits in the Authority Mask Override Register (AMOR) are placed into the corresponding bits of the AMR; the other AMR bits are not modified. When the designated SPR is the AMR, using SPR 13, and MSRPR=1, the contents of bit positions of register RS corresponding to 1 bits in the User Authority Mask Override Register (UAMOR) are placed into the corresponding bits of the AMR; the other AMR bits are not modified. When the designated SPR is the UAMOR and MSRHV PR=0b00, the contents of register RS are ANDed with the contents of the AMOR and the result is placed into the UAMOR.
if MSRPR=0: Hypervisor Emulation Assistance interrupt for SPR 0 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs if spr0=1: - if MSRPR=1: Privileged Instruction type Program interrupt
Special Registers Altered: See Figure 13 Programming Note For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 10. Synchronization Requirements for Context Alterations on page 845.
755
RT,SPR RT
11
spr
21
339
/
31
n spr5:9 || spr0:4 if length(SPR(n)) = 64 then SPR(n) RT else 32 RT 0 || SPR(n) The SPR field denotes a Special Purpose Register, encoded as shown in Figure 13. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the highorder 32 bits of RT are set to zero. spr0=1 if and only if reading the register is privileged. Execution of this instruction specifying an SPR number with spr0=1 causes a Privileged Instruction type Program interrupt when MSRPR=1 and, if the SPR is a hypervisor resource (see Figure 13), when MSRHV PR=0b00. Execution of this instruction specifying an SPR number that is not defined for the implementation, including SPR numbers that are shown in Figure 13 but are in a category that is not supported by the implementation, causes one of the following. if spr0=0:
if MSRPR=0: Hypervisor Emulation Assistance interrupt for SPRs 0, 4, 5, and 6 and no operation (i.e. the instruction is treated as a no-op) for all other SPRs if spr0=1: - if MSRPR=1: Privileged Instruction type Program interrupt
Special Registers Altered: None Note See the Notes that appear with mtspr.
756
X-form
Programming Note If MSREE=0 and an External or Decrementer exception is pending, executing an mtmsr instruction that sets MSREE to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, Interrupt Priorities on page 831). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1. For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10. Programming Note
RS,L RS
11
///
L
15 16
///
21
146
/
31
if L = 0 then MSR48 (RS)48 | (RS)49 MSR58 (RS)58 | (RS)49 MSR59 (RS)59 | (RS)49 MSR32:47 49:50 52:57 60:62 (RS)32:47 else (RS)48 62 MSR48 62
The MSR is set based on the contents of register RS and of the L field. L=0: The result of ORing bits 48 and 49 of register RS is placed into MSR48. The result of ORing bits 58 and 49 of register RS is placed into MSR58. The result of ORing bits 59 and 49 of register RS is placed into MSR59. Bits 32:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR. L=1: Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged. This instruction is privileged. If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes. Special Registers Altered: MSR Except in the mtmsr instruction description in this section, references to mtmsr in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsr instruction that modifies an MSR bit other than the EE or RI bit implies L=0).
mtmsr serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsr mnemonic with two operands as the basic form, and an mtmsr mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0. Programming Note There is no need for an analogous version of the mfmsr instruction, because the existing instruction copies the entire contents of the MSR to the selected GPR.
Programming Note If this instruction sets MSRPR to 1, it also sets MSREE, MSRIR, and MSRDR to 1. This instruction does not alter MSRME or MSRLE. (This instruction does not alter MSRHV because it does not alter any of the high-order 32 bits of the MSR.) If the only MSR bits to be altered are MSREE RI, to obtain the best performance L=1 should be used.
757
X-form
Programming Note If MSREE=0 and an External or Decrementer exception is pending, executing an mtmsrd instruction that sets MSREE to 1 will cause the External or Decrementer interrupt to occur before the next instruction is executed, if no higher priority exception exists (see Section 6.8, Interrupt Priorities on page 831). Similarly, if a Hypervisor Decrementer interrupt is pending, execution of the instruction by the hypervisor causes a Hypervisor Decrementer interrupt to occur if HDICE=1. For a discussion of software synchronization requirements when altering certain MSR bits, see Chapter 10. Programming Note mtmsrd serves as both a basic and an extended mnemonic. The Assembler will recognize an mtmsrd mnemonic with two operands as the basic form, and an mtmsrd mnemonic with one operand as the extended form. In the extended form the L operand is omitted and assumed to be 0.
RS,L RS
11
///
L
15 16
///
21
178
/
31
if L = 0 then MSR48 (RS)48 | (RS)49 MSR58 (RS)58 | (RS)49 MSR59 (RS)59 | (RS)49 MSR0:2 4:47 49:50 52:57 60:62 (RS)0:2 4:47 49:50 52:57 60:62 else (RS)48 62 MSR48 62 The MSR is set based on the contents of register RS and of the L field. L=0: The result of ORing bits 48 and 49 of register RS is placed into MSR48. The result of ORing bits 58 and 49 of register RS is placed into MSR58. The result of ORing bits 59 and 49 of register RS is placed into MSR59. Bits 0:2, 4:47, 49:50, 52:57, and 60:62 of register RS are placed into the corresponding bits of the MSR. L=1: Bits 48 and 62 of register RS are placed into the corresponding bits of the MSR. The remaining bits of the MSR are unchanged. This instruction is privileged. If L=0 this instruction is context synchronizing. If L=1 this instruction is execution synchronizing; in addition, the alterations of the EE and RI bits take effect as soon as the instruction completes. Special Registers Altered: MSR Except in the mtmsrd instruction description in this section, references to mtmsrd in this document imply either L value unless otherwise stated or obvious from context (e.g., a reference to an mtmsrd instruction that modifies an MSR bit other than the EE or RI bit implies L=0).
Programming Note If this instruction sets MSRPR to 1, it also sets MSREE, MSRIR, and MSRDR to 1. This instruction does not alter MSRLE, MSRME or MSRHV. If the only MSR bits to be altered are MSREE RI, to obtain the best performance L=1 should be used.
758
RT RT
11
///
16
///
21
83
/
31
RT
MSR
The contents of the MSR are placed into register RT. This instruction is privileged. Special Registers Altered: None
759
760
761
5.1 Overview
A program references storage using the effective address computed by the hardware when it executes a Load, Store, Branch, or Cache Management instruction, or when it fetches the next sequential instruction. The effective address is translated to a real address according to procedures described in Section 5.7.3, in Section 5.7.5 and in the following sections. The real address is what is presented to the storage subsystem. For a complete discussion of storage addressing and effective address calculation, see Section 1.9 of Book I.
are indicated as such in Chapter 10. Synchronization Requirements for Context Alterations on page 845. Implicit branches are not supported by the Power ISA. If an implicit branch occurs, the results are boundedly undefined.
762
Programming Note In configurations supporting multiple partitions, hypervisor software must ensure that a storage access by a program in one partition will not cause a Checkstop or other system-wide event that could affect the integrity of other partitions (see Chapter 2). For example, such an event could occur if a real address placed in a Page Table Entry or made accessible to a partition using the Offset Real Mode Address mechanism (see Section 5.7.3.2) does not exist.
763
Real page size is 212 bytes (4 KB). Effective address space size is 264 bytes. An effective address is translated to a virtual address via the Segment Lookaside Buffer (SLB). - Virtual address space size is 2n bytes, 65n78; see Note 2. - Segment size is 2s bytes, s=28 or 40. - 2n-40 number of virtual segments 2n-28; see Note 2. - Virtual page size is 2p bytes, where 12p, and 2p is no larger than either the size of the biggest segment or the real address space; a size of 4 KB, 64 KB, and an implementationdependent number of other sizes are supported; see Note 3. The Page Table specifies the virtual page size. The SLB specifies the base virtual page size, which is the smallest virtual page size that the segment can contain. The base virtual page size is 2b bytes. - Segments contain pages of a single size, a mixture of 4 KB and 64 KB pages, or a mixture of page sizes that include implementationdependent page sizes. A virtual address is translated to a real address via the Page Table. - Virtual page size is 2p bytes, where 12p, and 2p is no larger than either the size of the biggest segment or the real address space; a size of 4KB, 64 KB, and an implementationdependent number of other sizes are supported; see Note 3. Notes: 1. The value of m is implementation-dependent (subject to the maximum given above). When used to address storage, the high-order 60-m bits of the 60-bit real address must be zeros. 2. The value of n is implementation-dependent (subject to the range given above). In references to 78bit virtual addresses elsewhere in this Book, the high-order 78-n bits of the 78-bit virtual address are assumed to be zeros. 3. The supported values of p for the larger virtual page sizes are implementation-dependent (subject to the limitations given above).
bit mode. In 32-bit mode (MSRSF=0), the high-order 32 bits of the 64-bit effective address are treated as zeros for the purpose of addressing storage. This applies to both data accesses and instruction fetches. It applies independent of whether address translation is enabled or disabled. This truncation of the effective address is the only respect in which storage accesses in 32-bit mode differ from those in 64-bit mode. Programming Note Treating the high-order 32 bits of the effective address as zeros effectively truncates the 64-bit effective address to a 32-bit effective address such as would have been generated on a 32-bit implementation of the Power ISA. Thus, for example, the ESID in 32-bit mode is the high-order four bits of this truncated effective address; the ESID thus lies in the range 0-15. When address translation is enabled, these four bits would select a Segment Register on a 32-bit implementation of the Power ISA. The SLB entries that translate these 16 ESIDs can be used to emulate these Segment Registers.
764
j+r = m, where the real address size supported by the implementation is m bits. Programming Note EA4:63-r should equal 60-r0. If this condition is satisfied, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset. If m<60, EA4:63-m and HRMOR4:63-m must be zeros.
765
Programming Note The offset specified by the RMOR should be a nonzero multiple of the limit specified by the RMLS. If these registers are set thus, ORing the effective address with the offset produces a result that is equivalent to adding the effective address and the offset. (The offset must not be zero, because real page 0 contains the fixed interrupt vectors and real pages 1 and 2 may be used for implementationspecific purposes; see Section 5.7.4, Address Ranges Having Defined Uses on page 768.)
5.7.3.3 Storage Control Attributes for Accesses in Real and Hypervisor Real Addressing Modes
Storage accesses in hypervisor real addressing mode are performed as though all of storage had the following storage control attributes, except as modified by the Hypervisor Real Mode Storage Control facility (see Section 5.7.3.3.1). (The storage control attributes are defined in Book II.) not Write Through Required not Caching Inhibited, for instruction fetches not Caching Inhibited, for data accesses except those caused by the Load/Store Caching Inhibited instructions; Caching Inhibited, for data accesses caused by the Load/Store Caching Inhibited instructions Memory Coherence Required, for data accesses Guarded not SAO Storage accesses in real addressing mode are performed as though all of storage had the following storage control attributes. (Such accesses use the Offset Real Mode Address mechanism.) not Write Through Required not Caching Inhibited Memory Coherence Required, for data accesses not Guarded not SAO Additionally, storage accesses in real or hypervisor real addressing modes are performed as though all storage was not No-execute. Programming Note Because storage accesses in real addressing mode and hypervisor real addressing mode do not use the SLB or the Page Table, accesses in these modes bypass all checking and recording of information contained therein (e.g., storage protection checks that use information contained therein are not performed, and reference and change information is not recorded).
766
Programming Note The preceding capability can be used to improve the performance of hypervisor software that runs in hypervisor real addressing mode, by causing accesses to instructions and data that occupy wellbehaved storage to be treated as non-Guarded.
Programming Note The C bit in Figure 14 is set to 0 because the implementation-dependent lookaside information associated with the VRMA is expected to be long-lived. See Section 5.9.3.1. Programming Note The 1 TB VSID 0x0_01FF_FFFF should not be used by the operating system for purposes other than mapping the VRMA when address translation is enabled. Programming Note Software should specify PTEB = 0b01 for all Page Table Entries that map the VRMA in order to be consistent with the values in Figure 14. Programming Note All accesses to the RMA are considered not Guarded. The G bit of the associated Page Table Entry determines whether an access to the VRMA is Guarded. Therefore, if an instruction is fetched from the VRMA, a Hypervisor Instruction Storage interrupt will result if G=1 in the associated Page Table Entry. Programming Note The RMA is considered non-SAO storage. However, any page in the VRMA is treated as SAO storage if WIMG = 0b1110 in the associated Page Table Entry.
767
Version 2.06 Revision B 5.7.3.5 Storage Control Attributes for Implicit Storage Accesses
Implicit accesses to the Page Table during address translation and in recording reference and change information are performed as though the storage occupied by the Page Table had the following storage control attributes. not Write Through Required not Caching Inhibited Memory Coherence Required not Guarded not SAO The definition of performed given in Book II applies also to these implicit accesses; accesses for performing address translation are considered to be loads in this respect, and accesses for recording reference and change information are considered to be stores. These implicit accesses are ordered by the ptesync instruction as described in Section 5.9.2.
768
s-p Page
p Byte
63
ESID
SLBEn
0 35 37 39 88 89 VSID0:77-s 93 95 96
s-p Page
p Byte
5.7.6.1 (SLB)
The Segment Lookaside Buffer (SLB) specifies the mapping between Effective Segment IDs (ESIDs) and Virtual Segment IDs (VSIDs). The number of SLB entries is implementation-dependent, except that all implementations provide at least 32 entries. The contents of the SLB are managed by software, using the instructions described in Section 5.9.3.1. See Chapter 10. Synchronization Requirements for Context Alterations on page 845 for the rules that software must follow when updating the SLB.
SLB Entry
Each SLB entry (SLBE, sometimes referred to as a segment descriptor) maps one ESID to one VSID. Figure 17 shows the layout of an SLB entry
769
For each SLB entry, software must ensure the following requirements are satisfied. V B
36 37 39
VSID
KsKpNLC / LP 89 94 95 96
Description Effective Segment ID Entry valid (V=1) or invalid (V=0) Segment Size Selector 0b00 - 256 MB (s=28) 0b01 - 1 TB (s=40) 0b10 - reserved 0b11 - reserved Virtual Segment ID Supervisor (privileged) state storage key (see Section 5.7.9.2) Problem state storage key (See Section 5.7.9.2.) No-execute segment if N=1 Virtual page size selector bit 0 Class Virtual page size selector bits 1:2
L||LP contains a value supported by the implementation. The base virtual page size selected by the L and LP fields does not exceed the segment size selected by the B field. If s=40, the following bits of the SLB entry contain 0s. - ESID24:35 - VSID38:49 The bits in the above two items are ignored by the hardware.
All other fields are reserved. B0 (SLBE37) is treated as a reserved field. Figure 17. SLB Entry Instructions cannot be executed from a No-execute (N=1) segment. Segments may contain a mixture of pages sizes. The L and LP bits specify the base virtual page size that the segment may contain. The SLBL||LP encoding are those shown in Figure 18. The base virtual page size (also referred to as the base page size) is the smallest virtual page size for the segment. The base virtual page size is 2b bytes. The actual virtual page size (also referred to as the actual page size or virtual page size) is specified by PTEL LP. encoding page size 0b000 4 KB 0b101 64 KB additional 2b bytes, where b > 12 and b may differ among encoding values values1 1 The additional values are implementation-dependent, as are the corresponding base virtual page sizes. Any values that are not supported by a given implementation are reserved in that implementation. Figure 18. Page Size Encoding
The Class field of the SLBE is used in conjunction with the slbie and slbia instructions (see Section 5.9.3.1). Class refers to a grouping of SLB entries and implementation-specific lookaside information so that only entries in a certain group need be invalidated and others might be preserved. The Class value assigned to an implementation-specific lookaside entry derived from an SLB entry must match the Class value of that SLB entry. The Class value assigned to an implementation-specific lookaside entry that is not derived from an SLB entry (such as real mode address translations) is 0. Software must ensure that the SLB contains at most one entry that translates a given effective address, and that if the SLB contains an entry that translates a given effective address, then any previously existing translation of that effective address has been invalidated. An attempt to create an SLB entry that violates this requirement may cause a Machine Check. Programming Note It is permissible for software to replace the contents of a valid SLB entry without invalidating the translation specified by that entry provided the specified restrictions are followed. See Chapter 10 Note 11.
770
If the SLB search fails, a segment fault occurs. This is an Instruction Segment exception or a Data Segment exception, depending on whether the effective address is for an instruction fetch or for a data access.
771
HTABORG
2 44
HTABSIZE
13 5
// xxx.......xx000.00 0 4 1718 45
/// 59 63
28
39
38
28
AND
28
OR *
* If the Server.Relaxed Page Table Alignment category is supported, low order HTABORG bits are not necessarily zero; the OR block to the left is replaced with a full adder, and the carry out is added to bits HTABORG4:17 to form RA0:13 of the PTEG.
Page Table
16 bytes PTEG 0 PTE0 PTE7
14
28
11
0000000
60-bit Real Address of Page Table Entry Group (PTEG)
PTEG n 128 bytes Page Table Entry (PTE) B 0 AVA 16 bytes SW 57 L H V pp / key 2 4
(ARPN||LP)0:59-p
ARPN
LP key 44
R C WIMG N pp 61 62 63
6162 63 0 1
52 54 5556 57
60-p
p Byte
772
All other fields are reserved. Figure 20. Page Table Entry Programming Note The H bit in the Page Table entry should not be set to one unless the secondary Page Table search has been enabled. If b23, the Abbreviated Virtual Address (AVA) field contains bits 0:54 of the VA. Otherwise bits 0:77-b of the AVA field contain bits 0:77-b of the VA, and bits 78b:54 of the AVA field must be zero. Programming Note The AVA field omits the low-order 23 bits of the VA. These bits are not needed in the PTE, because the low-order b of these bits are part of the byte offset into the virtual page and, if b<23, the high-order 23-b of these bits are always used in selecting the PTEGs to be searched (see Section 5.7.7.3). On implementations that support a virtual address size of only n bits, n<78, bits 0:77-n of the AVA field must be zeros. A virtual page is mapped to a sequence of 2p-12 contiguous real pages such that the low-order p-12 bits of the real page number of the first real page in the sequence are 0s. PTEL LP specify both a base virtual page size (henceforth referred to as the base page size) and an actual virtual page size (henceforth referred to as the actual page size or virtual page size). The actual page size is the size of the virtual page mapped by the PTE. The base page size is the smallest actual page size that a segment can contain. See Section 5.7.6. If PTEL=0, the base virtual page size and actual virtual page size are 4KB, and ARPN concatenated with LP (ARPN||LP) contains the page number of the real page that maps the virtual page described by the entry. If PTEL=1, the base page size and actual page size are specified by PTELP. In this case, the contents of PTELP have the format shown in Figure 21. Bits labelled r are
B pp / key
0 1 2 4
AVA ARPN
SW L H V LP key R C WIMG N pp
52 55 56 57 61 62 63
44
Dword Bit(s) Name Description 0 0:1 B Segment Size 0b00 - 256 MB 0b01 - 1 TB 0b10 - reserved 0b11 - reserved 2:56 AVA Abbreviated Virtual Address 57:60 SW Available for software use 61 L Virtual page size 0b0 - 4 KB 0b1 - greater than 4KB (large page) 62 H Hash function identifier 63 V Entry valid (V=1) or invalid (V=0)
773
HTABORG
46
///
HTABSIZE
59 63
All other fields are reserved. Figure 22. SDR1 SDR1 is a hypervisor resource; see Chapter 2. The HTABORG field in SDR1 contains the high-order 42 bits of the 60-bit real address of the Page Table. The Page Table is thus constrained to lie on a 218 byte (256 KB) boundary at a minimum. At least 11 bits from the hash function (see Figure 19) are used to index into the Page Table. The minimum size Page Table is 256 KB (211 PTEGs of 128 bytes each).
774
775
776
5.7.7.4 Relaxed Page Table Alignment [Category: Server.Relaxed Page Table Alignment]
The Page Table can be aligned on any 218 byte (256 KB) boundary regardless of the HTAB size. Section 5.7.7.2 describes the Storage Description Register, which includes the HTABORG field. That description generally applies except for the following difference. As the Page Table size is increased beyond 256 KB, the value in HTABORG need not have more of its low-order bits equal to 0. Instead, (HTABORG || 180) is the real address of the start of the Page Table regardless of the Page Table size. A Page Table search is performed as described in Section 5.7.7.3 except the 60-bit real address of a PTEG for both the primary and, if the secondary Page Table search is enabled, the secondary hash is formed by concatenating the following values: Bits 0:27 of the 39-bit appropriate primary or secondary hash value ANDed with the mask
777
Programming Note The following cases apply to virtual pages in a segment with a smaller page size, in which case there may be multiple PTEs for the virtual page. When a virtual page is accessed, hardware maintains the R and C bits in one or more of the PTEs corresponding to the referenced virtual page, and not necessarily the PTE that would be used if the Page Table were searched for the storage access (see Section 5.7.7.1). Therefore the R and C bits for a given virtual page in a segment with a smaller base page size may each be updated in a different PTE (subject to the requirements in Figure 28). If a virtual page access results in the R or C bits needing to be set to 1, the Page Table may be searched for any PTE corresponding to the virtual page accessed, and if the PTE is not found, a Data Storage Interrupt or Hypervisor Data Storage Interrupt may occur. When the hardware updates the Reference and Change bits in the Page Table Entry, the accesses are performed as described in Section 5.7.3.5, Storage Control Attributes for Implicit Storage Accesses on page 768. The accesses may be performed using operations equivalent to a store to a byte, halfword, word, or doubleword, and are not necessarily performed as an atomic read/modify/write of the affected bytes. These Reference and Change bit updates are not necessarily immediately visible to software. Executing a sync instruction ensures that all Reference and Change bit updates associated with address translations that were performed, by the thread executing the sync instruction, before the sync instruction is executed will be performed with respect to that thread before the sync instructions memory barrier is created. There are additional requirements for synchronizing Reference and Change bit updates in multi-threaded systems; see Section 5.10, Page Table Update Synchronization Requirements on page 804. Programming Note Because the sync instruction is execution synchronizing, the set of Reference and Change bit updates that are performed with respect to the thread executing the sync instruction before the memory barrier is created includes all Reference and Change bit updates associated with instructions preceding the sync instruction. If software refers to a Page Table Entry when MSRDR=1, the Reference and Change bits in the associated Page Table Entry are set as for ordinary loads and stores. See Section 5.10 for the rules software must follow when updating Reference and Change bits.
778
Status of Access Indexed Move Assist insn w 0 len in XER Storage protection violation Out-of-order I-fetch or Load-type insn or dcbtst Out-of-order Store-type insn (except dcbtst) Would be required by the sequential execution model in the absence of system-caused or imprecise interrupts3 All other cases In-order Load-type or Store-type insn, access not performed4 Load-type insn Store-type insn Other in-order access I-fetch Ordinary Load, eciwx Other ordinary Store, ecowx, dcbz icbi, dcbt, dcbtst, dcbst, dcbf[l]
R C No No Acc1 No Acc No
Acc Acc
Acc1 2 No
No Acc2 No No Yes No
Acc means that it is acceptable to set the bit. 1 It is preferable not to set the bit. 2 If C is set, R is also set unless it is already set. 3 For Floating-Point Enabled Exception type Program interrupts, imprecise refers to the exception mode controlled by MSRFE0 FE1. 4 This case does not apply to the Touch instructions, because they do not cause a storage access. Figure 23. Setting the Reference and Change bits
779
(AMR), shown in Figure 24. The access permissions associated with the Virtual Page Class Key Protection mechanism apply only to data accesses, and only when address translation is enabled. The Virtual Page Class Key Protection mechanism has no effect on instruction fetches. Programming Note If address translation is disabled for a given access, the access is not affected by the Virtual Page Class Key Protection mechanism even if the access is made in virtual real addressing mode.
MSR bits HV, IR, DR, PR the key bits in the associated SLB entry the page protection bits and key bits in the associated PTE the AMR, AMOR, and UAMOR LPCR bits LPES1 and VPM0
...
The storage protection mechanism consists of the Virtual Page Class Key Protection mechanism, described in Section 5.7.9.1, and the Basic Storage Protection mechanism, described in Section 5.7.9.2 and Section 5.7.9.3. When address translation is enabled for an access, the access is permitted if and only if the access is permitted by both the Virtual Page Class Key Protection mechanism and the Basic Storage Protection mechanism. When address translation is disabled for an access, the access is permitted if and only if the access is permitted by the Basic Storage Protection mechanism. If an instruction fetch is not permitted, an Instruction Storage exception or a Hypervisor Instruction Storage exception is generated. If a data access is not permitted, a Data Storage exception or a Hypervisor Data Storage exception is generated. A protection domain is a maximal range of effective addresses for which variables related to storage protection can be independently specified (including by default, as in real and hypervisor real addressing modes), or a maximal range of addresses, effective or virtual, for which variables related to storage protection cannot be specified. Examples include: a segment, a virtual page (including for a virtualized Real Mode Area), the Real Mode Area (regardless of whether the RMA is virtualized), the effective address range 0:260-1 in hypervisor real addressing mode, and a maximal range of effective or virtual addresses that cannot be mapped to real addresses. A protection boundary is a boundary between protection domains.
Description Access mask for class number 0 Access mask for class number 1 Access mask for class number n Access mask for class number 31
Figure 24. Authority Mask Register (AMR) The access mask for each class defines the access permissions that apply to loads and stores for which the virtual address is translated using a Page Table Entry that contains a KEY field value equal to the class number. The access permissions associated with each class are defined as follows, where AMR2n and AMR2n+1 refer to the first and second bits of the access mask corresponding to class number n.
A store is permitted if AMR2n=0b0; otherwise the store is not permitted. A load is permitted if AMR2n+1=0b0; otherwise the load is not permitted.
The AMR can be accessed using either SPR 13 or SPR 29. Access to the AMR using SPR 29 is privileged. Programming Note Because the AMR is part of the program context (if address translation is enabled), and because it is desirable for most application programmers not to have to understand the software synchronization requirements for context alterations (or the nuances of address translation and storage protection), operating systems should provide a system library program that application programs can use to modify the AMR. The Authority Mask Override Register (AMOR) and the User Authority Mask Override Register (UAMOR), shown in Figure 25 and Figure 26 respectively, can be
780
Programming Note The preceding requirement permits designs to implement the AMOR and/or UAMOR as 32-bit registers specifically, to implement only the evennumbered bits (or only the odd-numbered bits) of the register in a manner such that the reduction, from the architecturally-required 64 bits to 32 bits, is not visible to (correct) software. This implementation technique saves space in the hardware. (A design that uses this technique does the appropriate fan in/out when the register is accessed, to provide the appearance, to (correct) software, of supporting all 64 bits of the register.) Permitting designs to implement the [U]AMOR as 32-bit registers by virtue of the software requirement specified above, rather than by defining the [U]AMOR as 32-bit registers, permits the architecture to be extended in the future to support controlling modification of the read access AMR bits (the odd-numbered bits) independently from the write access AMR bits (the even-numbered bits), if that proves desirable. If this independent control does prove desirable, the only architecture change would be to eliminate the software requirement. Programming Note When modifying the AMOR and/or UAMOR, the hypervisor should ensure that the two registers are consistent with one another before giving control to a non-hypervisor program. In particular, the hypervisor should ensure that if AMORi=0 then UAMORi=0, for all i in the range 0:63. (Having AMORi=0 and UAMORi=1 would permit problem state programs, but not the operating system, to modify AMR bit i.)
Mask
Override
Register
UAMOR
0 63
Figure 26. User Authority Mask Override Register (UAMOR) The bits of the AMOR and UAMOR are in 1-1 correspondence with the bits of the AMR (i.e., [U]AMORi corresponds to AMRi). The AMOR affects modifications of the AMR and UAMOR in privileged but non hypervisor state; the UAMOR affects modifications of the AMR in problem state. When mtspr specifying the AMR (using either SPR 13 or SPR 29) is executed in privileged but non-hypervisor state, the AMOR is used as a mask that controls which bits of the resulting AMR contents come from register RS and which AMR bits are not modified. Similarly, when mtspr specifying the AMR (using SPR 13) is executed in problem state, the UAMOR is used as a mask that controls which bits of the resulting AMR contents come from register RS and which AMR bits are not modified. When mtspr specifying the UAMOR is executed in privileged but non-hypervisor state, the AMOR is ANDed with the contents of register RS and the result is placed into the UAMOR; the AMOR thereby controls which bits of the resulting UAMOR contents come from register RS and which UAMOR bits are set to zero. A complete description of these effects can be found in the description of the mtspr instruction on page 755. Software must ensure that both bits of each even/odd bit pair of the AMOR contain the same value. i.e., the contents of register RS for mtspr specifying the AMOR must be such that (RS)2n = (RS)2n+1 for every n in the range 0:31 and likewise for the UAMOR. If this requirement is violated for the UAMOR the results of accessing the UAMOR (including implicitly by the hardware as described in the second item of the preceding list) are boundedly undefined; if the requirement is violated for the AMOR the results of accessing the AMOR (including implicitly by the hardware as described in the first and third items of the list) are undefined.
781
Set bits 2:3 and 62:63 of SPR 29 (which is either the ACCR or the AMR) to x, where x is the desired 2-bit value for controlling Data Address Compare matches, and set bits 0:1 to 0s. Set PTE154 (which is either the AC bit or KEY4) to the same value that the AC bit would be set to, and set PTE12:3 (which are either RPN bits, that correspond to a real address size larger than the size supported by any implementation that supports the Data Address Compare mechanism, or KEY0:1) and PTE152:53 (which are either reserved bits or KEY2:3) to 0s. Use PTEKEY values 0 and 1 only for purposes of emulating the Data Address Compare mechanism, except that PTEKEY value 0 may
also be used for any virtual pages for which it is desired that the Virtual Page Class Key Protection mechanism permit all accesses. Do not use PTEKEY =31.
When a Data Storage interrupt occurs, if DSISR42=1 then ignore the interrupt for Cache Management instructions other than dcbz. (These instructions can cause a virtual page class key protection violation but cannot cause a Data Address Compare match.) Otherwise treat the interrupt as if a Data Address Compare match had occurred. (Note: Cases for which it is undefined whether a Data Address Compare match occurs do not necessarily cause a virtual page class key protection violation.)
(Because privileged software can access the AMR using either SPR 13 or SPR 29, it might seem that, when SPR 13 was added to the architecture (in Version 2.06), SPR 29 should have been removed. SPR 29 is retained for two reasons: first, to avoid requiring privileged software to change to use the newer SPR number; and second, to retain the ability to emulate the Data Address Compare mechanism as described above.)
782
Programming Note An example of the use of the AMOR (and UAMOR) is to support lightweight partitions, here called adjunct partitions, that provide services (e.g., device drivers) to client partitions. The adjunct partition would be managed by the hypervisor. It would run in problem state with MSRHV PR=0b11, thereby restricting the resources it can modify (MSRPR=1) and causing its interrupts to go to the hypervisor (MSRHV=1), and it would share a Page Table with the client partition it serves. Typically, each of the two partitions would have data storage that the other partition must not be able to access. The hypervisor can use the AMOR, UAMOR, AMR, and PTE KEY field to provide the required protection. (The adjunct partitions lightness of weight derives from not requiring an operating system, and especially from not requiring a full partition context switch (SLB flush, TLB flush, SDR1 change, etc.) when the client partition invokes the services of the adjunct partition.) For example, suppose each of the two partitions must not be able to access any of the other partition's data storage. The hypervisor could use KEY value j for all data virtual pages that only the adjunct partition must be able to access. Before dispatching the client partition for the first time, the hypervisor would initialize the three registers as follows. AMR: all 0s except bits 2j and 2j+1, which would contain 1s UAMOR: all 0s AMOR: all 1s except bits 2j and 2j+1, which would contain 0s Before dispatching the adjunct partition, the hypervisor would set UAMOR to all 0s, and would set the AMR to all 1s except bits 2j and 2j+1, which would be set to 0s. (Because the adjunct partition would run in problem state, there is no need for the hypervisor to modify the AMOR, and the adjunct partition cannot modify the UAMOR.) In addition, the hypervisor would prevent the client partition from modifying or deleting PTEs that contain translations used by the adjunct partition. (It may be desirable to avoid using KEY values 0, 1, and 31 for storage that only the adjunct partition can access, because these KEY values may be needed by the client partition to emulate the Data Address Compare mechanism, as described above. Also, old software, that was written for an implementation that complies with a version of the architecture that precedes Version 2.04 (the version in which virtual page class keys were added), effectively uses KEY 0 for all virtual pages.)
Programming Note Initialization of the UAMOR to all 0s, by the hypervisor before dispatching a partition for the first time, as described in the preceding Programming Note, permits operating systems (in partitions that run in a compatibility mode corresponding to Version 2.06 of the architecture or a subsequent version) to migrate gradually to supporting problem state access to the AMR specifically, to avoid having to be changed immediately to modify the UAMOR and to save the AMR contents when an interrupt occurs from problem state. Relatedly, having the UAMOR contain all 0s while an application program is running protects old application programs that are AMR-unaware. In the absence of programming errors, such application programs would not attempt to read or modify the AMR. However, having the UAMOR contain all 0s protects such programs against modifying the AMR inadvertently. Permitting an AMR-unaware application program to modify the AMR (inadvertently) is potentially harmful for the obvious reasons. (The program might set to 1 an AMR bit corresponding to accesses that are necessary in order for the program to work correctly.) Moreover, even for an operating system that includes support for problem state modification of the AMR, having the UAMOR contain all 0s allows the operating system to avoid saving and restoring the AMR for AMR-unaware application programs. Such an operating system would provide a system service program that allows an application program to declare itself to be AMR-aware i.e., potentially to need to modify the AMR. When an application program invokes this service, the operating system would set the UAMOR to the non-zero value appropriate to the access authorities (load and/or store, for one or more key values) that the application program is allowed to modify, and thereafter would save and restore the AMR (and preserve the UAMOR) for this application program. (Having the UAMOR contain all 0s does not prevent an AMR-unaware program from reading the AMR, but inadvertent reading of the AMR is likely to be much less harmful than inadvertently modifying it.) (For partitions that run in a compatibility mode corresponding to a version of the architecture that precedes Version 2.06, the PCR provides sufficient protection to application programs.)
783
HV 0 1 0 1
If the effective address for the access is less than the value specified by the RMLS, the access authority is read/write; otherwise the access is not permitted.
All PP encodings not shown above are reserved. The results of using reserved PP encodings are boundedly undefined. Figure 27. PP bit protection states, address translation enabled
784
785
Bit W1,3 I3 M2 G
1
Storage Control Attribute 0 - not Write Through Required 1 - Write Through Required 0 - not Caching Inhibited 1 - Caching Inhibited 0 - not Memory Coherence Required 1 - Memory Coherence Required 0 - not Guarded 1 - Guarded
Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0. 2 [Category: Memory Coherence] Support for the 0 value of the M bit is optional, implementations that do not support the 0 value assume the value of the bit to be 1, and may either preserve the value of the bit or write it as 1. 3 [Category: SAO] The combination WIMG = 0b1110 has behavior unrelated to the meanings of the individual bits. See see Section 5.8.2.1, Storage Control Bit Restrictions for additional information. Figure 29. Storage control bits When address translation is enabled, instructions are not fetched from storage for which the G bit in the Page Table Entry is set to 1; see Section 5.7.9. When address translation is disabled, the storage control attributes are implicit; see Section 5.7.3.3. In Sections 5.8.2.1 and 5.8.2.2, access includes accesses that are performed out-of-order, and references to W, I, M, and G bits include the values of those bits that are implied when address translation is disabled. Programming Note In a system consisting of only a single-threaded processor which has caches, correct coherent execution does not require storage to be accessed as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.
Programming Note If an application program requests both the Write Through Required and the Caching Inhibited attributes for a given storage location, the operating system should set the I bit to 1 and the W bit to 0. For implementations that support the SAO category, the operating system should provide a means by which application programs can request SAO storage, in order to avoid confusion with the preceding guideline (since SAO is encoded using WI=0b11). At any given time, the value of the W bit must be the same for all accesses to a given real page. At any given time, the value of the I bit must be the same for all accesses to a given real page.
786
Programming Note The storage control bit alterations described above are examples of cases in which the directives for application of statements about the W and I bits to SAO given in the third paragraph of the preceding subsection must be applied. A transition from the typical WIMG=0b0010 for ordinary storage to WIMG=0b1110 for SAO storage does not require the flush described above because both WIMG combinations indicate storage that is not Caching Inhibited. Programming Note It is recommended that dcbf be used, rather than dcbfl, when changing the value of the I or W bit from 0 to 1. (dcbfl would have to be executed on all threads for which the contents of the data cache may be inconsistent with the new value of the bit, whereas, if the M bit for the page is 1, dcbf need be executed on only one thread in the system.) When changing the value of the M bit for a given real page, software must ensure that all data caches are consistent with main storage. The actions required to do this are system-dependent. Programming Note For example, when changing the M bit in some directory-based systems, software may be required to execute dcbf[l] on each thread to flush all storage locations accessed with the old M value before permitting the locations to be accessed with the new M value. Additional requirements for changing the storage control bits in the Page Table are given in Section 5.10.
787
788
Programming Note The function of all the instructions described in Sections 5.9.3.1 - 5.9.3.3 is independent of whether address translation is enabled or disabled. For a discussion of software synchronization requirements when invalidating SLB and TLB entries, see Chapter 10.
5.9.3.1
Programming Note Accesses to a given SLB entry caused by the instructions described in this section obey the sequential execution model with respect to the contents of the entry and with respect to data dependencies on those contents. That is, if an instruction sequence contains two or more of these instructions, when the sequence has completed, the final contents of the SLB entry and of General Purpose Registers is as if the instructions had been executed in program order. However, software synchronization is required in order to ensure that any alterations of the entry take effect correctly with respect to address translation; see Chapter 10.
789
X-form
The hardware ignores the contents of RB listed below and software must set them to 0s.
RB ///
11
///
16
RB
21
434
/
31
ea0:35 (RB)0:35 if, for SLB entry that translates or most recently translated ea, entry_class = (RB)36 and entry_seg_size = size specified in (RB)37:38 then for SLB entry (if any) that translates ea SLBEV 0 all other fields of SLBE undefined else log_base_2(entry_seg_size) s esid (RB)0:63-s undefined 1-bit value u if u then if an SLB entry translates esid SLBEV 0 undefined all other fields of SLBE Let the Effective Address (EA) be any EA for which EA0:35 = (RB)0:35. Let the class be (RB)36. Let the segment size be equal to the segment size specified in (RB)37:38; the allowed values of (RB)37:38, and the correspondence between the values and the segment size, are the same as for the B field in the SLBE (see Figure 17 on page 770). The class value and segment size must be the same as the class value and segment size in the SLB entry that translates the EA, or the values that were in the SLB entry that most recently translated the EA if the translation is no longer in the SLB; if these values are not the same, it is implementation-dependent whether the SLB entry (or implementation-dependent translation information) that translates the EA is invalidated, and the next paragraph need not apply. If the SLB contains only a single entry that translates the EA, then that is the only SLB entry that is invalidated, except that it is implementation-dependent whether an implementation-specific lookaside entry for a real mode address translation is invalidated. If the SLB contains more than one such entry, then zero or more such entries are invalidated, and similarly for any implementation-specific lookaside information used in address translation; additionally, a machine check may occur. SLB entries are invalidated by setting the V bit in the entry to 0, and the remaining fields of the entry are set to undefined values.
If this instruction is executed in 32-bit mode, (RB)0:31 must be zeros. This instruction is privileged. Special Registers Altered: None Programming Note slbie does not affect SLBs on other threads. Programming Note The reason the class value specified by slbie must be the same as the Class value that is or was in the relevant SLB entry is that the hardware may use these values to optimize invalidation of implementation-specific lookaside information used in address translation. If the value specified by slbie differs from the value that is or was in the relevant SLB entry, these optimizations may produce incorrect results. (An example of implementation-specific address translation lookaside information is the set of recently used translations of effective addresses to real addresses that some implementations maintain in an Effective to Real Address Translation (ERAT) lookaside buffer.) When switching tasks in certain cases, it may be advantageous to preserve some implementationspecific lookaside entries while invalidating others. The IH=0b001 invalidation hint of the slbia instruction can be used for this purpose if SLB class values are appropriately assigned, i.e. a class value of 0 gives the hint that the entry should be preserved and a class value of 1 indicates the entry must be invalidated. Also, it is advantageous to assign a class value of 1 to entries that need to be invalidated via an slbie instruction while preserving implementation-specific lookaside entries that are not derived from an SLB entry since such entries are assigned a class value of 0. The Move To Segment Register instructions (see Section 5.9.3.2.1) create SLB entries in which the Class value is 0. Programming Note The B value in register RB may be needed for invalidating ERAT entries corresponding to the translation being invalidated.
790
X-form
Programming Note If slbia is executed when instruction address translation is enabled, software can ensure that attempting to fetch the instruction following the slbia does not cause an Instruction Segment interrupt by placing the slbia and the subsequent instruction in the effective segment mapped by SLB entry 0. (The preceding assumes that no other interrupts occur between executing the slbia and executing the subsequent instruction.) Programming Note The defined values for IH are as follows. 0b000 All ERAT entries are invalidated. (This value is not a hint.) This value should be used by the hypervisor when relocating itself (i.e. when modifying the HRMOR) or when reconfiguring real storage. Preserve ERAT entries with a Class value of 0. This value should be used by an operating system when switching tasks in certain cases; for example, if SLBEC=0 is used for SLB translations shared between the tasks. Preserve ERAT entries created when MSRIR/DR=0. This value should generally be used by an operating system when switching tasks. Preserve ERAT entries created when MSRHV=1 and MSRIR/DR=0. This value should be used by the hypervisor when switching partitions.
IH //
8
IH
11
///
16
///
21
498
/
31
for each SLB entry except SLB entry 0 SLBEV 0 undefined all other fields of SLBE For all SLB entries except SLB entry 0, the V bit in the entry is set to 0, making the entry invalid, and the remaining fields of the entry are set to undefined values. SLB entry 0 is not altered. On implementations that have implementation-specific lookaside information for effective to real address translations, the IH field provides a hint that can be used to invalidate entries selectively in such lookaside information. The defined values for IH are as follows. 0b000 All such implementation-specific lookaside information is invalidated. (This value is not a hint.) Preserve such implementation-specific lookaside information having a Class value of 0. Preserve such implementation-specific lookaside information created when MSRIR/DR=0. Preserve such implementation-specific lookaside information created when MSRHV=1, MSRPR=0, and MSRIR/DR=0.
0b001
0b010
0b110
All other IH values are reserved. If the IH field contains a reserved value, the hint provided by the IH field is undefined. Implementation specific lookaside information for which preservation is not requested is invalidated. Implementation specific lookaside information for which preservation is requested may be invalidated. When IH=0b000, execution of this instruction has the side effect of clearing the storage access history associated with the Hypervisor Real Mode Storage Control facility. See Section 5.7.3.3.1, Hypervisor Real Mode Storage Control for more details. This instruction is privileged. Special Registers Altered: None Programming Note slbia does not affect SLBs on other threads.
Programming Note slbia serves as both a basic and an extended mnemonic. The Assembler will recognize an slbia mnemonic with one operand as the basic form, and an slbia mnemonic with no operand as the extended form. In the extended form the IH operand is omitted and assumed to be 0.
791
X-form
Programming Note The reason slbmte cannot be used to invalidate an SLB entry is that it does not necessarily affect implementation-specific address translation lookaside information. slbie (or slbia) must be used for this purpose.
RS,RB RS
11
///
16
RB
21
402
/
31
The SLB entry specified by bits 52:63 of register RB is loaded from register RS and from the remainder of register RB. The contents of these registers are interpreted as shown in Figure 30. RS B
0 2
VSID
KsKpNLC 0 LP 0s 52 57 58 60 63
RB ESID
0
V
36 37
0s
52
index
63
RS0:1 RS2:51 RS52 RS53 RS54 RS55 RS56 RS57 RS58:59 RS60:63 RB0:35 RB36 RB37:51 RB52:63
B VSID Ks Kp N L C must be 0b0 LP must be 0b0000 ESID V must be 0b000 || 0x000 index, which selects the SLB entry
Figure 30. GPR contents for slbmte On implementations that support a virtual address size of only n bits, n<78, (RS)0:77-n must be zeros. (RS)57 and(RS)60:63 are to be ignored by the hardware. High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros. If this instruction is executed in 32-bit mode, (RB)0:31 must be zeros (i.e., the ESID must be in the range 0:15). This instruction cannot be used to invalidate an SLB entry. This instruction is privileged. Special Registers Altered: None
792
X-form
X-form
RT,RB RT
11
///
16
RB
21
851
/
31 0
31
///
16
RB
21
915
/
31
If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 31. RT
If the SLB entry specified by bits 52:63 of register RB is valid (V=1), the contents of the ESID and V fields of the entry are placed into register RT. The contents of these registers are interpreted as shown in Figure 32. RT ESID V
36 37
0s
63
B
0 2
VSID
KsKpNLC 0 LP
52 57 58
0s
60 63
RB 0s index
52 63
RB 0s
0 52
index
63
RT0:1 RT2:51 RT52 RT53 RT54 RT55 RT56 RT57 RT58:59 RT60:63 RB0:51 RB52:63
B VSID Ks Kp N L C set to 0b0 LP set to 0b0000 must be 0x0_0000_0000_0000 index, which selects the SLB entry
ESID V set to 0b000 || 0x00_0000 must be 0x0_0000_0000_0000 index, which selects the SLB entry
Figure 32. GPR contents for slbmfee If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0. High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros. This instruction is privileged. Special Registers Altered: None
Figure 31. GPR contents for slbmfev On implementations that support a virtual address size of only n bits, n<78, RT0:77-n are set to zeros. If the SLB entry specified by bits 52:63 of register RB is invalid (V=0), the contents of register RT are set to 0. High-order bits of (RB)52:63 that correspond to SLB entries beyond the size of the SLB provided by the implementation must be zeros. This instruction is privileged. Special Registers Altered: None
793
X-form
If this instruction is executed in 32-bit mode, (RB)0:31 must be zeros (i.e., the ESID must be in the range 015). This instruction is privileged. Special Registers Altered: CR0
RT
11
///
16
RB
21
979
1
31
The SLB is searched for an entry that matches the effective address specified by register RB. The search is performed as if it were being performed for purposes of address translation. E.g., in order for a given entry to satisfy the search, the entry must be valid (V=1), and (RB)0:63-s must equal SLBE[ESID0:63-s] (where 2s is the segment size selected by the B field in the entry). If exactly one matching entry is found, the contents of the B, VSID, Ks, Kp, N, L, C, and LP fields of the entry are placed into register RT. If no matching entry is found, register RT is set to 0. If more than one matching entry is found, either one of the matching entries is used, as if it were the only matching entry, or a Machine Check occurs. If a Machine Check occurs, register RT, and CR Field 0 are set to undefined values, and the description below of how this register and this field is set does not apply. The contents of registers RT and RB are interpreted as shown in Figure 33. RT B
0 2
VSID
KsKpNLC 0 LP 0s 52 57 58 60 63
RB ESID
0 40
0s
63
RT0:1 RT2:51 RT52 RT53 RT54 RT55 RT56 RT57 RT58:59 RT60:63 RB0:35 RB36:63
Figure 33. GPR contents for slbfee. If s > 28, RT80-s:51 are set to zeros. On implementations that support a virtual address size of only n bits, n < 78, RT2:79-n are set to zeros. CR Field 0 is set as follows. j is a 1-bit value that is equal to 0b1 if a matching entry was found. Otherwise, j is 0b0. CR0LT GT EQ SO = 0b00 || j || XERSO
794
. KsKpN 0 VSID23:49 32 33 36 37 63
RB --0 32
ESID
36
--63
Figure 34. GPR contents for mtsr, mtsrin, mfsr, and mfsrin Programming Note The Segment Register format used by the instructions described in this section corresponds to the low-order 32 bits of RS and RT shown in the figure. This format is essentially the same as that for the Segment Registers of 32-bit PowerPC implementations. The only differences are the following. Bit 36 corresponds to a reserved bit in Segment Registers. Software must supply 0 for the bit because it corresponds to the L bit in SLB entries, and large pages are not supported for SLB entries created by the Move To Segment Register instructions. VSID bits 23:25 correspond to reserved bits in Segment Registers. Software can use these extra VSID bits to create VSIDs that are larger than those supported by the Segment Register Manipulation instructions of 32-bit PowerPC implementations. Bit 32 of RS and RT corresponds to the T (directstore) bit of early 32-bit PowerPC implementations. No corresponding bit exists in SLB entries. Programming Note The Programming Note in the introduction to Section 5.9.3.1 applies also to the Segment Register Manipulation instructions described in this section, and to any combination of the instructions described in the two sections, except as specified below for mfsr and mfsrin. The requirement that the SLB contain at most one entry that translates a given effective address (see Section 5.7.6.1) applies to SLB entries created by mtsr and mtsrin. This requirement is satisfied naturally if only mtsr and mtsrin are used to create SLB entries for a given ESID, because for these instructions the association between SLB entries and ESID values is fixed (SLB entry i is used for ESID i). However, care must be taken if slbmte is also used to create SLB entries for the ESID, because for slbmte the association between SLB entries and ESID values is specified by software.
795
X-form
SR,RS RS /
11 12
SR
16
///
21
210
/
31 0
31
///
16
RB
21
242
/
31
The SLB entry specified by SR is loaded from register RS, as follows. SLBE Bit(s) 0:31 32:35 36 37:38 39:61 62:88 89:91 92 93 94 95:96 Set to 0x0000_0000 SR 0b1 0b00 0b000||0x0_0000 (RS)37:63 (RS)33:35 (RS)36 0b0 0b0 0b00 SLB Field(s) ESID0:31 ESID32:35 V B VSID0:22 VSID23:49 KsKpN L ((RS)36 must be 0b0) C reserved LP
The SLB entry specified by (RB)32:35 is loaded from register RS, as follows. SLBE Bit(s) 0:31 32:35 36 37:38 39:61 62:88 89:91 92 93 94 95:96 Set to 0x0000_0000 (RB)32:35 0b1 0b00 0b000||0x0_0000 (RS)37:63 (RS)33:35 (RS)36 0b0 0b0 0b00 SLB Field(s) ESID0:31 ESID32:35 V B VSID0:22 VSID23:49 KsKpN L ((RS)36 must be 0b0) C reserved LP
MSRSF must be 0 when this instruction is executed; otherwise the results are boundedly undefined. This instruction is privileged. Special Registers Altered: None
MSRSF must be 0 when this instruction is executed; otherwise the results are boundedly undefined. This instruction is privileged. Special Registers Altered: None
796
X-form
RT,SR RT /
11 12
SR
16
///
21
595
/
31 0
31
///
16
RB
21
659
/
31
The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by SR are placed into register RT as follows. SLBE Bit(s) 62:88 89:91 92 Copied to RT37:63 RT33:35 RT36 SLB Field(s) VSID23:49 KsKpN L (SLBEL must be 0b0)
The contents of the low-order 27 bits of the VSID field and the contents of the Ks, Kp, N, and L fields of the SLB entry specified by (RB)32:35 are placed into register RT as follows. SLBE Bit(s) 62:88 89:91 92 Copied to RT37:63 RT33:35 RT36 SLB Field(s) VSID23:49 KsKpN L (SLBEL must be 0b0)
RT32 is set to 0. The contents of RT0:31 are undefined. MSRSF must be 0 when this instruction is executed; otherwise the results are boundedly undefined. This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<227, L=0, and C=0). If the SLB entry is invalid (V=0), RT33:63 are set to 0. Otherwise the contents of register RT are undefined. This instruction is privileged. Special Registers Altered: None RT32 is set to 0. The contents of RT0:31 are undefined. MSRSF must be 0 when this instruction is executed; otherwise the results are boundedly undefined. This instruction must be used only to read an SLB entry that was, or could have been, created by mtsr or mtsrin and has not subsequently been invalidated (i.e., an SLB entry in which ESID<16, V=1, VSID<227, L=0, and C=0). If the SLB entry is invalid (V=0), RT33:63 are set to 0. Otherwise the contents of register RT are undefined. This instruction is privileged. Special Registers Altered: None
797
X-form
RB if L=1: AVA
0 44
LP
0s B
52 54
AVAL L
56 63
31
0 6
RS
11
///
16
RB
21
306
/
31
RS32:63 contains an LPID value. The supported (RS)32:63 values are the same as the LPID values supported in LPIDR. RS0:31 must contain zeros and are ignored by the hardware. If the L field in RB contains 0, the base page size is 4 KB and RB56:58 (AP - Actual Page size field) must be set to the SLBEL||LP encoding for the page size corresponding to the actual page size specified by the PTE that was used to create the TLB entry to be invalidated. Thus, b is equal to 12 and p is equal to log2 (actual page size specified by (RB)56:58). The Abbreviated Virtual Address (AVA) field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated. Variable i is equal to 51. If the L field in RB contains 1, the following rules apply. The base page size and actual page size are specified in the LP field in register RB, where the relationship between (RB)44:51 (LP - Large Page size selector field) and the base page size and actual page size is the same as the relationship between PTELP and the base page size and actual page size, except for the r bits (see Section 5.7.7.1 on page 773 and Figure 21 on page 774). Thus, b is equal to log2 (base page size specified by (RB)44:51) and p is equal to log2 (actual page size specified by (RB)44:51). Specifically, (RB)44+c:51 must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated, where c is the maximum of 0 and (20-p). Variable i is the larger of (63-p) and the value that is the smaller of 43 and (63-b). (RB)0:i must contain bits 14:(i+14) of the virtual address translated by the TLB to be invalidated. If b>20, RB64-b:43 may contain any value and are ignored by the hardware. If b<20, (RB)56:75-b must contain bits 58:77-b of the virtual address translated by the TLB to be invalidated, and other bits in (RB)56:62 may contain any value and are ignored by the hardware. If b20, (RB)56:62 (AVAL - Abbreviated Virtual Address, Lower) may contain any value and are ignored by the hardware. Let the segment size be equal to the segment size specified in (RB)54:55 (B field). The contents of RB54:55 must be the same as the contents of the B field of the PTE that was used to create the TLB entry to be invalidated. RB52:53 and RB59:62 (when (RB)63 = 0) must contain zeros and are ignored by the hardware.
L (RB)63 if L = 0 then base_pg_size = 4K actual_pg_size = page size specified in (RB)56:58 i = 51 else base_pg_size = base page size specified in (RB)44:51 actual_pg_size = actual page size specified in (RB)44:51 i = max(min(43,63-b),63-p) log_base_2(base_pg_size) b p log_base_2(actual_pg_size) segment size specified in (RB)54:55 sg_size for each thread for each TLB entry if (entry_VA14:i+14 = (RB)0:i) & (entry_sg_size = sg_size) & (entry_base_pg_size = base_pg_size) & (entry_actual_pg_size = actual_pg_size) & ( ( TLBEs contain LPID & (TLBELPID = (RS)32:63) ) | ( TLBEs do not contain LPID & (LPIDRLPID = (RS)32:63) ) ) then if ((L = 0)|(b 20)) then TLB entry invalid else if (entry_VA58:77-b = (RB)56:75-b) then TLB entry invalid The operation performed by this instruction is based on the contents of registers RS and RB. The contents of these registers are shown below, where L is (RB)63. RS: 0s
0 32
LPID
63
RB if L=0: AVA
0
0s B AP 0s L
52 54 56 59 63
798
Programming Note For tlbie[l] instructions in which (RB)63=0, the AP value in RB is provided to make it easier for the hardware to locate address translations, in lookaside buffers, corresponding to the address translation being invalidated. For tlbie[l] instructions the AP specification is not binary compatible with versions of the architecture that precede Version 2.06. As an example, for an actual page size of 64 KB AP=0b101, whereas software written for an implementation that complies with a version of the architecture that precedes V. 2.06 would have AP=100 since AP was a 1 bit value followed by 0s in RB57:58. If binary compatibility is important, for a 64 KB page software can use AP=0b101 on these earlier implementations since these implementations were required to ignore RB57:58. Programming Note For tlbie[l] instructions the AVA and AVAL fields in RB contain different VA bits from those in PTEAVA.
799
X-form
RB ///
11
LP
IS
AVAL L
56 63
52 54
///
16
RB
21
274
/
31
IS=0b10 or 0b11: IS (RB)52:53 switch(IS) case (0b00): (RB)63 L if L = 0 then base_pg_size = 4K actual_pg_size = page size specified in (RB)56:58 i = 51 else base_pg_size = base page size specified in (RB)44:51 actual_pg_size = actual page size specified in (RB)44:51 i = max(min(43,63-b),63-p) b log_base_2(base_pg_size) log_base_2(actual_pg_size) p segment size specified in (RB)54:55 sg_size for each TLB entry if (entry_VA14:i+14 = (RB)0:i) & (entry_sg_size = segment_size) & (entry_base_pg_size = base_pg_size) & (entry_actual_pg_size = actual_pg_size) & (TLBEs do not contain LPID | (TLBEs contain LPID & (TLBELPID=LPIDRLPID))) then if ((L = 0)|(b 20)) then TLB entry invalid else if (entry_VA58:77-b = (RB)56:75-b) then invalid TLB entry case (0b10): implementation-dependent number, 40i51 i for each TLB entry in set (RB)i:51 if (TLBEs do not contain LPID | (TLBEs contain LPID & (TLBELPID=LPIDRLPID))) invalid then TLB entry case (0b11): i implementation-dependent number, 40i51 if MSRHV then TLBEs in set (RB)i:51 invalid else for each TLB entry in set (RB)i:51 if (TLBEs do not contain LPID | (TLBEs contain LPID & (TLBELPID=LPIDRLPID))) invalid then TLB entry The operation performed by this instruction is based on the contents of register RB. The contents of RB are shown below, where IS is (RB)52:53 and L is (RB)63. IS=0b00 and L=0: AVA
0
0s
0 40
SET
IS
52 54
0s
63
The Invalidation Selector (IS) field in RB has three defined values (0b00, 0b10, and 0b11). The IS value of 0b01 is reserved and is treated in the same manner as the corresponding case for instruction fields (see Section 1.3.3, Reserved Fields and Reserved Values on page 6 in Book I). Engineering Note IS field in RB contains 0b00 If the L field in RB contains 0, the base page size is 4 KB and RB56:58 (AP - Actual Page size field) must be set to the SLBEL||LP encoding for the page size corresponding to the actual page size specified by the PTE that was used to create the TLB entry to be invalidated. Thus, b is equal to 12 and p is equal to log2 (actual page size specified by (RB)56:58). The Abbreviated Virtual Address (AVA) field in register RB must contain bits 14:65 of the virtual address translated by the TLB entry to be invalidated. Variable i is equal to 51. If the L field in RB contains 1, the following rules apply. The base page size and actual page size are specified in the LP field in register RB, where the relationship between (RB)44:51 (LP - Large Page size selector field) and the base page size and actual page size is the same as the relationship between PTELP and the base page size and actual page size, except for the r bits (see Section 5.7.7.1 on page 773 and Figure 21 on page 774). Thus, b is equal to log2 (base page size specified by (RB)44:51) and p is equal to log2 (actual page size speci(RB)44+c:51 fied by (RB)44:51). Specifically, must be equal to the contents of bits c:7 of the LP field of the PTE that was used to create the TLB entry to be invalidated, where c is the maximum of 0 and (20-p). Variable i is the larger of (63-p) and the value that is the smaller of 43 and (63-b). (RB)0:i must contain bits 14:(i+14) of the virtual address translated by the TLB to be invalidated. If b>20, RB64-b:43 may contain any value and are ignored by the hardware. If b<20, (RB)56:75-b must contain bits 58:77-b of the virtual address translated by the TLB to be invalidated, and other bits in (RB)56:62 may
IS
B AP 0s L
56 59 63
52 54
800
801
Programming Note The primary use of this instruction by hypervisor software is to invalidate TLB entries prior to reassigning a thread to a new logical partition. For IS = 0b10 or 0b11, it is implementation-dependent whether ERAT entries are invalidated. If the tlbiel instruction is being executed due to a partition swap, an slbia instruction can be used to invalidate the pertinent ERAT entries. If the tlbiel instruction is being executed to invalidate TLB entries with parity or ECC errors, the fact that the corresponding ERAT entries are not invalidated is immaterial. If the tlbiel instruction is being executed to invalidate multiple matching TLB entries, the fact that the corresponding ERAT entries are not invalidated is immaterial for implementations that never create multiple matching ERAT entries. The primary use of this instruction by operating system software is to invalidate TLB entries that were created by the hypervisor using an implementation-specific hypervisor-managed TLB facility, if such a facility is provided. tlbiel may be executed on a given thread even if the sequence tlbie - eieio - tlbsync - ptesync is concurrently being executed on another thread. See also the Programming Notes with the description of the tlbie instruction.
X-form
///
11
///
16
///
21
370
/
31
invalid
All TLB entries are made invalid on the thread executing the tlbia instruction. This instruction is hypervisor privileged. This instruction is optional, and need not be implemented. Special Registers Altered: None Programming Note tlbia does not affect TLBs on other threads.
802
X-form
///
11
///
16
///
21
566
/
31
The tlbsync instruction provides an ordering function for the effects of all tlbie instructions executed by the thread executing the tlbsync instruction, with respect to the memory barrier created by a subsequent ptesync instruction executed by the same thread. Executing a tlbsync instruction ensures that all of the following will occur. All TLB invalidations caused by tlbie instructions preceding the tlbsync instruction will have completed on any other thread before any data accesses caused by instructions following the ptesync instruction are performed with respect to that thread. All storage accesses by other threads for which the address was translated using the translations being invalidated, and all Reference and Change bit updates associated with address translations that were performed by other threads using the translations being invalidated, will have been performed with respect to the thread executing the ptesync instruction, to the extent required by the associated Memory Coherence Required attributes, before the ptesync instructions memory barrier is created. The operation performed by this instruction is ordered by the eieio (or sync or ptesync) instruction with respect to preceding tlbie instructions executed by the thread executing the tlbsync instruction. The operations caused by tlbie and tlbsync are ordered by eieio as a fourth set of operations, which is independent of the other three sets that eieio orders. The tlbsync instruction may complete before operations caused by tlbie instructions preceding the tlbsync instruction have been performed. This instruction is hypervisor privileged. See Section 5.10 for a description of other requirements associated with the use of this instruction. Special Registers Altered: None Programming Note tlbsync should not be used to synchronize the completion of tlbiel.
803
The old LPID value (i.e., the contents of the LPIDR just before the mtspr instruction is executed) is the value L The new LPID value (i.e., the value specified by the mtspr instruction) is the value L
Other instructions (excluding mtspr instructions that modify the LPIDR as described above, and excluding tlbie instructions except as shown) may be interleaved with the instruction sequence shown above, but the instructions in the sequence must appear in the order
804
Before permitting an mtspr instruction that modifies the LPIDR to be executed on a given thread, software must ensure that no other thread will execute a conflicting instruction until after the mtspr instruction followed by a context synchronizing instruction have been executed on the given thread (a context synchronizing event can be used instead of the context synchronizing instruction; see Chapter 10). The conflicting instructions in this case are the following. a tlbie instruction specifying an LPID operand value that matches either the old or the new LPIDRLPID value a tlbsync instruction that is part of a tlbie-eieiotlbsync-ptesync sequence in which the tlbie instruction(s) specify an LPID value that matches either the old or the new LPIDRLPID value Programming Note The restrictions specified above regarding modifying the LPIDR apply even on systems consisting of only a single-threaded processor, and even if the new LPID value is equal to the old LPID value.
The sequences of operations shown in the following subsections assume a multi-threaded environment. In an environment consisting of only a single-threaded processor, the tlbsync must be omitted, and the eieio that separates the tlbie from the tlbsync can be omitted. In a multi-threaded environment, when tlbiel is used instead of tlbie in a Page Table update, the synchronization requirements are the same as when tlbie
805
Version 2.06 Revision B 5.10.1.2 Modifying a Page Table Entry General Case
If a valid entry is to be modified and the translation instantiated by the entry being modified is to be invalidated, the following sequence can be used to modify the PTE, maintain a consistent state, ensure that the translation instantiated by the old entry is no longer available, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes. (The sequence is equivalent to deleting the PTE and then adding a new one; see Sections 5.10.1.1 and 5.10.1.3.) PTEV 0 /* (other fields dont matter)*/ ptesync /* order update before tlbie and before next Page Table search */ tlbie(old_B,old_VA14:77-b,old_L,old_LP,old_AP, old_LPID) /*invalidate old translation*/ eieio /* order tlbie before tlbsync */ tlbsync /* order tlbie before ptesync */ ptesync /* order tlbie, tlbsync and 1st update before 2nd update */ new values PTEARPN,LP,AC,R,C,WIMG,N,PP eieio /* order 2nd update before 3rd */ PTEB,AVA,SW,L,H,V new values (V=1) ptesync /* order 2nd and 3rd updates before next Page Table search and before next data access */
806
Chapter 6. Interrupts
6.1 Overview. . . . . . . . . . . . . . . . . . . . 807 6.2 Interrupt Registers . . . . . . . . . . . . 808 6.2.1 Machine Status Save/Restore Registers . . . . . . . . . . . . . . . . . . . . . . . 808 6.2.2 Hypervisor Machine Status Save/ Restore Registers . . . . . . . . . . . . . . . . 808 6.2.3 Data Address Register . . . . . . . 808 6.2.4 Hypervisor Data Address Register. . . . . . . . . . . . . . . . . . . . . . . . 808 6.2.5 Data Storage Interrupt Status Register . . . . . . . . . . . . . . . . . . 808 6.2.6 Hypervisor Data Storage Interrupt Status Register . . . . . . . . . . . . . . . . . . 809 6.2.7 Hypervisor Emulation Instruction Register. . . . . . . . . . . . . . . . . . . . . . . . 809 6.2.8 Hypervisor Maintenance Exception Register. . . . . . . . . . . . . . . . . . . . . . . . 809 6.2.9 Hypervisor Maintenance Exception Enable Register. . . . . . . . . . . . . . . . . . 809 6.3 Interrupt Synchronization . . . . . . . 810 6.4 Interrupt Classes . . . . . . . . . . . . . 810 6.4.1 Precise Interrupt . . . . . . . . . . . . 810 6.4.2 Imprecise Interrupt. . . . . . . . . . . 810 6.4.3 Interrupt Processing . . . . . . . . . 811 6.4.4 Implicit alteration of HSRR0 and HSRR1 . . . . . . . . . . . . . . . . . . . . . . . . 813 6.5 Interrupt Definitions . . . . . . . . . . . 814 6.5.1 System Reset Interrupt . . . . . . . 815 6.5.2 Machine Check Interrupt . . . . . . 816 6.5.3 Data Storage Interrupt . . . . . . . . 818 6.5.4 Data Segment Interrupt . . . . . . . 819 6.5.5 Instruction Storage Interrupt . . . 819 6.5.6 Instruction Segment Interrupt. . . . . . . . . . . . . . . . . . . . . . . . 820 6.5.7 External Interrupt . . . . . . . . . . . . 820 6.5.8 Alignment Interrupt. . . . . . . . . . . 821 6.5.9 Program Interrupt . . . . . . . . . . . . 822 6.5.10 Floating-Point Unavailable Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 823 6.5.11 Decrementer Interrupt . . . . . . . 824 6.5.12 Hypervisor Decrementer Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 824 6.5.13 System Call Interrupt . . . . . . . . 824 6.5.14 Trace Interrupt [Category: Trace] . . . . . . . . . . . . . . . . . . . . . . . . . . 824 6.5.15 Hypervisor Data Storage Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 825 6.5.16 Hypervisor Instruction Storage Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 826 6.5.17 Hypervisor Emulation Assistance Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 826 6.5.18 Hypervisor Maintenance Interrupt . . . . . . . . . . . . . . . . . . . . . . . . 827 6.5.19 Performance Monitor Interrupt [Category: Server.Performance Monitor] . . . . . . . . . . . . . . . . . . . . . . . . 827 6.5.20 Vector Unavailable Interrupt [Category: Vector] . . . . . . . . . . . . . . . . 827 6.5.21 VSX Unavailable Interrupt [Category: VSX]. . . . . . . . . . . . . . . . . . 828 6.6 Partially Executed Instructions . . . . . . . . . . . . . . . . . . . . . 829 6.7 Exception Ordering . . . . . . . . . . . . 830 6.7.1 Unordered Exceptions . . . . . . . . 830 6.7.2 Ordered Exceptions . . . . . . . . . . 830 6.8 Interrupt Priorities . . . . . . . . . . . . . 831
6.1 Overview
The Power ISA provides an interrupt mechanism to allow the thread to change state as a result of external signals, errors, or unusual conditions arising in the execution of instructions. System Reset and Machine Check interrupts are not ordered. All other interrupts are ordered such that only
one interrupt is reported, and when it is processed (taken) no program state is lost. Since Save/Restore Registers SRR0 and SRR1 are serially reusable resources used by most interrupts, program state may be lost when an unordered interrupt is taken.
Chapter 6. Interrupts
807
Programming Note Execution of some instructions, and fetching instructions when MSRIR=1, may have the side effect of modifying HSRR0 and HSRR1; see Section 6.4.4.
//
62 63
SRR1
0 63
Figure 35. Save/Restore Registers SRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation and, for SRR1 bits in the range 33:36, 42:43, and 45:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set SRR1 (including implementation-dependent setting, e.g. by the Machine Check interrupt or by implementation-specific interrupts). SRR144 cannot be treated as reserved, regardless of how it is set by interrupts, because it is used by software, as described in a Programming Note near the end of Section 6.5.9, Program Interrupt on page 822.
//
62 63
HSRR1
0 63
Figure 36. Hypervisor Save/Restore Registers HSRR1 bits may be treated as reserved in a given implementation if they correspond to MSR bits that are reserved or are treated as reserved in that implementation and, for HSRR1 bits in the range 33:36 and 42:47, they are specified as being set either to 0 or to an undefined value for all interrupts that set HSRR1 (including implementation-dependent setting, e.g. by implementation-specific interrupts). The HSRR0 and HSRR1 are hypervisor resources; see Chapter 2.
Figure 39. Data Storage Interrupt Status Register DSISR bits may be treated as reserved in a given implementation if they are specified as being set either to 0 or to an undefined value for all interrupts that set the DSISR (including implementation-dependent set-
808
When the mtspr instruction is executed with the HMER as the encoded Special Purpose Register, the contents of register RS are ANDed with the contents of the HMER and the result is placed into the HMER. The exception bits in the HMER are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mthmer instruction. Programming Note An access to the HMER is likely to be very slow. Software should access it sparingly.
Storage
Interrupt
Emulation
Instruction
Exception
Register (HMER) is associated with one or more causes of the Hypervisor Maintenance exception, and is set when the associated exception(s) occur. If the corresponding bit in the Hypervisor Maintenance Exception Enable Register (HMEER) is set, a Hypervisor Maintenance Interrupt (HMI) may occur. If the thread is in a powersaving mode when the interrupt would have occurred, the thread will exit the power-saving mode; see Section 6.5.18 and Section 3.3.2.
HMER
0 63
Maintenance
Exception
The contents of the HMER are as follows: 0 Set to 1 for a Malfunction Alert. 1 Set to 1 when performance is degraded for thermal reasons. 2 Set to 1 when thread recovery is invoked.
Chapter 6. Interrupts
809
appear to have completed with respect to the executing thread. 3. The instruction causing the exception may appear not to have begun execution (except for causing the exception), may have been partially executed, or may have completed, depending on the interrupt type. 4. Architecturally, no subsequent instruction has begun execution.
810
Chapter 6. Interrupts
811
The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc. The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned. The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no PTE that satisfies the Page Table search.)
The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz for which the storage operand is in storage that is Write Through Required or Caching Inhibited. The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: A Hypervisor Emulation Assistance interrupt is caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating system supports, or by an instruction that is in
Programming Note If a program modifies an instruction that it or another program will subsequently execute and the execution of the instruction causes an interrupt, the state of storage and the content of some registers may appear to be inconsistent to the interrupt handler program. For example, this could be the result of one program executing an instruction that causes a Hypervisor Emulation Assistance interrupt just before another instance of the same program stores an Add Immediate instruction in that storage location. To the interrupt handler code, it would appear that a hardware generated the interrupt as the result of executing a valid instruction.
812
Programming Note In order to handle Machine Check and System Reset interrupts correctly, the operating system should manage MSRRI as follows. In the Machine Check and System Reset interrupt handlers, interpret SRR1 bit 62 (where MSRRI is placed) as: - 0: interrupt is not recoverable - 1: interrupt is recoverable In each interrupt handler, when enough state has been saved that a Machine Check or System Reset interrupt can be recovered from, set MSRRI to 1. In each interrupt handler, do the following (in order) just before returning. 1. Set MSRRI to 0. 2. Set SRR0 and SRR1 to the values to be used by rfid. The new value of SRR1 should have bit 62 set to 1 (which will happen naturally if SRR1 is restored to the value saved there by the interrupt, because the interrupt handler will not be executing this sequence unless the interrupt is recoverable). 3. Execute rfid. For interrupts that set the SRRs other than Machine Check or System Reset, MSRRI can be managed similarly when these interrupts occur within interrupt handlers for other interrupts that set the SRRs. This Note does not apply to interrupts that set the HSRRs because these interrupts put the thread into hypervisor state, and either do not occur or can be prevented from occurring within interrupt handlers for other interrupts that set the HSRRs.
1. Branch instructions b[l][a], bc[l][a], bclr[l], bcctr[l] 2. Fixed-Point Load and Store Instructions lbz, lbzx, lhz, lhzx, lwz, lwzx, ld<64>, ldx<64>, stb, stbx, sth, sthx, stw, stwx, std<64>, stdx<64>
Similarly, fetching instructions may have the side effect of altering the contents of HSRR0 and HSRR1 unless MSRIR=0.
Chapter 6. Interrupts
813
effective address of the interrupt vector for each interrupt type. (Section 5.7.4 on page 768 summarizes all architecturally defined uses of effective addresses, including those implied by Figure 45.) MSR Bit IR DR FE0 FE1 EE RI ME HV 0 0 0 0 0 0 p 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 h - e 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 - - 1 0 0 0 0 0 0 - s 0 0 0 0 0 0 - m 0 0 0 0 0 - - 1 0 0 0 0 0 - - 1 0 0 0 0 0 - - 1 0 0 0 0 0 - - 1 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m 0 0 0 0 0 0 - m
bit is set to 0 bit is set to 1 bit is set to 1 if interrupt ocurred while the thread was in power-saving mode; otherwise not altered bit is not altered if LPES1=0, set to 1; otherwise not altered if LPES0=0, set to 1; otherwise not altered if LPES0=1, set to 0; otherwise not altered if LEV=1 or LPES1=0, set to 1; otherwise not altered
Settings for Other Bits Bits BE, FP, PMM, PR, SE, and VEC1, and VSX2 are set to 0. If the interrupt results in HV being equal to 1, the LE bit is copied from the HILE bit; otherwise the LE bit is copied from the LPCRILE bit. The SF bit is set to 1. Reserved bits are set as if written as 0. 1 Category: Vector 2 Category: Vector Scalar Emulation 3 Category: Floating-Point Figure 44. MSR setting due to interrupt
814
If multiple exceptions that cause exit from power-saving mode exist, the exception reported is the exception corresponding to
Chapter 6. Interrupts
815
45 46:47
Others MSR
01
In addition, if the interrupt occurs when the thread is in power-saving mode and is caused by an exception other than a System Reset exception, all other registers, except HSRR0 and HSRR1, that would be set by the corresponding interrupt if the exception occurred when the thread was not in power-saving mode are set by the System Reset interrupt, and are set to the values to which they would be set if the exception occurred when the thread was not in power-saving mode. Execution resumes at 0x0000_0000_0000_0100. effective address
10
The means for software to distinguish between poweron Reset and other types of System Reset are implementation-dependent.
11
816
10
The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, but the state of all hypervisor resources was maintained as if the thread was not in power-saving mode and the state of all other resources is such that the hypervisor can resume execution. The interrupt occurred when the thread was in power-saving mode. The state of some resources was not maintained, and the state of some hypervisor resources was not maintained or the state of some resources is such that the hypervisor cannot resume execution. Programming Note Although the resources that are maintained in power-saving mode (except in the doze power-saving level) are implementation-dependent, the hypervisor can avoid implementation-dependence in the portion of the System Reset and Machine Check interrupt handlers that recover from having been in power-saving mode by using the contents of SRR146:47, to determine what state to restore. (To avoid implementation-dependence in the portion of the hypervisor that enters power-saving mode, the hypervisor must use the specification of the four instructions to determine what state to save.)
Set to indicate whether the interrupt occurred when the thread was in powersaving mode and, if so, the extent to which resource state was maintained while the thread was in power-saving mode, as follows. 00 The interrupt did not occur when the thread was in power-saving mode. The interrupt occurred when the thread was in power-saving mode. The state of all resources was maintained as if the thread was not in power-saving mode.
62
01
If the interrupt did not occur while the thread was in power-saving mode, loaded from bit 62 of the MSR if the thread is in a recoverable state; otherwise set to 0. If the interrupt occurred while the thread was in power-saving mode, set to 1 if the thread is in a recoverable state; otherwise set to 0. Set to an implementation-dependent value. See Figure 44. Set to an implementation-dependent value. Set to an implementation-dependent value. effective address
A Machine Check interrupt caused by the existence of multiple SLB entries or TLB entries (or similar entries in implementation-specific translation caches) which translate a given effective or virtual address (see Sec-
Chapter 6. Interrupts
817
34:35 36
38 39:40 41 42
Set to 0. Set to 1 if MSRDR=1 and the translation for an attempted access is not found in the Page Table; otherwise set to 0.. Set to 0. Set to 1 if the access is not permitted by Figure 27 or 28, as appropriate; otherwise set to 0. Set to 1 if the access is due to a lq, stq, lbarx, lharx, lwarx, ldarx, stbcx., sthcx., stwcx., or stdcx. instruction that addresses storage that is Write Through Required or Caching Inhibited; otherwise set to 0. Set to 1 for a Store, dcbz, or ecowx instruction; otherwise set to 0. Set to 0. Set to 1 if a Data Address Breakpoint match occurs; otherwise set to 0. Set to 1 if the access is not permitted by virtual page class key protection; otherwise set to 0. Programming Note Storage protection violations for the Data Storage interrupt are reported in DSISR36 and DSISR42, whereas storage protection violations for the Instruction Storage interrupt are reported in SRR135 and SRR136.
43
44:63 DAR
Set to 1 if execution of an eciwx or ecowx instruction is attempted when EARE=0; otherwise set to 0. Set to 0. Set to the effective address of a storage element as described in the following list. The list should be read from the top down; the DAR is set as described by the first item that corresponds to an exception that is reported in the DSISR. For example, if a Load Word instruction causes a storage protection violation and a Data Address Breakpoint match (and both are reported in the DSISR), the DAR is set to the effective address of a byte in the first aligned doubleword for which access was attempted in the page that caused the exception. a Data Storage exception occurs for reasons other than a Data Address Breakpoint match or, for eciwx and ecowx, EARE=0 - a byte in the block that caused the exception, for a Cache Management instruction - a byte in the first aligned quadword for which access was attempted in the page that caused
818
SRR1 33
34
Chapter 6. Interrupts
819
If multiple Instruction Storage exceptions occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these exceptions may be set to 1 in SRR1. Execution resumes at 0x0000_0000_0000_0400. effective address
When LPES0=1, the following registers are set: SRR0 Set to the effective address of the instruction that the thread would have attempted to execute next if no interrupt conditions were present. Set to 0. Set to 0. Loaded from the MSR. See Figure 44 on page 814. effective address
Because the value of MSREE is always 1 when the thread is in problem state, the simpler expression MSREE |
(LPES0 | MSRHV)
is equivalent to the expression given above. Programming Note The Direct External exception has the same meaning as the External exception in versions of the architecture prior to Version 2.05.
An Instruction Segment interrupt occurs if MSRIR=1 and the translation of the effective address of the next instruction to be executed is not found in the SLB (or in any implementation-specific address translation lookaside information).
820
When LPES0=1, the following registers are set: SRR0 Set to the effective address of the instruction that the thread would have attempted to execute next if no interrupt conditions were present. Set to 0. Set to 0. Loaded from the MSR. See Figure 44 on page 814. effective address
SRR1 33:36 42:47 Others MSR DSISR 32:43 44:45 46 47:48 49 50:53
54:58 59:63
DAR
Set to the effective address computed by the instruction, except that if the interrupt
Chapter 6. Interrupts
821
Privileged Instruction The following applies if the instruction is executed when MSRPR = 1. A Privileged Instruction type Program interrupt is generated when execution is attempted of a privileged instruction, or of an mtspr or mfspr instruction with an SPR field that contains a value having spr0=1. The following applies if the instruction is executed when MSRHV PR = 0b00. A Privileged Instruction type Program interrupt is generated when execution is attempted of an mtspr or mfspr instruction with an SPR field that designates an SPR that is accessible by the instruction only when the thread is in hypervisor state, or when execution of a hypervisor-privileged instruction is attempted. Programming Note
effective
These are the only cases in which a Privileged Instruction type Program interrupt can be generated when MSRPR=0. They can be distinguished from other causes of Privileged Instruction type Program interrupts by examining SRR149 (the bit in which MSRPR was saved by the interrupt). Trap A Trap type Program interrupt is generated when any of the conditions specified in a Trap instruction is met. The following registers are set: SRR0 For all Program interrupts except a FloatingPoint Enabled Exception type Program interrupt, set to the effective address of the instruction that caused the corresponding exception. For a Floating-Point Enabled Exception type Program interrupt, set as described in the following list. - If MSRFE0 FE1 = 0b00, FPSCRFEX = 1, and an instruction is executed that changes MSRFE0 FE1 to a nonzero value, set to the effective address of the instruction that the thread would have attempted
The architecture does not support the use of an unaligned effective address by lharx, lwarx, ldarx, sthcx., stwcx., stdcx., eciwx, and ecowx. If an Alignment interrupt occurs because one of these instructions specifies an unaligned effective address, the Alignment interrupt handler must not attempt to simulate the instruction, but instead should treat the instruction as a programming error.
822
If MSRFE0 FE = 0b11, set to the effective address of the instruction that caused the Floating-Point Enabled Exception. If MSRFE0 FE = 0b01 or 0b10, set to the effective address of the first instruction that caused a Floating-Point Enabled Exception since the most recent time FPSCRFEX was changed from 1 to 0 or of some subsequent instruction. Programming Note If SRR0 is set to the effective address of a subsequent instruction, that instruction will not be beyond the first such instruction at which synchronization of floating-point instructions occurs. (Recall that such synchronization is caused by Floating-Point Status and Control Register instructions, as well as by execution synchronizing instructions and events.)
SRR1 33:36 42 43
44 45 46 47
Set to 0. Set to 0. Set to 1 for a Floating-Point Enabled Exception type Program interrupt; otherwise set to 0. Set to 0. Set to 1 for a Privileged Instruction type Program interrupt; otherwise set to 0. Set to 1 for a Trap type Program interrupt; otherwise set to 0. Set to 0 if SRR0 contains the address of the instruction causing the exception and there is only one such instruction; otherwise set to 1. Programming Note SRR147 can be set to 1 only if the exception is a Floating-Point Enabled Exception and either MSRFE0 FE1 = 0b01 or 0b10 or MSRFE0 FE1 has just been changed from 0b00 to a nonzero value. (SRR147 is always set to 1 in the last case.)
In versions of the architecture that precede V. 2.05, the conditions that now cause a Hypervisor Emulation Assistance interrupt instead caused an Illegal Instruction type Program interrupt. This was a Program interrupt for which registers (SRR0, SRR1, and the MSR) were set as described above for the Privileged Instruction type Program interrupt, except that SRR144 was set to 1 and SRR145 was set to 0. Thus operating systems have code to handle these conditions, at the Program interrupt vector location. For this reason, if a Hypervisor Emulation Assistance interrupt occurs, when the thread is not in hypervisor state, for an instruction that the hypervisor does not emulate, the hypervisor should pass control to the operating system at the operating system's Program interrupt vector location, with all registers (SRR0, SRR1, MSR, GPRs, etc.) set as if the instruction had caused a Privileged Instruction type Program interrupt, except with SRR144:45 set to 0b10. (The Hypervisor Emulation Assistance interrupt was added to the architecture in V. 2.05, and the Illegal Instruction type Program interrupt was removed from the architecture in V. 2.06. In V. 2.05 the Hypervisor Emulation Assistance interrupt was optional: implementations that supported it generated it as described in V. 2.06, and never generated an Illegal Instruction type Program interrupt; implementations that did not support it generated an Illegal Instruction type Program interrupt as described above.)
Others
Chapter 6. Interrupts
823
The following registers are set: SRR0 SRR1 33:36 42:47 Others MSR Set to the effective address of the instruction following the System Call instruction. Set to 0. Set to 0. Loaded from the MSR. See Figure 44 on page 814. effective address
An attempt to execute an sc instruction with LEV=1 in problem state should be treated as a programming error.
SRR1 33:36 and 42:47 Set to an implementation-dependent value. Others Loaded from the MSR. MSR See Figure 44 on page 814. effective address
Because the value of MSREE is always 1 when the thread is in problem state, the simpler expression (MSREE | (MSRHV)) & HDICE is equivalent to the expression given above.
824
Programming Note The following instructions are not traced. rfid hrfid sc, and Trap instructions that trap other instructions that cause interrupts (other than Trace interrupts) the first instructions of any interrupt handler instructions that are emulated by software In general, interrupt handlers can achieve the effect of tracing these instructions.
34:35 36
37
38 39:40 41 42
44:63 HDAR
Chapter 6. Interrupts
825
34 35 36
If multiple Hypervisor Instruction Storage exceptions occur due to attempting to fetch a single instruction, any one or more of the bits corresponding to these exceptions may be set to 1 in HSRR1. Execution resumes at 0x0000_0000_0000_0E10. effective address
826
The exception bits in the HMER are sticky; that is, once set to 1 they remain set to 1 until they are set to 0 by an mthmer instruction. Execution resumes at 0x0000_0000_0000_0E60. effective address
Programming Note Because the value of MSREE is always 1 when the thread is in problem state, the simpler expression (MSREE | (MSRHV)) is equivalent to the expression given above. Programming Note If an implementation uses the HMER to record that a readable resource, such as the Time Base, has been corrupted, then, because the HMI is disabled in the hypervisor state, it is necessary for the hypervisor to check HMER after reading that resource to be sure an error has not occurred.
If a Hypervisor Emulation Assistance interrupt occurs, when the thread is not in hypervisor state, for an instruction that the hypervisor does not emulate, the hypervisor should pass control to the operating system as if the instruction had caused an "Illegal Instruction type Program interrupt", as described in a Programming Note near the end of Section 6.5.9, Program Interrupt on page 822.
Chapter 6. Interrupts
827
828
Programming Note An exception may result in the partial execution of a Load or Store instruction. For example, if the Page Table Entry that translates the address of the storage operand is altered, by a program running on another thread, such that the new contents of the Page Table Entry preclude performing the access, the alteration could cause the Load or Store instruction to be aborted after having been partially executed. As stated in the Book II section cited above, if an instruction is partially executed the contents of registers are preserved to the extent that the instruction can be re-executed correctly. The consequent preservation is described in the following list. For any given instruction, zero, one, or two items in the list apply. For a fixed-point Load instruction that is not a multiple or string form, or for an eciwx instruction, if RT=RA or RT=RB then the contents of register RT are not altered. For an lq instruction, if RT+1 = RA then the contents of register RT+1 are not altered. For an update form Load or Store instruction, the contents of register RA are not altered.
The architecture does not support continuation of an aborted instruction but intends that the aborted instruction be re-executed if appropriate.
Chapter 6. Interrupts
829
Instruction-Caused and Precise 1. Instruction Segment 2. [Hypervisor] Instruction Storage 3.a Hypervisor Emulation Assistance 3.b Program - Privileged Instruction 4. Function-Dependent 4.a Fixed-Point and Branch 1a Program - Trap 1b System Call 1c [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment 2 Trace 4.b Floating-Point 1 FP Unavailable 2a Program - Precise Mode Floating-Pt Enabled Excepn 2b [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment 3 Trace 4.c Vector 1 Vector Unavailable 2a [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment 3 Trace 4.d VSX 1 VSX Unavailable 2a Program - Precise Mode Floating-Pt Enabled Excepn 2b [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment 3 Trace
For implementations that execute multiple instructions in parallel using pipeline or superscalar techniques, or combinations of these, it can be difficult to understand the ordering of exceptions.To understand this ordering it is useful to consider a model in which each instruction is fetched, then decoded, then executed, all before the next instruction is fetched. In this model, the exceptions a single instruction would generate are in the order shown in the list of instruction-caused exceptions. Exceptions with different numbers have different ordering. Exceptions with the same numbering but different lettering are mutually exclusive and cannot be caused by the same instruction. The External, Decrementer, and Hypervisor Decrementer interrupts have equal ordering. Similarly, where Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal ordering. Even on threads that are capable of executing several instructions simultaneously, or out of order, instructioncaused interrupts (precise and imprecise) occur in program order.
830
priate ordered interrupt if no higher priority exception exists when the interrupt is to be generated. Within this category a particular instruction may present more than a single exception. When this occurs, those exceptions are ordered in priority as indicated in the following lists. Where [Hypervisor] Data Storage, Data Segment, and Alignment exceptions are listed in the same item they have equal priority (i.e., the hardware may generate any one of the three interrupts for which an exception exists). A. Fixed-Point Loads and Stores a. These exceptions are mutually exclusive and have the same priority: Hypervisor Emulation Assistance Program - Privileged Instruction b. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment c. Trace B. Floating-Point Loads and Stores a. Hypervisor Emulation Assistance b. Floating-Point Unavailable c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment d. Trace C. Vector Loads and Stores a. Hypervisor Emulation Assistance b. Vector Unavailable c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment d. Trace D. VSX Loads and Stores a. Hypervisor Emulation Assistance b. VSX Unavailable c. [Hypervisor] Data Storage, [Hypervisor] Data Segment, or Alignment d. Trace E. Other Floating-Point Instructions a. Hypervisor Emulation Assistance b. Floating-Point Unavailable c. Program - Precise Mode Floating-Point Enabled Exception d. Trace F. Other Vector Instructions a. Hypervisor Emulation Assistance b. Vector Unavailable c. Trace G. Other VSX Instructions a. Hypervisor Emulation Assistance b. VSX Unavailable c. Program - Precise Mode Floating-Point Enabled Exception d. Trace H. rfid, hrfid and mtmsr[d] a. Program - Privileged Instruction b. Program - Floating-Point Enabled Exception
Chapter 6. Interrupts
831
832
7.1 Overview
The Time Base, Decrementer, Hypervisor Decrementer, Processor Utilization of Resources, and Scaled Processor Utilization of Resources registers provide timing functions for the system. The remainder of this section describes these registers and related facilities.
See Chapter 5 of Book II for infromation about the update frequency of the Time Base. The Time Base is implemented such that: 1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base. 2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR. The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following. The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base changes, and a means to determine what the current update frequency is. The update frequency of the Time Base is under the control of the system software. Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in problem state (MSRPR=1). If the means is under software control, it must be privileged and, in implementations of the Server environment, must be accessible only in hypervisor state (MSRHV PR = 0b10). There must be a method for getting all Time Bases in the system to start incrementing with values that are identical or almost identical.
TBU40 TBU
0 32
/// TBL
63
Description Upper 40 bits of Time Base Upper 32 bits of Time Base Lower 32 bits of Time Base
Figure 46. Time Base The Time Base is a hypervisor resource; see Chapter 2. The SPRs TBU40, TBU, and TBL provide access to the fields of the Time Base shown in Figure 46. When a mtspr instruction is executed specifying one of these SPRs, the associated field of the Time Base is altered and the remaining bits of the Time Base are not affected.
833
Programming Note If software initializes the Time Base on power-on to some reasonable value and the update frequency of the Time Base is constant, the Time Base can be used as a source of values that increase at a constant rate, such as for time stamps in trace entries. Even if the update frequency is not constant, values read from the Time Base are monotonically increasing (except when the Time Base wraps from 264-1 to 0). If a trace entry is recorded each time the update frequency changes, the sequence of Time Base values can be post-processed to become actual time values. Successive readings of the Time Base may return identical values. If Time Base bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0x0 only when bit 59 changes state regardless of whether or not they incremented to 0xF since they were previously set to 0x0. See the description of the Time Base in Chapter 5 of Book II for ways to compute time of day in POSIX format from the Time Base.
7.3 Decrementer
The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer. DEC
32
63
Figure 47. Decrementer The Decrementer counts down. Decrementer bits 32:59 count down until their value becomes 0x000_0000, at the next decrement their value becomes 0xFFF_FFFF. Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes. The Decrementer is driven at the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given earlier for the Time Base (see Section 5.2 of Book II), and if the Time Base update frequency is constant, the period would be 2 32 TDEC = -------------------- = 137 seconds. 1 GHz When the contents of DEC32 change from 0 to 1, a Decrementer exception will come into existence within a reasonable period of time. When the contents of DEC32 change from 1 to 0, the existing Decrementer exception, if any, will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event.
32
Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized. The preferred method of changing the Time Base utilizes the TBU40 facility. The following code sequence demonstrates the process. Assume the upper 40 bits of
834
Figure 48. Hypervisor Decrementer The Hypervisor Decrementer is a hypervisor resource; see Chapter 2. Hypervisor Decrementer bits 32:59 count down until their value becomes 0x000_0000, at the next decrement their value becomes 0xFFF_FFFF. Bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes. The Hypervisor Decrementer is driven at the same frequency as the Time Base. The period of the Hypervisor Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 7.2), and if the Time Base update frequency is constant, the period would be 2 32 TDEC = -------------------- = 137 seconds. 1 GHz When the contents of HDEC32 change from 0 to 1 and the thread is not in a power-saving mode, a Hypervisor Decrementer exception will come into existence within a reasonable period of time. When a Hypervisor Decrementer interrupt occurs, the existing Hypervisor Decrementer exception will cease to exist within a reasonable period of time, but not later than the completion of the next context synchronizing instruction or event. Even if multiple HDEC32 transitions from 0 to 1 occur before a Hypervisor Decrementer interrupt occurs, at most one Hypervisor Decrementer exception exists. The preceding paragraph applies regardless of whether the change in the contents of HDEC32 is the result of decrementation of the Hypervisor Decrementer by the hardware or of modification of the Hypervisor Decrementer caused by execution of an mtspr instruction. The operation of the Hypervisor Decrementer has the following additional properties. 1. Loading a GPR from the Hypervisor Decrementer has no effect on the accuracy of the Hypervisor Decrementer. 2. Copying the contents of a GPR to the Hypervisor Decrementer replaces the contents of the Hypervisor Decrementer with the contents of the GPR.
32
835
Programming Note In systems that change the Time Base update frequency for purposes such as power management, the Hypervisor Decrementer update frequency will also change. Software must be aware of this in order to set interval timers. If Hypervisor Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state regardless of whether or not they decremented to 0x0 since they were previously set to 0xF. Programming Note A Hypervisor Decrementer exception is not created if the thread is in a power-saving mode when HDEC32 changes from 0 to 1 because having a Hypervisor Decrementer interrupt occur almost immediately after exiting the power-saving mode in this case is deemed unnecessary. The hypervisor already has control, and if a timed exit from the power-saving mode is necessary and possible, the hypervisor can use the Decrementer to exit the power-saving mode at the appropriate time. For sleep and rvwinkle power-saving levels, the state of the Hypervisor Decrementer and Decrementer is not necessarily maintained and updated.
Utilization
of
Resources
The PURR is a hypervisor resource; see Chapter 2. The contents of the PURR increase monotonically, unless altered by software, until the sum of the contents plus the amount by which it is to be increased exceed 0xFFFF_FFFF_FFFF_FFFF (264 - 1) at which point the contents are replaced by that sum modulo 264. There is no interrupt or other indication when this occurs. The rate at which the value represented by the contents of the PURR increases is an estimate of the portion of resources used by the thread per unit time with respect to other threads that share those resources monitored by the PURR. When the thread is idle, the rate at
Utilization
of
836
837
838
8.1 Overview
Implementations provide debug facilities to enable hardware and software debug functions, such as control flow tracing, data address breakpoints, and program single-stepping. The debug facilities described in this section consist of the Come-From Address Register (see Section 8.1.1), and the Data Address Breakpoint Register (DABR) and Data Address Breakpoint Register Extension (DABRX) (see Section 8.1.2). The interrupt associated with the Data Address Breakpoint registers is described in Section 6.5.3. The Trace facility, which can be used for single-stepping as well as for control flow tracing, is described in Section 6.5.14. The mfspr and mtspr instructions (see Section 4.4.4) provide access to the registers of the debug facilities. In addition to the facilities mentioned above, implementations typically provide debug facilities, modes, and access mechanisms that are implementation-specific. For example, implementations typically provide facilities for instruction address tracing, and also access to certain debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).
//
62 63
Figure 51. Come-From Address Register The contents of the CFAR can be read and written using the mfspr and mtspr instructions. Acccess to the CFAR is privileged. Programming Note This register can be used for purposes of debugging software. For example, often a software bug results in the program executing a portion of the code that it should not have reached or causing an unexpected interrupt. In the former case, a breakpoint can be placed in the portion of the code that was erroneously reached and the program reexecuted. In either case, the interrupt handler can save the contents of the CFAR (before executing the first instruction that would modify the register), and then make the saved contents available for a debugger to use in determining the control flow path by which the exception was reached. In order to preserve the CFAR's contents for each partition and to prevent it from being used to implement a "covert channel" between partitions, the hypervisor should initialize/save/restore the CFAR when switching partitions on a given thread.
839
BT DW DR
61 62 63
privileged but non-hypervisor state and DABRXPNH = 1, or - problem state and DABRXPR = 1 the instruction is a Store and DABRDW = 1, or the instruction is a Load and DABRDR = 1. In 32-bit mode the high-order 32 bits of the EA are treated as zeros for the purpose of detecting a match. If the above conditions are satisfied, a match also occurs for eciwx and ecowx. For the purpose of determining whether a match occurs, eciwx is treated as a Load, and ecowx is treated as a Store. If the above conditions are satisfied, it is undefined whether a match occurs in the following cases. The instruction is Store Conditional but the store is not performed The instruction is dcbz. (For the purpose of determining whether a match occurs, dcbz is treated as a Store.) The Cache Management instructions other than dcbz never cause a match. A Data Address Breakpoint match causes a Data Storage exception or a Hypervisor Data Storage exception (see Section 6.5.3, Data Storage Interrupt on page 818 and Section 6.5.15, Hypervisor Data Storage Interrupt on page 825). If a match occurs, some or all of the bytes of the storage operand may have been accessed; however, if a Store or ecowx instruction causes the match, the storage operand is not modified if the instruction is one of the following: any Store instruction that causes an atomic access ecowx Programming Note
Bit(s) 0:60 61 62 63
Name DAB BT DW DR
Description Data Address Breakpoint Breakpoint Translation Data Write Data Read
BTI PRIVM
60 61 63
Bit(s) 60 61:63 61 62 63
Description Breakpoint Translation Ignore Privilege Mask Hypervisor state Privileged but Non-Hypervisor state Problem state
All other fields are reserved. Figure 53. Data Address Extension Breakpoint Register
The DABR and DABRX are hypervisor resources; see Section 2.7 on page 732. The supported PRIVM values are 0b000, 0b001, 0b010, 0b011, 0b100, and 0b111. If the PRIVM field does not contain one of the supported values, then whether a match occurs for a given storage access is undefined. Elsewhere in this section it is assumed that the PRIVM field contains one of the supported values. Programming Note PRIVM value 0b000 causes matches not to occur regardless of the contents of other DABR and DABRX fields. PRIVM values 0b101 and 0b110 are not supported because a storage location that is shared between the hypervisor and non-hypervisor software is unlikely to be accessed using the same EA by both the hypervisor and the non-hypervisor software. (PRIVM value 0b111 is supported primarily for reasons of software compatibility, as described in a subsequent Programming Note.) A Data Address Breakpoint match occurs for a Load or Store instruction if, for any byte accessed, all of the following conditions are satisfied. the access is - a quadword access and EA0:59 = DABR0:59, or - not a quadword access and EA0:60 = DABR0:60 (= DABRDAB) (MSRDR = DABRBT) | DABRXBTI the thread is in - hypervisor state and DABRXHYP = 1, or
The Data Address Breakpoint mechanism does not apply to instruction fetches. Programming Note Before setting a breakpoint requested by the operating system, the hypervisor must verify that the requested contents of the DABR and DABRX cannot cause the hypervisor to receive a Data Storage interrupt that it is not prepared to handle, or that it intrinsically cannot handle (e.g., the EA is in the range of EAs at which the hypervisor's Data Storage interrupt handler saves registers, DABRBT || DABRXBTI 0b10, DABRDW = 1, and DABRXHYP = 1).
840
Programming Note Implementations that comply with versions of the architecture that precede Version 2.02 do not provide the DABRX. Forward compatibility for software that was written for such implementations (and uses the Data Address Breakpoint facility) can be obtained by setting DABRX60:63 to 0b0111.
841
842
///
RID
58 63
Bit(s) 32 58:63
Name E RID
All other fields are reserved. Figure 54. External Access Register The External Access Register (EAR) is a hypervisor resource; see Chapter 2. The high-order bits of the RID field that correspond to bits of the Resource ID beyond the width of the Resource ID supported by the implementation are treated as reserved bits. Programming Note The hypervisor can use the EAR to control which programs are allowed to execute External Access instructions, when they are allowed to do so, and which devices they are allowed to communicate with using these instructions.
843
844
845
Instruction or Event interrupt rfid hrfid sc Trap mtmsrd (SF) mtmsr[d] (PR) mtmsr[d] (DR) mtsr[in] mtspr (SDR1) mtspr (AMR) mtspr (EAR) mtspr (RMOR) mtspr (HRMOR) mtspr (LPCR) mtspr (DABR) mtspr (DABRX) slbie slbia slbmte tlbie tlbiel tlbia Store(PTE)
Required Before none none none none none none none none CSI ptesync CSI CSI CSI CSI CSI --CSI CSI CSI CSI CSI CSI none
Required After none none none none none none none none CSI CSI CSI CSI CSI CSI CSI --CSI CSI CSI CSI ptesync CSI {ptesync, CSI}
Notes
3,4 15 13 13 13 2 2
11 5,7 5 5 6,7
Instruction or Event interrupt rfid hrfid sc Trap mtmsrd (SF) mtmsr[d] (EE) mtmsr[d] (PR) mtmsr[d] (FP) mtmsr[d](FE0,FE1) mtmsr[d] (SE, BE) mtmsr[d] (IR) mtmsr[d] (RI) mtsr[in] mtspr (DEC) mtspr (SDR1) mtspr (CTRL) mtspr (HDEC) mtspr (RMOR) mtspr (HRMOR) mtspr (LPCR) mtspr (LPIDR) mtspr (PCR) slbie slbia slbmte tlbie tlbiel tlbia Store(PTE)
Required Before none none none none none none none none none none none none none none none ptesync none none none none none CSI none none none none none none none none
Required After none none none none none none none none none none none none none CSI none CSI none none CSI CSI CSI CSI CSI CSI CSI CSI CSI CSI CSI {ptesync, CSI}
Notes
8 1 9
846
847
Programming Note If it is desired to set MSRIR to 1 early in an operating system interrupt handler, advantage can sometimes be taken of the fact that EA0:3 are ignored when forming the real address when address translation is disabled and MSRHV = 0. For example, if address translation resources are set such that effective address 0xn000_0000_0000_0000 maps to real address 0x000_0000_0000_0000 when address translation is enabled, where n is an arbitrary 4-bit value, the following code sequence, in real page 0, can be used early in the interrupt handler. la li sldi or mtctr bcctr rx,target ry,0xn000 ry,ry,48 rx,rx,ry # set high-order nibble of target addr to 0xn rx # branch to targ
targ: mfmsr rx ori rx,rx,0x0020 mtmsrd rx# set MSR[IR] to 1 The mtmsrd does not cause an implicit branch in real address space because the real address of the next sequential instruction is independent of MSRIR. Using mtmsrd, rather than rfid (or similar context synchronizing instruction that alters the control flow), may yield better performance on some implementations. (Variations on the technique are possible. For example, the target instruction of the bcctr can be in arbitrary real page P, where P is a 48-bit value, provided that effective address 0xn || P || 0x000 maps to real address P || 0x000 when address translation is enabled.)
10. The elapsed time between the contents of the Decrementer or Hypervisor Decrementer becoming negative and the signaling of the corresponding exception is not defined. 11. If an slbmte instruction alters the mapping, or associated attributes, of a currently mapped ESID, the slbmte must be preceded by an slbie (or slbia) instruction that invalidates the existing translation. This applies even if the corresponding entry is no longer in the SLB (the translation may still be in implementation-specific address translation lookaside information). No software synchronization is needed between the slbie and the slbmte, regardless of whether the index of the SLB entry (if any) containing the current translation is the same as the SLB index specified by the slbmte. No slbie (or slbia) is needed if the slbmte instruction replaces a valid SLB entry with a mapping of a
848
849
850
851
Table 3: Extended mnemonics for moving to/from an SPR Special Purpose Register Fixed-Point Exception Register Link Register Count Register Data Stream Control Register Data Storage Interrupt Status Register Data Address Register Decrementer Storage Description Register 1 Save/Restore Register 0 Save/Restore Register 1 Come-From Address Register AMR CTRL UAMOR Special Purpose Registers G0 through G3 Time Base [Lower] Time Base Upper Time Base Upper 40 Processor Version Register HMER HMEER AMOR MMCRA PMC1 PMC2 PMC3 PMC4 PMC5 PMC6 MMCR0 MMCR1 PPR PPR322 Processor Identification Register
1
Move To SPR Extended mtxer Rx mtlr Rx mtctr Rx mtdscr Rx mtdsisr Rx mtdar Rx mtdec Rx mtsdr1 Rx mtsrr0 Rx mtsrr1 Rx mtcfar Rx mtamr Rx mtctrl Rx mtuamor Rx mtsprg n,Rx mttbl Rx mttbu Rx mttbu40 Rx mthmer Rx mthmeer Rx mtamor Rx mtmmcra Rx mtpmc1 Rx mtpmc2 Rx mtpmc3 Rx mtpmc4 Rx mtpmc5 Rx mtpmc6 Rx mtmmcr0 Rx mtmmcr1 Rx mtppr Rx mtppr32 Rx Equivalent to mtspr 1,Rx mtspr 8,Rx mtspr 9,Rx mtspr 17,Rx mtspr 18,Rx mtspr 19,Rx mtspr 22,Rx mtspr 25,Rx mtspr 26,Rx mtspr 27,Rx mtspr 28,Rx mtspr 29,Rx mtspr 152,Rx mtspr 157,Rx mtspr 272+n,Rx mtspr 284,Rx mtspr 285,Rx mtspr 286,Rx mtspr 336,Rx mtspr 337,Rx mtspr 349,Rx mtspr 786,Rx mtspr 787,Rx mtspr 788,Rx mtspr 789,Rx mtspr 790,Rx mtspr 791,Rx mtspr 792,Rx mtspr 795,Rx mtspr 798,Rx mtspr 896, Rx mtspr 898, Rx -
Move From SPR1 Extended mfxer Rx mflr Rx mfctr Rx mfdscr Rx mfdsisr Rx mfdar Rx mfdec Rx mfsdr1 Rx mfsrr0 Rx mfsrr1 Rx mfcfar Rx mfamr Rx mfctrl Rx mfuamor Rx mfsprg Rx,n mftb Rx mftbu Rx mfpvr Rx mfhmer Rx mfhmeer Rx mfamor Rx mfmmcra Rx mfpmc1 Rx mfpmc2 Rx mfpmc3 Rx mfpmc4 Rx mfpmc5 Rx mfpmc6 Rx mfmmcr0 Rx mfmmcr1 Rx mfppr Rx mfppr32 Rx mfpir Rx Equivalent to mfspr Rx,1 mfspr Rx,8 mfspr Rx,9 mfspr Rx,17 mfspr Rx,18 mfspr Rx,19 mfspr Rx,22 mfspr Rx,25 mfspr Rx,26 mfspr Rx,27 mfspr Rx,28 mfspr Rx,29 mfspr Rx,136 mfspr Rx,157 mfspr Rx,272+n mftb Rx,2681 mfspr Rx,268 mftb Rx,2691 mfspr Rx,269 mfspr Rx,287 mfspr Rx,336 mfspr Rx,337 mfspr Rx,349 mfspr Rx,770 mfspr Rx,771 mfspr Rx,772 mfspr Rx,773 mfspr Rx,774 mfspr Rx,775 mfspr Rx,776 mfspr Rx,779 mfspr Rx,782 mfspr Rx, 896 mfspr Rx, 898 mfspr Rx,1023
The mftb instruction is Category: Phased-Out. Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 5.2.1 of Book II). 2 Category: Phased-In
852
SIAR and SDAR (Sampled Instruction Address Register and Sampled Data Address Register), which contain the address of the sampled instruction and of the sampled data
the Performance Monitor interrupt, which can be caused by monitored conditions and events The minimal subset of the features that makes the resulting Performance Monitor useful to software consists of MSRPMM, PMC1, PMC2, PMC3, PMC4, MMCR0, MMCR1, and MMCRA and certain bits and fields of these three Monitor Mode Control Registers, and the Performance Monitor Interrupt. These features are identified as the basic features below. The remaining features (the remaining SPRs, and the remaining bits and fields in the three Monitor Mode Control Registers) are considered extensions. The events that can be counted in the PMCs as well as the code that identifies each event are implementationdependent. The events and codes may vary between PMCs, as well as between implementations. For the programmable PMCs, the event to be counted is selected by specifying the appropriate code in the MMCR Selector field for the PMC. Some events may include operations that are performed out-of-order. Many aspects of the operation of the Performance Monitor are summarized by the following hierarchy, which is described starting at the lowest level. A counter negative condition exists when the value in a PMC is negative (i.e., when bit 0 of the PMC is 1). A Time Base transition event occurs when a selected bit of the Time Base changes from 0 to 1 (the bit is selected by an MMCR field). The term condition or event is used as an abbreviation for counter negative condition or Time Base transition event. A condition or event can be caused implicitly by the hardware (e.g., incrementing a PMC) or explicitly by software (mtspr). A condition or event is enabled if the corresponding Enable bit in an MMCR is 1. The occurrence of an enabled condition or event can have side effects within the Performance Monitor, such as causing the PMCs to cease counting. An enabled condition or event causes a Performance Monitor alert if Performance Monitor alerts are enabled by the corresponding Enable bit in
PMM (Performance Monitor Mark), which can be used to select one or more programs for monitoring
SPRs
PMC1 - PMC6 (Performance Monitor Counter registers 1 - 6), which count events MMCR0, MMCR1, and MMCRA (Monitor Mode Control Registers 0, 1, and A), which control the Performance Monitor facility
853
Programming Note Software can use this bit as a process-specific marker which, in conjunction with MMCR0FCM0 FCM1 (see Section B.2.2), permits events to be counted on a process-specific basis. (The bit is saved by interrupts and restored by rfid.) Common uses of the PMM bit include the following. Count events for a few selected processes. This use requires the following bit settings. - MSRPMM=1 for the selected processes, MSRPMM=0 for all other processes - MMCR0FCM0=1 - MMCR0FCM1=0 Count events for all but a few selected processes. This use requires the following bit settings. - MSRPMM=1 for the selected processes, MSRPMM=0 for all other processes - MMCR0FCM0=0 - MMCR0FCM1=1 Notice that for both of these uses a mark value of 1 identifies the few processes and a mark value of 0 identifies the remaining many processes. Because the PMM bit is set to 0 when an interrupt occurs (see Figure 44 on page 814), interrupt handlers are treated as one of the many. If it is desired to treat interrupt handlers as one of the few, the mark value convention just described would be reversed.
854
SPR1 decimal spr5:9 spr0:4 786 11000 10010 787 11000 10011 788 11000 10100 789 11000 10101 790 11000 10110 791 11000 10111 792 11000 11000 795 11000 11011 796 11000 11100 797 11000 11101 798 11000 11110
1
Register Name MMCRA PMC1 PMC2 PMC3 PMC4 PMC5 PMC6 MMCR0 SIAR SDAR MMCR1
Privileged yes yes yes yes yes yes yes yes yes yes yes
Note that the order of the two 5-bit halves of the SPR number is reversed.
decimal 770,786 771,787 772,788 773,789 774,790 775,791 776,792 779,795 780,796 781,797 782,798
1
SPR1,2 spr5:9 spr0:4 11000 n0010 11000 n0011 11000 n0100 11000 n0101 11000 n0110 11000 n0111 11000 n1000 11000 n1011 11000 n1100 11000 n1101 11000 n1110
Register Name MMCRA PMC1 PMC2 PMC3 PMC4 PMC5 PMC6 MMCR0 SIAR SDAR MMCR1
Privileged no,yes no,yes no,yes no,yes no,yes no,yes no,yes no,yes no,yes no,yes no,yes
Figure 57. Performance Monitor Counter registers PMC1, PMC2, PMC3, and PMC4 are basic features. PMC5 and PMC6 are not programmable. PMC5 counts instructions completed and PMC6 counts cycles. Normally each PMC is incremented each hardware cycle by the number of times the corresponding event occurred in that cycle. Other modes of incrementing may also be provided (e.g., see the description of MMCR1 bits PMC1HIST and PMCjHIST). PMCj is used as an abbreviation for PMCi, i > 1. Programming Note PMC5 and PMC6 are defined to facilitate calculating basic performance metrics such as cycles per instruction (CPI).
Note that the order of the two 5-bit halves of the SPR number is reversed. 2 Reading the SPR is privileged if and only if n=1. Figure 55. Performance Monitor SPR encodings for mfspr
855
Programming Note Software can use a PMC to pace the collection of Performance Monitor data. For example, if it is desired to collect event counts every n cycles, software can specify that a particular PMC count cycles and set that PMC to 0x8000_0000 - n. The events of interest would be counted in other PMCs. The counter negative condition that will occur after n cycles can, with the appropriate setting of MMCR bits, cause counter values to become frozen, cause a Performance Monitor interrupt to occur, etc.
Freeze Counters while Mark = 1 (FCM1) This bit is a basic feature. 0 1 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are not incremented if MSRPMM=1.
36
Freeze Counters while Mark = 0 (FCM0) This bit is a basic feature. 0 1 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are not incremented if MSRPMM=0.
37
Performance Monitor Alert Enable (PMAE) This bit is a basic feature. 0 1 Performance Monitor alerts are disabled. Performance Monitor alerts are enabled until a Performance Monitor alert occurs, at which time: MMCR0PMAE is set to 0 MMCR0PMAO is set to 1 Programming Note Software can set this bit and MMCR0PMAO to 0 to prevent Performance Monitor interrupts. Software can set this bit to 1 and then poll the bit to determine whether an enabled condition or event has occurred. This is especially useful for software that runs with MSREE=0. In earlier versions of the architecture that lacked the concept of Performance Monitor alerts, this bit was called Performance Monitor Exception Enable (PMXE).
Figure 58. Monitor Mode Control Register 0 MMCR0 is a basic feature. Within MMCR0, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below. Some bits of MMCR0 are altered by the hardware when various events occur, as described below. The bit definitions of MMCR0 are as follows. MMCR0 bits that are not implemented are treated as reserved. Bit(s) 0:31 32 Description Reserved Freeze Counters (FC) This bit is a basic feature. 0 The PMCs are incremented (if permitted by other MMCR bits). 1 The PMCs are not incremented. The hardware sets this bit to 1 when an enabled condition or event occurs and MMCR0FCECE=1. 33 Freeze Counters in Privileged State (FCS) This bit is a basic feature. 0 1 34 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are not incremented if MSRHV PR=0b00. 38
Freeze Counters on Enabled Condition or Event (FCECE) 0 1 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are incremented (if permitted by other MMCR bits) until an enabled condition or event occurs when MMCR0TRIGGER=0, at which time: MMCR0FC is set to 1
Freeze Counters in Problem State (FCP) This bit is a basic feature. 39:40
If the enabled condition or event occurs when MMCR0TRIGGER=1, the FCECE bit is treated as if it were 0. Time Base Selector (TBSEL)
856
See the description of the FCECE bit, above, regarding the interaction between TRIGGER and FCECE. Programming Note Uses of TRIGGER include the following. Resume counting in the PMCjs when PMC1 becomes negative, without causing a Performance Monitor interrupt. Then freeze all PMCs (and optionally cause a Performance Monitor interrupt) when a PMCj becomes negative. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time a PMCj becomes negative. This use requires the following MMCR0 bit settings.
TRIGGER=1 PMC1CE=0 PMCjCE=1 TBEE=0 FCECE=1 PMAE=1 (if a Performance Monitor interrupt is desired)
Reserved PMC1 Condition Enable (PMC1CE) This bit controls whether counter negative conditions due to a negative value in PMC1 are enabled. 0 1 Counter negative conditions for PMC1 are disabled. Counter negative conditions for PMC1 are enabled.
Resume counting in the PMCjs when PMC1 becomes negative, and cause a Performance Monitor interrupt without freezing any PMCs. The PMCjs then reflect the events that occurred between the time PMC1 became negative and the time the interrupt handler reads them. This use requires the following MMCR0 bit settings.
49
PMCj Condition Enable (PMCjCE) This bit controls whether counter negative conditions due to a negative value in any PMCj (i.e., in any PMC except PMC1) are enabled. 0 1 Counter negative conditions for all PMCjs are disabled. Counter negative conditions for all PMCjs are enabled. 51:52 53:55 56
50
Trigger (TRIGGER) 0 The PMCs are incremented (if permitted by other MMCR bits).
857
This bit is set to 1 by the hardware when a Performance Monitor alert occurs. This bit can be set to 0 only by the mtspr instruction. Programming Note Software can set this bit to 1 to simulate the occurrence of a Performance Monitor alert. Software should set this bit to 0 after handling the Performance Monitor alert. 57 58 Setting is implementation-dependent. Freeze Counters 1-4 (FC1-4) 0 1 59 PMC1 - PMC4 are incremented (if permitted by other MMCR bits). PMC1 - PMC4 are not incremented.
Figure 59. Monitor Mode Control Register 1 MMCR1 is a basic feature. Within MMCR1, some of the bits and fields are basic features and some are extensions. The basic bits and fields are identified as such, below. Some bits of MMCR1 are altered by the hardware when various events occur, as described below. The bit definitions of MMCR1 are as follows. MMCR1 bits that are not implemented are treated as reserved. Bit(s) 0:31 Description Implementation-Dependent Use These bits have implementation-dependent uses (e.g., extended event selection). 32:39 40:47 48:55 56:63 PMC1 Selector (PMC3SEL) PMC2 Selector (PMC4SEL) PMC3 Selector (PMC5SEL) PMC4 Selector (PMC6SEL) Each of these fields contains a code that identifies the event to be counted by PMCs 1 through 4 respectively. PMC Selectors are basic features. Compatibility Note In versions of the architecture that precede Version 2.02 the PMC Selector Fields were six bits long, and were split between MMCR0 and MMCR1. PMC1-8 were all programmable. If more programmable PMCs are implemented in the future, additional MMCRs may be defined to cover the additional selectors.
Freeze Counters 5-6 (FC5-6) 0 1 PMC5 - PMC6 are incremented (if permitted by other MMCR bits). PMC5 - PMC6 are not incremented.
60:61 62
Reserved Freeze Counters in Wait State (FCWAIT) This bit is a basic feature. 0 1 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are not incremented if CTRL31=0. Software is expected to set CTRL31=0 when it is in a wait state, i.e, when there is no process ready to run.
Only Branch Unit type of events do not increment if CTRL31=0. Other units continue to count. 63 Freeze Counters in Hypervisor State (FCH) This bit is a basic feature. 0 1 The PMCs are incremented (if permitted by other MMCR bits). The PMCs are not incremented if MSRHV PR=0b10.
Figure 60. Monitor Mode Control Register A MMCRA is a basic feature. Within MMCRA, some of the bits and fields are basic features and some are
858
Figure 61. Sampled Instruction Address Register When a Performance Monitor alert occurs, SIAR is set to the effective address of an instruction that was being executed, possibly out-of-order, at or around the time that the Performance Monitor alert occurred. This instruction is called the sampled instruction. The contents of SIAR may be altered by the hardware if and only if MMCR0PMAE=1. Thus after the Performance Monitor alert occurs, the contents of SIAR are not altered by the hardware until software sets MMCR0PMAE to 1. After software sets MMCR0PMAE to 1, the contents of SIAR are undefined until the next Performance Monitor alert occurs. See Section B.4 regarding the effects of the Trace facility on SIAR. Programming Note If the Performance Monitor alert causes a Performance Monitor interrupt, the value of MSRHV PR that was in effect when the sampled instruction was being executed is reported in MMCRA.
Figure 62. Sampled Data Address Register When a Performance Monitor alert occurs, SDAR is set to the effective address of the storage operand of an instruction that was being executed, possibly out-oforder, at or around the time that the Performance Monitor alert occurred. This storage operand is called the sampled data. The sampled data may be, but need not be, the storage operand (if any) of the sampled instruction (see Section B.2.5). The contents of SDAR may be altered by the hardware if and only if MMCR0PMAE=1. Thus after the Performance Monitor alert occurs, the contents of SDAR are not altered by the hardware until software sets
859
SRR1 33:36 and 42:47 Implementation-specific. Others Loaded from the MSR. MSR SIAR SDAR See Figure 44 on page 814. Set to the effective address of the sampled instruction (see Section B.2.5). Set to the effective address of the sampled data (see Section B.2.6). effective address
In general, statements about External and Decrementer interrupts elsewhere in this Book apply also to the Performance Monitor interrupt; for example, if a Performance Monitor exception exists when an mtmsrd[d] instruction is executed that changes MSREE from 0 to 1, the Performance Monitor interrupt will occur
860
35
36
42 43 44
45 46
47
MSRSE BE = 0b11
If MSRSE BE=0b11, the hardware generates a Trace exception under the conditions described in Section 6.5.14 for MSRSE BE=0b01, and also after successfully completing the execution of any instruction that would cause at least one of SRR1 bits 33:36, 42, and 44:46 to be set to 1 (see below) if the instruction were executed when MSRSE BE=0b10. This overrides the implicit statement in Section 6.5.14 that the effects of MSRSE BE=0b11 are the same as those of MSRSE BE=0b10.
SRR1
When a Trace interrupt occurs, the SRR1 bits that are not loaded from the MSR are set as follows instead of as described in Section 6.5.14. 33 Set to 1 if the traced instruction is icbi; otherwise set to 0.
If the state of the Performance Monitor is such that the Performance Monitor may be altering these registers (i.e., if MMCR0PMAE=1), the contents of SIAR and SDAR as used by the Trace facility are undefined and may change even when no Trace interrupt occurs.
861
862
863
If DSISR 47:53 is: 10 0 0011 10 0 0100 10 0 0101 10 0 0110 10 0 0111 10 0 1000 10 0 1001 10 0 1010 10 0 1011 10 0 1100 10 0 1101 10 0 1110 10 0 1111 10 1 0000 10 1 0001 10 1 0010 10 1 0011 10 1 0100 10 1 0101 10 1 0110 10 1 0111 10 1 1000 10 1 1001 10 1 1010 10 1 1011 10 1 1100 10 1 1101 10 1 1110 10 1 1111 11 0 0000 11 0 0001 11 0 0010 11 0 0011 11 0 0100 11 0 0101 11 0 0110 11 0 0111 11 0 1000 11 0 1001 11 0 1010 11 0 1011 11 0 1100 11 0 1101 11 1 1101 11 0 1110 11 0 1111 11 1 0000 11 1 0001 11 1 0010 11 1 0011 11 1 0100 11 1 0101 11 1 0110 11 1 0111 11 1 1000 11 1 1001 11 1 1010 11 1 1011 11 1 1100 11 1 1101 11 1 1110 11 1 1111
so the instruction is: stdcx. lwbrx stwbrx sthcx. lhbrx sthbrx eciwx ecowx stbcx. dcbz lwzx stwx lhzx lhax sthx lfsx lfdx stfsx stfdx lfdpx lfiwax lfiwzx stfdpx stfiwx lwzux stwux lhzux lhaux sthux lfsux lfdux stfsux stfdux -
864
865
866
Book III-E: Power ISA Operating Environment Architecture - Embedded Environment [Category: Embedded]
867
868
Chapter 1. Introduction
1.1 Overview. . . . . . . . . . . . . . . . . . . . 1.2 32-Bit Implementations . . . . . . . . . 1.3 Document Conventions . . . . . . . . 1.3.1 Definitions and Notation . . . . . . 1.3.2 Reserved Fields. . . . . . . . . . . . . 1.4 General Systems Overview . . . . . 869 869 869 869 871 871 1.5 Exceptions. . . . . . . . . . . . . . . . . . . 1.6 Synchronization . . . . . . . . . . . . . . 1.6.1 Context Synchronization . . . . . . 1.6.2 Execution Synchronization . . . . . 871 872 872 872
1.1 Overview
Chapter 1 of Book I describes computation modes, document conventions, a general systems overview, instruction formats, and storage addressing. This chapter augments that description as necessary for the Power ISA Operating Environment Architecture.
interrupt or Unimplemented Operation exception type Program interrupt, as appropriate. For system instruction storage error handler substitute Instruction Storage interrupt or Instruction TLB Error, as appropriate. For system privileged instruction error handler substitute Privileged Instruction exception type Program interrupt. For system service program substitute System Call interrupt. For system trap handler substitute Trap type Program interrupt.
Chapter 1. Introduction
869
interrupt The act of changing the machine state in response to an exception, as described in Chapter 7. Interrupts and Exceptions on page 991. trap interrupt An interrupt that results from execution of a Trap instruction. Additional exceptions to the sequential execution model, beyond those described in the section entitled Instruction Fetching in Book I, are the following.
A reset or Machine Check interrupt may occur. The determination of whether an instruction is required by the sequential execution model is not affected by the potential occurrence of a reset or Machine Check interrupt. (The determination is affected by the potential occurrence of any other kind of interrupt.) A context-altering instruction is executed (Chapter 12. Synchronization Requirements for Context Alterations on page 1079). The
870
1.5 Exceptions
The following augments the exceptions defined in Book I that can be caused directly by the execution of an instruction: the execution of a floating-point instruction when MSRFP=0 (Floating-Point Unavailable interrupt) execution of an instruction that causes a debug event (Debug interrupt). the execution of an auxiliary processor instruction when the auxiliary processor is unavailable (Auxiliary Processor Unavailable interrupt) the execution of a Vector, SPE, or Embedded Floating-Point instruction when MSRSPV=0 (SPE/ Embedded Floating-Point/Vector Unavailable interrupt)
Chapter 1. Introduction
871
1.6 Synchronization
The synchronization described in this section refers to the state of the thread that is performing the synchronization.
Programming Note The context established by a context synchronizing instruction includes modifications to certain resources that were performed with respect to the context synchronizing thread before the operation was initiated. The resources in this case include shared SPRs that contain program context such as LPIDR as well as TLBs shared by other threads. Programming Note A context synchronizing operation is necessarily execution synchronizing; see Section 1.6.2. Unlike the Synchronize instruction, a context synchronizing operation does not affect the order in which storage accesses are performed. Item 2 permits a choice only for isync (and sync; see Section 1.6.2) because all other execution synchronizing operations also alter context.
872
2.1 Overview
The Embedded.Hypervisor category permits threads and portions of real storage to be assigned to logical collections called partitions, such that a program executing in one partition cannot interfere with any program executing in a different partition. This isolation can be provided for both problem state and privileged state programs, by using a layer of trusted software, called a hypervisor program (or simply a hypervisor), and the resources provided by this facility to manage system resources. The collection of software that runs in a given partition and its associated resources is called a guest. The guest normally includes an operating system (or other system software) running in privileged state and its associated processes running in the problem state under the management of the hypervisor. The thread is in the guest state when a guest is executing and is in the hypervisor state when the hypervisor is executing. The thread is executing in the guest state when MSRGS=1. The number of partitions supported is implementationdependent. A thread is assigned to one partition at any given time. A thread can be assigned to any given partition without consideration of the physical configuration of the system (e.g. shared registers, caches, organization of the storage hierarchy), except that threads that share certain hypervisor resources may need to be assigned to the same partition. Additionally, certain resources may be utilized by the guest at the discretion of the hypervisor. Such usage may cause interference between partitions and the hypervisor should allocate those resources accordingly. The primary registers and facilities used to control Logical Partitioning are listed below and described in the following subsections. Other facili-
ties associated with Logical Partitioning are described within the appropriate sections within this Book. An instruction that is hypervisor privileged must be execute in the hypervisor state (MSRGS PR = 0b00). If an attempt is made to execute a hypervisor-privileged instruction in the guest supervisor state (MSRGS PR = 0b10), an Embedded Hypervisor Privilege exception occurs. A register that is hypervisor privileged may only be accessed in the hypervisor state (MSRGS PR = 0b00). If a hypervisor-privileged register is accessed in the guest supervisor state (MSRGS PR = 0b10), an Embedded Hypervisor Privilege exception occurs. When MSRGS PR = 0b01 or MSRGS PR = 0b11, the thread is in problem (user) state. The resources (instructions and registers) that are available are generally the same when MSRPR = 0b1 regardless of the state of MSRGS, however when MSRGS PR = 0b11 some interrupts are directed to the guest supervisor state. When MSRGS PR = 0b01 interrupts are always directed to the hypervisor (see Section 2.3.1, Directed Interrupts). Category Embedded.Hypervisor changes the operating system programming model to allow for easier virtualization, while retaining a default backward compatible mode in which an operating system written for hardware not implementing this category will still operate as before without using the Logical Partitioning facilities. Category Embedded.Hypervisor requires that Category: Embedded.Processor Control is also supported.
2.2 Registers
Registers specific to Logical Partitioning and hypervisor control are defined in this section. Other registers
873
Figure 1.
The LPIDR is part of the virtual address. During address translation, its content is compared to the TLPID field in the TLB entry to determine a matching TLB entry. The LPIDR is hypervisor privileged. The 12 least significant bits of LPIDR contain the LPID value. All 12 bits do not need to be implemented. Unimplemented bits should read as zero. The number of implemented bits is reported in MMUCFGLPIDSIZE
Type of Access SPR Mapped to GSRR0 GSRR1 GEPR GESR GDEAR GPIR GSPRG0 GSPRG1 GSPRG2 GSPRG3 mtspr, mfspr mtspr, mfspr mfspr mtspr, mfspr mtspr, mfspr mfspr mtspr, mfspr mtspr, mfspr mtspr, mfspr mtspr, mfspr
If an implementation permits problem state read access to SPRG3, the problem state read access is remapped to GSPRG3.
874
875
876
3.1 Overview
The Thread Control facility permits the hypervisor to control and monitor the execution, priority, and other aspects of threads.
63
Figure 3.
The TEN is a 64-bit register. For t < T, where T is the number of threads supported by the implementation, bit 63-t corresponds to thread t. When TEN63-t is 0, thread t is disabled. When TEN63-t is 1, thread t is enabled. Software is permitted to write any value to bits 0:63-T; a subsequent reading of these bits always returns 0. The TEN can be accessed using two SPR numbers.
Figure 2.
The TIR is a 64-bit read-only register that can be used to distinguish the thread from other threads on a multithreaded processor. Threads are numbered sequentially, with valid values ranging from 0 to t-1, where t is the number of threads implemented. A thread for which TIR = n is referred to as thread n. The TIR is hypervisor privileged.
When SPR 438 (Thread Enable Set, or TENS) is written, threads for which the corresponding bit in TENS is 1 are enabled; threads for which the corresponding bit in TENS is 0 are unaffected. When SPR 439 (Thread Enable Clear, or TENC) is written, threads for which the corresponding bit in TENC is 1 are disabled; threads for which the corresponding bit in TENC is 0 are unaffected.
When each SPR is read, the current value of the TEN is returned. The TEN is hypervisor privileged.
877
Architecture Note 32-bit implementations are limited to 32 threads. In 64-bit mode, the high order 32 bits of TEN are not modified when written. See Section 1.5.2, Modes [Category: Embedded], in Book I.
Figure 4.
The TENSR is a 64-bit read-only register. Bit 63-t of the TENSR corresponds to thread t. The contents of the TENSR are equal to the contents of the TEN, except that when TEN63-t changes from 1 to 0, TENSR63-t does not change from 1 to 0 until thread t is disabled. The TENSR is hypervisor privileged. Architecture Note 32-bit implementations are limited to 32 threads.
878
0
62 63
Figure 5.
Initialize Register
Next
Instruction
Address
Bit 63 is always 0. Bit 62 is part of the Instruction Address if Category: VLE is supported; otherwise bit 62 is always 0. When the INIA is written in 32-bit mode, bits 0:31 are set to 0s. The initial value of x0xFFFF_FFFF_FFFF_FFFC. all INIAs is
879
TMR,RS RS
11
tmr
21
494
/
31
n tmr5:9 || tmr0:4 TMR(n) (RS) The TMR field denotes a Thread Management Register, encoded as shown in the table below. The contents of register RS are placed into the designated Thread Management Register. TMR1 Register Name tmr5:9 tmr0:4 320 01010 00000 INIA0 ... ................... ... 383 10111 11111 INIA63 1 Note that the order of the two 5-bit halves of the SPR number is reversed. decimal Figure 6. Thread Management Register Numbers
All values of the TMR field not shown in Figure 6 are implementation-specific. An implementation only provides INIA registers corresponding to its implemented threads. Execution of this instruction specifying a TMR number that is not defined for the implementation causes an Illegal Instruction type Program interrupt if MSRGS PR=0b00. This instruction is hypervisor privileged. Special Registers Altered: See above
880
0 1
The thread is in hypervisor state if MSRPR = 0. The thread is in guest state. MSRGS cannot be changed unless thread is in the hypervisor state. Virtualized Implementation Note In a virtualized implementation, MSRGS will be 1.
36 37
Implementation-dependent User Cache Locking Enable (UCLE) [Category: Embedded Cache Locking.User Mode] 0 1 Cache Locking instructions are privileged. Cache Locking instructions can be executed in user mode (MSRPR=1).
If Category: Embedded Cache Locking.User Mode is not supported, this bit is treated as reserved. 38 SP/Embedded Floating-Point/Vector Available (SPV) [Category: Signal Processing]: 0 The thread cannot execute any SP instructions except for the brinc instruction. 1 The thread can execute all SP instructions. [Category: Vector]: 0 The thread cannot execute any Vector instruction. 1 The thread can execute Vector instructions. 39:45 Reserved
Figure 7.
Below are shown the bit definitions for the Machine State Register. Bit 32 Description Computation Mode (CM) 0 1 33 34 35 The thread runs in 32-bit mode. The thread runs in 64-bit mode.
881
51
Machine Check Enable (ME) 0 1 Machine Check interrupts are disabled. Machine Check interrupts are enabled. [Category: Embedded.Hypervisor] Machine Check interrupts with the exception of Guest Processor Doorbell Machine Check are enabled regardless of the state of MSRME when MSRGS = 1.
52
Floating-Point Exception Mode 0 (FE0) [Category: Floating-Point] (See below) Implementation-dependent Debug Interrupt Enable (DE) 0 1 Debug interrupts are disabled Debug interrupts are enabled DBCR0IDM=1 Virtualized Implementation Note In a virtualized implementation, when MSRDE=1, the registers SPRG9, DSRR0, and DSRR1 are volatile. if
55
56 57 58
Reserved Reserved Instruction Address Space (IS) 0 The thread directs all instruction fetches to address space 0 (TS=0 in the relevant TLB entry). The thread directs all instruction fetches to address space 1 (TS=1 in the relevant TLB entry).
Data Address Space (DS) 0 The thread directs all data storage accesses to address space 0 (TS=0 in the relevant TLB entry). The thread directs all data storage accesses to address space 1 (TS=1 in the relevant TLB entry).
60 61
Implementation-dependent Performance Monitor Mark (PMM) [Category: Embedded.Performance Monitor] 0 1 Disable statistics gathering on marked processes. Enable statistics gathering on marked processes
882
The Floating-Point Exception Mode bits FE0 and FE1 are interpreted as shown below. For further details see Book I.
See Section 8.3 for the initial state of the MSR. [Category:Embedded.Hypervisor] Some bits in the MSR can only be changed when the thread is in hypervisor state or the MSRP register has been configured to allow changes in the guest supervisor state. See Section 4.2.2, Machine State Register Protect Register (MSRP). Programming Note A Machine State Register bit that is reserved may be altered by rfi/rfci/rfmci/rfdi [Category:Embedded.Enhanced Debug]/rfgi [Category:Embedded.Hypervisor].
38:53 54
Reserved Debug Enable Protect (DEP) 0 1 MSRDE can be modified in guest supervisor state. MSRDE cannot be modified in guest supervisor state. Virtualized Implementation Note It is the responsibility of the hypervisor to ensure that DBCR0EDM is consistent with usage of DEP.
55:60 61
Reserved Performance Monitor Mark Protect (PMMP) [Category: E.PM] 0 1 MSRPMM can be modified in guest supervisor state. MSRPMM cannot be modified in guest supervisor state and guest state accesses to Performance Monitor Registers using mfpmr and mtpmr are affected as described later in this section.
62:63
Reserved
Figure 8.
The MSRP is hypervisor privileged. A context synchronizing operation must be performed following a change to MSRP to ensure that its changes are visible in the current context. The behavior of cache locking instructions (dcbtls, dcbtstls, dcblc, icbtls, icblc) in guest privileged state is dependent on the setting of MSRPUCLEP. When MSRPUCLEP = 0, cache locking instructions are permitted to execute normally in the guest privileged state. When MSRPUCLEP = 1, cache locking instructions are not permitted to execute in the guest privileged state and cause an Embedded Hypervisor Privilege exception. [Category: ECL] The behavior of Performance Monitor instructions (mtpmr, mfpmr) is dependent on the setting of
The MSRP is used to prevent guest supervisor state program from modifying the UCLE, DE, or PMM bits in the MSR. The MSRP bits UCLEP, DEP, and PMMP control whether the guest can change the corresponding MSR bits UCLE, DE, and PMM, respectively. When the MSRP bit associated with a corresponding MSR bit is 0, any operation in guest privileged state is allowed to modify that MSR bit, whether from an instruction that modifies the MSR, or from an interrupt which is taken in the guest supervisor state. When the MSRP bit associated with a corresponding MSR bit is 1 no operation in guest privileged state is allowed to modify that MSR bit (i.e., it remains unchanged), whether from an instruction that modifies the MSR, or from an interrupt from
883
34
Instruction TLB Error Interrupt Directed to Guest State (ITLBGS) [Category: Embedded.Hypervisor] Controls whether an Instruction TLB Error Interrupt that occurs in the guest state is taken in the guest supervisor state or the hypervisor state. 0 Instruction TLB Error Interrupts that occur in the guest state are directed to the hypervisor state. Instruction TLB Error Interrupts that occur in the guest state are directed to the guest supervisor state.
35
Data Storage Interrupt Directed to Guest State (DSIGS) [Category: Embedded.Hypervisor] Controls whether a Data Storage Interrupt that occurs in the guest state is taken in the guest supervisor state or the hypervisor state, except for an interrupt caused by a TLB Ineligible exception <E.PT>. 0 Data Storage Interrupts that occur in the guest state are directed to the hypervisor state. Data Storage Interrupts that occur in the guest state are directed to the guest supervisor state except that a Data Storage Interrupt due to a TLB Ineligible exception <E.PT> is directed to the hypervisor state, regardless of the existence of other exceptions that cause a Data Storage interrupt.
Figure 9.
These bits are interpreted as follows: Bit 32 Definition External Input Interrupt Directed to Guest State (EXTGS) [Category: Embedded.Hypervisor] Controls whether an External Input Interrupt is taken in the guest supervisor state or the hypervisor state. 0 External Input Interrupts are directed to the hypervisor state. External Input Interrupts pend until MSRGS=1 or MSREE=1. External Inputs interrupts are directed to the guest supervisor state. External Input interrupts pend until MSRGS=1 and MSREE=1. 36
Instruction Storage Interrupt Directed to Guest State (ISIGS) [Category: Embedded.Hypervisor] Controls whether an Instruction Storage Interrupt that occurs in the guest state is taken in the guest supervisor state or the hypervisor state, except for an interrupt caused by a TLB Ineligible exception <E.PT>. 0 Instruction Storage Interrupts that occur in the guest supervisor state are directed to the hypervisor state. Instruction Storage Interrupts that occur in the guest state are directed to the guest supervisor state.
33
Data TLB Error Interrupt Directed to Guest State (DTLBGS) [Category: Embedded.Hypervisor] Controls whether a Data TLB Error Interrupt that occurs in the guest state is taken in the guest supervisor state or the hypervisor state. 0 Data TLB Error Interrupts that occur in the guest state are directed to the hypervisor state.
37
Disable Embedded Hypervisor Debug (DUVD) [Category: Embedded.Hypervisor] Controls whether Debug Events occur in the hypervisor state. 0 Debug events can occur in the hypervisor state.
884
38
Interrupt Computation Mode (ICM) [Category: 64-bit] If category E.HV is implemented, this bit controls the computational mode of the thread when an interrupt occurs that is directed to the hypervisor state. At interrupt time, EPCRICM is copied into MSRCM if the interrupt is directed to the hypervisor state. If category E.HV is not implemented, then this bit controls the computational mode of the thread when any interrupt occurs. At interrupt time, EPCRICM is copied into MSRCM. 0 1 Interrupts will execute in 32-bit mode. Interrupts will execute in 64-bit mode.
Reserved
This register is hypervisor privileged. Engineering Note EPCR only needs to be implemented if Category: Embedded.Hypervisor or category 64-bit are implemented.
39
Guest Interrupt Computation Mode (GICM) [Category: Embedded.Hypervisor] [Corequisite Category: 64-bit] Controls the computational mode of the thread when an interrupt occurs that is directed to the guest supervisor state. At interrupt time, EPCRGICM is copied into MSRCM if the interrupt is directed to the guest supervisor state 0 1 Interrupts will execute in 32-bit mode. Interrupts will execute in 64-bit mode.
40
Disable Guest TLB Management Instructions (DGTMI) [Category: Embedded.Hypervisor] Controls whether guest supervisor state can execute any TLB management instructions. 0 tlbsrx. and tlbwe (for a Logical to Real Address translation hit) are allowed to execute normally when MSRGS,PR = 0b10. tlbsrx. and tlbwe always cause an Embedded Hypervisor Privilege Interrupt when MSRGS,PR = 0b10.
41
Disable MAS Interrupt Updates for Hypervisor (DMIUH) [Category: Embedded.Hypervisor] Controls whether MAS registers are updated by hardware when a Data or Instruction TLB Error Interrupt or a Data or Instruction Storage Interrupt is taken in the hypervisor. 0 MAS registers are set as described in Table 11 on page 963 when a Data or Instruction TLB Error Interrupt or a Data or Instruction Storage Interrupt is taken in the hypervisor.
885
and by which the system can return from performing a service or from processing an interrupt. The System Call instruction is described in Book I, but only at the level required by an application programmer. A complete description of this instruction appears below.
System Call
sc 17
0 6
SC-form
is generated. The interrupt causes the MSR to be set as described in Section 7.6.10 and Section 7.6.27. If LEV=0 and the thread is in guest state, the interrupt causes the next instruction to be fetched from the effective address based on one of the following. GIVPR0:47||GIVOR848:59||0b0000 if IVORs [Category: Embedded.Phased-Out] are supported. GIVPR0:51||0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported. If LEV=0 and the thread is in hypervisor state, the interrupt causes the next instruction to be fetched from the effective address based on one of the following. IVPR0:47||IVOR848:59||0b0000 if IVORs [Category: Embedded.Phased-Out] are supported. IVPR0:51||0x120 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported. If LEV=1, the interrupt causes the next instruction to be fetched from the effective address based on one of the following. IVPR0:47||IVOR4048:59||0b0000 if IVORs [Category: Embedded.Phased-Out] are supported. IVPR0:51||0x300 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported. This instruction is context synchronizing. Special Registers Altered: SRR0 GSRR0 SRR1 GSRR1 MSR Programming Note sc serves as both a basic and an extended mnemonic. The Assembler will recognize an sc mnemonic with one operand as the basic form, and an sc mnemonic with no operand as the extended form. In the extended form, the LEV operand is omitted and assumed to be 0.
///
11
///
16
///
20
///
27
// 1 /
30 31
sc LEV [Category:Embedded.Hypervisor] 17
0 6
///
11
///
16
///
20
LEV
27
// 1 /
30 31
if LEV = 0 then if MSRGS = 1 then GSRR0 iea CIA + 4 MSR GSRR1 if IVORs supported then GIVPR0:47 || GIVOR848:59 || 0b0000 NIA else GIVPR0:51||0x0120 NIA MSR new_value (see below) else SRR0 iea CIA + 4 MSR SRR1 if IVORs supported then NIA IVPR0:47 || IVOR848:59 || 0b0000 else NIA IVPR0:51||0x0120 new_value (see below) MSR else if LEV = 1 then SRR0 iea CIA + 4 MSR SRR1 if IVORs supported then IVPR0:47 || IVOR4048:59 || 0b0000 NIA else NIA IVPR0:51||0x300 new_value (see below) MSR If category E.HV is not implemented, the System Call instruction behaves as if MSRGS = 0 and LEV = 0. If MSRGS = 0 or if LEV = 1, the effective address of the instruction following the System Call instruction is placed into SRR0 and the contents of the MSR are copied into SRR1. Otherwise, the effective address of the instruction following the System Call instruction is placed into GSRR0 and the contents of the MSR are copied into GSRR1. If LEV=0, a System Call interrupt is generated. If LEV=1, an Embedded Hypervisor System Call interrupt
886
XL-form
XL-form
///
11
///
16
///
21
50
/
31 0
19
6
///
11
///
16
///
21
51
/
31
MSR NIA
iea
MSR NIA
iea
The rfi instruction is used to return from a base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context. The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address SRR00:61||0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the applicable save/ restore register 0 by the interrupt processing mechanism (see Section 7.6 on page 1009) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in SRR0 at the time of the execution of the rfi). This instruction is privileged and context synchronizing. [Category:Embedded.Hypervisor] When rfi is executed in guest state, the instruction is mapped to rfgi and rfgi is executed instead. Special Registers Altered: MSR
The rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR00:61||0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0 or CSRR0 by the interrupt processing mechanism (see Section 7.6 on page 1009) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in CSRR0 at the time of the execution of the rfci). This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: MSR
887
X-form
///
11
///
16
///
21
39
/
31 0
19
6
///
11
///
16
///
21
38
/
31
MSR NIA
iea
MSR NIA
iea
The rfdi instruction is used to return from a Debug interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR00:61||0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, or DSRR0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in DSRR0 at the time of the execution of the rfdi). This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: MSR
The rfmci instruction is used to return from a Machine Check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR00:61||0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into SRR0, CSRR0, MCSRR0, or DSRR0 [Category: Embedded.Enhanced Debug] by the interrupt processing mechanism (see Section 7.6 on page 1009) is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in MCSRR0 at the time of the execution of the rfmci). This instruction is hypervisor privileged and context synchronizing. Special Registers Altered: MSR
888
XL-form
[Category:Embedded.Hypervisor] ///
11
OC
[Category: Embedded.Hypervisor] OC
21
///
16
///
21
102
/
31
270
/
31
newmsr GSRR1 if MSRGS = 1 then MSRGS newmsrGS,WE prots MSRPUCLEP,DEP,PMMP newmsr prots & MSR | ~prots & newmsr newmsr MSR NIA iea GSRR00:61 || 0b00 The rfgi instruction is used to return from a guest state base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context. The contents of Guest Save/Restore Register 1 are placed into the MSR. If the rfgi is executed in the guest supervisor state (MSRGS PR = 0b10), the bit MSRGS is not modified and the bits MSRUCLE DE PMM are modified only if the associated bits in the Machine State Register Protect (MSRP) Register are set to 0. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address GSRR00:61||0b00. (Note: VLE behavior may be different; see Book VLE.) If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the associated save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in GSRR0 at the time of the execution of the rfgi). This instruction is privileged and context synchronizing. Special Registers Altered: MSR
The ehpriv instruction generates an Embedded Hypervisor Privilege Exception resulting in an Embedded Hypervisor Privilege Interrupt. The OC field may be used by hypervisor software to provide a facility for emulated virtual instructions. Special Registers Altered: None Programming Note The ehpriv instruction is analogous to a guaranteed illegal instruction encoding in that it guarantees that an Embedded Hypervisor Privilege exception is generated. The instruction is useful for programs that need to communicate information to the hypervisor software, particularly as a means for implementing breakpoint operations in a hypervisor managed debugger. Programming Note ehpriv serves as both a basic and an extended mnemonic. The Assembler will recognize an ehpriv mnemonic with one operand as the basic form, and an ehpriv mnemonic with no operand as the extended form. In the extended form, the OC operand is omitted and assumed to be 0.
889
890
instruction. Read access to the PVR is privileged; write access is not provided. Version
32 48
Revision
63
Figure 10. Processor Version Register The PVR distinguishes between implementations that differ in attributes that may affect software. It contains two fields. Version A 16-bit number that identifies the version of the implementation. Different version numbers indicate major differences between implementations, such as which optional facilities and instructions are supported. A 16-bit number that distinguishes between implementations of the version. Different revision numbers indicate minor differences between implementations having the same version number, such as clock rate and Engineering Change level.
Revision
Version numbers are assigned by the Power ISA Architecture process. Revision numbers are assigned by an implementation-defined process.
891
Programming Note mfspr RT,PIR should be used to read GPIR in guest supervisor state. See Section 2.2.1, Register Mapping. Engineering Note Some previous implementations did not allow write access to the PIR. This allows the hypervisor to easily track changes to the GPIR. If write access is allowed to the PIR, write access should be allowed to the GPIR and should only be available to the hypervisor.
Bits 32:63
Name PIR
Description Thread ID
Figure 11. Processor Identification Register The means by which the PIR is initialized are implementation-dependent. Programming Note The PIR can be used to identify the thread globally among all threads in a system that contains multiple threads. This facilitates more efficient usage of the Processor Control facility (see Section 11).
Bits 32:63
Name GPIR
Description Thread ID
Figure 12. Guest Processor Identification Register The means by which the GPIR is initialized are implementation-dependent.
892
GSPRG0 through GSPRG2 [Category:Embedded.Hypervisor] These 64-bit registers can be accessed only in supervisor mode. GSPRG3 [Category:Embedded.Hypervisor] This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. If an implementation permits problem state read access to SPRG3, the problem state read access is remapped to GSPRG3. SPRGi or GSPRGi can be read using mfspr and written using mtspr. Programming Note mfspr RT,SPRGi should be used to read GSPRGi in guest state. mtspr SPRGi,RS should be used to write GSPRGi in guest state. See Section 2.2.1, Register Mapping.
Figure 13. Special Purpose Registers Programming Note USPRG0 was made a 32-bit register and renamed to VRSAVE; see Sections 3.2.3 and 6.3.3 of Book I. SPRG0 through SPRG2 These 64-bit registers can be accessed only in supervisor mode. [Category:Embedded.Hypervisor] Access to these registers in guest supervisor state is mapped to GSPRG0 through GSPRG2. SPRG3 This 64-bit register can be read in supervisor mode and can be written only in supervisor mode. It is implementation-dependent whether or not this register can be read in user mode. [Category:Embedded.Hypervisor] Access to this register in guest state is mapped to GSPRG3. SPRG4 through SPRG7 These 64-bit registers can be written only in supervisor mode. These registers can be read in supervisor and user modes. SPRG8 through SPRG9 These 64-bit registers can be accessed only in supervisor mode. Programming Note The intended use for SPRG9 is for internal debug exception handling.
893
If the TLB lookup is successful, the storage access control mechanism grants or denies the access using context information from EPLCEPR or EPSCEPR for loads and stores respectively. If access is not granted, a Data Storage interrupt occurs, and the ESREPID bit is set to 1. If the operation was a Store, the ESRST bit is also set to 1.
Process
ID
Load
Context
These bits are interpreted as follows: Bit 32 Definition External Load Context PR Bit (EPR) Used in place of MSRPR by the storage access control mechanism when an External Process ID Load instruction is executed. 0 1 33 Supervisor mode User mode
External Load Context AS Bit (EAS) Used in place of MSRDS for translation when an External Process ID Load instruction is executed, and, if this Load instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of MSRDS. 0 1 Address space 0 Address space 1
34
External Load Context GS Bit (EGS) [Category: Embedded.Hypervisor] Used in place of MSRGS for translation when an External Process ID Load instruction is executed. 0 1 Hypervisor state Guest state Programming Note When a mtspr instruction is executed that targets EPLC, the EGS and ELPID fields are only modified if the thread is in hypervisor state.
35 36:47
Reserved External Load Context LPID Value (ELPID) [Category:Embedded.Hypervisor] Used in place of LPIDR register for translation
894
Programming Note When a mtspr instruction is executed that targets EPSC, the EGS and ELPID fields are only modified if the thread is in hypervisor state. 35 36:47 Reserved External Store Context LPID Value (ELPID) [Category:Embedded.Hypervisor] Used in place of LPIDR register for translation when an External Process ID Store instruction is executed. Programming Note When a mtspr instruction is executed that targets EPSC, the EGS and ELPID fields are only modified if the thread is in hypervisor state. Reserved External Store Context Process ID Value (EPID) Used in place of the Process ID register value for translation when an external PID Store instruction is executed, and, if this Store instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of PID contents.
ID
Store
Context
These bits are interpreted as follows: Bits 32 Definition External Store Context PR Bit (EPR) Used in place of MSRPR by the storage access control mechanism when an External Process ID Store instruction is executed. 0 1 33 Supervisor mode User mode
External Store Context AS Bit (EAS) Used in place of MSRDS for translation when an External Process ID Store instruction is executed, and, if this Store instruction causes a Data TLB Error interrupt, loaded into MAS registers in place of MSRDS. 0 1 Address space 0 Address space 1
34
External Store Context GS Bit (EGS) [Category: Embedded.Hypervisor] Used in place of MSRGS for translation when an External Process ID Store instruction is executed. 0 1 Hypervisor state Guest state
895
Extended mnemonics
Extended mnemonics are provided for the mtspr and mfspr instructions so that they can be coded with the SPR name as part of the mnemonic rather than as a numeric operand; see Appendix B.
Privileged Length Cat2 mtspr mfspr (bits) no no 64 B no no 64 B no no 64 B yes yes 64 STM yes8 32 B yes8 9 yes9 64 B yes yes9 yes9 64 B yes yes 32 E 32 E hypv8 hypv8 hypv8 64 E 64 E.HV; E.PT hypv8 hypv8 32 E.HV; E.PT hypv8 hypv8 hypv8 hypv8 64 E 32 E hypv8 hypv8 yes9 64 E yes9 yes9 yes9 32 E 64 E hypv8 hypv8 no no 32 B no 64 B no 64 E no 64 B B no 325 9 9 yes 64 B yes yes yes 64 E 32 EC hypv4 hypv4 32 B hypv4 hypv4 32 B yes9 32 E hypv8 yes 32 B 32 E hypv5,8 hypv8 32 E.HV hypv3 32 E.HV,(E;64) hypv3 hypv3 hypv8 hypv8 32 E 32 E hypv8 hypv8 32 E hypv8 hypv8 32 E.HV hypv3 hypv3 64 E hypv8 hypv8 hypv8 hypv8 64 E 64 E hypv8 hypv8
896
decimal 315 316 317 318 319 336 338 339 340 341 342 343 344-347 348 349 350 368-371 372 373 378 379 380 381 382 383 400-415 432-435 436 437 438 439 440-441 442 443 444 445 446 447 512 526 527 528 529 530 531 532 533 570 571 572 574 575 604 605 624
Cat2 E E E E E E E.HV E.HV E E.HV E.HV.LRAT E.HV.LRAT E.HV E.HV; 64 E.HV; 64 E.PT E.HV E; 64 E; 64 E.HV E.HV E.HV;EXP E.HV E.HV E.HV E E.HV E.HV.LRAT E.MT E.MT E.MT E.HV E.HV E.HV E.HV E.HV E.MT E.HV SP ATB ATB SP SP SP E.PM E.PC E.PC E E E E.ED E.ED E E.ED E
897
SPR(n)
(RS)32:63
SPR,RS RS
11
spr
21
467
/
31
The SPR field denotes a Special Purpose Register, encoded as shown in Figure 16. The contents of register RS are placed into the designated Special Purpose Register. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RS are placed into the SPR. For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one leaves the other unaltered.
898
if MSRPR=1: Privileged Instruction type Program interrupt; if MSRPR=0: boundedly undefined results
If the SPR number is set to a value that is shown in Figure 16 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved. Special Registers Altered: See Figure 16 Compiler and Assembler Note For the mtspr and mfspr instructions, the SPR number coded in assembler language does not appear directly as a 10-bit binary number in the instruction. The number coded is split into two 5-bit halves that are reversed in the instruction, with the high-order 5 bits appearing in bits 16:20 of the instruction and the low-order 5 bits in bits 11:15. Programming Note For a discussion of software synchronization requirements when altering certain Special Purpose Registers, see Chapter 12. Synchronization Requirements for Context Alterations on page 1079.
899
RT,SPR RT
11
spr
21
339
/
31 0
31
6
RS
11
dcr
21
451
/
31
n spr5:9 || spr0:4 if length(SPR(n)) = 64 then SPR(n) RT else 32 RT 0 || SPR(n) The SPR field denotes a Special Purpose Register, encoded as shown in Figure 16. The contents of the designated Special Purpose Register are placed into register RT. For Special Purpose Registers that are 32 bits long, the low-order 32 bits of RT receive the contents of the Special Purpose Register and the highorder 32 bits of RT are set to zero. spr0=1 if and only if reading the register is privileged. Execution of this instruction specifying a defined and privileged register when MSRPR=1 causes a Privileged Instruction type Program interrupt. Execution of this instruction specifying an SPR number that is not defined for the implementation causes either an Illegal Instruction type Program interrupt or one of the following. if spr0=0: boundedly undefined results if spr0=1:
DCRN dcr0:4 || dcr5:9 DCR(DCRN) (RS) Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementationdependent.) The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of bits 32:63 of (RS) are placed into the Device Control Register. This instruction is privileged. Special Registers Altered: Implementation-dependent.
RS
11
RA
16
///
21
387
/
31
if MSRPR=1: Privileged Instruction type Program interrupt if MSRPR=0: boundedly undefined results
DCRN (RA) DCR(DCRN) (RS) Let the contents of register RA denote a Device Control Register. (The supported Device Control Registers supported are implementation-dependent.) The contents of register RS are placed into the designated Device Control Register. For 32-bit Device Control Registers, the contents of RS32:63 are placed into the Device Control Register. The specification of Device Control Registers using mtdcrx, mtdcrux (see Book I), and mtdcr is implementation-dependent. For example, mtdcr 105,r2 and mtdcrux r1,r2 (where register r1 contains the value 105) may not produce identical results on an implementation. This instruction is privileged. Special Registers Altered: Implementation-dependent.
If the SPR field contains a value that is shown in Figure 16 but corresponds to an optional Special Purpose Register that is not provided by the implementation, the effect of executing this instruction is the same as if the SPR number were reserved. Special Registers Altered: None Note See the Notes that appear with mtspr.
900
X-form
RT
11
dcr
21
323
/
31 0
31
///
16
///
21
146
/
31
DCRN dcr0:4 || dcr5:9 RT DCR(DCRN) Let DCRN denote a Device Control Register. (The supported Device Control Registers are implementationdependent.) The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of the Device Control Register are placed into bits 32:63 of RT. Bits 0:31 of RT are set to 0. This instruction is privileged. Special Registers Altered: Implementation-dependent.
newmsr (RS)32:63 0 if MSRCM = 0 & newmsrCM = 1 then NIA0:31 if MSRGS = 1 then newmsrGS WE MSRGS WE prots0:31 0 MSRPUCLEP DEP PMMP protsUCLEP DEP PMMP newmsr prots & MSR | ~prots & newmsr newmsr MSR The contents of register RS32:63 are placed into the MSR. If the thread is changing from 32-bit mode to 64bit mode, the next instruction is fetched from 320||NIA 32:63. This instruction is privileged and execution synchronizing. In addition, alterations to the EE or CE bits are effective as soon as the instruction completes. Thus if MSREE=0 and an External interrupt is pending, executing an mtmsr that sets MSREE to 1 will cause the External interrupt to be taken before the next instruction is executed, if no higher priority exception exists. Likewise, if MSRCE=0 and a Critical Input interrupt is pending, executing an mtmsr that sets MSRCE to 1 will cause the Critical Input interrupt to be taken before the next instruction is executed if no higher priority exception exists. (See Section 7.6 on page 1009.) [Category:Embedded.Hypervisor] GS, WE and bits protected with MSRP are only modified if mtmsr is executed in hypervisor state. Special Registers Altered: MSR Programming Note For a discussion of software synchronization requirements when altering certain MSR bits please refer to Chapter 12.
RT
11
RA
16
///
21
259
/
31
DCRN (RA) RT DCR(DCRN) Let the contents of register RA denote a Device Control Register (the supported Device Control Registers are implementation-dependent.) The contents of the designated Device Control Register are placed into register RT. For 32-bit Device Control Registers, the contents of bits 32:63 of the designated Device Control Register are placed into RT. Bits 0:31 of RT are set to 0. The specification of Device Control Registers using mfdcrx and mfdcrux (see Book I) compared to the specification of Device Control Registers using mfdcr is implementation-dependent. For example, mfdcr r2,105 and mfdcrx r2,r1 (where register r1 contains the value 105) may not produce identical results on an implementation or between implementations. Also, accessing privileged Device Control Registers in supervisor mode with mfdcrux is implementation-dependent. This instruction is privileged.
901
X-form
RT 31 RT
11
///
16
///
21
131
/
31
///
16
///
21
83
/
31
RT
32
MSREE 0 || MSR
(RS)48
The content of (RS)48 is placed into MSREE. Alteration of the MSREE bit is effective as soon as the instruction completes. Thus if MSREE=0 and an External interrupt is pending, executing a wrtee instruction that sets MSREE to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 7.9, Exception Priorities on page 1037). This instruction is privileged. Special Registers Altered: MSR
The contents of the MSR are placed into bits 32:63 of register RT and bits 0:31 of RT are set to 0. This instruction is privileged. Special Registers Altered: None
902
E ///
11
///
E
16 17
///
21
163
/
31
MSREE
The value specified in the E field is placed into MSREE. Alteration of the MSREE bit is effective as soon as the instruction completes. Thus if MSREE=0 and an External interrupt is pending, executing a wrtee instruction that sets MSREE to 1 will cause the External interrupt to occur before the next instruction is executed, if no higher priority exception exists (Section 7.9, Exception Priorities on page 1037). This instruction is privileged. Special Registers Altered: MSR Programming Note wrtee and wrteei are used to provide atomic update of MSREE. Typical usage is: mfmsr Rn #save EE in (Rn)48 wrteei 0 #turn off EE mfmsr Rn #save EE in (Rn)48 wrteei 0 #turn off EE : : : : #code with EE disabled wrtee Rn #restore EE without altering #other MSR bits that might #have changed
903
5.4.2 OR Instruction
or Rx,Rx,Rx can be used to set PPRPRI (see Figure 2 in Section 3.1 of Book II) as shown in Figure 17. PPRPRI remains unchanged if the privilege state of the thread executing the instruction is lower than the privilege indicated in the figure. (The encodings available to problem-state programs, as well as encodings for additional shared resource hints not shown here, are described in Section 3.2 of Book II.)
Rx 31 1 6 2 5 3 7
1
PPR32PRI Priority 001 010 011 100 101 110 111 very low low medium low (normal) medium medium high high very high
If the Embedded.Hypervisor category is supported, this value is hypervisor privileged. Otherwise, the value is privileged.
904
RT,RA,RB RT
11
RA
16
RB
21
95
/
31 0
31
RA
16
RB
21
287
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) 560 || MEM(EA,1) RT Let the effective address (EA) be the sum (RA|0)+(RB). The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. For lbepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in pace of MSRGS <E.HV> EPLCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a lbzx instruction except for using the EPLC register to provide the translation context.
if RA = 0 then b 0 else b (RA) EA b + (RB) 480 || MEM(EA,2) RT Let the effective address (EA) be the sum (RA|0)+(RB). The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. For lhepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in pace of MSRGS <E.HV> EPLCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a lhzx instruction except for using the EPLC register to provide the translation context.
905
RT,RA,RB RT
11
RA
16
RB
21
31
/
31 0
31
RA
16
RB
21
29
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) 32 RT 0 || MEM(EA,4) Let the effective address (EA) be the sum (RA|0)+(RB). The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. For lwepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in pace of MSRGS <E.HV> EPLCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a lwzx instruction except for using the EPLC register to provide the translation context.
0 (RA)
Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT. For ldepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in pace of MSRGS <E.HV> EPLCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Corequisite Categories: 64-Bit Special Registers Altered: None Programming Note This instruction behaves identically to a ldx instruction except for using the EPLC register to provide the translation context.
906
RS,RA,RB RS
11
RA
16
RB
21
223
/
31 0
31
RA
16
RB
21
415
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) MEM(EA,1) (RS)56:63 Let the effective address (EA) be the sum (RA|0)+(RB). (RS)56:63 are stored into the byte in storage addressed by EA. For stbepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in pace of MSRGS <E.HV> EPSCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a stbx instruction except for using the EPSC register to provide the translation context.
if RA = 0 then b 0 else b (RA) EA b + (RB) MEM(EA,2) (RS)48:63 Let the effective address (EA) be the sum (RA|0)+(RB). (RS)48:63 are stored into the halfword in storage addressed by EA. For sthepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in pace of MSRGS <E.HV> EPSCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a sthx instruction except for using the EPSC register to provide the translation context.
907
RS,RA,RB RS
11
RA
16
RB
21
159
/
31 0
31
RA
16
RB
21
157
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) MEM(EA,4) (RS)32:63 Let the effective address (EA) be the sum (RA|0)+(RB). (RS)32:63 are stored into the word in storage addressed by EA. For stwepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in pace of MSRGS <E.HV> EPSCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Special Registers Altered: None Programming Note This instruction behaves identically to a stwx instruction except for using the EPSC register to provide the translation context.
0 (RA)
Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA. For stdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in pace of MSRGS <E.HV> EPSCELPID is used in pace of LPIDR <E.HV> This instruction is privileged. Corequisite Categories: 64-Bit Special Registers Altered: None Programming Note This instruction behaves identically to a stdx instruction except for using the EPSC register to provide the translation context.
908
RA,RB ///
11
RA
16
RB
21
63
/
31 0
31
RA
16
RB
21
319
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required, a block containing the byte addressed by EA is in the data cache of any thread, and any locations in the block are considered to be modified there, then those locations are written to main storage. Additional locations in the block may be written to main storage. The block ceases to be considered modified in that data cache. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this thread, and any locations in the block are considered to be modified there, those locations are written to main storage. Additional locations in the block may be written to main storage, and the block ceases to be considered modified in that data cache. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. The instruction is treated as a Load with respect to translation, memory protection, and is treated as a Write with respect to debug events. This instruction is privileged. For dcbstep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> Special Registers Altered: None Programming Note This instruction behaves identically to a dcbst instruction except for using the EPLC register to provide the translation context.
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbtep instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA. If the Cache Specification category is supported, the nature of the hint is affected by TH values of 0b00000 to 0b00111. Values associated with the Stream category are ignored. See Section 4.3.2 of Book II for more information. If the block is in a storage location that is Caching Inhibited or Guarded, the hint is ignored. The only operation that is caused by the dcbtep instruction is the providing of the hint. The actions (if any) taken in response to the hint are not considered to be caused by or associated with the dcbtep instruction (e.g., dcbtep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction. The dcbtep instruction may complete before the operation it causes has been performed. The nature of the hint depends, in part, on the value of the TH field, as specified in the dcbt instruction in Section 4.3.2 of Book II. The instruction is treated as a Load, except that no interrupt occurs if a protection violation occurs. The instruction is privileged. The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Data Cache Block Touch by External PID instruction so that it can
909
RA,RB,L /// L
9 11
RA
16
RB
21
127
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). L=0 If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data caches of all processors. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and the block is in the data cache of this processor and any locations in the block are considered to be modified there, those locations are written to main storage and additional locations in the block may be written to main storage. The block is invalidated in the data cache of this processor. L=1 (dcbf local) [Category: Embedded.PhasedIn] The L=1 form of the dcbfep instruction permits a program to limit the scope of the flush operation to the data cache of this processor. If the block containing the byte addressed by EA is in the data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute. L = 3 (dcbf local primary) [Category: Embedded.Phased-In] The L=3 form of the dcbfep instruction permits a program to limit the scope of the flush operation to the primary data cache of this processor. If the block containing the byte addressed by EA is in the primary data cache of this processor, it is removed from this cache. The coherence of the block is maintained to the extent required by the Memory Coherence Required storage attribute. For the L operand, the value 2 is reserved. The results of executing a dcbfep instruction with L=2 are boundedly undefined. Engineering Note dcbfep with an L value of 2 should be treated as if the value were 0.
910
Programming Note dcbfep with L=1 can be used to provide a hint that a block in this processors data cache will not be reused soon. dcbfep with L=3 can be used to flush a block from the processors primary data cache but reduce the latency of a subsequent access. For example, the block may be evicted from the primary data cache but a copy retained in a lower level of the cache hierarchy. Programs which manage coherence in software must use dcbfep with L=0. Engineering Note
dcbfep with L=1 requires caches that are private to this processor to be flushed. Caches shared with other processors need not be affected provided that the requirements of memory coherence are satisfied.
Except in the dcbfep instruction description in this section, references to dcbfep in Books I-III imply L=0 unless otherwise stated or obvious from context; dcbflep is used for L=1 and dcbflpep is used for L=3. Programming Note dcbfep serves as both a basic and an extended mnemonic. The Assembler will recognize a dcbfep mnemonic with three operands as the basic form, and a dcbfep mnemonic with two operands as the extended form. In the extended form the L operand is omitted and assumed to be 0.
911
that it can be coded with the TH value as the last operand for all categories. . Extended: Equivalent to: dcbtstctep RA,RB,TH dcbtstep for TH values of 0b0000 - 0b0111; other TH values are invalid. Programming Note This instruction behaves identically to a dcbtst instruction except for using the EPLC register to provide the translation context, and not supporting the Stream category.
TH,RA,RB TH
11
RA
16
RB
21
255
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbtstep instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA. If the Cache Specification category is supported, the nature of the hint is affected by TH values of 0b00000 to 0b00111. Values associated with the Stream category are ignored. See Section 4.3.2 of Book II for more information. If the block is in a storage location that is Caching Inhibited or Guarded, the hint is ignored. The only operation that is caused by the dcbtstep instruction is the providing of the hint. The actions (if any) taken in response to the hint are not considered to be caused by or associated with the dcbtstep instruction (e.g., dcbtstep is considered not to cause any data accesses). No means are provided by which software can synchronize these actions with the execution of the instruction stream. For example, these actions are not ordered by the memory barrier created by a sync instruction. The dcbtstep instruction may complete before the operation it causes has been performed. The instruction is treated as a Store, except that no interrupt occurs if a protection violation occurs. The instruction is privileged. The normal address translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> Special Registers Altered: None Extended Mnemonics: Extended mnemonics are provided for the Data Cache Block Touch for Store by External PID instruction so
912
RA,RB ///
11
RA
16
RB
21
991
/
31 0
31
RA
16
RB
21
1023
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of any thread, the block is invalidated in those instruction caches. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the instruction cache of this thread, the block is invalidated in that instruction cache. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. The instruction is treated as a Load. This instruction is privileged. For icbiep, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> Special Registers Altered: None Programming Note This instruction behaves identically to an icbi instruction except for using the EPLC register to provide the translation context.
if RA = 0 then b 0 (RA) else b b + (RB) EA n block size (bytes) log2(n) m EA0:63-m || m0 ea n MEM(ea, n) 0x00 Let the effective address (EA) be the sum (RA|0)+(RB). All bytes in the block containing the byte addressed by EA are set to zero. This instruction is treated as a Store. This instruction is privileged. The normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in place of MSR[GS] <E.HV> EPSCELPID is used in place of LPIDR <E.HV> Special Registers Altered: None Programming Note See the Programming Notes for the dcbz instruction. Programming Note This instruction behaves identically to a dcbz instruction except for using the EPSC register to provide the translation context.
913
FRT,RA,RB FRT
11
RA
16
RB
21
607
/
31 0
31
RA
16
RB
21
735
/
31
0 (RA)
if RA = 0 then b 0 else b (RA) EA b + (RB) MEM(EA,8) (FRS) Let the effective address (EA) be the sum (RA|0)+(RB). (FRS) is stored into the doubleword in storage addressed by EA. For stfdepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in place of MSR[GS] <E.HV> EPSCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute stfdepx while MSRFP=0 will cause a Floating-Point Unavailable interrupt. Corequisite Categories: Floating-Point Special Registers Altered: None Programming Note This instruction behaves identically to a stfdx instruction except for using the EPSC register to provide the translation context.
Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into FRT. For lfdepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute lfdepx while MSRFP=0 will cause a Floating-Point Unavailable interrupt. Corequisite Categories: Floating-Point Special Registers Altered: None Programming Note This instruction behaves identically to a lfdx instruction except for using the EPLC register to provide the translation context.
914
RT,RA,RB RT
11
RA
16
RB
21
799
/
31 0
31
RA
16
RB
21
927
/
31
0 (RA)
0 (RA)
Let the effective address (EA) be the sum (RA|0)+(RB). The doubleword in storage addressed by EA is loaded into RT. For evlddepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute evlddepx while MSRSPV=0 will cause an SPE Unavailable interrupt. Corequisite Categories: Signal Processing Engine Special Registers Altered: None Programming Note This instruction behaves identically to a evlddx instruction except for using the EPLC register to provide the translation context.
Let the effective address (EA) be the sum (RA|0)+(RB). (RS) is stored into the doubleword in storage addressed by EA. For evstddepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in place of MSR[GS] <E.HV> EPSCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute evstddepx while MSRSPV=0 will cause an SPE Unavailable interrupt. Corequisite Categories: Signal Processing Engine Special Registers Altered: None Programming Note This instruction behaves identically to a evstddx instruction except for using the EPSC register to provide the translation context.
915
VRT,RA,RB VRT
11
RA
16
RB
21
295
/
31 0
31
RA
16
RB
21
263
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) VRT MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT. For lvepx, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute lvepx while MSRSPV=0 will cause a Vector Unavailable interrupt. Corequisite Categories: Vector Special Registers Altered: None Programming Note This instruction behaves identically to a lvx instruction except for using the EPLC register to provide the translation context.
if RA = 0 then b 0 else b (RA) EA b + (RB) VRT MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) mark_as_not_likely_to_be_needed_again_anytime_soon ( EA ) Let the effective address (EA) be the sum (RA|0)+(RB). The quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0 is loaded into VRT. lvepxl provides a hint that the quadword in storage addressed by EA will probably not be needed again by the program in the near future. For lvepxl, the normal translation mechanism is not used. The contents of the EPLC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPLCEPR is used in place of MSRPR EPLCEAS is used in place of MSRDS EPLCEPID is used in place of PID EPLCEGS is used in place of MSR[GS] <E.HV> EPLCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute lvepxl while MSRSPV=0 will cause a Vector Unavailable interrupt. Corequisite Categories: Vector Special Registers Altered: None Programming Note See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I. Programming Note This instruction behaves identically to a lvxl instruction except for using the EPLC register to provide the translation context.
916
VRS,RA,RB VRS
11
RA
16
RB
21
807
/
31 0
31
RA
16
RB
21
775
/
31
(VRS)
Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0. For stvepx, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in place of MSR[GS] <E.HV> EPSCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute stvepx while MSRSPV=0 will cause a Vector Unavailable interrupt. Corequisite Categories: Vector Special Registers Altered: None Programming Note This instruction behaves identically to a stvx instruction except for using the EPSC register to provide the translation context.
if RA = 0 then b 0 else b (RA) EA b + (RB) MEM(EA & 0xFFFF_FFFF_FFFF_FFF0, 16) (VRS) mark_as_not_likely_to_be_needed_again_anytime_soon (EA) Let the effective address (EA) be the sum (RA|0)+(RB). The contents of VRS are stored into the quadword in storage addressed by the result of EA ANDed with 0xFFFF_FFFF_FFFF_FFF0. The stvepxl instruction provides a hint that the quadword addressed by EA will probably not be needed again by the program in the near future. For stvepxl, the normal translation mechanism is not used. The contents of the EPSC register are used to provide the context in which translation occurs. The following substitutions are made for just the translation and access control process: EPSCEPR is used in place of MSRPR EPSCEAS is used in place of MSRDS EPSCEPID is used in place of PID EPSCEGS is used in place of MSR[GS] <E.HV> EPSCELPID is used in place of LPIDR <E.HV> This instruction is privileged. An attempt to execute stvepxl while MSRSPV=0 will cause a Vector Unavailable interrupt. Corequisite Categories: Vector Special Registers Altered: None Programming Note See the Programming Notes for the lvxl instruction in Section 6.7.2 of Book I. Programming Note This instruction behaves identically to a stvxl instruction except for using the EPSC register to provide the translation context.
917
918
919
6.1 Overview
Instruction effective addresses are generated for sequential instruction fetches and for addresses that correspond to a change in program flow (branches, interrupts). Data effective addresses are generated by Load, Store, and Cache Management instructions. TLB Management instructions generate effective addresses to determine the presence of or to invalidate a specific TLB entry associated with that address. For a complete discussion of storage addressing and effective address calculation, see Section 1.9 of Book I. Portions of the context of an effective address are appended to it to form the virtual address. The context is provided by various registers. The virtual address consists of the Logical Partition ID (LPID) <E.HV>, the Guest State <E.HV>, the address space identifier, the process identifier, and the effective address. The virtual address is translated to a real address by a matching direct entry in the Translation Lookaside Buffer (TLB) according to procedures described in Section 6.7.3. The Virtual Page Number (VPN) part of the virtual address is compared to the TLB contents to determine a match. The VPN consists of bits of the virtual address with the exception of the low-order effective address bits that correspond to the byte offset within the page. If the Embedded.Page Table category is supported, a virtual address can be translated by the Page Table pointed to by a matching indirect TLB entry as described in Section 6.7.4. As a result of a Page Table translation, a direct TLB entry is created, and this direct TLB entry can be used for subsequent translations. All virtual addresses are translated by the Page Table <E.PT> or the TLB, i.e. unlike the Server environment, there is no real mode. The real address that results from the translation is used to access main storage. The Translation Lookaside Buffer is the hardware resource that also controls protection and storage control attributes. TLB permission bits control user and supervisor read, write and execute capability. If the Embedded.Hypervisor category is supported, the Virtualization Fault bit permits data accesses to pages to be
trapped to the hypervisor, which allows the hypervisor to virtualize data accesses to specific pages, e.g. accesses to memory-mapped I/O. Storage control attributes described in Book II are supported by corresponding TLB bits, as are four optional implementationdependent user-defined storage control attributes. The organization of the TLB (e.g. associativity, number of entries, number of arrays, etc.) is implementationdependent. MMU configuration and TLB configuration information in various registers describes this implementation-dependent organization. Software manages translation directly by installing TLB entries, and indirectly by setting up the page tables, which the TLB will cache. TLB Management instructions are used by software to read, write, search and invalidate TLB contents. MMU Assist Registers (MAS) are used to transfer data to and from the TLB arrays by TLB Management instructions. If the Embedded.Hypervisor category is not supported, TLB Management instructions are privileged instructions. A different MMU Architecture Version (MAV) is used to indicate that different register layouts and functions are provided. The MMU Architecture Version Number is specified by the read-only MMUCFG register. The Embedded.Hypervisor.LRAT, Embedded TLB Write Conditional, and Embedded.Page Table categories are available only in MMU Architecture Version 2.0. If the Embedded.Hypervisor category is supported and the Embedded.Hypervisor.LRAT category is not supported, most TLB Management instructions are hypervisor instructions. TLB entries contain real addresses, and, to maintain isolation between partitions, guest operating systems are not given access to real addresses. In this case most TLB Management instructions trap to the hypervisor. The hypervisor can emulate a TLB Management instruction by swapping a Real Page Number for corresponding Logical Page Number (LPN) in the MAS registers or vice versa so that the guest OS only sees LPNs. However, if the Embedded.Hypervisor.LRAT category is supported, hardware can perform the translation of
920
Virtual Address
Lookup VA in TLB
921
Programming Note [Category: Embedded.Hypervisor.LRAT]: The logical pages sizes supported by an implementation are typically larger than the real page sizes supported. This implies that memory blocks must be assigned to a partition with larger granularity than the memory blocks that can be managed within a partition.
922
operation would not have been performed in the sequential execution model, any results of the operation are abandoned (except as described below). In the remainder of this section, including its subsections, Load instruction includes the Cache Management and other instructions that are stated in the instruction descriptions to be treated as a Load, and similarly for Store instruction. A data access that is performed out-of-order may correspond to an arbitrary Load or Store instruction (e.g., a Load or Store instruction that is not in the instruction stream being executed). Similarly, an instruction fetch that is performed out-of-order may be for an arbitrary instruction (e.g., the aligned word at an arbitrary location in instruction storage). Most operations can be performed out-of-order, as long as the machine appears to follow the sequential execution model. Certain out-of-order operations are restricted, as follows. Stores Stores are not performed out-of-order (even if the Store instructions that caused them were executed out-of-order). Accessing Guarded Storage The restrictions for this case are given in Section 6.8.1.1. The only permitted side effects of performing an operation out-of-order are the following. A Machine Check that could be caused by in-order execution may occur out-of-order except that, if category E.HV is supported and the Machine Check is the result of multiple TLB entries that translate the same VA, the Machine Check interrupt must occur in the context in which it was caused. Also, if category E.HV is supported, a Machine Check interrupt resulting from the following situations must be precise. Execution of an External Process ID instruction that has an operand that can be translated by multiple TLB entries. Execution of a tlbivax instruction that isnt a TLB invalidate all and there are multiple entries in a single threads TLB array(s) that match the complete VPN. Execution of a tlbilx instruction with T=3 and there are multiple entries in the TLB array(s) that match the complete VPN. Execution of a tlbsx or tlbsrx. instruction and there are multiple matching TLB entries. Non-Guarded storage locations that could be fetched into a cache by in-order fetching or execution of an arbitrary instruction may be fetched outof-order into that cache.
923
Maintenance of TLB entries is under software control, except that if the Embedded.Page Table category is supported, hardware will write TLB entries for translations performed via the Page Table. System software determines TLB entry replacement strategy and the format and use of any page state information. If a TLB provides Next Victim (NV) information, software can optionally use NV to choose a TLB entry to be replaced. See Section 6.11.4.7. Some implementations allow software to specify that a hardware generated hash and hardware replacement algorithm should be used to select the entry. See Section 6.11.4.7. The TLB entry contains all the information required to identify the page, to specify the translation, to specify access controls, and to specify the storage control attributes. A TLB entry is written by copying information from MAS registers, using a tlbwe instruction (see page 987). A TLB entry is read by copying information to MAS registers, using a tlbre instruction (see page 985). Software can also search for specific TLB entries using the tlbsx instruction (see page 982) and, if the Embedded.TLB Write Conditional category is supported, tlbsrx. (see page 984). Each TLB entry describes a page. Fields in the TLB entry fall into five categories: Page identification fields (information required to identify the page to the hardware translation mechanism). Address translation fields Access control fields Storage control attribute fields TLB management field While the fields in the TLB entry are required, unless they are identified as part of a category that is not supported, no particular TLB entry format is formally specified. The tlbre and tlbwe instructions provide the ability to read or write individual entries. Below are shown the field definitions for the TLB entry. Some fields that are used only for indirect TLB entries can be overlaid with fields that are used only for direct TLB entries. Such overlap is implementation-dependent and an example is shown in Figure 19 on page 928.
924
TID
TLPID
TGS
925
IND
Page Identification Field for indirect entry Name Description SPSIZE Sub-Page Size (IND=1) <E.PT> SPSIZE is a 5-bit field that specifies the minimum page size that can be specified by each Page Table Entry in the Page Table that is pointed to by the indirect TLB entry. This minimum page size is 2SPSIZE KB and must be at least 4 KB. Thus SPSIZE must be at least 2. Valid values are specified by EPTCFGSPS2 SPS1 SPS0. See Section 6.7.4
926
ACM
927
Programming Note Any TLB entry with IPROT = 0 is volatile and may be evicted for the following reasons even though software didnt explicitly remove or invalidate the entry. Generous TLB invalidations (tlbivax and tlbilx) TLB updates due to Page Table translations <E.PT> Hardware replacement algorithm on a tlbwe instruction if MMUCFGHES=1 and MAS0HES =1. On a virtualized implementation, a TLB entry with IPROT = 0 may be evicted at any time.
Access Control Field for direct and indirect entries Name Description VF Virtualization Fault <E.HV;E.PT> See Section 6.7.6.4 This 1-bit field specifies whether the TLB entry is used by the hypervisor to virtualize data accesses, e.g. accesses to memory-mapped I/O. A translation of the operand address of a Load, Store, or Cache Management instruction that uses a TLB entry with the Virtualization Fault field equal to 1 causes a Virtualization Fault exception type Data Storage interrupt regardless of the settings of the permission bits. The interrupt is always directed to hypervisor state regardless of the setting of EPCRDSIGS. 0 A Load, Store, or Cache Management access to this page does not cause a Virtualization Fault exception. 1 A Load, Store, or Cache Management access to this page causes a Virtualization Fault exception. TLB Management Field Name Description IPROT Invalidation Protection A TLB entry with this bit equal to 1 is protected from all TLB invalidation mechanisms except the explicit writing of a 0 to the V bit. See Section 6.11.4.3. IPROT is implemented only for TLB entries in TLB arrays where TLBnCFGIPROT is indicated. If IPROT = 1, the TLB entry is protected from invalidate operations due to any of the following. execution of tlbivax execution of tlbilx tlbivax invalidations from another thread tlbilx invalidations from another thread when the TLB is shared with that thread TLB invalidate all operations This bit is a hypervisor resource.
TLB entry with IND=1 SPSIZE0 SPSIZE1 SPSIZE2 SPSIZE3 SPSIZE4 RPN52
Figure 19. Overlaid TLB Field Example Virtualized Implementation Note On virtualized implementations, programmers should weigh the degredation that may be caused by execute-only pages against the need for the security availed by the protection.
928
effective address to form the virtual address. If the Embedded.Hypervisor category is supported, this address is also prepended by a Logical Partition ID and Guest State bit. The Logical Partition ID is provided by the contents of LPIDR and the Guest State bit is provided by the MSRGS. For instruction fetches, the address space identifier is provided by MSRIS and the process identifier is provided by the contents of the Process ID Register. For data storage accesses, the address space identifier is provided by the MSRDS and the process identifier is provided by the contents of the Process ID Register.
GS
LPID
AS
PID
0
Offset
63
Virtual Address
TLB
multiple-entry
RPN0:53
Offset
63
access are compared to the Translation Address Space bit (TS), the Translation ID field (TID), and the value in the Effective Page Number field (EPN), respectively, of each TLB entry. If the Embedded.Hypervisor category is supported, the Logical Partition ID and the Guest State bit are also compared to the Translation Logical Partition ID (TLPID) and Translation Guest State (TGS) of each TLB entry. Figure 21 illustrates the criteria for a virtual address to match a specific TLB entry for a direct TLB entry (IND = 0). See Section 6.7.4 for details on Page Table translation using an indirect TLB entry. The virtual address of a storage access matches a direct TLB entry if the first four following conditions are true, and, additionally, if the Embedded.Page Table category is supported, the fifth condition is true, and, addi-
929
Table 2: Page Size and Effective Address to TLB EPN Comparison for MAV = 1.0 EA to EPN Comparison Page Size SIZE (bits 0:532SIZE) (4SIZEKB) EPN0:53 =? EA0:53 1 KB 0b0000 0b0001 4KB EPN0:51 =? EA0:51 0b0010 16KB EPN0:49 =? EA0:49 EPN0:47 =? EA0:47 64KB 0b0011 0b0100 256KB EPN0:45 =? EA0:45 EPN0:43 =? EA0:43 1MB 0b0101 EPN0:41 =? EA0:41 0b0110 4MB 16MB 0b0111 EPN0:39 =? EA0:39 0b1000 64MB EPN0:37 =? EA0:37 EPN0:35 =? EA0:35 0b1001 256MB EPN0:33 =? EA0:33 0b1010 1GB EPN0:31 =? EA0:31 0b1011 4GB 0b1100 16GB EPN0:29 =? EA0:29 0b1101 64GB EPN0:27 =? EA0:27 0b1110 256GB EPN0:25 =? EA0:25 EPN0:23 =? EA0:23 0b1111 1TB
Table 3: Page Size and Effective Address to TLB EPN Comparison for MAV = 2.0 EA to EPN Comparison Page Size SIZE (bits 0:53SIZE) (2SIZEKB) 0b00000 1KB EPN0:53 =? EA0:53 0b00001 2KB EPN0:52 =? EA0:52 0b00010 4KB EPN0:51 =? EA0:51 0b00011 8KB EPN0:50 =? EA0:50 16KB 0b00100 EPN0:49 =? EA0:49 0b00101 32KB EPN0:48 =? EA0:48 0b00110 64KB EPN0:47 =? EA0:47 0b00111 128KB EPN0:46 =? EA0:46 0b01000 256KB EPN0:45 =? EA0:45 0b01001 512KB EPN0:44 =? EA0:44 0b01010 1MB EPN0:43 =? EA0:43 0b01011 2MB EPN0:42 =? EA0:42 0b01100 4MB EPN0:41 =? EA0:41 0b01101 8MB EPN0:40 =? EA0:40 0b01110 16MB EPN0:39 =? EA0:39 0b01111 32MB EPN0:38 =? EA0:38 0b10000 64MB EPN0:37 =? EA0:37 0b10001 128MB EPN0:36 =? EA0:36 0b10010 256MB EPN0:35 =? EA0:35 0b10011 512MB EPN0:34 =? EA0:34 0b10100 1GB EPN0:33 =? EA0:33 0b10101 2GB EPN0:32 =? EA0:32 0b10110 4GB EPN0:31 =? EA0:31 0b10111 8GB EPN0:30 =? EA0:30 0b11000 16GB EPN0:29 =? EA0:29 0b11001 32GB EPN0:28 =? EA0:28 0b11010 64GB EPN0:27 =? EA0:27 0b11011 128GB EPN0:26 =? EA0:26 0b11100 256GB EPN0:25 =? EA0:25 0b11101 512GB EPN0:24 =? EA0:24 0b11110 1TB EPN0:23 =? EA0:23 0b11111 2TB EPN0:22 =? EA0:22 Programming Note An implementation need not support all page sizes. If the virtual address of the storage access matches a TLB entry in accordance with the selection criteria specified in the preceding paragraph, the value of the Real Page Number field (RPN) of the matching TLB entry provides the real page number portion of the real address. Let n=64log2(page size in bytes) where page size is specified by the SIZE field of the TLB entry. Bits n:63 of the effective address are appended to bits 0:n1 of the 54-bit RPN field of the matching TLB entry to produce the 64-bit real address (i.e. RA = RPN0:n1 || EAn:63) that is presented to main storage to perform the storage access. The page size is determined by the value of the SIZE field of the matching TLB entry. See Table 4 and Table 5. Depending on the page size, certain RPN bits of the matching TLB entry must be zero as shown in Table 4 and Table 5. Otherwise, it is implementation-dependent whether the
930
Table 5: Real Address Generation for MAV = 2.0 SIZE 0b00000 0b00001 0b00010 0b00011 0b00100 0b00101 0b00110 0b00111 0b01000 0b01001 0b01010 0b01011 0b01100 0b01101 0b01110 0b01111 0b10000 0b10001 0b10010 0b10011 0b10100 0b10101 0b10110 0b10111 0b11000 0b11001 0b11010 0b11011 0b11100 0b11101 0b11110 0b11111 Page Size (4SIZE KB) 1KB 2KB 4KB 8KB 16KB 32KB 64KB 128KB 256KB 512KB 1MB 2MB 4MB 8MB 16MB 32MB 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB 64GB 128GB 256GB 512GB 1TB 2TB RPN Bits Required to be Equal to 0 none RPN53:53=0 RPN52:53=0 RPN51:53=0 RPN50:53=0 RPN49:53=0 RPN48:53=0 RPN47:53=0 RPN46:53=0 RPN45:53=0 RPN44:53=0 RPN43:53=0 RPN42:53=0 RPN41:53=0 RPN40:53=0 RPN39:53=0 RPN38:53=0 RPN37:53=0 RPN36:53=0 RPN35:53=0 RPN34:53=0 RPN33:53=0 RPN32:53=0 RPN31:53=0 RPN30:53=0 RPN29:53=0 RPN28:53=0 RPN27:53=0 RPN26:53=0 RPN25:53=0 RPN24:53=0 RPN23:53=0 Real Address RPN0:53 RPN0:52 RPN0:51 RPN0:50 RPN0:49 RPN0:48 RPN0:47 RPN0:46 RPN0:45 RPN0:44 RPN0:43 RPN0:42 RPN0:41 RPN0:40 RPN0:39 RPN0:38 RPN0:37 RPN0:36 RPN0:35 RPN0:34 RPN0:33 RPN0:32 RPN0:31 RPN0:30 RPN0:29 RPN0:28 RPN0:27 RPN0:26 RPN0:25 RPN0:24 RPN0:23 RPN0:22 || EA54:63 || EA53:63 || EA52:63 || EA51:63 || EA50:63 || EA49:63 || EA48:63 || EA47:63 || EA46:63 || EA45:63 || EA44:63 || EA43:63 || EA42:63 || EA41:63 || EA40:63 || EA39:63 || EA38:63 || EA37:63 || EA36:63 || EA35:63 || EA34:63 || EA33:63 || EA32:63 || EA31:63 || EA30:63 || EA29:63 || EA28:63 || EA27:63 || EA26:63 || EA25:63 || EA24:63 || EA23:63
A TLB Miss exception occurs if there is no valid matching direct entry in the TLB for the page specified by the virtual address (Instruction or Data TLB Error interrupt) and, if the Embedded.Page Table category is supported, there is no matching indirect entry (see Section 6.7.4). A TLB Miss exception for an instruction fetch will result in an Instruction TLB Miss exception type Instruction TLB Error interrupt. A TLB Miss exception for a data storage access will result in a Data TLB Miss exception type Data TLB Error interrupt. Although the possibility exists to place multiple direct and/or multiple indirect entries into the TLB that match a specific virtual address, assuming a set-associative or fully-associative organization, doing so is a programming error. Either one of the matching entries is used or a Machine Check exception occurs if there are multiple matching direct entries or multiple matching indirect entries for an instruction or data access. The rest of the matching TLB entry provides the access control bits (UX, SX, UW, SW, UR, SR, VF), and stor-
931
TLBentry[i][TGS]
=?
=?
EA
effective address of storage access Guest State Logical Partition ID Indirect bit
TLBentry[i][TID]n:63
=0?
shared page
GS LPID IND
TLBentry[i][EPN]0:N-1 EA0:N-1
contents of Process ID Register for instruction fetches and data storage accesses
TLBentry[i][IND]
Figure 21. Address Translation: Virtual Address to direct TLB Entry Match Process
entry is 1, a Virtualization Fault exception occurs. If the PTE is a valid entry (V bit = 1), the PTE is used to translate the address. The PTE includes the abbreviated RPN (ARPN), page size (PS), storage control (WIMGE), implementation-dependent bits, and storage access control bits (BAP,R,C) that are used for the access. If the Embedded.Page Table and Embedded.Hypervisor categories are both supported, the Embedded.Hypervisor.LRAT category is supported. In this case, the RPN from the PTE is treated as a Logical Page Number (LPN) and the LPN is translated by the LRAT into an RPN. See Section 6.9. If there is more than one matching direct TLB entry or more than one matching indirect TLB entry, any one of the duplicate entries may be used or Machine Check exception may occur. See Section 6.7.5 for the rules that software must follow when updating the Page Table.
932
Programming Note Even when the Embedded.Hypervisor category is supported, a Page Table can optionally be treated as a guest supervisor resource due to the LRAT. If the Page Table is treated as a hypervisor resource, the Page Table must be placed in storage to which only the hypervisor has access. Moreover, the contents of the Page Table must be such that non-hypervisor software cannot modify storage that contains hypervisor programs or data. An LRAT identity mapping (LPN=RPN) can be used when the Page Tables are treated as hypervisor resources, especially if only one LRAT entry is provided. If the LRAT identity mapping converts LPNs into RPNs that extend beyond the memory given to the partition, the Page Table Entries still provide the hypervisor with a mechanism to limit a guests accesses to memory assigned to the partition, assuming guest execution of TLB Management instructions is disabled. Programming Note If storage accesses are to scattered virtual pages, an Embedded Page Table could be sparsely used, and, in the worst case, there could be only one valid PTE in the Page Table. In this case it would be more efficient for software to directly load TLB entries rather than have both an indirect TLBE and a direct TLBE, which is loaded from the Page Table.
933
Matching Indirect TLB Entry (TLBE) SIZE SPSIZE RPN TGS IND
1 1 54 5 5
Effective Address
64-p p
1 xxx.......xxx00.0 0 31 32 52
Byte 51 64-p 63
SHIFT RIGHT 0
21
20 21
28
AND
21 * OR
Page Table
0b000
64-bit Real Address of PTE
8 bytes
Page Table Entry (PTE) 8 bytes ARPN 0 ARPN0:51-p WIMGE R impl. SW0 C PS BAP SW1 V dep. 40 45 46 50 51 52 56 62 63
byte
Logical Address translation performed if Category E.HV is supported and the TGS bit of the associated indirect TLB entry is 1.
LRAT
934
935
Programming Note The computation of the real address of the PTE can be understood as follows. (Some of the facts mentioned below, such as the fact that the minimum Page Table size is 2K, are covered later in the section.) 1. q is the number of EA bits above bit 52 that are part of the byte offset within the effective page. (The minimum size of a page that is mapped by a PTE is 4K, so EA52:63 are always part of the byte offset, and SPSIZE must be at least 2.) S is the low-order 29-q bits of the EPN, prepended with q 0s. 2. The EA-mask has a number of low-order 1 bits equal to the difference between log2(# PTEs) and log2(minimum # PTEs) = 8. (The log2 of the number of PTEs in a Page Table is SIZE SPSIZE. The minimum Page Table size is 2K and PTE size = 8 bytes, so the minimum number of PTEs is 211 23 = 28.) Call this number s; i.e., s = (SIZE - SPSIZE) - 8, and the log2 of the number of PTEs in the Page Table is s+8. 3. The result is the low-order s bits of the EPN that are immediately above the lowest-order 8 EPN bits (the lowest-order 8 bits are always used to select the PTE), prepended with 21-s 0s. (If s could be greater than (29-q)-8, the EPN bits included in the result could include 0 bits that were shifted in step 1. However, this would correspond to (SIZE - SPSIZE) - 8 > (31 - SPSIZE) - 8, which would imply SIZE > 31, which is impossible.) 4. R consists of the high-order 21-s bits of RPN32:52 followed by the low-order s bits of the EPN that are immediately above the lowestorder 8 EPN bits. 5. The real address of the PTE thus consists of the high-order 53-s bits of the RPN from the TLB entry, followed by the low-order s+8 bits of the EPN (recall that s+8 is the number of PTEs in the Page Table), followed by 3 0s.
936
SW0
40
45 46
50 51 52
56
62 63
50 51 52:55 56:61
SW0 C PS BAP
62 63
SW1 V
Description Abbreviated Real Page Number Storage control attributes Reference bit Implementation-dependent These bits can be used to support User-Definable Storage Control Attributes, ACM and VLE. These bits are used in any combination of the following or subsets of the following: 46:49 - User-Definable Storage Control Attributes (U0:U3) 48 - ACM 49 - VLE <VLE> Available for software use Change bit Page Size (real) Base Access Permission bits: 0: Base UX 1: Base SX 2: Base UW 3: Base SW 4: Base UR 5: Base SR Available for software use Entry valid (V=1) or invalid (V=0)
SW1
Figure 23. Page Table Entry The Page Size (PS) field encodes page sizes using the same encodes as the TLBSIZE, except that 0b0 is prepended to the 4-bit PS value (0 || PS) to form the equivalent 5-bit encode and PS must specify a page size of 4 KB or larger. See Table 3 on page 930. The Abbreviated Real Page Number (ARPN) field contains the least significant 40 bits of the RPN. The full RPN associated with the PTE is formed from the ARPN prepended with 0x000, i.e. RPN = 0x000 || ARPN. Depending on the page size, certain ARPN bits must be zero. Specifically, if p>12, ARPN52-p:39 must be zeros, where p = log2(page size specified by PTEPS). If an implementation supports a real address with only r bits, r<52, and either the Embedded.Hypervisor category is not supported or the TGS bit of the corresponding indirect TLB entry is 0, the high-order 52-r bits of PTEARPN are ignored and treated as 0s for address translation. If the Embedded.Hypervisor category is supported, an implementation supports a logical
TLB Update
As a result of a Page Table translation, a corresponding direct TLB entry is created if no exception occurs, is optionally created if certain exceptions occur, and is not created if certain other exceptions occur. If no exception occurs, a direct TLB entry is written to create an entry corresponding to the virtual address and the contents of the PTE that was used to translate the virtual address. In this case, hardware selects the TLB array and TLB entry to be written. Any TLB array
937
938
Programming Note Only software creates indirect TLB entries, but both software and hardware create direct TLB entries. Unless a TLB Write Conditional instruction is used, software must avoid creating a direct TLB entry for a VPN that may also be simultaneously translated via a Page Table by a thread sharing the TLB. Otherwise multiple, direct TLB entries could be created. If software is preloading a TLB with a direct TLB entry and there is already an indirect TLB entry that could be used to translate the same VPN, software must ensure that no program on any thread sharing the TLB is accessing the VPN. Otherwise multiple, direct TLB entries could be created. If the Embedded.TLB Write Conditional category is supported, a TLB Write Conditional instruction can be used to create a direct TLB entry for the same VPN that may also be mapped by an existing indirect entry and Page Table Entry, assuming the page size specified by the TLB Write Conditional and PTE are identical.
939
Programming Note For all of the sequences shown in the following subsections, if it is necessary to communicate completion of the sequence to software running on another thread, the sync instruction at the end of the sequence should be followed by a Store instruction that stores a chosen value to some chosen storage location X. The memory barrier created by the sync instruction ensures that if a Load instruction executed by another thread returns the chosen value from location X, the sequences stores to the Page Table have been performed with respect to that other thread. The Load instruction that returns the chosen value should be followed by a context synchronizing instruction in order to ensure that all instructions following the context synchronizing instruction will be fetched and executed using the values stored by the sequence (or values stored subsequently). (These instructions may have been fetched or executed out-of-order using the old contents of the PTE.) This Note assumes that the Page Table and location X are in storage that is Memory Coherence Required.
940
*/ */ */
General Case
If a valid entry is to be modified and the translation instantiated by the entry being modified is to be invalidated, the old PTE can be deleted and a new one added using the sequences described in the two preceding sections, in order to ensure that the translation instantiated by the old entry is no longer available, maintain a consistent state, modify the PTE, and ensure that a subsequent reference to the virtual address translated by the new entry will use the correct real address and associated attributes.
941
942
1. Category: Embedded.Hypervisor 2. icbtls and icblc require execute or read access. 3. dcbi may cause a Read or Write Access Control Exception based on whether the data is flushed prior to invalidation. 4. It is implementation-dependent whether dcbtstls is treated as a Load or a Store. 5. If an exception is detected, the instruction is treated as a no-op and no interrupt is taken.
943
Load Instruction If a copy of any byte of the storage operand is in a cache then that byte may be accessed in the cache or in main storage.
6.8.2 User-Definable
User-definable storage control attributes control userdefinable and implementation-dependent behavior of the storage system. The existence of these bits is implementation-dependent. These bits are both implementation-dependent and system-dependent in their effect. These bits may be used in any combination and also in combination with the other storage control attribute bits.
944
Storage Control Attribute 0 - not Write Through Required 1 - Write Through Required 0 - not Caching Inhibited 1 - Caching Inhibited 0 - not Memory Coherence Required 1 - Memory Coherence Required 0 - not Guarded 1 - Guarded 0 - Big-Endian 1 - Little-Endian User-Definable 0 - non Variable Length Encoding (VLE). 1 - VLE 0 - not Alternate Coherency Mode 1 - Alternate Coherency Mode (if M=1)
ACM7
1
3 4 5 6
Support for the 1 value of the W bit is optional. Implementations that do not support the 1 value treat the bit as reserved and assume its value to be 0. Support of the 1 value is optional for implementations that do not support multiprocessing, implementations that do not support this storage control attribute assume the value of the bit to be 0, and setting M=1 in a TLB entry will have no effect. [Category: Embedded.Little-Endian] Support for these attributes is optional. [Category: VLE] [Category: SAO] The combination WIMG = 0b1110 has behavior unrelated to the meanings of the individual bits. See Section 6.8.3.1, Storage Control Bit Restrictions for additional information. The coherency method used in Alternate Coherency Mode is implementation-dependent.
Figure 25. Storage control bits In Section 6.8.3.1 and 6.8.3.2, access includes accesses that are performed out-of-order. Programming Note In a system consisting of only a single-threaded processor that has caches, correct coherent execution does not require storage to be accessed as Memory Coherence Required, and accessing storage as not Memory Coherence Required may give better performance.
945
946
Programming Note
This Note suggests one example for managing reference and change recording. When performing physical page management, it is useful to know whether a given physical page has been referenced or altered. Note that this may be more involved than knowing whether a given TLB entry has been used to reference or alter memory, since multiple TLB entries may translate to the same physical page. If it is necessary to replace the contents of some physical page with other contents, a page which has been referenced (accessed for any purpose) is more likely to be retained than a page which has never been referenced. If the contents of a given physical page are to be replaced, then the contents of that page must be written to the backing store before replacement, if anything in that page has been changed. Software must maintain records to control this process. Similarly, when performing TLB management, it is useful to know whether a given TLB entry has been referenced. When making a decision about which entry to cast-out of the TLB, an entry which has been referenced is more likely to be retained in the TLB than an entry which has never been referenced. Execute, Read and Write Access Control exceptions may be used to allow software to maintain reference information for a TLB entry and for its associated physical page. The entry is built, with its UX, SX, UR, SR, UW, and SW bits off, and the index and effective page number of the entry retained by software. The first attempt of application code to use the page will cause an Access Control exception (because the entry is marked No Execute, No Read, and No Write). The Instruction or Data Storage interrupt handler records the reference to the TLB entry and to the associated physical page in a software table, and then turns on the appropriate access control bit. An initial read from the page could be handled by only turning on the appropriate UR or SR access control bits, leaving the page read-only. In a demand-paged environment, when the contents of a physical page are to be replaced, if any storage in that physical page has been altered, then the backing storage must be updated. The information that a physical page is dirty is typically recorded in a Change bit for that page. Write Access Control exceptions may be used to allow software to maintain change information for a physical page. For the example just given for reference recording, the first write access to the page via the TLB entry will create a Write Access Control exception type Data Storage interrupt. The Data Storage interrupt handler records the change status to the physical page in a software table, and then turns on the appropriate UW and SW bits.
947
LSIZE Logical Page Size The LSIZE field specifies the size of the logical page associated with the LRAT entry as 2LSIZEKB, where 0 LSIZE 31. Implementations may support any one or more of these logical page sizes (see Section 6.10.3.6), and these logical page sizes need not be the same as the real page sizes that are implemented. However, the smallest logical page is no smaller than the smallest real page. The encodes for LSIZE are the same as the encodes for the TLB SIZE. See Section 6.7.2. This field must be one of the logical pages sizes specified by the LRATPS register. LRPN LRAT Real Page Number (up to 54 bits) Bits 0:n1 of the LRPN field are used to replace bits 0:n1 of the LPN to produce the RPN that is written to TLBRPN by a tlbwe instruction or a Page Table translation (where n=64log2(logical page size in bytes) and logical page size is specified by the LSIZE field of the LRAT entry). Software must set unused low-order LRPN bits to 0. Note: Bits X:Y of the LRPN field are implemented, where X 0 and Y 53. X = 64 MMUCFGRASIZE. Y = p - 1 where p = 64 log2(smallest logical page size in bytes) and smallest logical page size is the smallest page size supported by the implementation as specified by the LRATPS register. Unimplemented LRPN bits are treated as if they contain 0s. An LRAT entry can be written by the hypervisor using the tlbwe instruction with MAS0ATSEL equal to 1. The contents of the LRAT entry specified by MAS0ESEL, and MAS2EPN are written from MAS registers. See the tlbwe instruction description in Section 6.11.4.9. An LRAT entry can be read by the hypervisor using the tlbre instruction with MAS0ATSEL equal to 1. The contents of the LRAT entry specified by MAS0ESEL and MAS2EPN are read and placed into the MAS registers. See the tlbre instruction description in Section 6.11.4.9. Maintenance of LRAT entries is under hypervisor software control. Hypervisor software determines LRAT entry replacement strategy. There is no Next Victim support for the LRAT array. The LRAT array is a hypervisor resource. There is at most one LRAT array per thread.
948
Programming Note Hypervisor software should not create an LRAT entry that maps any real memory regions for which a TLB entry should have VF equal to 1. Otherwise, a guest operating system could incorrectly create TLB entries, for this memory, with VF=0, assuming hypervisor software normally sets MAS8VF=0 before giving control to a guest operating system. TLB Write When the guest operating system manipulates the values of RPN fields of MAS registers, the values are treated as forming an LPN. When guest supervisor software attempts to execute tlbwe on an implementation that supports MMU Architecture Version 1, tlbre, and tlbsx, which operate on a TLB entrys real page number (RPN) or when guest supervisor software attempts to execute a TLB Management instruction with guest execution of TLB Management instructions disabled (EPCRDGTMI=1), an Embedded Hypervisor Privilege exception occurs. Also, if TLBnCFGGTWE = 0 for a TLB array and the guest supervisor executes a tlbwe to the TLB array, an Embedded Hypervisor Privilege exception occurs. If a tlbwe caused the exception, the hypervisor can replace the LPN value in the MAS registers with the corresponding RPN, execute a tlbwe, and restore the LPN in the MAS registers before returning to the guest operating system. If a tlbre or tlbsx caused the exception, the hypervisor can execute the exception-causing instruction and replace the RPN value in the MAS registers with the corresponding LPN before returning to the guest operating system. A Logical to Real Address Translation (LRAT) array provides a mechanism that allows a guest operating system to write the TLB without trapping to the hypervisor. When guest supervisor software executes tlbwe on an implementation that supports MMU Architecture Version 2, guest execution of TLB Management instructions is enabled (EPCRDGTMI=0), and TLBnCFGGTWE = 1 for the TLB array to be written, an LPN is formed. If MAS7 is implemented, LPN = MAS7RPNU || MAS3RPNL. Otherwise, LPN = 320 || MAS3RPNL. The LPN is translated into an RPN by the LRAT if a matching LRAT entry is found. A matching LRAT entry exists if the following conditions are all true for some LRAT entry. The Valid bit of the LRAT entry is one. Either the LPID field is not supported in the LRAT (LRATCFGLPID=0) or the value of LPIDRLPID is equal to the value of the LPID field of the LRAT entry. Bits 64-q:n-1 of the LPN match the corresponding bits of the LPN field of the LRAT entry where n = 64 log2(logical page size in bytes), logical page size is specified by the LSIZE field of the LRAT entry, and q is specified by LRATCFGLASIZE. Either of the following is true.
949
Programming Note The PID register was referred to as PID0 in the Type FSL Storage Control appendix of previous versions of the architecture.
PID
63
Figure 26. Processor ID Register (PID) The PID register fields are described below. Bit 50:63 Description Processor ID (PID) Identifies a unique process (except for the value of 0) and is used to construct a virtual address for storage accesses.
950
TLB Write Conditional (TWC) [MAV=2.0] Indicates whether the Embedded.TLB Write Conditional category is supported. 0 1 E.TWC category is not supported E.TWC category is supported. See Section 6.11.4.2.1 for a description of conditional TLB writes. This category also includes support for the tlbsrx. instruction, MAS0WQ, and MAS6ISIZE.
PID Register Size (PIDSIZE) The value of PIDSIZE is one less than the number of bits implemented for each of the PID registers implemented. Only the least significant PIDSIZE+1 bits in the PID registers are implemented. The maximum number of PID register bits that may be implemented is 14. Number of TLBs (NTLBS) The value of NTLBS is one less than the number of software-accessible TLB structures that are implemented. NTLBS is set to one less than the number of TLB structures so that its value matches the maximum value of MAS0TLBSEL. 00 01 10 11 1 TLB 2 TLBs 3 TLBs 4 TLBs
60:61
32
36
40
47
53
58 60 62 63
Figure 27. MMU Configuration Register [MAV=1.0] PIDSIZE NTLBS MAVN LRAT TWC /// LPIDSIZE RASIZE /// //
62:63
32
36
40
47 48 49
53
58 60 62 63
MMU Architecture Version Number (MAVN) Indicates the version number of the architecture of the MMU implemented. 00 01 10 11 Version 1.0 Version 2.0 Reserved Reserved
Figure 28. MMU Configuration Register [MAV=2.0] The MMUCFG fields are described below. Bit 36:39 Description LPID Register Size (LPIDSIZE) The value of LPIDSIZE is the number of bits in LPIDR that are implemented. Only the least significant LPIDSIZE bits in LPIDR are implemented. The Embedded.Hypervisor category is supported if and only if LPIDSIZE > 0. Real Address Size (RASIZE) Number of bits in a real address supported by the implementation. LRAT Translation Supported (LRAT) [Category: Embedded.Hypervisor.LRAT] Indicates LRAT translation is supported. 0 LRAT translation is not supported. A tlbwe executed in guest supervisor state results in an Embedded Hypervisor Privilege exception.
40:46
47
951
TLB array cannot be loaded from the Page Table. TLB array can be loaded from the Page Table.
32
40
44
48 49 50
46
///
NENTRY
63
45 46 47 48 49 50 51 52
Indirect (IND) [MAV=2.0 and Category: E.PT] Indicates that an indirect TLB entry can be created in the TLB array and that there is a corresponding EPTCFG register that defines the SIZE and Sub-Page Size values that are supported. 0 1 The TLB array treats the IND bit as reserved. The TLB array supports indirect entries.
Figure 30. TLB Configuration Register [MAV=2.0] The TLBnCFG fields are described below. Bit 32:39 Description Associativity (ASSOC) Total number of entries in a TLB array which can be used for translating addresses with a given EPN. This number is referred to as the associativity level of the TLB array. Some values of assoc have special meanings when used in combination with specific values of NENTRY, as follows. However, if TLBnCFGHES=1, the associativity level of the TLB array is implementation dependent. 47
Guest TLB Write Enabled (GTWE) [MAV=2.0 and Category: Embedded.Hypervisor.LRAT] Indicates that a guest supervisor can write the TLB array because LRAT translation is supported for the TLB array. 0 A guest supervisor cannot write the TLB array. A tlbwe executed in guest supervisor state results in an Embedded Hypervisor Privilege exception. A guest supervisor can write the TLB array if guest execution of TLB Management instructions is enabled (EPCRDGTMI=0).
NENTRY ASSOC Meaning 0 0 no TLB present 0 1 TLB geometry is completely implementation-defined. MAS0ESEL is ignored 0 >1 TLB geometry and number of entries is implementation defined, but has known associativity. For tlbre and tlbwe, a set of TLB entries is selected by an implementation dependent function of MAS8TGS TLPID <E.HV>, MAS1TS TID TSIZE, and MAS2EPN. MAS0ESEL is used to select among entries in this set, except on tlbwe if MAS0HES=1. n > 0 n or 0 TLB is fully associative 40:43 Minimum Page Size (MINSIZE) [MAV=1.0] This field defines the minimum page size of the TLB array. Page size encoding is defined in Section 6.7.2. Maximum Page Size (MAXSIZE) [MAV=1.0] This field defines the maximum page size of the TLB array. Page size encoding is defined in Section 6.7.2. Page Table (PT) [MAV=2.0 and Category: E.PT]
48
Invalidate Protection (IPROT) Invalidate protect capability of the TLB array. 0 1 Indicates invalidate protection capability not supported. Indicates invalidate protection capability supported.
49
Page Size Availability (AVAIL) [MAV=1.0] This defines the page size availability of the TLB array. If the Embedded.Page Table category is supported, this also defines the virtual address space size availability of TLB array. Otherwise, this field is reserved. 0 Fixed selectable page size from MINSIZE to MAXSIZE (all TLB entries are the same size). Variable page size from MINSIZE to MAXSIZE (each TLB entry can be sized separately).
44:47
50
45
Hardware Entry Select (HES) [MAV=2.0] Indicates whether the TLB array supports MAS0HES and the associated method for hardware selecting a TLB entry based on MAS1TID TSIZE and MAS2EPN for a tlbwe instruction.
952
Table 9: Relationship of TLBnPS PS bits and LRATPS PS bits to page size TLBnPS or LRATPS bit 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 PSm PS31 PS30 PS29 PS28 PS27 PS26 PS25 PS24 PS23 PS22 PS21 PS20 PS19 PS18 PS17 PS16 PS15 PS14 PS13 PS12 PS11 PS10 PS9 PS8 PS7 PS6 PS5 PS4 PS3 PS2 PS1 PS0 Page Size 2TB 1TB 512GB 256GB 128GB 64GB 32GB 16GB 8GB 4GB 2GB 1GB 512MB 256MB 128MB 64MB 32MB 16MB 8mb 4MB 2MB 1MB 512KB 256KB 128KB 64KB 32KB 16KB 8KB 4KB 2KB 1KB
52:63
Figure 31. TLB n Page Size Register The TLBnPS fields are described below. Bit 32:63 Description Page Size 31 - Page Size 0 (PS31-PS0) PSm indicates whether a direct TLB entry page size of 2m KB is supported by the TLB array. PSm corresponds to bit TLBnPS63-m for m = 0 to 31. 0 1 Direct TLB entry page size of KB is not supported. Direct TLB entry page size of 2m KB is supported. 2m
Table 9 shows the relationship between the Page Size (PSm) bits in TLBnPS and page size. The existence and type of mechanism for configuring the use of a subset of supported page sizes is implementation-dependent.
953
Bit 32:39
Description Associativity (ASSOC) Total number of entries in the LRAT array which can be used for translating addresses with a given LPN. This number is referred to as the associativity level of the LRAT array. A value equal to NENTRY or 0 indicates the array is fully-associative. Logical Address Size (LASIZE) Number of bits in a logical address supported by the implementation. LPID Supported (LPID) Indicates whether the LPID field in the LRAT is supported. 0 1 The LPID field in the LRAT is not supported. The LPID field in the LRAT is supported.
PS1
44
SPS1
49
PS0
54
SPS0
59 63
32 34
Figure 32. Embedded Page Table Configuration Register The EPTCFG fields are described below. Bit 34:38 Description Page Size 2 (PS2) PS2 indicates whether an indirect TLB entry with a page size of 2PS2 KB combined with the sub-page size specified by SPS2 is supported. Sub-Page Size 2 (SPS2) SPS2 indicates whether an indirect TLB entry with a sub-page size of 2SPS2 KB combined with the page size specified by PS2 is supported. Page Size 1 (PS1) PS1 indicates whether an indirect TLB entry with a page size of 2PS1 KB combined with the sub-page size specified by SPS1 is supported. Sub-Page Size1 (SPS1) SPS1 indicates whether an indirect TLB entry with a sub-page size of 2SPS1 KB combined with the page size specified by PS1 is supported. Page Size 0 (PS0) PS0 indicates whether an indirect TLB entry with a page size of 2PS0 KB combined with the sub-page size specified by SPS0 is supported. Sub-Page Size 0 (SPS0) SPS0 indicates whether an indirect TLB entry with a sub-page size of 2SPS0 KB combined with the page size specified by PS0 is supported. 50 40:46
39:43
52:63
Number of Entries (NENTRY) Number of entries in LRAT array. At least one entry is supported.
44:48
49:53
54:58
59:63
The LRATPS fields are described below. Bit 32:63 Description Page Size 31 - Page Size 0 (PS31-PS0) PSm indicates whether a logical page size of 2m KB is supported by the LRAT array. PSm corresponds to bit LRATPS64-m for m = 0 to 31. 0 Logical page size of 2m KB is not supported. Logical page size of 2m KB is supported.
LASIZE
40
///
47
LPID
NENTRY
63
50 51 52
Figure 33. LRAT Configuration Register The LRATCFG fields are described below.
954
32
41
45
49
53
57 58 59 61 62 63
Figure 35. MMU Control and Status Register 0 [MAV=1.0] /// TLB2_FI TLB3_FI // /
32
57 58 59 61 62 63
Figure 36. MMU Control and Status Register 0 [MAV=2.0] The MMUCSR0 fields are described below. Bit 41:56 Description TLBn Array Page Size [MAV=1.0] A 4-bit field specifies the page size for TLBn array. Page size encoding is defined in Section 6.7.2. If the value of TLBn_PS is not and between TLBnCFGMINSIZE TLBnCFGMAXSIZE, the page size is set to TLBnCFGMINSIZE. A TLBn_PS field is implemented only for a TLB array that can be programmed to support only one of several fixed page sizes. For each TLB array n (for 0 n < MMUCFGNTLBS), this field is implemented only if the following are all true. TLBnCFGAVAIL = 0. TLBnCFGMINSIZE TLBnCFGMAXSIZE. 57 58 61 62
TLB2 Invalidate All (TLB2_FI) TLB invalidate all bit for the TLB2 array. TLB3 Invalidate All (TLB3_FI) TLB invalidate all bit for the TLB3 array. TLB0 Invalidate All (TLB0_FI) TLB invalidate all bit for the TLB0 array. TLB1 Invalidate All (TLB1_FI) TLB invalidate all bit for the TLB1 array.
955
1
36 48 49 50 52 63
32 33 34
Figure 37. MAS0 register The MAS0 fields are described below. Bit 32 Description Array Type Select (ATSEL) [Category: Embedded.Hypervisor.LRAT] Selects LRAT or TLB for access for tlbwe and tlbre. In guest state, MAS0ATSEL is treated as if it were zero such that a TLB array is always selected. 0 1 34:35 TLB LRAT 50:51
TLB Select (TLBSEL) If ATSEL=0 or MSRGS=1, selects TLB for access. If ATSEL=1, TLBSEL is treated as reserved. 00 01 10 11 TLB0 TLB1 TLB2 TLB3
Write Qualifier (WQ) [MAV=2.0 and Category: Embedded.TLB Write Conditional] Qualifies the TLB write operation performed by tlbwe if a TLB entry is to be written (MAS0ATSEL=0 or MSRGS=1). If an LRAT entry is to be written (MAS0ATSEL=1 and MSRGS=0) by a tlbwe and the Embedded.TLB Write Conditional category is supported, WQ must be 0b00. Otherwise the result is undefined. This field has no effect on other TLB Management instructions. Whether an implementation supports this field is indicated by MMUCFGTWC. If MMUCFGTWC = 0, WQ is ignored and treated as 0b00 for tlbwe. 00 The selected TLB entry is written regardless of the TLB-reservation. The TLB-reservation is cleared. 01 The selected TLB entry is written if and only if the TLB reservation exists. A tlbwe with this value is called a TLB Write Conditional. The TLB-reservation is cleared. See Section 6.11.4.2.1, TLB Write Conditional [Embedded.TLB Write Conditional]. 10 The TLB-reservation is cleared; no TLB entry is written. 11 Reserved
36:47
Entry Select (ESEL) Identifies an entry in the selected array to be used for tlbwe and tlbre. Valid values for ESEL are from 0 to TLBnCFGASSOC - 1 for a TLB array and 0 to LRATCFGASSOC - 1 for the LRAT. That is, ESEL selects the entry in the selected array from the set of entries which can be used for translating addresses with the EPN (if TLBnCFGHES=0 and either MAS0ATSEL=0 or MSRGS=1), the combination of EPN, SIZE, and PID (for tlbwe if TLBnCFMAS0HES=0, and either GHES=1, MAS0ATSEL=0 or MSRGS=1, and for tlbre if TLBnCFGHES=1, and either MAS0ATSEL=0 or MSRGS=1), or the LPN (if MAS0ATSEL=1 and MSRGS=0) specified by MAS2EPN. For fullyassociative TLB or LRAT arrays, ESEL ranges from 0 to TLBnCFGNENTRY - 1 or 0 to LRATCFGNENTRY - 1, respectively. Hardware Entry Select (HES) [MAV=2.0] Determines how the TLB entry within the selected TLB array is selected by tlbwe if a TLB entry is to be written (MAS0ATSEL=0 or MSRGS=1). If an LRAT entry is to be written (MAS0ATSEL=1 and MSRGS=0) by a tlbwe, HES must be 0. Otherwise the result is undefined. This field has no effect on other TLB Management instructions. Whether an implementation supports this bit for a TLB array is
52:63
49
Next Victim (NV) NV is a hint to software to identify the next victim to be targeted for a TLB miss replacement operation for those TLBs that support the NV function. If the TLB selected by MAS0TLBSEL does not support the NV function, this field is undefined. The method of determining the next victim is implementation-dependent. NV is updated on tlbsx hit and miss cases as shown in Table 11 on page 963, on execution of tlbre if the TLB array being accessed supports the NV field, and on TLB Error interrupts if the Embedded.Hypervisor category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state. When NV is updated by a supported TLB array, the NV field will always present a value that can be used in the MAS0ESEL field. The LRAT array does not support Next Victim.
956
52:56
Translation Size (TSIZE) [MAV=2.0] See the TLB SIZE field definition in Section 6.7.1 and the LRAT LSIZE field definition in Section 6.9.
///
TS TSIZE
51 52 56
///
63
32 33 34
///
63
32 33 34
48 50 51 52
///
W I MG
Figure 39. MAS1 register [MAV=2.0] The MAS1 fields are described below. Bit 32 Definition Valid Bit (V) See the corresponding TLB bit definition in Section 6.7.1 and the corresponding LRAT bit definition in Section 6.9. Invalidate Protect (IPROT) See the corresponding TLB bit definition in Section 6.7.1. Translation Identity (TID) See the corresponding TLB field definition in Section 6.7.1. Indirect (IND) [MAV=2.0 and Category: Embedded.Page Table] See the corresponding TLB bit definition in Section 6.7.1. Translation Space (TS) See the corresponding TLB bit definition in Section 6.7.1. Translation Size (TSIZE) [MAV=1.0] See the TLB SIZE field definition in Section 6.7.1.
52 57 58 59 60 61 62
MAS2U EPN
0
W I MG
33
54 57 58 59 60 61 62
Figure 41. MAS2 register [MAV = 2.0] The MAS2 fields are described below. Bit 0:31 Description MAS2 Upper (MAS2U) MAS2U is an SPR that corresponds to EPN0:31 of MAS2. Effective Page Number (EPN) [MAV=1.0] See the corresponding TLB bit definition in Section 6.7.1. Bits that correspond to an offset within the smallest virtual page implemented need not be implemented. Unimplemented EPN bits are treated as 0s. Effective Page Number (EPN) [MAV=2.0 See the corresponding TLB bit definition in Section 6.7.1 and the LRAT LPN field defini-
34:47
50
0:51
51
52:55
0:53
957
RPNL
32
//
52 54
Figure 42. MAS3 register for MAS1IND=0 [MAV=1.0] UX SX UW SW UR SR UND SPSIZE RPNL
32
54
Figure 43. MAS3 register for MAS1IND=0 [MAV=2.0] For MAS1IND = 1, the layout of the MAS3 register is shown in Figure 44. RPNL
32
54
60
Figure 44. MAS3 register for MAS1IND=1 [MAV=2.0 and Category: E.PT] The MAS3 fields are described below. Bit 32:51 Description Real Page Number (bits 32:51) (RPNL or RPN32:51) [MAV=1.0] The real page number is formed by the upper n bits of (MAS7RPNU || MAS3RPNL), where n = 64 - log2(page size in bytes) and page size is specified by MAS1TSIZE for a tlbwe instruction and by the SIZE field of the TLB entry if a TLB entry is being read by a tlbre or tlbsx instruction. RPN0:31 are accessed through MAS7. RPNL bits corresponding to bits that are not implemented in the RPN field of the TLB are treated as reserved. Real Page Number (bits 32:53) (RPNL or RPN32:53) [MAV=2.0] The real page number is formed by the upper n bits of (MAS7RPNU || MAS3RPNL), where n = 64 - log2(page size in bytes) and page size is
61
62
63
958
U0:U3
U0:U3
U0:U3
//
TLBSELD
///
32 34 36
52
56 57 58 59 60 61 62 63
TSIZED
If MAS1IND = 0, MAS358:63 are defined as follows: 58 User State Execute Enable (UX) See the corresponding TLB bit definition in Section 6.7.1. Supervisor State Execute Enable (SX) See the corresponding TLB bit definition in Section 6.7.1. User State Write Enable (UW) See the corresponding TLB bit definition in Section 6.7.1. Supervisor State Write Enable (SW) See the corresponding TLB bit definition in Section 6.7.1. User State Read Enable (UR) See the corresponding TLB bit definition in Section 6.7.1. Supervisor State Read Enable (SR) See the corresponding TLB bit definition in Section 6.7.1. 57
32 34 36
48 49 52
Figure 46. MAS4 register [MAV=2.0] The MAS4 fields are described below. Bit 34:35 Description TLBSEL Default Value (TLBSELD) Specifies the default value loaded MAS0TLBSEL on the interrupt. in
59
60
48
61
IND Default Value (INDD) Specifies the default value loaded in MAS1IND and MAS6SIND on the interrupt. Default TSIZE Value (TSIZED) [MAV=1.0] Specifies the default value loaded into MAS1TSIZE on a TLB miss exception. Default TSIZE Value (TSIZED) [MAV=2.0] Specifies the default value loaded into MAS1TSIZE on a TLB miss exception. If MMUCFGTWC = 1, TSIZED is also the default value loaded into MAS6ISIZE on the interrupt. Default ACM Value (ACMD) Specifies the default value loaded into MAS2ACM on the interrupt. Default VLE Value (VLED) [Category: VLE] Specifies the default value loaded into MAS2VLE on the interrupt. Default W Value (WD) Specifies the default value loaded into MAS2W on the interrupt. Default I Value (ID) Specifies the default value loaded into MAS2I on the interrupt. Default M Value (MD) Specifies the default value loaded into MAS2M on the interrupt. Default G Value (GD) Specifies the default value loaded into MAS2G on the interrupt.
52:55
62
52:56
63
If MAS1IND = 1, MAS358:63 are defined as follows: 58:62 Sub-Page Size (SPSIZE) See the corresponding TLB field definition in Section 6.7.1. Undefined) (UND) The value of this bit is undefined after a tlbre or tlbsx.
58
63
59
60
61
62
ACMD VLED WD ID MD GD ED
INDD
//
TLBSELD
///
//
959
SPID
48
///
63
SPID
///
48 52
ISIZE
57
///
62 63
Figure 49. MAS6 register [MAV = 2.0] The MAS6 fields are described below. Bit 34:47 Description Search PID (SPID) Specifies the value of PID used when searching the TLB during execution of tlbsx. It also defines the PID of the TLB entry to be invalidated by tlbilx with T=1 or T=3 and tlbivax with EA61=0. The number of bits implemented is the same as the number of bits implemented in the PID register. Programming Note The SPID field was referred to as SPID0 in previous versions of the architecture. 52:56 Invalidation Size (ISIZE) [MAV=2.0] ISIZE defines the size of the virtual address space mapped by the TLB entry to be invalidated by tlbilx T=3 and tlbivax. This field is only supported if MMUCFGTWC = 1 or, for any TLB array, TLBnCFGHES = 1. Otherwise this field is reserved. Programming Note To make code more portable across implementations, software should always set ISIZE before executing tlbilx T=3 and tlbivax. 62 Indirect Value for Searches (SIND) [MAV=2.0 and Category: Embedded.Page Table] Specifies the value of IND used when searching the TLB during execution of tlbsx. It also defines the Indirect (IND) value of the TLB entry to be invalidated by tlbilx T=3 and tlbivax. Address Space Value for Searches (SAS) Specifies the value of AS used when searching the TLB during execution of tlbsx. It also defines the TS value of the TLB entry to be invalidated by tlbilx T=3 and tlbivax.
SLPID
63
32 33
Figure 47. MAS5 register The MAS5 fields are described below. Bit 32 Description Search GS (SGS) Specifies the GS value used when searching the TLB during execution of tlbsrx. <E.TWC> and tlbsx and for selecting TLB entries to be invalidated by tlbilx or tlbivax. The SGS field is compared with the TGS field of each TLB entry to find a matching entry. Search Logical Partition ID (SLPID) Specifies the LPID value used when searching the TLB during execution of tlbsrx. <E.TWC> and tlbsx and for selecting TLB entries to be invalidated by tlbilx or tlbivax. The SLPID field is compared with the TLPID field of each TLB entry to find a matching entry. Only the least significant MMUCFGLPIDSIZE bits of SLPID are implemented.
52:63
All other fields are reserved. Programming Note Hypervisor software should generally treat MAS5 as part of the partition state.
960
SIND SAS
SAS
33
Translation Virtualization Fault (VF) See the corresponding TLB bit definition in Section 6.7.1. Translation Logical Process ID (TLPID) See the corresponding TLB bit definition in Section 6.7.1 and the LRAT LPID field definition in Section 6.9.
52:63
All other fields are reserved. Programming Note Hypervisor software should generally treat MAS8 as part of the partition state. After executing tlbsx and tlbre, hypervisor software may need to restore MAS8 before returning to guest state. This is especially important if the Embedded.Hypervisor.LRAT category is supported because a guest can execute a tlbwe instruction that writes a TLB entry with the MAS8 values. Programming Note For a TLB entry with VF=1, hypervisor software should have the execution permission bits set so that an instruction fetch of the page is prevented. The VF bit can be used to force a Data Storage interrupt for virtualization of MMIO.
Figure 50. MAS7 register The MAS7 fields are described below. Bit 32:63 Description Real Page Number (bits 0:31) (RPNU or RPN0:31) RPN32:53 are accessed through MAS3. RPNU bits corresponding to bits that are not implemented in the RPN field of the TLB are treated as reserved.
TLPID
63
32 33 34
Figure 51. MAS8 register The MAS8 fields are described below. Bit 32 Description Translation Guest State (TGS) See the corresponding TLB bit definition in Section 6.7.1.
1 This register is a hypervisor resource, and can be accessed by one of these instructions only in hypervisor state (see Chapter 2). 2 See Section 1.3.5 of Book I. If multiple categories are listed, the register pair is only provided if all categories are supported. Otherwise the SPR number is treated as reserved.
961
962
Table 11:MAS Register Update Summary for TLB operations Value Loaded on Event Data TLB Error InterData TLB Data TLB rupt on Load Error InterError Interor Store that rupt on Exter- rupt on Exteris not catenal Process nal Process gory E.PD or ID Load ID Store Instruction <E.PD>2 <E.PD>2 TLB Error Interrupt2 0 0 0
tlbsx hit
tlbsx miss
tlbre
0 MAS4TLBSELD if TLB array [MAS4TLBSEL D] supports next victim then hardware hint, else undefined TLBnCFGHES for array specified by MAS4TLBSELD 0b01
MAS4TLBSELD MAS4TLBSELD MAS4TLBSELD TLB array that hit if TLB array if TLB array if TLB array [MAS4TLBSEL [MAS4TLBSEL [MAS4TLBSEL D] supports D] supports D] supports next victim next victim next victim then hardware then hardware then hardware hint, hint, hint, else undeelse undeelse undefined fined fined TLBnCFGHES TLBnCFGHES TLBnCFGHES for array speci- for array speci- for array specified by fied by fied by MAS4TLBSELD MAS4TLBSELD MAS4TLBSELD 0b01 0b01 0b01 index of entry that hit
MAS0HES
0b01
if TLB array if TLB array if TLB array if TLB array if TLB array if TLB array with the [MAS4TLBSEL [MAS0TLBSEL] [MAS4TLBSEL [MAS4TLBSEL [MAS4TLBSEL matching supports next D] supports D] supports D] supports D] supports next victim next victim next victim entry supnext victim victim then then next then next then next ports next victhen next hardware hint, hardware hint, hardware hint, hardware hint, tim then hardware hint, else undeelse undeelse undeelse undehardware hint, else undefined fined fined fined else undefined fined 1 0 PID MAS4INDD MSRIS or MSRDS MAS4TSIZED EA0:53
1
MAS1V MAS1IPROT MAS1TID MAS1IND <E.PT> MAS1TS MAS1TSIZE MAS2EPN MAS2ACM MAS2VLE <VLE> MAS2W I M G E MAS3RPNL
1 TLBIPROT TLBTID TLBIND TLBTS TLBSIZE TLBEPN TLBACM TLBVLE TLBW I M G E TLBRPN[32:53]
TLBV TLBIPROT TLBTID TLBIND TLBTS TLBSIZE TLBEPN TLBACM TLBVLE TLBW I M G E TLBRPN[32:53]
MAS4ACMD MAS4VLED
GD ED
963
Table 11:MAS Register Update Summary for TLB operations Value Loaded on Event Data TLB Error InterData TLB Data TLB rupt on Load Error InterError Interor Store that rupt on Exter- rupt on Exteris not catenal Process nal Process gory E.PD or ID Load ID Store Instruction <E.PD>2 <E.PD>2 TLB Error Interrupt2 0 0 0 0 0 0
tlbsx hit
tlbsx miss
tlbre
MAS3U0:U3 MAS3UX SX UW
SW UR SR
0 0
MAS3SPSIZE
(see the entry (see the entry (see the entry for MAS3UX SX for MAS3UX SX for MAS3UX SX UW SW UR SR) UW SW UR SR) UW SW UR SR)
if Category: (see the entry if Category: E.PT supfor MAS3UX SX E.PT supported and ported and UW SW UR SR) TLBIND TLBIND = 1 then TLB= 1 then TLBSPSIZE SPSIZE
MAS3UND
(see the entry (see the entry (see the entry for MAS3UX SX for MAS3UX SX for MAS3UX SX UW SW UR SR) UW SW UR SR) UW SW UR SR)
MAS5 <E.HV> & MAS4 MAS6SPID MAS6ISIZE if TLBnCFGHES=1 or <E.TWC> MAS6SAS MAS6SIND <E.PT> MAS7RPNU
TLPID
3 PID MAS4TSIZED
3 EPLCEPID MAS4TSIZED
3 EPSCEPID MAS4TSIZED
EPLCEAS MAS4INDD 0
3
EPSCEAS MAS4INDD 0
3
TLBRPN[0:31] TLBTGS VF
TLPID
TLBRPN[0:31] TLBTGS VF
TLPID
MAS8TGS VF <E.HV>
1. If MSRCM=0 (32-bit mode) at the time of the exception, EPN0:31 are set to 0. 2. If the E.HV category is not supported, MAS Register updates are enabled for interrupts directed to the hypervisor (EPCRDMIUH = 0), or the interrupt is directed to the guest state. 3. MAS5 and MAS8 are not updated on a Data or Instruction TLB Error interrupt. The hypervisor should ensure they already contain values appropriate to the partition.
964
X-form
mbar. On other implementations this instruction is treated as a Load (see the section cited above). If a thread holds a reservation and some other thread executes a dcbi that specifies a location in the same reservation granule, the reservation may be lost only if the dcbi is treated as a Store. dcbi may cause a cache locking exception, the details of which are implementation-dependent. This instruction is privileged. Special Registers Altered: None
RA,RB ///
11
RA
16
RB
21
470
/
31
if RA=0 then b 0 else b (RA) EA b + (RB) InvalidateDataCacheBlock( EA ) Let the effective address (EA) be the sum (RA|0)+(RB). If the block containing the byte addressed by EA is in storage that is Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of any thread, then the block is invalidated in those data caches. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in any such data cache, those locations are written to main storage and additional locations in the block may be written to main storage. If the block containing the byte addressed by EA is in storage that is not Memory Coherence Required and a block containing the byte addressed by EA is in the data cache of this thread, then the block is invalidated in that data cache. On some implementations, before the block is invalidated, if any locations in the block are considered to be modified in that data cache, those locations are written to main storage and additional locations in the block may be written to main storage. The function of this instruction is independent of whether the block containing the byte addressed by EA is in storage that is Write Through Required or Caching Inhibited. This instruction is treated as a Store (see Section 6.7.6.5) on implementations that invalidate a block without first writing to main storage all locations in the block that are considered to be modified in the data cache, except that the invalidation is not ordered by
965
966
[Category:Embedded.Hypervisor] The behavior of cache locking instructions in guest privileged state (dcbtls, dcbtstls, dcblc, icbtls, icblc) is dependent on the setting of MSRPUCLEP. When MSRPUCLEP = 0, cache locking instructions are permitted to execute normally in the guest privileged state. When MSRPUCLEP = 1, cache locking instructions are not permitted to execute in the guest privileged state and cause an Embedded Hypervisor Privilege exception when executed. See Section 4.2.2, Machine State Register Protect Register (MSRP).
6.11.2.2.1 Overlocking
If no exceptions occur for the execution of an dcbtls, dcbtstls, or icbtls instruction, an attempt is made to lock the specified block into the cache. If all of the available cache blocks into which the specified block may be loaded are already locked, an overlocking condition occurs. The overlocking condition may be reported in an implementation-dependent manner. If an overlocking condition occurs, it is implementationdependent whether the specified block is not locked into the cache or if another locked block is evicted and the specified block is locked. The selection of which block is replaced in an overlocking situation is implementation-dependent. The overlocking condition is still said to exist, and is reflected in any implementation-dependent overlocking status. An attempt to lock a block that is already present and valid in the cache will not cause an overlocking condition. If a cache block is to be loaded because of an instruction other than a Cache Management or Cache Locking instruction and all available blocks into which the block can be loaded are locked, the instruction executes and completes, but no cache blocks are unlocked and the block is not loaded into the cache. Programming Note Since caches may be shared among threads, an overlocking condition may occur when loading a block even though a given thread has not locked all the available cache blocks. Similarly. blocks may be unlocked as a result of invalidations by other threads.
967
Data Cache Block Touch for Store and Lock Set X-form
dcbtstls CT,RA,RB /
6 7
CT,RA,RB /
6 7
CT
11
RA
16
RB
21
166
/
31 0
31
CT
11
RA
16
RB
21
134
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbtls instruction provides a hint that the program will probably soon load from the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 4.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed. If the block already exists in the cache, the block is locked without accessing storage. An unable-to-lock condition may occur (see Section 6.11.2.2.2), or an overlocking condition may occur (see Section 6.11.2.2.1). The dcbtls instruction may complete before the operation it causes has been performed. The instruction is treated as a Load. This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0. Special Registers Altered: None
Let the effective address (EA) be the sum (RA|0)+(RB). The dcbtstls instruction provides a hint that the program will probably soon store to the block containing the byte addressed by EA, and that the block containing the byte addressed by EA is to be loaded and locked into the cache specified by the CT field. (See Section 4.3 of Book II.) If the CT field is set to a value not supported by the implementation, no operation is performed. If the block already exists in the cache, the block is locked without accessing storage. An unable-to-lock condition may occur (see Section 6.11.2.2.2), or an overlocking condition may occur (see Section 6.11.2.2.1). The dcbtstls instruction may complete before the operation it causes has been performed. The instruction is treated as a Store. This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0. Special Registers Altered: None
968
CT,RA,RB /
6 7
CT
11
RA
16
RB
21
486
/
31 0
31
CT
11
RA
16
RB
21
230
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). The icbtls instruction causes the block containing the byte addressed by EA to be loaded and locked into the instruction cache specified by CT, and provides a hint that the program will probably soon execute code from the block. See Section 4.3 of Book II for a definition of the CT field. If the block already exists in the cache, the block is locked without refetching from memory. This instruction treated as a Load (see Section 4.3 of Book II). If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed. This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0. Special Registers Altered: None
Let the effective address (EA) be the sum (RA|0)+(RB). The block containing the byte addressed by EA in the instruction cache specified by the CT field is unlocked. The instruction is treated as a Load. If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed. If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed. This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0. Special Registers Altered: None
X-form
CT,RA,RB /
6 7
CT
11
RA
16
RB
21
390
/
31
Let the effective address (EA) be the sum (RA|0)+(RB). The block containing the byte addressed by EA in the data cache specified by the CT field is unlocked. The instruction is treated as a Load. If an unable-to-lock condition occurs (see Section 6.11.2.2.2) no operation is performed. If the block containing the byte addressed by EA is not locked in the specified cache, no cache operation is performed. This instruction is privileged unless the Embedded Cache Locking.User Mode category is supported. If the Embedded Cache Locking.User Mode category is supported, this instruction is privileged only if MSRUCLE=0. Special Registers Altered: None Programming Note The dcblc and icblc instructions are used to remove locks previously set by the corresponding lock set instructions.
969
Programming Note If the Embedded.Hypervisor category is supported and if EPCRITLBGS DTLBGS = 0b00, the hypervisor can virtualize the physical TLB by keeping a software copy of at least the guest operating system TLB entries with IPROT=1 and avoid keeping the guest Instruction and Data TLB Error interrupt handlers in the physical TLB.
970
971
972
Programming Note A common operation is to ensure that the TLB-reservation has been set by a tlbsrx. <E.TWC> instruction before executing a subsequent Load instruction of a software page table entry in order to ensure the TLB-reservation detects an invalidation of the entry that was accessed. Beside using a context synchronizing instruction, software can also ensure the TLB-reservation has been set by a tlbsrx. <E.TWC> instruction by reading the CR0 field or CR with a mfocrf or mfcr instruction after the tlbsrx. <E.TWC> and creating a dependency between the data read from CR0 or CR and the address used for the subsequent Load instruction. Serialization of TLB operations Regardless of which threads initiated the operations, all operations (reads, writes, invalidates, and searches) involving a single TLB are defined to be serialized such that only one operation occurs at a time. This operation is consistent with the program order of the thread performing the TLB operation. This also applies to a TLB that is shared by multiple threads. Even if there is no matching TLB entry on a tlbivax, the TLB is still searched to determine there is no matching entry and this search is still referred to as the TLB invalidation. If two threads share a TLB and both simultaneously execute a tlbsrx. <E.TWC> instruction for a virtual address in a virtual page V, and then both threads execute a TLB Write Conditional to create a TLB entry for the virtual page V, at most one of these tlbwe instructions succeeds. If, after thread P1 establishes a TLB-reservation for a virtual address in a virtual page V, another thread P2 executes a tlbivax that invalidates a TLB entry for the virtual page V and thread P1 does a TLB Write Conditional to create a TLB entry for the virtual page V, then one of the following occurs. The TLB invalidation occurs before the TLB write. Thus the TLB-reservation is lost and the TLB Write Conditional does not succeed. The TLB write occurs before the TLB invalidation. Thus the TLB Write Conditional succeeds and the resulting TLB entry created by the tlbwe is invalidated by the tlbivax. Forward progress Forward progress in loops that use tlbsrx. <E.TWC> and tlbwe with MAS0WQ = 0b01 is achieved by a cooperative effort among hardware and system software. The architecture guarantees that when a thread executes a tlbsrx. <E.TWC> to set a TLB-reservation for virtual address X and then a TLB Write Conditional to write a TLB entry, either
Synchronization of TLB-reservation The side-effect of a tlbsrx. <E.TWC> instruction setting the TLB-reservation can be synchronized by a context synchronizing instruction or event.
973
974
Programming Note The preferred method of invalidating entire TLB arrays is invalidation using MMUCSR0, however tlbilx may be more efficient.
975
Programming Note If TLBnCFGHES=1 for a TLB array and it is important that the lookaside information corresponding to a TLB entry be invalidated, software should use tlbilx or tlbivax to invalidate the virtual address.
976
977
RA,RB ///
11
RA
16
RB
21
786
/
31
if RA = 0 then b 0 (RA) else b EA b + (RB) for each thread if MAV = 1.0 then for TLB array = EA59:60 if MAV = 2.0 then for each TLB array for each TLB entry if MMUCFGTWC = 1 or TLBnCFGHES = 1 then c MAS6ISIZE else entrySIZE c if MAV = 1.0 then m ((1 << (2 c)) - 1) else m ((1 << c) - 1) if ((EA0:53 & m) = (entryEPN & m)) & entrySIZE = MAS6ISIZE & entryTID = MAS6SPID & entryTS = MAS6SAS & (E.PT not supported | entryIND = MAS6SIND) & (E.HV not supported | (entryTLPID = MAS5SLPID & entryTGS = MAS5SGS)) | ((MAV = 1.0) & (EA61 = 1)) then 0 if entryIPROT = 0 then entryV Let the effective address (EA) be the sum (RA|0)+ (RB). The EA is interpreted as show below. EA0:53 EA0:53
The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is based on the following. If MMUCFGTWC = 1 or TLBnCFGHES = 1, c is equal MAS6ISIZE. Otherwise, c is equal to entrySIZE. If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 c)) - 1). Otherwise, m is equal to the logical NOT of ((1 << c) - 1). The TID value of the TLB entry is equal to MAS6SPID and the TS value of the TLB entry is equal to MAS6SAS. The implementation does not support the Embedded.Page Table category or the IND value of the TLB entry is equal to MAS6SIND. Either of the following is true: The implementation does not support the Embedded.Hypervisor category. The TLPID value of the TLB entry is equal to MAS5SLPID and the TGS value of the TLB entry is equal to MAS5SGS. entryIPROT = 0. In MMU Architecture Version 1.0 if EA61=1, all entries in all threads not protected by the IPROT attribute in the TLB array targeted by EA59:60 are made invalid. If the instruction specifies a TLB array that does not exist, the instruction is treated as if the instruction form is invalid. If the implementation requires the page size to be specified by MAS6ISIZE (MMUCFGTWC = 1 or, for the specified TLB array, TLBnCFGHES = 1) and the page size specified by MAS6ISIZE is not supported by the implementation, the instruction is treated as if the instruction form is invalid. If the operation isnt a TLB invalidate all and there are multiple entries in a single threads TLB array(s) that match the complete VPN, then zero or more matching entries with IPROT=0 are invalidated or a Machine Check interrupt occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise. The operation performed by this instruction is ordered by the mbar (or sync) instruction with respect to a subsequent tlbsync instruction executed by the thread executing the tlbivax instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations which is independent of the other sets that mbar orders. The effects of the invalidation on this thread are not guaranteed to be visible to the programming model
EA54:58 Reserved EA59:60 TLB array selector [MAV = 1.0] 00 01 10 11 EA61 TLB0 TLB1 TLB2 TLB3
EA62:63 Reserved If EA61=0, all TLB entries on all threads that have all of the following properties are made invalid. The MAS registers listed are those in the thread executing the tlbivax. The MMU architecture version is 2.0 or the entry is in the TLB array targeted by EA59:60.
978
979
T
11
RA
16
RB
21
18
/
31
if RA = 0 then b 0 (RA) else b b + (RB) EA for each TLB array for each TLB entry if MMUCFGTWC = 1 or TLBnCFGHES = 1 then c MAS6ISIZE else entrySIZE c if MAV = 1.0 then m ((1 << (2 c)) - 1) else m ((1 << c) - 1) if (entryIPROT = 0) & (entryTLPID = MAS5SLPID) then 0 if T = 0 then entryV if T = 1 & entryTID = MAS6SPID then entryV 0 if T = 3 & entryTGS = MAS5SGS & ((EA0:53 & m) = (entryEPN & m)) & entrySIZE = MAS6ISIZE & entryTID = MAS6SPID & entryTS = MAS6SAS & (E.PT not supported | entryIND = MAS6SIND) then entryV 0 Let the effective address (EA) be the sum (RA|0) + (RB). The tlbilx instruction invalidates TLB entries in the thread that executes the tlbilx instruction. TLB entries which are protected by the IPROT attribute (entryIPROT = 1) are not invalidated. If T = 0, all TLB entries that have all of the following properties are made invalid on the thread executing the tlbilx instruction. The TLPID of the entry matches MAS5SLPID. The IPROT of entry is 0. If T = 1, all TLB entries that have all of the following properties are made invalid on the thread executing the tlbilx instruction. The TLPID of the entry matches MAS5SLPID. The TID of the entry matches MAS6SPID. The IPROT of entry is 0. If T = 3, all TLB entries in the thread executing the tlbilx instruction that have all of the following properties are made invalid. The TLPID value of the TLB entry is equal to MAS5SLPID and the TGS value of the TLB entry is equal to MAS5SGS. The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is based on the following.
If MMUCFGTWC = 1 or TLBnCFGHES = 1, c is equal MAS6ISIZE. Otherwise, c is equal to entrySIZE. If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 c)) - 1). Otherwise, m is equal to the logical NOT of ((1 << c) - 1). The TID value of the TLB entry is equal to MAS6SPID and the TS value of the TLB entry is equal to MAS6SAS. The implementation does not support the Embedded.Page Table category or the IND value of the TLB entry is equal to MAS6SIND. The IPROT of entry is 0. The effects of the invalidation are not guaranteed to be visible to the programming model until the completion of a context synchronizing operation. Invalidations may occur for other TLB entries on the thread executing the tlbilx instruction, but in no case will any TLB entries with the IPROT attribute set be made invalid. If T = 2, the instruction form is invalid. If T = 3 and the implementation requires the page size to be specified by MAS6ISIZE (MMUCFGTWC = 1 or, for any TLB array, TLBnCFGHES = 1) and the page size specified by MAS6ISIZE is not supported by the implementation, the instruction is treated as if the instruction form is invalid. If T=3 and there are multiple entries in the TLB array(s) that match the complete VPN, then zero or more matching entries with IPROT=0 are invalidated or a Machine Check interrupt occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise. If RA does not equal 0, it is implementation-dependent whether an Illegal Instruction exception occurs. If the Embedded.Hypervisor category is supported and guest execution of TLB Management instructions is disabled (EPCRDGTMI=1), this instruction is hypervisor privileged. Otherwise, this instruction is privileged. Special Registers Altered: None Extended Mnemonics: Examples of extended mnemonics for TLB Invalidate Local: Extended: tlbilxlpid tlbilxpid tlbilxva RA,RB tlbilxva RB Equivalent to: tlbilx 0,0,0 tlbilx 1,0,0 tlbilx 3,RA,RB tlbilx 3,0,RB
980
Programming Note tlbilx is the preferred way of performing TLB invalidations, especially for operating systems running as a guest to the hypervisor since the invalidations are partitioned and do not require hypervisor privilege. Programming Note When dispatching a guest operating system, hypervisor software should always set MAS5SLPID to the guests corresponding LPID value. Programming Note Executing a tlbilx instruction with T=0 or T=1 may take many cycles to perform. Software should only issue these operations when an LPID or a PID value is reused or taken out of use.
981
X-form
RA,RB ///
11
RA
16
RB
21
914
/
31
if RA = 0 then b 0 else b (RA) b + (RB) EA 0 Valid_matching_entry_exists for each TLB array for each TLB entry if MAV = 1.0 then m ((1 << (2 entrySIZE)) - 1) else m ((1 << entrySIZE) - 1) if ((EA0:53 & m) = (entryEPN & m)) & (entryTID = MAS6SPID | entryTID = 0) & entryTS = MAS6SAS & (E.PT not supported | entryIND = MAS6SIND) & (E.HV not supported | (entryTGS = MAS5SGS & (entryTLPID = MAS5SLPID | entryTLPID = 0))) then Valid_matching_entry_exists 1 exit for loops if Valid_matching_entry_exists matching entry found entry TLB array number where TLB entry found array index into TLB array of TLB entry found index if TLB array supports Next Victim then hint hardware hint for Next Victim else hint undefined entryRPN rpn MAS0ATSEL 0 MAS0TLBSEL array index MAS0ESEL if MAS0HES supported MAS0HES 0 if Next Victim supported then if TLB array specified by MAS0TLBSEL supports NV then hint MAS0NV else MAS0NV undefined 1 MAS1V entryTID TS SIZE MAS1TID TS TSIZE if TLB array supports IPROT then entryIPROT MAS1IPROT else MAS1IPROT 0 if category E.PT supported then if TLB array supports indirect entries then entryIND MAS1IND if entryIND = 1 entrySPSIZE MAS3SPSIZE else entryUX SX UW SW UR SR MAS3UX SX UW SW UR SR else MAS1IND 0 entryUX SX UW SW UR SR MAS3UX SX UW SW UR SR else entryUX SX UW SW UR SR MAS3UX SX UW SW UR SR MAS2EPN W I M G E entryEPN W I M G E if category VLE supported then MAS2VLE entryVLE
if ACM supported then MAS2ACM entryACM MAS3RPNL rpn32:53 MAS3U0:U3 entryU0:U3 MAS7RPNU rpn0:31 if category E.HV supported then MAS8TGS VF TLPID entryTGS VF TLPID else 0 MAS0ATSEL MAS0TLBSEL MAS4TLBSELD if Next Victim supported then if TLB array specified by MAS4TLBSELD supports Next Victim then MAS0ESEL hint hint for next replacement MAS0NV else MAS0ESEL undefined undefined MAS0NV else MAS0ESEL undefined if MAS0HES supported MAS0HES TLBnCFGHES for the TLB array specified by MAS4TLBSELD MAS1V IPROT 0 MAS1TID TS MAS6SPID SAS MAS1TSIZE MAS4TSIZED if Embedded.Page Table category supported then MAS1IND MAS4INDD MAS2W I M G E MAS4WD ID MD GD ED if category VLE supported then MAS2VLE MAS4VLED if ACM supported, then MAS2ACM MAS4ACMD MAS2EPN undefined 0 MAS3RPNL MAS3U0:U3 UX SX UW SW UR SR 0 0 MAS7RPNU if category E.TWC supported then MAS0WQ 0b01 Let the effective address (EA) be the sum (RA|0)+ (RB). If any TLB array contains a valid entry matching the MAS1IND <E.PT> and virtual address formed by MAS5SGS <E.HV>, MAS5SLPID <E.HV>, MAS1TS TID, and EA, the search is considered successful. A TLB entry matches if all the following conditions are met. The valid bit of the TLB entry is 1. The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is determined as follows: If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2 entrySIZE)) - 1). Otherwise, m is equal to the logical NOT of ((1 << entrySIZE) - 1) The TID value of the TLB entry is equal to MAS6SPID or is zero. The TS value of the TLB entry is equal to MAS6SAS. Either the Embedded.Page Table category is not supported or the IND value of the TLB entry is equal to MAS6SIND. Either of the following is true: The implementation does not support the Embedded.Hypervisor category. The TGS value of the TLB entry is equal to MAS5SGS and either the TLPID value of the TLB entry is equal to MAS5SLPID or is zero.
982
983
///
11
RA
16
RB
21
850
1
31
if RA = 0 then b 0 else b (RA) b + (RB) EA MAS1TID pid as MAS1TS if Embedded.Page Table category supported then MAS1IND ind if category E.HV supported then MAS5SGS gs lpid MAS5SLPID va gs || lpid || as || pid || EA else as || pid || EA va TLB-RESERVE 1 if Embedded.Page Table category supported then ind || va TLB-RESERVE_IND_N_ADDR else TLB-RESERVE_ADDR va 0 Valid_matching_entry_exists for each TLB array for each TLB entry if MAV = 1.0 then m ((1 << (2 entrySIZE)) - 1) else m ((1 << entrySIZE) - 1) if ((EA0:53 & m) = (entryEPN & m)) & (entryTID = MAS1TID | entryTID = 0) & entryTS = MAS1TS & (E.PT not supported | entryIND = MAS1IND) & (E.HV not supported | (entryTGS = MAS5SGS & (entryTLPID = MAS5SLPID | entryTLPID = 0))) then Valid_matching_entry_exists 1 exit for loops if Valid_matching_entry_exists then 0b0010 CR0 else 0b0000 CR0 Let the effective address (EA) be the sum (RA|0)+ (RB). If any TLB array contains a valid entry matching the MAS1IND <E.PT> and virtual address formed by MAS5SGS <E.HV>, MAS5SLPID <E.HV>, MAS1TS TID, and EA, the search is considered successful. A TLB entry matches if all the following conditions are met. The valid bit of the TLB entry is 1. Either the Embedded.Page Table category is not supported or the IND value of the TLB entry is equal to MAS1IND. The logical AND of EA0:53 and m is equal to the logical AND of the EPN value of the TLB entry and m, where m is determined as follows: If MMU Architecture Version 1.0 is supported, m is equal to the logical NOT of ((1 << (2
entrySIZE)) - 1). Otherwise, m is equal to the logical NOT of ((1 << entrySIZE) - 1) The TID value of the TLB entry is equal to MAS1TID or is zero. The TS value of the TLB entry is equal to MAS1TS. Either of the following is true: The implementation does not support the Embedded.Hypervisor category. The TGS value of the TLB entry is equal to MAS5SGS and either the TLPID value of the TLB entry is equal to MAS5SLPID or is zero. CR Field 0 is set as follows. n is a 1-bit value that indicates whether the search was successful. CR0LT GT EQ SO = 0b00 || n || 0 This instruction creates a TLB-reservation for use by a TLB Write instruction. The virtual address described above is associated with the TLB-reservation, and replaces any address previously associated with the TLB-reservation. (The TLB-reservation is created regardless of whether the search succeeds.) If there are multiple matching TLB entries, either one of the matching entries is used or a Machine Check exception occurs. If the Embedded.Hypervisor category is supported, this Machine Check interrupt must be precise. If RA does not equal zero, it is implementation-dependent whether an Illegal Instruction exception occurs. If the Embedded.Hypervisor category is supported and guest execution of TLB Management instructions is disabled (EPCRDGTMI=1), this instruction is hypervisor privileged. Otherwise, this instruction is privileged. Special Registers Altered: CR0
984
X-form
///
11
///
16
///
21
946
/
31
if MAS0ATSEL= 0 then if TLBnCFGHES = 0 then SelectTLB(MAS0TLBSEL,MAS0ESEL, MAS2EPN) entry else entry SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, MAS2EPN, MAS0ESEL) if Next Victim supported then if TLB array specified by MAS0TLBSEL supports NV then hint MAS0NV else MAS0NV undefined if TLB entry is found then entryRPN rpn MAS1V TID TS TSIZE entryV TID TS SIZE if TLB array supports IPROT then MAS1IPROT entryIPROT else 0 MAS1IPROT if category E.PT supported then if TLB array supports indirect entries then entryIND MAS1IND if entryIND = 1 entrySPSIZE MAS3SPSIZE else MAS3UX SX UW SW UR SR entryUX SX UW SW UR SR else 0 MAS1IND MAS3UX SX UW SW UR SR entryUX SX UW SW UR SR else entryUX SX UW SW UR SR MAS3UX SX UW SW UR SR MAS2EPN W I M G E entryEPN W I M G E if category VLE supported then MAS2VLE entryVLE if ACM supported then MAS2ACM entryACM MAS3RPNL rpn32:53 MAS3U0:U3 entryU0:U3 MAS7RPNU rpn0:31 if category E.HV supported then entryTGS VF TLPID MAS8TGS VF TLPID else MAS1V 0 undefined MAS1IPROT TID TS TSIZE if Embedded.Page Table supported then undefined MAS1IND undefined MAS2EPN W I M G E if category VLE supported then MAS2VLE undefined undefined if ACM supported then MAS2ACM undefined MAS3RPNL U0:U3 UX SX UW SW UR SR MAS7RPNU undefined if category E.HV supported then undefined MAS8TGS VF TLPID else entry SelectLRAT(MAS0ESEL, MAS2EPN) MAS0NV undefined if LRAT entry is found then entryLRPN rpn MAS1V TSIZE entryV LSIZE
MAS1IPROT TID TS 0 0 0 if Embedded.Page Table supported then 0 MAS1IND MAS2EPN entryLPN MAS2W I M G E 0 0 0 0 0 if category VLE supported then 0 MAS2VLE 0 if ACM supported then MAS2ACM MAS3RPNL rpn32:53 MAS3U0:U3 UX SX UW SW UR SR 0 0 0 0 0 0 0 rpn0:31 MAS7RPNU MAS8TGS VF 0 0 if the LRAT supports LPID entryLPID MAS8TLPID else undefined MAS8TLPID else MAS1V 0 undefined MAS1IPROT TID TS TSIZE if Embedded.Page Table supported then MAS1IND undefined undefined MAS2EPN W I M G E if category VLE supported then MAS2VLE undefined undefined if ACM supported then MAS2ACM undefined MAS3RPNL U0:U3 UX SX UW SW UR SR MAS7RPNU undefined undefined MAS8TGS VF TLPID If the Embedded.Hypervisor.LRAT category is not supported or MAS0ATSEL is 0, then the following applies. If Next Victim is supported for any TLB array, the following applies. If the TLB array specified by MAS0TLBSEL supports Next Victim, MAS0NV is set to the hardware hint for the index of the entry to be replaced. Otherwise, MAS0NV is set to an implementation-dependent undefined value. A TLB entry is selected in one of the following ways. if TLBnCFGHES=0 for the TLB array selected by MAS0TLBSEL, the TLB entry is specified by MAS0TLBSEL, MAS0ESEL and MAS2EPN. if TLBnCFGHES=1 for the TLB array selected by MAS0TLBSEL, the TLB entry is specified by MAS0TLBSEL, MAS0ESEL and a hardware generated hash based on MAS1TID, MAS1TSIZE, and MAS2EPN. If the selected TLB entry exists, MAS register fields are loaded according to the following. MAS1V TID TS TSIZE are loaded from the V, TID, TS, and SIZE fields of the TLB entry. If the TLB array supports IPROT, MAS1IPROT is loaded from the IPROT bit of the TLB entry. Otherwise, MAS1IPROT is set to 0. MAS2EPN W I M G E are loaded from the EPN, W, I, M, G, and E fields of the TLB entry. If the VLE category is supported, MAS2VLE is loaded from the VLE bit of the TLB entry. If Alternate Coherency Mode is supported, MAS2ACM is loaded from the ACM bit of the TLB entry.
985
986
X-form
X-form
///
11
///
16
///
21
566
/
31
0
31
6
///
11
///
16
///
21
978
/
31
The tlbsync instruction provides an ordering function for the effects of all tlbivax instructions executed by the thread executing the tlbsync instruction, with respect to the memory barrier created by a subsequent sync instruction executed by the same thread. Executing a tlbsync instruction ensures that all of the following will occur. All TLB invalidations caused by tlbivax instructions preceding the tlbsync instruction will have completed on any other thread before any data accesses caused by instructions following the sync instruction are performed with respect to that thread. All storage accesses by other threads for which the address was translated using the translations being invalidated will have been performed with respect to the thread executing the sync instruction, to the extent required by the associated Memory Coherence Required attributes, before the sync instructions memory barrier is created. The operation performed by this instruction is ordered by the mbar or sync instruction with respect to preceding tlbivax instructions executed by the thread executing the tlbsync instruction. The operations caused by tlbivax and tlbsync are ordered by mbar as a set of operations, which is independent of the other sets that mbar orders. The tlbsync instruction may complete before operations caused by tlbivax instructions preceding the tlbsync instruction have been performed. If the Embedded.Hypervisor category is supported, this instruction is hypervisor privileged. Otherwise, this instruction is privileged. Special Registers Altered: None Programming Note Care must be taken on some implementations when using the tlbsync instruction as there may be a system-imposed restriction of only one tlbsync allowed on the bus at a given time in the system.
if MAS0WQ = 0b00 | MAS0WQ = 0b01 then if MAS0ATSEL = 0 or MSRGS = 1 then if TLBnCFGHES = 0 then SelectTLB(MAS0TLBSEL,MAS0ESEL, MAS2EPN) entry else if MAS0HES = 1 then SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, entry MAS2EPN, hardware_replacement_algorithm) else SelectTLB(MAS0TLBSEL, MAS1TID TSIZE, entry MAS2EPN, MAS0ESEL) if TLB array specified by MAS0TLBSEL supports NV &((MAS0WQ = 0b00) | (category E.TWC supported & (MAS0WQ = 0b01) & (TLB-reservation))) then MAS0NV hint if TLB entry is found & ((MAS0WQ = 0b00) | ((category E.TWC supported) & (MAS0WQ = 0b01) & (TLB-reservation))) then if category E.HV.LRAT supported & (MSRGS=1) & (MAS1V=1) then translate_logical_to_real(MAS7RPNU rpn || MAS3RPNL, MAS8TLPID) else if MAS7 implemented then MAS7RPNU || MAS3RPNL rpn 320 || MAS3 else rpn RPNL entryV IPROT TID TS SIZE MAS1V IPROT TID TS TSIZE entryEPN VLE W I M G E ACM MAS2EPN VLE W I M G E ACM entryU0:U3 MAS3U0:U3 if category E.PT supported and TLB array supports indirect entries then entryIND MAS1IND if MAS1IND = 0 then MAS3UX SX UW SW UR SR entryUX SX UW SW UR SR else MAS3SPSIZE entrySPSIZE else entryUX SX UW SW UR SR MAS3UX SX UW SW UR SR entryRPN rpn if (category E.HV is supported) MAS8TGS VF TLPID entryTGS VF TLPID if category E.TWC supported 0 TLB-reservation else entry SelectLRAT(MAS0ESEL,MAS2EPN) if LRAT entry is found & (MAS0WQ = 0b00) & (MAS0HES = 0b0) then MAS0NV hint entryV LSIZE MAS1V TSIZE entryLPN MAS2EPN entryRPN MAS7RPNU || MAS3RPNL if LRATCFGLPID = 1 entryLPID MAS8TLPID else if category E.TWC supported TLB-reservation 0
987
988
989
990
991
7.1 Overview
An interrupt is the action in which the thread saves its old context (MSR and next instruction address) and begins execution at a pre-determined interrupt-handler address, with a modified MSR. Exceptions are the events that will, if enabled, cause the thread to take an interrupt. Exceptions are generated by signals from internal and external peripherals, instructions, the internal timer Interrupts are divided into 4 classes, as described in Section 7.4.3, such that only one interrupt of each class is reported, and when it is processed no program state is lost. Since Save/Restore register pairs GSRR0/ GSRR1 <E.HV> SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1 [Category: E.ED], and MCSSR0/ MCSSR1 are serially reusable resources used by guest <E.HV>, base, critical, debug [Category: E.ED],
Machine Check interrupts, respectively, program state may be lost when an unordered interrupt is taken. (See Section 7.8, Interrupt Ordering and Masking.
992
Programming Note A MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci. The contents of SRR1 can be read into register RT using mfspr RT,SRR1. The contents of register RS can be written into the SRR1 using mtspr SRR1,RS. This register is hypervisor privileged.
//
62 63
Figure 52. Save/Restore Register 0 In general, SRR0 contains the address of the instruction that caused the non-critical interrupt, or the address of the instruction to return to after a non-critical interrupt is serviced. The contents of SRR0 when an interrupt is taken are mode dependent, reflecting the computation mode when the interrupt is taken and the computation mode entered for execution of the interrupt (specified by EPCRICM) <E.HV>. When computation mode when the interrupt is taken is 32-bit mode and the computation mode entered for execution of the interrupt is 64-bit mode, the high-order 32 bits of SRR0 are set to 0s. When computation mode when the interrupt is taken is 64-bit mode and the computation mode entered for execution of the interrupt is 32-bit mode, the contents SRR0 are undefined. The contents of SRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into SRR0): if (MSRCM = 0) & (EPCRICM = 0) then SRR0 32undefined || Addr32:63 if (MSRCM = 0) & (EPCRICM = 1) then SRR0 320 || Addr32:63 if (MSRCM = 1)&(EPCRICM = 1) then SRR0 Addr0:63 if (MSRCM = 1)&(EPCRICM = 0) then SRR0 undefined The contents of SRR0 can be read into register RT using mfspr RT,SRR0. The contents of register RS can be written into the SRR0 using mtspr SRR0,RS. This register is hypervisor privileged.
//
62 63
Figure 54. Guest Save/Restore Register 0 In general, GSRR0 contains the address of the instruction that caused the guest interrupt, or the address of the instruction to return to after a guest interrupt is serviced. The contents of GSRR0 when an interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the computation mode entered for execution of the interrupt (specified by EPCRGICM). The contents of GSRR0 upon interrupt can be described as follows (assuming Addr is the address to be put into GSRR0): if (MSRCM = 0) & (EPCRGICM = 0) then GSRR0 32undefined || Addr32:63 if (MSRCM = 0) & (EPCRGICM = 1) then GSRR0 320 || Addr32:63 if (MSRCM = 1)&(EPCRGICM = 1) then GSRR0 Addr0:63 if (MSRCM=1)&(EPCRGICM=0) then GSRR0 undefined The contents of GSRR0 can be read into register RT using mfspr RT,GSRR0. The contents of register RS can be written into the GSRR0 using mtspr GSRR0,RS. This register is privileged. Programming Note
Figure 53. Save/Restore Register 1 Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.
mfspr RT,SRR0 should be used to read GSRR0 in guest supervisor state. mtspr SRR0,RS should be used to write GSRR0 in guest supervisor state. See Section 2.2.1, Register Mapping.
993
Figure 55. Guest Save/Restore Register 1 Bits of GSRR1 that correspond to reserved bits in the MSR are also reserved. Programming Note A MSR bit that is reserved may be inadvertently modified by rfi/rfgi/rfci/rfdi/rfmci. The contents of GSRR1 can be read into register RT using mfspr RT,GSRR1. The contents of register RS can be written into the GSRR1 using mtspr GSRR1,RS. This register is privileged. Programming Note mfspr RT,SRR1 should be used to read GSRR1 in guest supervisor state. mtspr SRR1,RS should be used to write GSRR1 in guest supervisor state. See Section 2.2.1, Register Mapping.
Figure 57. Critical Save/Restore Register 1 Bits of CSRR1 that correspond to reserved bits in the MSR are also reserved. Programming Note A MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci. The contents of CSRR1 can be read into bits 32:63 of register RT using mfspr RT,CSRR1, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS
//
62 63
994
Figure 59. Debug Save/Restore Register 1 Bits of DSRR1 that correspond to reserved bits in the Machine State Register are also reserved. The contents of DSRR1 can be read into bits 32:63 of register RT using mfspr RT,DSRR1, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the DSSR1 using mtspr DSRR1,RS. This register is hypervisor privileged.
//
62 63
Figure 58. Debug Save/Restore Register 0 In general, DSRR0 contains the address of an instruction that was executing or just finished execution when the Debug exception occurred. The contents of DSRR0 when an interrupt is taken are mode dependent, reflecting the computation mode when the interrupt is taken and the computation mode entered for execution of the interrupt (specified by EPCRICM) [Category:Embedded.Hypervisor]. If computation mode when the interrupt is taken is 32-bit mode and the computation mode entered for execution of the interrupt is 64-bit mode, the high-order 32 bits of DSRR0 are set to 0s. When computation mode when the interrupt is taken is 64-bit mode and the computation mode entered for execution of the interrupt is 32-bit mode, the contents DSRR0 are undefined. The contents of DSRR0 upon Debug interrupt can be described as follows (assuming Addr is the address to be put into DSRR0):
if (MSRCM Addr32:63 if (MSRCM if (MSRCM if (MSRCM = 0) & (EPCRICM = 0) then DSRR0 = 0) & (EPCRICM = 1) then DSRR0 = 1) & (EPCRICM = 1) then DSRR0 = 1) & (EPCRICM = 0) then DSRR0
32
undefined ||
320
The contents of DSRR0 can be read into register RT using mfspr RT,DSRR0. The contents of register RS can be written into DSRR0 using mtspr DSRR0,RS. This register is hypervisor privileged.
995
high-order 48 bits of the address of the exception processing routines. The 16-bit exception vector offsets from the appropriate IVOR (provided in Section 7.6.1, Interrupt Fixed Offsets [Category: Embedded.Phased-In]) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the exception processing routine. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, the following applies. IVPR52:63 are reserved. Bits 0:51 of the Interrupt Vector Prefix Register provide the highorder 52 bits of the address of the exception processing routines. The 12-bit exception vector offsets (provided in Section 7.6.1, Interrupt Fixed Offsets [Category: Embedded.Phased-In]) are concatenated to the right of bits 0:47 of the Interrupt Vector Prefix Register to form the 64-bit address of the exception processing routine. The contents of Interrupt Vector Prefix Register can be read into register RT using mfspr RT,IVPR. The contents of register RS can be written into Interrupt Vector Prefix Register using mtspr IVPR,RS. This register is hypervisor privileged.
996
997
40
ST
Store operation
41 42 43 44
45 46 47 48:52 53 54
PUO BO
PIE Imprecise exception Reserved DATA Data Access [Category: Embedded.Page Table] TLBI TLB Ineligible [Category: Embedded.Page Table] PT
55
56
SPV
57
EPID
Data Storage Instruction Storage LRAT Error Page Table [Category: Embedded.Page Data Storage Table] Instruction Storage LRAT Error Signal Processing operation [Category: Sig- Alignment nal Processing Engine] Data Storage Vector operation [Category: Vector] Data TLB LRAT Error Embedded Floating-point Data Embedded Floating-point Round SPE/Embedded Floating-point/Vector Unavailable External Process ID operation [Category: Alignment Embedded.External Process ID] Data Storage Data TLB LRAT Error
998
59:61 62
Figure 60. Exception Syndrome Register Definitions Programming Note The information provided by the ESR is not complete. System software may also need to identify the type of instruction that caused the interrupt, examine the TLB entry accessed by a data or instruction storage access, as well as examine the ESR to fully determine what exception or exceptions caused the interrupt. For example, a Data Storage interrupt may be caused by both a Protection Violation exception as well as a Byte Ordering exception. System software would have to look beyond ESRBO, such as the state of MSRPR in SRR1 and the page protection bits in the TLB entry accessed by the storage access, to determine whether or not a Protection Violation also occurred. The contents of the ESR can be read into bits 32:63 of register RT using mfspr RT,ESR, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the ESR using mtspr ESR,RS. This register is hypervisor privileged.
999
IVORi IVOR0 IVOR1 IVOR2 IVOR3 IVOR4 IVOR5 IVOR6 IVOR7 IVOR8 IVOR9 IVOR10 IVOR11 IVOR12 IVOR13 IVOR14 IVOR15 IVOR16 : IVOR31
Interrupt Critical Input Machine Check Data Storage Instruction Storage External Input Alignment Program Floating-Point Unavailable System Call Auxiliary Processor Unavailable Decrementer Fixed-Interval Timer Interrupt Watchdog Timer Interrupt Data TLB Error Instruction TLB Error Debug Reserved
[Category: Signal Processing Engine] [Category: Vector] IVOR32 SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Category: SP.Embedded Float_*] (IVORs 33 & 34 are required if any SP.Float_ dependent category is supported.) IVOR33 IVOR34 Embedded Floating-Point Data Interrupt Embedded Floating-Point Round Interrupt Embedded Performance Monitor Interrupt Processor Doorbell Interrupt Processor Doorbell Critical Interrupt
IVORi GIVOR2 GIVOR3 GIVOR4 GIVOR8 GIVOR13 GIVOR14 IVOR 42 IVOR43 : IVOR63
Interrupt Data Storage Instruction Storage External Input System Call Data TLB Error Instruction TLB Error LRAT Error Interrupt Implementation-dependent
[Category: Embedded.Hypervisor.LRAT]
[Category: Embedded.Hypervisor, Embedded.Processor Control] IVOR38 IVOR39 Guest Processor Doorbell Interrupt Guest Processor Doorbell Critical/ Machine Check Interrupt Embedded Hypervisor System Call Interrupt Embedded Hypervisor Privilege Interrupt LRAT Error Interrupt
Figure 62. Guest Interrupt Vector Offset Register Assignments Bits 48:59 of the contents of GIVORi can be read into bits 48:59 of register RT using mfspr RT,GIVORi, setting bits 0:47 and bits 60:63 of GPR(RT) to zero. Bits 48:59 of the contents of register RS can be written into bits 48:59 of GIVORi using mtspr GIVORi,RS. Write access to these registers is hypervisor privileged. Read access to these registers is privileged. Programming Note mfspr RT,IVORi should be used to read GIVORi in guest supervisor state. mtspr IVORi,RS should be used to write GIVOR in guest supervisor state. Hypervisor software should emulate the accesses for the guest.
[Category: Embedded.Hypervisor.LRAT] IVOR 42 IVOR43... Implementation-dependent IVOR63 Figure 61. Interrupt Vector Offset Register Assignments
1000
Programming Note The architecture only provides a few GIVORs that are implemented in hardware that are performance critical. Hypervisor software should emulate access to IVORs that do not have corresponding GIVORs.
7.2.17 Logical Page Exception Register [Category: Embedded.Hypervisor and Embedded.Page Table]
The Logical Page Exception Register (LPER) is a 64-bit register that is required when both the Embedded.Hypervisor and Embedded.Page Table categories are supported. LPER bits are numbered 0 (most-significant bit) to 63 (least-significant bit). ///
0 12
ALPN
52
///
LPS
60 63
Figure 63. Logical Page Exception Register The LPER fields are described below. Bit 12:52 Definition Abbreviated Logical Page Number (ALPN) This field contains the Abbreviated Real Page Number from the PTE which caused the LRAT Error interrupt. Only bits corresponding to the PTEARPN bits supported by the implementation need be implemented. Logical Page Size (LPS) This field contains the Page Size from the PTE that caused the LRAT Error interrupt.
//
62 63
Figure 64. Machine Check Save/Restore Register 0 In general, MCSRR0 contains the address of an instruction that was executing or about to be executed when the Machine Check exception occurred. The contents of MCSRR0 when a Machine Check interrupt is taken are mode dependent, reflecting the computation mode currently in use (specified by MSRCM) and the computation mode entered for execution of the Machine Check interrupt (specified by EPCRICM) [Category:Embedded.Hypervisor]. The contents of MCSRR0 upon Machine Check interrupt can be described as follows (assuming Addr is the address to be put into MCSRR0): if (MSRCM = 0) & (EPCRICM = 0) then MCSRR0 32undefined || Addr32:63 if (MSRCM = 0) & (EPCRICM = 1) then MCSRR0 320 || Addr32:63 if (MSRCM = 1)&(EPCRICM = 1) then MCSRR0 Addr0:63 if (MSRCM=1)&(EPCRICM=0) then MCSRR0 undefined The contents of MCSRR0 can be read into register RT using mfspr RT,MCSRR0. The contents of register RS can be written into MCSRR0 using mtspr MCSRR0,RS. This register is hypervisor privileged.
60:63
All other fields are reserved. The LPER contains the values of the ARPN and PS fields from the PTE that was used to translate a virtual address for an instruction fetch, Load, Store or Cache Management instruction that caused an LRAT Error interrupt as a result of an LRAT Miss exception. The contents of LPER are unchanged by an interrupt for any other type of exception. The LPER is a hypervisor resource. The contents of the Logical Page Exception Register can be read into register RT using mfspr RT,LPER. On both a 32-bit and a 64-bit implementation, the contents of LPER0:31 can be read into register RT32:63 using mfspr RT,LPERU. The contents of register RS can be written into the LPER using mtspr LPER,RS. On both a 32-bit and a 64-bit implementation, the contents of register RS32:63 can be written into the LPER0:31 using mtspr LPERU,RS. On a 32-bit implementation that supports fewer than 33 bits of real address, it is implementation-dependent
1001
into Machine Check Interrupt Vector Prefix Register using mtspr IVPR,RS. Programming Note In some implementations that support Interrupt Fixed Offsets, certain instruction cache errors result in a Machine Check exception. The Machine Check interrupt handler needs to be in Caching Inhibited storage in order for the interrupt handler to operate despite an instruction cache error.
Figure 65. Machine Check Save/Restore Register 1 Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved. Programming Note A MSR bit that is reserved may be inadvertently modified by rfi/rfci/rfmci. The contents of MCSRR1 can be read into register RT using mfspr RT,MCSRR1. The contents of register RS can be written into the MCSRR1 using mtspr MCSRR1,RS. This register is hypervisor privileged.
Figure 66. External Proxy Register When the External Input interrupt is taken, the contents of the EPR provide information related to the External Input Interrupt. This register is hypervisor privileged. Programming Note
The EPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later. The EPR contains the vector from the interrupt controller. The process of receiving the interrupt into the EPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the EPR is implementation-dependent. If this acknowledgement is disabled, then the EPR is set to 0 when the External Input interrupt occurs.
7.2.20 Guest External Proxy Register [Category: Embedded Hypervisor, External Proxy]
The Guest External Proxy Register (GEPR) contains implementation-dependent information related to an External Input interrupt when an External Input interrupt directed to the guest occurs. The GEPR is only
1002
Figure 67. Guest External Proxy Register When the External Input interrupt is taken in the guest supervisor state, the contents of the GEPR provide information related to the External Input Interrupt. The contents of the GEPR can be read into bits 32:63 of register RT using mfspr RT,GEPR, setting bits 0:31 of RT to zero. The contents of bits 32:63 of register RS can be written into the GEPR using mtspr GEPR,RS. The GEPR is identical in form and function to the EPR. This register is privileged. Programming Note The GEPR is provided for faster interrupt processing as well as situations where an interrupt must be taken, but software must delay the resultant processing for later. The GEPR contains the vector from the interrupt controller. The process of receiving the interrupt into the GEPR acknowledges the interrupt to the interrupt controller. The method for enabling or disabling the acknowledgment of the interrupt by placing the interrupt-related information in the GEPR is implementation-dependent. If this acknowledgement is disabled, then the GEPR is set to 0 when the External Input interrupt occurs. Programming Note mfspr RT,EPR should be used to read GEPR in guest supervisor state. Hypervisor software should emulate the accesses for the guest. This keeps the programming model consistent for an operating system running as a guest and running directly in hypervisor state. Programming Note Writing the GEPR register is allowed from both guest supervisor state and hypervisor state. Hypervisor must be able to write GEPR to virtualize External Input interrupt handling for the guest if the guest is using External Proxy. Writing to EPR from the guest is not mapped and results in the same behavior as any undefined supervisor level SPR.
1003
7.3 Exceptions
There are two kinds of exceptions, those caused directly by the execution of an instruction and those caused by an asynchronous event. In either case, the exception may cause one of several types of interrupts to be invoked. Examples of exceptions that can be caused directly by the execution of an instruction include but are not limited to the following: an attempt to execute a reserved-illegal instruction (Illegal Instruction exception type Program interrupt) an attempt by an application program to execute a privileged instruction (Privileged Instruction exception type Program interrupt) an attempt by an application program to access a privileged Special Purpose Register (Privileged Instruction exception type Program interrupt) an attempt by an application program to access a Special Purpose Register that does not exist (Unimplemented Operation Instruction exception type Program interrupt) an attempt by a system program to access a Special Purpose Register that does not exist (boundedly undefined results) the execution of a defined instruction using an invalid form (Illegal Instruction exception type Program interrupt, Unimplemented Operation exception type Program interrupt, or Privileged Instruction exception type Program interrupt) an attempt to access a storage location that is either unavailable (Instruction TLB Error interrupt or Data TLB Error interrupt) or not permitted (Instruction Storage interrupt or Data Storage interrupt) an attempt to access storage with an effective address alignment not supported by the implementation (Alignment interrupt) the execution of a System Call instruction (System Call interrupt) the execution of a Trap instruction whose trap condition is met (Trap type Program interrupt) the execution of a floating-point instruction when floating-point instructions are unavailable (Floating-point Unavailable interrupt) the execution of a floating-point instruction that causes a floating-point enabled exception to exist (Enabled exception type Program interrupt) the execution of a defined instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt)
the execution of an instruction that is not implemented by the implementation (Illegal Instruction exception or Unimplemented Operation exception type of Program interrupt) the execution of an auxiliary processor instruction when the auxiliary processor instruction is unavailable (Auxiliary Processor Unavailable interrupt) the execution of an instruction that causes an auxiliary processor enabled exception (Enabled exception type Program interrupt) The invocation of an interrupt is precise, except that if one of the imprecise modes for invoking the Floatingpoint Enabled Exception type Program interrupt is in effect then the invocation of the Floating-point Enabled Exception type Program interrupt may be imprecise. When the interrupt is invoked imprecisely, the excepting instruction does not appear to complete before the next instruction starts (because one of the effects of the excepting instruction, namely the invocation of the interrupt, has not yet occurred).
1004
1005
1006
Programming Note In general, at process switch (partition switch), due to possible process interlocks and possible data availability requirements, the operating system (hypervisor) needs to consider executing the following. stbcx., sthcx., stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that a lbarx, lharx, lwarx or ldarx in the old process (partition) is not paired with a stbcx., sthcx., stwcx., or stdcx. in the new process (partition). sync, to ensure that all storage operations of an interrupted process are complete with respect to other threads before that process begins executing on another thread. isync, rfgi <E.HV>, rfi, rfdi [Category: Embedded.Enhanced Debug], rfmci, or rfci to ensure that the instructions in the new process execute in the new context.
1007
The purpose of the instruction is to cause an interrupt. Example: System Call interrupt caused by sc. The interrupt is caused by a condition that is stated, in the instruction description, potentially to cause the interrupt. Example: Alignment interrupt caused by lwarx for which the storage operand is not aligned. The program is attempting to perform a function that it should not be permitted to perform. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should not be permitted to access. (If the function is one that the program should be permitted to perform, the conditions that caused the interrupt should be corrected and the program re-dispatched such that the instruction will be re-executed. Example: Data Storage interrupt caused by lwz for which the storage operand is in storage that the program should be permitted to access but for which there currently is no TLB entry.)
The interrupt is caused by a condition for which the instruction description (including related material such as the introduction to the section describing the instruction) implies that the instruction works correctly. Example: Alignment interrupt caused by lmw for which the storage operand is not aligned, or by dcbz or dcbzep for which the storage operand is in storage that is Write Through Required or Caching Inhibited. The instruction is an illegal instruction that should appear, to the program executing it, as if it were supported by the implementation. Example: Illegal Instruction type Program interrupt caused by an instruction that has been phased out of the architecture but is still used by some programs that the operating
1008
IVOR IVOR0
IVOR1
Machine Check
Machine Check
1013
2,4 1013
Access
1014
Load and Reserve or Store Conditional to write-thru required storage (W=1) Cache Locking Byte Ordering
x x
Virtualization Fault
TLB Ineligible
Access Byte Ordering Mismatched Instruction Storage (See Book VLE.)) Misaligned Instruction Storage (See Book VLE.) Page Table Fault TLB Ineligible
x x x
{DLK0,DLK1},[ST] [VLEMI] BO, [ST], [FP,AP,SPV], [VLEMI], [EPID] [ST], [FP,AP,SPV], [VLEMI], [EPID] PT, [ST], [FP,AP,SPV], [VLEMI], [EPID] TLBI,[ST], [FP,AP,SPV], [VLEMI], [EPID] [PT] BO, [VLEMI] BO,VLEMI
E E
E.PT
E.PT
E.PT
1016
MIF
PT TLBI
1009
Page
IVOR IVOR4
External Input
EE or GS EE and GS x x x x x x x x x x x [ST],[FP,AP,SPV] [EPID],[VLEMI] PIL, [VLEMI] PPR,[AP], [VLEMI] PTR,[VLEMI] FP, [PIE] FE0, FE1 AP PUO, [VLEMI] [FP,AP,SPV] [VLEMI]
1018
E.HV 1
1018
E E E E E E E E E, E.HV E DIE E
1019 1020
6,7
7 1021 1021
AP Unavailable Decrementer
AP Unavailable x
1021 1022
IVOR11
FIT
FIE E
1022
IVOR12
Watchdog
WIE E
10
1023
IVOR13 Data TLB Error GIVOR13 <E.HV> IVOR14 Inst TLB Error GIVOR14 <E.HV> IVOR15 Debug
E, E.HV E, E.HV DE DE DE DE DE DE DE DE DE DE IDM IDM IDM IDM IDM IDM IDM IDM IDM IDM E E E E E E E E.ED E.ED E.ED
1024
1025
Trap x Inst Addr Compare x Data Addr Compare x Instruction Complete x Branch Taken x Return From Interrupt x Interrupt Taken x Uncond Debug Event x Critical Interrupt Taken x Critical Interrupt Return x
x x x x x x x x
1010
Page
IVOR IVOR32
Interrupt
Exception
SPE/Embedded SPE Unavailable Floating-Point/Vector Unavailable Vector Unavailable Embedded Floating- Embedded FloatingPoint Data Point Data Embedded Floating- Embedded FloatingPoint Round Point Round Embedded Perfor- Embedded Performance Monitor mance Monitor Processor Doorbell Processor Doorbell
SPV, [VLEMI]
SPE
1026
SPV x x x SPV, [VLEMI] SPV, [VLEMI] EE or GS EE or GS CE or GS EE and GS CE and GS ME and GS [VLEMI] [VLEMI] [ST],[FP,AP,SPV] [DATA],[PT] [VLEMI], [EPID]
1027 1027
IVOR36
E.PC
IVOR37
Processor Doorbell Critical Guest Processor Doorbell Guest Processor Doorbell Critical Guest Processor Doorbell Machine Check Embedded Hypervisor System Call Embedded Hypervisor Privilege LRAT Error
Processor Doorbell Crit- x ical Guest Processor Door- x bell Guest Processor Door- x bell Critical Guest Processor Door- x bell Machine Check Embedded Hypervisor System Call Embedded Hypervisor Privilege LRAT Miss x x x
E.PC
IVOR38
E.PC, E.HV E.PC, E.HV E.PC, E.HV E.HV E.HV E.HV. LRAT 1031
IVOR39
1. If an expression of MSR bits is provided, the interrupt is masked if the expression evaluates to 0 and is enabled if the expression evaluates to 1. Figure 68. Interrupt and Exception Types
Figure 68 Notes
1. Although it is not specified, it is common for system implementations to provide, as part of the interrupt controller, independent mask and status bits for the various sources of Critical Input and External Input interrupts.
2. Machine Check interrupts are a special case and are not classified as asynchronous nor synchronous. See Section 7.4.4 on page 1005. 3. The Instruction Complete and Branch Taken debug events are only defined for MSRDE=1 when in Internal Debug Mode (DBCR0IDM=1). In other words, when in Internal Debug Mode with MSRDE=0, then Instruction Complete and Branch Taken debug events cannot occur, and no DBSR
1011
Page
1012
offset 0x000 0x020 0x040 0x060 0x080 0x0A0 0x0C0 0x0E0 0x100 0x120 0x140 0x160 0x180 0x1A0 0x1C0 0x1E0
Interrupt Machine Check Critical Input Debug Data Storage Instruction Storage External Input Alignment Program Floating-Point Unavailable System Call Auxiliary Processor Unavailable Decrementer Fixed-Interval Timer Interrupt Watchdog Timer Interrupt Data TLB Error Instruction TLB Error
[Category: Signal Processing Engine] [Category: Vector] 0x200 SPE/Embedded Floating-Point/Vector Unavailable Interrupt
[Category: SP.Embedded Float_*] (The following vector offsets are required if any SP.Float_ dependent category is supported.) 0x220 0x240 0x260 0x280 0x2A0 0x2C0 0x2E0 0x300 0x320 0x340 0x360 ... 0x7FF 0x800 ... 0xFFF Embedded Floating-Point Data Interrupt Embedded Floating-Point Round Interrupt Embedded Performance Monitor Interrupt
All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x020. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR048:59|| 0b0000. Programming Note Software is responsible for taking any action(s) that are required by the implementation in order to clear any Critical Input exception status prior to reenabling MSRCE in order to avoid another, redundant Critical Input interrupt.
[Category: Embedded Performance Monitor] [Category: Embedded.Processor Control] Processor Doorbell Interrupt Processor Doorbell Critical Interrupt Guest Processor Doorbell Guest Processor Doorbell Critical; Guest Processor Doorbell Machine Check Embedded Hypervisor System Call Embedded Hypervisor Privilege LRAT Error interrupt Reserved
[Category: Embedded.Hypervisor]
[Category: Embedded.Hypervisor.LRAT]
Implementation-dependent
1013
Programming Note If a Machine Check interrupt is caused by an error in the storage subsystem, the storage subsystem may return incorrect data, that may be placed into registers and/or on-chip caches. Programming Note On implementations on which a Machine Check interrupt can be caused by referring to an invalid real address, executing a dcbz, dcbzep, or dcba instruction can cause a delayed Machine Check interrupt by establishing in the data cache a block that is associated with an invalid real address. See Section 4.3 of Book II. A Machine Check interrupt can eventually occur if and when a subsequent attempt is made to write that block to main storage, for example as the result of executing an instruction that causes a cache miss for which the block is the target for replacement or as the result of executing a dcbst, dcbstep, dcbf, or dcbfep instruction.
All other defined MSR bits set to 0. ESR/MCSR Implementation-dependent. Instruction execution resumes at address IVPR0:47 || IVOR148:59||0b0000. If the Embedded.Hypervisor category is supported, a Machine Check interrupt caused by the existence of multiple direct TLB entries or multiple indirect TLB entries (or similar entries in implementation-specific translation caches) which translate a given virtual address (see Section 6.7.3) must occur while still in the context of the partition or hypervisor state that caused it. In these cases, the interrupt must be presented in a way that permits continuing execution. Treating the exception as instruction-caused allows these requirements to be achieved. Also, if the Embedded.Hypervisor category is supported, a Machine Check interrupt resulting from the following situations must be precise. Execution of an External Process ID instruction that has an operand that can be translated by multiple TLB entries. Execution of a tlbivax instruction that isnt a TLB invalidate all and there are multiple entries in a single threads TLB array(s) that match the complete VPN. Execution of a tlbilx instruction with T=3 and there are multiple entries in the TLB array(s) that match the complete VPN. Execution of a tlbsx or tlbsrx. instruction and there are multiple matching TLB entries. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported and Machine Check Interrupt Vector Prefix Register is supported, instruction execution resumes at address MCIVPR0:51 || 0x000. If Interrupt Fixed Offsets [Category: Embedded.PhasedIn] are supported and Machine Check Interrupt Vector Prefix Register is not implemented, instruction execution resumes at address IVPR0:51 || 0x000. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR148:59|| 0b0000.
1014
Instructions lswx or stswx with a length of zero, icbt, dcbt, dcbtep, dcbtst, dcbtstep, or dcba cannot cause a Data Storage interrupt, regardless of the effective address. Programming Note The icbi, icbiep, icbt, icbtls and icblc instructions are treated as Loads from the addressed byte with respect to address translation and protection. These Instruction Cache Management instructions use MSRDS, not MSRIS, to determine translation for their operands. Instruction Storage exceptions and Instruction TLB Miss exceptions are associated with the fetching of instructions not with the execution of instructions. Data Storage exceptions and Data TLB Miss exceptions are associated with the execution of Instruction Cache Management instructions. One exception to the above is that icbtls and icblc only cause a Data Storage exception if they have neither execute access nor read access. When a Data Storage interrupt occurs, the thread suppresses the execution of the instruction causing the Data Storage exception. If Category: Embedded.Hypervisor is not supported or if Category: Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, MSR, DEAR, and ESR are updated as follows: SRR0 SRR1 Set to the effective address of the instruction causing the Data Storage interrupt. Set to the contents of the MSR at the time of the interrupt.
MSR CM MSRCM is set to EPCRICM. CE, ME, DE Unchanged. All other defined MSR bits set to 0. DEAR Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data Storage exception.
1015
ST
DLK0:1 AP
BO TLBI
PT
SPV
VLEMI EPID
Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a Store or store-class Cache Management instruction; otherwise set to 0. Set to an implementation-dependent value due to a Cache Locking exception causing the interrupt. Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0. Set to 1 if the instruction caused a Byte Ordering exception; otherwise set to 0. Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0. If a Page Table Fault or Read or Write Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, then PT is set to 1 if no TLB entry was created from the Page Table and is set to an implementationdependent value if a TLB entry was created. See Section 6.7.4 for rules about TLB updates. If no Page Table Fault or Read or Write Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, set to 0. Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0. Set to 1 if the instruction causing the interrupt resides in VLE storage. Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
All other defined ESR bits are set to 0. If Category: Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state GSRR0, GSRR1, GDEAR, and GESR are set in place of SRR0, SRR1, DEAR, and ESR, respectively. The MSR is set as follows: MSR CM MSRCM is set to EPCRGICM. CE, ME,GS,DE Unchanged. Bits in the MSR corresponding to set bits in the MSRP register are left unchanged. All other defined MSR bits set to 0. The following is a prioritized listing of the various exceptions which cause a Data Storage interrupt and the corresponding ESR bit, if applicable. Even though
1016
TLBI
Page Table Fault exception A Page Table Fault exception is caused when a Page Table translation occurs for an instruction fetch and the Page Table Entry that is accessed is invalid (Valid bit = 0). TLB Ineligible exception A TLB Ineligible exception is caused when a Page Table translation occurs for an instruction fetch and any of the following conditions are true. The only TLB entries that can be used to hold the translation for the virtual address have IPROT=1 No TLB array can be loaded from the Page Table for the page size specified by the PTE. The PTEARPN is treated as an LPN (The Embedded.Hypervisor category is supported) and there is no TLB array that meets all the following conditions. The TLB array supports the page size specified by the PTE. The TLB array can be loaded from the Page Table (TLBnCFGPT = 1). If the Embedded.Hypervisor category is supported, an Instruction Storage interrupt resulting from a TLB Ineligible exception is always directed to hypervisor state regardless of the setting of EPCRISIGS. When an Instruction Storage interrupt occurs, the thread suppresses the execution of the instruction causing the Instruction Storage exception. If Category: Embedded.Hypervisor is not supported or if Category: Embedded.Hypervisor is supported and the interrupt is directed to hypervisor state, SRR0, SRR1, MSR, and ESR are updated as follows: SRR0, SRR1, MSR, and ESR are updated as follows: SRR0 Set to the effective address of the instruction causing the Instruction Storage interrupt. Set to the contents of the MSR at the time of the interrupt.
PT
VLEMI
Set to 1 if the instruction fetch caused a Byte Ordering exception; otherwise set to 0. Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0. If a Page Table Fault or an Execute Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, then PT is set to 1 if no TLB entry was created from the Page Table and is set to an implementation-dependent value if a TLB entry was created. See Section 6.7.4 for rules about TLB updates. If no Page Table Fault or Execute Access Control exception occurred during a Page Table translation for the instruction causing the interrupt, set to 0. Set to 1 if the instruction causing the interrupt resides in VLE storage.
All other defined ESR bits are set to 0. If Category: Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state, GSRR0, GSRR1, and GESR are set in place of SRR0, SRR1, and ESR, respectively. The MSR is set as follows: MSR CM MSRCM is set to EPCRGICM. CE, ME,GS,DE Unchanged. Bits in the MSR corresponding to set bits in the MSRP register are left unchanged. All other defined MSR bits set to 0. The following is a prioritized listing of the various exceptions which cause a Instruction Storage interrupt and the corresponding ESR bit, if applicable. Even though multiple of these exceptions may occur, at most one of the following exceptions is reported in the ESR. 1. Page Table Fault <E.PT>: PT 2. TLB Ineligible <E.PT>: TLBI 3. Byte Ordering exception: BO 4. Execute Access: If the exception occurred during a Page Table translation, PT <E.PT> Programming Note Since some Instruction Storage exceptions are not mutually exclusive, system software may need to examine the TLB entry or the Page Table entry accessed by the data storage access in order to determine whether additional exceptions may have also occurred.
SRR1
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged. All other defined MSR bits set to 0.
1017
MSR CM MSRCM is set to EHSRICM. CE, ME,DE, Unchanged. All other defined MSR bits set to 0. If Category: Embedded.Hypervisor is supported and the interrupt is directed to the guest supervisor state, GSRR0 and GSRR1 are set in place of SRR0 and SRR1, respectively. The MSR is set as follows:
1018
VLEMI EPID
All other defined ESR bits are set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x0C0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR548:59|| 0b0000.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged CE, ME, DE, ICM Unchanged. All other defined MSR bits set to 0. DEAR Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Alignment exception.
ESR FP
ST AP
Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a Store; otherwise set to 0. Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0.
1019
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged All other defined MSR bits set to 0.
FP
PIE
SPV
VLEMI
All other defined ESR bits are set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x0E0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR648:59|| 0b0000.
1020
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x100. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR748:59|| 0b0000.
MSR CM VLEMI
MSRCM is set to EPCRICM. Set to 1 if the instruction causing the interrupt resides in VLE storage. CE, ME,DE Unchanged. All other defined MSR bits set to 0.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x140. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR948:59|| 0b0000.
If Category: Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state (MSRGS = 1), GSRR0 and GSRR1 are set in place of SRR0 and SRR1, respectively. The MSR is set as follows: MSR CM MSRCM is set to EPCRGICM. CE, ME,GS,DE Unchanged. Bits in the MSR corresponding to set bits in the MSRP register are left unchanged. All other defined MSR bits set to 0.
1021
Programming Note MSREE also enables the External Input and Decrementer interrupts. SRR0, SRR1, MSR, and TSR are updated as follows: SRR0 SRR1 Set to the effective address of the next instruction to be executed. Set to the contents of the MSR at the time of the interrupt.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged. All other defined MSR bits set to 0.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged All other defined MSR bits set to 0. TSR (See Section 9.5.1 on page 1050.) DIS Set to 1. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x160. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR1048:59|| 0b0000. Programming Note Software is responsible for clearing the Decrementer exception status prior to re-enabling the MSREE bit in order to avoid another redundant Decrementer interrupt. To clear the Decrementer exception, the interrupt handling routine must clear TSRDIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The writedata to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.
TSR FIS
If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x180. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR1148:59|| 0b0000. Programming Note Software is responsible for clearing the Fixed-Interval Timer exception status prior to re-enabling the MSREE bit in order to avoid another redundant Fixed-Interval Timer interrupt. To clear the FixedInterval Timer exception, the interrupt handling routine must clear TSRFIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect.
1022
MSR CM MSRCM is set to EHSRICM. CE, ME, DE Unchanged. All other defined MSR bits set to 0. DEAR Set to the effective address of a byte that is both within the range of the bytes being accessed by the Storage Access or Cache Management instruction, and within the page whose access caused the Data TLB Error exception.
If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x1A0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR1248:59|| 0b0000. Programming Note Software is responsible for clearing the Watchdog Timer exception status prior to re-enabling the MSRCE bit in order to avoid another redundant Watchdog Timer interrupt. To clear the Watchdog Timer exception, the interrupt handling routine must clear TSRWIS. Clearing is done by writing a word to TSR using mtspr with a 1 in any bit position that is to be cleared and 0 in all other bit positions. The write-data to the TSR is not direct data, but a mask. A 1 causes the bit to be cleared, and a 0 has no effect. ESR ST
FP
AP
SPV
VLEMI EPID
Set to 1 if the instruction causing the interrupt is a Store, dcbi, dcbz, or dcbzep instruction; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a floating-point load or store; otherwise set to 0. Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation; otherwise set to 0. Set to 1 if the instruction causing the interrupt resides in VLE storage. Set to 1 if the instruction causing the interrupt is an External Process ID instruction; otherwise set to 0.
All other defined ESR bits are set to 0. If Category: Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state, GSRR0, GSRR1, GDEAR, and GESR are set in place of SRR0, SRR1, DEAR, and ESR, respectively. The MSR is set as follows: MSR CM MSRCM is set to EPCRGICM. CE, ME,GS,DE Unchanged. Bits in the MSR corresponding to set bits in the MSRP register are left unchanged. All other defined MSR bits set to 0. If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following. GIVPR0:47 || GIVOR1348:59||0b0000 if IVORs [Category: Embedded.Phased-Out] are supported. GIVPR0:51||0x1C0 if Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported.
1023
SRR1
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged. All other defined MSR bits set to 0. If Category: Embedded.Hypervisor is supported and the interrupt is directed to guest supervisor state, GSRR0 and GSRR1 are set in place of SRR0 and SRR1, respectively. The MSR is set as follows: MSR CM MSRCM is set to EPCRGICM. CE, ME,GS,DE Unchanged. Bits in the MSR corresponding to set bits in the MSRP register are left unchanged. All other defined MSR bits set to 0. If Category Embedded.Hypervisor is supported and the interrupt is directed to the guest state, instruction execution resumes at the address given by one of the following.
1024
caused the Debug interrupt. See Section 10.4.10, Critical Interrupt Return Debug Event [Category: Embedded.Enhanced Debug]. For Debug exceptions that occur while Debug interrupts are disabled (DBCR0IDM=0 or MSRDE=0), a Debug interrupt will occur at the next synchronizing event if DBCR0IDM and MSRDE are modified such that they are both 1 and if the Debug exception Status is still set in the DBSR. When this occurs, CSRR0 or DSRR0 [Category:Embedded.Enhanced Debug] is set to the address of the instruction that would have executed next, not with the address of the instruction that modified the Debug Control Register 0 or MSR and thus caused the interrupt. CSRR1 or DSRR1 [Category: Embedded.Enhanced Debug] Set to the contents of the MSR at the time of the interrupt. MSR CM ME
All other supported MSR bits set to 0. DBSR Set to indicate type of Debug Event (see Section 10.5.2)
If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x040. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR1548:59|| 0b0000.
1025
7.6.18 SPE/Embedded FloatingPoint/Vector Unavailable Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, Vector]
The SPE/Embedded Floating-Point/Vector Unavailable interrupt occurs when no higher priority exception exists, and an attempt is made to execute an SPE, SPE.Embedded Float Scalar Double, SPE.Embedded Float Vector, or Vector instruction and MSRSPV = 0. When an Embedded Floating-Point Unavailable interrupt occurs, the hardware suppresses the execution of the instruction causing the exception. SRR0, SRR1, MSR, and ESR are updated as follows: SRR0 Set to the effective address of the instruction causing the Embedded Floating-Point Unavailable interrupt. Set to the contents of the MSR at the time of the interrupt.
SRR1
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged ESR SPV VLEMI
Set to 1. Set to 1 if the instruction causing the interrupt resides in VLE storage.
All other defined ESR bits are set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x200. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3248:59|| 0b0000. Programming Note This interrupt is also used by the Signal Processing Engine in the same manner. It should be used by software to determine if the application is using the upper 32 bits of the GPRs in a 32-bit implementation and thus be required to save and restore them on context switch.
1026
7.6.19 Embedded Floating-Point Data Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
The Embedded Floating-Point Data interrupt occurs when no higher priority exception exists (see Section 7.9) and an Embedded Floating-Point Data exception is presented to the interrupt mechanism. The Embedded Floating-Point Data exception causing the interrupt is indicated in the SPEFSCR; these exceptions include Embedded Floating-Point Invalid Operation/Input Error (FINV, FINVH), Embedded Floating-Point Divide By Zero (FDBZ, FDBZH), Embedded Floating-Point Overflow (FOV, FOVH), and Embedded Floating-Point Underflow (FUNF, FUNFH) When an Embedded Floating-Point Data interrupt occurs, the hardware suppresses the execution of the instruction causing the exception. SRR0, SRR1, MSR, and ESR are updated as follows: SRR0 Set to the effective address of the instruction causing the Embedded Floating-Point Data interrupt. Set to the contents of the MSR at the time of the interrupt.
7.6.20 Embedded Floating-Point Round Interrupt [Categories: SPE.Embedded Float Scalar Double, SPE.Embedded Float Scalar Single, SPE.Embedded Float Vector]
The Embedded Floating-Point Round interrupt occurs when no higher priority exception exists (see Section 7.9 on page 1037), SPEFSCRFINXE is set to 1, and any of the following occurs:
the unrounded result of an Embedded Floating-Point operation is not exact an overflow occurs and overflow exceptions are disabled (FOVF or FOVFH is set to 1 and FOVFE is set to 0) an underflow occurs and underflow exceptions are disabled (FUNF is set to 1 and FUNFE is set to 0).
The value of SPEFSCRFINXS is 1, indicating that one of the above exceptions has occurred, and additional information about the exception is found in SPEFSCRFGH FG FXH FX. When an Embedded Floating-Point Round interrupt occurs, the hardware completes the execution of the instruction causing the exception and writes the result to the destination register prior to taking the interrupt. SRR0, SRR1, MSR, and ESR are updated as follows: SRR0 Set to the effective address of the instruction following the instruction causing the Embedded Floating-Point Round interrupt. Set to the contents of the MSR at the time of the interrupt.
SRR1
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged VLEMI Set to 1 if the instruction causing the interrupt resides in VLE storage. All other defined MSR bits set to 0. ESR SPV
SRR1
Set to 1.
All other defined ESR bits are set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x220. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3348:59|| 0b0000.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged All other defined MSR bits set to 0. ESR SPV VLEMI
Set to 1. Set to 1 if the instruction causing the interrupt resides in VLE storage.
All other defined ESR bits are set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x240. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3448:59|| 0b0000.
1027
Programming Note If an implementation does not support Infinity rounding modes and the rounding mode is set to be +Infinity or -Infinity, an Embedded Floating-Point Round interrupt occurs after every Embedded Floating-Point instruction for which rounding might occur regardless of the value of FINXE, provided no higher priority exception exists. When an Embedded Floating-Point Round interrupt occurs, the unrounded (truncated) result of an inexact high or low element is placed in the target register. If only a single element is inexact, the other exact element is updated with the correctly rounded result, and the FG and FX bits corresponding to the other exact element will both be 0. The bits FG (FGH) and FX (FXH) are provided so that an interrupt handler can round the result as it desires. FG (FGH) is the value of the bit immediately to the right of the least significant bit of the destination format mantissa from the infinitely precise intermediate calculation before rounding. FX (FXH) is the value of the or of all the bits to the right of the FG (FGH) of the destination format mantissa from the infinitely precise intermediate calculation before rounding.
CSRR1
MSR CM ME DE
MSRCM is set to EPCRICM. Unchanged. Unchanged if category E.ED is supported, otherwise set to 0
All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x2A0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3748:59|| 0b0000.
MSR CM
1028
Programming Note Some guest operating systems running on a hypervisor may use lazy interrupt blocking. That is, when the operating system wants to block interrupts at the interrupt controller, it does not actually perform the blocking operation, but instead sets a value in memory that represents the level at which interrupts are to be blocked. When an actual interrupt occurs, this value is consulted to determine if the interrupt should have been blocked. If so, the current interrupt level that is to be blocked is set in the interrupt controller and the interrupt handling code returns without acknowledging the interrupt. When interrupts are unblocked at a later time, the interrupt will be reasserted by the interrupt controller. When a hypervisor is taking external interrupts and then reflecting them to a guest, the hypervisor must acknowledge the interrupt before reflecting it to the guest since the external interrupt will occur again once MSRGS = 1 regardless of the state of MSREE. To emulate the behavior required for lazy interrupt blocking by the guest, the hypervisor should execute another msgsnd instruction specifying a Guest Processor Doorbell at the time that it is reflecting the interrupt to the guest. When the guest performs its interrupt acknowledge (a hypercall or writing to an interrupt controller register emulated by the hypervisor), the hypervisor can execute a msgclr to clear a pending message if there are no other interrupts to be reflected to the guest.
MSR CM MSRCM is set to EPCRICM. CE, ME,DE Unchanged. All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x2C0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3848:59|| 0b0000. Programming Note Guest Processor Doorbell interrupts are used by the hypervisor to be notified when the guest operating system has set MSREE to 1. This allows the hypervisor to reflect base class interrupts to the guest at a time when the guest is ready to accept them (MSRGS=1 and MSREE=1).
1029
CSRR0, CSRR1 and MSR are updated as follows: CSRR0 CSRR1 Set to the effective address of the next instruction to be executed. Set to the contents of the MSR at the time of the interrupt.
MSR CM ME DE
MSRCM is set to EPCRICM. Unchanged. Unchanged if category E.ED is supported, otherwise set to 0
All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x2E0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3948:59|| 0b0000. Programming Note
MSR CM ME DE
MSRCM is set to EPCRICM. Unchanged. Unchanged if category E.ED is supported, otherwise set to 0
All other defined MSR bits set to 0. If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x2E0. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR3948:59|| 0b0000. Programming Note Guest Processor Doorbell Critical interrupts are used by the hypervisor to be notified when the guest operating system has set MSRCE to 1. This allows the hypervisor to reflect critical class interrupts to the guest at a time when the guest is ready to accept them (MSRGS=1 and MSRCE=1).
Guest Processor Doorbell Machine Check interrupts are used by the hypervisor to be notified when the guest operating system has set MSRME to 1. This allows the hypervisor to reflect machine check class interrupts to the guest at a time when the guest is ready to accept them (MSRGS=1 and MSRME=1). Programming Note Guest Processor Doorbell Critical interrupts and Guest Processor Doorbell Machine Check interrupts share the same IVOR. Hypervisor software can differentiate between the two interrupts by comparing whether CE or ME is set in CSRR1 and which interrupt class is to be reflected.
7.6.26 Guest Processor Doorbell Machine Check Interrupt [Category: Embedded.Hypervisor,Embedded.Processor Control]
A Guest Processor Doorbell Machine Check Interrupt occurs when no higher priority exception exists, a Guest Processor Doorbell Machine Check exception is present, and the interrupt is enabled (MSRGS = 1 and MSRME=1). Guest Processor Doorbell Machine Check exceptions are generated when G_DBELL_MC messages (see Chapter 11) are received and accepted by the thread.
MSR CM VLEMI
MSRCM is set to EPCRICM. Set to 1 if the instruction causing the interrupt resides in VLE storage. CE, ME,DE
1030
SRR1
MSR CM VLEMI
MSRCM is set to EPCRICM. Set to 1 if the instruction causing the interrupt resides in VLE storage. CE, ME,DE Unchanged. All other defined MSR bits set to 0.
If Interrupt Fixed Offsets [Category: Embedded.Phased-In] are supported, instruction execution resumes at address IVPR0:51 || 0x320. Otherwise, instruction execution resumes at address IVPR0:47 || IVOR4148:59|| 0b0000.
1031
ESR FP
ST
AP
SPV
DATA
TLBI
PT
VLEMI
EPID
Set to 1 if the instruction causing the interrupt is a floating-point load or store and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a Store or store-class Cache Management instruction and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0. Set to 1 if the instruction causing the interrupt is an Auxiliary Processor load or store and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0. Set to 1 if the instruction causing the interrupt is a SPE operation or a Vector operation, the instruction is a Load or Store, and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0. Set to 1 if the interrupt is due to an LRAT miss resulting from a Page Table translation of a Load, Store or Cache Management operand address; otherwise set to 0. Set to 1 if a TLB Ineligible exception occurred during a Page Table translation for the instruction causing the interrupt; otherwise set to 0. Set to 1 if the cause of the interrupt is an LRAT miss exception on a Page Table translation. Set to 0 if the cause of the interrupt is an LRAT miss exception on a tlbwe. Set to 1 if the instruction causing the interrupt resides in VLE storage, the instruction is a Load, Store, or Cache Management instruction, and the translation of the operand address causes the LRAT Miss exception. Set to 1 if the instruction causing the interrupt is an External Process ID instruction, the instruction is a Load, Store, or Cache Management instruction, and the translation of the operand address causes the LRAT Miss exception; otherwise set to 0.
1032
1033
1034
Re-enabling of MSREE Re-enabling of MSRCE,EE in critical class interrupt handlers, and if the Embedded.Enhanced Debug category is not supported, re-enabling of MSRDE. [Category: Embedded.Enhanced Debug] Reenabling of MSRCE,EE,DE in Debug class interrupt handlers Re-enabling of MSREE,CE,DE,ME in Machine Check interrupt handlers.
Branching (or sequential execution) to addresses for which any of the following conditions are true. The address is not mapped by the TLB or the Page Table or is mapped without UX=1 or SX=1 permission. Both the Embedded.Hypervisor.LRAT and the Embedded Page.Table category are supported, MSRGS=1, and the effective address is mapped by the Page Table but the LPN is not mapped by the LRAT. This prevents Instruction Storage, LRAT Error, and Instruction TLB Error interrupts. Load, Store or Cache Management instructions to addresses for which any of the following conditions are true. The address is not mapped by the TLB or the Page Table or is mapped without the required access permissions. Both the Embedded.Hypervisor.LRAT and the Embedded Page.Table category are supported, MSRGS=1, and the effective address is mapped by the Page Table but the LPN is not mapped by the LRAT. This prevents Data Storage, LRAT Error, and Data TLB Error interrupts. Execution of any floating-point instruction This prevents Floating-Point Unavailable interrupts. Note that this interrupt would occur upon the
1035
Even though, as indicated above, the base, synchronous exception types listed under item 1 are generated with higher priority than the non-base interrupt classes listed in items 2-6, the fact is that these base class interrupts will immediately be followed by the highest priority existing interrupt in items 2-5, without executing any instructions at the base class interrupt handler. This is because the base interrupt classes do not automatically disable the MSR mask bits for the interrupts listed in 2-5. In all other cases, a particular interrupt class from the above list will automatically disable any subsequent interrupts of the same class, as well as all other interrupt classes that are listed below it in the priority order.
1036
Programming Note Some exception types may even be mutually exclusive of each other and could otherwise be considered the same priority. In these cases, the exceptions are listed in the order suggested by the sequential execution model.
If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Data Address Compare) or Debug (Data Value Compare), and is not causing any of the exceptions listed in items 2-11, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.
7.9.1.2 Exception Priorities for Other Defined Load and Store Instructions and Defined Cache Management Instructions
The following prioritized list of exceptions may occur as a result of the attempted execution of any other defined Load or Store instruction, or defined Cache Management instruction. 1. 2. 3. 4. Debug (Instruction Address Compare) Instruction TLB Error Instruction Storage Interrupt (all types) LRAT Error for instruction fetch [Categories: E.PT and E.HV.LRAT]
1037
If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Trap), and is not causing any of the exceptions listed in items 2-6, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.
6. 7. 8. 9.
10. 11.
1038
If the instruction is causing both a Debug (Instruction Address Compare) and a Debug (Branch Taken), and is not causing any of the exceptions listed in items 2-6, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.
If the rfi or rfci, rfmci, or rfdi [Category: Embedded.Enhanced Debug] or rfgi [Category: Embedded.Hypervisor] instruction is causing both a Debug (Instruction Address Compare) and a Debug (Return From Interrupt), and is not causing any of the exceptions listed in items 2-6, it is permissible for both exceptions to be generated and recorded in the DBSR. A single Debug interrupt will result.
1039
1040
8.1 Background
This chapter describes the requirements for thread reset. This includes both the means of causing reset, and the specific initialization that is required to be performed automatically by the hardware. This chapter also provides an overview of the operations that should be performed by initialization software. In general, the specific actions taken by a thread upon reset are implementation-dependent. Also, it is the responsibility of system initialization software to initialize the majority of thread and system resources after reset. Implementations are required to provide a minimum thread initialization such that this system software may be fetched and executed, thereby accomplishing the rest of system initialization.
The Thread Enable Register, Machine State Register, Processor Version Register, and a TLB entry are updated as follows.
Setting Comments 0 Computation Mode (set to 32-bit mode) GS 0 Hypervisor state <E.HV> UCLE 0 User Cache Locking Enable SPV 0 SPE/Embedded Floating-Point/ Vector Unavailable CE 0 Critical Input interrupts disabled DE 0 Debug interrupts disabled EE 0 External Input interrupts disabled PR 0 Supervisor mode FP 0 FP unavailable ME 0 Machine Check interrupts disabled FE0 0 FP exception type Program interrupts disabled FE1 0 FP exception type Program interrupts disabled IS 0 Instruction Address Space 0 DS 0 Data Address Space 0 PMM 0 Performance Monitor Mark Figure 71. Machine State Register Initial Values
Bit CM
1041
Field V EPN RPN TS IND <E.PT> TLPID <E.HV> TGS <E.HV> SIZE W I M G E U0 U1 U2 U3 TID UX UR UW SX SR SW VLE ACM IPROT VF <E.HV>
Comments valid Represents the last page in effective address space Represents the last page in physical address space translation address space 0 direct entry translation logical partition ID translation hypervisor state smallest page size supported implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value implementation-dependent value, but page must be accessible implementation-dependent value implementation-dependent value implementation-dependent value page is execute accessible in supervisor mode page is read accessible in supervisor mode page is write accessible in supervisor mode implementation-dependent value implementation-dependent value implementation-dependent value no virtualization fault
TLB entry
A TLB entry (which entry is implementation-dependent) is initialized in an implementation-dependent manner that maps the last page in the implemented effective storage address space, with the following field settings:
Figure 72. TLB Initial Values The initial settings of EPN and RPN are dependent upon the number of bits implemented in the EPN and RPN fields and the minimum page size supported by the implementation. For example, an implementation that supports 64KB pages as the smallest size and 32 bits of effective address would implement a 16 bit EPN and set the initial value of the EPN field of the TLB boot entry to 216-1 (0xFFFF) while an implementation that supports 4K pages as the smallest size and 32 bits of effective address would implement a 20 bit EPN and
1042
Invalidate the instruction cache and data cache (implementation-dependent). Initialize system memory as required by the operating system or application code. Initialize the Interrupt Vector Prefix Register and Interrupt Vector Offset Register. Initialize other registers as needed by the system. Initialize off-chip system facilities. Dispatch the operating system or application code.
1043
1044
9.1 Overview
The Time Base, Decrementer, Fixed-interval Timer, and Watchdog Timer provide timing functions for the system. The remainder of this section describes these registers and related facilities.
The period of the Time Base depends on the driving frequency. As an order of magnitude example, suppose that the CPU clock is 1 GHz and that the Time Base is driven by this frequency divided by 32. Then the period of the Time Base would be 2 32 TTB = -------------------- = 5.90 1011 seconds 1 GHz which is approximately 18,700 years. The Time Base is implemented such that: 1. Loading a GPR from the Time Base has no effect on the accuracy of the Time Base. 2. Copying the contents of a GPR to the Time Base replaces the contents of the Time Base with the contents of the GPR. The Power ISA does not specify a relationship between the frequency at which the Time Base is updated and other frequencies, such as the CPU clock or bus clock in a Power ISA system. The Time Base update frequency is not required to be constant. What is required, so that system software can keep time of day and operate interval timers, is one of the following. The system provides an (implementation-dependent) interrupt to software whenever the update frequency of the Time Base bits 0:59 changes, and a means to determine what the current update frequency is. The update frequency of the Time Base bits 0:59 is under the control of the system software. Implementations must provide a means for either preventing the Time Base from incrementing or preventing it from being read in user mode (MSRPR=1). If the means is under software control, it must be privileged.
64
TBL
63
Figure 73. Time Base The Time Base bits 0:59 increment until their value becomes 0xFFF_FFFF_FFFF_FFFF (259 - 1), at the next increment their value becomes 0x000_0000_0000_0000. There is no interrupt or other indication when this occurs. Time base bits 60:63 may increment at a variable rate. When the value of bit 59 changes, bits 60:63 are set to zero; if bits 60:63 increment to 0xF before the value of bit 59 changes, they remain at 0xF until the value of bit 59 changes.
1045
Programming Note The instructions for writing the Time Base are mode-independent. Thus code written to set the Time Base will work correctly in either 64-bit or 32bit mode.
Provided that no interrupts occur while the last three instructions are being executed, loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the Time Base is being initialized. Virtualized Implementation Note In virtualized implementations, TBU and TBL are read-only.
1046
9.3 Decrementer
The Decrementer (DEC) is a 32-bit decrementing counter that provides a mechanism for causing a Decrementer interrupt after a programmable delay. The contents of the Decrementer are treated as a signed integer. DEC
32 63
Figure 74. Decrementer Decrementer bits 32:59 count down until their value becomes 0x000_0000, at the next increment their value becomes 0xFFF_FFFF. Decrementer bits 60:63 may decrement at a variable rate. When the value of bit 59 changes, bits 60:63 are set to 0xF; if bits 60:63 decrement to 0x0 before the value of bit 59 changes, they remain at 0x0 until the value of bit 59 changes. The Decrementer is driven by the same frequency as the Time Base. The period of the Decrementer will depend on the driving frequency, but if the same values are used as given above for the Time Base (see Section 9.2), and if the Time Base update frequency is constant, the period would be 2 32 TDEC = -------------------- = 137 seconds. 1 GHz The Decrementer counts down. The operation of the Decrementer satisfies the following constraints. 1. The operation of the Time Base and the Decrementer is coherent, i.e., the counters are driven by the same fundamental time base. 2. Loading a GPR from the Decrementer has no effect on the accuracy of the Time Base. 3. Copying the contents of a GPR to the Decrementer replaces the contents of the Decrementer with the contents of the GPR. Programming Note In systems that change the Time Base update frequency for purposes such as power management, the Decrementer input frequency will also change. Software must be aware of this in order to set interval timers. If Decrementer bits 60:63 are used as part of a random number generator, software must account for the fact that these bits are set to 0xF only when bit 59 changes state regardless of whether or not they decremented to 0x0 since they were previously set to 0xF.
32
1047
Figure 75. Decrementer Bits of the decrementer auto-reload register are numbered 32 (most-significant bit) to 63 (least-significant bit). The Decrementer Auto-Reload Register is provided to support the auto-reload feature of the Decrementer. See Section 9.3.2 The contents of the Decrementer Auto-Reload Register cannot be read. The contents of bits 32:63 of register RS can be written to the Decrementer Auto-Reload Register using the mtspr instruction. This register is hypervisor privileged.
1048
Watchdog Timer events based on one of 4 Time Base bits selected by TCRWP (the 4 Time Base bits that can be selected by TCRWP are implementation-dependent) Fixed-Interval Timer events based on one of 4 Time Base bits selected by TCRFP (the 4 Time Base bits that can be selected by TCRFP are implementation-dependent)
(decrementer) DEC
Decrementer event 0/1 detect auto-reload
DECAR 0
Figure 76. Relationships of the Timer Facilities The contents of the Timer Control Register can be read using the mfspr instruction. The contents of bits 32:63 of register RS can be written to the Timer Control Register using the mtspr instruction. The contents of the TCR are defined below: Bit(s) 32:33 Description Watchdog Timer Period Section 9.7 on page 1051) (WP) (see 36 of any of these settings is implementationdependent. The Watchdog Timer Reset Control field is cleared to zero by reset. These bits are set only by software. Once a 1 has been written to one of these bits, that bit remains a 1 until a reset occurs. This is to prevent errant code from disabling the Watchdog reset function. Watchdog Timer Interrupt Enable (WIE) (see Section 9.7 on page 1051) 0 1 37 Disable Watchdog Timer interrupt Enable Watchdog Timer interrupt
31
Specifies one of 4 bit locations of the Time Base used to signal a Watchdog Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Watchdog Timer period are implementation-dependent.
Decrementer Interrupt Enable (DIE) (see Section 9.3 on page 1047) 0 1 Disable Decrementer interrupt Enable Decrementer interrupt (FP) (see
34:35
Watchdog Timer Reset Control (WRC) (see Section 9.7 on page 1051) 00 No Watchdog Timer reset will occur TCRWRC resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset) 01-11 Force thread to be reset on second timeout of Watchdog Timer. The exact function
38:39
Specifies one of 4 bit locations of the Time Base used to signal a Fixed-Interval Timer exception on a transition from 0 to 1. The 4 Time Base bits that can be specified to serve as the Fixed-Interval Timer period are implementation-dependent. 40 Fixed-Interval Timer Interrupt Enable (FIE) (see Section 9.6 on page 1051)
1049
Auto-Reload Enable (ARE) 0 Disable auto-reload of the Decrementer Decrementer exception is presented (i.e. TSRDIS is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The next value placed in the Decrementer is the value 0x0000_0000. If Category: Embedded.Hypervisor is supported, when (MSREE=1 or MSRGS=1), TCRDIE=1, and TSRDIS=1, a Decrementer interrupt is taken. If Category: Embedded.Hypervisor is not supported, when MSREE=1, TCRDIE=1, and TSRDIS=1, a Decrementer interrupt is taken. Software must reset TSRDIS. 1 Enable auto-reload of the Decrementer Decrementer exception is presented (i.e. TSRDIS is set to 1) when the Decrementer is decremented from a value of 0x0000_0001. The contents of the Decrementer Auto-Reload Register is placed in the Decrementer. The Decrementer resumes decrementing. If Category: Embedded.Hypervisor is supported, when (MSREE=1 or MSRGS=1), TCRDIE=1, and TSRDIS=1, a Decrementer interrupt is taken. If Category: Embedded.Hypervisor is not supported, when MSREE=1, TCRDIE=1, and TSRDIS=1, a Decrementer interrupt is taken. Software must reset TSRDIS.
Watchdog Timer Interrupt Status (WIS) (see Section 9.7 on page 1051) 0 1 A Watchdog Timer event has not occurred. A Watchdog Timer event has occurred. If Category: Embedded.Hypervisor is supported, when (MSRCE=1 or MSRGS=1 ) and TCRWIE=1, a Watchdog Timer interrupt is taken. If Category: Embedded.Hypervisor is not supported, when MSRCE=1 and TCRWIE=1, a Watchdog Timer interrupt is taken.
34:35
Watchdog Timer Reset Status (WRS) (see Section 9.7 on page 1051) These two bits are set to one of three values when a reset is caused by the Watchdog Timer. These bits are undefined at power-up. 00 No Watchdog Timer reset has occurred. 01 Implementation-dependent reset information. 10 Implementation-dependent reset information. 11 Implementation-dependent reset information.
36
Decrementer Interrupt Status (DIS) (see Section 9.3.2 on page 1047) 0 1 A Decrementer event has not occurred. A Decrementer event has occurred. If Category: Embedded.Hypervisor is supported, when (MSREE=1 or MSRGS=1) and TCRDIE=1, a Decrementer interrupt is taken. If Category: Embedded.Hypervisor is not supported, when MSREE=1 and TCRDIE=1, a Decrementer interrupt is taken.
42 43:63
Implementation-dependent Reserved
37
Fixed-Interval Timer Interrupt Status (FIS) (see Section 9.6 on page 1051) 0 1 A Fixed-Interval Timer event has not occurred. A Fixed-Interval Timer event has occurred. If Category: Embedded.Hypervisor is supported, when (MSREE=1 or MSRGS=1) and TCRFIE=1, a Fixed-Interval Timer interrupt is taken. If Category: Embedded.Hypervisor is not supported, when MSREE=1 and TCRFIE=1, a FixedInterval Timer interrupt is taken.
38:63
Reserved
1050
1051
Time-out. No exception recorded in TSRWIS. Set TSRENW so next time-out will cause exception.
TSRENW,WIS=0b00
(2) SW Loop
TSRENW,WIS=0b10
(3) SW Loop
Time-out. WDT exception recorded in TSRWIS. If Category: Embedded.Hypervisor is supported, WDT interrupt will occur if enabled by TCRWIE=1 and (MSRCE=1 or MSRGS=1). If category Embedded.Hypervisor is not supported, WDT will occur if enabled by TCRWIE=1 and MSRCE=1. If TCRWRC00 then RESET, including TSRWRS TCRWRC TCRWRC 0b00
TSRENW,WIS=0b01
Time-out
Action when timer interval expires Set Enable Next Watchdog Timer (TSRENW=1). Set Enable Next Watchdog Timer (TSRENW=1). Set Watchdog Timer interrupt status bit (TSRWIS=1). If Category: Embedded.Hypervisor is supported and Watchdog Timer interrupt is enabled (TCRWIE=1 and (MSRCE=1 or MSRGS=1)), then interrupt.If Category: Embedded.Hypervisor is not supported and Watchdog Timer interrupt is enabled (TCRWIE=1 and MSRCE=1), then interrupt. Cause Watchdog Timer reset action specified by TCRWRC. Reset will copy pre-reset TCRWRC into TSRWRS, then clear TCRWRC.
Figure 78. Watchdog Timer Controls The controls described in the above table imply three different modes of operation that a programmer might select for the Watchdog Timer. Each of these modes assumes that TCRWRC has been set to allow thread reset by the Watchdog facility: 1. Always take the Watchdog Timer interrupt when pending, and never attempt to prevent its occurrence. In this mode, the Watchdog Timer interrupt caused by a first time-out is used to clear TSRWIS so a second time-out never occurs. TSRENW is not cleared, thereby allowing the next time-out to cause another interrupt. 2. Always take the Watchdog Timer interrupt when pending, but avoid when possible. In this mode a recurring code loop of reliable duration (or perhaps a periodic interrupt handler such as the FixedInterval Timer interrupt handler) is used to repeatedly clear TSRENW such that a first time-out exception is avoided, and thus no Watchdog Timer interrupt occurs. Once TSRENW has been cleared, software has between one and two full Watchdog periods before a Watchdog exception will be posted in TSRWIS. If this occurs before the software is able to clear TSRENW again, a Watchdog
1052
1053
1054
10.1 Overview
Debug facilities are provided to enable hardware and software debug functions, such as instruction and data breakpoints and program single stepping. The debug facilities consist of a set of Debug Control Registers (DBCR0, DBCR1, and DBCR2) (see Section 10.5.1 on page 1063), a set of Address and Data Value Compare Registers (IAC1, IAC2, IAC3, IAC4, DAC1, DAC2, DVC1, and DVC2), (see Section 10.4.3, Section 10.4.4, and Section 10.4.5), a Debug Status Register (DBSR) (see Section 10.5.2) for enabling and recording various kinds of debug events, and a special Debug interrupt type built into the interrupt mechanism (see Section 7.6.17). The debug facilities also provide a mechanism for software-controlled thread reset, and for controlling the operation of the timers in a debug environment.
The mfspr and mtspr instructions (see Section 5.4.1) provide access to the registers of the debug facilities. In addition to the facilities described here, implementations will typically include debug facilities, modes, and access mechanisms which are implementation-specific. For example, implementations will typically provide access to the debug facilities via a dedicated interface such as the IEEE 1149.1 Test Access Port (JTAG).
1055
1056
DBCR1IAC2US specifies whether IAC2 debug events can occur in user mode or supervisor mode, or both. DBCR1IAC3US specifies whether IAC3 debug events can occur in user mode or supervisor mode, or both. DBCR1IAC4US specifies whether IAC4 debug events can occur in user mode or supervisor mode, or both.
1057
For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode. Exclusive address range compare mode For IAC1 and IAC2 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC1 or greater than or equal to the contents of the IAC2, an instruction address match occurs. For IAC3 and IAC4 debug events, if the 64-bit address of the instruction fetch is less than the contents of the IAC3 or greater than or equal to the contents of the IAC4, an instruction address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode.
Exact address compare mode If the address of the instruction fetch is equal to the value in the enabled IAC Register, an instruction address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode. Address bit match mode For IAC1 and IAC2 debug events, if the address of the instruction fetch access, ANDed with the contents of the IAC2, are equal to the contents of the IAC1, also ANDed with the contents of the IAC2, an instruction address match occurs. For IAC3 and IAC4 debug events, if the address of the instruction fetch, ANDed with the contents of the IAC4, are equal to the contents of the IAC3, also ANDed with the contents of the IAC4, an instruction address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode.
See the detailed description of DBCR0 (see Section 10.5.1.1, Debug Control Register 0 (DBCR0) on page 1063) and DBCR1 (see Section 10.5.1.2, Debug Control Register 1 (DBCR1) on page 1064) and the modes for detecting IAC1, IAC2, IAC3 and IAC4 debug events. Instruction Address Compare debug events can occur regardless of the setting of MSRDE or DBCR0IDM. When an Instruction Address Compare debug event occurs, the corresponding DBSRIAC1, DBSRIAC2, DBSRIAC3, or DBSRIAC4 bit or bits are set to record the debug exception. If MSRDE=0, DBSRIDE is also set to 1 to record the imprecise debug event. If MSRDE=1 (i.e. Debug interrupts are enabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will occur immediately (provided there exists no higher priority exception which is enabled to cause an interrupt). The execution of the instruction causing the exception will be suppressed, and CSRR0/DSRR0 [Category: Embedded.Enhanced Debug] will be set to the address of the excepting instruction. If MSRDE=0 (i.e. Debug interrupts are disabled) at the time of the Instruction Address Compare debug exception, a Debug interrupt will not occur, and the instruction will complete execution (provided the instruction is not causing some other exception which will generate an enabled interrupt).
Inclusive address range compare mode For IAC1 and IAC2 debug events, if the 64-bit
1058
dcbt, dcbtls, dcbtep, icbt, icbtls, icbi, icblc, dcblc, and icbiep are all considered reads with respect to debug events. Note that dcbt, dcbtep, and icbt are treated as no-operations when they report Data Storage or Data TLB Miss exceptions, instead of being allowed to cause interrupts. However, these instructions are allowed to cause Debug interrupts, even when they would otherwise have been nooped due to a Data Storage or Data TLB Miss exception. dcbtst, dcbtstls, dcbtstep, dcbz, dcbzep, dcbi, dcbf, dcbfep, dcba, dcbst, and dcbstep are all considered writes with respect to
Exact address compare mode If the 64-bit address of the data storage access is equal to the value in the enabled Data Address Compare Register, a data address match occurs.
1059
Address bit match mode If the address of the data storage access, ANDed with the contents of the DAC2, are equal to the contents of the DAC1, also ANDed with the contents of the DAC2, a data address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode.
Inclusive address range compare mode If the 64-bit address of the data storage access is greater than or equal to the contents of the DAC1 and less than the contents of the DAC2, a data address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode.
Exclusive address range compare mode If the 64-bit address of the data storage access is less than the contents of the DAC1 or greater than or equal to the contents of the DAC2, a data address match occurs. For 64-bit implementations, the addresses are masked to compare only bits 32:63 when the thread is executing in 32-bit mode.
1060
BRT
1061
1062
Warning: Writing 0b01, 0b10, or 0b11 to these bits may cause a thread reset to occur. Instruction (ICMP) 0 1 Completion Debug Event
ICMP debug events are disabled ICMP debug events are enabled
1063
48
Return Debug Event Enable (RET) 0 1 RET debug events cannot occur RET debug events can occur
Note: Taken branches will not cause a BRT debug event if MSRDE=0. 38 Interrupt Taken Debug Event Enable (IRPT) 0 1 IRPT debug events are disabled IRPT debug events are enabled 49:56 57
Note: Return From Critical Interrupt will not cause an RET debug event if MSRDE=0. If the Embedded.Enhanced Debug category is supported, see Section 10.4.10 Reserved Critical Interrupt Taken Debug Event (CIRPT) [Category: Embedded.Enhanced Debug] A Critical Interrupt Taken Debug Event occurs when DBCR0CIRPT = 1 and a critical interrupt (any interrupt that uses the critical class, i.e. uses CSRR0 and CSRR1) occurs. 0 1 58 Critical interrupt taken debug events are disabled. Critical interrupt taken debug events are enabled.
Note: Critical interrupts will not cause an IRPT Debug event even if MSRDE=0. If the Embedded.Enhanced Debug category is supported, see Section 10.4.9. 39 Trap Debug Event Enable (TRAP) 0 1 40 TRAP debug events cannot occur TRAP debug events can occur
Instruction Address Compare 1 Debug Event Enable (IAC1) 0 1 IAC1 debug events cannot occur IAC1 debug events can occur
41
Instruction Address Compare 2 Debug Event Enable (IAC2) 0 1 IAC2 debug events cannot occur IAC2 debug events can occur
42
Instruction Address Compare 3 Debug Event Enable (IAC3) 0 1 IAC3 debug events cannot occur IAC3 debug events can occur 59:62 63
Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug] A Critical Interrupt Return Debug Event occurs when DBCR0CRET= 1 and a return from critical interrupt (an rfci instruction is executed) occurs. 0 1 Critical interrupt return debug events are disabled. Critical interrupt return debug events are enabled.
43
Instruction Address Compare 4 Debug Event Enable (IAC4) 0 1 IAC4 debug events cannot occur IAC4 debug events can occur
Implementation-dependent Freeze Timers on Debug Event (FT) 0 1 Enable clocking of timers Disable clocking of timers if any DBSR bit is set (except MRR) Virtualized Implementation Note The FT bit may not be supported in virtualized implementations.
44:45
Data Address Compare 1 Debug Event Enable (DAC1) 00 DAC1 debug events cannot occur 01 DAC1 debug events can occur only if a store-type data storage access 10 DAC1 debug events can occur only if a load-type data storage access 11 DAC1 debug events can occur on any data storage access
46:47
Data Address Compare 2 Debug Event Enable (DAC2) 00 DAC2 debug events cannot occur 01 DAC2 debug events can occur only if a store-type data storage access 10 DAC2 debug events can occur only if a load-type data storage access
1064
1065
Architecture Note For IAC12M inclusive (10) and exclusive (11) range comparisons, either DBCR0IAC1 or DBCR0IAC2 can be set to one in order to enable the range comparison. It is permissible for both DBCR0IAC1 and DBCR0IAC2 to be enabled. In that case, both DBSRIAC1 and DBSRIAC2 will be set for a range match. The same behavior holds for IAC34M with the appropriate substitutions.
1066
00 Exact address compare DAC1 debug events can occur only if the address of the data storage access is equal to the value specified in DAC1. DAC2 debug events can occur only if the address of the data storage access is equal to the value specified in DAC2. 01 Address bit match DAC1 and DAC2 debug events can occur only if the address of the data storage access, ANDed with the contents of DAC2 are equal to the contents of DAC1, also ANDed with the contents of DAC2. If DAC1USDAC2US or DAC1ERDAC2ER, results are boundedly undefined. 10 Inclusive address range compare DAC1 and DAC2 debug events can occur only if the address of the data storage access is greater than or equal to the value specified in DAC1 and less than the value specified in DAC2. If DAC1US DAC2US or DAC1ER DAC2ER, results are boundedly undefined. 11 Exclusive address range compare DAC1 and DAC2 debug events can occur only if the address of the data storage access is less than the value specified in DAC1 or is greater than or equal to the value specified in DAC2. If DAC1US DAC2US or DAC1ER DAC2ER, results are boundedly undefined. 42:43 44:45 Reserved Data Value Compare 1 Mode (DVC1M) 00 DAC1 debug events can occur 01 DAC1 debug events can occur only when all bytes specified in DBCR2DVC1BE in the data value of the data storage access match their corresponding bytes in DVC1 10 DAC1 debug events can occur only when at least one of the bytes specified in DBCR2DVC1BE in the data value of the
1067
Instruction Complete Debug Event (ICMP) Set to 1 if an Instruction Completion debug event occurred and DBCR0ICMP=1. See Section 10.4.5.
37
Branch Taken Debug Event (BRT) Set to 1 if a Branch Taken debug event See occurred and DBCR0BRT=1. Section 10.4.4.
38
Interrupt Taken Debug Event (IRPT) Set to 1 if an Interrupt Taken debug event occurred and DBCR0IRPT=1. See Section 10.4.6.
39
Trap Instruction Debug Event (TRAP) Set to 1 if a Trap Instruction debug event occurred and DBCR0TRAP=1. See Section 10.4.3.
40
1068
Critical Interrupt Return Debug Event (CRET) [Category: Embedded.Enhanced Debug] A Critical Interrupt Return Debug Event occurs when DBCR0CRET=1 and a return from critical interrupt (an rfci instruction is executed) occurs. 0 1 Critical interrupt return debug events are disabled. Critical interrupt return debug events are enabled.
59:63
Implementation-dependent
1069
A debug event may be enabled to occur upon loads, stores, or cache operations to an address specified in either the DAC1 or DAC2, inside or outside a range specified by the DAC1 and DAC2, or to blocks of addresses specified by the combination of the DAC1 and DAC1 (see Section 10.4.2). The contents of the Data Address Compare i Register (where i={1 or 2}) can be read into register RT using mfspr RT,DACi. The contents of register RS can be written to the Data Address Compare i Register using mtspr DACi,RS. The contents of the DAC1 or DAC2 are compared to the address generated by a data storage access instruction. These registers are hypervisor privileged.
Figure 79. Debug Status Register Write Register The DBSRWR is provided as a means to restore the contents of the DBSR on a partition switch. The DBSRWR is hypervisor privileged. Writing DBSRWR changes the value in the DBSR. Writing non-zero bits may enable an imprecise Debug exception which may cause later imprecise Debug Interrupts. In order to correctly write DBSRWR, software should ensure that MSRDE = 0 when the value is written and perform a context synchronizing operation before setting MSRDE to 1.
1070
XFX-form
19
0 6
DUI
11
DUIS
21
198
/
31
if enabled by implementation-dependent means then DUI implementation-dependent register halt thread else illegal instruction exception Execution of the dnh instruction causes the thread to stop fetching instructions and taking interrupts if execution of the instruction has been enabled. The contents of the DUI field are sent to the external debug facility to identify the reason for the halt. If execution of the dnh instruction has not been previously enabled, executing the dnh instruction produces an Illegal Instruction exception. The means by which execution of the dnh instruction is enabled is implementation-dependent. The current state of the debug facility, whether the thread is in IDM or EDM mode has no effect on the execution of the dnh instruction. The instruction is context synchronizing. Programming Note The DUIS field in the instruction may be used to pass information to an external debug facility. After the dnh instruction has executed, the instruction itself can be read back by the Illegal Instruction Interrupt handler or the external debug facility if the contents of the DUIS field are of interest. If the thread entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain the address of the dnh instruction which caused the handler to be invoked. Special Registers Altered: None
1071
1072
11.1 Overview
The Processor Control facility provides a mechanism for threads within a coherence domain to send messages to all devices in the coherence domain. The facility provides a mechanism for sending interrupts that are not dependent on the interrupt controller to threads and allows message filtering by the threads that receive the message. The Processor Control facility is also useful for sending messages to a device that provides specialized services such as secure boot operations controlled by a security device. The Processor Control facility defines how threads send messages and what actions threads take on the receipt of a message. The actions taken by devices other than threads are not defined.
To provide inter thread interrupt capability the following doorbell message types are defined: Processor Doorbell Processor Doorbell Critical Guest Processor Doorbell <E.HV> Guest Processor Doorbell Critical <E.HV> Guest Processor Doorbell Machine Check <E.HV> A doorbell message causes an interrupt to occur on threads when the message is received and the thread determines through examination of the payload that the message should be accepted. The examination of the payload for this purpose is termed filtering. The acceptance of a doorbell message causes an exception to be generated on the accepting thread. Threads accept and filter messages defined in Section 11.2.1. Threads may also accept other implementation-dependent defined messages.
1073
38:49
LPID Tag (LPIDTAG) <E.HV> The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register. PIR Tag (PIRTAG) The contents of this field are compared with bits 50:63 of the PIR register.
50:63
If category E.HV is supported by the thread on which a DBELL message is received, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below. If a DBELL message is received by a thread, the message is accepted and a Processor Doorbell exception is generated if one of the following conditions exist: This is a broadcast message (payloadBRDCAST=1); The message is intended for this thread (PIR50:63=payloadPIRTAG). The exception condition remains until a Processor Doorbell Interrupt is taken, or a msgclr instruction is executed on the receiving thread with a message type of DBELL. A change to any of the filtering criteria (i.e. changing the PIR register) will not clear a pending Processor Doorbell exception. DBELL messages are not cumulative. That is, if a DBELL message is accepted and the interrupt is pended because MSREE=0, additional DBELL messages that would be accepted are ignored until the Processor Doorbell exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL on the receiving thread.
Message types other than these and their associated actions are implementation-dependent.
1074
38:49
LPID Tag (LPIDTAG) <E.HV> The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register. PIR Tag (PIRTAG) The contents of this field are compared with bits 50:63 of the PIR register.
38:49
50:63
LPID Tag (LPIDTAG) The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register. PIR Tag (PIRTAG) The contents of this field are compared with bits 50:63 of the GPIR register.
If category E.HV is supported by the thread on which a DBELL_CRIT message is received, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below. If a DBELL_CRIT message is received by a thread, the message is accepted and a Processor Doorbell Critical exception is generated if one of the following conditions exist: This is a broadcast message (payloadBRDCAST=1); The message is intended for this thread (PIR50:63=payloadPIRTAG). DBELL_CRIT messages are not cumulative. That is, if a DBELL_CRIT message is accepted and the interrupt is pended because MSRCE=0, additional DBELL_CRIT messages that would be accepted are ignored until the Processor Doorbell Critical exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of DBELL_CRIT on the receiving thread.
50:63
When a G_DBELL message is received by a thread, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below. The message is accepted and a Guest Processor Doorbell exception is generated if one of the following conditions exist: This is a broadcast message (payloadBRDCAST=1); The message is intended for this thread (GPIR50:63=payloadPIRTAG). G_DBELL messages are not cumulative. That is, if a G_DBELL message is accepted and the interrupt is pended because MSRCE=0, additional G_DBELL messages that would be accepted are ignored until the Guest Processor Doorbell exception is cleared by taking the interrupt or cleared by executing a msgclr with a message type of G_DBELL on the receiving thread.
1075
38:49
LPID Tag (LPIDTAG) The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register. PIR Tag (PIRTAG) The contents of this field are compared with bits 50:63 of the GPIR register.
38:49
50:63
LPID Tag (LPIDTAG) The contents of this field are compared with the contents of the LPIDR. If LPIDTAG = 0, it matches all values in the LPIDR register. PIR Tag (PIRTAG) The contents of this field are compared with bits 50:63 of the GPIR register.
50:63
When a G_DBELL_CRIT message is received by a thread, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below. If a G_DBELL_CRIT message is received by a thread, the message is accepted and a Guest Processor Doorbell Critical exception is generated if one of the following conditions exist: This is a broadcast message (payloadBRDCAST=1); The message is intended for this thread (GPIR50:63=payloadPIRTAG). G_DBELL_CRIT messages are not cumulative. That is, if a G_DBELL_CRIT message is accepted and the interrupt is pended because MSRCE=0, additional G_DBELL messages that would be accepted are ignored until the Guest Processor Doorbell Critical exception is cleared by taking the interrupt or cleared
When a G_DBELL_MC message is received by a thread, the message will only be accepted if it is for this partition (payloadLPIDTAG = LPIDR) or it is for all partitions (payloadLPIDTAG = 0) and it meets the additional criteria for acceptance below. If a G_DBELL_MC message is received by a thread, the message is accepted and a Guest Processor Doorbell Machine Check exception is generated if one of the following conditions exist: This is a broadcast message (payloadBRDCAST=1); The message is intended for this thread (GPIR50:63=payloadPIRTAG). G_DBELL_MC messages are not cumulative. That is, if a G_DBELL_MC message is accepted and the interrupt is pended because MSRCE=0, additional G_DBELL_MC messages that would be accepted are
1076
1077
Message Send
msgsnd 31
0 6
X-form
Message Clear
msgclr RB ///
6 11
X-form
RB ///
11
///
16
RB
21
206
/
31 0
31
///
16
RB
21
238
/
31
msgtype GPR(RB)32:36 payload GPR(RB)37:63 send_msg_to_choherence_domain(msgtype, payload) msgsnd sends a message to all devices in the coherence domain. The message contains a type and a payload. The message type (msgtype) is defined by the contents of RB32:36 and the message payload is defined by the contents of RB37:63. Message delivery is reliable and guaranteed. Each device may perform specific actions based on the message type and payload or may ignore messages. Consult the implementation users manual for specific actions taken based on message type and payload. For threads, actions taken on receipt of a message are defined in Section 11.2.1. For storage access ordering, msgsnd is treated as a Store with respect to memory barriers. This instruction is hypervisor privileged. Special Registers Altered: None
msgtype GPR(RB)32:36 clear_received_message(msgtype) msgclr clears a message of msgtype previously accepted by the thread executing the msgclr. msgtype is defined by the contents of RB32:36. A message is said to be cleared when a pending exception generated by an accepted message has not yet taken its associated interrupt. If a pending exception exists for msgtype that exception is cleared at the completion of the msgclr instruction. For threads, the types of messages that can be cleared are defined in Section 11.2.1. This instruction is hypervisor privileged. Special Registers Altered: None Programming Note Execution of a msgclr instruction that clears a pending exception when the associated interrupt is masked because the interrupt enable (MSREE or MSRCE) is not set to 1 will always clear the pending exception (and thus the interrupt will not occur) if a subsequent instruction causes MSREE or MSRCE to be set to 1.
1078
1079
Instruction or Event interrupt rfi rfci rfmci rfdi[Category:E.ED] rfgi sc mtmsr (GS) mtspr (LPIDR) mtspr (GIVPR) mtspr (DBSRWR) mtspr (EPCR) mtspr (GIVORi) mtmsr (CM) mtmsr (UCLE) mtmsr (SPV) mtmsr (CE) mtmsr (EE) mtmsr (PR) mtmsr (FP) mtmsr (DE) mtmsr (ME) mtmsr (FE0) mtmsr (FE1) mtmsr (IS) mtspr (DEC) mtspr (PID) mtspr (IVPR) mtspr (DBSR) mtspr (DBCR0,DBCR1) mtspr (IAC1,IAC2,IAC3, IAC4) mtspr (IVORi) mtspr (TSR) mtspr (TCR) mtspr (MMUCSR0) TLB invalidate all mtspr (MCIVPR) tlbilx Store(PTE) tlbivax tlbwe wrtee wrteei
Required Before none none none none none none none none none none none none none none none none none none none none none none none none none none none none ----
Required After none none none none none none none CSI CSI none CSI CSI none none none CSI none none CSI CSI CSI CSI CSI CSI CSI none CSI none ----
Notes
4 4
Instruction or Event interrupt rfi rfci rfmci rfdi[Category:E.ED] rfgi sc mtmsr (GS) mtspr (LPIDR) mtmsr (CM) mtmsr (PR) mtmsr (ME) mtmsr (DS) mtspr (PID) mtspr (DBSR) mtspr (DBCR0,DBCR2) mtspr (DAC1,DAC2, DVC1,DVC2) mtspr (MMUCSR0) TLB invalidate all tlbilx Store(PTE) tlbivax tlbwe
Required Before none none none none none none none none CSI none none none none CSI -----
Required After none none none none none none none CSI CSI CSI CSI CSI CSI CSI -----
Notes
5 5 5
2 7 2 5 5 5
CSI, or CSI and sync CSI, or CSI and sync {sync, CSI} CSI, or CSI and sync CSI, or CSI and sync
Notes:
1. There are additional software synchronization requirements for this instruction in multi-threaded environments (e.g., it may be necessary to invalidate one or more TLB entries on all threads in the system and to be able to determine that the invalidations have completed and that all side effects of the invalidations have taken effect); it is also necessary to execute a tlbsync instruction. 2. The alteration must not cause an implicit branch in real address space. Thus the real address of the context-altering instruction and of each subsequent instruction, up to and including the next context synchronizing instruction, must be independent of whether the alteration has taken effect. 3. A context synchronizing instruction is required after altering MSRME to ensure that the alteration takes effect for subsequent Machine Check interrupts, which may not be recoverable and therefore
none none none none none none none none none none none
none none none CSI, or CSI and sync none CSI, or CSI and sync {sync, CSI} CSI, or CSI and sync CSI, or CSI and sync none none
7 7 6,9
1080
Programming Note The following sequence illustrates why it is necessary, for data accesses, to ensure that all storage accesses due to instructions before the tlbwe or tlbivax have completed to a point at which they have reported all exceptions they will cause. Assume that valid TLB entries exist for the target storage location when the sequence starts. A program issues a load or store to a page. The same program executes a tlbwe or tlbivax that invalidates the corresponding TLB entry. The Load or Store instruction finally executes, and gets a TLB Miss exception. The TLB Miss exception is semantically incorrect. In order to prevent it, a context synchronizing instruction must be executed between steps 1 and 2. 7. The elapsed time between the Decrementer reaching zero, or the transition of the selected Time Base bit for the Fixed-Interval Timer or the Watchdog Timer, and the signalling of the Decrementer, Fixed-Interval Timer or the Watchdog Timer exception is not defined. 8. The notation {sync, CSI} denotes an instruction sequence. Other instructions may be interleaved with this sequence, but these instructions must appear in the order shown. No software synchronization is required before the Store instruction because (a) stores are not performed out-of-order and (b) address translations associated with instructions preceding the Store instruction are not performed again after the store has been performed (see Section 5.5). These properties ensure that all address translations associated with instructions preceding the Store instruction will be performed using the old contents of the PTE. The sync instruction after the Store instruction ensures that all lookups of the Page Table that are performed after the sync instruction completes will use the value stored (or a value stored subsequently). The context synchronizing instruction after the sync instruction ensures that any address translations associated with instructions following the context synchronizing instruction that were performed using the old contents of the PTE will be discarded, with the result that these address translations will be performed again and, if there is no corresponding entry in any implementation-specific address translation lookaside information, will use the value stored (or a value stored subsequently).
1081
1082
X-form
X-form
CT / CT
11
///
16
///
21
454
/
31
CT
11
///
16
///
21
966
/
31
6 7
6 7
If CT is not supported by the implementation, this instruction designates the primary data cache as the target data cache. If CT is supported by the implementation, let CT designate either the primary data cache or another level of the data cache hierarchy, as specified in Section 4.3, Cache Management Instructions, in Book II, as the target data cache. The contents of the target data cache of the thread executing the dci instruction are invalidated. Software must place a sync instruction before the dci to guarantee all previous data storage accesses complete before the dci is performed. Software must place a sync instruction after the dci to guarantee that the dci completes before any subsequent data storage accesses are performed. This instruction is hypervisor privileged. Special Registers Altered: None Extended Mnemonics: Extended mnemonic for Data Cache Invalidate Extended: dccci Equivalent to: dci 0
If CT is not supported by the implementation, this instruction designates the primary instruction cache as the target instruction cache. If CT is supported by the implementation, let CT designate either the primary instruction cache or another level of the instruction cache hierarchy, as specified in Section 4.3, Cache Management Instructions, in Book II, as the target instruction cache. The contents of the target instruction cache of the thread executing the ici instruction are invalidated. Software must place a sync instruction before the ici to guarantee all previous instruction storage accesses complete before the ici is performed. Software must place an isync instruction after the ici to invalidate any instructions that may have already been fetched from the previous contents of the instruction cache after the isync. This instruction is hypervisor privileged. Special Registers Altered: None Extended Mnemonics: Extended mnemonic for Instruction Cache Invalidate Extended: iccci Equivalent to: ici 0
1083
Figure 80. Data Cache Debug Tag Register High Programming Note An example implementation of DCDBTRH could have the following content and format. Bit(s) 32:55 Description Tag Real Address (TRA) Bits 0:23 of the lower 32 bits of the 36-bit real address associated with this cache block Valid (V) The valid indicator for the cache block (1 indicates valid) Reserved Tag Extended Real Address (TERA) Upper 4 bits of the 36-bit real address associated with this cache block
Figure 81. Data Cache Debug Tag Register Low Programming Note An example implementation of DCDBTRL could have the following content and format. Bit(s) 32:44 45 46:47 48:51 52:55 56:59 Description Reserved (TRA) U bit parity (UPAR) Tag parity (TPAR) Data parity (DPAR) Modified (dirty) parity (MPAR) Dirty Indicators (D) The dirty (modified) indicators for each of the four doublewords in the cache block U0 Storage Attribute (U0) The U0 storage attribute for the page associated with this cache block U1 Storage Attribute (U1) The U1 storage attribute for the page associated with this cache block U2 Storage Attribute (U2) The U2 storage attribute for the page associated with this cache block U3 Storage Attribute (U3) The U3 storage attribute for the page associated with this cache block
56
57:59 60:63
60
Implementations may support different content and format based on their cache implementation. This register is hypervisor privileged.
61
62
63
Implementations may support different content and format based on their cache implementation. This register is hypervisor privileged.
1084
Figure 82. Instruction Cache Debug Data Register This register is hypervisor privileged.
Figure 84. Instruction Cache Debug Tag Register Low Programming Note
An example implementation of ICDBTRL could have the following content and format. Bit(s) 32:53 54 Description Reserved Translation Space (TS) The address space portion of the virtual address associated with this cache block. Translation ID Disable (TD) TID Disable field for the memory page associated with this cache block Translation ID (TID) TID field portion of the virtual address associated with this cache block
55
Figure 83. Instruction Cache Debug Tag Register High Programming Note An example implementation of ICDBTRH could have the following content and format. Bit(s) 32:55 Description Tag Effective Address (TEA) Bits 0:23 of the 32-bit effective address associated with this cache block Valid (V) The valid indicator for the cache block (1 indicates valid) Tag parity (TPAR) Instruction Data parity (DPAR) Reserved 56:63
Other implementations may support different content and format based on their cache implementation. This register is hypervisor privileged.
56
57:58 59 60:63
Implementations may support different content and format based on their cache implementation. This register is hypervisor privileged.
1085
X-form
sync
RT,RA,RB RT
11
# ensure that all previous # cache operations have # completed regT,regA,regB# read cache information; # ensure dcread completes # before attempting to # read results regD,dcdbtrh # move high portion of tag # into GPR D regE,dcdbtrl # move low portion of tag # into GPR E
RA
16
RB
21
486
/
31
dcread isync
[Alternative Encoding] 31
0 6
RT
11
RA
16
RB
21
326
/
31
mfspr mfspr
if RA = 0 then b 0 else b (RA) EA b + (RB) log2(cache size) C B log2(cache block size) IDX EA64-C:63-B WD EA64-B:61 RT0:31 undefined RT32:63 (data cache data)[IDX]WD32:WD32+31 DCDBTRH (data cache tag high)[IDX] DCDBTRL (data cache tag low)[IDX] Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB. Let C = log2(cache size in bytes). Let B = log2(cache block size in bytes). EA64-C:63-B selects one of the 2C-B data cache blocks. EA64-B:61 selects one of the data words in the selected data cache block. The selected word in the selected data cache block is placed into register RT. The contents of the data cache directory entry associated with the selected data cache block are placed into DCDBTRH and DCDBTRL (see Figure 80 and Figure 81). dcread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the dcread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the dcread instruction, a sequence such as the following must be used:
This instruction is hypervisor privileged. Special Registers Altered: DCDBTRH DCDBTRL Programming Note dcread can be used by a debug tool to determine the contents of the data cache, without knowing the specific addresses of the blocks which are currently contained within the cache. Programming Note Execution of dcread before the data cache has completed all cache operations associated with previously executed instructions (such as block fills and block flushes) is undefined.
1086
X-form
Programming Note icread can be used by a debug tool to determine the contents of the instruction cache, without knowing the specific addresses of the blocks which are currently contained within the cache.
RA,RB ///
11
RA
16
RB
21
998
/
31
if RA = 0 then b 0 else b (RA) EA b + (RB) log2(cache size) C B log2(cache block size) IDX EA64-C:63-B WD EA64-B:61 ICDBDR (instruction cache data)[IDX]WD32:WD32+31 ICDBTRH (instruction cache tag high)[IDX] ICDBTRL (instruction cache tag low)[IDX] Let the effective address (EA) be the sum of the contents of register RA, or 0 if RA is equal to 0, and the contents of register RB. Let C = log2(cache size in bytes). Let B = log2(cache block size in bytes). EA64-C:63-B selects one of the 2C-B instruction cache blocks. EA64-B:61 selects one of the data words in the selected instruction cache block. The selected word in the selected instruction cache block is placed into ICDBDR. The contents of the instruction cache directory entry associated with the selected cache block are placed into ICDBTRH and ICDBTRL (see Figure 83 and Figure 84). icread requires software to guarantee execution synchronization before subsequent mfspr instructions can read the results of the icread instruction into GPRs. In order to guarantee that the mfspr instructions obtain the results of the icread instruction, a sequence such as the following must be used: icread isync regA,regB # read cache information # ensure icread completes # before attempting to # read results # move instruction # information into GPR C # move high portion of # tag into GPR D # move low portion of tag # into GPR E
This instruction is hypervisor privileged. Special Registers Altered: ICDBDR ICDBTRH ICDBTRL
1087
1088
1089
Table 14:Extended mnemonics for moving to/from an SPR Special Purpose Register Fixed-Point Exception Register Link Register Count Register Decrementer Save/Restore Register 0 Save/Restore Register 1 Special Purpose Registers G0 through G3 Time Base [Lower] Time Base Upper PPR32 Processor Version Register
1
Move To SPR Extended mtxer Rx mtlr Rx mtctr Rx mtdec Rx mtsrr0 Rx mtsrr1 Rx mtsprg n,Rx mttbl Rx mttbu Rx mtppr32 Rx Equivalent to mtspr 1,Rx mtspr 8,Rx mtspr 9,Rx mtspr 22,Rx mtspr 26,Rx mtspr 27,Rx mtspr 272+n,Rx mtspr 284,Rx mtspr 285,Rx mtspr 898, Rx -
Move From SPR Extended mfxer Rx mflr Rx mfctr Rx mfdec Rx mfsrr0 Rx mfsrr1 Rx mfsprg Rx,n mftb Rx mftbu Rx mfppr32 Rx mfpvr Rx Equivalent to mfspr Rx,1 mfspr Rx,8 mfspr Rx,9 mfspr Rx,22 mfspr Rx,26 mfspr Rx,27 mfspr Rx,272+n mftb Rx,2681 mfspr Rx,268 mftb Rx,2691 mfspr Rx,269 mfspr Rx, 8982 mfspr Rx,287
The mftb instruction is Category: Phased-Out. Assemblers targeting version 2.03 or later of the architecture should generate an mfspr instruction for the mftb and mftbu extended mnemonics; see the corresponding Assembler Note in the mftb instruction description (see Section 5.2.1 of Book II). 2 Category: Phased-In
1090
1091
1092
Appendix C. Guidelines for 64-bit Implementations in 32-bit Mode and 32-bit Implementations
C.1 Hardware Guidelines
C.1.1 64-bit Specific Instructions
The instructions in the Category: 64-Bit are considered restricted only to 64-bit processing. A 32-bit implementation need not implement the group; likewise, the 32-bit applications will not utilize any of these instructions. All other instructions shall either be supported directly by the implementation, or sufficient infrastructure will be provided to enable software emulation of the instructions. A 64-bit implementation that is executing in 32-bit mode may choose to take an Unimplemented Instruction Exception when these 64-bit specific instructions are executed. Branch to Link Register and Branch to Count Register instructions, given the LR and CTR are implemented only as 32-bit registers, only concatenating 2 0s to the right of bits 32:61 of these registers is necessary to form the 32-bit branch target address. For next sequential instruction address computation, the behavior is the same as for 64-bit implementations in 32-bit mode.
1093
1094
1095
Figure 85. Thread States and PMLCan Bit Settings Two unconditional counting modes may be specified: Counting is unconditionally enabled regardless of the states of MSRPMM and MSRPR. This can be accomplished by setting PMLCanFCS, PMLCanFCU, PMLCanFCM1, and PMLCanFCM0 to 0 for each counter control. Counting is unconditionally disabled regardless of the states of MSRPMM and MSRPR. This can be accomplished by setting PMGC0FAC to 1 or by setting PMLCanFC to 1 for each counter control. Alternatively, this can be accomplished by setting PMLCanFCM1 to 1 and PMLCanFCM0 to 1 for each counter control or by setting PMLCanFCS to 1 and PMLCanFCU to 1 for each counter control. Programming Note Events may be counted in a fuzzy manner. That is, events may not be counted precisely due to the nature of an implementation. Users of the Performance Monitor facility should be aware that an event may be counted even if it was precisely filtered, though it should not have been. In general such discrepancies are statistically unimportant and users should not assume that counts are explicitly accurate.
1096
Programming Note Event name and event numbers will differ greatly across implementations and software should not expect that events and event names will be consistent.
Programming Note When taking a Performance Monitor interrupt software should clear the overflow condition by reading the counter register and setting the counter register to a non-overflow value since the normal return from the interrupt will set MSREE back to 1.
D.2.4 Thresholds
Thresholds are values that must be exceeded for an event to be counted. Threshold values are programmed in the PMLCb0..15THRESHOLD field. The events which may be thresholded and the units of each event that may be thresholded are implementation-dependent. Programming a threshold value for an event that is not defined to use a threshold gives boundedly undefined results.
Figure 86. [User] Performance Control Register 0 These bits are interpreted as follows: Bit 32 Description
Monitor
Global
Freeze All Counters (FAC) The FAC bit is sticky; that is, once set to 1 it remains set to 1 until it is set to 0 by an mtpmr instruction. 0 The PMCs can be incremented (if enabled by other Performance Monitor control fields). The PMCs can not be incremented. Monitor Interrupt Enable
1 33
Performance (PMIE) 0
Performance Monitor interrupts are disabled. Performance Monitor interrupts are enabled and occur when an enabled condition or event occurs. Enabled conditions and events are described in Section D.2.5.
Freeze Counters on Enabled Condition or Event (FCECE) Enabled conditions and events are described in Section D.2.5. 0 The PMCs can be incremented (if enabled by other Performance Monitor control fields). The PMCs can be incremented (if enabled by other Performance Monitor control fields) only until an enabled condition or event occurs. When an enabled condition or event occurs, PMGC0FAC is set to 1. It is the users responsibility to set PMGC0FAC to 0.
1097
The UPMGC0 register is an alias to the PMGC0 register for user mode read only access.
It is recommended that CE be set to 0 when counter PMCn is selected for chaining; see Section D.5.1. 38 Freeze Counter in Guest State (FCGS) 0 1 39 The PMC is incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRGS is 1. The PMC is incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRGS is 0 and MSRPR is 0.
Monitor
Local 1 40 41:47
PMLCa is set to 0 at reset. These bits are interpreted as follows: Bit 32 Description Freeze Counter (FC) 0 The PMC can be incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented.
Reserved Event Selector (EVENT) Up to 128 events selectable; see Section D.2.3. Setting is implementation-dependent. Reserved
48:53 54:63
1 33
Freeze Counter in Supervisor State (FCS) 0 1 The PMC is incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRPR is 0. The PMC can be incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRPR is 1. The PMC can be incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRPMM is 1.
The UPMLCa0..15 registers are aliases to the PMLCa0..15 registers for user mode read only access.
34
1 35
1 36
Monitor
Local
Freeze Counter while Mark is Cleared (FCM0) 0 The PMC can be incremented (if enabled by other Performance Monitor control fields). The PMC can not be incremented if MSRPMM is 0.
PMLCb is set to 0 at reset. These bits are interpreted as follows: Bit 32:52 53:55 Description Reserved Threshold Multiple (THRESHMUL) 000 Threshold field is multiplied by 1 (THRESHOLD 1)
1 37
1098
Counter Value (CV) Indicates the number of occurrences of the specified event.
56:57 58:63
Reserved Threshold (THRESHOLD) Only events that exceed the value THRESHOLD multiplied as described by THRESHMUL are counted. Events to which a threshold value applies are implementation-dependent as are the unit (for example duration in cycles) and the granularity with which the threshold value is interpreted.
The minimum value for a counter is 0 (0x0000_0000) and the maximum value is 4,294,967,295 (0xFFFF_FFFF). A counter can increment up to the maximum value and then wraps to the minimum value. A counter enters the overflow state when the high-order bit is set to 1, which normally occurs only when the counter increments from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648 (0x8000_0000). Several different actions may occur when an overflow state is reached, depending on the configuration: If PMLCanCE is 0, no special actions occur on overflow: the counter continues incrementing, and no exception is signaled. If PMLCanCE and PMGC0FCECE are 1, all counters are frozen when PMCn overflows. If PMLCanCE, PMGC0PMIE, and MSREE are 1, an exception is signalled when PMCn reaches overflow. Note that the interrupts are masked by setting MSREE to 0. An overflow condition may be present while MSREE is zero, but the interrupt is not taken until MSREE is set to 1. If an overflow condition occurs while MSREE is 0 (the exception is masked), the exception is still signalled once MSREE is set to 1 if the overflow condition is still present and the configuration has not been changed in the meantime to disable the exception; however, if MSREE remains 0 until after the counter leaves the overflow state (MSB becomes 0), or if MSREE remains 0 until after PMLCanCE or PMGC0PMIE are set to 0, the exception does not occur. Programming Note Loading a PMC with an overflowed value can cause an immediate exception. For example, if PMLCanCE, PMGC0PMIE, and MSREE are all 1, and an mtpmr loads an overflowed value into a PMCn that previously held a non-overflowed value, then an interrupt will be generated before any event counting has occurred. The following sequence is generally recommended for setting the counter values and configurations. 1. Set PMGC0FAC to 1 to freeze the counters. 2. Perform a series of mtpmr operations to initialize counter values and configure the control registers 3. Release the counters by setting PMGC0FAC to 0 with a final mtpmr.
Programming Note By varying the threshold value, software can obtain a profile of the event characteristics subject to thresholding. For example, if PMC1 is configured to count cache misses that last longer than the threshold value, software can measure the distribution of cache miss durations for a given program by monitoring the program repeatedly using a different threshold value each time. The UPMLCb0..15 registers are aliases to the PMLCb0..15 registers for user mode read only access.
Figure 89. [User] Performance Monitor Counter Registers PMCs are set to 0 at reset. These bits are interpreted as follows: Bit 32 Description Overflow (OV)
1099
RT,PMRN RT
11
pmrn
21
334
/
31 0
31
pmrn
21
462
/
31
n pmrn5:9 || pmrn0:4 if length(PMR(n)) = 64 then RT PMR(n) else 320 || PMR(n) RT 32:63 Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers. The contents of the designated Performance Monitor Register are placed into register RT. The list of defined Performance Monitor Registers and their privilege class is provided in Figure 90. Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSRPR=1 will result in a Privileged Instruction exception. Category: Embedded.Hypervisor] If MSRPPMMP = 1 and MSRGS = 1, execution of this instruction specifying a defined Performance Monitor Register sets RT to 0. Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will produce an undefined value for register RT. Special Registers Altered: None
n pmrn5:9 || pmrn0:4 if length(PMR(n)) = 64 then PMR(n) (RS) else (RS)32:63 PMR(n) Let PMRN denote a Performance Monitor Register number and PMR the set of Performance Monitor Registers. The contents of the register RS are placed into the designated Performance Monitor Register. The list of defined Performance Monitor Registers and their privilege class is provided in Figure 90. Execution of this instruction specifying a defined and privileged Performance Monitor Register when MSRPR=1 will result in a Privileged Instruction exception. [Category: Embedded.Hypervisor] If MSRPPMMP = 1 and MSRGS = 1 and MSRPR = 0, execution of this instruction specifying a defined Performance Monitor Register results in a Embedded Hypervisor Privilege exception. Execution of this instruction specifying an undefined Performance Monitor Register will either result in an Illegal Instruction exception or will perform no operation. Special Registers Altered: None
PMR1 Privileged Register Name Cat pmrn5:9 pmrn0:4 mtpmr mfpmr 0-15 00000 0xxxx PMC0..15 no E.PM 16-31 00000 1xxxx PMC0..15 yes yes E.PM 128-143 00100 0xxxx PMLCA0..15 no E.PM 144-159 00100 1xxxx PMLCA0..15 yes yes E.PM 256-271 01000 0xxxx PMLCB0..15 no E.PM 272-287 01000 1xxxx PMLCB0..15 yes yes E.PM 384 01100 00000 PMGC0 no E.PM 400 01100 10000 PMGC0 yes yes E.PM - This register is not defined for this instruction. 1 Note that the order of the two 5-bit halves of the PMR number is reversed. decimal Figure 90. Embedded.Peformance Monitor PMRs
1100
Rx,pmctr1 #load from upper counter Ry,pmctr0 #load from lower counter Rz,pmctr1 #load from upper counter cr0,0,Rz,Rx #see if old = new 4,2,loop #loop if carry occurred between reads
The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The above sequence is not necessary if the counters are frozen.
D.5.2 Thresholding
Threshold event measurement enables the counting of duration and usage events. Assume an example event, dLFB load miss cycles, requires a threshold value. A dLFB load miss cycles event is counted only when the number of cycles spent recovering from the miss is greater than the threshold. If the event is counted on two counters and each counter has an individual threshold, one execution of a performance monitor program can sample two different threshold values. Measuring code performance with multiple concurrent thresholds expedites code profiling significantly.
1101
1102
Book VLE: Power ISA Operating Environment Architecture Variable Length Encoding (VLE) Environment [Category: Variable Length Encoding]
1103
1104
1107 1107 1107 1107 1107 1107 1107 1107 1107 1107 1107 1107 1108 1108
This chapter describes computation modes, document conventions, a processor overview, instruction formats, storage addressing, and instruction addressing.
standard instruction encodings and VLE instructions for that page of memory. Instruction encodings in pages marked as VLE are either 16 or 32 bits long, and are aligned on 16-bit boundaries. Because of this, all instruction pages marked as VLE are required to use Big-Endian byte ordering. The programming model uses the same register set with both instruction set encodings, although some registers are not accessible by VLE instructions using the 16-bit formats and not all condition register (CR) fields are used by Conditional Branch instructions or instructions that access the condition register executing from a VLE instruction page. In addition, immediate fields and displacements differ in size and use, due to the more restrictive encodings imposed by VLE instruction formats. VLE additional instruction fields are described in Section 1.4.19, Instruction Fields. Other than the requirement of Big-Endian byte ordering for instruction pages and the additional storage attribute to identify whether the instruction page corresponds to a VLE section of code, VLE complies with the memory model, register model, timer facilities, debug facilities, and interrupt/exception model defined
1.1 Overview
Variable Length Encoding (VLE) is a code density optimized re-encoding of much of the instruction set defined by Books I, II, and III-E using both 16-bit and 32-bit instruction formats. VLE offers more efficient binary representations of applications for the embedded processor spaces where code density plays a major role in affecting overall system cost, and to a somewhat lesser extent, performance. VLE is a supplement to the instruction set defined by Book I-III and code pages using VLE encoding or nonVLE encoding can be intermingled in a system providing focus on both high performance and code density where most needed. VLE provides alternative encodings to instructions defined in Books I-III to enable reduced code footprint. This set of alternative encodings is selected on a page basis. A single storage attribute bit selects between
1105
BD8 BD8
LK
C instruction format
OPCD Figure 3.
X O
UI5
RX
X O R C
OIM5 OIM5
RX RX
OPCD Figure 5.
UI7
RX
1106
1.4.12 ESC-form
0 6 11 16 21 31
OPCD
//
//
ELEV
XO
OPCD Figure 6.
XO
RX
1.4.13 I16A-form
R instruction format
0 6 11 16 21 31
OPCD OPCD
si ui
RA RA
XO XO
si ui
Figure 12. I16A instruction format OPCD OPCD OPCD OPCD Figure 7.
XO
X R O C
XO XO
RY RY ARY RY
RX RX RX ARX
1.4.14 I16L-form
0 6 11 16 21 31
OPCD
RT
ui
XO
ui
RR instruction format
1.4.15 M-form
0 6 11 16 21 26 31
RZ
RX
OPCD OPCD
RS RS
RA RA
SH SH
MB MB
ME ME
X O X O
1.4.9 BD15-form
0 6 10 12 16 31
1.4.16 SCI8-form
0 6 11 16 21 22 24 31
OPCD Figure 9.
BD15
LK
1.4.10 BD24-form
0 6 7 31
RT RT RS RS
000 BF32 001 BF32
XO
RA RA RA RA RA RA RA
OPCD 0
BD24
1.4.11 D8-form
0 6 11 16 24 31
1.4.17 LI20-form
0 6 11 16 17 21 31
OPCD
RT
li20
XO li20
li20
OPCD OPCD
RT RS
RA RA
XO XO
D8 D8
1107
1.4.18 X-form
0 6 9 11 16 21 31
OPCD
BF 0
RA
RB
XO
BF32 (9:10) Field used to specify one of the Condition Register fields to be used as a target of a compare instruction. D8 (24:31) The D8 field is a 8-bit signed displacement which is sign-extended to 64 bits. ELEV (16:20) Field used by the e_sc instruction. F (21) Fill value used to fill the remaining 56 bits of a scaled-immediate 8 value.
LI20 (17:20 || 11:15 || 21:31) A 20-bit signed immediate value which is signextended to 64 bits for the e_li instruction. LK (7, 15, 31) LINK bit. 0 1 Do not set the Link Register. Set the Link Register. The sum of the value 2 or 4 and the address of the Branch instruction is placed into the Link Register.
BD8 (8:15), BD15 (16:30), BD24 (7:30) Immediate field specifying a signed two's complement branch displacement which is concatenated on the right with 0b0 and signextended to 64 bits. BD15. (Used by 32-bit branch conditional class instructions) A 15-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address. BD24. (Used by 32-bit branch class instructions) A 24-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address. BD8. (Used by 16-bit branch and branch conditional class instructions) An 8-bit signed displacement that is sign-extended and shifted left one bit (concatenated with 0b0) and then added to the current instruction address to form the branch target address. BI16 (6:7), BI32 (12:15) Field used to specify one of the Condition Register fields to be used as a condition of a Branch Conditional instruction. BO16 (5), BO32 (10:11) Field used to specify whether to branch if the condition is true, false, or to decrement the Count Register and branch if the Count Register is not zero in a Branch Conditional instruction.
OIM5 (7:11) Offset Immediate field used to specify a 5-bit unsigned fixed-point value in the range [1:32] encoded as [0:31]. Thus the binary encoding of 0b00000 represents an immediate value of 1, 0b00001 represents an immediate value of 2, and so on. OPCD (0:3, 0:4, 0:5, 0:9, 0:14, 0:15) Primary opcode field.
Rc (6, 7, 20, 31) RECORD bit. 0 1 Do not alter the Condition Register. Set Condition Register Field 0.
RX (12:15) Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source or as a destination. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc. RY (8:11) Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc. RZ (8:11) Field used to specify a General Purpose Register in the ranges R0:R7 or R24:R31 to be used as a source or as a destination for load/ store data. R0 is encoded as 0b0000, R1 as 0b0001, etc. R24 is encoded as 0b1000, R25 as 0b1001, etc. SCL (22:23) Field used to specify a scale amount in Imme-
1108
1109
1110
A program references memory using the effective address (EA) computed by the processor when it executes a Storage Access or Branch instruction (or certain other instructions described in Book II and Book IIIE), or when it fetches the next sequential instruction.
1111
Sequential instruction fetching (or The value 4 [2] is added to the address of the current 32-bit [16-bit] instruction to non-taken branch instructions) form the EA of the next instruction. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC [0xFFFF_FFFF_FFFF_FFFE] in 64-bit mode or 0xFFFF_FFFC [0xFFFF_FFFE] in 32-bit mode, the address of the next sequential instruction is undefined. Any Branch instruction with LK = 1 (32-bit instruction format) Branch se_bl. se_blrl. se_bctrl instructions (16-bit instruction format) The value 4 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode o r0xFFFF_FFFC in 32-bit mode, the result placed into the LR is undefined. The value 2 is added to the address of the current branch instruction and the result is placed into the LR. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFE in 64-bit mode or 0xFFFF_FFFE in 32-bit mode, the result placed into the LR is undefined. attribute bit set to 0. If a Mismatched Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur setting SRR0(GSRR0) to the misaligned address for which execution was attempted. A Byte Ordering Instruction Storage Exception occurs when an implementation which supports VLE attempts to execute an instruction that has the VLE storage attribute set to 1 and the E (Endian) storage attribute set to 1 for the page that corresponds to the effective address of the instruction. If a Byte Ordering Instruction Storage Exception is detected and no higher priority exception exists, an Instruction Storage Interrupt will occur setting SRR0(GSRR0) to the address for which execution was attempted.
1112
1113
1114
3.1 Overview
Category VLE uses the same semantics as Books IIII. Due to the limited instruction encoding formats, VLE instructions typically support reduced immediate fields and displacements, and not all operations defined by Books IIII are encoded in category VLE. The basic philosophy is to capture all useful operations, with most frequent operations given priority. Immediate fields and displacements are provided to cover the majority of ranges encountered in embedded control code. Instructions are encoded in either a 16- or 32-bit format, and these may be freely intermixed. VLE instructions cannot access floating-point registers (FPRs). VLE instructions use GPRs and SPRs with the following limitations: VLE instructions using the 16-bit formats are limited to addressing GPR0GPR7, and GPR24 GPR31 in most instructions. Move instructions are provided to transfer register contents between these registers and GPR8GPR23. VLE compare and bit test instructions using the 16-bit formats implicitly set their results in CR0. VLE instruction encodings are generally different than instructions defined by Books IIII, except that most instructions falling within primary opcode 31 are encoded identically and have identical semantics unless they affect or access a resource not supported by category VLE.
1115
1116
4.1.2 Link Register (LR) . . . . . . . . . . 4.1.3 Count Register (CTR) . . . . . . . 4.2 Branch Instructions . . . . . . . . . . . 4.3 System Linkage Instructions . . . . 4.4 Condition Register Instructions . .
A specified CR field can be set as the result of a fixed-point compare instruction. CR field 0 can be set as the result of a fixed-point bit test instruction. Other instructions from implemented categories may also set bits in the CR in the same manner that they would when not in VLE mode. Instructions are provided to perform logical operations on individual CR bits and to test individual CR bits. For all fixed-point instructions in which the Rc bit is defined and set, and for e_add2i., e_and2i.,and e_and2is., the first three bits of CR field 0 (CR32:34) are set by signed comparison of the result to zero, and the fourth bit of CR field 0 (CR35) is copied from the final state of XERSO. Result here refers to the entire 64-bit value placed into the target register in 64-bit mode, and to bits 32:63 of the value placed into the target register in 32-bit mode. if (64-bit mode) 0 then M 32 else M if (target_register)M:63 < 0 then c else if (target_register)M:63 > 0 then c else c CR0 c || XERSO
Figure 18. Condition Register The bits in the Condition Register are grouped into eight 4-bit fields, CR Field 0 (CR0) ... CR Field 7 (CR7), which are set by VLE defined instructions in one of the following ways. Specified fields of the condition register can be set by a move to the CR from a GPR (mtcrf, mtocrf). A specified CR field can be set by a move to the CR from another CR field (e_mcrf) or from XER32:35 (mcrxr). CR field 0 can be set as the implicit result of a fixed-point instruction.
If any portion of the result is undefined, the value placed into the first three bits of CR field 0 is undefined. The bits of CR field 0 are interpreted as shown below. CR Bit Description 32 33 34 Negative (LT) The result is negative. Positive (GT) The result is positive. Zero (EQ) The result is 0.
1117
1118
Figure 19. BO32 field encodings Encodings for the BO16 field for VLE are shown in Figure 20.
BO16 0 1
1119
BD24-form
(LK=0) (LK=1) BD24 LK
31
BD8-form
(LK=0) (LK=1)
target_addr target_addr 0
6 7
target_addr target_addr 0 LK
6 7 8
BD8
15
target_addr specifies the branch target address. The branch target address is the sum of BD24 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: LR (if LK=1)
target_addr specifies the branch target address. The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: LR (if LK=1)
BD8
15
if (64-bit mode) 0 then M 32 else M if BO320 then CTRM:63 CTRM:63 - 1 BO320 | ((CTRM:63 0) BO321) ctr_ok BO320 | (CRBI32+32 BO321) cond_ok if ctr_ok & cond_ok then NIA iea (CIA + EXTS(BD15 || 0b0)) else NIA iea CIA + 4 if LK then LR iea CIA + 4 The BI32 field specifies the Condition Register bit to be tested. The BO32 field is used to resolve the branch as described in Figure 19. target_addr specifies the branch target address. The branch target address is the sum of BD15 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: CTR LR (if BO320=1) (if LK=1)
cond_ok (CRBI16+32 BO16) if cond_ok then NIA iea CIA + EXTS(BD8 || 0b0) else NIA iea CIA + 2 The BI16 field specifies the Condition Register bit to be tested. The BO16 field is used to resolve the branch as described in Figure 20. target_addr specifies the branch target address. The branch target address is the sum of BD8 || 0b0 sign-extended and the address of this instruction, with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. Special Registers Altered: None
1120
(LK=0) (LK=1) LK
15
(LK=0) (LK=1)
NIA iea CTR0:62 || 0b0 if LK then LR iea CIA + 2 The branch target address is CTR0:62 || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: LR (if LK=1)
NIA iea LR0:62 || 0b0 if LK then LR iea CIA + 2 The branch target address is LR0:62 || 0b0 with the high-order 32 bits of the branch target address set to 0 in 32-bit mode. If LK=1 then the effective address of the instruction following the Branch instruction is placed into the Link Register. Special Registers Altered: LR (if LK=1)
1121
System Call
se_sc 02
0 15
C-form,ESC-form
e_sc 31
0 6
ELEV ///
11
If MSRGS = 0 or if ELEV = 1, the effective address of the instruction following the System Call instruction is placed into SRR0 and the contents of the MSR are copied into SRR1. Otherwise, the effective address of the instruction following the System Call instruction is placed into GSRR0 and the contents of the MSR are copied into GSRR1. ELEV values greater than 1 are reserved. Bits 0:3 of the ELEV field (instruction bits 16:19) are treated as a reserved field. If ELEV=0, a System Call interrupt is generated. If ELEV=1, an Embedded Hypervisor System Call interrupt is generated. The interrupt causes the MSR to be set as described in Section 7.6.10 and Section 7.6.27 of Book III-E. If ELEV=0 or se_sc is executed, and the processor is in guest state, instruction execution resumes at the address given by one of the following. GIVPR0:47 || GIVOR848:59||0b0000 if IVORs [Category:Embedded.Phased-Out] are supported. GIVPR0:51||0x120 if Interrupt Fixed Offsets [Category:Embedded.Phased-In] are supported. If ELEV=0 or se_sc is executed, and the processor is in hypervisor state, instruction execution resumes at the address given by one of the following. IVPR0:47 || IVOR848:59||0b0000 if IVORs [Category:Embedded.Phased-Out] are supported. IVPR0:51||0x120 if Interrupt Fixed Offsets [Category:Embedded.Phased-In] are supported. If ELEV=1, the interrupt causes instruction execution to resume at the address given by one of the following. GIVPR0:47 || GIVOR4048:59||0b0000 if IVORs [Category:Embedded.Phased-Out] are supported. GIVPR0:51||0x300 if Interrupt Fixed Offsets [Category:Embedded.Phased-In] are supported. This instruction is context synchronizing. Special Registers Altered: SRR0 GSRR0 SRR1 GSRR1 MSR Programming Note e_sc serves as both a basic and an extended mnemonic. The Assembler will recognize an e_sc mnemonic with one operand as the basic form, and an e_sc mnemonic with no operand as the extended form. In the extended form, the ELEV operand is omitted and assumed to be 0.
36
/
31
lev = ELEV if se_sc then 0 lev rr0 iea CIA + 2 else if e_sc then lev ELEV rr0 iea CIA + 4 if lev = 0 then if MSRGS = 1 then GSRR0 iea rr0 GSRR1 MSR if IVORs supported then GIVPR0:47 || NIA GIVOR848:59 || 0b0000 else GIVPR0:51 || 0x120 NIA MSR new_value (see below) else SRR0 iea rr0 SRR1 MSR if IVORs supported then IVPR0:47 || NIA IVOR848:59 || 0b0000 else IVPR0:51 || 0x120 NIA MSR new_value (see below) else if ELEV = 1 then SRR0 iea CIA + 4 SRR1 MSR if IVORs supported then IVPR0:47 || IVOR4048:59 || 0b0000 NIA else NIA IVPR0:51 || 0x300 new_value (see below) MSR If category E.HV is not implemented, the System Call instruction behaves as if MSRGS = 0 and ELEV = 0.
1122
C-form
11
15
se_illegal is used to request an Illegal Instruction exception. The behavior is the same as if an illegal instruction was executed. This instruction is context synchronizing. Special Registers Altered: SRR0 SRR1 MSR ESR
MSR NIA
iea
The se_rfmci instruction is used to return from a machine check class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of MCSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address MCSRR00:62||0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/ restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) is the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in MCSRR0 at the time of the execution of the se_rfmci). This instruction is privileged and context synchronizing. Special Registers Altered: MSR
1123
C-form
C-form
MSR NIA
iea
MSR NIA
iea
The se_rfci instruction is used to return from a critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of CSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address CSRR00:62||0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) is the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in CSRR0 at the time of the execution of the se_rfci). This instruction is privileged and context synchronizing. Special Registers Altered: MSR
The se_rfi instruction is used to return from a non-critical class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of SRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched under control of the new MSR value from the address SRR00:62||0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the values placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) is the address and MSR value of the instruction that would have been executed next had the interrupt not occurred (that is, the address in SRR0 at the time of the execution of the se_rfi). This instruction is privileged and context synchronizing. Special Registers Altered: MSR
1124
C-form
C-form
i
MSR NIA
iea
The se_rfdi instruction is used to return from a debug class interrupt, or as a means of establishing a new context and synchronizing on that new context simultaneously. The contents of DSRR1 are placed into the MSR. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address DSRR00:62||0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the save/restore registers by the interrupt processing mechanism (see Chapter 7 of Book III-E) is the address of the instruction that would have been executed next had the interrupt not occurred (that is, the address in DSRR0 at the time of the execution of se_rfdi). This instruction is privileged and context synchronizing. Special Registers Altered: MSR Corequisite Categories: Embedded.Enhanced Debug
newmsr GSRR1 if MSRGS = 1 then MSRGS,WE newmsrGS,WE prots MSRPUCLEP,DEP,PMMP newmsr prots & MSR | ~prots & newmsr newmsr MSR NIA iea GSRR00:62 || 0b0 The se_rfgi instruction is used to return from a guest state base class interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context. The contents of Guest Save/Restore Register 1 are placed into the MSR. If the se_rfgi is executed in the guest supervisor state (MSRGS PR = 0b10), the bits MSRGS WE are not modified and the bits MSRUCLE DE PMM are modified only if the associated bits in the Machine State Register Protect (MSRP) Register are set to 0. If the new MSR value does not enable any pending exceptions, then the next instruction is fetched, under control of the new MSR value, from the address GSRR00:62||0b0. If the new MSR value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the associated save/ restore register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in GSRR0 at the time of the execution of the se_rfgi). This instruction is privileged and context synchronizing. Special Registers Altered: MSR Corequisite Categories: Embedded.Hypervisor
1125
XL-form
BT,BA,BB BT
11
BA
16
BB
21
257
/
31 0
31
BA
16
BB
21
129
/
31
CRBT+32
The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
CRBB+32
The bit in the Condition Register specified by BA+32 is ANDed with the ones complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
XL-form
XL-form
BT,BA,BB BT
11
BA
16
BB
21
289
/
31 0
31
BA
16
BB
21
225
/
31
CRBT+32
CRBA+32 CRBB+32
CRBT+32
(CRBA+32
& CRBB+32)
The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
The bit in the Condition Register specified by BA+32 is ANDed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
1126
XL-form
Condition Register OR
e_cror BT,BA,BB BT
6 11
XL-form
BT,BA,BB BT
11
BA
16
BB
21
33
/
31 0
31
BA
16
BB
21
449
/
31
CRBT+32
(CRBA+32
| CRBB+32)
CRBT+32
CRBA+32 | CRBB+32
The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the complemented result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
The bit in the Condition Register specified by BA+32 is ORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
XL-form
BT,BA,BB 31 BT
11
BA
16
BB
21
193
/
31
BA
16
BB
21
417
/
31
CRBA+32 CRBB+32
CRBB+32
The bit in the Condition Register specified by BA+32 is ORed with the complement of the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
The bit in the Condition Register specified by BA+32 is XORed with the bit in the Condition Register specified by BB+32, and the result is placed into the bit in the Condition Register specified by BT+32. Special Registers Altered: CRBT+32
Move CR Field
e_mcrf 31
0 6
XL-form
///
21
16
/
31
CR4xBF+32:4xBF+35
CR4xBFA+32:4xBFA+35
The contents of Condition Register field BFA are copied to Condition Register field BF. Special Registers Altered: CR field BF
1127
1128
5.7 Fixed-Point Trap Instructions . . . . 1145 5.8 Fixed-Point Select Instruction . . . 1145 5.9 Fixed-Point Logical, Bit, and Move Instructions . . . . . . . . . . . . . . . . . . . . 1146 5.10 Fixed-Point Rotate and Shift Instructions . . . . . . . . . . . . . . . . . . . . 1151 5.11 Move To/From System Register Instructions . . . . . . . . . . . . . . . . . . . . 1154
1129
D-form
RT,D(RA) RT
11
RA
16
D
31 0
08
RZ
12
RX
15
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA 560 || MEM(EA, 1) RT Let the effective address (EA) be the sum (RA|0) + D. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. Special Registers Altered: None
EA RZ
Let the effective address (EA) be the sum RX + SD4. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. Special Registers Altered: None
D-form
RT,D8(RA) RT
11
RA
16
0
24
D8
31 0
14
RA
16
D
31
EA RT RA
Let the effective address (EA) be the sum (RA) + D8. The byte in storage addressed by EA is loaded into RT56:63. RT0:55 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA EXTS(MEM(EA, 2)) RT Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. Special Registers Altered: None
D-form
RT,D(RA) RT
11
RA
16
D
31 0
10
RZ
12
RX
15
if RA = 0 then b 0 (RA) else b EA b + EXTS(D) 48 0 || MEM(EA, 2) RT Let the effective address (EA) be the sum (RA|0) + D. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. Special Registers Altered: None
EA RZ
Let the effective address (EA) be the sum (RX) + (SD4 || 0). The halfword in storage addressed by EA is loaded into RZ48:63. RZ0:47 are set to 0. Special Registers Altered: None
1130
RT,D8(RA) RT
11
RA
16
03
24
D8
31 0
06
RA
16
01
24
D8
31
EA RT RA
EA RT RA
Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are filled with a copy of bit 0 of the loaded halfword. EA is placed into RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
Let the effective address (EA) be the sum (RA) + D8. The halfword in storage addressed by EA is loaded into RT48:63. RT0:47 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
D-form
RT,D(RA) RT
11
RA
16
D
31 0
12
RZ
12
RX
15
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA 320 || MEM(EA, 4) RT Let the effective address (EA) be the sum (RA|0) + D. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. Special Registers Altered: None
EA RZ
Let the effective address (EA) be the sum (RX) + (SD4 || 00). The word in storage addressed by EA is loaded into RZ32:63. RZ0:31 are set to 0. Special Registers Altered: None
1131
RT,D8(RA) RT
11
RA
16
02
24
D8
31
EA RT RA
Let the effective address (EA) be the sum (RA) + D8. The word in storage addressed by EA is loaded into RT32:63. RT0:31 are set to 0. EA is placed into register RA. If RA=0 or RA=RT, the instruction form is invalid. Special Registers Altered: None
1132
Store Byte
e_stb 13
0 6
D-form
SD4-form
RS,D(RA) RS
11
RA
16
D
31 0
09
RZ
12
RX
15
if RA = 0 then b 0 (RA) else b EA b + EXTS(D) (RS)56:63 MEM(EA, 1) Let the effective address (EA) be the sum (RA|0)+ D. (RS)56:63 are stored in the byte in storage addressed by EA. Special Registers Altered: None
EA (RX) + EXTS(SD4) (RZ)56:63 MEM(EA, 1) Let the effective address (EA) be the sum (RX) + SD4. (RZ)56:63 are stored in the byte in storage addressed by EA. Special Registers Altered: None
1133
D8-form
RS,D8(RA) RS
11
RA
16
04
24
D8
31
EA (RA) + EXTS(D8) MEM(EA, 1) (RS)56:63 RA EA Let the effective address (EA) be the sum (RA) + D8. (RS)56:63 are stored in the byte in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
Store Halfword
e_sth 23
0 6
D-form
SD4-form
RS,D(RA) RS
11
RA
16
D
31 0
11
RZ
12
RX
15
if RA = 0 then b 0 (RA) else b EA b + EXTS(D) (RS)48:63 MEM(EA, 2) Let the effective address (EA) be the sum (RA|0) + D. (RS)48:63 are stored in the halfword in storage addressed by EA. Special Registers Altered: None
EA (RX) + (590 || SD4 || 0) (RZ)48:63 MEM(EA, 2) Let the effective address (EA) be the sum (RX) + (SD4 || 0). (RZ)48:63 are stored in the halfword in storage addressed by EA. Special Registers Altered: None
D8-form
RS,D8(RA) RS
11
RA
16
05
24
D8
31
EA (RA) + EXTS(D8) (RS)48:63 MEM(EA, 2) RA EA Let the effective address (EA) be the sum (RA) + D8. (RS)48:63 are stored in the halfword in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
1134
D-form
SD4-form
RS,D(RA) RS
11
RA
16
D
31 0
13
RZ
12
RX
15
if RA = 0 then b 0 else b (RA) b + EXTS(D) EA (RS)32:63 MEM(EA, 4) Let the effective address (EA) be the sum (RA|0) + D. (RS)32:63 are stored in the word in storage addressed by EA. Special Registers Altered: None
EA (RX) + (580 || SD4 || 20) MEM(EA, 4) (RZ)32:63 Let the effective address (EA) be the sum (RX)+ (SD4 || 00). (RZ)32:63 are stored in the word in storage addressed by EA. Special Registers Altered: None
D8-form
RS,D8(RA) RS
11
RA
16
06
24
D8
31
EA (RA) + EXTS(D8) MEM(EA, 4) (RS)32:63 RA EA Let the effective address (EA) be the sum (RA) + D8. (RS)32:63 are stored in the word in storage addressed by EA. EA is placed into register RA. If RA=0, the instruction form is invalid. Special Registers Altered: None
1135
D8-form
D8-form
RT,D8(RA) RT
11
RA
16
08
24
D8
31 0
06
RA
16
9
24
D8
31
if RA = 0 then b 0 else b (RA) b + EXTS(D8) EA RT r do while r 31 32 0 || MEM(EA,4) GPR(r) r + 1 r EA EA + 4 Let n = (32-RT). Let the effective address (EA) be the sum (RA|0) + D8. n consecutive words starting at EA are loaded into the low-order 32 bits of GPRs RT through 31. The highorder 32 bits of these GPRs are set to zero. If RA is in the range of registers to be loaded, including the case in which RA = 0, the instruction form is invalid. Special Registers Altered: None
if RA = 0 then b 0 else b (RA) b + EXTS(D8) EA RS r do while r 31 GPR(r)32:63 MEM(EA,4) r r + 1 EA EA + 4 Let n = (32-RS). Let the effective address (EA) be the sum (RA|0) + D8. n consecutive words starting at EA are stored from the low-order 32 bits of GPRs RS through 31. Special Registers Altered: None
1136
e_addic[.] and e_subfic[.] always set CA to reflect the carry out of bit 0 in 64-bit mode and out of bit 32 in 32bit mode. The fixed-point Arithmetic instructions from Book I, add[.], addo[.], addc[.], addco[.], adde[.], addeo[.], addme[.], addmeo[.], addze[.], addzeo[.], divw[.], divwo[.], divwu[.], divwuo[.], mulhw[.], mulhwu[.], mullw[.], mullwo[.] neg[.], nego[.], subf[.], subfo[.] subfe[.], subfeo[.], subfme[.], subfmeo[.], subfze[.], subfzeo[.], subfc[.], and subfco[.] are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.8 of Book I for the instruction definitions. The fixed-point Arithmetic instructions from Book I, mulld[.], mulldo[.], mulhd[.], mulhdu[.], muldu[.], divd[.], divdo[.], divdu[.], and divduo[.] are available while executing in VLE mode on 64-bit implementations. The mnemonics, decoding, and semantics for those instructions are identical to these in Book I; see Section 3.3.8 of Book I for the instruction definitions.
1137
RR-form
Add Immediate
e_add16i RT,RA,SI RT
6 11
D-form
RX,RY 0
8
RY
12
RX
15 0
07
RA
16
SI
31
RX
(RX) + (RY)
RT
(RA) + EXTS(SI)
The sum (RX) + (RY) is placed into register RX. Special Registers Altered: None
The sum (RA) + SI is placed into register RT. Special Registers Altered: None
RA,si si
11
RA
16
17
21
si
31 0
28
RA
16
18
21
si
31
RA
(RA) + EXTS(si)
RA
The sum (RA) + si is placed into register RT. Special Registers Altered: CR0
The sum (RA) + (si || 0x0000) is placed into register RA. Special Registers Altered: None
SCI8-form
(Rc=0) (Rc=1) 8
Rc 20
OIM5-form
RT,RA,sci8 RT,RA,sci8 RT
11
RX
15
RA
16
F SCL
21 22 24
UI8
31
oimm (590 || OIM5) + 1 (RX) + oimm RX The sum (RX) + oimm is placed into RX. The value of oimm must be in the range of 1 to 32. Special Registers Altered: None
The sum (RA) + sci8 is placed into register RT. Special Registers Altered: CR0 (if Rc=1)
1138
RT,RA,sci8 RT,RA,sci8 RT
11
(Rc=0) (Rc=1) 9
16
RA
Rc F SCL
20 21 22 24
UI8
31
The sum (RA) + sci8 is placed into register RT. Special Registers Altered: CR0 CA (if Rc=1)
Subtract
se_sub 1
0 6
RR-form
RX,RY 2
8
RR-form
RY
12
RX
15 0
01
RY
12
RX
15
(RX) + (RY) + 1 The sum (RX) + (RY) + 1 is placed into register RX.
RX Special Registers Altered: None
Subtract Immediate
se_subi se_subi. 09 RX,oimm RX,oimm Rc OIM5
6 7 12
OIM5-form
(Rc=0) (Rc=1)
RT,RA,sci8 RT,RA,sci8 RT
11
RX
15
RA
UI8
31
sci8 RT
56-SCL8F
oimm (590 || OIM5) + 1 (RX) + oimm + 1 RX The sum (RA) + oimm + 1 is placed into register RX. The value of oimm must be in the range 1 to 32. Special Registers Altered: CR0 (if Rc=1)
The sum (RA) + sci8 + 1 is placed into register RT. Special Registers Altered: CR0 CA (if Rc=1)
1139
RT,RA,sci8 RT
11
RA
16
20
F SCL
21 22 24
UI8
31 0
28
RA
16
20
21
si
31
prod0:127 (RA) EXTS(si) prod64:127 RA The 64-bit first operand is (RA). The 64-bit second operand is the sign-extended value of the si operand. The low-order 64-bits of the 128-bit product of the operands are placed into register RA. Both operands and the product are interpreted as signed integers. Special Registers Altered: None
The 64-bit first operand is (RA). The 64-bit second operand is the sci8 operand. The low-order 64-bits of the 128-bit product of the operands are placed into register RT. Both operands and the product are interpreted as signed integers. Special Registers Altered: None
RR-form
R-form
RX,RY 1
8
RY
12
RX
15 0
RX
15
RX
(RX)32:63 (RY)32:63
The 32-bit operands are the low-order 32-bits of RX and of RY. The 64-bit product of the operands is placed into register RX. Both operands and the product are interpreted as signed integers. Special Registers Altered: None
1140
The fixed-point Compare instructions from Book I, cmp and cmpl are available while executing in VLE mode. The mnemonics, decoding, and semantics for these instructions are identical to those in Book I; see Section 3.3.9 of Book I for the instruction definitions.
IM5-form
I16A-form
RX,UI5 1
6 7
UI5
12
RX
15 0
28
RA
16
19
21
si
31
0b001 else d
0b010
b EXTS(si) if (RA)32:63 < b32:63 then c if (RA)32:63 > b32:63 then c if (RA)32:63 = b32:63 then c CR0 c || XERSO
Bit UI5+32 of register RX is tested for equality to 0 and the result is recorded in CR0. EQ is set if the tested bit is 0, LT is cleared, and GT is set to the inverse value of EQ. Special Registers Altered: CR0
The low-order 32 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
1141
Compare Word
se_cmp RX,RY 0
6 8
RR-form
RY
12
RX
15
RA
16
21
F SCL
21 22 24
UI8
31
56-SCL8 sci8 F || UI8 ||SCL8F if (RA)32:63 < sci832:63 then c if (RA)32:63 > sci832:63 then c if (RA)32:63 = sci832:63 then c c || XERSO CR4BF32+32:4BF32+35
if (RX)32:63 < (RY)32:63 then c if (RX)32:63 > (RY)32:63 then c if (RX)32:63 = (RY)32:63 then c c || XERSO CR0
The low-order 32 bits of register RA are compared with sci8, treating operands as signed integers. The result of the comparison is placed into CR field BF32. Special Registers Altered: CR field BF32
The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
RX,UI5 1
6 7
UI5
12
RX
15 0
28
RA
16
21
21
ui
31
590 || UI5 b if (RX)32:63 < b32:63 then c if (RX)32:63 > b32:63 then c if (RX)32:63 = b32:63 then c CR0 c || XERSO
480 || ui b if (RA)32:63 <u b32:63 then c if (RA)32:63 >u b32:63 then c if (RA)32:63 = b32:63 then c CR0 c || XERSO
The low-order 32 bits of register RX are compared with UI5, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
The low-order 32 bits of register RA are compared with ui, treating operands as unsigned integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
1142
RR-form
RY
12
RX
15
RA
16
21
F SCL
21 22 24
UI8
31
56-SCL8 sci8 F || UI8 ||SCL8F u if (RA)32:63 < sci832:63 then c if (RA)32:63 >u sci832:63 then c if (RA)32:63 = sci832:63 then c c || XERSO CR4BF32+32:4BF32+35
if (RX)32:63 <u (RY)32:63 then c if (RX)32:63 >u (RY)32:63 then c if (RX)32:63 = (RY)32:63 then c c || XERSO CR0
The low-order 32 bits of register RA are compared with sci8, treating operands as unsigned integers. The result of the comparison is placed into CR field BF32. Special Registers Altered: CR field BF32
The low-order 32 bits of register RX are compared with the low-order 32 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
Compare Halfword
e_cmph BF,RA,RB BF 0
6 9 11
X-form
RX,oimm 31 1 OIM5
6 7 12
RA
16
RB
21
14
/
31
RX
15
590 || (OIM5 + 1) oimm if (RX)32:63 <u oimm32:63 then c if (RX)32:63 >u oimm32:63 then c if (RX)32:63 = oimm32:63 then c CR0 c || XERSO
a EXTS((RA)48:63) EXTS((RB)48:63) b if a < b then c 0b100 0b010 if a > b then c 0b001 if a = b then c CR4BF+32:4BF+35 c || XERSO The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as signed integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF
The low-order 32 bits of register RX are compared with oimm, treating operands as unsigned integers. The result of the comparison is placed into CR0. The value of oimm must be in the range of 1 to 32. Special Registers Altered: CR0
1143
RR-form
RX,RY 2
8
RY
12
RX
15 0
28
6
si
11
RA
16
22
21
si
31
a EXTS((RX)48:63) b EXTS((RY)48:63) 0b100 if a < b then c 0b010 if a > b then c if a = b then c 0b001 c || XERSO CR0 The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
a EXTS((RA)48:63) b EXTS(si) 0b100 if a < b then c 0b010 if a > b then c if a = b then c 0b001 c || XERSO CR0 The low-order 16 bits of register RA are compared with si, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
X-form
BF,RA,RB BF 0
9 11
RA
16
RB
21
46
/
31 0
RY
12
RX
15
a EXTZ((RA)48:63) EXTZ((RB)48:63) b if a <u b then c 0b100 0b010 if a >u b then c 0b001 if a = b then c CR4BF+32:4BF+35 c || XERSO The low-order 16 bits of register RA are compared with the low-order 16 bits of register RB, treating operands as unsigned integers. The result of the comparison is placed into CR field BF. Special Registers Altered: CR field BF
The low-order 16 bits of register RX are compared with the low-order 16 bits of register RY, treating operands as unsigned integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
1144
ui
11
RA
16
23
21
ui
31
48 a 0 || (RA)48:63 48 b 0 || ui 0b100 if a <u b then c if a >u b then c 0b010 0b001 if a = b then c c || XERSO CR0
The low-order 16 bits of register RA are compared with the ui field, treating operands as signed integers. The result of the comparison is placed into CR0. Special Registers Altered: CR0
1145
RT,ui RT
11
ui
16
25
21
ui
31 0
28
ui
16
29
21
ui
31
RT
(RT) &
(480
|| ui) RT (RT) & (320 || ui || 160) The contents of register RT are ANDed with 320 || ui || 160 and the result is placed into register RT. Special Registers Altered: CR0
The contents of register RT are ANDed with 480 || ui and the result is placed into register RT. Special Registers Altered: CR0
IM5-form
RA,RS,sci8 RA,RS,sci8 RS
11
(Rc=0) (Rc=1)
0
11
UI5
12
RX
15
RA
16
12 Rc F SCL
20 21 22 24
UI8
31
RX
The contents of register RX are ANDed with 590 || UI5 and the result is placed into register RX. Special Registers Altered: None
The contents of register RS are ANDed with sci8 and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
1146
I16L-form
RT,ui RT
11
ui
16
24
21
ui
31 0
28
ui
16
26
21
ui
31
RT
(RT) | (480 || ui) RT (RT) | (320 || ui || 160) The contents of register RT are ORed with 320 || ui || 16 0 and the result is placed into register RT. Special Registers Altered: None
The contents of register RT are ORed with 480 || ui and the result is placed into register RT. Special Registers Altered: None
OR Scaled Immediate
e_ori e_ori. 06
0 6
SCI8-form
(Rc=0) (Rc=1)
SCI8-form
(Rc=0) (Rc=1)
RA,RS,sci8 RA,RS,sci8 RS
11
RA
16
13 Rc F SCL
20 21 22 24
UI8
31 0
RA
16
14 Rc F SCL
20 21 22 24
UI8
31
The contents of register RS are ORed with sci8 and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
The contents of register RS are XORed with sci8 and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
RR-form
(Rc=0) (Rc=1) RX
12 15
RX,RY RX,RY 1 Rc RY
6 7 8
RX,RY 1
8
RY
12
RX
15
RX
RX
(RX) &
(RY)
The contents of register RX are ANDed with the contents of register RY and the result is placed into register RX. Special Registers Altered: CR0 (if Rc=1)
The contents of register RX are ANDed with the complement of the contents of register RY and the result is placed into register RX. Special Registers Altered: None
1147
RR-form
R-form
RX,RY 0
8
RY
12
RX
15 0
RX
15
RX
(RX) | (RY)
RX
(RX)
The contents of register RX are ORed with the contents of register RY and the result is placed into register RX. Special Registers Altered: None
The contents of RX are complemented and placed into register RX. Special Registers Altered: None
IM5-form
IM5-form
RX,UI5 0
6 7
UI5
12
RX
15 0
24
UI5
12
RX
15
a RX
a RX
Bit UI5+32 of register RX is set to 1. All other bits in register RX are set to 0. Special Registers Altered: None
IM5-form
IM5-form
0
6 7
UI5
12
RX
15 0
25
UI5
12
RX
15
641 64-a
a RX
0 || a1
If UI5 is not zero, the low-order UI5 bits are set to 1 in register RX and all other bits in register RX are set to 0. If UI5 is 0, all bits in register RX are set to 1. Special Registers Altered: None
1148
R-form
RX 13
12
RX
15 0
RX
15
s RX
(RX)56 56 s || (RX)56:63
s RX
(RX)48 48 s || (RX)48:63
(RX)56:63 are placed into RX56:63. Bit 56 of register RX is placed into RX0:55. Special Registers Altered: None
(RX)48:63 are placed into RX48:63. Bit 48 of register RX is placed into RX0:47. Special Registers Altered: None
R-form
R-form
RX 12
12
RX
15 0
RX
15
RX
560
|| (RX)56:63
RX
480
|| (RX)48:63
(RX)56:63 are placed into RX56:63. RX0:55 are set to 0. Special Registers Altered: None
(RX)48:63 are placed into RX48:63. RX0:47 are set to 0. Special Registers Altered: None
Load Immediate
e_li 28
0 6
LI20-form
IM7-form
RT,LI20 RT
11
li20
0 li20
16 17 21
li20
31 0
09
RX
15
RT
RX
570
|| UI7
The sign-extended LI20 field is placed into RT. Special Registers Altered: None
The zero-extended UI7 field is placed into RX. Special Registers Altered: None
I16L-form
RT,ui RT
11
ui
16
28
21
ui
31
RT
32
0 || ui || 160
The zero-extended value of ui shifted left 16 bits is placed into RT. Special Registers Altered: None
1149
RR-form
Move Register
se_mr RX,RY 1
6 8
RR-form
RX,ARY 3
8
ARY
12
RX
15 0
RY
12
RX
15
r RX
ARY+8 GPR(r)
RX
(RY)
The contents of register RY are placed into RX. Special Registers Altered: None
The contents of register ARY+8 are placed into RX. ARY specifies a register in the range R8:R23. Special Registers Altered: None
RR-form
ARX,RY 2
8
RY
ARX
12 15
r ARX+8 (RY) GPR(r) The contents of register RY are placed into register ARX+8. ARX specifies a register in the range R8:R23. Special Registers Altered: None
1150
X-form
(Rc=0) (Rc=1) RB
16 21
X-form
(Rc=0) (Rc=1)
RA,RS,RB RA,RS,RB RS
11
RA,RS,SH RA,RS,SH RS
11
RA
280
Rc
31
RA
16
SH
21
312
Rc
31
n RA
(RB)59:63 ROTL32((RS)32:63,n)
n RA
SH ROTL32((RS)32:63,n)
The contents of register RS are rotated32 left the number of bits specified by (RB)59:63 and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
The contents of register RS are rotated32 left SH bits and the result is placed into register RA. Special Registers Altered: CR0 (if Rc=1)
RA,RS,SH,MB,ME RS
11
RA
16
SH
21
MB
26
ME
0
31 0
29
RA
16
SH
21
MB
26
ME
1
31
n r m RA
n r m RA
The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is inserted into register RA under control of the generated mask. Special Registers Altered: None
The contents of register RS are rotated32 left SH bits. A mask is generated having 1-bits from bit MB+32 through bit ME+32 and 0-bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register RA. Special Registers Altered: None
1151
X-form
(Rc=0) (Rc=1)
RA,RS,SH RA,RS,SH RS
11
RX,UI5 0
6 7
RA
16
SH
21
56
Rc
31
UI5
12
RX
15
n r m RA
n r m RX
The contents of the low-order 32 bits of register RS are shifted left SH bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RA32:63. RA0:31 are set to 0. Special Registers Altered: CR0 (if Rc=1)
The contents of the low-order 32 bits of register RX are shifted left UI5 bits. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX32:63. RX0:31 are set to 0. Special Registers Altered: None
RR-form
RX,RY 2
8
RY
12
RX
15 0
26
UI5
12
RX
15
n (RY)58:63 r ROTL32((RX)32:63, n) MASK(32, 63-n) if (RY)58 = 0 then m 640 else m r & m RX The contents of the low-order 32 bits of register RX are shifted left the number of bits specified by (RY)58:63. Bits shifted out of position 32 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into RX32:63. RX0:31 are set to 0. Shift amounts from 32-63 give a zero result. Special Registers Altered: None
n r m s RX CA
UI5 ROTL32((RX)32:63, 64-n) MASK(n+32, 63) (RX)32 r&m | (64s)&m s & ((r&m)32:630)
The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX0:31 and the 32-bit result is placed into RX32:63. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1-bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)32:63), and CA to be set to 0. Special Registers Altered: CA
1152
RR-form
X-form
(Rc=0) (Rc=1)
RX,RY 1
8
RY
12
RX
15 0
31
RA
16
SH
21
568
Rc
31
n (RY)59:63 r ROTL32((RX)32:63, 64-n) MASK(n+32, 63) if (RY)58 = 0 then m 64 0 else m s (RX)32 RX r&m | (64s)&m s & ((r&m)32:630) CA The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)58:63. Bits shifted out of position 63 are lost, and bit 32 of RX is replicated to fill the vacated positions on the left. Bit 32 of RX is replicated to fill RX0:31 and the 32-bit result is placed into RX32:63. CA is set to 1 if the low-order 32 bits of register RX contain a negative value and any 1bits are shifted out of bit position 63; otherwise CA is set to 0. A shift amount of zero causes RX to receive EXTS((RX)32:63), and CA to be set to 0. Shift amounts from 32-63 give a result of 64 sign bits, and cause CA to receive the sign bit of (RX)32:63. Special Registers Altered: CA
n r m RA
The contents of the low-order 32 bits of register RS are shifted right SH bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RA32:63. RA0:31 are set to 0. Special Registers Altered: CR0 (if Rc=1)
RR-form
RX,UI5 16 0
6 7
RY
12
RX
15
UI5
12
RX
15
n r m RX
n (RY)59:63 r ROTL32((RX)32:63, 64-n) MASK(n+32, 63) if (RY)58 = 0 then m 64 0 else m RX r & m The contents of the low-order 32 bits of register RX are shifted right the number of bits specified by (RY)58:63. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX32:63. RX0:31 are set to 0. Shift amounts from 32 to 63 give a zero result. Special Registers Altered: None
The contents of the low-order 32 bits of register RX are shifted right UI5 bits. Bits shifted out of position 63 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into RX32:63. RX0:31 are set to 0. Special Registers Altered: None
1153
R-form
R-form
RX 10
12
RX
15 0
RX
15
RX
CTR
RX
LR
The CTR contents are placed into register RX. Special Registers Altered: None
The LR contents are placed into register RX. Special Registers Altered: None
R-form
R-form
RX 11
12
RX
15 0
RX
15
CTR
(RX)
LR
(RX)
The contents of register RX are placed into the CTR. Special Registers Altered: CTR
The contents of register RX are placed into the LR. Special Registers Altered: LR
1154
Instruction Synchronize
se_isync 01
0 15
C-form
Executing an se_isync instruction ensures that all instructions preceding the se_isync instruction have completed before the se_isync instruction completes, and that no subsequent instructions are initiated until after the se_isync instruction completes. It also ensures that all instruction cache block invalidations caused by icbi instructions preceding the se_isync instruction have been performed with respect to the processor executing the se_isync instruction, and then causes any prefetched instructions to be discarded. Except as described in the preceding sentence, the se_isync instruction may complete before storage accesses associated with instructions preceding the se_isync instruction have been performed. This instruction is context synchronizing. The se_isync instruction has identical semantics to the Book II isync instruction, but has a different encoding. Special Registers Altered: None
1155
to fetch VLE instructions from a page marked as LittleEndian generates an instruction storage interrupt byteordering exception.
1156
Instructions and resources from categories other than Base and Embedded are available in VLE. These include categories for which all the instructions in the category use primary opcode 4 or primary opcode 31.
7.2 Vector
Vector instructions implemented by category VLE are identical to those defined in Book I. If category Vector is supported in non-VLE mode, Vector instructions are also supported in VLE mode. The mnemonics, decoding, and semantics for those instructions are identical to those in Book I; see Chapter 6 of Book I for the instruction definitions.
1157
1158
XO XO XO XO XO X X EVX X X X X X X X X X X X X X X X X X X X X X XO XO XO XO X X D I16A I16A SCI8 SCI8 I16L I16L SCI8 BD24 BD15
Opcode (hexadecimal)2 7C000214 7C000014 7C000114 7C0001D4 7C000194 7C000038 7C000078 1000020F 7C000000 7C000040 7C000074 7C000034 7C0005EC 7C0000AC 7C0000FE 7C0003AC 7C00030C 7C00006C 7C00007E 7C00022C 7C00027E 7C00014C 7C0001EC 7C0001FE 7C00010C 7C0007EC 7C0007FE 7C00038C 7C0003CC 7C0003D2 7C000392 7C0003D6 7C000396 7C00009C 7C0003C6 1C000000 70008800 70009000 18008000 18009000 7000C800 7000E800 1800C000 78000000 7A000000
Form
Cat1 B B B B B B B SP B B 64 B E B E.PD E ECL B E.PD B E.PD ECL B E.PD ECL B E.PD E.CI E.CD 64 64 B B LMV DS VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE
Mnemonic add[o][.] addc[o][.] adde[o][.] addme[o][.] addze[o][.] and[.] andc[.] brinc cmp cmpl cntlzd[.] cntlzw[.] dcba dcbf dcbfep dcbi dcblc dcbst dcbstep dcbt dcbtep dcbtls dcbtst dcbtstep dcbtstls dcbz dcbzep dci dcread divd[o][.] divdu[o][.] divw[o][.] divwu[o][.] dlmzb[.] dsn e_add16i e_add2i. e_add2is e_addi[.] e_addic[.] e_and2i. e_and2is. e_andi[.] e_b[l] e_bc[l]
Instruction Add Add Carrying Add Extended Add to Minus One Extended Add to Zero Extended AND AND with Complement Bit Reversed Increment Compare Compare Logical Count Leading Zeros Doubleword Count Leading Zeros Word Data Cache Block Allocate Data Cache Block Flush Data Cache Block Flush by External PID Data Cache Block Invalidate Data Cache Block Lock Clear Data Cache Block Store Data Cache Block Store by External PID Data Cache Block Touch Data Cache Block Touch by External PID Data Cache Block Touch and Lock Set Data Cache Block Touch for Store Data Cache Block Touch for Store by External PID Data Cache Block Touch for Store and Lock Set Data Cache Block set to Zero Data Cache Block set to Zero by External PID Data Cache Invalidate Data Cache Read [Alternative Encoding] Divide Doubleword Divide Doubleword Unsigned Divide Word Divide Word Unsigned Determine Leftmost Zero Byte Decorated Storage Notify Add Immediate Add (2 operand) Immediate and Record Add (2 operand) Immediate Shifted Add Scaled Immediate Add Scaled Immediate Carrying AND (two operand) Immediate AND (2 operand) Immediate Shifted AND Scaled Immediate Branch [and Link] Branch Conditional [and Link]
SR SR SR SR SR SR SR
SR SR P P M
P M P M P H H SR SR SR SR SR SR SR SR SR SR SR CT
1159
Form
Cat1 VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE, E.HV VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD
Mnemonic e_cmp16i e_cmph e_cmph16i e_cmphl e_cmphl16i e_cmpi e_cmpl16i e_cmpli e_crand e_crandc e_creqv e_crnand e_crnor e_cror e_crorc e_crxor e_lbz e_lbzu e_lha e_lhau e_lhz e_lhzu e_li e_lis e_lmw e_lwz e_lwzu e_mcrf e_mull2i e_mulli e_or2i e_or2is e_ori[.] e_rlw[.] e_rlwi[.] e_rlwimi e_rlwinm e_sc e_slwi[.] e_srwi[.] e_stb e_stbu e_sth e_sthu e_stmw e_stw e_stwu e_subfic[.] e_xori[.] efdabs efdadd efdcfs efdcfsf efdcfsi efdcfsid
Instruction Compare Immediate Word Compare Halfword Compare Halfword Immediate Compare Halfword Logical Compare Halfword Logical Immediate Compare Scaled Immediate Word Compare Logical Immediate Word Compare Logical Scaled Immediate Word Condition Register AND Condition Register AND with Complement Condition Register Equivalent Condition Register NAND Condition Register NOR Condition Register OR Condition Register OR with Complement Condition Register XOR Load Byte and Zero Load Byte and Zero with Update Load Halfword Algebraic Load Halfword Algebraic with Update Load Halfword and Zero Load Halfword and Zero with Update Load Immediate Load Immediate Shifted Load Multiple Word Load Word and Zero Load Word and Zero with Update Move CR Field Multiply (2 operand) Low Immediate Multiply Low Scaled Immediate OR (two operand) Immediate OR (2 operand) Immediate Shifted OR Scaled Immediate Rotate Left Word Rotate Left Word Immediate Rotate Left Word Immediate then Mask Insert Rotate Left Word Immediate then AND with Mask System Call Shift Left Word Immediate Shift Right Word Immediate Store Byte Store Byte with Update Store Halfword Store Halfword with Update Store Multiple Word Store Word Store Word with Update Subtract From Scaled Immediate Carrying XOR Scaled Immediate Floating-Point Double-Precision Absolute Value Floating-Point Double-Precision Add Floating-Point Double-Precision Convert from Single-Precision Convert Floating-Point Double-Precision from Signed Fraction Convert Floating-Point Double-Precision from Signed Integer Convert Floating-Point Double-Precision from Signed Integer Doubleword
I16A X I16A X I16A SCI8 I16A SCI8 XL XL XL XL XL XL XL XL D D8 D D8 D D8 LI20 I16L D8 D D8 XL I16A SCI8 I16L I16L SCI8 X X M M ESC X X D D8 D D8 D8 D D8 SCI8 SCI8 EVX EVX EVX EVX EVX EVX
SR SR
SR SR
1160
Form
Cat1 SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FS SP.FS SP.FD SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS
Mnemonic efdcfuf efdcfui efdcfuid efdcmpeq efdcmpgt efdcmplt efdctsf efdctsi efdctsidz efdctsiz efdctuf efdctui efdctuidz efdctuiz efddiv efdmul efdnabs efdneg efdsub efdtsteq efdtstgt efdtstlt efsabs efsadd efscfd efscfsf efscfsi efscfuf efscfui efscmpeq efscmpgt efscmplt efsctsf efsctsi efsctsiz efsctuf efsctui efsctuiz efsdiv
Instruction Convert Floating-Point Double-Precision from Unsigned Fraction Convert Floating-Point Double-Precision from Unsigned Integer Convert Floating-Point Double-Precision from Unsigned Integer Doubleword Floating-Point Double-Precision Compare Equal Floating-Point Double-Precision Compare Greater Than Floating-Point Double-Precision Compare Less Than Convert Floating-Point Double-Precision to Signed Fraction Convert Floating-Point Double-Precision to Signed Integer Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Fraction Convert Floating-Point Double-Precision to Unsigned Integer Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero Floating-Point Double-Precision Divide Floating-Point Double-Precision Multiply Floating-Point Double-Precision Negative Absolute Value Floating-Point Double-Precision Negate Floating-Point Double-Precision Subtract Floating-Point Double-Precision Test Equal Floating-Point Double-Precision Test Greater Than Floating-Point Double-Precision Test Less Than Floating-Point Single-Precision Absolute Value Floating-Point Single-Precision Add Floating-Point Single-Precision Convert from Double-Precision Convert Floating-Point Single-Precision from Signed Fraction Convert Floating-Point Single-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Single-Precision from Unsigned Integer Floating-Point Single-Precision Compare Equal Floating-Point Single-Precision Compare Greater Than Floating-Point Single-Precision Compare Less Than Convert Floating-Point Single-Precision to Signed Fraction Convert Floating-Point Single-Precision to Signed Integer Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero Convert Floating-Point Single-Precision to Unsigned Fraction Convert Floating-Point Single-Precision to Unsigned Integer Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero Floating-Point Single-Precision Divide
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1161
Form
Cat1 SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS E.HV B SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV
Mnemonic efsmul efsnabs efsneg efssub efststeq efststgt efststlt ehpriv eqv[.] evabs evaddiw evaddsmiaaw evaddssiaaw evaddumiaaw evaddusiaaw evaddw evand evandc evcmpeq evcmpgts evcmpgtu evcmplts evcmpltu evcntlsw evcntlzw evdivws evdivwu eveqv evextsb evextsh evfsabs evfsadd evfscfsf evfscfsi evfscfuf evfscfui evfscmpeq evfscmpgt evfscmplt evfsctsf evfsctsi evfsctsiz evfsctuf evfsctui evfsctuiz
Instruction Floating-Point Single-Precision Multiply Floating-Point Single-Precision Negative Absolute Value Floating-Point Single-Precision Negate Floating-Point Single-Precision Subtract Floating-Point Single-Precision Test Equal Floating-Point Single-Precision Test Greater Than Floating-Point Single-Precision Test Less Than Embedded Hypervisor Privilege Equivalent Vector Absolute Value Vector Add Immediate Word Vector Add Signed, Modulo, Integer to Accumulator Word Vector Add Signed, Saturate, Integer to Accumulator Word Vector Add Unsigned, Modulo, Integer to Accumulator Word Vector Add Unsigned, Saturate, Integer to Accumulator Word Vector Add Word Vector AND Vector AND with Complement Vector Compare Equal Vector Compare Greater Than Signed Vector Compare Greater Than Unsigned Vector Compare Less Than Signed Vector Compare Less Than Unsigned Vector Count Leading Signed Bits Word Vector Count Leading Zeros Word Vector Divide Word Signed Vector Divide Word Unsigned Vector Equivalent Vector Extend Sign Byte Vector Extend Sign Halfword Vector Floating-Point Single-Precision Absolute Value Vector Floating-Point Single-Precision Add Vector Convert Floating-Point Single-Precision from Signed Fraction Vector Convert Floating-Point Single-Precision from Signed Integer Vector Convert Floating-Point Single-Precision from Unsigned Fraction Vector Convert Floating-Point Single-Precision from Unsigned Integer Vector Floating-Point Single-Precision Compare Equal Vector Floating-Point Single-Precision Compare Greater Than Vector Floating-Point Single-Precision Compare Less Than Vector Convert Floating-Point Single-Precision to Signed Fraction Vector Convert Floating-Point Single-Precision to Signed Integer Vector Convert Floating-Point Single-Precision to Signed Integer with Round Toward Zero Vector Convert Floating-Point Single-Precision to Unsigned Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
EVX EVX EVX EVX EVX EVX EVX XL X EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1162
Form
Cat1 SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP E.PD SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evfsdiv evfsmul evfsnabs evfsneg evfssub evfststeq evfststgt evfststlt evldd evlddepx evlddx evldh evldhx evldw evldwx evlhhesplat evlhhesplatx evlhhossplat evlhhossplatx evlhhousplat evlhhousplatx evlwhe evlwhex evlwhos evlwhosx evlwhou evlwhoux evlwhsplat evlwhsplatx evlwwsplat evlwwsplatx evmergehi evmergehilo evmergelo evmergelohi evmhegsmfaa evmhegsmfan evmhegsmiaa evmhegsmian evmhegumiaa evmhegumian evmhesmf
Instruction Vector Floating-Point Single-Precision Divide Vector Floating-Point Single-Precision Multiply Vector Floating-Point Single-Precision Negative Absolute Value Vector Floating-Point Single-Precision Negate Vector Floating-Point Single-Precision Subtract Vector Floating-Point Single-Precision Test Equal Vector Floating-Point Single-Precision Test Greater Than Vector Floating-Point Single-Precision Test Less Than Vector Load Double Word into Double Word Vector Load Doubleword into Doubleword by External Process ID Indexed Vector Load Double Word into Double Word Indexed Vector Load Double Word into Four Halfwords Vector Load Double Word into Four Halfwords Indexed Vector Load Double Word into Two Words Vector Load Double Word into Two Words Indexed Vector Load Halfword into Halfwords Even and Splat Vector Load Halfword into Halfwords Even and Splat Indexed Vector Load Halfword into Halfword Odd and Splat Vector Load Halfword into Halfword Odd Signed and Splat Indexed Vector Load Halfword into Halfword Odd Unsigned and Splat Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed Vector Load Word into Two Halfwords Even Vector Load Word into Two Halfwords Even Indexed Vector Load Word into Two Halfwords Odd Signed (with sign extension) Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) Vector Load Word into Two Halfwords and Splat Vector Load Word into Two Halfwords and Splat Indexed Vector Load Word into Word and Splat Vector Load Word into Word and Splat Indexed Vector Merge High Vector Merge High/Low Vector Merge Low Vector Merge Low/High Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Signed, Modulo, Fractional
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1163
Form
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmhesmfa evmhesmfaaw evmhesmfanw evmhesmi evmhesmia evmhesmiaaw evmhesmianw evmhessf evmhessfa evmhessfaaw evmhessfanw evmhessiaaw evmhessianw evmheumi evmheumia evmheumiaaw evmheumianw evmheusiaaw evmheusianw evmhogsmfaa evmhogsmfan evmhogsmiaa evmhogsmian evmhogumiaa evmhogumian evmhosmf evmhosmfa evmhosmfaaw evmhosmfanw evmhosmi
Instruction Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulate Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer, and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1164
Form
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmhosmia evmhosmiaaw evmhosmianw evmhossf evmhossfa evmhossfaaw evmhossfanw evmhossiaaw evmhossianw evmhoumi evmhoumia evmhoumiaaw evmhoumianw evmhousiaaw evmhousianw evmra evmwhsmf evmwhsmfa evmwhsmi evmwhsmia evmwhssf evmwhssfa evmwhumi evmwhumia evmwlsmiaaw evmwlsmianw evmwlssiaaw evmwlssianw evmwlumi evmwlumia evmwlumiaaw evmwlumianw
Instruction Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional Vector Multiply Halfwords, Odd, Signed, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words Initialize Accumulator Vector Multiply Word High Signed, Modulo, Fractional Vector Multiply Word High Signed, Modulo, Fractional to Accumulator Vector Multiply Word High Signed, Modulo, Integer Vector Multiply Word High Signed, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Saturate, Fractional Vector Multiply Word High Signed, Saturate, Fractional to Accumulator Vector Multiply Word High Unsigned, Modulo, Integer Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Modulo, Integer Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1165
Form
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP E.PD SP SP SP SP SP SP SP SP
Mnemonic evmwlusiaaw evmwlusianw evmwsmf evmwsmfa evmwsmfaa evmwsmfan evmwsmi evmwsmia evmwsmiaa evmwsmian evmwssf evmwssfa evmwssfaa evmwssfan evmwumi evmwumia evmwumiaa evmwumian evnand evneg evnor evor evorc evrlw evrlwi evrndw evsel evslw evslwi evsplatfi evsplati evsrwis evsrwiu evsrws evsrwu evstdd evstddepx evstddx evstdh evstdhx evstdw evstdwx evstwhe evstwhex evstwho
Instruction Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Signed, Modulo, Fractional Vector Multiply Word Signed, Modulo, Fractional to Accumulator Vector Multiply Word Signed, Modulo, Fractional and Accumulate Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Word Signed, Modulo, Integer Vector Multiply Word Signed, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Saturate, Fractional Vector Multiply Word Signed, Saturate, Fractional to Accumulator Vector Multiply Word Signed, Saturate, Fractional and Accumulate Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative Vector Multiply Word Unsigned, Modulo, Integer Vector Multiply Word Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative Vector NAND Vector Negate Vector NOR Vector OR Vector OR with Complement Vector Rotate Left Word Vector Rotate Left Word Immediate Vector Round Word Vector Select Vector Shift Left Word Vector Shift Left Word Immediate Vector Splat Fractional Immediate Vector Splat Immediate Vector Shift Right Word Immediate Signed Vector Shift Right Word Immediate Unsigned Vector Shift Right Word Signed Vector Shift Right Word Unsigned Vector Store Double of Double Vector Store Doubleword into Doubleword by External Process ID Indexed Vector Store Doubleword of Doubleword Indexed Vector Store Double of Four Halfwords Vector Store Double of Four Halfwords Indexed Vector Store Double of Two Words Vector Store Double of Two Words Indexed Vector Store Word of Two Halfwords from Even Vector Store Word of Two Halfwords from Even Indexed Vector Store Word of Two Halfwords from Odd
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVS EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1166
Form
Cat1 SP SP SP SP SP SP SP SP SP
Mnemonic evstwhox evstwwe evstwwex evstwwo evstwwox evsubfsmiaaw evsubfssiaaw evsubfumiaaw evsubfusiaaw
Instruction Vector Store Word of Two Halfwords from Odd Indexed Vector Store Word of Word from Even Vector Store Word of Word from Even Indexed Vector Store Word of Word from Odd Vector Store Word of Word from Odd Indexed Vector Subtract Signed, Modulo, Integer to Accumulator Word Vector Subtract Signed, Saturate, Integer to Accumulator Word Vector Subtract Unsigned, Modulo, Integer to Accumulator Word Vector Subtract Unsigned, Saturate, Integer to Accumulator Word Vector Subtract from Word Vector Subtract Immediate from Word Vector XOR Extend Sign Byte Extend Sign Halfword Extend Sign Word Instruction Cache Block Invalidate Instruction Cache Block Invalidate by External PID Instruction Cache Block Lock Clear Instruction Cache Block Touch Instruction Cache Block Touch and Lock Set Instruction Cache Invalidate Instruction Cache Read Integer Select Load Byte and Reserve Indexed Load Byte with Decoration Indexed Load Byte by External Process ID Indexed Load Byte and Zero with Update Indexed Load Byte and Zero Indexed Load Doubleword And Reserve Indexed Load Doubleword with Decoration Indexed Load Doubleword by External Process ID Indexed Load Doubleword with Update Indexed Load Doubleword Indexed Load Floating Doubleword with Decoration Indexed Load Floating-Point Double by External Process ID Indexed Load Halfword and Reserve Indexed Load Halfword Algebraic with Update Indexed Load Halfword Algebraic Indexed Load Halfword Byte-Reverse Indexed Load Halfword with Decoration Indexed Load Halfword by External Process ID Indexed Load Halfword and Zero with Update Indexed Load Halfword and Zero Indexed Load String Word Immediate Load String Word Indexed Load Vector Element Byte Indexed Load Vector Element Halfword Indexed Load Vector by External Process ID Indexed Load Vector by External Process ID Indexed LRU Load Vector Element Word Indexed Load Vector for Shift Left Indexed Load Vector for Shift Right Indexed Load Vector Indexed Load Vector Indexed LRU Load Word And Reserve Indexed Load Word Algebraic with Update Indexed
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX X X X X X X X X X X A X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
10000204 SP evsubfw 10000206 SP evsubifw 10000216 SP evxor 7C000774 SR B extsb[.] 7C000734 SR B extsh[.] 7C0007B4 SR 64 extsw[.] 7C0007AC B icbi 7C0007BE P E.PD icbiep 7C0001CC M ECL icblc 7C00002C E icbt 7C0003CC M ECL icbtls 7C00078C H E.CI ici 7C0007CC H E.CD icread 7C00001E B isel 7C000068 B lbarx 7C000406 DS lbdx 7C0000BE P E.PD lbepx 7C0000EE B lbzux 7C0000AE B lbzx 7C0000A8 64 ldarx 7C0004C6 DS lddx 7C00003A P E.PD;64 ldepx 7C00006A 64 ldux 7C00002A 64 ldx 7C000646 DS lfddx 7C0004BE P E.PD lfdepx 7C0000E8 7C0002EE 7C0002AE 7C00062C 7C000446 7C00023E 7C00026E 7C00022E 7C0004AA 7C00042A 7C00000E 7C00004E 7C00024E 7C00020E 7C00008E 7C00000C 7C00004C 7C0000CE 7C0002CE 7C000028 7C0002EA B B B B DS E.PD B B MA MA V V E.PD E.PD V V V V V B 64 lharx lhaux lhax lhbrx lhdx lhepx lhzux lhzx lswi lswx lvebx lvehx lvepx lvepxl lvewx lvsl lvsr lvx lvxl lwarx lwaux
P P
1167
Form
Cat1 64 B DS E.PD B B LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA E E B E.DC E.DC E.DC B B E.PM B V E.PC E.PC B E.DC E.DC E.DC E B E.PM B V LMA LMA 64 64 LMA LMA B B 64
Instruction
X X X X X X XO XO XO XO XO XO XO XO XO XO XO XO X X XFX XFX X X X XFX XFX XFX VX X X XFX XFX X X X XFX XFX XFX VX X X XO XO X X XO XO XO
P P P O O H H P P P O O SR SR SR SR SR SR SR SR SR
Load Word Algebraic Indexed Load Word Byte-Reverse Indexed Load Word with Decoration Indexed Load Word by External Process ID Indexed Load Word and Zero with Update Indexed Load Word and Zero Indexed Multiply Accumulate Cross Halfword to Word Modulo Signed macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned mbar Memory Barrier mcrxr Move To Condition Register from XER mfcr Move From Condition Register mfdcr Move From Device Control Register mfdcrux Move From Device Control Register User-mode Indexed mfdcrx Move From Device Control Register Indexed mfmsr Move From Machine State Register mfocrf Move From One Condition Register Field mfpmr Move From Performance Monitor Register mfspr Move From Special Purpose Register mfvscr Move From VSCR msgclr Message Clear msgsnd Message Send mtcrf Move To Condition Register Fields mtdcr Move To Device Control Register mtdcrux Move To Device Control Register User-mode Indexed mtdcrx Move To Device Control Register Indexed mtmsr Move To Machine State Register mtocrf Move To One Condition Register Field mtpmr Move To Performance Monitor Register mtspr Move To Special Purpose Register mtvscr Move To VSCR mulchw[.] Multiply Cross Halfword to Word Signed mulchwu[.] Multiply Cross Halfword to Word Unsigned mulhd[.] Multiply High Doubleword mulhdu[.] Multiply High Doubleword Unsigned mulhhw[.] Multiply High Halfword to Word Signed mulhhwu[.] Multiply High Halfword to Word Unsigned mulhw[.] Multiply High Word mulhwu[.] Multiply High Word Unsigned mulld[o][.] Multiply Low Doubleword
1168
Form
Cat1 LMA LMA B B B LMA LMA LMA LMA LMA LMA B B B B VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE
Instruction
X X XO X XO XO XO XO XO XO XO X X X X RR OIM5 RR RR IM5 BD8 BD8 IM5 C IM5 C IM5 IM5 IM5 RR RR RR IM5 RR OIM5 R R R R C C SD4 SD4 IM7 SD4 RR R R RR RR R R RR R R
SR SR SR SR SR SR
100001DC SR 1000005C SR 100000DC SR 1000035C SR 100003DC SR 7C0000F8 7C000378 7C000338 7C0000F4 0400---2000---4600---4500---2E00---E800---E000---6000---0006---6200---0004---2C00---6400---6600---0C00---0E00---0F00---2A00---0D00---2200---00D0---00F0---00C0---00E0---0000---0001---8000---A000---4800---C000---0300---00A0---0080---0100---0200---00B0---0090---0500---0030---0020---SR SR SR
SR
Multiply Low Halfword to Word Signed Multiply Low Halfword to Word Unsigned Multiply Low Word NAND Negate Negative Multiply Accumulate Cross Halfword to Word Modulo Signed nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed nor[.] NOR or[.] OR orc[.] OR with Complement popcntb Population Count Bytes se_add Add Short Form se_addi Add Immediate Short Form se_and[.] AND Short Form se_andc AND with Complement Short Form se_andi AND Immediate Short Form se_b[l] Branch [and Link] se_bc Branch Conditional Short Form se_bclri Bit Clear Immediate se_bctr[l] Branch to Count Register [and Link] se_bgeni Bit Generate Immediate se_blr[l] Branch to Link Register [and Link] se_bmaski Bit Mask Generate Immediate se_bseti Bit Set Immediate se_btsti Bit Test Immediate se_cmp Compare Word se_cmph Compare Halfword Short Form se_cmphl Compare Halfword Logical Short Form se_cmpi Compare Immediate Word Short Form se_cmpl Compare Logical Word se_cmpli Compare Logical Immediate Word se_extsb Extend Sign Byte Short Form se_extsh Extend Sign Halfword Short Form se_extzb Extend Zero Byte se_extzh Extend Zero Halfword se_illegal Illegal se_isync Instruction Synchronize se_lbz Load Byte and Zero Short Form se_lhz Load Halfword and Zero Short Form se_li Load Immediate Short Form se_lwz Load Word and Zero Short Form se_mfar Move from Alternate Register se_mfctr Move From Count Register se_mflr Move From Link Register se_mr Move Register se_mtar Move To Alternate Register se_mtctr Move To Count Register se_mtlr Move To Link Register se_mullw Multiply Low Word Short Form se_neg Negate Short Form se_not NOT Short Form
1169
Form
Cat1 VLE VLE VLE VLE, E.HV VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE 64 B 64 64 B B 64 B B DS E.PD B B 64 DS E.PD;64 64 64 DS E.PD B B DS E.PD B B MA MA V V E.PD E.PD V V V B B DS E.PD B
Mnemonic se_or se_rfci se_rfdi se_rfgi se_rfi se_rfmci se_sc se_slw se_slwi se_sraw se_srawi se_srw se_srwi se_stb se_sth se_stw se_sub se_subf se_subi[.] sld[.] slw[.] srad[.] sradi[.] sraw[.] srawi[.] srd[.] srw[.] stbcx. stbdx stbepx stbux stbx stdcx. stddx stdepx stdux stdx stfddx stfdepx sthbrx sthcx. sthdx sthepx sthux sthx stswi stswx stvebx stvehx stvepx stvepxl stvewx stvx stvxl stwbrx stwcx. stwdx stwepx stwux
Instruction OR Short Form Return From Critical Interrupt Return From Debug Interrupt Return From Guest Interrupt Return From Interrupt Return From Machine Check Interrupt System Call Shift Left Word Shift Left Word Immediate Short Form Shift Right Algebraic Word Shift Right Algebraic Immediate Shift Right Word Shift Right Word Immediate Short Form Store Byte Short Form Store Halfword Short Form Store Word Short Form Subtract Subtract From Short Form Subtract Immediate Shift Left Doubleword Shift Left Word Shift Right Algebraic Doubleword Shift Right Algebraic Doubleword Immediate Shift Right Algebraic Word Shift Right Algebraic Word Immediate Shift Right Doubleword Shift Right Word Store Byte Conditional Indexed Store Byte with Decoration Indexed Store Byte by External Process ID Indexed Store Byte with Update Indexed Store Byte Indexed Store Doubleword Conditional Indexed Store Doubleword with Decoration Indexed Store Doubleword by External Process ID Indexed Store Doubleword with Update Indexed Store Doubleword Indexed Store Floating Doubleword with Decoration Indexed Store Floating-Point Double by External Process ID Indexed Store Halfword Byte-Reverse Indexed Store Halfword Conditional Indexed Store Halfword with Decoration Indexed Store Halfword by External Process ID Indexed Store Halfword with Update Indexed Store Halfword Indexed Store String Word Immediate Store String Word Indexed Store Vector Element Byte Indexed Store Vector Element Halfword Indexed Store Vector by External Process ID Indexed Store Vector by External Process ID Indexed LRU Store Vector Element Word Indexed Store Vector Indexed Store Vector Indexed LRU Store Word Byte-Reverse Indexed Store Word Conditional Indexed Store Word with Decoration Indexed Store Word by External Process ID Indexed Store Word with Update Indexed
H H P P H
SR SR SR SR SR SR SR SR SR P
P P
1170
Form
Cat1 B B B B B B B 64 E E E E E B V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic stwx subf[o][.] subfc[o][.] subfe[o][.] subfme[o][.] subfze[o][.] sync td tlbivax tlbre tlbsx tlbsync tlbwe tw vaddcuw vaddfp vaddsbs vaddshs vaddsws vaddubm vaddubs vadduhm vadduhs vadduwm vadduws vand vandc vavgsb vavgsh vavgsw vavgub vavguh vavguw vcfsx vcfux vcmpbfp[.] vcmpeqfp[.] vcmpequb[.] vcmpequh[.] vcmpequw[.] vcmpgefp[.] vcmpgtfp[.] vcmpgtsb[.] vcmpgtsh[.] vcmpgtsw[.] vcmpgtub[.] vcmpgtuh[.] vcmpgtuw[.] vctsxs vctuxs vexptefp vlogefp vmaddfp vmaxfp vmaxsb vmaxsh vmaxsw vmaxub vmaxuh vmaxuw
Instruction Store Word Indexed Subtract From Subtract From Carrying Subtract From Extended Subtract From Minus One Extended Subtract From Zero Extended Synchronize Trap Doubleword TLB Invalidate Virtual Address Indexed TLB Read Entry TLB Search Indexed TLB Synchronize TLB Write Entry Trap Word Vector Add and write Carry-out Unsigned Word Vector Add Single-Precision Vector Add Signed Byte Saturate Vector Add Signed Halfword Saturate Vector Add Signed Word Saturate Vector Add Unsigned Byte Modulo Vector Add Unsigned Byte Saturate Vector Add Unsigned Halfword Modulo Vector Add Unsigned Halfword Saturate Vector Add Unsigned Word Modulo Vector Add Unsigned Word Saturate Vector Logical AND Vector Logical AND with Complement Vector Average Signed Byte Vector Average Signed Halfword Vector Average Signed Word Vector Average Unsigned Byte Vector Average Unsigned Halfword Vector Average Unsigned Word Vector Convert From Signed Fixed-Point Word Vector Convert From Vector Compare Bounds Single-Precision Vector Compare Equal To Single-Precision Vector Compare Equal To Unsigned Byte Vector Compare Equal To Unsigned Halfword Vector Compare Equal To Unsigned Word Vector Compare Greater Than or Equal To Single-Precision Vector Compare Greater Than Single-Precision Vector Compare Greater Than Signed Byte Vector Compare Greater Than Signed Halfword Vector Compare Greater Than Signed Word Vector Compare Greater Than Unsigned Byte Vector Compare Greater Than Unsigned Halfword Vector Compare Greater Than Unsigned Word Vector Convert To Signed Fixed-Point Word Saturate Vector Convert To Unsigned Fixed-Point Word Saturate Vector 2 Raised to the Exponent Estimate Floating-Point Vector Log Base 2 Estimate Floating-Point Vector Multiply-Add Single-Precision Vector Maximum Single-Precision Vector Maximum Signed Byte Vector Maximum Signed Halfword Vector Maximum Signed Word Vector Maximum Unsigned Byte Vector Maximum Unsigned Halfword Vector Maximum Unsigned Word
X XO XO XO XO XO X X X X X X X X VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VC VC VC VC VC VC VC VC VC VC VC VC VC VX VX VX VX VA VX VX VX VX VX VX VX
SR SR SR SR SR H H H H H
1171
Form
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic vmhaddshs vmhraddshs vminfp vminsb vminsh vminsw vminub vminuh vminuw vmladduhm vmrghb vmrghh vmrghw vmrglb vmrglh vmrglw vmsummbm vmsumshm vmsumshs vmsumubm vmsumuhm vmsumuhs vmulesb vmulesh vmuleub vmuleuh vmulosb vmulosh vmuloub vmulouh vnmsubfp vnor vor vperm vpkpx vpkshss vpkshus vpkswss vpkswus vpkuhum vpkuhus vpkuwum vpkuwus vrefp vrfim vrfin vrfip vrfiz vrlb vrlh vrlw vrsqrtefp vsel vsl vslb vsldoi vslh vslo vslw vspltb
Instruction Vector Multiply-High-Add Signed Halfword Saturate Vector Multiply-High-Round-Add Signed Halfword Saturate Vector Minimum Single-Precision Vector Minimum Signed Byte Vector Minimum Signed Halfword Vector Minimum Signed Word Vector Minimum Unsigned Byte Vector Minimum Unsigned Halfword Vector Minimum Unsigned Word Vector Multiply-Low-Add Unsigned Halfword Modulo Vector Merge High Byte Vector Merge High Halfword Vector Merge High Word Vector Merge Low Byte Vector Merge Low Halfword Vector Merge Low Word Vector Multiply-Sum Mixed Byte Modulo Vector Multiply-Sum Signed Halfword Modulo Vector Multiply-Sum Signed Halfword Saturate Vector Multiply-Sum Unsigned Byte Modulo Vector Multiply-Sum Unsigned Halfword Modulo Vector Multiply-Sum Unsigned Halfword Saturate Vector Multiply Even Signed Byte Vector Multiply Even Signed Halfword Vector Multiply Even Unsigned Byte Vector Multiply Even Unsigned Halfword Vector Multiply Odd Signed Byte Vector Multiply Odd Signed Halfword Vector Multiply Odd Unsigned Byte Vector Multiply Odd Unsigned Halfword Vector Negative Multiply-Subtract Single-Precision Vector Logical NOR Vector Logical OR Vector Permute Vector Pack Pixel Vector Pack Signed Halfword Signed Saturate Vector Pack Signed Halfword Unsigned Saturate Vector Pack Signed Word Signed Saturate Vector Pack Signed Word Unsigned Saturate Vector Pack Unsigned Halfword Unsigned Modulo Vector Pack Unsigned Halfword Unsigned Saturate Vector Pack Unsigned Word Unsigned Modulo Vector Pack Unsigned Word Unsigned Saturate Vector Reciprocal Estimate Single-Precision Vector Round to Single-Precision Integer toward -Infinity Vector Round to Single-Precision Integer Nearest Vector Round to Single-Precision Integer toward +Infinity Vector Round to Single-Precision Integer toward Zero Vector Rotate Left Byte Vector Rotate Left Halfword Vector Rotate Left Word Vector Reciprocal Square Root Estimate Single-Precision Vector Select Vector Shift Left Vector Shift Left Byte Vector Shift Left Double by Octet Immediate Vector Shift Left Halfword Vector Shift Left by Octet Vector Shift Left Word Vector Splat Byte
VA VA VX VX VX VX VX VX VX VA VX VX VX VX VX VX VA VA VA VA VA VA VX VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX
1172
Form
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V WT E E B
Mnemonic vsplth vspltisb vspltish vspltisw vspltw vsr vsrab vsrah vsraw vsrb vsrh vsro vsrw vsubcuw vsubfp vsubsbs vsubshs vsubsws vsububm vsububs vsubuhm vsubuhs vsubuwm vsubuws vsum2sws vsum4sbs vsum4shs vsum4ubs vsumsws vupkhpx vupkhsb vupkhsh vupklpx vupklsb vupklsh vxor wait wrtee wrteei xor[.]
Instruction Vector Splat Halfword Vector Splat Immediate Signed Byte Vector Splat Immediate Signed Halfword Vector Splat Immediate Signed Word Vector Splat Word Vector Shift Right Vector Shift Right Algebraic Word Vector Shift Right Algebraic Halfword Vector Shift Right Algebraic Word Vector Shift Right Byte Vector Shift Right Halfword Vector Shift Right by Octet Vector Shift Right Word Vector Subtract and Write Carry-Out Unsigned Word Vector Subtract Single-Precision Vector Subtract Signed Byte Saturate Vector Subtract Signed Halfword Saturate Vector Subtract Signed Word Saturate Vector Subtract Unsigned Byte Modulo Vector Subtract Unsigned Byte Saturate Vector Subtract Unsigned Halfword Modulo Vector Subtract Unsigned Halfword Saturate Vector Subtract Unsigned Word Modulo Vector Subtract Unsigned Word Saturate Vector Sum across Half Signed Word Saturate Vector Sum across Quarter Signed Byte Saturate Vector Sum across Quarter Signed Halfword Saturate Vector Sum across Quarter Unsigned Byte Saturate Vector Sum across Signed Word Saturate Vector Unpack High Pixel Vector Unpack High Signed Byte Vector Unpack High Signed Halfword Vector Unpack Low Pixel Vector Unpack Low Signed Byte Vector Unpack Low Signed Halfword Vector Logical XOR Wait Write MSR External Enable Write MSR External Enable Immediate XOR
VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX X X X X
1
See the key to the mode dependency and privilege columns on page 1302 and the key to the category column in Section 1.3.5 of Book I. 2 For 16-bit instructions, the Opcode column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0s in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the Opcode column represents the 32-bit hexadecimal instruction encoding with the opcode, extended opcode, and other fields with fixed values in the corresponding fields in the instruction, and with 0...s in bit positions which are not opcode, extended opcode or fixed value bits.
1173
1174
C C C C C C C C C C R R R R R R R R R R RR RR RR RR RR RR RR RR RR RR RR VX VX VX VC VX VX VX VX X XO VA
Form
Cat1 VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE, E.HV VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE V V V V V V V V LMA LMA V
Mnemonic se_illegal se_isync se_sc se_blr[l] se_bctr[l] se_rfi se_rfci se_rfdi se_rfmci se_rfgi se_not se_neg se_mflr se_mtlr se_mfctr se_mtctr se_extzb se_extsb se_extzh se_extsh se_mr se_mtar se_mfar se_add se_mullw se_sub se_subf se_cmp se_cmpl se_cmph se_cmphl vaddubm vmaxub vrlb vcmpequb[.] vmuloub vaddfp vmrghb vpkuhum mulhhwu[.] machhwu[o][.] vmhaddshs
Instruction Illegal Instruction Synchronize System Call Branch to Link Register [and Link] Branch to Count Register [and Link] Return From Interrupt Return From Critical Interrupt Return From Debug Interrupt Return From Machine Check Interrupt Return From Guest Interrupt NOT Short Form Negate Short Form Move From Link Register Move To Link Register Move From Count Register Move To Count Register Extend Zero Byte Extend Sign Byte Short Form Extend Zero Halfword Extend Sign Halfword Short Form Move Register Move To Alternate Register Move from Alternate Register Add Short Form Multiply Low Word Short Form Subtract Subtract From Short Form Compare Word Compare Logical Word Compare Halfword Short Form Compare Halfword Logical Short Form Vector Add Unsigned Byte Modulo Vector Maximum Unsigned Byte Vector Rotate Left Byte Vector Compare Equal To Unsigned Byte Vector Multiply Odd Unsigned Byte Vector Add Single-Precision Vector Merge High Byte Vector Pack Unsigned Halfword Unsigned Modulo Multiply High Halfword to Word Unsigned Multiply Accumulate High Halfword to Word Modulo Unsigned Vector Multiply-High-Add Signed Halfword Saturate
P H H H P
0020---0030---0080---0090---00A0---00B0---00C0---00D0---00E0---00F0---0100---0200---0300---0400---0500---0600---0700---0C00---0D00---0E00---0F00---10000000 10000002 10000004 10000006 10000008 1000000A 1000000C 1000000E 10000010 SR 10000018 SR 10000020
1175
Form
Cat1 V V V V V V V V V V V V V V V V V V V V V LMA LMA LMA V V V V V V LMA V V LMA LMA V V V V V V LMA LMA V V V V V V LMA LMA LMA
Mnemonic vmhraddshs
Instruction
VA VA VA VA VA VA VA VA VA VA VA VA VA VX VX VX VC VX VX VX VX X XO XO VX VX VX VC VX VX XO VC VX XO XO VX VX VX VX VX VX X XO VX VX VX VX VX VX X XO XO
10000022 10000024 10000025 10000026 10000027 10000028 10000029 1000002A 1000002B 1000002C 1000002E 1000002F 10000040 10000042 10000044 10000046 10000048 1000004A 1000004C 1000004E 10000050 SR 10000058 SR 1000005C SR 10000080 10000082 10000084 10000086 1000008C 1000008E 10000098 SR 100000C6 100000CE 100000D8 SR 100000DC SR 10000102 10000104 10000108 1000010A 1000010C 1000010E 10000110 SR 10000118 SR 10000142 10000144 10000148 1000014A 1000014C 1000014E 10000150 SR 10000158 SR 1000015C SR
Vector Multiply-High-Round-Add Signed Halfword Saturate vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo vmsumubm Vector Multiply-Sum Unsigned Byte Modulo vmsummbm Vector Multiply-Sum Mixed Byte Modulo vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate vmsumshm Vector Multiply-Sum Signed Halfword Modulo vmsumshs Vector Multiply-Sum Signed Halfword Saturate vsel Vector Select vperm Vector Permute vsldoi Vector Shift Left Double by Octet Immediate vmaddfp Vector Multiply-Add Single-Precision vnmsubfp Vector Negative Multiply-Subtract Single-Precision vadduhm Vector Add Unsigned Halfword Modulo vmaxuh Vector Maximum Unsigned Halfword vrlh Vector Rotate Left Halfword vcmpequh[.] Vector Compare Equal To Unsigned Halfword vmulouh Vector Multiply Odd Unsigned Halfword vsubfp Vector Subtract Single-Precision vmrghh Vector Merge High Halfword vpkuwum Vector Pack Unsigned Word Unsigned Modulo mulhhw[.] Multiply High Halfword to Word Signed machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed vadduwm Vector Add Unsigned Word Modulo vmaxuw Vector Maximum Unsigned Word vrlw Vector Rotate Left Word vcmpequw[.] Vector Compare Equal To Unsigned Word vmrghw Vector Merge High Word vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned vcmpeqfp[.] Vector Compare Equal To Single-Precision vpkuwus Vector Pack Unsigned Word Unsigned Saturate machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed vmaxsb Vector Maximum Signed Byte vslb Vector Shift Left Byte vmulosb Vector Multiply Odd Signed Byte vrefp Vector Reciprocal Estimate Single-Precision vmrglb Vector Merge Low Byte vpkshus Vector Pack Signed Halfword Unsigned Saturate mulchwu[.] Multiply Cross Halfword to Word Unsigned macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned vmaxsh Vector Maximum Signed Halfword vslh Vector Shift Left Halfword vmulosh Vector Multiply Odd Signed Halfword vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision vmrglh Vector Merge Low Halfword vpkswus Vector Pack Signed Word Unsigned Saturate mulchw[.] Multiply Cross Halfword to Word Signed macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed
1176
Form
Instruction
VX VX VX VX VX VX XO VX VC VX VX XO XO EVX VX EVX VX EVX VX EVX VC EVX VX EVX EVX VX EVX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
Vector Add and write Carry-out Unsigned Word Vector Maximum Signed Word Vector Shift Left Word Vector 2 Raised to the Exponent Estimate Floating-Point Vector Merge Low Word Vector Pack Signed Halfword Signed Saturate Multiply Accumulate Cross Halfword to Word Saturate Unsigned vsl Vector Shift Left vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision vlogefp Vector Log Base 2 Estimate Floating-Point vpkswss Vector Pack Signed Word Signed Saturate macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed evaddw Vector Add Word vaddubs Vector Add Unsigned Byte Saturate evaddiw Vector Add Immediate Word vminub Vector Minimum Unsigned Byte evsubfw Vector Subtract from Word vsrb Vector Shift Right Byte evsubifw Vector Subtract Immediate from Word vcmpgtub[.] Vector Compare Greater Than Unsigned Byte evabs Vector Absolute Value vmuleub Vector Multiply Even Unsigned Byte evneg Vector Negate evextsb Vector Extend Sign Byte vrfin Vector Round to Single-Precision Integer Nearest evextsh Vector Extend Sign Halfword evrndw Vector Round Word vspltb Vector Splat Byte evcntlzw Vector Count Leading Zeros Word evcntlsw Vector Count Leading Signed Bits Word vupkhsb Vector Unpack High Signed Byte brinc Bit Reversed Increment evand Vector AND evandc Vector AND with Complement evxor Vector XOR evor Vector OR evnor Vector NOR eveqv Vector Equivalent evorc Vector OR with Complement evnand Vector NAND evsrwu Vector Shift Right Word Unsigned evsrws Vector Shift Right Word Signed evsrwiu Vector Shift Right Word Immediate Unsigned evsrwis Vector Shift Right Word Immediate Signed evslw Vector Shift Left Word evslwi Vector Shift Left Word Immediate evrlw Vector Rotate Left Word evsplati Vector Splat Immediate evrlwi Vector Rotate Left Word Immediate evsplatfi Vector Splat Fractional Immediate evmergehi Vector Merge High evmergelo Vector Merge Low evmergehilo Vector Merge High/Low evmergelohi Vector Merge Low/High evcmpgtu Vector Compare Greater Than Unsigned evcmpgts Vector Compare Greater Than Signed
1177
Form
Cat1 SP SP SP V V V V V V V V SP SP.FV V SP.FV V SP.FV V SP.FV SP.FV V SP.FV SP.FV V SP.FV V SP.FV SP.FV V SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FS SP.FS SP.FS V SP.FS
Mnemonic evcmpltu evcmplts evcmpeq vadduhs vminuh vsrh vcmpgtuh[.] vmuleuh vrfiz vsplth vupkhsh evsel evfsadd vadduws evfssub vminuw evfsabs vsrw evfsnabs evfsneg vcmpgtuw[.] evfsmul evfsdiv vrfip evfscmpgt vspltw evfscmplt evfscmpeq vupklsb evfscfui evfscfsi evfscfuf evfscfsf evfsctui evfsctsi evfsctuf evfsctsf evfsctuiz evfsctsiz evfststgt evfststlt evfststeq efsadd efssub efsabs vsr efsnabs
Instruction Vector Compare Less Than Unsigned Vector Compare Less Than Signed Vector Compare Equal Vector Add Unsigned Halfword Saturate Vector Minimum Unsigned Halfword Vector Shift Right Halfword Vector Compare Greater Than Unsigned Halfword Vector Multiply Even Unsigned Halfword Vector Round to Single-Precision Integer toward Zero Vector Splat Halfword Vector Unpack High Signed Halfword Vector Select Vector Floating-Point Single-Precision Add Vector Add Unsigned Word Saturate Vector Floating-Point Single-Precision Subtract Vector Minimum Unsigned Word Vector Floating-Point Single-Precision Absolute Value Vector Shift Right Word Vector Floating-Point Single-Precision Negative Absolute Value Vector Floating-Point Single-Precision Negate Vector Compare Greater Than Unsigned Word Vector Floating-Point Single-Precision Multiply Vector Floating-Point Single-Precision Divide Vector Round to Single-Precision Integer toward +Infinity Vector Floating-Point Single-Precision Compare Greater Than Vector Splat Word Vector Floating-Point Single-Precision Compare Less Than Vector Floating-Point Single-Precision Compare Equal Vector Unpack Low Signed Byte Vector Convert Floating-Point Single-Precision from Unsigned Integer Vector Convert Floating-Point Single-Precision from Signed Integer Vector Convert Floating-Point Single-Precision from Unsigned Fraction Vector Convert Floating-Point Single-Precision from Signed Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer Vector Convert Floating-Point Single-Precision to Signed Integer Vector Convert Floating-Point Single-Precision to Unsigned Fraction Vector Convert Floating-Point Single-Precision to Signed Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Vector Convert Floating-Point Single-Precision to Signed Integer with Round Toward Zero Vector Floating-Point Single-Precision Test Greater Than Vector Floating-Point Single-Precision Test Less Than Vector Floating-Point Single-Precision Test Equal Floating-Point Single-Precision Add Floating-Point Single-Precision Subtract Floating-Point Single-Precision Absolute Value Vector Shift Right Floating-Point Single-Precision Negative Absolute Value
EVX EVX EVX VX VX VX VC VX VX VX VX EVS EVX VX EVX VX EVX VX EVX EVX VC EVX EVX VX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX
1178
Form
Cat1 SP.FS V SP.FS SP.FS V SP.FS SP.FS SP.FS V SP.FD SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FS SP.FD
Mnemonic efsneg vcmpgtfp[.] efsmul efsdiv vrfim efscmpgt efscmplt efscmpeq vupklsh efscfd efscfui efscfsi efscfuf efscfsf efsctui efsctsi efsctuf efsctsf efsctuiz efsctsiz efststgt efststlt efststeq efdadd efdsub efdcfuid efdcfsid efdabs efdnabs efdneg efdmul efddiv efdctuidz efdctsidz efdcmplt efdcmpgt efdcmpeq efdcfs efdcfui efdcfsi efscfuf efdcfsf
Instruction Floating-Point Single-Precision Negate Vector Compare Greater Than Single-Precision Floating-Point Single-Precision Multiply Floating-Point Single-Precision Divide Vector Round to Single-Precision Integer toward -Infinity Floating-Point Single-Precision Compare Greater Than Floating-Point Single-Precision Compare Less Than Floating-Point Single-Precision Compare Equal Vector Unpack Low Signed Halfword Floating-Point Single-Precision Convert from Double-Precision Convert Floating-Point Single-Precision from Unsigned Integer Convert Floating-Point Single-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Single-Precision from Signed Fraction Convert Floating-Point Single-Precision to Unsigned Integer Convert Floating-Point Single-Precision to Signed Integer Convert Floating-Point Single-Precision to Unsigned Fraction Convert Floating-Point Single-Precision to Signed Fraction Convert Floating-Point Single-Precision to Unsigned Integer with Round Towards Zero Convert Floating-Point Single-Precision to Signed Integer with Round Towards Zero Floating-Point Single-Precision Test Greater Than Floating-Point Single-Precision Test Less Than Floating-Point Single-Precision Test Equal Floating-Point Double-Precision Add Floating-Point Double-Precision Subtract Convert Floating-Point Double-Precision from Unsigned Integer Doubleword Convert Floating-Point Double-Precision from Signed Integer Doubleword Floating-Point Double-Precision Absolute Value Floating-Point Double-Precision Negative Absolute Value Floating-Point Double-Precision Negate Floating-Point Double-Precision Multiply Floating-Point Double-Precision Divide Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero Floating-Point Double-Precision Compare Less Than Floating-Point Double-Precision Compare Greater Than Floating-Point Double-Precision Compare Equal Floating-Point Double-Precision Convert from Single-Precision Convert Floating-Point Double-Precision from Unsigned Integer Convert Floating-Point Double-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Double-Precision from Signed Fraction
EVX VC EVX EVX VX EVX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1179
Form
Cat1 SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP V SP SP V SP SP V SP V SP V SP V SP V SP SP V SP LMA SP SP SP SP SP SP LMA SP SP SP SP SP SP SP SP SP
Mnemonic efdctui efdctsi efdctuf efdctsf efdctuiz efdctsiz efdtstgt efdtstlt efdtsteq evlddx vaddsbs evldd evldwx vminsb evldw evldhx vsrab evldh vcmpgtsb[.] evlhhesplatx vmulesb evlhhesplat vcfux evlhhousplatx vspltisb evlhhousplat evlhhossplatx vpkpx evlhhossplat mullhwu[.] evlwhe evlwhoux evlwhou evlwhosx evlwhos evlwwsplatx maclhwu[o][.] evlwwsplat evlwhsplatx evlwhsplat evstddx evstdd evstdwx evstdw evstdhx evstdh
Instruction Convert Floating-Point Double-Precision to Unsigned Integer Convert Floating-Point Double-Precision to Signed Integer Convert Floating-Point Double-Precision to Unsigned Fraction Convert Floating-Point Double-Precision to Signed Fraction Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero Floating-Point Double-Precision Test Greater Than Floating-Point Double-Precision Test Less Than Floating-Point Double-Precision Test Equal Vector Load Double Word into Double Word Indexed Vector Add Signed Byte Saturate Vector Load Double Word into Double Word Vector Load Double Word into Two Words Indexed Vector Minimum Signed Byte Vector Load Double Word into Two Words Vector Load Double Word into Four Halfwords Indexed Vector Shift Right Algebraic Word Vector Load Double Word into Four Halfwords Vector Compare Greater Than Signed Byte Vector Load Halfword into Halfwords Even and Splat Indexed Vector Multiply Even Signed Byte Vector Load Halfword into Halfwords Even and Splat Vector Convert From Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed Vector Splat Immediate Signed Byte Vector Load Halfword into Halfword Odd Unsigned and Splat Vector Load Halfword into Halfword Odd Signed and Splat Indexed Vector Pack Pixel Vector Load Halfword into Halfword Odd and Splat Multiply Low Halfword to Word Unsigned Vector Load Word into Two Halfwords Even Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) Vector Load Word into Two Halfwords Odd Signed (with sign extension) Vector Load Word into Word and Splat Indexed Multiply Accumulate Low Halfword to Word Modulo Unsigned Vector Load Word into Word and Splat Vector Load Word into Two Halfwords and Splat Indexed Vector Load Word into Two Halfwords and Splat Vector Store Doubleword of Doubleword Indexed Vector Store Double of Double Vector Store Double of Two Words Indexed Vector Store Double of Two Words Vector Store Double of Four Halfwords Indexed Vector Store Double of Four Halfwords
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX EVX VX EVX EVX VX EVX VC EVX VX EVX VX EVX VX EVX EVX VX EVX X EVX EVX EVX EVX EVX EVX XO EVX EVX EVX EVX EVX EVX EVX EVX EVX
1180
Form
Mnemonic evstwhex evstwhe evstwhox evstwho evstwwex evstwwe evstwwox evstwwo vaddshs vminsh vsrah vcmpgtsh[.] vmulesh vcfsx vspltish vupkhpx mullhw[.] maclhw[o][.] nmaclhw[o][.] vaddsws vminsw vsraw vcmpgtsw[.] vctuxs vspltisw maclhwsu[o][.] vcmpbfp[.] vctsxs vupklpx maclhws[o][.] nmaclhws[o][.] vsububm vavgub evmhessf vand evmhossf evmheumi evmhesmi vmaxfp evmhesmf evmhoumi vslo evmhosmi evmhosmf evmhessfa evmhossfa
Instruction Vector Store Word of Two Halfwords from Even Indexed Vector Store Word of Two Halfwords from Even Vector Store Word of Two Halfwords from Odd Indexed Vector Store Word of Two Halfwords from Odd Vector Store Word of Word from Even Indexed Vector Store Word of Word from Even Vector Store Word of Word from Odd Indexed Vector Store Word of Word from Odd Vector Add Signed Halfword Saturate Vector Minimum Signed Halfword Vector Shift Right Algebraic Halfword Vector Compare Greater Than Signed Halfword Vector Multiply Even Signed Halfword Vector Convert From Signed Fixed-Point Word Vector Splat Immediate Signed Halfword Vector Unpack High Pixel Multiply Low Halfword to Word Signed Multiply Accumulate Low Halfword to Word Modulo Signed Negative Multiply Accumulate Low Halfword to Word Modulo Signed Vector Add Signed Word Saturate Vector Minimum Signed Word Vector Shift Right Algebraic Word Vector Compare Greater Than Signed Word Vector Convert To Unsigned Fixed-Point Word Saturate Vector Splat Immediate Signed Word Multiply Accumulate Low Halfword to Word Saturate Unsigned Vector Compare Bounds Single-Precision Vector Convert To Signed Fixed-Point Word Saturate Vector Unpack Low Pixel Multiply Accumulate Low Halfword to Word Saturate Signed Negative Multiply Accumulate Low Halfword to Word Saturate Signed Vector Subtract Unsigned Byte Modulo Vector Average Unsigned Byte Vector Multiply Halfwords, Even, Signed, Saturate, Fractional Vector Logical AND Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer Vector Multiply Halfwords, Even, Signed, Modulo, Integer Vector Maximum Single-Precision Vector Multiply Halfwords, Even, Signed, Modulo, Fractional Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer Vector Shift Left by Octet Vector Multiply Halfwords, Odd, Signed, Modulo, Integer Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator
EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX VC VX VX VX VX X XO XO VX VX VX VC VX VX XO VC VX VX XO XO VX VX EVX VX EVX EVX EVX VX EVX EVX VX EVX EVX EVX EVX
1181
Form
Cat1 SP SP SP SP SP SP V V V SP SP V SP V SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V V V SP SP SP SP SP V SP SP SP
Mnemonic evmheumia evmhesmia evmhesmfa evmhoumia evmhosmia evmhosmfa vsubuhm vavguh vandc evmwhssf evmwlumi vminfp evmwhumi vsro evmwhsmi evmwhsmf evmwssf evmwumi evmwsmi evmwsmf evmwhssfa evmwlumia evmwhumia evmwhsmia evmwhsmfa evmwssfa evmwumia evmwsmia evmwsmfa vsubuwm vavguw vor evaddusiaaw evaddssiaaw evsubfusiaaw evsubfssiaaw evmra vxor evdivws evdivwu evaddumiaaw
Instruction Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator Vector Subtract Unsigned Halfword Modulo Vector Average Unsigned Halfword Vector Logical AND with Complement Vector Multiply Word High Signed, Saturate, Fractional Vector Multiply Word Low Unsigned, Modulo, Integer Vector Minimum Single-Precision Vector Multiply Word High Unsigned, Modulo, Integer Vector Shift Right by Octet Vector Multiply Word High Signed, Modulo, Integer Vector Multiply Word High Signed, Modulo, Fractional Vector Multiply Word Signed, Saturate, Fractional Vector Multiply Word Unsigned, Modulo, Integer Vector Multiply Word Signed, Modulo, Integer Vector Multiply Word Signed, Modulo, Fractional Vector Multiply Word High Signed, Saturate, Fractional to Accumulator Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Modulo, Fractional to Accumulator Vector Multiply Word Signed, Saturate, Fractional to Accumulator Vector Multiply Word Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Fractional to Accumulator Vector Subtract Unsigned Word Modulo Vector Average Unsigned Word Vector Logical OR Vector Add Unsigned, Saturate, Integer to Accumulator Word Vector Add Signed, Saturate, Integer to Accumulator Word Vector Subtract Unsigned, Saturate, Integer to Accumulator Word Vector Subtract Signed, Saturate, Integer to Accumulator Word Initialize Accumulator Vector Logical XOR Vector Divide Word Signed Vector Divide Word Unsigned Vector Add Unsigned, Modulo, Integer to Accumulator Word
EVX EVX EVX EVX EVX EVX VX VX VX EVX EVX VX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX EVX EVX EVX EVX EVX VX EVX EVX EVX
1182
Form
Cat1 SP SP SP SP SP V SP SP V SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V SP SP SP SP SP
Mnemonic evaddsmiaaw evsubfumiaaw evsubfsmiaaw evmheusiaaw evmhessiaaw vavgsb evmhessfaaw evmhousiaaw vnor evmhossiaaw evmhossfaaw evmheumiaaw evmhesmiaaw evmhesmfaaw evmhoumiaaw evmhosmiaaw evmhosmfaaw evmhegumiaa evmhegsmiaa evmhegsmfaa evmhogumiaa evmhogsmiaa evmhogsmfaa evmwlusiaaw evmwlssiaaw vavgsh evmwlumiaaw evmwlsmiaaw evmwssfaa evmwumiaa evmwsmiaa
Instruction Vector Add Signed, Modulo, Integer to Accumulator Word Vector Subtract Unsigned, Modulo, Integer to Accumulator Word Vector Subtract Signed, Modulo, Integer to Accumulator Word Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words Vector Average Signed Byte Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words Vector Logical NOR Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer, and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words Vector Average Signed Halfword Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words Vector Multiply Word Signed, Saturate, Fractional and Accumulate Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Integer and Accumulate
EVX EVX EVX EVX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX EVX EVX EVX EVX
1183
Form
Cat1 SP SP V SP V SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V V V
Mnemonic evmwsmfaa evmheusianw vsubcuw evmhessianw vavgsw evmhessfanw evmhousianw evmhossianw evmhossfanw evmheumianw evmhesmianw evmhesmfanw evmhoumianw evmhosmianw evmhosmfanw evmhegumian evmhegsmian evmhegsmfan evmhogumian evmhogsmian evmhogsmfan evmwlusianw evmwlssianw evmwlumianw evmwlsmianw evmwssfan evmwumian evmwsmian evmwsmfan vsububs mfvscr vsum4ubs
Instruction Vector Multiply Word Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Subtract and Write Carry-Out Unsigned Word Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words Vector Average Signed Word Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative Vector Subtract Unsigned Byte Saturate Move From VSCR Vector Sum across Quarter Unsigned Byte Saturate
EVX EVX VX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX
1184
Form
Cat1 V V V V V V V V V V VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE
Mnemonic vsubuhs mtvscr vsum4shs vsubuws vsum2sws vsubsbs vsum4sbs vsubshs vsubsws vsumsws e_lbzu e_lhzu e_lwzu e_lhau e_stbu e_sthu e_stwu e_lmw e_stmw e_addi[.] e_addic[.] e_mulli e_cmpi e_subfic[.] e_andi[.] e_ori[.] e_xori[.] e_cmpli e_add16i se_addi se_cmpli se_subi[.] se_cmpi se_bmaski se_andi e_lbz e_stb e_lha se_srw se_sraw se_slw se_or se_andc se_and[.] se_li e_lwz e_stw e_lhz e_sth se_bclri se_bgeni se_bseti se_btsti se_srwi se_srawi se_slwi e_li e_add2i. e_add2is e_cmp16i e_mull2i
Instruction Vector Subtract Unsigned Halfword Saturate Move To VSCR Vector Sum across Quarter Signed Halfword Saturate Vector Subtract Unsigned Word Saturate Vector Sum across Half Signed Word Saturate Vector Subtract Signed Byte Saturate Vector Sum across Quarter Signed Byte Saturate Vector Subtract Signed Halfword Saturate Vector Subtract Signed Word Saturate Vector Sum across Signed Word Saturate Load Byte and Zero with Update Load Halfword and Zero with Update Load Word and Zero with Update Load Halfword Algebraic with Update Store Byte with Update Store Halfword with Update Store Word with Update Load Multiple Word Store Multiple Word Add Scaled Immediate Add Scaled Immediate Carrying Multiply Low Scaled Immediate Compare Scaled Immediate Word Subtract From Scaled Immediate Carrying AND Scaled Immediate OR Scaled Immediate XOR Scaled Immediate Compare Logical Scaled Immediate Word Add Immediate Add Immediate Short Form Compare Logical Immediate Word Subtract Immediate Compare Immediate Word Short Form Bit Mask Generate Immediate AND Immediate Short Form Load Byte and Zero Store Byte Load Halfword Algebraic Shift Right Word Shift Right Algebraic Word Shift Left Word OR Short Form AND with Complement Short Form AND Short Form Load Immediate Short Form Load Word and Zero Store Word Load Halfword and Zero Store Halfword Bit Clear Immediate Bit Generate Immediate Bit Set Immediate Bit Test Immediate Shift Right Word Immediate Short Form Shift Right Algebraic Word Immediate Shift Left Word Immediate Short Form Load Immediate Add (2 operand) Immediate and Record Add (2 operand) Immediate Shifted Compare Immediate Word Multiply (2 operand) Low Immediate
VX VX VX VX VX VX VX VX VX VX D8 D8 D8 D8 D8 D8 D8 D8 D8 SCI8 SCI8 SCI8 SCI8 SCI8 SCI8 SCI8 SCI8 SCI8 D OIM5 OIM5 OIM5 IM5 IM5 IM5 D D D RR RR RR RR RR RR IM7 D D D D IM5 IM5 IM5 IM5 IM5 IM5 IM5 LI20 I16A I16A I16A I16A
SR SR SR SR SR SR
SR
SR
SR
1185
Form
Cat1 VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE VLE B B V V B 64 B B VLE B VLE B B 64 E B B B 64 B E.PD;64 E.PD B VLE VLE, E.HV V V B VLE B 64 B B VLE 64 B WT E.PD 64 V 64 B LMV B 64 B B E.PD
Mnemonic e_cmpl16i e_cmph16i e_cmphl16i e_or2i e_and2i. e_or2is e_lis e_and2is. e_rlwimi e_rlwinm e_b[l] e_bc[l] cmp tw lvsl lvebx subfc[o][.] mulhdu[.] addc[o][.] mulhwu[.] e_cmph isel e_mcrf mfcr lwarx ldx icbt lwzx slw[.] cntlzw[.] sld[.] and[.] ldepx lwepx cmpl e_crnor e_sc lvsr lvehx subf[o][.] e_cmphl lbarx ldux dcbst lwzux e_slwi[.] cntlzd[.] andc[.] wait dcbstep td lvewx mulhd[.] mulhw[.] dlmzb[.] mfmsr ldarx dcbf lbzx lbepx
Instruction Compare Logical Immediate Word Compare Halfword Immediate Compare Halfword Logical Immediate OR (two operand) Immediate AND (two operand) Immediate OR (2 operand) Immediate Shifted Load Immediate Shifted AND (2 operand) Immediate Shifted Rotate Left Word Immediate then Mask Insert Rotate Left Word Immediate then AND with Mask Branch [and Link] Branch Conditional [and Link] Compare Trap Word Load Vector for Shift Left Indexed Load Vector Element Byte Indexed Subtract From Carrying Multiply High Doubleword Unsigned Add Carrying Multiply High Word Unsigned Compare Halfword Integer Select Move CR Field Move From Condition Register Load Word And Reserve Indexed Load Doubleword Indexed Instruction Cache Block Touch Load Word and Zero Indexed Shift Left Word Count Leading Zeros Word Shift Left Doubleword AND Load Doubleword by External Process ID Indexed Load Word by External Process ID Indexed Compare Logical Condition Register NOR System Call Load Vector for Shift Right Indexed Load Vector Element Halfword Indexed Subtract From Compare Halfword Logical Load Byte and Reserve Indexed Load Doubleword with Update Indexed Data Cache Block Store Load Word and Zero with Update Indexed Shift Left Word Immediate Count Leading Zeros Doubleword AND with Complement Wait Data Cache Block Store by External PID Trap Doubleword Load Vector Element Word Indexed Multiply High Doubleword Multiply High Word Determine Leftmost Zero Byte Move From Machine State Register Load Doubleword And Reserve Indexed Data Cache Block Flush Load Byte and Zero Indexed Load Byte by External Process ID Indexed
I16A I16A I16A I16L I16L I16L I16L I16L M M BD24 BD15 X X X X XO XO XO XO X A XL XFX X X X X X X X X X X X XL ESC X X XO X X X X X X X X X X X X XO XO X X X X X X
SR SR
CT
SR SR SR SR
SR SR SR SR P P
SR
SR SR SR
SR SR SR P
1186
Form
Cat1 V B B B B B E.PD VLE E ECL V B B B E 64 B B E.PD;64 E.PD E ECL V 64 B VLE V B B E.PC 64 B E.PD VLE ECL V B 64 B B E.PC B B E.PD VLE E.DC E.PD B E.HV B B VLE B E.PD VLE E.DC E.PD B VLE B E.PD
Mnemonic lvx neg[o][.] lharx lbzux popcntb nor[.] dcbfep e_crandc wrtee dcbtstls stvebx subfe[o][.] adde[o][.] mtcrf mtmsr stdx stwcx. stwx stdepx stwepx wrteei dcbtls stvehx stdux stwux e_crxor stvewx subfze[o][.] addze[o][.] msgsnd stdcx. stbx stbepx e_crnand icblc stvx subfme[o][.] mulld[o][.] addme[o][.] mullw[o][.] msgclr dcbtst stbux dcbtstep e_crand mfdcrx lvepxl add[o][.] ehpriv dcbt lhzx e_rlw[.] eqv[.] lhepx e_creqv mfdcrux lvepx lhzux e_rlwi[.] xor[.] dcbtep
Instruction Load Vector Indexed Negate Load Halfword and Reserve Indexed Load Byte and Zero with Update Indexed Population Count Bytes NOR Data Cache Block Flush by External PID Condition Register AND with Complement Write MSR External Enable Data Cache Block Touch for Store and Lock Set Store Vector Element Byte Indexed Subtract From Extended Add Extended Move To Condition Register Fields Move To Machine State Register Store Doubleword Indexed Store Word Conditional Indexed Store Word Indexed Store Doubleword by External Process ID Indexed Store Word by External Process ID Indexed Write MSR External Enable Immediate Data Cache Block Touch and Lock Set Store Vector Element Halfword Indexed Store Doubleword with Update Indexed Store Word with Update Indexed Condition Register XOR Store Vector Element Word Indexed Subtract From Zero Extended Add to Zero Extended Message Send Store Doubleword Conditional Indexed Store Byte Indexed Store Byte by External Process ID Indexed Condition Register NAND Instruction Cache Block Lock Clear Store Vector Indexed Subtract From Minus One Extended Multiply Low Doubleword Add to Minus One Extended Multiply Low Word Message Clear Data Cache Block Touch for Store Store Byte with Update Indexed Data Cache Block Touch for Store by External PID Condition Register AND Move From Device Control Register Indexed Load Vector by External Process ID Indexed LRU Add Embedded Hypervisor Privilege Data Cache Block Touch Load Halfword and Zero Indexed Rotate Left Word Equivalent Load Halfword by External Process ID Indexed Condition Register Equivalent Move From Device Control Register User-mode Indexed Load Vector by External Process ID Indexed Load Halfword and Zero with Update Indexed Rotate Left Word Immediate XOR Data Cache Block Touch by External PID
X XO X X X X X XL X X X XO XO XFX X X X X X X X X X X X XL X XO XO X X X X XL X X XO XO XO XO X X X X XL X X XO XL X X X X X XL X X X X X X
SR
SR P P M SR SR P
P P P M
SR SR H P M SR SR SR SR H P P P SR
SR SR P P SR SR P
1187
Form
Cat1 E.DC E.CD E.PM B 64 B V 64 B E.DC ECL B B E.PD VLE E.DC B B VLE E.DC E.CI 64 B E.PM B E DS ECL E.CD V 64 B E DS MA B B 64 DS E VLE DS MA B E.PD DS DS MA B DS B DS MA B E.PD DS E E.PD E
Mnemonic mfdcr dcread mfpmr mfspr lwax lhax lvxl lwaux lhaux mtdcrx dcblc sthx orc[.] sthepx e_crorc mtdcrux sthux or[.] e_cror mtdcr dci divdu[o][.] divwu[o][.] mtpmr mtspr dcbi dsn icbtls dcread stvxl divd[o][.] divw[o][.] mcrxr lbdx lswx lwbrx srw[.] srd[.] lhdx tlbsync e_srwi[.] lwdx lswi sync lfdepx lddx stbdx stswx stwbrx sthdx stbcx. stwdx stswi sthcx. stfdepx stddx dcba stvepxl tlbivax
Instruction Move From Device Control Register Data Cache Read Move From Performance Monitor Register Move From Special Purpose Register Load Word Algebraic Indexed Load Halfword Algebraic Indexed Load Vector Indexed LRU Load Word Algebraic with Update Indexed Load Halfword Algebraic with Update Indexed Move To Device Control Register Indexed Data Cache Block Lock Clear Store Halfword Indexed OR with Complement Store Halfword by External Process ID Indexed Condition Register OR with Complement Move To Device Control Register User-mode Indexed Store Halfword with Update Indexed OR Condition Register OR Move To Device Control Register Data Cache Invalidate Divide Doubleword Unsigned Divide Word Unsigned Move To Performance Monitor Register Move To Special Purpose Register Data Cache Block Invalidate Decorated Storage Notify Instruction Cache Block Touch and Lock Set Data Cache Read [Alternative Encoding] Store Vector Indexed LRU Divide Doubleword Divide Word Move To Condition Register from XER Load Byte with Decoration Indexed Load String Word Indexed Load Word Byte-Reverse Indexed Shift Right Word Shift Right Doubleword Load Halfword with Decoration Indexed TLB Synchronize Shift Right Word Immediate Load Word with Decoration Indexed Load String Word Immediate Synchronize Load Floating-Point Double by External Process ID Indexed Load Doubleword with Decoration Indexed Store Byte with Decoration Indexed Store String Word Indexed Store Word Byte-Reverse Indexed Store Halfword with Decoration Indexed Store Byte Conditional Indexed Store Word with Decoration Indexed Store String Word Immediate Store Halfword Conditional Indexed Store Floating-Point Double by External Process ID Indexed Store Doubleword with Decoration Indexed Data Cache Block Allocate Store Vector by External Process ID Indexed LRU TLB Invalidate Virtual Address Indexed
P P O O
P M SR P
SR P H SR SR O O P M H SR SR
SR SR H SR
P H
1188
Form
Cat1 B B 64 E.PD DS E.PD B 64 E E B B E.PD DS E B E.CI E B 64 E.PD E.CD B E.PD B B VLE VLE VLE VLE VLE VLE VLE VLE
Mnemonic lhbrx sraw[.] srad[.] evlddepx lfddx stvepx srawi[.] sradi[.] mbar tlbsx sthbrx extsh[.] evstddepx stfddx tlbre extsb[.] ici tlbwe icbi extsw[.] icbiep icread dcbz dcbzep mfocrf mtocrf se_lbz se_stb se_lhz se_sth se_lwz se_stw se_bc se_b[l]
Instruction Load Halfword Byte-Reverse Indexed Shift Right Algebraic Word Shift Right Algebraic Doubleword Vector Load Doubleword into Doubleword by External Process ID Indexed Load Floating Doubleword with Decoration Indexed Store Vector by External Process ID Indexed Shift Right Algebraic Word Immediate Shift Right Algebraic Doubleword Immediate Memory Barrier TLB Search Indexed Store Halfword Byte-Reverse Indexed Extend Sign Halfword Vector Store Doubleword into Doubleword by External Process ID Indexed Store Floating Doubleword with Decoration Indexed TLB Read Entry Extend Sign Byte Instruction Cache Invalidate TLB Write Entry Instruction Cache Block Invalidate Extend Sign Word Instruction Cache Block Invalidate by External PID Instruction Cache Read Data Cache Block set to Zero Data Cache Block set to Zero by External PID Move From One Condition Register Field Move To One Condition Register Field Load Byte and Zero Short Form Store Byte Short Form Load Halfword and Zero Short Form Store Halfword Short Form Load Word and Zero Short Form Store Word Short Form Branch Conditional Short Form Branch [and Link]
X X X EVX X X X XS X X X X EVX X X X X X X X X X X X XFX XFX SD4 SD4 SD4 SD4 SD4 SD4 BD8 BD8
1
See the key to the mode dependency and privilege column below and the key to the category column in Section 1.3.5 of Book I.
16-bit instructions, the Opcode column represents the 16-bit hexadecimal instruction encoding with the opcode and extended opcode in the corresponding fields in the instruction, and with 0s in bit positions which are not opcode bits; dashes are used following the opcode to indicate the form is a 16-bit instruction. For 32-bit instructions, the "Opcode" column represents the 32-bit hexadecimal instruction encoding with the opcode, extended opcode, and other fields with fixed values in the corresponding fields in the instruction, and with 0s in bit positions which are not opcode, extended opcode or fixed value bits."
2For
1189
1190
1191
1192
In several cases the Power ISA assumes that reserved fields in POWER instructions indeed contain zero. The cases include the following. bclr[l] and bcctr[l] assume that bits 19:20 in the POWER instructions contain zero. cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions contains zero. mtspr and mfspr assume that bits 16:20 in the POWER instructions contain zero. mtcrf and mfcr assume that bit 11 in the POWER instructions is contains zero. Synchronize assumes that bits 9:10 in the POWER instruction (dcs) contain zero. (This assumption provides compatibility for application programs, but not necessarily for operating system programs; see Section A.22.) mtmsr assumes that bit 15 in the POWER instruction contains zero.
1193
A.9 BH Field
Bits 19:20 of the Branch Conditional to Link Register and Branch Conditional to Count Register instructions are reserved in POWER but are defined as a branch hint (BH) field in Power ISA. Because these bits are hints, they may affect performance but do not affect the results of executing the instruction.
A.8 BO Field
POWER shows certain bits in the BO field used by Branch Conditional instructions as x. Although the POWER Architecture does not say how these bits are to be interpreted, they are in fact ignored by the processor.
1194
In Power ISA a Move Assist instruction may be interrupted by a system-caused interrupt, while in POWER the instruction cannot be thus interrupted.
If the instruction is executed in privileged state, a Hypervisor Emulation Assistance interrupt occurs if the SPR value is 0 or, for mfspr only, if the SPR value is 4, 5, or 6. In
1195
A.22 Synchronization
The Synchronize instruction (called dcs in POWER) and the isync instruction (called ics in POWER) cause more pervasive synchronization in Power ISA than in POWER. However, unlike dcs, Synchronize does not wait until data cache block writes caused by preceding instructions have been performed in main storage. Also, Synchronize has an L field while dcs does not, and some uses of the instruction by the operating system require L=2<S>. (The L field corresponds to reserved bits in dcs and hence is expected to be zero in POWER programs; see Section A.3.)
1196
A.29.2 Decrementer
The Power ISA Decrementer differs from the POWER Decrementer in the following respects. The Power ISA DEC decrements at the same rate that the TB increments, while the POWER DEC decrements every nanosecond (which is the same rate that the RTC increments). Not all bits of the POWER DEC need be implemented, while all bits of the Power ISA DEC must be implemented. The interrupt caused by the DEC has its own interrupt vector location in Power ISA, but is considered an External interrupt in POWER.
1197
(*) This instruction is privileged. Assembler Note It might be helpful to current software writers for the Assembler to flag the discontinued POWER instructions.
(*) This instruction is privileged. Note: Many of these instructions use the MQ register. The MQ is not defined in the Power ISA.
1198
the second column of the table: the remainder of the line gives the Power ISA mnemonic and the page on which the instruction is described, as well as the instruction names. POWER2 mnemonics that have not changed are not listed.
POWER2 Instruction Mnemonic Floating Convert Double to Integer fctiw[.] with Round Floating Convert Double to Integer fctiwz[.] with Round to Zero
Power ISA Instruction Floating Convert To Integer Word Floating Convert To Integer Word with round toward Zero
PRI 56 57 31 31 60 61 31 31
Differences between the l/stfdp[x] instructions and the POWER2 l/stfq[u][x] instructions include the following. The storage operand for the l/stfdp[x] instructions must be quadword aligned for optimal performance. The register pairs for the l/stfdp[x] instructions must be even-odd pairs, instead of any consecutive pair. The l/stfdp[x] instructions do not have update forms.
1199
A.32.5 Trace
The Trace interrupt vector location differs between the two architectures, and there are many other differences.
1200
1201
Category Base Server Embedded Alternate Time Base Cache Specification Decimal Floating-Point Decorated Storage Embedded.Cache Debug Embedded.Cache Initialization Embedded.Device Control Embedded.Enhanced Debug Embedded.External PID Embedded.Hypervisor Embedded.Hypervisor.LRAT Embedded.Little-Endian Embedded.Page Table Embedded.Performance Monitor Embedded.Processor Control Embedded Cache Locking Embedded Multi-Threading Embedded Multi-Threading.Thread Management Embedded.TLB Write Conditional External Control External Proxy Floating-Point Floating-Point.Record Legacy Move Assist Legacy Integer Multiply-Accumulate Load/Store Quadword Memory Coherence Move Assist Processor Compatibility Server.Performance Monitor Signal Processing Engine SPE.Embedded Float Scalar Double SPE.Embedded Float Scalar Single SPE.Embedded Float Vector Store Conditional Page Mobility Stream Strong Access Order Trace
Server Platform x x
Embedded Platform x x
x x
x x
x2 x x x
x x x x
1202
Server Platform + +1 x
Embedded Platform
1. If the Vector category is supported, Vector.Little-Endian is required on the Server platform. 2. Optional for the Server Platform. Figure 21. Platform Support Requirements (Sheet 2 of 2)
1203
1204
decimal 1 8 9 13 17 18 19 22 25 26 27 28 29 48 54 55 56 57 58 59 61 62 63 136 152 157 256 259 260-263 268 269 272-275 276-279 282 284 285 286 286 287
SPR1 spr5:9 spr0:4 00000 00001 00000 01000 00000 01001 00000 01101 00000 10001 00000 10010 00000 10011 00000 10110 00000 11001 00000 11010 00000 11011 00000 11100 00000 11101 00001 10000 00001 10110 00001 10111 00001 11000 00001 11001 00001 11010 00001 11011 00001 11101 00001 11110 00001 11111 00100 01000 00100 11000 00100 11101 01000 00000 01000 00011 01000 001xx 01000 01100 01000 01101 01000 100xx 01000 101xx 01000 11010 01000 11100 01000 11101 01000 11110 01000 11110 01000 11111
Register Name XER LR CTR AMR DSCR DSISR DAR DEC SDR1 SRR0 SRR1 CFAR AMR PID DECAR MCIVPR LPER LPERU CSRR0 CSRR1 DEAR ESR IVPR CTRL CTRL UAMOR VRSAVE SPRG3 SPRG[4-7] TB TBU SPRG[0-3] SPRG[4-7] EAR TBL TBU TBU40 PIR PVR
Privileged mtspr mfspr no no no no no no no no9 yes yes yes yes yes yes yes12 yes12 3 hypv3 hypv 13 yes13 yes 13 yes yes13 yes yes yes yes9 yes yes hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 hypv12 yes13 yes13 13 yes13 yes 12 hypv hypv12 no yes yes yes10 no no no no no no yes13 yes13 yes yes hypv4 hypv4 hypv4 hypv4 hypv hypv12 yes13 yes
Length (bits) 64 64 64 64 64 32 64 32 64 64 64 64 64 32 32 64 64 32 64 32 64 32 64 32 32 64 32 64 64 64 32 64 64 32 32 32 64 32 32
1205
decimal 304 304 305 306 306 307 307 308 308 309 309 310 310 311 312 312 313 313 314 314 315 315 316 317 318 318 319 319 336 336 337 338 338 339 339 340 341 342 343 344-347 348 349 349 350 368-371 372 373 378 379 380 381 382 383 400-415
Length Cat2 (bits) 64 S 32 E 64 S 32 S 32 E.HV 64 S 32 E.HV, (E;64) 64 S 32 E 64 S 32 E 32 S 32 E 32 E.HV 64 S 64 E 64 S 64 E 64 S 64 E 64 S 64 E 64 E 64 E 64 S 64 E 32 S 64 E 32 E 64 S 64 S 64 S 32 E.HV 32 S 32 E.HV 32 E 32 E.HV 32 E.HV.LRAT 32 E.HV.LRAT 32 E.HV 64 E.HV; 64 64 E.HV; 64 64 S 32 E.PT 64 E.HV 64 E; 64 64 E; 64 64 E.HV 32 E.HV 32 E.HV;EXP 64 E.HV 32 E.HV 32 E.HV 32 E
1206
decimal 432-435 436 437 438 439 440-441 442 443 444 445 446 447 512 526 527 528 529 530 531 532 533 570 571 572 574 575 604 605 624 625 626 627 628 630 631 688-691 702 768-783 784-799 896 898 924 925 926 927 944 947 948 979 1012 1013 1015 1015 1023
Length (bits) 32 32 64 64 64 32 32 32 32 32 64 64 32 64 32 32 32 32 32 32 32 64 32 64 64 32 64 64 32 32 64 32 32 32 32 32 32 64 64 64 32 32 32 32 32 32 32 32 32 32 64 64 32 32
Cat2 E.HV E.HV.LRAT E.MT E.MT E.MT E.HV E.HV E.HV E.HV E.HV E.MT E.HV SP ATB ATB SP SP SP E.PM E.PC E.PC E E E E.ED E.ED E E.ED E E E E E E E E EXP S.PM S.PM S B11 E.CD E.CD E.CD E.CD E E.PD E.PD E.CD E S S E S
1207
1208
1209
1210
2. Instructions for the POWER Architecture that have not been included in the Power ISA. These are listed in Section A.31, Discontinued Opcodes and Section A.33.1, Discontinued Opcodes. 3. Implementation-specific instructions used to conform to the Power ISA specification. 4. Any other implementation-dependent instructions that are not defined in the Power ISA.
1211
1212
The category abbreviations are shown on Section 1.3.5 of Book I. However, the categories Phased-In, Phased-Out, and floating-point Record are not listed in the opcode tables. The extended opcode tables show the extended opcode in decimal, the instruction mnemonic, the category, and the instruction format. These tables appear in order of primary opcode within three groups. The first group consists of the primary opcodes that have small extended opcode fields (2-4 bits), namely 30, 58, and 62. The second group consists of primary opcodes that have 11-bit extended opcode fields. The third group consists of primary opcodes that have 10-bit extended opcode fields. The tables for the second and third groups are rotated. In the extended opcode tables several special markings are used. A prime () following an instruction mnemonic denotes an additional cell, after the lowest-numbered one, used by the instruction. For example, subfc occupies cells 8 and 520 of primary opcode 31, with the former corresponding to OE=0 and the latter to OE=1. Similarly, sradi occupies cells 826 and 827, with the former corresponding to sh5=0 and the latter to sh5=1 (the 9-bit extended opcode 413, shown on page 96, excludes the sh5 bit). Two vertical bars (||) are used instead of primed mnemonics when an instruction occupies an entire column of a table. The instruction mnemonic is repeated in the last cell of the column. For primary opcode 31, an asterisk (*) in a cell that would otherwise be empty means that the cell is
1213
01 2 tdi 64 05 6
03 See primary opcode 0 extensions on page 1213 D Trap Doubleword Immediate Trap Word Immediate 07 See Table 9 and Table 10 Multiply Low Immediate 0B Subtract From Immediate Carrying
09 10 B 0D 14
Compare Logical Immediate D Compare Immediate 0F Add Immediate Carrying Add Immediate Carrying and Record addis Add Immediate D Add Immediate Shifted D B 13 Branch Conditional 12 19 System Call CR ops, Branch etc. XL See Table 12 on page 1227 I 16 23 17 Rotate Left Word Imm. then Mask Insert rlwnm Rotate Left Word Imm. then AND with Mask M Rotate Left Word then AND with Mask 1B OR Immediate xoris OR Immediate Shifted XOR Immediate B D XOR Immediate Shifted 31 1F AND Immediate AND Immediate Shifted FX Extended Ops See Table 4 on page 1216 See Table 12 on page 1227 35 23 Load Word and Zero Load Word and Zero with Update lbzu Load Byte and Zero B D Load Byte and Zero with Update 39 27 Store Word Store Word with Update stbu Store Byte B D Store Byte with Update 43 2B Load Half and Zero Load Half and Zero with Update lhau Load Half Algebraic B D Load Half Algebraic with Update 47 2F Store Half stmw Store Half with Update Load Multiple Word B D Store Multiple Word 51 33 Load Floating-Point Single lfdu Load Floating-Point Single with Update Load Floating-Point Double FP D Load Floating-Point Double with Update 55 37 Store Floating-Point Single stfdu Store Floating-Point Single with Update Store Floating-Point Double FP D Store Floating-Point Double with Update 59 3B Load Quadword See Table 5 on page 1216 FP Single See Table 6 on page 1216 & DFP Ops See Table 17 on page 1231
B 1A 27
1214
1215
Table 4: Extended opcodes for primary opcode 30 (instruction bits 27:30) 00 01 10 11 3 1 2 0 rldicr rldicl rldicr rldicl 00 64 64 MD MD MD MD 7 5 6 4 rldimi rldic rldimi rldic 01 64 64 MD MD MD MD 9 8 rldcr rldcl 10 64 64 MDS MDS 11
Table 7: Extended opcodes for primary opcode 61 (instruction bits 30:31) 0 1 0 stfdp 0 FP DS
Table 5: Extended opcodes for primary opcode 57 (instruction bits 30:31) 0 1 0 lfdp 0 FP DS 1
Table 8: Extended opcodes for primary opcode 62 (instruction bits 30:31) 0 1 1 0 std stdu 0 64 64 DS DS 2 stq 1 LSQ DS
Table 6: Extended opcodes for primary opcode 58 (instruction bits 30:31) 0 1 1 0 ld ldu 0 64 64 DS DS 2 lwa 1 64 DS
1216
000001
000010
000011
000100
000101
000110
vcmpequb
000111
001000
001001
001010
001011
001100
001101
001110
001111
V 70 V V
VC VC 134
vcmpequh vcmpequw
8 vmuloub V VX 72 vmulouh V VX
10 vaddfp V VX 74 vsubfp V VX
VC 198 vcmpeqfp V VC 264 vmulosb V VX 328 vmulosh V VX 266 vrefp VX 330 vrsqrtefp V VX 394 vexptefp V VX 458 vlogefp V VX 522 vrfin V VX 586 vrfiz V VX 650 vrfip V VX 714 vrfim V VX 778 vcfux V VX 842 vcfsx V VX 906 vctuxs V VX 970 vctsxs V VX 1034 vmaxfp V VX 1098 vminfp V VX V
384
VX
512
768
260 vslb VX 324 vslh V VX 388 vslw V VX 452 vsl V VX 516 vsrb V VX 580 vsrh V VX 644 vsrw V VX 708 vsr V VX 772 vsrab V VX 836 vsrah V VX 900 vsraw V VX V
1024
10001 vsubuhm
VX 1088
1408
VX
1028 vand VX 1092 vandc V VX 1156 vor V VX 1220 vxor V VX 1284 vnor V VX
VC 710 vcmpgtfp V VC 774 vcmpgtsb V VC 838 vcmpgtsh V VC 902 vcmpgtsw V VC 966 vcmpbfp V VC 1030
vcmpequb. vcmpequh.
14 vpkuhum V VX 78 vpkuwum V VX 142 vpkuhus V VX 206 vpkuwus V VX 270 vpkshus V VX 334 vpkswus V VX 398 vpkshss V VX 462 vpkswss V VX 526 vupkhsb V VX 590 vupkhsh V VX 654 vupklsb V VX 718 vupklsh V VX 782 vpkpx V VX 846 vupkhpx V VX
V V V V
vcmpequw. vcmpeqfp.
10111
vcmpgefp.
1478
vcmpgtub. vcmpgtuh.
vsum4ubs
1544
vcmpgtuw.
1792
VX
V V V V
1800 vsum4sbs V VX
vcmpgtsw. vcmpbfp.
vsumsws
1928
VX
1217
16 80
17 81
machhwu long LMA XO LMA XO machhw machhw. LMA XO LMA XO long long LMA XO LMA XO machhws long LMA XO LMA XO
24 88
24 89
00001 mulhhw
LMA
mulhhw. X LMA X
92
93
220
220
272 336
273 337
macchwu long LMA XO LMA XO macchw macchw. LMA XO LMA XO long long LMA XO LMA XO macchws long LMA XO LMA XO long long LMA XO LMA XO nmacchw long LMA XO LMA XO
00101 mulchw
LMA
mulchw. X LMA X
348
349
476
477
784 848
784 849
maclhwu maclhwu. LMA XO LMA XO maclhw maclhw. LMA XO LMA XO long long LMA XO LMA XO maclhws maclhws. LMA XO LMA XO long long LMA XO LMA XO machhw' long LMA XO LMA XO long long LMA XO LMA XO long long LMA XO LMA XO long long LMA XO LMA XO macchw' long LMA XO LMA XO long long LMA XO LMA XO long long LMA XO LMA XO
01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
860
861
988
989
1048 1112
1049 1113
1116
1117
1176
1177
1244
1245
1372
1373
1500
1501
long long LMA XO LMA XO maclhw maclhw' LMA XO LMA XO long long LMA XO LMA XO long long LMA XO LMA XO
1884
1885
2012
2013
1218
101011
vperm
101100
vsldoi
101101
101110
101111
32 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
32 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
34 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
36 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
37 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
38 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
39 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
40 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
41 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
42 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
43 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
44 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
vmaddfp vnmsubfp
46 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
47 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
VA V
VA V
VA
VA V
VA V
VA V
VA V
VA V
VA V
VA V
VA V
VA
VA V
VA
00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
vsel
vperm
vsdoi
vmaddfp vnmsubfp
1219
1220
Table 10:(Left) Extended opcodes for primary opcode 4 [Category: SP.*] (instruction bits 21:31)
000000 00000 00001 00010 00011 00100 00101 00110 00111 01000 evaddw
SP EVX
000001
000010
000011
000100
000101
000110
000111
001000
001001
001010
001011
001100
001101
001110
001111
512
evaddiw SP EVX
514
evsubfw SP EVX
516
evsubifw SP EVX
518
evabs evneg evextsb evextsh evrndw evcntlzw evcntlsw brinc SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX
520
521
522
523
524
525
526
527
01001 01010 evfsadd 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
long long long long evmra SP EVX SP EVX SP EVX SP EVX SP EVX long long SP EVX SP EVX long long SP EVX SP EVX long long SP EVX SP EVX long long SP EVX SP EVX evmhessf SP EVX evfssub sp.fv EVX sp.fv EVX efsadd efssub sp.fs EVX sp.fs EVX
evfsabs evfsnabs evfsneg sp.fv EVX sp.fv EVX sp.fv EVX efsabs efsnabs efsneg sp.fs EVX sp.fs EVX sp.fs EVX
646 710
evfsmul evfsdiv sp.fv EVX sp.fv EVX efsmul efsdiv sp.fs EVX sp.fs EVX long long SP EVX SP EVX
long evfscmplt long sp.fv EVX sp.fv EVX sp.fv EVX efscmpgt efscmplt efscmpeq efscfd sp.fs EVX sp.fs EVX sp.fs EVX sp.fd EVX long long long long SP EVX SP EVX SP EVX SP EVX
719 783
evlddx evldd evldwx evldw evldhx evldh SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX
770
771
1027
evmhossf long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1031 1095
1032 1096
1033
long long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1035
1036 1100
1037 1101
1039 1103
1218
1219 1283
1220 1284
evdivws evdivwu long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX
1222
1223 1287
1226
1227 1291
1285
long long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1292
1293
long SP EVX
1295
1411
1412
1413
long long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1415
1419
1420
1421
long SP EVX
1423
1221
010001
010010
010011
010100
010101
010110
010111
011000
011001
011010
011011
011100
011101
011110
011111
529
530
534
535
536
537
evorc SP EVX
539
evnand SP EVX
542
658 722
659 723
666 730
evfststgt evfststlt evfststeq sp.fv EVX sp.fv EVX sp.fv EVX efststgt efststlt efststeq sp.fs EVX sp.fs EVX sp.fs EVX long long SP EVX SP EVX
670 734
01100 evlwhex 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
evlwhoux evlwhou evlwhosx evlwhos long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX
793
evmwssf SP EVX
1107
1112
1113
long SP EVX
1115
long SP EVX
1363
1368
1369
long SP EVX
1371
long SP EVX
1491
1496
1497
long SP EVX
1499
1222
100001
100010
100011
100100
100101
100110
100111
101000
101001
101010
101011
101100
101101
101110
101111
544
545
546
547
548
evslwi SP EVX
550
evrlw evsplati evrlwi evsplatfi long long long long SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX SP EVX
552
553
554
555
556
557
558
559
736 800
737 801
738 802
739 803
740 804
741 805
742
efdmul efddiv efdctuidz efdctsidz efdcmpgt efdcmplt efdcmpeq efdcfs sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX
744
745
746
747
748
749
750
751
01100 evstddx 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
long SP EVX
1059
long long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1063 1127
1064 1128
1065
long long long SP EVX SP EVX SP EVX long long SP EVX SP EVX
1067
1068 1132
1069 1133
1071 1135
1320
1321
1323
1324
1325
long SP EVX
1327
1448
1449
1451
1452
1453
long SP EVX
1455
1223
110001
110010
110011
110100
110101
110110
110111
111000
111001
111010
111011
111100
111101
111110
111111
560
561
562
563
564
evsel evsel evsel evsel evsel evsel evsel evsel SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS SP EVS
632
633
634
635
636
637
638
639
efdcfui efdcfsi efdcfuf efdcfsf efdctui efdctsi efdctuf efdctsf efdctuiz sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX sp.fdEVX
752 816
753 817
754
755
756 820
757 821
758
759
760 824
efdctsiz sp.fdEVX
762
efdtstgt efdtstlt efdtsteq sp.fdEVX sp.fdEVX sp.fdEVX evstwwox evstwwo SP EVX SP EVX
764 828
765 829
766
825
01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
long SP EVX
1139
1144
1145
long SP EVX
1147
1224
00001
00010
00011
00100
00101
00110
00111
01000
01001
01010
01011
01100
01101
01110
01111
33 crnor B XL
129 crandc B XL
dnh E.EDXFX
198
lxvdsx
332
VSX XX
1225
16 00000 bclr B XL
00001 00010 00011 00100 00101 00110 00111 01000 01001 01010 01011 01100 01101 01110 01111 10000 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
51 rfci XL
150 isync B XL
274 hrfid S XL
402 doze S XL 434 nap S XL 466 sleep S XL 498 rvwinkle S XL 528 bcctr B XL
1226
00001
00010
00011
00100
00101
00110
00111
01000
01001
01010
01011
01100
01101
01110
01111
0 cmp B X 32 cmpl B X
B 33 Resd VLE
4 tw
64
68 td
193 Resd VLE 225 Resd VLE 257 Resd VLE 289 Resd VLE
259 mfdcrx E.DC X 291 mfdcrux E.DC X 323 mfdcr E.DC XFX
7 lvebx V X 39 lvehx V X 71 lvewx V X 103 lvx V X 134 135 dcbtstls stvebx ECL X V X 166 167 dcbtls stvehx ECL X V X 199 stvewx V X 230 231 icblc stvx ECL X V X 262 263 lvepxl Resd E.PD X AP 295
lvepx E.PD X
6 lvsl V X 38 lvsr V X
8 subfc B XO 40 subf B XO
9 mulhdu 64 XO
10 addc B XO
11 mulhwu B XO
dlmzb LMV X
138 adde B XO
233 mulld 64 XO
msgsnd E.PC X
206
235 mullw B XO
msgclr E.PC X
238
dcread E.CD X
326
lxvdsx VSX XX
332
334
{366} mftmr 393 395 divdeu divweu 64 XO B XO 425 427 divde divwe 64 XO B XO 457 459 divdu divwu 64 XO B XO 489 491 divd divw 64 XO B XO 521 522 523 mulhdu addc mulhwu 64XO B XO B XO 398 mvptas
390
419 451
dci E.CI X
454
DS X 515 lbdx DS X 547 lhdx DS X 579 lwdx DS X 611 lddx DS X 643 stbdx DS X 675 sthdx DS X 707 stwdx DS X 739 stddx DS X
dsn
486 Resd AP
462
585 586 587 588 lxsdx mulhd addg6s mulhw 64 XO BCDA XO B XO VSX XX 616 neg B XO 648 subfe B XO
650 adde B XO
775
stxsdx
716
780
VSX XX
802
803 lfddx DS X
807
844
VSX XX
stxvw4x
908
VSX XX
divde
937
divwe
939
icread E.CD X
1227
10001
10010
10011
10100
10101
10110
10111
11000
11001
11010
11011
11100
11101
11110
11111
lwepx E.PD X
16 Resd VLE
18 tlbilx E X
19 mfcr B XFX
(82) mtsrd X (114) mtsrdin X 146 mtmsr B X 178 mtmsrd S X 210 mtsr S X 242 mtsrin S X 274 tlbiel S X 306 tlbie S X
83 mfmsr B X
21 ldx 64 X 53 ldux 64 X
308 Resd 339 mfspr B XFX 371 mftb S XFX 341 lwax 64 X 373 lwaux 64 X
214 stdcx. 64 X 246 dcbtst B X 278 dcbt B X 310 eciwx EC X 342 Resd AP 374 Resd AP
467 mtspr B XFX 498 slbia S X (530) no-op (562) no-op (594) no-op (626) no-op (658) no-op (690) no-op (722) no-op (754) no-op 786 tlbivax E X (818) rac X 850 851 tlbsrx. slbmfev E.TWC X S X 659 mfsrin S X 660 stdbrx 64 X 595 mfsr S X
597 lswi MA X
661 stswx MA X
725 stswi MA X
915 slbmfee S X
979 slbfee. S X
1010 Resd
789 lwzcix S X 821 lhzcix S X 853 lbzcix S X 885 ldcix S X 917 stwcix S X 949 sthcix S X 981 stbcix S X 1013 stdcix S X
23 lwzx B X 55 lwzux B X 87 lbzx B X 119 lbzux B X 151 stwx B X 183 stwux B X 215 stbx B X 247 stbux B X 279 lhzx B X 311 lhzux B X 343 lhax B X 375 lhaux B X 407 sthx B X 439 sthux B X 471 lmw* All D 503 stmw* All D 535 lfsx FP X 567 lfsux FP X 599 lfdx FP X 631 lfdux FP X 663 stfsx FP X 695 stfsux FP X 727 stfdx FP X 759 stfdux FP X 791 lfdpx FP X 823 Resd 855 lfiwax FP X 887 lfiwzx FP X 919 stfdpx FP X 951 Resd AP 983 stfiwx FP X
26 cntlzw B X 58 cntlzd 64 X
27 sld 64 X
popcntb prtyw
122
B X 154 B X 186
prtyd
64
28 29 30 ldepx and rldicl* B X E.PD X 64 MD 62 60 See andc Table 13 B X 94 rldicr* 64 MD 124 126 nor rldicr* B X 64 MD 157 158 stdepx rldic* E.PD;64 X 64 MD 190 rldic* 64 MD 222 rldimi* 64 MD 252 254 bpermd rldimi* 64 X 64 MD 286 284 285 evlddepx rldcl* eqv B X E.PD evx 64 MDS 316 318 xor rldcr* B X 64 MDS 350 * 382 * 412 orc
evstddepx
31
95
127
159
286
319
popcntw
378
X 413 X E.PD evx 444 or B X 476 nand B X 508 cmpb B X 539 srd 64 X B
See Table 15
447 andis.* B D
popcntd
506
lfdepx E.PD X
607
stfdepx E.PD X
735
827 sradi 64 XS
854
918 sthbrx B X
991
1023
1228
62
62
854
mbar X
854
159 159 00100 rlwimi* stwepx B M E.PD X 191 00101 rlwinm* B M 223
00110 00111 01000 01001 01010 01011 01100
stbepx E.PD X
255 rlwnm* B M 287 287 lhepx ori* B D E.PD X 319 319 dcbtep oris* B D E.PD X 351 xori* B D 383 xoris* B D 415 415 sthepx andi.* B D E.PD X
1229
15 || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || ||
00001 00010
47 *
175 * 207 *
335 cmpli* B D 367 cmpi* B D 399 addic* B D 431 addic.* B D 463 addi* B D 495 addis* B D
isel
1230
2 dadd DFP X 34 dmul DFP X 66 dscli DFP Z22 98 dscri DFP Z 130 dcmpo DFP X 162 dtstex DFP X 194 dtstdc DFP Z23 226 dtstdg DFP Z23 258 dctdps DFP X 290 dctfix DFP X 322 ddedpd DFP X 354 dxex DFP X
227 drintn DFP Z23 259 dqua DFP Z 291 drrnd DFP Z 323 dquai DFP Z 355 drintx DFP Z23
514 dsub DFP X 546 ddiv DFP X 578 dscli DFP Z22 610 dscri DFP Z22 642 dcmpu DFP X 674 dtstsf DFP X 706 dtstdc DFP Z23 738 dtstdg DFP Z23 770 drsp DFP X
515 dqua DFP Z 547 drrnd DFP Z 579 dquai DFP Z 611 drintx DFP Z23
739 drintn DFP Z23 771 dqua DFP Z 803 drrnd DFP Z 834 835 denbcd dquai DFP X DFP Z 866 867 diex drintx DFP X DFP Z23
846 fcfids FP X
1231
18 fdivs FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fdivs
20 fsubs FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fsubs
21 fadds FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fadds
22 fsqrts FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fsqrts
24 fres FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fres
25 fmuls FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fmuls
26 frsqrtes FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || frsqrtes
28 fmsubs FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fmsub
1232
XX XX XX
VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
128 xsadddp 160 xssubdp 192 xsmuldp 224 xsdivdp 256 xvdivsp 288 xvdivsp 320 xvdivsp 352 xvdivsp 384 xvdivdp 416 xvdivdp 448 xvdivdp 480 xvdivdp
XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX
132 xsmaddadp 164 xsmaddmdp 196 xsmsubadp 228 xsmsubmdp 260 xvmaddasp 292 xvmaddmsp 324 xvmsubasp 356 xvmsubmsp 388 xvmaddadp 420 xvmaddmdp 452 xvmsubadp 484 xvmsubmdp
XX VSX XX VSX XX VSX XX XX VSX XX VSX XX VSX XX XX VSX XX VSX XX XX VSX VSX VSX VSX
XX VSX XX VSX XX
XX XX
XX XX XX
XX XX XX
520 xxland 552 xxlandc 584 xxlor 616 xxlxor 648 xxlnor
XX XX XX XX XX
644 xsnmaddadp 676 xsnmaddmdp 708 xsnmsubadp 740 xsnmsubmdp 772 xvnmaddasp 804 xvnmaddmsp 836 xvnmsubasp 868 xvnmsubmsp 900 xvnmaddadp 932 xvnmaddmdp 964 xvnmsubadp 996 xvnmsubmdp
XX VSX XX XX XX XX XX XX XX XX XX XX XX
XX XX XX
XX XX XX
1233
VSX
24 xxsel
XA
146 144 xscvdpuxws xsrdpi VSX XX VSX 176 178 00101 xscvdpsxws xsrdpiz VSX XX VSX 210 00110 xsrdpip VSX 242 00111 xsrdpim VSX 272 274 01000 xvcvspuxws xvrspi VSX XX VSX 304 306 01001 xvcvspsxws xvrspiz VSX XX VSX 338 336 01010 xvcvuxwsp xvrspip VSX XX VSX 368 370 01011 xvcvsxwsp xvrspim VSX XX VSX 402 400 01100 xvcvdpuxws xvrdpi VSX XX VSX 434 432 01101 xvcvdpsxws xvrdpiz VSX XX VSX 464 466 01110 xvcvuxwdp xvrdpip VSX XX VSX 498 496 01111 xvcvsxwdp xvrdpim VSX XX VSX 530 10000 xscvdpsp VSX
00100 10001 10010 10011 10100 10101 10110 10111 11000 11001 11010 11011 11100 11101 11110 11111
150 xssqrtdp*
XX
214 xsrdpic*
XX XX
VSX
278 xvsqrtsp*
XX
342 xvrspic*
XX XX
406 xvsqrtdp*
XX
470 xvrdpic*
XX XX
656 xscvdpuxds VSX 688 xscvdpsxds VSX 720 xscvuxddp VSX 752 xscvsxddp VSX 784 xvcvspuxds VSX 816 xvcvspsxds VSX 848 xvcvuxdsp VSX 880 xvcvsxdsp VSX 912 xvcvdpuxds VSX 944 xvcvdpsxds VSX 976 xvcvuxddp VSX 1008 xvcvsxddp VSX
XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX XX VSX
658 xscvspdp 690 xsabsdp 722 xsnabsdp 754 xsnegdp 786 xvcvdpsp 818 xvabssp 850 xvnabssp 882 xvnegsp 914 xvcvspdp 946 xvabsdp 978 xvnabsdp 1010 xvnegdp
XX XX XX XX XX XX XX XX XX XX XX XX
1234
2 daddq DFP X 34 dmulq DFP X 66 dscliq DFP Z22 98 dscriq DFP Z 130 dcmpoq DFP X 162 dtstexq DFP X 194 dtstdcq DFP Z22 226 dtstdgq DFP Z22 258 dctqpq DFP X 290 dctfixq DFP X 322 ddedpdq DFP X 354 dxexq DFP X
3 dquaq DFP Z 35 drrndq DFP Z23 67 dquaiq DFP Z 99 drintxq DFP Z23
38 mtfsb1 FP X 70 mtfsb0 FP X
12 frsp FP X
14 fctiw FP X
15 fctiwz FP X
134 mtfsfi FP X
136 fnabs FP X
142 fctiwu FP X
143 fctiwuz FP X
227 drintnq DFP Z23 259 dqua DFP Z 291 drrnd DFP Z 323 dquai DFP Z 355 drintx DFP Z23
264 fabs FP X
392
frin friz
FP X 424 FP X 456
frip
514 dsubq DFP X 546 ddivq DFP X 578 dscli DFP Z22 610 dscri DFP Z22 642 dcmpuq DFP X 674 dtstsfq DFP X 706 dtstdc DFP Z23 738 dtstdg DFP Z23 770 drdpq DFP X 802 dcffixq DFP X 834 denbcdq DFP X 866 diexq DFP X
483 drintn DFP Z23 515 dqua DFP Z 547 drrnd DFP Z 579 dquai DFP Z 611 drintx DFP Z23
FP X 488
frim
FP
583 mffs FP X
711 mtfsf FP XFL 739 drintn DFP Z23 771 dqua DFP Z 803 drrnd DFP Z 835 dquai DFP Z 867 drintx DFP Z23
815 fctidz FP X
943 fctiduz FP X
1235
18 fdiv FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fdiv
20 fsub FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fsub
21 fadd FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fadd
22 fsqrt FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fsqrt
23 fsel FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fsel
24 fre FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fre
25 fmul FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fmul
26 frsqrte FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || frsqrte
28 fmsub FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fmsub
29 fmadd FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fmadd
30 fnmsub FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fnmsub
31 fnmadd FP A || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || || fnmadd
1236
Mnemonic bpermd cntlzd[.] divd[o][.] divde[o][.] divdeu[o][.] divdu[o][.] extsw[.] ld ldarx ldbrx ldu ldux ldx lwa lwaux lwax mulhd[.] mulhdu[.] mulld[o][.] prtyd rldcl[.] rldcr[.] rldic[.] rldicl[.] rldicr[.] rldimi[.] sld[.] srad[.] sradi[.] srd[.] std stdbrx stdcx. stdu stdux stdx td tdi add[o][.] addc[o][.] adde[o][.] addi addic addic. addis
Instruction Bit Permute Doubleword Count Leading Zeros Doubleword Divide Doubleword Divide Doubleword Extended Divide Doubleword Extended Unsigned Divide Doubleword Unsigned Extend Sign Word Load Doubleword Load Doubleword And Reserve Indexed Load Doubleword Byte-Reverse Indexed Load Doubleword with Update Load Doubleword with Update Indexed Load Doubleword Indexed Load Word Algebraic Load Word Algebraic with Update Indexed Load Word Algebraic Indexed Multiply High Doubleword Multiply High Doubleword Unsigned Multiply Low Doubleword Parity Doubleword Rotate Left Doubleword then Clear Left Rotate Left Doubleword then Clear Right Rotate Left Doubleword Immediate then Clear Rotate Left Doubleword Immediate then Clear Left Rotate Left Doubleword Immediate then Clear Right Rotate Left Doubleword Immediate then Mask Insert Shift Left Doubleword Shift Right Algebraic Doubleword Shift Right Algebraic Doubleword Immediate Shift Right Doubleword Store Doubleword Store Doubleword Byte-Reverse Indexed Store Doubleword Conditional Indexed Store Doubleword with Update Store Doubleword with Update Indexed Store Doubleword Indexed Trap Doubleword Trap Doubleword Immediate Add Add Carrying Add Extended Add Immediate Add Immediate Carrying Add Immediate Carrying and Record Add Immediate Shifted
X X XO XO XO XO X DS X X DS X X DS X X XO XO XO X MDS MDS MD MD MD MD X X XS X DS X X DS X X X D XO XO XO D D D D
SR SR SR SR SR SR
SR SR SR SR SR SR SR SR SR SR SR SR SR
SR SR SR SR SR
1237
Form
Page 65 66 80 81 78 78 36 36 37 37 74 82 74 75 75 82 38 39 39 38 39 38 39 38 686 686 683 684 686 68 69 69 68 81 82 82 676 77 688 689 45 45 45 46 47 690 47 47 47 55 46 46 46 46 57 689 55 48 48 48 48
Cat1 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
Mnemonic addme[o][.] addze[o][.] and[.] andc[.] andi. andis. b[l][a] bc[l][a] bcctr[l] bclr[l] cmp cmpb cmpi cmpl cmpli cntlzw[.] crand crandc creqv crnand crnor cror crorc crxor dcbf dcbst dcbt dcbtst dcbz divw[o][.] divwe[o][.] divweu[o][.] divwu[o][.] eqv[.] extsb[.] extsh[.] icbi isel isync lbarx lbz lbzu lbzux lbzx lha lharx lhau lhaux lhax lhbrx lhz lhzu lhzux lhzx lmw lwarx lwbrx lwz lwzu lwzux lwzx
Instruction Add to Minus One Extended Add to Zero Extended AND AND with Complement AND Immediate AND Immediate Shifted Branch Branch Conditional Branch Conditional to Count Register Branch Conditional to Link Register Compare Compare Bytes Compare Immediate Compare Logical Compare Logical Immediate Count Leading Zeros Word Condition Register AND Condition Register AND with Complement Condition Register Equivalent Condition Register NAND Condition Register NOR Condition Register OR Condition Register OR with Complement Condition Register XOR Data Cache Block Flush Data Cache Block Store Data Cache Block Touch Data Cache Block Touch for Store Data Cache Block set to Zero Divide Word Divide Word Extended Divide Word Extended Unsigned Divide Word Unsigned Equivalent Extend Sign Byte Extend Sign Halfword Instruction Cache Block Invalidate Integer Select Instruction Synchronize Load Byte and Reserve Indexed Load Byte and Zero Load Byte and Zero with Update Load Byte and Zero with Update Indexed Load Byte and Zero Indexed Load Halfword Algebraic Load Halfword and Reserve Indexed Load Halfword Algebraic with Update Load Halfword Algebraic with Update Indexed Load Halfword Algebraic Indexed Load Halfword Byte-Reverse Indexed Load Halfword and Zero Load Halfword and Zero with Update Load Halfword and Zero with Update Indexed Load Halfword and Zero Indexed Load Multiple Word Load Word And Reserve Indexed Load Word Byte-Reverse Indexed Load Word and Zero Load Word and Zero with Update Load Word and Zero with Update Indexed Load Word and Zero Indexed
XO XO X X D D I B XL XL X X D X D X XL XL XL XL XL XL XL XL X X X X X XO XO XO XO X X X X A XL X D D X X D X D X X X D D X X D X X D D X X
31 234 SR 31 202 SR 31 28 SR 31 60 SR 28 SR 29 SR 18 16 CT 19 528 CT 19 16 CT 31 0 31 508 11 31 32 10 31 26 SR 19 257 19 129 19 289 19 225 19 33 19 449 19 417 19 193 31 86 31 54 31 278 31 246 31 1014 31 491 SR 31 427 SR 31 395 SR 31 459 SR 31 284 SR 31 954 SR 31 922 SR 31 982 31 15 19 150 31 52 34 35 31 119 31 87 42 31 116 43 31 375 31 343 31 790 40 41 31 311 31 279 46 31 20 31 534 32 33 31 55 31 23
1238
Form
Page
Cat1 B B B B B B B B B B B B B B B B B B B B B B B B B B B
Mnemonic mcrf mfcr mfmsr mfocrf mfspr mtcrf mtocrf mtspr mulhw[.] mulhwu[.] mulli mullw[o][.] nand[.] neg[o][.] nor[.] or[.] orc[.] ori oris popcntb popcntd popcntw prtyw rlwimi[.] rlwinm[.] rlwnm[.] sc
Instruction Move Condition Register Field Move From Condition Register Move From Machine State Register Move From One Condition Register Field Move From Special Purpose Register Move To Condition Register Fields Move To One Condition Register Field Move To Special Purpose Register Multiply High Word Multiply High Word Unsigned Multiply Low Immediate Multiply Low Word NAND Negate NOR OR OR with Complement OR Immediate OR Immediate Shifted Population Count Bytes Population Count Doubleword Population Count Word Parity Word Rotate Left Word Immediate then Mask Insert Rotate Left Word Immediate then AND with Mask Rotate Left Word then AND with Mask System Call
X X X X D X D X X D X X D X X D D X X D X X XO XO XO D XO XO X
31 31 31 31 38 31 39 31 31 44 31 31 45 31 31 47 36 31 31 37 31 31 31 31 31 8 31 31 31
24 792 824 536 694 247 215 918 726 439 407 662 150 183 151 40 8 136 232 200 598
39 102 P 759, 901 103 O 101, 704 102 103 O 100 SR 67 SR 67 67 SR 67 SR 80 SR 66 SR 81 SR 80 SR 81 78 79 83 85 83 84 SR 89 SR 87 SR 88 40, 738, 886 SR 93 SR 94 SR 94 SR 93 51 691 51 51 51 52 55 692 52 52 52 57 53 55 693 53 53 53 SR 63 SR 64 SR 65 SR 64 SR 65 SR 66 696
B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
slw[.] sraw[.] srawi[.] srw[.] stb stbcx. stbu stbux stbx sth sthbrx sthcx. sthu sthux sthx stmw stw stwbrx stwcx. stwu stwux stwx subf[o][.] subfc[o][.] subfe[o][.] subfic subfme[o][.] subfze[o][.] sync
Shift Left Word Shift Right Algebraic Word Shift Right Algebraic Word Immediate Shift Right Word Store Byte Store Byte Conditional Indexed Store Byte with Update Store Byte with Update Indexed Store Byte Indexed Store Halfword Store Halfword Byte-Reverse Indexed Store Halfword Conditional Indexed Store Halfword with Update Store Halfword with Update Indexed Store Halfword Indexed Store Multiple Word Store Word Store Word Byte-Reverse Indexed Store Word Conditional Indexed Store Word with Update Store Word with Update Indexed Store Word Indexed Subtract From Subtract From Carrying Subtract From Extended Subtract From Immediate Carrying Subtract From Minus One Extended Subtract From Zero Extended Synchronize
1239
Form
Page
Cat1 B
Instruction
X D X D D XO X X X X X X X X X X X X X X X X X X X X X X X X Z23 Z23 Z23 Z23 X Z23 Z23 Z23 Z23 Z23 Z23 X Z22 Z23 Z22 Z22 X X Z23 Z22 Z23 Z22 X X X X X X
31 3 31 26 27 31 31 31 59 63 59 63 59 63 59 63 59 59 63 63 59 63 59 63 59 63 59 63 59 63 59 59 63 63 63 59 63 59 63 59 63 59 59 63 59 63 59 63 59 63 59 63 59 63 59 63 59 63
4 316 74 314 282 2 2 802 802 130 130 642 642 258 290 290 258 322 322 546 546 834 834 866 866 34 34 3 67 67 3 770 227 227 99 99 35 35 770 66 66 98 98 514 514 194 194 226 226 162 162 674 674 354 354
H 803, 987, 905 76 76 SR 80 79 79 H 97 H 97 H 97 173 173 195 195 179 179 178 179 193 195 195 193 197 197 176 176 197 197 198 198 175 175 184 183 183 184 194 191 191 189 189 186 186 194 200 200 200 200 173 173 180 180 180 180 181 181 182 182 198 198
B B B B B BCDA BCDA BCDA DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP
tw twi xor[.] xori xoris addg6s cbcdtd cdtbcd dadd[.] daddq[.] dcffix[.] dcffixq[.] dcmpo dcmpoq dcmpu dcmpuq dctdp[.] dctfix[.] dctfixq[.] dctqpq[.] ddedpd[.] ddedpdq[.] ddiv[.] ddivq[.] denbcd[.] denbcdq[.] diex[.] diexq[.] dmul[.] dmulq[.] dqua[.] dquai[.] dquaiq[.] dquaq[.] drdpq[.] drintn[.] drintnq[.] drintx[.] drintxq[.] drrnd[.] drrndq[.] drsp[.] dscli[.] dscliq[.] dscri[.] dscriq[.] dsub[.] dsubq[.] dtstdc dtstdcq dtstdg dtstdgq dtstex dtstexq dtstsf dtstsfq dxex[.] dxexq[.]
Trap Word Trap Word Immediate XOR XOR Immediate XOR Immediate Shifted Add and Generate Sixes Convert Binary Coded Decimal to Declets Convert Declets To Binary Coded Decimal DFP Add DFP Add Quad DFP Convert From Fixed DFP Convert From Fixed Quad DFP Compare Ordered DFP Compare Ordered Quad DFP Compare Unordered DFP Compare Unordered Quad DFP Convert To DFP Long DFP Convert To Fixed DFP Convert To Fixed Quad DFP Convert To DFP Extended DFP Decode DPD To BCD DFP Decode DPD To BCD Quad DFP Divide DFP Divide Quad DFP Encode BCD To DPD DFP Encode BCD To DPD Quad DFP Insert Biased Exponent DFP Insert Biased Exponent Quad DFP Multiply DFP Multiply Quad DFP Quantize DFP Quantize Immediate DFP Quantize Immediate Quad DFP Quantize Quad DFP Round To DFP Long DFP Round To FP Integer Without Inexact DFP Round To FP Integer Without Inexact Quad DFP Round To FP Integer With Inexact DFP Round To FP Integer With Inexact Quad DFP Reround DFP Reround Quad DFP Round To DFP Short DFP Shift Significand Left Immediate DFP Shift Significand Left Immediate Quad DFP Shift Significand Right Immediate DFP Shift Significand Right Immediate Quad DFP Subtract DFP Subtract Quad DFP Test Data Class DFP Test Data Class Quad DFP Test Data Group DFP Test Data Group Quad DFP Test Exponent DFP Test Exponent Quad DFP Test Significance DFP Test Significance Quad DFP Extract Biased Exponent DFP Extract Biased Exponent Quad
1240
Form
Page 710 708 708 708 708 708 709 709 709 709 709 683 965 676 698 104 901 887 887 888 980 978, 903 985, 904 982, 904 987, 905 902 903 1086 1086 1087 1083 1083 901 104 901 900 104 900 1071 888 889 889 1078 1078 910 909 909 912 913 915 915 913 905
Cat1 DS DS DS DS DS DS DS DS DS DS DS E E E E E E E E E E E E E E E E E.CD E.CD E.CD E.CI E.CI E.DC E.DC E.DC E.DC E.DC E.DC E.ED E.ED E.HV E.HV E.PC E.PC E.PD E.PD E.PD E.PD E.PD E.PD E.PD E.PD E.PD
Mnemonic dsn lbdx lddx lfddx lhdx lwdx stbdx stddx stfddx sthdx stwdx dcba dcbi icbt mbar mcrxr mtmsr rfci rfi rfmci tlbilx tlbivax tlbre tlbsx tlbwe wrtee wrteei dcread dcread icread dci ici mfdcr mfdcrux mfdcrx mtdcr mtdcrux mtdcrx dnh rfdi ehpriv rfgi msgclr msgsnd dcbfep dcbstep dcbtep dcbtstep dcbzep evlddepx evstddepx icbiep lbepx
Instruction Decorated Storage Notify Load Byte with Decoration Indexed Load Doubleword with Decoration Indexed Load Floating Doubleword with Decoration Indexed Load Halfword with Decoration Indexed Load Word with Decoration Indexed Store Byte with Decoration Indexed Store Doubleword with Decoration Indexed Store Floating Doubleword with Decoration Indexed Store Halfword with Decoration Indexed Store Word with Decoration Indexed Data Cache Block Allocate Data Cache Block Invalidate Instruction Cache Block Touch Memory Barrier Move to Condition Register from XER Move To Machine State Register Return From Critical Interrupt Return From Interrupt Return From Machine Check Interrupt TLB Invalidate Local TLB Invalidate Virtual Address Indexed TLB Read Entry TLB Search Indexed TLB Write Entry Write MSR External Enable Write MSR External Enable Immediate Data Cache Read [Alternative Encoding] Data Cache Read Instruction Cache Read Data Cache Invalidate Instruction Cache Invalidate Move From Device Control Register Move From Device Control Register User-mode Indexed Move From Device Control Register Indexed Move To Device Control Register Move To Device Control Register User-mode Indexed Move To Device Control Register Indexed Debugger Notify Halt Return From Debug Interrupt Embedded Hypervisor Privilege Return From Guest Interrupt Message Clear Message Send Data Cache Block Flush by External PID Data Cache Block Store by External PID Data Cache Block Touch by External PID Data Cache Block Touch for Store by External PID Data Cache Block set to Zero by External PID Vector Load Doubleword into Doubleword by External Process ID Indexed Vector Store Doubleword into Doubleword by External Process ID Indexed Instruction Cache Block Invalidate by External PID Load Byte by External Process ID Indexed
X X X X X X X X X X X X X X X X X XL XL XL X X X X X X X X X X X X XFX X
P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P P
X 31 259 XFX 31 451 X 31 419 X 31 387 XFX 19 198 X 19 39 XL 31 270 XL 19 102 X 31 238 X 31 206 X 31 127 X 31 63 X 31 319 X 31 255 X 31 1023 EVX 31 799 EVX 31 X X 31 31 927 991 95
1241
Form
Instruction Load Floating-Point Double by External Process ID Indexed Load Halfword by External Process ID Indexed Load Vector by External Process ID Indexed Load Vector by External Process ID Indexed LRU Load Word by External Process ID Indexed Store Byte by External Process ID Indexed Store Floating-Point Double by External Process ID Indexed Store Halfword by External Process ID Indexed Store Vector by External Process ID Indexed Store Vector by External Process ID Indexed LRU Store Word by External Process ID Indexed Load Doubleword by External Process ID Indexed Store Doubleword by External Process ID Indexed Move From Performance Monitor Register Move To Performance Monitor Register TLB Search and Reserve External Control In Word Indexed External Control Out Word Indexed Data Cache Block Lock Clear Data Cache Block Touch and Lock Set Data Cache Block Touch for Store and Lock Set Instruction Cache Block Lock Clear Instruction Cache Block Touch and Lock Set Floating Convert From Integer Doubleword Single Floating Convert From Integer Doubleword Unsigned Floating Convert From Integer Doubleword Unsigned Single Floating Compare Ordered Floating Compare Unordered Floating Convert To Integer Doubleword Unsigned Floating Convert To Integer Doubleword Unsigned with round toward Zero Floating Convert To Integer Word Unsigned Floating Convert To Integer Word Unsigned with round toward Zero Floating Test for software Divide Floating Test for software Square Root Load Floating-Point Double Load Floating-Point Double with Update Load Floating-Point Double with Update Indexed Load Floating-Point Double Indexed Load Floating-Point as Integer Word Algebraic Indexed Load Floating-Point as Integer Word and Zero Indexed Load Floating-Point Single Load Floating-Point Single with Update Load Floating-Point Single with Update Indexed Load Floating-Point Single Indexed Move to Condition Register from FPSCR Store Floating-Point Double Store Floating-Point Double with Update Store Floating-Point Double with Update Indexed Store Floating-Point Double Indexed Store Floating-Point as Integer Word Indexed Store Floating-Point Single Store Floating-Point Single with Update Store Floating-Point Single with Update Indexed Store Floating-Point Single Indexed Load Floating-Point Double Pair Load Floating-Point Double Pair Indexed
X X X X X X X X X X X X X XFX XFX X X X X X X X X X X X X X X X X X X X D D X X X X D D X X X D D X X X D D X X DS X
P P P P P P P
P 907 E.PD sthepx P 917 E.PD stvepx P 917 E.PD stvepxl P 908 E.PD stwepx P 906 E.PD;64 ldepx P 908 E.PD;64 stdepx O 1100 E.PM mfpmr O 1100 E.PM mtpmr P 984 E.TWC tlbsrx. 712 EC eciwx 712 EC ecowx M 969 ECL dcblc M 968 ECL dcbtls M 968 ECL dcbtstls M 970 ECL icblc M 969 ECL icbtls 145 FP fcfids[.] 145 FP fcfidu[.] 146 FP fcfidus[.] 148 148 141 142 143 144 137 137 125 125 125 125 126 126 128 128 128 128 150 129 129 129 129 130 128 128 128 128 131 131 FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP FP.out FP.out fcmpo fcmpu fctidu[.] fctiduz[.] fctiwu[.] fctiwuz[.] ftdiv ftsqrt lfd lfdu lfdux lfdx lfiwax lfiwzx lfs lfsu lfsux lfsx mcrfs stfd stfdu stfdux stfdx stfiwx stfs stfsu stfsux stfsx lfdp lfdpx
1242
Form
Page 131 131 132 133 133 144 132 140 141 142 143 134 134 138 138 132 138 138 134 134 132 132 139 139 139 139 135 135 140 149 135 135 133 133 150 152 152 151 151 147 147 147 147 136 136 591 591 592 592 593 593 594
Cat1 FP.out FP.out FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R].in FP[R].in FP[R].in FP[R].in FP[R].in FP[R].in LMA LMA LMA LMA LMA LMA LMA
Mnemonic stfdp stfdpx fabs[.] fadd[.] fadds[.] fcfid[.] fcpsgn[.] fctid[.] fctidz[.]
Instruction
DS X X A A X X X X X X A A A A X A A A A X X A A A A A A X A A A A A X X X XFL X X X X X A A XO XO XO XO XO XO XO
Store Floating-Point Double Pair Store Floating-Point Double Pair Indexed Floating Absolute Value Floating Add Floating Add Single Floating Convert From Integer Doubleword Floating Copy Sign Floating Convert To Integer Doubleword Floating Convert To Integer Doubleword with round toward Zero fctiw[.] Floating Convert To Integer Word fctiwz[.] Floating Convert To Integer Word with round toward Zero fdiv[.] Floating Divide fdivs[.] Floating Divide Single fmadd[.] Floating Multiply-Add fmadds[.] Floating Multiply-Add Single fmr[.] Floating Move Register fmsub[.] Floating Multiply-Subtract fmsubs[.] Floating Multiply-Subtract Single fmul[.] Floating Multiply fmuls[.] Floating Multiply Single fnabs[.] Floating Negative Absolute Value fneg[.] Floating Negate fnmadd[.] Floating Negative Multiply-Add fnmadds[.] Floating Negative Multiply-Add Single fnmsub[.] Floating Negative Multiply-Subtract fnmsubs[.] Floating Negative Multiply-Subtract Single fre[.] Floating Reciprocal Estimate fres[.] Floating Reciprocal Estimate Single frsp[.] Floating Round to Single-Precision fsel[.] Floating Select fsqrt[.] Floating Square Root fsqrts[.] Floating Square Root Single fsub[.] Floating Subtract fsubs[.] Floating Subtract Single mffs[.] Move From FPSCR mtfsb0[.] Move To FPSCR Bit 0 mtfsb1[.] Move To FPSCR Bit 1 mtfsf[.] Move To FPSCR Fields mtfsfi[.] Move To FPSCR Field Immediate frim[.] Floating Round to Integer Minus frin[.] Floating Round to Integer Nearest frip[.] Floating Round to Integer Plus friz[.] Floating Round to Integer Toward Zero frsqrte[.] Floating Reciprocal Square Root Estimate frsqrtes[.] Floating Reciprocal Square Root Estimate Single macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned
1243
Form
Page 594 595 595 596 596 596 596 597 597 597 597 598 598 599 599 600 600 589 751 751 59 59 60 60 741 698 739 749 749 749 749 797 797 757 758 796 796 741 739 742 794 791 790 793 793 792 742 750 750 750
Cat1 LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMV LSQ LSQ MA MA MA MA S S S S S S S S S S S S S S S S S S S S S S S S S S
Mnemonic machhwu[o][.]
Instruction
XO XO XO XO XO X X X X X X XO XO XO XO XO XO X DQ DS X X X X XL X XL X X X X X X X X X X XL XL XL X X X X X X XL X X X
31 78 56 62 2 31 597 31 533 31 725 31 661 19 402 31 854 19 274 31 853 31 885 31 821 31 789 31 595 31 659 31 146 31 178 31 210 31 242 19 434 19 18 19 498 31 979 31 498 31 434 31 915 31 851 31 402 19 466 31 981 31 1013 31 949
H H H H H H P P P P P P H P H P P P P P P H H H H
32 32 32 32
SR
Multiply Accumulate High Halfword to Word Modulo Unsigned maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned mulchw[.] Multiply Cross Halfword to Word Signed mulchwu[.] Multiply Cross Halfword to Word Unsigned mulhhw[.] Multiply High Halfword to Word Signed mulhhwu[.] Multiply High Halfword to Word Unsigned mullhw[.] Multiply Low Halfword to Word Signed mullhwu[.] Multiply Low Halfword to Word Unsigned nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed dlmzb[.] Determine Leftmost Zero Byte lq Load Quadword stq Store Quadword lswi Load String Word Immediate lswx Load String Word Indexed stswi Store String Word Immediate stswx Store String Word Indexed doze Doze eieio Enforce In-order Execution of I/O hrfid Hypervisor Return From Interrupt Doubleword lbzcix Load Byte and Zero Caching Inhibited Indexed ldcix Load Doubleword Caching Inhibited Indexed lhzcix Load Halfword and Zero Caching Inhibited Indexed lwzcix Load Word and Zero Caching Inhibited Indexed mfsr Move From Segment Register mfsrin Move From Segment Register Indirect mtmsr Move To Machine State Register mtmsrd Move To Machine State Register Doubleword mtsr Move To Segment Register mtsrin Move To Segment Register Indirect nap Nap rfid Return From Interrupt Doubleword rvwinkle Rip Van Winkle slbfee. SLB Find Entry ESID slbia SLB Invalidate All slbie SLB Invalidate Entry slbmfee SLB Move From Entry ESID slbmfev SLB Move From Entry VSID slbmte SLB Move To Entry sleep Sleep stbcix Store Byte Caching Inhibited Indexed stdcix Store Doubleword Caching Inhibited Indexed sthcix Store Halfword Caching Inhibited Indexed
1244
Form
Page 750 802 798 800 704 510 510 510 510 511 511 511 511 512 512 512 512 513 513 513 514 514 514 515 515 515 515 516 516 516 516 517 517 517 517 518 518 518 518 519 519 519 519 520 520 520
Cat1 S S S S S.out SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic stwcix tlbia tlbie tlbiel mftb brinc evabs evaddiw evaddsmiaaw evaddssiaaw evaddumiaaw evaddusiaaw evaddw evand evandc evcmpeq evcmpgts evcmpgtu evcmplts evcmpltu evcntlsw evcntlzw evdivws evdivwu eveqv evextsb evextsh evldd evlddx evldh evldhx evldw evldwx evlhhesplat evlhhesplatx evlhhossplat evlhhossplatx evlhhousplat evlhhousplatx evlwhe evlwhex evlwhos evlwhosx evlwhou evlwhoux evlwhsplat
Instruction Store Word Caching Inhibited Indexed TLB Invalidate All TLB Invalidate Entry TLB Invalidate Entry Local Move From Time Base Bit Reversed Increment Vector Absolute Value Vector Add Immediate Word Vector Add Signed, Modulo, Integer to Accumulator Word Vector Add Signed, Saturate, Integer to Accumulator Word Vector Add Unsigned, Modulo, Integer to Accumulator Word Vector Add Unsigned, Saturate, Integer to Accumulator Word Vector Add Word Vector AND Vector AND with Complement Vector Compare Equal Vector Compare Greater Than Signed Vector Compare Greater Than Unsigned Vector Compare Less Than Signed Vector Compare Less Than Unsigned Vector Count Leading Signed Bits Word Vector Count Leading Zeros Word Vector Divide Word Signed Vector Divide Word Unsigned Vector Equivalent Vector Extend Sign Byte Vector Extend Sign Halfword Vector Load Double Word into Double Word Vector Load Double Word into Double Word Indexed Vector Load Double into Four Halfwords Vector Load Double into Four Halfwords Indexed Vector Load Double into Two Words Vector Load Double into Two Words Indexed Vector Load Halfword into Halfwords Even and Splat Vector Load Halfword into Halfwords Even and Splat Indexed Vector Load Halfword into Halfword Odd Signed and Splat Vector Load Halfword into Halfword Odd Signed and Splat Indexed Vector Load Halfword into Halfword Odd Unsigned and Splat Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed Vector Load Word into Two Halfwords Even Vector Load Word into Two Halfwords Even Indexed Vector Load Word into Two Halfwords Odd Signed (with sign extension) Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) Vector Load Word into Two Halfwords and Splat
X X X X XFX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
31 917 H 31 370 H 31 306 64 H 31 274 64 P 31 371 4 527 4 520 4 514 4 1225 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1217 1224 1216 512 529 530 564 561 560 563 562 526 525 1222 1223 537 522 523 769 768 773 772 771 770 777 776 783 782 781 780 785 784 791 790 789 788 797
1245
Form
Page 520 521 521 521 522 521 522 522 522 523 523 523 523 524 524 524 524 525 525 525 525 526 526 527 527 528 528 529 529 529 529 530
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evlwhsplatx evlwwsplat evlwwsplatx evmergehi evmergehilo evmergelo evmergelohi evmhegsmfaa evmhegsmfan evmhegsmiaa evmhegsmian evmhegumiaa evmhegumian evmhesmf evmhesmfa evmhesmfaaw evmhesmfanw evmhesmi evmhesmia evmhesmiaaw evmhesmianw evmhessf evmhessfa evmhessfaaw evmhessfanw evmhessiaaw evmhessianw evmheumi evmheumia evmheumiaaw evmheumianw evmheusiaaw
Instruction Vector Load Word into Two Halfwords and Splat Indexed Vector Load Word into Word and Splat Vector Load Word into Word and Splat Indexed Vector Merge High Vector Merge High/Low Vector Merge Low Vector Merge Low/High Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Signed, Modulo, Fractional Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1246
Form
Page 530 531 531 531 531 532 532 532 532 533 533 533 533 534 533 535 535 536 536 537 537 537 537 538 534 538 538 539 539 539 539
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmheusianw evmhogsmfaa evmhogsmfan evmhogsmiaa evmhogsmian evmhogumiaa evmhogumian evmhosmf evmhosmfa evmhosmfaaw evmhosmfanw evmhosmi evmhosmia evmhosmiaaw evmhosmianw evmhossf evmhossfa evmhossfaaw evmhossfanw evmhossiaaw evmhossianw evmhoumi evmhoumia evmhoumiaaw evmhoumianw evmhousiaaw evmhousianw evmra evmwhsmf evmwhsmfa evmwhsmi
Instruction Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words Initialize Accumulator Vector Multiply Word High Signed, Modulo, Fractional Vector Multiply Word High Signed, Modulo, Fractional to Accumulator Vector Multiply Word High Signed, Modulo, Integer
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1247
Form
Page 539 540 540 540 540 541 541 541 541 542 542 542 542 543 543 543 543 544 544 544 544 544 544 545 545 545 546 546 546 547 547 547 547 547 548 548
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmwhsmia evmwhssf evmwhssfa evmwhumi evmwhumia evmwlsmiaaw evmwlsmianw evmwlssiaaw evmwlssianw evmwlumi evmwlumia evmwlumiaaw evmwlumianw evmwlusiaaw evmwlusianw evmwsmf evmwsmfa evmwsmfaa evmwsmfan evmwsmi evmwsmia evmwsmiaa evmwsmian evmwssf evmwssfa evmwssfaa evmwssfan evmwumi evmwumia evmwumiaa evmwumian evnand evneg evnor evor evorc
Instruction Vector Multiply Word High Signed, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Saturate, Fractional Vector Multiply Word High Signed, Saturate, Fractional to Accumulator Vector Multiply Word High Unsigned, Modulo, Integer Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Modulo, Integer Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Signed, Modulo, Fractional Vector Multiply Word Signed, Modulo, Fractional to Accumulator Vector Multiply Word Signed, Modulo, Fractional and Accumulate Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Word Signed, Modulo, Integer Vector Multiply Word Signed, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Saturate, Fractional Vector Multiply Word Signed, Saturate, Fractional to Accumulator Vector Multiply Word Signed, Saturate, Fractional and Accumulate Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative Vector Multiply Word Unsigned, Modulo, Integer Vector Multiply Word Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative Vector NAND Vector Negate Vector NOR Vector OR Vector OR with Complement
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1248
Form
Page 548 549 549 549 550 550 550 550 550 550 551 551 551 551 552 552 552 552 553 553 553 553 553 553 554 554 554 554 555 555 555 555 555 576 577 582 580 579 580 580 579 580 578 578 578 582 580
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD
Mnemonic evrlw evrlwi evrndw evsel evslw evslwi evsplatfi evsplati evsrwis evsrwiu evsrws evsrwu evstdd evstddx evstdh evstdhx evstdw evstdwx evstwhe evstwhex evstwho evstwhox evstwwe evstwwex evstwwo evstwwox evsubfsmiaaw evsubfssiaaw evsubfumiaaw evsubfusiaaw evsubfw evsubifw evxor efdabs efdadd efdcfs efdcfsf efdcfsi efdcfsid efdcfuf efdcfui efdcfuid efdcmpeq efdcmpgt efdcmplt efdctsf efdctsi
Instruction Vector Rotate Left Word Vector Rotate Left Word Immediate Vector Round Word Vector Select Vector Shift Left Word Vector Shift Left Word Immediate Vector Splat Fractional Immediate Vector Splat Immediate Vector Shift Right Word Immediate Signed Vector Shift Right Word Immediate Unsigned Vector Shift Right Word Signed Vector Shift Right Word Unsigned Vector Store Double of Double Vector Store Double of Double Indexed Vector Store Double of Four Halfwords Vector Store Double of Four Halfwords Indexed Vector Store Double of Two Words Vector Store Double of Two Words Indexed Vector Store Word of Two Halfwords from Even Vector Store Word of Two Halfwords from Even Indexed Vector Store Word of Two Halfwords from Odd Vector Store Word of Two Halfwords from Odd Indexed Vector Store Word of Word from Even Vector Store Word of Word from Even Indexed Vector Store Word of Word from Odd Vector Store Word of Word from Odd Indexed Vector Subtract Signed, Modulo, Integer to Accumulator Word Vector Subtract Signed, Saturate, Integer to Accumulator Word Vector Subtract Unsigned, Modulo, Integer to Accumulator Word Vector Subtract Unsigned, Saturate, Integer to Accumulator Word Vector Subtract from Word Vector Subtract Immediate from Word Vector XOR Floating-Point Double-Precision Absolute Value Floating-Point Double-Precision Add Floating-Point Double-Precision Convert from SinglePrecision Convert Floating-Point Double-Precision from Signed Fraction Convert Floating-Point Double-Precision from Signed Integer Convert Floating-Point Double-Precision from Signed Integer Doubleword Convert Floating-Point Double-Precision from Unsigned Fraction Convert Floating-Point Double-Precision from Unsigned Integer Convert Floating-Point Double-Precision from Unsigned Integer Doubleword Floating-Point Double-Precision Compare Equal Floating-Point Double-Precision Compare Greater Than Floating-Point Double-Precision Compare Less Than Convert Floating-Point Double-Precision to Signed Fraction Convert Floating-Point Double-Precision to Signed Integer
EVX EVX EVX EVS EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1249
Form
Page 581 582 582 580 581 582 577 577 576 576 577 579 578 579 583 569 570 574 574 574 574 572 571 571 575 574 575 575 574 575 570 570 569 569 570 573 572 573 561 562
Cat1 SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FV SP.FV
Mnemonic efdctsidz efdctsiz efdctuf efdctui efdctuidz efdctuiz efddiv efdmul efdnabs efdneg efdsub efdtsteq efdtstgt efdtstlt efscfd efsabs efsadd efscfsf efscfsi efscfuf efscfui efscmpeq efscmpgt efscmplt efsctsf efsctsi efsctsiz efsctuf efsctui efsctuiz efsdiv efsmul efsnabs efsneg efssub efststeq efststgt efststlt evfsabs evfsadd
Instruction Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Fraction Convert Floating-Point Double-Precision to Unsigned Integer Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero Floating-Point Double-Precision Divide Floating-Point Double-Precision Multiply Floating-Point Double-Precision Negative Absolute Value Floating-Point Double-Precision Negate Floating-Point Double-Precision Subtract Floating-Point Double-Precision Test Equal Floating-Point Double-Precision Test Greater Than Floating-Point Double-Precision Test Less Than Floating-Point Single-Precision Convert from DoublePrecision Floating-Point Single-Precision Absolute Value Floating-Point Single-Precision Add Convert Floating-Point Single-Precision from Signed Fraction Convert Floating-Point Single-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Single-Precision from Unsigned Integer Floating-Point Single-Precision Compare Equal Floating-Point Single-Precision Compare Greater Than Floating-Point Single-Precision Compare Less Than Convert Floating-Point Single-Precision to Signed Fraction Convert Floating-Point Single-Precision to Signed Integer Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Convert Floating-Point Single-Precision to Unsigned Fraction Convert Floating-Point Single-Precision to Unsigned Integer Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Floating-Point Single-Precision Divide Floating-Point Single-Precision Multiply Floating-Point Single-Precision Negative Absolute Value Floating-Point Single-Precision Negate Floating-Point Single-Precision Subtract Floating-Point Single-Precision Test Equal Floating-Point Single-Precision Test Greater Than Floating-Point Single-Precision Test Less Than Vector Floating-Point Single-Precision Absolute Value Vector Floating-Point Single-Precision Add
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1250
Form
Page 566 566 566 566 564 563 563 568 567 567 568 567 567 562 562 561 561 562 565 564 565 216 213 213 218 218 214 214 269 269 216 216 217 214 217 230 259 230 230 230 231 232 231 232 231 232
Cat1 SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic evfscfsf evfscfsi evfscfuf evfscfui evfscmpeq evfscmpgt evfscmplt evfsctsf evfsctsi evfsctsiz evfsctuf evfsctui evfsctuiz evfsdiv evfsmul evfsnabs evfsneg evfssub evfststeq evfststgt evfststlt lvebx lvehx lvewx lvsl lvsr lvx lvxl mfvscr mtvscr stvebx stvehx stvewx stvx stvxl vaddcuw vaddfp vaddsbs vaddshs vaddsws vaddubm vaddubs vadduhm vadduhs vadduwm vadduws
Instruction Vector Convert Floating-Point Single-Precision from Signed Fraction Vector Convert Floating-Point Single-Precision from Signed Integer Vector Convert Floating-Point Single-Precision from Unsigned Fraction Vector Convert Floating-Point Single-Precision from Unsigned Integer Vector Floating-Point Single-Precision Compare Equal Vector Floating-Point Single-Precision Compare Greater Than Vector Floating-Point Single-Precision Compare Less Than Vector Convert Floating-Point Single-Precision to Signed Fraction Vector Convert Floating-Point Single-Precision to Signed Integer Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Vector Convert Floating-Point Single-Precision to Unsigned Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Vector Floating-Point Single-Precision Divide Vector Floating-Point Single-Precision Multiply Vector Floating-Point Single-Precision Negative Absolute Value Vector Floating-Point Single-Precision Negate Vector Floating-Point Single-Precision Subtract Vector Floating-Point Single-Precision Test Equal Vector Floating-Point Single-Precision Test Greater Than Vector Floating-Point Single-Precision Test Less Than Load Vector Element Byte Indexed Load Vector Element Halfword Indexed Load Vector Element Word Indexed Load Vector for Shift Left Indexed Load Vector for Shift Right Indexed Load Vector Indexed Load Vector Indexed LRU Move From Vector Status and Control Register Move To Vector Status and Control Register Store Vector Element Byte Indexed Store Vector Element Halfword Indexed Store Vector Element Word Indexed Store Vector Indexed Store Vector Indexed LRU Vector Add and Write Carry-Out Unsigned Word Vector Add Single-Precision Vector Add Signed Byte Saturate Vector Add Signed Halfword Saturate Vector Add Signed Word Saturate Vector Add Unsigned Byte Modulo Vector Add Unsigned Byte Saturate Vector Add Unsigned Halfword Modulo Vector Add Unsigned Halfword Saturate Vector Add Unsigned Word Modulo Vector Add Unsigned Word Saturate
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX X X X X X X X VX VX X X X X X VX VX VX VX VX VX VX VX VX VX VX
4 669 31 7 31 39 31 71 31 6 31 38 31 103 31 359 4 1540 4 1604 31 135 31 167 31 199 31 231 31 487 4 384 4 10 4 768 4 832 4 896 4 0 4 512 4 64 4 576 4 128 4 640
1251
Form
Page 254 254 245 245 245 246 246 246 263 263 265 265 251 251 252 266 266 252 252 252 253 253 253 262 262 267 267 260 261 247 247 247 248 248 248 238 238 261 249 249 249 250 250 250 239 224 224 224 225 225 225 240 240 241 239 241 242 236
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic vand vandc vavgsb vavgsh vavgsw vavgub vavguh vavguw vcfsx vcfux vcmpbfp[.] vcmpeqfp[.] vcmpequb[.] vcmpequh[.] vcmpequw[.] vcmpgefp[.] vcmpgtfp[.] vcmpgtsb[.] vcmpgtsh[.] vcmpgtsw[.] vcmpgtub[.] vcmpgtuh[.] vcmpgtuw[.] vctsxs vctuxs vexptefp vlogefp vmaddfp vmaxfp vmaxsb vmaxsh vmaxsw vmaxub vmaxuh vmaxuw vmhaddshs vmhraddshs vminfp vminsb vminsh vminsw vminub vminuh vminuw vmladduhm vmrghb vmrghh vmrghw vmrglb vmrglh vmrglw vmsummbm vmsumshm vmsumshs vmsumubm vmsumuhm vmsumuhs vmulesb
Instruction Vector Logical AND Vector Logical AND with Complement Vector Average Signed Byte Vector Average Signed Halfword Vector Average Signed Word Vector Average Unsigned Byte Vector Average Unsigned Halfword Vector Average Unsigned Word Vector Convert From Signed Fixed-Point Word Vector Convert From Unsigned Fixed-Point Word Vector Compare Bounds Single-Precision Vector Compare Equal To Single-Precision Vector Compare Equal To Unsigned Byte Vector Compare Equal To Unsigned Halfword Vector Compare Equal To Unsigned Word Vector Compare Greater Than or Equal To Single-Precision Vector Compare Greater Than Single-Precision Vector Compare Greater Than Signed Byte Vector Compare Greater Than Signed Halfword Vector Compare Greater Than Signed Word Vector Compare Greater Than Unsigned Byte Vector Compare Greater Than Unsigned Halfword Vector Compare Greater Than Unsigned Word Vector Convert To Signed Fixed-Point Word Saturate Vector Convert To Unsigned Fixed-Point Word Saturate Vector 2 Raised to the Exponent Estimate FloatingPoint Vector Log Base 2 Estimate Floating-Point Vector Multiply-Add Single-Precision Vector Maximum Single-Precision Vector Maximum Signed Byte Vector Maximum Signed Halfword Vector Maximum Signed Word Vector Maximum Unsigned Byte Vector Maximum Unsigned Halfword Vector Maximum Unsigned Word Vector Multiply-High-Add Signed Halfword Saturate Vector Multiply-High-Round-Add Signed Halfword Saturate Vector Minimum Single-Precision Vector Minimum Signed Byte Vector Minimum Signed Halfword Vector Minimum Signed Word Vector Minimum Unsigned Byte Vector Minimum Unsigned Halfword Vector Minimum Unsigned Word Vector Multiply-Low-Add Unsigned Halfword Modulo Vector Merge High Byte Vector Merge High Halfword Vector Merge High Word Vector Merge Low Byte Vector Merge Low Halfword Vector Merge Low Word Vector Multiply-Sum Mixed Byte Modulo Vector Multiply-Sum Signed Halfword Modulo Vector Multiply-Sum Signed Halfword Saturate Vector Multiply-Sum Unsigned Byte Modulo Vector Multiply-Sum Unsigned Halfword Modulo Vector Multiply-Sum Unsigned Halfword Saturate Vector Multiply Even Signed Byte
VX VX VX VX VX VX VX VX VX VX VC VC VC VC VC VC VC VC VC VC VC VC VC VX VX VX VX VA VX VX VX VX VX VX VX VA VA VX VX VX VX VX VX VX VA VX VX VX VX VX VX VA VA VA VA VA VA VX
1252
Form
Page 236 236 236 237 237 237 237 260 254 254 227 219 220 220 220 220 221 221 221 221 268 264 264 264 264 255 255 255 268 227 228 256 228 256 228 256 226 226 226 226 226 226 229 258 258 258 257 257 229 257 233 259 233 233 233 234 235 234
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic vmulesh vmuleub vmuleuh vmulosb vmulosh vmuloub vmulouh vnmsubfp vnor vor vperm vpkpx vpkshss vpkshus vpkswss vpkswus vpkuhum vpkuhus vpkuwum vpkuwus vrefp vrfim vrfin vrfip vrfiz vrlb vrlh vrlw vrsqrtefp vsel vsl vslb vsldoi vslh vslo vslw vspltb vsplth vspltisb vspltish vspltisw vspltw vsr vsrab vsrah vsraw vsrb vsrh vsro vsrw vsubcuw vsubfp vsubsbs vsubshs vsubsws vsububm vsububs vsubuhm
Instruction Vector Multiply Even Signed Halfword Vector Multiply Even Unsigned Byte Vector Multiply Even Unsigned Halfword Vector Multiply Odd Signed Byte Vector Multiply Odd Signed Halfword Vector Multiply Odd Unsigned Byte Vector Multiply Odd Unsigned Halfword Vector Negative Multiply-Subtract Single-Precision Vector Logical NOR Vector Logical OR Vector Permute Vector Pack Pixel Vector Pack Signed Halfword Signed Saturate Vector Pack Signed Halfword Unsigned Saturate Vector Pack Signed Word Signed Saturate Vector Pack Signed Word Unsigned Saturate Vector Pack Unsigned Halfword Unsigned Modulo Vector Pack Unsigned Halfword Unsigned Saturate Vector Pack Unsigned Word Unsigned Modulo Vector Pack Unsigned Word Unsigned Saturate Vector Reciprocal Estimate Single-Precision Vector Round to Single-Precision Integer toward -Infinity Vector Round to Single-Precision Integer Nearest Vector Round to Single-Precision Integer toward +Infinity Vector Round to Single-Precision Integer toward Zero Vector Rotate Left Byte Vector Rotate Left Halfword Vector Rotate Left Word Vector Reciprocal Square Root Estimate Single-Precision Vector Select Vector Shift Left Vector Shift Left Byte Vector Shift Left Double by Octet Immediate Vector Shift Left Halfword Vector Shift Left by Octet Vector Shift Left Word Vector Splat Byte Vector Splat Halfword Vector Splat Immediate Signed Byte Vector Splat Immediate Signed Halfword Vector Splat Immediate Signed Word Vector Splat Word Vector Shift Right Vector Shift Right Algebraic Byte Vector Shift Right Algebraic Halfword Vector Shift Right Algebraic Word Vector Shift Right Byte Vector Shift Right Halfword Vector Shift Right by Octet Vector Shift Right Word Vector Subtract and Write Carry-Out Unsigned Word Vector Subtract Single-Precision Vector Subtract Signed Byte Saturate Vector Subtract Signed Halfword Saturate Vector Subtract Signed Word Saturate Vector Subtract Unsigned Byte Modulo Vector Subtract Unsigned Byte Saturate Vector Subtract Unsigned Halfword Modulo
VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX
1253
Form
Page 234 234 235 243 244 244 244 243 222 222 222 223 223 223 254 338 338 339 339 340 340 341 341 342 347 349 351 352 353
Cat1 V V V V V V V V V V V V V V V VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic vsubuhs vsubuwm vsubuws vsum2sws vsum4sbs vsum4shs vsum4ubs vsumsws vupkhpx vupkhsb vupkhsh vupklpx vupklsb vupklsh vxor lxsdx lxvd2x lxvdsx lxvw4x stxsdx stxvd2x stxvw4x xsabsdp xsadddp xscmpodp xscmpudp xscpsgndp xscvdpsp xscvdpsxds
Instruction Vector Subtract Unsigned Halfword Saturate Vector Subtract Unsigned Word Modulo Vector Subtract Unsigned Word Saturate Vector Sum across Half Signed Word Saturate Vector Sum across Quarter Signed Byte Saturate Vector Sum across Quarter Signed Halfword Saturate Vector Sum across Quarter Unsigned Byte Saturate Vector Sum across Signed Word Saturate Vector Unpack High Pixel Vector Unpack High Signed Byte Vector Unpack High Signed Halfword Vector Unpack Low Pixel Vector Unpack Low Signed Byte Vector Unpack Low Signed Halfword Vector Logical XOR Load VSR Scalar Doubleword Indexed Load VSR Vector Doubleword*2 Indexed Load VSR Vector Doubleword & Splat Indexed Load VSR Vector Word*4 Indexed Store VSR Scalar Doubleword Indexed Store VSR Vector Doubleword*2 Indexed Store VSR Vector Word*4 Indexed VSX Scalar Absolute Value Double-Precision VSX Scalar Add Double-Precision VSX Scalar Compare Ordered Double-Precision VSX Scalar Compare Unordered Double-Precision VSX Scalar Copy Sign Double-Precision VSX Scalar Convert Double-Precision to SinglePrecision VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Scalar Convert Single-Precision to DoublePrecision format VSX Scalar Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Scalar Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Scalar Divide Double-Precision VSX Scalar Multiply-Add Type-A Double-Precision VSX Scalar Multiply-Add Type-M Double-Precision VSX Scalar Maximum Double-Precision VSX Scalar Minimum Double-Precision VSX Scalar Multiply-Subtract Type-A Double-Precision VSX Scalar Multiply-Subtract Type-M Double-Precision VSX Scalar Multiply Double-Precision VSX Scalar Negative Absolute Value Double-Precision VSX Scalar Negate Double-Precision VSX Scalar Negative Multiply-Add Type-A DoublePrecision
VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX XX1 XX1 XX1 XX1 XX1 XX1 XX1 XX2 XX3 XX3 XX3 XX3 XX2 XX2
XX2
60
176
355
VSX
xscvdpsxws
XX2
60
656
357
VSX
xscvdpuxds
XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX3
60 60 60 60 60 60 60 60 60 60 60 60 60 60 60
144 658 752 720 224 132 164 640 672 196 228 192 722 754 644
359 361 361 362 363 365 365 368 370 372 372 375 377 377 378
VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
xscvdpuxws xscvspdp xscvsxddp xscvuxddp xsdivdp xsmaddadp xsmaddmdp xsmaxdp xsmindp xsmsubadp xsmsubmdp xsmuldp xsnabsdp xsnegdp xsnmaddadp
1254
Form
Page 378 383 383 386 387 388 388 389 390 391 392 393 395 396 397 397 398 402 404 404 405 405 406 406 407 407 408 408 409 409 410 410 411 412 414 416
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic xsnmaddmdp xsnmsubadp xsnmsubmdp xsrdpi xsrdpic xsrdpim xsrdpip xsrdpiz xsredp xsrsqrtedp xssqrtdp xssubdp xstdivdp xstsqrtdp xvabsdp xvabssp xvadddp xvaddsp xvcmpeqdp xvcmpeqdp. xvcmpeqsp xvcmpeqsp. xvcmpgedp xvcmpgedp. xvcmpgesp xvcmpgesp. xvcmpgtdp xvcmpgtdp. xvcmpgtsp xvcmpgtsp. xvcpsgndp xvcpsgnsp xvcvdpsp xvcvdpsxds xvcvdpsxws xvcvdpuxds
Instruction VSX Scalar Negative Multiply-Add Type-M DoublePrecision VSX Scalar Negative Multiply-Subtract Type-A DoublePrecision VSX Scalar Negative Multiply-Subtract Type-M DoublePrecision VSX Scalar Round to Double-Precision Integer VSX Scalar Round to Double-Precision Integer using Current rounding mode VSX Scalar Round to Double-Precision Integer toward Infinity VSX Scalar Round to Double-Precision Integer toward +Infinity VSX Scalar Round to Double-Precision Integer toward Zero VSX Scalar Reciprocal Estimate Double-Precision VSX Scalar Reciprocal Square Root Estimate DoublePrecision VSX Scalar Square Root Double-Precision VSX Scalar Subtract Double-Precision VSX Scalar Test for software Divide Double-Precision VSX Scalar Test for software Square Root DoublePrecision VSX Vector Absolute Value Double-Precision VSX Vector Absolute Value Single-Precision VSX Vector Add Double-Precision VSX Vector Add Single-Precision VSX Vector Compare Equal To Double-Precision VSX Vector Compare Equal To Double-Precision & Record VSX Vector Compare Equal To Single-Precision VSX Vector Compare Equal To Single-Precision & Record VSX Vector Compare Greater Than or Equal To Double-Precision VSX Vector Compare Greater Than or Equal To Double-Precision & Record VSX Vector Compare Greater Than or Equal To SinglePrecision VSX Vector Compare Greater Than or Equal To SinglePrecision & Record VSX Vector Compare Greater Than Double-Precision VSX Vector Compare Greater Than Double-Precision & Record VSX Vector Compare Greater Than Single-Precision VSX Vector Compare Greater Than Single-Precision & Record VSX Vector Copy Sign Double-Precision VSX Vector Copy Sign Single-Precision VSX Vector round and Convert Double-Precision to Single-Precision format VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword Saturate VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word Saturate VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2
1255
Form
Page
Cat1
Mnemonic
Instruction VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Vector Convert Single-Precision to DoublePrecision VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word Saturate VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format VSX Vector Convert Signed Fixed-Point Word to Double-Precision format VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format VSX Vector Divide Double-Precision VSX Vector Divide Single-Precision VSX Vector Multiply-Add Type-A Double-Precision VSX Vector Multiply-Add Type-A Single-Precision VSX Vector Multiply-Add Type-M Double-Precision VSX Vector Multiply-Add Type-M Single-Precision VSX Vector Maximum Double-Precision VSX Vector Maximum Single-Precision VSX Vector Minimum Double-Precision VSX Vector Minimum Single-Precision VSX Vector Multiply-Subtract Type-A Double-Precision VSX Vector Multiply-Subtract Type-A Single-Precision VSX Vector Multiply-Subtract Type-M Double-Precision VSX Vector Multiply-Subtract Type-M Single-Precision VSX Vector Multiply Double-Precision VSX Vector Multiply Single-Precision VSX Vector Negative Absolute Value Double-Precision VSX Vector Negative Absolute Value Single-Precision VSX Vector Negate Double-Precision VSX Vector Negate Single-Precision VSX Vector Negative Multiply-Add Type-A DoublePrecision VSX Vector Negative Multiply-Add Type-A SinglePrecision VSX Vector Negative Multiply-Add Type-M DoublePrecision VSX Vector Negative Multiply-Add Type-M SinglePrecision
XX2
60
304
423
VSX
xvcvspsxws
XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3
60 60
784 272
425 427 429 429 430 430 431 431 432 432 433 435 437 437 440 440 443 445 447 449 451 451 454 454 457 459 461 461 462 462 463 463 468 468
VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
xvcvspuxds xvcvspuxws xvcvsxddp xvcvsxdsp xvcvsxwdp xvcvsxwsp xvcvuxddp xvcvuxdsp xvcvuxwdp xvcvuxwsp xvdivdp xvdivsp xvmaddadp xvmaddasp xvmaddmdp xvmaddmsp xvmaxdp xvmaxsp xvmindp xvminsp xvmsubadp xvmsubasp xvmsubmdp xvmsubmsp xvmuldp xvmulsp xvnabsdp xvnabssp xvnegdp xvnegsp xvnmaddadp xvnmaddasp xvnmaddmdp xvnmaddmsp
60 480 60 352 60 388 60 260 60 420 60 292 60 896 60 768 60 928 60 800 60 452 60 324 60 484 60 356 60 448 60 320 60 978 60 850 60 1010 60 882 60 60 60 60 900 772 932 804
1256
Form
Page 471 471 474 474 477 478 478 479 479 480 481 482 482 483 483 484 485 486 487 488 489 491 493 494 495 495 496 496 497 497 498 499 499 500 500 501 501 699
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX WT
Mnemonic xvnmsubadp xvnmsubasp xvnmsubmdp xvnmsubmsp xvrdpi xvrdpic xvrdpim xvrdpip xvrdpiz xvredp xvresp xvrspi xvrspic xvrspim xvrspip xvrspiz xvrsqrtedp xvrsqrtesp xvsqrtdp xvsqrtsp xvsubdp xvsubsp xvtdivdp xvtdivsp xvtsqrtdp xvtsqrtsp xxland xxlandc xxlnor xxlor xxlxor xxmrghw xxmrglw xxpermdi xxsel xxsldwi xxspltw wait
Instruction VSX Vector Negative Multiply-Subtract Type-A DoublePrecision VSX Vector Negative Multiply-Subtract Type-A SinglePrecision VSX Vector Negative Multiply-Subtract Type-M DoublePrecision VSX Vector Negative Multiply-Subtract Type-M SinglePrecision VSX Vector Round to Double-Precision Integer VSX Vector Round to Double-Precision Integer using Current rounding mode VSX Vector Round to Double-Precision Integer toward Infinity VSX Vector Round to Double-Precision Integer toward +Infinity VSX Vector Round to Double-Precision Integer toward Zero VSX Vector Reciprocal Estimate Double-Precision VSX Vector Reciprocal Estimate Single-Precision VSX Vector Round to Single-Precision Integer VSX Vector Round to Single-Precision Integer using Current rounding mode VSX Vector Round to Single-Precision Integer toward Infinity VSX Vector Round to Single-Precision Integer toward +Infinity VSX Vector Round to Single-Precision Integer toward Zero VSX Vector Reciprocal Square Root Estimate DoublePrecision VSX Vector Reciprocal Square Root Estimate SinglePrecision VSX Vector Square Root Double-Precision VSX Vector Square Root Single-Precision VSX Vector Subtract Double-Precision VSX Vector Subtract Single-Precision VSX Vector Test for software Divide Double-Precision VSX Vector Test for software Divide Single-Precision VSX Vector Test for software Square Root DoublePrecision VSX Vector Test for software Square Root SinglePrecision VSX Logical AND VSX Logical AND with Complement VSX Logical NOR VSX Logical OR VSX Logical XOR VSX Merge High Word VSX Merge Low Word VSX Permute Doubleword Immediate VSX Select VSX Shift Left Double by Word Immediate VSX Splat Word Wait
XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX4 XX3 XX2 X
1
See the key to the mode dependency and privilege columns on page 1302 and the key to the category column in Section 1.3.5 of Book I.
1257
1258
Page 77 76 231 248 255 251 597 237 259 594 224 221 238 238 239 239 240 241 242 597 240 241 227 227 593 228 599 260 260 231 248 255 251 237 259 594 224 221 593
Mnemonic tdi twi vaddubm vmaxub vrlb vcmpequb[.] mulhhwu[.] vmuloub vaddfp machhwu[o][.]
Instruction
D D VX VX VX VC X VX VX XO VX VX VA VA VA VA VA VA VA X VA VA VA VA XO VA XO VA VA VX VX VX VC VX VX XO VX VX XO
0 2 4 6 8 8 10 12 12 14 32 33 34 36 37 38 39 40 40 41 42 43 44 44 46 46 47 64 66 68 70 72 74 76 76 78 108
Trap Doubleword Immediate Trap Word Immediate Vector Add Unsigned Byte Modulo Vector Maximum Unsigned Byte Vector Rotate Left Byte Vector Compare Equal To Unsigned Byte Multiply High Halfword to Word Unsigned Vector Multiply Odd Unsigned Byte Vector Add Single-Precision Multiply Accumulate High Halfword to Word Modulo Unsigned vmrghb Vector Merge High Byte vpkuhum Vector Pack Unsigned Halfword Unsigned Modulo vmhaddshs Vector Multiply-High-Add Signed Halfword Saturate vmhraddshs Vector Multiply-High-Round-Add Signed Halfword Saturate vmladduhm Vector Multiply-Low-Add Unsigned Halfword Modulo vmsumubm Vector Multiply-Sum Unsigned Byte Modulo vmsummbm Vector Multiply-Sum Mixed Byte Modulo vmsumuhm Vector Multiply-Sum Unsigned Halfword Modulo vmsumuhs Vector Multiply-Sum Unsigned Halfword Saturate mulhhw[.] Multiply High Halfword to Word Signed vmsumshm Vector Multiply-Sum Signed Halfword Modulo vmsumshs Vector Multiply-Sum Signed Halfword Saturate vsel Vector Select vperm Vector Permute machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed vsldoi Vector Shift Left Double by Octet Immediate nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed vmaddfp Vector Multiply-Add Single-Precision vnmsubfp Vector Negative Multiply-Subtract Single-Precision vadduhm Vector Add Unsigned Halfword Modulo vmaxuh Vector Maximum Unsigned Halfword vrlh Vector Rotate Left Halfword vcmpequh[.] Vector Compare Equal To Unsigned Halfword vmulouh Vector Multiply Odd Unsigned Halfword vsubfp Vector Subtract Single-Precision machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned vmrghh Vector Merge High Halfword vpkuwum Vector Pack Unsigned Word Unsigned Modulo machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed
1259
Form
Page 599 231 248 255 252 596 592 224 221 596 591 598 265 592 221 591 598 247 256 237 268 225 220 549 247 256 237 268 225 220 230 247 256 597 267 596 225 220 597 595 600 228 266 267 596 220
Cat1 LMA V V V V LMA LMA V V LMA LMA LMA V LMA V LMA LMA V V V V V V SP V V V V V V V V V LMA V LMA V V LMA LMA LMA V V V LMA V
Mnemonic
Instruction
XO VX VX VX VC X XO VX VX X XO XO VC XO VX XO XO VX VX VX VX VX VX EVS VX VX VX VX VX VX VX VX VX X VX XO VX VX X XO XO VX VC VX XO VX
nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed vadduwm Vector Add Unsigned Word Modulo vmaxuw Vector Maximum Unsigned Word vrlw Vector Rotate Left Word vcmpequw[.] Vector Compare Equal To Unsigned Word mulchwu[.] Multiply Cross Halfword to Word Unsigned macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned vmrghw Vector Merge High Word vpkuhus Vector Pack Unsigned Halfword Unsigned Saturate mulchw[.] Multiply Cross Halfword to Word Signed macchw[o][.] Multiply Accumulate Cross Halfword to Word Modulo Signed nmacchw[o][.] Negative Multiply Accumulate Cross Halfword to Word Modulo Signed vcmpeqfp[.] Vector Compare Equal To Single-Precision macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned vpkuwus Vector Pack Unsigned Word Unsigned Saturate macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed vmaxsb Vector Maximum Signed Byte vslb Vector Shift Left Byte vmulosb Vector Multiply Odd Signed Byte vrefp Vector Reciprocal Estimate Single-Precision vmrglb Vector Merge Low Byte vpkshus Vector Pack Signed Halfword Unsigned Saturate evsel Vector Select vmaxsh Vector Maximum Signed Halfword vslh Vector Shift Left Halfword vmulosh Vector Multiply Odd Signed Halfword vrsqrtefp Vector Reciprocal Square Root Estimate Single-Precision vmrglh Vector Merge Low Halfword vpkswus Vector Pack Signed Word Unsigned Saturate vaddcuw Vector Add and Write Carry-Out Unsigned Word vmaxsw Vector Maximum Signed Word vslw Vector Shift Left Word mullhwu[.] Multiply Low Halfword to Word Unsigned vexptefp Vector 2 Raised to the Exponent Estimate FloatingPoint maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned vmrglw Vector Merge Low Word vpkshss Vector Pack Signed Halfword Signed Saturate mullhw[.] Multiply Low Halfword to Word Signed maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed vsl Vector Shift Left vcmpgefp[.] Vector Compare Greater Than or Equal To Single-Precision vlogefp Vector Log Base 2 Estimate Floating-Point maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned vpkswss Vector Pack Signed Word Signed Saturate
1260
Form
Page 595 600 511 232 510 250 555 257 555 253 510 236 547 515 264 515 549 226 514 514 222 510 512 512 555 548 547 515 548 547 551 551 550 550 550 550 548 550 549 550 521 521 522 522 513 512 513 513 512 232 250 257 253 236 264 226 222 562 232
Mnemonic maclhws[o][.] nmaclhws[o][.] evaddw vaddubs evaddiw vminub evsubfw vsrb evsubifw vcmpgtub[.] evabs vmuleub evneg evextsb vrfin evextsh evrndw vspltb evcntlzw evcntlsw vupkhsb brinc evand evandc evxor evor evnor eveqv evorc evnand evsrwu evsrws evsrwiu evsrwis evslw evslwi evrlw evsplati evrlwi evsplatfi evmergehi evmergelo evmergehilo evmergelohi evcmpgtu evcmpgts evcmpltu evcmplts evcmpeq vadduhs vminuh vsrh vcmpgtuh[.] vmuleuh vrfiz vsplth vupkhsh evfsadd vadduws
Instruction Multiply Accumulate Low Halfword to Word Saturate Signed Negative Multiply Accumulate Low Halfword to Word Saturate Signed Vector Add Word Vector Add Unsigned Byte Saturate Vector Add Immediate Word Vector Minimum Unsigned Byte Vector Subtract from Word Vector Shift Right Byte Vector Subtract Immediate from Word Vector Compare Greater Than Unsigned Byte Vector Absolute Value Vector Multiply Even Unsigned Byte Vector Negate Vector Extend Sign Byte Vector Round to Single-Precision Integer Nearest Vector Extend Sign Halfword Vector Round Word Vector Splat Byte Vector Count Leading Zeros Word Vector Count Leading Signed Bits Word Vector Unpack High Signed Byte Bit Reversed Increment Vector AND Vector AND with Complement Vector XOR Vector OR Vector NOR Vector Equivalent Vector OR with Complement Vector NAND Vector Shift Right Word Unsigned Vector Shift Right Word Signed Vector Shift Right Word Immediate Unsigned Vector Shift Right Word Immediate Signed Vector Shift Left Word Vector Shift Left Word Immediate Vector Rotate Left Word Vector Splat Immediate Vector Rotate Left Word Immediate Vector Splat Fractional Immediate Vector Merge High Vector Merge Low Vector Merge High/Low Vector Merge Low/High Vector Compare Greater Than Unsigned Vector Compare Greater Than Signed Vector Compare Less Than Unsigned Vector Compare Less Than Signed Vector Compare Equal Vector Add Unsigned Halfword Saturate Vector Minimum Unsigned Halfword Vector Shift Right Halfword Vector Compare Greater Than Unsigned Halfword Vector Multiply Even Unsigned Halfword Vector Round to Single-Precision Integer toward Zero Vector Splat Halfword Vector Unpack High Signed Halfword Vector Floating-Point Single-Precision Add Vector Add Unsigned Word Saturate
XO XO EVX VX EVX VX EVX VX EVX VC EVX VX EVX EVX VX EVX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX VC VX VX VX VX EVX VX
1261
Form
Page 562 250 561 257 561 561 253 562 562 264 563 226 563 564 223 566 566 566 566 567 567 568 568 567 567 564 565 565 570 570 569 229 569 569 266 570 570 264 571 571 572 223
Cat1 SP.FV V SP.FV V SP.FV SP.FV V SP.FV SP.FV V SP.FV V SP.FV SP.FV V SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FS SP.FS SP.FS V SP.FS SP.FS V SP.FS SP.FS V SP.FS SP.FS SP.FS V
Mnemonic evfssub vminuw evfsabs vsrw evfsnabs evfsneg vcmpgtuw[.] evfsmul evfsdiv vrfip evfscmpgt vspltw evfscmplt evfscmpeq vupklsb evfscfui evfscfsi evfscfuf evfscfsf evfsctui evfsctsi evfsctuf evfsctsf evfsctuiz evfsctsiz evfststgt evfststlt evfststeq efsadd efssub efsabs vsr efsnabs efsneg vcmpgtfp[.] efsmul efsdiv vrfim efscmpgt efscmplt efscmpeq vupklsh
Instruction Vector Floating-Point Single-Precision Subtract Vector Minimum Unsigned Word Vector Floating-Point Single-Precision Absolute Value Vector Shift Right Word Vector Floating-Point Single-Precision Negative Absolute Value Vector Floating-Point Single-Precision Negate Vector Compare Greater Than Unsigned Word Vector Floating-Point Single-Precision Multiply Vector Floating-Point Single-Precision Divide Vector Round to Single-Precision Integer toward +Infinity Vector Floating-Point Single-Precision Compare Greater Than Vector Splat Word Vector Floating-Point Single-Precision Compare Less Than Vector Floating-Point Single-Precision Compare Equal Vector Unpack Low Signed Byte Vector Convert Floating-Point Single-Precision from Unsigned Integer Vector Convert Floating-Point Single-Precision from Signed Integer Vector Convert Floating-Point Single-Precision from Unsigned Fraction Vector Convert Floating-Point Single-Precision from Signed Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer Vector Convert Floating-Point Single-Precision to Signed Integer Vector Convert Floating-Point Single-Precision to Unsigned Fraction Vector Convert Floating-Point Single-Precision to Signed Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Vector Floating-Point Single-Precision Test Greater Than Vector Floating-Point Single-Precision Test Less Than Vector Floating-Point Single-Precision Test Equal Floating-Point Single-Precision Add Floating-Point Single-Precision Subtract Floating-Point Single-Precision Absolute Value Vector Shift Right Floating-Point Single-Precision Negative Absolute Value Floating-Point Single-Precision Negate Vector Compare Greater Than Single-Precision Floating-Point Single-Precision Multiply Floating-Point Single-Precision Divide Vector Round to Single-Precision Integer toward -Infinity Floating-Point Single-Precision Compare Greater Than Floating-Point Single-Precision Compare Less Than Floating-Point Single-Precision Compare Equal Vector Unpack Low Signed Halfword
EVX VX EVX VX EVX EVX VC EVX EVX VX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX EVX VC EVX EVX VX EVX EVX EVX VX
1262
Form
Page 583 574 574 574 574 574 574 575 575 575 575 572 573 573 577 577 580 580 576 576 576 577 577 581 581 578 578 578 582 579 579 580 580 580 580 582
Cat1 SP.FD SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD
Mnemonic efscfd efscfui efscfsi efscfuf efscfsf efsctui efsctsi efsctuf efsctsf efsctuiz efsctsiz efststgt efststlt efststeq efdadd efdsub efdcfuid efdcfsid efdabs efdnabs efdneg efdmul efddiv efdctuidz efdctsidz efdcmpgt efdcmplt efdcmpeq efdcfs efdcfui efdcfsi efdcfuf efdcfsf efdctui efdctsi efdctuf
Instruction Floating-Point Single-Precision Convert from DoublePrecision Convert Floating-Point Single-Precision from Unsigned Integer Convert Floating-Point Single-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Single-Precision from Signed Fraction Convert Floating-Point Single-Precision to Unsigned Integer Convert Floating-Point Single-Precision to Signed Integer Convert Floating-Point Single-Precision to Unsigned Fraction Convert Floating-Point Single-Precision to Signed Fraction Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Floating-Point Single-Precision Test Greater Than Floating-Point Single-Precision Test Less Than Floating-Point Single-Precision Test Equal Floating-Point Double-Precision Add Floating-Point Double-Precision Subtract Convert Floating-Point Double-Precision from Unsigned Integer Doubleword Convert Floating-Point Double-Precision from Signed Integer Doubleword Floating-Point Double-Precision Absolute Value Floating-Point Double-Precision Negative Absolute Value Floating-Point Double-Precision Negate Floating-Point Double-Precision Multiply Floating-Point Double-Precision Divide Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero Floating-Point Double-Precision Compare Greater Than Floating-Point Double-Precision Compare Less Than Floating-Point Double-Precision Compare Equal Floating-Point Double-Precision Convert from SinglePrecision Convert Floating-Point Double-Precision from Unsigned Integer Convert Floating-Point Double-Precision from Signed Integer Convert Floating-Point Double-Precision from Unsigned Fraction Convert Floating-Point Double-Precision from Signed Fraction Convert Floating-Point Double-Precision to Unsigned Integer Convert Floating-Point Double-Precision to Signed Integer Convert Floating-Point Double-Precision to Unsigned Fraction
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1263
Form
Page 582 582 582 578 579 579 516 230 516 517 249 517 516 258 516 252 517 236 517 263 518 226 518 518 219 518 519 519 520 520 519 519 521 521 520 520 551 551 552 552 552 552 553 553 553 553 553
Mnemonic efdctsf efdctuiz efdctsiz efdtstgt efdtstlt efdtsteq evlddx vaddsbs evldd evldwx vminsb evldw evldhx vsrab evldh vcmpgtsb[.] evlhhesplatx vmulesb evlhhesplat vcfux evlhhousplatx vspltisb evlhhousplat evlhhossplatx vpkpx evlhhossplat evlwhex evlwhe evlwhoux evlwhou evlwhosx evlwhos evlwwsplatx evlwwsplat evlwhsplatx evlwhsplat evstddx evstdd evstdwx evstdw evstdhx evstdh evstwhex evstwhe evstwhox evstwho evstwwex
Instruction Convert Floating-Point Double-Precision to Signed Fraction Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero Floating-Point Double-Precision Test Greater Than Floating-Point Double-Precision Test Less Than Floating-Point Double-Precision Test Equal Vector Load Double Word into Double Word Indexed Vector Add Signed Byte Saturate Vector Load Double Word into Double Word Vector Load Double into Two Words Indexed Vector Minimum Signed Byte Vector Load Double into Two Words Vector Load Double into Four Halfwords Indexed Vector Shift Right Algebraic Byte Vector Load Double into Four Halfwords Vector Compare Greater Than Signed Byte Vector Load Halfword into Halfwords Even and Splat Indexed Vector Multiply Even Signed Byte Vector Load Halfword into Halfwords Even and Splat Vector Convert From Unsigned Fixed-Point Word Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed Vector Splat Immediate Signed Byte Vector Load Halfword into Halfword Odd Unsigned and Splat Vector Load Halfword into Halfword Odd Signed and Splat Indexed Vector Pack Pixel Vector Load Halfword into Halfword Odd Signed and Splat Vector Load Word into Two Halfwords Even Indexed Vector Load Word into Two Halfwords Even Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) Vector Load Word into Two Halfwords Odd Signed (with sign extension) Vector Load Word into Word and Splat Indexed Vector Load Word into Word and Splat Vector Load Word into Two Halfwords and Splat Indexed Vector Load Word into Two Halfwords and Splat Vector Store Double of Double Indexed Vector Store Double of Double Vector Store Double of Two Words Indexed Vector Store Double of Two Words Vector Store Double of Four Halfwords Indexed Vector Store Double of Four Halfwords Vector Store Word of Two Halfwords from Even Indexed Vector Store Word of Two Halfwords from Even Vector Store Word of Two Halfwords from Odd Indexed Vector Store Word of Two Halfwords from Odd Vector Store Word of Word from Even Indexed
EVX EVX EVX EVX EVX EVX EVX VX EVX EVX VX EVX EVX VX EVX VC EVX VX EVX VX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1264
Form
Page 553 554 554 230 249 258 252 236 263 226 222 230 249 258 252 262 226 265 262 223 234 246 526 254 535 529 525 261 524 537 228 533 532 526 535 529 525 524 537 533 532 234 246 254
Cat1 SP SP SP V V V V V V V V V V V V V V V V V V V SP V SP SP SP V SP SP V SP SP SP SP SP SP SP SP SP SP V V V
Mnemonic evstwwe evstwwox evstwwo vaddshs vminsh vsrah vcmpgtsh[.] vmulesh vcfsx vspltish vupkhpx vaddsws vminsw vsraw vcmpgtsw[.] vctuxs vspltisw vcmpbfp[.] vctsxs vupklpx vsububm vavgub evmhessf vand evmhossf evmheumi evmhesmi vmaxfp evmhesmf evmhoumi vslo evmhosmi evmhosmf evmhessfa evmhossfa evmheumia evmhesmia evmhesmfa evmhoumia evmhosmia evmhosmfa vsubuhm vavguh vandc
Instruction Vector Store Word of Word from Even Vector Store Word of Word from Odd Indexed Vector Store Word of Word from Odd Vector Add Signed Halfword Saturate Vector Minimum Signed Halfword Vector Shift Right Algebraic Halfword Vector Compare Greater Than Signed Halfword Vector Multiply Even Signed Halfword Vector Convert From Signed Fixed-Point Word Vector Splat Immediate Signed Halfword Vector Unpack High Pixel Vector Add Signed Word Saturate Vector Minimum Signed Word Vector Shift Right Algebraic Word Vector Compare Greater Than Signed Word Vector Convert To Unsigned Fixed-Point Word Saturate Vector Splat Immediate Signed Word Vector Compare Bounds Single-Precision Vector Convert To Signed Fixed-Point Word Saturate Vector Unpack Low Pixel Vector Subtract Unsigned Byte Modulo Vector Average Unsigned Byte Vector Multiply Halfwords, Even, Signed, Saturate, Fractional Vector Logical AND Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer Vector Multiply Halfwords, Even, Signed, Modulo, Integer Vector Maximum Single-Precision Vector Multiply Halfwords, Even, Signed, Modulo, Fractional Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer Vector Shift Left by Octet Vector Multiply Halfwords, Odd, Signed, Modulo, Integer Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator Vector Subtract Unsigned Halfword Modulo Vector Average Unsigned Halfword Vector Logical AND with Complement
EVX EVX EVX VX VX VX VC VX VX VX VX VX VX VX VC VX VX VC VX VX VX VX EVX VX EVX EVX EVX VX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX
1265
Form
Page 540 542 261 540 229 539 539 545 546 544 543 540 542 540 539 539 545 546 544 543 234 246 254 511 511 555 554 539 254 514 515 511 510 555 554 530 528 245 527
Cat1 SP SP V SP V SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V V V SP SP SP SP SP V SP SP SP SP SP SP SP SP V SP
Mnemonic evmwhssf evmwlumi vminfp evmwhumi vsro evmwhsmi evmwhsmf evmwssf evmwumi evmwsmi evmwsmf evmwhssfa evmwlumia evmwhumia evmwhsmia evmwhsmfa evmwssfa evmwumia evmwsmia evmwsmfa vsubuwm vavguw vor evaddusiaaw evaddssiaaw evsubfusiaaw evsubfssiaaw evmra vxor evdivws evdivwu evaddumiaaw evaddsmiaaw evsubfumiaaw evsubfsmiaaw evmheusiaaw evmhessiaaw vavgsb evmhessfaaw
Instruction Vector Multiply Word High Signed, Saturate, Fractional Vector Multiply Word Low Unsigned, Modulo, Integer Vector Minimum Single-Precision Vector Multiply Word High Unsigned, Modulo, Integer Vector Shift Right by Octet Vector Multiply Word High Signed, Modulo, Integer Vector Multiply Word High Signed, Modulo, Fractional Vector Multiply Word Signed, Saturate, Fractional Vector Multiply Word Unsigned, Modulo, Integer Vector Multiply Word Signed, Modulo, Integer Vector Multiply Word Signed, Modulo, Fractional Vector Multiply Word High Signed, Saturate, Fractional to Accumulator Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Modulo, Fractional to Accumulator Vector Multiply Word Signed, Saturate, Fractional to Accumulator Vector Multiply Word Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Fractional to Accumulator Vector Subtract Unsigned Word Modulo Vector Average Unsigned Word Vector Logical OR Vector Add Unsigned, Saturate, Integer to Accumulator Word Vector Add Signed, Saturate, Integer to Accumulator Word Vector Subtract Unsigned, Saturate, Integer to Accumulator Word Vector Subtract Signed, Saturate, Integer to Accumulator Word Initialize Accumulator Vector Logical XOR Vector Divide Word Signed Vector Divide Word Unsigned Vector Add Unsigned, Modulo, Integer to Accumulator Word Vector Add Signed, Modulo, Integer to Accumulator Word Vector Subtract Unsigned, Modulo, Integer to Accumulator Word Vector Subtract Signed, Modulo, Integer to Accumulator Word Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words Vector Average Signed Byte Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words
EVX EVX VX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX EVX EVX EVX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX
1266
Form
Page 538 254 537 536 529 525 524 538 534 533 523 523 522 532 531 531 543 541 245 542 541 545 547 544 544 530 233 528 245 527 538
Cat1 SP V SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V SP SP SP SP SP SP SP V SP V SP SP
Mnemonic evmhousiaaw vnor evmhossiaaw evmhossfaaw evmheumiaaw evmhesmiaaw evmhesmfaaw evmhoumiaaw evmhosmiaaw evmhosmfaaw evmhegumiaa evmhegsmiaa evmhegsmfaa evmhogumiaa evmhogsmiaa evmhogsmfaa evmwlusiaaw evmwlssiaaw vavgsh evmwlumiaaw evmwlsmiaaw evmwssfaa evmwumiaa evmwsmiaa evmwsmfaa evmheusianw vsubcuw evmhessianw vavgsw evmhessfanw evmhousianw
Instruction Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words Vector Logical NOR Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words Vector Average Signed Halfword Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words Vector Multiply Word Signed, Saturate, Fractional and Accumulate Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Subtract and Write Carry-Out Unsigned Word Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words Vector Average Signed Word Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words
EVX VX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX EVX EVX EVX EVX EVX EVX EVX VX EVX VX EVX EVX
1267
Form
Page 537 536 529 525 524 534 533 533 523 523 522 532 531 531 543 541 542 541 546 547 544 544 235 269 244 234 269 244 235 243 233 244 233 233 243 67 64 75
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP V V V V V V V V V V V V V B B B
Mnemonic evmhossianw evmhossfanw evmheumianw evmhesmianw evmhesmfanw evmhoumianw evmhosmianw evmhosmfanw evmhegumian evmhegsmian evmhegsmfan evmhogumian evmhogsmian evmhogsmfan evmwlusianw evmwlssianw evmwlumianw evmwlsmianw evmwssfan evmwumian evmwsmian evmwsmfan vsububs mfvscr vsum4ubs vsubuhs mtvscr vsum4shs vsubuws vsum2sws vsubsbs vsum4sbs vsubshs vsubsws vsumsws mulli subfic cmpli
Instruction Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative Vector Subtract Unsigned Byte Saturate Move From Vector Status and Control Register Vector Sum across Quarter Unsigned Byte Saturate Vector Subtract Unsigned Halfword Saturate Move To Vector Status and Control Register Vector Sum across Quarter Signed Halfword Saturate Vector Subtract Unsigned Word Saturate Vector Sum across Half Signed Word Saturate Vector Subtract Signed Byte Saturate Vector Sum across Quarter Signed Byte Saturate Vector Subtract Signed Halfword Saturate Vector Subtract Signed Word Saturate Vector Sum across Signed Word Saturate Multiply Low Immediate Subtract From Immediate Carrying Compare Logical Immediate
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX VX VX VX VX VX VX VX VX VX VX VX VX VX D D D
1268
Form
Page 74 63 63 62 62 36 40, 738, 886 36 39 37 739 39 888 888 887 887 889 39 688 38 1071 38 38 739 39 741 39 741 38 742 742 37 89 87 88 78 79 79 79 78 78 90 90 91 92 91 92 74 76 218 216 64 71 64 67 77 980 102 103
Cat1 B B B B B B B
Instruction Compare Immediate Add Immediate Carrying Add Immediate Carrying and Record Add Immediate Add Immediate Shifted Branch Conditional System Call
D D D D D B SC
SR SR CT
18 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 20 21 23 24 25 26 27 28 29 30 30 30 30 30 30 31 31 31 31 31 31 31 31 31 31 31 31
0 16 18 33 38 39 50 51 102 129 150 193 198 225 257 274 289 402 417 434 449 466 498 528
CT P P P P P P
H H H H H CT SR SR SR
0 2 4 6 8 9 0 4 6 7 8 9 10 11 15 18 19 19
SR SR SR SR SR SR SR SR
SR SR SR SR P
b[l][a] mcrf bclr[l] rfid crnor rfmci rfdi rfi rfci rfgi crandc isync crxor dnh crnand crand hrfid creqv doze crorc nap cror sleep rvwinkle bcctr[l] rlwimi[.] rlwinm[.] rlwnm[.] ori oris xori xoris andi. andis. rldicl[.] rldicr[.] rldic[.] rldimi[.] rldcl[.] rldcr[.] cmp tw lvsl lvebx subfc[o][.] mulhdu[.] addc[o][.] mulhwu[.] isel tlbilx mfcr mfocrf
Branch Move Condition Register Field Branch Conditional to Link Register Return From Interrupt Doubleword Condition Register NOR Return From Machine Check Interrupt Return From Debug Interrupt Return From Interrupt Return From Critical Interrupt Return From Guest Interrupt Condition Register AND with Complement Instruction Synchronize Condition Register XOR Debugger Notify Halt Condition Register NAND Condition Register AND Hypervisor Return From Interrupt Doubleword Condition Register Equivalent Doze Condition Register OR with Complement Nap Condition Register OR Sleep Rip Van Winkle Branch Conditional to Count Register Rotate Left Word Immediate then Mask Insert Rotate Left Word Immediate then AND with Mask Rotate Left Word then AND with Mask OR Immediate OR Immediate Shifted XOR Immediate XOR Immediate Shifted AND Immediate AND Immediate Shifted Rotate Left Doubleword Immediate then Clear Left Rotate Left Doubleword Immediate then Clear Right Rotate Left Doubleword Immediate then Clear Rotate Left Doubleword Immediate then Mask Insert Rotate Left Doubleword then Clear Left Rotate Left Doubleword then Clear Right Compare Trap Word Load Vector for Shift Left Indexed Load Vector Element Byte Indexed Subtract From Carrying Multiply High Doubleword Unsigned Add Carrying Multiply High Word Unsigned Integer Select TLB Invalidate Local Move From Condition Register Move From One Condition Register Field
1269
Form
Page
Cat1
Mnemonic
Instruction Load Word And Reserve Indexed Load Doubleword Indexed Instruction Cache Block Touch Load Word and Zero Indexed Shift Left Word Count Leading Zeros Word Shift Left Doubleword AND Load Doubleword by External Process ID Indexed Load Word by External Process ID Indexed Compare Logical Load Vector for Shift Right Indexed Load Vector Element Halfword Indexed Subtract From Load Byte and Reserve Indexed Load Doubleword with Update Indexed Data Cache Block Store Load Word and Zero with Update Indexed Count Leading Zeros Doubleword AND with Complement Wait Data Cache Block Store by External PID Trap Doubleword Load Vector Element Word Indexed Multiply High Doubleword Add and Generate Sixes Multiply High Word Determine Leftmost Zero Byte Move From Machine State Register Load Doubleword And Reserve Indexed Data Cache Block Flush Load Byte and Zero Indexed Load Byte by External Process ID Indexed Load Vector Indexed Negate Load Halfword and Reserve Indexed Load Byte and Zero with Update Indexed Population Count Bytes NOR Data Cache Block Flush by External PID Write MSR External Enable Data Cache Block Touch for Store and Lock Set Store Vector Element Byte Indexed Subtract From Extended Add Extended Move To Condition Register Fields Move To One Condition Register Field Move To Machine State Register Move To Machine State Register Store Doubleword Indexed Store Word Conditional Indexed Store Word Indexed Parity Word Store Doubleword by External Process ID Indexed Store Word by External Process ID Indexed Write MSR External Enable Immediate Data Cache Block Touch and Lock Set Store Vector Element Halfword Indexed Move To Machine State Register Doubleword Store Doubleword with Update Indexed
X X X X X X X X X X X X X XO X X X X X X X X X X XO XO XO X X X X X X X XO X X X X X X X X XO XO XFX XFX X X X X X X X X X X X X X
SR SR SR SR P P
SR
SR SR P SR H SR P
P SR
SR P P M SR SR P P
P P P M P
689 B lwarx 50 64 ldx 676 E icbt 48 B lwzx 93 B slw[.] 82 B cntlzw[.] 95 64 sld[.] 80 B and[.] 906 E.PD;64 ldepx 906 E.PD lwepx 75 B cmpl 218 V lvsr 213 V lvehx 63 B subf[o][.] 689 B lbarx 50 64 ldux 686 B dcbst 48 B lwzux 85 64 cntlzd[.] 81 B andc[.] 699 WT wait 909 E.PD dcbstep 77 64 td 213 V lvewx 71 64 mulhd[.] 97 BCDA addg6s 67 B mulhw[.] 589 LMV dlmzb[.] 759, B mfmsr 901 694 64 ldarx 686 B dcbf 46 B lbzx 905 E.PD lbepx 214 V lvx 66 B neg[o][.] 690 B lharx 45 B lbzux 83 B popcntb 81 B nor[.] 910 E.PD dcbfep 902 E wrtee 968 ECL dcbtstls 216 V stvebx 65 B subfe[o][.] 65 B adde[o][.] 102 B mtcrf 103 B mtocrf 901 E mtmsr 757 S mtmsr 54 64 stdx 693 B stwcx. 53 B stwx 84 B prtyw 908 E.PD;64 stdepx 908 E.PD stwepx 903 E wrteei 968 ECL dcbtls 216 V stvehx 758 S mtmsrd 54 64 stdux
1270
Form
Page 53 84 217 66 66 1078 796 694 51 907 970 214 65 71 65 67 1078 796 684 51 86 912 901 916 63 889 800 683 46 97 81 905 104 916 798 712 46 97 80 909 901 1086 339 1100 101, 704 49 47 214 802 704 49 47 83 900 969 73 69 792 52 81
Cat1 B 64 V B B E.PC S 64 B E.PD ECL V B 64 B B E.PC S B B 64 E.PD E.DC E.PD B E.HV S B B BCDA B E.PD E.DC E.PD S EC B BCDA B E.PD E.DC E.CD VSX E.PM B 64 B V S S.out 64 B B E.DC ECL 64 B S B B
Mnemonic stwux prtyd stvewx subfze[o][.] addze[o][.] msgsnd mtsr stdcx. stbx stbepx icblc stvx subfme[o][.] mulld[o][.] addme[o][.] mullw[o][.] msgclr mtsrin dcbtst stbux bpermd dcbtstep mfdcrx lvepxl add[o][.] ehpriv tlbiel dcbt lhzx cdtbcd eqv[.] lhepx mfdcrux lvepx tlbie eciwx lhzux cbcdtd xor[.] dcbtep mfdcr dcread lxvdsx mfpmr mfspr lwax lhax lvxl tlbia mftb lwaux lhaux popcntw mtdcrx dcblc divdeu[o][.] divweu[o][.] slbmte sthx orc[.]
Instruction Store Word with Update Indexed Parity Doubleword Store Vector Element Word Indexed Subtract From Zero Extended Add to Zero Extended Message Send Move To Segment Register Store Doubleword Conditional Indexed Store Byte Indexed Store Byte by External Process ID Indexed Instruction Cache Block Lock Clear Store Vector Indexed Subtract From Minus One Extended Multiply Low Doubleword Add to Minus One Extended Multiply Low Word Message Clear Move To Segment Register Indirect Data Cache Block Touch for Store Store Byte with Update Indexed Bit Permute Doubleword Data Cache Block Touch for Store by External PID Move From Device Control Register Indexed Load Vector by External Process ID Indexed LRU Add Embedded Hypervisor Privilege TLB Invalidate Entry Local Data Cache Block Touch Load Halfword and Zero Indexed Convert Declets To Binary Coded Decimal Equivalent Load Halfword by External Process ID Indexed Move From Device Control Register User-mode Indexed Load Vector by External Process ID Indexed TLB Invalidate Entry External Control In Word Indexed Load Halfword and Zero with Update Indexed Convert Binary Coded Decimal to Declets XOR Data Cache Block Touch by External PID Move From Device Control Register Data Cache Read [Alternative Encoding] Load VSR Vector Doubleword & Splat Indexed Move From Performance Monitor Register Move From Special Purpose Register Load Word Algebraic Indexed Load Halfword Algebraic Indexed Load Vector Indexed LRU TLB Invalidate All Move From Time Base Load Word Algebraic with Update Indexed Load Halfword Algebraic with Update Indexed Population Count Word Move To Device Control Register Indexed Data Cache Block Lock Clear Divide Doubleword Extended Unsigned Divide Word Extended Unsigned SLB Move To Entry Store Halfword Indexed OR with Complement
SR SR P 32 P P M SR SR SR SR P 32 P
P P P SR 64 P H SR P P 64 H H SR P P P O O
P M SR SR P SR
1271
Form
Page 907 104 73 69 790 712 52 80 900 1083 72 68 1100 100 965 80 710 1086 969 217 72 68 791 85 82 104 708 56 59 55 128 93 95 708 803, 987, 905 128 708 338 797 59 696 125 914 708 125 709 797 56 60 55 128 709 691 128 709 340 60 692
Mnemonic sthepx mtdcrux divde[o][.] divwe[o][.] slbie ecowx sthux or[.] mtdcr dci divdu[o][.] divwu[o][.] mtpmr mtspr dcbi nand[.] dsn dcread icbtls stvxl divd[o][.] divw[o][.] slbia popcntd cmpb mcrxr lbdx ldbrx lswx lwbrx lfsx srw[.] srd[.] lhdx tlbsync
Instruction Store Halfword by External Process ID Indexed Move To Device Control Register User-mode Indexed Divide Doubleword Extended Divide Word Extended SLB Invalidate Entry External Control Out Word Indexed Store Halfword with Update Indexed OR Move To Device Control Register Data Cache Invalidate Divide Doubleword Unsigned Divide Word Unsigned Move To Performance Monitor Register Move To Special Purpose Register Data Cache Block Invalidate NAND Decorated Storage Notify Data Cache Read Instruction Cache Block Touch and Lock Set Store Vector Indexed LRU Divide Doubleword Divide Word SLB Invalidate All Population Count Doubleword Compare Bytes Move to Condition Register from XER Load Byte with Decoration Indexed Load Doubleword Byte-Reverse Indexed Load String Word Indexed Load Word Byte-Reverse Indexed Load Floating-Point Single Indexed Shift Right Word Shift Right Doubleword Load Halfword with Decoration Indexed TLB Synchronize
P SR SR P SR P P SR SR O O P SR P M SR SR P
SR SR H
X 31 X 31 XX1 31 X 31 X 31 X 31 X 31 X 31 X X X X X X X X X X X X XX1 X X 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
567 579 588 595 597 598 599 607 611 631 643 659 660 661 662 663 675 694 695 707 716 725 726
32 P
lfsux lwdx lxsdx mfsr lswi sync lfdx lfdepx lddx lfdux stbdx mfsrin stdbrx stswx stwbrx stfsx sthdx stbcx. stfsux stwdx stxsdx stswi sthcx.
32 P
Load Floating-Point Single with Update Indexed Load Word with Decoration Indexed Load VSR Scalar Doubleword Indexed Move From Segment Register Load String Word Immediate Synchronize Load Floating-Point Double Indexed Load Floating-Point Double by External Process ID Indexed Load Doubleword with Decoration Indexed Load Floating-Point Double with Update Indexed Store Byte with Decoration Indexed Move From Segment Register Indirect Store Doubleword Byte-Reverse Indexed Store String Word Indexed Store Word Byte-Reverse Indexed Store Floating-Point Single Indexed Store Halfword with Decoration Indexed Store Byte Conditional Indexed Store Floating-Point Single with Update Indexed Store Word with Decoration Indexed Store VSR Scalar Doubleword Indexed Store String Word Immediate Store Halfword Conditional Indexed
1272
Form
Page 129 914 709 683 129 917 339 978, 903 749 55 131 94 96 915 708 917 749 94 96 338 984 793 749 698 698 126 749 126 341 982, 904 793 750 55 131 82 915
Cat1 FP E.PD DS E FP E.PD VSX E S B FP.out B 64 E.PD DS E.PD S B 64 VSX E.TWC S S E S FP S FP VSX E S S B FP.out B E.PD DS E S B E.CI VSX E S S B FP 64 E.PD E.CD S B E.PD B
Mnemonic stfdx stfdepx stddx dcba stfdux stvepxl lxvw4x tlbivax lwzcix lhbrx lfdpx sraw[.] srad[.] evlddepx lfddx stvepx lhzcix srawi[.] sradi[.] lxvd2x tlbsrx. slbmfev lbzcix mbar eieio lfiwax ldcix lfiwzx stxvw4x tlbsx slbmfee stwcix sthbrx stfdpx extsh[.] evstddepx stfddx tlbre sthcix extsb[.] ici stxvd2x tlbwe slbfee. stbcix icbi stfiwx extsw[.] icbiep icread stdcix dcbz dcbzep lwz
Instruction Store Floating-Point Double Indexed Store Floating-Point Double by External Process ID Indexed Store Doubleword with Decoration Indexed Data Cache Block Allocate Store Floating-Point Double with Update Indexed Store Vector by External Process ID Indexed LRU Load VSR Vector Word*4 Indexed TLB Invalidate Virtual Address Indexed Load Word and Zero Caching Inhibited Indexed Load Halfword Byte-Reverse Indexed Load Floating-Point Double Pair Indexed Shift Right Algebraic Word Shift Right Algebraic Doubleword Vector Load Doubleword into Doubleword by External Process ID Indexed Load Floating Doubleword with Decoration Indexed Store Vector by External Process ID Indexed Load Halfword and Zero Caching Inhibited Indexed Shift Right Algebraic Word Immediate Shift Right Algebraic Doubleword Immediate Load VSR Vector Doubleword*2 Indexed TLB Search and Reserve SLB Move From Entry VSID Load Byte and Zero Caching Inhibited Indexed Memory Barrier Enforce In-order Execution of I/O Load Floating-Point as Integer Word Algebraic Indexed Load Doubleword Caching Inhibited Indexed Load Floating-Point as Integer Word and Zero Indexed Store VSR Vector Word*4 Indexed TLB Search Indexed SLB Move From Entry ESID Store Word Caching Inhibited Indexed Store Halfword Byte-Reverse Indexed Store Floating-Point Double Pair Indexed Extend Sign Halfword Vector Store Doubleword into Doubleword by External Process ID Indexed Store Floating Doubleword with Decoration Indexed TLB Read Entry Store Halfword Caching Inhibited Indexed Extend Sign Byte Instruction Cache Invalidate Store VSR Vector Doubleword*2 Indexed TLB Write Entry SLB Find Entry ESID Store Byte Caching Inhibited Indexed Instruction Cache Block Invalidate Store Floating-Point as Integer Word Indexed Extend Sign Word Instruction Cache Block Invalidate by External PID Instruction Cache Read Store Doubleword Caching Inhibited Indexed Data Cache Block set to Zero Data Cache Block set to Zero by External PID Load Word and Zero
X X
P P H SR SR P P H SR SR P P H
H P P H SR P
709 P 985, 904 31 949 H 750 31 954 SR 82 31 966 P 1083 31 972 340 31 978 P 987, 905 31 979 SR P 794 31 981 H 750 31 982 676 31 983 130 31 986 SR 85 31 991 P 913 31 998 P 1087 31 1013 H 750 31 1014 686 31 1023 P 913 32 48
1273
Form
Page 48 45 45 53 53 51 51 46 46 47 47 52 52 57 57 128 128 125 125 128 128 129 129 751 131 50 50 49 173 184 134 133 133 135 135 134 136 138 138 139 139 175 186 200 183 200 189 179 181 180 180 191 193 195 197 198 173 176 178 182 194
Cat1 B B B B B B B B B B B B B B B FP FP FP FP FP FP FP FP LSQ FP.out 64 64 64 DFP DFP FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R].in FP[R] FP[R] FP[R] FP[R] DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP
Mnemonic lwzu lbz lbzu stw stwu stb stbu lhz lhzu lha lhau sth sthu lmw stmw lfs lfsu lfd lfdu stfs stfsu stfd stfdu lq lfdp ld ldu lwa dadd[.] dqua[.] fdivs[.] fsubs[.] fadds[.] fsqrts[.] fres[.] fmuls[.] frsqrtes[.] fmsubs[.] fmadds[.] fnmsubs[.] fnmadds[.] dmul[.] drrnd[.] dscli[.] dquai[.] dscri[.] drintx[.] dcmpo dtstex dtstdc dtstdg drintn[.] dctdp[.] dctfix[.] ddedpd[.] dxex[.] dsub[.] ddiv[.] dcmpu dtstsf drsp[.]
Instruction Load Word and Zero with Update Load Byte and Zero Load Byte and Zero with Update Store Word Store Word with Update Store Byte Store Byte with Update Load Halfword and Zero Load Halfword and Zero with Update Load Halfword Algebraic Load Halfword Algebraic with Update Store Halfword Store Halfword with Update Load Multiple Word Store Multiple Word Load Floating-Point Single Load Floating-Point Single with Update Load Floating-Point Double Load Floating-Point Double with Update Store Floating-Point Single Store Floating-Point Single with Update Store Floating-Point Double Store Floating-Point Double with Update Load Quadword Load Floating-Point Double Pair Load Doubleword Load Doubleword with Update Load Word Algebraic DFP Add DFP Quantize Floating Divide Single Floating Subtract Single Floating Add Single Floating Square Root Single Floating Reciprocal Estimate Single Floating Multiply Single Floating Reciprocal Square Root Estimate Single Floating Multiply-Subtract Single Floating Multiply-Add Single Floating Negative Multiply-Subtract Single Floating Negative Multiply-Add Single DFP Multiply DFP Reround DFP Shift Significand Left Immediate DFP Quantize Immediate DFP Shift Significand Right Immediate DFP Round To FP Integer With Inexact DFP Compare Ordered DFP Test Exponent DFP Test Data Class DFP Test Data Group DFP Round To FP Integer Without Inexact DFP Convert To DFP Long DFP Convert To Fixed DFP Decode DPD To BCD DFP Extract Biased Exponent DFP Subtract DFP Divide DFP Compare Unordered DFP Test Significance DFP Round To DFP Short
P 0 0 1 2 2 3 18 20 21 22 24 25 26 28 29 30 31 34 35 66 67 98 99 130 162 194 226 227 258 290 322 354 514 546 642 674 770
1274
Form
Page 195 197 145 198 146 501 500 500 499 342 365 349 359 386 391 392 393 365 347 355 389 390 375 372 499 388 396 387 363 372 388 395 402 437 405 427 482 486 488 491 440 409 423 484
Cat1 DFP DFP FP DFP FP VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic dcffix[.] denbcd[.] fcfids[.] diex[.] fcfidus[.] xxsldwi xxsel xxpermdi xxmrghw xsadddp xsmaddadp xscmpudp xscvdpuxws xsrdpi xsrsqrtedp xssqrtdp xssubdp xsmaddmdp xscmpodp xscvdpsxws xsrdpiz xsredp xsmuldp xsmsubadp xxmrglw xsrdpip xstsqrtdp xsrdpic xsdivdp xsmsubmdp xsrdpim xstdivdp xvaddsp xvmaddasp xvcmpeqsp xvcvspuxws xvrspi xvrsqrtesp xvsqrtsp xvsubsp xvmaddmsp xvcmpgtsp xvcvspsxws xvrspiz
Instruction DFP Convert From Fixed DFP Encode BCD To DPD Floating Convert From Integer Doubleword Single DFP Insert Biased Exponent Floating Convert From Integer Doubleword Unsigned Single VSX Shift Left Double by Word Immediate VSX Select VSX Permute Doubleword Immediate VSX Merge High Word VSX Scalar Add Double-Precision VSX Scalar Multiply-Add Type-A Double-Precision VSX Scalar Compare Unordered Double-Precision VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Scalar Round to Double-Precision Integer VSX Scalar Reciprocal Square Root Estimate DoublePrecision VSX Scalar Square Root Double-Precision VSX Scalar Subtract Double-Precision VSX Scalar Multiply-Add Type-M Double-Precision VSX Scalar Compare Ordered Double-Precision VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Scalar Round to Double-Precision Integer toward Zero VSX Scalar Reciprocal Estimate Double-Precision VSX Scalar Multiply Double-Precision VSX Scalar Multiply-Subtract Type-A Double-Precision VSX Merge Low Word VSX Scalar Round to Double-Precision Integer toward +Infinity VSX Scalar Test for software Square Root DoublePrecision VSX Scalar Round to Double-Precision Integer using Current rounding mode VSX Scalar Divide Double-Precision VSX Scalar Multiply-Subtract Type-M Double-Precision VSX Scalar Round to Double-Precision Integer toward Infinity VSX Scalar Test for software Divide Double-Precision VSX Vector Add Single-Precision VSX Vector Multiply-Add Type-A Single-Precision VSX Vector Compare Equal To Single-Precision VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word Saturate VSX Vector Round to Single-Precision Integer VSX Vector Reciprocal Square Root Estimate SinglePrecision VSX Vector Square Root Single-Precision VSX Vector Subtract Single-Precision VSX Vector Multiply-Add Type-M Single-Precision VSX Vector Compare Greater Than Single-Precision VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Vector Round to Single-Precision Integer toward Zero
X X X X X XX3 XX4 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX2 XX3 XX3 XX2 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX2 XX2
1275
Form
Page 481 459 451 501 407 432 483 495 482 435 454 430 483 494 398 437 404 418 477 485 487 489 440 408 414 479 480 457 451 406 432 479 495 478 433 454 430 478 493 496
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic xvresp xvmulsp xvmsubasp xxspltw xvcmpgesp xvcvuxwsp xvrspip xvtsqrtsp xvrspic xvdivsp xvmsubmsp xvcvsxwsp xvrspim xvtdivsp xvadddp xvmaddadp xvcmpeqdp xvcvdpuxws xvrdpi xvrsqrtedp xvsqrtdp xvsubdp xvmaddmdp xvcmpgtdp xvcvdpsxws xvrdpiz xvredp xvmuldp xvmsubadp xvcmpgedp xvcvuxwdp xvrdpip xvtsqrtdp xvrdpic xvdivdp xvmsubmdp xvcvsxwdp xvrdpim xvtdivdp xxland
Instruction VSX Vector Reciprocal Estimate Single-Precision VSX Vector Multiply Single-Precision VSX Vector Multiply-Subtract Type-A Single-Precision VSX Splat Word VSX Vector Compare Greater Than or Equal To SinglePrecision VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format VSX Vector Round to Single-Precision Integer toward +Infinity VSX Vector Test for software Square Root SinglePrecision VSX Vector Round to Single-Precision Integer using Current rounding mode VSX Vector Divide Single-Precision VSX Vector Multiply-Subtract Type-M Single-Precision VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format VSX Vector Round to Single-Precision Integer toward Infinity VSX Vector Test for software Divide Single-Precision VSX Vector Add Double-Precision VSX Vector Multiply-Add Type-A Double-Precision VSX Vector Compare Equal To Double-Precision VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Vector Round to Double-Precision Integer VSX Vector Reciprocal Square Root Estimate DoublePrecision VSX Vector Square Root Double-Precision VSX Vector Subtract Double-Precision VSX Vector Multiply-Add Type-M Double-Precision VSX Vector Compare Greater Than Double-Precision VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word Saturate VSX Vector Round to Double-Precision Integer toward Zero VSX Vector Reciprocal Estimate Double-Precision VSX Vector Multiply Double-Precision VSX Vector Multiply-Subtract Type-A Double-Precision VSX Vector Compare Greater Than or Equal To Double-Precision VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format VSX Vector Round to Double-Precision Integer toward +Infinity VSX Vector Test for software Square Root DoublePrecision VSX Vector Round to Double-Precision Integer using Current rounding mode VSX Vector Divide Double-Precision VSX Vector Multiply-Subtract Type-M Double-Precision VSX Vector Convert Signed Fixed-Point Word to Double-Precision format VSX Vector Round to Double-Precision Integer toward Infinity VSX Vector Test for software Divide Double-Precision VSX Logical AND
XX2 XX3 XX3 XX2 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX2 XX2 XX3 XX3
1276
Form
Page 352 496 497 498 368 378 497 357 361 370 378 353 341 351 383 362 377 383 361 377 445 463 405 425 411 449 468 409 421 397 410 471 407 431 461
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic xscvdpsp xxlandc xxlor xxlxor xsmaxdp xsnmaddadp xxlnor xscvdpuxds xscvspdp xsmindp xsnmaddmdp xscvdpsxds xsabsdp xscpsgndp xsnmsubadp xscvuxddp xsnabsdp xsnmsubmdp xscvsxddp xsnegdp xvmaxsp xvnmaddasp xvcmpeqsp. xvcvspuxds xvcvdpsp xvminsp xvnmaddmsp xvcmpgtsp. xvcvspsxds xvabssp xvcpsgnsp xvnmsubasp xvcmpgesp. xvcvuxdsp xvnabssp
Instruction VSX Scalar Convert Double-Precision to SinglePrecision VSX Logical AND with Complement VSX Logical OR VSX Logical XOR VSX Scalar Maximum Double-Precision VSX Scalar Negative Multiply-Add Type-A DoublePrecision VSX Logical NOR VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Scalar Convert Single-Precision to DoublePrecision format VSX Scalar Minimum Double-Precision VSX Scalar Negative Multiply-Add Type-M DoublePrecision VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Scalar Absolute Value Double-Precision VSX Scalar Copy Sign Double-Precision VSX Scalar Negative Multiply-Subtract Type-A DoublePrecision VSX Scalar Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Scalar Negative Absolute Value Double-Precision VSX Scalar Negative Multiply-Subtract Type-M DoublePrecision VSX Scalar Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Scalar Negate Double-Precision VSX Vector Maximum Single-Precision VSX Vector Negative Multiply-Add Type-A SinglePrecision VSX Vector Compare Equal To Single-Precision & Record VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector round and Convert Double-Precision to Single-Precision format VSX Vector Minimum Single-Precision VSX Vector Negative Multiply-Add Type-M SinglePrecision VSX Vector Compare Greater Than Single-Precision & Record VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Vector Absolute Value Single-Precision VSX Vector Copy Sign Single-Precision VSX Vector Negative Multiply-Subtract Type-A SinglePrecision VSX Vector Compare Greater Than or Equal To SinglePrecision & Record VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format VSX Vector Negative Absolute Value Single-Precision
XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX2 XX2 XX3 XX3 XX2 XX2 XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2
1277
Form
Page 474 429 462 443 463 404 416 420 447 468 408 412 397 410 471 406 431 461 474 429 462 131 54 54 751 148 173 184 132 140 142 143 134 133 133 135 149 135 134 136 138 138 139 139
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX FP.out 64 64 LSQ FP DFP DFP FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R].in FP[R] FP[R] FP[R] FP[R]
Mnemonic xvnmsubmsp xvcvsxdsp xvnegsp xvmaxdp xvnmaddadp xvcmpeqdp. xvcvdpuxds xvcvspdp xvmindp xvnmaddmdp xvcmpgtdp. xvcvdpsxds xvabsdp xvcpsgndp xvnmsubadp xvcmpgedp. xvcvuxddp xvnabsdp xvnmsubmdp xvcvsxddp xvnegdp stfdp std stdu stq fcmpu daddq[.] dquaq[.] fcpsgn[.] frsp[.] fctiw[.] fctiwz[.] fdiv[.] fsub[.] fadd[.] fsqrt[.] fsel[.] fre[.] fmul[.] frsqrte[.] fmsub[.] fmadd[.] fnmsub[.] fnmadd[.]
Instruction VSX Vector Negative Multiply-Subtract Type-M SinglePrecision VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format VSX Vector Negate Single-Precision VSX Vector Maximum Double-Precision VSX Vector Negative Multiply-Add Type-A DoublePrecision VSX Vector Compare Equal To Double-Precision & Record VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector Convert Single-Precision to DoublePrecision VSX Vector Minimum Double-Precision VSX Vector Negative Multiply-Add Type-M DoublePrecision VSX Vector Compare Greater Than Double-Precision & Record VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword Saturate VSX Vector Absolute Value Double-Precision VSX Vector Copy Sign Double-Precision VSX Vector Negative Multiply-Subtract Type-A DoublePrecision VSX Vector Compare Greater Than or Equal To Double-Precision & Record VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Vector Negative Absolute Value Double-Precision VSX Vector Negative Multiply-Subtract Type-M DoublePrecision VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Vector Negate Double-Precision Store Floating-Point Double Pair Store Doubleword Store Doubleword with Update Store Quadword Floating Compare Unordered DFP Add Quad DFP Quantize Quad Floating Copy Sign Floating Round to Single-Precision Floating Convert To Integer Word Floating Convert To Integer Word with round toward Zero Floating Divide Floating Subtract Floating Add Floating Square Root Floating Select Floating Reciprocal Estimate Floating Multiply Floating Reciprocal Square Root Estimate Floating Multiply-Subtract Floating Multiply-Add Floating Negative Multiply-Subtract Floating Negative Multiply-Add
XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX2 XX2 XX3 XX2 XX2 DS DS DS DS X X Z23 X X X X A A A A A A A A A A A A
60 1008 60 1010 61 62 0 62 1 62 2 63 0 63 2 63 3 63 8 63 12 63 14 63 15 63 63 63 63 63 63 63 63 63 63 63 63 18 20 21 22 23 24 25 26 28 29 30 31
1278
Form
Page 148 175 186 152 132 150 200 183 152 132 200 189 137 179 151 132 143 144 137 181 180 180 191 193 132 195 197 198 147 147 147 147 173 176 150 179 182 151 194 195 140 141 197 144 198 141 142 145
Cat1 FP DFP DFP FP[R] FP[R] FP DFP DFP FP[R] FP[R] DFP DFP FP DFP FP[R] FP[R] FP FP FP DFP DFP DFP DFP DFP FP[R] DFP DFP DFP FP[R].in FP[R].in FP[R].in FP[R].in DFP DFP FP[R] DFP DFP FP[R] DFP DFP FP[R] FP[R] DFP FP[R] DFP FP FP FP
Mnemonic fcmpo dmulq[.] drrndq[.] mtfsb1[.] fneg[.] mcrfs dscliq[.] dquaiq[.] mtfsb0[.] fmr[.] dscriq[.] drintxq[.] ftdiv dcmpoq mtfsfi[.] fnabs[.] fctiwu[.] fctiwuz[.] ftsqrt dtstexq dtstdcq dtstdgq drintnq[.] dctqpq[.] fabs[.] dctfixq[.] ddedpdq[.] dxexq[.] frin[.] friz[.] frip[.] frim[.] dsubq[.] ddivq[.] mffs[.] dcmpuq dtstsfq mtfsf[.] drdpq[.] dcffixq[.] fctid[.] fctidz[.] denbcdq[.] fcfid[.] diexq[.] fctidu[.] fctiduz[.] fcfidu[.]
Instruction Floating Compare Ordered DFP Multiply Quad DFP Reround Quad Move To FPSCR Bit 1 Floating Negate Move to Condition Register from FPSCR DFP Shift Significand Left Immediate Quad DFP Quantize Immediate Quad Move To FPSCR Bit 0 Floating Move Register DFP Shift Significand Right Immediate Quad DFP Round To FP Integer With Inexact Quad Floating Test for software Divide DFP Compare Ordered Quad Move To FPSCR Field Immediate Floating Negative Absolute Value Floating Convert To Integer Word Unsigned Floating Convert To Integer Word Unsigned with round toward Zero Floating Test for software Square Root DFP Test Exponent Quad DFP Test Data Class Quad DFP Test Data Group Quad DFP Round To FP Integer Without Inexact Quad DFP Convert To DFP Extended Floating Absolute Value DFP Convert To Fixed Quad DFP Decode DPD To BCD Quad DFP Extract Biased Exponent Quad Floating Round to Integer Nearest Floating Round to Integer Toward Zero Floating Round to Integer Plus Floating Round to Integer Minus DFP Subtract Quad DFP Divide Quad Move From FPSCR DFP Compare Unordered Quad DFP Test Significance Quad Move To FPSCR Fields DFP Round To DFP Long DFP Convert From Fixed Quad Floating Convert To Integer Doubleword Floating Convert To Integer Doubleword with round toward Zero DFP Encode BCD To DPD Quad Floating Convert From Integer Doubleword DFP Insert Biased Exponent Quad Floating Convert To Integer Doubleword Unsigned Floating Convert To Integer Doubleword Unsigned with round toward Zero Floating Convert From Integer Doubleword Unsigned
See the key to the mode dependency and privilege columns on page 1302 and the key to the category column in Section 1.3.5 of Book I.
1279
1280
Page
Mnemonic add[o][.] addc[o][.] adde[o][.] addg6s addi addic addic. addis addme[o][.] addze[o][.] and[.] andc[.] andi. andis. b[l][a] bc[l][a] bcctr[l] bclr[l] bpermd brinc cbcdtd cdtbcd cmp cmpb cmpi cmpl cmpli cntlzd[.] cntlzw[.] crand crandc creqv crnand crnor cror crorc crxor dadd[.] daddq[.] dcba dcbf dcbfep dcbi dcblc dcbst dcbstep
Instruction Add Add Carrying Add Extended Add and Generate Sixes Add Immediate Add Immediate Carrying Add Immediate Carrying and Record Add Immediate Shifted Add to Minus One Extended Add to Zero Extended AND AND with Complement AND Immediate AND Immediate Shifted Branch Branch Conditional Branch Conditional to Count Register Branch Conditional to Link Register Bit Permute Doubleword Bit Reversed Increment Convert Binary Coded Decimal to Declets Convert Declets To Binary Coded Decimal Compare Compare Bytes Compare Immediate Compare Logical Compare Logical Immediate Count Leading Zeros Doubleword Count Leading Zeros Word Condition Register AND Condition Register AND with Complement Condition Register Equivalent Condition Register NAND Condition Register NOR Condition Register OR Condition Register OR with Complement Condition Register XOR DFP Add DFP Add Quad Data Cache Block Allocate Data Cache Block Flush Data Cache Block Flush by External PID Data Cache Block Invalidate Data Cache Block Lock Clear Data Cache Block Store Data Cache Block Store by External PID
XO 31 XO 31 XO 31 XO 31 D 14 D 12 D 13 D 15 XO 31 XO 31 X 31 X 31 D 28 D 29 I 18 B 16 XL 19 XL 19 X 31 EVX 4 X 31 X 31 X 31 X 31 D 11 X 31 D 10 X 31 X 31 XL 19 XL 19 XL 19 XL 19 XL 19 XL 19 XL 19 XL 19 X 59 X 63 X 31 X 31 X 31 X 31 X 31 X 31 X 31
SR SR SR SR SR
234 202 28 60
SR SR SR SR SR SR CT CT CT
528 16 252 527 314 282 0 508 32 58 26 257 129 289 225 33 449 417 193 2 2 758 86 127 470 390 54 63
SR SR
63 64 65 H 97 62 63 63 62 65 66 80 81 78 78 36 36 37 37 86 510 H 97 H 97 74 82 74 75 75 85 82 38 39 39 38 39 38 39 38 173 173 683 686 P 910 P 965 M 969 686 P 909
1281
Form
Page 683 909 968 684 912 968 686 913 195 195 1083 179 179 178 179 1086 1086 193 195 195 193 197 197 176 176 197 197 198 198 72 73 73 72 68 69 69 68 589 175 175 1071 741 184 183 183 184 194 191 191 189 189 186 186 194 200 200 200 200 710 173 173
Cat1 B E.PD ECL B E.PD ECL B E.PD DFP DFP E.CI DFP DFP DFP DFP E.CD E.CD DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP 64 64 64 64 B B B B LMV DFP DFP E.ED S DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP DS DFP DFP
Mnemonic dcbt dcbtep dcbtls dcbtst dcbtstep dcbtstls dcbz dcbzep dcffix[.] dcffixq[.] dci dcmpo dcmpoq dcmpu dcmpuq dcread dcread dctdp[.] dctfix[.] dctfixq[.] dctqpq[.] ddedpd[.] ddedpdq[.] ddiv[.] ddivq[.] denbcd[.] denbcdq[.] diex[.] diexq[.] divd[o][.] divde[o][.] divdeu[o][.] divdu[o][.] divw[o][.] divwe[o][.] divweu[o][.] divwu[o][.] dlmzb[.] dmul[.] dmulq[.] dnh doze dqua[.] dquai[.] dquaiq[.] dquaq[.] drdpq[.] drintn[.] drintnq[.] drintx[.] drintxq[.] drrnd[.] drrndq[.] drsp[.] dscli[.] dscliq[.] dscri[.] dscriq[.] dsn dsub[.] dsubq[.]
Instruction Data Cache Block Touch Data Cache Block Touch by External PID Data Cache Block Touch and Lock Set Data Cache Block Touch for Store Data Cache Block Touch for Store by External PID Data Cache Block Touch for Store and Lock Set Data Cache Block set to Zero Data Cache Block set to Zero by External PID DFP Convert From Fixed DFP Convert From Fixed Quad Data Cache Invalidate DFP Compare Ordered DFP Compare Ordered Quad DFP Compare Unordered DFP Compare Unordered Quad Data Cache Read [Alternative Encoding] Data Cache Read DFP Convert To DFP Long DFP Convert To Fixed DFP Convert To Fixed Quad DFP Convert To DFP Extended DFP Decode DPD To BCD DFP Decode DPD To BCD Quad DFP Divide DFP Divide Quad DFP Encode BCD To DPD DFP Encode BCD To DPD Quad DFP Insert Biased Exponent DFP Insert Biased Exponent Quad Divide Doubleword Divide Doubleword Extended Divide Doubleword Extended Unsigned Divide Doubleword Unsigned Divide Word Divide Word Extended Divide Word Extended Unsigned Divide Word Unsigned Determine Leftmost Zero Byte DFP Multiply DFP Multiply Quad Debugger Notify Halt Doze DFP Quantize DFP Quantize Immediate DFP Quantize Immediate Quad DFP Quantize Quad DFP Round To DFP Long DFP Round To FP Integer Without Inexact DFP Round To FP Integer Without Inexact Quad DFP Round To FP Integer With Inexact DFP Round To FP Integer With Inexact Quad DFP Reround DFP Reround Quad DFP Round To DFP Short DFP Shift Significand Left Immediate DFP Shift Significand Left Immediate Quad DFP Shift Significand Right Immediate DFP Shift Significand Right Immediate Quad Decorated Storage Notify DFP Subtract DFP Subtract Quad
X 31 278 X 31 319 P X 31 166 M X 31 246 X 31 255 P X 31 134 M X 31 1014 X 31 1023 P X 59 802 X 63 802 X 31 454 P X 59 130 X 63 130 X 59 642 X 63 642 X 31 326 P X 31 486 P X 59 258 X 59 290 X 63 290 X 63 258 X 59 322 X 63 322 X 59 546 X 63 546 X 59 834 X 63 834 X 59 866 X 63 866 XO 31 489 SR XO 31 425 SR XO 31 393 SR XO 31 457 SR XO 31 491 SR XO 31 427 SR XO 31 395 SR XO 31 459 SR X 31 78 X 59 34 X 63 34 XFX 19 198 XL 19 402 H Z23 59 3 Z23 59 67 Z23 63 67 Z23 63 3 X 63 770 Z23 59 227 Z23 63 227 Z23 59 99 Z23 63 99 Z23 59 35 Z23 63 35 X 59 770 Z22 59 66 Z23 63 66 Z22 59 98 Z22 63 98 X 31 483 X 59 514 X 63 514
1282
Form
Page 180 180 180 180 181 181 182 182 198 198 712 712 576 577 582 580 579 580 580 579 580 578 578 578 582 580 581 582 582 580 581 582 577 577 576 576 577 579 578 579 569 570
Cat1 DFP DFP DFP DFP DFP DFP DFP DFP DFP DFP EC EC SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FD SP.FS SP.FS
Mnemonic dtstdc dtstdcq dtstdg dtstdgq dtstex dtstexq dtstsf dtstsfq dxex[.] dxexq[.] eciwx ecowx efdabs efdadd efdcfs efdcfsf efdcfsi efdcfsid efdcfuf efdcfui efdcfuid efdcmpeq efdcmpgt efdcmplt efdctsf efdctsi efdctsidz efdctsiz efdctuf efdctui efdctuidz efdctuiz efddiv efdmul efdnabs efdneg efdsub efdtsteq efdtstgt efdtstlt efsabs efsadd
Instruction DFP Test Data Class DFP Test Data Class Quad DFP Test Data Group DFP Test Data Group Quad DFP Test Exponent DFP Test Exponent Quad DFP Test Significance DFP Test Significance Quad DFP Extract Biased Exponent DFP Extract Biased Exponent Quad External Control In Word Indexed External Control Out Word Indexed Floating-Point Double-Precision Absolute Value Floating-Point Double-Precision Add Floating-Point Double-Precision Convert from SinglePrecision Convert Floating-Point Double-Precision from Signed Fraction Convert Floating-Point Double-Precision from Signed Integer Convert Floating-Point Double-Precision from Signed Integer Doubleword Convert Floating-Point Double-Precision from Unsigned Fraction Convert Floating-Point Double-Precision from Unsigned Integer Convert Floating-Point Double-Precision from Unsigned Integer Doubleword Floating-Point Double-Precision Compare Equal Floating-Point Double-Precision Compare Greater Than Floating-Point Double-Precision Compare Less Than Convert Floating-Point Double-Precision to Signed Fraction Convert Floating-Point Double-Precision to Signed Integer Convert Floating-Point Double-Precision to Signed Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Signed Integer with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Fraction Convert Floating-Point Double-Precision to Unsigned Integer Convert Floating-Point Double-Precision to Unsigned Integer Doubleword with Round toward Zero Convert Floating-Point Double-Precision to Unsigned Integer with Round toward Zero Floating-Point Double-Precision Divide Floating-Point Double-Precision Multiply Floating-Point Double-Precision Negative Absolute Value Floating-Point Double-Precision Negate Floating-Point Double-Precision Subtract Floating-Point Double-Precision Test Equal Floating-Point Double-Precision Test Greater Than Floating-Point Double-Precision Test Less Than Floating-Point Single-Precision Absolute Value Floating-Point Single-Precision Add
Z23 Z22 Z23 Z22 X X X X X X X X EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1283
Form
Page 583 574 574 574 574 572 571 571 575 574 575 575 574 575 570 570 569 569 570 573 572 573 889 698 81 510 510 510 511 511 511 511 512 512 512 512 513 513 513 514 514 514 515 515
Cat1 SP.FD SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS SP.FS E.HV S B SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic efscfd efscfsf efscfsi efscfuf efscfui efscmpeq efscmpgt efscmplt efsctsf efsctsi efsctsiz efsctuf efsctui efsctuiz efsdiv efsmul efsnabs efsneg efssub efststeq efststgt efststlt ehpriv eieio eqv[.] evabs evaddiw evaddsmiaaw evaddssiaaw evaddumiaaw evaddusiaaw evaddw evand evandc evcmpeq evcmpgts evcmpgtu evcmplts evcmpltu evcntlsw evcntlzw evdivws evdivwu eveqv
Instruction Floating-Point Single-Precision Convert from DoublePrecision Convert Floating-Point Single-Precision from Signed Fraction Convert Floating-Point Single-Precision from Signed Integer Convert Floating-Point Single-Precision from Unsigned Fraction Convert Floating-Point Single-Precision from Unsigned Integer Floating-Point Single-Precision Compare Equal Floating-Point Single-Precision Compare Greater Than Floating-Point Single-Precision Compare Less Than Convert Floating-Point Single-Precision to Signed Fraction Convert Floating-Point Single-Precision to Signed Integer Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Convert Floating-Point Single-Precision to Unsigned Fraction Convert Floating-Point Single-Precision to Unsigned Integer Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Floating-Point Single-Precision Divide Floating-Point Single-Precision Multiply Floating-Point Single-Precision Negative Absolute Value Floating-Point Single-Precision Negate Floating-Point Single-Precision Subtract Floating-Point Single-Precision Test Equal Floating-Point Single-Precision Test Greater Than Floating-Point Single-Precision Test Less Than Embedded Hypervisor Privilege Enforce In-order Execution of I/O Equivalent Vector Absolute Value Vector Add Immediate Word Vector Add Signed, Modulo, Integer to Accumulator Word Vector Add Signed, Saturate, Integer to Accumulator Word Vector Add Unsigned, Modulo, Integer to Accumulator Word Vector Add Unsigned, Saturate, Integer to Accumulator Word Vector Add Word Vector AND Vector AND with Complement Vector Compare Equal Vector Compare Greater Than Signed Vector Compare Greater Than Unsigned Vector Compare Less Than Signed Vector Compare Less Than Unsigned Vector Count Leading Signed Bits Word Vector Count Leading Zeros Word Vector Divide Word Signed Vector Divide Word Unsigned Vector Equivalent
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX XL X X EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
4 710 4 705 4 734 4 732 4 733 31 270 31 854 31 284 SR 4 520 4 514 4 1225 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 1217 1224 1216 512 529 530 564 561 560 563 562 526 525 1222 1223 537
1284
Form
Page 515 515 561 562 566 566 566 566 564 563 563 568 567 567 568 567 567 562 562 561 561 562 565 564 565 516 915 516 516 516 517 517 517 517 518 518 518 518 519 519
Cat1 SP SP SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP.FV SP E.PD SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evextsb evextsh evfsabs evfsadd evfscfsf evfscfsi evfscfuf evfscfui evfscmpeq evfscmpgt evfscmplt evfsctsf evfsctsi evfsctsiz evfsctuf evfsctui evfsctuiz evfsdiv evfsmul evfsnabs evfsneg evfssub evfststeq evfststgt evfststlt evldd evlddepx evlddx evldh evldhx evldw evldwx evlhhesplat evlhhesplatx evlhhossplat evlhhossplatx evlhhousplat evlhhousplatx evlwhe evlwhex
Instruction Vector Extend Sign Byte Vector Extend Sign Halfword Vector Floating-Point Single-Precision Absolute Value Vector Floating-Point Single-Precision Add Vector Convert Floating-Point Single-Precision from Signed Fraction Vector Convert Floating-Point Single-Precision from Signed Integer Vector Convert Floating-Point Single-Precision from Unsigned Fraction Vector Convert Floating-Point Single-Precision from Unsigned Integer Vector Floating-Point Single-Precision Compare Equal Vector Floating-Point Single-Precision Compare Greater Than Vector Floating-Point Single-Precision Compare Less Than Vector Convert Floating-Point Single-Precision to Signed Fraction Vector Convert Floating-Point Single-Precision to Signed Integer Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero Vector Convert Floating-Point Single-Precision to Unsigned Fraction Vector Convert Floating-Point Single-Precision to Unsigned Integer Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero Vector Floating-Point Single-Precision Divide Vector Floating-Point Single-Precision Multiply Vector Floating-Point Single-Precision Negative Absolute Value Vector Floating-Point Single-Precision Negate Vector Floating-Point Single-Precision Subtract Vector Floating-Point Single-Precision Test Equal Vector Floating-Point Single-Precision Test Greater Than Vector Floating-Point Single-Precision Test Less Than Vector Load Double Word into Double Word Vector Load Doubleword into Doubleword by External Process ID Indexed Vector Load Double Word into Double Word Indexed Vector Load Double into Four Halfwords Vector Load Double into Four Halfwords Indexed Vector Load Double into Two Words Vector Load Double into Two Words Indexed Vector Load Halfword into Halfwords Even and Splat Vector Load Halfword into Halfwords Even and Splat Indexed Vector Load Halfword into Halfword Odd Signed and Splat Vector Load Halfword into Halfword Odd Signed and Splat Indexed Vector Load Halfword into Halfword Odd Unsigned and Splat Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed Vector Load Word into Two Halfwords Even Vector Load Word into Two Halfwords Even Indexed
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
EVX 4 EVX 4 EVX 31 EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX 4 4 4 4 4 4 4 4 4 4 4 4 4
1285
Form
Page 519 519 520 520 520 520 521 521 521 522 521 522 522 522 523 523 523 523 524 524 524 524 525 525 525 525 526 526 527 527 528 528 529
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evlwhos evlwhosx evlwhou evlwhoux evlwhsplat evlwhsplatx evlwwsplat evlwwsplatx evmergehi evmergehilo evmergelo evmergelohi evmhegsmfaa evmhegsmfan evmhegsmiaa evmhegsmian evmhegumiaa evmhegumian evmhesmf evmhesmfa evmhesmfaaw evmhesmfanw evmhesmi evmhesmia evmhesmiaaw evmhesmianw evmhessf evmhessfa evmhessfaaw evmhessfanw evmhessiaaw evmhessianw evmheumi
Instruction Vector Load Word into Two Halfwords Odd Signed (with sign extension) Vector Load Word into Two Halfwords Odd Signed Indexed (with sign extension) Vector Load Word into Two Halfwords Odd Unsigned (zero-extended) Vector Load Word into Two Halfwords Odd Unsigned Indexed (zero-extended) Vector Load Word into Two Halfwords and Splat Vector Load Word into Two Halfwords and Splat Indexed Vector Load Word into Word and Splat Vector Load Word into Word and Splat Indexed Vector Merge High Vector Merge High/Low Vector Merge Low Vector Merge Low/High Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Even, Signed, Modulo, Fractional Vector Multiply Halfwords, Even, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer Vector Multiply Halfwords, Even, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional Vector Multiply Halfwords, Even, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1286
Form
Page 529 529 529 530 530 531 531 531 531 532 532 532 532 533 533 533 533 534 533 535 535 536 536 537 537 537 537 538 534
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmheumia evmheumiaaw evmheumianw evmheusiaaw evmheusianw evmhogsmfaa evmhogsmfan evmhogsmiaa evmhogsmian evmhogumiaa evmhogumian evmhosmf evmhosmfa evmhosmfaaw evmhosmfanw evmhosmi evmhosmia evmhosmiaaw evmhosmianw evmhossf evmhossfa evmhossfaaw evmhossfanw evmhossiaaw evmhossianw evmhoumi evmhoumia evmhoumiaaw evmhoumianw
Instruction Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Vector Multiply Halfwords, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer Vector Multiply Halfwords, Odd, Signed, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Modulo, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional to Accumulator Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Signed, Saturate, Integer and Accumulate Negative into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer to Accumulator Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1287
Form
Page 538 538 539 539 539 539 539 540 540 540 540 541 541 541 541 542 542 542 542 543 543 543 543 544 544 544 544 544 544 545 545 545 546 546
Cat1 SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP
Mnemonic evmhousiaaw evmhousianw evmra evmwhsmf evmwhsmfa evmwhsmi evmwhsmia evmwhssf evmwhssfa evmwhumi evmwhumia evmwlsmiaaw evmwlsmianw evmwlssiaaw evmwlssianw evmwlumi evmwlumia evmwlumiaaw evmwlumianw evmwlusiaaw evmwlusianw evmwsmf evmwsmfa evmwsmfaa evmwsmfan evmwsmi evmwsmia evmwsmiaa evmwsmian evmwssf evmwssfa evmwssfaa evmwssfan evmwumi
Instruction Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Halfwords, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words Initialize Accumulator Vector Multiply Word High Signed, Modulo, Fractional Vector Multiply Word High Signed, Modulo, Fractional to Accumulator Vector Multiply Word High Signed, Modulo, Integer Vector Multiply Word High Signed, Modulo, Integer to Accumulator Vector Multiply Word High Signed, Saturate, Fractional Vector Multiply Word High Signed, Saturate, Fractional to Accumulator Vector Multiply Word High Unsigned, Modulo, Integer Vector Multiply Word High Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Signed, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Modulo, Integer Vector Multiply Word Low Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate into Words Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words Vector Multiply Word Signed, Modulo, Fractional Vector Multiply Word Signed, Modulo, Fractional to Accumulator Vector Multiply Word Signed, Modulo, Fractional and Accumulate Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative Vector Multiply Word Signed, Modulo, Integer Vector Multiply Word Signed, Modulo, Integer to Accumulator Vector Multiply Word Signed, Modulo, Integer and Accumulate Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative Vector Multiply Word Signed, Saturate, Fractional Vector Multiply Word Signed, Saturate, Fractional to Accumulator Vector Multiply Word Signed, Saturate, Fractional and Accumulate Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative Vector Multiply Word Unsigned, Modulo, Integer
EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX
1288
Form
Page 546 547 547 547 547 547 548 548 548 549 549 549 550 550 550 550 550 550 551 551 551 915 551 552 552 552 552 553 553 553 553 553 553 554 554 554 554 555 555 555 555 555 82 82 85 132 133 133 144 145 145
Mnemonic evmwumia evmwumiaa evmwumian evnand evneg evnor evor evorc evrlw evrlwi evrndw evsel evslw evslwi evsplatfi evsplati evsrwis evsrwiu evsrws evsrwu evstdd evstddepx evstddx evstdh evstdhx evstdw evstdwx evstwhe evstwhex evstwho evstwhox evstwwe evstwwex evstwwo evstwwox evsubfsmiaaw evsubfssiaaw evsubfumiaaw evsubfusiaaw evsubfw evsubifw evxor extsb[.] extsh[.] extsw[.] fabs[.] fadd[.] fadds[.] fcfid[.] fcfids[.] fcfidu[.]
Instruction Vector Multiply Word Unsigned, Modulo, Integer to Accumulator Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative Vector NAND Vector Negate Vector NOR Vector OR Vector OR with Complement Vector Rotate Left Word Vector Rotate Left Word Immediate Vector Round Word Vector Select Vector Shift Left Word Vector Shift Left Word Immediate Vector Splat Fractional Immediate Vector Splat Immediate Vector Shift Right Word Immediate Signed Vector Shift Right Word Immediate Unsigned Vector Shift Right Word Signed Vector Shift Right Word Unsigned Vector Store Double of Double Vector Store Doubleword into Doubleword by External Process ID Indexed Vector Store Double of Double Indexed Vector Store Double of Four Halfwords Vector Store Double of Four Halfwords Indexed Vector Store Double of Two Words Vector Store Double of Two Words Indexed Vector Store Word of Two Halfwords from Even Vector Store Word of Two Halfwords from Even Indexed Vector Store Word of Two Halfwords from Odd Vector Store Word of Two Halfwords from Odd Indexed Vector Store Word of Word from Even Vector Store Word of Word from Even Indexed Vector Store Word of Word from Odd Vector Store Word of Word from Odd Indexed Vector Subtract Signed, Modulo, Integer to Accumulator Word Vector Subtract Signed, Saturate, Integer to Accumulator Word Vector Subtract Unsigned, Modulo, Integer to Accumulator Word Vector Subtract Unsigned, Saturate, Integer to Accumulator Word Vector Subtract from Word Vector Subtract Immediate from Word Vector XOR Extend Sign Byte Extend Sign Halfword Extend Sign Word Floating Absolute Value Floating Add Floating Add Single Floating Convert From Integer Doubleword Floating Convert From Integer Doubleword Single Floating Convert From Integer Doubleword Unsigned
EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVS 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 4 EVX 31 EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX EVX X X X X A A X X X 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 31 31 31 63 63 59 63 59 63
SR SR SR
1289
Form
Page 146 148 148 132 140 141 142 141 142 143 144 143 134 134 138 138 132 138 138 134 134 132 132 139 139 139 139 135 135 147 147 147 147 140 136 136 149 135 135 133 133 137 137 739 676 913 970 676 969 1083 1087 77 688 689 708 905
Cat1 FP FP FP FP[R] FP[R] FP FP FP[R] FP[R] FP FP FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R] FP[R].in FP[R].in FP[R].in FP[R].in FP[R] FP[R].in FP[R].in FP[R] FP[R] FP[R] FP[R] FP[R] FP FP S B E.PD ECL E ECL E.CI E.CD B B B DS E.PD
Mnemonic fcfidus[.] fcmpo fcmpu fcpsgn[.] fctid[.] fctidu[.] fctiduz[.] fctidz[.] fctiw[.] fctiwu[.] fctiwuz[.] fctiwz[.] fdiv[.] fdivs[.] fmadd[.] fmadds[.] fmr[.] fmsub[.] fmsubs[.] fmul[.] fmuls[.] fnabs[.] fneg[.] fnmadd[.] fnmadds[.] fnmsub[.] fnmsubs[.] fre[.] fres[.] frim[.] frin[.] frip[.] friz[.] frsp[.] frsqrte[.] frsqrtes[.] fsel[.] fsqrt[.] fsqrts[.] fsub[.] fsubs[.] ftdiv ftsqrt hrfid icbi icbiep icblc icbt icbtls ici icread isel isync lbarx lbdx lbepx
Instruction Floating Convert From Integer Doubleword Unsigned Single Floating Compare Ordered Floating Compare Unordered Floating Copy Sign Floating Convert To Integer Doubleword Floating Convert To Integer Doubleword Unsigned Floating Convert To Integer Doubleword Unsigned with round toward Zero Floating Convert To Integer Doubleword with round toward Zero Floating Convert To Integer Word Floating Convert To Integer Word Unsigned Floating Convert To Integer Word Unsigned with round toward Zero Floating Convert To Integer Word with round toward Zero Floating Divide Floating Divide Single Floating Multiply-Add Floating Multiply-Add Single Floating Move Register Floating Multiply-Subtract Floating Multiply-Subtract Single Floating Multiply Floating Multiply Single Floating Negative Absolute Value Floating Negate Floating Negative Multiply-Add Floating Negative Multiply-Add Single Floating Negative Multiply-Subtract Floating Negative Multiply-Subtract Single Floating Reciprocal Estimate Floating Reciprocal Estimate Single Floating Round to Integer Minus Floating Round to Integer Nearest Floating Round to Integer Plus Floating Round to Integer Toward Zero Floating Round to Single-Precision Floating Reciprocal Square Root Estimate Floating Reciprocal Square Root Estimate Single Floating Select Floating Square Root Floating Square Root Single Floating Subtract Floating Subtract Single Floating Test for software Divide Floating Test for software Square Root Hypervisor Return From Interrupt Doubleword Instruction Cache Block Invalidate Instruction Cache Block Invalidate by External PID Instruction Cache Block Lock Clear Instruction Cache Block Touch Instruction Cache Block Touch and Lock Set Instruction Cache Invalidate Instruction Cache Read Integer Select Instruction Synchronize Load Byte and Reserve Indexed Load Byte with Decoration Indexed Load Byte by External Process ID Indexed
X X X X X X X X X X X X A A A A X A A A A X X A A A A A A X X X X X A A A A A A A X X XL X X X X X X X A XL X X X
H P M M P P
1290
Form
Page 45 749 45 45 46 50 694 56 749 708 906 50 50 50 125 708 914 131 131 125 125 125 126 126 128 128 128 128 47 690 47 47 47 55 708 905 46 749 46 46 46 57 751 59 59 216 213 916 916 213 218 218 214 214 49 689 49 49 55 708
Mnemonic lbz lbzcix lbzu lbzux lbzx ld ldarx ldbrx ldcix lddx ldepx ldu ldux ldx lfd lfddx lfdepx lfdp lfdpx lfdu lfdux lfdx lfiwax lfiwzx lfs lfsu lfsux lfsx lha lharx lhau lhaux lhax lhbrx lhdx lhepx lhz lhzcix lhzu lhzux lhzx lmw lq lswi lswx lvebx lvehx lvepx lvepxl lvewx lvsl lvsr lvx lvxl lwa lwarx lwaux lwax lwbrx lwdx
Instruction Load Byte and Zero Load Byte and Zero Caching Inhibited Indexed Load Byte and Zero with Update Load Byte and Zero with Update Indexed Load Byte and Zero Indexed Load Doubleword Load Doubleword And Reserve Indexed Load Doubleword Byte-Reverse Indexed Load Doubleword Caching Inhibited Indexed Load Doubleword with Decoration Indexed Load Doubleword by External Process ID Indexed Load Doubleword with Update Load Doubleword with Update Indexed Load Doubleword Indexed Load Floating-Point Double Load Floating Doubleword with Decoration Indexed Load Floating-Point Double by External Process ID Indexed Load Floating-Point Double Pair Load Floating-Point Double Pair Indexed Load Floating-Point Double with Update Load Floating-Point Double with Update Indexed Load Floating-Point Double Indexed Load Floating-Point as Integer Word Algebraic Indexed Load Floating-Point as Integer Word and Zero Indexed Load Floating-Point Single Load Floating-Point Single with Update Load Floating-Point Single with Update Indexed Load Floating-Point Single Indexed Load Halfword Algebraic Load Halfword and Reserve Indexed Load Halfword Algebraic with Update Load Halfword Algebraic with Update Indexed Load Halfword Algebraic Indexed Load Halfword Byte-Reverse Indexed Load Halfword with Decoration Indexed Load Halfword by External Process ID Indexed Load Halfword and Zero Load Halfword and Zero Caching Inhibited Indexed Load Halfword and Zero with Update Load Halfword and Zero with Update Indexed Load Halfword and Zero Indexed Load Multiple Word Load Quadword Load String Word Immediate Load String Word Indexed Load Vector Element Byte Indexed Load Vector Element Halfword Indexed Load Vector by External Process ID Indexed Load Vector by External Process ID Indexed LRU Load Vector Element Word Indexed Load Vector for Shift Left Indexed Load Vector for Shift Right Indexed Load Vector Indexed Load Vector Indexed LRU Load Word Algebraic Load Word And Reserve Indexed Load Word Algebraic with Update Indexed Load Word Algebraic Indexed Load Word Byte-Reverse Indexed Load Word with Decoration Indexed
D X D X X DS X X X X X DS X X D X X DS X D X X X X D D X X D X D X X X X X D X D X X D DQ X X X X X X X X X X X DS X X X X X
H P
P H
P P
1291
Form
Page 906 48 749 48 48 48 338 338 339 339 591 591 592 592 593 593 594 594 595 595 596 596 698 39 150 104 102 901 104 901 150 759, 901 103 1100 101, 704 797 797 704 269 1078 1078 102 900 104 900 152
Cat1 E.PD B S B B B VSX VSX VSX VSX LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA LMA E B FP E B E.DC E.DC E.DC FP[R] B B E.PM B S S S.out V E.PC E.PC B E.DC E.DC E.DC FP[R]
Mnemonic lwepx lwz lwzcix lwzu lwzux lwzx lxsdx lxvd2x lxvdsx lxvw4x macchw[o][.]
Instruction
X D X D X X XX1 XX1 XX1 XX1 XO XO XO XO XO XO XO XO XO XO XO XO X XL X X XFX XFX X X X X XFX XFX XFX X X XFX VX X X XFX XFX X X X
P H
P P P O O
Load Word by External Process ID Indexed Load Word and Zero Load Word and Zero Caching Inhibited Indexed Load Word and Zero with Update Load Word and Zero with Update Indexed Load Word and Zero Indexed Load VSR Scalar Doubleword Indexed Load VSR Vector Doubleword*2 Indexed Load VSR Vector Doubleword & Splat Indexed Load VSR Vector Word*4 Indexed Multiply Accumulate Cross Halfword to Word Modulo Signed macchws[o][.] Multiply Accumulate Cross Halfword to Word Saturate Signed macchwsu[o][.] Multiply Accumulate Cross Halfword to Word Saturate Unsigned macchwu[o][.] Multiply Accumulate Cross Halfword to Word Modulo Unsigned machhw[o][.] Multiply Accumulate High Halfword to Word Modulo Signed machhws[o][.] Multiply Accumulate High Halfword to Word Saturate Signed machhwsu[o][.] Multiply Accumulate High Halfword to Word Saturate Unsigned machhwu[o][.] Multiply Accumulate High Halfword to Word Modulo Unsigned maclhw[o][.] Multiply Accumulate Low Halfword to Word Modulo Signed maclhws[o][.] Multiply Accumulate Low Halfword to Word Saturate Signed maclhwsu[o][.] Multiply Accumulate Low Halfword to Word Saturate Unsigned maclhwu[o][.] Multiply Accumulate Low Halfword to Word Modulo Unsigned mbar Memory Barrier mcrf Move Condition Register Field mcrfs Move to Condition Register from FPSCR mcrxr Move to Condition Register from XER mfcr Move From Condition Register mfdcr Move From Device Control Register mfdcrux Move From Device Control Register User-mode Indexed mfdcrx Move From Device Control Register Indexed mffs[.] Move From FPSCR mfmsr Move From Machine State Register mfocrf mfpmr mfspr Move From One Condition Register Field Move From Performance Monitor Register Move From Special Purpose Register Move From Segment Register Move From Segment Register Indirect Move From Time Base Move From Vector Status and Control Register Message Clear Message Send Move To Condition Register Fields Move To Device Control Register Move To Device Control Register User-mode Indexed Move To Device Control Register Indexed Move To FPSCR Bit 0
31 595 32 P 31 659 32 P 31 371 4 1540 31 238 P 31 206 P 31 144 31 451 P 31 419 31 387 P 63 70
mfsr mfsrin mftb mfvscr msgclr msgsnd mtcrf mtdcr mtdcrux mtdcrx mtfsb0[.]
1292
Form
Page
Cat1 FP[R] FP[R] FP[R] E S S B E.PM B S S V LMA LMA 64 64 LMA LMA B B 64 LMA LMA B B B S B LMA LMA LMA LMA LMA LMA B B B B B B B B 64 B E E.ED E.HV E S E 64 64 64 64 64
Mnemonic mtfsb1[.] mtfsf[.] mtfsfi[.] mtmsr mtmsr mtmsrd mtocrf mtpmr mtspr mtsr mtsrin mtvscr mulchw[.] mulchwu[.] mulhd[.] mulhdu[.] mulhhw[.] mulhhwu[.] mulhw[.] mulhwu[.] mulld[o][.] mullhw[.] mullhwu[.] mulli mullw[o][.] nand[.] nap neg[o][.] nmacchw[o][.]
Instruction
63 38 152 63 711 151 63 134 151 31 146 P 901 31 146 P 757 31 178 P 758 31 144 103 31 462 O 1100 31 467 O 100 31 210 32 P 796 31 242 32 P 796 4 1604 269 4 168 596 4 136 596 31 73 SR 71 31 9 SR 71 4 40 597 4 8 597 31 75 SR 67 31 11 SR 67 31 233 SR 71 4 424 597 4 392 597 7 67 31 235 SR 67 31 476 SR 80 19 434 H 741 31 104 SR 66 4 174 598 4 4 4 4 4 31 31 31 24 25 31 31 31 31 31 19 19 19 19 19 19 30 30 30 30 30 238 46 110 430 494 124 444 412 122 506 378 186 154 51 39 102 50 18 38 8 9 4 0 2 SR SR SR 598 599 599 600 600 81 80 81 78 79 83 85 83 84 84 887 888 889 887 739 888 91 92 91 90 90
P P P P P P SR SR SR SR SR
Move To FPSCR Bit 1 Move To FPSCR Fields Move To FPSCR Field Immediate Move To Machine State Register Move To Machine State Register Move To Machine State Register Doubleword Move To One Condition Register Field Move To Performance Monitor Register Move To Special Purpose Register Move To Segment Register Move To Segment Register Indirect Move To Vector Status and Control Register Multiply Cross Halfword to Word Signed Multiply Cross Halfword to Word Unsigned Multiply High Doubleword Multiply High Doubleword Unsigned Multiply High Halfword to Word Signed Multiply High Halfword to Word Unsigned Multiply High Word Multiply High Word Unsigned Multiply Low Doubleword Multiply Low Halfword to Word Signed Multiply Low Halfword to Word Unsigned Multiply Low Immediate Multiply Low Word NAND Nap Negate Negative Multiply Accumulate Cross Halfword to Word Modulo Signed nmacchws[o][.] Negative Multiply Accumulate Cross Halfword to Word Saturate Signed nmachhw[o][.] Negative Multiply Accumulate High Halfword to Word Modulo Signed nmachhws[o][.] Negative Multiply Accumulate High Halfword to Word Saturate Signed nmaclhw[o][.] Negative Multiply Accumulate Low Halfword to Word Modulo Signed nmaclhws[o][.] Negative Multiply Accumulate Low Halfword to Word Saturate Signed nor[.] NOR or[.] OR orc[.] OR with Complement ori OR Immediate oris OR Immediate Shifted popcntb Population Count Bytes popcntd Population Count Doubleword popcntw Population Count Word prtyd Parity Doubleword prtyw Parity Word rfci Return From Critical Interrupt rfdi Return From Debug Interrupt rfgi Return From Guest Interrupt rfi Return From Interrupt rfid Return From Interrupt Doubleword rfmci Return From Machine Check Interrupt rldcl[.] Rotate Left Doubleword then Clear Left rldcr[.] Rotate Left Doubleword then Clear Right rldic[.] Rotate Left Doubleword Immediate then Clear rldicl[.] Rotate Left Doubleword Immediate then Clear Left rldicr[.] Rotate Left Doubleword Immediate then Clear Right
1293
Form
Page
Cat1
Mnemonic
Instruction Rotate Left Doubleword Immediate then Mask Insert Rotate Left Word Immediate then Mask Insert Rotate Left Word Immediate then AND with Mask Rotate Left Word then AND with Mask Rip Van Winkle System Call
MD M M M XL SC
SR SR SR SR H
498
X X X X X X X XL X X XS X X X X D X X X X D X X DS X X X X X DS X X D X X DS X D X X X D D X X D X X X X X D
31 979 31 498 31 434 31 915 31 851 31 402 31 27 19 466 31 24 31 794 31 826 31 792 31 824 31 539 31 536 38 31 981 31 694 31 643 31 223 39 31 247 31 215 62 0 31 660 31 1013 31 214 31 739 31 157 62 1 31 181 31 149 54 31 931 31 735 61 31 55 31 31 31 52 53 31 31 44 31 31 31 31 31 45 919 759 727 983 695 663 918 949 726 675 415
SR P P P P P P SR H SR SR SR SR SR SR SR H P
H P
92 64 rldimi[.] 89 B rlwimi[.] 87 B rlwinm[.] 88 B rlwnm[.] 742 S rvwinkle 40, B sc 738, 886 794 S slbfee. 791 S slbia 790 S slbie 793 S slbmfee 793 S slbmfev 792 S slbmte 95 64 sld[.] 742 S sleep 93 B slw[.] 96 64 srad[.] 96 64 sradi[.] 94 B sraw[.] 94 B srawi[.] 95 64 srd[.] 93 B srw[.] 51 B stb 750 S stbcix 691 B stbcx. 709 DS stbdx 907 E.PD stbepx 51 B stbu 51 B stbux 51 B stbx 54 64 std 56 64 stdbrx 750 S stdcix 694 64 stdcx. 709 DS stddx 908 E.PD;64 stdepx 54 64 stdu 54 64 stdux 54 64 stdx 129 FP stfd 709 DS stfddx 914 E.PD stfdepx 131 131 129 129 129 130 128 128 128 128 52 55 750 692 709 907 52 FP.out FP.out FP FP FP FP FP FP FP FP B B S B DS E.PD B stfdp stfdpx stfdu stfdux stfdx stfiwx stfs stfsu stfsux stfsx sth sthbrx sthcix sthcx. sthdx sthepx sthu
H P
SLB Find Entry ESID SLB Invalidate All SLB Invalidate Entry SLB Move From Entry ESID SLB Move From Entry VSID SLB Move To Entry Shift Left Doubleword Sleep Shift Left Word Shift Right Algebraic Doubleword Shift Right Algebraic Doubleword Immediate Shift Right Algebraic Word Shift Right Algebraic Word Immediate Shift Right Doubleword Shift Right Word Store Byte Store Byte Caching Inhibited Indexed Store Byte Conditional Indexed Store Byte with Decoration Indexed Store Byte by External Process ID Indexed Store Byte with Update Store Byte with Update Indexed Store Byte Indexed Store Doubleword Store Doubleword Byte-Reverse Indexed Store Doubleword Caching Inhibited Indexed Store Doubleword Conditional Indexed Store Doubleword with Decoration Indexed Store Doubleword by External Process ID Indexed Store Doubleword with Update Store Doubleword with Update Indexed Store Doubleword Indexed Store Floating-Point Double Store Floating Doubleword with Decoration Indexed Store Floating-Point Double by External Process ID Indexed Store Floating-Point Double Pair Store Floating-Point Double Pair Indexed Store Floating-Point Double with Update Store Floating-Point Double with Update Indexed Store Floating-Point Double Indexed Store Floating-Point as Integer Word Indexed Store Floating-Point Single Store Floating-Point Single with Update Store Floating-Point Single with Update Indexed Store Floating-Point Single Indexed Store Halfword Store Halfword Byte-Reverse Indexed Store Halfword Caching Inhibited Indexed Store Halfword Conditional Indexed Store Halfword with Decoration Indexed Store Halfword by External Process ID Indexed Store Halfword with Update
1294
Form
Page 52 52 57 751 60 60 216 216 917 917 217 214 217 53 55 750 693 709 908 53 53 53 340 340 341 63 64 65 64 65 66 696 77 77 802 798 800 980 978, 903 985, 904 984 982, 904 803, 987, 905 987, 905 76 76 230 259 230 230 230 231 232 231 232
Mnemonic sthux sthx stmw stq stswi stswx stvebx stvehx stvepx stvepxl stvewx stvx stvxl stw stwbrx stwcix stwcx. stwdx stwepx stwu stwux stwx stxsdx stxvd2x stxvw4x subf[o][.] subfc[o][.] subfe[o][.] subfic subfme[o][.] subfze[o][.] sync td tdi tlbia tlbie tlbiel tlbilx tlbivax tlbre
Instruction Store Halfword with Update Indexed Store Halfword Indexed Store Multiple Word Store Quadword Store String Word Immediate Store String Word Indexed Store Vector Element Byte Indexed Store Vector Element Halfword Indexed Store Vector by External Process ID Indexed Store Vector by External Process ID Indexed LRU Store Vector Element Word Indexed Store Vector Indexed Store Vector Indexed LRU Store Word Store Word Byte-Reverse Indexed Store Word Caching Inhibited Indexed Store Word Conditional Indexed Store Word with Decoration Indexed Store Word by External Process ID Indexed Store Word with Update Store Word with Update Indexed Store Word Indexed Store VSR Scalar Doubleword Indexed Store VSR Vector Doubleword*2 Indexed Store VSR Vector Word*4 Indexed Subtract From Subtract From Carrying Subtract From Extended Subtract From Immediate Carrying Subtract From Minus One Extended Subtract From Zero Extended Synchronize Trap Doubleword Trap Doubleword Immediate TLB Invalidate All TLB Invalidate Entry TLB Invalidate Entry Local TLB Invalidate Local TLB Invalidate Virtual Address Indexed TLB Read Entry TLB Search and Reserve TLB Search Indexed TLB Synchronize
P P
H P
SR SR SR SR SR SR
H 64 H 64 P P P P P P H
X X D VX VX VX VX VX VX VX VX VX
31 31 3 4 4 4 4 4 4 4 4 4
E B B V V V V V V V V V
tlbwe tw twi vaddcuw vaddfp vaddsbs vaddshs vaddsws vaddubm vaddubs vadduhm vadduhs
TLB Write Entry Trap Word Trap Word Immediate Vector Add and Write Carry-Out Unsigned Word Vector Add Single-Precision Vector Add Signed Byte Saturate Vector Add Signed Halfword Saturate Vector Add Signed Word Saturate Vector Add Unsigned Byte Modulo Vector Add Unsigned Byte Saturate Vector Add Unsigned Halfword Modulo Vector Add Unsigned Halfword Saturate
1295
Form
Page 231 232 254 254 245 245 245 246 246 246 263 263 265 265 251 251 252 266 266 252 252 252 253 253 253 262 262 267 267 260 261 247 247 247 248 248 248 238 238 261 249 249 249 250 250 250 239 224 224 224 225 225 225 240 240 241 239 241
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic vadduwm vadduws vand vandc vavgsb vavgsh vavgsw vavgub vavguh vavguw vcfsx vcfux vcmpbfp[.] vcmpeqfp[.] vcmpequb[.] vcmpequh[.] vcmpequw[.] vcmpgefp[.] vcmpgtfp[.] vcmpgtsb[.] vcmpgtsh[.] vcmpgtsw[.] vcmpgtub[.] vcmpgtuh[.] vcmpgtuw[.] vctsxs vctuxs vexptefp vlogefp vmaddfp vmaxfp vmaxsb vmaxsh vmaxsw vmaxub vmaxuh vmaxuw vmhaddshs vmhraddshs vminfp vminsb vminsh vminsw vminub vminuh vminuw vmladduhm vmrghb vmrghh vmrghw vmrglb vmrglh vmrglw vmsummbm vmsumshm vmsumshs vmsumubm vmsumuhm
Instruction Vector Add Unsigned Word Modulo Vector Add Unsigned Word Saturate Vector Logical AND Vector Logical AND with Complement Vector Average Signed Byte Vector Average Signed Halfword Vector Average Signed Word Vector Average Unsigned Byte Vector Average Unsigned Halfword Vector Average Unsigned Word Vector Convert From Signed Fixed-Point Word Vector Convert From Unsigned Fixed-Point Word Vector Compare Bounds Single-Precision Vector Compare Equal To Single-Precision Vector Compare Equal To Unsigned Byte Vector Compare Equal To Unsigned Halfword Vector Compare Equal To Unsigned Word Vector Compare Greater Than or Equal To Single-Precision Vector Compare Greater Than Single-Precision Vector Compare Greater Than Signed Byte Vector Compare Greater Than Signed Halfword Vector Compare Greater Than Signed Word Vector Compare Greater Than Unsigned Byte Vector Compare Greater Than Unsigned Halfword Vector Compare Greater Than Unsigned Word Vector Convert To Signed Fixed-Point Word Saturate Vector Convert To Unsigned Fixed-Point Word Saturate Vector 2 Raised to the Exponent Estimate FloatingPoint Vector Log Base 2 Estimate Floating-Point Vector Multiply-Add Single-Precision Vector Maximum Single-Precision Vector Maximum Signed Byte Vector Maximum Signed Halfword Vector Maximum Signed Word Vector Maximum Unsigned Byte Vector Maximum Unsigned Halfword Vector Maximum Unsigned Word Vector Multiply-High-Add Signed Halfword Saturate Vector Multiply-High-Round-Add Signed Halfword Saturate Vector Minimum Single-Precision Vector Minimum Signed Byte Vector Minimum Signed Halfword Vector Minimum Signed Word Vector Minimum Unsigned Byte Vector Minimum Unsigned Halfword Vector Minimum Unsigned Word Vector Multiply-Low-Add Unsigned Halfword Modulo Vector Merge High Byte Vector Merge High Halfword Vector Merge High Word Vector Merge Low Byte Vector Merge Low Halfword Vector Merge Low Word Vector Multiply-Sum Mixed Byte Modulo Vector Multiply-Sum Signed Halfword Modulo Vector Multiply-Sum Signed Halfword Saturate Vector Multiply-Sum Unsigned Byte Modulo Vector Multiply-Sum Unsigned Halfword Modulo
VX VX VX VX VX VX VX VX VX VX VX VX VC VC VC VC VC VC VC VC VC VC VC VC VC VX VX VX VX VA VX VX VX VX VX VX VX VA VA VX VX VX VX VX VX VX VA VX VX VX VX VX VX VA VA VA VA VA
1296
Form
Page 242 236 236 236 236 237 237 237 237 260 254 254 227 219 220 220 220 220 221 221 221 221 268 264 264 264 264 255 255 255 268 227 228 256 228 256 228 256 226 226 226 226 226 226 229 258 258 258 257 257 229 257 233 259 233 233 233 234
Cat1 V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V
Mnemonic vmsumuhs vmulesb vmulesh vmuleub vmuleuh vmulosb vmulosh vmuloub vmulouh vnmsubfp vnor vor vperm vpkpx vpkshss vpkshus vpkswss vpkswus vpkuhum vpkuhus vpkuwum vpkuwus vrefp vrfim vrfin vrfip vrfiz vrlb vrlh vrlw vrsqrtefp vsel vsl vslb vsldoi vslh vslo vslw vspltb vsplth vspltisb vspltish vspltisw vspltw vsr vsrab vsrah vsraw vsrb vsrh vsro vsrw vsubcuw vsubfp vsubsbs vsubshs vsubsws vsububm
Instruction Vector Multiply-Sum Unsigned Halfword Saturate Vector Multiply Even Signed Byte Vector Multiply Even Signed Halfword Vector Multiply Even Unsigned Byte Vector Multiply Even Unsigned Halfword Vector Multiply Odd Signed Byte Vector Multiply Odd Signed Halfword Vector Multiply Odd Unsigned Byte Vector Multiply Odd Unsigned Halfword Vector Negative Multiply-Subtract Single-Precision Vector Logical NOR Vector Logical OR Vector Permute Vector Pack Pixel Vector Pack Signed Halfword Signed Saturate Vector Pack Signed Halfword Unsigned Saturate Vector Pack Signed Word Signed Saturate Vector Pack Signed Word Unsigned Saturate Vector Pack Unsigned Halfword Unsigned Modulo Vector Pack Unsigned Halfword Unsigned Saturate Vector Pack Unsigned Word Unsigned Modulo Vector Pack Unsigned Word Unsigned Saturate Vector Reciprocal Estimate Single-Precision Vector Round to Single-Precision Integer toward -Infinity Vector Round to Single-Precision Integer Nearest Vector Round to Single-Precision Integer toward +Infinity Vector Round to Single-Precision Integer toward Zero Vector Rotate Left Byte Vector Rotate Left Halfword Vector Rotate Left Word Vector Reciprocal Square Root Estimate Single-Precision Vector Select Vector Shift Left Vector Shift Left Byte Vector Shift Left Double by Octet Immediate Vector Shift Left Halfword Vector Shift Left by Octet Vector Shift Left Word Vector Splat Byte Vector Splat Halfword Vector Splat Immediate Signed Byte Vector Splat Immediate Signed Halfword Vector Splat Immediate Signed Word Vector Splat Word Vector Shift Right Vector Shift Right Algebraic Byte Vector Shift Right Algebraic Halfword Vector Shift Right Algebraic Word Vector Shift Right Byte Vector Shift Right Halfword Vector Shift Right by Octet Vector Shift Right Word Vector Subtract and Write Carry-Out Unsigned Word Vector Subtract Single-Precision Vector Subtract Signed Byte Saturate Vector Subtract Signed Halfword Saturate Vector Subtract Signed Word Saturate Vector Subtract Unsigned Byte Modulo
VA VX VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VA VX VX VA VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX VX
1297
Form
Page 235 234 234 234 235 243 244 244 244 243 222 222 222 223 223 223 254 699 902 903 80 79 79 341 342 347 349 351 352 353
Mnemonic vsububs vsubuhm vsubuhs vsubuwm vsubuws vsum2sws vsum4sbs vsum4shs vsum4ubs vsumsws vupkhpx vupkhsb vupkhsh vupklpx vupklsb vupklsh vxor wait wrtee wrteei xor[.] xori xoris xsabsdp xsadddp xscmpodp xscmpudp xscpsgndp xscvdpsp xscvdpsxds
Instruction Vector Subtract Unsigned Byte Saturate Vector Subtract Unsigned Halfword Modulo Vector Subtract Unsigned Halfword Saturate Vector Subtract Unsigned Word Modulo Vector Subtract Unsigned Word Saturate Vector Sum across Half Signed Word Saturate Vector Sum across Quarter Signed Byte Saturate Vector Sum across Quarter Signed Halfword Saturate Vector Sum across Quarter Unsigned Byte Saturate Vector Sum across Signed Word Saturate Vector Unpack High Pixel Vector Unpack High Signed Byte Vector Unpack High Signed Halfword Vector Unpack Low Pixel Vector Unpack Low Signed Byte Vector Unpack Low Signed Halfword Vector Logical XOR Wait Write MSR External Enable Write MSR External Enable Immediate XOR XOR Immediate XOR Immediate Shifted VSX Scalar Absolute Value Double-Precision VSX Scalar Add Double-Precision VSX Scalar Compare Ordered Double-Precision VSX Scalar Compare Unordered Double-Precision VSX Scalar Copy Sign Double-Precision VSX Scalar Convert Double-Precision to SinglePrecision VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Scalar truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Scalar Convert Single-Precision to DoublePrecision format VSX Scalar Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Scalar Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Scalar Divide Double-Precision VSX Scalar Multiply-Add Type-A Double-Precision VSX Scalar Multiply-Add Type-M Double-Precision VSX Scalar Maximum Double-Precision VSX Scalar Minimum Double-Precision VSX Scalar Multiply-Subtract Type-A Double-Precision VSX Scalar Multiply-Subtract Type-M Double-Precision VSX Scalar Multiply Double-Precision VSX Scalar Negative Absolute Value Double-Precision VSX Scalar Negate Double-Precision VSX Scalar Negative Multiply-Add Type-A DoublePrecision
1536 1088 1600 1152 1664 1672 1800 1608 1544 1928 846 526 590 974 654 718 1220 62 131 P 163 P 316 SR 690 128 172 140 704 530 688
XX2
60
176
355
VSX
xscvdpsxws
XX2
60
656
357
VSX
xscvdpuxds
XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX3
60 60 60 60 60 60 60 60 60 60 60 60 60 60 60
144 658 752 720 224 132 164 640 672 196 228 192 722 754 644
359 361 361 362 363 365 365 368 370 372 372 375 377 377 378
VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
xscvdpuxws xscvspdp xscvsxddp xscvuxddp xsdivdp xsmaddadp xsmaddmdp xsmaxdp xsmindp xsmsubadp xsmsubmdp xsmuldp xsnabsdp xsnegdp xsnmaddadp
1298
Form
Page 378 383 383 386 387 388 388 389 390 391 392 393 395 396 397 397 398 402 404 404 405 405 406 406 407 407 408 408 409 409 410 410 411 412 414 416
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic xsnmaddmdp xsnmsubadp xsnmsubmdp xsrdpi xsrdpic xsrdpim xsrdpip xsrdpiz xsredp xsrsqrtedp xssqrtdp xssubdp xstdivdp xstsqrtdp xvabsdp xvabssp xvadddp xvaddsp xvcmpeqdp xvcmpeqdp. xvcmpeqsp xvcmpeqsp. xvcmpgedp xvcmpgedp. xvcmpgesp xvcmpgesp. xvcmpgtdp xvcmpgtdp. xvcmpgtsp xvcmpgtsp. xvcpsgndp xvcpsgnsp xvcvdpsp xvcvdpsxds xvcvdpsxws xvcvdpuxds
Instruction VSX Scalar Negative Multiply-Add Type-M DoublePrecision VSX Scalar Negative Multiply-Subtract Type-A DoublePrecision VSX Scalar Negative Multiply-Subtract Type-M DoublePrecision VSX Scalar Round to Double-Precision Integer VSX Scalar Round to Double-Precision Integer using Current rounding mode VSX Scalar Round to Double-Precision Integer toward -Infinity VSX Scalar Round to Double-Precision Integer toward +Infinity VSX Scalar Round to Double-Precision Integer toward Zero VSX Scalar Reciprocal Estimate Double-Precision VSX Scalar Reciprocal Square Root Estimate DoublePrecision VSX Scalar Square Root Double-Precision VSX Scalar Subtract Double-Precision VSX Scalar Test for software Divide Double-Precision VSX Scalar Test for software Square Root DoublePrecision VSX Vector Absolute Value Double-Precision VSX Vector Absolute Value Single-Precision VSX Vector Add Double-Precision VSX Vector Add Single-Precision VSX Vector Compare Equal To Double-Precision VSX Vector Compare Equal To Double-Precision & Record VSX Vector Compare Equal To Single-Precision VSX Vector Compare Equal To Single-Precision & Record VSX Vector Compare Greater Than or Equal To Double-Precision VSX Vector Compare Greater Than or Equal To Double-Precision & Record VSX Vector Compare Greater Than or Equal To SinglePrecision VSX Vector Compare Greater Than or Equal To SinglePrecision & Record VSX Vector Compare Greater Than Double-Precision VSX Vector Compare Greater Than Double-Precision & Record VSX Vector Compare Greater Than Single-Precision VSX Vector Compare Greater Than Single-Precision & Record VSX Vector Copy Sign Double-Precision VSX Vector Copy Sign Single-Precision VSX Vector round and Convert Double-Precision to Single-Precision format VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Doubleword Saturate VSX Vector truncate Double-Precision to integer and Convert to Signed Fixed-Point Word Saturate VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate
XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2
1299
Form
Page
Cat1
Mnemonic
Instruction VSX Vector truncate Double-Precision to integer and Convert to Unsigned Fixed-Point Word format with Saturate VSX Vector Convert Single-Precision to DoublePrecision VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Signed Fixed-Point Word format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Doubleword format with Saturate VSX Vector truncate Single-Precision to integer and Convert to Unsigned Fixed-Point Word Saturate VSX Vector Convert and round Signed Fixed-Point Doubleword to Double-Precision format VSX Vector Convert and round Signed Fixed-Point Doubleword to Single-Precision format VSX Vector Convert Signed Fixed-Point Word to Double-Precision format VSX Vector Convert and round Signed Fixed-Point Word to Single-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point Doubleword to Single-Precision format VSX Vector Convert Unsigned Fixed-Point Word to Double-Precision format VSX Vector Convert and round Unsigned Fixed-Point Word to Single-Precision format VSX Vector Divide Double-Precision VSX Vector Divide Single-Precision VSX Vector Multiply-Add Type-A Double-Precision VSX Vector Multiply-Add Type-A Single-Precision VSX Vector Multiply-Add Type-M Double-Precision VSX Vector Multiply-Add Type-M Single-Precision VSX Vector Maximum Double-Precision VSX Vector Maximum Single-Precision VSX Vector Minimum Double-Precision VSX Vector Minimum Single-Precision VSX Vector Multiply-Subtract Type-A Double-Precision VSX Vector Multiply-Subtract Type-A Single-Precision VSX Vector Multiply-Subtract Type-M Double-Precision VSX Vector Multiply-Subtract Type-M Single-Precision VSX Vector Multiply Double-Precision VSX Vector Multiply Single-Precision VSX Vector Negative Absolute Value Double-Precision VSX Vector Negative Absolute Value Single-Precision VSX Vector Negate Double-Precision VSX Vector Negate Single-Precision VSX Vector Negative Multiply-Add Type-A DoublePrecision VSX Vector Negative Multiply-Add Type-A SinglePrecision VSX Vector Negative Multiply-Add Type-M DoublePrecision VSX Vector Negative Multiply-Add Type-M SinglePrecision
XX2
60
304
423
VSX
xvcvspsxws
XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3
60 60
784 272
425 427 429 429 430 430 431 431 432 432 433 435 437 437 440 440 443 445 447 449 451 451 454 454 457 459 461 461 462 462 463 463 468 468
VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
xvcvspuxds xvcvspuxws xvcvsxddp xvcvsxdsp xvcvsxwdp xvcvsxwsp xvcvuxddp xvcvuxdsp xvcvuxwdp xvcvuxwsp xvdivdp xvdivsp xvmaddadp xvmaddasp xvmaddmdp xvmaddmsp xvmaxdp xvmaxsp xvmindp xvminsp xvmsubadp xvmsubasp xvmsubmdp xvmsubmsp xvmuldp xvmulsp xvnabsdp xvnabssp xvnegdp xvnegsp xvnmaddadp xvnmaddasp xvnmaddmdp xvnmaddmsp
60 480 60 352 60 388 60 260 60 420 60 292 60 896 60 768 60 928 60 800 60 452 60 324 60 484 60 356 60 448 60 320 60 978 60 850 60 1010 60 882 60 60 60 60 900 772 932 804
1300
Form
Page 471 471 474 474 477 478 478 479 479 480 481 482 482 483 483 484 485 486 487 488 489 491 493 494 495 495 496 496 497 497 498 499 499 500 500 501 501
Cat1 VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX VSX
Mnemonic xvnmsubadp xvnmsubasp xvnmsubmdp xvnmsubmsp xvrdpi xvrdpic xvrdpim xvrdpip xvrdpiz xvredp xvresp xvrspi xvrspic xvrspim xvrspip xvrspiz xvrsqrtedp xvrsqrtesp xvsqrtdp xvsqrtsp xvsubdp xvsubsp xvtdivdp xvtdivsp xvtsqrtdp xvtsqrtsp xxland xxlandc xxlnor xxlor xxlxor xxmrghw xxmrglw xxpermdi xxsel xxsldwi xxspltw
Instruction VSX Vector Negative Multiply-Subtract Type-A DoublePrecision VSX Vector Negative Multiply-Subtract Type-A SinglePrecision VSX Vector Negative Multiply-Subtract Type-M DoublePrecision VSX Vector Negative Multiply-Subtract Type-M SinglePrecision VSX Vector Round to Double-Precision Integer VSX Vector Round to Double-Precision Integer using Current rounding mode VSX Vector Round to Double-Precision Integer toward -Infinity VSX Vector Round to Double-Precision Integer toward +Infinity VSX Vector Round to Double-Precision Integer toward Zero VSX Vector Reciprocal Estimate Double-Precision VSX Vector Reciprocal Estimate Single-Precision VSX Vector Round to Single-Precision Integer VSX Vector Round to Single-Precision Integer using Current rounding mode VSX Vector Round to Single-Precision Integer toward Infinity VSX Vector Round to Single-Precision Integer toward +Infinity VSX Vector Round to Single-Precision Integer toward Zero VSX Vector Reciprocal Square Root Estimate DoublePrecision VSX Vector Reciprocal Square Root Estimate SinglePrecision VSX Vector Square Root Double-Precision VSX Vector Square Root Single-Precision VSX Vector Subtract Double-Precision VSX Vector Subtract Single-Precision VSX Vector Test for software Divide Double-Precision VSX Vector Test for software Divide Single-Precision VSX Vector Test for software Square Root DoublePrecision VSX Vector Test for software Square Root SinglePrecision VSX Logical AND VSX Logical AND with Complement VSX Logical NOR VSX Logical OR VSX Logical XOR VSX Merge High Word VSX Merge Low Word VSX Permute Doubleword Immediate VSX Select VSX Shift Left Double by Word Immediate VSX Splat Word
XX3 XX3 XX3 XX3 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX2 XX3 XX3 XX3 XX3 XX2 XX2 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX3 XX4 XX3 XX2
1
See the key to the mode dependency and privilege columns on page 1302 and the key to the category column in Section 1.3.5 of Book I.
1301
H M
1302
Index
A a bit 32 AA field 18 address 23 effective 26 effective address 761, 920 real 762, 923 address compare 762, 818, 825 address translation 777, 931 32-bit mode 764 EA to VA 764 esid to vsid 764 overview 768 PTE page table entry 773, 777, 936 Reference bit 777 RPN real page number 771 VA to RA 771 VPN virtual page number 771 address wrap 762, 923 addresses accessed by processor 768 implicit accesses 768 interrupt vectors 768 with defined uses 768 addressing mode D-mode 1111 A-form 17 aliasing 658 alignment effect on performance 667, 839, 1055 Alignment interrupt 821, 863, 1018 assembler language extended mnemonics 625, 851, 1089 mnemonics 625, 851, 1089 symbols 625, 851, 1089 atomic operation 660 atomicity 653 single-copy 653 Auxiliary Processor 4 Auxiliary Processor Unavailable interrupt 1021 B BA field 18 BA instruction field 1108 BB field 18 BC field 18 BD field 18 BD instruction field 1108 BE See Machine State Register BF field 19 BF instruction field 1108 BFA field 19 BFA instruction field 1108 B-form 15 BH field 19 BI field 19, 21 block 652 BO field 19, 32 boundedly undefined 4 Branch Trace 824 Bridge 795 Segment Registers 795 SR 795 brinc 510 BT field 19 bytes 4 C C 108 CA 42 cache management instructions 675 cache model 654 cache parameters 673 Caching Inhibited 655 Change bit 777 CIA 7 Come-From Address Register 753, 1205 consistency 658 context definition 723, 870 synchronization 725, 872 Control Register 746 Count Register 753, 896, 1118, 1205 CR 30 Critical Input interrupt 1013 Critical Save/Restore Register 1 994 CSRR1 994 CTR 31, 1118 CTRL See Control Register Current Instruction Address 738, 886, 1122 D D field 19 D instruction field 1108 DABR interrupt 839 DABR(X) See Data Breakpoint Register (Extension) DAR See Data Address Register Data 912 data access 762, 923
Index
1303
1304
Index
1305
1306
Index
1307
1308
Index
1309
1310
Index
1311
1312
1313